## Using TraceDiff to find the differences between two PyTorch Kineto traces

This notebook provides a step-by-step guide for comparing two PyTorch Kineto traces using TraceLens's TraceDiff tool. You will:

- Load and parse trace files into event trees
- Identify differences and points of difference (PODs) between traces
- Merge the event trees and generate detailed and summary reports
- Use the UID mapping feature to cross-reference events between traces

**Requirements:**
- Two Kineto trace files (JSON format)
- TraceLens installed and available in your Python environment

**Outputs:**
- Merged tree visualization
- CSV files with kernel and op statistics
- UID mapping for cross-referencing events

> **Tip:** You can customize output folder paths and use the UID map to link events between traces for deeper analysis.

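Before loading, it can help to confirm that each input really is a Kineto-style JSON trace. The sketch below assumes the Chrome trace layout that the PyTorch profiler exports, where events sit in a top-level `traceEvents` list. `summarize_trace_file` is a hypothetical helper, not part of TraceLens, and it does not handle compressed traces.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_trace_file(path):
    """Return (event_count, Counter of 'cat' values) for a Chrome-trace-style JSON file.

    Assumption: events live in a top-level "traceEvents" list.
    """
    with Path(path).open("r", encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict) or "traceEvents" not in data:
        raise ValueError(f"{path} has no top-level 'traceEvents' list")
    events = data["traceEvents"]
    # Events without a category are grouped under the placeholder "<none>".
    categories = Counter(
        evt.get("cat", "<none>") for evt in events if isinstance(evt, dict)
    )
    return len(events), categories
```

Calling `summarize_trace_file("/path/to/trace1.json")` on each input gives a quick check that both files parse and contain comparable kinds of events.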

In [None]:
# --- Step 1: Load the two trace files and build a TreePerfAnalyzer for each ---
#
# This cell loads two PyTorch Kineto trace files and initializes TraceLens's TreePerfAnalyzer for each.
# Each TreePerfAnalyzer internally builds a call stack tree using TraceLens's TraceToTree.
# After running this cell, you will have two trees ready for comparison and analysis.

from TraceLens import TreePerfAnalyzer

# Replace these placeholders with the paths to your own trace files.
trace_file1 = "/path/to/trace1.json"
trace_file2 = "/path/to/trace2.json"

perf_analyzer1 = TreePerfAnalyzer.from_file(trace_file1)
perf_analyzer2 = TreePerfAnalyzer.from_file(trace_file2)
tree1 = perf_analyzer1.tree
tree2 = perf_analyzer2.tree

In [None]:
from TraceLens import TraceDiff

# --- Step 2: Merge and analyze the trace trees ---

# This step merges the two event trees and generates data structures that store the important diff information.
# These data structures are then used to generate diff metrics and reports.
#
# After running this cell, you can:
#   - Use the TraceDiff object to access the DataFrames directly for further analysis (see next cells).
#   - Write the reports to files using td.print_tracediff_report_files(output_folder) (see later cell).

# Merge and generate DataFrames (does NOT write files)
td = TraceDiff(tree1, tree2)
td.generate_tracediff_report()



In [None]:
# --- Example: Using the merged_uid_map to cross-reference events between trees ---

# This example shows how to use the TraceDiff UID map to find, for a given UID in one tree,
# the corresponding UID in the other tree.
#
# Instructions:
# 1. When you create a TraceDiff object, the trees are automatically merged and the UID map is initialized. You do NOT need to call merge_trees manually.
# 2. Pick a UID from tree1 (or tree2). Here, we use the first aten::convolution event in tree1 as an example.
# 3. Call td.get_corresponding_uid(tree_num, uid):
#    - tree_num = 1 for tree1, 2 for tree2
#    - uid = the UID you want to map
# 4. If the UID is part of a combined node, you get the corresponding UID from the other tree. If not, you get -1.
#
# Use the result to look up the matching event in the other tree or to check whether a node is matched,
# for example when linking statistics or visualizations between traces.

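# Find the first aten::convolution event in tree1 (change the name to match an op in your trace).
sample_evt = next((evt for evt in tree1.events if evt['name'] == 'aten::convolution'), None)
if sample_evt is None:
    raise ValueError("No 'aten::convolution' event found in tree1; pick a different op name.")
sample_uid1 = sample_evt['UID']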

node1 = td.baseline.get_UID2event(sample_uid1)

print(f"Tree 1 UID: {sample_uid1}")
print(f"  Name: {node1.get('name', node1.get('Name', 'Unknown'))}")
print(f"  Category: {node1.get('cat', node1.get('category', 'Unknown'))}")
print(f"  Timestamp: {node1.get('ts', 'Unknown')}")
corresponding_uid2 = td.get_corresponding_uid(1, sample_uid1)

if corresponding_uid2 != -1:
    node2 = td.variant.get_UID2event(corresponding_uid2)
    print(f"\nCorresponding Tree 2 UID: {corresponding_uid2}")
    print(f"  Name: {node2.get('name', node2.get('Name', 'Unknown'))}")
    print(f"  Category: {node2.get('cat', node2.get('category', 'Unknown'))}")
    print(f"  Timestamp: {node2.get('ts', 'Unknown')}")
    print(f"You can now look up the corresponding event in tree2 using UID {corresponding_uid2}")
    print("\nSubtree for this op in Tree 1:")
    td.baseline.traverse_subtree_and_print(node1)
    print("\nSubtree for this op in Tree 2:")
    td.variant.traverse_subtree_and_print(node2)
else:
    print("\nThis UID does not have a combined match in tree2.")
    print("\nSubtree for this op in Tree 1:")
    td.baseline.traverse_subtree_and_print(node1)


In [None]:
# The `diff_stats_df` DataFrame contains a detailed, row-by-row comparison of the two traces.
# This is the most granular report, useful for deep dives.
df_diff_stats = td.diff_stats_df
df_diff_stats

In [None]:
# The `diff_stats_unique_args_summary_df` DataFrame summarizes `diff_stats_df`,
# aggregating across unique argument combinations.
df_unique_args = td.diff_stats_unique_args_summary_df
df_unique_args.head(10)

In [None]:
# The `diff_stats_names_summary_df` DataFrame provides the highest-level summary,
# aggregating by operation name. 
df_name_summary = td.diff_stats_names_summary_df
df_name_summary
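
To rank rows in any of these DataFrames, a small pandas helper may be enough. The sketch below is generic: `top_by_abs`, the column name `delta_us`, and the toy data are hypothetical placeholders, since the actual column names are not listed here. Inspect `df_name_summary.columns` to choose a real column.

```python
import pandas as pd


def top_by_abs(df, column, n=5):
    """Return the n rows with the largest absolute value in `column`."""
    return df.sort_values(column, key=lambda s: s.abs(), ascending=False).head(n)


# Toy data standing in for a summary DataFrame; the column names are hypothetical.
toy = pd.DataFrame(
    {"name": ["op_a", "op_b", "op_c"], "delta_us": [5.0, -40.0, 12.5]}
)
print(top_by_abs(toy, "delta_us", n=2))
```

Sorting by absolute value keeps large regressions and large improvements together at the top, which is usually what a first pass over a diff needs.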

In [None]:
# --- Write TraceDiff reports to files ---

# You can write all TraceDiff reports (merged tree, detailed stats, summary stats) to files in a folder using:
#   td.print_tracediff_report_files(output_folder)

# Example: write reports to the default folder 'rprt_diff'
td.print_tracediff_report_files("rprt_diff")
print("TraceDiff reports written to rprt_diff/")
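
After writing the reports, it can be useful to confirm what actually landed in the output folder. The sketch below uses only the standard library and makes no assumption about the file names TraceDiff produces. `list_report_files` is a hypothetical helper, not a TraceLens function.

```python
from pathlib import Path


def list_report_files(folder):
    """Return (relative path, size in bytes) for every file under `folder`, sorted by path."""
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"Report folder not found: {root}")
    return sorted(
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )
```

For example, looping over `list_report_files("rprt_diff")` and printing each name and size gives a quick check that the merged tree and CSV reports were written and are not empty.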