## Using TraceDiff to Find the Differences Between Two PyTorch Kineto Traces

This notebook provides a step-by-step guide for comparing two PyTorch Kineto traces using TraceLens's TraceDiff tool. You will:

- Load and parse trace files into event trees
- Identify the points of difference (PODs) between the traces
- Merge the event trees and generate detailed and summary reports
- Use the UID mapping feature to cross-reference events between traces

**Requirements:**
- Two Kineto trace files (JSON format)
- TraceLens installed and available in your Python environment

**Outputs:**
- A text rendering of the merged tree (`merged_tree_output.txt`)
- CSV files with kernel and op statistics (`diff_stats.csv`, `diff_stats_summary.csv`)
- UID mapping for cross-referencing events

> **Tip:** You can customize output folder paths and use the UID map to link events between traces for deeper analysis.


In [None]:
# --- Step 1: Load and build trace trees ---
#
# This cell loads two PyTorch Kineto trace files and builds event trees for each using TraceLens's TraceToTree.
#
# Variables:
#   trace_file1, trace_file2: Paths to the two trace files to compare
#   trace_data1, trace_data2: Loaded JSON data from each trace file
#   events1, events2: Lists of trace events from each file
#   tree1, tree2: TraceToTree objects representing the event trees
#
# After running this cell, you will have two trees ready for comparison and analysis.

import json
from TraceLens import TraceToTree

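# Placeholders: point these at two different trace files.
# trace_file1 becomes the baseline (tree1) and trace_file2 the variant (tree2).
trace_file1 = "/path/to/your/kineto.json"
trace_file2 = "/path/to/your/kineto.json"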

with open(trace_file1, "r") as f:
    trace_data1 = json.load(f)

with open(trace_file2, "r") as f:
    trace_data2 = json.load(f)

# Extract the list of events from each trace
events1 = trace_data1["traceEvents"]
tree1 = TraceToTree(events1)
tree1.build_tree()

events2 = trace_data2["traceEvents"]
tree2 = TraceToTree(events2)
tree2.build_tree()
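
Kineto traces are often stored gzip-compressed, and a file that lacks a `traceEvents` key fails in Step 1 with a bare `KeyError`. The sketch below is one possible loader that handles both cases using only the standard library. The helper name `load_trace_events` is hypothetical and is not part of TraceLens.

```python
import gzip
import json
from pathlib import Path


def load_trace_events(path):
    """Return the 'traceEvents' list from a trace stored as .json or .json.gz.

    Hypothetical helper for illustration; not a TraceLens API.
    """
    path = Path(path)
    # Choose the opener from the file suffix; both openers support text mode.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as f:
        data = json.load(f)
    # Fail early with a readable message if the file is not shaped as expected.
    if not isinstance(data, dict) or "traceEvents" not in data:
        raise KeyError(f"{path} has no 'traceEvents' key; is it a Kineto trace?")
    return data["traceEvents"]
```

With this helper, `events1 = load_trace_events(trace_file1)` could stand in for the `json.load` and `["traceEvents"]` lines above.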

In [None]:
from TraceLens import TraceDiff

# --- Step 2: Merge and analyze the trace trees ---

# Merge the trees, then write the merged tree and the diff statistics to disk
#
# This step merges the two event trees and generates three output files in the default output folder ('rprt_diff'):
#   - merged_tree_output.txt: A text representation of the merged tree structure
#   - diff_stats.csv: Detailed kernel time and name statistics for each op
#   - diff_stats_summary.csv: Aggregated summary statistics by op name and input shape
#
# To write the report to a different folder, pass output_folder, for example:
#   td.generate_tracediff_report(output_folder="my_results")
#
# After running this cell, you can open and analyze the output files for further insights.

td = TraceDiff(tree1, tree2)
td.generate_tracediff_report()

# Output files:
# - rprt_diff/merged_tree_output.txt
# - rprt_diff/diff_stats.csv
# - rprt_diff/diff_stats_summary.csv
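
To get a first look at the two CSV reports without assuming anything about their columns, a minimal sketch using the standard `csv` module is shown below. The helper name `preview_csv` is hypothetical. The folder and file names are the defaults listed above, and the loop skips any file that does not exist.

```python
import csv
from pathlib import Path


def preview_csv(path, max_rows=5):
    """Return (header, rows) holding the header and the first max_rows data rows."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # empty list if the file has no rows at all
        rows = [row for _, row in zip(range(max_rows), reader)]
    return header, rows


report_dir = Path("rprt_diff")  # default output folder named in this cell
for name in ("diff_stats.csv", "diff_stats_summary.csv"):
    csv_path = report_dir / name
    if csv_path.exists():
        header, rows = preview_csv(csv_path)
        print(f"{name} columns: {header}")
        for row in rows:
            print("  ", row)
```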

In [None]:
# --- Example: Using the merged_uid_map to cross-reference events between trees ---

# This example uses TraceDiff's UID map to find, for a given UID in one tree,
# the UID of the corresponding event in the other tree.
#
# Instructions:
# 1. When you create a TraceDiff object, the trees are automatically merged and the UID map is initialized. You do NOT need to call merge_trees manually.
# 2. Pick a UID from tree1 (or tree2). Here, we use the first root UID from tree1 as an example.
# 3. Call td.get_corresponding_uid(tree_num, uid):
#    - tree_num = 1 for tree1, 2 for tree2
#    - uid = the UID you want to map
# 4. If the UID is part of a combined node, you get the corresponding UID from the other tree. Otherwise you get -1.
#
# Use the result to look up the corresponding event in the other tree or to check
# whether a node was matched, for example when linking statistics between traces.

# Example: pick the first root UID from tree1 (td.baseline)
sample_uid1 = next(iter(td.baseline.cpu_root_nodes))
corresponding_uid2 = td.get_corresponding_uid(1, sample_uid1)

node1 = td.baseline.get_UID2event(sample_uid1)

print(f"Tree 1 UID: {sample_uid1}")
print(f"  Name: {node1.get('name', node1.get('Name', 'Unknown'))}")
print(f"  Category: {node1.get('cat', node1.get('category', 'Unknown'))}")
print(f"  Timestamp: {node1.get('ts', 'Unknown')}")
if corresponding_uid2 != -1:
    node2 = td.variant.get_UID2event(corresponding_uid2)
    print(f"\nCorresponding Tree 2 UID: {corresponding_uid2}")
    print(f"  Name: {node2.get('name', node2.get('Name', 'Unknown'))}")
    print(f"  Category: {node2.get('cat', node2.get('category', 'Unknown'))}")
    print(f"  Timestamp: {node2.get('ts', 'Unknown')}")
    print(
        f"You can now look up the corresponding event in tree2 using UID {corresponding_uid2}"
    )
else:
    print("\nThis UID does not have a combined match in tree2.")
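
The single-UID lookup above extends naturally to a whole set of UIDs. The sketch below splits a collection of UIDs into matched pairs and unmatched UIDs. It relies only on the convention described in this cell, that the lookup returns -1 when there is no match. The helper name `summarize_matches` is hypothetical, and the lookup is passed in as a callable so the block runs without a trace loaded.

```python
def summarize_matches(uids, lookup):
    """Split uids into ({uid: other_uid}, [unmatched uids]).

    lookup(uid) must return the corresponding UID in the other tree, or -1.
    """
    matched, unmatched = {}, []
    for uid in uids:
        other = lookup(uid)
        if other == -1:
            unmatched.append(uid)
        else:
            matched[uid] = other
    return matched, unmatched


# Stand-in data so the sketch runs on its own; these UIDs are made up.
fake_map = {1: 10, 3: 30}
matched, unmatched = summarize_matches([1, 2, 3], lambda uid: fake_map.get(uid, -1))
print(f"matched: {len(matched)}, unmatched: {len(unmatched)}")
```

With the objects from this notebook, the call would look like `summarize_matches(td.baseline.cpu_root_nodes, lambda uid: td.get_corresponding_uid(1, uid))`.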