## Using TraceDiff to find the differences between two PyTorch Kineto traces

This notebook provides a step-by-step guide for comparing two PyTorch Kineto traces using TraceLens's TraceDiff tool. You will:

- Load and parse trace files into event trees
- Identify differences and points of difference (PODs) between traces
- Merge the event trees and generate detailed and summary reports
- Use the UID mapping feature to cross-reference events between traces

**Requirements:**
- Two Kineto trace files (JSON format)
- TraceLens installed and available in your Python environment

**Outputs:**
- Merged tree visualization
- CSV files with kernel and op statistics
- UID mapping for cross-referencing events

> **Tip:** You can customize output folder paths and use the UID map to link events between traces for deeper analysis.


In [None]:
# Load and build tree perf analyzer from two trace files
#
# This cell loads two PyTorch Kineto trace files and initializes TraceLens's TreePerfAnalyzer for each.
# Each TreePerfAnalyzer internally builds a call stack tree using TraceLens's TraceToTree.
# After running this cell, you will have two trees ready for comparison and analysis.

from TraceLens import TreePerfAnalyzer

trace_file1 = "/path/to/trace1.json"
trace_file2 = "/path/to/trace2.json"

perf_analyzer1 = TreePerfAnalyzer.from_file(trace_file1)
perf_analyzer2 = TreePerfAnalyzer.from_file(trace_file2)
tree1 = perf_analyzer1.tree
tree2 = perf_analyzer2.tree

Building tree with add_python_func=False
Building CPU op tree with add_python_func=False
Building tree with add_python_func=False
Building CPU op tree with add_python_func=False
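
Before building the trees, it can help to confirm that each file looks like a Kineto trace. The sketch below assumes the usual Chrome-trace layout: a top-level JSON object with a `traceEvents` list whose entries carry a `cat` field. The helper name `count_events_by_category` and the synthetic sample trace are hypothetical and not part of TraceLens.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def count_events_by_category(trace_path):
    """Return a Counter of event categories in a Chrome-trace style JSON file."""
    with Path(trace_path).open() as f:
        data = json.load(f)
    # Assumption: events live under the top-level "traceEvents" key;
    # a bare JSON list of events is accepted as a fallback.
    events = data.get("traceEvents", []) if isinstance(data, dict) else data
    return Counter(evt.get("cat", "<no category>") for evt in events)


# Usage sketch: a tiny synthetic trace stands in for a real file.
sample = {
    "traceEvents": [
        {"name": "aten::convolution", "cat": "cpu_op", "ts": 1, "dur": 5},
        {"name": "cuLaunchKernel", "cat": "cuda_driver", "ts": 2, "dur": 1},
        {"name": "some_kernel", "cat": "kernel", "ts": 3, "dur": 2},
        {"name": "aten::mm", "cat": "cpu_op", "ts": 9, "dur": 4},
    ]
}
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample_trace.json"
    sample_path.write_text(json.dumps(sample))
    category_counts = count_events_by_category(sample_path)

print(category_counts.most_common())
```

Running the same helper on `trace_file1` and `trace_file2` gives a quick sanity check that both traces contain `cpu_op` and `kernel` events before they are compared.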


In [14]:
from TraceLens import TraceDiff

# --- Step 2: Merge and analyze the trace trees ---

# This step merges the two event trees and generates data structures that store the important diff information.
# These data structures are then used to generate diff metrics and reports.
#
# After running this cell, you can:
#   - Use the TraceDiff object to access the DataFrames directly for further analysis (see next cells).
#   - Write the reports to files using td.print_tracediff_report_files(output_folder) (see later cell).

# Merge and generate DataFrames (does NOT write files)
td = TraceDiff(tree1, tree2)
td.generate_tracediff_report()
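
To build intuition for what a point of difference (POD) is, the sketch below aligns two sequences of op names with Python's `difflib.SequenceMatcher` and reports the spans that do not match. It only illustrates the idea and is not TraceDiff's matching algorithm. The helper `find_points_of_difference` is hypothetical, and the op names mirror the call chains printed later in this notebook.

```python
from difflib import SequenceMatcher


def find_points_of_difference(names1, names2):
    """Return (tag, span_in_1, span_in_2) for every non-matching span."""
    matcher = SequenceMatcher(a=names1, b=names2, autojunk=False)
    return [
        (tag, names1[i1:i2], names2[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Op chains resembling the two convolution subtrees shown in this notebook.
chain1 = ["aten::convolution", "aten::_convolution", "aten::cudnn_convolution"]
chain2 = ["aten::convolution", "aten::_convolution", "aten::miopen_convolution"]

pods = find_points_of_difference(chain1, chain2)
for tag, only_in_1, only_in_2 in pods:
    print(f"{tag}: trace1={only_in_1} trace2={only_in_2}")
```

Here the first two ops align and the chains diverge at the backend-specific convolution call, which is the kind of location a POD marks.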



In [15]:
# --- Example: Using the merged_uid_map to cross-reference events between trees ---

# This example demonstrates how to use the TraceDiff UID mapping feature to find the corresponding UID in the other tree for a given UID. This is useful for cross-referencing events between two traces.
#
# Instructions:
# 1. When you create a TraceDiff object, the trees are automatically merged and the UID map is initialized. You do NOT need to call merge_trees manually.
# 2. Pick a UID from tree1 (or tree2). Here, we use the UID of the first aten::convolution event in tree1 as an example.
# 3. Call td.get_corresponding_uid(tree_num, uid):
#    - tree_num = 1 for tree1, 2 for tree2
#    - uid = the UID you want to map
# 4. If the UID is part of a combined node, you'll get the corresponding UID from the other tree. If not, you'll get -1.
# You can use this to look up the corresponding event in the other tree, or to check if a node is matched. This is useful for analysis, visualization, or linking statistics between traces.

# Find the first aten::convolution event in tree1 to use as the sample
sample_evt = next(evt for evt in tree1.events if evt['name'] == 'aten::convolution')
sample_uid1 = sample_evt['UID']

node1 = td.baseline.get_UID2event(sample_uid1)

print(f"Tree 1 UID: {sample_uid1}")
print(f"  Name: {node1.get('name', node1.get('Name', 'Unknown'))}")
print(f"  Category: {node1.get('cat', node1.get('category', 'Unknown'))}")
print(f"  Timestamp: {node1.get('ts', 'Unknown')}")
corresponding_uid2 = td.get_corresponding_uid(1, sample_uid1)

if corresponding_uid2 != -1:
    node2 = td.variant.get_UID2event(corresponding_uid2)
    print(f"\nCorresponding Tree 2 UID: {corresponding_uid2}")
    print(f"  Name: {node2.get('name', node2.get('Name', 'Unknown'))}")
    print(f"  Category: {node2.get('cat', node2.get('category', 'Unknown'))}")
    print(f"  Timestamp: {node2.get('ts', 'Unknown')}")
    print(f"You can now look up the corresponding event in tree2 using UID {corresponding_uid2}")
    print("\nSubtree for this op in Tree 1:")
    td.baseline.traverse_subtree_and_print(node1)
    print("\nSubtree for this op in Tree 2:")
    td.variant.traverse_subtree_and_print(node2)
else:
    print("\nThis UID does not have a combined match in tree2.")
    print("\nSubtree for this op in Tree 1:")
    td.baseline.traverse_subtree_and_print(node1)


Tree 1 UID: 59622
  Name: aten::convolution
  Category: cpu_op
  Timestamp: 575662638333.571

Corresponding Tree 2 UID: 60122
  Name: aten::convolution
  Category: cpu_op
  Timestamp: 577751491567.305
You can now look up the corresponding event in tree2 using UID 60122

Subtree for this op in Tree 1:
└── UID: 59622, Category: cpu_op, Name: aten::convolution
    └── UID: 59624, Category: cpu_op, Name: aten::_convolution
        └── UID: 59625, Category: cpu_op, Name: aten::cudnn_convolution
            └── UID: 239760, Category: cuda_driver, Name: cuLaunchKernel
                └── UID: 239758, Category: kernel, Name: void cutlass::Kernel2<cutlass_80_wmma_tensorop_bf16_s161616gemm_.., Duration: 668.001

Subtree for this op in Tree 2:
└── UID: 60122, Category: cpu_op, Name: aten::convolution
    └── UID: 60124, Category: cpu_op, Name: aten::_convolution
        └── UID: 60125, Category: cpu_op, Name: aten::miopen_convolution
            ├── UID: 225500, Category: cuda_runtime, Name: hipE
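
The lookup contract described above (a matched UID returns its counterpart, an unmatched UID returns -1) can be pictured as a pair of dictionaries. The sketch below is a toy model under that assumption, not TraceDiff's internal data structure. The class name `ToyUidMap` is hypothetical, and apart from the 59622/60122 pair printed above, the UID pairs are illustrative.

```python
class ToyUidMap:
    """Toy two-way UID map; unmatched UIDs resolve to -1."""

    def __init__(self, matched_pairs):
        pairs = list(matched_pairs)
        self._one_to_two = {uid1: uid2 for uid1, uid2 in pairs}
        self._two_to_one = {uid2: uid1 for uid1, uid2 in pairs}

    def get_corresponding_uid(self, tree_num, uid):
        """tree_num is 1 or 2 and names the tree that `uid` belongs to."""
        if tree_num == 1:
            return self._one_to_two.get(uid, -1)
        if tree_num == 2:
            return self._two_to_one.get(uid, -1)
        raise ValueError("tree_num must be 1 or 2")


# (59622, 60122) is the pair printed above; the second pair is assumed.
uid_map = ToyUidMap([(59622, 60122), (59624, 60124)])

print(uid_map.get_corresponding_uid(1, 59622))  # counterpart in tree 2
print(uid_map.get_corresponding_uid(2, 60122))  # and back again
print(uid_map.get_corresponding_uid(1, 59625))  # not in the toy map
```

Checking for -1 before dereferencing the result, as the cell above does, keeps the lookup safe for nodes that exist in only one trace.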

In [None]:
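# The `diff_stats_df` DataFrame contains a detailed, row-by-row comparison of the two traces:
# each row pairs one op's inputs, kernel names, and kernel time in trace 1 with its counterpart in trace 2.
# This is the most granular report, useful for deep dives.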
df_diff_stats = td.diff_stats_df
df_diff_stats

Unnamed: 0,name,input_shape_trace1,input_shape_trace2,concrete_inputs_trace1,concrete_inputs_trace2,input_strides_trace1,input_strides_trace2,input_type_trace1,input_type_trace2,kernel_time_trace1,kernel_time_trace2,kernel_names_trace1,kernel_names_trace2
0,aten::copy_,"[[5, 4, 940, 1450], [5, 4, 940, 1450], []]","[[5, 4, 940, 1450], [5, 4, 940, 1450], []]","['', '', 'False']","['', '', 'False']","[[5452000, 1363000, 1450, 1], [5452000, 136300...","[[5452000, 1363000, 1450, 1], [5452000, 136300...","['float', 'float', 'Scalar']","['float', 'float', 'Scalar']",14143.582031,9025.873047,[Memcpy HtoD (Pageable -> Device)],[Memcpy HtoD (Host -> Device)]
1,aten::copy_,"[[5, 4, 940, 1450], [5, 4, 940, 1450], []]","[[5, 4, 940, 1450], [5, 4, 940, 1450], []]","['', '', 'False']","['', '', 'False']","[[5452000, 1363000, 1450, 1], [5452000, 136300...","[[5452000, 1363000, 1450, 1], [5452000, 136300...","['float', 'float', 'Scalar']","['float', 'float', 'Scalar']",14218.526001,7175.130981,[Memcpy HtoD (Pageable -> Device)],[Memcpy HtoD (Host -> Device)]
2,aten::copy_,"[[5, 4, 940, 1450], [5, 4, 940, 1450], []]","[[5, 4, 940, 1450], [5, 4, 940, 1450], []]","['', '', 'False']","['', '', 'False']","[[5452000, 1363000, 1450, 1], [5452000, 136300...","[[5452000, 1363000, 1450, 1], [5452000, 136300...","['float', 'float', 'Scalar']","['float', 'float', 'Scalar']",14309.021973,6467.176025,[Memcpy HtoD (Pageable -> Device)],[Memcpy HtoD (Host -> Device)]
3,aten::copy_,"[[5, 4, 940, 1450], [5, 4, 940, 1450], []]","[[5, 4, 940, 1450], [5, 4, 940, 1450], []]","['', '', 'False']","['', '', 'False']","[[5452000, 1363000, 1450, 1], [5452000, 136300...","[[5452000, 1363000, 1450, 1], [5452000, 136300...","['float', 'float', 'Scalar']","['float', 'float', 'Scalar']",14276.733032,6291.319946,[Memcpy HtoD (Pageable -> Device)],[Memcpy HtoD (Host -> Device)]
4,aten::copy_,"[[5, 4, 940, 1450], [5, 4, 940, 1450], []]","[[5, 4, 940, 1450], [5, 4, 940, 1450], []]","['', '', 'False']","['', '', 'False']","[[5452000, 1363000, 1450, 1], [5452000, 136300...","[[5452000, 1363000, 1450, 1], [5452000, 136300...","['float', 'float', 'Scalar']","['float', 'float', 'Scalar']",14269.342041,6513.765991,[Memcpy HtoD (Pageable -> Device)],[Memcpy HtoD (Host -> Device)]
...,...,...,...,...,...,...,...,...,...,...,...,...,...
9823,aten::_foreach_sqrt,[[]],[[]],[''],[''],[[]],[[]],['TensorList'],['TensorList'],3238.982910,3677.087280,[std::enable_if<at::native::(anonymous namespa...,[void at::native::(anonymous namespace)::multi...
9824,aten::_foreach_div_,"[[], []]","[[], []]","['', '']","['', '']","[[], []]","[[], []]","['TensorList', '']","['TensorList', '']",3284.200928,4721.540527,[std::enable_if<at::native::(anonymous namespa...,[void at::native::(anonymous namespace)::multi...
9825,aten::_foreach_add_,"[[], []]","[[], []]","['', '1e-08']","['', '1e-08']","[[], []]","[[], []]","['TensorList', 'Scalar']","['TensorList', 'Scalar']",3202.952271,4687.167847,[std::enable_if<at::native::(anonymous namespa...,[void at::native::(anonymous namespace)::multi...
9826,aten::_foreach_addcdiv_,"[[], [], [], []]","[[], [], [], []]","['', '', '', '']","['', '', '', '']","[[], [], [], []]","[[], [], [], []]","['TensorList', 'TensorList', 'TensorList', '']","['TensorList', 'TensorList', 'TensorList', '']",5979.150024,7600.299927,[std::enable_if<at::native::(anonymous namespa...,[void at::native::(anonymous namespace)::multi...
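
A common first step with this per-row report is to rank rows by how much kernel time changed. The sketch below does that in plain Python on a few `kernel_time_trace1` / `kernel_time_trace2` values copied from the output above. The helper `rank_by_time_change` and the short keys `t1` / `t2` are hypothetical. The difference is taken as trace 2 minus trace 1, matching the sign of the `diff_*` columns in the summary reports.

```python
# Kernel times copied from the table above (row index, op name, trace1, trace2).
rows = [
    {"row": 0, "name": "aten::copy_", "t1": 14143.582031, "t2": 9025.873047},
    {"row": 1, "name": "aten::copy_", "t1": 14218.526001, "t2": 7175.130981},
    {"row": 2, "name": "aten::copy_", "t1": 14309.021973, "t2": 6467.176025},
    {"row": 3, "name": "aten::copy_", "t1": 14276.733032, "t2": 6291.319946},
    {"row": 9823, "name": "aten::_foreach_sqrt", "t1": 3238.982910, "t2": 3677.087280},
    {"row": 9826, "name": "aten::_foreach_addcdiv_", "t1": 5979.150024, "t2": 7600.299927},
]


def rank_by_time_change(rows):
    """Sort rows by (trace2 - trace1): largest improvement in trace 2 first."""
    ranked = []
    for r in rows:
        diff = r["t2"] - r["t1"]
        ranked.append({**r, "diff": diff, "ratio": r["t2"] / r["t1"]})
    return sorted(ranked, key=lambda r: r["diff"])


ranked = rank_by_time_change(rows)
for r in ranked:
    print(f"{r['row']:>5} {r['name']:<24} diff={r['diff']:>10.1f} ratio={r['ratio']:.2f}")
```

On the real DataFrame the same ranking is a column subtraction followed by `sort_values`; the pure-Python form is used here only to keep the example self-contained.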


In [None]:
# The `diff_stats_unique_args_summary_df` DataFrame summarizes `diff_stats_df` by grouping rows that share
# the same op name and arguments, then reporting the mean and sum of kernel time per trace and their differences.
df_unique_args = td.diff_stats_unique_args_summary_df
df_unique_args.head(10)

Unnamed: 0,name,input_shape_trace1,input_shape_trace2,concrete_inputs_trace1,concrete_inputs_trace2,input_strides_trace1,input_strides_trace2,input_type_trace1,input_type_trace2,kernel_names_trace1,kernel_names_trace2,kernel_time_trace1_mean,kernel_time_trace1_sum,kernel_time_trace2_mean,kernel_time_trace2_sum,diff_mean,diff_sum,abs_diff_mean,abs_diff_sum
0,autograd::engine::evaluate_function: NativeBat...,,,,,,,,,[void at::native::batch_norm_backward_kernel<c...,,374.363778,167714.972656,0.0,0.0,-374.363778,-167714.972656,374.363778,167714.972656
1,aten::copy_,"[[5, 4, 940, 1450], [5, 4, 940, 1450], []]","[[5, 4, 940, 1450], [5, 4, 940, 1450], []]","['', '', 'False']","['', '', 'False']","[[5452000, 1363000, 1450, 1], [5452000, 136300...","[[5452000, 1363000, 1450, 1], [5452000, 136300...","['float', 'float', 'Scalar']","['float', 'float', 'Scalar']",[Memcpy HtoD (Pageable -> Device)],[Memcpy HtoD (Host -> Device)],14105.245514,112841.964111,6901.41124,55211.289917,-7203.834274,-57630.674194,7203.834274,57630.674194
2,aten::convolution_backward,"[[5, 896, 59, 91], [5, 896, 59, 91], [896, 896...","[[5, 896, 59, 91], [5, 896, 59, 91], [896, 896...","['', '', '', '[0]', '[1, 1]', '[0, 0]', '[1, 1...","['', '', '', '[0]', '[1, 1]', '[0, 0]', '[1, 1...","[[4810624, 5369, 91, 1], [4810624, 5369, 91, 1...","[[4810624, 5369, 91, 1], [4810624, 5369, 91, 1...","['c10::BFloat16', 'c10::BFloat16', 'c10::BFloa...","['c10::BFloat16', 'c10::BFloat16', 'c10::BFloa...",[void cutlass::Kernel2<cutlass_75_tensorop_bf1...,[Cijk_Ailk_Bjlk_BBS_BH_MT128x128x32_MI16x16x16...,597.061623,90753.366699,412.629719,62719.717285,-184.431904,-28033.649414,184.431904,28033.649414
3,aten::convolution_backward,"[[5, 224, 235, 363], [5, 224, 235, 363], [224,...","[[5, 224, 235, 363], [5, 224, 235, 363], [224,...","['', '', '', '[0]', '[1, 1]', '[0, 0]', '[1, 1...","['', '', '', '[0]', '[1, 1]', '[0, 0]', '[1, 1...","[[19108320, 85305, 363, 1], [19108320, 85305, ...","[[19108320, 85305, 363, 1], [19108320, 85305, ...","['c10::BFloat16', 'c10::BFloat16', 'c10::BFloa...","['c10::BFloat16', 'c10::BFloat16', 'c10::BFloa...",[void cutlass::Kernel2<cutlass_75_tensorop_bf1...,[Cijk_Ailk_Bjlk_BBS_BH_MT128x128x64_MI16x16x16...,3398.250839,81558.020142,897.268641,21534.447388,-2500.982198,-60023.572754,2500.982198,60023.572754
4,aten::_convolution,"[[5, 896, 59, 91], [896, 896, 1, 1], [], [], [...","[[5, 896, 59, 91], [896, 896, 1, 1], [], [], [...","['', '', '', '[1, 1]', '[0, 0]', '[1, 1]', 'Fa...","['', '', '', '[1, 1]', '[0, 0]', '[1, 1]', 'Fa...","[[4810624, 5369, 91, 1], [896, 1, 1, 1], [], [...","[[4810624, 5369, 91, 1], [896, 1, 1, 1], [], [...","['c10::BFloat16', 'c10::BFloat16', '', 'Scalar...","['c10::BFloat16', 'c10::BFloat16', '', 'Scalar...",[void cutlass::Kernel2<cutlass_75_tensorop_bf1...,"[SubTensorOpWithScalar1d, batched_transpose_32...",460.82141,70044.854248,227.886761,34638.78772,-232.934648,-35406.066528,232.934648,35406.066528
5,aten::convolution_backward,"[[5, 896, 59, 91], [5, 896, 59, 91], [896, 56,...","[[5, 896, 59, 91], [5, 896, 59, 91], [896, 56,...","['', '', '', '[0]', '[1, 1]', '[1, 1]', '[1, 1...","['', '', '', '[0]', '[1, 1]', '[1, 1]', '[1, 1...","[[4810624, 5369, 91, 1], [4810624, 5369, 91, 1...","[[4810624, 5369, 91, 1], [4810624, 5369, 91, 1...","['c10::BFloat16', 'c10::BFloat16', 'c10::BFloa...","['c10::BFloat16', 'c10::BFloat16', 'c10::BFloa...",[void cudnn::engines_precompiled::nchwToNhwcKe...,"[batched_transpose_32x32_half, batched_transpo...",736.44725,53024.202026,571.400623,41140.844849,-165.046627,-11883.357178,165.046627,11883.357178
6,aten::convolution_backward,"[[5, 448, 235, 363], [5, 224, 235, 363], [448,...","[[5, 448, 235, 363], [5, 224, 235, 363], [448,...","['', '', '', '[0]', '[1, 1]', '[0, 0]', '[1, 1...","['', '', '', '[0]', '[1, 1]', '[0, 0]', '[1, 1...","[[38216640, 85305, 363, 1], [19108320, 85305, ...","[[38216640, 85305, 363, 1], [19108320, 85305, ...","['c10::BFloat16', 'c10::BFloat16', 'c10::BFloa...","['c10::BFloat16', 'c10::BFloat16', 'c10::BFloa...",[void cutlass::Kernel2<cutlass_75_tensorop_bf1...,[Cijk_Ailk_Bjlk_BBS_BH_MT128x128x64_MI16x16x16...,6564.989395,52519.915161,1276.322952,10210.583618,-5288.666443,-42309.331543,5288.666443,42309.331543
7,FlashAttnFuncBackward,"[[5, 2048, 32, 64]]","[[5, 2048, 32, 64]]",[''],[''],"[[4194304, 2048, 64, 1]]","[[4194304, 2048, 64, 1]]",['c10::BFloat16'],['c10::BFloat16'],"[void flash_bwd_dot_do_o_kernel<true, Flash_bw...",[void at::native::vectorized_elementwise_kerne...,2051.574961,49237.799072,1880.099101,45122.378418,-171.475861,-4115.420654,171.475861,4115.420654
8,aten::convolution_backward,"[[5, 224, 235, 363], [5, 224, 470, 725], [224,...","[[5, 224, 235, 363], [5, 224, 470, 725], [224,...","['', '', '', '[0]', '[2, 2]', '[1, 1]', '[1, 1...","['', '', '', '[0]', '[2, 2]', '[1, 1]', '[1, 1...","[[19108320, 85305, 363, 1], [76328000, 340750,...","[[19108320, 85305, 363, 1], [76328000, 340750,...","['c10::BFloat16', 'c10::BFloat16', 'c10::BFloa...","['c10::BFloat16', 'c10::BFloat16', 'c10::BFloa...",[void cudnn::engines_precompiled::nchwToNhwcKe...,"[batched_transpose_32x16_half, batched_transpo...",5238.638641,41909.109131,5046.173233,40369.385864,-192.465408,-1539.723267,192.465408,1539.723267
9,aten::convolution_backward,"[[5, 448, 118, 182], [5, 448, 118, 182], [448,...","[[5, 448, 118, 182], [5, 448, 118, 182], [448,...","['', '', '', '[0]', '[1, 1]', '[0, 0]', '[1, 1...","['', '', '', '[0]', '[1, 1]', '[0, 0]', '[1, 1...","[[9621248, 21476, 182, 1], [9621248, 21476, 18...","[[9621248, 21476, 182, 1], [9621248, 21476, 18...","['c10::BFloat16', 'c10::BFloat16', 'c10::BFloa...","['c10::BFloat16', 'c10::BFloat16', 'c10::BFloa...",[void cutlass::Kernel2<cutlass_80_tensorop_bf1...,[Cijk_Ailk_Bjlk_BBS_BH_MT128x128x64_MI16x16x16...,639.530049,35813.682739,552.015518,30912.869019,-87.514531,-4900.813721,87.514531,4900.813721
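
The relationship between the per-row report and this summary can be sketched as a group-by over op name and arguments. The code below is an illustration inferred from the column names, not TraceLens's implementation. The helper `summarize_by_unique_args` is hypothetical, the input values are made up, and only `input_shape` is used as the argument key for brevity. As in the table above, differences are trace 2 minus trace 1.

```python
from collections import defaultdict


def summarize_by_unique_args(rows):
    """Group (name, input_shape, t1, t2) rows and compute mean/sum columns."""
    groups = defaultdict(list)
    for name, input_shape, t1, t2 in rows:
        groups[(name, input_shape)].append((t1, t2))

    summary = {}
    for key, times in groups.items():
        n = len(times)
        sum1 = sum(t1 for t1, _ in times)
        sum2 = sum(t2 for _, t2 in times)
        summary[key] = {
            "kernel_time_trace1_mean": sum1 / n,
            "kernel_time_trace1_sum": sum1,
            "kernel_time_trace2_mean": sum2 / n,
            "kernel_time_trace2_sum": sum2,
            "diff_mean": (sum2 - sum1) / n,
            "diff_sum": sum2 - sum1,
        }
    return summary


# Made-up rows: two calls with one shape, one call with another.
rows = [
    ("aten::mm", "[[8, 16], [16, 4]]", 10.0, 12.0),
    ("aten::mm", "[[8, 16], [16, 4]]", 14.0, 10.0),
    ("aten::mm", "[[2, 2], [2, 2]]", 3.0, 4.0),
]
summary = summarize_by_unique_args(rows)
for key, stats in summary.items():
    print(key, stats)
```

The same arithmetic can be checked against the `aten::copy_` row above: 55211.289917 - 112841.964111 gives the reported `diff_sum` of -57630.674194.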


In [20]:
# The `diff_stats_names_summary_df` DataFrame provides the highest-level summary,
# aggregating by operation name. 
df_name_summary = td.diff_stats_names_summary_df
df_name_summary

Unnamed: 0,name,row_count,kernel_time_trace1_sum_ms,kernel_time_trace2_sum_ms,diff_sum_ms,abs_diff_sum_ms
0,aten::convolution_backward,736,541.957809,366.13609,-175.821719,198.297619
1,aten::_convolution,736,229.175081,157.8077,-71.367381,85.731275
2,autograd::engine::evaluate_function: NativeBat...,448,167.714973,0.0,-167.714973,167.714973
3,aten::copy_,2859,149.794197,85.284066,-64.510132,69.409112
4,aten::_batch_norm_impl_index,448,129.995936,43.0816,-86.914335,86.914335
5,aten::mm,300,78.684444,84.654093,5.969649,11.982847
6,FlashAttnFuncBackward,25,59.776381,54.930648,-4.845733,4.845733
7,aten::addmm,147,38.775271,45.761834,6.986563,6.986563
8,aten::mul,408,37.632541,21.646682,-15.985859,15.987573
9,aten::threshold_backward,576,33.166734,31.333856,-1.832878,3.841799
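
Two details of this table are worth spelling out. The `_ms` columns appear to be the sums from the previous report divided by 1000 (for example, 167714.972656 becomes 167.714973), which suggests the earlier values are in microseconds. `abs_diff_sum_ms` can also exceed the magnitude of `diff_sum_ms`, as it does for `aten::mm`, when some entries for an op got faster while others got slower. The sketch below illustrates both points with made-up per-entry differences. The helper `signed_and_abs_sum_ms` is hypothetical, and the level at which TraceDiff takes absolute values is an assumption here.

```python
def signed_and_abs_sum_ms(diffs_us):
    """Return (signed sum, absolute sum) of per-entry differences, in ms."""
    signed_ms = sum(diffs_us) / 1000.0
    abs_ms = sum(abs(d) for d in diffs_us) / 1000.0
    return signed_ms, abs_ms


# Hypothetical per-entry differences (trace2 - trace1) in microseconds:
# two entries got slower in trace 2, one got faster.
diffs_us = [4000.0, -1000.0, 3000.0]
signed_ms, abs_ms = signed_and_abs_sum_ms(diffs_us)

# The gap between the two sums measures how much the changes cancel out.
cancelled_ms = abs_ms - abs(signed_ms)
print(f"diff_sum_ms={signed_ms}, abs_diff_sum_ms={abs_ms}, cancelled={cancelled_ms}")

# Unit check against the tables above: microseconds to milliseconds.
batch_norm_ms = round(167714.972656 / 1000.0, 6)
print(batch_norm_ms)
```

When the two sums have equal magnitude, every contributing entry moved in the same direction; a large gap is a hint to drill into the unique-args report for that op.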


In [19]:
# --- Write TraceDiff reports to files ---

# You can write all TraceDiff reports (merged tree, detailed stats, summary stats) to files in a folder using:
#   td.print_tracediff_report_files(output_folder)

# Example: write reports to the default folder 'rprt_diff'
td.print_tracediff_report_files("rprt_diff")
print("TraceDiff reports written to rprt_diff/")

TraceDiff reports written to rprt_diff/
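
After writing the reports, a quick directory listing confirms what was produced. The sketch below uses only `pathlib` and makes no assumption about the file names TraceDiff writes. The helper `list_report_files` and the file `example_report.csv` are hypothetical stand-ins so the example runs on its own.

```python
import tempfile
from pathlib import Path


def list_report_files(output_folder):
    """Return sorted (file name, size in bytes) pairs; [] if the folder is missing."""
    folder = Path(output_folder)
    if not folder.is_dir():
        return []
    return sorted((p.name, p.stat().st_size) for p in folder.iterdir() if p.is_file())


# Demo on a temporary folder with a stand-in file; for real use,
# pass the folder given to td.print_tracediff_report_files().
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "example_report.csv").write_text("name,value\n")
    demo_listing = list_report_files(tmp)

# The temporary folder has been removed, so this path no longer exists.
missing_listing = list_report_files(Path(tmp) / "does_not_exist")

for name, size in demo_listing:
    print(f"{name}: {size} bytes")
```

Calling `list_report_files("rprt_diff")` after the cell above would show the merged tree and CSV reports that were written.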
