<!--
Copyright (c) 2024 - 2025 Advanced Micro Devices, Inc. All rights reserved.

See LICENSE for license information.
-->

# Trace2Tree Example Notebook

This notebook demonstrates how to navigate the tree structure created by TraceLens.

**What you'll learn:**
- Load traces using `TreePerfAnalyzer.from_file()`
- Traverse subtrees to see operation hierarchies
- Navigate parent chains to understand call stacks
- Access parent/child relationships and GPU events using tree methods

## 1. Load Trace and Build Tree

In [None]:
from pprint import pprint
from TraceLens.TreePerf import TreePerfAnalyzer

# Load trace data using TreePerfAnalyzer
# Set add_python_func=True to include Python function call stack
# This allows you to trace GPU kernels all the way back to your Python code
trace_file = 'tests/traces/mi300/google_owlv2-large-patch14-ensemble__1016001.json.gz'
analyzer = TreePerfAnalyzer.from_file(trace_file, add_python_func=True)

# Access the underlying tree structure
tree = analyzer.tree

print(f'Loaded {len(tree.events)} events')

## 2. Find an Operation to Analyze

Let's find an operation of interest. Change the filter to look for a different operation (e.g., `aten::matmul`, `aten::addmm`, `aten::layer_norm`).

In [None]:
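# Find the first matching CPU op (feel free to change this to any operation).
# next() is given a default of None so that a missing operation produces a
# clear error instead of a bare StopIteration.
event_interest = next(
    (evt for evt in tree.events
     if evt.get('name') == 'aten::convolution' and evt.get('cat') == 'cpu_op'),
    None,
)
if event_interest is None:
    raise ValueError("No 'aten::convolution' cpu_op event found in this trace")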

print(f"Found operation: {event_interest['name']}")
print(f"Duration: {event_interest.get('dur', 0):.2f} µs")
print(f"UID: {event_interest['UID']}")
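
The lookup pattern above can be wrapped in a small reusable helper that returns `None` when nothing matches. The sketch below runs on a hand-made list of event dicts rather than a real trace. `find_first_event` is a hypothetical helper, not part of TraceLens, and it assumes only that events are dicts with `name` and `cat` keys, as in the cell above.

```python
from typing import Any, Dict, Iterable, Optional

Event = Dict[str, Any]


def find_first_event(
    events: Iterable[Event], name: str, cat: str = "cpu_op"
) -> Optional[Event]:
    """Return the first event whose name and category match, or None."""
    return next(
        (evt for evt in events if evt.get("name") == name and evt.get("cat") == cat),
        None,
    )


# Toy events standing in for tree.events (values are made up for illustration).
toy_events = [
    {"UID": 0, "name": "aten::linear", "cat": "cpu_op"},
    {"UID": 1, "name": "aten::convolution", "cat": "cpu_op"},
    {"UID": 2, "name": "aten::convolution", "cat": "cpu_op"},
]

hit = find_first_event(toy_events, "aten::convolution")
miss = find_first_event(toy_events, "aten::layer_norm")
print(hit["UID"] if hit else None, miss)
```

With a real trace, the same helper could be called as `find_first_event(tree.events, 'aten::convolution')`.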

## 3. Traverse Subtree

Visualize the entire subtree rooted at this operation to see what happens beneath it. You can optionally include CPU operation details.

In [None]:
print(f"Subtree for {event_interest['name']}:\n")
tree.traverse_subtree_and_print(event_interest)
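
To build intuition for what a subtree traversal does, here is a minimal depth-first walk over a toy tree. This is a sketch under stated assumptions, not the TraceLens implementation: the layout (events keyed by UID, each with a `children` list of UIDs) and the helper `subtree_lines` are hypothetical, and the event names are invented.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]

# Hypothetical layout: events keyed by UID, each listing its children's UIDs.
toy_tree: Dict[int, Event] = {
    0: {"UID": 0, "name": "aten::convolution", "cat": "cpu_op", "children": [1]},
    1: {"UID": 1, "name": "aten::_convolution", "cat": "cpu_op", "children": [2, 3]},
    2: {"UID": 2, "name": "hipLaunchKernel", "cat": "cuda_runtime", "children": []},
    3: {"UID": 3, "name": "aten::add_", "cat": "cpu_op", "children": []},
}


def subtree_lines(tree: Dict[int, Event], uid: int, depth: int = 0) -> List[str]:
    """Depth-first walk that returns one indented line per event."""
    evt = tree[uid]
    lines = [f"{'  ' * depth}{evt['name']} ({evt['cat']})"]
    for child_uid in evt.get("children", []):
        lines.extend(subtree_lines(tree, child_uid, depth + 1))
    return lines


lines = subtree_lines(toy_tree, 0)
print("\n".join(lines))
```

Each level of indentation corresponds to one step down the call hierarchy, which is the same reading order as the printed subtree above.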

## 4. Traverse Parent Chain

Trace back through all parent events to see the full call stack that led to this operation. Pass `cpu_op_fields` to also print CPU operation details such as input dimensions and types.

In [None]:
print(f"Parent chain for {event_interest['name']}:\n")
root = tree.traverse_parents_and_print(
    event_interest,
    cpu_op_fields=('Input Dims', 'Input type')
)
print(f"\nRoot event: {root['name']}")
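
Conceptually, walking the parent chain means following parent links until an event with no parent is reached. The sketch below shows that loop on a toy tree. It assumes a hypothetical layout in which each event stores its parent's UID under a `parent` key (`None` for the root); `parent_chain` and the event names are illustrative and not taken from TraceLens.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]

# Hypothetical layout: each event stores its parent's UID (None for the root).
toy_tree: Dict[int, Event] = {
    0: {"UID": 0, "name": "train_step", "cat": "python_function", "parent": None},
    1: {"UID": 1, "name": "nn.Module: Conv2d", "cat": "python_function", "parent": 0},
    2: {"UID": 2, "name": "aten::conv2d", "cat": "cpu_op", "parent": 1},
    3: {"UID": 3, "name": "aten::convolution", "cat": "cpu_op", "parent": 2},
}


def parent_chain(tree: Dict[int, Event], uid: int) -> List[Event]:
    """Return the events from the given one up to the root, in that order."""
    chain: List[Event] = []
    current = tree[uid]
    while current is not None:
        chain.append(current)
        parent_uid = current.get("parent")
        current = tree[parent_uid] if parent_uid is not None else None
    return chain


chain = parent_chain(toy_tree, 3)
toy_root = chain[-1]
print(" <- ".join(evt["name"] for evt in chain))
```

The last element of the chain plays the role of the root event returned in the cell above.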

## 5. Navigating Parent-Child Relationships

Use tree methods to access an event's parent and children directly.

In [None]:
# Get parent using tree method
parent_evt = tree.get_parent_event(event_interest)
if parent_evt:
    print(f"Parent: {parent_evt['name']} (cat: {parent_evt['cat']})\n")

# Get children using tree method
children = tree.get_children_events(event_interest)
print(f"Children ({len(children)}):")
for child in children[:5]:  # Show first 5
    print(f"  - {child['name']} (cat: {child['cat']})")
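
When an event has many children, a per-category count can be easier to read than a flat list. The following sketch tallies categories with `collections.Counter` on a made-up list of child events; it assumes only that each child is a dict with a `cat` key, as in the cell above.

```python
from collections import Counter

# Toy children standing in for tree.get_children_events(...) output.
toy_children = [
    {"name": "aten::_convolution", "cat": "cpu_op"},
    {"name": "hipLaunchKernel", "cat": "cuda_runtime"},
    {"name": "aten::add_", "cat": "cpu_op"},
]

# Count children per category; events without a 'cat' key fall under 'unknown'.
cat_counts = Counter(child.get("cat", "unknown") for child in toy_children)
for cat, count in cat_counts.most_common():
    print(f"{cat}: {count}")
```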

## 6. Exploring GPU Events

Use the `get_gpu_events()` method to see which GPU kernels this operation launches.

In [None]:
gpu_events = tree.get_gpu_events(event_interest)
print(f"GPU kernels launched by {event_interest['name']} (total: {len(gpu_events)}):\n")
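for gpu_evt in gpu_events[:5]:  # Show first 5
    name = gpu_evt['name']
    # Truncate long kernel names; add an ellipsis only when something was cut
    short_name = name if len(name) <= 60 else name[:60] + '...'
    print(f"  Kernel: {short_name}")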
    print(f"  Duration: {gpu_evt.get('dur', 0):.2f} µs\n")
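
A natural next step is to group the kernels by name and sum their durations. The sketch below does this on a made-up list of GPU events. `summarize_kernels` is a hypothetical helper, not a TraceLens method, and it assumes each event is a dict with a `name` and a `dur` in microseconds, matching the fields printed above.

```python
from collections import defaultdict
from typing import Any, Dict, List

Event = Dict[str, Any]


def summarize_kernels(gpu_events: List[Event]) -> Dict[str, Dict[str, float]]:
    """Group GPU events by kernel name, counting launches and summing durations."""
    totals: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"count": 0, "total_dur_us": 0.0}
    )
    for evt in gpu_events:
        entry = totals[evt.get("name", "<unnamed>")]
        entry["count"] += 1
        entry["total_dur_us"] += evt.get("dur", 0)
    return dict(totals)


# Toy GPU events (names and durations are invented for illustration).
toy_gpu_events = [
    {"name": "conv_kernel_a", "dur": 120.0},
    {"name": "conv_kernel_a", "dur": 80.0},
    {"name": "elementwise_add", "dur": 15.5},
]

summary = summarize_kernels(toy_gpu_events)
# Print kernels from most to least total time.
for name, stats in sorted(
    summary.items(), key=lambda kv: kv[1]["total_dur_us"], reverse=True
):
    print(f"{name}: {stats['count']} launches, {stats['total_dur_us']:.2f} µs")
```

With a real trace, the list returned by `tree.get_gpu_events(event_interest)` could be passed in place of `toy_gpu_events`.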

## Summary

This notebook demonstrated:
- Loading traces via `TreePerfAnalyzer.from_file()` with `add_python_func=True` to include the Python call stack
- Using `traverse_subtree_and_print()` and `traverse_parents_and_print()` to explore call hierarchies
- Using the `cpu_op_fields` parameter to display input dimensions and types for CPU operations
- Using tree methods: `get_parent_event()`, `get_children_events()`, `get_gpu_events()`

For more advanced analysis, see:
- `tree_perf_example.ipynb` - Performance metrics and roofline analysis
- `trace_diff_example.ipynb` - Comparing two traces