# Query Language: Thicket Tutorial

Thicket is a Python-based toolkit for Exploratory Data Analysis (EDA) of parallel performance data. It supports performance optimization and helps users understand application performance on supercomputers. Thicket bridges the gap between tools that consider only a single instance of a simulation run (e.g., single platform, single measurement tool, or single scale) and the need to find actionable insights in multi-dimensional, multi-scale, multi-architecture, and multi-tool performance datasets.

## 1. Import Necessary Packages

To explore the structure and capabilities of Thicket's components, we begin by importing the necessary packages. These include general-purpose Python packages (NumPy, pandas, Matplotlib), along with Hatchet and Thicket.

In [1]:
import re

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display
from IPython.display import HTML
import hatchet as ht

import thicket as tt

display(HTML("<style>.container { width:80% !important; }</style>"))


In [2]:
# Silence pandas 3 and NumPy deprecation and future warnings for now
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=FutureWarning) 

## 2. Read in Performance Profiles

For this notebook, we select profiles generated on lassen, a machine at Lawrence Livermore National Laboratory (LLNL). We create a thicket object from four profiles, one per problem size, all generated with the same block size of 128.

In [3]:
problem_sizes = [
    "1048576", 
    "2097152", 
    "4194304", 
    "8388608"
]
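# four profiles with block size 128, one per problem size
lassen1 = [f"../data/lassen/clang10.0.1_nvcc10.2.89_{x}/1/Base_CUDA-block_128.cali" for x in problem_sizes]
# a single profile with block size 256 (no placeholders, so no f-prefix is needed)
lassen2 = ["../data/lassen/clang10.0.1_nvcc10.2.89_1048576/1/Base_CUDA-block_256.cali"]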

# generate thicket(s)
th_lassen = tt.Thicket.from_caliperreader(lassen1, disable_tqdm=True)
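The four problem sizes listed above are consecutive powers of two (2^20 through 2^23). As a minimal sketch, assuming the same directory layout as in the cell above, the paths can be derived programmatically and checked before they are handed to the reader. The names `data_root`, `profile_paths`, and `missing` are illustrative and not part of Thicket.

```python
from pathlib import Path

# Problem sizes 2**20 .. 2**23, as strings to match the directory names.
problem_sizes = [str(2**k) for k in range(20, 24)]

# Assumed layout, copied from the paths used in this notebook.
data_root = Path("../data/lassen")
profile_paths = [
    data_root / f"clang10.0.1_nvcc10.2.89_{size}" / "1" / "Base_CUDA-block_128.cali"
    for size in problem_sizes
]

# Report files that are not present instead of failing inside the reader.
missing = [p for p in profile_paths if not p.exists()]
print(f"{len(profile_paths) - len(missing)} of {len(profile_paths)} profiles found")
```

Checking for missing files first tends to give a clearer error than a failure partway through loading.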

## 3. Thicket Query Language 

**Use the Query Language**

Thicket's query language lets users select, or `query`, specific nodes of the call tree. The performance data and statistics tables are updated to match, so they contain only the nodes that remain in the call tree.
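To make the idea concrete, here is a toy model in plain Python. It is not Thicket's implementation: the call tree is a dictionary of children, the "table" is a dictionary of per-node values (copied from the call tree printed in this tutorial), and the helper `subtree` is hypothetical. It only illustrates that once a set of nodes is selected, the per-node data is restricted to those nodes.

```python
# Toy call tree: each node name maps to the names of its children.
children = {
    "RAJAPerf": ["Algorithm", "Apps"],
    "Algorithm": ["Algorithm_MEMCPY", "Algorithm_MEMSET", "Algorithm_REDUCE_SUM"],
    "Apps": ["Apps_FIR"],
}

# Toy per-node table of total times (illustrative subset of the real tree).
total_time = {
    "RAJAPerf": 1.781,
    "Algorithm": 0.007,
    "Algorithm_MEMCPY": 0.002,
    "Algorithm_MEMSET": 0.002,
    "Algorithm_REDUCE_SUM": 0.003,
    "Apps": 0.185,
    "Apps_FIR": 0.004,
}


def subtree(root):
    """Return root and all of its descendants in depth-first order."""
    nodes = [root]
    for child in children.get(root, []):
        nodes.extend(subtree(child))
    return nodes


# "Query" the tree, then restrict the table to the nodes that remain.
remaining = subtree("Algorithm")
filtered_time = {name: total_time[name] for name in remaining}
print(remaining)
print(filtered_time)
```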

In [4]:
print("Initial call tree:")
print(th_lassen.tree("Total time"))

Initial call tree:
  _____ _     _      _        _   
 |_   _| |__ (_) ___| | _____| |_ 
   | | | '_ \| |/ __| |/ / _ \ __|
   | | | | | | | (__|   <  __/ |_ 
   |_| |_| |_|_|\___|_|\_\___|\__|  v2024.1.0

1.781 RAJAPerf
├─ 0.007 Algorithm
│  ├─ 0.002 Algorithm_MEMCPY
│  ├─ 0.002 Algorithm_MEMSET
│  └─ 0.003 Algorithm_REDUCE_SUM
├─ 0.185 Apps
│  ├─ 0.007 Apps_DEL_DOT_VEC_2D
│  ├─ 0.039 Apps_ENERGY
│  ├─ 0.004 Apps_FIR
│  ├─ 0.035 Apps_HALOEXCHANGE
│  ├─ 0.005 Apps_HALOEXCHANGE_FUSED
│  ├─ 0.014 Apps_LTIMES
│  ├─ 0.014 Apps_LTIMES_NOVIEW
│  ├─ 0.008 Apps_NODAL_ACCUMULATION_3D
│  ├─ 0.048 Apps_PRESSURE
│  ├─ 0.006 Apps_VOL3D
│  └─ 0.004 Apps_ZONAL_ACCUMULATION_3D
├─ 0.358

### Example Query 1: Find a Subgraph with a Specific Root

This example shows how to find a subtree starting with a specific root. More specifically, the query in this example finds a subtree rooted at the node with the name "Stream" followed by all nodes down to the leaf nodes.

NOTE: A DeprecationWarning is generated when using “old-style” queries (i.e., queries with QueryMatcher) if you have Hatchet>=2023.1.0 installed.

In [5]:
query_ex1 = (
    ht.QueryMatcher()
    .match(
        ".", 
        lambda row: row["name"].apply(
            lambda x: re.match(
                "Stream", x
            )
            is not None
        ).all()
    )
    .rel("*")
)

# applying the first query on the lassen thicket
th_ex1 = th_lassen.query(query_ex1)
print(th_ex1.tree("Total time"))

  _____ _     _      _        _   
 |_   _| |__ (_) ___| | _____| |_ 
   | | | '_ \| |/ __| |/ / _ \ __|
   | | | | | | | (__|   <  __/ |_ 
   |_| |_| |_|_|\___|_|\_\___|\__|  v2024.1.0

0.261 Stream
├─ 0.034 Stream_ADD
├─ 0.043 Stream_COPY
├─ 0.108 Stream_DOT
├─ 0.043 Stream_MUL
└─ 0.034 Stream_TRIAD

Legend (Metric: Total time Min: 0.03 Max: 0.26 indices: {'profile': 1814734126})
█ 0.24 - 0.26
█ 0.19 - 0.24
█ 0.15 - 0.19
█ 0.10 - 0.15
█ 0.06 - 0.10
█ 0.03 - 0.06

name User code    ◀ Only in left graph    ▶ Only in right graph
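The predicate in the query above relies on `re.match`, which anchors the pattern only at the start of the string. A node named "Stream_ADD" therefore also satisfies a predicate written for "Stream". The short sketch below, using only the standard library and a hand-picked list of node names from this tutorial, contrasts `re.match` with `re.fullmatch` for cases where an exact name is wanted.

```python
import re

# A few node names taken from the call trees in this tutorial.
names = ["Stream", "Stream_ADD", "Stream_MUL", "Basic_DAXPY"]

# re.match succeeds if the pattern matches at the START of the name.
prefix_hits = [n for n in names if re.match("Stream", n) is not None]

# re.fullmatch succeeds only if the pattern matches the WHOLE name.
exact_hits = [n for n in names if re.fullmatch("Stream", n) is not None]

print(prefix_hits)  # ['Stream', 'Stream_ADD', 'Stream_MUL']
print(exact_hits)   # ['Stream']
```

Which of the two is appropriate depends on whether the query should target a family of nodes sharing a prefix or a single named node.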



### Example Query 2: Find All Paths Ending with a Specific Node

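This example shows how to find all paths in the call tree that end with a specific node. More specifically, the query in this example finds paths ending with a node whose name begins with "Stream". Because `re.match` anchors only at the start of the string, names such as "Stream_ADD" also match, which is why the children of "Stream" appear in the result.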

In [6]:
query_ex2 = (
    ht.QueryMatcher()
    .match("*")
    .rel(
        ".",
        lambda row: row["name"].apply(
            lambda x: re.match(
                "Stream", x
            )
            is not None
        ).all()
    )
)

# applying the second query on the lassen thicket
th_ex2 = th_lassen.query(query_ex2)
print(th_ex2.tree("Total time"))

  _____ _     _      _        _   
 |_   _| |__ (_) ___| | _____| |_ 
   | | | '_ \| |/ __| |/ / _ \ __|
   | | | | | | | (__|   <  __/ |_ 
   |_| |_| |_|_|\___|_|\_\___|\__|  v2024.1.0

1.781 RAJAPerf
└─ 0.261 Stream
   ├─ 0.034 Stream_ADD
   ├─ 0.043 Stream_COPY
   ├─ 0.108 Stream_DOT
   ├─ 0.043 Stream_MUL
   └─ 0.034 Stream_TRIAD

Legend (Metric: Total time Min: 0.03 Max: 1.78 indices: {'profile': 1814734126})
█ 1.61 - 1.78
█ 1.26 - 1.61
█ 0.91 - 1.26
█ 0.56 - 0.91
█ 0.21 - 0.56
█ 0.03 - 0.21

name User code    ◀ Only in left graph    ▶ Only in right graph



### Example Query 3: Find All Paths with Specific Starting and Ending Nodes

This example shows how to find all call paths starting with and ending with specific nodes. More specifically, the query in this example finds paths starting with a node named "Stream" and ending with a node named "Stream_MUL".

In [7]:
query_ex3 = (
    ht.QueryMatcher()
    .match(
        ".",
        lambda row: row["name"].apply(
            lambda x: re.match(
                "Stream", x
            )
            is not None
        ).all()
    )
    .rel("*")
    .rel(
        ".",
        lambda row: row["name"].apply(
            lambda x: re.match(
                "Stream_MUL", x
            )
            is not None
        ).all()
    )
)

# applying the third query on the lassen thicket
th_ex3 = th_lassen.query(query_ex3)
print(th_ex3.tree("Total time"))

  _____ _     _      _        _   
 |_   _| |__ (_) ___| | _____| |_ 
   | | | '_ \| |/ __| |/ / _ \ __|
   | | | | | | | (__|   <  __/ |_ 
   |_| |_| |_|_|\___|_|\_\___|\__|  v2024.1.0

0.261 Stream
└─ 0.043 Stream_MUL

Legend (Metric: Total time Min: 0.04 Max: 0.26 indices: {'profile': 1814734126})
█ 0.24 - 0.26
█ 0.20 - 0.24
█ 0.15 - 0.20
█ 0.11 - 0.15
█ 0.06 - 0.11
█ 0.04 - 0.06

name User code    ◀ Only in left graph    ▶ Only in right graph



### Example Query 4: Find All Nodes for a Particular Software Library

This example shows how to find all call paths that belong to a specific software library. It is a variant of Example Query 1, which finds a subtree with a given root. The query below could be adapted, for example, to find the nodes for a subset of the MPI library. Here, we look for subtrees rooted at Polybench_2MM, Basic_DAXPY, and Apps_ENERGY.

In [8]:
api_entrypoints = [
    "Polybench_2MM",
    "Basic_DAXPY",
    "Apps_ENERGY",
]

query_ex4 = (
    ht.QueryMatcher()
    .match(
        ".",
        lambda row: row["name"].apply(
            lambda x: x in api_entrypoints
        ).all()
    )
    .rel("*")
)

# applying the fourth query on the lassen thicket
th_ex4 = th_lassen.query(query_ex4)
print(th_ex4.tree("Total time"))

  _____ _     _      _        _   
 |_   _| |__ (_) ___| | _____| |_ 
   | | | '_ \| |/ __| |/ / _ \ __|
   | | | | | | | (__|   <  __/ |_ 
   |_| |_| |_|_|\___|_|\_\___|\__|  v2024.1.0

0.039 Apps_ENERGY
0.017 Basic_DAXPY
0.006 Polybench_2MM

Legend (Metric: Total time Min: 0.01 Max: 0.04 indices: {'profile': 1814734126})
█ 0.04 - 0.04
█ 0.03 - 0.04
█ 0.02 - 0.03
█ 0.02 - 0.02
█ 0.01 - 0.02
█ 0.01 - 0.01

name User code    ◀ Only in left graph    ▶ Only in right graph
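Every predicate in these queries ends with `.all()`. One plausible reading, stated here as an assumption rather than as documented behavior, is that the predicate receives one `name` entry per profile for a given node (this thicket holds four profiles), and the node is kept only if the condition holds in every profile. The sketch below mimics that reduction in plain Python, with a list standing in for the pandas Series; the helper `node_matches` is hypothetical.

```python
# Entry points from Example Query 4; a set makes membership tests cheap.
api_entrypoints = {"Polybench_2MM", "Basic_DAXPY", "Apps_ENERGY"}


def node_matches(names_across_profiles):
    """Keep a node only if its name is an entry point in every profile."""
    return all(name in api_entrypoints for name in names_across_profiles)


# One name per profile; four profiles are loaded in this tutorial.
print(node_matches(["Apps_ENERGY"] * 4))  # True
print(node_matches(["Apps_FIR"] * 4))     # False
```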



### Example Query 5: Find All Paths through a Specific Node

This example shows how to find all call paths that pass through a specific node. More specifically, the query below finds all paths that pass through a node named "Stream".

In [9]:
query_ex5 = (
    ht.QueryMatcher()
    .match("*")
    .rel(
        ".",
        lambda row: row["name"].apply(
            lambda x: re.match(
                "Stream", x
            )
            is not None
        ).all()
    )
    .rel("*")
)

# applying the fifth query on the lassen thicket
th_ex5 = th_lassen.query(query_ex5)
print(th_ex5.tree("Total time"))

  _____ _     _      _        _   
 |_   _| |__ (_) ___| | _____| |_ 
   | | | '_ \| |/ __| |/ / _ \ __|
   | | | | | | | (__|   <  __/ |_ 
   |_| |_| |_|_|\___|_|\_\___|\__|  v2024.1.0

1.781 RAJAPerf
└─ 0.261 Stream
   ├─ 0.034 Stream_ADD
   ├─ 0.043 Stream_COPY
   ├─ 0.108 Stream_DOT
   ├─ 0.043 Stream_MUL
   └─ 0.034 Stream_TRIAD

Legend (Metric: Total time Min: 0.03 Max: 1.78 indices: {'profile': 1814734126})
█ 1.61 - 1.78
█ 1.26 - 1.61
█ 0.91 - 1.26
█ 0.56 - 0.91
█ 0.21 - 0.56
█ 0.03 - 0.21

name User code    ◀ Only in left graph    ▶ Only in right graph
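The queries in Examples 1, 2, 3, and 5 differ only in how they combine the quantifiers `"."` (exactly one node) and `"*"` (zero or more nodes). As a rough mental model, and not Hatchet's actual matching algorithm, the sketch below matches such patterns against a whole path given as a list of node names. The functions `matches`, `any_node`, and `starts_with` are hypothetical helpers written for this illustration.

```python
import re


def matches(pattern, path):
    """Return True if the entire path satisfies the pattern.

    pattern is a list of (quantifier, predicate) pairs, where "." consumes
    exactly one node and "*" consumes zero or more nodes.
    """
    if not pattern:
        return not path
    (quantifier, predicate), rest = pattern[0], pattern[1:]
    if quantifier == ".":
        return bool(path) and predicate(path[0]) and matches(rest, path[1:])
    if quantifier == "*":
        # Either consume nothing, or consume one node and stay on "*".
        return matches(rest, path) or (
            bool(path) and predicate(path[0]) and matches(pattern, path[1:])
        )
    raise ValueError(f"unsupported quantifier: {quantifier!r}")


def any_node(name):
    return True


def starts_with(prefix):
    return lambda name: re.match(prefix, name) is not None


stream = (".", starts_with("Stream"))
wildcard = ("*", any_node)

ex1 = [stream, wildcard]                                   # subtree rooted at Stream
ex2 = [wildcard, stream]                                   # paths ending at Stream*
ex3 = [stream, wildcard, (".", starts_with("Stream_MUL"))] # Stream ... Stream_MUL
ex5 = [wildcard, stream, wildcard]                         # paths through Stream

print(matches(ex1, ["Stream", "Stream_ADD"]))              # True
print(matches(ex2, ["RAJAPerf", "Stream", "Stream_ADD"]))  # True
print(matches(ex3, ["Stream", "Stream_ADD"]))              # False
print(matches(ex5, ["RAJAPerf", "Apps", "Apps_FIR"]))      # False
```

The second result is `True` because "Stream_ADD" also begins with "Stream", which mirrors the output of Example Query 2.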

