# Query Language: Thicket Tutorial

Thicket is a Python-based toolkit for Exploratory Data Analysis (EDA) of parallel performance data. It supports performance optimization and helps users understand how applications perform on supercomputers. It bridges the gap between tools that consider only a single instance of a simulation run (e.g., single platform, single measurement tool, or single scale) and the need to find actionable insights in multi-dimensional, multi-scale, multi-architecture, and multi-tool performance datasets.

#### NOTE: An interactive version of this notebook is available in the Binder environment.

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/llnl/thicket-tutorial/develop)

***

## 1. Import Necessary Packages

To explore the structure and capabilities of Thicket's components, we begin by importing the necessary packages. These include Python extensions and Thicket's statistical functions.

In [1]:
import re

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display
from IPython.display import HTML
import hatchet as ht

import thicket as tt

display(HTML("<style>.container { width:80% !important; }</style>"))


## 2. Read in Performance Profiles

For this notebook, we select profiles generated on lassen, a Lawrence Livermore National Laboratory (LLNL) machine. We create two thicket objects: one from profiles that all share a problem size of 1048576, and one from profiles with different problem sizes (1048576 and 4194304).

In [2]:
lassen1 = [f"../data/lassen/XL_BaseCuda_01048576_0{x}.cali" for x in range(1, 4)]
lassen2 = ["../data/lassen/XL_BaseCuda_04194304_01.cali"]

# generate thicket(s)
th_lassen = tt.Thicket.from_caliperreader(lassen1)
th_obj = tt.Thicket.from_caliperreader(lassen1+lassen2)
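
The list comprehension above expands to three file names, because `range(1, 4)` yields 1, 2, and 3. The following minimal sketch only builds the path strings and prints them, so it does not need the data files or Thicket to be present; it is meant as a quick sanity check before calling the reader.

```python
# Build the same path lists as in the cell above and inspect them.
lassen1 = [f"../data/lassen/XL_BaseCuda_01048576_0{x}.cali" for x in range(1, 4)]
lassen2 = ["../data/lassen/XL_BaseCuda_04194304_01.cali"]

for path in lassen1 + lassen2:
    print(path)

# th_lassen is built from the 3 same-size profiles, th_obj from all 4.
print(len(lassen1), len(lassen1 + lassen2))
```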

## 3. More Information on a Function
***
You can use Python's built-in `help()` function to see the documentation for a given object by typing `help(object)`. 
The output lists the function's arguments and describes what it does. An example is below.

In [3]:
help(tt.median)

Help on function median in module thicket.stats.median:

median(thicket, columns=None)
    Calculate the median for each node in the performance data table.
    
    Designed to take in a thicket, and append one or more columns to the
    aggregated statistics table for the median calculation for each node.
    
    Arguments:
        thicket (thicket): Thicket object
        columns (list): List of hardware/timing metrics to perform median calculation
            on. Note, if using a columnar joined thicket a list of tuples must be passed
            in with the format (column index, column name).



## 4. Append Statistical Calculation(s)
***
In order to attach a metric to the call tree of the thicket object, we first perform a relevant statistical calculation on the performance data and append the values relating to the nodes onto the aggregated statistics table. In the example below, we append the median of each node.

In [4]:
metrics = ["Total time (exc)"]
tt.median(th_lassen, columns=metrics)
th_lassen.statsframe.dataframe

| node | name | Total time (exc)_median |
|---|---|---|
| {'name': 'Base_CUDA', 'type': 'function'} | Base_CUDA | 0.000636 |
| {'name': 'Algorithm', 'type': 'function'} | Algorithm | 0.000048 |
| {'name': 'Algorithm_MEMCPY', 'type': 'function'} | Algorithm_MEMCPY | 0.000016 |
| {'name': 'Algorithm_MEMCPY.block_128', 'type': 'function'} | Algorithm_MEMCPY.block_128 | 0.002440 |
| {'name': 'Algorithm_MEMCPY.library', 'type': 'function'} | Algorithm_MEMCPY.library | 0.002609 |
| ... | ... | ... |
| {'name': 'Stream_DOT.block_128', 'type': 'function'} | Stream_DOT.block_128 | 0.113655 |
| {'name': 'Stream_MUL', 'type': 'function'} | Stream_MUL | 0.000011 |
| {'name': 'Stream_MUL.block_128', 'type': 'function'} | Stream_MUL.block_128 | 0.043271 |
| {'name': 'Stream_TRIAD', 'type': 'function'} | Stream_TRIAD | 0.000008 |
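
Conceptually, a per-node median groups the performance rows by call tree node and takes the median of the metric across profiles. The sketch below illustrates that idea with the standard library only. The rows are made-up values, not the lassen measurements, and `median_per_node` is a hypothetical helper, not part of Thicket.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (node name, profile id, metric value) rows: three profiles per node.
rows = [
    ("Stream_MUL", 1, 0.010),
    ("Stream_MUL", 2, 0.012),
    ("Stream_MUL", 3, 0.011),
    ("Stream_DOT", 1, 0.110),
    ("Stream_DOT", 2, 0.120),
    ("Stream_DOT", 3, 0.115),
]


def median_per_node(rows):
    """Group metric values by node and return {node: median across profiles}."""
    grouped = defaultdict(list)
    for node, _profile, value in rows:
        grouped[node].append(value)
    return {node: median(values) for node, values in grouped.items()}


print(median_per_node(rows))
```

With an odd number of profiles per node, the median is simply the middle value once the values are sorted.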


## 5. Thicket Query Language
***

#### Use the Query Language

Thicket's query language lets users select, or `query`, specific nodes based on the structure of the thicket's call tree. The performance data is then updated to match the resulting call tree as part of the operation.
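
The queries in the examples that follow are built from steps that pair a quantifier with an optional predicate: `"."` stands for exactly one node and `"*"` for zero or more nodes. The toy matcher below is a simplified illustration of that idea, not Hatchet's or Thicket's implementation. It checks one complete path (a list of node names) against a pattern, whereas the real query engine searches for matching paths anywhere in the call tree. The names `path_matches`, `any_node`, and `starts_with_stream` are hypothetical.

```python
def path_matches(path, pattern):
    """Return True if the whole path (list of names) satisfies the pattern.

    pattern is a list of (quantifier, predicate) pairs, where the
    quantifier is "." (exactly one node) or "*" (zero or more nodes).
    """
    if not pattern:
        return not path  # pattern exhausted: match only if the path is too
    quantifier, predicate = pattern[0]
    rest = pattern[1:]
    if quantifier == ".":
        return bool(path) and predicate(path[0]) and path_matches(path[1:], rest)
    if quantifier == "*":
        # Either consume no node here, or consume one matching node and retry.
        if path_matches(path, rest):
            return True
        return bool(path) and predicate(path[0]) and path_matches(path[1:], pattern)
    raise ValueError(f"unsupported quantifier: {quantifier!r}")


def any_node(name):
    return True


def starts_with_stream(name):
    return name.startswith("Stream")


# A subgraph rooted at a "Stream" node, and a path ending at one.
rooted_at_stream = [(".", starts_with_stream), ("*", any_node)]
ending_at_stream = [("*", any_node), (".", starts_with_stream)]

print(path_matches(["Stream", "Stream_MUL", "Stream_MUL.block_128"], rooted_at_stream))
print(path_matches(["Base_CUDA", "Stream"], rooted_at_stream))
print(path_matches(["Base_CUDA", "Stream"], ending_at_stream))
```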

**Initial call tree:** 

In [5]:
print(th_lassen.statsframe.tree("Total time (exc)_median"))

    __          __       __         __ 
   / /_  ____ _/ /______/ /_  ___  / /_
  / __ \/ __ `/ __/ ___/ __ \/ _ \/ __/
 / / / / /_/ / /_/ /__/ / / /  __/ /_  
/_/ /_/\__,_/\__/\___/_/ /_/\___/\__/  v2023.1.0

[38;5;22m0.001[0m Base_CUDA[0m
├─ [38;5;22m0.000[0m Algorithm[0m
│  ├─ [38;5;22m0.000[0m Algorithm_MEMCPY[0m
│  │  ├─ [38;5;22m0.002[0m Algorithm_MEMCPY.block_128[0m
│  │  └─ [38;5;22m0.003[0m Algorithm_MEMCPY.library[0m
│  ├─ [38;5;22m0.000[0m Algorithm_MEMSET[0m
│  │  ├─ [38;5;22m0.001[0m Algorithm_MEMSET.block_128[0m
│  │  └─ [38;5;22m0.001[0m Algorithm_MEMSET.library[0m
│  ├─ [38;5;22m0.000[0m Algorithm_REDUCE_SUM[0m
│  │  ├─ [38;5;22m0.003[0m Algorithm_REDUCE_SUM.block_128[0m
│  │  └─ [38;5;22m0.002[0m Algorithm_REDUCE_SUM.cub[0m
│  └─ [38;5;22m0.000[0m Algorithm_SCAN[0m
│     └─ [38;5;22m0.003[0m Algorithm_SCAN.default[0m
├─ [38;5;22m0.000[0m Apps[0m
│  ├─ [38;5;22m0.000[0m Apps_CONVECTION3DPA[0m
│  │  └─ [38;5;22m0.003[0m Apps

## Ex1: Find a Subgraph with a Specific Root

In [12]:
query_ex1 = (
    ht.QueryMatcher()
    # root of the subgraph: one node whose name starts with "Stream"
    .match(
        ".",
        lambda row: row["name"]
        .apply(lambda x: re.match("Stream", x) is not None)
        .all(),
    )
    # followed by zero or more nodes of any kind
    .rel("*")
)

# apply the query to the lassen thicket
th_ex1 = th_lassen.query(query_ex1)
tt.median(th_ex1, columns=["Total time (exc)"])
print(th_ex1.statsframe.tree("Total time (exc)_median"))

    __          __       __         __ 
   / /_  ____ _/ /______/ /_  ___  / /_
  / __ \/ __ `/ __/ ___/ __ \/ _ \/ __/
 / / / / /_/ / /_/ /__/ / / /  __/ /_  
/_/ /_/\__,_/\__/\___/_/ /_/\___/\__/  v2023.1.0

[38;5;22m0.000[0m Stream[0m
├─ [38;5;22m0.000[0m Stream_ADD[0m
│  └─ [38;5;34m0.034[0m Stream_ADD.block_128[0m
├─ [38;5;22m0.000[0m Stream_COPY[0m
│  └─ [38;5;46m0.043[0m Stream_COPY.block_128[0m
├─ [38;5;22m0.000[0m Stream_DOT[0m
│  └─ [38;5;196m0.114[0m Stream_DOT.block_128[0m
├─ [38;5;22m0.000[0m Stream_MUL[0m
│  └─ [38;5;46m0.043[0m Stream_MUL.block_128[0m
└─ [38;5;22m0.000[0m Stream_TRIAD[0m
   └─ [38;5;34m0.034[0m Stream_TRIAD.block_128[0m

[4mLegend[0m (Metric: Total time (exc)_median Min: 0.00 Max: 0.11)
[38;5;196m█ [0m0.10 - 0.11
[38;5;208m█ [0m0.08 - 0.10
[38;5;220m█ [0m0.06 - 0.08
[38;5;46m█ [0m0.03 - 0.06
[38;5;34m█ [0m0.01 - 0.03
[38;5;22m█ [0m0.00 - 0.01

name[0m User code    [38;5;160m◀ [0m Only in left graph    [3

## Ex2: Find All Paths Ending with a Specific Node

In [14]:
query_ex2 = (
    ht.QueryMatcher()
    # zero or more leading nodes of any kind
    .match("*")
    # ending node: one node whose name starts with "Stream"
    .rel(
        ".",
        lambda row: row["name"]
        .apply(lambda x: re.match("Stream", x) is not None)
        .all(),
    )
)

# apply the query to the lassen thicket
th_ex2 = th_lassen.query(query_ex2)
tt.median(th_ex2, columns=["Total time (exc)"])
print(th_ex2.statsframe.tree("Total time (exc)_median"))

    __          __       __         __ 
   / /_  ____ _/ /______/ /_  ___  / /_
  / __ \/ __ `/ __/ ___/ __ \/ _ \/ __/
 / / / / /_/ / /_/ /__/ / / /  __/ /_  
/_/ /_/\__,_/\__/\___/_/ /_/\___/\__/  v2023.1.0

[38;5;22m0.001[0m Base_CUDA[0m
└─ [38;5;22m0.000[0m Stream[0m
   ├─ [38;5;22m0.000[0m Stream_ADD[0m
   │  └─ [38;5;34m0.034[0m Stream_ADD.block_128[0m
   ├─ [38;5;22m0.000[0m Stream_COPY[0m
   │  └─ [38;5;46m0.043[0m Stream_COPY.block_128[0m
   ├─ [38;5;22m0.000[0m Stream_DOT[0m
   │  └─ [38;5;196m0.114[0m Stream_DOT.block_128[0m
   ├─ [38;5;22m0.000[0m Stream_MUL[0m
   │  └─ [38;5;46m0.043[0m Stream_MUL.block_128[0m
   └─ [38;5;22m0.000[0m Stream_TRIAD[0m
      └─ [38;5;34m0.034[0m Stream_TRIAD.block_128[0m

[4mLegend[0m (Metric: Total time (exc)_median Min: 0.00 Max: 0.11)
[38;5;196m█ [0m0.10 - 0.11
[38;5;208m█ [0m0.08 - 0.10
[38;5;220m█ [0m0.06 - 0.08
[38;5;46m█ [0m0.03 - 0.06
[38;5;34m█ [0m0.01 - 0.03
[38;5;22m█ [0m0.00 - 0.01

## Ex3: Find All Paths with Specific Starting and Ending Nodes

In [22]:
query_ex3 = (
    ht.QueryMatcher()
    # starting node: name starts with "Stream"
    .match(
        ".",
        lambda row: row["name"]
        .apply(lambda x: re.match("Stream", x) is not None)
        .all(),
    )
    # zero or more intermediate nodes of any kind
    .rel("*")
    # ending node: name starts with "Stream_MUL"
    .rel(
        ".",
        lambda row: row["name"]
        .apply(lambda x: re.match("Stream_MUL", x) is not None)
        .all(),
    )
)

# apply the query to the lassen thicket
th_ex3 = th_lassen.query(query_ex3)
tt.median(th_ex3, columns=["Total time (exc)"])
print(th_ex3.statsframe.tree("Total time (exc)_median"))

    __          __       __         __ 
   / /_  ____ _/ /______/ /_  ___  / /_
  / __ \/ __ `/ __/ ___/ __ \/ _ \/ __/
 / / / / /_/ / /_/ /__/ / / /  __/ /_  
/_/ /_/\__,_/\__/\___/_/ /_/\___/\__/  v2023.1.0

[38;5;22m0.000[0m Stream[0m
└─ [38;5;22m0.000[0m Stream_MUL[0m
   └─ [38;5;196m0.043[0m Stream_MUL.block_128[0m

[4mLegend[0m (Metric: Total time (exc)_median Min: 0.00 Max: 0.04)
[38;5;196m█ [0m0.04 - 0.04
[38;5;208m█ [0m0.03 - 0.04
[38;5;220m█ [0m0.02 - 0.03
[38;5;46m█ [0m0.01 - 0.02
[38;5;34m█ [0m0.00 - 0.01
[38;5;22m█ [0m0.00 - 0.00

name[0m User code    [38;5;160m◀ [0m Only in left graph    [38;5;28m▶ [0m Only in right graph




## Ex4: Find All Nodes for a Particular Software Library

In [48]:
# names of the nodes that serve as entry points into the library of interest
api_entrypoints = [
    "Polybench_2MM",
    "Basic_DAXPY",
    "Apps_ENERGY",
]

query_ex4 = (
    ht.QueryMatcher()
    # one node whose name is one of the entry points
    .match(
        ".",
        lambda row: row["name"].apply(lambda x: x in api_entrypoints).all(),
    )
    # followed by zero or more nodes of any kind
    .rel("*")
)

# apply the query to the lassen thicket
th_ex4 = th_lassen.query(query_ex4)
tt.median(th_ex4, columns=["Total time (exc)"])
print(th_ex4.statsframe.tree("Total time (exc)_median"))

    __          __       __         __ 
   / /_  ____ _/ /______/ /_  ___  / /_
  / __ \/ __ `/ __/ ___/ __ \/ _ \/ __/
 / / / / /_/ / /_/ /__/ / / /  __/ /_  
/_/ /_/\__,_/\__/\___/_/ /_/\___/\__/  v2023.1.0

[38;5;22m0.000[0m Apps_ENERGY[0m
└─ [38;5;196m0.039[0m Apps_ENERGY.block_128[0m
[38;5;22m0.000[0m Basic_DAXPY[0m
└─ [38;5;46m0.017[0m Basic_DAXPY.block_128[0m
[38;5;22m0.000[0m Polybench_2MM[0m
└─ [38;5;34m0.006[0m Polybench_2MM.block_128[0m

[4mLegend[0m (Metric: Total time (exc)_median Min: 0.00 Max: 0.04)
[38;5;196m█ [0m0.04 - 0.04
[38;5;208m█ [0m0.03 - 0.04
[38;5;220m█ [0m0.02 - 0.03
[38;5;46m█ [0m0.01 - 0.02
[38;5;34m█ [0m0.00 - 0.01
[38;5;22m█ [0m0.00 - 0.00

name[0m User code    [38;5;160m◀ [0m Only in left graph    [38;5;28m▶ [0m Only in right graph



## Ex5: Find All Paths through a Specific Node

In [29]:
query_ex5 = (
    ht.QueryMatcher()
    # zero or more leading nodes of any kind
    .match("*")
    # the node the path must pass through: name starts with "Stream"
    .rel(
        ".",
        lambda row: row["name"]
        .apply(lambda x: re.match("Stream", x) is not None)
        .all(),
    )
    # zero or more trailing nodes of any kind
    .rel("*")
)

# apply the query to the lassen thicket
th_ex5 = th_lassen.query(query_ex5)
tt.median(th_ex5, columns=["Total time (exc)"])
print(th_ex5.statsframe.tree("Total time (exc)_median"))

    __          __       __         __ 
   / /_  ____ _/ /______/ /_  ___  / /_
  / __ \/ __ `/ __/ ___/ __ \/ _ \/ __/
 / / / / /_/ / /_/ /__/ / / /  __/ /_  
/_/ /_/\__,_/\__/\___/_/ /_/\___/\__/  v2023.1.0

[38;5;22m0.001[0m Base_CUDA[0m
└─ [38;5;22m0.000[0m Stream[0m
   ├─ [38;5;22m0.000[0m Stream_ADD[0m
   │  └─ [38;5;34m0.034[0m Stream_ADD.block_128[0m
   ├─ [38;5;22m0.000[0m Stream_COPY[0m
   │  └─ [38;5;46m0.043[0m Stream_COPY.block_128[0m
   ├─ [38;5;22m0.000[0m Stream_DOT[0m
   │  └─ [38;5;196m0.114[0m Stream_DOT.block_128[0m
   ├─ [38;5;22m0.000[0m Stream_MUL[0m
   │  └─ [38;5;46m0.043[0m Stream_MUL.block_128[0m
   └─ [38;5;22m0.000[0m Stream_TRIAD[0m
      └─ [38;5;34m0.034[0m Stream_TRIAD.block_128[0m

[4mLegend[0m (Metric: Total time (exc)_median Min: 0.00 Max: 0.11)
[38;5;196m█ [0m0.10 - 0.11
[38;5;208m█ [0m0.08 - 0.10
[38;5;220m█ [0m0.06 - 0.08
[38;5;46m█ [0m0.03 - 0.06
[38;5;34m█ [0m0.01 - 0.03
[38;5;22m█ [0m0.00 - 0.01