# CallFlow interface demo
This notebook demonstrates the interface of CallFlow, a tool to load, manipulate, and visualize call graphs.

CallFlow is structured as three components:

* A Python package, `callflow`, that provides functionality to load and manipulate call graphs.
* A D3-based app for visualization.
* A Python server that supports the visualization client.

In [1]:
import hatchet as ht
import pandas as pd
import os

In [3]:
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:,.2f}'.format

In [4]:
# CallFlow imports
import callflow
from callflow import CallFlow
from callflow.operations import ArgParser

## CallFlow Python package
The `callflow` package exposes four main classes for handling data structures:
* <strong>GraphFrame</strong> - contains Hatchet's GraphFrame along with functionality that CallFlow introduces (e.g., an nxGraph).
* <strong>CallFlow</strong> - interfaces between the client API endpoints and the rest of the functionality.
* <strong>SuperGraph</strong> - handles processing of a single input dataset.
* <strong>EnsembleGraph</strong> - handles processing of an ensemble of datasets.

In addition, it exposes a set of modules that can be useful on their own:
* <strong> algorithms </strong> - Algorithms to compute graph similarity using distance metrics, and dimensionality reduction (DR).
* <strong> layout </strong> - Computes an nxGraph output based on the desired layout (e.g., node-link for the CCT, Sankey for the supergraph, and icicle plot for the module hierarchy).
* <strong> modules </strong> - Exposes the interactions performed in CallFlow (e.g., splitting, hierarchy, histograms, scatterplots, box plots). All of them are exposed as API endpoints that can be queried over sockets.
* <strong> operations </strong> - Filter, group, and union operations on a single graph or an ensemble of graphs.
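To make the union operation concrete, here is a minimal, self-contained sketch of what unioning an ensemble of call graphs can mean: the result keeps every edge that appears in any dataset and records which datasets contain it. The function `union_edges` and the plain edge-list representation are hypothetical illustrations, not the `callflow.operations` API.

```python
from collections import defaultdict


# Hypothetical helper illustrating a graph union; not part of callflow.operations.
def union_edges(graphs):
    """Union the edges of several call graphs.

    `graphs` maps a dataset name to an iterable of (caller, callee) edges.
    Returns a dict mapping each edge to the sorted list of datasets containing it.
    """
    membership = defaultdict(set)
    for dataset, edges in graphs.items():
        for edge in edges:
            membership[edge].add(dataset)
    return {edge: sorted(names) for edge, names in membership.items()}


# Hypothetical two-run ensemble.
ensemble = {
    "run-a": [("main", "solve"), ("solve", "mpi_send")],
    "run-b": [("main", "solve"), ("solve", "mpi_recv")],
}
for edge, datasets in sorted(union_edges(ensemble).items()):
    print(edge, datasets)
```

Edges shared by all runs (here `main -> solve`) list every dataset, while run-specific edges list only the run they came from.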

In [5]:
dir(callflow)

['CallFlow',
 'EnsembleGraph',
 'GraphFrame',
 'SuperGraph',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '_load_ipython_extension',
 'algorithms',
 'callflow',
 'datastructures',
 'get_logger',
 'init_logger',
 'layout',
 'load_ipython_extension',
 'logger',
 'modules',
 'operations',
 'server',
 'timer',
 'utils']

First, we need a config file that specifies the files to load into the interface. We plan to automate this step next so that the config file is no longer needed.

In [6]:
# Single dataset --config mode
config_file = os.path.abspath("../data/hpctoolkit-cpi-databases/callflow.config.json")

In [10]:
# Single dataset --data_dir mode
data_dir = os.path.abspath("../data/caliper-cali")

In [11]:
# Set the profile format
profile_format = "hpctoolkit"

`ArgParser` processes these arguments (either `--data_dir` with `--profile_format`, or `--config` with a config JSON file) and exposes the resulting config object as `.config`.

In [13]:
# Read the config in --data_dir mode.
args = ArgParser("--data_dir " + data_dir + " --profile_format " + profile_format)
# Alternatively, read it in --config mode (uncomment to use):
# args = ArgParser("--config " + config_file)
args.config

FileNotFoundError: [Errno 2] No such file or directory: '/Users/jarus/Work/llnl/CallFlow/data/caliper-cali/.callflow/callflow.config.json'
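The error above shows that, in `--data_dir` mode, `ArgParser` looks for `<data_dir>/.callflow/callflow.config.json`. A small pre-flight check can catch a missing config before calling `ArgParser`. The helper `find_callflow_config` is hypothetical, and the expected path is inferred only from this error message, not from CallFlow's documentation.

```python
import os
import tempfile
from typing import Optional


# Hypothetical helper; the path layout is inferred from the FileNotFoundError above.
def find_callflow_config(data_dir: str) -> Optional[str]:
    """Return the config path that --data_dir mode appears to expect, or None if absent."""
    path = os.path.join(data_dir, ".callflow", "callflow.config.json")
    return path if os.path.isfile(path) else None


# Demonstrate on a throwaway directory instead of the real data_dir.
with tempfile.TemporaryDirectory() as tmp:
    print(find_callflow_config(tmp))  # None: no config yet
    os.makedirs(os.path.join(tmp, ".callflow"))
    open(os.path.join(tmp, ".callflow", "callflow.config.json"), "w").close()
    print(find_callflow_config(tmp) is not None)  # True
```

In the notebook, calling `find_callflow_config(data_dir)` before `ArgParser` would indicate whether to fall back to `--config` mode.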


In this demo, I focus on what the `CallFlow` class exposes. The first step is to create a `CallFlow` object. The `ensemble` parameter determines whether CallFlow loads the single-dataset or the ensemble version.

In [None]:
scf = CallFlow(config=args.config, ensemble=False)

In [None]:
print(args.config)

## Processing datasets
The processing step creates a `.callflow` directory that contains all the processed information. The `.callflow` directory is placed in the `save_path` specified in the `config` file.

* .callflow
    * dataset1
        * auxiliary_data.json 
        * df.csv (contains dataframe)
        * nxg.json (contains nxGraph)
    * ... (one directory per dataset)
    * ensemble
        * auxiliary_data.json
        * df.csv (contains dataframe)
        * nxg.json (contains nxGraph)
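Based on the layout above, the following minimal sketch lists which datasets have been fully processed, treating a dataset as processed when its subdirectory contains all three files. The helper `processed_datasets` is hypothetical; CallFlow may perform its own checks differently.

```python
import os

# File names taken from the .callflow layout above.
EXPECTED_FILES = ("auxiliary_data.json", "df.csv", "nxg.json")


# Hypothetical helper; not part of the callflow package.
def processed_datasets(save_path):
    """List subdirectories of <save_path>/.callflow that contain all expected files."""
    root = os.path.join(save_path, ".callflow")
    if not os.path.isdir(root):
        return []
    return sorted(
        name
        for name in os.listdir(root)
        if all(os.path.isfile(os.path.join(root, name, f)) for f in EXPECTED_FILES)
    )
```

For a `.callflow` directory containing only the `dataset1` and `ensemble` entries shown above, it returns `['dataset1', 'ensemble']`; a subdirectory missing any of the three files is skipped.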
    

In [None]:
scf.process()

## Loading the supergraphs
If preprocessing has already been done, the supergraphs can be loaded directly from the `.callflow` directory.
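A minimal sketch of skipping the processing step when it has already run: it mirrors this notebook's sequence (`process()`, then `load()`) but calls `process()` only when no `.callflow` directory exists under the save path. The helper `process_then_load` is hypothetical and only assumes an object with `process()` and `load()` methods, as the `CallFlow` object here has; a stub stands in for it so the snippet runs on its own.

```python
import os


# Hypothetical helper; not part of the callflow package.
def process_then_load(cf, save_path):
    """Run cf.process() only if <save_path>/.callflow is missing, then cf.load().

    Checking only for the directory is a simplification: a partially
    processed .callflow directory would be loaded without reprocessing.
    """
    if not os.path.isdir(os.path.join(save_path, ".callflow")):
        cf.process()
    cf.load()


class StubCallFlow:
    """Stand-in for a CallFlow object that records which methods were called."""

    def __init__(self):
        self.calls = []

    def process(self):
        self.calls.append("process")

    def load(self):
        self.calls.append("load")
```

With a real object, this would be called as `process_then_load(scf, save_path)`, where `save_path` is the value from the config file.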

In [None]:
scf.load()

In [None]:
scf.supergraphs

Internally, each `SuperGraph` contains a Hatchet GraphFrame, available as its `gf` attribute.

In [None]:
dir(scf.supergraphs['hpctoolkit-cpi-database-base'].gf)

## Socket requests
The socket endpoints are exposed through the `request_single` and `request_ensemble` methods. Both take an input object that specifies the action to perform on the data:

```
{
    "name": String // action to perform, required.
    "dataset": String or Array // dataset(s) on which to perform the action.
    ...other attributes // each action requires its own set of parameters.
}
```
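A small helper for building request objects in the shape described above. `make_request` is a hypothetical convenience function, not part of CallFlow; it checks only the `name` field that the schema marks as required and passes action-specific parameters (such as `functionsInCCT` or `groupBy`, used in later cells) through unchanged.

```python
# Hypothetical convenience function; not part of the callflow package.
def make_request(name, dataset, **params):
    """Build a request dict: required action name, target dataset(s), extra parameters."""
    if not isinstance(name, str) or not name:
        raise ValueError("'name' (the action to perform) is required")
    request = {"name": name, "dataset": dataset}
    request.update(params)  # action-specific parameters
    return request


payload = make_request("cct", "hpctoolkit-cpi-database-base", functionsInCCT=50)
print(payload)
```

The resulting dict is equal to the `payload` defined in the next cell.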

In [None]:
payload = {"name": "cct", "dataset": "hpctoolkit-cpi-database-base", "functionsInCCT": 50}

### CCT

In [None]:
scct = scf.request_single(payload)
print(f"Nodes (count = {len(scct.nodes())}) are: {scct.nodes(data=True)}")
print("\n")
print(f"Edges (count = {len(scct.edges())}) are: {scct.edges(data=True)}")
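The `functionsInCCT` parameter limits how many functions appear in the CCT. One plausible way to implement such a limit is to keep the N callsites with the highest inclusive time. Whether CallFlow ranks callsites by this metric is an assumption, and `top_n_callsites` and the sample timings are hypothetical.

```python
import heapq


# Hypothetical illustration of a top-N filter; CallFlow's actual criterion may differ.
def top_n_callsites(inclusive_time, n):
    """Return the n callsite names with the largest inclusive time, highest first.

    `inclusive_time` maps callsite name -> inclusive time (hypothetical input format).
    """
    return heapq.nlargest(n, inclusive_time, key=inclusive_time.get)


times = {"main": 100.0, "solve": 80.0, "mpi_send": 12.5, "io_write": 3.0}
print(top_n_callsites(times, 2))  # ['main', 'solve']
```

`heapq.nlargest` avoids sorting the whole mapping when N is much smaller than the number of callsites.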

### Auxiliary information

Auxiliary information contains per-callsite and per-module data. Having it on the client makes it possible to perform CallFlow interactions in place, rather than querying the server frequently.

PS: This can lead to very large JSON files for large HPCToolkit datasets. To avoid this, I have implemented a faster lookup/fetch that uses HDF5 for per-callsite and per-module storage, but this feature is not yet part of master.
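The HDF5-based storage is not shown here. As a stand-in illustration of the same idea (storing auxiliary data per callsite so that one entry can be fetched without parsing a single large JSON), the sketch below swaps in Python's built-in `sqlite3` in place of HDF5. The input shape (a dict mapping each callsite name to a JSON-serializable record), the metric name, and the helper names are hypothetical.

```python
import json
import sqlite3


# Hypothetical helpers; sqlite3 is swapped in here for the HDF5 storage described above.
def store_per_callsite(conn, auxiliary):
    """Store each callsite's record as its own row, keyed by callsite name."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS callsite (name TEXT PRIMARY KEY, data TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO callsite (name, data) VALUES (?, ?)",
        ((name, json.dumps(record)) for name, record in auxiliary.items()),
    )
    conn.commit()


def fetch_callsite(conn, name):
    """Fetch and decode one callsite's record, or None if it is absent."""
    row = conn.execute("SELECT data FROM callsite WHERE name = ?", (name,)).fetchone()
    return json.loads(row[0]) if row else None


conn = sqlite3.connect(":memory:")
store_per_callsite(conn, {"main": {"time (inc)": 100.0}, "solve": {"time (inc)": 80.0}})
print(fetch_callsite(conn, "solve"))  # {'time (inc)': 80.0}
```

Only the requested row is decoded on each fetch, which is the property that makes per-callsite storage cheaper than loading the whole auxiliary JSON.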

In [None]:
auxiliary = scf.request_single({"name": "auxiliary", "dataset": "hpctoolkit-cpi-database-base"})
print(auxiliary)

### SuperGraph 

In [None]:
ssg = scf.request_single({"name": "supergraph", "groupBy": "module", "dataset":"hpctoolkit-cpi-database-base"})

In [None]:
print(f"Nodes (count = {len(ssg.nodes())}) are: {ssg.nodes(data=False)}")
print("\n")
print(f"Edges (count = {len(ssg.edges())}) are: {ssg.edges(data=False)}")
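Requesting the supergraph with `"groupBy": "module"` yields a graph whose nodes are modules rather than individual callsites. The sketch below illustrates the general idea on a plain edge list: each callsite edge is mapped to a module-level edge, and the weights of edges that collapse onto the same module pair are summed. The callsite-to-module mapping, the weights, and `group_edges_by_module` are hypothetical, and CallFlow's own grouping (including how it treats edges within a single module) may differ.

```python
from collections import defaultdict


# Hypothetical illustration of module-level grouping; not CallFlow's implementation.
def group_edges_by_module(edges, module_of):
    """Collapse weighted callsite edges into module-level edges.

    edges: iterable of (caller, callee, weight) tuples.
    module_of: dict mapping callsite name -> module name.
    Edges whose endpoints fall in the same module are dropped (a choice made
    for this sketch only).
    """
    grouped = defaultdict(float)
    for caller, callee, weight in edges:
        src, dst = module_of[caller], module_of[callee]
        if src != dst:
            grouped[(src, dst)] += weight
    return dict(grouped)


module_of = {"main": "app", "solve": "app", "MPI_Send": "libmpi", "MPI_Recv": "libmpi"}
edges = [("main", "solve", 90.0), ("solve", "MPI_Send", 10.0), ("solve", "MPI_Recv", 5.0)]
print(group_edges_by_module(edges, module_of))  # {('app', 'libmpi'): 15.0}
```

Summing weights keeps the total flow between modules intact, which is what a Sankey-style supergraph layout visualizes.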