# CallFlow interface demo
This notebook demonstrates the `callflow` Python package, which CallFlow exposes to load and manipulate callgraphs.

CallFlow is structured as three components:

* A Python package callflow that provides functionality to load and manipulate callgraphs.
* A D3-based app for visualization.
* A Python server to support the visualization client.

In [1]:
import hatchet as ht
import pandas as pd
import os

In [2]:
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:,.2f}'.format

In [3]:
# CallFlow imports
import callflow
from callflow import CallFlow
from callflow.operations import ConfigFileReader

## CallFlow Python package
In particular, CallFlow exposes four main classes to handle data structures.
* <strong>GraphFrame</strong> - contains Hatchet's GraphFrame along with some functionality that CallFlow introduces (e.g., nxGraph).
* <strong>CallFlow</strong> - interfaces between the client API endpoints and the other functionality.
* <strong>SuperGraph</strong> - handles processing of a single input dataset.
* <strong>EnsembleGraph</strong> - handles processing of an ensemble of datasets.

In addition, it exposes a set of modules whose functionality could be useful.
* <strong>algorithms</strong> - Algorithms to compute graph similarity using distance metrics, and DR calculation.
* <strong>layout</strong> - Computes an nxGraph output based on the desired layout (e.g., node-link for CCT, Sankey for supergraph, and icicle plot for module hierarchy).
* <strong>modules</strong> - Exposes the interactions performed in CallFlow (e.g., splitting, hierarchy, histograms, scatterplot, box plots). All of them are exposed as API endpoints that can be queried using sockets.
* <strong>operations</strong> - Filter, group, and union operations on a single graph or an ensemble of graphs.

In [4]:
dir(callflow)

['CallFlow',
 'EnsembleGraph',
 'GraphFrame',
 'SuperGraph',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 '__version_info__',
 'algorithms',
 'callflow',
 'datastructures',
 'get_logger',
 'init_logger',
 'layout',
 'logger',
 'modules',
 'operations',
 'timer']

First, we need a config file that specifies the files to be loaded into the interface. We plan to automate this step so that it no longer has to be done by hand.

In [5]:
# Single dataset
singleConfigPath = os.path.abspath("../data/hpctoolkit-cpi-database/callflow.config.json")

`ConfigFileReader` is a helper from `callflow.operations` that reads and processes the provided JSON config file.

In [7]:
# Read config file.
singleConfig = ConfigFileReader(singleConfigPath)

In this demo, I will focus on what the `CallFlow` class exposes. The first step is to create a `CallFlow` object. The `ensemble` parameter determines whether CallFlow loads the 'single' or the 'ensemble' version.

In [8]:
scf = CallFlow(config=singleConfig, ensemble=False)

## Processing datasets
The processing step creates a `.callflow` directory that contains all the processed information. The `.callflow` directory is placed in the `save_path` provided in the `config` file.

* .callflow
    * dataset1
        * auxiliary_data.json 
        * df.csv (contains dataframe)
        * nxg.json (contains nxGraph)
    * ...dataset
    * ensemble
        * auxiliary_data.json
        * df.csv (contains dataframe)
        * nxg.json (contains nxGraph)
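
The layout above can be checked from Python before deciding whether processing is still needed. The following is a minimal sketch using only the standard library; the helper name `processed_datasets` is hypothetical and not part of the `callflow` package, and the three file names are taken from the listing above.

```python
import tempfile
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = ("auxiliary_data.json", "df.csv", "nxg.json")


def processed_datasets(callflow_dir):
    """Return the sub-directories that contain every expected file.

    Hypothetical helper for illustration; not part of the callflow package.
    """
    root = Path(callflow_dir)
    if not root.is_dir():
        return []
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir()
        and all((child / name).is_file() for name in EXPECTED_FILES)
    )


# Demo on a throwaway directory that mimics the layout (empty placeholder files).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / ".callflow"
    for dataset in ("dataset1", "ensemble"):
        (root / dataset).mkdir(parents=True)
        for name in EXPECTED_FILES:
            (root / dataset / name).touch()
    (root / "incomplete").mkdir()  # has no files, so it is not reported
    found = processed_datasets(root)

print(found)
```

A dataset directory that is missing any of the three files is treated as unprocessed, which is a conservative choice for this sketch.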
    

In [9]:
scf.process()

Processing:  calc-pi
Times:
    Calculate Histograms: 0.00s
    Module callsite map data: 0.00s
    Writing data:        0.00s



  exclusive[node_name] = (max_excTime - mean_excTime) / mean_excTime
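
The expression in the output above has the form `(max - mean) / mean`, which is undefined when the mean is zero. That case does occur in this demo, since several callsites report an exclusive time of 0.0 on every rank. The sketch below shows one way to guard the ratio. The function name `imbalance` and the choice to return 0.0 for an all-zero callsite are assumptions made for illustration, not CallFlow's implementation; the sample values are the per-rank times printed in the auxiliary output later in this notebook.

```python
def imbalance(values):
    """Return (max - mean) / mean, or 0.0 when the mean is zero.

    Hypothetical helper; returning 0.0 for an all-zero callsite is an
    assumption of this sketch, not confirmed CallFlow behavior.
    """
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    if mean == 0:
        return 0.0
    return (max(values) - mean) / mean


# Per-rank values for '162:MPIDI_CH3_Finalize' as printed later in this demo.
inclusive_times = [999238.0, 999390.0, 1000306.0, 999308.0]
exclusive_times = [0.0, 0.0, 0.0, 0.0]

inclusive_imbalance = imbalance(inclusive_times)
exclusive_imbalance = imbalance(exclusive_times)  # guarded: no division by zero
print(inclusive_imbalance, exclusive_imbalance)
```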


## Loading the supergraphs
If the preprocessing is already done, we can load the supergraphs directly from the `.callflow` directory.

In [11]:
scf.load()

In [12]:
scf.supergraphs

{'1-core': <callflow.datastructures.supergraph.SuperGraph at 0x7f6b4c352610>,
 '8-cores': <callflow.datastructures.supergraph.SuperGraph at 0x7f6b7389a450>,
 '27-cores': <callflow.datastructures.supergraph.SuperGraph at 0x7f6b4c352190>,
 '64-cores': <callflow.datastructures.supergraph.SuperGraph at 0x7f6b7603cf50>,
 '125-cores': <callflow.datastructures.supergraph.SuperGraph at 0x7f6b73caf7d0>,
 '216-cores': <callflow.datastructures.supergraph.SuperGraph at 0x7f6b73ca11d0>,
 '343-cores': <callflow.datastructures.supergraph.SuperGraph at 0x7f6b73ca10d0>,
 '512-cores': <callflow.datastructures.supergraph.SuperGraph at 0x7f6b73a40110>,
 'ensemble': <callflow.datastructures.ensemblegraph.EnsembleGraph at 0x7f6b3fdf04d0>}

Internally, the `SuperGraph` class contains Hatchet's GraphFrame.

In [13]:
dir(scf.supergraphs['calc-pi'].gf)

['_FILENAMES',
 '__add__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__init__',
 '__init_subclass__',
 '__isub__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__weakref__',
 '_init_sum_columns',
 '_operator',
 'add',
 'add_prefix',
 'copy',
 'deepcopy',
 'df',
 'drop_index_levels',
 'filter',
 'filter_by_name',
 'from_caliper',
 'from_caliper_json',
 'from_config',
 'from_gprof_dot',
 'from_hatchet',
 'from_hpctoolkit',
 'from_lists',
 'from_literal',
 'get_top_by_attr',
 'graph',
 'hatchet_graph_to_nxg',
 'leaves_below',
 'lookup',
 'lookup_with_name',
 'lookup_with_node',
 'lookup_with_vis_nodeName',
 'nxg',
 'print_information',
 'read',
 'squash',
 'sub',
 'subgraph_sum',
 'subtree_sum',
 'tailhead',
 'tailheadDir',
 'to_dot

## Socket requests
The socket endpoints are exposed through the `request_single` and `request_ensemble` function calls. Both calls require an input object that specifies what action to perform on the data.

```
{
    "name": String // action to perform, required.
    "dataset": Array // datasets to perform the action.
    ...other attributes // Each request has its own set of parameters that are required.
}
```
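
Request objects of this shape can be assembled with a small helper so that the required `name` field is never forgotten. This is a sketch only: `build_request` is a hypothetical function, not part of the `callflow` API, and it passes `dataset` through unchanged because the requests in this demo supply a single dataset name as a string.

```python
def build_request(name, dataset, **params):
    """Assemble a request dictionary with the fields described above.

    Hypothetical helper for illustration; not part of the callflow package.
    """
    if not isinstance(name, str) or not name:
        raise ValueError("'name' is required and must be a non-empty string")
    request = {"name": name, "dataset": dataset}
    request.update(params)  # request-specific attributes, e.g. functionsInCCT
    return request


# The same payloads that are passed to request_single in this demo.
cct_request = build_request("cct", "calc-pi", functionsInCCT=50)
supergraph_request = build_request("supergraph", "calc-pi", groupBy="module")
print(cct_request)
print(supergraph_request)
```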

### CCT

In [15]:
scct = scf.request_single({"name": "cct", "dataset": "calc-pi", "functionsInCCT": 50})
print(f"Nodes (count = {len(scct.nodes())}) are: {scct.nodes(data=True)}")
print("\n")
print(f"Edges (count = {len(scct.edges())}) are: {scct.edges(data=True)}")

Nodes (count = 12) are: [('<program root>', {'time (inc)': 1000306.0, 'time': 0.0, 'name': '<program root>', 'module': 'libmonitor.so.0.0.0'}), ('main', {'time (inc)': 1000306.0, 'time': 0.0, 'name': 'main', 'module': 'cpi'}), ('62:MPI_Finalize', {'time (inc)': 1000306.0, 'time': 0.0, 'name': '62:MPI_Finalize', 'module': 'libmonitor.so.0.0.0'}), ('PMPI_Finalize', {'time (inc)': 1000306.0, 'time': 0.0, 'name': 'PMPI_Finalize', 'module': 'libmpi.so.12.0.5'}), ('294:MPID_Finalize', {'time (inc)': 1000306.0, 'time': 0.0, 'name': '294:MPID_Finalize', 'module': 'libmpi.so.12.0.5'}), ('162:MPIDI_CH3_Finalize', {'time (inc)': 1000306.0, 'time': 0.0, 'name': '162:MPIDI_CH3_Finalize', 'module': 'libmpi.so.12.0.5'}), ('230:psm_dofinalize', {'time (inc)': 1000306.0, 'time': 0.0, 'name': '230:psm_dofinalize', 'module': 'libmpi.so.12.0.5'}), ('36:<unknown procedure>', {'time (inc)': 1000306.0, 'time': 0.0, 'name': '36:<unknown procedure>', 'module': 'libpsm_infinipath.so.1.14'}), ('<unknown procedur
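
Each entry returned by `nodes(data=True)` is a `(name, attributes)` pair, so the CCT can be summarized with plain Python. The sketch below groups callsites by their `module` attribute, using a few of the nodes printed above. The helper `callsites_by_module` is hypothetical, and falling back to `"Unknown"` when a node has no module is an assumption of this sketch.

```python
from collections import defaultdict

# A few (name, attributes) pairs copied from the output above.
nodes = [
    ("<program root>", {"time (inc)": 1000306.0, "time": 0.0,
                        "name": "<program root>", "module": "libmonitor.so.0.0.0"}),
    ("main", {"time (inc)": 1000306.0, "time": 0.0,
              "name": "main", "module": "cpi"}),
    ("PMPI_Finalize", {"time (inc)": 1000306.0, "time": 0.0,
                       "name": "PMPI_Finalize", "module": "libmpi.so.12.0.5"}),
    ("294:MPID_Finalize", {"time (inc)": 1000306.0, "time": 0.0,
                           "name": "294:MPID_Finalize", "module": "libmpi.so.12.0.5"}),
]


def callsites_by_module(node_data):
    """Map each module to the callsite names it contains.

    Hypothetical helper; the "Unknown" fallback is an assumption.
    """
    grouped = defaultdict(list)
    for name, attrs in node_data:
        grouped[attrs.get("module", "Unknown")].append(name)
    return dict(grouped)


by_module = callsites_by_module(nodes)
for module, callsites in by_module.items():
    print(module, callsites)
```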

### Auxiliary information

Auxiliary information contains per-callsite and per-module data. Keeping this data on the client makes it possible to serve the interactions performed in CallFlow in place, rather than querying the server frequently.

PS: This could lead to huge JSONs for large HPCToolkit data. To avoid this, I have implemented a faster lookup/fetch that uses HDF5 to create per-callsite and per-module storage, but this feature is not part of master yet.

In [24]:
auxiliary = scf.request_single({"name": "auxiliary", "dataset": "calc-pi"})
print(auxiliary)

{'callsite': {'calc-pi': {'162:MPIDI_CH3_Finalize': {'name': '162:MPIDI_CH3_Finalize', 'time (inc)': [999238.0, 999390.0, 1000306.0, 999308.0], 'time': [0.0, 0.0, 0.0, 0.0], 'sorted_time (inc)': [999238.0, 999308.0, 999390.0, 1000306.0], 'sorted_time': [0.0, 0.0, 0.0, 0.0], 'rank': [0, 1, 2, 3], 'id': 'node-33', 'mean_time (inc)': 999560.5, 'mean_time': 0.0, 'max_time (inc)': 1000306.0, 'max_time': 0.0, 'min_time (inc)': 999238.0, 'min_time': 0.0, 'dataset': ['calc-pi', 'calc-pi', 'calc-pi', 'calc-pi'], 'module': 'libmpi.so.12.0.5', 'hist_time (inc)': {'x': [25007.65, 75022.95000000001, 125038.25000000001, 175053.55000000002, 225068.85, 275084.15, 325099.45000000007, 375114.75, 425130.05000000005, 475145.35, 525160.65, 575175.9500000001, 625191.25, 675206.55, 725221.8500000001, 775237.15, 825252.4500000001, 875267.75, 925283.05, 975298.3500000001], 'y': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4], 'x_min': 25007.65, 'x_max': 975298.3500000001, 'y_min': 0.0, 'y_max': 4.
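
The summary fields in this output (`sorted_*`, `mean_*`, `min_*`, `max_*`, and `hist_*`) can be reproduced from the per-rank values. The sketch below does so for the `time (inc)` values of `162:MPIDI_CH3_Finalize`. The function `summarize` is hypothetical, and the binning rule (20 equal-width bins over `[0, max]`, reported by bin center) is inferred from the printed bin centers rather than taken from CallFlow's source. It also assumes non-negative values.

```python
def summarize(values, bins=20):
    """Compute summary statistics and a histogram of per-rank values.

    Hypothetical helper. The binning over [0, max] is inferred from the
    printed output and is an assumption, not CallFlow's confirmed method.
    """
    ordered = sorted(values)
    top = ordered[-1]
    width = top / bins if top > 0 else 1.0
    counts = [0] * bins
    for value in values:
        # Clamp so that value == max falls into the last bin.
        index = max(0, min(int(value / width), bins - 1))
        counts[index] += 1
    centers = [(i + 0.5) * width for i in range(bins)]
    return {
        "sorted": ordered,
        "mean": sum(values) / len(values),
        "min": ordered[0],
        "max": top,
        "hist": {"x": centers, "y": counts},
    }


# Per-rank 'time (inc)' values copied from the output above.
summary = summarize([999238.0, 999390.0, 1000306.0, 999308.0])
print(summary["mean"], summary["min"], summary["max"])
print(summary["hist"]["y"])
```

With these inputs all four ranks land in the last bin, which agrees with the `y` array printed above.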

### SuperGraph 

In [25]:
ssg = scf.request_single({"name": "supergraph", "groupBy": "module", "dataset": "calc-pi"})

In [28]:
print(f"Nodes (count = {len(ssg.nodes())}) are: {ssg.nodes(data=False)}")
print("\n")
print(f"Edges (count = {len(ssg.edges())}) are: {ssg.edges(data=False)}")

Nodes (count = 6) are: ['libmonitor.so.0.0.0', 'cpi', 'libmpi.so.12.0.5', 'libpsm_infinipath.so.1.14', 'Unknown', 'libc-2.12.so']


Edges (count = 6) are: [('libmonitor.so.0.0.0', 'cpi'), ('libmonitor.so.0.0.0', 'libmpi.so.12.0.5'), ('libmpi.so.12.0.5', 'libpsm_infinipath.so.1.14'), ('libpsm_infinipath.so.1.14', 'Unknown'), ('libpsm_infinipath.so.1.14', 'libc-2.12.so'), ('libc-2.12.so', 'Unknown')]
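
The edge list above is enough to reason about the module-level structure without any graph library. As a minimal sketch, the code below builds a successor map from the printed edges and finds the modules with no incoming or no outgoing edges. The helper names `successors`, `roots`, and `leaves` are hypothetical and not part of `callflow`.

```python
# Module-level edges copied from the output above.
edges = [
    ("libmonitor.so.0.0.0", "cpi"),
    ("libmonitor.so.0.0.0", "libmpi.so.12.0.5"),
    ("libmpi.so.12.0.5", "libpsm_infinipath.so.1.14"),
    ("libpsm_infinipath.so.1.14", "Unknown"),
    ("libpsm_infinipath.so.1.14", "libc-2.12.so"),
    ("libc-2.12.so", "Unknown"),
]


def successors(edge_list):
    """Build a {module: [called modules]} map from (source, target) pairs."""
    adjacency = {}
    for source, target in edge_list:
        adjacency.setdefault(source, []).append(target)
        adjacency.setdefault(target, [])  # make sure sink modules appear too
    return adjacency


def roots(adjacency):
    """Modules that no other module calls into."""
    called = {t for targets in adjacency.values() for t in targets}
    return [module for module in adjacency if module not in called]


def leaves(adjacency):
    """Modules that call into no other module."""
    return [module for module, targets in adjacency.items() if not targets]


adjacency = successors(edges)
print(roots(adjacency))
print(leaves(adjacency))
```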
