# Graph-Optimizer beta testing round 2

## Tool description
In short, the Graph-Optimizer tool does the following:
- Predicts the execution time (in milliseconds) and energy consumption (in joules) for a given BGO or DAG of BGOs on a specific hardware configuration.
- Returns the model in symbolic form, with graph properties as symbols, or predicts execution times if the graph properties are specified.
- Exposes both through an API: a POST request to `<api_url>/models` with the BGO DAG and hardware configuration returns an annotated DAG with calibrated symbolic models, and a POST request to `<api_url>/evaluate` with the BGO DAG, hardware configuration, and graph properties returns an annotated DAG with predicted execution times.


### What is a BGO?
A BGO, or Basic Graph Operation, is an atomic graph operation that can serve as a building block for constructing larger graph _workloads_.
A single BGO can have multiple implementations, possibly targeting different hardware platforms (e.g., CPU or GPU).
A workload is a Directed Acyclic Graph (DAG) of BGOs, where the nodes are BGOs and the edges represent data dependencies between them.
An example of such a DAG is shown below:

<img style="margin-bottom: -245px" src="dag.svg">

In this DAG, we start with the Betweenness Centrality (BC) BGO, followed by the Breadth First Search (BFS) and Find Max BGOs, and conclude with the Find Path BGO. This workload outputs the path from the root node to the node with the highest betweenness centrality, that is, the node that lies on the largest share of shortest paths in the graph.

### Performance modeling

#### Analytical modeling
An implementation of a BGO can have a symbolic model that describes its execution time and energy consumption as functions of graph properties and hardware characteristics. Specifically, these hardware characteristics refer to the execution times of atomic operations in hardware, or operations considered to be atomic, such as reading a value from memory, writing a value to memory, performing an integer addition, and so on. These characteristics are obtained through _microbenchmarks_ run on the hardware where the BGO will be executed. The values obtained from the microbenchmarks are used to calibrate the symbolic models, which then provide the execution time and energy consumption of the BGO based solely on the graph properties. This approach allows for predicting the execution time and energy consumption of a BGO on a specific hardware configuration without needing to run the BGO on the hardware for any particular input graph.

Such a calibrated model might look like this:

$T_{BGO} = 561n \times 924m + 91n^2$,

where $n$ is the number of nodes in a graph, $m$ is the number of edges in the graph, and $T_{BGO}$ is the execution time of the BGO in milliseconds.
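
To make the calibration step concrete, here is a minimal sketch. The operation counts in `symbolic_model` and the round benchmark values are made up for illustration and are not taken from any real BGO model; only the benchmark names (`T_q_push`, `T_q_pop`, `T_int_add`) match those used later in this document.

```python
# Hypothetical symbolic model: for each atomic operation, how many times it is
# executed as a function of the graph properties n (nodes) and m (edges).
symbolic_model = {
    "T_q_push": lambda n, m: n,   # assumed: one queue push per node
    "T_q_pop": lambda n, m: n,    # assumed: one queue pop per node
    "T_int_add": lambda n, m: m,  # assumed: one integer addition per edge
}


def calibrate(model, benchmarks):
    """Substitute measured operation times into the symbolic model.

    The returned function depends on the graph properties only.
    """
    def predict_ns(n, m):
        return sum(count(n, m) * benchmarks[op] for op, count in model.items())
    return predict_ns


# Made-up microbenchmark results in nanoseconds (not real measurements).
example_benchmarks = {"T_q_push": 7.0, "T_q_pop": 12.0, "T_int_add": 2.0}

predict = calibrate(symbolic_model, example_benchmarks)
print(predict(n=100, m=500))  # 100*7 + 100*12 + 500*2 = 2900.0
```

After calibration, no hardware-specific symbol is left in the model, which is why a prediction needs only the graph properties.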

#### Sampling based prediction

#### Benchmarks?

## Steps to use the Graph-Optimizer tool
### Step 1: Specify input DAG of BGOs

The first step in using Graph-Optimizer is to specify the input DAG of BGOs. From the list below, select the BGOs you want to use.
This DAG should include one or more BGOs and their dependencies. Each BGO name should match the name of the BGO folder in the `models` directory.
There are currently <n> BGOs available. These are:
- bc (Betweenness Centrality)
- pr (PageRank)
- bfs (Breadth First Search)
- find_max (Find Max)
- find_path (Find Path)
- 

The dependencies should be specified as a list of the ids of the BGOs that the current BGO depends on. For instance, the following example has the same shape as the DAG shown in the introduction, with PageRank (`pr`) as the first BGO:

In [None]:
input_dag = [
    {
        "id": 0,
        "name": "pr",
        "dependencies": []
    },
    {
        "id": 1,
        "name": "find_max",
        "dependencies": [0]
    },
    {
        "id": 2,
        "name": "bfs",
        "dependencies": [0]
    },
    {
        "id": 3,
        "name": "find_path",
        "dependencies": [1,2]
    }
]
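
Before sending a DAG to the server, it can help to check it locally. The sketch below is an assumption about what a useful check looks like, not part of the Graph-Optimizer API: the hypothetical helper `validate_dag` verifies that ids are unique, that every dependency refers to an existing id, and that the dependencies contain no cycle, using the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter


def validate_dag(dag):
    """Return the BGO ids in a valid execution order, or raise ValueError."""
    ids = [node["id"] for node in dag]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate BGO ids")
    known = set(ids)
    for node in dag:
        missing = set(node["dependencies"]) - known
        if missing:
            raise ValueError(
                f"BGO {node['id']} depends on unknown ids {sorted(missing)}"
            )
    # Map each id to the ids it depends on; static_order() lists dependencies first.
    graph = {node["id"]: set(node["dependencies"]) for node in dag}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError("dependencies contain a cycle") from exc


# Same structure as the example DAG above.
example_dag = [
    {"id": 0, "name": "pr", "dependencies": []},
    {"id": 1, "name": "find_max", "dependencies": [0]},
    {"id": 2, "name": "bfs", "dependencies": [0]},
    {"id": 3, "name": "find_path", "dependencies": [1, 2]},
]

order = validate_dag(example_dag)
print(order)  # id 0 comes first and id 3 last; ids 1 and 2 may appear in either order
```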

### Step 2: Specify hardware configuration
The hardware configuration aims to describe all available hardware components in a system or data center. The hardware information is used for the calibration of the performance models.

To specify a custom hardware configuration, you need to provide the configuration in JSON format. This configuration should list all unique available hosts in the data center, including details about CPUs and, if applicable, GPUs. Running microbenchmarks is part of this process and is done automatically with a script. An example configuration is provided below.

In [None]:
hardware = {
    "hosts": [
        {
            "id": 1,
            "name": "H01",
            "cpus": {
                "id": 1,
                "name": "intel xeon",
                "clock_speed": 2.10,
                "cores": 16,
                "threads": 32,
                "wattage": 35,
                "amount": 2,
                "benchmarks": {
                    "T_int_add": 1.739769,
                    "T_int_mult": 0.2799874,
                    "T_int_gt": 0.1126517,
                    "T_int_neq": 0.2243598,
                    "T_float_add": 0.4340147,
                    "T_float_sub": 0.4194657,
                    "T_float_mult": 0.421529,
                    "T_float_div": 0.9876428,
                    "T_float_gt": 0.1107689,
                    "T_q_push": 7.381785,
                    "T_q_front": 14.52919,
                    "T_q_pop": 12.06002,
                    "T_heap_insert_max": 37.04162,
                    "T_heap_extract_min": 73.36278,
                    "T_heap_decrease_key": 11.02303,
                    "T_push_back": 5.958142,
                    "L1_linesize": 64,
                    "L2_linesize": 64,
                    "L3_linesize": 64,
                    "T_L1_read": 2.513609681995777,
                    "T_L2_read": 5.099514592155088,
                    "T_L3_read": 27.465506112424432,
                    "T_DRAM_read": 66.34084759860137
                }
            }
        }
    ]
}
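
A quick sanity check can catch an incomplete configuration before it reaches the server. The sketch below is only an illustration: the helper `check_hardware` is hypothetical, and the rules it applies (at least one host, a non-empty `benchmarks` object per CPU entry, positive numeric values) are assumptions rather than the server's actual validation rules.

```python
def check_hardware(config):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    hosts = config.get("hosts", [])
    if not hosts:
        problems.append("no hosts listed")
    for host in hosts:
        name = host.get("name", "<unnamed>")
        benchmarks = host.get("cpus", {}).get("benchmarks", {})
        if not benchmarks:
            problems.append(f"host {name}: no CPU benchmarks")
        for key, value in benchmarks.items():
            # Assumed rule: every benchmark entry is a positive number.
            if not isinstance(value, (int, float)) or value <= 0:
                problems.append(f"host {name}: {key} = {value!r} is not a positive number")
    return problems


# Trimmed-down example configuration with one deliberately broken value.
example_config = {
    "hosts": [
        {
            "id": 1,
            "name": "H01",
            "cpus": {"benchmarks": {"T_int_add": 1.739769, "T_q_pop": -1}},
        }
    ]
}

for problem in check_hardware(example_config):
    print(problem)
```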

#### Obtaining hardware information

The following code cell fetches the hardware information for the system this notebook is running on and fills it into the hardware configuration JSON object. It relies on the `lscpu` command, so it only works on Linux.


In [None]:
import subprocess

lscpu = {k.strip(): v.strip() for item in subprocess.check_output(['lscpu']).decode('utf-8').split('\n') for k, _, v in [item.partition(':')]}
hardware['hosts'][0]['cpus']['name'] = lscpu['Model name']
hardware['hosts'][0]['cpus']['clock_speed'] = float(lscpu['CPU max MHz']) / 1000
hardware['hosts'][0]['cpus']['cores'] = int(lscpu['Core(s) per socket'])
hardware['hosts'][0]['cpus']['threads'] = int(lscpu['Thread(s) per core']) * hardware['hosts'][0]['cpus']['cores']
hardware['hosts'][0]['cpus']['amount'] = int(lscpu['Socket(s)'])

#### Automated microbenchmarks

Running the following Python cell runs the microbenchmarks on your machine and inserts the results into the hardware configuration. This should take a few seconds, and typically no longer than a minute.

The resulting values are the measured times per operation in nanoseconds; the `L*_linesize` entries are cache line sizes in bytes.

**Note**: for running the microbenchmarks, g++ is required.

In [None]:
import os
os.chdir("../")
from benchmarks.microbenchmarks import all_benchmarks
import json

# Run microbenchmarks on this machine
local_benchmarks = all_benchmarks()

# Pretty print the benchmarks
print(json.dumps(local_benchmarks, indent=4))

# Assign obtained values to the hardware description
hardware["hosts"][0]["cpus"]["benchmarks"] = local_benchmarks

### Step 3: Run the prediction server

The next step is to run the prediction server and obtain the performance models for the input DAG on your specific hardware configuration.
This step consists of two subtasks: first start the prediction server, then submit a POST request to the server to get the performance models.

1. Start the server using Flask by running the following command from the root directory:
    ```bash
    flask --app api/api.py run
    ```
    _(If you are using Windows, use `python -m flask --app api/api.py run`)_

    This will start the server on `localhost:5000`. Use the following command to start the server on a different port:
    ```bash
    flask --app api/api.py run --port <port_number>
    ```
    You can either do this step, or run the Python cell below, which will start the server for you.
    - **Note**: If you run the server via the cell below, make sure to wait for the server to start before proceeding to the next cells.
    - When you are finished with the experiments, you can stop the server by running the last cell in this document, which terminates it.
2. Run the prediction by submitting a POST request to the API.
    - To obtain the calibrated symbolic models, issue a POST request to `localhost:<port_number>/models` with the following form data:
        - "bgo_dag": The input BGO DAG in JSON format, as a string.
        - "hardware": The hardware configuration in JSON format, as a string.



In [None]:
# Start the server
import multiprocessing

port = 5000
url = f'http://localhost:{port}'  # no trailing slash: endpoints are appended as '/models' and '/evaluate'

def run_server():
    !flask --app api/api.py run --port {port}

server = multiprocessing.Process(target=run_server)
server.start()
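
Because the server starts in the background, the next cells can run before it is ready to accept requests. One way to wait is to poll the port until a TCP connection succeeds. The helper `wait_for_port` below is a hypothetical sketch using only the standard library; it is not part of the Graph-Optimizer code base.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connection means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


# Usage in the notebook, after server.start():
#     if not wait_for_port("localhost", port):
#         raise RuntimeError("prediction server did not start in time")
```

A successful connection only shows that something is listening on the port, so the first request may still fail if the application is not fully initialized.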

#### Interpreting the results

The response to the POST request is the input DAG, annotated with the calibrated symbolic models. Each model is a string representing a mathematical formula, with the graph properties as parameters.

For demonstration purposes, a dropdown and slider are provided below, which allow you to change the microbenchmarking parameters and see the impact they have on the performance models.
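
A returned model string can also be evaluated by hand for a given set of graph properties. The sketch below assumes that a model string uses Python-style arithmetic (`+`, `-`, `*`, `/`, `**`) over named graph properties. That format, the example string, and the helper `evaluate_model` are assumptions for illustration, so check the actual response before relying on it. The evaluator walks the parsed expression instead of calling `eval`, so only plain arithmetic is accepted.

```python
import ast
import operator

# Arithmetic operators the evaluator accepts.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def evaluate_model(expression, symbols):
    """Evaluate an arithmetic expression string using the given symbol values."""
    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return symbols[node.id]  # KeyError if a graph property is missing
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -visit(node.operand)
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return visit(ast.parse(expression, mode="eval"))


# Hypothetical model string; real responses may look different.
example_model = "2.5*n + 0.5*m + 0.25*n**2"
print(evaluate_model(example_model, {"n": 100, "m": 400}))  # 250 + 200 + 2500 = 2950.0
```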

#### Submit a request to the prediction server by executing the cells below, and observe how the models change when altering the microbenchmarking parameters.

In [None]:
import requests
import json

def models_request():
    form_data = {
        'hardware': json.dumps(hardware),
        'bgo_dag': json.dumps(input_dag)
    }
    models_response = requests.post(url + '/models', data=form_data)
    print(models_response.text)
    return models_response.text

In [None]:
# Dropdown and slider functionality
from ipywidgets import interact, dlink, Dropdown, FloatSlider, Button
from IPython.display import Markdown, display, clear_output
from datetime import datetime
import os

clear_output()

microbenchmarks = hardware['hosts'][0]['cpus']['benchmarks']
microbenchmark_name = list(microbenchmarks.keys())[0]
dropdown = Dropdown(options=microbenchmarks.keys(), description='Variable')
slider = FloatSlider(min=0, max=200, step=0.01, description='Value', value=microbenchmarks[microbenchmark_name])
response = None

def save_config(arg):
    now = datetime.now().strftime('%Y-%m-%d.%H.%M.%S')
    dir_name = f'./saved_configs/{now}/'
    os.makedirs(dir_name, exist_ok=False)
    with open(f'{dir_name}/hardware.json', 'w') as hw_file:
        hw_file.write(json.dumps(hardware))
    with open(f'{dir_name}/models.json', 'w') as model_file:
        model_file.write(json.dumps(response))
    print(f'Configuration saved to {dir_name} directory')


def set_microbenchmark_name(variable):
    global microbenchmark_name
    microbenchmark_name = variable


def update_slider_value(x):
    global microbenchmark_name
    microbenchmark_name = x
    slider.value = microbenchmarks[x]
    return slider.value


def update_microbenchmark_value(value):
    global response
    hardware['hosts'][0]['cpus']['benchmarks'][microbenchmark_name] = value
    response = json.loads(models_request())
    return response


dlink((dropdown, 'value'), (slider, 'value'), update_slider_value)
display(Markdown('### Change the value of certain microbenchmarks, and see how they impact the performance models'))
save_button = Button(description='Save configuration')
save_button.on_click(save_config)
display(save_button)
interact(set_microbenchmark_name, variable=dropdown)
interact(update_microbenchmark_value, value=slider);


### Step 4: Specify graph characteristics _(optional)_

The final step is to specify the graph characteristics of a specific graph for which you want to predict the execution time. This can be done by submitting a POST request to the API with the input DAG, hardware configuration, and graph properties. The API will return the input DAG annotated with the predicted execution times.

The graph properties are expressed in a simple JSON format of which an example is given below:

In [None]:
graph_props = {
    "n": 15763,
    "m": 171206,
    "average_degree": 21,
    "directed": False,
    "weighted": False,
    "clustering_coefficient": 0.0132526,
    "triangle_count": 591156,
    "s": 1000
}

Specifying properties by hand is useful for analyzing the theoretical performance of an algorithm on hypothetical graphs. To predict performance for an existing graph, you can instead select a graph file, from which the properties are extracted automatically.

In [None]:
from ipyfilechooser import FileChooser

fc = FileChooser('data/beta_testing')
fc.filter_pattern = '*.mtx'
display(fc)

In [None]:
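import networkx as nx
import scipy.io  # import the submodule explicitly; `import scipy` alone may not load `scipy.io`

# Load the selected Matrix Market file as an undirected NetworkX graph
graph_name = fc.selected
g = nx.from_scipy_sparse_array(scipy.io.mmread(graph_name))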

# Extract graph properties from file
n = len(g.nodes())
m = len(g.edges())
extracted_properties = {
    "n": n,
    "m": m,
    "average_degree": 2 * m / n,  # undirected graph: every edge adds to the degree of two nodes
    "directed": g.is_directed(),
    "weighted": nx.is_weighted(g),
    "diameter": nx.diameter(g),
    "clustering_coefficient": nx.average_clustering(g),
    "triangle_count": sum(nx.triangles(g).values()) // 3
}

print(extracted_properties)
graph_props = extracted_properties

In [None]:
def evaluate_request():
    form_data = {
        'hardware': json.dumps(hardware),
        'bgo_dag': json.dumps(input_dag),
        'graph_props': json.dumps(graph_props)
    }
    evaluate_response = requests.post(url + '/evaluate', data=form_data)
    print(evaluate_response.text)

def set_num_nodes(value):
    graph_props["n"] = int(value)  # the FloatSlider yields floats; node counts are integers
    evaluate_request()

display(Markdown('### Change the number of nodes in the graph, and see how it impacts the performance and energy predictions'))
interact(set_num_nodes, value=FloatSlider(min=100, max=100000, step=1, description='#nodes', value=graph_props["n"]));

#### Sampling based performance prediction
Explanation here

In [None]:
sampling_rate = 0.1
sampled_graph_name = 'test.mtx'
!./sampling/main {graph_name} {sampling_rate} {sampled_graph_name}
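
The sampler invoked above is a compiled program whose algorithm is not described in this document. As one common interpretation of a sampling rate, uniform random edge sampling keeps each edge independently with probability equal to the rate. The sketch below illustrates that idea only; the function `sample_edges` is hypothetical and is not a description of what `./sampling/main` does.

```python
import random


def sample_edges(edges, rate, seed=None):
    """Keep each edge independently with probability `rate`."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)  # a local generator keeps the sampling reproducible
    return [edge for edge in edges if rng.random() < rate]


# Made-up path graph with 1000 edges.
example_edges = [(u, u + 1) for u in range(1000)]
sampled_edges = sample_edges(example_edges, rate=0.1, seed=42)
print(f"kept {len(sampled_edges)} of {len(example_edges)} edges")
```

With a rate of 0.1, roughly one edge in ten survives, so a benchmark on the sampled graph runs on a much smaller input than the full graph.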

In [None]:
def run_benchmark(bgo, graph_file):
    result = subprocess.check_output(['python3', './autobench/run_bench.py', bgo, '--data', f'G={graph_file}']).decode('utf-8').rstrip().split('\n')[-2:]
    values = dict(zip(*map(lambda x: x.split(','), result)))

    return values['runtime_ns'], values['energy_joules']

def run_and_print(bgo, graph_file):
    runtime, energy = run_benchmark(bgo, graph_file)
    print(f"Running {bgo} on {graph_file} took {runtime} ns and {energy} J")
    
run_and_print('./bgo/pr/CPU/pr_sequential', sampled_graph_name)
run_and_print('./bgo/pr/CPU/pr_sequential', graph_name)

run_and_print('./bgo/pr/GPU/edgelist', sampled_graph_name)
run_and_print('./bgo/pr/GPU/edgelist', graph_name)
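
The parsing step inside `run_benchmark` is compact, so here it is spelled out. The cell above takes the last two lines of the benchmark's output as a comma-separated header line and value line. The sketch below applies the same logic to a made-up output string; the sample text and the numeric conversion are assumptions, and only the column names `runtime_ns` and `energy_joules` come from the cell above.

```python
def parse_benchmark_output(stdout_text):
    """Turn the trailing 'header line, value line' pair into numbers."""
    header, row = stdout_text.rstrip().split("\n")[-2:]
    values = dict(zip(header.split(","), row.split(",")))
    return float(values["runtime_ns"]), float(values["energy_joules"])


# Made-up benchmark output: log lines followed by a CSV header and one row.
sample_output = "loading graph...\nruntime_ns,energy_joules\n1200000,0.35\n"

runtime_ns, energy_joules = parse_benchmark_output(sample_output)
print(runtime_ns, energy_joules)  # 1200000.0 0.35
```

Converting the values to numbers makes it straightforward to compare the run on the sampled graph with the run on the full graph.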

#### Automatic validation

### Stop server

In [None]:
# Stop the server
server.terminate()