# [PUBLIC] Analysis of CLBlast tuning

<a id="overview"></a>
## Overview

This Jupyter Notebook analyses the performance that CLBlast achieves across a range of routines, sizes and configurations.

<a id="data"></a>
## Get the experimental data from Dropbox

**NB:** Please ignore this section if you are not interested in re-running or modifying this notebook. 

The data was collected on the experimental platform and archived as follows:
```
$ cd `ck find ck-math:script:<...>`
$ python <...>.py
$ ck zip local:experiment:* --archive_name=<...>.zip
```

It can be downloaded and extracted as follows:
```
$ wget <...>.zip
$ ck add repo:<....> --zip=<....>.zip --quiet
```

<a id="code"></a>
## Data wrangling code

**NB:** Please ignore this section if you are not interested in re-running or modifying this notebook.

### Includes

#### Standard

In [None]:
import os
import sys
import json
import re

#### Scientific

If some of the scientific packages are missing, please install them using:
```
# pip install jupyter pandas numpy matplotlib
```

In [None]:
import IPython as ip
import pandas as pd
import numpy as np
import matplotlib as mp

In [None]:
print ('IPython version: %s' % ip.__version__)
print ('Pandas version: %s' % pd.__version__)
print ('NumPy version: %s' % np.__version__)
print ('Matplotlib version: %s' % mp.__version__)

In [None]:
import matplotlib.pyplot as plt
from matplotlib import cm
%matplotlib inline

In [None]:
# Both classes are exported by IPython.display; importing from IPython.core.display is deprecated.
from IPython.display import Image, HTML

#### Collective Knowledge

If CK is not installed, please install it using:
```
# pip install ck
```

In [None]:
import ck.kernel as ck
print ('CK version: %s' % ck.__version__)

### Define helper functions

In [None]:
# Calculate FLOPS (floating-point operations per second) for C = alpha * A * B + beta * C,
# where A is an MxK matrix, B is a KxN matrix and time_ms is the execution time in milliseconds.
def FLOPS(alpha, beta, M, K, N, time_ms):
    time_s = 1e-3 * time_ms
    flops_AB = 2*M*N*K if alpha!=0 else 0
    flops_C = 2*M*N if beta!=0 else 0
    flops = flops_AB + flops_C
    return flops / time_s

In [None]:
# Calculate GFLOPS (Gigaflops/second).
def GFLOPS(alpha, beta, M, K, N, time_ms):
    return 1e-9 * FLOPS(alpha, beta, M, K, N, time_ms)
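
As a quick sanity check of the operation count used above, the sketch below restates both helpers as a single function and evaluates it for made-up inputs. The name `gemm_gflops` and the chosen sizes and timing are illustrative assumptions, not part of the original notebook.

```python
# Hypothetical single-function restatement of the FLOPS/GFLOPS helpers.
def gemm_gflops(alpha, beta, M, K, N, time_ms):
    # 2*M*N*K operations for alpha * A * B, counted only if alpha is non-zero.
    flops_AB = 2 * M * N * K if alpha != 0 else 0
    # 2*M*N operations for adding beta * C, counted only if beta is non-zero.
    flops_C = 2 * M * N if beta != 0 else 0
    # Convert milliseconds to seconds, then flops/second to Gigaflops/second.
    return 1e-9 * (flops_AB + flops_C) / (1e-3 * time_ms)

# Made-up example: 1000x1000x1000 GEMM taking 2 ms.
with_beta = gemm_gflops(1.0, 1.0, 1000, 1000, 1000, 2.0)
without_beta = gemm_gflops(1.0, 0.0, 1000, 1000, 1000, 2.0)
print(round(with_beta, 3), round(without_beta, 3))
```

For these inputs the two results are about 1001 and 1000 GFLOPS: the `beta * C` term adds only `2*M*N` operations, which is small next to `2*M*N*K` for large `K`.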

### Access the experimental data

In [None]:
def get_experimental_results(repo_uoa='local', tags='explore-clblast-matrix-size'):
    module_uoa = 'experiment'
    r = ck.access({'action':'search', 'repo_uoa':repo_uoa, 'module_uoa':module_uoa, 'tags':tags})
    if r['return']>0:
        # Raise instead of calling exit(), which would shut down the notebook kernel.
        raise RuntimeError("CK error: %s" % r['error'])
    experiments = r['lst']
    
    dfs = []
    for experiment in experiments:
        data_uoa = experiment['data_uoa']
        r = ck.access({'action':'list_points', 'repo_uoa':repo_uoa, 'module_uoa':module_uoa, 'data_uoa':data_uoa})
        if r['return']>0:
            # Raise instead of calling exit(), which would shut down the notebook kernel.
            raise RuntimeError("CK error: %s" % r['error'])

        for point in r['points']:
            with open(os.path.join(r['path'], 'ckp-%s.0001.json' % point)) as point_file:
                point_data_raw = json.load(point_file)
                # Obtain column data.
                run_characteristics = [
                    {
                        'arg_alpha' : np.float64(characteristics['run']['arg_alpha']),
                        'arg_beta'  : np.float64(characteristics['run']['arg_beta']),
                        'arg_m'     : np.int64(characteristics['run']['arg_m']),
                        'arg_n'     : np.int64(characteristics['run']['arg_n']),
                        'arg_k'     : np.int64(characteristics['run']['arg_k']),
                        'strategy'  : tuner_output['strategy'],
                        'kernel'    : config['kernel'],
                        'config'    : config['parameters'],
                        'ms'        : np.float64(config['time'])
                    }
                    for characteristics in point_data_raw['characteristics_list']
                    for tuner_output in characteristics['run']['data']
                    for config in tuner_output['result']
                ]
            # Construct a DataFrame.
            df = pd.DataFrame(run_characteristics)
            df['GFLOPS'] = df.apply(lambda row:
                GFLOPS(row['arg_alpha'], row['arg_beta'], row['arg_m'], row['arg_k'], row['arg_n'], row['ms']),
                axis=1)
            # Set columns and index names.
            df.columns.name = 'characteristic'
            df.index.name = 'configuration' # TODO: add repetition
            df = df.set_index(
                ['kernel', 'strategy', 'arg_alpha', 'arg_beta', 'arg_m', 'arg_k', 'arg_n'], append=True)
            df = df.reorder_levels(
                ['kernel', 'strategy', 'arg_alpha', 'arg_beta', 'arg_m', 'arg_k', 'arg_n', 'configuration'])
            # Append to the list of similarly constructed DataFrames.
            dfs.append(df)
    # Concatenate all constructed DataFrames (i.e. stack on top of each other).
    result = pd.concat(dfs)
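    # sortlevel() has been removed from recent pandas releases; sort_index() is the replacement.
    return result.sort_index(level=result.index.names)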
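
The nested comprehension in `get_experimental_results` flattens three levels of the point file (`characteristics_list`, then `run.data`, then `result`) into one row per tuner configuration. The sketch below illustrates that flattening on a hand-written point. The JSON content, including the parameter name and all values, is a hypothetical stand-in for a real `ckp-*.json` file, and only a subset of the columns is shown.

```python
import json

# Hypothetical point file content; all values are made up for illustration.
point_json = '''
{
  "characteristics_list": [
    {"run": {"arg_m": "256", "arg_n": "256", "arg_k": "256",
             "data": [
               {"strategy": "exhaustive",
                "result": [
                  {"kernel": "Xgemm", "parameters": {"MWG": 32}, "time": 0.5},
                  {"kernel": "Xgemm", "parameters": {"MWG": 64}, "time": 0.4}
                ]}
             ]}}
  ]
}
'''
point_data_raw = json.loads(point_json)

# One row per (repetition, tuner output, configuration) triple.
rows = [
    {
        'arg_m': int(characteristics['run']['arg_m']),
        'arg_n': int(characteristics['run']['arg_n']),
        'arg_k': int(characteristics['run']['arg_k']),
        'strategy': tuner_output['strategy'],
        'kernel': config['kernel'],
        'config': config['parameters'],
        'ms': float(config['time']),
    }
    for characteristics in point_data_raw['characteristics_list']
    for tuner_output in characteristics['run']['data']
    for config in tuner_output['result']
]

for row in rows:
    print(row)
```

Each dictionary in `rows` becomes one row when the list is passed to `pd.DataFrame`, which is why the size arguments are repeated for every configuration.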

In [None]:
df = get_experimental_results(tags='explore-clblast-matrix-size,xgemm-fp32')

In [None]:
pd.options.display.max_columns = len(df.columns)
pd.options.display.max_rows = len(df.index)
# df

In [None]:
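# Select by the first two index levels (kernel, strategy); the tuple makes the row key explicit.
df_xgemm_exhaustive = df.loc[('Xgemm', 'exhaustive'), :]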
df_xgemm_exhaustive

In [None]:
df_xgemm_exhaustive.groupby(level=df_xgemm_exhaustive.index.names[:-1])['GFLOPS'].min()

In [None]:
df_xgemm_exhaustive.groupby(level=df_xgemm_exhaustive.index.names[:-1])['GFLOPS'].max()

In [None]:
# Take the maximum directly: in recent pandas, argmax() returns a position, not a label usable with .loc.
max_GFLOPS = df_xgemm_exhaustive['GFLOPS'].max()
max_GFLOPS
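
To illustrate the aggregation used in the last few cells, the following sketch groups a small hand-made DataFrame by every index level except the last one (`configuration`) and then takes the overall maximum. The DataFrame `toy` and its values are invented for illustration, and it carries only a subset of the index levels used in the notebook.

```python
import pandas as pd

# Hypothetical data: two matrix sizes, two tuner configurations each.
index = pd.MultiIndex.from_tuples(
    [(256, 256, 256, 0), (256, 256, 256, 1),
     (512, 512, 512, 0), (512, 512, 512, 1)],
    names=['arg_m', 'arg_k', 'arg_n', 'configuration'])
toy = pd.DataFrame({'GFLOPS': [10.0, 30.0, 20.0, 50.0]}, index=index)

# Group by all index levels except the last one ('configuration').
levels = list(toy.index.names[:-1])
worst_per_size = toy.groupby(level=levels)['GFLOPS'].min()
best_per_size = toy.groupby(level=levels)['GFLOPS'].max()

# Overall best value, and the index label of the row that achieves it.
overall_best = toy['GFLOPS'].max()
best_label = toy['GFLOPS'].idxmax()

print(best_per_size)
print(overall_best, best_label)
```

Here `idxmax()` returns the full index tuple of the best row, which is useful when the winning configuration itself is of interest and not only its GFLOPS value.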