# [PUBLIC] CLBlast vs ARM Compute Library on representative matrix sizes

## Overview
1. [Data](#data) [for developers]
1. [Code](#code) [for developers]
1. [Table](#table)
1. [Plot](#plot)

<a id="data"></a>
## Get the experimental data

In [None]:
repo_uoa = 'explore-matrix-size-gemm-libs-odroid-xu3'

**NB:** Please ignore this section if you are not interested in re-running or modifying this notebook. 

The experimental data was collected on the target platform and archived as follows:
```
$ cd `ck find ck-math:script:<...>`
$ python <...>.py
$ ck zip local:experiment:* --archive_name=<...>.zip
```

It can be downloaded and extracted as follows:
```
$ wget <...>.zip
$ ck add repo:<repo_uoa> --zip=<...>.zip --quiet
```

<a id="code"></a>
## Data wrangling code

**NB:** Please ignore this section if you are not interested in re-running or modifying this notebook.

### Includes

#### Standard

In [None]:
import os
import sys
import json
import re

#### Scientific

If some of the scientific packages are missing, please install them using:
```
# pip install jupyter pandas numpy matplotlib seaborn
```

In [None]:
import IPython as ip
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib as mp

In [None]:
print ('IPython version: %s' % ip.__version__)
print ('Pandas version: %s' % pd.__version__)
print ('NumPy version: %s' % np.__version__)
print ('Seaborn version: %s' % sns.__version__) # apt install python-tk
print ('Matplotlib version: %s' % mp.__version__)

In [None]:
import matplotlib.pyplot as plt
from matplotlib import cm
%matplotlib inline

In [None]:
from IPython.display import Image, display
def display_in_full(df):
    pd.options.display.max_columns = len(df.columns)
    pd.options.display.max_rows = len(df.index)
    display(df)

#### Collective Knowledge

If CK is not installed, please install it using:
```
# pip install ck
```

In [None]:
import ck.kernel as ck
print ('CK version: %s' % ck.__version__)

### Define helper functions

In [None]:
# client: 'acl-sgemm-opencl-example' or 'clblast-tune'
def get_mnk(characteristics, client):
    # dim: 'm', 'n', 'k'
    def get_dim_int(characteristics, client, dim):
        if client == 'clblast-tune':
            dim_str = characteristics['run'][dim][0]
            if dim_str[-1] == 'K':
                dim_int = np.int64(dim_str[0:-1])*1024
            else:
                dim_int = np.int64(dim_str)
        else:
            dim_str = characteristics['run'][dim]
            dim_int = np.int64(dim_str)
        return dim_int

    m = get_dim_int(characteristics, client, 'm')
    n = get_dim_int(characteristics, client, 'n')
    k = get_dim_int(characteristics, client, 'k')

    return ('(%d, %d, %d)' % (m, n, k))

In [None]:
def get_GFLOPS(characteristics, client):
    if client == 'acl-sgemm-opencl-example':
        GFLOPS_str = characteristics['run']['GFLOPS_1']
    else:
        GFLOPS_str = characteristics['run']['GFLOPS_1'][0]
    GFLOPS = float(GFLOPS_str)
    return GFLOPS
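
The two helpers above read slightly different layouts depending on the client: for `clblast-tune` each value is wrapped in a list and a dimension may carry a `K` suffix (times 1024), while for `acl-sgemm-opencl-example` the values are stored directly. The sketch below restates that logic in plain Python on two made-up records. The records, their values, and the names `parse_dim`, `first_or_self` and `summarize` are illustrative assumptions, not part of the notebook or of CK.

```python
# Hypothetical records mirroring the two layouts the helpers expect.
# All values are invented for illustration; they are not measurements.
clblast_record = {'run': {'m': ['1K'], 'n': ['512'], 'k': ['2K'], 'GFLOPS_1': ['7.5']}}
acl_record = {'run': {'m': '1024', 'n': '512', 'k': '2048', 'GFLOPS_1': '9.25'}}

def parse_dim(dim_str):
    """Convert a dimension string such as '512' or '1K' to an int ('K' means x1024)."""
    if dim_str.endswith('K'):
        return int(dim_str[:-1]) * 1024
    return int(dim_str)

def first_or_self(value):
    """Unwrap a single-element list; return any other value unchanged."""
    return value[0] if isinstance(value, list) else value

def summarize(record):
    """Return the '(m, n, k)' label and the GFLOPS value for one repetition."""
    run = record['run']
    m, n, k = (parse_dim(first_or_self(run[dim])) for dim in ('m', 'n', 'k'))
    gflops = float(first_or_self(run['GFLOPS_1']))
    return '(%d, %d, %d)' % (m, n, k), gflops

clblast_summary = summarize(clblast_record)
acl_summary = summarize(acl_record)
print(clblast_summary)
print(acl_summary)
```

Both records describe the same matrix size, so they end up under the same `(m, n, k)` label, which is what allows the two libraries to be compared side by side later on.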

### Plot experimental data

In [None]:
default_colormap = cm.autumn
default_figsize = [20, 12]
default_dpi = 200
default_fontsize = 20
default_legend_fontsize = 'medium'

if mp.__version__[0]=='2': mp.style.use('classic')
mp.rcParams['figure.figsize'] = default_figsize
mp.rcParams['figure.dpi'] = default_dpi
mp.rcParams['font.size'] = default_fontsize
mp.rcParams['legend.fontsize'] = default_legend_fontsize

In [None]:
def plot(df_mean, df_std, rot=90, patch_fontsize=default_fontsize):
    ax = df_mean.plot(yerr=df_std,
        kind='bar', ylim=[0, 20], rot=rot, width=0.9, grid=True, legend=True,
        figsize=default_figsize, colormap=default_colormap, fontsize=default_fontsize)
    ax.set_title('ARM Compute Library vs CLBlast (dv/dt)', fontsize=default_fontsize)
    ax.set_ylabel('SGEMM GFLOPS', fontsize=default_fontsize)
    ax.legend(loc='upper right')
    for patch in ax.patches:
        text = '{0:2.1f}'.format(patch.get_height())
        ax.annotate(text, (patch.get_x()*1.00, patch.get_height()*1.01), fontsize=patch_fontsize)

### Access experimental data

In [None]:
def get_experimental_results(repo_uoa='local', tags='explore-matrix-size-libs-sgemm'):
    module_uoa = 'experiment'
    r = ck.access({'action':'search', 'repo_uoa':repo_uoa, 'module_uoa':module_uoa, 'tags':tags})
    if r['return']>0:
        print ("Error: %s" % r['error'])
        sys.exit(1)
    experiments = r['lst']
    
    dfs = []
    for experiment in experiments:
        data_uoa = experiment['data_uoa']
        r = ck.access({'action':'list_points', 'repo_uoa':repo_uoa, 'module_uoa':module_uoa, 'data_uoa':data_uoa})
        if r['return']>0:
            print ("Error: %s" % r['error'])
            sys.exit(1)

        for point in r['points']:
            with open(os.path.join(r['path'], 'ckp-%s.0001.json' % point)) as point_file:
                point_data_raw = json.load(point_file)
            characteristics_list = point_data_raw['characteristics_list']
            num_repetitions = len(characteristics_list)
            client = data_uoa[len('explore-matrix-size-gemm-libs-'):]
            # Obtain column data.
            data = [
                {
                    'client': client,
                    '(m, n, k)': get_mnk(characteristics, client),
                    'GFLOPS': get_GFLOPS(characteristics, client),
                    'repetition_id': repetition_id
                }
                for (characteristics, repetition_id) in zip(characteristics_list, range(num_repetitions)) 
            ]
            # Construct a DataFrame.
            df = pd.DataFrame(data)
            # Set columns and index names.
            df.columns.name = 'characteristics'
            df.index.name = 'index'
            df = df.set_index(['client', '(m, n, k)', 'repetition_id'])
            # Append to the list of similarly constructed DataFrames.
            dfs.append(df)
    # Concatenate all constructed DataFrames (i.e. stack on top of each other).
    result = pd.concat(dfs).unstack('client').swaplevel(axis=1)
    return result.sort_index(level=result.index.names)
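
The final reshaping step (`concat`, `unstack('client')`, `swaplevel(axis=1)`) is easier to follow on a tiny example. The sketch below uses made-up GFLOPS values for a single matrix size and two repetitions per client; the numbers and the variable names (`records`, `long_df`, `wide_df`) are assumptions for illustration only.

```python
import pandas as pd

# Made-up GFLOPS values: two clients, one matrix size, two repetitions each.
records = {
    'clblast-tune': [5.0, 7.0],
    'acl-sgemm-opencl-example': [8.0, 10.0],
}

dfs = []
for client, gflops_values in records.items():
    data = [
        {'client': client, '(m, n, k)': '(512, 512, 512)',
         'GFLOPS': gflops, 'repetition_id': repetition_id}
        for repetition_id, gflops in enumerate(gflops_values)
    ]
    df_point = pd.DataFrame(data).set_index(['client', '(m, n, k)', 'repetition_id'])
    dfs.append(df_point)

# Long format: one row per (client, size, repetition), one 'GFLOPS' column.
long_df = pd.concat(dfs)

# Wide format: move 'client' from the row index into the columns,
# then swap the two column levels so that the client name comes first.
wide_df = long_df.unstack('client').swaplevel(axis=1)
wide_df = wide_df.sort_index(level=list(wide_df.index.names))

print(long_df)
print(wide_df)
```

After the reshaping, each row is one `((m, n, k), repetition_id)` pair and each client has its own `GFLOPS` column, so the libraries can be compared row by row.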

<a id="table"></a>
## Table

In [None]:
df = get_experimental_results(repo_uoa=repo_uoa)
display_in_full(df)

<a id="plot"></a>
## Plot

In [None]:
df_mean = df.groupby(level=df.index.names[:-1]).mean()
df_std = df.groupby(level=df.index.names[:-1]).std()
plot(df_mean, df_std)
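
Grouping by every index level except the last one (`repetition_id`) collapses the repetitions, leaving one mean and one standard deviation per matrix size. A minimal sketch on made-up numbers follows; the column name `lib-a` and all values are hypothetical.

```python
import pandas as pd

# Hypothetical frame: two matrix sizes, three repetitions each, one library column.
index = pd.MultiIndex.from_product(
    [['(512, 512, 512)', '(1024, 1024, 1024)'], [0, 1, 2]],
    names=['(m, n, k)', 'repetition_id'])
df_demo = pd.DataFrame({'lib-a': [5.0, 6.0, 7.0, 9.0, 9.0, 9.0]}, index=index)

# Group by all index levels except the last one (the repetition id).
group_levels = list(df_demo.index.names[:-1])
demo_mean = df_demo.groupby(level=group_levels).mean()
demo_std = df_demo.groupby(level=group_levels).std()  # sample std (ddof=1) by default

print(demo_mean)
print(demo_std)
```

Two details may matter when reading the plot. The error bars are sample standard deviations, so they are only meaningful with at least two repetitions per point. And because the `(m, n, k)` labels are strings, the groups are ordered lexicographically (here `(1024, 1024, 1024)` sorts before `(512, 512, 512)`), not by matrix size.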