# [cknowledge.org/ai](https://cknowledge.org/ai): Crowdsourcing benchmarking and optimisation of AI

A suite of open-source tools for [collecting knowledge on optimising AI](http://bit.ly/hipeac49-ckdl):
* [Android app](https://play.google.com/store/apps/details?id=openscience.crowdsource.video.experiments&hl=en_GB)
* [Desktop app](https://github.com/dividiti/ck-crowdsource-dnn-optimization)
* [CK-Caffe](https://github.com/dividiti/ck-caffe)
* [CK-Caffe2](https://github.com/ctuning/ck-caffe2)
* [CK-TensorFlow](https://github.com/ctuning/ck-tensorflow)
* [CK-TensorRT](https://github.com/dividiti/ck-tensorrt)
* etc.

# [PUBLIC] Benchmarking Caffe with OpenBLAS on Samsung Chromebook 2

## Table of Contents

1. [Overview](#overview)
1. [See the code](#code) [for developers]
1. [Get the data](#data) [for developers]
1. [See the tables](#tables)
  1. [All data](#df_all)
  1. [All execution time data](#df_time)
  1. [Mean execution time per batch](#df_mean_time_per_batch)
  1. [Mean execution time per image](#df_mean_time_per_image)
  1. [Mean number of images per second](#df_mean_images_per_second)
1. [See the graphs](#graphs)
  1. [Mean number of images per second](#graphs)

<a id="overview"></a>
## Overview

We study the **execution time** of inference (forward propagation):
- on the [Samsung Chromebook 2](http://www.samsung.com/us/computer/chrome-os-devices/XE503C12-K01US-specs) **platform**:
  - [CPU] quad-core ARM Cortex-A15 @ 1900 MHz;
  - [CPU] quad-core ARM Cortex-A7 @ 1300 MHz (not used);
  - [GPU] quad-core ARM Mali-T628 @ 600 MHz (not used);
  - [GPU] dual-core ARM Mali-T628 @ 600 MHz (not used);
  - [GPU] OpenCL driver 6.0 (`r6p0-02rel0.b77b627bc37583eeaa34bbee29868088`);
  - [GPU] OpenCL standard 1.1;
  - [RAM] 2 GB;
  - Gentoo Linux [over](https://community.arm.com/groups/arm-mali-graphics/blog/2014/12/18/installing-opencl-on-chromebook-2-in-30-minutes) ChromeOS:
```
$ cat /etc/lsb-release
CHROMEOS_AUSERVER=https://tools.google.com/service/update2
CHROMEOS_BOARD_APPID={24E2E4F7-F92C-6115-3E26-02C7EAA02946}
CHROMEOS_CANARY_APPID={90F229CE-83E2-4FAF-8479-E368A34938B1}
CHROMEOS_DEVSERVER=
CHROMEOS_RELEASE_APPID={24E2E4F7-F92C-6115-3E26-02C7EAA02946}
CHROMEOS_RELEASE_BOARD=peach_pit-signed-mp-v3keys
CHROMEOS_RELEASE_BRANCH_NUMBER=69
CHROMEOS_RELEASE_BUILDER_PATH=peach_pit-release/R58-9334.69.0
CHROMEOS_RELEASE_BUILD_NUMBER=9334
CHROMEOS_RELEASE_BUILD_TYPE=Official Build
CHROMEOS_RELEASE_CHROME_MILESTONE=58
CHROMEOS_RELEASE_DESCRIPTION=9334.69.0 (Official Build) stable-channel peach_pit 
CHROMEOS_RELEASE_NAME=Chrome OS
CHROMEOS_RELEASE_PATCH_NUMBER=0
CHROMEOS_RELEASE_TRACK=stable-channel
CHROMEOS_RELEASE_VERSION=9334.69.0
DEVICETYPE=CHROMEBOOK
GOOGLE_RELEASE=9334.69.0
$ uname -a
Linux localhost 3.8.11 #1 SMP Wed May 10 18:37:16 PDT 2017 armv7l ARMv7 Processor rev 3 (v7l) SAMSUNG EXYNOS5 (Flattened Device Tree) GNU/Linux
```


- using 3 CNN **models** (net architecture + weights):
  - [AlexNet](https://github.com/BVLC/caffe/tree/master/models/bvlc_alexnet);
  - [GoogleNet](https://github.com/BVLC/caffe/tree/master/models/bvlc_googlenet);
  - [SqueezeNet 1.1](https://github.com/DeepScale/SqueezeNet/tree/master/SqueezeNet_v1.1);

- using 1 **library** version:
  - [CPU] [OpenBLAS](https://github.com/xianyi/OpenBLAS) 0.2.19;

- with the **number of threads** varying from 1 to 4;

- with the **batch size** varying from 1 to 4.
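The experimental design space above is a full grid. As a quick check of its size, here is a minimal sketch that enumerates it. It assumes that every combination of model, thread count and batch size was run. The model tags and environment variable names are the ones used in the data wrangling code below.

```python
from itertools import product

# Values from the overview above; whether every combination was actually
# run (and succeeded) is an assumption of this sketch.
model_tags = ['bvlc-alexnet', 'bvlc-googlenet', 'deepscale-squeezenet-1.1']
num_threads = range(1, 5)   # OPENBLAS_NUM_THREADS = 1..4
batch_sizes = range(1, 5)   # CK_CAFFE_BATCH_SIZE = 1..4

# One tuple (model, num_threads, batch_size) per experimental configuration.
configurations = list(product(model_tags, num_threads, batch_sizes))
print(len(configurations))  # 3 models x 4 thread counts x 4 batch sizes
```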

<a id="code"></a>
## Data wrangling code

**NB:** Please ignore this section if you are not interested in re-running or modifying this notebook.

### Includes

#### Standard

In [None]:
import os
import sys
import json
import re

#### Scientific

If some of the scientific packages are missing, please install them using:
```
# pip install jupyter pandas numpy matplotlib
```

In [None]:
import IPython as ip
import pandas as pd
import numpy as np
import matplotlib as mp

In [None]:
print ('IPython version: %s' % ip.__version__)
print ('Pandas version: %s' % pd.__version__)
print ('NumPy version: %s' % np.__version__)
print ('Matplotlib version: %s' % mp.__version__)

In [None]:
from IPython.display import Image
from IPython.core.display import HTML

from IPython.display import display
def display_in_full(df):
    pd.options.display.max_columns = len(df.columns)
    pd.options.display.max_rows = len(df.index)
    display(df)
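`display_in_full()` changes the global pandas display options, so every later cell also prints in full. A possible variant widens the limits only while one DataFrame is shown, using `pd.option_context`. Its name, `display_in_full_scoped`, is hypothetical. Its `show` parameter defaults to `print` so that the sketch runs outside Jupyter; in a notebook one would pass `IPython.display.display` instead.

```python
import pandas as pd

def display_in_full_scoped(df, show=print):
    """Show every row and column of df without changing global pandas options.

    Hypothetical helper; `show` is print by default, or IPython.display.display
    in a notebook (both render the DataFrame immediately, inside the context).
    """
    with pd.option_context('display.max_columns', len(df.columns),
                           'display.max_rows', len(df.index)):
        show(df)
    # On leaving the `with` block, the previous option values are restored.
```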

In [None]:
import matplotlib.pyplot as plt; plt.style.use('classic')
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
%matplotlib inline

In [None]:
default_title = 'Caffe with OpenBLAS on Samsung Chromebook 2'
default_ylabel = 'Execution time (ms)'
default_colormap = cm.autumn
default_fontsize = 16
default_figsize = [16, 16]
default_dpi = 200

In [None]:
if mp.__version__[0]=='2': mp.style.use('classic')
mp.rcParams['figure.figsize'] = default_figsize
mp.rcParams['figure.dpi'] = default_dpi
mp.rcParams['font.size'] = default_fontsize
mp.rcParams['legend.fontsize'] = 'medium'

#### Collective Knowledge

If CK is not installed, please install it using:
```
# pip install ck
```

In [None]:
import ck.kernel as ck
print ('CK version: %s' % ck.__version__)

### Access the experimental data

In [None]:
def get_experimental_results(repo_uoa, tags, time_ms = 'time_fw_ms'):
    module_uoa = 'experiment'
    r = ck.access({'action':'search', 'repo_uoa':repo_uoa, 'module_uoa':module_uoa, 'tags':tags})
    if r['return']>0:
        raise RuntimeError("CK error: %s" % r['error'])
    experiments = r['lst']
    
    dfs = []
    for experiment in experiments:
        data_uoa = experiment['data_uoa']
        r = ck.access({'action':'list_points', 'repo_uoa':repo_uoa, 'module_uoa':module_uoa, 'data_uoa':data_uoa})
        if r['return']>0:
            raise RuntimeError("CK error: %s" % r['error'])

        # Get (lib_tag, model_tag) from a list of tags that should be available in r['dict']['tags'].
        # Tags include 2 of the 3 irrelevant tags, a model tag and a lib tag.
        # NB: Since it's easier to list all model tags than all lib tags, the latter list is not explicitly specified.
        tags = r['dict']['tags']
        irrelevant_tags = [ 'explore-batch-size-openblas-threads', 'caffe-time', 'samsung-chromebook2' ]
        model_tags = [ 'bvlc-alexnet', 'bvlc-googlenet', 'deepscale-squeezenet-1.1' ]
        lib_model_tags = [ tag for tag in tags if tag not in irrelevant_tags ]
        model_tags = [ tag for tag in lib_model_tags if tag in model_tags ]
        lib_tags = [ tag for tag in lib_model_tags if tag not in model_tags ]
        if len(lib_tags)==1 and len(model_tags)==1:
            (lib, model) = (lib_tags[0], model_tags[0])
        else:
            continue
        
        for point in r['points']:
            with open(os.path.join(r['path'], 'ckp-%s.0001.json' % point)) as point_file:
                point_data_raw = json.load(point_file)
            characteristics_list = point_data_raw['characteristics_list']
            num_repetitions = len(characteristics_list)            
            # Obtain column data.
            data = [
                {
                    # features
                    'platform' : point_data_raw['features']['platform']['platform']['model'],
                    # choices
                    'lib' : lib,
                    'model' : model,
                    'batch_size' : np.int64(point_data_raw['choices']['env'].get('CK_CAFFE_BATCH_SIZE',-1)),
                    'num_threads' : np.int64(point_data_raw['choices']['env'].get('OPENBLAS_NUM_THREADS',-1)),
                    # statistical repetition
                    'repetition_id': repetition_id,
                    # runtime characteristics
                    'time (ms)'   : characteristics['run'].get(time_ms,+1e9), # "positive infinity"
                    'per layer info' : characteristics['run'].get('per_layer_info',[]),
                    'success?'    : characteristics['run'].get('run_success','n/a')
                }
                for (repetition_id, characteristics) in zip(range(num_repetitions), characteristics_list) 
            ]
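            # Deal with missing column data (resulting from failed runs):
            # replicate the single record once per statistical repetition.
            if len(data)==1:
                repetitions = point_data_raw['features'].get('statistical_repetitions',1)
                data = [ dict(data[0], repetition_id=repetition_id) for repetition_id in range(repetitions) ]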
            # Construct a DataFrame.
            df = pd.DataFrame(data)
            # Set columns and index names.
            df.columns.name = 'run characteristic'
            df.index.name = 'index'
            df = df.set_index([ 'platform', 'lib', 'model', 'num_threads', 'batch_size', 'repetition_id' ])
            # Append to the list of similarly constructed DataFrames.
            dfs.append(df)
    # Concatenate all constructed DataFrames (i.e. stack on top of each other).
    result = pd.concat(dfs)
    return result.sort_index()
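The tag filtering inside `get_experimental_results()` can be isolated as a pure function and checked without a CK repository. Below is a minimal sketch. The helper name `split_lib_model_tags` is hypothetical, and so is the lib tag in the usage example. Only the irrelevant and model tag lists come from the code above.

```python
# Tag lists copied from get_experimental_results().
IRRELEVANT_TAGS = ['explore-batch-size-openblas-threads', 'caffe-time', 'samsung-chromebook2']
MODEL_TAGS = ['bvlc-alexnet', 'bvlc-googlenet', 'deepscale-squeezenet-1.1']

def split_lib_model_tags(tags):
    """Return (lib, model) if tags contain exactly one of each, else None.

    Hypothetical helper mirroring the notebook's logic: any tag that is
    neither irrelevant nor a known model tag is treated as a lib tag.
    """
    relevant = [tag for tag in tags if tag not in IRRELEVANT_TAGS]
    models = [tag for tag in relevant if tag in MODEL_TAGS]
    libs = [tag for tag in relevant if tag not in MODEL_TAGS]
    if len(libs) == 1 and len(models) == 1:
        return (libs[0], models[0])
    return None  # the notebook skips such experiments with `continue`

# Usage with a made-up lib tag:
print(split_lib_model_tags(['caffe-time', 'lib-caffe-openblas', 'bvlc-alexnet']))
```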

### Plot images per second against the batch size and the number of threads

In [None]:
def plot_trisurf(df_model, x_col, y_col, z_col, x_label=None, y_label=None, z_label=None, title=None):
    x = df_model[x_col]
    y = df_model[y_col]
    z = df_model[z_col]
    
    if x_label is None: x_label = x_col
    if y_label is None: y_label = y_col
    if z_label is None: z_label = z_col
        
    x_ticks = x.unique()
    y_ticks = y.unique()
    
    fig = plt.figure(figsize=(24, 12), dpi=600)
    ax = fig.add_subplot(111, projection='3d')
    trisurf = ax.plot_trisurf(x, y, z, cmap=cm.autumn_r, linewidth=0.2, antialiased=True, shade=True)
    ax.set_xlabel(x_label); ax.set_xticks(x_ticks); ax.set_xlim3d(x_ticks.max(), x_ticks.min())
    ax.set_ylabel(y_label); ax.set_yticks(y_ticks); ax.set_ylim3d(y_ticks.min(), y_ticks.max())
    ax.set_zlabel(z_label); ax.set_zlim3d(z.min(), z.max())
    ax.set_title(title, fontsize=20)
    fig.colorbar(trisurf, shrink=0.5, aspect=10)
    return fig

<a id="data"></a>
## Get the experimental data

**NB:** Please ignore this section if you are not interested in re-running or modifying this notebook. 

The Caffe experimental data was collected on the experimental platform (after installing all Caffe libraries and models of interest) as follows:
```
$ cd `ck find ck-caffe:script:explore-batch-size-openblas-threads`
$ python explore-batch-size-openblas-threads-benchmarking.py
```
The data can be downloaded from GitHub via CK as follows:
```
$ ck pull repo:ck-caffe-samsung-chromebook2 --url=https://github.com/dividiti/ck-caffe-samsung-chromebook2
```

<a id="tables"></a>
## Tables

<a id="df_all"></a>
### All data

In [None]:
df_all = get_experimental_results(
    repo_uoa='ck-caffe-samsung-chromebook2',
    tags='explore-batch-size-openblas-threads') \
    .reset_index(['platform', 'lib'], drop=True)
display_in_full(df_all)

<a id="df_time"></a>
### All execution time data

In [None]:
df_time = df_all['time (ms)'].unstack(df_all.index.names[:-1])
display_in_full(df_time)

<a id="df_mean_time_per_batch"></a>
### Mean execution time per batch

In [None]:
df_mean_time_per_batch = df_time.describe().loc['mean'].unstack(level='batch_size')
display_in_full(df_mean_time_per_batch)
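To see what the reshaping in the last two cells does, here is a minimal sketch on synthetic timings. The model name `toy-model` and all numbers are invented for illustration. Unstacking every level except `repetition_id` turns repetitions into rows and configurations into columns. `describe().loc['mean']` then averages over repetitions, and `unstack(level='batch_size')` pivots batch sizes into columns.

```python
import pandas as pd

# Synthetic timings (ms): one made-up model, 2 thread counts, 2 batch sizes,
# 2 repetitions each. The numbers are invented for illustration only.
index = pd.MultiIndex.from_product(
    [['toy-model'], [1, 2], [1, 2], [0, 1]],
    names=['model', 'num_threads', 'batch_size', 'repetition_id'])
time_ms = pd.Series([100., 110., 180., 200., 60., 70., 100., 120.], index=index)

# Repetitions become rows; (model, num_threads, batch_size) become columns.
df_time_toy = time_ms.unstack(['model', 'num_threads', 'batch_size'])
# Average over repetitions, then pivot batch_size into columns.
df_mean_toy = df_time_toy.describe().loc['mean'].unstack(level='batch_size')
print(df_mean_toy)
```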

<a id="df_mean_time_per_image"></a>
### Mean execution time per image

In [None]:
batch_sizes = df_mean_time_per_batch.columns.tolist()
df_mean_time_per_image = df_mean_time_per_batch / batch_sizes
display_in_full(df_mean_time_per_image)
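Dividing a DataFrame by a plain list matches the list elements to the columns, so each batch-size column is divided by its own batch size. A minimal sketch on invented mean batch times:

```python
import pandas as pd

# Invented mean batch times (ms): rows are thread counts, columns batch sizes.
mean_time_per_batch = pd.DataFrame(
    {1: [105.0, 65.0], 2: [190.0, 110.0]},
    index=pd.Index([1, 2], name='num_threads'))
mean_time_per_batch.columns.name = 'batch_size'

# The list is aligned with the columns: column k is divided by k.
batch_sizes = mean_time_per_batch.columns.tolist()
mean_time_per_image = mean_time_per_batch / batch_sizes
print(mean_time_per_image)
```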

<a id="df_mean_images_per_second"></a>
### Mean number of images per second

In [None]:
df_mean_seconds_per_image = 1e-3 * df_mean_time_per_image
df_mean_images_per_second = 1 / df_mean_seconds_per_image
display_in_full(df_mean_images_per_second)
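The conversion above is equivalent to `1000 / (time per image in ms)`. For a single configuration, the whole chain from mean batch time to throughput can be written as a small function (the name `images_per_second` is hypothetical):

```python
def images_per_second(time_per_batch_ms, batch_size):
    """Throughput from the mean time (ms) taken to process one batch."""
    time_per_image_s = 1e-3 * time_per_batch_ms / batch_size  # ms -> s, per image
    return 1.0 / time_per_image_s

# Invented example: a batch of 4 images processed in 800 ms,
# i.e. 200 ms per image, or about 5 images per second.
print(images_per_second(800.0, 4))
```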

<a id="graphs"></a>
## Graphs

In [None]:
models = df_mean_images_per_second.index.get_level_values('model').unique()
for model in models:
    df_model = df_mean_images_per_second \
        .loc[model] \
        .unstack() \
        .reset_index() \
        .rename(columns={0 : 'images per second'}) \
        .dropna() \
        .sort_values(by='batch_size', ascending=False)
    fig = plot_trisurf(df_model, title=model,
                 x_col='num_threads', x_label='Number of threads',
                 y_col='batch_size', y_label='Batch size',
                 z_col='images per second', z_label='Images per second')
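`plot_trisurf()` expects one row per (number of threads, batch size) point, but `df_mean_images_per_second` is a wide table. Below is a minimal sketch of the reshaping on an invented table with the same layout. The model name `toy-model` and the throughput values are made up.

```python
import pandas as pd

# Invented throughput table shaped like df_mean_images_per_second:
# rows (model, num_threads), columns batch_size.
index = pd.MultiIndex.from_product([['toy-model'], [1, 2]], names=['model', 'num_threads'])
wide = pd.DataFrame([[9.5, 10.5], [15.4, 18.2]], index=index,
                    columns=pd.Index([1, 2], name='batch_size'))

long_form = (wide.loc['toy-model']   # rows: num_threads, columns: batch_size
             .unstack()              # Series indexed by (batch_size, num_threads)
             .reset_index()          # index levels become columns; values land in column 0
             .rename(columns={0: 'images per second'})
             .dropna())              # drop configurations without data
print(long_form)
```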