# [cknowledge.org/ai](http://cknowledge.org/ai): Crowdsourcing benchmarking and optimization of AI

* [Reproducible Quality-Efficient Systems Tournaments](http://cknowledge.org/request) ([ReQuEST initiative](http://cknowledge.org/request.html#organizers))
* [AI artifacts](http://cknowledge.org/ai-artifacts) (cTuning foundation)
* [Android app](https://play.google.com/store/apps/details?id=openscience.crowdsource.video.experiments) (dividiti)
* [Desktop app](https://github.com/dividiti/ck-crowdsource-dnn-optimization) (dividiti)
* [CK-Caffe](https://github.com/dividiti/ck-caffe) (Berkeley)
* [CK-Caffe2](https://github.com/ctuning/ck-caffe2) (Facebook)
* [CK-CNTK](https://github.com/ctuning/ck-cntk) (Microsoft)
* [CK-KaNN](https://github.com/dividiti/ck-kann) (Kalray)
* [CK-MVNC](https://github.com/ctuning/ck-mvnc) (Movidius / Intel)
* [CK-MXNet](https://github.com/ctuning/ck-mxnet) (Apache)
* [CK-NNTest](https://github.com/ctuning/ck-nntest) (cTuning foundation)
* [CK-TensorFlow](https://github.com/ctuning/ck-tensorflow) (Google)
* [CK-TensorRT](https://github.com/dividiti/ck-tensorrt) (NVIDIA)
* etc.

# [dividiti](http://dividiti.com)'s submission to [ReQuEST @ ASPLOS'18](http://cknowledge.org/request-cfp-asplos2018.html)

## Table of Contents

1. [Overview](#overview)
1. [Platforms](#platforms)
  1. [HiKey 960](#platforms_hikey)
  1. [Firefly RK3399](#platforms_firefly)
1. [Experimental data](#data) [for developers]
1. [Data wrangling code](#code) [for developers]
1. [Experiments on HiKey 960](#experiments_hikey)

<a id="overview"></a>
## Overview

This Jupyter Notebook studies performance (execution time) vs. accuracy (top-1 / top-5) using the [Arm Compute Library](https://github.com/ARM-software/ComputeLibrary) on two development platforms:
- [HiKey 960](https://www.96boards.org/product/hikey960/);
- [Firefly RK3399](http://en.t-firefly.com/index.php/product/rk3399.html).

<a id="platforms"></a>
## Platforms

<a id="platforms_hikey"></a>
### HiKey 960

  - Chip:
    - [HiSilicon Kirin 960](http://www.hisilicon.com/en/Solutions/Kirin)
  - CPU ("performance" / "big"):
    - ARM&reg; Cortex&reg;-A73;
    - Max clock 2362 MHz;
    - 4 cores;
  - CPU ("efficiency" / "LITTLE"):
    - ARM&reg; Cortex&reg;-A53;
    - Max clock 1844 MHz;
    - 4 cores;
  - GPU:
    - ARM&reg; Mali&trade; G71 architecture;
    - Max clock 1037 MHz;
    - 8 cores;
    - OpenCL driver (`hikey962`: `instr=1,clexperimental=1,softjobpatch`):
```
$ ck run program:tool-print-opencl-devices | grep "version:"
OpenCL 2.0 v1.r6p0-01rel0.24c5f5e966f2b7f1f19b91d6f32ff53e
```

  - RAM:
    - LPDDR4 SDRAM;
    - 3 GB;

  - BSP:
    - Debian Stretch (9) Linux
```
$ uname -a
Linux hikey962 4.4.74-00216-g10816f6 #3 SMP PREEMPT Thu Jul 6 14:38:42 BST 2017 aarch64 GNU/Linux
```

In [None]:
hikey_model = 'HiKey960\x00'
hikey_name  = 'HiKey 960'
hikey_id    = 'hikey-960'

<a id="platforms_firefly"></a>
### Firefly RK3399

  - Chip:
    - [Rockchip RK3399](http://rockchip.wikidot.com/rk3399)
  - CPU ("big"):
    - ARM&reg; Cortex&reg;-A72 architecture
    - Max clock 1800 MHz;
    - 2 cores;
  - CPU ("LITTLE"):
    - ARM&reg; Cortex&reg;-A53 architecture;
    - Max clock 1416 MHz;
    - 4 cores;
  - GPU:
    - ARM&reg; Mali&trade;-T860 architecture;
    - Max clock 800 MHz;
    - 4 cores;
    - OpenCL driver:
```
$ ck run program:tool-print-opencl-devices | grep "version:"
v1.r13p0-00rel0-git(a4271c9).31ba04af2d3c01618138bef3aed66c2c
```

  - RAM:
    - Samsung dual-channel DDR3;
    - 4 GB (8 GB swap);
  - BSP:
    - [Firefly-rk3399_xubuntu1604_201711301130.7z](https://drive.google.com/drive/u/0/folders/1lbaR7XVyHT4SnXkJ2ybj5YXAzAjDBWfT)
```
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.4 LTS"
$ uname -a
Linux firefly 4.4.77 #554 SMP Thu Nov 30 11:30:11 HKT 2017 aarch64 aarch64 aarch64 GNU/Linux
```

In [None]:
firefly_model = 'Rockchip RK3399 Firefly Board (Linux Opensource)\x00'
firefly_name  = 'Firefly RK3399'
firefly_id    = 'firefly'

### Platform mappings

In [None]:
model_to_id = {
    firefly_model : firefly_id,
    hikey_model   : hikey_id
}
id_to_name = {
    firefly_id : firefly_name,
    hikey_id   : hikey_name
}
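
The model strings above end with a NUL character (`\x00`), so a dictionary lookup only succeeds with exactly that string. As a minimal sketch, a lookup that tolerates a missing terminator might look like the following. The helper `platform_name_for_model` and the `example_*` dictionaries are hypothetical and only mirror the mappings defined above:

```python
# Hypothetical helper, not part of the original notebook.
# These dictionaries mirror model_to_id and id_to_name defined above.
example_model_to_id = {
    'Rockchip RK3399 Firefly Board (Linux Opensource)\x00': 'firefly',
    'HiKey960\x00': 'hikey-960',
}
example_id_to_name = {
    'firefly': 'Firefly RK3399',
    'hikey-960': 'HiKey 960',
}

def platform_name_for_model(model):
    """Map a raw model string to a display name, ignoring trailing NULs."""
    # Normalize to exactly one trailing NUL, as used by the dictionary keys.
    normalized = model.rstrip('\x00') + '\x00'
    return example_id_to_name[example_model_to_id[normalized]]

print(platform_name_for_model('HiKey960'))
print(platform_name_for_model('HiKey960\x00'))
```

Both calls resolve to the same display name, whether or not the input carries the terminator.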

<a id="data"></a>
## Get the experimental data

The experimental data can be downloaded and registered with CK as described below.

### Accuracy experiments on 50,000 images

```
$ wget https://www.dropbox.com/s/<...>/ck-request-asplos18-mobilenets-armcl-opencl-accuracy-50000.zip
$ ck add repo --zip=ck-request-asplos18-mobilenets-armcl-opencl-accuracy-50000.zip
```

In [None]:
accuracy50000_repo_uoa = 'ck-request-asplos18-mobilenets-armcl-opencl-accuracy-50000'
!ck list $accuracy50000_repo_uoa:experiment:* | sort

### Accuracy experiments on 500 images

```
$ wget https://www.dropbox.com/s/<...>/ck-request-asplos18-mobilenets-armcl-opencl-accuracy-500.zip
$ ck add repo --zip=ck-request-asplos18-mobilenets-armcl-opencl-accuracy-500.zip
```

In [None]:
accuracy500_repo_uoa = 'ck-request-asplos18-mobilenets-armcl-opencl-accuracy-500'
!ck list $accuracy500_repo_uoa:experiment:* | sort

### Performance (latency) experiments

```
$ wget https://www.dropbox.com/s/<...>/ck-request-asplos18-mobilenets-armcl-opencl-performance.zip
$ ck add repo --zip=ck-request-asplos18-mobilenets-armcl-opencl-performance.zip
```

In [None]:
performance_repo_uoa = 'ck-request-asplos18-mobilenets-armcl-opencl-performance'
!ck list $performance_repo_uoa:experiment:* | sort

<a id="code"></a>
## Data wrangling code

**NB:** Please ignore this section if you are not interested in re-running or modifying this notebook.

### Includes

#### Standard

In [None]:
import os
import sys
import json
import re

#### Scientific

If some of the scientific packages are missing, please install them using:
```
# pip install jupyter pandas numpy matplotlib seaborn
```

In [None]:
import IPython as ip
import pandas as pd
import numpy as np
import matplotlib as mp
import seaborn as sb

In [None]:
print ('IPython version: %s' % ip.__version__)
print ('Pandas version: %s' % pd.__version__)
print ('NumPy version: %s' % np.__version__)
print ('Matplotlib version: %s' % mp.__version__)
print ('Seaborn version: %s' % sb.__version__)

In [None]:
from IPython.display import Image, display
def display_in_full(df):
    pd.options.display.max_columns = len(df.columns)
    pd.options.display.max_rows = len(df.index)
    display(df)

In [None]:
import matplotlib.pyplot as plt
from matplotlib import cm
%matplotlib inline

In [None]:
default_colormap = cm.autumn
default_fontsize = 16
default_barwidth = 0.8
default_figwidth = 24
default_figheight = 3
default_figdpi = 200
default_figsize = [default_figwidth, default_figheight]

In [None]:
if mp.__version__[0]=='2': mp.style.use('classic')
mp.rcParams['figure.max_open_warning'] = 200
mp.rcParams['figure.dpi'] = default_figdpi
mp.rcParams['font.size'] = default_fontsize
mp.rcParams['legend.fontsize'] = 'medium'

#### Collective Knowledge

If CK is not installed, please install it using:
```
# pip install ck
```

In [None]:
import ck.kernel as ck
print ('CK version: %s' % ck.__version__)

### Access experimental data

In [None]:
def get_experimental_results(repo_uoa, tags='explore-mobilenets-performance', accuracy=False,
                             module_uoa='experiment', _library=None, _platform=None):
    r = ck.access({'action':'search', 'repo_uoa':repo_uoa, 'module_uoa':module_uoa, 'tags':tags})
    if r['return']>0:
        raise RuntimeError('CK error: %s' % r['error'])
    experiments = r['lst']

    dfs = []
    for experiment in experiments:
        data_uoa = experiment['data_uoa']
        r = ck.access({'action':'list_points', 'repo_uoa':repo_uoa, 'module_uoa':module_uoa, 'data_uoa':data_uoa})
        if r['return']>0:
            raise RuntimeError('CK error: %s' % r['error'])
        # Library.
        library_tags = [ 
            tag for tag in r['dict']['tags']
            if tag in [ '17.12-48bc34ea', '18.01-f45d5a9b', '18.03-e40997bb', 'request-d8f69c13' ]
        ]
        if len(library_tags)==1:
            tag_to_library = {
                # Library tags on HiKey.
                '17.12-48bc34ea'        : '17.12',
                '18.01-f45d5a9b'        : '18.01',
                '18.03-e40997bb'        : '18.03',
                'request-d8f69c13'      : 'dv/dt', # 18.03+
                # TODO: Library tags on Firefly.
                '17.12-48bc34e'         : '17.12',
            }
            library = tag_to_library[library_tags[0]]
        else:
            print('[Warning] Bad library tags. Skipping experiment with tags:')
            print(r['dict']['tags'])
            continue
        if _library and _library!=library: continue
        # For each point.    
        for point in r['points']:
            point_file_path = os.path.join(r['path'], 'ckp-%s.0001.json' % point)
            with open(point_file_path) as point_file:
                point_data_raw = json.load(point_file)
            characteristics_list = point_data_raw['characteristics_list']
            num_repetitions = len(characteristics_list)
            platform = model_to_id[point_data_raw['features']['platform']['platform']['model']]
            if _platform and _platform!=platform: continue
            batch_size = np.int64(point_data_raw['choices']['env'].get('CK_BATCH_SIZE',-1))
            batch_count = np.int64(point_data_raw['choices']['env'].get('CK_BATCH_COUNT',-1))
            convolution_method = np.int64(point_data_raw['choices']['env'].get('CK_CONVOLUTION_METHOD_HINT',1))
            multiplier = np.float64(point_data_raw['choices']['env'].get('CK_ENV_MOBILENET_WIDTH_MULTIPLIER',-1))
            resolution = np.int64(point_data_raw['choices']['env'].get('CK_ENV_MOBILENET_RESOLUTION',-1))
            model = 'v1-%.2f-%d' % (multiplier, resolution)
            if accuracy:
                data = [
                    {
                        # features
                        'platform': platform,
                        'library': library,
                        # choices
                        'model': model,
                        'batch_size': batch_size,
                        'batch_count': batch_count,
                        'convolution_method': convolution_method,
                        'resolution': resolution,
                        'multiplier': multiplier,
                        # statistical repetition
                        'repetition_id': repetition_id,
                        # runtime characteristics
                        'success?': characteristics['run'].get('run_success', 'n/a'),
                        'accuracy_top1': characteristics['run'].get('accuracy_top1', 0),
                        'accuracy_top5': characteristics['run'].get('accuracy_top5', 0),
                        # recompute accuracy from frame_predictions (was incorrectly recorded in early experiments)
                        'accuracy_top1_': len([
                            prediction for prediction in characteristics['run'].get('frame_predictions', [])
                            if prediction['accuracy_top1']=='yes'
                        ]) / np.float64(batch_count),
                        'accuracy_top5_': len([
                            prediction for prediction in characteristics['run'].get('frame_predictions', [])
                            if prediction['accuracy_top5']=='yes'
                        ]) / np.float64(batch_count)
                    }
                    for (repetition_id, characteristics) in zip(range(num_repetitions), characteristics_list)
                ]
            else: # performance
                data = [
                    {
                        # features
                        'platform': platform,
                        'library': library,
                        # choices
                        'model': model,
                        'batch_size': batch_size,
                        'batch_count': batch_count,
                        'convolution_method': convolution_method,
                        'resolution': resolution,
                        'multiplier': multiplier,
                        # statistical repetition
                        'repetition_id': repetition_id,
                        # runtime characteristics
                        'success?': characteristics['run'].get('run_success', 'n/a'),
                        'time_avg_ms': characteristics['run']['prediction_time_avg_s']*1e+3,
                        'time_total_ms': characteristics['run']['prediction_time_total_s']*1e+3,
                    }
                    for (repetition_id, characteristics) in zip(range(num_repetitions), characteristics_list)
                ]
            index = [
                'platform', 'library', 'model', 'multiplier', 'resolution', 'convolution_method', 'batch_size', 'repetition_id'
            ]
            # Construct a DataFrame.
            df = pd.DataFrame(data)
            df = df.set_index(index)
            # Append to the list of similarly constructed DataFrames.
            dfs.append(df)
    if dfs:
        # Concatenate all thus constructed DataFrames (i.e. stack on top of each other).
        result = pd.concat(dfs)
        result.sort_index(ascending=True, inplace=True)
    else:
        # Construct an empty DataFrame whose 'success?' column can still be checked safely.
        result = pd.DataFrame(columns=['success?'])
    return result
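
The assembly pattern used by `get_experimental_results` (one small DataFrame per experimental point, indexed and then concatenated) can be sketched on synthetic records. The index is shortened to four levels for brevity, and all values below are made up for illustration rather than measured:

```python
import pandas as pd

index = ['platform', 'library', 'model', 'repetition_id']

# Two synthetic "points", each with two repetitions (values are made up).
points = [
    [
        {'platform': 'hikey-960', 'library': '18.03', 'model': 'v1-1.00-224',
         'repetition_id': rep, 'time_avg_ms': t}
        for rep, t in enumerate([50.0, 52.0])
    ],
    [
        {'platform': 'hikey-960', 'library': '17.12', 'model': 'v1-1.00-224',
         'repetition_id': rep, 'time_avg_ms': t}
        for rep, t in enumerate([60.0, 61.0])
    ],
]

# One DataFrame per point, indexed by the feature/choice columns.
dfs = [pd.DataFrame(data).set_index(index) for data in points]

# Stack the per-point DataFrames and sort by the multi-level index.
result = pd.concat(dfs).sort_index()
print(result)
```

Sorting the concatenated frame groups the rows for each library together, which makes later `.loc` lookups by index tuple straightforward.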

### Merge performance and accuracy data

In [None]:
# Return a new DataFrame with only the performance and accuracy metrics.
def merge_performance_accuracy(df_performance, df_accuracy, reference_lib=None,
                               performance_metric = 'time_avg_ms', accuracy_metric = 'accuracy_top1'):
    df = df_performance[[performance_metric]]
    accuracy_list = []
    for index, row in df.iterrows():
        (platform, lib, model, multiplier, resolution, convolution_method, batch_size) = index
        if reference_lib: lib = reference_lib
        # The accuracy lookup always uses convolution method 1 (the default hint).
        accuracy = df_accuracy.loc[(platform, lib, model, multiplier, resolution, 1, batch_size)][accuracy_metric]
        accuracy_list.append(accuracy)
    # Store the accuracy under the name of the requested metric.
    df = df.assign(**{accuracy_metric: accuracy_list})
    return df
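
The effect of `reference_lib` can be illustrated on a toy example: every performance row, whatever its own library and convolution method, takes its accuracy from the reference library's row for convolution method 1. The following is a sketch with a shortened three-level index and made-up values, not the notebook's actual data:

```python
import pandas as pd

names = ['library', 'model', 'convolution_method']

# Synthetic latencies for two libraries and two convolution methods.
df_perf = pd.DataFrame(
    {'time_avg_ms': [60.0, 55.0, 50.0, 45.0]},
    index=pd.MultiIndex.from_tuples([
        ('18.03', 'v1-1.00-224', 0), ('18.03', 'v1-1.00-224', 1),
        ('dv/dt', 'v1-1.00-224', 0), ('dv/dt', 'v1-1.00-224', 1),
    ], names=names))

# Synthetic accuracy, available only for the reference library and method 1.
df_acc = pd.DataFrame(
    {'accuracy_top1': [0.7]},
    index=pd.MultiIndex.from_tuples([('dv/dt', 'v1-1.00-224', 1)], names=names))

reference_lib = 'dv/dt'
# Look up the accuracy under the reference library for every performance row.
accuracy_list = [
    df_acc.loc[(reference_lib, model, 1), 'accuracy_top1']
    for (_lib, model, _method) in df_perf.index
]
merged = df_perf.assign(accuracy_top1=accuracy_list)
print(merged)
```

All four rows end up with the same accuracy, so the plotted points for a given model differ only in latency.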

### Plot experimental data

In [None]:
# Plot accuracy against performance, marking Pareto-optimal points with black pluses.
def plot(df, performance_metric='time_avg_ms', accuracy_metric='accuracy_top1',
         save_fig_name='mobilenets-armcl-opencl.png', fix=True):
    fig = plt.figure(figsize=(8,4), dpi=200)
    lib_to_color = { '17.12': 'cyan', '18.01' : 'magenta', '18.03' : 'orange', 'dv/dt' : 'green' }
    multiplier_to_marker_0 = { 1.00 : '*', 0.75 : 'D', 0.50: 'v', 0.25 : '8' } # gemm
    multiplier_to_marker_1 = { 1.00 : 'p', 0.75 : 's', 0.50: '^', 0.25 : 'o' } # direct

    ax = fig.gca()
    for index, row in df.iterrows():
        (lib, model, multiplier, resolution, convolution_method, batch_size) = index
        performance = row[performance_metric]
        accuracy = row[accuracy_metric]
        
        # Mark Pareto-optimal points.
        is_on_pareto = True
        for index1, row1 in df.iterrows():
            is_faster = row1[performance_metric] < row[performance_metric]
            is_no_less_accurate = row1[accuracy_metric] >= row[accuracy_metric]
            if is_faster and is_no_less_accurate:
                is_on_pareto = False
                break

        # GEMM-based convolution should be exactly the same in '18.03' and 'dv/dt', so plot
        # the minimum execution time of '18.03' and 'dv/dt' as '18.03'.
        if fix and convolution_method==0 and (lib=='dv/dt' or lib=='18.03'):
            performance_dv_dt = df.loc[('dv/dt', model, multiplier, resolution, convolution_method, batch_size)][performance_metric]
            performance_18_03 = df.loc[('18.03', model, multiplier, resolution, convolution_method, batch_size)][performance_metric]
            if lib=='18.03':
                if (performance_dv_dt < performance_18_03):
                    continue
            if lib=='dv/dt':
                if (performance_dv_dt < performance_18_03):
                    lib = '18.03' # change color
                else:
                    continue

        if convolution_method==0:
            marker = multiplier_to_marker_0[multiplier]
        else:
            marker = multiplier_to_marker_1[multiplier]
        color = lib_to_color[lib]
        size = resolution / 16

        ax.plot(performance, accuracy, marker, markerfacecolor=color, markersize=size)

        # Mark Pareto-optimal points with scaled black pluses.
        if is_on_pareto:
            ax.plot(performance, accuracy, 'k+', markersize=size)

    # Title.
    ax.set_title('%s: Mali-G71 @ 807 MHz, FP32' % (id_to_name[hikey_id]))
    # X axis.
    xlabel='Image recognition time (ms)' if performance_metric=='time_avg_ms' else ''
    ax.set_xlabel(xlabel)
    ax.set_xlim(5.0, 65.1)
    ax.set_xticks(np.arange(5.0, 65.1, 5.0))
    ax.tick_params(axis='x', labelsize=12)
    # Y axis.
    ylabel='ImageNet validation accuracy (top %s)' % accuracy_metric[-1]
    ax.set_ylabel(ylabel)
    ax.set_ylim(0.4, 0.751)
    ax.set_yticks(np.arange(0.4, 0.751, 0.05))
    ax.tick_params(axis='y', labelsize=12)
    # Legend.
    handles = [ mp.patches.Patch(color=color, label=lib) for (lib, color) in lib_to_color.items() ]
    plt.legend(title='Arm CL', handles=handles[::-1], loc='lower right')

    plt.grid()
    plt.savefig(save_fig_name, dpi=default_figdpi, bbox_inches='tight')
    plt.show()
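
The Pareto test inside `plot` treats a point as dominated when some other point is strictly faster and at least as accurate. The same rule can be sketched as a standalone function. The name `pareto_optimal` is hypothetical, and the data points are made up for illustration:

```python
def pareto_optimal(points):
    """Return the points that no other point dominates.

    Each point is a (time, accuracy) pair. A point is dominated if another
    point is strictly faster and at least as accurate, which mirrors the
    check in plot() above.
    """
    front = []
    for (t, a) in points:
        dominated = any(t1 < t and a1 >= a for (t1, a1) in points)
        if not dominated:
            front.append((t, a))
    return front

# Synthetic (time in ms, top-1 accuracy) pairs, made up for illustration.
points = [(10.0, 0.50), (20.0, 0.60), (25.0, 0.55), (40.0, 0.70)]
print(pareto_optimal(points))  # [(10.0, 0.5), (20.0, 0.6), (40.0, 0.7)]
```

Here `(25.0, 0.55)` is dropped because `(20.0, 0.60)` is both faster and more accurate. Like the loop in `plot`, this sketch compares every pair of points, which is adequate for the small number of configurations explored here.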

<a id="experiments_hikey"></a>
## Experiments on HiKey 960

### Performance (latency)

In [None]:
df_performance = get_experimental_results(repo_uoa=performance_repo_uoa, tags='explore-mobilenets-performance',
                                          accuracy=False)
# Take the minimum execution time out of several repetitions.
df_performance = df_performance.groupby(level=df_performance.index.names[:-1]).min()
# Display all rows and columns.
# display_in_full(df_performance)
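
Grouping by every index level except the last one collapses the `repetition_id` dimension, so `.min()` keeps the fastest repetition for each configuration. A small sketch on synthetic data (shortened index, made-up values) shows the effect:

```python
import pandas as pd

names = ['library', 'model', 'repetition_id']
df = pd.DataFrame(
    {'time_avg_ms': [52.0, 50.0, 51.0, 61.0, 60.0]},
    index=pd.MultiIndex.from_tuples([
        ('18.03', 'v1-1.00-224', 0), ('18.03', 'v1-1.00-224', 1),
        ('18.03', 'v1-1.00-224', 2),
        ('17.12', 'v1-1.00-224', 0), ('17.12', 'v1-1.00-224', 1),
    ], names=names))

# Group by every index level except the last one (repetition_id)
# and keep the minimum time within each group.
df_min = df.groupby(level=list(df.index.names[:-1])).min()
print(df_min)
```

The result has one row per (library, model) pair, and `repetition_id` no longer appears in the index.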

### Accuracy on 500 images

In [None]:
df_accuracy500 = get_experimental_results(repo_uoa=accuracy500_repo_uoa, tags='explore-mobilenets-accuracy',
                                          accuracy=True)
# Reduce the repetition_id index dimension.
df_accuracy500 = df_accuracy500 \
    .groupby(level=df_accuracy500.index.names[:-1]).min()
# Display all rows and columns.
display_in_full(df_accuracy500)

### Accuracy on 50,000 images

In [None]:
df_accuracy50000 = get_experimental_results(repo_uoa=accuracy50000_repo_uoa, tags='explore-mobilenets-accuracy',
                                            accuracy=True)
# Reduce the repetition_id index dimension.
df_accuracy50000 = df_accuracy50000 \
    .groupby(level=df_accuracy50000.index.names[:-1]).min()
# Display all rows and columns.
display_in_full(df_accuracy50000)
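
The columns with a trailing underscore (`accuracy_top1_`, `accuracy_top5_`) are recomputed in `get_experimental_results` by counting the per-frame predictions marked `'yes'` and dividing by the batch count. The sketch below reproduces that arithmetic on hypothetical records. The field names follow those read by the notebook, the values are made up, and it assumes one image per batch so that the batch count equals the number of frames:

```python
# Hypothetical per-frame records; the values are made up for illustration.
frame_predictions = [
    {'accuracy_top1': 'yes', 'accuracy_top5': 'yes'},
    {'accuracy_top1': 'no',  'accuracy_top5': 'yes'},
    {'accuracy_top1': 'no',  'accuracy_top5': 'no'},
    {'accuracy_top1': 'yes', 'accuracy_top5': 'yes'},
]
# Assumption: one image per batch, so the batch count equals the frame count.
batch_count = len(frame_predictions)

def fraction_correct(predictions, key, total):
    """Fraction of predictions whose `key` field is 'yes', out of `total`."""
    return sum(1 for p in predictions if p[key] == 'yes') / float(total)

print(fraction_correct(frame_predictions, 'accuracy_top1', batch_count))  # 0.5
print(fraction_correct(frame_predictions, 'accuracy_top5', batch_count))  # 0.75
```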

### Plot performance vs. Top 1 accuracy on 50,000 images (using the 'dv/dt' fork as reference lib)

In [None]:
accuracy_metric = 'accuracy_top1'

In [None]:
df_performance_accuracy50000 = merge_performance_accuracy(df_performance, df_accuracy50000, reference_lib='dv/dt')
display_in_full(df_performance_accuracy50000)

#### Only "18.03"

In [None]:
# plot(df_performance_accuracy50000.loc[hikey_id].loc[['18.03']], accuracy_metric=accuracy_metric,
#      save_fig_name='18_03.%s.50000.png' % accuracy_metric, fix=False)

#### "dv/dt" vs. "18.03"

In [None]:
# plot(df_performance_accuracy50000.loc[hikey_id].loc[['18.03','dv/dt']],
#      save_fig_name='dv_dt+18_03.%s.50000.png' % accuracy_metric)

#### "dv/dt" vs. "18.03" vs. "18.01" vs. "17.12"

In [None]:
plot(df_performance_accuracy50000.loc[hikey_id],
     save_fig_name='dv_dt+18_03+18_01+17_12.%s.50000.png' % (accuracy_metric+'_'))

### Plot performance vs. Top 1 accuracy on 500 images (using the 'dv/dt' fork as reference lib)

In [None]:
df_performance_accuracy500 = merge_performance_accuracy(df_performance, df_accuracy500, reference_lib='dv/dt')
display_in_full(df_performance_accuracy500)

#### "dv/dt" vs. "18.03" vs. "18.01" vs. "17.12"

In [None]:
plot(df_performance_accuracy500.loc[hikey_id],
     save_fig_name='dv_dt+18_03+18_01+17_12.%s.500.png' % (accuracy_metric+'_'))

### Plot performance vs. Top 1 accuracy on 500 images

In [None]:
df_performance_accuracy500 = merge_performance_accuracy(df_performance, df_accuracy500)
display_in_full(df_performance_accuracy500)

#### "dv/dt" vs. "18.03" vs. "18.01" vs. "17.12"

In [None]:
plot(df_performance_accuracy500.loc[hikey_id],
     save_fig_name='dv_dt+18_03+18_01+17_12.%s.500.png' % accuracy_metric)