# Exploring batch size effects on Chromebook 2

We explore the effects of varying the batch size on the **performance** of inference:

- of 4 different CNN **models** (net architecture + weights):
  - [AlexNet](https://github.com/BVLC/caffe/tree/master/models/bvlc_alexnet);
  - [SqueezeNet 1.0](https://github.com/DeepScale/SqueezeNet/tree/master/SqueezeNet_v1.0);
  - [SqueezeNet 1.1](https://github.com/DeepScale/SqueezeNet/tree/master/SqueezeNet_v1.1);
  - [GoogleNet](https://github.com/BVLC/caffe/tree/master/models/bvlc_googlenet);

- with 4 different **libraries**:
  - [CPU] [OpenBLAS](https://github.com/xianyi/OpenBLAS) 0.2.18;
  - [GPU] [clBLAS](https://github.com/clMathLibraries/clBLAS) 2.4;
  - [GPU] [CLBlast](https://github.com/CNugteren/CLBlast) dev (35623cd > 0.8.0);
  - [GPU] [CLBlast](https://github.com/CNugteren/CLBlast) dev (35623cd > 0.8.0) with Mali-optimized [overlay](https://github.com/intelfx/CLBlast/tree/mali-overlay) (641bb07);
  
- on the [Samsung Chromebook 2](http://www.samsung.com/us/computing/chromebooks/under-12/samsung-chromebook-2-11-6-xe503c12-k01us/) **platform**:
  - [CPU] quad-core ARM Cortex-A15 (@ 1900 MHz);
  - [GPU] quad-core ARM Mali-T628 (@ 600 MHz);
  - [GPU] OpenCL driver 6.0 (r6p0);
  - [GPU] OpenCL standard 1.1;
  - [RAM] 2 GB;
  - Gentoo Linux [over](https://community.arm.com/groups/arm-mali-graphics/blog/2014/12/18/installing-opencl-on-chromebook-2-in-30-minutes) ChromeOS with `/etc/lsb-release`:
```
CHROMEOS_AUSERVER=https://tools.google.com/service/update2
CHROMEOS_BOARD_APPID={24E2E4F7-F92C-6115-3E26-02C7EAA02946}
CHROMEOS_CANARY_APPID={90F229CE-83E2-4FAF-8479-E368A34938B1}
CHROMEOS_DEVSERVER=
CHROMEOS_RELEASE_APPID={24E2E4F7-F92C-6115-3E26-02C7EAA02946}
CHROMEOS_RELEASE_BOARD=peach_pit-signed-mp-v2keys
CHROMEOS_RELEASE_BRANCH_NUMBER=68
CHROMEOS_RELEASE_BUILD_NUMBER=8350
CHROMEOS_RELEASE_BUILD_TYPE=Official Build
CHROMEOS_RELEASE_CHROME_MILESTONE=52
CHROMEOS_RELEASE_DESCRIPTION=8350.68.0 (Official Build) stable-channel peach_pit 
CHROMEOS_RELEASE_NAME=Chrome OS
CHROMEOS_RELEASE_PATCH_NUMBER=0
CHROMEOS_RELEASE_TRACK=stable-channel
CHROMEOS_RELEASE_VERSION=8350.68.0
DEVICETYPE=CHROMEBOOK
GOOGLE_RELEASE=8350.68.0
```

## Includes

### Collective Knowledge

In [None]:
import ck.kernel as ck
print ('CK version: %s' % ck.__version__)

### Scientific

In [None]:
import math
import IPython as ip
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mp
import matplotlib.pyplot as plt

In [None]:
print ('IPython version: %s' % ip.__version__)
print ('NumPy version: %s' % np.__version__)
print ('SciPy version: %s' % sp.__version__)
print ('Pandas version: %s' % pd.__version__)
print ('Matplotlib version: %s' % mp.__version__)

In [None]:
from IPython.display import display
from matplotlib import cm
%matplotlib inline

## Access experimental results

In [None]:
def search_experimental_points_by_tags(tags):
    """Return {batch size: [forward-pass times (ms) of successful runs]} for the given tags."""
    r = ck.access({'action':'get', 'module_uoa':'experiment', 'tags':tags, 'load_json_files':['0001']})
    if r['return'] > 0:
        # Raise instead of calling exit(), which would terminate the notebook kernel.
        raise RuntimeError('CK error: %s' % r['error'])
    # FIXME: For now, assume a single experiment entry per the given tags.
    results = {}
    for point in r['points']:
        point_data_raw = point['0001']
        # Keep only the timings of the repetitions that ran successfully.
        time_fw_ms = [
            characteristics['run']['time_fw_ms']
            for characteristics in point_data_raw['characteristics_list']
            if characteristics['run']['run_success'] == 'yes'
        ]
        batch_size = point_data_raw['choices']['env']['CK_CAFFE_BATCH_SIZE']
        results[batch_size] = time_fw_ms
    return results
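
To make the shape of the returned dictionary concrete, the sketch below applies the same filtering to a hand-written record, so no CK repository is needed. The record mimics only the fields read by the function above. The helper name `extract_times` and all numbers are made up for illustration.

```python
def extract_times(points):
    """Map batch size -> list of forward-pass times (ms) of successful runs."""
    results = {}
    for point in points:
        raw = point['0001']
        times = [
            c['run']['time_fw_ms']
            for c in raw['characteristics_list']
            if c['run']['run_success'] == 'yes'
        ]
        results[raw['choices']['env']['CK_CAFFE_BATCH_SIZE']] = times
    return results

# Hypothetical points: two batch sizes, one failed repetition for batch size 2.
mock_points = [
    {'0001': {
        'choices': {'env': {'CK_CAFFE_BATCH_SIZE': 2}},
        'characteristics_list': [
            {'run': {'run_success': 'yes', 'time_fw_ms': 210.0}},
            {'run': {'run_success': 'no'}},
            {'run': {'run_success': 'yes', 'time_fw_ms': 190.0}},
        ],
    }},
    {'0001': {
        'choices': {'env': {'CK_CAFFE_BATCH_SIZE': 4}},
        'characteristics_list': [
            {'run': {'run_success': 'yes', 'time_fw_ms': 380.0}},
            {'run': {'run_success': 'yes', 'time_fw_ms': 400.0}},
            {'run': {'run_success': 'yes', 'time_fw_ms': 420.0}},
        ],
    }},
]

mock_results = extract_times(mock_points)
print(mock_results)
```

The failed repetition is dropped, so the list for batch size 2 is shorter than the list for batch size 4.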

## Analyze experimental results

In [None]:
def analyze(results):
    df = pd.DataFrame(data=results)
    df.columns.name = 'Batch size'
    df.index.name='Repetition'

    # Show raw results.
    print ('Experimental results: raw')
    display(df)

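    df_stats = df.describe()
    # Per-image values rescale the per-batch values by 1/n (n = batch size), which
    # scales both the mean and the std by 1/n. Dividing the std by sqrt(n) would
    # instead estimate the spread of a single image's time, assuming independent images.
    df_stats.loc['mean per image'] = df_stats.loc['mean'] / df.columns
    df_stats.loc['std per image'] = df_stats.loc['std'] / df.columns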
    
    # Show stats.
    print ('Experimental results: stats')
    display(df_stats)

    # Show two plots side-by-side: mean time per batch and mean time per image.
    fig, axs = plt.subplots(1,2)
    # NB: .loc replaces the .ix indexer, which was removed in pandas 1.0.
    df_stats.loc['mean'] \
        .plot(ax=axs[0],
            yerr=df_stats.loc['std'],
            title='Mean time per batch (ms)',
            kind='bar', grid=True, rot=0, figsize=[10, 4], colormap=cm.autumn_r
        )
    df_stats.loc['mean per image'] \
        .plot(ax=axs[1],
            yerr=df_stats.loc['std per image'],
            title='Mean time per image (ms)',
            kind='bar', grid=True, rot=0, figsize=[10, 4], colormap=cm.autumn
        )
    
    # Show batch size giving minimum time per image, mean and std.
    min_time_per_image_idx = df_stats.loc['mean per image'].idxmin()
    if not math.isnan(min_time_per_image_idx):
        print (
            'Minimum time per image: batch size = %d, mean = %.2f, std = %.2f' % (
                min_time_per_image_idx, 
                df_stats.loc['mean per image', min_time_per_image_idx],
                df_stats.loc['std per image', min_time_per_image_idx]
            )
        )
    else:
        print ('Minimum time per image: N/A')
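
The per-image rows are plain rescalings of the per-batch rows. As a minimal, pandas-free sketch with made-up timings, the code below checks that dividing the batch time by the batch size divides both its mean and its sample standard deviation by the same factor. `per_image_stats` is a hypothetical helper, not part of the notebook.

```python
import math
import statistics

def per_image_stats(times_ms, batch_size):
    """Return (mean, sample std) of the time per image for one batch size."""
    mean = statistics.mean(times_ms)
    std = statistics.stdev(times_ms)  # sample std (n - 1 denominator), like DataFrame.describe()
    return mean / batch_size, std / batch_size

batch_size = 4
times_ms = [380.0, 400.0, 420.0]  # illustrative per-batch timings, not measurements
mean_per_image, std_per_image = per_image_stats(times_ms, batch_size)

# Computing the statistics on the rescaled samples gives the same values.
scaled = [t / batch_size for t in times_ms]
assert math.isclose(mean_per_image, statistics.mean(scaled))
assert math.isclose(std_per_image, statistics.stdev(scaled))

print('mean per image = %.2f ms, std per image = %.2f ms' % (mean_per_image, std_per_image))
```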

### AlexNet

#### AlexNet / clBLAS

In [None]:
alexnet_clblas_tags = 'chromebook2,time,caffemodel,alexnet,clblas'
alexnet_clblas_results = search_experimental_points_by_tags(alexnet_clblas_tags)
analyze(alexnet_clblas_results)

#### AlexNet / CLBlast dev

In [None]:
alexnet_clblast_development_tags = 'chromebook2,time,caffemodel,alexnet,clblast,vdevelopment'
alexnet_clblast_development_results = search_experimental_points_by_tags(alexnet_clblast_development_tags)
analyze(alexnet_clblast_development_results)

#### AlexNet / CLBlast dev with Mali-optimized overlay

In [None]:
alexnet_clblast_mali_overlay_tags = 'chromebook2,time,caffemodel,alexnet,clblast,vmali-overlay'
alexnet_clblast_mali_overlay_results = search_experimental_points_by_tags(alexnet_clblast_mali_overlay_tags)
analyze(alexnet_clblast_mali_overlay_results)

#### AlexNet / OpenBLAS

In [None]:
alexnet_openblas_tags = 'chromebook2,time,caffemodel,alexnet,openblas'
alexnet_openblas_results = search_experimental_points_by_tags(alexnet_openblas_tags)
analyze(alexnet_openblas_results)
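
The four AlexNet cells above differ only in their tag strings, which follow one pattern: a fixed prefix, the model, the library and, for CLBlast, a version tag prefixed with `v`. A possible way to build them is sketched below. `make_tags` is a hypothetical helper introduced here for illustration; the tag values are the ones used in the cells above.

```python
def make_tags(model, library, version=None):
    """Build a comma-separated CK tag string for one model/library pair."""
    tags = ['chromebook2', 'time', 'caffemodel', model, library]
    if version is not None:
        # Version tags in this notebook carry a 'v' prefix, e.g. 'vdevelopment'.
        tags.append('v' + version)
    return ','.join(tags)

alexnet_tags = {
    'clBLAS': make_tags('alexnet', 'clblas'),
    'CLBlast dev': make_tags('alexnet', 'clblast', 'development'),
    'CLBlast dev with Mali-optimized overlay': make_tags('alexnet', 'clblast', 'mali-overlay'),
    'OpenBLAS': make_tags('alexnet', 'openblas'),
}

for name, tags in alexnet_tags.items():
    print('%s: %s' % (name, tags))
```

Each string can then be passed to `search_experimental_points_by_tags` in place of the hand-written one.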

### SqueezeNet 1.0

#### SqueezeNet 1.0 / clBLAS

In [None]:
squeezenet_1_0_clblas_tags = 'chromebook2,time,caffemodel,squeezenet-1.0,clblas'
squeezenet_1_0_clblas_results = search_experimental_points_by_tags(squeezenet_1_0_clblas_tags)
analyze(squeezenet_1_0_clblas_results)

#### SqueezeNet 1.0 / CLBlast dev

In [None]:
squeezenet_1_0_clblast_development_tags = 'chromebook2,time,caffemodel,squeezenet-1.0,clblast,vdevelopment'
squeezenet_1_0_clblast_development_results = search_experimental_points_by_tags(squeezenet_1_0_clblast_development_tags)
analyze(squeezenet_1_0_clblast_development_results)

#### SqueezeNet 1.0 / CLBlast dev with Mali-optimized overlay

In [None]:
squeezenet_1_0_clblast_mali_overlay_tags = 'chromebook2,time,caffemodel,squeezenet-1.0,clblast,vmali-overlay'
squeezenet_1_0_clblast_mali_overlay_results = search_experimental_points_by_tags(squeezenet_1_0_clblast_mali_overlay_tags)
analyze(squeezenet_1_0_clblast_mali_overlay_results)

#### SqueezeNet 1.0 / OpenBLAS

In [None]:
squeezenet_1_0_openblas_tags = 'chromebook2,time,caffemodel,squeezenet-1.0,openblas'
squeezenet_1_0_openblas_results = search_experimental_points_by_tags(squeezenet_1_0_openblas_tags)
analyze(squeezenet_1_0_openblas_results)

### SqueezeNet 1.1

#### SqueezeNet 1.1 / clBLAS

In [None]:
squeezenet_1_1_clblas_tags = 'chromebook2,time,caffemodel,squeezenet-1.1,clblas'
squeezenet_1_1_clblas_results = search_experimental_points_by_tags(squeezenet_1_1_clblas_tags)
analyze(squeezenet_1_1_clblas_results)

#### SqueezeNet 1.1 / CLBlast dev

In [None]:
squeezenet_1_1_clblast_development_tags = 'chromebook2,time,caffemodel,squeezenet-1.1,clblast,vdevelopment'
squeezenet_1_1_clblast_development_results = search_experimental_points_by_tags(squeezenet_1_1_clblast_development_tags)
analyze(squeezenet_1_1_clblast_development_results)

#### SqueezeNet 1.1 / CLBlast dev with Mali-optimized overlay

In [None]:
squeezenet_1_1_clblast_mali_overlay_tags = 'chromebook2,time,caffemodel,squeezenet-1.1,clblast,vmali-overlay'
squeezenet_1_1_clblast_mali_overlay_results = search_experimental_points_by_tags(squeezenet_1_1_clblast_mali_overlay_tags)
# analyze(squeezenet_1_1_clblast_mali_overlay_results)

**NB:** Failures are being investigated.

#### SqueezeNet 1.1 / OpenBLAS

In [None]:
squeezenet_1_1_openblas_tags = 'chromebook2,time,caffemodel,squeezenet-1.1,openblas'
squeezenet_1_1_openblas_results = search_experimental_points_by_tags(squeezenet_1_1_openblas_tags)
analyze(squeezenet_1_1_openblas_results)

### GoogleNet

#### GoogleNet / clBLAS

In [None]:
googlenet_clblas_tags = 'chromebook2,time,caffemodel,googlenet,clblas'
googlenet_clblas_results = search_experimental_points_by_tags(googlenet_clblas_tags)
analyze(googlenet_clblas_results)

#### GoogleNet / CLBlast dev

In [None]:
googlenet_clblast_development_tags = 'chromebook2,time,caffemodel,googlenet,clblast,vdevelopment'
googlenet_clblast_development_results = search_experimental_points_by_tags(googlenet_clblast_development_tags)
analyze(googlenet_clblast_development_results)

#### GoogleNet / CLBlast dev with Mali-optimized overlay

In [None]:
googlenet_clblast_mali_overlay_tags = 'chromebook2,time,caffemodel,googlenet,clblast,vmali-overlay'
googlenet_clblast_mali_overlay_results = search_experimental_points_by_tags(googlenet_clblast_mali_overlay_tags)
# analyze(googlenet_clblast_mali_overlay_results)

**NB:** Failures are being investigated.

#### GoogleNet / OpenBLAS

In [None]:
googlenet_openblas_tags = 'chromebook2,time,caffemodel,googlenet,openblas'
googlenet_openblas_results = search_experimental_points_by_tags(googlenet_openblas_tags)
analyze(googlenet_openblas_results)
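
Once several configurations have been loaded, their best batch sizes can be compared in one place. The sketch below assumes that each value has the `{batch size: [times in ms]}` shape returned by `search_experimental_points_by_tags`. The helper `best_batch_size`, the dictionary keys and all numbers are illustrative, not measured results.

```python
import statistics

def best_batch_size(results):
    """Return (batch size, mean ms per image) with the lowest mean time per image.

    Batch sizes without any successful run are skipped; returns None if none remain.
    """
    per_image = {
        batch_size: statistics.mean(times) / batch_size
        for batch_size, times in results.items()
        if times
    }
    if not per_image:
        return None
    best = min(per_image, key=per_image.get)
    return best, per_image[best]

# Hypothetical results for two libraries.
all_results = {
    'library A': {1: [100.0, 110.0], 2: [180.0, 170.0], 4: [390.0, 410.0]},
    'library B': {1: [], 2: [150.0, 170.0]},
}

summary = {name: best_batch_size(r) for name, r in all_results.items()}
for name, best in summary.items():
    print('%s: %s' % (name, best))
```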