Collective Knowledge repository for evaluating and optimising performance of Caffe

Collective Knowledge repository for collaboratively optimising Caffe-based designs



CK-Caffe is an open framework for collaborative and reproducible optimisation of convolutional neural networks. It is based on the Caffe framework from the Berkeley Vision and Learning Center (BVLC) and on the Collective Knowledge (CK) framework from the cTuning Foundation, which provides customizable cross-platform builds and experimental workflows with a JSON API (see CK intro for more details: 1, 2). In essence, CK-Caffe is a suite of convenient wrappers with a unified JSON API for customized building, evaluation and multi-objective optimisation of Caffe.

As outlined in our vision, we invite the community to collaboratively design and optimize convolutional neural networks to meet the performance, accuracy and cost requirements for deployment on a range of form factors - from sensors to self-driving cars. To this end, CK-Caffe leverages the key capabilities of CK to crowdsource experimentation across diverse platforms, CNN designs, optimization options, and so on; exchange experimental data in a flexible JSON-based format; and apply leading-edge predictive analytics to extract valuable insights from the experimental data.

See for more details.


Quick installation on Ubuntu

Please refer to our Installation Guide for detailed instructions for Ubuntu, Gentoo, Yocto, Windows and Android.

Installing general dependencies

$ sudo apt install coreutils \
                   build-essential \
                   make \
                   cmake \
                   wget \
                   git \
                   python

Installing Caffe dependencies

$ sudo apt install libboost-all-dev \
                   libgflags-dev \
                   libgoogle-glog-dev \
                   libhdf5-serial-dev \
                   liblmdb-dev \
                   libleveldb-dev \
                   libprotobuf-dev \
                   protobuf-compiler \
                   libsnappy-dev
$ sudo pip install protobuf

Installing CK

$ sudo pip install ck
$ ck version

Installing CK-Caffe repository

$ ck pull repo:ck-caffe --url=

Building Caffe and all dependencies via CK

The first time you run the Caffe benchmark, CK will build and install all missing dependencies for your machine, download the required data sets and start the benchmark:

$ ck run program:caffe

Testing installation via image classification

$ ck compile program:caffe-classification --speed
$ ck run program:caffe-classification

Note that you will be asked to select a JPEG image from the available CK data sets. We added standard demo images (cat.jpg, catgrey.jpg, fish-bike.jpg, computer_mouse.jpg) to the 'ctuning-datasets-min' repository.

You can list them via:

$ ck pull repo:ctuning-datasets-min
$ ck search dataset --tags=dnn

Testing beta crowd-benchmarking

It is now possible to participate in crowd-benchmarking of Caffe (early prototype):

$ ck crowdbench caffe --user={your email or ID to acknowledge contributions} --env.CK_CAFFE_BATCH_SIZE=2

You can also use this Android app to crowdsource benchmarking of ARM-based Caffe libraries for image recognition (beta version).

You can see continuously aggregated results in the public Collective Knowledge repository under 'crowd-benchmark Caffe library' scenario.

Creating dataset subsets

The ILSVRC2012 validation dataset contains 50K images. For quick experiments, you can create a subset of this dataset, as follows. Run:

$ ck install package:imagenet-2012-val-lmdb-256

When prompted, enter the number of images to convert to LMDB, say, N = 100. The first N images will be taken.

Customizing Caffe benchmarking via the CK command line

You can customize various Caffe parameters, such as the batch size and the number of iterations, via the CK command line:

$ ck run program:caffe --env.CK_CAFFE_BATCH_SIZE=1 --env.CK_CAFFE_ITERATIONS=10
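A sweep over batch sizes can be scripted around this command. The sketch below is a minimal, hypothetical Python helper: the names `ck_run_command` and `BATCH_SIZES` are ours and not part of CK, and the sweep values are arbitrary. It only assembles command lines in the `--env.KEY=VALUE` form shown above and prints them, so nothing is executed until you pass a command to `subprocess.run` yourself.

```python
import shlex


def ck_run_command(program, env):
    """Build a `ck run` argument list with one --env.KEY=VALUE flag per entry."""
    cmd = ["ck", "run", f"program:{program}"]
    for key, value in env.items():
        cmd.append(f"--env.{key}={value}")
    return cmd


# Hypothetical sweep values; choose whatever suits your platform.
BATCH_SIZES = [1, 2, 4, 8]

for batch_size in BATCH_SIZES:
    cmd = ck_run_command(
        "caffe",
        {"CK_CAFFE_BATCH_SIZE": batch_size, "CK_CAFFE_ITERATIONS": 10},
    )
    # Dry run: print the command instead of executing it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps each flag a separate argument, which avoids shell quoting problems if you later run it with `subprocess.run(cmd, check=True)`.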

Installing CK on Windows, Android and various flavours of Linux

You can find details about CK-Caffe installation for Windows, various flavours of Linux and Android here.

Preliminary results

Compare accuracy of 4 CNNs

In this Jupyter notebook, we compare the Top-1 and Top-5 accuracy of 4 CNNs:

on the ImageNet validation set (50,000 images).

We have thus independently verified that on this data set SqueezeNet matches (and even slightly exceeds) the accuracy of AlexNet.
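For readers unfamiliar with the metrics, Top-k accuracy is the fraction of images whose true label is among the k classes the network scores highest. The following is a minimal pure-Python sketch of that definition, not the code used in the notebook; the function name `top_k_accuracy` and the toy scores are made up for illustration and are not experimental results.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of images whose true label is among the k highest-scoring classes."""
    hits = 0
    for class_scores, label in zip(scores, labels):
        # Class indices sorted by descending score; keep the first k.
        ranked = sorted(
            range(len(class_scores)),
            key=lambda c: class_scores[c],
            reverse=True,
        )
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy, made-up scores for 3 images over 4 classes (not real results).
scores = [
    [0.1, 0.6, 0.2, 0.1],  # highest score: class 1
    [0.5, 0.1, 0.3, 0.1],  # highest score: class 0, runner-up: class 2
    [0.3, 0.1, 0.1, 0.5],  # highest score: class 3, runner-up: class 0
]
labels = [1, 2, 0]

print("Top-1:", top_k_accuracy(scores, labels, k=1))
print("Top-2:", top_k_accuracy(scores, labels, k=2))
```

In this toy example only the first image is classified correctly at Top-1, while all three true labels fall within the two highest-scoring classes.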

The experimental data is stored in the main CK-Caffe repository under 'experiment'.

Compare performance of 4 CNNs on Chromebook 2

This notebook investigates how varying the batch size affects inference performance:

  • across the same 4 CNNs;
  • with 4 BLAS libraries:
    • [CPU] OpenBLAS 0.2.18 (one thread per core);
    • [GPU] clBLAS 2.4 (OpenCL 1.1 compliant);
    • [GPU] CLBlast dev (35623cd > 0.8.0);
    • [GPU] CLBlast dev (35623cd > 0.8.0) with Mali-optimized overlay (641bb07);
  • on the Samsung Chromebook 2 platform:
    • [CPU] quad-core ARM Cortex-A15 (@ 1900 MHz);
    • [GPU] quad-core ARM Mali-T628 (@ 600 MHz);
    • [GPU] OpenCL driver 6.0 (r6p0); OpenCL standard 1.1.

Finally, this notebook compares the best performance per image across the CNNs and BLAS libraries. When using OpenBLAS, SqueezeNet 1.1 is 2 times faster than SqueezeNet 1.0 and 2.4 times faster than AlexNet, broadly in line with expectations set by the SqueezeNet paper.
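One way to arrive at such a "best performance per image" figure is to divide the time of each batch by the batch size and keep the minimum over all batch sizes. The sketch below assumes that definition; the notebook may compute it differently. The helper name `best_time_per_image`, the network names and all timings are placeholders invented for illustration, not measurements.

```python
def best_time_per_image(batch_times_ms):
    """Return (batch_size, ms_per_image) with the lowest time per image.

    batch_times_ms maps a batch size to the time of one forward pass in ms.
    """
    per_image = {b: t / b for b, t in batch_times_ms.items()}
    best = min(per_image, key=per_image.get)
    return best, per_image[best]


# Placeholder timings in ms, invented for illustration only.
timings = {
    "network_a": {1: 120.0, 2: 220.0, 4: 400.0},
    "network_b": {1: 60.0, 2: 100.0, 4: 208.0},
}

best = {name: best_time_per_image(t) for name, t in timings.items()}
for name, (batch_size, ms) in best.items():
    print(f"{name}: {ms:.1f} ms/image at batch size {batch_size}")

# Relative speedup of network_b over network_a at their respective best points.
speedup = best["network_a"][1] / best["network_b"][1]
print(f"network_b is {speedup:.1f}x faster than network_a")
```

Comparing each network at its own best batch size, rather than at a fixed one, avoids penalising a network whose optimum lies at a different batch size.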

When using OpenCL BLAS libraries, however, SqueezeNet 1.0 is not necessarily faster than AlexNet, despite a roughly 500-fold reduction in the size of the weights. This suggests that the optimal network design for a given task may depend on the software stack as well as on the hardware platform. Moreover, design choices may well shift over time, as software matures and new hardware becomes available. That is why we believe it is necessary to leverage community effort to collectively grow design and optimisation knowledge.

The experimental data and visualisation notebooks are stored in a separate repository which can be obtained as follows:

ck pull repo:ck-caffe-explore-batch-size-chromebook2 \

Next steps

CK-Caffe is part of an ambitious long-term and community-driven project to enable collaborative and systematic optimization of realistic workloads across diverse hardware in terms of performance, energy usage, accuracy, reliability, hardware price and other costs (ARM TechCon'16 talk, ARM TechCon'16 demo, DATE'16, CPC'15).

We are working with the community to unify and crowdsource performance analysis and tuning of various DNN frameworks (or any realistic workload) using Collective Knowledge Technology:

We continue to gradually expose various design and optimization choices, including full parameterization of existing models.

Open R&D challenges

We use crowd-benchmarking and crowd-tuning of such realistic workloads across diverse hardware for open academic and industrial R&D challenges - join this community effort!

Related Publications with long term vision