
License: CC BY 4.0

This repository contains validated workflows and results, in the open CK format, from the 1st reproducible ACM ReQuEST-ASPLOS'18 tournament on co-designing Pareto-efficient SW/HW stacks for deep-learning-based inference (image classification):

  1. Image classification using Intel Caffe on Intel-based servers (AWS) (CK GitHub workflow, paper DOI, CK workflow snapshot DOI)
  2. Image classification using MXNet/TVM/NNVM on ARM GPU (CK GitHub workflow, paper DOI, CK workflow snapshot DOI)
  3. Image classification using TensorFlow and Apache Avro on IoT farms (5..11 Raspberry Pi 3 devices) vs NVIDIA Jetson TX2 (CK GitHub workflow, paper DOI, CK workflow snapshot DOI)
  4. Image classification using TVM on FPGA (CK GitHub workflow, paper DOI, CK workflow snapshot DOI)
  5. Image classification using ArmCL and TensorFlow with OpenCL on HiKey 960 (CK GitHub workflow, paper DOI, CK workflow snapshot DOI)

You can see the associated results report, the ACM proceedings, and the live ReQuEST scoreboard.

All of the above workflows implement image classification across a very diverse model/software/hardware stack:

  • Models: MobileNets, ResNet-18, ResNet-50, Inception-v3, VGG16, SSD, and AlexNet.
  • Data types: 8-bit integer, 16-bit floating-point (half), 32-bit floating-point (float).
  • AI frameworks and libraries: MXNet, TensorFlow, Caffe, Keras, Arm Compute Library, cuDNN, TVM, and NNVM.
  • Platforms: Xilinx Pynq-Z1 FPGA, Arm Cortex CPUs and Arm Mali GPGPUs (Linaro HiKey960 and T-Firefly RK3399), a farm of Raspberry Pi devices, NVIDIA Jetson TX2, and Intel Xeon servers in Amazon Web Services, Google Cloud and Microsoft Azure.

The reproduced results, available on the ReQuEST scoreboard, span a remarkably wide range:

  • Latency: 4 .. 500 ms
  • Throughput: 2 .. 465 images/second
  • Top-1 accuracy: 41 .. 75%
  • Top-5 accuracy: 65 .. 93%
  • Platform cost: $40 .. $1200
  • Device frequency: 100 .. 2600 MHz
  • Peak power consumption: 2.5 .. 180 W
  • Trained model size (weights): 2 .. 130 MB
  • Cloud usage cost per inference: $2.6E-6 .. $9.5E-6
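The cloud cost per inference follows directly from an instance's hourly price and its sustained throughput. A minimal sketch of that arithmetic (the price and throughput figures below are illustrative assumptions, not scoreboard entries):

```python
def cost_per_inference(hourly_price_usd: float, images_per_sec: float) -> float:
    """Cost of a single inference on a cloud instance billed per hour.

    Divides the hourly price by the number of images processed per hour
    (3600 seconds * throughput in images/second).
    """
    return hourly_price_usd / (3600.0 * images_per_sec)

# Hypothetical example: a $0.10/hour instance sustaining 10 images/second
c = cost_per_inference(0.10, 10.0)
print(f"{c:.2e} $/inference")  # prints "2.78e-06 $/inference"
```

At these assumed figures the result lands in the same order of magnitude ($1E-6) as the range reported above.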

Shared CK components

Further discussions