Validated Collective Knowledge workflows and results from the 1st ACM ReQuEST tournament on co-design of Pareto-efficient SW/HW stacks for image classification at ASPLOS'18.

ctuning/ck-request-asplos18-results

License: CC BY 4.0

This repository contains validated workflows and results in the open CK format from the 1st reproducible ACM ReQuEST-ASPLOS'18 tournament on co-designing Pareto-efficient SW/HW stacks for deep-learning-based inference (image classification):

  1. Image classification using Intel Caffe on Intel-based servers (AWS) (CK GitHub workflow, paper DOI, CK workflow snapshot DOI)
  2. Image classification using MXNet/TVM/NNVM on ARM GPU (CK GitHub workflow, paper DOI, CK workflow snapshot DOI)
  3. Image classification using TensorFlow and Apache Avro on IoT farms (5..11 Raspberry Pi 3 devices) vs NVIDIA Jetson TX2 (CK GitHub workflow, paper DOI, CK workflow snapshot DOI)
  4. Image classification using TVM on FPGA (CK GitHub workflow, paper DOI, CK workflow snapshot DOI)
  5. Image classification using ArmCL and TensorFlow with OpenCL on HiKey 960 (CK GitHub workflow, paper DOI, CK workflow snapshot DOI)
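Each of the above workflows is automated with the Collective Knowledge (CK) command-line framework. As a minimal sketch of how such a CK repository is typically fetched (assuming Python and pip are available, and that the CK repo name matches this GitHub repository name):

```shell
# Install the Collective Knowledge framework from PyPI
pip install ck

# Pull this repository (CK resolves it by name from the cTuning index)
# together with its declared CK repository dependencies
ck pull repo:ck-request-asplos18-results
```

The individual tournament workflows linked above each document their own `ck` commands for installing packages (models, datasets, frameworks) and running the benchmarks.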

You can see the associated results report, the ACM proceedings, and the live ReQuEST scoreboard.

All of the above workflows implement image classification across a very diverse model/software/hardware stack:

  • Models: MobileNets, ResNet-18, ResNet-50, Inception-v3, VGG16, SSD and AlexNet.
  • Data types: 8-bit integer, 16-bit floating-point (half), 32-bit floating-point (float).
  • AI frameworks and libraries: MXNet, TensorFlow, Caffe, Keras, Arm Compute Library, cuDNN, TVM and NNVM.
  • Platforms: Xilinx Pynq-Z1 FPGA, Arm Cortex CPUs and Arm Mali GPGPUs (Linaro HiKey960 and T-Firefly RK3399), a farm of Raspberry Pi devices, NVIDIA Jetson TX2, and Intel Xeon servers in Amazon Web Services, Google Cloud and Microsoft Azure.

The reproduced results, available on the ReQuEST scoreboard, also exhibit remarkable diversity:

  • Latency: 4 .. 500 ms
  • Throughput: 2 .. 465 images/second
  • Top-1 accuracy: 41 .. 75%
  • Top-5 accuracy: 65 .. 93%
  • Platform cost: $40 .. $1200
  • Device frequency: 100 .. 2600 MHz
  • Peak power consumption: 2.5 .. 180 W
  • Trained model size (weights): 2 .. 130 MB
  • Cloud usage cost per inference: $2.6E-6 .. $9.5E-6

Shared CK components

Further discussions
