This repository contains validated workflows and results, in the open CK format, from the 1st reproducible ACM ReQuEST-ASPLOS'18 tournament on co-designing Pareto-efficient SW/HW stacks for deep-learning-based inference (image classification):
- Image classification using Intel Caffe on Intel-based servers (AWS) (CK GitHub workflow, paper DOI, CK workflow snapshot DOI)
- Image classification using MXNet/TVM/NNVM on ARM GPU (CK GitHub workflow, paper DOI, CK workflow snapshot DOI)
- Image classification using TensorFlow and Apache Avro on IoT farms (5..11 Raspberry Pi 3 devices) vs NVIDIA Jetson TX2 (CK GitHub workflow, paper DOI, CK workflow snapshot DOI)
- Image classification using TVM on FPGA (CK GitHub workflow, paper DOI, CK workflow snapshot DOI)
- Image classification using ArmCL and TensorFlow with OpenCL on HiKey 960 (CK GitHub workflow, paper DOI, CK workflow snapshot DOI)
You can also see the associated results report, the ACM proceedings, and the live ReQuEST scoreboard.
All of the above workflows implement image classification across a very diverse model/software/hardware stack:
- Models: MobileNets, ResNet-18, ResNet-50, Inception-v3, VGG16, SSD and AlexNet.
- Data types: 8-bit integer, 16-bit floating-point (half), 32-bit floating-point (float).
- AI frameworks and libraries: MXNet, TensorFlow, Caffe, Keras, Arm Compute Library, cuDNN, TVM and NNVM.
- Platforms: Xilinx Pynq-Z1 FPGA, Arm Cortex CPUs and Arm Mali GPGPUs (Linaro HiKey960 and T-Firefly RK3399), a farm of Raspberry Pi devices, NVIDIA Jetson TX2, and Intel Xeon servers in Amazon Web Services, Google Cloud and Microsoft Azure.
The reproduced results, available on the ReQuEST scoreboard, exhibit a similarly wide range:
- Latency: 4 .. 500 ms
- Throughput: 2 .. 465 images/second
- Top-1 accuracy: 41 .. 75%
- Top-5 accuracy: 65 .. 93%
- Platform cost: $40 .. $1200
- Device frequency: 100 .. 2600 MHz
- Peak power consumption: 2.5 .. 180 W
- Trained model size (weights): 2 .. 130 MB
- Cloud usage cost per inference: $2.6E-6 .. $9.5E-6
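The cloud-usage-cost metric above can be derived from an instance's hourly price and its measured inference throughput. A minimal sketch, with made-up numbers (the hourly price and throughput below are illustrative assumptions, not values from the scoreboard):

```python
def cost_per_inference(hourly_price_usd: float, throughput_ips: float) -> float:
    """Cost in USD of a single inference on a pay-per-hour cloud instance.

    hourly_price_usd: on-demand instance price in $/hour (hypothetical here)
    throughput_ips:   measured throughput in images/second (hypothetical here)
    """
    price_per_second = hourly_price_usd / 3600.0  # convert $/hour to $/second
    return price_per_second / throughput_ips      # $/second divided by images/second

# Example with assumed numbers: a $0.85/hour instance sustaining 50 images/sec
print(cost_per_inference(0.85, 50.0))  # ~4.7e-6 USD per image
```

With these assumed inputs the result falls inside the $2.6E-6 .. $9.5E-6 range reported on the scoreboard.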