Siva edited this page Jan 5, 2023 · 24 revisions

About

This page contains the benchmark results for several popular image classification models. We auto-tune all listed models on target platforms and benchmark the inference performance (time cost per image).
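As a rough illustration of what "time cost per image" means at batch size 1, the sketch below averages wall-clock latency over repeated calls. It is not the benchmark script used for these results: `mean_latency_ms` and `fake_inference` are hypothetical names, and the stand-in workload would be replaced by a real single-image inference call.

```python
import time


def mean_latency_ms(run_once, warmup=3, repeat=10):
    """Return the mean wall-clock time of run_once() in milliseconds."""
    # Warm-up runs are excluded so one-time setup costs do not skew the mean.
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(repeat):
        run_once()
    elapsed = time.perf_counter() - start
    return elapsed / repeat * 1000.0


def fake_inference():
    # Hypothetical stand-in for one batch_size = 1 forward pass.
    sum(i * i for i in range(10_000))


latency = mean_latency_ms(fake_inference)
print(f"{latency:.3f} ms per image")
```

With batch size 1, the mean time per call is the time per image, which is the unit reported in the tables on this page.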

Content

ARM CPU

Note: If a board has a big.LITTLE architecture, we use only the big cores. Otherwise, we use all cores. The device specifications below list only the cores that are used.
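The core-selection rule in the note can be sketched as a small function. This is an illustration only: the `(core name, cluster)` tuples and the `cores_for_benchmark` helper are assumptions made for this example, not part of the benchmark scripts. The RK3399 layout (2 x Cortex-A72 plus 4 x Cortex-A53) is that SoC's standard configuration.

```python
def cores_for_benchmark(cores):
    """Pick the cores to benchmark on.

    cores is a list of (core_name, cluster) tuples, where cluster is
    "big", "little", or None for a board with a single cluster.
    """
    big = [c for c in cores if c[1] == "big"]
    little = [c for c in cores if c[1] == "little"]
    # big.LITTLE board: both clusters are present, so keep only the big cores.
    if big and little:
        return big
    # Otherwise use every core.
    return list(cores)


rk3399 = [("Cortex-A72", "big")] * 2 + [("Cortex-A53", "little")] * 4
rpi3b = [("Cortex-A53", None)] * 4

print(len(cores_for_benchmark(rk3399)), "cores used on Firefly-RK3399")
print(len(cores_for_benchmark(rpi3b)), "cores used on Raspberry Pi 3B")
```

The two counts correspond to the "2 x" and "4 x" entries for these boards in the device list.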

Devices

  • Firefly-RK3399 : 2 x Cortex-A72 1.8 GHz
  • Raspberry Pi 3B : 4 x Cortex-A53 1.2 GHz
  • Huawei P20 Pro / Mate10 Pro (SoC: HiSilicon Kirin 970) : 4 x Cortex-A73 2.36 GHz
  • Google Pixel 2 (SoC: Qualcomm Snapdragon 835) : 4 x Kryo 2.35 GHz
  • PYNQ : 2 x Cortex-A9 650 MHz

Results

  • dtype = float32, batch_size = 1 (unit: ms)

| Device | densenet-121 | inception-v3 | mobilenet | mobilenet-v2 | resnet-18 | resnet-50 | squeezenet-v1.0 | squeezenet-v1.1 | vgg-16 | vgg-19 |
|---|---|---|---|---|---|---|---|---|---|---|
| Raspberry Pi 3B | 610.2 | 2074.2 | 121.8 | 104.8 | 320.0 | 726.0 | 185.1 | 94.0 | 1772.0 | 2119.8 |
| Firefly RK3399 | 336.8 | 1304.4 | 77.9 | 64.8 | 158.6 | 403.2 | 94.3 | 48.2 | 903.5 | 1086.0 |
| Huawei P20 Pro | 179.7 | 444.7 | 41.3 | 33.4 | 77.4 | 232.5 | 51.4 | 26.0 | 486.3 | 729.4 |
| Google Pixel2 | 161.0 | 434.8 | 39.6 | 29.3 | 66.0 | 181.1 | 47.3 | 23.0 | 397.1 | 485.0 |
| Xilinx PYNQ | 2887.0 | 9691.7 | 721.4 | 513.3 | 1231.7 | 3585.5 | 913.0 | 478.3 | -1.0 | -1.0 |

Mobile GPU

Devices

  • Mali-T860 MP4: on Firefly-RK3399, with its frequency locked to 800 MHz.

Results

  • dtype = float32, batch_size = 1 (unit: ms)
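
| Device | densenet-121 | inception-v3 | mobilenet | mobilenet-v2 | resnet-18 | resnet-50 | squeezenet-v1.0 | squeezenet-v1.1 | vgg-16 | vgg-19 |
|---|---|---|---|---|---|---|---|---|---|---|
| Mali-T860 | 410.6 | 784.7 | 79.5 | 77.7 | 127.3 | 354.7 | 111.0 | 62.5 | 673.2 | 792.1 |
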
  • dtype = float16, batch_size = 1 (unit: ms)

| Device | densenet-121 | inception-v3 | mobilenet | mobilenet-v2 | resnet-18 | resnet-50 | squeezenet-v1.0 | squeezenet-v1.1 | vgg-16 | vgg-19 |
|---|---|---|---|---|---|---|---|---|---|---|
| Mali-T860 | 295.4 | 464.9 | 52.9 | 60.7 | 84.3 | 221.0 | 77.3 | 46.7 | 405.6 | 472.8 |
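To read the two Mali-T860 result sets side by side, the float16 speedup per model can be computed as the float32 time divided by the float16 time. The short sketch below uses only the numbers reported above; the variable names are illustrative.

```python
models = [
    "densenet-121", "inception-v3", "mobilenet", "mobilenet-v2", "resnet-18",
    "resnet-50", "squeezenet-v1.0", "squeezenet-v1.1", "vgg-16", "vgg-19",
]
# Mali-T860 latencies in ms at batch_size = 1, copied from the results above.
fp32_ms = [410.6, 784.7, 79.5, 77.7, 127.3, 354.7, 111.0, 62.5, 673.2, 792.1]
fp16_ms = [295.4, 464.9, 52.9, 60.7, 84.3, 221.0, 77.3, 46.7, 405.6, 472.8]

# Speedup > 1 means float16 is faster than float32 for that model.
speedup = {m: a / b for m, a, b in zip(models, fp32_ms, fp16_ms)}

for model, s in speedup.items():
    print(f"{model:16s} {s:.2f}x")
```

Every ratio is above 1, so float16 is faster than float32 for all ten models on this device.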

NVIDIA GPU

Devices

  • Jetson TX2: in Max-N mode, 1.3 GHz
  • GTX 1080 Ti, GTX TITAN X

Results

  • dtype = float32, batch_size = 1 (unit: ms)
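
| Device | densenet-121 | inception-v3 | mobilenet | mobilenet-v2 | resnet-18 | resnet-50 | vgg-16 | vgg-19 |
|---|---|---|---|---|---|---|---|---|
| GTX 1080 Ti | 3.6 | 5.8 | 0.7 | 1.0 | 1.1 | 2.8 | 4.2 | 4.8 |
| GTX TITAN X | 5.8 | 9.9 | 1.0 | 1.6 | 1.6 | 4.3 | 6.3 | 7.4 |
| Jetson TX2 | 26.8 | 45.7 | 5.2 | 8.8 | 9.6 | 26.2 | 58.2 | 68.8 |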

AMD GPU

  • dtype = float32, batch_size = 1 (unit: ms)

| Device | densenet-121 | inception-v3 | mobilenet | resnet-18 | resnet-50 | vgg-16 | vgg-19 |
|---|---|---|---|---|---|---|---|
| Vega FE | 5.8 | 8.9 | 1.0 | 1.6 | 4.5 | 6.3 | 7.2 |

Adreno GPU

TVM supports Adreno hardware through both the native OpenCL path and the OpenCLML BYOC path.

OpenCLML is Qualcomm's proprietary operator acceleration library, implemented as an OpenCL extension. The OpenCLML SDK is available to the developer community. More details about OpenCLML can be found at Qualcomm Developer Network OpenCLML and OpenCLML with TVM.

OpenCLML is integrated into TVM as a BYOC backend, which accelerates operators using Qualcomm's hardware-aware proprietary implementations.

Devices

  • Snapdragon 8 Gen 1 : Adreno 730

Native OpenCL Results

  • batch_size = 1 (unit: ms)
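
| dtype | Resnet 18 | Resnet 34 | Resnet 50 | VGG-16 | VGG-19 | Densenet-121 | Inception V3 | MobilenetV1 | Squeezenet-v1.0 | Squeezenet-v1.1 |
|---|---|---|---|---|---|---|---|---|---|---|
| FP32 | 9.56 | 15.37 | 18.25 | 54.20 | 108.71 | 27.33 | 39.54 | 3.82 | 6.89 | 3.24 |
| FP16 | 6.94 | 11.94 | 13.77 | 34.58 | 41.23 | 11.93 | 30.13 | 2.72 | 4.75 | 2.52 |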

OpenCLML Results

  • batch_size = 1 (unit: ms)
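
| dtype | Resnet 18 | Resnet 34 | Resnet 50 | Densenet-121 | Inception V3 | MobilenetV1 | Squeezenet-v1.0 | Squeezenet-v1.1 |
|---|---|---|---|---|---|---|---|---|
| FP32 | 9.75 | 15.22 | 25.43 | 15.56 | 26.63 | 3.85 | 8.90 | 2.79 |
| FP16 | 4.52 | 7.34 | 13.17 | 7.87 | 12.44 | 1.54 | 3.28 | 1.31 |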
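For the eight models that appear in both Adreno result sets, the two paths can be compared by dividing the native OpenCL time by the OpenCLML time. The sketch below uses only the numbers reported above; the dictionary layout and the `ratio` helper are illustrative choices.

```python
# Adreno 730 latencies in ms at batch_size = 1, as (FP32, FP16) pairs,
# copied from the two result sets above (models present in both).
native_opencl = {
    "Resnet 18": (9.56, 6.94),
    "Resnet 34": (15.37, 11.94),
    "Resnet 50": (18.25, 13.77),
    "Densenet-121": (27.33, 11.93),
    "Inception V3": (39.54, 30.13),
    "MobilenetV1": (3.82, 2.72),
    "Squeezenet-v1.0": (6.89, 4.75),
    "Squeezenet-v1.1": (3.24, 2.52),
}
openclml = {
    "Resnet 18": (9.75, 4.52),
    "Resnet 34": (15.22, 7.34),
    "Resnet 50": (25.43, 13.17),
    "Densenet-121": (15.56, 7.87),
    "Inception V3": (26.63, 12.44),
    "MobilenetV1": (3.85, 1.54),
    "Squeezenet-v1.0": (8.90, 3.28),
    "Squeezenet-v1.1": (2.79, 1.31),
}


def ratio(model, dtype_index):
    """Native OpenCL time over OpenCLML time; > 1 means OpenCLML is faster."""
    return native_opencl[model][dtype_index] / openclml[model][dtype_index]


for model in native_opencl:
    print(f"{model:16s} FP32 {ratio(model, 0):.2f}x  FP16 {ratio(model, 1):.2f}x")
```

In FP16 the ratio is above 1 for every shared model, while in FP32 the result is mixed: for example, Resnet 50 is faster on the native OpenCL path.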

Reproduce

See the README at https://github.com/dmlc/tvm/tree/master/apps/benchmark for instructions on reproducing these numbers.