Benchmark

About

This page reports benchmark results for several popular image classification models. All listed models are auto-tuned on each target platform, and inference performance is measured as time cost per image.
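As a rough illustration of what "time cost per image" means at batch size 1, the sketch below times a callable after a few warm-up runs and reports the mean latency in milliseconds. The `measure_latency_ms` helper, the `run_once` callable, and the warm-up and repeat counts are assumptions made for illustration; this is not the harness that produced the numbers on this page.

```python
import statistics
import time


def measure_latency_ms(run_once, warmup=3, repeat=10):
    """Return (mean, stdev) latency in ms of a zero-argument callable.

    `run_once` is a hypothetical stand-in for one batch_size = 1 inference.
    """
    # Warm-up runs are discarded so one-time costs (caches, lazy
    # initialization) do not skew the measurement.
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)

    return statistics.mean(samples), statistics.stdev(samples)


# Example with a dummy workload in place of a real model.
mean_ms, std_ms = measure_latency_ms(lambda: sum(range(10000)))
print(f"{mean_ms:.3f} ms +/- {std_ms:.3f} ms per image")
```

Reporting the spread alongside the mean makes it easier to judge whether two results differ by more than run-to-run noise.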

Content

Results
- ARM CPU
- Mobile GPU
- NVIDIA GPU
- AMD GPU
- Adreno GPU
Links
- Reproduce

ARM CPU

Note: If a board has a big.LITTLE architecture, we use all big cores; otherwise, we use all cores. The device specifications below list only the cores being used.

Devices

Firefly-RK3399: 2 x Cortex-A72 1.8 GHz
Raspberry Pi 3B: 4 x Cortex-A53 1.2 GHz
Huawei P20 Pro / Mate10 Pro (SoC: HiSilicon Kirin 970): 4 x Cortex-A73 2.36 GHz
Google Pixel 2 (SoC: Qualcomm Snapdragon 835): 4 x Kryo 2.35 GHz
PYNQ: 2 x Cortex-A9 650 MHz

Results

dtype = float32, batch_size = 1 (unit: ms)

	densenet-121	inception-v3	mobilenet	mobilenet-v2	resnet-18	resnet-50	squeezenet-v1.0	squeezenet-v1.1	vgg-16	vgg-19
Raspberry Pi 3B	610.2	2074.2	121.8	104.8	320.0	726.0	185.1	94.0	1772.0	2119.8
Firefly RK3399	336.8	1304.4	77.9	64.8	158.6	403.2	94.3	48.2	903.5	1086.0
Huawei P20 Pro	179.7	444.7	41.3	33.4	77.4	232.5	51.4	26.0	486.3	729.4
Google Pixel2	161.0	434.8	39.6	29.3	66.0	181.1	47.3	23.0	397.1	485.0
Xilinx PYNQ	2887.0	9691.7	721.4	513.3	1231.7	3585.5	913.0	478.3	-1.0	-1.0

Mobile GPU

Devices

Mali-T860 MP4: On Firefly-RK3399. Its frequency is locked to 800MHz.

Results

dtype = float32, batch_size = 1 (unit: ms)

	densenet-121	inception-v3	mobilenet	mobilenet-v2	resnet-18	resnet-50	squeezenet-v1.0	squeezenet-v1.1	vgg-16	vgg-19
Mali-T860	410.6	784.7	79.5	77.7	127.3	354.7	111.0	62.5	673.2	792.1

dtype = float16, batch_size = 1 (unit: ms)

	densenet-121	inception-v3	mobilenet	mobilenet-v2	resnet-18	resnet-50	squeezenet-v1.0	squeezenet-v1.1	vgg-16	vgg-19
Mali-T860	295.4	464.9	52.9	60.7	84.3	221.0	77.3	46.7	405.6	472.8
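To make the float32 and float16 tables easier to compare, the following sketch computes the float16 speedup per model as the ratio of the two latencies. The numbers are copied from the Mali-T860 rows above; the `fp16_speedups` helper is a name introduced here for illustration.

```python
# Latencies in ms copied from the Mali-T860 rows above (batch_size = 1).
MODELS = [
    "densenet-121", "inception-v3", "mobilenet", "mobilenet-v2", "resnet-18",
    "resnet-50", "squeezenet-v1.0", "squeezenet-v1.1", "vgg-16", "vgg-19",
]
FP32_MS = [410.6, 784.7, 79.5, 77.7, 127.3, 354.7, 111.0, 62.5, 673.2, 792.1]
FP16_MS = [295.4, 464.9, 52.9, 60.7, 84.3, 221.0, 77.3, 46.7, 405.6, 472.8]


def fp16_speedups(models, fp32_ms, fp16_ms):
    """Return {model: fp32 latency / fp16 latency}; > 1 means fp16 is faster."""
    return {m: a / b for m, a, b in zip(models, fp32_ms, fp16_ms)}


speedups = fp16_speedups(MODELS, FP32_MS, FP16_MS)
for model, ratio in speedups.items():
    print(f"{model:16s} {ratio:.2f}x")
```

With these values, every model runs faster in float16, with ratios of roughly 1.3x to 1.7x.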

NVIDIA GPU

Devices

Jetson TX2: in Max-N mode (1.3 GHz)
GTX 1080 Ti, GTX TITAN X

Results

dtype = float32, batch_size = 1 (unit: ms)

	densenet-121	inception-v3	mobilenet	mobilenet-v2	resnet-18	resnet-50	vgg-16	vgg-19
GTX 1080 Ti	3.6	5.8	0.7	1.0	1.1	2.8	4.2	4.8
GTX TITAN X	5.8	9.9	1.0	1.6	1.6	4.3	6.3	7.4
Jetson TX2	26.8	45.7	5.2	8.8	9.6	26.2	58.2	68.8

AMD GPU

dtype = float32, batch_size = 1 (unit: ms)

	densenet-121	inception-v3	mobilenet	resnet-18	resnet-50	vgg-16	vgg-19
Vega FE	5.8	8.9	1.0	1.6	4.5	6.3	7.2

Adreno GPU

TVM supports Adreno hardware through both the native OpenCL path and the OpenCLML BYOC path.

OpenCLML is Qualcomm's proprietary acceleration operator library, implemented as an extension. The OpenCLML SDK is available to the developer community. More details about OpenCLML can be found at Qualcomm Developer Network OpenCLML and OpenCLML with TVM.

OpenCLML is integrated into TVM as a BYOC backend, which can accelerate operators using Qualcomm's hardware-aware proprietary operators.

Devices

Snapdragon 8 Gen 1: Adreno 730

Native OpenCL Results

batch_size = 1 (unit: ms)

	Resnet 18	Resnet 34	Resnet 50	VGG-16	VGG-19	Densenet-121	Inception V3	MobilenetV1	Squeezenet-v1.0	Squeezenet-v1.1
FP32	9.56	15.37	18.25	54.20	108.71	27.33	39.54	3.82	6.89	3.24
FP16	6.94	11.94	13.77	34.58	41.23	11.93	30.13	2.72	4.75	2.52

OpenCLML Results

batch_size = 1 (unit: ms)

	Resnet 18	Resnet 34	Resnet 50	Densenet-121	Inception V3	MobilenetV1	Squeezenet-v1.0	Squeezenet-v1.1
FP32	9.75	15.22	25.43	15.56	26.63	3.85	8.90	2.79
FP16	4.52	7.34	13.17	7.87	12.44	1.54	3.28	1.31
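Since the two Adreno tables cover different model sets, a side-by-side comparison takes a little bookkeeping. The sketch below restricts the comparison to models that appear in both tables and reports which path is faster at each precision. The numbers are copied from the tables above; the `faster_path` helper is a name introduced here for illustration.

```python
# Latencies in ms copied from the two Adreno 730 tables above (batch_size = 1).
NATIVE = {
    "FP32": {"Resnet 18": 9.56, "Resnet 34": 15.37, "Resnet 50": 18.25,
             "VGG-16": 54.20, "VGG-19": 108.71, "Densenet-121": 27.33,
             "Inception V3": 39.54, "MobilenetV1": 3.82,
             "Squeezenet-v1.0": 6.89, "Squeezenet-v1.1": 3.24},
    "FP16": {"Resnet 18": 6.94, "Resnet 34": 11.94, "Resnet 50": 13.77,
             "VGG-16": 34.58, "VGG-19": 41.23, "Densenet-121": 11.93,
             "Inception V3": 30.13, "MobilenetV1": 2.72,
             "Squeezenet-v1.0": 4.75, "Squeezenet-v1.1": 2.52},
}
CLML = {
    "FP32": {"Resnet 18": 9.75, "Resnet 34": 15.22, "Resnet 50": 25.43,
             "Densenet-121": 15.56, "Inception V3": 26.63, "MobilenetV1": 3.85,
             "Squeezenet-v1.0": 8.90, "Squeezenet-v1.1": 2.79},
    "FP16": {"Resnet 18": 4.52, "Resnet 34": 7.34, "Resnet 50": 13.17,
             "Densenet-121": 7.87, "Inception V3": 12.44, "MobilenetV1": 1.54,
             "Squeezenet-v1.0": 3.28, "Squeezenet-v1.1": 1.31},
}


def faster_path(precision):
    """Map each model present in both tables to the name of the faster path."""
    native, clml = NATIVE[precision], CLML[precision]
    shared = [m for m in native if m in clml]  # VGG rows exist only for native
    return {m: "Native OpenCL" if native[m] < clml[m] else "OpenCLML"
            for m in shared}


for precision in ("FP32", "FP16"):
    for model, winner in faster_path(precision).items():
        print(f"{precision} {model:16s} {winner}")
```

With these values, OpenCLML is faster for all eight shared models in FP16, while in FP32 the two paths split evenly at four models each.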

Reproduce

See the README at https://github.com/dmlc/tvm/tree/master/apps/benchmark for how to reproduce these numbers.
