See the results on the wiki page: https://github.com/dmlc/tvm/wiki/Benchmark
How to Reproduce
To obtain the best performance, we always auto-tune for the specific device and obtain the parameters for the kernels it uses. To make our results easy to reproduce, we release pre-tuned parameters for popular networks on some common devices. TVM downloads the related tuning cache files during compilation.
If you don't have any of the devices listed below, you can still run these scripts: pass the listed device that is most similar to yours as the argument. In general, the performance should still be good.
NVIDIA GPU
Build TVM with LLVM and CUDA enabled (see the TVM installation documentation).
python3 gpu_imagenet_bench.py --model 1080ti
python3 gpu_imagenet_bench.py --model titanx

# For NVIDIA Jetson TX2, you can run the following command directly on the board,
# or use cross compilation and RPC as we do for ARM CPU.
python3 gpu_imagenet_bench.py --model tx2
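To run several of the GPU models listed above in one go, a small Python driver can loop over the `--model` values. This is a minimal sketch, assuming `gpu_imagenet_bench.py` is in the current directory and `python3` is on your PATH; `gpu_bench_command` and `run_all` are hypothetical helpers, and by default the driver only prints the commands (dry run). The optional `target` argument mirrors the `--target rocm` flag used for the AMD GPU run.

```python
import subprocess
from typing import List, Optional


def gpu_bench_command(model: str, target: Optional[str] = None) -> List[str]:
    """Build the argv for one gpu_imagenet_bench.py run (hypothetical helper)."""
    cmd = ["python3", "gpu_imagenet_bench.py", "--model", model]
    if target is not None:
        # e.g. "rocm" for the AMD GPU run; omitted for the NVIDIA runs
        cmd += ["--target", target]
    return cmd


def run_all(models: List[str], target: Optional[str] = None,
            dry_run: bool = True) -> List[List[str]]:
    """Print each benchmark command and, unless dry_run, execute it."""
    commands = [gpu_bench_command(m, target) for m in models]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first benchmark that exits with an error
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_all(["1080ti", "titanx"])
```

Pass `dry_run=False` once the printed commands look right.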
ARM CPU & Mali GPU
For embedded devices, we use TVM's RPC infrastructure to simplify device management. You need it to reproduce the benchmark results.
Note: We use llvm-4.0 in our tuning environment. A mismatch between the LLVM versions used for tuning and deployment can affect performance, so use the same version to reproduce our results.
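Before tuning or deploying, you can check that your local LLVM matches the llvm-4.0 used in our tuning environment. This sketch assumes `llvm-config` is on your PATH (on some distributions it is installed as `llvm-config-4.0`, so the name is a parameter) and that comparing major and minor versions is sufficient; `TUNING_LLVM`, `parse_llvm_version`, `local_llvm_version` and `llvm_matches` are hypothetical names.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# LLVM version used in the tuning environment (llvm-4.0, per the note above)
TUNING_LLVM = (4, 0)


def parse_llvm_version(text: str) -> Tuple[int, int]:
    """Extract (major, minor) from a string such as '4.0.1' or '4.0.0svn'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no LLVM version found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def local_llvm_version(llvm_config: str = "llvm-config") -> Optional[Tuple[int, int]]:
    """Ask llvm-config for the local version; None if it is not installed."""
    if shutil.which(llvm_config) is None:
        return None
    out = subprocess.run([llvm_config, "--version"], capture_output=True,
                         text=True, check=True).stdout
    return parse_llvm_version(out)


def llvm_matches(local: Optional[Tuple[int, int]],
                 expected: Tuple[int, int] = TUNING_LLVM) -> bool:
    """True only if a local version was found and equals the tuning version."""
    return local == expected


if __name__ == "__main__":
    local = local_llvm_version()
    if not llvm_matches(local):
        print(f"warning: local LLVM {local} differs from tuning LLVM {TUNING_LLVM}")
```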
Build TVM with LLVM enabled (see the TVM installation documentation).
Start an RPC Tracker on the host machine
python3 -m tvm.exec.rpc_tracker
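Once the tracker is running, a quick way to confirm it is reachable from a board or another machine is to open a TCP connection to its port. This sketch assumes the tracker listens on port 9190, the port used in the registration commands in this guide; it only checks that something accepts connections on that port, not that it speaks the TVM RPC protocol. `tracker_reachable` is a hypothetical helper.

```python
import socket


def tracker_reachable(host: str, port: int = 9190, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # connection refused, host unreachable, or timed out
        return False


if __name__ == "__main__":
    print(tracker_reachable("127.0.0.1"))
```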
- Register devices with the tracker
For Linux devices
- Build the TVM runtime on your device.
- Register your device with the tracker:
python3 -m tvm.exec.rpc_server --tracker=[HOST_IP]:9190 --key=[DEVICE_KEY]
Replace [HOST_IP] with the IP address of the host machine and [DEVICE_KEY] with the name of the device.
For example, here is the command for an RK3399 board, where 10.77.1.123 is the IP address of the tracker:
python3 -m tvm.exec.rpc_server --tracker=10.77.1.123:9190 --key=rk3399
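If you register many boards, it can help to generate the registration command rather than edit it by hand. A minimal sketch; `rpc_server_command` is a hypothetical helper that only fills in the [HOST_IP], port and [DEVICE_KEY] placeholders of the command above, with the port defaulting to 9190 as in the example.

```python
from typing import List


def rpc_server_command(host_ip: str, device_key: str, port: int = 9190) -> List[str]:
    """argv for registering this device with the tracker at host_ip:port."""
    return [
        "python3", "-m", "tvm.exec.rpc_server",
        f"--tracker={host_ip}:{port}",
        f"--key={device_key}",
    ]


if __name__ == "__main__":
    # Reproduces the RK3399 example command
    print(" ".join(rpc_server_command("10.77.1.123", "rk3399")))
```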
For Android devices
- Build and install the TVM RPC APK on your device. Make sure it passes the Android RPC test; once it does, you already know how to register the device.
- Verify the device registration
You can query all registered devices with:
python3 -m tvm.exec.query_rpc_tracker
You should be able to find your devices under Queue Status. Make sure the registration is correct before going ahead.
In our test environment, a sample output is:
Queue Status
----------------------------------
key          total  free  pending
----------------------------------
mate10pro    1      1     0
p20pro       2      2     0
pixel2       2      2     0
rk3399       2      2     0
rasp3b       8      8     0
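To automate the "make sure the registration is correct" check, you can parse the Queue Status table and list the keys that are missing or have no free device. This sketch assumes the table layout shown above (a key followed by integer total, free and pending columns); other lines are ignored. `parse_queue_status` and `missing_devices` are hypothetical helpers, and `SAMPLE` is just the sample output above.

```python
from typing import Dict, Iterable, List, Tuple

SAMPLE = """\
Queue Status
----------------------------------
key          total  free  pending
----------------------------------
mate10pro    1      1     0
p20pro       2      2     0
pixel2       2      2     0
rk3399       2      2     0
rasp3b       8      8     0
"""


def parse_queue_status(text: str) -> Dict[str, Tuple[int, int, int]]:
    """Map each device key to (total, free, pending)."""
    table: Dict[str, Tuple[int, int, int]] = {}
    for line in text.splitlines():
        fields = line.split()
        # data rows are a key followed by exactly three integer columns
        if len(fields) == 4 and all(f.isdigit() for f in fields[1:]):
            key, total, free, pending = fields
            table[key] = (int(total), int(free), int(pending))
    return table


def missing_devices(table: Dict[str, Tuple[int, int, int]],
                    required: Iterable[str]) -> List[str]:
    """Keys that are not registered or currently have no free device."""
    return [k for k in required if k not in table or table[k][1] == 0]


if __name__ == "__main__":
    print(missing_devices(parse_queue_status(SAMPLE), ["rk3399", "pixel2"]))
```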
- Run the benchmarks
# ARM CPU
python3 arm_cpu_imagenet_bench.py --model rasp3b --rpc-key rasp3b
python3 arm_cpu_imagenet_bench.py --model rk3399 --rpc-key rk3399
python3 arm_cpu_imagenet_bench.py --model pixel2 --rpc-key pixel2
python3 arm_cpu_imagenet_bench.py --model p20pro --rpc-key p20pro
python3 arm_cpu_imagenet_bench.py --model mate10pro --rpc-key mate10pro
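The ARM CPU runs above can be generated from the tracker's free-device counts so that boards that are not registered are skipped rather than left waiting. A minimal sketch: `arm_cpu_commands` and `ARM_CPU_MODELS` are hypothetical names, and the optional `keys` mapping (with the hypothetical key "myboard" in the usage below) covers the case where you pass the most similar listed model for a board registered under a different RPC key.

```python
from typing import List, Mapping, Optional

# --model names from the ARM CPU commands above. In our setup each board is
# registered under an RPC key equal to its model name.
ARM_CPU_MODELS = ("rasp3b", "rk3399", "pixel2", "p20pro", "mate10pro")


def arm_cpu_commands(free: Mapping[str, int],
                     keys: Optional[Mapping[str, str]] = None) -> List[List[str]]:
    """argv lists for every model whose RPC key has at least one free device.

    free: RPC key -> number of free devices, e.g. from the Queue Status table.
    keys: optional model -> RPC key mapping (defaults to key == model).
    """
    keys = dict(keys or {})
    commands = []
    for model in ARM_CPU_MODELS:
        key = keys.get(model, model)
        if free.get(key, 0) > 0:
            commands.append(["python3", "arm_cpu_imagenet_bench.py",
                             "--model", model, "--rpc-key", key])
        else:
            print(f"skipping {model}: no free device under key {key!r}")
    return commands


if __name__ == "__main__":
    # A board registered as "myboard" that is most similar to an RK3399
    for cmd in arm_cpu_commands({"myboard": 1}, keys={"rk3399": "myboard"}):
        print(" ".join(cmd))
```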
# Mali GPU
# NOTE: To make the test environment more stable, we stop the GUI and lock the GPU frequency.
# On the RK3399 board:
sudo /etc/init.d/lightdm stop
sudo -i
echo performance > /sys/class/misc/mali0/device/devfreq/ff9a0000.gpu/governor
# On the host:
python3 mobile_gpu_imagenet_bench.py --model rk3399 --rpc-key rk3399
python3 mobile_gpu_imagenet_bench.py --model rk3399 --rpc-key rk3399 --dtype float16
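The frequency-locking step writes a devfreq governor name into a sysfs file, which is why the commands above switch to a root shell first. The same step can be scripted in Python as sketched below; `set_governor` is a hypothetical helper, the path is the RK3399 Mali node from the commands above (other boards expose it elsewhere), and it must run as root on the board to touch the real sysfs file.

```python
from pathlib import Path

# sysfs node used in the Mali GPU commands above (RK3399); other boards
# expose the devfreq governor under a different path.
RK3399_GPU_GOVERNOR = Path(
    "/sys/class/misc/mali0/device/devfreq/ff9a0000.gpu/governor")


def set_governor(path: Path, governor: str = "performance") -> str:
    """Write the devfreq governor and return the value read back.

    Writing to the real sysfs file requires root privileges.
    """
    path.write_text(governor + "\n")
    return path.read_text().strip()


if __name__ == "__main__":
    print(set_governor(RK3399_GPU_GOVERNOR))
```

Reading the value back confirms that the kernel accepted the governor name.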
AMD GPU
Build TVM with LLVM and ROCm enabled (see the TVM installation documentation).
python3 gpu_imagenet_bench.py --model gfx900 --target rocm