Description
Expected behavior
I installed the latest TVM from source and compiled ResNet-50 by following your tutorial. With the target flag set to "llvm", I observed a speedup from resnet50-v2-7-tvm.tar to resnet50-v2-7_autotuned.tar.
However, with the target flag set to "cuda", the autotuned version is slower than the non-autotuned version on the GPU. Has anyone observed similar behavior before, or am I doing something wrong?
Environment
Ubuntu 20.04 with 3080 Ti
CUDA 11.2 with driver 460.91.03
TVM version: 0.8.dev0
LLVM version: 13.0.0
Steps to reproduce
Steps to generate and test the non-autotuned version:
tvmc compile --target "cuda" --output resnet50-v2-7-tvm-cuda.tar resnet50-v2-7.onnx
tvmc run --device cuda --inputs imagenet_cat.npz --output predictions.npz --print-time --repeat 100 resnet50-v2-7-tvm-cuda.tar
My terminal output for non-autotuned version:
Execution time summary:
mean (ms)   median (ms)   max (ms)   min (ms)   std (ms)
3.4426      3.4623        5.6511     3.0480     0.4118
Steps to generate and test the autotuned version:
tvmc tune --target "cuda" --output resnet50-v2-7-autotuner_records-cuda.json resnet50-v2-7.onnx
tvmc compile --target "cuda" --tuning-records resnet50-v2-7-autotuner_records-cuda.json --output resnet50-v2-7-tvm_autotuned-cuda.tar resnet50-v2-7.onnx
tvmc run --device cuda --inputs imagenet_cat.npz --output predictions.npz --print-time --repeat 100 resnet50-v2-7-tvm_autotuned-cuda.tar
My terminal output for autotuned version:
Execution time summary:
mean (ms)   median (ms)   max (ms)   min (ms)   std (ms)
4.8350      5.0163        7.8554     4.4040     0.5398
Comparing the two summaries, the autotuned build is slower than the non-autotuned one: its mean is 4.8350 ms versus 3.4426 ms, and its median is 5.0163 ms versus 3.4623 ms.
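To put a number on the gap, here is a minimal Python sketch that computes the relative slowdown from the two mean latencies reported above. The variable and function names are only illustrative and are not part of TVM or tvmc.

```python
# Mean latencies in milliseconds, copied from the two tvmc summaries above.
baseline_mean_ms = 3.4426   # non-autotuned CUDA build
autotuned_mean_ms = 4.8350  # autotuned CUDA build


def relative_slowdown(baseline_ms: float, candidate_ms: float) -> float:
    """Return how much slower the candidate is than the baseline, as a fraction."""
    return candidate_ms / baseline_ms - 1.0


slowdown = relative_slowdown(baseline_mean_ms, autotuned_mean_ms)
print(f"autotuned is {slowdown:.1%} slower than non-autotuned")
# prints: autotuned is 40.4% slower than non-autotuned
```

The difference in means (about 1.39 ms) is larger than either reported standard deviation, so it does not look like run-to-run noise.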
I am new to TVM, so my "bug" might be naive. Any help will be greatly appreciated, and thanks in advance! :-)