Hi all,
I’m running image classification inference using quantized ONNX models on a Ryzen 7940HS device (8C/16T, integrated GPU/NPU). According to my understanding and most docs, the expected ordering is:
CPU < GPU < NPU in terms of speed and throughput (images/sec), so the NPU should be fastest,
and the reverse for latency (ms/img).
But in my benchmarks, CPU is often faster than both GPU and NPU, especially with quantized models using AMD’s official RyzenAI reference (AMD GitHub example: https://github.com/amd/RyzenAI-SW/blob/main/tutorial/torchvision_inference/classification.py). Here’s a sample of my results (all numbers for 50 images):
| Model | CPU (ms/img) | CPU (img/s) | GPU (ms/img) | GPU (img/s) | NPU (ms/img) | NPU (img/s) |
|---|---|---|---|---|---|---|
| alexnet_quantized.onnx | 24.57 | 40.70 | 25.21 | 39.66 | 58.25 | 17.17 |
| googlenet_quantized.onnx | 17.67 | 56.58 | 22.43 | 44.58 | 27.48 | 36.39 |
| mobilenetv2_050_quantized.onnx | 10.87 | 92.04 | 9.98 | 100.24 | 13.93 | 71.80 |
| cspresnet50_quantized.onnx | 38.35 | 26.07 | 49.14 | 20.35 | 63.55 | 15.74 |
| squeezenet1_0_quantized.onnx | 7.66 | 130.51 | 10.24 | 97.62 | 12.86 | 77.79 |
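
To make the comparison easier to read, here is a small sketch that takes the ms/img values from the table, derives img/s as 1000 / latency, and reports which device is fastest per model. The helper names (`throughput`, `fastest_device`) are made up for illustration and are not part of AMD's example script.

```python
# Latencies in ms/img, copied from the table above, as (CPU, GPU, NPU).
LATENCY_MS = {
    "alexnet_quantized.onnx": (24.57, 25.21, 58.25),
    "googlenet_quantized.onnx": (17.67, 22.43, 27.48),
    "mobilenetv2_050_quantized.onnx": (10.87, 9.98, 13.93),
    "cspresnet50_quantized.onnx": (38.35, 49.14, 63.55),
    "squeezenet1_0_quantized.onnx": (7.66, 10.24, 12.86),
}
DEVICES = ("CPU", "GPU", "NPU")


def throughput(ms_per_img):
    """Convert a per-image latency in milliseconds to images per second."""
    return 1000.0 / ms_per_img


def fastest_device(model):
    """Return the device name with the lowest latency for the given model."""
    latencies = LATENCY_MS[model]
    return DEVICES[latencies.index(min(latencies))]


if __name__ == "__main__":
    for model, (cpu, gpu, npu) in LATENCY_MS.items():
        # Ratios above 1.0 mean the device is slower than the CPU.
        print(
            f"{model}: fastest={fastest_device(model)}, "
            f"GPU/CPU latency={gpu / cpu:.2f}x, NPU/CPU latency={npu / cpu:.2f}x"
        )
```

On these numbers the CPU is fastest for four of the five models, the GPU is marginally ahead only on mobilenetv2_050, and the NPU is slowest in every row.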
Questions:
Why is my CPU inference often faster than GPU and NPU, even though the expectation is the opposite?
Does the Ryzen 7940HS being an “older” chip (Zen 4, but 2023 release) or its 8C/16T configuration play a significant role in this?
Is there anything I should tune or check in my setup, quantization, or ONNX runtime to get better NPU/GPU results?
Has anyone else had similar experience with RyzenAI on this or similar hardware?
Notes:
I followed AMD’s official quantization/inference code.
All drivers and firmware are up to date.
Let me know if you need more info about my setup!
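
One thing that may be worth ruling out with only 50 images per run is first-run overhead (session setup, model compilation for the accelerator, cache warm-up) being counted in the average. Below is a minimal timing sketch that separates warm-up runs from timed runs. It assumes nothing about the AMD script: `benchmark` and `run_once` are hypothetical names, and `run_once` stands in for whatever performs a single inference in your setup.

```python
import statistics
import time


def benchmark(run_once, n_warmup=5, n_timed=50):
    """Time a zero-argument callable, discarding the warm-up iterations.

    run_once is a placeholder for one inference call; it is an assumption
    of this sketch, not a function from any particular library.
    """
    # Warm-up iterations: executed but not measured.
    for _ in range(n_warmup):
        run_once()

    # Timed iterations: record each call's wall-clock duration in ms.
    samples_ms = []
    for _ in range(n_timed):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    mean_ms = statistics.mean(samples_ms)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(samples_ms),
        "img_per_s": 1000.0 / mean_ms,
    }


if __name__ == "__main__":
    # Dummy workload so the sketch runs on its own; swap in a real inference call.
    result = benchmark(lambda: sum(range(100_000)))
    print(result)
```

If the mean and median differ a lot for one device, a few slow outlier runs are probably skewing the ms/img figure for that device.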