Ryzen 7940HS Inference: Why is CPU Faster than GPU/NPU for Quantized ONNX Models? (AMD RyzenAI) #212

@yashgadodia12

Hi all,
I’m running image classification inference using quantized ONNX models on a Ryzen 7940HS device (8C/16T, integrated GPU and NPU). As I understand it, and according to most docs, the expected ordering is:

  • Speed: CPU < GPU < NPU (the NPU should be fastest, so throughput in images/sec should rise in that order),

  • Latency (ms/img): the reverse, NPU < GPU < CPU.

But in my benchmarks, CPU is often faster than both GPU and NPU, especially with quantized models using AMD’s official RyzenAI reference (AMD GitHub example: https://github.com/amd/RyzenAI-SW/blob/main/tutorial/torchvision_inference/classification.py). Here’s a sample of my results (all numbers for 50 images):

| Model | CPU (ms/img) | CPU (img/s) | GPU (ms/img) | GPU (img/s) | NPU (ms/img) | NPU (img/s) |
|---|---|---|---|---|---|---|
| alexnet_quantized.onnx | 24.57 | 40.70 | 25.21 | 39.66 | 58.25 | 17.17 |
| googlenet_quantized.onnx | 17.67 | 56.58 | 22.43 | 44.58 | 27.48 | 36.39 |
| mobilenetv2_050_quantized.onnx | 10.87 | 92.04 | 9.98 | 100.24 | 13.93 | 71.80 |
| cspresnet50_quantized.onnx | 38.35 | 26.07 | 49.14 | 20.35 | 63.55 | 15.74 |
| squeezenet1_0_quantized.onnx | 7.66 | 130.51 | 10.24 | 97.62 | 12.86 | 77.79 |
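To make the gap easier to read, here is a small sketch that recomputes throughput from the latencies above and expresses GPU and NPU latency relative to CPU. The latencies are copied from the results; the helper names `throughput` and `slowdown_vs_cpu` are made up for illustration and are not part of the AMD example.

```python
# Latencies in ms/img copied from the results above, as (CPU, GPU, NPU).
LATENCY_MS = {
    "alexnet_quantized.onnx": (24.57, 25.21, 58.25),
    "googlenet_quantized.onnx": (17.67, 22.43, 27.48),
    "mobilenetv2_050_quantized.onnx": (10.87, 9.98, 13.93),
    "cspresnet50_quantized.onnx": (38.35, 49.14, 63.55),
    "squeezenet1_0_quantized.onnx": (7.66, 10.24, 12.86),
}


def throughput(ms_per_img: float) -> float:
    """Convert a per-image latency in milliseconds to images per second."""
    return 1000.0 / ms_per_img


def slowdown_vs_cpu(cpu_ms: float, other_ms: float) -> float:
    """Latency ratio relative to CPU; values above 1.0 mean slower than CPU."""
    return other_ms / cpu_ms


for model, (cpu, gpu, npu) in LATENCY_MS.items():
    print(
        f"{model:32s} CPU {throughput(cpu):7.2f} img/s | "
        f"GPU x{slowdown_vs_cpu(cpu, gpu):.2f} | NPU x{slowdown_vs_cpu(cpu, npu):.2f}"
    )
```

A ratio above 1.0 means the device is slower than CPU for that model. Only the GPU run of mobilenetv2_050 comes out below 1.0.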

Questions:

  1. Why is my CPU inference often faster than GPU and NPU, even though the expectation is the opposite?

  2. Does the Ryzen 7940HS being an “older” chip (Zen 4, but 2023 release) or its 8C/16T configuration play a significant role in this?

  3. Is there anything I should tune or check in my setup, quantization, or ONNX runtime to get better NPU/GPU results?

  4. Has anyone else had similar experience with RyzenAI on this or similar hardware?

Notes:

  • I followed AMD’s official quantization/inference code.

  • All drivers and firmware are up to date.

  • Let me know if you need more info about my setup!
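In case it helps anyone reproduce this, below is a minimal sketch of a timing loop that keeps warm-up runs out of the measurement, since first-run costs (model compilation, device initialization) can dominate a 50-image run. This is not the AMD script: `benchmark` and `run_once` are hypothetical names, and `run_once` stands in for whatever performs a single inference. With ONNX Runtime I would expect it to wrap `session.run(...)`, and `session.get_providers()` should show which execution providers the session actually registered.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run_once: Callable[[], object], warmup: int = 5, iters: int = 50) -> Dict[str, float]:
    """Time `run_once` over `iters` calls after `warmup` untimed calls."""
    # Untimed warm-up so one-off setup costs do not skew the average.
    for _ in range(warmup):
        run_once()

    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    mean_ms = statistics.fmean(samples_ms)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(samples_ms),
        # Guard against a zero mean for extremely cheap callables.
        "img_per_s": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }


# Placeholder workload; replace with a real single-image inference call.
stats = benchmark(lambda: sum(range(10_000)), warmup=2, iters=10)
print(stats)
```

Comparing the mean against the median gives a quick hint whether a few slow outlier runs are inflating the per-image number.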
