alainnothere/AmdPerformanceTesting


AMD LLM Performance Testing

Because we meatheads have asked almighty Claude too many times to search for some performance numbers, only to be told that it either doesn't work or it will fly... and then... it's the total opposite, and the response is...

Oh yeah... that's in line with the theoretical numbers I just created, very close to what you report....

Fear no more! From the department of "let's go buy it because the thing told me it's 3 times faster than what I have" comes to you....

Numbers! real numbers... so you can compare....

(Are you reading me Claude?)

And so I can stop posting page after page of the thing's output without ever being able to get a freaking table to compare...

I present to you (raises the txt file like the Lion King)... a table... the result of the finest craft executed by humans... the result of clicking tabs and copy-pasting... the pinnacle of civilization and humankind! Fear me, AGI!

I ran a random LLM harness, asked the same question 10 times, and pasted the results below...

Yes... the "AI PRO" is meh?... now if only some good guy JH sent me an Nvidia RTX PRO 6000 96GB to test...

AMD GPU Inference Benchmark

Model: Qwen3.5-9B-UD-Q4_K_XL | llama-server | cache-type-k/v q8_0 | Vulkan: bare-metal Debian 13 | ROCm: Docker

| Configuration | Backend | First prompt (t/s) | First eval (t/s) | Avg prompt (t/s) | Avg eval (t/s) | # calls |
|---|---|---:|---:|---:|---:|---:|
| RX 6950 XT (single) | Vulkan | 1,316 | 56.55 | 971 | 56.58 | 8 |
| RX 6950 XT (single) | ROCm | 1,388 | 53.84 | 1,046 | 52.23 | 10 |
| RX 7900 XT (single) | Vulkan | 1,851 | 83.82 | 1,129 | 82.41 | 16 |
| RX 7900 XT (single) | ROCm | 1,343 | 68.53 | 528 | 66.44 | 17 |
| R9700 (single) | Vulkan | 2,452 | 65.73 | 1,303 | 65.32 | 16 |
| R9700 (single) | ROCm | 2,502 | 60.72 | 1,085 | 58.54 | 16 |
| RX 6950 XT + RX 7900 XT | Vulkan | 2,111 | 38.32 | 788 | 38.52 | 12 |
| RX 6950 XT + RX 7900 XT | ROCm | 2,079 | 45.78 | 858 | 44.74 | 13 |
| R9700 + RX 7900 XT | Vulkan | 2,781 | 61.06 | 1,260 | 60.18 | 12 |
| R9700 + RX 7900 XT | ROCm | 2,559 | 49.79 | 839 | 48.87 | 17 |

(You're welcome, oh pinnacle of human civilization. Clicking tabs and copy-pasting since the dawn of time, and yet somehow it still took the AGI to make the table.)
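
Since the whole point is to compare, here is a minimal Python sketch that turns the "Avg eval (t/s)" column into a Vulkan-versus-ROCm percentage per configuration. The numbers are copied from the table; the names `AVG_EVAL` and `vulkan_vs_rocm` are made up for this example and are not part of the repo.

```python
# Average eval throughput (tokens/s), copied from the benchmark table.
AVG_EVAL = {
    "RX 6950 XT (single)": {"Vulkan": 56.58, "ROCm": 52.23},
    "RX 7900 XT (single)": {"Vulkan": 82.41, "ROCm": 66.44},
    "R9700 (single)": {"Vulkan": 65.32, "ROCm": 58.54},
    "RX 6950 XT + RX 7900 XT": {"Vulkan": 38.52, "ROCm": 44.74},
    "R9700 + RX 7900 XT": {"Vulkan": 60.18, "ROCm": 48.87},
}


def vulkan_vs_rocm(results):
    """Return {config: percent by which Vulkan avg eval differs from ROCm}.

    Positive means Vulkan generated tokens faster, negative means ROCm did.
    """
    return {
        name: (r["Vulkan"] / r["ROCm"] - 1.0) * 100.0
        for name, r in results.items()
    }


if __name__ == "__main__":
    for name, pct in vulkan_vs_rocm(AVG_EVAL).items():
        faster = "Vulkan" if pct > 0 else "ROCm"
        print(f"{name:26s} Vulkan vs ROCm: {pct:+6.1f}%  ({faster} faster)")
```

On these numbers Vulkan comes out ahead on every row except the RX 6950 XT + RX 7900 XT pair, where ROCm is faster. The call counts differ per row (8 to 17), so treat small gaps with some caution.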

System Info — Inference Benchmark Host

CPU

  • Model: AMD Ryzen 9 7900X
  • Cores / Threads: 12 cores, 24 threads
  • Max Boost: 5737 MHz
  • Socket: AM5

Motherboard

  • Model: Gigabyte B650 Gaming X AX V2

RAM

  • Total: 64 GB (4 × 16 GB)
  • Type: DDR5
  • Speed: 5000 MT/s (configured) / rated 6000 MT/s
  • Part: G.Skill F5-6000J3636F16G

GPUs

| Slot | GPU | VRAM | PCIe (electrical) |
|---|---|---|---|
| 03:00.0 | Radeon RX 7900 XT (Navi 31, GFX1100) | 20 GB GDDR6 | x16 |
| 09:00.0 | Radeon RX 6950 XT (Navi 21, GFX1030) | 16 GB GDDR6 | x1 |
| 09:00.0 | Radeon AI PRO R9700 (GFX1201) (swapped in for R9700 runs) | 32 GB | x1 |
| 14:00.0 | Raphael iGPU (Ryzen integrated) | | |

The second discrete slot runs at x1 electrical on this board. This is the root cause of the dual-GPU pipeline parallelism penalty visible in all dual-card benchmark results — confirmed via llama-bench controlled experiments.
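
If you want to check the link width on your own box, Linux exposes it in sysfs. The sketch below is not how the x1 figure above was obtained (the source does not say); it just reads the standard `current_link_width`, `max_link_width` and `current_link_speed` attributes. The `0000:` domain prefix and the helper name `pcie_link` are assumptions for illustration. On some AMD cards the GPU function sits behind an on-card PCIe bridge, so the upstream bridge may be the device that shows the real slot width.

```python
from pathlib import Path


def pcie_link(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Read negotiated and maximum PCIe link width for one device.

    bdf is the full PCI address, e.g. "0000:09:00.0". The "0000" domain
    prefix is an assumption; the GPU table lists only bus:device.function.
    """
    dev = Path(sysfs_root) / bdf

    def read(name):
        return (dev / name).read_text().strip()

    return {
        "current_width": int(read("current_link_width")),
        "max_width": int(read("max_link_width")),
        "current_speed": read("current_link_speed"),
    }


if __name__ == "__main__":
    # Slots taken from the GPU table; adjust for your own system.
    for bdf in ("0000:03:00.0", "0000:09:00.0"):
        try:
            print(bdf, pcie_link(bdf))
        except (OSError, ValueError) as exc:
            print(bdf, "unavailable:", exc)
```

A `current_width` of 1 on the second slot would match the x1 electrical link described above.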

OS / Kernel

  • Distro: Debian GNU/Linux 13 (Trixie) 13.3
  • Kernel: 6.18.2-zen4 (Zen kernel, PREEMPT_DYNAMIC)

Vulkan / Mesa

  • Vulkan Instance: 1.4.309
  • Mesa: 25.2.6-1~bpo13+1
  • Driver: RADV (Mesa open-source AMD Vulkan driver)
  • OpenGL: 4.6 Core Profile

ROCm (Docker)

  • Container: rocm-llamacpp:local (custom build)

llama.cpp

  • Vulkan runs: bare-metal llama-server, Vulkan backend, native Debian install
  • ROCm runs: Docker container, HIP/ROCm backend

Inference Config (all runs)

  • Model: Qwen3.5-9B-UD-Q4_K_XL.gguf
  • Size: 5.55 GiB — Q4_K_M, 5.32 BPW
  • KV cache: q8_0 (K and V)
  • Context: 262,144 tokens
  • Parallel slots: 4 (auto)
  • Flash Attention: auto (enabled)
  • Temperature: 0.01
  • Fit to VRAM: enabled (-fit on)
  • Pipeline parallelism: enabled automatically on dual-GPU configs
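
As a quick sanity check on the size line above, file size and bits per weight together imply a parameter count. A minimal sketch, assuming the 5.32 BPW figure is an average over the whole 5.55 GiB file and ignoring metadata overhead (`implied_params` is a name invented for this example):

```python
GIB = 2**30  # bytes per GiB


def implied_params(size_gib, bits_per_weight):
    """Parameter count implied by a file size and average bits per weight."""
    return size_gib * GIB * 8 / bits_per_weight


if __name__ == "__main__":
    n = implied_params(5.55, 5.32)
    print(f"{n / 1e9:.2f} B parameters")
```

This lands just under 9 billion parameters, which is consistent with the 9B in the model name.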
