
[BUG] gfx1201 (RDNA4 / Radeon AI PRO R9700) not in AITER arch table — FP8 WMMA silently falls back to FP32 #520

@mohanapgvb

Description


Environment
GPU: AMD Radeon AI PRO R9700 (32GB GDDR6, gfx1201, RDNA4)
ROCm: 7.2.1
OS: Ubuntu 24.04 LTS
Frameworks affected: vLLM, SGLang, ROCm TransformerEngine

Problem

AMD's official product guide advertises "128 AI accelerators with FP8
support" for this card, yet ROCm 7.2.1 silently dequantizes all FP8
weights to FP32 on gfx1201, with no warning emitted. The AI accelerators
perform zero FP8 work, and throughput is ~18-22 tok/s instead of the
expected ~35-40 tok/s.

Root Cause
gfx1201 is missing from the _ARCH_TO_DEVICE table in
aiter/ops/triton/utils/arch_info.py, causing a silent fallback to the
FP32 path.
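The failure mode can be sketched in isolation. The table contents and the lookup helper below are illustrative assumptions, not AITER's actual code; only the absence of a 'gfx1201' key reflects the reported state:

```python
import warnings

# Illustrative stand-in for _ARCH_TO_DEVICE in arch_info.py; the real
# entries may differ. The point: there is no 'gfx1201' key upstream.
_ARCH_TO_DEVICE = {
    "gfx942": "MI300X",  # assumed CDNA entry, for illustration only
    "gfx950": "MI350X",  # assumed CDNA entry, for illustration only
}

def select_kernel_path(arch: str) -> str:
    """Hypothetical dispatch: an unknown arch falls back to FP32 (the bug)."""
    device = _ARCH_TO_DEVICE.get(arch)
    if device is None:
        # Upstream emits no warning here, which is why the regression
        # is silent; a warning like this would at least surface it.
        warnings.warn(f"{arch} not in _ARCH_TO_DEVICE; dequantizing FP8 to FP32")
        return "fp32_fallback"
    return device

print(select_kernel_path("gfx1201"))  # → fp32_fallback
```

Adding the missing key is enough to route gfx1201 onto the FP8 kernel path in this sketch, which is exactly the shape of the fix below.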

Fix (community-validated)
Add the following entry to _ARCH_TO_DEVICE:
'gfx1201': 'MI350X'

RDNA4 uses the same FP8 E4M3FN format as MI350X, so the existing Triton
kernel path works correctly. The change is non-breaking for existing
CDNA deployments.
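For readers unfamiliar with E4M3FN (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits; "FN" = finite-only, so the all-ones encoding is NaN rather than infinity), here is a minimal pure-Python decoder of the format, written for illustration and not taken from any ROCm source:

```python
def decode_e4m3fn(byte: int) -> float:
    """Decode one FP8 E4M3FN byte: 1 sign, 4 exponent (bias 7), 3 mantissa.

    E4M3FN has no infinities; exponent 0b1111 with mantissa 0b111 is NaN,
    which frees the rest of that exponent row for finite values up to 448.
    """
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return float("nan")
    if exp == 0:
        # Subnormal: no implicit leading 1, fixed exponent of -6.
        return sign * (man / 8.0) * 2.0 ** -6
    return sign * (1 + man / 8.0) * 2.0 ** (exp - 7)

print(decode_e4m3fn(0x38))  # → 1.0
print(decode_e4m3fn(0x7E))  # → 448.0, the largest finite E4M3FN value
```

Because both architectures interpret these 8 bits identically, FP8 weights quantized for the MI350X path need no conversion on RDNA4.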

Request: Official ETA for merging this two-line fix into AITER mainline.
