ATOM v0.1.4
First ATOM release in the bi-weekly paired-release cadence with AITER (see cadence proposal). It is also the first ATOM release cut from a release branch; prior versions were tagged directly off main.
Same commit as v0.1.4-rc0 (26f23a0b), with zero delta: no partner issues were filed against the RC during its 3-day soak.
Paired AITER version
ATOM v0.1.4 pairs with AITER v0.1.15. The two releases share commit history and were jointly validated on GSM8K on mi355-gpu-15, with 5/5 models passing.
Commits since v0.1.3
26f23a0 Revert "Qwen3.5-35B-A3B-FP8: GDN decode lossy fast path + fused MRoPE QK (#838)"
f51d7be Revert "Remove qkv 256 tok limitation (#999)"
e3c97b9 Remove qkv 256 tok limitation (#999) ← reverted
4c4ae4f fix(spec_decode): support DP attention with MTP in Deepseek V4 (#1001)
dfb8eda Qwen3.5-35B-A3B-FP8: GDN decode lossy fast path + fused MRoPE QK (#838) ← reverted
6260cc1 Debug 'no such file or directory benchmark_matrix.json' (#994)
11be15e [atom-vllm benchmark] refine model case name (#995)
9f9e97b Debug name 'AWS_P0' is not defined (#991)
Highlights (net effective changes from v0.1.3)
- fix(spec_decode): DP attention with MTP support in DeepSeek-V4 (#1001)
- CI/benchmark fixes: AWS_P0 env, model case names, benchmark_matrix.json (#991 / #994 / #995)
Cherry-pick exclusions (intentional)
Two PRs reverted from v0.1.4 during joint validation:
- #999 (MiniMax qkv 256-token limit): caused a MiniMax-M2.5 server crash ("VllmBackend can only be called once") in joint testing with AITER v0.1.15. Will revisit in v0.1.5.
- #838 (Qwen3.5 GDN + fused MRoPE QK): reverted from the v0.1.4 cycle to keep Kimi-K2.5 accuracy unaffected. Will revisit in v0.1.5.
Joint validation (paired with AITER v0.1.15)
GSM8K 3-shot flexible-extract, MI355X (gfx950), ATOM stack on 26f23a0b + AITER wheel 0.1.15rc0+rocm7.1.manylinux.2.28-cp312:
| Model | Score | Threshold | Result |
|---|---|---|---|
| DeepSeek-R1-0528 (TP=8, fp8 KV) | 0.9537 | 0.94 | PASS |
| MiniMax-M2.5 (TP=2, fp8 KV) | 0.9318 | 0.92 | PASS |
| Qwen3-235B-A22B-FP8 (TP=8, fp8 KV) | 0.8681 | 0.87 | PASS (borderline) |
| GLM-5-FP8 (TP=8, fp8 KV) | 0.9409 | 0.93 | PASS |
| Kimi-K2.5-MXFP4 (TP=4, fp8 KV) | 0.9219 | 0.92 | PASS |
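5/5 PASS by upstream-canonical thresholds. Qwen3-235B-A22B-FP8 scored 0.0019 below its nominal 0.87 threshold and is counted as a borderline pass within GSM8K noise (see Known issues).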
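The sketch below cross-checks the table by converting each score and threshold into a count of correct answers. It assumes every score is correct/1319 (the full GSM8K test split), which these notes do not state. `GSM8K_TEST_SIZE`, `questions_needed` and `margin_in_questions` are hypothetical helpers written for illustration. They are not part of ATOM or its benchmark tooling.

```python
# Hypothetical cross-check of the joint-validation table.
# ASSUMPTION: each score is (correct answers) / 1319, i.e. the full GSM8K
# test split; the release notes do not state the question count.
import math

GSM8K_TEST_SIZE = 1319  # assumed evaluation-set size

# (model, score, threshold) copied from the validation table
RESULTS = [
    ("DeepSeek-R1-0528", 0.9537, 0.94),
    ("MiniMax-M2.5", 0.9318, 0.92),
    ("Qwen3-235B-A22B-FP8", 0.8681, 0.87),
    ("GLM-5-FP8", 0.9409, 0.93),
    ("Kimi-K2.5-MXFP4", 0.9219, 0.92),
]


def questions_needed(threshold: float, n: int = GSM8K_TEST_SIZE) -> int:
    """Smallest number of correct answers whose accuracy reaches the threshold."""
    return math.ceil(threshold * n)


def margin_in_questions(score: float, threshold: float, n: int = GSM8K_TEST_SIZE) -> int:
    """Correct answers above (positive) or below (negative) the threshold."""
    correct = round(score * n)  # recover the integer count from the rounded score
    return correct - questions_needed(threshold, n)


for model, score, threshold in RESULTS:
    print(f"{model:22s} {score:.4f} vs {threshold:.2f}: "
          f"margin {margin_in_questions(score, threshold):+d} questions")
```

Under that assumption, the Qwen3-235B-A22B-FP8 row comes out 3 questions short of 0.87, which matches its borderline flag. Kimi-K2.5-MXFP4 clears its threshold by only 2 questions.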
Container
Production: rocm/atom:rocm7.2.4_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.4 (to be published once the Nightly Docker Release workflow completes; ETA tonight)
Paired-pilot dev pin (immediate use): rocm/atom-dev:atom0.1.4-aiter0.1.15
Both images ship with the paired AITER v0.1.15 wheel pre-installed, along with matching triton 3.6.0 and flydsl 0.1.9.dev599.
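You can confirm the pins above inside either image with a small `importlib.metadata` check. This is a sketch that rests on assumptions. The distribution names `aiter`, `triton` and `flydsl` are guesses, so check `pip list` in the container. The joint validation also used a `0.1.15rc0` AITER wheel, so the expected AITER string may need to match that build instead. `find_mismatches` and `installed_versions` are hypothetical helpers.

```python
# Hypothetical sanity check for the paired stack inside either container.
# ASSUMPTION: the distribution names "aiter", "triton" and "flydsl" are guesses;
# adjust them to whatever `pip list` reports in the image.
from importlib.metadata import PackageNotFoundError, version

EXPECTED_PINS = {
    "aiter": "0.1.15",       # validation used 0.1.15rc0; adjust if needed
    "triton": "3.6.0",
    "flydsl": "0.1.9.dev599",
}


def public_version(v: str) -> str:
    """Drop a PEP 440 local label such as '+rocm7.1' before comparing."""
    return v.split("+", 1)[0]


def find_mismatches(installed: dict, expected: dict) -> dict:
    """Return {name: (found, wanted)} for every pin that is missing or different.

    `installed` maps distribution name -> version string (None if absent).
    """
    mismatches = {}
    for name, wanted in expected.items():
        found = installed.get(name)
        if found is None or public_version(found) != wanted:
            mismatches[name] = (found, wanted)
    return mismatches


def installed_versions(names) -> dict:
    """Look up installed versions via importlib.metadata; None when not installed."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    problems = find_mismatches(installed_versions(EXPECTED_PINS), EXPECTED_PINS)
    for name, (found, wanted) in problems.items():
        print(f"{name}: found {found}, expected {wanted}")
    print("paired stack OK" if not problems else "paired stack mismatch")
```

The check compares only the public version segment, so a local build label such as `+rocm7.1` does not cause a false mismatch.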
Source
git clone --branch v0.1.4 https://github.com/ROCm/ATOM
Known issues
- Qwen3-235B-A22B-FP8 borderline accuracy (0.8681 vs threshold 0.87): a 0.0019 gap, a swing of a few questions and within the GSM8K 3-shot noise band. Tracking for v0.1.5.
- Kimi-K2.5 accuracy is lower than on upstream ATOM HEAD (0.9219 vs 0.9393 upstream-current). Root cause not yet fully bisected; possibly a residual interaction between the #1001 DeepSeek-V4 MTP change and the Kimi MoE path. Above threshold but below the historical baseline; investigation continues in v0.1.5.
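The sketch below puts the two gaps in context, using a binomial standard error as a coarse proxy for GSM8K noise. It assumes 1,319 independent questions per run (the full GSM8K test split), which these notes do not state. `std_error` and `gap_in_std_errors` are illustrative helpers, not part of any ATOM tooling.

```python
# Rough binomial noise estimate for GSM8K accuracy.
# ASSUMPTIONS: 1319 questions (full GSM8K test split) and independent
# per-question outcomes; neither is stated in the release notes.
import math

N = 1319  # assumed number of GSM8K test questions


def std_error(p: float, n: int = N) -> float:
    """Standard error of an accuracy estimate p over n independent questions."""
    return math.sqrt(p * (1.0 - p) / n)


def gap_in_std_errors(a: float, b: float, n: int = N) -> float:
    """Gap between two independent runs, in units of the standard error of their difference."""
    se_diff = math.sqrt(std_error(a, n) ** 2 + std_error(b, n) ** 2)
    return abs(a - b) / se_diff


# Qwen3: one run vs a fixed threshold -> compare against that run's standard error.
qwen_z = (0.87 - 0.8681) / std_error(0.8681)
# Kimi: this release vs upstream HEAD -> two independent runs.
kimi_z = gap_in_std_errors(0.9219, 0.9393)

print(f"Qwen3 gap: {qwen_z:.2f} standard errors")
print(f"Kimi gap:  {kimi_z:.2f} standard errors of the difference")
```

Under these assumptions, the Qwen3 shortfall is about 0.2 standard errors, which supports the noise-band reading. The Kimi gap is roughly 1.8 standard errors of the difference. That is still short of a conventional 2-sigma cutoff, but large enough to justify continued investigation.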
Feedback
- ATOM issues: https://github.com/ROCm/ATOM/issues (tag v0.1.4)
- AITER issues (paired): https://github.com/ROCm/aiter/issues (tag v0.1.15)
- Direct: peng.sun@amd.com, lingpeng.jin@amd.com