@sunway513 released this 06 Jun 20:01
· 117 commits to main since this release

ATOM v0.1.4

This is the first ATOM release in the bi-weekly paired-release cadence with AITER (see cadence proposal). It is also the first ATOM release to use a release-branch workflow; prior versions were tagged directly off main.

Same commit as v0.1.4-rc0 (26f23a0b): zero delta after a 3-day RC soak, with no partner issues filed against the RC.

Paired AITER version

ATOM v0.1.4 pairs with AITER v0.1.15. The two releases share commit history and were jointly validated 5/5 PASS on GSM8K (mi355-gpu-15).

Commits since v0.1.3

26f23a0  Revert "Qwen3.5-35B-A3B-FP8: GDN decode lossy fast path + fused MRoPE QK (#838)"
f51d7be  Revert "Remove qkv 256 tok limitation (#999)"
e3c97b9  Remove qkv 256 tok limitation (#999)             ← reverted
4c4ae4f  fix(spec_decode): support DP attention with MTP in Deepseek V4 (#1001)
dfb8eda  Qwen3.5-35B-A3B-FP8: GDN decode lossy fast path + fused MRoPE QK (#838) ← reverted
6260cc1  Debug 'no such file or directory benchmark_matrix.json' (#994)
11be15e  [atom-vllm benchmark] refine model case name (#995)
9f9e97b  Debug name 'AWS_P0' is not defined (#991)

Highlights (net effective changes from v0.1.3)

  • fix(spec_decode): DP attention with MTP support in DeepSeek-V4 (#1001)
  • CI/benchmark fixes: AWS_P0 env, model case names, benchmark_matrix.json (#991 / #994 / #995)
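
The "net effective" list can be derived mechanically from the commit list by cancelling each revert against the commit it reverts. The following is a minimal sketch of that bookkeeping, not the project's release tooling; the helper names `net_effective` and `pr_numbers` are hypothetical, and it assumes revert subjects follow git's default `Revert "<original subject>"` form, as they do in the list above.

```python
import re

# Commit subjects copied from the "Commits since v0.1.3" list, newest first.
SUBJECTS = [
    'Revert "Qwen3.5-35B-A3B-FP8: GDN decode lossy fast path + fused MRoPE QK (#838)"',
    'Revert "Remove qkv 256 tok limitation (#999)"',
    "Remove qkv 256 tok limitation (#999)",
    "fix(spec_decode): support DP attention with MTP in Deepseek V4 (#1001)",
    "Qwen3.5-35B-A3B-FP8: GDN decode lossy fast path + fused MRoPE QK (#838)",
    "Debug 'no such file or directory benchmark_matrix.json' (#994)",
    "[atom-vllm benchmark] refine model case name (#995)",
    "Debug name 'AWS_P0' is not defined (#991)",
]

REVERT = re.compile(r'^Revert "(.*)"$')


def net_effective(subjects):
    """Drop every revert commit together with the commit it reverts."""
    reverted = [m.group(1) for s in subjects if (m := REVERT.match(s))]
    kept = []
    for subject in subjects:
        if REVERT.match(subject):
            continue  # the revert itself contributes no net change
        if subject in reverted:
            reverted.remove(subject)  # cancel one original per revert
            continue
        kept.append(subject)
    return kept


def pr_numbers(subjects):
    """Extract the trailing "(#NNN)" PR number from each subject."""
    return [int(m.group(1)) for s in subjects if (m := re.search(r"\(#(\d+)\)$", s))]


print(pr_numbers(net_effective(SUBJECTS)))
```

The surviving PR numbers are the four listed in the highlights: #1001, #994, #995, and #991.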

Cherry-pick exclusions (intentional)

Two PRs were reverted from v0.1.4 during joint validation:

  • #999 (MiniMax qkv 256 tok limit): caused a MiniMax-M2.5 server crash in the joint test with AITER v0.1.15 (error: "VllmBackend can only be called once"). Will revisit in v0.1.5.
  • #838 (Qwen3.5 GDN + fused MRoPE QK): reverted from the v0.1.4 cycle to keep Kimi-K2.5 accuracy unaffected. Will revisit in v0.1.5.

Joint validation (paired with AITER v0.1.15)

GSM8K 3-shot flexible-extract, MI355X (gfx950), ATOM stack on 26f23a0b + AITER wheel 0.1.15rc0+rocm7.1.manylinux.2.28-cp312:

Model                               Score   Threshold  Result
DeepSeek-R1-0528 (TP=8, fp8 KV)     0.9537  0.94       PASS
MiniMax-M2.5 (TP=2, fp8 KV)         0.9318  0.92       PASS
Qwen3-235B-A22B-FP8 (TP=8, fp8 KV)  0.8681  0.87       PASS (borderline)
GLM-5-FP8 (TP=8, fp8 KV)            0.9409  0.93       PASS
Kimi-K2.5-MXFP4 (TP=4, fp8 KV)      0.9219  0.92       PASS

5/5 PASS by upstream-canonical thresholds.
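
To see how much headroom each model has, the raw margin (score minus threshold) can be tabulated. This is a minimal sketch using only the numbers reported in the table; the `margins` helper is hypothetical. The notes do not state how the validation harness rounds scores or applies tolerance, so the sketch reports raw differences only and does not attempt to reproduce the PASS verdicts.

```python
# (model, score, threshold) copied from the validation table above.
RESULTS = [
    ("DeepSeek-R1-0528", 0.9537, 0.94),
    ("MiniMax-M2.5", 0.9318, 0.92),
    ("Qwen3-235B-A22B-FP8", 0.8681, 0.87),
    ("GLM-5-FP8", 0.9409, 0.93),
    ("Kimi-K2.5-MXFP4", 0.9219, 0.92),
]


def margins(results):
    """Map each model to score - threshold, rounded to 4 decimal places."""
    return {model: round(score - threshold, 4) for model, score, threshold in results}


# Print a signed margin per model, e.g. "+0.0137" or "-0.0019".
for model, margin in margins(RESULTS).items():
    print(f"{model:<22} {margin:+.4f}")
```

The two smallest margins belong to Qwen3-235B-A22B-FP8 and Kimi-K2.5-MXFP4, the same two models called out under known issues.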

Container

Production: rocm/atom:rocm7.2.4_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.4 (will be published after the Nightly Docker Release workflow completes, ETA tonight)

Paired-pilot dev pin (immediate use): rocm/atom-dev:atom0.1.4-aiter0.1.15

Both ship the paired AITER v0.1.15 wheel pre-installed with matching triton 3.6.0 + flydsl 0.1.9.dev599.
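
A quick way to confirm the pins inside a running container is to compare installed package versions against the values above. This is a sketch under assumptions: the distribution names `triton` and `flydsl` are guesses at how the wheels are registered (check `pip list` in the container), the `find_mismatches` helper is hypothetical, and exact string equality is a simplification that will flag builds carrying a local version suffix.

```python
from importlib import metadata

# Expected pins from the release notes. The distribution names are assumptions.
EXPECTED = {"triton": "3.6.0", "flydsl": "0.1.9.dev599"}


def find_mismatches(expected, lookup=metadata.version):
    """Return {name: (expected, found)} for every pin that does not match.

    `found` is None when the distribution is not installed.
    """
    mismatches = {}
    for name, want in expected.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != want:
            mismatches[name] = (want, found)
    return mismatches


if __name__ == "__main__":
    print(find_mismatches(EXPECTED) or "all pins match")
```

The `lookup` parameter exists only so the check can be exercised without the packages installed; inside the container the default `importlib.metadata.version` is used.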

Source

git clone --branch v0.1.4 https://github.com/ROCm/ATOM

Known issues

  • Qwen3 borderline accuracy (0.8681 vs threshold 0.87): a single-question swing, within the GSM8K 3-shot noise band. Tracking for v0.1.5.
  • Kimi accuracy lower than ATOM HEAD upstream (0.9219 vs upstream-current 0.9393): the root cause has not been fully bisected; it may be a residual interaction between the #1001 DeepSeek-V4 MTP change and the Kimi MoE path. The score is above threshold but below the historical baseline. Investigation will continue in v0.1.5.

Feedback