
AITER v0.1.14.post1


sunway513 released this on 05 Jun 15:10


Patch release on top of v0.1.14 for the vLLM downstream bump. It adds a single cherry-pick, PR #3304 (mla fp8 qh32 seqlen=1 persistent kernel support on gfx950), which is needed for the DSv3.2 FP8 MLA decode path.

It also includes two CI infrastructure commits that let the manylinux_2_28 wheels be rebuilt on the current builders.

What's in it (delta vs v0.1.14)

0f3c58e6e  ci: pull latest install_triton.sh + aiter-release.yaml from main
76d80cd3f  mla: add fp8 qh32 seqlen=1 persistent kernel support on gfx950 (#3304)
[v0.1.14 baseline at bd0534e96]

Validation

  • DeepSeek-R1-0528 (TP=8, kv_cache_dtype=fp8) on MI355X (gfx950): GSM8K 3-shot flexible-extract = 0.9439 (threshold 0.94, PASS).
  • All 6 wheels install cleanly and pass an import aiter check in the rocm/atom torch 2.10 ABI container.
  • vLLM downstream: ABI-compatible with the current rocm/vllm-dev:nightly torch 2.10 path (verified by the build matrix: torch_pin=2.10 for rocm7.0/7.1, torch_pin=2.11 for rocm7.2).

Wheel Matrix

6 wheels for ROCm 7.0 / 7.1 / 7.2 × Python 3.10 / 3.12, manylinux_2_28 ABI. Fat binary covers gfx942 (MI300/MI325X) + gfx950 (MI350/MI355X).

ROCm   Python   torch ABI   Size
7.0    3.10     2.10        ~470 MB
7.0    3.12     2.10        ~470 MB
7.1    3.10     2.10        ~460 MB
7.1    3.12     2.10        ~460 MB
7.2    3.10     2.11        ~455 MB
7.2    3.12     2.11        ~455 MB
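For install scripts, the table above reduces to a small lookup. The sketch below is a minimal POSIX sh illustration; aiter_torch_abi is a hypothetical helper name, not part of AITER, and it encodes only the six ROCm x Python combinations listed in the table, treating anything else (for example Python 3.11 or ROCm 6.x) as unsupported.

```shell
#!/bin/sh
# Hypothetical helper: report the torch ABI a wheel was built against, using
# only the six ROCm x Python combinations listed in the release table.
set -eu

# Usage: aiter_torch_abi ROCM_VERSION PYTHON_VERSION   (e.g. 7.1 3.12)
# Prints the torch ABI version; exits non-zero for combinations not shipped.
aiter_torch_abi() (
  case "$2" in
    3.10|3.12) ;;
    *) echo "no v0.1.14.post1 wheel for Python $2" >&2; exit 1 ;;
  esac
  case "$1" in
    7.0|7.1) echo 2.10 ;;   # built with torch_pin=2.10
    7.2)     echo 2.11 ;;   # built with torch_pin=2.11
    *) echo "no v0.1.14.post1 wheel for ROCm $1" >&2; exit 1 ;;
  esac
)
```

Running the function in a subshell body keeps its early exit from terminating the calling script.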

Install

pip install https://github.com/ROCm/aiter/releases/download/v0.1.14.post1/<wheel-filename>
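The <wheel-filename> placeholder can be filled in programmatically. The sketch below assumes, without confirmation from these notes, that all six assets follow the same naming pattern as the one filename quoted under Known Issues (ROCm 7.1 / Python 3.12); aiter_wheel_url is a hypothetical helper, so check the exact asset name on the release page before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: build a release asset URL by extrapolating the naming
# pattern of the rocm7.1 / cp312 filename quoted in these notes (assumption).
set -eu

BASE_URL=https://github.com/ROCm/aiter/releases/download/v0.1.14.post1

# Usage: aiter_wheel_url ROCM_VERSION PYTHON_VERSION   (e.g. 7.1 3.12)
aiter_wheel_url() (
  rocm=$1
  pytag=cp$(printf '%s' "$2" | tr -d .)   # "3.12" -> "cp312"
  printf '%s/amd_aiter-0.1.14.post1+rocm%s.manylinux.2.28-%s-%s-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl\n' \
    "$BASE_URL" "$rocm" "$pytag" "$pytag"
)

# Example (direct URL install works only with pip 25.x and earlier; see Known Issues):
# pip install "$(aiter_wheel_url 7.1 3.12)"
```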

Known Issues

pip 26.0.1+ wheel filename parser

pip 26.0.1 (and possibly other 26.x releases) rejects this wheel with Invalid wheel filename (wrong number of parts): 'post1'. The combination of .post1 in the public version and .manylinux.2.28 in the local version segment trips up pip's wheel filename parser (the naming convention from PEP 427).
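To decide in a script whether the workaround is needed, the pip version can be read from pip --version, whose output looks like "pip 26.0.1 from /path/to/pip (python 3.12)". This is a minimal sketch under an assumption: it treats every pip with major version 26 or higher as affected, although these notes only confirm 26.0.1. pip_needs_rename is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper. Assumption: any pip with major version >= 26 is affected;
# the notes confirm 26.0.1 only, and 25.x and earlier install from the URL.
set -eu

# Usage: pip_needs_rename "$(python3 -m pip --version)"
# Exit status 0: use the rename workaround. Exit status 1: install from the URL.
pip_needs_rename() (
  set -f                  # disable globbing while splitting the version string
  set -- $1               # "pip 26.0.1 from ... (python 3.12)" -> $2 = "26.0.1"
  major=${2%%.*}          # "26.0.1" -> "26"
  [ "$major" -ge 26 ]
)

# Example:
# if pip_needs_rename "$(python3 -m pip --version)"; then echo "rename the wheel first"; fi
```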

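Workaround: download the wheel and rename it to strip the +rocm7.X.manylinux.2.28 local version segment (the example below also shortens the compressed platform tag to manylinux_2_28_x86_64), then install it: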

# Example for ROCm 7.1 / Python 3.12; substitute the filename of the wheel you need.
wget https://github.com/ROCm/aiter/releases/download/v0.1.14.post1/amd_aiter-0.1.14.post1+rocm7.1.manylinux.2.28-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
mv amd_aiter-0.1.14.post1+rocm7.1.manylinux.2.28-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl \
   amd_aiter-0.1.14.post1-cp312-cp312-manylinux_2_28_x86_64.whl
pip install ./amd_aiter-0.1.14.post1-cp312-cp312-manylinux_2_28_x86_64.whl

With older pip (25.x and earlier), the wheel installs directly from the URL.
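The manual rename can also be scripted for any of the six wheels. The sketch below uses only POSIX parameter expansion and assumes every asset has the name-version+local-tags layout of the example filename; strip_local_segment is a hypothetical helper, not an AITER tool. Like the manual rename, it keeps only the last tag of the compressed platform tag set (manylinux_2_28_x86_64).

```shell
#!/bin/sh
# Hypothetical helper: rebuild a wheel filename without its local version
# segment (+rocm7.X.manylinux.2.28), mirroring the manual mv command.
set -eu

strip_local_segment() (
  wheel=$1
  case $wheel in
    *+*) ;;                                  # local segment present: rewrite below
    *) printf '%s\n' "$wheel"; exit 0 ;;     # nothing to strip; exit leaves only this subshell
  esac
  dist_ver=${wheel%%+*}        # amd_aiter-0.1.14.post1
  rest=${wheel#*+}             # rocm7.1.manylinux.2.28-cp312-cp312-<platform>.whl
  rest=${rest#*-}              # cp312-cp312-<platform>.whl
  py_tag=${rest%%-*};  rest=${rest#*-}
  abi_tag=${rest%%-*}; rest=${rest#*-}
  plat=${rest%.whl}            # manylinux_2_27_x86_64.manylinux_2_28_x86_64
  plat=${plat##*.}             # keep the last platform tag: manylinux_2_28_x86_64
  printf '%s-%s-%s-%s.whl\n' "$dist_ver" "$py_tag" "$abi_tag" "$plat"
)

# Example:
# wheel=amd_aiter-0.1.14.post1+rocm7.1.manylinux.2.28-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
# mv "$wheel" "$(strip_local_segment "$wheel")" && pip install "./$(strip_local_segment "$wheel")"
```

For the ROCm 7.1 / Python 3.12 wheel this yields the same name as the manual mv command, amd_aiter-0.1.14.post1-cp312-cp312-manylinux_2_28_x86_64.whl.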

gpt-oss accuracy

Per Doug Lehr's note (ROCM-25517), gpt-oss accuracy on v0.1.14 was below standard. This is not fixed in v0.1.14.post1: only #3304 was cherry-picked, at Richard Li's request, to unblock the vLLM bump. The gpt-oss fix is targeted for v0.1.15.

#3001 not included

Richard requested "#3001 if not already on the 0.1.14 line", and #3001 was not on the v0.1.14 line. We evaluated cherry-picking it and found that it depends on a chain of 7 PRs, including a 1210-line tuner refactor (#3220). Bringing in the full chain would put the release window at risk and change far more than a post1 patch should. #3001 is therefore skipped in post1 and will land in v0.1.15.

Acknowledgments

  • Richard Li (vLLM team): surfaced the DSv3.2 FP8 MLA gfx950 blocker and the minimum cherry-pick set
  • Alexios Lyrakis: author of the #3304 MLA kernel