AITER v0.1.14.post1
Patch release on top of v0.1.14 for the vLLM downstream bump. Adds one cherry-pick: PR #3304 (mla fp8 qh32 seqlen=1 persistent kernel support on gfx950, the DSv3.2 FP8 MLA decode path).
Also includes two CI infra commits that enable the manylinux_2_28 wheel rebuild on current builders.
What's in it (delta vs v0.1.14)
0f3c58e6e ci: pull latest install_triton.sh + aiter-release.yaml from main
76d80cd3f mla: add fp8 qh32 seqlen=1 persistent kernel support on gfx950 (#3304)
[v0.1.14 baseline at bd0534e96]
Validation
- DeepSeek-R1-0528 (TP=8, kv_cache_dtype=fp8) on MI355X (gfx950): GSM8K 3-shot flexible-extract = 0.9439 (threshold 0.94, PASS).
- All 6 wheels installable + `import aiter` validated on rocm/atom torch 2.10 ABI container.
- vLLM downstream: ABI compatible with current `rocm/vllm-dev:nightly` torch 2.10 path (verified by build matrix torch_pin=2.10 for rocm7.0/7.1, torch_pin=2.11 for rocm7.2).
Wheel Matrix
6 wheels for ROCm 7.0 / 7.1 / 7.2 × Python 3.10 / 3.12, manylinux_2_28 ABI. Fat binary covers gfx942 (MI300/MI325X) + gfx950 (MI350/MI355X).
| ROCm | Python | torch ABI | Size |
|---|---|---|---|
| 7.0 | 3.10 | 2.10 | ~470 MB |
| 7.0 | 3.12 | 2.10 | ~470 MB |
| 7.1 | 3.10 | 2.10 | ~460 MB |
| 7.1 | 3.12 | 2.10 | ~460 MB |
| 7.2 | 3.10 | 2.11 | ~455 MB |
| 7.2 | 3.12 | 2.11 | ~455 MB |
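As a rough illustration of how one cell of the matrix maps to a download URL, the sketch below builds the URL from a ROCm version and a Python version. The helper name `aiter_wheel_url` is hypothetical. The filename pattern is an assumption: it is extrapolated from the single ROCm 7.1 / cp312 filename quoted under Known Issues, so check the release asset list before relying on it for other cells.

```shell
# Sketch only: aiter_wheel_url is a hypothetical helper, not part of AITER.
# ASSUMPTION: every wheel follows the pattern of the one filename quoted in
# these notes (ROCm 7.1, cp312); other cells are extrapolated, not verified.
aiter_wheel_url() (
  rocm="$1"   # ROCm version from the matrix, e.g. 7.1
  py="$2"     # Python version from the matrix, e.g. 3.12
  # 3.12 -> cp312 (CPython tag used for both the python and abi fields)
  tag="cp$(printf '%s' "$py" | tr -d '.')"
  base="https://github.com/ROCm/aiter/releases/download/v0.1.14.post1"
  file="amd_aiter-0.1.14.post1+rocm${rocm}.manylinux.2.28-${tag}-${tag}-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl"
  printf '%s/%s\n' "$base" "$file"
)
```

With this in place, something like `pip install "$(aiter_wheel_url 7.1 3.12)"` should work on pip versions not affected by the filename issue described under Known Issues.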
Install
pip install https://github.com/ROCm/aiter/releases/download/v0.1.14.post1/<wheel-filename>
Known Issues
pip 26.0.1+ wheel filename parser
pip 26.0.1 (and possibly other 26.x versions) rejects this wheel with Invalid wheel filename (wrong number of parts): 'post1'. The combination of .post1 in the public version and .manylinux.2.28 in the local version segment confuses pip's PEP 491 filename parser.
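To decide up front whether an environment is likely affected, a small version check can help. This is a sketch under a stated assumption: only pip 26.0.1 is confirmed to fail, and treating every major version 26 or later as affected is a conservative guess. `pip_needs_rename` is a hypothetical helper name.

```shell
# Sketch only: pip_needs_rename is a hypothetical helper.
# ASSUMPTION: any pip with major version >= 26 is treated as affected;
# these notes only confirm the failure on 26.0.1.
pip_needs_rename() (
  major="${1%%.*}"          # "26.0.1" -> "26"
  [ "$major" -ge 26 ]       # exit status 0 means "rename the wheel first"
)

# Possible usage (pip --version prints "pip <version> from ..."):
#   if pip_needs_rename "$(pip --version | awk '{print $2}')"; then
#     echo "use the rename workaround"
#   fi
```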
Workaround: download the wheel, rename to strip the +rocm7.X.manylinux.2.28 local segment, then install:
wget https://github.com/ROCm/aiter/releases/download/v0.1.14.post1/amd_aiter-0.1.14.post1+rocm7.1.manylinux.2.28-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
mv amd_aiter-0.1.14.post1+rocm7.1.manylinux.2.28-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl \
amd_aiter-0.1.14.post1-cp312-cp312-manylinux_2_28_x86_64.whl
pip install ./amd_aiter-0.1.14.post1-cp312-cp312-manylinux_2_28_x86_64.whl
Older pip versions (<= 25.x) can install the wheel directly from the URL.
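The same rename can be applied to any of the wheels with a small helper. The sketch below mirrors the `mv` command above: it drops the `+...` local version segment and collapses the two-part platform tag to `manylinux_2_28_x86_64`. `normalize_wheel_name` is a hypothetical name, and the other five wheels are assumed to use the same filename layout as the ROCm 7.1 / cp312 one.

```shell
# Sketch only: normalize_wheel_name is a hypothetical helper.
# ASSUMPTION: all wheels share the filename layout of the example above.
normalize_wheel_name() (
  name="$1"
  case "$name" in
    *+*)
      head="${name%%+*}"          # everything before the first "+"
      rest="${name#*+}"           # local segment plus the remaining tags
      name="${head}-${rest#*-}"   # drop the local segment up to the next "-"
      ;;
  esac
  # Collapse the compressed platform tag exactly as the manual mv does.
  printf '%s\n' "$name" |
    sed 's/manylinux_2_27_x86_64\.manylinux_2_28_x86_64/manylinux_2_28_x86_64/'
)

# Possible usage after downloading a wheel into the current directory:
#   whl="amd_aiter-0.1.14.post1+rocm7.1.manylinux.2.28-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl"
#   new="$(normalize_wheel_name "$whl")"
#   mv "$whl" "$new" && pip install "./$new"
```

A name without a local segment passes through unchanged apart from the platform tag collapse, so running the helper twice is harmless.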
gpt-oss accuracy
Per Doug Lehr's note (ROCM-25517), gpt-oss accuracy on v0.1.14 was below standard. This is not fixed in v0.1.14.post1: only #3304 was cherry-picked, per Richard Li's request, to unblock the vLLM bump. The gpt-oss fix is targeted for v0.1.15.
#3001 not included
Richard's request included "#3001 if not already on the 0.1.14 line". #3001 was not on the v0.1.14 line. We evaluated cherry-picking it and found that it depends on a 7-PR chain, including a 1210-line tuner refactor (#3220). Bringing in the full chain would put the release window at risk and change far more than a post1 patch should, so #3001 is skipped here and will land in v0.1.15.
Acknowledgments
- Richard Li (vLLM team) — surfaced the DSv3.2 FP8 MLA gfx950 blocker and the minimum cherry-pick set
- Alexios Lyrakis — author of the #3304 mla kernel