LETS GO AMD!!! by Oseltamivir · Pull Request #1229 · SemiAnalysisAI/InferenceX

Oseltamivir · 2026-04-29T18:20:52Z

Refreshed as #1260

DSV4 ATOM optimizer via custom hand-prompted AI codegen AITER kernel

Builds on ROCm/ATOM#650 to bring DeepSeek-V4 (DSv4) FP4 support to ATOM on MI355X, with runtime AITER performance overlays and custom Triton kernels for sparse attention and indexer operations.

Key changes

- DSv4 ATOM runtime overlay (dsv4_fp4_mi355x_atom.sh): Patches ROCm/ATOM#650 (feat(deepseek_v4): PR1 skeleton, end-to-end inference with triton MoE) at launch to give each request persistent DSv4 KV/compressor/indexer cache slots, unblocking CONC>1 serving. Batches attention projections, mHC, and MoE/FFN layer-by-layer across active requests while keeping sparse attention per-sequence.
- AITER DSv4 perf stack: Assembles a custom AITER build from upstream main (ROCm/aiter@bb4ea92) plus cherry-picked performance PRs:
  - ROCm/aiter#2822: batched MXFP4 GEMM speedup on gfx950
  - ROCm/aiter#2900: MXFP4 scale padding fix for non-256 K
  - ROCm/aiter#2642: TP=4/8 MXFP4 MoE dispatch fix
  - ROCm/aiter#2998: DSv4 sparse indexer (related upstream AITER work)
- Custom Triton kernels (from Oseltamivir/aiter@0923d27):
  - sparse_mqa_sink: DSv4 sparse MQA sink Triton op replacing the Torch fallback in PR650's sparse_attn_v4.py
  - dsv4_indexer: DSv4 Indexer scorer/top-k Triton op with batched API support
  - 4×128 sparse-attn tile (configurable via ATOM_DSV4_AITER_SPARSE_ATTN_* env vars) reducing repeated QK score work for D=512
- Sweep expansion: CONC range expanded from 1-only to 1–8 (1k1k) and 1–4 (8k1k); TP=4 comparison points added to test whether fewer ranks reduce the cross_device_reduce communication bottleneck observed in profiling (~49% of GPU kernel time at TP=8).
- AITER kernel test runner (runners/test_dsv4_aiter_kernels.sh): Standalone test harness for validating the overlaid AITER sparse MQA and indexer Triton kernels on MI355X.
- Benchmark infrastructure: Added benchmark_lib.sh helpers for eval-only benchmark mode and DSv4 evaluation support in backend_request_func.py / benchmark_serving.py.
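
The per-request cache slots mentioned for the runtime overlay can be pictured with a minimal sketch. This is not the overlay's code: `SlotPool`, `acquire`, and `release` are hypothetical names chosen for illustration, and the actual patch applied by dsv4_fp4_mi355x_atom.sh may organize the KV/compressor/indexer state quite differently. The sketch assumes only that each active request holds one slot index for its whole lifetime, which is what lets several requests be served together (CONC>1) without sharing cache state.

```python
class SlotPool:
    """Hypothetical pool of persistent per-request cache slots (illustration only)."""

    def __init__(self, num_slots: int) -> None:
        # Free slot indices, stored so that pop() hands out the lowest index first.
        self._free = list(range(num_slots - 1, -1, -1))
        self._owner: dict[str, int] = {}  # request id -> slot index

    def acquire(self, request_id: str) -> int:
        """Return the request's slot, allocating one on first use."""
        if request_id in self._owner:
            return self._owner[request_id]  # same slot on every decode step
        if not self._free:
            raise RuntimeError("no free cache slots; request must wait")
        slot = self._free.pop()
        self._owner[request_id] = slot
        return slot

    def release(self, request_id: str) -> None:
        """Return the slot to the pool when the request finishes."""
        self._free.append(self._owner.pop(request_id))


pool = SlotPool(num_slots=2)
a = pool.acquire("req-a")
b = pool.acquire("req-b")
assert a != b                      # concurrent requests never share a slot
assert pool.acquire("req-a") == a  # a request keeps its slot across steps
pool.release("req-a")
c = pool.acquire("req-c")          # a finished request's slot is reused
```

In a design like this, the slot index would be used to address per-request rows of preallocated cache tensors, so the batched layers can gather state for all active requests while sparse attention still runs per sequence.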

Related upstream work

github-actions · 2026-04-29T18:21:03Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

github-actions · 2026-05-01T20:53:30Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25232685281
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25232685281

github-actions · 2026-05-01T20:58:42Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25232810140
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25232810140

github-actions · 2026-05-02T00:39:55Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25233612426
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25233612426

github-actions · 2026-05-02T02:56:01Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25239290538
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25239290538

github-actions · 2026-05-02T03:37:21Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25242261206
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25242261206

github-actions · 2026-05-02T03:49:34Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25242851751
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25242851751

github-actions · 2026-05-02T05:06:08Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25243060873
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25243060873

DSV4 ATOM optims

1b7a9f9

Oseltamivir requested a review from a team April 29, 2026 18:20

github-project-automation Bot added this to InferenceMAX Board Apr 29, 2026

Oseltamivir added the sweep-enabled label Apr 29, 2026

claude Bot reviewed Apr 29, 2026

Comment thread perf-changelog.yaml Outdated

Oseltamivir removed the sweep-enabled label Apr 29, 2026

Oseltamivir and others added 2 commits April 29, 2026 15:20

flydsl

25cc815

Merge branch 'main' into DSV4-ATOM

aef733b

Oseltamivir added the sweep-enabled label Apr 29, 2026

higher conc

07f02d2

Oseltamivir requested review from 1am9trash, billishyahao, chunfangamd, seungrokj and yctseng0211 as code owners April 30, 2026 00:03

Oseltamivir and others added 4 commits April 29, 2026 21:09

higher conc optims

21c7d87

Merge branch 'main' into DSV4-ATOM

7a03b84

Optimize DSv4 ATOM profiling and decode batching

ce4cb44

Merge remote-tracking branch 'origin/DSV4-ATOM' into DSV4-ATOM

511adb8

Oseltamivir force-pushed the DSV4-ATOM branch from 2c0a0b0 to 511adb8 Compare April 30, 2026 14:18

Oseltamivir and others added 7 commits April 30, 2026 11:26

Constrain DSv4 ATOM profile window

6f4600a

atom

c3229d3

Merge branch 'main' into DSV4-ATOM

4f4063c

Add DSv4 ATOM TP4 comparison

59ff44e

Rebase DSv4 ATOM overlay on PR650 head

3579961

Retile DSv4 ATOM sparse attention

df0c152

Merge branch 'main' into DSV4-ATOM

b8732a4

Oseltamivir mentioned this pull request May 1, 2026

Dsv4 sparse indexer ROCm/aiter#2998

Open

1 task

eval

f00e5b7

Remove seq-len-configs from amd-master.yaml

6c22ca8

Removed the 'seq-len-configs' section from the YAML configuration.

Oseltamivir and others added 2 commits May 1, 2026 14:17

Pin DSv4 AITER overlay by commit

2ee7ace

Merge branch 'main' into DSV4-ATOM

0064f7f

SemiAnalysisAI deleted a comment from github-actions Bot May 1, 2026

single eval

5b4510a

Oseltamivir added 2 commits May 1, 2026 20:05

fix(atom): route dsv4 through v4 metadata

30c9702

fix(atom): allocate dsv4 cache slots by architecture

57aea52

fix(atom): accept dsv4 architecture in cache guard

d3dfba0

Oseltamivir added 2 commits May 1, 2026 22:07

fix(atom): shorten dsv4 smoke runs

41040bb

chore(atom): rerun shortened dsv4 smoke

7af5fa9

Oseltamivir added sweep-enabled and removed sweep-enabled labels May 2, 2026

Oseltamivir mentioned this pull request May 2, 2026

Clean up DSv4 ATOM AITER PR2998 overlay #1260

Open

1 task

Oseltamivir changed the title ~~DSV4 ATOM optimizer via custom hand prompted AI codegen AITER kernel~~ LETS GO AMD!!! May 2, 2026

Oseltamivir closed this May 2, 2026

github-project-automation Bot moved this to Done in InferenceMAX Board May 2, 2026

Oseltamivir wants to merge 34 commits into main from DSV4-ATOM

3 participants
