[AMD] Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark by seungrokj · Pull Request #1626 · SemiAnalysisAI/InferenceX

seungrokj · 2026-05-31T03:30:50Z

Summary

Add new benchmark config dsv4-fp4-mi355x-atom-dp for DeepSeek-V4-Pro with DP-attention on MI355X using ATOM
Add new script benchmarks/single_node/dsv4_fp4_mi355x_atom_dp.sh with --enable-dp-attention --gpu-memory-utilization 0.85
Image: rocm/atom-dev:nightly_202605301523 (ATOM upstream run 26690241645, 2026-05-30)
Concurrency range: 64–1024 for both ISL 1024 and 8192

Performance vs current InferenceX (dsv4-fp4-mi355x-atom, nightly_202605130853)

ISL	OSL	Conc	InferenceX (tok/s/GPU)	ATOM DP (tok/s/GPU)	Δ%
1024	1024	64	389.30	443.01	+13.8%
1024	1024	128	601.21	774.50	+28.8%
1024	1024	256	880.78	1322.72	+50.2%
1024	1024	512	—	2028.30	—
1024	1024	1024	—	2984.23	—
8192	1024	64	1162.87	1505.66	+29.5%
8192	1024	128	1469.89	2366.74	+61.0%
8192	1024	256	704.73	3404.86	+383.1%
8192	1024	512	—	4196.99	—

Test plan

Verify dsv4_fp4_mi355x_atom_dp.sh starts atom server with --enable-dp-attention --gpu-memory-utilization 0.85
Confirm dsv4-fp4-mi355x-atom-dp config picks up the new script
Run benchmark at conc=64 and conc=256 to confirm throughput matches upstream numbers

🤖 Generated with Claude Code

Note

Medium Risk
Benchmark matrix and ATOM server CLI changes affect throughput sweeps and GPU memory at high concurrency; no production serving or auth paths touched.

Overview
Updates dsv4-fp4-mi355x-atom for DeepSeek-V4-Pro on MI355X: pins rocm/atom:rocm7.2.4_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.3, drops the old day-0/OOM commentary, and splits the fixed-seq sweep into TP8 conc 1–64 (no dp-attn) plus TP8 conc 64–1024 with dp-attn: true for both 1k/1k and 8k/1k.

benchmarks/single_node/dsv4_fp4_mi355x_atom.sh now requires DP_ATTENTION, builds server flags from a PARALLEL_ARGS array (-tp only by default; adds --enable-dp-attention, and --enable-expert-parallel when EP > 1), and always passes --kv_cache_dtype fp8 separately instead of tying expert-parallel to KV cache flags.

Documents the change in perf-changelog.yaml for config key dsv4-fp4-mi355x-atom.

^{Reviewed by Cursor Bugbot for commit 51f749f. Bugbot is set up for automated code reviews on this repo. Configure here.}

Add new benchmark config for DeepSeek-V4-Pro with DP-attention enabled on MI355X using ATOM. Uses image rocm/atom-dev:nightly_202605301523 with --enable-dp-attention and --gpu-memory-utilization 0.85. Concurrency range 64-1024. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-05-31T03:30:57Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

github-actions · 2026-05-31T03:30:57Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

github-actions · 2026-05-31T03:30:57Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

claude · 2026-05-31T03:37:13Z

+- config-keys:
+    - dsv4-fp4-mi355x-atom-dp
+  description:
+    - "Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark; image rocm/atom-dev:nightly_202605301523, concurrency 64-1024"
+  pr-link: TBD


🟡 The new perf-changelog entry uses pr-link: TBD as a placeholder instead of a real PR URL. Every other entry in the file follows the https://github.com/SemiAnalysisAI/InferenceX/pull/N convention (see lines 3313, 3319, 3325, 3331, 3337 directly above). Replace TBD with https://github.com/SemiAnalysisAI/InferenceX/pull/1626 before merging.

Extended reasoning...

What the bug is

perf-changelog.yaml line 3343 adds a new entry whose pr-link field is the literal placeholder string TBD rather than a real PR URL:

- config-keys: - dsv4-fp4-mi355x-atom-dp description: - "Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark; image rocm/atom-dev:nightly_202605301523, concurrency 64-1024" pr-link: TBD

Why this breaks convention

Every other pr-link entry in perf-changelog.yaml uses a real GitHub PR URL of the form https://github.com/SemiAnalysisAI/InferenceX/pull/N. Grepping the file shows exactly one pr-link: TBD — this new line — out of 422 pr-link entries. The four immediately preceding entries (lines 3313, 3319, 3325, 3331, 3337) all use proper URLs, including the one for PR #1616 just above.

Why existing code doesn't prevent it

pr-link is a free-form YAML string field. There appears to be no schema validation requiring it to match a URL pattern, so the placeholder parses fine and would ship to main unchanged.

Impact

Low — the YAML is still valid and the description text still records what was added. The only damage is to changelog hygiene: a reader of perf-changelog.yaml who wants to jump to the PR context for the dsv4-fp4-mi355x-atom-dp entry will see TBD and have to grep git history instead of clicking through, which is the whole point of the pr-link field.

How to fix

Replace TBD on line 3343 with the actual PR URL. The PR number is known (#1626), so:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1626

Step-by-step proof

Open perf-changelog.yaml and look at lines 3331–3343 (the diff context for this PR).

Line 3337: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1616 — the prior entry follows the convention.

Line 3343: pr-link: TBD — the new entry violates the convention.

Grepping pr-link: across the whole file shows 422 occurrences, of which exactly one (line 3343) is TBD; the rest are proper https://github.com/SemiAnalysisAI/InferenceX/pull/N URLs.

The PR being merged is [AMD] Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark #1626 (the title and URL in PR metadata confirm this), so the correct URL is https://github.com/SemiAnalysisAI/InferenceX/pull/1626.

Filing as nit since the file is still parseable and the change still records what happened; it's purely a hygiene/cross-reference fix, but trivially worth doing before the merge button is pressed.

- Consolidate the DP-attention and non-DP search spaces under a single dsv4-fp4-mi355x-atom config key using the stable atom0.1.3 image - Delete the standalone dsv4_fp4_mi355x_atom_dp.sh benchmark script (DP-attention now handled by the shared glm5 script pattern) - Update glm5_fp8_mi355x_atom.sh to support DP_ATTENTION flag via PARALLEL_ARGS, enabling dp-attn and expert-parallel combinations - Update perf-changelog.yaml config-key and image reference accordingly Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ript - dsv4_fp4_mi355x_atom.sh: replace EP string construction with PARALLEL_ARGS array pattern supporting DP_ATTENTION + EP_SIZE combos - glm5_fp8_mi355x_atom.sh: revert PARALLEL_ARGS back to simple -tp/$EP (glm5 does not use dp-attention) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 51f749f. Configure here.}

cursor · 2026-05-31T03:59:39Z

+    else #DP+TP
+        PARALLEL_ARGS=(-tp "$TP" --enable-dp-attention )
+    fi
+fi 


Expert parallel dropped without DP

Medium Severity

When DP_ATTENTION is not true, the server is started with only -tp, so --enable-expert-parallel is never passed even if EP_SIZE is greater than 1. The previous logic enabled expert parallel whenever EP_SIZE exceeded 1, independent of DP-attention.

^{Reviewed by Cursor Bugbot for commit 51f749f. Configure here.}

github-actions · 2026-05-31T09:03:44Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26703436206
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26703436206

seungrokj requested a review from a team May 31, 2026 03:30

seungrokj requested review from 1am9trash, billishyahao, chunfangamd and yctseng0211 as code owners May 31, 2026 03:30

github-project-automation Bot added this to InferenceMAX Board May 31, 2026

perf-changelog: add pr-link for dsv4-fp4-mi355x-atom-dp

8b0bdf7

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cursor Bot reviewed May 31, 2026

View reviewed changes

Comment thread .github/configs/amd-master.yaml

claude Bot reviewed May 31, 2026

View reviewed changes

seungrokj changed the title ~~Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark~~ ]AMD] Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark May 31, 2026

seungrokj changed the title ~~]AMD] Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark~~ [AMD] Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark May 31, 2026

cursor Bot reviewed May 31, 2026

View reviewed changes

Comment thread .github/configs/amd-master.yaml

seungrokj and others added 3 commits May 31, 2026 12:56

glm5_fp8_mi355x_atom.sh: remove trailing whitespace

030ede7

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

glm5_fp8_mi355x_atom.sh: use -tp instead of --tp

51f749f

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cursor Bot reviewed May 31, 2026

View reviewed changes

seungrokj added the AMD label May 31, 2026

functionstackx added the full-sweep-enabled label May 31, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AMD] Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark#1626

[AMD] Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark#1626
seungrokj wants to merge 6 commits into
mainfrom
seungrokj/dsv4-fp4-mi355x-atom-dp

seungrokj commented May 31, 2026 •

edited by cursor Bot

Loading

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

Uh oh!

claude Bot May 31, 2026

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

seungrokj commented May 31, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Performance vs current InferenceX (dsv4-fp4-mi355x-atom, nightly_202605130853)

Test plan

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

Uh oh!

claude Bot May 31, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot May 31, 2026

Choose a reason for hiding this comment

Expert parallel dropped without DP

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

seungrokj commented May 31, 2026 •

edited by cursor Bot

Loading