[AMD] Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark#1626
[AMD] Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark#1626seungrokj wants to merge 6 commits into
Conversation
Add new benchmark config for DeepSeek-V4-Pro with DP-attention enabled on MI355X using ATOM. Uses image rocm/atom-dev:nightly_202605301523 with --enable-dp-attention and --gpu-memory-utilization 0.85. Concurrency range 64-1024. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
2 similar comments
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| - config-keys: | ||
| - dsv4-fp4-mi355x-atom-dp | ||
| description: | ||
| - "Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark; image rocm/atom-dev:nightly_202605301523, concurrency 64-1024" | ||
| pr-link: TBD |
There was a problem hiding this comment.
🟡 The new perf-changelog entry uses pr-link: TBD as a placeholder instead of a real PR URL. Every other entry in the file follows the https://github.com/SemiAnalysisAI/InferenceX/pull/N convention (see lines 3313, 3319, 3325, 3331, 3337 directly above). Replace TBD with https://github.com/SemiAnalysisAI/InferenceX/pull/1626 before merging.
Extended reasoning...
What the bug is
perf-changelog.yaml line 3343 adds a new entry whose pr-link field is the literal placeholder string TBD rather than a real PR URL:
- config-keys:
- dsv4-fp4-mi355x-atom-dp
description:
- "Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark; image rocm/atom-dev:nightly_202605301523, concurrency 64-1024"
pr-link: TBDWhy this breaks convention
Every other pr-link entry in perf-changelog.yaml uses a real GitHub PR URL of the form https://github.com/SemiAnalysisAI/InferenceX/pull/N. Grepping the file shows exactly one pr-link: TBD — this new line — out of 422 pr-link entries. The four immediately preceding entries (lines 3313, 3319, 3325, 3331, 3337) all use proper URLs, including the one for PR #1616 just above.
Why existing code doesn't prevent it
pr-link is a free-form YAML string field. There appears to be no schema validation requiring it to match a URL pattern, so the placeholder parses fine and would ship to main unchanged.
Impact
Low — the YAML is still valid and the description text still records what was added. The only damage is to changelog hygiene: a reader of perf-changelog.yaml who wants to jump to the PR context for the dsv4-fp4-mi355x-atom-dp entry will see TBD and have to grep git history instead of clicking through, which is the whole point of the pr-link field.
How to fix
Replace TBD on line 3343 with the actual PR URL. The PR number is known (#1626), so:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1626Step-by-step proof
- Open
perf-changelog.yamland look at lines 3331–3343 (the diff context for this PR). - Line 3337:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1616— the prior entry follows the convention. - Line 3343:
pr-link: TBD— the new entry violates the convention. - Grepping
pr-link:across the whole file shows 422 occurrences, of which exactly one (line 3343) isTBD; the rest are properhttps://github.com/SemiAnalysisAI/InferenceX/pull/NURLs. - The PR being merged is [AMD] Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark #1626 (the title and URL in PR metadata confirm this), so the correct URL is
https://github.com/SemiAnalysisAI/InferenceX/pull/1626.
Filing as nit since the file is still parseable and the change still records what happened; it's purely a hygiene/cross-reference fix, but trivially worth doing before the merge button is pressed.
- Consolidate the DP-attention and non-DP search spaces under a single dsv4-fp4-mi355x-atom config key using the stable atom0.1.3 image - Delete the standalone dsv4_fp4_mi355x_atom_dp.sh benchmark script (DP-attention now handled by the shared glm5 script pattern) - Update glm5_fp8_mi355x_atom.sh to support DP_ATTENTION flag via PARALLEL_ARGS, enabling dp-attn and expert-parallel combinations - Update perf-changelog.yaml config-key and image reference accordingly Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ript - dsv4_fp4_mi355x_atom.sh: replace EP string construction with PARALLEL_ARGS array pattern supporting DP_ATTENTION + EP_SIZE combos - glm5_fp8_mi355x_atom.sh: revert PARALLEL_ARGS back to simple -tp/$EP (glm5 does not use dp-attention) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 51f749f. Configure here.
| else #DP+TP | ||
| PARALLEL_ARGS=(-tp "$TP" --enable-dp-attention ) | ||
| fi | ||
| fi |
There was a problem hiding this comment.
Expert parallel dropped without DP
Medium Severity
When DP_ATTENTION is not true, the server is started with only -tp, so --enable-expert-parallel is never passed even if EP_SIZE is greater than 1. The previous logic enabled expert parallel whenever EP_SIZE exceeded 1, independent of DP-attention.
Reviewed by Cursor Bugbot for commit 51f749f. Configure here.
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26703436206 |


Summary
dsv4-fp4-mi355x-atom-dpfor DeepSeek-V4-Pro with DP-attention on MI355X using ATOMbenchmarks/single_node/dsv4_fp4_mi355x_atom_dp.shwith--enable-dp-attention --gpu-memory-utilization 0.85rocm/atom-dev:nightly_202605301523(ATOM upstream run 26690241645, 2026-05-30)Performance vs current InferenceX (dsv4-fp4-mi355x-atom, nightly_202605130853)
Test plan
dsv4_fp4_mi355x_atom_dp.shstarts atom server with--enable-dp-attention --gpu-memory-utilization 0.85dsv4-fp4-mi355x-atom-dpconfig picks up the new script🤖 Generated with Claude Code
Note
Medium Risk
Benchmark matrix and ATOM server CLI changes affect throughput sweeps and GPU memory at high concurrency; no production serving or auth paths touched.
Overview
Updates
dsv4-fp4-mi355x-atomfor DeepSeek-V4-Pro on MI355X: pinsrocm/atom:rocm7.2.4_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.3, drops the old day-0/OOM commentary, and splits the fixed-seq sweep into TP8 conc 1–64 (nodp-attn) plus TP8 conc 64–1024 withdp-attn: truefor both 1k/1k and 8k/1k.benchmarks/single_node/dsv4_fp4_mi355x_atom.shnow requiresDP_ATTENTION, builds server flags from aPARALLEL_ARGSarray (-tponly by default; adds--enable-dp-attention, and--enable-expert-parallelwhen EP > 1), and always passes--kv_cache_dtype fp8separately instead of tying expert-parallel to KV cache flags.Documents the change in
perf-changelog.yamlfor config keydsv4-fp4-mi355x-atom.Reviewed by Cursor Bugbot for commit 51f749f. Bugbot is set up for automated code reviews on this repo. Configure here.