Skip to content

[AMD] Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark#1626

Open
seungrokj wants to merge 6 commits into
mainfrom
seungrokj/dsv4-fp4-mi355x-atom-dp
Open

[AMD] Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark#1626
seungrokj wants to merge 6 commits into
mainfrom
seungrokj/dsv4-fp4-mi355x-atom-dp

Conversation

@seungrokj
Copy link
Copy Markdown
Collaborator

@seungrokj seungrokj commented May 31, 2026

Summary

  • Add new benchmark config dsv4-fp4-mi355x-atom-dp for DeepSeek-V4-Pro with DP-attention on MI355X using ATOM
  • Add new script benchmarks/single_node/dsv4_fp4_mi355x_atom_dp.sh with --enable-dp-attention --gpu-memory-utilization 0.85
  • Image: rocm/atom-dev:nightly_202605301523 (ATOM upstream run 26690241645, 2026-05-30)
  • Concurrency range: 64–1024 for both ISL 1024 and 8192

Performance vs current InferenceX (dsv4-fp4-mi355x-atom, nightly_202605130853)

ISL OSL Conc InferenceX (tok/s/GPU) ATOM DP (tok/s/GPU) Δ%
1024 1024 64 389.30 443.01 +13.8%
1024 1024 128 601.21 774.50 +28.8%
1024 1024 256 880.78 1322.72 +50.2%
1024 1024 512 2028.30
1024 1024 1024 2984.23
8192 1024 64 1162.87 1505.66 +29.5%
8192 1024 128 1469.89 2366.74 +61.0%
8192 1024 256 704.73 3404.86 +383.1%
8192 1024 512 4196.99

Test plan

  • Verify dsv4_fp4_mi355x_atom_dp.sh starts atom server with --enable-dp-attention --gpu-memory-utilization 0.85
  • Confirm dsv4-fp4-mi355x-atom-dp config picks up the new script
  • Run benchmark at conc=64 and conc=256 to confirm throughput matches upstream numbers

🤖 Generated with Claude Code


Note

Medium Risk
Benchmark matrix and ATOM server CLI changes affect throughput sweeps and GPU memory at high concurrency; no production serving or auth paths touched.

Overview
Updates dsv4-fp4-mi355x-atom for DeepSeek-V4-Pro on MI355X: pins rocm/atom:rocm7.2.4_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.3, drops the old day-0/OOM commentary, and splits the fixed-seq sweep into TP8 conc 1–64 (no dp-attn) plus TP8 conc 64–1024 with dp-attn: true for both 1k/1k and 8k/1k.

benchmarks/single_node/dsv4_fp4_mi355x_atom.sh now requires DP_ATTENTION, builds server flags from a PARALLEL_ARGS array (-tp only by default; adds --enable-dp-attention, and --enable-expert-parallel when EP > 1), and always passes --kv_cache_dtype fp8 separately instead of tying expert-parallel to KV cache flags.

Documents the change in perf-changelog.yaml for config key dsv4-fp4-mi355x-atom.

Reviewed by Cursor Bugbot for commit 51f749f. Bugbot is set up for automated code reviews on this repo. Configure here.

Add new benchmark config for DeepSeek-V4-Pro with DP-attention
enabled on MI355X using ATOM. Uses image
rocm/atom-dev:nightly_202605301523 with --enable-dp-attention
and --gpu-memory-utilization 0.85. Concurrency range 64-1024.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

2 similar comments
@github-actions
Copy link
Copy Markdown
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@github-actions
Copy link
Copy Markdown
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread .github/configs/amd-master.yaml
Comment thread perf-changelog.yaml Outdated
Comment on lines +3339 to +3343
- config-keys:
- dsv4-fp4-mi355x-atom-dp
description:
- "Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark; image rocm/atom-dev:nightly_202605301523, concurrency 64-1024"
pr-link: TBD
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The new perf-changelog entry uses pr-link: TBD as a placeholder instead of a real PR URL. Every other entry in the file follows the https://github.com/SemiAnalysisAI/InferenceX/pull/N convention (see lines 3313, 3319, 3325, 3331, 3337 directly above). Replace TBD with https://github.com/SemiAnalysisAI/InferenceX/pull/1626 before merging.

Extended reasoning...

What the bug is

perf-changelog.yaml line 3343 adds a new entry whose pr-link field is the literal placeholder string TBD rather than a real PR URL:

- config-keys:
    - dsv4-fp4-mi355x-atom-dp
  description:
    - "Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark; image rocm/atom-dev:nightly_202605301523, concurrency 64-1024"
  pr-link: TBD

Why this breaks convention

Every other pr-link entry in perf-changelog.yaml uses a real GitHub PR URL of the form https://github.com/SemiAnalysisAI/InferenceX/pull/N. Grepping the file shows exactly one pr-link: TBD — this new line — out of 422 pr-link entries. The four immediately preceding entries (lines 3313, 3319, 3325, 3331, 3337) all use proper URLs, including the one for PR #1616 just above.

Why existing code doesn't prevent it

pr-link is a free-form YAML string field. There appears to be no schema validation requiring it to match a URL pattern, so the placeholder parses fine and would ship to main unchanged.

Impact

Low — the YAML is still valid and the description text still records what was added. The only damage is to changelog hygiene: a reader of perf-changelog.yaml who wants to jump to the PR context for the dsv4-fp4-mi355x-atom-dp entry will see TBD and have to grep git history instead of clicking through, which is the whole point of the pr-link field.

How to fix

Replace TBD on line 3343 with the actual PR URL. The PR number is known (#1626), so:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1626

Step-by-step proof

  1. Open perf-changelog.yaml and look at lines 3331–3343 (the diff context for this PR).
  2. Line 3337: pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1616 — the prior entry follows the convention.
  3. Line 3343: pr-link: TBD — the new entry violates the convention.
  4. Grepping pr-link: across the whole file shows 422 occurrences, of which exactly one (line 3343) is TBD; the rest are proper https://github.com/SemiAnalysisAI/InferenceX/pull/N URLs.
  5. The PR being merged is [AMD] Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark #1626 (the title and URL in PR metadata confirm this), so the correct URL is https://github.com/SemiAnalysisAI/InferenceX/pull/1626.

Filing as nit since the file is still parseable and the change still records what happened; it's purely a hygiene/cross-reference fix, but trivially worth doing before the merge button is pressed.

- Consolidate the DP-attention and non-DP search spaces under a single
  dsv4-fp4-mi355x-atom config key using the stable atom0.1.3 image
- Delete the standalone dsv4_fp4_mi355x_atom_dp.sh benchmark script
  (DP-attention now handled by the shared glm5 script pattern)
- Update glm5_fp8_mi355x_atom.sh to support DP_ATTENTION flag via
  PARALLEL_ARGS, enabling dp-attn and expert-parallel combinations
- Update perf-changelog.yaml config-key and image reference accordingly

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@seungrokj seungrokj changed the title Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark ]AMD] Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark May 31, 2026
@seungrokj seungrokj changed the title ]AMD] Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark [AMD] Add DeepSeek-V4-Pro FP4 MI355X ATOM DP-attention benchmark May 31, 2026
Comment thread .github/configs/amd-master.yaml
seungrokj and others added 3 commits May 31, 2026 12:56
…ript

- dsv4_fp4_mi355x_atom.sh: replace EP string construction with
  PARALLEL_ARGS array pattern supporting DP_ATTENTION + EP_SIZE combos
- glm5_fp8_mi355x_atom.sh: revert PARALLEL_ARGS back to simple -tp/$EP
  (glm5 does not use dp-attention)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 51f749f. Configure here.

else #DP+TP
PARALLEL_ARGS=(-tp "$TP" --enable-dp-attention )
fi
fi
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expert parallel dropped without DP

Medium Severity

When DP_ATTENTION is not true, the server is started with only -tp, so --enable-expert-parallel is never passed even if EP_SIZE is greater than 1. The previous logic enabled expert parallel whenever EP_SIZE exceeded 1, independent of DP-attention.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 51f749f. Configure here.

@github-actions
Copy link
Copy Markdown
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants