[AMD/ROCM] atom qwen3.5 fp4 on mi355x by seungrokj · Pull Request #1133 · SemiAnalysisAI/InferenceX

seungrokj · 2026-04-24T06:51:22Z

Summary

Add Qwen3.5-397B-A17B MXFP4 single-node MI355X ATOM benchmark config and launch script
Image: rocm/atom:rocm7.2.2_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.2.post
TP=2 and TP=4, concurrency 4-256 for 1k1k and 8k1k sequence lengths
--gpu-memory-utilization 0.9 set in server launch

Test plan

CI benchmark runs pass for qwen3.5-fp4-mi355x-atom with TP=2 and TP=4

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-04-24T06:51:30Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

claude · 2026-04-24T06:56:18Z

+    - "TP=2 and TP=4, concurrency 4-256 for 1k1k and 8k1k sequence lengths"
+    - "Add --gpu-memory-utilization 0.9 to server launch"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/


🟡 Two metadata issues in the new qwen3.5-fp4-mi355x-atom changelog entry (nit). (1) perf-changelog.yaml:1744 has an empty pr-link ending in /pull/ with no PR number — should be /pull/1133. (2) perf-changelog.yaml:1742 says "TP=2 and TP=4, concurrency 4-256 for 1k1k and 8k1k sequence lengths", but only TP=2 sweeps 4-256; TP=4 actually caps at 16 in amd-master.yaml. Both look like copy-paste from the sibling glm5.1 entry without full adjustment.

Extended reasoning...

1. Empty pr-link

The new changelog entry added at lines 1737-1744 of perf-changelog.yaml ends with:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/

This resolves to the PR index page, not to any specific PR, defeating the traceability purpose of the changelog. Every other entry in the file either uses a real PR number (e.g. the directly adjacent glm5.1-fp4-mi355x-atom entry on line 1735 uses /pull/1043, and the preceding qwen3.5-fp8-mi355x-atom entry on line 1734 uses /pull/1040) or a clear placeholder (XXX, XXXX, TBD). An empty suffix is unique in this file and is the only style that yields a literally broken URL. Fix: replace the trailing slash with /pull/1133. The pydantic ChangelogEntry.pr_link field in utils/matrix_logic/validation.py is typed as plain str and does not validate URL well-formedness, so this passes schema checks silently.

2. Inaccurate TP=4 concurrency range in description

Line 1742 reads:

- "TP=2 and TP=4, concurrency 4-256 for 1k1k and 8k1k sequence lengths"

but the actual search-space added to .github/configs/amd-master.yaml is:

- isl: 1024 osl: 1024 search-space: - { tp: 2, conc-start: 4, conc-end: 256 } - { tp: 4, conc-start: 4, conc-end: 16 } - isl: 8192 osl: 1024 search-space: - { tp: 2, conc-start: 4, conc-end: 256 } - { tp: 4, conc-start: 4, conc-end: 16 }

TP=4 sweeps concurrency 4-16 only (not 4-256). The sentence overstates TP=4 coverage by 16x — it's a direct copy-paste from the sibling glm5.1-fp4-mi355x-atom description (line 1733 in this PR, pr-link 1043), where the glm5.1 config legitimately sweeps TP=4 up to conc 256. Suggested rewording: "TP=2 concurrency 4-256, TP=4 concurrency 4-16, for 1k1k and 8k1k sequence lengths".

Impact and severity

Both issues are documentation/metadata only — the YAML that actually drives benchmarks is internally consistent and the runtime is unaffected. Verifiers unanimously graded both as nit. However, they should be fixed before merge because: (a) the empty pull/ URL is a traceability regression that cannot be detected by current schema validation, and (b) the description actively misleads readers about what the benchmark covers.

Step-by-step proof for issue (2)

amd-master.yaml lines 316-322/317-326 (post-merge) define the qwen3.5-fp4-mi355x-atom search-space: for both isl: 1024/osl: 1024 and isl: 8192/osl: 1024, TP=2 has conc-start: 4, conc-end: 256 and TP=4 has conc-start: 4, conc-end: 16.

The search-space expansion (concurrency multiplies by 2 in the usual 4, 8, 16, 32, ... sweep) therefore yields TP=4 points {4, 8, 16} only — no TP=4 runs at conc 32, 64, 128, or 256.

The description string at line 1742 implies TP=4 is swept identically to TP=2 ("TP=2 and TP=4, concurrency 4-256"), which is false for this config.

Compare with the sibling qwen3.5-fp4-mi355x-sglang entry earlier in the file (no changelog entry in this PR, but identical shape): same TP=2 4-256, TP=4 4-16 split. The source of the copy-paste is the glm5.1 entry (line 1733) whose description reuses the same sentence but whose actual search-space is { tp: 4, conc-start: 4, conc-end: 256 }, making the sentence correct there.

Conclusion: Both items are real metadata defects; severity is nit.

seungrokj · 2026-04-24T08:26:34Z

e2e perf/accuracy
https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24876549878

seungrokj · 2026-04-25T04:41:55Z

@functionstackx @cquil11 @billishyahao @chunfangamd plz approve this

functionstackx · 2026-04-25T05:12:10Z

@functionstackx @cquil11 @billishyahao @chunfangamd plz approve this

@Oseltamivir can u approve and merge this during downtime in between ur dsv4 mi355 runs

[AMD/ROCM] atom qwen3.5 fp4 on mi355x

9bf2a57

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

seungrokj requested a review from a team April 24, 2026 06:51

seungrokj requested review from 1am9trash, billishyahao, chunfangamd and yctseng0211 as code owners April 24, 2026 06:51

github-project-automation Bot added this to InferenceMAX Board Apr 24, 2026

Add PR link to perf-changelog for qwen3.5 fp4 atom config

bb9d95f

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

seungrokj added the AMD label Apr 24, 2026

claude Bot reviewed Apr 24, 2026

View reviewed changes

Merge branch 'main' into srok/atom_qwen3.5_fp4

481c68c

Merge branch 'main' into srok/atom_qwen3.5_fp4

7eb6ad1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AMD/ROCM] atom qwen3.5 fp4 on mi355x#1133

[AMD/ROCM] atom qwen3.5 fp4 on mi355x#1133
seungrokj wants to merge 4 commits intomainfrom
srok/atom_qwen3.5_fp4

seungrokj commented Apr 24, 2026

Uh oh!

github-actions Bot commented Apr 24, 2026

Uh oh!

claude Bot Apr 24, 2026

Uh oh!

seungrokj commented Apr 24, 2026

Uh oh!

seungrokj commented Apr 25, 2026

Uh oh!

functionstackx commented Apr 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

seungrokj commented Apr 24, 2026

Summary

Test plan

Uh oh!

github-actions Bot commented Apr 24, 2026

Uh oh!

claude Bot Apr 24, 2026

Choose a reason for hiding this comment

1. Empty pr-link

2. Inaccurate TP=4 concurrency range in description

Impact and severity

Step-by-step proof for issue (2)

Uh oh!

seungrokj commented Apr 24, 2026

Uh oh!

seungrokj commented Apr 25, 2026

Uh oh!

functionstackx commented Apr 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

1. Empty `pr-link`