Conversation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| - "TP=2 and TP=4, concurrency 4-256 for 1k1k and 8k1k sequence lengths" | ||
| - "Add --gpu-memory-utilization 0.9 to server launch" | ||
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/ |
There was a problem hiding this comment.
🟡 Two metadata issues in the new qwen3.5-fp4-mi355x-atom changelog entry (nit). (1) perf-changelog.yaml:1744 has an empty pr-link ending in /pull/ with no PR number — should be /pull/1133. (2) perf-changelog.yaml:1742 says "TP=2 and TP=4, concurrency 4-256 for 1k1k and 8k1k sequence lengths", but only TP=2 sweeps 4-256; TP=4 actually caps at 16 in amd-master.yaml. Both look like copy-paste from the sibling glm5.1 entry without full adjustment.
Extended reasoning...
1. Empty pr-link
The new changelog entry added at lines 1737-1744 of perf-changelog.yaml ends with:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/This resolves to the PR index page, not to any specific PR, defeating the traceability purpose of the changelog. Every other entry in the file either uses a real PR number (e.g. the directly adjacent glm5.1-fp4-mi355x-atom entry on line 1735 uses /pull/1043, and the preceding qwen3.5-fp8-mi355x-atom entry on line 1734 uses /pull/1040) or a clear placeholder (XXX, XXXX, TBD). An empty suffix is unique in this file and is the only style that yields a literally broken URL. Fix: replace the trailing slash with /pull/1133. The pydantic ChangelogEntry.pr_link field in utils/matrix_logic/validation.py is typed as plain str and does not validate URL well-formedness, so this passes schema checks silently.
2. Inaccurate TP=4 concurrency range in description
Line 1742 reads:
- "TP=2 and TP=4, concurrency 4-256 for 1k1k and 8k1k sequence lengths"but the actual search-space added to .github/configs/amd-master.yaml is:
- isl: 1024
osl: 1024
search-space:
- { tp: 2, conc-start: 4, conc-end: 256 }
- { tp: 4, conc-start: 4, conc-end: 16 }
- isl: 8192
osl: 1024
search-space:
- { tp: 2, conc-start: 4, conc-end: 256 }
- { tp: 4, conc-start: 4, conc-end: 16 }TP=4 sweeps concurrency 4-16 only (not 4-256). The sentence overstates TP=4 coverage by 16x — it's a direct copy-paste from the sibling glm5.1-fp4-mi355x-atom description (line 1733 in this PR, pr-link 1043), where the glm5.1 config legitimately sweeps TP=4 up to conc 256. Suggested rewording: "TP=2 concurrency 4-256, TP=4 concurrency 4-16, for 1k1k and 8k1k sequence lengths".
Impact and severity
Both issues are documentation/metadata only — the YAML that actually drives benchmarks is internally consistent and the runtime is unaffected. Verifiers unanimously graded both as nit. However, they should be fixed before merge because: (a) the empty pull/ URL is a traceability regression that cannot be detected by current schema validation, and (b) the description actively misleads readers about what the benchmark covers.
Step-by-step proof for issue (2)
amd-master.yamllines 316-322/317-326 (post-merge) define theqwen3.5-fp4-mi355x-atomsearch-space: for bothisl: 1024/osl: 1024andisl: 8192/osl: 1024, TP=2 hasconc-start: 4, conc-end: 256and TP=4 hasconc-start: 4, conc-end: 16.- The search-space expansion (concurrency multiplies by 2 in the usual 4, 8, 16, 32, ... sweep) therefore yields TP=4 points
{4, 8, 16}only — no TP=4 runs at conc 32, 64, 128, or 256. - The description string at line 1742 implies TP=4 is swept identically to TP=2 ("TP=2 and TP=4, concurrency 4-256"), which is false for this config.
- Compare with the sibling
qwen3.5-fp4-mi355x-sglangentry earlier in the file (no changelog entry in this PR, but identical shape): same TP=2 4-256, TP=4 4-16 split. The source of the copy-paste is the glm5.1 entry (line 1733) whose description reuses the same sentence but whose actual search-space is{ tp: 4, conc-start: 4, conc-end: 256 }, making the sentence correct there.
Conclusion: Both items are real metadata defects; severity is nit.
|
e2e perf/accuracy |
|
@functionstackx @cquil11 @billishyahao @chunfangamd plz approve this |
@Oseltamivir can u approve and merge this during downtime in between ur dsv4 mi355 runs |
Summary
rocm/atom:rocm7.2.2_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.2.post--gpu-memory-utilization 0.9set in server launchTest plan
qwen3.5-fp4-mi355x-atomwith TP=2 and TP=4🤖 Generated with Claude Code