Conversation
…ization Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
to align with the existing PR #1009 used glm5 instead of glm5.1 (same architecture) |
| - "Image: rocm/atom:rocm7.2.2_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.2.post" | ||
| - "Add TP=4, concurrency 4-256 for 1k1k and 8k1k sequence lengths" | ||
| - "Add --gpu-memory-utilization 0.9 to server launch" | ||
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/ |
There was a problem hiding this comment.
🟡 The new glm5-fp8-mi355x-atom entry in perf-changelog.yaml has pr-link 'https://github.com/SemiAnalysisAI/InferenceX/pull/' with no PR number after the trailing slash. Every other entry uses an actual PR number or a placeholder like 'XXX'/'XXXX'/'TBD', so this reads as a copy-paste omission that yields a broken URL. Please append '1126' (this PR's number).
Extended reasoning...
What the bug is
The final entry added to perf-changelog.yaml (lines 1735-1744) describes the GLM-5 FP8 MI355X ATOM update, but its pr-link field is truncated:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/Nothing follows /pull/ — not a number, not even a placeholder like XXX/XXXX/TBD.
How it manifests / impact
The resulting URL https://github.com/SemiAnalysisAI/InferenceX/pull/ does not resolve to a specific PR. On GitHub, this bare path redirects to the repository pulls listing (or 404s depending on context) rather than to PR #1126, which defeats the purpose of the field: allowing future readers of the changelog to trace a perf delta back to the PR that caused it.
Why existing code doesn't prevent it
perf-changelog.yaml is free-form YAML with no schema validator enforcing that pr-link ends with a numeric PR id. Grep confirms this is the only entry in the file that ends with a bare pull/ — ~20 other entries use XXX/XXXX/TBD placeholders or real numbers. A follow-up commit (8a1620c Fix perf-changelog entry for GLM5 FP8 atom config) patched the config-keys and description fields in this same block but missed the pr-link.
Step-by-step proof
- Open
perf-changelog.yamland scroll to the last entry (lines 1735-1744):- config-keys: - glm5-fp8-mi355x-atom description: - "Update GLM-5 FP8 MI355X ATOM benchmark: ..." ... pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/
- Line 1744 ends with
pull/— no number. - Pasting that URL into a browser does not route to PR [AMD/ROCM] atom glm5 fp8 on mi355x #1126 (this PR).
- Compare with the preceding entry for
glm5.1-fp4-mi355x-atomon line 1732 which correctly points to.../pull/1043.
How to fix
Append 1126 so the line reads:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1126This is a trivial one-line fix. Severity is nit — it is a documentation/metadata field in a changelog, not runtime code, but it is caught during review and easy to correct before merge.
|
hi @functionstackx @cquil11 @billishyahao @chunfangamd can you approve this ? |
@Oseltamivir can u approve and merge this in during idle time between ur mi355 dsv4 runs. Dsv4 mi355 is priority in the queue |
Summary
--gpu-memory-utilization 0.9in server launch scriptTest plan
glm5-fp8-mi355x-atomwith TP=4 and TP=8rocm/atom:rocm7.2.2_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.2.postworks correctly🤖 Generated with Claude Code