Update kimik2.5-fp4-mi355x-vllm vLLM ROCm image to v0.21.0#1407
Conversation
Ref #1154 Co-authored-by: Klaud Cold <Klaud-Cold@users.noreply.github.com>
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
1 similar comment
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
| - config-keys: | ||
| - kimik2.5-fp4-mi355x-vllm | ||
| description: | ||
| - "Update vLLM ROCm image from v0.18.0 to v0.21.0" | ||
| pr-link: XXX |
There was a problem hiding this comment.
🔴 The new perf-changelog entry sets pr-link: XXX instead of a real PR URL. Every other entry in the file uses the format https://github.com/SemiAnalysisAI/InferenceX/pull/<num>, so this should be https://github.com/SemiAnalysisAI/InferenceX/pull/1407 (or the template form /pull/XXX that utils/merge_with_reuse.sh auto-rewrites). The bare XXX will land in main as a malformed entry and also breaks merge_with_reuse.sh's endswith('/$PR') assertion on the auto-resolution path.
Extended reasoning...
What the bug is
The new entry added to perf-changelog.yaml at lines 2495-2501 sets pr-link: XXX — a bare literal placeholder rather than a GitHub PR URL. Every other entry in the file (100+ entries, e.g. lines 2476, 2482, 2488, 2495) follows the convention pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/<num>, and AGENTS.md (around lines 118-123) documents this format explicitly. The PR description references #1407, so the intended value is https://github.com/SemiAnalysisAI/InferenceX/pull/1407.
Why nothing catches it earlier
The Pydantic schema in utils/matrix_logic/validation.py:483 declares pr_link only as a required str — no URL pattern, no prefix check. So a bare XXX passes schema validation and the PR can land as-is.
Concrete impact on merge_with_reuse.sh
Walking through utils/merge_with_reuse.sh for this exact value:
- The auto-resolution path inspects the new block's
pr-link. Withpr-link: XXX, the check"XXX" in linkis True, so the block is treated as a PR-side contribution that needs rewriting. - The next step does
block.replace("/pull/XXX", f"/pull/{pr}")— but the bare stringXXXcontains no/pull/XXXsubstring, so the replacement is a no-op and the literalXXXsurvives unchanged. - The subsequent assertion
last['pr-link'].endswith(f'/${PR}')then evaluates'XXX'.endswith('/1407')→ False, killing the merge script.
So even ignoring the cosmetic/data-quality issue, this bare placeholder actively breaks the documented auto-rebase tooling.
Downstream tooling
Beyond merge_with_reuse.sh, anything that parses pr-link to render a hyperlink (changelog rendering, dashboards) will produce a broken or non-clickable entry, since XXX is not a URL.
How to fix
Replace the bare XXX with the real PR URL:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1407Alternatively, use the documented template form https://github.com/SemiAnalysisAI/InferenceX/pull/XXX so that merge_with_reuse.sh can substitute the real PR number at merge time.
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25956503208 |
|
/reuse-sweep-run |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25976976970 |
Summary
kimik2.5-fp4-mi355x-vllmfrom v0.18.0 to v0.21.0.Ref #1154
Generated with Claude Code