Update gptoss-fp4-b200-vllm vLLM image to v0.21.0#1394
Conversation
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
2 similar comments
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25956494012 |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25956494072 |
| - gptoss-fp4-b200-vllm | ||
| description: | ||
| - "Update vLLM image from v0.15.1 to v0.20.2" | ||
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1333 |
There was a problem hiding this comment.
🟡 The new perf-changelog.yaml entry for the gptoss-fp4-b200-vllm vLLM image update has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1333, but this PR is #1394. This looks like a copy-paste error — update the link to /pull/1394 so the changelog correctly attributes the change.
Extended reasoning...
What the bug is. The diff adds a new entry to perf-changelog.yaml (lines 2458–2462) documenting the gptoss-fp4-b200-vllm image update, but its pr-link is https://github.com/SemiAnalysisAI/InferenceX/pull/1333. This PR is #1394, so the changelog entry permanently points at the wrong PR.
The code path that triggers it. The added block is:
- config-keys:
- gptoss-fp4-b200-vllm
description:
- "Update vLLM image from v0.15.1 to v0.20.2"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1333Why existing convention doesn't prevent it. Every neighboring entry in perf-changelog.yaml follows the convention that the pr-link references the PR that introduces the change — e.g. the immediately-preceding entry uses /pull/1329 and the following glm5-fp4-b200-sglang-mtp entry uses its own PR number. pr-link is free-form text in the YAML, so nothing automated will reject #1333 even though it's the wrong PR.
Impact. Once merged, the changelog entry will forever point at PR #1333 instead of the actual PR that landed the v0.15.1 → v0.20.2 update. This breaks the traceability/attribution that the changelog exists to provide: anyone clicking through to investigate this image bump (regression hunt, performance comparison, audit of which versions were tested when) will land on an unrelated PR. It also makes future cherry-picking, reverting, or bisecting harder because the documented PR doesn't actually contain the change. There is no runtime impact — this is a metadata/documentation accuracy issue.
How to fix. Change line 2462 from:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1333
to:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1394
Step-by-step proof.
- The PR metadata states this is PR Update gptoss-fp4-b200-vllm vLLM image to v0.21.0 #1394 (
<pr number="1394">). - The diff in
perf-changelog.yamladds exactly one new entry, withpr-linkending in/pull/1333. - Every surrounding entry in the file uses a distinct PR number matching the PR that introduced it (e.g. preceding entry uses
#1329, and recent merged work like the GLM-5 fp8 change (#1375) uses#1375). - Therefore the convention dictates this new entry should reference
#1394, not#1333. #1333is unrelated to this image bump, so leaving it as-is misattributes the change in the changelog history.
Note: the diff also shows the YAML image being changed to v0.20.2 while the PR title says v0.21.0 — that's a separate concern; this report only covers the wrong pr-link.
|
|
||
| gptoss-fp4-b200-vllm: | ||
| image: vllm/vllm-openai:v0.15.1 | ||
| image: vllm/vllm-openai:v0.20.2 |
There was a problem hiding this comment.
🟡 The PR title and description say the vLLM image is being updated to v0.21.0, but the actual YAML change in .github/configs/nvidia-master.yaml (line 3972) bumps it to vllm/vllm-openai:v0.20.2, and the perf-changelog.yaml entry also says v0.20.2. Please reconcile by either bumping the image tag to v0.21.0 (matching the title/description) or updating the PR title/description/Ref to v0.20.2.
Extended reasoning...
What the bug is
The PR title is "Update gptoss-fp4-b200-vllm vLLM image to v0.21.0" and the description states it "Updates the vLLM image tag for gptoss-fp4-b200-vllm from v0.15.1 to v0.21.0". However, the actual diff updates the image to a different version:
.github/configs/nvidia-master.yamlline 3972:image: vllm/vllm-openai:v0.20.2perf-changelog.yamlline 2460:"Update vLLM image from v0.15.1 to v0.20.2"
The two code-side artifacts are internally consistent at v0.20.2, but they disagree with the PR metadata (title + description), which both claim v0.21.0.
Step-by-step proof
- Read the PR title:
Update gptoss-fp4-b200-vllm vLLM image to v0.21.0→ claims target = v0.21.0. - Read the PR description:
Updates the vLLM image tag for gptoss-fp4-b200-vllm from v0.15.1 to v0.21.0→ claims target = v0.21.0. - Read the diff hunk in
.github/configs/nvidia-master.yaml:Actual new tag = v0.20.2 (not v0.21.0).gptoss-fp4-b200-vllm: - image: vllm/vllm-openai:v0.15.1 + image: vllm/vllm-openai:v0.20.2
- Read the diff hunk in
perf-changelog.yaml:Changelog also reports v0.20.2.- config-keys: - gptoss-fp4-b200-vllm description: - "Update vLLM image from v0.15.1 to v0.20.2"
- Result: when this PR merges, the deployed/benchmarked image will be
vllm/vllm-openai:v0.20.2, notv0.21.0as the title claims.
Impact
The code side is internally consistent (YAML + changelog both v0.20.2), so runtime behavior is well-defined — this is a metadata/communication defect, not a runtime defect. However, this PR carries the full-sweep-enabled label, which triggers a full performance-benchmark sweep for this config. Reviewers and downstream consumers reading the title or changelog/release notes will be misled about which vLLM version was actually benchmarked, and the perf delta will be attributed to the wrong version. It also makes the linked tracking issue (Ref #1154) ambiguous about which version was actually shipped.
How to fix
The author needs to decide which side is canonical and align the other:
- If v0.21.0 was the intent: change
.github/configs/nvidia-master.yaml:3972toimage: vllm/vllm-openai:v0.21.0and update theperf-changelog.yamldescription string to"Update vLLM image from v0.15.1 to v0.21.0". - If v0.20.2 was the intent: update the PR title to
Update gptoss-fp4-b200-vllm vLLM image to v0.20.2and amend the description accordingly. The code does not need to change in this case.
Severity rationale
Filing as nit because the YAML and changelog are internally consistent — the deployment will not be broken, only the human-facing metadata is wrong. But it should be reconciled before merge so the benchmark results are correctly attributed.
…-authored-by: Klaud Cold <Klaud-Cold@users.noreply.github.com>
9748d74 to
2813d07
Compare
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25983977548 |
# Conflicts: # perf-changelog.yaml
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25984515141 |
The earlier rebase silently dropped trailing whitespace from two unrelated entries (PRs #1311, #1322). The 'no deletions in perf-changelog' policy treats whitespace changes as deletions and failed setup. Rebuild perf-changelog by checking out main's exact bytes and re-appending only the PR #1394 entry.
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25984551207 |
|
/reuse-sweep-run |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25998515577 |
Summary
gptoss-fp4-b200-vllmfrom v0.15.1 to v0.21.0.Ref #1154
Generated with Claude Code