Update gptoss-fp4-b200-vllm vLLM image to v0.21.0 by Klaud-Cold · Pull Request #1394 · SemiAnalysisAI/InferenceX

Klaud-Cold · 2026-05-16T07:45:19Z

Summary

Updates the vLLM image tag for gptoss-fp4-b200-vllm from v0.15.1 to v0.21.0.

github-actions · 2026-05-16T07:45:26Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

github-actions · 2026-05-16T07:45:49Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25956494012
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25956494012

github-actions · 2026-05-16T07:46:11Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25956494072
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25956494072

claude · 2026-05-16T07:48:39Z

+    - gptoss-fp4-b200-vllm
+  description:
+    - "Update vLLM image from v0.15.1 to v0.20.2"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1333


🟡 The new perf-changelog.yaml entry for the gptoss-fp4-b200-vllm vLLM image update has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1333, but this PR is #1394. This looks like a copy-paste error — update the link to /pull/1394 so the changelog correctly attributes the change.

Extended reasoning...

What the bug is. The diff adds a new entry to perf-changelog.yaml (lines 2458–2462) documenting the gptoss-fp4-b200-vllm image update, but its pr-link is https://github.com/SemiAnalysisAI/InferenceX/pull/1333. This PR is #1394, so the changelog entry permanently points at the wrong PR.

The code path that triggers it. The added block is:

- config-keys:
    - gptoss-fp4-b200-vllm
  description:
    - "Update vLLM image from v0.15.1 to v0.20.2"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1333

Why existing convention doesn't prevent it. Every neighboring entry in perf-changelog.yaml follows the convention that the pr-link references the PR that introduces the change — e.g. the immediately-preceding entry uses /pull/1329 and the following glm5-fp4-b200-sglang-mtp entry uses its own PR number. pr-link is free-form text in the YAML, so nothing automated will reject #1333 even though it's the wrong PR.

Impact. Once merged, the changelog entry will forever point at PR #1333 instead of the actual PR that landed the v0.15.1 → v0.20.2 update. This breaks the traceability/attribution that the changelog exists to provide: anyone clicking through to investigate this image bump (regression hunt, performance comparison, audit of which versions were tested when) will land on an unrelated PR. It also makes future cherry-picking, reverting, or bisecting harder because the documented PR doesn't actually contain the change. There is no runtime impact — this is a metadata/documentation accuracy issue.

How to fix. Change line 2462 from:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1333

to:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1394

Step-by-step proof.

The PR metadata identifies this as PR #1394, titled "Update gptoss-fp4-b200-vllm vLLM image to v0.21.0".

The diff in perf-changelog.yaml adds exactly one new entry, with pr-link ending in /pull/1333.

Every surrounding entry in the file uses a distinct PR number matching the PR that introduced it (e.g. preceding entry uses #1329, and recent merged work like the GLM-5 fp8 change (#1375) uses #1375).

Therefore the convention dictates this new entry should reference #1394, not #1333.

#1333 is unrelated to this image bump, so leaving it as-is misattributes the change in the changelog history.

Note: the diff also shows the YAML image being changed to v0.20.2 while the PR title says v0.21.0 — that's a separate concern; this report only covers the wrong pr-link.
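Because pr-link is free-form text, a mismatch like this one can only be caught by an explicit check. The following is a minimal sketch of such a check, not something the repository is known to run: the function name `mismatched_pr_links` and the idea of feeding it the added changelog text plus the PR number are assumptions made for illustration. It uses a regular expression rather than a YAML parser so that it stays dependency-free.

```python
import re

# Matches a "pr-link: <url>" line anywhere in the added changelog text.
PR_LINK_RE = re.compile(r"^\s*pr-link:\s*(\S+)\s*$", re.MULTILINE)


def mismatched_pr_links(added_yaml: str, repo: str, pr_number: int) -> list[str]:
    """Return every pr-link in the added text that does not point at this PR.

    Hypothetical helper: `added_yaml` is assumed to hold only the lines
    that the PR adds to perf-changelog.yaml.
    """
    expected = f"https://github.com/{repo}/pull/{pr_number}"
    return [link for link in PR_LINK_RE.findall(added_yaml) if link != expected]


# The entry quoted in the review above, checked against PR #1394.
added = """\
- config-keys:
    - gptoss-fp4-b200-vllm
  description:
    - "Update vLLM image from v0.15.1 to v0.20.2"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1333
"""
bad = mismatched_pr_links(added, "SemiAnalysisAI/InferenceX", 1394)
print(bad)
```

An empty result would mean every added entry links back to the PR that introduces it; here the list contains the /pull/1333 URL flagged in the review.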

claude · 2026-05-16T07:48:39Z


 gptoss-fp4-b200-vllm:
-  image: vllm/vllm-openai:v0.15.1
+  image: vllm/vllm-openai:v0.20.2


🟡 The PR title and description say the vLLM image is being updated to v0.21.0, but the actual YAML change in .github/configs/nvidia-master.yaml (line 3972) bumps it to vllm/vllm-openai:v0.20.2, and the perf-changelog.yaml entry also says v0.20.2. Please reconcile by either bumping the image tag to v0.21.0 (matching the title/description) or updating the PR title/description/Ref to v0.20.2.

Extended reasoning...

What the bug is

The PR title is "Update gptoss-fp4-b200-vllm vLLM image to v0.21.0" and the description states it "Updates the vLLM image tag for gptoss-fp4-b200-vllm from v0.15.1 to v0.21.0". However, the actual diff updates the image to a different version:

.github/configs/nvidia-master.yaml line 3972: image: vllm/vllm-openai:v0.20.2

perf-changelog.yaml line 2460: "Update vLLM image from v0.15.1 to v0.20.2"

The two code-side artifacts are internally consistent at v0.20.2, but they disagree with the PR metadata (title + description), which both claim v0.21.0.

Step-by-step proof

Read the PR title: Update gptoss-fp4-b200-vllm vLLM image to v0.21.0 → claims target = v0.21.0.

Read the PR description: Updates the vLLM image tag for gptoss-fp4-b200-vllm from v0.15.1 to v0.21.0 → claims target = v0.21.0.

Read the diff hunk in .github/configs/nvidia-master.yaml:
 gptoss-fp4-b200-vllm:
-  image: vllm/vllm-openai:v0.15.1
+  image: vllm/vllm-openai:v0.20.2
Actual new tag = v0.20.2 (not v0.21.0).

Read the diff hunk in perf-changelog.yaml:
- config-keys:
    - gptoss-fp4-b200-vllm
  description:
    - "Update vLLM image from v0.15.1 to v0.20.2"
Changelog also reports v0.20.2.

Result: when this PR merges, the deployed/benchmarked image will be vllm/vllm-openai:v0.20.2, not v0.21.0 as the title claims.

Impact

The code side is internally consistent (YAML + changelog both v0.20.2), so runtime behavior is well-defined — this is a metadata/communication defect, not a runtime defect. However, this PR carries the full-sweep-enabled label, which triggers a full performance-benchmark sweep for this config. Reviewers and downstream consumers reading the title or changelog/release notes will be misled about which vLLM version was actually benchmarked, and the perf delta will be attributed to the wrong version. It also makes the linked tracking issue (Ref #1154) ambiguous about which version was actually shipped.

How to fix

The author needs to decide which side is canonical and align the other:

If v0.21.0 was the intent: change .github/configs/nvidia-master.yaml:3972 to image: vllm/vllm-openai:v0.21.0 and update the perf-changelog.yaml description string to "Update vLLM image from v0.15.1 to v0.21.0".

If v0.20.2 was the intent: update the PR title to Update gptoss-fp4-b200-vllm vLLM image to v0.20.2 and amend the description accordingly. The code does not need to change in this case.

Severity rationale

Filing as nit because the YAML and changelog are internally consistent — the deployment will not be broken, only the human-facing metadata is wrong. But it should be reconciled before merge so the benchmark results are correctly attributed.
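The comparison in the step-by-step proof can be expressed as a small consistency check. This is a sketch under stated assumptions: `target_versions` and `is_consistent` are hypothetical helpers, and the rule "the last vX.Y.Z in the title or description is the target version" is an assumption that happens to fit the strings quoted in this review.

```python
import re

# Semantic-version-style tags such as v0.20.2.
VERSION_RE = re.compile(r"v\d+\.\d+\.\d+")


def target_versions(title: str, image: str, changelog_description: str) -> dict[str, str]:
    """Extract the target version claimed by each artifact (hypothetical helper)."""
    return {
        # Assumption: the last version mentioned in the title is the target.
        "title": VERSION_RE.findall(title)[-1],
        # The image tag is whatever follows the final colon.
        "image": image.rsplit(":", 1)[1],
        # "from A to B": the last version is the target.
        "changelog": VERSION_RE.findall(changelog_description)[-1],
    }


def is_consistent(versions: dict[str, str]) -> bool:
    """True when every artifact names the same target version."""
    return len(set(versions.values())) == 1


versions = target_versions(
    title="Update gptoss-fp4-b200-vllm vLLM image to v0.21.0",
    image="vllm/vllm-openai:v0.20.2",
    changelog_description="Update vLLM image from v0.15.1 to v0.20.2",
)
print(versions, is_consistent(versions))
```

For the strings in this PR the title yields v0.21.0 while the image and changelog both yield v0.20.2, so the check reports an inconsistency, which matches the finding above.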

github-actions · 2026-05-17T06:54:14Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25983977548
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25983977548

# Conflicts: # perf-changelog.yaml

github-actions · 2026-05-17T07:21:41Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25984515141
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25984515141

The earlier rebase silently dropped trailing whitespace from two unrelated entries (PRs #1311, #1322). The 'no deletions in perf-changelog' policy treats whitespace changes as deletions and failed setup. Rebuild perf-changelog by checking out main's exact bytes and re-appending only the PR #1394 entry.
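The policy's actual implementation is not shown in this thread. As a rough illustration of why a whitespace-only change counts as a deletion, the sketch below models the rule as a byte-level prefix check: the PR's file must begin with main's exact bytes. The helpers `is_append_only` and `rebuild`, the sample entries, and the assumption that new entries always go at the end of the file are all hypothetical.

```python
def is_append_only(base: bytes, head: bytes) -> bool:
    """True when `head` keeps every byte of `base` and only adds data after it."""
    return head.startswith(base)


def rebuild(base: bytes, new_entry: bytes) -> bytes:
    """Start from the base file's exact bytes and append only the new entry."""
    separator = b"" if base.endswith(b"\n") else b"\n"
    return base + separator + new_entry


# Hypothetical file contents: the base entry ends with a trailing space.
base = b"- pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1311 \n"
entry = b"- pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1394\n"

# A rebase that strips trailing whitespace changes bytes inside the base region.
stripped = base.replace(b" \n", b"\n") + entry

print(is_append_only(base, stripped))              # whitespace loss is detected
print(is_append_only(base, rebuild(base, entry)))  # rebuilt file passes
```

Rebuilding from the base bytes, as the commit message describes, sidesteps the problem because nothing inside the existing region is ever rewritten.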

github-actions · 2026-05-17T13:43:15Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25984551207
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25984551207

functionstackx · 2026-05-17T18:01:50Z

/reuse-sweep-run

github-actions · 2026-05-17T18:02:38Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25998515577
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25998515577

Klaud-Cold requested a review from a team May 16, 2026 07:45

Klaud-Cold added the full-sweep-enabled label May 16, 2026

Klaud-Cold requested review from jgangani and kedarpotdar-nv as code owners May 16, 2026 07:45

github-project-automation Bot added this to InferenceMAX Board May 16, 2026

Klaud-Cold mentioned this pull request May 16, 2026

[Auto] Docker Image Updates Available - 2026-04-25 #1154

Open

claude Bot reviewed May 16, 2026

View reviewed changes

Update gptoss-fp4-b200-vllm vLLM image to v0.20.2

Ref #1154

Co…

2813d07

…-authored-by: Klaud Cold <Klaud-Cold@users.noreply.github.com>

functionstackx force-pushed the claude/issue-1154-gptoss-fp4-b200-vllm branch from 9748d74 to 2813d07 Compare May 17, 2026 06:53

Merge remote-tracking branch 'origin/main' into HEAD

1c05fb0

# Conflicts: # perf-changelog.yaml

Merge branch 'main' into claude/issue-1154-gptoss-fp4-b200-vllm

0413184

functionstackx merged commit 958c62d into main May 17, 2026
3 of 5 checks passed

functionstackx deleted the claude/issue-1154-gptoss-fp4-b200-vllm branch May 17, 2026 18:02

github-project-automation Bot moved this to Done in InferenceMAX Board May 17, 2026

functionstackx merged 4 commits into main from claude/issue-1154-gptoss-fp4-b200-vllm
