Update kimik2.5-fp4-b200-vllm vLLM image to v0.21.0 by Klaud-Cold · Pull Request #1395 · SemiAnalysisAI/InferenceX

Klaud-Cold · 2026-05-16T07:45:22Z

Summary

Updates the vLLM image tag for kimik2.5-fp4-b200-vllm from v0.17.0 to v0.21.0.

…Co-authored-by: Klaud Cold <Klaud-Cold@users.noreply.github.com>

claude · 2026-05-16T07:49:17Z


 kimik2.5-fp4-b200-vllm:
-  image: vllm/vllm-openai:v0.17.0
+  image: vllm/vllm-openai:v0.20.2


🟡 The PR title and description state the vLLM image is being bumped to v0.21.0, but the actual diff updates the image to v0.20.2 in both nvidia-master.yaml (line 2503) and the perf-changelog.yaml entry. This is a metadata-only mismatch — please reconcile before merge by either updating the title/description to say v0.20.2, or bumping the YAML/changelog to v0.21.0 if that was the intended target.

Extended reasoning...

The mismatch

The PR title is "Update kimik2.5-fp4-b200-vllm vLLM image to v0.21.0" and the description states "Updates the vLLM image tag for kimik2.5-fp4-b200-vllm from v0.17.0 to v0.21.0".

However, the actual diff tells a different story:

.github/configs/nvidia-master.yaml line 2503 changes from image: vllm/vllm-openai:v0.17.0 to image: vllm/vllm-openai:v0.20.2

The new perf-changelog.yaml entry describes the change as "Update vLLM image from v0.17.0 to v0.20.2"

So the YAML and the changelog are internally consistent at v0.20.2, while the PR title and description both claim v0.21.0.

Step-by-step proof

Open the PR view on GitHub — the title reads Update kimik2.5-fp4-b200-vllm vLLM image to v0.21.0.

Read the description — it says Updates the vLLM image tag for kimik2.5-fp4-b200-vllm from v0.17.0 to v0.21.0.

Inspect the diff on .github/configs/nvidia-master.yaml:
kimik2.5-fp4-b200-vllm: - image: vllm/vllm-openai:v0.17.0 + image: vllm/vllm-openai:v0.20.2

Inspect the diff on perf-changelog.yaml:
- config-keys: - kimik2.5-fp4-b200-vllm description: - "Update vLLM image from v0.17.0 to v0.20.2"

Compare: title/description say v0.21.0, but the actual deployable artifact and changelog say v0.20.2.

Impact

This is metadata-only — when this PR merges, the deployed image will be vllm/vllm-openai:v0.20.2 as written in the YAML, not v0.21.0. The PR title and description don't affect the runtime behavior. However:

Reviewers reading the title/description in the GitHub UI will be misled about which version is being approved.

Future archaeologists running git log will see a commit message claiming a v0.21.0 bump that didn't actually happen.

The merge commit message (which typically picks up the PR title) will permanently embed the incorrect version in git history.

How to fix

Reconcile by either:

Option A (most likely correct): Update the PR title to Update kimik2.5-fp4-b200-vllm vLLM image to v0.20.2 and update the description body to match, since the YAML and changelog both consistently say v0.20.2.

Option B: If v0.21.0 was actually the intended target, bump nvidia-master.yaml line 2503 to v0.21.0 and update the perf-changelog.yaml description accordingly.

Given that the YAML and the auto-generated changelog entry agree on v0.20.2, Option A appears to be the actual intent.

claude · 2026-05-16T07:49:17Z


+- config-keys:
+    - kimik2.5-fp4-b200-vllm
+  description:


🟡 The new perf-changelog entry for kimik2.5-fp4-b200-vllm sets pr-link to #1336, but this is PR #1395 — readers following the link will land on an unrelated PR. Please update the link to #1395 to match the convention used by surrounding entries.

Extended reasoning...

What is wrong

The new entry added to perf-changelog.yaml (lines 2332–2336) sets:

- config-keys: - kimik2.5-fp4-b200-vllm description: - "Update vLLM image from v0.17.0 to v0.20.2" pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1336

However, the PR being reviewed is #1395, not #1336. The link is stale — most likely copy-pasted from an earlier draft or precursor PR for the same change.

Why this is a bug

The established convention in perf-changelog.yaml is that each entry's pr-link points to the PR that introduces the change. This is consistent across surrounding entries:

gptoss-fp4-h100-vllm → #1334 (the PR that introduced that change)

dsr1-fp4-b300-sglang → #1321

qwen3.5-fp8-h200-sglang-mtp → #1347

The immediately preceding entry for qwen3.5-fp8-mi355x-atom → #1310

PR #1336 has no relation to the kimik2.5-fp4-b200-vllm image update. A reader clicking the changelog link will land on an unrelated PR, defeating the purpose of the link as a navigation/audit aid.

Step-by-step proof

Open this PR — the URL/metadata shows it is #1395.

View the diff: the only behavioral change is nvidia-master.yaml bumping kimik2.5-fp4-b200-vllm's image from v0.17.0 to v0.20.2.

View the perf-changelog entry added by this PR: it sets pr-link: .../pull/1336.

Click the link in the changelog → it navigates to PR Update kimik2.5-fp4-b200-vllm vLLM image to v0.20.2 #1336, which does not touch kimik2.5-fp4-b200-vllm.

Compare to any neighbor entry (e.g., gptoss-fp4-h100-vllm → Update gptoss-fp4-h100-vllm vLLM image to v0.20.2 #1334): every other entry's pr-link resolves to the PR that introduced that entry, so the convention is unambiguous and Update kimik2.5-fp4-b200-vllm vLLM image to v0.21.0 #1395 violates it.

Fix

Replace:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1336

with:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1395

Impact

Documentation/metadata only — no runtime impact on benchmarks. But the changelog is a user-facing reference and a wrong link silently misleads anyone using it to trace when/why a config changed.

# Conflicts: # perf-changelog.yaml

vLLM v0.20.2's CUDA-graph memory profiling subtracts an aggressive chunk from the requested utilization, leaving negative space for the KV cache (-39.49 GiB observed). Raising to 0.98 gives the profiler enough headroom to land KV cache positive while still keeping ~2% as hard buffer. Alternative would have been setting VLLM_MEMORY_PROFILER_ESTIMATE_CUDAGRAPHS=0, but raising the cap is the minimum-blast-radius fix and matches what similar B200 recipes use. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-17T07:49:21Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25984500265
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25984500265

github-actions · 2026-05-17T13:46:22Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25985053704
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25985053704

github-actions Bot and others added 2 commits May 12, 2026 21:30

$Update kimik2.5-fp4-b200-vllm vLLM image to v0.20.2\n\nRef #1154\n\n…

bd3726a

…Co-authored-by: Klaud Cold <Klaud-Cold@users.noreply.github.com>

Merge branch 'main' into claude/issue-1154-kimik2.5-fp4-b200-vllm

72c0408

Klaud-Cold requested a review from a team May 16, 2026 07:45

Klaud-Cold added the full-sweep-enabled label May 16, 2026

Klaud-Cold requested review from jgangani and kedarpotdar-nv as code owners May 16, 2026 07:45

github-project-automation Bot added this to InferenceMAX Board May 16, 2026

Klaud-Cold mentioned this pull request May 16, 2026

[Auto] Docker Image Updates Available - 2026-04-25 #1154

Open

claude Bot reviewed May 16, 2026

View reviewed changes

claude-fix-bot and others added 3 commits May 17, 2026 03:20

Merge remote-tracking branch 'origin/main' into HEAD

c29c3c5

# Conflicts: # perf-changelog.yaml

Merge remote-tracking branch 'origin/main' into HEAD

90bc2f0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update kimik2.5-fp4-b200-vllm vLLM image to v0.21.0#1395

Update kimik2.5-fp4-b200-vllm vLLM image to v0.21.0#1395
Klaud-Cold wants to merge 5 commits into
mainfrom
claude/issue-1154-kimik2.5-fp4-b200-vllm

Klaud-Cold commented May 16, 2026

Uh oh!

claude Bot May 16, 2026

Uh oh!

claude Bot May 16, 2026

Uh oh!

github-actions Bot commented May 17, 2026

Uh oh!

github-actions Bot commented May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Klaud-Cold commented May 16, 2026

Summary

Uh oh!

claude Bot May 16, 2026

Choose a reason for hiding this comment

The mismatch

Step-by-step proof

Impact

How to fix

Uh oh!

claude Bot May 16, 2026

Choose a reason for hiding this comment

What is wrong

Why this is a bug

Step-by-step proof

Fix

Impact

Uh oh!

github-actions Bot commented May 17, 2026

Uh oh!

github-actions Bot commented May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants