Update kimik2.5-fp4-b200-vllm vLLM image to v0.21.0#1395
Conversation
…Co-authored-by: Klaud Cold <Klaud-Cold@users.noreply.github.com>
|
|
||
| kimik2.5-fp4-b200-vllm: | ||
| image: vllm/vllm-openai:v0.17.0 | ||
| image: vllm/vllm-openai:v0.20.2 |
There was a problem hiding this comment.
🟡 The PR title and description state the vLLM image is being bumped to v0.21.0, but the actual diff updates the image to v0.20.2 in both nvidia-master.yaml (line 2503) and the perf-changelog.yaml entry. This is a metadata-only mismatch — please reconcile before merge by either updating the title/description to say v0.20.2, or bumping the YAML/changelog to v0.21.0 if that was the intended target.
Extended reasoning...
The mismatch
The PR title is "Update kimik2.5-fp4-b200-vllm vLLM image to v0.21.0" and the description states "Updates the vLLM image tag for kimik2.5-fp4-b200-vllm from v0.17.0 to v0.21.0".
However, the actual diff tells a different story:
.github/configs/nvidia-master.yamlline 2503 changes fromimage: vllm/vllm-openai:v0.17.0toimage: vllm/vllm-openai:v0.20.2- The new
perf-changelog.yamlentry describes the change as "Update vLLM image from v0.17.0 to v0.20.2"
So the YAML and the changelog are internally consistent at v0.20.2, while the PR title and description both claim v0.21.0.
Step-by-step proof
- Open the PR view on GitHub — the title reads
Update kimik2.5-fp4-b200-vllm vLLM image to v0.21.0. - Read the description — it says
Updates the vLLM image tag for kimik2.5-fp4-b200-vllm from v0.17.0 to v0.21.0. - Inspect the diff on
.github/configs/nvidia-master.yaml:kimik2.5-fp4-b200-vllm: - image: vllm/vllm-openai:v0.17.0 + image: vllm/vllm-openai:v0.20.2
- Inspect the diff on
perf-changelog.yaml:- config-keys: - kimik2.5-fp4-b200-vllm description: - "Update vLLM image from v0.17.0 to v0.20.2"
- Compare: title/description say v0.21.0, but the actual deployable artifact and changelog say v0.20.2.
Impact
This is metadata-only — when this PR merges, the deployed image will be vllm/vllm-openai:v0.20.2 as written in the YAML, not v0.21.0. The PR title and description don't affect the runtime behavior. However:
- Reviewers reading the title/description in the GitHub UI will be misled about which version is being approved.
- Future archaeologists running
git logwill see a commit message claiming a v0.21.0 bump that didn't actually happen. - The merge commit message (which typically picks up the PR title) will permanently embed the incorrect version in git history.
How to fix
Reconcile by either:
- Option A (most likely correct): Update the PR title to
Update kimik2.5-fp4-b200-vllm vLLM image to v0.20.2and update the description body to match, since the YAML and changelog both consistently say v0.20.2. - Option B: If v0.21.0 was actually the intended target, bump
nvidia-master.yamlline 2503 tov0.21.0and update theperf-changelog.yamldescription accordingly.
Given that the YAML and the auto-generated changelog entry agree on v0.20.2, Option A appears to be the actual intent.
|
|
||
| - config-keys: | ||
| - kimik2.5-fp4-b200-vllm | ||
| description: |
There was a problem hiding this comment.
🟡 The new perf-changelog entry for kimik2.5-fp4-b200-vllm sets pr-link to #1336, but this is PR #1395 — readers following the link will land on an unrelated PR. Please update the link to #1395 to match the convention used by surrounding entries.
Extended reasoning...
What is wrong
The new entry added to perf-changelog.yaml (lines 2332–2336) sets:
- config-keys:
- kimik2.5-fp4-b200-vllm
description:
- "Update vLLM image from v0.17.0 to v0.20.2"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1336However, the PR being reviewed is #1395, not #1336. The link is stale — most likely copy-pasted from an earlier draft or precursor PR for the same change.
Why this is a bug
The established convention in perf-changelog.yaml is that each entry's pr-link points to the PR that introduces the change. This is consistent across surrounding entries:
gptoss-fp4-h100-vllm→#1334(the PR that introduced that change)dsr1-fp4-b300-sglang→#1321qwen3.5-fp8-h200-sglang-mtp→#1347- The immediately preceding entry for
qwen3.5-fp8-mi355x-atom→#1310
PR #1336 has no relation to the kimik2.5-fp4-b200-vllm image update. A reader clicking the changelog link will land on an unrelated PR, defeating the purpose of the link as a navigation/audit aid.
Step-by-step proof
- Open this PR — the URL/metadata shows it is
#1395. - View the diff: the only behavioral change is
nvidia-master.yamlbumpingkimik2.5-fp4-b200-vllm's image fromv0.17.0tov0.20.2. - View the perf-changelog entry added by this PR: it sets
pr-link: .../pull/1336. - Click the link in the changelog → it navigates to PR Update kimik2.5-fp4-b200-vllm vLLM image to v0.20.2 #1336, which does not touch
kimik2.5-fp4-b200-vllm. - Compare to any neighbor entry (e.g.,
gptoss-fp4-h100-vllm→ Update gptoss-fp4-h100-vllm vLLM image to v0.20.2 #1334): every other entry'spr-linkresolves to the PR that introduced that entry, so the convention is unambiguous and Update kimik2.5-fp4-b200-vllm vLLM image to v0.21.0 #1395 violates it.
Fix
Replace:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1336with:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1395Impact
Documentation/metadata only — no runtime impact on benchmarks. But the changelog is a user-facing reference and a wrong link silently misleads anyone using it to trace when/why a config changed.
# Conflicts: # perf-changelog.yaml
vLLM v0.20.2's CUDA-graph memory profiling subtracts an aggressive chunk from the requested utilization, leaving negative space for the KV cache (-39.49 GiB observed). Raising to 0.98 gives the profiler enough headroom to land KV cache positive while still keeping ~2% as hard buffer. Alternative would have been setting VLLM_MEMORY_PROFILER_ESTIMATE_CUDAGRAPHS=0, but raising the cap is the minimum-blast-radius fix and matches what similar B200 recipes use. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25984500265 |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25985053704 |
Summary
kimik2.5-fp4-b200-vllmfrom v0.17.0 to v0.21.0.Ref #1154
Generated with Claude Code