Add B300 config: kimi-k2.5-int4-vllm (vLLM 0.20.0 + TP=4/EP=1 sweep) #1267
functionstackx merged 2 commits into main from
Conversation
- New `kimik2.5-int4-b300-vllm` config with the corresponding `benchmarks/single_node/kimik2.5_int4_b300.sh` launch script (mirrors the existing INT4 B200 vLLM recipe; the upstream vLLM Kimi-K2.5 recipes page does not yet ship B300-specific tuning).
- Image: `vllm/vllm-openai:v0.20.0-cu130` — the original draft (#1057, reverted in #1070, reopened as #1071) carried `v0.19.0` while we waited on a working release; 0.20.0 has now shipped.
- Search space per (ISL, OSL): the existing TP=8 sweep plus a new TP=4 / EP=1 entry covering the lower-TP / expert-parallel variant on the same B300 nodes.

Supersedes #1071 — opening fresh from main since the merge base had drifted (the b200 schema migrated from `seq-len-configs` to `scenarios.fixed-seq-len`) and the user preferred a clean reopen over a rebase.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
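The commit message above can be sketched as a config fragment. Only `tp`, `ep`, `conc-start`, `conc-end`, the image tag, and the `scenarios.fixed-seq-len` key appear in this PR's text; every other key name, the nesting, and the example (ISL, OSL) pair below are assumptions for illustration, not the repo's actual schema:

```yaml
# Hypothetical sketch of the new B300 config. Keys other than tp/ep/conc-start/
# conc-end and scenarios.fixed-seq-len are illustrative; the (ISL, OSL) values
# are placeholders — the PR does not list the actual sequence lengths.
kimik2.5-int4-b300-vllm:
  image: vllm/vllm-openai:v0.20.0-cu130
  scenarios:
    fixed-seq-len:
      - isl: 1024   # placeholder ISL
        osl: 1024   # placeholder OSL
        search-space:
          - { tp: 8, conc-start: 4, conc-end: 64 }           # existing TP=8 sweep
          - { tp: 4, ep: 1, conc-start: 4, conc-end: 64 }    # new lower-TP / EP variant
```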
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR there first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you. PR authors are responsible for ensuring that, after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, PR authors should request a review and get approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
See the unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25267828360
AGENTS.md requires new perf-changelog entries to be appended to the end of the file (oldest at top, newest at bottom). The original commit prepended the new entry above PR #95; move it after the current last entry (PR #1265) to satisfy the convention. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
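As a sketch of the ordering convention described above, the appended entry would sit after the current last one. The entry field names below are hypothetical — this PR's text only establishes the oldest-at-top, newest-at-bottom rule and the PR numbers, not the changelog schema:

```yaml
# perf-changelog.yaml — oldest entries at the top, newest appended at the bottom.
# Field names are illustrative; the actual entry schema is not shown in this PR.
- pr: 1265                              # current last entry stays in place
  config: ...
- pr: 1057                              # new entry appended after it, pointing at
  config: kimik2.5-int4-b300-vllm       # the original tracking PR per the summary
```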
Note
Supersedes #1071 (which itself reopened #1057 after the #1070 revert). Opening fresh from
main since the original branch's merge base had drifted — the b200 schema migrated from `seq-len-configs` to `scenarios.fixed-seq-len` between then and now — and a clean reopen reads more cleanly than a rebase.

Summary
- Adds the `kimik2.5-int4-b300-vllm` benchmark config and the corresponding `benchmarks/single_node/kimik2.5_int4_b300.sh` launch script.
- Image: `vllm/vllm-openai:v0.20.0-cu130` — the original drafts (#1057, #1071) carried the `v0.19.0` placeholder while we waited; vLLM 0.20.0 has now shipped.
- Search space per (ISL, OSL):
  - `{ tp: 8, conc-start: 4, conc-end: 64 }` (existing)
  - `{ tp: 4, ep: 1, conc-start: 4, conc-end: 64 }` (new — covers the lower-TP / expert-parallel variant on the same B300 nodes)
- `perf-changelog.yaml` entry added at the top, pointing at the original tracking PR #1057.

Test plan
- Run the `kimik2.5-int4-b300-vllm` single-node benchmark on a B300 node and confirm the v0.20.0 image starts, both TP=8 and TP=4/EP=1 sweeps complete, and result files are produced for both ISL/OSL configs.

🤖 Generated with Claude Code
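The `conc-start: 4` / `conc-end: 64` bounds in the summary suggest a concurrency sweep; a doubling schedule is one plausible interpretation — the actual step rule is not stated in this PR. Under that assumption, the concurrency levels each sweep would visit can be sketched as:

```shell
# Hypothetical expansion of a conc-start=4 .. conc-end=64 sweep by doubling.
# The doubling step is an assumption; the PR only states the endpoint bounds.
concs=""
c=4
while [ "$c" -le 64 ]; do
  concs="$concs $c"   # collect each concurrency level
  c=$((c * 2))
done
echo "concurrency sweep:$concs"
```

Under this reading, each (ISL, OSL) pair would be benchmarked at concurrency 4, 8, 16, 32, and 64 for both the TP=8 and TP=4/EP=1 entries.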