
Add Kimi K2.5 1T INT4 vLLM benchmark for B200 #735

Open
functionstackx wants to merge 6 commits into main from claude/issue-727-20260218-0415

Conversation

@functionstackx
Contributor

Summary

  • Add Kimi K2.5 INT4 vLLM benchmark for single-node B200
  • Image: vllm/vllm-openai:v0.15.1
  • TP=8, concurrency 4-64 for 1k1k, 1k8k, 8k1k
  • Flags: --mm-encoder-tp-mode data, --trust-remote-code (on both vllm serve and the benchmark client; serve invocation sketched below)
  • Branches off claude/issue-723-20260218-0123 (MI355X Kimi K2.5 work)
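
For reference, a minimal sketch of the serve command these settings imply. The port, and anything else not named in the bullets above, is an assumption:

```
# Sketch of the vLLM server launch implied by the summary above.
# The port and any flags beyond those listed are assumptions.
vllm serve moonshotai/Kimi-K2.5 \
  --tensor-parallel-size 8 \
  --mm-encoder-tp-mode data \
  --trust-remote-code \
  --port 8000
```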

Closes #727

Generated with Claude Code

github-actions bot and others added 5 commits on February 18, 2026 at 01:25
- Add kimik2.5-int4-mi355x-vllm config to amd-master.yaml
- Image: vllm/vllm-openai-rocm:v0.15.1 (per Andy Luo's recipe)
- Model: moonshotai/Kimi-K2.5 with --mm-encoder-tp-mode data
- TP=8, concurrency 4-64 for 1k1k, 1k8k, 8k1k
- No AITER env vars, no --no-enable-prefix-caching

Closes #723

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
The flag is not recognized by vLLM v0.15.1 and causes the server to fail on startup.

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
The benchmark_serving.py script already accepts --trust-remote-code, but
benchmark_lib.sh's run_benchmark_serving() function wasn't passing it
through. This caused tokenizer loading failures for models like
Kimi-K2.5 that require trust_remote_code=True. (A sketch of the
passthrough follows this commit message.)

- Add --trust-remote-code flag parsing in run_benchmark_serving()
- Pass the flag through to benchmark_serving.py when set
- Enable --trust-remote-code in the Kimi-K2.5 benchmark script

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
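
A minimal sketch of that passthrough, assuming benchmark_lib.sh is plain bash; the real function's argument handling and the way it invokes benchmark_serving.py are assumptions:

```
# Hypothetical sketch: accept an optional --trust-remote-code flag and
# forward it to benchmark_serving.py; all other arguments pass through.
run_benchmark_serving() {
  local trust_flag=""
  local args=()
  for arg in "$@"; do
    case "$arg" in
      --trust-remote-code) trust_flag="--trust-remote-code" ;;
      *) args+=("$arg") ;;
    esac
  done
  # Without the flag, tokenizer loading fails for models such as
  # Kimi-K2.5 that require trust_remote_code=True.
  python3 benchmark_serving.py "${args[@]}" $trust_flag
}
```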
- Create benchmarks/kimik2.5_int4_b200.sh with --trust-remote-code on
  both vllm serve and run_benchmark_serving
- Add kimik2.5-int4-b200-vllm config to nvidia-master.yaml (a hypothetical shape is sketched below)
- Update perf-changelog.yaml with new entry

Image: vllm/vllm-openai:v0.15.1
Model: moonshotai/Kimi-K2.5, TP=8, concurrency 4-64
Flags: --mm-encoder-tp-mode data, --trust-remote-code

Closes #727

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
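
A hypothetical shape for that nvidia-master.yaml entry, built only from the fields named in this commit message; the real schema and key names may differ:

```
# Hypothetical config entry: key names are assumptions, values come from
# the commit message above. The exact concurrency steps within 4-64 are
# also assumed.
kimik2.5-int4-b200-vllm:
  image: vllm/vllm-openai:v0.15.1
  model: moonshotai/Kimi-K2.5
  tensor_parallel_size: 8
  concurrency: [4, 8, 16, 32, 64]
  sequence_lengths: [1k1k, 1k8k, 8k1k]
  vllm_flags: "--mm-encoder-tp-mode data --trust-remote-code"
```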
@functionstackx
Contributor Author

/sweep

@github-actions
Contributor

@functionstackx Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/22127197803
Command: ``
Pinned ref: eebc476
Approval: not required (trusted collaborator).

@functionstackx
Contributor Author

/sweep test-config --config-keys kimik2.5-int4-b200-vllm --runner-config .github/configs/runners.yaml --config-files .github/configs/nvidia-master.yaml

@functionstackx changed the base branch from claude/issue-723-20260218-0123 to main on February 18, 2026 at 05:05
@functionstackx changed the title from "Add Kimi K2.5 INT4 vLLM benchmark for B200" to "Add Kimi K2.5 1T INT4 vLLM benchmark for B200" on February 18, 2026


Development

Successfully merging this pull request may close these issues.

b200 kimi k2.5 int4 vllm single node (#727)
