[NVIDIA] Update Minimax fp8 B200 Configs #1010
Conversation
Update perf-changelog.yaml
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR there first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.
@claude please add PR description
Claude finished @Ankur-singh's task in 57s (Adding PR Description)
Done! Added a detailed PR description summarizing the search-space config changes for
Co-authored-by: Ankur Singh <Ankur-singh@users.noreply.github.com>
@functionstackx @cquil11 Can you please review this PR? I've already informed the team to update the corresponding recipe/cookbook. We should receive the PR any time now.
Based on validated benchmark configs in SemiAnalysisAI/InferenceX#1010, tp:4/ep:4 and tp:2/ep:2 are now confirmed for B200. Also enables 2-GPU selection for B200, adds --kv-cache-dtype fp8_e4m3 and --disable-radix-cache as B200-specific flags per the benchmark script. Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
Add benchmark-validated flags for B200 FP8 from SemiAnalysisAI/InferenceX#1010: --enable-expert-parallel (tp:4/ep:4 validated, tp:2/ep:2 also supported), --gpu-memory-utilization 0.90, --block-size 32, --kv-cache-dtype fp8, --stream-interval 20, --no-enable-prefix-caching. Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
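For reference, the flags listed in the commit above would combine into a launch command roughly like the following. This is only a sketch: the model path, TP size, and port are illustrative placeholders, not values taken from this PR.

```shell
# Hypothetical vLLM launch for MiniMax-M2.5 FP8 on B200 (tp:4/ep:4),
# combining the benchmark-validated flags from the commit message above.
# Model path and port are placeholders.
vllm serve MiniMaxAI/MiniMax-M2.5 \
  --tensor-parallel-size 4 \
  --enable-expert-parallel \
  --gpu-memory-utilization 0.90 \
  --block-size 32 \
  --kv-cache-dtype fp8 \
  --stream-interval 20 \
  --no-enable-prefix-caching \
  --port 8000
```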
Recipe link: vllm-project/recipes#321. @functionstackx - could you please help review this?
cquil11
left a comment
evals + throughput look good
merge
* MiniMax-M2.5 B200: add EP, FP8 KV cache, disable radix cache
  Based on validated benchmark configs in SemiAnalysisAI/InferenceX#1010, tp:4/ep:4 and tp:2/ep:2 are now confirmed for B200. Also enables 2-GPU selection for B200, and adds `--kv-cache-dtype fp8_e4m3` and `--disable-radix-cache` as B200-specific flags per the benchmark script.
* Update Qwen35ConfigGenerator for B200 FP4 (NVFP4)
  Based on SemiAnalysisAI/InferenceX#820:
  - Set `mem-fraction-static` to 0.85 for B200 FP4 (benchmark uses 0.85)
  - Add `--quantization modelopt_fp4` (required flag, was missing)
  - Add `--chunked-prefill-size 32768`, `--max-prefill-tokens 32768`
  - Add `--max-running-requests 128`, `--stream-interval 30`
  - Add `--disable-radix-cache` (always required for FP4)
  - Skip `--enable-flashinfer-allreduce-fusion` for FP4 (TP=4, not used per benchmark)
* Remove `--disable-radix-cache` flag for B200 in MiniMaxM25ConfigGenerator
* revert: remove accidental MiniMax B200 changes from Qwen3.5 PR
  PR #230 should only touch Qwen35ConfigGenerator. Reverts all changes to MiniMaxM25ConfigGenerator (B200 2-GPU support, B200 EP, B200 kv-cache-dtype) that were accidentally included on this branch.
* revert: restore MiniMax comment order to match main
  Undoes an accidental comment/variable reorder in MiniMaxM25ConfigGenerator that was not part of the intended Qwen3.5 B200 FP4 changes.
* Update Qwen3.5 config to conditionally enable allreduce fusion based on quantization

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>
Co-authored-by: Zijie Xia <zijie_xia@icloud.com>
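The last commit above ("conditionally enable allreduce fusion based on quantization") suggests generator logic along the following lines. This is a hypothetical sketch only: the function name, structure, and return type are illustrative and do not reflect the actual Qwen35ConfigGenerator implementation; the flag values are taken from the commit messages above.

```python
# Hypothetical sketch of quantization-conditional flag selection, modeled on
# the commit messages above. All names here are illustrative, not the real
# Qwen35ConfigGenerator code.

def build_b200_flags(quantization: str, tp: int) -> list[str]:
    """Assemble SGLang-style server flags for a B200 configuration."""
    flags = [f"--tp-size {tp}"]
    if quantization == "modelopt_fp4":
        # FP4 path per SemiAnalysisAI/InferenceX#820: explicit quantization
        # flag, 0.85 static memory fraction, radix cache disabled, and no
        # allreduce fusion.
        flags += [
            "--quantization modelopt_fp4",
            "--mem-fraction-static 0.85",
            "--chunked-prefill-size 32768",
            "--max-prefill-tokens 32768",
            "--max-running-requests 128",
            "--stream-interval 30",
            "--disable-radix-cache",
        ]
    else:
        # Non-FP4 quantizations keep allreduce fusion enabled.
        flags.append("--enable-flashinfer-allreduce-fusion")
    return flags
```

Keeping the conditional in one place like this makes the "skip fusion for FP4" rule explicit rather than scattering per-quantization special cases through the generator.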
Summary
Update benchmark search-space configurations for `minimaxm2.5-fp8-b200-vllm` to refine concurrency ranges and parallelism strategies.

Changes

`.github/configs/nvidia-master.yaml`: updated search space for MiniMax-M2.5 FP8 B200 vLLM.

ISL 1024 / OSL 1024:
- Removed the `tp:2` (no EP) sweep; replaced with `tp:2, ep:2` at `conc: 512`
- `tp:4` (no EP) range narrowed from `conc 4–512` to `conc 4–128`
- `tp:4, ep:4` range moved from `conc 16–64` to `conc 256–512`

ISL 8192 / OSL 1024:
- `tp:2` range moved from `conc 4–256` to `conc 64–512`
- `tp:4` range narrowed from `conc 4–256` to `conc 4–64`
- Added a `tp:4` sweep point at `conc 512`

`perf-changelog.yaml`: added a changelog entry for this config update.
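Concretely, the updated search-space entries might take a shape like the YAML below. This is a sketch under assumed key names (`isl`, `osl`, `tp`, `ep`, `conc`); consult `.github/configs/nvidia-master.yaml` for the file's real schema.

```yaml
# Hypothetical shape of the updated MiniMax-M2.5 FP8 B200 vLLM search space.
# Key names are illustrative; the concurrency values follow the ranges above.
minimaxm2.5-fp8-b200-vllm:
  - isl: 1024
    osl: 1024
    sweeps:
      - {tp: 2, ep: 2, conc: [512]}
      - {tp: 4, conc: [4, 8, 16, 32, 64, 128]}
      - {tp: 4, ep: 4, conc: [256, 512]}
  - isl: 8192
    osl: 1024
    sweeps:
      - {tp: 2, conc: [64, 128, 256, 512]}
      - {tp: 4, conc: [4, 16, 64]}
      - {tp: 4, conc: [512]}
```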