
[NVIDIA] Update Minimax fp8 B200 Configs#1010

Merged
cquil11 merged 7 commits into main from nv/minimax-fp8-vllm
Apr 8, 2026

Conversation

@Ankur-singh
Collaborator

@Ankur-singh Ankur-singh commented Apr 7, 2026

Summary

Update benchmark search-space configurations for minimaxm2.5-fp8-b200-vllm to refine concurrency ranges and parallelism strategies.

Changes

.github/configs/nvidia-master.yaml — Updated search-space for MiniMax-M2.5 FP8 B200 vLLM:

  • ISL 1024 / OSL 1024:

    • Removed the tp:2 (no EP) sweep; replaced it with tp:2, ep:2 at conc: 512
    • Narrowed the tp:4 (no EP) range from conc 4–512 to conc 4–128
    • Widened the tp:4, ep:4 range from conc 16–64 to conc 256–512
  • ISL 8192 / OSL 1024:

    • Shifted the tp:2 range from conc 4–256 to conc 64–512
    • Narrowed the tp:4 range from conc 4–256 to conc 4–64
    • Added a new tp:4 sweep point at conc 512

perf-changelog.yaml — Added changelog entry for this config update.
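
For reference, the updated search-space could be sketched in YAML roughly as follows. Key names (`sweeps`, `conc_range`, etc.) and the overall shape are assumptions for illustration only; the real schema is whatever `.github/configs/nvidia-master.yaml` defines.

```yaml
# Hypothetical sketch of the updated search-space (key names are assumptions;
# the actual schema lives in .github/configs/nvidia-master.yaml).
minimaxm2.5-fp8-b200-vllm:
  - isl: 1024
    osl: 1024
    sweeps:
      - { tp: 2, ep: 2, conc: [512] }             # replaces the old no-EP tp:2 sweep
      - { tp: 4, conc_range: [4, 128] }           # narrowed from 4–512
      - { tp: 4, ep: 4, conc_range: [256, 512] }  # widened from 16–64
  - isl: 8192
    osl: 1024
    sweeps:
      - { tp: 2, conc_range: [64, 512] }          # shifted from 4–256
      - { tp: 4, conc_range: [4, 64] }            # narrowed from 4–256
      - { tp: 4, conc: [512] }                    # new sweep point
```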

@github-actions
Contributor

github-actions bot commented Apr 7, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please open a PR there first so we can merge your PR into the master branch. Let's ensure that the documentation is first class so the entire ML community can benefit from your hard work. Thank you!

@Ankur-singh
Collaborator Author

@claude please add PR description

@Klaud-Cold
Collaborator

Klaud-Cold commented Apr 7, 2026

Claude finished @Ankur-singh's task in 57s — View job


Adding PR Description

  • Analyze PR changes
  • Update PR description
  • Fix perf-changelog.yaml PR link placeholder (xxxx → 1010)

Done! Added a detailed PR description summarizing the search-space config changes for minimaxm2.5-fp8-b200-vllm. Also fixed the placeholder PR link in perf-changelog.yaml and pushed the fix.

Co-authored-by: Ankur Singh <Ankur-singh@users.noreply.github.com>
@Ankur-singh Ankur-singh changed the title from [WIP] Update Minimax fp8 B200 Configs to Update Minimax fp8 B200 Configs Apr 8, 2026
@Ankur-singh
Collaborator Author

@functionstackx @cquil11 Can you please review this PR?

I’ve already informed the team to update the corresponding recipe/cookbook. We should receive the PR any time now.

faradawn added a commit to faradawn/sgl-cookbook that referenced this pull request Apr 8, 2026
Based on validated benchmark configs in SemiAnalysisAI/InferenceX#1010,
tp:4/ep:4 and tp:2/ep:2 are now confirmed for B200. Also enables 2-GPU
selection for B200, adds --kv-cache-dtype fp8_e4m3 and --disable-radix-cache
as B200-specific flags per the benchmark script.

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
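
The B200-specific SGLang flags named in the commit above could be combined into a launch command along these lines. This is a sketch, not the cookbook's actual command: the model path and the tp/ep values are assumptions, and only the `--kv-cache-dtype fp8_e4m3` and `--disable-radix-cache` flags are taken from the commit message.

```shell
# Hypothetical SGLang launch sketch for MiniMax-M2.5 on B200
# (model path and tp/ep values are assumptions).
python -m sglang.launch_server \
  --model-path MiniMaxAI/MiniMax-M2.5 \
  --tp 4 --ep 4 \
  --kv-cache-dtype fp8_e4m3 \
  --disable-radix-cache
```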
faradawn added a commit to faradawn/recipes that referenced this pull request Apr 8, 2026
Add benchmark-validated flags for B200 FP8 from SemiAnalysisAI/InferenceX#1010:
--enable-expert-parallel (tp:4/ep:4 validated, tp:2/ep:2 also supported),
--gpu-memory-utilization 0.90, --block-size 32, --kv-cache-dtype fp8,
--stream-interval 20, --no-enable-prefix-caching.

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
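
Likewise, the benchmark-validated vLLM flags listed in this commit could be assembled into a serve command roughly as follows. The model name and parallel sizes are assumptions; the remaining flags are quoted from the commit message.

```shell
# Hypothetical vLLM serve sketch for MiniMax-M2.5 FP8 on B200
# (model name and --tensor-parallel-size are assumptions).
vllm serve MiniMaxAI/MiniMax-M2.5 \
  --tensor-parallel-size 4 \
  --enable-expert-parallel \
  --gpu-memory-utilization 0.90 \
  --block-size 32 \
  --kv-cache-dtype fp8 \
  --stream-interval 20 \
  --no-enable-prefix-caching
```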
@hshrivastava-droid
Collaborator

Recipe link: vllm-project/recipes#321

@functionstackx - could you please help review this?

Collaborator

@cquil11 cquil11 left a comment


evals + throughput look good
merge

@cquil11 cquil11 merged commit 5d037dd into main Apr 8, 2026
7 of 46 checks passed
@cquil11 cquil11 deleted the nv/minimax-fp8-vllm branch April 8, 2026 18:43
@cquil11 cquil11 added the NVIDIA label Apr 8, 2026
@cquil11 cquil11 changed the title from Update Minimax fp8 B200 Configs to [NVIDIA] Update Minimax fp8 B200 Configs Apr 8, 2026
zijiexia added a commit to sgl-project/sgl-cookbook that referenced this pull request Apr 10, 2026
* MiniMax-M2.5 B200: add EP, FP8 KV cache, disable radix cache

Based on validated benchmark configs in SemiAnalysisAI/InferenceX#1010,
tp:4/ep:4 and tp:2/ep:2 are now confirmed for B200. Also enables 2-GPU
selection for B200, adds --kv-cache-dtype fp8_e4m3 and --disable-radix-cache
as B200-specific flags per the benchmark script.

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>

* Update Qwen35ConfigGenerator for B200 FP4 (NVFP4)

Based on SemiAnalysisAI/InferenceX#820.

- Set mem-fraction-static to 0.85 for B200 FP4 (benchmark uses 0.85)
- Add --quantization modelopt_fp4 (required flag, was missing)
- Add --chunked-prefill-size 32768, --max-prefill-tokens 32768
- Add --max-running-requests 128, --stream-interval 30
- Add --disable-radix-cache (always required for FP4)
- Skip --enable-flashinfer-allreduce-fusion for FP4 (TP=4, not used per benchmark)

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
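
The Qwen3.5 B200 FP4 flags enumerated in the commit above could be sketched as a single SGLang launch command. The model path and tp value are assumptions; the flags themselves are the ones listed in the commit message.

```shell
# Hypothetical SGLang launch sketch for Qwen3.5 B200 FP4 (NVFP4)
# (model path and --tp are assumptions).
python -m sglang.launch_server \
  --model-path Qwen/Qwen3.5 \
  --tp 4 \
  --quantization modelopt_fp4 \
  --mem-fraction-static 0.85 \
  --chunked-prefill-size 32768 \
  --max-prefill-tokens 32768 \
  --max-running-requests 128 \
  --stream-interval 30 \
  --disable-radix-cache
```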

* Remove --disable-radix-cache flag for B200 in MiniMaxM25ConfigGenerator

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>

* revert: remove accidental MiniMax B200 changes from Qwen3.5 PR

PR #230 should only touch Qwen35ConfigGenerator. Revert all changes to
MiniMaxM25ConfigGenerator (B200 2-GPU support, B200 EP, B200 kv-cache-dtype)
that were accidentally included on this branch.

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>

* revert: restore MiniMax comment order to match main

Undo accidental comment/variable reorder in MiniMaxM25ConfigGenerator
that was not part of the intended Qwen3.5 B200 FP4 changes.

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>

* Update Qwen3.5 config to conditionally enable allreduce fusion based on quantization

---------

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
Co-authored-by: zijiexia <37504505+zijiexia@users.noreply.github.com>
Co-authored-by: Zijie Xia <zijie_xia@icloud.com>
