[NV] minimaxm2.5 fp8 b300 vllm update by hshrivastava-droid · Pull Request #1106 · SemiAnalysisAI/InferenceX

hshrivastava-droid · 2026-04-21T01:57:55Z

Summary

Updates the MiniMax-M2.5 FP8 B300 vLLM benchmark configuration and environment variables:

Environment variable change: Replace VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl with VLLM_FLOAT32_MATMUL_PRECISION=high in the benchmark script
Search-space overhaul (1k/1k): Restructure the TP/EP/concurrency sweep to use TP=4 at low concurrency (4–128), TP=4 EP=4 at mid concurrency (256–512), TP=2 EP=2 at high concurrency (512–1024), and a new TP=2 EP=2 config with dp-attn at very high concurrency (1024–2048)
Search-space overhaul (8k/1k): Replace the broad TP=2/TP=4 sweeps with targeted configs: TP=1 at low concurrency (4–16), TP=2 at mid concurrency (64–256), and TP=4 at low concurrency (4–8)
Changelog: Add perf-changelog.yaml entry documenting the env var changes
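To make the sweep ranges above concrete, each range can be expanded into individual benchmark points. The bash sketch below is hypothetical. The `tp ep dp_attn conc_start conc_end` entry layout, the `expand_points` helper, the doubling concurrency step, and the EP=1 default for configs without an explicit EP value are all assumptions. They are not the actual schema of `.github/configs/nvidia-master.yaml`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: expand the search-space ranges described in this PR
# into individual benchmark points. Assumptions (not from the PR):
#   - concurrency doubles between the listed start and end values
#   - configs without an explicit EP value use EP=1
#   - the "tp ep dp_attn conc_start conc_end" layout is illustrative only;
#     the real schema lives in .github/configs/nvidia-master.yaml
set -euo pipefail

# 1k/1k: TP=4 (low), TP=4 EP=4 (mid), TP=2 EP=2 (high),
#        TP=2 EP=2 with dp-attn (very high)
SEARCH_SPACE_1K1K=(
  "4 1 false 4 128"
  "4 4 false 256 512"
  "2 2 false 512 1024"
  "2 2 true 1024 2048"
)

# 8k/1k: TP=1 (low), TP=2 (mid), TP=4 (low)
SEARCH_SPACE_8K1K=(
  "1 1 false 4 16"
  "2 1 false 64 256"
  "4 1 false 4 8"
)

# Print one line per (parallelism, concurrency) benchmark point.
expand_points() {
  local entry tp ep dp_attn start end conc
  for entry in "$@"; do
    read -r tp ep dp_attn start end <<< "$entry"
    conc=$start
    while (( conc <= end )); do
      echo "tp=$tp ep=$ep dp_attn=$dp_attn conc=$conc"
      conc=$(( conc * 2 ))
    done
  done
}

echo "# 1k/1k"
expand_points "${SEARCH_SPACE_1K1K[@]}"
echo "# 8k/1k"
expand_points "${SEARCH_SPACE_8K1K[@]}"
```

Under these assumptions, the 1k/1k space yields 12 points and the 8k/1k space yields 8. The listed ranges overlap at their boundaries: concurrency 512 falls under both TP=4 EP=4 and TP=2 EP=2, and 1024 falls under TP=2 EP=2 with and without dp-attn. Those concurrencies are therefore measured under two parallelism layouts. In 8k/1k, concurrencies 4 and 8 are covered by both TP=1 and TP=4.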

Changed Files

| File | Change |
| --- | --- |
| `benchmarks/single_node/minimaxm2.5_fp8_b300.sh` | Swap env var from `VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl` to `VLLM_FLOAT32_MATMUL_PRECISION=high` |
| `.github/configs/nvidia-master.yaml` | Revise search-space for `minimaxm2.5-fp8-b300-vllm` (both 1k/1k and 8k/1k) |
| `perf-changelog.yaml` | Add changelog entry for this PR |
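The environment-variable part of the benchmark script change can be sketched as follows. Only `PORT=${PORT:-8888}` and the two `VLLM_*` variable names come from the PR diff. The warning guard and the final `echo` are hypothetical additions. This thread does not describe what `VLLM_FLOAT32_MATMUL_PRECISION` changes inside vLLM, so the sketch only shows how the variable is set.

```shell
#!/usr/bin/env bash
# Hypothetical excerpt of the env-var section of
# benchmarks/single_node/minimaxm2.5_fp8_b300.sh after this PR.
# Only PORT and the two VLLM_* names come from the PR diff; the guard
# and the final echo are illustrative additions.
set -euo pipefail

PORT=${PORT:-8888}

# Dropped by this PR: export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl
export VLLM_FLOAT32_MATMUL_PRECISION=high

# Removing the export does not unset the variable if the caller's
# environment already defines it, so warn on stderr in that case.
if [[ -n "${VLLM_FLASHINFER_ALLREDUCE_BACKEND:-}" ]]; then
  echo "warning: VLLM_FLASHINFER_ALLREDUCE_BACKEND=${VLLM_FLASHINFER_ALLREDUCE_BACKEND} is set in the environment" >&2
fi

echo "PORT=${PORT} VLLM_FLOAT32_MATMUL_PRECISION=${VLLM_FLOAT32_MATMUL_PRECISION}"
```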

Test Plan

Run e2e benchmarks for minimaxm2.5-fp8-b300-vllm to validate new search-space configurations
Verify the VLLM_FLOAT32_MATMUL_PRECISION=high env var produces expected performance results
Confirm dp-attn config at high concurrency (1024–2048) works correctly

github-actions · 2026-04-21T01:58:03Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please open a PR there first, before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

If additional help is needed, PR authors can reach out to core maintainers over Slack.

hshrivastava-droid · 2026-04-21T18:24:29Z

@claude - add PR description

Klaud-Cold · 2026-04-21T18:24:55Z

Claude finished @hshrivastava-droid's task in 44s.

Adding PR Description

Gather context on PR changes
Write and update PR description

PR description has been added. It covers:

The environment variable swap (VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl → VLLM_FLOAT32_MATMUL_PRECISION=high)
The restructured search-space configurations for both 1k/1k and 8k/1k sequence lengths
A test plan checklist for validation

functionstackx · 2026-04-21T18:34:37Z

 PORT=${PORT:-8888}

-export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl
+export VLLM_FLOAT32_MATMUL_PRECISION=high


can u update the vllm recipe?

hshrivastava-droid · 2026-04-21T20:25:50Z

vLLM recipe: vllm-project/recipes#353

hshrivastava-droid · 2026-04-21T20:34:59Z

@functionstackx - could you please review this?

hshrivastava-droid · 2026-04-21T20:36:48Z

@functionstackx - could you please review this?

functionstackx

lgtm! thank you for this PR!

jgangani

LGTM

update vllm

8babce4

hshrivastava-droid requested a review from a team April 21, 2026 01:57

hshrivastava-droid requested review from jgangani and kedarpotdar-nv as code owners April 21, 2026 01:57

github-project-automation Bot added this to InferenceMAX Board Apr 21, 2026

hshrivastava-droid added NVIDIA sweep-enabled labels Apr 21, 2026

hshrivastava-droid and others added 2 commits April 20, 2026 19:00

update Pr number

b256571

Merge branch 'main' into minimaxm2.5-fp8-b300-vllm-v2

08fa198

hshrivastava-droid changed the title from [WIP][NV] minimaxm2.5 fp8 b300 vllm update to [NV] minimaxm2.5 fp8 b300 vllm update Apr 21, 2026

functionstackx requested changes Apr 21, 2026

faradawn mentioned this pull request Apr 21, 2026

feat(MiniMax-M2.5): add VLLM_FLOAT32_MATMUL_PRECISION=high for Blackwell (B200/B300 FP8+FP4) vllm-project/recipes#353

Open

hshrivastava-droid requested a review from functionstackx April 21, 2026 20:33

functionstackx approved these changes Apr 21, 2026

hshrivastava-droid added 2 commits April 21, 2026 13:53

Merge branch 'main' into minimaxm2.5-fp8-b300-vllm-v2

7c0de34

update conc

54bf90e

jgangani approved these changes Apr 21, 2026

kedarpotdar-nv approved these changes Apr 21, 2026

hshrivastava-droid merged commit 0c2467c into main Apr 21, 2026
35 checks passed

hshrivastava-droid deleted the minimaxm2.5-fp8-b300-vllm-v2 branch April 21, 2026 22:41

github-project-automation Bot moved this to Done in InferenceMAX Board Apr 21, 2026

This was referenced Apr 24, 2026

Add dsv4-fp8-h200-sglang single-node config #1136

Closed

Add DeepSeek-V4-Pro SGLang aggregated GB200 benchmarks (NVIDIA srt-slurm PR #69) #1137

Open

hshrivastava-droid merged 5 commits into main from minimaxm2.5-fp8-b300-vllm-v2

6 participants
