[NV] update minimaxm2.5-fp8-b200-vllm#1068
Conversation
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR there first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

If additional help is needed, PR authors can reach out to core maintainers over Slack.
 PORT=${PORT:-8888}
-export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl
+export VLLM_FLOAT32_MATMUL_PRECISION=high
🔴 The B300 benchmark script (benchmarks/single_node/minimaxm2.5_fp8_b300.sh) was not updated to match the env var change made to the B200 script in this PR. The B300 script explicitly documents that it "reuses the existing MiniMax-M2.5 FP8 B200 vLLM recipe as-is", so it should be updated in the same PR to replace VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl with VLLM_FLOAT32_MATMUL_PRECISION=high.
Extended reasoning...
What the bug is and how it manifests
This PR updates benchmarks/single_node/minimaxm2.5_fp8_b200.sh (line 27) to replace export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl with export VLLM_FLOAT32_MATMUL_PRECISION=high. However, the companion B300 script (benchmarks/single_node/minimaxm2.5_fp8_b300.sh) was not updated and still exports VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl at line 31. After this PR merges, the two scripts will be out of sync with different environment configurations.
The specific code path that triggers it
The B300 script contains an explicit design comment at lines 3–5: "this script reuses the existing MiniMax-M2.5 FP8 B200 vLLM recipe as-is until B300-specific tuning is available." This is corroborated by the perf-changelog.yaml entry for minimaxm2.5-fp8-b300-vllm which also states it reuses the B200 recipe as-is. The deliberate design intent is for B300 to mirror B200 until independent tuning is done.
Why existing code doesn't prevent it
There is no automated mechanism to enforce parity between the B200 and B300 scripts. The only enforcement is the human convention expressed in the B300 comment, which was overlooked in this PR when only the B200 script was modified.
What the impact would be
After this PR merges, running benchmarks on B300 will use the old VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl setting, while B200 uses the new VLLM_FLOAT32_MATMUL_PRECISION=high setting. Since the env var was presumably changed on B200 for correctness or performance reasons (possibly mnnvl was incorrect or suboptimal on this hardware), B300 will be benchmarked under suboptimal or incorrect conditions — producing results that are not comparable to B200 and do not reflect the intended configuration.
How to fix it
In benchmarks/single_node/minimaxm2.5_fp8_b300.sh at line 31, replace:

`export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl`

with:

`export VLLM_FLOAT32_MATMUL_PRECISION=high`

Step-by-step proof
- Before this PR, B200 script (line 27) had: `export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl`
- Before this PR, B300 script (line 31) had: `export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl`, matching B200 as intended
- This PR changes B200 line 27 to: `export VLLM_FLOAT32_MATMUL_PRECISION=high`
- This PR does NOT change the B300 script
- After this PR: B200 uses `VLLM_FLOAT32_MATMUL_PRECISION=high`, B300 uses `VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl`
- B300's own comment says it should mirror B200: a contradiction.
@claude - add PR description

@functionstackx - could you please review?

The vLLM recipe can be updated with this change: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24691388125/job/72214032593?pr=1068

@functionstackx already updated the recipe: vllm-project/recipes#334

@functionstackx - could you please review? Conflicts resolved.

No description provided.