[NV] update minimaxm2.5-fp8-b200-vllm#1068
Conversation
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR there first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

If additional help is needed, PR authors can reach out to core maintainers over Slack.
 PORT=${PORT:-8888}
-export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl
+export VLLM_FLOAT32_MATMUL_PRECISION=high
🔴 The B300 benchmark script (benchmarks/single_node/minimaxm2.5_fp8_b300.sh) was not updated to match the env var change made to the B200 script in this PR. The B300 script explicitly documents that it "reuses the existing MiniMax-M2.5 FP8 B200 vLLM recipe as-is", so it should be updated in the same PR to replace VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl with VLLM_FLOAT32_MATMUL_PRECISION=high.
Extended reasoning...
What the bug is and how it manifests
This PR updates benchmarks/single_node/minimaxm2.5_fp8_b200.sh (line 27) to replace export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl with export VLLM_FLOAT32_MATMUL_PRECISION=high. However, the companion B300 script (benchmarks/single_node/minimaxm2.5_fp8_b300.sh) was not updated and still exports VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl at line 31. After this PR merges, the two scripts will be out of sync with different environment configurations.
The specific code path that triggers it
The B300 script contains an explicit design comment at lines 3–5: "this script reuses the existing MiniMax-M2.5 FP8 B200 vLLM recipe as-is until B300-specific tuning is available." This is corroborated by the perf-changelog.yaml entry for minimaxm2.5-fp8-b300-vllm which also states it reuses the B200 recipe as-is. The deliberate design intent is for B300 to mirror B200 until independent tuning is done.
Why existing code doesn't prevent it
There is no automated mechanism to enforce parity between the B200 and B300 scripts. The only enforcement is the human convention expressed in the B300 comment, which was overlooked in this PR when only the B200 script was modified.
What the impact would be
After this PR merges, running benchmarks on B300 will use the old VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl setting, while B200 uses the new VLLM_FLOAT32_MATMUL_PRECISION=high setting. Since the env var was presumably changed on B200 for correctness or performance reasons (possibly mnnvl was incorrect or suboptimal on this hardware), B300 will be benchmarked under suboptimal or incorrect conditions — producing results that are not comparable to B200 and do not reflect the intended configuration.
How to fix it
In benchmarks/single_node/minimaxm2.5_fp8_b300.sh at line 31, replace:

`export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl`

with:

`export VLLM_FLOAT32_MATMUL_PRECISION=high`

Step-by-step proof
- Before this PR, B200 script (line 27) had: `export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl`
- Before this PR, B300 script (line 31) had: `export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl`, matching B200 as intended
- This PR changes B200 line 27 to: `export VLLM_FLOAT32_MATMUL_PRECISION=high`
- This PR does NOT change the B300 script
- After this PR: B200 uses `VLLM_FLOAT32_MATMUL_PRECISION=high`, B300 uses `VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl`
- B300's own comment says it should mirror B200: a contradiction.
@claude - add PR description

@functionstackx - could you please review?

The vLLM recipe can be updated with this change: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24691388125/job/72214032593?pr=1068

@functionstackx already updated the recipe: vllm-project/recipes#334

@functionstackx - could you please review? Conflicts resolved.

No description provided.