
[NV] minimaxm2.5 fp8 b300 vllm update#1106

Merged
hshrivastava-droid merged 5 commits into main from minimaxm2.5-fp8-b300-vllm-v2
Apr 21, 2026

Conversation

@hshrivastava-droid
Collaborator

@hshrivastava-droid hshrivastava-droid commented Apr 21, 2026

Summary

Updates the MiniMax-M2.5 FP8 B300 vLLM benchmark configuration and environment variables:

  • Environment variable change: Replace VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl with VLLM_FLOAT32_MATMUL_PRECISION=high in the benchmark script
  • Search-space overhaul (1k/1k): Restructure TP/EP/concurrency sweep — use TP=4 at low concurrency (4–128), TP=4 EP=4 at mid concurrency (256–512), TP=2 EP=2 at high concurrency (512–1024), and add TP=2 EP=2 with dp-attn at very high concurrency (1024–2048)
  • Search-space overhaul (8k/1k): Replace broad TP=2/TP=4 sweeps with targeted configs — TP=1 at low concurrency (4–16), TP=2 at mid concurrency (64–256), and TP=4 at low concurrency (4–8)
  • Changelog: Add perf-changelog.yaml entry documenting the env var changes
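The env-var swap above can be sketched as a minimal script fragment. Apart from the two environment variables named in this PR, every line here (shebang, PORT default, echo) is illustrative and is not the actual contents of benchmarks/single_node/minimaxm2.5_fp8_b300.sh:

```shell
#!/usr/bin/env bash
# Minimal sketch of the env-var swap; surrounding lines are
# illustrative, not the real benchmark script.
PORT=${PORT:-8888}

# Removed in this PR:
#   export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl
# Added in this PR:
export VLLM_FLOAT32_MATMUL_PRECISION=high

echo "port=${PORT} VLLM_FLOAT32_MATMUL_PRECISION=${VLLM_FLOAT32_MATMUL_PRECISION}"
```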

Changed Files

File Change
benchmarks/single_node/minimaxm2.5_fp8_b300.sh Swap env var from VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl to VLLM_FLOAT32_MATMUL_PRECISION=high
.github/configs/nvidia-master.yaml Revise search-space for minimaxm2.5-fp8-b300-vllm (both 1k/1k and 8k/1k)
perf-changelog.yaml Add changelog entry for this PR
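For orientation, the revised sweep might look roughly like the fragment below in .github/configs/nvidia-master.yaml. The key names and nesting are assumptions made for illustration; only the TP/EP/concurrency values come from the summary above:

```yaml
# Hypothetical shape only; key names and nesting are assumptions.
minimaxm2.5-fp8-b300-vllm:
  1k1k:
    - { tp: 4,        concurrency: 4-128 }
    - { tp: 4, ep: 4, concurrency: 256-512 }
    - { tp: 2, ep: 2, concurrency: 512-1024 }
    - { tp: 2, ep: 2, dp_attn: true, concurrency: 1024-2048 }
  8k1k:
    - { tp: 1, concurrency: 4-16 }
    - { tp: 2, concurrency: 64-256 }
    - { tp: 4, concurrency: 4-8 }
```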

Test Plan

  • Run e2e benchmarks for minimaxm2.5-fp8-b300-vllm to validate new search-space configurations
  • Verify the VLLM_FLOAT32_MATMUL_PRECISION=high env var produces expected performance results
  • Confirm dp-attn config at high concurrency (1024–2048) works correctly
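As a rough companion to the test plan, a TP=2 EP=2 launch might look like the sketch below. The model identifier is a placeholder, and mapping EP onto vLLM's --enable-expert-parallel flag is an assumption; the dp-attn variant is left out because its exact flag depends on the deployment config:

```shell
#!/usr/bin/env bash
# Hedged sketch: build (but do not run) a vLLM serve command for
# the TP=2 EP=2 configuration. MODEL is a placeholder, not the
# real checkpoint path.
export VLLM_FLOAT32_MATMUL_PRECISION=high
MODEL=${MODEL:-MiniMaxAI/MiniMax-M2.5-FP8}

CMD=(vllm serve "$MODEL"
     --tensor-parallel-size 2
     --enable-expert-parallel)

echo "${CMD[@]}"
```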

@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a recipe PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that all GitHub Actions jobs fully pass after merging. Often, failures are just flakes, and simply re-running the failed jobs will fix them; if failed jobs are re-run, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@hshrivastava-droid
Collaborator Author

@claude - add PR description

hshrivastava-droid changed the title from "[WIP][NV] minimaxm2.5 fp8 b300 vllm update" to "[NV] minimaxm2.5 fp8 b300 vllm update" on Apr 21, 2026
@Klaud-Cold
Collaborator

Klaud-Cold commented Apr 21, 2026

Claude finished @hshrivastava-droid's task in 44s.


Adding PR Description

  • Gather context on PR changes
  • Write and update PR description

PR description has been added. It covers:

  • The environment variable swap (VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl → VLLM_FLOAT32_MATMUL_PRECISION=high)
  • The restructured search-space configurations for both 1k/1k and 8k/1k sequence lengths
  • A test plan checklist for validation

PORT=${PORT:-8888}

-export VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl
+export VLLM_FLOAT32_MATMUL_PRECISION=high
Contributor


Can you update the vLLM recipe?

@hshrivastava-droid
Collaborator Author

hshrivastava-droid commented Apr 21, 2026

vLLM recipe: vllm-project/recipes#353

@hshrivastava-droid
Collaborator Author

@functionstackx - could you please review this?


Contributor

@functionstackx functionstackx left a comment


lgtm! thank you for this PR!

Collaborator

@jgangani jgangani left a comment


LGTM

@hshrivastava-droid hshrivastava-droid merged commit 0c2467c into main Apr 21, 2026
35 checks passed
@hshrivastava-droid hshrivastava-droid deleted the minimaxm2.5-fp8-b300-vllm-v2 branch April 21, 2026 22:41

6 participants