Tune MiniMax MI355X vLLM scheduling thresholds by jiacao-amd · Pull Request #1276 · SemiAnalysisAI/InferenceX

jiacao-amd · 2026-05-04T20:26:10Z

Summary

Tune the MiniMax-M2.5 FP8 MI355X vLLM launch policy for better throughput across the 1k/1k and 8k/1k sweep points.

Default path: block-size=32, shuffled KV cache disabled, async scheduling enabled.
1k/1k: use block-size=16 with shuffled KV cache; disable async scheduling through c128.
1k/1k TP8/EP8 c2: keep block-size=32 and shuffled KV cache disabled, but disable async scheduling.
TP8/EP8 fallback path: keep block-size=32 and shuffled KV cache disabled.
8k/1k: disable async scheduling through c64; keep c32 and below on block-size=32 with shuffled KV cache disabled, and switch to block-size=16 with shuffled KV cache from c64 upward.
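The launch policy above is effectively a small decision table keyed on ISL/OSL, TP/EP, and concurrency. The Bash sketch below expresses it under stated assumptions. The function name `select_launch_policy` and its space-separated output are hypothetical; they are not the code in `benchmarks/single_node/minimaxm2.5_fp8_mi355x.sh`, and the sketch does not emit real vLLM flags or environment variables. It also assumes that the TP8/EP8 fallback overrides the sequence-length rules, and it reads "keep c32" for 8k/1k as "c32 and below".

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the launch policy in the PR summary. The function
# name and output format are illustrative; they are not taken from
# benchmarks/single_node/minimaxm2.5_fp8_mi355x.sh.
#
# Args:   seq ("1k1k" or "8k1k"), tp, ep, conc
# Prints: "<block_size> <shuffle_kv> <async_sched>" (1 = enabled, 0 = disabled)
select_launch_policy() {
  local seq="$1" tp="$2" ep="$3" conc="$4"

  # Default path: block-size=32, shuffled KV cache off, async scheduling on.
  local block_size=32 shuffle_kv=0 async_sched=1

  case "$seq" in
    1k1k)
      # 1k/1k: block-size=16 with shuffled KV cache at every concurrency,
      # and async scheduling off up to and including c128.
      block_size=16
      shuffle_kv=1
      if (( conc <= 128 )); then async_sched=0; fi
      ;;
    8k1k)
      # 8k/1k: async scheduling off up to and including c64.
      if (( conc <= 64 )); then async_sched=0; fi
      # Assumption: c32 and below stay on the default block-size=32 /
      # no-shuffle layout; c64 and above switch to block-size=16 + shuffle.
      if (( conc >= 64 )); then
        block_size=16
        shuffle_kv=1
      fi
      ;;
  esac

  # Assumed precedence: the TP8/EP8 fallback always forces block-size=32 with
  # shuffled KV cache off. Async scheduling is left as set above.
  if (( tp == 8 && ep == 8 )); then
    block_size=32
    shuffle_kv=0
  fi

  echo "$block_size $shuffle_kv $async_sched"
}

# Example sweep points from the throughput table:
select_launch_policy 1k1k 2 2 64    # prints: 16 1 0
select_launch_policy 1k1k 8 8 2     # prints: 32 0 0
select_launch_policy 8k1k 4 4 32    # prints: 32 0 0
select_launch_policy 8k1k 4 4 256   # prints: 16 1 1
```

Applying the TP8/EP8 override last keeps the fallback in one place. The 1k/1k TP8/EP8 c2 special case then needs no branch of its own, because the 1k/1k rule already disables async scheduling through c128.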

Throughput Comparison

Metric: tput_per_gpu only.

Validation run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25346292897
Baseline run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24006177731
Compared benchmark points: 33/33
Improved: 32/33
Regressed: 1/33
Average delta: +6.5%
Median delta: +6.6%

ISL/OSL	TP/EP	Concurrency	Baseline tput_per_gpu	Validation tput_per_gpu	Delta
1k/1k	2/2	2	175.6	197.4	+12.4%
1k/1k	2/2	4	316.6	352.1	+11.2%
1k/1k	2/2	8	551.1	591.0	+7.2%
1k/1k	2/2	16	912.4	985.4	+8.0%
1k/1k	2/2	32	1512.4	1621.4	+7.2%
1k/1k	2/2	64	2283.1	2268.4	-0.6%
1k/1k	2/2	128	3745.8	3845.6	+2.7%
1k/1k	2/2	256	5459.7	5781.6	+5.9%
1k/1k	2/2	512	8080.0	8362.1	+3.5%
1k/1k	4/4	4	173.0	195.2	+12.8%
1k/1k	4/4	8	329.8	355.5	+7.8%
1k/1k	4/4	16	554.2	597.7	+7.9%
1k/1k	4/4	32	976.6	1037.3	+6.2%
1k/1k	4/4	64	1574.5	1679.0	+6.6%
1k/1k	4/4	128	2620.1	2707.7	+3.3%
1k/1k	4/4	256	3846.2	3945.2	+2.6%
1k/1k	8/8	2	47.7	53.1	+11.4%
8k/1k	2/2	2	712.6	831.1	+16.6%
8k/1k	2/2	4	1320.6	1357.8	+2.8%
8k/1k	2/2	8	2162.1	2187.6	+1.2%
8k/1k	2/2	16	3378.7	3530.0	+4.5%
8k/1k	2/2	32	4645.2	5091.0	+9.6%
8k/1k	2/2	64	6495.4	6760.2	+4.1%
8k/1k	2/2	128	8601.5	8900.6	+3.5%
8k/1k	2/2	256	10391.0	10689.6	+2.9%
8k/1k	4/4	4	730.2	807.2	+10.6%
8k/1k	4/4	8	1291.4	1394.4	+8.0%
8k/1k	4/4	16	2076.3	2250.0	+8.4%
8k/1k	4/4	32	3314.3	3533.6	+6.6%
8k/1k	4/4	64	4741.3	5074.6	+7.0%
8k/1k	4/4	128	6719.6	6882.4	+2.4%
8k/1k	4/4	256	8156.8	8409.5	+3.1%
8k/1k	4/4	512	9136.0	9873.0	+8.1%
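The summary counts can be cross-checked against the table. Each Delta appears to be (Validation - Baseline) / Baseline. For example, the first row gives (197.4 - 175.6) / 175.6 ≈ +12.4%. Recomputing from the rounded throughput values can be off by 0.1 point: the 1k/1k TP8/EP8 row gives +11.3% against the listed +11.4%. For that reason, the hypothetical helper below summarizes the Delta column directly. It is a sketch and is not part of the PR's tooling.

```bash
#!/usr/bin/env bash
# Sketch: recompute the summary lines (improved/regressed counts, mean and
# median delta) from the Delta column. Reads one value per line, such as
# "+12.4%" or "-0.6%". The function name is hypothetical.
summarize_deltas() {
  awk '
    NF {
      d = $1
      gsub(/[+%]/, "", d)          # "+12.4%" -> "12.4"; a leading "-" is kept
      x = d + 0
      n++
      sum += x
      if (x > 0) { up++ } else if (x < 0) { down++ }
      # Insertion sort keeps v[1..n] ascending so the median can be read off.
      i = n
      while (i > 1 && v[i - 1] > x) { v[i] = v[i - 1]; i-- }
      v[i] = x
    }
    END {
      if (n == 0) exit 1
      med = (n % 2) ? v[(n + 1) / 2] : (v[n / 2] + v[n / 2 + 1]) / 2
      printf "n=%d improved=%d regressed=%d mean=%+.1f%% median=%+.1f%%\n",
             n, up, down, sum / n, med
    }'
}
```

Assuming the table is saved as a tab-separated file with its header row (hypothetical name `throughput.tsv`), `tail -n +2 throughput.tsv | cut -f6 | summarize_deltas` should print `n=33 improved=32 regressed=1 mean=+6.5% median=+6.6%`. This matches the Improved, Regressed, Average delta, and Median delta lines.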

Testing

bash -n benchmarks/single_node/minimaxm2.5_fp8_mi355x.sh
git diff --check
Compared results_bmk artifacts from the validation and baseline runs above.

github-actions · 2026-05-04T20:26:18Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please open a PR there first, before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

jiacao-amd · 2026-05-04T22:10:51Z

/sweep test-config --config-files .github/configs/amd-master.yaml --config-keys minimaxm2.5-fp8-mi355x-vllm

github-actions · 2026-05-04T22:11:01Z

@jiacao-amd Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25346292897
Command: test-config --config-files .github/configs/amd-master.yaml --config-keys minimaxm2.5-fp8-mi355x-vllm
Pinned ref: a9a3cef
Approval: not required (trusted collaborator).

jiacao-amd requested a review from a team May 4, 2026 20:26

github-project-automation Bot added this to InferenceMAX Board May 4, 2026

jiacao-amd mentioned this pull request May 4, 2026

Adjust MiniMax MI355X block size for TP8 EP8 #1228

Closed

jiacao-amd force-pushed the minimax-mi355x-scheduler-thresholds branch from 17bc2cc to c2b7d37 May 4, 2026 20:30

claude Bot reviewed May 4, 2026

Review comment thread on benchmarks/single_node/minimaxm2.5_fp8_mi355x.sh (outdated)

jiacao-amd force-pushed the minimax-mi355x-scheduler-thresholds branch from c2b7d37 to 98bc84c May 4, 2026 20:35

SemiAnalysisAI deleted a comment from github-actions Bot May 4, 2026

Tune MiniMax MI355X vLLM scheduling thresholds

a9a3cef

jiacao-amd force-pushed the minimax-mi355x-scheduler-thresholds branch from 98bc84c to a9a3cef May 4, 2026 21:42

SemiAnalysisAI deleted a comment from github-actions Bot May 4, 2026

Clarify MiniMax 8k1k scheduling branches

4b89f40

jiacao-amd wants to merge 2 commits into main from minimax-mi355x-scheduler-thresholds