[https://nvbugs/6193836][test] Use EP=8 + attention DP for minimax_m2.5 8-GPU perf #14613
ruodil wants to merge 2 commits.
….5 8-GPU perf

MiniMax-M2.5 FP8 has `intermediate_size=1536` and `weight_block_size=128`. TRT-LLM-gen / CUTLASS / DeepGEMM FP8 MoE kernels require the per-rank intermediate size to be a multiple of the block size 128. Under TP=8 each rank gets 1536/8=192, which fails the assert. Per developer guidance, route MoE through EP=8 and rely on attention DP instead of TP.

Changes:
- llm_perf_core.yml: switch the 7 minimax_m2.5_fp8 8-GPU test names from `tp:8-gpus:8` to `ep:8-gpus:8`.
- pytorch_model_config.py: add a pattern matching exactly those 7 cases and enable `attention_dp: True` in the generated trtllm-bench config.

The 4-GPU tests (TP=4 -> 1536/4=384) are unaffected and not touched.

Fixes: NVBugs 6193836.

Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
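The divisibility argument in the description can be checked with a few lines of Python. This is only a sketch of the arithmetic, not the kernels' actual assert: the helper names `per_rank_intermediate` and `is_block_aligned` are hypothetical, and the `tp_size=1` case assumes that under EP=8 each expert's intermediate dimension is left unsplit.

```python
def per_rank_intermediate(intermediate_size: int, tp_size: int) -> int:
    """Intermediate size each rank holds when the MoE FFN is split tp_size ways."""
    if intermediate_size % tp_size != 0:
        raise ValueError("intermediate_size must be divisible by tp_size")
    return intermediate_size // tp_size


def is_block_aligned(intermediate_size: int, tp_size: int, block_size: int = 128) -> bool:
    """True if the per-rank shard is a whole number of FP8 weight blocks."""
    return per_rank_intermediate(intermediate_size, tp_size) % block_size == 0


# Values from the PR description: intermediate_size=1536, weight_block_size=128.
for tp in (1, 4, 8):
    shard = per_rank_intermediate(1536, tp)
    print(f"tp={tp}: per-rank size {shard}, aligned={is_block_aligned(1536, tp)}")
```

With these values, TP=4 gives 384 (three blocks of 128) and TP=8 gives 192 (one and a half blocks), which is why only the 8-GPU cases are changed.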
/bot skip --comment "skip test as just modifying cases"
📝 Walkthrough
This PR updates the minimax_m2.5_fp8 performance testing configuration to use expert parallelism (EP=8) with attention DP enabled. A new pattern config entry is added to enable attention data parallelism, and the 8-GPU test names are switched from tensor parallelism (TP=8) to expert parallelism (EP=8).
Changes
minimax_m2.5_fp8 Attention DP Configuration
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
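The pattern-based override described in the walkthrough could look roughly like the sketch below. It is not the actual contents of `pytorch_model_config.py`: the `PATTERN_CONFIGS` structure, the `overrides_for` helper, the glob pattern, and the sample test names are all hypothetical, and only the `attention_dp: True` setting comes from the PR description.

```python
import fnmatch

# Hypothetical pattern table: each entry maps test-name globs to config overrides.
PATTERN_CONFIGS = [
    {
        "patterns": ["minimax_m2.5_fp8-*-ep:8-gpus:8"],  # assumed test-name shape
        "config": {"attention_dp": True},
    },
]


def overrides_for(test_name: str) -> dict:
    """Merge the config of every entry whose pattern matches the test name."""
    merged = {}
    for entry in PATTERN_CONFIGS:
        if any(fnmatch.fnmatchcase(test_name, p) for p in entry["patterns"]):
            merged.update(entry["config"])
    return merged


# Hypothetical test names: an 8-GPU EP case and a 4-GPU TP case.
print(overrides_for("minimax_m2.5_fp8-bench-pytorch-float8-maxbs:256-ep:8-gpus:8"))
print(overrides_for("minimax_m2.5_fp8-bench-pytorch-float8-maxbs:256-tp:4-gpus:4"))
```

Keying the override on the `ep:8-gpus:8` suffix means the 4-GPU TP=4 cases do not match and keep their existing config.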
🚥 Pre-merge checks | ✅ 5 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/integration/test_lists/qa/llm_perf_core.yml`:
- Around line 324-331: The QA perf list changed the minimax_m2.5_fp8 rows to use
ep:8-gpus:8 but the corresponding test-db perf YAMLs were not updated; search
for entries named minimax_m2.5_fp8 (and any minimax / m2.5 variants) under the
test-db perf lists and update their rows to match the QA values (replace
whatever EP/GPUs fields they have with ep:8-gpus:8, including the
maxbs/max_throughput and min_latency variants), or if the model is intentionally
not covered add a short YAML comment explaining why; ensure you update all
occurrences so CI mirrors QA.
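If the test-db lists do need the same change, the rename the comment asks for could be scripted along these lines. This is a minimal sketch: the `switch_tp8_to_ep8` helper and the sample test names are hypothetical, and reading and writing the actual YAML files is left out because their paths are not given here.

```python
def switch_tp8_to_ep8(lines, model="minimax_m2.5_fp8"):
    """Rewrite tp:8-gpus:8 to ep:8-gpus:8 only on lines that mention the model."""
    updated = []
    for line in lines:
        if model in line and "tp:8-gpus:8" in line:
            line = line.replace("tp:8-gpus:8", "ep:8-gpus:8")
        updated.append(line)
    return updated


# Hypothetical list entries; real test names in the YAML files will differ.
sample = [
    "- perf/test_perf.py::test_perf[minimax_m2.5_fp8-bench-maxbs:256-tp:8-gpus:8]",
    "- perf/test_perf.py::test_perf[minimax_m2.5_fp8-bench-maxbs:256-tp:4-gpus:4]",
    "- perf/test_perf.py::test_perf[other_model-bench-tp:8-gpus:8]",
]
for line in switch_tp8_to_ep8(sample):
    print(line)
```

Only the first sample line changes; the 4-GPU entry and the other model's entry pass through untouched.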
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 5e36aa5d-3eb9-459c-a8b9-b588a3e8e506
📒 Files selected for processing (2)
tests/integration/defs/perf/pytorch_model_config.py
tests/integration/test_lists/qa/llm_perf_core.yml
PR_Github #50445 [ skip ] triggered by Bot. Commit:
PR_Github #50445 [ skip ] completed with state
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either `api-compatible` or `api-breaking`. For `api-breaking`, include `BREAKING` in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.