[https://nvbugs/6095421][fix] Update resolve_moe_backend by heyuhhh · Pull Request #14282 · NVIDIA/TensorRT-LLM

heyuhhh · 2026-05-19T02:32:12Z

Summary by CodeRabbit

New Features
- Improved MOE backend selection to account for quantization configuration, enabling optimal backend resolution when using FP8_BLOCK_SCALES quantization with specific GPU architectures.
Tests
- Activated a previously waived accuracy test for enhanced validation coverage.

Description

Follows PR #14182 to complete the fix.

Test Coverage

PR Checklist

Please review the following before submitting your PR:

- PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
- PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
- Test cases are provided for new code paths (see test instructions).
- If PR introduces API changes, an appropriate PR label is added: either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
- Any new dependencies have been scanned for license and vulnerabilities.
- CODEOWNERS updated if ownership changes.
- Documentation updated as needed.
- Update tava architecture diagram if there is a significant design change in PR.
- The reviewers assigned automatically/manually are appropriate for the PR.
- Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

DeepSeek-V3-Lite-fp8 disagg test hangs on GB300 because resolve_moe_backend() returns CUTLASS for non-GptOss MoE models, and the CUTLASS FP8_BLOCK_SCALES path uses a DeepGEMM JIT kernel that only supports Hopper (SM90). On Blackwell (SM100/103) the worker throws 'fp8 blockscale gemm only support Hopper' during warmup; the disagg parent then deadlocks waiting for MPI_READY.

Route AUTO moe_backend to DEEPGEMM on SM100/103, which natively supports these architectures with FP8_BLOCK_SCALES.

Signed-off-by: tensorrt-cicd <90828364+tensorrt-cicd@users.noreply.github.com>

Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
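
The routing rule in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code in `tensorrt_llm/_torch/model_config.py`: the `QuantConfig` dataclass, the `sm_version` argument, the `blackwell_fp8_backend` parameter, and the rule that an explicit (non-AUTO) choice is passed through unchanged are all assumptions made for the example. The GptOss branch is left out because the PR text does not say what it returns. The target backend is a parameter because a later commit on this PR is titled "change DEEPGEMM MoE backend to TRTLLM".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuantConfig:
    """Hypothetical stand-in for the real quantization config object."""
    quant_algo: Optional[str] = None


# SM versions the commit message refers to as Blackwell (SM100/103).
BLACKWELL_SM_VERSIONS = frozenset({100, 103})


def resolve_moe_backend(
    moe_backend: str,
    sm_version: int,
    quant_config: Optional[QuantConfig] = None,
    blackwell_fp8_backend: str = "DEEPGEMM",  # assumption: configurable target
) -> str:
    """Pick a MoE backend, taking the quantization algorithm into account."""
    # Assumption: only AUTO is resolved; an explicit choice is returned as-is.
    if moe_backend.upper() != "AUTO":
        return moe_backend

    # Quantization-aware branch: FP8 block scales on Blackwell must avoid
    # the CUTLASS path, whose DeepGEMM JIT kernel only supports Hopper.
    if (quant_config is not None
            and quant_config.quant_algo == "FP8_BLOCK_SCALES"
            and sm_version in BLACKWELL_SM_VERSIONS):
        return blackwell_fp8_backend

    # Default described in the commit message for non-GptOss MoE models.
    return "CUTLASS"
```

With this shape, a Hopper run (`sm_version=90`) or a run with no quantization config still resolves AUTO to CUTLASS, so only the combination that failed during warmup is rerouted.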

heyuhhh · 2026-05-19T02:32:49Z

/bot run --disable-fail-fast

coderabbitai · 2026-05-19T02:35:40Z

Walkthrough

This PR introduces quantization-aware MOE backend resolution by extending the resolve_moe_backend method with an optional quant_config parameter and refactoring from_pretrained into a two-phase process that loads quantization configuration before finalizing backend selection. A test waiver is removed as no longer necessary.

Changes

Quantization-aware MOE Backend Selection

| Layer / File(s) | Summary |
|---|---|
| Enhanced resolve_moe_backend method with quantization awareness `tensorrt_llm/_torch/model_config.py` | Method signature accepts optional `quant_config` parameter. New quantization-aware logic detects FP8_BLOCK_SCALES algorithm on SM 100f GPUs and returns "DEEPGEMM" backend. Import formatting updated for multi-line block. |
| Two-phase MOE backend resolution in from_pretrained `tensorrt_llm/_torch/model_config.py` | `from_pretrained` refactored to compute MOE backend hint first using architecture only, then load quantization config, then finalize backend resolution with loaded `quant_config` to enable quantization-aware backend selection. |
| Test waiver cleanup `tests/integration/test_lists/waives.txt` | Removed waiver for Qwen3_30B_A3B MXFP8 latency test, restoring it to active test execution. |
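
The two-phase ordering described in the table can be illustrated with a small sketch. Everything here is hypothetical: `resolve_backend`, `load_with_two_phase_resolution`, and the `load_quant_algo` callback are names invented for the example, the model-architecture input is omitted for brevity, and passing the hint into the loader is only a guess at why the hint is computed first. The "DEEPGEMM" value follows the walkthrough text; a later commit on this PR changes the chosen backend to TRTLLM.

```python
from typing import Callable, Optional, Tuple

BLACKWELL_SM = frozenset({100, 103})


def resolve_backend(requested: str, sm_version: int,
                    quant_algo: Optional[str] = None) -> str:
    """Simplified resolver; quant_algo=None means 'not known yet'."""
    if requested.upper() != "AUTO":
        return requested
    if quant_algo == "FP8_BLOCK_SCALES" and sm_version in BLACKWELL_SM:
        return "DEEPGEMM"
    return "CUTLASS"


def load_with_two_phase_resolution(
    requested: str,
    sm_version: int,
    load_quant_algo: Callable[[str], Optional[str]],
) -> Tuple[str, str]:
    """Return (hint, final) so the effect of each phase is visible."""
    # Phase 1: the quantization config is not loaded yet, so resolve a hint
    # without it.
    hint = resolve_backend(requested, sm_version)

    # Load the quantization settings. Assumption: the loader may need the hint.
    quant_algo = load_quant_algo(hint)

    # Phase 2: resolve again, now that the quantization algorithm is known.
    final = resolve_backend(requested, sm_version, quant_algo)
    return hint, final


hint, final = load_with_two_phase_resolution(
    "AUTO", 100, lambda _hint: "FP8_BLOCK_SCALES")
print(hint, final)  # CUTLASS DEEPGEMM
```

The point of the second pass is that the hint alone would have kept CUTLASS on Blackwell; only after the quantization algorithm is loaded can the resolver switch backends.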

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

yuxianq
niukuo
zhenhuaw-me
jieli-matrix

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ⚠️ Warning | The PR description is minimal and lacks critical technical details. It references PR#14182 but provides no explanation of what is being fixed, why, or the technical changes involved. | Expand the description to explain the issue being fixed (MoE backend selection on Blackwell with FP8_BLOCK_SCALES), the solution approach, and reference relevant test coverage. Include why this follows PR#14182. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title clearly and specifically describes the main change: updating the resolve_moe_backend method to be quantization-aware on specific hardware. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


tensorrt-cicd · 2026-05-19T02:38:20Z

PR_Github #49042 [ run ] triggered by Bot. Commit: b38eb84 Link to invocation

tensorrt-cicd · 2026-05-19T10:13:21Z

PR_Github #49042 [ run ] completed with state SUCCESS. Commit: b38eb84
/LLM/main/L0_MergeRequest_PR pipeline #38778 completed with status: 'SUCCESS'


Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>

heyuhhh · 2026-05-19T15:26:44Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-19T15:36:07Z

PR_Github #49224 [ run ] triggered by Bot. Commit: 25d6fd7 Link to invocation

tensorrt-cicd · 2026-05-19T20:25:22Z

PR_Github #49224 [ run ] completed with state SUCCESS. Commit: 25d6fd7
/LLM/main/L0_MergeRequest_PR pipeline #38897 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.


tensorrt-cicd and others added 3 commits May 17, 2026 20:37

[nvbugs/6095421][chore] Remove stale waiver after fix

721c243

Signed-off-by: tensorrt-cicd <90828364+tensorrt-cicd@users.noreply.github.com>

fix: make auto moe backend quant aware

b38eb84

Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>

heyuhhh requested a review from a team as a code owner May 19, 2026 02:32

heyuhhh requested review from dongxuy04 and leslie-fang25 May 19, 2026 02:32

github-actions Bot assigned heyuhhh May 19, 2026

heyuhhh requested a review from lfr-0531 May 19, 2026 12:36

lfr-0531 reviewed May 19, 2026


Comment thread tensorrt_llm/_torch/model_config.py Outdated

heyuhhh added 2 commits May 19, 2026 15:25

change DEEPGEMM MoE backend to TRTLLM

0a16582

Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>

Merge branch 'main' into user/yuhangh/fix_6095421

25d6fd7

lfr-0531 approved these changes May 20, 2026


lfr-0531 merged commit fd54508 into NVIDIA:main May 20, 2026
9 of 10 checks passed

coderabbitai Bot mentioned this pull request May 20, 2026

[https://nvbugs/6175060][fix] Fix B300 MegaMoE test selection #14362

Open

lfr-0531 merged 5 commits into NVIDIA:main from heyuhhh:user/yuhangh/fix_6095421

heyuhhh commented May 19, 2026 • edited by lfr-0531

3 participants
