[https://nvbugs/6094072][fix] swizzle GPT-OSS dummy MXFP4 weights by dongfengy · Pull Request #13708 · NVIDIA/TensorRT-LLM

dongfengy · 2026-05-02T22:08:11Z

`TestGPTOSS::test_dummy_load_format` tests dummy weight loading, which means the actual weight-loading path is never called. We need to run the necessary post-processing ourselves so that the weights end up in the correct format.

@coderabbitai summary

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

dongfengy · 2026-05-02T22:10:47Z

/bot run

coderabbitai · 2026-05-02T22:14:59Z

📝 Walkthrough

Walkthrough

This PR introduces a reusable static helper method _swizzle_and_replace for MXFP4 weight deposition that performs in-place swizzling, parameter extraction, storage release, and tensor reassignment. It refactors existing load_quant_scales logic to use this helper and adds a new post_load_weights override to handle residual weight swizzling with conditional input dequant cleanup.

Changes

MXFP4 Weight Swizzling and Memory Management

| Layer / File(s) | Summary |
|---|---|
| Helper Abstraction `tensorrt_llm/_torch/modules/fused_moe/fused_moe_triton.py` (lines 1124–1137) | New static method `_swizzle_and_replace` encapsulates swizzling via `swizzle_weight_and_scale`, parameter popping, storage release (resize to 0), CUDA cache cleanup, and tensor reassignment. |
| Load Quantization Scales `tensorrt_llm/_torch/modules/fused_moe/fused_moe_triton.py` (lines 1201–1208) | `load_quant_scales` is refactored to replace inline swizzle-and-replace logic for `w3_w1_weight` → `fc31_dequant` and `w2_weight` → `fc2_dequant` with two `_swizzle_and_replace` calls. |
| Post-Load Hook `tensorrt_llm/_torch/modules/fused_moe/fused_moe_triton.py` (lines 1346–1361) | New `post_load_weights` override conditionally reshuffles `fc31_dequant.data`, applies `_swizzle_and_replace` to both weight pairs if `w3_w1_weight` remains, clears input dequant parameters for `torch.float8_e4m3fn`, and delegates to parent. |
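
The control flow summarized above can be sketched without any GPU code. This is a toy illustration only: `ToyModule`, `ToyMoEMethod`, and `toy_swizzle` are hypothetical stand-ins (plain lists replace tensors, and reversing a list replaces the real `swizzle_weight_and_scale`), and the helper uses a simplified signature. Only the method and weight names are taken from the walkthrough. The sketch assumes that "`w3_w1_weight` remains" means the raw weight is still registered on the module, which is how the hook can tell that `load_quant_scales` was skipped.

```python
# Toy stand-ins only: nothing here is TensorRT-LLM API.

def toy_swizzle(weight, scale):
    """Placeholder for swizzle_weight_and_scale: returns new, reordered lists."""
    return list(reversed(weight)), list(reversed(scale))


class ToyModule:
    """Keeps raw 'registered parameters' and post-processed tensors apart."""

    def __init__(self):
        self.params = {
            "w3_w1_weight": [1, 2, 3], "fc31_dequant": [10, 20, 30],
            "w2_weight": [4, 5, 6], "fc2_dequant": [40, 50, 60],
        }
        self.swizzled = {}


class ToyMoEMethod:
    @staticmethod
    def _swizzle_and_replace(module, weight_name, scale_name):
        # Pop the raw entries so the originals can be released, then store
        # the swizzled results under the same names.
        weight = module.params.pop(weight_name)
        scale = module.params.pop(scale_name)
        new_weight, new_scale = toy_swizzle(weight, scale)
        module.swizzled[weight_name] = new_weight
        module.swizzled[scale_name] = new_scale

    def load_quant_scales(self, module):
        # Real-checkpoint path: swizzling happens as part of loading.
        self._swizzle_and_replace(module, "w3_w1_weight", "fc31_dequant")
        self._swizzle_and_replace(module, "w2_weight", "fc2_dequant")

    def post_load_weights(self, module):
        # The dummy-load path never calls load_quant_scales, so the raw
        # weight is still registered; swizzle it here. Otherwise do nothing.
        if "w3_w1_weight" in module.params:
            self._swizzle_and_replace(module, "w3_w1_weight", "fc31_dequant")
            self._swizzle_and_replace(module, "w2_weight", "fc2_dequant")


method = ToyMoEMethod()

dummy = ToyModule()              # load_format="dummy": loading is skipped
method.post_load_weights(dummy)

real = ToyModule()               # regular path: load, then the hook runs
method.load_quant_scales(real)
method.post_load_weights(real)   # no-op, weights were already swizzled

print(dummy.swizzled == real.swizzled)
```

Guarding the hook on the presence of the raw weight keeps it safe to call on both paths: it does the work once on the dummy path and nothing on the regular path.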

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description is incomplete. The Description and Test Coverage sections are empty placeholders, and the PR checklist is unchecked, failing to document what and why this change was made. | Fill in the Description section explaining the issue and solution, add Test Coverage details listing relevant tests, and complete the PR checklist as appropriate for this change. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly describes the main change: fixing the swizzling of GPT-OSS dummy MXFP4 weights, which aligns with the refactoring of weight swizzling logic in the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

tensorrt_llm/_torch/modules/fused_moe/fused_moe_triton.py (1)
1124-1126: ⚡ Quick win

Add type annotations to the new helper and override.

These new methods are unannotated, which weakens the mypy coverage this file is expected to support. Adding explicit torch.nn.Module / str / torch.Tensor parameter types and -> None where appropriate would keep the new code aligned with the repo typing rules. As per coding guidelines, "Always annotate functions; make the return type None if the function does not return anything" and "code should support mypy type checking".

Also applies to: 1346-1360
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/modules/fused_moe/fused_moe_triton.py` around lines 1124
- 1126, Annotate the new helper _swizzle_and_replace and the corresponding
override method mentioned in the review with explicit types: use module:
torch.nn.Module, weight_name: str, scale_name: str, weight_data: torch.Tensor,
scale_data: torch.Tensor (or Optional[torch.Tensor] if it can be None) and add a
return type of -> None; also ensure the override method's parameters and return
type are similarly annotated (use torch.nn.Module/str/torch.Tensor as
appropriate) and add any necessary imports (torch and typing.Optional) so mypy
sees the types.
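
For reference, the annotations requested in this comment might look roughly like the sketch below. The parameter names are the ones suggested in the review comment and may not match the merged code. The class name `ToyTritonMoEMethod` and the `module` parameter of `post_load_weights` are assumptions made for illustration. String annotations plus a `TYPE_CHECKING` import let the sketch run without `torch` installed while still giving mypy the real types.

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Imported only for static type checking; not needed at runtime.
    import torch


class ToyTritonMoEMethod:  # hypothetical class name, for illustration only
    @staticmethod
    def _swizzle_and_replace(
        module: "torch.nn.Module",
        weight_name: str,
        scale_name: str,
        weight_data: "torch.Tensor",
        scale_data: "Optional[torch.Tensor]",
    ) -> None:
        """Swizzle one weight/scale pair and reassign it on ``module``."""
        raise NotImplementedError("signature sketch only")

    def post_load_weights(self, module: "torch.nn.Module") -> None:
        """Post-process weights once loading has finished."""
        raise NotImplementedError("signature sketch only")


# The explicit "-> None" is recorded in the function's annotations.
print(ToyTritonMoEMethod._swizzle_and_replace.__annotations__["return"])
```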

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/_torch/modules/fused_moe/fused_moe_triton.py`:
- Around line 1204-1208: After performing the weight replacement/swizzle path
you must refresh module.quant_scales so it no longer references the old tensors
whose storage was resized or set to None; update the code after the
_swizzle_and_replace calls (and after any assignments that set
fc31_input_dequant / fc2_input_dequant to None in the float8 branch) to rebuild
module.quant_scales (the object populated by setup_quant_scales() which
originally captured objects from create_weights()) so entries like
module.quant_scales.fc31_dequant, fc2_dequant, fc31_input_dequant and
fc2_input_dequant point to the new tensors or None as appropriate.
- Around line 1127-1134: _swizzle_and_replace unconditionally calls
old_param.data.storage().resize_(0) which can corrupt new_weight/new_scale if
they alias the original storage; update _swizzle_and_replace to check aliasing
the same way swizzle_weight_and_scale does (compare old_param.data.data_ptr() or
storage pointer to new_weight.data_ptr() and new_scale.data_ptr()) and only
free/resize the old storage when there is no alias, or simply remove the manual
resize; also add explicit type annotations to the _swizzle_and_replace method
signature for its parameters and return type so callers and linters know
expected types (referencing function name _swizzle_and_replace, helper
swizzle_weight_and_scale, variables old_param,
old_param.data.storage().resize_(0), new_weight, new_scale, and data_ptr()).
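
The stale-reference concern in the first inline comment is a general Python pitfall, and can be reproduced with plain objects. The sketch below is a toy: `ToyQuantScales`, `ToyModule`, and `build_quant_scales` are hypothetical names, and lists stand in for tensors. It assumes, as the comment describes, that the container captures object references when it is built rather than looking attributes up on each access.

```python
from typing import NamedTuple, Optional


class ToyQuantScales(NamedTuple):
    """Hypothetical container; it captures references when it is built."""
    fc31_dequant: Optional[list]
    fc2_dequant: Optional[list]


class ToyModule:
    def __init__(self) -> None:
        self.fc31_dequant: Optional[list] = [1.0, 2.0]
        self.fc2_dequant: Optional[list] = [3.0, 4.0]
        self.quant_scales = self.build_quant_scales()

    def build_quant_scales(self) -> ToyQuantScales:
        # Stand-in for a setup step that snapshots the current attributes.
        return ToyQuantScales(self.fc31_dequant, self.fc2_dequant)


module = ToyModule()
old_scale = module.fc31_dequant

# Replace the attribute with a new object, as a swizzle-and-replace step would.
module.fc31_dequant = [2.0, 1.0]

# The container still points at the object captured earlier.
stale = module.quant_scales.fc31_dequant is old_scale

# Rebuilding the container picks up the replacement.
module.quant_scales = module.build_quant_scales()
fresh = module.quant_scales.fc31_dequant is module.fc31_dequant

print(stale, fresh)
```

Whether the real `module.quant_scales` behaves this way depends on how `setup_quant_scales()` builds it, which is why the review prompt asks to verify the finding against the current code first.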

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4f467fba-aca2-4000-af70-24353769124e

📥 Commits

Reviewing files that changed from the base of the PR and between b9cbe46 and e9c1d7a.

📒 Files selected for processing (1)

tensorrt_llm/_torch/modules/fused_moe/fused_moe_triton.py

tensorrt-cicd · 2026-05-02T22:18:05Z

PR_Github #46571 [ run ] triggered by Bot. Commit: e9c1d7a Link to invocation

tensorrt-cicd · 2026-05-03T02:18:11Z

PR_Github #46571 [ run ] completed with state SUCCESS. Commit: e9c1d7a
/LLM/main/L0_MergeRequest_PR pipeline #36622 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

dongfengy · 2026-05-03T02:56:46Z

/bot run

tensorrt-cicd · 2026-05-03T03:03:43Z

PR_Github #46579 [ run ] triggered by Bot. Commit: 1766894 Link to invocation

tensorrt-cicd · 2026-05-03T06:33:54Z

PR_Github #46579 [ run ] completed with state SUCCESS. Commit: 1766894
/LLM/main/L0_MergeRequest_PR pipeline #36629 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

dongfengy · 2026-05-03T17:31:35Z

/bot run

tensorrt-cicd · 2026-05-03T17:38:07Z

PR_Github #46606 [ run ] triggered by Bot. Commit: 322ed86 Link to invocation

tensorrt-cicd · 2026-05-03T21:32:14Z

PR_Github #46606 [ run ] completed with state SUCCESS. Commit: 322ed86
/LLM/main/L0_MergeRequest_PR pipeline #36653 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

dongfengy · 2026-05-04T01:28:08Z

/bot run

tensorrt-cicd · 2026-05-04T01:35:51Z

PR_Github #46612 [ run ] triggered by Bot. Commit: 322ed86 Link to invocation

tensorrt-cicd · 2026-05-04T05:18:18Z

PR_Github #46612 [ run ] completed with state SUCCESS. Commit: 322ed86
/LLM/main/L0_MergeRequest_PR pipeline #36659 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>

…rmat Fixed by the swizzle GPT-OSS dummy MXFP4 weights commit on this branch. Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>

dongfengy · 2026-05-04T15:48:32Z

/bot run

tensorrt-cicd · 2026-05-04T15:56:54Z

PR_Github #46653 [ run ] triggered by Bot. Commit: c51ee54 Link to invocation

tensorrt-cicd · 2026-05-04T17:28:57Z

PR_Github #46653 [ run ] completed with state SUCCESS. Commit: c51ee54
/LLM/main/L0_MergeRequest_PR pipeline #36695 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

dongfengy · 2026-05-04T21:33:40Z

/bot run

tensorrt-cicd · 2026-05-04T21:39:54Z

PR_Github #46694 [ run ] triggered by Bot. Commit: c51ee54 Link to invocation

tensorrt-cicd · 2026-05-05T05:55:48Z

PR_Github #46694 [ run ] completed with state SUCCESS. Commit: c51ee54
/LLM/main/L0_MergeRequest_PR pipeline #36732 completed with status: 'SUCCESS'

CI Report

Link to invocation

Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>

dongfengy · 2026-05-05T15:43:41Z

/bot run

tensorrt-cicd · 2026-05-05T15:50:15Z

PR_Github #46829 [ run ] triggered by Bot. Commit: 1efd5b7 Link to invocation

tensorrt-cicd · 2026-05-05T21:20:34Z

PR_Github #46829 [ run ] completed with state SUCCESS. Commit: 1efd5b7
/LLM/main/L0_MergeRequest_PR pipeline #36850 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

HuiGao-NV

LGTM

dongfengy · 2026-05-06T01:05:13Z

/bot skip --comment "Passed 19 hours ago. No change since then except rebase. CI failing with unrelated tests."

tensorrt-cicd · 2026-05-06T01:12:22Z

PR_Github #46884 [ skip ] triggered by Bot. Commit: 1efd5b7 Link to invocation

tensorrt-cicd · 2026-05-06T01:26:25Z

PR_Github #46884 [ skip ] completed with state SUCCESS. Commit: 1efd5b7
Skipping testing for commit 1efd5b7

Link to invocation

…IDIA#13708) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com> Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>

dongfengy requested a review from a team as a code owner May 2, 2026 22:08

dongfengy requested a review from HuiGao-NV May 2, 2026 22:08

github-actions Bot assigned dongfengy May 2, 2026

dongfengy mentioned this pull request May 2, 2026

[https://nvbugs/6094072][fix] When load_format="dummy" is used, load_quant_scales (which transforms MXFP4 #13295

Closed

2 tasks

dongfengy changed the title ~~[None][fix] swizzle GPT-OSS dummy MXFP4 weights~~ [https://nvbugs/6094072][fix] swizzle GPT-OSS dummy MXFP4 weights May 2, 2026

coderabbitai Bot reviewed May 2, 2026

dongfengy force-pushed the fix/gptoss-mxfp4-dummy-load-codex branch from d1caab5 to 1766894 Compare May 3, 2026 02:56

dongfengy force-pushed the fix/gptoss-mxfp4-dummy-load-codex branch from 1766894 to 322ed86 Compare May 3, 2026 17:31

dongfengy added 2 commits May 4, 2026 08:42

[None][fix] swizzle GPT-OSS dummy MXFP4 weights

d5b75b5

Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>

[https://nvbugs/6094072][test] unwaive TestGPTOSS::test_dummy_load_fo…

c51ee54

…rmat Fixed by the swizzle GPT-OSS dummy MXFP4 weights commit on this branch. Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>

dongfengy force-pushed the fix/gptoss-mxfp4-dummy-load-codex branch from 322ed86 to c51ee54 Compare May 4, 2026 15:42

Merge branch 'main' into fix/gptoss-mxfp4-dummy-load-codex

1efd5b7

Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>

HuiGao-NV approved these changes May 6, 2026

dongfengy merged commit 1668df7 into NVIDIA:main May 6, 2026
6 checks passed
