[https://nvbugs/6094072][fix] swizzle GPT-OSS dummy MXFP4 weights #13708

Merged
dongfengy merged 3 commits into NVIDIA:main from dongfengy:fix/gptoss-mxfp4-dummy-load-codex
May 6, 2026

Conversation

@dongfengy
Collaborator

@dongfengy dongfengy commented May 2, 2026

TestGPTOSS::test_dummy_load_format exercises the dummy weight-loading path, so the actual weight-loading code is never called. We need to run the necessary post-processing separately to ensure the weights end up in the correct format.
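
To illustrate the problem described above, here is a minimal toy sketch in plain Python (no PyTorch, not the TensorRT-LLM code). It assumes a loader that converts the weight layout as part of real loading, and a post-load hook that performs the same conversion only when the raw weight is still present. The names `ToyMoE`, `toy_swizzle`, `build`, `raw_weight` and `swizzled_weight` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


def toy_swizzle(values: List[int]) -> List[int]:
    """Hypothetical layout transform: even-indexed items first, then odd-indexed."""
    return values[0::2] + values[1::2]


@dataclass
class ToyMoE:
    params: Dict[str, List[int]] = field(default_factory=dict)

    def create_weights(self) -> None:
        # Allocate the weight in its raw (unswizzled) layout.
        self.params["raw_weight"] = [0, 0, 0, 0]

    def load_weights(self, checkpoint: List[int]) -> None:
        # Real loading path: take checkpoint values and convert the layout at once.
        self.params.pop("raw_weight")
        self.params["swizzled_weight"] = toy_swizzle(checkpoint)

    def post_load_weights(self) -> None:
        # Runs on every path. If the raw weight is still registered, the loader
        # was skipped (dummy format), so the conversion happens here instead.
        if "raw_weight" in self.params:
            raw = self.params.pop("raw_weight")
            self.params["swizzled_weight"] = toy_swizzle(raw)


def build(load_format: str) -> ToyMoE:
    moe = ToyMoE()
    moe.create_weights()
    if load_format == "dummy":
        # Placeholder fill; the loader is never called on this path.
        moe.params["raw_weight"] = [1, 2, 3, 4]
    else:
        moe.load_weights([1, 2, 3, 4])
    moe.post_load_weights()
    return moe
```

In this sketch both paths end with the same parameter layout, and the hook is a no-op when the loader has already done the conversion.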

@coderabbitai summary

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@dongfengy dongfengy requested a review from a team as a code owner May 2, 2026 22:08
@dongfengy dongfengy requested a review from HuiGao-NV May 2, 2026 22:08
@dongfengy dongfengy changed the title from "[None][fix] swizzle GPT-OSS dummy MXFP4 weights" to "[https://nvbugs/6094072][fix] swizzle GPT-OSS dummy MXFP4 weights" May 2, 2026
@dongfengy
Collaborator Author

/bot run

@coderabbitai
Contributor

coderabbitai Bot commented May 2, 2026

📝 Walkthrough

This PR introduces a reusable static helper method _swizzle_and_replace for MXFP4 weight deposition that performs in-place swizzling, parameter extraction, storage release, and tensor reassignment. It refactors existing load_quant_scales logic to use this helper and adds a new post_load_weights override to handle residual weight swizzling with conditional input dequant cleanup.

Changes

MXFP4 Weight Swizzling and Memory Management

| Layer / File(s) | Summary |
| --- | --- |
| Helper Abstraction: tensorrt_llm/_torch/modules/fused_moe/fused_moe_triton.py (lines 1124–1137) | New static method _swizzle_and_replace encapsulates swizzling via swizzle_weight_and_scale, parameter popping, storage release (resize to 0), CUDA cache cleanup, and tensor reassignment. |
| Load Quantization Scales: tensorrt_llm/_torch/modules/fused_moe/fused_moe_triton.py (lines 1201–1208) | load_quant_scales is refactored to replace inline swizzle-and-replace logic for w3_w1_weight/fc31_dequant and w2_weight/fc2_dequant with two _swizzle_and_replace calls. |
| Post-Load Hook: tensorrt_llm/_torch/modules/fused_moe/fused_moe_triton.py (lines 1346–1361) | New post_load_weights override conditionally reshuffles fc31_dequant.data, applies _swizzle_and_replace to both weight pairs if w3_w1_weight remains, clears input dequant parameters for torch.float8_e4m3fn, and delegates to parent. |
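
The steps attributed to the helper above (swizzle, pop the old entry, release its storage, reassign) can be sketched with plain Python buffers. This is a toy model, not the code in fused_moe_triton.py: `toy_swizzle` is a hypothetical stand-in for swizzle_weight_and_scale, a dict stands in for the module's parameters, `del old[:]` models resizing the storage to zero, and the CUDA cache cleanup is omitted. The identity check before releasing is an added precaution in this sketch, not something the summary says the helper does.

```python
from array import array
from typing import Callable, Dict, Tuple

Buffer = array
SwizzleFn = Callable[[Buffer, Buffer], Tuple[Buffer, Buffer]]


def toy_swizzle(weight: Buffer, scale: Buffer) -> Tuple[Buffer, Buffer]:
    """Hypothetical layout transform: returns reversed copies of both buffers."""
    return array("B", reversed(weight)), array("B", reversed(scale))


def swizzle_and_replace(
    params: Dict[str, Buffer],
    weight_name: str,
    scale_name: str,
    swizzle: SwizzleFn = toy_swizzle,
) -> None:
    # 1. Produce the new layout from the currently registered buffers.
    new_weight, new_scale = swizzle(params[weight_name], params[scale_name])
    for name, new in ((weight_name, new_weight), (scale_name, new_scale)):
        # 2. Unregister the old entry.
        old = params.pop(name)
        # 3. Release the old buffer, but only if the result is a different object.
        if new is not old:
            del old[:]
        # 4. Register the new buffer under the same name.
        params[name] = new
```

Calling `swizzle_and_replace(params, "w", "s")` on a dict holding two `array("B", ...)` buffers leaves the same keys pointing at the transformed buffers, while any outside reference to the old buffers now sees them empty.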

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description is incomplete. The Description and Test Coverage sections are empty placeholders, and the PR checklist is unchecked, failing to document what and why this change was made. | Fill in the Description section explaining the issue and solution, add Test Coverage details listing relevant tests, and complete the PR checklist as appropriate for this change. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly describes the main change: fixing the swizzling of GPT-OSS dummy MXFP4 weights, which aligns with the refactoring of weight swizzling logic in the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
tensorrt_llm/_torch/modules/fused_moe/fused_moe_triton.py (1)

1124-1126: ⚡ Quick win

Add type annotations to the new helper and override.

These new methods are unannotated, which weakens the mypy coverage this file is expected to support. Adding explicit torch.nn.Module / str / torch.Tensor parameter types and -> None where appropriate would keep the new code aligned with the repo typing rules. As per coding guidelines, "Always annotate functions; make the return type None if the function does not return anything" and "code should support mypy type checking".

Also applies to: 1346-1360

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/modules/fused_moe/fused_moe_triton.py` around lines 1124
- 1126, Annotate the new helper _swizzle_and_replace and the corresponding
override method mentioned in the review with explicit types: use module:
torch.nn.Module, weight_name: str, scale_name: str, weight_data: torch.Tensor,
scale_data: torch.Tensor (or Optional[torch.Tensor] if it can be None) and add a
return type of -> None; also ensure the override method's parameters and return
type are similarly annotated (use torch.nn.Module/str/torch.Tensor as
appropriate) and add any necessary imports (torch and typing.Optional) so mypy
sees the types.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/_torch/modules/fused_moe/fused_moe_triton.py`:
- Around line 1204-1208: After performing the weight replacement/swizzle path
you must refresh module.quant_scales so it no longer references the old tensors
whose storage was resized or set to None; update the code after the
_swizzle_and_replace calls (and after any assignments that set
fc31_input_dequant / fc2_input_dequant to None in the float8 branch) to rebuild
module.quant_scales (the object populated by setup_quant_scales() which
originally captured objects from create_weights()) so entries like
module.quant_scales.fc31_dequant, fc2_dequant, fc31_input_dequant and
fc2_input_dequant point to the new tensors or None as appropriate.
- Around line 1127-1134: _swizzle_and_replace unconditionally calls
old_param.data.storage().resize_(0) which can corrupt new_weight/new_scale if
they alias the original storage; update _swizzle_and_replace to check aliasing
the same way swizzle_weight_and_scale does (compare old_param.data.data_ptr() or
storage pointer to new_weight.data_ptr() and new_scale.data_ptr()) and only
free/resize the old storage when there is no alias, or simply remove the manual
resize; also add explicit type annotations to the _swizzle_and_replace method
signature for its parameters and return type so callers and linters know
expected types (referencing function name _swizzle_and_replace, helper
swizzle_weight_and_scale, variables old_param,
old_param.data.storage().resize_(0), new_weight, new_scale, and data_ptr()).

---

Nitpick comments:
In `@tensorrt_llm/_torch/modules/fused_moe/fused_moe_triton.py`:
- Around line 1124-1126: Annotate the new helper _swizzle_and_replace and the
corresponding override method mentioned in the review with explicit types: use
module: torch.nn.Module, weight_name: str, scale_name: str, weight_data:
torch.Tensor, scale_data: torch.Tensor (or Optional[torch.Tensor] if it can be
None) and add a return type of -> None; also ensure the override method's
parameters and return type are similarly annotated (use
torch.nn.Module/str/torch.Tensor as appropriate) and add any necessary imports
(torch and typing.Optional) so mypy sees the types.
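
The first inline comment above concerns stale references: an object that captured the old tensors keeps pointing at them after they are replaced. A minimal toy sketch of that hazard in plain Python follows. `ToyModule`, `QuantScales` and `replace` are hypothetical names for this illustration, lists stand in for tensors, and `old.clear()` models resizing the old storage to zero. It is not the TensorRT-LLM implementation.

```python
from typing import Dict, List, NamedTuple, Optional


class QuantScales(NamedTuple):
    fc31_dequant: Optional[List[float]]
    fc2_dequant: Optional[List[float]]


class ToyModule:
    def __init__(self) -> None:
        self.params: Dict[str, Optional[List[float]]] = {
            "fc31_dequant": [1.0, 2.0],
            "fc2_dequant": [3.0, 4.0],
        }
        self.setup_quant_scales()

    def setup_quant_scales(self) -> None:
        # Captures whichever objects are registered at call time.
        self.quant_scales = QuantScales(
            self.params["fc31_dequant"], self.params["fc2_dequant"]
        )

    def replace(self, name: str, new_value: Optional[List[float]]) -> None:
        old = self.params[name]
        if old is not None:
            old.clear()  # models releasing the old storage
        self.params[name] = new_value


module = ToyModule()
module.replace("fc31_dequant", [2.0, 1.0])
stale = module.quant_scales.fc31_dequant   # still the old, now-empty object
module.setup_quant_scales()                # rebuild after the replacement
fresh = module.quant_scales.fc31_dequant   # now the replacement object
```

In this model, `stale` is the emptied old list and `fresh` is the replacement, which is the reason the comment asks for quant_scales to be rebuilt after the swizzle path runs.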

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4f467fba-aca2-4000-af70-24353769124e

📥 Commits

Reviewing files that changed from the base of the PR and between b9cbe46 and e9c1d7a.

📒 Files selected for processing (1)
  • tensorrt_llm/_torch/modules/fused_moe/fused_moe_triton.py

Comment thread tensorrt_llm/_torch/modules/fused_moe/fused_moe_triton.py
Comment thread tensorrt_llm/_torch/modules/fused_moe/fused_moe_triton.py
@tensorrt-cicd
Collaborator

PR_Github #46571 [ run ] triggered by Bot. Commit: e9c1d7a Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46571 [ run ] completed with state SUCCESS. Commit: e9c1d7a
/LLM/main/L0_MergeRequest_PR pipeline #36622 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@dongfengy dongfengy force-pushed the fix/gptoss-mxfp4-dummy-load-codex branch from d1caab5 to 1766894 May 3, 2026 02:56
@dongfengy
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #46579 [ run ] triggered by Bot. Commit: 1766894 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46579 [ run ] completed with state SUCCESS. Commit: 1766894
/LLM/main/L0_MergeRequest_PR pipeline #36629 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@dongfengy dongfengy force-pushed the fix/gptoss-mxfp4-dummy-load-codex branch from 1766894 to 322ed86 May 3, 2026 17:31
@dongfengy
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #46606 [ run ] triggered by Bot. Commit: 322ed86 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46606 [ run ] completed with state SUCCESS. Commit: 322ed86
/LLM/main/L0_MergeRequest_PR pipeline #36653 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@dongfengy
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #46612 [ run ] triggered by Bot. Commit: 322ed86 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46612 [ run ] completed with state SUCCESS. Commit: 322ed86
/LLM/main/L0_MergeRequest_PR pipeline #36659 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

dongfengy added 2 commits May 4, 2026 08:42
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
…rmat

Fixed by the swizzle GPT-OSS dummy MXFP4 weights commit on this branch.

Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
@dongfengy dongfengy force-pushed the fix/gptoss-mxfp4-dummy-load-codex branch from 322ed86 to c51ee54 May 4, 2026 15:42
@dongfengy
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #46653 [ run ] triggered by Bot. Commit: c51ee54 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46653 [ run ] completed with state SUCCESS. Commit: c51ee54
/LLM/main/L0_MergeRequest_PR pipeline #36695 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@dongfengy
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #46694 [ run ] triggered by Bot. Commit: c51ee54 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46694 [ run ] completed with state SUCCESS. Commit: c51ee54
/LLM/main/L0_MergeRequest_PR pipeline #36732 completed with status: 'SUCCESS'

CI Report

Link to invocation

Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
@dongfengy
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #46829 [ run ] triggered by Bot. Commit: 1efd5b7 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46829 [ run ] completed with state SUCCESS. Commit: 1efd5b7
/LLM/main/L0_MergeRequest_PR pipeline #36850 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Collaborator

@HuiGao-NV HuiGao-NV left a comment


LGTM

@dongfengy
Collaborator Author

/bot skip --comment "Passed 19 hours ago. No change since then except rebase. CI failing with unrelated tests."

@tensorrt-cicd
Collaborator

PR_Github #46884 [ skip ] triggered by Bot. Commit: 1efd5b7 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46884 [ skip ] completed with state SUCCESS. Commit: 1efd5b7
Skipping testing for commit 1efd5b7

Link to invocation

@dongfengy dongfengy merged commit 1668df7 into NVIDIA:main May 6, 2026
6 checks passed
yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request May 19, 2026
…IDIA#13708)

Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>

3 participants