Add opt-in MXFP8 LM-head output projection #4825

Open
gdengk wants to merge 9 commits into NVIDIA:main from gdengk:gaod/main/mxfp8-output-proj

@gdengk
Contributor

@gdengk gdengk commented May 15, 2026

What does this PR do?

Adds an opt-in TE-based LM-head ColumnParallelLinear that runs under the MXFP8 autocast context. Controlled by the new fp8_output_proj config flag; active only when fp8=True and fp8_recipe='mxfp8'. Defaults to off, so existing flows are unaffected.

This is the main-branch equivalent of #4484 and #4489 (merged to 26.04-alpha), ported as a self-contained module instead of layering on the alpha-only LinearCrossEntropyModule wrapper.

Changes

  • New module megatron/core/transformer/mxfp8_output_proj.py with TELinearCrossEntropyModule and is_te_mxfp8_output_proj_active.
  • GPTModel.__init__ conditionally swaps the output layer when the gate is on.
  • New TransformerConfig.fp8_output_proj field (default False).
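
For readers skimming the gate, here is a minimal sketch of the three-way condition described above. `SketchConfig` and `output_proj_gate_is_active` are hypothetical stand-ins, not the PR's `TransformerConfig` or `is_te_mxfp8_output_proj_active`; the field types and the `"e4m3"` value in the usage note are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in for the three config fields named in this PR."""

    fp8: Optional[str] = None         # assumption: any truthy value means FP8 is on
    fp8_recipe: Optional[str] = None  # the gate requires the string "mxfp8"
    fp8_output_proj: bool = False     # new opt-in flag, off by default


def output_proj_gate_is_active(cfg: SketchConfig) -> bool:
    """Return True only when the flag, FP8, and the mxfp8 recipe are all set."""
    return bool(cfg.fp8_output_proj) and bool(cfg.fp8) and cfg.fp8_recipe == "mxfp8"
```

With these assumptions, `output_proj_gate_is_active(SketchConfig())` is `False`, and only a config such as `SketchConfig(fp8="e4m3", fp8_recipe="mxfp8", fp8_output_proj=True)` turns the gate on.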

Tests

tests/unit_tests/transformer/test_mxfp8_output_proj.py:

  • Gating helper across config combinations (pure-Python).
  • Constructor validations for unsupported options.
  • GPTModel default uses ColumnParallelLinear.
  • GPTModel uses TELinearCrossEntropyModule under mxfp8 (Blackwell-gated).

Contribution process

Pre-checks

  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code
  • I have added relevant documentation
  • I have run the autoformatter.sh on my PR

a added 3 commits May 15, 2026 14:02
Introduces an opt-in TE-based LM-head ColumnParallelLinear that runs under
the MXFP8 autocast context. Controlled by the new `fp8_output_proj` config
flag; active only when `fp8=True` and `fp8_recipe='mxfp8'`. Defaults to off,
so existing flows are unaffected.

This is the main-branch equivalent of NVIDIA#4484 and NVIDIA#4489 (merged to
`26.04-alpha`), ported as a self-contained module instead of layering on the
alpha-only `LinearCrossEntropyModule` wrapper.

Covers:
  * is_te_mxfp8_output_proj_active branches (pure-Python, no GPU)
  * TELinearCrossEntropyModule constructor validations (early raises,
    no GPU init required since each fires before super().__init__())
  * GPTModel default uses ColumnParallelLinear
  * GPTModel uses TELinearCrossEntropyModule when fp8_output_proj is
    enabled under the mxfp8 recipe (Blackwell-only, skipped otherwise)

Reject fp8_output_proj=True when fp8 is off or fp8_recipe is not 'mxfp8'
at config-construction time, so misconfiguration fails fast instead of
only being caught by the runtime RuntimeError in TELinearCrossEntropyModule.
The runtime check is retained as defense-in-depth against later mutation.
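
The two-layer check from the last commit message could look roughly like the sketch below. Everything here is illustrative: `SketchConfig`, `_gate_ok`, and `build_output_layer_sketch` are hypothetical names, and raising `ValueError` at construction time is an assumption (the commit message only names the runtime `RuntimeError`).

```python
from dataclasses import dataclass
from typing import Optional


def _gate_ok(fp8: Optional[str], fp8_recipe: Optional[str]) -> bool:
    """True when FP8 is on and the recipe is mxfp8."""
    return bool(fp8) and fp8_recipe == "mxfp8"


@dataclass
class SketchConfig:
    """Hypothetical stand-in for the config; not the PR's TransformerConfig."""

    fp8: Optional[str] = None
    fp8_recipe: Optional[str] = None
    fp8_output_proj: bool = False

    def __post_init__(self) -> None:
        # Fail fast: reject the flag at construction time if the gate cannot hold.
        # Assumption: ValueError is used here; the PR's exception type is not shown.
        if self.fp8_output_proj and not _gate_ok(self.fp8, self.fp8_recipe):
            raise ValueError("fp8_output_proj=True requires fp8 on and fp8_recipe='mxfp8'")


def build_output_layer_sketch(cfg: SketchConfig) -> str:
    """Pick an output-layer kind, re-checking the gate at runtime."""
    # Defense in depth: __post_init__ does not rerun if fields are mutated later.
    if cfg.fp8_output_proj and not _gate_ok(cfg.fp8, cfg.fp8_recipe):
        raise RuntimeError("fp8_output_proj is set but the mxfp8 gate is not satisfied")
    return "te_mxfp8" if cfg.fp8_output_proj else "default"
```

In this sketch, `SketchConfig(fp8_output_proj=True)` raises immediately, while a valid config whose `fp8_recipe` is changed after construction is only caught by the runtime check.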
@gdengk gdengk requested review from a team as code owners May 15, 2026 21:50
@copy-pr-bot

copy-pr-bot Bot commented May 15, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@svcnvidia-nemo-ci svcnvidia-nemo-ci marked this pull request as draft May 15, 2026 21:50
@github-actions
Contributor

This PR has been automatically converted to draft because all PRs must start as drafts.

When you are ready for review, click Ready for Review to begin the review process. This will:

  1. Add the oncall reviewer (optional reviewer)
  2. Add required review teams based on your changes

See the contribution guide for more details.

@gdengk gdengk marked this pull request as ready for review May 15, 2026 21:50
@svcnvidia-nemo-ci svcnvidia-nemo-ci requested a review from a team May 15, 2026 21:51
@dingqingy-nv

/claude review

Contributor

@claude claude Bot left a comment

LGTM

@Phlip79
Member

Phlip79 commented May 24, 2026

/ok to test c292e98

        # so GPTModel.sharded_state_dict's no-extra-state invariant still holds.
        return None

    def set_extra_state(self, state):
Member

Add a warning here in case someone attempts to call set_extra_state?
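
One possible reading of this suggestion, sketched outside the PR's actual class: `NoExtraStateSketch` is a hypothetical stand-in that does not subclass the real module, and warning only on a non-`None` payload is an assumption about the intended behavior.

```python
import warnings
from typing import Any


class NoExtraStateSketch:
    """Hypothetical stand-in for the module under review; not the PR's class."""

    def get_extra_state(self) -> None:
        # Mirrors the reviewed snippet: report no extra state.
        return None

    def set_extra_state(self, state: Any) -> None:
        # Assumption: a non-None payload is ignored, but the caller is told so.
        if state is not None:
            warnings.warn(
                "set_extra_state received a payload, but this module keeps no "
                "extra state; the payload is ignored.",
                UserWarning,
                stacklevel=2,
            )
```

Calling `set_extra_state(None)` stays silent in this sketch, so checkpoints saved by the module itself would not trigger the warning.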

@Phlip79
Member

Phlip79 commented May 24, 2026

/ok to test 859fd91

@Phlip79
Member

Phlip79 commented May 24, 2026

/ok to test 13976e4

@Phlip79
Member

Phlip79 commented May 26, 2026

LGTM, but is this only applicable for GPTModel or is this also relevant for HybridModel?

I'll add support in a follow-up PR for this. There are a few other features that also need to be added to HybridModel.

@Phlip79 Phlip79 removed the request for review from a team May 26, 2026 20:56
Comment thread megatron/core/transformer/mxfp8_output_proj.py Outdated
Comment thread megatron/core/transformer/mxfp8_output_proj.py Outdated
Comment thread megatron/core/transformer/mxfp8_output_proj.py Outdated
@gdengk
Contributor Author

gdengk commented May 28, 2026

/ok to test 5bc6d74

@gdengk
Contributor Author

gdengk commented May 28, 2026

/ok to test 2b624cc

@svcnvidia-nemo-ci svcnvidia-nemo-ci added the "Approved" label (all necessary approvals have been made) and removed the "Final Review" label (PR is in the "final review" stage) May 28, 2026
@ericharper ericharper added this pull request to the merge queue May 28, 2026
@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/26600989893

@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/26603411168

@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 28, 2026

Labels

26.06, Approved (all necessary approvals have been made), complexity: medium

7 participants