
fix(te-plugin): handle TE 2.15+ tuple return from _Linear / _GroupedLinear #1481

Merged
kevalmorabia97 merged 1 commit into main from kmorabia/te-2.15-tuple-output-fix on May 13, 2026

Conversation


@kevalmorabia97 kevalmorabia97 commented May 13, 2026

What does this PR do?

Type of change: Bug fix

TE 2.15+ changed _Linear.forward and _GroupedLinear.forward to return (out, new_workspace) tuples instead of a single tensor. ModelOpt's patched te_quantized_linear_fn / te_grouped_quantized_linear_fn still piped the whole tuple into self.output_quantizer, crashing inside TensorQuantizer.forward on tuple.numel():

File ".../modelopt/torch/quantization/plugins/transformer_engine.py", line 184, in te_grouped_quantized_linear_fn
    return self.output_quantizer(output)
File ".../tensor_quantizer.py", line 1037, in forward
    if inputs.numel() == 0:
AttributeError: 'tuple' object has no attribute 'numel'

Mirror the existing pattern from _QuantTELayerNormLinear.forward: when the underlying TE call returns a tuple, quantize only output[0] (the activation tensor) and pass the auxiliary workspace metadata through unchanged. TE <= 2.14 still returns a single tensor, which skips the isinstance branch and behaves exactly as before this change.
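The guard can be sketched generically in plain Python (quantize_te_output, FakeTensor, and identity_quantizer are hypothetical names for illustration; the real wrappers live in modelopt/torch/quantization/plugins/transformer_engine.py):

```python
# Hedged sketch of the tuple-handling guard described above. Names here are
# illustrative, not the actual ModelOpt API; a stand-in object mimics a
# tensor's numel() so the sketch runs without torch.
class FakeTensor:
    """Minimal tensor stand-in exposing only numel()."""

    def __init__(self, n):
        self.n = n

    def numel(self):
        return self.n


def quantize_te_output(output_quantizer, output):
    """Quantize a TE forward result across TE versions.

    TE 2.15+ returns (out, new_workspace); quantize only output[0] and pass
    the auxiliary elements through unchanged. TE <= 2.14 returns a bare
    tensor, which is quantized exactly as before.
    """
    if isinstance(output, tuple):
        return (output_quantizer(output[0]), *output[1:])
    return output_quantizer(output)


identity_quantizer = lambda t: t  # stand-in for self.output_quantizer
act = FakeTensor(16)
result = quantize_te_output(identity_quantizer, (act, "new_workspace"))
```

With this shape, the single-tensor path and the tuple path share the same quantizer call, so pre-2.15 behavior is untouched.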

Already landed on release/0.44.0 as commit c897fbeaaf; this brings main in sync. Follow-up to #1473 (signature introspection + _forward cache lookup), which fixed an earlier symptom of the same TE 2.15 signature change but not this tuple-return path.

Usage

No public API change. PTQ continues to work transparently across TE 2.x:

import modelopt.torch.quantization as mtq
mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)  # now works on TE 2.15.x

Testing

Verified locally against both TE 2.12 and TE 2.15.0 using:

pytest tests/gpu_megatron/torch/quantization/plugins/test_transformer_engine.py

Without this fix on TE 2.15, the same test fails immediately with AttributeError: 'tuple' object has no attribute 'numel'. With this fix, both versions exercise the same code paths and pass — TE <= 2.14 skips the isinstance(output, tuple) branch and behaves identically to before.
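The pre-fix failure mode is reproducible with plain Python (MiniQuantizer and unpatched_wrapper are hypothetical stand-ins; the real attribute access is TensorQuantizer.forward calling inputs.numel()):

```python
# Generic repro of the crash: a tuple is piped into a quantizer that expects
# a tensor-like object. MiniQuantizer is a stand-in, not the real
# TensorQuantizer.
class MiniQuantizer:
    def __call__(self, inputs):
        if inputs.numel() == 0:  # a tuple has no .numel() -> AttributeError
            return inputs
        return inputs


def unpatched_wrapper(quantizer, output):
    return quantizer(output)  # pre-fix: the whole tuple is piped in


try:
    unpatched_wrapper(MiniQuantizer(), ((), "new_workspace"))
    raised = False
except AttributeError as err:
    raised = True
    message = str(err)  # "'tuple' object has no attribute 'numel'"
```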

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: N/A
  • Did you update Changelog?: N/A

Additional Information

Triggered by Megatron-Bridge failing tests after their TE 2.15 bump. The release/0.44.0 cherry-pick was pushed directly (commit c897fbeaaf) so Bridge could unblock; this PR carries the same fix forward to main.

…edLinear`

TE 2.15+ changed `_Linear.forward` and `_GroupedLinear.forward` to return
`(out, new_workspace)` tuples instead of a single tensor. ModelOpt's
patched `te_quantized_linear_fn` / `te_grouped_quantized_linear_fn` still
passed the whole tuple into `self.output_quantizer`, crashing inside
`TensorQuantizer.forward` on `tuple.numel()`:

  AttributeError: 'tuple' object has no attribute 'numel'

Mirror the existing pattern from `_QuantTELayerNormLinear.forward`:
quantize only `output[0]` (activation) and pass auxiliary workspace
metadata through verbatim. TE <= 2.14 returns a single tensor and falls
through the isinstance branch unchanged.

Already landed on release/0.44.0 (c897fbe); this brings main in sync.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 requested a review from a team as a code owner May 13, 2026 19:11
@kevalmorabia97 kevalmorabia97 requested a review from ajrasane May 13, 2026 19:11

coderabbitai Bot commented May 13, 2026

📝 Walkthrough


This PR updates two Transformer Engine quantized linear wrapper methods to support TE 2.15+ returning tuples. Both _QuantTELinear and _QuantTEGroupedLinear now detect tuple outputs, apply quantization only to the activation tensor at index 0, and preserve any additional returned elements unchanged.

Changes

Transformer Engine 2.15+ Tuple Output Support

  • modelopt/torch/quantization/plugins/transformer_engine.py (quantized linear wrapper tuple output handling): _QuantTELinear.te_quantized_linear_fn and _QuantTEGroupedLinear.te_grouped_quantized_linear_fn now conditionally check for tuple outputs from TE, quantize only the activation tensor (output[0]), and return any additional tuple elements alongside the quantized activation.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (5 passed)
  • Title check: The title clearly and specifically describes the main change: handling TE 2.15+ tuple returns in the Transformer Engine plugin's linear wrappers.
  • Linked Issues check: Skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: Skipped because no linked issues were found for this pull request.
  • Security Anti-Patterns: No security anti-patterns detected in PR. Changes to transformer_engine.py only add safe tuple handling with isinstance checks.
  • Description Check: Skipped because CodeRabbit's high-level summary is enabled.


@kevalmorabia97 kevalmorabia97 requested review from kinjalpatel27 and removed request for ajrasane May 13, 2026 19:11
@kevalmorabia97 kevalmorabia97 added cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc cherry-pick-done Added by bot once PR is cherry-picked to the release branch labels May 13, 2026

github-actions Bot commented May 13, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-05-13 20:34 UTC


codecov Bot commented May 13, 2026

Codecov Report

❌ Patch coverage is 25.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.86%. Comparing base (62401e1) to head (28d14ee).

Files with missing lines:
  • ...t/torch/quantization/plugins/transformer_engine.py — patch coverage 25.00%, 3 lines missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1481      +/-   ##
==========================================
+ Coverage   76.78%   76.86%   +0.08%     
==========================================
  Files         473      473              
  Lines       51413    51417       +4     
==========================================
+ Hits        39476    39524      +48     
+ Misses      11937    11893      -44     
Flag Coverage Δ
examples 41.59% <0.00%> (+2.61%) ⬆️
gpu 59.72% <25.00%> (-0.60%) ⬇️
regression 14.98% <0.00%> (+0.07%) ⬆️
unit 52.55% <0.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@kinjalpatel27 kinjalpatel27 left a comment


LGTM

@kevalmorabia97 kevalmorabia97 merged commit 94337ad into main May 13, 2026
49 checks passed
@kevalmorabia97 kevalmorabia97 deleted the kmorabia/te-2.15-tuple-output-fix branch May 13, 2026 20:34
