
fix(te-plugin): handle TE 2.15+ tuple return from _Linear / _GroupedLinear #1481

Merged
kevalmorabia97 merged 1 commit into main from kmorabia/te-2.15-tuple-output-fix on May 13, 2026

Conversation


@kevalmorabia97 kevalmorabia97 commented May 13, 2026

What does this PR do?

Type of change: Bug fix

TE 2.15+ changed _Linear.forward and _GroupedLinear.forward to return (out, new_workspace) tuples instead of a single tensor. ModelOpt's patched te_quantized_linear_fn / te_grouped_quantized_linear_fn still piped the whole tuple into self.output_quantizer, crashing inside TensorQuantizer.forward on tuple.numel():

File ".../modelopt/torch/quantization/plugins/transformer_engine.py", line 184, in te_grouped_quantized_linear_fn
    return self.output_quantizer(output)
File ".../tensor_quantizer.py", line 1037, in forward
    if inputs.numel() == 0:
AttributeError: 'tuple' object has no attribute 'numel'

Mirror the existing pattern from _QuantTELayerNormLinear.forward: when the underlying TE call returns a tuple, quantize only output[0] (the activation tensor) and pass the auxiliary workspace metadata through unchanged. TE <= 2.14 still returns a single tensor, which skips the isinstance branch and behaves exactly as before this change.
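The guard can be sketched generically in plain Python (quantize_te_output, FakeTensor, and identity_quantizer are hypothetical names for illustration; the real wrappers live in modelopt/torch/quantization/plugins/transformer_engine.py):

```python
# Hedged sketch of the tuple-handling guard described above. Names here are
# illustrative, not the actual ModelOpt API; a stand-in object mimics a
# tensor's numel() so the sketch runs without torch.
class FakeTensor:
    """Minimal tensor stand-in exposing only numel()."""

    def __init__(self, n):
        self.n = n

    def numel(self):
        return self.n


def quantize_te_output(output_quantizer, output):
    """Quantize a TE forward result across TE versions.

    TE 2.15+ returns (out, new_workspace); quantize only output[0] and pass
    the auxiliary elements through unchanged. TE <= 2.14 returns a bare
    tensor, which is quantized exactly as before.
    """
    if isinstance(output, tuple):
        return (output_quantizer(output[0]), *output[1:])
    return output_quantizer(output)


identity_quantizer = lambda t: t  # stand-in for self.output_quantizer
act = FakeTensor(16)
result = quantize_te_output(identity_quantizer, (act, "new_workspace"))
```

With this shape, the single-tensor path and the tuple path share the same quantizer call, so pre-2.15 behavior is untouched.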

Already landed on release/0.44.0 as commit c897fbeaaf; this brings main in sync. Follow-up to #1473 (signature introspection + _forward cache lookup), which fixed an earlier symptom of the same TE 2.15 signature change but not this tuple-return path.

Usage

No public API change. PTQ continues to work transparently across TE 2.x:

import modelopt.torch.quantization as mtq
mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)  # now works on TE 2.15.x

Testing

Verified locally against both TE 2.12 and TE 2.15.0 using:

pytest tests/gpu_megatron/torch/quantization/plugins/test_transformer_engine.py

Without this fix on TE 2.15, the same test fails immediately with AttributeError: 'tuple' object has no attribute 'numel'. With this fix, both versions exercise the same code paths and pass — TE <= 2.14 skips the isinstance(output, tuple) branch and behaves identically to before.
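The pre-fix failure mode is reproducible with plain Python (MiniQuantizer and unpatched_wrapper are hypothetical stand-ins; the real attribute access is TensorQuantizer.forward calling inputs.numel()):

```python
# Generic repro of the crash: a tuple is piped into a quantizer that expects
# a tensor-like object. MiniQuantizer is a stand-in, not the real
# TensorQuantizer.
class MiniQuantizer:
    def __call__(self, inputs):
        if inputs.numel() == 0:  # a tuple has no .numel() -> AttributeError
            return inputs
        return inputs


def unpatched_wrapper(quantizer, output):
    return quantizer(output)  # pre-fix: the whole tuple is piped in


try:
    unpatched_wrapper(MiniQuantizer(), ((), "new_workspace"))
    raised = False
except AttributeError as err:
    raised = True
    message = str(err)  # "'tuple' object has no attribute 'numel'"
```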

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: N/A
  • Did you update Changelog?: N/A

Additional Information

Triggered by Megatron-Bridge failing tests after their TE 2.15 bump. The release/0.44.0 cherry-pick was pushed directly (commit c897fbeaaf) so Bridge could unblock; this PR carries the same fix forward to main.

…edLinear`

TE 2.15+ changed `_Linear.forward` and `_GroupedLinear.forward` to return
`(out, new_workspace)` tuples instead of a single tensor. ModelOpt's
patched `te_quantized_linear_fn` / `te_grouped_quantized_linear_fn` still
passed the whole tuple into `self.output_quantizer`, crashing inside
`TensorQuantizer.forward` on `tuple.numel()`:

  AttributeError: 'tuple' object has no attribute 'numel'

Mirror the existing pattern from `_QuantTELayerNormLinear.forward`:
quantize only `output[0]` (activation) and pass auxiliary workspace
metadata through verbatim. TE <= 2.14 returns a single tensor and falls
through the isinstance branch unchanged.

Already landed on release/0.44.0 (c897fbe); this brings main in sync.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 requested a review from a team as a code owner May 13, 2026 19:11
@kevalmorabia97 kevalmorabia97 requested a review from ajrasane May 13, 2026 19:11

coderabbitai Bot commented May 13, 2026

📝 Walkthrough


This PR updates two Transformer Engine quantized linear wrapper methods to support TE 2.15+ returning tuples. Both _QuantTELinear and _QuantTEGroupedLinear now detect tuple outputs, apply quantization only to the activation tensor at index 0, and preserve any additional returned elements unchanged.

Changes

Transformer Engine 2.15+ Tuple Output Support

  • modelopt/torch/quantization/plugins/transformer_engine.py (quantized linear wrapper tuple output handling): _QuantTELinear.te_quantized_linear_fn and _QuantTEGroupedLinear.te_grouped_quantized_linear_fn now conditionally check for tuple outputs from TE, quantize only the activation tensor (output[0]), and return any additional tuple elements alongside the quantized activation.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (5 passed)
  • Title check: The title clearly and specifically describes the main change: handling TE 2.15+ tuple returns in the Transformer Engine plugin's linear wrappers.
  • Linked Issues check: Skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: Skipped because no linked issues were found for this pull request.
  • Security Anti-Patterns: No security anti-patterns detected in PR. Changes to transformer_engine.py only add safe tuple handling with isinstance checks.
  • Description Check: Skipped because CodeRabbit's high-level summary is enabled.


@kevalmorabia97 kevalmorabia97 requested review from kinjalpatel27 and removed request for ajrasane May 13, 2026 19:11
@kevalmorabia97 kevalmorabia97 added cherry-pick-0.44.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc cherry-pick-done Added by bot once PR is cherry-picked to the release branch labels May 13, 2026

github-actions Bot commented May 13, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-05-13 20:34 UTC


codecov Bot commented May 13, 2026

Codecov Report

❌ Patch coverage is 25.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.86%. Comparing base (62401e1) to head (28d14ee).

Files with missing lines:
  • ...t/torch/quantization/plugins/transformer_engine.py — patch coverage 25.00%, 3 lines missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1481      +/-   ##
==========================================
+ Coverage   76.78%   76.86%   +0.08%     
==========================================
  Files         473      473              
  Lines       51413    51417       +4     
==========================================
+ Hits        39476    39524      +48     
+ Misses      11937    11893      -44     
Flag Coverage Δ
examples 41.59% <0.00%> (+2.61%) ⬆️
gpu 59.72% <25.00%> (-0.60%) ⬇️
regression 14.98% <0.00%> (+0.07%) ⬆️
unit 52.55% <0.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@kinjalpatel27 kinjalpatel27 left a comment


LGTM

@kevalmorabia97 kevalmorabia97 merged commit 94337ad into main May 13, 2026
49 checks passed
@kevalmorabia97 kevalmorabia97 deleted the kmorabia/te-2.15-tuple-output-fix branch May 13, 2026 20:34
