
Conversation

@timmoon10
Collaborator

Description

This fixes a bug when constructing the Quantize op within a quantized_params_init context.

#1951 changed the BasicOperation class so that it would initialize the quantization recipe state when constructed within a quantized_params_init context. This was needed so that the recipe state is available when creating quantized weights. However, putting this logic within the base class means that we are initializing the recipe state before any child class attrs are set, so the recipe state initialization can't adapt to any user-provided config. This PR fixes this bug by moving the recipe state initialization to the BasicLinear op, which is currently the only op that supports quantized params.
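
To make the ordering concrete, here is a minimal, runnable sketch of the pitfall. All names are illustrative stand-ins for the pattern described above, not the actual Transformer Engine code:

```python
# Sketch of the pre-fix initialization-order bug (illustrative names only).

def with_fp8_parameters() -> bool:
    """Stand-in for the global-state query set by the init context."""
    return True

class BasicOperation:
    def __init__(self):
        # Pre-fix behavior: the base class resets recipe state before any
        # subclass __init__ body has run, so subclass attrs don't exist yet.
        if with_fp8_parameters():
            self.reset_recipe_state()

    def reset_recipe_state(self):
        # Defensive lookup: silently falls back to False when the attr
        # hasn't been set yet.
        quantize_weight = getattr(self, "_with_quantized_weight", False)
        print(f"recipe state built with quantize_weight={quantize_weight}")

class BasicLinear(BasicOperation):
    def __init__(self, with_quantized_weight: bool = False):
        super().__init__()  # recipe state is built here -- too early
        self._with_quantized_weight = with_quantized_weight

BasicLinear(with_quantized_weight=True)  # prints quantize_weight=False
```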

Pinging @yaox12.

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

  • Avoid initializing recipe state in fusible op base class constructor

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Op attrs may not be set. Move recipe state initialization to linear op constructor.

Signed-off-by: Tim Moon <tmoon@nvidia.com>
@timmoon10 timmoon10 requested review from ksivaman and yaox12 November 25, 2025 20:59
@timmoon10 timmoon10 added the bug Something isn't working label Nov 25, 2025
@timmoon10
Collaborator Author

/te-ci pytorch

Member

@ksivaman ksivaman left a comment


LGTM

@greptile-apps
Contributor

greptile-apps bot commented Nov 25, 2025

Greptile Overview

Greptile Summary

Fixed an initialization-order bug where the recipe state was initialized in the BasicOperation constructor before child class attributes were set.

Key Changes:

  • Removed the premature reset_recipe_state() call from BasicOperation.__init__() that executed before child classes could set their attributes
  • Moved recipe state initialization to BasicLinear.__init__(), after the _with_quantized_weight attribute is set, so the recipe state can adapt to user configuration (see the sketch after this list)
  • Updated a test to construct Quantize ops inside a quantized_model_init context to verify the fix
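
A sketch of the fixed ordering, continuing the illustrative names from the sketch in the description above (again, not the actual Transformer Engine code):

```python
def with_fp8_parameters() -> bool:  # same stand-in as before
    return True

class BasicOperation:
    def __init__(self):
        pass  # no recipe-state initialization in the base class anymore

    def reset_recipe_state(self):
        quantize_weight = getattr(self, "_with_quantized_weight", False)
        print(f"recipe state built with quantize_weight={quantize_weight}")

class BasicLinear(BasicOperation):
    def __init__(self, with_quantized_weight: bool = False):
        super().__init__()
        self._with_quantized_weight = with_quantized_weight
        # Reset only after all attrs are set, so the recipe state can
        # adapt to the user-provided configuration.
        if with_fp8_parameters():
            self.reset_recipe_state()

BasicLinear(with_quantized_weight=True)  # prints quantize_weight=True
```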

Root Cause:
PR #1951 added recipe state initialization to the base class constructor to support quantized params, but this caused the state to be initialized without access to child class configuration like _with_quantized_weight, tensor_parallel_mode, and sequence_parallel. The reset_recipe_state() method in BasicLinear uses getattr(self, "_with_quantized_weight", False) specifically to handle being called before the attribute is set, indicating this timing issue was anticipated.
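
The defensive lookup can be seen in isolation in this toy example (not TE code): before the subclass sets the attribute, the default masks the timing problem rather than raising an error.

```python
class _Stub:
    pass

op = _Stub()
# Before the attr is set, the lookup silently falls back to the default:
print(getattr(op, "_with_quantized_weight", False))  # False
op._with_quantized_weight = True
print(getattr(op, "_with_quantized_weight", False))  # True
```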

Impact:
This fix ensures that when constructing ops within a quantized_model_init context, the recipe state is properly initialized with full access to the op's configuration, preventing potential misconfiguration of quantizers.
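
For reference, the updated test exercises roughly the following pattern. The import paths, op names, and constructor signatures below are assumptions pieced together from this summary; consult the Transformer Engine source for the real API.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.pytorch import ops  # assumed module layout

# Construct the ops *inside* the context so quantized params -- and the
# recipe state they depend on -- are created at construction time.
with te.quantized_model_init():  # context-manager name from the summary
    model = ops.Sequential(
        ops.Quantize(),             # assumed default constructor
        ops.BasicLinear(128, 128),  # assumed (in_features, out_features)
    )

y = model(torch.randn(32, 128, device="cuda"))  # illustrative forward pass
```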

Confidence Score: 5/5

  • This PR is safe to merge with no concerns - it's a clean architectural fix that properly resolves an initialization order bug
  • The changes follow object-oriented design principles by deferring initialization to the appropriate subclass where all configuration is available. The fix is minimal, targeted, and includes a test update that validates the bug is resolved. The code includes defensive programming (using getattr with defaults) showing this scenario was anticipated.
  • No files require special attention

Important Files Changed

| Filename | Score | Overview |
| --- | --- | --- |
| transformer_engine/pytorch/ops/op.py | 5/5 | Removed premature recipe state initialization from the BasicOperation constructor; correctly defers initialization to subclasses |
| transformer_engine/pytorch/ops/basic/basic_linear.py | 5/5 | Added recipe state initialization after setting _with_quantized_weight, ensuring proper state when using quantized params |
| tests/pytorch/test_fusible_ops.py | 5/5 | Moved Sequential op construction inside the quantized_model_init context to properly test the bug fix |

Sequence Diagram

sequenceDiagram
    participant User
    participant quantized_model_init as quantized_model_init context
    participant BasicLinear
    participant BasicOperation
    participant FP8GlobalStateManager
    
    Note over User,FP8GlobalStateManager: Before this PR (broken)
    User->>quantized_model_init: enter context
    quantized_model_init->>FP8GlobalStateManager: set with_fp8_parameters=True
    User->>BasicLinear: __init__()
    BasicLinear->>BasicOperation: super().__init__()
    BasicOperation->>FP8GlobalStateManager: with_fp8_parameters()
    FP8GlobalStateManager-->>BasicOperation: True
    BasicOperation->>BasicOperation: reset_recipe_state(recipe)
    Note over BasicOperation: BUG: Called before child attrs set!
    BasicOperation-->>BasicLinear: return
    BasicLinear->>BasicLinear: set _with_quantized_weight
    Note over BasicLinear: Too late! Recipe state already initialized<br/>without access to this attribute
    
    Note over User,FP8GlobalStateManager: After this PR (fixed)
    User->>quantized_model_init: enter context
    quantized_model_init->>FP8GlobalStateManager: set with_fp8_parameters=True
    User->>BasicLinear: __init__()
    BasicLinear->>BasicOperation: super().__init__()
    Note over BasicOperation: No recipe state initialization
    BasicOperation-->>BasicLinear: return
    BasicLinear->>BasicLinear: set _with_quantized_weight
    BasicLinear->>FP8GlobalStateManager: with_fp8_parameters()
    FP8GlobalStateManager-->>BasicLinear: True
    BasicLinear->>BasicLinear: reset_recipe_state(recipe)
    Note over BasicLinear: FIXED: Recipe state initialized<br/>after all attrs are set

Contributor

@greptile-apps greptile-apps bot left a comment


3 files reviewed, no comments


Member

@yaox12 yaox12 left a comment


LGTM. Thanks for your quick fix. Verified in my case.

@yaox12 yaox12 merged commit 9ca89e9 into NVIDIA:main Nov 26, 2025
22 of 23 checks passed
@yaox12 yaox12 added the 2.11.0 label Nov 26, 2025
@timmoon10 timmoon10 deleted the tmoon/debug-te-ops-quantize branch November 26, 2025 03:53
KshitijLakhani pushed a commit that referenced this pull request Nov 26, 2025
…nstructor (#2421)

Do not initialize recipe state in base op class

Op attrs may not be set. Move recipe state initialization to linear op constructor.

Signed-off-by: Tim Moon <tmoon@nvidia.com>
@yaox12 yaox12 added 2.10.0 and removed 2.11.0 labels Nov 27, 2025
pggPL pushed a commit to pggPL/TransformerEngine that referenced this pull request Dec 1, 2025
…nstructor (NVIDIA#2421)

Do not initialize recipe state in base op class

Op attrs may not be set. Move recipe state initialization to linear op constructor.

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

Labels

2.10.0 · bug (Something isn't working)


3 participants