
Conversation

@timmoon10
Collaborator

Description

This fixes a bug when constructing the Quantize op within a quantized_params_init context.

#1951 changed the BasicOperation class so that it would initialize the quantization recipe state when constructed within a quantized_params_init context. This was needed so that the recipe state is available when creating quantized weights. However, putting this logic within the base class means that we are initializing the recipe state before any child class attrs are set, so the recipe state initialization can't adapt to any user-provided config. This PR fixes this bug by moving the recipe state initialization to the BasicLinear op, which is currently the only op that supports quantized params.
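
To make the ordering concrete, here is a minimal, runnable sketch of the pitfall. All names are illustrative stand-ins for the pattern described above, not the actual Transformer Engine code:

```python
# Sketch of the pre-fix initialization-order bug (illustrative names only).

def with_fp8_parameters() -> bool:
    """Stand-in for the global-state query set by the init context."""
    return True

class BasicOperation:
    def __init__(self):
        # Pre-fix behavior: the base class resets recipe state before any
        # subclass __init__ body has run, so subclass attrs don't exist yet.
        if with_fp8_parameters():
            self.reset_recipe_state()

    def reset_recipe_state(self):
        # Defensive lookup: silently falls back to False when the attr
        # hasn't been set yet.
        quantize_weight = getattr(self, "_with_quantized_weight", False)
        print(f"recipe state built with quantize_weight={quantize_weight}")

class BasicLinear(BasicOperation):
    def __init__(self, with_quantized_weight: bool = False):
        super().__init__()  # recipe state is built here -- too early
        self._with_quantized_weight = with_quantized_weight

BasicLinear(with_quantized_weight=True)  # prints quantize_weight=False
```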

Pinging @yaox12.

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

  • Avoid initializing recipe state in fusible op base class constructor

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Op attrs may not be set. Move recipe state initialization to linear op constructor.

Signed-off-by: Tim Moon <tmoon@nvidia.com>
@timmoon10 timmoon10 requested review from ksivaman and yaox12 November 25, 2025 20:59
@timmoon10 timmoon10 added the bug Something isn't working label Nov 25, 2025
@timmoon10
Collaborator Author

/te-ci pytorch

Member

@ksivaman ksivaman left a comment


LGTM

@greptile-apps
Contributor

greptile-apps bot commented Nov 25, 2025

Greptile Overview

Greptile Summary

Fixed an initialization-order bug where the recipe state was initialized in the BasicOperation constructor before child class attributes were set.

Key Changes:

  • Removed the premature reset_recipe_state() call from BasicOperation.__init__() that executed before child classes could set their attributes
  • Moved recipe state initialization to BasicLinear.__init__(), after the _with_quantized_weight attribute is set, so the recipe state can adapt to user configuration (see the sketch after this list)
  • Updated a test to construct Quantize ops inside a quantized_model_init context to verify the fix
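
A sketch of the fixed ordering, continuing the illustrative names from the sketch in the description above (again, not the actual Transformer Engine code):

```python
def with_fp8_parameters() -> bool:  # same stand-in as before
    return True

class BasicOperation:
    def __init__(self):
        pass  # no recipe-state initialization in the base class anymore

    def reset_recipe_state(self):
        quantize_weight = getattr(self, "_with_quantized_weight", False)
        print(f"recipe state built with quantize_weight={quantize_weight}")

class BasicLinear(BasicOperation):
    def __init__(self, with_quantized_weight: bool = False):
        super().__init__()
        self._with_quantized_weight = with_quantized_weight
        # Reset only after all attrs are set, so the recipe state can
        # adapt to the user-provided configuration.
        if with_fp8_parameters():
            self.reset_recipe_state()

BasicLinear(with_quantized_weight=True)  # prints quantize_weight=True
```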

Root Cause:
PR #1951 added recipe state initialization to the base class constructor to support quantized params, but this caused the state to be initialized without access to child class configuration like _with_quantized_weight, tensor_parallel_mode, and sequence_parallel. The reset_recipe_state() method in BasicLinear uses getattr(self, "_with_quantized_weight", False) specifically to handle being called before the attribute is set, indicating this timing issue was anticipated.
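
The defensive lookup can be seen in isolation in this toy example (not TE code): before the subclass sets the attribute, the default masks the timing problem rather than raising an error.

```python
class _Stub:
    pass

op = _Stub()
# Before the attr is set, the lookup silently falls back to the default:
print(getattr(op, "_with_quantized_weight", False))  # False
op._with_quantized_weight = True
print(getattr(op, "_with_quantized_weight", False))  # True
```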

Impact:
This fix ensures that when constructing ops within a quantized_model_init context, the recipe state is properly initialized with full access to the op's configuration, preventing potential misconfiguration of quantizers.
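
For reference, the updated test exercises roughly the following pattern. The import paths, op names, and constructor signatures below are assumptions pieced together from this summary; consult the Transformer Engine source for the real API.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.pytorch import ops  # assumed module layout

# Construct the ops *inside* the context so quantized params -- and the
# recipe state they depend on -- are created at construction time.
with te.quantized_model_init():  # context-manager name from the summary
    model = ops.Sequential(
        ops.Quantize(),             # assumed default constructor
        ops.BasicLinear(128, 128),  # assumed (in_features, out_features)
    )

y = model(torch.randn(32, 128, device="cuda"))  # illustrative forward pass
```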

Confidence Score: 5/5

  • This PR is safe to merge with no concerns - it's a clean architectural fix that properly resolves an initialization order bug
  • The changes follow object-oriented design principles by deferring initialization to the appropriate subclass where all configuration is available. The fix is minimal, targeted, and includes a test update that validates the bug is resolved. The code includes defensive programming (using getattr with defaults) showing this scenario was anticipated.
  • No files require special attention

Important Files Changed

| Filename | Score | Overview |
| --- | --- | --- |
| transformer_engine/pytorch/ops/op.py | 5/5 | Removed premature recipe state initialization from the BasicOperation constructor; correctly defers initialization to subclasses |
| transformer_engine/pytorch/ops/basic/basic_linear.py | 5/5 | Added recipe state initialization after setting _with_quantized_weight, ensuring proper state when using quantized params |
| tests/pytorch/test_fusible_ops.py | 5/5 | Moved Sequential op construction inside the quantized_model_init context to properly test the bug fix |

Sequence Diagram

sequenceDiagram
    participant User
    participant quantized_model_init as quantized_model_init context
    participant BasicLinear
    participant BasicOperation
    participant FP8GlobalStateManager
    
    Note over User,FP8GlobalStateManager: Before this PR (broken)
    User->>quantized_model_init: enter context
    quantized_model_init->>FP8GlobalStateManager: set with_fp8_parameters=True
    User->>BasicLinear: __init__()
    BasicLinear->>BasicOperation: super().__init__()
    BasicOperation->>FP8GlobalStateManager: with_fp8_parameters()
    FP8GlobalStateManager-->>BasicOperation: True
    BasicOperation->>BasicOperation: reset_recipe_state(recipe)
    Note over BasicOperation: BUG: Called before child attrs set!
    BasicOperation-->>BasicLinear: return
    BasicLinear->>BasicLinear: set _with_quantized_weight
    Note over BasicLinear: Too late! Recipe state already initialized<br/>without access to this attribute
    
    Note over User,FP8GlobalStateManager: After this PR (fixed)
    User->>quantized_model_init: enter context
    quantized_model_init->>FP8GlobalStateManager: set with_fp8_parameters=True
    User->>BasicLinear: __init__()
    BasicLinear->>BasicOperation: super().__init__()
    Note over BasicOperation: No recipe state initialization
    BasicOperation-->>BasicLinear: return
    BasicLinear->>BasicLinear: set _with_quantized_weight
    BasicLinear->>FP8GlobalStateManager: with_fp8_parameters()
    FP8GlobalStateManager-->>BasicLinear: True
    BasicLinear->>BasicLinear: reset_recipe_state(recipe)
    Note over BasicLinear: FIXED: Recipe state initialized<br/>after all attrs are set

Contributor

@greptile-apps greptile-apps bot left a comment


3 files reviewed, no comments


Member

@yaox12 yaox12 left a comment


LGTM. Thanks for your quick fix. Verified in my case.

@yaox12 yaox12 merged commit 9ca89e9 into NVIDIA:main Nov 26, 2025
22 of 23 checks passed
@yaox12 yaox12 added the 2.11.0 label Nov 26, 2025
@timmoon10 timmoon10 deleted the tmoon/debug-te-ops-quantize branch November 26, 2025 03:53
KshitijLakhani pushed a commit that referenced this pull request Nov 26, 2025
…nstructor (#2421)

Do not initialize recipe state in base op class

Op attrs may not be set. Move recipe state initialization to linear op constructor.

Signed-off-by: Tim Moon <tmoon@nvidia.com>
@yaox12 yaox12 added 2.10.0 and removed 2.11.0 labels Nov 27, 2025
pggPL pushed a commit to pggPL/TransformerEngine that referenced this pull request Dec 1, 2025
…nstructor (NVIDIA#2421)

Do not initialize recipe state in base op class

Op attrs may not be set. Move recipe state initialization to linear op constructor.

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

Labels

2.10.0 · bug (Something isn't working)


3 participants