[PyTorch] Inference mode disables initializing quantized weights with column-wise usage by timmoon10 · Pull Request #1847 · NVIDIA/TransformerEngine

timmoon10 · 2025-06-04T00:37:05Z

Description

When initializing a model with quantized weights, the required data is different for training and inference (training requires row-wise data for the forward GEMM and column-wise data for the dgrad GEMM, while inference only requires row-wise data). This PR adds logic so that a model initialized within no-grad mode or inference mode only initializes quantized weights with the data required for inference. It is also less aggressive about deallocating weight data, in order to handle cases where we alternate between training and validation modes.
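As a rough illustration of the rule described above, here is a minimal pure-Python sketch. `WeightUsages` and `usages_at_init` are hypothetical names made up for this example and are not part of Transformer Engine. The `grad_enabled` flag stands in for what `torch.is_grad_enabled()` would report, which is `False` inside both `torch.no_grad()` and `torch.inference_mode()`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightUsages:
    """Which quantized copies of a weight to allocate (hypothetical type)."""

    rowwise: bool  # consumed by the forward GEMM
    columnwise: bool  # consumed by the dgrad GEMM in the backward pass


def usages_at_init(grad_enabled: bool) -> WeightUsages:
    """Pick the usages a freshly initialized quantized weight should carry.

    Assumption: in real code the flag would come from torch.is_grad_enabled().
    """
    # Row-wise data is always needed. Column-wise data only matters if a
    # backward pass can happen, i.e. if grads are enabled at init time.
    return WeightUsages(rowwise=True, columnwise=grad_enabled)


# Training-style init: both usages. No-grad or inference init: row-wise only.
print(usages_at_init(grad_enabled=True))
print(usages_at_init(grad_enabled=False))
```

The point of the sketch is only that the decision hinges on the ambient grad mode at initialization time, not on a separate recipe-level option.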

This is an alternative to #1827. The heuristic API in that PR is somewhat redundant with these plain PyTorch APIs, but it is also more general.

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Avoid initializing quantized weights with column-wise usage if grads are not enabled
Do not deallocate unnecessary usages in weight tensors during forward pass
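The second change can be motivated with a toy model of alternating training and validation passes. This is a sketch under stated assumptions, not Transformer Engine code: `ToyQuantizedWeight`, `forward`, `backward`, and the `dealloc_unneeded` switch are all hypothetical, and the toy only reports whether column-wise data is still available rather than modelling what a real implementation would do when it is missing.

```python
class ToyQuantizedWeight:
    """Toy stand-in for a quantized weight; not a Transformer Engine class."""

    def __init__(self, rowwise=True, columnwise=True):
        self.usages = {"rowwise": rowwise, "columnwise": columnwise}


def forward(weight, grad_enabled, dealloc_unneeded):
    # The forward GEMM reads row-wise data.
    assert weight.usages["rowwise"]
    if dealloc_unneeded and not grad_enabled:
        # Aggressive policy: drop data that this particular pass does not need.
        weight.usages["columnwise"] = False


def backward(weight):
    # The dgrad GEMM reads column-wise data; report whether it is available.
    return weight.usages["columnwise"]


def train_val_train(dealloc_unneeded):
    weight = ToyQuantizedWeight()  # initialized with grads enabled
    forward(weight, grad_enabled=True, dealloc_unneeded=dealloc_unneeded)
    ok_before_validation = backward(weight)
    # Validation step under no-grad, followed by another training step.
    forward(weight, grad_enabled=False, dealloc_unneeded=dealloc_unneeded)
    forward(weight, grad_enabled=True, dealloc_unneeded=dealloc_unneeded)
    ok_after_validation = backward(weight)
    return ok_before_validation, ok_after_validation


print(train_val_train(dealloc_unneeded=True))
print(train_val_train(dealloc_unneeded=False))
```

In this toy, the aggressive policy leaves the weight without column-wise data for the training step that follows validation, while the conservative policy keeps it available.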

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

timmoon10 · 2025-06-04T00:48:12Z

/te-ci pytorch

ksivaman · 2025-06-09T22:06:57Z

 import pytest
 import os

+import transformer_engine.pytorch


Why is this needed?

I find it convenient to be able to access a class without explicitly doing from ... import ...:

TransformerEngine/tests/pytorch/test_sanity.py

Line 1393 in 649b04c

module = transformer_engine.pytorch.ops.Linear(hidden_size, hidden_size)

It's just a matter of style though. Within the package we explicitly list the imports in order to guarantee only relative imports, but this isn't relevant for tests since we always do absolute imports. Also, Google's style guide recommends against importing individual classes directly.

timmoon10 · 2025-06-09T23:50:30Z

/te-ci pytorch

ksivaman

LGTM

timmoon10 added 2 commits June 4, 2025 00:16

Do not initialize quantized weights with column-wise usage in inference mode

2752a8c

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Fix bug in test

649b04c

Signed-off-by: Tim Moon <tmoon@nvidia.com>

timmoon10 requested review from ksivaman and ptrendx June 4, 2025 00:37

timmoon10 added the bug and enhancement labels Jun 4, 2025

timmoon10 added the 2.5.0 label Jun 5, 2025

ksivaman reviewed Jun 9, 2025

timmoon10 and others added 2 commits June 9, 2025 16:50

Use no-grad mode instead of inference mode in tests

6bf26cc

Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

Merge branch 'main' into inference-mode-weight-init

837a651

ksivaman approved these changes Jun 13, 2025

ksivaman merged commit 655512c into NVIDIA:main Jun 13, 2025
21 checks passed

timmoon10 deleted the inference-mode-weight-init branch June 13, 2025 03:22

timmoon10 mentioned this pull request Jun 13, 2025

[PyTorch] Recipe heuristics for initializing quantized weights #1827

Closed

13 tasks

ptrendx mentioned this pull request Jul 29, 2025

Storage in fp8 #1880

Open

ksivaman merged 4 commits into NVIDIA:main from timmoon10:inference-mode-weight-init