
[PyTorch] Linear op avoids saving input tensor if weight grad is not needed #1817

Merged
timmoon10 merged 7 commits into NVIDIA:main from timmoon10:te-sequential-no-wgrad
May 29, 2025

Conversation

@timmoon10
Collaborator

Description

The linear op caches its input tensor after the forward pass so that it can be used in the backward pass to compute the grad weight tensor. However, if the grad weight tensor is not needed, we can save some memory by not caching it.
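
Below is a minimal sketch of the idea as a plain PyTorch autograd function, not the actual Transformer Engine implementation (the class name is illustrative): the forward pass saves the input tensor only when the weight gradient will actually be computed.

```python
import torch

# Illustrative sketch only -- not TE's Linear op. The forward pass stashes
# the input tensor for backward only when it is needed to compute the
# weight gradient.
class _LinearSketch(torch.autograd.Function):

    @staticmethod
    def forward(ctx, inp, weight):
        out = inp @ weight.t()
        ctx.weight_requires_grad = weight.requires_grad
        # Skip caching the (potentially large) input if grad weight is not needed.
        ctx.save_for_backward(inp if weight.requires_grad else None, weight)
        return out

    @staticmethod
    def backward(ctx, grad_out):
        inp, weight = ctx.saved_tensors
        grad_inp = grad_out @ weight
        grad_weight = grad_out.t() @ inp if ctx.weight_requires_grad else None
        return grad_inp, grad_weight

# With a frozen weight, only the weight tensor is saved for backward;
# the input is not kept alive by the autograd graph.
x = torch.randn(8, 16, requires_grad=True)
w = torch.randn(32, 16, requires_grad=False)
_LinearSketch.apply(x, w).sum().backward()
```

In TE's case the saved input may also include quantized copies, so skipping the cache avoids those allocations as well (see the "unnecessary usages" commits below).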

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

  • Linear op avoids saving input tensor if weight grad is not needed

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Signed-off-by: Tim Moon <tmoon@nvidia.com>
@timmoon10 added the enhancement (New feature or request) label on May 22, 2025
@timmoon10
Collaborator Author

/te-ci pytorch L1

@timmoon10
Collaborator Author

/te-ci pytorch L1

@timmoon10 merged commit 41909dc into NVIDIA:main on May 29, 2025
9 of 11 checks passed
phu0ngng pushed a commit to phu0ngng/TransformerEngine that referenced this pull request on May 29, 2025:
[PyTorch] Linear op avoids saving input tensor if weight grad is not needed (NVIDIA#1817)

* Linear op avoids saving input tensor if weight grad is not needed

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Linear op forward avoids producing quantized tensors with unnecessary usages

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Fix linter warnings

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Avoid unnecessary usages in fused linear ops

Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Labels

enhancement (New feature or request)

1 participant