NVIDIA / TransformerEngine Public

Delete extra tensor objects after restoring float8 tensors #1500

Merged

ptrendx merged 13 commits into NVIDIA:main from sudhakarsingh27:fix_memory_leak_te_2.0

Feb 28, 2025


Member

sudhakarsingh27 commented Feb 21, 2025

Description

After the float8 tensors are restored in the backward passes of LayerNormMLP, LayerNormLinear and Linear, the saved tensor objects are no longer needed. However, ctx.tensor_objects still holds a reference to them, which keeps them alive and results in extra memory usage. This PR releases that reference.

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Please list the changes introduced in this PR:

Delete the extra reference to tensor_objects once the objects have been used in the backward passes of the LayerNormMLP, LayerNormLinear and Linear modules.
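
The effect of releasing the reference can be illustrated with a minimal, framework-free sketch. It is not the code from this PR: `FakeTensor`, `Ctx`, `backward` and `run` are hypothetical stand-ins, and resetting the attribute to `None` is assumed here as one way to drop the reference. A weak reference is used as a probe to check whether the object is still alive after the backward step.

```python
import gc
import weakref


class FakeTensor:
    """Hypothetical stand-in for a large float8 tensor object."""

    def __init__(self, name):
        self.name = name


class Ctx:
    """Hypothetical stand-in for the autograd context passed to backward."""

    tensor_objects = None


def backward(ctx, release):
    # Restore the tensors needed for the gradient computation.
    restored = list(ctx.tensor_objects)
    if release:
        # Assumed fix: drop the context's reference once the tensors
        # are restored, so the context no longer keeps them alive.
        ctx.tensor_objects = None
    names = [t.name for t in restored]
    # The local references end here; only ctx can still hold the objects.
    del restored
    return names


def run(release):
    ctx = Ctx()
    tensor = FakeTensor("weight_fp8")
    probe = weakref.ref(tensor)  # does not keep the object alive
    ctx.tensor_objects = [tensor]
    del tensor
    backward(ctx, release)
    gc.collect()
    # ctx is still in scope here, as it would be while the graph is alive.
    return probe() is not None


still_alive_without_fix = run(release=False)
still_alive_with_fix = run(release=True)
```

In this sketch the object outlives the backward step only when the context keeps its reference, which mirrors the extra memory usage described above.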

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

sudhakarsingh27 and others added 2 commits

February 20, 2025 22:00


          delete extra tensor objects after restoring float8 tensors

065661b

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>


          [pre-commit.ci] auto fixes from pre-commit.com hooks

01a34db

for more information, see https://pre-commit.ci

sudhakarsingh27 requested a review from ptrendx

February 21, 2025 06:06

sudhakarsingh27 self-assigned this

ksivaman reviewed

transformer_engine/pytorch/tensor/_internal/float8_tensor_base.py

ptrendx added the 2.1.0 label

ptrendx reviewed

transformer_engine/pytorch/module/layernorm_linear.py Outdated

sudhakarsingh27 added 3 commits

February 22, 2025 17:56


          nit fix

a1120ea

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>


          Merge branch 'fix_memory_leak_te_2.0' of https://github.com/sudhakars…

892ff20

…ingh27/TransformerEngine into fix_memory_leak_te_2.0


          Merge branch 'main' into fix_memory_leak_te_2.0

b7fc167

Member Author

sudhakarsingh27 commented Feb 24, 2025

/te-ci

sudhakarsingh27 and others added 3 commits

February 26, 2025 16:18


          fix the leak in float8tensor and mxfloat8tensor classes

e36b440

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>


          Merge branch 'main' into fix_memory_leak_te_2.0

6d273cb


          [pre-commit.ci] auto fixes from pre-commit.com hooks

17bf57d

for more information, see https://pre-commit.ci

Member Author

sudhakarsingh27 commented Feb 27, 2025

/te-ci pytorch

ptrendx mentioned this pull request

Blockwise float8 quantizer and quantized tensor class #1513

Merged

34 tasks

sudhakarsingh27 and others added 5 commits

February 27, 2025 14:24


          uncomment the fix

b3643cd

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>


          uncomment the fix

e793400

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>


          [pre-commit.ci] auto fixes from pre-commit.com hooks

0c10c5f

for more information, see https://pre-commit.ci


          fix lint

0c3aa43

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>


          Merge branch 'fix_memory_leak_te_2.0' of https://github.com/sudhakars…

32e2a3d

…ingh27/TransformerEngine into fix_memory_leak_te_2.0

Member Author

sudhakarsingh27 commented Feb 27, 2025

/te-ci pytorch

ptrendx mentioned this pull request

Verified TE2.0 with offloading #1514

Merged

ptrendx approved these changes

ptrendx merged commit d3efaeb into NVIDIA:main

ptrendx pushed a commit that referenced this pull request


          Delete extra tensor objects after restoring float8 tensors (#1500)

4f9cd42

* delete extra tensor objects after restoring float8 tensors

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* nit fix

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* fix the leak in float8tensor and mxfloat8tensor classes

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* uncomment the fix

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix lint

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>

---------

Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

timmoon10 mentioned this pull request

[PyTorch] Don't set FP8 data to None when saving base tensors #1548

Merged

13 tasks

timmoon10 mentioned this pull request

[PyTorch] Bunch of memory management fixes #1686

Merged

13 tasks

hungryGeek16 mentioned this pull request

fix unfused padding causal sdpa #3063

Open

Reviewers

ksivaman left review comments

pggPL left review comments

ptrendx approved these changes

Assignees

sudhakarsingh27

Projects

None yet

Milestone

No milestone

4 participants
