
[PyTorch] Avoid garbage collection when capturing a CUDA Graph #2092

Merged
timmoon10 merged 2 commits into NVIDIA:main from timmoon10:graph-gc-debug on Aug 20, 2025

Conversation

@timmoon10
Collaborator

Description

This PR avoids a situation where Python's automatic garbage collection destroys a CUDA Graph while another graph is being captured, which results in a CUDA error.

The bug was introduced with pytorch/pytorch#158193 and pytorch/pytorch#158649. See pytorch/pytorch#161037 for a bugfix and more details.

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

  • Avoid garbage collection when capturing a CUDA Graph
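
The idea behind this change can be illustrated with a minimal Python sketch. It assumes the goal is simply to keep Python's automatic (generational) garbage collector from running while a capture is in progress; the helper name `gc_disabled` is hypothetical, and this is not necessarily how the PR itself implements the fix.

```python
import contextlib
import gc
from typing import Iterator


@contextlib.contextmanager
def gc_disabled() -> Iterator[None]:
    """Hypothetical helper: suspend automatic garbage collection in a block.

    Only the cyclic (generational) collector is affected; explicit
    gc.collect() calls still work, and objects whose reference counts
    drop to zero are still freed immediately.
    """
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Restore the previous state, so nesting is safe and a collector
        # that was already disabled by the caller stays disabled.
        if was_enabled:
            gc.enable()
```

A hypothetical call site, assuming a CUDA device and the standard `torch.cuda.CUDAGraph` / `torch.cuda.graph` capture API, would enter `gc_disabled()` before `torch.cuda.graph(g)` (for example `with gc_disabled(), torch.cuda.graph(g): ...`), so that the collector cannot run and finalize another graph until capture has finished.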

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Signed-off-by: Tim Moon <tmoon@nvidia.com>
@timmoon10
Collaborator Author

/te-ci pytorch

Collaborator

@pggPL pggPL left a comment

LGTM

Member

@ksivaman ksivaman left a comment

LGTM

@timmoon10 timmoon10 merged commit 96944a8 into NVIDIA:main Aug 20, 2025
18 of 23 checks passed
@timmoon10 timmoon10 deleted the graph-gc-debug branch August 20, 2025 18:07
KshitijLakhani pushed a commit that referenced this pull request Aug 26, 2025
Avoid garbage collection when capturing a CUDA Graph

Signed-off-by: Tim Moon <tmoon@nvidia.com>
abhinavgoel95 pushed a commit to abhinavgoel95/TransformerEngine that referenced this pull request Sep 3, 2025
…A#2092)

Avoid garbage collection when capturing a CUDA Graph

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Labels: 2.7.0, bug (Something isn't working)

3 participants