Conversation

@mdboom mdboom commented Sep 29, 2025

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

copy-pr-bot bot commented Sep 29, 2025

Auto-sync is disabled for ready-for-review pull requests in this repository. Workflows must be run manually.


@mdboom mdboom requested a review from leofang September 29, 2025 18:53
mdboom commented Sep 29, 2025

/ok to test

mdboom commented Sep 29, 2025

/ok to test


leofang commented Sep 29, 2025

btw we also need a rel-note entry for this fix

@leofang leofang added bug Something isn't working P0 High priority - Must do! cuda.core Everything related to the cuda.core module labels Sep 29, 2025
mdboom commented Sep 29, 2025

While this fix definitely works with the reproducer, I'm a little unsure why the destructor on the capsule returned by cupy.__dlpack__ isn't getting called in the first place. Since these are all Python objects, I'm not sure why explicit destruction is necessary. I'm converting this to a draft to take some time to convince myself there isn't some other problem going on here.

@leofang leofang added this to the cuda.core beta 7 milestone Sep 29, 2025
@mdboom mdboom marked this pull request as draft September 29, 2025 20:25
copy-pr-bot bot commented Sep 29, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


leofang commented Sep 29, 2025

You can search the documentation that we built for DLPack, the text around used_dltensor is relevant:
https://dmlc.github.io/dlpack/latest/python_spec.html
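
As a concrete illustration of the lifecycle that spec describes, here is a minimal sketch using NumPy (which implements the DLPack Python spec): an unconsumed capsule named "dltensor" keeps the exporter alive, and its capsule destructor invokes the DLPack deleter when the capsule itself is collected. A consumer that takes ownership instead renames the capsule to "used_dltensor" and becomes responsible for calling the deleter itself.

```python
import sys
import numpy as np

arr = np.zeros(16, dtype=np.uint8)
base = sys.getrefcount(arr)

# The "dltensor" capsule holds a reference to the exporting array.
cap = arr.__dlpack__()
assert sys.getrefcount(arr) > base

# Deleting an unconsumed capsule fires its destructor, which calls the
# DLPack deleter and releases the reference to the array.
del cap
assert sys.getrefcount(arr) == base
```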

leofang previously approved these changes Sep 29, 2025
mdboom commented Sep 29, 2025

You can search the documentation that we built for DLPack, the text around used_dltensor is relevant: https://dmlc.github.io/dlpack/latest/python_spec.html

Thanks for that. I now see why this PR makes sense in the context of what StridedMemoryView already does. I have some reading to do to understand why something called a "view" would take ownership of the thing it's viewing -- I think my mental model of what's really happening just needs some filling in...

@mdboom mdboom marked this pull request as ready for review September 29, 2025 20:40
mdboom commented Sep 29, 2025

/ok to test

@mdboom mdboom enabled auto-merge (squash) September 29, 2025 20:40
leofang commented Sep 29, 2025

why something called a "view" would take ownership of the thing it's viewing -- I think my mental model of what's really happening just needs some filling in...

Yeah, this is unfortunate because in DLPack we follow what the Python buffer protocol & memoryview do. We increment the refcount of the exporting object until the view is destroyed.

FWIW, after this fix we'd still be hitting exactly the same issue as CuTe tensors (which are also views): NVIDIA/cutlass#2479. My advice to the CUTLASS/CuTe team was that views aren't supposed to be held indefinitely.
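
The buffer-protocol analogy is easy to observe in plain CPython (a minimal sketch, no CUDA involved): a memoryview increments the exporter's refcount for as long as the export is alive, a bytearray cannot even be resized while exported, and only releasing the view drops the reference.

```python
import sys

buf = bytearray(b"example")
base = sys.getrefcount(buf)

mv = memoryview(buf)            # buffer export: exporter refcount goes up
assert sys.getrefcount(buf) > base

try:
    buf.append(0)               # resizing is blocked while exported
except BufferError:
    pass

mv.release()                    # export ends; the reference is dropped
assert sys.getrefcount(buf) == base
```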

@leofang leofang linked an issue Sep 29, 2025 that may be closed by this pull request
kkraus14 previously approved these changes Sep 30, 2025
@kkraus14 kkraus14 left a comment


Implementation looks good, but a couple of questions / comments related to the testing

Comment on lines 449 to 451
for idx in range(1000):
    arr = cupy.zeros((1024, 1024, 1024), dtype=cupy.uint8)
    StridedMemoryView(arr, stream_ptr=-1)
Collaborator
Instead of doing 1000 iterations, we might be able to do something much smaller and introspect the cupy memory pool to ensure memory is being freed as expected: https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.MemoryPool.html#cupy.cuda.MemoryPool

Or if we're feeling ambitious we could use a temporary custom allocator for CuPy that we could use to track the allocations and deallocations: https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.using_allocator.html


I was the one who raised #1043
Could the test be simplified to:

import sys
import numpy as np
from cuda.core.experimental.utils import StridedMemoryView

arr = np.zeros(1048576, dtype=np.uint8)
before = sys.getrefcount(arr)
for idx in range(10):
    StridedMemoryView(arr, stream_ptr=-1)
after = sys.getrefcount(arr)
assert before == after

Using numpy also allows the test to run without cupy.

Contributor Author
Yep, this test works. I confirmed it breaks before this PR.

mdboom commented Sep 30, 2025

/ok to test

mdboom commented Sep 30, 2025

/ok to test

@mdboom mdboom requested review from kkraus14 and leofang September 30, 2025 14:58
Comment on lines +10 to +13
try:
    import numpy as np
except ImportError:
    np = None
Member
nit: currently numpy is already a required dependency for cuda.core, so we don't need try-except here.



# Ensure that memory views deallocate their reference to DLPack tensors
@pytest.mark.skipif(np is None, reason="numpy is not installed")
Member
ditto

@mdboom mdboom merged commit 85ff9c2 into NVIDIA:main Sep 30, 2025
70 checks passed
@github-actions

Doc Preview CI
Preview removed because the pull request was closed or merged.



Development

Successfully merging this pull request may close these issues.

[BUG]: StridedMemoryView leaks the producer memory
