Conversation

@mdboom mdboom commented Sep 29, 2025

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

copy-pr-bot bot commented Sep 29, 2025

Auto-sync is disabled for ready-for-review pull requests in this repository. Workflows must be run manually.


@mdboom mdboom requested a review from leofang September 29, 2025 18:53
mdboom commented Sep 29, 2025

/ok to test

mdboom commented Sep 29, 2025

/ok to test


leofang commented Sep 29, 2025

btw we also need a rel-note entry for this fix

@leofang leofang added bug Something isn't working P0 High priority - Must do! cuda.core Everything related to the cuda.core module labels Sep 29, 2025
mdboom commented Sep 29, 2025

While this fix definitely works with the reproducer, I'm a little unsure why the destructor on the capsule returned by cupy.__dlpack__ isn't getting called in the first place. Since these are all Python objects, I'm not sure why explicit destruction is necessary. I'm converting this to a draft to take some time to convince myself there isn't some other problem going on here.

@leofang leofang added this to the cuda.core beta 7 milestone Sep 29, 2025
@mdboom mdboom marked this pull request as draft September 29, 2025 20:25
copy-pr-bot bot commented Sep 29, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


leofang commented Sep 29, 2025

You can search the documentation that we built for DLPack, the text around used_dltensor is relevant:
https://dmlc.github.io/dlpack/latest/python_spec.html
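
As a concrete illustration of the lifecycle that spec describes, here is a minimal sketch using NumPy (which implements the DLPack Python spec): an unconsumed capsule named "dltensor" keeps the exporter alive, and its capsule destructor invokes the DLPack deleter when the capsule itself is collected. A consumer that takes ownership instead renames the capsule to "used_dltensor" and becomes responsible for calling the deleter itself.

```python
import sys
import numpy as np

arr = np.zeros(16, dtype=np.uint8)
base = sys.getrefcount(arr)

# The "dltensor" capsule holds a reference to the exporting array.
cap = arr.__dlpack__()
assert sys.getrefcount(arr) > base

# Deleting an unconsumed capsule fires its destructor, which calls the
# DLPack deleter and releases the reference to the array.
del cap
assert sys.getrefcount(arr) == base
```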

leofang previously approved these changes Sep 29, 2025
mdboom commented Sep 29, 2025

You can search the documentation that we built for DLPack, the text around used_dltensor is relevant: https://dmlc.github.io/dlpack/latest/python_spec.html

Thanks for that. I now see why this PR makes sense in the context of what StridedMemoryView already does. I have some reading to do to understand why something called a "view" would take ownership of the thing it's viewing -- I think my mental model of what's really happening just needs some filling in...

@mdboom mdboom marked this pull request as ready for review September 29, 2025 20:40
mdboom commented Sep 29, 2025

/ok to test

@mdboom mdboom enabled auto-merge (squash) September 29, 2025 20:40
leofang commented Sep 29, 2025

why something called a "view" would take ownership of the thing it's viewing -- I think my mental model of what's really happening just needs some filling in...

Yeah, this is unfortunate because in DLPack we follow what the Python buffer protocol & memoryview do. We increment the refcount of the exporting object until the view is destroyed.

FWIW, after this fix we'd still be hitting exactly the same issue as CuTe tensors (which are also views): NVIDIA/cutlass#2479. My advice to the CUTLASS/CuTe team was that views aren't supposed to be held indefinitely.
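
The buffer-protocol analogy is easy to observe in plain CPython (a minimal sketch, no CUDA involved): a memoryview increments the exporter's refcount for as long as the export is alive, a bytearray cannot even be resized while exported, and only releasing the view drops the reference.

```python
import sys

buf = bytearray(b"example")
base = sys.getrefcount(buf)

mv = memoryview(buf)            # buffer export: exporter refcount goes up
assert sys.getrefcount(buf) > base

try:
    buf.append(0)               # resizing is blocked while exported
except BufferError:
    pass

mv.release()                    # export ends; the reference is dropped
assert sys.getrefcount(buf) == base
```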

@leofang leofang linked an issue Sep 29, 2025 that may be closed by this pull request
kkraus14 previously approved these changes Sep 30, 2025
@kkraus14 kkraus14 left a comment


Implementation looks good, but a couple of questions / comments related to the testing

Comment on lines 449 to 451
for idx in range(1000):
    arr = cupy.zeros((1024, 1024, 1024), dtype=cupy.uint8)
    StridedMemoryView(arr, stream_ptr=-1)
Collaborator
Instead of doing 1000 iterations, we might be able to do something much smaller and introspect the cupy memory pool to ensure memory is being freed as expected: https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.MemoryPool.html#cupy.cuda.MemoryPool

Or if we're feeling ambitious we could use a temporary custom allocator for CuPy that we could use to track the allocations and deallocations: https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.using_allocator.html


I was the one who raised #1043
Could the test be simplified to:

import sys
import numpy as np
from cuda.core.experimental.utils import StridedMemoryView

arr = np.zeros(1048576, dtype=np.uint8)
before = sys.getrefcount(arr)
for idx in range(10):
    StridedMemoryView(arr, stream_ptr=-1)
after = sys.getrefcount(arr)
assert before == after

Using numpy also allows the test to run without cupy.

Contributor Author
Yep, this test works. I confirmed it breaks before this PR.

mdboom commented Sep 30, 2025

/ok to test

mdboom commented Sep 30, 2025

/ok to test

@mdboom mdboom requested review from kkraus14 and leofang September 30, 2025 14:58
Comment on lines +10 to +13
try:
    import numpy as np
except ImportError:
    np = None
Member
nit: currently numpy is already a required dependency for cuda.core, so we don't need try-except here.



# Ensure that memory views deallocate their reference to DLPack tensors
@pytest.mark.skipif(np is None, reason="numpy is not installed")
Member
ditto

@mdboom mdboom merged commit 85ff9c2 into NVIDIA:main Sep 30, 2025
70 checks passed
@github-actions

Doc Preview CI
Preview removed because the pull request was closed or merged.



Development

Successfully merging this pull request may close these issues.

[BUG]: StridedMemoryView leaks the producer memory
