Fix #1043: Fix memory leak in StridedMemoryView #1048
Conversation
|
Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test |
|
/ok to test |
|
btw we also need a rel-note entry for this fix |
|
While this fix definitely works with the reproducer, I'm a little unsure as to why the destructor on the capsule returned by |
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
You can search the documentation that we built for DLPack, the text around |
Thanks for that. I now see why this PR makes sense in the context of what |
|
/ok to test |
Yeah, this is unfortunate because in DLPack we follow what the Python buffer protocol & memoryview do. We increment the refcount of the exporting object until the view is destroyed. FWIW, after this fix we'd still be hitting exactly the same issue as CuTe tensors (which are also views): NVIDIA/cutlass#2479. My advice to the CUTLASS/CuTe team was that views aren't supposed to be held indefinitely. |
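The buffer-protocol behavior mentioned above can be observed with the standard library alone. The following is a minimal sketch that uses `memoryview` over a `bytearray` as a stand-in for a DLPack view; it does not touch `StridedMemoryView`, and the variable names are illustrative only.

```python
import sys

exporter = bytearray(1024)           # stand-in for the exporting array
before = sys.getrefcount(exporter)

view = memoryview(exporter)          # the view holds a reference to the exporter
during = sys.getrefcount(exporter)

view.release()                       # destroying the view drops that reference
after = sys.getrefcount(exporter)

held_while_alive = during > before   # exporter is kept alive by the view
released_after = after == before     # and let go once the view is released
```

A view that is never released (or never garbage collected) keeps `during > before` true indefinitely, which is the situation the comment warns about for long-lived views.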
kkraus14
left a comment
Implementation looks good, but a couple of questions / comments related to the testing
cuda_core/tests/test_memory.py
Outdated
    for idx in range(1000):
        arr = cupy.zeros((1024, 1024, 1024), dtype=cupy.uint8)
        StridedMemoryView(arr, stream_ptr=-1)
Instead of doing 1000 iterations, we might be able to do something much smaller and introspect the cupy memory pool to ensure memory is being freed as expected: https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.MemoryPool.html#cupy.cuda.MemoryPool
Or if we're feeling ambitious we could use a temporary custom allocator for CuPy that we could use to track the allocations and deallocations: https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.using_allocator.html
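The pool-introspection idea could look roughly like the sketch below. It assumes only that the pool object exposes a `used_bytes()` method, as `cupy.cuda.MemoryPool` does. The helper name `used_bytes_delta` and the `FakePool` class are hypothetical, introduced so the sketch runs without a GPU.

```python
import gc


def used_bytes_delta(pool, action):
    """Return the change in pool.used_bytes() caused by running action()."""
    gc.collect()                      # clear garbage left by earlier code
    before = pool.used_bytes()
    action()
    gc.collect()                      # collect anything action() left in cycles
    return pool.used_bytes() - before


class FakePool:
    """Hypothetical stand-in for a memory pool; not part of CuPy."""

    def __init__(self):
        self._used = 0

    def used_bytes(self):
        return self._used

    def alloc(self, nbytes):
        self._used += nbytes

    def free(self, nbytes):
        self._used -= nbytes


pool = FakePool()


def balanced():
    pool.alloc(1 << 20)
    pool.free(1 << 20)


def leaky():
    pool.alloc(1 << 20)               # never freed, so the pool keeps growing


balanced_delta = used_bytes_delta(pool, balanced)
leaky_delta = used_bytes_delta(pool, leaky)
```

In a real test, `pool` would presumably be `cupy.get_default_memory_pool()` and `action` would allocate a small CuPy array and wrap it in `StridedMemoryView`, with the assertion that the delta is zero. That variant is untested here.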
I was the one who raised #1043
Could the test be simplified to:
    arr = np.zeros(1048576, dtype=np.uint8)
    before = sys.getrefcount(arr)
    for idx in range(10):
        StridedMemoryView(arr, stream_ptr=-1)
    after = sys.getrefcount(arr)
    assert before == after

Using numpy also allows the test to run without cupy.
Yep, this test works. I confirmed it breaks before this PR.
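The same refcount check can be factored into a small helper and exercised against a deliberately leaky wrapper to confirm that it detects a leak. This is a sketch using only the standard library: `refcount_delta`, `leaky_view`, and `_kept` are hypothetical names, and `memoryview` stands in for `StridedMemoryView`.

```python
import sys


def refcount_delta(obj, make_view, iterations=10):
    """Create and discard views of obj, returning the change in its refcount."""
    before = sys.getrefcount(obj)
    for _ in range(iterations):
        make_view(obj)                # result is discarded immediately
    return sys.getrefcount(obj) - before


_kept = []                            # simulates a view that is never destroyed


def leaky_view(obj):
    _kept.append(memoryview(obj))     # the stored view keeps obj referenced


data = bytearray(1 << 20)
clean_delta = refcount_delta(data, memoryview)
leaky_delta = refcount_delta(data, leaky_view)
```

A non-zero delta means some views (or the objects they own) are still holding the exporter, which is the symptom reported in #1043.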
|
/ok to test |
|
/ok to test |
    try:
        import numpy as np
    except ImportError:
        np = None
nit: currently numpy is already a required dependency for cuda.core, so we don't need try-except here.
    # Ensure that memory views deallocate their reference to dlpack tensors
    @pytest.mark.skipif(np is None, reason="numpy is not installed")
ditto