Fix an invalid memory access in call to cuMemPoolGetAttribute #1272
Conversation
/ok to test 6752a1f
I see this in the documentation:

- CU_MEMPOOL_ATTR_RELEASE_THRESHOLD: (value type = cuuint64_t) Amount of reserved memory in bytes to hold onto before trying to release memory back to the OS. When more than the release threshold bytes of memory are held by the memory pool, the allocator will try to release memory back to the OS on the next call to stream, event or context synchronize. (default 0)
- CU_MEMPOOL_ATTR_REUSE_FOLLOW_EVENT_DEPENDENCIES: (value type = int) Allow cuMemAllocAsync to use memory asynchronously freed in another stream as long as a stream ordering dependency of the allocating stream on the free action exists. Cuda events and null stream interactions can create the required stream ordered dependencies. (default enabled)
- CU_MEMPOOL_ATTR_REUSE_ALLOW_OPPORTUNISTIC: (value type = int) Allow reuse of already completed frees when there is no dependency between the free and allocation. (default enabled)
- CU_MEMPOOL_ATTR_REUSE_ALLOW_INTERNAL_DEPENDENCIES: (value type = int) Allow cuMemAllocAsync to insert new stream dependencies in order to establish the stream ordering required to reuse a piece of memory released by cuMemFreeAsync. (default enabled)
- CU_MEMPOOL_ATTR_RESERVED_MEM_CURRENT: (value type = cuuint64_t) Amount of backing memory currently allocated for the mempool.
- CU_MEMPOOL_ATTR_RESERVED_MEM_HIGH: (value type = cuuint64_t) High watermark of backing memory allocated for the mempool since the last time it was reset.
- CU_MEMPOOL_ATTR_USED_MEM_CURRENT: (value type = cuuint64_t) Amount of memory from the pool that is currently in use by the application.
- CU_MEMPOOL_ATTR_USED_MEM_HIGH: (value type = cuuint64_t) High watermark of the amount of memory from the pool that was in use by the application.

There are three attributes with value type `int`. Is that real? If it's real, could we still get into trouble because of big-endian vs little-endian?
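For concreteness, here is a minimal C sketch against the driver API quoted above (not code from this repo; the function name is made up and error handling is omitted). It just shows the two value-type families side by side:

```c
#include <cuda.h>

/* Query two attributes with the value types the documentation specifies.
 * The object behind the void* must be exactly as wide as the documented type. */
void query_pool_attrs(CUmemoryPool pool)
{
    cuuint64_t reserved = 0;       /* CU_MEMPOOL_ATTR_RESERVED_MEM_CURRENT: cuuint64_t */
    int        opportunistic = 0;  /* CU_MEMPOOL_ATTR_REUSE_ALLOW_OPPORTUNISTIC: int */

    cuMemPoolGetAttribute(pool, CU_MEMPOOL_ATTR_RESERVED_MEM_CURRENT, &reserved);
    cuMemPoolGetAttribute(pool, CU_MEMPOOL_ATTR_REUSE_ALLOW_OPPORTUNISTIC, &opportunistic);

    /* Passing a 4-byte int for one of the cuuint64_t attributes would let the
     * driver write 8 bytes into a 4-byte object -- the invalid access this PR fixes. */
}
```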
leofang
left a comment
btw I found another nerve-wracking cast:

`&(self._handle), <void*><uintptr_t>(handle), IPC_HANDLE_TYPE, 0)`

The driver expects an `int` cast directly to `void*` (I was bitten by this right before the v0.4.0 release).
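To make the distinction concrete, a hedged C sketch of what the comment describes (function and variable names are illustrative, error handling omitted): the fd's value travels inside the `void*` parameter itself, rather than the driver reading through a pointer to it.

```c
#include <cuda.h>
#include <stdint.h>

CUresult import_pool_from_fd(CUmemoryPool *out, int fd)
{
    /* As described above: the int value itself is cast to void*. */
    return cuMemPoolImportFromShareableHandle(
        out, (void *)(uintptr_t)fd, CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR, 0);
    /* Passing &fd here would hand the driver a stack address where it expects
     * the descriptor value -- the mistake the comment warns about. */
}
```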
```diff
-cdef int DMRA_getattribute(
+cdef uint64_t DMRA_getattribute(
     cydriver.CUmemoryPool pool_handle, cydriver.CUmemPool_attribute attr_enum
 ):
-    cdef int value
+    cdef uint64_t value
     with nogil:
         HANDLE_RETURN(cydriver.cuMemPoolGetAttribute(pool_handle, attr_enum, <void *> &value))
     return value
```
This technically is still not correct, because the driver sometimes does want just an int.
https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MALLOC__ASYNC.html#group__CUDA__MALLOC__ASYNC_1gd45ea7c43e4a1add4b971d06fa72eda4
Wouldn't this result in only the first 32 bits being written in those cases?
Yes, I need to initialize it to zero.
Or the last 32 bits, depending on endianness.
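The exchange above can be reproduced in standalone C (no CUDA needed; the `memcpy` stands in for the driver's 4-byte store): without zero-initialization the untouched half of the 64-bit local keeps whatever was on the stack, and on a big-endian machine even a zeroed local reads back wrong because the `int` lands in the high-order half.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    uint64_t value = 0xDEADBEEFDEADBEEFULL; /* stand-in for uninitialized stack bytes */
    int driver_result = 1;                  /* what the driver stores for an int attribute */

    /* Simulate the driver's 4-byte write into the 8-byte local. */
    memcpy(&value, &driver_result, sizeof driver_result);
    printf("no zero-init: 0x%016" PRIx64 "\n", value);  /* 0xDEADBEEF00000001 on little-endian */

    value = 0;
    memcpy(&value, &driver_result, sizeof driver_result);
    printf("zero-init:    0x%016" PRIx64 "\n", value);  /* 1 on little-endian;
                                                           0x0000000100000000 on big-endian */
    return 0;
}
```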
Right. Another issue: Why don't we see any segfault in the CI? We even turned on compute-sanitizer in some CI runs.
It appears that the properties using
Valgrind also could not detect this, since the bad write clobbers part of a local variable on the stack. When it crashes, a pointer is sliced as follows: 0x7fffxxxxxxxx -> 0x7fff00000000. I cannot explain why the crash is intermittent.
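A deterministic model of that clobber, sketched in C (a struct pins the layout that on the real stack depends on the compiler and optimization level, which may help explain the intermittent behaviour): the 8-byte store aimed at the 4-byte slot spills its zero upper word into the neighbor, which is the 0x7fffxxxxxxxx -> 0x7fff00000000 slicing described above.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Model of two adjacent stack slots: the 4-byte `value` and the low half of a
 * neighboring 8-byte local (e.g. a pointer). */
struct frame_model {
    int      value;        /* the old `cdef int value` slot */
    uint32_t neighbor_lo;  /* low 32 bits of an adjacent pointer-sized local */
};

int main(void)
{
    struct frame_model f = { 0, 0x12345678u };  /* pretend the neighbor holds 0x7fff12345678 */
    uint64_t driver_result = 42;                /* 8-byte cuuint64_t the driver stores */

    /* Simulate the driver's 8-byte store through a pointer to the 4-byte slot. */
    memcpy(&f.value, &driver_result, sizeof driver_result);

    printf("value = %d, neighbor_lo = 0x%08" PRIx32 "\n", f.value, f.neighbor_lo);
    /* Little-endian output: value = 42, neighbor_lo = 0x00000000 -- the neighbor's
     * low word was overwritten by the zero upper half of the 64-bit value. */
    return 0;
}
```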
Force-pushed from 6752a1f to 20334d1.
/ok to test 20334d1
I propose an alternative solution (#1274) that should be safer and more extensible, with the bonus of being slightly faster.
rwgk
left a comment
Not sure how thorough we want to be here.
I'd definitely add comments.
```diff
     cydriver.CUmemoryPool pool_handle, cydriver.CUmemPool_attribute attr_enum
 ):
-    cdef int value
+    cdef uint64_t value = 0
```
This looks ok, I believe it'll work.
But at a minimum there should be a comment/warning somewhere, explaining that
- we know we're implicitly casting `int*` to `cuuint64_t*` for some attributes, and
- we decided to not worry about endianness here because the `int`s we know about today are only converted to bool.
Better IMO would be to just do it right, so that any changes to the enums in the future will lead to obvious failures that are most likely easily fixed.
For this one case it may not matter much, but in aggregate it'll make a huge difference in long-term reliability and maintainability of cuda-python as a whole.
It's really straightforward to achieve, although it's significantly more code:
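(The snippet attached here didn't survive the export; as a stand-in, below is one possible shape of that "do it right" idea, sketched in C against the driver API rather than the project's Cython. The function name and the `*ok` out-parameter are inventions for illustration, and CUresult error handling is elided.)

```c
#include <cuda.h>
#include <stdint.h>

/* Widen every attribute to 64 bits, but read each one with exactly the value
 * type the driver documents; unknown (future) attributes fail loudly via *ok. */
static uint64_t get_pool_attr_checked(CUmemoryPool pool, CUmemPool_attribute attr, int *ok)
{
    cuuint64_t v64 = 0;
    int v32 = 0;
    *ok = 1;
    switch (attr) {
    /* documented value type = cuuint64_t */
    case CU_MEMPOOL_ATTR_RELEASE_THRESHOLD:
    case CU_MEMPOOL_ATTR_RESERVED_MEM_CURRENT:
    case CU_MEMPOOL_ATTR_RESERVED_MEM_HIGH:
    case CU_MEMPOOL_ATTR_USED_MEM_CURRENT:
    case CU_MEMPOOL_ATTR_USED_MEM_HIGH:
        cuMemPoolGetAttribute(pool, attr, &v64);
        return (uint64_t)v64;
    /* documented value type = int */
    case CU_MEMPOOL_ATTR_REUSE_FOLLOW_EVENT_DEPENDENCIES:
    case CU_MEMPOOL_ATTR_REUSE_ALLOW_OPPORTUNISTIC:
    case CU_MEMPOOL_ATTR_REUSE_ALLOW_INTERNAL_DEPENDENCIES:
        cuMemPoolGetAttribute(pool, attr, &v32);
        return (uint64_t)v32;
    default:
        *ok = 0;  /* a new enum value was added upstream: force a review instead of guessing */
        return 0;
    }
}
```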
This would give us complete safety. If there are changes to the API in the future, it will fail loudly and we'll know immediately that we need to review and adjust.
In terms of safety, I left a comment above (#1274) probably while you were leaving the review here 🙂
Closing this since we have a better solution.
Removed preview folders for the following PRs:
- PR #1272
Description
Changes the argument size from 32 bits to 64 bits, as expected by the driver.
closes NVIDIA/cuda-python-private#197