
Conversation

@Andy-Jost (Contributor) commented Nov 19, 2025

Description

Changes the attribute value argument from 32 bits to 64 bits, as expected by the driver.

closes NVIDIA/cuda-python-private#197

Andy-Jost requested a review from rwgk Nov 19, 2025 20:27
Andy-Jost self-assigned this Nov 19, 2025
@copy-pr-bot (bot) commented Nov 19, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Andy-Jost (Contributor, Author):

/ok to test 6752a1f

@rwgk (Collaborator) commented Nov 19, 2025

I see this in the documentation:


> Description
>
> Supported attributes are:
>
> CU_MEMPOOL_ATTR_RELEASE_THRESHOLD: (value type = cuuint64_t) Amount of reserved memory in bytes to hold onto before trying to release memory back to the OS. When more than the release threshold bytes of memory are held by the memory pool, the allocator will try to release memory back to the OS on the next call to stream, event or context synchronize. (default 0)
>
> CU_MEMPOOL_ATTR_REUSE_FOLLOW_EVENT_DEPENDENCIES: (value type = int) Allow cuMemAllocAsync to use memory asynchronously freed in another stream as long as a stream ordering dependency of the allocating stream on the free action exists. Cuda events and null stream interactions can create the required stream ordered dependencies. (default enabled)
>
> CU_MEMPOOL_ATTR_REUSE_ALLOW_OPPORTUNISTIC: (value type = int) Allow reuse of already completed frees when there is no dependency between the free and allocation. (default enabled)
>
> CU_MEMPOOL_ATTR_REUSE_ALLOW_INTERNAL_DEPENDENCIES: (value type = int) Allow cuMemAllocAsync to insert new stream dependencies in order to establish the stream ordering required to reuse a piece of memory released by cuMemFreeAsync (default enabled).
>
> CU_MEMPOOL_ATTR_RESERVED_MEM_CURRENT: (value type = cuuint64_t) Amount of backing memory currently allocated for the mempool
>
> CU_MEMPOOL_ATTR_RESERVED_MEM_HIGH: (value type = cuuint64_t) High watermark of backing memory allocated for the mempool since the last time it was reset.
>
> CU_MEMPOOL_ATTR_USED_MEM_CURRENT: (value type = cuuint64_t) Amount of memory from the pool that is currently in use by the application.
>
> CU_MEMPOOL_ATTR_USED_MEM_HIGH: (value type = cuuint64_t) High watermark of the amount of memory from the pool that was in use by the application.


There are three attributes with value type int.

Is that real?

If it's real, could we still get into trouble because of big-endian vs little-endian?
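
A minimal Cython sketch of the concern, with a hypothetical stand-in for the driver write (the real call is cuMemPoolGetAttribute):

```cython
from libc.stdint cimport uint64_t

cdef void fake_driver_write_int(void* out) noexcept nogil:
    # Stand-in for an attribute whose value type is int: the driver
    # writes only sizeof(int) == 4 bytes through the pointer.
    (<int*>out)[0] = 1

cdef uint64_t read_int_attr_into_u64() noexcept nogil:
    cdef uint64_t value = 0
    fake_driver_write_int(<void*>&value)
    # little-endian: value == 1 (the int lands in the low 4 bytes)
    # big-endian:    value == 1 << 32 (the int lands in the high 4 bytes)
    return value
```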

@leofang (Member) left a comment


btw I found another nerve-wracking cast:

&(self._handle), <void*><uintptr_t>(handle), IPC_HANDLE_TYPE, 0)

The driver expects an int cast directly to void* (I was bitten by this right before the v0.4.0 release).
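
A hedged sketch of the two conventions, with take_handle as a hypothetical stand-in for the driver entry point:

```cython
from libc.stdint cimport uintptr_t

cdef void take_handle(void* h) noexcept nogil:
    pass  # the driver either dereferences h or reinterprets its bits

cdef void demo(int fd) noexcept nogil:
    # pointer-to-value: the driver reads *(int*)h
    take_handle(<void*>&fd)
    # value-in-pointer: the driver reinterprets the pointer bits as the
    # handle itself; widening through uintptr_t keeps the cast well-defined
    take_handle(<void*><uintptr_t>fd)
```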

Comment on lines 118 to 125

```diff
-cdef int DMRA_getattribute(
+cdef uint64_t DMRA_getattribute(
     cydriver.CUmemoryPool pool_handle, cydriver.CUmemPool_attribute attr_enum
 ):
-    cdef int value
+    cdef uint64_t value
     with nogil:
         HANDLE_RETURN(cydriver.cuMemPoolGetAttribute(pool_handle, attr_enum, <void *> &value))
     return value
```
@leofang (Member):


This technically is still not correct, because the driver sometimes does want just an int.
https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MALLOC__ASYNC.html#group__CUDA__MALLOC__ASYNC_1gd45ea7c43e4a1add4b971d06fa72eda4
Wouldn't this result in only the first 32 bits being written in those cases?

@Andy-Jost (Contributor, Author):


Yes, I need to initialize to zero.

@rwgk (Collaborator) commented Nov 19, 2025

> Wouldn't this result in only the first 32 bits being written in those cases?

Or the last 32 bits, depending on endian-ness.

@leofang (Member) commented Nov 19, 2025

Right.

Another issue: Why don't we see any segfault in the CI? We even turned on compute-sanitizer in some CI runs.

@Andy-Jost (Contributor, Author):

> Wouldn't this result in only the first 32 bits being written in those cases?

> Or the last 32 bits, depending on endian-ness.

It appears that the properties using int are using it in lieu of bool. We should be okay in those cases, since we convert to bool before returning to Python.
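
A small sketch of why the bool conversion holds up, simulating the driver's 4-byte write (the zero-initialization is what makes it safe):

```cython
from libc.stdint cimport uint64_t

cdef bint int_attr_as_bool(int driver_value) noexcept nogil:
    cdef uint64_t value = 0            # zero-init: untouched bytes stay 0
    (<int*>&value)[0] = driver_value   # simulate the driver's int-sized write
    # little-endian: value == driver_value
    # big-endian:    value == <uint64_t>driver_value << 32
    return value != 0                  # truthiness agrees on both byte orders
```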

@Andy-Jost (Contributor, Author) commented Nov 19, 2025

> Right.
>
> Another issue: Why don't we see any segfault in the CI? We even turned on compute-sanitizer in some CI runs.

Valgrind also could not detect this, since the bad write clobbers part of a local variable on the stack. When it crashes, a pointer is sliced as follows: 0x7fffxxxxxxxx -> 0x7fff00000000. I cannot explain why the crash is intermittent.
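
A sketch of the clobbering mechanism (intentionally broken; stack layout is compiler-dependent, so this illustrates the failure mode rather than reproducing the crash):

```cython
from libc.stdint cimport uint64_t

cdef void fake_driver_write_u64(void* out) noexcept nogil:
    # Stand-in for a cuuint64_t attribute, e.g. a release threshold of 0.
    (<uint64_t*>out)[0] = 0

cdef void broken() noexcept nogil:
    cdef int value                  # 4 bytes: too small for a cuuint64_t
    cdef void* neighbor = <void*>1  # may be placed right after `value`
    fake_driver_write_u64(<void*>&value)  # writes 8 bytes: 4 past `value`
    # If `neighbor` follows `value` on a little-endian stack, its low
    # 32 bits are now zeroed: 0x7fffxxxxxxxx -> 0x7fff00000000.
```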

@Andy-Jost (Contributor, Author):

/ok to test 20334d1

@leofang (Member) commented Nov 19, 2025

I propose an alternative solution (#1274) that should be safer, more extensible, and, as a bonus, slightly faster.

@rwgk (Collaborator) left a comment


Not sure how thorough we want to be here.

I'd definitely add comments.

```diff
     cydriver.CUmemoryPool pool_handle, cydriver.CUmemPool_attribute attr_enum
 ):
-    cdef int value
+    cdef uint64_t value = 0
```
@rwgk (Collaborator):


This looks ok, I believe it'll work.

But at a minimum there should be a comment/warning somewhere, explaining that

  • we know we're implicitly passing a cuuint64_t* where the driver expects an int* for some attributes, and

  • we decided to not worry about endianness here because the ints we know about today are only converted to bool.

Better IMO would be to just do it right, so that any changes to the enums in the future will lead to obvious failures that are most likely easily fixed.

For this one case it may not matter much, but in aggregate it'll make a huge difference in the long-term reliability and maintainability of cuda-python as a whole.

It's really straightforward to achieve, although it's significantly more code:
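
A hedged reconstruction of the dispatch being described (not the original code), assuming the module's existing cydriver cimport and HANDLE_RETURN helper and taking the attribute names from the documentation quoted above:

```cython
from libc.stdint cimport uint64_t

cdef uint64_t DMRA_getattribute(
    cydriver.CUmemoryPool pool_handle, cydriver.CUmemPool_attribute attr_enum
) except? 0:
    cdef int int_value = 0
    cdef uint64_t u64_value = 0
    if attr_enum in (
        cydriver.CU_MEMPOOL_ATTR_REUSE_FOLLOW_EVENT_DEPENDENCIES,
        cydriver.CU_MEMPOOL_ATTR_REUSE_ALLOW_OPPORTUNISTIC,
        cydriver.CU_MEMPOOL_ATTR_REUSE_ALLOW_INTERNAL_DEPENDENCIES,
    ):
        # value type = int: give the driver exactly 4 bytes to write
        with nogil:
            HANDLE_RETURN(cydriver.cuMemPoolGetAttribute(
                pool_handle, attr_enum, <void*>&int_value))
        return <uint64_t>int_value
    elif attr_enum in (
        cydriver.CU_MEMPOOL_ATTR_RELEASE_THRESHOLD,
        cydriver.CU_MEMPOOL_ATTR_RESERVED_MEM_CURRENT,
        cydriver.CU_MEMPOOL_ATTR_RESERVED_MEM_HIGH,
        cydriver.CU_MEMPOOL_ATTR_USED_MEM_CURRENT,
        cydriver.CU_MEMPOOL_ATTR_USED_MEM_HIGH,
    ):
        # value type = cuuint64_t: the driver writes all 8 bytes
        with nogil:
            HANDLE_RETURN(cydriver.cuMemPoolGetAttribute(
                pool_handle, attr_enum, <void*>&u64_value))
        return u64_value
    else:
        # any future attribute fails loudly instead of corrupting the stack
        raise ValueError(f"unhandled CUmemPool_attribute: {attr_enum}")
```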

This would give us complete safety. If there are changes to the API in the future, it will fail loudly and we'll know immediately that we need to review and adjust.

@leofang (Member):


In terms of safety, I left a comment above (#1274) probably while you were leaving the review here 🙂

@Andy-Jost (Contributor, Author):

Closing this since we have a better solution.

Andy-Jost closed this Nov 20, 2025
github-actions bot pushed a commit that referenced this pull request Nov 20, 2025
Removed preview folders for the following PRs:
- PR #1272