
Conversation

@Andy-Jost (Contributor) commented Oct 31, 2025:

Major refactoring of the memory package.

Overview
This PR refactors the _memory.pyx module into a dedicated package (_memory/) to address its growing size and complexity, which were hindering further development. The primary goals are to physically separate the code into more manageable submodules, simplify the internal logic, and enhance the overall structure, including the addition of .pxd headers for better Cython integration.

Major Changes

  • Split _memory.pyx into submodules, the major ones being the following:
    • Buffers: _buffer.*
    • Device memory resources: _dmr.*
    • IPC (Inter-Process Communication): _ipc.*
    • Virtual memory management: _vmm.*
  • Introduced Cython headers (.pxd) for public definitions to improve modularity and type safety.
  • Refactored DeviceMemoryResource to isolate IPC-related code, reducing coupling.
  • Simplified IPC implementation by adding an IPCData class to encapsulate relevant data members and eliminating a redundant uuid field.
  • Streamlined the class hierarchy by removing unnecessary classes.
  • Simplified the Cython interface for memory allocation and deallocation operations.

Minor Improvements

  • Added __all__ lists to modules for explicit control over exports.
  • Extracted long implementation functions from class definitions to make classes more concise and readable.
  • Renamed various private attributes and methods for consistency (e.g., _handle instead of _mempool_handle).
  • Consolidated and alphabetized property definitions for better organization.
  • Converted additional classes and functions to Cython for performance gains.

@copy-pr-bot (bot) commented Oct 31, 2025:

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


@Andy-Jost Andy-Jost requested review from cpcloud, leofang and mdboom and removed request for cpcloud October 31, 2025 22:41
@Andy-Jost Andy-Jost added the cuda.core Everything related to the cuda.core module label Nov 4, 2025
def _clear(self):
    self._ptr = 0
    self._size = 0
    self._mr = None
Collaborator:

Nit: Consider renaming _mr -> _memory_resource or mem_resource.

    stream: Stream | None = None
):
    cdef Buffer self = Buffer.__new__(cls)
    self._ptr = <intptr_t>(int(ptr))
Collaborator:

Shouldn't this be a uintptr_t or a uint64_t?
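The concern behind this question is that intptr_t is signed, so a device pointer in the upper half of the 64-bit address space comes back negative when round-tripped through it, whereas uintptr_t preserves the raw address. A minimal Python illustration using ctypes to stand in for the Cython casts:

```python
import ctypes

# A pointer value in the upper half of the 64-bit address space.
addr = 0xFFFF_FFFF_FFFF_F000

# Casting through a signed pointer-sized type reinterprets the value as
# negative, while the unsigned type preserves the raw address.
as_signed = ctypes.c_int64(addr).value
as_unsigned = ctypes.c_uint64(addr).value
```

Here as_signed comes back negative while as_unsigned keeps the original address, which is why signed casts for pointer values tend to get flagged in review.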

@Andy-Jost (Contributor Author):

/ok to test cf4dc9d

@leofang leofang added this to the cuda.core beta 9 milestone Nov 10, 2025
@leofang leofang added enhancement Any code-related improvements P0 High priority - Must do! labels Nov 10, 2025
@Andy-Jost (Contributor Author):

/ok to test 19e4b8f

if self._memory_resource is None:
    raise ValueError("a destination buffer must be provided (this "
                     "buffer does not have a memory_resource)")
dst = self._memory_resource.allocate(src_size, stream)
Collaborator:

Nit: Is this a common idiom in Python? My tendency would be to assert and require the user to pass in a buffer that meets the size and alignment expectations, rather than allocating one for them, for clarity of memory ownership. I appreciate that this is less of a concern in Python than in other systems programming languages.

Andy-Jost (Contributor Author):

My 2c: it would be better to require the buffer, as you suggest.

cdef size_t dst_size = self._size
cdef size_t src_size = src._size

if src_size != dst_size:
Collaborator:

Do we need to guard against size being zero, to avoid kicking off a memcpy operation at all in that situation?

Andy-Jost (Contributor Author) commented Nov 12, 2025:
From the POV of this change (refactoring), no need to change. More generally, I would prefer to NOT have that guard because the driver should take care of correctness and the caller can avoid the zero-size. No need to spend time checking it here, since it would pessimize the common case. I don't feel too strongly about it, though.
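The two options being weighed can be sketched side by side (the function names and the memcpy callable are illustrative, not the driver API):

```python
def copy_with_guard(memcpy, dst, src, nbytes, stream):
    # Early-out: never reaches the driver for empty copies, at the cost
    # of an extra branch on every call.
    if nbytes == 0:
        return
    memcpy(dst, src, nbytes, stream)


def copy_without_guard(memcpy, dst, src, nbytes, stream):
    # Delegates correctness to the driver, which must treat a zero-byte
    # copy as a no-op; the common non-zero path pays nothing extra.
    memcpy(dst, src, nbytes, stream)
```

The comment above argues for the second variant: the guard pessimizes every call to optimize a case callers can avoid themselves.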

"""Return True if this buffer can be accessed by the CPU, otherwise False."""
if self._memory_resource is not None:
    return self._memory_resource.is_host_accessible
raise NotImplementedError("WIP: Currently this property only supports buffers with associated MemoryResource")
Collaborator:

Is this something that we intend to change in the future? Should we remove the 'WIP' from the string?

Andy-Jost (Contributor Author) commented Nov 12, 2025:
Same comment as above. For this change I only copy code. Not saying anything directly to your point, though. If we want to change this I'm happy to prepare the PR.

"""

@abc.abstractmethod
def allocate(self, size_t size, stream: Stream = None) -> Buffer:
Collaborator:

Question: Do we need memory alignment support in these alloc functions?

Andy-Jost (Contributor Author) commented Nov 12, 2025, quoting:

# TODO: support creating this MR with flags that are later passed to cuMemHostAlloc?
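If alignment support were added to these allocators, a common approach is to over-allocate by alignment - 1 bytes and round the raw pointer up to the requested power-of-two boundary. A sketch of the rounding arithmetic (not part of the current API):

```python
def align_up(addr: int, alignment: int) -> int:
    """Round addr up to the next multiple of alignment (a power of two)."""
    assert alignment > 0 and alignment & (alignment - 1) == 0
    return (addr + alignment - 1) & ~(alignment - 1)
```

A hypothetical allocate(size, alignment) would then request size + alignment - 1 bytes from the underlying pool and hand out align_up(raw_ptr, alignment) as the usable pointer.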

def allocate(self, size, stream=None) -> Buffer:
Collaborator:

Ditto alignment comment above.

rparolin (Collaborator) left a comment:

Generally looks good to me. I left some general comments that I consider non-blocking.

@leofang Any major concerns you'd like to see addressed before this gets merged?

Comment on lines +28 to +29
if TYPE_CHECKING:
from cuda.core.experimental._memory import Buffer, MemoryResource
Member:

Not blocking, but we want to avoid using TYPE_CHECKING: #468

Andy-Jost (Contributor Author):

Is this something to eliminate? If so, I can prepare a separate change to remove it everywhere.
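Without presuming what #468 settles on, one common way to drop `if TYPE_CHECKING:` guards is deferred annotation evaluation, which keeps annotation-only names out of the runtime import graph. A minimal sketch (the function name is hypothetical):

```python
from __future__ import annotations  # annotations become lazy strings

def as_buffer(obj) -> Buffer:  # "Buffer" need not be imported at runtime
    ...
```

With deferred evaluation the annotation is stored as a string and only resolved by tools that ask for it, so the import cycle that TYPE_CHECKING was working around never occurs at runtime.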

]
d = {}
exec("from cuda.core.experimental._memory import *", d) # noqa: S102
d = {k: v for k, v in d.items() if not k.startswith("__")}
Member:

Q: Should we exclude everything starting with one underscore, not just those with two?

Andy-Jost (Contributor Author):

Could change that. When I originally wrote this _SynchronousMemoryResource was in the mix, too.
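The single-underscore variant the reviewer suggests would look like the following; it also happens to drop the __builtins__ key that exec() injects into the namespace:

```python
ns = {}
exec("public_name = 1\n_private = 2\n__dunder__ = 3", ns)  # noqa: S102

# Filtering on a single leading underscore removes private and dunder
# names alike, including exec's injected __builtins__.
filtered = {k: v for k, v in ns.items() if not k.startswith("_")}
```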

@Andy-Jost (Contributor Author):

/ok to test ff3820f

@Andy-Jost Andy-Jost enabled auto-merge (squash) November 12, 2025 19:09
Comment on lines +5 to +9
from ._buffer import * # noqa: F403
from ._device_memory_resource import * # noqa: F403
from ._ipc import * # noqa: F403
from ._legacy import * # noqa: F403
from ._virtual_memory_resource import * # noqa: F403
Member:

Style: It would be better to call out what's being imported. Maintaining __all__, or having to chase after each module for its __all__, is not fun. I don't recall us ever maintaining __all__ in any other modules.

Andy-Jost (Contributor Author):

I guess I have the opposite preference. E.g., if I add something to, say, _buffer, I find it easier to update the __all__ list there rather than in a separate file.
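For context, the mechanism being debated: a submodule's __all__ is what decides which names a star import re-exports, so each submodule stays self-describing. A self-contained demonstration using a synthetic module (fake_buffer is invented for the example):

```python
import sys
import types

# Build a throwaway module with one public and one private class.
mod = types.ModuleType("fake_buffer")
exec(
    "__all__ = ['Buffer']\n"
    "class Buffer: pass\n"
    "class _Helper: pass\n",
    mod.__dict__,
)  # noqa: S102
sys.modules["fake_buffer"] = mod

# The star import honors __all__: only 'Buffer' crosses over.
ns = {}
exec("from fake_buffer import *", ns)  # noqa: S102
```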

@Andy-Jost Andy-Jost merged commit f9df16f into NVIDIA:main Nov 12, 2025
57 checks passed
github-actions (bot) commented:
Doc Preview CI
Preview removed because the pull request was closed or merged.

