-
Notifications
You must be signed in to change notification settings - Fork 221
Memory refactor #1205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory refactor #1205
Conversation
| def _clear(self): | ||
| self._ptr = 0 | ||
| self._size = 0 | ||
| self._mr = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Consider renaming _mr -> _memory_resource or mem_resource.
| stream: Stream | None = None | ||
| ): | ||
| cdef Buffer self = Buffer.__new__(cls) | ||
| self._ptr = <intptr_t>(int(ptr)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be a uintptr_t or a uint64_t?
|
/ok to test cf4dc9d |
|
/ok to test 19e4b8f |
| if self._memory_resource is None: | ||
| raise ValueError("a destination buffer must be provided (this " | ||
| "buffer does not have a memory_resource)") | ||
| dst = self._memory_resource.allocate(src_size, stream) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This is a common idiom in python? My tendency would be to assert and require the user to pass in a buffer that fits expectations on size and alignment rather than doing it for them from a clarity of memory ownership perspective. I appreciate that this is not as large a concern in Python as it is in other systems programming languages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My 2c: it would be better to require the buffer, as you suggest.
| cdef size_t dst_size = self._size | ||
| cdef size_t src_size = src._size | ||
|
|
||
| if src_size != dst_size: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to guard against if size is zero? To avoid kicking off a memcpy operation at all in that situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the POV of this change (refactoring), no need to change. More generally, I would prefer to NOT have that guard because the driver should take care of correctness and the caller can avoid the zero-size. No need to spend time checking it here, since it would pessimize the common case. I don't feel too strongly about it, though.
| """Return True if this buffer can be accessed by the CPU, otherwise False.""" | ||
| if self._memory_resource is not None: | ||
| return self._memory_resource.is_host_accessible | ||
| raise NotImplementedError("WIP: Currently this property only supports buffers with associated MemoryResource") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This something that we intend to change in the future? Should we remove the 'wip' from the string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above. For this change I only copy code. Not saying anything directly to your point, though. If we want to change this I'm happy to prepare the PR.
| """ | ||
|
|
||
| @abc.abstractmethod | ||
| def allocate(self, size_t size, stream: Stream = None) -> Buffer: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Do we need memory alignment support in these alloc functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure? The driver docs for cuMemAlloc say the pointer is "suitably aligned for any type of variable."
There is no similar statement for cuMemAllocAsync
b5ae007 to
cce7f6c
Compare
|
|
||
| # TODO: support creating this MR with flags that are later passed to cuMemHostAlloc? | ||
|
|
||
| def allocate(self, size, stream=None) -> Buffer: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto alignment comment above.
rparolin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good to me. I left some general comments that I consider non-blocking.
@leofang Any major concerns you'd like to see address before this gets merged?
| if TYPE_CHECKING: | ||
| from cuda.core.experimental._memory import Buffer, MemoryResource |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not blocking, but we want to avoid using TYPE_CHECKING: #468
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something to eliminate? If so, I can prepare a separate change to remove it everywhere.
| ] | ||
| d = {} | ||
| exec("from cuda.core.experimental._memory import *", d) # noqa: S102 | ||
| d = {k: v for k, v in d.items() if not k.startswith("__")} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: We should exclude everything starting with one underscore, not just those with two?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could change that. When I originally wrote this _SynchronousMemoryResource was in the mix, too.
|
/ok to test ff3820f |
| from ._buffer import * # noqa: F403 | ||
| from ._device_memory_resource import * # noqa: F403 | ||
| from ._ipc import * # noqa: F403 | ||
| from ._legacy import * # noqa: F403 | ||
| from ._virtual_memory_resource import * # noqa: F403 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: Would be better to call out what's being imported. Maintaining __all__ or having to chasing after each module for their __all__ is not fun. I don't recall we ever maintain __all__ in any other modules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I have the opposite preference. E.g., if I add something to, say, _buffer I find it easier to update the __all__ list there rather than in a separate file.
|
Major refactoring of the memory package.
Overview
This PR refactors the
_memory.pyxmodule into a dedicated package (_memory/) to address its growing size and complexity, which were hindering further development. The primary goals are to physically separate the code into more manageable submodules, simplify the internal logic, and enhance the overall structure, including the addition of.pxdheaders for better Cython integration.Major Changes
_memory.pyxinto submodules, the major ones being the following:_buffer.*_dmr.*_ipc.*_vmm.*.pxd) for public definitions to improve modularity and type safety.DeviceMemoryResourceto isolate IPC-related code, reducing coupling.IPCDataclass to encapsulate relevant data members and eliminating a redundantuuidfield.Minor Improvements
__all__lists to modules for explicit control over exports._handleinstead of_mempool_handle).