Implements GraphMemoryResource #1235
base: main
Conversation
- …h capture state is not as expected.
- …source methods to take any kind of stream-providing object. Update graph allocation tests.
- …rapper (testing only)
```cython
from cuda.bindings cimport cydriver
from cuda.core.experimental._memory._buffer cimport MemoryResource
from cuda.core.experimental._memory._device_memory_resource cimport DeviceMemoryResource
```
I'll remove the cruft here.
```cython
if _settable:
    def fset(GraphMemoryResourceAttributes self, uint64_t value):
        if value != 0:
            raise AttributeError(f"Attribute {stub.__name__!r} may only be set to zero (got {value}).")
```
The driver checks for this condition in `cuDeviceSetGraphMemAttribute` and issues a log message: "High watermark can only be reset to 0".

It's a shame we cannot access that message programmatically for use in the Python error.
Good news: CUDA 13 adds functions for error log management. It looks like `cuLogsRegisterCallback` might help here.
```python
        ctx = self._get_current_context()
        return Event._init(self._id, ctx, options, True)

    def allocate(self, size, stream: Stream | None = None) -> Buffer:
```
I updated `Stream` arguments to `IsStreamT` throughout.
I was confused by this change. Why did we do that?
The `GraphBuilder` object contains a stream and can be used in place of streams. In test_graph.py you can find lots of statements like `launch(gb, LaunchConfig(grid=1, block=1), empty_kernel)`.

This was only partly implemented. For instance, `launch` contained logic to convert arguments to streams, but nothing in the memory module did.

This change extends the support. In general, we should uniformly use `IsStreamT` for any API function that accepts a stream-like argument, and then call `Stream._init(arg)` to convert it.
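Something like this sketch of the convention (the `normalize_stream` helper name is hypothetical, not part of this PR):

```python
# Minimal sketch: any API taking a stream-like argument annotates it
# as IsStreamT and immediately normalizes it with Stream._init, which
# invokes the __cuda_stream__ protocol.
from cuda.core.experimental import Stream


def normalize_stream(stream) -> Stream:
    # Stream._init returns a Stream, or raises if `stream` does not
    # support __cuda_stream__ (Stream and GraphBuilder both do).
    return Stream._init(stream)
```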
```diff
-        raise ValueError(
-            f"stream must either be a Stream object or support __cuda_stream__ (got {type(stream)})"
-        ) from None
+        stream = Stream._init(stream)
```
The canonical way to invoke the `__cuda_stream__` protocol now is to call `Stream._init`. It will either succeed in creating a Stream object or raise an exception.
```cython
...


cdef cydriver.CUstream _try_to_get_stream_ptr(obj: IsStreamT) except*:
```
This code block was moved below without changes.
```python
gb = device.create_graph_builder().begin_building(mode=mode)
with pytest.raises(
    RuntimeError,
    match=r"DeviceMemoryResource cannot perform memory operations on a capturing "
    r"stream \(consider using GraphMemoryResource\)\.",
):
    dmr.allocate(1, stream=gb)
gb.end_building().complete()
```
This section illustrates a drawback of not using `with` contexts. Ignore the fact that the error is caught here (that's just for testing). If an exception is thrown during graph capture, control can easily escape without making a call to `gb.end_building`. That leaves the surrounding code in an unexpected state (capturing still on).
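A context manager could guarantee the cleanup. A sketch (the `building` helper is hypothetical, not part of this PR):

```python
from contextlib import contextmanager


@contextmanager
def building(gb):
    # Hypothetical helper: end_building runs even when an exception
    # escapes the capture region.
    gb.begin_building()
    try:
        yield gb
    finally:
        gb.end_building()

# with building(device.create_graph_builder()) as gb:
#     dmr.allocate(1, stream=gb)  # raises, yet capture still ends
```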
```python
if stream is None:
    raise ValueError("stream cannot be None, stream must either be a Stream object or support __cuda_stream__")
try:
    stream_handle = stream.handle
```
This appears to be a (now fixed) bug. Any object with a `handle` attribute would have that used preferentially over the stream protocol, even if the handle was not for a stream.
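The corrected precedence looks roughly like this (sketch; the `_stream_handle` name is hypothetical, and it assumes `__cuda_stream__` is a method returning a `(version, handle)` tuple as in cuda.core):

```python
def _stream_handle(stream):
    # Consult the protocol first; do not trust an arbitrary `handle`
    # attribute, which may not refer to a stream at all.
    cuda_stream = getattr(stream, "__cuda_stream__", None)
    if cuda_stream is None:
        raise ValueError(
            f"stream must either be a Stream object or support "
            f"__cuda_stream__ (got {type(stream)})"
        )
    version, handle = cuda_stream()
    return handle
```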
/ok to test 13e3dfb
```diff
 dst : :obj:`~_memory.Buffer`
     Source buffer to copy data from
-stream : Stream
+stream : IsStreamT
```
IsStreamT is still a Stream type?
`IsStreamT` identifies any type that supports conversion to `Stream` via `__cuda_stream__`, including `Stream` itself.
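For example, a third-party type can satisfy `IsStreamT` without subclassing `Stream` (sketch; assumes the protocol is a `__cuda_stream__` method returning a `(version, handle)` tuple):

```python
class ForeignStream:
    """Wraps a raw CUstream address owned by another library."""

    def __init__(self, handle: int):
        self._handle = handle

    def __cuda_stream__(self) -> tuple[int, int]:
        return (0, self._handle)

# Stream._init(ForeignStream(raw_handle)) then yields a Stream.
```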
```diff
         return self._ipc_data._alloc_handle


-    def allocate(self, size_t size, stream: Stream = None) -> Buffer:
+    def allocate(self, size_t size, stream: Optional[IsStreamT] = None) -> Buffer:
```
Does this allocate member function need to take an alignment parameter?
```diff
     is used.
     """
-    DMR_deallocate(self, <uintptr_t>ptr, size, <Stream>stream)
+    stream = Stream._init(stream) if stream is not None else default_stream()
```
Is there a way to capture the user error of neglecting to pass the stream that the memory allocation came from? I'm thinking of a debug assert that verifies the allocated memory address came from the provided stream. Another potential option: if the user neglects to provide a stream, we do a lookup to determine where the address came from. Not sure if that is possible given the current lower-level API.
Since the stream only defines an ordering, it is legal to allocate on one stream and deallocate on another.

That said, I think the overall memory management scheme could be clarified and potentially improved.
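For example (sketch; assumes `Stream.record`, `Stream.wait`, and `Buffer.close(stream=...)` behave as in cuda.core today):

```python
from cuda.core.experimental import Device

dev = Device()
dev.set_current()
s1, s2 = dev.create_stream(), dev.create_stream()

buf = dev.memory_resource.allocate(1024, stream=s1)
# ... enqueue work that uses buf on s1 ...
s2.wait(s1.record())  # order s2 after s1's pending work
buf.close(stream=s2)  # legal: deallocation is stream-ordered on s2
```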
| "a capturing stream (consider using GraphMemoryResource).") | ||
|
|
||
|
|
||
| cdef inline Buffer DMR_allocate(DeviceMemoryResource self, size_t size, Stream stream): |
What does inline do here?
Cython will mark the function `CYTHON_INLINE` in the generated C++.

```c
#ifndef CYTHON_INLINE
  #if defined(__clang__)
    #define CYTHON_INLINE __inline__ __attribute__ ((__unused__))
  #else
    #define CYTHON_INLINE inline
  #endif
#endif
```
| """ | ||
|
|
||
| def __new__(cls, device_id: int | Device): | ||
| cdef int c_device_id = getattr(device_id, 'device_id', device_id) |
This is where I wish we had function overloading in the language...
Yes. It can also be done with decorators and/or pybind11. The main problem with general overloading for us is that the signature matching is done at runtime in Python, and we are very sensitive to performance.

In many simple cases, arguments are orthogonal and we just want to reduce a set of types to a certain type. That's what we have here (`{Device, int} -> Device`) and elsewhere in this change with stream conversions (`IsStreamT -> Stream`). For these situations, I think the best solution is to make a standard conversion for each type: to convert "any supported object" `s` to a Stream, use `Stream._init(s)`. Similarly, we should probably update code like this to `Device._init(d)` or `Device(d)`. We could also consider adding `IsDeviceT` and `__cuda_device__` for symmetry.

(I would actually prefer to remove the `_init` methods in favor of constructors, but I understand why we have those, and it's a different discussion anyway.)
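A sketch of what such a standard conversion could look like for devices (the `as_device` helper name is hypothetical, not part of this PR):

```python
from cuda.core.experimental import Device


def as_device(device_id) -> Device:
    # Reduce {Device, int} to Device, analogous to Stream._init for
    # stream-like objects.
    return device_id if isinstance(device_id, Device) else Device(device_id)
```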
Comment: I'd recommend chunking large PRs into smaller reviewable pieces in the future. IMHO, it makes the change more approachable and consumable for reviewers.
```diff
         raise NotImplementedError("WIP: https://github.com/NVIDIA/cuda-python/issues/189")


-    def create_stream(self, obj: IsStreamT | None = None, options: StreamOptions | None = None) -> Stream:
+    def create_stream(self, obj: Optional[IsStreamT] = None, options: StreamOptions | None = None) -> Stream:
```
Q: We seem to be introducing an inconsistent syntax preference here; any reason for this change? The old typing and the new one are equivalent. Also, we still keep the typing for `options` the same.

I think if ruff were able to lint Cython code, this would have been flagged.
IIRC, in "modern" typing (in relative terms, since Python typing changes so rapidly), using `|` is preferred over `Optional`. We should get a typing expert to help review (I am not one) 🙂
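Both spellings are indeed equivalent at runtime and to type checkers (PEP 604, Python 3.10+):

```python
from typing import Optional

def f(stream: Optional[int] = None) -> None: ...  # pre-3.10 spelling
def g(stream: int | None = None) -> None: ...     # PEP 604 spelling

# The two forms compare equal.
assert (int | None) == Optional[int]
```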
```diff
         return Event._init(self._id, ctx, options, True)


-    def allocate(self, size, stream: Stream | None = None) -> Buffer:
+    def allocate(self, size, stream: Optional[IsStreamT] = None) -> Buffer:
```
ditto
```diff
-    def allocate(self, size, stream: Optional[IsStreamT] = None) -> Buffer:
+    def allocate(self, size, stream: IsStreamT | None = None) -> Buffer:
```
Description

Implements `GraphMemoryResource` for memory interactions with the graph memory allocator. Allocations from this object succeed only when graph capturing is active. Conversely, allocations from `DeviceMemoryResource` now raise an exception when graph capturing is active. A new test module is added.

This change also simplifies and extends the logic for accepting arbitrary stream parameters as objects implementing `__cuda_stream__`. Support for that protocol was added in several places, allowing `GraphBuilder` to be used anywhere a stream is expected, including memory resource and buffer methods.

closes #963