Add cudaLaunchHostFunc
#4338
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -61,6 +61,9 @@ cdef extern from *: | |
driver.Stream stream, Error status, void* userData) | ||
ctypedef StreamCallbackDef* StreamCallback 'cudaStreamCallback_t' | ||
|
||
ctypedef void HostFnDef(void* userData) | ||
ctypedef HostFnDef* HostFn 'cudaHostFn_t' | ||
|
||
|
||
cdef extern from '../cupy_cuda_runtime.h' nogil: | ||
|
||
|
@@ -170,6 +173,7 @@ cdef extern from '../cupy_cuda_runtime.h' nogil: | |
int cudaStreamSynchronize(driver.Stream stream) | ||
int cudaStreamAddCallback(driver.Stream stream, StreamCallback callback, | ||
void* userData, unsigned int flags) | ||
int cudaLaunchHostFunc(driver.Stream stream, HostFn fn, void* userData) | ||
int cudaStreamQuery(driver.Stream stream) | ||
int cudaStreamWaitEvent(driver.Stream stream, driver.Event event, | ||
unsigned int flags) | ||
|
@@ -798,8 +802,18 @@ cdef _streamCallbackFunc(driver.Stream hStream, int status, | |
cpython.Py_DECREF(obj) | ||
|
||
|
||
cdef _HostFnFunc(void* func_arg) with gil: | ||
obj = <object>func_arg | ||
func, arg = obj | ||
func(arg) | ||
cpython.Py_DECREF(obj) | ||
|
||
|
||
cpdef streamAddCallback(intptr_t stream, callback, intptr_t arg, | ||
unsigned int flags=0): | ||
if _is_hip_environment and stream == 0: | ||
raise RuntimeError('HIP does not allow adding callbacks to the ' | ||
'default (null) stream') | ||
func_arg = (callback, arg) | ||
cpython.Py_INCREF(func_arg) | ||
with nogil: | ||
|
@@ -809,6 +823,21 @@ cpdef streamAddCallback(intptr_t stream, callback, intptr_t arg, | |
check_status(status) | ||
|
||
|
||
cpdef launchHostFunc(intptr_t stream, callback, intptr_t arg): | ||
if _is_hip_environment: | ||
raise RuntimeError('This feature is not supported on HIP') | ||
if CUDA_VERSION < 10000: | ||
raise RuntimeError('This feature is only supported on CUDA 10.0+') | ||
Comment on lines +827 to +830

@takagi by inspecting the generated C file I noticed Cython (at least 0.29.21 that I'm using) has a nice property that for the

Sounds so nice! Would you check if it works with Cython 0.28.0 as, currently, CuPy requires Cython 0.28.0 or later to build it from its source?

Hi @takagi, I just checked Cython 0.28.0 will also eliminate the dead code! This is the warning thrown during cythonizing (in both versions):

Though I think 0.28 is too outdated and should be avoided as much as possible (#4148).

Can I ask you to post a new issue so other core members can find that?

Thanks!
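For context on the `CUDA_VERSION < 10000` guard discussed in this thread: CUDA encodes its version as a single integer, `1000 * major + 10 * minor`, so CUDA 10.0 is `10000`. A minimal Python sketch of that arithmetic follows. The helper names `decode_cuda_version` and `supports_launch_host_func` are hypothetical and are not part of CuPy.

```python
def decode_cuda_version(version):
    """Split a CUDA version integer (1000 * major + 10 * minor) into (major, minor)."""
    major, rest = divmod(version, 1000)
    return major, rest // 10


def supports_launch_host_func(version):
    # Mirrors the `CUDA_VERSION < 10000` guard in this PR: CUDA 10.0 or later.
    return version >= 10000


print(decode_cuda_version(10000))        # (10, 0)
print(decode_cuda_version(11020))        # (11, 2)
print(supports_launch_host_func(9020))   # False
```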
||
|
||
func_arg = (callback, arg) | ||
cpython.Py_INCREF(func_arg) | ||
with nogil: | ||
status = cudaLaunchHostFunc( | ||
<driver.Stream>stream, <HostFn>_HostFnFunc, | ||
<void*>func_arg) | ||
check_status(status) | ||
|
||
|
||
cpdef streamQuery(intptr_t stream): | ||
return cudaStreamQuery(<driver.Stream>stream) | ||
|
||
|
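The ownership pattern in `_HostFnFunc` and `launchHostFunc` above (pack `(callback, arg)` into a tuple, take an extra reference before handing the raw pointer to C, release it inside the trampoline) can be illustrated without a GPU. The sketch below is an assumption-laden stand-in, not CuPy's implementation: it uses `ctypes` in place of Cython and CUDA, it is CPython-specific, and it calls the trampoline synchronously instead of enqueueing it on a stream. The names `launch_host_func_sketch` and `_trampoline` are hypothetical.

```python
import ctypes

# C-compatible signature matching `void fn(void* userData)`.
HostFn = ctypes.CFUNCTYPE(None, ctypes.c_void_p)

# CPython reference-count helpers, playing the role of cpython.Py_INCREF/Py_DECREF.
_incref = ctypes.pythonapi.Py_IncRef
_incref.argtypes = [ctypes.py_object]
_incref.restype = None
_decref = ctypes.pythonapi.Py_DecRef
_decref.argtypes = [ctypes.py_object]
_decref.restype = None


@HostFn
def _trampoline(user_data):
    # Recover the Python tuple from the raw pointer, as `<object>func_arg` does.
    obj = ctypes.cast(user_data, ctypes.py_object).value
    func, arg = obj
    try:
        func(arg)
    finally:
        # Release the reference taken in launch_host_func_sketch.
        _decref(obj)


def launch_host_func_sketch(callback, arg):
    func_arg = (callback, arg)
    # Keep the tuple alive while only C code holds its address.
    _incref(func_arg)
    # Stand-in for cudaLaunchHostFunc: invoke the C callback right away.
    _trampoline(id(func_arg))


out = []
for i in range(3):
    launch_host_func_sketch(lambda t: out.append(t[0]), (i, "data"))
print(out)  # [0, 1, 2]
```

In the real asynchronous case the extra reference matters more than it does here, because the Python caller may return long before the host function runs.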
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -59,6 +59,25 @@ def test_get_and_add_callback(self): | |
stream.synchronize() | ||
assert out == list(range(N)) | ||
|
||
@attr.gpu | ||
@unittest.skipIf(cuda.runtime.is_hip, 'HIP does not support this') | ||
@unittest.skipIf(cuda.driver.get_build_version() < 10000, | ||
'Only CUDA 10.0+ supports this') | ||
def test_launch_host_func(self): | ||
N = 100 | ||
cupy_arrays = [testing.shaped_random((2, 3)) for _ in range(N)] | ||
|
||
stream = cuda.Stream.null | ||
|
||
out = [] | ||
for i in range(N): | ||
numpy_array = cupy_arrays[i].get(stream=stream) | ||
stream.launch_host_func( | ||
lambda t: out.append(t[0]), (i, numpy_array)) | ||
|
||
stream.synchronize() | ||

Should this test here be performed on an explicit stream too?

This is in fact done in the other test added below. It's just that the stream pointer is wrapped by

Ah right, I should have checked more than what GH showed to see how the stream was created, thanks for pointing that out and sorry for the false alarm.

Thanks for checking! It's always good to have extra pairs of eyes 🙏
||
assert out == list(range(N)) | ||
|
||
@attr.gpu | ||
def test_with_statement(self): | ||
stream1 = cuda.Stream() | ||
|
@@ -93,11 +112,7 @@ def test_get_and_add_callback(self): | |
N = 100 | ||
cupy_arrays = [testing.shaped_random((2, 3)) for _ in range(N)] | ||
|
||
if not cuda.runtime.is_hip: | ||
stream = cuda.Stream.null | ||
else: | ||
# adding callbacks to the null stream in HIP would segfault... | ||
stream = cuda.Stream() | ||
stream = self.stream | ||
Comment on lines -96 to +115

This fixes the mistake I made in #3835 🤦🏻‍♂️
||
|
||
out = [] | ||
for i in range(N): | ||
|
@@ -108,3 +123,22 @@ def test_get_and_add_callback(self): | |
|
||
stream.synchronize() | ||
assert out == list(range(N)) | ||
|
||
@attr.gpu | ||
@unittest.skipIf(cuda.runtime.is_hip, 'HIP does not support this') | ||
@unittest.skipIf(cuda.driver.get_build_version() < 10000, | ||
'Only CUDA 10.0+ supports this') | ||
def test_launch_host_func(self): | ||
N = 100 | ||
cupy_arrays = [testing.shaped_random((2, 3)) for _ in range(N)] | ||
|
||
stream = self.stream | ||
|
||
out = [] | ||
for i in range(N): | ||
numpy_array = cupy_arrays[i].get(stream=stream) | ||
stream.launch_host_func( | ||
lambda t: out.append(t[0]), (i, numpy_array)) | ||
|
||
stream.synchronize() | ||
assert out == list(range(N)) |
Possibly a naive question, but `add_callback` supports/requires 3 arguments: stream, error status and user data. Shouldn't it be the same here? I think for the test you proposed in #4322 (comment) it would be very useful to have the stream so that we can verify that it's indeed happening on the stream we're expecting.

Ah, I am following the requirement from `cudaLaunchHostFunc` that the host function should only have 1 argument instead of 3.

For the test we would like to do there, we could either use `add_callback` + a 3-arg callback, or use this new function `launch_host_func` + a 1-arg callback, with the stream added as part of the arg:

and check `stream.ptr` in the callback function.

Got it, I missed the fact that's how `cudaLaunchHostFunc` should behave out of my ignorance of that function, thanks for clarifying!

No worries, Peter! I didn't know this function before either. I became aware of it only because your PTDS PR made me wonder if it's possible to test it programmatically other than eyeballing `nvvp`/`nsys`, and I started browsing the Runtime API doc 😄
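The 1-arg callback approach from this thread, with the stream packed into the argument and `stream.ptr` checked inside the callback, might look roughly like the sketch below. The snippet from the original comment is not preserved, so this is an assumption about its shape. `FakeStream` is a hypothetical pure-Python stand-in that runs queued host functions on `synchronize()`; a real test would use a CuPy stream instead.

```python
class FakeStream:
    """Hypothetical stand-in for a stream; not a CuPy class."""

    def __init__(self, ptr):
        self.ptr = ptr
        self._queue = []

    def launch_host_func(self, callback, arg):
        # Like cudaLaunchHostFunc, the callback receives exactly one argument.
        self._queue.append((callback, arg))

    def synchronize(self):
        # Run the queued host functions in submission order.
        while self._queue:
            callback, arg = self._queue.pop(0)
            callback(arg)


seen = []


def record_stream(arg):
    # The single argument carries both the stream and the payload.
    stream, index = arg
    seen.append((index, stream.ptr))


stream = FakeStream(ptr=0x1234)
for i in range(3):
    stream.launch_host_func(record_stream, (stream, i))
stream.synchronize()

# Every callback observed the stream it was launched on.
assert all(ptr == stream.ptr for _, ptr in seen)
print([index for index, _ in seen])  # [0, 1, 2]
```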