Add `cudaLaunchHostFunc` #4338

leofang · 2020-11-25T21:51:36Z

According to the CUDA Runtime API documentation, cudaStreamAddCallback will be deprecated in favor of this new function, which has been available since CUDA 10.0:

> This function is slated for eventual deprecation and removal. If you do not require the callback to execute in case of a device error, consider using cudaLaunchHostFunc. Additionally, this function is not supported with cudaStreamBeginCapture and cudaStreamEndCapture, unlike cudaLaunchHostFunc.

TODO:

- Skip tests for CUDA 9.x
- Add stubs

leofang · 2020-11-25T21:54:19Z

tests/cupy_tests/cuda_tests/test_stream.py

-        if not cuda.runtime.is_hip:
-            stream = cuda.Stream.null
-        else:
-            # adding callbacks to the null stream in HIP would segfault...
-            stream = cuda.Stream()
+        stream = self.stream


This fixes the mistake I made in #3835 🤦🏻‍♂️ self.stream was totally ignored...

leofang · 2020-11-26T03:23:16Z

/test

pentschev

This looks good to me overall, I have a few questions only, but thanks for working on it @leofang !

pentschev · 2020-11-26T21:57:23Z

cupy/cuda/stream.pyx

+            callback (function): Callback function. It must take only one
+                argument (user data object), and returns nothing.


Possibly a naive question, but add_callback supports/requires 3 arguments: stream, error status and user data. Shouldn't it be the same here? I think for the test you proposed in #4322 (comment) it would be very useful to have the stream so that we can verify that it's indeed happening on the stream we're expecting.

Ah, I am following the requirement from cudaLaunchHostFunc that the host function should only have 1 argument instead of 3.

For the test we would like to do there, we could either use add_callback + a 3-arg callback, or use this new function launch_host_func + a 1-arg callback, with the stream added as part of the arg:

    data = []
    data.append(stream)
    stream.launch_host_func(callback, data)

and check stream.ptr in the callback function.
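The one-argument pattern described above can be sketched in plain Python. `FakeStream` is a hypothetical stand-in written only for this illustration so that the snippet runs without a GPU; it is not CuPy's implementation, and it only mimics the `launch_host_func(callback, arg)` and `synchronize()` calls mentioned in this thread. With CuPy one would presumably use a real stream object instead.

```python
from collections import deque


class FakeStream:
    """Hypothetical stand-in for a CUDA stream; NOT CuPy's API.

    launch_host_func(callback, arg) enqueues a one-argument host function,
    and synchronize() runs everything that was enqueued, in order.
    """

    def __init__(self, ptr):
        self.ptr = ptr  # plays the role of the raw stream handle
        self._queue = deque()

    def launch_host_func(self, callback, arg):
        self._queue.append((callback, arg))

    def synchronize(self):
        while self._queue:
            callback, arg = self._queue.popleft()
            callback(arg)


def callback(data):
    # One-argument signature: everything the callback needs, including the
    # stream itself, travels inside the user data object.
    stream = data[0]
    data.append(stream.ptr)


stream = FakeStream(ptr=0x1234)
data = [stream]
stream.launch_host_func(callback, data)
stream.synchronize()

# The callback recorded the handle of the stream it was launched on.
observed_ptr = data[1]
```

Compared with a three-argument `add_callback` callback, the stream is not passed in by the runtime here, so the caller has to pack it into the user data explicitly.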

Got it, I missed the fact that's how cudaLaunchHostFunc should behave out of my ignorance of that function, thanks for clarifying!

No worries, Peter! I didn't know this function before either. I became aware of it only because your PTDS PR made me wonder if it's possible to test it programmatically other than eyeballing nvvp/nsys, and I started browsing the Runtime API doc 😄

pentschev · 2020-11-26T21:59:17Z

tests/cupy_tests/cuda_tests/test_stream.py

+            stream.launch_host_func(
+                lambda t: out.append(t[0]), (i, numpy_array))
+
+        stream.synchronize()


Should this test here be performed on an explicit stream too?

This is in fact done in the other test added below. It's just that the stream pointer is wrapped by ExternalStream there, but it should achieve what you had in mind.

Ah right, I should have checked more than what GH showed to see how the stream was created, thanks for pointing that out and sorry for the false alarm.

Thanks for checking! It's always good to have extra pairs of eyes 🙏

pentschev

LGTM, thanks @leofang !

takagi · 2020-12-01T04:33:40Z

pfnCI, test this please.

chainer-ci · 2020-12-01T04:43:47Z

Jenkins CI test (for commit d7dc7d3, target branch master) failed with status FAILURE.

leofang · 2020-12-01T04:49:15Z

cupy_backends/cuda/api/runtime.pyx

+    if _is_hip_environment:
+        raise RuntimeError('This feature is not supported on HIP')
+    if CUDA_VERSION < 10000:
+        raise RuntimeError('This feature is only supported on CUDA 10.0+')


@takagi by inspecting the generated C file, I noticed that Cython (at least 0.29.21, which I'm using) has a nice property: for if conditions that can be determined at compile time, Cython is smart enough to determine that the rest of the code is dead and eliminate it! (So for RTD and HIP, this function actually stops right after raising.) As a result, it's fine even if we don't add any stubs (for example, I forgot to add stubs for CUDA 9.2 😂). Can we rely on this behavior?

Sounds so nice! Would you check if it works with Cython 0.28.0 as, currently, CuPy requires Cython 0.28.0 or later to build it from its source?

Hi @takagi, I just checked Cython 0.28.0 will also eliminate the dead code! This is the warning thrown during cythonizing (in both versions):

    [50/59] Cythonizing cupy_backends/cuda/api/runtime.pyx
    warning: cupy_backends/cuda/api/runtime.pyx:832:16: Unreachable code

Though I think 0.28 is too outdated and should be avoided as much as possible (#4148).

Can I ask you to post a new issue so other core members can find that?

No problem, @takagi, see #4393.
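For readers following the guard in the diff above: the comparison against 10000 reflects the CUDA convention of encoding the version as 1000 * major + 10 * minor, so 10.0 is 10000 and 9.2 is 9020. Below is a minimal plain-Python sketch of the same check order. Both function names are hypothetical and exist only for this illustration; in the actual `.pyx` file the conditions are, per the comment above, known at compile time, which is what allows Cython to drop the unreachable code.

```python
def encode_cuda_version(major, minor):
    # CUDA_VERSION convention: 1000 * major + 10 * minor (e.g. 10.0 -> 10000).
    return 1000 * major + 10 * minor


def check_launch_host_func_supported(cuda_version, is_hip):
    """Hypothetical helper mirroring the order of the guards in the diff."""
    if is_hip:
        raise RuntimeError('This feature is not supported on HIP')
    if cuda_version < 10000:
        raise RuntimeError('This feature is only supported on CUDA 10.0+')


# CUDA 10.0 passes both guards; CUDA 9.2 is rejected by the second one.
check_launch_host_func_supported(encode_cuda_version(10, 0), is_hip=False)
try:
    check_launch_host_func_supported(encode_cuda_version(9, 2), is_hip=False)
    rejected_9_2 = False
except RuntimeError:
    rejected_9_2 = True
```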

leofang · 2020-12-01T16:22:22Z

Jenkins, test this please

chainer-ci · 2020-12-01T18:43:45Z

Jenkins CI test (for commit d7dc7d3, target branch master) succeeded!

takagi · 2020-12-03T14:20:43Z

LGTM!

leofang · 2020-12-04T09:07:44Z

Thanks @takagi @pentschev!

leofang added 3 commits November 25, 2020 16:33

- add cudaLaunchHostFunc (76103b3)
- add tests (2cfc575)
- add doc (d0205b8)

leofang commented Nov 25, 2020

- more robust handling (d7dc7d3)

leofang mentioned this pull request Nov 26, 2020

Support for Per Thread Default Stream (PTDS) #4322

Merged

kmaehashi assigned takagi Nov 26, 2020

kmaehashi added cat:feature New features/APIs prio:medium labels Nov 26, 2020

pentschev reviewed Nov 26, 2020

pentschev approved these changes Nov 27, 2020

leofang commented Dec 1, 2020

leofang mentioned this pull request Dec 1, 2020

Generating CUDA-related library FFI from its headers #4358

Open

leofang mentioned this pull request Dec 2, 2020

Relying on Cython to eliminate some C stubs that were needed for version compatibility #4393

Open

takagi added this to the v9.0.0b1 milestone Dec 3, 2020

takagi merged commit 60f37e8 into cupy:master Dec 3, 2020

leofang deleted the launch_host branch December 3, 2020 14:28

leofang mentioned this pull request Jan 25, 2021

Support stream capture #4567

Merged
