Update CUDA Array Interface to v3 - Part 1 #4357
We're thinking of backporting this to v8 to allow consuming CAI v3 safely in CuPy v8, but excluding PTDS support.
Sounds reasonable to me, @leofang I'm fine if you want to push this forward before we merge PTDS.
Sounds good. So perhaps this PR should split into two: the first one handles CAI v3 without PTDS (as @kmaehashi suggested) and will be backported (hopefully it's straightforward), and the second handles PTDS once #4322 is settled and will not be backported. btw, note that the earliest Numba that supports CAI v3 is expected to be released early next year: numba/numba#5162 (review).
@kmaehashi @asi1024 @pentschev @jakirkham @kkraus14 @gmarkall I think this is ready. The PR description is updated. PTAL.
Jenkins, test this please
Jenkins CI test (for commit 5f12098, target branch master) succeeded!
Co-authored-by: Akifumi Imanishi <akifumi.imanishi@gmail.com>
Jenkins, test this please
Jenkins CI test (for commit fc797e9, target branch master) succeeded!
Jenkins, test this please.
Jenkins CI test (for commit 76bfcc3, target branch master) succeeded!
LGTM! Thanks!
Update CUDA Array Interface to v3 - Part 1
This PR is blocked by #4322 because we need to know how to handle per-thread default streams (PTDS).

UPDATE: This PR excludes PTDS from consideration because it is currently not supported in the codebase; see #4357 (comment). It will be properly addressed in the coming "Part 2" PR.
I am propagating the upstream change in the CAI protocol numba/numba#5162 to CuPy. The most notable change in the update to v3 is the requirement of stream synchronization. CAI v3 specifies that
For the detailed definitions (Producer, Consumer, User, etc), see the CAI v3 documentation; note the capitalized nouns follow the CAI's definitions.
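As a rough illustration of the synchronization requirement, the sketch below interprets the optional `stream` entry that v3 adds to the `__cuda_array_interface__` dictionary, following the value conventions in the CAI v3 documentation. The helper name `describe_stream` and the example dictionary (including its placeholder pointer) are hypothetical and are not part of CuPy or Numba; no GPU is touched.

```python
def describe_stream(cai):
    """Describe the `stream` entry of a CUDA Array Interface dict.

    Hypothetical helper for illustration only; it inspects the dict and
    performs no CUDA calls.
    """
    if cai.get("version", 0) < 3:
        return "pre-v3 interface: no stream semantics"
    stream = cai.get("stream")
    if stream is None:
        # The Producer asks for no synchronization.
        return "no synchronization required"
    if stream == 0:
        # v3 disallows 0 because it would be ambiguous.
        raise ValueError("stream=0 is disallowed in CAI v3")
    if stream == 1:
        return "legacy default stream"
    if stream == 2:
        return "per-thread default stream"
    # Any other integer is treated as a stream handle (a pointer value).
    return f"stream handle {stream:#x}"


# A made-up interface dict; the data pointer is a placeholder, not a real
# device address.
example = {
    "shape": (4,),
    "typestr": "<f4",
    "data": (0xDEADBEEF, False),
    "version": 3,
    "stream": 1,
}
print(describe_stream(example))
```

A Consumer would use this kind of check to decide which stream to synchronize with before touching the data; how CuPy does this internally is not shown here.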
Accompanying numba/numba#5162, Numba introduces a new environment variable, NUMBA_CUDA_ARRAY_INTERFACE_SYNC, to avoid syncing when acting as a Consumer. Following this path, I introduce two environment variables that let advanced users override the sync behaviors. Both default to 1 to comply with CAI v3. However, because Numba and CuPy have mismatched concepts of "default streams", the effect is slightly different.

In a nutshell, they allow us to fully restore the old (status quo) behavior, as if the v3 update did not exist. Specifically,
- CUPY_CUDA_ARRAY_INTERFACE_EXPORT_STREAM is set to 0
- CUPY_CUDA_ARRAY_INTERFACE_SYNC is set to 0

This should make it as performant as before and give Users the same full control (if so desired), while also taking care of the needs of certain libraries (notably, mpi4py) in which no CUDA API is accessible and thus the required synchronization cannot be performed.
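To make the opt-out concrete, here is a minimal sketch of how the two variables could be read and how a "status quo" environment could be prepared for a child process. Only the variable names and the default of 1 come from this description; the helpers `cai_flag` and `legacy_environment` are hypothetical, and how CuPy actually parses the variables (and when it reads them) is an assumption not confirmed here.

```python
import os

# Variable names as given in this PR description.
CAI_FLAGS = (
    "CUPY_CUDA_ARRAY_INTERFACE_SYNC",
    "CUPY_CUDA_ARRAY_INTERFACE_EXPORT_STREAM",
)


def cai_flag(name, environ=None):
    """Return True if the flag is enabled; unset means enabled (default 1).

    Hypothetical helper: integer parsing is an assumption for illustration.
    """
    environ = os.environ if environ is None else environ
    return int(environ.get(name, "1")) != 0


def legacy_environment(environ=None):
    """Return a copy of the environment with both flags set to 0."""
    env = dict(os.environ if environ is None else environ)
    for name in CAI_FLAGS:
        env[name] = "0"
    return env


if __name__ == "__main__":
    env = legacy_environment()
    for name in CAI_FLAGS:
        print(name, "enabled" if cai_flag(name, env) else "disabled")
```

The returned mapping can be passed as `env=` to `subprocess.run` so that the child process starts with both flags disabled; setting the variables before the process starts is the cautious choice, since the point at which they are read is not stated here.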
The v3 protocol also makes clear that Users are responsible for maintaining the lifetime of GPU arrays and streams when utilizing CAI, so we can safely assume any given external stream is valid.
UPDATE 2: To make it easier to review, note that the 3 test files touched in this PR examine different (though arguably overlapping) aspects of the CAI according to my interpretation:
- tests/cupy_tests/core_tests/test_ndarray.py: Check CuPy's behavior as a Producer
- tests/cupy_tests/creation_tests/test_from_data.py: Check CuPy's behavior as a Consumer
- tests/cupy_tests/core_tests/test_ndarray_cuda_array_interface.py: Ensure various operations are correctly done when CAI is in play

cc: @jakirkham @pentschev