Update CUDA Array Interface to v3 - Part 1 #4357

Merged on Dec 16, 2020 (19 commits)

@leofang leofang commented Nov 27, 2020

This PR is blocked by #4322 because we need to know how to handle per-thread default streams (PTDS).

UPDATE: This PR excludes PTDS from consideration as it's currently not supported in the codebase, see #4357 (comment). In the coming "Part 2" PR it'll be properly addressed.

I am propagating the upstream change in the CAI protocol (numba/numba#5162) to CuPy. The most notable change in the update to v3 is the stream synchronization requirement. CAI v3 specifies that

  • The Producer should include the stream pointer (whenever applicable) on which the data exposed through CAI can be safely operated on
  • The Consumer must, by default, synchronize on the stream, if given, at the point of interoperation
  • Either the Producer or the Consumer can give Users an option to override this behavior (the protocol does not specify how this option should be implemented)

For the detailed definitions (Producer, Consumer, User, etc), see the CAI v3 documentation; note the capitalized nouns follow the CAI's definitions.
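
As a rough illustration of the three rules above, the following pure-Python sketch mocks a Producer and a Consumer without touching a GPU. `FakeDeviceArray`, `consume`, and the `sync_stream` callback are hypothetical names invented for this sketch; they are not CuPy or Numba APIs. The dictionary keys and the meaning of the `stream` values (`None` means no synchronization is needed, `0` is disallowed, `1` is the legacy default stream, `2` is the per-thread default stream, and any other integer is a stream pointer) are taken from the CAI v3 documentation.

```python
class FakeDeviceArray:
    """Hypothetical Producer: stands in for a GPU array (no real device memory)."""

    def __init__(self, shape, typestr, ptr, stream=None):
        self.shape = shape
        self.typestr = typestr
        self.ptr = ptr          # pretend device pointer, an assumption of this sketch
        self.stream = stream    # None, 1, 2, or a stream pointer

    @property
    def __cuda_array_interface__(self):
        return {
            "shape": self.shape,
            "typestr": self.typestr,
            "data": (self.ptr, False),  # (device pointer, read-only flag)
            "version": 3,
            "stream": self.stream,      # stream on which the data is safe to use
        }


def consume(obj, sync_stream, sync=True):
    """Hypothetical Consumer: synchronize by default, then return the pointer.

    `sync_stream` is a callback that would wrap the real stream synchronization;
    `sync=False` models a User-facing option to skip it.
    """
    cai = obj.__cuda_array_interface__
    stream = cai.get("stream")  # the key is absent in older protocol versions
    if stream == 0:
        raise ValueError("a stream value of 0 is disallowed by CAI v3")
    if sync and stream is not None:
        sync_stream(stream)
    return cai["data"][0]
```

In a real Consumer the callback would wrap an actual stream synchronization call; here it can simply be `print` or a list's `append` method to observe when synchronization would happen.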

Along with numba/numba#5162, Numba introduces a new environment variable, NUMBA_CUDA_ARRAY_INTERFACE_SYNC, to avoid syncing when acting as a Consumer. Following this path, I introduce two environment variables that let advanced users override the sync behavior; both default to 1, which makes CuPy compliant with CAI v3. However, because Numba and CuPy have different concepts of "default streams", the effect is slightly different.

In a nutshell, they allow us to fully restore the old (status quo) behavior, as if the v3 update did not exist. Specifically,

  • As a Producer, CuPy would not export any stream if CUPY_CUDA_ARRAY_INTERFACE_EXPORT_STREAM is set to 0
  • As a Consumer, CuPy would not synchronize over any external streams provided through CAI if CUPY_CUDA_ARRAY_INTERFACE_SYNC is set to 0

This should keep CuPy as performant as before and give Users the same full control (if so desired), while accommodating certain libraries (notably mpi4py) that have no access to any CUDA API and thus cannot perform the required synchronization.
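
A minimal sketch of how the two environment variables could gate the Producer and Consumer paths is shown below. Only the variable names and their default of 1 come from this description; the helper names (`_env_flag`, `exported_stream`, `should_sync`), the parsing with `int()`, and the choice to export `None` when stream export is disabled are assumptions made for illustration and do not reflect how CuPy actually reads these settings.

```python
import os


def _env_flag(name, default="1", environ=None):
    """Return True unless the variable is set to 0 (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    return int(environ.get(name, default)) != 0


def exported_stream(current_stream_ptr, environ=None):
    """Producer side: the value to place under the CAI 'stream' key."""
    if _env_flag("CUPY_CUDA_ARRAY_INTERFACE_EXPORT_STREAM", environ=environ):
        return current_stream_ptr
    # Assumption: exporting None tells Consumers no synchronization is needed.
    return None


def should_sync(stream, environ=None):
    """Consumer side: whether to synchronize on an external stream."""
    if stream is None:
        return False  # nothing was exported, so there is nothing to sync on
    return _env_flag("CUPY_CUDA_ARRAY_INTERFACE_SYNC", environ=environ)
```

Passing an explicit `environ` mapping keeps the sketch testable without mutating the process environment. With both variables set to 0, `exported_stream` returns `None` and `should_sync` returns `False`, which corresponds to the pre-v3 behavior described above.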

The v3 protocol also makes clear that Users are responsible for maintaining the lifetime of GPU arrays and streams when using CAI, so we can safely assume that any given external stream is valid.

UPDATE 2: To make it easier to review, note the 3 test files touched in this PR examine different (though arguably overlapping) aspects of the CAI according to my interpretation:

  • tests/cupy_tests/core_tests/test_ndarray.py: Check CuPy's behavior as a Producer
  • tests/cupy_tests/creation_tests/test_from_data.py: Check CuPy's behavior as a Consumer
  • tests/cupy_tests/core_tests/test_ndarray_cuda_array_interface.py: Ensure various operations are correctly done when CAI is in play

cc: @jakirkham @pentschev

@kmaehashi kmaehashi added the cat:enhancement (Improvements to existing features) and to-be-backported (Pull-requests to be backported to stable branch) labels Nov 30, 2020
@kmaehashi
Member

We're thinking of backporting this to v8 to allow consuming CAI v3 safely in CuPy v8, but excluding PTDS support.

@pentschev
Member

> We're thinking of backporting this to v8 to allow consuming CAI v3 safely in CuPy v8, but excluding PTDS support.

Sounds reasonable to me. @leofang, I'm fine if you want to push this forward before we merge PTDS.

@leofang
Member Author

leofang commented Nov 30, 2020

Sounds good. So perhaps this PR should split into two: the first one handles CAI v3 without PTDS (as @kmaehashi suggested) and will be backported (hopefully it's straightforward), and the second handles PTDS once #4322 is settled and will not be backported.

btw, note that the earliest Numba that supports CAI v3 is expected to be released early next year: numba/numba#5162 (review).

@leofang leofang changed the title from "[WIP] Update CUDA Array Interface to v3" to "Update CUDA Array Interface to v3 - Part 1" Dec 1, 2020
@leofang leofang marked this pull request as ready for review December 1, 2020 04:03
@leofang
Member Author

leofang commented Dec 1, 2020

@kmaehashi @asi1024 @pentschev @jakirkham @kkraus14 @gmarkall I think this is ready. The PR description is updated. PTAL.

@leofang
Member Author

leofang commented Dec 1, 2020

Jenkins, test this please

@chainer-ci
Member

Jenkins CI test (for commit 5f12098, target branch master) succeeded!

leofang and others added 2 commits December 1, 2020 16:29
Co-authored-by: Akifumi Imanishi <akifumi.imanishi@gmail.com>
@asi1024 asi1024 added this to the v9.0.0b1 milestone Dec 2, 2020
@leofang
Member Author

leofang commented Dec 3, 2020

Jenkins, test this please

@chainer-ci
Member

Jenkins CI test (for commit fc797e9, target branch master) succeeded!

@asi1024
Member

asi1024 commented Dec 16, 2020

Jenkins, test this please.

@chainer-ci
Member

Jenkins CI test (for commit 76bfcc3, target branch master) succeeded!

@asi1024
Member

asi1024 commented Dec 16, 2020

LGTM! Thanks!

@asi1024 asi1024 merged commit 5b35bea into cupy:master Dec 16, 2020
asi1024 added a commit to asi1024/cupy that referenced this pull request Dec 16, 2020
Update CUDA Array Interface to v3 - Part 1
@leofang leofang deleted the CAI_v3 branch December 16, 2020 08:16
Labels: cat:enhancement (Improvements to existing features), to-be-backported (Pull-requests to be backported to stable branch)
6 participants