Update CUDA Array Interface to v3 - Part 1 #4357

leofang · 2020-11-27T07:58:40Z

~~This PR is blocked by #4322 because we need to know how to handle per-thread default streams (PTDS).~~

UPDATE: This PR excludes PTDS from consideration as it's currently not supported in the codebase, see #4357 (comment). In the coming "Part 2" PR it'll be properly addressed.

This PR propagates the upstream change to the CAI protocol (numba/numba#5162) to CuPy. The most notable change in v3 is the stream synchronization requirement. CAI v3 specifies that:

- The Producer should include the stream pointer (whenever applicable) on which the data pointed to by CAI can be safely operated
- The Consumer must by default synchronize the stream, if given, at the interoperating point
- Either the Producer or the Consumer can provide options for Users to override this behavior (the protocol does not specify how this option should be implemented)

For the detailed definitions (Producer, Consumer, User, etc), see the CAI v3 documentation; note the capitalized nouns follow the CAI's definitions.
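The handshake described above can be sketched without a GPU. This is only an illustration of the protocol's stream rule, not CuPy's or Numba's implementation: `FakeProducer`, `consume`, and the `synchronize` callback are hypothetical names, and a real Consumer would call into the CUDA runtime instead of a Python callback.

```python
# GPU-free sketch of the CAI v3 stream handshake.
# FakeProducer, consume, and `synchronize` are hypothetical names.

class FakeProducer:
    """Stands in for a GPU array and exposes a CAI v3 dictionary."""

    def __init__(self, ptr, shape, stream=None):
        self._ptr = ptr
        self._shape = shape
        self._stream = stream  # None, or an integer stream handle

    @property
    def __cuda_array_interface__(self):
        return {
            "shape": self._shape,
            "typestr": "<f4",
            "data": (self._ptr, False),  # (device pointer, read_only)
            "version": 3,
            "stream": self._stream,
        }


def consume(obj, synchronize, sync=True):
    """Read the CAI dict and synchronize on the exported stream if required.

    `synchronize` is any callable that accepts the stream handle.
    `sync=False` models the User opting out of the default behavior.
    """
    cai = obj.__cuda_array_interface__
    stream = cai.get("stream")  # the key is absent before v3
    if stream == 0:
        # CAI v3 disallows 0 because it is ambiguous; 1 and 2 name the
        # legacy and per-thread default streams respectively.
        raise ValueError("stream=0 is not allowed by CAI v3")
    if sync and stream is not None:
        synchronize(stream)
    return cai["data"][0], cai["shape"]
```

With `stream=None` the Consumer performs no synchronization, which is how a Producer signals that none is needed.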

Accompanying numba/numba#5162, Numba introduces a new environment variable, NUMBA_CUDA_ARRAY_INTERFACE_SYNC, to avoid syncing when acting as a Consumer. Following this approach, I introduce two environment variables that let advanced users override the sync behavior; both default to 1, which keeps CuPy compliant with CAI v3. However, because Numba and CuPy have different notions of "default stream", the effect is slightly different.

In a nutshell, they allow us to fully restore the old (status quo) behavior, as if the v3 update did not exist. Specifically:

- As a Producer, CuPy does not export any stream if CUPY_CUDA_ARRAY_INTERFACE_EXPORT_STREAM is set to 0
- As a Consumer, CuPy does not synchronize on any external stream provided through CAI if CUPY_CUDA_ARRAY_INTERFACE_SYNC is set to 0

This should make CuPy as performant as before and give Users the same full control (if so desired), while accommodating libraries (notably mpi4py) that have no access to the CUDA API and therefore cannot perform the required synchronization.
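A minimal sketch of how two such switches could gate the Producer and Consumer paths is shown below. It follows the variable names in this description as written (the commit history of this PR later switches the export variable to CUPY_CUDA_ARRAY_INTERFACE_EXPORT_VERSION). The helpers `env_flag`, `exported_stream`, and `should_sync` are hypothetical, and representing "no stream" as `None` is an assumption rather than a statement about CuPy's actual code.

```python
import os


def env_flag(name, environ=None, default=1):
    """Interpret an integer environment variable as an on/off switch."""
    environ = os.environ if environ is None else environ
    return int(environ.get(name, default)) != 0


def exported_stream(current_stream, environ=None):
    """Producer side: the value to place under the CAI 'stream' key."""
    if env_flag("CUPY_CUDA_ARRAY_INTERFACE_EXPORT_STREAM", environ):
        return current_stream
    # Assumption: None means "no synchronization needed" to the Consumer.
    return None


def should_sync(cai, environ=None):
    """Consumer side: whether to synchronize on the stream found in `cai`."""
    if not env_flag("CUPY_CUDA_ARRAY_INTERFACE_SYNC", environ):
        return False
    return cai.get("stream") is not None
```

Passing the environment as a mapping keeps the helpers testable; leaving both variables unset gives the v3-compliant default, and setting both to 0 reproduces the pre-v3 behavior described above.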

The v3 protocol also makes clear that Users are responsible for keeping GPU arrays and streams alive while they are used through CAI, so we can safely assume that any given external stream is valid.

UPDATE 2: To make review easier, note that the 3 test files touched in this PR examine different (though arguably overlapping) aspects of the CAI, according to my interpretation:

- tests/cupy_tests/core_tests/test_ndarray.py: Check CuPy's behavior as a Producer
- tests/cupy_tests/creation_tests/test_from_data.py: Check CuPy's behavior as a Consumer
- tests/cupy_tests/core_tests/test_ndarray_cuda_array_interface.py: Ensure various operations are correctly done when CAI is in play

cc: @jakirkham @pentschev

kmaehashi · 2020-11-30T05:45:11Z

We're thinking of backporting this to v8 to allow consuming CAI v3 safely in CuPy v8, but excluding PTDS support.

pentschev · 2020-11-30T11:24:55Z

> We're thinking of backporting this to v8 to allow consuming CAI v3 safely in CuPy v8, but excluding PTDS support.

Sounds reasonable to me. @leofang, I'm fine if you want to push this forward before we merge PTDS.

leofang · 2020-11-30T15:22:22Z

Sounds good. So perhaps this PR should be split into two: the first handles CAI v3 without PTDS (as @kmaehashi suggested) and will be backported (hopefully that's straightforward), and the second handles PTDS once #4322 is settled and will not be backported.

btw, note that the earliest Numba that supports CAI v3 is expected to be released early next year: numba/numba#5162 (review).

leofang · 2020-12-01T04:08:03Z

@kmaehashi @asi1024 @pentschev @jakirkham @kkraus14 @gmarkall I think this is ready. The PR description is updated. PTAL.

leofang · 2020-12-01T05:32:00Z

Jenkins, test this please

chainer-ci · 2020-12-01T06:58:08Z

Jenkins CI test (for commit 5f12098, target branch master) succeeded!

leofang · 2020-12-03T07:54:43Z

Jenkins, test this please

chainer-ci · 2020-12-03T08:50:06Z

Jenkins CI test (for commit fc797e9, target branch master) succeeded!

asi1024 · 2020-12-16T05:40:49Z

Jenkins, test this please.

chainer-ci · 2020-12-16T07:39:38Z

Jenkins CI test (for commit 76bfcc3, target branch master) succeeded!

asi1024 · 2020-12-16T08:04:40Z

LGTM! Thanks!

leofang added 4 commits November 27, 2020 02:54

update CUDA Array Interface to v3

999cc01

add doc for env var

2efc57f

expand tests

8c0fa33

more tests

551deaa

leofang mentioned this pull request Nov 29, 2020

Support for Per Thread Default Stream (PTDS) #4322

Merged

kmaehashi assigned asi1024 Nov 30, 2020

kmaehashi added the cat:enhancement and to-be-backported labels Nov 30, 2020

leofang added 6 commits November 30, 2020 15:51

fix export condition

14c301b

flake8

bff0c4c

clean up a bit

7acd753

add comment

a1de611

remove micro-optimization to be strictly compliant

f8ef740

simplify the support

65435ab

leofang commented Dec 1, 2020

cupy/core/core.pyx (outdated, resolved)

fix tests

5f12098

leofang changed the title ~~[WIP] Update CUDA Array Interface to v3~~ Update CUDA Array Interface to v3 - Part 1 Dec 1, 2020

leofang marked this pull request as ready for review December 1, 2020 04:03

kkraus14 reviewed Dec 1, 2020

cupy/core/core.pyx (outdated, resolved)

asi1024 reviewed Dec 1, 2020

cupy/core/core.pyx (resolved)

cupy/cuda/stream.pyx (outdated, resolved)

tests/cupy_tests/core_tests/test_ndarray.py (outdated, resolved)

tests/cupy_tests/core_tests/test_ndarray.py (outdated, resolved)

pentschev reviewed Dec 1, 2020

leofang and others added 2 commits December 1, 2020 16:29

better stream repr

8d2abae

Co-authored-by: Akifumi Imanishi <akifumi.imanishi@gmail.com>

create streams in setUp()

ad95878

asi1024 added this to the v9.0.0b1 milestone Dec 2, 2020

leofang added 2 commits December 2, 2020 00:00

introduce CUPY_CUDA_ARRAY_INTERFACE_EXPORT_STREAM

56ede6d

fix table layout

fc797e9

asi1024 reviewed Dec 4, 2020

cupy/core/core.pyx (outdated, resolved)

cupy/core/core.pyx (outdated, resolved)

leofang added 4 commits December 9, 2020 23:48

switch to CUPY_CUDA_ARRAY_INTERFACE_EXPORT_VERSION

2faeaeb

[WIP] fix tests

0f2e3a2

export version and update tests

5651bc6

update doc

76bfcc3

asi1024 merged commit 5b35bea into cupy:master Dec 16, 2020

asi1024 mentioned this pull request Dec 16, 2020

[backport] Update CUDA Array Interface to v3 - Part 1 #4446

Merged

asi1024 added a commit to asi1024/cupy that referenced this pull request Dec 16, 2020

Merge pull request cupy#4357 from leofang/CAI_v3

ecea32a

Update CUDA Array Interface to v3 - Part 1

leofang deleted the CAI_v3 branch December 16, 2020 08:16

leofang mentioned this pull request Dec 19, 2020

[DOC] Add a synchronization example for CUDA Array Interface v3 mpi4py/mpi4py#10

Closed

leofang mentioned this pull request Feb 10, 2021

Update CUDA Array Interface to v3 - Part 2 #4659

Merged
