Fast IndexIterator for ChainerX CUDA #8360
b4ac302
to
3fb14c8
Compare
PTAL |
Jenkins, test this please. |
3fb14c8
to
4588b6e
Compare
One thing I don't like is that we have GPU-specific code in code that should be generic. Probably overriding it with a specific |
Jenkins CI test (for commit 3fb14c8, target branch master) succeeded! |
Regarding the test:
import chainerx

# This is more than 2**31 elements
a = chainerx.zeros(shape=(64, 32, 6*1024*1024*128//(16*16)), dtype=chainerx.int8, device='cuda:0')
a = a.swapaxes(2, 0)
a += 1
assert not a.is_contiguous
assert a.sum() == a.shape[0]*a.shape[1]*a.shape[2]
Should the test be skipped if the allocation fails because the GPU does not have enough memory? |
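One way the skip could be handled is sketched below. This is only an illustration: the helper name `allocate_or_skip` is hypothetical, and the exception type that ChainerX raises when a CUDA allocation fails is not stated in this thread, so it is passed in as a parameter (defaulting to Python's built-in `MemoryError`).

```python
import unittest


def allocate_or_skip(allocate, oom_errors=(MemoryError,)):
    """Run `allocate()` and turn an out-of-memory failure into a skipped test.

    `allocate` is any zero-argument callable that creates the large array.
    `oom_errors` lists the exception types treated as "not enough memory";
    which type ChainerX actually raises is an assumption the caller must fill in.
    """
    try:
        return allocate()
    except oom_errors as exc:
        # unittest.SkipTest marks the running test as skipped, not failed.
        raise unittest.SkipTest(f"not enough memory for this test: {exc}")


if __name__ == "__main__":
    # Small stand-in allocation; a real test would create the ChainerX array here.
    buf = allocate_or_skip(lambda: bytearray(1024))
    print(len(buf))
```

In the test above, the `chainerx.zeros(...)` call would be wrapped in the callable, so that only the allocation, and not a wrong result, can lead to a skip.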
This PR depends on #8389. |
Now #8389 is merged. Could you add tests? |
Sure, I have them already, let me just rebase and push |
c78c8fb
to
cce2c8b
Compare
PTAL |
Jenkins, test this please. |
Jenkins CI test (for commit d7a1168, target branch master) succeeded! |
flexCI, test this please. |
Jenkins CI test (for commit 627df80, target branch master) succeeded! |
flexCI, test this please. |
Jenkins CI test (for commit 9dcf904, target branch master) succeeded! |
flexCI, test this please. |
Jenkins CI test (for commit bbcbbdc, target branch master) succeeded! |
flexCI, test this please. |
Jenkins CI test (for commit 75c9fb8, target branch master) succeeded! |
Jenkins, test this please. |
Jenkins CI test (for commit e26af2a, target branch master) succeeded! |
Jenkins, test this please |
f2fc19e
to
941d5e5
Compare
Jenkins, test this please |
Jenkins CI test (for commit 941d5e5, target branch master) succeeded! |
Travis failure seems unrelated to this PR. (ref. #8481) |
LGTM! |
Thanks to @asi1024 @shinh
The ChainerX indexer used fairly expensive int64 division and modulo operations when calculating array indices on CUDA.
This was noticeable when arrays were not contiguous, severely affecting the execution time of even simple kernels such as ElementWise ones.
This PR replaces the index-calculation code with the same approach that CuPy uses.
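To make the cost difference concrete, here is a rough Python sketch of the two ways a linear element number can be turned into a multi-dimensional index. It is not the code from this PR or from CuPy. The function names are hypothetical, and the "fast" variant only reflects my understanding of the general technique: derive the remainder from the quotient with a multiply and subtract, and use a mask and shift when an extent is a power of two. On the GPU the benefit also comes from doing this in 32-bit arithmetic when the array is small enough, which Python's unbounded integers cannot show.

```python
def unravel_divmod(i, shape):
    """Baseline: one division and one modulo per dimension."""
    index = [0] * len(shape)
    for dim in range(len(shape) - 1, 0, -1):
        i, index[dim] = divmod(i, shape[dim])
    index[0] = i
    return index


def unravel_fast(i, shape):
    """Same result with cheaper operations (assumes every extent is >= 1)."""
    index = [0] * len(shape)
    for dim in range(len(shape) - 1, 0, -1):
        s = shape[dim]
        if s & (s - 1):
            # Not a power of two: one division, remainder via multiply-subtract.
            t = i // s
            index[dim] = i - t * s
            i = t
        else:
            # Power of two: the remainder is a mask, the quotient is a shift.
            index[dim] = i & (s - 1)
            i >>= (s - 1).bit_length()  # equals log2(s) for powers of two
    index[0] = i
    return index


def element_offset(index, strides):
    """Offset of an element in a possibly non-contiguous array."""
    return sum(k * st for k, st in zip(index, strides))


if __name__ == "__main__":
    print(unravel_fast(37, (3, 4, 5)))  # [1, 3, 2]
```

For a contiguous array the linear element number can be used as the offset directly, which is why this per-dimension work matters mainly for non-contiguous arrays such as the result of `swapaxes`.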
In the following test, the time for ChainerX is reduced from 0.70 secs to 0.27