New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Small fixes for CUB block reduction kernels #3520
Conversation
I guess the CI is not yet set up to test CUB kernels? @asi1024 Any chance you have resource to test this PR locally for me? I checked by running |
We will merge this PR after the CI is set up (chainer/chainer-test#582 and #3461 are required) |
optimize_impl = optimize_config.optimize_impl | ||
best = optimize_impl( | ||
optimize_config, target_func, suggest_func, | ||
default_best={ | ||
'block_size_log': default_block_size_log, | ||
'items_per_thread': default_items_per_thread, | ||
}) | ||
}, ignore_error=(driver.CUDADriverError,)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish we have a better way to filter out the particular error...CUDADriverError
seems too general, although I can't think of other possibilities for how it can be thrown without being out of resources.
I think #2584 too? |
All the blockers are merged. Let's see what happens... |
Successfully created a job for commit aecb521: |
Jenkins CI test (for commit aecb521, target branch master) failed with status FAILURE. |
Looks like the only 4 failures are due to no conversion from |
I can reproduce it now. Working on a C++ solution. But, why doesn't the old implementation have this issue? |
This comment has been minimized.
This comment has been minimized.
A few chosen tests passed locally, let's see if the CI complains... |
Successfully created a job for commit 7c3f82e: |
Jenkins CI test (for commit 7c3f82e, target branch master) failed with status FAILURE. |
Jenkins, test this please |
Successfully created a job for commit 1bf854f: |
Jenkins CI test (for commit 1bf854f, target branch master) failed with status FAILURE. |
This comment has been minimized.
This comment has been minimized.
Test again to see if slowness persists... Jenkins, test this please |
Successfully created a job for commit 1bf854f: |
Jenkins CI test (for commit 1bf854f, target branch master) failed with status FAILURE. |
This comment has been minimized.
This comment has been minimized.
Jenkins, test this please |
Successfully created a job for commit 2dc432b: |
OK so the non-compliance with the rule of three (#3612 (comment)) is indeed the issue, among with other minor bugs. It seems we don't need to do any refactoring. Let me try to clean things up, perhaps with a rebase. I will update the PR description after a few successful CI runs (just to be safe) to explain the problem. |
Jenkins CI test (for commit 2dc432b, target branch master) succeeded! |
Jenkins, test this please |
Successfully created a job for commit 2dc432b: |
Jenkins CI test (for commit 2dc432b, target branch master) succeeded! |
Jenkins, test this please |
Successfully created a job for commit 8aef545: |
Jenkins CI test (for commit 8aef545, target branch master) failed with status FAILURE. |
|
Successfully created a job for commit 8aef545: |
Jenkins CI test (for commit 8aef545, target branch master) failed with status FAILURE. |
|
Successfully created a job for commit 8aef545: |
Jenkins CI test (for commit 8aef545, target branch master) succeeded! |
1. Remove all of the type constraints 2. Add a possible exception for optimizer 3. Allow compiler exceptions to propagate upward 4. Make complex<T> (almost) obey the rule of three (to fix fp16 -> complex conversion) 5. Fix tests
8aef545
to
06bfd25
Compare
Rebased. Jenkins, test this please |
Successfully created a job for commit 06bfd25: |
Jenkins CI test (for commit 06bfd25, target branch master) succeeded! |
Jenkins, test this please |
Successfully created a job for commit 06bfd25: |
Jenkins CI test (for commit 06bfd25, target branch master) succeeded! |
@asi1024 I think this is ready. PR description updated too. PTAL. |
LGTM! |
Thanks! |
norm
kernels might also need the support for complex numbers.CUDADriverError
could be raised due to out of resource. This was first observed in Performance boost: CUB-backed_SimpleReductionKernel
#3244 (comment), and I thought by constraining the search range it'd be remedied, but today I encountered it a few more times for different tasks, so apparently this is necessary.After adding this, I see that the error is gracefully handled:
complex<T>
(almost) obey the rule of three (to fix fp16 -> complex conversion): This is basically a follow-up of Sync the headers incupy/core/include/cupy/complex/
with upstream? #2629 and Update thrust::complex headers with a bug fix #2741. It turns out that by ensuring the rule of three (except for the destructor, which is trivial), we get thefloat16
->complex<T>
conversion for free (through C++ implicit conversion fp16->fp32->complex) without additional change. I should have done this when working on Update thrust::complex headers with a bug fix #2741...😢 Note the changes are in line with the Thrust implementation.