
Small fixes for CUB block reduction kernels #3520

Merged
merged 1 commit on Jul 22, 2020


leofang
Member

@leofang leofang commented Jul 1, 2020

  1. Remove all of the type constraints. I originally added them because of errors I hit when running the full test suite, but I can no longer reproduce those errors on the latest master. @grlee77's new norm kernels may also need support for complex numbers.
  2. Let the optimizer ignore a possible exception: during the Optuna optimization, a CUDADriverError can be raised when a trial runs out of resources. This was first observed in Performance boost: CUB-backed _SimpleReductionKernel #3244 (comment). I thought constraining the search range would remedy it, but today I hit it several more times on different tasks, so handling the error is apparently necessary.
     After adding this, the error is handled gracefully and the failing trial is simply scored as inf:
     [I 2020-06-30 22:29:07,612] Finished trial#1 with value: inf with parameters: {'block_size_log': 9, 'items_per_thread': 28}. Best is trial#0 with value: 0.0029116286219972552.
  3. Allow compiler exceptions to propagate upward.
  4. (UPDATE) Make complex<T> (almost) obey the rule of three (to fix the fp16 -> complex conversion). This is basically a follow-up of Sync the headers in cupy/core/include/cupy/complex/ with upstream? #2629 and Update thrust::complex headers with a bug fix #2741. It turns out that by satisfying the rule of three (except for the destructor, which is trivial), we get the float16 -> complex<T> conversion for free, through the C++ implicit conversion chain fp16 -> fp32 -> complex, without any additional change. I should have done this when working on Update thrust::complex headers with a bug fix #2741...😢 Note that the changes are in line with the Thrust implementation.
  5. (UPDATE) Fix tests.
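The log line in the description shows the intended behavior: a trial that fails with an ignorable error is scored as inf and can never become the best trial, while the search keeps going. The sketch below reproduces that behavior with a plain loop instead of Optuna. `run_trials`, `FakeDriverError` and `fake_target` are hypothetical names introduced for illustration; this is not the CuPy optimizer code, only a minimal model of an `ignore_error` tuple of exception types.

```python
import math
from typing import Callable, Dict, Iterable, Optional, Tuple, Type

Params = Dict[str, int]


def run_trials(
    target_func: Callable[[Params], float],
    candidates: Iterable[Params],
    ignore_error: Tuple[Type[BaseException], ...] = (),
) -> Tuple[Optional[Params], float]:
    """Return the candidate with the lowest measured time and that time.

    Exceptions whose types are listed in ``ignore_error`` mark the trial as
    failed (scored ``inf``) instead of aborting the whole search; any other
    exception propagates to the caller. An empty tuple catches nothing.
    """
    best_params: Optional[Params] = None
    best_value = math.inf
    for i, params in enumerate(candidates):
        try:
            value = target_func(params)
        except ignore_error:
            value = math.inf  # failed trial, e.g. the launch ran out of resources
        print(f"Finished trial#{i} with value: {value} with parameters: {params}")
        if value < best_value:
            best_params, best_value = params, value
    return best_params, best_value


class FakeDriverError(RuntimeError):
    """Hypothetical stand-in for cupy.cuda.driver.CUDADriverError."""


def fake_target(params: Params) -> float:
    """Hypothetical timing function in which one configuration fails."""
    if params["block_size_log"] == 9 and params["items_per_thread"] > 16:
        raise FakeDriverError("out of resources")
    return 0.001 * params["items_per_thread"] / params["block_size_log"]


candidates = [
    {"block_size_log": 8, "items_per_thread": 4},
    {"block_size_log": 9, "items_per_thread": 28},
]
best, value = run_trials(fake_target, candidates, ignore_error=(FakeDriverError,))
```

Calling `run_trials(fake_target, candidates)` without `ignore_error` lets `FakeDriverError` escape from the second trial, which mirrors the pre-PR behavior of aborting the whole tuning run.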

@kmaehashi kmaehashi added this to the v8.0.0rc1 milestone Jul 1, 2020
@leofang leofang changed the title [WIP] Small fixes for CUB block reduction kernels Small fixes for CUB block reduction kernels Jul 1, 2020
@leofang leofang marked this pull request as ready for review July 1, 2020 05:24
@leofang
Member Author

leofang commented Jul 1, 2020

I guess the CI is not yet set up to test CUB kernels? @asi1024, any chance you have the resources to test this PR locally for me? I ran pytest tests/cupy_tests/ -x and hit no errors. I don't remember which failing tests motivated me to add the type checks...😅

@asi1024 asi1024 added the cat:enhancement Improvements to existing features label Jul 1, 2020
@asi1024
Member

asi1024 commented Jul 1, 2020

We will merge this PR after the CI is set up (chainer/chainer-test#582 and #3461 are required)

optimize_impl = optimize_config.optimize_impl
best = optimize_impl(
    optimize_config, target_func, suggest_func,
    default_best={
        'block_size_log': default_block_size_log,
        'items_per_thread': default_items_per_thread,
    }, ignore_error=(driver.CUDADriverError,))
Member Author

@leofang leofang Jul 1, 2020


I wish we had a better way to filter for this particular error. CUDADriverError seems too general, although I can't think of any other way it could be raised here besides running out of resources.
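One possible way to narrow the filter is to catch the driver error but only swallow it when its status code means "launch out of resources", re-raising everything else. The sketch below assumes the exception exposes a numeric status (modeled here as a `status` attribute on a hypothetical `FakeDriverError`) and assumes the driver API value 701 for `CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES`; both should be checked against the real `CUDADriverError` and `cuda.h` before relying on them. `time_or_inf`, `launch_oor` and `launch_other` are illustrative names only.

```python
import math

# Assumed value of CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES in the CUDA driver API
# (verify against cuda.h for the toolkit in use).
CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES = 701


class FakeDriverError(Exception):
    """Hypothetical driver error carrying a numeric ``status`` code."""

    def __init__(self, status: int, msg: str = "") -> None:
        super().__init__(msg)
        self.status = status


def time_or_inf(func, *args) -> float:
    """Run ``func``; score only out-of-resources launches as ``inf``.

    Any other driver error is re-raised, so genuine bugs are not silently
    swallowed by the tuner.
    """
    try:
        return func(*args)
    except FakeDriverError as e:
        if e.status == CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES:
            return math.inf
        raise


def launch_oor():
    """Hypothetical kernel launch that requests too many resources."""
    raise FakeDriverError(CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES, "too many resources requested")


def launch_other():
    """Hypothetical kernel launch failing for an unrelated reason."""
    raise FakeDriverError(1, "some other driver failure")  # hypothetical status
```

The trade-off is that the tuner now depends on a specific error code, so an out-of-resources condition reported under a different code would propagate instead of being scored as inf.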

@leofang
Member Author

leofang commented Jul 2, 2020

> We will merge this PR after the CI is set up (chainer/chainer-test#582 and #3461 are required)

I think #2584 too?

@asi1024 asi1024 added the st:blocked-by-another-pr Blocked by another pull-request label Jul 6, 2020
@leofang leofang marked this pull request as draft July 8, 2020 00:17
@leofang leofang marked this pull request as ready for review July 9, 2020 17:06
@leofang
Member Author

leofang commented Jul 9, 2020

All the blockers are merged. Let's see what happens...
Jenkins, test this please

@pfn-ci-bot
Collaborator

Successfully created a job for commit aecb521:

@chainer-ci
Member

Jenkins CI test (for commit aecb521, target branch master) failed with status FAILURE.

@leofang
Copy link
Member Author

leofang commented Jul 9, 2020

Looks like the only 4 failures are due to "no conversion from float16 to complex<float> exists" (which would make sense), but I can't reproduce them locally... @asi1024, any thoughts?

@leofang
Member Author

leofang commented Jul 9, 2020

> Looks like the only 4 failures are due to "no conversion from float16 to complex<float> exists" (which would make sense)

I can reproduce it now and am working on a C++ fix. But why doesn't the old implementation have this issue?
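The complex<T> fix described in the PR relies on the implicit chain fp16 -> fp32 -> complex<float>. Going through fp32 is only safe if no float16 value changes when widened, which holds because float16's 5-bit exponent and 10-bit fraction fit inside float32's 8-bit exponent and 23-bit fraction. The sketch below checks this exhaustively with Python's `struct` half-precision format `"e"`, modeling complex<float> as a packed (real, imag) pair of float32 values (an illustrative assumption about layout). It only checks the numeric premise; it says nothing about the C++ overload resolution that the rule-of-three change enables. `check_fp16_to_complex64` is a hypothetical helper name.

```python
import math
import struct
from typing import Tuple


def check_fp16_to_complex64() -> Tuple[int, int]:
    """Widen every non-NaN float16 bit pattern to float32, then to a
    (real, imag) pair of float32 values, and count value mismatches.

    Returns (patterns_checked, mismatches).
    """
    checked = mismatches = 0
    for bits in range(1 << 16):
        (h,) = struct.unpack("<e", bits.to_bytes(2, "little"))  # fp16 value
        if math.isnan(h):
            continue  # NaN never compares equal, so skip those patterns
        (f,) = struct.unpack("<f", struct.pack("<f", h))  # fp16 -> fp32
        # fp32 -> complex<float>, modeled as two packed float32 values
        re, im = struct.unpack("<ff", struct.pack("<ff", f, 0.0))
        checked += 1
        if f != h or re != h or im != 0.0:
            mismatches += 1
    return checked, mismatches


checked, mismatches = check_fp16_to_complex64()
print(checked, mismatches)
```

All 2046 NaN encodings are skipped, so 63490 patterns (including both zeros and both infinities) are compared, and none should change value.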


@leofang
Member Author

leofang commented Jul 10, 2020

A few selected tests passed locally; let's see if the CI complains...
Jenkins, test this please

@pfn-ci-bot
Copy link
Collaborator

Successfully created a job for commit 7c3f82e:

@chainer-ci
Copy link
Member

Jenkins CI test (for commit 7c3f82e, target branch master) failed with status FAILURE.

@leofang
Copy link
Member Author

leofang commented Jul 10, 2020

Jenkins, test this please

@pfn-ci-bot
Copy link
Collaborator

Successfully created a job for commit 1bf854f:

@chainer-ci
Copy link
Member

Jenkins CI test (for commit 1bf854f, target branch master) failed with status FAILURE.


@leofang
Copy link
Member Author

leofang commented Jul 11, 2020

Testing again to see if the slowness persists...

Jenkins, test this please

@pfn-ci-bot
Copy link
Collaborator

Successfully created a job for commit 1bf854f:

@chainer-ci
Copy link
Member

Jenkins CI test (for commit 1bf854f, target branch master) failed with status FAILURE.


@leofang
Copy link
Member Author

leofang commented Jul 17, 2020

Jenkins, test this please

@pfn-ci-bot
Copy link
Collaborator

Successfully created a job for commit 2dc432b:

@leofang
Copy link
Member Author

leofang commented Jul 17, 2020

OK, so the non-compliance with the rule of three (#3612 (comment)) is indeed the issue, along with some other minor bugs. It seems we don't need any refactoring after all. Let me clean things up, perhaps with a rebase. To be safe, I will update the PR description to explain the problem after a few successful CI runs.

@chainer-ci
Copy link
Member

Jenkins CI test (for commit 2dc432b, target branch master) succeeded!

@leofang
Copy link
Member Author

leofang commented Jul 17, 2020

Jenkins, test this please

@pfn-ci-bot
Copy link
Collaborator

Successfully created a job for commit 2dc432b:

@chainer-ci
Copy link
Member

Jenkins CI test (for commit 2dc432b, target branch master) succeeded!

@leofang
Copy link
Member Author

leofang commented Jul 17, 2020

Jenkins, test this please

@pfn-ci-bot
Copy link
Collaborator

Successfully created a job for commit 8aef545:

@chainer-ci
Copy link
Member

Jenkins CI test (for commit 8aef545, target branch master) failed with status FAILURE.

@leofang
Copy link
Member Author

leofang commented Jul 17, 2020

cupy-py35 timed out, but the rest passed.
Jenkins, test this please

@pfn-ci-bot
Collaborator

Successfully created a job for commit 8aef545:

@chainer-ci
Member

Jenkins CI test (for commit 8aef545, target branch master) failed with status FAILURE.

@leofang
Copy link
Member Author

leofang commented Jul 17, 2020

cupy-py3 hit a server connection issue (again), but none of the tests failed! I'll run the CI one more time, then rebase and squash to clean up.
Jenkins, test this please

@pfn-ci-bot
Copy link
Collaborator

Successfully created a job for commit 8aef545:

@chainer-ci
Copy link
Member

Jenkins CI test (for commit 8aef545, target branch master) succeeded!

1. Remove all of the type constraints
2. Add a possible exception for optimizer
3. Allow compiler exceptions to propagate upward
4. Make complex<T> (almost) obey the rule of three (to fix fp16 -> complex conversion)
5. Fix tests
@leofang
Copy link
Member Author

leofang commented Jul 17, 2020

Rebased.

Jenkins, test this please

@pfn-ci-bot
Collaborator

Successfully created a job for commit 06bfd25:

@chainer-ci
Member

Jenkins CI test (for commit 06bfd25, target branch master) succeeded!

@leofang
Member Author

leofang commented Jul 17, 2020

Jenkins, test this please

@pfn-ci-bot
Collaborator

Successfully created a job for commit 06bfd25:

@chainer-ci
Member

Jenkins CI test (for commit 06bfd25, target branch master) succeeded!

@leofang
Copy link
Member Author

leofang commented Jul 17, 2020

@asi1024 I think this is ready. PR description updated too. PTAL.

@asi1024 asi1024 removed the st:blocked-by-another-pr Blocked by another pull-request label Jul 22, 2020
@asi1024
Member

asi1024 commented Jul 22, 2020

LGTM!

@asi1024 asi1024 merged commit e028d74 into cupy:master Jul 22, 2020
@leofang leofang deleted the cub_block_c_in_f_out branch July 22, 2020 05:24
@leofang
Member Author

leofang commented Jul 22, 2020

Thanks!

Labels
cat:enhancement Improvements to existing features

5 participants