Avoid `::result_type` for partial sums in TBB reduce_by_key #1998

Conversation
🟨 CI finished in 3h 33m: Pass: 99%/250 | Total: 1d 11h | Avg: 8m 31s | Max: 42m 13s | Hits: 90%/247487

Modifications in project?

| +/- | Project |
| --- | --- |
|     | CCCL Infrastructure |
|     | libcu++ |
|     | CUB |
| +/- | Thrust |
|     | CUDA Experimental |
|     | pycuda |

Modifications in project or dependencies?

| +/- | Project |
| --- | --- |
|     | CCCL Infrastructure |
|     | libcu++ |
| +/- | CUB |
| +/- | Thrust |
|     | CUDA Experimental |
| +/- | pycuda |

🏃 Runner counts (total jobs: 250)

| # | Runner |
| --- | --- |
| 178 | linux-amd64-cpu16 |
| 41 | linux-amd64-gpu-v100-latest-1 |
| 16 | linux-arm64-cpu16 |
| 15 | windows-amd64-cpu16 |
Force-pushed from f735ab9 to 0de25a6
🟨 CI finished in 3h 52m: Pass: 99%/250 | Total: 5d 06h | Avg: 30m 17s | Max: 1h 09m | Hits: 39%/247487

Modifications in project?

| +/- | Project |
| --- | --- |
|     | CCCL Infrastructure |
|     | libcu++ |
|     | CUB |
| +/- | Thrust |
|     | CUDA Experimental |
|     | pycuda |

Modifications in project or dependencies?

| +/- | Project |
| --- | --- |
|     | CCCL Infrastructure |
|     | libcu++ |
| +/- | CUB |
| +/- | Thrust |
|     | CUDA Experimental |
| +/- | pycuda |

🏃 Runner counts (total jobs: 250)

| # | Runner |
| --- | --- |
| 178 | linux-amd64-cpu16 |
| 41 | linux-amd64-gpu-v100-latest-1 |
| 16 | linux-arm64-cpu16 |
| 15 | windows-amd64-cpu16 |
🟩 CI finished in 15h 13m: Pass: 100%/250 | Total: 5d 06h | Avg: 30m 19s | Max: 1h 09m | Hits: 40%/248341

Modifications in project?

| +/- | Project |
| --- | --- |
|     | CCCL Infrastructure |
|     | libcu++ |
|     | CUB |
| +/- | Thrust |
|     | CUDA Experimental |
|     | pycuda |

Modifications in project or dependencies?

| +/- | Project |
| --- | --- |
|     | CCCL Infrastructure |
|     | libcu++ |
| +/- | CUB |
| +/- | Thrust |
|     | CUDA Experimental |
| +/- | pycuda |

🏃 Runner counts (total jobs: 250)

| # | Runner |
| --- | --- |
| 178 | linux-amd64-cpu16 |
| 41 | linux-amd64-gpu-v100-latest-1 |
| 16 | linux-arm64-cpu16 |
| 15 | windows-amd64-cpu16 |
You've found another can of worms! 😄 We've had many discussions about these intermediate accumulator types, and IIRC we decided on determining the return type of the binary operator, as done in https://github.com/NVIDIA/cccl/blob/main/cub/cub/device/dispatch/dispatch_scan.cuh#L237-L241 and in the snippet from cccl/cub/cub/detail/type_traits.cuh, lines 65 to 67 at 92b4b0b.

If we're going to break behavior, we should use the same approach in the new version and see if there are any other algorithms that need to be fixed. It'd be worth discussing this in the team meeting before putting too much work in here, in case we'd need to delay this to a major release.
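For reference, the approach mentioned above (deriving the accumulator type from what the binary operator returns, rather than from a nested `::result_type`) can be sketched roughly as follows. This is a simplified stand-in for `cub::detail::accumulator_t`, not the exact CUB code:

```cpp
#include <type_traits>

// Simplified sketch (not the exact CUB implementation): the accumulator type is
// whatever the binary operator returns when invoked on the input type, decayed.
template <typename BinaryOp, typename T>
using accumulator_t = std::decay_t<std::invoke_result_t<BinaryOp, T, T>>;

// A transparent functor has no nested ::result_type, yet invoke_result still
// yields a usable accumulator type.
struct widening_plus
{
  long long operator()(signed char a, signed char b) const
  {
    return static_cast<long long>(a) + b;
  }
};

static_assert(std::is_same_v<accumulator_t<widening_plus, signed char>, long long>);
```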
Force-pushed from 8af06ab to 949cc83
I hope I get a prize at some point!
That's perfect, I will use this then!
I looked into …

Fair enough, let's discuss it in a team meeting.

This is also the approach taken by the standard, so this is what we should be doing.
Force-pushed from 949cc83 to e837d2b
We discussed this PR during the code review hour, and @gevtushenko suggested checking for a nested … We also discussed that we should move CUB's …
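The suggestion above is truncated in this transcript; assuming the nested member in question is the `::result_type` typedef discussed in this PR, a detection-based selection could look roughly like the sketch below. The fallback policy here is hypothetical, not necessarily what was agreed on:

```cpp
#include <functional>
#include <type_traits>

// Detection idiom: does F expose a nested ::result_type?
template <typename F, typename = void>
struct has_result_type : std::false_type
{};

template <typename F>
struct has_result_type<F, std::void_t<typename F::result_type>> : std::true_type
{};

// Hypothetical selection policy: honor an advertised ::result_type if present,
// otherwise fall back to what the operator actually returns for the input type.
template <typename F, typename T, bool = has_result_type<F>::value>
struct partial_sum
{
  using type = typename F::result_type;
};

template <typename F, typename T>
struct partial_sum<F, T, false>
{
  using type = std::decay_t<std::invoke_result_t<F, T, T>>;
};

struct legacy_plus // C++11-style adaptable functor advertising a result_type
{
  using result_type = long long;
  long long operator()(int a, int b) const
  {
    return a + static_cast<long long>(b);
  }
};

static_assert(std::is_same_v<partial_sum<legacy_plus, int>::type, long long>);
static_assert(std::is_same_v<partial_sum<std::plus<>, int>::type, int>); // transparent functor: no ::result_type
```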
Force-pushed from e3f8056 to fc71d1d
🟩 CI finished in 3h 40m: Pass: 100%/250 | Total: 4d 23h | Avg: 28m 44s | Max: 1h 00m | Hits: 61%/248341

Modifications in project?

| +/- | Project |
| --- | --- |
|     | CCCL Infrastructure |
|     | libcu++ |
|     | CUB |
| +/- | Thrust |
|     | CUDA Experimental |
|     | pycuda |

Modifications in project or dependencies?

| +/- | Project |
| --- | --- |
|     | CCCL Infrastructure |
|     | libcu++ |
| +/- | CUB |
| +/- | Thrust |
|     | CUDA Experimental |
| +/- | pycuda |

🏃 Runner counts (total jobs: 250)

| # | Runner |
| --- | --- |
| 178 | linux-amd64-cpu16 |
| 41 | linux-amd64-gpu-v100-latest-1 |
| 16 | linux-arm64-cpu16 |
| 15 | windows-amd64-cpu16 |
This allows us to get rid of `partial_sum_type`, which still uses the C++11-deprecated function object API `::result_type`.
Co-authored-by: Georgii Evtushenko <evtushenko.georgy@gmail.com>
Force-pushed from e781f99 to d59607b
🟨 CI finished in 7h 28m: Pass: 95%/250 | Total: 1d 07h | Avg: 7m 38s | Max: 28m 26s | Hits: 90%/239877

Modifications in project?

| +/- | Project |
| --- | --- |
|     | CCCL Infrastructure |
|     | libcu++ |
|     | CUB |
| +/- | Thrust |
|     | CUDA Experimental |
|     | pycuda |

Modifications in project or dependencies?

| +/- | Project |
| --- | --- |
|     | CCCL Infrastructure |
|     | libcu++ |
| +/- | CUB |
| +/- | Thrust |
|     | CUDA Experimental |
| +/- | pycuda |

🏃 Runner counts (total jobs: 250)

| # | Runner |
| --- | --- |
| 178 | linux-amd64-cpu16 |
| 41 | linux-amd64-gpu-v100-latest-1 |
| 16 | linux-arm64-cpu16 |
| 15 | windows-amd64-cpu16 |
Avoid `::result_type` for partial sums in TBB reduce_by_key
🟩 CI finished in 19h 19m: Pass: 100%/250 | Total: 1d 10h | Avg: 8m 11s | Max: 28m 26s | Hits: 90%/250036

Modifications in project?

| +/- | Project |
| --- | --- |
|     | CCCL Infrastructure |
|     | libcu++ |
|     | CUB |
| +/- | Thrust |
|     | CUDA Experimental |
|     | pycuda |

Modifications in project or dependencies?

| +/- | Project |
| --- | --- |
|     | CCCL Infrastructure |
|     | libcu++ |
| +/- | CUB |
| +/- | Thrust |
|     | CUDA Experimental |
| +/- | pycuda |

🏃 Runner counts (total jobs: 250)

| # | Runner |
| --- | --- |
| 178 | linux-amd64-cpu16 |
| 41 | linux-amd64-gpu-v100-latest-1 |
| 16 | linux-arm64-cpu16 |
| 15 | windows-amd64-cpu16 |
This allows us to get rid of `partial_sum_type` in the TBB backend for `reduce_by_key` with no initial value, which still uses the C++11-deprecated function object API `::result_type`.

This is potentially a breaking change: users could, e.g., reduce a large sequence of `int8` using a binary function advertising a `::result_type` of `int64`, which would now accumulate in an `int8` variable instead of an `int64`. However, this seems to be the current behavior of the CUDA backend anyway, which always takes the iterator's `value_type` instead of querying the return type of the binary reduction function.
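To make the potential break concrete, here is a small, self-contained illustration in plain C++ (not the Thrust TBB backend itself, and the functor is made up for this example): the functor advertises a wide `::result_type`, but the type of the accumulator variable decides whether the partial sums overflow.

```cpp
#include <cstdint>
#include <cstdio>
#include <numeric>
#include <vector>

// Hypothetical binary functor that advertises a wide ::result_type.
struct plus_wide
{
  using result_type = std::int64_t; // legacy C++11-style advertisement
  std::int64_t operator()(std::int64_t acc, std::int8_t x) const
  {
    return acc + x;
  }
};

int main()
{
  std::vector<std::int8_t> v(1000, std::int8_t{100});

  // Accumulating in the advertised wide type: 100000, no overflow.
  std::int64_t wide = std::accumulate(v.begin(), v.end(), std::int64_t{0}, plus_wide{});

  // Accumulating in the iterator's value_type: each step truncates to int8,
  // so the sum wraps around.
  std::int8_t narrow = 0;
  for (std::int8_t x : v)
  {
    narrow = static_cast<std::int8_t>(narrow + x);
  }

  std::printf("wide accumulator:   %lld\n", static_cast<long long>(wide));
  std::printf("narrow accumulator: %d\n", static_cast<int>(narrow));
}
```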