-
Notifications
You must be signed in to change notification settings - Fork 159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleanup CUB util_type.cuh #1863
Conversation
8d0ed55
to
1612b94
Compare
cub/cub/agent/agent_histogram.cuh
Outdated
cub::detail::conditional_t<std::is_pointer<SampleIteratorT>::value, | ||
CacheModifiedInputIterator<LOAD_MODIFIER, SampleT, OffsetT>, | ||
SampleIteratorT>; | ||
::cuda::std::__conditional_t<std::is_pointer<SampleIteratorT>::value, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: There is a potential compile time improvement here.
conditional_t
requies both alternatives to be instantiated. However, we have an alternative ::cuda::std::_If
with the same interface that only instantiates the taken alternative.
In most cases this does not make a difference but here we could avoid instantiation of CacheModifiedInputIterator<LOAD_MODIFIER, SampleT, OffsetT>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason ::cuda::std::_If
does not require instantiation is because the If and Else types only appear in alias declarations and never occur in class template parameters, right?
1612b94
to
72be64b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: What's the difference between
::cuda:::std::conditional_t
::cuda::std::__conditional_t
::cuda::std::_If
All three of these are used to replace cub::detail::conditional_t
in this PR. Can we just use one of these consistently, or are there differences?
I see @miscco's comment about the lazy vs. eager eval with _If
, should we just be using this everywhere?
I would argue that |
Thanks for the explanation! I'm still fairly unfamiliar with libcu++ implementation details 😅 That makes sense and I agree with switching to |
869f4cd
to
7fc7838
Compare
Applied. |
🟨 CI finished in 10h 50m: Pass: 99%/249 | Total: 4d 21h | Avg: 28m 23s | Max: 59m 43s | Hits: 54%/246866
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
Thrust | |
CUDA Experimental |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
+/- | Thrust |
CUDA Experimental |
🏃 Runner counts (total jobs: 249)
# | Runner |
---|---|
178 | linux-amd64-cpu16 |
40 | linux-amd64-gpu-v100-latest-1 |
16 | linux-arm64-cpu16 |
15 | windows-amd64-cpu16 |
7fc7838
to
0122db7
Compare
🟨 CI finished in 2h 35m: Pass: 99%/249 | Total: 4d 19h | Avg: 27m 54s | Max: 52m 45s | Hits: 54%/248107
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
Thrust | |
CUDA Experimental |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
+/- | Thrust |
CUDA Experimental |
🏃 Runner counts (total jobs: 249)
# | Runner |
---|---|
178 | linux-amd64-cpu16 |
40 | linux-amd64-gpu-v100-latest-1 |
16 | linux-arm64-cpu16 |
15 | windows-amd64-cpu16 |
🟩 CI finished in 5h 04m: Pass: 100%/249 | Total: 4d 20h | Avg: 27m 57s | Max: 52m 45s | Hits: 54%/248963
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
Thrust | |
CUDA Experimental |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
+/- | Thrust |
CUDA Experimental |
🏃 Runner counts (total jobs: 249)
# | Runner |
---|---|
178 | linux-amd64-cpu16 |
40 | linux-amd64-gpu-v100-latest-1 |
16 | linux-arm64-cpu16 |
15 | windows-amd64-cpu16 |
0122db7
to
0436028
Compare
🟩 CI finished in 3h 20m: Pass: 100%/249 | Total: 4d 18h | Avg: 27m 31s | Max: 56m 10s | Hits: 63%/248963
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
Thrust | |
CUDA Experimental |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
+/- | Thrust |
CUDA Experimental |
🏃 Runner counts (total jobs: 249)
# | Runner |
---|---|
178 | linux-amd64-cpu16 |
40 | linux-amd64-gpu-v100-latest-1 |
16 | linux-arm64-cpu16 |
15 | windows-amd64-cpu16 |
0436028
to
9235530
Compare
🟨 CI finished in 5h 38m: Pass: 99%/249 | Total: 5d 06h | Avg: 30m 21s | Max: 1h 14m | Hits: 28%/247587
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
Thrust | |
CUDA Experimental |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
+/- | Thrust |
CUDA Experimental |
🏃 Runner counts (total jobs: 249)
# | Runner |
---|---|
178 | linux-amd64-cpu16 |
40 | linux-amd64-gpu-v100-latest-1 |
16 | linux-arm64-cpu16 |
15 | windows-amd64-cpu16 |
🟩 CI finished in 6h 38m: Pass: 100%/249 | Total: 5d 06h | Avg: 30m 23s | Max: 1h 14m | Hits: 28%/248439
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
Thrust | |
CUDA Experimental |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
+/- | Thrust |
CUDA Experimental |
🏃 Runner counts (total jobs: 249)
# | Runner |
---|---|
178 | linux-amd64-cpu16 |
40 | linux-amd64-gpu-v100-latest-1 |
16 | linux-arm64-cpu16 |
15 | windows-amd64-cpu16 |
64f2031
to
25d4efa
Compare
🟨 CI finished in 3h 12m: Pass: 99%/249 | Total: 4d 20h | Avg: 28m 03s | Max: 53m 10s | Hits: 55%/247587
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
Thrust | |
CUDA Experimental |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
+/- | Thrust |
CUDA Experimental |
🏃 Runner counts (total jobs: 249)
# | Runner |
---|---|
178 | linux-amd64-cpu16 |
40 | linux-amd64-gpu-v100-latest-1 |
16 | linux-arm64-cpu16 |
15 | windows-amd64-cpu16 |
Pull Request is not mergeable
🟩 CI finished in 4h 12m: Pass: 100%/249 | Total: 4d 20h | Avg: 28m 04s | Max: 53m 10s | Hits: 55%/248439
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
Thrust | |
CUDA Experimental |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
+/- | Thrust |
CUDA Experimental |
🏃 Runner counts (total jobs: 249)
# | Runner |
---|---|
178 | linux-amd64-cpu16 |
40 | linux-amd64-gpu-v100-latest-1 |
16 | linux-arm64-cpu16 |
15 | windows-amd64-cpu16 |
_If does not need to instantiate the type not selected.
25d4efa
to
f3af556
Compare
🟩 CI finished in 4h 00m: Pass: 100%/249 | Total: 4d 22h | Avg: 28m 38s | Max: 1h 00m | Hits: 54%/248439
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
Thrust | |
CUDA Experimental |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
+/- | Thrust |
CUDA Experimental |
🏃 Runner counts (total jobs: 249)
# | Runner |
---|---|
178 | linux-amd64-cpu16 |
40 | linux-amd64-gpu-v100-latest-1 |
16 | linux-arm64-cpu16 |
15 | windows-amd64-cpu16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the cleanup! The readability has significantly improved.
I left one comment. Feel free to merge once it's addressed.
@@ -58,12 +59,6 @@ _CCCL_DIAG_POP | |||
# endif // !_CCCL_CUDACC_BELOW_11_8 | |||
#endif // _CCCL_HAS_NV_BF16 | |||
|
|||
#if !defined(_CCCL_COMPILER_NVRTC) | |||
# include <iterator> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
important: inclusion of <iterator>
seems to have dissapeared. Let's return it for !NVRTC case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Will be addressed here: #1929 1929
Follow-up fix for PR NVIDIA#1863
This PR attempts to cleanup and improve CUB's
util_type.cuh
.