Implement the new tuning API for DeviceRadixSort #6767
bernhardmgruber merged 7 commits into NVIDIA:main
miscco
left a comment
I love how much this cleans everything up
There are SASS differences for
Fixed
// SPDX-FileCopyrightText: Copyright (c) 2011-2023, NVIDIA CORPORATION. All rights reserved.
// SPDX-License-Identifier: BSD-3
I believe this is the wrong license?
I pulled most of the code out of the benchmark files keys.cu and pairs.cu so I retained the license.
__launch_bounds__(int(ALT_DIGIT_BITS ? PolicySelector{}(::cuda::arch_id{CUB_PTX_ARCH / 10}).alt_upsweep.block_threads
                                     : PolicySelector{}(::cuda::arch_id{CUB_PTX_ARCH / 10}).upsweep.block_threads))
I am strongly wondering why this is not part of the PolicySelector class? Is there any reason we have to pass this individually?
Because the kernel is instantiated twice, where ALT_DIGIT_BITS is once true and once false. The logic here picks the corresponding tuning based on which kernel instantiation we have.
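For readers following along, the selection described in this reply can be sketched in plain host C++. This is only an illustration under stated assumptions: `sub_policy`, `radix_sort_policy`, `policy_selector`, `upsweep_block_threads`, the `int` architecture argument, and the thread counts 256 and 128 are hypothetical stand-ins, not the types or tuning values used in CUB.

```cpp
// Hypothetical stand-ins for the policy types; names and values are
// illustrative only and do not reflect CUB's actual tunings.
struct sub_policy
{
  int block_threads;
};

struct radix_sort_policy
{
  sub_policy upsweep;
  sub_policy alt_upsweep;
};

// A selector maps an architecture to a full set of tunings. The real code
// takes ::cuda::arch_id; a plain int is assumed here to keep the sketch
// free of CUDA headers.
struct policy_selector
{
  constexpr radix_sort_policy operator()(int /*arch*/) const
  {
    return {{256}, {128}};
  }
};

// The kernel is instantiated once with AltDigitBits == false and once with
// AltDigitBits == true; each instantiation picks its matching sub-policy
// at compile time.
template <typename PolicySelector, bool AltDigitBits>
constexpr int upsweep_block_threads(int arch)
{
  return AltDigitBits ? PolicySelector{}(arch).alt_upsweep.block_threads
                      : PolicySelector{}(arch).upsweep.block_threads;
}

static_assert(upsweep_block_threads<policy_selector, false>(90) == 256, "primary upsweep tuning");
static_assert(upsweep_block_threads<policy_selector, true>(90) == 128, "alternate upsweep tuning");
```

Because the boolean is a template parameter, the choice between the two sub-policies has to be made at the kernel, where the instantiation is known; the selector itself only describes both tunings.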
Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
🥳 CI Workflow Results
🟩 Finished in 1d 04h: Pass: 100%/93 | Total: 5d 04h | Max: 4h 54m | Hits: 46%/91677
PR is not fully ready yet, but ready to take a first round of review.
- cub.test.device.radix_sort_keys.lid_0.key_bits_16 passes
- if constexpr on the onesweep algorithm in the dispatcher
- policy_selector_from_hub_policy
- cub::DeviceRadixSort #7282
- cub.bench.radix_sort.keys.base for SMs 70;80;90;100
- cub.bench.radix_sort.pairs.base for SMs 70;80;90;100

I cannot SASS check benchmarks for < SM70, because nvbench_helper.cu does not compile there with:
But it's somewhat OK, since we don't officially support < SM75 anyway.
Fixes: #6676