[CUB] Replace Shuffle(Up|Down|Index) with cuda::device::warp_shuffle #8159
fbusato wants to merge 7 commits into NVIDIA:main from
😬 CI Workflow Results: 🟥 Finished in 1h 52m. Pass: 97%/298 | Total: 10d 08h | Max: 1h 50m | Hits: 81%/373048
That's a very impactful change. I see it affects CUB's radix sort, reduce, and scan. I'm afraid we need to see benchmarks for all of those algorithms on at least one architecture. Q: Are the new …
Unfortunately, a lower instruction count does not always translate into a runtime improvement. For a change to our algorithms, there needs to be either no change in the SASS or a benchmark that shows the impact. I know it's annoying, but letting a regression slip in would be even worse. We could try the new benchmark CI for this ;)
Description
Address the aliasing issue in:
- SHUFFLE_{UP, DOWN, INDEX} #8117

Additionally, deprecate the legacy shuffle functions in favor of the new CUDA device APIs.
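The sketch below illustrates what an aliasing-safe shuffle of a multi-word type can look like. It assumes that the aliasing issue in #8117 refers to legacy helpers reading a `T` through an `unsigned int*` obtained with `reinterpret_cast`. That assumption is not confirmed by this page. To stay runnable without a GPU, the code models a 32-lane warp on the host: `shfl_down_word` mimics `__shfl_down_sync` with a full mask and width 32, and `shuffle_down` is a hypothetical helper. It moves bytes with `std::memcpy` and issues one shuffle per 32-bit word. It is not the CUB or `cuda::device::warp_shuffle` implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Host-side model of a 32-lane warp: element i belongs to lane i.
constexpr int kWarpSize = 32;

// Hypothetical model of __shfl_down_sync on one 32-bit word (full mask, width 32):
// lane i reads the word of lane i + delta; if that source is out of range,
// the lane keeps its own word, matching the documented hardware behavior.
inline std::array<std::uint32_t, kWarpSize>
shfl_down_word(const std::array<std::uint32_t, kWarpSize>& words, int delta) {
  std::array<std::uint32_t, kWarpSize> out{};
  for (int lane = 0; lane < kWarpSize; ++lane) {
    int src = lane + delta;
    out[lane] = (src < kWarpSize) ? words[src] : words[lane];
  }
  return out;
}

// Hypothetical helper: shuffles any trivially copyable T word by word.
// Bytes travel through std::memcpy instead of reinterpret_cast<unsigned int*>(&v),
// so no object is read through an lvalue of an unrelated type.
template <typename T>
std::array<T, kWarpSize> shuffle_down(const std::array<T, kWarpSize>& values, int delta) {
  static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
  constexpr std::size_t kWords =
      (sizeof(T) + sizeof(std::uint32_t) - 1) / sizeof(std::uint32_t);

  // Per-lane word buffers, zero-initialized so any tail bytes are defined.
  std::array<std::array<std::uint32_t, kWords>, kWarpSize> buf{};
  for (int lane = 0; lane < kWarpSize; ++lane)
    std::memcpy(buf[lane].data(), &values[lane], sizeof(T));

  // One shuffle per 32-bit word, as device code would issue one SHFL per word.
  for (std::size_t w = 0; w < kWords; ++w) {
    std::array<std::uint32_t, kWarpSize> column{};
    for (int lane = 0; lane < kWarpSize; ++lane) column[lane] = buf[lane][w];
    column = shfl_down_word(column, delta);
    for (int lane = 0; lane < kWarpSize; ++lane) buf[lane][w] = column[lane];
  }

  std::array<T, kWarpSize> out{};
  for (int lane = 0; lane < kWarpSize; ++lane)
    std::memcpy(&out[lane], buf[lane].data(), sizeof(T));
  return out;
}
```

For example, shuffling a `struct { double a; int b; }` down by 4 lets lane 0 receive lane 4's value, while lanes 28 to 31 keep their own values. In real device code, the inner per-lane loops disappear: each thread copies its `T` into a local word buffer, calls the shuffle intrinsic once per word, and copies the result back.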
Notes:
- `SHFL` instruction counts are preserved, while the reduce benchmark kernels show significantly fewer total instructions (up to 20% fewer).
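A rough illustration of why the `SHFL` count can stay fixed while other instructions change follows. In a shuffle-based warp reduction, the number of shuffles depends only on the number of reduction steps and the number of 32-bit words per value, not on the API that wraps the intrinsic. The host-side model below is a hypothetical sketch, not CUB's reduction code. It sums one `int32_t` per lane using shuffle-down offsets 16, 8, 4, 2, 1 and counts one shuffle per step, giving log2(32) = 5 per lane.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kLanes = 32;

// Result of the hypothetical model: lane 0 holds the full sum after the tree.
struct ReduceResult {
  std::int32_t lane0_sum;  // only lane 0 is guaranteed to hold the complete sum
  int shfl_per_lane;       // shuffles each lane would issue for a 32-bit value
};

// Host model of a warp-wide sum built from shuffle-down steps.
inline ReduceResult warp_sum(std::array<std::int32_t, kLanes> v) {
  int shfl = 0;
  for (int offset = kLanes / 2; offset > 0; offset /= 2) {
    std::array<std::int32_t, kLanes> shifted{};
    for (int lane = 0; lane < kLanes; ++lane) {
      int src = lane + offset;
      // __shfl_down_sync semantics: out-of-range sources return the lane's own value.
      shifted[lane] = (src < kLanes) ? v[src] : v[lane];
    }
    ++shfl;  // every lane executes exactly one shuffle per step
    for (int lane = 0; lane < kLanes; ++lane) v[lane] += shifted[lane];
  }
  return {v[0], shfl};
}
```

For inputs 1 through 32, lane 0 ends with 528 after 5 shuffles. A 64-bit value would need two words per step and therefore 10 shuffles. Any instruction savings would then come from the surrounding code, such as address arithmetic or moves, rather than from fewer `SHFL`s.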