Optimize DeviceSegmentedRadixSort #879

Open
gevtushenko opened this issue Oct 5, 2021 · 0 comments
Labels
cub For all items related to CUB

Comments

@gevtushenko
Collaborator

During the development of the new segmented sort, I extracted an AgentSegmentedRadixSort class. It's mostly based on the existing DeviceSegmentedRadixSort implementation. The only differences are:

  1. The while (current_bit < end_bit) loop is moved from the host side to the device side.
  2. If the segment data fits into shared memory, BlockRadixSort is used.
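To make the two differences concrete, here is a minimal CPU-only sketch of the control flow, not the actual CUB code. All names (`segmented_sort`, `sort_segment`, `radix_pass`, `extract_bits`, `kSmallSegmentItems`, `kRadixBits`) are hypothetical, the 4096-item threshold is an arbitrary stand-in for "fits into shared memory", and `std::stable_sort` merely stands in for BlockRadixSort. It assumes 32-bit unsigned keys and 0 <= begin_bit < end_bit <= 32.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for "the segment fits into shared memory".
constexpr std::size_t kSmallSegmentItems = 4096;
// Hypothetical number of key bits consumed per radix pass.
constexpr int kRadixBits = 8;

// Extracts key bits [begin_bit, end_bit); assumes 0 <= begin_bit < end_bit <= 32.
std::uint32_t extract_bits(std::uint32_t key, int begin_bit, int end_bit) {
  const int num_bits = end_bit - begin_bit;
  const std::uint32_t shifted = key >> begin_bit;
  return num_bits >= 32 ? shifted : (shifted & ((1u << num_bits) - 1u));
}

// One stable counting-sort pass over the digit starting at current_bit.
void radix_pass(const std::uint32_t* in, std::uint32_t* out, std::size_t n,
                int current_bit, int pass_bits) {
  const std::uint32_t mask = (1u << pass_bits) - 1u;
  std::vector<std::size_t> offsets((std::size_t{1} << pass_bits) + 1, 0);
  for (std::size_t i = 0; i < n; ++i)
    ++offsets[((in[i] >> current_bit) & mask) + 1];
  for (std::size_t d = 1; d < offsets.size(); ++d)
    offsets[d] += offsets[d - 1];
  for (std::size_t i = 0; i < n; ++i)
    out[offsets[(in[i] >> current_bit) & mask]++] = in[i];
}

// Stand-in for the work done per segment by one agent.
void sort_segment(std::uint32_t* keys, std::size_t n, int begin_bit, int end_bit) {
  if (n <= kSmallSegmentItems) {
    // Difference 2: small segments take a single in-place sort
    // (BlockRadixSort in shared memory in the real code).
    std::stable_sort(keys, keys + n, [=](std::uint32_t a, std::uint32_t b) {
      return extract_bits(a, begin_bit, end_bit) < extract_bits(b, begin_bit, end_bit);
    });
    return;
  }
  // Difference 1: the pass loop lives inside the per-segment routine
  // instead of being driven by one host-side launch per pass.
  std::vector<std::uint32_t> scratch(n);
  std::uint32_t* src = keys;
  std::uint32_t* dst = scratch.data();
  int current_bit = begin_bit;
  while (current_bit < end_bit) {
    const int pass_bits = std::min(kRadixBits, end_bit - current_bit);
    radix_pass(src, dst, n, current_bit, pass_bits);
    std::swap(src, dst);
    current_bit += pass_bits;
  }
  if (src != keys) std::copy(src, src + n, keys);
}

// Segment s covers keys[offsets[s]] up to, but not including, keys[offsets[s + 1]].
void segmented_sort(std::vector<std::uint32_t>& keys,
                    const std::vector<std::size_t>& offsets,
                    int begin_bit = 0, int end_bit = 32) {
  for (std::size_t s = 0; s + 1 < offsets.size(); ++s)
    sort_segment(keys.data() + offsets[s], offsets[s + 1] - offsets[s],
                 begin_bit, end_bit);
}
```

On a GPU each segment would be handled by its own thread block rather than by a sequential loop, so the sketch only shows where the loop and the small-segment branch sit, not the performance behaviour.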

The combination of these changes gives about a 6x speedup on an RTX 3090 and up to 7x on an RTX 2080 for segments with up to 5k elements. Unfortunately, large segments are affected as well. Since the new code requires a different number of registers, the speedup or slowdown is unpredictable. For some input data types and segment sizes, I got about a 14% improvement. In a few cases, I noticed a 40% slowdown. Although the median speedup was around 0.996x (essentially neutral), more research is required.

Once the slowdowns in large-segment sorting are addressed, we should use AgentSegmentedRadixSort as the DeviceSegmentedRadixSort implementation.

@jrhemstad jrhemstad added the cub For all items related to CUB label Feb 22, 2023
@alliepiper alliepiper removed their assignment Feb 23, 2023
@jarmak-nv jarmak-nv transferred this issue from NVIDIA/cub Nov 8, 2023