Add support for virtual shared memory to DispatchReduceByKey #5440
Merged
elstehle merged 5 commits into NVIDIA:main on Aug 12, 2025
Conversation
Contributor
I am not 100% certain this does not influence the perf of the generated kernels. Could you please post whether SASS changes for ordinary key types? Otherwise we need a quick benchmark I think. Thx!
Contributor
Author
Valid concern. I had verified and ticked the box in the referenced issue. Will also add a comment on the PR description.
bernhardmgruber approved these changes on Aug 6, 2025
Contributor
🟨 CI finished in 2h 16m: Pass: 91%/162 | Total: 2d 08h | Avg: 20m 58s | Max: 2h 11m | Hits: 89%/152709

| | Project |
|---|---|
| | CCCL Infrastructure |
| | CCCL Packaging |
| | libcu++ |
| +/- | CUB |
| | Thrust |
| | CUDA Experimental |
| | stdpar |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | CCCL Packaging |
| | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | stdpar |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |

🏃 Runner counts (total jobs: 162)
| # | Runner |
|---|---|
| 93 | linux-amd64-cpu16 |
| 17 | linux-amd64-gpu-l4-latest-1 |
| 17 | windows-amd64-cpu16 |
| 10 | linux-arm64-cpu16 |
| 9 | linux-amd64-gpu-h100-latest-1 |
| 7 | linux-amd64-gpu-rtx2080-latest-1 |
| 6 | linux-amd64-gpu-rtxa6000-latest-1 |
| 3 | linux-amd64-gpu-rtx4090-latest-1 |
18b5fec to 2bc5315
Contributor
🟨 CI finished in 2h 16m: Pass: 91%/162 | Total: 1d 04h | Avg: 10m 23s | Max: 2h 13m | Hits: 99%/152709
Contributor
🟩 CI finished in 1h 51m: Pass: 100%/162 | Total: 3d 17h | Avg: 33m 05s | Max: 1h 49m | Hits: 76%/153019
shwina pushed a commit to shwina/cccl that referenced this pull request on Aug 19, 2025

…A#5440)
* adds vsmem to reduce_by_key
* adds tests for vsmem
* fixes rle, which does not support vsmem yet
* addresses review comments
davebayer pushed a commit to davebayer/cccl that referenced this pull request on Sep 23, 2025

…A#5440)
* adds vsmem to reduce_by_key
* adds tests for vsmem
* fixes rle, which does not support vsmem yet
* addresses review comments
bdice pushed a commit to bdice/cccl that referenced this pull request on Nov 21, 2025

…A#5440)
* adds vsmem to reduce_by_key
* adds tests for vsmem
* fixes rle, which does not support vsmem yet
* addresses review comments
bernhardmgruber pushed a commit that referenced this pull request on Nov 23, 2025

* Adds support for large number of items to `DeviceRunLengthEncode::NonTrivialRuns` (#5252)
  * streaming non trivial runs
  * change global offset computation
  * fixes style
  * integrate latest bench and test changes
  * addresses review comments
  * replaces getters with member var
* Add support for virtual shared memory to `DispatchReduceByKey` (#5440)
  * adds vsmem to reduce_by_key
  * adds tests for vsmem
  * fixes rle, which does not support vsmem yet
  * addresses review comments
* Fixes non-default-constructible iterators for large number of items types in `DeviceRunLengthEncode::Encode` (#6451)
  * adds tests for non default constructible iterators
  * fixes non default constructible iterators in rle
* Simplify generation of `streaming_context` for run_length_encode
* Reinstate regression test
* Revert test/benchmark changes

Co-authored-by: Elias Stehle <3958403+elstehle@users.noreply.github.com>
Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
Description
Closes #5438
I verified that, for our existing `reduce_by_key` tests, the SASS didn't change, except for the extra vsmem kernel parameter.
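
For readers unfamiliar with the idea, here is a minimal host-side C++ sketch of the kind of decision virtual shared memory involves: if a block's temporary storage fits into the shared memory budget, nothing extra is allocated; otherwise each block gets a slice of a global memory buffer, which would explain why ordinary key types only see an extra kernel parameter. All names below (`max_smem_per_block`, `agent_temp_storage`, `needs_vsmem`, `total_vsmem_bytes`, `huge_key`) and the 48 KiB budget are assumptions made for illustration. This is not CUB's actual implementation or API.

```cpp
#include <cstddef>

// Assumption: a 48 KiB static shared memory budget per thread block.
constexpr std::size_t max_smem_per_block = 48 * 1024;

// Hypothetical stand-in for an agent's per-block temporary storage:
// one tile of keys and one tile of values.
template <typename KeyT, typename ValueT, int BlockThreads, int ItemsPerThread>
struct agent_temp_storage
{
  KeyT keys[BlockThreads * ItemsPerThread];
  ValueT values[BlockThreads * ItemsPerThread];
};

// True if the temporary storage does not fit into the shared memory budget,
// so a global memory fallback ("virtual shared memory") would be needed.
template <typename TempStorageT>
constexpr bool needs_vsmem = sizeof(TempStorageT) > max_smem_per_block;

// Bytes of global memory backing one block; zero when shared memory suffices.
template <typename TempStorageT>
constexpr std::size_t vsmem_bytes_per_block =
  needs_vsmem<TempStorageT> ? sizeof(TempStorageT) : 0;

// Total size of the fallback buffer for a launch with num_blocks blocks.
template <typename TempStorageT>
constexpr std::size_t total_vsmem_bytes(std::size_t num_blocks)
{
  return num_blocks * vsmem_bytes_per_block<TempStorageT>;
}

// Hypothetical oversized key type used to trigger the fallback path.
struct huge_key
{
  char bytes[256];
};

// Ordinary key type: the tile fits into shared memory, no fallback buffer.
using small_storage = agent_temp_storage<int, int, 128, 4>;
// Oversized key type: the tile exceeds the budget, the fallback is needed.
using large_storage = agent_temp_storage<huge_key, int, 128, 4>;
```

In this sketch, `total_vsmem_bytes<small_storage>(n)` is zero for any `n`, so the ordinary path allocates nothing, while `large_storage` requires `sizeof(large_storage)` bytes of global memory per block.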