Add support for virtual shared memory to DispatchReduceByKey #5440

Merged
elstehle merged 5 commits into NVIDIA:main from elstehle:enh/vsmem-reduce-by-key
Aug 12, 2025

@elstehle elstehle commented Aug 6, 2025

Description

Closes #5438

I verified that, for our existing reduce_by_key tests, the SASS did not change, except for the extra vsmem kernel parameter.
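
For readers unfamiliar with the idea, here is a minimal host-side sketch of the decision that a virtual shared memory fallback has to make. If a block's temporary storage fits into shared memory, nothing changes. Otherwise each block gets a slice of a global-memory buffer, which is what the extra kernel parameter would point to. All names (`VsmemPlan`, `plan_temp_storage`, `block_slice_offset`, `total_vsmem_bytes`) and the 48 KiB budget are assumptions made for illustration, not CUB's actual implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names throughout; this is an illustration, not CUB's API.
// Assumed static shared memory budget per thread block.
constexpr std::size_t max_smem_per_block = 48 * 1024;

struct VsmemPlan
{
  bool needs_vsmem;            // true if the scratch space does not fit into shared memory
  std::size_t bytes_per_block; // global memory reserved per thread block (0 if unused)
};

// Decide where a block's temporary storage lives, given its size in bytes.
constexpr VsmemPlan plan_temp_storage(std::size_t temp_storage_bytes)
{
  return VsmemPlan{temp_storage_bytes > max_smem_per_block,
                   temp_storage_bytes > max_smem_per_block ? temp_storage_bytes : 0};
}

// Byte offset of one block's slice inside the global-memory fallback buffer.
constexpr std::size_t block_slice_offset(const VsmemPlan& plan, std::size_t block_id)
{
  return plan.bytes_per_block * block_id;
}

// Total extra global memory the dispatch layer would have to allocate.
constexpr std::size_t total_vsmem_bytes(std::size_t temp_storage_bytes, std::size_t num_blocks)
{
  return plan_temp_storage(temp_storage_bytes).bytes_per_block * num_blocks;
}

// Small self-check covering both cases.
inline void check_examples()
{
  // Ordinary key type: small scratch space, stays in shared memory.
  assert(total_vsmem_bytes(16 * 1024, 128) == 0);
  // Large key type: scratch space exceeds the budget, falls back to global memory.
  assert(total_vsmem_bytes(64 * 1024, 128) == std::size_t{64} * 1024 * 128);
}
```

Under these assumptions, ordinary key types take the first path and need no extra memory, which is consistent with the SASS differing only in the additional parameter.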

@elstehle elstehle requested a review from a team as a code owner August 6, 2025 08:09
@elstehle elstehle requested a review from miscco August 6, 2025 08:09
@github-project-automation github-project-automation bot moved this to Todo in CCCL Aug 6, 2025
@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Review in CCCL Aug 6, 2025
@bernhardmgruber
Contributor

I am not 100% certain this does not influence the perf of the generated kernels. Could you please post whether SASS changes for ordinary key types? Otherwise we need a quick benchmark I think. Thx!

elstehle commented Aug 6, 2025

> I am not 100% certain this does not influence the perf of the generated kernels. Could you please post whether SASS changes for ordinary key types? Otherwise we need a quick benchmark I think. Thx!

Valid concern. I had verified this and ticked the box in the referenced issue. I will also add a note to the PR description.

github-actions bot commented Aug 6, 2025

🟨 CI finished in 2h 16m: Pass: 91%/162 | Total: 2d 08h | Avg: 20m 58s | Max: 2h 11m | Hits: 89%/152709
  • 🟨 python: Pass: 36%/22 | Total: 1h 55m | Avg: 5m 14s | Max: 9m 47s

    🟨 jobs
      🟩 Build cuda.cccl    Pass: 100%/2   | Total: 19m 15s | Avg:  9m 37s | Max:  9m 47s
      🟥 Test cuda.cccl.cooperative Pass:   0%/5   | Total: 22m 12s | Avg:  4m 26s | Max:  4m 34s
      🟨 Test cuda.cccl.examples Pass:  20%/5   | Total: 21m 17s | Avg:  4m 15s | Max:  4m 32s
      🟩 Test cuda.cccl.headers Pass: 100%/5   | Total: 20m 10s | Avg:  4m 02s | Max:  4m 15s
      🟥 Test cuda.cccl.parallel Pass:   0%/5   | Total: 32m 20s | Avg:  6m 28s | Max:  9m 28s
    🟨 cpu
      🟨 amd64              Pass:  36%/22  | Total:  1h 55m | Avg:  5m 14s | Max:  9m 47s
    🟨 cudacxx_family
      🟨 nvcc               Pass:  36%/22  | Total:  1h 55m | Avg:  5m 14s | Max:  9m 47s
    🟨 cxx
      🟨 GCC13              Pass:  36%/22  | Total:  1h 55m | Avg:  5m 14s | Max:  9m 47s
    🟨 cxx_family
      🟨 GCC                Pass:  36%/22  | Total:  1h 55m | Avg:  5m 14s | Max:  9m 47s
    🟨 ctk
      🟨 12.5               Pass:  50%/6   | Total: 25m 03s | Avg:  4m 10s | Max:  4m 27s
      🟥 12.8               Pass:   0%/2   | Total:  9m 18s | Avg:  4m 39s | Max:  4m 47s
      🟨 12.9               Pass:  35%/14  | Total:  1h 20m | Avg:  5m 46s | Max:  9m 47s
    🟨 cudacxx
      🟨 nvcc12.5           Pass:  50%/6   | Total: 25m 03s | Avg:  4m 10s | Max:  4m 27s
      🟥 nvcc12.8           Pass:   0%/2   | Total:  9m 18s | Avg:  4m 39s | Max:  4m 47s
      🟨 nvcc12.9           Pass:  35%/14  | Total:  1h 20m | Avg:  5m 46s | Max:  9m 47s
    🟨 gpu
      🟨 h100               Pass:  25%/4   | Total: 20m 38s | Avg:  5m 09s | Max:  8m 15s
      🟨 l4                 Pass:  38%/18  | Total:  1h 34m | Avg:  5m 15s | Max:  9m 47s
    🟨 py_version
      🟨 3.10               Pass:  33%/9   | Total: 45m 19s | Avg:  5m 02s | Max:  9m 47s
      🟨 3.13               Pass:  38%/13  | Total:  1h 09m | Avg:  5m 22s | Max:  9m 28s
    
  • 🟩 cub: Pass: 100%/50 | Total: 1d 08h | Avg: 38m 59s | Max: 1h 31m | Hits: 79%/52560

    🟩 cpu
      🟩 amd64              Pass: 100%/48  | Total:  1d 07h | Avg: 39m 11s | Max:  1h 31m | Hits:  79%/50000 
      🟩 arm64              Pass: 100%/2   | Total:  1h 08m | Avg: 34m 03s | Max: 39m 37s | Hits:  81%/2560  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 29m | Avg: 41m 52s | Max:  1h 18m | Hits:  80%/6296  
      🟩 12.9               Pass: 100%/45  | Total:  1d 05h | Avg: 38m 40s | Max:  1h 31m | Hits:  79%/46264 
    🟩 cudacxx
      🟩 ClangCUDA19        Pass: 100%/2   | Total: 37m 07s | Avg: 18m 33s | Max: 20m 01s | Hits:  86%/2207  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 29m | Avg: 41m 52s | Max:  1h 18m | Hits:  80%/6296  
      🟩 nvcc12.9           Pass: 100%/43  | Total:  1d 04h | Avg: 39m 36s | Max:  1h 31m | Hits:  78%/44057 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 37m 07s | Avg: 18m 33s | Max: 20m 01s | Hits:  86%/2207  
      🟩 nvcc               Pass: 100%/48  | Total:  1d 07h | Avg: 39m 50s | Max:  1h 31m | Hits:  79%/50353 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 06m | Avg: 31m 44s | Max: 35m 36s | Hits:  81%/5122  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 07m | Avg: 33m 43s | Max: 34m 26s | Hits:  80%/2557  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 16m | Avg: 38m 27s | Max: 39m 13s | Hits:  80%/2557  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 08m | Avg: 34m 15s | Max: 35m 46s | Hits:  80%/2557  
      🟩 Clang18            Pass: 100%/2   | Total:  1h 10m | Avg: 35m 08s | Max: 36m 24s | Hits:  80%/2557  
      🟩 Clang19            Pass: 100%/7   | Total:  3h 00m | Avg: 25m 50s | Max: 37m 24s | Hits:  83%/6043  
      🟩 GCC7               Pass: 100%/2   | Total:  1h 20m | Avg: 40m 28s | Max: 43m 32s | Hits:  79%/2560  
      🟩 GCC8               Pass: 100%/1   | Total: 42m 19s | Avg: 42m 19s | Max: 42m 19s | Hits:  79%/1280  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 16m | Avg: 38m 05s | Max: 39m 26s | Hits:  79%/2560  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 23m | Avg: 41m 36s | Max: 44m 30s | Hits:  77%/2561  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 19m | Avg: 39m 39s | Max: 39m 59s | Hits:  78%/2557  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 28m | Avg: 44m 21s | Max: 47m 10s | Hits:  78%/2557  
      🟩 GCC13              Pass: 100%/12  | Total:  5h 26m | Avg: 27m 14s | Max: 43m 56s | Hits:  81%/7683  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 45m | Avg:  1h 22m | Max:  1h 27m | Hits:  73%/2350  
      🟩 MSVC14.43          Pass: 100%/4   | Total:  4h 53m | Avg:  1h 13m | Max:  1h 31m | Hits:  73%/4700  
      🟩 NVHPC25.5          Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 02m | Hits:  74%/2359  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total:  9h 50m | Avg: 31m 06s | Max: 39m 13s | Hits:  81%/21393 
      🟩 GCC                Pass: 100%/23  | Total: 12h 57m | Avg: 33m 48s | Max: 47m 10s | Hits:  79%/21758 
      🟩 MSVC               Pass: 100%/6   | Total:  7h 39m | Avg:  1h 16m | Max:  1h 31m | Hits:  73%/7050  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 02m | Hits:  74%/2359  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 09m | Avg: 23m 13s | Max: 28m 19s | Hits:  84%/1281  
      🟩 rtx2080            Pass: 100%/39  | Total:  1d 03h | Avg: 42m 54s | Max:  1h 31m | Hits:  79%/48719 
      🟩 rtxa6000           Pass: 100%/8   | Total:  3h 26m | Avg: 25m 48s | Max: 42m 53s | Hits:  79%/2560  
    🟩 jobs
      🟩 Build              Pass: 100%/42  | Total:  1d 05h | Avg: 42m 08s | Max:  1h 31m | Hits:  79%/52560 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 23m 12s | Avg: 23m 12s | Max: 23m 12s
      🟩 GraphCapture       Pass: 100%/1   | Total: 14m 11s | Avg: 14m 11s | Max: 14m 11s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 17m | Avg: 25m 43s | Max: 28m 19s
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 04m | Avg: 21m 34s | Max: 24m 43s
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 09m | Avg: 23m 13s | Max: 28m 19s | Hits:  84%/1281  
      🟩 90;90a             Pass: 100%/2   | Total:  1h 28m | Avg: 44m 03s | Max:  1h 03m | Hits:  79%/2456  
      🟩 100;120            Pass: 100%/2   | Total:  1h 24m | Avg: 42m 11s | Max: 59m 21s | Hits:  79%/2456  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total: 15h 37m | Avg: 44m 38s | Max:  1h 27m | Hits:  78%/26271 
      🟩 20                 Pass: 100%/29  | Total: 16h 51m | Avg: 34m 53s | Max:  1h 31m | Hits:  80%/26289 
    
  • 🟩 thrust: Pass: 100%/50 | Total: 15h 29m | Avg: 18m 34s | Max: 1h 01m | Hits: 94%/84139

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 17m 48s | Avg:  8m 54s | Max: 12m 41s | Hits:  96%/1914  
    🟩 cpu
      🟩 amd64              Pass: 100%/48  | Total: 15h 11m | Avg: 18m 58s | Max:  1h 01m | Hits:  94%/80312 
      🟩 arm64              Pass: 100%/2   | Total: 18m 04s | Avg:  9m 02s | Max: 12m 19s | Hits:  97%/3827  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 23m | Avg: 16m 39s | Max: 53m 00s | Hits:  96%/9560  
      🟩 12.9               Pass: 100%/45  | Total: 14h 05m | Avg: 18m 47s | Max:  1h 01m | Hits:  94%/74579 
    🟩 cudacxx
      🟩 ClangCUDA19        Pass: 100%/2   | Total: 10m 56s | Avg:  5m 28s | Max:  5m 32s | Hits: 100%/3826  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 23m | Avg: 16m 39s | Max: 53m 00s | Hits:  96%/9560  
      🟩 nvcc12.9           Pass: 100%/43  | Total: 13h 54m | Avg: 19m 25s | Max:  1h 01m | Hits:  94%/70753 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 56s | Avg:  5m 28s | Max:  5m 32s | Hits: 100%/3826  
      🟩 nvcc               Pass: 100%/48  | Total: 15h 18m | Avg: 19m 07s | Max:  1h 01m | Hits:  94%/80313 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 40m 12s | Avg: 10m 03s | Max: 22m 20s | Hits:  98%/7652  
      🟩 Clang15            Pass: 100%/2   | Total: 18m 34s | Avg:  9m 17s | Max: 10m 50s | Hits:  96%/3826  
      🟩 Clang16            Pass: 100%/2   | Total: 24m 14s | Avg: 12m 07s | Max: 17m 54s | Hits:  96%/3826  
      🟩 Clang17            Pass: 100%/2   | Total: 23m 59s | Avg: 11m 59s | Max: 17m 54s | Hits:  96%/3826  
      🟩 Clang18            Pass: 100%/2   | Total: 23m 29s | Avg: 11m 44s | Max: 17m 19s | Hits:  96%/3826  
      🟩 Clang19            Pass: 100%/7   | Total: 50m 30s | Avg:  7m 12s | Max: 14m 44s | Hits:  98%/9565  
      🟩 GCC7               Pass: 100%/2   | Total: 24m 33s | Avg: 12m 16s | Max: 14m 47s | Hits:  95%/3828  
      🟩 GCC8               Pass: 100%/1   | Total: 26m 05s | Avg: 26m 05s | Max: 26m 05s | Hits:  92%/1914  
      🟩 GCC9               Pass: 100%/2   | Total: 24m 57s | Avg: 12m 28s | Max: 15m 30s | Hits:  95%/3828  
      🟩 GCC10              Pass: 100%/2   | Total: 39m 03s | Avg: 19m 31s | Max: 20m 01s | Hits:  93%/3828  
      🟩 GCC11              Pass: 100%/2   | Total: 35m 21s | Avg: 17m 40s | Max: 21m 50s | Hits:  93%/3828  
      🟩 GCC12              Pass: 100%/2   | Total: 29m 49s | Avg: 14m 54s | Max: 15m 02s | Hits:  93%/3828  
      🟩 GCC13              Pass: 100%/11  | Total:  1h 57m | Avg: 10m 43s | Max: 29m 13s | Hits:  96%/13398 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 51m | Avg: 55m 45s | Max: 58m 30s | Hits:  86%/3812  
      🟩 MSVC14.43          Pass: 100%/5   | Total:  3h 53m | Avg: 46m 38s | Max:  1h 01m | Hits:  89%/9530  
      🟩 NVHPC25.5          Pass: 100%/2   | Total:  1h 45m | Avg: 52m 53s | Max: 55m 16s | Hits:  89%/3824  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total:  3h 00m | Avg:  9m 31s | Max: 22m 20s | Hits:  97%/32521 
      🟩 GCC                Pass: 100%/22  | Total:  4h 57m | Avg: 13m 32s | Max: 29m 13s | Hits:  95%/34452 
      🟩 MSVC               Pass: 100%/7   | Total:  5h 44m | Avg: 49m 14s | Max:  1h 01m | Hits:  89%/13342 
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 45m | Avg: 52m 53s | Max: 55m 16s | Hits:  89%/3824  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 13m 52s | Avg:  6m 56s | Max:  7m 36s | Hits:  99%/1914  
      🟩 rtx2080            Pass: 100%/38  | Total: 12h 27m | Avg: 19m 41s | Max: 58m 30s | Hits:  94%/72672 
      🟩 rtx4090            Pass: 100%/10  | Total:  2h 47m | Avg: 16m 43s | Max:  1h 01m | Hits:  94%/9553  
    🟩 jobs
      🟩 Build              Pass: 100%/43  | Total: 14h 24m | Avg: 20m 06s | Max:  1h 01m | Hits:  94%/82233 
      🟩 TestCPU            Pass: 100%/3   | Total: 41m 10s | Avg: 13m 43s | Max: 33m 13s | Hits:  99%/1906  
      🟩 TestGPU            Pass: 100%/4   | Total: 23m 36s | Avg:  5m 54s | Max:  7m 36s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 13m 52s | Avg:  6m 56s | Max:  7m 36s | Hits:  99%/1914  
      🟩 90;90a             Pass: 100%/2   | Total: 47m 05s | Avg: 23m 32s | Max: 40m 15s | Hits:  94%/3820  
      🟩 100;120            Pass: 100%/2   | Total: 50m 17s | Avg: 25m 08s | Max: 42m 44s | Hits:  94%/3820  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total:  8h 07m | Avg: 23m 13s | Max: 58m 30s | Hits:  93%/40160 
      🟩 20                 Pass: 100%/27  | Total:  7h 03m | Avg: 15m 41s | Max:  1h 01m | Hits:  96%/42065 
    
  • 🟩 cudax: Pass: 100%/28 | Total: 3h 16m | Avg: 7m 00s | Max: 31m 37s | Hits: 98%/15342

    🟩 cpu
      🟩 amd64              Pass: 100%/24  | Total:  3h 02m | Avg:  7m 36s | Max: 31m 37s | Hits:  98%/12978 
      🟩 arm64              Pass: 100%/4   | Total: 13m 47s | Avg:  3m 26s | Max:  3m 56s | Hits:  99%/2364  
    🟩 ctk
      🟩 12.0               Pass: 100%/3   | Total: 17m 29s | Avg:  5m 49s | Max: 10m 48s | Hits:  98%/1470  
      🟩 12.9               Pass: 100%/25  | Total:  2h 58m | Avg:  7m 09s | Max: 31m 37s | Hits:  98%/13872 
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/3   | Total: 17m 29s | Avg:  5m 49s | Max: 10m 48s | Hits:  98%/1470  
      🟩 nvcc12.9           Pass: 100%/25  | Total:  2h 58m | Avg:  7m 09s | Max: 31m 37s | Hits:  98%/13872 
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/28  | Total:  3h 16m | Avg:  7m 00s | Max: 31m 37s | Hits:  98%/15342 
    🟩 cxx
      🟩 Clang14            Pass: 100%/2   | Total:  6m 55s | Avg:  3m 27s | Max:  3m 44s | Hits:  99%/1184  
      🟩 Clang15            Pass: 100%/1   | Total:  3m 53s | Avg:  3m 53s | Max:  3m 53s | Hits:  99%/591   
      🟩 Clang16            Pass: 100%/1   | Total:  3m 48s | Avg:  3m 48s | Max:  3m 48s | Hits:  99%/591   
      🟩 Clang17            Pass: 100%/1   | Total:  3m 50s | Avg:  3m 50s | Max:  3m 50s | Hits:  99%/591   
      🟩 Clang18            Pass: 100%/1   | Total:  3m 43s | Avg:  3m 43s | Max:  3m 43s | Hits:  99%/591   
      🟩 Clang19            Pass: 100%/4   | Total: 35m 21s | Avg:  8m 50s | Max: 25m 07s | Hits:  99%/2364  
      🟩 GCC10              Pass: 100%/2   | Total:  7m 28s | Avg:  3m 44s | Max:  3m 58s | Hits:  99%/1184  
      🟩 GCC11              Pass: 100%/1   | Total:  3m 53s | Avg:  3m 53s | Max:  3m 53s | Hits:  99%/591   
      🟩 GCC12              Pass: 100%/1   | Total:  4m 06s | Avg:  4m 06s | Max:  4m 06s | Hits:  99%/591   
      🟩 GCC13              Pass: 100%/8   | Total:  1h 02m | Avg:  7m 49s | Max: 31m 37s | Hits:  99%/4728  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 10m 48s | Avg: 10m 48s | Max: 10m 48s | Hits:  95%/288   
      🟩 MSVC14.43          Pass: 100%/3   | Total: 34m 23s | Avg: 11m 27s | Max: 11m 38s | Hits:  95%/870   
      🟩 NVHPC25.5          Pass: 100%/2   | Total: 15m 39s | Avg:  7m 49s | Max:  8m 01s | Hits:  97%/1178  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/10  | Total: 57m 30s | Avg:  5m 45s | Max: 25m 07s | Hits:  99%/5912  
      🟩 GCC                Pass: 100%/12  | Total:  1h 18m | Avg:  6m 30s | Max: 31m 37s | Hits:  99%/7094  
      🟩 MSVC               Pass: 100%/4   | Total: 45m 11s | Avg: 11m 17s | Max: 11m 38s | Hits:  95%/1158  
      🟩 NVHPC              Pass: 100%/2   | Total: 15m 39s | Avg:  7m 49s | Max:  8m 01s | Hits:  97%/1178  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 11m 35s | Avg:  5m 47s | Max:  8m 14s | Hits:  99%/1182  
      🟩 rtx2080            Pass: 100%/26  | Total:  3h 04m | Avg:  7m 06s | Max: 31m 37s | Hits:  98%/14160 
    🟩 jobs
      🟩 Build              Pass: 100%/25  | Total:  2h 11m | Avg:  5m 15s | Max: 11m 38s | Hits:  98%/13569 
      🟩 Test               Pass: 100%/3   | Total:  1h 04m | Avg: 21m 39s | Max: 31m 37s | Hits:  99%/1773  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 11m 35s | Avg:  5m 47s | Max:  8m 14s | Hits:  99%/1182  
      🟩 90;90a             Pass: 100%/2   | Total: 15m 24s | Avg:  7m 42s | Max: 11m 38s | Hits:  98%/881   
      🟩 100;120            Pass: 100%/2   | Total: 15m 11s | Avg:  7m 35s | Max: 11m 17s | Hits:  97%/881   
    🟩 std
      🟩 17                 Pass: 100%/3   | Total: 14m 09s | Avg:  4m 43s | Max:  7m 38s | Hits:  98%/1771  
      🟩 20                 Pass: 100%/25  | Total:  3h 02m | Avg:  7m 17s | Max: 31m 37s | Hits:  98%/13571 
    
  • 🟩 cccl_c_parallel: Pass: 100%/4 | Total: 2h 45m | Avg: 41m 18s | Max: 2h 11m | Hits: 98%/668

    🟩 cpu
      🟩 amd64              Pass: 100%/4   | Total:  2h 45m | Avg: 41m 18s | Max:  2h 11m | Hits:  98%/668   
    🟩 ctk
      🟩 12.9               Pass: 100%/4   | Total:  2h 45m | Avg: 41m 18s | Max:  2h 11m | Hits:  98%/668   
    🟩 cudacxx
      🟩 nvcc12.9           Pass: 100%/4   | Total:  2h 45m | Avg: 41m 18s | Max:  2h 11m | Hits:  98%/668   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total:  2h 45m | Avg: 41m 18s | Max:  2h 11m | Hits:  98%/668   
    🟩 cxx
      🟩 GCC13              Pass: 100%/4   | Total:  2h 45m | Avg: 41m 18s | Max:  2h 11m | Hits:  98%/668   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/4   | Total:  2h 45m | Avg: 41m 18s | Max:  2h 11m | Hits:  98%/668   
    🟩 gpu
      🟩 h100               Pass: 100%/1   | Total: 15m 34s | Avg: 15m 34s | Max: 15m 34s | Hits:  98%/167   
      🟩 l4                 Pass: 100%/1   | Total: 16m 31s | Avg: 16m 31s | Max: 16m 31s | Hits:  98%/167   
      🟩 rtx2080            Pass: 100%/2   | Total:  2h 13m | Avg:  1h 06m | Max:  2h 11m | Hits:  98%/334   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 03s | Avg:  2m 03s | Max:  2m 03s | Hits:  98%/167   
      🟩 Test               Pass: 100%/3   | Total:  2h 43m | Avg: 54m 23s | Max:  2h 11m | Hits:  98%/501   
    
  • 🟩 packaging: Pass: 100%/4 | Total: 26m 04s | Avg: 6m 31s | Max: 7m 59s

    🟩 cpu
      🟩 amd64              Pass: 100%/4   | Total: 26m 04s | Avg:  6m 31s | Max:  7m 59s
    🟩 ctk
      🟩 12.0               Pass: 100%/2   | Total: 10m 11s | Avg:  5m 05s | Max:  7m 51s
      🟩 12.9               Pass: 100%/2   | Total: 15m 53s | Avg:  7m 56s | Max:  7m 59s
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/2   | Total: 10m 11s | Avg:  5m 05s | Max:  7m 51s
      🟩 nvcc12.9           Pass: 100%/2   | Total: 15m 53s | Avg:  7m 56s | Max:  7m 59s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 26m 04s | Avg:  6m 31s | Max:  7m 59s
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  7m 51s | Avg:  7m 51s | Max:  7m 51s
      🟩 Clang19            Pass: 100%/1   | Total:  7m 54s | Avg:  7m 54s | Max:  7m 54s
      🟩 GCC12              Pass: 100%/1   | Total:  2m 20s | Avg:  2m 20s | Max:  2m 20s
      🟩 GCC13              Pass: 100%/1   | Total:  7m 59s | Avg:  7m 59s | Max:  7m 59s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/2   | Total: 15m 45s | Avg:  7m 52s | Max:  7m 54s
      🟩 GCC                Pass: 100%/2   | Total: 10m 19s | Avg:  5m 09s | Max:  7m 59s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 26m 04s | Avg:  6m 31s | Max:  7m 59s
    🟩 jobs
      🟩 Test               Pass: 100%/4   | Total: 26m 04s | Avg:  6m 31s | Max:  7m 59s
    
  • 🟩 stdpar: Pass: 100%/4 | Total: 15m 17s | Avg: 3m 49s | Max: 4m 14s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  8m 03s | Avg:  4m 01s | Max:  4m 14s
      🟩 arm64              Pass: 100%/2   | Total:  7m 14s | Avg:  3m 37s | Max:  3m 42s
    🟩 ctk
      🟩 12.9               Pass: 100%/4   | Total: 15m 17s | Avg:  3m 49s | Max:  4m 14s
    🟩 cudacxx
      🟩 nvcc12.9           Pass: 100%/4   | Total: 15m 17s | Avg:  3m 49s | Max:  4m 14s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 15m 17s | Avg:  3m 49s | Max:  4m 14s
    🟩 cxx
      🟩 NVHPC25.5          Pass: 100%/4   | Total: 15m 17s | Avg:  3m 49s | Max:  4m 14s
    🟩 cxx_family
      🟩 NVHPC              Pass: 100%/4   | Total: 15m 17s | Avg:  3m 49s | Max:  4m 14s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 15m 17s | Avg:  3m 49s | Max:  4m 14s
    🟩 jobs
      🟩 Build              Pass: 100%/4   | Total: 15m 17s | Avg:  3m 49s | Max:  4m 14s
    🟩 std
      🟩 17                 Pass: 100%/2   | Total:  7m 46s | Avg:  3m 53s | Max:  4m 14s
      🟩 20                 Pass: 100%/2   | Total:  7m 31s | Avg:  3m 45s | Max:  3m 49s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
CCCL Packaging
libcu++
+/- CUB
Thrust
CUDA Experimental
stdpar
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- CCCL Packaging
libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- stdpar
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 162)

# Runner
93 linux-amd64-cpu16
17 linux-amd64-gpu-l4-latest-1
17 windows-amd64-cpu16
10 linux-arm64-cpu16
9 linux-amd64-gpu-h100-latest-1
7 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
3 linux-amd64-gpu-rtx4090-latest-1

@elstehle elstehle force-pushed the enh/vsmem-reduce-by-key branch from 18b5fec to 2bc5315 August 6, 2025 13:26

github-actions bot commented Aug 6, 2025

🟨 CI finished in 2h 16m: Pass: 91%/162 | Total: 1d 04h | Avg: 10m 23s | Max: 2h 13m | Hits: 99%/152709
  • 🟨 python: Pass: 36%/22 | Total: 1h 54m | Avg: 5m 11s | Max: 9m 38s

    🟨 jobs
      🟩 Build cuda.cccl    Pass: 100%/2   | Total: 19m 04s | Avg:  9m 32s | Max:  9m 38s
      🟥 Test cuda.cccl.cooperative Pass:   0%/5   | Total: 23m 22s | Avg:  4m 40s | Max:  5m 26s
      🟨 Test cuda.cccl.examples Pass:  20%/5   | Total: 20m 41s | Avg:  4m 08s | Max:  4m 17s
      🟩 Test cuda.cccl.headers Pass: 100%/5   | Total: 20m 18s | Avg:  4m 03s | Max:  4m 37s
      🟥 Test cuda.cccl.parallel Pass:   0%/5   | Total: 30m 47s | Avg:  6m 09s | Max:  9m 09s
    🟨 cpu
      🟨 amd64              Pass:  36%/22  | Total:  1h 54m | Avg:  5m 11s | Max:  9m 38s
    🟨 cudacxx_family
      🟨 nvcc               Pass:  36%/22  | Total:  1h 54m | Avg:  5m 11s | Max:  9m 38s
    🟨 cxx
      🟨 GCC13              Pass:  36%/22  | Total:  1h 54m | Avg:  5m 11s | Max:  9m 38s
    🟨 cxx_family
      🟨 GCC                Pass:  36%/22  | Total:  1h 54m | Avg:  5m 11s | Max:  9m 38s
    🟨 ctk
      🟨 12.5               Pass:  50%/6   | Total: 24m 44s | Avg:  4m 07s | Max:  4m 28s
      🟥 12.8               Pass:   0%/2   | Total:  9m 04s | Avg:  4m 32s | Max:  4m 38s
      🟨 12.9               Pass:  35%/14  | Total:  1h 20m | Avg:  5m 44s | Max:  9m 38s
    🟨 cudacxx
      🟨 nvcc12.5           Pass:  50%/6   | Total: 24m 44s | Avg:  4m 07s | Max:  4m 28s
      🟥 nvcc12.8           Pass:   0%/2   | Total:  9m 04s | Avg:  4m 32s | Max:  4m 38s
      🟨 nvcc12.9           Pass:  35%/14  | Total:  1h 20m | Avg:  5m 44s | Max:  9m 38s
    🟨 gpu
      🟨 h100               Pass:  25%/4   | Total: 21m 15s | Avg:  5m 18s | Max:  8m 03s
      🟨 l4                 Pass:  38%/18  | Total:  1h 32m | Avg:  5m 09s | Max:  9m 38s
    🟨 py_version
      🟨 3.10               Pass:  33%/9   | Total: 44m 40s | Avg:  4m 57s | Max:  9m 26s
      🟨 3.13               Pass:  38%/13  | Total:  1h 09m | Avg:  5m 20s | Max:  9m 38s
    
  • 🟩 cub: Pass: 100%/50 | Total: 11h 05m | Avg: 13m 18s | Max: 35m 03s | Hits: 99%/52560

    🟩 cpu
      🟩 amd64              Pass: 100%/48  | Total: 10h 50m | Avg: 13m 32s | Max: 35m 03s | Hits:  99%/50000 
      🟩 arm64              Pass: 100%/2   | Total: 14m 55s | Avg:  7m 27s | Max:  8m 40s | Hits:  99%/2560  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 02m | Avg: 12m 27s | Max: 30m 16s | Hits:  99%/6296  
      🟩 12.9               Pass: 100%/45  | Total: 10h 02m | Avg: 13m 23s | Max: 35m 03s | Hits:  99%/46264 
    🟩 cudacxx
      🟩 ClangCUDA19        Pass: 100%/2   | Total: 10m 52s | Avg:  5m 26s | Max:  5m 28s | Hits:  99%/2207  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 02m | Avg: 12m 27s | Max: 30m 16s | Hits:  99%/6296  
      🟩 nvcc12.9           Pass: 100%/43  | Total:  9h 51m | Avg: 13m 45s | Max: 35m 03s | Hits:  99%/44057 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 10m 52s | Avg:  5m 26s | Max:  5m 28s | Hits:  99%/2207  
      🟩 nvcc               Pass: 100%/48  | Total: 10h 54m | Avg: 13m 37s | Max: 35m 03s | Hits:  99%/50353 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 29m 25s | Avg:  7m 21s | Max:  7m 53s | Hits:  99%/5122  
      🟩 Clang15            Pass: 100%/2   | Total: 15m 02s | Avg:  7m 31s | Max:  7m 44s | Hits:  99%/2557  
      🟩 Clang16            Pass: 100%/2   | Total: 14m 27s | Avg:  7m 13s | Max:  7m 30s | Hits:  99%/2557  
      🟩 Clang17            Pass: 100%/2   | Total: 14m 09s | Avg:  7m 04s | Max:  7m 07s | Hits:  99%/2557  
      🟩 Clang18            Pass: 100%/2   | Total: 13m 48s | Avg:  6m 54s | Max:  6m 59s | Hits:  99%/2557  
      🟩 Clang19            Pass: 100%/7   | Total:  1h 15m | Avg: 10m 48s | Max: 24m 27s | Hits:  99%/6043  
      🟩 GCC7               Pass: 100%/2   | Total: 17m 05s | Avg:  8m 32s | Max:  8m 50s | Hits:  99%/2560  
      🟩 GCC8               Pass: 100%/1   | Total:  9m 24s | Avg:  9m 24s | Max:  9m 24s | Hits:  99%/1280  
      🟩 GCC9               Pass: 100%/2   | Total: 19m 24s | Avg:  9m 42s | Max:  9m 59s | Hits:  99%/2560  
      🟩 GCC10              Pass: 100%/2   | Total: 18m 03s | Avg:  9m 01s | Max:  9m 08s | Hits:  99%/2561  
      🟩 GCC11              Pass: 100%/2   | Total: 19m 04s | Avg:  9m 32s | Max:  9m 56s | Hits:  99%/2557  
      🟩 GCC12              Pass: 100%/2   | Total: 20m 50s | Avg: 10m 25s | Max: 11m 24s | Hits:  99%/2557  
      🟩 GCC13              Pass: 100%/12  | Total:  3h 07m | Avg: 15m 36s | Max: 28m 14s | Hits:  99%/7683  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 00m | Avg: 30m 12s | Max: 30m 16s | Hits:  98%/2350  
      🟩 MSVC14.43          Pass: 100%/4   | Total:  2h 03m | Avg: 30m 59s | Max: 35m 03s | Hits:  98%/4700  
      🟩 NVHPC25.5          Pass: 100%/2   | Total: 27m 03s | Avg: 13m 31s | Max: 13m 32s | Hits:  98%/2359  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total:  2h 42m | Avg:  8m 33s | Max: 24m 27s | Hits:  99%/21393 
      🟩 GCC                Pass: 100%/23  | Total:  4h 51m | Avg: 12m 39s | Max: 28m 14s | Hits:  99%/21758 
      🟩 MSVC               Pass: 100%/6   | Total:  3h 04m | Avg: 30m 43s | Max: 35m 03s | Hits:  98%/7050  
      🟩 NVHPC              Pass: 100%/2   | Total: 27m 03s | Avg: 13m 31s | Max: 13m 32s | Hits:  98%/2359  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total: 59m 44s | Avg: 19m 54s | Max: 28m 14s | Hits:  99%/1281  
      🟩 rtx2080            Pass: 100%/39  | Total:  7h 41m | Avg: 11m 49s | Max: 35m 03s | Hits:  99%/48719 
      🟩 rtxa6000           Pass: 100%/8   | Total:  2h 23m | Avg: 17m 59s | Max: 24m 27s | Hits:  99%/2560  
    🟩 jobs
      🟩 Build              Pass: 100%/42  | Total:  8h 07m | Avg: 11m 35s | Max: 35m 03s | Hits:  99%/52560 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 22m 14s | Avg: 22m 14s | Max: 22m 14s
      🟩 GraphCapture       Pass: 100%/1   | Total: 14m 52s | Avg: 14m 52s | Max: 14m 52s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 16m | Avg: 25m 37s | Max: 28m 14s
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 04m | Avg: 21m 21s | Max: 24m 45s
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 59m 44s | Avg: 19m 54s | Max: 28m 14s | Hits:  99%/1281  
      🟩 90;90a             Pass: 100%/2   | Total: 38m 06s | Avg: 19m 03s | Max: 29m 43s | Hits:  99%/2456  
      🟩 100;120            Pass: 100%/2   | Total: 35m 16s | Avg: 17m 38s | Max: 26m 55s | Hits:  99%/2456  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total:  4h 09m | Avg: 11m 53s | Max: 35m 03s | Hits:  99%/26271 
      🟩 20                 Pass: 100%/29  | Total:  6h 55m | Avg: 14m 19s | Max: 32m 18s | Hits:  99%/26289 
    
  • 🟩 thrust: Pass: 100%/50 | Total: 8h 50m | Avg: 10m 37s | Max: 33m 02s | Hits: 99%/84139

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 13m 34s | Avg:  6m 47s | Max:  8m 27s | Hits:  99%/1914  
    🟩 cpu
      🟩 amd64              Pass: 100%/48  | Total:  8h 38m | Avg: 10m 48s | Max: 33m 02s | Hits:  99%/80312 
      🟩 arm64              Pass: 100%/2   | Total: 12m 09s | Avg:  6m 04s | Max:  7m 01s | Hits:  99%/3827  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total: 51m 52s | Avg: 10m 22s | Max: 26m 28s | Hits:  99%/9560  
      🟩 12.9               Pass: 100%/45  | Total:  7h 58m | Avg: 10m 38s | Max: 33m 02s | Hits:  99%/74579 
    🟩 cudacxx
      🟩 ClangCUDA19        Pass: 100%/2   | Total: 11m 22s | Avg:  5m 41s | Max:  5m 46s | Hits: 100%/3826  
      🟩 nvcc12.0           Pass: 100%/5   | Total: 51m 52s | Avg: 10m 22s | Max: 26m 28s | Hits:  99%/9560  
      🟩 nvcc12.9           Pass: 100%/43  | Total:  7h 47m | Avg: 10m 52s | Max: 33m 02s | Hits:  99%/70753 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 11m 22s | Avg:  5m 41s | Max:  5m 46s | Hits: 100%/3826  
      🟩 nvcc               Pass: 100%/48  | Total:  8h 39m | Avg: 10m 49s | Max: 33m 02s | Hits:  99%/80313 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 22m 57s | Avg:  5m 44s | Max:  6m 05s | Hits: 100%/7652  
      🟩 Clang15            Pass: 100%/2   | Total: 11m 48s | Avg:  5m 54s | Max:  5m 57s | Hits: 100%/3826  
      🟩 Clang16            Pass: 100%/2   | Total: 11m 59s | Avg:  5m 59s | Max:  6m 05s | Hits: 100%/3826  
      🟩 Clang17            Pass: 100%/2   | Total: 11m 46s | Avg:  5m 53s | Max:  5m 57s | Hits: 100%/3826  
      🟩 Clang18            Pass: 100%/2   | Total: 12m 00s | Avg:  6m 00s | Max:  6m 12s | Hits: 100%/3826  
      🟩 Clang19            Pass: 100%/7   | Total: 39m 03s | Avg:  5m 34s | Max:  6m 16s | Hits: 100%/9565  
      🟩 GCC7               Pass: 100%/2   | Total: 14m 15s | Avg:  7m 07s | Max:  7m 14s | Hits:  99%/3828  
      🟩 GCC8               Pass: 100%/1   | Total:  7m 23s | Avg:  7m 23s | Max:  7m 23s | Hits:  99%/1914  
      🟩 GCC9               Pass: 100%/2   | Total: 16m 15s | Avg:  8m 07s | Max:  8m 43s | Hits:  99%/3828  
      🟩 GCC10              Pass: 100%/2   | Total: 14m 32s | Avg:  7m 16s | Max:  7m 24s | Hits:  99%/3828  
      🟩 GCC11              Pass: 100%/2   | Total: 15m 28s | Avg:  7m 44s | Max:  7m 58s | Hits:  99%/3828  
      🟩 GCC12              Pass: 100%/2   | Total: 16m 20s | Avg:  8m 10s | Max:  8m 24s | Hits:  99%/3828  
      🟩 GCC13              Pass: 100%/11  | Total:  1h 14m | Avg:  6m 48s | Max:  8m 56s | Hits:  99%/13398 
      🟩 MSVC14.29          Pass: 100%/2   | Total: 54m 34s | Avg: 27m 17s | Max: 28m 06s | Hits:  99%/3812  
      🟩 MSVC14.43          Pass: 100%/5   | Total:  2h 25m | Avg: 29m 07s | Max: 32m 30s | Hits:  99%/9530  
      🟩 NVHPC25.5          Pass: 100%/2   | Total:  1h 01m | Avg: 30m 58s | Max: 33m 02s | Hits:  99%/3824  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total:  1h 49m | Avg:  5m 45s | Max:  6m 16s | Hits: 100%/32521 
      🟩 GCC                Pass: 100%/22  | Total:  2h 39m | Avg:  7m 14s | Max:  8m 56s | Hits:  99%/34452 
      🟩 MSVC               Pass: 100%/7   | Total:  3h 20m | Avg: 28m 35s | Max: 32m 30s | Hits:  99%/13342 
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 01m | Avg: 30m 58s | Max: 33m 02s | Hits:  99%/3824  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 14m 17s | Avg:  7m 08s | Max:  7m 51s | Hits:  99%/1914  
      🟩 rtx2080            Pass: 100%/38  | Total:  6h 47m | Avg: 10m 42s | Max: 33m 02s | Hits:  99%/72672 
      🟩 rtx4090            Pass: 100%/10  | Total:  1h 49m | Avg: 10m 56s | Max: 32m 30s | Hits:  99%/9553  
    🟩 jobs
      🟩 Build              Pass: 100%/43  | Total:  7h 46m | Avg: 10m 50s | Max: 33m 02s | Hits:  99%/82233 
      🟩 TestCPU            Pass: 100%/3   | Total: 40m 44s | Avg: 13m 34s | Max: 32m 30s | Hits:  99%/1906  
      🟩 TestGPU            Pass: 100%/4   | Total: 24m 00s | Avg:  6m 00s | Max:  7m 51s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 14m 17s | Avg:  7m 08s | Max:  7m 51s | Hits:  99%/1914  
      🟩 90;90a             Pass: 100%/2   | Total: 34m 29s | Avg: 17m 14s | Max: 27m 34s | Hits:  99%/3820  
      🟩 100;120            Pass: 100%/2   | Total: 34m 16s | Avg: 17m 08s | Max: 27m 19s | Hits:  99%/3820  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total:  3h 49m | Avg: 10m 55s | Max: 29m 16s | Hits:  99%/40160 
      🟩 20                 Pass: 100%/27  | Total:  4h 47m | Avg: 10m 39s | Max: 33m 02s | Hits:  99%/42065 
    
  • 🟩 cudax: Pass: 100%/28 | Total: 2h 35m | Avg: 5m 32s | Max: 11m 52s | Hits: 98%/15342

    🟩 cpu
      🟩 amd64              Pass: 100%/24  | Total:  2h 20m | Avg:  5m 52s | Max: 11m 52s | Hits:  98%/12978 
      🟩 arm64              Pass: 100%/4   | Total: 14m 25s | Avg:  3m 36s | Max:  5m 40s | Hits:  97%/2364  
    🟩 ctk
      🟩 12.0               Pass: 100%/3   | Total: 18m 03s | Avg:  6m 01s | Max: 11m 25s | Hits:  98%/1470  
      🟩 12.9               Pass: 100%/25  | Total:  2h 17m | Avg:  5m 29s | Max: 11m 52s | Hits:  98%/13872 
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/3   | Total: 18m 03s | Avg:  6m 01s | Max: 11m 25s | Hits:  98%/1470  
      🟩 nvcc12.9           Pass: 100%/25  | Total:  2h 17m | Avg:  5m 29s | Max: 11m 52s | Hits:  98%/13872 
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/28  | Total:  2h 35m | Avg:  5m 32s | Max: 11m 52s | Hits:  98%/15342 
    🟩 cxx
      🟩 Clang14            Pass: 100%/2   | Total:  6m 23s | Avg:  3m 11s | Max:  3m 16s | Hits: 100%/1184  
      🟩 Clang15            Pass: 100%/1   | Total:  3m 32s | Avg:  3m 32s | Max:  3m 32s | Hits: 100%/591   
      🟩 Clang16            Pass: 100%/1   | Total:  3m 20s | Avg:  3m 20s | Max:  3m 20s | Hits: 100%/591   
      🟩 Clang17            Pass: 100%/1   | Total:  3m 14s | Avg:  3m 14s | Max:  3m 14s | Hits: 100%/591   
      🟩 Clang18            Pass: 100%/1   | Total:  3m 20s | Avg:  3m 20s | Max:  3m 20s | Hits: 100%/591   
      🟩 Clang19            Pass: 100%/4   | Total: 16m 10s | Avg:  4m 02s | Max:  7m 29s | Hits: 100%/2364  
      🟩 GCC10              Pass: 100%/2   | Total:  6m 59s | Avg:  3m 29s | Max:  3m 31s | Hits:  99%/1184  
      🟩 GCC11              Pass: 100%/1   | Total:  3m 39s | Avg:  3m 39s | Max:  3m 39s | Hits:  99%/591   
      🟩 GCC12              Pass: 100%/1   | Total:  3m 51s | Avg:  3m 51s | Max:  3m 51s | Hits:  99%/591   
      🟩 GCC13              Pass: 100%/8   | Total: 45m 01s | Avg:  5m 37s | Max: 11m 23s | Hits:  97%/4728  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 11m 25s | Avg: 11m 25s | Max: 11m 25s | Hits:  95%/288   
      🟩 MSVC14.43          Pass: 100%/3   | Total: 33m 48s | Avg: 11m 16s | Max: 11m 52s | Hits:  95%/870   
      🟩 NVHPC25.5          Pass: 100%/2   | Total: 14m 33s | Avg:  7m 16s | Max:  7m 18s | Hits:  97%/1178  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/10  | Total: 35m 59s | Avg:  3m 35s | Max:  7m 29s | Hits: 100%/5912  
      🟩 GCC                Pass: 100%/12  | Total: 59m 30s | Avg:  4m 57s | Max: 11m 23s | Hits:  98%/7094  
      🟩 MSVC               Pass: 100%/4   | Total: 45m 13s | Avg: 11m 18s | Max: 11m 52s | Hits:  95%/1158  
      🟩 NVHPC              Pass: 100%/2   | Total: 14m 33s | Avg:  7m 16s | Max:  7m 18s | Hits:  97%/1178  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 13m 10s | Avg:  6m 35s | Max:  9m 59s | Hits:  94%/1182  
      🟩 rtx2080            Pass: 100%/26  | Total:  2h 22m | Avg:  5m 27s | Max: 11m 52s | Hits:  98%/14160 
    🟩 jobs
      🟩 Build              Pass: 100%/25  | Total:  2h 06m | Avg:  5m 03s | Max: 11m 52s | Hits:  98%/13569 
      🟩 Test               Pass: 100%/3   | Total: 28m 51s | Avg:  9m 37s | Max: 11m 23s | Hits:  96%/1773  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 13m 10s | Avg:  6m 35s | Max:  9m 59s | Hits:  94%/1182  
      🟩 90;90a             Pass: 100%/2   | Total: 15m 13s | Avg:  7m 36s | Max: 11m 34s | Hits:  98%/881   
      🟩 100;120            Pass: 100%/2   | Total: 14m 09s | Avg:  7m 04s | Max: 10m 22s | Hits:  98%/881   
    🟩 std
      🟩 17                 Pass: 100%/3   | Total: 13m 18s | Avg:  4m 26s | Max:  7m 15s | Hits:  99%/1771  
      🟩 20                 Pass: 100%/25  | Total:  2h 21m | Avg:  5m 40s | Max: 11m 52s | Hits:  98%/13571 
    
  • 🟩 cccl_c_parallel: Pass: 100%/4 | Total: 2h 51m | Avg: 42m 47s | Max: 2h 13m | Hits: 98%/668

    🟩 cpu
      🟩 amd64              Pass: 100%/4   | Total:  2h 51m | Avg: 42m 47s | Max:  2h 13m | Hits:  98%/668   
    🟩 ctk
      🟩 12.9               Pass: 100%/4   | Total:  2h 51m | Avg: 42m 47s | Max:  2h 13m | Hits:  98%/668   
    🟩 cudacxx
      🟩 nvcc12.9           Pass: 100%/4   | Total:  2h 51m | Avg: 42m 47s | Max:  2h 13m | Hits:  98%/668   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total:  2h 51m | Avg: 42m 47s | Max:  2h 13m | Hits:  98%/668   
    🟩 cxx
      🟩 GCC13              Pass: 100%/4   | Total:  2h 51m | Avg: 42m 47s | Max:  2h 13m | Hits:  98%/668   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/4   | Total:  2h 51m | Avg: 42m 47s | Max:  2h 13m | Hits:  98%/668   
    🟩 gpu
      🟩 h100               Pass: 100%/1   | Total: 20m 03s | Avg: 20m 03s | Max: 20m 03s | Hits:  98%/167   
      🟩 l4                 Pass: 100%/1   | Total: 15m 58s | Avg: 15m 58s | Max: 15m 58s | Hits:  98%/167   
      🟩 rtx2080            Pass: 100%/2   | Total:  2h 15m | Avg:  1h 07m | Max:  2h 13m | Hits:  98%/334   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 03s | Avg:  2m 03s | Max:  2m 03s | Hits:  98%/167   
      🟩 Test               Pass: 100%/3   | Total:  2h 49m | Avg: 56m 22s | Max:  2h 13m | Hits:  98%/501   
    
  • 🟩 packaging: Pass: 100%/4 | Total: 31m 18s | Avg: 7m 49s | Max: 13m 28s

    🟩 cpu
      🟩 amd64              Pass: 100%/4   | Total: 31m 18s | Avg:  7m 49s | Max: 13m 28s
    🟩 ctk
      🟩 12.0               Pass: 100%/2   | Total: 15m 25s | Avg:  7m 42s | Max: 11m 46s
      🟩 12.9               Pass: 100%/2   | Total: 15m 53s | Avg:  7m 56s | Max: 13m 28s
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/2   | Total: 15m 25s | Avg:  7m 42s | Max: 11m 46s
      🟩 nvcc12.9           Pass: 100%/2   | Total: 15m 53s | Avg:  7m 56s | Max: 13m 28s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 31m 18s | Avg:  7m 49s | Max: 13m 28s
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total: 11m 46s | Avg: 11m 46s | Max: 11m 46s
      🟩 Clang19            Pass: 100%/1   | Total: 13m 28s | Avg: 13m 28s | Max: 13m 28s
      🟩 GCC12              Pass: 100%/1   | Total:  3m 39s | Avg:  3m 39s | Max:  3m 39s
      🟩 GCC13              Pass: 100%/1   | Total:  2m 25s | Avg:  2m 25s | Max:  2m 25s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/2   | Total: 25m 14s | Avg: 12m 37s | Max: 13m 28s
      🟩 GCC                Pass: 100%/2   | Total:  6m 04s | Avg:  3m 02s | Max:  3m 39s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 31m 18s | Avg:  7m 49s | Max: 13m 28s
    🟩 jobs
      🟩 Test               Pass: 100%/4   | Total: 31m 18s | Avg:  7m 49s | Max: 13m 28s
    
  • 🟩 stdpar: Pass: 100%/4 | Total: 16m 07s | Avg: 4m 01s | Max: 4m 27s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  8m 48s | Avg:  4m 24s | Max:  4m 27s
      🟩 arm64              Pass: 100%/2   | Total:  7m 19s | Avg:  3m 39s | Max:  3m 40s
    🟩 ctk
      🟩 12.9               Pass: 100%/4   | Total: 16m 07s | Avg:  4m 01s | Max:  4m 27s
    🟩 cudacxx
      🟩 nvcc12.9           Pass: 100%/4   | Total: 16m 07s | Avg:  4m 01s | Max:  4m 27s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 16m 07s | Avg:  4m 01s | Max:  4m 27s
    🟩 cxx
      🟩 NVHPC25.5          Pass: 100%/4   | Total: 16m 07s | Avg:  4m 01s | Max:  4m 27s
    🟩 cxx_family
      🟩 NVHPC              Pass: 100%/4   | Total: 16m 07s | Avg:  4m 01s | Max:  4m 27s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 16m 07s | Avg:  4m 01s | Max:  4m 27s
    🟩 jobs
      🟩 Build              Pass: 100%/4   | Total: 16m 07s | Avg:  4m 01s | Max:  4m 27s
    🟩 std
      🟩 17                 Pass: 100%/2   | Total:  8m 06s | Avg:  4m 03s | Max:  4m 27s
      🟩 20                 Pass: 100%/2   | Total:  8m 01s | Avg:  4m 00s | Max:  4m 21s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
CCCL Packaging
libcu++
+/- CUB
Thrust
CUDA Experimental
stdpar
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- CCCL Packaging
libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- stdpar
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 162)

# Runner
93 linux-amd64-cpu16
17 linux-amd64-gpu-l4-latest-1
17 windows-amd64-cpu16
10 linux-arm64-cpu16
9 linux-amd64-gpu-h100-latest-1
7 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
3 linux-amd64-gpu-rtx4090-latest-1

@elstehle elstehle enabled auto-merge (squash) August 12, 2025 12:46
@github-actions
Contributor

🟩 CI finished in 1h 51m: Pass: 100%/162 | Total: 3d 17h | Avg: 33m 05s | Max: 1h 49m | Hits: 76%/153019
  • 🟩 cub: Pass: 100%/50 | Total: 2d 00h | Avg: 58m 28s | Max: 1h 49m | Hits: 65%/52810

    🟩 cpu
      🟩 amd64              Pass: 100%/48  | Total:  1d 22h | Avg: 58m 21s | Max:  1h 49m | Hits:  65%/50238 
      🟩 arm64              Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 05m | Hits:  65%/2572  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  5h 52m | Avg:  1h 10m | Max:  1h 49m | Hits:  65%/6326  
      🟩 12.9               Pass: 100%/45  | Total:  1d 18h | Avg: 57m 08s | Max:  1h 48m | Hits:  65%/46484 
    🟩 cudacxx
      🟩 ClangCUDA19        Pass: 100%/2   | Total:  1h 05m | Avg: 32m 58s | Max: 33m 24s | Hits:  71%/2217  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  5h 52m | Avg:  1h 10m | Max:  1h 49m | Hits:  65%/6326  
      🟩 nvcc12.9           Pass: 100%/43  | Total:  1d 17h | Avg: 58m 15s | Max:  1h 48m | Hits:  65%/44267 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 05m | Avg: 32m 58s | Max: 33m 24s | Hits:  71%/2217  
      🟩 nvcc               Pass: 100%/48  | Total:  1d 23h | Avg: 59m 32s | Max:  1h 49m | Hits:  65%/50593 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  4h 01m | Avg:  1h 00m | Max:  1h 04m | Hits:  65%/5146  
      🟩 Clang15            Pass: 100%/2   | Total:  2h 02m | Avg:  1h 01m | Max:  1h 03m | Hits:  65%/2569  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 59m | Avg: 59m 30s | Max:  1h 00m | Hits:  65%/2569  
      🟩 Clang17            Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 09m | Hits:  65%/2569  
      🟩 Clang18            Pass: 100%/2   | Total:  1h 56m | Avg: 58m 27s | Max: 59m 47s | Hits:  65%/2569  
      🟩 Clang19            Pass: 100%/7   | Total:  4h 50m | Avg: 41m 28s | Max:  1h 02m | Hits:  67%/6071  
      🟩 GCC7               Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 06m | Hits:  65%/2572  
      🟩 GCC8               Pass: 100%/1   | Total:  1h 08m | Avg:  1h 08m | Max:  1h 08m | Hits:  65%/1286  
      🟩 GCC9               Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 09m | Hits:  65%/2572  
      🟩 GCC10              Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 12m | Hits:  65%/2573  
      🟩 GCC11              Pass: 100%/2   | Total:  2h 11m | Avg:  1h 05m | Max:  1h 07m | Hits:  65%/2569  
      🟩 GCC12              Pass: 100%/2   | Total:  2h 12m | Avg:  1h 06m | Max:  1h 08m | Hits:  65%/2569  
      🟩 GCC13              Pass: 100%/12  | Total:  7h 28m | Avg: 37m 22s | Max:  1h 05m | Hits:  65%/7719  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  3h 32m | Avg:  1h 46m | Max:  1h 49m | Hits:  65%/2362  
      🟩 MSVC14.43          Pass: 100%/4   | Total:  5h 59m | Avg:  1h 29m | Max:  1h 48m | Hits:  65%/4724  
      🟩 NVHPC25.5          Pass: 100%/2   | Total:  2h 32m | Avg:  1h 16m | Max:  1h 16m | Hits:  64%/2371  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total: 16h 59m | Avg: 53m 38s | Max:  1h 09m | Hits:  65%/21493 
      🟩 GCC                Pass: 100%/23  | Total: 19h 39m | Avg: 51m 17s | Max:  1h 12m | Hits:  65%/21860 
      🟩 MSVC               Pass: 100%/6   | Total:  9h 32m | Avg:  1h 35m | Max:  1h 49m | Hits:  65%/7086  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 32m | Avg:  1h 16m | Max:  1h 16m | Hits:  64%/2371  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 18m | Avg: 26m 07s | Max: 30m 29s | Hits:  65%/1287  
      🟩 rtx2080            Pass: 100%/39  | Total:  1d 19h | Avg:  1h 06m | Max:  1h 49m | Hits:  65%/48951 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 15m | Avg: 31m 57s | Max:  1h 04m | Hits:  65%/2572  
    🟩 jobs
      🟩 Build              Pass: 100%/42  | Total:  1d 21h | Avg:  1h 05m | Max:  1h 49m | Hits:  65%/52810 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 23m 34s | Avg: 23m 34s | Max: 23m 34s
      🟩 GraphCapture       Pass: 100%/1   | Total: 14m 34s | Avg: 14m 34s | Max: 14m 34s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 13m | Avg: 24m 34s | Max: 24m 57s
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 04m | Avg: 21m 37s | Max: 23m 24s
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 18m | Avg: 26m 07s | Max: 30m 29s | Hits:  65%/1287  
      🟩 90;90a             Pass: 100%/2   | Total:  2h 04m | Avg:  1h 02m | Max:  1h 18m | Hits:  65%/2468  
      🟩 100;120            Pass: 100%/2   | Total:  1h 59m | Avg: 59m 30s | Max:  1h 11m | Hits:  65%/2468  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total:  1d 00h | Avg:  1h 09m | Max:  1h 49m | Hits:  65%/26396 
      🟩 20                 Pass: 100%/29  | Total:  1d 00h | Avg: 50m 14s | Max:  1h 40m | Hits:  65%/26414 
    
  • 🟩 thrust: Pass: 100%/50 | Total: 1d 07h | Avg: 38m 15s | Max: 1h 26m | Hits: 80%/84139

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 41m 30s | Avg: 20m 45s | Max: 36m 20s | Hits:  80%/1914  
    🟩 cpu
      🟩 amd64              Pass: 100%/48  | Total:  1d 06h | Avg: 38m 20s | Max:  1h 26m | Hits:  80%/80312 
      🟩 arm64              Pass: 100%/2   | Total:  1h 12m | Avg: 36m 22s | Max: 38m 59s | Hits:  80%/3827  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 20m | Avg: 40m 06s | Max:  1h 05m | Hits:  79%/9560  
      🟩 12.9               Pass: 100%/45  | Total:  1d 04h | Avg: 38m 03s | Max:  1h 26m | Hits:  80%/74579 
    🟩 cudacxx
      🟩 ClangCUDA19        Pass: 100%/2   | Total: 54m 13s | Avg: 27m 06s | Max: 27m 09s | Hits:  80%/3826  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 20m | Avg: 40m 06s | Max:  1h 05m | Hits:  79%/9560  
      🟩 nvcc12.9           Pass: 100%/43  | Total:  1d 03h | Avg: 38m 34s | Max:  1h 26m | Hits:  80%/70753 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 54m 13s | Avg: 27m 06s | Max: 27m 09s | Hits:  80%/3826  
      🟩 nvcc               Pass: 100%/48  | Total:  1d 06h | Avg: 38m 43s | Max:  1h 26m | Hits:  80%/80313 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 26m | Avg: 36m 34s | Max: 42m 37s | Hits:  80%/7652  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 13m | Avg: 36m 45s | Max: 37m 54s | Hits:  80%/3826  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 16m | Avg: 38m 06s | Max: 38m 46s | Hits:  80%/3826  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 17m | Avg: 38m 49s | Max: 43m 13s | Hits:  80%/3826  
      🟩 Clang18            Pass: 100%/2   | Total:  1h 10m | Avg: 35m 12s | Max: 35m 24s | Hits:  80%/3826  
      🟩 Clang19            Pass: 100%/7   | Total:  2h 53m | Avg: 24m 47s | Max: 40m 23s | Hits:  80%/9565  
      🟩 GCC7               Pass: 100%/2   | Total:  1h 13m | Avg: 36m 59s | Max: 40m 08s | Hits:  80%/3828  
      🟩 GCC8               Pass: 100%/1   | Total: 41m 02s | Avg: 41m 02s | Max: 41m 02s | Hits:  80%/1914  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 18m | Avg: 39m 07s | Max: 41m 51s | Hits:  80%/3828  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 22m | Avg: 41m 06s | Max: 45m 04s | Hits:  80%/3828  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 23m | Avg: 41m 49s | Max: 43m 59s | Hits:  80%/3828  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 18m | Avg: 39m 14s | Max: 39m 32s | Hits:  80%/3828  
      🟩 GCC13              Pass: 100%/11  | Total:  4h 22m | Avg: 23m 49s | Max: 41m 24s | Hits:  80%/13398 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 13m | Avg:  1h 06m | Max:  1h 07m | Hits:  76%/3812  
      🟩 MSVC14.43          Pass: 100%/5   | Total:  4h 55m | Avg: 59m 00s | Max:  1h 11m | Hits:  80%/9530  
      🟩 NVHPC25.5          Pass: 100%/2   | Total:  2h 47m | Avg:  1h 23m | Max:  1h 26m | Hits:  70%/3824  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total: 10h 17m | Avg: 32m 30s | Max: 43m 13s | Hits:  80%/32521 
      🟩 GCC                Pass: 100%/22  | Total: 11h 39m | Avg: 31m 48s | Max: 45m 04s | Hits:  80%/34452 
      🟩 MSVC               Pass: 100%/7   | Total:  7h 08m | Avg:  1h 01m | Max:  1h 11m | Hits:  79%/13342 
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 47m | Avg:  1h 23m | Max:  1h 26m | Hits:  70%/3824  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 30m 53s | Avg: 15m 26s | Max: 23m 13s | Hits:  80%/1914  
      🟩 rtx2080            Pass: 100%/38  | Total:  1d 03h | Avg: 43m 14s | Max:  1h 26m | Hits:  79%/72672 
      🟩 rtx4090            Pass: 100%/10  | Total:  3h 59m | Avg: 23m 57s | Max:  1h 10m | Hits:  83%/9553  
    🟩 jobs
      🟩 Build              Pass: 100%/43  | Total:  1d 06h | Avg: 42m 58s | Max:  1h 26m | Hits:  79%/82233 
      🟩 TestCPU            Pass: 100%/3   | Total: 41m 22s | Avg: 13m 47s | Max: 33m 03s | Hits:  99%/1906  
      🟩 TestGPU            Pass: 100%/4   | Total: 23m 49s | Avg:  5m 57s | Max:  7m 40s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 30m 53s | Avg: 15m 26s | Max: 23m 13s | Hits:  80%/1914  
      🟩 90;90a             Pass: 100%/2   | Total:  1h 32m | Avg: 46m 24s | Max:  1h 02m | Hits:  78%/3820  
      🟩 100;120            Pass: 100%/2   | Total:  1h 26m | Avg: 43m 21s | Max: 57m 47s | Hits:  78%/3820  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total: 15h 36m | Avg: 44m 36s | Max:  1h 21m | Hits:  79%/40160 
      🟩 20                 Pass: 100%/27  | Total: 15h 35m | Avg: 34m 38s | Max:  1h 26m | Hits:  80%/42065 
    
  • 🟩 cudax: Pass: 100%/28 | Total: 3h 39m | Avg: 7m 49s | Max: 14m 33s | Hits: 89%/15390

    🟩 cpu
      🟩 amd64              Pass: 100%/24  | Total:  3h 17m | Avg:  8m 13s | Max: 14m 33s | Hits:  89%/13018 
      🟩 arm64              Pass: 100%/4   | Total: 21m 36s | Avg:  5m 24s | Max:  5m 57s | Hits:  89%/2372  
    🟩 ctk
      🟩 12.0               Pass: 100%/3   | Total: 24m 47s | Avg:  8m 15s | Max: 14m 33s | Hits:  87%/1474  
      🟩 12.9               Pass: 100%/25  | Total:  3h 14m | Avg:  7m 46s | Max: 14m 32s | Hits:  90%/13916 
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/3   | Total: 24m 47s | Avg:  8m 15s | Max: 14m 33s | Hits:  87%/1474  
      🟩 nvcc12.9           Pass: 100%/25  | Total:  3h 14m | Avg:  7m 46s | Max: 14m 32s | Hits:  90%/13916 
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/28  | Total:  3h 39m | Avg:  7m 49s | Max: 14m 33s | Hits:  89%/15390 
    🟩 cxx
      🟩 Clang14            Pass: 100%/2   | Total: 10m 13s | Avg:  5m 06s | Max:  5m 51s | Hits:  90%/1188  
      🟩 Clang15            Pass: 100%/1   | Total:  5m 42s | Avg:  5m 42s | Max:  5m 42s | Hits:  89%/593   
      🟩 Clang16            Pass: 100%/1   | Total:  5m 42s | Avg:  5m 42s | Max:  5m 42s | Hits:  89%/593   
      🟩 Clang17            Pass: 100%/1   | Total:  6m 09s | Avg:  6m 09s | Max:  6m 09s | Hits:  89%/593   
      🟩 Clang18            Pass: 100%/1   | Total:  5m 58s | Avg:  5m 58s | Max:  5m 58s | Hits:  89%/593   
      🟩 Clang19            Pass: 100%/4   | Total: 26m 07s | Avg:  6m 31s | Max: 10m 35s | Hits:  92%/2372  
      🟩 GCC10              Pass: 100%/2   | Total: 12m 34s | Avg:  6m 17s | Max:  6m 42s | Hits:  89%/1188  
      🟩 GCC11              Pass: 100%/1   | Total:  5m 49s | Avg:  5m 49s | Max:  5m 49s | Hits:  89%/593   
      🟩 GCC12              Pass: 100%/1   | Total:  6m 08s | Avg:  6m 08s | Max:  6m 08s | Hits:  89%/593   
      🟩 GCC13              Pass: 100%/8   | Total: 57m 00s | Avg:  7m 07s | Max: 11m 52s | Hits:  92%/4744  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 14m 33s | Avg: 14m 33s | Max: 14m 33s | Hits:  77%/288   
      🟩 MSVC14.43          Pass: 100%/3   | Total: 41m 47s | Avg: 13m 55s | Max: 14m 32s | Hits:  76%/870   
      🟩 NVHPC25.5          Pass: 100%/2   | Total: 21m 18s | Avg: 10m 39s | Max: 11m 00s | Hits:  87%/1182  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/10  | Total: 59m 51s | Avg:  5m 59s | Max: 10m 35s | Hits:  91%/5932  
      🟩 GCC                Pass: 100%/12  | Total:  1h 21m | Avg:  6m 47s | Max: 11m 52s | Hits:  91%/7118  
      🟩 MSVC               Pass: 100%/4   | Total: 56m 20s | Avg: 14m 05s | Max: 14m 33s | Hits:  76%/1158  
      🟩 NVHPC              Pass: 100%/2   | Total: 21m 18s | Avg: 10m 39s | Max: 11m 00s | Hits:  87%/1182  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 14m 15s | Avg:  7m 07s | Max:  8m 57s | Hits:  94%/1186  
      🟩 rtx2080            Pass: 100%/26  | Total:  3h 24m | Avg:  7m 52s | Max: 14m 33s | Hits:  89%/14204 
    🟩 jobs
      🟩 Build              Pass: 100%/25  | Total:  3h 07m | Avg:  7m 30s | Max: 14m 33s | Hits:  88%/13611 
      🟩 Test               Pass: 100%/3   | Total: 31m 24s | Avg: 10m 28s | Max: 11m 52s | Hits:  99%/1779  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 14m 15s | Avg:  7m 07s | Max:  8m 57s | Hits:  94%/1186  
      🟩 90;90a             Pass: 100%/2   | Total: 20m 24s | Avg: 10m 12s | Max: 13m 55s | Hits:  85%/883   
      🟩 100;120            Pass: 100%/2   | Total: 19m 19s | Avg:  9m 39s | Max: 13m 20s | Hits:  85%/883   
    🟩 std
      🟩 17                 Pass: 100%/3   | Total: 21m 08s | Avg:  7m 02s | Max: 10m 18s | Hits:  88%/1777  
      🟩 20                 Pass: 100%/25  | Total:  3h 17m | Avg:  7m 54s | Max: 14m 33s | Hits:  89%/13613 
    
  • 🟩 python: Pass: 100%/22 | Total: 3h 41m | Avg: 10m 02s | Max: 19m 34s

    🟩 cpu
      🟩 amd64              Pass: 100%/22  | Total:  3h 41m | Avg: 10m 02s | Max: 19m 34s
    🟩 ctk
      🟩 12.5               Pass: 100%/6   | Total: 42m 49s | Avg:  7m 08s | Max: 13m 51s
      🟩 12.8               Pass: 100%/2   | Total: 38m 23s | Avg: 19m 11s | Max: 19m 30s
      🟩 12.9               Pass: 100%/14  | Total:  2h 19m | Avg:  9m 59s | Max: 19m 34s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/6   | Total: 42m 49s | Avg:  7m 08s | Max: 13m 51s
      🟩 nvcc12.8           Pass: 100%/2   | Total: 38m 23s | Avg: 19m 11s | Max: 19m 30s
      🟩 nvcc12.9           Pass: 100%/14  | Total:  2h 19m | Avg:  9m 59s | Max: 19m 34s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/22  | Total:  3h 41m | Avg: 10m 02s | Max: 19m 34s
    🟩 cxx
      🟩 GCC13              Pass: 100%/22  | Total:  3h 41m | Avg: 10m 02s | Max: 19m 34s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/22  | Total:  3h 41m | Avg: 10m 02s | Max: 19m 34s
    🟩 gpu
      🟩 h100               Pass: 100%/4   | Total: 38m 24s | Avg:  9m 36s | Max: 16m 25s
      🟩 l4                 Pass: 100%/18  | Total:  3h 02m | Avg: 10m 08s | Max: 19m 34s
    🟩 jobs
      🟩 Build cuda.cccl    Pass: 100%/2   | Total: 19m 28s | Avg:  9m 44s | Max:  9m 50s
      🟩 Test cuda.cccl.cooperative Pass: 100%/5   | Total:  1h 06m | Avg: 13m 17s | Max: 13m 51s
      🟩 Test cuda.cccl.examples Pass: 100%/5   | Total: 23m 08s | Avg:  4m 37s | Max:  5m 49s
      🟩 Test cuda.cccl.headers Pass: 100%/5   | Total: 18m 53s | Avg:  3m 46s | Max:  4m 08s
      🟩 Test cuda.cccl.parallel Pass: 100%/5   | Total:  1h 33m | Avg: 18m 37s | Max: 19m 34s
    🟩 py_version
      🟩 3.10               Pass: 100%/9   | Total:  1h 31m | Avg: 10m 10s | Max: 19m 30s
      🟩 3.13               Pass: 100%/13  | Total:  2h 09m | Avg:  9m 57s | Max: 19m 34s
    
  • 🟩 cccl_c_parallel: Pass: 100%/4 | Total: 51m 21s | Avg: 12m 50s | Max: 17m 23s | Hits: 98%/680

    🟩 cpu
      🟩 amd64              Pass: 100%/4   | Total: 51m 21s | Avg: 12m 50s | Max: 17m 23s | Hits:  98%/680   
    🟩 ctk
      🟩 12.9               Pass: 100%/4   | Total: 51m 21s | Avg: 12m 50s | Max: 17m 23s | Hits:  98%/680   
    🟩 cudacxx
      🟩 nvcc12.9           Pass: 100%/4   | Total: 51m 21s | Avg: 12m 50s | Max: 17m 23s | Hits:  98%/680   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 51m 21s | Avg: 12m 50s | Max: 17m 23s | Hits:  98%/680   
    🟩 cxx
      🟩 GCC13              Pass: 100%/4   | Total: 51m 21s | Avg: 12m 50s | Max: 17m 23s | Hits:  98%/680   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/4   | Total: 51m 21s | Avg: 12m 50s | Max: 17m 23s | Hits:  98%/680   
    🟩 gpu
      🟩 h100               Pass: 100%/1   | Total: 16m 07s | Avg: 16m 07s | Max: 16m 07s | Hits:  98%/170   
      🟩 l4                 Pass: 100%/1   | Total: 17m 23s | Avg: 17m 23s | Max: 17m 23s | Hits:  98%/170   
      🟩 rtx2080            Pass: 100%/2   | Total: 17m 51s | Avg:  8m 55s | Max: 15m 19s | Hits:  97%/340   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 32s | Avg:  2m 32s | Max:  2m 32s | Hits:  95%/170   
      🟩 Test               Pass: 100%/3   | Total: 48m 49s | Avg: 16m 16s | Max: 17m 23s | Hits:  98%/510   
    
  • 🟩 packaging: Pass: 100%/4 | Total: 16m 37s | Avg: 4m 09s | Max: 4m 45s

    🟩 cpu
      🟩 amd64              Pass: 100%/4   | Total: 16m 37s | Avg:  4m 09s | Max:  4m 45s
    🟩 ctk
      🟩 12.0               Pass: 100%/2   | Total:  8m 30s | Avg:  4m 15s | Max:  4m 45s
      🟩 12.9               Pass: 100%/2   | Total:  8m 07s | Avg:  4m 03s | Max:  4m 05s
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/2   | Total:  8m 30s | Avg:  4m 15s | Max:  4m 45s
      🟩 nvcc12.9           Pass: 100%/2   | Total:  8m 07s | Avg:  4m 03s | Max:  4m 05s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 16m 37s | Avg:  4m 09s | Max:  4m 45s
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  3m 45s | Avg:  3m 45s | Max:  3m 45s
      🟩 Clang19            Pass: 100%/1   | Total:  4m 02s | Avg:  4m 02s | Max:  4m 02s
      🟩 GCC12              Pass: 100%/1   | Total:  4m 45s | Avg:  4m 45s | Max:  4m 45s
      🟩 GCC13              Pass: 100%/1   | Total:  4m 05s | Avg:  4m 05s | Max:  4m 05s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/2   | Total:  7m 47s | Avg:  3m 53s | Max:  4m 02s
      🟩 GCC                Pass: 100%/2   | Total:  8m 50s | Avg:  4m 25s | Max:  4m 45s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 16m 37s | Avg:  4m 09s | Max:  4m 45s
    🟩 jobs
      🟩 Test               Pass: 100%/4   | Total: 16m 37s | Avg:  4m 09s | Max:  4m 45s
    
  • 🟩 stdpar: Pass: 100%/4 | Total: 16m 49s | Avg: 4m 12s | Max: 4m 27s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  8m 36s | Avg:  4m 18s | Max:  4m 27s
      🟩 arm64              Pass: 100%/2   | Total:  8m 13s | Avg:  4m 06s | Max:  4m 14s
    🟩 ctk
      🟩 12.9               Pass: 100%/4   | Total: 16m 49s | Avg:  4m 12s | Max:  4m 27s
    🟩 cudacxx
      🟩 nvcc12.9           Pass: 100%/4   | Total: 16m 49s | Avg:  4m 12s | Max:  4m 27s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 16m 49s | Avg:  4m 12s | Max:  4m 27s
    🟩 cxx
      🟩 NVHPC25.5          Pass: 100%/4   | Total: 16m 49s | Avg:  4m 12s | Max:  4m 27s
    🟩 cxx_family
      🟩 NVHPC              Pass: 100%/4   | Total: 16m 49s | Avg:  4m 12s | Max:  4m 27s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 16m 49s | Avg:  4m 12s | Max:  4m 27s
    🟩 jobs
      🟩 Build              Pass: 100%/4   | Total: 16m 49s | Avg:  4m 12s | Max:  4m 27s
    🟩 std
      🟩 17                 Pass: 100%/2   | Total:  8m 23s | Avg:  4m 11s | Max:  4m 14s
      🟩 20                 Pass: 100%/2   | Total:  8m 26s | Avg:  4m 13s | Max:  4m 27s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
CCCL Packaging
libcu++
+/- CUB
Thrust
CUDA Experimental
stdpar
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- CCCL Packaging
libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- stdpar
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 162)

# Runner
93 linux-amd64-cpu16
17 linux-amd64-gpu-l4-latest-1
17 windows-amd64-cpu16
10 linux-arm64-cpu16
9 linux-amd64-gpu-h100-latest-1
7 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
3 linux-amd64-gpu-rtx4090-latest-1

@elstehle elstehle merged commit 5edf751 into NVIDIA:main Aug 12, 2025
172 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in CCCL Aug 12, 2025
shwina pushed a commit to shwina/cccl that referenced this pull request Aug 19, 2025
…A#5440)

* adds vsmem to reduce_by_key

* adds tests for vsmem

* fixes rle, which does not support vsmem yet

* addresses review comments
davebayer pushed a commit to davebayer/cccl that referenced this pull request Sep 23, 2025
…A#5440)

* adds vsmem to reduce_by_key

* adds tests for vsmem

* fixes rle, which does not support vsmem yet

* addresses review comments
bdice pushed a commit to bdice/cccl that referenced this pull request Nov 21, 2025
…A#5440)

* adds vsmem to reduce_by_key

* adds tests for vsmem

* fixes rle, which does not support vsmem yet

* addresses review comments
bernhardmgruber pushed a commit that referenced this pull request Nov 23, 2025
* Adds support for large number of items to `DeviceRunLengthEncode::NonTrivialRuns` (#5252)
* streaming non trivial runs
* change global offset computation
* fixes style
* integrate latest bench and test changes
* addresses review comments
* replaces getters with member var
* Add support for virtual shared memory to `DispatchReduceByKey` (#5440)
* adds vsmem to reduce_by_key
* adds tests for vsmem
* fixes rle, which does not support vsmem yet
* addresses review comments


* Fixes non-default-constructible iterators for large number of items types in `DeviceRunLengthEncode::Encode` (#6451)
* adds tests for non default constructible iterators
* fixes non default constructible iterators in rle
* Simplify generation of `streaming_context` for run_length_encode
* Reinstate regression test

* Revert test/benchmark changes

Co-authored-by: Elias Stehle <3958403+elstehle@users.noreply.github.com>
Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
