Fix atomic reduce for arches < 600 with dtype double #5428

Merged
NaderAlAwar merged 7 commits into NVIDIA:main from NaderAlAwar:atomic-reduce-old-arch-fix
Aug 8, 2025

@NaderAlAwar
Contributor

Description

closes #5427

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@NaderAlAwar NaderAlAwar requested a review from a team as a code owner August 4, 2025 23:00
@NaderAlAwar NaderAlAwar requested a review from elstehle August 4, 2025 23:00
@github-project-automation github-project-automation bot moved this to Todo in CCCL Aug 4, 2025
@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Review in CCCL Aug 4, 2025
@fbusato
Contributor

fbusato commented Aug 4, 2025

the second option is to use emulation, https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomic-functions, see __device__ double atomicAdd(double* address, double val)
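
For readers unfamiliar with the emulation mentioned above: the programming guide builds `atomicAdd` for `double` out of a 64-bit compare-and-swap loop (`atomicCAS` on the value's bit pattern). The block below is a host-side C++ analogue of that loop using `std::atomic`, offered as a minimal sketch only. It is not the code from this PR, and the names `to_bits`, `from_bits`, and `emulated_atomic_add` are hypothetical.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Reinterpret a double as its 64-bit pattern (memcpy avoids aliasing problems).
inline std::uint64_t to_bits(double v) {
  std::uint64_t bits;
  std::memcpy(&bits, &v, sizeof bits);
  return bits;
}

// Reinterpret a 64-bit pattern as a double.
inline double from_bits(std::uint64_t bits) {
  double v;
  std::memcpy(&v, &bits, sizeof v);
  return v;
}

// Hypothetical helper: atomically adds `val` to the double stored in `cell`
// and returns the previous value, mirroring the return convention of atomicAdd.
inline double emulated_atomic_add(std::atomic<std::uint64_t>& cell, double val) {
  std::uint64_t expected = cell.load();
  // On failure, compare_exchange_weak reloads `expected` with the current
  // contents, so the sum is recomputed from fresh data on every retry.
  while (!cell.compare_exchange_weak(expected, to_bits(from_bits(expected) + val))) {
  }
  return from_bits(expected);
}
```

The cell is stored as an integer so that the compare step works on bit patterns. On the device, the same structure would use `atomicCAS` together with `__double_as_longlong` and `__longlong_as_double`.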

@github-actions
Contributor

github-actions bot commented Aug 5, 2025

🟨 CI finished in 1h 51m: Pass: 93%/162 | Total: 3d 16h | Avg: 32m 40s | Max: 1h 36m | Hits: 77%/145529
  • 🟨 cub: Pass: 80%/50 | Total: 1d 23h | Avg: 57m 33s | Max: 1h 36m | Hits: 67%/45404

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  79%/48  | Total:  1d 22h | Avg: 57m 30s | Max:  1h 36m | Hits:  67%/42850 
      🟩 arm64              Pass: 100%/2   | Total:  1h 58m | Avg: 59m 06s | Max:  1h 02m | Hits:  66%/2554  
    🔍 cudacxx_family: nvcc 🔍
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 04m | Avg: 32m 28s | Max: 33m 20s | Hits:  72%/2203  
      🔍 nvcc               Pass:  79%/48  | Total:  1d 22h | Avg: 58m 36s | Max:  1h 36m | Hits:  66%/43201 
    🟨 cudacxx
      🟩 ClangCUDA19        Pass: 100%/2   | Total:  1h 04m | Avg: 32m 28s | Max: 33m 20s | Hits:  72%/2203  
      🟨 nvcc12.0           Pass:  80%/5   | Total:  5h 33m | Avg:  1h 06m | Max:  1h 36m | Hits:  66%/5109  
      🟨 nvcc12.9           Pass:  79%/43  | Total:  1d 17h | Avg: 57m 40s | Max:  1h 36m | Hits:  66%/38092 
    🟨 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 55m | Avg: 58m 55s | Max:  1h 06m | Hits:  67%/5110  
      🟩 Clang15            Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 02m | Hits:  67%/2551  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 57m | Avg: 58m 33s | Max: 58m 46s | Hits:  67%/2551  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 56m | Avg: 58m 05s | Max: 58m 14s | Hits:  67%/2551  
      🟩 Clang18            Pass: 100%/2   | Total:  1h 56m | Avg: 58m 04s | Max:  1h 00m | Hits:  67%/2551  
      🟨 Clang19            Pass:  85%/7   | Total:  4h 47m | Avg: 41m 02s | Max:  1h 01m | Hits:  69%/6030  
      🟩 GCC7               Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 04m | Hits:  66%/2554  
      🟩 GCC8               Pass: 100%/1   | Total:  1h 10m | Avg:  1h 10m | Max:  1h 10m | Hits:  66%/1277  
      🟩 GCC9               Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 08m | Hits:  66%/2554  
      🟩 GCC10              Pass: 100%/2   | Total:  2h 17m | Avg:  1h 08m | Max:  1h 09m | Hits:  66%/2555  
      🟩 GCC11              Pass: 100%/2   | Total:  2h 16m | Avg:  1h 08m | Max:  1h 09m | Hits:  66%/2551  
      🟩 GCC12              Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 09m | Hits:  66%/2551  
      🟨 GCC13              Pass:  75%/12  | Total:  7h 47m | Avg: 38m 59s | Max:  1h 08m | Hits:  66%/7665  
      🟥 MSVC14.29          Pass:   0%/2   | Total:  3h 11m | Avg:  1h 35m | Max:  1h 36m
      🟥 MSVC14.43          Pass:   0%/4   | Total:  5h 36m | Avg:  1h 24m | Max:  1h 36m
      🟩 NVHPC25.5          Pass: 100%/2   | Total:  2h 30m | Avg:  1h 15m | Max:  1h 20m | Hits:  66%/2353  
    🟨 cxx_family
      🟨 Clang              Pass:  94%/19  | Total: 16h 35m | Avg: 52m 24s | Max:  1h 06m | Hits:  67%/21344 
      🟨 GCC                Pass:  86%/23  | Total: 20h 03m | Avg: 52m 20s | Max:  1h 10m | Hits:  66%/21707 
      🟥 MSVC               Pass:   0%/6   | Total:  8h 48m | Avg:  1h 28m | Max:  1h 36m
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 30m | Avg:  1h 15m | Max:  1h 20m | Hits:  66%/2353  
    🟨 jobs
      🟨 Build              Pass:  85%/42  | Total:  1d 20h | Avg:  1h 03m | Max:  1h 36m | Hits:  67%/45404 
      🟥 DeviceLaunch       Pass:   0%/1   | Total: 22m 30s | Avg: 22m 30s | Max: 22m 30s
      🟩 GraphCapture       Pass: 100%/1   | Total: 15m 19s | Avg: 15m 19s | Max: 15m 19s
      🟥 HostLaunch         Pass:   0%/3   | Total:  1h 18m | Avg: 26m 09s | Max: 30m 01s
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 16m | Avg: 25m 31s | Max: 28m 48s
    🟨 ctk
      🟨 12.0               Pass:  80%/5   | Total:  5h 33m | Avg:  1h 06m | Max:  1h 36m | Hits:  66%/5109  
      🟨 12.9               Pass:  80%/45  | Total:  1d 18h | Avg: 56m 33s | Max:  1h 36m | Hits:  67%/40295 
    🟨 gpu
      🟨 h100               Pass:  66%/3   | Total:  1h 32m | Avg: 30m 56s | Max: 34m 00s | Hits:  66%/1278  
      🟨 rtx2080            Pass:  84%/39  | Total:  1d 18h | Avg:  1h 04m | Max:  1h 36m | Hits:  67%/41572 
      🟨 rtxa6000           Pass:  62%/8   | Total:  4h 14m | Avg: 31m 47s | Max:  1h 03m | Hits:  66%/2554  
    🟨 sm
      🟨 90                 Pass:  66%/3   | Total:  1h 32m | Avg: 30m 56s | Max: 34m 00s | Hits:  66%/1278  
      🟨 90;90a             Pass:  50%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 16m | Hits:  66%/1278  
      🟨 100;120            Pass:  50%/2   | Total:  1h 52m | Avg: 56m 12s | Max:  1h 07m | Hits:  66%/1278  
    🟨 std
      🟨 17                 Pass:  85%/21  | Total: 23h 39m | Avg:  1h 07m | Max:  1h 36m | Hits:  67%/22693 
      🟨 20                 Pass:  75%/29  | Total:  1d 00h | Avg: 50m 18s | Max:  1h 35m | Hits:  67%/22711 
    
  • 🟩 thrust: Pass: 100%/50 | Total: 1d 07h | Avg: 37m 46s | Max: 1h 12m | Hits: 80%/84139

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 47m 07s | Avg: 23m 33s | Max: 41m 41s | Hits:  80%/1914  
    🟩 cpu
      🟩 amd64              Pass: 100%/48  | Total:  1d 06h | Avg: 37m 52s | Max:  1h 12m | Hits:  80%/80312 
      🟩 arm64              Pass: 100%/2   | Total:  1h 10m | Avg: 35m 27s | Max: 38m 50s | Hits:  80%/3827  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 27m | Avg: 41m 31s | Max:  1h 08m | Hits:  79%/9560  
      🟩 12.9               Pass: 100%/45  | Total:  1d 04h | Avg: 37m 22s | Max:  1h 12m | Hits:  80%/74579 
    🟩 cudacxx
      🟩 ClangCUDA19        Pass: 100%/2   | Total: 59m 36s | Avg: 29m 48s | Max: 30m 12s | Hits:  80%/3826  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 27m | Avg: 41m 31s | Max:  1h 08m | Hits:  79%/9560  
      🟩 nvcc12.9           Pass: 100%/43  | Total:  1d 03h | Avg: 37m 43s | Max:  1h 12m | Hits:  80%/70753 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 59m 36s | Avg: 29m 48s | Max: 30m 12s | Hits:  80%/3826  
      🟩 nvcc               Pass: 100%/48  | Total:  1d 06h | Avg: 38m 06s | Max:  1h 12m | Hits:  80%/80313 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 21m | Avg: 35m 22s | Max: 39m 39s | Hits:  80%/7652  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 13m | Avg: 36m 42s | Max: 37m 54s | Hits:  80%/3826  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 17m | Avg: 38m 53s | Max: 43m 31s | Hits:  80%/3826  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 13m | Avg: 36m 36s | Max: 37m 57s | Hits:  80%/3826  
      🟩 Clang18            Pass: 100%/2   | Total:  1h 15m | Avg: 37m 57s | Max: 40m 05s | Hits:  80%/3826  
      🟩 Clang19            Pass: 100%/7   | Total:  3h 01m | Avg: 25m 53s | Max: 40m 43s | Hits:  80%/9565  
      🟩 GCC7               Pass: 100%/2   | Total:  1h 16m | Avg: 38m 23s | Max: 40m 55s | Hits:  80%/3828  
      🟩 GCC8               Pass: 100%/1   | Total: 38m 24s | Avg: 38m 24s | Max: 38m 24s | Hits:  80%/1914  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 21m | Avg: 40m 40s | Max: 42m 29s | Hits:  80%/3828  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 18m | Avg: 39m 22s | Max: 42m 09s | Hits:  80%/3828  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 18m | Avg: 39m 17s | Max: 40m 50s | Hits:  80%/3828  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 18m | Avg: 39m 07s | Max: 39m 29s | Hits:  80%/3828  
      🟩 GCC13              Pass: 100%/11  | Total:  4h 29m | Avg: 24m 28s | Max: 41m 51s | Hits:  80%/13398 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 17m | Avg:  1h 08m | Max:  1h 09m | Hits:  76%/3812  
      🟩 MSVC14.43          Pass: 100%/5   | Total:  4h 53m | Avg: 58m 46s | Max:  1h 12m | Hits:  80%/9530  
      🟩 NVHPC25.5          Pass: 100%/2   | Total:  2h 13m | Avg:  1h 06m | Max:  1h 09m | Hits:  76%/3824  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total: 10h 22m | Avg: 32m 47s | Max: 43m 31s | Hits:  80%/32521 
      🟩 GCC                Pass: 100%/22  | Total: 11h 41m | Avg: 31m 52s | Max: 42m 29s | Hits:  80%/34452 
      🟩 MSVC               Pass: 100%/7   | Total:  7h 11m | Avg:  1h 01m | Max:  1h 12m | Hits:  79%/13342 
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 13m | Avg:  1h 06m | Max:  1h 09m | Hits:  76%/3824  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 29m 23s | Avg: 14m 41s | Max: 20m 27s | Hits:  80%/1914  
      🟩 rtx2080            Pass: 100%/38  | Total:  1d 02h | Avg: 42m 15s | Max:  1h 12m | Hits:  79%/72672 
      🟩 rtx4090            Pass: 100%/10  | Total:  4h 14m | Avg: 25m 24s | Max:  1h 11m | Hits:  83%/9553  
    🟩 jobs
      🟩 Build              Pass: 100%/43  | Total:  1d 06h | Avg: 42m 18s | Max:  1h 12m | Hits:  79%/82233 
      🟩 TestCPU            Pass: 100%/3   | Total: 44m 13s | Avg: 14m 44s | Max: 36m 12s | Hits:  99%/1906  
      🟩 TestGPU            Pass: 100%/4   | Total: 25m 54s | Avg:  6m 28s | Max:  8m 56s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 29m 23s | Avg: 14m 41s | Max: 20m 27s | Hits:  80%/1914  
      🟩 90;90a             Pass: 100%/2   | Total:  1h 29m | Avg: 44m 57s | Max: 58m 14s | Hits:  78%/3820  
      🟩 100;120            Pass: 100%/2   | Total:  1h 25m | Avg: 42m 32s | Max: 55m 27s | Hits:  78%/3820  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total: 15h 36m | Avg: 44m 35s | Max:  1h 12m | Hits:  79%/40160 
      🟩 20                 Pass: 100%/27  | Total: 15h 05m | Avg: 33m 32s | Max:  1h 11m | Hits:  80%/42065 
    
  • 🟩 cudax: Pass: 100%/28 | Total: 3h 34m | Avg: 7m 39s | Max: 14m 23s | Hits: 89%/15318

    🟩 cpu
      🟩 amd64              Pass: 100%/24  | Total:  3h 11m | Avg:  7m 59s | Max: 14m 23s | Hits:  89%/12958 
      🟩 arm64              Pass: 100%/4   | Total: 22m 30s | Avg:  5m 37s | Max:  5m 58s | Hits:  89%/2360  
    🟩 ctk
      🟩 12.0               Pass: 100%/3   | Total: 24m 23s | Avg:  8m 07s | Max: 14m 05s | Hits:  87%/1468  
      🟩 12.9               Pass: 100%/25  | Total:  3h 09m | Avg:  7m 35s | Max: 14m 23s | Hits:  89%/13850 
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/3   | Total: 24m 23s | Avg:  8m 07s | Max: 14m 05s | Hits:  87%/1468  
      🟩 nvcc12.9           Pass: 100%/25  | Total:  3h 09m | Avg:  7m 35s | Max: 14m 23s | Hits:  89%/13850 
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/28  | Total:  3h 34m | Avg:  7m 39s | Max: 14m 23s | Hits:  89%/15318 
    🟩 cxx
      🟩 Clang14            Pass: 100%/2   | Total: 10m 59s | Avg:  5m 29s | Max:  6m 26s | Hits:  90%/1182  
      🟩 Clang15            Pass: 100%/1   | Total:  5m 43s | Avg:  5m 43s | Max:  5m 43s | Hits:  89%/590   
      🟩 Clang16            Pass: 100%/1   | Total:  5m 47s | Avg:  5m 47s | Max:  5m 47s | Hits:  89%/590   
      🟩 Clang17            Pass: 100%/1   | Total:  5m 46s | Avg:  5m 46s | Max:  5m 46s | Hits:  89%/590   
      🟩 Clang18            Pass: 100%/1   | Total:  5m 59s | Avg:  5m 59s | Max:  5m 59s | Hits:  89%/590   
      🟩 Clang19            Pass: 100%/4   | Total: 24m 28s | Avg:  6m 07s | Max:  8m 14s | Hits:  92%/2360  
      🟩 GCC10              Pass: 100%/2   | Total: 11m 59s | Avg:  5m 59s | Max:  6m 14s | Hits:  89%/1182  
      🟩 GCC11              Pass: 100%/1   | Total:  5m 59s | Avg:  5m 59s | Max:  5m 59s | Hits:  89%/590   
      🟩 GCC12              Pass: 100%/1   | Total:  7m 23s | Avg:  7m 23s | Max:  7m 23s | Hits:  89%/590   
      🟩 GCC13              Pass: 100%/8   | Total: 54m 16s | Avg:  6m 47s | Max:  9m 40s | Hits:  92%/4720  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 14m 05s | Avg: 14m 05s | Max: 14m 05s | Hits:  77%/288   
      🟩 MSVC14.43          Pass: 100%/3   | Total: 40m 47s | Avg: 13m 35s | Max: 14m 23s | Hits:  76%/870   
      🟩 NVHPC25.5          Pass: 100%/2   | Total: 21m 05s | Avg: 10m 32s | Max: 10m 52s | Hits:  87%/1176  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/10  | Total: 58m 42s | Avg:  5m 52s | Max:  8m 14s | Hits:  90%/5902  
      🟩 GCC                Pass: 100%/12  | Total:  1h 19m | Avg:  6m 38s | Max:  9m 40s | Hits:  91%/7082  
      🟩 MSVC               Pass: 100%/4   | Total: 54m 52s | Avg: 13m 43s | Max: 14m 23s | Hits:  76%/1158  
      🟩 NVHPC              Pass: 100%/2   | Total: 21m 05s | Avg: 10m 32s | Max: 10m 52s | Hits:  87%/1176  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 13m 45s | Avg:  6m 52s | Max:  8m 59s | Hits:  94%/1180  
      🟩 rtx2080            Pass: 100%/26  | Total:  3h 20m | Avg:  7m 42s | Max: 14m 23s | Hits:  89%/14138 
    🟩 jobs
      🟩 Build              Pass: 100%/25  | Total:  3h 07m | Avg:  7m 29s | Max: 14m 23s | Hits:  88%/13548 
      🟩 Test               Pass: 100%/3   | Total: 26m 53s | Avg:  8m 57s | Max:  9m 40s | Hits:  99%/1770  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 13m 45s | Avg:  6m 52s | Max:  8m 59s | Hits:  94%/1180  
      🟩 90;90a             Pass: 100%/2   | Total: 18m 36s | Avg:  9m 18s | Max: 12m 31s | Hits:  85%/880   
      🟩 100;120            Pass: 100%/2   | Total: 20m 05s | Avg: 10m 02s | Max: 13m 53s | Hits:  85%/880   
    🟩 std
      🟩 17                 Pass: 100%/3   | Total: 21m 16s | Avg:  7m 05s | Max: 10m 13s | Hits:  88%/1768  
      🟩 20                 Pass: 100%/25  | Total:  3h 13m | Avg:  7m 43s | Max: 14m 23s | Hits:  89%/13550 
    
  • 🟩 python: Pass: 100%/22 | Total: 3h 45m | Avg: 10m 16s | Max: 20m 44s

    🟩 cpu
      🟩 amd64              Pass: 100%/22  | Total:  3h 45m | Avg: 10m 16s | Max: 20m 44s
    🟩 ctk
      🟩 12.5               Pass: 100%/6   | Total: 43m 48s | Avg:  7m 18s | Max: 13m 49s
      🟩 12.8               Pass: 100%/2   | Total: 37m 04s | Avg: 18m 32s | Max: 18m 50s
      🟩 12.9               Pass: 100%/14  | Total:  2h 25m | Avg: 10m 21s | Max: 20m 44s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/6   | Total: 43m 48s | Avg:  7m 18s | Max: 13m 49s
      🟩 nvcc12.8           Pass: 100%/2   | Total: 37m 04s | Avg: 18m 32s | Max: 18m 50s
      🟩 nvcc12.9           Pass: 100%/14  | Total:  2h 25m | Avg: 10m 21s | Max: 20m 44s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/22  | Total:  3h 45m | Avg: 10m 16s | Max: 20m 44s
    🟩 cxx
      🟩 GCC13              Pass: 100%/22  | Total:  3h 45m | Avg: 10m 16s | Max: 20m 44s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/22  | Total:  3h 45m | Avg: 10m 16s | Max: 20m 44s
    🟩 gpu
      🟩 h100               Pass: 100%/4   | Total: 42m 32s | Avg: 10m 38s | Max: 20m 44s
      🟩 l4                 Pass: 100%/18  | Total:  3h 03m | Avg: 10m 11s | Max: 19m 44s
    🟩 jobs
      🟩 Build cuda.cccl    Pass: 100%/2   | Total: 19m 28s | Avg:  9m 44s | Max:  9m 55s
      🟩 Test cuda.cccl.cooperative Pass: 100%/5   | Total:  1h 07m | Avg: 13m 35s | Max: 14m 05s
      🟩 Test cuda.cccl.examples Pass: 100%/5   | Total: 21m 36s | Avg:  4m 19s | Max:  4m 33s
      🟩 Test cuda.cccl.headers Pass: 100%/5   | Total: 20m 49s | Avg:  4m 09s | Max:  5m 03s
      🟩 Test cuda.cccl.parallel Pass: 100%/5   | Total:  1h 36m | Avg: 19m 12s | Max: 20m 44s
    🟩 py_version
      🟩 3.10               Pass: 100%/9   | Total:  1h 33m | Avg: 10m 22s | Max: 19m 44s
      🟩 3.13               Pass: 100%/13  | Total:  2h 12m | Avg: 10m 11s | Max: 20m 44s
    
  • 🟩 cccl_c_parallel: Pass: 100%/4 | Total: 52m 36s | Avg: 13m 09s | Max: 19m 57s | Hits: 98%/668

    🟩 cpu
      🟩 amd64              Pass: 100%/4   | Total: 52m 36s | Avg: 13m 09s | Max: 19m 57s | Hits:  98%/668   
    🟩 ctk
      🟩 12.9               Pass: 100%/4   | Total: 52m 36s | Avg: 13m 09s | Max: 19m 57s | Hits:  98%/668   
    🟩 cudacxx
      🟩 nvcc12.9           Pass: 100%/4   | Total: 52m 36s | Avg: 13m 09s | Max: 19m 57s | Hits:  98%/668   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 52m 36s | Avg: 13m 09s | Max: 19m 57s | Hits:  98%/668   
    🟩 cxx
      🟩 GCC13              Pass: 100%/4   | Total: 52m 36s | Avg: 13m 09s | Max: 19m 57s | Hits:  98%/668   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/4   | Total: 52m 36s | Avg: 13m 09s | Max: 19m 57s | Hits:  98%/668   
    🟩 gpu
      🟩 h100               Pass: 100%/1   | Total: 19m 57s | Avg: 19m 57s | Max: 19m 57s | Hits:  98%/167   
      🟩 l4                 Pass: 100%/1   | Total: 16m 31s | Avg: 16m 31s | Max: 16m 31s | Hits:  98%/167   
      🟩 rtx2080            Pass: 100%/2   | Total: 16m 08s | Avg:  8m 04s | Max: 13m 39s | Hits:  97%/334   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 29s | Avg:  2m 29s | Max:  2m 29s | Hits:  95%/167   
      🟩 Test               Pass: 100%/3   | Total: 50m 07s | Avg: 16m 42s | Max: 19m 57s | Hits:  98%/501   
    
  • 🟩 packaging: Pass: 100%/4 | Total: 16m 04s | Avg: 4m 01s | Max: 5m 52s

    🟩 cpu
      🟩 amd64              Pass: 100%/4   | Total: 16m 04s | Avg:  4m 01s | Max:  5m 52s
    🟩 ctk
      🟩 12.0               Pass: 100%/2   | Total:  6m 36s | Avg:  3m 18s | Max:  3m 23s
      🟩 12.9               Pass: 100%/2   | Total:  9m 28s | Avg:  4m 44s | Max:  5m 52s
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/2   | Total:  6m 36s | Avg:  3m 18s | Max:  3m 23s
      🟩 nvcc12.9           Pass: 100%/2   | Total:  9m 28s | Avg:  4m 44s | Max:  5m 52s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 16m 04s | Avg:  4m 01s | Max:  5m 52s
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  3m 13s | Avg:  3m 13s | Max:  3m 13s
      🟩 Clang19            Pass: 100%/1   | Total:  3m 36s | Avg:  3m 36s | Max:  3m 36s
      🟩 GCC12              Pass: 100%/1   | Total:  3m 23s | Avg:  3m 23s | Max:  3m 23s
      🟩 GCC13              Pass: 100%/1   | Total:  5m 52s | Avg:  5m 52s | Max:  5m 52s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/2   | Total:  6m 49s | Avg:  3m 24s | Max:  3m 36s
      🟩 GCC                Pass: 100%/2   | Total:  9m 15s | Avg:  4m 37s | Max:  5m 52s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 16m 04s | Avg:  4m 01s | Max:  5m 52s
    🟩 jobs
      🟩 Test               Pass: 100%/4   | Total: 16m 04s | Avg:  4m 01s | Max:  5m 52s
    
  • 🟩 stdpar: Pass: 100%/4 | Total: 16m 57s | Avg: 4m 14s | Max: 4m 20s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  8m 38s | Avg:  4m 19s | Max:  4m 20s
      🟩 arm64              Pass: 100%/2   | Total:  8m 19s | Avg:  4m 09s | Max:  4m 10s
    🟩 ctk
      🟩 12.9               Pass: 100%/4   | Total: 16m 57s | Avg:  4m 14s | Max:  4m 20s
    🟩 cudacxx
      🟩 nvcc12.9           Pass: 100%/4   | Total: 16m 57s | Avg:  4m 14s | Max:  4m 20s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 16m 57s | Avg:  4m 14s | Max:  4m 20s
    🟩 cxx
      🟩 NVHPC25.5          Pass: 100%/4   | Total: 16m 57s | Avg:  4m 14s | Max:  4m 20s
    🟩 cxx_family
      🟩 NVHPC              Pass: 100%/4   | Total: 16m 57s | Avg:  4m 14s | Max:  4m 20s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 16m 57s | Avg:  4m 14s | Max:  4m 20s
    🟩 jobs
      🟩 Build              Pass: 100%/4   | Total: 16m 57s | Avg:  4m 14s | Max:  4m 20s
    🟩 std
      🟩 17                 Pass: 100%/2   | Total:  8m 30s | Avg:  4m 15s | Max:  4m 20s
      🟩 20                 Pass: 100%/2   | Total:  8m 27s | Avg:  4m 13s | Max:  4m 18s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
CCCL Packaging
libcu++
+/- CUB
Thrust
CUDA Experimental
stdpar
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- CCCL Packaging
libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- stdpar
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 162)

# Runner
93 linux-amd64-cpu16
17 linux-amd64-gpu-l4-latest-1
17 windows-amd64-cpu16
10 linux-arm64-cpu16
9 linux-amd64-gpu-h100-latest-1
7 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
3 linux-amd64-gpu-rtx4090-latest-1

…ing for architecture at compile time to decide to fall back or not will not work because arch macros are 0 in host code
@NaderAlAwar
Contributor Author

> the second option is to use emulation, https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomic-functions, see __device__ double atomicAdd(double* address, double val)

@fbusato I ended up going with this approach. Falling back to another implementation in device_reduce.cuh with my previous approach did not work because the arch macros don't work in host code.
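
A small sketch of why a compile-time architecture check cannot drive a host-side fallback. `__CUDA_ARCH__` is only defined while the CUDA compiler builds device code, so a wrapper macro that maps "undefined" to 0 always reports 0 in host code. `EXAMPLE_PTX_ARCH` and `host_selects_atomic_path` are hypothetical names for illustration and are not the macros or functions CUB uses.

```cpp
// __CUDA_ARCH__ exists only during device compilation passes; in host code
// (or with a plain C++ compiler) it is undefined.
#if defined(__CUDA_ARCH__)
#  define EXAMPLE_PTX_ARCH __CUDA_ARCH__
#else
#  define EXAMPLE_PTX_ARCH 0
#endif

// Hypothetical host-side dispatch check: "does the target support native
// atomicAdd on double (compute capability 6.0 and newer)?"
// Evaluated in host code, the macro is 0, so the answer is the same for every
// GPU and carries no information about the device that will run the kernel.
constexpr bool host_selects_atomic_path() {
  return EXAMPLE_PTX_ARCH >= 600;
}
```

Because the host sees the same constant for every target, the decision has to be made either inside device code, where the macro is meaningful, or at run time from a queried device property.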

@fbusato
Contributor

fbusato commented Aug 5, 2025

> I ended up going with this approach. Falling back to another implementation in device_reduce.cuh with my previous approach did not work because the arch macros don't work in host code.

@NaderAlAwar be careful about NaNs if you want to go with this path😄

@github-actions
Contributor

github-actions bot commented Aug 5, 2025

🟩 CI finished in 1h 52m: Pass: 100%/162 | Total: 1d 22h | Avg: 17m 16s | Max: 1h 50m | Hits: 91%/152477
  • 🟩 cub: Pass: 100%/50 | Total: 22h 24m | Avg: 26m 53s | Max: 1h 50m | Hits: 88%/52268

    🟩 cpu
      🟩 amd64              Pass: 100%/48  | Total: 21h 18m | Avg: 26m 37s | Max:  1h 50m | Hits:  88%/49722 
      🟩 arm64              Pass: 100%/2   | Total:  1h 06m | Avg: 33m 21s | Max:  1h 00m | Hits:  83%/2546  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 56m | Avg: 23m 20s | Max:  1h 27m | Hits:  92%/6261  
      🟩 12.9               Pass: 100%/45  | Total: 20h 28m | Avg: 27m 17s | Max:  1h 50m | Hits:  88%/46007 
    🟩 cudacxx
      🟩 ClangCUDA19        Pass: 100%/2   | Total:  1h 04m | Avg: 32m 18s | Max: 33m 24s | Hits:  68%/2195  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 56m | Avg: 23m 20s | Max:  1h 27m | Hits:  92%/6261  
      🟩 nvcc12.9           Pass: 100%/43  | Total: 19h 23m | Avg: 27m 03s | Max:  1h 50m | Hits:  89%/43812 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 04m | Avg: 32m 18s | Max: 33m 24s | Hits:  68%/2195  
      🟩 nvcc               Pass: 100%/48  | Total: 21h 20m | Avg: 26m 40s | Max:  1h 50m | Hits:  89%/50073 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 27m 12s | Avg:  6m 48s | Max:  7m 43s | Hits:  99%/5094  
      🟩 Clang15            Pass: 100%/2   | Total: 14m 02s | Avg:  7m 01s | Max:  7m 12s | Hits:  99%/2543  
      🟩 Clang16            Pass: 100%/2   | Total: 13m 59s | Avg:  6m 59s | Max:  7m 06s | Hits:  99%/2543  
      🟩 Clang17            Pass: 100%/2   | Total: 14m 03s | Avg:  7m 01s | Max:  7m 05s | Hits:  99%/2543  
      🟩 Clang18            Pass: 100%/2   | Total: 13m 23s | Avg:  6m 41s | Max:  6m 44s | Hits:  99%/2543  
      🟩 Clang19            Pass: 100%/7   | Total:  2h 14m | Avg: 19m 10s | Max: 33m 24s | Hits:  88%/6010  
      🟩 GCC7               Pass: 100%/2   | Total: 16m 45s | Avg:  8m 22s | Max:  8m 36s | Hits:  99%/2546  
      🟩 GCC8               Pass: 100%/1   | Total:  8m 37s | Avg:  8m 37s | Max:  8m 37s | Hits:  99%/1273  
      🟩 GCC9               Pass: 100%/2   | Total: 18m 31s | Avg:  9m 15s | Max: 10m 14s | Hits:  99%/2546  
      🟩 GCC10              Pass: 100%/2   | Total: 18m 10s | Avg:  9m 05s | Max:  9m 09s | Hits:  99%/2547  
      🟩 GCC11              Pass: 100%/2   | Total: 17m 59s | Avg:  8m 59s | Max:  9m 05s | Hits:  99%/2543  
      🟩 GCC12              Pass: 100%/2   | Total: 19m 09s | Avg:  9m 34s | Max: 10m 06s | Hits:  99%/2543  
      🟩 GCC13              Pass: 100%/12  | Total:  5h 52m | Avg: 29m 20s | Max:  1h 00m | Hits:  77%/7641  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  3h 03m | Avg:  1h 31m | Max:  1h 35m | Hits:  63%/2336  
      🟩 MSVC14.43          Pass: 100%/4   | Total:  5h 49m | Avg:  1h 27m | Max:  1h 50m | Hits:  63%/4672  
      🟩 NVHPC25.5          Pass: 100%/2   | Total:  2h 23m | Avg:  1h 11m | Max:  1h 12m | Hits:  62%/2345  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total:  3h 36m | Avg: 11m 24s | Max: 33m 24s | Hits:  96%/21276 
      🟩 GCC                Pass: 100%/23  | Total:  7h 31m | Avg: 19m 37s | Max:  1h 00m | Hits:  91%/21639 
      🟩 MSVC               Pass: 100%/6   | Total:  8h 53m | Avg:  1h 28m | Max:  1h 50m | Hits:  63%/7008  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 23m | Avg:  1h 11m | Max:  1h 12m | Hits:  62%/2345  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 30m | Avg: 30m 11s | Max: 31m 21s | Hits:  66%/1274  
      🟩 rtx2080            Pass: 100%/39  | Total: 18h 22m | Avg: 28m 16s | Max:  1h 50m | Hits:  88%/48448 
      🟩 rtxa6000           Pass: 100%/8   | Total:  2h 31m | Avg: 18m 56s | Max: 24m 51s | Hits:  99%/2546  
    🟩 jobs
      🟩 Build              Pass: 100%/42  | Total: 19h 11m | Avg: 27m 25s | Max:  1h 50m | Hits:  88%/52268 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 23m 44s | Avg: 23m 44s | Max: 23m 44s
      🟩 GraphCapture       Pass: 100%/1   | Total: 15m 19s | Avg: 15m 19s | Max: 15m 19s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 18m | Avg: 26m 17s | Max: 30m 19s
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 15m | Avg: 25m 06s | Max: 28m 53s
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 30m | Avg: 30m 11s | Max: 31m 21s | Hits:  66%/1274  
      🟩 90;90a             Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 15m | Hits:  65%/2442  
      🟩 100;120            Pass: 100%/2   | Total:  1h 55m | Avg: 57m 41s | Max:  1h 10m | Hits:  65%/2442  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total:  8h 49m | Avg: 25m 12s | Max:  1h 50m | Hits:  91%/26125 
      🟩 20                 Pass: 100%/29  | Total: 13h 35m | Avg: 28m 07s | Max:  1h 33m | Hits:  85%/26143 
    
  • 🟩 thrust: Pass: 100%/50 | Total: 15h 56m | Avg: 19m 08s | Max: 1h 14m | Hits: 92%/84139

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 13m 15s | Avg:  6m 37s | Max:  7m 52s | Hits:  99%/1914  
    🟩 cpu
      🟩 amd64              Pass: 100%/48  | Total: 15h 44m | Avg: 19m 40s | Max:  1h 14m | Hits:  92%/80312 
      🟩 arm64              Pass: 100%/2   | Total: 12m 03s | Avg:  6m 01s | Max:  6m 55s | Hits:  99%/3827  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 27m | Avg: 17m 26s | Max:  1h 01m | Hits:  95%/9560  
      🟩 12.9               Pass: 100%/45  | Total: 14h 29m | Avg: 19m 19s | Max:  1h 14m | Hits:  92%/74579 
    🟩 cudacxx
      🟩 ClangCUDA19        Pass: 100%/2   | Total: 53m 06s | Avg: 26m 33s | Max: 27m 45s | Hits:  80%/3826  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 27m | Avg: 17m 26s | Max:  1h 01m | Hits:  95%/9560  
      🟩 nvcc12.9           Pass: 100%/43  | Total: 13h 36m | Avg: 18m 59s | Max:  1h 14m | Hits:  92%/70753 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 53m 06s | Avg: 26m 33s | Max: 27m 45s | Hits:  80%/3826  
      🟩 nvcc               Pass: 100%/48  | Total: 15h 03m | Avg: 18m 49s | Max:  1h 14m | Hits:  92%/80313 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 23m 26s | Avg:  5m 51s | Max:  6m 14s | Hits: 100%/7652  
      🟩 Clang15            Pass: 100%/2   | Total: 11m 55s | Avg:  5m 57s | Max:  6m 04s | Hits: 100%/3826  
      🟩 Clang16            Pass: 100%/2   | Total: 11m 45s | Avg:  5m 52s | Max:  5m 56s | Hits: 100%/3826  
      🟩 Clang17            Pass: 100%/2   | Total: 12m 01s | Avg:  6m 00s | Max:  6m 11s | Hits: 100%/3826  
      🟩 Clang18            Pass: 100%/2   | Total: 11m 33s | Avg:  5m 46s | Max:  5m 50s | Hits: 100%/3826  
      🟩 Clang19            Pass: 100%/7   | Total:  1h 20m | Avg: 11m 33s | Max: 27m 45s | Hits:  92%/9565  
      🟩 GCC7               Pass: 100%/2   | Total: 14m 09s | Avg:  7m 04s | Max:  7m 26s | Hits:  99%/3828  
      🟩 GCC8               Pass: 100%/1   | Total:  7m 15s | Avg:  7m 15s | Max:  7m 15s | Hits:  99%/1914  
      🟩 GCC9               Pass: 100%/2   | Total: 15m 44s | Avg:  7m 52s | Max:  8m 23s | Hits:  99%/3828  
      🟩 GCC10              Pass: 100%/2   | Total: 15m 06s | Avg:  7m 33s | Max:  7m 37s | Hits:  99%/3828  
      🟩 GCC11              Pass: 100%/2   | Total: 51m 19s | Avg: 25m 39s | Max: 43m 29s | Hits:  75%/3828  
      🟩 GCC12              Pass: 100%/2   | Total: 15m 26s | Avg:  7m 43s | Max:  7m 51s | Hits:  99%/3828  
      🟩 GCC13              Pass: 100%/11  | Total:  2h 18m | Avg: 12m 32s | Max: 34m 00s | Hits:  91%/13398 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 05m | Hits:  76%/3812  
      🟩 MSVC14.43          Pass: 100%/5   | Total:  4h 40m | Avg: 56m 11s | Max:  1h 11m | Hits:  80%/9530  
      🟩 NVHPC25.5          Pass: 100%/2   | Total:  2h 20m | Avg:  1h 10m | Max:  1h 14m | Hits:  76%/3824  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total:  2h 31m | Avg:  7m 58s | Max: 27m 45s | Hits:  97%/32521 
      🟩 GCC                Pass: 100%/22  | Total:  4h 16m | Avg: 11m 40s | Max: 43m 29s | Hits:  94%/34452 
      🟩 MSVC               Pass: 100%/7   | Total:  6h 47m | Avg: 58m 16s | Max:  1h 11m | Hits:  79%/13342 
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 20m | Avg:  1h 10m | Max:  1h 14m | Hits:  76%/3824  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 29m 03s | Avg: 14m 31s | Max: 19m 43s | Hits:  80%/1914  
      🟩 rtx2080            Pass: 100%/38  | Total: 12h 53m | Avg: 20m 20s | Max:  1h 14m | Hits:  92%/72672 
      🟩 rtx4090            Pass: 100%/10  | Total:  2h 34m | Avg: 15m 26s | Max:  1h 11m | Hits:  95%/9553  
    🟩 jobs
      🟩 Build              Pass: 100%/43  | Total: 14h 46m | Avg: 20m 37s | Max:  1h 14m | Hits:  92%/82233 
      🟩 TestCPU            Pass: 100%/3   | Total: 43m 32s | Avg: 14m 30s | Max: 35m 13s | Hits:  99%/1906  
      🟩 TestGPU            Pass: 100%/4   | Total: 26m 21s | Avg:  6m 35s | Max:  9m 20s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 29m 03s | Avg: 14m 31s | Max: 19m 43s | Hits:  80%/1914  
      🟩 90;90a             Pass: 100%/2   | Total:  1h 20m | Avg: 40m 29s | Max: 51m 49s | Hits:  78%/3820  
      🟩 100;120            Pass: 100%/2   | Total:  1h 28m | Avg: 44m 09s | Max: 54m 18s | Hits:  78%/3820  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total:  6h 47m | Avg: 19m 24s | Max:  1h 14m | Hits:  94%/40160 
      🟩 20                 Pass: 100%/27  | Total:  8h 55m | Avg: 19m 50s | Max:  1h 11m | Hits:  89%/42065 
    
  • 🟩 cudax: Pass: 100%/28 | Total: 2h 58m | Avg: 6m 22s | Max: 22m 08s | Hits: 96%/15390

    🟩 cpu
      🟩 amd64              Pass: 100%/24  | Total:  2h 46m | Avg:  6m 56s | Max: 22m 08s | Hits:  96%/13018 
      🟩 arm64              Pass: 100%/4   | Total: 11m 50s | Avg:  2m 57s | Max:  3m 18s | Hits:  99%/2372  
    🟩 ctk
      🟩 12.0               Pass: 100%/3   | Total: 19m 38s | Avg:  6m 32s | Max: 13m 20s | Hits:  95%/1474  
      🟩 12.9               Pass: 100%/25  | Total:  2h 38m | Avg:  6m 21s | Max: 22m 08s | Hits:  96%/13916 
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/3   | Total: 19m 38s | Avg:  6m 32s | Max: 13m 20s | Hits:  95%/1474  
      🟩 nvcc12.9           Pass: 100%/25  | Total:  2h 38m | Avg:  6m 21s | Max: 22m 08s | Hits:  96%/13916 
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/28  | Total:  2h 58m | Avg:  6m 22s | Max: 22m 08s | Hits:  96%/15390 
    🟩 cxx
      🟩 Clang14            Pass: 100%/2   | Total:  6m 13s | Avg:  3m 06s | Max:  3m 18s | Hits: 100%/1188  
      🟩 Clang15            Pass: 100%/1   | Total:  3m 28s | Avg:  3m 28s | Max:  3m 28s | Hits: 100%/593   
      🟩 Clang16            Pass: 100%/1   | Total:  3m 16s | Avg:  3m 16s | Max:  3m 16s | Hits: 100%/593   
      🟩 Clang17            Pass: 100%/1   | Total:  3m 29s | Avg:  3m 29s | Max:  3m 29s | Hits: 100%/593   
      🟩 Clang18            Pass: 100%/1   | Total:  3m 26s | Avg:  3m 26s | Max:  3m 26s | Hits: 100%/593   
      🟩 Clang19            Pass: 100%/4   | Total: 18m 12s | Avg:  4m 33s | Max:  9m 39s | Hits: 100%/2372  
      🟩 GCC10              Pass: 100%/2   | Total:  7m 01s | Avg:  3m 30s | Max:  3m 38s | Hits:  99%/1188  
      🟩 GCC11              Pass: 100%/1   | Total:  3m 46s | Avg:  3m 46s | Max:  3m 46s | Hits:  99%/593   
      🟩 GCC12              Pass: 100%/1   | Total:  3m 50s | Avg:  3m 50s | Max:  3m 50s | Hits:  99%/593   
      🟩 GCC13              Pass: 100%/8   | Total: 52m 21s | Avg:  6m 32s | Max: 22m 08s | Hits:  98%/4744  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 13m 20s | Avg: 13m 20s | Max: 13m 20s | Hits:  77%/288   
      🟩 MSVC14.43          Pass: 100%/3   | Total: 39m 25s | Avg: 13m 08s | Max: 13m 57s | Hits:  76%/870   
      🟩 NVHPC25.5          Pass: 100%/2   | Total: 20m 48s | Avg: 10m 24s | Max: 10m 35s | Hits:  87%/1182  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/10  | Total: 38m 04s | Avg:  3m 48s | Max:  9m 39s | Hits: 100%/5932  
      🟩 GCC                Pass: 100%/12  | Total:  1h 06m | Avg:  5m 34s | Max: 22m 08s | Hits:  98%/7118  
      🟩 MSVC               Pass: 100%/4   | Total: 52m 45s | Avg: 13m 11s | Max: 13m 57s | Hits:  76%/1158  
      🟩 NVHPC              Pass: 100%/2   | Total: 20m 48s | Avg: 10m 24s | Max: 10m 35s | Hits:  87%/1182  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 12m 20s | Avg:  6m 10s | Max:  7m 12s | Hits:  94%/1186  
      🟩 rtx2080            Pass: 100%/26  | Total:  2h 46m | Avg:  6m 23s | Max: 22m 08s | Hits:  96%/14204 
    🟩 jobs
      🟩 Build              Pass: 100%/25  | Total:  2h 19m | Avg:  5m 35s | Max: 13m 57s | Hits:  96%/13611 
      🟩 Test               Pass: 100%/3   | Total: 38m 59s | Avg: 12m 59s | Max: 22m 08s | Hits:  99%/1779  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 12m 20s | Avg:  6m 10s | Max:  7m 12s | Hits:  94%/1186  
      🟩 90;90a             Pass: 100%/2   | Total: 16m 11s | Avg:  8m 05s | Max: 12m 38s | Hits:  92%/883   
      🟩 100;120            Pass: 100%/2   | Total: 16m 45s | Avg:  8m 22s | Max: 12m 50s | Hits:  92%/883   
    🟩 std
      🟩 17                 Pass: 100%/3   | Total: 16m 29s | Avg:  5m 29s | Max: 10m 35s | Hits:  95%/1777  
      🟩 20                 Pass: 100%/25  | Total:  2h 42m | Avg:  6m 29s | Max: 22m 08s | Hits:  96%/13613 
    
  • 🟩 python: Pass: 100%/22 | Total: 3h 47m | Avg: 10m 21s | Max: 20m 40s

    🟩 cpu
      🟩 amd64              Pass: 100%/22  | Total:  3h 47m | Avg: 10m 21s | Max: 20m 40s
    🟩 ctk
      🟩 12.5               Pass: 100%/6   | Total: 44m 26s | Avg:  7m 24s | Max: 13m 44s
      🟩 12.8               Pass: 100%/2   | Total: 37m 03s | Avg: 18m 31s | Max: 18m 33s
      🟩 12.9               Pass: 100%/14  | Total:  2h 26m | Avg: 10m 27s | Max: 20m 40s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/6   | Total: 44m 26s | Avg:  7m 24s | Max: 13m 44s
      🟩 nvcc12.8           Pass: 100%/2   | Total: 37m 03s | Avg: 18m 31s | Max: 18m 33s
      🟩 nvcc12.9           Pass: 100%/14  | Total:  2h 26m | Avg: 10m 27s | Max: 20m 40s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/22  | Total:  3h 47m | Avg: 10m 21s | Max: 20m 40s
    🟩 cxx
      🟩 GCC13              Pass: 100%/22  | Total:  3h 47m | Avg: 10m 21s | Max: 20m 40s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/22  | Total:  3h 47m | Avg: 10m 21s | Max: 20m 40s
    🟩 gpu
      🟩 h100               Pass: 100%/4   | Total: 44m 10s | Avg: 11m 02s | Max: 20m 40s
      🟩 l4                 Pass: 100%/18  | Total:  3h 03m | Avg: 10m 12s | Max: 18m 53s
    🟩 jobs
      🟩 Build cuda.cccl    Pass: 100%/2   | Total: 18m 48s | Avg:  9m 24s | Max:  9m 31s
      🟩 Test cuda.cccl.cooperative Pass: 100%/5   | Total:  1h 10m | Avg: 14m 08s | Max: 15m 24s
      🟩 Test cuda.cccl.examples Pass: 100%/5   | Total: 22m 32s | Avg:  4m 30s | Max:  4m 58s
      🟩 Test cuda.cccl.headers Pass: 100%/5   | Total: 20m 30s | Avg:  4m 06s | Max:  4m 16s
      🟩 Test cuda.cccl.parallel Pass: 100%/5   | Total:  1h 35m | Avg: 19m 04s | Max: 20m 40s
    🟩 py_version
      🟩 3.10               Pass: 100%/9   | Total:  1h 32m | Avg: 10m 16s | Max: 18m 48s
      🟩 3.13               Pass: 100%/13  | Total:  2h 15m | Avg: 10m 25s | Max: 20m 40s
    
  • 🟩 cccl_c_parallel: Pass: 100%/4 | Total: 52m 59s | Avg: 13m 14s | Max: 20m 39s | Hits: 98%/680

    🟩 cpu
      🟩 amd64              Pass: 100%/4   | Total: 52m 59s | Avg: 13m 14s | Max: 20m 39s | Hits:  98%/680   
    🟩 ctk
      🟩 12.9               Pass: 100%/4   | Total: 52m 59s | Avg: 13m 14s | Max: 20m 39s | Hits:  98%/680   
    🟩 cudacxx
      🟩 nvcc12.9           Pass: 100%/4   | Total: 52m 59s | Avg: 13m 14s | Max: 20m 39s | Hits:  98%/680   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 52m 59s | Avg: 13m 14s | Max: 20m 39s | Hits:  98%/680   
    🟩 cxx
      🟩 GCC13              Pass: 100%/4   | Total: 52m 59s | Avg: 13m 14s | Max: 20m 39s | Hits:  98%/680   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/4   | Total: 52m 59s | Avg: 13m 14s | Max: 20m 39s | Hits:  98%/680   
    🟩 gpu
      🟩 h100               Pass: 100%/1   | Total: 20m 39s | Avg: 20m 39s | Max: 20m 39s | Hits:  98%/170   
      🟩 l4                 Pass: 100%/1   | Total: 16m 46s | Avg: 16m 46s | Max: 16m 46s | Hits:  98%/170   
      🟩 rtx2080            Pass: 100%/2   | Total: 15m 34s | Avg:  7m 47s | Max: 13m 31s | Hits:  98%/340   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 03s | Avg:  2m 03s | Max:  2m 03s | Hits:  98%/170   
      🟩 Test               Pass: 100%/3   | Total: 50m 56s | Avg: 16m 58s | Max: 20m 39s | Hits:  98%/510   
    
  • 🟩 packaging: Pass: 100%/4 | Total: 21m 10s | Avg: 5m 17s | Max: 6m 35s

    🟩 cpu
      🟩 amd64              Pass: 100%/4   | Total: 21m 10s | Avg:  5m 17s | Max:  6m 35s
    🟩 ctk
      🟩 12.0               Pass: 100%/2   | Total: 12m 25s | Avg:  6m 12s | Max:  6m 35s
      🟩 12.9               Pass: 100%/2   | Total:  8m 45s | Avg:  4m 22s | Max:  6m 09s
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/2   | Total: 12m 25s | Avg:  6m 12s | Max:  6m 35s
      🟩 nvcc12.9           Pass: 100%/2   | Total:  8m 45s | Avg:  4m 22s | Max:  6m 09s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 21m 10s | Avg:  5m 17s | Max:  6m 35s
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  5m 50s | Avg:  5m 50s | Max:  5m 50s
      🟩 Clang19            Pass: 100%/1   | Total:  2m 36s | Avg:  2m 36s | Max:  2m 36s
      🟩 GCC12              Pass: 100%/1   | Total:  6m 35s | Avg:  6m 35s | Max:  6m 35s
      🟩 GCC13              Pass: 100%/1   | Total:  6m 09s | Avg:  6m 09s | Max:  6m 09s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/2   | Total:  8m 26s | Avg:  4m 13s | Max:  5m 50s
      🟩 GCC                Pass: 100%/2   | Total: 12m 44s | Avg:  6m 22s | Max:  6m 35s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 21m 10s | Avg:  5m 17s | Max:  6m 35s
    🟩 jobs
      🟩 Test               Pass: 100%/4   | Total: 21m 10s | Avg:  5m 17s | Max:  6m 35s
    
  • 🟩 stdpar: Pass: 100%/4 | Total: 16m 42s | Avg: 4m 10s | Max: 4m 25s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  8m 48s | Avg:  4m 24s | Max:  4m 25s
      🟩 arm64              Pass: 100%/2   | Total:  7m 54s | Avg:  3m 57s | Max:  3m 57s
    🟩 ctk
      🟩 12.9               Pass: 100%/4   | Total: 16m 42s | Avg:  4m 10s | Max:  4m 25s
    🟩 cudacxx
      🟩 nvcc12.9           Pass: 100%/4   | Total: 16m 42s | Avg:  4m 10s | Max:  4m 25s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 16m 42s | Avg:  4m 10s | Max:  4m 25s
    🟩 cxx
      🟩 NVHPC25.5          Pass: 100%/4   | Total: 16m 42s | Avg:  4m 10s | Max:  4m 25s
    🟩 cxx_family
      🟩 NVHPC              Pass: 100%/4   | Total: 16m 42s | Avg:  4m 10s | Max:  4m 25s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 16m 42s | Avg:  4m 10s | Max:  4m 25s
    🟩 jobs
      🟩 Build              Pass: 100%/4   | Total: 16m 42s | Avg:  4m 10s | Max:  4m 25s
    🟩 std
      🟩 17                 Pass: 100%/2   | Total:  8m 22s | Avg:  4m 11s | Max:  4m 25s
      🟩 20                 Pass: 100%/2   | Total:  8m 20s | Avg:  4m 10s | Max:  4m 23s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
CCCL Packaging
libcu++
+/- CUB
Thrust
CUDA Experimental
stdpar
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- CCCL Packaging
libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- stdpar
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 162)

# Runner
93 linux-amd64-cpu16
17 linux-amd64-gpu-l4-latest-1
17 windows-amd64-cpu16
10 linux-arm64-cpu16
9 linux-amd64-gpu-h100-latest-1
7 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
3 linux-amd64-gpu-rtx4090-latest-1

@NaderAlAwar
Contributor Author

I ended up going with this approach. My previous approach, falling back to another implementation in device_reduce.cuh, did not work because the arch macros are not usable in host code.

@NaderAlAwar be careful about NaNs if you want to go with this path😄

@fbusato after considering this and discussing it with Georgii, we decided to just disable atomic reduce with doubles for pre-sm60 architectures. Supporting it might prove to be more trouble than it's worth.
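
For context on the emulation path that was considered and dropped: the CUDA programming guide emulates `atomicAdd(double*, double)` with a compare-and-swap loop over the 64-bit bit pattern. The sketch below is a host-side C++ analog of that idea, not the code in this PR and not CUB's implementation. The names `to_bits`, `from_bits`, and `atomic_add_double` are hypothetical, and `std::atomic<std::uint64_t>` stands in for the device-side `atomicCAS`. It illustrates the NaN caveat mentioned above: the loop compares integer bit patterns rather than doubles, because `NaN != NaN` would otherwise keep a floating-point comparison from ever succeeding.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t), "sketch assumes a 64-bit double");

// Hypothetical helpers: reinterpret a double as its raw 64-bit pattern and back.
inline std::uint64_t to_bits(double v)
{
  std::uint64_t b;
  std::memcpy(&b, &v, sizeof b);
  return b;
}

inline double from_bits(std::uint64_t b)
{
  double v;
  std::memcpy(&v, &b, sizeof v);
  return v;
}

// Hypothetical host-side analog of a CAS-based double atomic add.
// Returns the value stored in `cell` before the addition.
inline double atomic_add_double(std::atomic<std::uint64_t>& cell, double val)
{
  std::uint64_t expected = cell.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak writes the current contents into
  // `expected`, so the sum is recomputed from fresh data on every retry.
  // The comparison is done on integer bits, so a stored NaN cannot cause
  // the loop to spin forever.
  while (!cell.compare_exchange_weak(
    expected, to_bits(from_bits(expected) + val), std::memory_order_relaxed))
  {
  }
  return from_bits(expected);
}
```

A caller would keep the accumulator as `std::atomic<std::uint64_t> cell{to_bits(0.0)}`, call `atomic_add_double(cell, x)` from each thread, and read the result with `from_bits(cell.load())`.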

@github-actions
Contributor

github-actions bot commented Aug 7, 2025

🟨 CI finished in 2h 18m: Pass: 90%/162 | Total: 1d 18h | Avg: 15m 37s | Max: 2h 15m | Hits: 94%/150282
  • 🟨 python: Pass: 36%/22 | Total: 1h 47m | Avg: 4m 52s | Max: 9m 55s

    🟨 jobs
      🟩 Build cuda.cccl    Pass: 100%/2   | Total: 19m 24s | Avg:  9m 42s | Max:  9m 55s
      🟥 Test cuda.cccl.cooperative Pass:   0%/5   | Total: 20m 04s | Avg:  4m 00s | Max:  4m 08s
      🟨 Test cuda.cccl.examples Pass:  20%/5   | Total: 20m 06s | Avg:  4m 01s | Max:  4m 30s
      🟩 Test cuda.cccl.headers Pass: 100%/5   | Total: 18m 50s | Avg:  3m 46s | Max:  4m 04s
      🟥 Test cuda.cccl.parallel Pass:   0%/5   | Total: 28m 54s | Avg:  5m 46s | Max:  8m 43s
    🟨 cpu
      🟨 amd64              Pass:  36%/22  | Total:  1h 47m | Avg:  4m 52s | Max:  9m 55s
    🟨 cudacxx_family
      🟨 nvcc               Pass:  36%/22  | Total:  1h 47m | Avg:  4m 52s | Max:  9m 55s
    🟨 cxx
      🟨 GCC13              Pass:  36%/22  | Total:  1h 47m | Avg:  4m 52s | Max:  9m 55s
    🟨 cxx_family
      🟨 GCC                Pass:  36%/22  | Total:  1h 47m | Avg:  4m 52s | Max:  9m 55s
    🟨 ctk
      🟨 12.5               Pass:  50%/6   | Total: 22m 51s | Avg:  3m 48s | Max:  4m 08s
      🟥 12.8               Pass:   0%/2   | Total:  7m 48s | Avg:  3m 54s | Max:  3m 56s
      🟨 12.9               Pass:  35%/14  | Total:  1h 16m | Avg:  5m 28s | Max:  9m 55s
    🟨 cudacxx
      🟨 nvcc12.5           Pass:  50%/6   | Total: 22m 51s | Avg:  3m 48s | Max:  4m 08s
      🟥 nvcc12.8           Pass:   0%/2   | Total:  7m 48s | Avg:  3m 54s | Max:  3m 56s
      🟨 nvcc12.9           Pass:  35%/14  | Total:  1h 16m | Avg:  5m 28s | Max:  9m 55s
    🟨 gpu
      🟨 h100               Pass:  25%/4   | Total: 19m 46s | Avg:  4m 56s | Max:  8m 04s
      🟨 l4                 Pass:  38%/18  | Total:  1h 27m | Avg:  4m 51s | Max:  9m 55s
    🟨 py_version
      🟨 3.10               Pass:  33%/9   | Total: 41m 57s | Avg:  4m 39s | Max:  9m 55s
      🟨 3.13               Pass:  38%/13  | Total:  1h 05m | Avg:  5m 01s | Max:  9m 29s
    
  • 🟨 cub: Pass: 96%/50 | Total: 18h 46m | Avg: 22m 32s | Max: 1h 38m | Hits: 93%/50073

    🔍 cpu: amd64 🔍
      🔍 amd64              Pass:  95%/48  | Total: 18h 32m | Avg: 23m 10s | Max:  1h 38m | Hits:  92%/47527 
      🟩 arm64              Pass: 100%/2   | Total: 14m 31s | Avg:  7m 15s | Max:  8m 29s | Hits:  99%/2546  
    🔍 ctk: 12.9 🔍
      🟩 12.0               Pass: 100%/5   | Total:  2h 00m | Avg: 24m 02s | Max:  1h 30m | Hits:  93%/6261  
      🔍 12.9               Pass:  95%/45  | Total: 16h 46m | Avg: 22m 22s | Max:  1h 38m | Hits:  93%/43812 
    🚨 cudacxx: ClangCUDA19 🚨
      🔥 ClangCUDA19        Pass:   0%/2   | Total: 38m 01s | Avg: 19m 00s | Max: 20m 02s
      🟩 nvcc12.0           Pass: 100%/5   | Total:  2h 00m | Avg: 24m 02s | Max:  1h 30m | Hits:  93%/6261  
      🟩 nvcc12.9           Pass: 100%/43  | Total: 16h 08m | Avg: 22m 31s | Max:  1h 38m | Hits:  93%/43812 
    🚨 cudacxx_family: ClangCUDA 🚨
      🔥 ClangCUDA          Pass:   0%/2   | Total: 38m 01s | Avg: 19m 00s | Max: 20m 02s
      🟩 nvcc               Pass: 100%/48  | Total: 18h 08m | Avg: 22m 40s | Max:  1h 38m | Hits:  93%/50073 
    🔍 cxx: Clang19 🔍
      🟩 Clang14            Pass: 100%/4   | Total: 26m 57s | Avg:  6m 44s | Max:  7m 09s | Hits:  99%/5094  
      🟩 Clang15            Pass: 100%/2   | Total: 14m 28s | Avg:  7m 14s | Max:  7m 21s | Hits:  99%/2543  
      🟩 Clang16            Pass: 100%/2   | Total: 13m 47s | Avg:  6m 53s | Max:  7m 01s | Hits:  99%/2543  
      🟩 Clang17            Pass: 100%/2   | Total: 14m 18s | Avg:  7m 09s | Max:  7m 21s | Hits:  99%/2543  
      🟩 Clang18            Pass: 100%/2   | Total: 13m 23s | Avg:  6m 41s | Max:  6m 45s | Hits:  99%/2543  
      🔍 Clang19            Pass:  71%/7   | Total:  1h 42m | Avg: 14m 37s | Max: 24m 30s | Hits:  99%/3815  
      🟩 GCC7               Pass: 100%/2   | Total: 17m 02s | Avg:  8m 31s | Max:  8m 52s | Hits:  99%/2546  
      🟩 GCC8               Pass: 100%/1   | Total:  8m 54s | Avg:  8m 54s | Max:  8m 54s | Hits:  99%/1273  
      🟩 GCC9               Pass: 100%/2   | Total: 18m 09s | Avg:  9m 04s | Max:  9m 06s | Hits:  99%/2546  
      🟩 GCC10              Pass: 100%/2   | Total: 19m 04s | Avg:  9m 32s | Max:  9m 39s | Hits:  99%/2547  
      🟩 GCC11              Pass: 100%/2   | Total: 17m 44s | Avg:  8m 52s | Max:  8m 54s | Hits:  99%/2543  
      🟩 GCC12              Pass: 100%/2   | Total: 19m 00s | Avg:  9m 30s | Max:  9m 33s | Hits:  99%/2543  
      🟩 GCC13              Pass: 100%/12  | Total:  3h 03m | Avg: 15m 16s | Max: 24m 43s | Hits:  99%/7641  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  3h 02m | Avg:  1h 31m | Max:  1h 32m | Hits:  64%/2336  
      🟩 MSVC14.43          Pass: 100%/4   | Total:  5h 33m | Avg:  1h 23m | Max:  1h 38m | Hits:  64%/4672  
      🟩 NVHPC25.5          Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 12m | Hits:  64%/2345  
    🔍 cxx_family: Clang 🔍
      🔍 Clang              Pass:  89%/19  | Total:  3h 05m | Avg:  9m 45s | Max: 24m 30s | Hits:  99%/19081 
      🟩 GCC                Pass: 100%/23  | Total:  4h 43m | Avg: 12m 18s | Max: 24m 43s | Hits:  99%/21639 
      🟩 MSVC               Pass: 100%/6   | Total:  8h 36m | Avg:  1h 26m | Max:  1h 38m | Hits:  64%/7008  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 21m | Avg:  1h 10m | Max:  1h 12m | Hits:  64%/2345  
    🔍 gpu: rtx2080 🔍
      🟩 h100               Pass: 100%/3   | Total: 55m 32s | Avg: 18m 30s | Max: 24m 43s | Hits:  99%/1274  
      🔍 rtx2080            Pass:  94%/39  | Total: 15h 26m | Avg: 23m 44s | Max:  1h 38m | Hits:  92%/46253 
      🟩 rtxa6000           Pass: 100%/8   | Total:  2h 24m | Avg: 18m 07s | Max: 24m 30s | Hits:  99%/2546  
    🔍 jobs: Build 🔍
      🔍 Build              Pass:  95%/42  | Total: 15h 50m | Avg: 22m 37s | Max:  1h 38m | Hits:  93%/50073 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 23m 10s | Avg: 23m 10s | Max: 23m 10s
      🟩 GraphCapture       Pass: 100%/1   | Total: 15m 28s | Avg: 15m 28s | Max: 15m 28s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 12m | Avg: 24m 10s | Max: 24m 30s
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 05m | Avg: 21m 44s | Max: 24m 43s
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total: 55m 32s | Avg: 18m 30s | Max: 24m 43s | Hits:  99%/1274  
      🟩 90;90a             Pass: 100%/2   | Total:  1h 17m | Avg: 38m 57s | Max:  1h 09m | Hits:  82%/2442  
      🟩 100;120            Pass: 100%/2   | Total:  1h 20m | Avg: 40m 20s | Max:  1h 12m | Hits:  82%/2442  
    🟨 std
      🟨 17                 Pass:  95%/21  | Total:  8h 23m | Avg: 23m 57s | Max:  1h 38m | Hits:  93%/25028 
      🟨 20                 Pass:  96%/29  | Total: 10h 23m | Avg: 21m 30s | Max:  1h 33m | Hits:  93%/25045 
    
  • 🟩 thrust: Pass: 100%/50 | Total: 14h 50m | Avg: 17m 48s | Max: 1h 16m | Hits: 94%/84139

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 13m 06s | Avg:  6m 33s | Max:  8m 02s | Hits:  99%/1914  
    🟩 cpu
      🟩 amd64              Pass: 100%/48  | Total: 14h 38m | Avg: 18m 18s | Max:  1h 16m | Hits:  94%/80312 
      🟩 arm64              Pass: 100%/2   | Total: 11m 55s | Avg:  5m 57s | Max:  6m 50s | Hits:  99%/3827  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 29m | Avg: 17m 57s | Max:  1h 04m | Hits:  95%/9560  
      🟩 12.9               Pass: 100%/45  | Total: 13h 20m | Avg: 17m 47s | Max:  1h 16m | Hits:  94%/74579 
    🟩 cudacxx
      🟩 ClangCUDA19        Pass: 100%/2   | Total: 53m 28s | Avg: 26m 44s | Max: 27m 20s | Hits:  80%/3826  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 29m | Avg: 17m 57s | Max:  1h 04m | Hits:  95%/9560  
      🟩 nvcc12.9           Pass: 100%/43  | Total: 12h 27m | Avg: 17m 22s | Max:  1h 16m | Hits:  94%/70753 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 53m 28s | Avg: 26m 44s | Max: 27m 20s | Hits:  80%/3826  
      🟩 nvcc               Pass: 100%/48  | Total: 13h 57m | Avg: 17m 26s | Max:  1h 16m | Hits:  94%/80313 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 22m 49s | Avg:  5m 42s | Max:  6m 04s | Hits: 100%/7652  
      🟩 Clang15            Pass: 100%/2   | Total: 12m 15s | Avg:  6m 07s | Max:  6m 23s | Hits: 100%/3826  
      🟩 Clang16            Pass: 100%/2   | Total: 12m 52s | Avg:  6m 26s | Max:  6m 27s | Hits: 100%/3826  
      🟩 Clang17            Pass: 100%/2   | Total: 12m 08s | Avg:  6m 04s | Max:  6m 11s | Hits: 100%/3826  
      🟩 Clang18            Pass: 100%/2   | Total: 12m 17s | Avg:  6m 08s | Max:  6m 18s | Hits: 100%/3826  
      🟩 Clang19            Pass: 100%/7   | Total:  1h 20m | Avg: 11m 32s | Max: 27m 20s | Hits:  92%/9565  
      🟩 GCC7               Pass: 100%/2   | Total: 14m 45s | Avg:  7m 22s | Max:  7m 31s | Hits:  99%/3828  
      🟩 GCC8               Pass: 100%/1   | Total:  7m 57s | Avg:  7m 57s | Max:  7m 57s | Hits:  99%/1914  
      🟩 GCC9               Pass: 100%/2   | Total: 15m 11s | Avg:  7m 35s | Max:  8m 05s | Hits:  99%/3828  
      🟩 GCC10              Pass: 100%/2   | Total: 14m 53s | Avg:  7m 26s | Max:  7m 32s | Hits:  99%/3828  
      🟩 GCC11              Pass: 100%/2   | Total: 16m 53s | Avg:  8m 26s | Max:  8m 41s | Hits:  99%/3828  
      🟩 GCC12              Pass: 100%/2   | Total: 16m 12s | Avg:  8m 06s | Max:  8m 21s | Hits:  99%/3828  
      🟩 GCC13              Pass: 100%/11  | Total:  1h 16m | Avg:  6m 57s | Max: 10m 34s | Hits:  96%/13398 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 13m | Hits:  76%/3812  
      🟩 MSVC14.43          Pass: 100%/5   | Total:  4h 49m | Avg: 57m 58s | Max:  1h 15m | Hits:  80%/9530  
      🟩 NVHPC25.5          Pass: 100%/2   | Total:  2h 26m | Avg:  1h 13m | Max:  1h 16m | Hits:  76%/3824  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total:  2h 33m | Avg:  8m 03s | Max: 27m 20s | Hits:  97%/32521 
      🟩 GCC                Pass: 100%/22  | Total:  2h 42m | Avg:  7m 22s | Max: 10m 34s | Hits:  98%/34452 
      🟩 MSVC               Pass: 100%/7   | Total:  7h 08m | Avg:  1h 01m | Max:  1h 15m | Hits:  79%/13342 
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 26m | Avg:  1h 13m | Max:  1h 16m | Hits:  76%/3824  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 16m 38s | Avg:  8m 19s | Max: 10m 34s | Hits:  79%/1914  
      🟩 rtx2080            Pass: 100%/38  | Total: 11h 58m | Avg: 18m 53s | Max:  1h 16m | Hits:  94%/72672 
      🟩 rtx4090            Pass: 100%/10  | Total:  2h 35m | Avg: 15m 34s | Max:  1h 15m | Hits:  95%/9553  
    🟩 jobs
      🟩 Build              Pass: 100%/43  | Total: 13h 46m | Avg: 19m 13s | Max:  1h 16m | Hits:  94%/82233 
      🟩 TestCPU            Pass: 100%/3   | Total: 41m 34s | Avg: 13m 51s | Max: 33m 31s | Hits:  99%/1906  
      🟩 TestGPU            Pass: 100%/4   | Total: 22m 02s | Avg:  5m 30s | Max:  6m 04s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 16m 38s | Avg:  8m 19s | Max: 10m 34s | Hits:  79%/1914  
      🟩 90;90a             Pass: 100%/2   | Total:  1h 01m | Avg: 30m 32s | Max: 53m 48s | Hits:  88%/3820  
      🟩 100;120            Pass: 100%/2   | Total:  1h 03m | Avg: 31m 58s | Max: 57m 12s | Hits:  88%/3820  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total:  7h 04m | Avg: 20m 12s | Max:  1h 16m | Hits:  94%/40160 
      🟩 20                 Pass: 100%/27  | Total:  7h 33m | Avg: 16m 47s | Max:  1h 15m | Hits:  93%/42065 
    
  • 🟩 cudax: Pass: 100%/28 | Total: 3h 07m | Avg: 6m 40s | Max: 30m 11s | Hits: 97%/15390

    🟩 cpu
      🟩 amd64              Pass: 100%/24  | Total:  2h 55m | Avg:  7m 17s | Max: 30m 11s | Hits:  96%/13018 
      🟩 arm64              Pass: 100%/4   | Total: 11m 59s | Avg:  2m 59s | Max:  3m 18s | Hits:  99%/2372  
    🟩 ctk
      🟩 12.0               Pass: 100%/3   | Total: 20m 30s | Avg:  6m 50s | Max: 14m 02s | Hits:  95%/1474  
      🟩 12.9               Pass: 100%/25  | Total:  2h 46m | Avg:  6m 39s | Max: 30m 11s | Hits:  97%/13916 
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/3   | Total: 20m 30s | Avg:  6m 50s | Max: 14m 02s | Hits:  95%/1474  
      🟩 nvcc12.9           Pass: 100%/25  | Total:  2h 46m | Avg:  6m 39s | Max: 30m 11s | Hits:  97%/13916 
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/28  | Total:  3h 07m | Avg:  6m 40s | Max: 30m 11s | Hits:  97%/15390 
    🟩 cxx
      🟩 Clang14            Pass: 100%/2   | Total:  6m 20s | Avg:  3m 10s | Max:  3m 17s | Hits: 100%/1188  
      🟩 Clang15            Pass: 100%/1   | Total:  3m 17s | Avg:  3m 17s | Max:  3m 17s | Hits: 100%/593   
      🟩 Clang16            Pass: 100%/1   | Total:  3m 18s | Avg:  3m 18s | Max:  3m 18s | Hits: 100%/593   
      🟩 Clang17            Pass: 100%/1   | Total:  3m 21s | Avg:  3m 21s | Max:  3m 21s | Hits: 100%/593   
      🟩 Clang18            Pass: 100%/1   | Total:  3m 34s | Avg:  3m 34s | Max:  3m 34s | Hits: 100%/593   
      🟩 Clang19            Pass: 100%/4   | Total: 18m 40s | Avg:  4m 40s | Max:  9m 49s | Hits: 100%/2372  
      🟩 GCC10              Pass: 100%/2   | Total:  7m 24s | Avg:  3m 42s | Max:  3m 59s | Hits:  99%/1188  
      🟩 GCC11              Pass: 100%/1   | Total:  3m 53s | Avg:  3m 53s | Max:  3m 53s | Hits:  99%/593   
      🟩 GCC12              Pass: 100%/1   | Total:  3m 59s | Avg:  3m 59s | Max:  3m 59s | Hits:  99%/593   
      🟩 GCC13              Pass: 100%/8   | Total: 57m 47s | Avg:  7m 13s | Max: 30m 11s | Hits:  99%/4744  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 14m 02s | Avg: 14m 02s | Max: 14m 02s | Hits:  77%/288   
      🟩 MSVC14.43          Pass: 100%/3   | Total: 39m 14s | Avg: 13m 04s | Max: 13m 39s | Hits:  76%/870   
      🟩 NVHPC25.5          Pass: 100%/2   | Total: 22m 13s | Avg: 11m 06s | Max: 11m 44s | Hits:  87%/1182  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/10  | Total: 38m 30s | Avg:  3m 51s | Max:  9m 49s | Hits: 100%/5932  
      🟩 GCC                Pass: 100%/12  | Total:  1h 13m | Avg:  6m 05s | Max: 30m 11s | Hits:  99%/7118  
      🟩 MSVC               Pass: 100%/4   | Total: 53m 16s | Avg: 13m 19s | Max: 14m 02s | Hits:  76%/1158  
      🟩 NVHPC              Pass: 100%/2   | Total: 22m 13s | Avg: 11m 06s | Max: 11m 44s | Hits:  87%/1182  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total:  9m 46s | Avg:  4m 53s | Max:  6m 36s | Hits:  99%/1186  
      🟩 rtx2080            Pass: 100%/26  | Total:  2h 57m | Avg:  6m 49s | Max: 30m 11s | Hits:  96%/14204 
    🟩 jobs
      🟩 Build              Pass: 100%/25  | Total:  2h 20m | Avg:  5m 37s | Max: 14m 02s | Hits:  96%/13611 
      🟩 Test               Pass: 100%/3   | Total: 46m 36s | Avg: 15m 32s | Max: 30m 11s | Hits:  99%/1779  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total:  9m 46s | Avg:  4m 53s | Max:  6m 36s | Hits:  99%/1186  
      🟩 90;90a             Pass: 100%/2   | Total: 16m 50s | Avg:  8m 25s | Max: 13m 14s | Hits:  92%/883   
      🟩 100;120            Pass: 100%/2   | Total: 15m 57s | Avg:  7m 58s | Max: 12m 21s | Hits:  92%/883   
    🟩 std
      🟩 17                 Pass: 100%/3   | Total: 16m 24s | Avg:  5m 28s | Max: 10m 29s | Hits:  95%/1777  
      🟩 20                 Pass: 100%/25  | Total:  2h 50m | Avg:  6m 49s | Max: 30m 11s | Hits:  97%/13613 
    
  • 🟩 cccl_c_parallel: Pass: 100%/4 | Total: 2h 50m | Avg: 42m 34s | Max: 2h 15m | Hits: 98%/680

    🟩 cpu
      🟩 amd64              Pass: 100%/4   | Total:  2h 50m | Avg: 42m 34s | Max:  2h 15m | Hits:  98%/680   
    🟩 ctk
      🟩 12.9               Pass: 100%/4   | Total:  2h 50m | Avg: 42m 34s | Max:  2h 15m | Hits:  98%/680   
    🟩 cudacxx
      🟩 nvcc12.9           Pass: 100%/4   | Total:  2h 50m | Avg: 42m 34s | Max:  2h 15m | Hits:  98%/680   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total:  2h 50m | Avg: 42m 34s | Max:  2h 15m | Hits:  98%/680   
    🟩 cxx
      🟩 GCC13              Pass: 100%/4   | Total:  2h 50m | Avg: 42m 34s | Max:  2h 15m | Hits:  98%/680   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/4   | Total:  2h 50m | Avg: 42m 34s | Max:  2h 15m | Hits:  98%/680   
    🟩 gpu
      🟩 h100               Pass: 100%/1   | Total: 15m 56s | Avg: 15m 56s | Max: 15m 56s | Hits:  98%/170   
      🟩 l4                 Pass: 100%/1   | Total: 16m 53s | Avg: 16m 53s | Max: 16m 53s | Hits:  98%/170   
      🟩 rtx2080            Pass: 100%/2   | Total:  2h 17m | Avg:  1h 08m | Max:  2h 15m | Hits:  98%/340   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 01s | Avg:  2m 01s | Max:  2m 01s | Hits:  98%/170   
      🟩 Test               Pass: 100%/3   | Total:  2h 48m | Avg: 56m 05s | Max:  2h 15m | Hits:  98%/510   
    
  • 🟩 packaging: Pass: 100%/4 | Total: 31m 45s | Avg: 7m 56s | Max: 9m 58s

    🟩 cpu
      🟩 amd64              Pass: 100%/4   | Total: 31m 45s | Avg:  7m 56s | Max:  9m 58s
    🟩 ctk
      🟩 12.0               Pass: 100%/2   | Total: 17m 16s | Avg:  8m 38s | Max:  9m 58s
      🟩 12.9               Pass: 100%/2   | Total: 14m 29s | Avg:  7m 14s | Max:  9m 57s
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/2   | Total: 17m 16s | Avg:  8m 38s | Max:  9m 58s
      🟩 nvcc12.9           Pass: 100%/2   | Total: 14m 29s | Avg:  7m 14s | Max:  9m 57s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 31m 45s | Avg:  7m 56s | Max:  9m 58s
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  9m 58s | Avg:  9m 58s | Max:  9m 58s
      🟩 Clang19            Pass: 100%/1   | Total:  9m 57s | Avg:  9m 57s | Max:  9m 57s
      🟩 GCC12              Pass: 100%/1   | Total:  7m 18s | Avg:  7m 18s | Max:  7m 18s
      🟩 GCC13              Pass: 100%/1   | Total:  4m 32s | Avg:  4m 32s | Max:  4m 32s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/2   | Total: 19m 55s | Avg:  9m 57s | Max:  9m 58s
      🟩 GCC                Pass: 100%/2   | Total: 11m 50s | Avg:  5m 55s | Max:  7m 18s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 31m 45s | Avg:  7m 56s | Max:  9m 58s
    🟩 jobs
      🟩 Test               Pass: 100%/4   | Total: 31m 45s | Avg:  7m 56s | Max:  9m 58s
    
  • 🟩 stdpar: Pass: 100%/4 | Total: 17m 00s | Avg: 4m 15s | Max: 4m 22s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  8m 29s | Avg:  4m 14s | Max:  4m 15s
      🟩 arm64              Pass: 100%/2   | Total:  8m 31s | Avg:  4m 15s | Max:  4m 22s
    🟩 ctk
      🟩 12.9               Pass: 100%/4   | Total: 17m 00s | Avg:  4m 15s | Max:  4m 22s
    🟩 cudacxx
      🟩 nvcc12.9           Pass: 100%/4   | Total: 17m 00s | Avg:  4m 15s | Max:  4m 22s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 17m 00s | Avg:  4m 15s | Max:  4m 22s
    🟩 cxx
      🟩 NVHPC25.5          Pass: 100%/4   | Total: 17m 00s | Avg:  4m 15s | Max:  4m 22s
    🟩 cxx_family
      🟩 NVHPC              Pass: 100%/4   | Total: 17m 00s | Avg:  4m 15s | Max:  4m 22s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 17m 00s | Avg:  4m 15s | Max:  4m 22s
    🟩 jobs
      🟩 Build              Pass: 100%/4   | Total: 17m 00s | Avg:  4m 15s | Max:  4m 22s
    🟩 std
      🟩 17                 Pass: 100%/2   | Total:  8m 24s | Avg:  4m 12s | Max:  4m 15s
      🟩 20                 Pass: 100%/2   | Total:  8m 36s | Avg:  4m 18s | Max:  4m 22s
    


@github-actions
Contributor

github-actions bot commented Aug 7, 2025

🟩 CI finished in 1h 30m: Pass: 100%/162 | Total: 1d 12h | Avg: 13m 39s | Max: 1h 16m | Hits: 97%/152477
  • 🟩 cub: Pass: 100%/50 | Total: 15h 54m | Avg: 19m 05s | Max: 1h 16m | Hits: 95%/52268

    🟩 cpu
      🟩 amd64              Pass: 100%/48  | Total: 15h 39m | Avg: 19m 34s | Max:  1h 16m | Hits:  95%/49722 
      🟩 arm64              Pass: 100%/2   | Total: 15m 03s | Avg:  7m 31s | Max:  8m 58s | Hits:  99%/2546  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 36m | Avg: 19m 12s | Max:  1h 07m | Hits:  96%/6261  
      🟩 12.9               Pass: 100%/45  | Total: 14h 18m | Avg: 19m 04s | Max:  1h 16m | Hits:  95%/46007 
    🟩 cudacxx
      🟩 ClangCUDA19        Pass: 100%/2   | Total: 36m 06s | Avg: 18m 03s | Max: 18m 51s | Hits:  86%/2195  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 36m | Avg: 19m 12s | Max:  1h 07m | Hits:  96%/6261  
      🟩 nvcc12.9           Pass: 100%/43  | Total: 13h 42m | Avg: 19m 07s | Max:  1h 16m | Hits:  96%/43812 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 36m 06s | Avg: 18m 03s | Max: 18m 51s | Hits:  86%/2195  
      🟩 nvcc               Pass: 100%/48  | Total: 15h 18m | Avg: 19m 07s | Max:  1h 16m | Hits:  96%/50073 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 26m 56s | Avg:  6m 44s | Max:  7m 34s | Hits:  99%/5094  
      🟩 Clang15            Pass: 100%/2   | Total: 13m 48s | Avg:  6m 54s | Max:  7m 00s | Hits:  99%/2543  
      🟩 Clang16            Pass: 100%/2   | Total: 13m 48s | Avg:  6m 54s | Max:  7m 01s | Hits:  99%/2543  
      🟩 Clang17            Pass: 100%/2   | Total: 13m 51s | Avg:  6m 55s | Max:  7m 02s | Hits:  99%/2543  
      🟩 Clang18            Pass: 100%/2   | Total: 13m 45s | Avg:  6m 52s | Max:  6m 56s | Hits:  99%/2543  
      🟩 Clang19            Pass: 100%/7   | Total:  1h 46m | Avg: 15m 10s | Max: 28m 44s | Hits:  95%/6010  
      🟩 GCC7               Pass: 100%/2   | Total: 17m 05s | Avg:  8m 32s | Max:  9m 00s | Hits:  99%/2546  
      🟩 GCC8               Pass: 100%/1   | Total:  8m 40s | Avg:  8m 40s | Max:  8m 40s | Hits:  99%/1273  
      🟩 GCC9               Pass: 100%/2   | Total: 18m 24s | Avg:  9m 12s | Max: 10m 02s | Hits:  99%/2546  
      🟩 GCC10              Pass: 100%/2   | Total: 17m 48s | Avg:  8m 54s | Max:  8m 58s | Hits:  99%/2547  
      🟩 GCC11              Pass: 100%/2   | Total: 18m 05s | Avg:  9m 02s | Max:  9m 16s | Hits:  99%/2543  
      🟩 GCC12              Pass: 100%/2   | Total: 19m 02s | Avg:  9m 31s | Max:  9m 39s | Hits:  99%/2543  
      🟩 GCC13              Pass: 100%/12  | Total:  3h 29m | Avg: 17m 25s | Max: 35m 05s | Hits:  99%/7641  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 18m | Avg:  1h 09m | Max:  1h 11m | Hits:  80%/2336  
      🟩 MSVC14.43          Pass: 100%/4   | Total:  3h 57m | Avg: 59m 20s | Max:  1h 16m | Hits:  80%/4672  
      🟩 NVHPC25.5          Pass: 100%/2   | Total:  1h 22m | Avg: 41m 04s | Max: 44m 20s | Hits:  83%/2345  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total:  3h 08m | Avg:  9m 54s | Max: 28m 44s | Hits:  98%/21276 
      🟩 GCC                Pass: 100%/23  | Total:  5h 08m | Avg: 13m 24s | Max: 35m 05s | Hits:  99%/21639 
      🟩 MSVC               Pass: 100%/6   | Total:  6h 15m | Avg:  1h 02m | Max:  1h 16m | Hits:  80%/7008  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 22m | Avg: 41m 04s | Max: 44m 20s | Hits:  83%/2345  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 06m | Avg: 22m 11s | Max: 33m 13s | Hits:  99%/1274  
      🟩 rtx2080            Pass: 100%/39  | Total: 12h 03m | Avg: 18m 32s | Max:  1h 16m | Hits:  95%/48448 
      🟩 rtxa6000           Pass: 100%/8   | Total:  2h 44m | Avg: 20m 33s | Max: 35m 05s | Hits:  99%/2546  
    🟩 jobs
      🟩 Build              Pass: 100%/42  | Total: 12h 28m | Avg: 17m 49s | Max:  1h 16m | Hits:  95%/52268 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 24m 46s | Avg: 24m 46s | Max: 24m 46s
      🟩 GraphCapture       Pass: 100%/1   | Total: 14m 52s | Avg: 14m 52s | Max: 14m 52s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 37m | Avg: 32m 20s | Max: 35m 05s
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 09m | Avg: 23m 00s | Max: 26m 13s
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 06m | Avg: 22m 11s | Max: 33m 13s | Hits:  99%/1274  
      🟩 90;90a             Pass: 100%/2   | Total: 50m 41s | Avg: 25m 20s | Max: 42m 39s | Hits:  92%/2442  
      🟩 100;120            Pass: 100%/2   | Total: 53m 03s | Avg: 26m 31s | Max: 44m 14s | Hits:  92%/2442  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total:  6h 47m | Avg: 19m 25s | Max:  1h 16m | Hits:  95%/26125 
      🟩 20                 Pass: 100%/29  | Total:  9h 06m | Avg: 18m 50s | Max:  1h 14m | Hits:  96%/26143 
    
  • 🟩 thrust: Pass: 100%/50 | Total: 10h 34m | Avg: 12m 41s | Max: 48m 06s | Hits: 97%/84139

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 14m 42s | Avg:  7m 21s | Max:  8m 49s | Hits:  99%/1914  
    🟩 cpu
      🟩 amd64              Pass: 100%/48  | Total: 10h 22m | Avg: 12m 58s | Max: 48m 06s | Hits:  97%/80312 
      🟩 arm64              Pass: 100%/2   | Total: 12m 03s | Avg:  6m 01s | Max:  6m 57s | Hits:  99%/3827  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  1h 00m | Avg: 12m 10s | Max: 35m 51s | Hits:  98%/9560  
      🟩 12.9               Pass: 100%/45  | Total:  9h 33m | Avg: 12m 45s | Max: 48m 06s | Hits:  97%/74579 
    🟩 cudacxx
      🟩 ClangCUDA19        Pass: 100%/2   | Total: 11m 02s | Avg:  5m 31s | Max:  5m 33s | Hits: 100%/3826  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  1h 00m | Avg: 12m 10s | Max: 35m 51s | Hits:  98%/9560  
      🟩 nvcc12.9           Pass: 100%/43  | Total:  9h 22m | Avg: 13m 05s | Max: 48m 06s | Hits:  97%/70753 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 11m 02s | Avg:  5m 31s | Max:  5m 33s | Hits: 100%/3826  
      🟩 nvcc               Pass: 100%/48  | Total: 10h 23m | Avg: 12m 59s | Max: 48m 06s | Hits:  97%/80313 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total: 23m 13s | Avg:  5m 48s | Max:  6m 16s | Hits: 100%/7652  
      🟩 Clang15            Pass: 100%/2   | Total: 11m 48s | Avg:  5m 54s | Max:  5m 59s | Hits: 100%/3826  
      🟩 Clang16            Pass: 100%/2   | Total: 12m 07s | Avg:  6m 03s | Max:  6m 08s | Hits: 100%/3826  
      🟩 Clang17            Pass: 100%/2   | Total: 12m 30s | Avg:  6m 15s | Max:  6m 33s | Hits: 100%/3826  
      🟩 Clang18            Pass: 100%/2   | Total: 12m 15s | Avg:  6m 07s | Max:  6m 11s | Hits: 100%/3826  
      🟩 Clang19            Pass: 100%/7   | Total: 39m 42s | Avg:  5m 40s | Max:  6m 39s | Hits: 100%/9565  
      🟩 GCC7               Pass: 100%/2   | Total: 13m 51s | Avg:  6m 55s | Max:  7m 12s | Hits:  99%/3828  
      🟩 GCC8               Pass: 100%/1   | Total:  7m 18s | Avg:  7m 18s | Max:  7m 18s | Hits:  99%/1914  
      🟩 GCC9               Pass: 100%/2   | Total: 15m 17s | Avg:  7m 38s | Max:  7m 53s | Hits:  99%/3828  
      🟩 GCC10              Pass: 100%/2   | Total: 15m 06s | Avg:  7m 33s | Max:  7m 55s | Hits:  99%/3828  
      🟩 GCC11              Pass: 100%/2   | Total: 15m 03s | Avg:  7m 31s | Max:  7m 54s | Hits:  99%/3828  
      🟩 GCC12              Pass: 100%/2   | Total: 16m 34s | Avg:  8m 17s | Max:  8m 25s | Hits:  99%/3828  
      🟩 GCC13              Pass: 100%/11  | Total:  1h 46m | Avg:  9m 40s | Max: 36m 32s | Hits:  94%/13398 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 23m | Avg: 41m 58s | Max: 48m 06s | Hits:  91%/3812  
      🟩 MSVC14.43          Pass: 100%/5   | Total:  2h 57m | Avg: 35m 33s | Max: 43m 20s | Hits:  93%/9530  
      🟩 NVHPC25.5          Pass: 100%/2   | Total:  1h 11m | Avg: 35m 57s | Max: 37m 14s | Hits:  92%/3824  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total:  1h 51m | Avg:  5m 52s | Max:  6m 39s | Hits: 100%/32521 
      🟩 GCC                Pass: 100%/22  | Total:  3h 09m | Avg:  8m 37s | Max: 36m 32s | Hits:  97%/34452 
      🟩 MSVC               Pass: 100%/7   | Total:  4h 21m | Avg: 37m 23s | Max: 48m 06s | Hits:  92%/13342 
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 11m | Avg: 35m 57s | Max: 37m 14s | Hits:  92%/3824  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 14m 34s | Avg:  7m 17s | Max:  8m 41s | Hits:  99%/1914  
      🟩 rtx2080            Pass: 100%/38  | Total:  8h 13m | Avg: 12m 59s | Max: 48m 06s | Hits:  97%/72672 
      🟩 rtx4090            Pass: 100%/10  | Total:  2h 06m | Avg: 12m 39s | Max: 42m 44s | Hits:  97%/9553  
    🟩 jobs
      🟩 Build              Pass: 100%/43  | Total:  9h 26m | Avg: 13m 10s | Max: 48m 06s | Hits:  97%/82233 
      🟩 TestCPU            Pass: 100%/3   | Total: 41m 58s | Avg: 13m 59s | Max: 33m 31s | Hits:  99%/1906  
      🟩 TestGPU            Pass: 100%/4   | Total: 26m 26s | Avg:  6m 36s | Max:  8m 41s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 14m 34s | Avg:  7m 17s | Max:  8m 41s | Hits:  99%/1914  
      🟩 90;90a             Pass: 100%/2   | Total:  1h 05m | Avg: 32m 50s | Max: 36m 32s | Hits:  78%/3820  
      🟩 100;120            Pass: 100%/2   | Total: 35m 49s | Avg: 17m 54s | Max: 29m 05s | Hits:  97%/3820  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total:  4h 42m | Avg: 13m 25s | Max: 48m 06s | Hits:  98%/40160 
      🟩 20                 Pass: 100%/27  | Total:  5h 38m | Avg: 12m 31s | Max: 42m 44s | Hits:  96%/42065 
    
  • 🟩 cudax: Pass: 100%/28 | Total: 3h 59m | Avg: 8m 34s | Max: 53m 00s | Hits: 99%/15390

    🟩 cpu
      🟩 amd64              Pass: 100%/24  | Total:  3h 47m | Avg:  9m 29s | Max: 53m 00s | Hits:  99%/13018 
      🟩 arm64              Pass: 100%/4   | Total: 12m 03s | Avg:  3m 00s | Max:  3m 22s | Hits:  99%/2372  
    🟩 ctk
      🟩 12.0               Pass: 100%/3   | Total: 18m 31s | Avg:  6m 10s | Max: 12m 10s | Hits:  98%/1474  
      🟩 12.9               Pass: 100%/25  | Total:  3h 41m | Avg:  8m 51s | Max: 53m 00s | Hits:  99%/13916 
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/3   | Total: 18m 31s | Avg:  6m 10s | Max: 12m 10s | Hits:  98%/1474  
      🟩 nvcc12.9           Pass: 100%/25  | Total:  3h 41m | Avg:  8m 51s | Max: 53m 00s | Hits:  99%/13916 
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/28  | Total:  3h 59m | Avg:  8m 34s | Max: 53m 00s | Hits:  99%/15390 
    🟩 cxx
      🟩 Clang14            Pass: 100%/2   | Total:  6m 16s | Avg:  3m 08s | Max:  3m 23s | Hits: 100%/1188  
      🟩 Clang15            Pass: 100%/1   | Total:  3m 21s | Avg:  3m 21s | Max:  3m 21s | Hits: 100%/593   
      🟩 Clang16            Pass: 100%/1   | Total:  3m 25s | Avg:  3m 25s | Max:  3m 25s | Hits: 100%/593   
      🟩 Clang17            Pass: 100%/1   | Total:  3m 30s | Avg:  3m 30s | Max:  3m 30s | Hits: 100%/593   
      🟩 Clang18            Pass: 100%/1   | Total:  3m 12s | Avg:  3m 12s | Max:  3m 12s | Hits: 100%/593   
      🟩 Clang19            Pass: 100%/4   | Total: 57m 01s | Avg: 14m 15s | Max: 48m 27s | Hits:  99%/2372  
      🟩 GCC10              Pass: 100%/2   | Total:  7m 43s | Avg:  3m 51s | Max:  4m 15s | Hits:  99%/1188  
      🟩 GCC11              Pass: 100%/1   | Total:  4m 05s | Avg:  4m 05s | Max:  4m 05s | Hits:  99%/593   
      🟩 GCC12              Pass: 100%/1   | Total:  4m 05s | Avg:  4m 05s | Max:  4m 05s | Hits:  99%/593   
      🟩 GCC13              Pass: 100%/8   | Total:  1h 23m | Avg: 10m 24s | Max: 53m 00s | Hits:  99%/4744  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 12m 10s | Avg: 12m 10s | Max: 12m 10s | Hits:  95%/288   
      🟩 MSVC14.43          Pass: 100%/3   | Total: 33m 54s | Avg: 11m 18s | Max: 11m 37s | Hits:  95%/870   
      🟩 NVHPC25.5          Pass: 100%/2   | Total: 17m 58s | Avg:  8m 59s | Max: 10m 19s | Hits:  97%/1182  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/10  | Total:  1h 16m | Avg:  7m 40s | Max: 48m 27s | Hits:  99%/5932  
      🟩 GCC                Pass: 100%/12  | Total:  1h 39m | Avg:  8m 15s | Max: 53m 00s | Hits:  99%/7118  
      🟩 MSVC               Pass: 100%/4   | Total: 46m 04s | Avg: 11m 31s | Max: 12m 10s | Hits:  95%/1158  
      🟩 NVHPC              Pass: 100%/2   | Total: 17m 58s | Avg:  8m 59s | Max: 10m 19s | Hits:  97%/1182  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 12m 15s | Avg:  6m 07s | Max:  8m 57s | Hits:  99%/1186  
      🟩 rtx2080            Pass: 100%/26  | Total:  3h 47m | Avg:  8m 45s | Max: 53m 00s | Hits:  99%/14204 
    🟩 jobs
      🟩 Build              Pass: 100%/25  | Total:  2h 09m | Avg:  5m 10s | Max: 12m 10s | Hits:  99%/13611 
      🟩 Test               Pass: 100%/3   | Total:  1h 50m | Avg: 36m 48s | Max: 53m 00s | Hits:  99%/1779  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 12m 15s | Avg:  6m 07s | Max:  8m 57s | Hits:  99%/1186  
      🟩 90;90a             Pass: 100%/2   | Total: 14m 36s | Avg:  7m 18s | Max: 10m 50s | Hits:  98%/883   
      🟩 100;120            Pass: 100%/2   | Total: 15m 14s | Avg:  7m 37s | Max: 11m 37s | Hits:  98%/883   
    🟩 std
      🟩 17                 Pass: 100%/3   | Total: 13m 40s | Avg:  4m 33s | Max:  7m 39s | Hits:  99%/1777  
      🟩 20                 Pass: 100%/25  | Total:  3h 46m | Avg:  9m 03s | Max: 53m 00s | Hits:  99%/13613 
    
  • 🟩 python: Pass: 100%/22 | Total: 4h 19m | Avg: 11m 46s | Max: 25m 01s

    🟩 cpu
      🟩 amd64              Pass: 100%/22  | Total:  4h 19m | Avg: 11m 46s | Max: 25m 01s
    🟩 ctk
      🟩 12.5               Pass: 100%/6   | Total: 51m 20s | Avg:  8m 33s | Max: 16m 19s
      🟩 12.8               Pass: 100%/2   | Total: 41m 29s | Avg: 20m 44s | Max: 21m 50s
      🟩 12.9               Pass: 100%/14  | Total:  2h 46m | Avg: 11m 52s | Max: 25m 01s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/6   | Total: 51m 20s | Avg:  8m 33s | Max: 16m 19s
      🟩 nvcc12.8           Pass: 100%/2   | Total: 41m 29s | Avg: 20m 44s | Max: 21m 50s
      🟩 nvcc12.9           Pass: 100%/14  | Total:  2h 46m | Avg: 11m 52s | Max: 25m 01s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/22  | Total:  4h 19m | Avg: 11m 46s | Max: 25m 01s
    🟩 cxx
      🟩 GCC13              Pass: 100%/22  | Total:  4h 19m | Avg: 11m 46s | Max: 25m 01s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/22  | Total:  4h 19m | Avg: 11m 46s | Max: 25m 01s
    🟩 gpu
      🟩 h100               Pass: 100%/4   | Total: 57m 21s | Avg: 14m 20s | Max: 25m 01s
      🟩 l4                 Pass: 100%/18  | Total:  3h 21m | Avg: 11m 12s | Max: 21m 50s
    🟩 jobs
      🟩 Build cuda.cccl    Pass: 100%/2   | Total: 19m 02s | Avg:  9m 31s | Max:  9m 32s
      🟩 Test cuda.cccl.cooperative Pass: 100%/5   | Total:  1h 17m | Avg: 15m 24s | Max: 16m 19s
      🟩 Test cuda.cccl.examples Pass: 100%/5   | Total: 33m 36s | Avg:  6m 43s | Max: 11m 31s
      🟩 Test cuda.cccl.headers Pass: 100%/5   | Total: 23m 07s | Avg:  4m 37s | Max:  5m 06s
      🟩 Test cuda.cccl.parallel Pass: 100%/5   | Total:  1h 46m | Avg: 21m 15s | Max: 25m 01s
    🟩 py_version
      🟩 3.10               Pass: 100%/9   | Total:  1h 40m | Avg: 11m 11s | Max: 21m 50s
      🟩 3.13               Pass: 100%/13  | Total:  2h 38m | Avg: 12m 11s | Max: 25m 01s
    
  • 🟩 cccl_c_parallel: Pass: 100%/4 | Total: 49m 10s | Avg: 12m 17s | Max: 17m 10s | Hits: 98%/680

    🟩 cpu
      🟩 amd64              Pass: 100%/4   | Total: 49m 10s | Avg: 12m 17s | Max: 17m 10s | Hits:  98%/680   
    🟩 ctk
      🟩 12.9               Pass: 100%/4   | Total: 49m 10s | Avg: 12m 17s | Max: 17m 10s | Hits:  98%/680   
    🟩 cudacxx
      🟩 nvcc12.9           Pass: 100%/4   | Total: 49m 10s | Avg: 12m 17s | Max: 17m 10s | Hits:  98%/680   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 49m 10s | Avg: 12m 17s | Max: 17m 10s | Hits:  98%/680   
    🟩 cxx
      🟩 GCC13              Pass: 100%/4   | Total: 49m 10s | Avg: 12m 17s | Max: 17m 10s | Hits:  98%/680   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/4   | Total: 49m 10s | Avg: 12m 17s | Max: 17m 10s | Hits:  98%/680   
    🟩 gpu
      🟩 h100               Pass: 100%/1   | Total: 16m 15s | Avg: 16m 15s | Max: 16m 15s | Hits:  98%/170   
      🟩 l4                 Pass: 100%/1   | Total: 17m 10s | Avg: 17m 10s | Max: 17m 10s | Hits:  98%/170   
      🟩 rtx2080            Pass: 100%/2   | Total: 15m 45s | Avg:  7m 52s | Max: 13m 37s | Hits:  98%/340   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 08s | Avg:  2m 08s | Max:  2m 08s | Hits:  98%/170   
      🟩 Test               Pass: 100%/3   | Total: 47m 02s | Avg: 15m 40s | Max: 17m 10s | Hits:  98%/510   
    
  • 🟩 packaging: Pass: 100%/4 | Total: 50m 45s | Avg: 12m 41s | Max: 15m 48s

    🟩 cpu
      🟩 amd64              Pass: 100%/4   | Total: 50m 45s | Avg: 12m 41s | Max: 15m 48s
    🟩 ctk
      🟩 12.0               Pass: 100%/2   | Total: 23m 00s | Avg: 11m 30s | Max: 11m 57s
      🟩 12.9               Pass: 100%/2   | Total: 27m 45s | Avg: 13m 52s | Max: 15m 48s
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/2   | Total: 23m 00s | Avg: 11m 30s | Max: 11m 57s
      🟩 nvcc12.9           Pass: 100%/2   | Total: 27m 45s | Avg: 13m 52s | Max: 15m 48s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 50m 45s | Avg: 12m 41s | Max: 15m 48s
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total: 11m 03s | Avg: 11m 03s | Max: 11m 03s
      🟩 Clang19            Pass: 100%/1   | Total: 15m 48s | Avg: 15m 48s | Max: 15m 48s
      🟩 GCC12              Pass: 100%/1   | Total: 11m 57s | Avg: 11m 57s | Max: 11m 57s
      🟩 GCC13              Pass: 100%/1   | Total: 11m 57s | Avg: 11m 57s | Max: 11m 57s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/2   | Total: 26m 51s | Avg: 13m 25s | Max: 15m 48s
      🟩 GCC                Pass: 100%/2   | Total: 23m 54s | Avg: 11m 57s | Max: 11m 57s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 50m 45s | Avg: 12m 41s | Max: 15m 48s
    🟩 jobs
      🟩 Test               Pass: 100%/4   | Total: 50m 45s | Avg: 12m 41s | Max: 15m 48s
    
  • 🟩 stdpar: Pass: 100%/4 | Total: 24m 32s | Avg: 6m 08s | Max: 7m 24s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 10m 49s | Avg:  5m 24s | Max:  7m 13s
      🟩 arm64              Pass: 100%/2   | Total: 13m 43s | Avg:  6m 51s | Max:  7m 24s
    🟩 ctk
      🟩 12.9               Pass: 100%/4   | Total: 24m 32s | Avg:  6m 08s | Max:  7m 24s
    🟩 cudacxx
      🟩 nvcc12.9           Pass: 100%/4   | Total: 24m 32s | Avg:  6m 08s | Max:  7m 24s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 24m 32s | Avg:  6m 08s | Max:  7m 24s
    🟩 cxx
      🟩 NVHPC25.5          Pass: 100%/4   | Total: 24m 32s | Avg:  6m 08s | Max:  7m 24s
    🟩 cxx_family
      🟩 NVHPC              Pass: 100%/4   | Total: 24m 32s | Avg:  6m 08s | Max:  7m 24s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 24m 32s | Avg:  6m 08s | Max:  7m 24s
    🟩 jobs
      🟩 Build              Pass: 100%/4   | Total: 24m 32s | Avg:  6m 08s | Max:  7m 24s
    🟩 std
      🟩 17                 Pass: 100%/2   | Total: 14m 37s | Avg:  7m 18s | Max:  7m 24s
      🟩 20                 Pass: 100%/2   | Total:  9m 55s | Avg:  4m 57s | Max:  6m 19s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
CCCL Packaging
libcu++
+/- CUB
Thrust
CUDA Experimental
stdpar
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
+/- CCCL Packaging
libcu++
+/- CUB
+/- Thrust
+/- CUDA Experimental
+/- stdpar
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃 Runner counts (total jobs: 162)

# Runner
93 linux-amd64-cpu16
17 linux-amd64-gpu-l4-latest-1
17 windows-amd64-cpu16
10 linux-arm64-cpu16
9 linux-amd64-gpu-h100-latest-1
7 linux-amd64-gpu-rtx2080-latest-1
6 linux-amd64-gpu-rtxa6000-latest-1
3 linux-amd64-gpu-rtx4090-latest-1

@NaderAlAwar NaderAlAwar requested a review from miscco August 7, 2025 18:02
@github-project-automation github-project-automation bot moved this from In Review to In Progress in CCCL Aug 7, 2025
@github-actions
Contributor

github-actions bot commented Aug 7, 2025

🟩 CI finished in 4h 06m: Pass: 100%/162 | Total: 3d 18h | Avg: 33m 23s | Max: 1h 42m | Hits: 77%/152477
  • 🟩 cub: Pass: 100%/50 | Total: 2d 00h | Avg: 57m 53s | Max: 1h 42m | Hits: 65%/52268

    🟩 cpu
      🟩 amd64              Pass: 100%/48  | Total:  1d 22h | Avg: 57m 50s | Max:  1h 42m | Hits:  65%/49722 
      🟩 arm64              Pass: 100%/2   | Total:  1h 58m | Avg: 59m 16s | Max:  1h 01m | Hits:  64%/2546  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  5h 37m | Avg:  1h 07m | Max:  1h 37m | Hits:  65%/6261  
      🟩 12.9               Pass: 100%/45  | Total:  1d 18h | Avg: 56m 49s | Max:  1h 42m | Hits:  65%/46007 
    🟩 cudacxx
      🟩 ClangCUDA19        Pass: 100%/2   | Total:  1h 09m | Avg: 34m 45s | Max: 36m 07s | Hits:  70%/2195  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  5h 37m | Avg:  1h 07m | Max:  1h 37m | Hits:  65%/6261  
      🟩 nvcc12.9           Pass: 100%/43  | Total:  1d 17h | Avg: 57m 51s | Max:  1h 42m | Hits:  65%/43812 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  1h 09m | Avg: 34m 45s | Max: 36m 07s | Hits:  70%/2195  
      🟩 nvcc               Pass: 100%/48  | Total:  1d 23h | Avg: 58m 51s | Max:  1h 42m | Hits:  65%/50073 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  3h 53m | Avg: 58m 26s | Max:  1h 02m | Hits:  64%/5094  
      🟩 Clang15            Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 03m | Hits:  64%/2543  
      🟩 Clang16            Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 04m | Hits:  64%/2543  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 57m | Avg: 58m 48s | Max:  1h 01m | Hits:  64%/2543  
      🟩 Clang18            Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 00m | Hits:  64%/2543  
      🟩 Clang19            Pass: 100%/7   | Total:  4h 49m | Avg: 41m 18s | Max: 59m 12s | Hits:  66%/6010  
      🟩 GCC7               Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 07m | Hits:  64%/2546  
      🟩 GCC8               Pass: 100%/1   | Total:  1h 10m | Avg:  1h 10m | Max:  1h 10m | Hits:  64%/1273  
      🟩 GCC9               Pass: 100%/2   | Total:  2h 09m | Avg:  1h 04m | Max:  1h 08m | Hits:  64%/2546  
      🟩 GCC10              Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 04m | Hits:  64%/2547  
      🟩 GCC11              Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 06m | Hits:  64%/2543  
      🟩 GCC12              Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 05m | Hits:  64%/2543  
      🟩 GCC13              Pass: 100%/12  | Total:  7h 56m | Avg: 39m 43s | Max:  1h 13m | Hits:  64%/7641  
      🟩 MSVC14.29          Pass: 100%/2   | Total:  3h 20m | Avg:  1h 40m | Max:  1h 42m | Hits:  67%/2336  
      🟩 MSVC14.43          Pass: 100%/4   | Total:  5h 44m | Avg:  1h 26m | Max:  1h 39m | Hits:  67%/4672  
      🟩 NVHPC25.5          Pass: 100%/2   | Total:  2h 33m | Avg:  1h 16m | Max:  1h 23m | Hits:  64%/2345  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total: 16h 47m | Avg: 53m 00s | Max:  1h 04m | Hits:  65%/21276 
      🟩 GCC                Pass: 100%/23  | Total: 19h 48m | Avg: 51m 39s | Max:  1h 13m | Hits:  64%/21639 
      🟩 MSVC               Pass: 100%/6   | Total:  9h 05m | Avg:  1h 30m | Max:  1h 42m | Hits:  67%/7008  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 33m | Avg:  1h 16m | Max:  1h 23m | Hits:  64%/2345  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 21m | Avg: 27m 15s | Max: 32m 00s | Hits:  64%/1274  
      🟩 rtx2080            Pass: 100%/39  | Total:  1d 18h | Avg:  1h 05m | Max:  1h 42m | Hits:  65%/48448 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 23m | Avg: 32m 54s | Max:  1h 13m | Hits:  64%/2546  
    🟩 jobs
      🟩 Build              Pass: 100%/42  | Total:  1d 21h | Avg:  1h 04m | Max:  1h 42m | Hits:  65%/52268 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 24m 49s | Avg: 24m 49s | Max: 24m 49s
      🟩 GraphCapture       Pass: 100%/1   | Total: 15m 24s | Avg: 15m 24s | Max: 15m 24s
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 15m | Avg: 25m 05s | Max: 25m 15s
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 09m | Avg: 23m 00s | Max: 24m 56s
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 21m | Avg: 27m 15s | Max: 32m 00s | Hits:  64%/1274  
      🟩 90;90a             Pass: 100%/2   | Total:  2h 10m | Avg:  1h 05m | Max:  1h 17m | Hits:  66%/2442  
      🟩 100;120            Pass: 100%/2   | Total:  2h 01m | Avg:  1h 00m | Max:  1h 11m | Hits:  66%/2442  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total: 23h 25m | Avg:  1h 06m | Max:  1h 42m | Hits:  65%/26125 
      🟩 20                 Pass: 100%/29  | Total:  1d 00h | Avg: 51m 20s | Max:  1h 39m | Hits:  65%/26143 
    
  • 🟩 thrust: Pass: 100%/50 | Total: 1d 06h | Avg: 36m 46s | Max: 1h 16m | Hits: 82%/84139

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 42m 41s | Avg: 21m 20s | Max: 34m 46s | Hits:  80%/1914  
    🟩 cpu
      🟩 amd64              Pass: 100%/48  | Total:  1d 05h | Avg: 36m 53s | Max:  1h 16m | Hits:  82%/80312 
      🟩 arm64              Pass: 100%/2   | Total:  1h 08m | Avg: 34m 11s | Max: 36m 31s | Hits:  80%/3827  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 05m | Avg: 37m 03s | Max: 51m 55s | Hits:  82%/9560  
      🟩 12.9               Pass: 100%/45  | Total:  1d 03h | Avg: 36m 44s | Max:  1h 16m | Hits:  82%/74579 
    🟩 cudacxx
      🟩 ClangCUDA19        Pass: 100%/2   | Total: 57m 37s | Avg: 28m 48s | Max: 30m 01s | Hits:  80%/3826  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 05m | Avg: 37m 03s | Max: 51m 55s | Hits:  82%/9560  
      🟩 nvcc12.9           Pass: 100%/43  | Total:  1d 02h | Avg: 37m 06s | Max:  1h 16m | Hits:  82%/70753 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 57m 37s | Avg: 28m 48s | Max: 30m 01s | Hits:  80%/3826  
      🟩 nvcc               Pass: 100%/48  | Total:  1d 05h | Avg: 37m 06s | Max:  1h 16m | Hits:  82%/80313 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 18m | Avg: 34m 31s | Max: 41m 27s | Hits:  80%/7652  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 12m | Avg: 36m 21s | Max: 36m 44s | Hits:  80%/3826  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 14m | Avg: 37m 15s | Max: 38m 00s | Hits:  80%/3826  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 20m | Avg: 40m 01s | Max: 41m 32s | Hits:  80%/3826  
      🟩 Clang18            Pass: 100%/2   | Total:  1h 10m | Avg: 35m 20s | Max: 35m 35s | Hits:  80%/3826  
      🟩 Clang19            Pass: 100%/7   | Total:  2h 56m | Avg: 25m 16s | Max: 37m 22s | Hits:  80%/9565  
      🟩 GCC7               Pass: 100%/2   | Total:  1h 21m | Avg: 40m 58s | Max: 46m 04s | Hits:  80%/3828  
      🟩 GCC8               Pass: 100%/1   | Total: 37m 11s | Avg: 37m 11s | Max: 37m 11s | Hits:  80%/1914  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 15m | Avg: 37m 41s | Max: 39m 44s | Hits:  80%/3828  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 20m | Avg: 40m 02s | Max: 40m 43s | Hits:  80%/3828  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 17m | Avg: 38m 42s | Max: 39m 04s | Hits:  80%/3828  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 30m | Avg: 45m 10s | Max: 47m 11s | Hits:  80%/3828  
      🟩 GCC13              Pass: 100%/11  | Total:  4h 28m | Avg: 24m 22s | Max: 41m 09s | Hits:  80%/13398 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 57m | Avg: 58m 32s | Max:  1h 05m | Hits:  90%/3812  
      🟩 MSVC14.43          Pass: 100%/5   | Total:  4h 13m | Avg: 50m 45s | Max:  1h 08m | Hits:  92%/9530  
      🟩 NVHPC25.5          Pass: 100%/2   | Total:  2h 24m | Avg:  1h 12m | Max:  1h 16m | Hits:  76%/3824  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total: 10h 12m | Avg: 32m 15s | Max: 41m 32s | Hits:  80%/32521 
      🟩 GCC                Pass: 100%/22  | Total: 11h 50m | Avg: 32m 17s | Max: 47m 11s | Hits:  80%/34452 
      🟩 MSVC               Pass: 100%/7   | Total:  6h 10m | Avg: 52m 58s | Max:  1h 08m | Hits:  91%/13342 
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 24m | Avg:  1h 12m | Max:  1h 16m | Hits:  76%/3824  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 28m 59s | Avg: 14m 29s | Max: 22m 21s | Hits:  80%/1914  
      🟩 rtx2080            Pass: 100%/38  | Total:  1d 01h | Avg: 41m 01s | Max:  1h 16m | Hits:  81%/72672 
      🟩 rtx4090            Pass: 100%/10  | Total:  4h 11m | Avg: 25m 06s | Max:  1h 08m | Hits:  86%/9553  
    🟩 jobs
      🟩 Build              Pass: 100%/43  | Total:  1d 05h | Avg: 40m 54s | Max:  1h 16m | Hits:  81%/82233 
      🟩 TestCPU            Pass: 100%/3   | Total: 41m 36s | Avg: 13m 52s | Max: 33m 06s | Hits:  99%/1906  
      🟩 TestGPU            Pass: 100%/4   | Total: 38m 16s | Avg:  9m 34s | Max: 13m 21s
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 28m 59s | Avg: 14m 29s | Max: 22m 21s | Hits:  80%/1914  
      🟩 90;90a             Pass: 100%/2   | Total:  1h 17m | Avg: 38m 40s | Max: 45m 52s | Hits:  85%/3820  
      🟩 100;120            Pass: 100%/2   | Total:  1h 18m | Avg: 39m 08s | Max: 48m 02s | Hits:  85%/3820  
    🟩 std
      🟩 17                 Pass: 100%/21  | Total: 14h 52m | Avg: 42m 29s | Max:  1h 08m | Hits:  81%/40160 
      🟩 20                 Pass: 100%/27  | Total: 15h 03m | Avg: 33m 28s | Max:  1h 16m | Hits:  82%/42065 
    
  • 🟩 cudax: Pass: 100%/28 | Total: 3h 37m | Avg: 7m 45s | Max: 15m 45s | Hits: 90%/15390

    🟩 cpu
      🟩 amd64              Pass: 100%/24  | Total:  3h 15m | Avg:  8m 07s | Max: 15m 45s | Hits:  90%/13018 
      🟩 arm64              Pass: 100%/4   | Total: 22m 10s | Avg:  5m 32s | Max:  6m 03s | Hits:  89%/2372  
    🟩 ctk
      🟩 12.0               Pass: 100%/3   | Total: 23m 23s | Avg:  7m 47s | Max: 13m 11s | Hits:  88%/1474  
      🟩 12.9               Pass: 100%/25  | Total:  3h 13m | Avg:  7m 45s | Max: 15m 45s | Hits:  90%/13916 
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/3   | Total: 23m 23s | Avg:  7m 47s | Max: 13m 11s | Hits:  88%/1474  
      🟩 nvcc12.9           Pass: 100%/25  | Total:  3h 13m | Avg:  7m 45s | Max: 15m 45s | Hits:  90%/13916 
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/28  | Total:  3h 37m | Avg:  7m 45s | Max: 15m 45s | Hits:  90%/15390 
    🟩 cxx
      🟩 Clang14            Pass: 100%/2   | Total:  9m 53s | Avg:  4m 56s | Max:  5m 20s | Hits:  90%/1188  
      🟩 Clang15            Pass: 100%/1   | Total:  5m 37s | Avg:  5m 37s | Max:  5m 37s | Hits:  89%/593   
      🟩 Clang16            Pass: 100%/1   | Total:  5m 52s | Avg:  5m 52s | Max:  5m 52s | Hits:  89%/593   
      🟩 Clang17            Pass: 100%/1   | Total:  6m 03s | Avg:  6m 03s | Max:  6m 03s | Hits:  89%/593   
      🟩 Clang18            Pass: 100%/1   | Total:  5m 57s | Avg:  5m 57s | Max:  5m 57s | Hits:  89%/593   
      🟩 Clang19            Pass: 100%/4   | Total: 25m 31s | Avg:  6m 22s | Max:  9m 43s | Hits:  92%/2372  
      🟩 GCC10              Pass: 100%/2   | Total: 12m 35s | Avg:  6m 17s | Max:  6m 56s | Hits:  89%/1188  
      🟩 GCC11              Pass: 100%/1   | Total:  6m 54s | Avg:  6m 54s | Max:  6m 54s | Hits:  89%/593   
      🟩 GCC12              Pass: 100%/1   | Total:  6m 36s | Avg:  6m 36s | Max:  6m 36s | Hits:  89%/593   
      🟩 GCC13              Pass: 100%/8   | Total: 52m 54s | Avg:  6m 36s | Max: 10m 27s | Hits:  92%/4744  
      🟩 MSVC14.39          Pass: 100%/1   | Total: 13m 11s | Avg: 13m 11s | Max: 13m 11s | Hits:  80%/288   
      🟩 MSVC14.43          Pass: 100%/3   | Total: 44m 00s | Avg: 14m 40s | Max: 15m 45s | Hits:  80%/870   
      🟩 NVHPC25.5          Pass: 100%/2   | Total: 22m 11s | Avg: 11m 05s | Max: 11m 20s | Hits:  87%/1182  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/10  | Total: 58m 53s | Avg:  5m 53s | Max:  9m 43s | Hits:  90%/5932  
      🟩 GCC                Pass: 100%/12  | Total:  1h 18m | Avg:  6m 34s | Max: 10m 27s | Hits:  91%/7118  
      🟩 MSVC               Pass: 100%/4   | Total: 57m 11s | Avg: 14m 17s | Max: 15m 45s | Hits:  80%/1158  
      🟩 NVHPC              Pass: 100%/2   | Total: 22m 11s | Avg: 11m 05s | Max: 11m 20s | Hits:  87%/1182  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 11m 52s | Avg:  5m 56s | Max:  6m 43s | Hits:  94%/1186  
      🟩 rtx2080            Pass: 100%/26  | Total:  3h 25m | Avg:  7m 53s | Max: 15m 45s | Hits:  89%/14204 
    🟩 jobs
      🟩 Build              Pass: 100%/25  | Total:  3h 10m | Avg:  7m 36s | Max: 15m 45s | Hits:  88%/13611 
      🟩 Test               Pass: 100%/3   | Total: 26m 53s | Avg:  8m 57s | Max: 10m 27s | Hits:  99%/1779  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 11m 52s | Avg:  5m 56s | Max:  6m 43s | Hits:  94%/1186  
      🟩 90;90a             Pass: 100%/2   | Total: 20m 21s | Avg: 10m 10s | Max: 14m 31s | Hits:  86%/883   
      🟩 100;120            Pass: 100%/2   | Total: 19m 29s | Avg:  9m 44s | Max: 13m 44s | Hits:  86%/883   
    🟩 std
      🟩 17                 Pass: 100%/3   | Total: 22m 01s | Avg:  7m 20s | Max: 10m 51s | Hits:  88%/1777  
      🟩 20                 Pass: 100%/25  | Total:  3h 15m | Avg:  7m 48s | Max: 15m 45s | Hits:  90%/13613 
    
  • 🟩 python: Pass: 100%/22 | Total: 6h 08m | Avg: 16m 45s | Max: 29m 40s

    🟩 cpu
      🟩 amd64              Pass: 100%/22  | Total:  6h 08m | Avg: 16m 45s | Max: 29m 40s
    🟩 ctk
      🟩 12.5               Pass: 100%/6   | Total:  1h 31m | Avg: 15m 18s | Max: 22m 47s
      🟩 12.8               Pass: 100%/2   | Total: 51m 51s | Avg: 25m 55s | Max: 27m 52s
      🟩 12.9               Pass: 100%/14  | Total:  3h 44m | Avg: 16m 04s | Max: 29m 40s
    🟩 cudacxx
      🟩 nvcc12.5           Pass: 100%/6   | Total:  1h 31m | Avg: 15m 18s | Max: 22m 47s
      🟩 nvcc12.8           Pass: 100%/2   | Total: 51m 51s | Avg: 25m 55s | Max: 27m 52s
      🟩 nvcc12.9           Pass: 100%/14  | Total:  3h 44m | Avg: 16m 04s | Max: 29m 40s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/22  | Total:  6h 08m | Avg: 16m 45s | Max: 29m 40s
    🟩 cxx
      🟩 GCC13              Pass: 100%/22  | Total:  6h 08m | Avg: 16m 45s | Max: 29m 40s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/22  | Total:  6h 08m | Avg: 16m 45s | Max: 29m 40s
    🟩 gpu
      🟩 h100               Pass: 100%/4   | Total: 55m 18s | Avg: 13m 49s | Max: 25m 43s
      🟩 l4                 Pass: 100%/18  | Total:  5h 13m | Avg: 17m 24s | Max: 29m 40s
    🟩 jobs
      🟩 Build cuda.cccl            Pass: 100%/2   | Total: 19m 35s | Avg:  9m 47s | Max: 10m 03s
      🟩 Test cuda.cccl.cooperative Pass: 100%/5   | Total:  1h 41m | Avg: 20m 20s | Max: 22m 47s
      🟩 Test cuda.cccl.examples    Pass: 100%/5   | Total: 52m 51s | Avg: 10m 34s | Max: 15m 34s
      🟩 Test cuda.cccl.headers     Pass: 100%/5   | Total: 59m 34s | Avg: 11m 54s | Max: 14m 27s
      🟩 Test cuda.cccl.parallel    Pass: 100%/5   | Total:  2h 14m | Avg: 26m 59s | Max: 29m 40s
    🟩 py_version
      🟩 3.10               Pass: 100%/9   | Total:  2h 27m | Avg: 16m 24s | Max: 27m 41s
      🟩 3.13               Pass: 100%/13  | Total:  3h 40m | Avg: 16m 59s | Max: 29m 40s
    
  • 🟩 cccl_c_parallel: Pass: 100%/4 | Total: 55m 54s | Avg: 13m 58s | Max: 21m 10s | Hits: 97%/680

    🟩 cpu
      🟩 amd64              Pass: 100%/4   | Total: 55m 54s | Avg: 13m 58s | Max: 21m 10s | Hits:  97%/680   
    🟩 ctk
      🟩 12.9               Pass: 100%/4   | Total: 55m 54s | Avg: 13m 58s | Max: 21m 10s | Hits:  97%/680   
    🟩 cudacxx
      🟩 nvcc12.9           Pass: 100%/4   | Total: 55m 54s | Avg: 13m 58s | Max: 21m 10s | Hits:  97%/680   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 55m 54s | Avg: 13m 58s | Max: 21m 10s | Hits:  97%/680   
    🟩 cxx
      🟩 GCC13              Pass: 100%/4   | Total: 55m 54s | Avg: 13m 58s | Max: 21m 10s | Hits:  97%/680   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/4   | Total: 55m 54s | Avg: 13m 58s | Max: 21m 10s | Hits:  97%/680   
    🟩 gpu
      🟩 h100               Pass: 100%/1   | Total: 16m 02s | Avg: 16m 02s | Max: 16m 02s | Hits:  98%/170   
      🟩 l4                 Pass: 100%/1   | Total: 21m 10s | Avg: 21m 10s | Max: 21m 10s | Hits:  98%/170   
      🟩 rtx2080            Pass: 100%/2   | Total: 18m 42s | Avg:  9m 21s | Max: 15m 59s | Hits:  97%/340   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 43s | Avg:  2m 43s | Max:  2m 43s | Hits:  95%/170   
      🟩 Test               Pass: 100%/3   | Total: 53m 11s | Avg: 17m 43s | Max: 21m 10s | Hits:  98%/510   
    
  • 🟩 packaging: Pass: 100%/4 | Total: 17m 54s | Avg: 4m 28s | Max: 5m 19s

    🟩 cpu
      🟩 amd64              Pass: 100%/4   | Total: 17m 54s | Avg:  4m 28s | Max:  5m 19s
    🟩 ctk
      🟩 12.0               Pass: 100%/2   | Total:  7m 42s | Avg:  3m 51s | Max:  3m 53s
      🟩 12.9               Pass: 100%/2   | Total: 10m 12s | Avg:  5m 06s | Max:  5m 19s
    🟩 cudacxx
      🟩 nvcc12.0           Pass: 100%/2   | Total:  7m 42s | Avg:  3m 51s | Max:  3m 53s
      🟩 nvcc12.9           Pass: 100%/2   | Total: 10m 12s | Avg:  5m 06s | Max:  5m 19s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 17m 54s | Avg:  4m 28s | Max:  5m 19s
    🟩 cxx
      🟩 Clang14            Pass: 100%/1   | Total:  3m 53s | Avg:  3m 53s | Max:  3m 53s
      🟩 Clang19            Pass: 100%/1   | Total:  4m 53s | Avg:  4m 53s | Max:  4m 53s
      🟩 GCC12              Pass: 100%/1   | Total:  3m 49s | Avg:  3m 49s | Max:  3m 49s
      🟩 GCC13              Pass: 100%/1   | Total:  5m 19s | Avg:  5m 19s | Max:  5m 19s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/2   | Total:  8m 46s | Avg:  4m 23s | Max:  4m 53s
      🟩 GCC                Pass: 100%/2   | Total:  9m 08s | Avg:  4m 34s | Max:  5m 19s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 17m 54s | Avg:  4m 28s | Max:  5m 19s
    🟩 jobs
      🟩 Test               Pass: 100%/4   | Total: 17m 54s | Avg:  4m 28s | Max:  5m 19s
    
  • 🟩 stdpar: Pass: 100%/4 | Total: 16m 04s | Avg: 4m 01s | Max: 4m 15s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total:  8m 23s | Avg:  4m 11s | Max:  4m 15s
      🟩 arm64              Pass: 100%/2   | Total:  7m 41s | Avg:  3m 50s | Max:  3m 51s
    🟩 ctk
      🟩 12.9               Pass: 100%/4   | Total: 16m 04s | Avg:  4m 01s | Max:  4m 15s
    🟩 cudacxx
      🟩 nvcc12.9           Pass: 100%/4   | Total: 16m 04s | Avg:  4m 01s | Max:  4m 15s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 16m 04s | Avg:  4m 01s | Max:  4m 15s
    🟩 cxx
      🟩 NVHPC25.5          Pass: 100%/4   | Total: 16m 04s | Avg:  4m 01s | Max:  4m 15s
    🟩 cxx_family
      🟩 NVHPC              Pass: 100%/4   | Total: 16m 04s | Avg:  4m 01s | Max:  4m 15s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 16m 04s | Avg:  4m 01s | Max:  4m 15s
    🟩 jobs
      🟩 Build              Pass: 100%/4   | Total: 16m 04s | Avg:  4m 01s | Max:  4m 15s
    🟩 std
      🟩 17                 Pass: 100%/2   | Total:  8m 05s | Avg:  4m 02s | Max:  4m 15s
      🟩 20                 Pass: 100%/2   | Total:  7m 59s | Avg:  3m 59s | Max:  4m 08s
    

👃 Inspect Changes

Modifications in project?

      Project
      CCCL Infrastructure
      CCCL Packaging
      libcu++
+/-   CUB
      Thrust
      CUDA Experimental
      stdpar
      python
      CCCL C Parallel Library
      Catch2Helper

Modifications in project or dependencies?

      Project
      CCCL Infrastructure
+/-   CCCL Packaging
      libcu++
+/-   CUB
+/-   Thrust
+/-   CUDA Experimental
+/-   stdpar
+/-   python
+/-   CCCL C Parallel Library
+/-   Catch2Helper

🏃‍ Runner counts (total jobs: 162)

  #  Runner
 93  linux-amd64-cpu16
 17  linux-amd64-gpu-l4-latest-1
 17  windows-amd64-cpu16
 10  linux-arm64-cpu16
  9  linux-amd64-gpu-h100-latest-1
  7  linux-amd64-gpu-rtx2080-latest-1
  6  linux-amd64-gpu-rtxa6000-latest-1
  3  linux-amd64-gpu-rtx4090-latest-1

@NaderAlAwar NaderAlAwar requested a review from fbusato August 8, 2025 15:47
@github-project-automation github-project-automation bot moved this from In Progress to In Review in CCCL Aug 8, 2025
@NaderAlAwar NaderAlAwar merged commit 415c746 into NVIDIA:main Aug 8, 2025
173 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in CCCL Aug 8, 2025
shwina pushed a commit to shwina/cccl that referenced this pull request Aug 19, 2025
davebayer pushed a commit to davebayer/cccl that referenced this pull request Sep 23, 2025

Development

Successfully merging this pull request may close these issues.

Nondeterministic atomic reduce fails to compile when dtype is double for older architectures

3 participants