
Refactor three_way_partition tuning #3140

Merged
bernhardmgruber merged 6 commits into NVIDIA:main from bernhardmgruber:ref_3wpartition_tuning
Dec 12, 2024

@bernhardmgruber (Contributor) commented Dec 12, 2024

  • The SASS of cub.test.device_three_way_partition.lid_0 did not change, except for kernel symbol names.
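A claim like this can be checked mechanically. Dump the SASS of the test binary before and after the change (for example with `cuobjdump -sass`), then diff the two dumps with kernel symbol names masked. The following is a minimal Python sketch of that comparison, not the procedure used in this PR. It assumes the dumps are saved as text and that the only expected differences are Itanium-mangled names beginning with `_Z`. The helper names `normalize_sass` and `sass_equivalent` are hypothetical.

```python
import re

# Assumption: kernel symbols appear as Itanium-mangled C++ names starting with "_Z".
MANGLED_SYMBOL = re.compile(r"_Z[A-Za-z0-9_$]+")


def normalize_sass(dump: str) -> list[str]:
    """Mask mangled symbol names and drop blank lines so two dumps can be compared."""
    lines = []
    for line in dump.splitlines():
        line = MANGLED_SYMBOL.sub("<sym>", line).strip()
        if line:
            lines.append(line)
    return lines


def sass_equivalent(before: str, after: str) -> bool:
    """Return True if the two SASS dumps differ only in mangled symbol names."""
    return normalize_sass(before) == normalize_sass(after)
```

Because every mangled name is masked, a change in which function a call instruction targets would also be hidden. A stricter check could map old symbol names to new ones instead of masking them all.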

@github-actions (Contributor)

🟩 CI finished in 1h 48m: Pass: 100%/94 | Total: 2d 13h | Avg: 39m 11s | Max: 1h 08m | Hits: 74%/12324
  • 🟩 thrust: Pass: 100%/46 | Total: 23h 29m | Avg: 30m 39s | Max: 1h 03m | Hits: 78%/9260

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 41m 21s | Avg: 20m 40s | Max: 28m 24s
    🟩 cpu
      🟩 amd64              Pass: 100%/44  | Total: 22h 23m | Avg: 30m 32s | Max:  1h 03m | Hits:  78%/9260  
      🟩 arm64              Pass: 100%/2   | Total:  1h 06m | Avg: 33m 03s | Max: 38m 59s
    🟩 ctk
      🟩 11.1               Pass: 100%/7   | Total:  3h 32m | Avg: 30m 18s | Max:  1h 00m | Hits:  73%/1852  
      🟩 12.5               Pass: 100%/2   | Total:  1h 42m | Avg: 51m 18s | Max: 54m 02s
      🟩 12.6               Pass: 100%/37  | Total: 18h 15m | Avg: 29m 36s | Max:  1h 03m | Hits:  79%/7408  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 51m 21s | Avg: 25m 40s | Max: 25m 59s
      🟩 nvcc11.1           Pass: 100%/7   | Total:  3h 32m | Avg: 30m 18s | Max:  1h 00m | Hits:  73%/1852  
      🟩 nvcc12.5           Pass: 100%/2   | Total:  1h 42m | Avg: 51m 18s | Max: 54m 02s
      🟩 nvcc12.6           Pass: 100%/35  | Total: 17h 23m | Avg: 29m 49s | Max:  1h 03m | Hits:  79%/7408  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 51m 21s | Avg: 25m 40s | Max: 25m 59s
      🟩 nvcc               Pass: 100%/44  | Total: 22h 38m | Avg: 30m 52s | Max:  1h 03m | Hits:  78%/9260  
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total:  1h 40m | Avg: 25m 11s | Max: 28m 23s
      🟩 Clang10            Pass: 100%/1   | Total: 34m 18s | Avg: 34m 18s | Max: 34m 18s
      🟩 Clang11            Pass: 100%/1   | Total: 32m 31s | Avg: 32m 31s | Max: 32m 31s
      🟩 Clang12            Pass: 100%/1   | Total: 30m 22s | Avg: 30m 22s | Max: 30m 22s
      🟩 Clang13            Pass: 100%/1   | Total: 28m 37s | Avg: 28m 37s | Max: 28m 37s
      🟩 Clang14            Pass: 100%/1   | Total: 30m 25s | Avg: 30m 25s | Max: 30m 25s
      🟩 Clang15            Pass: 100%/1   | Total: 32m 49s | Avg: 32m 49s | Max: 32m 49s
      🟩 Clang16            Pass: 100%/1   | Total: 29m 12s | Avg: 29m 12s | Max: 29m 12s
      🟩 Clang17            Pass: 100%/1   | Total: 31m 21s | Avg: 31m 21s | Max: 31m 21s
      🟩 Clang18            Pass: 100%/7   | Total:  2h 38m | Avg: 22m 41s | Max: 32m 48s
      🟩 GCC6               Pass: 100%/2   | Total: 51m 05s | Avg: 25m 32s | Max: 28m 36s
      🟩 GCC7               Pass: 100%/2   | Total: 57m 58s | Avg: 28m 59s | Max: 32m 01s
      🟩 GCC8               Pass: 100%/1   | Total: 30m 09s | Avg: 30m 09s | Max: 30m 09s
      🟩 GCC9               Pass: 100%/3   | Total:  1h 26m | Avg: 28m 59s | Max: 34m 40s
      🟩 GCC10              Pass: 100%/1   | Total: 36m 16s | Avg: 36m 16s | Max: 36m 16s
      🟩 GCC11              Pass: 100%/1   | Total: 30m 46s | Avg: 30m 46s | Max: 30m 46s
      🟩 GCC12              Pass: 100%/1   | Total: 32m 01s | Avg: 32m 01s | Max: 32m 01s
      🟩 GCC13              Pass: 100%/8   | Total:  3h 09m | Avg: 23m 44s | Max: 38m 59s
      🟩 Intel2023.2.0      Pass: 100%/1   | Total: 36m 54s | Avg: 36m 54s | Max: 36m 54s
      🟩 MSVC14.16          Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m | Hits:  73%/1852  
      🟩 MSVC14.29          Pass: 100%/1   | Total: 50m 35s | Avg: 50m 35s | Max: 50m 35s | Hits:  73%/1852  
      🟩 MSVC14.39          Pass: 100%/3   | Total:  2h 14m | Avg: 44m 59s | Max:  1h 03m | Hits:  82%/5556  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  1h 42m | Avg: 51m 18s | Max: 54m 02s
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total:  8h 29m | Avg: 26m 48s | Max: 34m 18s
      🟩 GCC                Pass: 100%/19  | Total:  8h 35m | Avg: 27m 06s | Max: 38m 59s
      🟩 Intel              Pass: 100%/1   | Total: 36m 54s | Avg: 36m 54s | Max: 36m 54s
      🟩 MSVC               Pass: 100%/5   | Total:  4h 06m | Avg: 49m 13s | Max:  1h 03m | Hits:  78%/9260  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 42m | Avg: 51m 18s | Max: 54m 02s
    🟩 gpu
      🟩 v100               Pass: 100%/46  | Total: 23h 29m | Avg: 30m 39s | Max:  1h 03m | Hits:  78%/9260  
    🟩 jobs
      🟩 Build              Pass: 100%/40  | Total: 22h 17m | Avg: 33m 26s | Max:  1h 03m | Hits:  73%/7408  
      🟩 TestCPU            Pass: 100%/3   | Total: 36m 58s | Avg: 12m 19s | Max: 20m 39s | Hits:  99%/1852  
      🟩 TestGPU            Pass: 100%/3   | Total: 35m 07s | Avg: 11m 42s | Max: 12m 57s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 18m 13s | Avg: 18m 13s | Max: 18m 13s
    🟩 std
      🟩 11                 Pass: 100%/5   | Total:  1h 59m | Avg: 23m 49s | Max: 25m 57s
      🟩 14                 Pass: 100%/4   | Total:  2h 29m | Avg: 37m 23s | Max:  1h 00m | Hits:  73%/1852  
      🟩 17                 Pass: 100%/12  | Total:  7h 16m | Avg: 36m 20s | Max: 54m 02s | Hits:  73%/3704  
      🟩 20                 Pass: 100%/23  | Total: 11h 03m | Avg: 28m 51s | Max:  1h 03m | Hits:  86%/3704  
    
  • 🟩 cub: Pass: 100%/45 | Total: 1d 13h | Avg: 49m 24s | Max: 1h 08m | Hits: 62%/3064

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  1d 11h | Avg: 49m 02s | Max:  1h 08m | Hits:  62%/3064  
      🟩 arm64              Pass: 100%/2   | Total:  1h 54m | Avg: 57m 18s | Max: 57m 30s
    🟩 ctk
      🟩 11.1               Pass: 100%/7   | Total:  5h 33m | Avg: 47m 35s | Max: 57m 01s | Hits:  62%/766   
      🟩 12.5               Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 07m
      🟩 12.6               Pass: 100%/36  | Total:  1d 05h | Avg: 48m 58s | Max:  1h 08m | Hits:  62%/2298  
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 02m
      🟩 nvcc11.1           Pass: 100%/7   | Total:  5h 33m | Avg: 47m 35s | Max: 57m 01s | Hits:  62%/766   
      🟩 nvcc12.5           Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 07m
      🟩 nvcc12.6           Pass: 100%/34  | Total:  1d 03h | Avg: 48m 19s | Max:  1h 08m | Hits:  62%/2298  
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 00m | Avg:  1h 00m | Max:  1h 02m
      🟩 nvcc               Pass: 100%/43  | Total:  1d 11h | Avg: 48m 54s | Max:  1h 08m | Hits:  62%/3064  
    🟩 cxx
      🟩 Clang9             Pass: 100%/4   | Total:  3h 26m | Avg: 51m 40s | Max:  1h 00m
      🟩 Clang10            Pass: 100%/1   | Total: 55m 18s | Avg: 55m 18s | Max: 55m 18s
      🟩 Clang11            Pass: 100%/1   | Total: 53m 32s | Avg: 53m 32s | Max: 53m 32s
      🟩 Clang12            Pass: 100%/1   | Total: 50m 53s | Avg: 50m 53s | Max: 50m 53s
      🟩 Clang13            Pass: 100%/1   | Total: 53m 50s | Avg: 53m 50s | Max: 53m 50s
      🟩 Clang14            Pass: 100%/1   | Total: 55m 54s | Avg: 55m 54s | Max: 55m 54s
      🟩 Clang15            Pass: 100%/1   | Total: 56m 28s | Avg: 56m 28s | Max: 56m 28s
      🟩 Clang16            Pass: 100%/1   | Total: 54m 59s | Avg: 54m 59s | Max: 54m 59s
      🟩 Clang17            Pass: 100%/1   | Total: 55m 40s | Avg: 55m 40s | Max: 55m 40s
      🟩 Clang18            Pass: 100%/7   | Total:  5h 19m | Avg: 45m 38s | Max:  1h 02m
      🟩 GCC6               Pass: 100%/2   | Total:  1h 31m | Avg: 45m 46s | Max: 48m 24s
      🟩 GCC7               Pass: 100%/2   | Total:  1h 43m | Avg: 51m 41s | Max: 52m 07s
      🟩 GCC8               Pass: 100%/1   | Total: 54m 55s | Avg: 54m 55s | Max: 54m 55s
      🟩 GCC9               Pass: 100%/3   | Total:  2h 22m | Avg: 47m 31s | Max: 51m 16s
      🟩 GCC10              Pass: 100%/1   | Total: 59m 05s | Avg: 59m 05s | Max: 59m 05s
      🟩 GCC11              Pass: 100%/1   | Total: 51m 23s | Avg: 51m 23s | Max: 51m 23s
      🟩 GCC12              Pass: 100%/1   | Total: 54m 23s | Avg: 54m 23s | Max: 54m 23s
      🟩 GCC13              Pass: 100%/8   | Total:  4h 28m | Avg: 33m 33s | Max: 57m 26s
      🟩 Intel2023.2.0      Pass: 100%/1   | Total: 54m 45s | Avg: 54m 45s | Max: 54m 45s
      🟩 MSVC14.16          Pass: 100%/1   | Total: 57m 01s | Avg: 57m 01s | Max: 57m 01s | Hits:  62%/766   
      🟩 MSVC14.29          Pass: 100%/1   | Total:  1h 03m | Avg:  1h 03m | Max:  1h 03m | Hits:  62%/766   
      🟩 MSVC14.39          Pass: 100%/2   | Total:  2h 12m | Avg:  1h 06m | Max:  1h 08m | Hits:  62%/1532  
      🟩 NVHPC24.7          Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 07m
    🟩 cxx_family
      🟩 Clang              Pass: 100%/19  | Total: 16h 02m | Avg: 50m 40s | Max:  1h 02m
      🟩 GCC                Pass: 100%/19  | Total: 13h 45m | Avg: 43m 27s | Max: 59m 05s
      🟩 Intel              Pass: 100%/1   | Total: 54m 45s | Avg: 54m 45s | Max: 54m 45s
      🟩 MSVC               Pass: 100%/4   | Total:  4h 13m | Avg:  1h 03m | Max:  1h 08m | Hits:  62%/3064  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 06m | Avg:  1h 03m | Max:  1h 07m
    🟩 gpu
      🟩 v100               Pass: 100%/45  | Total:  1d 13h | Avg: 49m 24s | Max:  1h 08m | Hits:  62%/3064  
    🟩 jobs
      🟩 Build              Pass: 100%/39  | Total:  1d 11h | Avg: 54m 05s | Max:  1h 08m | Hits:  62%/3064  
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 18m 51s | Avg: 18m 51s | Max: 18m 51s
      🟩 GraphCapture       Pass: 100%/1   | Total: 15m 54s | Avg: 15m 54s | Max: 15m 54s
      🟩 HostLaunch         Pass: 100%/2   | Total: 37m 17s | Avg: 18m 38s | Max: 19m 39s
      🟩 TestGPU            Pass: 100%/2   | Total: 41m 38s | Avg: 20m 49s | Max: 21m 03s
    🟩 sm
      🟩 90a                Pass: 100%/1   | Total: 23m 53s | Avg: 23m 53s | Max: 23m 53s
    🟩 std
      🟩 11                 Pass: 100%/5   | Total:  4h 06m | Avg: 49m 17s | Max: 52m 36s
      🟩 14                 Pass: 100%/4   | Total:  3h 33m | Avg: 53m 17s | Max:  1h 00m | Hits:  62%/766   
      🟩 17                 Pass: 100%/12  | Total: 11h 10m | Avg: 55m 53s | Max:  1h 08m | Hits:  62%/1532  
      🟩 20                 Pass: 100%/24  | Total: 18h 13m | Avg: 45m 32s | Max:  1h 07m | Hits:  62%/766   
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 12m 01s | Avg: 6m 00s | Max: 9m 55s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 12m 01s | Avg:  6m 00s | Max:  9m 55s
    🟩 ctk
      🟩 12.6               Pass: 100%/2   | Total: 12m 01s | Avg:  6m 00s | Max:  9m 55s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/2   | Total: 12m 01s | Avg:  6m 00s | Max:  9m 55s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 12m 01s | Avg:  6m 00s | Max:  9m 55s
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 12m 01s | Avg:  6m 00s | Max:  9m 55s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 12m 01s | Avg:  6m 00s | Max:  9m 55s
    🟩 gpu
      🟩 v100               Pass: 100%/2   | Total: 12m 01s | Avg:  6m 00s | Max:  9m 55s
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  2m 06s | Avg:  2m 06s | Max:  2m 06s
      🟩 Test               Pass: 100%/1   | Total:  9m 55s | Avg:  9m 55s | Max:  9m 55s
    
  • 🟩 python: Pass: 100%/1 | Total: 38m 40s | Avg: 38m 40s | Max: 38m 40s

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total: 38m 40s | Avg: 38m 40s | Max: 38m 40s
    🟩 ctk
      🟩 12.6               Pass: 100%/1   | Total: 38m 40s | Avg: 38m 40s | Max: 38m 40s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/1   | Total: 38m 40s | Avg: 38m 40s | Max: 38m 40s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total: 38m 40s | Avg: 38m 40s | Max: 38m 40s
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total: 38m 40s | Avg: 38m 40s | Max: 38m 40s
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total: 38m 40s | Avg: 38m 40s | Max: 38m 40s
    🟩 gpu
      🟩 v100               Pass: 100%/1   | Total: 38m 40s | Avg: 38m 40s | Max: 38m 40s
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total: 38m 40s | Avg: 38m 40s | Max: 38m 40s
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
+/- CUB
Thrust
CUDA Experimental
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 94)

# Runner
70 linux-amd64-cpu16
11 linux-amd64-gpu-v100-latest-1
9 windows-amd64-cpu16
4 linux-arm64-cpu16

@bernhardmgruber bernhardmgruber merged commit 0ea508f into NVIDIA:main Dec 12, 2024
113 checks passed
@bernhardmgruber bernhardmgruber deleted the ref_3wpartition_tuning branch December 12, 2024 17:48
Labels

None yet

Projects

Archived in project

2 participants