
[Backport branch/3.0.x] [Thrust] Perform asynchronous allocations by default for the par_nosync policy #4483

Merged
brycelelbach merged 1 commit into branch/3.0.x from backport-4204-to-branch/3.0.x
Apr 18, 2025


@github-actions
Contributor

Description

Backport of #4204 to branch/3.0.x.

…ync` policy (#4204)

* [Thrust] Perform asynchronous allocations by default for the `par_nosync` policy.
This will make algorithms (like scans) that don't have a computation-dependent
result but do temporary allocation properly asynchronous under `par_nosync`.

* Cleanup

* Apply suggestions from code review to `par_nosync` async allocation

Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com>

* Switch from `reinterpret_pointer_cast` to `raw_pointer_cast` when we're going to `void*`.

* [Thrust] Pass a raw pointer instead of a Thrust pointer to `cudaFree`.

* Run pre-commit.

* [Thrust]: Correct comment on `par_nosync` fallback path.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Bernhard Manfred Gruber <bernhardmgruber@gmail.com>
(cherry picked from commit fbf517d)
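
For readers unfamiliar with `par_nosync`, here is a minimal sketch of the usage pattern the commit message describes: a scan launched under `thrust::cuda::par_nosync` on a user stream, with the caller responsible for synchronizing. It is an illustration under stated assumptions, not code from this PR. The helper `check` and the function `scan_last_nosync` are hypothetical names, and the sketch assumes a CUDA toolkit with stream-ordered allocation support (`cudaMallocAsync`/`cudaFreeAsync`). Whether the scan's temporary storage is actually allocated asynchronously depends on using a Thrust version that contains this change. The sketch also frees the buffer through `thrust::raw_pointer_cast`, in the spirit of the raw-pointer items in the commit list.

```cuda
#include <thrust/device_ptr.h>
#include <thrust/execution_policy.h>
#include <thrust/fill.h>
#include <thrust/scan.h>
#include <thrust/system/cuda/execution_policy.h>

#include <cuda_runtime.h>

#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper (not part of Thrust or CCCL): abort on any CUDA error.
static void check(cudaError_t status, const char* what)
{
  if (status != cudaSuccess)
  {
    std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(status));
    std::exit(EXIT_FAILURE);
  }
}

// Hypothetical example function: fills n ints with 1, scans them in place
// under par_nosync, and returns the last prefix sum (which should equal n).
int scan_last_nosync(int n)
{
  cudaStream_t stream;
  check(cudaStreamCreate(&stream), "cudaStreamCreate");

  // Stream-ordered allocation of the user-visible buffer.
  int* raw = nullptr;
  check(cudaMallocAsync(reinterpret_cast<void**>(&raw),
                        static_cast<std::size_t>(n) * sizeof(int), stream),
        "cudaMallocAsync");
  thrust::device_ptr<int> data = thrust::device_pointer_cast(raw);

  // par_nosync lets algorithms return without waiting for the stream when
  // their result does not depend on the computation.
  auto policy = thrust::cuda::par_nosync.on(stream);
  thrust::fill(policy, data, data + n, 1);
  thrust::inclusive_scan(policy, data, data + n, data);

  // Everything below is ordered on the same stream as the algorithms above.
  int last = 0;
  check(cudaMemcpyAsync(&last, raw + (n - 1), sizeof(int),
                        cudaMemcpyDeviceToHost, stream),
        "cudaMemcpyAsync");

  // raw_pointer_cast yields an int*, which converts implicitly to void*.
  check(cudaFreeAsync(thrust::raw_pointer_cast(data), stream), "cudaFreeAsync");

  // The caller owns synchronization: wait before reading `last` on the host.
  check(cudaStreamSynchronize(stream), "cudaStreamSynchronize");
  check(cudaStreamDestroy(stream), "cudaStreamDestroy");
  return last;
}

int main()
{
  const int n    = 1 << 20;
  const int last = scan_last_nosync(n);
  std::printf("last prefix sum = %d\n", last);
  return last == n ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

The explicit `cudaStreamSynchronize` is the important part of the pattern: with `par_nosync`, the host must not read results until it has synchronized with the stream itself.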
@copy-pr-bot
Contributor

copy-pr-bot Bot commented Apr 16, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@brycelelbach brycelelbach enabled auto-merge (squash) April 16, 2025 19:30
@brycelelbach
Contributor

/ok to test

@copy-pr-bot
Contributor

copy-pr-bot Bot commented Apr 17, 2025

/ok to test

@brycelelbach, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

@brycelelbach
Contributor

/ok to test 6a96cb4

@wmaxey
Member

wmaxey commented Apr 17, 2025

/ok to test 6a96cb4

@github-actions
Contributor Author

🟩 CI finished in 1h 35m: Pass: 100%/97 | Total: 2d 21h | Avg: 43m 15s | Max: 1h 27m | Hits: 78%/134318
  • 🟩 cub: Pass: 100%/45 | Total: 1d 19h | Avg: 58m 23s | Max: 1h 27m | Hits: 75%/53817

    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total:  1d 17h | Avg: 57m 50s | Max:  1h 27m | Hits:  75%/51371 
      🟩 arm64              Pass: 100%/2   | Total:  2h 20m | Avg:  1h 10m | Max:  1h 13m | Hits:  69%/2446  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  5h 28m | Avg:  1h 05m | Max:  1h 09m | Hits:  70%/5944  
      🟩 12.6               Pass: 100%/2   | Total:  2h 38m | Avg:  1h 19m | Max:  1h 19m | Hits:  69%/2260  
      🟩 12.8               Pass: 100%/38  | Total:  1d 11h | Avg: 56m 19s | Max:  1h 27m | Hits:  76%/45613 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 02m | Hits:  75%/2108  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  5h 28m | Avg:  1h 05m | Max:  1h 09m | Hits:  70%/5944  
      🟩 nvcc12.6           Pass: 100%/2   | Total:  2h 38m | Avg:  1h 19m | Max:  1h 19m | Hits:  69%/2260  
      🟩 nvcc12.8           Pass: 100%/36  | Total:  1d 09h | Avg: 56m 01s | Max:  1h 27m | Hits:  76%/43505 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 02m | Hits:  75%/2108  
      🟩 nvcc               Pass: 100%/43  | Total:  1d 17h | Avg: 58m 14s | Max:  1h 27m | Hits:  75%/51709 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  4h 17m | Avg:  1h 04m | Max:  1h 06m | Hits:  69%/4900  
      🟩 Clang15            Pass: 100%/2   | Total:  2h 03m | Avg:  1h 01m | Max:  1h 03m | Hits:  69%/2446  
      🟩 Clang16            Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 03m | Hits:  69%/2446  
      🟩 Clang17            Pass: 100%/2   | Total:  2h 08m | Avg:  1h 04m | Max:  1h 05m | Hits:  69%/2446  
      🟩 Clang18            Pass: 100%/7   | Total:  6h 02m | Avg: 51m 43s | Max:  1h 06m | Hits:  80%/8223  
      🟩 GCC7               Pass: 100%/2   | Total:  2h 07m | Avg:  1h 03m | Max:  1h 04m | Hits:  69%/2450  
      🟩 GCC8               Pass: 100%/1   | Total:  1h 00m | Avg:  1h 00m | Max:  1h 00m | Hits:  69%/1225  
      🟩 GCC9               Pass: 100%/2   | Total:  2h 12m | Avg:  1h 06m | Max:  1h 06m | Hits:  69%/2450  
      🟩 GCC10              Pass: 100%/2   | Total:  2h 05m | Avg:  1h 02m | Max:  1h 03m | Hits:  69%/2450  
      🟩 GCC11              Pass: 100%/2   | Total:  2h 13m | Avg:  1h 06m | Max:  1h 09m | Hits:  69%/2446  
      🟩 GCC12              Pass: 100%/2   | Total:  2h 14m | Avg:  1h 07m | Max:  1h 08m | Hits:  69%/2446  
      🟩 GCC13              Pass: 100%/11  | Total:  7h 27m | Avg: 40m 39s | Max:  1h 13m | Hits:  85%/13453 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  2h 25m | Avg:  1h 12m | Max:  1h 16m | Hits:  75%/2088  
      🟩 MSVC14.42          Pass: 100%/2   | Total:  2h 44m | Avg:  1h 22m | Max:  1h 27m | Hits:  75%/2088  
      🟩 NVHPC25.1          Pass: 100%/2   | Total:  2h 38m | Avg:  1h 19m | Max:  1h 19m | Hits:  69%/2260  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total: 16h 36m | Avg: 58m 37s | Max:  1h 06m | Hits:  73%/20461 
      🟩 GCC                Pass: 100%/22  | Total: 19h 21m | Avg: 52m 47s | Max:  1h 13m | Hits:  77%/26920 
      🟩 MSVC               Pass: 100%/4   | Total:  5h 10m | Avg:  1h 17m | Max:  1h 27m | Hits:  75%/4176  
      🟩 NVHPC              Pass: 100%/2   | Total:  2h 38m | Avg:  1h 19m | Max:  1h 19m | Hits:  69%/2260  
    🟩 gpu
      🟩 h100               Pass: 100%/3   | Total:  1h 15m | Avg: 25m 16s | Max: 28m 50s | Hits:  89%/3669  
      🟩 rtx2080            Pass: 100%/34  | Total:  1d 14h | Avg:  1h 07m | Max:  1h 27m | Hits:  70%/40364 
      🟩 rtxa6000           Pass: 100%/8   | Total:  4h 29m | Avg: 33m 44s | Max:  1h 06m | Hits:  92%/9784  
    🟩 jobs
      🟩 Build              Pass: 100%/37  | Total:  1d 16h | Avg:  1h 05m | Max:  1h 27m | Hits:  70%/44033 
      🟩 DeviceLaunch       Pass: 100%/1   | Total: 26m 00s | Avg: 26m 00s | Max: 26m 00s | Hits:  99%/1223  
      🟩 GraphCapture       Pass: 100%/1   | Total: 18m 43s | Avg: 18m 43s | Max: 18m 43s | Hits:  99%/1223  
      🟩 HostLaunch         Pass: 100%/3   | Total:  1h 14m | Avg: 24m 59s | Max: 25m 17s | Hits:  99%/3669  
      🟩 TestGPU            Pass: 100%/3   | Total:  1h 11m | Avg: 23m 52s | Max: 26m 44s | Hits:  99%/3669  
    🟩 sm
      🟩 90                 Pass: 100%/3   | Total:  1h 15m | Avg: 25m 16s | Max: 28m 50s | Hits:  89%/3669  
      🟩 90;90a;100         Pass: 100%/1   | Total:  1h 09m | Avg:  1h 09m | Max:  1h 09m | Hits:  69%/1223  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 22h 17m | Avg:  1h 06m | Max:  1h 19m | Hits:  70%/23677 
      🟩 20                 Pass: 100%/25  | Total: 21h 29m | Avg: 51m 35s | Max:  1h 27m | Hits:  79%/30140 
    
  • 🟩 thrust: Pass: 100%/45 | Total: 1d 00h | Avg: 32m 12s | Max: 1h 02m | Hits: 79%/80181

    🟩 cmake_options
      🟩 -DTHRUST_DISPATCH_TYPE=Force32bit Pass: 100%/2   | Total: 42m 36s | Avg: 21m 18s | Max: 30m 57s | Hits:  88%/3566  
    🟩 cpu
      🟩 amd64              Pass: 100%/43  | Total: 23h 06m | Avg: 32m 14s | Max:  1h 02m | Hits:  79%/76616 
      🟩 arm64              Pass: 100%/2   | Total:  1h 02m | Avg: 31m 29s | Max: 33m 12s | Hits:  77%/3565  
    🟩 ctk
      🟩 12.0               Pass: 100%/5   | Total:  3h 01m | Avg: 36m 15s | Max: 53m 19s | Hits:  75%/8906  
      🟩 12.6               Pass: 100%/2   | Total:  1h 53m | Avg: 56m 40s | Max: 57m 42s | Hits:  66%/3564  
      🟩 12.8               Pass: 100%/38  | Total: 19h 14m | Avg: 30m 22s | Max:  1h 02m | Hits:  80%/67711 
    🟩 cudacxx
      🟩 ClangCUDA18        Pass: 100%/2   | Total: 54m 54s | Avg: 27m 27s | Max: 27m 31s | Hits:  77%/3564  
      🟩 nvcc12.0           Pass: 100%/5   | Total:  3h 01m | Avg: 36m 15s | Max: 53m 19s | Hits:  75%/8906  
      🟩 nvcc12.6           Pass: 100%/2   | Total:  1h 53m | Avg: 56m 40s | Max: 57m 42s | Hits:  66%/3564  
      🟩 nvcc12.8           Pass: 100%/36  | Total: 18h 19m | Avg: 30m 32s | Max:  1h 02m | Hits:  81%/64147 
    🟩 cudacxx_family
      🟩 ClangCUDA          Pass: 100%/2   | Total: 54m 54s | Avg: 27m 27s | Max: 27m 31s | Hits:  77%/3564  
      🟩 nvcc               Pass: 100%/43  | Total: 23h 14m | Avg: 32m 25s | Max:  1h 02m | Hits:  79%/76617 
    🟩 cxx
      🟩 Clang14            Pass: 100%/4   | Total:  2h 02m | Avg: 30m 31s | Max: 31m 56s | Hits:  77%/7128  
      🟩 Clang15            Pass: 100%/2   | Total:  1h 04m | Avg: 32m 02s | Max: 32m 11s | Hits:  77%/3564  
      🟩 Clang16            Pass: 100%/2   | Total:  1h 01m | Avg: 30m 30s | Max: 31m 13s | Hits:  77%/3564  
      🟩 Clang17            Pass: 100%/2   | Total:  1h 05m | Avg: 32m 39s | Max: 34m 15s | Hits:  77%/3564  
      🟩 Clang18            Pass: 100%/7   | Total:  2h 45m | Avg: 23m 35s | Max: 31m 52s | Hits:  84%/12474 
      🟩 GCC7               Pass: 100%/2   | Total:  1h 07m | Avg: 33m 48s | Max: 33m 50s | Hits:  77%/3566  
      🟩 GCC8               Pass: 100%/1   | Total: 33m 06s | Avg: 33m 06s | Max: 33m 06s | Hits:  77%/1783  
      🟩 GCC9               Pass: 100%/2   | Total:  1h 06m | Avg: 33m 20s | Max: 34m 24s | Hits:  77%/3566  
      🟩 GCC10              Pass: 100%/2   | Total:  1h 05m | Avg: 32m 49s | Max: 34m 50s | Hits:  77%/3566  
      🟩 GCC11              Pass: 100%/2   | Total:  1h 05m | Avg: 32m 35s | Max: 34m 02s | Hits:  77%/3566  
      🟩 GCC12              Pass: 100%/2   | Total:  1h 09m | Avg: 34m 39s | Max: 35m 47s | Hits:  77%/3566  
      🟩 GCC13              Pass: 100%/10  | Total:  3h 50m | Avg: 23m 01s | Max: 34m 41s | Hits:  86%/17830 
      🟩 MSVC14.29          Pass: 100%/2   | Total:  1h 51m | Avg: 55m 47s | Max: 58m 15s | Hits:  66%/3552  
      🟩 MSVC14.42          Pass: 100%/3   | Total:  2h 28m | Avg: 49m 35s | Max:  1h 02m | Hits:  77%/5328  
      🟩 NVHPC25.1          Pass: 100%/2   | Total:  1h 53m | Avg: 56m 40s | Max: 57m 42s | Hits:  66%/3564  
    🟩 cxx_family
      🟩 Clang              Pass: 100%/17  | Total:  7h 57m | Avg: 28m 05s | Max: 34m 15s | Hits:  80%/30294 
      🟩 GCC                Pass: 100%/21  | Total:  9h 57m | Avg: 28m 27s | Max: 35m 47s | Hits:  81%/37443 
      🟩 MSVC               Pass: 100%/5   | Total:  4h 20m | Avg: 52m 04s | Max:  1h 02m | Hits:  73%/8880  
      🟩 NVHPC              Pass: 100%/2   | Total:  1h 53m | Avg: 56m 40s | Max: 57m 42s | Hits:  66%/3564  
    🟩 gpu
      🟩 h100               Pass: 100%/2   | Total: 31m 21s | Avg: 15m 40s | Max: 19m 22s | Hits:  88%/3566  
      🟩 rtx2080            Pass: 100%/33  | Total: 19h 40m | Avg: 35m 46s | Max: 58m 15s | Hits:  75%/58802 
      🟩 rtx4090            Pass: 100%/10  | Total:  3h 56m | Avg: 23m 41s | Max:  1h 02m | Hits:  89%/17813 
    🟩 jobs
      🟩 Build              Pass: 100%/38  | Total: 22h 37m | Avg: 35m 42s | Max:  1h 02m | Hits:  75%/67709 
      🟩 TestCPU            Pass: 100%/3   | Total: 45m 51s | Avg: 15m 17s | Max: 29m 23s | Hits:  99%/5341  
      🟩 TestGPU            Pass: 100%/4   | Total: 46m 05s | Avg: 11m 31s | Max: 12m 01s | Hits:  99%/7131  
    🟩 sm
      🟩 90                 Pass: 100%/2   | Total: 31m 21s | Avg: 15m 40s | Max: 19m 22s | Hits:  88%/3566  
      🟩 90;90a;100         Pass: 100%/1   | Total: 34m 41s | Avg: 34m 41s | Max: 34m 41s | Hits:  77%/1783  
    🟩 std
      🟩 17                 Pass: 100%/20  | Total: 12h 26m | Avg: 37m 19s | Max: 58m 15s | Hits:  75%/35631 
      🟩 20                 Pass: 100%/23  | Total: 10h 59m | Avg: 28m 41s | Max:  1h 02m | Hits:  82%/40984 
    
  • 🟩 stdpar: Pass: 100%/4 | Total: 32m 24s | Avg: 8m 06s | Max: 9m 27s

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 17m 26s | Avg:  8m 43s | Max:  9m 27s
      🟩 arm64              Pass: 100%/2   | Total: 14m 58s | Avg:  7m 29s | Max:  8m 23s
    🟩 ctk
      🟩 12.6               Pass: 100%/4   | Total: 32m 24s | Avg:  8m 06s | Max:  9m 27s
    🟩 cudacxx
      🟩 nvcc12.6           Pass: 100%/4   | Total: 32m 24s | Avg:  8m 06s | Max:  9m 27s
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/4   | Total: 32m 24s | Avg:  8m 06s | Max:  9m 27s
    🟩 cxx
      🟩 NVHPC25.1          Pass: 100%/4   | Total: 32m 24s | Avg:  8m 06s | Max:  9m 27s
    🟩 cxx_family
      🟩 NVHPC              Pass: 100%/4   | Total: 32m 24s | Avg:  8m 06s | Max:  9m 27s
    🟩 gpu
      🟩 rtx2080            Pass: 100%/4   | Total: 32m 24s | Avg:  8m 06s | Max:  9m 27s
    🟩 jobs
      🟩 Build              Pass: 100%/4   | Total: 32m 24s | Avg:  8m 06s | Max:  9m 27s
    🟩 std
      🟩 17                 Pass: 100%/2   | Total: 14m 34s | Avg:  7m 17s | Max:  7m 59s
      🟩 20                 Pass: 100%/2   | Total: 17m 50s | Avg:  8m 55s | Max:  9m 27s
    
  • 🟩 cccl_c_parallel: Pass: 100%/2 | Total: 20m 24s | Avg: 10m 12s | Max: 15m 25s | Hits: 98%/320

    🟩 cpu
      🟩 amd64              Pass: 100%/2   | Total: 20m 24s | Avg: 10m 12s | Max: 15m 25s | Hits:  98%/320   
    🟩 ctk
      🟩 12.8               Pass: 100%/2   | Total: 20m 24s | Avg: 10m 12s | Max: 15m 25s | Hits:  98%/320   
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/2   | Total: 20m 24s | Avg: 10m 12s | Max: 15m 25s | Hits:  98%/320   
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/2   | Total: 20m 24s | Avg: 10m 12s | Max: 15m 25s | Hits:  98%/320   
    🟩 cxx
      🟩 GCC13              Pass: 100%/2   | Total: 20m 24s | Avg: 10m 12s | Max: 15m 25s | Hits:  98%/320   
    🟩 cxx_family
      🟩 GCC                Pass: 100%/2   | Total: 20m 24s | Avg: 10m 12s | Max: 15m 25s | Hits:  98%/320   
    🟩 gpu
      🟩 rtx2080            Pass: 100%/2   | Total: 20m 24s | Avg: 10m 12s | Max: 15m 25s | Hits:  98%/320   
    🟩 jobs
      🟩 Build              Pass: 100%/1   | Total:  4m 59s | Avg:  4m 59s | Max:  4m 59s | Hits:  98%/160   
      🟩 Test               Pass: 100%/1   | Total: 15m 25s | Avg: 15m 25s | Max: 15m 25s | Hits:  98%/160   
    
  • 🟩 python: Pass: 100%/1 | Total: 1h 06m | Avg: 1h 06m | Max: 1h 06m

    🟩 cpu
      🟩 amd64              Pass: 100%/1   | Total:  1h 06m | Avg:  1h 06m | Max:  1h 06m
    🟩 ctk
      🟩 12.8               Pass: 100%/1   | Total:  1h 06m | Avg:  1h 06m | Max:  1h 06m
    🟩 cudacxx
      🟩 nvcc12.8           Pass: 100%/1   | Total:  1h 06m | Avg:  1h 06m | Max:  1h 06m
    🟩 cudacxx_family
      🟩 nvcc               Pass: 100%/1   | Total:  1h 06m | Avg:  1h 06m | Max:  1h 06m
    🟩 cxx
      🟩 GCC13              Pass: 100%/1   | Total:  1h 06m | Avg:  1h 06m | Max:  1h 06m
    🟩 cxx_family
      🟩 GCC                Pass: 100%/1   | Total:  1h 06m | Avg:  1h 06m | Max:  1h 06m
    🟩 gpu
      🟩 rtx2080            Pass: 100%/1   | Total:  1h 06m | Avg:  1h 06m | Max:  1h 06m
    🟩 jobs
      🟩 Test               Pass: 100%/1   | Total:  1h 06m | Avg:  1h 06m | Max:  1h 06m
    

👃 Inspect Changes

Modifications in project?

Project
CCCL Infrastructure
libcu++
CUB
+/- Thrust
CUDA Experimental
stdpar
python
CCCL C Parallel Library
Catch2Helper

Modifications in project or dependencies?

Project
CCCL Infrastructure
libcu++
+/- CUB
+/- Thrust
CUDA Experimental
+/- stdpar
+/- python
+/- CCCL C Parallel Library
+/- Catch2Helper

🏃‍ Runner counts (total jobs: 97)

# Runner
68 linux-amd64-cpu16
9 windows-amd64-cpu16
6 linux-arm64-cpu16
6 linux-amd64-gpu-rtxa6000-latest-1
3 linux-amd64-gpu-h100-latest-1
3 linux-amd64-gpu-rtx4090-latest-1
2 linux-amd64-gpu-rtx2080-latest-1

@brycelelbach brycelelbach merged commit 410ae14 into branch/3.0.x Apr 18, 2025
111 of 112 checks passed
@github-project-automation github-project-automation Bot moved this from Todo to In Review in CCCL Apr 18, 2025
@github-project-automation github-project-automation Bot moved this from In Review to Done in CCCL Apr 18, 2025
