
Avoid relying on a RNG in the back-end. #686

Merged: maleadt merged 1 commit into release-0.9 from tb/rand on Mar 22, 2026

@maleadt (Member) commented Mar 22, 2026

GPUArrays only provides Random.rand!(::RNG, ::AnyGPUArray), not Random.rand!(::AnyGPUArray), so the examples only work when the back-end provides a native RNG. Avoid that by generating the random data on the CPU and copying it to the device.

Ran into this when trying to move out the cuRAND stuff from CUDA.jl.
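
The workaround described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `host_rand!` is a hypothetical helper name, and a plain `Array` stands in for the device array so the snippet runs without a GPU.

```julia
# Hypothetical helper (not part of any package): fill `dest` with random
# values produced by the CPU's default RNG, so no back-end RNG is needed.
function host_rand!(dest::AbstractArray{T}) where {T}
    host = rand(T, size(dest))  # generated on the host
    copyto!(dest, host)         # for a GPU array this would be a host-to-device copy
    return dest
end

# Stand-in for a device array; with a GPU back-end this would be
# allocated on the device instead.
A = Array{Float32}(undef, 16)
host_rand!(A)
```

The trade-off is an extra host allocation and transfer per array, which is usually acceptable in examples and tests where setup cost is not being measured.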

@github-actions (Contributor) commented

Benchmark Results

| Benchmark | main | 469aa70... | main / 469aa70... |
|---|---|---|---|
| saxpy/default/Float32/1024 | 0.082 ± 0.01 ms | 0.641 ± 0.0092 μs | 128 ± 16 |
| saxpy/default/Float32/1048576 | 0.457 ± 0.021 ms | 0.248 ± 0.015 ms | 1.84 ± 0.14 |
| saxpy/default/Float32/16384 | 0.0759 ± 0.031 ms | 2.78 ± 0.15 μs | 27.4 ± 11 |
| saxpy/default/Float32/2048 | 0.0821 ± 0.026 ms | 0.752 ± 0.066 μs | 109 ± 36 |
| saxpy/default/Float32/256 | 0.082 ± 0.013 ms | 0.573 ± 0.0063 μs | 143 ± 23 |
| saxpy/default/Float32/262144 | 0.174 ± 0.03 ms | 0.0635 ± 0.0041 ms | 2.74 ± 0.5 |
| saxpy/default/Float32/32768 | 0.0755 ± 0.031 ms | 5.31 ± 0.28 μs | 14.2 ± 5.9 |
| saxpy/default/Float32/4096 | 0.0782 ± 0.033 ms | 1.14 ± 0.096 μs | 68.4 ± 29 |
| saxpy/default/Float32/512 | 0.0816 ± 0.012 ms | 0.61 ± 0.0057 μs | 134 ± 20 |
| saxpy/default/Float32/64 | 0.0821 ± 0.009 ms | 0.563 ± 0.0058 μs | 146 ± 16 |
| saxpy/default/Float32/65536 | 0.0922 ± 0.031 ms | 13.3 ± 1.2 μs | 6.94 ± 2.4 |
| saxpy/default/Float64/1024 | 0.0815 ± 0.022 ms | 0.774 ± 0.064 μs | 105 ± 30 |
| saxpy/default/Float64/1048576 | 0.577 ± 0.086 ms | 0.512 ± 0.044 ms | 1.13 ± 0.19 |
| saxpy/default/Float64/16384 | 0.0818 ± 0.031 ms | 5.31 ± 0.29 μs | 15.4 ± 5.9 |
| saxpy/default/Float64/2048 | 0.0827 ± 0.027 ms | 1.15 ± 0.092 μs | 71.6 ± 24 |
| saxpy/default/Float64/256 | 0.0813 ± 0.021 ms | 0.59 ± 0.0072 μs | 138 ± 35 |
| saxpy/default/Float64/262144 | 0.199 ± 0.033 ms | 0.123 ± 0.0086 ms | 1.62 ± 0.29 |
| saxpy/default/Float64/32768 | 0.0905 ± 0.03 ms | 13.2 ± 1.3 μs | 6.88 ± 2.4 |
| saxpy/default/Float64/4096 | 0.0802 ± 0.031 ms | 1.71 ± 0.11 μs | 47 ± 18 |
| saxpy/default/Float64/512 | 0.0813 ± 0.012 ms | 0.646 ± 0.0097 μs | 126 ± 18 |
| saxpy/default/Float64/64 | 0.082 ± 0.013 ms | 0.569 ± 0.0058 μs | 144 ± 23 |
| saxpy/default/Float64/65536 | 0.107 ± 0.031 ms | 31.5 ± 2.9 μs | 3.39 ± 1 |
| saxpy/static workgroup=(1024,)/Float32/1024 | 0.0796 ± 0.0085 ms | 2.1 ± 0.029 μs | 37.9 ± 4.1 |
| saxpy/static workgroup=(1024,)/Float32/1048576 | 0.46 ± 0.021 ms | 0.251 ± 0.018 ms | 1.84 ± 0.16 |
| saxpy/static workgroup=(1024,)/Float32/16384 | 0.0764 ± 0.031 ms | 4.22 ± 0.2 μs | 18.1 ± 7.3 |
| saxpy/static workgroup=(1024,)/Float32/2048 | 0.0799 ± 0.024 ms | 2.25 ± 0.072 μs | 35.4 ± 11 |
| saxpy/static workgroup=(1024,)/Float32/256 | 0.0803 ± 0.016 ms | 2.69 ± 0.019 μs | 29.8 ± 5.9 |
| saxpy/static workgroup=(1024,)/Float32/262144 | 0.172 ± 0.031 ms | 0.0658 ± 0.0044 ms | 2.61 ± 0.5 |
| saxpy/static workgroup=(1024,)/Float32/32768 | 0.0736 ± 0.031 ms | 7.23 ± 0.35 μs | 10.2 ± 4.3 |
| saxpy/static workgroup=(1024,)/Float32/4096 | 0.0788 ± 0.031 ms | 2.56 ± 0.086 μs | 30.8 ± 12 |
| saxpy/static workgroup=(1024,)/Float32/512 | 0.0797 ± 0.01 ms | 2.8 ± 0.028 μs | 28.5 ± 3.7 |
| saxpy/static workgroup=(1024,)/Float32/64 | 0.0801 ± 0.011 ms | 2.63 ± 0.019 μs | 30.5 ± 4.1 |
| saxpy/static workgroup=(1024,)/Float32/65536 | 0.095 ± 0.032 ms | 15.9 ± 1.3 μs | 5.96 ± 2.1 |
| saxpy/static workgroup=(1024,)/Float64/1024 | 0.0786 ± 0.022 ms | 2.39 ± 0.076 μs | 32.9 ± 9.4 |
| saxpy/static workgroup=(1024,)/Float64/1048576 | 0.534 ± 0.079 ms | 0.526 ± 0.065 ms | 1.01 ± 0.2 |
| saxpy/static workgroup=(1024,)/Float64/16384 | 0.0785 ± 0.03 ms | 7.55 ± 0.48 μs | 10.4 ± 4.1 |
| saxpy/static workgroup=(1024,)/Float64/2048 | 0.0798 ± 0.026 ms | 2.7 ± 0.09 μs | 29.6 ± 9.8 |
| saxpy/static workgroup=(1024,)/Float64/256 | 0.0797 ± 0.016 ms | 2.84 ± 0.026 μs | 28.1 ± 5.6 |
| saxpy/static workgroup=(1024,)/Float64/262144 | 0.2 ± 0.032 ms | 0.127 ± 0.0086 ms | 1.58 ± 0.27 |
| saxpy/static workgroup=(1024,)/Float64/32768 | 0.0823 ± 0.03 ms | 16.8 ± 1.7 μs | 4.91 ± 1.9 |
| saxpy/static workgroup=(1024,)/Float64/4096 | 0.0786 ± 0.031 ms | 3.22 ± 0.12 μs | 24.4 ± 9.7 |
| saxpy/static workgroup=(1024,)/Float64/512 | 0.0791 ± 0.014 ms | 2.95 ± 0.031 μs | 26.8 ± 4.9 |
| saxpy/static workgroup=(1024,)/Float64/64 | 0.0796 ± 0.012 ms | 2.76 ± 0.021 μs | 28.8 ± 4.3 |
| saxpy/static workgroup=(1024,)/Float64/65536 | 0.104 ± 0.032 ms | 0.0349 ± 0.0028 ms | 2.97 ± 0.95 |
| time_to_load | 0.97 ± 0.0043 s | 0.284 ± 0.0011 s | 3.42 ± 0.02 |

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

codecov Bot commented Mar 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.85%. Comparing base (4224ad9) to head (469aa70).
⚠️ Report is 2 commits behind head on release-0.9.

Additional details and impacted files
@@             Coverage Diff              @@
##           release-0.9     #686   +/-   ##
============================================
  Coverage        71.85%   71.85%           
============================================
  Files               14       14           
  Lines              906      906           
============================================
  Hits               651      651           
  Misses             255      255           


@maleadt maleadt merged commit 3ae1162 into release-0.9 Mar 22, 2026
46 of 47 checks passed
@maleadt maleadt deleted the tb/rand branch March 22, 2026 14:48