Use `randn!` for stochastic forcing implementations #351

navidcy · 2024-03-10T11:46:33Z

This forcing implementation ensures non-allocating `calcF!` methods on both CPU and GPU.

Closes #350

A few benchmarks:

# CPU with Fh .= sqrt.(forcing_spectrum) .* cis.(2π * random_uniform(T, size(forcing_spectrum)...)) ./ sqrt(clock.dt)
julia> @btime calcF!($vars.Fh, $sol, 0.0, $clock, $vars, $params, $grid)
  239.890 μs (10 allocations: 130.23 KiB)

# GPU with Fh .= sqrt.(forcing_spectrum) .* cis.(2π * random_uniform(T, size(forcing_spectrum)...)) ./ sqrt(clock.dt)
julia> @btime CUDA.@sync calcF!($vars.Fh, $sol, 0.0, $clock, $vars, $params, $grid)
  26.709 μs (49 allocations: 2.59 KiB)

# randn! on CPU
julia> @btime calcF!($vars.Fh, $sol, 0.0, $clock, $vars, $params, $grid)
  122.298 μs (4 allocations: 96 bytes)

# randn! on GPU
julia> @btime CUDA.@sync calcF!($vars.Fh, $sol, 0.0, $clock, $vars, $params, $grid)
  19.923 μs (32 allocations: 2.02 KiB)

Thus, this PR is roughly 1.3x (GPU) to 2x (CPU) faster than the solution originally proposed in #350, with fewer allocations.
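
For readers who want to see the difference between the two variants benchmarked above, here is a minimal CPU-only sketch. The function names `forcing_allocating!` and `forcing_inplace!` and the exact form of the in-place version are assumptions for illustration, not the code in this PR; `rand` stands in for the `random_uniform` helper. Note that the two variants draw from different distributions (unit-modulus random phases versus complex Gaussian noise), although both have mean zero and the same expected squared modulus, `forcing_spectrum / dt`.

```julia
using Random

# Allocating variant: `rand` creates a fresh array of phases on every call.
function forcing_allocating!(Fh, forcing_spectrum, dt)
    T = eltype(forcing_spectrum)
    Fh .= sqrt.(forcing_spectrum) .* cis.(2π * rand(T, size(forcing_spectrum)...)) ./ sqrt(dt)
    return nothing
end

# In-place variant (assumed form): fill Fh with complex Gaussian noise, then scale it.
function forcing_inplace!(Fh, forcing_spectrum, dt)
    randn!(Fh)                                   # overwrites Fh without a temporary array
    @. Fh *= sqrt(forcing_spectrum) / sqrt(dt)   # fused broadcast, also in place
    return nothing
end

# Hypothetical test data; sizes and dt are arbitrary.
forcing_spectrum = rand(64, 33)
Fh = zeros(ComplexF64, 64, 33)
dt = 0.01

# Warm up once so compilation is not counted as allocation.
forcing_allocating!(Fh, forcing_spectrum, dt)
forcing_inplace!(Fh, forcing_spectrum, dt)

alloc_old = @allocated forcing_allocating!(Fh, forcing_spectrum, dt)
alloc_new = @allocated forcing_inplace!(Fh, forcing_spectrum, dt)
println("allocating variant: ", alloc_old, " bytes; in-place variant: ", alloc_new, " bytes")
```

On a GPU the same pattern would apply to a `CuArray`, subject to the allocation behavior of `randn!` discussed in the comments on this PR.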

glwagner

Is it possible to perform long-running simulations on the GPU when there are allocations? Can GPU garbage collection keep up?

navidcy · 2024-03-10T18:27:03Z

> Is it possible to perform long-running simulations on the GPU when there are allocations? Can GPU garbage collection keep up?

I'm not sure about that.
But I'm also a bit confused about where the allocations are coming from.

glwagner · 2024-03-10T18:30:21Z

> random_uniform(T, size(forcing_spectrum)...))

Calling CUDA.randn(sz...) allocates an array, and then populates it with random numbers.

navidcy · 2024-03-10T18:34:47Z

> > random_uniform(T, size(forcing_spectrum)...))
>
> Calling CUDA.randn(sz...) allocates an array, and then populates it with random numbers.

Oh yeah... that was the "allocating" version I suggested in the issue. The PR doesn't have that version, I just put it here for comparison.

But even with `randn!` there are still some allocations... I don't understand where those come from.

julia> @btime CUDA.@sync calcF!($vars.Fh, $sol, 0.0, $clock, $vars, $params, $grid)
  19.923 μs (32 allocations: 2.02 KiB)

glwagner · 2024-03-11T16:15:37Z

> > > random_uniform(T, size(forcing_spectrum)...))
> >
> > Calling CUDA.randn(sz...) allocates an array, and then populates it with random numbers.
>
> Oh yeah... that was the "allocating" version I suggested in the issue. The PR doesn't have that version, I just put it here for comparison.
>
> But even with `randn!` there are still some allocations... I don't understand where those come from.
>
> julia> @btime CUDA.@sync calcF!($vars.Fh, $sol, 0.0, $clock, $vars, $params, $grid)
>   19.923 μs (32 allocations: 2.02 KiB)

Did you look into the code for randn!? You'd probably find your answer quickly.

navidcy · 2024-03-11T18:04:25Z

I actually didn't :(

navidcy · 2024-03-11T18:15:17Z

omg, I figured it out!

navidcy · 2024-03-11T18:18:55Z

`randn!` calls `inplace_pow2`, which, if given an array whose length is not a power of 2, creates a new array whose length is the next power of 2; thus, it allocates!!

https://github.com/JuliaGPU/CUDA.jl/blob/bb49887198f258ffcb186d81df4a787453428b38/lib/curand/random.jl#L83-L111

If the array's length is a power of 2, then there are no allocations:

julia> using CUDA, Random, BenchmarkTools

julia> A = CUDA.zeros(1024, 1024);

julia> @btime Random.randn!($A);
  2.417 μs (0 allocations: 0 bytes)

julia> A = CUDA.zeros(1024, 1025);

julia> @btime Random.randn!($A);
  14.119 μs (10 allocations: 352 bytes)
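
To make the power-of-2 behavior concrete, here is a small CPU-only sketch of the padding logic described above. The function `randn_pow2!` is hypothetical and written only for illustration; it is not CUDA.jl's `inplace_pow2`, which may differ in its details. It relies only on `ispow2` and `nextpow` from Base and `randn!` from Random.

```julia
using Random

# Hypothetical CPU illustration of the pad-to-power-of-2 idea; not CUDA.jl's code.
function randn_pow2!(A::AbstractArray)
    n = length(A)
    if ispow2(n)
        randn!(A)                          # fill directly, no temporary needed
    else
        tmp = similar(A, nextpow(2, n))    # temporary buffer: this is the allocation
        randn!(tmp)
        copyto!(A, 1, tmp, 1, n)           # keep only the first n samples
    end
    return A
end

# The two array shapes benchmarked above.
for dims in ((1024, 1024), (1024, 1025))
    n = prod(dims)
    println(dims, ": length = ", n, ", power of 2? ", ispow2(n),
            ", padded length = ", nextpow(2, n))
end
```

A 1024×1024 array has 2^20 = 1,048,576 elements, so no padding is needed. A 1024×1025 array has 1,049,600 elements, so a temporary of 2^21 = 2,097,152 elements would be required in this sketch.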

navidcy · 2024-03-11T18:22:22Z

> Did you look into the code for randn!? You'd probably find your answer quickly.

You were right. In my head this was like an impossible task but it actually took me less than 10 minutes.

glwagner · 2024-03-11T19:04:33Z

Nice work 🕵️‍♂️

use randn! for stochastic forcing

e11e3bb

navidcy added the 🐞 bug and 🎮 gpu labels Mar 10, 2024

better line break

f4ef394

navidcy requested review from glwagner and jbisits March 10, 2024 12:33

better phrasing

feb78dc

glwagner approved these changes Mar 10, 2024

no need to use CUDA.randn

cb37bc5

navidcy added 2 commits March 10, 2024 20:43

add missing end

4a27891

remove GeophysicalFlows

9ac215a

navidcy merged commit 7693b7b into main Mar 11, 2024
5 checks passed

navidcy deleted the ncc/fix-stoch-forcing branch March 11, 2024 05:45
