Oceananigans is much slower when using relaxation forcing functions #1827
Comments
I think the problem is not forcing functions in general, but There's always a trivial forcing function present:
which is the trivial function
So any slowdown is due to the specific forcing function that's being used. Let's look into why If the slowdown is different on CPU than GPU that might be an important clue. |
Could be good to put together a benchmarking script for |
There is the dreaded exponentiation by Oceananigans.jl/src/Forcings/relaxation.jl Lines 126 to 128 in 4e9e5b7
though this shouldn't affect CPU.
|
I haven't had a chance to test this properly on GPUs, although I believe (from experience in a less controlled scenario) a similar slowdown occurs.
I also can't prove/test it right now, but I also think this issue has been there since before 1.6. Basically since I started using Oceananigans. Because I've always had simulations with |
Hmm ok. Not sure what that would mean, but I guess that is some kind of clue. |
I think @ali-ramadhan put together a benchmark script for forcing functions a while ago, but it might have disappeared (because it wasn't informative?). That might've been before we had |
I've tried using the forcing as:
@inline sponge_func(x, y, z, ϕ) = -rate * bot_mask(x, y, z) * (ϕ - 0)
sponge_u(x, y, z, t, u) = sponge_func(x, y, z, u)
sponge_v(x, y, z, t, v) = sponge_func(x, y, z, v)
sponge_w(x, y, z, t, w) = sponge_func(x, y, z, w)
forc_u = Forcing(sponge_u, field_dependencies=:u,)
forc_v = Forcing(sponge_v, field_dependencies=:v,)
forc_w = Forcing(sponge_w, field_dependencies=:w,)
forcing = (u=forc_u, v=forc_v, w=forc_w)
and the same performance issues appear. But I guess this still uses I've never used |
How about using just one forcing function for simplicity? I think something like this might work:
@inline u_mask(i, j, k, grid, p) = exp(-(xnode(Face(), Center(), Center(), i, j, k, grid) - p.center)^2 / (2 * p.width^2))
@inline u_forcing_func(i, j, k, grid, clock, model_fields, p) = @inbounds - p.rate * u_mask(i, j, k, grid, p) * model_fields.u[i, j, k]
u_forcing = Forcing(u_forcing_func, discrete_form=true, parameters=(rate=1/10, center=-grid.Lz, width=grid.Lz/10))
There's another example in the docs: |
I just tested
Z(k) = @inbounds -grid.Lz + grid.Δz*(k-1/2)
bottom_mask(k) = @inbounds exp(-(Z(k)+80)^2 / ((2*8)^2))
sponge_u_disc(i, j, k, grid, clock, model_fields) = @inbounds - rate * bottom_mask(k) * (model_fields.u[i, j, k] -0)
sponge_v_disc(i, j, k, grid, clock, model_fields) = @inbounds - rate * bottom_mask(k) * (model_fields.v[i, j, k] -0)
sponge_w_disc(i, j, k, grid, clock, model_fields) = @inbounds - rate * bottom_mask(k) * (model_fields.w[i, j, k] -0)
forc_u = Forcing(sponge_u_disc, discrete_form=true)
forc_v = Forcing(sponge_v_disc, discrete_form=true)
forc_w = Forcing(sponge_w_disc, discrete_form=true)
forcing = (u=forc_u, v=forc_v, w=forc_w)
I may have made rookie errors here as well since this is my first time using |
I think you may need |
If the slow down is the same for |
Looks like you're referencing |
Never mind, I was doing something very dumb. Inlining gives me the exact same performance as not inlining (0.20% of the simulation). I guess the compiler is getting smarter about inlining.
I'll try that. Although I have tried non-exponential masks in the past with a similar slowdown, so I'm not sure if that's the issue. |
You can use both |
For sure, it should inline tiny functions... we just add it because "sometimes" the compiler doesn't do the right thing. So it's just conservative to add |
I guess the key here is something that doesn't have a transcendental function. I'd be surprised if it's the issue, but it's possible, so it's worth testing. |
Using
@inline bottom_mask(k) = 1
sponge_u_disc(i, j, k, grid, clock, model_fields) = @inbounds - rate * bottom_mask(k) * (model_fields.u[i, j, k] -0)
sponge_v_disc(i, j, k, grid, clock, model_fields) = @inbounds - rate * bottom_mask(k) * (model_fields.v[i, j, k] -0)
sponge_w_disc(i, j, k, grid, clock, model_fields) = @inbounds - rate * bottom_mask(k) * (model_fields.w[i, j, k] -0)
forc_u = Forcing(sponge_u_disc, discrete_form=true)
forc_v = Forcing(sponge_v_disc, discrete_form=true)
forc_w = Forcing(sponge_w_disc, discrete_form=true)
forcing = (u=forc_u, v=forc_v, w=forc_w) |
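One thing worth checking in snippets like the ones above: `rate` is read from global scope inside the forcing functions. The Julia performance tips note that an untyped, non-`const` global defeats type inference in any function that reads it. Below is a minimal plain-Julia sketch of that effect, with no Oceananigans involved. The names `rate_global`, `rate_const`, `params`, and the `forcing_*` functions are hypothetical and only mimic the shape of the sponge functions. It uses `Base.return_types` to compare what the compiler infers for a non-constant global, a `const` global, and a value passed in through a parameters tuple.

```julia
# Hypothetical names throughout: this mimics the shape of the sponge
# functions in this thread, it is not Oceananigans code.
rate_global = 1 / 10          # untyped, non-const global: its type may change at any time
const rate_const = 1 / 10     # const global: the compiler knows it is a Float64

forcing_global(u) = -rate_global * u     # reads the non-const global
forcing_const(u)  = -rate_const * u      # reads the const global
forcing_param(u, p) = -p.rate * u        # value passed explicitly via a NamedTuple

params = (rate = 1 / 10,)

# Return types the compiler infers for a Float64 argument
inferred_global = only(Base.return_types(forcing_global, (Float64,)))
inferred_const  = only(Base.return_types(forcing_const, (Float64,)))
inferred_param  = only(Base.return_types(forcing_param, (Float64, typeof(params))))

println("non-const global: ", inferred_global)
println("const global:     ", inferred_const)
println("parameter tuple:  ", inferred_param)
```

In Oceananigans terms, the third form corresponds to the `parameters=` keyword of `Forcing` used in the single-function suggestion earlier in this thread. Whether the inference difference explains a given slowdown still has to be measured.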
Can you put |
Again unsure if it affects performance but since |
9 times slower might be 3x per kernel function (maybe...) |
Done. Same result. I also tried
Yes! That makes a big difference! I feel silly that I forgot that. With Using the same "trick" with I should say though, I'm having some trouble securing a GPU right now, so I haven't been able to run these tests on a GPU. Would a MWE help here? |
Just got a hold of a GPU. I tried this and only saw a slowdown of 10% or so with So this seems to be a CPU issue. |
Slow down of 10% when introducing
Okay, that makes sense.
So the problem is that
I think what would help the most is a simple benchmarking script that compares identical forcing function implementations with |
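A rough idea of what such a script could look like, as a plain-Julia stand-in rather than an actual Oceananigans benchmark: the sketch below times a loop that applies interchangeable forcing functions at every point of a 3D array, after a warm-up call so compilation time is excluded. Every name in it (`mask_exp`, `mask_one`, `forcing_exp`, `forcing_one`, `apply_forcing!`, `time_forcing`) and the array size are assumptions made for illustration. A real benchmark would build models with the different `Forcing` variants and time `time_step!` instead.

```julia
# Plain-Julia stand-in for a forcing benchmark; all names are hypothetical.

# Two versions of the same sponge-like forcing, differing only in the mask
@inline mask_exp(k, p) = exp(-(k - p.center)^2 / (2 * p.width^2))
@inline mask_one(k, p) = 1.0
@inline forcing_exp(i, j, k, u, p) = @inbounds -p.rate * mask_exp(k, p) * u[i, j, k]
@inline forcing_one(i, j, k, u, p) = @inbounds -p.rate * mask_one(k, p) * u[i, j, k]

# Apply a forcing at every grid point, mimicking a tendency kernel
function apply_forcing!(G, forcing, u, p)
    for k in axes(u, 3), j in axes(u, 2), i in axes(u, 1)
        @inbounds G[i, j, k] = forcing(i, j, k, u, p)
    end
    return G
end

# Time `nsteps` applications after one warm-up call
function time_forcing(forcing, u, p; nsteps = 20)
    G = similar(u)
    apply_forcing!(G, forcing, u, p)   # warm-up, triggers compilation
    elapsed = @elapsed begin
        for _ in 1:nsteps
            apply_forcing!(G, forcing, u, p)
        end
    end
    return elapsed
end

u = rand(64, 64, 64)
p = (rate = 1 / 10, center = 32.0, width = 6.4)

t_exp = time_forcing(forcing_exp, u, p)
t_one = time_forcing(forcing_one, u, p)
println("exponential mask: ", t_exp, " s")
println("constant mask:    ", t_one, " s")
```

Keeping the loop and the timing code identical and swapping only the forcing function is what makes the two timings comparable. The same harness could be extended with variants that read `rate` from a global instead of from `p`.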
I'm closing this issue because I'm judging that it's not of current, timely relevance to Oceananigans development. If you would like to make it a higher priority or if you think the issue was closed in error please feel free to re-open. |
I've noticed that Oceananigans is much slower when using forcing functions. As an example, I set up a simulation without any forcing functions and noticed that in the first minute (wall time) of the run I complete 3.5% of the whole simulation period.
However, if I include forcing functions as
then in the first (wall time) minute of running I complete only 0.15% of the simulation. Basically around 20 times slower!
I of course expected a slowdown after including forcing functions, but not by this much. Is this normal behavior?
So far I ran my tests only on CPUs, but I've observed similar behaviors on GPUs.