
Oceananigans is much slower when using relaxation forcing functions #1827

Closed
tomchor opened this issue Jul 6, 2021 · 24 comments
Labels
performance 🏍️ So we can get the wrong answer even faster

Comments

@tomchor
Collaborator

tomchor commented Jul 6, 2021

I've noticed that Oceananigans is much slower when using forcing functions. As an example, I set up a simulation without any forcing functions and found that in the first minute of wall time it completes 3.5% of the whole simulation period.

However, if I include forcing functions as

bot_mask = GaussianMask{:z}(center=minimum(znodes(Face, grid)), width=grid.Lz/10)
mom_sponge = Relaxation(rate=1/10, mask=bot_mask, target=0)
forcing = (u=mom_sponge, v=mom_sponge, w=mom_sponge)

then in the first minute of wall time it completes only 0.15% of the simulation: basically around 20 times slower!

I of course expected a slowdown after including forcing functions, but not by this much. Is this normal behavior?

So far I've run my tests only on CPUs, but I've observed similar behavior on GPUs.
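
For reference, a minimal sketch of how a relaxation sponge like the one above is attached to a model. The grid dimensions are illustrative and the constructor names follow the Oceananigans API from around the time of this thread; this is not the actual setup used in this issue.

using Oceananigans

# Illustrative grid; the real simulation behind the numbers above is not shown.
grid = RegularRectilinearGrid(size=(32, 32, 32), extent=(1000, 1000, 100))

bot_mask = GaussianMask{:z}(center=minimum(znodes(Face, grid)), width=grid.Lz/10)
mom_sponge = Relaxation(rate=1/10, mask=bot_mask, target=0)
forcing = (u=mom_sponge, v=mom_sponge, w=mom_sponge)

# Forcings are passed to the model through the `forcing` keyword argument.
model = IncompressibleModel(grid=grid, forcing=forcing)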

@glwagner
Member

glwagner commented Jul 7, 2021

I think the problem is not forcing functions in general, but Relaxation.

There's always a trivial forcing function present:

regularize_forcing(::Nothing, field::AbstractField, field_name, model_field_names) = zeroforcing

which is the trivial function

@inline zeroforcing(args...) = 0

So any slowdown is due to the specific forcing function that's being used.

Let's look into why Relaxation or GaussianMask is so slow. I don't think it should be. Do you mind changing the title of this issue?

If the slowdown is different on CPU than GPU that might be an important clue.

@glwagner
Member

glwagner commented Jul 7, 2021

Could be good to put together a benchmarking script for ContinuousForcing and Relaxation so that we can test ideas.

@glwagner
Member

glwagner commented Jul 7, 2021

There is the dreaded exponentiation by Int64:

@inline (g::GaussianMask{:x})(x, y, z) = exp(-(x - g.center)^2 / (2 * g.width^2))
@inline (g::GaussianMask{:y})(x, y, z) = exp(-(y - g.center)^2 / (2 * g.width^2))
@inline (g::GaussianMask{:z})(x, y, z) = exp(-(z - g.center)^2 / (2 * g.width^2))

though this shouldn't affect CPU.
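
For reference, the usual way to sidestep an integer power inside a kernel is to write the square out explicitly. A sketch (the function name is hypothetical), equivalent to the :z mask above:

# GaussianMask{:z}-style evaluation with the squares written out, so no ^2 appears.
@inline function gaussian_mask_z(z, center, width)
    δ = z - center
    return exp(-δ * δ / (2 * width * width))
end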

Relaxation uses ContinuousForcing so we should probably look into whether the problem comes from that code. It does seem possible there were changes after upgrading to 1.6.

@tomchor tomchor changed the title Oeananigans is much slower when using forcing functions Oeananigans is much slower when using relaxation forcing functions Jul 7, 2021
@tomchor
Collaborator Author

tomchor commented Jul 7, 2021

If the slowdown is different on CPU than GPU that might be an important clue.

I haven't had a chance to test this properly on GPUs, although I believe (from experience in a less controlled scenario) a similar slowdown occurs.

Relaxation uses ContinuousForcing so we should probably look into whether the problem comes from that code. It does seem possible there were changes after upgrading to 1.6.

I also can't prove/test it right now, but I think this issue has been there since before 1.6, basically since I started using Oceananigans: I've always had simulations with Relaxation, and the time they take to run has always been pretty much the same (apart from that WENO5 issue we found a couple of weeks ago). Only now did I realize that the simulations run much faster without these forcings.

@glwagner
Member

glwagner commented Jul 7, 2021

I haven't had a chance to test this properly on GPUs, although I believe (from experience in a less controlled scenario) a similar slowdown occurs.

Hmm ok. Not sure what that would mean, but I guess that is some kind of clue.

@glwagner
Member

glwagner commented Jul 7, 2021

I think it's probable that DiscreteForcing doesn't have the same performance issues.

@ali-ramadhan put together a benchmark script for forcing functions a while ago, I thought, but it might have disappeared (because it wasn't informative?). That might've been before we had ContinuousForcing, though.

@tomchor
Collaborator Author

tomchor commented Jul 7, 2021

I've tried using the forcing as:

@inline sponge_func(x, y, z, ϕ) = -rate * bot_mask(x, y, z) * (ϕ - 0)
sponge_u(x, y, z, t, u) = sponge_func(x, y, z, u)
sponge_v(x, y, z, t, v) = sponge_func(x, y, z, v)
sponge_w(x, y, z, t, w) = sponge_func(x, y, z, w)

forc_u = Forcing(sponge_u, field_dependencies=:u,)
forc_v = Forcing(sponge_v, field_dependencies=:v,)
forc_w = Forcing(sponge_w, field_dependencies=:w,)
forcing = (u=forc_u, v=forc_v, w=forc_w)

and the same performance issues appear. But I guess this still uses ContinuousForcing.

I've never used DiscreteForcing but I'll try to use it and see what happens. If you have any examples that would help.

@glwagner
Member

glwagner commented Jul 7, 2021

How about using just one forcing function for simplicity?

I think something like this might work:

@inline u_mask(i, j, k, grid, p) = exp(-(xnode(Face(), Center(), Center(), i, j, k, grid) - p.center)^2 / (2 * p.width^2))

@inline u_forcing_func(i, j, k, grid, clock, model_fields, p) = @inbounds - p.rate * u_mask(i, j, k, grid, p) * model_fields.u[i, j, k]

u_forcing = Forcing(u_forcing_func, discrete_form=true, parameters=(rate=1/10, center=-grid.Lz, width=grid.Lz/10))

There's another example in the docs:

https://clima.github.io/OceananigansDocumentation/stable/model_setup/forcing_functions/#%22Discrete-form%22-forcing-functions

@tomchor
Collaborator Author

tomchor commented Jul 7, 2021

I just tested DiscreteForcing and I have almost the same slowdown (I completed 0.20% of the simulation, compared to 0.15% using ContinuousForcing). Here's what I used:

Z(k) = @inbounds -grid.Lz + grid.Δz*(k-1/2)
bottom_mask(k) = @inbounds exp(-(Z(k)+80)^2 / ((2*8)^2))

sponge_u_disc(i, j, k, grid, clock, model_fields) = @inbounds - rate * bottom_mask(k) * (model_fields.u[i, j, k] -0)
sponge_v_disc(i, j, k, grid, clock, model_fields) = @inbounds - rate * bottom_mask(k) * (model_fields.v[i, j, k] -0)
sponge_w_disc(i, j, k, grid, clock, model_fields) = @inbounds - rate * bottom_mask(k) * (model_fields.w[i, j, k] -0)

forc_u = Forcing(sponge_u_disc, discrete_form=true)
forc_v = Forcing(sponge_v_disc, discrete_form=true)
forc_w = Forcing(sponge_w_disc, discrete_form=true)

forcing = (u=forc_u, v=forc_v, w=forc_w)

I may have made rookie errors here as well since this is my first time using DiscreteForcing.

@glwagner
Member

glwagner commented Jul 7, 2021

I think you may need @inline in front of most of those functions (only matters for CPU)

@glwagner
Member

glwagner commented Jul 7, 2021

If the slowdown is the same for DiscreteForcing then the problem may really just be evaluating exp, sadly... You could try @inline bottom_mask(k) = 1 to test...

@glwagner
Member

glwagner commented Jul 7, 2021

Looks like you're referencing grid as a global in Z(k). Not sure if that causes performance issues, but it won't compile on the GPU. So you might want to propagate that argument through.

@tomchor
Collaborator Author

tomchor commented Jul 7, 2021

I think you may need @inline in front of most of those functions (only matters for CPU)

That was my first attempt, but I get an error saying ERROR: LoadError: LoadError: -(grid.Lz) + grid.Δz * (k - 1 / 2) is not a function expression. I guess I need to choose between @inline and @inbounds?

Never mind, I was doing something very dumb. Inlining gives me exactly the same performance as not inlining (0.20% of the simulation). I guess the compiler is getting smarter about inlining.

If the slowdown is the same for DiscreteForcing then the problem may really just be evaluating exp, sadly... You could try @inline bottom_mask(k) = 1 to test...

I'll try that. Although I have tried non-exponential masks in the past with a similar slowdown, so I'm not sure if that's the issue.

@glwagner
Member

glwagner commented Jul 7, 2021

You can use both @inline and @inbounds; they mean different things. @inline is a compiler directive to inline a function (important for performance when a function is called within inner loops). We put @inline in front of functions; eg @inline f(x) = .... @inbounds elides bounds checking when an array / field is indexed into. We need @inbounds in front of any indexing operation that occurs in a loop (eg u[i, j, k]).
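
A minimal generic illustration of the two macros (plain Julia, not Oceananigans code; the names are made up):

# @inline asks the compiler to paste the function body into the call site.
@inline square(x) = x * x

# @inbounds skips bounds checking on the indexing it wraps; only use it where
# the indices are guaranteed to be valid.
function scale!(u, factor)
    for k in eachindex(u)
        @inbounds u[k] = factor * square(u[k])
    end
    return u
end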

@glwagner
Member

glwagner commented Jul 7, 2021

I guess the compiler is getting smarter about inlining.

For sure, it should inline tiny functions... we just add it because "sometimes" the compiler doesn't do the right thing. So it's conservative to add @inline, and for something like this we should always have it so that we don't have to wonder whether it's making a difference.

@glwagner
Member

glwagner commented Jul 7, 2021

Although I have tried non-exponential masks in the past with a similar slowdown, so I'm not sure if that's the issue.

I guess the key here is something that doesn't have a transcendental function. I'd be surprised if it's the issue, but it's possible, so it's worth testing.

@tomchor
Collaborator Author

tomchor commented Jul 7, 2021

Using @inline bottom_mask(k) = 1 improves things (twice as fast as with the exponential mask), but it's still much slower than the non-forced version: about 9 times slower, to be more precise. I'm pasting the relevant part of the code below:

@inline bottom_mask(k) = 1

sponge_u_disc(i, j, k, grid, clock, model_fields) = @inbounds - rate * bottom_mask(k) * (model_fields.u[i, j, k] -0)
sponge_v_disc(i, j, k, grid, clock, model_fields) = @inbounds - rate * bottom_mask(k) * (model_fields.v[i, j, k] -0)
sponge_w_disc(i, j, k, grid, clock, model_fields) = @inbounds - rate * bottom_mask(k) * (model_fields.w[i, j, k] -0)

forc_u = Forcing(sponge_u_disc, discrete_form=true)
forc_v = Forcing(sponge_v_disc, discrete_form=true)
forc_w = Forcing(sponge_w_disc, discrete_form=true)

forcing = (u=forc_u, v=forc_v, w=forc_w)

@glwagner
Member

glwagner commented Jul 7, 2021

Can you put @inline in front of sponge_u_disc, etc?

@glwagner
Member

glwagner commented Jul 7, 2021

Again, I'm unsure whether it affects performance, but since rate is referenced as a global it needs to be const; e.g. const rate = 1/10.
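
To illustrate the point (a generic Julia sketch, not Oceananigans code): a non-const global has no fixed type, so every function that references it pays for a dynamic lookup, whereas a const global, or better an explicit parameter, does not.

rate = 1/10           # non-const global: its type may change, so uses are type-unstable
const crate = 1/10    # const global: the compiler can assume a fixed Float64

slow_sponge(u) = -rate * u         # dynamic lookup of `rate` on every call
fast_sponge(u) = -crate * u        # compiles to a plain multiply
best_sponge(u, rate) = -rate * u   # best: pass the rate explicitly (or via `parameters`)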

@glwagner
Member

glwagner commented Jul 7, 2021

9 times slower might be 3x per kernel function (maybe...)

@tomchor
Collaborator Author

tomchor commented Jul 7, 2021

Can you put @inline in front of sponge_u_disc, etc?

Done. Same result. I also tried ContinuousForcing with bottom_mask(x, y, z) = 1 and it's slower than its discrete counterpart. Apparently DiscreteForcing is a bit faster than ContinuousForcing, everything else being the same.

Again, I'm unsure whether it affects performance, but since rate is referenced as a global it needs to be const; e.g. const rate = 1/10.

Yes! That makes a big difference! I feel silly that I forgot that. With const rate=1/10 and DiscreteForcing things are as fast as with no forcing.

Using the same "trick" with ContinuousForcing doesn't change things though. So it does seem like the source of the issue is ContinuousForcing.

I should say though, I'm having some trouble securing a GPU right now, so I haven't been able to run these tests on a GPU.

Would a MWE help here?

@tomchor
Collaborator Author

tomchor commented Jul 7, 2021

Just got a hold of a GPU. I tried this and only saw a slowdown of 10% or so with ContinuousForcing, which I guess is okay.

So this seems to be a CPU issue.

@glwagner
Member

glwagner commented Jul 7, 2021

A slowdown of 10% when introducing exp could make sense.

With const rate=1/10 and DiscreteForcing things are as fast as with no forcing.

Okay, that makes sense.

So this seems to be a CPU issue.

So the problem is that ContinuousForcing is rather slow on the CPU. That is a bit annoying. Kind of like how WENO is really slow on the CPU for unknown reasons. At least we have a workaround with DiscreteForcing...

Would a MWE help here?

I think what would help the most is a simple benchmarking script that compares identical forcing function implementations with ContinuousForcing and DiscreteForcing to no forcing. I think we only need one forcing function, and it's probably best if it's simple (e.g. - p.mu * model_fields.u[i, j, k]) and doesn't involve complicated functions like exp. Since it's only a CPU issue I think it's not the highest priority, though (it might not be something we can easily solve either...)
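
For anyone picking this up, a rough sketch of such a benchmark (untested; the grid size, drag coefficient, and time step are arbitrary, and the constructor names follow the Oceananigans API from around the time of this thread):

using Oceananigans
using Oceananigans.TimeSteppers: time_step!
using BenchmarkTools

grid = RegularRectilinearGrid(size=(32, 32, 32), extent=(1, 1, 1))

# Continuous form of a simple linear drag, -μ * u, with μ passed as a parameter
@inline continuous_drag(x, y, z, t, u, μ) = -μ * u
continuous_forcing = Forcing(continuous_drag, field_dependencies=:u, parameters=0.1)

# Discrete form of the same forcing
@inline discrete_drag(i, j, k, grid, clock, model_fields, μ) = @inbounds -μ * model_fields.u[i, j, k]
discrete_forcing = Forcing(discrete_drag, discrete_form=true, parameters=0.1)

cases = [("no forcing", NamedTuple()),
         ("continuous", (u=continuous_forcing,)),
         ("discrete",   (u=discrete_forcing,))]

for (label, forcing) in cases
    model = IncompressibleModel(grid=grid, forcing=forcing)
    time_step!(model, 1e-3)                     # compile before timing
    t = @belapsed time_step!($model, 1e-3)
    @info "$label: $t seconds per time step"
end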

@glwagner glwagner added the performance 🏍️ So we can get the wrong answer even faster label Jul 9, 2021
@navidcy navidcy changed the title Oeananigans is much slower when using relaxation forcing functions Oceananigans is much slower when using relaxation forcing functions Jul 17, 2021
@glwagner
Member

I'm closing this issue because I'm judging that it's not of current, timely relevance to Oceananigans development. If you would like to make it a higher priority, or if you think the issue was closed in error, please feel free to re-open.
