Don't pass function coefficients to `solve!` with `BatchedTridiagonalSolver` for vertically implicit diffusion #3030

Conversation
I made a "clean" version of the BCI example, `bench_BCI.jl`:

```julia
using Oceananigans
using Oceananigans.Units

Lx = 1000kilometers
Ly = 1000kilometers
Lz = 1kilometers

Nx = 64
Ny = 64
Nz = 40

grid = RectilinearGrid(size = (Nx, Ny, Nz),
                       x = (0, Lx),
                       y = (-Ly/2, Ly/2),
                       z = (-Lz, 0),
                       topology = (Periodic, Bounded, Bounded))

model = HydrostaticFreeSurfaceModel(; grid,
                                    coriolis = BetaPlane(latitude = -45),
                                    buoyancy = BuoyancyTracer(),
                                    tracers = :b,
                                    momentum_advection = WENO(),
                                    tracer_advection = WENO())

ramp(y, Δy) = min(max(0, y/Δy + 1/2), 1)

N² = 4e-6 # [s⁻²] buoyancy frequency / stratification
M² = 8e-8 # [s⁻²] horizontal buoyancy gradient

Δy = 50kilometers # width of the region of the front
Δb = Δy * M²      # buoyancy jump associated with the front
ϵb = 1e-2 * Δb    # noise amplitude

bᵢ(x, y, z) = N² * z + Δb * ramp(y, Δy) + ϵb * randn()

set!(model, b=bᵢ)

Δt₀ = 5minutes
stop_time = 40days
simulation = Simulation(model, Δt=Δt₀, stop_time=stop_time)

wizard = TimeStepWizard(cfl=0.2, max_change=1.1, max_Δt=20minutes)
simulation.callbacks[:wizard] = Callback(wizard, IterationInterval(20))

using Printf

wall_clock = [time_ns()]

function print_progress(sim)
    @printf("[%05.2f%%] i: %d, t: %s, wall time: %s, max(u): (%6.3e, %6.3e, %6.3e) m/s, next Δt: %s\n",
            100 * (sim.model.clock.time / sim.stop_time),
            sim.model.clock.iteration,
            prettytime(sim.model.clock.time),
            prettytime(1e-9 * (time_ns() - wall_clock[1])),
            maximum(abs, sim.model.velocities.u),
            maximum(abs, sim.model.velocities.v),
            maximum(abs, sim.model.velocities.w),
            prettytime(sim.Δt))

    wall_clock[1] = time_ns()

    return nothing
end

simulation.callbacks[:print_progress] = Callback(print_progress, IterationInterval(100))

@info "Running the simulation..."

run!(simulation)

@info "Simulation completed in " * prettytime(simulation.run_wall_time)
```

For me:
My results seem very different from yours, @simone-silvestri. I'll check again later in case I did something wrong...
To see a larger difference you have to use a vertically implicit closure.
Using your example, on main:

```
julia> run!(simulation)
[ Info: Initializing simulation...
[00.00%] i: 0, t: 0 seconds, wall time: 9.525 seconds, max(u): (0.000e+00, 0.000e+00, 0.000e+00) m/s, next Δt: 5.500 minutes
[ Info: ... simulation initialization complete (4.249 seconds)
[ Info: Executing initial time step...
[ Info: ... initial time step complete (19.768 seconds).
[01.17%] i: 100, t: 11.193 hours, wall time: 1.965 minutes, max(u): (6.980e-01, 3.315e-01, 3.357e-03) m/s, next Δt: 8.858 minutes
[03.04%] i: 200, t: 1.217 days, wall time: 1.684 minutes, max(u): (5.929e-01, 3.535e-01, 2.520e-03) m/s, next Δt: 14.266 minutes
[06.04%] i: 300, t: 2.415 days, wall time: 1.583 minutes, max(u): (5.931e-01, 2.530e-01, 1.924e-03) m/s, next Δt: 20 minutes
[09.51%] i: 400, t: 3.804 days, wall time: 1.644 minutes, max(u): (5.163e-01, 2.431e-01, 1.688e-03) m/s, next Δt: 20 minutes
```

on this PR:

```
julia> run!(simulation)
[ Info: Initializing simulation...
[00.00%] i: 0, t: 0 seconds, wall time: 9.839 seconds, max(u): (0.000e+00, 0.000e+00, 0.000e+00) m/s, next Δt: 5.500 minutes
[ Info: ... simulation initialization complete (4.567 seconds)
[ Info: Executing initial time step...
[ Info: ... initial time step complete (19.823 seconds).
[09.33%] i: 100, t: 11.193 hours, wall time: 1.573 minutes, max(u): (6.961e-01, 3.346e-01, 3.385e-03) m/s, next Δt: 8.858 minutes
[24.35%] i: 200, t: 1.217 days, wall time: 1.195 minutes, max(u): (5.956e-01, 3.538e-01, 2.506e-03) m/s, next Δt: 14.266 minutes
[48.30%] i: 300, t: 2.415 days, wall time: 1.199 minutes, max(u): (5.931e-01, 2.529e-01, 1.828e-03) m/s, next Δt: 20 minutes
[76.07%] i: 400, t: 3.804 days, wall time: 1.210 minutes, max(u): (5.090e-01, 2.369e-01, 1.658e-03) m/s, next Δt: 20 minutes
```

on Oceananigans 0.79.2:

```
julia> run!(simulation)
[ Info: Initializing simulation...
[00.00%] i: 0, t: 0 seconds, wall time: 8.533 seconds, max(u): (0.000e+00, 0.000e+00, 0.000e+00) m/s, next Δt: 5.500 minutes
[ Info: ... simulation initialization complete (2.251 seconds)
[ Info: Executing initial time step...
[ Info: ... initial time step complete (35.854 seconds).
[11.66%] i: 100, t: 11.193 hours, wall time: 1.961 minutes, max(u): (6.968e-01, 3.324e-01, 3.355e-03) m/s, next Δt: 8.858 minutes
[30.44%] i: 200, t: 1.217 days, wall time: 1.180 minutes, max(u): (5.960e-01, 3.550e-01, 2.434e-03) m/s, next Δt: 14.266 minutes
[60.37%] i: 300, t: 2.415 days, wall time: 1.200 minutes, max(u): (5.939e-01, 2.541e-01, 1.830e-03) m/s, next Δt: 20 minutes
[95.09%] i: 400, t: 3.804 days, wall time: 1.269 minutes, max(u): (5.162e-01, 2.501e-01, 1.745e-03) m/s, next Δt: 20 minutes
```

Might it be the threading?
But you reported results with the BCI example, right? Or you used a different solver?
I used
@simone-silvestri will the changes here impact CATKE on CliMA/ClimaOcean.jl#17?
with one thread on
But it's the same solver in the validation script also, right? The FFT solver...
OK, now my results are similar to yours, so it was the threading.
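For reference, one way to pin the thread count when benchmarking; the `--project` flag and the script name are assumptions carried over from the example above:

```shell
# Run the benchmark single-threaded so CPU timings are comparable
# across branches (script name assumed from this thread).
JULIA_NUM_THREADS=1 julia --project bench_BCI.jl
```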
Not the implicit free surface solver; the vertical tridiagonal solver that solves for implicit vertical diffusion of tracers and momentum.
Not on the GPU; this PR affects only CPU performance.
Sorry, I was not clear... I was asking whether any syntax for adding parameterizations with additional tracers changed. (But I think no, right?)
No no, the syntax stays the same.
```diff
-                  upper_diagonal = maybe_tupled_ivd_upper_diagonal)
+                  lower_diagonal = Val(:maybe_tupled_ivd_lower_diagonal),
+                  diagonal = Val(:ivd_diagonal),
+                  upper_diagonal = Val(:maybe_tupled_ivd_upper_diagonal))
```
This seems complicated. Can we come up with a better design?
I'll raise an issue. The thing is that we can't pass a function as an argument to a kernel.
That can't quite be true, because we pass forcing into the tendency kernels.
Still, there's no reason why we have to use the "function" interface here. There are many possible solutions: array-like structs, new types + extending `get_coefficient` (probably the best solution), or redesigning the solver. It's not like there's only one solution.
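To illustrate the "new types + extending `get_coefficient`" idea, here is a hypothetical, stand-alone sketch: one singleton type per coefficient, dispatched on an instance instead of a `Val{:symbol}`. All names below are illustrative stand-ins, not the actual Oceananigans API.

```julia
# Toy stand-ins for the real coefficient functions:
ivd_lower_diagonal(i, j, k, grid, args...) = -1.0
ivd_diagonal(i, j, k, grid, args...)       =  2.0

# One isbits singleton per coefficient; safe to pass into a kernel,
# unlike a function object.
struct LowerDiagonal end
struct MainDiagonal end

# Extending get_coefficient maps each singleton to its function:
@inline get_coefficient(::LowerDiagonal, i, j, k, grid, p, args...) =
    ivd_lower_diagonal(i, j, k, grid, args...)
@inline get_coefficient(::MainDiagonal, i, j, k, grid, p, args...) =
    ivd_diagonal(i, j, k, grid, args...)

get_coefficient(LowerDiagonal(), 1, 1, 1, nothing, nothing)  # -1.0
```

Functionally this mirrors the `Val`-based dispatch in the PR; the difference is purely which isbits tag names the coefficient.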
Maybe I am wrong, but new types + extending `get_coefficient` sounds a bit like this PR, with types instead of `Val{:function_name}`. I guess redesigning the solvers might be a long-term solution that we can explore.
This can't be true because, for example, what is passed at `src/Models/NonhydrostaticModels/nonhydrostatic_tendency_kernel_functions.jl` line 58 (at bcc34f0) is used at line 74 (at bcc34f0). This might be the key.
Let's do better than this?
```diff
-    return hydrostatic_turbulent_kinetic_energy_tendency, mews_closure, mews_diffusivity_fields
+    return calculate_hydrostatic_free_surface_Ge!, mews_closure, mews_diffusivity_fields
```
I like the old name; it's clearer. Can we change it back?
This is the kernel, not the function.
```diff
     i, j, k = @index(Global, NTuple)
-    @inbounds Gc[i, j, k] = tendency_kernel_function(i, j, k, grid, args...)
+    @inbounds Ge[i, j, k] = hydrostatic_turbulent_kinetic_energy_tendency(i, j, k, grid, args...)
```
Formatting inconsistency
```julia
@inline get_coefficient(::Val{:maybe_tupled_ivd_lower_diagonal}, i, j, k, grid, p, args...) = maybe_tupled_ivd_lower_diagonal(i, j, k, grid, args...)
@inline get_coefficient(::Val{:maybe_tupled_ivd_upper_diagonal}, i, j, k, grid, p, args...) = maybe_tupled_ivd_upper_diagonal(i, j, k, grid, args...)
@inline get_coefficient(::Val{:ivd_diagonal}, i, j, k, grid, p, args...) = ivd_diagonal(i, j, k, grid, args...)
```
What is the advantage of dispatching on `Val`, versus another type?
I would say using `Val` or a `Type` results in the same behavior, unless you had another idea for the type. `Val` avoids having to define a new type.
Maybe we could actually try

```julia
@inline get_coefficient(::Val{function_name}, i, j, k, grid, p, args...) where function_name = function_name(i, j, k, grid, p, args...)
```

and see if it compiles (probably not).
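For what it's worth, that generic method cannot work as written: inside `Val{name}` the `name` is a `Symbol`, not a callable. A `getfield` lookup does make a generic fallback compile and run on the CPU, sketched here with a toy coefficient function (names are illustrative); whether the lookup stays type-stable inside a GPU kernel is a separate question, which is presumably why the PR writes one method per function instead.

```julia
# Toy coefficient function to look up:
my_coefficient(i, j, k, grid, p) = Float64(i + j + k)

# Generic fallback: `name` is a compile-time Symbol, so it cannot be
# called directly; getfield resolves it to a function in this module.
@inline get_coefficient(::Val{name}, i, j, k, grid, p) where name =
    getfield(@__MODULE__, name)(i, j, k, grid, p)

get_coefficient(Val(:my_coefficient), 1, 2, 3, nothing, nothing)  # 6.0
```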
On main we are passing the tendency kernel function as an argument to the kernel. Apparently, this prevents compilation on the CPU. Another place where this design was implemented is the vertically implicit solver, where we pass functions to calculate the tridiagonal matrix coefficients. This PR fixes both problems.

After this PR we should remember that we cannot pass functions as kernel arguments, not even as properties of a struct! Instead we can pass `Val(:function_name)` and dispatch on that to call the correct `function_name` (as implemented in this PR for the vertically implicit solver).

Baroclinic adjustment test (with `Nx = Ny = 128, Nz = 10`):

on main:

on this PR:

on Oceananigans v0.79.2 (with KA 0.7.2):
closes #2996
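As a stand-alone illustration of the pattern the PR adopts, the sketch below passes an isbits `Val` tag into a kernel-like function and dispatches it back to a concrete tendency function. All names here (`compute_b_tendency`, `fill_tendencies!`, etc.) are hypothetical, not Oceananigans API.

```julia
# Two concrete "tendency" functions a kernel might need to call:
compute_b_tendency(i, grid) = 1.0 * i   # stand-in for a buoyancy tendency
compute_e_tendency(i, grid) = 2.0 * i   # stand-in for a TKE tendency

# Dispatch layer: maps an isbits Val tag back to the concrete function.
@inline tendency_function(::Val{:b}) = compute_b_tendency
@inline tendency_function(::Val{:e}) = compute_e_tendency

# "Kernel": receives only the tag, never a function object, so every
# argument is isbits and the kernel compiles cleanly.
function fill_tendencies!(G, tag, grid)
    func = tendency_function(tag)
    for i in eachindex(G)
        G[i] = func(i, grid)
    end
    return G
end

fill_tendencies!(zeros(3), Val(:e), nothing)  # [2.0, 4.0, 6.0]
```

Since `Val(:e)` is a singleton, `tendency_function(tag)` resolves at compile time and the call to `func` is statically dispatched.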