Make it easier to restart a simulation from a checkpoint with additional passive tracers #2938

Merged
merged 2 commits into from
Feb 23, 2023

Conversation

tomchor
Collaborator

@tomchor tomchor commented Feb 22, 2023

This feature adds support for using a checkpoint file to initialize a model that contains additional passive tracers that weren't present in the original simulation. The motivating use case is a user who wants to introduce passive tracers only after the simulation has spun up.

At the moment, on main, this isn't possible: if we try to pick up a simulation and a given variable can't be found in the checkpoint file, the code throws a warning when trying to set the data for that variable, but an error when trying to set its tendencies.

This PR changes the code so that it throws a warning in both cases. After this PR, a user can do:

using Oceananigans

grid = RectilinearGrid(size = (4, 4, 4), extent = (1,1,1))

model_spinup = NonhydrostaticModel(; grid, tracers = :b)
set!(model_spinup, b=1)

simulation = Simulation(model_spinup, Δt = 1, stop_time = 10)
simulation.output_writers[:checkpointer] = checkpointer = Checkpointer(model_spinup,
                                                                       schedule=TimeInterval(5),
                                                                       prefix="checkpoint")

run!(simulation)

using Oceananigans.OutputWriters: write_output!
write_output!(checkpointer, model_spinup)

model = NonhydrostaticModel(; grid,
                            tracers = (keys(model_spinup.tracers)..., :t1, :t2))

@info "Restarting model with more tracers"
checkpoint_file_path = Oceananigans.OutputWriters.checkpoint_path(true, simulation.output_writers)
set!(model, checkpoint_file_path)

simulation = Simulation(model, Δt = 1, stop_time = 20)

run!(simulation)

On main this throws a KeyError. On this branch this produces:

[ Info: Initializing simulation...
[ Info:     ... simulation initialization complete (1.614 seconds)
[ Info: Executing initial time step...
[ Info:     ... initial time step complete (20.304 seconds).
[ Info: Simulation is stopping after running for 22.039 seconds.
[ Info: Simulation time 10 seconds equals or exceeds stop time 10 seconds.
[ Info: Restarting model with more tracers
┌ Warning: Could not restore t1 from checkpoint.
└ @ Oceananigans.OutputWriters ~/repos/Oceananigans.jl/src/OutputWriters/checkpointer.jl:218
┌ Warning: Could not restore t2 from checkpoint.
└ @ Oceananigans.OutputWriters ~/repos/Oceananigans.jl/src/OutputWriters/checkpointer.jl:218
┌ Warning: Could not restore tendencies for t1 from checkpoint.
└ @ Oceananigans.OutputWriters ~/repos/Oceananigans.jl/src/OutputWriters/checkpointer.jl:257
┌ Warning: Could not restore tendencies for t2 from checkpoint.
└ @ Oceananigans.OutputWriters ~/repos/Oceananigans.jl/src/OutputWriters/checkpointer.jl:257
[ Info: Initializing simulation...
[ Info:     ... simulation initialization complete (514.400 μs)
[ Info: Executing initial time step...
[ Info:     ... initial time step complete (41.614 seconds).
[ Info: Simulation is stopping after running for 41.750 seconds.
[ Info: Simulation time 20 seconds equals or exceeds stop time 20 seconds.

julia> interior(model.tracers.b)
4×4×4 view(::Array{Float64, 3}, 4:7, 4:7, 4:7) with eltype Float64:
[:, :, 1] =
 1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0

[:, :, 2] =
 1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0

[:, :, 3] =
 1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0

[:, :, 4] =
 1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0

julia> interior(model.tracers.t1)
4×4×4 view(::Array{Float64, 3}, 4:7, 4:7, 4:7) with eltype Float64:
[:, :, 1] =
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0

[:, :, 2] =
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0

[:, :, 3] =
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0

[:, :, 4] =
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0

One option to add safety is to modify set!(model, filepath::AbstractString) to take an optional argument such as skip_missing_variables = false, which would change the behavior for missing variables from throwing a warning to throwing an error.
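To illustrate the suggestion above, here is a minimal sketch of how such a switch might behave. It is not Oceananigans code: `restore_fields!` is a hypothetical helper, plain `Dict`s stand in for the checkpoint file and the model's prognostic fields, and the keyword name `skip_missing_variables` is only the tentative name proposed in this description.

```julia
# Hypothetical stand-in for the checkpoint restore loop; not Oceananigans code.
# `checkpoint` plays the role of the opened checkpoint file and `fields` the
# role of the model's prognostic fields.
function restore_fields!(fields::Dict{Symbol, Vector{Float64}},
                         checkpoint::Dict{String, Vector{Float64}};
                         skip_missing_variables = false)
    for (name, data) in fields
        key = string(name)
        if haskey(checkpoint, key)
            copyto!(data, checkpoint[key])   # restore the saved values in place
        elseif skip_missing_variables
            @warn "Could not restore $name from checkpoint."
        else
            error("Variable $name is missing from the checkpoint.")
        end
    end
    return fields
end

checkpoint = Dict("b" => [1.0, 1.0, 1.0])
fields = Dict(:b => zeros(3), :t1 => zeros(3))

# Opting in: b is restored, t1 keeps its initial zeros and triggers a warning.
restore_fields!(fields, checkpoint; skip_missing_variables = true)
```

With the default `skip_missing_variables = false`, the same call would raise an error as soon as it reaches `t1`, so silently starting a tracer from zero would require an explicit opt-in.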

@whitleyv

@glwagner
Member

This feature would be used to restart a simulation from a checkpoint with additional passive tracers? Is that correct?

@tomchor
Collaborator Author

tomchor commented Feb 22, 2023

This feature would be used to restart a simulation from a checkpoint with additional passive tracers? Is that correct?

That is correct. That's what I'm currently trying to do (and eventually @whitleyv too), and it's much easier if we make this change to the Checkpointer.

@glwagner
Member

glwagner commented Feb 22, 2023

Can we update the PR description to state this goal? "This feature adds support for using a checkpoint file to initialize a model that contains additional passive tracers..."

@tomchor tomchor changed the title from "Throw a warning if we can't restore tendencies for a given variable" to "Make it easier to restart a simulation from a checkpoint with additional passive tracers" Feb 22, 2023
@tomchor
Collaborator Author

tomchor commented Feb 22, 2023

Can we update the PR description to state this goal? "This feature adds support for using a checkpoint file to initialize a model that contains additional passive tracers..."

Done!

catch
@warn "Could not retore $name from checkpoint."
catch err
if err isa KeyError
Member

Why KeyError? Maybe add a comment?

Collaborator Author

That's the expected error when a variable isn't present in the checkpoint. Other errors are possible though, so I think we should rethrow() the error if something else goes wrong.

Collaborator Author

In any case your comment made me realize the way it was written probably isn't best practice, so I re-wrote the code using an if-else statement instead of a try-catch. The new code is shorter and (I think) clearer. What do you think?

Member

I think it's easier to understand.

There's the concept of "Easier to Ask for Forgiveness than Permission", which holds that try/catch is better style, especially in dynamic languages.

I think our project and code are unique. So while there are places where so-called "best practices" apply, there are also places where we should do what works best for us.

Member

@glwagner glwagner Feb 23, 2023

The downside of "look before you leap" shows up when there are many error conditions, which causes the if-statements to pile up. Here we have just one condition, so I think explicitly writing the condition is better and more readable (for me).
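For readers following this thread, the two styles under discussion can be sketched side by side. This is a toy illustration, not the code in this PR: a plain `Dict` stands in for the checkpoint file, and `restore_eafp` and `restore_lbyl` are hypothetical names.

```julia
# Toy stand-in for a checkpoint file: variable name => saved value.
file = Dict("b" => 1.0)

# "Easier to ask for forgiveness": attempt the lookup and tolerate only a KeyError.
function restore_eafp(file, name)
    try
        return file[name]
    catch err
        if err isa KeyError
            @warn "Could not restore $name from checkpoint."
            return nothing
        else
            rethrow()   # anything other than a missing key is a genuine failure
        end
    end
end

# "Look before you leap": test for the key first, so no exception handling is needed.
function restore_lbyl(file, name)
    if haskey(file, name)
        return file[name]
    else
        @warn "Could not restore $name from checkpoint."
        return nothing
    end
end
```

Both functions return the stored value for `"b"` and `nothing` (plus a warning) for a missing name such as `"t1"`. With a single condition to check, the second form avoids the nested `catch`/`isa`/`rethrow` logic.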

@@ -209,16 +209,12 @@ function set!(model, filepath::AbstractString)
model_fields = prognostic_fields(model)

for name in propertynames(model_fields)
try
if string(name) ∈ keys(file) # Test if variable exist in checkpoint
Member

I like this! Easier to understand when reading it

2 participants