Updating block_timings leads to checkpoint loading errors #219
vasdommes added a commit that referenced this issue on Apr 1, 2024: Fix #219 Updating block_timings leads to checkpoint loading errors
This bug was introduced by PR #215.
Example from `end-to-end_tests/SingletScalar_cT_test_nmax6/primal_dual_optimal`: restarting from the checkpoint fails with a loading error.
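What happens?
1. The program stores block timings in `ck/block_timings`.
2. It distributes blocks according to these timings (use `--verbosity debug` to see Block Grid Mapping).
3. After the actual run, it moves `ck/block_timings` to `ck/block_timings.0` and writes new timings to `ck/block_timings`.
4. On the next start, it uses the new `ck/block_timings` to distribute SDP blocks. Then it loads a checkpoint, which assumes that blocks are distributed according to `ck/block_timings.0`.
Temporary workaround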
Move `block_timings.0` to `block_timings` before loading from a checkpoint.
Solution
Do not write a new `block_timings` after the actual run (revert the relevant changes from #215).
TODO
Introduce a new checkpoint format that is invariant to the block mapping, the number of MPI ranks/nodes, etc. Currently, we have one file per rank (containing the matrix elements owned by that rank). We may write each block to a separate file, as in SDP. NB: this could be more time- and memory-consuming.
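The per-block idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `save_blocks` and `load_blocks`, the `block_<index>.json` file names, and the JSON encoding are hypothetical and are not the project's actual checkpoint format. The point is only that files keyed by block index can be read back under any block-to-rank mapping.

```python
import json
from pathlib import Path


def save_blocks(checkpoint_dir, owned_blocks):
    """Write each block owned by this rank to its own file.

    owned_blocks maps a global block index to that block's matrix elements.
    File names (block_<index>.json) are hypothetical.
    """
    ck = Path(checkpoint_dir)
    ck.mkdir(parents=True, exist_ok=True)
    for index, elements in owned_blocks.items():
        (ck / f"block_{index}.json").write_text(json.dumps(elements))


def load_blocks(checkpoint_dir, wanted_indices):
    """Read back only the blocks assigned to this rank under the current mapping."""
    ck = Path(checkpoint_dir)
    return {
        index: json.loads((ck / f"block_{index}.json").read_text())
        for index in wanted_indices
    }
```

Because no file name depends on a rank number, a restart with different timings or a different number of ranks only changes which indices each rank asks for. As the NB above says, writing many small files may cost more time and memory than one file per rank.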
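The temporary workaround described above amounts to a single file move before restarting. A minimal sketch follows, assuming the checkpoint directory contains both `block_timings` and `block_timings.0`. The helper name `restore_block_timings` and the backup name `block_timings.new` are hypothetical and not part of the project.

```python
import shutil
from pathlib import Path


def restore_block_timings(checkpoint_dir):
    """Move block_timings.0 back to block_timings before loading a checkpoint.

    Returns True if the timings were restored, False if block_timings.0 is absent.
    """
    ck = Path(checkpoint_dir)
    old = ck / "block_timings.0"  # timings the checkpoint was written with
    new = ck / "block_timings"    # timings rewritten after the actual run
    if not old.exists():
        return False
    if new.exists():
        # Keep the rewritten timings under a hypothetical backup name
        shutil.move(str(new), str(ck / "block_timings.new"))
    shutil.move(str(old), str(new))
    return True
```

Calling `restore_block_timings("ck")` before restarting keeps the block distribution consistent with the one the checkpoint was written under.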