"restart from checkpoint file" not working #242

Closed
saubhagya-gatech opened this issue Mar 3, 2024 · 1 comment

Comments

@saubhagya-gatech
Contributor

This is to report that the "restart from checkpoint file" feature seems to be broken on the master branch. I tested it on two different HPC systems (Perlmutter and CADES), and each produced a different error message. However, when I use a checkpoint file only to set initial conditions, I do not get any error (although, as expected, no fluxes are reported for the first observation in that case).

Error on Perlmutter:

terminate called after throwing an instance of 'Errors::Message'
  what():  HDF5_MPI: error opening file "checkpoint_final.h5" with READ_WRITE access.

Error on CADES:

*** An error occurred in MPI_Allreduce
*** reported by process [216662016,1]
*** on communicator MPI_COMM_WORLD
*** MPI_ERR_TRUNCATE: message truncated
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
@saubhagya-gatech
Contributor Author

This is resolved. The build on that particular HPC system was outdated.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant