Enhancement: GROMACS FAHCore needs to keep two levels of checkpoints. #1474
Labels
1.Type - Defect
Reported issue is a defect.
1.Type - Enhancement
Reported issue is an enhancement.
3.Component - GROMACS Core
Reported issue relates to FahCore_a7.
When a checkpoint is written to disk, there will often be a previous checkpoint. If that previous checkpoint is renamed to checkpoint-old before checkpoint-new is written (and any even older third checkpoint is deleted, if it exists), this provides an additional level of redundancy. The various issues mentioning guru meditation errors (which also need to be fixed) would become less critical if a restarting FAHCore, upon recognizing that it cannot start from the most recent checkpoint, could revert to the previous one.
This would also be essential in cases where a cloud-based VM is abruptly terminated because it is preempted by a higher-priority invocation.
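
To make the proposal concrete, here is a rough Python sketch of the rotation and fallback logic. It is not how FahCore_a7 handles checkpoints. The file names `checkpoint` and `checkpoint-old`, the `.tmp` staging file, and the `validate` callback are all hypothetical placeholders. As a small variation on the description above, the sketch stages the new checkpoint in a temporary file before rotating, so that an abrupt termination in the middle of a write cannot damage either retained copy.

```python
import os


def write_checkpoint(directory, data, name="checkpoint"):
    """Write a new checkpoint while keeping exactly one older generation.

    All file names here are hypothetical, chosen only for illustration.
    """
    current = os.path.join(directory, name)
    previous = current + "-old"
    staging = current + ".tmp"

    # Stage the new checkpoint first and force it to disk.
    with open(staging, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

    # Rotate: the current checkpoint becomes the previous one.
    # os.replace overwrites the destination, which discards any
    # even older third checkpoint.
    if os.path.exists(current):
        os.replace(current, previous)

    # Promote the staged file to be the current checkpoint.
    os.replace(staging, current)


def load_checkpoint(directory, validate, name="checkpoint"):
    """Return the newest checkpoint that passes validate(), or None.

    validate is a caller-supplied function (an assumption of this sketch)
    that returns True when the checkpoint bytes are usable.
    """
    for candidate in (name, name + "-old"):
        path = os.path.join(directory, candidate)
        try:
            with open(path, "rb") as f:
                data = f.read()
        except FileNotFoundError:
            continue
        if validate(data):
            return data
    return None
```

On restart, the core would call something like `load_checkpoint` and only fall back to starting the work unit from scratch if both generations fail validation.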