Enhancement: GROMACS FAHCore needs to keep two levels of checkpoints. #1474

Open
bb30994 opened this issue May 12, 2020 · 3 comments
Labels
1.Type - Defect: Reported issue is a defect.
1.Type - Enhancement: Reported issue is an enhancement.
3.Component - GROMACS Core: Reported issue relates to FahCore_a7.

Comments

@bb30994

bb30994 commented May 12, 2020

When a checkpoint is written to disk, there will often be a previous checkpoint. If that previous checkpoint is renamed to checkpoint-old before checkpoint-new is written (and any even older third checkpoint is deleted if it exists), this provides an additional level of redundancy. The various issues mentioning guru meditation errors (which also need to be fixed) would become less critical if the restarting FAHCore, upon recognizing that it cannot start from the most recent checkpoint, could revert to the previous one.

This would also be essential in cases where a cloud-based VM is abruptly terminated because it is preempted by a higher-priority invocation.
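
To make the proposal concrete, here is a minimal Python sketch of the rotation and fallback described above. It is not the FAHCore implementation (the core is not written in Python). The file names checkpoint-old and checkpoint-new come from this comment; the function names `write_checkpoint` and `load_checkpoint` and the caller-supplied `is_valid` check are hypothetical, chosen only for illustration.

```python
import os
from pathlib import Path
from typing import Callable, Optional


def write_checkpoint(directory: Path, data: bytes, name: str = "checkpoint") -> Path:
    """Write a new checkpoint and keep the previous one as '<name>-old'."""
    current = directory / name
    previous = directory / f"{name}-old"
    temp = directory / f"{name}-new"

    # Write the new checkpoint under a temporary name and force it to disk,
    # so an abrupt termination never leaves a half-written current file.
    with open(temp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

    # Rotate: the existing checkpoint overwrites any older backup (dropping
    # the third, oldest copy), then the new file becomes the current one.
    if current.exists():
        os.replace(current, previous)
    os.replace(temp, current)
    return current


def load_checkpoint(
    directory: Path,
    is_valid: Callable[[bytes], bool],
    name: str = "checkpoint",
) -> Optional[bytes]:
    """Return the newest checkpoint that passes is_valid, or None."""
    # Try the most recent checkpoint first, then fall back to the old one.
    for candidate in (directory / name, directory / f"{name}-old"):
        try:
            data = candidate.read_bytes()
        except OSError:
            continue  # missing or unreadable: try the next candidate
        if is_valid(data):
            return data
    return None
```

If the process is killed between the two renames, only checkpoint-old and checkpoint-new exist, and `load_checkpoint` still recovers from checkpoint-old. What counts as a valid checkpoint (a checksum, a successful parse) is left to the caller here, since the issue does not specify it.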

@bb30994
Author

bb30994 commented May 12, 2020

See "Running FAHClient on cloud resources on temporary VMs":
https://foldingforum.org/viewtopic.php?p=333009#p333009

shorttack added the 1.Type - Defect, 1.Type - Enhancement, and 3.Component - GROMACS Core labels May 12, 2020
@shorttack

I rate this as an enhancement to fix a design defect. @bb30994 is spot on about the need for a fall-back checkpoint.

@PantherX
Contributor

Any reason this can't be expanded to include OpenCL too? It would be wicked to have the same pattern in FahCore_22, since you can run GPUs in cloud compute. While that is not as common as CPUs, we're getting there, so it makes sense to future-proof it now and have consistent behavior across all FahCores.
