RuntimeV2 corrupted checkpoints no automatic recovery #1317

Open
PHA-SYSOPS opened this issue Jun 20, 2023 · 1 comment

Comments

@PHA-SYSOPS

When a pruntimev2 instance has a corrupted checkpoint (you can reproduce this by restarting the Docker container while a snapshot is in progress), it reports that it cannot load the checkpoint and exits with an error. No attempt is made to load one of the other (backup) snapshots. To work around this, I have to manually delete the broken snapshot; pruntime then loads a previous version and all is well.

I would expect pruntime to try loading the previous version on such an error before it dies. It does not have to walk through all the older snapshots, just go back one. If that works, you might want to delete or rename the snapshot that was broken. This would allow easy, automated recovery.
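
To make the request concrete, here is a rough sketch of the fallback I have in mind. It is not pruntime code: the function names, the `checkpoint-*` file pattern, the `.corrupted` suffix, and the JSON loader are all assumptions made up for illustration. The idea is only "try the newest snapshot, step back at most one, and set the broken one aside if the older one loads".

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple


def load_with_fallback(
    checkpoint_dir: Path,
    load: Callable[[Path], object],
    max_fallback: int = 1,
    pattern: str = "checkpoint-*",  # hypothetical naming, not pruntime's
) -> Optional[Tuple[Path, object]]:
    """Try the newest checkpoint, then step back at most `max_fallback` files.

    Checkpoints that failed are renamed with a `.corrupted` suffix, but only
    once an older checkpoint has loaded. Returns (path, state) or None.
    """
    # Assumes zero-padded names, so reverse lexicographic order is newest first.
    candidates = sorted(
        (p for p in checkpoint_dir.glob(pattern) if p.suffix != ".corrupted"),
        reverse=True,
    )
    failed = []
    for path in candidates[: max_fallback + 1]:
        try:
            state = load(path)
        except Exception:  # in this sketch any load failure counts as corruption
            failed.append(path)
            continue
        # Recovery worked: set the broken files aside so the next start skips them.
        for bad in failed:
            bad.rename(bad.with_name(bad.name + ".corrupted"))
        return path, state
    return None


def load_json(path: Path) -> object:
    """Stand-in loader; a real checkpoint format would differ."""
    return json.loads(path.read_text())


def demo() -> Tuple[str, object, bool]:
    """Write one good and one truncated checkpoint, then recover."""
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        (d / "checkpoint-00000001").write_text('{"block": 1}')
        (d / "checkpoint-00000002").write_text('{"block": ')  # interrupted write
        result = load_with_fallback(d, load_json)
        assert result is not None
        path, state = result
        renamed = (d / "checkpoint-00000002.corrupted").exists()
        return path.name, state, renamed
```

Renaming rather than deleting keeps the broken file around for later inspection, and deferring the rename until an older snapshot has loaded means nothing is touched when recovery fails anyway.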

@kvinwang
Collaborator

Does the --remove-corrupted-checkpoint option resolve this issue?
