Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metadata: make sure snapshots' children are removable #6356

Closed
wants to merge 1 commit into from

Conversation

kzys
Copy link
Member

@kzys kzys commented Dec 8, 2021

Sometimes I see snapshots that have children in "children" bucket,
but not in "snapshots" bucket. In this case, there is no way to remove
the snapshots.

This change adds an extra check to make sure that snapshots' children
are actually removable. If the child is deleted even partially,
it should not cause ErrFailedPrecondition.

Signed-off-by: Kazuyoshi Kato katokazu@amazon.com

@theopenlab-ci
Copy link

theopenlab-ci bot commented Dec 8, 2021

Build succeeded.

@theopenlab-ci
Copy link

theopenlab-ci bot commented Dec 8, 2021

Build succeeded.

Sometimes I see snapshots that have children in "children" bucket,
but not in "snapshots" bucket. In this case, there is no way to remove
the snapshots.

This change adds an extra check to make sure that snapshots' children
are actually removable. If the child is deleted even partially,
it should not cause ErrFailedPrecondition.

Signed-off-by: Kazuyoshi Kato <katokazu@amazon.com>
@theopenlab-ci
Copy link

theopenlab-ci bot commented Dec 8, 2021

Build succeeded.

@dmcgowan
Copy link
Member

dmcgowan commented Dec 8, 2021

Is there an associated bug that causes some cases to not remove from the parent? Adding the cleanup is fine but would like to know how that occurred since the child cleanup and parent removal should be in the same transaction.

@kzys
Copy link
Member Author

kzys commented Dec 9, 2021

No. I'm just having this odd situation with devmapper snapshotter. I have killed containerd with kill -9 and it may interrupt a bblot transaction.

Killing containerd forcibly is something we need to expect. We should gracefully handle the situation.

@kzys
Copy link
Member Author

kzys commented Dec 9, 2021

Seems the inconsistency was caused by my bug in in-development code. But I'd like to have a path to "fix" the inconsistent state. Should we ask operators to explicitly handle the situation (like fsck command?).

@dmcgowan
Copy link
Member

@kzys personally I am a fan of having fsck type tools for debugging and fixing on disk corruption or unexpected disk state. It is hard to account for every possible data corruption situation. Some of these situation could be handled by gc, but the case here should only ever occur due to a previously found bug. I actually started such a tool for containerd 4 years ago expecting these situation would be more common and difficult to debug, however, yet to have to deal with any serious issue and bolt has been very stable. I can clean it up if that is something we want to think more about.

@dmcgowan dmcgowan added this to New in Code Review via automation Dec 13, 2021
@dmcgowan dmcgowan moved this from New to Needs Discussion in Code Review Dec 13, 2021
@kzys
Copy link
Member Author

kzys commented Jan 31, 2022

@dmcgowan Sounds reasonable. Let me think about fsck subcommand.

Resolving.

@kzys kzys closed this Jan 31, 2022
Code Review automation moved this from Needs Discussion to Done Jan 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

None yet

2 participants