Snapshot/Restore: snapshot during rolling restart of a 2 node cluster might get stuck #9924
Comments
👍 ran into this a few times.
… nodes that no longer exist Related to elastic#9924
Ran into a similar issue and tried the snapshot cleanup utility. It didn't work as all shards were ignored:
What's the reason for ignoring shards when the node exists?
@srgclr if a node exists and a shard is in `ABORTED` state, it can mean one of two things: either we hit #11314, or the shard is stuck in an I/O operation and we need to wait until the I/O operation is over (or restart the node). It's impossible for the cleanup utility to determine which state we are in. Because of this, it takes the safer route: assume that we are stuck in an I/O operation and skip such shards.
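This isn't the actual cleanup-utility source, but the decision it describes can be sketched roughly like this (the function name, the boolean parameters, and the state strings are illustrative, not real Elasticsearch identifiers):

```python
def should_clean_aborted_shard(node_exists: bool, shard_state: str) -> bool:
    """Illustrative sketch of the cleanup utility's skip rule.

    Only ABORTED shards are candidates for cleanup at all. If the node
    that owns the shard still exists, the shard may simply be stuck in
    a long-running I/O operation, so the utility takes the safe route
    and skips it. Only shards on nodes that no longer exist are cleaned.
    """
    if shard_state != "ABORTED":
        return False
    return not node_exists
```

Under this rule, every shard on a still-existing node is ignored, which matches the behavior reported above.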
This should be solved by #11450. Closing.
The issue was originally reported in #7980 (comment). If the current master node, which contains all primary shards, is restarted in the middle of a snapshot operation, it might leave the snapshot hanging in `ABORTED` state.