Snapshot/Restore: snapshot during rolling restart of a 2 node cluster might get stuck #9924

Closed
imotov opened this issue Feb 27, 2015 · 4 comments
Labels
>bug :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs

Comments

Contributor

imotov commented Feb 27, 2015

This issue was originally reported in #7980 (comment). If the current master node holds all primary shards and is restarted in the middle of a snapshot operation, the snapshot might be left hanging in the ABORTED state.


cxxr commented Mar 10, 2015

👍 ran into this a few times.

imotov added a commit to imotov/elasticsearch that referenced this issue Mar 12, 2015
imotov added a commit that referenced this issue Mar 12, 2015
imotov added a commit that referenced this issue Mar 12, 2015
Contributor

srgclr commented Jun 8, 2015

Ran into a similar issue and tried the snapshot cleanup utility. It didn't work, as all shards were ignored:

Ignoring shard [[dev1_10_event.2015-03-15][4]] with state [ABORTED] on node [kyU3N9lpTIuTbdeUGp5ThQ] - node exists : [true]

What's the reason for ignoring shards when the node exists?

Contributor Author

imotov commented Jun 10, 2015

@srgclr if a node exists and a shard is in the ABORTED state, it can mean one of two things: either we hit #11314, or the shard is stuck in an I/O operation, in which case we need to either wait until the I/O operation is over or restart the node. It's impossible for the cleanup utility to determine which of these two states we are in. Because of this, it takes the safer route: it assumes that we are stuck in an I/O operation and skips such shards.
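
For readers following along, the skip rule described above can be sketched roughly as follows. This is an illustrative Python sketch, not the cleanup utility's actual code: the `ShardSnapshotEntry` type, the `shards_safe_to_clean` function, and the assumption that shards on nodes that no longer exist are the ones eligible for cleanup are all hypothetical and inferred only from the log line and the explanation in this thread.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class ShardSnapshotEntry:
    """Hypothetical record of one shard's snapshot status."""
    index: str
    shard: int
    state: str     # e.g. "ABORTED"
    node_id: str   # node the shard snapshot was running on


def shards_safe_to_clean(
    entries: Iterable[ShardSnapshotEntry],
    live_node_ids: Set[str],
) -> List[ShardSnapshotEntry]:
    """Return ABORTED shards whose node is gone (assumed safe to clean up).

    ABORTED shards on a node that still exists are skipped: the shard may
    be stuck in an I/O operation, and that case cannot be told apart from
    the bug in #11314 from the outside.
    """
    safe = []
    for entry in entries:
        if entry.state != "ABORTED":
            continue  # only ABORTED shards are candidates in this sketch
        if entry.node_id in live_node_ids:
            # Corresponds to the "Ignoring shard ... node exists : [true]" case.
            continue
        safe.append(entry)
    return safe


if __name__ == "__main__":
    entries = [
        ShardSnapshotEntry("dev1_10_event.2015-03-15", 4, "ABORTED", "kyU3N9lpTIuTbdeUGp5ThQ"),
        ShardSnapshotEntry("dev1_10_event.2015-03-15", 5, "ABORTED", "node-that-left"),
    ]
    live = {"kyU3N9lpTIuTbdeUGp5ThQ"}
    for entry in shards_safe_to_clean(entries, live):
        print(entry.index, entry.shard)
```

Under these assumptions, shard 4 from the log line above is skipped because its node is still in the cluster, which matches the behavior srgclr observed.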

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
Contributor Author

imotov commented Aug 20, 2015

This should be solved by #11450. Closing.

imotov closed this as completed Aug 20, 2015