Unable to DELETE _snapshot even after a rolling restart #31624
Comments
Pinging @elastic/es-distributed |
@tlrx can you take a look? |
@TimHeckel Do you have more information about the snapshot deletion? Was it a currently running / unfinished snapshot that you tried to delete? How long did it hang before you tried to restart the cluster? |
@tlrx - apologies for my delay in getting back to you here; the snapshot had been running for many days before I attempted deleting it and/or restarting the cluster. Is there anything I can do to force the removal of this aborted snapshot attempt? Thanks |
Hi -- I've since upgraded from 6.3.0 to 6.4.1, but this one hanging snapshot remains. Below are all the responses I've gotten after upgrading. @tlrx - wondering if you could take another look or give me a pointer? I'd prefer not to have to migrate to a whole new cluster, but I simply cannot delete this hanging _snapshot, and that may be my last option.
|
The simplest solution to get rid of the stuck snapshot is to do a full cluster restart, i.e., take all nodes down, and only then start them up again. This will clear the snapshot state, but will of course also mean downtime. The more involved solution consists of the following: Look at the cluster state, and check the snapshot entries that are marked as ABORTED. Check the node id associated with that entry. For example, let's look at:
The node id is |
I believe you may also have to look for shards that are in the INIT state, not just the ABORTED state. Please correct me if I am off on this. |
In the situation outlined by @TimHeckel here, he had already issued a delete snapshot command. This moves the entries from INIT to the ABORTED state (but lets the delete snapshot request hang until the abort is fully completed and confirmed by the nodes). So a prerequisite to the above procedure is to first issue a delete snapshot command. |
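The inspection step described above (find the snapshot shard entries that are ABORTED, or still INIT, and note which node each one is tied to) can be sketched roughly as follows. This is only an illustration, not an official tool: the function name `stuck_shards_by_node`, the node ids `node-A` / `node-B`, the index names, and the hand-written `sample_state` are all made up, and the JSON layout (`snapshots` -> `snapshots` -> `shards` with `state` and `node` fields) is an assumption about how the cluster state renders in-progress snapshots, so check it against the actual output of your own cluster.

```python
from collections import defaultdict


def stuck_shards_by_node(cluster_state, states=("ABORTED", "INIT")):
    """Group in-progress snapshot shards in the given states by node id.

    Assumes the cluster state lists in-progress snapshots under
    cluster_state["snapshots"]["snapshots"], each with a "shards" list.
    """
    by_node = defaultdict(list)
    in_progress = cluster_state.get("snapshots", {}).get("snapshots", [])
    for entry in in_progress:
        for shard in entry.get("shards", []):
            if shard.get("state") not in states:
                continue
            index = shard.get("index")
            # The index may be rendered as an object or as a plain string.
            if isinstance(index, dict):
                index = index.get("index_name")
            by_node[shard.get("node")].append(
                (entry.get("snapshot"), index, shard.get("shard"))
            )
    return dict(by_node)


# Hypothetical, hand-written sample; not taken from a real cluster.
sample_state = {
    "snapshots": {
        "snapshots": [
            {
                "repository": "s3-6-backup",
                "snapshot": "curator-20180623143002",
                "state": "ABORTED",
                "shards": [
                    {"index": {"index_name": "logs-1"}, "shard": 0,
                     "state": "ABORTED", "node": "node-A"},
                    {"index": {"index_name": "logs-1"}, "shard": 1,
                     "state": "SUCCESS", "node": "node-B"},
                    {"index": {"index_name": "logs-2"}, "shard": 0,
                     "state": "INIT", "node": "node-B"},
                ],
            }
        ]
    }
}

for node, shards in stuck_shards_by_node(sample_state).items():
    print(node, shards)
```

To try this against a real cluster, the parsed JSON response of a GET on `_cluster/state` could be passed in place of `sample_state`. Pass `states=("ABORTED",)` to look only at aborted entries, which is what remains once a delete snapshot request has been issued.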
@ywelsch - thank you so much for your help. I did attempt the second scenario, where I shut down the
I think I will try for the full cluster restart tonight. At any rate, you've given me the first actionable advice, so I really appreciate it. |
@ywelsch - just to close this, the FULL restart of the cluster worked. Thanks again. |
Elasticsearch version:
6.3.0
Plugins installed:
s3-repository
x-pack
JVM version
1.8
OS version
Amazon Linux
Description of the problem including expected versus actual behavior:
curl -XDELETE "http://localhost:9200/_snapshot/s3-6-backup/curator-20180623143002"
hangs even after a full restart of the cluster. I've tried turning trace logging on, but it unfortunately hasn't helped. Here is the status information on the backup:
curl -XGET "http://localhost:9200/_snapshot/s3-6-backup/_status?pretty"
Relevant: