Failure to recover shards after the disk was full #12055
Has the problem been solved?
This will be fixed in Elasticsearch 2.0. It is unlikely to make it into the 1.x series since it depends on a large number of changes that are only in 2.0.
When is Elasticsearch 2.0 scheduled for release?
Delete the .recovery file inside the translog folder, e.g. /es/elasticsearch-1.7.1/data/[elasticsearch_clustername]/nodes/0/indices/[indexname]/2/translog/
I also had this kind of error after a partition filled up, and deleting the .recovery files as balaji006 suggested worked fine. I had a lot of affected index/shard directories, but after deleting each .recovery file Elasticsearch worked fine again.
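For reference, the workaround above can be scripted. This is a minimal sketch that demonstrates it on a scratch copy of the 1.x data layout; the cluster, index, and file names below are made up for illustration. On a real node you would stop Elasticsearch first and point `DATA_DIR` at your actual data path.

```shell
# Build a throwaway directory mimicking a 1.x shard translog folder
# (all names below are hypothetical).
DATA_DIR=$(mktemp -d)
CLUSTER=my_cluster
TRANSLOG="$DATA_DIR/$CLUSTER/nodes/0/indices/my_index/2/translog"
mkdir -p "$TRANSLOG"
touch "$TRANSLOG/translog-1425" "$TRANSLOG/translog-1425.recovery"

# Dry run: list the .recovery files that would be removed.
find "$DATA_DIR" -type f -name '*.recovery' -path '*/translog/*'

# The workaround: delete only the .recovery files; the translog
# files themselves are left in place.
find "$DATA_DIR" -type f -name '*.recovery' -path '*/translog/*' -delete
```

Doing the dry-run `find` first is worth the extra step: it confirms you are only matching files under `translog/` before anything is deleted.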
I am running Elasticsearch 2.0, but am still receiving IndexShard recovery failures:

```
[2015-11-23 18:03:32,670][WARN ][cluster.action.shard ] [The Russian] [logstash-2015.10.24][4] received shard failed for [logstash-2015.10.24][4], node[omb9PXHUTXqpKeesvkCbPw], [P], v[742647], s[INITIALIZING], a[id=XUctUOPUQLiHXyK2J9gdlg], unassigned_info[[reason=ALLOCATION_FAILED], at[2015-11-23T18:03:32.486Z], details[failed recovery, failure IndexShardRecoveryException[failed recovery]; nested: IllegalStateException[latest found translog has a lower generation that the excepcted uncommitted 1421133423283 > -1]; ]], indexUUID [jf5m3aXaQLyH9gMhwMBuDQ], message [failed recovery], failure [IndexShardRecoveryException[failed recovery]; nested: IllegalStateException[latest found translog has a lower generation that the excepcted uncommitted 1421133423283 > -1]; ]
```

There has been no disk-full issue since my upgrade to 2.0, so the possibility of the recovery file getting corrupted is very low. Any fixes / workarounds would be very much appreciated. Regards,
Today, the disk got full and Elasticsearch is not able to come back up. Isn't there a built-in system that prevents such failures? I agree that we should be monitoring the disk space and not let this happen in the first place, but sometimes things happen. My setup is a single node at present, and I don't see a clear way to recover it. A post at https://t37.net/how-to-fix-your-elasticsearch-cluster-stuck-in-initializing-shards-mode.html seemed to help, but a few indices still got corrupted and I have no way of recovering them. In the end I ended up deleting the indices, but that's not the way it should be. Such things must ultimately be taken care of.
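For what it's worth, Elasticsearch does ship a disk-threshold allocation decider in the 1.x/2.x series (check your version's docs for the defaults). A minimal elasticsearch.yml sketch, with illustrative values:

```yaml
# Refuse to allocate new shards to a node above the low watermark,
# and relocate shards away from a node above the high watermark.
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 90%
```

Note this only steers shard *allocation* away from nearly-full nodes; it does not cap translog growth on shards already living there, which is why a runaway disk can still corrupt an existing shard's translog.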
Same issue here. Applied the tips at https://t37.net/how-to-fix-your-elasticsearch-cluster-stuck-in-initializing-shards-mode.html but it didn't solve it. This is really, really disappointing!
I'm on 2.0, but going to upgrade to 2.1 |
I think I've experienced the same just after updating from 2.1.0 to 2.2.0 (from the official stable PPA). It's only a few devel indexes, but the recovery seems to have just filled up the disk very quickly with translog "stuff" (after stopping Elasticsearch, growing the disk, and starting Elasticsearch). I'm just going to delete it all, but this shouldn't be difficult to replicate.
@starkers can you please capture the files and logs before deleting, and share them somewhere? These things are typically not easy to reproduce :(
Should this not be reopened? I just had a disk fill up and am now getting
This is on 2.1.1.
@systeminsightsbuild sadly there can be many reasons for this kind of failure. This specific issue is about translog corruption due to a failure to fully write an operation, which is fixed in 2.0. There might be other issues as well. It's hard to tell from the log line you sent, as it's missing the part that says why the shard failed. If you can post that (and feel free to open a new issue), we can see what's going on.
It works!!!

> balaji006 commented on 1 Sep 2015: e.g. /es/elasticsearch-1.7.1/data/[elasticsearch_clustername]/nodes/0/indices/[indexname]/2/translog/

Thanks @balaji006
@simonw This is still there in 2.2.3. @balaji006's workaround fixed the issue, but I think it needs to be addressed.
@ambodi can you open a new issue with the details of what you saw? This can come in many flavors. I'm also curious how you had a
@bleskes here is what I see:
@bleskes we upgraded from 1.5 to 2.2.3 |
@ambodi thx. That exception stack trace refers to a class that was removed in the 2.x series, so the code that generated it is from your 1.5 version. This makes me think something went wrong with your upgrade and that that node is still on 1.5. PS. I take it you mean an upgrade to 2.2.3 (as you wrote before) and not 2.8.
For reference, the original thread with a complete set of instructions for this error is at: And to correct mistakes found above:
For us, after stopping ES, moving these |
@tamsky the link doesn't work, maybe the elasticsearch group was deleted/moved? |
Thanks for pointing out that the group is gone. I'm disappointed the ES team invalidated (and made unsearchable by old URL) all those group links after their bulk import and announcement. I've learned my lesson: at a minimum, quote the thread subject. A bit of spelunking later, I found a citation containing both the thread URL and subject. Here's the migrated thread: I guess the message I had linked to was this
On one of our servers running Elasticsearch, some other process wrote so many logfiles that the disk ran out of space. After deleting these logfiles and rebooting the system, Elasticsearch did not recover.
We are running on a single server, using Elasticsearch 1.5.2
I believe we managed to recover by deleting some of the *.recovering files in the Elasticsearch data directories; however, it would be great if Elasticsearch could recover as much as possible by itself.
Note: This issue seems very similar to #10606 which I have reported before.