Stuck on shard recovery, NPE in _recovery API #6430
I don't know why the shard recovery got stuck here. The NPE you saw in the recovery API is likely to be fixed by #6190.
I'll close this when we've run 1.3.x for a while and not seen any problems with recovery.
Closing - please reopen if you see the problem recur.
I'm running a cluster of 5 nodes. After a normal reboot, recovery of certain shards started as normal. However, one of the replica shards stopped recovering when it had reached a shard size of about 233MB, out of the 1.4GB total shard size on the primary shard.
Here's what I've found out so far:
The _recovery API on any node in the cluster produces a {"error":"NullPointerException[null]","status":500} (verified multiple times). No NPE in a healthy cluster.
Output from _cat/recovery/ on my index:
The segments have not yet been created/registered:
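For anyone who wants to repeat the _recovery check from the first finding above on their own cluster, here is a minimal Python sketch. It only assumes that each node answers HTTP on a base URL you supply (the http://localhost:9200 in the usage note is a placeholder, not taken from this report); the helper names fetch, is_recovery_npe and check_nodes are made up for illustration.

```python
import json
import urllib.error
import urllib.request


def fetch(url, timeout=10):
    """Return (status, body) for a GET request, including HTTP error responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # A 500 response still carries a body; read it instead of raising.
        return err.code, err.read().decode("utf-8", errors="replace")


def is_recovery_npe(status, body):
    """True if the response looks like the error quoted in this issue."""
    if status != 500:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    if not isinstance(payload, dict):
        return False
    return "NullPointerException" in str(payload.get("error", ""))


def check_nodes(base_urls):
    """Query _recovery on each node (hypothetical base URLs) and flag the NPE."""
    return {base: is_recovery_npe(*fetch(base + "/_recovery")) for base in base_urls}
```

Calling check_nodes(["http://localhost:9200"]) returns a dict mapping each base URL to True when that node answers with the NullPointerException error and False otherwise. Connection failures are not handled here and will raise.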
Here is the stack trace from the ElasticSearch process that should be recovering this shard:
https://gist.github.com/magnhaug/11fa5750fe76a6adca4b
Here are the contents from the indices folder:
https://dl.dropboxusercontent.com/u/233260280/unhealthy_shard.tar.gz (problematic shard)
https://dl.dropboxusercontent.com/u/233260280/healthy_shard.tar.gz (primary shard, for reference)
This is how a sample stuck shard looks in HEAD:
This was recovering from the following primary shard: