Red Cluster State: failed to obtain in-memory shard lock and closeShard NPE #23939
The first problem is the null pointer exception.

Many primary shards were lost before the reboot; after the restart there was no problem.

Old GC logs: third, fifth, seventh ...... But why does the first problem lead to the second one?
It's all explained in #23199:
When the cluster forms again after long GC / network disconnects, and the master allocates the shards to the problematic node, the node first has to properly shutdown the existing shard resources. This can take a while, depending on the operations that were previously running. This manifests in the ShardLock failures, as the new shard instance cannot be created while the old one is not fully cleaned up. We are currently investigating how we can provide more insight into which types of operations are preventing the old shard instance from being released, which will require code changes.
It's not lost. What happens is that the shard cannot be allocated as long as the locks of the previous instance have not been released. As this is a duplicate of #23199, I'm closing this one.
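The lock contention described above can be illustrated with a minimal sketch. This is an assumption-laden simplification (the real implementation lives in `NodeEnvironment.InternalShardLock`); here a plain `Semaphore` stands in for the per-shard lock, and the timeout mirrors the `5000ms` seen in the `ShardLockObtainFailedException` below. The class and method names are hypothetical, chosen only for illustration.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Sketch only: a per-shard in-memory lock. The old shard instance holds it
// until its cleanup finishes; a re-allocated instance must wait for release.
public class ShardLockSketch {
    static final Semaphore shardLock = new Semaphore(1); // one permit per shard

    // Mimics the timed lock acquisition that fails in the logs below.
    static boolean tryObtainShardLock(long timeoutMs) throws InterruptedException {
        return shardLock.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        shardLock.acquire(); // old shard instance still holds the lock (cleanup pending)

        // Master re-allocates the shard to this node; the new instance
        // cannot be created while the old one holds the lock.
        boolean obtained = tryObtainShardLock(100);
        System.out.println(obtained ? "lock obtained" : "obtaining shard lock timed out");

        shardLock.release(); // old instance finally releases after cleanup
        obtained = tryObtainShardLock(100);
        System.out.println(obtained ? "lock obtained" : "obtaining shard lock timed out");
    }
}
```

Run as a standalone class; the first attempt times out while the stale instance holds the lock, and the second succeeds once it is released, which is the sequence the closed issue #23199 describes.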
Elasticsearch version:
5.0.0
Plugins installed: [null]
JVM version:
"1.8.0_72"
OS version:
Linux version 2.6.32-431.el6.x86_64 (mockbuild@c6b8.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Fri Nov 22 03:15:09 UTC 2013
Description of the problem including expected versus actual behavior:
master node disconnected, rejoin, failed to obtain in-memory shard lock and closeShard NPE
Steps to reproduce:
1. Had a 2-node cluster, 114 indices, 449322260 docs, indices size: 160G
2. Things were working fine for a while; then the master node left, disconnected, and rejoined, and the exceptions occurred
Provide logs (if relevant):
There is a null pointer exception; so far, I haven't found out why. This problem also appeared after the disconnect, re-election, and rejoin of the cluster. My log is like this:
[2017-04-03T10:04:48,503][WARN ][o.e.i.IndexService ] [mogu015052] [es_xp_item_mgj] [2] failed to close store on shard removal (reason: [initialization failed])
java.lang.NullPointerException
at org.elasticsearch.index.IndexService.closeShard(IndexService.java:409) [elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.index.IndexService.createShard(IndexService.java:361) [elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:449) [elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:137) [elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:534) [elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:511) [elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:200) [elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.cluster.service.ClusterService.runTasksForExecutor(ClusterService.java:708) [elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.cluster.service.ClusterService$UpdateTask.run(ClusterService.java:894) [elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:444) [elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:237) [elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:200) [elasticsearch-5.0.0.jar:5.0.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_72]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_72]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_72]
[2017-04-03T10:04:48,504][WARN ][o.e.i.c.IndicesClusterStateService] [mogu015052] [[es_xp_item_mgj][2]] marking and sending shard failed due to [failed to create shard]
java.io.IOException: failed to obtain in-memory shard lock
at org.elasticsearch.index.IndexService.createShard(IndexService.java:355) ~[elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:449) ~[elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:137) ~[elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:534) [elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:511) [elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:200) [elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.cluster.service.ClusterService.runTasksForExecutor(ClusterService.java:708) [elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.cluster.service.ClusterService$UpdateTask.run(ClusterService.java:894) [elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:444) [elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:237) [elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:200) [elasticsearch-5.0.0.jar:5.0.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_72]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_72]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_72]
Caused by: org.elasticsearch.env.ShardLockObtainFailedException: [es_xp_item_mgj][2]: obtaining shard lock timed out after 5000ms
at org.elasticsearch.env.NodeEnvironment$InternalShardLock.acquire(NodeEnvironment.java:711) ~[elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.env.NodeEnvironment.shardLock(NodeEnvironment.java:630) ~[elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.index.IndexService.createShard(IndexService.java:285) ~[elasticsearch-5.0.0.jar:5.0.0]
... 13 more