
Node lock is occasionally being lost #27846

Closed

617m4rc opened this issue Dec 15, 2017 · 4 comments
Labels: :Core/Infra/Core, feedback_needed

617m4rc commented Dec 15, 2017

Elasticsearch version (bin/elasticsearch --version):

Version: 5.6.5, Build: 6a37571/2017-12-04T07:50:10.466Z, JVM: 1.8.0_144

Plugins installed:

[]

JVM version (java -version):

java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)

OS version (uname -a if on a Unix-like system):

OS Name: Microsoft Windows Server 2008 R2 Enterprise
OS Version: 6.1.7601 Service Pack 1 Build 7601

Description of the problem including expected versus actual behavior:

Elasticsearch 5.6.5 occasionally reports a failed lock assertion during startup, or shortly after startup has finished. We have already ruled out the possibility of another application tampering with the lock file. We did not observe this behavior with Elasticsearch 2.x.

Steps to reproduce:

Start and stop the application repeatedly and at frequent intervals.
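
For illustration, a minimal start/stop loop along these lines triggers the failure for us. Everything in the sketch is an assumption rather than part of the original report: the install path, the 30-second window, and the loop count; also note that on Windows destroy() stops the batch wrapper, which may not take the child JVM down immediately.

import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical reproduction loop (paths and timings are assumptions):
// start the node, give it time to begin shard recovery, then stop it,
// and repeat until the lock assertion fires.
public class StartStopLoop {
    public static void main(String[] args) throws Exception {
        Path esHome = Paths.get("D:\\APATH");  // assumed Elasticsearch home
        for (int i = 0; i < 50; i++) {
            Process es = new ProcessBuilder(
                    esHome.resolve("bin").resolve("elasticsearch.bat").toString())
                    .inheritIO()
                    .start();
            Thread.sleep(30_000);  // let startup/recovery get underway
            es.destroy();          // ask the wrapper to shut down
            es.waitFor();          // block until it exits before restarting
        }
    }
}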

Provide logs (if relevant):

[20:06:14,200][INFO ][o.e.n.Node               ] [ANAME] started
[20:06:14,200][DEBUG][o.e.i.s.IndexShard       ] [ANAME] [anindex][4] state: [CREATED]
[20:06:14,216][TRACE][o.e.i.s.IndexShard       ] [ANAME] [anindex][4] [anindex][4] writing shard state, reason [initial state with allocation id [[id=Io2ngdY8RlqG9s_LM-zquA]]]
[20:06:14,231][DEBUG][o.e.i.s.IndexShard       ] [ANAME] [anindex][4] state: [CREATED]->[RECOVERING], reason [from store]
[20:06:14,231][DEBUG][o.e.i.c.IndicesClusterStateService] [ANAME] [anindex][3] creating shard
[20:06:14,231][TRACE][o.e.e.NodeEnvironment    ] [ANAME] acquiring node shardlock on [[anindex][3]], timeout [5000]
[20:06:14,231][TRACE][o.e.e.NodeEnvironment    ] [ANAME] successfully acquired shardlock for [[anindex][3]]
[20:06:14,231][DEBUG][o.e.i.s.IndexShard       ] [ANAME] [anindex][4] starting recovery from store ...
[20:06:14,231][DEBUG][i.n.b.AbstractByteBuf    ] -Dio.netty.buffer.bytebuf.checkAccessible: true
[20:06:14,247][DEBUG][i.n.u.ResourceLeakDetectorFactory] Loaded default ResourceLeakDetector: io.netty.util.ResourceLeakDetector@33678879
[20:06:14,231][WARN ][o.e.e.NodeEnvironment    ] [ANAME] lock assertion failed
java.io.IOException: The handle is invalid
            at sun.nio.ch.FileDispatcherImpl.size0(Native Method) ~[?:?]
            at sun.nio.ch.FileDispatcherImpl.size(FileDispatcherImpl.java:97) ~[?:?]
            at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:310) ~[?:?]
            at org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:170) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
            at org.elasticsearch.env.NodeEnvironment.assertEnvIsLocked(NodeEnvironment.java:941) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.env.NodeEnvironment.availableShardPaths(NodeEnvironment.java:804) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.index.shard.ShardPath.loadShardPath(ShardPath.java:114) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.index.IndexService.createShard(IndexService.java:298) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:499) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:147) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:542) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:519) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:204) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.cluster.service.ClusterService.callClusterStateAppliers(ClusterService.java:814) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.cluster.service.ClusterService.publishAndApplyChanges(ClusterService.java:768) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.cluster.service.ClusterService.runTasks(ClusterService.java:587) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.run(ClusterService.java:263) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:247) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:210) [elasticsearch-5.6.5.jar:5.6.5]
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
            at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
[20:06:14,247][WARN ][o.e.i.IndexService       ] [ANAME] [anindex] [anindex][3] failed to load shard path, trying to remove leftover
[20:06:14,247][TRACE][o.e.e.NodeEnvironment    ] [ANAME] shard lock wait count for [anindex][3] is now [0]
[20:06:14,247][TRACE][o.e.e.NodeEnvironment    ] [ANAME] last shard lock wait decremented, removing lock for [anindex][3]
[20:06:14,247][TRACE][o.e.e.NodeEnvironment    ] [ANAME] released shard lock for [[anindex][3]]
[20:06:14,247][TRACE][o.e.i.IndexService       ] [ANAME] [anindex] [3] store not initialized prior to closing shard, nothing to close
[20:06:14,247][WARN ][o.e.i.c.IndicesClusterStateService] [ANAME] [[anindex][3]] marking and sending shard failed due to [failed to create shard]
java.lang.IllegalStateException: environment is not locked
            at org.elasticsearch.env.NodeEnvironment.assertEnvIsLocked(NodeEnvironment.java:944) ~[elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.env.NodeEnvironment.availableShardPaths(NodeEnvironment.java:804) ~[elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.index.shard.ShardPath.loadShardPath(ShardPath.java:114) ~[elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.index.IndexService.createShard(IndexService.java:298) ~[elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:499) ~[elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:147) ~[elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:542) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:519) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:204) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.cluster.service.ClusterService.callClusterStateAppliers(ClusterService.java:814) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.cluster.service.ClusterService.publishAndApplyChanges(ClusterService.java:768) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.cluster.service.ClusterService.runTasks(ClusterService.java:587) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.run(ClusterService.java:263) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:247) [elasticsearch-5.6.5.jar:5.6.5]
            at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:210) [elasticsearch-5.6.5.jar:5.6.5]
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
            at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
            Suppressed: org.apache.lucene.store.AlreadyClosedException: Unexpected lock file size: 162, (lock=NativeFSLock(path=D:\APATH\data\nodes\0\node.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],creationTime=18:58:30.65886Z))
                        at org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:172) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
                        at org.elasticsearch.env.NodeEnvironment.assertEnvIsLocked(NodeEnvironment.java:941) ~[elasticsearch-5.6.5.jar:5.6.5]
                        at org.elasticsearch.env.NodeEnvironment.availableShardPaths(NodeEnvironment.java:804) ~[elasticsearch-5.6.5.jar:5.6.5]
                        at org.elasticsearch.index.shard.ShardPath.deleteLeftoverShardDirectory(ShardPath.java:153) ~[elasticsearch-5.6.5.jar:5.6.5]
                        at org.elasticsearch.index.IndexService.createShard(IndexService.java:302) ~[elasticsearch-5.6.5.jar:5.6.5]
                        at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:499) ~[elasticsearch-5.6.5.jar:5.6.5]
                        at org.elasticsearch.indices.IndicesService.createShard(IndicesService.java:147) ~[elasticsearch-5.6.5.jar:5.6.5]
                        at org.elasticsearch.indices.cluster.IndicesClusterStateService.createShard(IndicesClusterStateService.java:542) [elasticsearch-5.6.5.jar:5.6.5]
                        at org.elasticsearch.indices.cluster.IndicesClusterStateService.createOrUpdateShards(IndicesClusterStateService.java:519) [elasticsearch-5.6.5.jar:5.6.5]
                        at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:204) [elasticsearch-5.6.5.jar:5.6.5]
                        at org.elasticsearch.cluster.service.ClusterService.callClusterStateAppliers(ClusterService.java:814) [elasticsearch-5.6.5.jar:5.6.5]
                        at org.elasticsearch.cluster.service.ClusterService.publishAndApplyChanges(ClusterService.java:768) [elasticsearch-5.6.5.jar:5.6.5]
                        at org.elasticsearch.cluster.service.ClusterService.runTasks(ClusterService.java:587) [elasticsearch-5.6.5.jar:5.6.5]
                        at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.run(ClusterService.java:263) [elasticsearch-5.6.5.jar:5.6.5]
                        at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-5.6.5.jar:5.6.5]
                        at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-5.6.5.jar:5.6.5]
                        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.6.5.jar:5.6.5]
                        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:247) [elasticsearch-5.6.5.jar:5.6.5]
                        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:210) [elasticsearch-5.6.5.jar:5.6.5]
                        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
                        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
                        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
Caused by: java.io.IOException: The handle is invalid
            at sun.nio.ch.FileDispatcherImpl.size0(Native Method) ~[?:?]
            at sun.nio.ch.FileDispatcherImpl.size(FileDispatcherImpl.java:97) ~[?:?]
            at sun.nio.ch.FileChannelImpl.size(FileChannelImpl.java:310) ~[?:?]
            at org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:170) ~[lucene-core-6.6.1.jar:6.6.1 9aa465a89b64ff2dabe7b4d50c472de32c298683 - varunthacker - 2017-08-29 21:54:39]
            at org.elasticsearch.env.NodeEnvironment.assertEnvIsLocked(NodeEnvironment.java:941) ~[elasticsearch-5.6.5.jar:5.6.5]
            ... 20 more
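
For context on the two failures above: the node lock is held through Lucene's NativeFSLockFactory, and NodeEnvironment.assertEnvIsLocked delegates to the lock's ensureValid() method, whose frames appear in both traces. The following is a simplified paraphrase of that check as it exists in Lucene 6.6 (the version in the traces), not the verbatim Lucene source. The first trace is channel.size() failing at the OS level with "The handle is invalid", and the suppressed AlreadyClosedException ("Unexpected lock file size: 162") is the non-empty-file branch firing during the leftover cleanup attempt.

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;

import org.apache.lucene.store.AlreadyClosedException;

// Simplified paraphrase of the validity check in Lucene 6.6's
// NativeFSLockFactory$NativeFSLock (the frames in the traces above).
// Field names follow the real class, but this is an illustration.
final class NativeFSLockSketch {
    private final FileLock lock;          // the OS-level lock held since startup
    private final FileChannel channel;    // channel the lock was taken on
    private final Path path;              // .../data/nodes/0/node.lock
    private final FileTime creationTime;  // recorded when the lock was acquired

    NativeFSLockSketch(FileLock lock, FileChannel channel, Path path, FileTime creationTime) {
        this.lock = lock;
        this.channel = channel;
        this.path = path;
        this.creationTime = creationTime;
    }

    void ensureValid() throws IOException {
        if (!lock.isValid()) {
            throw new AlreadyClosedException("FileLock invalidated by an external force: " + this);
        }
        // size() touches the underlying handle: if Windows has invalidated it,
        // this is where "java.io.IOException: The handle is invalid" originates.
        long size = channel.size();
        if (size != 0) {
            // The lock file must stay empty; the suppressed exception above
            // shows this branch firing with "Unexpected lock file size: 162".
            throw new AlreadyClosedException("Unexpected lock file size: " + size + ", (lock=" + this + ")");
        }
        // Finally, the file's creation time must not have changed, i.e. the
        // file was not deleted and recreated underneath the running node.
        FileTime ctime = Files.readAttributes(path, BasicFileAttributes.class).creationTime();
        if (!creationTime.equals(ctime)) {
            throw new AlreadyClosedException("Underlying file changed by an external force");
        }
    }
}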
@colings86 (Contributor)

Is your data directory on a local disk or a remote one (i.e. a SAN, NFS, etc.)?

colings86 added the :Core/Infra/Core and feedback_needed labels on Dec 18, 2017
@cbuescher (Member)

Closing this for lack of feedback; please reopen if you can provide an answer to the question raised above.

@rana-prashant-208
Hi, I am facing the same issue. My data directory is on the same machine on which Elasticsearch is running.

617m4rc (Author) commented Aug 2, 2018

In my case the problem was caused by the --silent startup parameter and vanished after removing it. You may want to check that.
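
For anyone else hitting this, one way to narrow down which validity condition is failing is to watch the lock file from a separate process while the node runs. This is a hypothetical probe, not an official tool; the path comes from the suppressed exception earlier in the thread, and the size and creation-time conditions mirror the ensureValid() sketch above.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical external probe: a healthy node.lock stays at size 0 with a
// stable creation time. A growing size (162 bytes in the report above) means
// something is writing into the lock file from outside Elasticsearch.
public class NodeLockProbe {
    public static void main(String[] args) throws IOException, InterruptedException {
        Path lockFile = Paths.get("D:\\APATH\\data\\nodes\\0\\node.lock");  // path from the trace
        while (true) {
            BasicFileAttributes attrs = Files.readAttributes(lockFile, BasicFileAttributes.class);
            System.out.printf("size=%d creationTime=%s%n", attrs.size(), attrs.creationTime());
            Thread.sleep(1_000);  // poll once per second
        }
    }
}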
