Failed node exception due to translog already closed #23099

Closed
matschaffer opened this issue Feb 10, 2017 · 7 comments
Labels: >bug, :Data Management/Stats (Statistics tracking and retrieval APIs), :Distributed/Engine (Anything around managing Lucene and the Translog in an open shard.)

Comments

@matschaffer
Contributor

matschaffer commented Feb 10, 2017

Elasticsearch version: 5.2.0

Plugins installed: found-elasticsearch repository-s3 x-pack (default cloud set)

JVM version: java version "1.8.0_72"
Java(TM) SE Runtime Environment (build 1.8.0_72-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.72-b15, mixed mode)

OS version: Ubuntu 14.04.1 LTS

Description of the problem including expected versus actual behavior:

The resulting behavior is unclear, but I ran into this with the following logs.

Provide logs (if relevant):

[2017-02-10T05:10:35,904][WARN ][org.elasticsearch.action.admin.cluster.node.stats.TransportNodesStatsAction] not accumulating exceptions, excluding exception from response org.elasticsearch.action.FailedNodeException: Failed node [WmfKMkelS7qOP_43OOpkVA]
	at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onFailure(TransportNodesAction.java:247) ~[elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$300(TransportNodesAction.java:160) ~[elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleException(TransportNodesAction.java:219) ~[elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1024) ~[elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1126) ~[elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1104) ~[elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.transport.DelegatingTransportChannel.sendResponse(DelegatingTransportChannel.java:68) ~[elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.transport.RequestHandlerRegistry$TransportChannelWrapper.sendResponse(RequestHandlerRegistry.java:123) ~[elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.onFailure(SecurityServerTransportInterceptor.java:224) ~[?:?]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:39) ~[elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.common.util.concurrent.EsExecutors$1.execute(EsExecutors.java:109) ~[elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.lambda$messageReceived$0(SecurityServerTransportInterceptor.java:289) ~[?:?]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:56) ~[elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.xpack.security.transport.ServerTransportFilter$NodeProfile.lambda$null$2(ServerTransportFilter.java:164) ~[?:?]
	at org.elasticsearch.xpack.security.authz.AuthorizationUtils$AsyncAuthorizer.maybeRun(AuthorizationUtils.java:127) ~[?:?]
	at org.elasticsearch.xpack.security.authz.AuthorizationUtils$AsyncAuthorizer.setRunAsRoles(AuthorizationUtils.java:121) ~[?:?]
	at org.elasticsearch.xpack.security.authz.AuthorizationUtils$AsyncAuthorizer.authorize(AuthorizationUtils.java:109) ~[?:?]
	at org.elasticsearch.xpack.security.transport.ServerTransportFilter$NodeProfile.lambda$inbound$3(ServerTransportFilter.java:166) ~[?:?]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:56) ~[elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$authenticateAsync$0(AuthenticationService.java:182) ~[x-pack-5.2.0.jar:5.2.0]
	at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lambda$lookForExistingAuthentication$2(AuthenticationService.java:201) ~[x-pack-5.2.0.jar:5.2.0]
	at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.lookForExistingAuthentication(AuthenticationService.java:213) [x-pack-5.2.0.jar:5.2.0]
	at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.authenticateAsync(AuthenticationService.java:180) [x-pack-5.2.0.jar:5.2.0]
	at org.elasticsearch.xpack.security.authc.AuthenticationService$Authenticator.access$000(AuthenticationService.java:142) [x-pack-5.2.0.jar:5.2.0]
	at org.elasticsearch.xpack.security.authc.AuthenticationService.authenticate(AuthenticationService.java:114) [x-pack-5.2.0.jar:5.2.0]
	at org.elasticsearch.xpack.security.transport.ServerTransportFilter$NodeProfile.inbound(ServerTransportFilter.java:142) [x-pack-5.2.0.jar:5.2.0]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:296) [x-pack-5.2.0.jar:5.2.0]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) [elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:610) [elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:596) [elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.2.0.jar:5.2.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_72]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_72]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_72]
Caused by: org.elasticsearch.transport.RemoteTransportException: [instance-XX][X.X.X.X:X][cluster:monitor/nodes/stats[n]]
Caused by: org.apache.lucene.store.AlreadyClosedException: translog is already closed
	at org.elasticsearch.index.translog.Translog.ensureOpen(Translog.java:1310) ~[elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.index.translog.Translog.totalOperations(Translog.java:355) ~[elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.index.translog.Translog.totalOperations(Translog.java:340) ~[elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.index.translog.Translog.stats(Translog.java:572) ~[elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.index.shard.IndexShard.translogStats(IndexShard.java:734) ~[elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:213) ~[elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.indices.IndicesService.stats(IndicesService.java:309) ~[elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.node.service.NodeService.stats(NodeService.java:107) ~[elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:77) ~[elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.action.admin.cluster.node.stats.TransportNodesStatsAction.nodeOperation(TransportNodesStatsAction.java:42) ~[elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.action.support.nodes.TransportNodesAction.nodeOperation(TransportNodesAction.java:145) ~[elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:270) ~[elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:266) ~[elasticsearch-5.2.0.jar:5.2.0]
	at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:237) ~[?:?]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-5.2.0.jar:5.2.0]
	... 24 more
@matschaffer changed the title from "Failed node exception due to translog is already closed" to "Failed node exception due to translog already closed" on Feb 10, 2017
@clintongormley added the :Engine, :Data Management/Stats, and >bug labels on Feb 10, 2017
@bleskes
Contributor

bleskes commented Feb 10, 2017

@pickypg I'm assigning this to you as it seems you plan to pick this up. We can debate whether node stats should return errors to the users (rather than log them under WARN), but that is not the cause of this issue. I believe this goes wrong now because we stopped wrapping internal engine exceptions, and that confuses the logic here. I think we should teach that clause about AlreadyClosedException. The shard was just closed concurrently with the stats call, which is not a problem.
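
To illustrate the idea of teaching the stats path about AlreadyClosedException, here is a minimal sketch. It is not the Elasticsearch code: `AlreadyClosedException` below is a local stand-in for the Lucene class, and `FakeShard`, `ShardStatsSketch`, and `collect` are hypothetical names invented for this example. It only shows the pattern of treating a shard that closed during stats collection as expected rather than as a node failure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in for org.apache.lucene.store.AlreadyClosedException (assumption:
// defined locally so the sketch compiles without Lucene on the classpath).
class AlreadyClosedException extends RuntimeException {
    AlreadyClosedException(String message) {
        super(message);
    }
}

// Hypothetical shard that exposes only a translog operation count.
class FakeShard {
    private final String name;
    private final int operations;
    private volatile boolean closed;

    FakeShard(String name, int operations) {
        this.name = name;
        this.operations = operations;
    }

    String name() {
        return name;
    }

    void close() {
        closed = true;
    }

    // Mirrors the failure in the log: reading stats from a closed translog throws.
    int translogOperations() {
        if (closed) {
            throw new AlreadyClosedException("translog is already closed");
        }
        return operations;
    }
}

class ShardStatsSketch {
    // Collects stats for every shard, skipping shards that were closed
    // concurrently instead of failing the whole node-level request.
    static List<String> collect(List<FakeShard> shards) {
        List<String> stats = new ArrayList<>();
        for (FakeShard shard : shards) {
            try {
                stats.add(shard.name() + "=" + shard.translogOperations());
            } catch (AlreadyClosedException e) {
                // Expected when a shard closes during collection: omit it.
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        FakeShard open = new FakeShard("shard-0", 12);
        FakeShard closing = new FakeShard("shard-1", 7);
        closing.close(); // simulate the shard closing before stats are read
        System.out.println(collect(Arrays.asList(open, closing)));
    }
}
```

In this sketch only the closed-shard exception is swallowed; any other exception still propagates, which matches the point that the concurrent close itself is not a problem.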

@bleskes bleskes assigned pickypg and unassigned bleskes Feb 10, 2017
@pickypg
Member

pickypg commented Feb 10, 2017

@bleskes I totally agree that this is a fake failure, but I do wonder about the value of ever throwing away exceptions from a TransportNodesAction.

In addition to making the appropriate fix here, I wonder if a secondary fix would be to remove the accumulateExceptions method on it?

@bleskes
Contributor

bleskes commented Feb 12, 2017

> In addition to making the appropriate fix here, I wonder if a secondary fix would be to remove the accumulateExceptions method on it?

I tend to agree - we should report what happened to the user. It will put an extra burden on finding the right exceptions to ignore, but I think it's the right tradeoff. IMO it should be a separate change.
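
As a rough illustration of the tradeoff discussed here, the sketch below always keeps per-node failures alongside the successful results, so the caller can see what happened instead of having the exception dropped and logged. All names (`NodeFailure`, `NodesResult`, `AccumulateFailuresSketch`, `gather`) are hypothetical and are not Elasticsearch classes; this is an assumption about the general shape of such a change, not the actual implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical record of one node's failure (not an Elasticsearch class).
class NodeFailure {
    final String nodeId;
    final Exception cause;

    NodeFailure(String nodeId, Exception cause) {
        this.nodeId = nodeId;
        this.cause = cause;
    }
}

// Hypothetical aggregated response: successes and failures side by side.
class NodesResult {
    final List<String> nodeStats = new ArrayList<>();
    final List<NodeFailure> failures = new ArrayList<>();
}

class AccumulateFailuresSketch {
    // Runs each per-node call and records failures instead of discarding them.
    static NodesResult gather(Map<String, Callable<String>> callsByNode) {
        NodesResult result = new NodesResult();
        for (Map.Entry<String, Callable<String>> entry : callsByNode.entrySet()) {
            try {
                result.nodeStats.add(entry.getValue().call());
            } catch (Exception e) {
                // Keep the failure so it can be reported to the user.
                result.failures.add(new NodeFailure(entry.getKey(), e));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Callable<String>> calls = new LinkedHashMap<>();
        calls.put("node-a", () -> "node-a: ok");
        calls.put("node-b", () -> {
            throw new IllegalStateException("translog is already closed");
        });

        NodesResult result = gather(calls);
        System.out.println(result.nodeStats.size() + " succeeded, "
                + result.failures.size() + " failed");
        for (NodeFailure failure : result.failures) {
            System.out.println(failure.nodeId + " -> " + failure.cause.getMessage());
        }
    }
}
```

Reporting every failure this way is what creates the extra burden mentioned above: benign exceptions, such as a shard closing mid-request, have to be filtered out before they reach this list.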

@pickypg
Member

pickypg commented Feb 12, 2017

Agree it should be a separate change.

@pickypg
Member

pickypg commented Jun 2, 2017

Going to fix this by:

@mats16

mats16 commented Jun 16, 2017

This is also occurring with docker.elastic.co/elasticsearch/elasticsearch:5.4.1

@pickypg
Member

pickypg commented Jun 28, 2017

Both PRs were merged and backported to the respective branches. Thanks!

@pickypg pickypg closed this as completed Jun 28, 2017
@clintongormley added the :Distributed/Distributed and :Distributed/Engine labels and removed the :Engine and :Distributed/Distributed labels on Feb 13, 2018
5 participants