NullPointerException during discovery #3515

Closed
rdeaton opened this issue Aug 15, 2013 · 1 comment

rdeaton commented Aug 15, 2013

Here are some logs from discovery on a brand-new cluster today. zen.multicast is disabled, and each node is given a list of the other nodes for unicast.

[2013-08-15 02:41:17,744][INFO ][node ] [qphosphorus2] stopping ...
[2013-08-15 02:41:18,237][INFO ][node ] [qphosphorus2] stopped
[2013-08-15 02:41:18,237][INFO ][node ] [qphosphorus2] closing ...
[2013-08-15 02:41:18,273][INFO ][node ] [qphosphorus2] closed
[2013-08-15 02:57:51,994][INFO ][node ] [qphosphorus2] version[0.90.3], pid[72565], build[5c38d60/2013-08-06T13:18:31Z]
[2013-08-15 02:57:51,995][INFO ][node ] [qphosphorus2] initializing ...
[2013-08-15 02:57:52,004][INFO ][plugins ] [qphosphorus2] loaded [], sites [head]
[2013-08-15 02:57:54,627][INFO ][node ] [qphosphorus2] initialized
[2013-08-15 02:57:54,627][INFO ][node ] [qphosphorus2] starting ...
[2013-08-15 02:57:54,856][INFO ][transport ] [qphosphorus2] bound_address {inet[/192.168.72.120:9300]}, publish_address {inet[/192.168.72.120:9300]}
[2013-08-15 02:58:24,865][WARN ][discovery ] [qphosphorus2] waited for 30s and no initial state was set by the discovery
[2013-08-15 02:58:24,866][INFO ][discovery ] [qphosphorus2] QuizletProductionCluster/HstcP8sQRWGi6nwjZU5KHw
[2013-08-15 02:58:24,949][INFO ][http ] [qphosphorus2] bound_address {inet[/192.168.72.120:9200]}, publish_address {inet[/192.168.72.120:9200]}
[2013-08-15 02:58:24,950][INFO ][node ] [qphosphorus2] started
[2013-08-15 02:58:54,909][INFO ][cluster.service ] [qphosphorus2] new_master [qphosphorus2][HstcP8sQRWGi6nwjZU5KHw][inet[/192.168.72.120:9300]]{master=true}, reason: zen-disco-join (elected_as_master)
[2013-08-15 02:59:30,213][INFO ][cluster.service ] [qphosphorus2] added {[qaluminium2][YH9kVKH-Rgyc9tVcxQq_-g][inet[/192.168.72.115:9300]]{master=true},}, reason: zen-disco-receive(join from node[[qaluminium2][YH9kVKH-Rgyc9tVcxQq_-g][inet[/192.168.72.115:9300]]{master=true}])
[2013-08-15 03:00:35,465][INFO ][cluster.service ] [qphosphorus2] added {[qargon2][Mif6T8WDT0Of0Td9vgrf_w][inet[/192.168.72.119:9300]]{master=true},}, reason: zen-disco-receive(join from node[[qargon2][Mif6T8WDT0Of0Td9vgrf_w][inet[/192.168.72.119:9300]]{master=true}])
[2013-08-15 03:00:35,533][DEBUG][action.admin.cluster.node.stats] [qphosphorus2] failed to execute on node [Mif6T8WDT0Of0Td9vgrf_w]
org.elasticsearch.transport.RemoteTransportException: [qargon2][inet[/192.168.72.119:9300]][cluster/nodes/stats/n]
Caused by: java.lang.NullPointerException
    at org.elasticsearch.action.support.nodes.NodeOperationResponse.writeTo(NodeOperationResponse.java:59)
    at org.elasticsearch.action.admin.cluster.node.stats.NodeStats.writeTo(NodeStats.java:215)
    at org.elasticsearch.transport.netty.NettyTransportChannel.sendResponse(NettyTransportChannel.java:83)
    at org.elasticsearch.transport.netty.NettyTransportChannel.sendResponse(NettyTransportChannel.java:62)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:276)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:267)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:269)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
[2013-08-15 03:00:37,546][INFO ][cluster.service ] [qphosphorus2] added {[qchlorine2][PFq_0tNNSC6yvom-RNCzPw][inet[/192.168.72.114:9300]]{master=true},}, reason: zen-disco-receive(join from node[[qchlorine2][PFq_0tNNSC6yvom-RNCzPw][inet[/192.168.72.114:9300]]{master=true}])

@ghost ghost assigned s1monw Aug 15, 2013
s1monw added a commit to s1monw/elasticsearch that referenced this issue Aug 15, 2013
The ClusterState can hold an 'invalid' local 'DiscoveryNode' during
node startup and rare race conditions can cause NPEs if an 'invalid'
'DiscoveryNode' is serialized.

Closes elastic#3515

s1monw commented Aug 15, 2013

Thanks for opening this. This is a very rare race condition that happens during node startup. There is a small chance that the cluster state is still empty when the request arrives, and that can cause this problem. I will fix it by asking the discovery service or cluster service for the local node, which will be initialized correctly, instead of going through the cluster state.
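
For readers following along, here is a minimal sketch of the kind of race described above. It is only an illustration under stated assumptions: `Node`, `State`, `Service`, and both lookup methods are hypothetical, simplified stand-ins and not the actual Elasticsearch classes or the actual patch. It shows why reading the local node from a still-empty state can throw a NullPointerException, while asking a service that has held its local node since construction does not.

```java
import java.util.Objects;

// Hypothetical, simplified stand-ins for illustration only.
// These are NOT the real Elasticsearch classes or the real fix.
public class LocalNodeLookupSketch {

    /** Minimal node identity. */
    static final class Node {
        final String id;
        Node(String id) { this.id = Objects.requireNonNull(id); }
    }

    /** State snapshot; the local node is null until a first state is published. */
    static final class State {
        final Node localNode;
        State(Node localNode) { this.localNode = localNode; }
    }

    /** Service that knows its own node from construction onward. */
    static final class Service {
        private final Node localNode;
        private volatile State state = new State(null); // empty initial state

        Service(Node localNode) { this.localNode = Objects.requireNonNull(localNode); }
        Node localNode() { return localNode; }
        State state() { return state; }
        void publish(State newState) { state = newState; }
    }

    /** Racy lookup: dereferences whatever the current state holds. */
    static String nodeIdViaState(Service service) {
        return service.state().localNode.id; // throws NPE while the state is still empty
    }

    /** Safer lookup: asks the service directly for its local node. */
    static String nodeIdViaService(Service service) {
        return service.localNode().id;
    }

    public static void main(String[] args) {
        Service service = new Service(new Node("node-1"));

        // Works even before any state has been published.
        System.out.println(nodeIdViaService(service));

        // A request that arrives during startup hits the empty state.
        try {
            nodeIdViaState(service);
        } catch (NullPointerException e) {
            System.out.println("NPE while state is empty");
        }

        // Once a state is published, both lookups agree.
        service.publish(new State(service.localNode()));
        System.out.println(nodeIdViaState(service));
    }
}
```

The design point in the sketch is that the service's local node is a final field set in the constructor, so there is no window in which a caller can observe it as null.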

@s1monw s1monw closed this as completed in 27b9738 Aug 15, 2013
s1monw added a commit that referenced this issue Aug 15, 2013
The ClusterState can hold an 'invalid' local 'DiscoveryNode' during
node startup and rare race conditions can cause NPEs if an 'invalid'
'DiscoveryNode' is serialized.

Closes #3515
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015