NullPointerException during discovery #3515

rdeaton · 2013-08-15T03:07:28Z

Here are some logs from the discovery phase of a brand new cluster today. Multicast (zen.multicast) is disabled, and every node is given a list of the other nodes for unicast.

[2013-08-15 02:41:17,744][INFO ][node ] [qphosphorus2] stopping ...
[2013-08-15 02:41:18,237][INFO ][node ] [qphosphorus2] stopped
[2013-08-15 02:41:18,237][INFO ][node ] [qphosphorus2] closing ...
[2013-08-15 02:41:18,273][INFO ][node ] [qphosphorus2] closed
[2013-08-15 02:57:51,994][INFO ][node ] [qphosphorus2] version[0.90.3], pid[72565], build[5c38d60/2013-08-06T13:18:31Z]
[2013-08-15 02:57:51,995][INFO ][node ] [qphosphorus2] initializing ...
[2013-08-15 02:57:52,004][INFO ][plugins ] [qphosphorus2] loaded [], sites [head]
[2013-08-15 02:57:54,627][INFO ][node ] [qphosphorus2] initialized
[2013-08-15 02:57:54,627][INFO ][node ] [qphosphorus2] starting ...
[2013-08-15 02:57:54,856][INFO ][transport ] [qphosphorus2] bound_address {inet[/192.168.72.120:9300]}, publish_address {inet[/192.168.72.120:9300]}
[2013-08-15 02:58:24,865][WARN ][discovery ] [qphosphorus2] waited for 30s and no initial state was set by the discovery
[2013-08-15 02:58:24,866][INFO ][discovery ] [qphosphorus2] QuizletProductionCluster/HstcP8sQRWGi6nwjZU5KHw
[2013-08-15 02:58:24,949][INFO ][http ] [qphosphorus2] bound_address {inet[/192.168.72.120:9200]}, publish_address {inet[/192.168.72.120:9200]}
[2013-08-15 02:58:24,950][INFO ][node ] [qphosphorus2] started
[2013-08-15 02:58:54,909][INFO ][cluster.service ] [qphosphorus2] new_master [qphosphorus2][HstcP8sQRWGi6nwjZU5KHw][inet[/192.168.72.120:9300]]{master=true}, reason: zen-disco-join (elected_as_master)
[2013-08-15 02:59:30,213][INFO ][cluster.service ] [qphosphorus2] added {[qaluminium2][YH9kVKH-Rgyc9tVcxQq_-g][inet[/192.168.72.115:9300]]{master=true},}, reason: zen-disco-receive(join from node[[qaluminium2][YH9kVKH-Rgyc9tVcxQq_-g][inet[/192.168.72.115:9300]]{master=true}])
[2013-08-15 03:00:35,465][INFO ][cluster.service ] [qphosphorus2] added {[qargon2][Mif6T8WDT0Of0Td9vgrf_w][inet[/192.168.72.119:9300]]{master=true},}, reason: zen-disco-receive(join from node[[qargon2][Mif6T8WDT0Of0Td9vgrf_w][inet[/192.168.72.119:9300]]{master=true}])
[2013-08-15 03:00:35,533][DEBUG][action.admin.cluster.node.stats] [qphosphorus2] failed to execute on node [Mif6T8WDT0Of0Td9vgrf_w]
org.elasticsearch.transport.RemoteTransportException: [qargon2][inet[/192.168.72.119:9300]][cluster/nodes/stats/n]
Caused by: java.lang.NullPointerException
at org.elasticsearch.action.support.nodes.NodeOperationResponse.writeTo(NodeOperationResponse.java:59)
at org.elasticsearch.action.admin.cluster.node.stats.NodeStats.writeTo(NodeStats.java:215)
at org.elasticsearch.transport.netty.NettyTransportChannel.sendResponse(NettyTransportChannel.java:83)
at org.elasticsearch.transport.netty.NettyTransportChannel.sendResponse(NettyTransportChannel.java:62)
at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:276)
at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:267)
at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:269)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
[2013-08-15 03:00:37,546][INFO ][cluster.service ] [qphosphorus2] added {[qchlorine2][PFq_0tNNSC6yvom-RNCzPw][inet[/192.168.72.114:9300]]{master=true},}, reason: zen-disco-receive(join from node[[qchlorine2][PFq_0tNNSC6yvom-RNCzPw][inet[/192.168.72.114:9300]]{master=true}])

The ClusterState can hold an 'invalid' local 'DiscoveryNode' during node startup and rare race conditions can cause NPEs if an 'invalid' 'DiscoveryNode' is serialized. Closes elastic#3515

s1monw · 2013-08-15T09:44:01Z

Thanks for opening this. This is a very rare race condition that happens during node startup: there is a small chance that the cluster state is still empty, which can cause this problem. I will fix it by asking the discovery service or cluster service for the local node, which will be initialized correctly, instead of going through the cluster state.
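To illustrate the race described above, here is a minimal sketch. It assumes the "empty" cluster state can be modelled as one whose local node is still null; the classes and methods below (`LocalNodeLookupSketch`, `writeNodeId`, `respondViaClusterState`, `respondViaDiscovery`) are hypothetical stand-ins and not the actual Elasticsearch code or the actual fix.

```java
import java.util.Objects;

/** Hypothetical, simplified stand-ins; these are NOT the real Elasticsearch classes. */
final class LocalNodeLookupSketch {

    /** Simplified node identity. */
    static final class DiscoveryNode {
        final String id;

        DiscoveryNode(String id) {
            this.id = Objects.requireNonNull(id);
        }
    }

    /** Assumption: during startup the cluster state may not yet know the local node. */
    static final class ClusterState {
        private final DiscoveryNode localNode; // may be null before the first state is set

        ClusterState(DiscoveryNode localNode) {
            this.localNode = localNode;
        }

        DiscoveryNode localNode() {
            return localNode;
        }
    }

    /** Assumption: the discovery service knows the local node from the moment it is created. */
    static final class DiscoveryService {
        private final DiscoveryNode localNode;

        DiscoveryService(DiscoveryNode localNode) {
            this.localNode = Objects.requireNonNull(localNode);
        }

        DiscoveryNode localNode() {
            return localNode;
        }
    }

    /** Stand-in for serializing a response; throws NullPointerException if node is null. */
    static String writeNodeId(DiscoveryNode node) {
        return "node=" + node.id;
    }

    /** Racy variant: reads the local node from a cluster state that may still be empty. */
    static String respondViaClusterState(ClusterState state) {
        return writeNodeId(state.localNode());
    }

    /** Safer variant: asks the discovery service, which always has the local node. */
    static String respondViaDiscovery(DiscoveryService discovery) {
        return writeNodeId(discovery.localNode());
    }
}
```

In this sketch, a stats request that arrives before the first cluster state is set fails in `respondViaClusterState`, while `respondViaDiscovery` succeeds because its local node is fixed at construction time.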

ghost assigned s1monw Aug 15, 2013

s1monw closed this as completed in 27b9738 Aug 15, 2013

andrewclegg mentioned this issue Nov 1, 2013

On full cluster restart, node fails to detect master, then hangs for 10+ minutes #4041

Closed
