Error during unicast discovery between 1.3.4 and 1.4.0.Beta1 #8051

nkvoll · 2014-10-10T10:25:32Z

I ran into this issue where a freshly 1.4.0.Beta.1 instance would not be able to ping (or receive pings?) from a 1.3.4-instance, and found the following in the 1.4 instance log:

[2014-10-09 09:43:15,759][TRACE][discovery.zen.ping.unicast] [foo-instance] failed to ping [#zfd0#][ec2-host-1.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.0.3.4:9300]], falling back to <=1.4.0 ping action
[2014-10-09 09:43:15,760][TRACE][discovery.zen.ping.unicast] [foo-instance] [1] sending to [#zen_unicast_1_#zfd0##][ec2-host-1.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.0.3.4:9300]]
[2014-10-09 09:43:15,767][WARN ][discovery.zen.ping.unicast] [foo-instance] failed to send ping to [[#zfd0#][ec2-host-1.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.0.3.4:9300]]]
org.elasticsearch.transport.RemoteTransportException: [bar-instance][inet[/10.0.3.4:9300]][internal:discovery/zen/unicast]
Caused by: org.elasticsearch.transport.ActionNotFoundTransportException: No handler for action [internal:discovery/zen/unicast]
        at org.elasticsearch.transport.netty.MessageChannelHandler.handleRequest(MessageChannelHandler.java:210)
        at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:111)
        at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
        at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
        at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
        at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
        at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
        at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
        at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
        at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
        at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Is there something obvious I have missed/done wrong? The action name seems pretty odd.

The text was updated successfully, but these errors were encountered:

nkvoll · 2014-10-10T10:30:48Z

I should add that the 1.4 instance discovers the 1.3.4-node using a separate UnicastHostsProvider, which creates a discovery node similar this:

new DiscoveryNode("#zfd"+discoveryNodeCounter.getAndIncrement()+"#", address, Version.CURRENT);

The part where it uses Version.CURRENT might not be technically correct in this case, since it's actually connecting to an older version, but it's not obvious (to me, at least) what other way there is if the version of the destination node isn't known at that point (i.e before connecting to it).

nkvoll · 2014-10-10T10:35:45Z

I'm also unsure of whether I'm reading this commit message correctly: d2fea53

.. should I take it to mean that I have to specify the lower bound during the DiscoveryNode instantiation, or should it really be handled in a backwards-compatible way by Elasticsearch itself? I initially understood it as the latter, but this doesn't seem to be the case.

s1monw · 2014-10-10T10:43:46Z

are you actual having problems with your cluster of is it only that you see this errors showing up? I think this error comes from the backwards layer we have in 1.4 that can't tell what version it's pinging on the first contact so it goes and tries the new (1.4 and above) and the old endpoint (1.3.4 and below). If the new endpoint is causing an error it falls back to the old for backwards compatibility. It might be just this initial probe you are seeing?

nkvoll · 2014-10-10T10:53:18Z

The end result here was that the 1.4-instance did not join the cluster, but instead elected itself as master after what looked like three retries:

[2014-10-09 09:43:18,718][TRACE][discovery.zen.ping.unicast] [foo-instance] [1] disconnecting from [#zen_unicast_1_#zen_found_discovery_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.0.3.4:9300]]
[2014-10-09 09:43:18,722][TRACE][discovery.zen.ping.unicast] [foo-instance] [1] disconnecting from [#zen_unicast_3_#zen_found_discovery_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.0.3.4:9300]]
[2014-10-09 09:43:18,723][TRACE][discovery.zen.ping.unicast] [foo-instance] [1] disconnecting from [#zen_unicast_2_#zen_found_discovery_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.0.3.4:9300]]
[2014-10-09 09:43:18,725][TRACE][discovery.zen            ] [foo-instance] full ping responses: {none}
[2014-10-09 09:43:18,725][DEBUG][discovery.zen            ] [foo-instance] filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-10-09 09:43:18,732][INFO ][cluster.service          ] [foo-instance] new_master [foo-instance][nacDp91kRDCmsh04NoJBWA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.0.4.4:9300]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}, reason: zen-disco-join (elected_as_master)
[2014-10-09 09:43:18,734][INFO ][plugin.found.zookeeper.updaters] [foo-instance] Master node changed. Current masterNode: [[foo-instance][nacDp91kRDCmsh04NoJBWA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.0.4.4:9300]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}]
[2014-10-09 09:43:18,740][TRACE][discovery.zen            ] [foo-instance] cluster joins counter set to [1] (elected as master)
[2014-10-09 09:43:18,755][INFO ][http                     ] [foo-instance] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[ec2-host-2.compute.amazonaws.com/10.0.4.4:9200]}

bleskes · 2014-10-10T11:16:05Z

@nkvoll this is not intended. The backwards compatibility check that @s1monw mentioned does exist but it doesn't manifest itself in warn level logs.

Can you tell me some more about the cluster layout? I see aws related logs. I also see the zoo keeper plugin?

nkvoll · 2014-10-10T12:17:00Z

@bleskes what I'm attempting is a simple cluster upgrade, adding a 1.4.0.Beta1 instance to a cluster consisting of a single instance running 1.3.4. They discover each-other via a custom UnicastHostProvider, which provides the host/port of the other instances in the cluster.

The 1.3.4 instance only has the following to say (discovery is logged down to the trace level):

[2014-10-10 12:03:14,671][WARN ][transport.netty          ] [instance-0000000000] Message not fully read (request) for [0] and action [internal:discovery/zen/unicast_gte_1_4], resetting
[2014-10-10 12:03:14,703][WARN ][transport.netty          ] [instance-0000000000] Message not fully read (request) for [2] and action [internal:discovery/zen/unicast], resetting
[2014-10-10 12:03:16,124][WARN ][transport.netty          ] [instance-0000000000] Message not fully read (request) for [4] and action [internal:discovery/zen/unicast_gte_1_4], resetting
[2014-10-10 12:03:16,132][WARN ][transport.netty          ] [instance-0000000000] Message not fully read (request) for [5] and action [internal:discovery/zen/unicast], resetting
[2014-10-10 12:03:17,619][WARN ][transport.netty          ] [instance-0000000000] Message not fully read (request) for [7] and action [internal:discovery/zen/unicast_gte_1_4], resetting
[2014-10-10 12:03:17,626][WARN ][transport.netty          ] [instance-0000000000] Message not fully read (request) for [8] and action [internal:discovery/zen/unicast], resetting

While the 1.4.0.Beta1-instance logs the following three times before forming its own cluster (which is permitted since minimum_master_nodes: 1 for both instances -- but shouldn't really be happening)

[2014-10-10 12:03:14,601][TRACE][discovery.zen            ] [instance-0000000001] starting to ping
[2014-10-10 12:03:14,604][TRACE][discovery.zen.ping.unicast] [instance-0000000001] replacing [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.74.20.146:19775]] with temp node [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.74.20.146:19775]]
[2014-10-10 12:03:14,606][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connecting (light) to [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.74.20.146:19775]]
[2014-10-10 12:03:14,606][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connecting to [instance-0000000001][lEeCD6d4QjyETwJLV3ao5Q][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.82.2.158:19182]]{region=eu-west-1, availability_zone=eu-west-1a, max_local_storage_nodes=1, logical_availability_zone=zone-0}
[2014-10-10 12:03:14,624][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connected to [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.74.20.146:19775]]
[2014-10-10 12:03:14,625][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.74.20.146:19775]], using >=1.4.0 serialization
[2014-10-10 12:03:14,648][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connected to [instance-0000000001][lEeCD6d4QjyETwJLV3ao5Q][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.82.2.158:19182]]{region=eu-west-1, availability_zone=eu-west-1a, max_local_storage_nodes=1, logical_availability_zone=zone-0}
[2014-10-10 12:03:14,649][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [instance-0000000001][lEeCD6d4QjyETwJLV3ao5Q][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.82.2.158:19182]]{region=eu-west-1, availability_zone=eu-west-1a, max_local_storage_nodes=1, logical_availability_zone=zone-0}
[2014-10-10 12:03:14,658][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] received response from [instance-0000000001][lEeCD6d4QjyETwJLV3ao5Q][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.82.2.158:19182]]{region=eu-west-1, availability_zone=eu-west-1a, max_local_storage_nodes=1, logical_availability_zone=zone-0}: [ping_response{node [[instance-0000000001][lEeCD6d4QjyETwJLV3ao5Q][ec2-host-2.compute.amazonaws.com][inet[/10.82.2.158:19182]]{region=eu-west-1, availability_zone=eu-west-1a, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[1], master [null], hasJoinedOnce [false], cluster_name[538289781bfa9cb13184890f9f23584b]}, ping_response{node [[instance-0000000001][lEeCD6d4QjyETwJLV3ao5Q][ec2-host-2.compute.amazonaws.com][inet[/10.82.2.158:19182]]{region=eu-west-1, availability_zone=eu-west-1a, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[2], master [null], hasJoinedOnce [false], cluster_name[538289781bfa9cb13184890f9f23584b]}]
[2014-10-10 12:03:14,700][TRACE][discovery.zen.ping.unicast] [instance-0000000001] failed to ping [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.74.20.146:19775]], falling back to <=1.4.0 ping action
[2014-10-10 12:03:14,700][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.74.20.146:19775]]
[2014-10-10 12:03:14,707][WARN ][discovery.zen.ping.unicast] [instance-0000000001] failed to send ping to [[#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.74.20.146:19775]]]
org.elasticsearch.transport.RemoteTransportException: [instance-0000000000][inet[/10.74.20.146:19775]][internal:discovery/zen/unicast]
Caused by: org.elasticsearch.transport.ActionNotFoundTransportException: No handler for action [internal:discovery/zen/unicast]
    at org.elasticsearch.transport.netty.MessageChannelHandler.handleRequest(MessageChannelHandler.java:210)
    at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:111)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

The zookeeper plugin handles the UnicastHostProvider-part, but does not currently include any versioning information, which is why it uses Version.CURRENT, which I immediately thought could be the culprit -- but if I understand it correctly, it shouldn't actually matter.

nkvoll · 2014-10-10T12:32:11Z

If I simply change the DiscoveryNode instantiation in our code from using Version.CURRENT to using Version.V_1_3_0 it works, with the following in the 1.3.4 instance logs:

[2014-10-10 12:23:35,247][WARN ][transport.netty          ] [instance-0000000000] Message not fully read (request) for [0] and action [internal:discovery/zen/unicast_gte_1_4], resetting
[2014-10-10 12:23:36,679][WARN ][transport.netty          ] [instance-0000000000] Message not fully read (request) for [4] and action [internal:discovery/zen/unicast_gte_1_4], resetting
[2014-10-10 12:23:38,186][WARN ][transport.netty          ] [instance-0000000000] Message not fully read (request) for [7] and action [internal:discovery/zen/unicast_gte_1_4], resetting
[2014-10-10 12:23:38,243][INFO ][cluster.service          ] [instance-0000000000] added {[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0},}, reason: zen-disco-receive(join from node[[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}])

and the following from the 1.4.0.Beta1-instance:

[2014-10-10 12:23:35,152][TRACE][discovery.zen            ] [instance-0000000001] starting to ping
[2014-10-10 12:23:35,155][TRACE][discovery.zen.ping.unicast] [instance-0000000001] replacing [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]] with temp node [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:35,157][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connecting (light) to [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:35,168][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connecting to [instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}
[2014-10-10 12:23:35,199][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connected to [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:35,200][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]], using >=1.4.0 serialization
[2014-10-10 12:23:35,220][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connected to [instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}
[2014-10-10 12:23:35,221][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}
[2014-10-10 12:23:35,232][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] received response from [instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}: [ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[1], master [null], hasJoinedOnce [false], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[2], master [null], hasJoinedOnce [false], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}]
[2014-10-10 12:23:35,272][TRACE][discovery.zen.ping.unicast] [instance-0000000001] failed to ping [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]], falling back to <=1.4.0 ping action
[2014-10-10 12:23:35,274][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:35,281][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] received response from [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]: [ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [null], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}]
[2014-10-10 12:23:36,669][TRACE][discovery.zen.ping.unicast] [instance-0000000001] replacing [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]] with temp node [#zen_unicast_2_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:36,670][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connecting (light) to [#zen_unicast_2_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:36,670][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}
[2014-10-10 12:23:36,673][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connected to [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:36,674][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [#zen_unicast_2_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]], using >=1.4.0 serialization
[2014-10-10 12:23:36,676][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] received response from [instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}: [ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[1], master [null], hasJoinedOnce [false], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[3], master [null], hasJoinedOnce [false], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[4], master [null], hasJoinedOnce [false], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}]
[2014-10-10 12:23:36,681][TRACE][discovery.zen.ping.unicast] [instance-0000000001] failed to ping [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]], falling back to <=1.4.0 ping action
[2014-10-10 12:23:36,682][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [#zen_unicast_2_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:36,685][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] received response from [#zen_unicast_2_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]: [ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [null], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [null], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}]
[2014-10-10 12:23:38,174][TRACE][discovery.zen.ping.unicast] [instance-0000000001] replacing [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]] with temp node [#zen_unicast_3_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:38,176][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connecting (light) to [#zen_unicast_3_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:38,176][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}
[2014-10-10 12:23:38,179][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connected to [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:38,180][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [#zen_unicast_3_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]], using >=1.4.0 serialization
[2014-10-10 12:23:38,183][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] received response from [instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}: [ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[1], master [null], hasJoinedOnce [false], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[3], master [null], hasJoinedOnce [false], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[5], master [null], hasJoinedOnce [false], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[6], master [null], hasJoinedOnce [false], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}]
[2014-10-10 12:23:38,187][TRACE][discovery.zen.ping.unicast] [instance-0000000001] failed to ping [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]], falling back to <=1.4.0 ping action
[2014-10-10 12:23:38,188][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [#zen_unicast_3_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:38,192][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] received response from [#zen_unicast_3_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]: [ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [null], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [null], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [null], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}]
[2014-10-10 12:23:38,193][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] disconnecting from [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:38,196][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] disconnecting from [#zen_unicast_3_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:38,197][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] disconnecting from [#zen_unicast_2_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:38,199][TRACE][discovery.zen            ] [instance-0000000001] full ping responses:
    --> ping_response{node [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}
[2014-10-10 12:23:38,200][DEBUG][discovery.zen            ] [instance-0000000001] filtered ping responses: (filter_client[true], filter_data[false])
    --> ping_response{node [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}
[2014-10-10 12:23:38,201][TRACE][discovery.zen            ] [instance-0000000001] ignoring joined once flags in ping responses, minimum ping version [1.3.4]
[2014-10-10 12:23:38,208][TRACE][discovery.zen            ] [instance-0000000001] joining master [instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}
[2014-10-10 12:23:38,248][DEBUG][discovery.zen.publish    ] [instance-0000000001] received cluster state version 10
[2014-10-10 12:23:38,250][DEBUG][discovery.zen.fd         ] [instance-0000000001] [master] restarting fault detection against master [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], reason [new cluster state received and we are monitoring the wrong master [null]]
[2014-10-10 12:23:38,251][DEBUG][discovery.zen            ] [instance-0000000001] got first state from fresh master [o5ulTkS4TV-BDs_NbkReqQ]
[2014-10-10 12:23:38,252][TRACE][discovery.zen            ] [instance-0000000001] updated cluster join cluster to [1]
[2014-10-10 12:23:38,254][INFO ][cluster.service          ] [instance-0000000001] detected_master [instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}, added {[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0},}, reason: zen-disco-receive(from master [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}])

The notable difference being that there were no exceptions logged when instantiating the DiscoveryNode using a lower-bound Version, and that the 1.4 instance was able to join the 1.3 cluster.

Does this possibly mean that there might be an issue with providing DiscoveryNodes using a UnicastHostsProvider when the target instance version is actually unknown?

s1monw · 2014-10-10T19:44:54Z

oh I see - sorry for the confusion about the BackwardsCompatibility endpoint. @nkvoll for the case when you don't know the discovery nodes version you can just use Version. CURRENT.minimumCompatibilityVersion() which returns the minimal compatible version. This should work just fine in that case.

nkvoll · 2014-10-10T23:03:50Z

Ok @s1monw, I'll verify that it works for us either this weekend, or at latest, on Monday. Since we have a workaround already, I'm fine with closing this issue now if you wish to do so, and I leave it entirely up to you whether this behaviour needs to be noted somewhere else as a help for other developers that have latched onto the UnicastHostsProvider-interface and are instantiating DiscoveryNodes the same way we did.

Thanks for the feedback and for letting me know about the minimumCompatibilityVersion method :)

s1monw · 2014-10-11T06:48:32Z

I will add some javadocs to the ctors to clarify! Thanks for raising this issue

Today it's easy to use the wrong version when initializing DiscoveryNode instances. This commit adds javadocs and a utility constant to initialize DiscoveryNode instances if the the remotes node version is unknown. Closes elastic#8051

Today it's easy to use the wrong version when initializing DiscoveryNode instances. This commit adds javadocs and a utility constant to initialize DiscoveryNode instances if the the remotes node version is unknown. Closes #8051

Today it's easy to use the wrong version when initializing DiscoveryNode instances. This commit adds javadocs and a utility constant to initialize DiscoveryNode instances if the the remotes node version is unknown. Closes elastic#8051

s1monw mentioned this issue Oct 11, 2014

Simplify discovery node initialization if version is unknown #8055

Merged

s1monw closed this as completed in #8055 Oct 13, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error during unicast discovery between 1.3.4 and 1.4.0.Beta1 #8051

Error during unicast discovery between 1.3.4 and 1.4.0.Beta1 #8051

nkvoll commented Oct 10, 2014

nkvoll commented Oct 10, 2014

nkvoll commented Oct 10, 2014

s1monw commented Oct 10, 2014

nkvoll commented Oct 10, 2014

bleskes commented Oct 10, 2014

nkvoll commented Oct 10, 2014

nkvoll commented Oct 10, 2014

s1monw commented Oct 10, 2014

nkvoll commented Oct 10, 2014

s1monw commented Oct 11, 2014

Error during unicast discovery between 1.3.4 and 1.4.0.Beta1 #8051

Error during unicast discovery between 1.3.4 and 1.4.0.Beta1 #8051

Comments

nkvoll commented Oct 10, 2014

nkvoll commented Oct 10, 2014

nkvoll commented Oct 10, 2014

s1monw commented Oct 10, 2014

nkvoll commented Oct 10, 2014

bleskes commented Oct 10, 2014

nkvoll commented Oct 10, 2014

nkvoll commented Oct 10, 2014

s1monw commented Oct 10, 2014

nkvoll commented Oct 10, 2014

s1monw commented Oct 11, 2014