Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error during unicast discovery between 1.3.4 and 1.4.0.Beta1 #8051

Closed
nkvoll opened this issue Oct 10, 2014 · 10 comments · Fixed by #8055
Closed

Error during unicast discovery between 1.3.4 and 1.4.0.Beta1 #8051

nkvoll opened this issue Oct 10, 2014 · 10 comments · Fixed by #8055

Comments

@nkvoll
Copy link
Member

nkvoll commented Oct 10, 2014

I ran into this issue where a freshly 1.4.0.Beta.1 instance would not be able to ping (or receive pings?) from a 1.3.4-instance, and found the following in the 1.4 instance log:

[2014-10-09 09:43:15,759][TRACE][discovery.zen.ping.unicast] [foo-instance] failed to ping [#zfd0#][ec2-host-1.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.0.3.4:9300]], falling back to <=1.4.0 ping action
[2014-10-09 09:43:15,760][TRACE][discovery.zen.ping.unicast] [foo-instance] [1] sending to [#zen_unicast_1_#zfd0##][ec2-host-1.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.0.3.4:9300]]
[2014-10-09 09:43:15,767][WARN ][discovery.zen.ping.unicast] [foo-instance] failed to send ping to [[#zfd0#][ec2-host-1.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.0.3.4:9300]]]
org.elasticsearch.transport.RemoteTransportException: [bar-instance][inet[/10.0.3.4:9300]][internal:discovery/zen/unicast]
Caused by: org.elasticsearch.transport.ActionNotFoundTransportException: No handler for action [internal:discovery/zen/unicast]
        at org.elasticsearch.transport.netty.MessageChannelHandler.handleRequest(MessageChannelHandler.java:210)
        at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:111)
        at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
        at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
        at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
        at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
        at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
        at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
        at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
        at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
        at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
        at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
        at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
        at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Is there something obvious I have missed/done wrong? The action name seems pretty odd.

@nkvoll
Copy link
Member Author

nkvoll commented Oct 10, 2014

I should add that the 1.4 instance discovers the 1.3.4-node using a separate UnicastHostsProvider, which creates a discovery node similar this:

new DiscoveryNode("#zfd"+discoveryNodeCounter.getAndIncrement()+"#", address, Version.CURRENT);

The part where it uses Version.CURRENT might not be technically correct in this case, since it's actually connecting to an older version, but it's not obvious (to me, at least) what other way there is if the version of the destination node isn't known at that point (i.e before connecting to it).

@nkvoll
Copy link
Member Author

nkvoll commented Oct 10, 2014

I'm also unsure of whether I'm reading this commit message correctly: d2fea53

.. should I take it to mean that I have to specify the lower bound during the DiscoveryNode instantiation, or should it really be handled in a backwards-compatible way by Elasticsearch itself? I initially understood it as the latter, but this doesn't seem to be the case.

@s1monw
Copy link
Contributor

s1monw commented Oct 10, 2014

are you actual having problems with your cluster of is it only that you see this errors showing up? I think this error comes from the backwards layer we have in 1.4 that can't tell what version it's pinging on the first contact so it goes and tries the new (1.4 and above) and the old endpoint (1.3.4 and below). If the new endpoint is causing an error it falls back to the old for backwards compatibility. It might be just this initial probe you are seeing?

@nkvoll
Copy link
Member Author

nkvoll commented Oct 10, 2014

The end result here was that the 1.4-instance did not join the cluster, but instead elected itself as master after what looked like three retries:

[2014-10-09 09:43:18,718][TRACE][discovery.zen.ping.unicast] [foo-instance] [1] disconnecting from [#zen_unicast_1_#zen_found_discovery_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.0.3.4:9300]]
[2014-10-09 09:43:18,722][TRACE][discovery.zen.ping.unicast] [foo-instance] [1] disconnecting from [#zen_unicast_3_#zen_found_discovery_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.0.3.4:9300]]
[2014-10-09 09:43:18,723][TRACE][discovery.zen.ping.unicast] [foo-instance] [1] disconnecting from [#zen_unicast_2_#zen_found_discovery_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.0.3.4:9300]]
[2014-10-09 09:43:18,725][TRACE][discovery.zen            ] [foo-instance] full ping responses: {none}
[2014-10-09 09:43:18,725][DEBUG][discovery.zen            ] [foo-instance] filtered ping responses: (filter_client[true], filter_data[false]) {none}
[2014-10-09 09:43:18,732][INFO ][cluster.service          ] [foo-instance] new_master [foo-instance][nacDp91kRDCmsh04NoJBWA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.0.4.4:9300]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}, reason: zen-disco-join (elected_as_master)
[2014-10-09 09:43:18,734][INFO ][plugin.found.zookeeper.updaters] [foo-instance] Master node changed. Current masterNode: [[foo-instance][nacDp91kRDCmsh04NoJBWA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.0.4.4:9300]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}]
[2014-10-09 09:43:18,740][TRACE][discovery.zen            ] [foo-instance] cluster joins counter set to [1] (elected as master)
[2014-10-09 09:43:18,755][INFO ][http                     ] [foo-instance] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[ec2-host-2.compute.amazonaws.com/10.0.4.4:9200]}

@bleskes
Copy link
Contributor

bleskes commented Oct 10, 2014

@nkvoll this is not intended. The backwards compatibility check that @s1monw mentioned does exist but it doesn't manifest itself in warn level logs.

Can you tell me some more about the cluster layout? I see aws related logs. I also see the zoo keeper plugin?

@nkvoll
Copy link
Member Author

nkvoll commented Oct 10, 2014

@bleskes what I'm attempting is a simple cluster upgrade, adding a 1.4.0.Beta1 instance to a cluster consisting of a single instance running 1.3.4. They discover each-other via a custom UnicastHostProvider, which provides the host/port of the other instances in the cluster.

The 1.3.4 instance only has the following to say (discovery is logged down to the trace level):

[2014-10-10 12:03:14,671][WARN ][transport.netty          ] [instance-0000000000] Message not fully read (request) for [0] and action [internal:discovery/zen/unicast_gte_1_4], resetting
[2014-10-10 12:03:14,703][WARN ][transport.netty          ] [instance-0000000000] Message not fully read (request) for [2] and action [internal:discovery/zen/unicast], resetting
[2014-10-10 12:03:16,124][WARN ][transport.netty          ] [instance-0000000000] Message not fully read (request) for [4] and action [internal:discovery/zen/unicast_gte_1_4], resetting
[2014-10-10 12:03:16,132][WARN ][transport.netty          ] [instance-0000000000] Message not fully read (request) for [5] and action [internal:discovery/zen/unicast], resetting
[2014-10-10 12:03:17,619][WARN ][transport.netty          ] [instance-0000000000] Message not fully read (request) for [7] and action [internal:discovery/zen/unicast_gte_1_4], resetting
[2014-10-10 12:03:17,626][WARN ][transport.netty          ] [instance-0000000000] Message not fully read (request) for [8] and action [internal:discovery/zen/unicast], resetting

While the 1.4.0.Beta1-instance logs the following three times before forming its own cluster (which is permitted since minimum_master_nodes: 1 for both instances -- but shouldn't really be happening)

[2014-10-10 12:03:14,601][TRACE][discovery.zen            ] [instance-0000000001] starting to ping
[2014-10-10 12:03:14,604][TRACE][discovery.zen.ping.unicast] [instance-0000000001] replacing [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.74.20.146:19775]] with temp node [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.74.20.146:19775]]
[2014-10-10 12:03:14,606][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connecting (light) to [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.74.20.146:19775]]
[2014-10-10 12:03:14,606][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connecting to [instance-0000000001][lEeCD6d4QjyETwJLV3ao5Q][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.82.2.158:19182]]{region=eu-west-1, availability_zone=eu-west-1a, max_local_storage_nodes=1, logical_availability_zone=zone-0}
[2014-10-10 12:03:14,624][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connected to [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.74.20.146:19775]]
[2014-10-10 12:03:14,625][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.74.20.146:19775]], using >=1.4.0 serialization
[2014-10-10 12:03:14,648][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connected to [instance-0000000001][lEeCD6d4QjyETwJLV3ao5Q][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.82.2.158:19182]]{region=eu-west-1, availability_zone=eu-west-1a, max_local_storage_nodes=1, logical_availability_zone=zone-0}
[2014-10-10 12:03:14,649][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [instance-0000000001][lEeCD6d4QjyETwJLV3ao5Q][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.82.2.158:19182]]{region=eu-west-1, availability_zone=eu-west-1a, max_local_storage_nodes=1, logical_availability_zone=zone-0}
[2014-10-10 12:03:14,658][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] received response from [instance-0000000001][lEeCD6d4QjyETwJLV3ao5Q][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.82.2.158:19182]]{region=eu-west-1, availability_zone=eu-west-1a, max_local_storage_nodes=1, logical_availability_zone=zone-0}: [ping_response{node [[instance-0000000001][lEeCD6d4QjyETwJLV3ao5Q][ec2-host-2.compute.amazonaws.com][inet[/10.82.2.158:19182]]{region=eu-west-1, availability_zone=eu-west-1a, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[1], master [null], hasJoinedOnce [false], cluster_name[538289781bfa9cb13184890f9f23584b]}, ping_response{node [[instance-0000000001][lEeCD6d4QjyETwJLV3ao5Q][ec2-host-2.compute.amazonaws.com][inet[/10.82.2.158:19182]]{region=eu-west-1, availability_zone=eu-west-1a, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[2], master [null], hasJoinedOnce [false], cluster_name[538289781bfa9cb13184890f9f23584b]}]
[2014-10-10 12:03:14,700][TRACE][discovery.zen.ping.unicast] [instance-0000000001] failed to ping [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.74.20.146:19775]], falling back to <=1.4.0 ping action
[2014-10-10 12:03:14,700][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.74.20.146:19775]]
[2014-10-10 12:03:14,707][WARN ][discovery.zen.ping.unicast] [instance-0000000001] failed to send ping to [[#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.74.20.146:19775]]]
org.elasticsearch.transport.RemoteTransportException: [instance-0000000000][inet[/10.74.20.146:19775]][internal:discovery/zen/unicast]
Caused by: org.elasticsearch.transport.ActionNotFoundTransportException: No handler for action [internal:discovery/zen/unicast]
    at org.elasticsearch.transport.netty.MessageChannelHandler.handleRequest(MessageChannelHandler.java:210)
    at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:111)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

The zookeeper plugin handles the UnicastHostProvider-part, but does not currently include any versioning information, which is why it uses Version.CURRENT, which I immediately thought could be the culprit -- but if I understand it correctly, it shouldn't actually matter.

@nkvoll
Copy link
Member Author

nkvoll commented Oct 10, 2014

If I simply change the DiscoveryNode instantiation in our code from using Version.CURRENT to using Version.V_1_3_0 it works, with the following in the 1.3.4 instance logs:

[2014-10-10 12:23:35,247][WARN ][transport.netty          ] [instance-0000000000] Message not fully read (request) for [0] and action [internal:discovery/zen/unicast_gte_1_4], resetting
[2014-10-10 12:23:36,679][WARN ][transport.netty          ] [instance-0000000000] Message not fully read (request) for [4] and action [internal:discovery/zen/unicast_gte_1_4], resetting
[2014-10-10 12:23:38,186][WARN ][transport.netty          ] [instance-0000000000] Message not fully read (request) for [7] and action [internal:discovery/zen/unicast_gte_1_4], resetting
[2014-10-10 12:23:38,243][INFO ][cluster.service          ] [instance-0000000000] added {[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0},}, reason: zen-disco-receive(join from node[[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}])

and the following from the 1.4.0.Beta1-instance:

[2014-10-10 12:23:35,152][TRACE][discovery.zen            ] [instance-0000000001] starting to ping
[2014-10-10 12:23:35,155][TRACE][discovery.zen.ping.unicast] [instance-0000000001] replacing [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]] with temp node [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:35,157][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connecting (light) to [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:35,168][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connecting to [instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}
[2014-10-10 12:23:35,199][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connected to [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:35,200][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]], using >=1.4.0 serialization
[2014-10-10 12:23:35,220][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connected to [instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}
[2014-10-10 12:23:35,221][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}
[2014-10-10 12:23:35,232][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] received response from [instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}: [ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[1], master [null], hasJoinedOnce [false], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[2], master [null], hasJoinedOnce [false], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}]
[2014-10-10 12:23:35,272][TRACE][discovery.zen.ping.unicast] [instance-0000000001] failed to ping [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]], falling back to <=1.4.0 ping action
[2014-10-10 12:23:35,274][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:35,281][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] received response from [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]: [ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [null], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}]
[2014-10-10 12:23:36,669][TRACE][discovery.zen.ping.unicast] [instance-0000000001] replacing [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]] with temp node [#zen_unicast_2_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:36,670][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connecting (light) to [#zen_unicast_2_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:36,670][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}
[2014-10-10 12:23:36,673][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connected to [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:36,674][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [#zen_unicast_2_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]], using >=1.4.0 serialization
[2014-10-10 12:23:36,676][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] received response from [instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}: [ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[1], master [null], hasJoinedOnce [false], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[3], master [null], hasJoinedOnce [false], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[4], master [null], hasJoinedOnce [false], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}]
[2014-10-10 12:23:36,681][TRACE][discovery.zen.ping.unicast] [instance-0000000001] failed to ping [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]], falling back to <=1.4.0 ping action
[2014-10-10 12:23:36,682][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [#zen_unicast_2_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:36,685][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] received response from [#zen_unicast_2_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]: [ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [null], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [null], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}]
[2014-10-10 12:23:38,174][TRACE][discovery.zen.ping.unicast] [instance-0000000001] replacing [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]] with temp node [#zen_unicast_3_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:38,176][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connecting (light) to [#zen_unicast_3_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:38,176][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}
[2014-10-10 12:23:38,179][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] connected to [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:38,180][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [#zen_unicast_3_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]], using >=1.4.0 serialization
[2014-10-10 12:23:38,183][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] received response from [instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[ec2-host-2.compute.amazonaws.com/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}: [ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[1], master [null], hasJoinedOnce [false], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[3], master [null], hasJoinedOnce [false], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[5], master [null], hasJoinedOnce [false], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[6], master [null], hasJoinedOnce [false], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}]
[2014-10-10 12:23:38,187][TRACE][discovery.zen.ping.unicast] [instance-0000000001] failed to ping [#zfd_0#][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]], falling back to <=1.4.0 ping action
[2014-10-10 12:23:38,188][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] sending to [#zen_unicast_3_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:38,192][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] received response from [#zen_unicast_3_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]: [ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [null], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [null], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000001][ifebecHgTh6lPu8IPgDMGA][ec2-host-2.compute.amazonaws.com][inet[/10.74.20.146:19861]]{region=eu-west-1, availability_zone=eu-west-1b, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [null], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}, ping_response{node [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}]
[2014-10-10 12:23:38,193][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] disconnecting from [#zen_unicast_1_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:38,196][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] disconnecting from [#zen_unicast_3_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:38,197][TRACE][discovery.zen.ping.unicast] [instance-0000000001] [1] disconnecting from [#zen_unicast_2_#zfd_0##][ec2-host-2.compute.amazonaws.com][inet[ec2-host-1.compute.amazonaws.com/10.106.7.243:19224]]
[2014-10-10 12:23:38,199][TRACE][discovery.zen            ] [instance-0000000001] full ping responses:
    --> ping_response{node [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}
[2014-10-10 12:23:38,200][DEBUG][discovery.zen            ] [instance-0000000001] filtered ping responses: (filter_client[true], filter_data[false])
    --> ping_response{node [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], id[-1], master [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], hasJoinedOnce [null], cluster_name[414938d3d78690f02ac6a9df4356b6c6]}
[2014-10-10 12:23:38,201][TRACE][discovery.zen            ] [instance-0000000001] ignoring joined once flags in ping responses, minimum ping version [1.3.4]
[2014-10-10 12:23:38,208][TRACE][discovery.zen            ] [instance-0000000001] joining master [instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}
[2014-10-10 12:23:38,248][DEBUG][discovery.zen.publish    ] [instance-0000000001] received cluster state version 10
[2014-10-10 12:23:38,250][DEBUG][discovery.zen.fd         ] [instance-0000000001] [master] restarting fault detection against master [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}], reason [new cluster state received and we are monitoring the wrong master [null]]
[2014-10-10 12:23:38,251][DEBUG][discovery.zen            ] [instance-0000000001] got first state from fresh master [o5ulTkS4TV-BDs_NbkReqQ]
[2014-10-10 12:23:38,252][TRACE][discovery.zen            ] [instance-0000000001] updated cluster join cluster to [1]
[2014-10-10 12:23:38,254][INFO ][cluster.service          ] [instance-0000000001] detected_master [instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}, added {[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0},}, reason: zen-disco-receive(from master [[instance-0000000000][o5ulTkS4TV-BDs_NbkReqQ][ec2-host-1.compute.amazonaws.com][inet[/10.106.7.243:19224]]{region=eu-west-1, availability_zone=eu-west-1c, max_local_storage_nodes=1, logical_availability_zone=zone-0}])

The notable difference being that there were no exceptions logged when instantiating the DiscoveryNode using a lower-bound Version, and that the 1.4 instance was able to join the 1.3 cluster.

Does this possibly mean that there might be an issue with providing DiscoveryNodes using a UnicastHostsProvider when the target instance version is actually unknown?

@s1monw
Copy link
Contributor

s1monw commented Oct 10, 2014

oh I see - sorry for the confusion about the BackwardsCompatibility endpoint. @nkvoll for the case when you don't know the discovery nodes version you can just use Version. CURRENT.minimumCompatibilityVersion() which returns the minimal compatible version. This should work just fine in that case.

@nkvoll
Copy link
Member Author

nkvoll commented Oct 10, 2014

Ok @s1monw, I'll verify that it works for us either this weekend, or at latest, on Monday. Since we have a workaround already, I'm fine with closing this issue now if you wish to do so, and I leave it entirely up to you whether this behaviour needs to be noted somewhere else as a help for other developers that have latched onto the UnicastHostsProvider-interface and are instantiating DiscoveryNodes the same way we did.

Thanks for the feedback and for letting me know about the minimumCompatibilityVersion method :)

@s1monw
Copy link
Contributor

s1monw commented Oct 11, 2014

I will add some javadocs to the ctors to clarify! Thanks for raising this issue

s1monw added a commit to s1monw/elasticsearch that referenced this issue Oct 13, 2014
Today it's easy to use the wrong version when initializing DiscoveryNode
instances. This commit adds javadocs and a utility constant to
initialize DiscoveryNode instances if the the remotes node version is
unknown.

Closes elastic#8051
s1monw added a commit that referenced this issue Oct 13, 2014
Today it's easy to use the wrong version when initializing DiscoveryNode
instances. This commit adds javadocs and a utility constant to
initialize DiscoveryNode instances if the the remotes node version is
unknown.

Closes #8051
s1monw added a commit that referenced this issue Oct 13, 2014
Today it's easy to use the wrong version when initializing DiscoveryNode
instances. This commit adds javadocs and a utility constant to
initialize DiscoveryNode instances if the the remotes node version is
unknown.

Closes #8051
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
Today it's easy to use the wrong version when initializing DiscoveryNode
instances. This commit adds javadocs and a utility constant to
initialize DiscoveryNode instances if the the remotes node version is
unknown.

Closes elastic#8051
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants