UnicastZenPing should also ping last known discoNodes #7336

bleskes · 2014-08-19T12:17:38Z

At the moment, when a node loses its connection to the master (due to a network partition or because the master was stopped), it pings the unicast hosts in order to discover other nodes and either elect a new master or join another master that has been elected in the meantime. This can go wrong if all unicast targets are on the same (minority) side of a partition as the node: it will then never rejoin the cluster once the partition is healed.
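The idea in the title can be illustrated with a minimal sketch, assuming nodes are identified by plain `host:port` strings. The class `PingTargets` and the method `pingTargets` are hypothetical names for illustration only, not the `UnicastZenPing` API. The sketch merges the configured unicast hosts with the nodes from the last known cluster state, so a node whose configured hosts all sit in the minority partition can still reach the majority side after the partition heals.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch: compute the addresses to ping after losing the master.
 * Names are illustrative and do not reflect the real UnicastZenPing code.
 */
public class PingTargets {

    /**
     * Union of the configured unicast hosts and the last known cluster nodes,
     * de-duplicated with insertion order preserved, excluding the local node.
     */
    static List<String> pingTargets(List<String> configuredUnicastHosts,
                                    List<String> lastKnownNodes,
                                    String localNode) {
        Set<String> targets = new LinkedHashSet<>(configuredUnicastHosts);
        targets.addAll(lastKnownNodes);  // also ping nodes we saw in the last cluster state
        targets.remove(localNode);       // no need to ping ourselves
        return new ArrayList<>(targets);
    }

    public static void main(String[] args) {
        // Both configured unicast hosts ended up on the minority side of the partition...
        List<String> unicastHosts = List.of("node1:9300", "node2:9300");
        // ...but the last known cluster state still lists the majority-side nodes.
        List<String> lastKnown = List.of("node1:9300", "node2:9300",
                "node3:9300", "node4:9300", "node5:9300");
        // prints [node2:9300, node3:9300, node4:9300, node5:9300]
        System.out.println(pingTargets(unicastHosts, lastKnown, "node1:9300"));
    }
}
```

Using a `LinkedHashSet` keeps the configured hosts first while avoiding duplicate pings to nodes that appear in both lists.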

Note: this is against the feature/improve_zen branch.


martijnvg · 2014-08-19T12:23:22Z

src/main/java/org/elasticsearch/discovery/zen/elect/ElectMasterService.java

@@ -120,6 +131,12 @@ public DiscoveryNode electMaster(Iterable<DiscoveryNode> nodes) {

        @Override
        public int compare(DiscoveryNode o1, DiscoveryNode o2) {
+            if (o1.masterNode() && !o2.masterNode()) {
+                return -1;


Shouldn't this return 1 and the other if statement return -1?

Wait, this code is OK: we want master-eligible nodes at the beginning of the list, so such a node should be deemed smaller (return -1) than a node that isn't master-eligible.
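The ordering discussed here can be checked with a minimal sketch. `Node` is a hypothetical stand-in for `DiscoveryNode`, and the tie-break on node id is an assumption added to make the order deterministic. Returning -1 when only `o1` is master-eligible places master-eligible nodes first, so an `electMaster`-style method (mirroring the signature in the hunk header above, but not the actual implementation) can pick the first sorted node.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch of the comparator ordering: master-eligible nodes sort
 * first; the id tie-break is an assumption, not taken from the patch.
 */
public class ElectMasterSketch {

    /** Stand-in for DiscoveryNode with only the fields this sketch needs. */
    record Node(String id, boolean masterNode) {}

    static final Comparator<Node> NODE_ORDER = (o1, o2) -> {
        if (o1.masterNode() && !o2.masterNode()) {
            return -1; // o1 is master-eligible, o2 is not: o1 sorts first
        }
        if (!o1.masterNode() && o2.masterNode()) {
            return 1;  // o2 is master-eligible, o1 is not: o2 sorts first
        }
        return o1.id().compareTo(o2.id()); // assumed deterministic tie-break
    };

    /** Returns the first node in sorted order if it is master-eligible, else null. */
    static Node electMaster(Iterable<Node> nodes) {
        List<Node> sorted = new ArrayList<>();
        nodes.forEach(sorted::add);
        if (sorted.isEmpty()) {
            return null;
        }
        sorted.sort(NODE_ORDER);
        Node first = sorted.get(0);
        return first.masterNode() ? first : null;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(new Node("c", true), new Node("a", false), new Node("b", true));
        System.out.println(electMaster(nodes).id()); // prints "b"
    }
}
```

A unit test for the election logic, as suggested below, could assert exactly these two cases: a master-eligible node is chosen over a non-eligible one with a smaller id, and no master is elected when no node is eligible.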

Maybe add a unit test for the elect master logic :)

martijnvg · 2014-08-19T12:54:47Z

Left one comment regarding a unit test for the election logic; other than that, LGTM.

bleskes · 2014-08-19T13:11:07Z

@martijnvg added unit tests.

martijnvg · 2014-08-19T19:09:48Z

@bleskes LGTM

kimchy · 2014-08-20T03:29:50Z

lgtm

At the moment, when a node loses its connection to the master (due to a network partition or because the master was stopped), it pings the unicast hosts in order to discover other nodes and either elect a new master or join another master that has been elected in the meantime. This can go wrong if all unicast targets are on the same (minority) side of a partition as the node: it will then never rejoin the cluster once the partition is healed. Closes #7336


dadoonet · 2014-09-05T14:18:30Z

@bleskes Should we label this issue with 1.4.0 and 2.0.0?

Because of the change introduced in elastic/elasticsearch#7336 (Elasticsearch 1.4 and 2.0), we need to change the Ec2Discovery constructor. Closes #115.

Because of the change introduced in elastic/elasticsearch#7336 (Elasticsearch 1.4 and 2.0), we need to change the AzureDiscovery constructor. Closes #34.

Because of the change introduced in elastic/elasticsearch#7336 (Elasticsearch 1.4 and 2.0), we need to change the GceDiscovery constructor. Closes #35.


bleskes added resiliency labels Aug 19, 2014

martijnvg reviewed Aug 19, 2014

Add a unit test for ElectMasterService

655d783

bleskes closed this Aug 20, 2014

bleskes deleted the unicast_ping_discovery_nodes branch August 20, 2014 13:50

jpountz removed the review label Aug 26, 2014

dadoonet mentioned this pull request Sep 5, 2014

ZenDiscovery constructor needs ElectMasterService instance elastic/elasticsearch-cloud-aws#115

Closed

dadoonet mentioned this pull request Sep 5, 2014

ZenDiscovery constructor needs ElectMasterService instance elastic/elasticsearch-cloud-azure#34

Closed

dadoonet mentioned this pull request Sep 5, 2014

ZenDiscovery constructor needs ElectMasterService instance elastic/elasticsearch-cloud-gce#35

Closed

bleskes added the v1.4.0.Beta1 label Sep 8, 2014

bleskes added the v2.0.0-beta1 label Sep 8, 2014

clintongormley added the >enhancement label Sep 11, 2014

clintongormley changed the title ~~[Discovery] UnicastZenPing should also ping last known discoNodes~~ Resiliency: Discovery - UnicastZenPing should also ping last known discoNodes Sep 11, 2014

clintongormley added the :Distributed/Discovery-Plugins Anything related to our integration plugins with EC2, GCP and Azure label Jun 7, 2015

clintongormley changed the title ~~Resiliency: Discovery - UnicastZenPing should also ping last known discoNodes~~ UnicastZenPing should also ping last known discoNodes Jun 7, 2015
