Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UnicastZenPing should also ping last known discoNodes #7336

Conversation

bleskes
Copy link
Contributor

@bleskes bleskes commented Aug 19, 2014

At the moment, when a node looses connection to the master (due to a partition or the master was stopped), we ping the unicast hosts in order to discover other nodes and elect a new master or get of another master than has been elected in the mean time. This can go wrong if all unicast targets are on the same side of a minority partition and therefore will never rejoin once the partition is healed.

Note: this is agains the feature/improve_zen branch

At the moment, when a node looses connection to the master (due to a partition or the master was stopped), we ping the unicast hosts in order to discover other nodes and elect a new master or get of another master than has been elected in the mean time. This can go wrong if all unicast targets are on the same side of a minority partition and therefore will never rejoin once the partition is healed.
@@ -120,6 +131,12 @@ public DiscoveryNode electMaster(Iterable<DiscoveryNode> nodes) {

@Override
public int compare(DiscoveryNode o1, DiscoveryNode o2) {
if (o1.masterNode() && !o2.masterNode()) {
return -1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this return 1 and the other if statement return -1?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait this code is ok, we want nodes with master eligible nodes to be in the beginning of the list and therefor it should be deemed smaller (return -1) then a node with that isn't master.

Maybe add a unit test for the elect master logic :)

@martijnvg
Copy link
Member

Left one comment regarding unit test for elect logic, other than that LGTM.

@bleskes
Copy link
Contributor Author

bleskes commented Aug 19, 2014

@martijnvg added unit tests.

@martijnvg
Copy link
Member

@bleskes LGTM

@kimchy
Copy link
Member

kimchy commented Aug 20, 2014

lgtm

bleskes added a commit that referenced this pull request Aug 20, 2014
At the moment, when a node looses connection to the master (due to a partition or the master was stopped), we ping the unicast hosts in order to discover other nodes and elect a new master or get of another master than has been elected in the mean time. This can go wrong if all unicast targets are on the same side of a minority partition and therefore will never rejoin once the partition is healed.

Closes #7336
@bleskes bleskes closed this Aug 20, 2014
@bleskes bleskes deleted the unicast_ping_discovery_nodes branch August 20, 2014 13:50
bleskes added a commit that referenced this pull request Aug 22, 2014
At the moment, when a node looses connection to the master (due to a partition or the master was stopped), we ping the unicast hosts in order to discover other nodes and elect a new master or get of another master than has been elected in the mean time. This can go wrong if all unicast targets are on the same side of a minority partition and therefore will never rejoin once the partition is healed.

Closes #7336
bleskes added a commit that referenced this pull request Aug 23, 2014
At the moment, when a node looses connection to the master (due to a partition or the master was stopped), we ping the unicast hosts in order to discover other nodes and elect a new master or get of another master than has been elected in the mean time. This can go wrong if all unicast targets are on the same side of a minority partition and therefore will never rejoin once the partition is healed.

Closes #7336
@jpountz jpountz removed the review label Aug 26, 2014
bleskes added a commit that referenced this pull request Aug 27, 2014
At the moment, when a node looses connection to the master (due to a partition or the master was stopped), we ping the unicast hosts in order to discover other nodes and elect a new master or get of another master than has been elected in the mean time. This can go wrong if all unicast targets are on the same side of a minority partition and therefore will never rejoin once the partition is healed.

Closes #7336
bleskes added a commit to bleskes/elasticsearch that referenced this pull request Sep 1, 2014
At the moment, when a node looses connection to the master (due to a partition or the master was stopped), we ping the unicast hosts in order to discover other nodes and elect a new master or get of another master than has been elected in the mean time. This can go wrong if all unicast targets are on the same side of a minority partition and therefore will never rejoin once the partition is healed.

Closes elastic#7336
@dadoonet
Copy link
Member

dadoonet commented Sep 5, 2014

@bleskes Should we label this issue with 1.4.0 and 2.0.0?

dadoonet added a commit to elastic/elasticsearch-cloud-aws that referenced this pull request Sep 5, 2014
Introduced in elastic/elasticsearch#7336 (elasticsearch 1.4 and 2.0), we need to change Ec2Discovery constructor.

Closes #115.
dadoonet added a commit to elastic/elasticsearch-cloud-aws that referenced this pull request Sep 5, 2014
Introduced in elastic/elasticsearch#7336 (elasticsearch 1.4 and 2.0), we need to change Ec2Discovery constructor.

Closes #115.
dadoonet added a commit to elastic/elasticsearch-cloud-azure that referenced this pull request Sep 5, 2014
Introduced in elastic/elasticsearch#7336 (elasticsearch 1.4 and 2.0), we need to change AzureDiscovery constructor.

Closes #34.
dadoonet added a commit to elastic/elasticsearch-cloud-azure that referenced this pull request Sep 5, 2014
Introduced in elastic/elasticsearch#7336 (elasticsearch 1.4 and 2.0), we need to change AzureDiscovery constructor.

Closes #34.
dadoonet added a commit to elastic/elasticsearch-cloud-gce that referenced this pull request Sep 5, 2014
Introduced in elastic/elasticsearch#7336 (elasticsearch 1.4 and 2.0), we need to change GceDiscovery constructor.

Closes #35.
dadoonet added a commit to elastic/elasticsearch-cloud-gce that referenced this pull request Sep 5, 2014
Introduced in elastic/elasticsearch#7336 (elasticsearch 1.4 and 2.0), we need to change GceDiscovery constructor.

Closes #35.
bleskes added a commit that referenced this pull request Sep 8, 2014
At the moment, when a node looses connection to the master (due to a partition or the master was stopped), we ping the unicast hosts in order to discover other nodes and elect a new master or get of another master than has been elected in the mean time. This can go wrong if all unicast targets are on the same side of a minority partition and therefore will never rejoin once the partition is healed.

Closes #7336
@clintongormley clintongormley changed the title [Discovery] UnicastZenPing should also ping last known discoNodes Resiliency: Discovery - UnicastZenPing should also ping last known discoNodes Sep 11, 2014
@clintongormley clintongormley added the :Distributed/Discovery-Plugins Anything related to our integration plugins with EC2, GCP and Azure label Jun 7, 2015
@clintongormley clintongormley changed the title Resiliency: Discovery - UnicastZenPing should also ping last known discoNodes UnicastZenPing should also ping last known discoNodes Jun 7, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Discovery-Plugins Anything related to our integration plugins with EC2, GCP and Azure >enhancement resiliency v1.4.0.Beta1 v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants