Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

During discovery, remove any local state and use clusterService.state instead #7834

Closed
wants to merge 19 commits into from

Conversation

bleskes
Copy link
Contributor

@bleskes bleskes commented Sep 23, 2014

At the moment, ZenDiscovery contains a local copy of the disco nodes plus a flag that indicates whether the local node is master or not. This is redundant as the same information is stored in the cluster state. Have duplicate copy can lead to unneeded concurrency issues. This PR removes the duplication.

The PR introduces a tighter control of the background joining thread to make sure it is started and stopped together with any cluster state changes. This solves potentially concurrency bugs where a joining thread may fail to start.

Last we add a couple of safety checks to make sure that if a nodes receives a cluster state from a new master while actively trying to join another one (or electing itself) we go back to pinging to actively join it.

Note - this is PR is against the 1.x branch.

bleskes added a commit that referenced this pull request Sep 26, 2014
At the moment, ZenDiscovery contains a local copy of the disco nodes plus a flag that indicates whether the local node is master or not. This is redundant as the same information is stored in the cluster state. Have duplicate copy can lead to unneeded concurrency issues. This PR removes the duplication, including moving the ownership of the localNode creation to ClusterState

The PR introduces a tighter control of the background joining thread to make sure it is started and stopped together with any cluster state changes. This solves potentially concurrency bugs where a joining thread may fail to start.

Last we add a couple of safety checks to make sure that if a nodes receives a cluster state from a new master while actively trying to join another one (or electing itself) we go back to pinging to actively join it.

Closes #7834
bleskes added a commit that referenced this pull request Sep 26, 2014
At the moment, ZenDiscovery contains a local copy of the disco nodes plus a flag that indicates whether the local node is master or not. This is redundant as the same information is stored in the cluster state. Have duplicate copy can lead to unneeded concurrency issues. This PR removes the duplication, including moving the ownership of the localNode creation to ClusterState

The PR introduces a tighter control of the background joining thread to make sure it is started and stopped together with any cluster state changes. This solves potentially concurrency bugs where a joining thread may fail to start.

Last we add a couple of safety checks to make sure that if a nodes receives a cluster state from a new master while actively trying to join another one (or electing itself) we go back to pinging to actively join it.

Closes #7834
@bleskes bleskes closed this in db54e9c Sep 26, 2014
@bleskes bleskes deleted the feature/zen_direct_on_cluster_state branch September 26, 2014 09:32
@bleskes bleskes restored the feature/zen_direct_on_cluster_state branch September 26, 2014 09:32
@clintongormley clintongormley changed the title Discovery: remove any local state and use clusterService.state instead Resiliency: During discovery, remove any local state and use clusterService.state instead Sep 26, 2014
bleskes added a commit to bleskes/elasticsearch that referenced this pull request Sep 29, 2014
…pon error

With elastic#7834, we simplified ZenDiscovery by making it use the current cluster state for all it's decision. This had the side effect a node may start it's Master FD before the master  has fully processed that cluster state update that adds that node (or elects the master master). This is due to the fact that master FD is started when a node receives a cluster state from the master but the master it self may still be publishing to other node.

 This commit makes sure that a master FD ping is only failed once we know that there is no current cluster state update in progress.
bleskes added a commit that referenced this pull request Sep 29, 2014
…pon error

With #7834, we simplified ZenDiscovery by making it use the current cluster state for all it's decision. This had the side effect a node may start it's Master FD before the master  has fully processed that cluster state update that adds that node (or elects the master master). This is due to the fact that master FD is started when a node receives a cluster state from the master but the master it self may still be publishing to other node.

This commit makes sure that a master FD ping is only failed once we know that there is no current cluster state update in progress.

Closes #7908
bleskes added a commit that referenced this pull request Sep 29, 2014
…pon error

With #7834, we simplified ZenDiscovery by making it use the current cluster state for all it's decision. This had the side effect a node may start it's Master FD before the master  has fully processed that cluster state update that adds that node (or elects the master master). This is due to the fact that master FD is started when a node receives a cluster state from the master but the master it self may still be publishing to other node.

This commit makes sure that a master FD ping is only failed once we know that there is no current cluster state update in progress.

Closes #7908
bleskes added a commit that referenced this pull request Sep 29, 2014
…pon error

With #7834, we simplified ZenDiscovery by making it use the current cluster state for all it's decision. This had the side effect a node may start it's Master FD before the master  has fully processed that cluster state update that adds that node (or elects the master master). This is due to the fact that master FD is started when a node receives a cluster state from the master but the master it self may still be publishing to other node.

This commit makes sure that a master FD ping is only failed once we know that there is no current cluster state update in progress.

Closes #7908
@jpountz jpountz removed the review label Oct 21, 2014
@clintongormley clintongormley added the :Distributed/Discovery-Plugins Anything related to our integration plugins with EC2, GCP and Azure label Jun 7, 2015
@clintongormley clintongormley changed the title Resiliency: During discovery, remove any local state and use clusterService.state instead During discovery, remove any local state and use clusterService.state instead Jun 7, 2015
mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
At the moment, ZenDiscovery contains a local copy of the disco nodes plus a flag that indicates whether the local node is master or not. This is redundant as the same information is stored in the cluster state. Have duplicate copy can lead to unneeded concurrency issues. This PR removes the duplication, including moving the ownership of the localNode creation to ClusterState

The PR introduces a tighter control of the background joining thread to make sure it is started and stopped together with any cluster state changes. This solves potentially concurrency bugs where a joining thread may fail to start.

Last we add a couple of safety checks to make sure that if a nodes receives a cluster state from a new master while actively trying to join another one (or electing itself) we go back to pinging to actively join it.

Closes elastic#7834
mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
…pon error

With elastic#7834, we simplified ZenDiscovery by making it use the current cluster state for all it's decision. This had the side effect a node may start it's Master FD before the master  has fully processed that cluster state update that adds that node (or elects the master master). This is due to the fact that master FD is started when a node receives a cluster state from the master but the master it self may still be publishing to other node.

This commit makes sure that a master FD ping is only failed once we know that there is no current cluster state update in progress.

Closes elastic#7908
bleskes added a commit to bleskes/elasticsearch that referenced this pull request Jan 29, 2016
…m cluster service

Long ago (elastic#7834) the owner ship of the local disco node was centralized to the cluster service. LocalDiscovery is still created it's own disco node, which is not used by the cluster service and thus creating confusion (two nodes same name but different ids).

This commit also removes and optimization where when joining a new master we would first copy the master's metadata and only then pull in the rest of the cluster state (and it's nodes).
bleskes added a commit that referenced this pull request Feb 1, 2016
…m cluster service

Long ago (#7834) the owner ship of the local disco node was centralized to the cluster service. LocalDiscovery is still created it's own disco node, which is not used by the cluster service and thus creating confusion (two nodes same name but different ids).

This commit also removes and optimization where when joining a new master we would first copy the master's metadata and only then pull in the rest of the cluster state (and it's nodes).

Closes #16317
bleskes added a commit that referenced this pull request Feb 2, 2016
…m cluster service

Long ago (#7834) the owner ship of the local disco node was centralized to the cluster service. LocalDiscovery is still created it's own disco node, which is not used by the cluster service and thus creating confusion (two nodes same name but different ids).

This commit also removes and optimization where when joining a new master we would first copy the master's metadata and only then pull in the rest of the cluster state (and it's nodes).

Closes #16317
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Discovery-Plugins Anything related to our integration plugins with EC2, GCP and Azure >enhancement resiliency v1.4.0.Beta1 v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants