During discovery, remove any local state and use clusterService.state instead #7834

bleskes · 2014-09-23T11:55:38Z

At the moment, ZenDiscovery contains a local copy of the disco nodes plus a flag that indicates whether the local node is master or not. This is redundant, as the same information is stored in the cluster state. Having a duplicate copy can lead to avoidable concurrency issues, because the two copies can get out of sync. This PR removes the duplication.
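
To illustrate the idea, here is a minimal sketch. It assumes a hypothetical, heavily simplified `ClusterState` that holds only a local node id, a master node id and a set of node ids. The real Elasticsearch classes are far richer, and none of the names below are their API. Nothing is cached in separate fields. Each question is answered from the single snapshot held by a hypothetical stand-in for `clusterService.state()`, and that snapshot is read once per call so the answers cannot disagree.

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

public class ClusterStateDerivedView {

    /** Hypothetical, heavily simplified stand-in for an immutable cluster state. */
    static final class ClusterState {
        final String localNodeId;
        final String masterNodeId; // null while no master is known
        final Set<String> nodeIds;

        ClusterState(String localNodeId, String masterNodeId, Set<String> nodeIds) {
            this.localNodeId = localNodeId;
            this.masterNodeId = masterNodeId;
            this.nodeIds = Set.copyOf(nodeIds); // immutable snapshot
        }
    }

    // Hypothetical stand-in for clusterService.state(): the only copy of the truth.
    private final AtomicReference<ClusterState> state;

    ClusterStateDerivedView(ClusterState initial) {
        this.state = new AtomicReference<>(initial);
    }

    /** Applying a new state swaps the whole snapshot atomically. */
    void applyNewState(ClusterState newState) {
        state.set(newState);
    }

    /** Derived on demand instead of being kept in a separate "master" flag. */
    boolean localNodeIsMaster() {
        ClusterState current = state.get(); // read once so both ids come from the same snapshot
        return current.localNodeId.equals(current.masterNodeId);
    }

    /** The node list also comes from the snapshot rather than from a local copy. */
    Set<String> nodes() {
        return state.get().nodeIds;
    }
}
```

Because the master flag is derived rather than stored, there is no window in which it can disagree with the node list.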

The PR introduces tighter control of the background joining thread to make sure it is started and stopped together with any cluster state changes. This solves potential concurrency bugs where a joining thread may fail to start.
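
One way to get that kind of control is sketched below. This is a hypothetical, simplified `JoinThreadControlSketch`, not the class added by this PR. A single `AtomicReference<Thread>` records the one join thread that is allowed to act, and a new thread is started only if none is registered. Stopping simply clears the reference, so a stale thread notices that it is no longer current and exits without touching its successor's registration. The `attemptJoin` callback is an assumption that stands in for the real ping-and-join logic.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

public class JoinThreadControlSketch {

    // The single join thread allowed to act; null when no join is in progress.
    private final AtomicReference<Thread> currentJoinThread = new AtomicReference<>();
    // Hypothetical stand-in for one round of pinging and joining; returns true once joined.
    private final BooleanSupplier attemptJoin;

    JoinThreadControlSketch(BooleanSupplier attemptJoin) {
        this.attemptJoin = attemptJoin;
    }

    /** Starts a join thread unless one is already registered; returns true if it started one. */
    boolean startNewThreadIfNotRunning() {
        Thread joinThread = new Thread(() -> {
            Thread self = Thread.currentThread();
            try {
                // Keep trying only while this thread is still the registered one.
                while (joinThreadActive(self)) {
                    if (attemptJoin.getAsBoolean()) {
                        break;
                    }
                    Thread.sleep(10); // hypothetical back-off between rounds
                }
            } catch (InterruptedException e) {
                // Interrupted by stopRunningThread(); just exit.
            } finally {
                // Clear the registration only if it is still ours, never a successor's.
                currentJoinThread.compareAndSet(self, null);
            }
        }, "join-thread");
        if (!currentJoinThread.compareAndSet(null, joinThread)) {
            return false; // another join thread is already running
        }
        joinThread.start();
        return true;
    }

    boolean joinThreadActive(Thread thread) {
        return currentJoinThread.get() == thread;
    }

    boolean isRunning() {
        return currentJoinThread.get() != null;
    }

    /** Called when a cluster state from a master is applied: the join attempt is no longer needed. */
    void stopRunningThread() {
        Thread running = currentJoinThread.getAndSet(null);
        if (running != null) {
            running.interrupt();
        }
    }
}
```

Registering the thread before starting it means the new thread always sees itself as active on its first check, and a restart after a stop can never be undone by the old thread's cleanup.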

Lastly, we add a couple of safety checks: if a node receives a cluster state from a new master while it is actively trying to join another one (or electing itself), it goes back to pinging in order to actively join the new master.
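
The decision behind those safety checks can be reduced to a small pure function. This is a hypothetical sketch, not the PR's code. `joinTarget` is assumed to be the id of the master the node is currently trying to join. It is the node's own id while the node is electing itself, and `null` when the node is not joining at all. `incomingMaster` is the master that sent the new cluster state.

```java
public class NewMasterCheckSketch {

    enum Action { PROCESS_NORMALLY, STOP_JOIN_AND_PING_AGAIN }

    /**
     * Hypothetical check run when a cluster state arrives from a master.
     * joinTarget: master we are trying to join, our own id if electing ourselves, or null if not joining.
     */
    static Action onClusterStateFromMaster(String joinTarget, String incomingMaster) {
        if (joinTarget == null) {
            return Action.PROCESS_NORMALLY; // no join in progress
        }
        if (joinTarget.equals(incomingMaster)) {
            return Action.PROCESS_NORMALLY; // it is the master we were joining anyway
        }
        // A different master appeared while we were joining someone else (or electing
        // ourselves): abandon the current attempt and go back to pinging to join it.
        return Action.STOP_JOIN_AND_PING_AGAIN;
    }
}
```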

Note: this PR is against the 1.x branch.

At the moment, ZenDiscovery contains a local copy of the disco nodes plus a flag that indicates whether the local node is master or not. This is redundant, as the same information is stored in the cluster state. Having a duplicate copy can lead to avoidable concurrency issues. This PR removes the duplication, including moving the ownership of the localNode creation to ClusterService.

The PR introduces tighter control of the background joining thread to make sure it is started and stopped together with any cluster state changes. This solves potential concurrency bugs where a joining thread may fail to start.

Lastly, we add a couple of safety checks to make sure that if a node receives a cluster state from a new master while actively trying to join another one (or electing itself), it goes back to pinging to actively join the new master.

Closes #7834

…pon error

With #7834, we simplified ZenDiscovery by making it use the current cluster state for all its decisions. This had the side effect that a node may start its master FD before the master has fully processed the cluster state update that adds that node (or elects the master). This happens because master FD is started when a node receives a cluster state from the master, while the master itself may still be publishing that state to other nodes. This commit makes sure that a master FD ping is only failed once we know that there is no cluster state update currently in progress.

Closes #7908
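
The ordering trick can be sketched with a single-threaded executor that stands in for the cluster state update thread. This is a hypothetical simplification. `MasterFdFallbackSketch`, `submitStateUpdate`, `onPingError` and the `masterStillFailing` check are illustrative names, not Elasticsearch APIs. Tasks on that executor run one at a time and in submission order, so a check queued after a ping error only runs once every update that was already queued or running has finished.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BooleanSupplier;

public class MasterFdFallbackSketch {

    // Hypothetical stand-in for the single cluster state update thread.
    private final ExecutorService clusterStateThread = Executors.newSingleThreadExecutor();

    /** Queues a (hypothetical) cluster state update. */
    Future<?> submitStateUpdate(Runnable update) {
        return clusterStateThread.submit(update);
    }

    /**
     * Called when a master FD ping runs into an error. Instead of failing right away, the
     * check is queued behind any in-flight update and evaluated only once that update is done.
     */
    Future<Boolean> onPingError(BooleanSupplier masterStillFailing) {
        Callable<Boolean> recheck = masterStillFailing::getAsBoolean;
        return clusterStateThread.submit(recheck);
    }

    void shutdown() {
        clusterStateThread.shutdownNow();
    }
}
```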

…m cluster service

Long ago (#7834), the ownership of the local disco node was centralized in the cluster service. LocalDiscovery still creates its own disco node, which is not used by the cluster service and thus creates confusion (two nodes with the same name but different ids). This commit also removes an optimization where, when joining a new master, we would first copy the master's metadata and only then pull in the rest of the cluster state (and its nodes).

Closes #16317
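
Here is a minimal sketch of the ownership rule. It uses hypothetical, simplified `DiscoNode`, `ClusterService` and `LocalDiscovery` classes, not the Elasticsearch ones. The node id is generated once, by the cluster service, and discovery only ever asks for that instance. Building a second node with the same name, as LocalDiscovery used to, yields a different id, which is the confusion described in the commit message.

```java
import java.util.UUID;

public class SingleLocalNodeSketch {

    /** Hypothetical, simplified discovery node: a name plus a generated id. */
    static final class DiscoNode {
        final String name;
        final String id;

        DiscoNode(String name) {
            this.name = name;
            this.id = UUID.randomUUID().toString(); // every construction gets a fresh id
        }
    }

    /** Hypothetical cluster service: the single owner of the local node. */
    static final class ClusterService {
        private final DiscoNode localNode;

        ClusterService(String nodeName) {
            this.localNode = new DiscoNode(nodeName);
        }

        DiscoNode localNode() {
            return localNode;
        }
    }

    /** Hypothetical local discovery: reuses the cluster service's node instead of creating one. */
    static final class LocalDiscovery {
        private final ClusterService clusterService;

        LocalDiscovery(ClusterService clusterService) {
            this.clusterService = clusterService;
        }

        DiscoNode localNode() {
            return clusterService.localNode();
        }
    }
}
```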

bleskes added 11 commits September 18, 2014 23:21

wip

57def40

draining tasks

0eadf7a

use currentJoinThread to see if we should stop or not and use it to stop when a cluster state comes from a new master

06591c9

missing currentJoinThread.get()

8ba5b3e

remove innerJoinCluster#retry flag and listen explicitly to currentJoinThread

3deba60

introduce JoinThreadControl

3ffefdd

check if master before starting nodesFD

2575300

move localNode ownership from ZenDiscovery to ClusterService

377ca63

Added java docs + making sure that if we get a CS from another master while we elected ourselves, we join it.

ee9a1b9

added a check for a master change while joining another master...

fee899d

no commit -> TODO

6b6120a

bleskes added review v1.4.0.Beta1 v2.0.0-beta1 resiliency labels Sep 23, 2014

bleskes added 8 commits September 23, 2014 15:44

allow starting the fault detection while updating its nodes

426f380

change nodesFD.updateNodes to accept a cluster state

a46838f

fix an issue with sending leave request when there is no master.

3f93841

nodes() != nodes and better protection against node leave requests from non-masters

c912625

nodesFD - don't update latestNodes if not running

ce402e4

Test: testNodesFaultDetectionConnectOnDisconnect - set localNode on nodeFD

31ba9e6

added two more comments

f85ad50

NodesFD - simplify concurrency control to fully rely on a single map

8b0003a
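
The last commit above can be illustrated with a hypothetical `NodesFdSingleMapSketch`, which is not the actual NodesFD code from this PR. A single `ConcurrentMap` from node id to monitor is the only state. A monitor keeps pinging only while it is still the map's entry for its node. Updating or stopping fault detection is therefore just a map operation, and there are no separate running flags that have to be kept consistent with the map.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class NodesFdSingleMapSketch {

    // The only state: which nodes are monitored, and by which monitor instance.
    private final ConcurrentMap<String, NodeMonitor> monitors = new ConcurrentHashMap<>();

    /** Hypothetical per-node monitor; it acts only while it is the current map entry for its node. */
    final class NodeMonitor {
        final String nodeId;

        NodeMonitor(String nodeId) {
            this.nodeId = nodeId;
        }

        boolean running() {
            return monitors.get(nodeId) == this;
        }

        /** Hypothetical ping round; returns false once this monitor was removed or replaced. */
        boolean pingOnce() {
            if (!running()) {
                return false;
            }
            // ... send the ping and handle the response here ...
            return true;
        }
    }

    /** Aligns the monitored nodes with the node ids of a new cluster state. */
    void updateNodes(Set<String> nodeIds) {
        monitors.keySet().retainAll(nodeIds); // stop monitoring nodes that left
        for (String nodeId : nodeIds) {
            monitors.computeIfAbsent(nodeId, id -> new NodeMonitor(id)); // start monitoring new ones
        }
    }

    NodeMonitor monitorFor(String nodeId) {
        return monitors.get(nodeId);
    }

    void stop() {
        monitors.clear(); // every monitor sees running() == false from now on
    }
}
```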

bleskes closed this in db54e9c Sep 26, 2014

bleskes deleted the feature/zen_direct_on_cluster_state branch September 26, 2014 09:32

bleskes restored the feature/zen_direct_on_cluster_state branch September 26, 2014 09:32

martijnvg mentioned this pull request Sep 26, 2014

Update ZenDiscovery fields via the cluster service update task. #7790

Closed

clintongormley added the >enhancement label Sep 26, 2014

clintongormley changed the title ~~Discovery: remove any local state and use clusterService.state instead~~ Resiliency: During discovery, remove any local state and use clusterService.state instead Sep 26, 2014

bleskes mentioned this pull request Sep 29, 2014

During discovery, master fault detection should fall back to cluster state thread upon error #7908

Closed

jpountz removed the review label Oct 21, 2014

clintongormley added the :Distributed/Discovery-Plugins label Jun 7, 2015

clintongormley changed the title ~~Resiliency: During discovery, remove any local state and use clusterService.state instead~~ During discovery, remove any local state and use clusterService.state instead Jun 7, 2015

bleskes mentioned this pull request Jan 29, 2016

Local Discovery - don't create a local DiscoNode, but use the one from cluster service #16317

Closed
