A new ClusterStateStatus to indicate cluster state life cycles #5741

bleskes · 2014-04-09T06:48:09Z

When the ClusterService applies a new cluster state, it first assigns it as the new active state and then calls all listeners. Some of ES's features sample the current state and try to act on it (for example, index a document). If that fails, they wait for a change in the cluster state and try again (for example, wait for a shard to start and then retry indexing).

If you're unlucky, you sample the state after it has been assigned as the "active" state but before all listeners have done their work. In this case the action taken (e.g., indexing a doc) will still fail (as the shard is not yet started), but waiting for a new state may take a long time or time out.

This commit adds a new ClusterStateStatus that makes it possible to better track the stages a cluster state goes through (currently RECEIVED, BEING_APPLIED & APPLIED). This allows detecting that a cluster state is not yet fully applied and retrying without waiting for a new state to arrive.
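The retry decision described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: only the three status names come from the description, while `StatusSketch`, `Step`, and `afterFailure` are hypothetical names chosen for the example.

```java
public class StatusSketch {

    /** Life-cycle stages named in the PR description. */
    enum ClusterStateStatus { RECEIVED, BEING_APPLIED, APPLIED }

    /** Hypothetical result type: what to do after an action failed on a sampled state. */
    enum Step { RETRY_ONCE_APPLIED, WAIT_FOR_NEW_STATE }

    /**
     * Assumption for this sketch: if the sampled state was not fully applied,
     * listeners may still have been working (e.g. starting a shard), so the
     * failure may be transient and the same state is worth retrying once it
     * is applied. Only a fully applied state requires waiting for a new one.
     */
    static Step afterFailure(ClusterStateStatus statusWhenSampled) {
        if (statusWhenSampled != ClusterStateStatus.APPLIED) {
            return Step.RETRY_ONCE_APPLIED;
        }
        return Step.WAIT_FOR_NEW_STATE;
    }

    public static void main(String[] args) {
        // Print the decision for every status.
        for (ClusterStateStatus s : ClusterStateStatus.values()) {
            System.out.println(s + " -> " + afterFailure(s));
        }
    }
}
```

The point of the extra status is visible in the branch: without it, every failure would fall through to waiting for a new state, which is the slow path described above.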

This commit also adds a utility class, ClusterStateObserver, to make this pattern slightly simpler and avoid common pitfalls.
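To illustrate the kind of pitfall such an observer can hide, here is a simplified sketch of the "remember what you last saw, then wait for anything different" pattern. It is not the actual `ClusterStateObserver`; `ObserverSketch`, `Snapshot`, `FakeClusterService`, and `registerUnlessChanged` are hypothetical names. The sketch also assumes that a change of status counts as a change, which the real class may not do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ObserverSketch {

    enum Status { RECEIVED, BEING_APPLIED, APPLIED }

    /** Immutable snapshot: a state version plus its life-cycle status. */
    static final class Snapshot {
        final long version;
        final Status status;

        Snapshot(long version, Status status) {
            this.version = version;
            this.status = status;
        }

        boolean sameAs(Snapshot other) {
            return version == other.version && status == other.status;
        }
    }

    /** Hypothetical stand-in for the cluster service. */
    static final class FakeClusterService {
        private Snapshot current = new Snapshot(0, Status.APPLIED);
        private final List<Consumer<Snapshot>> oneShotListeners = new ArrayList<>();

        synchronized Snapshot current() {
            return current;
        }

        /**
         * Registers the listener only if the state still equals {@code seen}.
         * Otherwise returns the newer snapshot. Doing the check and the
         * registration under one lock avoids missing a change in between.
         */
        synchronized Snapshot registerUnlessChanged(Snapshot seen, Consumer<Snapshot> listener) {
            if (!current.sameAs(seen)) {
                return current;
            }
            oneShotListeners.add(listener);
            return null;
        }

        void publish(Snapshot next) {
            List<Consumer<Snapshot>> toNotify;
            synchronized (this) {
                current = next;
                toNotify = new ArrayList<>(oneShotListeners);
                oneShotListeners.clear();
            }
            // Notify outside the lock so listeners can call back into the service.
            for (Consumer<Snapshot> listener : toNotify) {
                listener.accept(next);
            }
        }
    }

    /** Tracks the last observed snapshot and reports the next one that differs. */
    static final class Observer {
        private final FakeClusterService service;
        private volatile Snapshot lastObserved;

        Observer(FakeClusterService service) {
            this.service = service;
            this.lastObserved = service.current();
        }

        Snapshot observedState() {
            return lastObserved;
        }

        void waitForNextChange(Consumer<Snapshot> listener) {
            Consumer<Snapshot> wrapped = s -> {
                lastObserved = s;
                listener.accept(s);
            };
            Snapshot newer = service.registerUnlessChanged(lastObserved, wrapped);
            if (newer != null) {
                // Something already changed since we last looked: fire right away.
                wrapped.accept(newer);
            }
        }
    }

    public static void main(String[] args) {
        FakeClusterService service = new FakeClusterService();
        Observer observer = new Observer(service);
        service.publish(new Snapshot(1, Status.BEING_APPLIED));
        observer.waitForNextChange(s -> System.out.println("saw v" + s.version + " " + s.status));
    }
}
```

In this sketch a caller that sampled a state, failed, and only then started waiting is still told about a change that happened in between, instead of blocking until some later, unrelated state arrives.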

kimchy · 2014-04-09T10:03:38Z

src/main/java/org/elasticsearch/cluster/ClusterStateObserver.java

+ */
+public class ClusterStateObserver {
+
+    protected final ESLogger logger = Loggers.getLogger(getClass());


the logger here should either be static, or passed in the constructor (better, since it will have much more info)

logger is now passed in the constructor.
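The suggestion can be illustrated with a small sketch. It uses `java.util.logging.Logger` as a stand-in for `ESLogger` so that it compiles on its own. `LoggerInjectionSketch`, `loggerName`, `onTimeout`, and the logger name `action.index` are hypothetical and not taken from the PR.

```java
import java.util.Objects;
import java.util.logging.Logger;

public class LoggerInjectionSketch {

    /** Simplified observer that logs through the caller's logger. */
    static final class Observer {
        private final Logger logger;

        Observer(Logger logger) {
            // The caller's logger carries the caller's name and context,
            // so messages are attributed to the component doing the observing.
            this.logger = Objects.requireNonNull(logger, "logger");
        }

        String loggerName() {
            return logger.getName();
        }

        void onTimeout(long waitedMillis) {
            logger.warning("timed out after " + waitedMillis + " ms waiting for a new cluster state");
        }
    }

    public static void main(String[] args) {
        Observer observer = new Observer(Logger.getLogger("action.index"));
        observer.onTimeout(30_000);
    }
}
```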

s1monw · 2014-04-09T10:55:15Z

I like this in general - I left some comments

uboness · 2014-04-09T11:26:31Z

src/main/java/org/elasticsearch/action/support/master/TransportMasterNodeOperationAction.java

-                    });
+                    observer.waitForNextChange(new ClusterStateObserver.Listener() {
+                                                   @Override
+                                                   public void onNewClusterState(ClusterState state) {


formatting (indentation)

bleskes · 2014-04-11T09:39:00Z

Thx all. Left some comments and pushed an update based on the feedback.

s1monw · 2014-04-11T15:50:28Z

this LGTM. I think we should let our CI build chew on it a bit - boaz, do you wanna open an enhancement/ branch?

bleskes · 2014-04-12T14:10:19Z

Thx. pushed this to enhance/cluster_state_status.

bleskes · 2014-04-22T08:19:45Z

Pushed. Thx for the reviews.

kimchy reviewed Apr 9, 2014

uboness reviewed Apr 9, 2014

bleskes added 2 commits April 11, 2014 10:27

Feedback round + some cleanup

5c575ca

Added ObservingContext to unify access to a couple of observing relat…

5434345

…ed variables.

bleskes added 2 commits April 11, 2014 14:21

minor tweak to make the code more readable

47ca2a5

Bundled lastObservedState and lastObservedStatus into a single class

c6a96d9

bleskes closed this in 1434f6b Apr 22, 2014

bleskes added v1.2.0 labels Apr 22, 2014

clintongormley added the :Cluster label Jun 7, 2015

clintongormley added the :Distributed/Distributed label and removed the :Cluster label Feb 13, 2018
