Publishing timeout to log at WARN and indicate pending nodes #9551

bleskes · 2015-02-03T21:27:03Z

When the master publishes a new cluster state it waits (by default) for up to 30s for all nodes to respond. If not it continues to process other pending tasks. At the moment, this timeout is logged under DEBUG but it typically represent a serious issue with one or more of the nodes. We should log it in WARN and give the nodes that failed to respond in a timely fashion

We also remove the ClusterStatePublishResponseHandler interface because it is an unneeded abstraction, at least for now.

When the master publishes a new cluster state it waits (by default) for up to 30s for all nodes to respond. If not it continues to process other pending tasks. At the moment, this timeout is logged under DEBUG but it typically represent a serious issue with one or more of the nodes. We should log it in WARN and give the nodes that failed to respond in a timefly fashion

martijnvg · 2015-02-04T10:43:49Z

LGTM

s1monw · 2015-02-04T11:40:33Z

src/test/java/org/elasticsearch/discovery/BlockingClusterStatePublishResponseHandlerTests.java

+        }
+        // wait on the threads to finish
+        for (Thread t : threads) {
+            t.join(30000);


can this be a join without a timeout?

s1monw · 2015-02-04T11:50:08Z

LGTM in general left some cosmetic comments

When the master publishes a new cluster state it waits (by default) for up to 30s for all nodes to respond. If not it continues to process other pending tasks. At the moment, this timeout is logged under DEBUG but it typically represent a serious issue with one or more of the nodes. We should log it in WARN and give the nodes that failed to respond in a timefly fashion Closes #9551

When the master publishes a new cluster state it waits (by default) for up to 30s for all nodes to respond. If not it continues to process other pending tasks. At the moment, this timeout is logged under DEBUG but it typically represent a serious issue with one or more of the nodes. We should log it in WARN and give the nodes that failed to respond in a timefly fashion Closes elastic#9551

bleskes added review resiliency :Distributed/Discovery-Plugins Anything related to our integration plugins with EC2, GCP and Azure v1.5.0 v2.0.0-beta1 v1.4.3 labels Feb 3, 2015

s1monw reviewed Feb 4, 2015
View reviewed changes

feedback

df89079

bleskes closed this in 7beaaae Feb 4, 2015

bleskes deleted the publish_timeout_logging branch February 4, 2015 15:40

clintongormley added the >enhancement label Feb 10, 2015

clintongormley removed the review label Mar 19, 2015

clintongormley changed the title ~~Discovery: publishing timeout to log at WARN and indicate pending nodes~~ Publishing timeout to log at WARN and indicate pending nodes Jun 6, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Publishing timeout to log at WARN and indicate pending nodes #9551

Publishing timeout to log at WARN and indicate pending nodes #9551

bleskes commented Feb 3, 2015

martijnvg commented Feb 4, 2015

s1monw Feb 4, 2015

bleskes Feb 4, 2015

s1monw commented Feb 4, 2015

Publishing timeout to log at WARN and indicate pending nodes #9551

Publishing timeout to log at WARN and indicate pending nodes #9551

Conversation

bleskes commented Feb 3, 2015

martijnvg commented Feb 4, 2015

s1monw Feb 4, 2015

Choose a reason for hiding this comment

bleskes Feb 4, 2015

Choose a reason for hiding this comment

s1monw commented Feb 4, 2015