Add a best effort waiting for ongoing recoveries to cancel on close #6741
Conversation
Currently one can close the engine while there are still ongoing recoveries. This is not a problem in itself, because the engine is closed in tandem with the shard it belongs to, which in turn cancels the recoveries. It does, however, cause issues in our tests, which check that no resources were left behind after an index was deleted; that check trips if the recoveries have not yet been canceled.
// engine is closed)
if (onGoingRecoveries.get() > 0) {
    logger.trace("best effort waiting for current [{}] ongoing recoveries to finish before closing the engine", onGoingRecoveries.get());
    long waitUntil = System.currentTimeMillis() + 30000; // wait for 30s
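The fragment above can be sketched as a standalone best-effort wait. This is a hypothetical reconstruction, not the PR's actual code: `onGoingRecoveries` is a stand-in `AtomicInteger`, the timeout is taken as a parameter rather than hard-coded, and simple polling replaces whatever signaling the engine really uses:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: poll an ongoing-recoveries counter until it reaches
// zero or a deadline passes, then proceed either way -- a best-effort wait
// must never block the close forever.
public class BestEffortWait {

    private final AtomicInteger onGoingRecoveries = new AtomicInteger();

    public void incRecoveries() { onGoingRecoveries.incrementAndGet(); }
    public void decRecoveries() { onGoingRecoveries.decrementAndGet(); }

    /** Returns true if all recoveries finished within timeoutMillis. */
    public boolean awaitRecoveriesDone(long timeoutMillis) throws InterruptedException {
        long waitUntil = System.currentTimeMillis() + timeoutMillis;
        while (onGoingRecoveries.get() > 0 && System.currentTimeMillis() < waitUntil) {
            Thread.sleep(10); // real code could use wait/notify instead of polling
        }
        return onGoingRecoveries.get() == 0;
    }

    public static void main(String[] args) throws InterruptedException {
        BestEffortWait engine = new BestEffortWait();
        engine.incRecoveries();
        // simulate a recovery finishing on another thread after ~50ms
        new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) {}
            engine.decRecoveries();
        }).start();
        System.out.println(engine.awaitRecoveriesDone(1000)); // prints true
    }
}
```

Passing the timeout as a parameter also illustrates the reviewer's point below: a caller on a time-bounded path (such as cluster state publishing) can supply a deadline shorter than its own.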
the 30s should be reduced to 10s and be configurable. This may be called during cluster state publishing, which times out after 30s by default.
I don't understand this PR - why do we need to wait on the recovery, and which resources are you talking about here?
push to 1.4 pending more discussion
@bleskes I think we should have a dedicated API for this that our tests can call before the index is deleted. I think it can be useful even in production when you want to wait until everything is stable. We might be able to extend / leverage the ClusterHealth API to do this - it might be already capable of doing this?
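The kind of test-side "wait until stable" helper being suggested could be sketched as a generic poll-with-deadline utility. Everything here is hypothetical (the class name, the `isStable` predicate, and the polling interval are assumptions for illustration); it only shows the shape of waiting on a cluster-health-like condition before deleting an index:

```java
import java.util.function.BooleanSupplier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical test helper: poll a "cluster is stable" predicate until it
// holds or a deadline passes. Returns the final state of the predicate so
// callers can fail the test on timeout.
public final class AwaitStable {

    public static boolean awaitBusy(BooleanSupplier isStable, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (isStable.getAsBoolean()) {
                return true;
            }
            Thread.sleep(10); // polling interval is an arbitrary choice here
        }
        return isStable.getAsBoolean(); // one last check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        // Usage sketch: pretend "stable" means a counter of pending
        // operations has drained to zero.
        AtomicInteger pendingOps = new AtomicInteger(3);
        new Thread(() -> {
            while (pendingOps.get() > 0) {
                pendingOps.decrementAndGet();
            }
        }).start();
        System.out.println(AwaitStable.awaitBusy(() -> pendingOps.get() == 0, 1000));
    }
}
```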
@s1monw the problem is that the master may think things are done but the nodes have not yet completed acting on it. We can add something that checks all the nodes, but it feels like overkill. I think we should just close this PR until we find a better solution. Agreed?
agreed :)