Add a best effort waiting for ongoing recoveries to cancel on close #6741
Conversation
Currently one can close the engine while there are still ongoing recoveries. This is not a problem in itself, because the engine is closed in tandem with the shard it belongs to, which in turn cancels the recoveries. It does, however, cause issues in our tests, which check that no resources were left behind after an index was deleted; that check trips if the recoveries have not yet been canceled.
// engine is closed)
if (onGoingRecoveries.get() > 0) {
    logger.trace("best effort waiting for current [{}] ongoing recoveries to finish before closing the engine", onGoingRecoveries.get());
    long waitUntil = System.currentTimeMillis() + 30000; // wait for 30s
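The fragment above can be sketched as a standalone best-effort wait. This is a hypothetical reconstruction, not the PR's actual code: `onGoingRecoveries` is a stand-in `AtomicInteger`, the timeout is taken as a parameter rather than hard-coded, and simple polling replaces whatever signaling the engine really uses:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: poll an ongoing-recoveries counter until it reaches
// zero or a deadline passes, then proceed either way -- a best-effort wait
// must never block the close forever.
public class BestEffortWait {

    private final AtomicInteger onGoingRecoveries = new AtomicInteger();

    public void incRecoveries() { onGoingRecoveries.incrementAndGet(); }
    public void decRecoveries() { onGoingRecoveries.decrementAndGet(); }

    /** Returns true if all recoveries finished within timeoutMillis. */
    public boolean awaitRecoveriesDone(long timeoutMillis) throws InterruptedException {
        long waitUntil = System.currentTimeMillis() + timeoutMillis;
        while (onGoingRecoveries.get() > 0 && System.currentTimeMillis() < waitUntil) {
            Thread.sleep(10); // real code could use wait/notify instead of polling
        }
        return onGoingRecoveries.get() == 0;
    }

    public static void main(String[] args) throws InterruptedException {
        BestEffortWait engine = new BestEffortWait();
        engine.incRecoveries();
        // simulate a recovery finishing on another thread after ~50ms
        new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) {}
            engine.decRecoveries();
        }).start();
        System.out.println(engine.awaitRecoveriesDone(1000)); // prints true
    }
}
```

Passing the timeout as a parameter also illustrates the reviewer's point below: a caller on a time-bounded path (such as cluster state publishing) can supply a deadline shorter than its own.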
the 30s should be reduced to 10s and be configurable. This may be called during cluster state publishing, which times out after 30s by default.
I don't understand this PR - why do we need to wait on the recovery, and which resources are you talking about here?
push to 1.4 pending more discussion
@bleskes I think we should have a dedicated API for this that our tests can call before the index is deleted. I think it can be useful even in production when you want to wait until everything is stable. We might be able to extend / leverage the ClusterHealth API to do this - it might be already capable of doing this?
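The kind of test-side "wait until stable" helper being suggested could be sketched as a generic poll-with-deadline utility. Everything here is hypothetical (the class name, the `isStable` predicate, and the polling interval are assumptions for illustration); it only shows the shape of waiting on a cluster-health-like condition before deleting an index:

```java
import java.util.function.BooleanSupplier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical test helper: poll a "cluster is stable" predicate until it
// holds or a deadline passes. Returns the final state of the predicate so
// callers can fail the test on timeout.
public final class AwaitStable {

    public static boolean awaitBusy(BooleanSupplier isStable, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (isStable.getAsBoolean()) {
                return true;
            }
            Thread.sleep(10); // polling interval is an arbitrary choice here
        }
        return isStable.getAsBoolean(); // one last check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        // Usage sketch: pretend "stable" means a counter of pending
        // operations has drained to zero.
        AtomicInteger pendingOps = new AtomicInteger(3);
        new Thread(() -> {
            while (pendingOps.get() > 0) {
                pendingOps.decrementAndGet();
            }
        }).start();
        System.out.println(AwaitStable.awaitBusy(() -> pendingOps.get() == 0, 1000));
    }
}
```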
@s1monw the problem is that the master may think things are done but the nodes have not yet completed acting on it. We can add something that checks all the nodes, but it feels like overkill. I think we should just close this PR until we find a better solution. Agreed?
agreed :)