Cleaner interrupt handling during cancellation #9000

Closed
wants to merge 2 commits into from

Conversation

bleskes
Contributor

@bleskes bleskes commented Dec 18, 2014

RecoveryTarget initiates the recovery by sending a start recovery request to the source node and then waits for the recovery to complete. During recovery cancellation, we interrupt the thread so that it wakes up and cleans up the recovery. Depending on timing, this can leave a stale interrupted status on the thread, causing later IO operations to fail unnecessarily.

RecoverySource already had a handy utility called CancelableThreads. This PR extracts it into a top-level class and uses it in RecoveryTarget as well.

This caused the failure in http://build-us-00.elasticsearch.org/job/es_core_master_strong/1755/
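
To make the mechanism described above concrete, here is a minimal sketch of what such a utility could look like. It is not the actual Elasticsearch class: the name `CancellableThreadsSketch`, the `Interruptable` interface and the `ExecutionCancelledException` type are assumptions made for illustration. The idea is that the operation runs on the calling thread, `cancel` interrupts every registered thread, and any interrupted status left over from `cancel` is cleared before control returns to the caller.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch only; names and structure are assumptions, not the real class. */
class CancellableThreadsSketch {

    /** A blocking operation that may be interrupted (hypothetical interface). */
    interface Interruptable {
        void run() throws InterruptedException;
    }

    /** Thrown when an operation was aborted by cancel() (hypothetical type). */
    static class ExecutionCancelledException extends RuntimeException {
        ExecutionCancelledException(String reason) {
            super("operation was cancelled, reason [" + reason + "]");
        }
    }

    private final Set<Thread> threads = new HashSet<>();
    private boolean cancelled = false;
    private String reason;

    /** Runs the operation on the calling thread; no new thread is spawned. */
    void execute(Interruptable operation) {
        final Thread current = Thread.currentThread();
        final boolean wasInterrupted;
        synchronized (this) {
            checkNotCancelled();
            threads.add(current);
            // Capture and clear any earlier interrupt under the lock, so that an
            // interrupt seen later can be attributed to cancel().
            wasInterrupted = Thread.interrupted();
        }
        try {
            operation.run();
        } catch (InterruptedException e) {
            // Assume cancel() interrupted us; the cancelled flag is checked below.
        } finally {
            synchronized (this) {
                threads.remove(current);
            }
        }
        // No longer registered, so cancel() cannot interrupt this thread any more.
        if (wasInterrupted) {
            current.interrupt();   // restore the status the caller arrived with
        } else {
            Thread.interrupted();  // drop a leftover interrupt set by cancel()
        }
        synchronized (this) {
            checkNotCancelled();
        }
    }

    /** Marks this instance as cancelled and interrupts all registered threads. */
    synchronized void cancel(String reason) {
        if (cancelled) {
            return;
        }
        cancelled = true;
        this.reason = reason;
        for (Thread thread : threads) {
            thread.interrupt();
        }
        threads.clear();
    }

    // Callers must hold the lock on this instance.
    private void checkNotCancelled() {
        if (cancelled) {
            throw new ExecutionCancelledException(reason);
        }
    }
}
```

In this sketch a caller would wrap each blocking step, for example `cancellable.execute(() -> latch.await())`, and treat `ExecutionCancelledException` as the signal that the recovery was cancelled. The thread leaves `execute` without a stale interrupted status, so later IO on the same thread is not affected.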

@@ -180,6 +168,7 @@ public void fail(RecoveryFailedException e, boolean sendShardFailure) {
} finally {
// release the initial reference. recovery files will be cleaned as soon as ref count goes to zero, potentially now
decRef();
cancelableThreads.cancel("failed recovery [" + e.getMessage() + "]");
Member

Are we sure the ref should be decremented before canceling the threads? It seems like it should be the other way around

Contributor Author

cancel doesn't really do much (it only interrupts the threads), so you have no guarantee that they will wake up before decRef() is called. They should hold their own reference. This way it just saves on a try/finally within a try/finally. Now that I think about it, the same argument holds for the decRef(). I'll change...

Member

+1, it feels more natural to have the decRef() always be the last thing, even if it requires another nested try/finally
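
The ordering discussed above can be sketched as follows. This is an assumption about the shape of the change, not the committed code: `RecoveryStatusSketch`, the two `Runnable` hooks and the `events` list are hypothetical stand-ins for the real listener, the cancellable threads and the reference counting. The nested try/finally means the reference is released last, and still released even if the cancel step throws.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; all names here are hypothetical. */
class RecoveryStatusSketch {
    final List<String> events = new ArrayList<>();
    private final Runnable failureListener; // stands in for the failure callback
    private final Runnable canceller;       // stands in for cancelling blocked threads
    private int refCount = 1;               // the initial reference

    RecoveryStatusSketch(Runnable failureListener, Runnable canceller) {
        this.failureListener = failureListener;
        this.canceller = canceller;
    }

    void fail() {
        try {
            events.add("notify");
            failureListener.run();
        } finally {
            try {
                // Wake up any blocked threads first.
                events.add("cancel");
                canceller.run();
            } finally {
                // Release the initial reference last, whatever happened above.
                decRef();
            }
        }
    }

    private void decRef() {
        events.add("decRef");
        refCount--;
    }

    int refCount() {
        return refCount;
    }
}
```

The cost is one extra level of nesting, which is the trade-off mentioned in the comment above.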

@dakrone
Member

dakrone commented Dec 18, 2014

@bleskes left some comments. Also, how do you feel about changing this to be CancelableProcs instead of CancelableThreads? I think it would be good to get away from the notion that CancelableThreads spawns any kind of threads itself? If not, no worries, I am just concerned that people will see "Threads" and a .run() method together and mentally make the connection that this is like subclassing Thread (instead of dealing with the current thread only)

@bleskes
Contributor Author

bleskes commented Dec 18, 2014

@dakrone that's a valid point. The point of the class is to deal with blocking threads, and the name does capture that. Maybe call the run method execute?

@dakrone
Member

dakrone commented Dec 18, 2014

execute sounds better to me

@clintongormley

Note: Cancelable -> Cancellable

@bleskes
Contributor Author

bleskes commented Dec 18, 2014

@dakrone @clintongormley @mikemccand - I pushed an update

@mikemccand
Contributor

LGTM

logger.debug("interrupting recovery thread on canceled recovery");
thread.interrupt();
try {
logger.debug("recovery canceled (reason: [{}])", reason);
Member

canceled -> cancelled

@dakrone
Member

dakrone commented Dec 19, 2014

LGTM, left spelling comments

@bleskes bleskes closed this in 635ae29 Dec 19, 2014
@bleskes bleskes deleted the recover_cancelable_threads branch December 19, 2014 09:41
bleskes added a commit to bleskes/elasticsearch that referenced this pull request Dec 19, 2014
RecoveryTarget initiates the recovery by sending a start recovery request to the source node and then waits for the recovery to complete. During recovery cancellation, we interrupt the thread so it will wake up and clean the recovery. Depending on timing, this can leave an unneeded interrupted thread status causing future IO commands to fail unneeded.

RecoverySource already had a handy utility called CancellableThreads. This extracts it to a top level class, and uses it in RecoveryTarget as well.

Closes elastic#9000
@clintongormley clintongormley added :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. and removed review labels Mar 19, 2015
@clintongormley clintongormley changed the title Recovery: cleaner interrupt handling during cancellation Cleaner interrupt handling during cancellation Jun 8, 2015
Labels
>bug :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. v1.5.0 v2.0.0-beta1
4 participants