New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleaner interrupt handling during cancellation #9000
Conversation
RecoveryTarget initiates the recovery by sending a start recovery request to the source node and then waits for the recovery to complete. During recovery cancellation, we interrupt the thread so it will wake up and clean the recovery. Depending on timing, this can leave an unneeded interrupted thread status causing future IO commands to fail unneeded. RecoverySource already had a handy utility called CancelableThreads. This extracts it to a top level class, and uses it in RecoveryTarget as well.
@@ -180,6 +168,7 @@ public void fail(RecoveryFailedException e, boolean sendShardFailure) { | |||
} finally { | |||
// release the initial reference. recovery files will be cleaned as soon as ref count goes to zero, potentially now | |||
decRef(); | |||
cancelableThreads.cancel("failed recovery [" + e.getMessage() + "]"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure the ref should be decremented before canceling the threads? It seems like it should be the other way around
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cancel doesn't really do much (it interrupts the threads) so you have no guarantee those will wake up before decRef() is called. They should hold their own reference. This way it just saves on a try finally within a try finally. Now that I think about it , the same argument holds for the decRef() . I'll change...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, it feels more natural to have the decRef()
always be the last thing, even if it requires another nested try/finally
@bleskes left some comments. Also, how do you feel about changing this to be |
@dakrone that's a valid point. The point of the class is to deal with blocking threads and it does capture it Maybe call run method execute? |
execute sounds better to me |
Note: Cancelable -> Cancellable |
@dakrone @clintongormley @mikemccand - I pushed an update |
LGTM |
logger.debug("interrupting recovery thread on canceled recovery"); | ||
thread.interrupt(); | ||
try { | ||
logger.debug("recovery canceled (reason: [{}])", reason); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
canceled -> cancelled
LGTM, left spelling comments |
RecoveryTarget initiates the recovery by sending a start recovery request to the source node and then waits for the recovery to complete. During recovery cancellation, we interrupt the thread so it will wake up and clean the recovery. Depending on timing, this can leave an unneeded interrupted thread status causing future IO commands to fail unneeded. RecoverySource already had a handy utility called CancellableThreads. This extracts it to a top level class, and uses it in RecoveryTarget as well. Closes elastic#9000
RecoveryTarget initiates the recovery by sending a start recovery request to the source node and then waits for the recovery to complete. During recovery cancellation, we interrupt the thread so it will wake up and clean the recovery. Depending on timing, this can leave an unneeded interrupted thread status causing future IO commands to fail unneeded.
RecoverySource already had a handy utility called CancelableThreads. This extracts it to a top level class, and uses it in RecoveryTarget as well.
This caused the failure in http://build-us-00.elasticsearch.org/job/es_core_master_strong/1755/