Fix deadlock problems when API flush and finish recovery happens concurrently #9648
NOTE: released code is not affected since we added the flush at the end of recovery only in 1.5
}
if (waitIfOngoing == false) {
    if (flushLock.tryLock() == false) {
        flushing.decrementAndGet();
I think we need to throw an exception here, just as we do above with the currentFlushing check.
Left some small comments o.w. looks good.
@bleskes I simplified the exception logic a bit and removed the flush counter.
if (flushLock.tryLock() == false) {
    // if we can't get the lock right away we block if needed otherwise barf
    if (waitIfOngoing) {
        flushLock.lock();
I wonder if we want a trace message here...
LGTM. Left one minor comment.
@bleskes added some more traces there :)
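For readers following the review thread, the try-then-block pattern discussed above can be sketched roughly as follows. This is only an illustration under assumptions, not the code from the PR: the class `FlushGate`, the helper `rejectsWhileBusy`, the `System.out` trace lines, and the use of `IllegalStateException` as the rejection are all hypothetical stand-ins; only `flushLock`, `waitIfOngoing`, and the `tryLock()`/`lock()` calls come from the snippets in the conversation.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the flush entry point discussed in the review.
final class FlushGate {
    private final ReentrantLock flushLock = new ReentrantLock();

    // Runs `work` under the flush lock. If another thread is flushing,
    // either waits (waitIfOngoing == true) or rejects with an exception.
    void flush(boolean waitIfOngoing, Runnable work) {
        if (flushLock.tryLock() == false) {
            // if we can't get the lock right away we block if needed, otherwise fail
            if (waitIfOngoing) {
                System.out.println("trace: waiting for ongoing flush"); // assumed trace line
                flushLock.lock();
            } else {
                System.out.println("trace: flush lock busy, rejecting"); // assumed trace line
                throw new IllegalStateException("flush already in progress"); // stand-in exception
            }
        }
        try {
            work.run();
        } finally {
            flushLock.unlock();
        }
    }

    // Demo: one thread holds the flush lock while the caller tries a
    // non-waiting flush. Returns true if that second flush was rejected.
    static boolean rejectsWhileBusy() {
        FlushGate gate = new FlushGate();
        CountDownLatch holding = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> gate.flush(true, () -> {
            holding.countDown();
            awaitQuietly(release);
        }));
        holder.start();
        awaitQuietly(holding); // the holder now owns flushLock
        boolean rejected = false;
        try {
            gate.flush(false, () -> { });
        } catch (IllegalStateException e) {
            rejected = true;
        } finally {
            release.countDown(); // let the holder finish
        }
        try {
            holder.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return rejected;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With a single `tryLock()` up front there is no separate counter to roll back on the failure path, which seems to be the simplification mentioned in the thread.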
…pens concurrently

Unfortunately the lock order is important in the current flush code. We have to acquire the readLock first; otherwise, if we are flushing at the end of the recovery while holding the writeLock, we can deadlock if:
* Thread 1: flushes via API and gets the flushLock but blocks on the readLock since Thread 2 has the writeLock
* Thread 2: flushes at the end of the recovery holding the writeLock and blocks on the flushLock owned by Thread 1

This commit acquires the readLock first, which would be done further down anyway for the duration of the flush. As a side effect we can now safely flush on calling close() while holding the writeLock.
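The lock ordering described in this commit message can be illustrated with a small sketch. It assumes the read/write lock behaves like `java.util.concurrent.locks.ReentrantReadWriteLock`, where a thread holding the write lock may also acquire the read lock. The class `LockOrderSketch` and its methods `flush`, `finishRecovery`, and `runConcurrently` are hypothetical names for illustration and are not taken from the actual engine code.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical engine showing the "read lock first, then flush lock" order.
final class LockOrderSketch {
    private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
    private final ReentrantLock flushLock = new ReentrantLock();
    private int flushCount = 0; // guarded by flushLock

    // API / background flush: the read lock is taken BEFORE the flush lock,
    // so a thread never waits on the read lock while owning the flush lock.
    void flush() {
        rwl.readLock().lock();
        try {
            flushLock.lock();
            try {
                flushCount++; // stands in for the real flush work
            } finally {
                flushLock.unlock();
            }
        } finally {
            rwl.readLock().unlock();
        }
    }

    // End of recovery: flushes while holding the write lock. The nested
    // read lock acquisition succeeds because the writer may take the read lock.
    void finishRecovery() {
        rwl.writeLock().lock();
        try {
            flush();
        } finally {
            rwl.writeLock().unlock();
        }
    }

    int flushCount() {
        flushLock.lock();
        try {
            return flushCount;
        } finally {
            flushLock.unlock();
        }
    }

    // Runs both code paths concurrently and returns the total flush count.
    static int runConcurrently(int rounds) {
        LockOrderSketch engine = new LockOrderSketch();
        Thread api = new Thread(() -> {
            for (int i = 0; i < rounds; i++) engine.flush();
        });
        Thread recovery = new Thread(() -> {
            for (int i = 0; i < rounds; i++) engine.finishRecovery();
        });
        api.start();
        recovery.start();
        try {
            api.join();
            recovery.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return engine.flushCount();
    }
}
```

If `flush()` took `flushLock` before the read lock instead, the API thread could own `flushLock` while waiting for the read lock, and the recovery thread could own the write lock while waiting for `flushLock`, which is the cycle the commit message describes.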
16219ee to 0b0cd1c
…nish recovery happens concurrently

Issue #9648 fixes a potential deadlock between two concurrent flushes: one at the end of recovery and one through the API or a background flush. This backports the logic to 1.4. It is slightly more contrived as we still use the write lock in the flush code. If we have concerns about this approach we can also move the recovery flush to happen on a generic thread.

Closes #9942