Fix deadlock problems when API flush and finish recovery happens concurrently #9648

s1monw · 2015-02-11T11:24:32Z

Unfortunately the lock order is important in the current flush code. We have to acquire the read lock first; otherwise,
if we are flushing at the end of the recovery while holding the write lock, we can deadlock if:

Thread 1: flushes via API and gets the flush lock but blocks on the readlock since Thread 2 has the writeLock
Thread 2: flushes at the end of the recovery holding the writeLock and blocks on the flushLock owned by Thread 1

This commit acquires the read lock first which would be done further down anyway for the time of the flush.
As a side effect we can now safely flush on calling close() while holding the writeLock.
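
To illustrate the ordering described above, here is a minimal sketch, not the actual InternalEngine code. The class and method names (`LockOrderSketch`, `flushAtEndOfRecovery`, `runConcurrently`) are hypothetical, and it assumes the read/write lock behaves like `java.util.concurrent.locks.ReentrantReadWriteLock`, where the write-lock holder may also take the read lock. Because every flush path takes the read lock before the flush lock, neither thread can hold the flush lock while waiting for the other.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the engine; names are illustrative only.
class LockOrderSketch {
    private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
    private final ReentrantLock flushLock = new ReentrantLock();
    private int flushCount = 0; // guarded by flushLock

    // API-style flush: read lock first, then the flush lock.
    void flush() {
        rwl.readLock().lock();
        try {
            flushLock.lock();
            try {
                flushCount++; // stands in for the real flush work
            } finally {
                flushLock.unlock();
            }
        } finally {
            rwl.readLock().unlock();
        }
    }

    // Recovery-style flush: the write lock is already held when flush() runs.
    // ReentrantReadWriteLock allows the write-lock holder to acquire the read lock.
    void flushAtEndOfRecovery() {
        rwl.writeLock().lock();
        try {
            flush();
        } finally {
            rwl.writeLock().unlock();
        }
    }

    int flushCount() {
        flushLock.lock();
        try {
            return flushCount;
        } finally {
            flushLock.unlock();
        }
    }

    // Runs both flush paths concurrently and returns the total number of flushes.
    static int runConcurrently(int rounds) {
        LockOrderSketch engine = new LockOrderSketch();
        Thread api = new Thread(() -> {
            for (int i = 0; i < rounds; i++) engine.flush();
        });
        Thread recovery = new Thread(() -> {
            for (int i = 0; i < rounds; i++) engine.flushAtEndOfRecovery();
        });
        api.start();
        recovery.start();
        try {
            api.join();
            recovery.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return engine.flushCount();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(1000));
    }
}
```

If `flush()` instead took the flush lock before the read lock, the two threads could each hold one lock while waiting for the other, which is the cycle described in the Thread 1 / Thread 2 scenario.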

s1monw · 2015-02-11T11:25:25Z

NOTE: released code is not affected since we added the flush at the end of recovery only in 1.5

bleskes · 2015-02-11T11:39:08Z

src/main/java/org/elasticsearch/index/engine/InternalEngine.java

+            }
+            if (waitIfOngoing == false) {
+                if (flushLock.tryLock() == false) {
+                    flushing.decrementAndGet();


I think we need to throw an exception here, just as we do above with the currentFlushing check.

bleskes · 2015-02-11T11:49:42Z

Left some small comments o.w. looks good.

s1monw · 2015-02-11T11:51:52Z

@bleskes I simplified the exception logic a bit and removed the flush counter.

bleskes · 2015-02-11T11:55:00Z

src/main/java/org/elasticsearch/index/engine/InternalEngine.java

+            if (flushLock.tryLock() == false) {
+                // if we can't get the lock right away we block if needed otherwise barf
+                if (waitIfOngoing) {
+                    flushLock.lock();


I wonder if we want a trace message here...
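
For readers following the thread, the "try the lock, then block or fail" shape under review can be sketched roughly as follows. This is an assumption-laden illustration, not the merged code: `FlushLockSketch`, `rejectsWhenBusy`, and the use of `IllegalStateException` are hypothetical stand-ins, and the comment marks where a trace message could go.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the tryLock / waitIfOngoing pattern; not the real engine code.
class FlushLockSketch {
    private final ReentrantLock flushLock = new ReentrantLock();
    private int flushCount = 0; // guarded by flushLock

    // Flushes once; returns the number of flushes performed so far.
    int flush(boolean waitIfOngoing) {
        if (flushLock.tryLock() == false) {
            // Could not get the lock right away: block if asked to, otherwise fail.
            if (waitIfOngoing) {
                // A trace message ("waiting for in-flight flush") would fit here.
                flushLock.lock();
            } else {
                // Stand-in exception type, chosen for this sketch only.
                throw new IllegalStateException("already flushing");
            }
        }
        try {
            return ++flushCount; // stands in for the real flush work
        } finally {
            flushLock.unlock();
        }
    }

    // Holds the flush lock on the calling thread, then attempts a non-waiting
    // flush from another thread and reports whether it was rejected.
    static boolean rejectsWhenBusy() {
        FlushLockSketch engine = new FlushLockSketch();
        AtomicBoolean rejected = new AtomicBoolean(false);
        engine.flushLock.lock();
        try {
            Thread other = new Thread(() -> {
                try {
                    engine.flush(false);
                } catch (IllegalStateException e) {
                    rejected.set(true);
                }
            });
            other.start();
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            engine.flushLock.unlock();
        }
        return rejected.get();
    }

    public static void main(String[] args) {
        System.out.println(new FlushLockSketch().flush(false));
        System.out.println(rejectsWhenBusy());
    }
}
```

Trying `tryLock()` first keeps the uncontended path cheap and gives a single place to log or reject when another flush is already running.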

bleskes · 2015-02-11T11:55:23Z

LGTM. Left one minor comment.

s1monw · 2015-02-11T12:00:08Z

@bleskes added some more traces there :)

…nish recovery happens concurrently Issue #9648 fixes a potential deadlock between two concurrent flushes: one at the end of recovery and one through the API or background flush. This backports the logic to 1.4. It is slightly more contrived as we still use the write lock in the flush code. If we have concerns about this approach we can also move the recovery flush to happen on a generic thread. Closes #9942

s1monw added v1.5.0 v2.0.0-beta1 review >bug labels Feb 11, 2015

bleskes reviewed Feb 11, 2015

s1monw force-pushed the fix_lock_order branch from 16219ee to 0b0cd1c Compare February 11, 2015 12:18

s1monw merged commit 0b0cd1c into elastic:master Feb 11, 2015

bleskes mentioned this pull request Mar 2, 2015

Engine: back port #9648 - Fix deadlock problems when API flush and finish recovery happens concurrently #9942

Closed

clintongormley added :Distributed/Recovery and removed review labels Mar 19, 2015

clintongormley changed the title ~~[ENGINE] Fix deadlock problems when API flush and finish recovery happens concurrently~~ Fix deadlock problems when API flush and finish recovery happens concurrently Jun 8, 2015
