
Issue 1578: Fixed deadlock in auditor blocking ZK thread #1608

Conversation

merlimat
Contributor

Motivation

Fixes #1578

After getting the ZK callback on the ZK event thread, we need to jump to a background thread before doing the synchronous call to admin.openLedgerNoRecovery(ledgerId);, which will try to make a ZK request and wait for a response (which would be coming through the same ZK event thread that is currently blocked).
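A minimal sketch of the pattern (the class, interface, and method names below are illustrative, not the actual Auditor code):

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical sketch of the fix pattern: the processor is invoked on the ZooKeeper
// event thread, so it must not block there on a call that itself waits for a ZK
// response. It hands the blocking work to a background pool instead.
public class ZkCallbackOffloadSketch {

    /** Stand-in for the BookKeeper async completion callback; hypothetical. */
    interface ResultCallback {
        void done(int rc);
    }

    // Invoked on the ZooKeeper event thread for each ledger id being iterated.
    void processLedger(long ledgerId, ResultCallback cb) {
        // Do not block the ZK event thread: the blocking call below waits for a ZK
        // response that would be delivered on this very thread. Jump to a background
        // pool first (the PR uses ForkJoinPool.commonPool()).
        ForkJoinPool.commonPool().execute(() -> {
            try {
                checkLedger(ledgerId); // blocking: internally issues ZK requests and waits
                cb.done(0);
            } catch (Exception e) {
                cb.done(-1);
            }
        });
    }

    // Placeholder for admin.openLedgerNoRecovery(ledgerId) plus fragment checking.
    void checkLedger(long ledgerId) throws Exception {
    }
}
```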

Thread.currentThread().interrupt();
// Do not perform blocking calls that involve making ZK calls from within the ZK
// event thread. Jump to background thread instead to avoid deadlock.
ForkJoinPool.commonPool().execute(() -> {
Contributor

Don't we have some other threadpool available? If we have a BookKeeper object (inside the BK admin) we could use its own main threadpool, for instance.

Member

Don't we have some other threadpool available?

We don't have pools for blocking operations.

If we have a BookKeeper object (inside the BK admin) we could use its own main threadpool, for instance.

I don't think we need a pool for blocking calls. All the pools in BK are for non-blocking operations.

@ivankelly
Contributor

Where does the backpressure occur with this? Let's say there are 1000 ledgers, and we create a ledgerManager.asyncProcessLedgers with this Processor implementation. Will 1000 tasks be submitted concurrently to the ForkJoinPool? Will this mean 1000 threads are created? I think the sync calls may have acted as a kind of nasty backpressure in the past.

@sijie
Member

sijie commented Aug 20, 2018

I think the sync calls may have acted as a kind of nasty backpressure in the past.

No. There was a thread back in 2014 asking for changing the autorecovery-related methods from sync to async.

http://mail-archives.apache.org/mod_mbox/bookkeeper-dev/201411.mbox/%3CCAO2yDyZo5AzYgE%3D%3Dk8C5bifA3Miv-2A9w%2B_-6aS%2Bwa4zxRGcOw%40mail.gmail.com%3E

Will 1000 tasks be submitted concurrently to the ForkJoinPool? Will this mean 1000 threads are created?

I think there is a parallelism cap. The default parallelism is Runtime.getRuntime().availableProcessors() - 1, and it can be altered by setting the java.util.concurrent.ForkJoinPool.common.parallelism system property.
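For reference, a tiny JDK-only snippet to observe that cap (nothing BookKeeper-specific assumed):

```java
import java.util.concurrent.ForkJoinPool;

public class CommonPoolParallelismCheck {
    public static void main(String[] args) {
        // The property must be set before the common pool is first touched, e.g.
        //   -Djava.util.concurrent.ForkJoinPool.common.parallelism=4
        System.out.println("available processors  : " + Runtime.getRuntime().availableProcessors());
        System.out.println("common pool parallelism: " + ForkJoinPool.commonPool().getParallelism());
    }
}
```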

@sijie sijie added this to the 4.9.0 milestone Aug 21, 2018
@sijie sijie closed this in f782a9d Aug 21, 2018
sijie pushed a commit that referenced this pull request Aug 21, 2018
### Motivation

Fixes #1578

After getting the ZK callback on the ZK event thread, we need to jump to a background thread before doing the synchronous call to `admin.openLedgerNoRecovery(ledgerId);`, which will try to make a ZK request and wait for a response (which would be coming through the same ZK event thread that is currently blocked).

Author: Matteo Merli <mmerli@apache.org>

Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Sijie Guo <sijie@apache.org>

This closes #1608 from merlimat/fix-auditor-deadlock, closes #1578

(cherry picked from commit f782a9d)
Signed-off-by: Sijie Guo <sijie@apache.org>
@reddycharan
Contributor

@merlimat @sijie @ivankelly I've a few questions here:

  1. Is this bug a regression, or has it been like this since the beginning?
  2. Because of this deadlock, is it just the 'checkAllLedgers' checker which is blocked, or also other components which use 'executor' (the "auditBookies" checker and core Auditor functionality as well)?
  3. If the synchronous call 'admin.openLedgerNoRecovery' in "checkLedgersProcessor" is blocked as you explained, then the 'processDone' latch is not counted down, and "processDone.await()" in "checkAllLedgers" will be blocked forever. That will make 'executor' blocked, and since 'executor' is a singleThreadScheduledExecutor, IIUC all of the Auditor functionality is blocked, right?
  4. Why does the issue description say "Auditor run Periodic check only once"? If the analysis made for this fix is correct, then "checkAllLedgers" shouldn't run even once, right?
  5. To begin with, I'm not sure there is a comprehensive test case for this checker, but I'm a little surprised that this commit is merged / the issue is closed with no test case to prove the analysis of the fix and the validity of the fix.

@jvrao @athanatos fyi.

@merlimat merlimat deleted the fix-auditor-deadlock branch August 21, 2018 23:31
@merlimat
Contributor Author

Is this bug a regression, or has it been like this since the beginning?

It was always there

Because of this deadlock, is it just the 'checkAllLedgers' checker which is blocked, or also other components which use 'executor' (the "auditBookies" checker and core Auditor functionality as well)?

The ZK "event-thread" is blocked, so nothing else using ZK will work.

If the synchronous call 'admin.openLedgerNoRecovery' in "checkLedgersProcessor" is blocked as you explained, then the 'processDone' latch is not counted down, and "processDone.await()" in "checkAllLedgers" will be blocked forever. That will make 'executor' blocked, and since 'executor' is a singleThreadScheduledExecutor, IIUC all of the Auditor functionality is blocked, right?

Why does the issue description say "Auditor run Periodic check only once"? If the analysis made for this fix is correct, then "checkAllLedgers" shouldn't run even once, right?

I think the issue was named (not by me) based on the initial perceived behavior. The analysis of the stack-trace is pretty clear on what the root problem is.

It is a big problem to mix sync and async operations in ZK. It is imperative to not do anything blocking from a ZK callback thread.

To begin with, I'm not sure there is a comprehensive test case for this checker, but I'm a little surprised that this commit is merged / the issue is closed with no test case to prove the analysis of the fix and the validity of the fix.

@sijie
Member

sijie commented Aug 21, 2018

@reddycharan: let me start with a summary of my thoughts. If you read the comments above, I raised concerns before that AutoRecovery calling sync methods in async callbacks is a very bad practice, so I wouldn't be surprised if more deadlocks are found in other places in AutoRecovery. This PR is not intended to address the whole "calling-sync-methods-in-async-callback" problem in AutoRecovery; it is intended to address the problem reported in #1578 first and get the bugfix available for 4.7.2, so the scope is limited to the bug reported in #1578.

Regarding the test, I think it is a bit hard to reproduce the sequence of this race condition, and the change is limited in scope to address the sync call in the stack trace reported in #1578. Moving the sync call to a dedicated thread pool without blocking the ZooKeeper callback thread is a simple and straightforward fix. As the aim of this PR is to address the specific stack trace in #1578, this change is okay to go in as a bugfix to address the immediate concerns in AutoRecovery.

A long-term fix for AutoRecovery is to audit all the sync calls in callbacks and make them async, which I have raised before: http://mail-archives.apache.org/mod_mbox/bookkeeper-dev/201411.mbox/%3CCAO2yDyZo5AzYgE%3D%3Dk8C5bifA3Miv-2A9w%2B_-6aS%2Bwa4zxRGcOw%40mail.gmail.com%3E


More details on your questions:

Is this bug a regression, or has it been like this since the beginning?

I believe this has been there since the beginning, not a regression. As I pointed out, AutoRecovery has multiple places calling sync methods in callbacks.

Because of this deadlock, is it just the 'checkAllLedgers' checker which is blocked, or also other components which use 'executor' (the "auditBookies" checker and core Auditor functionality as well)?

You will only have this issue when you call a sync method (waiting for a ZooKeeper result) in a ZooKeeper callback thread; not every "call-sync-methods-in-async-callback" case will have this issue. I don't think the other components are a concern. However, as a good practice we need to clean up the sync calls in async callbacks; that is a very bad practice, but also a big problem to address in AutoRecovery.

That will make 'executor' blocked, and since 'executor' is a singleThreadScheduledExecutor, IIUC all of the Auditor functionality is blocked, right?

Yes. That's why the issue reporter says "Auditor periodic check only run once": because the Auditor executor is blocked.

Why does the issue description say "Auditor run Periodic check only once"? If the analysis made for this fix is correct, then "checkAllLedgers" shouldn't run even once, right?

I think "Auditor run periodic check only once" is the reporter observed behavior.

The race condition can happen on any "checkAllLedgers" run, not necessarily the first one. If you look into the code, for each Auditor checkAllLedgers a new ZooKeeper client is established, so the race condition can happen on any checkAllLedgers run; but once it is blocked, no future checkAllLedgers will be run.

I'm not sure there is a comprehensive test case for this checker,

I think this race condition depends on timing, and anything timing-related is usually very hard to cover or catch via test cases.

I'm a little surprised that this commit is merged / the issue is closed with no test case to prove the analysis of the fix and the validity of the fix.

I think it is a bit hard to reproduce the sequence of this race condition, especially since it relates to timing in the ZooKeeper client callback, and the fix is straightforward: moving the blocking call to a separate thread. That's why we made the exception to merge this as a bugfix for 4.7.2.

@sijie
Member

sijie commented Aug 21, 2018

@eolivelli: I just created #1617 to address the broader issue in AutoRecovery.

@reddycharan
Contributor

reddycharan commented Aug 22, 2018

@sijie, from @merlimat's description:

After getting the ZK callback on the ZK event thread, we need to jump to a background thread before doing the synchronous call to admin.openLedgerNoRecovery(ledgerId);, which will try to make a ZK request and wait for a response (which would be coming through the same ZK event thread that is currently blocked).

I understood it as meaning that "admin.openLedgerNoRecovery" f782a9d#diff-7525f06ad3a1ad0a00a462df4deb4698L645 will be blocked consistently. That's why I was wondering how we were OK so far (5 years since 005b62c was introduced), since the ZK thread deadlock will eventually lead to the Auditor being non-functional.

If you say we would run into the issue only because of a race condition in the ZK library, then it makes some sense why this issue was not completely identified so far. That being said, I'm just wondering, at a very high level, how probable it is to get into this ZK thread deadlock. Since it effectively makes the Auditor non-functional, I would like to ascertain how vulnerable we have been so far.

The race condition can happen on any "checkAllLedgers" run, not necessarily the first one. If you look into the code, for each Auditor checkAllLedgers a new ZooKeeper client is established, so the race condition can happen on any checkAllLedgers run; but once it is blocked, no future checkAllLedgers will be run.

@reddycharan
Contributor

Btw, I described in my earlier comment why this ZK thread deadlock will lead to the Auditor being non-functional:

If the synchronous call 'admin.openLedgerNoRecovery' in "checkLedgersProcessor" is blocked as you explained, then the 'processDone' latch is not counted down, and "processDone.await()" in "checkAllLedgers" will be blocked forever. That will make 'executor' blocked, and since 'executor' is a singleThreadScheduledExecutor, IIUC all of the Auditor functionality is blocked, right?

@reddycharan
Contributor

@sijie @merlimat From the call stack trace reported in issue #1578, we can say that the Auditor's single-threaded executor ('executor') is hung while waiting on "processDone.await()" in the checkAllLedgers method. So technically, even with this fix, there is still scope for the 'processDone' CountDownLatch not being counted down to zero (for whatever reason). In that case the executor will again be blocked and the Auditor will become non-functional. So I believe the important fix needed here is to not wait forever on this latch - https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/Auditor.java#L701 . Instead, have some timeout and move on (see the sketch after the stack trace below). Ideally I would move the checkers' functionality to some other threadpool/executor so that it won't impact the core functionality of the Auditor, which is super critical in the auto-replication system.

"AuditorBookie-XXXXX:3181" #40 daemon prio=5 os_prio=0 tid=0x00007f049c117830 nid=0x5da4 waiting on condition [0x00007f0477dfc000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000e04e54f8> (a java.util.concurrent.CountDownLatch$Sync)
..
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at org.apache.bookkeeper.replication.Auditor.checkAllLedgers(Auditor.java:696)
at org.apache.bookkeeper.replication.Auditor$5.run(Auditor.java:359)
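A minimal sketch of the bounded-wait suggestion above (the timeout value and the surrounding method are illustrative, not the current Auditor code):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrative only: bound the wait on the per-run latch so a stuck run cannot
// pin the Auditor's single-threaded executor forever.
public class BoundedWaitSketch {
    // 'processDone' is counted down by the async ledger-processing callback, as in
    // Auditor.checkAllLedgers; the 30-minute timeout is an arbitrary illustration.
    void waitForCheckToFinish(CountDownLatch processDone) throws InterruptedException {
        if (!processDone.await(30, TimeUnit.MINUTES)) {
            // Give up on this run instead of blocking forever; the next scheduled
            // checkAllLedgers run can then still be submitted to the executor.
            System.err.println("checkAllLedgers did not complete in time; skipping this run");
        }
    }
}
```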

@jvrao
Contributor

jvrao commented Aug 22, 2018

@sijie @merlimat Where do we define the number of ZK event threads?
If we have only one event thread, then it is a deadlock every time, right?

@sijie
Member

sijie commented Aug 22, 2018

@merlimat @jvrao @reddycharan :

Actually, I looked at the stack trace again. The problem is a race condition, and the fix here is okay as well; however, the race condition is not in ZooKeeper itself as described in the description, it is a race condition between BK and ZK, which happens very rarely. The stack trace is very confusing, and will make people think it is a self-deadlock situation (calling-sync-method-in-async-callback).

Let me summarize my findings and hope these will make things clearer for Charan's questions.

ZooKeeper deadlock?

Based on Matteo's initial investigation, there is a deadlock on the ZooKeeper thread.

"main-EventThread" #11 daemon prio=5 os_prio=0 tid=0x00007f05385d3aa0 nid=0x5bd2 waiting on condition [0x00007f05207f0000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000e1374598> (a java.util.concurrent.CompletableFuture$Signaller)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
	at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
	at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
	at org.apache.bookkeeper.client.SyncCallbackUtils.waitForResult(SyncCallbackUtils.java:45)
	at org.apache.bookkeeper.client.BookKeeperAdmin.openLedgerNoRecovery(BookKeeperAdmin.java:327)
	at org.apache.bookkeeper.replication.Auditor$6.process(Auditor.java:645)
	at org.apache.bookkeeper.replication.Auditor$6.process(Auditor.java:627)
	at org.apache.bookkeeper.meta.AbstractZkLedgerManager$5.operationComplete(AbstractZkLedgerManager.java:510)
	at org.apache.bookkeeper.meta.AbstractZkLedgerManager$5.operationComplete(AbstractZkLedgerManager.java:484)
	at org.apache.bookkeeper.util.ZkUtils$6$1.processResult(ZkUtils.java:285)
	at org.apache.bookkeeper.zookeeper.ZooKeeperClient$32$1.processResult(ZooKeeperClient.java:1395)
	at org.apache.bookkeeper.zookeeper.ZooKeeperClient$31$1.processResult(ZooKeeperClient.java:1356)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:589)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)

The stack trace here is very confusing. As you can see from it, we are calling a synchronous openLedgerNoRecovery from a ZooKeeper thread, org.apache.zookeeper.ClientCnxn$EventThread. We would think openLedgerNoRecovery will call ZooKeeper again, and since openLedgerNoRecovery is blocking the ZooKeeper thread while the ZooKeeper response it waits on needs to be delivered on that same thread, it would be a self-deadlock.

However, as I said, the stack trace here is very, very confusing, because the self-deadlock would only happen if openLedgerNoRecovery were called from the same ZooKeeper client's callback. But in the Auditor, the ZooKeeper client used in openLedgerNoRecovery is different from the ZooKeeper client used for iterating ledgers.

https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/Auditor.java#L613

For every checkAllLedgers, a brand new ZooKeeper client is created for opening ledgers, so the self-deadlock situation will not actually happen in the current code. (However, I would also highlight that #1588 would potentially cause a self-deadlock, since it attempts to remove the ZooKeeper client creation; that PR needs to be revisited.)

What is the real deadlock?

I looked again at the stack trace here. https://gist.github.com/hrsakai/d65e8e2cd511173232b1010a9bbdf126

  1. There are a bunch of BookKeeperClientWorker-OrderedExecutor threads blocked waiting for ZkLedgerUnderreplicationManager.markLedgerUnderreplicated to complete. markLedgerUnderreplicated is a ZooKeeper operation: basically, in BookKeeper's ordered executor, the Auditor is calling the ZooKeeper client to publish suspected ledgers (blocking calls), so the BookKeeper ordered executor is waiting for ZooKeeper's callback.
"BookKeeperClientWorker-OrderedExecutor-5-0" #53 prio=5 os_prio=0 tid=0x00007f0468002bd0 nid=0x3cb1 waiting on condition [0x00007f04771f0000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000e137b988> (a java.util.concurrent.CountDownLatch$Sync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
	at org.apache.bookkeeper.util.ZkUtils.createFullPathOptimistic(ZkUtils.java:184)
	at org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager.markLedgerUnderreplicated(ZkLedgerUnderreplicationManager.java:269)
	at org.apache.bookkeeper.replication.Auditor.publishSuspectedLedgers(Auditor.java:550)
	at org.apache.bookkeeper.replication.Auditor.access$1400(Auditor.java:79)
	at org.apache.bookkeeper.replication.Auditor$ProcessLostFragmentsCb.operationComplete(Auditor.java:580)
	at org.apache.bookkeeper.replication.Auditor$ProcessLostFragmentsCb.operationComplete(Auditor.java:563)
	at org.apache.bookkeeper.client.LedgerChecker$FullLedgerCallback.operationComplete(LedgerChecker.java:303)
	at org.apache.bookkeeper.client.LedgerChecker$FullLedgerCallback.operationComplete(LedgerChecker.java:282)
	at org.apache.bookkeeper.client.LedgerChecker$LedgerFragmentCallback.operationComplete(LedgerChecker.java:130)
	at org.apache.bookkeeper.client.LedgerChecker$LedgerFragmentCallback.operationComplete(LedgerChecker.java:91)
	at org.apache.bookkeeper.client.LedgerChecker$ReadManyEntriesCallback.readEntryComplete(LedgerChecker.java:83)
	at org.apache.bookkeeper.proto.PerChannelBookieClient$ReadCompletion$1.readEntryComplete(PerChannelBookieClient.java:1559)
	at org.apache.bookkeeper.proto.PerChannelBookieClient$ReadCompletion.lambda$errorOut$0(PerChannelBookieClient.java:1575)
	at org.apache.bookkeeper.proto.PerChannelBookieClient$ReadCompletion$$Lambda$28/167038546.run(Unknown Source)
	at org.apache.bookkeeper.proto.PerChannelBookieClient$CompletionValue$1.safeRun(PerChannelBookieClient.java:1417)
	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)
  2. Go back to the stack trace of openLedgerNoRecovery. openLedgerNoRecovery is blocking the ZooKeeper thread and waiting for BookKeeper's callback to complete. However, BookKeeper's callback has to run in the ordered scheduler, but those threads are waiting for the ZooKeeper callback to complete. Hence the deadlock happens.
"main-EventThread" #11 daemon prio=5 os_prio=0 tid=0x00007f05385d3aa0 nid=0x5bd2 waiting on condition [0x00007f05207f0000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000e1374598> (a java.util.concurrent.CompletableFuture$Signaller)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
	at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
	at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
	at org.apache.bookkeeper.client.SyncCallbackUtils.waitForResult(SyncCallbackUtils.java:45)
	at org.apache.bookkeeper.client.BookKeeperAdmin.openLedgerNoRecovery(BookKeeperAdmin.java:327)
	at org.apache.bookkeeper.replication.Auditor$6.process(Auditor.java:645)
	at org.apache.bookkeeper.replication.Auditor$6.process(Auditor.java:627)
	at org.apache.bookkeeper.meta.AbstractZkLedgerManager$5.operationComplete(AbstractZkLedgerManager.java:510)
	at org.apache.bookkeeper.meta.AbstractZkLedgerManager$5.operationComplete(AbstractZkLedgerManager.java:484)
	at org.apache.bookkeeper.util.ZkUtils$6$1.processResult(ZkUtils.java:285)
	at org.apache.bookkeeper.zookeeper.ZooKeeperClient$32$1.processResult(ZooKeeperClient.java:1395)
	at org.apache.bookkeeper.zookeeper.ZooKeeperClient$31$1.processResult(ZooKeeperClient.java:1356)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:589)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)

**In summary, the real deadlock happens because:

  • we are calling sync ZooKeeper methods in BK's callback thread
  • we are calling sync BookKeeper methods in ZK's callback thread

Both are waiting for each other.**

This happens rarely because BookKeeper has multiple callback threads. Most of the time, the BookKeeper callback thread which publishes the suspected ledger will be different from the thread that completes openLedgerNoRecovery, since the ledger ids are usually different. It will only happen if two ledger ids are assigned to the same BookKeeper callback thread and publishSuspectedLedgers and openLedgerNoRecovery happen at the same time.
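A toy model of that cross-wait, with two single-threaded executors standing in for the ZK event thread and a BK ordered-executor thread (all names are illustrative, and the program hangs on purpose):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CrossDeadlockSketch {
    public static void main(String[] args) {
        ExecutorService zkEventThread = Executors.newSingleThreadExecutor();    // models the ZK EventThread
        ExecutorService bkCallbackThread = Executors.newSingleThreadExecutor(); // models a BK OrderedExecutor thread

        CompletableFuture<Void> zkResult = new CompletableFuture<>(); // only completed on zkEventThread
        CompletableFuture<Void> bkResult = new CompletableFuture<>(); // only completed on bkCallbackThread

        // "publishSuspectedLedgers" path: the BK callback thread blocks waiting on a ZK result.
        bkCallbackThread.execute(() -> zkResult.join());
        // "openLedgerNoRecovery" path: the ZK event thread blocks waiting on a BK result.
        zkEventThread.execute(() -> bkResult.join());

        // The completions are queued behind the blocked tasks, so they can never run:
        // this program hangs, which is exactly the cross-deadlock described above.
        zkEventThread.execute(() -> zkResult.complete(null));
        bkCallbackThread.execute(() -> bkResult.complete(null));
    }
}
```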

What is the right fix?

The fix here is okay but not the complete answer. The right fix is to move all synchronous calls out of BK callback threads and the ZK callback thread. That's the first immediate action we can take. Then consider rewriting those sync methods to use async methods, to completely get rid of the sync logic.


Those are all my findings regarding this issue. Below I will leave my thoughts on some of Charan's and JV's questions.

Where do we define the number of ZK event threads?

There is only one send thread and one event thread per ZooKeeper client; it is non-configurable.

If we have only one event thread, then it is a deadlock every time, right?

Yes, if we call the same ZK client's sync method in its callback.

So I believe the important fix needed here is to not wait forever on this latch

I don't think "not wait forever" on this latch is the right fix. for sure you can timeout and let next check kick in. However the problem here is deadlock makes both bookkeeper and zookeeper client are not functionable. so introducing timeout just delay and hide the problem. In a production system, I would rather than let it wait forever and use metric (e.g. the number of auditor runs) for alerting. for example, if the number of auditor runs doesn't increment for more than checkIntervals, that means auditor is stuck. then an alert should be triggered.

Ideally I would move the checkers' functionality to some other threadpool/executor so that it won't impact the core functionality of the Auditor,

Yeah, dedicated threadpools for different checkers is a good idea. However, I think the most critical fix is not calling sync methods in async callbacks.

@merlimat
Contributor Author

Thanks @sijie for detailed analysis!

@reddycharan
Contributor

reddycharan commented Aug 22, 2018

Thanks @sijie for the detailed analysis. Even with your detailed analysis, it takes more digging to understand exactly what is going on.

I think the reason for the confusion/uncertainty/convolutedness is that each call of Auditor.checkAllLedgers has its own set of resources only partially, not completely. Auditor.checkAllLedgers creates its own 'newzk', 'client', 'admin', and 'checker', but for 'ledgerManager' and 'ledgerUnderreplicationManager' (ProcessLostFragmentsCb in Auditor.checkAllLedgers uses 'ledgerUnderreplicationManager') it uses the Auditor's instance variables. Because of this, 2 ZK clients * 2 threads (each ZK client's IO thread and event thread), 2 BK clients * 1 mainWorkerPool (an OrderedExecutor for each BK client), and the Auditor's single-threaded executor are all in play here. And our maze of callbacks in this component makes understanding the transfer of control/execution between threads super complicated. I'm not sure if this is intentional, but to reason out any such issue, it takes multiple pairs of eyes and multiple hours of debugging.

Can you please consider evaluating whether Auditor.checkAllLedgers needs its own set of resources or not? If it needs its own set of resources, then is it OK to go down the partial path?

Anyhow, I'm glad I started this conversation, since I'm not convinced by the original issue description and the description of the fix.

@reddycharan
Contributor

If Auditor.checkAllLedgers reused the Auditor's newzk, client, admin, and checker, it would be a self-deadlock situation; and if Auditor.checkAllLedgers had its own ledgerManager and ledgerUnderreplicationManager along with the other newly created resources, we would also be in a self-deadlock situation. So AFAIU this issue was not completely identified because partial new resource creation was allowed in the original commit 005b62c.

@sijie
Member

sijie commented Aug 23, 2018

Can you please consider evaluating whether Auditor.checkAllLedgers needs its own set of resources or not? If it needs its own set of resources, then is it OK to go down the partial path?

For the current code, it needs its own ZK client, otherwise it is a self-deadlock. After my change #1619, it doesn't need its own resources.

So AFAIU this issue was not completely identified because partial new resource creation was allowed in the original commit

The problem will be gone if we don't make sync calls in async callbacks, so my fix #1619 should be the correct fix. Please take a look.

reddycharan pushed a commit to reddycharan/bookkeeper that referenced this pull request Oct 17, 2018
### Motivation

Fixes apache#1578

After getting the ZK callback on the ZK event thread, we need to jump to a background thread before doing the synchronous call to `admin.openLedgerNoRecovery(ledgerId);`, which will try to make a ZK request and wait for a response (which would be coming through the same ZK event thread that is currently blocked).

Author: Matteo Merli <mmerli@apache.org>

Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Sijie Guo <sijie@apache.org>

This closes apache#1608 from merlimat/fix-auditor-deadlock, closes apache#1578

(cherry picked from commit f782a9d)
Signed-off-by: Sijie Guo <sijie@apache.org>
(cherry picked from commit 51040cf)
Signed-off-by: JV Jujjuri <vjujjuri@salesforce.com>