Bookkeeper shutdown when we stop ZK leader node - Pulsar V2.9.2 #3292
Comments
The restart is caused by the auto-recovery component of the bookies. In general, it is better to run auto-recovery as a separate service (it's completely stateless), rather than as part of the bookies.
Thanks, merlimat.
@GBM-tamerm yes, the auto-recovery process will still restart, though the bookie process won't do that anymore. It will not be a problem, since auto-recovery runs in the background and won't cause any disruption to existing clients.
But it is causing an issue, as shown in the exception trace in the bug report.
The same issue has been reported in the BookKeeper community.
@GBM-tamerm In bookies you need to disable auto-recovery by setting in
Then you can run auto-recovery as a separate stateless service:
I tried that now, but autorecovery is failing with the exception below:
2022-05-25T19:15:56,497-0400 [main] INFO org.apache.bookkeeper.common.component.ComponentStarter - Starting component autorecovery-server.
The auto-recovery component affects the bookie-server: if the ZK leader goes down, auto-recovery throws a connection-loss exception and then executes the shutdown hook. Auto-recovery does not handle connection loss correctly.
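To illustrate the point about connection loss: ZooKeeper treats ConnectionLoss as a recoverable error, and it is expected while a new leader is being elected, so one option is to retry the failed operation with backoff instead of letting it reach the shutdown hook. Below is a minimal Java sketch of that idea. It is not BookKeeper's actual code or the fix that was adopted. `ConnectionLossRetry`, `TransientConnectionLoss`, and `withRetry` are hypothetical names, and the exception class is a stand-in for `KeeperException.ConnectionLossException` so that the example has no ZooKeeper dependency.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, for illustration only; not part of BookKeeper or ZooKeeper.
class ConnectionLossRetry {

    /** Stand-in for a transient ZooKeeper ConnectionLoss error (assumption). */
    static class TransientConnectionLoss extends IOException {
        TransientConnectionLoss(String message) {
            super(message);
        }
    }

    /**
     * Runs op, retrying with exponential backoff while it fails with
     * TransientConnectionLoss. Any other exception propagates immediately.
     * After maxAttempts failed attempts, the last TransientConnectionLoss is rethrown.
     */
    static <T> T withRetry(Callable<T> op, int maxAttempts, long initialBackoffMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (TransientConnectionLoss e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up: the caller decides whether to shut down
                }
                Thread.sleep(backoffMs);
                // Double the delay each time, capped at 30 seconds.
                backoffMs = Math.min(backoffMs * 2, 30_000L);
            }
        }
    }
}
```

A caller would wrap the failing step, for example `withRetry(() -> createVote(), 5, 100L)`, where `createVote` is a hypothetical method standing in for the vote-creation call seen in the trace. Note that an operation interrupted by a connection loss may already have been applied on the server, so the wrapped step has to be safe to repeat.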
BUG REPORT
Describe the bug
When we stop the ZK leader node, a new election starts and ZK clients get disconnected. Any bookie node with auto-recovery running in the background is then shut down with the exception below:
2022-05-24T02:13:33,263-0400 [AuditorElector-10.119.33.232:3181] ERROR org.apache.bookkeeper.replication.AuditorElector - Exception while performing auditor election
java.io.IOException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ledgers/underreplication/auditorelection/V_0000000079
at org.apache.bookkeeper.meta.ZkLedgerAuditorManager.createMyVote(ZkLedgerAuditorManager.java:204) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.meta.ZkLedgerAuditorManager.tryToBecomeAuditor(ZkLedgerAuditorManager.java:98) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.replication.AuditorElector$3.run(AuditorElector.java:184) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
2022-05-24T02:13:33,362-0400 [AutoRecoveryDeathWatcher-3181] INFO org.apache.bookkeeper.replication.AutoRecoveryMain - AutoRecoveryDeathWatcher noticed the AutoRecovery is not running any more,exiting the watch loop!
2022-05-24T02:13:33,363-0400 [AutoRecoveryDeathWatcher-3181] ERROR org.apache.bookkeeper.common.component.ComponentStarter - Triggered exceptionHandler of Component: bookie-server because of Exception in Thread: Thread[AutoRecoveryDeathWatcher-3181,5,main]
java.lang.RuntimeException: AutoRecovery is not running any more
at org.apache.bookkeeper.replication.AutoRecoveryMain$AutoRecoveryDeathWatcher.run(AutoRecoveryMain.java:237) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
2022-05-24T02:13:33,364-0400 [component-shutdown-thread] INFO org.apache.bookkeeper.common.component.ComponentStarter - Closing component bookie-server in shutdown hook.
2022-05-24T02:13:34,072-0400 [component-shutdown-thread] INFO org.apache.bookkeeper.replication.ReplicationWorker - Shutting down replication worker
2022-05-24T02:13:34,072-0400 [component-shutdown-thread] INFO org.apache.bookkeeper.replication.ReplicationWorker - Shutting down ReplicationWorker
2022-05-24T02:13:34,073-0400 [ReplicationWorker] INFO org.apache.bookkeeper.replication.ReplicationWorker - ReplicationWorker exited loop!
2022-05-24T02:13:34,237-0400 [main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x500000042f40000
2022-05-24T02:13:34,238-0400 [component-shutdown-thread] INFO org.apache.bookkeeper.proto.BookieServer - Shutting down BookieServer
2022-05-24T02:13:34,238-0400 [component-shutdown-thread] INFO org.apache.bookkeeper.proto.BookieNettyServer - Shutting down BookieNettyServer
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Other running bookies should not be shut down.
Additional context
Pulsar V2.9.2
OS: Ubuntu 18.04
Java 8
Pulsar running as systemd service
6 brokers
6 bookies
5 ZK.