Bookie/BookieServer components shutdown will fail to end/exit the BookieProcess #1540
sijie added a commit to sijie/bookkeeper that referenced this issue on Jul 14, 2018

…exit the BookieProcess

### Motivation

Fixes the issue at apache#1540.

If Bookie/BookieServer components are shut down internally because of a fatal error (ExitCode: INVALID_CONF, SERVER_EXCEPTION, ZK_EXPIRED, ZK_REG_FAIL, BOOKIE_EXCEPTION), the shutdown logic runs and shuts down the components internal to Bookie/BookieServer, but it does not bring down the bookie process.

This is because BookieServer.main / server.Main.doMain waits for the startComponent future to complete: http://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/server/Main.java#L227 . The startComponent future is marked complete only in the runtime shutdown hook: https://github.com/apache/bookkeeper/blob/master/bookkeeper-common/src/main/java/org/apache/bookkeeper/common/component/ComponentStarter.java#L66.

The problem is that nowhere in the Bookie/BookieProcess shutdown path do we call System.exit(), so the runtime shutdown hook never runs and the startComponent future is never marked complete. As a result, Main.doMain waits forever on this future even though the Bookie/BookieServer components have already shut down because of a known fatal error.

### Regression

Issue apache#508 introduced this regression. Before that change, the main thread blocked on `BookieServer#join()`. When the bookie died for any reason, the DeathWatchThread killed the bookie and the bookie server, so the main thread would quit. After apache#508, lifecycle management is disconnected from the bookie and the bookie server, so when they die, lifecycle management is unaware of it and the main thread doesn't quit.

### Changes

- Add `UncaughtExceptionHandler` to lifecycle components
- When a lifecycle component hits an error, it can use `UncaughtExceptionHandler` to notify the lifecycle component stack to shut down the whole stack
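The change described in the commit message above can be illustrated with a minimal sketch. It is not the actual patch: `Component`, `ComponentStack`, `fail`, and `stoppedFuture` are hypothetical names invented for this example, and only `Thread.UncaughtExceptionHandler` and `CompletableFuture` are real JDK types. The idea shown is that a failing component reports the error to a handler installed by the stack, and the handler shuts down the whole stack and completes the future that the main thread waits on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

final class LifecycleHandlerSketch {

    /** Hypothetical minimal lifecycle component; not BookKeeper's real interface. */
    static final class Component {
        private volatile Thread.UncaughtExceptionHandler handler;
        private volatile boolean running;

        void setExceptionHandler(Thread.UncaughtExceptionHandler h) { handler = h; }
        void start() { running = true; }
        void stop() { running = false; }
        boolean isRunning() { return running; }

        /** Called by the component itself when it hits a fatal error. */
        void fail(Throwable cause) {
            Thread.UncaughtExceptionHandler h = handler;
            if (h != null) {
                h.uncaughtException(Thread.currentThread(), cause);
            }
        }
    }

    /** Hypothetical component stack that owns the future the main thread waits on. */
    static final class ComponentStack {
        // Plain ArrayList: this sketch registers components from one thread only.
        private final List<Component> components = new ArrayList<>();
        private final CompletableFuture<Void> stopped = new CompletableFuture<>();

        void add(Component c) {
            // Any component error shuts down the whole stack.
            c.setExceptionHandler((thread, error) -> shutdown());
            components.add(c);
        }

        void start() { components.forEach(Component::start); }

        void shutdown() {
            // Stop components in reverse start order, then release the waiter.
            for (int i = components.size() - 1; i >= 0; i--) {
                components.get(i).stop();
            }
            stopped.complete(null);
        }

        CompletableFuture<Void> stoppedFuture() { return stopped; }
    }

    /** Returns true if a simulated fatal error releases the waiting caller. */
    static boolean mainReturnsAfterFailure() {
        ComponentStack stack = new ComponentStack();
        Component stats = new Component();
        Component bookie = new Component();
        stack.add(stats);
        stack.add(bookie);
        stack.start();

        bookie.fail(new IllegalStateException("simulated fatal error"));
        try {
            stack.stoppedFuture().get(1, TimeUnit.SECONDS);
        } catch (Exception e) {
            return false; // the future was not completed in time
        }
        return !bookie.isRunning() && !stats.isRunning();
    }

    public static void main(String[] args) {
        System.out.println("main returns after failure: " + mainReturnsAfterFailure());
    }
}
```

With this wiring the waiter no longer depends on `System.exit()` or on the JVM shutdown hook: the error path itself completes the future.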
sijie added a commit that referenced this issue on Jul 23, 2018

… exit the BookieProcess

Master Issue: #1540

Author: Sijie Guo <sijie@apache.org>

Reviewers: Andrey Yegorov <None>, Charan Reddy Guttapalem <reddycharan18@gmail.com>, Enrico Olivelli <eolivelli@gmail.com>

This closes #1543 from sijie/fix_lifcycle_components, closes #1540

(cherry picked from commit 50f29ed)
Signed-off-by: Sijie Guo <sijie@apache.org>
reddycharan pushed a commit to reddycharan/bookkeeper that referenced this issue on Jul 24, 2018

…hutdown will fail to end exit the BookieProcess
reddycharan added a commit to reddycharan/bookkeeper that referenced this issue on Jul 24, 2018

…hutdown will fail to end exit the BookieProcess - resolve compilation failures. This is needed because we are cherry-picking a community fix that depends on another change we haven't yet brought into our repo.
reddycharan pushed a commit to reddycharan/bookkeeper that referenced this issue on Aug 2, 2018

…hutdown will fail to end exit the BookieProcess
reddycharan added a commit to reddycharan/bookkeeper that referenced this issue on Aug 2, 2018

…hutdown will fail to end exit the BookieProcess - resolve compilation failures. This is needed because we are cherry-picking a community fix that depends on another change we haven't yet brought into our repo.
BUG REPORT
If Bookie/BookieServer components are shut down internally because of a fatal error (ExitCode: INVALID_CONF, SERVER_EXCEPTION, ZK_EXPIRED, ZK_REG_FAIL, BOOKIE_EXCEPTION), the shutdown logic runs and shuts down the components internal to Bookie/BookieServer, but it does not bring down the bookie process.
This is because BookieServer.main / server.Main.doMain waits for the startComponent future to complete: http://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/server/Main.java#L227 . The startComponent future is marked complete only in the runtime shutdown hook: https://github.com/apache/bookkeeper/blob/master/bookkeeper-common/src/main/java/org/apache/bookkeeper/common/component/ComponentStarter.java#L66.
The problem is that nowhere in the Bookie/BookieProcess shutdown path do we call System.exit(), so the runtime shutdown hook never runs and the startComponent future is never marked complete. As a result, Main.doMain waits forever on this future even though the Bookie/BookieServer components have already shut down because of a known fatal error.
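The hang can be reproduced in isolation with a small sketch. This is not BookKeeper code: `FakeComponent`, `shutdownInternally`, and `mainWouldHang` are hypothetical names, and a bounded `get` with a timeout stands in for the unbounded wait in `Main.doMain` so the example terminates. It shows that a future completed only from a JVM shutdown hook stays incomplete when a component stops itself without calling `System.exit()`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ShutdownHookHangSketch {

    /** Hypothetical stand-in for a Bookie/BookieServer component. */
    static final class FakeComponent {
        private volatile boolean running = true;

        // Internal shutdown after a fatal error: stops the component but
        // never calls System.exit() and never touches the future.
        void shutdownInternally() { running = false; }

        boolean isRunning() { return running; }
    }

    /**
     * Returns true if the future the main thread waits on is still
     * incomplete after the component has shut itself down.
     */
    static boolean mainWouldHang(long waitMillis) {
        FakeComponent component = new FakeComponent();
        CompletableFuture<Void> startFuture = new CompletableFuture<>();

        // As in the report: the only code that completes the future
        // lives in a JVM shutdown hook.
        Thread hook = new Thread(() -> startFuture.complete(null));
        Runtime.getRuntime().addShutdownHook(hook);
        try {
            component.shutdownInternally();
            // Bounded wait so the sketch terminates; the real wait is unbounded.
            startFuture.get(waitMillis, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true; // nothing completed the future: main would block forever
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            // Unregister the hook so repeated calls do not accumulate hooks.
            Runtime.getRuntime().removeShutdownHook(hook);
        }
    }

    public static void main(String[] args) {
        System.out.println("main would hang: " + mainWouldHang(200));
    }
}
```

The component is stopped, yet no code path runs the hook, because the JVM only starts shutdown hooks when the process begins to exit.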
The following is the stack trace of the main thread, which is waiting forever
(line numbers might not match, since we are running slightly older code):
main - priority:5 - threadId:0x00007fd60000d000 - nativeId:0x2d37b - state:WAITING
stackTrace:
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at org.apache.bookkeeper.server.Main.doMain(Main.java:215)
at org.apache.bookkeeper.server.Main.main(Main.java:189)
at org.apache.bookkeeper.proto.BookieServer.main(BookieServer.java:256)
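A similar view of parked threads can be captured from inside a JVM. The sketch below is only an illustration of that diagnostic step, using the JDK's `Thread.getAllStackTraces()`; the class name, the `dumpWaitingThreads` and `demo` helpers, and the thread name `blocked-main-standin` are made up for this example, and the output format only loosely mimics the dump above.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

final class WaitingThreadDump {

    /** Renders every thread currently in the WAITING state with its stack. */
    static String dumpWaitingThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            if (t.getState() != Thread.State.WAITING) {
                continue;
            }
            out.append(t.getName()).append(" - state:").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    /** Parks a stand-in thread on a future nobody completes, then dumps it. */
    static boolean demo() {
        CompletableFuture<Void> never = new CompletableFuture<>();
        Thread blocked = new Thread(() -> {
            try {
                never.get(); // same kind of unbounded wait as in the report
            } catch (Exception ignored) {
                // not expected in this sketch
            }
        }, "blocked-main-standin");
        blocked.setDaemon(true);
        blocked.start();
        try {
            // Poll until the stand-in thread has actually parked.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (blocked.getState() != Thread.State.WAITING && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            return dumpWaitingThreads().contains("blocked-main-standin");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            never.complete(null); // release the stand-in thread
        }
    }

    public static void main(String[] args) {
        System.out.print(dumpWaitingThreads());
    }
}
```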
Note: This is a regression, which must have been introduced along with the lifecycle components.