Introduce lifecycle components for managing components in bookie server #508
Comments
This was referenced Sep 13, 2017
sijie
added a commit
to sijie/bookkeeper
that referenced
this issue
Jul 14, 2018
…exit the BookieProcess

### Motivation

Fixes the issue at apache#1540.

If Bookie/BookieServer components are shut down internally because of a fatal error (ExitCode: INVALID_CONF, SERVER_EXCEPTION, ZK_EXPIRED, ZK_REG_FAIL, BOOKIE_EXCEPTION), the shutdown method logic runs and shuts down the components internal to Bookie/BookieServer, but it does not bring down the bookie process. This is because BookieServer.main / server.Main.doMain waits for the startComponent future to complete: http://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/server/Main.java#L227 . The startComponent future is marked complete only in the runtime shutdown hook: https://github.com/apache/bookkeeper/blob/master/bookkeeper-common/src/main/java/org/apache/bookkeeper/common/component/ComponentStarter.java#L66. However, nothing in the Bookie/BookieProcess shutdown path calls System.exit(), so the runtime shutdown hook never runs and the startComponent future is never marked complete. Hence Main.doMain waits forever on this future even though the Bookie/BookieServer components have been shut down because of known fatal errors.

### Regression

Issue apache#508 introduced this regression. Before that change, the main thread blocked on `BookieServer#join()`. When the bookie died for any reason, the DeathWatchThread killed the bookie and the bookie server, so the main thread quit. After apache#508, lifecycle management is disconnected from the bookie and bookie server, so when they die, lifecycle management is unaware of it and the main thread does not quit.

### Changes

- Add `UncaughtExceptionHandler` to lifecycle components
- When a lifecycle component hits an error, it can use `UncaughtExceptionHandler` to notify the lifecycle component stack to shut down the whole stack
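The hang and the fix described in this commit message can be illustrated with a small, self-contained Java sketch. Every name here (`FailureAwareComponent`, `Starter`, `fail`) is hypothetical and chosen for illustration only; this is not the BookKeeper implementation. It assumes the main thread waits on a `CompletableFuture`, and that a component reporting a fatal error through its `UncaughtExceptionHandler` is what stops the component and completes that future.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical component that can report fatal errors to whoever started it.
final class FailureAwareComponent {
    private volatile Thread.UncaughtExceptionHandler handler;
    private volatile boolean running;

    void setExceptionHandler(Thread.UncaughtExceptionHandler handler) {
        this.handler = handler;
    }

    void start() { running = true; }

    void stop() { running = false; }

    boolean isRunning() { return running; }

    // Called by the component itself when it hits a fatal error.
    void fail(Throwable cause) {
        Thread.UncaughtExceptionHandler h = handler;
        if (h != null) {
            h.uncaughtException(Thread.currentThread(), cause);
        }
    }
}

// Hypothetical starter: returns the future that the main thread waits on.
final class Starter {
    static CompletableFuture<Void> start(FailureAwareComponent component) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        // Without this handler, an internal failure would never complete
        // the future and the waiting thread would block forever.
        component.setExceptionHandler((thread, error) -> {
            component.stop();
            done.complete(null);
        });
        component.start();
        return done;
    }
}
```

A caller would block on the returned future (for example with `get()`); once the component calls `fail(...)`, the future completes and the caller can return, which lets the process exit.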
sijie
added a commit
that referenced
this issue
Jul 23, 2018
… exit the BookieProcess
Master Issue: #1540
Author: Sijie Guo <sijie@apache.org>
Reviewers: Andrey Yegorov <None>, Charan Reddy Guttapalem <reddycharan18@gmail.com>, Enrico Olivelli <eolivelli@gmail.com>
This closes #1543 from sijie/fix_lifcycle_components, closes #1540
sijie
added a commit
that referenced
this issue
Jul 23, 2018
… exit the BookieProcess
Master Issue: #1540
Author: Sijie Guo <sijie@apache.org>
Reviewers: Andrey Yegorov <None>, Charan Reddy Guttapalem <reddycharan18@gmail.com>, Enrico Olivelli <eolivelli@gmail.com>
This closes #1543 from sijie/fix_lifcycle_components, closes #1540
(cherry picked from commit 50f29ed)
Signed-off-by: Sijie Guo <sijie@apache.org>
reddycharan
pushed a commit
to reddycharan/bookkeeper
that referenced
this issue
Jul 24, 2018
…hutdown will fail to end exit the BookieProcess
Master Issue: apache#1540
Author: Sijie Guo <sijie@apache.org>
Reviewers: Andrey Yegorov <None>, Charan Reddy Guttapalem <reddycharan18@gmail.com>, Enrico Olivelli <eolivelli@gmail.com>
This closes apache#1543 from sijie/fix_lifcycle_components, closes apache#1540
reddycharan
pushed a commit
to reddycharan/bookkeeper
that referenced
this issue
Aug 2, 2018
…hutdown will fail to end exit the BookieProcess
Master Issue: apache#1540
Author: Sijie Guo <sijie@apache.org>
Reviewers: Andrey Yegorov <None>, Charan Reddy Guttapalem <reddycharan18@gmail.com>, Enrico Olivelli <eolivelli@gmail.com>
This closes apache#1543 from sijie/fix_lifcycle_components, closes apache#1540
FEATURE REQUEST
In a bookie server, we have multiple components to run, including:
Managing the components that run in a bookie has become a bit messy. This issue proposes introducing a lifecycle component for each service component, so that we can run these components in a clear way.
This will also help with #491 when we introduce a metadata RPC component.
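As a rough illustration of the idea, a lifecycle abstraction could look like the Java sketch below. The names `Lifecycle`, `LifecycleStack`, and `RecordingComponent` are hypothetical and are not taken from the BookKeeper code base. The sketch assumes that components are started in the order they were added and stopped in reverse order, which is one common convention rather than something this issue specifies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal lifecycle contract shared by every service component.
interface Lifecycle {
    String name();
    void start();
    void stop();
}

// Hypothetical container that runs a group of components as one unit.
final class LifecycleStack implements Lifecycle {
    private final String name;
    private final List<Lifecycle> components = new ArrayList<>();

    LifecycleStack(String name) {
        this.name = name;
    }

    LifecycleStack add(Lifecycle component) {
        components.add(component);
        return this;
    }

    @Override
    public String name() { return name; }

    // Start components in the order they were added.
    @Override
    public void start() {
        for (Lifecycle component : components) {
            component.start();
        }
    }

    // Stop components in reverse order, so later ones go down first.
    @Override
    public void stop() {
        for (int i = components.size() - 1; i >= 0; i--) {
            components.get(i).stop();
        }
    }
}

// Hypothetical component that only records its lifecycle transitions.
final class RecordingComponent implements Lifecycle {
    private final String name;
    private final List<String> log;

    RecordingComponent(String name, List<String> log) {
        this.name = name;
        this.log = log;
    }

    @Override
    public String name() { return name; }

    @Override
    public void start() { log.add("start:" + name); }

    @Override
    public void stop() { log.add("stop:" + name); }
}
```

With this shape, the server entry point only needs to build one stack and call `start()` and `stop()` on it, instead of managing each service separately.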
should-have
N/A