Bookie/BookieServer components shutdown will fail to end/exit the BookieProcess #1540

Closed
reddycharan opened this issue Jul 12, 2018 · 0 comments

Comments

@reddycharan
Contributor

BUG REPORT

  1. Please describe the issue you observed:

If Bookie/BookieServer components are shut down internally because of a fatal error (ExitCode: INVALID_CONF, SERVER_EXCEPTION, ZK_EXPIRED, ZK_REG_FAIL, BOOKIE_EXCEPTION), the shutdown logic runs and stops the components internal to Bookie/BookieServer, but it does not bring down the bookie process.

This is because BookieServer.main / server.Main.doMain waits for the startComponent future to complete: http://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/server/Main.java#L227 . The startComponent future is marked complete only in the runtime shutdown hook: https://github.com/apache/bookkeeper/blob/master/bookkeeper-common/src/main/java/org/apache/bookkeeper/common/component/ComponentStarter.java#L66 .

The problem is that nothing in the Bookie/BookieProcess shutdown path calls System.exit(), so the runtime shutdown hook never runs and the startComponent future is never marked complete. As a result, Main.doMain waits forever on this future even though the Bookie/BookieServer components have been shut down because of a known fatal error.
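
The hang described above can be reproduced in isolation. The following is a minimal sketch, not the BookKeeper code: `FakeComponent`, `shutdownInternally`, and `mainWouldStillBlock` are hypothetical names invented for illustration, and `startComponent` only imitates the pattern of completing a future from a JVM shutdown hook.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ShutdownHangSketch {

    /** Hypothetical stand-in for a lifecycle component such as the bookie server. */
    static class FakeComponent {
        private volatile boolean running = true;

        // Simulates an internal shutdown after a fatal error: the component
        // stops, but nothing here calls System.exit().
        void shutdownInternally() {
            running = false;
        }

        boolean isRunning() {
            return running;
        }
    }

    /** Imitates the reported pattern: the future is completed only from a shutdown hook. */
    static CompletableFuture<Void> startComponent(FakeComponent component) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            component.shutdownInternally();
            future.complete(null);
        }, "component-shutdown-hook"));
        return future;
    }

    /** Returns true if waiting on the future timed out, i.e. main would still be blocked. */
    static boolean mainWouldStillBlock(long timeoutMillis) throws Exception {
        FakeComponent component = new FakeComponent();
        CompletableFuture<Void> future = startComponent(component);
        component.shutdownInternally(); // fatal-error path: the component is down
        try {
            // The real main thread waits without a timeout; a timeout is used
            // here only so the sketch terminates.
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true; // nobody completed the future
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("main still blocked: " + mainWouldStillBlock(200));
    }
}
```

With an untimed `get()`, as in the stack trace in this report, the main thread would park indefinitely because the shutdown hook only runs once the JVM begins to exit.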

The following is the stack trace of the main thread, which is waiting forever
(line numbers might not match, since we are running slightly older code):

main - priority:5 - threadId:0x00007fd60000d000 - nativeId:0x2d37b - state:WAITING
stackTrace:
java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00007fcddf015af0> (a java.util.concurrent.CompletableFuture$Signaller)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
    at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
    at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
    at org.apache.bookkeeper.server.Main.doMain(Main.java:215)
    at org.apache.bookkeeper.server.Main.main(Main.java:189)
    at org.apache.bookkeeper.proto.BookieServer.main(BookieServer.java:256)

Note: This is a regression, which must have been introduced along with the lifecycle components.

sijie added a commit to sijie/bookkeeper that referenced this issue Jul 14, 2018
…exit the BookieProcess

 ### Motivation

Fixes the issue at apache#1540.

If Bookie/BookieServer components are shut down internally because of a fatal error
(ExitCode: INVALID_CONF, SERVER_EXCEPTION, ZK_EXPIRED, ZK_REG_FAIL, BOOKIE_EXCEPTION), the
shutdown logic runs and stops the components internal to Bookie/BookieServer, but it does not
bring down the bookie process.

This is because BookieServer.main / server.Main.doMain waits for the startComponent future
to complete:
http://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/server/Main.java#L227 .
The startComponent future is marked complete only in the runtime shutdown hook:
https://github.com/apache/bookkeeper/blob/master/bookkeeper-common/src/main/java/org/apache/bookkeeper/common/component/ComponentStarter.java#L66 .

The problem is that nothing in the Bookie/BookieProcess shutdown path calls System.exit(), so
the runtime shutdown hook never runs and the startComponent future is never marked complete. As a
result, Main.doMain waits forever on this future even though the Bookie/BookieServer components
have been shut down because of a known fatal error.

 ### Regression

Issue apache#508 introduced this regression. Before that change, the main thread blocked on `BookieServer#join()`.
When the bookie died for any reason, the DeathWatchThread killed the bookie and the bookie server, so the main thread quit.
After apache#508, however, lifecycle management is disconnected from the bookie and the bookie server, so when they die,
lifecycle management is unaware of it and the main thread doesn't quit.

 ### Changes

- Add `UncaughtExceptionHandler` to lifecycle components
- When a lifecycle component hits an error, it can use the `UncaughtExceptionHandler` to notify the lifecycle component stack to shut down the whole stack
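
The idea behind these changes can be sketched as follows. This is an illustrative sketch only, not the code from the actual commit: `FakeComponent`, `setExceptionHandler`, `fail`, and `mainUnblocksOnFailure` are hypothetical names, and the only real API relied on is the JDK's `Thread.UncaughtExceptionHandler` and `CompletableFuture`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class HandlerSketch {

    /** Hypothetical lifecycle component that reports fatal errors to a handler. */
    static class FakeComponent {
        private volatile Thread.UncaughtExceptionHandler handler;
        private volatile boolean running = true;

        void setExceptionHandler(Thread.UncaughtExceptionHandler handler) {
            this.handler = handler;
        }

        // Simulates a fatal error inside the component: instead of dying
        // silently, it notifies whoever registered a handler.
        void fail(Throwable cause) {
            Thread.UncaughtExceptionHandler h = handler;
            if (h != null) {
                h.uncaughtException(Thread.currentThread(), cause);
            }
        }

        void close() {
            running = false;
        }

        boolean isRunning() {
            return running;
        }
    }

    /** Wires the handler so that a component failure shuts down the stack and completes the future. */
    static CompletableFuture<Void> startComponent(FakeComponent component) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        component.setExceptionHandler((thread, cause) -> {
            component.close();     // shut down the (one-element) component stack
            future.complete(null); // release the thread waiting in main
        });
        return future;
    }

    /** Returns true if the waiting thread is released after a simulated fatal error. */
    static boolean mainUnblocksOnFailure() throws Exception {
        FakeComponent component = new FakeComponent();
        CompletableFuture<Void> future = startComponent(component);
        Thread worker = new Thread(
                () -> component.fail(new IllegalStateException("simulated fatal error")));
        worker.start();
        future.get(5, TimeUnit.SECONDS); // throws TimeoutException if never completed
        worker.join();
        return future.isDone() && !component.isRunning();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("main unblocked: " + mainUnblocksOnFailure());
    }
}
```

The design point is that the future no longer depends solely on a JVM shutdown hook: a failing component has a path back to the code that owns the future.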
@sijie sijie self-assigned this Jul 18, 2018
@sijie sijie added this to the 4.8.0 milestone Jul 23, 2018
@sijie sijie closed this as completed in 50f29ed Jul 23, 2018
sijie added a commit that referenced this issue Jul 23, 2018
… exit the BookieProcess

Master Issue: #1540

Author: Sijie Guo <sijie@apache.org>

Reviewers: Andrey Yegorov <None>, Charan Reddy Guttapalem <reddycharan18@gmail.com>, Enrico Olivelli <eolivelli@gmail.com>

This closes #1543 from sijie/fix_lifcycle_components, closes #1540

(cherry picked from commit 50f29ed)
Signed-off-by: Sijie Guo <sijie@apache.org>
reddycharan pushed a commit to reddycharan/bookkeeper that referenced this issue Jul 24, 2018
…hutdown will fail to end exit the BookieProcess
reddycharan added a commit to reddycharan/bookkeeper that referenced this issue Jul 24, 2018
…hutdown will fail to end exit the BookieProcess

- Resolve compilation failures. This is needed because
we are cherry-picking the community fix, which depends on
another change that we haven't brought into our repo.
reddycharan pushed a commit to reddycharan/bookkeeper that referenced this issue Jul 24, 2018
…hutdown will fail to end exit the BookieProcess
reddycharan added a commit to reddycharan/bookkeeper that referenced this issue Jul 24, 2018
…hutdown will fail to end exit the BookieProcess
reddycharan pushed a commit to reddycharan/bookkeeper that referenced this issue Aug 2, 2018
…hutdown will fail to end exit the BookieProcess
reddycharan added a commit to reddycharan/bookkeeper that referenced this issue Aug 2, 2018
…hutdown will fail to end exit the BookieProcess