[broker] Continue graceful shutdown even if web service closing fails #9835

Merged
1 commit merged on Mar 8, 2021

massakam
Contributor

@massakam massakam commented Mar 8, 2021

Motivation

Occasionally, closing the web service times out while the broker server is shutting down. The following error was logged when this happened:

13:38:20.200 [Thread-2] ERROR o.a.p.b.MessagingServiceShutdownHook - Failed to perform graceful shutdown, Exiting anyway
java.util.concurrent.ExecutionException: org.apache.pulsar.broker.PulsarServerException: org.apache.pulsar.broker.PulsarServerException: java.util.concurrent.TimeoutException
        at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
        at org.apache.pulsar.broker.MessagingServiceShutdownHook.run(MessagingServiceShutdownHook.java:69)
        at org.apache.pulsar.PulsarBrokerStarter$BrokerStarter.shutdown(PulsarBrokerStarter.java:294)
        at org.apache.pulsar.PulsarBrokerStarter.lambda$main$1(PulsarBrokerStarter.java:322)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.pulsar.broker.PulsarServerException: org.apache.pulsar.broker.PulsarServerException: java.util.concurrent.TimeoutException
        at org.apache.pulsar.broker.PulsarService.close(PulsarService.java:316)
        at org.apache.pulsar.broker.MessagingServiceShutdownHook.lambda$run$0(MessagingServiceShutdownHook.java:62)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        ... 1 common frames omitted
Caused by: org.apache.pulsar.broker.PulsarServerException: java.util.concurrent.TimeoutException
        at org.apache.pulsar.broker.web.WebService.close(WebService.java:203)
        at org.apache.pulsar.broker.PulsarService.close(PulsarService.java:229)
        ... 5 common frames omitted
Caused by: java.util.concurrent.TimeoutException: null
        at org.eclipse.jetty.util.FutureCallback.get(FutureCallback.java:130)
        at org.eclipse.jetty.util.FutureCallback.get(FutureCallback.java:30)
        at org.eclipse.jetty.server.handler.AbstractHandlerContainer.doShutdown(AbstractHandlerContainer.java:175)
        at org.eclipse.jetty.server.Server.doStop(Server.java:447)
        at org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:93)
        at org.apache.pulsar.broker.web.WebService.close(WebService.java:199)
        ... 6 common frames omitted

If this happens, the JVM process is killed without closing the broker service, so topics owned by this broker are moved to other brokers without their cursor information being persisted to the cursor ledgers. As a result, those cursors rewind to an older position and duplicate messages are delivered to the consumers.

Modifications

Even if an exception is thrown when closing the web service (i.e. the Jetty server), catch the exception and continue the graceful shutdown process. Judging from the Jetty source code, the stop process appears to complete even if an exception occurs while the Jetty server is being stopped.
https://github.com/eclipse/jetty.project/blob/jetty-9.4.33.v20201020/jetty-server/src/main/java/org/eclipse/jetty/server/Server.java#L427-L485
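
The idea can be illustrated with a minimal, self-contained sketch. This is not the actual patch: `GracefulShutdownSketch`, `Closer`, and `shutdown` are hypothetical names, and the component labels in `main` are only placeholders. It assumes that a failure to close the web service should be logged and then ignored so the remaining components still get closed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeoutException;

public class GracefulShutdownSketch {

    /** Hypothetical stand-in for any broker component that can fail on close. */
    public interface Closer {
        void close() throws Exception;
    }

    /**
     * Closes the web service first, then the remaining components in order.
     * Returns true if the web service closed without throwing.
     */
    public static boolean shutdown(Closer webService, Closer... remaining) throws Exception {
        boolean webServiceClosedCleanly = true;
        try {
            webService.close();
        } catch (Exception e) {
            // Log and keep going instead of aborting the whole shutdown.
            webServiceClosedCleanly = false;
            System.err.println("Web service closing failed, continuing shutdown: " + e);
        }
        // Failures in the remaining components still propagate to the caller.
        for (Closer c : remaining) {
            c.close();
        }
        return webServiceClosedCleanly;
    }

    public static void main(String[] args) throws Exception {
        List<String> closed = new ArrayList<>();
        boolean clean = shutdown(
                () -> { throw new TimeoutException("simulated Jetty stop timeout"); },
                () -> closed.add("brokerService"),        // placeholder label
                () -> closed.add("managedLedgerFactory")); // placeholder label
        System.out.println("web service closed cleanly: " + clean);
        System.out.println("closed afterwards: " + closed);
    }
}
```

In this sketch only the web service failure is swallowed; exceptions from later components still surface, which keeps the change narrowly scoped to the timeout described in the motivation.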

@massakam massakam added this to the 2.8.0 milestone Mar 8, 2021
@massakam massakam self-assigned this Mar 8, 2021
@merlimat merlimat added the type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages label Mar 8, 2021
@merlimat merlimat merged commit 4320e4a into apache:master Mar 8, 2021
@massakam massakam deleted the fix-graceful-shutdown branch March 9, 2021 01:05
@codelipenghui codelipenghui added the cherry-picked/branch-2.7 Archived: 2.7 is end of life label Mar 23, 2021
codelipenghui pushed a commit that referenced this pull request Mar 23, 2021
Labels
area/broker cherry-picked/branch-2.7 Archived: 2.7 is end of life release/2.7.2 type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages

4 participants