[broker] Increase timeout for loading topics #6750

addisonj · 2020-04-17T04:52:41Z

In #6489, a timeout was introduced to make sure calls into the
BrokerService finish or error out. However, this timeout is too low by
default when loading topics that have many replicated clusters.

Loading replicated topics is quite an expensive operation, involving
global ZK lookups and the start of many sub-processes. While we would
hope it finishes in 60 seconds, we want to be safe.
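For readers unfamiliar with the pattern, here is a minimal sketch of a future that errors out if nothing completes it before a deadline. It is not the broker's actual `futureWithDeadline` implementation; the class name `DeadlineFutures`, the method `newFutureWithDeadline`, and the single-thread timer are assumptions made for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Pulsar: illustrates a deadline-bound future.
public class DeadlineFutures {
    // One daemon thread whose only job is to fire timeouts.
    private static final ScheduledExecutorService TIMER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "deadline-timer");
                t.setDaemon(true);
                return t;
            });

    /**
     * Returns a future that fails with onTimeout unless something
     * completes it within the given delay.
     */
    public static <T> CompletableFuture<T> newFutureWithDeadline(
            long delay, TimeUnit unit, Exception onTimeout) {
        CompletableFuture<T> future = new CompletableFuture<>();
        ScheduledFuture<?> timer = TIMER.schedule(
                () -> { future.completeExceptionally(onTimeout); }, delay, unit);
        // Once the future settles either way, the pending timer is no longer needed.
        future.whenComplete((value, error) -> timer.cancel(false));
        return future;
    }
}
```

With this shape, raising the limit for an expensive load only means passing a larger `delay`; the caller still sees either a result or an error, never a future that hangs forever.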

Long term, it may make sense to break this operation into more
steps, where each step can have its own timeout.
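As a rough illustration of the per-step idea, the sketch below chains three placeholder steps, each with its own time budget. The step names, the budgets, and the `SteppedLoad` class are all hypothetical, and it relies on `CompletableFuture.orTimeout`, which requires Java 9 or later; nothing here reflects how the broker actually structures topic loading.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch: each stage of a multi-stage load gets its own timeout.
public class SteppedLoad {
    /** Runs one step asynchronously and fails it with TimeoutException past its budget. */
    public static <T> CompletableFuture<T> step(Supplier<T> work, long timeout, TimeUnit unit) {
        return CompletableFuture.supplyAsync(work).orTimeout(timeout, unit);
    }

    /** Placeholder pipeline; the strings stand in for real intermediate results. */
    public static CompletableFuture<String> loadTopic(String topic) {
        return step(() -> "policies:" + topic, 10, TimeUnit.SECONDS)
                .thenCompose(policies ->
                        step(() -> policies + "|ledger", 30, TimeUnit.SECONDS))
                .thenCompose(ledger ->
                        step(() -> ledger + "|replicators", 60, TimeUnit.SECONDS));
    }
}
```

One benefit of this layout is that a timeout identifies which stage was slow, rather than only reporting that the whole load exceeded a single large limit.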

codelipenghui

Thanks for the fix. I just left a minor comment.

codelipenghui · 2020-04-17T06:57:52Z

pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java

@@ -861,7 +861,8 @@ public PulsarAdmin getClusterPulsarAdmin(String cluster) {
    protected CompletableFuture<Optional<Topic>> loadOrCreatePersistentTopic(final String topic,
            boolean createIfMissing) throws RuntimeException {
        checkTopicNsOwnership(topic);
-        final CompletableFuture<Optional<Topic>> topicFuture = futureWithDeadline();
+        // this timeout needs to be extra long in the case of topics with many replication clusters
+        final CompletableFuture<Optional<Topic>> topicFuture = futureWithDeadline(5L, TimeUnit.MINUTES, new TimeoutException("Failed to load topic within timeout"));


It would be better to make the topic-loading timeout configurable in broker.conf.

sijie · 2020-04-21

+1, we should make this setting configurable.
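To make the suggestion above concrete, here is a minimal sketch of reading such a timeout from properties-style configuration with a fallback. The key name `topicLoadTimeoutSeconds`, the 60-second default, and the `TopicLoadTimeoutConfig` class are assumptions for illustration; this thread does not state the name or default the final change uses.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch of a configurable topic-load timeout.
public class TopicLoadTimeoutConfig {
    // Assumed key name and default; neither is confirmed by this thread.
    static final String KEY = "topicLoadTimeoutSeconds";
    static final long DEFAULT_SECONDS = 60;

    /** Parses properties-style text such as the contents of a .conf file. */
    public static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the configured timeout, or the default when the key is absent or blank. */
    public static long topicLoadTimeoutSeconds(Properties conf) {
        String raw = conf.getProperty(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_SECONDS;
        }
        long seconds = Long.parseLong(raw.trim());
        if (seconds <= 0) {
            throw new IllegalArgumentException(KEY + " must be positive, got " + seconds);
        }
        return seconds;
    }
}
```

An operator with many replication clusters could then raise the value in configuration (for example `topicLoadTimeoutSeconds=300`) without a code change, while other deployments keep the shorter default.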

addisonj · 2020-05-05T20:45:16Z

@sijie @codelipenghui added the option, let me know if there is anything else!

addisonj · 2020-05-05T21:53:11Z

/pulsarbot run-failure-checks

codelipenghui · 2020-05-07T06:40:59Z

@addisonj Could you please also add the configuration to broker.conf, so that users can find it more easily? The change looks good to me.

In #6489, a timeout was introduced to make sure calls into the BrokerService finish or error out. However, this timeout is too low by default when loading topics that have many replicated clusters. Loading replicated topics is quite an expensive operation, involving global ZK lookups and the start of many sub-processes. While we would hope it finishes in 60 seconds, we want to be safe. Long term, it may make sense to break this operation into more steps, where each step can have its own timeout.

Co-authored-by: Addison Higham <ahigham@instructure.com>
(cherry picked from commit 6854b00)

addisonj force-pushed the load-increase-timeout branch 2 times, most recently from b394070 to ed1a9ec Compare April 17, 2020 05:03

codelipenghui reviewed Apr 17, 2020

codelipenghui requested review from sijie, jiazhai and merlimat April 17, 2020 06:58

codelipenghui assigned addisonj Apr 17, 2020

codelipenghui added this to the 2.6.0 milestone Apr 17, 2020

jiazhai added the release/2.5.2 label Apr 17, 2020

addisonj force-pushed the load-increase-timeout branch from ed1a9ec to 209e6d0 Compare May 5, 2020 20:42

codelipenghui approved these changes May 7, 2020

sijie approved these changes May 7, 2020

sijie merged commit 6854b00 into apache:master May 7, 2020
