Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[broker] Increase timeout for loading topics #6750

Merged
merged 1 commit into from
May 7, 2020

Conversation

addisonj
Copy link
Contributor

In #6489, a timeout was introduced to make sure calls into the
BrokerService finish or error out. However, this timeout is too low by
default when loading topics that have many replicated clusters.

Loading replicated topics is quite an expensive operation, involve
global ZK lookups and the start of many sub-processes. While we would
hope it finishes in 60 seconds we want to safe.

Long term, it may make sense to break out this operation into more
steps where each step can have it's own timeout

@addisonj addisonj force-pushed the load-increase-timeout branch 2 times, most recently from b394070 to ed1a9ec Compare April 17, 2020 05:03
Copy link
Contributor

@codelipenghui codelipenghui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix. I just left a minor comment.

@@ -861,7 +861,8 @@ public PulsarAdmin getClusterPulsarAdmin(String cluster) {
protected CompletableFuture<Optional<Topic>> loadOrCreatePersistentTopic(final String topic,
boolean createIfMissing) throws RuntimeException {
checkTopicNsOwnership(topic);
final CompletableFuture<Optional<Topic>> topicFuture = futureWithDeadline();
// this timeout needs to be extra long in the case of topics with many replication clusters
final CompletableFuture<Optional<Topic>> topicFuture = futureWithDeadline(5L, TimeUnit.MINUTES, new TimeoutException("Failed to load topic within timeout"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to make the timeout of the topic loading configurable in the broker.conf

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 we should make this setting configurable.

In apache#6489, a timeout was introduced to make sure calls into the
BrokerService finish or error out. However, this timeout is too low by
default when loading topics that have many replicated clusters.

Loading replicated topics is quite an expensive operation, involve
global ZK lookups and the start of many sub-processes. While we would
hope it finishes in 60 seconds we want to safe.

Long term, it may make sense to break out this operation into more
steps where each step can have it's own timeout
@addisonj
Copy link
Contributor Author

addisonj commented May 5, 2020

@sijie @codelipenghui added the option, let me know if there is anything else!

@addisonj
Copy link
Contributor Author

addisonj commented May 5, 2020

/pulsarbot run-failure-checks

@codelipenghui
Copy link
Contributor

@addisonj Could you please help add the configuration to the broker.conf? So that users can find it easier. The change looks good to me.

@sijie sijie merged commit 6854b00 into apache:master May 7, 2020
jiazhai pushed a commit that referenced this pull request May 8, 2020
In #6489, a timeout was introduced to make sure calls into the
BrokerService finish or error out. However, this timeout is too low by
default when loading topics that have many replicated clusters.

Loading replicated topics is quite an expensive operation, involve
global ZK lookups and the start of many sub-processes. While we would
hope it finishes in 60 seconds we want to safe.

Long term, it may make sense to break out this operation into more
steps where each step can have it's own timeout

Co-authored-by: Addison Higham <ahigham@instructure.com>(cherry picked from commit 6854b00)
Huanli-Meng pushed a commit to Huanli-Meng/pulsar that referenced this pull request May 27, 2020
In apache#6489, a timeout was introduced to make sure calls into the
BrokerService finish or error out. However, this timeout is too low by
default when loading topics that have many replicated clusters.

Loading replicated topics is quite an expensive operation, involve
global ZK lookups and the start of many sub-processes. While we would
hope it finishes in 60 seconds we want to safe.

Long term, it may make sense to break out this operation into more
steps where each step can have it's own timeout

Co-authored-by: Addison Higham <ahigham@instructure.com>
huangdx0726 pushed a commit to huangdx0726/pulsar that referenced this pull request Aug 24, 2020
In apache#6489, a timeout was introduced to make sure calls into the
BrokerService finish or error out. However, this timeout is too low by
default when loading topics that have many replicated clusters.

Loading replicated topics is quite an expensive operation, involve
global ZK lookups and the start of many sub-processes. While we would
hope it finishes in 60 seconds we want to safe.

Long term, it may make sense to break out this operation into more
steps where each step can have it's own timeout

Co-authored-by: Addison Higham <ahigham@instructure.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants