[broker] Increase timeout for loading topics #6750
Conversation
Force-pushed from b394070 to ed1a9ec
Thanks for the fix. I just left a minor comment.
@@ -861,7 +861,8 @@ public PulsarAdmin getClusterPulsarAdmin(String cluster) {
     protected CompletableFuture<Optional<Topic>> loadOrCreatePersistentTopic(final String topic,
             boolean createIfMissing) throws RuntimeException {
         checkTopicNsOwnership(topic);
-        final CompletableFuture<Optional<Topic>> topicFuture = futureWithDeadline();
+        // this timeout needs to be extra long in the case of topics with many replication clusters
+        final CompletableFuture<Optional<Topic>> topicFuture = futureWithDeadline(5L, TimeUnit.MINUTES, new TimeoutException("Failed to load topic within timeout"));
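For context, a minimal sketch of what a `futureWithDeadline` helper like the one above might look like (this is an illustration, not Pulsar's actual implementation): the returned future completes exceptionally with the supplied cause unless the caller completes it first.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class Deadlines {
    // Single daemon thread that fires deadlines; it never blocks callers.
    private static final ScheduledExecutorService TIMER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "deadline-timer");
                t.setDaemon(true);
                return t;
            });

    // Returns a future that fails with `cause` if not completed within the deadline.
    static <T> CompletableFuture<T> futureWithDeadline(long timeout, TimeUnit unit,
                                                       Throwable cause) {
        CompletableFuture<T> future = new CompletableFuture<>();
        // completeExceptionally is a no-op if the future was already completed normally
        TIMER.schedule(() -> future.completeExceptionally(cause), timeout, unit);
        return future;
    }
}
```

With this shape, passing `5L, TimeUnit.MINUTES` simply moves the deadline out; the future itself is still completed (or failed) by the topic-loading callbacks as before.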
It would be better to make the topic-loading timeout configurable in broker.conf.
+1 we should make this setting configurable.
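A configurable version could be exposed through a broker.conf entry along these lines (the key name `topicLoadTimeoutSeconds` is an assumption here; check the merged change for the exact name and default):

```properties
# Timeout, in seconds, for loading (or creating) a persistent topic.
# Topics with many replication clusters may need a larger value, since
# loading involves global ZK lookups and starting several sub-processes.
topicLoadTimeoutSeconds=60
```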
In apache#6489, a timeout was introduced to make sure calls into the BrokerService finish or error out. However, this timeout is too low by default when loading topics that have many replication clusters. Loading replicated topics is quite an expensive operation, involving global ZK lookups and starting many sub-processes. While we would hope it finishes in 60 seconds, we want to be safe. Long term, it may make sense to break this operation out into more steps, where each step can have its own timeout.
Force-pushed from ed1a9ec to 209e6d0
@sijie @codelipenghui added the option, let me know if there is anything else!
/pulsarbot run-failure-checks
@addisonj Could you please help add the configuration to broker.conf, so that users can find it more easily? The change looks good to me.
In #6489, a timeout was introduced to make sure calls into the BrokerService finish or error out. However, this timeout is too low by default when loading topics that have many replication clusters. Loading replicated topics is quite an expensive operation, involving global ZK lookups and starting many sub-processes. While we would hope it finishes in 60 seconds, we want to be safe. Long term, it may make sense to break this operation out into more steps, where each step can have its own timeout.

Co-authored-by: Addison Higham <ahigham@instructure.com>
(cherry picked from commit 6854b00)
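The "break this operation out into more steps" idea could be sketched as a pipeline where each stage carries its own deadline rather than one long deadline over the whole load. The stage names and timeouts below are illustrative placeholders, not Pulsar code:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class StagedLoad {
    // Stand-in for the global ZK metadata lookup.
    static CompletableFuture<String> fetchMetadata() {
        return CompletableFuture.supplyAsync(() -> "metadata");
    }

    // Stand-in for starting replicators and other sub-processes.
    static CompletableFuture<String> startReplicators(String metadata) {
        return CompletableFuture.supplyAsync(() -> metadata + "+replicators");
    }

    static CompletableFuture<String> loadTopic() {
        // Java 9+ orTimeout gives each stage an independent deadline,
        // so a slow stage fails fast instead of eating the whole budget.
        return fetchMetadata()
                .orTimeout(30, TimeUnit.SECONDS)
                .thenCompose(md -> startReplicators(md).orTimeout(60, TimeUnit.SECONDS));
    }
}
```

Per-stage deadlines also make failures easier to diagnose, since the exception identifies which step exceeded its budget.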