Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[broker] Timeout API calls in BrokerService #6489

Merged
merged 1 commit into from
Mar 18, 2020

Conversation

addisonj
Copy link
Contributor

@addisonj addisonj commented Mar 6, 2020

See #6416. This change ensures that all futures within BrokerService
have a guranteed timeout. As stated in #6416, we see cases where it
appears that loading or creating a topic fails to resolve the future for
unknown reasons. It appears that these futures may not be returning.

Motivation

This commit just adds a deadline for all the futures created in brokerService.

This seems like a sane change to make to ensure that these futures
finish, however, it still isn't understood under what conditions these
futures may not be returning, so this fix is mostly a workaround for
some underlying issues.

Verifying this change

This change is already covered by existing tests, such as the broker tests

(or)

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): n
  • The public API:n
  • The schema: n
  • The default values of configurations: n
  • The wire protocol:n
  • The rest endpoints: n
  • The admin cli options: n
  • Anything that affects deployment: n

Documentation

  • Does this pull request introduce a new feature? n
  • If yes, how is the feature documented? n/a
  • If a feature is not applicable for documentation, explain why?
  • If a feature is not documented yet in this PR, please create a followup issue for adding the documentation

See apache#6416. This change ensures that all futures within BrokerService
have a guranteed timeout. As stated in apache#6416, we see cases where it
appears that loading or creating a topic fails to resolve the future for
unknown reasons. It appears that these futures *may* not be returning.
This seems like a sane change to make to ensure that these futures
finish, however, it still isn't understood under what conditions these
futures may not be returning, so this fix is mostly a workaround for
some underlying issues
@addisonj
Copy link
Contributor Author

addisonj commented Mar 6, 2020

/pulsarbot run-failure-checks

1 similar comment
@addisonj
Copy link
Contributor Author

addisonj commented Mar 6, 2020

/pulsarbot run-failure-checks

@@ -732,6 +733,20 @@ public void unloadNamespaceBundlesGracefully() {
return future;
}

private <T> CompletableFuture<T> futureWithDeadline(Long delay, TimeUnit unit, Exception exp) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this into FutureUtil class?

@sijie sijie added area/broker release/2.5.1 type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages labels Mar 6, 2020
@sijie sijie added this to the 2.6.0 milestone Mar 6, 2020
@sijie sijie merged commit 4a4cce9 into apache:master Mar 18, 2020
tuteng pushed a commit to AmateurEvents/pulsar that referenced this pull request Mar 21, 2020
See apache#6416. This change ensures that all futures within BrokerService
have a guranteed timeout. As stated in apache#6416, we see cases where it
appears that loading or creating a topic fails to resolve the future for
unknown reasons. It appears that these futures *may* not be returning.
This seems like a sane change to make to ensure that these futures
finish, however, it still isn't understood under what conditions these
futures may not be returning, so this fix is mostly a workaround for
some underlying issues

Co-authored-by: Addison Higham <ahigham@instructure.com>
(cherry picked from commit 4a4cce9)
tuteng pushed a commit that referenced this pull request Apr 6, 2020
See #6416. This change ensures that all futures within BrokerService
have a guranteed timeout. As stated in #6416, we see cases where it
appears that loading or creating a topic fails to resolve the future for
unknown reasons. It appears that these futures *may* not be returning.
This seems like a sane change to make to ensure that these futures
finish, however, it still isn't understood under what conditions these
futures may not be returning, so this fix is mostly a workaround for
some underlying issues

Co-authored-by: Addison Higham <ahigham@instructure.com>
(cherry picked from commit 4a4cce9)
tuteng pushed a commit that referenced this pull request Apr 13, 2020
See #6416. This change ensures that all futures within BrokerService
have a guranteed timeout. As stated in #6416, we see cases where it
appears that loading or creating a topic fails to resolve the future for
unknown reasons. It appears that these futures *may* not be returning.
This seems like a sane change to make to ensure that these futures
finish, however, it still isn't understood under what conditions these
futures may not be returning, so this fix is mostly a workaround for
some underlying issues

Co-authored-by: Addison Higham <ahigham@instructure.com>
(cherry picked from commit 4a4cce9)
addisonj pushed a commit to instructure/pulsar that referenced this pull request Apr 17, 2020
In apache#6489, a timeout was introduced to make sure calls into the
BrokerService finish or error out. However, this timeout is too low by
default when loading topics that have many replicated clusters.

Loading replicated topics is quite an expensive operation, involve
global ZK lookups and the start of many sub-processes. While we would
hope it finishes in 60 seconds we want to safe.

Long term, it may make sense to break out this operation into more
steps where each step can have it's own timeout
addisonj pushed a commit to instructure/pulsar that referenced this pull request Apr 17, 2020
In apache#6489, a timeout was introduced to make sure calls into the
BrokerService finish or error out. However, this timeout is too low by
default when loading topics that have many replicated clusters.

Loading replicated topics is quite an expensive operation, involve
global ZK lookups and the start of many sub-processes. While we would
hope it finishes in 60 seconds we want to safe.

Long term, it may make sense to break out this operation into more
steps where each step can have it's own timeout
addisonj pushed a commit to instructure/pulsar that referenced this pull request Apr 17, 2020
In apache#6489, a timeout was introduced to make sure calls into the
BrokerService finish or error out. However, this timeout is too low by
default when loading topics that have many replicated clusters.

Loading replicated topics is quite an expensive operation, involve
global ZK lookups and the start of many sub-processes. While we would
hope it finishes in 60 seconds we want to safe.

Long term, it may make sense to break out this operation into more
steps where each step can have it's own timeout
addisonj pushed a commit to instructure/pulsar that referenced this pull request Apr 17, 2020
In apache#6489, a timeout was introduced to make sure calls into the
BrokerService finish or error out. However, this timeout is too low by
default when loading topics that have many replicated clusters.

Loading replicated topics is quite an expensive operation, involve
global ZK lookups and the start of many sub-processes. While we would
hope it finishes in 60 seconds we want to safe.

Long term, it may make sense to break out this operation into more
steps where each step can have it's own timeout
addisonj pushed a commit to instructure/pulsar that referenced this pull request Apr 23, 2020
In apache#6489, a timeout was introduced to make sure calls into the
BrokerService finish or error out. However, this timeout is too low by
default when loading topics that have many replicated clusters.

Loading replicated topics is quite an expensive operation, involve
global ZK lookups and the start of many sub-processes. While we would
hope it finishes in 60 seconds we want to safe.

Long term, it may make sense to break out this operation into more
steps where each step can have it's own timeout
addisonj pushed a commit to instructure/pulsar that referenced this pull request May 5, 2020
In apache#6489, a timeout was introduced to make sure calls into the
BrokerService finish or error out. However, this timeout is too low by
default when loading topics that have many replicated clusters.

Loading replicated topics is quite an expensive operation, involve
global ZK lookups and the start of many sub-processes. While we would
hope it finishes in 60 seconds we want to safe.

Long term, it may make sense to break out this operation into more
steps where each step can have it's own timeout
sijie pushed a commit that referenced this pull request May 7, 2020
In #6489, a timeout was introduced to make sure calls into the
BrokerService finish or error out. However, this timeout is too low by
default when loading topics that have many replicated clusters.

Loading replicated topics is quite an expensive operation, involve
global ZK lookups and the start of many sub-processes. While we would
hope it finishes in 60 seconds we want to safe.

Long term, it may make sense to break out this operation into more
steps where each step can have it's own timeout

Co-authored-by: Addison Higham <ahigham@instructure.com>
addisonj pushed a commit to instructure/pulsar that referenced this pull request May 7, 2020
In apache#6489, a timeout was introduced to make sure calls into the
BrokerService finish or error out. However, this timeout is too low by
default when loading topics that have many replicated clusters.

Loading replicated topics is quite an expensive operation, involve
global ZK lookups and the start of many sub-processes. While we would
hope it finishes in 60 seconds we want to safe.

Long term, it may make sense to break out this operation into more
steps where each step can have it's own timeout
jiazhai pushed a commit that referenced this pull request May 8, 2020
In #6489, a timeout was introduced to make sure calls into the
BrokerService finish or error out. However, this timeout is too low by
default when loading topics that have many replicated clusters.

Loading replicated topics is quite an expensive operation, involve
global ZK lookups and the start of many sub-processes. While we would
hope it finishes in 60 seconds we want to safe.

Long term, it may make sense to break out this operation into more
steps where each step can have it's own timeout

Co-authored-by: Addison Higham <ahigham@instructure.com>(cherry picked from commit 6854b00)
jiazhai pushed a commit to jiazhai/pulsar that referenced this pull request May 18, 2020
See apache#6416. This change ensures that all futures within BrokerService
have a guranteed timeout. As stated in apache#6416, we see cases where it
appears that loading or creating a topic fails to resolve the future for
unknown reasons. It appears that these futures *may* not be returning.
This seems like a sane change to make to ensure that these futures
finish, however, it still isn't understood under what conditions these
futures may not be returning, so this fix is mostly a workaround for
some underlying issues

Co-authored-by: Addison Higham <ahigham@instructure.com>(cherry picked from commit 4a4cce9)
Huanli-Meng pushed a commit to Huanli-Meng/pulsar that referenced this pull request May 27, 2020
In apache#6489, a timeout was introduced to make sure calls into the
BrokerService finish or error out. However, this timeout is too low by
default when loading topics that have many replicated clusters.

Loading replicated topics is quite an expensive operation, involve
global ZK lookups and the start of many sub-processes. While we would
hope it finishes in 60 seconds we want to safe.

Long term, it may make sense to break out this operation into more
steps where each step can have it's own timeout

Co-authored-by: Addison Higham <ahigham@instructure.com>
addisonj pushed a commit to instructure/pulsar that referenced this pull request Jun 12, 2020
In apache#6489, a timeout was introduced to make sure calls into the
BrokerService finish or error out. However, this timeout is too low by
default when loading topics that have many replicated clusters.

Loading replicated topics is quite an expensive operation, involve
global ZK lookups and the start of many sub-processes. While we would
hope it finishes in 60 seconds we want to safe.

Long term, it may make sense to break out this operation into more
steps where each step can have it's own timeout
huangdx0726 pushed a commit to huangdx0726/pulsar that referenced this pull request Aug 24, 2020
See apache#6416. This change ensures that all futures within BrokerService
have a guranteed timeout. As stated in apache#6416, we see cases where it
appears that loading or creating a topic fails to resolve the future for
unknown reasons. It appears that these futures *may* not be returning.
This seems like a sane change to make to ensure that these futures
finish, however, it still isn't understood under what conditions these
futures may not be returning, so this fix is mostly a workaround for
some underlying issues

Co-authored-by: Addison Higham <ahigham@instructure.com>
huangdx0726 pushed a commit to huangdx0726/pulsar that referenced this pull request Aug 24, 2020
In apache#6489, a timeout was introduced to make sure calls into the
BrokerService finish or error out. However, this timeout is too low by
default when loading topics that have many replicated clusters.

Loading replicated topics is quite an expensive operation, involve
global ZK lookups and the start of many sub-processes. While we would
hope it finishes in 60 seconds we want to safe.

Long term, it may make sense to break out this operation into more
steps where each step can have it's own timeout

Co-authored-by: Addison Higham <ahigham@instructure.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/broker release/2.5.1 type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants