Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PIP-45: Implement load-manager leader election using new CoordinationService interface #9240

Merged
merged 10 commits into from
Jan 22, 2021

Conversation

merlimat
Copy link
Contributor

Motivation

Moving the load manager leader election to use the new CoordingationService leader election.

@merlimat merlimat added the type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages label Jan 19, 2021
@merlimat merlimat added this to the 2.8.0 milestone Jan 19, 2021
@merlimat merlimat self-assigned this Jan 19, 2021
@@ -447,7 +448,7 @@ public boolean registerNamespace(String namespace, boolean ensureOwned) throws P
private void searchForCandidateBroker(NamespaceBundle bundle,
CompletableFuture<Optional<LookupResult>> lookupFuture,
LookupOptions options) {
if (null == pulsar.getLeaderElectionService() || !pulsar.getLeaderElectionService().isElected()) {
if (null == pulsar.getLeaderElectionService()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we keep the check to allow lookup/assignment only if the leader is elected to maintain consistency in case broker is partitioned or leader is not elected yet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already that way. The LeaderElectionService is returning an Optional<LeaderBroker> and it will be empty in all the cases where there's not a current leader.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a flaky test issue #9408 where it seems that the reason why the test fails is that the leader never gets assigned to one of the brokers in the test. The value stays as Optional.empty(). I wonder what to do about the test? is it a problem in the test code or production code?
There's also other concerns that I have expressed in comments of #9408 related to the deep stack that was observed. The election seems to recurse when retrying and at least that happens in the test.

Copy link
Contributor

@rdhabalia rdhabalia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. just a minor comment.

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
just left one little suggestion

long resourceQuotaUpdateInterval = TimeUnit.MINUTES
.toMillis(getConfiguration().getLoadBalancerResourceQuotaUpdateIntervalMinutes());

loadSheddingTask = loadManagerExecutor.scheduleAtFixedRate(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may it happen that we already have a value here in loadSheddingTask ? (some rare bug in the LeaderElectionService ?)
should be try to cancel it ?, the same for loadResourceQuotaTask

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory it shouldn't happen, but it's a good point, I've added the cancel if it's already set.

@merlimat merlimat merged commit 2fd0ce0 into apache:master Jan 22, 2021
@merlimat merlimat deleted the load-manager-leader-election branch January 22, 2021 02:47
jiazhai pushed a commit to streamnative/kop that referenced this pull request Jan 26, 2021
Fixes #336

Besides applying the LightProto generated new package name and some new APIs, this PR also fixes the tests error caused by apache/pulsar#9240, which introduces a new method PulsarService#createLocalMetadataStore.

And some tests are affected by the wrong parse of entry data after the pulsar update, this PR fixes the test failures.
merlimat added a commit to merlimat/pulsar that referenced this pull request Apr 6, 2021
…Service interface (apache#9240)

* PIP-45: Implement load-manager leader election using new CoordinationService interface

* Fixed checkstyle issue

* Allow to mock the metadata instance creation

* More test fixes

* Fixed style

* Test fixes

* Cancel task if there's one already there.

* Test fixes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants