-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PIP-45: Implement load-manager leader election using new CoordinationService interface #9240
PIP-45: Implement load-manager leader election using new CoordinationService interface #9240
Conversation
…Service interface
@@ -447,7 +448,7 @@ public boolean registerNamespace(String namespace, boolean ensureOwned) throws P | |||
private void searchForCandidateBroker(NamespaceBundle bundle, | |||
CompletableFuture<Optional<LookupResult>> lookupFuture, | |||
LookupOptions options) { | |||
if (null == pulsar.getLeaderElectionService() || !pulsar.getLeaderElectionService().isElected()) { | |||
if (null == pulsar.getLeaderElectionService()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we keep the check to allow lookup/assignment only if the leader is elected to maintain consistency in case broker is partitioned or leader is not elected yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already that way. The LeaderElectionService
is returning an Optional<LeaderBroker>
and it will be empty in all the cases where there's not a current leader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a flaky test issue #9408 where it seems that the reason why the test fails is that the leader never gets assigned to one of the brokers in the test. The value stays as Optional.empty()
. I wonder what to do about the test? is it a problem in the test code or production code?
There's also other concerns that I have expressed in comments of #9408 related to the deep stack that was observed. The election seems to recurse when retrying and at least that happens in the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. just a minor comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
just left one little suggestion
long resourceQuotaUpdateInterval = TimeUnit.MINUTES | ||
.toMillis(getConfiguration().getLoadBalancerResourceQuotaUpdateIntervalMinutes()); | ||
|
||
loadSheddingTask = loadManagerExecutor.scheduleAtFixedRate( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may it happen that we already have a value here in loadSheddingTask
? (some rare bug in the LeaderElectionService ?)
should be try to cancel it ?, the same for loadResourceQuotaTask
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory it shouldn't happen, but it's a good point, I've added the cancel if it's already set.
Fixes #336 Besides applying the LightProto generated new package name and some new APIs, this PR also fixes the tests error caused by apache/pulsar#9240, which introduces a new method PulsarService#createLocalMetadataStore. And some tests are affected by the wrong parse of entry data after the pulsar update, this PR fixes the test failures.
…Service interface (apache#9240) * PIP-45: Implement load-manager leader election using new CoordinationService interface * Fixed checkstyle issue * Allow to mock the metadata instance creation * More test fixes * Fixed style * Test fixes * Cancel task if there's one already there. * Test fixes
Motivation
Moving the load manager leader election to use the new CoordingationService leader election.