-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Broker] Fix and improve topic ownership assignment #13069
[Broker] Fix and improve topic ownership assignment #13069
Conversation
pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/NamespaceService.java
Show resolved
Hide resolved
534eec7
to
7c785d0
Compare
7c785d0
to
e483775
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/NamespaceService.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/NamespaceService.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/NamespaceService.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I fixed the test in https://github.com/lhotari/pulsar/commits/lh-fix-topic-ownership-assignment-branch-2.8 . I can now consistently reproduce the issue. It results in error 500 :
|
Problem happens here: pulsar/pulsar-broker/src/main/java/org/apache/pulsar/common/naming/NamespaceBundleFactory.java Lines 104 to 110 in 0b9d51b
|
Exception gets handled here: pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/NamespaceService.java Lines 205 to 208 in e1fbccd
and here: pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/lookup/TopicLookupBase.java Lines 126 to 130 in 5dc5de8
|
I have a fix for branch-2.8 in lhotari@1a350804 . I'll make similar changes to this PR since they aren't 2.8 specific. |
@codelipenghui @eolivelli Please review the recent changes. |
pulsar-broker/src/main/java/org/apache/pulsar/broker/namespace/NamespaceService.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/common/naming/NamespaceBundleFactory.java
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/common/naming/NamespaceBundleFactory.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
PR to backport this fix to branch-2.8 is #13117 |
…ache#13117) (cherry picked from commit 537dee1)
* Add warning log message when leader broker isn't available * Add more logging about load manager decisions * Use cached information for available brokers * Reproduce lookup race issue * Use java.util.concurrent.Phaser to increase the chances of a race * Address review feedback * Increase concurrency of test case to reproduce race conditions * Use real Zookeeper server in MultiBrokerLeaderElectionTest * Add retry with backoff to loading namespace bundles * Add more topics to test * Address review comment * Fix checkstyle * Improve logging * Address review comments (cherry picked from commit 537dee1)
* Add warning log message when leader broker isn't available * Add more logging about load manager decisions * Use cached information for available brokers * Reproduce lookup race issue * Use java.util.concurrent.Phaser to increase the chances of a race * Address review feedback * Increase concurrency of test case to reproduce race conditions * Use real Zookeeper server in MultiBrokerLeaderElectionTest * Add retry with backoff to loading namespace bundles * Add more topics to test * Address review comment * Fix checkstyle * Improve logging * Address review comments
…!64) Squash merge branch 'optimize-ownership-assign' into '2.8.1' --story=872733891 负载均衡优化(apache#13069)(apache#13117 ) TAPD: --story=872733891
Motivation
This PR depends on #13066 .
When a lookup is made to a topic that isn't currently loaded, the decision will be made in a distributed fashion on the follower brokers since the information about the leader broker is missing (because
LeaderElectionService.getCurrentLeader()
always returned Optional.empty()). This leads to races when assigning the topic ownership to a broker, since the decision isn't made centrally on the leader broker.This PR adds a test to verify the behavior and also uses a cached way to get available brokers.
Modifications
MultiBrokerLeaderElectionTest.shouldProvideConsistentAnswerToTopicLookup
NamespaceService.searchForCandidateBroker
LoadManager.getAvailableBrokers
instead of reading the list every time from ZookeeperAdditional context
PR to backport this fix to branch-2.8 is #13117 . Pulsar 2.8.1 contains another topic ownership bug which is fixed by #12650 .