Fix no known bookies after reset racks for all BKs #4128
Good resolution
// Register 2 BKs with depth 1 rack.
BookieNode dp1BkNode1 = new BookieNode(bkId1, dp1Rack);
BookieNode dp1BkNode2 = new BookieNode(bkId2, dp1Rack);
networkTopology.add(dp1BkNode1);
Adding a case for the intermediate state would be better.
If the only bookie shuts down and is then restarted with a depth 2 rack, it also cannot be added successfully.
Added
Great job!
* Fix no known bookies after reset racks for all BKs (cherry picked from commit c07e72a)
* Fix no known bookies after reset racks for all BKs
Motivation
Background
* `NetworkTopologyImpl` does not allow multiple bookies to register with different `depthOfAllLeaves`.
* In `NetworkTopologyImpl`, once a node is added, `depthOfAllLeaves` is initialized.
* A bookie is added in two steps: first `EnsemblePlacementPolicy.networkTopology.add( newBkNode )`, then `EnsemblePlacementPolicy.knownBookies.add( newBkNode )`.
* If `networkTopology.add( newBkNode )` fails, `knownBookies.add( newBkNode )` is not called.
Scenarios that trigger the bug
Scenario 1: Reset racks for all BK nodes. For example:
* There are two bookies, `[BK1, BK2]`, with rack `/r_1`.
* `depthOfAllLeaves` is `2` now.
* The racks are reset: `[BK1, BK2]` now have rack `/region_1/r_1`.
* The depth for `/region_1/r_1` is `3`, which differs from `2`.
* Since `depthOfAllLeaves` of `NetworkTopologyImpl` is still `2`, `[BK1, BK2]` cannot be added, and the error `can't add leaf node BK1 at depth 3 to the topology` is reported.
You can reproduce this with the new test `testRestartBKWithNewRackDepth`.
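Scenario 1 can be reproduced in miniature with a toy model. The sketch below is not the BookKeeper code: `RackDepthSketch`, `ToyTopology`, `register`, and `leafDepth` are hypothetical names, and the depth rule (one level per rack component plus the leaf) is an assumption chosen to match the depths `2` and `3` quoted above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the pre-fix behaviour; these are NOT the BookKeeper classes. */
class RackDepthSketch {

    /** Assumed depth rule: one level per rack component, plus one for the leaf itself. */
    static int leafDepth(String rack) {
        int depth = 1; // the leaf node
        for (String part : rack.split("/")) {
            if (!part.isEmpty()) {
                depth++;
            }
        }
        return depth;
    }

    /** Hypothetical stand-in for the topology: the first leaf pins the allowed depth. */
    static class ToyTopology {
        private final Map<String, String> rackByBookie = new HashMap<>();
        private int depthOfAllLeaves = -1; // -1 means "not initialized yet"

        void add(String bookieId, String rack) {
            int depth = leafDepth(rack);
            if (depthOfAllLeaves == -1) {
                depthOfAllLeaves = depth;
            } else if (depthOfAllLeaves != depth) {
                throw new IllegalArgumentException(
                        "can't add leaf node " + bookieId + " at depth " + depth + " to the topology");
            }
            rackByBookie.put(bookieId, rack);
        }

        void remove(String bookieId) {
            rackByBookie.remove(bookieId); // the pinned depth is never cleared here
        }
    }

    /** Two-step add: knownBookies is updated only if the topology accepts the node. */
    static boolean register(ToyTopology topology, Set<String> knownBookies, String bookieId, String rack) {
        try {
            topology.add(bookieId, rack);
        } catch (IllegalArgumentException e) {
            return false;
        }
        knownBookies.add(bookieId);
        return true;
    }

    public static void main(String[] args) {
        ToyTopology topology = new ToyTopology();
        Set<String> knownBookies = new HashSet<>();

        // Both bookies start under /r_1 (leaf depth 2).
        register(topology, knownBookies, "BK1", "/r_1");
        register(topology, knownBookies, "BK2", "/r_1");

        // Racks are reset: both bookies leave, then return under /region_1/r_1 (leaf depth 3).
        for (String id : new String[] {"BK1", "BK2"}) {
            topology.remove(id);
            knownBookies.remove(id);
        }
        boolean bk1Added = register(topology, knownBookies, "BK1", "/region_1/r_1");
        boolean bk2Added = register(topology, knownBookies, "BK2", "/region_1/r_1");

        System.out.println("BK1 re-added: " + bk1Added);
        System.out.println("BK2 re-added: " + bk2Added);
        System.out.println("knownBookies: " + knownBookies);
    }
}
```

In this model both re-registrations are rejected and `knownBookies` stays empty, which mirrors the failure described in the scenario.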
Scenario 2: A race condition causes `depthOfAllLeaves` to be initialized with a wrong value. The sequence involves BK1 start/shutdown, ZK main, and `RegistrationClient`:
* `BK1` starts.
* `BK1` is recorded in `bkInfos` of `RegistrationClient`.
* `EnsemblePlacementPolicy`: a node added.
* `BK1` shuts down.
* `BK1` is removed from `bkInfos` of `RegistrationClient`; `bkInfos` is empty now.
* `NetworkTopologyImpl` tries to calculate the network location but gets `null`, so it uses the default value `default-region/default-rack` [1].
* `depthOfAllLeaves` is initialized to `3`.
* `BK1` cannot be added to `NetworkTopologyImpl` anymore.
[1]: https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/TopologyAwareEnsemblePlacementPolicy.java#L830
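The fallback step in Scenario 2 can be illustrated with a small sketch. Everything here is hypothetical: `DefaultRackSketch`, `resolveRack`, and `leafDepth` are illustrative names, the leading slash on the default location is an assumption, and so is the rack `/r_1` for `BK1` (the description above does not state its real rack). The depth rule is the same assumed one: rack components plus one for the leaf.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy illustration of the default-rack fallback; NOT the BookKeeper implementation. */
class DefaultRackSketch {

    // Assumed spelling of the default location (leading slash added for illustration).
    static final String DEFAULT_RACK = "/default-region/default-rack";

    /** Returns the registered rack, or the default when the bookie is no longer known. */
    static String resolveRack(Map<String, String> bkInfos, String bookieId) {
        String rack = bkInfos.get(bookieId);
        return rack != null ? rack : DEFAULT_RACK;
    }

    /** Assumed depth rule: one level per rack component, plus one for the leaf itself. */
    static int leafDepth(String rack) {
        int depth = 1;
        for (String part : rack.split("/")) {
            if (!part.isEmpty()) {
                depth++;
            }
        }
        return depth;
    }

    public static void main(String[] args) {
        Map<String, String> bkInfos = new HashMap<>();
        bkInfos.put("BK1", "/r_1"); // assumed real rack of BK1

        int depthWhileRegistered = leafDepth(resolveRack(bkInfos, "BK1"));

        // BK1 shuts down before the "node added" event is processed.
        bkInfos.remove("BK1");
        int depthAfterRemoval = leafDepth(resolveRack(bkInfos, "BK1"));

        System.out.println("depth while registered: " + depthWhileRegistered);
        System.out.println("depth after removal:    " + depthAfterRemoval);
    }
}
```

If the stale event is the first one to reach the topology, the depth derived from the default location is the one that gets pinned, so a later registration of `BK1` under its real rack would be rejected.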
The above scenarios cause `EnsemblePlacementPolicy.knownBookies` to become an empty collection, leading to the error `Not enough non-faulty bookies available`.
The issue occurs on version `4.16.3`.
Changes
Reset `depthOfAllLeaves` after all BKs have been removed.
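A minimal sketch of the idea behind this change, using a toy model rather than the actual patch: `DepthResetSketch` and `ToyTopology` are hypothetical names, and using `-1` as the "uninitialized" marker is an assumption made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of "reset the depth once the topology is empty"; NOT the actual patch. */
class DepthResetSketch {

    /** Assumed depth rule: one level per rack component, plus one for the leaf itself. */
    static int leafDepth(String rack) {
        int depth = 1;
        for (String part : rack.split("/")) {
            if (!part.isEmpty()) {
                depth++;
            }
        }
        return depth;
    }

    static class ToyTopology {
        private final Set<String> leaves = new HashSet<>();
        private int depthOfAllLeaves = -1; // -1 means "not initialized yet"

        void add(String bookieId, String rack) {
            int depth = leafDepth(rack);
            if (depthOfAllLeaves == -1) {
                depthOfAllLeaves = depth;
            } else if (depthOfAllLeaves != depth) {
                throw new IllegalArgumentException(
                        "can't add leaf node " + bookieId + " at depth " + depth + " to the topology");
            }
            leaves.add(bookieId);
        }

        void remove(String bookieId) {
            leaves.remove(bookieId);
            if (leaves.isEmpty()) {
                depthOfAllLeaves = -1; // the next leaf added decides the depth again
            }
        }

        int size() {
            return leaves.size();
        }
    }

    public static void main(String[] args) {
        ToyTopology topology = new ToyTopology();
        topology.add("BK1", "/r_1");
        topology.add("BK2", "/r_1");

        // All bookies leave, then come back with deeper racks.
        topology.remove("BK1");
        topology.remove("BK2");
        topology.add("BK1", "/region_1/r_1");
        topology.add("BK2", "/region_1/r_1");

        System.out.println("leaves after rack reset: " + topology.size());
    }
}
```

Mixed depths are still rejected while any leaf remains; only an empty topology accepts a new depth.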