Fix over-replication caused by balancing when inventory is not updated yet #13114
…ator_test_fwrk
 * Fix Apache #12881 to fix this test.
 */
@Test
public void testBalancingWithStaleInventoryCausesOverReplication()
Moved this test to SegmentLoadingTest.testBalancingWithStaleViewDoesNotOverReplicate()
One small comment, otherwise LGTM
+ ConcurrentMap<SegmentId, BalancerSegmentHolder> movingSegments =
+     currentlyMovingSegments.get(toServer.getTier());
+ movingSegments.put(segmentId, segment);
+ final LoadPeonCallback callback = moveSuccess -> movingSegments.remove(segmentId);
  try {
-   ConcurrentMap<SegmentId, BalancerSegmentHolder> movingSegments =
-       currentlyMovingSegments.get(toServer.getTier());
-   movingSegments.put(segmentId, segment);
-   callback = () -> movingSegments.remove(segmentId);
-   coordinator.moveSegment(
-       params,
-       fromServer,
-       toServer,
-       segmentToMove,
-       callback
-   );
+   coordinator
+       .moveSegment(params, fromServer, toServer, segmentToMove, callback);
    return true;
  }
  catch (Exception e) {
-   log.makeAlert(e, StringUtils.format("[%s] : Moving exception", segmentId)).emit();
-   if (callback != null) {
-     callback.execute();
-   }
+   log.makeAlert(e, "[%s] : Moving exception", segmentId).emit();
+   callback.execute(false);
Are these changes fixing something, or are they just style? It looks like just style: removing the null check on the callback by moving the work of building it outside of the try.
Is that correct?
Yes, just a style change for readability.
Thanks a lot for the review, @imply-cheddar!
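For readers following the thread, the style point discussed above can be sketched in isolation. This is a minimal, self-contained illustration, not the code in this PR: `MoveCallback`, `SegmentMover` and `MoveTracker` are hypothetical stand-ins for `LoadPeonCallback`, the coordinator call and the balancer, and returning `false` from the catch block is an assumption.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for LoadPeonCallback: receives whether the operation succeeded.
interface MoveCallback {
    void execute(boolean success);
}

// Hypothetical stand-in for the coordinator call, which may throw.
interface SegmentMover {
    void moveSegment(String segmentId, MoveCallback callback) throws Exception;
}

class MoveTracker {
    // Segments currently being moved, keyed by segment id (value: destination server).
    final ConcurrentMap<String, String> movingSegments = new ConcurrentHashMap<>();

    boolean move(String segmentId, String toServer, SegmentMover mover) {
        movingSegments.put(segmentId, toServer);
        // Built before the try block, so the catch block never needs a null check.
        final MoveCallback callback = success -> movingSegments.remove(segmentId);
        try {
            mover.moveSegment(segmentId, callback);
            return true;
        } catch (Exception e) {
            // The move could not be queued: report failure so the entry is cleaned up.
            callback.execute(false);
            return false; // assumption: the sketch signals failure to the caller
        }
    }
}
```

Because the callback is assigned exactly once and before anything can throw, it can be declared `final` and used unconditionally in the catch block.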
Fixes item (1) in #12881
Description
The current implementation of segment balancing often leads to over-replication.
This can happen in the following (fairly common) scenario:
This is a frequent occurrence, but under normal circumstances it is not noticeable
because load rules quickly drop the over-replicated segment from A. But if load rules get
stuck for some reason (e.g. trying to reach target replication levels on a different tier),
the number of over-replicated segments keeps increasing, thereby overloading historicals.
Changes
The HTTP response status is enough to determine load success or failure.
SegmentLoadingNegativeTest
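The point that the HTTP response status is enough to determine load success or failure can be illustrated with a toy model. The sketch below is not Druid code: `StaleViewDemo` and its methods are hypothetical, and it assumes that the over-replication arises when the drop from the source server is gated on an inventory view that has not yet caught up with a completed load.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model (hypothetical): one segment "seg" starts on server A and is moved to B.
class StaleViewDemo {
    // What the servers actually hold.
    final Map<String, Set<String>> actual = new HashMap<>();
    // The coordinator's inventory view, which may lag behind `actual`.
    final Map<String, Set<String>> view = new HashMap<>();

    StaleViewDemo() {
        for (String server : new String[] {"A", "B"}) {
            actual.put(server, new HashSet<>());
            view.put(server, new HashSet<>());
        }
        actual.get("A").add("seg");
        view.get("A").add("seg");
    }

    // The load on the destination succeeds, but the view is not refreshed yet.
    void loadOn(String to, String segment) {
        actual.get(to).add(segment);
    }

    // Assumed old behaviour: drop from the source only if the view shows the destination serving it.
    void onLoadFinishedCheckingView(String from, String to, String segment) {
        if (view.get(to).contains(segment)) {
            actual.get(from).remove(segment);
        }
    }

    // Behaviour sketched from the PR text: trust the success flag reported by the load itself.
    void onLoadFinished(boolean success, String from, String segment) {
        if (success) {
            actual.get(from).remove(segment);
        }
    }

    // Number of servers that actually hold the segment.
    int replicas(String segment) {
        int count = 0;
        for (Set<String> segments : actual.values()) {
            if (segments.contains(segment)) {
                count++;
            }
        }
        return count;
    }
}
```

In this model, gating the drop on the stale view leaves the segment on both A and B, while acting on the success flag leaves a single replica on B.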
Main classes to review:
LoadPeonCallback
DruidCoordinator
CuratorLoadQueuePeon
HttpLoadQueuePeon
BalanceSegments