-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Coordinator primary segment assignment fix #5532
Coordinator primary segment assignment fix #5532
Conversation
Fix looks good to me, would be nice if we had tests but looks like that's pretty difficult/complicated as of now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall. Can we add a unit test to verify the load rule correctly considers segment loading?
@@ -33,9 +33,12 @@ | |||
*/ | |||
public class SegmentReplicantLookup | |||
{ | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary lines.
…er while loading is in progress
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM after CI
Saw a failure in this one:
Being optimistic, I restarted the tests. Although this test does seem potentially related to this patch. |
@gianm the test was added in #5528, and it also failed in a definitely unrelated PR where i increased the timeout so hopefully it will happen less often. I don't think it's because of a race condition with the curator test server used in the test because I ran it in a loop where it successfully passed ~720 times, with something like this
but I suppose it's possible it's still racy somewhere. Also possible is that travis just takes a bit longer than my local debug machine took to run the tests, which is why the increased timeout in the other PR. |
The test passed after being restarted, so let's commit the patch. |
* fix issue where assign primary assigns segments to all historical servers in cluster * fix test * add test to ensure primary assignment will not assign to another server while loading is in progress
* fix issue where assign primary assigns segments to all historical servers in cluster * fix test * add test to ensure primary assignment will not assign to another server while loading is in progress
* fix issue where assign primary assigns segments to all historical servers in cluster * fix test * add test to ensure primary assignment will not assign to another server while loading is in progress
* fix issue where assign primary assigns segments to all historical servers in cluster * fix test * add test to ensure primary assignment will not assign to another server while loading is in progress
Fixes #5531 by taking segment loading into account when deciding how to place segments, avoiding a bug where primary assignment would assign the segment to all historicals until it is loaded.
Reverts some code removed in #4921 for tracking currently loading segments in
SegmentReplicantLookup
.