Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-19.2: allocator: fix incorrect rebalance signal and target #43077

Merged

Conversation

andy-kimball
Copy link
Contributor

Backport 1/1 commits from #43041.

/cc @cockroachdb/release


This change fixes two bugs:

  • The first bug is related to the comparison of floating point
    numbers when considering to rebalance a range. The diversity score
    of two identical ranges (even the same range) may differ by a small
    amount. The fix is to consider candidates equal if their diversity score
    have very close values (within 1e-10).
  • The second bug has been introduced with
    *: remove the remaining uses of ReplicaDescriptors.Unwrap #39157 and resulted in
    ignoring the newly added replica when determining what existing replica
    it should replace in Allocator.simulateRemoveTarget.
    As a result - when the first bug was triggering and unnecessary
    rebalancing was happening - the replica to be dropped was chosen
    at random (in the case where all existing replica are necessary).
    For example - let's say a range is constrained to have a replica in each
    of the regions A,B,C. We are trying to add a new replica in C.
    Evaluating what should be removed from the A,B,C may choose any one of them.
    If the choice is A or B, the constraint will be violated.
    If we however ask the same question for A,B,C(old),C(new) - the result
    will be C(old) as this will preserve the constraint (to have one
    replica in each region).

Fixes #42715
Fixes https://github.com/cockroachlabs/support/issues/348

Release note (bug fix): the allocator now considers stores with very close
diversity scores equal (all other things being the same) and doesn't attempt
to rebalance.
Release note (bug fix): the allocator considers the new store being added
when looking for target in case of rebalance.

This change fixes two bugs:
- The first bug is related to the comparison of floating point
numbers when considering to rebalance a range. The diversity score
of two identical ranges (even the same range) may differ by a small
amount. The fix is to consider candidates equal if their diversity score
have very close values (within 1e-10).
- The second bug has been introduced with
cockroachdb#39157 and resulted in
ignoring the newly added replica when determining what existing replica
it should replace in Allocator.simulateRemoveTarget.
As a result - when the first bug was triggering and unnecessary
rebalancing was happening - the replica to be dropped was chosen
at random (in the case where all existing replica are necessary).
For example - let's say a range is constrained to have a replica in each
of the regions A,B,C. We are trying to add a new replica in C.
Evaluating what should be removed from the A,B,C may choose any one of them.
Regardless of the choise, the constraint will be violated.
If we however ask the same question for A,B,C(old),C(new) - the result
will be C(old) as this will preserve the constraint (to have one
replica in each region).

Fixes cockroachdb#42715
Fixes cockroachlabs/support#348

Release note (bug fix): the allocator now considers stores with very close
diversity scores equal (all other things being the same) and doesn't attempt
to rebalance.
Release note (bug fix): the allocator considers the new store being added
when looking for target in case of rebalance.
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@dt
Copy link
Member

dt commented Dec 10, 2019

I'm going to wait on restarting 19.2.2 qualification to pick this one up too.

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 4 of 4 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@dt dt mentioned this pull request Dec 10, 2019
19 tasks
@andy-kimball andy-kimball merged commit 73b64c3 into cockroachdb:release-19.2 Dec 10, 2019
@andy-kimball andy-kimball deleted the backport19.2-43041 branch December 10, 2019 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants