-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: prevent the StoreRebalancer from downreplicating a range #64170
kvserver: prevent the StoreRebalancer from downreplicating a range #64170
Conversation
I'm trying to figure out if I can write a sort of randomized test for the store rebalancer, but so far I've had a hard time thinking of anything that wouldn't be super invasive. If I cannot cleanly add a randomized test, I will add a more targeted unit test. The PR should be ready for a look aside from this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r1, 1 of 1 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15)
pkg/kv/kvserver/replica_command.go, line 3115 at r1 (raw file):
for i := range targets { for j := i + 1; j < len(targets); j++ { if targets[i].StoreID == targets[j].StoreID && targets[i].NodeID == targets[j].NodeID {
Isn't this just targets[i] == targets[j]
?
pkg/kv/kvserver/store_rebalancer.go, line 756 at r2 (raw file):
targetType targetReplicaType, ) []roachpb.ReplicaDescriptor { var finalTargetsForType *[]roachpb.ReplicaDescriptor
Give this a comment that mentions why we need the indirection. It wasn't immediately clear to me that the idea was for the call to allocateTargetFromList
in the loop down below to observe the append
call below.
1aaccf7
to
abab419
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/replica_command.go, line 3115 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Isn't this just
targets[i] == targets[j]
?
Fixed, didn't realize I was just comparing ReplicationTarget
s and not ReplicaDescriptor
s
pkg/kv/kvserver/store_rebalancer.go, line 756 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Give this a comment that mentions why we need the indirection. It wasn't immediately clear to me that the idea was for the call to
allocateTargetFromList
in the loop down below to observe theappend
call below.
Done, lmk if you think it is lacking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/store_rebalancer.go, line 756 at r2 (raw file):
we want the result of subsequent calls
Isn't this "we want subsequent calls"?
Also, let's explicitly mention the append when we say "the result of previous calls", because allocateTargetFromList
isn't what mutates the slice, it's the call to append
below it.
abab419
to
b8a0897
Compare
This commit adds checks inside of `AdminRelocateRange` to fail if the lists of relocation targets passed in by the caller contain duplicates. This is supposed to act as a safeguard against catastrophic outcomes like a range getting spuriously downreplicated due to malformed input. Release note: None
Prior to this commit, we had a bug inside one of the methods used by the `StoreRebalancer` where we had two variables referring to a slice that was subsequently being appended to. The code was making the implicit assumption that both of these slices would point to the same underlying array, which is not true since any of the additions mentioned above could result in the underlying array for one of the slices being resized. This commit fixes this bug. Resolves cockroachdb#64064 Release note (bug fix): A bug in previous 21.1 betas allowed the store rebalancer to spuriously downreplicate a range during normal operation. This bug is now fixed.
b8a0897
to
b00f923
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
pkg/kv/kvserver/store_rebalancer.go, line 756 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
we want the result of subsequent calls
Isn't this "we want subsequent calls"?
Also, let's explicitly mention the append when we say "the result of previous calls", because
allocateTargetFromList
isn't what mutates the slice, it's the call toappend
below it.
argh thanks for catching that. fixed and done.
Build succeeded: |
This PR contains 2 commits:
kvserver: add safeguard against spurious calls to AdminRelocateRange
This commit adds checks inside of
AdminRelocateRange
to fail if thelists of relocation targets passed in by the caller contain duplicates.
This is supposed to act as a safeguard against catastrophic outcomes
like a range getting spuriously downreplicated due to malformed input.
Release note: None
kvserver: prevent the StoreRebalancer from downreplicating a range
Prior to this commit, we had a bug inside one of the methods used by the
StoreRebalancer
where we had two variables referring to a slice thatwas subsequently being appended to.
The code was making the implicit assumption that both of these slices
would point to the same underlying array, which is not true since any of
the additions mentioned above could result in the underlying array for
one of the slices being resized.
This commit fixes this bug.
Resolves #64064
Release note (bug fix): A bug in previous 21.1 betas allowed the store
rebalancer to spuriously downreplicate a range during normal operation.
This bug is now fixed.