Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: prevent the StoreRebalancer from downreplicating a range #64170

Merged
merged 2 commits into from
Apr 27, 2021

Conversation

aayushshah15
Copy link
Contributor

This PR contains 2 commits:

kvserver: add safeguard against spurious calls to AdminRelocateRange

This commit adds checks inside of AdminRelocateRange to fail if the
lists of relocation targets passed in by the caller contain duplicates.
This is supposed to act as a safeguard against catastrophic outcomes
like a range getting spuriously downreplicated due to malformed input.

Release note: None

kvserver: prevent the StoreRebalancer from downreplicating a range

Prior to this commit, we had a bug inside one of the methods used by the
StoreRebalancer where we had two variables referring to a slice that
was subsequently being appended to.

The code was making the implicit assumption that both of these slices
would point to the same underlying array, which is not true since any of
the additions mentioned above could result in the underlying array for
one of the slices being resized.

This commit fixes this bug.

Resolves #64064

Release note (bug fix): A bug in previous 21.1 betas allowed the store
rebalancer to spuriously downreplicate a range during normal operation.
This bug is now fixed.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@aayushshah15
Copy link
Contributor Author

I'm trying to figure out if I can write a sort of randomized test for the store rebalancer, but so far I've had a hard time thinking of anything that wouldn't be super invasive. If I cannot cleanly add a randomized test, I will add a more targeted unit test.

The PR should be ready for a look aside from this.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r1, 1 of 1 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15)


pkg/kv/kvserver/replica_command.go, line 3115 at r1 (raw file):

	for i := range targets {
		for j := i + 1; j < len(targets); j++ {
			if targets[i].StoreID == targets[j].StoreID && targets[i].NodeID == targets[j].NodeID {

Isn't this just targets[i] == targets[j]?


pkg/kv/kvserver/store_rebalancer.go, line 756 at r2 (raw file):

	targetType targetReplicaType,
) []roachpb.ReplicaDescriptor {
	var finalTargetsForType *[]roachpb.ReplicaDescriptor

Give this a comment that mentions why we need the indirection. It wasn't immediately clear to me that the idea was for the call to allocateTargetFromList in the loop down below to observe the append call below.

Copy link
Contributor Author

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/replica_command.go, line 3115 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Isn't this just targets[i] == targets[j]?

Fixed, didn't realize I was just comparing ReplicationTargets and not ReplicaDescriptors


pkg/kv/kvserver/store_rebalancer.go, line 756 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Give this a comment that mentions why we need the indirection. It wasn't immediately clear to me that the idea was for the call to allocateTargetFromList in the loop down below to observe the append call below.

Done, lmk if you think it is lacking.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/store_rebalancer.go, line 756 at r2 (raw file):

we want the result of subsequent calls

Isn't this "we want subsequent calls"?

Also, let's explicitly mention the append when we say "the result of previous calls", because allocateTargetFromList isn't what mutates the slice, it's the call to append below it.

This commit adds checks inside of `AdminRelocateRange` to fail if the
lists of relocation targets passed in by the caller contain duplicates.
This is supposed to act as a safeguard against catastrophic outcomes
like a range getting spuriously downreplicated due to malformed input.

Release note: None
Prior to this commit, we had a bug inside one of the methods used by the
`StoreRebalancer` where we had two variables referring to a slice that
was subsequently being appended to.

The code was making the implicit assumption that both of these slices
would point to the same underlying array, which is not true since any of
the additions mentioned above could result in the underlying array for
one of the slices being resized.

This commit fixes this bug.

Resolves cockroachdb#64064

Release note (bug fix): A bug in previous 21.1 betas allowed the store
rebalancer to spuriously downreplicate a range during normal operation.
This bug is now fixed.
Copy link
Contributor Author

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)


pkg/kv/kvserver/store_rebalancer.go, line 756 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
we want the result of subsequent calls

Isn't this "we want subsequent calls"?

Also, let's explicitly mention the append when we say "the result of previous calls", because allocateTargetFromList isn't what mutates the slice, it's the call to append below it.

argh thanks for catching that. fixed and done.

@craig
Copy link
Contributor

craig bot commented Apr 27, 2021

Build succeeded:

@craig craig bot merged commit 3335e39 into cockroachdb:master Apr 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kvserver: unxpected replication change from 3 to 2 voters
3 participants