Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: ensure cleanup of all abandoned learners before an AdminRelocateRange #80205

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

aayushshah15
Copy link
Contributor

@aayushshah15 aayushshah15 commented Apr 19, 2022

Previously, after #79405,
maybeLeaveAtomicChangeReplicasAndRemoveLearners() could return a range
descriptor that contained learner replicas. This was because of the idempotency
introduced in #79405, which relaxed how/when we failed replication changes due
to descriptor mismatch. The hazard was that, while we would ensure that the
learner we were trying to remove was indeed gone, we wouldn't do enough to
ensure that there weren't new learners that were added to the range due to
other concurrent AdminRelocateRange calls.

This was a violation of its contract and could sometimes fail
AdminRelocateRange requests.

This commit makes it such that when maybeLeaveAtomicChangeReplicasAndRemoveLearners
returns a non-error response, it guarantees that all the learner replicas
have been removed according to the returned range descriptor.

Resolves #79887

Release note: None

@aayushshah15 aayushshah15 requested a review from a team as a code owner April 19, 2022 22:21
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@aayushshah15
Copy link
Contributor Author

@nvanbenschoten, this PR is just a proposal and not something I'm super sold on. What do you think?

…locateRange`

Previously, after cockroachdb#79405,
`maybeLeaveAtomicChangeReplicasAndRemoveLearners()` could return a range
descriptor that contained learner replicas. This was because of the idempotency
introduced in cockroachdb#79405, which relaxed how/when we failed replication changes due
to descriptor mismatch. The hazard was that, while we would ensure that the
learner we were trying to remove was indeed gone, we wouldn't do enough to
ensure that there weren't new learners that were added to the range due to
other concurrent `AdminRelocateRange` calls.

This was a violation of its contract and could sometimes fail
`AdminRelocateRange` requests.

This commit makes it such that when `maybeLeaveAtomicChangeReplicasAndRemoveLearners`
 returns a non-error response, it guarantees that _all_ the learner replicas
have been removed according to the returned range descriptor.

Release note: None
aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request Apr 21, 2022
This commit is an alternative to cockroachdb#80205.

In cockroachdb#79405, we relaxed the check
that `AdminChangeReplicas` performs comparing the expected range descriptor
state against the range descriptor in KV when we were performing single learner
removals. This was because these learner removals are performed at the end of a
replication change (i.e. after its already completed) and thus, could be made
idempotent.

The above change messed with a subtle part of the contract of
`maybeLeaveAtomicChangeReplicasAndRemoveLearners()` since now it was no longer
guaranteed to always successfully return with a range desc that did not contain
learners (due to intervening replication changes that were now no longer
detected because of the relaxation mentioned above).

This commit fixes this regression by making
`maybeLeaveAtomicChangeReplicasAndRemoveLearners` check at the end if we've
raced with another replication change. If so, we return an error instead of
incorrectly returning a successful response with a descriptor that could still
have learners or be in a joint config.

Release note: None
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I like this approach better than #80205. That PR feels like a partial regression of #79405, so I don't understand the benefit of a state with both of those PRs vs. a state with neither.

However, this PR does pose a risk of getting stuck in the loop if some other actor continuously adds learners.

I wonder if there's a third option that scopes down the call to maybeLeaveAtomicChangeReplicasAndRemoveLearners at the end of changeReplicasImpl. If we only want the demoted learners to be removed in this case, why are we requiring that all learners be removed?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @nvanbenschoten)


pkg/kv/kvserver/replica_command.go line 1324 at r1 (raw file):

	// Now the config isn't joint any more, but we may have demoted some voters
	// into learners. These learners should go as well.
	for {

Now that #79379 has landed, Should we be checking hasOutstandingLearnerSnapshotInFlight inside of this loop?


pkg/kv/kvserver/replica_command.go line 1338 at r1 (raw file):

		var err error
		log.VEventf(ctx, 2, `removing learner replica %v from %v`, removalTarget, desc)
		desc, err = execChangeReplicasTxn(

Having seen fallout already, I'm somewhat wary that #79405 might have other consequences for callers that are removing a single learner replica and expect execChangeReplicasTxn to be a true compare-and-swap on the full descriptor. Do you think that's a concern? If so, do you think there's merit in using changeReplicasTxnArgs to scope these limited compare-and-swap fast-paths to just the callers that want them, instead of inferring them from the shape of the internalReplicationChanges?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: kv50/rangelookups/relocate/nodes=8 failed [waiting for backport]
3 participants