Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-21.1: kvserver: synchronize replica removal with read-write requests #64598

Merged

Conversation

erikgrinaker
Copy link
Contributor

Backport 1/1 commits from #64471. Ran kvserver tests with -tags deadlock.

/cc @cockroachdb/release @cockroachdb/kv


Replica removal did not synchronize with in-flight read-write requests,
which could cause them to be evaluated on a removed (empty) replica. The
request would not be able to persist any writes, since it's unable to
submit Raft proposals. However, it can affect conditional writes, for
example causing a ConditionalPutRequest to error because it finds a
missing value instead of the expected one.

This patch fixes the problem by taking out Replica.readOnlyCmdMu
during pre-Raft evaluation, to synchronize with replica removal. This
can cause such requests to return AmbiguousResultError as the write is
evaluated.

Resolves #46329, follow-up from #64324.

Release note (bug fix): Fixed a race condition where read-write requests
during replica removal (e.g. during range merges or rebalancing) could
be evaluated on the removed replica. These will not have been able to
write any data to persistent storage, but could behave unexpectedly,
e.g. returning errors that they should not have returned.

@erikgrinaker erikgrinaker self-assigned this May 3, 2021
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@tbg
Copy link
Member

tbg commented May 4, 2021

It looks like we didn't request an extraordinary release for 21.1, which I think means that this PR will wait for v21.1.1 (another week or two until we can merge).

@erikgrinaker
Copy link
Contributor Author

Yeah, think we decided to punt it. Will hold for 21.1.1.

@erikgrinaker erikgrinaker marked this pull request as draft May 4, 2021 08:36
@tbg tbg removed their request for review May 4, 2021 08:46
Replica removal did not synchronize with in-flight read-write requests,
which could cause them to be evaluated on a removed (empty) replica. The
request would not be able to persist any writes, since it's unable to
submit Raft proposals. However, it can affect conditional writes, for
example causing a `ConditionalPutRequest` to error because it finds a
missing value instead of the expected one.

This patch fixes the problem by taking out `Replica.readOnlyCmdMu`
during pre-Raft evaluation, to synchronize with replica removal. This
can cause such requests to return `AmbiguousResultError` as the write is
evaluated.

Release note (bug fix): Fixed a race condition where read-write requests
during replica removal (e.g. during range merges or rebalancing) could
be evaluated on the removed replica. These will not have been able to
write any data to persistent storage, but could behave unexpectedly,
e.g. returning errors that they should not have returned.
@erikgrinaker erikgrinaker marked this pull request as ready for review May 12, 2021 16:14
@erikgrinaker erikgrinaker merged commit 7e24776 into cockroachdb:release-21.1 May 13, 2021
@erikgrinaker erikgrinaker deleted the backport21.1-64471 branch May 13, 2021 12:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants