-
Notifications
You must be signed in to change notification settings - Fork 4k
kvcoord: prevent unexpected parallel commit of weak isolation transactions #152010
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
5bcb954 to
b157314
Compare
|
I pulled out the fix for self-recovery after EndTxn(abort) to its own PR here; #156722 |
b157314 to
c8ee395
Compare
|
@miraradeva @arulajmani Rebased this on master. It should be good for a look. Definitely let me know if you can athink of some tests we should add here in addition to the targeted one I added. |
miraradeva
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miraradeva reviewed 1 of 3 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @arulajmani)
-- commits line 38 at r3:
Should this be Put(b) succeeding? Or maybe I'm misunderstanding the example.
-- commits line 41 at r3:
Future me would love to not have to remember what errors we hit. Maybe list something that's easy to explain?
pkg/kv/kvclient/kvcoord/dist_sender_server_test.go line 4796 at r3 (raw file):
// // After a successful refresh, our testing filters have arranged for the // second EndTxn to fail.
So no recovery involved here. I like that; it makes it simpler to understand.
But should we mention somewhere that the second EndTxn can race with a txn recovery? That's how we get the "already committed" variant of the failure, right?
pkg/kv/kvclient/kvcoord/dist_sender_server_test.go line 4869 at r3 (raw file):
// for this bug because it both needs to be in an isolation level that // allows committing when wrt != rts and the read needs to produce a read // span that requires a fresh such that we can't do a server-side refresh.
I assume "fresh" is supposed to be "refresh". So essentially we need the stepping in order to push the txn's read timestamp, which will make it ineligible for server-side refreshes?
pkg/kv/kvclient/kvcoord/dist_sender_server_test.go line 4915 at r3 (raw file):
require.Error(t, txn1Err) } else { // if val8 doesn't exist, then txn1 was committed so we shouldn't have
nit: comment punctuation
…tions
The span refresher may retry a batch with an EndTxn request. When
retrying such a request, it splits the EndTxn out into its own batch to
avoid the transaction from becoming implicitly committed in the middle
of the retry.
For serializable transactions this is sufficient because serializable
transactions are not allowed to write a transaction record if
WriteTimestamp != ReadTimestamp. All of the errors we retry in the span
refresher imply that the write timestamp is moving forward, and thus any
old staging record is at a timestamp that is less than the timestamp
that we will be writing at during the retry.
For weak isolation transactions, however, this is not sufficient.
Consider a weak isolation transaction issuing the following batch, with
ReadTimestamp == WriteTimestamp == t1.
Put(a)
Put(b)
EndTxn
Assume the Puts and EndTxn go to different ranges in parallel. The
following can happen:
Put(a) -> Encounters WriteTooOld that requires a refresh to t2
Put(b) -> Writes intent@t1
EndTxn() -> WriteTimetamp pushed via timestamp cache to t2, writes staging record@t2
In a SSI transaction the EndTxn would fail because WriteTimestamp !=
ReadTimestamp, but in an weak isolation transaction we write a staging
record at t2.
If we successfully refresh to t2 and start our retry, then that existing
STAGING transaction record meets the implicit commit criteria as soon as
Put(a) succeeds on retry.
We would like to avoid this because it can result in the transaction being
comitted by transaction recovery. This might manifest as:
- a `TransactionStatusError: already committed (REASON_TXN_COMMITTED)` during a
commit; or,
- an unrelated error being delivered to the client (if, say, some portion of the
retry failed) despite the transaction committing.
Here, we avoid this hazard by refreshing weak isolation transactions
that are in STAGING to a timestamp just past the largest timestamp at
which the staging record could have been written. As a result, any
subsequent writes do not satisfy the implicit commit criteria of the
existing record.
Fixes cockroachdb#156698
Fixes cockroachdb#154510
Release note (bug fix): Fix a bug in which a Read Committed or Snapshot
isolation transaction may be committed despite returning a non-ambiguous
error.
c8ee395 to
f28a1db
Compare
stevendanna
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani and @miraradeva)
Previously, miraradeva (Mira Radeva) wrote…
Should this be Put(b) succeeding? Or maybe I'm misunderstanding the example.
Put(b) succeeded the first time and was written at ts1 which is less than the staging timestamp of ts2. So, once Put(a) succeeds on retry, then the combination of (new) Put(a) and (old) Put(b) technically satisfying the old staging record.
Previously, miraradeva (Mira Radeva) wrote…
Future me would love to not have to remember what errors we hit. Maybe list something that's easy to explain?
Added the two conditions we have observed, but I honestly doubt it is a complete list of ways this could manifest.
pkg/kv/kvclient/kvcoord/dist_sender_server_test.go line 4796 at r3 (raw file):
Previously, miraradeva (Mira Radeva) wrote…
So no recovery involved here. I like that; it makes it simpler to understand.
But should we mention somewhere that the second EndTxn can race with a txn recovery? That's how we get the "already committed" variant of the failure, right?
I've added some words here to describe what I think happens in this case in when the bug was in place still.
pkg/kv/kvclient/kvcoord/dist_sender_server_test.go line 4869 at r3 (raw file):
So essentially we need the stepping in order to push the txn's read timestamp, which will make it ineligible for server-side refreshes?
Yeah, IIRC, what happens here is that without SteppingEnabled, then the transaction is in "auto step" mode in which we step the transaction between each batch. Further, auto-step (SteppingDisabled) mode consults the transaction level, when doing its step and steps the read timestamp every time. Stepping the read timestamp clears the read footprint in the span refresher, which then makes the batch eligble for server-side refreshes.
|
TFTR! bors r=miraradeva |
|
Build succeeded: |
The span refresher may retry a batch with an EndTxn request. When
retrying such a request, it splits the EndTxn out into its own batch to
avoid the transaction from becoming implicitly committed in the middle
of the retry.
For serializable transactions this is sufficient because serializable
transactions are not allowed to write a transaction record if
WriteTimestamp != ReadTimestamp. All of the errors we retry in the span
refresher imply that the write timestamp is moving forward, and thus any
old staging record is at a timestamp that is less than the timestamp
that we will be writing at during the retry.
For weak isolation transactions, however, this is not sufficient.
Consider a weak isolation transaction issuing the following batch, with
ReadTimestamp == WriteTimestamp == t1.
Assume the Puts and EndTxn go to different ranges in parallel. The
following can happen:
Put(a) -> Encounters WriteTooOld that requires a refresh to t2
Put(b) -> Writes intent@t1
EndTxn() -> WriteTimetamp pushed via timestamp cache to t2, writes staging record@t2
In a SSI transaction the EndTxn would fail because WriteTimestamp !=
ReadTimestamp, but in an weak isolation transaction we write a staging
record at t2.
If we successfully refresh to t2 and start our retry, then that existing
STAGING transaction record meets the implicit commit criteria as soon as
Put(a) succeeds on retry.
We would like to avoid this because it can result in a number of
different errors.
Here, we avoid this hazard by refreshing weak isolation transactions
that are in STAGING to a timestamp just past the largest timestamp at
which the staging record could have been written. As a result, any
subsequent writes do not satisfy the implicit commit criteria of the
existing record.
This required a small change to the transaction record returned from
EndTxn(abort) to avoid situations where we mistake this future
transaction record for the existing transaction record.
Fixes #156698
Fixes #154510
Release note (bug fix): Fix a bug in which a Read Committed or Snapshot
isolation transaction may be committed despite returning a non-ambiguous
error.