kvcoord: prevent unexpected parallel commit of weak isolation transactions #152010

stevendanna · 2025-08-18T12:35:41Z

The span refresher may retry a batch with an EndTxn request. When
retrying such a request, it splits the EndTxn out into its own batch to
avoid the transaction from becoming implicitly committed in the middle
of the retry.

For serializable transactions this is sufficient because serializable
transactions are not allowed to write a transaction record if
WriteTimestamp != ReadTimestamp. All of the errors we retry in the span
refresher imply that the write timestamp is moving forward, and thus any
old staging record is at a timestamp that is less than the timestamp
that we will be writing at during the retry.

For weak isolation transactions, however, this is not sufficient.

Consider a weak isolation transaction issuing the following batch, with
ReadTimestamp == WriteTimestamp == t1.

Put(a)
Put(b)
EndTxn

Assume the Puts and EndTxn go to different ranges in parallel. The
following can happen:

Put(a) -> Encounters WriteTooOld that requires a refresh to t2
Put(b) -> Writes intent@t1
EndTxn() -> WriteTimetamp pushed via timestamp cache to t2, writes staging record@t2

In a SSI transaction the EndTxn would fail because WriteTimestamp !=
ReadTimestamp, but in an weak isolation transaction we write a staging
record at t2.

If we successfully refresh to t2 and start our retry, then that existing
STAGING transaction record meets the implicit commit criteria as soon as
Put(a) succeeds on retry.

We would like to avoid this because it can result in a number of
different errors.

Here, we avoid this hazard by refreshing weak isolation transactions
that are in STAGING to a timestamp just past the largest timestamp at
which the staging record could have been written. As a result, any
subsequent writes do not satisfy the implicit commit criteria of the
existing record.

This required a small change to the transaction record returned from
EndTxn(abort) to avoid situations where we mistake this future
transaction record for the existing transaction record.

Fixes #156698
Fixes #154510

Release note (bug fix): Fix a bug in which a Read Committed or Snapshot
isolation transaction may be committed despite returning a non-ambiguous
error.

cockroach-teamcity · 2025-08-18T12:35:52Z

This change is

stevendanna · 2025-11-02T02:04:52Z

I pulled out the fix for self-recovery after EndTxn(abort) to its own PR here; #156722

stevendanna · 2025-11-12T16:07:22Z

@miraradeva @arulajmani Rebased this on master. It should be good for a look. Definitely let me know if you can athink of some tests we should add here in addition to the targeted one I added.

miraradeva

@miraradeva reviewed 1 of 3 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani)

-- commits line 38 at r3:
Should this be Put(b) succeeding? Or maybe I'm misunderstanding the example.

-- commits line 41 at r3:
Future me would love to not have to remember what errors we hit. Maybe list something that's easy to explain?

pkg/kv/kvclient/kvcoord/dist_sender_server_test.go line 4796 at r3 (raw file):

//
// After a successful refresh, our testing filters have arranged for the
// second EndTxn to fail.

So no recovery involved here. I like that; it makes it simpler to understand.

But should we mention somewhere that the second EndTxn can race with a txn recovery? That's how we get the "already committed" variant of the failure, right?

pkg/kv/kvclient/kvcoord/dist_sender_server_test.go line 4869 at r3 (raw file):

		// for this bug because it both needs to be in an isolation level that
		// allows committing when wrt != rts and the read needs to produce a read
		// span that requires a fresh such that we can't do a server-side refresh.

I assume "fresh" is supposed to be "refresh". So essentially we need the stepping in order to push the txn's read timestamp, which will make it ineligible for server-side refreshes?

pkg/kv/kvclient/kvcoord/dist_sender_server_test.go line 4915 at r3 (raw file):

		require.Error(t, txn1Err)
	} else {
		// if val8 doesn't exist, then txn1 was committed so we shouldn't have

nit: comment punctuation

…tions The span refresher may retry a batch with an EndTxn request. When retrying such a request, it splits the EndTxn out into its own batch to avoid the transaction from becoming implicitly committed in the middle of the retry. For serializable transactions this is sufficient because serializable transactions are not allowed to write a transaction record if WriteTimestamp != ReadTimestamp. All of the errors we retry in the span refresher imply that the write timestamp is moving forward, and thus any old staging record is at a timestamp that is less than the timestamp that we will be writing at during the retry. For weak isolation transactions, however, this is not sufficient. Consider a weak isolation transaction issuing the following batch, with ReadTimestamp == WriteTimestamp == t1. Put(a) Put(b) EndTxn Assume the Puts and EndTxn go to different ranges in parallel. The following can happen: Put(a) -> Encounters WriteTooOld that requires a refresh to t2 Put(b) -> Writes intent@t1 EndTxn() -> WriteTimetamp pushed via timestamp cache to t2, writes staging record@t2 In a SSI transaction the EndTxn would fail because WriteTimestamp != ReadTimestamp, but in an weak isolation transaction we write a staging record at t2. If we successfully refresh to t2 and start our retry, then that existing STAGING transaction record meets the implicit commit criteria as soon as Put(a) succeeds on retry. We would like to avoid this because it can result in the transaction being comitted by transaction recovery. This might manifest as: - a `TransactionStatusError: already committed (REASON_TXN_COMMITTED)` during a commit; or, - an unrelated error being delivered to the client (if, say, some portion of the retry failed) despite the transaction committing. Here, we avoid this hazard by refreshing weak isolation transactions that are in STAGING to a timestamp just past the largest timestamp at which the staging record could have been written. As a result, any subsequent writes do not satisfy the implicit commit criteria of the existing record. Fixes cockroachdb#156698 Fixes cockroachdb#154510 Release note (bug fix): Fix a bug in which a Read Committed or Snapshot isolation transaction may be committed despite returning a non-ambiguous error.

stevendanna

TFTR!

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani and @miraradeva)

-- commits line 38 at r3:

Previously, miraradeva (Mira Radeva) wrote…

Should this be Put(b) succeeding? Or maybe I'm misunderstanding the example.

Put(b) succeeded the first time and was written at ts1 which is less than the staging timestamp of ts2. So, once Put(a) succeeds on retry, then the combination of (new) Put(a) and (old) Put(b) technically satisfying the old staging record.

-- commits line 41 at r3:

Previously, miraradeva (Mira Radeva) wrote…

Future me would love to not have to remember what errors we hit. Maybe list something that's easy to explain?

Added the two conditions we have observed, but I honestly doubt it is a complete list of ways this could manifest.

pkg/kv/kvclient/kvcoord/dist_sender_server_test.go line 4796 at r3 (raw file):

Previously, miraradeva (Mira Radeva) wrote…

So no recovery involved here. I like that; it makes it simpler to understand.

But should we mention somewhere that the second EndTxn can race with a txn recovery? That's how we get the "already committed" variant of the failure, right?

I've added some words here to describe what I think happens in this case in when the bug was in place still.

pkg/kv/kvclient/kvcoord/dist_sender_server_test.go line 4869 at r3 (raw file):

So essentially we need the stepping in order to push the txn's read timestamp, which will make it ineligible for server-side refreshes?

Yeah, IIRC, what happens here is that without SteppingEnabled, then the transaction is in "auto step" mode in which we step the transaction between each batch. Further, auto-step (SteppingDisabled) mode consults the transaction level, when doing its step and steps the read timestamp every time. Stepping the read timestamp clears the read footprint in the span refresher, which then makes the batch eligble for server-side refreshes.

stevendanna · 2025-11-17T09:01:30Z

TFTR!

bors r=miraradeva

craig · 2025-11-17T09:41:21Z

Build succeeded:

stevendanna requested a review from miraradeva August 18, 2025 12:36

stevendanna force-pushed the ssd/commit-after-retry branch from 5bcb954 to b157314 Compare November 1, 2025 01:00

stevendanna changed the title ~~WIP: kvcoord: return AmbiguousResultError if transaction is STAGING~~ kvcoord: prevent unexpected parallel commit of weak isolation transactions Nov 1, 2025

stevendanna marked this pull request as ready for review November 1, 2025 01:02

stevendanna requested a review from a team as a code owner November 1, 2025 01:02

stevendanna requested review from arulajmani and removed request for a team November 1, 2025 01:02

stevendanna force-pushed the ssd/commit-after-retry branch from b157314 to c8ee395 Compare November 12, 2025 16:04

miraradeva approved these changes Nov 12, 2025

View reviewed changes

stevendanna force-pushed the ssd/commit-after-retry branch from c8ee395 to f28a1db Compare November 14, 2025 00:12

stevendanna commented Nov 14, 2025

View reviewed changes

craig bot merged commit a97e76d into cockroachdb:master Nov 17, 2025
24 checks passed

celeste-cockroachdb bot added the target-release-26.1.0 label Nov 17, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

kvcoord: prevent unexpected parallel commit of weak isolation transactions #152010

kvcoord: prevent unexpected parallel commit of weak isolation transactions #152010

stevendanna commented Aug 18, 2025 •

edited

Loading

Uh oh!

cockroach-teamcity commented Aug 18, 2025

Uh oh!

stevendanna commented Nov 2, 2025

Uh oh!

stevendanna commented Nov 12, 2025

Uh oh!

miraradeva left a comment

Uh oh!

stevendanna left a comment

Uh oh!

stevendanna commented Nov 17, 2025

Uh oh!

craig bot commented Nov 17, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

kvcoord: prevent unexpected parallel commit of weak isolation transactions #152010

kvcoord: prevent unexpected parallel commit of weak isolation transactions #152010

Conversation

stevendanna commented Aug 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cockroach-teamcity commented Aug 18, 2025

Uh oh!

stevendanna commented Nov 2, 2025

Uh oh!

stevendanna commented Nov 12, 2025

Uh oh!

miraradeva left a comment

Choose a reason for hiding this comment

Uh oh!

stevendanna left a comment

Choose a reason for hiding this comment

Uh oh!

stevendanna commented Nov 17, 2025

Uh oh!

craig bot commented Nov 17, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

stevendanna commented Aug 18, 2025 •

edited

Loading