Skip to content

kvcoord: assert that the txn is clean after commit#46399

Open
andreimatei wants to merge 1 commit intocockroachdb:masterfrom
andreimatei:txn.assert-clean-after-commit
Open

kvcoord: assert that the txn is clean after commit#46399
andreimatei wants to merge 1 commit intocockroachdb:masterfrom
andreimatei:txn.assert-clean-after-commit

Conversation

@andreimatei
Copy link
Copy Markdown
Contributor

This patch adds a couple of assertions about the state in which a txn
proto can be after a successful commit (explicit commit or implicit).
The assertions are about how the multiple updates to a txn done by
sub-batches of the commit batch (as split by the DistSender) must have
been collapsed in a sane way when merging their responses together.

I spent the morning thinking that we have a problem with the WriteTooOld
flag being set on the result for an implicitly-committed txn. The
scenario that I was worried about is that the
EndTxn(CanCommitAtHigherTimestamp:true) batch gets split
into two, and both halves get pushed. The EndTxn half gets pushed higher
than the other half (otherwise, we wouldn't end up in an implicit commit
state), but it performs a server-side refresh and succeeds. I was
worried that the other half can still return a WriteTooOld flag, which
would make it into the combined response, and would trick the span
refresher into retrying. This would be bad, since retrying after an
implicit commit is problematic.
However, I believe this situation is not possible. We're saved by the
fact that combining the transaction updates (Transaction.Update()) is
smart enough to not treat the WriteTooOld flag as cummulative; instead,
the response with the highest read timestamp gets to dictate the wto,
and the EndTxn response should not have it set.

Release note: None
Release justification: Only an assertion. It would be indicative of a
dangerous bug if it fires.

@andreimatei andreimatei requested a review from nvb March 21, 2020 18:17
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once the one comment is addressed. However, could you run a set of the following three roachtests 3x each before merging this:

  • tpcc/mixed-headroom/n5cpu16
  • version/mixed/nodes=3
  • version/mixed/nodes=5

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)


pkg/kv/kvclient/kvcoord/txn_interceptor_committer.go, line 227 at r1 (raw file):

	// highest read timestamp, the response from the EndTxn sub-batch gets to
	// dictate the wto flag of the combined response (see Transaction.Update()).
	if br.Txn.WriteTooOld {

Why do we have the same assertions twice? Why don't we just use txn.Status in assertTxnCleanWhenCommitted and call that from here?

This patch adds a couple of assertions about the state in which a txn
proto can be after a successful commit (explicit commit or implicit).
The assertions are about how the multiple updates to a txn done by
sub-batches of the commit batch (as split by the DistSender) must have
been collapsed in a sane way when merging their responses together.

I spent the morning thinking that we have a problem with the WriteTooOld
flag being set on the result for an implicitly-committed txn. The
scenario that I was worried about is that the
EndTxn(CanCommitAtHigherTimestamp:true) batch gets split
into two, and both halves get pushed. The EndTxn half gets pushed higher
than the other half (otherwise, we wouldn't end up in an implicit commit
state), but it performs a server-side refresh and succeeds. I was
worried that the other half can still return a WriteTooOld flag, which
would make it into the combined response, and would trick the span
refresher into retrying. This would be bad, since retrying after an
implicit commit is problematic.
However, I believe this situation is not possible. We're saved by the
fact that combining the transaction updates (Transaction.Update()) is
smart enough to not treat the WriteTooOld flag as cummulative; instead,
the response with the highest read timestamp gets to dictate the wto,
and the EndTxn response should not have it set.

Release note: None
Release justification: Only an assertion. It would be indicative of a
dangerous bug if it fires.
@andreimatei andreimatei force-pushed the txn.assert-clean-after-commit branch from de7921a to 3c9e2c8 Compare March 26, 2020 19:34
@tbg tbg added the X-noremind Bots won't notify about PRs with X-noremind label May 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

X-noremind Bots won't notify about PRs with X-noremind

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants