storage: rationalize server-side refreshes and fix bugs #42969

andreimatei · 2019-12-04T23:06:47Z

Before this patch, we had several issues due to the server erroneously
considering that it's OK to commit a transaction at a bumped timestamp.

One of the issues was a lost update: a CPut could erroneously succeed
even though there's been a more recent write. This was caused by faulty
code in evaluateBatch() that was thinking that, just because an EndTxn
claimed to have been able to commit a transaction, that means that any
WriteTooOldError encountered previously by the batch was safe to
discard. An EndTxn might consider that it can commit even if there had
been previous write too old conditions if the NoRefreshSpans flag is
set. The problems is that a CPut that had returned a WriteTooOldError
also evaluated at the wrong read timestamp, and so its evaluation can't
be relied on.

Another issue is that, when the EndTxn code mentioned above considers
that it's safe to commit at a bumped timestamp, it doesn't take into
considerations that the EndTxn's batch might have performed reads (other
than CPuts) that have been evaluated at a lower timestamp. This can
happen, for example in the following scenario: - a txn sends a Put which
gets bumped by the ts cache - the txn then sends a Scan + EndTxn. The
scan gets evaluated at the original timestamp, but then we commit at a
bumped one because the NoRefreshSpans flag is set.

The patch fixes the bugs by reworking how evaluation takes advantage of
the fact that some requests have flexible timestamps. EndTxn no longer
is in the business of committing at bumped timestamps, and its code is
thus simplified. Instead, the replica's "local retries" loop takes over.
The replica already had code handling non-transactional batches that
evaluated them repeatedly in case of WriteTooOldErrors. This patch
rationalizes and expands this code to deal with transactional batches
too, and with pushes besides WriteTooOldErrors. This reevaluation loop
now handles the cases in which the EndTxn used to bump the commit
timestamp.

The patch also fixes a third bug: the logic evaluateBatch() for
resetting the WriteTooOld state after a successful EndTransaction was
ignoring the STAGING state, meaning that the server would return a
WriteTooOldError even though the transaction was committed. I'm not sure
if this had dramatic consequences or was benign...

Fixes #42849

Release note (bug fix): A bug causing lost update transaction anomalies
was fixed.

cockroach-teamcity · 2019-12-04T23:06:55Z

This change is

andreimatei · 2019-12-04T23:07:02Z

First commit is #42965

bdarnell

Reviewed 5 of 5 files at r1, 9 of 9 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

nvanbenschoten

Reviewed 1 of 5 files at r1, 8 of 9 files at r2, 1 of 1 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)

pkg/kv/dist_sender_server_test.go, line 2603 at r3 (raw file):

				b.CPut("a", "cput", strToValue("orig"))
				b.Put("c", "put")
				err := txn.CommitInBatch(ctx, b)

What's up with this?

pkg/kv/dist_sender_server_test.go, line 2642 at r3 (raw file):

				return txn.CommitInBatch(ctx, b)
			},
			// Expect a TxcCoordSender retry, which should succeed.

TxnCoordSender

pkg/roachpb/api.proto, line 1806 at r3 (raw file):

  // nowhere to defer the error to). Similarly, it's pointless to set the flag
  // on batches containing an EndTransaction - although we do allow it and it
  // effects sub-batches split off by the DistSender.

Why do we do it if it's pointless? I would say either less here or more.

pkg/storage/replica_evaluate.go, line 369 at r3 (raw file):

	// If there was an EndTransaction in the batch that finalized the transaction,
	// the WriteTooOld status has been fully processed and we can discard the error.
	if baHeader.Txn != nil && baHeader.Txn.Status.IsFinalizedOrStaging() {

Are there any tests that hit the OrStaging part of this condition?

pkg/storage/replica_evaluate.go, line 387 at r3 (raw file):

		// retry (i.e. there were no refresh spans and the commit timestamp
		// wasn't leaked).
		if baHeader.Txn.Status == roachpb.COMMITTED {

Why do we need this? What fails without it? Should we also have it for transactions with a STAGING status?

pkg/storage/stores.go, line 175 at r3 (raw file):

		log.Fatal(ctx, "batch request missing store ID")
	}
	if ba.Header.DeferWriteTooOldError && ba.Txn == nil {

This is a pretty high-level code-path to be dropping such a specific condition into. Should we add a BatchRequest.Validate method instead? That can also check the conditions above.

andreimatei

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, and @nvanbenschoten)

pkg/kv/dist_sender_server_test.go, line 2603 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

What's up with this?

done

pkg/kv/dist_sender_server_test.go, line 2642 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

TxnCoordSender

done

pkg/roachpb/api.proto, line 1806 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Why do we do it if it's pointless? I would say either less here or more.

See now. Also keep in mind that we don't really "do it", since this flag is never set :P

pkg/storage/replica_evaluate.go, line 369 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Are there any tests that hit the OrStaging part of this condition?

I'll add a test

pkg/storage/replica_evaluate.go, line 387 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Why do we need this? What fails without it? Should we also have it for transactions with a STAGING status?

prepended a commit removing this. PTAL

nvanbenschoten

Reviewed 2 of 11 files at r4, 9 of 9 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)

pkg/roachpb/api.proto, line 1804 at r5 (raw file):

  //
  // Non-transactional requests are not allowed to set this flag (since there's
  // nowhere to defer the error to). At the replica level, this flag is ignore

s/ignore/ignored/

pkg/storage/replica_evaluate.go, line 369 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I'll add a test

👀

pkg/storage/replica_write.go, line 437 at r5 (raw file):

				break
			}
			log.Infof(ctx, "!!! local retries updating ba.Timestamp %s->%s", ba.Timestamp, wtoErr.ActualTimestamp)

!!!

pkg/storage/stores.go, line 175 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This is a pretty high-level code-path to be dropping such a specific condition into. Should we add a BatchRequest.Validate method instead? That can also check the conditions above.

I still think we should do something like this.

andreimatei · 2019-12-20T23:38:09Z

I've reworked this, PTAL.
Turned out we had yet another bug than the original CPut one - described in the commit msg.

One thing I'm unsure about is what the story around non-transactional batches and timestamp bumps is. What's supposed to happen, for example, to a non-txn batch consisting of a Put+Scan, where the Put is pushed by the timestamp cache? The DistSender will split the two, and then the Scan will evaluate at... which timestamp?

andreimatei

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell and @nvanbenschoten)

pkg/roachpb/api.proto, line 1804 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/ignore/ignored/

done

pkg/storage/replica_evaluate.go, line 369 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

👀

👀 now

pkg/storage/replica_write.go, line 437 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

!!!

done

pkg/storage/stores.go, line 175 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I still think we should do something like this.

my silence merely meant that I had not gotten to it. Done.

andreimatei

One thing I'm unsure about is what the story around non-transactional batches and timestamp bumps is. What's supposed to happen, for example, to a non-txn batch consisting of a Put+Scan, where the Put is pushed by the timestamp cache? The DistSender will split the two, and then the Scan will evaluate at... which timestamp?

I can now answer my question: a write running into the ts cache does not affect the timestamp of "non-write" requests from the same batch because all the non-writes have been split off by the DistSender and their sub-batches are not updated to the bumped timestamp. CPuts are affected by the timestamp bump - so it's possible for a non-transactional batch to read some data at ts0 but execute a CPut at ts1. Seems funky, but we probably don't care about non-txn batches too much.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell and @nvanbenschoten)

nvanbenschoten

Reviewed 9 of 9 files at r6, 13 of 13 files at r7, 15 of 15 files at r8.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)

pkg/kv/dist_sender_server_test.go, line 2457 at r7 (raw file):

			},
			retryable: func(ctx context.Context, txn *client.Txn) error {
				if err := txn.Put(ctx, "aother", "another put"); err != nil {

s/aother/another/

pkg/kv/dist_sender_server_test.go, line 2463 at r7 (raw file):

				b.Put("a", "final value")
				return txn.CommitInBatch(ctx, b)
			},

Add some comment here about why the parallel commit was allowed even though the timestamp was forwarded.

pkg/kv/txn_interceptor_span_refresher.go, line 168 at r7 (raw file):

	br, pErr, largestRefreshTS := sr.sendLockedWithRefreshAttempts(ctx, ba, maxAttempts)
	if pErr != nil {
		// The server sometimes "performs refreshes" - it updates the transaction's

We only need to do this on the error path?

pkg/roachpb/batch.go, line 600 at r7 (raw file):

}

// ValidateForEvaluation performs sanity checks on the batch.

Note where this validation is meant to take place. E.g. this is server-side validation.

pkg/storage/replica_evaluate.go, line 369 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

👀 now

👀'd. Very nice

pkg/storage/replica_evaluate.go, line 368 at r6 (raw file):

		// EndTransaction that decided that it can essentially refresh to something
		// higher than baHeader.Timestamp because there were no refresh spans.
		br.Timestamp.Forward(baHeader.Txn.ReadTimestamp)

minor nit: I'd s/baHeader.Txn.ReadTimestamp/br.Txn.ReadTimestamp/. You just set br.Txn so that doesn't change any logic, but it seems a little closer to what you semantically mean here. It's not otherwise obvious that baHeader.Txn was being mutated along the way.

pkg/storage/replica_evaluate.go, line 335 at r7 (raw file):

				// If this write request also implies a read, it morally needs to be
				// "refreshed".

it morally needs to be "refreshed" if it hit a WriteTooOldError.

Just so we don't cause any confusion with the fact that !roachpb.NeedsRefresh(args) for these requests.

pkg/storage/replica_evaluate.go, line 446 at r7 (raw file):

	var pd result.Result

	cArgs := batcheval.CommandArgs{

Did you mean to keep this diff?

pkg/storage/replica_write.go, line 399 at r7 (raw file):

		br, res, pErr = evaluateBatch(ctx, idKey, batch, rec, ms, ba, false /* readOnly */)
		if pErr == nil {

nit: if pErr == nil && opLogger != nil {

pkg/storage/replica_write.go, line 416 at r7 (raw file):

}

func canDoServersideRetry(ctx context.Context, pErr *roachpb.Error, ba *roachpb.BatchRequest) bool {

I'd document this a bit. Also, make it clear that it can mutate ba.

pkg/storage/replica_write.go, line 417 at r7 (raw file):

func canDoServersideRetry(ctx context.Context, pErr *roachpb.Error, ba *roachpb.BatchRequest) bool {
	if pErr == nil {

It seems like there's some redundancy between checking pErr here and a few lines above. Also, is this method trying to determine whether we "can" do a server-side retry or whether we "should" do a server-side retry. Taking pErr into account here implies the latter.

andreimatei

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

pkg/kv/dist_sender_server_test.go, line 2457 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/aother/another/

done

pkg/kv/dist_sender_server_test.go, line 2463 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Add some comment here about why the parallel commit was allowed even though the timestamp was forwarded.

added more words above

pkg/kv/txn_interceptor_span_refresher.go, line 168 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We only need to do this on the error path?

Yes. We could do it on success too if you prefer, at the expense of more lines of code.
As I'm trying to explain in the comment, the server does these refreshes only on commits. If the commit succeeded, there's nothing left for the client to do, so the state updates don't matter.
If the commit fails, the state does matter but only because of a pretty weak reason. That's why I wanted to keep the code changes to a minimum.

pkg/roachpb/batch.go, line 600 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Note where this validation is meant to take place. E.g. this is server-side validation.

done

pkg/storage/replica_evaluate.go, line 368 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

minor nit: I'd s/baHeader.Txn.ReadTimestamp/br.Txn.ReadTimestamp/. You just set br.Txn so that doesn't change any logic, but it seems a little closer to what you semantically mean here. It's not otherwise obvious that baHeader.Txn was being mutated along the way.

done

pkg/storage/replica_evaluate.go, line 335 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

it morally needs to be "refreshed" if it hit a WriteTooOldError.

Just so we don't cause any confusion with the fact that !roachpb.NeedsRefresh(args) for these requests.

you meant add more words to the comment, right? Done.

pkg/storage/replica_evaluate.go, line 446 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Did you mean to keep this diff?

no. gone.

pkg/storage/replica_write.go, line 399 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: if pErr == nil && opLogger != nil {

done

pkg/storage/replica_write.go, line 416 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'd document this a bit. Also, make it clear that it can mutate ba.

done

pkg/storage/replica_write.go, line 417 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

It seems like there's some redundancy between checking pErr here and a few lines above. Also, is this method trying to determine whether we "can" do a server-side retry or whether we "should" do a server-side retry. Taking pErr into account here implies the latter.

Removed the check for pErr.

nvanbenschoten

Reviewed 7 of 13 files at r10, 15 of 15 files at r11.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei)

pkg/kv/txn_interceptor_span_refresher.go, line 168 at r7 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Yes. We could do it on success too if you prefer, at the expense of more lines of code.
As I'm trying to explain in the comment, the server does these refreshes only on commits. If the commit succeeded, there's nothing left for the client to do, so the state updates don't matter.
If the commit fails, the state does matter but only because of a pretty weak reason. That's why I wanted to keep the code changes to a minimum.

Ack.

pkg/kv/txn_interceptor_span_refresher.go, line 171 at r11 (raw file):

		// ReadTimestamp when the client said that there's nothing to refresh. In
		// these cases, we need to update our refreshedTimestamp too. This is pretty
		// inconsequential: the server only does this on an EndTransaction. If the

s/EndTransaction/EndTxn/

Do you mind searching this entire diff for that change?

andreimatei

Reviewable status: complete! 1 of 0 LGTMs obtained

pkg/kv/txn_interceptor_span_refresher.go, line 171 at r11 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/EndTransaction/EndTxn/

Do you mind searching this entire diff for that change?

done

andreimatei · 2020-01-07T21:29:13Z

I've reworked the patch again to take EndTxn out of the bumping timestamps business, which takes the patch to a new local maximum. See updated commit message.
PTAL

andreimatei

r12 is https://reviewable.io/reviews/cockroachdb/cockroach/43791

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)

nvanbenschoten

Reviewed 8 of 14 files at r14, 17 of 17 files at r15, 2 of 2 files at r16.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei)

pkg/roachpb/batch.go, line 490 at r16 (raw file):

// can sometimes have their read timestamp bumped on the server, which doesn't
// work for read requests due to have the timestamp-aware latching works (i.e. a
// read that acquired a latch @ ts10 can't simply be bumped to tsrequests 20

"tsrequests"

pkg/storage/replica_evaluate.go, line 368 at r6 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

done

Did we regress on this?

pkg/storage/replica_evaluate.go, line 312 at r16 (raw file):

				if baHeader.Txn != nil {
					baHeader.Txn.WriteTimestamp.Forward(tErr.ActualTimestamp)
					log.VEventf(ctx, 2, "setting WTO. idx: %d, req: %s", index, args)

Are you meaning to keep this? If so, we should make it more readable.

pkg/storage/replica_write.go, line 260 at r16 (raw file):

	// timestamp, let's evaluate the batch at the bumped timestamp. This will
	// allow it commit, and also it'll allow us to attempt the 1PC code path.
	maybeBumpReadTimestampToWriteTimestamp(ctx, ba)

What do you think about merging some of this logic with maybeStripInFlightWrites and creating a more general server-side request update path? The reason for all of these kinds of updates is that we need to wait until we're routed to a Replica before we're able to make certain decisions. It seems like that kind of logic should then be centralized.

pkg/storage/replica_write.go, line 300 at r16 (raw file):

		synthesizeEndTxnResponse := func() (_res onePCResult) {
			// Handle the case of a required one phase commit transaction.
			defer func() {

Why does this need to be a defer-ed function? This control flow seems much more difficult to understand than before. What do we gain from it?

pkg/storage/replica_write.go, line 332 at r16 (raw file):

			}

			// 1PC execution was successful, let's synthesize a EndTxnResponse.

s/a/an/

pkg/storage/replica_write.go, line 481 at r16 (raw file):

func canDoServersideRetry(ctx context.Context, pErr *roachpb.Error, ba *roachpb.BatchRequest) bool {
	if pErr == nil {

I thought we had removed this.

pkg/storage/replica_write.go, line 526 at r16 (raw file):

// Note that this, like all the server-side bumping of the read timestamp, only
// works for batches that exclusively contain writes; reads cannot be bumped
// liked this because they've already acquired timestamp-aware latches.

"bumped liked this"

pkg/storage/replica_write.go, line 548 at r16 (raw file):

				"ba.Txn.WriteTimestamp: %s", ts, txn.ReadTimestamp, txn.WriteTimestamp)
		}
		txn.ReadTimestamp = ts

This should be copy-on-write, right?

andreimatei

I needed to prepend r17.
Also please take a look at the deleted tests in cmd_end_transaction_test.go and tell me if you feel like something else should have been done about them - EndTxn itself no longer looks at the CanCommitAtHigherTimestamp flag, and so all those tests now return retriable errors that are swallowed by the replica retry loop.

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)

pkg/roachpb/batch.go, line 490 at r16 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"tsrequests"

done

pkg/storage/replica_evaluate.go, line 368 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Did we regress on this?

i guess so, I don't know what happened. Fixed.

pkg/storage/replica_evaluate.go, line 312 at r16 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Are you meaning to keep this? If so, we should make it more readable.

removed

pkg/storage/replica_write.go, line 260 at r16 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

What do you think about merging some of this logic with maybeStripInFlightWrites and creating a more general server-side request update path? The reason for all of these kinds of updates is that we need to wait until we're routed to a Replica before we're able to make certain decisions. It seems like that kind of logic should then be centralized.

As discussed, they unfortunately can't be called together, but I've moved them both to a new, dedicated, file.

pkg/storage/replica_write.go, line 300 at r16 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Why does this need to be a defer-ed function? This control flow seems much more difficult to understand than before. What do we gain from it?

ok see now

pkg/storage/replica_write.go, line 332 at r16 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/a/an/

done

pkg/storage/replica_write.go, line 481 at r16 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I thought we had removed this.

Done. Not sure what happened, but I went through and re-incorporated a couple of small changes from the previous review.

pkg/storage/replica_write.go, line 526 at r16 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"bumped liked this"

done

pkg/storage/replica_write.go, line 548 at r16 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This should be copy-on-write, right?

done

nvanbenschoten

Reviewed 11 of 27 files at r17, 7 of 14 files at r19, 1 of 4 files at r20, 19 of 19 files at r21, 2 of 2 files at r22.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei)

pkg/storage/replica_batch_updates.go, line 26 at r22 (raw file):

// This files contains functions performing updates to a BatchRequest performed
// on the server-side, specifically after the request has been routed to a
// replica.

👍

Could you make a note about both why these updates exist (e.g. they allow the batch to be updated based on range split boundaries) and also make a note that all of these methods should be copy-on-write?

pkg/storage/replica_test.go, line 505 at r22 (raw file):

		{ru: txnReqsNoRefresh, isTxn: true, exp1PC: true},
		{ru: txnReqsNoRefresh, isTxn: true, isTSOff: true, exp1PC: true},
		// !!!

!!!

pkg/storage/batcheval/cmd_end_transaction_test.go, line 304 at r22 (raw file):

			commit:         true,
			inFlightWrites: writes,
			noRefreshSpans: true,

Let's remove noRefreshSpans if no test case sets it to true anymore and it's not considered during EndTxn evaluation.

andreimatei

As discussed, I've prepended a commit fixing txn.Update(). PTAL
Everything else is unchanged, modulo the minor edits addressing your last comments.

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)

pkg/storage/replica_batch_updates.go, line 26 at r22 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

👍

Could you make a note about both why these updates exist (e.g. they allow the batch to be updated based on range split boundaries) and also make a note that all of these methods should be copy-on-write?

done

pkg/storage/replica_test.go, line 505 at r22 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

!!!

done

pkg/storage/batcheval/cmd_end_transaction_test.go, line 304 at r22 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Let's remove noRefreshSpans if no test case sets it to true anymore and it's not considered during EndTxn evaluation.

done

Updates to the WriteTooOld field were not commutative. This patch fixes that, by clarifying that the transaction with the higher ReadTimestamp gets to dictate the WriteTooOld value. I'm not sure what consequences this used to have, besides allowing for the confusing case where the server would receive a request with the WriteTooOld flag set, but with the ReadTimestamp==WriteTimestamp. A future commit introduces a sanity assertion that all the requests with the WTO flag have a bumped WriteTimestamp. Release note: None

This test was creating batches with different flags set. The combination of the WriteTooOld flag being set and the write timestamp not being bumped in relation to the read timestamp does not make sense. All such tests are removed, and the test spec is improved. A future commit introduces an assertion that such requests are indeed not received by servers. Release note: None

This patch fixes an incorrect comment on br.Timestamp that mistakenly indicated that the field is only relevant for non-transactional requests. It also makes the management of br.Timestamp in evaluateBatch() - it used to be pretty unscrutable. Release note: None

Release note: None

This patch creates a dedicated file for functions performing server-side modifications to a batch. The next commit will have a 2nd such function. Release note: None

Before this patch, we had several issues due to the server erroneously considering that it's OK to commit a transaction at a bumped timestamp. One of the issues was a lost update: a CPut could erroneously succeed even though there's been a more recent write. This was caused by faulty code in evaluateBatch() that was thinking that, just because an EndTxn claimed to have been able to commit a transaction, that means that any WriteTooOldError encountered previously by the batch was safe to discard. An EndTxn might consider that it can commit even if there had been previous write too old conditions if the NoRefreshSpans flag is set. The problems is that a CPut that had returned a WriteTooOldError also evaluated at the wrong read timestamp, and so its evaluation can't be relied on. Another issue is that, when the EndTxn code mentioned above considers that it's safe to commit at a bumped timestamp, it doesn't take into considerations that the EndTxn's batch might have performed reads (other than CPuts) that have been evaluated at a lower timestamp. This can happen, for example in the following scenario: - a txn sends a Put which gets bumped by the ts cache - the txn then sends a Scan + EndTxn. The scan gets evaluated at the original timestamp, but then we commit at a bumped one because the NoRefreshSpans flag is set. The patch fixes the bugs by reworking how evaluation takes advantage of the fact that some requests have flexible timestamps. EndTxn no longer is in the business of committing at bumped timestamps, and its code is thus simplified. Instead, the replica's "local retries" loop takes over. The replica already had code handling non-transactional batches that evaluated them repeatedly in case of WriteTooOldErrors. This patch rationalizes and expands this code to deal with transactional batches too, and with pushes besides WriteTooOldErrors. This reevaluation loop now handles the cases in which the EndTxn used to bump the commit timestamp. The patch also fixes a third bug: the logic evaluateBatch() for resetting the WriteTooOld state after a successful EndTransaction was ignoring the STAGING state, meaning that the server would return a WriteTooOldError even though the transaction was committed. I'm not sure if this had dramatic consequences or was benign... Fixes cockroachdb#42849 Release note (bug fix): A bug causing lost update transaction anomalies was fixed.

Explain that it's about the quality of the serializability error that is produced. Release note: None

nvanbenschoten

Reviewed 1 of 30 files at r23, 7 of 14 files at r26, 1 of 5 files at r27, 19 of 19 files at r28, 2 of 2 files at r29.
Reviewable status: complete! 1 of 0 LGTMs obtained

andreimatei

bors r+

Reviewable status: complete! 1 of 0 LGTMs obtained

42969: storage: rationalize server-side refreshes and fix bugs r=andreimatei a=andreimatei Before this patch, we had several issues due to the server erroneously considering that it's OK to commit a transaction at a bumped timestamp. One of the issues was a lost update: a CPut could erroneously succeed even though there's been a more recent write. This was caused by faulty code in evaluateBatch() that was thinking that, just because an EndTxn claimed to have been able to commit a transaction, that means that any WriteTooOldError encountered previously by the batch was safe to discard. An EndTxn might consider that it can commit even if there had been previous write too old conditions if the NoRefreshSpans flag is set. The problems is that a CPut that had returned a WriteTooOldError also evaluated at the wrong read timestamp, and so its evaluation can't be relied on. Another issue is that, when the EndTxn code mentioned above considers that it's safe to commit at a bumped timestamp, it doesn't take into considerations that the EndTxn's batch might have performed reads (other than CPuts) that have been evaluated at a lower timestamp. This can happen, for example in the following scenario: - a txn sends a Put which gets bumped by the ts cache - the txn then sends a Scan + EndTxn. The scan gets evaluated at the original timestamp, but then we commit at a bumped one because the NoRefreshSpans flag is set. The patch fixes the bugs by reworking how evaluation takes advantage of the fact that some requests have flexible timestamps. EndTxn no longer is in the business of committing at bumped timestamps, and its code is thus simplified. Instead, the replica's "local retries" loop takes over. The replica already had code handling non-transactional batches that evaluated them repeatedly in case of WriteTooOldErrors. This patch rationalizes and expands this code to deal with transactional batches too, and with pushes besides WriteTooOldErrors. This reevaluation loop now handles the cases in which the EndTxn used to bump the commit timestamp. The patch also fixes a third bug: the logic evaluateBatch() for resetting the WriteTooOld state after a successful EndTransaction was ignoring the STAGING state, meaning that the server would return a WriteTooOldError even though the transaction was committed. I'm not sure if this had dramatic consequences or was benign... Fixes #42849 Release note (bug fix): A bug causing lost update transaction anomalies was fixed. 43915: cli: warn if trying to [rd]ecommision when already [rd]ecommissioned r=tbg a=knz Fixes #36624. First commit from #43908 Release note (cli change): The CLI commands `cockroach node decommission` and `cockroach node recommission` now produce a warning on the standard error if one of the node(s) specified is already (d/r)ecommissioned. 43989: colexec: fix an issue with builtin operators and a minor cleanup r=yuzefovich a=yuzefovich Flat bytes relies on coldata.Batch.SetLength call to maintain its invariant. We assume that it is always called before return the batch in which Bytes vector might have been modified. This was not the case for default builtin and substring operators, and the calls were added. Additionally, to be safe, similar calls have been in added in projection operators. In a few places where we were setting the length of an internal batch to 0 and then returning it, those were replaced with returning coldata.ZeroBatch. Fixes: #43656. Release note: None 44004: githooks: accept release note category 'security update' r=knz a=knz Forgot this in #43869 44008: re-enable: roachprod: Make multiple set [provider]-zones always geo-distribute nodes r=jlinder a=jlinder This re-enables commit d24e40e which was reverted in commit 63279f9. It was reverted due to roachtest automatically passing in a list of zones but only wanting the first zone to be used (#43898) which was fixed in f68c6d5 . Before: if multiple zones were set for a provider and --geo wasn't set, all hosts would be started in just one zone in one region. Why change? Because if multiple zones are set, the intention is that they be used. Now, --geo and --[provider]-zones work as follows for gcloud, aws and azure: 1. when geo and zones are not set, nodes are all placed in one of the default zones 2. when geo is set but zones aren't, nodes are spread evenly across the default zones 3. when zones are set, nodes are spread evenly across the specified zones Fixes #38542. Release note: None 44016: builtins: miscellaneous fixes for the to_hex builtin r=mjibson a=otan Resolves #41707. Release note (sql change, bug fix): * Added to_hex(string) -> string functionality. * Previously, `to_hex(-1)` would return `-1` instead of the negative hex representation (`FFFFFFFFFFFFFFFF`). This has been rectified in this PR. 44023: roachtest: bump minimum version of the sqlsmith roachtest r=yuzefovich a=rohany Fixes #43995. This PR bumps the minimum version of the SQLSmith roachtest to be v20.1.0. Release note: None Co-authored-by: Andrei Matei <andrei@cockroachlabs.com> Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: James H. Linder <jamesl@cockroachlabs.com> Co-authored-by: Oliver Tan <otan@cockroachlabs.com> Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>

craig · 2020-01-15T20:40:07Z

Build succeeded

GitHub CI (Cockroach)

44350: storage: revert "storage: rationalize server-side refreshes and fix bugs" r=andreimatei a=andreimatei This reverts commit 1edb0d5. Revert the main commit from #42969 (storage: rationalize server-side refreshes and fix bugs). It seems to be causing consistency problems, as seen in #43110. Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>

44381: storage: revert remaining bits of 42969 r=andreimatei a=andreimatei Revert everything from #42969 that wasn't reverted by #44350. Even after the partial revert in #44350, it appears the bug causing TPCC to fail (#43110) is still there. Release note: None Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>

35294: roachprod: add a max-concurrency flag with a default of 32 r=nvanbenschoten a=ajwerner This PR adds a --max-concurrency flag to roachprod and defaults its value to 32. In large clusters doing many concurrent SSH operations can lead to unexpected behavior where the command fails to communicate with the SSH agent and leads to the user being prompted for their private key passphrase. Adding a limit prevents this behavior when interacting with a cluster of 256 nodes. Release note: None 44507: storage: rationalize server-side refreshes and fix bugs r=andreimatei a=andreimatei This is most #42969, which had been reverted. Different other parts of #42969 have been re-merged separately. --------- Before this patch, we had several issues due to the server erroneously considering that it's OK to commit a transaction at a bumped timestamp. One of the issues was a lost update: a CPut could erroneously succeed even though there's been a more recent write. This was caused by faulty code in evaluateBatch() that was thinking that, just because an EndTxn claimed to have been able to commit a transaction, that means that any WriteTooOldError encountered previously by the batch was safe to discard. An EndTxn might consider that it can commit even if there had been previous write too old conditions if the NoRefreshSpans flag is set. The problems is that a CPut that had returned a WriteTooOldError also evaluated at the wrong read timestamp, and so its evaluation can't be relied on. Another issue is that, when the EndTxn code mentioned above considers that it's safe to commit at a bumped timestamp, it doesn't take into considerations that the EndTxn's batch might have performed reads (other than CPuts) that have been evaluated at a lower timestamp. This can happen, for example in the following scenario: - a txn sends a Put which gets bumped by the ts cache - the txn then sends a Scan + EndTxn. The scan gets evaluated at the original timestamp, but then we commit at a bumped one because the NoRefreshSpans flag is set. The patch fixes the bugs by reworking how evaluation takes advantage of the fact that some requests have flexible timestamps. EndTxn no longer is in the business of committing at bumped timestamps, and its code is thus simplified. Instead, the replica's "local retries" loop takes over. The replica already had code handling non-transactional batches that evaluated them repeatedly in case of WriteTooOldErrors. This patch rationalizes and expands this code to deal with transactional batches too, and with pushes besides WriteTooOldErrors. This reevaluation loop now handles the cases in which the EndTxn used to bump the commit timestamp. The patch also fixes a third bug: the logic evaluateBatch() for resetting the WriteTooOld state after a successful EndTransaction was ignoring the STAGING state, meaning that the server would return a WriteTooOldError even though the transaction was committed. I'm not sure if this had dramatic consequences or was benign... Fixes #42849 Release note (bug fix): A bug causing lost update transaction anomalies was fixed. Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com> Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>

andreimatei requested review from bdarnell and nvanbenschoten December 4, 2019 23:06

andreimatei force-pushed the txn.reproduce-commit-bug branch from 1a160ee to 13319ba Compare December 4, 2019 23:28

bdarnell approved these changes Dec 5, 2019

View reviewed changes

andreimatei force-pushed the txn.reproduce-commit-bug branch from 13319ba to 41ef739 Compare December 5, 2019 20:26

nvanbenschoten reviewed Dec 9, 2019

View reviewed changes

andreimatei force-pushed the txn.reproduce-commit-bug branch from 41ef739 to cf28026 Compare December 13, 2019 21:52

andreimatei commented Dec 13, 2019

View reviewed changes

nvanbenschoten reviewed Dec 16, 2019

View reviewed changes

andreimatei force-pushed the txn.reproduce-commit-bug branch from cf28026 to 65c5e33 Compare December 20, 2019 23:17

andreimatei changed the title ~~storage: fix a lost update bug~~ storage: rationalize server-side refreshes and fix bugs Dec 20, 2019

andreimatei force-pushed the txn.reproduce-commit-bug branch from 65c5e33 to 6ce1e52 Compare December 20, 2019 23:25

andreimatei commented Dec 27, 2019

View reviewed changes

andreimatei force-pushed the txn.reproduce-commit-bug branch from 6ce1e52 to 1d27ede Compare December 27, 2019 17:44

andreimatei commented Dec 27, 2019

View reviewed changes

andreimatei force-pushed the txn.reproduce-commit-bug branch from 65f0a6a to ae654e9 Compare December 27, 2019 19:26

nvanbenschoten reviewed Dec 27, 2019

View reviewed changes

andreimatei force-pushed the txn.reproduce-commit-bug branch from ae654e9 to c2ffe71 Compare December 28, 2019 19:30

andreimatei commented Dec 28, 2019

View reviewed changes

nvanbenschoten approved these changes Jan 6, 2020

View reviewed changes

andreimatei force-pushed the txn.reproduce-commit-bug branch from c2ffe71 to 8b13e15 Compare January 6, 2020 19:27

andreimatei commented Jan 6, 2020

View reviewed changes

andreimatei force-pushed the txn.reproduce-commit-bug branch from 8b13e15 to 0851180 Compare January 7, 2020 21:26

andreimatei commented Jan 7, 2020

View reviewed changes

nvanbenschoten reviewed Jan 9, 2020

View reviewed changes

andreimatei force-pushed the txn.reproduce-commit-bug branch from 8089a94 to 7fa9bd6 Compare January 13, 2020 23:57

andreimatei commented Jan 14, 2020

View reviewed changes

nvanbenschoten approved these changes Jan 14, 2020

View reviewed changes

andreimatei force-pushed the txn.reproduce-commit-bug branch 2 times, most recently from d585ecc to 7042c27 Compare January 15, 2020 01:04

andreimatei commented Jan 15, 2020

View reviewed changes

andreimatei added 7 commits January 14, 2020 20:30

roachpb: rename NoRefreshSpans -> CanCommitAtHigherTimestamp

27a4a59

Release note: None

storage: move maybeStripInFlightWrites

f37f977

This patch creates a dedicated file for functions performing server-side modifications to a batch. The next commit will have a 2nd such function. Release note: None

roachpb: improve comments on the write_too_old field

5e5b0c4

Explain that it's about the quality of the serializability error that is produced. Release note: None

andreimatei force-pushed the txn.reproduce-commit-bug branch from 7042c27 to 5e5b0c4 Compare January 15, 2020 01:30

nvanbenschoten approved these changes Jan 15, 2020

View reviewed changes

andreimatei commented Jan 15, 2020

View reviewed changes

craig bot merged commit 5e5b0c4 into cockroachdb:master Jan 15, 2020

nvanbenschoten mentioned this pull request Jan 23, 2020

roachtest: tpcc/mixed-headroom/n5cpu16 failed #43110

Closed

andreimatei mentioned this pull request Jan 24, 2020

storage: revert "storage: rationalize server-side refreshes and fix bugs" #44350

Merged

andreimatei mentioned this pull request Jan 24, 2020

storage: revert remaining bits of 42969 #44381

Merged

andreimatei mentioned this pull request Jan 27, 2020

storage: fix the setting of br.Timestamp #44411

Merged

andreimatei deleted the txn.reproduce-commit-bug branch January 29, 2020 18:21

This was referenced Jan 29, 2020

roachpb: fix txn.Update() commutativity #44503

Merged

storage: rationalize server-side refreshes and fix bugs #44507

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

storage: rationalize server-side refreshes and fix bugs #42969

storage: rationalize server-side refreshes and fix bugs #42969

andreimatei commented Dec 4, 2019 •

edited

Loading

cockroach-teamcity commented Dec 4, 2019

andreimatei commented Dec 4, 2019

bdarnell left a comment

nvanbenschoten left a comment

andreimatei left a comment

nvanbenschoten left a comment

andreimatei commented Dec 20, 2019

andreimatei left a comment

andreimatei left a comment

nvanbenschoten left a comment

andreimatei left a comment

nvanbenschoten left a comment

andreimatei left a comment

andreimatei commented Jan 7, 2020

andreimatei left a comment

nvanbenschoten left a comment

andreimatei left a comment

nvanbenschoten left a comment

andreimatei left a comment

nvanbenschoten left a comment

andreimatei left a comment

craig bot commented Jan 15, 2020

storage: rationalize server-side refreshes and fix bugs #42969

storage: rationalize server-side refreshes and fix bugs #42969

Conversation

andreimatei commented Dec 4, 2019 • edited Loading

cockroach-teamcity commented Dec 4, 2019

andreimatei commented Dec 4, 2019

bdarnell left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

andreimatei left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

andreimatei commented Dec 20, 2019

andreimatei left a comment

Choose a reason for hiding this comment

andreimatei left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

andreimatei left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

andreimatei left a comment

Choose a reason for hiding this comment

andreimatei commented Jan 7, 2020

andreimatei left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

andreimatei left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

andreimatei left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

andreimatei left a comment

Choose a reason for hiding this comment

craig bot commented Jan 15, 2020

Build succeeded

andreimatei commented Dec 4, 2019 •

edited

Loading