Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: rationalize server-side refreshes and fix bugs #42969

Merged
merged 7 commits into from
Jan 15, 2020

Conversation

andreimatei
Copy link
Contributor

@andreimatei andreimatei commented Dec 4, 2019

Before this patch, we had several issues due to the server erroneously
considering that it's OK to commit a transaction at a bumped timestamp.

One of the issues was a lost update: a CPut could erroneously succeed
even though there's been a more recent write. This was caused by faulty
code in evaluateBatch() that was thinking that, just because an EndTxn
claimed to have been able to commit a transaction, that means that any
WriteTooOldError encountered previously by the batch was safe to
discard. An EndTxn might consider that it can commit even if there had
been previous write too old conditions if the NoRefreshSpans flag is
set. The problems is that a CPut that had returned a WriteTooOldError
also evaluated at the wrong read timestamp, and so its evaluation can't
be relied on.

Another issue is that, when the EndTxn code mentioned above considers
that it's safe to commit at a bumped timestamp, it doesn't take into
considerations that the EndTxn's batch might have performed reads (other
than CPuts) that have been evaluated at a lower timestamp. This can
happen, for example in the following scenario: - a txn sends a Put which
gets bumped by the ts cache - the txn then sends a Scan + EndTxn. The
scan gets evaluated at the original timestamp, but then we commit at a
bumped one because the NoRefreshSpans flag is set.

The patch fixes the bugs by reworking how evaluation takes advantage of
the fact that some requests have flexible timestamps. EndTxn no longer
is in the business of committing at bumped timestamps, and its code is
thus simplified. Instead, the replica's "local retries" loop takes over.
The replica already had code handling non-transactional batches that
evaluated them repeatedly in case of WriteTooOldErrors. This patch
rationalizes and expands this code to deal with transactional batches
too, and with pushes besides WriteTooOldErrors. This reevaluation loop
now handles the cases in which the EndTxn used to bump the commit
timestamp.

The patch also fixes a third bug: the logic evaluateBatch() for
resetting the WriteTooOld state after a successful EndTransaction was
ignoring the STAGING state, meaning that the server would return a
WriteTooOldError even though the transaction was committed. I'm not sure
if this had dramatic consequences or was benign...

Fixes #42849

Release note (bug fix): A bug causing lost update transaction anomalies
was fixed.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@andreimatei
Copy link
Contributor Author

First commit is #42965

Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r1, 9 of 9 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 5 files at r1, 8 of 9 files at r2, 1 of 1 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)


pkg/kv/dist_sender_server_test.go, line 2603 at r3 (raw file):

				b.CPut("a", "cput", strToValue("orig"))
				b.Put("c", "put")
				err := txn.CommitInBatch(ctx, b)

What's up with this?


pkg/kv/dist_sender_server_test.go, line 2642 at r3 (raw file):

				return txn.CommitInBatch(ctx, b)
			},
			// Expect a TxcCoordSender retry, which should succeed.

TxnCoordSender


pkg/roachpb/api.proto, line 1806 at r3 (raw file):

  // nowhere to defer the error to). Similarly, it's pointless to set the flag
  // on batches containing an EndTransaction - although we do allow it and it
  // effects sub-batches split off by the DistSender.

Why do we do it if it's pointless? I would say either less here or more.


pkg/storage/replica_evaluate.go, line 369 at r3 (raw file):

	// If there was an EndTransaction in the batch that finalized the transaction,
	// the WriteTooOld status has been fully processed and we can discard the error.
	if baHeader.Txn != nil && baHeader.Txn.Status.IsFinalizedOrStaging() {

Are there any tests that hit the OrStaging part of this condition?


pkg/storage/replica_evaluate.go, line 387 at r3 (raw file):

		// retry (i.e. there were no refresh spans and the commit timestamp
		// wasn't leaked).
		if baHeader.Txn.Status == roachpb.COMMITTED {

Why do we need this? What fails without it? Should we also have it for transactions with a STAGING status?


pkg/storage/stores.go, line 175 at r3 (raw file):

		log.Fatal(ctx, "batch request missing store ID")
	}
	if ba.Header.DeferWriteTooOldError && ba.Txn == nil {

This is a pretty high-level code-path to be dropping such a specific condition into. Should we add a BatchRequest.Validate method instead? That can also check the conditions above.

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, and @nvanbenschoten)


pkg/kv/dist_sender_server_test.go, line 2603 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

What's up with this?

done


pkg/kv/dist_sender_server_test.go, line 2642 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

TxnCoordSender

done


pkg/roachpb/api.proto, line 1806 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Why do we do it if it's pointless? I would say either less here or more.

See now. Also keep in mind that we don't really "do it", since this flag is never set :P


pkg/storage/replica_evaluate.go, line 369 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Are there any tests that hit the OrStaging part of this condition?

I'll add a test


pkg/storage/replica_evaluate.go, line 387 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Why do we need this? What fails without it? Should we also have it for transactions with a STAGING status?

prepended a commit removing this. PTAL

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 11 files at r4, 9 of 9 files at r5.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)


pkg/roachpb/api.proto, line 1804 at r5 (raw file):

  //
  // Non-transactional requests are not allowed to set this flag (since there's
  // nowhere to defer the error to). At the replica level, this flag is ignore

s/ignore/ignored/


pkg/storage/replica_evaluate.go, line 369 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I'll add a test

👀


pkg/storage/replica_write.go, line 437 at r5 (raw file):

				break
			}
			log.Infof(ctx, "!!! local retries updating ba.Timestamp %s->%s", ba.Timestamp, wtoErr.ActualTimestamp)

!!!


pkg/storage/stores.go, line 175 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This is a pretty high-level code-path to be dropping such a specific condition into. Should we add a BatchRequest.Validate method instead? That can also check the conditions above.

I still think we should do something like this.

@andreimatei andreimatei changed the title storage: fix a lost update bug storage: rationalize server-side refreshes and fix bugs Dec 20, 2019
@andreimatei
Copy link
Contributor Author

I've reworked this, PTAL.
Turned out we had yet another bug than the original CPut one - described in the commit msg.

One thing I'm unsure about is what the story around non-transactional batches and timestamp bumps is. What's supposed to happen, for example, to a non-txn batch consisting of a Put+Scan, where the Put is pushed by the timestamp cache? The DistSender will split the two, and then the Scan will evaluate at... which timestamp?

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell and @nvanbenschoten)


pkg/roachpb/api.proto, line 1804 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/ignore/ignored/

done


pkg/storage/replica_evaluate.go, line 369 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

👀

👀 now


pkg/storage/replica_write.go, line 437 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

!!!

done


pkg/storage/stores.go, line 175 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I still think we should do something like this.

my silence merely meant that I had not gotten to it. Done.

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I'm unsure about is what the story around non-transactional batches and timestamp bumps is. What's supposed to happen, for example, to a non-txn batch consisting of a Put+Scan, where the Put is pushed by the timestamp cache? The DistSender will split the two, and then the Scan will evaluate at... which timestamp?

I can now answer my question: a write running into the ts cache does not affect the timestamp of "non-write" requests from the same batch because all the non-writes have been split off by the DistSender and their sub-batches are not updated to the bumped timestamp. CPuts are affected by the timestamp bump - so it's possible for a non-transactional batch to read some data at ts0 but execute a CPut at ts1. Seems funky, but we probably don't care about non-txn batches too much.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell and @nvanbenschoten)

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 9 files at r6, 13 of 13 files at r7, 15 of 15 files at r8.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)


pkg/kv/dist_sender_server_test.go, line 2457 at r7 (raw file):

			},
			retryable: func(ctx context.Context, txn *client.Txn) error {
				if err := txn.Put(ctx, "aother", "another put"); err != nil {

s/aother/another/


pkg/kv/dist_sender_server_test.go, line 2463 at r7 (raw file):

				b.Put("a", "final value")
				return txn.CommitInBatch(ctx, b)
			},

Add some comment here about why the parallel commit was allowed even though the timestamp was forwarded.


pkg/kv/txn_interceptor_span_refresher.go, line 168 at r7 (raw file):

	br, pErr, largestRefreshTS := sr.sendLockedWithRefreshAttempts(ctx, ba, maxAttempts)
	if pErr != nil {
		// The server sometimes "performs refreshes" - it updates the transaction's

We only need to do this on the error path?


pkg/roachpb/batch.go, line 600 at r7 (raw file):

}

// ValidateForEvaluation performs sanity checks on the batch.

Note where this validation is meant to take place. E.g. this is server-side validation.


pkg/storage/replica_evaluate.go, line 369 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

👀 now

👀'd. Very nice


pkg/storage/replica_evaluate.go, line 368 at r6 (raw file):

		// EndTransaction that decided that it can essentially refresh to something
		// higher than baHeader.Timestamp because there were no refresh spans.
		br.Timestamp.Forward(baHeader.Txn.ReadTimestamp)

minor nit: I'd s/baHeader.Txn.ReadTimestamp/br.Txn.ReadTimestamp/. You just set br.Txn so that doesn't change any logic, but it seems a little closer to what you semantically mean here. It's not otherwise obvious that baHeader.Txn was being mutated along the way.


pkg/storage/replica_evaluate.go, line 335 at r7 (raw file):

				// If this write request also implies a read, it morally needs to be
				// "refreshed".

it morally needs to be "refreshed" if it hit a WriteTooOldError.

Just so we don't cause any confusion with the fact that !roachpb.NeedsRefresh(args) for these requests.


pkg/storage/replica_evaluate.go, line 446 at r7 (raw file):

	var pd result.Result

	cArgs := batcheval.CommandArgs{

Did you mean to keep this diff?


pkg/storage/replica_write.go, line 399 at r7 (raw file):

		br, res, pErr = evaluateBatch(ctx, idKey, batch, rec, ms, ba, false /* readOnly */)
		if pErr == nil {

nit: if pErr == nil && opLogger != nil {


pkg/storage/replica_write.go, line 416 at r7 (raw file):

}

func canDoServersideRetry(ctx context.Context, pErr *roachpb.Error, ba *roachpb.BatchRequest) bool {

I'd document this a bit. Also, make it clear that it can mutate ba.


pkg/storage/replica_write.go, line 417 at r7 (raw file):

func canDoServersideRetry(ctx context.Context, pErr *roachpb.Error, ba *roachpb.BatchRequest) bool {
	if pErr == nil {

It seems like there's some redundancy between checking pErr here and a few lines above. Also, is this method trying to determine whether we "can" do a server-side retry or whether we "should" do a server-side retry. Taking pErr into account here implies the latter.

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/dist_sender_server_test.go, line 2457 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/aother/another/

done


pkg/kv/dist_sender_server_test.go, line 2463 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Add some comment here about why the parallel commit was allowed even though the timestamp was forwarded.

added more words above


pkg/kv/txn_interceptor_span_refresher.go, line 168 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We only need to do this on the error path?

Yes. We could do it on success too if you prefer, at the expense of more lines of code.
As I'm trying to explain in the comment, the server does these refreshes only on commits. If the commit succeeded, there's nothing left for the client to do, so the state updates don't matter.
If the commit fails, the state does matter but only because of a pretty weak reason. That's why I wanted to keep the code changes to a minimum.


pkg/roachpb/batch.go, line 600 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Note where this validation is meant to take place. E.g. this is server-side validation.

done


pkg/storage/replica_evaluate.go, line 368 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

minor nit: I'd s/baHeader.Txn.ReadTimestamp/br.Txn.ReadTimestamp/. You just set br.Txn so that doesn't change any logic, but it seems a little closer to what you semantically mean here. It's not otherwise obvious that baHeader.Txn was being mutated along the way.

done


pkg/storage/replica_evaluate.go, line 335 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

it morally needs to be "refreshed" if it hit a WriteTooOldError.

Just so we don't cause any confusion with the fact that !roachpb.NeedsRefresh(args) for these requests.

you meant add more words to the comment, right? Done.


pkg/storage/replica_evaluate.go, line 446 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Did you mean to keep this diff?

no. gone.


pkg/storage/replica_write.go, line 399 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: if pErr == nil && opLogger != nil {

done


pkg/storage/replica_write.go, line 416 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'd document this a bit. Also, make it clear that it can mutate ba.

done


pkg/storage/replica_write.go, line 417 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

It seems like there's some redundancy between checking pErr here and a few lines above. Also, is this method trying to determine whether we "can" do a server-side retry or whether we "should" do a server-side retry. Taking pErr into account here implies the latter.

Removed the check for pErr.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 7 of 13 files at r10, 15 of 15 files at r11.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei)


pkg/kv/txn_interceptor_span_refresher.go, line 168 at r7 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Yes. We could do it on success too if you prefer, at the expense of more lines of code.
As I'm trying to explain in the comment, the server does these refreshes only on commits. If the commit succeeded, there's nothing left for the client to do, so the state updates don't matter.
If the commit fails, the state does matter but only because of a pretty weak reason. That's why I wanted to keep the code changes to a minimum.

Ack.


pkg/kv/txn_interceptor_span_refresher.go, line 171 at r11 (raw file):

		// ReadTimestamp when the client said that there's nothing to refresh. In
		// these cases, we need to update our refreshedTimestamp too. This is pretty
		// inconsequential: the server only does this on an EndTransaction. If the

s/EndTransaction/EndTxn/

Do you mind searching this entire diff for that change?

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/kv/txn_interceptor_span_refresher.go, line 171 at r11 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/EndTransaction/EndTxn/

Do you mind searching this entire diff for that change?

done

@andreimatei
Copy link
Contributor Author

I've reworked the patch again to take EndTxn out of the bumping timestamps business, which takes the patch to a new local maximum. See updated commit message.
PTAL

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r12 is https://reviewable.io/reviews/cockroachdb/cockroach/43791

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 8 of 14 files at r14, 17 of 17 files at r15, 2 of 2 files at r16.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei)


pkg/roachpb/batch.go, line 490 at r16 (raw file):

// can sometimes have their read timestamp bumped on the server, which doesn't
// work for read requests due to have the timestamp-aware latching works (i.e. a
// read that acquired a latch @ ts10 can't simply be bumped to tsrequests 20

"tsrequests"


pkg/storage/replica_evaluate.go, line 368 at r6 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

done

Did we regress on this?


pkg/storage/replica_evaluate.go, line 312 at r16 (raw file):

				if baHeader.Txn != nil {
					baHeader.Txn.WriteTimestamp.Forward(tErr.ActualTimestamp)
					log.VEventf(ctx, 2, "setting WTO. idx: %d, req: %s", index, args)

Are you meaning to keep this? If so, we should make it more readable.


pkg/storage/replica_write.go, line 260 at r16 (raw file):

	// timestamp, let's evaluate the batch at the bumped timestamp. This will
	// allow it commit, and also it'll allow us to attempt the 1PC code path.
	maybeBumpReadTimestampToWriteTimestamp(ctx, ba)

What do you think about merging some of this logic with maybeStripInFlightWrites and creating a more general server-side request update path? The reason for all of these kinds of updates is that we need to wait until we're routed to a Replica before we're able to make certain decisions. It seems like that kind of logic should then be centralized.


pkg/storage/replica_write.go, line 300 at r16 (raw file):

		synthesizeEndTxnResponse := func() (_res onePCResult) {
			// Handle the case of a required one phase commit transaction.
			defer func() {

Why does this need to be a defer-ed function? This control flow seems much more difficult to understand than before. What do we gain from it?


pkg/storage/replica_write.go, line 332 at r16 (raw file):

			}

			// 1PC execution was successful, let's synthesize a EndTxnResponse.

s/a/an/


pkg/storage/replica_write.go, line 481 at r16 (raw file):

func canDoServersideRetry(ctx context.Context, pErr *roachpb.Error, ba *roachpb.BatchRequest) bool {
	if pErr == nil {

I thought we had removed this.


pkg/storage/replica_write.go, line 526 at r16 (raw file):

// Note that this, like all the server-side bumping of the read timestamp, only
// works for batches that exclusively contain writes; reads cannot be bumped
// liked this because they've already acquired timestamp-aware latches.

"bumped liked this"


pkg/storage/replica_write.go, line 548 at r16 (raw file):

				"ba.Txn.WriteTimestamp: %s", ts, txn.ReadTimestamp, txn.WriteTimestamp)
		}
		txn.ReadTimestamp = ts

This should be copy-on-write, right?

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed to prepend r17.
Also please take a look at the deleted tests in cmd_end_transaction_test.go and tell me if you feel like something else should have been done about them - EndTxn itself no longer looks at the CanCommitAtHigherTimestamp flag, and so all those tests now return retriable errors that are swallowed by the replica retry loop.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)


pkg/roachpb/batch.go, line 490 at r16 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"tsrequests"

done


pkg/storage/replica_evaluate.go, line 368 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Did we regress on this?

i guess so, I don't know what happened. Fixed.


pkg/storage/replica_evaluate.go, line 312 at r16 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Are you meaning to keep this? If so, we should make it more readable.

removed


pkg/storage/replica_write.go, line 260 at r16 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

What do you think about merging some of this logic with maybeStripInFlightWrites and creating a more general server-side request update path? The reason for all of these kinds of updates is that we need to wait until we're routed to a Replica before we're able to make certain decisions. It seems like that kind of logic should then be centralized.

As discussed, they unfortunately can't be called together, but I've moved them both to a new, dedicated, file.


pkg/storage/replica_write.go, line 300 at r16 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Why does this need to be a defer-ed function? This control flow seems much more difficult to understand than before. What do we gain from it?

ok see now


pkg/storage/replica_write.go, line 332 at r16 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/a/an/

done


pkg/storage/replica_write.go, line 481 at r16 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I thought we had removed this.

Done. Not sure what happened, but I went through and re-incorporated a couple of small changes from the previous review.


pkg/storage/replica_write.go, line 526 at r16 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"bumped liked this"

done


pkg/storage/replica_write.go, line 548 at r16 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This should be copy-on-write, right?

done

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 11 of 27 files at r17, 7 of 14 files at r19, 1 of 4 files at r20, 19 of 19 files at r21, 2 of 2 files at r22.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei)


pkg/storage/replica_batch_updates.go, line 26 at r22 (raw file):

// This files contains functions performing updates to a BatchRequest performed
// on the server-side, specifically after the request has been routed to a
// replica.

👍

Could you make a note about both why these updates exist (e.g. they allow the batch to be updated based on range split boundaries) and also make a note that all of these methods should be copy-on-write?


pkg/storage/replica_test.go, line 505 at r22 (raw file):

		{ru: txnReqsNoRefresh, isTxn: true, exp1PC: true},
		{ru: txnReqsNoRefresh, isTxn: true, isTSOff: true, exp1PC: true},
		// !!!

!!!


pkg/storage/batcheval/cmd_end_transaction_test.go, line 304 at r22 (raw file):

			commit:         true,
			inFlightWrites: writes,
			noRefreshSpans: true,

Let's remove noRefreshSpans if no test case sets it to true anymore and it's not considered during EndTxn evaluation.

@andreimatei andreimatei force-pushed the txn.reproduce-commit-bug branch 2 times, most recently from d585ecc to 7042c27 Compare January 15, 2020 01:04
Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, I've prepended a commit fixing txn.Update(). PTAL
Everything else is unchanged, modulo the minor edits addressing your last comments.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)


pkg/storage/replica_batch_updates.go, line 26 at r22 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

👍

Could you make a note about both why these updates exist (e.g. they allow the batch to be updated based on range split boundaries) and also make a note that all of these methods should be copy-on-write?

done


pkg/storage/replica_test.go, line 505 at r22 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

!!!

done


pkg/storage/batcheval/cmd_end_transaction_test.go, line 304 at r22 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Let's remove noRefreshSpans if no test case sets it to true anymore and it's not considered during EndTxn evaluation.

done

Updates to the WriteTooOld field were not commutative. This patch fixes
that, by clarifying that the transaction with the higher ReadTimestamp
gets to dictate the WriteTooOld value.
I'm not sure what consequences this used to have, besides allowing for
the confusing case where the server would receive a request with the
WriteTooOld flag set, but with the ReadTimestamp==WriteTimestamp.  A
future commit introduces a sanity assertion that all the requests with
the WTO flag have a bumped WriteTimestamp.

Release note: None
This test was creating batches with different flags set. The combination
of the WriteTooOld flag being set and the write timestamp not being
bumped in relation to the read timestamp does not make sense. All such
tests are removed, and the test spec is improved.
A future commit introduces an assertion that such requests are indeed
not received by servers.

Release note: None
This patch fixes an incorrect comment on br.Timestamp that mistakenly
indicated that the field is only relevant for non-transactional
requests. It also makes the management of br.Timestamp in
evaluateBatch() - it used to be pretty unscrutable.

Release note: None
This patch creates a dedicated file for functions performing server-side
modifications to a batch. The next commit will have a 2nd such function.

Release note: None
Before this patch, we had several issues due to the server erroneously
considering that it's OK to commit a transaction at a bumped timestamp.

One of the issues was a lost update: a CPut could erroneously succeed
even though there's been a more recent write. This was caused by faulty
code in evaluateBatch() that was thinking that, just because an EndTxn
claimed to have been able to commit a transaction, that means that any
WriteTooOldError encountered previously by the batch was safe to
discard. An EndTxn might consider that it can commit even if there had
been previous write too old conditions if the NoRefreshSpans flag is
set. The problems is that a CPut that had returned a WriteTooOldError
also evaluated at the wrong read timestamp, and so its evaluation can't
be relied on.

Another issue is that, when the EndTxn code mentioned above considers
that it's safe to commit at a bumped timestamp, it doesn't take into
considerations that the EndTxn's batch might have performed reads (other
than CPuts)  that have been evaluated at a lower timestamp. This can
happen, for example in the following scenario: - a txn sends a Put which
gets bumped by the ts cache - the txn then sends a Scan + EndTxn. The
scan gets evaluated at the original timestamp, but then we commit at a
bumped one because the NoRefreshSpans flag is set.

The patch fixes the bugs by reworking how evaluation takes advantage of
the fact that some requests have flexible timestamps. EndTxn no longer
is in the business of committing at bumped timestamps, and its code is
thus simplified. Instead, the replica's "local retries" loop takes over.
The replica already had code handling non-transactional batches that
evaluated them repeatedly in case of WriteTooOldErrors. This patch
rationalizes and expands this code to deal with transactional batches
too, and with pushes besides WriteTooOldErrors. This reevaluation loop
now handles the cases in which the EndTxn used to bump the commit
timestamp.

The patch also fixes a third bug: the logic evaluateBatch() for
resetting the WriteTooOld state after a successful EndTransaction was
ignoring the STAGING state, meaning that the server would return a
WriteTooOldError even though the transaction was committed. I'm not sure
if this had dramatic consequences or was benign...

Fixes cockroachdb#42849

Release note (bug fix): A bug causing lost update transaction anomalies
was fixed.
Explain that it's about the quality of the serializability error that
is produced.

Release note: None
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 30 files at r23, 7 of 14 files at r26, 1 of 5 files at r27, 19 of 19 files at r28, 2 of 2 files at r29.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

craig bot pushed a commit that referenced this pull request Jan 15, 2020
42969: storage: rationalize server-side refreshes and fix bugs r=andreimatei a=andreimatei

Before this patch, we had several issues due to the server erroneously
considering that it's OK to commit a transaction at a bumped timestamp.

One of the issues was a lost update: a CPut could erroneously succeed
even though there's been a more recent write. This was caused by faulty
code in evaluateBatch() that was thinking that, just because an EndTxn
claimed to have been able to commit a transaction, that means that any
WriteTooOldError encountered previously by the batch was safe to
discard. An EndTxn might consider that it can commit even if there had
been previous write too old conditions if the NoRefreshSpans flag is
set. The problems is that a CPut that had returned a WriteTooOldError
also evaluated at the wrong read timestamp, and so its evaluation can't
be relied on.

Another issue is that, when the EndTxn code mentioned above considers
that it's safe to commit at a bumped timestamp, it doesn't take into
considerations that the EndTxn's batch might have performed reads (other
than CPuts)  that have been evaluated at a lower timestamp. This can
happen, for example in the following scenario: - a txn sends a Put which
gets bumped by the ts cache - the txn then sends a Scan + EndTxn. The
scan gets evaluated at the original timestamp, but then we commit at a
bumped one because the NoRefreshSpans flag is set.

The patch fixes the bugs by reworking how evaluation takes advantage of
the fact that some requests have flexible timestamps. EndTxn no longer
is in the business of committing at bumped timestamps, and its code is
thus simplified. Instead, the replica's "local retries" loop takes over.
The replica already had code handling non-transactional batches that
evaluated them repeatedly in case of WriteTooOldErrors. This patch
rationalizes and expands this code to deal with transactional batches
too, and with pushes besides WriteTooOldErrors. This reevaluation loop
now handles the cases in which the EndTxn used to bump the commit
timestamp.

The patch also fixes a third bug: the logic evaluateBatch() for
resetting the WriteTooOld state after a successful EndTransaction was
ignoring the STAGING state, meaning that the server would return a
WriteTooOldError even though the transaction was committed. I'm not sure
if this had dramatic consequences or was benign...

Fixes #42849

Release note (bug fix): A bug causing lost update transaction anomalies
was fixed.

43915: cli: warn if trying to [rd]ecommision when already [rd]ecommissioned r=tbg a=knz

Fixes #36624.
First commit from #43908

Release note (cli change): The CLI commands `cockroach node
decommission` and `cockroach node recommission` now produce a warning
on the standard error if one of the node(s) specified is
already (d/r)ecommissioned.

43989: colexec: fix an issue with builtin operators and a minor cleanup r=yuzefovich a=yuzefovich

Flat bytes relies on coldata.Batch.SetLength call to maintain its
invariant. We assume that it is always called before return the batch in
which Bytes vector might have been modified. This was not the case for
default builtin and substring operators, and the calls were added.
Additionally, to be safe, similar calls have been in added in projection
operators.

In a few places where we were setting the length of an internal batch to
0 and then returning it, those were replaced with returning
coldata.ZeroBatch.

Fixes: #43656.

Release note: None

44004: githooks: accept release note category 'security update' r=knz a=knz

Forgot this in #43869 

44008: re-enable: roachprod: Make multiple set [provider]-zones always geo-distribute nodes r=jlinder a=jlinder

This re-enables commit d24e40e which was
reverted in commit 63279f9. It was reverted
due to roachtest automatically passing in a list of zones but only wanting the
first zone to be used (#43898)
which was fixed in f68c6d5 .

Before: if multiple zones were set for a provider and --geo wasn't set, all
hosts would be started in just one zone in one region.

Why change? Because if multiple zones are set, the intention is that they be
used.

Now, --geo and --[provider]-zones work as follows for gcloud, aws and azure:

1. when geo and zones are not set, nodes are all placed in one of the
   default zones
2. when geo is set but zones aren't, nodes are spread evenly across the
   default zones
3. when zones are set, nodes are spread evenly across the specified zones

Fixes #38542.

Release note: None

44016: builtins: miscellaneous fixes for the to_hex builtin r=mjibson a=otan

Resolves #41707.

Release note (sql change, bug fix):
* Added to_hex(string) -> string functionality.
* Previously, `to_hex(-1)` would return `-1` instead of the negative
hex representation (`FFFFFFFFFFFFFFFF`). This has been rectified in
this PR.

44023: roachtest: bump minimum version of the sqlsmith roachtest r=yuzefovich a=rohany

Fixes #43995.

This PR bumps the minimum version of the SQLSmith roachtest
to be v20.1.0.

Release note: None

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: James H. Linder <jamesl@cockroachlabs.com>
Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
@craig
Copy link
Contributor

craig bot commented Jan 15, 2020

Build succeeded

@craig craig bot merged commit 5e5b0c4 into cockroachdb:master Jan 15, 2020
craig bot pushed a commit that referenced this pull request Jan 24, 2020
44350: storage: revert "storage: rationalize server-side refreshes and fix bugs" r=andreimatei a=andreimatei

This reverts commit 1edb0d5.

Revert the main commit from #42969 (storage: rationalize server-side refreshes and fix bugs). 
It seems to be causing consistency problems, as seen in #43110.

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
craig bot pushed a commit that referenced this pull request Jan 25, 2020
44381: storage: revert remaining bits of 42969 r=andreimatei a=andreimatei

Revert everything from #42969 that wasn't reverted by #44350. Even after the partial revert in #44350, it appears the bug causing TPCC to fail (#43110) is still there.

Release note: None

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
@andreimatei andreimatei deleted the txn.reproduce-commit-bug branch January 29, 2020 18:21
craig bot pushed a commit that referenced this pull request Feb 7, 2020
35294: roachprod: add a max-concurrency flag with a default of 32 r=nvanbenschoten a=ajwerner

This PR adds a --max-concurrency flag to roachprod and defaults its value to 32.
In large clusters doing many concurrent SSH operations can lead to unexpected
behavior where the command fails to communicate with the SSH agent and leads to
the user being prompted for their private key passphrase. Adding a limit
prevents this behavior when interacting with a cluster of 256 nodes.

Release note: None

44507: storage: rationalize server-side refreshes and fix bugs r=andreimatei a=andreimatei

This is most #42969, which had been reverted. Different other parts of #42969 have been re-merged separately.

---------

Before this patch, we had several issues due to the server erroneously
considering that it's OK to commit a transaction at a bumped timestamp.

One of the issues was a lost update: a CPut could erroneously succeed
even though there's been a more recent write. This was caused by faulty
code in evaluateBatch() that was thinking that, just because an EndTxn
claimed to have been able to commit a transaction, that means that any
WriteTooOldError encountered previously by the batch was safe to
discard. An EndTxn might consider that it can commit even if there had
been previous write too old conditions if the NoRefreshSpans flag is
set. The problems is that a CPut that had returned a WriteTooOldError
also evaluated at the wrong read timestamp, and so its evaluation can't
be relied on.

Another issue is that, when the EndTxn code mentioned above considers
that it's safe to commit at a bumped timestamp, it doesn't take into
considerations that the EndTxn's batch might have performed reads (other
than CPuts)  that have been evaluated at a lower timestamp. This can
happen, for example in the following scenario: - a txn sends a Put which
gets bumped by the ts cache - the txn then sends a Scan + EndTxn. The
scan gets evaluated at the original timestamp, but then we commit at a
bumped one because the NoRefreshSpans flag is set.

The patch fixes the bugs by reworking how evaluation takes advantage of
the fact that some requests have flexible timestamps. EndTxn no longer
is in the business of committing at bumped timestamps, and its code is
thus simplified. Instead, the replica's "local retries" loop takes over.
The replica already had code handling non-transactional batches that
evaluated them repeatedly in case of WriteTooOldErrors. This patch
rationalizes and expands this code to deal with transactional batches
too, and with pushes besides WriteTooOldErrors. This reevaluation loop
now handles the cases in which the EndTxn used to bump the commit
timestamp.

The patch also fixes a third bug: the logic evaluateBatch() for
resetting the WriteTooOld state after a successful EndTransaction was
ignoring the STAGING state, meaning that the server would return a
WriteTooOldError even though the transaction was committed. I'm not sure
if this had dramatic consequences or was benign...

Fixes #42849

Release note (bug fix): A bug causing lost update transaction anomalies
was fixed.

Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1PC CPuts suffering from lost updates
4 participants