
storage: improve high contention performance under deadlock scenarios #16256

Merged

Conversation

spencerkimball (Member)

During contention, a transaction which encounters another transaction's
intents will move to "push" the transaction. This sends the "pusher" into
the pushTxnQueue to wait for the transaction which is blocking it (the
"pushee") to complete.

Previously, the waiting pusher would periodically (using an overly
aggressive backoff / retry loop) query itself in order to determine
whether its priority or status had been updated, as well as whether
any transactions were waiting on the pusher. The set of waiting txns
is transitive, so with sufficient repeated queries, even a large
dependency cycle would be revealed.

This change modifies the backoff / retry loop to instead immediately
send a query which waits at the range containing the pusher's txn
until either the pusher's priority or status changes, or the set of
transactions waiting on it changes. This allows any changes to quickly
propagate, but avoids unnecessary queries if there's nothing to
communicate.
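
For illustration, here is a rough, self-contained Go sketch (toy types and a toy update channel, not the actual storage code) contrasting the old backoff polling with the new blocking query described above:

```go
package main

import (
	"fmt"
	"time"
)

// txnRecord is a toy stand-in for the pusher's transaction record.
type txnRecord struct {
	status      string
	priority    int32
	waitingTxns []string // transitive set of txns waiting on the pusher
}

// pollWithBackoff models the old behavior: repeatedly re-read the record on
// an aggressive backoff loop, whether or not anything has changed.
func pollWithBackoff(read func() txnRecord, changed func(txnRecord) bool) txnRecord {
	backoff := 50 * time.Millisecond
	for {
		if rec := read(); changed(rec) {
			return rec
		}
		time.Sleep(backoff) // wasted latency whenever nothing changed
		backoff *= 2
	}
}

// waitForUpdate models the new behavior: a single query is held at the range
// containing the pusher's txn record and only returns once the status,
// priority, or set of waiting txns changes.
func waitForUpdate(updates <-chan txnRecord) txnRecord {
	return <-updates
}

func main() {
	updates := make(chan txnRecord, 1)
	updates <- txnRecord{status: "PENDING", priority: 7, waitingTxns: []string{"txn-b"}}
	fmt.Println(waitForUpdate(updates))
	_ = pollWithBackoff // kept only to contrast with the old approach
}
```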

@cockroach-teamcity (Member)

This change is Reviewable

@spencerkimball force-pushed the spencerkimball/push-txn-queue-priority-queries branch from 0788c12 to b19c530 on June 1, 2017 00:01
@spencerkimball (Member, Author) commented Jun 1, 2017

Here's the performance improvement from running the ledger example for one minute with contention:

rm -rf cockroach-data; ./cockroach start --insecure --logtostderr

go run $GOPATH/src/github.com/cockroachdb/examples-go/ledger/main.go --concurrency 5 --sources 0 --destinations 1 'postgres://root@localhost:26257?sslmode=disable'

These stats were computed from 60 one-second samples of ledger transaction throughput. Notice that we get a ~50% speedup and a significant reduction in throughput variance.

|        | Original | New    |
|--------|----------|--------|
| Sum    | 5,348    | 7,949  |
| % Diff | 0.00%    | 48.64% |
| Avg    | 87.7     | 130.3  |
| Var    | 354.0    | 129.2  |
| Stddev | 18.8     | 11.4   |

@tamird (Contributor) commented Jun 1, 2017

Didn't get through the "meat" of this, sorry about that. Mostly nits right now, will try to take another pass. Glad to see such an improvement, though!

Oh, and looks like there's a merge conflict.


Reviewed 8 of 9 files at r1.
Review status: 8 of 9 files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


pkg/storage/push_txn_queue.go, line 115 at r1 (raw file):

		if id := push.req.PusherTxn.ID; id != nil {
			set[*id] = struct{}{}
			push.mu.Lock()

don't you need to grab all the locks before iterating through the dependents?


pkg/storage/push_txn_queue.go, line 247 at r1 (raw file):

	if log.V(1) {
		if count := len(ptq.mu.queries[*txn.ID]); count > 0 {
			log.Infof(context.Background(), "returning %d waiting queries for %s", count, txn.ID.Short())

I think this should be TODO()


pkg/storage/push_txn_queue_test.go, line 340 at r1 (raw file):

		respWithErr := <-retCh
		if respWithErr.resp == nil || respWithErr.resp.PusheeTxn.Status != roachpb.COMMITTED {
			t.Errorf("expected committed txn response; got %+v, err=%v", respWithErr.resp, respWithErr.pErr)

isn't this duplicative with the below?


pkg/storage/store.go, line 2632 at r1 (raw file):

	// may cause this request to wait and either return a successful push
	// txn response or else allow this request to proceed.
	if ba.IsSinglePushTxnRequest() {

nit: this method does more harm than good - i'd rather it was just inlined here.


pkg/storage/store.go, line 2640 at r1 (raw file):

// we need to copy the batch request
// in order to modify the push txn request

This comment is incorrect; the copy happens unconditionally.



@bdarnell (Contributor) commented Jun 1, 2017

The commit message says "under deadlock scenarios" - is the improvement limited to just deadlocks or does it help with other contention scenarios too? (the code looks like the latter to me). If it's just deadlocks, I'm not sure this complexity is worth it. There's getting to be a lot of moving parts with fairly complex interactions.


Reviewed 9 of 9 files at r1.
Review status: all files reviewed at latest revision, 11 unresolved discussions, some commit checks failed.


pkg/base/config.go, line 300 at r1 (raw file):

	// estimate of latency.
	return retry.Options{
		InitialBackoff: 50 * time.Millisecond,

Changing the default retry options for the whole system should at least be done in its own commit and probably its own PR (would be nice to do this after the benchmark reporting stuff is in). We also need to stress all the tests for this change to make sure it doesn't cause tests to start missing deadlines.


pkg/storage/push_txn_queue.go, line 38 at r1 (raw file):

)

const maxWaitForQueryTxn = 10 * time.Millisecond

This is awfully short; it hardly seems worth waiting at all if this will be the limit.


pkg/storage/push_txn_queue.go, line 411 at r1 (raw file):

	// status, priority, or dependents (for deadlock detection) have
	// changed.
	var queryPusherCh <-chan *roachpb.Transaction

There should be comments explaining what each of these channels is used for.


pkg/storage/push_txn_queue.go, line 580 at r1 (raw file):

		// pushers waiting on it, use a maximum time to wait for updates
		// before allowing the query to proceed. This prevents us from
		// never discovering that the pusher itself has already been

I think this sentence would be more clear if phrased positively: "This returns control to the pusher periodically so it can bail out if the pusher itself has already been aborted" (right?)

I'm getting confused by the term "pusher" in this file. This method runs on the range that holds the pusher's transaction record, right? But in this comment, we're using "pusher" to refer to the process running on behalf of the pusher transaction on some other range, which is calling QueryTxn. (If the pusher's txn record is here, what could cause an abort there? Is that referring to context cancellation? Isn't that propagated across the RPC?)

Why is the setting of maxWaitCh tied to whether or not there is already a queue on the transaction? Isn't the problem of the pusher being aborted the same in either case?


pkg/storage/push_txn_queue.go, line 638 at r1 (raw file):

	go func() {
		defer func() {
			close(errCh)

Closing the channel (instead of just guaranteeing that exactly one value is written to it) is slightly unorthodox. I see why you're doing it (you may read from the channel once in the select loop and once in the defer), but it feels wrong to me to do it this way. Maybe set the channel to nil after reading from it in the select loop and then only read it in the defer if it is non-nil?
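
For reference, here is a minimal, self-contained sketch (hypothetical names, not the PR's actual code) of the pattern suggested here: write exactly one value, receive it in the select loop and nil out the local channel variable, and let the defer drain it only if it is still non-nil:

```go
package main

import (
	"fmt"
	"time"
)

func run() {
	errCh := make(chan error, 1)
	go func() {
		// Exactly one value is ever written; the channel is never closed.
		errCh <- nil
	}()

	defer func() {
		// Drain only if the select loop below didn't already consume it.
		if errCh != nil {
			<-errCh
		}
	}()

	select {
	case err := <-errCh:
		fmt.Println("result:", err)
		errCh = nil // consumed here, so the deferred drain is skipped
	case <-time.After(time.Second):
		fmt.Println("timed out; the deferred drain will pick up the result")
	}
}

func main() { run() }
```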


pkg/storage/push_txn_queue_test.go, line 538 at r1 (raw file):

			// time to query the missing record and notice.
			if !txnRecordExists {
				time.Sleep(10 * time.Millisecond)

We've generally found 10ms sleeps to be too short to be reliable when tests are run under stress.



@petermattis (Collaborator)

Review status: all files reviewed at latest revision, 12 unresolved discussions, some commit checks failed.


pkg/roachpb/api.proto, line 543 at r1 (raw file):

  optional bool wait_for_update = 3 [(gogoproto.nullable) = false];
  // Set of known dependent transactions.
  repeated bytes known_waiting_txns = 4 [(gogoproto.customtype) = "github.com/cockroachdb/cockroach/pkg/util/uuid.UUID"];

During a rolling upgrade what will the behavior of the system be when an upgraded node sends a QueryTxnRequest with these new fields? This may be answered later in the PR, I'm just driving by.



@petermattis (Collaborator)

Reiterating what @bdarnell said, seems like this PR helps high contention scenarios in general, not just deadlock scenarios, right? If yes, the added complexity seems worth it.


Review status: all files reviewed at latest revision, 12 unresolved discussions, some commit checks failed.



@spencerkimball (Member, Author)

It will help in cases where there is non-trivial contention (i.e., more than two txns in play over the same subset of keys), even if there is no deadlock. This is true if there are retries due to timestamps being advanced for SERIALIZABLE transactions, which is probably the normal case where we would see significant contention.

My $0.02: I think it's worth doing this even if it were just degenerate deadlock cases.


Review status: all files reviewed at latest revision, 12 unresolved discussions, some commit checks failed.



@spencerkimball force-pushed the spencerkimball/push-txn-queue-priority-queries branch from b19c530 to 33d24cd on June 6, 2017 17:58
@spencerkimball (Member, Author)

Review status: all files reviewed at latest revision, 12 unresolved discussions, some commit checks failed.


pkg/base/config.go, line 300 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Changing the default retry options for the whole system should at least be done in its own commit and probably its own PR (would be nice to do this after the benchmark reporting stuff is in). We also need to stress all the tests for this change to make sure it doesn't cause tests to start missing deadlines.

I agree. It was an error to have changed this to 10ms in the first place. That was done peremptorily in a98969d

I sent PR #16357 to correct this separately.


pkg/roachpb/api.proto, line 543 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

During a rolling upgrade what will the behavior of the system be when an upgraded node sends a QueryTxnRequest with these new fields? This may be answered later in the PR, I'm just driving by.

The query will busy loop. I don't see this as worth fixing, as this requires unusually high contention to be a problem in practice, and the window of opportunity is limited to a rolling restart. It'll potentially cause a performance blip.


pkg/storage/push_txn_queue.go, line 38 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This is awfully short; it hardly seems worth waiting at all if this will be the limit.

Changed to 50ms.


pkg/storage/push_txn_queue.go, line 115 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

don't you need to grab all the locks before iterating through the dependents?

There's no need for strict consistency with the set of dependents. We just need to avoid thread safety issues.
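
For illustration, a rough, self-contained sketch (toy types and string IDs; the real code uses transaction UUIDs and its own mutex wrapper) of the point above: each push's lock is taken only while reading that push's dependents, so the collected set may be slightly stale but is always read safely:

```go
package main

import (
	"fmt"
	"sync"
)

// waitingPush is a toy stand-in for a pusher waiting in the queue; its
// dependents set is guarded by its own mutex.
type waitingPush struct {
	id string
	mu struct {
		sync.Mutex
		dependents map[string]struct{}
	}
}

// collectWaitingTxns builds the (possibly slightly stale) union of waiting
// txn IDs, taking each push's lock only while reading that push's
// dependents. Strict consistency across pushes isn't required; repeated
// queries will still converge on the full dependency set.
func collectWaitingTxns(pushes []*waitingPush) map[string]struct{} {
	set := map[string]struct{}{}
	for _, push := range pushes {
		set[push.id] = struct{}{}
		push.mu.Lock()
		for id := range push.mu.dependents {
			set[id] = struct{}{}
		}
		push.mu.Unlock()
	}
	return set
}

func main() {
	a := &waitingPush{id: "txn-a"}
	a.mu.dependents = map[string]struct{}{"txn-c": {}}
	b := &waitingPush{id: "txn-b"}
	fmt.Println(collectWaitingTxns([]*waitingPush{a, b}))
}
```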


pkg/storage/push_txn_queue.go, line 247 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

I think this should be TODO()

Changed the method signature to accept a context and updated the call sites.


pkg/storage/push_txn_queue.go, line 411 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

There should be comments explaining what each of these channels is used for.

Done.


pkg/storage/push_txn_queue.go, line 580 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I think this sentence would be more clear if phrased positively: "This returns control to the pusher periodically so it can bail out if the pusher itself has already been aborted" (right?)

I'm getting confused by the term "pusher" in this file. This method runs on the range that holds the pusher's transaction record, right? But in this comment, we're using "pusher" to refer to the process running on behalf of the pusher transaction on some other range, which is calling QueryTxn. (If the pusher's txn record is here, what could cause an abort there? Is that referring to context cancellation? Isn't that propagated across the RPC?)

Why is the setting of maxWaitCh tied to whether or not there is already a queue on the transaction? Isn't the problem of the pusher being aborted the same in either case?

I've updated the comment for clarity and added a comment to pushTxnQueue as well.

Yes, this method runs on the range that holds the pusher's txn record. I've removed the use of the word "pusher" in the comment and instead refer to the "txns" which might be waiting on the pusher we're trying to query. If the txn we're querying is in the queue, then it's definitely still pending; if it's not in the queue, then it might have been aborted or committed already. So, we ultimately have no option but to eventually bail waiting for an update.

The caller could have been aborted by a HIGH priority concurrent txn, it could have failed to heartbeat from its coordinator, etc.

maxWaitCh is tied to whether there is already a queue because if there is one, we know that the txn we're querying is still pending – so we don't have to have a safety pass through; we can confidently wait for the txn to be updated. The problem lies in the case of there not being a queue – a queue is only created if there's contention on the txn (i.e. someone else is "pushing" it). This is certainly not guaranteed to be true. If we just waited here, without the maxWaitCh timeout, we might wait forever if no further updates to the txn were forthcoming, as would be the case if it were aborted before the call to QueryTxn arrived.
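
For illustration, a minimal, self-contained sketch (hypothetical names; only the 50ms maxWaitForQueryTxn value comes from the discussion above) of that waiting logic: the bounded-wait timer is armed only when there is no push queue, because without a queue there is no proof the queried txn is still pending:

```go
package main

import (
	"fmt"
	"time"
)

// 50ms matches the value maxWaitForQueryTxn was bumped to after review.
const maxWaitForQueryTxn = 50 * time.Millisecond

// waitForTxnUpdate returns when the queried txn's status, priority, or
// dependent set changes. If the txn has no push queue, it cannot be assumed
// to still be pending, so a bounded wait lets the query return and the
// caller re-check the txn record.
func waitForTxnUpdate(hasPushQueue bool, updateCh <-chan string) string {
	var maxWaitCh <-chan time.Time // nil channel: this case never fires
	if !hasPushQueue {
		maxWaitCh = time.After(maxWaitForQueryTxn)
	}
	select {
	case update := <-updateCh:
		return update
	case <-maxWaitCh:
		return "no update within maxWaitForQueryTxn; caller re-checks the record"
	}
}

func main() {
	updateCh := make(chan string, 1)
	// No push queue: the bounded wait fires if nothing changes.
	fmt.Println(waitForTxnUpdate(false, updateCh))
	// With a push queue, the txn is known to be pending, so simply wait for
	// a real update.
	updateCh <- "priority bumped"
	fmt.Println(waitForTxnUpdate(true, updateCh))
}
```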


pkg/storage/push_txn_queue.go, line 638 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Closing the channel (instead of just guaranteeing that exactly one value is written to it) is slightly unorthodox. I see why you're doing it (you may read from the channel once in the select loop and once in the defer), but it feels wrong to me to do it this way. Maybe set the channel to nil after reading from it in the select loop and then only read it in the defer if it is non-nil?

Done.


pkg/storage/push_txn_queue_test.go, line 340 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

isn't this duplicative with the below?

Good point. Removed.


pkg/storage/push_txn_queue_test.go, line 538 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

We've generally found 10ms sleeps to be too short to be reliable when tests are run under stress.

This is expected to be flaky in the opposite way. In other words, it may fail to discover an error if the machine is slow. But since we run these things so often, I expect it would fail many times a day if the condition it's testing were broken. Keeping it "fast" at 10ms is just to avoid unnecessarily adding to test timings.


pkg/storage/store.go, line 2632 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

nit: this method does more harm than good - i'd rather it was just inlined here.

I find this less noisy, and there is already an IsSingleSkipLeaseCheckRequest in roachpb/batch.go.


pkg/storage/store.go, line 2640 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

// we need to copy the batch request
// in order to modify the push txn request

This comment is incorrect; the copy happens unconditionally.

Done.



@spencerkimball force-pushed the spencerkimball/push-txn-queue-priority-queries branch from 33d24cd to 3b1def1 on June 6, 2017 18:00
@tamird (Contributor) commented Jun 6, 2017

Reviewed 3 of 7 files at r2.
Review status: 6 of 10 files reviewed at latest revision, 9 unresolved discussions.


pkg/storage/push_txn_queue.go, line 247 at r1 (raw file):

Previously, spencerkimball (Spencer Kimball) wrote…

Changed the method signature to accept a context and updated the call sites.

you're still using context.Background() rather than the context that was passed in.



@spencerkimball (Member, Author)

Review status: 6 of 10 files reviewed at latest revision, 9 unresolved discussions.


pkg/storage/push_txn_queue.go, line 247 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

you're still using context.Background() rather than the context that was passed in.

Oops. Fixed.



@spencerkimball force-pushed the spencerkimball/push-txn-queue-priority-queries branch from 3b1def1 to b0c9fba on June 6, 2017 18:03
@tamird (Contributor) commented Jun 6, 2017

Reviewed 3 of 7 files at r2.
Review status: 9 of 10 files reviewed at latest revision, 8 unresolved discussions.



@bdarnell (Contributor) commented Jun 6, 2017

LGTM


Reviewed 6 of 7 files at r2, 1 of 1 files at r3.
Review status: all files reviewed at latest revision, 6 unresolved discussions, all commit checks successful.


pkg/storage/push_txn_queue.go, line 580 at r1 (raw file):

Previously, spencerkimball (Spencer Kimball) wrote…

I've updated the comment for clarity and added a comment to pushTxnQueue as well.

Yes, this method runs on the range that holds the pusher's txn record. I've removed the use of the word "pusher" in the comment and instead refer to the "txns" which might be waiting on the pusher we're trying to query. If the txn we're querying is in the queue, then it's definitely still pending; if it's not in the queue, then it might have been aborted or committed already. So, we ultimately have no option but to eventually bail waiting for an update.

The caller could have been aborted by a HIGH priority concurrent txn, it could have failed to heartbeat from its coordinator, etc.

maxWaitCh is tied to whether there is already a queue because if there is one, we know that the txn we're querying is still pending – so we don't have to have a safety pass through; we can confidently wait for the txn to be updated. The problem lies in the case of there not being a queue – a queue is only created if there's contention on the txn (i.e. someone else is "pushing" it). This is certainly not guaranteed to be true. If we just waited here, without the maxWaitCh timeout, we might wait forever if no further updates to the txn were forthcoming, as would be the case if it were aborted before the call to QueryTxn arrived.

Got it. I'm less concerned about the arbitrariness (and shortness) of maxWaitForQueryTxn now.


pkg/storage/push_txn_queue.go, line 646 at r3 (raw file):

	push.mu.Unlock()

	go func() {

We tend to avoid go func() in favor of stopper.RunAsyncTask.
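
For context, a minimal, self-contained sketch (a toy stopper, not the real pkg/util/stop API) of why a tracked async task is preferable to a bare go func(): the stopper can wait for every registered task to drain at shutdown instead of leaking goroutines:

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// stopper is a toy stand-in for the server's task registry.
type stopper struct {
	wg   sync.WaitGroup
	quit chan struct{}
}

func newStopper() *stopper { return &stopper{quit: make(chan struct{})} }

// runAsyncTask registers the goroutine so that stop can wait for it, which a
// bare go func() would not allow.
func (s *stopper) runAsyncTask(ctx context.Context, f func(context.Context)) {
	s.wg.Add(1)
	go func() {
		defer s.wg.Done()
		f(ctx)
	}()
}

// stop signals shutdown and blocks until all registered tasks have finished.
func (s *stopper) stop() {
	close(s.quit)
	s.wg.Wait()
}

func main() {
	s := newStopper()
	s.runAsyncTask(context.Background(), func(ctx context.Context) {
		select {
		case <-s.quit:
			fmt.Println("task observed shutdown and exited cleanly")
		case <-ctx.Done():
		}
	})
	s.stop()
}
```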



@spencerkimball (Member, Author)

Review status: all files reviewed at latest revision, 6 unresolved discussions, all commit checks successful.


pkg/storage/push_txn_queue.go, line 646 at r3 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

We tend to avoid go func() in favor of stopper.RunAsyncTask.

Done.



@spencerkimball force-pushed the spencerkimball/push-txn-queue-priority-queries branch from b0c9fba to 5b4270e on June 7, 2017 14:45
@spencerkimball force-pushed the spencerkimball/push-txn-queue-priority-queries branch from 5b4270e to d5b7980 on June 7, 2017 14:56
@spencerkimball merged commit 79f65ab into master on Jun 7, 2017
@spencerkimball deleted the spencerkimball/push-txn-queue-priority-queries branch on June 7, 2017 16:04