storage: don't clear lastTxnMeta on WriteIntentError to different key #32773

nvanbenschoten · 2018-12-03T04:31:18Z

This change removes a faulty optimization in the contentionQueue.
The optimization removed the txnMeta associated with a contended key
in the queue when it found a WriteIntentError from a different key.
It didn't take into account that this error could be from an earlier
request within the same batch, meaning that we can't make any assumptions
about the state of the previously contended intent simply because we
see a different WriteIntentError.

Release note (bug fix): Fix a bug where metadata about contended keys
was inadvertently ignored, allowing for a failure in txn cycle detection
and transaction deadlocks in rare cases.

cockroach-teamcity · 2018-12-03T04:31:26Z

This change is

tbg

Good catch. You're going to be adding better unit testing and such, right?

Reviewable status: complete! 0 of 0 LGTMs obtained

pkg/storage/intent_resolver.go, line 349 at r1 (raw file):

			// contended key (i.e. newWIErr != nil).
			contended.setLastTxnMeta(nil)
		}

Doesn't there need to be a case where we set it to nil or does it not matter?

I'm also confused generally by the semantics of setLastTxnMeta. For example, there's no synchronization between the callers of add, so the following can happen:

read1 runs into intent of txn1, almost calls add but gets preempted
txn1 resolves, txn2 writes another intent
read2 runs into intent of txn2, calls add and thus setLastMeta(txn2)
read1 continues and clobbers by calling setLastMeta(txn1)

I would likely have more questions if I really dug into the code. While you're here, I'd appreciate a round of comments on how it all works.

bdarnell

Reviewable status: complete! 0 of 0 LGTMs obtained

pkg/storage/intent_resolver.go, line 349 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Doesn't there need to be a case where we set it to nil or does it not matter?

I'm also confused generally by the semantics of setLastTxnMeta. For example, there's no synchronization between the callers of add, so the following can happen:

read1 runs into intent of txn1, almost calls add but gets preempted

txn1 resolves, txn2 writes another intent

read2 runs into intent of txn2, calls add and thus setLastMeta(txn2)

read1 continues and clobbers by calling setLastMeta(txn1)

I would likely have more questions if I really dug into the code. While you're here, I'd appreciate a round of comments on how it all works.

Nil is a dangerous value here, since it doesn't participate in cycle detection. I don't think it's ever required to set it to nil. Instead, we leave it set to the last-known transaction, so that if that transaction is pending we'll set up the cycle-detection loop, and if it's not pending we'll break out of the contention queue and retry.

I don't think that race would be a problem as described: read1 would try to push txn1, see that it's no longer pending, and be able to retry. But I'm not sure if all instances of that race would be so benign. I wouldn't be surprised if there were cases in which this could break a necessary link in the waiting graph.

nvanbenschoten · 2018-12-04T07:05:56Z

This took way longer than expected, but I finally got a reliable reproduction of the failure in a test. It doesn't reproduce quite as reliably now that I ripped out the randomized sleeps that helped guide it in the right direction, but it still fails under stress without the fix.

I wasn't able to create a sceneraio that reproduced the issue with fewer than 5 unique txns. Here are the steps:

1.  txn1 writes to keyA.
2.  txn2 writes to keyB.
3.  txn4 writes to keyC and keyB in the same batch. The batch initially
    fails with a WriteIntentError on keyA. It enters the contentionQueue
    and becomes the front of a contendedKey list.
4.  txn5 writes to keyB. It enters the contentionQueue behind txn4.
5.  txn3 writes to keyC.
6.  txn2 is committed and the intent on keyB is resolved.
7.  txn4 exits the contentionQueue and re-evaluates. This time, it hits
    a WriteIntentError on the first request in its batch: keyC. HOWEVER,
    before it updates the contentionQueue, steps 8-10 occur.
8.  txn3 writes to keyB. It never enters the contentionQueue. txn3 then
    writes to keyA in a separate batch and gets stuck waiting on txn1.
9.  txn2 writes to keyB. It observes txn3's intent and informs the
    contentionQueue of the new txn upon its entrance.
10. txn5, the new front of the contendedKey list, pushes txn2 and gets
    stuck in the txnWaitQueue.
11. txn4 finally updates the contentionQueue. A bug previously existed
    where it would set the contendedKey's lastTxnMeta to nil because it
    saw a WriteIntentError for a different key.
12. txn1 notices the nil lastTxnMeta and does not push txn2. This prevents
    cycle detection from succeeding and we observe a deadlock.

There are probably a few simplifications that we could make here. The tricky part is ensuring that nothing corrected the lastTxnMeta once we get it to nil.

Nil is a dangerous value here, since it doesn't participate in cycle detection. I don't think it's ever required to set it to nil. Instead, we leave it set to the last-known transaction, so that if that transaction is pending we'll set up the cycle-detection loop, and if it's not pending we'll break out of the contention queue and retry.

I agree. The change rips out all code paths that allow it.

Doesn't there need to be a case where we set it to nil or does it not matter?

We never need to set it to nil. It's always safe to allow the push and go on from there.

I don't think that race would be a problem as described: read1 would try to push txn1, see that it's no longer pending, and be able to retry. But I'm not sure if all instances of that race would be so benign. I wouldn't be surprised if there were cases in which this could break a necessary link in the waiting graph.

I don't think pushing will ever break links in the waiting graph. As long as we don't stop pushing and we continue handling cycles correctly in the txnWaitQueue then we'll eventually converge on the correct depedency graph.

nvanbenschoten · 2018-12-04T16:18:14Z

I opened #32814 to create a test that regularly generates these kinds of scenarios.

Fixes cockroachdb#32582. This change removes a faulty optimization in the `contentionQueue`. The optimization removed the txnMeta associated with a contended key in the queue when it found a `WriteIntentError` from a different key. It didn't take into account that this error could be from an earlier request within the same batch, meaning that we can't make any assumptions about the state of the previously contended intent simply because we see a different `WriteIntentError`. Release note (bug fix): Fix a bug where metadata about contended keys was inadvertently ignored, allowing for a failure in txn cycle detection and transaction deadlocks in rare cases.

bdarnell

Reviewed 2 of 2 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained

nvanbenschoten · 2018-12-05T14:47:45Z

TFTR!

bors r+

32773: storage: don't clear lastTxnMeta on WriteIntentError to different key r=nvanbenschoten a=nvanbenschoten Fixes #32582. This change removes a faulty optimization in the `contentionQueue`. The optimization removed the txnMeta associated with a contended key in the queue when it found a `WriteIntentError` from a different key. It didn't take into account that this error could be from an earlier request within the same batch, meaning that we can't make any assumptions about the state of the previously contended intent simply because we see a different `WriteIntentError`. Release note (bug fix): Fix a bug where metadata about contended keys was inadvertently ignored, allowing for a failure in txn cycle detection and transaction deadlocks in rare cases. Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>

craig · 2018-12-05T15:05:24Z

Build succeeded

GitHub CI (Cockroach)

nvanbenschoten requested review from bdarnell, spencerkimball and a team December 3, 2018 04:31

tbg approved these changes Dec 3, 2018

View reviewed changes

bdarnell mentioned this pull request Dec 3, 2018

production: txn contention caused qps stall on adriatic #32582

Closed

bdarnell reviewed Dec 3, 2018

View reviewed changes

nvanbenschoten force-pushed the nvanbenschoten/noWIError branch from f89676b to 8311e97 Compare December 4, 2018 07:00

nvanbenschoten changed the title ~~[DNM] storage: don't clear lastTxnMeta on WriteIntentError to different key~~ storage: don't clear lastTxnMeta on WriteIntentError to different key Dec 4, 2018

nvanbenschoten force-pushed the nvanbenschoten/noWIError branch from 8311e97 to 3793bfa Compare December 4, 2018 16:16

nvanbenschoten force-pushed the nvanbenschoten/noWIError branch from 3793bfa to 1252ac7 Compare December 4, 2018 18:46

bdarnell approved these changes Dec 5, 2018

View reviewed changes

craig bot merged commit 1252ac7 into cockroachdb:master Dec 5, 2018

nvanbenschoten mentioned this pull request Dec 5, 2018

release-2.1: storage: don't clear lastTxnMeta on WriteIntentError to different key #32853

Merged

nvanbenschoten deleted the nvanbenschoten/noWIError branch December 5, 2018 20:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

storage: don't clear lastTxnMeta on WriteIntentError to different key #32773

storage: don't clear lastTxnMeta on WriteIntentError to different key #32773

nvanbenschoten commented Dec 3, 2018 •

edited

Loading

cockroach-teamcity commented Dec 3, 2018

tbg left a comment

bdarnell left a comment

nvanbenschoten commented Dec 4, 2018

nvanbenschoten commented Dec 4, 2018

bdarnell left a comment

nvanbenschoten commented Dec 5, 2018

craig bot commented Dec 5, 2018

storage: don't clear lastTxnMeta on WriteIntentError to different key #32773

storage: don't clear lastTxnMeta on WriteIntentError to different key #32773

Conversation

nvanbenschoten commented Dec 3, 2018 • edited Loading

cockroach-teamcity commented Dec 3, 2018

tbg left a comment

Choose a reason for hiding this comment

bdarnell left a comment

Choose a reason for hiding this comment

nvanbenschoten commented Dec 4, 2018

nvanbenschoten commented Dec 4, 2018

bdarnell left a comment

Choose a reason for hiding this comment

nvanbenschoten commented Dec 5, 2018

craig bot commented Dec 5, 2018

Build succeeded

nvanbenschoten commented Dec 3, 2018 •

edited

Loading