Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: don't clear lastTxnMeta on WriteIntentError to different key #32773

Merged
merged 1 commit into from
Dec 5, 2018

Conversation

nvanbenschoten
Copy link
Member

@nvanbenschoten nvanbenschoten commented Dec 3, 2018

Fixes #32582.

This change removes a faulty optimization in the contentionQueue.
The optimization removed the txnMeta associated with a contended key
in the queue when it found a WriteIntentError from a different key.
It didn't take into account that this error could be from an earlier
request within the same batch, meaning that we can't make any assumptions
about the state of the previously contended intent simply because we
see a different WriteIntentError.

Release note (bug fix): Fix a bug where metadata about contended keys
was inadvertently ignored, allowing for a failure in txn cycle detection
and transaction deadlocks in rare cases.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. You're going to be adding better unit testing and such, right?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/storage/intent_resolver.go, line 349 at r1 (raw file):

			// contended key (i.e. newWIErr != nil).
			contended.setLastTxnMeta(nil)
		}

Doesn't there need to be a case where we set it to nil or does it not matter?

I'm also confused generally by the semantics of setLastTxnMeta. For example, there's no synchronization between the callers of add, so the following can happen:

  • read1 runs into intent of txn1, almost calls add but gets preempted
  • txn1 resolves, txn2 writes another intent
  • read2 runs into intent of txn2, calls add and thus setLastMeta(txn2)
  • read1 continues and clobbers by calling setLastMeta(txn1)

I would likely have more questions if I really dug into the code. While you're here, I'd appreciate a round of comments on how it all works.

Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/storage/intent_resolver.go, line 349 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Doesn't there need to be a case where we set it to nil or does it not matter?

I'm also confused generally by the semantics of setLastTxnMeta. For example, there's no synchronization between the callers of add, so the following can happen:

  • read1 runs into intent of txn1, almost calls add but gets preempted
  • txn1 resolves, txn2 writes another intent
  • read2 runs into intent of txn2, calls add and thus setLastMeta(txn2)
  • read1 continues and clobbers by calling setLastMeta(txn1)

I would likely have more questions if I really dug into the code. While you're here, I'd appreciate a round of comments on how it all works.

Nil is a dangerous value here, since it doesn't participate in cycle detection. I don't think it's ever required to set it to nil. Instead, we leave it set to the last-known transaction, so that if that transaction is pending we'll set up the cycle-detection loop, and if it's not pending we'll break out of the contention queue and retry.

I don't think that race would be a problem as described: read1 would try to push txn1, see that it's no longer pending, and be able to retry. But I'm not sure if all instances of that race would be so benign. I wouldn't be surprised if there were cases in which this could break a necessary link in the waiting graph.

@nvanbenschoten
Copy link
Member Author

This took way longer than expected, but I finally got a reliable reproduction of the failure in a test. It doesn't reproduce quite as reliably now that I ripped out the randomized sleeps that helped guide it in the right direction, but it still fails under stress without the fix.

I wasn't able to create a sceneraio that reproduced the issue with fewer than 5 unique txns. Here are the steps:

1.  txn1 writes to keyA.
2.  txn2 writes to keyB.
3.  txn4 writes to keyC and keyB in the same batch. The batch initially
    fails with a WriteIntentError on keyA. It enters the contentionQueue
    and becomes the front of a contendedKey list.
4.  txn5 writes to keyB. It enters the contentionQueue behind txn4.
5.  txn3 writes to keyC.
6.  txn2 is committed and the intent on keyB is resolved.
7.  txn4 exits the contentionQueue and re-evaluates. This time, it hits
    a WriteIntentError on the first request in its batch: keyC. HOWEVER,
    before it updates the contentionQueue, steps 8-10 occur.
8.  txn3 writes to keyB. It never enters the contentionQueue. txn3 then
    writes to keyA in a separate batch and gets stuck waiting on txn1.
9.  txn2 writes to keyB. It observes txn3's intent and informs the
    contentionQueue of the new txn upon its entrance.
10. txn5, the new front of the contendedKey list, pushes txn2 and gets
    stuck in the txnWaitQueue.
11. txn4 finally updates the contentionQueue. A bug previously existed
    where it would set the contendedKey's lastTxnMeta to nil because it
    saw a WriteIntentError for a different key.
12. txn1 notices the nil lastTxnMeta and does not push txn2. This prevents
    cycle detection from succeeding and we observe a deadlock.

There are probably a few simplifications that we could make here. The tricky part is ensuring that nothing corrected the lastTxnMeta once we get it to nil.

Nil is a dangerous value here, since it doesn't participate in cycle detection. I don't think it's ever required to set it to nil. Instead, we leave it set to the last-known transaction, so that if that transaction is pending we'll set up the cycle-detection loop, and if it's not pending we'll break out of the contention queue and retry.

I agree. The change rips out all code paths that allow it.

Doesn't there need to be a case where we set it to nil or does it not matter?

We never need to set it to nil. It's always safe to allow the push and go on from there.

I don't think that race would be a problem as described: read1 would try to push txn1, see that it's no longer pending, and be able to retry. But I'm not sure if all instances of that race would be so benign. I wouldn't be surprised if there were cases in which this could break a necessary link in the waiting graph.

I don't think pushing will ever break links in the waiting graph. As long as we don't stop pushing and we continue handling cycles correctly in the txnWaitQueue then we'll eventually converge on the correct depedency graph.

@nvanbenschoten nvanbenschoten changed the title [DNM] storage: don't clear lastTxnMeta on WriteIntentError to different key storage: don't clear lastTxnMeta on WriteIntentError to different key Dec 4, 2018
@nvanbenschoten
Copy link
Member Author

I opened #32814 to create a test that regularly generates these kinds of scenarios.

Fixes cockroachdb#32582.

This change removes a faulty optimization in the `contentionQueue`.
The optimization removed the txnMeta associated with a contended key
in the queue when it found a `WriteIntentError` from a different key.
It didn't take into account that this error could be from an earlier
request within the same batch, meaning that we can't make any assumptions
about the state of the previously contended intent simply because we
see a different `WriteIntentError`.

Release note (bug fix): Fix a bug where metadata about contended keys
was inadvertently ignored, allowing for a failure in txn cycle detection
and transaction deadlocks in rare cases.
Copy link
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@nvanbenschoten
Copy link
Member Author

TFTR!

bors r+

craig bot pushed a commit that referenced this pull request Dec 5, 2018
32773: storage: don't clear lastTxnMeta on WriteIntentError to different key r=nvanbenschoten a=nvanbenschoten

Fixes #32582.

This change removes a faulty optimization in the `contentionQueue`.
The optimization removed the txnMeta associated with a contended key
in the queue when it found a `WriteIntentError` from a different key.
It didn't take into account that this error could be from an earlier
request within the same batch, meaning that we can't make any assumptions
about the state of the previously contended intent simply because we
see a different `WriteIntentError`.

Release note (bug fix): Fix a bug where metadata about contended keys
was inadvertently ignored, allowing for a failure in txn cycle detection
and transaction deadlocks in rare cases.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig
Copy link
Contributor

craig bot commented Dec 5, 2018

Build succeeded

@craig craig bot merged commit 1252ac7 into cockroachdb:master Dec 5, 2018
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/noWIError branch December 5, 2018 20:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

production: txn contention caused qps stall on adriatic
4 participants