storage: don't clear lastTxnMeta on WriteIntentError to different key #32773
This change removes a faulty optimization in the
Release note (bug fix): Fix a bug where metadata about contended keys
left a comment
Good catch. You're going to be adding better unit testing and such, right?
Doesn't there need to be a case where we set it to nil or does it not matter?
I'm also confused generally by the semantics of
I would likely have more questions if I really dug into the code. While you're here, I'd appreciate a round of comments on how it all works.
referenced this pull request
Dec 3, 2018
left a comment
Previously, tbg (Tobias Grieger) wrote…
Nil is a dangerous value here, since it doesn't participate in cycle detection. I don't think it's ever required to set it to nil. Instead, we leave it set to the last-known transaction, so that if that transaction is pending we'll set up the cycle-detection loop, and if it's not pending we'll break out of the contention queue and retry.
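To make the rule above concrete, here is a rough sketch of the decision a waiter makes from the last-known transaction. This is not the actual storage code: `TxnMeta`, `TxnStatus`, `Action`, and `nextAction` are hypothetical names made up for illustration. The metadata is passed by value to underline that a nil value is never a valid input.

```go
package main

import "fmt"

// TxnStatus is a hypothetical stand-in for a transaction's status.
type TxnStatus int

const (
	Pending TxnStatus = iota
	Committed
	Aborted
)

// TxnMeta is a hypothetical, simplified stand-in for the last-known
// conflicting transaction's metadata.
type TxnMeta struct {
	ID     string
	Status TxnStatus
}

// Action is what a waiter in the contention queue does next.
type Action int

const (
	WaitAndPush Action = iota // keep pushing, so the waiter takes part in cycle detection
	Retry                     // leave the contention queue and retry the request
)

// nextAction picks the waiter's next step from the last-known txn.
// There is deliberately no nil case: the last-known txn is always kept.
func nextAction(lastTxn TxnMeta) Action {
	if lastTxn.Status == Pending {
		return WaitAndPush
	}
	return Retry
}

func main() {
	fmt.Println(nextAction(TxnMeta{ID: "txn1", Status: Pending}) == WaitAndPush)
	fmt.Println(nextAction(TxnMeta{ID: "txn1", Status: Committed}) == Retry)
}
```

Under these assumptions, a stale entry costs at most one extra push before the waiter retries.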
I don't think that race would be a problem as described: read1 would try to push txn1, see that it's no longer pending, and be able to retry. But I'm not sure if all instances of that race would be so benign. I wouldn't be surprised if there were cases in which this could break a necessary link in the waiting graph.
This took way longer than expected, but I finally got a reliable reproduction of the failure in a test. It doesn't reproduce quite as reliably now that I ripped out the randomized sleeps that helped guide it in the right direction, but it still fails under stress without the fix.
I wasn't able to create a scenario that reproduced the issue with fewer than 5 unique txns. Here are the steps:
There are probably a few simplifications that we could make here. The tricky part is ensuring that nothing corrected the
I agree. The change rips out all code paths that allow it.
We never need to set it to nil. It's always safe to allow the push and go on from there.
I don't think pushing will ever break links in the waiting graph. As long as we don't stop pushing and we continue handling cycles correctly in the txnWaitQueue, then we'll eventually converge on the correct dependency graph.
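For readers less familiar with the waiting graph, here is a minimal sketch of the idea, assuming a simple in-memory model. It is not how the txnWaitQueue detects deadlocks; the `waitsFor` type and `hasCycle` function are hypothetical and only show why every pusher needs an edge in the graph for a cycle to be found.

```go
package main

import "fmt"

// waitsFor is a hypothetical model of the waiting graph: each txn ID maps
// to the txn IDs it is currently pushing (waiting on).
type waitsFor map[string][]string

// hasCycle reports whether a cycle is reachable from start, using a
// depth-first search with three node states.
func hasCycle(g waitsFor, start string) bool {
	const (
		unvisited = iota // zero value returned for IDs not yet in state
		inProgress
		done
	)
	state := map[string]int{}
	var visit func(n string) bool
	visit = func(n string) bool {
		switch state[n] {
		case inProgress:
			return true // reached a txn that is still on the current path
		case done:
			return false
		}
		state[n] = inProgress
		for _, m := range g[n] {
			if visit(m) {
				return true
			}
		}
		state[n] = done
		return false
	}
	return visit(start)
}

func main() {
	g := waitsFor{"txn1": {"txn2"}, "txn2": {"txn3"}}
	fmt.Println(hasCycle(g, "txn1")) // prints "false"

	// txn3 starts pushing txn1, which closes the loop.
	g["txn3"] = []string{"txn1"}
	fmt.Println(hasCycle(g, "txn1")) // prints "true"
}
```

In this model, dropping the `txn3` edge (the analogue of a waiter that stops pushing) hides the cycle, which is why continuing to push matters.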