kv: don't disable the merge queue needlessly in more tests #46431

nvanbenschoten · 2020-03-23T20:14:29Z

Follow-up to #46383.

These tests were disabling the merge queue so that it would not
interfere with their AdminSplits, but since the tests were written,
AdminSplit has gained a TTL.

Release note: None
Release justification: test only


cockroach-teamcity · 2020-03-23T20:17:15Z

This change is Reviewable.

andreimatei

LGTM

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)

nvanbenschoten · 2020-03-23T22:16:03Z

bors r+

craig · 2020-03-23T22:42:56Z

Build failed (retrying...)

GitHub CI (Cockroach)

craig · 2020-03-24T00:00:36Z

Build succeeded

GitHub CI (Cockroach)


46596: kvclient/kvcoord: inhibit parallel commit when retrying EndTxn request r=andreimatei a=andreimatei

The scenario that this patch addresses is the following (from #46431):

1. txn1 sends Put(a) + Put(b) + EndTxn.
2. DistSender splits the Put(a) from the rest.
3. Put(a) succeeds, but the rest catches some retriable error.
4. TxnCoordSender gets the retriable error. The fact that a sub-batch succeeded is lost. We used to care about that fact, but we've successively gotten rid of that tracking across #35140 and #44661.
5. We refresh everything that came before this batch. The refresh succeeds.
6. We re-send the batch. It gets split again. The part with the EndTxn executes first. The transaction is now STAGING. More than that, the txn is in fact implicitly committed: the intent on a is already there from the previous attempt and, because it's at a lower timestamp than the txn record, it counts as golden for the purposes of verifying the implicit commit condition.
7. Some other transaction wanders in, sees that txn1 is in its way, and transitions it to explicitly committed.
8. The Put(a) now tries to evaluate. It gets really confused. I guess different things can happen, none of them good. One thing that I believe we've observed in #46299 is that, if there's another txn's intent there already, the Put will try to push it, enter the txnWaitQueue, eventually observe that its own txn is committed, and return an error. The client thus gets an error (and a non-ambiguous one to boot) although the txn is committed. Even worse, perhaps, I think it's possible for a request to return wrong results instead of an error.

This patch fixes it by inhibiting the parallel commit when the EndTxn batch is retried. This way, there's never a STAGING record.

Release note (bug fix): A rare bug causing errors to be returned for successfully committed transactions was fixed. The most common error message was "TransactionStatusError: already committed".

Release justification: serious bug fix

Fixes #46341

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
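The race in steps 6 through 8, and the fix, can be sketched as a toy model. Everything here is hypothetical and simplified: `txnRecord`, `implicitlyCommitted`, `evalEndTxn`, and `allowParallelCommit` are illustrative names rather than kvcoord's code, and timestamps are plain integers. The model shows why a leftover intent from the first attempt lets a STAGING record satisfy the implicit commit condition before the re-sent Put(a) has evaluated, and how never writing a STAGING record on a retry closes that window.

```go
package main

import "fmt"

type status int

const (
	pending status = iota
	staging
	committed
)

// txnRecord is a hypothetical, simplified transaction record.
type txnRecord struct {
	status         status
	ts             int      // record timestamp (integer ticks, for illustration)
	inFlightWrites []string // keys whose intents must exist for an implicit commit
}

// implicitlyCommitted is a toy version of the implicit commit condition: the
// record is STAGING and every in-flight write has an intent at or below the
// record's timestamp. A stale intent from an earlier attempt also qualifies.
func implicitlyCommitted(rec txnRecord, intents map[string]int) bool {
	if rec.status != staging {
		return false
	}
	for _, k := range rec.inFlightWrites {
		its, ok := intents[k]
		if !ok || its > rec.ts {
			return false
		}
	}
	return true
}

// evalEndTxn models evaluating the EndTxn. With a parallel commit the record
// is written as STAGING alongside the writes. Without one, the EndTxn is
// assumed to run only after the writes succeed, so the record goes straight
// to COMMITTED and is never STAGING.
func evalEndTxn(writes []string, ts int, parallelCommit bool) txnRecord {
	if parallelCommit {
		return txnRecord{status: staging, ts: ts, inFlightWrites: writes}
	}
	return txnRecord{status: committed, ts: ts}
}

// allowParallelCommit models the patch's rule as described: a batch carrying
// EndTxn may use a parallel commit on its first attempt, never on a retry.
func allowParallelCommit(isRetry bool) bool { return !isRetry }

func main() {
	keys := []string{"a", "b"}
	// Attempt 1 at ts=1: Put(a) succeeded; Put(b)+EndTxn hit a retriable error.
	intents := map[string]int{"a": 1}

	// Attempt 2 at ts=2 after a refresh, old behavior: the EndTxn sub-batch
	// (with Put(b)) evaluates before the re-sent Put(a).
	intents["b"] = 2
	old := evalEndTxn(keys, 2, true)
	fmt.Println("old: implicitly committed before Put(a) re-evaluates:",
		implicitlyCommitted(old, intents)) // true

	// New behavior: on retry the parallel commit is inhibited.
	fixed := evalEndTxn(keys, 2, allowParallelCommit(true))
	fmt.Println("new: record is STAGING:", fixed.status == staging) // false
}
```

Dropping the parallel commit only on the retry path keeps the one-round-trip commit for the common first attempt and gives up that latency win only in the rare case where the STAGING state could be misread.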

kv: don't disable the merge queue needlessly in more tests

623574d


nvanbenschoten requested a review from andreimatei March 23, 2020 20:14

andreimatei reviewed Mar 23, 2020


craig bot merged commit 21e6fa6 into cockroachdb:master Mar 24, 2020

andreimatei mentioned this pull request Mar 25, 2020

kvclient/kvcoord: inhibit parallel commit when retrying EndTxn request #46596

Merged

nvanbenschoten deleted the nvanbenschoten/mergeQueue branch March 30, 2020 19:22

andreimatei mentioned this pull request Apr 1, 2020

release-20.1: kvclient/kvcoord: inhibit parallel commit when retrying EndTxn request #46848

Merged
