# tla-plus: QueryTxn on ambiguous QueryIntent failure during ParallelCommit #37900

## Conversation

Member

### nvanbenschoten commented May 29, 2019

This PR starts by adding write pipelining and concurrent intent resolution by conflicting transactions to the parallel commits TLA+ spec. This causes an assertion to be triggered by the hazard discussed in #37866. The assertion fires when the pre-commit QueryIntent for a pipelined write gets confused about an already resolved intent. This triggers a transaction retry, at which point the transaction record is unexpectedly already committed.

This PR then proposes a medium-term solution to #37866. In doing so, it resolves the model failure from the previous commit.

The medium-term solution is to catch IntentMissingErrors in DistSender's divideAndSendParallelCommit method coming from the pre-commit QueryIntent batch. When we see one of these errors, we immediately send a QueryTxn request to the transaction record. This will result in one of four statuses:
1. PENDING: Unexpected because the parallel commit EndTransactionRequest succeeded. Ignore.
2. STAGING: Unambiguously not the issue from #37866. Ignore.
3. COMMITTED: Unambiguously the issue from #37866. Strip the error and return the updated proto.
4. ABORTED: Still ambiguous. Transform error into an AmbiguousCommitError and return.

This solution isolates the ambiguity caused by the loss of information during intent resolution to just the case where the result of the QueryTxn is ABORTED. This is because an ABORTED record can mean either 1) the transaction was ABORTED and the missing intent was removed or 2) the transaction was COMMITTED, all intents were resolved, and the transaction record was GCed.

This is a significant reduction in the cases where an AmbiguousCommitError will be needed and I suspect it will be able to tide us over until we're able to eliminate the loss of information caused by intent resolution almost entirely (e.g. by storing transaction IDs in committed values). There will still be some loss of information if we're not careful about MVCC GC, and it's still not completely clear to me how we'll need to handle that in every case. That's a discussion for a different time.
 tla-plus: model pipelined writes and concurrent intent resolution 
 74c36e5 
This commit adds write pipelining and concurrent intent resolution
by conflicting transactions to the parallel commits spec. This causes
an assertion to be triggered by the hazard discussed in #37866.

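The assertion fires when the pre-commit QueryIntent for a pipelined
write gets confused about an already resolved intent. This triggers
a transaction retry, at which point the transaction record is
unexpectedly already committed.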

Running the model right now hits this issue. The next commit will
propose a fix.

Release note: None
 tla-plus: QueryTxn on ambiguous QueryIntent failure during ParallelCo… 
 8dc06b6 
…mmit

This commit proposes a medium-term solution to #37866. In doing so, it
resolves the model failure from the previous commit.

The medium-term solution is to catch IntentMissingErrors in DistSender's
divideAndSendParallelCommit method coming from the pre-commit QueryIntent
batch. When we see one of these errors, we immediately send a QueryTxn
request to the transaction record. This will result in one of the four
statuses:
1. PENDING: Unexpected because the parallel commit EndTransactionRequest
succeeded. Ignore.
2. STAGING: Unambiguously not the issue from #37866. Ignore.
3. COMMITTED: Unambiguously the issue from #37866. Strip the error and
return the updated proto.
4. ABORTED: Still ambiguous. Transform error into an AmbiguousCommitError
and return.
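
The four outcomes listed above can be sketched as a single switch. This is a minimal illustration, not the DistSender code: `TxnStatus`, `Action`, `onIntentMissing`, and `errAmbiguousCommit` are hypothetical names, and reading "Ignore" as "leave the original IntentMissingError in place" is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// TxnStatus mirrors the four transaction record statuses named in the list.
type TxnStatus int

const (
	Pending TxnStatus = iota
	Staging
	Committed
	Aborted
)

// Action is a hypothetical name for what the sender does with the
// IntentMissingError once the QueryTxn result is known.
type Action int

const (
	KeepError     Action = iota // "Ignore": assumed to mean the original error stands
	StripError                  // the commit is known to have succeeded
	MarkAmbiguous               // the outcome cannot be determined
)

// errAmbiguousCommit is a stand-in for AmbiguousCommitError.
var errAmbiguousCommit = errors.New("ambiguous commit (stand-in)")

// onIntentMissing maps the QueryTxn status to an action, following the
// four cases in the commit message.
func onIntentMissing(status TxnStatus) (Action, error) {
	switch status {
	case Pending, Staging:
		return KeepError, nil
	case Committed:
		return StripError, nil
	case Aborted:
		return MarkAmbiguous, errAmbiguousCommit
	default:
		return KeepError, fmt.Errorf("unknown status %d", status)
	}
}

func main() {
	for _, s := range []TxnStatus{Pending, Staging, Committed, Aborted} {
		action, err := onIntentMissing(s)
		fmt.Println(s, action, err)
	}
}
```

Only the ABORTED branch returns an error of its own, which matches the claim that the ambiguity is confined to that one case.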

This solution isolates the ambiguity caused by the loss of information during
intent resolution to just the case where the result of the QueryTxn is ABORTED.
This is because an ABORTED record can mean either 1) the transaction was ABORTED
and the missing intent was removed or 2) the transaction was COMMITTED, all intents
were resolved, and the transaction record was GCed.
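
To see why ABORTED stays ambiguous, here is a toy model of the two histories described above. Everything in it is hypothetical (`history`, `Observation`, `observe`), and it assumes, as the paragraph states, that a GCed record is reported as ABORTED by QueryTxn.

```go
package main

import "fmt"

// RecordStatus is the status a QueryTxn would report in this toy model.
type RecordStatus string

const (
	StatusCommitted RecordStatus = "COMMITTED"
	StatusAborted   RecordStatus = "ABORTED"
)

// Observation is everything the committing client can see after the fact.
type Observation struct {
	IntentFound bool
	QueryTxn    RecordStatus
}

// history is a hypothetical ground truth for one intent and one record.
type history struct {
	committed      bool // true outcome, not visible to the client
	intentResolved bool // the intent was resolved or removed by someone else
	recordGCed     bool // the transaction record was garbage collected
}

// observe derives what the client sees. Assumption: a missing (GCed)
// record is indistinguishable from an ABORTED one.
func observe(h history) Observation {
	status := StatusAborted
	if h.committed && !h.recordGCed {
		status = StatusCommitted
	}
	return Observation{IntentFound: !h.intentResolved, QueryTxn: status}
}

func main() {
	aborted := history{committed: false, intentResolved: true}
	committedGCed := history{committed: true, intentResolved: true, recordGCed: true}
	committedKept := history{committed: true, intentResolved: true}

	// Opposite outcomes, identical observations.
	fmt.Println(observe(aborted) == observe(committedGCed))
	// While the record survives, the committed case is distinguishable.
	fmt.Println(observe(committedKept).QueryTxn)
}
```

The first two histories differ in their true outcome yet yield the same observation, which is the loss of information the commit message refers to.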

This is a significant reduction in the cases where an AmbiguousCommitError will
be needed and I suspect it will be able to tide us over until we're able to eliminate
the loss of information caused by intent resolution almost entirely (e.g. by storing
transaction IDs in committed values). There will still be some loss of information if
we're not careful about MVCC GC, and it's still not completely clear to me how we'll
need to handle that in every case.

Release note: None
Member

### cockroach-teamcity commented May 29, 2019

approved these changes
Member

### tbg left a comment

Reviewed 2 of 2 files at r1, 1 of 1 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)

docs/tla-plus/ParallelCommits/ParallelCommits.tla, line 296 at r1 (raw file):

    while to_write /= {} do
      with key \in to_write do
        if ~intent_writes[key].resolved then

Just curious what forced you to add this. You're only mutating intent_writes once in this label, so...?
 tla-plus: model pipelined write failures during async consensus 
 c6b641c 
This commit adds the possibility of async consensus failures when
pipelining writes in the parallel commits spec. This doesn't cause
any new issues, but expands the model to be more accurate.

The commit also documents the complications caused by information
loss during intent resolution.

Release note: None
Member Author

### nvanbenschoten left a comment

TFTR!

bors r+

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)

docs/tla-plus/ParallelCommits/ParallelCommits.tla, line 296 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…
Just curious what forced you to add this. You're only mutating intent_writes once in this label, so...?

At some point I'm going to add multiple resolved states, so checking that the intent is not resolved before resolving it seemed like an appropriate change, since it makes the step more explicit. You're correct that this wasn't necessary.
