Skip to content

storage: relax assertion about "prevented" parallel commits that succeed#37784

Merged
craig[bot] merged 3 commits intocockroachdb:masterfrom
nvb:nvanbenschoten/fixParCommitAssertion
May 24, 2019
Merged

storage: relax assertion about "prevented" parallel commits that succeed#37784
craig[bot] merged 3 commits intocockroachdb:masterfrom
nvb:nvanbenschoten/fixParCommitAssertion

Conversation

@nvb
Copy link
Copy Markdown
Contributor

@nvb nvb commented May 24, 2019

I was seeing the following assertion fire when running TPC-C:

programming error: found COMMITTED record for prevented implicit commit

This was very concerning until I realized that the assertion was not taking into account how QueryIntent requests interact with resolved writes.

If a transaction recovery process believes it successfully prevented a write that was in-flight while a transaction was performing a parallel commit then we would expect that the transaction record could only be committed if it has a higher epoch or timestamp. This is true if the recovery process did actually prevent the in-flight write.

However, due to QueryIntent's implementation, a successful intent write that was already resolved after the parallel commit finished can be mistaken for a missing in-flight write by a recovery process. This ambiguity is harmless, as the transaction stays committed either way, but it means that we can't be quite as strict about what we assert in RecoverTxn as we would like to be.

The PR starts by modeling intent resolution to demonstrate that it can cause the assertion to fire. It then fixes (read: removes) the assertion in the next commit.

nvb added 3 commits May 24, 2019 00:39
This commit adds intent resolution to the parallel commits spec. It then uses
intent resolution to more accurately model the implementation of QueryIntent
requests. Doing so reveals an overly strict assertion in the `RecoverRecord`
step, which fires immediately.

Release note: None
I was seeing the following assertion fail when running TPC-C:
```
programming error: found COMMITTED record for prevented implicit commit
```

This was very concerning until I realized that the assertion was not taking
into account how QueryIntent requests interact with resolved writes.

If a transaction recovery process believes it successfully prevented a write
that was in-flight while a transaction was performing a parallel commit then we
would expect that the transaction record could only be committed if it has a
higher epoch or timestamp. This is true if the recovery process did actually
prevent the in-flight write.

However, due to QueryIntent's implementation, a successful intent write that was
already resolved after the parallel commit finished can be mistaken for a
missing in-flight write by a recovery process. This ambiguity is harmless, as
the transaction stays committed either way, but it means that we can't be quite
as strict about what we assert in RecoverTxn as we would like to be.

This commit removes the assertion and leaves a comment explaining the ambiguity
in its place.

Release note: None
Code movement and a better comment.

Release note: None
@nvb nvb requested review from a team and ajwerner May 24, 2019 05:01
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat, :lgtm:

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)

@nvb
Copy link
Copy Markdown
Contributor Author

nvb commented May 24, 2019

TFTR.

bors r=ajwerner

craig Bot pushed a commit that referenced this pull request May 24, 2019
37784: storage: relax assertion about "prevented" parallel commits that succeed r=ajwerner a=nvanbenschoten

I was seeing the following assertion fire when running TPC-C:
```
programming error: found COMMITTED record for prevented implicit commit
```

This was very concerning until I realized that the assertion was not taking into account how QueryIntent requests interact with resolved writes.

If a transaction recovery process believes it successfully prevented a write that was in-flight while a transaction was performing a parallel commit then we would expect that the transaction record could only be committed if it has a higher epoch or timestamp. This is true if the recovery process did actually prevent the in-flight write.

However, due to QueryIntent's implementation, a successful intent write that was already resolved after the parallel commit finished can be mistaken for a missing in-flight write by a recovery process. This ambiguity is harmless, as the transaction stays committed either way, but it means that we can't be quite as strict about what we assert in RecoverTxn as we would like to be.

The PR starts by modeling intent resolution to demonstrate that it can cause the assertion to fire. It then fixes (read: removes) the assertion in the next commit.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig
Copy link
Copy Markdown
Contributor

craig Bot commented May 24, 2019

Build succeeded

@craig craig Bot merged commit 086eae5 into cockroachdb:master May 24, 2019
@nvb nvb deleted the nvanbenschoten/fixParCommitAssertion branch June 3, 2019 23:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants