Skip to content

rfc: add an RFC describing an improved client.Txn interface#42766

Open
andreimatei wants to merge 1 commit intocockroachdb:masterfrom
andreimatei:rfc.txn
Open

rfc: add an RFC describing an improved client.Txn interface#42766
andreimatei wants to merge 1 commit intocockroachdb:masterfrom
andreimatei:rfc.txn

Conversation

@andreimatei
Copy link
Copy Markdown
Contributor

... allowing for concurrent use of a txn.

Release note: None

@andreimatei andreimatei requested review from bdarnell, knz, nvb and tbg November 26, 2019 02:15
@andreimatei andreimatei requested a review from a team as a code owner November 26, 2019 02:15
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @knz, @nvanbenschoten, and @tbg)


docs/RFCS/20191125_txn_interface_improvements.md, line 53 at r1 (raw file):

  requesting a refresh
* the `TxnCoordSender` successfully refreshes all the requests prior to *r* and
  *r1* (note that r*'s read spans have not yet been recorded) and bumps the

note that r


docs/RFCS/20191125_txn_interface_improvements.md, line 188 at r1 (raw file):

An alternative to the concurrency part is to simply dissallow concurrency. We
have write pipelining on the server-side, which is already a form of

Write pipelining is a much less effective form of concurrency when the leaseholder is in a remote region.

I think it’s worth looking at different cases where write concurrency on top of pipelining might occur.

Two that come to mind are

  • The implementation of some schema changes

Currently they perform multiple batches to write to different system tables. These tables rarely have a collocated leaseholder in global deployments.

  • CTEs with multiple mutations

These will be executed sequentially IIUC.

In both these cases it might be even better to fuse separate write batches into a single batch than to run batches in parallel (up to some point). However, especially for the first case, fusing batches makes it difficult to craft abstractions which operate on a client.Txn.

I’m not sure I have a concrete proposal re: fusing batches as it feels like a radically different programming model. Enabling parallel writes seems more approachable.

Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for putting these thoughts in writing. I recognize your proposal from a month or two ago. I like the API you propose too, and 💯 on aligning the proposal with the savepoints RFC.

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, @nvanbenschoten, and @tbg)


docs/RFCS/20191125_txn_interface_improvements.md, line 19 at r1 (raw file):

concurrent actors before restarting.

# Motivation

I think some xrefs with the TCS tech note would be useful in this section.


docs/RFCS/20191125_txn_interface_improvements.md, line 128 at r1 (raw file):

### Refreshes

I think the alternatives in this section would benefit from sequence diagrams akin to those in the tech note. The visual sequencing would clarify the prosaic explanation.


docs/RFCS/20191125_txn_interface_improvements.md, line 183 at r1 (raw file):

allow further operations on the handle that encountered the retriable error.
However, this seems likely to confuse users.

The TCS tech note identifies various kinds of errors, not all of which are retry errors. What are the non-retry errors that would keep a txnhandle able to process further requests?


docs/RFCS/20191125_txn_interface_improvements.md, line 187 at r1 (raw file):

## Rationale and Alternatives

An alternative to the concurrency part is to simply dissallow concurrency. We

nit: disallow


docs/RFCS/20191125_txn_interface_improvements.md, line 200 at r1 (raw file):

behavior of concurrent overlapping reads and writes is undefined. Is there a
reason to allow a particular handle to be associated with a different read
sequence number so it can be isolated from concurrent overlapping writes?

I could see the constraint checks for multiple previous CTEs in a single statement run concurrently with each other at different timestamps? This would be an optimization of course, but maybe it's desirable. I'd ask Radu.


docs/RFCS/20191125_txn_interface_improvements.md, line 206 at r1 (raw file):

extant concurrent actors , whereas the point of leaves is to allow txn state to
be disseminated to remote nodes, updated , and collected back. So I don't think
there's much to unify, but perhaps someone can open my eyes.

I'd say that a Leaf is a particular kind of txnhandle, one that can do fewer things. But it probably should implement the interface.


docs/RFCS/20191125_txn_interface_improvements.md, line 208 at r1 (raw file):

there's much to unify, but perhaps someone can open my eyes.
3. Should we take the opportunity and make a `client.Txn` unusable after a
`TransactionAbortedError`? The current code is pretty surprising - the

yes!

Copy link
Copy Markdown
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @tbg)


docs/RFCS/20191125_txn_interface_improvements.md, line 142 at r1 (raw file):

recorded and their response is returned to the client), then we can refresh.
We’d also block new requests from starting. The problem here is that, if one of
these concurrent requests causes the refresh to fail, it’s too to retry that

s/too/too late/


docs/RFCS/20191125_txn_interface_improvements.md, line 144 at r1 (raw file):

these concurrent requests causes the refresh to fail, it’s too to retry that
request at the new read timestamp (since the result was already delivered to the
client). This seems suboptimal, but it’s also analogous the handling of all the

Just to be explicit, the downside here is that we'd return a user-visible retry error, right? Would option 2 or 3 be able to avoid that?

... allowing for concurrent use of a txn.

Release note: None
@bobvawter
Copy link
Copy Markdown
Contributor

bobvawter commented Mar 10, 2020

CLA assistant check
All committers have signed the CLA.

@tbg tbg removed their request for review June 25, 2020 09:51
@tbg tbg added the X-noremind Bots won't notify about PRs with X-noremind label May 6, 2021
@nvb nvb removed their request for review September 27, 2022 19:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

X-noremind Bots won't notify about PRs with X-noremind

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants