RFC: range-local intent resolution #1873

tbg · 2015-07-30T14:27:40Z

No description provided.

bdarnell · 2015-07-30T18:18:27Z

docs/RPCS/local_intent_resolution.md

+## Disallow cross-coordinator Transactions
+Dealing with multiple coordinators handling a single `Txn` is tricky: It's hard to keep clients from using more than one gateway reliably without additional complexity, but the natural path to take with the suggested changes is to declare all intents which are not contained in the transaction record illegal. This could mean that parts of a client's `Transaction` might get lost upon commit when written through another gateway, unless we take precautions.
+
+An easy way out is adding a sequence number to the `Transaction` object whose single purpose is to be incremented by the client before sending another request in a `Txn`: The gateway aborts the `Transaction` if the sequence number does not match; additionally, a Transaction can only start with sequence number one. The returned error indicates an abort of the `Txn` with immediate retry.


We can disallow cross-coordinator transactions without adding a new field. The first request in a transaction is already special since it does not include a transaction ID. We could move the creation of the heartbeat goroutine from sendOne to maybeBeginTxn and return an error if we receive a txn ID that is not in our txns map.

If the sequence number is useful for other purposes then it may be worth including, but I'd like to avoid it if we can because it removes opportunities for parallelism (with a sequence number the only way to parallelize operations is to include them in the same batch; without a sequence number requests may be sent separately in parallel as long as the client is smart enough to merge all the transaction objects before calling EndTransaction).

Good point. Happy to ditch the sequence number.

bdarnell · 2015-07-30T18:44:32Z

LGTM

mrtracy · 2015-07-30T21:37:10Z

LGTM. This should reduce the mental complexity of our transaction model.

mrtracy · 2015-07-30T21:48:08Z

Actually, I did have a question: won't this make it difficult to use transactions when consuming Cockroach over HTTP, in the case where external traffic goes through a load balancer? If your transaction spans multiple round-trips, I wouldn't think there is a guarantee that a VIP will send the individual requests to the same machine.

I'm hardly an expert on the subject, but that seems like a relatively common configuration.

bdarnell · 2015-07-30T21:52:56Z

Yes, this means that you won't be able to use load balancers that don't enforce stickiness. This may be a good thing, since that configuration would be wasteful anyway (since multiple coordinators would start heartbeat loops).

If we wanted to, we could encode the coordinator's node ID in the proto.Transaction, and forward requests accordingly.

[ci skip]

tbg · 2015-07-31T08:23:14Z

@mrtracy yeah. Whenever that becomes an issue, there are ways out with some added complexity. For example, the client collecting the intents, or what Ben suggested. Heartbeating may need to be revisited when we make those changes, but it'll be doable. Added that to the comment.

tbg · 2015-07-31T20:29:36Z

I finished disallowing coordinator-spanning transactions, but since conflict resolution in its current form actually starts coordinator-spanning Txns all the time, I have to untangle that first before I can post the progress for review.

RFC: range-local intent resolution

a subgoal of cockroachdb#1873 is making sure that a transaction can only write on one coordinator (actually, the RFC proposes limiting reads to a single coordinator as well, but this has proven unnecessary). this change is carried out here. as a slight change from the RFC, we do add an additional flag to the proto.Transaction struct, which contains the information of whether previous requests in the Txn have written data (i.e. laid down intents). not doing that would have mandated giving up on the optimizations for read-only transactions since we would have had to rely on special- casing the first transactional request.

@bdarnell

this passes all tests except for one (which I've disabled; maybe @bdarnell has an idea?). Basically carried out a lot of the work announced in RFC cockroachdb#1873: - coordinator sends intents along with EndTransaction - range resolves all it possibly can in the EndTransaction batch, and hands everything else off to an async resolver. - "skipped intents" passed up by the range_command functions are now executed even if the command returns an error. This is required for EndTransaction which finds out that it's been aborted. There are a lot of TODOs documented in the code, and I intend to address all of those that are relevant for correctness. Some parts of the RFC are also not yet implemented (gc'ing Txn entries), but the road should be very clear and I'll add those before merging. I did want to get this out early because it's a fairly complex refactoring and will only get bigger from here.

tbg added the PTAL label Jul 30, 2015

bdarnell reviewed Jul 30, 2015
View reviewed changes

tbg force-pushed the intent_rfc branch 4 times, most recently from 9e812b9 to c5a5936 Compare July 30, 2015 20:13

tbg force-pushed the intent_rfc branch 3 times, most recently from ad1b029 to d50a6ad Compare July 31, 2015 07:09

RFC: range-local intent resolution

abf0054

[ci skip]

tbg force-pushed the intent_rfc branch from d50a6ad to abf0054 Compare July 31, 2015 08:10

tbg added a commit that referenced this pull request Jul 31, 2015

Merge pull request #1873 from tschottdorf/intent_rfc

9c56b9a

RFC: range-local intent resolution

tbg merged commit 9c56b9a into cockroachdb:master Jul 31, 2015

tbg deleted the intent_rfc branch July 31, 2015 20:29

tbg removed the PTAL label Jul 31, 2015

This was referenced Aug 2, 2015

pin txn writes to one coordinator #1900

Merged

move InternalCommitTrigger.Intents one level up #1902

Merged

tbg mentioned this pull request Aug 3, 2015

range-local intent resolution #1916

Merged

tbg mentioned this pull request Aug 12, 2015

GC of Transaction table #2062

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: range-local intent resolution #1873

RFC: range-local intent resolution #1873

tbg commented Jul 30, 2015

bdarnell Jul 30, 2015

tbg Jul 30, 2015

bdarnell commented Jul 30, 2015

mrtracy commented Jul 30, 2015

mrtracy commented Jul 30, 2015

bdarnell commented Jul 30, 2015

tbg commented Jul 31, 2015

tbg commented Jul 31, 2015

RFC: range-local intent resolution #1873

RFC: range-local intent resolution #1873

Conversation

tbg commented Jul 30, 2015

bdarnell Jul 30, 2015

Choose a reason for hiding this comment

tbg Jul 30, 2015

Choose a reason for hiding this comment

bdarnell commented Jul 30, 2015

mrtracy commented Jul 30, 2015

mrtracy commented Jul 30, 2015

bdarnell commented Jul 30, 2015

tbg commented Jul 31, 2015

tbg commented Jul 31, 2015