Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

range-local intent resolution #1916

Merged
merged 4 commits into from
Aug 6, 2015
Merged

range-local intent resolution #1916

merged 4 commits into from
Aug 6, 2015

Conversation

tbg
Copy link
Member

@tbg tbg commented Aug 3, 2015

this passes all tests except for one (which I've disabled; maybe
@bdarnell has an idea?). Basically carried out a lot of the work announced in
RFC #1873:

  • coordinator sends intents along with EndTransaction
  • range resolves all it possibly can in the EndTransaction batch, and hands
    everything else off to an async resolver.
  • "skipped intents" passed up by the range_command functions are now executed
    even if the command returns an error. This is required for EndTransaction
    which finds out that it's been aborted.

I addressed all of the TODOs I wanted to deal with before merging. The
transaction record cleanup mechanism isn't there yet: it's the natural candidate
for the next PR. I think this is ready to merge after adding some more tests,
so it's ready for a closer look.

@tbg tbg added the PTAL label Aug 3, 2015
@tbg tbg force-pushed the intents_range branch 2 times, most recently from 035e7b0 to 861eda1 Compare August 3, 2015 20:25
// IsPrev is a more efficient version of k.Next().Equal(m).
func (k Key) IsPrev(m Key) bool {
l := len(m) - 1
return l == len(k) && m[l] == 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't enough. We still need to check that k and m[:l] are equal (consider foo and bar\x00).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

geez. Good point.

@bdarnell
Copy link
Contributor

bdarnell commented Aug 3, 2015

LGTM so far.

@mrtracy
Copy link
Contributor

mrtracy commented Aug 3, 2015

Very well commented; everything appears to do what it claims, by my eye. LGTM

// need-to-resolve-some-intents-or-a-split-may-never-finish thing
// going on on during node shutdown.
if !r.ContainsKey(intent.Key) {
b := &client.Batch{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this using a separate batch for each intent?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, see the comment.

@tbg tbg force-pushed the intents_range branch 5 times, most recently from f837200 to 5caed68 Compare August 4, 2015 07:46
// TODO(tschottdorf): Also no support for key-range intents yet. We
// need to support those because the EndTransaction path will throw
// them our way. And we need a test that does exactly that.
// TODO(tschottdorf): currently synchronous. See if we still have that
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping @tamird re: comment below

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even if it's synchronous, it needs to be tied to the stopper. Is this being invoked inside a RunWorker?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not important now. This is provisional code until the TODOs above are addressed. We'll want to do this async eventually.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're asking for a deadlock in a test if you don't RunTask here, but ok

@tbg tbg force-pushed the intents_range branch 5 times, most recently from c891ce6 to e3bc0ef Compare August 4, 2015 12:54
@tbg tbg changed the title WIP: range-local intent resolution range-local intent resolution Aug 4, 2015
@tbg tbg force-pushed the intents_range branch 2 times, most recently from 39db2aa to d1525f6 Compare August 5, 2015 17:55
@tbg
Copy link
Member Author

tbg commented Aug 5, 2015

PTAL, just rebased and added another test for the intent resolution.
The test is still skipped (or should it be fixed now? @bdarnell), I'll open an issue to fix it up if we don't have one yet.

@bdarnell
Copy link
Contributor

bdarnell commented Aug 5, 2015

The test still needs to be fixed; go ahead and open another issue.

aHeader.User = bHeader.User
aHeader.UserPriority = bHeader.UserPriority
aHeader.Txn = bHeader.Txn
// Only allow individual transactions in a batch if
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/individual transactions/transactions on the individual requests of a batch/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From discussion with @spencerkimball: It's weird that ResolveIntent even uses the header transaction, since it is not really a part of that transaction. Instead of allowing requests from multiple transactions to appear in the same batch, it's probably better to just add a Transaction field to ResolveIntentRequest so it doesn't use the header transaction at all (and then we can preserve the invariant that all requests in a batch have the same transaction)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. I had the same thoughts about header usage by Resolve and Push (and even carried it out in unpushed code) but decided it's worth its own PR. I'll file an issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

continue
}

// If it is local, it goes directly into Raft.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much does this split save us? If it's going through raft it already involves non-local communication; it would be simpler to just treat everything as if it were remote.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well thing is we actually want to make sure that local intents are resolved synchronously. If we group these all into a single batch and invoke through non-local dist-sender, we'd need to wait for both the fast local intents and the slow non-local intents before finishing the txn.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not resolving them synchronously - we're waiting for the commands to be proposed to raft, and then moving on before they are committed (at least if the comments are still accurate). Some local intents are resolved completely synchronously in EndTransaction, though. The ones that make it here are just the ones that need some other transaction to be pushed before they can be resolved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scratch this. I see that this is only used from the Store.resolveWriteIntentError execution path.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much does this split save us?

good question. The optimization here is to put things into Raft but not wait for them, which means almost certainly that they'll commit before our retrying client gets its next attempt in. So it definitely saves us from having to wait for Raft, which is a couple of round-trips.
I've added a TODO checking this for premature optimization. I imagine we'll refactor this part anyways when we have range-local batch support.

@bdarnell
Copy link
Contributor

bdarnell commented Aug 5, 2015

LGTM

action := func() {
// Trace this under the ID of the intent owner.
ctx := tracer.ToCtx(ctx, r.rm.Tracer().NewTrace(resolveArgs.Header().Txn))
if _, err := r.addWriteCmd(ctx, resolveArgs, &wg); err != nil && log.V(0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part isn't in the raft command processing - it's called from the Store that originally proposed the EndTransactionCommand after it has been applied.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment.
Changed V(0) (back) to V(1) above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I was replying to a comment from @spencerkimball which he has since deleted.

@tbg
Copy link
Member Author

tbg commented Aug 6, 2015

I'm ready to merge but going to wait for @bdarnell explaining the one comment I didn't understand.

tbg added 2 commits August 6, 2015 10:52
this passes all tests except for one (which I've disabled; maybe
@bdarnell has an idea?). Basically carried out a lot of the work announced in
RFC cockroachdb#1873:

- coordinator sends intents along with EndTransaction
- range resolves all it possibly can in the EndTransaction batch, and hands
  everything else off to an async resolver.
- "skipped intents" passed up by the range_command functions are now executed
  even if the command returns an error. This is required for EndTransaction
  which finds out that it's been aborted.

There are a lot of TODOs documented in the code, and I intend to address all
of those that are relevant for correctness. Some parts of the RFC are also
not yet implemented (gc'ing Txn entries), but the road should be very clear
and I'll add those before merging.

I did want to get this out early because it's a fairly complex refactoring
and will only get bigger from here.
... if they are "intent-less" operations and the batch
itself is non-transactional.

this allows sending InternalResolveIntent requests for
multiple transactions' intents in (*Replica).resolveIntents().
tbg added 2 commits August 6, 2015 11:07
this addresses most of the comments in `(*Replica).resolveIntents`.
tests for this functionality are next.
this fairly high-level test makes sure that only non-local intents are
dispatched for resolution via a new RPC. It also documents potential
for future optimizations.
also added a test for verifying range-local key ranges and edited
some comments.
tbg added a commit that referenced this pull request Aug 6, 2015
@tbg tbg merged commit 0c71a20 into cockroachdb:master Aug 6, 2015
@tbg tbg removed the PTAL label Aug 6, 2015
@tbg tbg deleted the intents_range branch August 6, 2015 15:25
if !r.rm.Stopper().RunAsyncTask(action) {
// As with local intents, try async to not keep the caller waiting, but
// when draining just go ahead and do it synchronously. See #1684.
action()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line appears to have resulted in a deadlock: https://circleci.com/gh/cockroachdb/cockroach/6250

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe not. #2206

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants