Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use CompleteableFuture compose to centralize commit logic #978

Closed
keith-turner opened this issue Dec 13, 2017 · 18 comments
Closed

Use CompleteableFuture compose to centralize commit logic #978

keith-turner opened this issue Dec 13, 2017 · 18 comments
Milestone

Comments

@keith-turner
Copy link
Contributor

Opening this issue for follow on work identified in the comments of #975 and #722.

@keith-turner
Copy link
Contributor Author

keith-turner commented Dec 13, 2017

I created a Gist containing some local experimentation I did. I am not in anyway advocating for this design, I am just experimenting with the high level decomposition and sharing the result for discussion. I am new to using thenCompose and was just trying it out.

@keith-turner
Copy link
Contributor Author

This is the commit code as it existed before being converted to an async process.

public synchronized void commit() throws CommitException {

@jkosh44
Copy link
Contributor

jkosh44 commented Dec 14, 2017

Just wanted to also copy some relevant comments from #722 by @jwonders, to avoid having to flip back and forth.

Assuming the creation of a data class, maybe called AsyncCommitResult, that can represent the various outcomes of the async commit, something like the following would be an alternative.

In the end, the chaining might look similar to what @jkosh44 is suggesting except there would not be a need for the CompletableFutureTask or the addCallback method because the thenComposeAsync method already implements the desired behavior. There would be a bit less indirection as well.

private CompletionStage<AsyncCommitResult> deleteLocks(CommitData cd, final long commitTs) {
   ArrayList<Mutation> mutations = new ArrayList<>(updates.size() + 1);
   // snip ---
   // start the finish commit task after the delete locks task completes successfully
   // in this case the finish commit task does not make a decision based on the result
   return env.getSharedResources().getBatchWriter()
           .writeMutationsAsyncCF(mutations)
           .thenComposeAsync(x -> finishCommit(cd, commitTs), executor());
}

private CompletionStage<AsyncCommitResult> finishCommit(CommitData cd, long commitTs) {
   ArrayList<Mutation> afterFlushMutations = new ArrayList<>(2);
   // snip ---
   return env.getSharedResources().getBatchWriter()
           .writeMutationsAsyncCF(afterFlushMutations)
           .thenApply(x -> AsyncCommitResult.committed());
}

For the fail-fast cases, it is reasonable to return a completion stage that is immediately failed, and for the synchronous tasks, it is reasonable to just perform them synchronously and then return an already-completed future.

And

The purpose of AsyncCommitResult is to capture the result of the async operation. Terminal steps would be able to create a concrete value of this result class because they are aware of successful completion or errors. Intermediate steps would return a future based on additional composed async steps.

In the code I had posted, the finishCommit method is a terminal step that returns the committed() status upon successful completion. The deleteLocks is an intermediate step that does its work then composes the result with the finishCommit step. Some steps might return immediately upon exceptional conditions and perform composition on normal conditions.

@jkosh44
Copy link
Contributor

jkosh44 commented Dec 14, 2017

@keith-turner I've been trying to think of a way to synthesize @jwonders idea of an AysncCommitResult and your idea of having one main commit method to handle all the logic.

Essentially AsyncCommitResult would have any info that would be needed by any of the helper method parameters. Each helper method would return a CompletableFuture<AsyncCommitResult>. Then the main commit method would chain all of those helper methods together using .thenCommitAsync().

Ultimately the last method, finishCommit could return a CompletableFuture<void> or even a CompletableFuture<boolean> since it doesn't need to pass any info onto the next method.

So using the two methods that @jwonders used above it would look something like this:

@Override
public synchronized void commitAsync(AsyncCommitObserver commitCallback) {
  checkIfOpen();
  status = TxStatus.COMMIT_STARTED;
  commitAttempted = true;
  CommitData cd = createCommitData();

 //The line below would be a CompletableFuture<boolean> if we decide on a boolean 
 CompletableFuture<void> res = beginCommitAsync(cd, commitCallback, null)
      .thenComposeAsync(....)
      ...
      .thenComposeAsync(acr -> deleteLocks(acr.getCommitData(), acr.getCommitTs())
      .thenComposeAsync(acr -> finishCommit(acr.getCommitData(), acr.getCommitTs())
      .exceptionally(e ->commitCallback.failed(e));
}

...

private CompletionStage<AsyncCommitResult> deleteLocks(CommitData cd, final long commitTs) {
   ArrayList<Mutation> mutations = new ArrayList<>(updates.size() + 1);
   // snip ---
   // start the finish commit task after the delete locks task completes successfully
   // in this case the finish commit task does not make a decision based on the result
   return env.getSharedResources().getBatchWriter()
           .writeMutationsAsyncCF(mutations)
           .thenComposeAsync(x -> {
               AsyncCommitResult res = new AsyncCommitResult;
               res.setCommitData(cd);
               res.setCommitTs(commitTs);
               return res;
            });
}

private CompletionStage<void> finishCommit(CommitData cd, long commitTs) {
   ArrayList<Mutation> afterFlushMutations = new ArrayList<>(2);
   // snip ---
  cd.commitObserver.committed();
  //For the line below we'd have to use indirection again or if we go with a boolean just complete it with true
  return CompletableFuture.completedFuture(null)
}

My two questions/issues are:

  1. Is the original commitCallback the same as any subsequent cd.commitObservers. I.e. will that exceptionally call accurately log any failures along the way.
  2. I don't really like how each helper method creates it's own AsyncCommitResult I'd rather figure someway to just pass the same one between methods.

@jkosh44
Copy link
Contributor

jkosh44 commented Dec 14, 2017

AsyncCommitResult would look like

public class AsyncCommitResult {
  private CommitData cd;
  private ConditionalMutation pcm;
  private Result result;
  private Status mutationStatus;
  private Iterator<Result> results;
  private Stamp stamp;
  private long commitTs;
  private ConditionalMutation delLockMutation;

  //A lot of getters and setters
}

After typing it out though it doesn't look very clean, any thoughts?

@jkosh44
Copy link
Contributor

jkosh44 commented Dec 14, 2017

I was going to try and see how this worked in actual code and I noticed that this would not cover the case where we commit and return early. We could add a boolean committed to AsyncCommitResult and at the start of each thenCompose call check if(!acr.getCommitted). Whenever we commit early we just return a completed future with an acr setting commit to true.

@jkosh44
Copy link
Contributor

jkosh44 commented Dec 14, 2017

If anyone's interested I started to try and see how this would look/work in code. TransactionImpl and AsyncCommitResult. It's still pretty messy, but I've already been thinking of ways to clean it up. I didn't open a pull request because it doesn't compile, I have one error left that I need to take care of. So I still don't really know if it works but I just wanted to show an idea of what I was talking about.

Some cleanup ideas:

  • Add an AsyncCommitResult as a param to the helper methods to avoid recreating one at every step
  • Move if(!acr.isFinished()) logic into the helper methods themselves
  • Is it necessary for the acr to have a CommitData field, or is it the same cd object through every call?

@jwonders
Copy link

@jkosh44 I'm interested in looking at it. I probably can't get to it until this weekend rthough.

Hopefully the AsyncCommitResult doesn't need to pass much intermediate information between steps. There is probably a balance between doing everything in-line with thenCompose(...) and keeping some of the current structure to make the short-circuiting cleaner and allow for capturing data in the surrounding scope. The direction I would approach this from was starting with the same structure, introducing thenCompose(...) and then trying to refactor to achieve the understandability @keith-turner is looking for.

@keith-turner
Copy link
Contributor Author

I have been looking at your changes, I am not sure the AsyncCommitResult path is improving the readability of the code. I have been thinking about this and experimented with another high level design. This design has CommitSteps can be chained together allowing someone to easily follow the overall commit logic. An example of the commit step is the LockPrimaryStep which does a conditional update.

There are a few things I like about this pattern.

  • The code for a commit step is now all together in a commit step class. Currently functions contain the code to finish one commit step and start another. So currently a single function contains code for two different commit steps usually.
  • Common functionality is pushed up to super classes. For example the code for conditional mutation operations are simple functions that are easier to understand with more complex async stuff in a super class.
  • The composition process creates futures as needed, not creating them unless that path is taken.

@jkosh44
Copy link
Contributor

jkosh44 commented Dec 18, 2017

@keith-turner I haven't had a chance to look at the code in depth but it seems to achieve everything you were looking for.

In terms of exception handling for CommitStep, I thin we can append compose with .exceptionally(e -> getFailureOp(cd)) or use handleAsync like addCallback above it.

For all the TODOs saying something along the lines of "do async" what do you think about just surrounding them with CompletableFuture.runAsync()`?

@keith-turner
Copy link
Contributor Author

In terms of exception handling for CommitStep, I thin we can append compose with .exceptionally(e -> getFailureOp(cd)) or use handleAsync like addCallback above it.

I think either of those approaches would work. However I am still not very familiar with .exceptionally(), so its hard for me to make a call on it. I need to experiment with it and get some hands on experience.

For all the TODOs saying something along the lines of "do async" what do you think about just surrounding them with CompletableFuture.runAsync()`?

I think that would work in some cases.

If you think the approach I proposed is good, feel free to copy it or branch it. One thing I realized about this approach is that it makes the normal path easy to follow in the code. However the error path is still spread out in the code.

@jwonders
Copy link

It took me a little while to work my way through the transaction implementation. I made this diagram to help keep things straight in my head. I think it is pretty accurate although I didn't pay as close of attention to some of the error handling branches.

transaction_impl

The main thing to note is that some of the stages require a decision between more than two branches upon completion. The CompletableFuture naturally provides two branches, the normal one and the exceptional one. And, the methods like thenDoSomething(...) are most natural when there are two clearly separable paths to go down. I think this is why adding the notion of a CommitStep is making the code clearer as it handles the pattern matching and delegation to multiple branches. It is also why the normal path being easy to follow leads to the error path being hidden. I like the idea of the normal path being very easy to follow as this is the essence of what needs to happen to perform a transaction.

@keith-turner
Copy link
Contributor Author

keith-turner commented Dec 19, 2017

@jwonders the flow chart is really awesome! What did you use to create it? Would you be willing to contribute it? I think we could use that diagram to create a blog post explaining the commit process. The blog post could have short explanations of the steps and links to the code.

I like the idea of the normal path being very easy to follow as this is the essence of what needs to happen to perform a transaction.

I agree with this. I can't think of a way to represent the error paths at the top level without making the code hard to follow at the top level.

@jwonders
Copy link

I used gliffy. Definitely willing to contribute it. Here is a gist of the json.

https://gist.github.com/jwonders/f71425e6495e08d610ad69ff49735742

@jkosh44
Copy link
Contributor

jkosh44 commented Dec 26, 2017

Hey, I haven't really been able to work on this because of the holidays/holiday preparation. I hope to start up on it again soon though.

@jkosh44
Copy link
Contributor

jkosh44 commented Jan 14, 2018

Hey sorry, so I got really distracted and took a bit of a hiatus. But I took a look at it this weekend and went through a first pass at filling it in, TransactionImpl. It's currently failing a lot of tests but surprisingly passing a lot too. I'm going to look at it some more tomorrow, but I just wanted to post this since I haven't posted in awhile.

@jkosh44
Copy link
Contributor

jkosh44 commented Jan 16, 2018

Ok so I created a pull request, #1001. It's not passing every test but it only seems to be failing 3 tests.Those three test are all from FailureIT and it's because TransactionImpl.commitPrimaryColumn(CommitData cd, Stamp commitStamp) isn't returning false when it's supposed to. I'm pretty sure it's how I implemented exception handling, if a CommitException is thrown it's somehow not propagated back up to commitPrimaryColumn. I'm going to look at it some more but I though maybe a fresh set of eyes would be helpful.

In general the way I dealt with the Visible For Testing methods were as follows. I would create a first step as whatever intermediate step that testing method was starting at, fill in the rest of the steps using andThen, and then call compose. In the appropriate step classes themselves in getMainOp i would check if stopAfterPrimaryCommit or stopAfterPreCommit were true and if so I would return false. Then in getFailureOp if stopAfterPrimaryCommit or stopAfter'PreCommit were true I'd just call cd.commitObserver.committed(); It's not super clean in that Essentially I'd be saying the operation failed if those test values were true even though technically they didn't fail. I wasn't sure of a cleaner solution but though this was a good first approach.

Edit: right after typing this I just thought of a cleaner way. We wouldn't even need the two booleans if while filling in the next steps using andThen we stopped at the step we wanted to stop at. So for commitPrimaryColumn instead of

GetCommitStampStepTest firstStep = new GetCommitStampStepTest();
      firstStep.setTestStamp(commitStamp);

      firstStep.andThen(new WriteNotificationsStep()).andThen(new CommitPrimaryStep())
          .andThen(new DeleteLocksStep()).andThen(new FinishCommitStep());

      firstStep.compose(cd);

we'd have

GetCommitStampStepTest firstStep = new GetCommitStampStepTest();
      firstStep.setTestStamp(commitStamp);

      firstStep.andThen(new WriteNotificationsStep()).andThen(new CommitPrimaryStep());

      firstStep.compose(cd).thenAccept(v -> cd.commitObserver.committed());

@jkosh44
Copy link
Contributor

jkosh44 commented Jan 16, 2018

Well that change seemed to have killed two birds with one stone, all tests pass now. I still want to refactor preCommit(CommitData cd, RowColumn primary) so it no longer uses stopAfterPreCommit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants