Question/doc request/feature request: automatic checkpointing behavior + batch prefetch #128

bartelink opened this issue Jan 18, 2019 · 0 comments

(Apologies, I know this is three issues in one and that's bad; I'm happy to edit this OP and split it out based on a quick triaging response.)

In #124 @ealsur clarified the readme.md text:

After successful delivery notification, checkpoints last continuation token to its lease.

for me. I suggest rewording it to:

In automatic checkpointing mode (the default):

  • If ProcessChangesAsync completes successfully, checkpoints last continuation token to its lease.
  • If ProcessChangesAsync throws, restarts from the previous checkpoint (which should yield the same results, plus any new changes since then)
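The behavior I'm describing could be sketched roughly as follows (illustrative Python, not the actual CPF code; fetch_batch, process_changes and checkpoint are all hypothetical stand-ins):

```python
import asyncio

async def auto_checkpoint_loop(fetch_batch, process_changes, checkpoint,
                               max_batches=10):
    """Sketch of automatic checkpointing; all callables are hypothetical.

    fetch_batch(token) -> (items, next_token); checkpoint(token) persists
    the continuation token to the lease.
    """
    token = None  # continuation token as stored on the lease
    for _ in range(max_batches):
        items, next_token = await fetch_batch(token)
        try:
            await process_changes(items)
        except Exception:
            # Failure: do NOT checkpoint; the next read restarts from the
            # previous token, yielding the same items plus any new changes.
            continue
        await checkpoint(next_token)  # success: persist the new token
        token = next_token
```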

Questions this raises:

  1. is there a separate delay for exception backoff? Is it the same as the delay between polls when you've caught up, or is it zero (which could make a big mess if something gets stuck in my processor)?
  2. is the marking of progress async wrt the next fetch? (If it isn't, I'd want to hand-tune to take the round-trip out of the path of the next fetch.)
  3. following on from 2: is there a way to have the next fetch take place in parallel with my processing, but still with the same semantics wrt exceptions and success triggering a checkpoint? It seems it would be very useful to have a bool EnablePreFetchNextBatch which, when true, implies:
    • success from ProcessChangesAsync can a) kick off the async checkpointing and b) immediately present the prefetched batch to me without my waiting
    • failure will [unfortunately] discard the prefetched batch (maybe the prefetching should only apply if the last ProcessChangesAsync succeeded?)

For the preemptive loading, the use case I have in mind is that I'm feeding onward to Kafka. I:

  1. read what can be a large batch
  2. decode/unroll (which can take e.g. 1s)
  3. supply to kafka driver
  4. await the write (I need to know the write succeeded before I attempt to send the next batch; given the at-least-once semantics discussed in #124 ("Can changes be presented multiple times or missed?"), a projected event could otherwise end up preceding a later one)
  5. ... am now instantly ready to go to step 2
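The serial flow above could be sketched like this (illustrative Python; read_batch, decode and kafka_send are hypothetical stand-ins). Each step waits for the previous one, so the decode time and the Kafka round-trip add up instead of overlapping:

```python
import asyncio

async def serial_forwarder(read_batch, decode, kafka_send, batches=3):
    """Sketch of the serial flow; all callables are hypothetical."""
    for _ in range(batches):
        raw = await read_batch()   # 1. read what can be a large batch
        events = decode(raw)       # 2. decode/unroll (can take ~1s of CPU)
        await kafka_send(events)   # 3+4. supply to the driver, await the write
        # 5. only now ready for the next batch: nothing overlapped
```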

WAIT...

I'm OK with the answer to question 3 being "no, that's too much; that's a case for manual checkpointing", as the optimal flow is more like:

a. read what can be a large batch
b. (trigger next fetch, followed by async decode like in next step)
c. decode/unroll (which can take e.g. 1s)
d. supply to kafka driver
e. await the write
f. (trigger the checkpointing)
g. (trigger next fetch + decode like in step b)
h. jump to step d with pre-decoded results from step b/c/g

i.e. the decode/unroll of a batch really should overlap with awaiting the onward transmission of the one before it
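Steps a-h amount to a two-stage pipeline; a rough sketch (illustrative Python; read_batch, decode and kafka_send are hypothetical stand-ins, and a real version would push the CPU-bound decode into an executor):

```python
import asyncio

async def pipelined_forwarder(read_batch, decode, kafka_send, batches=3):
    """Sketch of the overlapped flow; all callables are hypothetical.

    The fetch+decode of batch N+1 runs while the Kafka write of batch N
    is being awaited.
    """
    async def fetch_and_decode():
        raw = await read_batch()   # steps a/b/g: trigger the next fetch
        return decode(raw)         # step c: decode/unroll

    next_task = asyncio.ensure_future(fetch_and_decode())  # step a/b
    for _ in range(batches):
        events = await next_task   # pre-decoded batch is ready
        # step b/g: kick off the next fetch+decode before sending this one
        next_task = asyncio.ensure_future(fetch_and_decode())
        await kafka_send(events)   # steps d/e: supply and await the write
        # step f: checkpointing would be triggered here
    next_task.cancel()             # discard the speculative extra fetch
```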

However I can't see how to achieve this with the API as it stands...

  • to trigger the next fetch, I have to leave the callback with success
  • if I turn off auto-checkpointing, how do I trigger retries the way throwing from the callback currently does?

So I think my proposal/request is for a way to accomplish the following:

  • be able to trigger a prefetch of the next batch (probably via config, but a method on the context or batch would be OK)
  • be able to signal that this batch has been processed (MarkProgressAsync=true)
  • be able to say "only actually mark it every 4 batches; I'm running small batches for latency" (MarkProgressSkipBatching = 4)
  • be able to say "whenever you have a batch that's just been prefetched, but the previous one's processing is still in ProcessChangesAsync, let me start preprocessing" (e.g. let me write an async Task PreProcessBatchAsync(ctx, items) { ctx.Properties["decoded"] = await Preprocess(items); } which precedes ProcessChangesAsync in the order of processing, but which the next batch's preprocessing can overlap with)
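To illustrate the MarkProgressSkipBatching idea (everything below is a hypothetical shape, not existing CPF API): persist the continuation token only on every Nth successful batch, trading a bounded amount of replay-on-restart for fewer round-trips:

```python
class SkipBatchingCheckpointer:
    """Hypothetical sketch of MarkProgressSkipBatching = N: persist the
    continuation token to the lease only every Nth successful batch."""

    def __init__(self, persist, every=4):
        self.persist = persist  # callable writing a token to the lease
        self.every = every
        self.count = 0
        self.pending = None     # latest token not yet persisted

    def batch_done(self, token):
        """Call after each successfully processed batch."""
        self.count += 1
        self.pending = token
        if self.count % self.every == 0:
            self.persist(self.pending)  # mark progress on every Nth batch
            self.pending = None
```

On restart, at most N-1 batches would be re-presented, which is acceptable under the at-least-once semantics already discussed in #124.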

If any of this makes sense as a general facility to have in CPF (I think it does, of course!), I'm happy to contribute code (either prod-quality or POC/spike-level) if it helps, as having it as a first-class concept in our library simplifies the task that my library addresses across multiple teams in my company.
