Event Hub-triggered function *always* checkpoints -- I want to control checkpointing #947

Closed
vfab opened this issue Sep 11, 2018 · 8 comments

Comments

@vfab

vfab commented Sep 11, 2018

I've already posted this on Azure Advisors but no response from Microsoft there. Retrying here...

This document (https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-event-hubs) states plainly that at the end of execution of an Event Hub-triggered function, the function will checkpoint whether there was an error or not. (I believe WebJobs have the same behavior.) Unfortunately this doesn't give us enough control. Perhaps there was a throttling error, or some other condition that means the messages can't be processed successfully. In such cases I would like to be able to tell the EventProcessorHost not to checkpoint.
I consider control over checkpointing a must-have feature if you're processing events from Event Hubs (or IoT Hub). Without it, Event Hub-triggered functions (and WebJobs) are not sufficiently reliable, because messages can be lost in error scenarios. Cloud-native apps are supposed to handle failures gracefully, but that's not the case here.

Just to be completely sure about this, I wrote a function to test this out. It reads in batches of messages and throws an exception at the end. Upon launching the function, it happily reads through all of the messages in the Event Hub, checkpointing every batch. So I'm quite confident that the EventProcessorHost DOES checkpoint even if there is an exception.
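The effect of that test can be reproduced in a toy model. The sketch below is plain Python, not the Azure SDK: `PartitionSim`, `always_fails`, and the `checkpoint_on_error` flag are hypothetical names invented for illustration. It only contrasts "checkpoint after every batch, even on error" (the behavior observed above) with "checkpoint only on success" (the behavior being requested).

```python
from dataclasses import dataclass, field


@dataclass
class PartitionSim:
    """Toy stand-in for one Event Hub partition plus its stored checkpoint."""
    events: list
    checkpoint: int = 0  # offset of the next event to read
    processed: list = field(default_factory=list)

    def run(self, handler, batch_size, checkpoint_on_error):
        """Feed batches to handler until the stream is drained or stalls."""
        while self.checkpoint < len(self.events):
            batch = self.events[self.checkpoint:self.checkpoint + batch_size]
            try:
                handler(batch)
                self.processed.extend(batch)
                ok = True
            except Exception:
                ok = False
            if ok or checkpoint_on_error:
                self.checkpoint += len(batch)  # move past this batch
            else:
                break  # checkpoint untouched; the same batch would be redelivered


def always_fails(batch):
    """Hypothetical handler that throws on every batch, like the test function."""
    raise RuntimeError("simulated downstream failure")


observed = PartitionSim(events=list(range(10)))
observed.run(always_fails, batch_size=4, checkpoint_on_error=True)
print(observed.checkpoint, observed.processed)  # 10 []

wanted = PartitionSim(events=list(range(10)))
wanted.run(always_fails, batch_size=4, checkpoint_on_error=False)
print(wanted.checkpoint, wanted.processed)  # 0 []
```

In the first run the checkpoint reaches the end of the stream although nothing was processed, so all ten events are lost. In the second run the checkpoint stays at 0 and the events remain available for a later attempt.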

So is there a way to control checkpointing currently? (And I don't mean messing with checkpoint blobs.) I don't think there is, and if that's truly the case, then is it possible that you could add a feature to tell the EventProcessorHost whether or not to checkpoint?

Thanks

@juliekoubova

Bump. Can anyone respond, please? Is this something that's on your roadmap? @alrod @alexkarcher-msft @brettsam Thank you!

@onpaj

onpaj commented Jan 14, 2019

Same problem here!

@eamonoreilly added this to the Triaged milestone Feb 21, 2019
@jeffhollan
Contributor

Being tracked here

Azure/azure-webjobs-sdk#1597

@jeffhollan
Contributor

FYI we have a design proposal out now if you want to review: https://github.com/jeffhollan/retry-design

@Lybecker

@jeffhollan the design proposal seems to solve another problem, not this one.

The feature I need is the ability to re-read the latest 10 minutes of messages over and over again, so I don't need checkpointing at all. This is possible with the legacy Microsoft.Azure.EventHubs.
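To make that request concrete, here is a toy sketch in plain Python. It is not the Microsoft.Azure.EventHubs API: `read_recent` and the `(enqueued_time, body)` tuple layout are assumptions made up for illustration. The point is that the start position is recomputed from the clock on every read, so no checkpoint is ever stored.

```python
from datetime import datetime, timedelta, timezone


def read_recent(events, now, window=timedelta(minutes=10)):
    """Return bodies of events enqueued within `window` before `now`.

    `events` is a list of (enqueued_time, body) tuples. Nothing is
    persisted between calls, so every call re-reads the whole window.
    """
    start = now - window
    return [body for enqueued, body in events if enqueued >= start]


now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
# Events enqueued 30, 11, 10, 5 and 0 minutes ago.
stream = [(now - timedelta(minutes=m), f"m{m}") for m in (30, 11, 10, 5, 0)]

print(read_recent(stream, now))  # ['m10', 'm5', 'm0']
print(read_recent(stream, now))  # ['m10', 'm5', 'm0']
```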

@jeffhollan
Contributor

Very interesting. Likely worth opening another issue for that. I know we have issues around being able to replay (e.g. move checkpoints to some point in time), but I'm not sure I've heard a request to do it over and over again, or whether the way we are thinking about moving checkpoints would work for it.

@vfab
Author

vfab commented May 1, 2020

Jeff, I think this is a good solution. Thanks! Really appreciate the effort.

@Fabiest

Fabiest commented May 20, 2020

Jeff, would it not be worth adding a circuit-breaker strategy to your proposal? It would be optional and would work similarly to your scenario 3, but with one difference:

Scenario circuit-breaker: The execution occurs, an exception is thrown, and there is a retry policy defined in host.json. The execution will be marked as failed. The stream WILL NOT continue on with the checkpoint. The retry policy will be honored. After the final retry has happened (if an upper limit was given and a circuit-breaker strategy is defined), the function is stopped and an alert is generated.

Not sure if this is something that can be achieved today. Probably the alert could be something outside of the function responsibility.
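The circuit-breaker scenario above could look roughly like the sketch below. This is plain Python and only models the control flow. `process_with_circuit_breaker`, `CircuitOpen`, `max_retries`, and `on_alert` are hypothetical names, not part of Azure Functions or of the linked design proposal, and a real retry policy would also wait between attempts.

```python
class CircuitOpen(Exception):
    """Raised when retries are exhausted; args[0] holds the unmoved checkpoint."""


def process_with_circuit_breaker(batches, handler, max_retries, on_alert):
    """Process batches in order and return the final checkpoint index.

    A failing batch is retried up to `max_retries` times. If it still
    fails, the checkpoint is NOT advanced, `on_alert` is called once,
    and processing stops by raising CircuitOpen.
    """
    checkpoint = 0  # index of the next batch to read
    for batch in batches:
        attempts = 0
        while True:
            try:
                handler(batch)
                break  # success, leave the retry loop
            except Exception as exc:
                attempts += 1
                if attempts > max_retries:
                    on_alert(f"circuit open at batch {checkpoint}: {exc}")
                    raise CircuitOpen(checkpoint) from exc
        checkpoint += 1  # advance only after a successful execution
    return checkpoint


calls, alerts = [], []


def poison_on_b1(batch):
    """Hypothetical handler that always fails on the batch named 'b1'."""
    calls.append(batch)
    if batch == "b1":
        raise ValueError("boom")


try:
    process_with_circuit_breaker(["b0", "b1", "b2"], poison_on_b1, 2, alerts.append)
except CircuitOpen as stop:
    print(stop.args[0], calls)  # 1 ['b0', 'b1', 'b1', 'b1']
```

The failing batch is attempted three times (one initial attempt plus two retries), the checkpoint stays at that batch, and the later batch is never read. The alert is passed in as a callback, which matches the remark that alerting could live outside the function.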
