Dead-Letter message on permanent failure #2087

Closed
sadgit opened this issue Jan 16, 2019 · 10 comments

@sadgit

sadgit commented Jan 16, 2019

As a Function Developer
In order to monitor failures
I need to exit a function with different exit codes
and to control whether a failing message is retried or dead-lettered

Expected behavior

Permanent Failure
Given V.2 Service Bus Trigger
When the client function throws a permanent execution error
Then the message should be dead-lettered

Actual behavior

Given V.2 Service Bus Trigger
When the client function throws any exception
Then the function host is restarted
And the telemetry is not sent to AppInsights

Known workarounds

No known workarounds.

@fabiocav
Member

fabiocav commented Jan 23, 2019

/cc @mathewc - some of the details here don't look accurate, can you comment and provide some guidance?

@sadgit
Author

sadgit commented Jan 23, 2019

@fabiocav - this issue arises from the same fault as both #2085 and #2086, but it remains a distinct issue that can be fixed separately from the other two. The three issues have been carefully worded to distinguish them from each other, although they are closely related.
Each of these issues is a blocker to our adoption of the V2 runtime.

@mathewc
Member

mathewc commented Jan 23, 2019

By default, the message lifetime handling you get out of the box is as follows:

  • if your function throws an exception, the message will be retried up to the max delivery count you have configured on your SB queue. After that, the message is dead-lettered.
  • if your function succeeds, the message is completed automatically.
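
To make the two rules above concrete, here is a minimal simulation sketch in Python. It is not Service Bus or WebJobs SDK code: `Message`, `Queue`, and `deliver` are hypothetical names invented for illustration, and the loop only models the retry-then-dead-letter outcome described in the list.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Message:
    body: str
    delivery_count: int = 0  # how many times delivery has been attempted


@dataclass
class Queue:
    """Hypothetical in-memory model of a queue with a max delivery count."""
    max_delivery_count: int
    completed: List[Message] = field(default_factory=list)
    dead_letter: List[Message] = field(default_factory=list)

    def deliver(self, message: Message, handler: Callable[[Message], None]) -> str:
        # Redeliver until the handler succeeds or the delivery budget is spent.
        while message.delivery_count < self.max_delivery_count:
            message.delivery_count += 1
            try:
                handler(message)
            except Exception:
                continue  # handler threw: the message becomes available again
            self.completed.append(message)  # handler returned: auto-complete
            return "completed"
        self.dead_letter.append(message)  # budget exhausted: dead-letter
        return "dead-lettered"


def always_fails(message: Message) -> None:
    raise ValueError("permanent failure")


queue = Queue(max_delivery_count=3)
poison = Message("bad payload")
outcome = queue.deliver(poison, always_fails)
print(outcome, poison.delivery_count)
```

Note that in this model a permanent failure is still attempted `max_delivery_count` times before it is dead-lettered, which is the gap the original request is about.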

See this issue for details on handling message Completion/DeadLetter yourself. Basically you can set AutoComplete to false in your ServiceBus host.json config (serviceBus.messageHandlerOptions.autoComplete) and handle the message state transitions yourself in your function (see code examples in linked issue).
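
As a sketch of where that setting lives, assuming a Functions v2 `host.json` in which the Service Bus extension is configured under `extensions.serviceBus` (check the layout against the extension version you actually run):

```json
{
  "version": "2.0",
  "extensions": {
    "serviceBus": {
      "messageHandlerOptions": {
        "autoComplete": false
      }
    }
  }
}
```

With `autoComplete` set to `false`, the function itself becomes responsible for completing, abandoning, or dead-lettering each message.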

Regarding your comment "When the client function throws any exception, then the function host is restarted" - I don't understand what you mean. If a single function invocation fails the host doesn't restart - just that invocation fails. Can you provide more details here?

Let me know if this addresses the issue for you. If not we can reopen.

@mathewc mathewc closed this as completed Jan 23, 2019
@StingyJack

@mathewc - an Azure WebJob running as a console application will restart the console application if an exception is thrown and unhandled.

This makes sense, except that throwing an unhandled exception is the only way to have the WebJob invocation report failure.

This pattern breaks Application Insights' ability to capture snapshot debugging details. The AppInsights team didn't seem to have a way around the process exiting before the snapshot completed transferring.

What is needed is either changing the return type from void to something meaningful, or permitting the function to set a property or call some method to signal invocation failure.

@mathewc
Member

mathewc commented Mar 12, 2019

See my comment above: "If a single function invocation fails the host doesn't restart - just that invocation fails." You can see this yourself by running your WebJob locally and throwing an exception from your job method. If you're not seeing such behavior, please provide a repro.

@sadgit
Author

sadgit commented Mar 12, 2019

@mathewc - this may be the intended behaviour; however, we have experienced a full restart under some circumstances. I have attempted to reproduce the conditions in isolation.
Some of my functions perform parallel commands - maybe the restart occurs if an exception is thrown on another thread.
It is particularly difficult to trace because neither the exception nor the restart request is logged in AppInsights.
I will continue to attempt to isolate the precise cause, but this is very time-consuming and not on my critical path.

@StingyJack

StingyJack commented Mar 13, 2019

@mathewc - WebJob invocation, not function. Invocation by queue or timer seems to terminate and restart the console app every time. This breaks App Insights' ability to capture snapshot debugging.

I went through this with the App Insights team via email, so I'm here asking about the SDK to see if there is a way to report invocation failure without restarting the console app.

@fabiocav
Member

@sadgit the issue as originally stated does not reflect the actual behavior. As @mathewc requested, can you please demonstrate the problem with a repro so we can look at this issue?

@StingyJack

StingyJack commented Mar 14, 2019

@fabiocav - I had this happen with a timer-triggered continuous WebJob (same SDK/repo, same behavior) that I was using to test App Insights snapshot debugging.

Make a .NET Framework WebJob and add the timer trigger to the function so it fires every minute. In the function, throw an exception when the current minute is even. To see how this affects App Insights, wrap the function body in a try/catch and call the telemetry client's capture method for exceptions (not in front of a PC for a few days, sorry for lack of syntax), then rethrow the exception so the invocation can report failure correctly.

Unless the exception is swallowed in the function, the telemetry client doesn't have enough time to capture and transfer the exception and snapshot debugging info before the process exits.
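
Since no syntax was given for the repro, here is a rough sketch of the pattern in Python rather than .NET. `RecordingTelemetry` is a hypothetical stand-in, not a real Application Insights class, and the explicit `flush()` before rethrowing is an assumption added for illustration; the thread does not establish that flushing is enough in the real host.

```python
from datetime import datetime


class RecordingTelemetry:
    """Hypothetical stand-in for a telemetry client; not a real SDK class."""

    def __init__(self):
        self.buffered = []  # captured but not yet transferred
        self.sent = []      # transferred before the process could exit

    def track_exception(self, exc):
        self.buffered.append(exc)

    def flush(self):
        # Assumption: push buffered items out before the caller rethrows.
        self.sent.extend(self.buffered)
        self.buffered.clear()


def timer_job(now, telemetry):
    """Job body that fails on even minutes, records the failure, and rethrows."""
    try:
        if now.minute % 2 == 0:
            raise RuntimeError(f"simulated failure at minute {now.minute}")
    except Exception as exc:
        telemetry.track_exception(exc)
        telemetry.flush()
        raise  # rethrow so the invocation is still reported as failed
```

Calling `timer_job(datetime(2019, 3, 14, 12, 4), RecordingTelemetry())` raises `RuntimeError` after the exception has been recorded; an odd minute returns normally.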

The same behavior should be present in an Azure function, at least it was when I was trying to (unsuccessfully, see the end of Azure/azure-functions-host#911) use them as webjob replacement a few weeks ago.

@sadgit
Author

sadgit commented Mar 14, 2019

@fabiocav - I cannot reveal the code that exhibited the issue and have tried to replicate it in a simpler repository.
The simple version does not behave the same way as the more complicated production code. I have not worked out which factor changes the context sufficiently to break the intended handling.
It is especially difficult to diagnose because there is no AppInsights record of the host restart.
