Disable service bus subscription (and potentially other errors?) will stop message processing in Dapr sidecar and needs pod restart to resume #1612
Comments
Transferring to correct repository. cc @berndverst @halspang.
Hi again, I have recently been running However the net effect is the same: re-enabling the subscription does not make the sidecar start processing messages again, and I still need to restart the deployment to get the sidecar to process messages.
As it is implemented today, this is the "correct" behavior and not a "bug" (as in, that's what the person who originally programmed this wanted to achieve, and I haven't modified that behavior). The reconnection that you see happening there is primarily meant to recover from things such as transient network failures. However, you're right that if the sidecar can't reconnect after 30 attempts, it currently just "gives up". Perhaps we should remove the maximum limit and use an exponential back-off that retries forever? At that point, a retry could happen even after 10 minutes, but at least it won't leave daprd in a weird state where nothing is happening. What do you think @yaron2?
I think that's a good idea. It'd create a connectivity reconciliation loop between desired (connected) and current state. |
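To make the proposal concrete, here is a minimal sketch of a reconnect loop with no attempt limit and a capped exponential back-off. It is an illustration only and not Dapr's actual implementation (which is written in Go); `reconnect_forever`, `connect`, `min_delay`, `max_delay`, and the default values are all hypothetical names and numbers chosen for the example.

```python
import time
from typing import Callable


def reconnect_forever(
    connect: Callable[[], None],
    min_delay: float = 2.0,      # hypothetical default: first wait, in seconds
    max_delay: float = 300.0,    # hypothetical default: cap on the wait, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call connect() until it succeeds, with no maximum number of attempts.

    The wait doubles after every failure and is capped at max_delay.
    Returns the number of failed attempts before the successful one.
    """
    failures = 0
    delay = min_delay
    while True:
        try:
            connect()
            return failures
        except ConnectionError:
            failures += 1
            sleep(delay)
            # Double the delay iteratively (rather than computing 2**failures)
            # so the value stays bounded no matter how long the outage lasts.
            delay = min(max_delay, delay * 2)
```

With `min_delay=1` and `max_delay=8`, the waits are 1, 2, 4, 8, 8, ... seconds, so a subscription that stays disabled for a long time is still retried at most `max_delay` seconds after it is re-enabled, without a pod restart.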
With exponential back-off configurable between min and max time.
Fixes dapr#1612
Also includes fixes:
- Binding: make sure it actually retries to connect forever
- Binding: add delay (exponential backoff) before reconnecting
- PubSub: better handling of failures such as topics disabled or other non-connection issues
Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
I've opened #1783 which should fix this issue.
* Make Service Bus attempt to reconnect forever in case of issues
With exponential back-off configurable between min and max time.
Fixes #1612
Also includes fixes:
- Binding: make sure it actually retries to connect forever
- Binding: add delay (exponential backoff) before reconnecting
- PubSub: better handling of failures such as topics disabled or other non-connection issues
Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
* 💄
Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
* Added warning for deprecated metadata options
Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
* 💄
Signed-off-by: ItalyPaleAle <43508+ItalyPaleAle@users.noreply.github.com>
@IvanILanekassen the fix should now be available in the nightly builds, should you want to try them out! Docker images with tag (We are also planning to have a release candidate hopefully early next week) |
In what area(s)?
/area runtime
What version of Dapr?
1.6.0
Expected Behavior
Intermittent service bus errors should not stop Dapr message processing
Actual Behavior
We discovered this issue when disabling the service bus subscription to stop message processing. We immediately get errors in the Dapr sidecar of the form:
And after only 1 minute and 20 seconds we get this error:
After this point, no more messages are processed by the pod, and we need to restart the pod to get the processing going again.
Steps to Reproduce the Problem
Release Note
RELEASE NOTE: FIX Self healing from intermittent issues in Azure Service Bus