Health check endpoints continue to report Healthy for Azure Service Bus when network connection drops #2804
Comments
This might be related to the way ASB reports transient exceptions, in that it doesn't actually fault the transport. I'm not entirely sure, though; some detailed debug logs would be helpful (even though most transient errors are suppressed to avoid noisy logs).
Only the following type of error is logged at Debug level by MassTransit after the network connection drops:
The ready health endpoint continues to return Healthy as below for both the bus and endpoint:
@cmeeren Do you have any thoughts on this one? Now that the receiver is no longer recycled when a "transient" error occurs, the health checks are misleading.
I don't use health checks, so I have no opinion on that particular aspect. Of course, I'd prefer that any fix not reintroduce issues I have previously raised in this repo. "Recycling the receiver" etc. seems to me like an implementation detail. I don't know anything about that; I just don't want to be spammed with warnings if MT correctly and successfully handles reconnection.
Fair enough. When a communication exception occurs, I think I'll recycle the receiver but log it only at Debug level rather than as an error.
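A minimal Python sketch of the recycle-on-communication-error idea, assuming a receiver can be modelled as a plain callable that returns the next message. This is not MassTransit's implementation (which is C#); `ReceiverSupervisor`, `CommunicationError`, and `HealthState` are hypothetical names. It shows that the transport can be marked unhealthy and recycled without logging anything above Debug level.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Optional

log = logging.getLogger("receive_transport")


class CommunicationError(Exception):
    """Hypothetical stand-in for a transport-level communication failure."""


@dataclass
class HealthState:
    healthy: bool = True
    detail: Optional[str] = None


@dataclass
class ReceiverSupervisor:
    # Hypothetical factory: returns a zero-argument callable that yields the next message.
    create_receiver: Callable[[], Callable[[], str]]
    health: HealthState = field(default_factory=HealthState)
    recycles: int = 0

    def __post_init__(self) -> None:
        self._receiver = self.create_receiver()

    def poll(self) -> Optional[str]:
        try:
            message = self._receiver()
        except CommunicationError as exc:
            # Debug only, so a successful reconnect does not spam warnings.
            log.debug("Communication error, recycling receiver: %s", exc)
            # Surface the fault (with detail) to health checks, then recycle.
            self.health = HealthState(healthy=False, detail=repr(exc))
            self._receiver = self.create_receiver()
            self.recycles += 1
            return None
        # A successful receive means the transport is healthy again.
        self.health = HealthState(healthy=True)
        return message


# Usage: the first receive fails with a communication error, the next succeeds.
outcomes = iter([CommunicationError("network down"), "msg-1"])


def make_receiver() -> Callable[[], str]:
    def receive() -> str:
        item = next(outcomes)
        if isinstance(item, Exception):
            raise item
        return item
    return receive


sup = ReceiverSupervisor(make_receiver)
print(sup.poll(), sup.health.healthy)  # None False
print(sup.poll(), sup.health.healthy)  # msg-1 True
print(sup.recycles)                    # 1
```

The health state flips back to Healthy only after a receive on the recycled receiver succeeds, so a probe taken between the failure and the recovery would report Unhealthy with the exception detail.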
…sport, isTransient isn't really "correct" in this case.
Let me know if the new develop package with this fix resolves the issue.
I'm closing this; feel free to comment if you find the problem still exists.
Apparently this is still an issue. With the SDK change in v7.3.0, I should try to reproduce it and fix it if possible.
I added one more check to the receiver to recycle it on failure so that the unhealthy state is detected; surely the log should show something now. I'll need to verify this eventually, somehow.
I'm seeing the same issue (as I've reported before).
@oguzhankahyaoglu are you running 7.3.1 and seeing the same result? The Azure SDK has a lot of transient error handling features, and it doesn't surface the unavailability to applications right away. That said, there may be some way to pick up the loss of connection and report the unhealthy status, but I haven't had time to look at it yet.
We had been on 7.3.0 for roughly the last 10 days and upgraded to 7.3.1 today; we'll definitely watch whether everything works fine in production.
Well, that's what the receive transport does: if the consumer is disconnected from Azure, it should go into an unhealthy state. But in the event of transient issues, it doesn't detect the disconnection immediately. I'm not sure why, and as I stated above, I haven't been able to set up a scenario where I disconnect from the network to see it break.
I researched this, and the Azure SDK never seems to report a connection failure, because it retries under the hood regardless of the retry policy applied. So, at this point, I can't think of anything else to do and will close this issue.
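A toy Python simulation of why the health check stays Healthy in that situation: if the client library retries transient errors internally and never propagates them, application code that only changes health status on exceptions never observes the outage. `sdk_receive`, `app_receive`, `Health`, and `TransientError` are hypothetical stand-ins, not the Azure SDK or MassTransit API, and the retry loop omits the back-off a real client would use.

```python
from typing import Callable, List


class TransientError(Exception):
    """Hypothetical transient failure raised by the underlying connection."""


class Health:
    def __init__(self) -> None:
        self.status = "Healthy"


def sdk_receive(connect: Callable[[], str], on_retry: Callable[[int], None]) -> str:
    # Hypothetical SDK behaviour: transient errors are retried internally and
    # never propagated, so the caller only ever sees a successful result.
    attempt = 0
    while True:
        try:
            return connect()
        except TransientError:
            attempt += 1
            on_retry(attempt)  # a real client would back off here


def app_receive(health: Health, connect: Callable[[], str],
                on_retry: Callable[[int], None]) -> str:
    try:
        message = sdk_receive(connect, on_retry)
    except TransientError:
        # Never reached in this simulation: the SDK swallowed every transient error.
        health.status = "Unhealthy"
        raise
    health.status = "Healthy"
    return message


# Usage: the "network" is down for three attempts, then comes back.
health = Health()
failures_left = [3]


def connect() -> str:
    if failures_left[0] > 0:
        failures_left[0] -= 1
        raise TransientError("no network")
    return "msg-1"


# Record what a health probe would have reported during each internal retry.
observed: List[str] = []
msg = app_receive(health, connect, lambda attempt: observed.append(health.status))
print(msg, observed)  # msg-1 ['Healthy', 'Healthy', 'Healthy']
```

Every probe taken while the connection was down still reads Healthy, which matches the behaviour reported in this issue.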
Is this a bug report?
Yes
Can you also reproduce the problem with the latest version?
Yes (7.2.2)
Environment
1. .NET Core 3.1
2. Windows 10
3. Visual Studio
Steps to Reproduce
(Write your steps here:)
Expected Behavior
At step 4 the health check endpoint returns status Unhealthy for the bus endpoints with exception detail and for the bus itself.
Exception details are included in the response if available.
Actual Behavior
At step 4, the health check endpoint result is unchanged, even though the application clearly can no longer connect to the bus (no network connection).
The health check endpoint continues to return status Healthy for the bus endpoints and for the bus itself.
Reproducible Demo
I'll try to provide if needed.