New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Guard against exceptions in Learning transport that can happen due to competing consumer starting #5299
Conversation
src/NServiceBus.Core/Transports/Learning/DirectoryBasedTransaction.cs
Outdated
Show resolved
Hide resolved
} | ||
catch (IOException e) | ||
{ | ||
log.Debug($"Unable to delete pending transaction directory '{pendingDir.FullName}'.", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add retries similar to https://github.com/Particular/NServiceBus/blob/develop/src/NServiceBus.Core/Transports/Learning/AsyncDirectory.cs ?
@andreasohlund I've moved the pending transaction recovery to a background process. If there are any errors the hope is those will be fixed on the next run. @danielmarbach I've tested this version with MonitoringDemo and could not break it. |
6a9c6d2
to
54a02a5
Compare
I've added two more changes to handle exceptions observed while testing:
After those changes I could not get the monitoring demo to crash. |
src/NServiceBus.Core/Transports/Learning/LearningTransportMessagePump.cs
Outdated
Show resolved
Hide resolved
11f40a6
to
94c071c
Compare
I patched the monitoring demo with this latest version and let it run with high throughput mode with scaleout sales endpoint for over an hour. Seems to be fine and we are no longer running into the problem that the endpoints get stopped to due CriticalError.Raise and the competing recover problems are also gone |
Changes LGTM, but we still need to figure out what branch this needs to go in. |
I've rebased the branch and opened #5309 to have this go against master instead, so I'm going to close this. |
While migrating the monitoring demo to the learning transport we found that in some scenarios with competing consumers the endpoint might fail to start with the following exception
After doing short investigation it looks that
LearningTransport
does not consider competing consumers during start-up when it cleans the unfinished transactions. What is probably happening is that he running endpoint has an opened transaction that is consider by new starting instance as unfished pending. In such case the new one tries to clean the transaction and delete message folder which fails as it is locked by other instance.Particular/MonitoringDemo#51 (comment)