Guard against exceptions in Learning transport that can happen due to competing consumer starting #5299

Closed
wants to merge 7 commits into from

danielmarbach
Contributor

While migrating the monitoring demo to the learning transport, we found that in some scenarios with competing consumers the endpoint might fail to start with the following exception:

Unhandled Exception: System.IO.IOException: The process cannot access the file '432e0ce9-851b-46cb-b156-16d390325ff9.metadata.out' because it is being used by another process.
  at System.IO.Directory.DeleteHelper(String fullPath, String userPath, Boolean recursive, Boolean throwOnTopLevelDirectoryNotFound, WIN32_FIND_DATA& data)
  at System.IO.Directory.Delete(String fullPath, String userPath, Boolean recursive, Boolean checkHost)
  at NServiceBus.DirectoryBasedTransaction.RecoverPartiallyCompletedTransactions(String basePath, String pendingDirName, String committedDirName)
  at NServiceBus.LearningTransportMessagePump.Start(PushRuntimeSettings limitations)
  at NServiceBus.TransportReceiver.Start()
  at NServiceBus.ReceiveComponent.Start()
  at NServiceBus.StartableEndpoint.<Start>d__1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
  at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
  at NServiceBus.Endpoint.<Start>d__1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
  at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
  at Sales.Program.<Main>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
  at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
  at Sales.Program.<Main>(String[] args)

After a short investigation, it looks like LearningTransport does not consider competing consumers during start-up when it cleans up unfinished transactions. What is probably happening is that the running endpoint has an open transaction that the newly starting instance considers an unfinished pending transaction. In that case the new instance tries to clean up the transaction and delete the message folder, which fails because the folder is locked by the other instance.

Particular/MonitoringDemo#51 (comment)

@danielmarbach danielmarbach requested a review from a team November 27, 2018 11:36
}
catch (IOException e)
{
log.Debug($"Unable to delete pending transaction directory '{pendingDir.FullName}'.", e);
@danielmarbach danielmarbach changed the title Guard against IOExceptions that can happen due to competing consumer starting WIP Guard against IOExceptions that can happen due to competing consumer starting Nov 27, 2018
@tmasternak
Member

@andreasohlund I've moved the pending transaction recovery to a background process. If a recovery pass hits any errors, the hope is that they will be fixed on the next run.

@danielmarbach I've tested this version with MonitoringDemo and could not break it.
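
A minimal sketch of what running recovery in the background could look like, assuming a recovery pass can be expressed as a plain delegate. `BackgroundRecovery`, `recoverOnce`, `interval`, and `onError` are hypothetical names used for illustration only; this is not the code from the PR or the NServiceBus implementation.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of NServiceBus.
static class BackgroundRecovery
{
    // Runs recoverOnce repeatedly on a background task until cancelled.
    // A failed pass is reported through onError and retried on the next
    // run instead of failing endpoint start-up.
    public static Task Start(
        Action recoverOnce,
        TimeSpan interval,
        Action<Exception> onError,
        CancellationToken token)
    {
        return Task.Run(async () =>
        {
            while (!token.IsCancellationRequested)
            {
                try
                {
                    recoverOnce();
                }
                catch (Exception e)
                {
                    // Another instance may hold a lock; try again later.
                    onError(e);
                }

                try
                {
                    await Task.Delay(interval, token).ConfigureAwait(false);
                }
                catch (OperationCanceledException)
                {
                    // Cancellation requested while waiting: stop the loop.
                    break;
                }
            }
        });
    }
}
```

The caller keeps the returned task and cancels the token on shutdown, so the loop ends cleanly rather than being abandoned.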

@tmasternak
Member

I've added two more changes to handle exceptions observed while testing:

  • Handling IOException is not enough, so I switched to catching Exception instead. Here is a sample of an UnauthorizedAccessException being thrown:
Unhandled Exception: System.UnauthorizedAccessException: Access to the path is denied.
   at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.IO.File.InternalMove(String sourceFileName, String destFileName, Boolean checkHost)
   at NServiceBus.DirectoryBasedTransaction.RecoverCommitted()
   at NServiceBus.DirectoryBasedTransaction.RecoverPartiallyCompletedTransactions(String basePath, String pendingDirName, String committedDirName)
   at NServiceBus.LearningTransportMessagePump.RecoverPendingTransactions()
   at NServiceBus.LearningTransportMessagePump.Start(PushRuntimeSettings limitations)
   at NServiceBus.TransportReceiver.Start()
   at NServiceBus.ReceiveComponent.Start()
   at NServiceBus.StartableEndpoint.<Start>d__1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
   at NServiceBus.Endpoint.<Start>d__1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Sales.Program.<Main>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Sales.Program.<Main>(String[] args)
  • Commit recovery can crash on directory enumeration just like pending recovery, so the try-catch has been extended.

After those changes I could not get the monitoring demo to crash.
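
The two changes listed above can be sketched roughly as follows: catch `Exception` rather than only `IOException`, and keep the directory enumeration inside a guarded block as well. `TolerantRecovery`, `DeletePendingDirectories`, and the `log` callback are hypothetical names, and the sketch only covers deleting pending directories; it is an illustration under those assumptions, not the code merged in this PR.

```csharp
using System;
using System.IO;

// Hypothetical sketch; names are illustrative, not NServiceBus internals.
static class TolerantRecovery
{
    // Deletes every sub-directory of pendingRoot that can be deleted and
    // returns how many were removed. Failures are logged, never rethrown.
    public static int DeletePendingDirectories(string pendingRoot, Action<string, Exception> log)
    {
        var deleted = 0;
        try
        {
            // Listing the directories can itself throw while another
            // instance is working in the same folder, so it is guarded too.
            foreach (var dir in new DirectoryInfo(pendingRoot).GetDirectories())
            {
                try
                {
                    dir.Delete(true);
                    deleted++;
                }
                catch (Exception e)
                {
                    // Covers IOException, UnauthorizedAccessException, etc.
                    log($"Unable to delete pending transaction directory '{dir.FullName}'.", e);
                }
            }
        }
        catch (Exception e)
        {
            log($"Unable to enumerate pending transactions in '{pendingRoot}'.", e);
        }

        return deleted;
    }
}
```

Catching per directory means one locked transaction does not prevent the remaining ones from being cleaned up in the same pass.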

@danielmarbach danielmarbach changed the title WIP Guard against IOExceptions that can happen due to competing consumer starting Guard against IOExceptions that can happen due to competing consumer starting Nov 28, 2018
@bording bording changed the title Guard against IOExceptions that can happen due to competing consumer starting Guard against exceptions that can happen due to competing consumer starting Nov 28, 2018
@bording bording changed the title Guard against exceptions that can happen due to competing consumer starting Guard against exceptions in Learning transport that can happen due to competing consumer starting Nov 28, 2018
@danielmarbach
Contributor Author

I patched the monitoring demo with this latest version and let it run in high-throughput mode with a scaled-out Sales endpoint for over an hour. It seems to be fine: the endpoints no longer get stopped due to CriticalError.Raise, and the competing recovery problems are also gone.

@bording
Member

bording commented Dec 4, 2018

Changes LGTM, but we still need to figure out what branch this needs to go in.

@bording
Member

bording commented Dec 5, 2018

I've rebased the branch and opened #5309 to have this go against master instead, so I'm going to close this.
