Issues after Orleans 3.0 upgrade running on a Service Fabric Cluster #6113
Update: I noticed I was using the old WindowsAzure.Storage NuGet package, so I removed it and installed the Microsoft.Azure.Storage.* packages for Blob, Queue, and Common. That seems to resolve the
which ultimately results, approximately 20 minutes later, in a ton of these SiloUnavailableExceptions:
What clustering provider are you using?
@sergeybykov I'm using Azure Table storage as the clustering provider.
Are you performing an in-place upgrade or a new deployment? If the latter, did you use a new cluster ID for it? Have you looked at the cluster membership table? It should show the live state of the cluster.
@sergeybykov New deployment. We delete the two Orleans tables from Azure Table Storage (SiloInstances and Reminders) every time before we deploy the new code package to Service Fabric. I looked at the cluster membership table and it looks like most of the silos were dead (more than once) and then came back up.
Before you start the benchmark, do you see all silos started and joined to the cluster successfully? If the table is fresh, you should not see any dead silo entries at that point.
Yes. Before kicking off the benchmark runs, I made sure all the silos started and are in a status of "Active" in the SiloInstances table. A few minutes after I start the run, I see a lot of exceptions thrown from the Orleans.Core library indicating connection issues with reading the SiloInstances table (which I have pasted above in my second comment on this post), followed by lots of SiloUnavailableExceptions. Then after some time the silo seems to come back up, the benchmark run continues, and after a while I see the same behavior again. I feel something is not working well with the new Microsoft.Azure.Storage.* packages, because when I downgrade Orleans to 2.4.3, which has a dependency on WindowsAzure.Storage, everything seems to work well and I don't see any such issues. With the 3.0 version, I have to uninstall the WindowsAzure.Storage package as Orleans seems to have a dependency on the new Microsoft.Azure.Storage.* packages. I am also trying to replace the Azure Table Storage clustering membership with SQL clustering right now, to rule out any issue with these new Microsoft.Azure.Storage.* packages or Azure Table Storage clustering.
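(Editor's note: for reference, swapping the membership provider to SQL Server in Orleans 3.x typically looks roughly like the sketch below, using the Microsoft.Orleans.Clustering.AdoNet package. The connection string and the `siloBuilder` variable are placeholders, not values from this thread.)

```csharp
using Orleans.Hosting;

// Rough sketch of ADO.NET (SQL Server) clustering on the silo side.
// Assumes the Orleans membership tables have already been created in the target
// database and that siloBuilder is the ISiloBuilder from host setup.
siloBuilder.UseAdoNetClustering(options =>
{
    options.Invariant = "System.Data.SqlClient";     // ADO.NET provider invariant
    options.ConnectionString = sqlConnectionString;  // placeholder connection string
});
```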
Do you mean this?
TableNotFound may be caused by you deleting the table. There was a known Azure Table behavior where a previously deleted table would be unavailable for several minutes afterwards. I don't know how/if that changed with the transition to the new library. I suggest that instead of deleting the table, you use a new cluster ID for each deployment. That's the recommended process. Note that unavailability of the cluster membership table after a cluster has successfully started has no impact on operation of the cluster unless silos leave or join it. This makes me think that the issue with running the benchmark might be somewhere else.
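(Editor's note: a minimal sketch of what that recommendation can look like in configuration. The ServiceId value and the build-label environment variable are illustrative placeholders, not anything prescribed in this thread.)

```csharp
using System;
using Orleans.Configuration;

// Sketch: keep ServiceId stable across deployments, but give each deployment its own
// ClusterId so the previous deployment's membership rows are simply ignored rather
// than the table being deleted.
siloBuilder.Configure<ClusterOptions>(options =>
{
    options.ServiceId = "MyService";                                                   // stable across deployments
    options.ClusterId = Environment.GetEnvironmentVariable("BUILD_LABEL") ?? "dev";    // new per deployment
});
```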
No, I meant this. That one was fixed after I uninstalled the WindowsAzure.Storage packages and installed the Microsoft.Azure.Storage.* packages.
I have tried using a new serviceId and a new clusterId for deployments but that doesn't seem to fix the issue.
I don't see anything about the table there. But I do see the following:
Is silo
Yes, that silo
Hmm. Is this running on a physical cluster with each silo having a few CPU cores and ServerGC? Can you share the beginning of a silo log or, better, a full silo log? I have a hard time thinking what should happen to a silo for it to stop responding to clustering pings. I guess it is possible to exhaust IO Completion ports, but still... @ReubenBond, do you have any other idea?
We have this running on Azure VM Scale sets with Service Fabric. I will share the silo logs shortly. |
@sergeybykov Here are the silo logs. |
This is the log from one of the VMs that experienced an unexpected silo shutdown. We are running SF Cluster version 6.5.664.9590.
@thakursagar those appear to be traces from one of the clients - I'm not able to see silo logs there
@ReubenBond Sorry, I got the wrong logs. I will get the correct ones soon.
logs.zip |
@thakursagar I'm not seeing the
@ReubenBond That
vmlogs.zip |
@thakursagar the log indicates that the silo was declared dead by other silos in the cluster. This can mean that communication with that silo was lost. It can also mean that the node froze for a very long time (e.g., 30s). Analyzing logs from all relevant nodes will often reveal the cause.
@ReubenBond I tried to run the app in a different environment with a different VMSS size configuration and I could not reproduce the issue. Maybe it is an infrastructure-related thing (Azure) at this point.
@ReubenBond Update on this - I tried spinning up another identical environment to the one that was causing issues and observed the same behavior. Looking further into the silo logs, I saw that the trend was silos suspecting other silos because ping responses were not being received in a timely fashion. So I changed the configuration to set the
Could you show me your silo configuration code? |
The logs you uploaded have no message saying "the ping attempt was cancelled" |
@ReubenBond Sorry the logs had the 10k limit on export. I've filtered those exact logs in this file. I will get the config code soon. |
@ReubenBond Here's the silo config code:
`builder.Configure<SiloMessagingOptions>(options => options.ResponseTimeout = TimeSpan.FromMinutes(30));`

I strongly recommend using a value less than 2 minutes for `ResponseTimeout`. Is there some way that you can get more comprehensive logs? There's not much to work with in those logs. I'd start by looking for warnings and errors from all silos. In particular, is the process freezing? Are blocking operations causing threads on that host to deadlock?
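(Editor's note: in other words, something much closer to the library default. The 30-second figure below is an illustrative assumption, not a value prescribed in this thread.)

```csharp
using System;
using Orleans.Configuration;

// Sketch: a response timeout well under two minutes, per the advice above.
builder.Configure<SiloMessagingOptions>(options =>
    options.ResponseTimeout = TimeSpan.FromSeconds(30));
```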
@ReubenBond I'm working on getting the silo logs being written to a file which I will share with you. |
@ReubenBond Here is a complete silo log from one of the silos. |
@ReubenBond some more silo logs for you. |
@ReubenBond Did you get a chance to go through the silo logs? I looked at warnings and errors from all silos and nothing seems to give me a clue about the possible root cause. The process just seems to crash with lots of SiloUnavailableExceptions because the silos' ping responses are not received in time. I would say that there might be a few blocking operations overall, but this benchmark run works with Orleans 2.4, so my expectation is that it should work with 3.0 as well.
The cause of the issue is not immediately obvious. I see that some silo(s) are pausing for a few seconds at a time. All of the silo logs are lumped together in that one file, so there's no way to distinguish them. It's interesting that the same silo is declared dead twice in those logs. Is there a networking issue? Could you please verify that you are definitely using Orleans 3.0.0 on all hosts and not 3.0.0-beta1?
@ReubenBond - do silo pings go through grains? As in, do they use the same dedicated Orleans threads?
Silo pings use different threads to grains. In 3.0, they are handled directly by the connection processor (see SiloConnection.cs) |
Thanks for the info @ReubenBond. Are there logs from ping receivers indicating that they have received a ping and will respond?
There are logs: you need to enable trace level logging for "Microsoft.Orleans.Networking" to show them. |
Like this?
Yes, but with Microsoft.Orleans.Networking instead of just Orleans |
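(Editor's note: concretely, that filter can be added wherever logging is configured. A rough sketch, assuming a generic host builder; `hostBuilder` is a placeholder for however the silo host is actually built.)

```csharp
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

// Sketch: raise only the Orleans networking category to Trace so the
// silo-to-silo ping send/receive messages appear in the logs.
hostBuilder.ConfigureLogging(logging =>
{
    logging.AddFilter("Microsoft.Orleans.Networking", LogLevel.Trace);
});
```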
Thanks, we will give that a shot and report back.
The logs that I have uploaded are from one silo. Each file represents one silo log. I have verified we are using all 3.0 packages. |
Thanks for the clarification, @thakursagar, I had misinterpreted them because of the doubled-up log lines. I just noticed that you are using AppInsights for telemetry. I've seen some issues with AppInsights and blocking threads recently (particularly in Flush calls) which could potentially cause issues like this. Diagnosing that is not trivial, but capturing a memory dump can help (I can analyze it if you like, just email me a link to it). Capturing a perf trace can also help and is preferable since it gives an idea of behavior over time rather than just a snapshot. To capture a trace, download the latest version of PerfView from here: https://github.com/microsoft/perfview/releases/tag/P2.0.48 and copy
I can help analyze the resulting zipped traces. We can also make some time to diagnose this over a call. That might be faster. If you are able to send me a dump of the membership table (e.g., using Azure Storage Explorer), that is also useful for diagnosing this. My current inclination is blocked threads on the scheduler, since we see so many stalls. The
@ReubenBond, we do a few Task.WhenAll calls awaiting a few thousand grain calls. We have seen this can take a while (30 minutes is harsh, but that was because we didn't know where it would break). But now that you mention stalled and/or blocked threads, it would definitely make sense why some of the processes take so long.
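(Editor's note: not something from this thread, but as an illustration of one way to bound that kind of fan-out: awaiting the calls in batches rather than all at once. The grain interface and method names below are hypothetical placeholders.)

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Orleans;

// Placeholder grain interface standing in for the application's real one.
public interface IMyGrain : IGrainWithGuidKey
{
    Task DoWorkAsync();
}

public static class FanOut
{
    // Hypothetical sketch: fan out to thousands of grains in bounded batches so a
    // single Task.WhenAll never has an enormous number of calls in flight at once.
    public static async Task RunInBatchesAsync(IGrainFactory grains, IReadOnlyList<Guid> keys, int batchSize = 500)
    {
        for (var i = 0; i < keys.Count; i += batchSize)
        {
            var batch = keys
                .Skip(i)
                .Take(batchSize)
                .Select(key => grains.GetGrain<IMyGrain>(key).DoWorkAsync())
                .ToList();

            await Task.WhenAll(batch); // finish this batch before issuing the next one
        }
    }
}
```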
Any time, @sujesharukil. I'm flying to the other side of the planet today, but I'll try to help whenever I have connectivity |
@ReubenBond I tried unhooking App Insights from the app but saw the same behavior. I captured the perf traces using PerfView and have emailed you the link to traces. Thanks for all your help! |
Closing due to inactivity - let us know if this is still an issue |
@ReubenBond sorry, I haven't gotten a chance to do the next steps that we discussed on the email thread. We have put the upgrade down for a bit due to other priorities. I did find an interesting issue with some of our GrainInterfaces, though - we were importing [OneWay] from the System.Runtime.Remoting.Messaging namespace instead of Orleans.Concurrency. Do you think that might be related to this?
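(Editor's note: for anyone comparing, the difference is just which attribute is in scope. A sketch with a placeholder grain interface; the Orleans runtime only honors Orleans.Concurrency.OneWayAttribute, not the old remoting one.)

```csharp
// Wrong: System.Runtime.Remoting.Messaging.OneWayAttribute is the old .NET remoting
// attribute and means nothing to Orleans.
// using System.Runtime.Remoting.Messaging;

using System.Threading.Tasks;
using Orleans;
using Orleans.Concurrency;   // the [OneWay] that Orleans understands

public interface INotifierGrain : IGrainWithGuidKey   // placeholder interface name
{
    [OneWay]                          // fire-and-forget: caller does not wait for a response
    Task NotifyAsync(string message);
}
```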
It's possible, worth testing. |
@ReubenBond @sergeybykov Gave this another shot this week, updating the application to the 3.4.3 version of Orleans. What I am seeing this time is a lot of errors like this:
followed by this:
Do you think there is anything that changed with respect to the errors I am seeing in the 3.x version as compared to the 2.4.2 version?
I recently upgraded our application, which is deployed on a Service Fabric cluster, to the Orleans 3.0 NuGet packages. After the upgrade, when we run our benchmark tests for performance, I am seeing a lot of exceptions being thrown:
It starts from
Then after a few minutes I see
and
types of messages. I am using the Azure Table Storage Clustering for both the client and server.
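(Editor's note: for context, Azure Table clustering in Orleans 3.x is typically wired up roughly as below on both sides, using the Microsoft.Orleans.Clustering.AzureStorage package. The connection string, IDs, and builder variables are placeholders, not values from this issue.)

```csharp
using Orleans;
using Orleans.Configuration;
using Orleans.Hosting;

// Silo side: publish membership to the Azure Table.
siloBuilder
    .Configure<ClusterOptions>(o => { o.ClusterId = "my-cluster"; o.ServiceId = "my-service"; })
    .UseAzureStorageClustering(o => o.ConnectionString = storageConnectionString);

// Client side: read the same membership table to discover gateways.
clientBuilder
    .Configure<ClusterOptions>(o => { o.ClusterId = "my-cluster"; o.ServiceId = "my-service"; })
    .UseAzureStorageClustering(o => o.ConnectionString = storageConnectionString);
```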