DeviceClient/ModuleClient goes into a bad state and throws ObjectDisposedExceptions on subsequent operations #953

varunpuranik · 2019-05-22T02:03:08Z

Description of the issue:

It seems like the DeviceClient/ModuleClient sometimes goes into a bad state and disposes the underlying transport. This causes subsequent operations to throw an OperationDisposedException, even through CloseAsync or Dispose is never called on the object.

On further investigation, it seems like the Reconnect logic in RetryingDelegatingHandler.HandleDisconnect throws, which causes the transport to get disposed here (line 591) -

azure-iot-sdk-csharp/iothub/device/src/Transport/RetryDelegatingHandler.cs

Lines 582 to 592 in e1dd08e

    
           catch (Exception ex) 
        
           { 
        
               if (Logging.IsEnabled) Logging.Error(this, ex.ToString(), nameof(HandleDisconnect)); 
        
               var hubException = ex as IotHubException; 
        
               if (hubException != null) HandleConnectionStatusExceptions(hubException); 
        
               // We were not able to recover the connection or subscriptions within the configured policy. 
        
               // The object will be placed in an unusable state. 
        
               Dispose(true); 
        
           }

In our application, we get the 2 callbacks triggered from this method -

azure-iot-sdk-csharp/iothub/device/src/Transport/RetryDelegatingHandler.cs

Line 540 in e1dd08e

_onConnectionStatusChanged(ConnectionStatus.Disconnected_Retrying, ConnectionStatusChangeReason.Communication_Error);
azure-iot-sdk-csharp/iothub/device/src/Transport/RetryDelegatingHandler.cs

Line 616 in e1dd08e

_onConnectionStatusChanged(ConnectionStatus.Disconnected, status);

Expected behavior would be -

Ideally the transport does not get disposed, or recovers eventually
If that is not possible, then the actual underlying exception bubbles up to the application code, rather than ObjectDisposedException.

AMCN41R · 2019-05-23T14:10:39Z

I experienced this issue yesterday morning (around 08:00 UTC). There seemed to be a service outage in North Europe which prevented the edgeHub module on my devices communicating with the IoT Hub. As a result, the following exception was thrown repeatedly...

2019-05-22 09:20:12.390 +00:00 [INF] - Reauthenticating connected clients
2019-05-22 09:21:05.910 +00:00 [WRN] - Non retryable exception occurred while sending message.
System.ObjectDisposedException: Cannot access a disposed object.
Object name: 'IoT Client'.
   at Microsoft.Azure.Devices.Client.Transport.DefaultDelegatingHandler.ThrowIfDisposed()
   at Microsoft.Azure.Devices.Client.Transport.DefaultDelegatingHandler.OpenAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.<>c__DisplayClass32_0.<<OpenAsyncInternal>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.EnsureOpenedAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.<>c__DisplayClass14_0.<<SendEventAsync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.SendEventAsync(Message message, CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.InternalClient.SendEventAsync(Message message)
   at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.ConnectivityAwareClient.<>c__DisplayClass28_0.<<InvokeFunc>b__0>d.MoveNext() in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/ConnectivityAwareClient.cs:line 174
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.ConnectivityAwareClient.InvokeFunc[T](Func`1 func, String operation, Boolean useForConnectivityCheck) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/ConnectivityAwareClient.cs:line 134
   at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudProxy.SendMessageAsync(IMessage inputMessage) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/CloudProxy.cs:line 128
   at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudProxy.SendMessageAsync(IMessage inputMessage) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/CloudProxy.cs:line 134
   at Microsoft.Azure.Devices.Edge.Hub.Core.Routing.CloudEndpoint.CloudMessageProcessor.<>c__DisplayClass12_0.<<ProcessAsync>b__0>d.MoveNext() in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Core/routing/CloudEndpoint.cs:line 146

When service was restored, the device did not recover and I had to manually restart the edgeHub container on the device.

sebader · 2019-05-23T14:26:17Z

I had the same yesterday, around 2 hours earlier, with an IoT Hub in West Europe. Thus, I would assume it was a rolling service upgrade / update, not necessarily an outage, that caused clients to reconnect.

prmathur-microsoft · 2019-06-18T18:41:34Z

#955

vladislav-mitev · 2019-07-08T06:32:54Z

We experience the same issue both within Edge and Regular IoT Devices/Modules.
We could not identify any obvious reason for this. While our heartbeat flows, once of a sudden these exceptions start appearing and until the whole client is reset, nothing helps and this continues as forever.
Exception details below (identical for devices, modules, edge devices, edge modules):

System.ObjectDisposedException: Cannot access a disposed object.
Object name: 'IoT Client'.
   at Microsoft.Azure.Devices.Client.Transport.DefaultDelegatingHandler.ThrowIfDisposed()
   at Microsoft.Azure.Devices.Client.Transport.DefaultDelegatingHandler.OpenAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.<>c__DisplayClass32_0.<<OpenAsyncInternal>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.<EnsureOpenedAsync>d__31.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.<>c__DisplayClass14_0.<<SendEventAsync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.<SendEventAsync>d__14.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Devices.Client.InternalClient.<SendEventAsync>d__67.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

Hammatt · 2019-07-23T20:00:43Z

We've been seeing this issue in production in Europe but not elsewhere:

23-07-2019 08:57:23.37 Error: System.ObjectDisposedException: Cannot access a disposed object.
Object name: 'IoT Client'.
   at Microsoft.Azure.Devices.Client.Transport.DefaultDelegatingHandler.ThrowIfDisposed()
   at Microsoft.Azure.Devices.Client.Transport.DefaultDelegatingHandler.OpenAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.<>c__DisplayClass32_0.<<OpenAsyncInternal>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.EnsureOpenedAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.<>c__DisplayClass14_0.<<SendEventAsync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.SendEventAsync(Message message, CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.InternalClient.SendEventAsync(String outputName, Message message)

It requires a manual restart of the entire IoT Edge service via sudo systemctl restart iotedge. just restarting the effected container via a docker restart {container} does not resolve the issue.

davilu · 2019-08-05T21:23:52Z

@varunpuranik, @Hammatt, @AMCN41R, @sebader, @vladislav-mitev thanks for the patient and this issue is fixed in PR(1001)via AMQP redesign. We're under long haul testing in a stress platform. It will be released once the test are completed.

vladislav-mitev · 2019-08-14T08:20:51Z

@davilu, thanks a lot for your contributions. This issue has been frustrating.

However, you only refer to the AMQP transport being fixed. What about MQTT? We are mostly using MQTT and still suffer from this very problem. I suppose that both implementations are similar.
Could your fixes and tests be applied towards MQTT transport, as well?

Also, do you have any rough idea as to when we could expect this to be released?

prmathur-microsoft · 2019-09-03T23:58:35Z

We are very close to releasing the new SDK with the fixes. Tentative date is Sept 15th. However that would only contain AMQP fixes.

petermillsacumentive · 2019-09-04T08:03:21Z

We're fairly sure we're getting a similar problem when UK or US IoT hubs are bouncing too. We have some code to try tear down the connection if has not recovered in a certain amount of time. Eagerly awaiting this fix all the same though!

prmathur-microsoft · 2019-09-25T21:54:35Z

This issue is now fixed and should be a part of latest release v1.21.1

az-iot-builder-01 · 2019-09-25T21:54:52Z

@AMCN41R, @sebader, @prmathur-microsoft, @vladislav-mitev, @Hammatt, @davilu, @petermillsacumentive, thank you for your contribution to our open-sourced project! Please help us improve by filling out this 2-minute customer satisfaction survey

vladislav-mitev · 2019-09-26T06:36:12Z

Thanks for the release, great job.

One note, however. The description of the issue does not contain any protocol specifics and as far as this issue is happening for MQTT transport as well, shouldn't this issue be used for that fix also? (Or should we report another one?)

prmathur-microsoft · 2019-10-24T19:39:53Z

Can you please open another one to track mqtt?

varunpuranik mentioned this issue May 24, 2019

Shutdown EdgeAgent if ModuleClient gets stuck in ObjectDisposed loop Azure/iotedge#1254

Merged

prmathur-microsoft added the investigation-required Requires further investigation to root cause this. label Jun 18, 2019

prmathur-microsoft assigned davilu Aug 2, 2019

prmathur-microsoft added bug Something isn't working. and removed investigation-required Requires further investigation to root cause this. labels Aug 2, 2019

vladislav-mitev mentioned this issue Aug 14, 2019

edgeAgent breaks with "System.ObjectDisposedException: Cannot access a disposed object." Azure/iotedge#1336

Closed

prmathur-microsoft closed this as completed Sep 25, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DeviceClient/ModuleClient goes into a bad state and throws ObjectDisposedExceptions on subsequent operations #953

DeviceClient/ModuleClient goes into a bad state and throws ObjectDisposedExceptions on subsequent operations #953

varunpuranik commented May 22, 2019 •

edited

Loading

AMCN41R commented May 23, 2019

sebader commented May 23, 2019

prmathur-microsoft commented Jun 18, 2019

vladislav-mitev commented Jul 8, 2019 •

edited

Loading

Hammatt commented Jul 23, 2019

davilu commented Aug 5, 2019

vladislav-mitev commented Aug 14, 2019

prmathur-microsoft commented Sep 3, 2019 •

edited

Loading

petermillsacumentive commented Sep 4, 2019 •

edited

Loading

prmathur-microsoft commented Sep 25, 2019

az-iot-builder-01 commented Sep 25, 2019

vladislav-mitev commented Sep 26, 2019

prmathur-microsoft commented Oct 24, 2019

DeviceClient/ModuleClient goes into a bad state and throws ObjectDisposedExceptions on subsequent operations #953

DeviceClient/ModuleClient goes into a bad state and throws ObjectDisposedExceptions on subsequent operations #953

Comments

varunpuranik commented May 22, 2019 • edited Loading

Description of the issue:

AMCN41R commented May 23, 2019

sebader commented May 23, 2019

prmathur-microsoft commented Jun 18, 2019

vladislav-mitev commented Jul 8, 2019 • edited Loading

Hammatt commented Jul 23, 2019

davilu commented Aug 5, 2019

vladislav-mitev commented Aug 14, 2019

prmathur-microsoft commented Sep 3, 2019 • edited Loading

petermillsacumentive commented Sep 4, 2019 • edited Loading

prmathur-microsoft commented Sep 25, 2019

az-iot-builder-01 commented Sep 25, 2019

vladislav-mitev commented Sep 26, 2019

prmathur-microsoft commented Oct 24, 2019

varunpuranik commented May 22, 2019 •

edited

Loading

vladislav-mitev commented Jul 8, 2019 •

edited

Loading

prmathur-microsoft commented Sep 3, 2019 •

edited

Loading

petermillsacumentive commented Sep 4, 2019 •

edited

Loading