Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DeviceClient/ModuleClient goes into a bad state and throws ObjectDisposedExceptions on subsequent operations #953

Closed
varunpuranik opened this issue May 22, 2019 · 13 comments
Assignees
Labels
bug Something isn't working.

Comments

@varunpuranik
Copy link
Contributor

varunpuranik commented May 22, 2019

Description of the issue:

It seems like the DeviceClient/ModuleClient sometimes goes into a bad state and disposes the underlying transport. This causes subsequent operations to throw an OperationDisposedException, even through CloseAsync or Dispose is never called on the object.

On further investigation, it seems like the Reconnect logic in RetryingDelegatingHandler.HandleDisconnect throws, which causes the transport to get disposed here (line 591) -

catch (Exception ex)
{
if (Logging.IsEnabled) Logging.Error(this, ex.ToString(), nameof(HandleDisconnect));
var hubException = ex as IotHubException;
if (hubException != null) HandleConnectionStatusExceptions(hubException);
// We were not able to recover the connection or subscriptions within the configured policy.
// The object will be placed in an unusable state.
Dispose(true);
}

In our application, we get the 2 callbacks triggered from this method -

  1. _onConnectionStatusChanged(ConnectionStatus.Disconnected_Retrying, ConnectionStatusChangeReason.Communication_Error);
  2. _onConnectionStatusChanged(ConnectionStatus.Disconnected, status);

Expected behavior would be -

  • Ideally the transport does not get disposed, or recovers eventually
  • If that is not possible, then the actual underlying exception bubbles up to the application code, rather than ObjectDisposedException.
@AMCN41R
Copy link

AMCN41R commented May 23, 2019

I experienced this issue yesterday morning (around 08:00 UTC). There seemed to be a service outage in North Europe which prevented the edgeHub module on my devices communicating with the IoT Hub. As a result, the following exception was thrown repeatedly...

2019-05-22 09:20:12.390 +00:00 [INF] - Reauthenticating connected clients
2019-05-22 09:21:05.910 +00:00 [WRN] - Non retryable exception occurred while sending message.
System.ObjectDisposedException: Cannot access a disposed object.
Object name: 'IoT Client'.
   at Microsoft.Azure.Devices.Client.Transport.DefaultDelegatingHandler.ThrowIfDisposed()
   at Microsoft.Azure.Devices.Client.Transport.DefaultDelegatingHandler.OpenAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.<>c__DisplayClass32_0.<<OpenAsyncInternal>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.EnsureOpenedAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.<>c__DisplayClass14_0.<<SendEventAsync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.SendEventAsync(Message message, CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.InternalClient.SendEventAsync(Message message)
   at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.ConnectivityAwareClient.<>c__DisplayClass28_0.<<InvokeFunc>b__0>d.MoveNext() in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/ConnectivityAwareClient.cs:line 174
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.ConnectivityAwareClient.InvokeFunc[T](Func`1 func, String operation, Boolean useForConnectivityCheck) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/ConnectivityAwareClient.cs:line 134
   at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudProxy.SendMessageAsync(IMessage inputMessage) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/CloudProxy.cs:line 128
   at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudProxy.SendMessageAsync(IMessage inputMessage) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/CloudProxy.cs:line 134
   at Microsoft.Azure.Devices.Edge.Hub.Core.Routing.CloudEndpoint.CloudMessageProcessor.<>c__DisplayClass12_0.<<ProcessAsync>b__0>d.MoveNext() in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Core/routing/CloudEndpoint.cs:line 146

When service was restored, the device did not recover and I had to manually restart the edgeHub container on the device.

@sebader
Copy link
Member

sebader commented May 23, 2019

I had the same yesterday, around 2 hours earlier, with an IoT Hub in West Europe. Thus, I would assume it was a rolling service upgrade / update, not necessarily an outage, that caused clients to reconnect.

@prmathur-microsoft
Copy link
Member

#955

@prmathur-microsoft prmathur-microsoft added the investigation-required Requires further investigation to root cause this. label Jun 18, 2019
@vladislav-mitev
Copy link

vladislav-mitev commented Jul 8, 2019

We experience the same issue both within Edge and Regular IoT Devices/Modules.
We could not identify any obvious reason for this. While our heartbeat flows, once of a sudden these exceptions start appearing and until the whole client is reset, nothing helps and this continues as forever.
Exception details below (identical for devices, modules, edge devices, edge modules):

System.ObjectDisposedException: Cannot access a disposed object.
Object name: 'IoT Client'.
   at Microsoft.Azure.Devices.Client.Transport.DefaultDelegatingHandler.ThrowIfDisposed()
   at Microsoft.Azure.Devices.Client.Transport.DefaultDelegatingHandler.OpenAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.<>c__DisplayClass32_0.<<OpenAsyncInternal>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.<EnsureOpenedAsync>d__31.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.<>c__DisplayClass14_0.<<SendEventAsync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.<SendEventAsync>d__14.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Devices.Client.InternalClient.<SendEventAsync>d__67.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

@Hammatt
Copy link

Hammatt commented Jul 23, 2019

We've been seeing this issue in production in Europe but not elsewhere:

23-07-2019 08:57:23.37 Error: System.ObjectDisposedException: Cannot access a disposed object.
Object name: 'IoT Client'.
   at Microsoft.Azure.Devices.Client.Transport.DefaultDelegatingHandler.ThrowIfDisposed()
   at Microsoft.Azure.Devices.Client.Transport.DefaultDelegatingHandler.OpenAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.<>c__DisplayClass32_0.<<OpenAsyncInternal>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.EnsureOpenedAsync(CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.<>c__DisplayClass14_0.<<SendEventAsync>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Azure.Devices.Client.Transport.RetryDelegatingHandler.SendEventAsync(Message message, CancellationToken cancellationToken)
   at Microsoft.Azure.Devices.Client.InternalClient.SendEventAsync(String outputName, Message message)

It requires a manual restart of the entire IoT Edge service via sudo systemctl restart iotedge. just restarting the effected container via a docker restart {container} does not resolve the issue.

@prmathur-microsoft prmathur-microsoft added bug Something isn't working. and removed investigation-required Requires further investigation to root cause this. labels Aug 2, 2019
@davilu
Copy link
Contributor

davilu commented Aug 5, 2019

@varunpuranik, @Hammatt, @AMCN41R, @sebader, @vladislav-mitev thanks for the patient and this issue is fixed in PR(1001)via AMQP redesign. We're under long haul testing in a stress platform. It will be released once the test are completed.

@vladislav-mitev
Copy link

@davilu, thanks a lot for your contributions. This issue has been frustrating.

However, you only refer to the AMQP transport being fixed. What about MQTT? We are mostly using MQTT and still suffer from this very problem. I suppose that both implementations are similar.
Could your fixes and tests be applied towards MQTT transport, as well?

Also, do you have any rough idea as to when we could expect this to be released?

@prmathur-microsoft
Copy link
Member

prmathur-microsoft commented Sep 3, 2019

We are very close to releasing the new SDK with the fixes. Tentative date is Sept 15th. However that would only contain AMQP fixes.

@petermillsacumentive
Copy link

petermillsacumentive commented Sep 4, 2019

We're fairly sure we're getting a similar problem when UK or US IoT hubs are bouncing too. We have some code to try tear down the connection if has not recovered in a certain amount of time. Eagerly awaiting this fix all the same though!

@prmathur-microsoft
Copy link
Member

This issue is now fixed and should be a part of latest release v1.21.1

@az-iot-builder-01
Copy link
Contributor

@AMCN41R, @sebader, @prmathur-microsoft, @vladislav-mitev, @Hammatt, @davilu, @petermillsacumentive, thank you for your contribution to our open-sourced project! Please help us improve by filling out this 2-minute customer satisfaction survey

@vladislav-mitev
Copy link

Thanks for the release, great job.

One note, however. The description of the issue does not contain any protocol specifics and as far as this issue is happening for MQTT transport as well, shouldn't this issue be used for that fix also? (Or should we report another one?)

@prmathur-microsoft
Copy link
Member

Can you please open another one to track mqtt?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working.
Projects
None yet
Development

No branches or pull requests

9 participants