Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Module gets disconnected after some time #19

Closed
lucianoctorres opened this issue Jul 13, 2018 · 29 comments
Closed

Module gets disconnected after some time #19

lucianoctorres opened this issue Jul 13, 2018 · 29 comments
Assignees
Labels
area:connectivity Anything related to connectivity, networking, and related. bug Something isn't working customer-reported iotedge no-issue-activity

Comments

@lucianoctorres
Copy link

I have several modules running on an Edge device running on Linux (Edge GA)

After some time, one of the modules is no longer able to send messages to IoT Hub. Even after restarting iotedge it is not able to recover. Sometimes (not always) deleting all docker containers and images the module recovers and starts sending messages again.

On edgeHub, we have the attached log when we restart iotedge and it it is not able to recover.
The module AutomationPublisher is the one which generates the data, and EdgePublisher is the one that sends it to IoT Hub.

log.edgepublisherproblem.txt

@varunpuranik
Copy link
Contributor

Hi jabaloo,

I looked at the logs - it seems like the module EdgePublisher was not connected to the EdgeHub for a long time (for some reason), but eventually it did connect to the EdgeHub and the EdgeHub was able to send messages to IoTHub (look at the last few lines in the log you attached). Is that not what you are seeing?
Note that sometimes it takes a little while for the messages to show up in the EventHub associated with the IoTHub, but they should show up.

@lucianoctorres
Copy link
Author

lucianoctorres commented Jul 17, 2018

Hi Varunpuranik

In fact after this happens, the module is no longer able to send any messages.

In the module log, I get the lines (taken from another occasion when this problem happened):

Error: Time:Fri Jun 15 05:47:54 2018 File:/usr/sdk/src/c/umqtt/src/mqtt_message.c Func:mqttmessage_create Line:102 Failure allocating message value of 4294954750z
Error: Time:Fri Jun 15 05:47:54 2018 File:/usr/sdk/src/c/umqtt/src/mqtt_client.c Func:recvCompleteCallback Line:674 failure in mqttmessage_create
Error: Time:Fri Jun 15 05:47:54 2018 File:/usr/sdk/src/c/iothub_client/src/iothubtransport_mqtt_common.c Func:mqtt_error_callback Line:1569 INTERNAL ERROR: unexpected error value received MQTT_CLIENT_MEMORY_ERROR
Error: Time:Fri Jun 15 05:47:54 2018 File:/usr/sdk/src/c/c-utility/src/tlsio_openssl.c Func:decode_ssl_received_bytes Line:788 SSL channel closed in decode_ssl_received_bytes.
Error: Time:Fri Jun 15 05:47:54 2018 File:/usr/sdk/src/c/c-utility/src/tlsio_openssl.c Func:on_underlying_io_bytes_received Line:837 Error in decode_ssl_received_bytes.```

This has been happening in our production environment, every 3 or 4 days. I noticed that I sometimes can reproduce it locally by cutting network access -- as it also happens in the field due to our unstable 3G link.

@lucianoctorres
Copy link
Author

It happened again last night. I am not sure if it's the same problem but the effect is the same.
The order of events is the following:

the first problem occurs when the token needs to be renewed (log of edgeHub):

2018-07-17 22:39:39.003 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudConnection] - Getting new token for HumsFactory/edgePublisher.
2018-07-17 22:39:39.003 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudConnection] - Token Expired. Id:HumsFactory/edgePublisher, CurrentToken: SharedAccessSignature sr=bobstiothubpilot.azure-devices.net%2fdevices%2fHumsFactory%2fmodules%2fedgePublisher&sig=icrS70aG6gZHcn3fLcSx7IBHyBeHubJPrtWwQL%2bZzDk%3d&se=1531867299.
2018-07-17 22:39:39.003 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudConnection] - Existing token not found for HumsFactory/edgePublisher. Getting new token from the client...
2018-07-17 22:39:39.003 +00:00 [WRN] [ProtocolGateway] - Closing connection for device: HumsFactory/edgePublisher, Microsoft.Azure.Devices.Edge.Hub.Core.EdgeHubConnectionException: Connection closed for device HumsFactory/edgePublisher.,
2018-07-17 22:39:39.003 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Mqtt.MessagingServiceClient] - Disposing MessagingServiceClient for device Id HumsFactory/edgePublisher because of exception - Microsoft.Azure.Devices.Edge.Hub.Core.EdgeHubConnectionException: Connection closed for device HumsFactory/edgePublisher.
2018-07-17 22:39:39.004 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Mqtt.DeviceProxy] - Closing device proxy for device Id HumsFactory/edgePublisher
2018-07-17 22:39:39.004 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Core.ConnectionManager] - Device connection removed for device HumsFactory/edgePublisher
2018-07-17 22:39:39.004 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.EdgeHubConnection] - Updating device HumsFactory/edgePublisher connection status to Disconnected

Some time later, it seems to reconnect the module

2018-07-17 22:39:42.676 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.Routing.ModuleEndpoint] - Sending 1 message(s) to module HumsFactory/edgePublisher.
2018-07-17 22:39:42.676 +00:00 [ERR] [Microsoft.Azure.Devices.Edge.Hub.Core.Routing.ModuleEndpoint] - Module HumsFactory/edgePublisher is not connected
2018-07-17 22:39:42.699 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudConnection] - Obtained new token for client HumsFactory/edgePublisher that expires in 00:59:56.3005930
2018-07-17 22:39:42.699 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudConnection] - Updated cloud connection for client HumsFactory/edgePublisher
2018-07-17 22:39:42.699 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Core.ConnectionManager] - New cloud connection created for device HumsFactory/edgePublisher
2018-07-17 22:39:42.699 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.CloudProxy.DeviceConnectivityManager] - IotHub call succeeded
2018-07-17 22:39:42.699 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.CloudProxy.ConnectivityAwareClient] - Operation OpenAsync succeeded
2018-07-17 22:39:42.699 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.CloudProxy.TokenCredentialsAuthenticator] - Authenticated HumsFactory/edgePublisher with IotHub
2018-07-17 22:39:42.699 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Mqtt.DeviceIdentityProvider] - Successfully generated identity for clientId HumsFactory/edgePublisher and username bobstiothubpilot.azure-devices.net/HumsFactory/edgePublisher/?api-version=2017-11-08-preview&DeviceClientType=iothubclient%2f1.2.5%20(native%3b%20Linux%3b%20x86_64)
2018-07-17 22:39:42.700 +00:00 [INF] [ProtocolGateway] - ClientAuthenticated, HumsFactory/edgePublisher, 07955594
2018-07-17 22:39:42.700 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.ConnectionManager] - Obtained cloud connection for device HumsFactory/edgePublisher
2018-07-17 22:39:42.700 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudProxy] - Binding cloud listener for device HumsFactory/edgePublisher
2018-07-17 22:39:42.700 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Core.ConnectionManager] - New device connection for device HumsFactory/edgePublisher
2018-07-17 22:39:42.700 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.EdgeHubConnection] - Updating device HumsFactory/edgePublisher connection status to Connected
2018-07-17 22:39:42.700 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Core.Device.DeviceMessageHandler] - Bind device proxy for device HumsFactory/edgePublisher
2018-07-17 22:39:42.700 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Mqtt.MessagingServiceClient] - Binding message channel for device Id HumsFactory/edgePublisher

But shortly after it gets a SSL exception

2018-07-17 22:39:47.245 +00:00 [WRN] [ProtocolGateway] - Closing connection for device: HumsFactory/edgePublisher, scope: ExceptionCaught, DotNetty.Codecs.DecoderException: Exception of type 'DotNetty.Codecs.DecoderException' was thrown. ---> System.AggregateException: One or more errors occurred. (The decryption operation failed, see inner exception.) ---> System.IO.IOException: The decryption operation failed, see inner exception. ---> Interop+OpenSsl+SslException: Decrypt failed with OpenSSL error - SSL_ERROR_SSL. ---> Interop+Crypto+OpenSslCryptographicException: error:140943FC:SSL routines:ssl3_read_bytes:sslv3 alert bad record mac
2018-07-17 22:39:47.246 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Mqtt.MessagingServiceClient] - Disposing MessagingServiceClient for device Id HumsFactory/edgePublisher because of exception - DotNetty.Codecs.DecoderException: Exception of type 'DotNetty.Codecs.DecoderException' was thrown. ---> System.AggregateException: One or more errors occurred. (The decryption operation failed, see inner exception.) ---> System.IO.IOException: The decryption operation failed, see inner exception. ---> Interop+OpenSsl+SslException: Decrypt failed with OpenSSL error - SSL_ERROR_SSL. ---> Interop+Crypto+OpenSslCryptographicException: error:140943FC:SSL routines:ssl3_read_bytes:sslv3 alert bad record mac
2018-07-17 22:39:47.246 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Mqtt.DeviceProxy] - Setting device proxy inactive for device Id HumsFactory/edgePublisher
2018-07-17 22:39:47.246 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudProxy] - Closing receiver for device HumsFactory/edgePublisher
2018-07-17 22:39:47.247 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.CloudProxy.ConnectivityAwareClient] - Received connection status changed callback with connection status Disabled and reason Client_Close
2018-07-17 22:39:47.247 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudProxy] - Closed cloud proxy for device HumsFactory/edgePublisher
2018-07-17 22:39:47.247 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Core.ConnectionManager] - Device connection removed for device HumsFactory/edgePublisher
2018-07-17 22:39:47.247 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.EdgeHubConnection] - Updating device HumsFactory/edgePublisher connection status to Disconnected

After that I get messages that the module is disconnected

2018-07-17 22:39:48.532 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.Routing.ModuleEndpoint] - Sending 1 message(s) to module HumsFactory/edgePublisher.
2018-07-17 22:39:48.532 +00:00 [ERR] [Microsoft.Azure.Devices.Edge.Hub.Core.Routing.ModuleEndpoint] - Module HumsFactory/edgePublisher is not connected

I got one more time the message that the module has connected, but then I get another error

2018-07-17 22:39:54.237 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.Device.DeviceMessageHandler] - Sent message with correlation ID 6e09baa9-1315-4236-8142-0c88bc16ffcc to HumsFactory/edgePublisher
2018-07-17 22:39:54.239 +00:00 [WRN] [DotNetty.Transport.Channels.ChannelOutboundBuffer] - Failed to mark a promise as failure because it's done already: TaskCompletionSource[status: Faulted]
2018-07-17 22:39:54.240 +00:00 [WRN] [DotNetty.Transport.Channels.ChannelOutboundBuffer] - Failed to mark a promise as failure because it's done already: TaskCompletionSource[status: Faulted]
2018-07-17 22:39:54.241 +00:00 [WRN] [DotNetty.Transport.Channels.ChannelOutboundBuffer] - Failed to mark a promise as failure because it's done already: TaskCompletionSource[status: Faulted]
2018-07-17 22:39:54.241 +00:00 [WRN] [DotNetty.Transport.Channels.ChannelOutboundBuffer] - Failed to mark a promise as failure because it's done already: TaskCompletionSource[status: Faulted]
2018-07-17 22:39:54.245 +00:00 [WRN] [DotNetty.Common.Utilities.ReferenceCountUtil] - Failed to release a message: PublishPacket[Type=PUBLISH, QualityOfService=AtMostOnce, Duplicate=False, Retain=False]
2018-07-17 22:39:54.247 +00:00 [WRN] [ProtocolGateway] - Closing connection for device: HumsFactory/edgePublisher, scope: <- PUBLISH, System.AggregateException: One or more errors occurred. (One or more errors occurred. (Broken pipe)) ---> System.AggregateException: One or more errors occurred. (Broken pipe) ---> System.Net.Sockets.SocketException: Broken pipe
2018-07-17 22:39:54.246 +00:00 [WRN] [DotNetty.Common.Utilities.ReferenceCountUtil] - Failed to release a message: PublishPacket[Type=PUBLISH, QualityOfService=AtMostOnce, Duplicate=False, Retain=False]
2018-07-17 22:39:54.248 +00:00 [WRN] [DotNetty.Transport.Channels.ChannelOutboundBuffer] - Failed to mark a promise as failure because it's done already: TaskCompletionSource[status: Faulted]
2018-07-17 22:39:54.249 +00:00 [WRN] [DotNetty.Transport.Channels.ChannelOutboundBuffer] - Failed to mark a promise as failure because it's done already: TaskCompletionSource[status: Faulted]
2018-07-17 22:39:54.250 +00:00 [WRN] [DotNetty.Transport.Channels.ChannelOutboundBuffer] - Failed to mark a promise as failure because it's done already: TaskCompletionSource[status: Faulted]
2018-07-17 22:39:54.250 +00:00 [WRN] [DotNetty.Transport.Channels.ChannelOutboundBuffer] - Failed to mark a promise as failure because it's done already: TaskCompletionSource[status: Faulted]
2018-07-17 22:39:54.252 +00:00 [WRN] [DotNetty.Common.Utilities.ReferenceCountUtil] - Failed to release a message: PublishPacket[Type=PUBLISH, QualityOfService=AtMostOnce, Duplicate=False, Retain=False]
2018-07-17 22:39:54.253 +00:00 [WRN] [DotNetty.Transport.Channels.ChannelOutboundBuffer] - Failed to mark a promise as failure because it's done already: TaskCompletionSource[status: Faulted]
2018-07-17 22:39:54.253 +00:00 [WRN] [DotNetty.Transport.Channels.ChannelOutboundBuffer] - Failed to mark a promise as failure because it's done already: TaskCompletionSource[status: Faulted]
2018-07-17 22:39:54.251 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Mqtt.MessagingServiceClient] - Disposing MessagingServiceClient for device Id HumsFactory/edgePublisher because of exception - System.AggregateException: One or more errors occurred. (One or more errors occurred. (Broken pipe)) ---> System.AggregateException: One or more errors occurred. (Broken pipe) ---> System.Net.Sockets.SocketException: Broken pipe
2018-07-17 22:39:54.252 +00:00 [WRN] [DotNetty.Common.Utilities.ReferenceCountUtil] - Failed to release a message: PublishPacket[Type=PUBLISH, QualityOfService=AtMostOnce, Duplicate=False, Retain=False]
2018-07-17 22:39:54.267 +00:00 [WRN] [DotNetty.Common.Utilities.ReferenceCountUtil] - Failed to release a message: PublishPacket[Type=PUBLISH, QualityOfService=AtMostOnce, Duplicate=False, Retain=False]
2018-07-17 22:39:54.270 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Mqtt.DeviceProxy] - Setting device proxy inactive for device Id HumsFactory/edgePublisher
2018-07-17 22:39:54.270 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudProxy] - Closing receiver for device HumsFactory/edgePublisher
2018-07-17 22:39:54.271 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.CloudProxy.ConnectivityAwareClient] - Received connection status changed callback with connection status Disabled and reason Client_Close

At this stage, the module stops sending messages and never recovers, throwing this message (module log):

Error: Time:Tue Jul 17 22:39:54 2018 File:/usr/sdk/src/c/umqtt/src/mqtt_message.c Func:mqttmessage_create Line:102 Failure allocating message value of 4294941684z
Error: Time:Tue Jul 17 22:39:54 2018 File:/usr/sdk/src/c/umqtt/src/mqtt_client.c Func:recvCompleteCallback Line:680 failure in mqttmessage_create
Error: Time:Tue Jul 17 22:39:54 2018 File:/usr/sdk/src/c/iothub_client/src/iothubtransport_mqtt_common.c Func:mqtt_error_callback Line:1682 INTERNAL ERROR: unexpected error value received MQTT_CLIENT_MEMORY_ERROR
Error: Time:Tue Jul 17 22:39:54 2018 File:/usr/sdk/src/c/c-utility/adapters/tlsio_openssl.c Func:decode_ssl_received_bytes Line:785 SSL channel closed in decode_ssl_received_bytes.
Error: Time:Tue Jul 17 22:39:54 2018 File:/usr/sdk/src/c/c-utility/adapters/tlsio_openssl.c Func:on_underlying_io_bytes_received Line:834 Error in decode_ssl_received_bytes.
Failed to decode message : 'utf8' codec can't decode byte 0x8e in position 9569: invalid start byte

Some time later, I get another error message at edgeHub:

2018-07-17 22:40:01.176 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.Storage.MessageStore] - Getting next batch for endpoint HumsFactory/edgePublisher/input starting from 25980 with batch size 10.
2018-07-17 22:40:01.177 +00:00 [WRN] [ProtocolGateway] - Closing connection for device: HumsFactory/edgePublisher, Microsoft.Azure.Devices.ProtocolGateway.ProtocolGatewayException: Packet of unsupported type was observed: PubRelPacket[Type=PUBREL, QualityOfService=AtLeastOnce, Duplicate=False, Retain=False], channel id: 26436f2e, identity: HumsFactory/edgePublisher,
2018-07-17 22:40:01.192 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Mqtt.MessagingServiceClient] - Disposing MessagingServiceClient for device Id HumsFactory/edgePublisher because of exception - Microsoft.Azure.Devices.ProtocolGateway.ProtocolGatewayException: Packet of unsupported type was observed: PubRelPacket[Type=PUBREL, QualityOfService=AtLeastOnce, Duplicate=False, Retain=False], channel id: 26436f2e, identity: HumsFactory/edgePublisher
2018-07-17 22:40:01.192 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Mqtt.DeviceProxy] - Setting device proxy inactive for device Id HumsFactory/edgePublisher
2018-07-17 22:40:01.192 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudProxy] - Closing receiver for device HumsFactory/edgePublisher
2018-07-17 22:40:01.193 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.CloudProxy.ConnectivityAwareClient] - Received connection status changed callback with connection status Disabled and reason Client_Close
2018-07-17 22:40:01.193 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudProxy] - Closed cloud proxy for device HumsFactory/edgePublisher
2018-07-17 22:40:01.193 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Hub.Core.ConnectionManager] - Device connection removed for device HumsFactory/edgePublisher
2018-07-17 22:40:01.193 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Hub.Core.EdgeHubConnection] - Updating device HumsFactory/edgePublisher connection status to Disconnected

Those error messages keeping coming at the log and the module never recovers..

Cheers

@varunpuranik
Copy link
Contributor

Thanks for all the details, @Jabaloo ! I will look at all the logs and see if I can get to the bottom of the issue.

Meanwhile, can you try switching your module to AMQP (instead of MQTT). Basically when you create the ModuleClient, pass in AMQP as the transport type.
From what we have seen, AMQP seems more stable for long running modules, mainly because the token update happens over the CBS link.
Let me know if you run into any problems with that.

@jason-e-gross
Copy link

jason-e-gross commented Jul 18, 2018 via email

@lucianoctorres
Copy link
Author

Thanks @varunpuranik
I can't switch to AMQP as my modules have inputs

IoT Hub Client for Python
Error: Time:Thu Jul 19 06:50:30 2018 File:/usr/sdk/src/c/iothub_client/src/iothubtransportamqp.c Func:IotHubTransportAMQP_Subscribe_InputQueue Line:163 AMQP does not support input queues
Error: Time:Thu Jul 19 06:50:30 2018 File:/usr/sdk/src/c/iothub_client/src/iothub_client_core_ll.c ```
Func:IoTHubClientCore_LL_SetInputMessageCallbackImpl Line:2574 IoTHubTransport_Subscribe_InputQueue failed
Unexpected error IoTHubClient.set_input_message_callback, IoTHubClientResult.ERROR from IoTHub

I have modules that generate the data by polling the local machine, and modules that do processing on the data before sending it to Azure

thanks

@lucianoctorres
Copy link
Author

For tracking, it seems to be the same problem: Azure/azure-iot-sdk-csharp#558

@varunpuranik
Copy link
Contributor

@lucianoctorres There are a couple of issues affecting the MQTT connections right now, and both are related to the C# SDK -
Azure/azure-iot-sdk-csharp#521
Azure/azure-iot-sdk-csharp#558
We have a fix for one, and are actively looking into the other (I believe both issues are related). I will update the thread as soon as we have a fix on it.

@myagley
Copy link
Contributor

myagley commented Aug 10, 2018

For tracking, we believe the underlying issue has been fixed in dotnetty here: Azure/DotNetty#413

We need to cut a release of DotNetty and get the SDK updated.

@MaxKhlupnov
Copy link

MaxKhlupnov commented Aug 17, 2018

Hello, @myagley ! Thanks for applying this fix!
Could you please drop short guidance how to apply it?
I get master azure-iot-sdk-csharp and recompile for reference in my module coe, but it still depends on DotNetty 0.4.8.
I guess 0.4.9 should be referenced to apply fix.
Issue is very painful so we can't wait for release and have to resolve this ASAP in our project.

@myagley
Copy link
Contributor

myagley commented Aug 17, 2018

We are working on getting a new C# SDK release out today. I think the only way to address in the meantime is to use the latest DotNetty release and build upgrade the C# SDK and build as part of your project. New C# SDK should be out in a couple of hours.

@myagley
Copy link
Contributor

myagley commented Aug 18, 2018

@MaxKhlupnov Version 1.18.0 of the C# SDK is now out on nuget.org. This contains the new version of DotNetty that includes this fix.

@MaxKhlupnov
Copy link

@myagley thanks for prompt help! Appreciated!

@lucianoctorres
Copy link
Author

Few comments: Original issue uses Python SDK and not C#. And I tested with release 1.0.1 of Edge Hub, problem is still there. Is it because 1.18.0 is not yet integrated in Edge Hub?

@varunpuranik
Copy link
Contributor

I see, I was under the impression you were using the C# SDK. This is most likely a client side issue (not the EdgeHub). We have heard similar issues with the Python SDK, and are looking into it.
@pierreca

@myagley
Copy link
Contributor

myagley commented Aug 22, 2018

@lucianoctorres the MQTT connection issues were purely client side, not in the Edge Hub. Not integrating 1.18.0 into the Edge Hub shouldn't have any effect for this specific issue. Like @varunpuranik mentions we should look at the Python SDK.

@myagley
Copy link
Contributor

myagley commented Sep 24, 2018

Version 1.0.2 of the runtime was just released. In addition to bumping the DotNetty/Device SDK dependencies in the Edge Hub to the latest versions, this release also includes updates to allow operation when disconnected. You can read more about this here:

The changes required for offline operation improve some of the edge cases for when the Edge Hub is able to drain a module's/downstream device's message queue. The Edge Hub can now send data upstream using its credentials, on behalf of the module. This means it always has a valid set up credentials to connect to Edge Hub. This brings more stability.

@baharw
Copy link

baharw commented Nov 21, 2018

I appear to be having that same problem, running on RPi arm32. The module starts fine and then after a few hours disconnects.

Is there a know fix to this issue?

See log below...
2018-11-21 09:48:45.076 +00:00 [INF] - Reauthenticating connected clients
2018-11-21 09:53:45.078 +00:00 [INF] - Reauthenticating connected clients
2018-11-21 09:57:13.606 +00:00 [INF] - Started task to cleanup processed and stale messages for endpoint iothub
2018-11-21 09:57:14.403 +00:00 [INF] - Cleaned up 122 messages from queue for endpoint iothub and 122 messages from message store.
2018-11-21 09:58:45.075 +00:00 [INF] - Reauthenticating connected clients
2018-11-21 10:03:45.078 +00:00 [INF] - Reauthenticating connected clients
2018-11-21 10:08:45.077 +00:00 [INF] - Reauthenticating connected clients
2018-11-21 10:13:45.067 +00:00 [INF] - Reauthenticating connected clients
2018-11-21 10:18:45.066 +00:00 [INF] - Reauthenticating connected clients
2018-11-21 10:23:45.066 +00:00 [INF] - Reauthenticating connected clients
2018-11-21 10:25:21.207 +00:00 [INF] - Closing connection for device: PreEdge-0001-PLV/PythonModule, ,
2018-11-21 10:25:21.207 +00:00 [INF] - Disposing MessagingServiceClient for device Id PreEdge-0001-PLV/PythonModule because of exception -
2018-11-21 10:25:21.207 +00:00 [INF] - Setting device proxy inactive for device Id PreEdge-0001-PLV/PythonModule
2018-11-21 10:25:21.208 +00:00 [INF] - Device connection removed for device PreEdge-0001-PLV/PythonModule
2018-11-21 10:25:21.311 +00:00 [INF] - Remove device connection for device PreEdge-0001-PLV/PythonModule
2018-11-21 10:25:21.393 +00:00 [INF] - Client PreEdge-0001-PLV/PythonModule in device scope authenticated locally.
2018-11-21 10:25:21.393 +00:00 [INF] - Successfully generated identity for clientId PreEdge-0001-PLV/PythonModule and username bwIoTPlayHub.azure-devices.net/PreEdge-0001-PLV/PythonModule/?api-version=2017-11-08-preview&DeviceClientType=iothubclient%2f1.2.9%20(native%3b%20Linux%3b%20armv7l)
2018-11-21 10:25:21.394 +00:00 [INF] - ClientAuthenticated, PreEdge-0001-PLV/PythonModule, 31c9817a
2018-11-21 10:25:21.396 +00:00 [INF] - New device connection for device PreEdge-0001-PLV/PythonModule
2018-11-21 10:25:21.397 +00:00 [INF] - Bind device proxy for device PreEdge-0001-PLV/PythonModule
2018-11-21 10:25:21.397 +00:00 [INF] - Binding message channel for device Id PreEdge-0001-PLV/PythonModule
2018-11-21 10:25:21.399 +00:00 [INF] - Processing subscription TwinResponse for client PreEdge-0001-PLV/PythonModule.
2018-11-21 10:25:21.400 +00:00 [INF] - Processing subscription ModuleMessages for client PreEdge-0001-PLV/PythonModule.
2018-11-21 10:25:21.400 +00:00 [INF] - Processing subscription DesiredPropertyUpdates for client PreEdge-0001-PLV/PythonModule.
2018-11-21 10:25:21.401 +00:00 [INF] - Set subscriptions from session state for PreEdge-0001-PLV/PythonModule
2018-11-21 10:27:44.409 +00:00 [INF] - Started task to cleanup processed and stale messages for endpoint iothub
2018-11-21 10:27:45.072 +00:00 [INF] - Cleaned up 122 messages from queue for endpoint iothub and 122 messages from message store.
2018-11-21 10:28:45.071 +00:00 [INF] - Reauthenticating connected clients
2018-11-21 10:33:45.069 +00:00 [INF] - Reauthenticating connected clients
2018-11-21 10:38:45.069 +00:00 [INF] - Reauthenticating connected clients
2018-11-21 10:43:45.069 +00:00 [INF] - Reauthenticating connected clients
2018-11-21 10:48:27.936 +00:00 [INF] - Starting compaction of stores
2018-11-21 10:48:27.936 +00:00 [INF] - Starting compaction of store default
2018-11-21 10:48:27.937 +00:00 [INF] - Starting compaction of store sessions
2018-11-21 10:48:27.985 +00:00 [INF] - Starting compaction of store checkpoints
2018-11-21 10:48:28.021 +00:00 [INF] - Starting compaction of store twins
2018-11-21 10:48:28.044 +00:00 [INF] - Starting compaction of store messages
2018-11-21 10:48:28.081 +00:00 [INF] - Starting compaction of store DeviceScopeCache
2018-11-21 10:48:28.116 +00:00 [INF] - Starting compaction of store iothub
2018-11-21 10:48:43.984 +00:00 [INF] - Starting refresh of device scope identities cache
2018-11-21 10:48:45.075 +00:00 [INF] - Reauthenticating connected clients
2018-11-21 10:53:45.075 +00:00 [INF] - Reauthenticating connected clients
2018-11-21 10:58:15.075 +00:00 [INF] - Started task to cleanup processed and stale messages for endpoint iothub
2018-11-21 10:58:15.838 +00:00 [INF] - Cleaned up 122 messages from queue for endpoint iothub and 122 messages from message store.
2018-11-21 10:58:45.075 +00:00 [INF] - Reauthenticating connected clients
2018-11-21 11:12:22.789 +00:00 [WRN] - Closing connection for device: PreEdge-0001-PLV/PythonModule, Microsoft.Azure.Devices.ProtocolGateway.ProtocolGatewayException: Keep Alive timed out.,
2018-11-21 11:12:22.789 +00:00 [INF] - Disposing MessagingServiceClient for device Id PreEdge-0001-PLV/PythonModule because of exception - Microsoft.Azure.Devices.ProtocolGateway.ProtocolGatewayException: Keep Alive timed out.
2018-11-21 11:12:22.789 +00:00 [INF] - Setting device proxy inactive for device Id PreEdge-0001-PLV/PythonModule
2018-11-21 11:12:22.789 +00:00 [INF] - Device connection removed for device PreEdge-0001-PLV/PythonModule
2018-11-21 11:12:22.791 +00:00 [INF] - Remove device connection for device PreEdge-0001-PLV/PythonModule
pi@raspberrypi:~ $

@baharw
Copy link

baharw commented Nov 21, 2018

And one more thing, the custom module fails with the following message...

Sending message: {'temperature': 33.46,'humidity': 75.37, 'Timestamp': November 21, 2018 - 01:35:10PM}
Sending message: {'temperature': 30.51,'humidity': 70.45, 'Timestamp': November 21, 2018 - 01:35:25PM}
Error: Time:Wed Nov 21 13:35:31 2018 File:/usr/sdk/src/c/iothub_client/src/iothubtransport_mqtt_common.c Func:InitializeConnection Line:2195 mqtt_client timed out waiting for CONNACK

@baharw
Copy link

baharw commented Nov 28, 2018

Any updates?

@Venug1234
Copy link

I see, I was under the impression you were using the C# SDK. This is most likely a client side issue (not the EdgeHub). We have heard similar issues with the Python SDK, and are looking into it.
@pierreca

Is this resolved in azure C SDK. Because the latest C sdk still giving me error like " AMQP doesn't support input queues" error.

@myagley
Copy link
Contributor

myagley commented Feb 14, 2019

The ARM32 issue mentioned a couple of comments above was fixed in 1.0.6 of the runtime. This was caused by a deadlock in the dotnet core runtime on ARM32 and would affect any client library.

@gauravagarwal28
Copy link

Hey, I am facing this issue too.
I have IoT Edge 1.0.6.x runtime on my machine with AMD64 Ubuntu 16.04

gaurav@gaurav-VirtualBox:~$ iotedge --version
iotedge 1.0.6.1 (3fa6cbef8b7fc3c55a49a622735eb1021b8a5963)

All communications between the device and cloud (both C2D and D2C) halt after some hours.

@gauravagarwal28
Copy link

Everything starts working again normally when I restart the iotedge service:

sudo systemctl restart iotedge

@tomagb
Copy link

tomagb commented Aug 29, 2019

I still see this behavior on both development IoT Edges as well in production. Is there anything new regarding it?
I'm running 1.0.8 on Ubuntu Server 18.04.

@github-actions
Copy link

github-actions bot commented Nov 6, 2019

This issue is being marked as stale because it has been open for 30 days with no activity.

@lt72 lt72 added the bug Something isn't working label Nov 19, 2019
@lt72 lt72 added the area:connectivity Anything related to connectivity, networking, and related. label Nov 19, 2019
@lt72
Copy link

lt72 commented Nov 19, 2019

@varunpuranik was this fixed for 1.0.8?

@varunpuranik
Copy link
Contributor

This should be fixed with 1.0.8.4. Please re-open the issue if you are still the issue with 1.0.8.4 and provide the complete EdgeHub debug logs.

@alaendle
Copy link

alaendle commented Mar 6, 2020

O.k. but when will an update hit https://packages.microsoft.com/ubuntu/16.04/multiarch/prod? As far as I can see this release is still 1.0.8(.0)?
And should the bug still be present in iotedge 1.0.9~rc5 using agent/hub images 1.0.9-rc6?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:connectivity Anything related to connectivity, networking, and related. bug Something isn't working customer-reported iotedge no-issue-activity
Projects
None yet
Development

No branches or pull requests