Failure on Edge Hub and messages get stuck while sending messages upstream. Failure is happening in EdgeHub #715

Closed
pratson20 opened this issue Jan 15, 2019 · 9 comments


@pratson20

pratson20 commented Jan 15, 2019

Hi, I am facing an issue with EdgeHub: data being sent to IoT Hub gets stuck after 7-10 days.
My downstream device reads data from RS485 and sends it to Edge Hub using the transparent gateway pattern. An Edge module sends the data to IoT Hub. I have also enabled the Edge offline features.

This is a complete blocker for me in production. EdgeHub also does not restart after getting stuck. Can you please help me find an alternative solution?

Expected Behavior

I have configured the following:
DPS with TPM on an Aaeon gateway
Offline capability for message storage: 24 hours
Gateway hardware: 4 GB RAM and 32 GB SSD
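
A 24-hour offline window corresponds to 86400 seconds. The sketch below shows how that value might appear in the edgeHub module twin's desired properties. It assumes the standard `storeAndForwardConfiguration.timeToLiveSecs` setting was used; the actual deployment manifest for this gateway is not attached to the issue, so treat the fragment as illustrative only.

```python
import json

# 24 hours expressed in seconds, the unit assumed for timeToLiveSecs.
OFFLINE_HOURS = 24
ttl_secs = OFFLINE_HOURS * 60 * 60

# Assumed fragment of the edgeHub desired properties; the real manifest
# for this gateway is not part of the issue.
edge_hub_desired = {
    "storeAndForwardConfiguration": {"timeToLiveSecs": ttl_secs},
}

print(json.dumps(edge_hub_desired, indent=2))
```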

The modules should keep running and get restarted in case of any failure, but this is not happening.

I am currently working around the issue with the following step whenever EdgeHub gets disconnected from IoT Hub.
Workaround: manually force-remove the containers with "sudo docker rm -f edgeHub edgeAgent mymodule". After executing this command, the modules started running fine again.
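
Note that `docker rm -f` force-removes containers, not images. If the workaround has to be applied repeatedly, it could be wrapped in a small script. The following is a minimal sketch, assuming the container names from the command above ("mymodule" is the poster's placeholder); the helper names `build_remove_command` and `remove_containers` are hypothetical, and the script defaults to a dry run so nothing is removed by accident.

```python
import shlex
import subprocess

# Container names taken from the command in the post; "mymodule" is the
# poster's placeholder for the custom module.
CONTAINERS = ["edgeHub", "edgeAgent", "mymodule"]


def build_remove_command(containers, use_sudo=True):
    """Return the argv list for force-removing the given containers."""
    cmd = ["docker", "rm", "-f", *containers]
    return ["sudo", *cmd] if use_sudo else cmd


def remove_containers(containers, dry_run=True):
    """Print the command, and run it only when dry_run is False."""
    cmd = build_remove_command(containers)
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


remove_containers(CONTAINERS)  # dry run: only prints the command
```

Passing `dry_run=False` would actually execute the command, so it is worth checking the printed line first.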

Current Behavior

I am facing an issue while running modules on the Edge runtime. The device runs fine for 7-10 days, but after that messages get stuck and I see exceptions in EdgeHub. The EdgeHub connection to IoT Hub fails. Please find the attached logs for your reference. I have seen this issue on two different gateways and am attaching logs from both.

Device (Host) Operating System

<Ubuntu 16.04>
failed_logs_edgeAgent_10_days_08_01_2019.txt
failed_logs_edgeAgent_05012019.txt
failed_logs_edgeHub_10_days_08_01_2019.txt
failed_logs_edgeHub_05012019.txt

Architecture

Container Operating System

Runtime Versions

iotedged

<Run iotedge version 1.0.5>

Edge Agent

< Image tag (i.e. 1.0.5) >

Edge Hub

< Image tag (i.e. 1.0.5) >

Docker

< Run docker version 3.0.2>

Logs

Additional Information

pratson20 reopened this Jan 15, 2019
pratson20 changed the title from "Failure on Edge Hub and messages get stuck while sending messages upstream." to "Failure on Edge Hub and messages get stuck while sending messages upstream. Failure is happening in EdgeHub" Jan 15, 2019
@avranju
Contributor

avranju commented Jan 17, 2019

Looking at the logs it appears the device lost connectivity to IoT Hub. Are you saying the device had connectivity but somehow Edge Hub was unable to connect still?

@pratson20
Author

The device had connectivity. I am running many devices for testing, all configured with the MQTT protocol instead of AMQP. The attached logs are from two different devices.
I have enabled the offline feature as well, so even if a device gets disconnected, it should re-establish the connection.

@avranju
Contributor

avranju commented Jan 18, 2019

When you say "messages get stuck", what exactly do you mean? From the logs I see 2 kinds of issues. One is timeouts when Edge Hub connects to IoT Hub, but it does eventually recover and connect successfully; I am thinking this is some kind of transient connectivity issue. The other problem is that Edge Hub seems to be trying to route a message to a module and did not receive an acknowledgement from the module. Subsequently an exception seems to have caused Edge Hub to close the connection to the module, but the module does not seem to detect this and recover.
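
One generic way for a module to notice that it has silently stopped receiving messages is a staleness watchdog. The sketch below is not part of any Azure SDK and is not what the module in this issue does; `MessageWatchdog` and the 300-second window are hypothetical, chosen only to illustrate the idea in plain Python.

```python
import time


class MessageWatchdog:
    """Tracks when the last message arrived and flags a stale connection."""

    def __init__(self, max_silence_secs, clock=time.monotonic):
        self.max_silence_secs = max_silence_secs
        self._clock = clock
        self._last_seen = clock()

    def mark_message(self):
        """Call this from the module's message handler."""
        self._last_seen = self._clock()

    def is_stale(self):
        """True when no message has arrived within the allowed window."""
        return self._clock() - self._last_seen > self.max_silence_secs


# Usage with a fake clock so the behaviour is easy to follow.
now = [0.0]
watchdog = MessageWatchdog(max_silence_secs=300, clock=lambda: now[0])
now[0] = 200.0
print(watchdog.is_stale())  # False: only 200 s of silence
watchdog.mark_message()
now[0] = 600.0
print(watchdog.is_stale())  # True: 400 s since the last message
```

When `is_stale()` returns True, a module could, for example, recreate its client or exit so that its restart policy brings it back; which reaction is appropriate depends on the deployment.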

We are trying to repro this issue in our long haul tests. In the meantime, could you check what version of the Device SDK you are using from your module? If it's not the latest version (v1.19.0) then you might want to try upgrading to that version and see if that helps. Will keep this thread posted on our findings.

@pratson20
Author

I am using the transparent gateway solution:

  • I use the Python SDK to talk to Edge Hub. Here, I am reading data from a unit that communicates over RS485; this acts as the child device. I pass the IoT device connection string with GatewayHostName. My solution is based on this reference:
    https://github.com/AzureIoTGBB/azure-iot-edge-hol-linux

  • The child device talks to EdgeHub, which passes the data to my module, which in turn transfers it to IoT Hub. All of my logic lives in the module.

  • Data stops being sent to IoT Hub after 7-10 days. By "messages get stuck" I mean that my module is no longer receiving any data from EdgeHub.
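
To illustrate the first bullet, the sketch below appends a `GatewayHostName` entry to a device connection string so that the child device connects through the gateway. The helper `with_gateway` is hypothetical, and every value in the example (hub name, device id, key, gateway host) is a placeholder rather than something taken from this issue.

```python
def with_gateway(connection_string, gateway_host):
    """Append GatewayHostName to a device connection string if missing."""
    parts = [p for p in connection_string.strip().split(";") if p]
    keys = {p.split("=", 1)[0] for p in parts}
    if "GatewayHostName" not in keys:
        parts.append(f"GatewayHostName={gateway_host}")
    return ";".join(parts)


# All values below are placeholders, not taken from the issue.
base = "HostName=example-hub.azure-devices.net;DeviceId=rs485-child;SharedAccessKey=<key>"
print(with_gateway(base, "my-gateway.local"))
```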

I have a dedicated Wi-Fi network for the gateway, so a network problem is unlikely. Even with a network outage, data should still reach the module because I have enabled the offline capability: when I switched off the Wi-Fi network, data from EdgeHub still reached the module properly and was stored for up to 24 hours.

I am using the following Device Client version: Microsoft.Azure.Devices.Client/1.18.1,
with the MQTT protocol.
I can try updating it to 1.19.0 as well and will see whether the same issue reproduces.
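
When several gateways are involved, it can help to check quickly which ones still report an older client. The following is a minimal sketch that parses a product string of the form shown above and compares it against 1.19.0; the function names are hypothetical and the comparison assumes purely numeric, dot-separated versions.

```python
def parse_product_version(product_info):
    """Split 'Name/1.2.3' into the name and a tuple of integers."""
    name, _, version = product_info.partition("/")
    return name, tuple(int(part) for part in version.split("."))


def needs_upgrade(product_info, minimum=(1, 19, 0)):
    """True when the reported version is older than the minimum."""
    _, version = parse_product_version(product_info)
    return version < minimum


print(needs_upgrade("Microsoft.Azure.Devices.Client/1.18.1"))  # True
print(needs_upgrade("Microsoft.Azure.Devices.Client/1.19.0"))  # False
```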

@pratson20
Author

There are two other threads that point to the same issue, if I am correct-

As per the comments, this looks like a Device SDK issue?

I am seeing errors like this:
2019-01-08 11:41:04.104 +00:00 [WRN] - Error sending messages to module HPEdgeTestwoDPS/parsermodule
System.TimeoutException: Message completion response not received
at Microsoft.Azure.Devices.Edge.Hub.Core.Device.DeviceMessageHandler.SendMessageAsync(IMessage message, String input) in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Core/device/DeviceMessageHandler.cs:line 266
at Microsoft.Azure.Devices.Edge.Hub.Core.Routing.ModuleEndpoint.ModuleMessageProcessor.<>c__DisplayClass5_2.<b__0>d.MoveNext() in /home/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.Core/routing/ModuleEndpoint.cs:line 99
2019-01-08 11:41:38.665 +00:00 [INF] - Reauthenticating connected clients
2019-01-08 11:44:35.327 +00:00 [INF] - Closing connection for device: HPEdgeTestwoDPS/parsermodule,
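
To find every occurrence of this pattern in a long Edge Hub log, the lines can be filtered with a short script. The sketch below assumes the timestamp and level layout seen in the excerpt above; the regular expression and the `find_events` helper are illustrative, not part of any Edge tooling, and the sample lines are copied from the excerpt.

```python
import re

# Matches lines such as:
# 2019-01-08 11:41:04.104 +00:00 [WRN] - Error sending messages ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{2}:\d{2}) "
    r"\[(?P<level>[A-Z]+)\] - (?P<msg>.*)$"
)

NEEDLES = (
    "Error sending messages to module",
    "Closing connection for device",
)


def find_events(lines, needles=NEEDLES):
    """Return (timestamp, level, message) for lines containing a needle."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and any(needle in match["msg"] for needle in needles):
            events.append((match["ts"], match["level"], match["msg"]))
    return events


sample = [
    "2019-01-08 11:41:04.104 +00:00 [WRN] - Error sending messages to module HPEdgeTestwoDPS/parsermodule",
    "System.TimeoutException: Message completion response not received",
    "2019-01-08 11:41:38.665 +00:00 [INF] - Reauthenticating connected clients",
    "2019-01-08 11:44:35.327 +00:00 [INF] - Closing connection for device: HPEdgeTestwoDPS/parsermodule,",
]

for ts, level, msg in find_events(sample):
    print(ts, level, msg)
```

Stack-trace lines carry no timestamp, so they are skipped by the pattern; only the warning and the connection-close entries are reported.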

@avranju
Contributor

avranju commented Jan 22, 2019

Yes, we suspect that this is a Device SDK issue which might get resolved by upgrading to 1.19.0.

@pratson20
Author

I have upgraded the Device SDK to 1.19.0 on a few gateway devices. I will post an update if the same issue persists.

@pratson20
Author

I didn't see this issue after updating the SDK to 1.19.0. The issue seems to be resolved by the Device SDK update.

@myagley
Contributor

myagley commented Feb 6, 2019

I'm going to close this issue for now as it seems to be resolved with the new SDK. Please feel free to reopen.


4 participants