Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sudo iotedge list/or any command hangs #69

Closed
ilyas-it83 opened this issue Jul 25, 2018 · 10 comments
Closed

sudo iotedge list/or any command hangs #69

ilyas-it83 opened this issue Jul 25, 2018 · 10 comments

Comments

@ilyas-it83
Copy link

ilyas-it83 commented Jul 25, 2018

Current Behavior

I hit a weird issue that the iotedge started hanging or stopped working, however when i issue iotedge its responds with the helper message and commands, but when i issue sudo iotedge list, it hangs.

Steps to Reproduce

Followed the below article and setup iotedge on linux (cjheck the version below), it stopped working after doing sudo apt-get update & upgrade, the hanging behavior started.

https://docs.microsoft.com/en-us/azure/iot-edge/how-to-install-iot-edge-linux

Context (Environment)

IoTEdge Simulation on DSVM (Ubuntu)

Device (Host) Operating System

Linux version 4.15.0-1018-azure (buildd@lgw01-amd64-021) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu116.04.10)) #1816.04.1-Ubuntu SMP Thu Jul 19 13:13:57 UTC 2018

Architecture

amd64

Container Operating System

Linux Containers

Runtime Versions

iotedged

iotedge 1.0.0 (52ef77d)

Edge Agent

mcr.microsoft.com/azureiotedge-agent:1.0

Edge Hub

mcr.microsoft.com/azureiotedge-hub:1.0

Docker

Docker version 18.06.0-dev, build daf021fe

Logs

edgeHub.Log
https://gist.github.com/ilyas-it83/e0cd3c94e6ac3e165ea91efb6e8a94d7/raw/34c82fc65e6ed6ab58b9aa9c693f79bf1030ae01/edgeHub.log

edgeAgent.log
https://gist.github.com/ilyas-it83/f2fcc478471a7fe4cd1ba1507bd80c46

Additional Information

All i'm trying is to setup is filtermodule as per https://docs.microsoft.com/en-us/azure/iot-edge/tutorial-csharp-module, but could not get it work.

user@user-dsvm:~$ iotedge
iotedge 1.0.0 (52ef77d)
The iotedge tool is used to manage the IoT Edge runtime.

USAGE:
iotedge [OPTIONS]

FLAGS:
-h, --help Prints help information
-V, --version Prints version information

OPTIONS:
-H, --host Daemon socket to connect to [env: IOTEDGE_HOST=] [default: unix:///var/run/iotedge/mgmt.sock]

SUBCOMMANDS:
help Prints this message or the help of the given subcommand(s)
list List modules
logs Fetch the logs of a module
restart Restart a module
version Show the version information
user@user-dsvm:$ iotedge list
An error in the management http client occurred.
caused by: Hyper error
caused by: Permission denied (os error 13)
user@user-dsvm:
$ sudo iotedge list
^C
user@user-dsvm:~$ sudo iotedge -v
error: Found argument '-v' which wasn't expected, or isn't valid in this context

USAGE:
iotedge [OPTIONS]

For more information try --help
user@user-dsvm:$ sudo iotedge --version
iotedge 1.0.0 (52ef77d)
user@user-dsvm:
$ sudo systemctl status iotedge
● iotedge.service - Azure IoT Edge daemon
Loaded: loaded (/lib/systemd/system/iotedge.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/iotedge.service.d
└─override.conf
Active: active (running) since Wed 2018-07-25 08:37:07 UTC; 1h 6min ago
Docs: man:iotedged(8)
Main PID: 23934 (iotedged)
Tasks: 6
Memory: 10.1M
CPU: 1.702s
CGroup: /system.slice/iotedge.service
└─23934 /usr/bin/iotedged -c /etc/iotedge/config.yaml

Jul 25 09:43:46 user-dsvm iotedged[23934]: 2018-07-25T09:43:46Z [DBUG] - [edgelet_http] accepted new connection (172.17.0.2:37255)
Jul 25 09:43:46 user-dsvm iotedged[23934]: 2018-07-25T09:43:46Z [DBUG] - [edgelet_http] accepted new connection (172.17.0.2:38563)
Jul 25 09:43:46 user-dsvm iotedged[23934]: 2018-07-25T09:43:46Z [DBUG] - [edgelet_http_mgmt::server::module::list] List modules
Jul 25 09:43:46 user-dsvm iotedged[23934]: 2018-07-25T09:43:46Z [INFO] - [mgmt] - - - [2018-07-25 09:43:46.842664763 UTC] "GET /modules?api-version=2018-06-28 HTTP/1.1" 200 OK 2124 "-" "-" pid(any)
Jul 25 09:43:46 user-dsvm iotedged[23934]: 2018-07-25T09:43:46Z [INFO] - [work] - - - [2018-07-25 09:43:46.951141795 UTC] "POST /modules/%24edgeAgent/genid/636675104799092643/encrypt?api-version=2018-06-28 HTT
Jul 25 09:43:52 user-dsvm iotedged[23934]: 2018-07-25T09:43:52Z [DBUG] - [edgelet_http] accepted new connection (172.17.0.2:34971)
Jul 25 09:43:52 user-dsvm iotedged[23934]: 2018-07-25T09:43:52Z [DBUG] - [edgelet_http] accepted new connection (172.17.0.2:36355)
Jul 25 09:43:52 user-dsvm iotedged[23934]: 2018-07-25T09:43:52Z [DBUG] - [edgelet_http_mgmt::server::module::list] List modules
Jul 25 09:43:52 user-dsvm iotedged[23934]: 2018-07-25T09:43:52Z [INFO] - [work] - - - [2018-07-25 09:43:52.661312462 UTC] "POST /modules/%24edgeAgent/genid/636675104799092643/encrypt?api-version=2018-06-28 HTT
Jul 25 09:43:52 user-dsvm iotedged[23934]: 2018-07-25T09:43:52Z [INFO] - [mgmt] - - - [2018-07-25 09:43:52.769567392 UTC] "GET /modules?api-version=2018-06-28 HTTP/1.1" 200 OK 2124 "-" "-" pid(any)

@dsajanice
Copy link
Member

Hi @ilyas-it83 if you see the hang again, could you please provide iotedged logs (journalctl -u iotedge.service -b --since "1 hour ago"). That will give us more information about the root cause of the hang.

Regarding the https://docs.microsoft.com/en-us/azure/iot-edge/tutorial-csharp-module tutorial, I walked through it yesterday. Couple of things that need changes to the script:

  1. In the custom module, please use AMQP instead of MQTT
    //MqttTransportSettings mqttSetting = new MqttTransportSettings(TransportType.Mqtt_Tcp_Only); //ITransportSettings[] settings = { mqttSetting };

AmqpTransportSettings amqpSetting = new AmqpTransportSettings(TransportType.Amqp_Tcp_Only); ITransportSettings[] settings = {amqpSetting};

  1. Please modify the routes in deployment.template.json to take CSharpModule and not SampleModule. I filed a doc issue about this yesterday The 'Build your IoT Edge solution' section is missing a step MicrosoftDocs/azure-docs#12326 and will work to get it fixed.

@dsajanice
Copy link
Member

Unfortunately, I have not been able to reproduce the hang. The iotedged logs from your setup would help debug this further.

@ilyas-it83
Copy link
Author

Here you go!

The logs are in the below gist, hope this helps you to figure out the issue or reproduce the issue

https://gist.github.com/ilyas-it83/2c9057063e79b638a74af8bd43fdcfb3

i have replaced MQTT with AMQP, will capture the learning shortly.

@ilyas-it83
Copy link
Author

updating mqtt with AMQP didnt really help and i'm still facing issues, but trying to get the error message out. Meanwhile i found some new errors coming out of edge logs, find the gist below

https://gist.github.com/ilyas-it83/708378797cc57a582040ca64b609b6c9

@dsajanice
Copy link
Member

Hi @ilyas-it83 the second gist that you linked to has the following error in it:
Jul 26 05:47:14 ilyas-dsvm iotedged[23934]: 2018-07-26T05:47:14Z [ERR!] - Internal server error: Module runtime error
Jul 26 05:47:14 ilyas-dsvm iotedged[23934]: caused by: Container runtime error
Jul 26 05:47:14 ilyas-dsvm iotedged[23934]: caused by: Serde error
Jul 26 05:47:14 ilyas-dsvm iotedged[23934]: caused by: invalid type: sequence, expected a string at line 1 column 1159

We saw this issue when we were exec'ed in to an edge container and then issued an 'iotedge list' command. I verified that this would cause the list command to throw an error and not hang. The issue was fixed here: 31468a1 and will be in the next release.

@dsajanice
Copy link
Member

In the first gist (https://gist.github.com/ilyas-it83/2c9057063e79b638a74af8bd43fdcfb3) iotedged is finding that the edgeAgent container is running. Two modules, machinelearningmodule and filtermodule are in the deployment. However the machinelearningmodule is repeatedly calling the sign operation, which is odd. Is the machinelearningmodule crashing (docker logs machinelearningmodule)?

@ilyas-it83
Copy link
Author

Looks like yes, i guess the ML sample app also have issues which i need to figure out, but even if there issues inside my module why does it cause repeated calling of sign operations?

Anyway the logs are as below:

Unhandled Exception: System.AggregateException: One or more errors occurred. (CONNECT failed: RefusedNotAuthorized) ---> Microsoft.Azure.Devices.Client.Exceptions.UnauthorizedException: CONNECT failed: RefusedNotAuthorized
at Microsoft.Azure.Devices.Client.InternalClient.<>c.b__62_2(Task t)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at AzureMLIoTServer.Program.d__12.MoveNext() in /opt/vsts/work/1/s/ICE/Product/ice_backend/base_image/iot-server/src/Program.cs:line 86
--- End of inner exception stack trace ---
at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
at AzureMLIoTServer.Program.Main(String[] args) in /opt/vsts/work/1/s/ICE/Product/ice_backend/base_image/iot-server/src/Program.cs:line 54
2018-07-26 18:19:44,192 INFO exited: iot (terminated by SIGABRT (core dumped); not expected)
2018-07-26 18:19:45,199 INFO spawned: 'iot' with pid 31573
EdgeHubConnectionString is set: False, IOTEDGE_IOTHUBHOSTNAME is set: True
2018-07-26 18:19:45,850 INFO success: iot entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)

Unhandled Exception: System.AggregateException: One or more errors occurred. (CONNECT failed: RefusedNotAuthorized) ---> Microsoft.Azure.Devices.Client.Exceptions.UnauthorizedException: CONNECT failed: RefusedNotAuthorized
at Microsoft.Azure.Devices.Client.InternalClient.<>c.b__62_2(Task t)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at AzureMLIoTServer.Program.d__12.MoveNext() in /opt/vsts/work/1/s/ICE/Product/ice_backend/base_image/iot-server/src/Program.cs:line 86
--- End of inner exception stack trace ---
at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
at AzureMLIoTServer.Program.Main(String[] args) in /opt/vsts/work/1/s/ICE/Product/ice_backend/base_image/iot-server/src/Program.cs:line 54
2018-07-26 18:20:59,862 INFO exited: iot (terminated by SIGABRT (core dumped); not expected)
2018-07-26 18:21:00,867 INFO spawned: 'iot' with pid 31604
EdgeHubConnectionString is set: False, IOTEDGE_IOTHUBHOSTNAME is set: True
2018-07-26 18:21:01,572 INFO success: iot entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)

@dsajanice
Copy link
Member

Hi @ilyas-it83 the machinelearningmodule is crashing and on each restart is calling the sign operation for authentication purposes. The workaround for the machine learning module crashing is documented here: https://docs.microsoft.com/en-us/azure/iot-edge/tutorial-deploy-machine-learning#disable-process-identification

@dsajanice
Copy link
Member

Hi @ilyas-it83 if the original hang is not an issue anymore I would like to close this issue. If the machine learning module is still crashing after the workaround posted above, we can track it in a separate issue.

@mysaggar
Copy link

Hi @dsajanice I am facing the same issue i.e. the iotedge commands other than the help ones aren't functioning

My system is:
iotedge version : 1.0.6.1 (3fa6cbe)
Operating System: Ubuntu 18.04.2 LTS
Kernel: Linux 4.18.0-1013-azure

The iot edge commands were running fine before I Set The deployment following the tutorial : https://docs.microsoft.com/en-us/azure/iot-edge/tutorial-csharp-module

I faced this issue earlier and rectified it with clean installation fearing any wrong procedure followed by me.
Please guide about generating logs and pointing the issue.
Also if there was any workaround available other than reinstalling the iotedge on the vm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants