Updating reported properties failed with error The operation timed out. type System.TimeoutException #5106

Closed
jainricha opened this issue Jun 14, 2021 · 5 comments

Comments


jainricha commented Jun 14, 2021

I deploy around 20 IoT Edge devices using an automation script (ARM templates and Azure CLI commands). These edge devices are just VMs. I provision them by specifying a cloud-init config in the VM ARM template. However, out of 20 devices, around 3-4 end up with the edge Agent in the N/A state. Not all devices show the problem, only a few of them.

The IoT Edge runtime is installed by the cloud-init script. I do not assign any custom modules yet, because I first expect all the devices to be in the correct state, which is "417 -- The device's deployment configuration is not set". As stated, though, some of them are in the N/A state.

I checked the logs on the affected VMs. iotedge list shows edgeAgent in the running state, but I do see some errors in the edgeAgent logs:


Updating reported properties failed with error The operation timed out. type System.TimeoutException
Updating reported properties failed with error The operation timed out. type System.TimeoutException
Updating reported properties failed with error The operation timed out. type System.TimeoutException

Another thing I notice: when I navigate on the portal to an edge device whose status is N/A and click Troubleshoot->Restart Edge Agent, the edgeAgent comes to the correct state. But I cannot do this manually for all offline devices. Can somebody help me find the root cause of why this happens on a few devices while the others are fine?

The Edge runtime version I used is 1.2.0.

Expected Behavior

All the devices should be in the same state, i.e. "417 -- The device's deployment configuration is not set",
since all of them are deployed using the same script and the same configuration.

Steps to Reproduce

  1. Create around 20 IoT Edge devices using Azure CLI commands.
  2. Deploy 20 VMs using an ARM template with a cloud-init script to provision the devices created in step 1.
  3. Out of 20 devices, 3-4 are shown in the N/A state.
  4. Deploying 10 devices shows at least 2 devices in the N/A state.

[Screenshot: edgedevices]

Logs:

[Screenshot: edgeAgentErr]

Additional Information

While I create the IoT Edge devices using Azure CLI commands, I sometimes see a throttling issue, which I handle by retrying after 15 seconds or so.
Secondly, if I do Troubleshoot->Restart Edge Agent, the device then comes to the correct state. But since I need to deploy around 50 devices, I cannot manually troubleshoot each device that has this problem.
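
Editor's note: the manual portal restart described above could, in principle, be scripted. The following is a minimal sketch, assuming the Azure CLI `azure-iot` extension is installed and that the edgeAgent on the affected devices accepts the `RestartModule` direct method (it is not confirmed in this thread that the portal button uses this method, or that runtime 1.2.0 responds to it while in the N/A state). The function names `restart_edge_agent` and `restart_all`, and the device-list file, are hypothetical.

```shell
# Hypothetical helper: ask edgeAgent on one device to restart itself
# via the RestartModule direct method (an assumption, see the note above).
restart_edge_agent() {
  local hub="$1" device="$2"
  az iot hub invoke-module-method \
    --hub-name "$hub" \
    --device-id "$device" \
    --module-id '$edgeAgent' \
    --method-name RestartModule \
    --method-payload '{"schemaVersion":"1.0","id":"edgeAgent"}'
}

# Restart edgeAgent on every device listed in a file (one device ID per line).
# The list is read on file descriptor 3 so that az cannot consume it from stdin.
restart_all() {
  local hub="$1" list_file="$2" device
  while IFS= read -r device <&3; do
    [ -n "$device" ] || continue   # skip blank lines
    restart_edge_agent "$hub" "$device" || echo "restart failed: $device" >&2
  done 3< "$list_file"
}
```

It would be called as `restart_all <hub-name> devices.txt`. This only automates the workaround; it does not address why the devices end up in the N/A state.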

Contributor

darobs commented Jun 14, 2021

Hi @jainricha,

This is an issue we'd like to investigate, but it looks like an issue where we will need to access service logs. To do that, we'd need PII that shouldn't be put on GitHub. Would you please create a support ticket for this? This is probably the best way to get information regarding the service.

@jainricha
Author

Created a support ticket. Case 2106150060003898.

Thanks.

Contributor

darobs commented Jun 16, 2021

Thank you! I'm going to close this issue, but if we have any useful insights that can be shared publicly, I can add it here.

@darobs darobs closed this as completed Jun 16, 2021
Contributor

darobs commented Jun 23, 2021

Hi @jainricha

We have looked at the service logs, and the service is being throttled. The specific metric we ran into throttling with was "Module D2C Patch ReportedProperties". These messages include edge devices reporting their current state back to the service. This makes me wonder if the edge runtime is making too many updates too quickly on startup. We have competing requirements: keeping the service informed of the edge status vs. keeping customer costs low. At this time, we probably favor the former, so I started a PBI to investigate this.

It could also be that the modules you are using are updating their reported properties as well. The queries I used didn't have that level of granularity, so it is something to be aware of: the edge runtime may not be the only thing updating its state.

Author

jainricha commented Jun 24, 2021

Hi @darobs,

All that we do on the VMs acting as IoT Edge devices is install the IoT Edge runtime and configure the connection string in the runtime config. We do not send any additional info that could cause too many updates for a single device.

In the scenario I mentioned, the issue occurs before I have deployed my custom modules. None of my modules are on the IoT Edge device yet, so custom modules are not updating any properties either.

About the throttling: I do see throttling while I create the IoT Edge devices using my script. The devices are created one by one by this script, which runs the following commands in a loop with as many iterations as the number of IoT Edge devices I want to create. So the iterations can be 20, 50, 100, up to 300.

The following commands run in the loop:

# Create the IoT Edge device:
az iot hub device-identity create --device-id $nameOfDevice --edge-enabled --hub-name $hub

# Then fetch the connection string for this device:
connectionString=$(az iot hub device-identity connection-string show --device-id $1 --hub-name $2 | jq '.connectionString')

Here, in some iterations, I get a throttling exception for the CRUD operation, which I currently handle by retrying after 15 seconds or so.
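
Editor's note: the fixed 15-second retry mentioned above is commonly replaced with exponential backoff, so that repeated throttling leads to progressively longer waits. A minimal sketch follows; the function name `retry_with_backoff` and its parameters are hypothetical, and it retries on any non-zero exit status, not only on throttling errors.

```shell
# Retry a command with exponential backoff.
# Usage: retry_with_backoff <max_attempts> <initial_delay_seconds> <command...>
retry_with_backoff() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))   # double the wait before the next attempt
  done
}
```

With the commands from the loop above, a call might look like `retry_with_backoff 5 15 az iot hub device-identity create --device-id "$nameOfDevice" --edge-enabled --hub-name "$hub"`, which waits 15, 30, 60, and 120 seconds between its five attempts. This only smooths device creation; it does not address the reported-properties throttling identified in the service logs.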

Let me know if this info is helpful to check this further and provide some resolution.
