After System.TimeoutException from DeviceClient.SendEventAsync no instance of DeviceClient can connect to IoT Hub, even when the cause of the operation timeout is no longer present. #613
Comments
We did further testing, including going back to Microsoft.Azure.Devices.Client 1.7.2 (from June 2018). With this version (1.7.2) the expected behaviour occurs. Does anybody else have connectivity issues with the current version (1.18.0)?
@dleinius thanks for reporting. Could you please collect and share logs: https://github.com/Azure/azure-iot-sdk-csharp/tree/master/tools/CaptureLogs ?
We are aware of certain Receive message recovery issues: #571. As part of the investigation, we tried to inject faults into the DeviceClient AMQP transport while sending and were not able to repro this issue.
I captured the logs, but the ETL file seems to be "empty" (8 KB). Opening it with the Windows event log viewer shows no entries. Nevertheless, where can I upload the log? Via a support request within the Azure portal? How can I make sure that the logs are delivered to you? And @CIPop thanks for your help!
Able to reproduce the issue locally on 1.18.0. Same symptoms as #620.
1.) The issue is specific to AMQP. Not able to reproduce using MQTT.
2.) Use the IoT Hub connection string for local reproduction: DeviceClient deviceClient = DeviceClient.CreateFromConnectionString(DeviceConnectionString, "MyDevice", TransportType.Amqp_Tcp_Only)
3.) After the timeout, the SDK makes no attempt to retry on a brand new connection. Hence all subsequent attempts fail even though the underlying network connectivity is restored.
4.) No reproduction using the older version 1.6.0.
The current LTS version supports creating a device client and sending telemetry using the IoT Hub connection string, so it can be used to unblock this issue. That being said, we highly encourage you to use the device connection string for creating the device client.
Thanks for your help. Do I understand you correctly that the issue only exists when the DeviceClient is created with the IoT Hub connection string, and that it will not occur with the device connection string?
@dleinius Yes, when connecting with the device connection string, the client successfully recovers the connection and is able to continue sending messages. I tried the steps using the device connection string for creating the DeviceClient and was not able to reproduce the issue.
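To make the distinction discussed above concrete: a device connection string carries a `DeviceId=` segment, whereas an IoT Hub connection string carries `SharedAccessKeyName=` instead. The following is a minimal sketch of a guard a gateway could run before creating clients, assuming the usual `Key=Value;Key=Value` layout. `ConnectionStringKind` and `IsDeviceScoped` are hypothetical names for illustration, not part of the SDK.

```csharp
using System;
using System.Linq;

public static class ConnectionStringKind
{
    // Hypothetical helper (not an SDK API): returns true when the connection
    // string contains a DeviceId segment, i.e. it is scoped to a single device.
    public static bool IsDeviceScoped(string connectionString)
    {
        if (string.IsNullOrWhiteSpace(connectionString))
        {
            return false;
        }

        return connectionString
            .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
            // Split on the first '=' only, because Base64 keys may end with '='.
            .Select(part => part.Split(new[] { '=' }, 2)[0].Trim())
            .Any(key => key.Equals("DeviceId", StringComparison.OrdinalIgnoreCase));
    }
}
```

A string that passes this check could then be handed to `DeviceClient.CreateFromConnectionString(deviceConnectionString, TransportType.Amqp_Tcp_Only)`, which is the setup reported above as recovering after a timeout.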
@abhipsaMisra
Microsoft.Azure.Devices.Shared 1.15.1,
Microsoft.Azure.Amqp 2.3.3
Description of the issue:
After a System.TimeoutException ("Operation timeout expired.") from DeviceClient.SendEventAsync, no instance of DeviceClient within the AppDomain can ever connect or send events to IoT Hub, even when the cause of the operation timeout is no longer present. A reset of the AppDomain (restart of the program) solves the issue.
We are building a Protocol Gateway for native devices. The native devices use unsupported protocols, and the Protocol Gateway builds "the bridge" to the IoT Hub. The Protocol Gateway creates an instance of DeviceClient for each connected native device:
Native Devices -> (own implemented) Protocol Gateway -> DeviceClient -> IotHub
At some point (after minutes, hours or days) we experience a TimeoutException ("Operation timeout expired.") when sending a message via DeviceClient.SendEventAsync. The TimeoutException itself is fine for us, because in this cloud scenario you have to expect that the network might be down for a few seconds at some point. Our issue is that after this point, even when the cause of the TimeoutException no longer exists, the DeviceClient will NEVER EVER successfully connect or send messages to the IoT Hub. A reset of the AppDomain (restart of the program) immediately solves the issue.
The following code sample demonstrates the issue in a console app:
The program simulates 2 devices: one device (1234567900000) sends messages to IoT Hub every 5 seconds, the other device (1234567900001) every 45 seconds. To simulate a TimeoutException, use a tool like Clumsy to drop AMQP messages on port 5671, or, if you are sitting in front of your machine, simply disconnect the network.
Repro Steps:
Received Result:
All further messages of both(!!!) devices will receive a TimeoutException, forever.
Expected Result:
After disabling Clumsy (and therefore getting a working network again), all "new" messages should be successfully sent to IoT Hub.
Therefore, why are all instances of DeviceClient getting a TimeoutException, even when the cause of the timeout no longer exists? Is there a way to reset the InternalClient of DeviceClient?
We tested it with the Amqp and Mqtt protocols, each with Tcp_Only and WebSocket_Only.
Please help, this behaviour seems to be a show stopper for us. :-(
Code sample exhibiting the issue:
Timeout.zip
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.Devices.Client;

namespace ConsoleApp4
{
    class Program
    {
        static void Main(string[] args)
        {
            // RunnerTask() is not shown in this excerpt.
            RunnerTask().Wait();
        }
    }
}
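The excerpt above calls RunnerTask(), which is not shown. As a rough illustration only, and not the original code, the following sketch shows a per-device send loop that would produce log lines in the same shape as the console output in this issue. The names `ReproSketch` and `RunDeviceLoopAsync` are hypothetical, and the `send` delegate is an assumption that stands in for a call such as `deviceClient.SendEventAsync(new Message(bytes))`, so the loop can be exercised without the SDK.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ReproSketch
{
    // Hypothetical helper: tries to send one message per interval for a single
    // device and logs each attempt. `send` stands in for the SDK send call.
    public static async Task RunDeviceLoopAsync(
        string deviceId,
        TimeSpan interval,
        Func<string, Task> send,
        Action<string> log,
        CancellationToken cancellationToken)
    {
        while (!cancellationToken.IsCancellationRequested)
        {
            log($"Trying for {deviceId}");
            try
            {
                await send(deviceId);
                log($"Send {DateTime.Now} for {deviceId}");
            }
            catch (TimeoutException ex)
            {
                // Keep looping so a later attempt can succeed once the network is back.
                log($"Timeout {deviceId}: {ex.Message}");
            }

            try
            {
                await Task.Delay(interval, cancellationToken);
            }
            catch (OperationCanceledException)
            {
                break; // Cancellation requested while waiting for the next attempt.
            }
        }
    }
}
```

Under these assumptions, two such loops would run side by side, one with a 5-second interval for 1234567900000 and one with a 45-second interval for 1234567900001, each passing `Console.WriteLine` as `log`.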
Console log of the issue:
Trying for 1234567900000
Send 05.09.2018 16:22:25 for 1234567900000
Trying for 1234567900000
Send 05.09.2018 16:22:30 for 1234567900000
Trying for 1234567900001
Send 05.09.2018 16:22:35 for 1234567900001
Trying for 1234567900000
Send 05.09.2018 16:22:35 for 1234567900000
[+++ PLING NETWORK OFF +++]
Trying for 1234567900000
Timeout 1234567900000: Operation timeout expired.
[+++ PLING NETWORK ON +++]
Trying for 1234567900000
Timeout 1234567900000: Operation timeout expired.
Trying for 1234567900000
Timeout 1234567900000: Operation timeout expired.
Trying for 1234567900000
Trying for 1234567900001
Timeout 1234567900001: Operation timeout expired.
Timeout 1234567900000: Operation timeout expired.
Trying for 1234567900000
Timeout 1234567900000: Operation timeout expired.
Trying for 1234567900000
Timeout 1234567900000: Operation timeout expired.