Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DeviceClient/ModuleClient .OpenAsync Timeouts/Delays with .NET Core 2.1 #521

Closed
jason-e-gross opened this issue Jun 18, 2018 · 14 comments
Closed
Assignees
Labels
bug Something isn't working.

Comments

@jason-e-gross
Copy link

jason-e-gross commented Jun 18, 2018

  • OS, version, SKU and CPU architecture used: Windows 10 64 bit, Linux Ubuntu 16.04 (the delay appears in both).
  • Application's .NET Target Framework : netcoreapp21
  • Device: AMD64 machines

Description of the issue:

When updating the IOT Edge project to use .NET Core 2.1, it introduces delays and timeouts in devices or modules at .OpenAsyc. About 30% timeouts to crashing (which edgeAgent restarts the module) to 70% delays of over 20-30 seconds or more.

Code sample exhibiting the issue:

<TargetFramework>netcoreapp2.1</TargetFramework>
<PackageReference Include="Microsoft.AspNetCore.App" Version="2.1.0" />

It shows in either:

<PackageReference Include="Microsoft.Azure.Devices.Client" Version="1.17.0-preview-007" />

or

<PackageReference Include="Microsoft.Azure.Devices.Client" Version="1.6.0-preview-001" />

If we downgrade back to .NET Core App 2.0, the timeout/delay goes away, and .OpenAsync is as quick as you would expect it to be. It exhibits itself in three modules we tried updating - which range from nothing more special than the FilterModule example to one that uses Kestrel/SignalR (port 8012).

Updating to the 1.17.0-preview-007 has made it less likely to timeout and crash the module, but it still takes over 20 seconds. An example of a timeout is below (4 mins passed before it crashed). Includes a .ContinueWith to capture the stack trace.

06/15/2018 14:02:22 | INIT: Opening ModuleClient
06/15/2018 14:06:23 | INIT: ModuleClient Initialized, CW: {"Id":1,"Exception":{"ClassName":"System.AggregateException","Message":"One or more errors occurred.","Data":null,"InnerException":{"ClassName":"System.TimeoutException","Message":"Operation timeout expired.","Data":null,"InnerException":null,"HelpURL":null,"StackTraceString":"   at Microsoft.Azure.Devices.Client.InternalClient.<>c.<ApplyTimeout>b__62_2(Task t)\n   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)\n--- End of stack trace from previous location where exception was thrown ---\n   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)","RemoteStackTraceString":null,"RemoteStackIndex":0,"ExceptionMethod":null,"HResult":-2146233083,"Source":"Microsoft.Azure.Devices.Client","WatsonBuckets":null},"HelpURL":null,"StackTraceString":null,"RemoteStackTraceString":null,"RemoteStackIndex":0,"ExceptionMethod":null,"HResult":-2146233088,"Source":null,"WatsonBuckets":null,"InnerExceptions":[{"ClassName":"System.TimeoutException","Message":"Operation timeout expired.","Data":null,"InnerException":null,"HelpURL":null,"StackTraceString":"   at Microsoft.Azure.Devices.Client.InternalClient.<>c.<ApplyTimeout>b__62_2(Task t)\n   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)\n--- End of stack trace from previous location where exception was thrown ---\n   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)","RemoteStackTraceString":null,"RemoteStackIndex":0,"ExceptionMethod":null,"HResult":-2146233083,"Source":"Microsoft.Azure.Devices.Client","WatsonBuckets":null}]},"Status":7,"IsCanceled":false,"IsCompleted":true,"IsCompletedSuccessfully":false,"CreationOptions":0,"AsyncState":null,"IsFaulted":true}
06/15/2018 14:06:23 | INIT: DeviceClient StateChange Handler Enabled

Startup.cs

        public void ConfigureServices(IServiceCollection services)  {
            services.AddMvc();
            services.AddCors();
            services.AddSignalR();
        }

        // This method gets called by the runtime. Use this method to configure the HTTP request pipeline.
        public void Configure(IApplicationBuilder app, IHostingEnvironment env) {
            if (env.IsDevelopment())    app.UseDeveloperExceptionPage();

            app.UseFileServer();
            app.UseStaticFiles(
                new StaticFileOptions    {
                    FileProvider = new PhysicalFileProvider(Path.Combine(Directory.GetCurrentDirectory(), "resources")),    //physical
                    RequestPath = "/resources"                                                                              //pseudo
                }
            );
            app.UseCookiePolicy();
            app.UseCors(builder =>
                builder
                    .AllowAnyHeader()
                    .AllowAnyMethod()
                    .AllowAnyOrigin()
            );
            app.UseSignalR(routes => {routes.MapHub<UserInterfaceHub>("/uihub");});
            app.UseMvc();
        }

relevant Program.cs

            moduleClient = ModuleClient.CreateFromConnectionString(connectionString, settings);
			await moduleClient.OpenAsync().ContinueWith(_result =>
                Console.WriteLine($"{DateTime.Now.ToString()} | INIT: ModuleClient Initialized, CW: {JsonConvert.SerializeObject(_result)}")
             );
            moduleClient.SetConnectionStatusChangesHandler(onConnectionChange);
@CIPop CIPop added area-edge Issues related to IoT Edge. bug Something isn't working. labels Jun 28, 2018
@CIPop
Copy link
Member

CIPop commented Jun 28, 2018

@jason-e-gross , we are aware of the issue.
Could you give this a try with yesterday's GA release (1.17.0) please? It could be related to #352 or #495 which we fixed only in GA bits.

@CIPop
Copy link
Member

CIPop commented Jun 28, 2018

/cc @varunpuranik

@jason-e-gross
Copy link
Author

jason-e-gross commented Jul 2, 2018

@CIPop -- I've been able to reduce it down to:

  1. It happens with MQTT on the old runtime, if module project is .NET Core 2.1
  2. It happens with MQTT on the new GA runtime, if module project is .NET Core 2.0 and 2.1

If I switch to AMQP transport settings, there's zero delay in .OpenAsync (2.0 and 2.1), works as expected.

@CIPop
Copy link
Member

CIPop commented Jul 17, 2018

Thanks @jason-e-gross for the new info! I've moved all our testing to netcoreapp2.1 and didn't see any timing issues yet - this is strictly related to the network configuration in edge / Docker. I'm continuing the investigation.

@myagley
Copy link
Contributor

myagley commented Aug 10, 2018

We believe this issue has been fixed in DotNetty here: Azure/DotNetty#413

We need to cut a new release of DotNetty and update the SDK with this new version.

@myagley
Copy link
Contributor

myagley commented Aug 18, 2018

Version 1.18.0 of the C# SDK has been released to nuget.org. Please give it a try.

@isimanovich
Copy link

Any update on when new version of DotNetty will be available?

@varunpuranik
Copy link
Contributor

@isimanovich - the new version 1.18.0 of the C# SDK has already been released. Can you check if if you still run into this issue?
@jason-e-gross - can you also try C# SDK version 1.18.0? It has the fix for this issue. If it works for you, can you please close this issue?

@isimanovich
Copy link

isimanovich commented Oct 1, 2018

@varunpuranik I was referring to DotNetty package that @myagley mentioned on Aug10 ^^^. However, I get following error message after exactly 1 hr and 5 minutes of DeviceClient being used to send messages to IoTHub:

Exception with message: I/O error occurred., Exception : DotNetty.Transport.Channels.ClosedChannelException: I/O error occurred. at Microsoft.Azure.Devices.Client.InternalClient.<>c.<ApplyTimeoutMessage>b__63_2(Task'1 t) at System.Threading.Tasks.ContinuationResultTaskFromResultTask'2.InnerInvoke() at System.Threading.Tasks.Task.Execute() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.TaskAwaiter1.GetResult()`

@varunpuranik
Copy link
Contributor

@isimanovich - Dotnetty version 0.5.0 which includes the fix has also been released.

As for the exception, is it a timeout (you included the callstack but not the exception)?

@isimanovich
Copy link

i think, i found root-cause of my issue

@prmathur-microsoft
Copy link
Member

@isimanovich - Could you update us with your findings ? Are you unblocked?

@CIPop
Copy link
Member

CIPop commented Oct 15, 2018

We are very interested in your findings - they may help other users of the SDK as well so if you have some time, please add them to this thread.

Closing for now. @isimanovich please follow https://github.com/Azure/azure-iot-sdk-csharp/tree/master/tools/CaptureLogs to collect logs when you're getting this exception with the latest release.

@CIPop CIPop closed this as completed Oct 15, 2018
@az-iot-builder-01
Copy link
Contributor

@jason-e-gross, @myagley, @isimanovich, @prmathur-microsoft, thank you for your contribution to our open-sourced project! Please help us improve by filling out this 2-minute customer satisfaction survey

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working.
Projects
None yet
Development

No branches or pull requests

7 participants