
TelemetryClient creates InMemoryChannel in Track but never disposes it #364

Closed · Dmitry-Matveev opened this issue Nov 17, 2016 · 17 comments · Fixed by #406

@Dmitry-Matveev (Member)

TelemetryClient has logic to create an InMemoryChannel in case no other channel is configured.
The code is here.

InMemoryChannel creates an InMemoryTransmitter, which in turn starts a task (Runner()) on a thread pool thread. When a TelemetryClient is no longer used and a new TelemetryClient is created, the old thread stays in WaitOneNative() waiting for the next send interval. The Dispose() method of InMemoryTransmitter/InMemoryChannel would fire the event to stop the waiting and shut down the Runner(), but neither is currently invoked if the TelemetryClient is GC'd.

The issue reproduces if a custom or a default configuration object is passed into TelemetryClient on each creation; it should not reproduce if TelemetryConfiguration.Active is used, since only one channel will remain active.

The stack of the stale threads is:
System.Threading.WaitHandle.WaitOneNative
System.Threading.WaitHandle.InternalWaitOne
System.Threading.WaitHandle.WaitOne
Microsoft.ApplicationInsights.Channel.InMemoryTransmitter.Runner()
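
A minimal sketch of the leaking pattern described above (hedged; the instrumentation key, loop count, and class name are illustrative and not from the original report):

using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.Extensibility;

class LeakRepro
{
    static void Main()
    {
        for (int i = 0; i < 100; i++)
        {
            // Each new TelemetryConfiguration ends up with its own InMemoryChannel,
            // whose InMemoryTransmitter starts a Runner() task on the thread pool.
            var config = new TelemetryConfiguration { InstrumentationKey = "ikey" };
            var client = new TelemetryClient(config);
            client.TrackTrace("hello");

            // Nothing disposes the channel, so every iteration leaves one thread
            // parked in WaitOne() until the process exits.
        }
    }
}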

@Dmitry-Matveev Dmitry-Matveev added this to the 2.3 milestone Nov 17, 2016
@Dmitry-Matveev Dmitry-Matveev modified the milestones: 2.3-Beta2, 2.3 Nov 18, 2016
@dnduffy dnduffy self-assigned this Jan 6, 2017
@dnduffy (Member) commented Jan 6, 2017

Proper Dispose cleanup is hampered by the fact that the TelemetryChannel setter on TelemetryConfiguration is public. Someone can replace the channel instance on a configuration instance without disposing of the old channel, which makes for a messy anti-pattern.
I think we need to remove some of the overly open property access in the future.

@dnduffy (Member) commented Jan 12, 2017

We have several scenarios supported in the existing code:

  1. Default configuration, singleton instance (Active)
  2. Create new "default" configuration, for dependency injection
  3. Custom telemetry channel, set either when XML config is read or when developer does something like "TelemetryConfiguration.Active.TelemetryChannel = new SomeCustomChannel();"

TelemetryConfiguration currently has no non-default constructor and needs a setter on the TelemetryChannel property, but this raises the question of when a channel is created and who is responsible for disposing of it.
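
A rough illustration of those three scenarios (a sketch only; SomeCustomChannel is the placeholder type name from the comment above and does not exist in the SDK):

using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.Extensibility;

class ChannelScenarios
{
    static void Demo()
    {
        // 1. Default configuration, singleton instance: the SDK owns Active and its channel.
        var activeClient = new TelemetryClient(TelemetryConfiguration.Active);

        // 2. A new "default" configuration created for dependency injection:
        //    it is unclear who should dispose the channel this configuration ends up with.
        var diConfig = new TelemetryConfiguration { InstrumentationKey = "ikey" };
        var diClient = new TelemetryClient(diConfig);

        // 3. A custom channel assigned through the public setter: the previous
        //    channel is replaced without being disposed.
        // TelemetryConfiguration.Active.TelemetryChannel = new SomeCustomChannel();
    }
}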


@dnduffy dnduffy closed this as completed Jan 13, 2017
@olegsych (Member)

I'd like us to separate the creation of a default and a custom TelemetryConfiguration. When a configuration is created by calling TelemetryConfiguration.CreateDefault, the ITelemetryChannel is created by the SDK and the SDK is responsible for managing its lifetime. However, when the ITelemetryChannel is created by the user, the user code should be responsible for managing its lifetime. Otherwise, they may get an unexpected ObjectDisposedException or other errors in their application.
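
A sketch of that ownership split (assumptions: TelemetryConfiguration and the channel are both disposable, as in later SDK versions; the key and the channel choice are illustrative):

using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.Extensibility;

class OwnershipSketch
{
    static void Demo()
    {
        // SDK-created channel: CreateDefault builds the channel, and disposing
        // the configuration lets the SDK clean it up.
        using (var sdkConfig = TelemetryConfiguration.CreateDefault())
        {
            new TelemetryClient(sdkConfig).TrackTrace("sdk-owned channel");
        }

        // User-created channel: the user created it, so the user disposes it.
        using (var myChannel = new InMemoryChannel())
        {
            var userConfig = new TelemetryConfiguration
            {
                InstrumentationKey = "ikey",
                TelemetryChannel = myChannel
            };
            new TelemetryClient(userConfig).TrackTrace("user-owned channel");
        }
    }
}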

@cijothomas cijothomas reopened this Feb 3, 2017
@cijothomas (Contributor)

Reopened until the PR is complete and merged.

@stofte commented May 31, 2018

Is there any guidance if we're seeing this very specific stack trace in production? It's not particularly clear, but, for instance, we have code that looks similar to this (in a shared DLL which is used all over):

public static class AppInsightsHelper
{
    public static TelemetryConfiguration DefaultTelemetryConfiguration 
        { get { return new TelemetryConfiguration { InstrumentationKey = "ikey" }; } }

    public static TelemetryClient DefaultTelemetryClient => 
        new TelemetryClient(DefaultTelemetryConfiguration);
}

then I'm guessing our "fix" should be something like this:

public static class AppInsightsHelper
{
    public static TelemetryClient DefaultTelemetryClient
    {
        get
        {
            var conf = TelemetryConfiguration.Active;
            conf.InstrumentationKey = "ikey";
            return new TelemetryClient();
        }
    }
}

We're using v2.4.0.0 of the AppInsights DLL, and I'm not seeing the lines of code mentioned in the OP (using dotPeek), so it's a little confusing, but we've had about half a year of mysterious timeouts, which I finally traced to the same stack trace as the OP.

@pharring (Member)

@stofte Yes, you should change your code to avoid repeatedly creating new TelemetryConfigurations each time DefaultTelemetryConfiguration is called, i.e. make it a static field-backed property. Like this, for example:

public static TelemetryConfiguration DefaultTelemetryConfiguration { get; } = new TelemetryConfiguration { InstrumentationKey = "ikey" };

You could also do the same for DefaultTelemetryClient (i.e. cache the result in a static field) to avoid constructing a new one each time, but that's less of a problem.
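
A sketch with both members cached (assumptions: same placeholder "ikey" as above; usings added for completeness):

using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.Extensibility;

public static class AppInsightsHelper
{
    // Initialized once, so only one TelemetryConfiguration (and therefore only
    // one InMemoryChannel Runner thread) is ever created.
    public static TelemetryConfiguration DefaultTelemetryConfiguration { get; } =
        new TelemetryConfiguration { InstrumentationKey = "ikey" };

    // Caching the client as well avoids constructing a new one on every call.
    public static TelemetryClient DefaultTelemetryClient { get; } =
        new TelemetryClient(DefaultTelemetryConfiguration);
}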

@mark-janos

Even if I cache my instance of TelemetryClient in a static field, I still observe the number of open threads growing. Is there any ETA on a fix for this?

@Dmitry-Matveev (Member, Author)

@mark-janos, the known issue is that repeatedly creating TelemetryConfiguration objects produces orphaned threads. If you only use one TelemetryConfiguration object, you should not see an ever-increasing number of threads. If you do, it may be a new issue we are not yet aware of; can you post example stacks from such threads and their approximate quantity?

@mark-janos commented Oct 29, 2019

@Dmitry-Matveev

I have an Azure Function App whose thread count keeps growing once I add the TelemetryClient, until the Function App crashes. The growth in threads is shown in the attached screenshot (they only decrease when I manually restart the Function App); the timescale is about 24 hours. I note that this issue does not occur when I remove the TelemetryClient from the Function App.

As suggested above, I use static fields for both the TelemetryConfiguration and the TelemetryClient, and all I do is call TrackAvailability inside the 'Run' method of the Function App:

private static TelemetryConfiguration telemetryConfiguration { get; } = new TelemetryConfiguration { InstrumentationKey = "<MyInstrumentationKey>" };
private static TelemetryClient telemetryClient = new TelemetryClient(telemetryConfiguration);

[Screenshot: ThreadCount growing over ~24 hours]

@cijothomas (Contributor) commented Oct 29, 2019

@mark-janos
If you are in Functions, can you try using DI to get the telemetry configuration injected instead of creating your own?
https://docs.microsoft.com/en-us/azure/azure-functions/functions-monitoring#log-custom-telemetry-in-c-functions
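
A sketch of that documented pattern (assumptions: in-process Functions model with a timer trigger; the class name, schedule, and availability values are placeholders):

using System;
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.Extensibility;
using Microsoft.Azure.WebJobs;

public class AvailabilityFunction
{
    private readonly TelemetryClient telemetryClient;

    // The Functions host injects the TelemetryConfiguration it already manages,
    // so no extra channel (and no extra Runner thread) is created.
    public AvailabilityFunction(TelemetryConfiguration telemetryConfiguration)
    {
        this.telemetryClient = new TelemetryClient(telemetryConfiguration);
    }

    [FunctionName("AvailabilityCheck")]
    public void Run([TimerTrigger("0 */5 * * * *")] TimerInfo timer)
    {
        // name, timestamp, duration, run location, success
        telemetryClient.TrackAvailability("AvailabilityCheck", DateTimeOffset.UtcNow, TimeSpan.Zero, "function-app", true);
    }
}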

@dradoaica

Same issue as @mark-janos (static or DI does not help). When using "ApplicationInsights-aspnetcore", the behaviour is not reproducible.

@mark-janos

@dradoaica, in my case I reverted to using the instance of ILogger that's passed into the Function (as per the example posted by @cijothomas).

@olivergrimes commented Jan 28, 2020

I have an issue that looks similar to this. One particular instance of the affected service had over 55,000 threads; I took a dump, and most of them had the following stack:

Entry point   coreclr!Thread::intermediateThreadProc 
Create time   27/01/2020 11:14:52 
Time spent in user mode   0 Days 00:00:00.000 
Time spent in kernel mode   0 Days 00:00:00.000 

This thread is waiting in a WaitOne


.NET Call Stack

[[HelperMethodFrame]] 
System.Threading.WaitHandle.WaitOneNoCheck(Int32)+b0 
Microsoft.ApplicationInsights.Channel.InMemoryTransmitter.Runner()+b3 
System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)+8c 
System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread)+194 
System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)+8c 
[[GCFrame]] 
[[DebuggerU2MCatchHandlerFrame]] 

Full Call Stack

ntdll!NtWaitForMultipleObjects+14 
KERNELBASE!WaitForMultipleObjectsEx+ef 
coreclr!Thread::DoAppropriateWaitWorker+242 
coreclr!Thread::DoAppropriateWait+89 
coreclr!WaitHandleNative::CorWaitOneNative+b9 
[[HelperMethodFrame]] 
System.Threading.WaitHandle.WaitOneNoCheck(Int32)+b0 
0x000001a6`79389990 
0x00007ff9`00007530 
0x000001a6`79389990 
0x00007530 
0x0000008e`1cb7ef30 
0x000001a6`7a000018 
0x000001a5`79564bd0 
0x00000001 
0x000001a6`79389870 
0x000001a5`79564bb8 
0x0000008e`1cb7efe0 
Microsoft.ApplicationInsights.Channel.InMemoryTransmitter.Runner()+b3 
0x11e1a300 
0x00007ff9`10425650 
0x00000004 
coreclr!JIT_GetSharedGCThreadStaticBase+70 
System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)+8c 
0x0000008e`1cb7f3a0 
System.Threading.Tasks.TplEventSource.TaskStarted(Int32, Int32, Int32) 
0x0000008e`1cb7f388 
0x00007ff9`10425650 
0x0000008e`1cb7eff0 
System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread)+194 
System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)+8c 
[[GCFrame]] 
[[DebuggerU2MCatchHandlerFrame]] 

This is a Service Fabric ASP.NET Core (netcoreapp3.0) service, using the following packages:

<PackageReference Include="Microsoft.Diagnostics.EventFlow.Inputs.EventSource" Version="1.4.6" />
<PackageReference Include="Microsoft.Diagnostics.EventFlow.Outputs.ApplicationInsights" Version="1.4.0" />
<PackageReference Include="Microsoft.Diagnostics.EventFlow.ServiceFabric" Version="1.5.1" />

The diagnostics pipeline is initialised in Program.cs as follows:

private static void Main()
{
    try
    {
        using var diagnosticsPipeline = ServiceFabricDiagnosticPipelineFactory.CreatePipeline(
            "MyService-DiagnosticsPipeline");
        ...

And using the following eventFlow.config:

{
  "inputs": [
    {
      "type": "EventSource",
      "sources": [
        { "providerName": "Microsoft-ServiceFabric-Services" },
        { "providerName": "Microsoft-ServiceFabric-Actors" },
        { "providerName": "MyService" }
      ]
    }
  ],
  "filters": [
    {
      "type": "drop",
      "include": "Level == Verbose"
    }
  ],
  "outputs": [
    {
      "type": "ApplicationInsights",
      "instrumentationKey": "servicefabric:/Environment/AppInsights_InstrumentationKey"
    }
  ],
  "schemaVersion": "2016-08-11"
}

Interestingly, all other services in the application, including other ASP.NET Core services, don't exhibit the same issue. I haven't yet spotted any difference in implementation, but I'll keep looking.

For now, I've had to remove everything above from the service.

@olivergrimes commented Feb 3, 2020

After a bit more research, I realised that we had a TelemetryClient instance being created on each HTTP request to the service, as it was registered using AddScoped 🤦‍♂️. I've updated this to AddSingleton in the service registration, and hopefully this should sort our issue out.

I thought an update might help anyone else in the same self-inflicted situation!
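
A sketch of what that change might look like (not the poster's actual code; assumes a standard ASP.NET Core Startup and the TelemetryConfiguration.CreateDefault factory mentioned earlier in the thread):

using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.Extensibility;
using Microsoft.Extensions.DependencyInjection;

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        // Before: AddScoped built a new TelemetryClient (and a new InMemoryChannel
        // Runner thread) for every HTTP request.
        // services.AddScoped(_ => new TelemetryClient(TelemetryConfiguration.CreateDefault()));

        // After: one shared instance for the lifetime of the process.
        services.AddSingleton(_ => new TelemetryClient(TelemetryConfiguration.CreateDefault()));
    }
}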

@shoguns6

> After a bit more research, I realised that we had a TelemetryClient instance being created on each HTTP request to the service, as it was registered using AddScoped 🤦‍♂️. I've updated this to AddSingleton in service registration and hopefully this should sort our issue out.
>
> I thought an update might help anyone else in the same self-inflicted situation!

Did that help? I'm facing a similar issue too. Instead of memory, CPU goes to 100% and there are way too many threads stuck in 'DefaultAggregationPeriodCycle'.

@olivergrimes

> > After a bit more research, I realised that we had a TelemetryClient instance being created on each HTTP request to the service, as it was registered using AddScoped 🤦‍♂️. I've updated this to AddSingleton in service registration and hopefully this should sort our issue out.
> >
> > I thought an update might help anyone else in the same self-inflicted situation!
>
> Did that help? I'm facing a similar issue too. Instead of memory, CPU goes to 100% and there are way too many threads stuck in 'DefaultAggregationPeriodCycle'.

Yes, that change sorted the issue. We had to restart our nodes one by one to free enough resources to deploy the fix, but it didn't happen again.
