
CPU usage slowly increases over time #690

Closed · stevedesmond-ca opened this issue May 19, 2018 · 82 comments

@stevedesmond-ca commented May 19, 2018

I previously used Application Insights and no longer do, but the App Insights code remained in my app. When I recently migrated hosts and upgraded to .NET Core 2.1.0-rc1, I noticed that CPU usage gradually increased over the course of a few days.

[image: CPU usage graph]

The drops back to 0 are Docker container restarts.

Removing Application Insights from the site results in a steady almost-zero CPU usage.

See also this Twitter thread, though there's not much more there.

Repro Steps

  1. Add Application Insights (2.3.0-beta2) to an ASP.NET Core MVC app via NuGet and Startup.cs
    app.AddApplicationInsightsTelemetry("{no-longer-valid instrumentation key}")
  2. Set up some sort of synthetic monitoring to hit the site every minute
  3. Wait a couple of days ¯\_(ツ)_/¯
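A minimal sketch of what the synthetic monitoring in step 2 could look like (the URL, names, and interval are placeholders, not from the original report):

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical synthetic monitor: pings the site at a fixed interval
// and logs the status code, mimicking the uptime checker in the repro.
class SyntheticMonitor
{
    static async Task Main()
    {
        using var client = new HttpClient();
        while (true)
        {
            var response = await client.GetAsync("https://example.com/"); // placeholder URL
            Console.WriteLine($"{DateTime.UtcNow:O} {(int)response.StatusCode}");
            await Task.Delay(TimeSpan.FromMinutes(1));
        }
    }
}
```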

Actual Behavior

  • CPU slowly creeps up over the course of a few days

Expected Behavior

  • CPU remains stable

Version Info

SDK Version : 2.1.0-rc1
.NET Version : 2.1.0-rc1
How Application was onboarded with SDK (VisualStudio/StatusMonitor/Azure Extension):
OS : Ubuntu 16.04
Hosting Info (IIS/Azure WebApps/ etc) : Azure VM

@cijothomas (Contributor) commented May 23, 2018

Thanks for reporting. We'll try to set up a repro and investigate further.
What version of the Application Insights SDK were you using?

@stevedesmond-ca (Author) commented May 23, 2018

@cijothomas (Contributor) commented May 29, 2018

[image: CPU/memory graph]
Left an app on .NET Core 2.0 running for the last 3 days. Don't see any increase in memory/CPU.

@cijothomas (Contributor) commented May 29, 2018

[image: CPU/memory graph]
Similar for an app on .NET Core 2.1 for the last 3 days.

@cijothomas (Contributor) commented May 29, 2018

I'll set up something for Linux now.

@thohng commented May 30, 2018

[image: CPU usage graph]

We got the same issue on Ubuntu 16.04:

  • After upgrading to AspNetCore 2.1.0-rc1
  • Never had CPU issues with AspNetCore 2.0.x before
  • Not sure whether this is caused by the new AspNetCore 2.1.0-rc1 or the new ApplicationInsights 2.3.0-beta2
  • Internally, we see a few HttpClient failures a day, which might be related to dotnet/corefx#29285

Before Upgrade:

<PackageReference Include="Microsoft.ApplicationInsights.AspNetCore" Version="2.2.1" />
<PackageReference Include="Microsoft.AspNetCore.ApplicationInsights.HostingStartup" Version="2.0.2" />
<PackageReference Include="Microsoft.AspNetCore.Mvc" Version="2.0.3" />

After:

<PackageReference Include="Microsoft.ApplicationInsights.AspNetCore" Version="2.3.0-beta2" />
<PackageReference Include="Microsoft.AspNetCore.ApplicationInsights.HostingStartup" Version="2.1.0-rc1-final" />
<PackageReference Include="Microsoft.AspNetCore.Mvc" Version="2.1.0-rc1-final" />
@cijothomas (Contributor) commented May 30, 2018

Thanks for reporting this. We will set up a Linux test to do a deeper investigation soon.

@cijothomas (Contributor) commented May 31, 2018

Have an ASP.NET Core 2.1 app with AI SDK 2.3 running on Ubuntu 16.04. A script is pinging the app once every 10 seconds.
Will let this run for a few hours/days and check again.

@cijothomas (Contributor) commented Jun 1, 2018

6 hours in, no visible change in CPU load. CPU is constant at around 0.39%.
Load is 6 requests/sec.
Will leave this running overnight.

@thohng commented Jun 1, 2018

When we stop pinging, CPU usage no longer keeps increasing.

[image: CPU usage graph]

@mehmetilker commented Jun 8, 2018

The same thing is happening on Windows as well (Azure App Service).
Not only CPU but also the garbage collector is affected.
I upgraded my .NET Core 2.0 site to .NET Core 2.1 on the afternoon of Jun 3. And yesterday I removed the App Insights extension from my App Service (the NuGet package reference is still in my project).
You can see the changes in the attached images.

[image: CPU graph]
[image: GC generations graph]

@cijothomas (Contributor) commented Jun 8, 2018

Repro'd on a Linux VM. Need to investigate more.

cijothomas added this to the 2.4.0 milestone Jun 8, 2018

@flagbug commented Jun 12, 2018

@stevedesmond-ca @mehmetilker @thohng I've seen a similar thing since upgrading to .NET Core 2.1 and not only does our CPU usage increase over time, but our open handles increase dramatically since the upgrade:

[image: handle count graph]

Could you take a look at your handle count and confirm if you have the same issue or not?

@cwe1ss commented Jun 12, 2018

I'm recently having issues with increased memory usage. I checked it with dotMemory and I've seen many System.Net.Http related objects - e.g. open handlers that haven't been closed etc.

I'm not yet sure if it's Application Insights though (I will do some more tests next week) - but if it is, maybe this clue helps finding the issue.

@ByronAP commented Jun 18, 2018

I also have increasing CPU which seems to be from HTTP. It was not noticeable until I went to production which looked great after deployment, but a few hours later everything was redlined due to ever-increasing CPU demand as shown below. Unfortunately, this is a show stopper and is forcing me to rollback.

Runtime: .NET Core 2.1 Linux x64
Containers: yes (Docker)
OS: Ubuntu & CentOS

[image: CPU graph]

@flagbug commented Jun 21, 2018

We've downgraded Application Insights back to v2.2.1 and don't see the increasing CPU and memory usage anymore, so I can confirm that this is an issue with Application Insights.

@cijothomas Any chance this can be looked into? This basically makes version 2.3.0 unusable for production scenarios.

@cijothomas (Contributor) commented Jun 21, 2018

@flagbug Thanks for helping narrow down to 2.3.0. We will prioritize this investigation.

@cijothomas (Contributor) commented Jun 26, 2018

Related, maybe the cause as well: Microsoft/ApplicationInsights-dotnet#594

@nycdotnet commented Jun 26, 2018

Might want to see if HttpClientFactory solves the issue. Static HttpClient is not without its issues. https://blogs.msdn.microsoft.com/webdev/2018/02/28/asp-net-core-2-1-preview1-introducing-httpclient-factory/
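For anyone trying that route, registration in ASP.NET Core 2.1 looks roughly like this (a generic sketch of the factory pattern; the client name and base address are placeholders, and this is not how the AI SDK itself wires its client):

```csharp
// Startup.ConfigureServices: register a named client once. The factory
// pools and recycles the underlying HttpMessageHandlers instead of
// creating a fresh handler (and connection pool) per HttpClient.
public void ConfigureServices(IServiceCollection services)
{
    services.AddHttpClient("telemetry", c =>
    {
        c.BaseAddress = new Uri("https://dc.services.visualstudio.com/"); // illustrative
    });
    services.AddMvc();
}

// Consumers then take an IHttpClientFactory dependency and call
// factory.CreateClient("telemetry") per use; the HttpClient wrapper is
// cheap, while the handler underneath is shared and rotated.
```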

@cwe1ss commented Jun 29, 2018

I now also ran our application without App Insights for a few days and I don't see any stale Http-related objects in memory so I'm also "blaming" Application Insights for now.

Could you please prioritize this? This has been open for more than a month and a profiler that leaks memory is a "really bad thing" ™️...

/cc @SergeyKanzhelev @lmolkova (fyi, in case you didn't know about this yet)

@cijothomas (Contributor) commented Jun 29, 2018

@cwe1ss Were you facing issues in linux or windows or both?

@cwe1ss commented Jul 2, 2018

My setup: Windows Server 2016 on Azure (Standard_D11_v2, 2 cores 16GB RAM); Single-node Service Fabric cluster (development system); I'm running ~ 60 .NET Core apps (~ 30 ASP.NET Core apps, 30 console apps); services are communicating via the Service Fabric HTTP reverse proxy. As it's a dev system, there's very little load on the system.

Here are some more details:

I see a similar pattern in overall CPU usage to the other reporters:
[image: CPU usage graph]

I took a process dump of one ASP.NET Core Web API project. The process had been running for "2 days 20:49:36". dotMemory gives me the following overview. As you can see, the string value "dc.services.visualstudio.com" exists 23,218 times.
[image: dotMemory overview]

Looking at the type list shows that "System.Net.Http"-related objects make up the top of the list.
[image: dotMemory type list]

I do not see a similar pattern in an app where I've disabled Application Insights.

@benaadams commented Jul 2, 2018

@cwe1ss what are the back traces/call tree like on some of these objects?

11,613 object count ones are interesting; especially since it looks like there are 11,613 sets of HttpConnectionPoolManagers and 11,613 x List<HttpConnectionPool+CachedConnection>. That's not the number of cached connections; but the number of caches for connections.

HttpClientHandler probably would have the most informative backtraces? (as I assume all the others are allocated in it)

@cwe1ss commented Jul 2, 2018

@benaadams can I get this from the process dump via dotMemory / WinDbg or do I need to trace this with PerfView / dotTrace ?

@cwe1ss commented Jul 2, 2018

oh... I've seen the tabs in dotMemory... but they are empty and just say "No allocation info was collected during profiling". So I guess I need PerfView / dotTrace, right?!

@cwe1ss commented Jul 2, 2018

I clicked around a bit more in dotMemory (I don't know much about it, sorry) and I've seen that you can get more details about the object instances. Here's some screenshots for a random instance of System.Collections.Concurrent.ConcurrentDictionary+Tables<System.Net.Http.HttpConnectionPoolManager+HttpConnectionKey, System.Net.Http.HttpConnectionPool>.

[image: dotMemory instance details]

[image: shortest path to root]

[image: retention path]

Hope this helps?!

@benaadams commented Jul 2, 2018

I'm guessing the CPU increase comes from the constantly increasing number of Timers (from scanning the list of active timers to work out which ones to trigger), which your memory dump seems to indicate.

Timers also act as GC roots, which is why everything is probably hanging around, if their callbacks are capturing state.

Not sure why the timers are constantly increasing though...

@cijothomas (Contributor) commented Aug 6, 2018

@Meenakshise The issue we addressed here was specific to .NET Core apps only, so I don't think this will help you. Either way, we always recommend using the latest version to get all the bug fixes etc.
Could you try the latest stable/beta and report back if you still have the high CPU usage issue?

@cijothomas (Contributor) commented Aug 8, 2018

https://www.nuget.org/packages/Microsoft.ApplicationInsights.AspNetCore/2.4.0 - the stable release, which has the fix for this issue, is out. Please try it and let us know if any issues are found! Thank you everyone!

@thohng commented Aug 10, 2018

New Microsoft.ApplicationInsights.AspNetCore@2.4.0 is good from our side (after more than 24h running).

Thanks!

@vigneshmsft commented Aug 14, 2018

Updating to 2.4.0 has fixed the memory leak issue.

@IanKemp commented Aug 27, 2018

Any explanation of what the issue was?

@cijothomas (comment minimized)

@Svisstack commented Sep 9, 2018

I upgraded to 2.4.1 and it still has the leak, which grows over time.

The first screenshot shows 2 threads working constantly on nothing but the CPU leak. I found that the number of these threads depends on the uptime of the process, so the leak increases over time and was not fixed in 2.4.1.

The second screenshot shows the rate of increase, where 600% is the hardware limit; after some time the AspNetCore app is working only on the leak.

@cijothomas Please advise here.

[image: per-thread CPU usage]

[image: CPU increase graph]

@benaadams commented Sep 9, 2018

@Svisstack What version of .NET are you running on? High CPU usage in the HttpHeaders Dictionary`2.TryGetValue would suggest concurrent reads+writes to the header dictionary.

.NET Core 2.1 will throw an exception where an infinite loop would previously occur due to this: dotnet/corefx#28123

It could happen if FrameworkSqlEventListner (or its chain in your callstack) is fired after/concurrently with the request end (when the dictionary is cleared).

@Svisstack commented Sep 9, 2018

@benaadams I'm running this on .NET Framework 4.7.2 and compiling against 4.7.1.

If this is true, it looks like a major bug in App Insights; why would someone put a SpinLock-like reader around a dictionary item that might not exist?

I had an idea for a workaround that keeps App Insights: disable dependency tracking to avoid this issue, since it's related to the database dependency. However, it looks like there is no option to disable dependency tracking.

@benaadams commented Sep 9, 2018

why someone would put a SpinLock-like reader around the dictionary item which could not exist.

The dictionary for the header collection is reused for each request on the same connection, so at the end of the request it is cleared.

If that clearing happens while another thread is reading the header collection (e.g. it reads it outside of the normal request flow), then the read can go into an infinite loop as the arrays inside the dictionary are changed (Dictionaries are not thread-safe for a concurrent write with any other operation on them).

In .NET Core 2.1 it will detect the infinite loop being formed and throw an exception, though that won't fix the concurrent use.

@Svisstack commented Sep 9, 2018

@benaadams Thanks for the detailed explanation. I now understand what's happening, but I don't see a clear solution/fix, as it looks like the bug is actually in AspNetCore, which allows concurrent access to the dictionary instance, which is of course not thread-safe.

@benaadams commented Sep 9, 2018

looks like the bug is actually in the AspNetCore which allows concurrent access to the dictionary instance

Well... even if concurrent access were prevented (at a performance cost), it still wouldn't help, as the data it read could still be wrong.

If it's conflicting with .Clear(), that means there is a race condition and three outcomes could happen (even if concurrent access were prevented):

  1. It could read the right data (pre-clear)
  2. It could read no data (post-clear): wrong
  3. It could read headers from the next request: very wrong

At a guess, looking at the stack from dotTrace, it could be a SQL query that's started as part of the request but not awaited as part of the request, so the response is sent before the SQL completes. It could also be a Timer that polls periodically and is lazily started as part of a request.

When it completes, it would then be trying to log the shallow-copied (i.e. a reference to the) captured Request data as part of the Application Insights telemetry event, using mutable Request information that has since changed and is no longer valid for the event.

I'm not entirely sure how FrameworkSqlProcessing.OnEndExecuteCallback accesses the Headers collection, though it looks to be part of the DependencyTelemetry object.

public void OnEndExecuteCallback(long id, bool success, bool synchronous, int sqlExceptionNumber)
{
    DependencyCollectorEventSource.Log.EndCallbackCalled(id.ToString(CultureInfo.InvariantCulture));

    var telemetryTuple = this.TelemetryTable.Get(id);

    if (telemetryTuple == null)
    {
        DependencyCollectorEventSource.Log.EndCallbackWithNoBegin(id.ToString(CultureInfo.InvariantCulture));
        return;
    }

    if (!telemetryTuple.Item2)
    {
        this.TelemetryTable.Remove(id);
        var telemetry = telemetryTuple.Item1 as DependencyTelemetry;
        telemetry.Success = success;
        telemetry.ResultCode = sqlExceptionNumber != 0 ? sqlExceptionNumber.ToString(CultureInfo.InvariantCulture) : string.Empty;
        DependencyCollectorEventSource.Log.AutoTrackingDependencyItem(telemetry.Name);
        ClientServerDependencyTracker.EndTracking(this.telemetryClient, telemetry);
    }
}
@benaadams commented Sep 9, 2018

If it's a Timer being kicked off that then runs SQL queries, you will likely want to suspend ExecutionContext flow when creating that Timer, so that the HttpContext doesn't get captured and become forever accessible via IHttpContextAccessor when it shouldn't be (it's only valid during that request).

Using a pattern similar to dotnet/corefx#26075
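A sketch of that pattern (hypothetical helper name), suppressing ExecutionContext flow around Timer creation so the callback does not capture the ambient AsyncLocal state such as HttpContext:

```csharp
using System;
using System.Threading;

static class NonCapturingTimerSketch
{
    public static Timer Create(TimerCallback callback, object state, TimeSpan due, TimeSpan period)
    {
        bool restore = false;
        try
        {
            // Suppress flow so AsyncLocals (e.g. the request's HttpContext)
            // are not flowed into the timer's callback.
            if (!ExecutionContext.IsFlowSuppressed())
            {
                ExecutionContext.SuppressFlow();
                restore = true;
            }
            return new Timer(callback, state, due, period);
        }
        finally
        {
            // Restore flow on this thread; only the Timer keeps suppression.
            if (restore)
            {
                ExecutionContext.RestoreFlow();
            }
        }
    }
}
```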

@benaadams commented Sep 9, 2018

Or aspnet's NonCapturingTimer helper aspnet/Extensions#395

@Svisstack commented Sep 10, 2018

@benaadams Thanks, I actually found Tasks which can (and probably do) end after the request is finished, and they contain the SQL code. I will await them and report on the state of the issue after deploying the new version today.

Based on your very useful input, I think it's still an issue in AspNetCore, as it should not provide functions that can deliver possibly random data. If the Headers dictionary is shared between requests, then the function for accessing those headers should require some kind of request ID (a static long incremented on every new request), based on which the internal function could decide that the headers no longer exist and throw an exception instead of returning random data.

@benaadams commented Sep 10, 2018

I think it's still issue in the AspNetCore as it should not provide functions which delivering possible random data, if Headers dictionary is shared between requests

It's more that the Headers belong to the connection, as there can only be one active request on a connection at any time. If they are accessed via IHttpContextAccessor, there is a change in ASP.NET Core 2.2 to invalidate the HttpContext and return null in the manner you described: aspnet/HttpAbstractions#1021

However, while that may help with the CPU issue, it would also mean the telemetry from OnStart would be disconnected from OnEnd, if it's using anything in the headers or HttpContext to connect them (as OnEndXxx wouldn't have this information).

What should probably happen is that Application Insights, during OnStartXxx (which happens during the request), deep-copies the information it needs from the context/headers (e.g. keys and values), as the strings will remain valid, being immutable. Then OnEndXxx should use that same copied object to get the headers that were captured at the start of the event.

I'm not entirely sure how Application Insights is capturing/reading the headers so I can't provide code suggestions for the changes, alas...
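A generic sketch of the deep-copy idea (hypothetical names; the real SDK types and hooks differ): snapshot the header keys/values into a plain dictionary while the request is still in flight, and read only the snapshot when the event ends:

```csharp
using System.Collections.Generic;

static class HeaderSnapshotSketch
{
    // Hypothetical snapshot taken during OnStartXxx, while the request is
    // still in flight. Strings are immutable, so copying key/value pairs is
    // enough; the mutable header collection is never touched again at OnEnd.
    public static Dictionary<string, string> Snapshot(IEnumerable<KeyValuePair<string, string>> liveHeaders)
    {
        var copy = new Dictionary<string, string>();
        foreach (var kv in liveHeaders)
        {
            copy[kv.Key] = kv.Value;
        }
        return copy;
    }
}
```

OnEndXxx would then read from the returned dictionary, so clearing or reusing the live header collection after the response cannot corrupt the telemetry.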

@Svisstack commented Sep 11, 2018

@benaadams Removing the Tasks with SQL code that could end after the request solved the issue. Many thanks, I really appreciate your input here.

So is no action needed here, given there is a new interface that doesn't have this issue? I want to make sure we fix this for the community, or at least that the known Insights bugs document that no dependency should end after the request has already finished.
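The shape of that fix could look like this (a sketch with hypothetical names, not the actual code from the app): keep references to the background tasks and await them before the request completes, so no dependency telemetry fires after the response:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

static class RequestWorkSketch
{
    // Instead of fire-and-forget SQL work inside a request:
    //   _ = SaveAuditAsync(...);   // may finish after the response is sent
    // collect each started task and await them all before returning, so every
    // dependency call ends while the request (and its headers) are still valid.
    public static async Task CompleteAsync(List<Task> backgroundWork)
    {
        await Task.WhenAll(backgroundWork);
    }
}
```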

@benaadams commented Sep 11, 2018

Well, I do think AI should collect the information it needs at event start (i.e. keys and values for headers), then use that collected information at event end, rather than re-reading it from the HttpHeader collection; that should solve this issue more holistically.

@Svisstack commented Sep 11, 2018

@benaadams Can you then reopen this issue?

@benaadams commented Sep 11, 2018

Can you then reopen this issue?

Alas, no, I don't work for MS or have rights to this repo. Sorry

@Svisstack commented Sep 11, 2018

@cijothomas Can you reopen this? We still have a bug in Insights related to this problem.

@cijothomas (Contributor) commented Sep 11, 2018

@Svisstack Can you open this issue in https://github.com/Microsoft/ApplicationInsights-dotnet-server/issues?
From an initial look, the issue you are facing is from the DependencyCollection module, which is not in this repo. (This repo only has the ASP.NET Core extension methods and request collection.)

@Svisstack commented Sep 12, 2018

@cijothomas It's done.

@cijothomas (Contributor) commented Sep 14, 2018

@Svisstack Thanks. Will discuss on the other thread.

@turgayozgur commented Dec 13, 2018

(quoting @Svisstack's Sep 10 comment above)

Hi @Svisstack,

How did you solve it that way? Could you share any code details?

@Svisstack commented Dec 14, 2018

@turgay

This comment has been minimized.

Copy link

commented Dec 14, 2018

@Svisstack I guess you meant @turgayozgur but not me :)
