
Higher CPU usage since update from 5.0.13 to 6.0.1+ #66827

Closed
gregolsky opened this issue Mar 18, 2022 · 5 comments
gregolsky commented Mar 18, 2022

Description

We manage a number of B1ls instances on Azure, and ever since we updated our application to .NET 6.0.1 (still happening on 6.0.3) they started to run out of CPU credits. Even completely idle instances slowly consumed the CPU credit pool - on a B1ls, credits start draining once CPU usage exceeds the 5% baseline.

Regression?

Before the update to 6.0.1 our application was running on .NET 5.0.13 without this issue.

Data

Azure instance type in question: B1ls - 1 vCPU, 0.5 GB RAM, 5% CPU usage baseline
https://azure.microsoft.com/en-us/updates/b-series-update-b1ls-is-now-available/

Here's a screenshot from our monitoring showing the CPU credits slowly going down on those idle servers:

Analysis

We tried to narrow down the issue by using our tooling that grabs stack traces, which showed LowLevelLIFOSemaphore generating the CPU load:


Moving further, we used perf to see what has been using the CPU. We compared versions 5.3.2 (running .NET 5.0.13) and 5.3.101 (running .NET 6.0.1) of our application. Attached are the interactive flamegraphs made on those versions.

The application running on .NET 6.0 is taking approximately 1% more CPU than on 5.0, which makes a big difference for B1ls burstable machines running 1 vCPU. The difference seems to lie in .NET ThreadPool threads and their usage of sched_yield syscalls.


flamegraphs.zip
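The perf-based workflow described above can be sketched roughly as follows (the PID variable, sampling duration, and the assumption that Brendan Gregg's FlameGraph scripts are on PATH are illustrative, not taken from the issue):

```shell
# Sample on-CPU call stacks of the running process for 60 seconds.
# "$APP_PID" is a placeholder for the application's process ID.
perf record -g -p "$APP_PID" -- sleep 60

# Fold the recorded stacks and render an interactive SVG flamegraph,
# assuming stackcollapse-perf.pl and flamegraph.pl from the FlameGraph
# repository are available on PATH.
perf script | stackcollapse-perf.pl | flamegraph.pl > flamegraph.svg
```

Comparing the resulting SVGs from the two application versions side by side is what surfaces the extra sched_yield time under the thread-pool frames.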

@gregolsky gregolsky added the tenet-performance Performance related issue label Mar 18, 2022

ghost commented Mar 18, 2022

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: gregolsky
Assignees: -
Labels: area-System.Threading, tenet-performance
Milestone: -

janvorli (Member) commented

cc: @kouvel as the comment above says:

The difference seems to lie in .NET ThreadPool threads and their usage of sched_yield syscalls.

@kouvel kouvel added this to the 7.0.0 milestone Mar 18, 2022

kouvel commented Mar 18, 2022

There's a config setting to reduce/disable the spin-waiting in LowLevelLifoSemaphore::Wait. The following disables the spin-waiting through an environment variable that takes effect for apps launched afterwards (there are also other ways to configure it):

  • COMPlus_ThreadPool_UnfairSemaphoreSpinLimit=0

A tradeoff of applying the above config would be that if the service gets frequent moderately-sized bursts of work, the perf may end up being lower. If you expect the service to either be heavily loaded or barely loaded, and want to limit CPU usage in those cases, the above config may be appropriate. There may also be other tradeoffs, I'd suggest checking the perf to see how it works out.
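A minimal sketch of applying the setting at launch (the app name is a placeholder; in .NET 6 the `DOTNET_` prefix is also accepted alongside the legacy `COMPlus_` prefix):

```shell
# Disable thread-pool semaphore spin-waiting for this launch only.
# "myapp.dll" is a placeholder for the actual application.
export COMPlus_ThreadPool_UnfairSemaphoreSpinLimit=0
dotnet myapp.dll

# Equivalent one-shot form using the .NET 6+ prefix:
# DOTNET_ThreadPool_UnfairSemaphoreSpinLimit=0 dotnet myapp.dll
```

Because the variable is read at startup, already-running processes are unaffected; measure CPU usage and throughput before and after to confirm the tradeoff is acceptable for your workload.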

There has not been any difference in the actual spin-waiting heuristics between .NET 5 and 6, but there may be some natural timing changes that end up contributing to the difference. I can't guess at what that difference might be.

gregolsky (Author) commented

@kouvel Thank you. I will experiment with different settings and see if it gets better.
I found that the default value is 70 here:
https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs#L15


kouvel commented Mar 21, 2022

The value specified for the config variable above is interpreted as hexadecimal.
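In other words, an environment-variable value of 70 would be read as 0x70 = 112, not decimal 70; to request a decimal limit of 70 you would pass 46 (since 0x46 = 70). A quick sketch of the conversion:

```shell
# "70" interpreted as hexadecimal is 112 in decimal:
printf '%d\n' 0x70    # prints 112

# To get a decimal limit of 70, pass "46" (0x46 = 70):
printf '%d\n' 0x46    # prints 70
```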

@kouvel kouvel closed this as completed Jul 8, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Aug 7, 2022