Threadpool doesn't scale optimally at low thread count #46077

Open
kevingosse opened this issue Dec 15, 2020 · 4 comments

kevingosse commented Dec 15, 2020

I ran into a weird performance issue where a single idle threadpool thread significantly reduced the throughput of the application.

The minimal repro is a simple ASP.NET Core MVC app (dotnet new mvc) running on a 2-core instance (Amazon c5.large). When load-testing the /home endpoint with enough traffic to overload the server, I get the following results:

  • Average CPU usage: 98%
  • Request Rate (Count / 10 sec): ~110,000
  • ThreadPool Thread Count: oscillates between 2 and 3

Now if I modify the application and use one threadpool worker for some idle work:

public class Startup
{
    // ...
    public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
    {
        // ...
        Task.Run(() => DoStuff());
    }

    // Occupies one threadpool worker indefinitely without using any CPU.
    private static void DoStuff()
    {
        while (true) { System.Threading.Thread.Sleep(10000); }
    }
}

I then get the following results:

  • Average CPU usage: 75%
  • Request Rate (Count / 10 sec): ~99,000
  • ThreadPool Thread Count: oscillates between 2 and 3

The size of the threadpool didn't change despite the wasted worker, and now the server is unable to fully use the CPU to process the requests.

I know that long-running work shouldn't be pushed to the threadpool, but that's a common mistake and I was expecting the number of workers to be adjusted accordingly.
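For reference, here is a minimal sketch of two common ways to keep such a loop off the pool's workers. The `IdleWork` class and its method names are hypothetical and not part of the repro. The first option relies on `TaskCreationOptions.LongRunning`, which is only a hint to the scheduler (the default scheduler currently responds by using a dedicated thread); the second waits asynchronously so no thread is held during the delay.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class IdleWork
{
    // Option 1: mark the work as long-running so the scheduler can run it
    // outside the regular pool of workers (a hint, not a guarantee).
    public static Task StartOnDedicatedThread(CancellationToken token) =>
        Task.Factory.StartNew(() =>
        {
            while (!token.IsCancellationRequested)
            {
                // Blocks this thread for up to 10 seconds, or until cancellation.
                token.WaitHandle.WaitOne(TimeSpan.FromSeconds(10));
            }
        }, CancellationToken.None, TaskCreationOptions.LongRunning, TaskScheduler.Default);

    // Option 2: wait asynchronously; no thread is blocked between iterations.
    public static async Task RunWithoutBlockingAsync(CancellationToken token)
    {
        try
        {
            while (true)
            {
                await Task.Delay(TimeSpan.FromSeconds(10), token);
            }
        }
        catch (OperationCanceledException)
        {
            // Cancellation is the expected way to stop the loop.
        }
    }
}
```

Either method could replace the `Task.Run(() => DoStuff())` call, passing for example `IHostApplicationLifetime.ApplicationStopping` as the token.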

Metrics are measured using dotnet-counters: dotnet-counters monitor System.Runtime Microsoft.AspNetCore.Hosting --refresh-interval 10 -p 23328
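The thread count can also be sampled from inside the process. The sketch below is an assumption about how one might cross-check the dotnet-counters numbers, not something used in the repro; the `ThreadPoolSnapshot` class name is hypothetical, while the `ThreadPool` members it reads are available from .NET Core 3.0 onwards.

```csharp
using System;
using System.Threading;

public static class ThreadPoolSnapshot
{
    // Returns a one-line summary of the current threadpool state.
    public static string Describe()
    {
        ThreadPool.GetMinThreads(out int minWorkers, out _);
        ThreadPool.GetMaxThreads(out int maxWorkers, out _);

        return $"threads={ThreadPool.ThreadCount} " +
               $"pending={ThreadPool.PendingWorkItemCount} " +
               $"completed={ThreadPool.CompletedWorkItemCount} " +
               $"minWorkers={minWorkers} maxWorkers={maxWorkers}";
    }
}
```

Calling `ThreadPoolSnapshot.Describe()` from a timer or a diagnostic endpoint gives a rough view of whether the pool grows while a worker is blocked.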

I initially ran into this with .NET Core 3.1.9, but could reproduce with .NET 5.0.1.

kevingosse added the tenet-performance label Dec 15, 2020
Dotnet-GitSync-Bot added the area-System.Threading and untriaged labels Dec 15, 2020

benaadams commented

/cc @kouvel


kouvel commented Dec 15, 2020

One issue I'm aware of in CPU-limited environments is that hill climbing does not work very well in low-thread-count situations and can be more intrusive than helpful. It can be disabled by starting the process with the environment variable COMPlus_HillClimbing_Disable=1. Using up a thread pool thread for long-running work in a CPU-limited environment can have more dire consequences, since there are few threads available to begin with, and with hill climbing disabled it could be a bit worse. For instance, in the specific case above with hill climbing disabled, since 1 of the 2 worker threads is blocked, only one would be performing work, and that may also prevent the starvation heuristic from kicking in (it checks every 500 ms whether all thread pool threads are stuck). However, in practice that is relatively rare: if one worker thread gets blocked frequently, usually all of them will be fairly quickly. Can you try disabling hill climbing with and without the long-running work item and see how the perf changes?
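As an illustration of that experiment, the sketch below builds the start-up settings for a child process with the variable set. Only the variable name COMPlus_HillClimbing_Disable comes from the comment above; the `HillClimbingToggle` class and the `appDllPath` parameter are hypothetical, and setting the variable in the shell before launching the app would work just as well.

```csharp
using System.Diagnostics;

public static class HillClimbingToggle
{
    // Builds start info for "dotnet <appDllPath>", optionally disabling hill climbing.
    public static ProcessStartInfo BuildStartInfo(string appDllPath, bool disableHillClimbing)
    {
        // UseShellExecute must be false for custom environment variables to apply.
        var startInfo = new ProcessStartInfo("dotnet") { UseShellExecute = false };
        startInfo.ArgumentList.Add(appDllPath);

        if (disableHillClimbing)
        {
            startInfo.Environment["COMPlus_HillClimbing_Disable"] = "1";
        }

        return startInfo;
    }
}
```

Passing the result to `Process.Start` once with `disableHillClimbing` set to true and once with false gives the two configurations to compare.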

kevingosse commented

COMPlus_HillClimbing_Disable=1 makes it worse, which is somewhat consistent with your analysis.

With the idle thread and hill climbing disabled, CPU is at 50% and the server processes 84,000 requests per 10 seconds (down from 99,000 requests with hill climbing). Without the idle thread, CPU is at 98% and the server processes 105,000 requests per 10 seconds (down from 110,000 requests with hill climbing).
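To put the four measurements on one scale, the small sketch below expresses each as a percentage drop from the 110,000 requests per 10 seconds baseline (no idle thread, hill climbing enabled). The `Throughput` helper is hypothetical; the request counts are the ones reported in this thread.

```csharp
using System;

public static class Throughput
{
    // Percentage by which `measured` falls short of `baseline`.
    public static double DropPercent(double baseline, double measured) =>
        (baseline - measured) / baseline * 100.0;
}
```

With these numbers, `Throughput.DropPercent(110000, 99000)` is 10%, `Throughput.DropPercent(110000, 105000)` is about 4.5%, and `Throughput.DropPercent(110000, 84000)` is about 23.6%.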

kouvel removed the untriaged label Dec 16, 2020
kouvel added this to the 6.0.0 milestone Dec 16, 2020

kouvel commented Dec 16, 2020

Sounds like hill climbing is not compensating for the blocked thread (at least soon enough). I'll try to get a repro. It should be possible and perhaps better to have the starvation heuristic track starvation on a per-thread basis rather than for the whole pool of worker threads.

kouvel modified the milestones: 6.0.0, 7.0.0 Jul 27, 2021
mangod9 modified the milestones: 7.0.0, Future Jul 19, 2022