Use a more dependable policy for thread pool thread injection #1754
Reopening this ticket here per the suggestion of ericeil.
As of .NET 4.5 the thread pool injects at most one new thread per 0.5 s when it decides more threads are required. This is a problem when the number of required threads suddenly increases. Example: a corporate ASP.NET website is idle at night. At 8 AM, 1000 people log on and start working. If the app suddenly needs, say, 100 additional threads, it will take about 50 seconds to create all of them. Until then there will be serious delays and timeouts. It is possible to construct arbitrarily bad scenarios.
Problem statement: if thread pool load suddenly increases in IO-bound workloads, the pool is too slow to respond. This causes throughput and availability disruption. IO-bound workloads relying on synchronous IO are common, and sudden workload changes are also common. Sometimes the workload changes due to a problem outside of the developer's control: a web service timing out, or a database becoming slow.
Let me stress that this causes service interruption.
You can easily repro this yourself. Run a load test on an ASP.NET site with
In my benchmarks, starting and shutting down a thread takes around 1 ms in total. Threads are not really an expensive resource, so the thread pool should be a lot more eager to create and destroy them. A 500 ms delay to potentially save 1 ms is not a good trade-off.
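For what it's worth, a minimal sketch of how such a measurement might be reproduced (the absolute numbers will of course vary by OS, hardware, and runtime version):

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class ThreadCostBenchmark
{
    static void Main()
    {
        const int iterations = 1000;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            var t = new Thread(() => { });
            t.Start();
            t.Join(); // wait for the thread to fully exit before starting the next one
        }
        sw.Stop();
        Console.WriteLine(
            $"Average start+shutdown cost: {sw.Elapsed.TotalMilliseconds / iterations:F2} ms");
    }
}
```

This measures create-start-exit round trips in isolation; under memory pressure or with large thread stacks the cost can be higher.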
Easy fix: I propose lowering the injection delay to 100ms. This reduces the problem given above by 5x. Ideally, the rate would be configurable. The shutdown delay could be lowered from 30s as well. Keeping an idle thread for 30000ms to save 1ms seems excessive. In this ticket I'm not that concerned with retiring threads, though.
Smarter, riskier fix: The delay could depend on the number of threads in existence, the core count and the perceived pressure on the thread-pool. The injection rate could be:
Note that the Windows kernel's thread pool has injection heuristics that apply more back pressure the more threads are in existence. This seems to work. Source: https://channel9.msdn.com/Shows/Going+Deep/Inside-Windows-8-Pedro-Teixeira-Thread-pool
The problem exists at all delays:
A 10 ms delay is very unrealistic to avoid in a production workload: almost no dependency (e.g. a web service or database call) responds in under 10 ms.
Please also see the discussion at the original location which is now closed.
In Java there is a mechanism to inform the thread pool that one of its threads is about to block. This allows the thread pool to create a new thread immediately if needed.
Something similar might be doable in .NET; it could be done transparently in WaitHandle.WaitOne and all equivalent blocking calls. I think there is already something like this with SynchronizationContext.
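To make the idea concrete, here is a rough sketch of what such a cooperative blocking notification could look like, similar in spirit to Java's ForkJoinPool.ManagedBlocker. Everything here (BlockingAwareWait, the counter) is invented for illustration; no such API exists in the BCL today:

```csharp
using System;
using System.Threading;

// Hypothetical wrapper: a wait that tells an injection heuristic when a
// thread is about to block and when it unblocks.
static class BlockingAwareWait
{
    // Number of threads currently blocked; a thread-injection heuristic
    // could read this to decide whether to compensate with a new thread.
    private static int s_blockedCount;

    public static int BlockedCount => Volatile.Read(ref s_blockedCount);

    public static bool WaitOne(WaitHandle handle, int millisecondsTimeout)
    {
        Interlocked.Increment(ref s_blockedCount);   // "about to block"
        try
        {
            return handle.WaitOne(millisecondsTimeout);
        }
        finally
        {
            Interlocked.Decrement(ref s_blockedCount); // "unblocked"
        }
    }
}
```

If the runtime did this inside WaitHandle.WaitOne itself, existing synchronous code would benefit without any changes.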
In ASP.NET under an IIS worker process, since the CLR is hosted, the host could treat the current HTTP request queues as pressure to create threads. Even with the 500 ms checking interval, it could create more than a single thread at a time, such as the minimum of 1) the number of queued requests and 2) the number of threads needed to reach the current MaxWorkerThreads configuration.
If ASP.NET Core under an IIS worker process runs in a specialized CLR host as well, that host could do the same.
But in essence, setting MinWorkerThreads to a high enough number has a similar effect, since it tells the CLR host how many threads should be created on demand (read: instantaneously) before falling back to the 1-thread-per-500-ms adjusting algorithm. Setting a high minimum number of threads doesn't actually mean creating them all upfront or keeping them all alive.
This also works when running a standalone ASP.NET Core application, i.e. without IIS.
If you have a requirement on the number of concurrent requests, you should probably provide a high enough minimum for now.
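For reference, the minimum described above can be raised programmatically via ThreadPool.SetMinThreads. A minimal sketch (the value 200 is an arbitrary illustrative number; size it to your expected peak concurrency):

```csharp
using System;
using System.Threading;

class Program
{
    static void Main()
    {
        // Read the current minimums (the worker default is typically the processor count).
        ThreadPool.GetMinThreads(out int worker, out int iocp);
        Console.WriteLine($"Defaults: worker={worker}, iocp={iocp}");

        // Raise the floor: up to 200 worker threads can then be created on demand,
        // without waiting for the 1-thread-per-500-ms starvation heuristic.
        if (!ThreadPool.SetMinThreads(200, iocp))
            Console.WriteLine("SetMinThreads rejected the values (e.g. above the max).");
    }
}
```

Note this only removes the injection delay below the minimum; it does not change the heuristic above it.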
However, I agree that the thread pool algorithm could be improved. It could be made configurable or replaceable, or the thread pool could be exposed as an API, so that we can create our own thread pools and task schedulers that use them, replace the default thread pool at will, run certain tasks or requests in fixed thread pools, etc.
As @ericeil mentioned in dotnet/corefx#2329, it would be good to investigate switching to the Windows thread pool, though we need a good solution outside Windows as well.
IO also can be blocking. Nothing wrong with doing synchronous IO.
Not sure how blocking would play into the heuristics even if the exact number of blocked threads was known. Just releasing another thread for each one blocked could easily cause an explosion of threads.
Knowing the number of blocked threads might help the heuristics but this can never fully resolve the problems detailed in this issue.
Suppose we immediately spin off a new thread to compensate for a blocked thread. When blocking ends and the thread calls the API to indicate that blocking has ended, instead of allowing that thread to run, we could block it and put it in a separate queue that is prioritized when a thread pool thread becomes available. That way, it would not cause too many threads to be running after blocking operations complete. Realistically, there would need to be some configurable limit on how many threads can be created like this, and perhaps also an increasing delay before compensating with a new thread, based on the number of active threads and processors, to prevent creating too many threads unnecessarily.
The 0.5 sec heuristic you mentioned (starvation heuristic) kicks in when all of the thread pool threads have not completed their work in that amount of time. That could be due to blocking work or long CPU-bound work. With a heuristic that compensates for blocking threads much quicker (immediately for all practical purposes), the starvation heuristic would not kick in due to blocking, only due to long CPU-bound work (but that's a separate issue). Why would this not resolve the problems you mentioned?
This would immediately create the maximum number of threads with the following code:
Enumerable.Range(0, 100000).Select(_ => Task.Run(() => File.ReadAllBytes("..."))).ToList();
Because all those tasks immediately block, causing another thread to be spawned.
Holding a thread at the end of a blocking operation can lead to deadlocks or catastrophically low throughput if that thread was holding any locks. It can dramatically alter the perceived latency of IO operations.
But I agree that using the number of blocked threads in the thread injection heuristics could improve them. It seems that injection should happen more liberally if there are many threads blocked on IO.
I think the bias should be towards over-subscribing resources instead of under-subscribing. 10x too few threads means a 10x loss in throughput; 10x too many threads usually means far less than a 10x loss.
What I recommend at the moment is to target the "Easy fix" immediately:
And then possibly pursue a more sophisticated solution.
There would need to be some limit on how many threads are added to compensate for blocked threads, such that this type of worst-case scenario caps out the number of threads at a reasonable amount. There are several pieces of information available to make reasonable heuristics for this case.
Yeah, that's a good point, especially if many threads are returning from the blocked state and the queue gets backed up. Another issue is that when there are both CPU-bound and IO-bound work items, it could artificially slow down the IO-bound work. I agree that we should let some threads continue running after unblocking up to some point and then queue them up, but it would not be reasonable to let all of them run immediately. There is a lot of information we can use to tune this well, though it won't be perfect since the pool doesn't have control over scheduling. For instance, we can look at the duration and time spent unblocked for a particular work item, and make decisions based on that. In any case, it should be significantly better than what we have now.
What I suggested would not undersubscribe resources; it just may not be fair in allocating resources, and that can be tuned.
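The "let some unblocked threads run, queue the rest" idea could be sketched roughly as a permit gate that unblocking threads pass through. UnblockGate, its permit count, and the usage pattern are all invented here purely for illustration, not a proposal for the actual runtime implementation:

```csharp
using System;
using System.Threading;

// Hypothetical sketch: threads returning from a blocking operation acquire a
// run permit; if the CPUs are already saturated, they park here instead of
// oversubscribing, and resume in roughly FIFO order as permits free up.
static class UnblockGate
{
    private static readonly SemaphoreSlim s_runPermits =
        new SemaphoreSlim(Environment.ProcessorCount, Environment.ProcessorCount);

    public static IDisposable EnterRunning()
    {
        s_runPermits.Wait();        // park until a permit is available
        return new Releaser();
    }

    private sealed class Releaser : IDisposable
    {
        public void Dispose() => s_runPermits.Release();
    }
}
```

A thread coming out of a blocking call would wrap its continuation in `using (UnblockGate.EnterRunning()) { ... }`. As noted above, holding a thread like this is risky if it owns locks, so a real heuristic would need limits and escape hatches.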
The heuristic is currently unbounded in the number of threads it would create (max threads by default is 32K). So realistically, the delay is the bound. By reducing the delay, more apps would be subject to this heuristic, and the more the delay is reduced, the more it can get out of control. It could create more threads than necessary, creating inefficiencies and causing higher memory usage, and it would do so much faster than before due to the shorter delay.
So I don't think it's as simple as reducing the delay. Adding a bound to this heuristic would be arbitrary. Some apps may benefit from having many threads, and some may benefit from having few, and the information needed to create a reasonable bound is not available. Choosing thresholds for adjusting the rate of thread injection would also be arbitrary. This heuristic is inherently flawed, tweaking it may help some apps but it may hurt others.
For instance, an app that has a work item duration of 150 ms would not currently hit the starvation heuristic, and if all of those are CPU-bound work items, it would be performing well with the right number of threads. Reducing the delay to 100 ms would cause the starvation heuristic to kick in and 10 threads would be created per second in this app as long as there is work.
Maybe a tunable setting for app developers to alter the injection speed? It seems we need two settings:
For all productions apps that I have ever worked on I'd prefer the new lines over the red line. The green one is what I would use in all my web apps.
Configuring the minimum and maximum worker thread count produces an approximation of the green line above, though not as nice:
I'm not sure that it is always beneficial to have a curve. It should get to the right number of threads quickly. The type of work could also be changing, and it would have to adjust the thread count based on that. When the type of work is stable, configuring the min & max can produce decent results.