ParallelOptions.MaxDegreeOfParallelism documentation, is the given advice correct? #72981
Tagging subscribers to this area: @dotnet/area-system-linq-parallel

Issue Details:

Hi! The documentation of the property `ParallelOptions.MaxDegreeOfParallelism` gives this advice in the Remarks section: "Generally, you do not need to modify this setting." Then it goes on describing three specific scenarios where this general advice should not be followed: when you know that a particular algorithm won't scale beyond a certain number of cores, when you're running multiple algorithms concurrently, and when the thread pool's heuristics are unable to determine the right number of threads to use.

My question is: Does this general advice still stand in the modern era of .NET applications, which are increasingly async-enabled and concurrent? Or is it a relic from the early days of Task-based parallelism (around 2010, if I am not mistaken) that might need to be revised?

I am asking because personally, for quite a long time, I have been advising people against relying on the default, and instead to specify the `MaxDegreeOfParallelism` explicitly in all cases (recent example). Here is my reasoning for giving this advice: the newer APIs, like `Parallel.ForEachAsync`, use the `Environment.ProcessorCount` as the default, not unlimited (`-1`), which points to `MaxDegreeOfParallelism = Environment.ProcessorCount` as the general advice.

Nevertheless, not all people agree with me. There are experienced .NET professionals (including MVPs) who think that it's wiser to let the `Parallel.For`/`Parallel.ForEach`/`Parallel.Invoke` methods adjust the number of worker tasks based on load, by using their internal heuristics, and that meddling with the `MaxDegreeOfParallelism` option is more likely to cause harm than good.

So I thought that asking Microsoft's opinion on this subject can't be a bad idea. Which is closer to the truth?

Thanks!
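For concreteness, here is a minimal sketch of the explicit configuration advocated above (the range and the empty body are placeholders, not part of the original issue):

ParallelOptions options = new()
{
    MaxDegreeOfParallelism = Environment.ProcessorCount
};
Parallel.ForEach(Enumerable.Range(1, 100), options, n =>
{
    // CPU-bound or blocking work for each item goes here.
});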
Even if that's true, that doesn't make the guidance incorrect.

PLINQ was introduced at the same time, not later, and in general it requires a fixed number of workers because of operations that need to synchronize and exchange with each other. It's not overriding the degree because doing so is better for scale, but rather because doing so is necessary for its algorithms. Similarly, ForEachAsync has to set an explicit value, as by its nature it's generally not throttled based on CPU; if it had an unlimited degree by default, it would very frequently end up processing every input element concurrently. We could have kept the unlimited default, but then we would have had to come up with another heuristic and mechanism for automatic throttling.
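To make the ForEachAsync point concrete, here is a hedged illustration (not part of the original comment): with default options the loop below is throttled to roughly `Environment.ProcessorCount` concurrent iterations, while `-1` removes the throttle, and since the body merely awaits, all 1000 iterations can end up in flight at once.

await Parallel.ForEachAsync(
    Enumerable.Range(1, 1000),
    new ParallelOptions { MaxDegreeOfParallelism = -1 }, // unlimited degree of parallelism
    async (n, ct) => await Task.Delay(1000, ct)); // no CPU work, so nothing throttles the loop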
That'd be like saying it'd be hard to go wrong by setting the thread pool's min and max to ProcessorCount by default. And that's not the case.

I've not seen evidence to say otherwise. As always, if setting it works well for your needs, go for it. If not, don't.
@stephentoub thanks for the quick feedback as usual! The reason that the current advice worries me is the following:

Following the current general advice in the documentation, and not configuring the `MaxDegreeOfParallelism`, results in the `ThreadPool` getting saturated.
Here is a minimal demonstration that an unconfigured `Parallel.ForEach` saturates the `ThreadPool`:

using System.Runtime.InteropServices;

Console.WriteLine(RuntimeInformation.FrameworkDescription);
ThreadPool.SetMinThreads(100, 100);
object locker = new();
int concurrencyCounter = 0;
int maxConcurrency = 0;
Parallel.ForEach(Enumerable.Range(1, 200), n =>
{
int concurrency = Interlocked.Increment(ref concurrencyCounter);
lock (locker) maxConcurrency = Math.Max(maxConcurrency, concurrency);
Thread.Sleep(2000); // Simulate a blocking I/O-bound operation
Interlocked.Decrement(ref concurrencyCounter);
});
Console.WriteLine($"Maximum concurrency: {maxConcurrency}"); Output:
The parallel loop starts by using the 100 available `ThreadPool` threads, and the maximum concurrency keeps climbing as the `ThreadPool` injects more threads.
It doesn't require a repro: that is the design goal and purpose when there's no other work to be done. But note that the additional work items ForEach uses are queued locally, not globally, which means other threads would need to exhaust the global queue before assisting this loop. I'm not sure what the purpose of this Issue is. If you want to set the value in your code to ProcessorCount, go for it.
@stephentoub could you expand on this a bit more? Maybe this is what I am missing. How can this difference between local vs global queuing be demonstrated in practice?

The issue is whether the general advice offered in the documentation is valid or outdated. It's not about my code. It's about the advice that I am giving to other people, which is not aligned with the official documentation. I might have to reconsider my advice, or I might have to amend it with an explicit warning that it deviates from the official guidelines.
When thread pool worker threads go in search of the next thing to process, they first look at their own queue (which only they can queue to), then to the global queue, and only if that's empty do they try to steal from other threads' queues. As long as there's work to be processed in a thread's local queue or in the global queue, it won't pick up the extra work item created by the loop in another thread's local queue.
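A simplified, hedged model of that dispatch order (an illustration only; the types and names are invented and do not match the real `ThreadPool` internals):

using System.Collections.Concurrent;
using System.Collections.Generic;

class WorkStealingModel
{
    public readonly ConcurrentQueue<Action> GlobalQueue = new();
    public readonly List<ConcurrentStack<Action>> LocalQueues = new(); // one per worker thread

    // What worker i runs next: its own local queue, then the global queue, then stealing.
    public Action? GetNextWorkItem(int i)
    {
        if (LocalQueues[i].TryPop(out var item)) return item; // 1. own local queue (LIFO)
        if (GlobalQueue.TryDequeue(out item)) return item;    // 2. shared global queue (FIFO)
        for (int j = 0; j < LocalQueues.Count; j++)           // 3. only then: steal from a peer
            if (j != i && LocalQueues[j].TryPop(out item)) return item;
        return null;
    }
}

(In the real pool, stealing takes items from the opposite end of the victim's queue; a stack is used here only to keep the model short.)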
@stephentoub does this have any practical effect on the behavior of a .NET program? What would be the visible behavioral difference if the worker threads didn't have their own local queues, and instead were using only a shared global queue?

Btw I think that I should demonstrate the effect of an unconfigured `Parallel.ForEach` on other concurrent operations in the same program:

using System.Diagnostics;
using System.Runtime.InteropServices;

Console.WriteLine(RuntimeInformation.FrameworkDescription);
ThreadPool.SetMinThreads(100, 100);
Task task = Task.Run(async () =>
{
Stopwatch stopwatch = new();
for (int i = 0; i < 10; i++)
{
stopwatch.Restart();
await Task.Delay(100);
stopwatch.Stop();
Console.WriteLine($"Duration of await Task.Delay(100): {stopwatch.ElapsedMilliseconds} msec");
}
});
Stopwatch timerStopwatch = Stopwatch.StartNew();
var timer = new Timer(_ =>
{
timerStopwatch.Stop();
Console.WriteLine($"Timer of 500 msec ticked after: {timerStopwatch.ElapsedMilliseconds} msec");
}, null, 500, -1);
Parallel.ForEach(Enumerable.Range(1, 200), n =>
{
Thread.Sleep(2000); // Simulate a blocking I/O-bound operation
});
GC.KeepAlive(timer);
task.Wait(); // make sure all ten Task.Delay measurements have been printed

Output:
This is not good behavior IMHO. These are the kinds of observations that make me reluctant to follow and promote the general advice offered in the documentation.
That code does what you tell it to do. So when you say it's not good behavior, you mean your code isn't written correctly. The way I'd write (and have written) this kind of code is to use separate thread pools: one thread pool for I/O-bound operations (your timer would get queued here), one thread pool for CPU-bound operations (your plinq thing would get queued here), and the system thread pool, which is used by system components like ASP.NET Core listeners. Also this:

Thread.Sleep(2000); // Simulate a blocking I/O-bound operation

If that is indeed simulating an I/O-bound operation, you should have used async operations, and then the blocking wouldn't tie up threads at all.
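A minimal sketch of the "separate thread pools" idea, assuming a custom `TaskScheduler` that owns its own dedicated threads (the type and its names are mine, not Blindy's actual implementation, and this is not a production-grade pool):

using System.Collections.Concurrent;

sealed class DedicatedThreadsTaskScheduler : TaskScheduler, IDisposable
{
    private readonly BlockingCollection<Task> _queue = new();

    public DedicatedThreadsTaskScheduler(int threadCount)
    {
        for (int i = 0; i < threadCount; i++)
        {
            var thread = new Thread(() =>
            {
                // Each dedicated thread drains the shared queue until disposal.
                foreach (Task task in _queue.GetConsumingEnumerable())
                    TryExecuteTask(task);
            });
            thread.IsBackground = true;
            thread.Start();
        }
    }

    protected override void QueueTask(Task task) => _queue.Add(task);

    // Never inline: keep all the work on the dedicated threads.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued) => false;

    protected override IEnumerable<Task> GetScheduledTasks() => _queue.ToArray();

    public void Dispose() => _queue.CompleteAdding();
}

`Parallel.ForEach` accepts the scheduler through `ParallelOptions.TaskScheduler`, so the blocking loop from the demo above could run on its own threads and leave the shared `ThreadPool` free for the timer and the async task:

using var loopPool = new DedicatedThreadsTaskScheduler(Environment.ProcessorCount);
ParallelOptions options = new()
{
    TaskScheduler = loopPool,
    MaxDegreeOfParallelism = Environment.ProcessorCount // cap the replicas to the dedicated threads
};
Parallel.ForEach(Enumerable.Range(1, 200), options, n => Thread.Sleep(2000));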
Yes. If it didn't, we wouldn't have the mechanism.
The primary purpose of the local queue is to allow a work item to be decomposed into multiple pieces such that other threads can help out if available. It enables a thread to receive assistance if others are available but not force itself onto them.
This is an unrepresentative repro in multiple ways. First, Thread.Sleep is an inexhaustible resource, provides zero backpressure, and gives zero information to the pool's scaling heuristics. Second, it's a heterogeneous workload, mixing asynchronous I/O that's apparently latency-sensitive with a Parallel.ForEach that's not only long-running but also involves individual long iterations that are purely sleeping; this is not the predominant use of ForEach. Third, this falls exactly into the second exemption already included in the docs: "when you're running multiple algorithms concurrently".

If the default isn't good for your workload, please choose a better value; it's been configurable since the day it was exposed. If the default doesn't work for customers you're consulting with and their workloads, by all means, recommend whatever value you determine is better based on sufficient profiling and measurement of their app (or maybe they shouldn't be using ForEach in the first place). If you believe there's a category of exemption missing in the docs, please feel free to open a suggestion in the docs repo.

If you're asking that the default be changed, I don't see that happening; not only is that guaranteed to result in performance regressions (and not just for purely I/O-bound operations), it also has a reasonable chance of introducing deadlocks, if one iteration ends up depending on another executing. I'm on vacation and am going to stop responding to this thread now. Thanks.
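To make the deadlock point concrete, here is a contrived sketch of my own (not from the comment): iteration "A" waits on a signal that only iteration "B" can produce, so a degree of parallelism capped at 1 hangs forever, whereas an uncapped loop can grow until "B" gets to run.

var gate = new SemaphoreSlim(0);
var options = new ParallelOptions { MaxDegreeOfParallelism = 1 };
Parallel.ForEach(new[] { "A", "B" }, options, item =>
{
    if (item == "A")
        gate.Wait();    // "A" blocks until "B" releases the gate...
    else
        gate.Release(); // ...but with a cap of 1, "B" never starts. Deadlock.
});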
The logical flaw in that thinking is the assumption that all the individual operations take approximately the same time and start executing at the same time. That's not necessarily the case.
@stephentoub thanks for stealing time from your vacation for the purpose of resolving this issue! In the future, whenever I am giving advice about the configuration of the synchronous `Parallel` APIs, I will mention that it deviates from the official guidelines.
Hi Blindy! I'm happy to see you here. We agree that my code isn't written correctly, but we probably disagree (as we usually do) on how my code can be fixed. IMHO the correct way to fix my code is simply to configure the `MaxDegreeOfParallelism`:

ParallelOptions options = new() { MaxDegreeOfParallelism = 100 };

...and look how the output of my program is immediately fixed:
Letting the `Parallel.ForEach` use an unlimited number of threads is what causes the misbehavior demonstrated above.
Yes, the number of threads that yields the best performance is not necessarily equal to the number of CPUs; adding more threads than CPUs can sometimes speed up even CPU-bound work.
@filipnavara sure, one could easily devise scenarios where adding more threads than CPUs could help speed up CPU-bound calculations. For example, you have 4 CPUs and 5 items to process. In this case I would expect that it'll be beneficial to use 5 threads instead of 4. So I should have prefixed my statement with "In most cases".

How about PLINQ queries though? The same train of thought could induce you to increase PLINQ's degree of parallelism to something larger than the default (`Environment.ProcessorCount`):

.WithDegreeOfParallelism(Environment.ProcessorCount * 3)

This should also speed up the CPU-bound processing of 5 items on 4 CPUs. Would you advise people to use the above configuration as their new default? If not, why not?
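For reference, a runnable form of that hypothetical over-subscribed query (the sleep is a stand-in for the per-item work; none of this is from the original comment):

var results = Enumerable.Range(1, 5)
    .AsParallel()
    .WithDegreeOfParallelism(Environment.ProcessorCount * 3)
    .Select(n => { Thread.Sleep(1000); return n * n; }) // stand-in for the work on each of the 5 items
    .ToList();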
Btw, although we don't agree regarding the suitability of the general advice offered in the documentation, I consider my question answered.