Threadpool performance 5x slower under Linux under WSL2 vs. Windows #42994
Comments
Is it possible to measure with 3.1 to help check whether this is a regression in 5.0? That would make it more time critical if it is. |
Apologies, I should have already noted that. Not a regression, similar behavior was noted on .NET 3.1 (therefore unfortunately but understandably I guess you will have to treat it as less time critical...) |
As well as investigating, it would be nice to know whether we are missing interesting coverage in dotnet/performance, as I do not recall this showing up when @adamsitnik compared results by OS. |
hi @dje-dev
WSL2 might add some non-trivial overhead. Have you tried to run the benchmark without it?
We had some issues in the past that were specific to hardware with multiple sockets. Have you tried to run it on a machine with a single socket?
Is there any chance that you could contribute it to https://github.com/dotnet/performance repo? Benchmarks added to this repo are used to ensure that we don't introduce any regressions to .NET |
yes, we should most probably add more ThreadPool benchmarks to the perf repo. Currently we have only one: https://github.com/dotnet/performance/blob/master/src/benchmarks/micro/libraries/System.Threading.ThreadPool/Perf.ThreadPool.cs And it performs much better on Linux compared to Windows: 2.29 vs 3.33 seconds |
I was recently debugging a problem in a .NET Core app and I've noticed that under WSL2, that app was about 40% slower than in a VM on the same machine. The Linux distro in both the VM and WSL2 was the same. But I have no idea whether it is a general trend or if that app had some specific functionality that was interacting badly with WSL2. It was an app of another party, so I didn't know much of its internals. |
Some progress with the help of the comments and suggestions:
Tentative conclusion is that highly multithreaded .NET code is likely to run very slowly on WSL2. I suggest we focus on the simplest case: understanding why GetThreadStatic (a very simple operation) is so much slower. How would you suggest we proceed? It gets complex because of the interaction with WSL2. (full WSL2 and native Windows test results below)
|
@dje-dev just curious, could you tell us more about the scenario here -- are you deploying a product to run on WSL2? I think generally I have been thinking of that as more of a developer platform, not a deployment platform: for testing and developing software to later deploy on a "regular" Linux machine or VM - so raw performance was less critical. What are you using WSL2 for? |
Fair enough, in some places Microsoft does refer to WSL2 as "primarily a tool for developers." But on the other hand, one could get by if it were a 5% or maybe even 50% performance regression, but at 500% it's no longer viable (at least with some applications) as a developer tool. Further, WSL2 has been described by Microsoft as generally being very close to bare metal. This was confirmed by This suggests to me that there is a nontrivial possibility that either (a) there is some bad interaction between .NET runtime and WSL2, and/or (b) this performance problem is not intrinsically solvable (albeit possibly involving WSL2 adjustments). |
@dje-dev I'm still curious about your production scenario: do you have one, or did you just happen to notice it? It would be interesting if there was data suggesting real customers deploy perf sensitive workloads to WSL2. But yes, if it's essentially a regular VM then perhaps there's a perf issue to report to them here. |
Sure, my scenario is development, I was hoping to leverage awesome tools Microsoft is making available for this (https://devblogs.microsoft.com/dotnet/debug-your-net-core-apps-in-wsl-2-with-visual-studio/). Of course Docker for Windows is now based on WSL2 so this will be a common scenario. Just not sure where we take this issue from here. Any thoughts appreciated. |
It would be nice if we could localize this to some API we call, so that we could open an issue against WSL2. But, it is for @kouvel to determine whether or how to proceed as he owns this area. |
(And even for a dev scenario, 5x slower may be unacceptable, as you say, depending on the scenario.) |
The comment above says:
I wonder if WSL2 has some perf issue w.r.t. mechanisms used for thread local access. Linux accesses it via |
Great, so we have isolated at least a part of this apparent problem with WSL2 + .NET to just 2 lines of code (see below) appearing as part of the .NET test suite at: https://github.com/dotnet/performance/blob/74fca49ecd1f0eae51b0172bd121ee7d0fdd2b6d/src/benchmarks/micro/corefx/System.Threading/Perf.ThreadStatic.cs Further, janvorli has conjectured about a potential reason for the performance issue and I have verified the issue exists on two different systems. These two products (WSL2 and .NET) are promoted as working well together, and if we can make sure there are no serious performance problems (such as this 4x to 5x regression in some scenarios) it will surely be helpful to me and others. Is there some way we could move this forward? Thank you.
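For anyone who wants to try the thread-static case without the perf repo, here is a rough standalone sketch. It is not the code from Perf.ThreadStatic.cs; the class name, field name and Stopwatch loop are made up for illustration, and the JIT may hoist or simplify the read, so BenchmarkDotNet remains the more trustworthy tool.

```csharp
using System;
using System.Diagnostics;

// Hypothetical sketch: times repeated reads of a [ThreadStatic] field on the
// calling thread. Names are illustrative and not taken from dotnet/performance.
public static class ThreadStaticSketch
{
    [ThreadStatic]
    private static int t_value;

    // Returns the average cost of one read in nanoseconds.
    public static double NanosecondsPerRead(int iterations)
    {
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));

        t_value = 1;
        long sum = 0;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            sum += t_value; // the thread-static read being measured
        }
        stopwatch.Stop();

        // Use the sum so the loop is not trivially dead code.
        GC.KeepAlive(sum);
        return stopwatch.Elapsed.TotalMilliseconds * 1_000_000.0 / iterations;
    }
}
```

Calling `ThreadStaticSketch.NanosecondsPerRead(100_000_000)` on native Windows and again inside WSL2 on the same machine should give a first indication of whether the gap reproduces outside the benchmark harness.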
|
Does that problem still exist on .NET 6? |
Linux_threadpool_perf.txt
Description
A large C# application makes extensive use of multithreading (including ThreadPool) and runs well on Windows but degrades to 1/3 speed on Linux.
The attached standalone C# benchmark code demonstrates the apparent problem. It is based mostly on a performance benchmark written by a member of the mono team:
mono/mono#17387.
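For readers without the attachment, the general shape of a ThreadPool throughput test can be sketched as below. This is not the attached benchmark or the mono one; the class name, the trivial work item and the use of CountdownEvent are assumptions chosen to keep the example short.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical sketch; not the attached Linux_threadpool_perf.txt code.
public static class ThreadPoolThroughputSketch
{
    // Queues itemCount trivial work items and returns the wall-clock time
    // until all of them have run, plus the number that actually completed.
    public static (TimeSpan Elapsed, long Completed) Run(int itemCount)
    {
        if (itemCount <= 0) throw new ArgumentOutOfRangeException(nameof(itemCount));

        long completed = 0;
        // Deliberately not disposed: the last Signal may still be returning
        // when Wait unblocks, and the event holds no kernel handle here.
        var done = new CountdownEvent(itemCount);
        var stopwatch = Stopwatch.StartNew();

        for (int i = 0; i < itemCount; i++)
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                Interlocked.Increment(ref completed);
                done.Signal();
            });
        }

        done.Wait();
        stopwatch.Stop();
        return (stopwatch.Elapsed, Interlocked.Read(ref completed));
    }
}
```

Something like `ThreadPoolThroughputSketch.Run(10_000_000).Elapsed` run on both platforms gives a wall-clock number to compare; total CPU time would have to be read separately, for example from `Process.TotalProcessorTime`.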
Configuration
.NET 5.0 RC1 running Windows 10 (2004)
For Linux test, running Ubuntu 20.04 via WSL2
Intel 2 sockets of 16 physical cores each
Regression?
Unknown.
Data
Windows runtime: 5 seconds (70 seconds CPU)
Linux runtime: 35 seconds (280 seconds CPU time)
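Reading the first figure on each line as wall-clock seconds and the second as total CPU seconds (an assumption about how they were measured), the ratios work out as in this small sketch. The helper names are made up for illustration.

```csharp
using System;

// Hypothetical helpers for comparing the two runs reported above.
public static class RunStats
{
    // How many times slower the other run is than the baseline.
    public static double Slowdown(double baselineSeconds, double otherSeconds)
        => otherSeconds / baselineSeconds;

    // Average number of cores kept busy: CPU seconds per wall-clock second.
    public static double AverageBusyCores(double cpuSeconds, double wallSeconds)
        => cpuSeconds / wallSeconds;
}
```

With the numbers above, `Slowdown(5, 35)` is 7 for wall-clock time and `Slowdown(70, 280)` is 4 for CPU time, while `AverageBusyCores(70, 5)` is 14 on Windows against `AverageBusyCores(280, 35)` of 8 on Linux. So under that reading the Linux run both burns more CPU and keeps fewer cores busy on average.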
Analysis
Attempts such as modifying the threadpool minimum size, or setting processor affinity to only one socket did not meaningfully change the results.
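For reference, the two adjustments mentioned can be applied roughly as in the sketch below. The class and method names are hypothetical, and which logical CPUs belong to which socket is an assumption that has to be checked on the actual machine (for example with `lscpu` on Linux). The mask only covers the first 64 logical CPUs.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical sketch of the two tuning attempts; names are illustrative.
public static class TuningSketch
{
    // Builds a bitmask selecting `count` logical CPUs starting at `firstCpu`.
    public static long BuildAffinityMask(int firstCpu, int count)
    {
        if (firstCpu < 0 || count <= 0 || firstCpu + count > 64)
            throw new ArgumentOutOfRangeException(nameof(count));

        long mask = 0;
        for (int cpu = firstCpu; cpu < firstCpu + count; cpu++)
        {
            mask |= 1L << cpu;
        }
        return mask;
    }

    // Raises the minimum worker thread count, leaving the I/O minimum as is.
    // Returns false if the runtime rejected the value.
    public static bool RaiseMinWorkerThreads(int minWorkers)
    {
        ThreadPool.GetMinThreads(out _, out int minIoThreads);
        return ThreadPool.SetMinThreads(minWorkers, minIoThreads);
    }

    // Restricts the current process to the selected logical CPUs.
    // Process.ProcessorAffinity is supported on Windows and Linux only.
    public static void PinToCpus(int firstCpu, int count)
    {
        using var process = Process.GetCurrentProcess();
        process.ProcessorAffinity = (IntPtr)BuildAffinityMask(firstCpu, count);
    }
}
```

For example, `TuningSketch.PinToCpus(0, 16)` would confine the process to logical CPUs 0 to 15, which corresponds to one socket only if the OS numbers that socket's cores first.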