We are observing a 15%+ throughput regression in .NET 11 compared to .NET 10 on the ASP.NET Core Kestrel JSON benchmark, running on ARM64 Linux (Azure Cobalt 100). The regression is most severe at high core counts (>32 cores), where .NET 11 throughput falls instead of scaling: it peaks at 32 cores and drops by nearly 50% at 64 cores.
This was found using dotnet/crank and the json scenario tests.
ASP.NET Core KPI version-to-version regression
The regression also shows up in the KPI dashboard at https://aka.ms/aspnet/benchmarks (select either Cobalt environment). The current regression is -14.7% throughput for the Json Platform test and -14.4% throughput for the Json Minimal APIs test.
Trace Analysis
CPU trace comparison (EventPipe SampleProfiler) between .NET 10 and .NET 11 on the same benchmark showed:
Total wait/synchronization time: NET10 80.4% → NET11 92.3%
Key changes in NET11:

| Method | NET10 | NET11 | Δ |
|---|---|---|---|
| WaitSubsystem+ThreadWaitInfo.Wait | 0% | 4.56% | +4.56% (new) |
| WaitSubsystem+WaitableObject.Wait_Locked | 0% | 3.86% | +3.86% (new) |
| LowLevelLock.WaitAndAcquire | 0.27% | 4.86% | +4.59% (18x increase) |
| PollGCWorker | 5.52% | 10.91% | +5.39% (doubled) |
| WaitForSocketEvents | 0% | 15.51% | +15.51% (was in native) |
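The Δ column and the multipliers follow directly from the per-method percentages, and the same arithmetic applied to the wait totals shows how much sampled time was left for non-wait work. A minimal Python sketch; the values are copied from the trace comparison, and the helper names (`delta`, `ratio`, `non_wait_share`) are hypothetical.

```python
from typing import Optional

# Share of sampled CPU time (%) per method: (NET10, NET11), from the trace comparison.
SAMPLES = {
    "WaitSubsystem+ThreadWaitInfo.Wait": (0.0, 4.56),
    "WaitSubsystem+WaitableObject.Wait_Locked": (0.0, 3.86),
    "LowLevelLock.WaitAndAcquire": (0.27, 4.86),
    "PollGCWorker": (5.52, 10.91),
    "WaitForSocketEvents": (0.0, 15.51),
}

# Total wait/synchronization share (%): (NET10, NET11).
WAIT_TOTAL = (80.4, 92.3)


def delta(before: float, after: float) -> float:
    """Absolute change in percentage points (the table's Δ column)."""
    return round(after - before, 2)


def ratio(before: float, after: float) -> Optional[float]:
    """Multiplicative change, or None when NET10 showed 0% (new, or attributed to native frames)."""
    if before == 0:
        return None
    return round(after / before, 1)


def non_wait_share(wait_pct: float) -> float:
    """Share of samples not attributed to waiting."""
    return round(100.0 - wait_pct, 1)


if __name__ == "__main__":
    for name, (net10, net11) in SAMPLES.items():
        r = ratio(net10, net11)
        mult = "n/a" if r is None else f"x{r}"
        print(f"{name}: {delta(net10, net11):+.2f} pp ({mult})")
    before, after = (non_wait_share(w) for w in WAIT_TOTAL)
    print(f"non-wait share of samples: {before}% -> {after}%")
```

Taken at face value, the rise in total wait time from 80.4% to 92.3% means the share of samples outside waits fell from 19.6% to 7.7%, a drop of more than half.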
A comparison with a NET11-alpha build (before the WaitSubsystem change) showed that the WaitSubsystem adds ~12.4% new wait CPU but replaces ~12.8% of old LIFO semaphore waits, roughly a wash in total. However, the character of the waiting changed: LowLevelLock.WaitAndAcquire (active lock contention) quadrupled from 1.21% → 4.86%, and active contention is more harmful to throughput than passive semaphore waits.
Root Cause
The primary suspect is PR #117788 ("Move CoreCLR over to the managed wait subsystem"), merged in December 2025. The managed WaitSubsystem introduces a global, process-wide lock (as noted in PR #123921's description) that is heavily contended at high core counts.
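The sketch below illustrates why a single process-wide lock scales poorly, using a hypothetical Python wait subsystem rather than the actual .NET code: every signal and wait, on any waitable object, goes through one lock, so threads working on completely unrelated objects still serialize on it. The class, its methods, and the `contended` counter are illustrative assumptions; under CPython's GIL the contention count is small and timing-dependent, so only the acquisition total is deterministic.

```python
from __future__ import annotations

import threading


class GlobalLockWaitSubsystem:
    """Hypothetical wait subsystem in which every object shares one lock."""

    def __init__(self) -> None:
        self._global_lock = threading.Lock()  # one lock for the whole process
        self._signals: dict[str, int] = {}  # pending signals per named object
        self.acquisitions = 0  # total acquisitions of the global lock
        self.contended = 0  # acquisitions that found the lock already held

    def _enter(self) -> None:
        # Try without blocking first so contention can be counted; if another
        # thread holds the lock (even for an unrelated object), block.
        if not self._global_lock.acquire(blocking=False):
            self._global_lock.acquire()
            self.contended += 1  # safe: we now hold the lock
        self.acquisitions += 1

    def signal(self, name: str) -> None:
        self._enter()
        try:
            self._signals[name] = self._signals.get(name, 0) + 1
        finally:
            self._global_lock.release()

    def try_wait(self, name: str) -> bool:
        """Consume one pending signal on `name`, without blocking."""
        self._enter()
        try:
            if self._signals.get(name, 0) > 0:
                self._signals[name] -= 1
                return True
            return False
        finally:
            self._global_lock.release()


def run(threads: int, ops_per_thread: int) -> GlobalLockWaitSubsystem:
    """Each thread signals and waits only on its own object."""
    subsystem = GlobalLockWaitSubsystem()

    def worker(i: int) -> None:
        name = f"event-{i}"  # private to this thread: no logical sharing
        for _ in range(ops_per_thread):
            subsystem.signal(name)
            subsystem.try_wait(name)

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return subsystem


if __name__ == "__main__":
    s = run(threads=8, ops_per_thread=10_000)
    print(f"{s.acquisitions} global-lock acquisitions, {s.contended} contended")
```

With per-object locks, each thread above would only ever touch its own lock; with the global lock, all 2 × threads × ops acquisitions funnel through one lock, so adding cores adds contenders rather than capacity.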
PR #123921 ("A few fixes in the threadpool semaphore. Unify Windows/Unix implementation of LIFO policy.") by @VSadov attempted to address this by replacing WaitSubsystem-based blocking with a lightweight portable implementation and adaptive spinning, but it was reverted in PR #125193 because of a NuGet restore regression. PR #125596, which reapplies it, is pending.
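PR #123921 is described as replacing WaitSubsystem-based blocking with a lightweight portable implementation plus adaptive spinning, under a LIFO wake-up policy. As a rough illustration of that general technique only (not the PR's code), here is a Python sketch of a counting semaphore that spins with an adaptive budget before blocking and, on release, hands the permit to the most recently blocked waiter. The class name and the `max_spin` parameter are hypothetical, and under CPython's GIL spinning has no real performance benefit; the sketch only shows the control flow.

```python
from __future__ import annotations

import threading
import time


class LifoSpinSemaphore:
    """Counting semaphore: adaptive spin, then block; wakes waiters LIFO."""

    def __init__(self, initial: int = 0, max_spin: int = 64) -> None:
        self._lock = threading.Lock()  # guards _count and _waiters
        self._count = initial  # permits available without blocking
        self._waiters: list[threading.Event] = []  # stack, newest waiter last
        self._max_spin = max_spin
        self._spin_limit = max_spin  # adapted at runtime

    def _try_take(self) -> bool:
        with self._lock:
            if self._count > 0:
                self._count -= 1
                return True
            return False

    def acquire(self) -> None:
        # Phase 1: spin, hoping a permit shows up before paying for a block.
        # _spin_limit is a heuristic; unsynchronized updates are acceptable.
        for _ in range(self._spin_limit):
            if self._try_take():
                self._spin_limit = min(self._spin_limit * 2, self._max_spin)
                return
            time.sleep(0)  # yield between attempts
        self._spin_limit = max(self._spin_limit // 2, 1)  # spinning failed

        # Phase 2: re-check under the lock, then register and block.
        with self._lock:
            if self._count > 0:
                self._count -= 1
                return
            event = threading.Event()
            self._waiters.append(event)
        event.wait()  # release() hands its permit directly to this waiter

    def release(self) -> None:
        with self._lock:
            if self._waiters:
                # LIFO: wake the most recently blocked waiter and hand it the
                # permit, so _count is not incremented.
                self._waiters.pop().set()
            else:
                self._count += 1

    def waiting(self) -> int:
        """Number of threads currently registered as blocked in acquire()."""
        with self._lock:
            return len(self._waiters)
```

The re-check under the lock before registering closes the window in which a release could be lost between a failed spin and the block. The adaptive budget doubles after a spin succeeds and halves after one fails, so spinning persists only while it keeps paying off.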
Configuration
crank --scenario json --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/json.benchmarks.yml <profile information>
Data
CPU Scaling (NET11 on up to 80-core ARM64)
Throughput peaks at 32 cores, then drops by nearly 50% at 64 cores.
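Throughput that peaks and then declines as cores are added (retrograde scaling) is commonly modeled with Gunther's Universal Scalability Law (USL). It is a swapped-in modeling technique, not part of the analysis here: it adds a contention term σ and a coherency (crosstalk) term κ to linear scaling, X(N) = λN / (1 + σ(N − 1) + κN(N − 1)), and its peak falls at N* = √((1 − σ)/κ). A minimal Python sketch; the coefficients in the usage lines are hypothetical and not fitted to the Cobalt 100 measurements.

```python
import math


def usl_throughput(n: float, lam: float, sigma: float, kappa: float) -> float:
    """Universal Scalability Law throughput at n cores.

    lam:   single-core throughput (e.g. requests/s)
    sigma: contention (serialization) coefficient, 0 <= sigma < 1
    kappa: coherency (crosstalk) coefficient, kappa >= 0
    """
    return lam * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


def usl_peak(sigma: float, kappa: float) -> float:
    """Core count that maximizes USL throughput; infinite when kappa == 0."""
    if kappa == 0:
        return math.inf
    return math.sqrt((1 - sigma) / kappa)


if __name__ == "__main__":
    # Hypothetical coefficients chosen so the peak lands at 32 cores.
    sigma, kappa = 0.0, 1 / 1024
    print(f"peak at {usl_peak(sigma, kappa):.0f} cores")
    for n in (16, 32, 64, 80):
        print(n, round(usl_throughput(n, 1.0, sigma, kappa), 2))
```

Fitting σ and κ separately to the NET10 and NET11 RPS-by-core curves would indicate whether the regression looks more like added serialization (larger σ, which alone only flattens the curve) or added cross-core coordination cost (larger κ, which is what produces a decline after the peak).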
Version-to-version regression (64 cores)
Versions compared: 10.0.1, 11.0.0-alpha.1.25609.102, and 11.0.0-preview.1.26067.103.
Partial mitigation via environment variables
Adding a set of environment variables to NET11 preview.1 raised RPS by 26%, though throughput remained below both the base and the alpha builds.
This suggests the regression is tied to thread count / contention scaling at high core counts.
Related Issues / PRs
Questions / Ask
FYI @DrewScoggins, @VSadov, @jkoritzinsky