.NET 11 ASP.NET Core throughput regression on ARM64 at high core counts (16+) (Kestrel JSON benchmark) #127484

@LoopedBard3

Description

We are observing a 15%+ throughput regression in .NET 11 compared to .NET 10 on the ASP.NET Core Kestrel JSON benchmark, running on ARM64 Linux (Azure Cobalt 100). The regression is most severe at high core counts (>32 cores), where throughput drops by nearly 50% at 64 cores instead of scaling.

This was found using dotnet/crank and the json scenario tests.

Configuration

  • Benchmark: ASP.NET Core Kestrel JSON (crank --scenario json --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/json.benchmarks.yml <profile information>)
  • Platform: ARM64 Linux (Azure Cobalt 100) (both Ubuntu and Azure Linux 3)
  • Versions compared:
    • Baseline: .NET 10 (stable)
    • Regressed: .NET 11.0.0-preview.1.26067.103+bfa3455fa1a8

Data

CPU Scaling (NET11 on up to 80-core ARM64)

Throughput peaks at 32 cores, then drops nearly 50% at 64 cores.

Version-to-version regression (64 cores)

| Version | RPS | Notes |
|---|---|---|
| NET10 (10.0.1) | Base RPS | |
| NET11 alpha (11.0.0-alpha.1.25609.102) | -2.5% RPS from base | Before managed WaitSubsystem (#117788) |
| NET11 preview.1 (11.0.0-preview.1.26067.103) | -50% RPS from base | After WaitSubsystem (-47%) |
| NET11 preview.1 + thread tuning env vars | +26% RPS from preview.1 | Partial recovery; see below |

Partial mitigation via environment variables

Adding the following environment variables to NET11 preview.1 raised throughput by 26% RPS relative to unmodified preview.1 (still below both the NET10 base and the NET11 alpha throughput):

ASPNETCORE_threadCount=16
DOTNET_SYSTEM_NET_SOCKETS_THREAD_COUNT=1
DOTNET_EnableWriteXorExecute=0
DOTNET_PerfMapEnabled=1

This suggests the regression is tied to thread count / contention scaling at high core counts.
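
To put the partial recovery in context, the relative figures from the version table can be chained onto a common baseline. The sketch below is plain arithmetic on the rounded percentages quoted in this issue; no absolute RPS values are assumed, and the variable names are illustrative only.

```python
# Rounded relative figures quoted in this issue.
# The .NET 10 base is normalized to 1.0 (an assumption made for illustration).
BASE = 1.0

preview1 = BASE * (1 - 0.50)   # preview.1: -50% RPS from base
tuned = preview1 * (1 + 0.26)  # env vars: +26% RPS relative to preview.1

# Change of the tuned run relative to the .NET 10 base.
tuned_vs_base = tuned / BASE - 1

print(f"preview.1: {preview1:.2f}x base")
print(f"tuned:     {tuned:.2f}x base ({tuned_vs_base:+.0%} from base)")
```

With these rounded inputs the tuned configuration lands at roughly 0.63x of the NET10 throughput, so it is still about 37% below base.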

ASP.NET Core KPI version-to-version regression

The regression also shows up in the KPI dashboard at https://aka.ms/aspnet/benchmarks (select either Cobalt environment). The current regression is -14.7% throughput for the Json Platform test and -14.4% throughput for the Json Minimal APIs test.

Trace Analysis

CPU trace comparison (EventPipe SampleProfiler) between .NET 10 and .NET 11 on the same benchmark showed:

Total wait/synchronization time: NET10 80.4% → NET11 92.3%

Key changes in NET11:

| Method | NET10 | NET11 | Δ |
|---|---|---|---|
| WaitSubsystem+ThreadWaitInfo.Wait | 0% | 4.56% | +4.56% (new) |
| WaitSubsystem+WaitableObject.Wait_Locked | 0% | 3.86% | +3.86% (new) |
| LowLevelLock.WaitAndAcquire | 0.27% | 4.86% | +4.59% (18x increase) |
| PollGCWorker | 5.52% | 10.91% | +5.39% (doubled) |
| WaitForSocketEvents | 0% | 15.51% | +15.51% (was in native) |

A comparison with a NET11-alpha build (before the WaitSubsystem change) showed:

  • The PollGCWorker doubling was already present in the alpha (10.75%), so it is not caused by PR #117788 ("Move CoreCLR over to the managed wait subsystem").
  • The WaitSubsystem adds ~12.4% new wait CPU but replaces ~12.8% of old LIFO semaphore waits, roughly a wash in total. However,
    the character changed: LowLevelLock.WaitAndAcquire (active lock contention) quadrupled from 1.21% to 4.86%, which is more
    harmful to throughput than passive semaphore waits.
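
As a quick sanity check on the "18x", "doubled", and "quadrupled" labels, the ratios can be recomputed from the sample percentages quoted above. This is only arithmetic on the numbers already in this issue; the dictionary keys and the `ratio` helper are illustrative names, not part of any tool.

```python
# Sample percentages copied from the trace table and bullets in this issue,
# stored as (before, after) pairs.
samples = {
    "LowLevelLock.WaitAndAcquire (NET10 -> NET11)": (0.27, 4.86),
    "LowLevelLock.WaitAndAcquire (alpha -> NET11)": (1.21, 4.86),
    "PollGCWorker (NET10 -> NET11)": (5.52, 10.91),
}


def ratio(before: float, after: float) -> float:
    """Return how many times larger `after` is than `before`."""
    return after / before


ratios = {name: ratio(b, a) for name, (b, a) in samples.items()}
for name, r in ratios.items():
    print(f"{name}: {r:.1f}x")
```

The quoted labels are consistent with the data: about 18x for LowLevelLock.WaitAndAcquire against NET10, about 4x against the alpha, and about 2x for PollGCWorker.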

Root Cause

The primary suspect is PR #117788 ("Move CoreCLR over to the managed wait subsystem"), merged Dec 2025. The managed WaitSubsystem introduces a global process-wide lock (as noted in PR #123921's description) that contends heavily at high core counts.

PR #123921 ("A few fixes in the threadpool semaphore. Unify Windows/Unix implementation of LIFO policy.") by @VSadov attempted to address this by replacing WaitSubsystem-based blocking with a lightweight portable implementation and adaptive spinning, but it was reverted in PR #125193 due to a NuGet restore regression. PR #125596 is the pending reapply.

Related Issues / PRs

Questions / Ask

  1. Is the high-core-count ARM64 scaling cliff a known dimension of issue #123159 ("[Perf] Linux/x64: 62 Regressions on 1/6/2026 2:06:33 PM +00:00"), or is this a new finding?
  2. Will PR #125596, the pending reapply of the threadpool semaphore fixes (originally #123921, reverted in #125193), address the WaitSubsystem contention seen here?

FYI @DrewScoggins, @VSadov, @jkoritzinsky
