Reapply "A few fixes in the threadpool semaphore. Unify Windows/Unix implementation of LIFO policy." (#125193)#128606
Reapply "A few fixes in the threadpool semaphore. Unify Windows/Unix implementation of LIFO policy." (#125193)#128606VSadov wants to merge 12 commits into
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @VSadov |
There was a problem hiding this comment.
Pull request overview
This PR refactors the thread pool’s worker-parking semaphore to a unified implementation across Windows and Unix by introducing a low-level “block/wake” primitive (futex/WaitOnAddress where available, monitor fallback otherwise), and adjusts worker dispatch/queue behaviors to better handle spurious wake scenarios.
Changes:
- Add low-level futex-style wait/wake entrypoints on Linux and corresponding managed wrappers (Windows WaitOnAddress + Unix futex syscall).
- Replace platform-specific
LowLevelLifoSemaphoreimplementations with a single managed implementation built onLowLevelThreadBlocker+LowLevelLock. - Update worker dispatch plumbing (
DispatchResult, no-spin waits, head-index fencing) and related config/interop bits (NativeAOT libs, interop docs, small interop perf annotations).
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/native/libs/System.Native/pal_threading.h | Declares new exported futex-style wait/wake functions. |
| src/native/libs/System.Native/pal_threading.c | Implements futex syscall wrappers on Linux (and stubs on non-Linux). |
| src/native/libs/System.Native/entrypoints.c | Exposes the new exports via the System.Native DllImport entry table. |
| src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs | Adjusts work-stealing and dispatch behavior, introduces DispatchResult, and refactors local-work transfer. |
| src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs | Switches worker loop to use DispatchResult and adds a no-spin wait mode. |
| src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.Blocking.cs | Updates cooperative-blocking config to include a DOTNET_ env var name. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelThreadBlocker.cs | New low-level blocking primitive (futex/WaitOnAddress or monitor fallback). |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelLock.cs | Refactors lock waiting/signaling to use LowLevelThreadBlocker. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelLifoSemaphore.Windows.cs | Removes the previous Windows-specific implementation. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelLifoSemaphore.Unix.cs | Removes the previous Unix-specific implementation. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelLifoSemaphore.cs | New unified semaphore implementation built on blocker stack + counts. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelFutex.Windows.cs | Adds Windows WaitOnAddress-based “futex” wrapper. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelFutex.Unix.cs | Adds Unix wrapper over System.Native futex exports (Linux-only in practice). |
| src/libraries/System.Private.CoreLib/src/System/Threading/Backoff.cs | Changes exponential backoff to return spin count and adjusts caps. |
| src/libraries/System.Private.CoreLib/src/System.Private.CoreLib.Shared.projitems | Wires new CoreLib sources and new interop compile items into the build. |
| src/libraries/Common/src/Interop/Windows/Mincore/Interop.WaitOnAddress.cs | Adds LibraryImport declarations for WaitOnAddress/WakeByAddressSingle. |
| src/libraries/Common/src/Interop/Windows/Kernel32/Interop.Threading.cs | Adds SetThreadPriorityBoost import used during worker parking. |
| src/libraries/Common/src/Interop/Windows/Kernel32/Interop.CriticalSection.cs | Adds SuppressGCTransition to LeaveCriticalSection. |
| src/libraries/Common/src/Interop/Windows/Interop.Libraries.cs | Adds Libraries.Synch constant for the synch API-set DLL. |
| src/libraries/Common/src/Interop/Unix/System.Native/Interop.LowLevelMonitor.cs | Adds SuppressGCTransition to LowLevelMonitor_Release. |
| src/libraries/Common/src/Interop/Unix/System.Native/Interop.Futex.cs | Adds LibraryImport declarations for the new System.Native futex exports. |
| src/coreclr/tools/aot/ILCompiler/reproNative/reproNative.vcxproj | Adds Synchronization.lib for NativeAOT link compatibility. |
| src/coreclr/nativeaot/BuildIntegration/WindowsAPIs.txt | Registers the new synch API-set imports for NativeAOT. |
| src/coreclr/nativeaot/BuildIntegration/Microsoft.NETCore.Native.Windows.targets | Adds Synchronization.lib to the default NativeAOT SDK native libs list. |
| docs/coding-guidelines/interop-guidelines.md | Updates interop guidelines/examples (Mincore casing, library constants, folder layout). |
| { | ||
| // Only one worker is requested at a time to mitigate Thundering Herd problem. | ||
| if (Interlocked.Exchange(ref _hasOutstandingThreadRequest, 1) == 0) | ||
| if (_hasOutstandingThreadRequest == 0 && |
There was a problem hiding this comment.
The only requirement is that the read needs to happen after inserting a workitem. Inserting a workitem is a full fence.
Volatile is unnecessary here and would make no difference.
| DllImportEntry(SystemNative_LowLevelMonitor_Signal_Release) | ||
| DllImportEntry(SystemNative_LowLevelFutex_WaitOnAddress) | ||
| DllImportEntry(SystemNative_LowLevelFutex_WaitOnAddressTimeout) | ||
| DllImportEntry(SystemNative_LowLevelFutex_WakeByAddressSingle) |
There was a problem hiding this comment.
Not sure about this. Seems like WASM/WASI tests are passing without this change.
| // Use Interlocked. This assignment must happen before trying to acquire the _blockerStackLock | ||
| int origWake = Interlocked.Exchange(ref _pendingWake, 1); | ||
| Debug.Assert(origWake == 0); | ||
| WakeOneCore(); |
There was a problem hiding this comment.
_pendingWake is a two-state variable. It is int only to be used in interlocked operations.
We should not be waking another worker if a previous wake is still pending.
|
|
||
| #pragma clang diagnostic push | ||
| #pragma clang diagnostic ignored "-Wunused-parameter" | ||
| #pragma clang diagnostic ignored "-Wmissing-noreturn" |
| var countsAfterUpdate = new Counts(Interlocked.Add(ref _data, unchecked((ulong)1) << WaiterCountShift)); | ||
| Debug.Assert(countsAfterUpdate.WaiterCount != ushort.MaxValue); // overflow check | ||
| return countsAfterUpdate; |
There was a problem hiding this comment.
The number of threadpool threads is bounded by ushort.MaxValue. We cannot have more waiters than threads.
…implementation of LIFO policy." (dotnet#125193) This reverts commit 51b1e92.
| DllImportEntry(SystemNative_LowLevelMonitor_Signal_Release) | ||
| DllImportEntry(SystemNative_LowLevelFutex_WaitOnAddress) | ||
| DllImportEntry(SystemNative_LowLevelFutex_WaitOnAddressTimeout) | ||
| DllImportEntry(SystemNative_LowLevelFutex_WakeByAddressSingle) |
| if (!HasWaitersToWake(countsBeforeUpdate)) | ||
| break; | ||
|
|
||
| // CAS collision, but still have waters to wake, try again. |
| // restore the default. | ||
| Interop.Kernel32.SetThreadPriorityBoost(Interop.Kernel32.GetCurrentThread(), bDisablePriorityBoost: false); |
|
|
||
| #if defined(TARGET_LINUX) | ||
| #include <linux/futex.h> /* Definition of FUTEX_* constants */ | ||
| #include <sys/syscall.h> /* Definition of SYS_* constants */ |
This is the same change as in #125193 with some fixes/improvements.
The original change was reverted due to unexpected regressions in NuGet restore benchmarks.
The root cause for the regression were behavioral differences in blocking on a completion port vs.
WaitOnAddress.This can defeat the idea that unnecessary threads stay mostly blocked, consume no CPU and eventually timeout and quit.
Mitigation:
Some other issues/opportunities were addressed while figuring/eliminating the root cause.
TODO: diffs on NuGet, asp.net \w cobalt