[Experiment] Epoll as Public API#1

Open
DeagleGross wants to merge 7 commits into main from dmkorolev/tls/epoll

@DeagleGross

Owner

No description provided.

DeagleGross and others added 6 commits May 5, 2026 19:54
…polling

Adds a public SafePollHandle type in System.Threading namespace (housed in
System.Net.Sockets assembly) that wraps the platform's readiness polling
mechanism (epoll on Linux, kqueue on macOS/FreeBSD).

This enables consumers like Kestrel's DirectSsl transport to own their own
poll loop thread with connection-to-core affinity, without re-declaring
the epoll/kqueue P/Invoke surface.

New native PAL functions:
- SystemNative_WaitForSocketEventsWithTimeout: like WaitForSocketEvents
  but with configurable timeout (existing one is infinite-only)
- SystemNative_TryChangeSocketEventRegistrationWithFlags: like
  TryChangeSocketEventRegistration but supports EPOLLEXCLUSIVE
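
As a rough illustration of what a timeout-aware wait looks like at the syscall level on Linux, the sketch below wraps `epoll_wait` with a caller-supplied timeout. The helper names `wait_with_timeout` and `poll_pipe_demo` are hypothetical and are not the PAL functions named above; the sketch also ignores kqueue entirely.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: wait on an epoll instance with a caller-chosen timeout.
 * timeout_ms: -1 blocks indefinitely, 0 returns immediately, >0 waits that long.
 * Returns the number of ready events, or -1 with errno set.
 * Note: on EINTR this simply retries with the full timeout, so a signal can
 * lengthen the total wait. */
int wait_with_timeout(int epfd, struct epoll_event *events, int max_events, int timeout_ms)
{
    int n;
    do {
        n = epoll_wait(epfd, events, max_events, timeout_ms);
    } while (n < 0 && errno == EINTR);
    return n;
}

/* Registers the read end of a pipe, polls once before and once after writing
 * a byte, and reports how many events each wait returned. Returns 0 on success. */
int poll_pipe_demo(int *ready_before, int *ready_after)
{
    int fds[2];
    if (pipe(fds) != 0) return -1;

    int epfd = epoll_create1(EPOLL_CLOEXEC);
    if (epfd < 0) { close(fds[0]); close(fds[1]); return -1; }

    struct epoll_event reg = { .events = EPOLLIN };
    reg.data.fd = fds[0];
    struct epoll_event out[4];

    int rc = epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &reg);
    if (rc == 0) {
        *ready_before = wait_with_timeout(epfd, out, 4, 0);      /* non-blocking check */
        if (write(fds[1], "x", 1) != 1) rc = -1;
        else *ready_after = wait_with_timeout(epfd, out, 4, 100); /* bounded wait */
    }

    close(epfd); close(fds[0]); close(fds[1]);
    return rc;
}
```

A zero timeout gives a non-blocking readiness check, which the infinite-only wait cannot express.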

New public managed types (System.Threading namespace):
- SafePollHandle: Create/Add/Modify/Remove/Wait
- PollEvents: Read/Write/ReadClose/Close/Error flags
- PollRegistrationOptions: ExclusiveWakeup
- PollNotification: Token + Events readonly struct

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Key changes:
- Moved SafePollHandle and supporting types from System.Net.Sockets to
  System.Threading library (System.Threading namespace, Polling subfolder)
- Added Unix-conditional interop shared source includes to System.Threading.csproj
  (Interop.Libraries, Interop.Errors, Interop.SocketEvent, Interop.PollHandle)
- Added TargetFrameworks with unix/osx/freebsd platform identifiers
- SafePollHandle.Add/Modify/Remove accept SafeHandle (not SafeSocketHandle)
  to support any pollable fd (sockets, pipes, eventfd, timerfd, etc.)
- DangerousAddRef/DangerousRelease pattern on all handle operations for
  safe disposal — prevents use-after-close of underlying fd
- Remove() is idempotent — safe to call on already-removed or closed handles
- Error handling uses IOException (standard Unix interop pattern) via local
  CreateIOException helper that maps Interop.Error to errno-based IOException
- All public types marked with [UnsupportedOSPlatform] for windows, browser,
  wasi, android, ios, tvos
- Provider-opaque naming: SafePollHandle (not SafeEpollHandle) — the native
  PAL already abstracts epoll (Linux) and kqueue (macOS/FreeBSD) behind
  unified SocketEvent/SocketEvents types
- PollRegistrationOptions.ExclusiveWakeup maps to EPOLLEXCLUSIVE on Linux,
  silently ignored on kqueue (which naturally distributes events)
- Modify() does not accept PollRegistrationOptions — options are immutable
  at Add time (matches EPOLLEXCLUSIVE kernel semantics: ADD-only)
- Reverted System.Net.Sockets.csproj to clean state
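
The idempotent Remove and the "any pollable fd" points can be sketched against raw epoll. The helper below is a hypothetical illustration, not the SafePollHandle implementation: it treats `ENOENT` (fd not registered) and `EBADF` (fd already closed) from `EPOLL_CTL_DEL` as success, and uses an eventfd rather than a socket as the registered descriptor.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: unregister fd, treating "not registered" (ENOENT) and
 * "already closed" (EBADF) as success. Returns 0 on success, else the errno.
 * Caveat: EBADF is also reported when epfd itself is invalid; this sketch
 * does not distinguish the two cases. */
int poll_remove_idempotent(int epfd, int fd)
{
    if (epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL) == 0) return 0;
    if (errno == ENOENT || errno == EBADF) return 0;
    return errno;
}

/* Removes the same eventfd three times: while registered, after removal,
 * and after close. Returns 0 if every call reported success. */
int remove_twice_demo(void)
{
    int epfd = epoll_create1(EPOLL_CLOEXEC);
    int efd = eventfd(0, EFD_CLOEXEC);
    if (epfd < 0 || efd < 0) {
        if (epfd >= 0) close(epfd);
        if (efd >= 0) close(efd);
        return -1;
    }

    struct epoll_event reg = { .events = EPOLLIN };
    int rc = epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &reg);
    if (rc == 0) rc = poll_remove_idempotent(epfd, efd); /* registered: plain success */
    if (rc == 0) rc = poll_remove_idempotent(epfd, efd); /* ENOENT, mapped to success */
    close(efd);
    if (rc == 0) rc = poll_remove_idempotent(epfd, efd); /* EBADF, mapped to success */

    close(epfd);
    return rc;
}
```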

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Changes based on peer review:

1. PollNotification.Token renamed to State — matches .NET convention
   (Task.AsyncState, IAsyncResult.AsyncState, SocketAsyncEventArgs.UserToken)

2. State type changed from IntPtr to nint — the modern C# alias,
   preferred by API review for new APIs

3. Wait(Span<PollNotification>, int timeoutMs) changed to
   Wait(Span<PollNotification>, TimeSpan timeout) — more idiomatic .NET.
   Supports Timeout.InfiniteTimeSpan for infinite wait and TimeSpan.Zero
   for a non-blocking check. Any other negative value throws
   ArgumentOutOfRangeException.

4. Added explicit power-user documentation noting that Add/Modify/Remove
   do not track registration state — the caller is responsible for not
   calling Modify on unregistered handles. Remove is idempotent.
   A PollRegistration wrapper type is a possible future addition if
   there is demand for a safer API.
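
The timeout rule in item 3 can be sketched as a conversion from TimeSpan ticks to the millisecond argument that `epoll_wait` expects. This is an assumption-laden sketch, not the code in the PR: the function name is hypothetical, and rounding sub-millisecond values up and clamping to `INT_MAX` are choices made here for illustration. It relies on two known facts: a TimeSpan tick is 100 ns, and Timeout.InfiniteTimeSpan is -1 ms.

```c
#include <limits.h>
#include <stdint.h>

#define TICKS_PER_MS   10000LL               /* one TimeSpan tick is 100 ns */
#define INFINITE_TICKS (-1 * TICKS_PER_MS)   /* Timeout.InfiniteTimeSpan is -1 ms */

/* Hypothetical helper: converts a tick count into an epoll_wait timeout.
 * Returns 0 and stores the result in *out_ms, or -1 when ticks is negative
 * but is not the infinite sentinel (the ArgumentOutOfRangeException case). */
int timeout_ticks_to_ms(int64_t ticks, int *out_ms)
{
    if (ticks == INFINITE_TICKS) { *out_ms = -1; return 0; } /* block forever */
    if (ticks < 0) return -1;                                /* invalid negative */

    int64_t ms = ticks / TICKS_PER_MS;
    if (ticks % TICKS_PER_MS != 0) ms++;   /* assumption: round up so a tiny
                                              positive timeout does not become
                                              a non-blocking poll */
    if (ms > INT_MAX) ms = INT_MAX;        /* assumption: clamp instead of failing */

    *out_ms = (int)ms;
    return 0;
}
```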

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Refactors SocketAsyncEngine.Unix.cs and SocketAsyncContext.Unix.cs to use
SafePollHandle instead of raw Interop.Sys.* calls. This validates that the
SafePollHandle API surface is sufficient for the primary in-repo consumer.

Changes to SocketAsyncEngine:
- IntPtr _port + SocketEvent* _buffer → SafePollHandle _poll
- Constructor: CreateSocketEventPort + CreateSocketEventBuffer → SafePollHandle.Create(EventBufferCount)
- TryRegisterSocket: IntPtr socketHandle → SafeSocketHandle (SafePollHandle.Add accepts SafeHandle)
- TryRegisterCore: TryChangeSocketEventRegistration → _poll.Add(socketHandle, Read|Write, None, index)
- EventLoop: WaitForSocketEvents → _poll.Wait(notifications, Timeout.InfiniteTimeSpan)
- SocketEventHandler struct eliminated — replaced by HandleSocketEvents method
  using PollNotification[] instead of ReadOnlySpan<SocketEvent>
- FreeNativeResources eliminated — SafePollHandle.Dispose handles cleanup

Changes to SocketAsyncContext:
- TryRegister: passes SafeSocketHandle directly instead of DangerousAddRef +
  DangerousGetHandle + IntPtr. The DangerousAddRef/Release is now internal
  to SafePollHandle.Add.

Design observations surfaced:
1. PollEvents values match SocketEvents 1:1 — cast works with no translation.
2. nint State maps to the context index cleanly.
3. TimeSpan timeout works — Timeout.InfiniteTimeSpan for the infinite-wait case.
4. SafeHandle parameter on Add eliminates the manual DangerousAddRef pattern
   at the call site — cleaner API.
5. The per-event copy from native buffer to PollNotification[] is new overhead
   but should be negligible (a few ns per event). Needs TechEmpower validation.
6. The SocketEventHandler struct (used for JIT lifetime extension workaround)
   was removed. The NoInlining attribute on HandleSocketEvents should provide
   the same benefit — needs verification.
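
Observation 2 rests on the kernel handing back whatever value was stored at registration time. A minimal sketch with raw epoll, assuming the state travels in `epoll_data.u64` (how the PAL actually stores it is not shown in this PR text); the helper names and the values 7 and 42 are made up for illustration.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register fd for readability, storing a caller-chosen state value in the
 * kernel-held epoll_data so it is handed back with each event. */
int add_with_state(int epfd, int fd, intptr_t state)
{
    struct epoll_event reg = { .events = EPOLLIN };
    reg.data.u64 = (uint64_t)(uintptr_t)state;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &reg);
}

/* Registers two pipes with states 7 and 42, makes only the second readable,
 * and returns the state reported by the single ready event (or -1 on error). */
intptr_t state_roundtrip_demo(void)
{
    int a[2], b[2];
    intptr_t result = -1;

    if (pipe(a) != 0) return -1;
    if (pipe(b) != 0) { close(a[0]); close(a[1]); return -1; }
    int epfd = epoll_create1(EPOLL_CLOEXEC);

    if (epfd >= 0 &&
        add_with_state(epfd, a[0], 7) == 0 &&
        add_with_state(epfd, b[0], 42) == 0 &&
        write(b[1], "x", 1) == 1)
    {
        struct epoll_event out[2];
        if (epoll_wait(epfd, out, 2, 100) == 1)
            result = (intptr_t)out[0].data.u64;  /* state comes back untouched */
    }

    if (epfd >= 0) close(epfd);
    close(a[0]); close(a[1]); close(b[0]); close(b[1]);
    return result;
}
```

Because the value is opaque to the kernel, an array index works as well as a pointer, which is what lets the engine map a notification back to its context without a lookup table keyed by fd.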

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Introduces PollWaitResult, a ref struct that reads directly from the
internal native event buffer. This eliminates the per-event copy from
the previous Span<PollNotification>-based Wait.

Before: Wait() copies each SocketEvent from native buffer into a
managed PollNotification[], then the caller iterates that array.
Two iterations over the data.

After: Wait() returns a PollWaitResult that wraps the native
SocketEvent* pointer and count. The caller enumerates it via foreach
or indexer — each access translates SocketEvent→PollNotification
on the fly. One iteration, zero intermediate allocation.

PollWaitResult is a ref struct:
- Cannot escape to the heap (no field storage, no async capture)
- Narrows the lifetime: the result cannot outlive the calling frame, and
  is only valid until the next Wait() call (because the native buffer is
  reused)
- Provides Count, indexer, and GetEnumerator for foreach

The old Wait(Span<PollNotification>, TimeSpan) overload is removed.
Callers (SocketAsyncEngine) updated to use the new shape:

  foreach (PollNotification notification in _poll.Wait(timeout))
  {
      // process directly — no intermediate buffer
  }

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Make PollNotification layout-identical to internal SocketEvent struct
(nint Data, PollEvents Events, int padding) so Wait() can return
ReadOnlySpan<PollNotification> backed directly by the native buffer.

This eliminates:
- PollWaitResult ref struct (deleted)
- PollWaitResult.Enumerator ref struct
- Per-element translation during enumeration

ReadOnlySpan<T> is already a ref struct, so the compiler keeps the result
from escaping to the heap or across an await. The span is still only
valid until the next Wait call, because the native buffer is reused.
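
The layout-compatibility idea can be sketched in C with compile-time checks. The two struct definitions below are assumptions modelled on the field list in this commit message (pointer-sized data, 32-bit events, 32-bit padding); the type names are hypothetical. C needs a union to make the reinterpretation well-defined, whereas the managed side would create the span directly over the native memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed shape of the internal native event record. */
typedef struct { intptr_t Data;  int32_t Events; uint32_t Padding;  } NativeSocketEvent;
/* Assumed shape of the public notification: same fields, public names. */
typedef struct { intptr_t State; int32_t Events; uint32_t Reserved; } PublicNotification;

/* The zero-copy view is only sound if these hold; fail the build otherwise. */
_Static_assert(sizeof(NativeSocketEvent) == sizeof(PublicNotification), "size mismatch");
_Static_assert(offsetof(NativeSocketEvent, Data) == offsetof(PublicNotification, State),
               "data offset mismatch");
_Static_assert(offsetof(NativeSocketEvent, Events) == offsetof(PublicNotification, Events),
               "events offset mismatch");

/* One buffer slot, readable under either name without copying. */
typedef union { NativeSocketEvent native; PublicNotification pub; } EventSlot;

/* Producer side: fills the buffer using the native layout. */
void fill_native(EventSlot *buf, int count)
{
    for (int i = 0; i < count; i++) {
        buf[i].native.Data = 100 + i;
        buf[i].native.Events = 1;
        buf[i].native.Padding = 0;
    }
}

/* Consumer side: reads the same memory through the public layout. */
intptr_t sum_states(const EventSlot *buf, int count)
{
    intptr_t total = 0;
    for (int i = 0; i < count; i++)
        total += buf[i].pub.State;
    return total;
}
```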

Credit: tmds (Tom Deseyn) suggested the layout-compatible approach
in dotnet#127908.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

DeagleGross pushed a commit that referenced this pull request May 28, 2026
…128163)

> [!NOTE]
> This PR was authored with assistance from GitHub Copilot.

Fixes dotnet#128044.

## Problem

createdump SIGSEGVs on Linux when generating a Heap-type minidump for a
process running interpreted code. The crash reproduces locally with the
`InterpreterStack` DumpTests debuggee and matches the CI failure that
prompted `<DumpTypes>Full</DumpTypes>` to be added as a temporary
workaround.

The faulting backtrace is:

```
#0  Thread::IsAddressInStack    threads.cpp:6741
#1  Thread::EnumMemoryRegionsWorker  threads.cpp:6909 (calls IsAddressInStack(currentSP))
#2  Thread::EnumMemoryRegions        threads.cpp
#3  ThreadStore::EnumMemoryRegions
#4  ClrDataAccess::EnumMemDumpAllThreadsStack
#5  ClrDataAccess::EnumMemoryRegionsWorkerHeap   (HEAP2-only path)
```

## Root cause

`Thread::m_pInterpThreadContext` was declared as a raw
`InterpThreadContext *`. In non-DAC code that's a normal host pointer,
but in DAC mode the field's value is a target-process address. When
`IsAddressInStack` (a DAC-callable helper) dereferenced
`m_pInterpThreadContext->pStackStart` it read from a target-process
address as if it were a host address, which faults inside createdump.

## Fix

Change the field type to `PTR_InterpThreadContext` (DPTR), matching the
treatment of other Thread fields like `m_pFrame`. In non-DAC builds
`DPTR(T)` is just `T*`, so there is no overhead or behavior change. In
DAC builds the read goes through `__DPtr<T>` and marshals correctly from
the target.

Also remove the `<DumpTypes>Full</DumpTypes>` workaround on the
`InterpreterStack` DumpTests debuggee so the Heap path that originally
failed is exercised again.
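
The host-versus-target distinction can be shown with a small C analogue. This is a conceptual sketch only and not how the DAC or `__DPtr<T>` is implemented: every type and function here is hypothetical, `pStackEnd` is an invented field, and target memory is simulated by a byte array. The point is that an address from the target must be resolved through an explicit read rather than dereferenced in the host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t target_addr_t;   /* an address in the *target* process */

/* A captured range of target memory, as a dump tool might hold it. */
typedef struct {
    const uint8_t *bytes;   /* host copy of the target's bytes */
    target_addr_t  base;    /* target address of bytes[0] */
    size_t         size;    /* number of captured bytes */
} TargetMemory;

/* Stand-in for the interpreter context; pStackEnd is hypothetical. */
typedef struct {
    target_addr_t pStackStart;
    target_addr_t pStackEnd;
} InterpContextImage;

/* Marshals `size` bytes at a target address into a host buffer.
 * Returns 0 on success, -1 if the range is not covered by the capture. */
int read_target(const TargetMemory *t, target_addr_t addr, void *out, size_t size)
{
    if (addr < t->base || size > t->size || addr - t->base > t->size - size)
        return -1;
    memcpy(out, t->bytes + (addr - t->base), size);
    return 0;
}

/* Returns 1 if sp lies in the interpreter stack, 0 if not, and -1 if the
 * context could not be read. ctx_addr is never dereferenced directly. */
int is_address_in_stack(const TargetMemory *t, target_addr_t ctx_addr, target_addr_t sp)
{
    InterpContextImage ctx;
    if (read_target(t, ctx_addr, &ctx, sizeof ctx) != 0)
        return -1;
    return sp >= ctx.pStackStart && sp < ctx.pStackEnd;
}
```

With a raw pointer, the equivalent of `ctx_addr` would have been cast and dereferenced in the host process, which is the fault described under Root cause.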

## Validation

Locally reproduced the original SIGSEGV on Linux x64 with the auto-dump
mechanism (`DOTNET_DbgMiniDumpType=2` + `DOTNET_Interpreter=MethodA`)
running the `InterpreterStack` debuggee. With this fix applied,
createdump produces a complete Heap dump (~74 MB) instead of crashing.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>