.NET 10: HTTPS connection establishment latency regression compared to .NET 9, triggering downstream timeout policies #124888

@perkops

Description

After upgrading from .NET 9 to .NET 10 (no code or package changes), HTTPS requests through SocketsHttpHandler more frequently take longer than 500ms to connect and complete, a threshold they consistently met on .NET 9.

This is causing failures in the Azure Cosmos DB .NET SDK, which uses a hard-coded 500ms first-attempt timeout for internal metadata/address resolution requests.

We have filed Azure/azure-cosmos-dotnet-v3#5642 for the SDK side, but the underlying question remains: what changed in .NET 10's SocketsHttpHandler/SslStream pipeline that increases HTTPS request latency compared to .NET 9?

We cloned the Cosmos DB SDK and changed the first-attempt timeout from 500ms to 5 seconds, and all errors disappeared. This confirms that requests which completed within 500ms on .NET 9 now intermittently exceed 500ms on .NET 10, with no code or package changes; only the runtime was upgraded.
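To make the failure mode concrete, here is a minimal Python sketch of a generic "hard timeout on the first attempt" policy. It is not the Cosmos DB SDK's code; `fake_request` and `send_with_first_attempt_timeout` are hypothetical names, and the timings are scaled down to one tenth of the real values so it runs quickly. It shows why a request that needs slightly more than the first-attempt budget is canceled under a tight budget but succeeds under a generous one, mirroring the 500ms vs 5s experiment.

```python
import asyncio


async def fake_request(latency_s: float) -> str:
    """Hypothetical stand-in for an HTTPS request that takes `latency_s` seconds."""
    await asyncio.sleep(latency_s)
    return "ok"


async def send_with_first_attempt_timeout(latency_s: float, first_attempt_timeout_s: float) -> str:
    """Run one attempt under a hard timeout and report 'canceled' if it is exceeded.

    asyncio.wait_for cancels the inner task when the timeout fires, which is
    analogous to a CancellationToken firing and surfacing as a
    TaskCanceledException in .NET.
    """
    try:
        return await asyncio.wait_for(fake_request(latency_s), timeout=first_attempt_timeout_s)
    except asyncio.TimeoutError:
        return "canceled"


# Scaled timings (x0.1): a ~600 ms request becomes 0.06 s,
# the 500 ms budget becomes 0.05 s, and the 5 s budget becomes 0.5 s.
tight = asyncio.run(send_with_first_attempt_timeout(0.06, 0.05))
generous = asyncio.run(send_with_first_attempt_timeout(0.06, 0.5))
print(tight, generous)
```

Because the timeout deadline is scheduled before the simulated request's deadline, the tight budget always cancels first, while the generous budget lets the same request finish.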

Reproduction Steps

See the detailed reproduction steps and screenshots in azure-cosmos-dotnet-v3#5642, which includes before/after comparisons with traffic-shaped latency on both .NET 9 and .NET 10.

Expected behavior

HTTPS requests through SocketsHttpHandler should have connection establishment latency comparable to .NET 9. Requests that consistently completed within 500ms on .NET 9 should not intermittently exceed 500ms on .NET 10.

Actual behavior

  • Intermittent TaskCanceledException on HTTPS requests that take slightly longer than 500ms
  • The same requests consistently complete within 500ms on .NET 9
  • We can reproduce this locally by using traffic shaping to introduce artificial network delay
  • It occurs intermittently in Azure

Regression?

No response

Known Workarounds

No response

Configuration

  • .NET 9: No issues (identical code and packages)
  • .NET 10.0.x: Reproducible
  • OS: Linux (Azure App Services) and Windows
  • Downstream library: Azure Cosmos DB SDK 3.46.0

Other information

Potentially relevant .NET 10 changes

  • #112383: Disposed HTTP/1.1 connections are no longer returned to the pool, potentially lowering the pool hit rate and forcing more fresh TCP+TLS connection establishments
  • #110744: Race-condition fix in the connection-timeout CancellationTokenSource (CTS) assignment, which changes connection establishment timing
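One way to reason about why a lower pool hit rate (as suspected for #112383) could push requests past a fixed 500ms budget is a simplified round-trip count: a request on a pooled connection costs roughly one round trip, while a fresh HTTPS connection first adds a TCP handshake (one round trip) and a full TLS handshake (one round trip for TLS 1.3, two for TLS 1.2). The Python sketch below is an idealized model that ignores DNS, server processing time, certificate validation, and packet loss; `estimated_latency_ms` is a hypothetical helper, and the 150ms RTT is an illustrative assumption, not a value measured in this issue.

```python
# Idealized latency model: every cost is counted in whole network round trips.
# Ignores DNS, server processing time, certificate validation, and retransmits.
TCP_HANDSHAKE_RTTS = 1
TLS_FULL_HANDSHAKE_RTTS = {"1.2": 2, "1.3": 1}
HTTP_REQUEST_RTTS = 1  # send the request, receive the response


def estimated_latency_ms(rtt_ms: float, pooled: bool, tls_version: str = "1.3") -> float:
    """Estimate request latency on a pooled vs. a freshly established HTTPS connection."""
    round_trips = HTTP_REQUEST_RTTS
    if not pooled:
        # A fresh connection pays for TCP and TLS setup before the request is sent.
        round_trips += TCP_HANDSHAKE_RTTS + TLS_FULL_HANDSHAKE_RTTS[tls_version]
    return round_trips * rtt_ms


BUDGET_MS = 500  # the SDK's first-attempt timeout
RTT_MS = 150     # illustrative assumption, e.g. a traffic-shaped link

for pooled, tls in [(True, "1.3"), (False, "1.3"), (False, "1.2")]:
    latency = estimated_latency_ms(RTT_MS, pooled, tls)
    status = "within" if latency <= BUDGET_MS else "exceeds"
    print(f"pooled={pooled!s:<5} TLS {tls}: {latency:.0f} ms ({status} {BUDGET_MS} ms budget)")
```

Under these assumptions a pooled request takes 150 ms, a fresh TLS 1.3 connection 450 ms, and a fresh TLS 1.2 connection 600 ms, so each additional fresh connection leaves little or no headroom under a 500 ms budget.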

Stack trace

  System.Threading.Tasks.TaskCanceledException: The operation was canceled.
   ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled.
   ---> System.Net.Sockets.SocketException (125): Operation canceled
     at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(...)
     at System.Net.Security.SslStream.EnsureFullTlsFrameAsync...
     at System.Net.Security.SslStream.ReadAsyncInternal...
     at System.Net.Http.HttpConnection.SendAsync(...)
     at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(...)
     at System.Net.Http.Metrics.MetricsHandler.SendAsyncWithMetrics(...)
     at System.Net.Http.DiagnosticsHandler.SendAsyncCore(...)

Impact

The Azure Cosmos DB .NET SDK has an internal 500ms first-attempt timeout for control plane operations that worked reliably on .NET 9. After upgrading to .NET 10, these requests intermittently exceed 500ms, causing recurring TaskCanceledException errors across multiple microservices in production (Azure App Services).

While the immediate fix belongs in the Cosmos SDK (azure-cosmos-dotnet-v3#5642), the latency regression in SocketsHttpHandler may affect other libraries with similar internal timeout policies.
