Consider adding a telemetry event for failed HTTP connection attempts #110351

Open · MihaZupan opened this issue Dec 3, 2024 · 6 comments

MihaZupan commented Dec 3, 2024

Because of how requests are decoupled from connections in SocketsHttpHandler, the user may not always see the errors that occurred during connection establishment (DNS + TCP + TLS + HTTP/2 handshake).
The errors may be swallowed if the initiating request was already served by a different connection, or if that request timed out.
There are therefore situations where you may have poor visibility into what's happening.

One way of gaining this information is to inject a custom ConnectCallback along the lines of:

handler.ConnectCallback = async (context, ct) =>
{
    var stopwatch = Stopwatch.StartNew();
    var socket = new Socket(SocketType.Stream, ProtocolType.Tcp) { NoDelay = true };
    try
    {
        await socket.ConnectAsync(context.DnsEndPoint, ct);
        return new NetworkStream(socket, ownsSocket: true);
    }
    catch (Exception ex)
    {
        socket.Dispose();
        // Covers DNS resolution and TCP connect failures, with timing information.
        Console.WriteLine($"Failed to connect to {context.DnsEndPoint} after {stopwatch.Elapsed.TotalMilliseconds:N2} ms: {ex}");
        throw;
    }
};

This gives you visibility into failures when trying to connect, but not into TLS failures/timeouts.
It's possible to also perform the TLS handshake in the callback (see the sketch below), but we don't always make it trivial (how do you pick the host, how do you handle ALPN, potential custom SSL settings, ...).
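
For illustration, a minimal sketch of what the TLS variant could look like. This is hypothetical, not the handler's actual logic: it hardcodes the SNI host and the ALPN list, ignores any custom SslOptions on the handler, and for https requests SocketsHttpHandler would still perform its own handshake on top of the returned stream.

// Requires: using System.Collections.Generic; using System.Net.Security; using System.Net.Sockets;
handler.ConnectCallback = async (context, ct) =>
{
    var socket = new Socket(SocketType.Stream, ProtocolType.Tcp) { NoDelay = true };
    try
    {
        await socket.ConnectAsync(context.DnsEndPoint, ct);

        var sslStream = new SslStream(new NetworkStream(socket, ownsSocket: true));
        await sslStream.AuthenticateAsClientAsync(new SslClientAuthenticationOptions
        {
            TargetHost = context.DnsEndPoint.Host, // assumption: SNI == the DNS host
            ApplicationProtocols = new List<SslApplicationProtocol> { SslApplicationProtocol.Http11 }
        }, ct);
        return sslStream;
    }
    catch (Exception ex)
    {
        // A single catch now covers DNS, TCP, and TLS handshake failures.
        socket.Dispose();
        Console.WriteLine($"Connecting to {context.DnsEndPoint} failed: {ex}");
        throw;
    }
};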


It seems rather simple for us to add an EventSource event to System.Net.Http that just logs all such failures, corresponding to

private void HandleHttp11ConnectionFailure(HttpConnectionWaiter<HttpConnection>? requestWaiter, Exception e)
{
    if (NetEventSource.Log.IsEnabled()) Trace($"HTTP/1.1 connection failed: {e}");

private void HandleHttp2ConnectionFailure(HttpConnectionWaiter<Http2Connection?> requestWaiter, Exception e)
{
    if (NetEventSource.Log.IsEnabled()) Trace($"HTTP2 connection failed: {e}");

private void HandleHttp3ConnectionFailure(HttpConnectionWaiter<Http3Connection?> requestWaiter, Exception? e)
{
    Debug.Assert(IsHttp3Supported());
    if (NetEventSource.Log.IsEnabled()) Trace($"HTTP3 connection failed: {e}");

Maybe something like

[Event(42, Level = EventLevel.Error)]
public void ConnectionFailed(byte versionMajor, byte versionMinor, string scheme, string host, int port, string? remoteAddress, double elapsedMilliseconds, string exception);
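
If such an event existed, an in-process consumer could be a plain EventListener. A sketch; "ConnectionFailed" here is the event proposed above, not an API that exists in System.Net.Http today:

// Requires: using System.Diagnostics.Tracing;
internal sealed class ConnectionFailureListener : EventListener
{
    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        // "System.Net.Http" is the existing EventSource the proposed event would live on.
        if (eventSource.Name == "System.Net.Http")
        {
            EnableEvents(eventSource, EventLevel.Error);
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        if (eventData.EventName == "ConnectionFailed") // hypothetical event name
        {
            Console.WriteLine($"{eventData.EventName}: {string.Join(", ", eventData.Payload!)}");
        }
    }
}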

Does this seem reasonable, @antonfirsov? Or maybe we have better ways of exposing such data now?

dotnet-policy-service bot added the untriaged label Dec 3, 2024

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.


antonfirsov commented Dec 3, 2024

maybe we have better ways of exposing such data now?

We are reporting the same exceptions on the HTTP connection_setup Activity when a connection attempt fails:

finally
{
    if (activity is not null)
    {
        ConnectionSetupDistributedTracing.StopConnectionSetupActivity(activity, exception, remoteEndPoint);
    }
}

Whether it's a "better way" depends on the use case. Some may prefer to consume distributed traces, others may want EventSource logs. IMO it would be valuable to expose the error information in both telemetry legs, but it should probably be logged at the same point, and I would prefer it to happen around the line I linked. The difference is that at that point we don't catch exceptions coming from PlainTextStreamFilter, and cancellations are not wrapped yet by CreateConnectTimeoutException.
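
For reference, those failures could be observed in-process with an ActivityListener along these lines. A sketch; the source name "Experimental.System.Net.Http.Connections" is my assumption for where the 9.0 connection_setup activity is published, so verify it against your runtime version:

// Requires: using System.Diagnostics;
ActivitySource.AddActivityListener(new ActivityListener
{
    ShouldListenTo = source => source.Name == "Experimental.System.Net.Http.Connections",
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllData,
    ActivityStopped = activity =>
    {
        // StopConnectionSetupActivity records the exception, which should surface as an error status.
        if (activity.Status == ActivityStatusCode.Error)
        {
            Console.WriteLine($"{activity.DisplayName} failed: {activity.StatusDescription}");
        }
    }
});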

@MihaZupan

Ah, thanks, I forgot about this one. An event could still be useful but I'm not as worried in that case.

are not wrapped yet by CreateConnectTimeoutException.

Looks like we're inconsistent here right now; we do report the ConnectTimeout on H/3.

catch (Exception e)
{
    connectionException = e is OperationCanceledException oce && oce.CancellationToken == cts.Token && !waiter.CancelledByOriginatingRequestCompletion ?
        CreateConnectTimeoutException(oce) :
        e;

    // On success path connectionSetupActivity is stopped before calling InitQuicConnection().
    // This assertion makes sure that InitQuicConnection() does not throw unexpectedly.
    Debug.Assert(connectionSetupActivity?.IsStopped is not true);
    if (connectionSetupActivity is not null) ConnectionSetupDistributedTracing.StopConnectionSetupActivity(connectionSetupActivity, connectionException, null);

I could see it being useful if we included info on whether the cancellation was due to ConnectTimeout or due to CancelledByOriginatingRequestCompletion (we don't include that info today anyway, but we could improve that).

One difference that could matter is that we also exclude H/2 handshake failures here.

@antonfirsov

An event could still be useful but I'm not as worried in that case.

So are you aware of a specific user scenario you wanted this to help with? If yes, does this mean they would be OK consuming distributed traces?


MihaZupan commented Dec 5, 2024

Yes. I'm helping in an investigation where a service isn't establishing connections for some reason.
In trying to advise what sort of logging might help us investigate the issue (outside of internal diagnostics), I can't give a good, clear, simple answer at the moment.
The ConnectCallback can cover some, but not all, cases. Existing EventCounters & events can also help.

does this mean they would be OK consuming distributed traces?

I'll try to find out, but the fact that it's only been available since 9.0 is likely going to be an issue.

(Whatever improvements we make here are unlikely to help in this particular investigation, but I'd like us to have a better experience in the future, if for no other reason than to make investigating things easier for ourselves.)

antonfirsov added this to the 10.0.0 milestone Dec 5, 2024
antonfirsov removed the untriaged label Dec 5, 2024
@antonfirsov

Ok, tentatively triaged this to 10.0 assuming it's easy to implement. We can always change the priority.
