Activities for Http Connections, Dns, Sockets and SslStream #103922

antonfirsov · 2024-06-24T22:01:24Z

Edit: for the final design see:

#103922 (comment)
open-telemetry/semantic-conventions#1192

This is a replacement for #101814, following our agreement that HTTP connection sub-operations (DNS, Socket, SslStream) should be represented as their own sub-activities instead of as events emitted on the HTTP connection activity.

The PR implements the following activity tree across multiple libraries:

System.Net.Http.Connections.HttpConnection
  |- System.Net.NameResolution.DnsLookup
  |- System.Net.Sockets.Connect
  |- System.Net.Security.TlsHandshake

Here, System.Net.Http.Connections.HttpConnection has no parent (it has its own trace ID), and each request is linked to its connection via Activity.AddLink() when the connection is dispatched to the request.
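The root-plus-link shape described above can be sketched with the public System.Diagnostics API. This is a minimal sketch, not the PR's code: the source names "Demo.Http.Connections" and "Demo.Http.Requests" and the helper StartRootActivity are hypothetical, and Activity.AddLink assumes .NET 9 (System.Diagnostics.DiagnosticSource 9.0) or later. Clearing Activity.Current before starting the connection is one way to keep the connection span from being parented to whichever request happened to trigger it.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical source names, for illustration only.
var connectionSource = new ActivitySource("Demo.Http.Connections");
var requestSource = new ActivitySource("Demo.Http.Requests");

// StartActivity returns null unless some listener samples the source.
ActivitySource.AddActivityListener(new ActivityListener
{
    ShouldListenTo = source => source.Name.StartsWith("Demo.Http.", StringComparison.Ordinal),
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllDataAndRecorded,
});

Activity request = requestSource.StartActivity("HTTP GET")!;

// The request is Activity.Current here, but the connection must start its own trace.
Activity connection = StartRootActivity(connectionSource, "HTTP connection")!;

// When the pool hands the connection to the request, link the request to it.
// Activity.AddLink is assumed to be available (.NET 9+).
request.AddLink(new ActivityLink(connection.Context));

Console.WriteLine(connection.Parent is null);                                  // True
Console.WriteLine(connection.TraceId != request.TraceId);                      // True
Console.WriteLine(request.Links.Single().Context.SpanId == connection.SpanId); // True

request.Stop();
connection.Stop();

// Hypothetical helper: start an activity with no parent, then restore the ambient one.
static Activity? StartRootActivity(ActivitySource source, string name)
{
    Activity? previous = Activity.Current;
    Activity.Current = null; // detach from the ambient (request) activity
    try
    {
        return source.StartActivity(name);
    }
    finally
    {
        Activity.Current = previous; // the request becomes the ambient activity again
    }
}
```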

The 4 new activities have their own ActivitySources, which have to be enabled separately. With the OpenTelemetry SDK (here using OpenTelemetry.Exporter.Console), they can all be enabled with a wildcard:

using OpenTelemetry;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

using var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .SetResourceBuilder(ResourceBuilder.CreateDefault().AddService("ConnAct"))
    .AddSource("System.Net.*")
    .AddConsoleExporter()
    .Build();

A sample program and its output can be found here.
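Without the OpenTelemetry SDK, a plain ActivityListener can subscribe to the same sources. A minimal sketch under assumptions: the source names are assumed to start with either "System.Net." (as in the description) or "Experimental.System.Net." (the prefix the merged code uses, e.g. ActivitySourceName = "Experimental.System.Net.Security" in SslStream); the prefix list and the Matches helper are hypothetical, not part of the PR.

```csharp
using System;
using System.Diagnostics;

// Assumed prefixes: "System.Net." per the description, "Experimental.System.Net." per the merged code.
string[] prefixes = { "System.Net.", "Experimental.System.Net." };

using var listener = new ActivityListener
{
    ShouldListenTo = source => Matches(source.Name, prefixes),
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllDataAndRecorded,
    // Print each span when it ends.
    ActivityStopped = a => Console.WriteLine(
        $"{a.Source.Name} | {a.DisplayName} | {a.Duration.TotalMilliseconds:F1} ms | trace {a.TraceId}"),
};
ActivitySource.AddActivityListener(listener);

// Hypothetical helper: ordinal prefix match, similar in spirit to a "System.Net.*" wildcard.
static bool Matches(string sourceName, string[] allowed) =>
    Array.Exists(allowed, p => sourceName.StartsWith(p, StringComparison.Ordinal));
```

HttpClient requests made while the listener is registered should then be printed as their spans complete, assuming the runtime in use emits these activities (the PR targets the 9.0.0 milestone).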

Fixes #93832.

/cc @samsp-msft @noahfalk @lmolkova


davidfowl · 2024-06-25T16:37:25Z

A sample program and its output can be found here.

Can I ask you to show an example with the Aspire dashboard? https://aspiredashboard.com/

It should be a simple docker command to run; point the app at it with the OTLP exporter. Looking at console output for traces is quite hard.


antonfirsov · 2024-06-26T19:34:59Z

@davidfowl here you go:

Expanding the connection shows:

A few notes:

- The connection span covers the entire connection lifetime until closure, while the sub-spans cover the networking operations (DNS lookup, TCP connection establishment, TLS handshake).
- There are no attributes, since they are not yet implemented. We are currently discussing what is reasonable to include.
- Currently, it is not possible to navigate from requests to connections. For that, the Aspire dashboard needs to implement aspire#2577 ("Aspire Dashboard: Traces: Span links aren't shown in span details").
- With the current design, you will not be able to navigate to failing connections from the requests that initiated them. For that, we would need to add another Activity link and tag the links with names ("initiated connection" vs. "serving connection"). This might raise the concern that we bake too many connection pool implementation details into the telemetry we produce.
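The "another Activity link, tagged with names" idea from the last note could look roughly like the following. A sketch under assumptions: "link.role" is a hypothetical tag key (not a semantic convention), the source name is made up, and Activity.AddLink assumes .NET 9 or later.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

var source = new ActivitySource("Demo.Http"); // hypothetical source name
ActivitySource.AddActivityListener(new ActivityListener
{
    ShouldListenTo = s => s.Name == "Demo.Http",
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllDataAndRecorded,
});

// Two root connection spans: one the request initiated (say it failed),
// and a different pooled connection that ended up serving the request.
ActivityContext initiated = RunRootSpan(source, "connection A");
ActivityContext serving = RunRootSpan(source, "connection B");

Activity request = source.StartActivity("HTTP GET")!;
request.AddLink(new ActivityLink(initiated, new ActivityTagsCollection { { "link.role", "initiated connection" } }));
request.AddLink(new ActivityLink(serving, new ActivityTagsCollection { { "link.role", "serving connection" } }));
request.Stop();

foreach (ActivityLink link in request.Links)
{
    Console.WriteLine($"{link.Tags!.First().Value} -> span {link.Context.SpanId}");
}

// Starts and immediately stops a root span, returning its context for linking.
static ActivityContext RunRootSpan(ActivitySource source, string name)
{
    using Activity span = source.StartActivity(name)!;
    return span.Context;
}
```

The trade-off raised in the note is visible here: the link tags encode how the pool matched requests to connections.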

stephentoub · 2024-06-26T21:53:30Z

The connection span covers the entire connection lifetime until closure

Is this going to be useful in a trace? I don't know one way or the other, but it seems like it could result in a lot of noise.

Is there an equivalent span that's only about setting up the connection, from the time it was requested until the time it became usable?

davidfowl · 2024-06-26T22:58:25Z

Is this going to be useful in a trace? I don't know one way or the other, but it seems like it could result in a lot of noise.

Is there an equivalent span that's only about setting up the connection, from the time it was requested until the time it became usable?

I agree: we care about how long it took to connect, not how long the connection itself lasts. That feels like it could be another span, but not the most useful one. I think about this as zooming into why a request may have taken so long.

antonfirsov · 2024-06-26T23:28:31Z

That feels like it could be another span, but not the most useful one.

The proposed connection span matches the semantics of the http.client.connection.duration metric.

One way to get more useful traces could be to decompose it as follows:

[http] connection {peer}                 // Covers the entire pooled connection lifetime
  |- [http] connection_initiation {peer} // Covers the connection establishment
      |- DNS {host}
      |- [socket] connection_initiation {peer}
      |- TLS {server_name}

Another option is to introduce only connection_initiation; however, that seems problematic for linking follow-up requests that reuse the same connection.
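The proposed decomposition can be mimicked with nested activities from a single source to see the resulting parent/child shape. A sketch only: the source name, span names and peer values are placeholders (192.0.2.1 is a documentation address), and real code would perform the DNS, connect and TLS work where the comments are.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

var source = new ActivitySource("Demo.Connections"); // hypothetical source name
var stopped = new List<Activity>();

ActivitySource.AddActivityListener(new ActivityListener
{
    ShouldListenTo = s => s.Name == "Demo.Connections",
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = stopped.Add, // record spans in the order they end
});

using (source.StartActivity("[http] connection example.com:443"))                // whole pooled lifetime
{
    using (source.StartActivity("[http] connection_initiation example.com:443")) // establishment only
    {
        using (source.StartActivity("DNS example.com")) { /* resolve host */ }
        using (source.StartActivity("[socket] connection_initiation 192.0.2.1:443")) { /* connect */ }
        using (source.StartActivity("TLS example.com")) { /* handshake */ }
    }
    // ... requests would be served over the established connection here ...
}

foreach (Activity a in stopped)
{
    Console.WriteLine($"{a.DisplayName}  <-  {a.Parent?.DisplayName ?? "(root)"}");
}
```

Each sub-span reports connection_initiation as its parent, and connection_initiation in turn is a child of the long-lived connection span, so the establishment cost can be read off one span without the noise of the full lifetime.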

cc @noahfalk @lmolkova

lmolkova · 2024-06-26T23:54:25Z

The overall connection span could be useful for:

- understanding which connection my request was sent over (if we can link the connection span to the request span); alternatives could be:
  - a connection ID attribute on the request span
  - just logs
- debugging abandoned response streams and connections that are not returned to the pool (some additional work is probably needed to make this obvious)
- detecting HTTP pipelining, if that is still a thing
- understanding how the connection ended (server closed, error, etc.) and when it ended relative to the request

Many other useful things are already exposed as metrics (number of active connections, connection duration), so I agree span usefulness is limited.

Regarding the noise: connections should be rare compared to requests (under any significant load). The spans are reported by a different ActivitySource, so users decide whether they want them.

Anyway, I don't have a strong opinion on how necessary it is - it can always be added later.


MihaZupan

I'll let Mana double check the H3 part, but this LGTM, thank you

P.S. When we eventually remove the experimental prefix, my small preference would be to use the existing activity source for HTTP.


ManickaP

HTTP only for now; I have skipped the tests, though.
I'm continuing with DNS/Sockets/TLS.

ManickaP · 2024-07-11T12:58:13Z

...Net.Http/src/System/Net/Http/SocketsHttpHandler/ConnectionPool/ConnectionSetupDiagnostics.cs

+
+        public static void AddConnectionLinkToRequestActivity(Activity connectionSetupActivity)
+        {
+            Debug.Assert(connectionSetupActivity is not null);


Like in ReportError, why does this require the null check before every call instead of having:

if (connectionSetupActivity is null) return;

here.

This pattern seems to be used throughout our telemetry:

if (HttpTelemetry.Log.IsEnabled()) HttpTelemetry.Log.SomeMethod();

I assumed it exists to save a few cycles by avoiding a method call and a deeper stack on the happy path, so I also applied it to metrics and distributed tracing.
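The two styles under discussion can be compared side by side. A sketch with hypothetical helper names (StopRequiresNonNull, StopAcceptsNull), not the PR's ConnectionSetupDiagnostics methods: the first pushes the null check to every call site and only asserts, while the second checks inside and can be called unconditionally, at the cost of a method call when tracing is off.

```csharp
using System;
using System.Diagnostics;

Activity? activity = null; // what StartActivity returns when nobody listens

// Style 1: guard at the call site, mirroring "if (Log.IsEnabled()) Log.SomeMethod();".
if (activity is not null) StopRequiresNonNull(activity, exception: null);

// Style 2: call unconditionally; the helper returns early for null.
StopAcceptsNull(activity, exception: null);

static void StopRequiresNonNull(Activity activity, Exception? exception)
{
    Debug.Assert(activity is not null); // documents the caller's obligation (Debug builds only)
    if (exception is not null)
    {
        activity.SetStatus(ActivityStatusCode.Error, exception.Message);
    }
    activity.Stop();
}

static void StopAcceptsNull(Activity? activity, Exception? exception)
{
    if (activity is null) return;
    StopRequiresNonNull(activity, exception);
}
```

Style 2 keeps call sites shorter; style 1 skips the call entirely on the disabled path, which is the trade-off this thread weighs.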

ManickaP · 2024-07-11T14:35:22Z

...Net.Http/src/System/Net/Http/SocketsHttpHandler/ConnectionPool/ConnectionSetupDiagnostics.cs

+
+namespace System.Net.Http
+{
+    internal static class ConnectionSetupDiagnostics


I still don't think this warrants yet another diagnostics-related class in HTTP, but I'm not gonna die on this hill.
I also have a small preference for renaming it to something like ConnectionDistributedTracing.


ManickaP · 2024-07-11T16:04:34Z

...Net.Http/src/System/Net/Http/SocketsHttpHandler/ConnectionPool/ConnectionSetupDiagnostics.cs

+
+        public static void StopConnectionSetupActivity(Activity activity, Exception? exception, IPEndPoint? remoteEndPoint)
+        {
+            Debug.Assert(activity is not null);


Like in ReportError, why does this require the null check before every call instead of having:

if (connectionSetupActivity is null) return;

here.

See comment. IMO, if we want to get rid of this pattern because we prefer clarity/readability over saving a few cycles, we should do it for all telemetry pillars, not only distributed tracing.

cc @stephentoub

Actually, I'm being inconsistent, since the Start() methods would also need a check if we want to strictly follow the pattern. I will wait for more feedback before changing anything about this.

ReportError has the check inside rather than at the call site; that inconsistency is what bothers me (or rather, that Stop and AddConnectionLink don't do the same). With Start, the check and the code are more complicated, so it makes sense to keep them inside.

ReportError is already on an expensive exception path, so there's no point bothering with micro-optimizations.
I'll leave this as-is for now, but I believe we should revisit our telemetry coding patterns/guidelines.

My preference would be clarity over micro-optimizations everywhere, but some may disagree.

ManickaP · 2024-07-11T16:17:19Z

.../System.Net.Http/src/System/Net/Http/SocketsHttpHandler/ConnectionPool/HttpConnectionPool.cs

-                        stream = await ConnectHelper.EstablishSslConnectionAsync(_sslOptionsProxy, request, async, stream, cancellationToken).ConfigureAwait(false);
-                    }
-                    break;
+            Activity? activity = ConnectionSetupDiagnostics.StartConnectionSetupActivity(IsSecure, OriginAuthority);


Is IsSecure enough here? It assumes only HTTP goes through this, but what about WebSockets (ws/wss)? And what about SOCKS proxies?

Good catch, looks like ws/wss/socks will report http/https wrongly in both Metrics and Distributed Tracing. If we want to address this in this PR, I would prefer to do it by opting out from creating an activity for those for now, and cover other protocols later.

IMHO, in practice this is unlikely to hurt anyone for an experimental feature, so my preference would be to leave it as-is for now and create a tracking issue for .NET 10, when we could fix this for both Metrics and Tracing.

Thoughts?

it as-is for now, and create a tracking issue for .NET 10

👍
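For reference, the opt-out alternative that was floated (and set aside in favor of a .NET 10 tracking issue) could be sketched as below. Everything here is hypothetical: the pool exposes no such helpers, and the schemes are only the ones mentioned in the thread.

```csharp
using System;

// What deriving the name from IsSecure alone reports: wss:// shows up as "https".
static string SchemeFromIsSecure(bool isSecure) => isSecure ? "https" : "http";

// Opt-out variant: only http/https get a connection-setup activity for now;
// ws/wss and SOCKS return null, meaning "don't create an activity".
static string? SchemeForConnectionActivity(Uri uri) =>
    uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps ? uri.Scheme : null;

var wss = new Uri("wss://example.com/chat");
Console.WriteLine(SchemeFromIsSecure(isSecure: true));                  // https
Console.WriteLine(SchemeForConnectionActivity(wss) ?? "(no activity)"); // (no activity)
```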


ManickaP

Some more comments.


antonfirsov · 2024-07-11T23:51:07Z

@ManickaP addressed:

#103922 (comment)
#103922 (comment)
class renaming from: #103922 (comment)

ManickaP

Last batch of comments. Thank you for your patience!


ManickaP · 2024-07-12T08:11:48Z

src/libraries/System.Net.Security/src/System/Net/Security/SslStream.IO.cs

+                }
+            }
+
+            static (string?, string?) GetNameAndVersionString(SslProtocols protocol) => protocol switch


SslProtocols is a flags enum, so does this behave as expected? E.g., does it return nothing when more than one protocol is set, or are we sure that at this point there will always be just one?

EventSource telemetry does the same after obtaining the value with GetSslProtocolInternal(), so I assumed it is normally a single value once the handshake has completed. If it isn't, our telemetry doesn't emit the protocol info.

runtime/src/libraries/System.Net.Security/src/System/Net/Security/NetSecurityTelemetry.cs

Lines 178 to 201 in f9eda07

switch (protocol)
{
#pragma warning disable SYSLIB0039 // TLS 1.0 and 1.1 are obsolete
    case SslProtocols.Tls:
        protocolSessionsOpen = ref _sessionsOpenTls10;
        handshakeDurationCounter = _handshakeDurationTls10Counter;
        break;

    case SslProtocols.Tls11:
        protocolSessionsOpen = ref _sessionsOpenTls11;
        handshakeDurationCounter = _handshakeDurationTls11Counter;
        break;
#pragma warning restore SYSLIB0039

    case SslProtocols.Tls12:
        protocolSessionsOpen = ref _sessionsOpenTls12;
        handshakeDurationCounter = _handshakeDurationTls12Counter;
        break;

    case SslProtocols.Tls13:
        protocolSessionsOpen = ref _sessionsOpenTls13;
        handshakeDurationCounter = _handshakeDurationTls13Counter;
        break;
}
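A switch like GetNameAndVersionString only matches single enum values, which is why the flags question matters. A sketch that mirrors the shape of that helper under assumptions: the name GetNameAndVersion, the ("tls", "1.2")-style strings (in the style of OpenTelemetry's tls.protocol.name / tls.protocol.version attributes), and the omission of SSL 2/3 are illustrative choices, not the PR's code.

```csharp
using System;
using System.Security.Authentication;

// A [Flags] value with more than one bit set (e.g. Tls12 | Tls13) matches no arm,
// so it falls through to (null, null) and no protocol info would be emitted.
static (string? Name, string? Version) GetNameAndVersion(SslProtocols protocol) => protocol switch
{
#pragma warning disable SYSLIB0039 // TLS 1.0 and 1.1 are obsolete
    SslProtocols.Tls => ("tls", "1.0"),
    SslProtocols.Tls11 => ("tls", "1.1"),
#pragma warning restore SYSLIB0039
    SslProtocols.Tls12 => ("tls", "1.2"),
    SslProtocols.Tls13 => ("tls", "1.3"),
    _ => (null, null),
};

Console.WriteLine(GetNameAndVersion(SslProtocols.Tls13));                                   // (tls, 1.3)
Console.WriteLine(GetNameAndVersion(SslProtocols.Tls12 | SslProtocols.Tls13).Name is null); // True
```

This matches the EventSource behavior quoted above: a combined value simply reports nothing rather than guessing.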

ManickaP · 2024-07-12T08:31:18Z

src/libraries/System.Net.Security/src/System/Net/Security/SslStream.IO.cs

@@ -15,6 +15,11 @@ namespace System.Net.Security
 {
    public partial class SslStream
    {
+        private const string ActivitySourceName = "Experimental.System.Net.Security";


Why isn't this part of NetSecurityTelemetry, like in Dns and Socket?
So we have:

- an extra class in Http
- part of the telemetry class in Dns and Socket
- part of a production class in Ssl

These pieces of code evolved like dust balls in a dirty room, but there is some logic:

- In HTTP, the various telemetry pillars follow complex requirements across multiple protocols and aspects (request vs. connection telemetry, Metrics vs. EventSource vs. Activities, HTTP/1.x and HTTP/2 vs. HTTP/3). Where there is cohesion, a class has been defined to put related things in a single place; where there isn't, things were kept separate.
- In DNS and Sockets, telemetry can be tracked together across the 3 pillars.
- In SSL, the Activity logic is so tiny that I didn't bother to move it to a separate class.

I will do a second round of thinking about the last point.

Moved things around a bit for SslStream.


antonfirsov · 2024-07-12T17:52:45Z

/azp run runtime-libraries-coreclr outerloop

azure-pipelines · 2024-07-12T17:52:55Z

Azure Pipelines successfully started running 1 pipeline(s).

Activities for Http Connection, Dns, Sockets and Tls

4e740d0

antonfirsov added the area-System.Net label Jun 24, 2024

antonfirsov added this to the 9.0.0 milestone Jun 24, 2024

dotnet-policy-service bot assigned antonfirsov Jun 24, 2024

stephentoub reviewed Jun 24, 2024

.../System.Net.Http/src/System/Net/Http/SocketsHttpHandler/ConnectionPool/HttpConnectionPool.cs (outdated, resolved)

stephentoub reviewed Jun 24, 2024

.../System.Net.Http/src/System/Net/Http/SocketsHttpHandler/ConnectionPool/HttpConnectionPool.cs (outdated, resolved)

stephentoub reviewed Jun 24, 2024

src/libraries/System.Net.NameResolution/src/System/Net/Dns.cs (resolved)

stephentoub reviewed Jun 25, 2024

src/libraries/System.Net.Security/src/System/Net/Security/SslStream.IO.cs (outdated, resolved)

address review feedback

3945fe4

tarekgh reviewed Jun 25, 2024

src/libraries/System.Net.Security/src/System/Net/Security/SslStream.IO.cs (outdated, resolved)

antonfirsov added 3 commits June 25, 2024 19:22

Merge branch 'main' into connection-activities-05

f40ce38

# Conflicts: # src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/Http3Connection.cs # src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnectionBase.cs

resolve conflicts

08bc611

get rid of finally block

e2cf15e

build-analysis bot mentioned this pull request Jun 26, 2024

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008 (Open)

antonfirsov requested review from wfurt, MihaZupan and rzikm June 26, 2024 20:59

antonfirsov removed request for wfurt, MihaZupan and rzikm June 26, 2024 23:28

antonfirsov marked this pull request as draft June 26, 2024 23:29

lmolkova reviewed Jul 11, 2024

src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketsTelemetry.cs (resolved)

MihaZupan approved these changes Jul 11, 2024

src/libraries/System.Net.Security/src/System/Net/Security/SslStream.IO.cs (outdated, resolved)

ManickaP reviewed Jul 11, 2024

src/libraries/System.Net.NameResolution/src/System/Net/NameResolutionTelemetry.cs (outdated, resolved)

src/libraries/System.Net.NameResolution/src/System/Net/Dns.cs (resolved)

src/libraries/System.Net.NameResolution/src/System/Net/Dns.cs (resolved)

antonfirsov and others added 6 commits July 11, 2024 23:46

*handshake

e829df2

Update src/libraries/System.Net.NameResolution/src/System/Net/NameRes…

7c94ae8

…olutionTelemetry.cs Co-authored-by: Marie Píchová <11718369+ManickaP@users.noreply.github.com>

Merge branch 'connection-activities-05' of https://github.com/antonfi…

92afed7

…rsov/runtime into connection-activities-05

fix H3 logic based on feedback

06896bf

ConnectionSetupDiagnostics -> ConnectionSetupDistributedTracing

b455d17

Merge branch 'main' into connection-activities-05

0d231ba

ManickaP approved these changes Jul 12, 2024

ManickaP mentioned this pull request Jul 12, 2024

[H/3] Distributed Tracing / Telemetry #104783

Open

antonfirsov and others added 8 commits July 12, 2024 16:26

suggestion

56e8cb3

Co-authored-by: Marie Píchová <11718369+ManickaP@users.noreply.github.com>

suggestion

870eae5

Co-authored-by: Marie Píchová <11718369+ManickaP@users.noreply.github.com>

suggestion

de774dd

Co-authored-by: Marie Píchová <11718369+ManickaP@users.noreply.github.com>

Merge branch 'main' into connection-activities-05

284734b

implement 'network.transport'

b1ef7f3

SslStream: move Activity management code to NetSecurityTelemetry

d99734f

readd assertion and add comment

ae587f7

Merge branch 'main' into connection-activities-05

9443cd2

Merge branch 'main' into connection-activities-05

c9a5ad2

antonfirsov merged commit ee5770d into dotnet:main Jul 13, 2024
83 checks passed

jakobbotsch mentioned this pull request Jul 15, 2024

System.Net.Security.Tests.TelemetryTest.SuccessfulHandshake_ActivityRecorded failures in CI #104883

Closed

matouskozak mentioned this pull request Jul 17, 2024

[Apple] Microsoft-Diagnostics-DiagnosticSource EventSource not terminated properly #104881

Open
