Fix OTel telemetry perf regression in CLI restore#54063
Fix OTel telemetry perf regression in CLI restore#54063marcpopMSFT wants to merge 1 commit intomainfrom
Conversation
marcpopMSFT
commented
Apr 24, 2026
- Defer OTel provider initialization to after TELEMETRY_OPTOUT check
- Remove AddHttpClientInstrumentation() that was hooking all NuGet HTTP calls
- Remove unused MeterProvider (no Meter defined in CLI, pure overhead)
- Gate OTLP exporter behind explicit OTEL_EXPORTER_OTLP_ENDPOINT config
- Add thread-safe double-checked lock for provider initialization
- Defer OTel provider initialization to after TELEMETRY_OPTOUT check - Remove AddHttpClientInstrumentation() that was hooking all NuGet HTTP calls - Remove unused MeterProvider (no Meter defined in CLI, pure overhead) - Gate OTLP exporter behind explicit OTEL_EXPORTER_OTLP_ENDPOINT config - Add thread-safe double-checked lock for provider initialization Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Perf summary with this fix: |
There was a problem hiding this comment.
Pull request overview
This PR addresses a performance regression in dotnet CLI restore telemetry by deferring OpenTelemetry (OTel) provider setup until after telemetry opt-out is evaluated and by reducing default instrumentation/export overhead.
Changes:
- Defer OTel
TracerProviderinitialization until afterTELEMETRY_OPTOUTis checked, with a guarded one-time initialization path. - Remove HTTP/runtime instrumentation and the (unused) metrics provider to avoid hooking/overhead during NuGet restore.
- Only enable OTLP exporting when an explicit
OTEL_EXPORTER_OTLP_*_ENDPOINTenvironment variable is configured.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/Cli/dotnet/dotnet.csproj | Removes OTel instrumentation package references that were contributing to restore overhead. |
| src/Cli/dotnet/Telemetry/TelemetryClient.cs | Defers and gates tracer provider initialization; removes metrics provider and unconditional OTLP/http/runtime instrumentation. |
| if (s_tracerProvider is not null) | ||
| { | ||
| return; | ||
| } | ||
|
|
| // Only add the OTLP exporter when the user has explicitly configured an endpoint. | ||
| var otlpEndpoint = Env.GetEnvironmentVariable("OTEL_EXPORTER_OTLP_ENDPOINT") | ||
| ?? Env.GetEnvironmentVariable("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"); | ||
| if (!string.IsNullOrEmpty(otlpEndpoint)) |
| // Only add the OTLP exporter when the user has explicitly configured an endpoint. | ||
| var otlpEndpoint = Env.GetEnvironmentVariable("OTEL_EXPORTER_OTLP_ENDPOINT") | ||
| ?? Env.GetEnvironmentVariable("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"); | ||
| if (!string.IsNullOrEmpty(otlpEndpoint)) | ||
| { | ||
| tracerBuilder.AddOtlpExporter(); | ||
| } |
| var otlpEndpoint = Env.GetEnvironmentVariable("OTEL_EXPORTER_OTLP_ENDPOINT") | ||
| ?? Env.GetEnvironmentVariable("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"); |
There was a problem hiding this comment.
This is bad for usability - most folks won't be defining these, they'll just be using tools that bind to the conventional OTel ports of 4317 and 4318. Put another way, this is very much not the expectation for the OTel ecosystem.
There was a problem hiding this comment.
I'm still learning how all of this works, but all of the OLTP exported data doesn't go anywhere we can observe since data-x-platform only takes in traces which are produced from logging events (activity events get sent as well). If this is only powering the aspire dashboard for local perf investigations, I don't think we should incur the perf hit of having this exporter by default. What do you think?
There was a problem hiding this comment.
Based on conversation, this is only pushing or sending anything if there is a specific otel endpoint configured and running such as localhost:4317/traces
MiYanni
left a comment
There was a problem hiding this comment.
Uhhhh. My gut reaction is this wouldn't work. Here's some context:
The design of this has a duality between being static and also being an instance. I maintained that during the transition to OTel since it seemed more difficult to redesign it during the migration. That's why there is a "setup" of the static instances of these provider builders, so they are not duplicated. The changes in this PR are always creating new builders on each new telemetry client instance, which I don't know the impact of that.
That's why this is written this way:
s_metricsProvider ??= s_metricsProviderBuilder.Build();
s_tracerProvider ??= s_tracerProviderBuilder.Build();That's in instance code, so that when they are built, they are only built once. This code is a mess, but I did the best with what was there.
Here, I believe Copilot thinks the OtlpExporter is the problem. If that is the case, then it should just be optional behind the DOTNET_CLI_TELEMETRY_DISABLE_TRACE_EXPORT env-var, like I did for the Azure Monitor exporter. But that really won't change the "normal" case of just using the released SDK with telemetry turned on.
Overall, I don't understand what Copilot is trying to accomplish here. Based on the comment in the code it added, it looks like it is trying to defer telemetry setup until we know if telemetry is disabled or not. AFAIK, I did that by not building the providers until it is known. Making the provider builders statically shouldn't be incurring a cost, unless I misunderstand what the point of the builders are. .Build() should be where costs are incurred, which is already behind the OTel enabled flag.
And deferring this logic won't fix the perf impact for the normal use of the official SDK build (since telemetry is on). That doesn't make any sense.
|
It's an alternative to ETW events, yeah - but nothing was really hooked up to it yet aside from . NET runtime metrics IIRC. things like GC diagnostics and assembly loads. |
| var tracerBuilder = Sdk.CreateTracerProviderBuilder() | ||
| .ConfigureResource(r => { r.AddService("dotnet-cli", serviceVersion: Product.Version); }) | ||
| .AddSource(Activities.Source.Name) | ||
| .AddInMemoryExporter(s_activities) |