
Microsoft.AspNetCore.Hosting perf counters are slow #50412

Open · EgorBo opened this issue Aug 29, 2023 · 17 comments

Labels: area-networking, design-proposal, Perf

@EgorBo (Member) commented Aug 29, 2023

We're trying to benchmark a web app on a many-core system, and we're mainly interested in RPS metrics at the moment. We don't want to rely on client-side RPS metrics (bombardier or wrk, depending on OS) because there are multiple clients and they don't all start at exactly the same moment. It seems that the Microsoft.AspNetCore.Hosting perf counters can be used to calculate RPS on the ASP.NET side, which is what we need. Unfortunately, these counters appear to make everything roughly twice as slow when enabled.
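
For context, here is a minimal sketch of how such a server-side RPS reading could be taken in-process with the standard System.Diagnostics.Tracing.EventListener API. The "requests-per-second" counter is published by the hosting EventSource, and the payload keys ("Name", "Increment") follow the usual EventCounter conventions; treat the details as an assumption rather than the harness actually used here:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Tracing;

// Sketch: subscribe to the Microsoft.AspNetCore.Hosting EventCounters
// in-process and print the per-second request rate.
sealed class HostingRpsListener : EventListener
{
    protected override void OnEventSourceCreated(EventSource source)
    {
        if (source.Name == "Microsoft.AspNetCore.Hosting")
        {
            // Ask the EventSource to publish counter snapshots once per second.
            EnableEvents(source, EventLevel.Informational, EventKeywords.None,
                new Dictionary<string, string?> { ["EventCounterIntervalSec"] = "1" });
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs e)
    {
        // EventCounter payloads arrive as a dictionary in Payload[0].
        if (e.EventName == "EventCounters" &&
            e.Payload?[0] is IDictionary<string, object> counter &&
            (string)counter["Name"] == "requests-per-second")
        {
            Console.WriteLine($"server-side RPS: {counter["Increment"]}");
        }
    }
}
```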

Basically, it can be simulated on our PerfLab with crank:

crank 
--config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/json.benchmarks.yml 
--config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/azure.profile.yml 
--config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/steadystate.profile.yml 
--config https://raw.githubusercontent.com/aspnet/Benchmarks/main/build/ci.profile.yml 
--scenario json 
--profile arm-lin-28-app 
--profile intel-load2-load 
--application.framework net8.0 
--application.collectDependencies true 
--application.options.collectCounters true 
--application.options.counterProviders "Microsoft.AspNetCore.Hosting"

The last line is the culprit. Is it possible to somehow achieve an overhead-free server-side RPS metric? The current suspect seems to be EventPipe:

[screenshot: trace showing EventPipe overhead]

Tried on Linux-x64, Linux-arm64, and Windows-x64 machines.

UPD: current suspects:

| Mode | RPS |
| --- | --- |
| No counters | 1M |
| Collect "System.Runtime" counters | 1M |
| Collect "Microsoft.AspNetCore.Hosting" counters | 0.48M |
| Collect "Microsoft.AspNetCore.Hosting" counters, EventPipeEnableStackwalk=0 | 0.65M |
| Collect "Microsoft.AspNetCore.Hosting" counters, EventPipeEnableStackwalk=0, without Interlocked | 0.75M |
| Collect "Microsoft.AspNetCore.Hosting" counters, EventPipeEnableStackwalk=0, without Interlocked, without lock in CounterAggregator | 0.8M |

@EgorBo (Member Author) commented Aug 29, 2023

also, cc @dotnet/dotnet-diag and @sebastienros

@EgorBo (Member Author) commented Aug 29, 2023

Also, could CPU cache contention be an issue here? https://github.com/dotnet/aspnetcore/blob/main/src/Hosting/Hosting/src/Internal/HostingEventSource.cs#L55-L64

We perform three interlocked operations for each request.
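
For reference, the hot path in the linked file boils down to this shape (a simplified paraphrase of HostingEventSource, not a verbatim copy):

```csharp
using System.Threading;

// Simplified paraphrase of the hot path in HostingEventSource: three
// interlocked operations per request, all hitting fields that every
// core mutates.
internal sealed class RequestCounters
{
    private long _totalRequests;
    private long _currentRequests;

    public void RequestStart()
    {
        Interlocked.Increment(ref _totalRequests);   // 1: contended cache line
        Interlocked.Increment(ref _currentRequests); // 2: same cache line
    }

    public void RequestStop()
    {
        Interlocked.Decrement(ref _currentRequests); // 3: same cache line
    }
}
```

On a machine with many cores, every increment has to bounce the cache line holding these fields between cores, so the cost scales with contention rather than with the work done.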

@davmason (Member):

Can you set DOTNET_EventPipeEnableStackwalk=0 and try again? That will stop EventPipe from ever trying to collect managed stacks for events and should greatly improve performance. It means you won't get stacks for any events, but from what I see, that's OK for your scenario.
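
For anyone reproducing this, the variable can presumably be passed through crank's environment-variable job option (an assumption; this switch is not part of the original command above):

crank ... --application.environmentVariables DOTNET_EventPipeEnableStackwalk=0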

@EgorBo (Member Author) commented Aug 29, 2023

> Can you set DOTNET_EventPipeEnableStackwalk=0 and try again? […]

Thanks! It does improve things, but it's still slower than with just the System.Runtime counters (which don't affect RPS):

| Mode | RPS |
| --- | --- |
| Default | 1M |
| Collect counters, System.Runtime | 1M |
| Collect counters, Microsoft.AspNetCore.Hosting | 0.48M |
| Collect counters, Microsoft.AspNetCore.Hosting, EventPipeEnableStackwalk=0 | 0.65M |
| Collect counters, Microsoft.AspNetCore.Hosting, EventPipeEnableStackwalk=0, without Interlocked | 0.75M |
| Collect counters, Microsoft.AspNetCore.Hosting, EventPipeEnableStackwalk=0, without Interlocked, without lock in CounterAggregator | 0.8M |

@sebastienros (Member):

> We perform three interlocked operations for each request.

At the level of concurrency we have on these systems (many concurrent requests, and 80 cores on some of the ARM64 machines), that's probably the reason.

@EgorBo (Member Author) commented Aug 30, 2023

So, it seems the Interlocked counters do contribute, but there is still something else slowing us down here. I compiled Microsoft.AspNetCore.Hosting.dll locally with the counters removed and got +0.1M RPS (resulting in 0.75M RPS).

@EgorBo (Member Author) commented Aug 30, 2023

Windows-x64, with the Interlocked.* operations commented out (so the provider reports 0 RPS) and DOTNET_EventPipeEnableStackwalk=0:

[screenshot: benchmark output]

Traces for a random thread pool thread:

[screenshot: thread pool thread stack traces]

So ~20% of performance is still being lost somewhere.

@davidfowl (Member):

Are we fixing this for .NET 8?

@noahfalk (Member) commented Sep 6, 2023

What EventPipe providers do you have enabled in the Collect "Microsoft.AspNetCore.Hosting" counters case? There is both an EventSource named Microsoft.AspNetCore.Hosting and a Meter named "Microsoft.AspNetCore.Hosting". All Meters get reported in aggregate via the MetricsEventSource.

It's possible to enable both, but there is likely no reason to do so. Looking at the code in ASP.NET Core, you may also be paying the cost to track both sets of metrics even if you only enabled one of them:
https://source.dot.net/#Microsoft.AspNetCore.Hosting/Internal/HostingApplicationDiagnostics.cs,57 https://source.dot.net/#Microsoft.AspNetCore.Hosting/Internal/HostingApplicationDiagnostics.cs,353
It's possible to check whether they are enabled individually, but currently the code doesn't do that.

Given that you saw improvements from removing the locks in CounterAggregator, I assume you enabled the MetricsEventSource. Do you know what filter arguments were passed to select which Meters and Instruments to listen to? For example, if you enabled all instruments on the Microsoft.AspNetCore.Hosting Meter, then you also enabled the request-duration histogram, and each histogram update hits a different lock here: https://github.com/dotnet/runtime/blob/9c3f8b3727d9be4de483a1d725c2bda22f956688/src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Metrics/ExponentialHistogramAggregator.cs#L176
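
As a concrete illustration of enabling only what's needed: when measuring in-process, a MeterListener can subscribe to the hosting Meter's counters while never enabling the duration histogram, so its lock is never taken. This is only a sketch; the Meter name comes from this thread, and filtering out histograms by instrument type is just one possible criterion:

```csharp
using System;
using System.Diagnostics.Metrics;

// Sketch: listen to counter instruments on the hosting Meter, but never
// enable the request-duration histogram, so its aggregation lock is
// never touched on the hot path.
var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter.Name == "Microsoft.AspNetCore.Hosting" &&
        instrument is not Histogram<double>) // skip the duration histogram
    {
        l.EnableMeasurementEvents(instrument);
    }
};
listener.SetMeasurementEventCallback<long>(
    (instrument, measurement, tags, state) =>
        Console.WriteLine($"{instrument.Name}: {measurement}"));
listener.Start();
```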

> So ~20% of performance is still being lost somewhere.

If something not yet accounted for is taking 20% of the time, that should make the ETW traces look fairly different. Is the difference proving elusive, or does it just take some time to peel the next layer of the onion?

> Is it possible to somehow achieve an overhead-free server-side RPS metric?

Nothing is totally free, but I think we could get a metric with low enough overhead that it wouldn't be observable at the scale of this measurement. We can do some combination of:

  • turning off things not necessary for the measurement
  • better optimizing what is needed
  • as a last resort, creating a custom metric specifically tuned for perf

@JamesNK (Member) commented Sep 6, 2023

> It's possible to enable both, but there is likely no reason to do so. Looking at the code in ASP.NET Core, you may also be paying the cost to track both sets of metrics even if you only enabled one of them: source.dot.net/#Microsoft.AspNetCore.Hosting/Internal/HostingApplicationDiagnostics.cs,57 source.dot.net/#Microsoft.AspNetCore.Hosting/Internal/HostingApplicationDiagnostics.cs,353
> It's possible to check whether they are enabled individually, but currently the code doesn't do that.

Adding separate checks for metrics vs event counters in ASP.NET Core hosting isn't difficult. I'll do that and aim for .NET 8.
Edit: PR #50412

However, it won't help if someone enables both metrics and event counters, and I don't see why someone would want both. I think there is a task here to look at what .NET telemetry libraries and tooling do when they ask to listen for counters. I know dotnet-counters listens to both, but people aren't going to use that in production, so it doesn't matter. What about the .NET OpenTelemetry SDK? Or other counter-gathering tooling?

I don't think it's worth investing in optimizing the HostingEventSource's Interlocked.Increment calls. We should encourage people to move towards metrics.

@JamesNK (Member) commented Sep 7, 2023

My PR to enable metrics and event counters independently didn't have any impact on the benchmark: RPS is approximately 500K both before and after. That means application.options.counterProviders "Microsoft.AspNetCore.Hosting" is capturing both event counters and metrics counters. Supporting enabling them independently is a Good Thing and the PR should go into the product, but it has no impact here.

The performance of metrics incrementing values is out of our hands in aspnetcore; the perf-sensitive code is in System.Diagnostics.Metrics. There is a PR looking at it here: dotnet/runtime#91566

Something we can impact in aspnetcore is the event counters' performance. We could update the hosting event counters to use something other than Interlocked: either implement something similar to what dotnet/runtime#91566 is doing, or use ThreadLocal<long>. @sebastienros, I have a vague memory of you talking about a library that did this to improve ARM perf. Can you share details, or did I imagine that conversation?
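
For illustration, the usual shape of such a counter is striping across cache-line-padded per-processor slots. This is a sketch of the general technique only, not what dotnet/runtime#91566 or any particular library ships:

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

// Sketch: a striped counter. Each core mostly touches its own padded
// slot, so increments rarely contend; reads sum the slots.
internal sealed class StripedCounter
{
    [StructLayout(LayoutKind.Explicit, Size = 64)] // one cache line per slot
    private struct PaddedLong
    {
        [FieldOffset(0)] public long Value;
    }

    private readonly PaddedLong[] _slots = new PaddedLong[Environment.ProcessorCount];

    public void Increment()
    {
        // GetCurrentProcessorId can exceed ProcessorCount on some systems,
        // so clamp with a modulo.
        int i = Thread.GetCurrentProcessorId() % _slots.Length;
        Interlocked.Increment(ref _slots[i].Value);
    }

    public long Read()
    {
        long sum = 0;
        for (int i = 0; i < _slots.Length; i++)
            sum += Volatile.Read(ref _slots[i].Value);
        return sum;
    }
}
```

Each increment still lands atomically in exactly one slot, so the total stays exact; a read taken while requests are in flight merely sums the slots at slightly different instants, and at quiescence it is exact, which matters for the active-request discussion below.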

@danmoseley (Member) commented Sep 7, 2023

Are all of our counters required to be exact? Where a nearly accurate value is acceptable, we could use an algorithm similar to the one used here to avoid interlocked operations:

dotnet/runtime#84427
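
To make the trade-off concrete, the cheapest "nearly accurate" variant is a plain racy increment; this is only an illustration of the general idea, not necessarily what that PR implements:

```csharp
using System.Threading;

// Illustration of the "nearly accurate" idea: a plain non-atomic increment
// that can lose an occasional update under contention, in exchange for
// generating no interlocked cache-line traffic on the hot path.
internal sealed class ApproximateCounter
{
    private long _value;

    public void Increment() => _value++; // racy read-modify-write

    public long ReadApproximate() => Volatile.Read(ref _value);
}
```

The next two comments explain why that trade-off doesn't fit these particular counters.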

@JamesNK (Member) commented Sep 7, 2023

There are two counters on the hot path: total requests and active requests.

It would be surprising if the total request count weren't exact, and it would be a regression from current behavior.

The active request count is incremented when a request starts and decremented when it ends. It definitely needs to be exact; otherwise you could end up with an idle server reporting active requests, or even a negative number of active requests.

@noahfalk (Member) commented Sep 7, 2023

> Are all of our counters required to be exact? Where a nearly accurate value is acceptable, we could use an algorithm similar to the one used here to avoid interlocked operations: dotnet/runtime#84427

I think it depends on what people want to use them for. In the scenario here a small error might be acceptable; in others, people probably expect counts to be exact. I could easily imagine this being something tools opt into when it's appropriate for the scenario at hand, but I think it would confuse people if we introduced error bars by default.

@EgorBo @sebastienros, can you help me answer some of the questions in #50412 (comment)? It sounds like @JamesNK answered one of them: crank is enabling both the HostingEventSource EventCounters and the new Meters via the MetricsEventSource. Presumably you don't need both, and when you change the tool to disable one, the overhead will go down. If the goal is to get a good RPS measurement right now, the EventCounters probably have lower overhead at the moment. If the goal is to improve the perf of Meters as run by customers, then let's disable the EventCounters so we can isolate and work on just the Meter overhead.

@noahfalk (Member) commented Sep 7, 2023

> There is a PR looking at it here: #50412

Did you mean dotnet/runtime#91566?

@davidfowl (Member) commented Sep 7, 2023

Let's make sure our counters are exact. There are several options here to improve the performance of how we track them (@stephentoub mentioned these techniques in the other issue on runtime):

  • This is an option
  • Or something like this

We don't want to use the approximate counting algorithm here.
