Microsoft.AspNetCore.Hosting perf counters are slow #50412
Also, cc @dotnet/dotnet-diag and @sebastienros.
Also, could CPU cache contention be an issue here? We perform three interlocked operations per request: https://github.com/dotnet/aspnetcore/blob/main/src/Hosting/Hosting/src/Internal/HostingEventSource.cs#L55-L64
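For context, a minimal sketch of the pattern being discussed (field and method names are illustrative, not the exact HostingEventSource source): every request start and stop updates shared fields via `Interlocked`, so with many cores the same cache lines ping-pong between them.

```csharp
using System.Threading;

// Sketch only: roughly the interlocked traffic a request generates across
// start/stop, matching the "three interlocked operations per request" count.
internal sealed class HostingCountersSketch
{
    private long _totalRequests;
    private long _currentRequests;
    private long _failedRequests;

    public void RequestStart()
    {
        Interlocked.Increment(ref _totalRequests);    // contended on every request
        Interlocked.Increment(ref _currentRequests);  // contended on every request
    }

    public void RequestStop(bool failed)
    {
        Interlocked.Decrement(ref _currentRequests);  // contended on every request
        if (failed)
        {
            Interlocked.Increment(ref _failedRequests);
        }
    }
}
```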
Can you set …
Thanks! It does improve things, but it's still slower than having just the System.Runtime counters enabled (which don't affect RPS).
At the level of concurrency we have on these systems (a lot of concurrent requests and 80 cores on some of the ARM64 machines), that's probably the reason.
So it seems that the interlocked counters do contribute, but there is still something else slowing us down here. I compiled …
PS: yes, it's responsible for another 50k RPS.
Are we fixing this for .NET 8?
What EventPipe providers do you have enabled in the …? It's possible to enable both, but there is likely no reason to do so. Looking at the code in ASP.NET Core, you may also be paying the cost to track both sets of metrics even if you only enabled one of them: … Given you saw improvements removing the locks in CounterAggregator, I assume you enabled MetricsEventSource. Do you know what filter arguments got passed to select which Meters and Instruments to listen to? For example, if you enabled all instruments on the Microsoft.AspNetCore.Hosting Meter, then you would have enabled the request duration histogram, and that histogram update hits a different lock here: https://github.com/dotnet/runtime/blob/9c3f8b3727d9be4de483a1d725c2bda22f956688/src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Metrics/ExponentialHistogramAggregator.cs#L176
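For reference, a hedged sketch of how the two paths can be enabled in-process with an `EventListener`. The provider names are the ones mentioned in this thread; the argument keys are my assumption of typical filter arguments, and real tools such as dotnet-counters also pass session arguments.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Tracing;

// Sketch only: EventCounters and Meters are exposed by two different providers,
// each with its own filter arguments. Enabling both pays both costs.
internal sealed class CounterProbeListener : EventListener
{
    protected override void OnEventSourceCreated(EventSource source)
    {
        // EventCounters path: the ASP.NET Core hosting EventSource.
        if (source.Name == "Microsoft.AspNetCore.Hosting")
        {
            EnableEvents(source, EventLevel.Informational, EventKeywords.All,
                new Dictionary<string, string?> { ["EventCounterIntervalSec"] = "1" });
        }

        // Meters path: MetricsEventSource; the "Metrics" argument selects which
        // Meters/Instruments are listened to (here: only the hosting Meter).
        if (source.Name == "System.Diagnostics.Metrics")
        {
            EnableEvents(source, EventLevel.Informational, EventKeywords.All,
                new Dictionary<string, string?> { ["Metrics"] = "Microsoft.AspNetCore.Hosting" });
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        // Inspect eventData.EventName / eventData.Payload to see what is flowing.
    }
}
```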
If something not yet accounted for is taking 20% of the time, that should make the ETW traces look fairly different. Is the difference proving elusive and things feel stuck, or is it just that some time is needed to peel the next layer of the onion?
Nothing is totally free, but I think we could get a metric with sufficiently low overhead that it wouldn't be observable at the scale of this measurement. We can do some combination of: …
Adding separate checks for metrics vs. event counters in ASP.NET Core hosting isn't difficult. I'll do that and aim for .NET 8. However, it won't help if someone enables both metrics and event counters, and I don't see why someone would want both. I think there is a task here to look at what .NET telemetry libraries and tooling are doing when they ask to listen for counters. I don't think it is worth investing in optimizing the …
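As a rough illustration of the "separate checks" idea (the types and names below are hypothetical, not the actual HostingApplication code): the EventCounter bookkeeping and the Meter instrument updates are gated by independent flags, so enabling only one path doesn't pay for the other.

```csharp
using System.Diagnostics.Metrics;
using System.Threading;

internal sealed class RequestTelemetrySketch
{
    // Hypothetical meter/instrument names for illustration only.
    private static readonly Meter s_meter = new("Illustrative.Hosting");
    private static readonly Counter<long> s_requests =
        s_meter.CreateCounter<long>("illustrative.requests");

    private long _totalRequests;              // backs a hypothetical EventCounter
    public bool EventCountersEnabled { get; set; }

    public void OnRequestStart()
    {
        if (EventCountersEnabled)             // only when the EventSource has listeners
        {
            Interlocked.Increment(ref _totalRequests);
        }

        if (s_requests.Enabled)               // only when a MeterListener enabled it
        {
            s_requests.Add(1);
        }
    }
}
```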
My PR to enable metrics and event counters independently didn't have any impact on the benchmark. The RPS is approximately 500k both before and after. That means the performance of metrics incrementing values is out of our hands in aspnetcore; the perf-sensitive code is in … Something we can impact in aspnetcore is the event counters' performance. We could update the hosting event counters to use something other than …
Are all of our counters required to be exact? Where a nearly accurate value is acceptable, we could use an algorithm similar to what is done here to avoid interlocked operations.
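A hedged sketch of one such trade-off (not necessarily the algorithm the comment links to): each thread batches increments locally and only publishes them with an interlocked add every N increments, so the visible total can lag by at most N per thread while interlocked traffic drops roughly N-fold.

```csharp
using System.Threading;

// Sketch only: the batching factor and names are illustrative assumptions.
internal static class ApproximateCounter
{
    private const int FlushEvery = 64;           // per-thread batching factor
    private static long s_published;             // globally visible, slightly stale
    [ThreadStatic] private static int t_pending; // per-thread unpublished increments

    public static void Increment()
    {
        if (++t_pending >= FlushEvery)
        {
            Interlocked.Add(ref s_published, t_pending);
            t_pending = 0;
        }
    }

    // Reads can be off by up to (threads × FlushEvery) pending increments.
    public static long Read() => Interlocked.Read(ref s_published);
}
```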
There are two counters on the hot path: total requests and active requests. It would be surprising if the total request count wasn't exact, and it would be a regression from current behavior. The active request count is incremented when a request starts and decremented when the request ends. It definitely needs to be exact; otherwise, you could end up with an idle server reporting active requests, or even a negative number of active requests.
I think it depends on what people want to use them for. In the scenario here a small error might be acceptable; in others, people probably expect counts to be exact. I could easily imagine this being something that tools opt into when it is appropriate for the scenario at hand, but I think it would confuse people if we introduced error bars by default.

@EgorBo @sebastienros - can you guys help me answer some of the questions in #50412 (comment)? It sounds like @JamesNK answered one of them, which is that crank is enabling both the HostingEventSource EventCounters and the new Meters via MetricsEventSource. Presumably you don't need both, and when you change the tool to disable one, the overhead is going to go down. If the goal is to get a good RPS measurement right now, the EventCounters are probably going to have lower overhead at the moment. If the goal is to improve the perf of Meters when run by customers, then let's disable the EventCounters so we isolate and work on just the Meter overhead.
Did you mean dotnet/runtime#91566?
Let's make sure our counters are exact; there are several options we have here to improve the performance of how we track that (@stephentoub mentioned these techniques in the other issue on runtime). This is an option. We don't want to use the approximate counting algorithm here.
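One commonly used option in this family keeps the count exact while spreading contention: stripe the counter across several cache-line-padded cells and sum them on read. This is only a sketch under my own assumptions (cell count, padding, core-id hashing), not whatever was ultimately implemented.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

public sealed class StripedCounter
{
    // Pad each cell to its own cache line to avoid false sharing.
    [StructLayout(LayoutKind.Explicit, Size = 64)]
    private struct Cell { [FieldOffset(0)] public long Value; }

    private readonly Cell[] _cells = new Cell[Environment.ProcessorCount];

    public void Increment()
    {
        // Spread threads across cells by current processor id; counts stay exact
        // because every increment lands in exactly one cell.
        int i = (int)((uint)Thread.GetCurrentProcessorId() % (uint)_cells.Length);
        Interlocked.Increment(ref _cells[i].Value);
    }

    public long Read()
    {
        long sum = 0;
        for (int i = 0; i < _cells.Length; i++)
            sum += Volatile.Read(ref _cells[i].Value);
        return sum;
    }
}
```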
We're trying to benchmark a web app on a many-core system, and we're mainly interested in RPS metrics at the moment. We don't want to rely on client-side RPS metrics (bombardier or wrk, depending on OS) because there are multiple clients and they aren't started at exactly the same time. It seems that the Microsoft.AspNetCore.Hosting perf counters can be used to calculate RPS on the aspnet side, which is what we need. Unfortunately, these counters seem to make everything twice as slow when enabled. Basically, it can be simulated on our PerfLab with crank: …

The last line is the culprit. Is it possible to somehow achieve an overhead-free server-side RPS metric? The current culprit seems to be EventPipe: …
Tried Linux-x64, Linux-arm64 and Windows-x64 machines.
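For what it's worth, the server-side RPS asked for above is just a delta of a monotonic total-request counter over elapsed time, however that counter ends up being collected. A trivial sketch with hypothetical names:

```csharp
using System;

// Sketch only: snapshot type and names are illustrative.
public readonly record struct CounterSnapshot(DateTimeOffset Timestamp, long TotalRequests);

public static class ServerSideRps
{
    // RPS over a window = (requests at end - requests at start) / seconds elapsed.
    public static double Between(CounterSnapshot start, CounterSnapshot end)
    {
        double seconds = (end.Timestamp - start.Timestamp).TotalSeconds;
        return seconds > 0 ? (end.TotalRequests - start.TotalRequests) / seconds : 0;
    }
}
```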
UPD: Current suspects: