Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Meter/Counters to track hits for each route/endpoint #46361

Closed
1 task done
Tratcher opened this issue Jan 31, 2023 · 15 comments
Closed
1 task done

Add Meter/Counters to track hits for each route/endpoint #46361

Tratcher opened this issue Jan 31, 2023 · 15 comments
Assignees
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions
Milestone

Comments

@Tratcher
Copy link
Member

Is there an existing issue for this?

  • I have searched the existing issues

Is your feature request related to a problem? Please describe the problem.

Customers have asked to know how much traffic is going to specific routes/endpoints, especially routes they've added to catch misdirected traffic (e.g. favicon.ico when the site has none). These are used to guide investigations and development efforts.

Describe the solution you'd like

We can use System.Diagnostics.Metrics.Meter to make counters per route or endpoint. and then monitor those using dotnet-counters.

Here's a simple example I tried in the EndpointRoutingMiddleware.

    private readonly Meter _meter = new Meter("Microsoft.AspNetCore.Routing");
    private readonly Counter<int> _routeMatchCounter;
...
        _routeMatchCounter = _meter.CreateCounter<int>("matched");
...
            _routeMatchCounter.Add(1, new KeyValuePair<string, object?>("endpoint", endpoint.DisplayName));

Then
dotnet-counters monitor --name MinimalSample Microsoft.AspNetCore.Routing
Produces:

Press p to pause, r to resume, q to quit.
    Status: Running

[Microsoft.AspNetCore.Routing]
    matched (Count / 1 sec)
        endpoint=405 HTTP Method Not Supported                             0
        endpoint=HTTP: GET /                                               0
        endpoint=HTTP: GET /hello/{name} => SayHello                       0
        endpoint=HTTP: GET /json => Json                                   0
        endpoint=HTTP: GET /null-result                                    0
        endpoint=HTTP: GET /plaintext => Plaintext                         0
        endpoint=HTTP: GET /problem/{problemType}                          0
        endpoint=HTTP: GET /todo/{id}                                      0

Which kind of counter?

  • I'm told ObservableCounter is more performant than Counter, but it requires more of your own bookkeeping.

What kind of data?

  • Not the full path or query, it may contain sensitive data or vary too much by parameters
  • Maybe not even individual endpoints? That might be too granular.
  • How about one per route pattern (including constraints?)? That avoids things like 405 HTTP Method Not Supported. Is it granular enough? Probably, people can be more specific with their route definitions if they want more granular counters.

Where does the counter go?

  • EndpointRoutingMiddleware can see endpoints, but lacks visibility into routes.
  • Where has access to the path and constraints? In the DFA builder/matcher somewhere?

Additional context

No response

@davidfowl
Copy link
Member

Related to #33387.

We need to do this in a uniform way across the stack and measure the overhead (I want to avoid extra allocations that you cannot turn off).

cc @noahfalk

@noahfalk
Copy link
Member

noahfalk commented Feb 1, 2023

I'm told ObservableCounter is more performant than Counter, but it requires more of your own bookkeeping.

ObservableCounter puts you in control of all the hot-path code. Whether it winds up being more efficient depends on whether your implementation to track updates can beat the performance of the default implementations provided by other listeners like OpenTelemetry or dotnet-counters. My suggestion is not to bother unless either:

  • your code already tracks these values and they are easy to poll on demand
  • you are planning to do performance benchmarking to see what the overhead is and you'd be willing to complicate the code in exchange for saving maybe low 10s of ns per counter update when the telemetry is enabled.

measure the overhead (I want to avoid extra allocations that you cannot turn off).

Writing to an instrument nobody is listening to is allocation-free as long as the ASP.NET code isn't doing allocations to prepare the arguments to the call. If you would need to do allocations to prepare arguments you can always check Instrument.Enabled in advance and early-out if nobody is listening.

We need to do this in a uniform way across the stack

Is this uniformity like "lets figure out all the ASP.NET metrics we want to add and make a big spec" or "lets make sure we are using uniform naming conventions across all .NET library telemetry", or something else?

In terms of feedback and some of the questions above about what data to return, I think that depends a lot on what the customer use-cases are. Right now it sounded like customers said they wanted to see certain data so we have a feature to capture that data. If the customer requests weren't specific about what data to capture either we need to ask for clarification, or, usually more helpful, we need to get a good understanding of what they are trying to accomplish so that we can propose what data is useful for that task. For example if devs want to understand why URLs aren't going to their intended routes then they probably are going to want to see the complete unmatched URL somewhere. If they want to understand which endpoints to spend their development effort improving, then they probably want whatever data makes it easy to identify specific endpoints.

@davidfowl
Copy link
Member

Is this uniformity like "lets figure out all the ASP.NET metrics we want to add and make a big spec" or "lets make sure we are using uniform naming conventions across all .NET library telemetry", or something else?

All of the above. Once the pattern lands in a single component, it'll be copied to other components. Let's make sure we answer all of the hard questions with this first one, so that the next 10 we add are easy 😄.

@tekian
Copy link

tekian commented Feb 1, 2023

If you want hard question, I'm puzzled to define our strategy for metering with respect to third-party efforts like OpenTelemetry that had started subscribing to our diagnostic source events & producing metrics on their own here

@davidfowl
Copy link
Member

Time machines, priorities and timing:

  • We initially only had the logging APIs
  • Then we added diagnostic source events so that external telemetry collection system could get data without needing to hook everywhere in the framework
  • Then we added activities and event counters and emitted them from the framework and libraries
  • Then we worked with OTel to figure out how to fit everything together
  • Then the metrics API were inveted (while working with Otel)

It's easy to see why we're in the current state we're in if you know the history.

@tekian
Copy link

tekian commented Feb 1, 2023

Granted, and I respect that evolution. So at this moment we have meter support and we could produce such metrics out of the box. And I think we should. I mean it's practical that at the moment OpenTelemetry has our requestIn/requestOut story covered, but that's for largely historical reasons.

@davidfowl
Copy link
Member

davidfowl commented Feb 1, 2023

Yes, nobody is arguing that we shouldn't produce metrics out of the box. In fact, there's a big issue about it (and here). I'm just saying that the first place we add them takes on the responsibility of understanding how it's gonna work all over asp.net core.

@tekian
Copy link

tekian commented Feb 1, 2023

Ok, good. I thought there might not be consensus on this yet. In which case looking at what is being done exactly in OpenTelemetry+Prometheus is a good start. AspNetCore can and should emit more metrics of course but the alfa & omega metric is a metric for incoming requests.

@dpk83
Copy link

dpk83 commented Feb 1, 2023

@davidfowl @reyang I know this request is only for tracking the request hit count, but I am expecting once this is available there will be requests to also capture the latency and success/failure of the request. If that's the case then should we consider having the AspNetCore metering auto instrumentation directly in .NET instead of having 2 solutions (one in .NET and one in OpenTelemetry .NET)?

@Tratcher
Copy link
Member Author

Tratcher commented Feb 1, 2023

@tekian
Copy link

tekian commented Feb 1, 2023

For reference here are couple of AspNetCore metrics from Prometheus. In addition to that one major incoming request duration metric, Prometheus has an inflight metric which is useful to detect problems of hanged requests.

@reyang
Copy link

reyang commented Feb 1, 2023

@davidfowl @reyang I know this request is only for tracking the request hit count, but I am expecting once this is available there will be requests to also capture the latency and success/failure of the request. If that's the case then should we consider having the AspNetCore metering auto instrumentation directly in .NET instead of having 2 solutions (one in .NET and one in OpenTelemetry .NET)?

The end goal is to have AspNetCore well instrumented, so users can turn on certain telemetry if they want to pay the corresponding cost (and those who don't need such telemetry don't have to pay for the cost). OpenTelemetry instrumentation for AspNetCore might still exist in order to support older versions of AspNetCore which does not have the built-in instrumentation + is still under Microsoft's officially support (eventually it'll disappear). This is actually the same across all programming languages and libraries, open-telemetry/opentelemetry-dotnet-contrib#788 is one example.

@dpk83
Copy link

dpk83 commented Feb 1, 2023

The end goal is to have AspNetCore well instrumented, so users can turn on certain telemetry if they want to pay the corresponding cost (and those who don't need such telemetry don't have to pay for the cost). OpenTelemetry instrumentation for AspNetCore might still exist in order to support older versions of AspNetCore which does not have the built-in instrumentation + is still under Microsoft's officially support (eventually it'll disappear). This is actually the same across all programming languages and libraries, open-telemetry/opentelemetry-dotnet-contrib#788 is one example.

Agree, and the final convergence point should be considered when designing the solution in .NET.

@ghost
Copy link

ghost commented Mar 15, 2023

Thanks for contacting us.

We're moving this issue to the .NET 8 Planning milestone for future evaluation / consideration. We would like to keep this around to collect more feedback, which can help us with prioritizing this work. We will re-evaluate this issue, during our next planning meeting(s).
If we later determine, that the issue has no community involvement, or it's very rare and low-impact issue, we will close it - so that the team can focus on more important and high impact issues.
To learn more about what to expect next and how this issue will be handled you can read more about our triage process here.

@zehonghuang
Copy link

Meter/Counters Not Thread-safe. Don't use this.

@JamesNK JamesNK modified the milestones: .NET 8 Planning, 8.0-preview4 May 17, 2023
@JamesNK JamesNK closed this as completed May 17, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jun 17, 2023
@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Aug 25, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions
Projects
None yet
Development

No branches or pull requests