Expose Tokio runtime metrics #2968

Open
divergentdave opened this issue Apr 5, 2024 · 0 comments

We need visibility into how long it takes to poll futures, since a slow poll ties up a worker thread and can delay other tasks across the entire process. I think it would make sense to enable Tokio's unstable APIs with the --cfg tokio_unstable flag for this purpose, so long as we can still turn the flag off again and build without the enhanced observability if we need to work around breaking changes. The relevant data would come from the RuntimeMetrics::poll_count_histogram_bucket_count() method, though the API requires some additional post-processing.
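
To make the post-processing concrete, here is a minimal sketch of one likely step: merging per-worker bucket counts into a single process-wide histogram, then diffing two snapshots. It assumes the counts are reported per worker and per bucket as running totals. The nested `Vec` input and the names `merge_workers` and `delta_since` are hypothetical stand-ins, not part of Tokio's API.

```rust
/// Sum per-worker bucket counts into one process-wide histogram.
/// `per_worker[w][b]` is a hypothetical stand-in for the count that the
/// runtime reports for worker `w` and bucket `b`.
fn merge_workers(per_worker: &[Vec<u64>]) -> Vec<u64> {
    let num_buckets = per_worker.iter().map(Vec::len).max().unwrap_or(0);
    let mut merged = vec![0u64; num_buckets];
    for worker in per_worker {
        for (bucket, count) in worker.iter().enumerate() {
            merged[bucket] += *count;
        }
    }
    merged
}

/// Polls recorded per bucket since the previous snapshot, assuming the
/// counts are running totals. Buckets missing from `previous` count as zero.
fn delta_since(previous: &[u64], current: &[u64]) -> Vec<u64> {
    current
        .iter()
        .enumerate()
        .map(|(i, &now)| now.saturating_sub(previous.get(i).copied().unwrap_or(0)))
        .collect()
}
```

A collector could call `merge_workers` on every scrape and keep the previous result around if the backend expects deltas rather than cumulative totals.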

I found a couple of smaller projects that pipe counters from Tokio's RuntimeMetrics to the prometheus crate, but none that expose the poll time histogram and none that integrate with OTel. There is a first-party tokio-metrics crate that provides a nicer frontend with its own RuntimeMetrics type, taking care of the necessary post-processing (plus another API that instruments individual futures). However, it doesn't integrate with any observability libraries, and its examples only print metrics to standard output.

The fundamental difficulty in exposing the poll time histogram through traditional metrics APIs is that Tokio's runtime has already partially aggregated the histogram. Those APIs record one observation at a time, so neither prometheus::Histogram::observe() nor opentelemetry::metrics::Histogram::record() can ingest bucket counts directly. There is another way to introduce already-aggregated data: the MetricProducer trait from the OTel SDK. This trait is already implemented by a private SDK type that collects data from SDK instruments. Readers such as ManualReader and the opentelemetry_prometheus exporter allow adding external producers during construction, and they combine metrics from the SDK producer and the external producers. Thus, we can write an implementation of this trait that bridges in Tokio's runtime metrics and add it as an external producer during initialization, and the results should then be visible in Prometheus. In addition to the data, a MetricProducer must also provide OTel scope information (i.e. name) and the names, descriptions, and units of its metrics. This would fit well into the OTel concept of an "instrumentation library".
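
As a rough illustration of the data such a bridge would hand over, the sketch below reshapes merged bucket counts into an explicit-bucket histogram data point and derives the cumulative counts that a Prometheus-style exposition expects. The `HistogramPoint` struct and both functions are hypothetical and use only the standard library; they are not the OTel SDK's own types, and a real MetricProducer implementation would fill the SDK's data structures instead. The sketch also assumes each bucket is described by the end of its duration range and that the last bucket is treated as unbounded.

```rust
use std::time::Duration;

/// Hypothetical stand-in for a pre-aggregated explicit-bucket histogram point.
#[derive(Debug, PartialEq)]
struct HistogramPoint {
    /// Upper bounds in seconds for every bucket except the last (unbounded) one.
    bounds: Vec<f64>,
    /// One count per bucket, so `bucket_counts.len() == bounds.len() + 1`.
    bucket_counts: Vec<u64>,
    /// Total number of polls across all buckets.
    count: u64,
}

/// Build a data point from the end of each bucket's range and its merged count.
/// Returns `None` if the inputs are empty or their lengths disagree.
fn to_histogram_point(bucket_ends: &[Duration], counts: &[u64]) -> Option<HistogramPoint> {
    if counts.is_empty() || bucket_ends.len() != counts.len() {
        return None;
    }
    // Drop the final bucket's end, since that bucket is treated as unbounded.
    let bounds = bucket_ends[..bucket_ends.len() - 1]
        .iter()
        .map(Duration::as_secs_f64)
        .collect();
    Some(HistogramPoint {
        bounds,
        bucket_counts: counts.to_vec(),
        count: counts.iter().sum(),
    })
}

/// Running totals per bucket, the shape used by cumulative `le` buckets.
fn cumulative_counts(point: &HistogramPoint) -> Vec<u64> {
    point
        .bucket_counts
        .iter()
        .scan(0u64, |running, &c| {
            *running += c;
            Some(*running)
        })
        .collect()
}
```

Note that bucket counts alone do not give an exact sum of poll durations, so any sum attached to such a data point would have to be an estimate or be left out where the target format allows it.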

Related to #2955.
