We need visibility into how long it takes to poll futures, since a slow poll blocks its worker thread and can cause issues across the entire process. I think it would make sense to enable Tokio's unstable --cfg flag for this purpose, so long as we can still turn the flag off again and build without the enhanced observability if we need to address breaking changes. The relevant data would come from the RuntimeMetrics::poll_count_histogram_bucket_count() method, though the API requires some additional post-processing.
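As a rough sketch of what that post-processing could look like: I'm assuming here that the bucket counts are cumulative totals reported per worker and per bucket, so they need to be summed across workers and, if we want per-interval numbers, diffed against the previous snapshot. The function names and the `per_worker[w][b]` layout below are hypothetical. In real code the values would be read from the runtime via poll_count_histogram_bucket_count(), and this is not how Tokio or tokio-metrics implement it.

```rust
/// Sum per-worker bucket counts into one process-wide histogram.
///
/// `per_worker[w][b]` is assumed to hold the count for worker `w` and
/// bucket `b` (hypothetical layout; the real values would be read from
/// Tokio's runtime metrics).
pub fn sum_across_workers(per_worker: &[Vec<u64>]) -> Vec<u64> {
    // Use the longest row so a short row cannot truncate the result.
    let num_buckets = per_worker.iter().map(Vec::len).max().unwrap_or(0);
    let mut totals = vec![0u64; num_buckets];
    for worker in per_worker {
        for (total, count) in totals.iter_mut().zip(worker) {
            *total += *count;
        }
    }
    totals
}

/// Polls recorded in each bucket since the previous snapshot.
///
/// Assumes the counters only ever grow; `saturating_sub` keeps the result
/// at zero instead of panicking if that assumption is ever violated.
pub fn delta_since(previous: &[u64], current: &[u64]) -> Vec<u64> {
    current
        .iter()
        .enumerate()
        .map(|(i, &now)| now.saturating_sub(previous.get(i).copied().unwrap_or(0)))
        .collect()
}
```

Keeping these as pure functions over plain slices means they can be unit tested without a runtime, and only the code that reads from Tokio has to be gated behind the unstable flag.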
I found a couple of smaller projects that pipe counters from Tokio's RuntimeMetrics to the prometheus crate, but nothing that exposes the poll time histogram, and nothing that integrates with OTel. There is a first-party tokio-metrics crate that provides a nicer frontend with its own RuntimeMetrics, taking care of the necessary post-processing (plus another API that instruments individual futures). However, it doesn't integrate with any observability libraries, and its examples only print metrics to standard output.
The fundamental difficulty in exposing the poll time histogram to traditional metrics APIs is that Tokio's runtime is already partially pre-aggregating the histogram: it reports counts per bucket rather than individual poll durations. Thus, prometheus::Histogram::observe() or opentelemetry::metrics::Histogram::record(), which each accept one raw observation at a time, would be insufficient.
There is another way we can introduce already-aggregated data, via the MetricProducer trait from the OTel SDK. This trait is already implemented by a private SDK type to collect data from SDK instruments. Readers such as ManualReader and the opentelemetry_prometheus exporter allow adding external producers during construction, and they will combine metrics from both the SDK producer and the external producers. Thus, we can write an implementation of this trait that bridges in Tokio's runtime metrics, add it as an external producer during initialization, and we should be able to see the results in Prometheus. In addition to the data, a MetricProducer must also provide OTel scope information (i.e. name) as well as the names, descriptions, and units of the metrics. This would fit well into the OTel concept of an "instrumentation library".
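To make the shape of the bridged data concrete, here is a minimal sketch of turning bucket ranges and counts into an explicit-bounds histogram point. HistogramPoint is a hypothetical stand-in that only loosely mirrors what an OTel histogram data point carries; it is not the SDK's type, and a real MetricProducer implementation would have to build the SDK's own data structures instead. I'm also assuming the last bucket is open-ended, so its upper bound is dropped.

```rust
use std::ops::Range;
use std::time::Duration;

/// Hypothetical stand-in for a pre-aggregated histogram data point.
#[derive(Debug, PartialEq)]
pub struct HistogramPoint {
    /// Upper bounds in seconds; the final bucket is implicitly unbounded.
    pub bounds: Vec<f64>,
    /// One count per bucket, so `bounds.len() + 1` entries.
    pub bucket_counts: Vec<u64>,
    /// Total number of polls across all buckets.
    pub count: u64,
}

/// Build a point from bucket ranges and their already-aggregated counts.
/// Returns `None` if the inputs are empty or their lengths disagree.
pub fn to_histogram_point(
    ranges: &[Range<Duration>],
    counts: &[u64],
) -> Option<HistogramPoint> {
    if ranges.is_empty() || ranges.len() != counts.len() {
        return None;
    }
    // Every bucket except the last contributes its upper bound.
    let bounds = ranges[..ranges.len() - 1]
        .iter()
        .map(|r| r.end.as_secs_f64())
        .collect();
    Some(HistogramPoint {
        bounds,
        bucket_counts: counts.to_vec(),
        count: counts.iter().sum(),
    })
}

/// Prometheus-style cumulative counts ("le" buckets) from per-bucket counts.
pub fn to_cumulative(counts: &[u64]) -> Vec<u64> {
    counts
        .iter()
        .scan(0u64, |acc, &c| {
            *acc += c;
            Some(*acc)
        })
        .collect()
}
```

Note what is missing: only bucket counts are available, so the exact sum of poll durations cannot be recovered from this data. That is part of why replaying the buckets through observe() or record() would at best be an approximation.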
Related to #2955.