crossplane · jbw976 · Dec 2, 2025 · Nov 26, 2025 · haarchri · Nov 26, 2025
diff --git a/content/master/guides/metrics.md b/content/master/guides/metrics.md
@@ -1,7 +1,7 @@
 ---
 title: Metrics
 weight: 60
-description: "Monitor Crossplane operations with metrics"
+description: "Track Crossplane operations with metrics"
 ---
 
 Crossplane produces [Prometheus style metrics](https://prometheus.io/docs/introduction/overview/#what-are-metrics) for effective monitoring and alerting in your environment.
@@ -23,39 +23,91 @@ prometheus.io/port: "8080"
 prometheus.io/scrape: "true"
 ```    
 
+## Crossplane core metrics
+
+The Crossplane pod emits these metrics.
+
+{{< table "table table-hover table-striped table-sm">}}
+| Metric Name | Description |
+| --- | --- |
+| {{<hover label="function_run_function_request_total" line="1">}}function_run_function_request_total{{</hover>}} | Total number of RunFunctionRequests sent |
+| {{<hover label="function_run_function_response_total" line="2">}}function_run_function_response_total{{</hover>}} | Total number of RunFunctionResponses received |
+| {{<hover label="function_run_function_seconds" line="3">}}function_run_function_seconds{{</hover>}} | Histogram of RunFunctionResponse latency (seconds) |
+| {{<hover label="function_run_function_response_cache_hits_total" line="4">}}function_run_function_response_cache_hits_total{{</hover>}} | Total number of RunFunctionResponse cache hits |
+| {{<hover label="function_run_function_response_cache_misses_total" line="5">}}function_run_function_response_cache_misses_total{{</hover>}} | Total number of RunFunctionResponse cache misses |
+| {{<hover label="function_run_function_response_cache_errors_total" line="6">}}function_run_function_response_cache_errors_total{{</hover>}} | Total number of RunFunctionResponse cache errors |
+| {{<hover label="function_run_function_response_cache_writes_total" line="7">}}function_run_function_response_cache_writes_total{{</hover>}} | Total number of RunFunctionResponse cache writes |
+| {{<hover label="function_run_function_response_cache_deletes_total" line="8">}}function_run_function_response_cache_deletes_total{{</hover>}} | Total number of RunFunctionResponse cache deletes |
+| {{<hover label="function_run_function_response_cache_bytes_written_total" line="9">}}function_run_function_response_cache_bytes_written_total{{</hover>}} | Total number of RunFunctionResponse bytes written to cache |
+| {{<hover label="function_run_function_response_cache_bytes_deleted_total" line="10">}}function_run_function_response_cache_bytes_deleted_total{{</hover>}} | Total number of RunFunctionResponse bytes deleted from cache |
+| {{<hover label="function_run_function_response_cache_read_seconds" line="11">}}function_run_function_response_cache_read_seconds{{</hover>}} | Histogram of cache read latency (seconds) |
+| {{<hover label="function_run_function_response_cache_write_seconds" line="12">}}function_run_function_response_cache_write_seconds{{</hover>}} | Histogram of cache write latency (seconds) |
+| {{<hover label="circuit_breaker_opens_total" line="13">}}circuit_breaker_opens_total{{</hover>}} | Number of times the XR circuit breaker transitioned from closed to open |
+| {{<hover label="circuit_breaker_closes_total" line="14">}}circuit_breaker_closes_total{{</hover>}} | Number of times the XR circuit breaker transitioned from open to closed |
+| {{<hover label="circuit_breaker_events_total" line="15">}}circuit_breaker_events_total{{</hover>}} | Number of XR watch events handled by the circuit breaker, labeled by outcome (`Allowed`, `HalfOpenAllowed`, `Dropped`). Deletion events bypass the breaker. |
+| {{<hover label="engine_controllers_started_total" line="16">}}engine_controllers_started_total{{</hover>}} | Total number of controllers started |
+| {{<hover label="engine_controllers_stopped_total" line="17">}}engine_controllers_stopped_total{{</hover>}} | Total number of controllers stopped |
+| {{<hover label="engine_watches_started_total" line="18">}}engine_watches_started_total{{</hover>}} | Total number of watches started |
+| {{<hover label="engine_watches_stopped_total" line="19">}}engine_watches_stopped_total{{</hover>}} | Total number of watches stopped |
+{{</table >}}
+
+## Provider metrics
+
+All providers built with crossplane-runtime emit these `crossplane_managed_resource_*` metrics.
+
+Providers expose metrics on the `metrics` port (default `8080`). To scrape these metrics, configure a `PodMonitor` or add Prometheus annotations to the provider's `DeploymentRuntimeConfig`.
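+
+For example, a `DeploymentRuntimeConfig` like the following sketch adds scrape annotations to the provider pod template. The name `enable-metrics` is hypothetical. The example assumes your Prometheus discovers pods through the `prometheus.io/*` annotations and that the provider serves metrics at `/metrics` on the default port `8080`.
+
+```yaml
+apiVersion: pkg.crossplane.io/v1beta1
+kind: DeploymentRuntimeConfig
+metadata:
+  # Hypothetical name; reference it from the Provider's runtimeConfigRef.
+  name: enable-metrics
+spec:
+  deploymentTemplate:
+    spec:
+      # Required by the Deployment schema; left empty here.
+      selector: {}
+      template:
+        metadata:
+          annotations:
+            # Assumes annotation-based pod discovery in Prometheus.
+            prometheus.io/path: /metrics
+            prometheus.io/port: "8080"
+            prometheus.io/scrape: "true"
+```
+
+To apply it, set `spec.runtimeConfigRef.name: enable-metrics` on the `Provider`.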
+
+{{< table "table table-hover table-striped table-sm">}}
+| Metric Name | Description |
+| --- | --- |
+| {{<hover label="crossplane_managed_resource_exists" line="1">}}crossplane_managed_resource_exists{{</hover>}} | The number of managed resources that exist |
+| {{<hover label="crossplane_managed_resource_ready" line="2">}}crossplane_managed_resource_ready{{</hover>}} | The number of managed resources in `Ready=True` state |
+| {{<hover label="crossplane_managed_resource_synced" line="3">}}crossplane_managed_resource_synced{{</hover>}} | The number of managed resources in `Synced=True` state |
+| {{<hover label="crossplane_managed_resource_deletion_seconds" line="4">}}crossplane_managed_resource_deletion_seconds{{</hover>}} | Histogram of the time it took to delete a managed resource (seconds) |
+| {{<hover label="crossplane_managed_resource_first_time_to_readiness_seconds" line="5">}}crossplane_managed_resource_first_time_to_readiness_seconds{{</hover>}} | Histogram of the time it took a managed resource to become ready for the first time after creation (seconds) |
+| {{<hover label="crossplane_managed_resource_first_time_to_reconcile_seconds" line="6">}}crossplane_managed_resource_first_time_to_reconcile_seconds{{</hover>}} | Histogram of the time it took the controller to first detect a managed resource (seconds) |
+| {{<hover label="crossplane_managed_resource_drift_seconds" line="7">}}crossplane_managed_resource_drift_seconds{{</hover>}} | Time elapsed since the last successful reconcile when the controller detects an out-of-sync resource (seconds) |
+{{</table >}}
+
+## Upjet provider metrics
+
+Only Upjet-based providers, such as [provider-upjet-aws](https://github.com/crossplane-contrib/provider-upjet-aws), [provider-upjet-azure](https://github.com/crossplane-contrib/provider-upjet-azure), and [provider-upjet-gcp](https://github.com/crossplane-contrib/provider-upjet-gcp), emit these metrics.
+
+{{< table "table table-hover table-striped table-sm">}}
+| Metric Name | Description |
+| --- | --- |
+| {{<hover label="upjet_resource_ext_api_duration" line="1">}}upjet_resource_ext_api_duration{{</hover>}} | Histogram of the time a cloud SDK call takes to complete (seconds) |
+| {{<hover label="upjet_resource_external_api_calls_total" line="2">}}upjet_resource_external_api_calls_total{{</hover>}} | The number of external API calls to cloud providers, with labels describing the endpoints and resources |
+| {{<hover label="upjet_resource_reconcile_delay_seconds" line="3">}}upjet_resource_reconcile_delay_seconds{{</hover>}} | Histogram of how far reconciles for a resource lag behind the configured poll period (seconds) |
+| {{<hover label="upjet_resource_ttr" line="4">}}upjet_resource_ttr{{</hover>}} | Histogram of the time-to-readiness (TTR) for managed resources (seconds) |
+| {{<hover label="upjet_resource_cli_duration" line="5">}}upjet_resource_cli_duration{{</hover>}} | Measures in seconds how long it takes a Terraform CLI invocation to complete |
+| {{<hover label="upjet_resource_active_cli_invocations" line="6">}}upjet_resource_active_cli_invocations{{</hover>}} | The number of active (running) Terraform CLI invocations |
+| {{<hover label="upjet_resource_running_processes" line="7">}}upjet_resource_running_processes{{</hover>}} | The number of running Terraform CLI and Terraform provider processes |
+{{</table >}}
+
+## Controller-runtime and Kubernetes client metrics
+
+These metrics come from the controller-runtime framework and Kubernetes client libraries. Both Crossplane and providers emit these metrics.
+
 {{< table "table table-hover table-striped table-sm">}}
-| Metric Name | Description | Further Explanation |
-| --- | --- | --- |
-| {{<hover label="certwatcher_read_certificate_errors_total" line="1">}}certwatcher_read_certificate_errors_total{{</hover>}} | Total number of certificate read errors |  |
-| {{<hover label="certwatcher_read_certificate_total" line="2">}}certwatcher_read_certificate_total{{</hover>}} | Total number of certificate reads |  |
-| {{<hover label="composition_run_function_seconds_bucket" line="3">}}composition_run_function_seconds_bucket{{</hover>}} | Histogram of RunFunctionResponse latency (seconds) |  |
-| {{<hover label="controller_runtime_active_workers" line="4">}}controller_runtime_active_workers{{</hover>}} | Number of used workers per controller | The number of threads processing jobs from the work queue. |
-| {{<hover label="controller_runtime_max_concurrent_reconciles" line="5">}}controller_runtime_max_concurrent_reconciles{{</hover>}} | Maximum number of concurrent reconciles per controller | Describes how reconciles can happen in parallel. |
-| {{<hover label="controller_runtime_reconcile_errors_total" line="6">}}controller_runtime_reconcile_errors_total{{</hover>}} | Total number of reconciliation errors per controller | A counter that counts reconcile errors. Sharp or non stop rising of this metric might be a problem. |
-| {{<hover label="controller_runtime_reconcile_time_seconds_bucket" line="7">}}controller_runtime_reconcile_time_seconds_bucket{{</hover>}} | Length of time per reconciliation per controller |  |
-| {{<hover label="controller_runtime_reconcile_total" line="8">}}controller_runtime_reconcile_total{{</hover>}} | Total number of reconciliations per controller |  |
-| {{<hover label="controller_runtime_webhook_latency_seconds_bucket" line="9">}}controller_runtime_webhook_latency_seconds_bucket{{</hover>}} | Histogram of the latency of processing admission requests |  |
-| {{<hover label="controller_runtime_webhook_requests_in_flight" line="10">}}controller_runtime_webhook_requests_in_flight{{</hover>}} | Current number of admission requests served |  |
-| {{<hover label="controller_runtime_webhook_requests_total" line="11">}}controller_runtime_webhook_requests_total{{</hover>}} | Total number of admission requests by HTTP status code |  |
-| {{<hover label="rest_client_requests_total" line="12">}}rest_client_requests_total{{</hover>}} | Number of HTTP requests, partitioned by status code, method, and host |  |
-| {{<hover label="workqueue_adds_total" line="13">}}workqueue_adds_total{{</hover>}} | Total number of adds handled by `workqueue` |  |
-| {{<hover label="workqueue_depth" line="14">}}workqueue_depth{{</hover>}} | Current depth of `workqueue` |  |
-| {{<hover label="workqueue_longest_running_processor_seconds" line="15">}}workqueue_longest_running_processor_seconds{{</hover>}} | The number of seconds has the longest running processor for `workqueue` been running |  |
-| {{<hover label="workqueue_queue_duration_seconds_bucket" line="16">}}workqueue_queue_duration_seconds_bucket{{</hover>}} | How long in seconds an item stays in `workqueue` before requested | The time it takes from the moment a job enter the `workqueue` until the processing of this job starts. |
-| {{<hover label="workqueue_retries_total" line="17">}}workqueue_retries_total{{</hover>}} | Total number of retries handled by `workqueue` |  |
-| {{<hover label="workqueue_unfinished_work_seconds" line="18">}}workqueue_unfinished_work_seconds{{</hover>}} | The number of seconds of work done that's in progress and hasn't observed by `work_duration`. Large values means stuck threads. |  |
-| {{<hover label="workqueue_work_duration_seconds_bucket" line="19">}}workqueue_work_duration_seconds_bucket{{</hover>}} | How long in seconds processing an item from `workqueue` takes | The time it takes from the moment the job start until it finish (either successfully or with an error). |
-| {{<hover label="crossplane_managed_resource_exists" line="20">}}crossplane_managed_resource_exists{{</hover>}} | The number of managed resources that exist |  |
-| {{<hover label="crossplane_managed_resource_ready" line="21">}}crossplane_managed_resource_ready{{</hover>}} | The number of managed resources in `Ready=True` state |  |
-| {{<hover label="crossplane_managed_resource_synced" line="22">}}crossplane_managed_resource_synced{{</hover>}} | The number of managed resources in `Synced=True` state |  |
-| {{<hover label="upjet_resource_ext_api_duration_bucket" line="23">}}upjet_resource_ext_api_duration_bucket{{</hover>}} | Measures in seconds how long it takes a Cloud SDK call to complete |  |
-| {{<hover label="upjet_resource_external_api_calls_total" line="24">}}upjet_resource_external_api_calls_total{{</hover>}} | The number of external API calls | The number of calls to cloud providers, with labels describing the endpoints resources. |
-| {{<hover label="upjet_resource_reconcile_delay_seconds_bucket" line="25">}}upjet_resource_reconcile_delay_seconds_bucket{{</hover>}} | Measures in seconds how long the reconciles for a resource delay from the configured poll periods |  |
-| {{<hover label="crossplane_managed_resource_deletion_seconds_bucket" line="26">}}crossplane_managed_resource_deletion_seconds_bucket{{</hover>}} | The time it took to delete a managed resource |  |
-| {{<hover label="crossplane_managed_resource_first_time_to_readiness_seconds_bucket" line="27">}}crossplane_managed_resource_first_time_to_readiness_seconds_bucket{{</hover>}} | The time it took for a managed resource to become ready first time after creation |  |
-| {{<hover label="crossplane_managed_resource_first_time_to_reconcile_seconds_bucket" line="28">}}crossplane_managed_resource_first_time_to_reconcile_seconds_bucket{{</hover>}} | The time it took to detect a managed resource by the controller |  |
-| {{<hover label="upjet_resource_ttr_bucket" line="29">}}upjet_resource_ttr_bucket{{</hover>}} | Measures in seconds the `time-to-readiness` `(TTR)` for managed resources |  |
-| {{<hover label="circuit_breaker_opens_total" line="30">}}circuit_breaker_opens_total{{</hover>}} | Total number of times the XR watch circuit breaker opened |  |
-| {{<hover label="circuit_breaker_closes_total" line="31">}}circuit_breaker_closes_total{{</hover>}} | Total number of times the XR watch circuit breaker closed again |  |
-| {{<hover label="circuit_breaker_events_total" line="32">}}circuit_breaker_events_total{{</hover>}} | Total number of watched events handled by the XR circuit breaker | Labeled by outcome (`Allowed`, `HalfOpenAllowed`, `Dropped`); deletion events skip the breaker. |
+| Metric Name | Description |
+| --- | --- |
+| {{<hover label="certwatcher_read_certificate_errors_total" line="1">}}certwatcher_read_certificate_errors_total{{</hover>}} | Total number of certificate read errors |
+| {{<hover label="certwatcher_read_certificate_total" line="2">}}certwatcher_read_certificate_total{{</hover>}} | Total number of certificate reads |
+| {{<hover label="controller_runtime_active_workers" line="3">}}controller_runtime_active_workers{{</hover>}} | Number of workers (threads processing jobs from the work queue) per controller |
+| {{<hover label="controller_runtime_max_concurrent_reconciles" line="4">}}controller_runtime_max_concurrent_reconciles{{</hover>}} | Maximum number of concurrent reconciles per controller |
+| {{<hover label="controller_runtime_reconcile_errors_total" line="5">}}controller_runtime_reconcile_errors_total{{</hover>}} | Total number of reconciliation errors per controller. A sharp or continuous rise in this metric can indicate a problem. |
+| {{<hover label="controller_runtime_reconcile_time_seconds" line="6">}}controller_runtime_reconcile_time_seconds{{</hover>}} | Histogram of time per reconciliation per controller |
+| {{<hover label="controller_runtime_reconcile_total" line="7">}}controller_runtime_reconcile_total{{</hover>}} | Total number of reconciliations per controller |
+| {{<hover label="controller_runtime_webhook_latency_seconds" line="8">}}controller_runtime_webhook_latency_seconds{{</hover>}} | Histogram of the latency of processing admission requests |
+| {{<hover label="controller_runtime_webhook_requests_in_flight" line="9">}}controller_runtime_webhook_requests_in_flight{{</hover>}} | Current number of admission requests being served |
+| {{<hover label="controller_runtime_webhook_requests_total" line="10">}}controller_runtime_webhook_requests_total{{</hover>}} | Total number of admission requests by HTTP status code |
+| {{<hover label="rest_client_requests_total" line="11">}}rest_client_requests_total{{</hover>}} | Number of HTTP requests, partitioned by status code, method, and host |
+| {{<hover label="workqueue_adds_total" line="12">}}workqueue_adds_total{{</hover>}} | Total number of adds handled by `workqueue` |
+| {{<hover label="workqueue_depth" line="13">}}workqueue_depth{{</hover>}} | Current depth of `workqueue` |
+| {{<hover label="workqueue_longest_running_processor_seconds" line="14">}}workqueue_longest_running_processor_seconds{{</hover>}} | How long, in seconds, the longest-running processor for `workqueue` has been running |
+| {{<hover label="workqueue_queue_duration_seconds" line="15">}}workqueue_queue_duration_seconds{{</hover>}} | Histogram of time an item stays in `workqueue` before processing starts |
+| {{<hover label="workqueue_retries_total" line="16">}}workqueue_retries_total{{</hover>}} | Total number of retries handled by `workqueue` |
+| {{<hover label="workqueue_unfinished_work_seconds" line="17">}}workqueue_unfinished_work_seconds{{</hover>}} | Seconds of work in progress not yet observed by `work_duration`. Large values suggest stuck threads. |
+| {{<hover label="workqueue_work_duration_seconds" line="18">}}workqueue_work_duration_seconds{{</hover>}} | Histogram of time to process an item from `workqueue` (from start to completion) |
 {{</table >}}
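+
+As a sketch of how to act on these metrics, the following `PrometheusRule` alerts when a controller keeps reporting reconcile errors or when reconciles slow down. It assumes the Prometheus Operator is installed. The resource name, namespace, alert names, and thresholds are hypothetical. Prometheus exposes each histogram as `_bucket`, `_sum`, and `_count` series, so the latency query reads the `_bucket` series.
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+  name: crossplane-controller-alerts  # hypothetical
+  namespace: crossplane-system
+spec:
+  groups:
+  - name: crossplane-controllers
+    rules:
+    # Fires when a controller's reconcile error counter keeps increasing for 15 minutes.
+    - alert: CrossplaneReconcileErrors
+      expr: sum by (controller) (rate(controller_runtime_reconcile_errors_total[5m])) > 0
+      for: 15m
+      labels:
+        severity: warning
+    # Fires when the 99th percentile reconcile time exceeds 10 seconds (hypothetical threshold).
+    - alert: CrossplaneSlowReconciles
+      expr: histogram_quantile(0.99, sum by (le, controller) (rate(controller_runtime_reconcile_time_seconds_bucket[5m]))) > 10
+      for: 15m
+      labels:
+        severity: warning
+```
+
+Depending on how your Prometheus Operator selects rules, the `PrometheusRule` may need extra labels before Prometheus loads it.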