diff --git a/content/master/guides/metrics.md b/content/master/guides/metrics.md
index 9af3a719f..7fa40c1b6 100644
--- a/content/master/guides/metrics.md
+++ b/content/master/guides/metrics.md
@@ -1,7 +1,7 @@
---
title: Metrics
weight: 60
-description: "Monitor Crossplane operations with metrics"
+description: "Track Crossplane operations with metrics"
---
Crossplane produces [Prometheus style metrics](https://prometheus.io/docs/introduction/overview/#what-are-metrics) for effective monitoring and alerting in your environment.
@@ -23,39 +23,91 @@ prometheus.io/port: "8080"
prometheus.io/scrape: "true"
```
+## Crossplane core metrics
+
+The Crossplane pod emits these metrics.
+
+{{< table "table table-hover table-striped table-sm">}}
+| Metric Name | Description |
+| --- | --- |
+| {{}}function_run_function_request_total{{}} | Total number of RunFunctionRequests sent |
+| {{}}function_run_function_response_total{{}} | Total number of RunFunctionResponses received |
+| {{}}function_run_function_seconds{{}} | Histogram of RunFunctionResponse latency (seconds) |
+| {{}}function_run_function_response_cache_hits_total{{}} | Total number of RunFunctionResponse cache hits |
+| {{}}function_run_function_response_cache_misses_total{{}} | Total number of RunFunctionResponse cache misses |
+| {{}}function_run_function_response_cache_errors_total{{}} | Total number of RunFunctionResponse cache errors |
+| {{}}function_run_function_response_cache_writes_total{{}} | Total number of RunFunctionResponse cache writes |
+| {{}}function_run_function_response_cache_deletes_total{{}} | Total number of RunFunctionResponse cache deletes |
+| {{}}function_run_function_response_cache_bytes_written_total{{}} | Total number of RunFunctionResponse bytes written to cache |
+| {{}}function_run_function_response_cache_bytes_deleted_total{{}} | Total number of RunFunctionResponse bytes deleted from cache |
+| {{}}function_run_function_response_cache_read_seconds{{}} | Histogram of cache read latency (seconds) |
+| {{}}function_run_function_response_cache_write_seconds{{}} | Histogram of cache write latency (seconds) |
+| {{}}circuit_breaker_opens_total{{}} | Number of times the XR circuit breaker transitioned from closed to open |
+| {{}}circuit_breaker_closes_total{{}} | Number of times the XR circuit breaker transitioned from open to closed |
+| {{}}circuit_breaker_events_total{{}} | Number of XR watch events handled by the circuit breaker, labeled by outcome |
+| {{}}engine_controllers_started_total{{}} | Total number of controllers started |
+| {{}}engine_controllers_stopped_total{{}} | Total number of controllers stopped |
+| {{}}engine_watches_started_total{{}} | Total number of watches started |
+| {{}}engine_watches_stopped_total{{}} | Total number of watches stopped |
+{{}}
+
+## Provider metrics
+
+Crossplane providers emit these metrics. All providers built with crossplane-runtime emit the `crossplane_managed_resource_*` metrics.
+
+Providers expose metrics on the `metrics` port (default `8080`). To scrape these metrics, configure a `PodMonitor` or add Prometheus annotations to the provider's `DeploymentRuntimeConfig`.
+
+{{< table "table table-hover table-striped table-sm">}}
+| Metric Name | Description |
+| --- | --- |
+| {{}}crossplane_managed_resource_exists{{}} | The number of managed resources that exist |
+| {{}}crossplane_managed_resource_ready{{}} | The number of managed resources in `Ready=True` state |
+| {{}}crossplane_managed_resource_synced{{}} | The number of managed resources in `Synced=True` state |
+| {{}}crossplane_managed_resource_deletion_seconds{{}} | The time it took to delete a managed resource |
+| {{}}crossplane_managed_resource_first_time_to_readiness_seconds{{}} | The time it took for a managed resource to become ready for the first time after creation |
+| {{}}crossplane_managed_resource_first_time_to_reconcile_seconds{{}} | The time it took for the controller to detect a managed resource |
+| {{}}crossplane_managed_resource_drift_seconds{{}} | Time elapsed after the last successful reconcile when detecting an out-of-sync resource |
+{{}}
+
+## Upjet provider metrics
+
+Only Upjet-based providers emit these metrics (such as [provider-upjet-aws](https://github.com/crossplane-contrib/provider-upjet-aws), [provider-upjet-azure](https://github.com/crossplane-contrib/provider-upjet-azure), and [provider-upjet-gcp](https://github.com/crossplane-contrib/provider-upjet-gcp)).
+
+{{< table "table table-hover table-striped table-sm">}}
+| Metric Name | Description |
+| --- | --- |
+| {{}}upjet_resource_ext_api_duration{{}} | Measures in seconds how long it takes a Cloud SDK call to complete |
+| {{}}upjet_resource_external_api_calls_total{{}} | The number of external API calls to cloud providers, with labels describing the endpoints and resources |
+| {{}}upjet_resource_reconcile_delay_seconds{{}} | Measures in seconds how long reconciles for a resource have been delayed beyond the configured poll period |
+| {{}}upjet_resource_ttr{{}} | Measures in seconds the time-to-readiness (TTR) for managed resources |
+| {{}}upjet_resource_cli_duration{{}} | Measures in seconds how long it takes a Terraform CLI invocation to complete |
+| {{}}upjet_resource_active_cli_invocations{{}} | The number of active (running) Terraform CLI invocations |
+| {{}}upjet_resource_running_processes{{}} | The number of running Terraform CLI and Terraform provider processes |
+{{}}
+
+## Controller-runtime and Kubernetes client metrics
+
+These metrics come from the controller-runtime framework and Kubernetes client libraries. Both Crossplane and providers emit these metrics.
+
{{< table "table table-hover table-striped table-sm">}}
-| Metric Name | Description | Further Explanation |
-| --- | --- | --- |
-| {{}}certwatcher_read_certificate_errors_total{{}} | Total number of certificate read errors | |
-| {{}}certwatcher_read_certificate_total{{}} | Total number of certificate reads | |
-| {{}}composition_run_function_seconds_bucket{{}} | Histogram of RunFunctionResponse latency (seconds) | |
-| {{}}controller_runtime_active_workers{{}} | Number of used workers per controller | The number of threads processing jobs from the work queue. |
-| {{}}controller_runtime_max_concurrent_reconciles{{}} | Maximum number of concurrent reconciles per controller | Describes how reconciles can happen in parallel. |
-| {{}}controller_runtime_reconcile_errors_total{{}} | Total number of reconciliation errors per controller | A counter that counts reconcile errors. Sharp or non stop rising of this metric might be a problem. |
-| {{}}controller_runtime_reconcile_time_seconds_bucket{{}} | Length of time per reconciliation per controller | |
-| {{}}controller_runtime_reconcile_total{{}} | Total number of reconciliations per controller | |
-| {{}}controller_runtime_webhook_latency_seconds_bucket{{}} | Histogram of the latency of processing admission requests | |
-| {{}}controller_runtime_webhook_requests_in_flight{{}} | Current number of admission requests served | |
-| {{}}controller_runtime_webhook_requests_total{{}} | Total number of admission requests by HTTP status code | |
-| {{}}rest_client_requests_total{{}} | Number of HTTP requests, partitioned by status code, method, and host | |
-| {{}}workqueue_adds_total{{}} | Total number of adds handled by `workqueue` | |
-| {{}}workqueue_depth{{}} | Current depth of `workqueue` | |
-| {{}}workqueue_longest_running_processor_seconds{{}} | The number of seconds has the longest running processor for `workqueue` been running | |
-| {{}}workqueue_queue_duration_seconds_bucket{{}} | How long in seconds an item stays in `workqueue` before requested | The time it takes from the moment a job enter the `workqueue` until the processing of this job starts. |
-| {{}}workqueue_retries_total{{}} | Total number of retries handled by `workqueue` | |
-| {{}}workqueue_unfinished_work_seconds{{}} | The number of seconds of work done that's in progress and hasn't observed by `work_duration`. Large values means stuck threads. | |
-| {{}}workqueue_work_duration_seconds_bucket{{}} | How long in seconds processing an item from `workqueue` takes | The time it takes from the moment the job start until it finish (either successfully or with an error). |
-| {{}}crossplane_managed_resource_exists{{}} | The number of managed resources that exist | |
-| {{}}crossplane_managed_resource_ready{{}} | The number of managed resources in `Ready=True` state | |
-| {{}}crossplane_managed_resource_synced{{}} | The number of managed resources in `Synced=True` state | |
-| {{}}upjet_resource_ext_api_duration_bucket{{}} | Measures in seconds how long it takes a Cloud SDK call to complete | |
-| {{}}upjet_resource_external_api_calls_total{{}} | The number of external API calls | The number of calls to cloud providers, with labels describing the endpoints resources. |
-| {{}}upjet_resource_reconcile_delay_seconds_bucket{{}} | Measures in seconds how long the reconciles for a resource delay from the configured poll periods | |
-| {{}}crossplane_managed_resource_deletion_seconds_bucket{{}} | The time it took to delete a managed resource | |
-| {{}}crossplane_managed_resource_first_time_to_readiness_seconds_bucket{{}} | The time it took for a managed resource to become ready first time after creation | |
-| {{}}crossplane_managed_resource_first_time_to_reconcile_seconds_bucket{{}} | The time it took to detect a managed resource by the controller | |
-| {{}}upjet_resource_ttr_bucket{{}} | Measures in seconds the `time-to-readiness` `(TTR)` for managed resources | |
-| {{}}circuit_breaker_opens_total{{}} | Total number of times the XR watch circuit breaker opened | |
-| {{}}circuit_breaker_closes_total{{}} | Total number of times the XR watch circuit breaker closed again | |
-| {{}}circuit_breaker_events_total{{}} | Total number of watched events handled by the XR circuit breaker | Labeled by outcome (`Allowed`, `HalfOpenAllowed`, `Dropped`); deletion events skip the breaker. |
+| Metric Name | Description |
+| --- | --- |
+| {{}}certwatcher_read_certificate_errors_total{{}} | Total number of certificate read errors |
+| {{}}certwatcher_read_certificate_total{{}} | Total number of certificate reads |
+| {{}}controller_runtime_active_workers{{}} | Number of workers (threads processing jobs from the work queue) per controller |
+| {{}}controller_runtime_max_concurrent_reconciles{{}} | Maximum number of concurrent reconciles per controller |
+| {{}}controller_runtime_reconcile_errors_total{{}} | Total number of reconciliation errors per controller. A sharp or sustained rise in this metric may indicate a problem. |
+| {{}}controller_runtime_reconcile_time_seconds{{}} | Histogram of time per reconciliation per controller |
+| {{}}controller_runtime_reconcile_total{{}} | Total number of reconciliations per controller |
+| {{}}controller_runtime_webhook_latency_seconds{{}} | Histogram of the latency of processing admission requests |
+| {{}}controller_runtime_webhook_requests_in_flight{{}} | Current number of admission requests being served |
+| {{}}controller_runtime_webhook_requests_total{{}} | Total number of admission requests by HTTP status code |
+| {{}}rest_client_requests_total{{}} | Number of HTTP requests, partitioned by status code, method, and host |
+| {{}}workqueue_adds_total{{}} | Total number of adds handled by `workqueue` |
+| {{}}workqueue_depth{{}} | Current depth of `workqueue` |
+| {{}}workqueue_longest_running_processor_seconds{{}} | How long the longest running processor for `workqueue` has been running |
+| {{}}workqueue_queue_duration_seconds{{}} | Histogram of time an item stays in `workqueue` before processing starts |
+| {{}}workqueue_retries_total{{}} | Total number of retries handled by `workqueue` |
+| {{}}workqueue_unfinished_work_seconds{{}} | Seconds of work in progress not yet observed by `work_duration`. Large values suggest stuck threads. |
+| {{}}workqueue_work_duration_seconds{{}} | Histogram of time to process an item from `workqueue` (from start to completion) |
{{}}
diff --git a/content/v1.20/guides/metrics.md b/content/v1.20/guides/metrics.md
index d46bff2cc..3cca92c34 100644
--- a/content/v1.20/guides/metrics.md
+++ b/content/v1.20/guides/metrics.md
@@ -1,7 +1,7 @@
---
title: Metrics
weight: 60
-description: "Metrics are essential for monitoring Crossplane's operations, helping to quickly identify and resolve potential issues."
+description: "Track Crossplane operations with metrics"
---
Crossplane produces [Prometheus style metrics](https://prometheus.io/docs/introduction/overview/#what-are-metrics) for effective monitoring and alerting in your environment.
@@ -23,36 +23,88 @@ prometheus.io/port: "8080"
prometheus.io/scrape: "true"
```
+## Crossplane core metrics
+
+The Crossplane pod emits these metrics.
+
+{{< table "table table-hover table-striped table-sm">}}
+| Metric Name | Description |
+| --- | --- |
+| {{}}composition_run_function_request_total{{}} | Total number of RunFunctionRequests sent |
+| {{}}composition_run_function_response_total{{}} | Total number of RunFunctionResponses received |
+| {{}}composition_run_function_seconds{{}} | Histogram of RunFunctionResponse latency (seconds) |
+| {{}}composition_run_function_response_cache_hits_total{{}} | Total number of RunFunctionResponse cache hits |
+| {{}}composition_run_function_response_cache_misses_total{{}} | Total number of RunFunctionResponse cache misses |
+| {{}}composition_run_function_response_cache_errors_total{{}} | Total number of RunFunctionResponse cache errors |
+| {{}}composition_run_function_response_cache_writes_total{{}} | Total number of RunFunctionResponse cache writes |
+| {{}}composition_run_function_response_cache_deletes_total{{}} | Total number of RunFunctionResponse cache deletes |
+| {{}}composition_run_function_response_cache_bytes_written_total{{}} | Total number of RunFunctionResponse bytes written to cache |
+| {{}}composition_run_function_response_cache_bytes_deleted_total{{}} | Total number of RunFunctionResponse bytes deleted from cache |
+| {{}}composition_run_function_response_cache_read_seconds{{}} | Histogram of cache read latency (seconds) |
+| {{}}composition_run_function_response_cache_write_seconds{{}} | Histogram of cache write latency (seconds) |
+| {{}}composition_controllers_started_total{{}} | Total number of controllers started |
+| {{}}composition_controllers_stopped_total{{}} | Total number of controllers stopped |
+| {{}}composition_watches_started_total{{}} | Total number of watches started |
+| {{}}composition_watches_stopped_total{{}} | Total number of watches stopped |
+{{}}
+
+## Provider metrics
+
+Crossplane providers emit these metrics. All providers built with crossplane-runtime emit the `crossplane_managed_resource_*` metrics.
+
+Providers expose metrics on the `metrics` port (default `8080`). To scrape these metrics, configure a `PodMonitor` or add Prometheus annotations to the provider's `DeploymentRuntimeConfig`.
+
+{{< table "table table-hover table-striped table-sm">}}
+| Metric Name | Description |
+| --- | --- |
+| {{}}crossplane_managed_resource_exists{{}} | The number of managed resources that exist |
+| {{}}crossplane_managed_resource_ready{{}} | The number of managed resources in `Ready=True` state |
+| {{}}crossplane_managed_resource_synced{{}} | The number of managed resources in `Synced=True` state |
+| {{}}crossplane_managed_resource_deletion_seconds{{}} | The time it took to delete a managed resource |
+| {{}}crossplane_managed_resource_first_time_to_readiness_seconds{{}} | The time it took for a managed resource to become ready for the first time after creation |
+| {{}}crossplane_managed_resource_first_time_to_reconcile_seconds{{}} | The time it took for the controller to detect a managed resource |
+| {{}}crossplane_managed_resource_drift_seconds{{}} | Time elapsed after the last successful reconcile when detecting an out-of-sync resource |
+{{}}
+
+## Upjet provider metrics
+
+Only Upjet-based providers emit these metrics (such as [provider-upjet-aws](https://github.com/crossplane-contrib/provider-upjet-aws), [provider-upjet-azure](https://github.com/crossplane-contrib/provider-upjet-azure), and [provider-upjet-gcp](https://github.com/crossplane-contrib/provider-upjet-gcp)).
+
+{{< table "table table-hover table-striped table-sm">}}
+| Metric Name | Description |
+| --- | --- |
+| {{}}upjet_resource_ext_api_duration{{}} | Measures in seconds how long it takes a Cloud SDK call to complete |
+| {{}}upjet_resource_external_api_calls_total{{}} | The number of external API calls to cloud providers, with labels describing the endpoints and resources |
+| {{}}upjet_resource_reconcile_delay_seconds{{}} | Measures in seconds how long reconciles for a resource have been delayed beyond the configured poll period |
+| {{}}upjet_resource_ttr{{}} | Measures in seconds the time-to-readiness (TTR) for managed resources |
+| {{}}upjet_resource_cli_duration{{}} | Measures in seconds how long it takes a Terraform CLI invocation to complete |
+| {{}}upjet_resource_active_cli_invocations{{}} | The number of active (running) Terraform CLI invocations |
+| {{}}upjet_resource_running_processes{{}} | The number of running Terraform CLI and Terraform provider processes |
+{{}}
+
+## Controller-runtime and Kubernetes client metrics
+
+These metrics come from the controller-runtime framework and Kubernetes client libraries. Both Crossplane and providers emit these metrics.
+
{{< table "table table-hover table-striped table-sm">}}
-| Metric Name | Description | Further Explanation |
-| --- | --- | --- |
-| {{}}certwatcher_read_certificate_errors_total{{}} | Total number of certificate read errors | |
-| {{}}certwatcher_read_certificate_total{{}} | Total number of certificate reads | |
-| {{}}composition_run_function_seconds_bucket{{}} | Histogram of RunFunctionResponse latency (seconds) | |
-| {{}}controller_runtime_active_workers{{}} | Number of used workers per controller | The number of threads processing jobs from the work queue. |
-| {{}}controller_runtime_max_concurrent_reconciles{{}} | Maximum number of concurrent reconciles per controller | Describes how reconciles can happen in parallel. |
-| {{}}controller_runtime_reconcile_errors_total{{}} | Total number of reconciliation errors per controller | A counter that counts reconcile errors. Sharp or non stop rising of this metric might be a problem. |
-| {{}}controller_runtime_reconcile_time_seconds_bucket{{}} | Length of time per reconciliation per controller | |
-| {{}}controller_runtime_reconcile_total{{}} | Total number of reconciliations per controller | |
-| {{}}controller_runtime_webhook_latency_seconds_bucket{{}} | Histogram of the latency of processing admission requests | |
-| {{}}controller_runtime_webhook_requests_in_flight{{}} | Current number of admission requests served | |
-| {{}}controller_runtime_webhook_requests_total{{}} | Total number of admission requests by HTTP status code | |
-| {{}}rest_client_requests_total{{}} | Number of HTTP requests, partitioned by status code, method, and host | |
-| {{}}workqueue_adds_total{{}} | Total number of adds handled by `workqueue` | |
-| {{}}workqueue_depth{{}} | Current depth of `workqueue` | |
-| {{}}workqueue_longest_running_processor_seconds{{}} | The number of seconds has the longest running processor for `workqueue` been running | |
-| {{}}workqueue_queue_duration_seconds_bucket{{}} | How long in seconds an item stays in `workqueue` before requested | The time it takes from the moment a job enter the `workqueue` until the processing of this job starts. |
-| {{}}workqueue_retries_total{{}} | Total number of retries handled by `workqueue` | |
-| {{}}workqueue_unfinished_work_seconds{{}} | The number of seconds of work done that's in progress and hasn't observed by `work_duration`. Large values means stuck threads. | |
-| {{}}workqueue_work_duration_seconds_bucket{{}} | How long in seconds processing an item from `workqueue` takes | The time it takes from the moment the job start until it finish (either successfully or with an error). |
-| {{}}crossplane_managed_resource_exists{{}} | The number of managed resources that exist | |
-| {{}}crossplane_managed_resource_ready{{}} | The number of managed resources in `Ready=True` state | |
-| {{}}crossplane_managed_resource_synced{{}} | The number of managed resources in `Synced=True` state | |
-| {{}}upjet_resource_ext_api_duration_bucket{{}} | Measures in seconds how long it takes a Cloud SDK call to complete | |
-| {{}}upjet_resource_external_api_calls_total{{}} | The number of external API calls | The number of calls to cloud providers, with labels describing the endpoints resources. |
-| {{}}upjet_resource_reconcile_delay_seconds_bucket{{}} | Measures in seconds how long the reconciles for a resource delay from the configured poll periods | |
-| {{}}crossplane_managed_resource_deletion_seconds_bucket{{}} | The time it took to delete a managed resource | |
-| {{}}crossplane_managed_resource_first_time_to_readiness_seconds_bucket{{}} | The time it took for a managed resource to become ready first time after creation | |
-| {{}}crossplane_managed_resource_first_time_to_reconcile_seconds_bucket{{}} | The time it took to detect a managed resource by the controller | |
-| {{}}upjet_resource_ttr_bucket{{}} | Measures in seconds the `time-to-readiness` `(TTR)` for managed resources | |
+| Metric Name | Description |
+| --- | --- |
+| {{}}certwatcher_read_certificate_errors_total{{}} | Total number of certificate read errors |
+| {{}}certwatcher_read_certificate_total{{}} | Total number of certificate reads |
+| {{}}controller_runtime_active_workers{{}} | Number of workers (threads processing jobs from the work queue) per controller |
+| {{}}controller_runtime_max_concurrent_reconciles{{}} | Maximum number of concurrent reconciles per controller |
+| {{}}controller_runtime_reconcile_errors_total{{}} | Total number of reconciliation errors per controller. A sharp or sustained rise in this metric may indicate a problem. |
+| {{}}controller_runtime_reconcile_time_seconds{{}} | Histogram of time per reconciliation per controller |
+| {{}}controller_runtime_reconcile_total{{}} | Total number of reconciliations per controller |
+| {{}}controller_runtime_webhook_latency_seconds{{}} | Histogram of the latency of processing admission requests |
+| {{}}controller_runtime_webhook_requests_in_flight{{}} | Current number of admission requests being served |
+| {{}}controller_runtime_webhook_requests_total{{}} | Total number of admission requests by HTTP status code |
+| {{}}rest_client_requests_total{{}} | Number of HTTP requests, partitioned by status code, method, and host |
+| {{}}workqueue_adds_total{{}} | Total number of adds handled by `workqueue` |
+| {{}}workqueue_depth{{}} | Current depth of `workqueue` |
+| {{}}workqueue_longest_running_processor_seconds{{}} | How long the longest running processor for `workqueue` has been running |
+| {{}}workqueue_queue_duration_seconds{{}} | Histogram of time an item stays in `workqueue` before processing starts |
+| {{}}workqueue_retries_total{{}} | Total number of retries handled by `workqueue` |
+| {{}}workqueue_unfinished_work_seconds{{}} | Seconds of work in progress not yet observed by `work_duration`. Large values suggest stuck threads. |
+| {{}}workqueue_work_duration_seconds{{}} | Histogram of time to process an item from `workqueue` (from start to completion) |
{{}}
\ No newline at end of file
diff --git a/content/v2.0/guides/metrics.md b/content/v2.0/guides/metrics.md
index 255d584e2..c6d0c4fe1 100644
--- a/content/v2.0/guides/metrics.md
+++ b/content/v2.0/guides/metrics.md
@@ -1,7 +1,7 @@
---
title: Metrics
weight: 60
-description: "Monitor Crossplane operations with metrics"
+description: "Track Crossplane operations with metrics"
---
Crossplane produces [Prometheus style metrics](https://prometheus.io/docs/introduction/overview/#what-are-metrics) for effective monitoring and alerting in your environment.
@@ -23,36 +23,88 @@ prometheus.io/port: "8080"
prometheus.io/scrape: "true"
```
+## Crossplane core metrics
+
+The Crossplane pod emits these metrics.
+
+{{< table "table table-hover table-striped table-sm">}}
+| Metric Name | Description |
+| --- | --- |
+| {{}}function_run_function_request_total{{}} | Total number of RunFunctionRequests sent |
+| {{}}function_run_function_response_total{{}} | Total number of RunFunctionResponses received |
+| {{}}function_run_function_seconds{{}} | Histogram of RunFunctionResponse latency (seconds) |
+| {{}}function_run_function_response_cache_hits_total{{}} | Total number of RunFunctionResponse cache hits |
+| {{}}function_run_function_response_cache_misses_total{{}} | Total number of RunFunctionResponse cache misses |
+| {{}}function_run_function_response_cache_errors_total{{}} | Total number of RunFunctionResponse cache errors |
+| {{}}function_run_function_response_cache_writes_total{{}} | Total number of RunFunctionResponse cache writes |
+| {{}}function_run_function_response_cache_deletes_total{{}} | Total number of RunFunctionResponse cache deletes |
+| {{}}function_run_function_response_cache_bytes_written_total{{}} | Total number of RunFunctionResponse bytes written to cache |
+| {{}}function_run_function_response_cache_bytes_deleted_total{{}} | Total number of RunFunctionResponse bytes deleted from cache |
+| {{}}function_run_function_response_cache_read_seconds{{}} | Histogram of cache read latency (seconds) |
+| {{}}function_run_function_response_cache_write_seconds{{}} | Histogram of cache write latency (seconds) |
+| {{}}engine_controllers_started_total{{}} | Total number of controllers started |
+| {{}}engine_controllers_stopped_total{{}} | Total number of controllers stopped |
+| {{}}engine_watches_started_total{{}} | Total number of watches started |
+| {{}}engine_watches_stopped_total{{}} | Total number of watches stopped |
+{{}}
+
+## Provider metrics
+
+Crossplane providers emit these metrics. All providers built with crossplane-runtime emit the `crossplane_managed_resource_*` metrics.
+
+Providers expose metrics on the `metrics` port (default `8080`). To scrape these metrics, configure a `PodMonitor` or add Prometheus annotations to the provider's `DeploymentRuntimeConfig`.
+
+{{< table "table table-hover table-striped table-sm">}}
+| Metric Name | Description |
+| --- | --- |
+| {{}}crossplane_managed_resource_exists{{}} | The number of managed resources that exist |
+| {{}}crossplane_managed_resource_ready{{}} | The number of managed resources in `Ready=True` state |
+| {{}}crossplane_managed_resource_synced{{}} | The number of managed resources in `Synced=True` state |
+| {{}}crossplane_managed_resource_deletion_seconds{{}} | The time it took to delete a managed resource |
+| {{}}crossplane_managed_resource_first_time_to_readiness_seconds{{}} | The time it took for a managed resource to become ready for the first time after creation |
+| {{}}crossplane_managed_resource_first_time_to_reconcile_seconds{{}} | The time it took for the controller to detect a managed resource |
+| {{}}crossplane_managed_resource_drift_seconds{{}} | Time elapsed after the last successful reconcile when detecting an out-of-sync resource |
+{{}}
+
+## Upjet provider metrics
+
+Only Upjet-based providers emit these metrics (such as [provider-upjet-aws](https://github.com/crossplane-contrib/provider-upjet-aws), [provider-upjet-azure](https://github.com/crossplane-contrib/provider-upjet-azure), and [provider-upjet-gcp](https://github.com/crossplane-contrib/provider-upjet-gcp)).
+
+{{< table "table table-hover table-striped table-sm">}}
+| Metric Name | Description |
+| --- | --- |
+| {{}}upjet_resource_ext_api_duration{{}} | Measures in seconds how long it takes a Cloud SDK call to complete |
+| {{}}upjet_resource_external_api_calls_total{{}} | The number of external API calls to cloud providers, with labels describing the endpoints and resources |
+| {{}}upjet_resource_reconcile_delay_seconds{{}} | Measures in seconds how long reconciles for a resource have been delayed beyond the configured poll period |
+| {{}}upjet_resource_ttr{{}} | Measures in seconds the time-to-readiness (TTR) for managed resources |
+| {{}}upjet_resource_cli_duration{{}} | Measures in seconds how long it takes a Terraform CLI invocation to complete |
+| {{}}upjet_resource_active_cli_invocations{{}} | The number of active (running) Terraform CLI invocations |
+| {{}}upjet_resource_running_processes{{}} | The number of running Terraform CLI and Terraform provider processes |
+{{}}
+
+## Controller-runtime and Kubernetes client metrics
+
+These metrics come from the controller-runtime framework and Kubernetes client libraries. Both Crossplane and providers emit these metrics.
+
{{< table "table table-hover table-striped table-sm">}}
-| Metric Name | Description | Further Explanation |
-| --- | --- | --- |
-| {{}}certwatcher_read_certificate_errors_total{{}} | Total number of certificate read errors | |
-| {{}}certwatcher_read_certificate_total{{}} | Total number of certificate reads | |
-| {{}}composition_run_function_seconds_bucket{{}} | Histogram of RunFunctionResponse latency (seconds) | |
-| {{}}controller_runtime_active_workers{{}} | Number of used workers per controller | The number of threads processing jobs from the work queue. |
-| {{}}controller_runtime_max_concurrent_reconciles{{}} | Maximum number of concurrent reconciles per controller | Describes how reconciles can happen in parallel. |
-| {{}}controller_runtime_reconcile_errors_total{{}} | Total number of reconciliation errors per controller | A counter that counts reconcile errors. Sharp or non stop rising of this metric might be a problem. |
-| {{}}controller_runtime_reconcile_time_seconds_bucket{{}} | Length of time per reconciliation per controller | |
-| {{}}controller_runtime_reconcile_total{{}} | Total number of reconciliations per controller | |
-| {{}}controller_runtime_webhook_latency_seconds_bucket{{}} | Histogram of the latency of processing admission requests | |
-| {{}}controller_runtime_webhook_requests_in_flight{{}} | Current number of admission requests served | |
-| {{}}controller_runtime_webhook_requests_total{{}} | Total number of admission requests by HTTP status code | |
-| {{}}rest_client_requests_total{{}} | Number of HTTP requests, partitioned by status code, method, and host | |
-| {{}}workqueue_adds_total{{}} | Total number of adds handled by `workqueue` | |
-| {{}}workqueue_depth{{}} | Current depth of `workqueue` | |
-| {{}}workqueue_longest_running_processor_seconds{{}} | The number of seconds has the longest running processor for `workqueue` been running | |
-| {{}}workqueue_queue_duration_seconds_bucket{{}} | How long in seconds an item stays in `workqueue` before requested | The time it takes from the moment a job enter the `workqueue` until the processing of this job starts. |
-| {{}}workqueue_retries_total{{}} | Total number of retries handled by `workqueue` | |
-| {{}}workqueue_unfinished_work_seconds{{}} | The number of seconds of work done that's in progress and hasn't observed by `work_duration`. Large values means stuck threads. | |
-| {{}}workqueue_work_duration_seconds_bucket{{}} | How long in seconds processing an item from `workqueue` takes | The time it takes from the moment the job start until it finish (either successfully or with an error). |
-| {{}}crossplane_managed_resource_exists{{}} | The number of managed resources that exist | |
-| {{}}crossplane_managed_resource_ready{{}} | The number of managed resources in `Ready=True` state | |
-| {{}}crossplane_managed_resource_synced{{}} | The number of managed resources in `Synced=True` state | |
-| {{}}upjet_resource_ext_api_duration_bucket{{}} | Measures in seconds how long it takes a Cloud SDK call to complete | |
-| {{}}upjet_resource_external_api_calls_total{{}} | The number of external API calls | The number of calls to cloud providers, with labels describing the endpoints resources. |
-| {{}}upjet_resource_reconcile_delay_seconds_bucket{{}} | Measures in seconds how long the reconciles for a resource delay from the configured poll periods | |
-| {{}}crossplane_managed_resource_deletion_seconds_bucket{{}} | The time it took to delete a managed resource | |
-| {{}}crossplane_managed_resource_first_time_to_readiness_seconds_bucket{{}} | The time it took for a managed resource to become ready first time after creation | |
-| {{}}crossplane_managed_resource_first_time_to_reconcile_seconds_bucket{{}} | The time it took to detect a managed resource by the controller | |
-| {{}}upjet_resource_ttr_bucket{{}} | Measures in seconds the `time-to-readiness` `(TTR)` for managed resources | |
+| Metric Name | Description |
+| --- | --- |
+| {{}}certwatcher_read_certificate_errors_total{{}} | Total number of certificate read errors |
+| {{}}certwatcher_read_certificate_total{{}} | Total number of certificate reads |
+| {{}}controller_runtime_active_workers{{}} | Number of workers currently in use (threads processing jobs from the work queue) per controller |
+| {{}}controller_runtime_max_concurrent_reconciles{{}} | Maximum number of concurrent reconciles per controller |
+| {{}}controller_runtime_reconcile_errors_total{{}} | Total number of reconciliation errors per controller. A sharp or sustained rise in this metric may indicate a problem. |
+| {{}}controller_runtime_reconcile_time_seconds{{}} | Histogram of time per reconciliation per controller |
+| {{}}controller_runtime_reconcile_total{{}} | Total number of reconciliations per controller |
+| {{}}controller_runtime_webhook_latency_seconds{{}} | Histogram of the latency of processing admission requests |
+| {{}}controller_runtime_webhook_requests_in_flight{{}} | Current number of admission requests being served |
+| {{}}controller_runtime_webhook_requests_total{{}} | Total number of admission requests by HTTP status code |
+| {{}}rest_client_requests_total{{}} | Number of HTTP requests, partitioned by status code, method, and host |
+| {{}}workqueue_adds_total{{}} | Total number of adds handled by `workqueue` |
+| {{}}workqueue_depth{{}} | Current depth of `workqueue` |
+| {{}}workqueue_longest_running_processor_seconds{{}} | How long the longest running processor for `workqueue` has been running |
+| {{}}workqueue_queue_duration_seconds{{}} | Histogram of time an item stays in `workqueue` before processing starts |
+| {{}}workqueue_retries_total{{}} | Total number of retries handled by `workqueue` |
+| {{}}workqueue_unfinished_work_seconds{{}} | Seconds of work in progress not yet observed by `work_duration`. Large values suggest stuck threads. |
+| {{}}workqueue_work_duration_seconds{{}} | Histogram of time to process an item from `workqueue` (from start to completion) |
 {{< /table >}}
\ No newline at end of file
diff --git a/content/v2.1/guides/metrics.md b/content/v2.1/guides/metrics.md
index c2444b23f..5282ee685 100644
--- a/content/v2.1/guides/metrics.md
+++ b/content/v2.1/guides/metrics.md
@@ -1,7 +1,7 @@
---
title: Metrics
weight: 60
-description: "Monitor Crossplane operations with metrics"
+description: "Track Crossplane operations with metrics"
---
Crossplane produces [Prometheus style metrics](https://prometheus.io/docs/introduction/overview/#what-are-metrics) for effective monitoring and alerting in your environment.
@@ -23,39 +23,91 @@ prometheus.io/port: "8080"
prometheus.io/scrape: "true"
```
+## Crossplane core metrics
+
+The Crossplane pod emits these metrics.
+
+{{< table "table table-hover table-striped table-sm">}}
+| Metric Name | Description |
+| --- | --- |
+| {{}}function_run_function_request_total{{}} | Total number of RunFunctionRequests sent |
+| {{}}function_run_function_response_total{{}} | Total number of RunFunctionResponses received |
+| {{}}function_run_function_seconds{{}} | Histogram of RunFunctionResponse latency (seconds) |
+| {{}}function_run_function_response_cache_hits_total{{}} | Total number of RunFunctionResponse cache hits |
+| {{}}function_run_function_response_cache_misses_total{{}} | Total number of RunFunctionResponse cache misses |
+| {{}}function_run_function_response_cache_errors_total{{}} | Total number of RunFunctionResponse cache errors |
+| {{}}function_run_function_response_cache_writes_total{{}} | Total number of RunFunctionResponse cache writes |
+| {{}}function_run_function_response_cache_deletes_total{{}} | Total number of RunFunctionResponse cache deletes |
+| {{}}function_run_function_response_cache_bytes_written_total{{}} | Total number of RunFunctionResponse bytes written to cache |
+| {{}}function_run_function_response_cache_bytes_deleted_total{{}} | Total number of RunFunctionResponse bytes deleted from cache |
+| {{}}function_run_function_response_cache_read_seconds{{}} | Histogram of cache read latency (seconds) |
+| {{}}function_run_function_response_cache_write_seconds{{}} | Histogram of cache write latency (seconds) |
+| {{}}circuit_breaker_opens_total{{}} | Number of times the XR circuit breaker transitioned from closed to open |
+| {{}}circuit_breaker_closes_total{{}} | Number of times the XR circuit breaker transitioned from open to closed |
+| {{}}circuit_breaker_events_total{{}} | Number of XR watch events handled by the circuit breaker, labeled by outcome |
+| {{}}engine_controllers_started_total{{}} | Total number of controllers started |
+| {{}}engine_controllers_stopped_total{{}} | Total number of controllers stopped |
+| {{}}engine_watches_started_total{{}} | Total number of watches started |
+| {{}}engine_watches_stopped_total{{}} | Total number of watches stopped |
+{{< /table >}}
+
+## Provider metrics
+
+Crossplane providers emit these metrics. All providers built with crossplane-runtime emit the `crossplane_managed_resource_*` metrics.
+
+Providers expose metrics on the `metrics` port (default `8080`). To scrape these metrics, configure a `PodMonitor` or add Prometheus annotations to the provider's `DeploymentRuntimeConfig`.
+
+{{< table "table table-hover table-striped table-sm">}}
+| Metric Name | Description |
+| --- | --- |
+| {{}}crossplane_managed_resource_exists{{}} | The number of managed resources that exist |
+| {{}}crossplane_managed_resource_ready{{}} | The number of managed resources in `Ready=True` state |
+| {{}}crossplane_managed_resource_synced{{}} | The number of managed resources in `Synced=True` state |
+| {{}}crossplane_managed_resource_deletion_seconds{{}} | The time it took to delete a managed resource |
+| {{}}crossplane_managed_resource_first_time_to_readiness_seconds{{}} | The time it took for a managed resource to become ready for the first time after creation |
+| {{}}crossplane_managed_resource_first_time_to_reconcile_seconds{{}} | The time it took for the controller to detect a managed resource |
+| {{}}crossplane_managed_resource_drift_seconds{{}} | Time elapsed after the last successful reconcile when detecting an out-of-sync resource |
+{{< /table >}}
+
+## Upjet provider metrics
+
+Only Upjet-based providers emit these metrics (such as [provider-upjet-aws](https://github.com/crossplane-contrib/provider-upjet-aws), [provider-upjet-azure](https://github.com/crossplane-contrib/provider-upjet-azure), and [provider-upjet-gcp](https://github.com/crossplane-contrib/provider-upjet-gcp)).
+
+{{< table "table table-hover table-striped table-sm">}}
+| Metric Name | Description |
+| --- | --- |
+| {{}}upjet_resource_ext_api_duration{{}} | Measures in seconds how long it takes a Cloud SDK call to complete |
+| {{}}upjet_resource_external_api_calls_total{{}} | The number of external API calls to cloud providers, with labels describing the endpoints and resources |
+| {{}}upjet_resource_reconcile_delay_seconds{{}} | Measures in seconds how long reconciles for a resource are delayed beyond the configured poll period |
+| {{}}upjet_resource_ttr{{}} | Measures in seconds the time-to-readiness (TTR) for managed resources |
+| {{}}upjet_resource_cli_duration{{}} | Measures in seconds how long it takes a Terraform CLI invocation to complete |
+| {{}}upjet_resource_active_cli_invocations{{}} | The number of active (running) Terraform CLI invocations |
+| {{}}upjet_resource_running_processes{{}} | The number of running Terraform CLI and Terraform provider processes |
+{{< /table >}}
+
+## Controller-runtime and Kubernetes client metrics
+
+These metrics come from the controller-runtime framework and Kubernetes client libraries. Both Crossplane and providers emit these metrics.
+
{{< table "table table-hover table-striped table-sm">}}
-| Metric Name | Description | Further Explanation |
-| --- | --- | --- |
-| {{}}certwatcher_read_certificate_errors_total{{}} | Total number of certificate read errors | |
-| {{}}certwatcher_read_certificate_total{{}} | Total number of certificate reads | |
-| {{}}composition_run_function_seconds_bucket{{}} | Histogram of RunFunctionResponse latency (seconds) | |
-| {{}}controller_runtime_active_workers{{}} | Number of used workers per controller | The number of threads processing jobs from the work queue. |
-| {{}}controller_runtime_max_concurrent_reconciles{{}} | Maximum number of concurrent reconciles per controller | Describes how reconciles can happen in parallel. |
-| {{}}controller_runtime_reconcile_errors_total{{}} | Total number of reconciliation errors per controller | A counter that counts reconcile errors. Sharp or non stop rising of this metric might be a problem. |
-| {{}}controller_runtime_reconcile_time_seconds_bucket{{}} | Length of time per reconciliation per controller | |
-| {{}}controller_runtime_reconcile_total{{}} | Total number of reconciliations per controller | |
-| {{}}controller_runtime_webhook_latency_seconds_bucket{{}} | Histogram of the latency of processing admission requests | |
-| {{}}controller_runtime_webhook_requests_in_flight{{}} | Current number of admission requests served | |
-| {{}}controller_runtime_webhook_requests_total{{}} | Total number of admission requests by HTTP status code | |
-| {{}}rest_client_requests_total{{}} | Number of HTTP requests, partitioned by status code, method, and host | |
-| {{}}workqueue_adds_total{{}} | Total number of adds handled by `workqueue` | |
-| {{}}workqueue_depth{{}} | Current depth of `workqueue` | |
-| {{}}workqueue_longest_running_processor_seconds{{}} | The number of seconds has the longest running processor for `workqueue` been running | |
-| {{}}workqueue_queue_duration_seconds_bucket{{}} | How long in seconds an item stays in `workqueue` before requested | The time it takes from the moment a job enter the `workqueue` until the processing of this job starts. |
-| {{}}workqueue_retries_total{{}} | Total number of retries handled by `workqueue` | |
-| {{}}workqueue_unfinished_work_seconds{{}} | The number of seconds of work done that's in progress and hasn't observed by `work_duration`. Large values means stuck threads. | |
-| {{}}workqueue_work_duration_seconds_bucket{{}} | How long in seconds processing an item from `workqueue` takes | The time it takes from the moment the job start until it finish (either successfully or with an error). |
-| {{}}crossplane_managed_resource_exists{{}} | The number of managed resources that exist | |
-| {{}}crossplane_managed_resource_ready{{}} | The number of managed resources in `Ready=True` state | |
-| {{}}crossplane_managed_resource_synced{{}} | The number of managed resources in `Synced=True` state | |
-| {{}}upjet_resource_ext_api_duration_bucket{{}} | Measures in seconds how long it takes a Cloud SDK call to complete | |
-| {{}}upjet_resource_external_api_calls_total{{}} | The number of external API calls | The number of calls to cloud providers, with labels describing the endpoints resources. |
-| {{}}upjet_resource_reconcile_delay_seconds_bucket{{}} | Measures in seconds how long the reconciles for a resource delay from the configured poll periods | |
-| {{}}crossplane_managed_resource_deletion_seconds_bucket{{}} | The time it took to delete a managed resource | |
-| {{}}crossplane_managed_resource_first_time_to_readiness_seconds_bucket{{}} | The time it took for a managed resource to become ready first time after creation | |
-| {{}}crossplane_managed_resource_first_time_to_reconcile_seconds_bucket{{}} | The time it took to detect a managed resource by the controller | |
-| {{}}upjet_resource_ttr_bucket{{}} | Measures in seconds the `time-to-readiness` `(TTR)` for managed resources | |
-| {{}}circuit_breaker_opens_total{{}} | Total number of times the XR watch circuit breaker opened | |
-| {{}}circuit_breaker_closes_total{{}} | Total number of times the XR watch circuit breaker closed again | |
-| {{}}circuit_breaker_events_total{{}} | Total number of watched events handled by the XR circuit breaker | Labeled by outcome (`Allowed`, `HalfOpenAllowed`, `Dropped`); deletion events skip the breaker. |
+| Metric Name | Description |
+| --- | --- |
+| {{}}certwatcher_read_certificate_errors_total{{}} | Total number of certificate read errors |
+| {{}}certwatcher_read_certificate_total{{}} | Total number of certificate reads |
+| {{}}controller_runtime_active_workers{{}} | Number of workers currently in use (threads processing jobs from the work queue) per controller |
+| {{}}controller_runtime_max_concurrent_reconciles{{}} | Maximum number of concurrent reconciles per controller |
+| {{}}controller_runtime_reconcile_errors_total{{}} | Total number of reconciliation errors per controller. A sharp or sustained rise in this metric may indicate a problem. |
+| {{}}controller_runtime_reconcile_time_seconds{{}} | Histogram of time per reconciliation per controller |
+| {{}}controller_runtime_reconcile_total{{}} | Total number of reconciliations per controller |
+| {{}}controller_runtime_webhook_latency_seconds{{}} | Histogram of the latency of processing admission requests |
+| {{}}controller_runtime_webhook_requests_in_flight{{}} | Current number of admission requests being served |
+| {{}}controller_runtime_webhook_requests_total{{}} | Total number of admission requests by HTTP status code |
+| {{}}rest_client_requests_total{{}} | Number of HTTP requests, partitioned by status code, method, and host |
+| {{}}workqueue_adds_total{{}} | Total number of adds handled by `workqueue` |
+| {{}}workqueue_depth{{}} | Current depth of `workqueue` |
+| {{}}workqueue_longest_running_processor_seconds{{}} | How long the longest running processor for `workqueue` has been running |
+| {{}}workqueue_queue_duration_seconds{{}} | Histogram of time an item stays in `workqueue` before processing starts |
+| {{}}workqueue_retries_total{{}} | Total number of retries handled by `workqueue` |
+| {{}}workqueue_unfinished_work_seconds{{}} | Seconds of work in progress not yet observed by `work_duration`. Large values suggest stuck threads. |
+| {{}}workqueue_work_duration_seconds{{}} | Histogram of time to process an item from `workqueue` (from start to completion) |
 {{< /table >}}
\ No newline at end of file
diff --git a/utils/vale/styles/Crossplane/crossplane-words.txt b/utils/vale/styles/Crossplane/crossplane-words.txt
index 7317d5c1f..4c285942a 100644
--- a/utils/vale/styles/Crossplane/crossplane-words.txt
+++ b/utils/vale/styles/Crossplane/crossplane-words.txt
@@ -29,6 +29,8 @@ Crossplane
crossplane-admin
crossplane-browse
crossplane-edit
+controller-runtime
+Controller-runtime
crossplane-runtime
Crossplane's
crossplane-view
@@ -81,7 +83,9 @@ ProviderConfig
ProviderConfigs
ProviderRevision
RunFunctionRequest
+RunFunctionRequests
RunFunctionResponse
+RunFunctionResponses
Sigstore
SSL
StoreConfig
@@ -91,6 +95,7 @@ ToEnvironmentFieldPath
toFieldPath
TrimPrefix
TrimSuffix
+TTR
UnhealthyPackageRevision
UnknownPackageRevisionHealth
ValidPipeline
diff --git a/utils/vale/styles/Crossplane/spelling-exceptions.txt b/utils/vale/styles/Crossplane/spelling-exceptions.txt
index ae35534f6..7cd5094fe 100644
--- a/utils/vale/styles/Crossplane/spelling-exceptions.txt
+++ b/utils/vale/styles/Crossplane/spelling-exceptions.txt
@@ -47,6 +47,7 @@ one-time
One-time
one-way
One-way
+out-of-sync
Operation-level
pattern-based
Pattern-based
@@ -84,6 +85,8 @@ team-based
Team-based
third-party
Time-sensitive
+time-to-readiness
+Upjet-based
top-level
unpause
untrusted