DOC-13170 Product Change- PR #143536 - metric: add /metrics endpoint with static labels #19823

florence-crl · 2025-06-23T20:23:42Z

Added prometheus-endpoint.md with info about metrics endpoint.
In monitoring-and-alerting.md, moved info in the existing Prometheus endpoint section to the new prometheus-endpoint page.
In self-hosted-deployments.json, added link to new page.
Replaced instances of ({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint) and (#prometheus-endpoint) with ({% link {{ page.version.version }}/prometheus-endpoint.md %}).
Replaced instances of status/vars with Prometheus endpoint.
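
For reviewers who want to spot-check the link changes, the replacement described above can be sketched as a small Python helper. This is an illustration only: the PR does not say how the links were actually rewritten, and `retarget_prometheus_links` is a hypothetical name. Note that the bare `(#prometheus-endpoint)` form only makes sense to replace in a page that previously linked to its own section.

```python
# Hypothetical helper; the PR does not state how the replacement was performed.
OLD_LINKS = (
    "({% link {{ page.version.version }}/monitoring-and-alerting.md %}#prometheus-endpoint)",
    "(#prometheus-endpoint)",
)
NEW_LINK = "({% link {{ page.version.version }}/prometheus-endpoint.md %})"


def retarget_prometheus_links(markdown: str) -> str:
    """Point both old link targets at the new prometheus-endpoint page."""
    for old in OLD_LINKS:
        markdown = markdown.replace(old, NEW_LINK)
    return markdown
```

Running it over a page's Markdown source should leave no `#prometheus-endpoint` anchors behind.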

Rendered preview

Prometheus endpoint

In monitoring-and-alerting.md, moved info in the existing Prometheus endpoint section to the new page. In self-hosted-deployments.json, added link to new page.

netlify · 2025-06-23T20:24:05Z

✅ Deploy Preview for cockroachdb-interactivetutorials-docs canceled.

Name	Link
🔨 Latest commit	`e75663b`
🔍 Latest deploy log	https://app.netlify.com/projects/cockroachdb-interactivetutorials-docs/deploys/686d66335c9dad00084ab62b

github-actions · 2025-06-23T20:24:05Z

Files changed:

src/current/_includes/v25.3/cdc/metrics-labels.md:

src/current/v25.3/monitor-and-debug-changefeeds.md

src/current/_includes/v25.3/faq/clock-synchronization-monitoring.md:

src/current/v25.3/operational-faqs.md

src/current/_includes/v25.3/prod-deployment/cluster-unavailable-monitoring.md:

src/current/_includes/v25.3/sidebar-data/self-hosted-deployments.json
src/current/v25.3/api-support-policy.md
src/current/v25.3/backup-and-restore-monitoring.md
src/current/v25.3/cockroachdb-feature-availability.md
src/current/v25.3/datadog.md
src/current/v25.3/differences-in-metrics-between-third-party-monitoring-integrations-and-db-console.md
src/current/v25.3/kibana.md
src/current/v25.3/load-based-splitting.md
src/current/v25.3/manage-a-backup-schedule.md
src/current/v25.3/monitor-and-analyze-transaction-contention.md
src/current/v25.3/monitor-and-debug-changefeeds.md
src/current/v25.3/monitor-cockroachdb-kubernetes.md
src/current/v25.3/monitor-cockroachdb-with-prometheus.md
src/current/v25.3/monitoring-and-alerting.md
src/current/v25.3/multi-dimensional-metrics.md
src/current/v25.3/operational-faqs.md
src/current/v25.3/pause-job.md
src/current/v25.3/prometheus-endpoint.md
src/current/v25.3/row-level-ttl.md
src/current/v25.3/third-party-monitoring-tools.md
src/current/v25.3/work-with-virtual-clusters.md

netlify · 2025-06-23T20:24:06Z

✅ Deploy Preview for cockroachdb-api-docs canceled.

Name	Link
🔨 Latest commit	`e75663b`
🔍 Latest deploy log	https://app.netlify.com/projects/cockroachdb-api-docs/deploys/686d6633637a5f0008815f9c

netlify · 2025-06-23T20:33:12Z

✅ Netlify Preview

Name	Link
🔨 Latest commit	`e75663b`
🔍 Latest deploy log	https://app.netlify.com/projects/cockroachdb-docs/deploys/686d6633bf3bad0008cfb49c
😎 Deploy Preview	https://deploy-preview-19823--cockroachdb-docs.netlify.app

…nd-alerting.md %}#prometheus-endpoint) with ({% link {{ page.version.version }}/prometheus-endpoint.md %}). Replace instances of (#prometheus-endpoint) with ({% link {{ page.version.version }}/prometheus-endpoint.md %}).

kevin-v-ngo

The Prometheus endpoint doc looks great!

src/current/v25.3/prometheus-endpoint.md

florence-crl

TFTR

src/current/v25.3/prometheus-endpoint.md

dhartunian

LGTM! Just one question.

src/current/v25.3/prometheus-endpoint.md

mikeCRL

Looks good overall. I left some suggestions, and questions to potentially consider.

mikeCRL · 2025-07-08T19:27:24Z

src/current/v25.3/operational-faqs.md

@@ -175,6 +175,10 @@ Cockroach Labs recommends that you avoid _increasing_ the period of time that DB

 ### Disable time-series storage

+{{site.data.alerts.callout_info}}
+Even if you rely on external tools for storing and visualizing your cluster's time-series metrics, CockroachDB continues to store time-series metrics for its [DB Console Metrics dashboards]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#metrics-dashboards). These stored time-series metrics may be used to generate a [tsdump]({% link {{ page.version.version }}/cockroach-debug-tsdump.md %}), which is critical during escalations to Cockroach Labs support.


Suggested change

Even if you rely on external tools for storing and visualizing your cluster's time-series metrics, CockroachDB continues to store time-series metrics for its [DB Console Metrics dashboards]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#metrics-dashboards). These stored time-series metrics may be used to generate a [tsdump]({% link {{ page.version.version }}/cockroach-debug-tsdump.md %}), which is critical during escalations to Cockroach Labs support.

Even if you rely on external tools for storing and visualizing your cluster's time-series metrics, CockroachDB continues to store time-series metrics for its [DB Console Metrics dashboards]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#metrics-dashboards), unless you manually disable this collection. These stored time-series metrics may be used to generate a [tsdump]({% link {{ page.version.version }}/cockroach-debug-tsdump.md %}), which is critical during escalations to Cockroach Labs support.

Edited for clarity and to fit better in the context of potentially disabling. However, I'm not sure why this is a callout. Is it because we want customers to know we're still collecting data, which could have a storage cost? Or because you may not want to disable it, to preserve tsdump data that might be critical? (I am also not sure about saying it "is critical" vs., say, "may be critical".)

In the next paragraph, we seem to say the opposite; it's almost implied that it's not critical:

Disabling time-series storage is recommended only if you exclusively use a third-party tool such as [Prometheus]...

Do we mean to say that disabling time-series storage is an option if you exclusively use a third-party tool such as Prometheus, but even then, we recommend keeping it enabled in case it might help to provide it to CockroachDB Support during an issue?

(For that matter, why couldn't we just ask them to give us the data sourced from their third-party tool? Does it have less fidelity? Is that process or format less reliable?)

Just some food for thought to help inspire edits, or help ask SME/Support what they really care about and how they'd frame this.

mikeCRL · 2025-07-10T20:40:40Z

src/current/v25.3/prometheus-endpoint.md

+In addition to using the exported time-series data to monitor a cluster through an external system, you can write alerting rules to ensure prompt notification of critical events or issues requiring intervention or investigation. Refer to [Essential Alerts]({% link {{ page.version.version }}/essential-alerts-self-hosted.md %}) for more details.
+{{site.data.alerts.end}}
+
+Even if you rely on external tools for storing and visualizing your cluster's time-series metrics, CockroachDB continues to store time-series metrics for its [DB Console Metrics dashboards]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#metrics-dashboards). These stored time-series metrics may be used to generate a [tsdump]({% link {{ page.version.version }}/cockroach-debug-tsdump.md %}), which is critical during escalations to Cockroach Labs support.


Should we add a mention that it's possible to limit or disable this, here (e.g. to limit storage) and link out to the other page?

mikeCRL · 2025-07-10T20:57:19Z

src/current/v25.3/prometheus-endpoint.md

+
+### Static labels
+
+Static labels allow segmentation of a metric across various facets for later querying and aggregation.


I saw the later phrase "Another common scenario", which led me to realize I didn't grasp the first scenario, so here's an attempt to characterize/introduce that first scenario.

Suggested change

Static labels allow segmentation of a metric across various facets for later querying and aggregation.

Static labels allow segmentation of a metric across various facets for later querying and aggregation.

One common use of static labels is to support aggregation across related metric types. For example, rather than emitting separate metrics for inserts, selects, updates, and deletes, a single metric like `sql_count` can use a `query_type` label to distinguish among these operations. This enables operators to easily aggregate across query types (e.g., summing all SQL operations) or filter for a specific type using a label-based query.

The following tables contrast unlabeled metrics from the `_status/vars` endpoint with their labeled counterparts from the `metrics` endpoint:
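
To make the `query_type` scenario in this suggestion concrete, here is a minimal Python sketch of the aggregation an operator (or a monitoring tool) might perform on labeled samples. It is not part of the proposed doc text. The sample values are invented, and `parse_samples` and `sum_by_label` are hypothetical helpers that handle only simple `name{labels} value` lines, not the full Prometheus text format.

```python
import re
from collections import defaultdict

# Hypothetical samples in Prometheus text format; the values are invented.
SAMPLE = """\
sql_count{node_id="1",query_type="insert"} 120
sql_count{node_id="1",query_type="select"} 900
sql_count{node_id="1",query_type="update"} 45
sql_count{node_id="1",query_type="delete"} 5
"""

LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Yield (metric_name, labels_dict, value) for each labeled sample line."""
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            labels = dict(LABEL_RE.findall(match["labels"]))
            yield match["name"], labels, float(match["value"])


def sum_by_label(text, metric, label):
    """Total one metric's values for each value of the given label."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == metric and label in labels:
            totals[labels[label]] += value
    return dict(totals)


per_type = sum_by_label(SAMPLE, "sql_count", "query_type")
all_sql = sum(per_type.values())    # aggregate across every query type
selects_only = per_type["select"]   # filter for a single query type
```

With one metric name and a label, both the total and the per-type view come from the same series, which is the point the suggested paragraph is making.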

mikeCRL · 2025-07-10T21:11:59Z

src/current/v25.3/prometheus-endpoint.md

+Another common scenario occurs when each label value represents a disjoint set of categories. An example here is the various certificate expiration metrics, which differ only by the specific certificate they refer to. Operators are unlikely to aggregate these, but may still want to view all certificate expiration metrics on a dashboard.
+
+For example, the output from the `metrics` endpoint will be similar to the following:


Suggested change

Another common scenario occurs when each label value represents a disjoint set of categories. An example here is the various certificate expiration metrics, which differ only by the specific certificate they refer to. Operators are unlikely to aggregate these, but may still want to view all certificate expiration metrics on a dashboard.

For example, the output from the `metrics` endpoint will be similar to the following:

In other cases, label values can represent distinct categories not meant to be aggregated. For example, certificate expiration metrics differ only by the specific certificate type they refer to. Operators are unlikely to sum or average these, but may still want to display them side by side on a dashboard for visibility.

In this case, a single metric name like `security_certificate_expiration` is reused, with the certificate type expressed as a label. The output from the `metrics` endpoint will be similar to the following:

mikeCRL · 2025-07-10T21:14:01Z

src/current/v25.3/prometheus-endpoint.md

+security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="client-ca"} 0
+security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="ui-ca"} 0
+security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="node"} 1.840654953e+09
+~~~


Suggested change

~~~

~~~

This approach avoids a proliferation of metric names while allowing third-party tools to display each certificate's expiration as a separate line in a unified graph or table.
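
As a rough illustration of the side-by-side view described here, the Python sketch below reads the three sample lines from the diff and lists one row per certificate type. It assumes the metric value is a Unix timestamp in seconds and that `0` means there is no expiration to report; neither assumption is confirmed in this thread, and `expirations_by_certificate` is a hypothetical helper.

```python
import re
from datetime import datetime, timezone

# Sample lines copied from the diff under review.
OUTPUT = """\
security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="client-ca"} 0
security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="ui-ca"} 0
security_certificate_expiration{node_id="1",tenant="demoapp",certificate_type="node"} 1.840654953e+09
"""

SAMPLE_RE = re.compile(r'^security_certificate_expiration\{([^}]*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')


def expirations_by_certificate(text):
    """Map certificate_type to an expiration datetime, or None when the value is 0."""
    result = {}
    for line in text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match:
            continue
        labels = dict(LABEL_RE.findall(match.group(1)))
        seconds = float(match.group(2))
        # Assumption: the value is epoch seconds, and 0 means nothing to report.
        expires = datetime.fromtimestamp(seconds, tz=timezone.utc) if seconds else None
        result[labels["certificate_type"]] = expires
    return result


rows = expirations_by_certificate(OUTPUT)
for cert_type, expires in rows.items():
    print(cert_type, expires.isoformat() if expires else "not set")
```

Each certificate type becomes its own row under a single metric name, without summing or averaging across types.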

florence-crl added 2 commits June 23, 2025 16:09

Added prometheus-endpoint.md with info about metrics endpoint.

73b0a01

In monitoring-and-alerting.md, moved info in the existing Prometheus endpoint section to the new page. In self-hosted-deployments.json, added link to new page.

fixed link

ac7ade2

florence-crl added 4 commits June 24, 2025 16:12

Merge remote-tracking branch 'origin/main' into DOC-13170

8006302

Merge remote-tracking branch 'origin/main' into DOC-13170

a1c92da

Replace instances of status/vars with Prometheus endpoint.

47ab2a5

florence-crl requested review from dhartunian and kevin-v-ngo June 26, 2025 14:04

kevin-v-ngo approved these changes Jul 8, 2025

src/current/v25.3/prometheus-endpoint.md (outdated, resolved)

src/current/v25.3/prometheus-endpoint.md (outdated, resolved)

florence-crl added 3 commits July 8, 2025 11:42

resolved merge conflict.

7e79949

Incorporated Kevin’s and Docs Reviewer GPT feedback.

1ad2c59

Merge remote-tracking branch 'origin/main' into DOC-13170

ce1c1b0

florence-crl commented Jul 8, 2025

src/current/v25.3/prometheus-endpoint.md (outdated, resolved)

src/current/v25.3/prometheus-endpoint.md (outdated, resolved)

dhartunian approved these changes Jul 8, 2025

src/current/v25.3/prometheus-endpoint.md (resolved)

florence-crl added 2 commits July 8, 2025 14:37

Incorporated Kevin’s and David’s feedback.

1027929

Merge remote-tracking branch 'origin/main' into DOC-13170

e75663b

florence-crl requested a review from mikeCRL July 8, 2025 18:45

mikeCRL requested changes Jul 10, 2025

