feat(external_dns)!: add OpenMetrics V2 support#23327
feat(external_dns)!: add OpenMetrics V2 support#23327cunymatthieu wants to merge 5 commits intoDataDog:masterfrom
Conversation
Add OpenMetricsBaseCheckV2 implementation alongside legacy OpenMetricsBaseCheck. - Support both `openmetrics_endpoint` (OMV2) and `prometheus_url` (OMV1) - Add support for external-dns v1.18+ vector metrics with `record_type` label - Automatic `host` label renaming to `http_host` to avoid reserved tag conflicts - Add `external_dns.openmetrics.health` service check for OMV2 BREAKING CHANGE: Counter metrics now submitted as monotonic_count in OMV2 mode instead of gauge.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: aad8c506b0
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| instance = instances[0] | ||
|
|
||
| if 'openmetrics_endpoint' in instance: | ||
| return ExternalDNS(name, init_config, instances) |
There was a problem hiding this comment.
Dispatch integration mode per instance, not from first instance
__new__ chooses OMV1 vs OMV2 from only instances[0], but each instance is validated independently to allow either prometheus_url or openmetrics_endpoint. In a mixed configuration (for example during migration), this forces all instances down one code path and then the other mode fails (_create_external_dns_instance errors when prometheus_url is absent, or OMV2 rejects instances without openmetrics_endpoint), which can stop the whole check from running.
Useful? React with 👍 / 👎.
Codecov Report❌ Patch coverage is Additional details and impacted files🚀 New features to boost your workflow:
|
|
|
1944462 to
bb0f936
Compare
domalessi
left a comment
There was a problem hiding this comment.
Editorial review of external_dns/README.md. Three must-fix issues (metric type formatting, typo, missing punctuation) and five suggestions. No issues found in metadata.csv or service_checks.json.
|
|
||
| Starting with version 7.0.0, this OpenMetrics-based integration has a latest mode (enabled by setting `openmetrics_endpoint` to point to the target endpoint) and a legacy mode (enabled by setting `prometheus_url` instead). To get all the most up-to-date features, Datadog recommends enabling the latest mode. For more information, see [Latest and Legacy Versioning For OpenMetrics-based Integrations][11]. | ||
|
|
||
| The latest mode of the External DNS check requires Python 3 and submits the `.sum` and `.count` summary samples as monotonic count type. These metrics were previously submitted as `gauge` type in the legacy mode. See the [`metadata.csv` file][5] for a list of metrics available in each mode. |
There was a problem hiding this comment.
The Datadog metric type name is monotonic_count (underscore, code-formatted).
| The latest mode of the External DNS check requires Python 3 and submits the `.sum` and `.count` summary samples as monotonic count type. These metrics were previously submitted as `gauge` type in the legacy mode. See the [`metadata.csv` file][5] for a list of metrics available in each mode. | |
| The latest mode of the External DNS check requires Python 3 and submits the `.sum` and `.count` summary samples as `monotonic_count` type. These metrics were previously submitted as `gauge` type in the legacy mode. See the [`metadata.csv` file][5] for a list of metrics available in each mode. |
| ##### Metric collection | ||
|
|
||
| #### Using with service discovery | ||
| Set [Autodiscovery Integrations Templates][9] as pod annotations on your application container. Alternatively, you can configure templates with a [file, configmap, or key-value store][10]. |
There was a problem hiding this comment.
Typo: "Integrations Templates" should be "Integration Templates" (singular) — consistent with the Docker section above.
| Set [Autodiscovery Integrations Templates][9] as pod annotations on your application container. Alternatively, you can configure templates with a [file, configmap, or key-value store][10]. | |
| Set [Autodiscovery Integration Templates][9] as pod annotations on your application container. Alternatively, you can configure templates with a [file, configmap, or key-value store][10]. |
| - Counter metrics with zero values are not emitted by the latest mode on the first scrape (requires a delta > 0) | ||
| - The `metadata.csv` reflects the latest mode behavior as the reference | ||
| - The `host` label from `http_request_duration_seconds` is automatically renamed to `http_host` to avoid conflicts with Datadog's reserved tags |
There was a problem hiding this comment.
All three items are complete sentences and require end periods per the style guide.
| - Counter metrics with zero values are not emitted by the latest mode on the first scrape (requires a delta > 0) | |
| - The `metadata.csv` reflects the latest mode behavior as the reference | |
| - The `host` label from `http_request_duration_seconds` is automatically renamed to `http_host` to avoid conflicts with Datadog's reserved tags | |
| - Counter metrics with zero values are not emitted by the latest mode on the first scrape (requires a delta > 0). | |
| - The `metadata.csv` reflects the latest mode behavior as the reference. | |
| - The `host` label from `http_request_duration_seconds` is automatically renamed to `http_host` to avoid conflicts with Datadog's reserved tags. |
|
|
||
| ## Setup | ||
|
|
||
| Starting with version 7.0.0, this OpenMetrics-based integration has a latest mode (enabled by setting `openmetrics_endpoint` to point to the target endpoint) and a legacy mode (enabled by setting `prometheus_url` instead). To get all the most up-to-date features, Datadog recommends enabling the latest mode. For more information, see [Latest and Legacy Versioning For OpenMetrics-based Integrations][11]. |
There was a problem hiding this comment.
"To get all the most up-to-date features" is awkward — "all" and "most" are redundant together.
| Starting with version 7.0.0, this OpenMetrics-based integration has a latest mode (enabled by setting `openmetrics_endpoint` to point to the target endpoint) and a legacy mode (enabled by setting `prometheus_url` instead). To get all the most up-to-date features, Datadog recommends enabling the latest mode. For more information, see [Latest and Legacy Versioning For OpenMetrics-based Integrations][11]. | |
| Starting with version 7.0.0, this OpenMetrics-based integration has a latest mode (enabled by setting `openmetrics_endpoint` to point to the target endpoint) and a legacy mode (enabled by setting `prometheus_url` instead). For the latest features, Datadog recommends enabling the latest mode. For more information, see [Latest and Legacy Versioning For OpenMetrics-based Integrations][11]. |
|
|
||
| The latest mode of the External DNS check requires Python 3 and submits the `.sum` and `.count` summary samples as monotonic count type. These metrics were previously submitted as `gauge` type in the legacy mode. See the [`metadata.csv` file][5] for a list of metrics available in each mode. | ||
|
|
||
| For hosts unable to use Python 3, or if you previously implemented this integration mode, see the `legacy` mode [configuration example][12]. For Autodiscovery users relying on the `external_dns.d/auto_conf.yaml` file, this file enables the `prometheus_url` option for the `legacy` mode of the check by default. See the sample [external_dns.d/auto_conf.yaml][13] for the default configuration options and the sample [external_dns.d/conf.yaml.example][3] for all available configuration options. |
There was a problem hiding this comment.
"this integration mode" is ambiguous. Clarify which mode is meant.
| For hosts unable to use Python 3, or if you previously implemented this integration mode, see the `legacy` mode [configuration example][12]. For Autodiscovery users relying on the `external_dns.d/auto_conf.yaml` file, this file enables the `prometheus_url` option for the `legacy` mode of the check by default. See the sample [external_dns.d/auto_conf.yaml][13] for the default configuration options and the sample [external_dns.d/conf.yaml.example][3] for all available configuration options. | |
| For hosts unable to use Python 3, or if you previously enabled the legacy mode, see the `legacy` mode [configuration example][12]. For Autodiscovery users relying on the `external_dns.d/auto_conf.yaml` file, this file enables the `prometheus_url` option for the `legacy` mode of the check by default. See the sample [external_dns.d/auto_conf.yaml][13] for the default configuration options and the sample [external_dns.d/conf.yaml.example][3] for all available configuration options. |
|
|
||
| - The shipped `external_dns.d/auto_conf.yaml` file enables the `prometheus_url` option by default for legacy mode. | ||
| - The `externaldns-pod` tag keeps track of the target DNS pod IP. The other tags are related to the Datadog Agent that is polling the information using the service discovery. | ||
| - For Deployments, add the annotations to the metadata of the template's specifications. Do not add it at the outer specification level. |
There was a problem hiding this comment.
"it" doesn't agree in number with "annotations" (plural) and is ambiguous.
| - For Deployments, add the annotations to the metadata of the template's specifications. Do not add it at the outer specification level. | |
| - For Deployments, add the annotations to the metadata of the template's specifications. Do not add them at the outer specification level. |
| | summary.sum | gauge | monotonic_count | | ||
| | summary.count | gauge | monotonic_count | | ||
|
|
||
| **Notes:** |
There was a problem hiding this comment.
Inconsistent formatting: line 137 uses **Notes**: (colon outside bold), but this line uses **Notes:** (colon inside bold). Per style guide, bold the term, not the colon.
| **Notes:** | |
| **Notes**: |
|
|
||
| ### Prerequisites | ||
|
|
||
| Ensure external-dns is configured to expose Prometheus metrics by setting the `--metrics-address` flag (default: `:7979`). |
There was a problem hiding this comment.
"Ensure" should be avoided per the style guide. Recasting as a direct imperative is cleaner.
| Ensure external-dns is configured to expose Prometheus metrics by setting the `--metrics-address` flag (default: `:7979`). | |
| Configure external-dns to expose Prometheus metrics by setting the `--metrics-address` flag (default: `:7979`). |
|
|
||
| ## Setup | ||
|
|
||
| Starting with version 7.0.0, this OpenMetrics-based integration has a latest mode (enabled by setting `openmetrics_endpoint` to point to the target endpoint) and a legacy mode (enabled by setting `prometheus_url` instead). To get all the most up-to-date features, Datadog recommends enabling the latest mode. For more information, see [Latest and Legacy Versioning For OpenMetrics-based Integrations][11]. |
There was a problem hiding this comment.
| Starting with version 7.0.0, this OpenMetrics-based integration has a latest mode (enabled by setting `openmetrics_endpoint` to point to the target endpoint) and a legacy mode (enabled by setting `prometheus_url` instead). To get all the most up-to-date features, Datadog recommends enabling the latest mode. For more information, see [Latest and Legacy Versioning For OpenMetrics-based Integrations][11]. | |
| Starting with version 7.0.0, this OpenMetrics-based integration has a latest mode (enabled by setting `openmetrics_endpoint` to the target endpoint) and a legacy mode (enabled by setting `prometheus_url`). To get all the most up-to-date features, Datadog recommends enabling the latest mode. For more information, see [Latest and Legacy Versioning For OpenMetrics-based Integrations][11]. |
| ### Installation | ||
|
|
||
| The external DNS check is included in the [Datadog Agent][2] package, so you don't need to install anything else on your servers. | ||
| The External DNS check is included in the [Datadog Agent][2] package, so you don't need to install anything else on your servers. |
There was a problem hiding this comment.
| The External DNS check is included in the [Datadog Agent][2] package, so you don't need to install anything else on your servers. | |
| The External DNS check is included in the [Datadog Agent][2] package. You do not need to install anything else on your servers. |
domalessi
left a comment
There was a problem hiding this comment.
Follow-up: move Python 3 requirement to Prerequisites and remove temporal language from Setup intro.
|
|
||
| Starting with version 7.0.0, this OpenMetrics-based integration has a latest mode (enabled by setting `openmetrics_endpoint` to point to the target endpoint) and a legacy mode (enabled by setting `prometheus_url` instead). To get all the most up-to-date features, Datadog recommends enabling the latest mode. For more information, see [Latest and Legacy Versioning For OpenMetrics-based Integrations][11]. | ||
|
|
||
| The latest mode of the External DNS check requires Python 3 and submits the `.sum` and `.count` summary samples as monotonic count type. These metrics were previously submitted as `gauge` type in the legacy mode. See the [`metadata.csv` file][5] for a list of metrics available in each mode. |
There was a problem hiding this comment.
Remove the Python 3 requirement (moving it to Prerequisites) and reframe the metric type difference as present-tense mode behavior rather than a before/after description.
| The latest mode of the External DNS check requires Python 3 and submits the `.sum` and `.count` summary samples as monotonic count type. These metrics were previously submitted as `gauge` type in the legacy mode. See the [`metadata.csv` file][5] for a list of metrics available in each mode. | |
| The latest mode of the External DNS check submits the `.sum` and `.count` summary samples as `monotonic_count` type. In legacy mode, these are submitted as `gauge` type. See the [`metadata.csv` file][5] for a list of metrics available in each mode. |
| Ensure external-dns is configured to expose Prometheus metrics by setting the `--metrics-address` flag (default: `:7979`). | ||
|
|
There was a problem hiding this comment.
Add the Python 3 requirement here so users see it before configuring.
| Ensure external-dns is configured to expose Prometheus metrics by setting the `--metrics-address` flag (default: `:7979`). | |
| Configure external-dns to expose Prometheus metrics by setting the `--metrics-address` flag (default: `:7979`). | |
| The latest mode requires Python 3. For hosts unable to use Python 3, use the legacy mode. |
|
|
||
| #### External-dns version differences | ||
|
|
||
| The integration supports both old and new external-dns metric formats: |
There was a problem hiding this comment.
| The integration supports both old and new external-dns metric formats: | |
| The integration supports the following external-dns metric formats: |
domalessi
left a comment
There was a problem hiding this comment.
Structural suggestion for the Setup section intro.
| Starting with version 7.0.0, this OpenMetrics-based integration has a latest mode (enabled by setting `openmetrics_endpoint` to point to the target endpoint) and a legacy mode (enabled by setting `prometheus_url` instead). To get all the most up-to-date features, Datadog recommends enabling the latest mode. For more information, see [Latest and Legacy Versioning For OpenMetrics-based Integrations][11]. | ||
|
|
||
| The latest mode of the External DNS check requires Python 3 and submits the `.sum` and `.count` summary samples as monotonic count type. These metrics were previously submitted as `gauge` type in the legacy mode. See the [`metadata.csv` file][5] for a list of metrics available in each mode. | ||
|
|
||
| For hosts unable to use Python 3, or if you previously implemented this integration mode, see the `legacy` mode [configuration example][12]. For Autodiscovery users relying on the `external_dns.d/auto_conf.yaml` file, this file enables the `prometheus_url` option for the `legacy` mode of the check by default. See the sample [external_dns.d/auto_conf.yaml][13] for the default configuration options and the sample [external_dns.d/conf.yaml.example][3] for all available configuration options. |
There was a problem hiding this comment.
The Setup intro is doing a lot of work before the user reaches any subsection heading. Consider tightening the structure:
- Setup intro: one short paragraph covering the two modes and the Datadog recommendation. Link to the versioning guide. Nothing else.
- Prerequisites: Python 3 requirement (latest mode) + the
--metrics-addressflag note. - Configuration: move the legacy mode config pointer here ("if Python 3 is unavailable, see the legacy configuration example") as a note at the top of the section, where it's directly relevant to which config example to follow.
The third paragraph (line 17) chains four separate ideas together and is the hardest to parse. That content belongs in Configuration, not in the intro.
There was a problem hiding this comment.
Thank you for proofreading this.
- Add minimal E2E test skeleton for integration testing - Regenerate config models with latest ddev templates: - Add enable_legacy_tags_normalization option - Add SECURE_FIELD_NAMES for security validation - Add security field validation in model validator - Restructure README: - Simplify Setup section - Reorganize Prerequisites with Python 3 requirement first
bb0f936 to
30ff1a5
Compare
|
I have created the below feature request on customers behalf for this: Support both openmetrics_endpoint (OMV2) and prometheus_url (OMV1) PLease do let me know if anything else is required from the support side. |
| | summary.sum | gauge | monotonic_count | | ||
| | summary.count | gauge | monotonic_count | | ||
|
|
||
| **Notes:**: |
There was a problem hiding this comment.
Same **Notes**: formatting issue from the previous review — colon should be outside the bold, not duplicated. Looks like the fix was applied to the first occurrence (line 109) but a new instance was introduced here.
| **Notes:**: | |
| **Notes**: |
| **Notes**: | ||
|
|
||
| - The shipped `external_dns.d/auto_conf.yaml` file enables the `prometheus_url` option by default for legacy mode. | ||
| - The `externaldns-pod` tag keeps track of the target DNS pod IP. The other tags are related to the Datadog Agent that is polling the information using the service discovery. |
There was a problem hiding this comment.
Datadog's feature is Autodiscovery (one word, capital A) — the same paragraph above already uses "Autodiscovery Integration Templates" correctly. "the service discovery" is inconsistent and reads as awkward English.
| - The `externaldns-pod` tag keeps track of the target DNS pod IP. The other tags are related to the Datadog Agent that is polling the information using the service discovery. | |
| - The `externaldns-pod` tag keeps track of the target DNS pod IP. The other tags are related to the Datadog Agent that is polling the information using Autodiscovery. |
| ## Overview | ||
|
|
||
| Get metrics from the external DNS service in real time to visualize and monitor DNS metrics collected with the Kubernetes external DNS Prometheus add on. | ||
| Get real-time metrics from External DNS to visualize and monitor DNS synchronization. Track source and registry records, controller status, and HTTP request latencies. |
There was a problem hiding this comment.
| Get real-time metrics from External DNS to visualize and monitor DNS synchronization. Track source and registry records, controller status, and HTTP request latencies. | |
| Get real-time metrics from External DNS to visualize and monitor DNS synchronization, including source and registry records, controller status, and HTTP request latencies. |
|
|
||
| - The shipped `external_dns.d/auto_conf.yaml` file enables the `prometheus_url` option by default for legacy mode. | ||
| - The `externaldns-pod` tag keeps track of the target DNS pod IP. The other tags are related to the Datadog Agent that is polling the information using the service discovery. | ||
| - For Deployments, add the annotations to the metadata of the template's specifications. Do not add them at the outer specification level. |
There was a problem hiding this comment.
| - For Deployments, add the annotations to the metadata of the template's specifications. Do not add them at the outer specification level. | |
| - For Deployments, add the annotations to the metadata of the template's specification. Do not add them at the outer specification level. |
| external_dns.source.errors.total,gauge,,error,,Number of source errors,-1,external_dns,source errors, | ||
| metric_name,metric_type,interval,unit_name,per_unit_name,description,orientation,integration,short_name,curated_metric,sample_tags | ||
| external_dns.controller.last_sync,gauge,,second,,Timestamp of last successful sync with the DNS provider,0,external_dns,controller last sync timestamp,, | ||
| external_dns.controller.verified_a_records,gauge,,record,,Number of DNS A-records that exist both in source and registry (before v1.18),0,external_dns,verified a records,, |
There was a problem hiding this comment.
Hyphenation is inconsistent with the rest of the file (lines 9, 10, 14, 15 all use unhyphenated "A records" / "AAAA records").
| external_dns.controller.verified_a_records,gauge,,record,,Number of DNS A-records that exist both in source and registry (before v1.18),0,external_dns,verified a records,, | |
| external_dns.controller.verified_a_records,gauge,,record,,Number of DNS A records that exist both in source and registry (before v1.18),0,external_dns,verified a records,, |
| metric_name,metric_type,interval,unit_name,per_unit_name,description,orientation,integration,short_name,curated_metric,sample_tags | ||
| external_dns.controller.last_sync,gauge,,second,,Timestamp of last successful sync with the DNS provider,0,external_dns,controller last sync timestamp,, | ||
| external_dns.controller.verified_a_records,gauge,,record,,Number of DNS A-records that exist both in source and registry (before v1.18),0,external_dns,verified a records,, | ||
| external_dns.controller.verified_aaaa_records,gauge,,record,,Number of DNS AAAA-records that exist both in source and registry (before v1.18),0,external_dns,verified aaaa records,, |
There was a problem hiding this comment.
| external_dns.controller.verified_aaaa_records,gauge,,record,,Number of DNS AAAA-records that exist both in source and registry (before v1.18),0,external_dns,verified aaaa records,, | |
| external_dns.controller.verified_aaaa_records,gauge,,record,,Number of DNS AAAA records that exist both in source and registry (before v1.18),0,external_dns,verified aaaa records,, |
|
Thanks for this second review @domalessi. I'm ready for one last review |
There was a problem hiding this comment.
Editorial review of docs-team-owned files (README.md, metadata.csv, service_checks.json). All suggestions below are stylistic polish — nothing blocking. README structure matches the integration template.
Note: GitHub's review-comments API was rejecting all inline-position posts on this PR at submission time, so the suggestions are consolidated here.
external_dns/README.md
L13 — sentence case ("for") matches the destination page title:
```suggestion
This OpenMetrics-based integration has two modes: latest mode (enabled by setting `openmetrics_endpoint`) and legacy mode (enabled by setting `prometheus_url`). Datadog recommends using the latest mode for all new deployments. For more information, see [Latest and Legacy Versioning for OpenMetrics-based Integrations][10].
```
L27 — the auto_conf.yaml info here duplicates the bullet at L111. Suggest dropping it and the **Note** wrapper:
```suggestion
See the sample [external_dns.d/conf.yaml.example][3] for all available configuration options. For hosts unable to use Python 3, see the legacy mode [configuration example][11].
```
L131 — table header should be sentence case:
```suggestion
| Prometheus type | Legacy mode (`prometheus_url`) | Latest mode (`openmetrics_endpoint`) |
```
external_dns/metadata.csv
L6 — "Total count of" is redundant:
```suggestion
external_dns.http.request.duration_seconds.count,count,,request,,Number of HTTP requests (v1.18+),0,external_dns,http request count,,
```
L12 — this metric changed from gauge to count in this PR. The "Number of..." phrasing reads like a snapshot — suggest reflecting the cumulative semantics:
```suggestion
external_dns.registry.errors.total,count,,error,,Total number of registry errors,-1,external_dns,registry errors,,
```
L17 — same as registry.errors.total (gauge → count):
```suggestion
external_dns.source.errors.total,count,,error,,Total number of source errors,-1,external_dns,source errors,,
```
external_dns/assets/service_checks.json
No issues with the new entry. Note that the existing legacy entry uses lowercase "prometheus health" while the new entry uses "OpenMetrics endpoint health" — inconsistent capitalization, but the legacy entry is unchanged in this PR so out of scope.
domalessi
left a comment
There was a problem hiding this comment.
Approving but see my earlier comment for my edits! GH isn't letting me post them inline.
sarah-witt
left a comment
There was a problem hiding this comment.
Thanks for the PR! I left a few comments
| with pytest.raises(Exception): | ||
| dd_agent_check(instance, rate=True) | ||
| endpoint_tag = "endpoint:" + instance.get('prometheus_url') | ||
| tags = instance.get('tags').append(endpoint_tag) |
There was a problem hiding this comment.
Can we instead define the expected tags statically?
| aggregator.assert_service_check("external_dns.prometheus.health", AgentCheck.CRITICAL, count=2, tags=tags) | ||
| def test_empty_input(self): | ||
| """construct_metrics_config handles empty input.""" | ||
| assert construct_metrics_config({}) == [] |
There was a problem hiding this comment.
Could you add a test asserting metadata ie use aggregator.assert_metrics_using_metadata and assert_all_metrics_covered? This will confirm the metadata.csv is accurate.
| @@ -0,0 +1,120 @@ | |||
| # HELP external_dns_build_info A metric with a constant '1' value labeled with 'version' and 'revision' of external_dns and the 'go_version', 'os' and the 'arch' used the build. | |||
There was a problem hiding this comment.
Could you update some of the counter metric (ie external_dns_registry_errors_total and external_dns_source_errors_total) values here and in the other fixture so we can test that they are emitted with the new implementation? Right now they are 0.
|
FR has been created: |
Add OpenMetricsBaseCheckV2 implementation alongside legacy OpenMetricsBaseCheck.
openmetrics_endpoint(OMV2) andprometheus_url(OMV1)record_typelabelhostlabel renaming tohttp_hostto avoid reserved tag conflictsexternal_dns.openmetrics.healthservice check for OMV2BREAKING CHANGE: Counter metrics now submitted as monotonic_count in OMV2 mode instead of gauge.
What does this PR do?
Motivation
Review checklist (to be filled by reviewers)
qa/skip-qalabel if the PR doesn't need to be tested during QA.backport/<branch-name>label to the PR and it will automatically open a backport PR once this one is merged