[datadog] Add datadog.discovery.serviceMap.enabled #2641
Wires up the free Discovery Service Map mode introduced in agent 7.78. When enabled, system-probe boots a restricted USM monitor that produces HTTP/HTTPS topology only: no paid USM RED metrics, no billing impact. Designed for non-APM customers to see a service-to-service dependency map without enabling paid USM.

- New value: datadog.discovery.serviceMap.enabled (default false)
- Renders discovery.service_map.enabled in system-probe.yaml
- Treats serviceMap.enabled as a system-probe-feature trigger so the daemon is deployed when service map is on standalone
- should-render-discovery-config returns true when service map is on, so the discovery block is emitted even if discovery.enabled itself is left to its default

Coexistence with paid USM is handled agent-side: when both datadog.serviceMonitoring.enabled and datadog.discovery.serviceMap.enabled are true, the agent silently disables discovery (USM wins on billing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
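As a rough illustration of the wiring described above, the first document below is a user-side values override and the second is the fragment it should produce in system-probe.yaml. The file name in the comment is hypothetical, and the rendered fragment is a sketch that omits every other key the chart emits in the discovery block.

```yaml
# values-service-map.yaml (hypothetical file name)
# Turns on only the new value; datadog.discovery.enabled is left at its default.
datadog:
  discovery:
    serviceMap:
      enabled: true
---
# Sketch of the resulting system-probe.yaml fragment (other keys omitted).
discovery:
  service_map:
    enabled: true
```

The same override can presumably be passed on the command line as `--set datadog.discovery.serviceMap.enabled=true`.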
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8d7f7ec685
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| */}} | ||
| {{- define "system-probe-feature" -}} | ||
| {{- if or .Values.datadog.securityAgent.runtime.enabled .Values.datadog.networkMonitoring.enabled .Values.datadog.systemProbe.enableTCPQueueLength .Values.datadog.systemProbe.enableOOMKill .Values.datadog.serviceMonitoring.enabled .Values.datadog.traceroute.enabled (eq (include "resolved-discovery-enabled" .) "true") (and .Values.datadog.gpuMonitoring.enabled .Values.datadog.gpuMonitoring.privilegedMode) .Values.datadog.dynamicInstrumentationGo.enabled (and .Values.datadog.securityAgent.compliance.enabled .Values.datadog.securityAgent.compliance.runInSystemProbe) (eq (include "should-enable-sbom-enrichment-usage" .) "true") -}} | ||
| {{- if or .Values.datadog.securityAgent.runtime.enabled .Values.datadog.networkMonitoring.enabled .Values.datadog.systemProbe.enableTCPQueueLength .Values.datadog.systemProbe.enableOOMKill .Values.datadog.serviceMonitoring.enabled .Values.datadog.traceroute.enabled (eq (include "resolved-discovery-enabled" .) "true") .Values.datadog.discovery.serviceMap.enabled (and .Values.datadog.gpuMonitoring.enabled .Values.datadog.gpuMonitoring.privilegedMode) .Values.datadog.dynamicInstrumentationGo.enabled (and .Values.datadog.securityAgent.compliance.enabled .Values.datadog.securityAgent.compliance.runInSystemProbe) (eq (include "should-enable-sbom-enrichment-usage" .) "true") -}} |
Gate service map on legacy Autopilot
When datadog.discovery.serviceMap.enabled=true is set on legacy GKE Autopilot (providers.gke.autopilot=true with WorkloadAllowlist unavailable/HELM_FORCE_RENDER=false), this new term makes should-enable-system-probe render the system-probe container even though that Autopilot mode only permits the core agent. The existing discovery trigger goes through resolved-discovery-enabled, which suppresses system-probe for that path; service map needs the same Autopilot guard or installs on those clusters will be rejected.
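The combination this comment warns about can be sketched as the values below. This is only an illustration of the scenario, assuming a legacy GKE Autopilot cluster where the WorkloadAllowlist is unavailable; it is not a recommended configuration.

```yaml
# Scenario from the review comment: legacy GKE Autopilot plus service map.
providers:
  gke:
    autopilot: true   # legacy Autopilot mode, which only permits the core agent
datadog:
  discovery:
    serviceMap:
      enabled: true   # new trigger: would render the system-probe container
```

According to the comment, the existing discovery trigger avoids this because it goes through resolved-discovery-enabled, which suppresses system-probe on that path.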
| # datadog.discovery.serviceMap.enabled -- (bool) Enable the free Discovery Service Map mode. When true, system-probe boots a restricted USM monitor (HTTP/HTTPS topology only, not billed) so non-APM customers can see a service-to-service dependency map. Mutually exclusive with paid USM (`datadog.serviceMonitoring.enabled`); when both are set, paid USM wins and discovery is silently disabled. Linux only. Requires Agent >= 7.78.0. | ||
| serviceMap: | ||
| enabled: false |
Add chart version and changelog for this value
This adds a new Datadog chart value and changes rendered templates, but the commit leaves charts/datadog/Chart.yaml at 3.208.2 and does not update charts/datadog/CHANGELOG.md. The datadog chart review guide requires a chart version bump and changelog entry for template/value behavior changes, otherwise the packaged chart release will not advertise the new behavior to users upgrading the chart.
Minor bump for the new datadog.discovery.serviceMap.enabled value added in 8d7f7ec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolve chart-version-validation CI error: the badge at the top of charts/datadog/README.md must match the version in Chart.yaml. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Shorten CHANGELOG, README, and values.yaml descriptions for datadog.discovery.serviceMap.enabled to match the terse phrasing established by PR #1645. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The system-probe configmap now emits discovery.service_map.enabled, so every baseline that renders system-probe needs the new line. Regenerated via `make update-test-baselines-datadog-agent`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s wiring
Ship datadog.discovery.serviceMap.enabled as nil and only render the
service_map sub-block when explicitly set, matching the chart's
discovery.enabled convention so an explicit false is preserved rather
than silently elided.
Wire serviceMap into should-render-discovery-config and
system-probe-feature so setting it alone deploys system-probe and
renders the discovery block. Do not route through
resolved-discovery-enabled: the agent processes service_map.enabled
independently of discovery.enabled, so the chart should not flip
discovery.enabled on the user's behalf.
Extend the GKE Autopilot WorkloadAllowlist fail message in NOTES.txt
to list datadog.discovery.serviceMap.enabled, and shorten the parameter
description ("paid USM" -> "restricted USM").
Includes two merges from main during development; net diff against
origin/main is the serviceMap feature only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed from 2c8d7bf to c72fc84.
…rvice-map # Conflicts: # charts/datadog/CHANGELOG.md # charts/datadog/Chart.yaml # charts/datadog/README.md
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ddee031bd0
| service_map: | ||
| enabled: {{ $.Values.datadog.discovery.serviceMap.enabled }} |
Ensure service map doesn't render under disabled discovery
When datadog.discovery.serviceMap.enabled=true is combined with a configuration where discovery resolves to false, such as datadog.discovery.enabled=false or an Agent pinned below 7.78, this renders service_map.enabled: true inside a discovery block whose parent enabled field remains false. The chart also starts system-probe for the new value, so the install appears to accept Service Map while the system-probe config disables the discovery subsystem that would consume it; gate the value on supported/resolved discovery or have it explicitly drive discovery.enabled in the supported case.
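To make the concern concrete, the first document below shows one set of values that triggers it, and the second shows the inconsistent system-probe.yaml fragment the comment describes. Both are sketches; the rendered fragment omits the other keys of the discovery block.

```yaml
# Values: discovery explicitly off, service map on.
datadog:
  discovery:
    enabled: false
    serviceMap:
      enabled: true
---
# Sketch of the rendered fragment the comment describes (other keys omitted):
# the child flag is true while the parent discovery flag stays false.
discovery:
  enabled: false
  service_map:
    enabled: true
```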
| | datadog.disablePasswdMount | bool | `false` | Set this to true to disable mounting /etc/passwd in all containers | | ||
| | datadog.discovery.enabled | bool | `nil` | Enable Service Discovery. If omitted, the chart auto-enables it when the effective node Agent version resolved by the chart is >= 7.78.0, except on GKE Autopilot clusters where system-probe is not supported. If that resolution still yields a non-semver-ish tag, discovery treats it as latest. Explicit true/false always takes precedence. On supported Agent versions, the chart also enables `discovery.use_system_probe_lite` so discovery-only deployments can exec into `system-probe-lite`. | | ||
| | datadog.discovery.networkStats.enabled | bool | `true` | Enable Service Discovery Network Stats | | ||
| | datadog.discovery.serviceMap.enabled | bool | `nil` | Enable Discovery Service Map (HTTP/HTTPS topology only; restricted USM) | |
No need to write HTTP/HTTPS; in the future we might add more protocols.
…rvice-map # Conflicts: # charts/datadog/CHANGELOG.md # charts/datadog/Chart.yaml # charts/datadog/README.md
…escription
The chart-side description does not need to enumerate the agent-side protocol scope of Discovery Service Map. Keep only the "restricted USM" qualifier, which captures the licensing constraint that affects whether operators can turn it on.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| # datadog.discovery.serviceMap.enabled -- (bool) Enable Discovery Service Map (restricted USM) | ||
| serviceMap: | ||
| enabled: # false |
Why is this empty and not false?
Followed the same pattern as datadog.discovery.enabled right above it
Unless there is a good intentional reason, I don't see why a bool variable should be anything but false or true.
Force-pushed from 044427e to 703c412.
Make the documented chart default explicit instead of nil so the rendered system-probe configmap shows the value without relying on the nil-gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the now-rendered `service_map: enabled: false` block to the system-probe configmap snapshot in 42 baseline manifests. Generated via `make update-test-baselines-datadog-agent`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
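For orientation, the added baseline lines would look roughly like the fragment below once the default is an explicit false. This is a sketch assuming chart defaults; the network_stats value comes from the documented default of datadog.discovery.networkStats.enabled, and the other keys of the discovery block are omitted.

```yaml
# Sketch of the discovery block in a regenerated system-probe configmap baseline.
discovery:
  network_stats:
    enabled: true    # documented default of datadog.discovery.networkStats.enabled
  service_map:
    enabled: false   # now rendered explicitly instead of being skipped when nil
```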
…rvice-map # Conflicts: # charts/datadog/CHANGELOG.md
| use_system_probe_lite: {{ include "discovery-use-system-probe-lite" . }} | ||
| network_stats: | ||
| enabled: {{ $.Values.datadog.discovery.networkStats.enabled }} | ||
| {{- if not (eq $.Values.datadog.discovery.serviceMap.enabled nil) }} |
I think this can be removed now?
…rvice-map # Conflicts: # charts/datadog/CHANGELOG.md # charts/datadog/Chart.yaml # charts/datadog/README.md
Bump to 3.215.0 (minor on top of main's 3.214.1) for the datadog.discovery.serviceMap.enabled feature. Remove the always-true nil-guard around service_map in system-probe-configmap.yaml since serviceMap.enabled has a concrete default in values.yaml. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed from 3ed5b12 to 6ef1577.
Wires up the free Discovery Service Map mode. When enabled, system-probe boots a restricted USM monitor that produces HTTP/HTTPS topology only: no paid USM RED metrics, no billing impact. Designed for non-APM customers to see a service-to-service dependency map without enabling paid USM.
Coexistence with paid USM is handled agent-side: when both datadog.serviceMonitoring.enabled and datadog.discovery.serviceMap.enabled are true, the agent silently disables discovery (USM wins on billing).
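The coexistence case can be sketched with the values below. The chart renders both settings as given; the precedence noted in the comments is the agent-side behavior stated in this description, not something the chart enforces.

```yaml
# Both paid USM and the free service map mode requested at once.
datadog:
  serviceMonitoring:
    enabled: true    # paid USM
  discovery:
    serviceMap:
      enabled: true  # per the description, the agent silently disables this side
```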