Agent
Prelude
Released on: 2026-06-11
- Please refer to the 7.80.0 tag on integrations-core for the list of changes on the Core Checks
Upgrade Notes
-
Health Platform: the ReportIssue method now takes a single IssueReport argument instead of (checkID, checkName string, report *IssueReport). The IssueReport struct carries three new fields, IssueID (unique instance id), IssueType (template id), and Source (reporting integration name), which replace the separate checkID and checkName arguments.
The health platform persistence file format has been bumped to version 2. Existing persistence files (<run_path>/health-platform/issues.json) written by a previous agent version will be detected, logged as incompatible, and discarded on startup; the agent starts with a fresh issue state. No data migration is performed.
For integrations calling ReportIssue: construct an IssueReport with IssueID set to a unique instance key (e.g. "check-execution-failure:<check-id>"), IssueType set to the template identifier that was previously passed as the IssueId field of the proto IssueReport, and Source set to the integration name. To resolve an issue, call ResolveIssue(issueID) instead of passing nil to ReportIssue.
-
Health Platform: the health_platform.issues_detected telemetry counter is now tagged with issue_type instead of health_check_id. Update any dashboards, monitors, or telemetry configuration that filtered or grouped by the health_check_id tag to use issue_type instead.
-
APM: On Linux, the trace agent process now only starts once data is sent to any of its configured listeners.
Previously, the trace agent started immediately on agent startup; it now starts lazily when needed, which reduces resource usage. To disable this and restore the previous behavior, set apm_config.socket_activation.enabled: false in datadog.yaml, or set the environment variable DD_APM_SOCKET_ACTIVATION_ENABLED=false.
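The persistence-file handling described in the Health Platform note above (detect an older file, log it as incompatible, discard it, start fresh) can be pictured with a short sketch. This is a minimal Python illustration under stated assumptions, not the Agent's implementation: the JSON keys version and issues and the function name load_issue_state are hypothetical; only the version number 2 and the no-migration behavior come from the note.

```python
import json
import logging

# From the note above: the persistence format is now version 2.
SUPPORTED_VERSION = 2


def load_issue_state(path):
    """Return persisted issues, or an empty state when the file is missing,
    unreadable, or written in an incompatible format version.

    The "version" and "issues" keys are hypothetical; the real file layout
    is not described in the release note.
    """
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except FileNotFoundError:
        return {}
    except (OSError, json.JSONDecodeError) as exc:
        logging.warning("cannot read %s (%s); starting fresh", path, exc)
        return {}

    if not isinstance(data, dict) or data.get("version") != SUPPORTED_VERSION:
        # No migration is attempted: the old contents are simply ignored.
        logging.warning("%s is incompatible; starting with fresh issue state", path)
        return {}

    issues = data.get("issues", {})
    return issues if isinstance(issues, dict) else {}
```

The practical consequence is the one the note states: issues recorded by a previous agent version do not carry over an upgrade.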
New Features
-
The Windows MSI installer now ships the AI usage Chrome native messaging host (ai-prompt-logger-native-host.exe) under bin\agent. The installer generates a Chrome Native Messaging Host manifest under bin\agent\dist and registers it machine-wide under HKLM\SOFTWARE\Google\Chrome\NativeMessagingHosts (including the WOW6432Node view for 32-bit Chrome). The host's runtime configuration is generated as C:\ProgramData\Datadog\ai_usage_native_host.yaml and uses the Agent's configured APM receiver port.
-
Adds a new discovery.service_map.enabled system-probe configuration option that boots the universal service monitoring (USM) eBPF monitor in a restricted mode, capturing only the data needed to render a service dependency map (HTTP and HTTPS via TLS uprobes). Hosts running in this mode are not billed as USM customers, are not surfaced in USM dashboards, and do not produce universal.http.* metrics. Intended for non-APM customers as a free preview of application observability.
-
Add k8sobjectsreceiver to the DDOT (Datadog Distribution of OpenTelemetry Collector) default manifest, enabling collection of Kubernetes object events and resource states via the OpenTelemetry Collector pipeline.
-
Adds a new action get-resource in kubeactions.
-
The fleet installer's agent-package OCI index now contains a FIPS-flavored sibling manifest for each platform, distinguished by the OCI Platform.Variant field. When DD_FIPS_MODE=true is set, the installer downloads the FIPS manifest; otherwise it downloads the base manifest. The package URL is unchanged in both cases.
-
Add a new nccl core check that collects per-rank NCCL collective communication metrics from GPU training and inference workloads.
The check listens on a Unix domain socket (default /var/run/datadog/nccl.socket) for JSON events emitted by the NCCL profiler plugin (libnccl-profiler-dd.so) running inside GPU pods. Each event is tagged with rank, collective, n_ranks, kube_pod_name, kube_namespace, and kube_container_name.
Metrics emitted:
  - nccl.collective.exec_time_us: time a rank spends inside a collective operation. A rank with a significantly lower value than its peers is the straggler; ranks with higher values are waiting at the barrier.
  - nccl.collective.algo_bandwidth_gbps: algorithm bandwidth of the collective.
  - nccl.collective.bus_bandwidth_gbps: bus bandwidth normalised for the collective type.
  - nccl.collective.msg_size_bytes: tensor size being communicated.
  - nccl.rank.seconds_since_last_event: seconds since this rank last reported an event; non-zero values indicate a potential hang.
Enable the check cluster-wide by setting gpu.nccl.enabled: true in the Agent configuration (or DD_GPU_NCCL_ENABLED=true). The socket path can be overridden via gpu.nccl.socket_path; the host directory mounted into training pods can be overridden via gpu.nccl.host_socket_path.
-
Add support for the datadog.metric.as_type datapoint attribute on OTLP delta sum metrics. When this attribute is set to "rate", the metric is sent to Datadog as a Rate (value divided by interval) instead of a Count. Accepted values are "rate", "count", and "gauge"; unknown values are logged and ignored. This allows users migrating from DogStatsD to OpenTelemetry to preserve rate-type metric behavior.
-
Add multi_secret_backends in datadog.yaml so you can declare extra named secret backends (each with type and config). When no secret_backend_type is set, select the backend per handle using ENC[backendID;secretKey] (backendID matches a name under multi_secret_backends). Precedence is secret_backend_command (if set) over secret_backend_type over multi_secret_backends: a custom command wins over native type; when native secret_backend_type is set (and no custom command), every ENC[...] inner string is resolved only through that type and multi_secret_backends is not used for routing.
-
Add admission_controller.auto_instrumentation.container_registry_allow_list configuration option (env var DD_ADMISSION_CONTROLLER_AUTO_INSTRUMENTATION_CONTAINER_REGISTRY_ALLOW_LIST) to restrict which container registries can be used as sources for APM library injection via Single Step Instrumentation. When set to a non-empty comma-separated list, the admission controller will skip injection for any pod whose injector image registry is not in the list, and will set the internal.apm.datadoghq.com/injection-error annotation with the reason. An empty list (the default) allows injection from any registry.
-
Windows: windows_certificate check adds certificate_store_regex, a list of Go regular expressions matched against HKLM certificate store names. Patterns are matched case-insensitively. certificate_store and certificate_store_regex can be used together; at least one must be set.
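The precedence rules for multi_secret_backends described above can be sketched as a small routing function. This is an illustrative Python sketch, not the Agent's resolver: the function name route_handle, its parameter names, and the returned tuple shape are hypothetical, and the handling of an unknown backendID (left unresolved) is an assumption, since the note does not describe that case.

```python
import re

# Matches a whole ENC[...] handle and captures the inner string.
_ENC_HANDLE = re.compile(r"^ENC\[(.*)\]$", re.DOTALL)


def route_handle(handle, command=None, backend_type=None, named_backends=None):
    """Decide which backend resolves a secret handle.

    Returns a (kind, backend, key) tuple, or None when the value is not an
    ENC[...] handle or no backend can be selected.
    """
    match = _ENC_HANDLE.match(handle)
    if match is None:
        return None
    inner = match.group(1)

    # 1. secret_backend_command wins over everything else.
    if command:
        return ("command", command, inner)
    # 2. A native secret_backend_type resolves the whole inner string;
    #    multi_secret_backends is not consulted for routing.
    if backend_type:
        return ("type", backend_type, inner)
    # 3. Otherwise route on the "backendID;secretKey" form.
    backend_id, sep, key = inner.partition(";")
    if sep and named_backends and backend_id in named_backends:
        return ("named", backend_id, key)
    # Assumption: an unknown or missing backendID is left unresolved here.
    return None
```

Note the second branch: once a native type is configured, a handle such as ENC[vault;db/password] is passed to that type with the inner string vault;db/password intact, rather than being split on the semicolon.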
Enhancement Notes
-
Process kubernetes actions asynchronously to avoid blocking the main thread.
-
Add an example OpenMetrics check configuration for Agent Data Plane deployments to restore datadog.agent.dogstatsd.* and datadog.agent.forwarder.transactions.* metrics.
-
Pre-register datadog-apm-library-iis, datadog-apm-library-iis-rum, and datadog-apm-library-httpd in the fleet installer. The packages are gated behind remote updates so they can be rolled out via remote configuration without a new installer release.
-
Chunk remote workloadmeta messages in the Agent to avoid exceeding the gRPC max message size.
-
APM: The Trace Agent agent status output now shows the UDS (Unix Domain Socket) receiver path when UDS is enabled, in addition to the existing TCP receiver address. Each per-client entry in the receiver stats section also displays the connection type (tcp, uds, or pipe), making it easier to distinguish traffic arriving via different transports.
-
Autodiscovery template resolution failures are now logged at ERROR level instead of DEBUG, making them visible without enabling debug logging. Additionally, when the health platform is enabled, these failures are reported as AD misconfiguration health events with actionable remediation steps, providing proactive visibility when an autodiscovered check config is silently skipped due to unsupported template variables.
-
When infrastructure_mode is basic, the Agent's default allowlist now includes the Directory, WMI Check, Windows Certificate, Windows Performance Counters, and Windows Registry integrations so they can run without extra integration.additional configuration on Windows-oriented deployments.
-
Agents are now built with Go 1.25.10.
-
On Windows, network connections collected by Cloud Network Monitoring are now tagged with interface_name and interface_type.
-
The Agent now streams Kubernetes metadata from the Cluster Agent by default, instead of polling for it periodically. This propagates tags derived from Kubernetes metadata (like kube_service) with less delay. This behavior is controlled by the kubernetes_metadata_streaming setting.
-
agent diagnose now renders the check name as a prefix for all checks under the check-datadog suite. The JSON output gains a check_name field for the same purpose.
-
The --include and --exclude flags of agent diagnose now match against the suite name, the owning check name, and the diagnosis category. For example, agent diagnose --include postgres now filters individual diagnoses across all suites instead of only matching suite names.
-
DogStatsD timing metrics (
ttype) now include an explicit unit value (millisecond) in the metric payload sent to Datadog, allowing the Datadog UI to display the correct unit automatically. -
Dynamic Instrumentation now supports compound conditions using &&, ||, and !.
-
When infrastructure_mode is set to none, ECS task metadata collection is now disabled by default (see ecs_task_collection_enabled). Set DD_ECS_TASK_COLLECTION_ENABLED to true to override.
-
When infrastructure_mode is set to end_user_device, the Agent now attaches additional host tags to identify the device: infrastructure_mode:end_user_device, os_name, os_version, cpu_model, total_memory_gb, and device_model. Hardware and OS tags are collected on macOS and Windows only.
-
Add new ad_tag_completeness_max_wait configuration option. When set, autodiscovery waits up to that many seconds for an entity's tags to be complete before scheduling checks for it. This avoids checks running briefly with incomplete tags. It's disabled by default.
-
Add logs_config.use_container_timestamp to optionally use the time field from container log files as the log timestamp instead of ingestion time, preserving container-provided per-line timestamps.
-
Logs Agent: logs_config.tag_multi_line_logs and logs_config.tag_truncated_logs now default to true so file logs are tagged by default when they were aggregated as multiline logs or truncated by the Agent.
-
Use the native requestWhenInUseAuthorization() API to manage the location permission prompt on macOS.
-
The OTel Agent standalone mode now automatically disables IPC with a core Datadog Agent. When DD_OTEL_STANDALONE is enabled, DD_CMD_PORT is forced to -1, so users no longer need to set it manually when running DDOT without a core Agent.
-
The service.instance.id OpenTelemetry resource attribute is now mapped to the service.instance.id Datadog metric tag when converting OTLP metrics. This attribute is required for OTel traffic metrics in Datadog Fleet Automation.
-
Private Action Runner: When private_action_runner.api_key_only_enrollment is enabled, the agent now enrolls via the new API-key-only OPMS endpoint (/api/unstable/on_prem_runners/api_key_only). This allows runners to self-enroll using only a scoped API key, without requiring an application key.
-
The Private Action Runner now honors the X-Retry-After-Ms response header returned by the Datadog backend on workflow task dequeue and health check requests.
-
The Private Action Runner now retries self-enrollment and auto-connection creation requests when the Datadog API returns a transient 5xx response.
-
Parse the ECS /tasks host metadata endpoint on Managed Instances and populate DaemonName for daemon-scheduled tasks.
-
The default for logs_config.file_scan_period is now 1 second instead of 10, so the Agent discovers new and rotated log files on disk more quickly. Set logs_config.file_scan_period explicitly if you need a slower scan to reduce filesystem load (for example on network file systems).
-
Bumped the Security Agent policies to v0.80.0.
-
Reduce the payload size of SNMP device metrics by letting the Datadog backend enrich device tags (such as snmp_device, device_ip, and device_id) from device metadata instead of attaching them to every metric. Existing queries and monitors continue to work, and no action is required. This only applies when collect_device_metadata is enabled (the default).
-
SNMP network device metadata: when a profile lists multiple scalar symbols for the same metadata field (for example serial_number), the check now skips values that resolve to an empty string (after trimming whitespace) and continues to the next symbol, matching the intended fallback order when an OID exists but carries no usable serial.
-
Upgrade OpenTelemetry Collector dependencies from v0.150.0 to v0.151.0 (core v1.56.0 to v1.57.0).
Notable upstream changes:
  - Removed stable feature gates that are no longer needed: connector.datadogconnector.NativeIngest, exporter.datadogexporter.UseLogsAgentExporter, and exporter.datadogexporter.metricexportnativeclient.
  - Several collector-contrib components have been renamed with deprecated aliases (spanmetrics to span_metrics, hostmetrics to host_metrics, fluentforward to fluent_forward). The old names continue to work but will be removed in a future release.
See the full upstream changelogs: collector-contrib v0.151.0, collector core v0.151.0.
-
Upgrade OpenTelemetry Collector dependencies from v0.151.0 to v0.152.0 (core v1.57.0 to v1.58.0).
See the full upstream changelogs: collector-contrib v0.152.0, collector core v0.152.0.
-
A small sample of series metric flushes (0.1% by default) is now additionally sent to a v3beta metrics intake endpoint to validate the upcoming v3 metrics protocol. Shadow traffic is only sent for agents configured against the datadoghq.com (US1) site. To opt out, set serializer_experimental_use_v3_api.series.shadow_sample_rate to 0.
-
The OTel Agent now logs a warning and displays it in agent status when the hostmetrics receiver is configured while running in connected mode (DD_OTEL_STANDALONE=false). In connected mode the core Datadog Agent already collects host metrics, so enabling the hostmetrics receiver can lead to duplicate or conflicting metric names. To suppress the warning, either remove the hostmetrics receiver or switch to standalone mode (DD_OTEL_STANDALONE=true).
-
Add six opt-in tag flags to the Windows Certificate Store integration: certificate_template_tag, enhanced_key_usage_tag, friendly_name_tag, subject_alternative_names_tag, issuer_tag, and signature_algorithm_tag. When enabled, each certificate's metrics and service checks are tagged with the corresponding X.509 or Windows certificate property. All flags default to false.
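The SNMP metadata fallback described above (skip a symbol whose value is empty after trimming whitespace, then continue to the next symbol) amounts to a first-usable-value search. The following Python sketch shows that logic only; the function name, the fetch callable, and the symbol names in the usage lines are hypothetical stand-ins, and whether the real check returns the raw or the trimmed value is not stated in the note (the sketch returns the value as fetched).

```python
def first_usable_value(symbols, fetch):
    """Return the value of the first symbol that resolves to a non-blank string.

    symbols: symbol names in the profile's fallback order.
    fetch:   callable returning the string value for a symbol, or None when
             the OID is absent (a hypothetical stand-in for an SNMP lookup).
    """
    for symbol in symbols:
        value = fetch(symbol)
        if value is None:
            continue  # OID not present on the device
        if value.strip() == "":
            continue  # OID exists but carries no usable value
        return value
    return None


# Hypothetical device answers: the first symbol exists but is blank.
answers = {"vendorSerialA": "   ", "vendorSerialB": "SN-0001"}
serial = first_usable_value(["vendorSerialA", "vendorSerialB"], answers.get)
```

Without the blank check, the first symbol would win and the device would be reported with an empty serial number even though a later symbol holds a usable one.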
Deprecation Notes
- APM: Restored the deprecated DD_APM_SPAN_DERIVED_PRIMARY_TAGS configuration option, but only in serverless contexts: the Datadog Azure App Services extension (DD_AZURE_APP_SERVICES=1) and serverless-init (Cloud Run, Container Apps, Cloud Run Functions). In all other deployments the option is silently ignored. Tracers should populate additional_metric_tags instead; do not use DD_APM_SPAN_DERIVED_PRIMARY_TAGS in new deployments.
Security Notes
- Bumped pip to 26.1.1 in the embedded Python distribution to address CVE-2026-6357.
- Updated the Windows 1809 / LTSC 2019 Agent container base images from the deprecated mcr.microsoft.com/powershell:*-1809 images to mcr.microsoft.com/dotnet/sdk:9.0-nanoserver-1809 (nanoserver) and mcr.microsoft.com/dotnet/sdk:9.0-windowsservercore-ltsc2019 (servercore). The previous PowerShell base images were unmaintained and still shipped PowerShell 7.1.0, which is affected by CVE-2022-26788.
Bug Fixes
- The debugger proxy no longer forwards Exception Replay and Live Debugger logs when logs_enabled is false. This can be overridden using the new apm_config.debugger_logs_enabled_override setting (environment variable DD_APM_DEBUGGER_LOGS_ENABLED_OVERRIDE), which enables Exception Replay and Live Debugger when logs_enabled is false.
- APM: Fixed trace span obfuscation for OpenSearch request bodies when Elasticsearch JSON obfuscation is also enabled. Spans that only included the opensearch.body tag (and not elasticsearch.body) were previously left unobfuscated in that configuration.
- APM: Fix SQL obfuscation error when a query uses PostgreSQL array slice syntax with bind parameters (e.g. arr[$1:] or arr[$1:$2]). The tokenizer was incorrectly treating the : range separator as the start of a named bind variable, causing obfuscation to fail with a LexError.
- APM OTLP: Preserve gRPC status codes on trace metrics computed by DDOT and the OpenTelemetry Collector Datadog connector. This includes explicit gRPC status attributes such as rpc.grpc.status_code and the newer OpenTelemetry semantic convention rpc.response.status_code when rpc.system.name is grpc.
- APM: Enforce body-size limits on trace-agent proxy endpoints (DogStatsD, pipeline stats, OpenLineage, Debugger, SymDB). All endpoints are capped at apm_config.max_request_bytes (default 25 MB). The profiling proxy uses a separate limit configurable via apm_config.profiling_max_request_bytes (default 50 MB, env DD_APM_PROFILING_MAX_REQUEST_BYTES). The Traces and ClientStatsPayload msgpack decoders now reject payloads declaring more than 500,000 elements in a single array.
- APM: Fix an issue where converting traces to the v1 format did not prefer the root span's sampling priority when multiple _sampling_priority_v1 values were present on spans in the same trace.
- Logs collected with automatic multiline detection now fall back to individual events when combining lines would exceed logs_config.max_message_size_bytes. Oversized single log lines continue to use the normal truncation path, and multiline logs that fit within the limit are still aggregated.
- [DBM] Bump go-sqllexer to v0.2.2 to fix the following bugs:
  - Obfuscate EXTRACT field keywords (e.g. epoch, year) so that queries from pg_stat_activity and pg_stat_statements converge on the same DBM signature.
  - Fix handling of PostgreSQL VACUUM commands so they are correctly extracted into statement metadata.
  - Fix lexer handling of multiline comments immediately following keywords.
- Fixed HTTP flows being incorrectly dropped when a request body arrives before the response. Fixed pending HTTP transactions not being finalized when a connection is closed by a bare FIN or RST. Fixed a bug check (BSOD) caused by mismatched maxRequestFragment values during HTTP initialization.
- Fixed the container ID to PID mapping for processes running in a sub-cgroup of their container's cgroup (for example, CrowdStrike Falcon's sensor.falcon scope nested under a container scope).
- Fix the connection_reset_interval setting not being applied to additional log endpoints and HTTP MRF endpoints. Previously, only the main log endpoint would periodically reset its connection, which could cause additional endpoints to send logs to stale destinations after a DNS failover. Additional endpoints now inherit the global logs_config.connection_reset_interval value by default, and can also be overridden per-endpoint in the additional_endpoints configuration.
- Fix a panic in the system-probe network tracer caused by concurrent access to the gateway lookup subnet cache. The cache now uses a thread-safe LRU implementation.
- Fixed the health platform forwarder using the wrong intake endpoint. Agent health reports are now sent to agenthealth-intake.{site} instead of event-platform-intake.{site}, which is only configured for the logs and processesraw tracks. This caused org_id to be missing from all agent health recommendations.
- Fix a class of IPv6 host:port formatting bugs found in multiple call sites across the Agent. IPv6 literals in host configuration values were not bracketed when used to build URLs and dial addresses, producing strings like http://fd38::1:5005 instead of http://[fd38::1]:5005 and causing too many colons in address errors at runtime.
- Fixes the journald log tailer skipping the first journal entry when start_position is set to beginning or forceBeginning.
- Fix chassis type detection for Mac mini and Mac Pro hosts, which were previously reported as Other.
- Fix the macOS battery check detecting battery on Mac minis.
- Logs: Fixed a bug where the MultiLineParser did not mark truncation when reassembled log lines exceeded the 900KB size cap. Oversized lines are now properly flagged with IsTruncated so that downstream handlers can apply truncation markers and increment telemetry.
- Fix spurious "Unknown environment variable" warnings for DD_SYNC_DELAY, DD_SYNC_TO, and DD_CORE_CONFIG when running the OTel Agent.
- Fix a panic in the OTLP metrics pipeline when a sender submits a histogram with more BucketCounts entries than ExplicitBounds allows (violating the counts == bounds + 1 OpenTelemetry specification invariant). Such data points are now rejected with an error instead of crashing the agent.
- Fix a C-memory leak in the logs batch sender where resetBatch() replaced the zstd StreamCompressor without closing the previous one.
- The datadog-installer.exe install script now adds datadog.yaml.example template comments to the config files on fresh installs.
- Fixed the gohai resource check silently dropping processes whose UID does not exist in the host's /etc/passwd. This commonly affects containerized processes running as UIDs created inside container images. The "Processes memory usage" widget on the host infrastructure page now correctly includes these processes by falling back to the numeric UID string when username lookup fails.
- gpu: fix an issue where some GPM metrics (gpu.gr_engine_active, gpu.sm_utilization, gpu.sm_occupancy, gpu.integer_active, gpu.fp16_active, gpu.fp32_active, gpu.fp64_active, gpu.tensor_active) were only emitted correctly one out of eight times on average.
- NDM SNMP: fix IP metadata fields rendered as <nil> for OIDs declared with SYNTAX IpAddress (e.g. Cisco IPsec tunnel local/remote outside IPs, CDP remote addresses). gosnmp decodes these values as Go strings, but the metadata store previously only handled the raw-bytes path.
- Fixes an issue with the NetFlow collector where certain packets would be dropped, producing error logs and lost data. The issue was caused when a packet had trailing padding which was not properly handled.
- Fixed a bug where empty log sources would get orphaned due to an empty serviceID when the Agent attempted to collect logs from short-lived containers that exit quickly.
- Fixed a regression introduced around Agent 7.40 where non-template check configurations containing unresolvable ENC[...] secrets were still scheduled with raw secret handles in their config. Checks are now correctly dropped when all instances fail secret decryption. When only some instances fail, the surviving instances are scheduled and the failing instances are dropped, preserving the pre-regression per-instance behavior.
- SNMP: Detect GetBulk response truncation (fewer varbinds than requested OIDs) and automatically reduce the batch size, preventing silent metric loss on devices that truncate large SNMP responses.
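The IPv6 host:port fix listed above comes down to one rule: a host that contains a colon must be wrapped in square brackets before the port is appended. The Python sketch below illustrates that rule with a hypothetical helper named join_host_port; it is not the Agent's code. In Go, the standard library function net.JoinHostPort applies the same bracketing.

```python
def join_host_port(host, port):
    """Join host and port, bracketing IPv6 literals so the result can be
    used in URLs and dial addresses."""
    already_bracketed = host.startswith("[") and host.endswith("]")
    if ":" in host and not already_bracketed:
        host = f"[{host}]"
    return f"{host}:{port}"


# The address from the release note, plus an IPv4 address for contrast.
ipv6_url = "http://" + join_host_port("fd38::1", 5005)
ipv4_url = "http://" + join_host_port("10.0.0.7", 5005)
print(ipv6_url)  # http://[fd38::1]:5005
print(ipv4_url)  # http://10.0.0.7:5005
```

Plain string concatenation yields http://fd38::1:5005, where a parser cannot tell which colon separates the port, which is the source of the too many colons in address error.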
Other Notes
- Add handling for dbm-column-statistics events in the event platform forwarder. These events are used by Database Monitoring integrations to report column statistics from database catalogs.
- Add metrics origins for Cisco SD-WAN and Versa integrations.
- Add metrics origins for HPE Aruba EdgeConnect and NiFi.
- Removed support for using the OpenTelemetry components contained in this repo from an external collector, using OCB. The equivalent components in the opentelemetry-collector-contrib repository are designed exactly for this use case and should be used instead:
  - datadog exporter
  - datadog extension
Datadog Cluster Agent
Prelude
Released on: 2026-06-11
Pinned to datadog-agent v7.80.0: CHANGELOG.
Upgrade Notes
- Updated the bundled kube-state-metrics library from v2.13 to v2.18. The kube-state-metrics metric allow/deny list now uses ECMAScript regular expression syntax instead of Go regexp syntax. Most patterns are compatible, but users relying on Go-specific regex features (e.g. the (?s) flag) in metric_allowlist or metric_denylist should update their patterns.
New Features
- The Cluster Agent admission controller now reports connectivity probe failures to the Datadog Health Platform. When the admission webhook becomes unreachable, an admission-controller-connectivity-failure health issue is raised with severity high and category availability, including remediation steps. The issue is automatically resolved when connectivity is restored.
- Add a Prometheus HTTP Service Discovery (HTTP SD) provider for the Cluster Agent. The provider polls Prometheus-compatible HTTP SD endpoints and generates check configurations for each discovered target. Configure endpoints under prometheus_http_sd.configs, each providing its own url and check_template.
- Autoscaling profiles (DatadogPodAutoscalerClusterProfile) now support Argo Rollouts as a target workload type. The Cluster Agent automatically detects whether the Argo Rollouts CRD is installed at startup and, if present, watches Rollout resources for profile labels alongside Deployments and StatefulSets.
- The kubernetes_state core check now collects both endpoints and endpointslices resources by default, and emits new kubernetes_state.endpointslice.address_available and kubernetes_state.endpointslice.address_not_ready metrics for Kubernetes EndpointSlice objects, mirroring the existing kubernetes_state.endpoint.address_available and kubernetes_state.endpoint.address_not_ready metrics.
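The HTTP SD provider described above turns each discovered target into a check configuration. The Python sketch below illustrates that expansion under stated assumptions. The response shape (a JSON list of target groups, each with targets and optional labels) is the Prometheus HTTP SD format; everything else is hypothetical, including the function name, the {target} placeholder, the tags key, and the template keys in the usage lines. The actual check_template syntax is not described in this note.

```python
import copy
import json


def configs_from_http_sd(body, check_template):
    """Build one check config per target found in an HTTP SD response body.

    body:           JSON text in the Prometheus HTTP SD format, a list of
                    target groups such as {"targets": [...], "labels": {...}}.
    check_template: dict copied for every target; the "{target}" placeholder
                    and the "tags" key are hypothetical conventions.
    """
    configs = []
    for group in json.loads(body):
        labels = group.get("labels", {})
        for target in group.get("targets", []):
            # Substitute the target into string fields; copy everything else.
            config = {
                key: value.replace("{target}", target)
                if isinstance(value, str)
                else copy.deepcopy(value)
                for key, value in check_template.items()
            }
            config["tags"] = [f"{k}:{v}" for k, v in sorted(labels.items())]
            configs.append(config)
    return configs


response = '[{"targets": ["10.0.0.1:9100", "10.0.0.2:9100"], "labels": {"env": "prod"}}]'
template = {"endpoint": "http://{target}/metrics", "namespace": "demo"}
generated = configs_from_http_sd(response, template)
```

Each target group can list several targets that share one label set, so a single polled response may yield many check configurations.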
Enhancement Notes
- The orchestrator check now collects force-deleted pods by default. The orchestrator_explorer.terminated_pods_improved.enabled option will be removed in a future release.