7.48.0

@kacper-murzyn kacper-murzyn released this 10 Oct 13:41
8e04ee8

Agent

Prelude

Release on: 2023-10-10

Upgrade Notes

  • The EventIDs logged to the Windows Application Event Log by the Agent services have been normalized and now have the same meaning across Agent services. Some EventIDs have changed and the rendered message may be incorrect if you view an Event Log from a host that uses a different version of the Agent than the host that created the Event Log. To ensure you see the correct message, choose "Display information for these languages" when exporting the Event Log from the host. This does not affect Event Logs collected by the Datadog Agent's Windows Event Log integration, which renders the event messages on the originating host. The EventIDs and messages used by the Agent services can be viewed in pkg/util/winutil/messagestrings/messagestrings.mc.

  • The datadog-connectivity and metadata-availability subcommands no longer exist; their diagnoses are now reported in a more general and structured way.

    Diagnostics previously reported via the datadog-connectivity subcommand are now reported as part of the connectivity-datadog-core-endpoints suite. Correspondingly, diagnostics previously reported via the metadata-availability subcommand are now reported as part of the connectivity-datadog-autodiscovery suite.

  • Streamlined settings by renaming workloadmeta.remote_process_collector.enabled and process_config.language_detection.enabled to language_detection.enabled.

  • The command line arguments to the Datadog Agent Trace Agent trace-agent have changed from single-dash arguments to double-dash arguments. For example, -config must now be provided as --config. Additionally, subcommands have been added; these may be listed with the --help switch. For backward-compatibility reasons the old CLI arguments will still work in the foreseeable future but may be removed in future versions.
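
As an illustration of the settings rename above, the following minimal sketch scans an already-parsed configuration for the two old setting paths and reports where each should move. The helper names (`lookup`, `find_renamed_settings`) and the sample dictionary are hypothetical; only the setting paths come from the note. It makes no claim about how the Agent itself handles the old keys.

```python
# Old setting paths named in the upgrade note; the new path replaces both.
RENAMED_SETTINGS = {
    "workloadmeta.remote_process_collector.enabled": "language_detection.enabled",
    "process_config.language_detection.enabled": "language_detection.enabled",
}


def lookup(config, dotted_path):
    """Return (found, value) for a dotted path in a nested dict."""
    node = config
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return False, None
        node = node[key]
    return True, node


def find_renamed_settings(config):
    """List (old_path, new_path, value) for each old setting still present."""
    hits = []
    for old_path, new_path in RENAMED_SETTINGS.items():
        found, value = lookup(config, old_path)
        if found:
            hits.append((old_path, new_path, value))
    return hits


if __name__ == "__main__":
    # Hypothetical content of a parsed datadog.yaml.
    parsed = {"process_config": {"language_detection": {"enabled": True}}}
    for old, new, value in find_renamed_settings(parsed):
        print(f"{old}={value!r} should move to {new}")
```

The sketch only reports old keys. Deciding the new value when both old settings are present with different values is left to the operator.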

New Features

  • Added the kubernetes_state.pod.tolerations metric to the KSM core check

  • Grab, base64 decode, and attach trace context from message attributes passed through SNS->SQS->Lambda

  • Add kubelet healthz check (check_run.kubernetes_core.kubelet.check) to the Agent's core checks to replace the old kubernetes.kubelet.check generated from Python.

  • Tag the aws.lambda span generated by the datadog-extension with a language tag based on runtime information in dotnet and java cases

  • Extended the "agent diagnose" CLI command to allow the easy addition of new diagnostics for diverse and dispersed Agent code.

  • Add support for the otlp_config.metrics.sums.initial_cumulative_monotonic_value setting.

  • [BETA] Adds Golang language and version detection through the system probe. This beta feature can be enabled by setting system_probe_config.language_detection.enabled to true in your system-probe.yaml.

  • Add new kubelet corecheck, which will eventually replace the existing kubelet check.

  • Add custom queries to Oracle monitoring.

  • Adding new configuration setting otlp_config.logs.enabled to enable/disable logs support in the OTLP ingest endpoint.

  • Add logsagentexporter, which is used in OTLP agent to translate ingested logs and forward them to logs-agent

  • Flush in-flight requests and pending retries to disk at shutdown when disk-based buffering of metrics is enabled (for example, when forwarder_storage_max_size_in_bytes is set).

  • Added a new collector in the process agent in workloadmeta. This collector allows for collecting processes when the process_config.process_collection.enabled is false and language_detection.enabled is true. The interval at which this collector collects processes can be adjusted with the setting workloadmeta.local_process_collector.collection_interval.

  • Tag lambda cold starts and proactive initializations on the root aws.lambda span

  • APM - This change improves the acceptance and queueing strategy for trace payloads sent to the Trace Agent. These changes create a system of backpressure in the Trace Agent, causing it to reject payloads when it cannot keep up with the rate of traffic, rather than buffering and causing OOM issues.

    This change has been shown to increase overall throughput in the Trace Agent while decreasing peak resource usage. Existing configurations for CPU and memory work at least as well, and often better, with these changes compared to previous Agent versions. This means users do not have to adjust their configuration to take advantage of these changes, and they do not experience performance degradation as a result of upgrading.
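
The backpressure idea in the APM note above can be sketched with a bounded queue that refuses new payloads once it is full, rather than buffering without limit. This is a generic illustration, not the Trace Agent's implementation; the class name `BoundedReceiver` and the `max_pending` parameter are hypothetical.

```python
import queue


class BoundedReceiver:
    """Accepts payloads into a fixed-size queue and rejects when it is full."""

    def __init__(self, max_pending):
        self._pending = queue.Queue(maxsize=max_pending)
        self.rejected = 0

    def offer(self, payload):
        """Return True if the payload was queued, False if it was rejected."""
        try:
            self._pending.put_nowait(payload)
            return True
        except queue.Full:
            # Backpressure: refuse new work instead of growing memory use.
            self.rejected += 1
            return False

    def take(self):
        """Remove and return the oldest queued payload."""
        return self._pending.get_nowait()


if __name__ == "__main__":
    receiver = BoundedReceiver(max_pending=2)
    results = [receiver.offer(f"payload-{i}") for i in range(3)]
    print(results, receiver.rejected)
```

A rejected sender can retry later, which keeps peak memory bounded by the queue size instead of by the incoming traffic rate.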

Enhancement Notes

  • When jmx_use_container_support is enabled you can use jmx_max_ram_percentage to set a maximum JVM heap size based off a percentage of the total container memory.
  • SNMP profile detection now updates the SNMP profile for a given IP if the device at that IP changes.
  • Add Process Language Detection Enabled in the output of the Agent Status command under the Process Agent section.
  • Improve agent diagnose command to be executed in context of running Agent process.
  • Agents are now built with Go 1.20.7. This version of Golang fixes CVE-2023-29409.
  • Added the container.memory.usage.peak metric to the container check. It shows the maximum memory usage recorded since the container started.
  • Unified agent diagnose CLI command by removing all, datadog-connectivity, and metadata-availability subcommands. These separate subcommands became one of the diagnose suites. The all subcommand became unnecessary.
  • APM: Improved performance and memory consumption in obfuscation, both halved on average.
  • Agents are now built with Go 1.20.8.
  • The processor frequency sent in metadata is now a decimal value on Darwin and Windows, as it already is on Linux. The precision of the value is increased on Darwin.
  • CPU metadata which failed to be collected is no longer sent as empty values on Windows.
  • Platform metadata which failed to be collected is no longer sent as empty values on Windows.
  • Filesystem metadata is now collected without running the df binary on Unix.
  • Adds language detection support for JRuby, which is detected as Ruby.
  • Add the oracle.can_connect metric.
  • Add duration to the plan payload.
  • Increasing the collection interval for all the checks except for activity samples from 10s to 60s.
  • Collect the number of CPUs and physical memory.
  • Improve Oracle query metrics algorithm and the fetching time for execution plans.
  • OTLP ingest pipeline panics no longer stop the Datadog Agent and instead only shutdown this pipeline. The panic is now available in the OTLP status section.
  • During the process check, collect the command name from /proc/[pid]/comm. This allows more accurate language detection of processes.
  • Change how SNMP trap variables with bit enumerations are resolved to hexadecimal strings prefixed with "0x" (previously base64 encoded strings).
  • The Datadog agent container image is now using Ubuntu 23.04 lunar as the base image.
  • Upgraded JMXFetch to 0.47.10 <https://github.com/DataDog/jmxfetch/releases/0.47.10>. This version improves how JMXFetch communicates with the Agent, and fixes a race condition where an exception is thrown if the Agent hasn't finished initializing before JMXFetch starts to shut down.
  • Added collector.worker_utilization to the telemetry. This metric represents the amount of time that a runner worker has been running checks.
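
To make the SNMP trap change above concrete, the sketch below renders the same raw bytes both ways: as a base64 string (the previous representation) and as a hexadecimal string prefixed with "0x" (the new one). The sample bytes and function names are hypothetical, and the lower-case hex digits are an assumption; the note does not specify letter case.

```python
import base64


def as_base64(raw):
    """Render raw bytes as a base64-encoded string."""
    return base64.b64encode(raw).decode("ascii")


def as_prefixed_hex(raw):
    """Render raw bytes as a hexadecimal string prefixed with 0x."""
    # Assumption: lower-case digits; the Agent's exact casing is not stated.
    return "0x" + raw.hex()


if __name__ == "__main__":
    bits = bytes([0xA0, 0x00])  # hypothetical two-byte trap variable value
    print(as_base64(bits))
    print(as_prefixed_hex(bits))
```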

Deprecation Notes

  • The command line arguments to the Datadog Agent Trace Agent trace-agent have changed from single-dash arguments to double-dash arguments. For example, -config must now be provided as --config. For backward-compatibility reasons the old CLI arguments will still work in the foreseeable future but may be removed in future versions.
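
For wrapper scripts that still pass single-dash arguments to trace-agent, a small helper along these lines could rewrite them to the double-dash form. It is a sketch under assumptions: `to_double_dash` is a hypothetical name, only -config is named in the note, and the helper treats any single-dash argument longer than one letter as a long option.

```python
def to_double_dash(args):
    """Rewrite single-dash long options (-config) as double-dash (--config)."""
    converted = []
    for arg in args:
        is_single_dash_long = (
            arg.startswith("-")
            and not arg.startswith("--")
            and len(arg) > 2
            and arg[1].isalpha()  # leave values such as -10 untouched
        )
        converted.append("-" + arg if is_single_dash_long else arg)
    return converted


if __name__ == "__main__":
    # Example path only; use the configuration path of your installation.
    print(to_double_dash(["-config", "/etc/datadog-agent/datadog.yaml"]))
```

Arguments that already use two dashes, single-letter switches, and plain values pass through unchanged.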

Security Notes

  • APM: In order to improve the default customer experience regarding sensitive data, the Agent now obfuscates database statements within span metadata by default. This includes MongoDB queries, ElasticSearch request bodies, and raw commands from Redis and MemCached. Previously, this setting was off by default. This update could have performance implications, or obfuscate data that is not sensitive, and can be disabled or configured through the obfuscation options within the apm_config, or with the environment variables prefixed with DD_APM_OBFUSCATION. Please read the [Data Security documentation for full details](https://docs.datadoghq.com/tracing/configure_data_security/#trace-obfuscation).

  • This update ensures the sql.query tag is always obfuscated by the Datadog Agent even if this tag was already set by a tracer or manually by a user. This is to prevent potentially sensitive data from being sent to Datadog. If you wish to have a raw, unobfuscated query within a span, then manually add a span tag of a different name (for example, sql.rawquery).

  • Fix CVE-2023-39320, CVE-2023-39318, CVE-2023-39319, and CVE-2023-39321.

  • Update OpenSSL from 3.0.9 to 3.0.11. This addresses CVEs CVE-2023-2975, CVE-2023-3446, CVE-2023-3817, CVE-2023-4807.

Bug Fixes

  • APM: Fix issue of agent status returning an error when run shortly after starting the trace agent.

  • APM: Fix incorrect filenames and line numbers in logs from the trace agent.

  • OTLP logs ingestion is now disabled by default. To enable it, set otlp_config.logs.enabled to true.

  • Avoids fetching tags for ECS tasks when they're not consumed.

  • APM: Concurrency issue at high volumes fixed in obfuscation.

  • Updated datadog.agent.sbom_generation_duration to only be observed for successful scans.

  • Fixes a bug that prevents the Agent from writing permissions information about system-probe files when creating a flare.

  • Fixed a bug that causes the Agent to report the datadog.agent_name.running metric with missing tags in some environments with cgroups v1.

  • Fix dogstatsd_mapper_profiles wrong serialization when displaying the configuration (for example match_type was shown as matchtype). This also fixes a bug in which the secret management feature was incompatible with dogstatsd_mapper_profiles due to the renaming of the match_type key in the YAML data.

  • Fix a crash in the Cluster Agent when Remote Configuration is disabled

  • Corrected a bug in calculating the total size of a container image, now accounting for the configuration file size.

  • Fixes the process-agent picking up processes that are kernel threads due to an integer overflow when parsing /proc/<pid>/stat.

  • Fixes a rare bug in the Kubernetes State check that causes the Agent to incorrectly tag the kubernetes_state.job.complete service check.

  • On Windows, the host metadata correctly reflects the Windows 11 version.

  • Fix a datadog.yaml configuration file parsing issue. When the datadog.yaml configuration file contained a complex configuration under prometheus.checks[*].configurations[*].metrics, a parsing error could lead to an OpenMetrics check not being properly scheduled. Instead, the Agent logged the following error:

    2023-07-26 14:09:23 UTC | CORE | WARN | (pkg/autodiscovery/common/utils/prometheus.go:77 in buildInstances) | Error processing prometheus configuration: json: unsupported type: map[interface {}]interface {}
    
  • Fixes the KSM check to support HPA v2beta2 again. This stopped working in Agent v7.44.0.

  • Counts sent through the no-aggregation pipeline are now sent as rate with a forced interval of 10 to mimic the normal DogStatsD pipelines.

  • Bug fix for the wrong query signature.

  • Populate OTLP resource attributes in Datadog logs

  • Changes mapping for jvm.loaded_classes from process.runtime.jvm.classes.loaded to process.runtime.jvm.classes.current_loaded

  • The minimum and maximum estimation for OTLP Histogram to Datadog distribution mapping now ensures the average is within [min, max].

    This estimation is only used when the minimum and maximum are not available in the OTLP payload or this is a cumulative payload.

  • Fixes a panic in the OTLP ingest metrics pipeline when sending OpenTelemetry runtime metrics

  • Set correct tag value "otel_source:datadog_agent" for OTLP logs ingestion

  • Removed specific environment variable filter on the Windows platform to fetch ECS task tags.

  • diagnose datadog-connectivity subcommand now loads and resolves secrets before checking connectivity.

  • The Agent now starts even if it cannot write events to the Application event log

  • Fix Windows Service detection by replacing svc.IsAnInteractiveSession() (deprecated) with svc.IsWindowsService()
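
The OTLP histogram fix above states that the estimated minimum and maximum now bracket the average. One way to guarantee that property is to widen the estimates until the average fits, as sketched below. This is an assumption made for illustration; the Agent's actual estimation logic is not described in these notes, and `bound_estimates` is a hypothetical name.

```python
def bound_estimates(est_min, est_max, total, count):
    """Widen estimated (min, max) so that total / count lies within them."""
    if count <= 0:
        # No data points: there is no average to bracket.
        return est_min, est_max
    average = total / count
    return min(est_min, average), max(est_max, average)


if __name__ == "__main__":
    # Estimated range 2.0 to 5.0, but the average is 6.0.
    print(bound_estimates(2.0, 5.0, total=60.0, count=10))
```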

Other Notes

  • System-probe no longer tries to resolve secrets in configurations.
  • Refactored the logs collection pipeline: journald and windowsevents support now uses the same pipeline as the rest of the logs collection implementations.
  • Please note that significant changes have been introduced to the Datadog Trace Agent for this release. Though these changes should not alter user-facing agent behavior beyond the CLI changes described above, please reach out to support should you experience any unexpected behavior.

Datadog Cluster Agent

New Features

  • Added the kubernetes_state.pod.tolerations metric to the KSM core check
  • Add HorizontalPodAutoscaler collection in the orchestrator check.

Enhancement Notes

  • Add safeguards for orchestrator CRD collection.
  • The Datadog cluster-agent container image is now using Ubuntu 23.04 lunar as the base image.

Bug Fixes

  • Fixed an error in the calculations performed by the algorithm that rebalances cluster checks. Cluster checks are now more evenly distributed when advanced dispatching is enabled (cluster_checks.advanced_dispatching_enabled is set to true).
  • Service checks are no longer excluded from rebalancing decisions when advanced dispatching is enabled (cluster_checks.advanced_dispatching_enabled is set to true).
  • Fixes a rare bug in the Kubernetes State check that causes the Agent to incorrectly tag the kubernetes_state.job.complete service check.
  • Removes an incorrect warning log message that mentions that the DD_POD_NAME env var is unknown.
  • Fixes the KSM check to support HPA v2beta2 again. This stopped working in Agent v7.44.0.
  • Adds the kube_cluster_name tag as a static global tag to the cluster agent when the DD_CLUSTER_NAME config option is set. This should fix an issue where the tag is not being attached to metrics in certain environments, such as EKS Fargate.
  • Fixed a bug in the advanced dispatching of cluster checks. All the checks scheduled since the last rebalance were being scheduled in the same node. Now they should be distributed among the available nodes.
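
The last fix above concerns checks piling up on one node between rebalances. A generic least-loaded dispatch avoids that kind of skew by recording each placement before choosing the next node. The sketch below illustrates the idea only; it is not the Cluster Agent's algorithm, and the function name, the node names, and the one-unit cost per check are all assumptions.

```python
def dispatch(new_checks, node_load):
    """Assign each check to the currently least-loaded node.

    node_load maps node name to its current load and is updated in place,
    so each placement is visible when the next check is assigned.
    """
    placement = {}
    for check in new_checks:
        # Ties are broken by node name to keep the result deterministic.
        target = min(node_load, key=lambda name: (node_load[name], name))
        placement[check] = target
        node_load[target] += 1  # assumption: every check costs one unit
    return placement


if __name__ == "__main__":
    loads = {"node-a": 0, "node-b": 0, "node-c": 0}
    print(dispatch(["check-1", "check-2", "check-3"], loads))
```

If the load map were not updated inside the loop, every check would be sent to the same node, which mirrors the symptom described in the note.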