[OTAGENT-823] bootstrap Dogtel extension #47532

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 18 commits into main from yang.song/OTAGENT-824
Mar 30, 2026

Conversation

Member

@songy23 songy23 commented Mar 6, 2026

What does this PR do?

Adds standalone mode support to the otel-agent (DD_OTEL_STANDALONE=true) and introduces the dogtelextension OTel Collector extension for Datadog Agent functionalities.

Key changes:

  • dogtelextension (comp/otelcol/dogtelextension/): New OTel Collector extension providing a tagger gRPC server, host metadata submission, and secrets resolution when otel-agent runs without a core Datadog Agent.
  • Standalone/connected FX split (cmd/otel-agent/subcommands/run/command.go): Refactors otel-agent startup into commonAgentFxOptions + mode-specific standaloneAgentFxOptions / connectedAgentFxOptions. Standalone wires local hostname, real secrets backend, local tagger, and host metadata runner. Connected mode keeps remote hostname, remote tagger, and on-init config sync from the core agent.
  • K8s tag enrichment (comp/core/workloadmeta/collectors/catalog-otel/): New catalog-otel workloadmeta catalog (kubelet, containerd, docker, ECS, crio, podman). Added kubelet to OTEL_AGENT_TAGS. In standalone mode the infraattributes processor enriches spans/metrics/logs with K8s tags (kube_deployment, kube_namespace, pod_name, etc.) via the local tagger.
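
A minimal sketch of the standalone/connected split described above, assuming the startup code builds one shared option list and appends a mode-specific list. The `option` type and every option name are hypothetical stand-ins for fx options, not the PR's actual wiring.

```go
package main

import "fmt"

// option is a hypothetical stand-in for an fx.Option.
type option string

// commonAgentOptions lists wiring shared by both modes (illustrative names).
func commonAgentOptions() []option {
	return []option{"config", "logging", "collector"}
}

// standaloneAgentOptions mirrors the standalone wiring named in the description.
func standaloneAgentOptions() []option {
	return []option{"local-hostname", "secrets-backend", "local-tagger", "host-metadata-runner"}
}

// connectedAgentOptions mirrors the connected wiring named in the description.
func connectedAgentOptions() []option {
	return []option{"remote-hostname", "remote-tagger", "config-sync"}
}

// agentOptions returns the common options followed by the mode-specific ones.
func agentOptions(standalone bool) []option {
	opts := append([]option{}, commonAgentOptions()...)
	if standalone {
		return append(opts, standaloneAgentOptions()...)
	}
	return append(opts, connectedAgentOptions()...)
}

func main() {
	fmt.Println(agentOptions(true))
	fmt.Println(agentOptions(false))
}
```

Keeping the shared list separate means a component added to both modes is declared once, and the two mode-specific lists stay small enough to compare side by side.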

Motivation

Standalone Dogtel Agent

Describe how you validated your changes

  • Deployed otel-agent in standalone mode on a kind cluster with DD_OTEL_STANDALONE=true, DD_KUBERNETES_KUBELET_HOST=status.hostIP, and DD_KUBELET_TLS_VERIFY=false.
  • Sent a test trace with k8s.pod.uid; confirmed infraattributes processor enriched it with kube_deployment, kube_namespace, pod_name, kube_replica_set, pod_phase, and UST tags via the debug exporter.
  • Unit tests added for dogtelextension and fxutil.TestRun tests for both standalone and connected FX graphs.

Additional Notes

Deployments using infraattributes in standalone mode require:

  1. DD_KUBERNETES_KUBELET_HOST env var set to the node IP (status.hostIP)
  2. DD_KUBELET_TLS_VERIFY=false (or kubelet CA cert)
  3. RBAC: get on nodes/proxy for the otel-agent ServiceAccount

@songy23 songy23 added this to the 7.78.0 milestone Mar 6, 2026
@songy23 songy23 added changelog/no-changelog No changelog entry needed qa/done QA done before merge and regressions are covered by tests team/opentelemetry-agent labels Mar 6, 2026
@github-actions github-actions Bot added the long review PR is complex, plan time to review it label Mar 6, 2026
@dd-octo-sts dd-octo-sts Bot added internal Identify a non-fork PR team/agent-configuration labels Mar 6, 2026
Contributor

agent-platform-auto-pr Bot commented Mar 6, 2026

Go Package Import Differences

Baseline: 0b0f0c7
Comparison: f5a8d19

binary os arch change
otel-agent linux amd64
+83, -0
+code.cloudfoundry.org/garden
+code.cloudfoundry.org/garden/client
+code.cloudfoundry.org/garden/client/connection
+code.cloudfoundry.org/garden/routes
+code.cloudfoundry.org/garden/transport
+code.cloudfoundry.org/lager
+github.com/DataDog/datadog-agent/comp/core/hostname
+github.com/DataDog/datadog-agent/comp/core/hostname/hostnameimpl
+github.com/DataDog/datadog-agent/comp/core/tagger/collectors
+github.com/DataDog/datadog-agent/comp/core/tagger/common
+github.com/DataDog/datadog-agent/comp/core/tagger/fx
+github.com/DataDog/datadog-agent/comp/core/tagger/impl
+github.com/DataDog/datadog-agent/comp/core/tagger/k8s_metadata
+github.com/DataDog/datadog-agent/comp/core/tagger/mock
+github.com/DataDog/datadog-agent/comp/core/tagger/proto
+github.com/DataDog/datadog-agent/comp/core/tagger/server
+github.com/DataDog/datadog-agent/comp/core/tagger/subscriber
+github.com/DataDog/datadog-agent/comp/core/tagger/taglist
+github.com/DataDog/datadog-agent/comp/core/tagger/tagstore
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/baseimpl
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/catalog
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/fx
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/impl
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/impl/parse
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/program
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/proto
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/telemetry
+github.com/DataDog/datadog-agent/comp/core/workloadmeta/collectors/catalog-otel
+github.com/DataDog/datadog-agent/comp/core/workloadmeta/collectors/util
+github.com/DataDog/datadog-agent/comp/dogstatsd/packets
+github.com/DataDog/datadog-agent/comp/metadata/host
+github.com/DataDog/datadog-agent/comp/metadata/host/hostimpl
+github.com/DataDog/datadog-agent/comp/metadata/host/hostimpl/hosttags
+github.com/DataDog/datadog-agent/comp/metadata/host/hostimpl/utils
+github.com/DataDog/datadog-agent/comp/metadata/inventoryhost
+github.com/DataDog/datadog-agent/comp/metadata/inventoryhost/inventoryhostimpl
+github.com/DataDog/datadog-agent/comp/metadata/packagesigning/utils
+github.com/DataDog/datadog-agent/comp/metadata/resources
+github.com/DataDog/datadog-agent/comp/otelcol/dogtelextension/def
+github.com/DataDog/datadog-agent/comp/otelcol/dogtelextension/impl
+github.com/DataDog/datadog-agent/comp/otelcol/dogtelextension/impl/metrics
+github.com/DataDog/datadog-agent/pkg/collector/python
+github.com/DataDog/datadog-agent/pkg/gohai
+github.com/DataDog/datadog-agent/pkg/gohai/cpu
+github.com/DataDog/datadog-agent/pkg/gohai/filesystem
+github.com/DataDog/datadog-agent/pkg/gohai/memory
+github.com/DataDog/datadog-agent/pkg/gohai/network
+github.com/DataDog/datadog-agent/pkg/gohai/platform
+github.com/DataDog/datadog-agent/pkg/gohai/processes
+github.com/DataDog/datadog-agent/pkg/gohai/processes/gops
+github.com/DataDog/datadog-agent/pkg/gohai/utils
+github.com/DataDog/datadog-agent/pkg/gpu/tags
+github.com/DataDog/datadog-agent/pkg/logs/status
+github.com/DataDog/datadog-agent/pkg/logs/tailers
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders/alibaba
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders/cloudfoundry
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders/ibm
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders/kubernetes
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders/oracle
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders/tencent
+github.com/DataDog/datadog-agent/pkg/util/containers/cri
+github.com/DataDog/datadog-agent/pkg/util/containers/metadata
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/containerd
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/cri
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/docker
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/ecsfargate
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/ecsmanagedinstances
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/kubelet
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/provider
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/system
+github.com/DataDog/datadog-agent/pkg/util/gpu
+github.com/DataDog/datadog-agent/pkg/util/kubernetes/cloudprovider
+github.com/DataDog/datadog-agent/pkg/util/kubernetes/clusterinfo
+github.com/DataDog/datadog-agent/pkg/util/net
+github.com/DataDog/datadog-agent/pkg/util/procfilestats
+github.com/DataDog/datadog-agent/pkg/util/size
+github.com/DataDog/datadog-agent/pkg/util/tags
+github.com/DataDog/datadog-agent/pkg/util/tmplvar
+github.com/DataDog/datadog-agent/pkg/util/trie
+github.com/bmizerany/pat
+github.com/tedsuo/rata
otel-agent linux arm64
+83, -0
+code.cloudfoundry.org/garden
+code.cloudfoundry.org/garden/client
+code.cloudfoundry.org/garden/client/connection
+code.cloudfoundry.org/garden/routes
+code.cloudfoundry.org/garden/transport
+code.cloudfoundry.org/lager
+github.com/DataDog/datadog-agent/comp/core/hostname
+github.com/DataDog/datadog-agent/comp/core/hostname/hostnameimpl
+github.com/DataDog/datadog-agent/comp/core/tagger/collectors
+github.com/DataDog/datadog-agent/comp/core/tagger/common
+github.com/DataDog/datadog-agent/comp/core/tagger/fx
+github.com/DataDog/datadog-agent/comp/core/tagger/impl
+github.com/DataDog/datadog-agent/comp/core/tagger/k8s_metadata
+github.com/DataDog/datadog-agent/comp/core/tagger/mock
+github.com/DataDog/datadog-agent/comp/core/tagger/proto
+github.com/DataDog/datadog-agent/comp/core/tagger/server
+github.com/DataDog/datadog-agent/comp/core/tagger/subscriber
+github.com/DataDog/datadog-agent/comp/core/tagger/taglist
+github.com/DataDog/datadog-agent/comp/core/tagger/tagstore
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/baseimpl
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/catalog
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/fx
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/impl
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/impl/parse
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/program
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/proto
+github.com/DataDog/datadog-agent/comp/core/workloadfilter/telemetry
+github.com/DataDog/datadog-agent/comp/core/workloadmeta/collectors/catalog-otel
+github.com/DataDog/datadog-agent/comp/core/workloadmeta/collectors/util
+github.com/DataDog/datadog-agent/comp/dogstatsd/packets
+github.com/DataDog/datadog-agent/comp/metadata/host
+github.com/DataDog/datadog-agent/comp/metadata/host/hostimpl
+github.com/DataDog/datadog-agent/comp/metadata/host/hostimpl/hosttags
+github.com/DataDog/datadog-agent/comp/metadata/host/hostimpl/utils
+github.com/DataDog/datadog-agent/comp/metadata/inventoryhost
+github.com/DataDog/datadog-agent/comp/metadata/inventoryhost/inventoryhostimpl
+github.com/DataDog/datadog-agent/comp/metadata/packagesigning/utils
+github.com/DataDog/datadog-agent/comp/metadata/resources
+github.com/DataDog/datadog-agent/comp/otelcol/dogtelextension/def
+github.com/DataDog/datadog-agent/comp/otelcol/dogtelextension/impl
+github.com/DataDog/datadog-agent/comp/otelcol/dogtelextension/impl/metrics
+github.com/DataDog/datadog-agent/pkg/collector/python
+github.com/DataDog/datadog-agent/pkg/gohai
+github.com/DataDog/datadog-agent/pkg/gohai/cpu
+github.com/DataDog/datadog-agent/pkg/gohai/filesystem
+github.com/DataDog/datadog-agent/pkg/gohai/memory
+github.com/DataDog/datadog-agent/pkg/gohai/network
+github.com/DataDog/datadog-agent/pkg/gohai/platform
+github.com/DataDog/datadog-agent/pkg/gohai/processes
+github.com/DataDog/datadog-agent/pkg/gohai/processes/gops
+github.com/DataDog/datadog-agent/pkg/gohai/utils
+github.com/DataDog/datadog-agent/pkg/gpu/tags
+github.com/DataDog/datadog-agent/pkg/logs/status
+github.com/DataDog/datadog-agent/pkg/logs/tailers
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders/alibaba
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders/cloudfoundry
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders/ibm
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders/kubernetes
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders/oracle
+github.com/DataDog/datadog-agent/pkg/util/cloudproviders/tencent
+github.com/DataDog/datadog-agent/pkg/util/containers/cri
+github.com/DataDog/datadog-agent/pkg/util/containers/metadata
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/containerd
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/cri
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/docker
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/ecsfargate
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/ecsmanagedinstances
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/kubelet
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/provider
+github.com/DataDog/datadog-agent/pkg/util/containers/metrics/system
+github.com/DataDog/datadog-agent/pkg/util/gpu
+github.com/DataDog/datadog-agent/pkg/util/kubernetes/cloudprovider
+github.com/DataDog/datadog-agent/pkg/util/kubernetes/clusterinfo
+github.com/DataDog/datadog-agent/pkg/util/net
+github.com/DataDog/datadog-agent/pkg/util/procfilestats
+github.com/DataDog/datadog-agent/pkg/util/size
+github.com/DataDog/datadog-agent/pkg/util/tags
+github.com/DataDog/datadog-agent/pkg/util/tmplvar
+github.com/DataDog/datadog-agent/pkg/util/trie
+github.com/bmizerany/pat
+github.com/tedsuo/rata

Contributor

agent-platform-auto-pr Bot commented Mar 6, 2026

Files inventory check summary

File checks results against ancestor 0b0f0c74:

Results for datadog-agent_7.79.0~devel.git.297.f5a8d19.pipeline.105115926-1_amd64.deb:

No change detected

Contributor

agent-platform-auto-pr Bot commented Mar 6, 2026

Static quality checks

✅ Please find below the results from static quality gates
Comparison made with ancestor 0b0f0c7
📊 Static Quality Gates Dashboard
🔗 SQG Job

Successful checks

Info

Quality gate Change Size (prev → curr → max)
agent_deb_amd64_fips +4.0 KiB (0.00% increase) 710.029 → 710.033 → 713.900
agent_rpm_amd64_fips +4.0 KiB (0.00% increase) 710.012 → 710.016 → 713.880
agent_suse_amd64_fips +4.0 KiB (0.00% increase) 710.012 → 710.016 → 713.880
docker_dogstatsd_arm64 +64.0 KiB (0.17% increase) 37.445 → 37.507 → 37.940
27 successful checks with minimal change (< 2 KiB)
Quality gate Current Size
agent_deb_amd64 753.089 MiB
agent_heroku_amd64 313.318 MiB
agent_msi 604.881 MiB
agent_rpm_amd64 753.073 MiB
agent_rpm_arm64 731.483 MiB
agent_rpm_arm64_fips 691.455 MiB
agent_suse_amd64 753.073 MiB
agent_suse_arm64 731.483 MiB
agent_suse_arm64_fips 691.455 MiB
docker_agent_amd64 813.388 MiB
docker_agent_arm64 816.572 MiB
docker_agent_jmx_amd64 1004.303 MiB
docker_agent_jmx_arm64 996.266 MiB
docker_cluster_agent_amd64 203.941 MiB
docker_cluster_agent_arm64 218.419 MiB
docker_cws_instrumentation_amd64 7.142 MiB
docker_cws_instrumentation_arm64 6.689 MiB
docker_dogstatsd_amd64 39.238 MiB
dogstatsd_deb_amd64 29.881 MiB
dogstatsd_deb_arm64 28.034 MiB
dogstatsd_rpm_amd64 29.881 MiB
dogstatsd_suse_amd64 29.881 MiB
iot_agent_deb_amd64 43.289 MiB
iot_agent_deb_arm64 40.340 MiB
iot_agent_deb_armhf 41.088 MiB
iot_agent_rpm_amd64 43.290 MiB
iot_agent_suse_amd64 43.290 MiB
On-wire sizes (compressed)
Quality gate Change Size (prev → curr → max)
agent_deb_amd64 +40.71 KiB (0.02% increase) 174.726 → 174.766 → 178.360
agent_deb_amd64_fips +2.06 KiB (0.00% increase) 165.363 → 165.365 → 172.790
agent_heroku_amd64 -5.75 KiB (0.01% reduction) 75.008 → 75.003 → 79.970
agent_msi -8.0 KiB (0.01% reduction) 138.391 → 138.383 → 146.220
agent_rpm_amd64 +24.36 KiB (0.01% increase) 177.574 → 177.598 → 181.830
agent_rpm_amd64_fips -62.79 KiB (0.04% reduction) 167.695 → 167.634 → 173.370
agent_rpm_arm64 neutral 159.591 MiB → 163.060
agent_rpm_arm64_fips +15.51 KiB (0.01% increase) 151.385 → 151.400 → 156.170
agent_suse_amd64 +24.36 KiB (0.01% increase) 177.574 → 177.598 → 181.830
agent_suse_amd64_fips -62.79 KiB (0.04% reduction) 167.695 → 167.634 → 173.370
agent_suse_arm64 neutral 159.591 MiB → 163.060
agent_suse_arm64_fips +15.51 KiB (0.01% increase) 151.385 → 151.400 → 156.170
docker_agent_amd64 -4.06 KiB (0.00% reduction) 268.198 → 268.194 → 272.480
docker_agent_arm64 +5.04 KiB (0.00% increase) 255.401 → 255.406 → 261.060
docker_agent_jmx_amd64 neutral 336.849 MiB → 341.100
docker_agent_jmx_arm64 +3.03 KiB (0.00% increase) 320.035 → 320.038 → 325.620
docker_cluster_agent_amd64 neutral 71.373 MiB → 72.920
docker_cluster_agent_arm64 neutral 66.995 MiB → 68.220
docker_cws_instrumentation_amd64 neutral 2.999 MiB → 3.330
docker_cws_instrumentation_arm64 neutral 2.729 MiB → 3.090
docker_dogstatsd_amd64 neutral 15.175 MiB → 15.820
docker_dogstatsd_arm64 +6.96 KiB (0.05% increase) 14.488 → 14.494 → 14.830
dogstatsd_deb_amd64 neutral 7.893 MiB → 8.790
dogstatsd_deb_arm64 neutral 6.779 MiB → 7.710
dogstatsd_rpm_amd64 neutral 7.903 MiB → 8.800
dogstatsd_suse_amd64 neutral 7.903 MiB → 8.800
iot_agent_deb_amd64 +5.24 KiB (0.04% increase) 11.401 → 11.406 → 12.040
iot_agent_deb_arm64 neutral 9.705 MiB → 10.450
iot_agent_deb_armhf neutral 9.941 MiB → 10.620
iot_agent_rpm_amd64 -2.62 KiB (0.02% reduction) 11.420 → 11.418 → 12.060
iot_agent_suse_amd64 -2.62 KiB (0.02% reduction) 11.420 → 11.418 → 12.060

cit-pr-commenter-54b7da Bot commented Mar 6, 2026

Regression Detector

Regression Detector Results

Metrics dashboard
Target profiles
Run ID: 2973550c-e784-4602-98c8-c685d4dd56e0

Baseline: 1555c13
Comparison: 2e3b712
Diff

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf experiment goal Δ mean % Δ mean % CI trials links
docker_containers_cpu % cpu utilization -2.12 [-5.10, +0.86] 1 Logs

Fine details of change detection per experiment

perf experiment goal Δ mean % Δ mean % CI trials links
otlp_ingest_logs memory utilization +1.62 [+1.50, +1.74] 1 Logs
ddot_metrics_sum_delta memory utilization +0.44 [+0.27, +0.60] 1 Logs
ddot_metrics_sum_cumulative memory utilization +0.32 [+0.18, +0.46] 1 Logs
uds_dogstatsd_20mb_12k_contexts_20_senders memory utilization +0.21 [+0.15, +0.27] 1 Logs
ddot_metrics_sum_cumulativetodelta_exporter memory utilization +0.15 [-0.08, +0.37] 1 Logs
file_to_blackhole_0ms_latency egress throughput +0.01 [-0.53, +0.54] 1 Logs
tcp_dd_logs_filter_exclude ingress throughput +0.00 [-0.11, +0.11] 1 Logs
uds_dogstatsd_to_api ingress throughput -0.00 [-0.21, +0.20] 1 Logs
uds_dogstatsd_to_api_v3 ingress throughput -0.02 [-0.22, +0.18] 1 Logs
file_to_blackhole_100ms_latency egress throughput -0.02 [-0.11, +0.07] 1 Logs
file_to_blackhole_1000ms_latency egress throughput -0.05 [-0.47, +0.38] 1 Logs
ddot_metrics memory utilization -0.05 [-0.22, +0.13] 1 Logs
file_to_blackhole_500ms_latency egress throughput -0.05 [-0.45, +0.35] 1 Logs
otlp_ingest_metrics memory utilization -0.07 [-0.23, +0.10] 1 Logs
docker_containers_memory memory utilization -0.07 [-0.14, +0.00] 1 Logs
quality_gate_idle_all_features memory utilization -0.21 [-0.25, -0.17] 1 Logs bounds checks dashboard
quality_gate_idle memory utilization -0.29 [-0.34, -0.24] 1 Logs bounds checks dashboard
file_tree memory utilization -0.40 [-0.46, -0.34] 1 Logs
ddot_logs memory utilization -0.83 [-0.89, -0.76] 1 Logs
tcp_syslog_to_blackhole ingress throughput -0.85 [-1.02, -0.69] 1 Logs
quality_gate_metrics_logs memory utilization -1.22 [-1.46, -0.99] 1 Logs bounds checks dashboard
quality_gate_logs % cpu utilization -1.55 [-3.15, +0.04] 1 Logs bounds checks dashboard
docker_containers_cpu % cpu utilization -2.12 [-5.10, +0.86] 1 Logs

Bounds Checks: ✅ Passed

perf experiment bounds_check_name replicates_passed observed_value links
docker_containers_cpu simple_check_run 10/10 710 ≥ 26
docker_containers_memory memory_usage 10/10 271.23MiB ≤ 370MiB
docker_containers_memory simple_check_run 10/10 703 ≥ 26
file_to_blackhole_0ms_latency memory_usage 10/10 0.19GiB ≤ 1.20GiB
file_to_blackhole_0ms_latency missed_bytes 10/10 0B = 0B
file_to_blackhole_1000ms_latency memory_usage 10/10 0.23GiB ≤ 1.20GiB
file_to_blackhole_1000ms_latency missed_bytes 10/10 0B = 0B
file_to_blackhole_100ms_latency memory_usage 10/10 0.19GiB ≤ 1.20GiB
file_to_blackhole_100ms_latency missed_bytes 10/10 0B = 0B
file_to_blackhole_500ms_latency memory_usage 10/10 0.21GiB ≤ 1.20GiB
file_to_blackhole_500ms_latency missed_bytes 10/10 0B = 0B
quality_gate_idle intake_connections 10/10 3 = 3 bounds checks dashboard
quality_gate_idle memory_usage 10/10 172.59MiB ≤ 175MiB bounds checks dashboard
quality_gate_idle_all_features intake_connections 10/10 3 = 3 bounds checks dashboard
quality_gate_idle_all_features memory_usage 10/10 491.51MiB ≤ 550MiB bounds checks dashboard
quality_gate_logs intake_connections 10/10 4 ≤ 6 bounds checks dashboard
quality_gate_logs memory_usage 10/10 204.01MiB ≤ 220MiB bounds checks dashboard
quality_gate_logs missed_bytes 10/10 0B = 0B bounds checks dashboard
quality_gate_metrics_logs cpu_usage 10/10 361.74 ≤ 2000 bounds checks dashboard
quality_gate_metrics_logs intake_connections 10/10 3 ≤ 6 bounds checks dashboard
quality_gate_metrics_logs memory_usage 10/10 414.26MiB ≤ 475MiB bounds checks dashboard
quality_gate_metrics_logs missed_bytes 10/10 0B = 0B bounds checks dashboard

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".
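
The three criteria above can be sketched as a single predicate. This is an illustration only, not the Regression Detector's code: the `experiment` type and `isRegression` name are hypothetical, and the sample values are copied from the tables in this comment.

```go
package main

import (
	"fmt"
	"math"
)

// experiment holds one row of the change-detection table (hypothetical type).
type experiment struct {
	Name          string
	DeltaMeanPct  float64 // estimated Δ mean %
	CILow, CIHigh float64 // bounds of the 90% confidence interval
	Erratic       bool    // true when the configuration marks the experiment erratic
}

// effectSizeTolerance is the |Δ mean %| threshold quoted in the explanation.
const effectSizeTolerance = 5.0

// isRegression applies the three criteria: large enough effect,
// a confidence interval that excludes zero, and not marked erratic.
func isRegression(e experiment) bool {
	bigEnough := math.Abs(e.DeltaMeanPct) >= effectSizeTolerance
	excludesZero := e.CILow > 0 || e.CIHigh < 0
	return bigEnough && excludesZero && !e.Erratic
}

func main() {
	rows := []experiment{
		{Name: "otlp_ingest_logs", DeltaMeanPct: 1.62, CILow: 1.50, CIHigh: 1.74},
		// Assumed erratic because it is listed under the ignored experiments.
		{Name: "docker_containers_cpu", DeltaMeanPct: -2.12, CILow: -5.10, CIHigh: 0.86, Erratic: true},
	}
	for _, r := range rows {
		fmt.Printf("%s: regression=%v\n", r.Name, isRegression(r))
	}
}
```

Both rows come out as non-regressions: otlp_ingest_logs has an interval that excludes zero but an effect well under 5%, which matches the "No significant changes detected" summary.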

CI Pass/Fail Decision

Passed. All Quality Gates passed.

  • quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
  • quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.

@songy23 songy23 force-pushed the yang.song/OTAGENT-824 branch from 0f93f0b to 5d620f6 Compare March 9, 2026 16:48
@dd-octo-sts dd-octo-sts Bot added team/container-platform The Container Platform Team team/agent-devx labels Mar 10, 2026
@songy23 songy23 requested a review from truthbk March 10, 2026 20:35
Introduces the dogtelextension OTel Collector extension and refactors
otel-agent startup to support standalone mode (DD_OTEL_STANDALONE=true),
enabling the otel-agent to run independently without a core Datadog Agent.

Key changes:

- dogtelextension (comp/otelcol/dogtelextension): New OTel Collector
  extension providing a tagger gRPC server, host metadata submission,
  and secrets resolution for standalone mode.

- Standalone/connected FX split (cmd/otel-agent/subcommands/run):
  Refactors otel-agent startup into commonAgentFxOptions plus mode-
  specific standaloneAgentFxOptions / connectedAgentFxOptions. Standalone
  mode wires local hostname, real secrets backend, local tagger, host
  metadata runner, and disables on-init config sync. Connected mode
  keeps remote hostname, remote tagger, and core-agent config sync.

- K8s tag enrichment (comp/core/workloadmeta/collectors/catalog-otel):
  New catalog-otel workloadmeta catalog (kubelet, containerd, docker,
  ECS, crio, podman) compiled into otel-agent via the new kubelet build
  tag. In standalone mode the infraattributes processor enriches spans,
  metrics, and logs with K8s tags (kube_deployment, kube_namespace,
  pod_name, etc.) via the local tagger.

Deployments require DD_KUBERNETES_KUBELET_HOST=status.hostIP,
DD_KUBELET_TLS_VERIFY=false (or CA cert), and nodes/proxy RBAC on the
otel-agent ServiceAccount for K8s tag enrichment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@songy23 songy23 force-pushed the yang.song/OTAGENT-824 branch from 0dcb28d to 3d7b219 Compare March 10, 2026 21:12
Member

@truthbk truthbk left a comment

Super clean bootstrap! Also love how you were able to bring in the best of both worlds with fx + actual otel extension interfaces; and that resolves the extension configuration issue very cleanly. This is awesome.

We have to talk about what the otel-agent should default to, but this is a great start.

)
}

if acfg.GetBool("otel_standalone") {
Member

So, I have some doubts about this: should we instead consider a check on otel_bundled? Or !acfg.GetBool("otel_standalone")?

On one hand this is better because it's backward compatible with our operator and helm charts. On the other it's not ideal because we'd have to set an env var when deploying with the otel operator/helm. We really do want to make a strong attempt to minimize the number of steps our OTel customers need to take on tooling we don't have full control over. Let's discuss this.

Member Author

Customers would have to set env vars in the otel operator/helm already, e.g. DD_OTELCOLLECTOR_ENABLED. Setting one more env var is probably fine.

Member

I think we should optimize to minimize the number of options a customer needs to set on the OpenTelemetry operator/helm chart. I feel like we can get away with a lot more of that transparently on the DD side.

I'm fine with merging this as-is, but I also think there's a chance we'll want to revisit this specifically.

Comment thread comp/otelcol/dogtelextension/impl/config.go Outdated
songy23 and others added 2 commits March 13, 2026 15:54
…andalone mode

- Apply dogtelextension settings to DD agent pkgconfig only when
  otel_standalone=true; connected mode leaves core agent config untouched.
- Make EnableMetadataCollection a *bool (like KubeletTLSVerify) so absence
  preserves the agent default rather than forcing false.
- Add MetadataInterval default (1800 s) to comment.
- Gate standalone block with pkgconfig.GetBool("otel_standalone").
- Add TestDogtelExtensionConfig_ConnectedModeIgnored to assert dogtelextension
  fields are no-ops in connected mode.
- Tests use DD_OTEL_STANDALONE=true env var for standalone test cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
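
The `*bool` change in the commit above follows a common Go pattern for optional settings: a nil pointer means "not set", so the existing default survives. A minimal sketch, assuming a hypothetical `resolveBool` helper and a trimmed-down `Config` that is not the PR's actual struct:

```go
package main

import "fmt"

// Config is a hypothetical subset of the extension config.
// A nil pointer means the user did not set the field.
type Config struct {
	EnableMetadataCollection *bool
}

// resolveBool returns the override when it is set and the agent default otherwise.
func resolveBool(override *bool, agentDefault bool) bool {
	if override == nil {
		return agentDefault
	}
	return *override
}

func main() {
	off := false
	unset := Config{}
	disabled := Config{EnableMetadataCollection: &off}

	fmt.Println(resolveBool(unset.EnableMetadataCollection, true))    // field absent: default kept
	fmt.Println(resolveBool(disabled.EnableMetadataCollection, true)) // explicit false wins
}
```

With a plain `bool` field the zero value is indistinguishable from an explicit `false`, which is the "forcing false" behavior the commit message describes avoiding.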
Comment thread .github/CODEOWNERS Outdated
@songy23 songy23 force-pushed the yang.song/OTAGENT-824 branch from 60768d2 to 4c5b322 Compare March 13, 2026 20:56
@songy23 songy23 marked this pull request as ready for review March 16, 2026 08:36
@songy23 songy23 requested a review from a team as a code owner March 16, 2026 08:36
Contributor

@jeremy-hanna jeremy-hanna left a comment

👍 for agent-runtime owned files

Member

@truthbk truthbk left a comment

Looks good to me. Added a couple of nits you can feel free to ignore. I do think for the actual standalone vs connected default path we may have to make some changes, but we can do that later once we take on the deployment question more specifically. At that point we'll have a better understanding of what's better.

Comment thread cmd/otel-agent/config/agent_config.go
songy23 and others added 2 commits March 20, 2026 14:26
…er stream subscribers

- getDogtelExtensionConfig now returns an error when multiple dogtel*
  extension entries are found instead of silently picking one
- stopTaggerServer replaces unbounded GracefulStop() with a 5-second
  timeout that falls back to Stop(), preventing long-lived
  TaggerStreamEntities subscribers from blocking otel-agent termination
- Add unit tests for both changes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Member Author

songy23 commented Mar 20, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 90662a7436

Comment thread cmd/otel-agent/subcommands/run/command.go
Comment thread cmd/otel-agent/config/agent_config.go Outdated
songy23 and others added 4 commits March 30, 2026 13:26
…ders list

Setting metadata_interval in the dogtel extension config was replacing
metadata_providers wholesale with a single {name: host} entry, silently
dropping any other providers (e.g. "resources") configured in datadog.yaml.

Read the existing providers first, update the host entry in place (or
append it if absent), then write back the merged list. Handle both
map[string]interface{} and the map[interface{}]interface{} type that YAML
v2 produces for maps inside sequences.

Add a regression test that pre-seeds a "resources" provider in datadog.yaml
and asserts it survives alongside the updated host interval.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
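
A minimal sketch of the merge described in the commit above, assuming each provider entry is a map with `name` and `interval` keys. The function names are hypothetical and this is not the PR's code; it only shows the read, update-in-place-or-append, write-back shape while accepting both map types.

```go
package main

import "fmt"

// providerName reads the "name" key from a provider entry,
// accepting both map shapes a YAML decoder may produce.
func providerName(entry interface{}) (string, bool) {
	switch m := entry.(type) {
	case map[string]interface{}:
		name, ok := m["name"].(string)
		return name, ok
	case map[interface{}]interface{}:
		name, ok := m["name"].(string)
		return name, ok
	}
	return "", false
}

// setInterval writes the interval into an entry of either map shape.
func setInterval(entry interface{}, seconds int) {
	switch m := entry.(type) {
	case map[string]interface{}:
		m["interval"] = seconds
	case map[interface{}]interface{}:
		m["interval"] = seconds
	}
}

// mergeHostInterval updates the host provider in place, or appends a host
// entry when none exists. All other providers are left untouched.
func mergeHostInterval(providers []interface{}, seconds int) []interface{} {
	for _, p := range providers {
		if name, ok := providerName(p); ok && name == "host" {
			setInterval(p, seconds)
			return providers
		}
	}
	return append(providers, map[string]interface{}{"name": "host", "interval": seconds})
}

func main() {
	providers := []interface{}{
		map[interface{}]interface{}{"name": "resources", "interval": 300},
	}
	providers = mergeHostInterval(providers, 900)
	for _, p := range providers {
		name, _ := providerName(p)
		fmt.Println(name)
	}
}
```

The pre-seeded "resources" entry survives the merge, which is the property the regression test in the commit asserts.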
Add TestFxRun_NoDatadogExporter_Standalone and its config fixture to
cover the case where the otel-agent runs in standalone mode with no
datadog exporter in the pipeline.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@songy23 songy23 requested a review from a team as a code owner March 30, 2026 17:39
@songy23 songy23 removed the changelog/no-changelog No changelog entry needed label Mar 30, 2026
@songy23 songy23 requested a review from a team March 30, 2026 17:45
Contributor

@gabedos gabedos left a comment

otel workloadmeta catalog lgtm!

songy23 and others added 5 commits March 30, 2026 14:09
…ta catalog

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… files

Packages with local BUILD.bazel files that load @rules_go (comp/core/log/def,
comp/core/ipc/def) cannot be referenced via @com_github_datadog_... external
repo labels because @rules_go is not visible in the external module context.
Use local //comp/... paths instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Consistent with ddflareextension and ddprofilingextension, exclude the
dogtelextension fx and impl directories from gazelle. The impl/BUILD.bazel
uses local //comp/core/... paths for sub-modules that have their own
BUILD.bazel files, which gazelle would incorrectly revert to @com_github_...
external refs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… dep chain

The gazelle-generated BUILD.bazel files for dogtelextension impl, fx,
metrics, and metadata reference @com_github_datadog_datadog_agent_pkg_metrics
and related external deps. These chain through pkg/util/buf which has a
broken external BUILD.bazel (loads @rules_go unavailable in external context).

Following the pattern of ddflareextension and ddprofilingextension, only
def/BUILD.bazel is retained. The impl subtree cannot be built via Bazel
because its transitive deps (pkg/metrics, pkg/serializer, pkg/util/grpc)
lack local BUILD.bazel files and their external dep chains are broken.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@songy23 songy23 modified the milestones: 7.78.0, 7.79.0 Mar 30, 2026
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot merged commit 2e3b712 into main Mar 30, 2026
283 checks passed
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot deleted the yang.song/OTAGENT-824 branch March 30, 2026 20:53
Labels

  • ask-review: Ask required teams to review this PR
  • internal: Identify a non-fork PR
  • long review: PR is complex, plan time to review it
  • qa/done: QA done before merge and regressions are covered by tests
  • team/agent-build
  • team/agent-configuration
  • team/agent-devx
  • team/agent-runtimes
  • team/container-platform: The Container Platform Team
  • team/opentelemetry-agent

8 participants