Skip to content

[Backport v1.25] [DatadogMonitor] Unregister metrics forwarder on finalization#2824

Merged
tbavelier merged 1 commit intov1.25from
backport-2804-to-v1.25
Mar 25, 2026
Merged

[Backport v1.25] [DatadogMonitor] Unregister metrics forwarder on finalization#2824
tbavelier merged 1 commit intov1.25from
backport-2804-to-v1.25

Conversation

@dd-octo-sts
Copy link

@dd-octo-sts dd-octo-sts bot commented Mar 25, 2026

Backport 15a35f5 from #2804.


What does this PR do?

  • On finalization (deletion) of a DatadogMonitor, make sure to un-register its associated metrics forwarder to close the channel
  • If error on deletion attempt, do not remove finalizer and instead requeue for a next attempt except in the case of a 404: if a monitor in the process of being deleted is not found, it was already deleted from the UI/api somewhere else, so we should keep proceeding with the deletion. All other errors should be retried

Motivation

Fixes #2803 -> goroutine leak

Additional Notes

Anything else we should know when reviewing?

Minimum Agent Versions

Are there minimum versions of the Datadog Agent and/or Cluster Agent required?

  • Agent: vX.Y.Z
  • Cluster Agent: vX.Y.Z

Describe your test plan

  • Deploy with --datadogMonitorEnabled=true (requires adding DD_API_KEY and DD_APP_KEY to your operator deployment)
  • Expose locally the operator deploy to get goroutines (note: the port might be different based on the deployment method) k port-forward deploy/datadog-operator-manager 8080:8080
  • Once it's leader, check the baseline number of goroutines: curl -s localhost:8080/metrics | grep -i go_goroutines
  • Create 100 DatadogMonitor with NAMESPACE=<operator namespace> bash datadogmonitor-load-test.sh create 100
  • Wait for a minute or two, and verify the new number of goroutines is baseline + ~ 100 (1 goroutine per monitor)
  • Delete the monitors with NAMESPACE=<operator namespace> bash datadogmonitor-load-test.sh delete
  • Verify after a minute or two that the number of goroutines is back to baseline (around 100 less goroutines)

Load test script

#!/usr/bin/env bash

set -euo pipefail

usage() {
  cat <<'EOF'
Usage:
  datadogmonitor-load-test.sh create [count]
  datadogmonitor-load-test.sh delete

Environment variables:
  NAMESPACE     Target namespace. Default: datadog-monitor-load-test
  PREFIX        Resource name prefix. Default: leak-check
  LABEL_KEY     Label key used for cleanup. Default: load-test.datadoghq.com/group
  LABEL_VALUE   Label value used for cleanup. Default: datadogmonitor-leak
  QUERY         Monitor query. Default: avg(last_5m):avg:system.cpu.user{*} > 100
  MESSAGE       Monitor message. Default: DatadogMonitor leak test

Examples:
  bash hack/datadogmonitor-load-test.sh create 1000
  bash hack/datadogmonitor-load-test.sh delete
EOF
}

if [[ $# -lt 1 ]]; then
  usage
  exit 1
fi

mode="$1"
count="${2:-1000}"
namespace="${NAMESPACE:-datadog-monitor-load-test}"
prefix="${PREFIX:-leak-check}"
label_key="${LABEL_KEY:-load-test.datadoghq.com/group}"
label_value="${LABEL_VALUE:-datadogmonitor-leak}"
query="${QUERY:-avg(last_5m):avg:system.load.1{*\} > 100}"
message="${MESSAGE:-DatadogMonitor leak test}"

yaml_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

case "$mode" in
  create)
    query_escaped="$(yaml_escape "$query")"
    message_escaped="$(yaml_escape "$message")"
    kubectl get namespace "$namespace" >/dev/null 2>&1 || kubectl create namespace "$namespace" >/dev/null
    {
      echo "apiVersion: v1"
      echo "kind: List"
      echo "items:"
      for i in $(seq 1 "$count"); do
        cat <<EOF
- apiVersion: datadoghq.com/v1alpha1
  kind: DatadogMonitor
  metadata:
    name: ${prefix}-$(printf "%05d" "$i")
    namespace: ${namespace}
    labels:
      ${label_key}: "${label_value}"
  spec:
    name: "${prefix}-$(printf "%05d" "$i")"
    type: "metric alert"
    query: "${query_escaped}"
    message: "${message_escaped}"
EOF
      done
    } | kubectl apply -f -
    ;;
  delete)
    kubectl delete datadogmonitors.datadoghq.com -n "$namespace" -l "${label_key}=${label_value}" --ignore-not-found
    ;;
  *)
    usage
    exit 1
    ;;
esac

Checklist

  • PR has at least one valid label: bug, enhancement, refactoring, documentation, tooling, and/or dependencies
  • PR has a milestone or the qa/skip-qa label
  • All commits are signed (see: signing commits)

* [DatadogMonitor] Unregister metrics forwarder on finalization

* ignore 404 not found and consider success

(cherry picked from commit 15a35f5)
@dd-octo-sts dd-octo-sts bot added the bug Something isn't working label Mar 25, 2026
@dd-octo-sts dd-octo-sts bot requested a review from a team as a code owner March 25, 2026 12:56
@dd-octo-sts dd-octo-sts bot added qa/skip-qa backport label added by backport action bot label added by backport bot labels Mar 25, 2026
@dd-octo-sts dd-octo-sts bot requested a review from a team as a code owner March 25, 2026 12:56
@dd-octo-sts dd-octo-sts bot added bug Something isn't working qa/skip-qa backport label added by backport action bot label added by backport bot labels Mar 25, 2026
@dd-octo-sts dd-octo-sts bot added this to the v1.25.0 milestone Mar 25, 2026
@codecov-commenter
Copy link

codecov-commenter commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 27.27273% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 38.64%. Comparing base (776b124) to head (0d49848).

Files with missing lines Patch % Lines
internal/controller/datadogmonitor/finalizer.go 28.57% 3 Missing and 2 partials ⚠️
internal/controller/datadogmonitor/monitor.go 25.00% 2 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##            v1.25    #2824      +/-   ##
==========================================
- Coverage   38.65%   38.64%   -0.01%     
==========================================
  Files         309      309              
  Lines       26727    26734       +7     
==========================================
+ Hits        10330    10331       +1     
- Misses      15623    15627       +4     
- Partials      774      776       +2     
Flag Coverage Δ
unittests 38.64% <27.27%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
internal/controller/datadogmonitor/monitor.go 73.59% <25.00%> (-0.70%) ⬇️
internal/controller/datadogmonitor/finalizer.go 60.00% <28.57%> (-7.75%) ⬇️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 776b124...0d49848. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tbavelier tbavelier merged commit 1dd69d7 into v1.25 Mar 25, 2026
75 of 76 checks passed
@tbavelier tbavelier deleted the backport-2804-to-v1.25 branch March 25, 2026 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport label added by backport action bot label added by backport bot bug Something isn't working qa/skip-qa team/container-platform

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants