[DatadogMonitor] Unregister metrics forwarder on finalization#2804
[DatadogMonitor] Unregister metrics forwarder on finalization#2804
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4f71bcbad8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if err != nil { | ||
| logger.Error(err, "failed to finalize monitor", "Monitor ID", fmt.Sprint(dm.Status.ID)) | ||
|
|
||
| return | ||
| return err | ||
| } |
There was a problem hiding this comment.
Treat monitor-not-found as successful finalization
Returning every deleteMonitor error here can permanently block CR deletion when the Datadog monitor was already removed out-of-band (for example, deleted in Datadog UI before deleting the DatadogMonitor resource). In that case Datadog returns a 404, this path keeps the finalizer forever, and the Kubernetes object stays in Terminating until users manually patch finalizers. The controller already treats 404 as a recoverable state in normal reconcile (ctrutils.NotFoundString), so finalization should do the same.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Addressed in b625500 : indeed, an already deleted monitor (from the UI / somewhere else) should not prevent the finalization to complete
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2804 +/- ##
==========================================
- Coverage 38.83% 38.57% -0.27%
==========================================
Files 309 311 +2
Lines 26906 27422 +516
==========================================
+ Hits 10450 10579 +129
- Misses 15674 16054 +380
- Partials 782 789 +7
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 6 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
| if r.forwarders != nil { | ||
| r.forwarders.Unregister(dm) | ||
| } |
There was a problem hiding this comment.
This is the part that fixes the leak
* [DatadogMonitor] Unregister metrics forwarder on finalization * ignore 404 not found and consider success (cherry picked from commit 15a35f5)
What does this PR do?
DatadogMonitor, make sure to un-register its associated metrics forwarder to close the channel404: if a monitor in the process of being deleted is not found, it was already deleted from the UI/api somewhere else, so we should keep proceeding with the deletion. All other errors should be retriedMotivation
Fixes #2803 -> goroutine leak
Additional Notes
Anything else we should know when reviewing?
Minimum Agent Versions
Are there minimum versions of the Datadog Agent and/or Cluster Agent required?
Describe your test plan
--datadogMonitorEnabled=true(requires addingDD_API_KEYandDD_APP_KEYto your operator deployment)k port-forward deploy/datadog-operator-manager 8080:8080curl -s localhost:8080/metrics | grep -i go_goroutinesNAMESPACE=<operator namespace> bash datadogmonitor-load-test.sh create 100NAMESPACE=<operator namespace> bash datadogmonitor-load-test.sh deleteLoad test script
Checklist
bug,enhancement,refactoring,documentation,tooling, and/ordependenciesqa/skip-qalabel