Skip to content

fix monitor recreate#3050

Merged
frank-spano merged 2 commits into
mainfrom
fix/datadogmonitor-recreate-on-404
Jun 3, 2026
Merged

fix monitor recreate#3050
frank-spano merged 2 commits into
mainfrom
fix/datadogmonitor-recreate-on-404

Conversation

@frank-spano
Copy link
Copy Markdown
Contributor

What does this PR do?

Fixes the DatadogMonitor controller to recreate a monitor when it has been deleted out-of-band (e.g. from the Datadog UI). The "manifest changed" path now does a GET before the PUT and falls back to create on 404, and the update path catches 404 by clearing status.ID so the next reconcile recreates instead of looping on "Monitor not found".

Motivation

CONS-8333: customer (Grainger) reported the controller getting stuck logging 404s forever when a monitor was deleted in the UI and the CR spec was then edited before the next sync.

Additional Notes

Anything else we should know when reviewing?

Minimum Agent Versions

Are there minimum versions of the Datadog Agent and/or Cluster Agent required?

  • Agent: vX.Y.Z
  • Cluster Agent: vX.Y.Z

Describe your test plan

Write there any instructions and details you may have to test your PR.

Checklist

  • PR has at least one valid label: bug, enhancement, refactoring, documentation, tooling, and/or dependencies
  • PR has a milestone or the qa/skip-qa label
  • All commits are signed (see: signing commits)

@frank-spano frank-spano requested a review from a team May 27, 2026 15:58
@datadog-official

This comment has been minimized.

@frank-spano frank-spano added bug Something isn't working qa/skip-qa labels May 27, 2026
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 42.54%. Comparing base (c85321f) to head (0912768).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3050      +/-   ##
==========================================
+ Coverage   42.43%   42.54%   +0.10%     
==========================================
  Files         337      337              
  Lines       29020    29029       +9     
==========================================
+ Hits        12315    12350      +35     
+ Misses      15895    15869      -26     
  Partials      810      810              
Flag Coverage Δ
unittests 42.54% <100.00%> (+0.10%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
internal/controller/datadogmonitor/controller.go 60.37% <100.00%> (+12.59%) ⬆️

... and 1 file with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c85321f...0912768. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Collaborator

@levan-m levan-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, two minor comments

// instead of looping on 404.
logger.V(1).Info("DatadogMonitor manifest has changed")
shouldUpdate = true
_, err = r.get(auth, instance, newStatus)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - this and next if block has become identical and could be merged.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DataDog fix this

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can only run on private repositories.

assert.NoError(t, err)
rec.client.Get(context.TODO(), types.NamespacedName{Name: resourcesName, Namespace: resourcesNamespace}, dm)

assert.Equal(t, 0, dm.Status.ID, "status.ID should be cleared after update returns 404 so next reconcile recreates")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For completness sake, do we need to assert on next reconcile outcome?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DataDog fix this

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can only run on private repositories.

@frank-spano frank-spano merged commit dad893d into main Jun 3, 2026
38 checks passed
@frank-spano frank-spano deleted the fix/datadogmonitor-recreate-on-404 branch June 3, 2026 14:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants