
Component reconcile refactor#2434

Merged
levan-m merged 15 commits into main from levan-m/component-reconcile-refactor
Feb 19, 2026

@levan-m
Collaborator

@levan-m levan-m commented Dec 26, 2025

What does this PR do?

The ultimate goal of this change is to move Reconcile and Cleanup into ComponentRegistry and remove them from the ComponentReconciler interface. The assumption is that these two should be almost the same for all components (except maybe the Agent). Two hooks are still necessary: 1) cleaning up the DCA RBAC, and 2) deleting the CLC if the DCA is disabled. These hooks are added to the interface.

ComponentReconciler/ComponentRegistry refactor following the approach in #2380.

  • eb5b177 carries over changes from Levan m/dca ccr reconciler refactor #2380. It mostly affects the DCA and CLC components, and also adds controller_reconcile_deployment_test.go to assert on existing behavior.
  • e66e910 adds DDAI support in controller_reconcile_deployment_test.go but disables it because the tests fail. There seems to be some variation between the DDA and DDAI components, so this change can't be a drop-in replacement for the DDAI components.
  • 3ea55d3 changes function signatures to align the DCA and CLC components.
  • b96dc88 should be the only functional change in the PR. It splits the cleanupV2*** functions into two parts: 1) deleting the deployment and 2) updating the status. The regular reconcile flow executes both, while cleanupOld*** only deletes the deployment. The reasoning is that deleting old/stale deployments after a rename shouldn't clean up the status on the DDA.
  • 1f6f9e6 moves the deployment deletion part to controller_reconcile_v2_helpers.go, as it is no longer component specific.
  • 04a98b6 adds getter functions to the interface so they can be used in ComponentRegistry, plus some minor cleanup.
  • f9eb064 moves ComponentReconciler.Reconcile() from the individual components to a common implementation in ComponentRegistry and removes the function from the interface.
  • 17063f8 does the same for Cleanup.
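The cleanup split described for b96dc88 can be sketched roughly as follows. This assumes toy Cluster and Status types in place of Kubernetes objects and the DDA status; deleteDeployment, updateStatusOnCleanup, cleanup, and cleanupOld are hypothetical stand-ins for the cleanupV2*** / cleanupOld*** functions.

```go
package main

import "fmt"

// Hypothetical, simplified types: the real code works on Kubernetes objects
// and the DatadogAgent status.
type Status struct {
	Conditions map[string]string
}

type Cluster struct {
	Deployments map[string]bool
}

// deleteDeployment is the component-agnostic half of the split.
func deleteDeployment(c *Cluster, name string) {
	delete(c.Deployments, name)
}

// updateStatusOnCleanup is the other half: clear the component's status.
func updateStatusOnCleanup(s *Status, component string) {
	delete(s.Conditions, component)
}

// cleanup is the regular path for a disabled component: both halves run.
func cleanup(c *Cluster, s *Status, component, deployment string) {
	deleteDeployment(c, deployment)
	updateStatusOnCleanup(s, component)
}

// cleanupOld removes a stale deployment left behind by a rename. The
// component still exists under its new name, so its status is kept.
func cleanupOld(c *Cluster, oldDeployment string) {
	deleteDeployment(c, oldDeployment)
}

func main() {
	cluster := &Cluster{Deployments: map[string]bool{"old-name": true, "new-name": true}}
	status := &Status{Conditions: map[string]string{"clusterAgent": "Available"}}

	cleanupOld(cluster, "old-name")
	fmt.Println(len(cluster.Deployments), status.Conditions["clusterAgent"])
}
```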

Motivation

What inspired you to submit this pull request?

Additional Notes

Anything else we should know when reviewing?

Minimum Agent Versions

Are there minimum versions of the Datadog Agent and/or Cluster Agent required?

  • Agent: vX.Y.Z
  • Cluster Agent: vX.Y.Z

Describe your test plan

Write here any instructions and details you may have to test your PR.

Checklist

  • PR has at least one valid label: bug, enhancement, refactoring, documentation, tooling, and/or dependencies
  • PR has a milestone or the qa/skip-qa label

@levan-m levan-m modified the milestones: v1.23.0, v1.22.0 Dec 26, 2025
@levan-m levan-m force-pushed the levan-m/component-reconcile-refactor branch from 57dcb1e to 17063f8 December 26, 2025 19:16
@codecov-commenter

codecov-commenter commented Dec 26, 2025

Codecov Report

❌ Patch coverage is 74.21053% with 49 lines in your changes missing coverage. Please review.
✅ Project coverage is 38.60%. Comparing base (c645079) to head (4414dd8).
⚠️ Report is 9 commits behind head on main.

Files with missing lines                               Patch %  Lines
...al/controller/datadogagent/component_reconciler.go  68.42%   11 Missing and 7 partials ⚠️
...er/datadogagent/controller_reconcile_v2_helpers.go  84.84%   6 Missing and 4 partials ⚠️
...troller/datadogagent/component_otelagentgateway.go  50.00%   8 Missing ⚠️
pkg/testutils/builder.go                               0.00%    5 Missing ⚠️
...ller/datadogagent/component_clusterchecksrunner.go  90.90%   1 Missing and 1 partial ⚠️
...adogagentinternal/component_clusterchecksrunner.go  0.00%    2 Missing ⚠️
...datadogagentinternal/component_otelagentgateway.go  0.00%    2 Missing ⚠️
...adogagent/component/clusterchecksrunner/default.go  0.00%    1 Missing ⚠️
...datadogagent/component/otelagentgateway/default.go  0.00%    1 Missing ⚠️
Additional details and impacted files


@@            Coverage Diff             @@
##             main    #2434      +/-   ##
==========================================
+ Coverage   38.33%   38.60%   +0.26%     
==========================================
  Files         305      305              
  Lines       26203    26685     +482     
==========================================
+ Hits        10046    10302     +256     
- Misses      15398    15604     +206     
- Partials      759      779      +20     
Flag       Coverage Δ
unittests  38.60% <74.21%> (+0.26%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines                               Coverage Δ
.../controller/datadogagent/component_clusteragent.go  82.35% <100.00%> (+10.74%) ⬆️
...adogagent/component/clusterchecksrunner/default.go  10.52% <0.00%> (ø)
...datadogagent/component/otelagentgateway/default.go  0.00% <0.00%> (ø)
...ller/datadogagent/component_clusterchecksrunner.go  93.75% <90.90%> (+15.37%) ⬆️
...adogagentinternal/component_clusterchecksrunner.go  40.25% <0.00%> (ø)
...datadogagentinternal/component_otelagentgateway.go  0.00% <0.00%> (ø)
pkg/testutils/builder.go                               0.00% <0.00%> (ø)
...troller/datadogagent/component_otelagentgateway.go  61.53% <50.00%> (+17.78%) ⬆️
...er/datadogagent/controller_reconcile_v2_helpers.go  63.67% <84.84%> (+6.42%) ⬆️
...al/controller/datadogagent/component_reconciler.go  70.96% <68.42%> (-6.23%) ⬇️

... and 9 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c645079...4414dd8.


@levan-m levan-m marked this pull request as ready for review December 26, 2025 19:46
@levan-m levan-m requested a review from a team as a code owner December 26, 2025 19:46
Member

@tbavelier tbavelier left a comment

We also have the implementation in the DDAI code path, which was not changed to follow the same pattern.
The bug I described below only applies to the DDA code path.

condition.UpdateDatadogAgentStatusConditions(
params.Status,
-now,
+metav1.NewTime(time.Now()),
Member

Is this intended, instead of using the function-level now?

Collaborator Author

Since this was moved to a different function, I left it as is. I don't see a reason to use the same value in all functions, and cleanup is an entirely separate code branch anyway.

Member

We have an issue here: if the component override struct is not empty but the component is not explicitly disabled, we reconcile a component that should not be reconciled (CCR is not enabled by default).

  features:
    clusterChecks:
      enabled: true
      useClusterChecksRunners: false
  override:
    clusterChecksRunner:
      disabled: true
      containers:
        agent:
          resources:
            requests:
              cpu: 256m

-> we get a CCR deployment (no pods are scheduled because of missing dependencies, but the component shouldn't be reconciled at all).

We could exit early instead of reconciling further:

	// Explicit override disable always wins (and may create a conflict condition if the component is otherwise enabled).
	if ok && apiutils.BoolValue(componentOverride.Disabled) {
		if componentEnabled {
			// The override supersedes what's set in requiredComponents; update status to reflect the conflict
			condition.UpdateDatadogAgentStatusConditions(
				params.Status,
				now,
				common.OverrideReconcileConflictConditionType,
				metav1.ConditionTrue,
				"OverrideConflict",
				fmt.Sprintf("%s component is set to disabled", component.Name()),
				true,
			)
		}
		return r.Cleanup(ctx, params, component)
	}

	// If the component isn't enabled, we should cleanup regardless of whether an override struct exists.
	// (Overrides should not implicitly enable a component.)
	if !componentEnabled {
		return r.Cleanup(ctx, params, component)
	}

	// Apply non-disable overrides.
	if ok {
		override.PodTemplateSpec(params.Logger, podManagers, componentOverride, component.Name(), params.DDA.Name)
		override.Deployment(deployment, componentOverride)
	}
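The proposed ordering can be isolated into a pure decision function, which makes the CCR case above easy to unit-test. This is a sketch under assumptions: Override, decide, and the Action values are hypothetical simplifications of the real override struct and reconcile flow, and the returned conflict flag stands in for setting OverrideReconcileConflictConditionType.

```go
package main

import "fmt"

// Action is what the registry decides to do for one component.
type Action string

const (
	ActionCleanup   Action = "cleanup"
	ActionReconcile Action = "reconcile"
)

// Override is a hypothetical stand-in for the per-component override struct.
type Override struct {
	Disabled *bool
}

// decide mirrors the proposed ordering:
//  1. an explicit override disable always wins (conflict if otherwise enabled),
//  2. a component that isn't enabled is cleaned up even if an override exists,
//  3. otherwise non-disable overrides are applied and the component reconciled.
func decide(componentEnabled bool, override *Override) (action Action, conflict bool) {
	if override != nil && override.Disabled != nil && *override.Disabled {
		return ActionCleanup, componentEnabled
	}
	if !componentEnabled {
		return ActionCleanup, false
	}
	return ActionReconcile, false
}

func main() {
	disabled := true
	// CCR not enabled, override present with disabled: true.
	fmt.Println(decide(false, &Override{Disabled: &disabled}))
	// CCR not enabled, override only sets resources: still cleaned up.
	fmt.Println(decide(false, &Override{}))
	// Enabled but explicitly disabled by override: cleanup plus conflict.
	fmt.Println(decide(true, &Override{Disabled: &disabled}))
}
```

Keeping the decision separate from the side effects (cleanup, status update, applying overrides) is what allows each branch to be covered by a small table-driven test.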

Collaborator Author

Good find. Yes, I implemented similar logic, but in a slightly different way: I separated cleanup (except forceCleanup) from creation/update and moved it into ReconcileComponent in f9072fe. I also added a test to cover this case.

continue
return r.Cleanup(ctx, params, component)
}
override.PodTemplateSpec(params.Logger, podManagers, componentOverride, component.Name(), params.DDA.Name)
Member

Suggested change
override.PodTemplateSpec(params.Logger, podManagers, componentOverride, component.Name(), params.DDA.Name)
override.PodTemplateSpec(deploymentLogger, podManagers, componentOverride, component.Name(), params.DDA.Name)

Collaborator Author

fixed in f9072fe

@levan-m levan-m modified the milestones: v1.22.0, v1.23.0, v1.24.0 Jan 21, 2026
Member

@tbavelier tbavelier left a comment

Thanks! One nit about the usage of now, but it doesn't matter much.
AI-generated: There is also possible boilerplate reduction in the components by moving some common parts into a "base" (70a3418)? However, this risks becoming a "god" object that gets more cluttered every time we need to tweak it, and it is yet another layer of abstraction, which makes the flow harder to understand. The gain is that each component implementation becomes simpler. To discuss offline.
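For illustration, a "base" along these lines could use Go struct embedding: shared defaults live in one embedded type, and each component overrides only the methods that differ. All type and method names below are hypothetical. The tradeoff noted above still applies, since every default added to the base silently becomes behavior for all components.

```go
package main

import "fmt"

// baseComponent is a hypothetical "base" holding defaults shared by all
// components; concrete components embed it and override only what differs.
type baseComponent struct {
	name string
}

func (b baseComponent) Name() string { return b.name }

// CleanupDependencies is a no-op default hook.
func (b baseComponent) CleanupDependencies() error { return nil }

// clusterAgent embeds the base and overrides the cleanup hook (e.g. RBAC).
type clusterAgent struct {
	baseComponent
	rbacDeleted bool
}

func (c *clusterAgent) CleanupDependencies() error {
	c.rbacDeleted = true
	return nil
}

// otelAgentGateway relies entirely on the base defaults.
type otelAgentGateway struct {
	baseComponent
}

type component interface {
	Name() string
	CleanupDependencies() error
}

func main() {
	dca := &clusterAgent{baseComponent: baseComponent{name: "clusterAgent"}}
	gw := &otelAgentGateway{baseComponent: baseComponent{name: "otelAgentGateway"}}
	for _, c := range []component{dca, gw} {
		if err := c.CleanupDependencies(); err != nil {
			panic(err)
		}
		fmt.Println(c.Name())
	}
	fmt.Println(dca.rbacDeleted)
}
```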

condition.UpdateDatadogAgentStatusConditions(
params.Status,
-now,
+metav1.NewTime(time.Now()),
Member

Suggested change
metav1.NewTime(time.Now()),
now,

with now := metav1.NewTime(time.Now()) declared before the loop? It doesn't matter much, since it only happens in case of a conflict, but it would be "consistent" with other places.

@levan-m levan-m merged commit 808adb9 into main Feb 19, 2026
33 checks passed
@levan-m levan-m deleted the levan-m/component-reconcile-refactor branch February 19, 2026 21:26

Projects

None yet


3 participants