Conversation
…ts/utils; test pass
57dcb1e to
17063f8
Compare
tbavelier
left a comment
There was a problem hiding this comment.
We also have the implementation in the DDAI code path, which was not changed to follow the same pattern.
The bug I described below only applies in the DDA code path
| condition.UpdateDatadogAgentStatusConditions( | ||
| params.Status, | ||
| now, | ||
| metav1.NewTime(time.Now()), |
There was a problem hiding this comment.
Is this intended instead of using the function-level now ?
There was a problem hiding this comment.
Since this is moved to a different function I left as is, don't see reason to have same in all functions and cleanup is entirely separate code branch anyway.
There was a problem hiding this comment.
We have an issue here: if the component override struct is not empty but not explicitly disabled, we reconcile a component that should not be possibly (CCR is not enabled by default).
features:
clusterChecks:
enabled: true
useClusterChecksRunners: false
override:
clusterChecksRunner:
disabled: true
containers:
agent:
resources:
requests: 256m-> we get a CCR deployment (no pods scheduled cuz missing dependencies but the component shouldnt be reconciled at all)
We could early exit instead of further reconciling:
// Explicit override disable always wins (and may create a conflict condition if the component is otherwise enabled).
if ok && apiutils.BoolValue(componentOverride.Disabled) {
if componentEnabled {
// The override supersedes what's set in requiredComponents; update status to reflect the conflict
condition.UpdateDatadogAgentStatusConditions(
params.Status,
now,
common.OverrideReconcileConflictConditionType,
metav1.ConditionTrue,
"OverrideConflict",
fmt.Sprintf("%s component is set to disabled", component.Name()),
true,
)
}
return r.Cleanup(ctx, params, component)
}
// If the component isn't enabled, we should cleanup regardless of whether an override struct exists.
// (Overrides should not implicitly enable a component.)
if !componentEnabled {
return r.Cleanup(ctx, params, component)
}
// Apply non-disable overrides.
if ok {
override.PodTemplateSpec(params.Logger, podManagers, componentOverride, component.Name(), params.DDA.Name)
override.Deployment(deployment, componentOverride)
}There was a problem hiding this comment.
Good find. Yea implemented similar logic but in a bit different way. I separated cleanup (except forceCleanup) from creation/update and moved in ReconcileComponent in f9072fe. Also added test to cover the case.
| continue | ||
| return r.Cleanup(ctx, params, component) | ||
| } | ||
| override.PodTemplateSpec(params.Logger, podManagers, componentOverride, component.Name(), params.DDA.Name) |
There was a problem hiding this comment.
| override.PodTemplateSpec(params.Logger, podManagers, componentOverride, component.Name(), params.DDA.Name) | |
| override.PodTemplateSpec(deploymentLogger, podManagers, componentOverride, component.Name(), params.DDA.Name) |
…ing to deployment creation
Co-authored-by: Timothée Bavelier <97530782+tbavelier@users.noreply.github.com>
tbavelier
left a comment
There was a problem hiding this comment.
Thanks! One nit about now usage, but doesn't matter much.
AI-generated: Also possible boilerplate reduction in component by moving some common parts in a "base" 70a3418 ? However, this has the risk of being a "god" object that gets cluttered more and more every time we need to tweak it further + it's once again a new layer of abstraction so more complexity in understanding the flow. Gain is that each component implementation is easier. To discuss offline
| condition.UpdateDatadogAgentStatusConditions( | ||
| params.Status, | ||
| now, | ||
| metav1.NewTime(time.Now()), |
There was a problem hiding this comment.
| metav1.NewTime(time.Now()), | |
| now, |
with now := metav1.NewTime(time.Now()) before the loop ? Doesn't matter much, just happens in case of conflict, but just to be "consistent" with other places ?
What does this PR do?
Ultimate goal of this change is to move
ReconcileandCleanupto ComponentRegistry and remove those from theComponentReconcilerinterface. Assumption is that these two should be almost same for all components (except maybe Agent). Two hooks are still necessary: 1) for cleaning up DCA RBAC 2) deleting CLC if DCA is disabled. These are added to the interface.ComponentReconciler/Registry refactor following approach in #2380.
controller_reconcile_deployment_test.goto assert on existing behavior.controller_reconcile_deployment_test.gobut disables as tests fail. There seems some variation between DDA/DDAI components, hence this change can't be drop-in replacement of DDAI components.cleanupV2***functions in two parts: 1) deleting deployment 2) updating status. Regular reconcile flow executes both,cleanupOld***only deployment deletion. Reasoning is that deleting old/stale deployments after rename shouldn't cleanup status on DDA.controller_reconcile_v2_helpers.goas this is not component specific any more.ComponentRegistry, some minor cleanup.ComponentReconcilerReconcile()from components to a common one inComponentRegistry, removes interface function.Cleanup.Motivation
What inspired you to submit this pull request?
Additional Notes
Anything else we should know when reviewing?
Minimum Agent Versions
Are there minimum versions of the Datadog Agent and/or Cluster Agent required?
Describe your test plan
Write there any instructions and details you may have to test your PR.
Checklist
bug,enhancement,refactoring,documentation,tooling, and/ordependenciesqa/skip-qalabel