
Ssa #302

Draft

fwiesel wants to merge 14 commits into main from ssa

Conversation

@fwiesel
Contributor

@fwiesel fwiesel commented May 6, 2026

Summary by CodeRabbit

  • Refactor
    • Modernized internal status update mechanisms across multiple controllers to use Kubernetes apply configurations for improved reliability and clearer field ownership semantics.
    • Removed legacy patch-retry status update utilities in favor of server-side apply approaches.
    • Enhanced status condition handling with new utilities for applying configuration-based conditions.

@coderabbitai

coderabbitai Bot commented May 6, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

📝 Walkthrough

This PR systematically migrates status update patterns across eleven controller files from retry-based HTTP PATCH operations to Kubernetes server-side apply (SSA) using ApplyConfigurations. A new utility module provides helpers for working with ApplyConfiguration-style conditions, and the old patch-with-retry infrastructure is removed.

Changes

Unified Status Apply Migration

  • Utility Foundation: internal/utils/conditions.go
    New helpers ConditionsFromStatus and SetApplyConfigurationStatusCondition convert metav1.Conditions to ApplyConfiguration format and manage condition state (LastTransitionTime, status changes) for SSA workflows.
  • Removed Legacy Infrastructure: internal/utils/status_patch.go
    Removed PatchHypervisorStatusWithRetry and StatusPatchBackoff; controllers now use native Kubernetes ApplyConfiguration apply() instead of retry-based patching.
  • Controller Status Refactoring: internal/controller/aggregates_controller.go, hypervisor_controller.go, hypervisor_taint_controller.go, ready/controller.go
    Build status via apiv1.HypervisorStatus(), populate conditions using ConditionsFromStatus and SetApplyConfigurationStatusCondition, and apply with Status().Apply(ctx, ..., FieldOwner, ForceOwnership); replaces DeepCopy + equality checks + PatchHypervisorStatusWithRetry.
  • Eviction & Maintenance Status: internal/controller/eviction_controller.go, hypervisor_maintenance_controller.go
    Introduce evictionStatusCfg/statusCfg helpers to pre-populate status objects; refactor reconcile paths (preflight, migration, finish) to apply status via SSA instead of in-memory mutations + updateStatus patches; new reconcileComputeService and reconcileEviction methods delegate to a centralized status config.
  • Instance HA & Traits Updates: internal/controller/hypervisor_instance_ha_controller.go, traits_controller.go
    Replace in-memory condition mutations and patch retry with applyconfig-based status construction; compute desired condition state upfront (HA enable/disable, trait changes) and apply once via Status().Apply with proper ownership.
  • Onboarding & Certificate Refactoring: internal/controller/onboarding_controller.go, node_certificate_controller.go
    Onboarding: refactor lifecycle functions (initialOnboarding, smokeTest, completeOnboarding, abortOnboarding) to accept a statusCfg and an apply callback; introduce a lookupNovaProperties helper. Certificate: switch from a CreateOrUpdate retry loop to a single applyconfig-based Certificate construction with ForceOwnership.
  • Remaining Controllers: internal/controller/offboarding_controller.go
    Replace direct patch operations with an applyStatus helper that leverages server-side apply for condition updates.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • cobaltcore-dev/openstack-hypervisor-operator#286: Both PRs modify the Ready controller's status logic—retrieved PR adds the ReadyController while this PR refactors its status mechanism to use server-side apply.
  • cobaltcore-dev/openstack-hypervisor-operator#293: Both PRs touch the same controller status-update patterns across aggregates, offboarding, traits, and ready controllers but with opposing approaches—this PR replaces retry-patch with SSA while the retrieved PR uses PatchHypervisorStatusWithRetry.
  • cobaltcore-dev/openstack-hypervisor-operator#300: This PR refactors controllers to build ConditionApplyConfigurations for SSA, while the retrieved PR updates CRD annotations to enable server-side condition merging—directly complementary changes.

Suggested reviewers

  • notandy
  • mchristianl

Poem

🐰 Whiskers twitch with glee—
No more patches, retries flee!
Server-side apply now reigns supreme,
Conditions merge like a controller's dream.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Title check: ❓ Inconclusive. The title 'Ssa' is vague and provides no meaningful information about the changeset beyond a cryptic abbreviation. Resolution: use a descriptive title that summarizes the main change, such as 'Refactor status updates to use server-side apply (SSA)'.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.



fwiesel added 14 commits May 6, 2026 13:35
Replace MergeFromWithOptimisticLock + retry with Status().Apply() for
all eviction status updates. Introduces ConditionsFromStatus and
SetApplyConfigurationStatusCondition helpers to utils — mirroring
meta.SetStatusCondition semantics — as the foundation for all SSA
condition management. The evictionStatusCfg helper seeds every apply
with the full set of owned fields so SSA never prunes values between
reconcile cycles.
Replace PatchHypervisorStatusWithRetry with Status().Apply() for the
InternalIP scalar and ConditionTypeTerminating condition. Re-fetch the
hypervisor after the status apply so the subsequent spec patch sees a
fresh resourceVersion.
Replace PatchHypervisorStatusWithRetry with Status().Apply() for the
ConditionTypeTainted condition.
Replace PatchHypervisorStatusWithRetry with Status().Apply() for the
ConditionTypeHaEnabled condition.
Replace PatchHypervisorStatusWithRetry with Status().Apply() for
ConditionTypeHypervisorDisabled, ConditionTypeEvicting, and the Evicted
scalar. ConditionTypeEvicting is removed by omitting it from the apply
config — SSA prunes map-list entries no longer claimed by the sole owner.
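Removal-by-omission, as described in the commit above, needs no delete call at all: since conditions are an SSA map-list keyed by type, a sole owner that stops including an entry in its apply configuration causes the server to prune it. A stdlib sketch with stand-in types:

```go
package main

import "fmt"

// condCfg stands in for a condition apply configuration entry.
type condCfg struct{ Type, Status string }

// withoutType builds the next apply's conditions list minus one type.
// With SSA, omitting the entry is what deletes it server-side when this
// field manager is its sole owner.
func withoutType(conds []condCfg, drop string) []condCfg {
	out := make([]condCfg, 0, len(conds))
	for _, c := range conds {
		if c.Type != drop {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	current := []condCfg{
		{Type: "HypervisorDisabled", Status: "True"},
		{Type: "Evicting", Status: "True"},
	}
	applied := withoutType(current, "Evicting")
	fmt.Println(len(applied), applied[0].Type)
}
```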
Replace PatchHypervisorStatusWithRetry with Status().Apply(). The Nova
ID lookup is inlined into Reconcile and combined with the initial
condition in a single apply, avoiding two applies from the same field
manager in one reconcile cycle which would cause SSA to prune the IDs.
HypervisorID and ServiceID are included in every subsequent apply to
retain ownership.
Replace PatchHypervisorStatusWithRetry with Status().Apply() for the
ConditionTypeOffboarded condition.
Replace PatchHypervisorStatusWithRetry with Status().Apply() for
ConditionTypeAggregatesUpdated and the Aggregates slice. The error path
includes the current Aggregates in the apply to prevent SSA from pruning
previously-set values on a transient OpenStack error. Clearing aggregates
to an empty slice uses a targeted merge patch to work around the omitempty
limitation in the generated apply configuration type.
Replace PatchHypervisorStatusWithRetry with Status().Apply() for
ConditionTypeTraitsUpdated and the Traits slice. Error paths pass the
current hv.Status.Traits to retain ownership and avoid SSA releasing
it on transient placement API failures.
…rStatusWithRetry

Replace the last caller of PatchHypervisorStatusWithRetry with
Status().Apply() for ConditionTypeReady. With no remaining callers,
PatchHypervisorStatusWithRetry and StatusPatchBackoff are removed.
Replace Create + SetControllerReference with Apply for the Eviction CR.
The owner reference and labels are set in the apply configuration
metadata, and SSA handles the upsert. A Get is still performed after the
apply to read the current eviction status conditions.
Replace CreateOrUpdate + SetOwnerReference with Apply for the cert-manager
Certificate CR. The owner reference is set in the apply configuration
metadata. The retry-on-conflict loop is no longer needed since SSA handles
concurrent updates correctly. The controller now applies the full
certificate spec on every reconcile so IP/DNS changes are picked up
immediately.
…ctions

With SSA the status config is built once in Reconcile and passed to
reconcileComputeService and reconcileEviction which mutate it directly.
This eliminates the intermediate condition return values and makes the
single-apply-per-reconcile explicit. The early-return guard for
already-enabled/disabled state is inlined into each branch.
With SSA the status config is built once at the top of Reconcile and
threaded through to all sub-functions (abortOnboarding, initialOnboarding,
smokeTest, completeOnboarding) which mutate it directly. An apply closure
is also passed so each function calls apply() when it has determined the
desired condition, rather than building and applying the config itself.
This removes the applyOnboardingCondition helper and makes the flow fully
declarative: compute desired state, then apply once.
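The threading pattern in the two commits above can be sketched without any Kubernetes machinery: Reconcile builds one status config, then passes both the config and an apply closure to the sub-functions, each of which mutates the config and calls apply() once it has determined the desired state. Types and function names here are illustrative, not the operator's real signatures:

```go
package main

import "fmt"

// statusCfg stands in for the apply configuration built once per Reconcile.
type statusCfg struct{ Phase string }

type applyFn func() error

// reconcile builds the config and apply closure, then threads both through
// the sub-functions; `applied` records each simulated Status().Apply call.
func reconcile() (applied []string, err error) {
	cfg := &statusCfg{}
	apply := func() error {
		applied = append(applied, cfg.Phase) // stand-in for Status().Apply
		return nil
	}
	if err := smokeTest(cfg, apply); err != nil {
		return applied, err
	}
	return applied, completeOnboarding(cfg, apply)
}

func smokeTest(cfg *statusCfg, apply applyFn) error {
	cfg.Phase = "Testing" // compute desired state...
	return apply()        // ...then apply
}

func completeOnboarding(cfg *statusCfg, apply applyFn) error {
	cfg.Phase = "Onboarded"
	return apply()
}

func main() {
	applied, err := reconcile()
	fmt.Println(applied, err)
}
```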

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (5)
internal/utils/conditions.go (1)

108-116: 💤 Low value

strChanged doesn't need a **string parameter.

The function only reads through a (no *a = … write), so *string would suffice and the call sites can drop the &:

♻️ Proposed simplification
-// strChanged reports whether the pointer value of b differs from the current
-// value at a.
-func strChanged(a **string, b *string) bool {
-	if *a == nil && b == nil {
+// strChanged reports whether b differs from a.
+func strChanged(a, b *string) bool {
+	if a == nil && b == nil {
 		return false
 	}
-	if *a == nil || b == nil {
+	if a == nil || b == nil {
 		return true
 	}
-	return **a != *b
+	return *a != *b
 }

And at call sites (Lines 94, 98):

-	if strChanged(&existing.Reason, newCondition.Reason) {
+	if strChanged(existing.Reason, newCondition.Reason) {
 		existing.Reason = newCondition.Reason
 		changed = true
 	}
-	if strChanged(&existing.Message, newCondition.Message) {
+	if strChanged(existing.Message, newCondition.Message) {
 		existing.Message = newCondition.Message
 		changed = true
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/utils/conditions.go` around lines 108 - 116, The strChanged function
unnecessarily takes a **string (pointer to pointer) even though it only reads
the value; change its signature to accept *string instead and update its logic
to check a == nil, b == nil, and compare *a vs *b accordingly; then update all
call sites that currently pass an address-of (&someString) to pass the plain
*string value (i.e., drop the &) so callers match the new signature (adjust any
callers named in the diff to stop taking the address).
internal/controller/onboarding_controller.go (2)

166-183: 💤 Low value

Status=True with Reason=Aborted is semantically unusual — confirm intent.

In the deleteTestServers failure branch the condition is set to Status=ConditionTrue with Reason=ConditionReasonAborted and the error message. The accompanying comment ("No cleanup, so we are still 'onboarding'") makes the intent clear, but downstream consumers — both within this PR (e.g. aggregates_controller.determineDesiredState keying on ConditionReasonInitial/Testing/Handover) and any external dashboards/queries — typically treat Aborted as a terminal reason. Mixing it with Status=True may produce surprising behavior for any code path that filters on either dimension alone.

If "we tried to abort but cleanup failed" is meant to be a distinct state, a dedicated reason (e.g. ConditionReasonCleanupFailed or keeping the previous Reason while overlaying the message) would model this more cleanly. The same pattern is repeated in completeOnboarding (lines 327-334).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/controller/onboarding_controller.go` around lines 166 - 183, The
branch that handles deleteTestServers failures sets the onboarding condition to
Status=True with Reason=ConditionReasonAborted which is semantically confusing;
update the failure handling in the deleteTestServers branch (and the analogous
block in completeOnboarding) to use a distinct, non-terminal reason (e.g.
ConditionReasonCleanupFailed) or preserve the prior non-terminal reason and
append the error text, and ensure the condition status reflects the intended
"still onboarding" state via statusCfg.Conditions and the
k8sacmetav1.Condition() builder rather than using ConditionReasonAborted.

91-102: 💤 Low value

Always-include for HypervisorID/ServiceID is correct — but be aware of the empty-string force-write.

Setting WithHypervisorID(hv.Status.HypervisorID).WithServiceID(hv.Status.ServiceID) upfront ensures this controller stays the SSA owner of these fields across reconciles. One subtle implication with ForceOwnership: when both are empty in the in-memory copy, the apply still asserts them as "". That's only safe because the only path that calls apply() while they are empty is abortOnboarding, which is gated by if status == nil { return nil } (so we never abort a never-onboarded hypervisor) and by the prior lookupNovaProperties block updating them on first onboarding. A short comment near the constructor noting this invariant would prevent a future edit from accidentally calling apply() before either Lookup runs.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/controller/onboarding_controller.go` around lines 91 - 102, Add a
short comment above the statusCfg/applier construction (the block that calls
apiv1.HypervisorStatus().WithHypervisorID(...).WithServiceID(...) and defines
apply) documenting the invariant: the SSA owner intentionally force-writes
HypervisorID and ServiceID (possibly as empty strings) and apply() must not be
invoked before lookupNovaProperties fills them — the only safe call path when
they are empty is abortOnboarding which is already guarded by if status == nil {
return nil } and the prior lookupNovaProperties. Mention the specific symbols:
statusCfg, WithHypervisorID, WithServiceID, apply, abortOnboarding, and
lookupNovaProperties so future editors won’t call apply() earlier.
internal/controller/hypervisor_maintenance_controller.go (1)

70-84: ⚖️ Poor tradeoff

SSA pattern: copying all conditions + ForceOwnership makes per-condition ownership tracking essentially meaningless.

utils.ConditionsFromStatus copies every condition currently in hv.Status.Conditions into the apply config, and the final Status().Apply(...) uses k8sclient.ForceOwnership. Each reconcile of this controller therefore force-claims ownership of every condition on the resource — including ones written by aggregates, onboarding, traits, etc. The values still converge as long as each controller only mutates its own conditions via SetApplyConfigurationStatusCondition, but managedFields will thrash between controllers and you lose the main benefit of SSA's per-field tracking.

The idiomatic SSA approach here is to include in each apply only the conditions this controller actually owns (HypervisorDisabled, Evicting) plus the scalar fields it owns (Evicted), and let +listType=map/+listMapKey=type on the conditions slice handle merging across managers. That can be deferred, but it is worth confirming the intent before this lands across all controllers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/controller/hypervisor_maintenance_controller.go` around lines 70 -
84, The current code copies all conditions via utils.ConditionsFromStatus and
calls Status().Apply with k8sclient.ForceOwnership, which force-claims every
condition and breaks per-condition SSA ownership; change the apply to only set
the scalar Evicted via apiv1.Hypervisor(...).WithStatus(...).WithEvicted(...)
(or apiv1.HypervisorStatus().WithEvicted) and construct the Status apply config
to include only the condition entries this controller owns (e.g.
HypervisorDisabled and Evicting) instead of utils.ConditionsFromStatus; remove
ForceOwnership (or avoid forcing ownership of conditions) and use
SetApplyConfigurationStatusCondition/explicit apiv1 condition builders when
updating conditions in reconcileComputeService/reconcileEviction so only owned
conditions are applied and SSA per-condition merging works.
internal/controller/aggregates_controller.go (1)

153-169: 💤 Low value

Redundant Get before raw merge patch; consider cleaner alternatives.

A few small things in the empty-aggregates clear path:

  • ac.Get(ctx, ..., fresh) is unnecessary — Status().Patch only needs metav1.ObjectMeta{Name: hv.Name} to identify the resource for a RawPatch. The extra round trip can be dropped.
  • The two-step "SSA apply, then merge-patch to clear" is fragile: if the apply succeeds and the patch fails, the controller has reported AggregatesUpdated=True while aggregates are still populated. On the next reconcile this branch should self-heal, but it's worth confirming.
  • An alternative to the merge-patch workaround is to drop omitempty on HypervisorStatus.Aggregates (or set the apply-config field to a non-nil empty slice via a small wrapper) so the SSA pathway alone is sufficient. That avoids a second writer for this field entirely.
♻️ Minimal cleanup of the Get
-		fresh := &kvmv1.Hypervisor{}
-		if err := ac.Get(ctx, k8sclient.ObjectKey{Name: hv.Name}, fresh); err != nil {
-			return ctrl.Result{}, err
-		}
-		return ctrl.Result{}, ac.Status().Patch(ctx, fresh, k8sclient.RawPatch(types.MergePatchType, patch))
+		stub := &kvmv1.Hypervisor{ObjectMeta: metav1.ObjectMeta{Name: hv.Name}}
+		return ctrl.Result{}, ac.Status().Patch(ctx, stub, k8sclient.RawPatch(types.MergePatchType, patch))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/controller/aggregates_controller.go` around lines 153 - 169, The
current clear-path does an unnecessary ac.Get and then a RawPatch; remove the
round-trip by passing a minimal object with only metadata to Status().Patch
(e.g., a Hypervisor object whose ObjectMeta has Name: hv.Name) instead of
fetching fresh, and keep the existing merge patch payload; alternatively
consider fixing the root cause by ensuring HypervisorStatus.Aggregates is
serialized as a non-nil empty slice in the SSA apply (remove `omitempty` on
HypervisorStatus.Aggregates or wrap the apply to set an explicit empty slice) so
the second writer/patch is unnecessary — reference symbols: aggregatesChanged,
newAggregates, ac.Status().Patch, Hypervisor, HypervisorStatus.Aggregates, SSA.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/controller/aggregates_controller.go`:
- Around line 126-151: The success apply path omits Aggregates when
aggregatesChanged==false causing ownership loss; always set statusCfg.Aggregates
to the effective list by building aggregate apply configs from newAggregates
when aggregatesChanged && len(newAggregates)>0, otherwise from
hv.Status.Aggregates, then assign that slice to statusCfg.Aggregates before
calling ac.Status().Apply; extract the duplicated construction logic into a
helper (e.g., buildAggregateApplyConfigs that returns
[]apiv1.AggregateApplyConfiguration given the input aggregate list and uses
apiv1.Aggregate().WithName(...).WithUUID(...).WithMetadata(...) as needed) and
replace both places that currently duplicate the loop with calls to that helper;
finally remove the redundant Get() in the merge-patch flow (the already-fetched
hv should be used instead of re-calling Get).

In `@internal/controller/eviction_controller.go`:
- Around line 254-265: The OutstandingInstances field on the apply config can be
omitted when it becomes an empty slice, so modify the eviction status update
logic around evictionStatusCfg(eviction) /
statusCfg.WithHypervisorServiceId(hypervisor.ID) /
statusCfg.WithOutstandingRamMb(...) to detect when OutstandingInstances should
transition to an empty slice (i.e., hypervisor.Servers is nil or len == 0 and
previously there was a non-empty value) and, in that case, send a targeted PATCH
using MergePatchType that explicitly clears that field on the server rather than
relying on the generated apply config (mirror the MergePatchType workaround used
in aggregates_controller.go for OutstandingInstances). Ensure other paths still
use the normal apply/update path when OutstandingInstances is non-empty.

In `@internal/controller/hypervisor_controller.go`:
- Around line 107-119: The current code always calls
apiv1.HypervisorStatus().WithInternalIP(newInternalIP) even when newInternalIP
is empty, which can overwrite a valid stored IP due to ForceOwnership; change it
to only apply WithInternalIP when newInternalIP != "" (i.e., after the loop that
extracts NodeInternalIP), otherwise do not mutate the InternalIP field on
statusCfg so existing IP is preserved (use the existing apiv1.HypervisorStatus()
as-is when newInternalIP == ""). Reference symbols: newInternalIP,
node.Status.Addresses loop, statusCfg :=
apiv1.HypervisorStatus().WithInternalIP(newInternalIP), and the ForceOwnership
apply behavior.

In `@internal/controller/offboarding_controller.go`:
- Around line 151-158: applyStatus currently copies hv.Status.Conditions and
calls Apply with k8sclient.ForceOwnership, which steals ownership of unrelated
condition entries; instead construct the status apply payload to include only
the condition(s) this reconciler owns (the provided cond/Offboarded condition)
and stop forcing ownership of the whole conditions list. Concretely, in
HypervisorOffboardingReconciler.applyStatus replace
utils.ConditionsFromStatus/SetApplyConfigurationStatusCondition usage and
building from hv.Status.Conditions with a statusCfg whose Conditions contains
only the incoming cond (the Offboarded condition this controller owns) and call
r.Status().Apply with k8sclient.FieldOwner(OffboardingControllerName) but remove
k8sclient.ForceOwnership so SSA preserves other controllers' condition entries.
Ensure references are to applyStatus, OffboardingControllerName,
apiv1.HypervisorStatus, and the cond parameter.

In `@internal/controller/traits_controller.go`:
- Around line 143-158: The applyTraitsStatus function conflates nil and empty
trait slices and therefore may not express intent to clear status.traits; remove
the conditional guard and call statusCfg.WithTraits(traits...) unconditionally
inside applyTraitsStatus (i.e., always invoke WithTraits in the
TraitsController.applyTraitsStatus before applying status) so the controller
explicitly sets the desired trait set (including empty) and lets SSA +
k8sclient.ForceOwnership clear stale entries; if some call sites should not
manage traits, instead split into two helpers (one that only sets the condition
and one that sets condition+traits) and call the appropriate helper.

In `@internal/utils/conditions.go`:
- Around line 29-42: ConditionsFromStatus currently omits
metav1.Condition.ObservedGeneration and thus can strip that field during apply;
update ConditionsFromStatus to set ObservedGeneration using a fresh pointer
(e.g., ptr.To(c.ObservedGeneration) or equivalent) alongside the other fields to
avoid aliasing, and modify SetApplyConfigurationStatusCondition to compare and
propagate ObservedGeneration when deciding to replace/merge conditions; also
update the doc comment to state all metav1.Condition fields (including
ObservedGeneration) are preserved verbatim.



📥 Commits

Reviewing files that changed from the base of the PR and between 43b6568 and 4b98907.

📒 Files selected for processing (13)
  • internal/controller/aggregates_controller.go
  • internal/controller/eviction_controller.go
  • internal/controller/hypervisor_controller.go
  • internal/controller/hypervisor_instance_ha_controller.go
  • internal/controller/hypervisor_maintenance_controller.go
  • internal/controller/hypervisor_taint_controller.go
  • internal/controller/node_certificate_controller.go
  • internal/controller/offboarding_controller.go
  • internal/controller/onboarding_controller.go
  • internal/controller/ready/controller.go
  • internal/controller/traits_controller.go
  • internal/utils/conditions.go
  • internal/utils/status_patch.go
💤 Files with no reviewable changes (1)
  • internal/utils/status_patch.go

Comment on lines +126 to +151
statusCfg := apiv1.HypervisorStatus()
statusCfg.Conditions = utils.ConditionsFromStatus(hv.Status.Conditions)
utils.SetApplyConfigurationStatusCondition(&statusCfg.Conditions,
	*k8sacmetav1.Condition().
		WithType(desiredCondition.Type).
		WithStatus(desiredCondition.Status).
		WithReason(desiredCondition.Reason).
		WithMessage(desiredCondition.Message))

if aggregatesChanged && len(newAggregates) > 0 {
	aggCfgs := make([]apiv1.AggregateApplyConfiguration, len(newAggregates))
	for i, agg := range newAggregates {
		a := apiv1.Aggregate().WithName(agg.Name).WithUUID(agg.UUID)
		if len(agg.Metadata) > 0 {
			a.WithMetadata(agg.Metadata)
		}
		aggCfgs[i] = *a
	}
	statusCfg.Aggregates = aggCfgs
}

if err := ac.Status().Apply(ctx,
	apiv1.Hypervisor(hv.Name, "").WithStatus(statusCfg),
	k8sclient.ForceOwnership, k8sclient.FieldOwner(AggregatesControllerName)); err != nil {
	return ctrl.Result{}, err
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win



Aggregates may be inconsistently omitted from apply when condition alone changes.

When aggregatesChanged == false but conditionUnchanged == false (e.g., the handover phase transitions from Reason=TestAggregates to Reason=WaitingForTraits while aggregate names [zone, testAggregate] remain the same), the success path at lines 126–151 builds and applies a statusCfg with only conditions. The aggregates field is not set because the condition at line 135 (if aggregatesChanged && len(newAggregates) > 0) evaluates to false.

This creates an inconsistency with the error path (lines 92–102), which explicitly preserves hv.Status.Aggregates in the apply even when aggregates haven't changed. Under SSA + ForceOwnership where this controller is the sole owner of aggregate entries, the success path's omission could cause issues. For consistency and robustness, always populate statusCfg.Aggregates from the effective aggregate list (new ones if changed, otherwise current hv.Status.Aggregates) so ownership is maintained throughout reconciliation.

Additionally, the aggregate-building logic at lines 93–100 and 136–143 is duplicated; extracting it as a helper would improve maintainability. The Get() call at line 165 is redundant in the merge-patch flow (line 168 uses the already-fetched hv object).
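A self-contained sketch of the suggested helper, using hypothetical stand-in structs rather than the repository's generated apply-configuration builders (names and fields here are assumptions for illustration only):

```go
package main

import "fmt"

// AggregateApplyConfiguration and Aggregate are stand-ins for the
// generated apply-configuration type and the status type.
type AggregateApplyConfiguration struct {
	Name, UUID string
	Metadata   map[string]string
}

type Aggregate struct {
	Name, UUID string
	Metadata   map[string]string
}

// buildAggregateApplyConfigs mirrors the duplicated loop from the
// review: it converts an aggregate list into apply configs, setting
// Metadata only when it is non-empty.
func buildAggregateApplyConfigs(aggs []Aggregate) []AggregateApplyConfiguration {
	cfgs := make([]AggregateApplyConfiguration, len(aggs))
	for i, agg := range aggs {
		c := AggregateApplyConfiguration{Name: agg.Name, UUID: agg.UUID}
		if len(agg.Metadata) > 0 {
			c.Metadata = agg.Metadata
		}
		cfgs[i] = c
	}
	return cfgs
}

func main() {
	cfgs := buildAggregateApplyConfigs([]Aggregate{{Name: "zone-a", UUID: "u1"}})
	fmt.Println(cfgs[0].Name, cfgs[0].UUID)
}
```

Both the error path and the success path could then call this one helper, passing either newAggregates or hv.Status.Aggregates as the effective list.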

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/controller/aggregates_controller.go` around lines 126 - 151, The
success apply path omits Aggregates when aggregatesChanged==false causing
ownership loss; always set statusCfg.Aggregates to the effective list by
building aggregate apply configs from newAggregates when aggregatesChanged &&
len(newAggregates)>0, otherwise from hv.Status.Aggregates, then assign that
slice to statusCfg.Aggregates before calling ac.Status().Apply; extract the
duplicated construction logic into a helper (e.g., buildAggregateApplyConfigs
that returns []apiv1.AggregateApplyConfiguration given the input aggregate list
and uses apiv1.Aggregate().WithName(...).WithUUID(...).WithMetadata(...) as
needed) and replace both places that currently duplicate the loop with calls to
that helper; finally remove the redundant Get() in the merge-patch flow (the
already-fetched hv should be used instead of re-calling Get).

Comment on lines +254 to +265
	var instances []string
	if hypervisor.Servers != nil {
		uuids := make([]string, len(*hypervisor.Servers))
		instances = make([]string, len(*hypervisor.Servers))
		for i, server := range *hypervisor.Servers {
			uuids[i] = server.UUID
			instances[i] = server.UUID
		}
		eviction.Status.OutstandingInstances = uuids
	}

	// Update status
	eviction.Status.HypervisorServiceId = hypervisor.ID
	eviction.Status.OutstandingRamMb = int64(hypervisor.MemoryMBUsed)
	meta.SetStatusCondition(&eviction.Status.Conditions, metav1.Condition{
		Type:    kvmv1.ConditionTypePreflight,
		Status:  metav1.ConditionTrue,
		Message: "Preflight checks passed, hypervisor is disabled and ready for eviction",
		Reason:  kvmv1.ConditionReasonSucceeded,
	})
	return ctrl.Result{}, r.updateStatus(ctx, eviction)
	statusCfg := evictionStatusCfg(eviction)
	statusCfg.WithHypervisorServiceId(hypervisor.ID)
	statusCfg.WithOutstandingRamMb(int64(hypervisor.MemoryMBUsed))
	statusCfg.OutstandingInstances = instances

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm omitempty on the generated apply-configuration field
fd -e go -p 'applyconfigurations/api/v1' --exec rg -nC2 'OutstandingInstances'

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 1076


🏁 Script executed:

#!/bin/bash
# Find and examine the aggregates_controller.go to see the merge-patch workaround
fd -type f -name 'aggregates_controller.go' -exec cat {} \;

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 259


🏁 Script executed:

#!/bin/bash
# Search for how OutstandingInstances is used after being set
rg -n 'OutstandingInstances' --type go -B3 -A3

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 9119


🏁 Script executed:

#!/bin/bash
# Check how evictionStatusCfg is used in the controller
rg -n 'statusCfg|evictionStatusCfg' internal/controller/eviction_controller.go -B2 -A5 | head -100

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 4842


🏁 Script executed:

#!/bin/bash
# Find aggregates_controller.go
fd -name 'aggregates_controller.go'

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 319


🏁 Script executed:

#!/bin/bash
# Search for merge-patch or omitempty references in the codebase
rg -n 'merge.patch|omitempty' --type-list | head -5
rg -n 'merge.patch|omitempty' -i

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 17767


🏁 Script executed:

#!/bin/bash
# Look at the aggregates_controller.go to see the merge-patch workaround
cat -n internal/controller/aggregates_controller.go | sed -n '150,170p'

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 902


🏁 Script executed:

#!/bin/bash
# Get more context around the merge patch workaround
cat -n internal/controller/aggregates_controller.go | sed -n '145,180p'

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 1752


The omitempty hazard on OutstandingInstances is real and reachable.

When the last instance is popped in the eviction loop (line 369), statusCfg.OutstandingInstances becomes an empty slice []string{}. With the omitempty tag on this field in the generated apply config, SSA will not include an empty slice in its JSON patch, preventing it from clearing any previously-applied value that exists on the server.

The same issue and workaround pattern are documented in aggregates_controller.go (lines 153-169). Apply a targeted merge patch with MergePatchType when OutstandingInstances transitions to empty to ensure the field is properly cleared.
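A minimal sketch of building such a merge-patch body; the field name is taken from the surrounding discussion, and the helper is illustrative rather than the operator's actual code. Under RFC 7386 merge-patch semantics, a key that is omitted leaves the server's value untouched, while an explicit empty array replaces it:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildClearInstancesPatch builds a JSON merge-patch body that
// explicitly sets status.outstandingInstances to an empty array,
// forcing the server to clear the previously stored value.
func buildClearInstancesPatch() ([]byte, error) {
	patch := map[string]any{
		"status": map[string]any{
			"outstandingInstances": []string{},
		},
	}
	return json.Marshal(patch)
}

func main() {
	body, err := buildClearInstancesPatch()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // {"status":{"outstandingInstances":[]}}
}
```

The resulting body would be sent with types.MergePatchType via the controller-runtime client, mirroring the workaround already documented in aggregates_controller.go.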

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/controller/eviction_controller.go` around lines 254 - 265, The
OutstandingInstances field on the apply config can be omitted when it becomes an
empty slice, so modify the eviction status update logic around
evictionStatusCfg(eviction) / statusCfg.WithHypervisorServiceId(hypervisor.ID) /
statusCfg.WithOutstandingRamMb(...) to detect when OutstandingInstances should
transition to an empty slice (i.e., hypervisor.Servers is nil or len == 0 and
previously there was a non-empty value) and, in that case, send a targeted PATCH
using MergePatchType that explicitly clears that field on the server rather than
relying on the generated apply config (mirror the MergePatchType workaround used
in aggregates_controller.go for OutstandingInstances). Ensure other paths still
use the normal apply/update path when OutstandingInstances is non-empty.

Comment on lines +107 to +119
	var newInternalIP string
	for _, address := range node.Status.Addresses {
		if address.Type == corev1.NodeInternalIP && hypervisor.Status.InternalIP != address.Address {
			hypervisor.Status.InternalIP = address.Address
		if address.Type == corev1.NodeInternalIP {
			newInternalIP = address.Address
			break
		}
	}

	// update terminating condition
	nodeTerminationCondition := FindNodeStatusCondition(node.Status.Conditions, "Terminating")

	// Capture values to apply - only mutate fields this controller owns
	statusCfg := apiv1.HypervisorStatus().WithInternalIP(newInternalIP)

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don't clobber a valid InternalIP with an empty string.

If node.Status.Addresses does not contain a NodeInternalIP (transient state during node bootstrap, kubelet restart, etc.), newInternalIP stays "". WithInternalIP("") then becomes part of the apply payload, and combined with ForceOwnership on Line 133 it will overwrite a previously stored valid IP with the empty string and take ownership of the field.

🛡️ Proposed guard
-		// Capture values to apply - only mutate fields this controller owns
-		statusCfg := apiv1.HypervisorStatus().WithInternalIP(newInternalIP)
+		// Capture values to apply - only mutate fields this controller owns
+		statusCfg := apiv1.HypervisorStatus()
+		if newInternalIP != "" {
+			statusCfg.WithInternalIP(newInternalIP)
+		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/controller/hypervisor_controller.go` around lines 107 - 119, The
current code always calls apiv1.HypervisorStatus().WithInternalIP(newInternalIP)
even when newInternalIP is empty, which can overwrite a valid stored IP due to
ForceOwnership; change it to only apply WithInternalIP when newInternalIP != ""
(i.e., after the loop that extracts NodeInternalIP), otherwise do not mutate the
InternalIP field on statusCfg so existing IP is preserved (use the existing
apiv1.HypervisorStatus() as-is when newInternalIP == ""). Reference symbols:
newInternalIP, node.Status.Addresses loop, statusCfg :=
apiv1.HypervisorStatus().WithInternalIP(newInternalIP), and the ForceOwnership
apply behavior.

Comment on lines +151 to +158
func (r *HypervisorOffboardingReconciler) applyStatus(ctx context.Context, hv *kvmv1.Hypervisor, cond *k8sacmetav1.ConditionApplyConfiguration) error {
	statusCfg := apiv1.HypervisorStatus()
	statusCfg.Conditions = utils.ConditionsFromStatus(hv.Status.Conditions)
	utils.SetApplyConfigurationStatusCondition(&statusCfg.Conditions, *cond)
	return r.Status().Apply(ctx,
		apiv1.Hypervisor(hv.Name, "").WithStatus(statusCfg),
		k8sclient.ForceOwnership, k8sclient.FieldOwner(OffboardingControllerName))
}

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🌐 Web query:

Kubernetes server-side apply listType=map conditions per-entry ownership ForceOwnership best practice

💡 Result:

Best practice: for Kubernetes server-side apply (SSA), ensure your "conditions" list is modeled as listType=map with a stable per-entry key (typically +listMapKey=type / x-kubernetes-list-map-keys=["type"]). This makes SSA treat each condition entry as independently addressable, so different managers can own different entries without conflicts. [1][2][3]

About "ForceOwnership" / forcing ownership:
- SSA's mechanism is "force conflicts" (kubectl --force-conflicts or force=true on the apply request). When you force, the operation succeeds, updates the field value, and removes the field from all other managers' managedFields entries. [1][4]
- Therefore, "force ownership" is not a general best practice; it should be used cautiously, typically only as a last resort when you intentionally want to override another actor's desired state (e.g., an operator/automation takeover scenario). [4]

Concrete guidance for "server-side apply listType=map conditions per-entry ownership":
1) Use associative list semantics for conditions
   - In CRD/OpenAPI or Go types, declare the conditions list as listType=map and listMapKey as the field(s) that uniquely identify each condition entry (commonly "type"). [1][3]
   - Kubernetes documents that with x-kubernetes-list-type=map, "different managers can manage entries separately", whereas atomic lists are owned as a whole. [1]
2) Ensure the key is truly unique and stable
   - listMapKey applies only when listType=map, and the key fields must be scalars; the key uniquely identifies entries. [1]
3) Don't rely on forcing ownership to avoid modeling conflicts
   - If you leave conditions with the default/atomic list behavior, a single manager can end up owning the entire list (and concurrent updates from different managers can conflict or overwrite). Proper listType=map + listMapKey avoids this by enabling per-entry ownership. [1][3]
4) When forcing is needed, force the smallest scope possible
   - Forcing conflicts causes the applying manager to become sole owner of the forced fields by removing ownership from other managers. That is why Kubernetes guidance frames it as a case where the applier/controller intends to override conflicting desired state, not as a routine technique. [4][1]

Answer to your "ForceOwnership best practice" phrasing: avoid ForceOwnership except for deliberate override scenarios; instead, structure the API schema (conditions listType=map with listMapKey=type) so SSA can naturally assign per-entry ownership without needing to force. [1][4][3]
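In controller-gen/kubebuilder projects, that schema shape typically comes from marker comments on the Go type. A compilable sketch with a stand-in Condition type (the real code would use metav1.Condition, and the struct name is only illustrative):

```go
package main

import "fmt"

// Condition is a stand-in for metav1.Condition; the assumption is
// that Type is unique per entry and serves as the list-map key.
type Condition struct {
	Type   string `json:"type"`
	Status string `json:"status"`
}

// HypervisorStatus sketches the marker comments controller-gen reads
// to emit x-kubernetes-list-type=map / x-kubernetes-list-map-keys=["type"]
// in the CRD schema, which enables per-entry SSA ownership.
type HypervisorStatus struct {
	// +listType=map
	// +listMapKey=type
	Conditions []Condition `json:"conditions,omitempty"`
}

func main() {
	s := HypervisorStatus{Conditions: []Condition{{Type: "Ready", Status: "True"}}}
	fmt.Println(s.Conditions[0].Type)
}
```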

Citations:


🏁 Script executed:

# Find the CRD or API schema definition to confirm conditions listType
fd -t f -e yaml -e yml | head -20

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 1405


🏁 Script executed:

# Search for HypervisorStatus definition and conditions field schema
rg "HypervisorStatus|conditions.*listType" -A 5 -B 2 --max-count 20

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 33372


🏁 Script executed:

# Examine the utility functions mentioned
fd "utils" -t f | head -10

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 180


🏁 Script executed:

# Check ConditionsFromStatus and SetApplyConfigurationStatusCondition implementations
rg "ConditionsFromStatus|SetApplyConfigurationStatusCondition" -B 3 -A 10

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 43195


🏁 Script executed:

# Verify ForceOwnership is truly applied with all conditions copied in other controllers
rg "ForceOwnership" -B 10 -A 2 | head -100

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 7739


🏁 Script executed:

# Check the conditions field definition in the CRD YAML
cat charts/openstack-hypervisor-operator/crds/kvm.cloud.sap_hypervisors.yaml | grep -A 50 "conditions:" | head -60

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 3008


🏁 Script executed:

# Look for WithConditions method in the HypervisorStatusApplyConfiguration
rg "WithConditions" applyconfigurations/api/v1/hypervisorstatus.go

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 376


🏁 Script executed:

# Check the full HypervisorStatusApplyConfiguration struct definition
cat applyconfigurations/api/v1/hypervisorstatus.go | head -100

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 6026


🏁 Script executed:

# Check if the CRD has x-kubernetes-list-type or x-kubernetes-list-map-keys for conditions
cat charts/openstack-hypervisor-operator/crds/kvm.cloud.sap_hypervisors.yaml | grep -B 5 -A 80 "conditions:" | grep -E "(x-kubernetes|type:|mapKeys)"

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 632


🏁 Script executed:

# Get the full conditions definition from CRD
cat charts/openstack-hypervisor-operator/crds/kvm.cloud.sap_hypervisors.yaml | sed -n '/conditions:/,/^\s\s[a-z]/p' | head -120

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 5834


🏁 Script executed:

# Verify the proposed fix would work - check if WithConditions accepts the right parameter type
rg "func.*WithConditions" -A 15 applyconfigurations/api/v1/hypervisorstatus.go

Repository: cobaltcore-dev/openstack-hypervisor-operator

Length of output: 817


Avoid taking ownership of conditions you don't manage.

The conditions field in the CRD is properly configured as x-kubernetes-list-type: map with x-kubernetes-list-map-keys=[type], which means Kubernetes SSA treats each condition entry as independently addressable. However, applyStatus copies the entire hv.Status.Conditions slice into the apply payload and then applies it with ForceOwnership. This forcibly transfers ownership of every existing condition (e.g. Onboarding, Ready, Tainted, HaEnabled, Evicting, …) to OffboardingControllerName on every reconcile. The next time another controller reconciles its own condition, it forces ownership back, and so on — producing field-management churn and unnecessary resourceVersion bumps even though the visible status data is unchanged.

The idiomatic SSA pattern is "fully specified intent over the fields I manage" – include only the Offboarded condition you actually own:

♻️ Proposed simplification
 func (r *HypervisorOffboardingReconciler) applyStatus(ctx context.Context, hv *kvmv1.Hypervisor, cond *k8sacmetav1.ConditionApplyConfiguration) error {
-    statusCfg := apiv1.HypervisorStatus()
-    statusCfg.Conditions = utils.ConditionsFromStatus(hv.Status.Conditions)
-    utils.SetApplyConfigurationStatusCondition(&statusCfg.Conditions, *cond)
+    statusCfg := apiv1.HypervisorStatus().WithConditions(cond)
     return r.Status().Apply(ctx,
         apiv1.Hypervisor(hv.Name, "").WithStatus(statusCfg),
         k8sclient.ForceOwnership, k8sclient.FieldOwner(OffboardingControllerName))
 }

The same anti-pattern (copy all existing conditions + ForceOwnership) is repeated in hypervisor_controller.go, hypervisor_taint_controller.go, hypervisor_instance_ha_controller.go, ready/controller.go, traits_controller.go, hypervisor_maintenance_controller.go, aggregates_controller.go, and onboarding_controller.go. Each controller should apply only the condition type(s) it owns. This will also let you remove utils.ConditionsFromStatus and utils.SetApplyConfigurationStatusCondition entirely, since SSA's merge semantics already preserve unrelated entries.

Per the Kubernetes SSA documentation, conflicts forced by one controller will undo another collaborator's changes; controllers should not compete for fields they don't intend to manage.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/controller/offboarding_controller.go` around lines 151 - 158,
applyStatus currently copies hv.Status.Conditions and calls Apply with
k8sclient.ForceOwnership, which steals ownership of unrelated condition entries;
instead construct the status apply payload to include only the condition(s) this
reconciler owns (the provided cond/Offboarded condition) and stop forcing
ownership of the whole conditions list. Concretely, in
HypervisorOffboardingReconciler.applyStatus replace
utils.ConditionsFromStatus/SetApplyConfigurationStatusCondition usage and
building from hv.Status.Conditions with a statusCfg whose Conditions contains
only the incoming cond (the Offboarded condition this controller owns) and call
r.Status().Apply with k8sclient.FieldOwner(OffboardingControllerName) but remove
k8sclient.ForceOwnership so SSA preserves other controllers' condition entries.
Ensure references are to applyStatus, OffboardingControllerName,
apiv1.HypervisorStatus, and the cond parameter.

Comment on lines +143 to 158
func (tc *TraitsController) applyTraitsStatus(ctx context.Context, hv *kvmv1.Hypervisor, traits []string, cond metav1.Condition) error {
	statusCfg := apiv1.HypervisorStatus()
	statusCfg.Conditions = utils.ConditionsFromStatus(hv.Status.Conditions)
	utils.SetApplyConfigurationStatusCondition(&statusCfg.Conditions,
		*k8sacmetav1.Condition().
			WithType(cond.Type).
			WithStatus(cond.Status).
			WithReason(cond.Reason).
			WithMessage(cond.Message))
	if traits != nil {
		statusCfg.WithTraits(traits...)
	}
	return tc.Status().Apply(ctx,
		apiv1.Hypervisor(hv.Name, "").WithStatus(statusCfg),
		k8sclient.ForceOwnership, k8sclient.FieldOwner(TraitsControllerName))
}

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

traits == nil and traits == []string{} are conflated, and neither clears the field.

if traits != nil skips the WithTraits call only when the slice is nil; a non-nil empty slice still calls WithTraits() with zero variadic args, which is also a no-op (the apply config's Traits stays nil). In both cases the controller does not express intent over status.traits in this apply. After a successful sync where targetTraits ends up empty (e.g., all custom traits removed and current.Traits was empty), hv.Status.Traits will retain the previous (now-stale) values until something else nudges the apply payload, since SSA "owner doesn't manage this field" preserves the existing value.

If the goal is "always reflect the current desired set, even if empty", explicitly set traits unconditionally (and rely on SSA + ForceOwnership to clear stale entries when this controller is the sole owner):

🛠️ Proposed change
-	if traits != nil {
-		statusCfg.WithTraits(traits...)
-	}
+	statusCfg.WithTraits(traits...)

If you instead want certain call sites (e.g., the failure paths on Lines 108-109 and 134-135 that pass hv.Status.Traits) to not manage traits, consider splitting them into two helpers — one that sets the condition only, one that sets the condition + traits — to make the intent explicit at the call site.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (tc *TraitsController) applyTraitsStatus(ctx context.Context, hv *kvmv1.Hypervisor, traits []string, cond metav1.Condition) error {
	statusCfg := apiv1.HypervisorStatus()
	statusCfg.Conditions = utils.ConditionsFromStatus(hv.Status.Conditions)
	utils.SetApplyConfigurationStatusCondition(&statusCfg.Conditions,
		*k8sacmetav1.Condition().
			WithType(cond.Type).
			WithStatus(cond.Status).
			WithReason(cond.Reason).
			WithMessage(cond.Message))
	if traits != nil {
		statusCfg.WithTraits(traits...)
	}
	return tc.Status().Apply(ctx,
		apiv1.Hypervisor(hv.Name, "").WithStatus(statusCfg),
		k8sclient.ForceOwnership, k8sclient.FieldOwner(TraitsControllerName))
}
func (tc *TraitsController) applyTraitsStatus(ctx context.Context, hv *kvmv1.Hypervisor, traits []string, cond metav1.Condition) error {
	statusCfg := apiv1.HypervisorStatus()
	statusCfg.Conditions = utils.ConditionsFromStatus(hv.Status.Conditions)
	utils.SetApplyConfigurationStatusCondition(&statusCfg.Conditions,
		*k8sacmetav1.Condition().
			WithType(cond.Type).
			WithStatus(cond.Status).
			WithReason(cond.Reason).
			WithMessage(cond.Message))
	statusCfg.WithTraits(traits...)
	return tc.Status().Apply(ctx,
		apiv1.Hypervisor(hv.Name, "").WithStatus(statusCfg),
		k8sclient.ForceOwnership, k8sclient.FieldOwner(TraitsControllerName))
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/controller/traits_controller.go` around lines 143 - 158, The
applyTraitsStatus function conflates nil and empty trait slices and therefore
may not express intent to clear status.traits; remove the conditional guard and
call statusCfg.WithTraits(traits...) unconditionally inside applyTraitsStatus
(i.e., always invoke WithTraits in the TraitsController.applyTraitsStatus before
applying status) so the controller explicitly sets the desired trait set
(including empty) and lets SSA + k8sclient.ForceOwnership clear stale entries;
if some call sites should not manage traits, instead split into two helpers (one
that only sets the condition and one that sets condition+traits) and call the
appropriate helper.

Comment on lines +29 to +42
func ConditionsFromStatus(conditions []metav1.Condition) []k8sacmetav1.ConditionApplyConfiguration {
	result := make([]k8sacmetav1.ConditionApplyConfiguration, len(conditions))
	for i := range conditions {
		c := &conditions[i]
		result[i] = k8sacmetav1.ConditionApplyConfiguration{
			Type:               &c.Type,
			Status:             &c.Status,
			Reason:             &c.Reason,
			Message:            &c.Message,
			LastTransitionTime: &c.LastTransitionTime,
		}
	}
	return result
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

ObservedGeneration is silently dropped during the conversion.

metav1.Condition has six exported fields, but only five are copied: ObservedGeneration is omitted. This is inconsistent with the doc comment which claims "All fields ... are preserved verbatim", and it has real consequences when combined with the per-controller apply pattern in this PR:

  • The taint controller writes Tainted with WithObservedGeneration(hypervisor.Generation) and gates its skip check on it (see hypervisor_taint_controller.go Lines 67–73).
  • Any other controller that calls ConditionsFromStatus and applies (even just to update its own condition) will not express interest in ObservedGeneration, so ownership stays with the original setter — that part is fine — but if the taint controller itself ever round-trips through this helper for its own condition, ObservedGeneration becomes nil in the apply payload and gets defaulted to 0 by the API server, breaking the skip check on the next reconcile.
🐛 Proposed fix
 		result[i] = k8sacmetav1.ConditionApplyConfiguration{
 			Type:               &c.Type,
 			Status:             &c.Status,
+			ObservedGeneration: ptr.To(c.ObservedGeneration),
 			Reason:             &c.Reason,
 			Message:            &c.Message,
 			LastTransitionTime: &c.LastTransitionTime,
 		}

(Use a fresh pointer here — &c.ObservedGeneration would alias the source slice, just like the other fields already do; consider switching all of them to ptr.To for consistency and to avoid surprising aliasing if the input is later mutated.)

You should also extend SetApplyConfigurationStatusCondition (Lines 78–101) to compare and propagate ObservedGeneration, and update its doc comment to match — meta.SetStatusCondition does propagate it.
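The aliasing point can be shown in isolation; ptrTo below is a local stand-in for k8s.io/utils/ptr.To:

```go
package main

import "fmt"

// ptrTo mirrors k8s.io/utils/ptr.To: it copies the value into a
// fresh allocation instead of pointing into the caller's memory.
func ptrTo[T any](v T) *T { return &v }

func main() {
	conds := []int64{1, 2}

	aliased := &conds[0]     // points into the source slice
	fresh := ptrTo(conds[0]) // independent copy

	conds[0] = 99 // later mutation of the input

	fmt.Println(*aliased, *fresh) // 99 1
}
```

The aliased pointer observes the mutation, while the ptrTo copy keeps the value captured at conversion time, which is the behavior you usually want in an apply payload.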

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/utils/conditions.go` around lines 29 - 42, ConditionsFromStatus
currently omits metav1.Condition.ObservedGeneration and thus can strip that
field during apply; update ConditionsFromStatus to set ObservedGeneration using
a fresh pointer (e.g., ptr.To(c.ObservedGeneration) or equivalent) alongside the
other fields to avoid aliasing, and modify SetApplyConfigurationStatusCondition
to compare and propagate ObservedGeneration when deciding to replace/merge
conditions; also update the doc comment to state all metav1.Condition fields
(including ObservedGeneration) are preserved verbatim.

@github-actions

github-actions Bot commented May 6, 2026

Merging this branch changes the coverage (1 decrease, 2 increases)

| Impacted Packages | Coverage Δ | 🤖 |
|---|---|---|
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/controller | 66.46% (-0.03%) | 👎 |
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/controller/ready | 74.55% (+2.62%) | 👍 |
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/utils | 84.78% (+11.45%) | 🎉 |

Coverage by file

Changed files (no unit tests)

| Changed File | Coverage Δ | Total | Covered | Missed | 🤖 |
|---|---|---|---|---|---|
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/controller/aggregates_controller.go | 77.78% (-8.16%) | 90 (+26) | 70 (+15) | 20 (+11) | 👎 |
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/controller/eviction_controller.go | 36.75% (-2.63%) | 166 (+6) | 61 (-2) | 105 (+8) | 👎 |
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/controller/hypervisor_controller.go | 72.97% (-2.70%) | 74 | 54 (-2) | 20 (+2) | 👎 |
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/controller/hypervisor_instance_ha_controller.go | 93.48% (+2.57%) | 46 (+2) | 43 (+3) | 3 (-1) | 👍 |
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/controller/hypervisor_maintenance_controller.go | 79.01% (-0.06%) | 81 (-5) | 64 (-4) | 17 (-1) | 👎 |
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/controller/hypervisor_taint_controller.go | 95.00% (+2.14%) | 20 (+6) | 19 (+6) | 1 | 👍 |
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/controller/node_certificate_controller.go | 80.95% (-0.68%) | 42 (-7) | 34 (-6) | 8 (-1) | 👎 |
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/controller/offboarding_controller.go | 72.73% (+0.85%) | 66 (+2) | 48 (+2) | 18 | 👍 |
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/controller/onboarding_controller.go | 56.62% (+1.48%) | 219 (-24) | 124 (-10) | 95 (-14) | 👍 |
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/controller/ready/controller.go | 74.55% (+2.62%) | 55 (-2) | 41 | 14 (-2) | 👍 |
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/controller/traits_controller.go | 77.97% (+3.36%) | 59 (-4) | 46 (-1) | 13 (-3) | 👍 |
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/utils/conditions.go | 89.47% (+89.47%) | 38 (+38) | 34 (+34) | 4 (+4) | 🌟 |
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/utils/status_patch.go | 0.00% (-85.71%) | 0 (-7) | 0 (-6) | 0 (-1) | 💀 |

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.
