
Add retry-on-conflict for hypervisor status patches#293

Merged
notandy merged 1 commit into main from fix/reconciler-conflict-retry on Apr 27, 2026
Conversation

@notandy (Contributor) commented Apr 27, 2026

Multiple controllers updating the same Hypervisor resource were causing "the object has been modified" errors due to stale resourceVersions.

Add a PatchHypervisorStatusWithRetry helper that re-fetches the resource before each patch attempt and retries on conflict with exponential backoff. Update the aggregates, offboarding, and traits controllers to use this helper.

Summary by CodeRabbit

  • Refactor
    • Centralized status updates across Hypervisor controllers and made them retry-capable, improving reliability and consistency of status changes under contention and reducing failed or stale statuses during concurrent updates.

@coderabbitai Bot commented Apr 27, 2026

Warning

Rate limit exceeded

@notandy has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 5 minutes and 35 seconds before requesting another review.

To keep reviews running without waiting, you can enable the usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4891d46a-944f-447d-a1e7-e747c73e79e6

📥 Commits

Reviewing files that changed from the base of the PR and between 6d111c1 and d6f93c5.

📒 Files selected for processing (10)
  • internal/controller/aggregates_controller.go
  • internal/controller/hypervisor_controller.go
  • internal/controller/hypervisor_instance_ha_controller.go
  • internal/controller/hypervisor_maintenance_controller.go
  • internal/controller/hypervisor_taint_controller.go
  • internal/controller/offboarding_controller.go
  • internal/controller/onboarding_controller.go
  • internal/controller/ready/controller.go
  • internal/controller/traits_controller.go
  • internal/utils/status_patch.go
📝 Walkthrough

Walkthrough

Multiple controllers were changed to stop doing inline optimistic-lock status patches and instead use a new centralized utils.PatchHypervisorStatusWithRetry helper that fetches the latest Hypervisor, applies a caller callback, and retries on conflicts with exponential backoff.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| New Retry Patch Utility<br>`internal/utils/status_patch.go` | Add `StatusPatchBackoff` and `PatchHypervisorStatusWithRetry(ctx, c, name, fieldOwner, updateFn)` to fetch the latest Hypervisor, apply `updateFn`, and patch status with retry-on-conflict and a field owner. |
| Aggregates Controller<br>`internal/controller/aggregates_controller.go` | Replace direct status patching with `PatchHypervisorStatusWithRetry`; compute `newAggregates`, set `aggregatesChanged`, apply aggregates and condition inside the retry callback, and return the helper error. |
| Offboarding & Traits Controllers<br>`internal/controller/offboarding_controller.go`, `internal/controller/traits_controller.go` | Remove `r.Status().Patch` + `DeepCopy()` merge logic; set conditions and status fields inside `PatchHypervisorStatusWithRetry` callbacks and return joined/propagated errors from the retry helper. |
| Hypervisor Core & HA Controllers<br>`internal/controller/hypervisor_controller.go`, `internal/controller/hypervisor_instance_ha_controller.go` | Stop inline optimistic-lock merge patches; snapshot controller-owned status fields and delegate updates to `PatchHypervisorStatusWithRetry`, applying `InternalIP`, `Terminating`, and HA conditions via callback. |
| Maintenance, Taint & Onboarding Controllers<br>`internal/controller/hypervisor_maintenance_controller.go`, `internal/controller/hypervisor_taint_controller.go`, `internal/controller/onboarding_controller.go` | Unify status updates on the retry helper; callbacks set/remove specific conditions (`HypervisorDisabled`, `Evicting`, `Tainted`, onboarding fields) and preserve reconciled values instead of direct merge patches. |
| Ready Controller<br>`internal/controller/ready/controller.go` | Delegate the Ready-condition update to `PatchHypervisorStatusWithRetry` with a callback that writes the computed `readyCondition` into `Status.Conditions` instead of an inline `Status().Patch`. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • fwiesel

Poem

🐰 With whiskers twitching, I hop and peep,

Patch retries guard the status I keep,
No more deep copies strewn about,
One callback fixes every clout,
Hops and patches — tidy and neat. 🥕✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: introducing retry-on-conflict logic for hypervisor status patches across multiple controllers. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai Bot left a comment
🧹 Nitpick comments (2)
internal/controller/utils.go (1)

146-157: Optional: consider tolerating NotFound from the inner Get.

If the Hypervisor is deleted between the controller's top-level Get (which uses IgnoreNotFound) and the helper's Get, the helper returns the NotFound verbatim. Callers then wrap it as "cannot update hypervisor status due to ...", which surfaces as a reconcile error and triggers a requeue. The next reconcile self-heals via the top-level IgnoreNotFound, so this is at most cosmetic — but you may want to either IgnoreNotFound inside the helper or document the contract so callers don't add their own ad-hoc checks later.

Optional tweak
 func PatchHypervisorStatusWithRetry(ctx context.Context, c k8sclient.Client, name, fieldOwner string, updateFn func(*kvmv1.Hypervisor)) error {
 	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
 		hv := &kvmv1.Hypervisor{}
 		if err := c.Get(ctx, k8sclient.ObjectKey{Name: name}, hv); err != nil {
-			return err
+			// Resource gone between reconcile and patch — nothing to update.
+			return k8sclient.IgnoreNotFound(err)
 		}
 		base := hv.DeepCopy()
 		updateFn(hv)
 		return c.Status().Patch(ctx, hv, k8sclient.MergeFromWithOptions(base,
 			k8sclient.MergeFromWithOptimisticLock{}), k8sclient.FieldOwner(fieldOwner))
 	})
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/utils.go` around lines 146 - 157,
PatchHypervisorStatusWithRetry currently returns a NotFound error if the
Hypervisor is deleted between the top-level Get and the helper's Get; update the
function to tolerate NotFound by detecting the NotFound error from c.Get (using
k8s API errors check, e.g., apierrors.IsNotFound) and returning nil (no-op)
instead of propagating it, so callers don't surface a reconcile error; mention
this contract in a short comment on PatchHypervisorStatusWithRetry so callers
know a deleted Hypervisor is silently ignored.
internal/controller/aggregates_controller.go (1)

100-108: Status patch is now unconditional — confirm this is intentional.

Previously a semantic equality check could skip the status patch entirely (per the AI summary of the prior version). Now every reconcile performs a Get + Status().Patch even when neither aggregates nor the condition would change. meta.SetStatusCondition is idempotent for unchanged fields, so the resulting JSON merge patch should be (near-)empty and the server will accept it as a no-op — but the extra round trip happens on every requeue.

If reconciles for stable hypervisors are frequent, consider re-introducing a cheap pre-check (e.g., compare desiredCondition to the existing one, plus aggregatesChanged) and short-circuiting before invoking the helper.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/aggregates_controller.go` around lines 100 - 108, Add a
cheap pre-check before calling PatchHypervisorStatusWithRetry: if
aggregatesChanged is false and the existing condition on hv already matches
desiredCondition, skip the patch and return ctrl.Result{}, nil. Concretely, use
meta.FindStatusCondition(&hv.Status.Conditions, desiredCondition.Type) (or
equivalent) to retrieve the current condition and compare the relevant fields
(Status, Reason, Message) against desiredCondition; only call
PatchHypervisorStatusWithRetry when aggregatesChanged is true or the condition
differs. This avoids an unnecessary Get+Status().Patch round trip for stable
hypervisors.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b8dee713-66ce-49a1-a753-87669e51956e

📥 Commits

Reviewing files that changed from the base of the PR and between dd9145b and faac5d6.

📒 Files selected for processing (4)
  • internal/controller/aggregates_controller.go
  • internal/controller/offboarding_controller.go
  • internal/controller/traits_controller.go
  • internal/controller/utils.go

@notandy force-pushed the fix/reconciler-conflict-retry branch from faac5d6 to e1dc9b5 on April 27, 2026 at 17:42
@coderabbitai Bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/controller/ready/controller.go (1)

110-120: ⚠️ Potential issue | 🟠 Major

Recompute Ready on the fresh object.

readyCondition is derived before the retry wrapper, so a retry can write a stale Ready value after another controller changes the upstream conditions.

♻️ Proposed fix
- return ctrl.Result{}, utils.PatchHypervisorStatusWithRetry(ctx, r.Client, req.Name, ControllerName, func(h *kvmv1.Hypervisor) {
- 	meta.SetStatusCondition(&h.Status.Conditions, readyCondition)
- })
+ return ctrl.Result{}, utils.PatchHypervisorStatusWithRetry(ctx, r.Client, req.Name, ControllerName, func(h *kvmv1.Hypervisor) {
+ 	meta.SetStatusCondition(&h.Status.Conditions, ComputeReadyCondition(h))
+ })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/ready/controller.go` around lines 110 - 120, The Ready
condition is computed from a stale hv object before the retry wrapper so a
concurrent change can cause a stale Ready to be written; fix this by computing
the Ready condition from the fresh object inside the retry closure: call
ComputeReadyCondition(h) within the func(h *kvmv1.Hypervisor) passed to
PatchHypervisorStatusWithRetry and pass that resulting readyCondition to
meta.SetStatusCondition(&h.Status.Conditions, readyCondition). Also update the
log to reflect the recomputed readyCondition (move the log inside the closure or
re-compute before logging) so the logged value matches what is persisted.
🧹 Nitpick comments (1)
internal/controller/aggregates_controller.go (1)

100-108: Consider guarding the status patch call to avoid no-op updates.

The PatchHypervisorStatusWithRetry helper does not short-circuit no-op updates. Lines 101–105 call it unconditionally even when aggregatesChanged is false and desiredCondition may be unchanged. The ready controller (internal/controller/ready/controller.go:113) demonstrates the pattern: check equality.Semantic.DeepEqual(hv.Status, base.Status) before patching to reduce unnecessary API calls and conflict pressure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/aggregates_controller.go` around lines 100 - 108, Guard
the call to PatchHypervisorStatusWithRetry to avoid no-op updates: capture the
original hv.Status into a base/status snapshot (e.g., base := hv.DeepCopy();
base.Status = hv.Status) and only call utils.PatchHypervisorStatusWithRetry when
aggregatesChanged or the status actually differs (use
equality.Semantic.DeepEqual on hv.Status vs base.Status or compare the specific
fields like h.Status.Aggregates and the desiredCondition against existing
conditions) so the patch (called with AggregatesControllerName and
desiredCondition) is skipped when there is no change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/controller/hypervisor_controller.go`:
- Around line 125-127: The closure passed to PatchHypervisorStatusWithRetry
currently does h.Status = hypervisor.Status which overwrites the freshly fetched
object with stale data; instead mutate only the specific status fields this
controller owns inside that closure (e.g., set h.Status.<OwnedField1>,
h.Status.<OwnedField2>, etc. from hypervisor.Status) so you do not clobber other
controllers' updates—update the body of the function passed to
PatchHypervisorStatusWithRetry to assign only the controller-owned fields rather
than replacing h.Status wholesale (keep using PatchHypervisorStatusWithRetry and
HypervisorControllerName).

In `@internal/controller/hypervisor_instance_ha_controller.go`:
- Around line 145-147: The patch currently replaces all status conditions via
h.Status.Conditions = hv.Status.Conditions which can overwrite newer updates;
instead, inside the PatchHypervisorStatusWithRetry callback used in the
HypervisorInstanceHaControllerName reconciliation, update only the HaEnabled
condition on h (create or replace the single condition with the same Type
"HaEnabled", Status, Reason, Message and LastTransitionTime from
hv.Status.Conditions[HaEnabled]) and leave other h.Status.Conditions untouched
so concurrent controller updates are preserved.

In `@internal/controller/hypervisor_maintenance_controller.go`:
- Around line 86-88: The code is overwriting the entire status by doing h.Status
= hv.Status inside PatchHypervisorStatusWithRetry; instead, only copy/assign the
maintenance-owned fields so you don't clobber other controllers' status. Update
the closure passed to PatchHypervisorStatusWithRetry (in
hypervisor_maintenance_controller.go) to mutate only the specific maintenance
fields on h.Status (for example the maintenance condition, reason, message,
timestamps or a dedicated Maintenance sub-struct) using values from hv.Status,
leaving all other h.Status fields untouched; keep using
HypervisorMaintenanceControllerName and the existing function
PatchHypervisorStatusWithRetry.

In `@internal/controller/hypervisor_taint_controller.go`:
- Around line 88-90: The current patch replaces the entire conditions slice on
the fresh object with the stale outer slice (h.Status.Conditions =
hypervisor.Status.Conditions), risking lost updates; instead, inside the
PatchHypervisorStatusWithRetry callback locate the tainted condition in the
fresh h.Status.Conditions (by Type == HypervisorTaintControllerName or matching
condition Type/Reason), update only that condition's fields (Status, Reason,
Message, LastTransitionTime) from the outer hypervisor.Status.Conditions entry,
or append it if missing, preserving any other conditions that may have been
added between retries; update the callback used in
PatchHypervisorStatusWithRetry accordingly so you modify the fresh h object
in-place rather than replacing the slice.

In `@internal/controller/onboarding_controller.go`:
- Around line 572-575: The patchStatus helper currently replaces the entire
Hypervisor status (h.Status = hv.Status) which can overwrite newer, unrelated
fields; change patchStatus (and its PatchHypervisorStatusWithRetry callback) to
copy only the onboarding-owned fields from the provided snapshot (for example
assign h.Status.OnboardingState, h.Status.OnboardingMessage and any
onboarding-specific conditions or sub-struct like h.Status.OnboardingConditions
= hv.Status.OnboardingConditions) instead of replacing h.Status wholesale; also
remove the unused second parameter or rename it to make intent clear so the
retry callback only mutates the onboarding-specific fields in the Hypervisor
status.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 170d15bf-c117-4f0f-8d25-a6275d35c71e

📥 Commits

Reviewing files that changed from the base of the PR and between faac5d6 and e1dc9b5.

📒 Files selected for processing (10)
  • internal/controller/aggregates_controller.go
  • internal/controller/hypervisor_controller.go
  • internal/controller/hypervisor_instance_ha_controller.go
  • internal/controller/hypervisor_maintenance_controller.go
  • internal/controller/hypervisor_taint_controller.go
  • internal/controller/offboarding_controller.go
  • internal/controller/onboarding_controller.go
  • internal/controller/ready/controller.go
  • internal/controller/traits_controller.go
  • internal/utils/status_patch.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/controller/offboarding_controller.go

Comment thread internal/controller/hypervisor_controller.go
Comment thread internal/controller/hypervisor_instance_ha_controller.go
Comment thread internal/controller/hypervisor_maintenance_controller.go
Comment thread internal/controller/hypervisor_taint_controller.go
Comment thread internal/controller/onboarding_controller.go Outdated
@notandy force-pushed the fix/reconciler-conflict-retry branch from e1dc9b5 to 6d111c1 on April 27, 2026 at 20:11
@coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
internal/controller/onboarding_controller.go (1)

572-585: Drop the unused second parameter.

patchStatus no longer reads base; keeping the _ *kvmv1.Hypervisor parameter forces every caller to keep allocating a hv.DeepCopy() they otherwise don't need (some still legitimately need it for a local DeepEqual check, but several call sites allocate it solely to feed this dead parameter). Cleaning it up makes the contract obvious — the helper now refetches internally — and removes an inconsistent-API smell.

♻️ Proposed change
-func (r *OnboardingController) patchStatus(ctx context.Context, hv, _ *kvmv1.Hypervisor) error {
+func (r *OnboardingController) patchStatus(ctx context.Context, hv *kvmv1.Hypervisor) error {
 	// Capture only the fields this controller owns
 	hypervisorID := hv.Status.HypervisorID
 	serviceID := hv.Status.ServiceID
 	onboardingCondition := meta.FindStatusCondition(hv.Status.Conditions, kvmv1.ConditionTypeOnboarding)

 	return utils.PatchHypervisorStatusWithRetry(ctx, r.Client, hv.Name, OnboardingControllerName, func(h *kvmv1.Hypervisor) {
 		h.Status.HypervisorID = hypervisorID
 		h.Status.ServiceID = serviceID
 		if onboardingCondition != nil {
 			meta.SetStatusCondition(&h.Status.Conditions, *onboardingCondition)
 		}
 	})
 }

…and update each call site (e.g. r.patchStatus(ctx, hv, base) → r.patchStatus(ctx, hv)), dropping the base := hv.DeepCopy() lines that exist only to satisfy this parameter (keep the base lines that are still used by local equality.Semantic.DeepEqual checks).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/onboarding_controller.go` around lines 572 - 585, Remove
the unused second parameter from the patchStatus signature and update all call
sites: change func (r *OnboardingController) patchStatus(ctx context.Context,
hv, _ *kvmv1.Hypervisor) error to func (r *OnboardingController) patchStatus(ctx
context.Context, hv *kvmv1.Hypervisor) error (keep the body unchanged), then
replace every call like r.patchStatus(ctx, hv, base) with r.patchStatus(ctx, hv)
and delete any base := hv.DeepCopy() allocations that existed solely to supply
the removed argument (but keep any DeepCopy() uses that are still needed for
local equality checks).
internal/utils/status_patch.go (1)

31-39: Add a Cap to bound per-attempt delay.

With Steps=10, Factor=2.0 and no Cap, the per-attempt delay grows unbounded (final attempt ≈ 25.6s, cumulative worst case ≈ 51s). Since retry.RetryOnConflict runs synchronously on the reconcile worker, a contended hypervisor can block a worker slot for the entire backoff window before yielding, delaying other reconciles in the same queue. Standard Kubernetes helpers (e.g. retry.DefaultBackoff) set a Cap for exactly this reason.

♻️ Proposed change
 var StatusPatchBackoff = wait.Backoff{
 	Steps:    10,
 	Duration: 50 * time.Millisecond,
+	Cap:      5 * time.Second,
 	Factor:   2.0,
 	Jitter:   0.2,
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/utils/status_patch.go` around lines 31 - 39, StatusPatchBackoff
currently omits a Cap so per-attempt delays grow unbounded; update the
wait.Backoff literal for StatusPatchBackoff to include a Cap to bound the
maximum per-attempt delay (e.g. Cap: 5 * time.Second) so retries don’t block a
reconcile worker for tens of seconds—modify the StatusPatchBackoff variable
initializer to add the Cap field while keeping Steps, Duration, Factor and
Jitter as-is.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/controller/aggregates_controller.go`:
- Around line 100-108: Before calling utils.PatchHypervisorStatusWithRetry,
short-circuit when there is nothing to change: if aggregatesChanged is false and
the Hypervisor already has the same condition as desiredCondition (use
meta.FindStatusCondition(&hv.Status.Conditions, desiredCondition.Type) and
compare the found condition with desiredCondition), return ctrl.Result{}, nil;
only call PatchHypervisorStatusWithRetry when aggregatesChanged is true or the
existing condition differs from desiredCondition so we avoid unnecessary
Get/Patch roundtrips.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2442100a-15d9-49e6-b194-f36ea03aa9d5

📥 Commits

Reviewing files that changed from the base of the PR and between e1dc9b5 and 6d111c1.

📒 Files selected for processing (10)
  • internal/controller/aggregates_controller.go
  • internal/controller/hypervisor_controller.go
  • internal/controller/hypervisor_instance_ha_controller.go
  • internal/controller/hypervisor_maintenance_controller.go
  • internal/controller/hypervisor_taint_controller.go
  • internal/controller/offboarding_controller.go
  • internal/controller/onboarding_controller.go
  • internal/controller/ready/controller.go
  • internal/controller/traits_controller.go
  • internal/utils/status_patch.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/controller/hypervisor_maintenance_controller.go
  • internal/controller/hypervisor_taint_controller.go

Comment thread internal/controller/aggregates_controller.go
@notandy force-pushed the fix/reconciler-conflict-retry branch from 6d111c1 to 833306d on April 27, 2026 at 20:28
Multiple controllers updating the same Hypervisor resource were causing
"the object has been modified" errors due to stale resourceVersions.

Add PatchHypervisorStatusWithRetry helper that re-fetches the resource
before each patch attempt and uses exponential backoff retry logic.
Update aggregates, offboarding, and traits controllers to use this helper.
@notandy force-pushed the fix/reconciler-conflict-retry branch from 833306d to d6f93c5 on April 27, 2026 at 21:06
@github-actions

⚠️ Note: Baseline coverage from main branch is not available (artifact may be expired). Showing current coverage for changed files only.

Merging this branch will increase overall coverage

| Impacted Packages | Coverage Δ | 🤖 |
| --- | --- | --- |
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/controller | 65.66% (+65.66%) | 🌟 |
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/controller/ready | 69.23% (+69.23%) | 🌟 |
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/utils | 60.00% (+60.00%) | 🌟 |

Coverage by file

Changed files (no unit tests)

| Changed File | Coverage Δ | Total | Covered | Missed | 🤖 |
| --- | --- | --- | --- | --- | --- |
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/controller/aggregates_controller.go | 84.62% (+84.62%) | 65 (+65) | 55 (+55) | 10 (+10) | 🌟 |
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/controller/hypervisor_controller.go | 78.87% (+78.87%) | 71 (+71) | 56 (+56) | 15 (+15) | 🌟 |
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/controller/hypervisor_instance_ha_controller.go | 86.96% (+86.96%) | 46 (+46) | 40 (+40) | 6 (+6) | 🌟 |
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/controller/hypervisor_maintenance_controller.go | 78.16% (+78.16%) | 87 (+87) | 68 (+68) | 19 (+19) | 🌟 |
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/controller/hypervisor_taint_controller.go | 92.86% (+92.86%) | 14 (+14) | 13 (+13) | 1 (+1) | 🌟 |
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/controller/offboarding_controller.go | 71.88% (+71.88%) | 64 (+64) | 46 (+46) | 18 (+18) | 🌟 |
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/controller/onboarding_controller.go | 55.14% (+55.14%) | 243 (+243) | 134 (+134) | 109 (+109) | 🌟 |
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/controller/ready/controller.go | 69.23% (+69.23%) | 52 (+52) | 36 (+36) | 16 (+16) | 🌟 |
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/controller/traits_controller.go | 73.44% (+73.44%) | 64 (+64) | 47 (+47) | 17 (+17) | 🌟 |
| github.com/cobaltcore-dev/openstack-hypervisor-operator/internal/utils/status_patch.go | 85.71% (+85.71%) | 7 (+7) | 6 (+6) | 1 (+1) | 🌟 |

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

@notandy merged commit 38c9868 into main on Apr 27, 2026
6 checks passed
@notandy deleted the fix/reconciler-conflict-retry branch on April 27, 2026 at 21:09
This was referenced Apr 29, 2026

Labels

None yet

Projects

None yet

2 participants