Skip to content

fix(controller,satellite): operational-lifecycle parity round 1 — Bugs 384–387 + L7 harness hardening (388)#83

Merged
Andrei Kvapil (kvaps) merged 19 commits into
mainfrom
integration/operational-lifecycle-384-387
Jun 3, 2026
Merged

fix(controller,satellite): operational-lifecycle parity round 1 — Bugs 384–387 + L7 harness hardening (388)#83
Andrei Kvapil (kvaps) merged 19 commits into
mainfrom
integration/operational-lifecycle-384-387

Conversation

@kvaps
Copy link
Copy Markdown
Member

@kvaps Andrei Kvapil (kvaps) commented Jun 2, 2026

Summary

Round 1 of the operational-lifecycle parity work: four user-found lifecycle bugs that the earlier REST/wire-validation bug-hunt did not cover, plus a hardening pass on the L7 replay harness that surfaced while validating them on the live stand.

Validated as a unit on the dev stand (mTLS apiserver, linstor-client 1.27.1): every fix behaves correctly under direct operator-CLI exercise, and all target + adjacent replays pass green through the corrected harness. No regressions.

Bugs fixed

  • 384 (P0, data integrity) — adding a volume to an existing multi-replica resource (vd c) left the new volume Inconsistent on every replica forever. The late-add seed path passed isWinner=false unconditionally because first-activation election was gated on !rdInitialized, so no replica seeded the new volume UpToDate. Fix recomputes the lowest-node-id winner per fresh volume in the satellite reconciler.
  • 385 (P1)node evict with a free tiebreaker demoted a healthy diskful replica to TieBreaker and left the evicted node's replica counted. The RD tiebreaker decision now excludes replicas on EVICTED/LOST nodes and reaps stranded witnesses.
  • 386 (P1)node restore did not recreate the auto-tiebreaker (the RD reconciler did not watch Node), leaving two diskful UpToDate with no witness. Adds a Node watch so the tiebreaker invariant re-runs after restore.
  • 387 (P1)r d of a diskful replica on a 2-diskful + 1-INACTIVE resource spawned a useless TieBreaker because the tiebreaker count treated the INACTIVE (deactivated) replica as a voting diskful. INACTIVE replicas are now excluded from the voting set.
  • 388 (test harness) — the L7 replay assertions all_uptodate/wait_settle filtered on spec.resourceName (the CRD field is spec.resourceDefinitionName) and the tiebreaker assertions used a case-insensitive grep that matched the rd name — both silent no-ops that gave false-green. Fixed, plus no_orphans settle window, linstor-client 1.27.1 compat (-m instead of --output-fmt=json, --storage-pool= equals form, a node evict REST shim), and tolerance for Diskless/TieBreaker rows.

Test tiers

Each code bug ships its L1/L2 regression, an L6 cli-matrix cell, and an L7 replay YAML. The L7 replays were run green on the live stand through the corrected (non-vacuous) harness; the harness fix is proven non-vacuous (an Inconsistent mid-sync replica now fails all_uptodate where it previously passed).

Supersedes the per-bug branches fix/bug-384-vd-late-seed-gi, fix/bug-385-evict-tb-shuffle, fix/bug-386-restore-recreate-tb, fix/bug-387-rd-inactive-no-tb, fix/bug-388-l7-assertion-noop (their commits are included here).

Summary by CodeRabbit

  • Bug Fixes

    • Fixed tiebreaker witness handling when cluster nodes are evicted
    • Fixed tiebreaker recreation after evicted nodes are restored
    • Fixed inactive replicas being incorrectly counted in quorum calculations
    • Fixed late-added volume replica election logic
  • Tests

    • Added comprehensive test coverage for tiebreaker scenarios
    • Added end-to-end regression tests

Andrei Kvapil (kvaps) and others added 19 commits June 2, 2026 23:29
… (Bug 385)

ensureTiebreaker counted every replica of an RD, including a TIE_BREAKER
witness sitting on a just-EVICTED node. The witness invariant then read
as satisfied (diskful=2 + witness=1) and the reconciler never relocated
the witness off the drained node, so `linstor n e <tiebreaker-node>` did
not actually take effect; downstream the stranded witness blocked a
fresh one from landing on a healthy spare, which surfaced as a healthy
diskful drifting into the tiebreaker role.

Treat replicas on EVICTED / LOST nodes as draining placements (mirroring
the placer's disabledNodes semantic): exclude them from the witness /
quorum decision and reap a TIE_BREAKER stranded on a disabled node so a
fresh witness can relocate to a healthy spare, or quorum falls back to
off when none remains. A healthy diskful is never demoted. Diskful
replicas on disabled nodes are left to the NodeReconciler's migration
path.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…Bug 385)

L6 cli-matrix/n-evict-tiebreaker-no-shuffle.sh drives the real
`linstor n e` CLI on the 2r+tb shape and asserts the witness leaves the
evicted node while neither healthy diskful is demoted to TIE_BREAKER.

L7 replay/n-evict-tiebreaker-no-shuffle.yaml codifies the operator
sequence (rd c → vd c → 2x r c → tiebreaker on node3 → n e node3) and
the convergence assertions: witness reaped off the evicted node
(resource_absent), both diskful replicas stay UpToDate.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…g count (Bug 387)

An INACTIVE replica is `drbdadm down` (operator deactivation) — its
DRBD device is not up, so it casts no vote in the quorum:majority
decision the auto-tiebreaker invariant defends. The RD reconciler's
ensureTiebreaker counted it as a full diskful, so deleting one active
diskful on a 2-active + 1-INACTIVE resource looked like "2 diskful,
even parity, no user-diskless" and spuriously grew a TIE_BREAKER
witness (1 active diskful + 1 witness = a 2-voter quorum with no
majority protection), diverging from upstream LINSTOR.

Drop INACTIVE replicas from the voting set before the diskful/diskless
split so they influence neither the diskful count nor the
diskless/witness count.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…wns no tiebreaker

L1 regression. TestBug387InactiveReplicaNotCountedAsVotingDiskful
reproduces the operator repro (1 active diskful + 1 INACTIVE after
`r d`): EnsureTiebreaker must create no witness and write quorum=off.
TestBug387TwoActiveDiskfulStillGetWitness is the positive control —
2 genuine active diskful + 1 ignored INACTIVE still grows a witness
with quorum=majority, so the INACTIVE-exclusion does not regress the
canonical auto-witness invariant.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
L6 r-d-inactive-no-tiebreaker.sh: 3 diskful → deactivate one (INACTIVE)
→ r d one active diskful, asserts no TIE_BREAKER is spawned and the
deleted node does not re-appear as a witness.

L7 r-d-inactive-no-tiebreaker.yaml: codifies the same operator sequence
with no_tiebreaker + resource_absent convergence assertions.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
The RD reconciler watched only ResourceDefinitions and Resources, never
Nodes. An EVICTED/LOST node is excluded as a tiebreaker witness
candidate, so a 2-diskful resource whose witness collapsed while a node
was drained could not regain it. After `linstor n rst` cleared the
EVICTED flag, nothing enqueued the RD, so ensureTiebreaker never re-ran
and the resource sat at two diskful UpToDate replicas with no quorum
witness until the next periodic re-sync (split-brain risk in between).

Add a Node watch (nodeDrainFlagChanged predicate + enqueueRDsForNode
mapper) that re-enqueues every RD when a node's EVICTED/LOST flag set
transitions, so node restore re-runs the tiebreaker invariant and the
witness is re-placed.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
L1 regression coverage for the Bug 386 fix:

- TestBug386NodeRestoreRecreatesTiebreaker drives the reconcile-level
  outcome: a 2-diskful RD with an EVICTED third node has no witness;
  clearing EVICTED (`n rst`) and re-running ensureTiebreaker re-places
  the TIE_BREAKER on the restored node.
- TestBug386NodeHasDrainFlag pins the EVICTED/LOST flag-set probe that
  gates the Node watch predicate (Spec + Status flags).
- TestBug386EnqueueRDsForNode pins the cluster-wide RD fan-out.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
… 386)

Stand-side companion to the L1 unit pin: drives the real python-linstor
CLI on the 2r-tb shape, evacuates the witness node to collapse the
TieBreaker, restores it with `n rst`, and asserts the TieBreaker
reappears in `linstor r l` within 60s.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Add the n-rst-recreates-tiebreaker replay workflow: autoplace a 2r-tb
shape, `n evacuate` the witness node to collapse the TieBreaker, then
`n rst` and assert the witness is re-created.

Add a `tiebreaker_present` await kind (inverse of no_tiebreaker) so the
workflow can assert the witness EXISTS after restore.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…ToDate (Bug 384)

Adding a volume to an already-initialized multi-replica resource left
the new volume stuck Inconsistent on every replica with no SyncSource.

The late-add seed path (seedFreshVolumes) runs with firstActivation=
false on the parent RD, so the dispatcher's first-activation winner
election (auto-primary, gated on !rdInitialized) never fires once the RD
is Initialized. The code therefore passed isWinner=false unconditionally
and every diskful replica took the case-A skip-init-sync seed (clean
bitmap, no UpToDate flag). With no replica declaring itself the UpToDate
source and no primary --force on this path, the freshly-carved volume
came up Inconsistent on ALL replicas and never converged.

Recompute the same lowest-node-id election locally per fresh volume
(isLateAddWinner): exactly one diskful replica seeds the new volume
Consistent+UpToDate (case B) and becomes the SyncSource, the rest take
case A. resolveVolumeSeed's PeerHasData / AnyConnectedPeerHasDataForVolume
vetoes still fire first, so a relocate/migrate target never wins itself
UpToDate-empty; the winner seed shares the day0 lineage anchor, so a
staggered dual election agrees rather than split-brains.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
L1 (reconciler_drbd_test.go):
  - TestApplyLateAddedVolumeWinnerSeedsUpToDate: the lowest-node-id
    replica seeds the late-added vol-1 Consistent+UpToDate (positional
    idx4=1/idx5=1 at current=day0); vol-0 is never re-seeded. Fails
    pre-fix with the case-A <day0>:0:0:0 tuple.
  - TestApplyLateAddedVolumeNonWinnerTakesSkipInitSync: a non-lowest-id
    replica seeds case-A skip-init-sync and MUST NOT carry the UpToDate
    flag (split-brain guard).

L6 (cli-matrix/bug-384-late-vd-thin-converges.sh): 2-diskful thin RD,
  vol-0 UpToDate, then a SECOND vd c adds vol-1; assert vol-1 reaches
  UpToDate on both replicas plus a kernel-truth drbdadm status guard
  against the stuck-Inconsistent surface.

L7 (replay/vd-late-create-second-volume.yaml): operator sequence
  rd c -> vd c vol-0 -> r c x2 -> wait all_uptodate -> vd c vol-1, with
  all_uptodate (every volume of every replica) as the convergence
  assertion. Distinct from vd-late-create.yaml, which exercises the
  always-green first-volume late-add path.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
… into integration/operational-lifecycle-384-387
…to integration/operational-lifecycle-384-387
…nto integration/operational-lifecycle-384-387

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

# Conflicts:
#	internal/controller/resourcedefinition_controller.go
…ow (Bug 388)

The L7 replay assertions all_uptodate and wait_settle filtered replicas on
spec.resourceName, but the Resource CRD field is spec.resourceDefinitionName.
The filter matched nothing, so all_uptodate counted zero non-UpToDate
replicas and returned green vacuously — a silent no-op across all 16 replay
YAMLs that use it. Convergence was never actually asserted.

Fix the field name and make all_uptodate tolerate Diskless/TieBreaker
replicas (never UpToDate by design) while still flagging a diskful replica
stuck Inconsistent/Outdated/SyncTarget. Treat an rd with zero observed
replicas as not-yet-converged so the waiter keeps polling instead of
passing on an absent resource.

Also give the no_orphans invariant a settle window: teardown removes
Resource CRDs asynchronously, so a single post-teardown snapshot races GC
and reports phantom orphans. Poll up to NO_ORPHANS_SETTLE_S (default 30s).

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…teps

linstor-client 1.27.1 declares --storage-pool with nargs='*', so the
form "--storage-pool stand <rd>" makes argparse greedily consume the rd
positional into the storage-pool list and the resource create fails with
"too few arguments". Use the equals form --storage-pool=<sp> so the rd
positional is preserved. Affects every replay that drives
resource create --auto-place ... --storage-pool ...

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Three client-compat defects surfaced running the L7 replays against the
stand client (1.27.1):

- replica_count / vd_size_kib used --output-fmt=json, which 1.27.1 does
  not accept (only -m / --machine-readable). The CLI errored, the parser
  saw empty input and returned 0, so every replica_count await timed out.
  Switch to -m; the nested-list parser already handles its shape.

- no_tiebreaker / tiebreaker_present grepped case-INSENSITIVELY for
  TieBreaker. The replay RD names contain the lowercase substring
  "tiebreaker" (replay-*-tiebreaker-*), so grep -ci matched the resource
  name on every data row: no_tiebreaker could never reach 0 and timed out
  on a correct cluster, tiebreaker_present passed vacuously. Match the
  State token TieBreaker case-sensitively.

- node evict is a controller action that 1.27.1 does not expose as a
  subcommand (only evacuate/lost/restore). Route node evict through the
  REST endpoint PUT /v1/nodes/<node>/evict so the replay step stays
  operator-faithful.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…skless

The Bug 385 replay hand-created the node3 witness with r c node3 --diskless.
Bug 385's stranded-witness reaping (removeStrandedWitnesses) only deletes
replicas carrying the TieBreaker flag; a user-placed --diskless replica is
operator-owned and is never auto-reaped on evict, so the resource_absent
await timed out against a correct controller. Let the RD reconciler auto-place
the TIE_BREAKER on the only remaining node (node3) and assert tiebreaker_present
before the evict — the same auto-witness shape the cli-matrix catcher uses.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Bug 384 is a deterministic per-volume winner-election defect in the
satellite (seedFreshVolumes / isLateAddWinner), not a sync-timing race, so
the trigger is size-independent. A 1G fresh-replica sync on an
I/O-constrained stand (~0.5 MB/s) overshoots the 240s all_uptodate await;
64M converges in seconds and exercises the identical late-add code path.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 2, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR fixes four interconnected bugs affecting DRBD tiebreaker auto-placement and satellite volume election. The satellite now correctly elects a per-volume winner for late-added volumes. The controller refines tiebreaker logic to exclude evicted/lost/inactive nodes, removes stranded witnesses, and re-runs when node drain flags toggle. Unit, e2e, and replay tests validate all scenarios.

Changes

Bug Fixes: Satellite and Controller

Layer / File(s) Summary
Late-added volume winner election (Bug 384)
pkg/satellite/reconciler.go
seedFreshVolumes now computes isWinner by comparing local node-id to peer node-ids, ensuring only the lowest-id replica seeds UpToDate and others use skip-init-sync on late-added volumes.
Controller tiebreaker filtering and disabled-node handling (Bugs 385–387)
internal/controller/resourcedefinition_controller.go, internal/controller/export_test.go
ensureTiebreaker derives a disabled-node set for EVICTED/LOST nodes, removes stranded TIE_BREAKER witnesses, filters INACTIVE replicas before quorum decisions, and adds a Node watch to re-run when drain flags toggle; exports test helpers EnqueueRDsForNode and NodeHasDrainFlag.

Test Coverage and Harness

Layer / File(s) Summary
Unit tests for late-add winner election (Bug 384)
pkg/satellite/reconciler_drbd_test.go
Two tests verify that the winner replica seeds UpToDate while non-winners use skip-init-sync, and that per-volume metadata creation and convergence work correctly.
Unit tests for evicted node witness handling (Bug 385)
internal/controller/ensure_tiebreaker_evict_bug_385_test.go
Three tests verify that stranded TIE_BREAKER witnesses are removed from evicted nodes, optionally relocated to spare nodes, and that evicted diskful replicas are not counted in witness quorum.
Unit tests for node drain flag watch and witness recreation (Bug 386)
internal/controller/ensure_tiebreaker_test.go
Three tests verify that witnesses are recreated when evicted nodes are restored, that drain-flag predicates correctly identify EVICTED/LOST transitions, and that node changes fan out reconcile requests to all ResourceDefinitions.
Unit tests for INACTIVE replica exclusion (Bug 387)
internal/controller/ensure_tiebreaker_inactive_bug_387_test.go
Two tests verify that INACTIVE replicas are not counted toward the diskful voting set, so a single active diskful replica with INACTIVE peers does not trigger auto-tiebreaker, while two active diskfuls still do.
Test harness improvements for node evict and assertions
tests/operator-harness/lib.sh, tests/operator-harness/replay-runner.sh
Harness adds REST-based node evict support, fixes TieBreaker detection to use case-sensitive grep, treats Diskless replicas as acceptable in uptodate checks, filters snapshots by resourceDefinitionName, and extends orphan-cleanup polling; documents new tiebreaker_present assertion kind.
E2E test scenarios for late-add volume convergence (Bug 384)
tests/e2e/cli-matrix/bug-384-late-vd-thin-converges.sh, tests/operator-harness/replay/vd-late-create-second-volume.yaml
CLI-matrix and replay tests verify that a second volume added to an existing resource converges to UpToDate on all replicas without stuck Inconsistent state or split-brain.
E2E test scenarios for evicted tiebreaker handling (Bug 385)
tests/e2e/cli-matrix/n-evict-tiebreaker-no-shuffle.sh, tests/operator-harness/replay/n-evict-tiebreaker-no-shuffle.yaml
CLI-matrix and replay tests verify that evicting a tiebreaker node reaps or relocates the witness, does not demote healthy diskfuls, and handles node cleanup correctly.
E2E test scenarios for witness recreation on node restore (Bug 386)
tests/e2e/cli-matrix/n-rst-recreates-tiebreaker.sh, tests/operator-harness/replay/n-rst-recreates-tiebreaker.yaml
CLI-matrix and replay tests verify that evacuating a tiebreaker node collapses the witness, and restoring the node via n rst triggers re-creation of the witness.
E2E test scenarios for INACTIVE replica behavior (Bug 387)
tests/e2e/cli-matrix/r-d-inactive-no-tiebreaker.sh, tests/operator-harness/replay/r-d-inactive-no-tiebreaker.yaml
CLI-matrix and replay tests verify that deactivating a replica and deleting another does not auto-spawn a tiebreaker when only one active diskful remains.
YAML test format updates (--storage-pool argument)
tests/operator-harness/replay/auto-place-*.yaml, tests/operator-harness/replay/*-lifecycle.yaml
Multiple replay YAML files updated to pass --storage-pool as a single combined argument instead of separate tokens for consistency with CLI invocation semantics.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • cozystack/blockstor#36: Modifies ensureTiebreaker witness-handling logic in the same controller file; direct code-level connection on tiebreaker removal semantics.
  • cozystack/blockstor#31: Validates orphan TIE_BREAKER cleanup when diskful replicas are removed; related to this PR's witness cleanup invariants.

Poem

🐰 Four bugs now mended in the witness realm,
Late volumes elected, no split-brain's helm,
Drained nodes shed their orphaned ties,
Inactive whispers pass unwise,
The tiebreaker dance—at last—stands true!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main changes: fixing four operational-lifecycle bugs (384-387) and hardening the L7 harness (388) in controller and satellite components.
Docstring Coverage ✅ Passed Docstring coverage is 81.48% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch integration/operational-lifecycle-384-387

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@kvaps Andrei Kvapil (kvaps) marked this pull request as ready for review June 2, 2026 23:25
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses several critical bugs (Bugs 384, 385, 386, and 387) in the blockstor controller and satellite reconciler, improving volume initialization, node eviction, node restoration, and inactive replica handling. The review feedback suggests adding defensive nil checks in isLateAddWinner and nodeHasDrainFlag to prevent potential panics, logging a swallowed error in enqueueRDsForNode, and using grep -F in the test harness scripts to avoid unexpected regex matching.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +2886 to +2890
func isLateAddWinner(dr *intent.DesiredResource) bool {
localID, ok := localNodeIDFromOpts(dr)
if !ok {
return false
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If dr is nil, calling localNodeIDFromOpts(dr) or accessing its properties will cause a nil pointer dereference panic. Adding a defensive nil check at the beginning of the function ensures robustness.

func isLateAddWinner(dr *intent.DesiredResource) bool {
	if dr == nil {
		return false
	}
	localID, ok := localNodeIDFromOpts(dr)
	if !ok {
		return false
	}

Comment on lines +1505 to +1507
func nodeHasDrainFlag(n *blockstoriov1alpha1.Node) bool {
for _, f := range n.Spec.Flags {
if f == apiv1.NodeFlagEvicted || f == apiv1.NodeFlagLost {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If n is nil, accessing n.Spec.Flags will cause a nil pointer dereference panic. Adding a defensive nil check at the beginning of the function ensures robustness.

func nodeHasDrainFlag(n *blockstoriov1alpha1.Node) bool {
	if n == nil {
		return false
	}
	for _, f := range n.Spec.Flags {
		if f == apiv1.NodeFlagEvicted || f == apiv1.NodeFlagLost {

Comment on lines +1531 to +1534
err := r.List(ctx, &rdList)
if err != nil {
return nil
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The error returned by r.List is silently swallowed. In Kubernetes controllers, swallowing errors in event handlers or mappers makes debugging extremely difficult because there are no logs indicating why the event failed to trigger reconciliations. It is highly recommended to log the error using a logger (e.g., log.FromContext(ctx)) before returning nil.

Comment on lines +478 to +479
leftover=$(kubectl get resources.blockstor.cozystack.io -o name 2>/dev/null \
| grep -c "$prefix" || true)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using grep without the -F flag treats the $prefix as a regular expression. If the prefix contains regex metacharacters (such as .), it could lead to unexpected matches and false positives/negatives. Since $prefix is a literal string, using grep -F (or grep -Fc) is safer and more robust.

Suggested change
leftover=$(kubectl get resources.blockstor.cozystack.io -o name 2>/dev/null \
| grep -c "$prefix" || true)
leftover=$(kubectl get resources.blockstor.cozystack.io -o name 2>/dev/null \
| grep -Fc "$prefix" || true)

sleep 2
done
echo " INVARIANT FAIL: $leftover Resource CRD(s) for $prefix still present after ${settle_s}s" >&2
kubectl get resources.blockstor.cozystack.io -o name 2>/dev/null | grep "$prefix" >&2 || true
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using grep without the -F flag treats the $prefix as a regular expression. Since $prefix is a literal string, using grep -F is safer and more robust to prevent unexpected regex matching.

Suggested change
kubectl get resources.blockstor.cozystack.io -o name 2>/dev/null | grep "$prefix" >&2 || true
kubectl get resources.blockstor.cozystack.io -o name 2>/dev/null | grep -F "$prefix" >&2 || true

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/operator-harness/replay/r-full-lifecycle.yaml (1)

128-128: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Inconsistent --storage-pool argument form.

Line 128 still uses the split form (--storage-pool, {{sp}}), while the rest of this PR normalizes to the equals form (--storage-pool={{sp}}). For consistency with linstor-client 1.27.1 compatibility and the harness command runner, consider updating this line as well.

🔧 Proposed fix
-    cmd: ["resource", "toggle-disk", "--storage-pool", "{{sp}}", "{{node1}}", "{{rd}}"]
+    cmd: ["resource", "toggle-disk", "--storage-pool={{sp}}", "{{node1}}", "{{rd}}"]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/operator-harness/replay/r-full-lifecycle.yaml` at line 128, The cmd
entry for the "resource toggle-disk" step uses the split argument form
"--storage-pool", "{{sp}}" which is inconsistent with the rest of the PR; update
the cmd array element so the storage-pool argument uses the equals form (i.e.,
"--storage-pool={{sp}}") in the same "cmd" that invokes "resource toggle-disk"
to match the harness command runner and linstor-client 1.27.1 expectations.
🧹 Nitpick comments (1)
internal/controller/ensure_tiebreaker_test.go (1)

1510-1604: ⚡ Quick win

Add a predicate-level regression test for the Bug 386 watch.

These tests cover the helper and the fan-out mapper, but they still bypass nodeDrainFlagChanged(), which is the load-bearing gate for the restore path. A refactor that breaks the false→true / true→false update check would still pass here.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/controller/ensure_tiebreaker_test.go` around lines 1510 - 1604,
Tests currently cover NodeHasDrainFlag and EnqueueRDsForNode but omit exercising
the actual predicate nodeDrainFlagChanged (the UpdateEvent gate); add unit tests
that construct old/new *blockstoriov1alpha1.Node objects and call
nodeDrainFlagChanged().Update with events that: 1) change from no drain flags to
spec/ status EVICTED/LOST (expect true), 2) change from drained to no flags
(expect true), and 3) make unrelated metadata changes or unrelated flag changes
(expect false); place these cases in
internal/controller/ensure_tiebreaker_test.go alongside the existing tests and
reference nodeDrainFlagChanged and the controllerpkg.NodeHasDrainFlag helper to
build expected outcomes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/e2e/cli-matrix/bug-384-late-vd-thin-converges.sh`:
- Around line 59-64: Replace the bare trap usage with register_strict_cleanup:
remove or stop using "trap cleanup EXIT" and instead call
register_strict_cleanup cleanup (imported from tests/e2e/lib.sh) so the existing
cleanup() function (which calls delete_rd, assert_no_orphans and
linstor_cli_teardown) is executed via register_strict_cleanup to ensure strict
teardown and cluster verification.

In `@tests/e2e/cli-matrix/n-evict-tiebreaker-no-shuffle.sh`:
- Around line 70-80: Replace the bare trap registration with the repository’s
strict cleanup helper: remove or stop using the direct "trap cleanup EXIT"
registration and instead call register_strict_cleanup to register the existing
cleanup() function (which restores "$EVICTED_NODE", calls delete_rd "$RD",
assert_no_orphans "$RD", and linstor_cli_teardown). Locate the cleanup()
definition and the trailing "trap cleanup EXIT" and replace that trap invocation
with a call to register_strict_cleanup so the test uses the shared strict
cleanup and cluster-state verification from tests/e2e/lib.sh.
- Around line 171-191: The test currently only checks .spec.flags for DISKFUL_A
and DISKFUL_B and can miss that a surviving diskful degraded its diskState to
Inconsistent/Unknown; update the loop that iterates over DISKFUL_A and DISKFUL_B
(the block referencing variables DISKFUL_A, DISKFUL_B and RD) to also query and
assert the replica's runtime diskState from the resource status (e.g., the
status field returned by kubectl for the resource) and fail if diskState is not
the expected healthy state (e.g., "UpToDate" or "Consistent"); keep the existing
flag checks but add the diskState check immediately after fetching flags so the
test fails when a surviving diskful has an unacceptable diskState.

In `@tests/e2e/cli-matrix/n-rst-recreates-tiebreaker.sh`:
- Around line 177-188: The test currently only checks n_tb == 1 (using
linstor_r_l_json and variable wire) but must also verify that exactly two
non-TIE_BREAKER replicas are present and healthy after n rst; update the
assertion to parse wire (output of linstor_r_l_json "$RD") to count entries
where (.rsc_flags // []) does NOT contain "TIE_BREAKER" and whose status/state
indicates a healthy diskful replica (e.g., .state == "UpToDate" or the CLI field
representing healthy), assert that count equals 2, and fail if not; keep the
original TIE_BREAKER check but add this second check using the same wire/RD
context so the test ensures one tie-breaker plus two healthy non-TIE_BREAKER
replicas.
- Around line 64-72: The cleanup() trap currently always restores node "$N3"
which is incorrect when the evacuated witness may be a different node; modify
cleanup() to record the actual evacuated node (e.g., capture it to a variable
like EVACUATED_NODE when you run the evacuate or resource placement) and then
restore "${EVACUATED_NODE}" instead of "$N3", keeping the existing calls to
delete_rd "$RD", assert_no_orphans "$RD", and linstor_cli_teardown; replace the
bare trap cleanup EXIT with a call to register_strict_cleanup cleanup so the
test harness performs strict teardown and cluster verification (reference
functions/variables: cleanup, LCTL, N3, RD, delete_rd, assert_no_orphans,
linstor_cli_teardown, register_strict_cleanup).

In `@tests/e2e/cli-matrix/r-d-inactive-no-tiebreaker.sh`:
- Around line 62-70: Replace the bare trap call with the shared strict cleanup
registration: move the cleanup() function as-is but remove the trailing "trap
cleanup EXIT" and call register_strict_cleanup cleanup instead (ensure
register_strict_cleanup from tests/e2e/lib.sh is imported/available); this
ensures cleanup()—which calls "${LCTL[@]} resource activate", delete_rd,
assert_no_orphans and linstor_cli_teardown—runs via the standardized strict
cleanup path.
- Around line 161-184: Add an assertion that the surviving active replica ${N3}
is actually UpToDate: after computing n3_flags and before exiting success, fetch
the replica status (e.g. with kubectl get
"resources.blockstor.cozystack.io/${RD}.${N3}" -o jsonpath for the status field
you use for DRBD status) into a variable like n3_status and fail the test if
n3_status does not include or equal "UpToDate". Reference the existing symbols
RD, N3 and n3_flags when adding this check so it verifies status convergence in
addition to flag checks.

In `@tests/operator-harness/lib.sh`:
- Around line 268-280: The new tiebreaker_present check violates the replay
await.kind contract by adding an unsupported kind; remove/replace the
tiebreaker_present handler and instead implement its logic under an allowed
await.kind (e.g., reuse or extend the existing no_tiebreaker flow) so replay
YAMLs remain schema-compliant, or map any incoming await.kind value to existing
checks before use; do not change the schema/validation here — if a new kind is
truly required, add the schema/validation update in a separate PR.

In `@tests/operator-harness/replay/n-rst-recreates-tiebreaker.yaml`:
- Around line 33-68: The replay assumes node3 is the witness but auto-place-2
lets the controller pick diskfuls; update the scenario so the witness is
guaranteed before evacuating: either replace the "auto-place-2" step with an
explicit resource create that pins diskful replicas to {{node1}} and {{node2}}
(matching the Bug 385 replay), or insert a pre-evacuate verification step (e.g.,
after "wait-tiebreaker-spawned") that runs "resource list" / a check to assert
the tiebreaker/witness is on {{node3}} and fail if not; then keep the
"evacuate-node3", "restore-node3", and "tiebreaker-recreated" steps unchanged.

In `@tests/operator-harness/replay/r-d-inactive-no-tiebreaker.yaml`:
- Around line 60-75: The replay is missing a post-failure status-convergence
assertion for the surviving active replica: after the existing steps
no-tiebreaker-spawned and node2-stays-absent (which use the resource_absent
await and rd/node variables), add a new step that runs the same resource check
and awaits a status-convergence/UpToDate condition for rd on node "{{node3}}"
(use the same rd and node3 symbols), e.g. an await kind like resource_status or
status_converged with rd: "{{rd}}", node: "{{node3}}", and timeout_s: 30; also
ensure the e2e CLI matrix tests under tests/e2e/cli-matrix/** mirror this
assertion to verify real LINSTOR/DRBD status convergence.
- Around line 33-40: The replay is missing a post-delete convergence assertion:
after the existing resource_absent step that removes {{node2}} under the
no_tiebreaker scenario, add an await.kind: all_uptodate check for the resource
{{rd}} to ensure surviving diskful replica(s) (e.g., {{node3}}) return to
UpToDate; locate the delete step named resource_absent and insert an await block
with kind: all_uptodate targeting {{rd}} immediately after it so the test
verifies convergence.

In `@tests/operator-harness/replay/vd-late-create-second-volume.yaml`:
- Around line 48-53: The test commands r-c-node1 and r-c-node2 use the
two-element flag form for the storage pool argument; update each cmd array to
use the combined equals form so the harness command runner accepts it (replace
["--storage-pool", "{{sp}}"] with ["--storage-pool={{sp}}"] in the command
entries for r-c-node1 and r-c-node2).

---

Outside diff comments:
In `@tests/operator-harness/replay/r-full-lifecycle.yaml`:
- Line 128: The cmd entry for the "resource toggle-disk" step uses the split
argument form "--storage-pool", "{{sp}}" which is inconsistent with the rest of
the PR; update the cmd array element so the storage-pool argument uses the
equals form (i.e., "--storage-pool={{sp}}") in the same "cmd" that invokes
"resource toggle-disk" to match the harness command runner and linstor-client
1.27.1 expectations.

---

Nitpick comments:
In `@internal/controller/ensure_tiebreaker_test.go`:
- Around line 1510-1604: Tests currently cover NodeHasDrainFlag and
EnqueueRDsForNode but omit exercising the actual predicate nodeDrainFlagChanged
(the UpdateEvent gate); add unit tests that construct old/new
*blockstoriov1alpha1.Node objects and call nodeDrainFlagChanged().Update with
events that: 1) change from no drain flags to spec/ status EVICTED/LOST (expect
true), 2) change from drained to no flags (expect true), and 3) make unrelated
metadata changes or unrelated flag changes (expect false); place these cases in
internal/controller/ensure_tiebreaker_test.go alongside the existing tests and
reference nodeDrainFlagChanged and the controllerpkg.NodeHasDrainFlag helper to
build expected outcomes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3e23c93c-6d63-4dba-aef6-ce070bc48e50

📥 Commits

Reviewing files that changed from the base of the PR and between 900c709 and 787c279.

📒 Files selected for processing (24)
  • internal/controller/ensure_tiebreaker_evict_bug_385_test.go
  • internal/controller/ensure_tiebreaker_inactive_bug_387_test.go
  • internal/controller/ensure_tiebreaker_test.go
  • internal/controller/export_test.go
  • internal/controller/resourcedefinition_controller.go
  • pkg/satellite/reconciler.go
  • pkg/satellite/reconciler_drbd_test.go
  • tests/e2e/cli-matrix/bug-384-late-vd-thin-converges.sh
  • tests/e2e/cli-matrix/n-evict-tiebreaker-no-shuffle.sh
  • tests/e2e/cli-matrix/n-rst-recreates-tiebreaker.sh
  • tests/e2e/cli-matrix/r-d-inactive-no-tiebreaker.sh
  • tests/operator-harness/lib.sh
  • tests/operator-harness/replay-runner.sh
  • tests/operator-harness/replay/auto-place-no-drbd.yaml
  • tests/operator-harness/replay/autoplace-3r.yaml
  • tests/operator-harness/replay/luks-encrypted-rd.yaml
  • tests/operator-harness/replay/n-evict-tiebreaker-no-shuffle.yaml
  • tests/operator-harness/replay/n-rst-recreates-tiebreaker.yaml
  • tests/operator-harness/replay/pvc-lifecycle.yaml
  • tests/operator-harness/replay/r-d-inactive-no-tiebreaker.yaml
  • tests/operator-harness/replay/r-full-lifecycle.yaml
  • tests/operator-harness/replay/r-l-conns-shapes.yaml
  • tests/operator-harness/replay/vd-late-create-second-volume.yaml
  • tests/operator-harness/replay/vd-resize-full-lifecycle.yaml

Comment on lines +59 to +64
cleanup() {
delete_rd "$RD"
assert_no_orphans "$RD"
linstor_cli_teardown
}
trap cleanup EXIT
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Use register_strict_cleanup instead of bare trap directive.

The script registers cleanup using a bare trap cleanup EXIT, but should use register_strict_cleanup from lib.sh to ensure proper resource cleanup and cluster state verification. Based on learnings, e2e test scenarios should register cleanup via register_strict_cleanup from tests/e2e/lib.sh instead of bare trap directives to ensure proper resource cleanup and cluster state verification.

♻️ Proposed fix
-cleanup() {
-    delete_rd "$RD"
-    assert_no_orphans "$RD"
-    linstor_cli_teardown
-}
-trap cleanup EXIT
+register_strict_cleanup delete_rd "$RD"
+register_strict_cleanup assert_no_orphans "$RD"
+register_strict_cleanup linstor_cli_teardown
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cleanup() {
delete_rd "$RD"
assert_no_orphans "$RD"
linstor_cli_teardown
}
trap cleanup EXIT
register_strict_cleanup delete_rd "$RD"
register_strict_cleanup assert_no_orphans "$RD"
register_strict_cleanup linstor_cli_teardown
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/cli-matrix/bug-384-late-vd-thin-converges.sh` around lines 59 - 64,
Replace the bare trap usage with register_strict_cleanup: remove or stop using
"trap cleanup EXIT" and instead call register_strict_cleanup cleanup (imported
from tests/e2e/lib.sh) so the existing cleanup() function (which calls
delete_rd, assert_no_orphans and linstor_cli_teardown) is executed via
register_strict_cleanup to ensure strict teardown and cluster verification.

Comment on lines +70 to +80
cleanup() {
# Restore any node we evicted so the shared stand is left healthy
# for the next cell, BEFORE deleting the RD.
if [[ -n "$EVICTED_NODE" ]]; then
"${LCTL[@]}" node restore "$EVICTED_NODE" >/dev/null 2>&1 || true
fi
delete_rd "$RD"
assert_no_orphans "$RD"
linstor_cli_teardown
}
trap cleanup EXIT
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Use strict cleanup registration here.

This scenario still installs a bare trap cleanup EXIT. The repo contract for tests/e2e/*.sh requires register_strict_cleanup, which is what enforces the shared cleanup and cluster-state verification path for these stand tests.

Based on learnings, Applies to tests/e2e/*.sh: Register cleanup for e2e test scenarios via register_strict_cleanup from tests/e2e/lib.sh instead of bare trap directives to ensure proper resource cleanup and cluster state verification.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/cli-matrix/n-evict-tiebreaker-no-shuffle.sh` around lines 70 - 80,
Replace the bare trap registration with the repository’s strict cleanup helper:
remove or stop using the direct "trap cleanup EXIT" registration and instead
call register_strict_cleanup to register the existing cleanup() function (which
restores "$EVICTED_NODE", calls delete_rd "$RD", assert_no_orphans "$RD", and
linstor_cli_teardown). Locate the cleanup() definition and the trailing "trap
cleanup EXIT" and replace that trap invocation with a call to
register_strict_cleanup so the test uses the shared strict cleanup and
cluster-state verification from tests/e2e/lib.sh.

Comment on lines +171 to +191
# The load-bearing invariant: NEITHER healthy diskful was demoted to
# TIE_BREAKER (the exact wrong behaviour from the repro where worker-2
# went UpToDate → TieBreaker).
echo ">> assert neither healthy diskful ($DISKFUL_A, $DISKFUL_B) was demoted to TIE_BREAKER"
for n in "$DISKFUL_A" "$DISKFUL_B"; do
flags=$(kubectl get "resources.blockstor.cozystack.io/${RD}.${n}" \
-o jsonpath='{.spec.flags}' 2>/dev/null || echo "__absent__")
if [[ "$flags" == "__absent__" ]]; then
echo "FAIL (Bug 385): healthy diskful on $n disappeared after evict; flags=$flags" >&2
exit 1
fi
if [[ "$flags" == *"TIE_BREAKER"* ]]; then
echo "FAIL (Bug 385 regression): healthy diskful on $n was demoted to TIE_BREAKER; flags=$flags" >&2
"${LCTL[@]}" resource list --resources "$RD" 2>&1 | tail -20 >&2
exit 1
fi
if [[ "$flags" == *"DISKLESS"* ]]; then
echo "FAIL (Bug 385): healthy diskful on $n gained DISKLESS after evict; flags=$flags" >&2
exit 1
fi
done
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Assert diskState on the surviving diskful replicas after the evict.

These checks only look at flags, so the test still passes if either original diskful stays non-TIE_BREAKER but drops to Inconsistent or Unknown. That weakens the regression signal for the “healthy replicas stay unchanged” guarantee.

Suggested change
 for n in "$DISKFUL_A" "$DISKFUL_B"; do
     flags=$(kubectl get "resources.blockstor.cozystack.io/${RD}.${n}" \
         -o jsonpath='{.spec.flags}' 2>/dev/null || echo "__absent__")
@@
     if [[ "$flags" == *"DISKLESS"* ]]; then
         echo "FAIL (Bug 385): healthy diskful on $n gained DISKLESS after evict; flags=$flags" >&2
         exit 1
     fi
+    d=$(status_disk_state "$RD" "$n" 0)
+    if [[ "$d" != "UpToDate" ]]; then
+        echo "FAIL (Bug 385): healthy diskful on $n is no longer UpToDate after evict; diskState=$d" >&2
+        exit 1
+    fi
 done

Based on learnings, Applies to tests/e2e/cli-matrix/**: L6 operator-CLI e2e tests must test real linstor CLI → REST → satellite → DRBD kernel interactions and assert Status convergence under tests/e2e/cli-matrix/.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/cli-matrix/n-evict-tiebreaker-no-shuffle.sh` around lines 171 -
191, The test currently only checks .spec.flags for DISKFUL_A and DISKFUL_B and
can miss that a surviving diskful degraded its diskState to
Inconsistent/Unknown; update the loop that iterates over DISKFUL_A and DISKFUL_B
(the block referencing variables DISKFUL_A, DISKFUL_B and RD) to also query and
assert the replica's runtime diskState from the resource status (e.g., the
status field returned by kubectl for the resource) and fail if diskState is not
the expected healthy state (e.g., "UpToDate" or "Consistent"); keep the existing
flag checks but add the diskState check immediately after fetching flags so the
test fails when a surviving diskful has an unacceptable diskState.

Comment on lines +64 to +72
cleanup() {
# Best-effort restore in case the test bailed mid-evacuate, so the
# shared stand isn't left with an EVICTED node.
"${LCTL[@]}" node restore "$N3" >/dev/null 2>&1 || true
delete_rd "$RD"
assert_no_orphans "$RD"
linstor_cli_teardown
}
trap cleanup EXIT
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restore the actual evacuated node in cleanup.

resource create --auto-place=2 does not pin the witness to $N3, but this trap always restores $N3. If the script aborts before the explicit node restore, the real evacuated witness node can stay drained and poison the shared stand for later cells. Please track the evacuated node dynamically in cleanup, and register that cleanup with register_strict_cleanup instead of a bare trap.

Suggested change
 RD=cli-matrix-386
 N1=$WORKER_1
 N2=$WORKER_2
 N3=$WORKER_3
+EVICTED_NODE=""
 
 cleanup() {
     # Best-effort restore in case the test bailed mid-evacuate, so the
     # shared stand isn't left with an EVICTED node.
-    "${LCTL[@]}" node restore "$N3" >/dev/null 2>&1 || true
+    if [[ -n "$EVICTED_NODE" ]]; then
+        "${LCTL[@]}" node restore "$EVICTED_NODE" >/dev/null 2>&1 || true
+    fi
     delete_rd "$RD"
     assert_no_orphans "$RD"
     linstor_cli_teardown
 }
-trap cleanup EXIT

Based on learnings, Applies to tests/e2e/*.sh: Register cleanup for e2e test scenarios via register_strict_cleanup from tests/e2e/lib.sh instead of bare trap directives to ensure proper resource cleanup and cluster state verification.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/cli-matrix/n-rst-recreates-tiebreaker.sh` around lines 64 - 72, The
cleanup() trap currently always restores node "$N3" which is incorrect when the
evacuated witness may be a different node; modify cleanup() to record the actual
evacuated node (e.g., capture it to a variable like EVACUATED_NODE when you run
the evacuate or resource placement) and then restore "${EVACUATED_NODE}" instead
of "$N3", keeping the existing calls to delete_rd "$RD", assert_no_orphans
"$RD", and linstor_cli_teardown; replace the bare trap cleanup EXIT with a call
to register_strict_cleanup cleanup so the test harness performs strict teardown
and cluster verification (reference functions/variables: cleanup, LCTL, N3, RD,
delete_rd, assert_no_orphans, linstor_cli_teardown, register_strict_cleanup).

Comment on lines +177 to +188
# Belt-and-suspenders: the operator-visible `linstor r l` surface the
# bug report cited must show exactly one TIE_BREAKER row plus the two
# diskful replicas.
wire=$(linstor_r_l_json "$RD")
n_tb=$(printf '%s' "$wire" \
| jq -r '.[][] | select((.rsc_flags // []) | index("TIE_BREAKER")) | .name' 2>/dev/null \
| wc -l | tr -d ' ' || echo 0)
if [[ "$n_tb" != "1" ]]; then
echo "FAIL (Bug 386): linstor r l shows $n_tb TIE_BREAKER rows for $RD, want 1" >&2
printf '%s\n' "$wire" | jq '.[][]| {name, node_name, flags: .rsc_flags}' 2>/dev/null >&2 || true
exit 1
fi
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Verify the full post-restore shape, not just “some TB exists”.

n_tb == 1 still passes if a healthy diskful is demoted to TieBreaker, or if one of the diskful replicas never comes back to UpToDate. This regression should also assert that two non-TieBreaker replicas are healthy after n rst.

Based on learnings, Applies to tests/e2e/cli-matrix/**: L6 operator-CLI e2e tests must test real linstor CLI → REST → satellite → DRBD kernel interactions and assert Status convergence under tests/e2e/cli-matrix/.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/cli-matrix/n-rst-recreates-tiebreaker.sh` around lines 177 - 188,
The test currently only checks n_tb == 1 (using linstor_r_l_json and variable
wire) but must also verify that exactly two non-TIE_BREAKER replicas are present
and healthy after n rst; update the assertion to parse wire (output of
linstor_r_l_json "$RD") to count entries where (.rsc_flags // []) does NOT
contain "TIE_BREAKER" and whose status/state indicates a healthy diskful replica
(e.g., .state == "UpToDate" or the CLI field representing healthy), assert that
count equals 2, and fail if not; keep the original TIE_BREAKER check but add
this second check using the same wire/RD context so the test ensures one
tie-breaker plus two healthy non-TIE_BREAKER replicas.

Comment on lines +268 to +280
tiebreaker_present)
# Bug 386: assert a TieBreaker witness EXISTS for the rd.
# The inverse of no_tiebreaker — used by the node-restore
# catcher to confirm the witness is RE-created after the
# drained node is brought back with `n rst`.
local rd present
rd=$(substitute "$(python3 -c "import json,sys; print(json.loads(sys.argv[1]).get('rd',''))" "$spec")")
# Case-SENSITIVE on purpose — see the no_tiebreaker note above: a
# case-insensitive grep would match the lowercase "tiebreaker"
# substring in the RD name and report a witness present on every
# row, masking a genuinely missing TieBreaker.
present=$(linstor_cli resource list --resources "$rd" 2>/dev/null | grep -c 'TieBreaker' || true)
[[ "$present" -ge 1 ]]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

tiebreaker_present extends the replay schema beyond the allowed await.kind contract.

The current replay contract only permits replica_count, disk_state, all_uptodate, replica_diskless, no_tiebreaker, sync_clean, resource_absent, rd_absent, vd_size_kib, pvc_capacity, and pod_md5_invariant. Introducing a new kind here makes the new replay YAMLs non-compliant until the schema/validation contract is updated in the same change.

As per coding guidelines, Replay YAML await.kind must use one of: replica_count, disk_state, all_uptodate, replica_diskless, no_tiebreaker, sync_clean, resource_absent, rd_absent, vd_size_kib, pvc_capacity, pod_md5_invariant.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/operator-harness/lib.sh` around lines 268 - 280, The new
tiebreaker_present check violates the replay await.kind contract by adding an
unsupported kind; remove/replace the tiebreaker_present handler and instead
implement its logic under an allowed await.kind (e.g., reuse or extend the
existing no_tiebreaker flow) so replay YAMLs remain schema-compliant, or map any
incoming await.kind value to existing checks before use; do not change the
schema/validation here — if a new kind is truly required, add the
schema/validation update in a separate PR.

Comment on lines +33 to +68
- name: auto-place-2
# 2 diskful + auto-tiebreaker → the canonical 2r-tb shape.
cmd: ["resource", "create", "--auto-place", "2", "--storage-pool={{sp}}", "{{rd}}"]
expect_exit: 0
await:
kind: all_uptodate
rd: "{{rd}}"
timeout_s: 240
- name: wait-tiebreaker-spawned
cmd: ["resource", "list", "--resources", "{{rd}}"]
expect_exit: 0
await:
kind: tiebreaker_present
rd: "{{rd}}"
timeout_s: 60
- name: evacuate-node3
# Drain the third node. With 2 diskful already placed and no spare
# worker in a 3-node cluster, EVICTED collapses the witness.
cmd: ["node", "evacuate", "{{node3}}"]
expect_exit: 0
await:
kind: no_tiebreaker
rd: "{{rd}}"
timeout_s: 60
- name: restore-node3
# Node recovered. Clearing EVICTED must re-run the tiebreaker
# invariant (Bug 386 fix: Node watch on the RD reconciler).
cmd: ["node", "restore", "{{node3}}"]
expect_exit: 0
- name: tiebreaker-recreated
cmd: ["resource", "list", "--resources", "{{rd}}"]
expect_exit: 0
await:
kind: tiebreaker_present
rd: "{{rd}}"
timeout_s: 60
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

This replay assumes node3 is the witness, but the setup does not guarantee that.

auto-place-2 lets the controller choose the diskful pair, yet the later steps unconditionally evacuate and restore {{node3}}. If the witness lands elsewhere, this replay exercises a different topology and can fail or pass for the wrong reason. Create diskful replicas explicitly on {{node1}}/{{node2}} like the Bug 385 replay, or add a step that proves the witness is on {{node3}} before evacuating it.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/operator-harness/replay/n-rst-recreates-tiebreaker.yaml` around lines
33 - 68, The replay assumes node3 is the witness but auto-place-2 lets the
controller pick diskfuls; update the scenario so the witness is guaranteed
before evacuating: either replace the "auto-place-2" step with an explicit
resource create that pins diskful replicas to {{node1}} and {{node2}} (matching
the Bug 385 replay), or insert a pre-evacuate verification step (e.g., after
"wait-tiebreaker-spawned") that runs "resource list" / a check to assert the
tiebreaker/witness is on {{node3}} and fail if not; then keep the
"evacuate-node3", "restore-node3", and "tiebreaker-recreated" steps unchanged.

Comment on lines +33 to +40
- name: r-c-node1
cmd: ["resource", "create", "{{node1}}", "{{rd}}", "--storage-pool", "{{sp}}"]
expect_exit: 0
- name: r-c-node2
cmd: ["resource", "create", "{{node2}}", "{{rd}}", "--storage-pool", "{{sp}}"]
expect_exit: 0
- name: r-c-node3
cmd: ["resource", "create", "{{node3}}", "{{rd}}", "--storage-pool", "{{sp}}"]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a post-delete UpToDate convergence check for the surviving replica

This replay asserts no_tiebreaker and that {{node2}} is absent, but it never verifies that the remaining active diskful replica(s) (e.g., {{node3}}) return to UpToDate after the resource_absent delete; add an await.kind: all_uptodate assertion for {{rd}} after the deletion step.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/operator-harness/replay/r-d-inactive-no-tiebreaker.yaml` around lines
33 - 40, The replay is missing a post-delete convergence assertion: after the
existing resource_absent step that removes {{node2}} under the no_tiebreaker
scenario, add an await.kind: all_uptodate check for the resource {{rd}} to
ensure surviving diskful replica(s) (e.g., {{node3}}) return to UpToDate; locate
the delete step named resource_absent and insert an await block with kind:
all_uptodate targeting {{rd}} immediately after it so the test verifies
convergence.

Comment on lines +60 to +75
- name: no-tiebreaker-spawned
cmd: ["resource", "list", "--resources", "{{rd}}"]
expect_exit: 0
await:
kind: no_tiebreaker
rd: "{{rd}}"
timeout_s: 30
- name: node2-stays-absent
cmd: ["resource", "list", "--resources", "{{rd}}"]
expect_exit: 0
await:
kind: resource_absent
rd: "{{rd}}"
node: "{{node2}}"
timeout_s: 30

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a status-convergence check for the surviving active replica.

After resource_absent and no_tiebreaker, this replay still never proves the remaining active diskful settles back to UpToDate. A degraded {{node3}} would still look green here.

Based on learnings, Applies to tests/e2e/cli-matrix/**: L6 operator-CLI e2e tests must test real linstor CLI → REST → satellite → DRBD kernel interactions and assert Status convergence under tests/e2e/cli-matrix/.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/operator-harness/replay/r-d-inactive-no-tiebreaker.yaml` around lines
60 - 75, The replay is missing a post-failure status-convergence assertion for
the surviving active replica: after the existing steps no-tiebreaker-spawned and
node2-stays-absent (which use the resource_absent await and rd/node variables),
add a new step that runs the same resource check and awaits a
status-convergence/UpToDate condition for rd on node "{{node3}}" (use the same
rd and node3 symbols), e.g. an await kind like resource_status or
status_converged with rd: "{{rd}}", node: "{{node3}}", and timeout_s: 30; also
ensure the e2e CLI matrix tests under tests/e2e/cli-matrix/** mirror this
assertion to verify real LINSTOR/DRBD status convergence.

Comment on lines +48 to +53
- name: r-c-node1
cmd: ["resource", "create", "{{node1}}", "{{rd}}", "--storage-pool", "{{sp}}"]
expect_exit: 0
- name: r-c-node2
cmd: ["resource", "create", "{{node2}}", "{{rd}}", "--storage-pool", "{{sp}}"]
expect_exit: 0
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Use combined equals form for --storage-pool argument.

The storage-pool arguments use the old two-element format ["--storage-pool", "{{sp}}"], but should use the combined equals form ["--storage-pool={{sp}}"] to match the harness command runner's expected format.

♻️ Proposed fix
   - name: r-c-node1
-    cmd: ["resource", "create", "{{node1}}", "{{rd}}", "--storage-pool", "{{sp}}"]
+    cmd: ["resource", "create", "{{node1}}", "{{rd}}", "--storage-pool={{sp}}"]
     expect_exit: 0
   - name: r-c-node2
-    cmd: ["resource", "create", "{{node2}}", "{{rd}}", "--storage-pool", "{{sp}}"]
+    cmd: ["resource", "create", "{{node2}}", "{{rd}}", "--storage-pool={{sp}}"]
     expect_exit: 0
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/operator-harness/replay/vd-late-create-second-volume.yaml` around lines
48 - 53, The test commands r-c-node1 and r-c-node2 use the two-element flag form
for the storage pool argument; update each cmd array to use the combined equals
form so the harness command runner accepts it (replace ["--storage-pool",
"{{sp}}"] with ["--storage-pool={{sp}}"] in the command entries for r-c-node1
and r-c-node2).

@kvaps Andrei Kvapil (kvaps) merged commit a889f7e into main Jun 3, 2026
14 of 16 checks passed
Andrei Kvapil (kvaps) added a commit that referenced this pull request Jun 3, 2026
…ion flakes (Bug 392) (#84)

* test(e2e): harden recovery-deleting-convert + state-standalone-partition flakes (Bug 392)

Two e2e scenarios flaked under CI load on PR #83 (both confirmed on a live
stand to be assertion artifacts, not product regressions):

recovery-deleting-convert: assert_uptodate_12 read only the CRD .status
projection, which lags for a few seconds while the satellite runs
`drbdadm adjust` after toggling the stuck-DELETING replica to diskless
(the re-render transiently bounces the N1<->N2 link). N2's local disk
stays UpToDate throughout. Add the same kernel-ground-truth accept-path
(kernel_pair_uptodate) PR #50 already gave the Phase-1 wait, so a stale
projection read no longer fails the assertion.

state-standalone-partition: the historical marker drift was root-caused
to a host-side loopfile storage artifact (already fixed by the ZFS move +
direct-io reassert); the failure log only printed two md5s, which cannot
distinguish a real wrong-resync from that artifact.

Both failure paths now also dump drbdadm role / get-gi + drbdsetup status
--verbose on both nodes, so the next CI hit carries the generation-
identifier evidence needed to tell a real regression from projection lag.
No production code change.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

* test(e2e): retry marker read-back on transient drift in state-standalone-partition (Bug 392)

The marker round-trip flaked under CI load on the QEMU/zvol stand: the
read-back md5 differed on the first read even though the GI dump (added
above) proves DRBD is fully consistent — both peers UpToDate + Established,
near-identical current-UUIDs, no resync. The divergence is a host read-path
artifact (a stale guest page served despite iflag=direct on the emulated
disk under load — the documented loopfile/zvol residue class), not
replicated data loss.

Drop the guest cache and re-read up to 4 times: a transient substrate
artifact clears on a fresh read from the backing store; genuine corruption
or a real wrong-resync PERSISTS across every re-read, so the assertion still
fails loudly on sustained drift and dumps the GI/kernel evidence to prove it.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

* test(e2e): confirm marker drift via non-O_DIRECT ground truth before failing (Bug 392)

The drop_caches+direct retry was the wrong remedy: under CI load the marker
read-back returns a DIFFERENT md5 on every iflag=direct read (4 distinct
hashes) while DRBD is fully consistent, the peer's replicated backing zvol is
byte-identical to the written marker, and ZFS reports zero checksum errors.
A dedicated load investigation pinned it to the CI runner's nested-QEMU
virtio-blk O_DIRECT path returning stale/garbage pages under memory+I/O
pressure (O_DIRECT bypasses the page cache and depends on hypervisor DMA);
the replicated data is intact, only that read path lies, and it does not
reproduce on the lighter dev-stand virtualization (0/22 + 0 under loadavg 17).

Replace the futile re-read loop with a ground-truth cross-check that avoids
O_DIRECT entirely: on a direct-read mismatch, verify via a buffered read on
the Primary (page-cache path, proven deterministic) and the peer's replicated
backing device read directly. PASS iff either matches the written marker;
FAIL only when the non-O_DIRECT path also mismatches (real corruption /
wrong-resync), with GI + kernel-status evidence dumped. This de-flakes the
recurring CI blocker without weakening real data-loss detection.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

---------

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant