fix(controller): honor r d --keep-tiebreaker so auto-witness survives by kvaps · Pull Request #57 · cozystack/blockstor

Andrei Kvapil (kvaps) · 2026-06-01T23:47:43Z

Bug

linstor r d --keep-tiebreaker <one-of-diskful> <rd> on a 2-diskful + 1-witness shape silently reaped the auto-managed TIE_BREAKER witness right after the delete completed. Final topology was a solo diskful with NO witness — quorum=1, unprotected for ~5 min until stampTiebreakerSuppression expired.

The CLI flag documents itself as "Keeps the tiebreaker instead of accidentally deleting it. Also removes the RD property if necessary in order to keep the resource as a tiebreaker", but neither the REST DELETE handler nor the RD reconciler observed the operator intent.

Repro (hunt-v3 finding B.1, full evidence in /tmp/bug-hunt3-B1-keep-tiebreaker-strips-witness.md on the dev stand):

$L rd c hunt3-b1 -l drbd,storage
$L vd c hunt3-b1 100M
$L r c dev-worker-1 hunt3-b1 -s zfs-thin
$L r c dev-worker-2 hunt3-b1 -s zfs-thin
# state: w-1 diskful, w-2 diskful, w-3 auto-TIE_BREAKER
$L r d --keep-tiebreaker dev-worker-2 hunt3-b1

Controller log shows the carve-out reaping the witness:

ensureTiebreaker rd=hunt3-b1 replicas=2 diskful=1 witness=1 willCreate=false willRemove=true

Root cause

ensureTiebreaker reasoned purely from on-disk topology. The --keep-tiebreaker intent from the REST DELETE request wasn't plumbed to the controller, so shouldKeepExistingWitness hit the Bug-338 carve-out (diskful == 1 && nonWitnessDiskless == 0 ⇒ collapse) and removed the witness in the same reconcile that observed diskful drop from 2 to 1.

Fix

Mirror the existing suppression-annotation pattern (AutoTiebreakerSuppressedUntilAnnotation):

REST handleResourceDelete reads ?keep_tiebreaker=true from the URL and stamps a new KeepTiebreakerUntilAnnotation (blockstor.io/keep-tiebreaker-until) on the parent RD with a fresh 5-minute RFC3339 deadline BEFORE committing the Resource Delete. Stamp lands first so any Reconcile woken by the Delete event observes both signals atomically.
RD-side shouldTieBreakerExist now consults isKeepTiebreakerActive via a new keepExistingWitnessFor helper. When the annotation deadline is in the future AND at least one diskful peer survives, the existing witness is preserved regardless of the Bug-338 carve-out. The diskful floor matters because a lone witness with 0 diskful is 1 voter — strictly worse than no witness.
Expired and unparseable values are treated as "no override", so a forgotten or hand-edited annotation can't permanently freeze the auto-quorum invariant. After 5 minutes the normal Bug-338 collapse path resumes with no manual cleanup.

Tests

internal/controller/ensure_tiebreaker_test.go:
- TestKeepTiebreakerAnnotation_HonoredWhileFresh — 1 diskful + 1 witness with fresh annotation → witness preserved.
- TestKeepTiebreakerAnnotation_ExpiredFallsThrough — same topology, expired annotation → Bug-338 collapse resumes; unparseable values also treated as "no override".
pkg/rest/autoplace_test.go:
- TestResourceDeleteKeepTiebreakerStampsOverride — DELETE …?keep_tiebreaker=true writes the annotation with a future RFC3339 deadline.
- TestResourceDeleteWithoutKeepTiebreakerSkipsOverride — plain DELETE … does not stamp the override.
tests/e2e/keep-tiebreaker-3way.sh — real-DRBD QEMU-stand regression catcher: 2 diskful + 1 auto-TB topology, linstor r d --keep-tiebreaker, assert the surviving TB stays for a 30s stability tail. Required by the TieBreaker-e2e-must-exist rule (this class re-regressed before: Bugs 104/108/267/271/338/342).

Stand verification

Skipped — the dev stand was unreachable from this network at commit time. The unit + REST tests cover the wire path end-to-end (REST handler → store → reconciler decision), and the e2e script is ready to run on the QEMU stand once available.

Bug B.1 (hunt-v3): `linstor r d --keep-tiebreaker <one-of-diskful>` on a 2-diskful + 1-witness RD silently reaped the auto-managed TIE_BREAKER witness right after the delete completed. The CLI flag documents itself as "Keeps the tiebreaker instead of accidentally deleting it", but neither the REST DELETE handler nor the RD reconciler observed the operator intent — `ensureTiebreaker` hit the Bug-338 carve-out (diskful=1, witness=1, no non-witness diskless) and collapsed the witness. The resulting solo-diskful shape ran quorum=1 unprotected for ~5 min until stampTiebreakerSuppression expired. Mirror the suppression-annotation pattern: the REST handler reads `?keep_tiebreaker=true` from the DELETE URL and stamps a KeepTiebreakerUntilAnnotation on the parent RD with a fresh 5-minute RFC3339 deadline before committing the Resource Delete. The RD-side `shouldTieBreakerExist` (via the new `keepExistingWitnessFor` helper) returns true while the deadline is in the future AND at least one diskful peer survives — without a diskful peer the witness has nothing to arbitrate, so the override is gated. Expired or unparseable values are treated as "no override" so a forgotten or hand-edited annotation can't permanently freeze the auto-quorum invariant. Tests: - internal/controller/ensure_tiebreaker_test.go: TestKeepTiebreakerAnnotation_HonoredWhileFresh, TestKeepTiebreakerAnnotation_ExpiredFallsThrough. - pkg/rest/autoplace_test.go: TestResourceDeleteKeepTiebreakerStampsOverride, TestResourceDeleteWithoutKeepTiebreakerSkipsOverride. - tests/e2e/keep-tiebreaker-3way.sh: real-DRBD QEMU-stand regression catcher — 2 diskful + 1 auto-TB, run `r d --keep-tiebreaker`, assert the witness survives a 30s stability tail (per the MEMORY.md TieBreaker-e2e-must-exist rule; this class re-regressed before). Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

coderabbitai · 2026-06-01T23:47:50Z

Warning

Review limit reached

@kvaps, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 21 minutes and 14 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 79bcb550-4989-42d8-b727-8d68fcb4784d

📥 Commits

Reviewing files that changed from the base of the PR and between 010e1bb and 6494c5b.

📒 Files selected for processing (7)

internal/controller/ensure_tiebreaker_test.go
internal/controller/export_test.go
internal/controller/resourcedefinition_controller.go
pkg/api/v1/annotations.go
pkg/rest/autoplace.go
pkg/rest/autoplace_test.go
tests/e2e/keep-tiebreaker-3way.sh

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/keep-tiebreaker-r-d

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request implements support for the --keep-tiebreaker flag during resource deletion (Bug B.1) by introducing a short-lived KeepTiebreakerUntilAnnotation annotation. This annotation prevents the reconciler from prematurely collapsing auto-managed TIE_BREAKER witnesses during diskful-to-single-replica transitions. The feedback suggests using time.RFC3339Nano instead of time.RFC3339 to parse the deadline timestamp more robustly against fractional seconds, and adding a check in the E2E test script to ensure the port-forwarding process binds successfully before proceeding.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-01T23:49:33Z

+		return false
+	}
+
+	deadline, err := time.Parse(time.RFC3339, raw)


Using time.RFC3339 for parsing can fail if the annotation value contains fractional seconds (e.g., if it was written by a system that serializes timestamps with nanosecond precision, or if a user manually copied a timestamp from Kubernetes metadata like creationTimestamp which uses fractional seconds).

In Go, time.Parse(time.RFC3339Nano, raw) is fully backwards-compatible with standard RFC3339 (without fractional seconds) but also successfully parses timestamps with fractional seconds. Switching to time.RFC3339Nano makes the parsing significantly more robust against different timestamp formats.

Suggested change

deadline, err := time.Parse(time.RFC3339, raw)

deadline, err := time.Parse(time.RFC3339Nano, raw)

gemini-code-assist · 2026-06-01T23:49:33Z

+		t.Fatalf("keep-tiebreaker annotation missing; annotations=%v", rd.Annotations)
+	}
+
+	deadline, err := time.Parse(time.RFC3339, raw)


For consistency and robustness against fractional seconds, use time.RFC3339Nano for parsing the deadline timestamp here as well.

Suggested change

deadline, err := time.Parse(time.RFC3339, raw)

deadline, err := time.Parse(time.RFC3339Nano, raw)

gemini-code-assist · 2026-06-01T23:49:33Z

+for _ in $(seq 1 30); do
+    if curl -sf -m1 "http://localhost:$PF_PORT/v1/nodes" >/dev/null 2>&1; then
+        break
+    fi
+    sleep 0.5
+done


If the port-forward background process fails to start or bind to the port, the script will silently continue past this loop and fail later with a confusing connection error on the first linstor command.

Adding a check after the loop to verify that the port-forward actually bound successfully will make the script fail fast with a clear error message.

Suggested change

for _ in $(seq 1 30); do

if curl -sf -m1 "http://localhost:$PF_PORT/v1/nodes" >/dev/null 2>&1; then

break

fi

sleep 0.5

done

for _ in $(seq 1 30); do

if curl -sf -m1 "http://localhost:$PF_PORT/v1/nodes" >/dev/null 2>&1; then

break

fi

sleep 0.5

done

if ! curl -sf -m1 "http://localhost:$PF_PORT/v1/nodes" >/dev/null 2>&1; then

echo "FAIL: port-forward failed to bind to port $PF_PORT" >&2

exit 1

fi

Corner-cases C4 (connection-scoped resource drbd-peer-options), C5 (node-connection drbd-peer-options) and C6 (per-object-class option validation) are unimplemented / permissive on main. Document them as accepted deltas (#56/#57/#58) and pin current behavior: - C4: PUT to the bare resource-connections drbd-peer-options path is not handled (no silent persist) — canary flips if the surface gets wired. - C6: a net{}-class option on a volume-definition is accepted and stored permissively (upstream rejects with FAIL_INVLD_PROP) — canary flips when the option-class validator lands. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

… scopes (corner C) (#98) * fix(effectiveprops): controller-tier DRBD options must lose to closer scopes controller drbd-options persists into ControllerConfig.Spec.ExtraProps (raw, untyped), while rd/rg drbd-options are transcoded into the typed Spec.DRBDOptions. The effective-props resolver copied the controller ExtraProps AFTER the RG/RD/Resource typed+raw merge, so a cluster-wide controller default wrongly overrode a closer scope's explicit override — violating LINSTOR's 'closer to the resource wins' rule. Flatten each scope (Controller/RG/RD/Resource) into a single DRBD-props map (typed-emitted ∪ ExtraProps) and run them all through one precedence walk (drbd.ResolveOptions), so closer scopes win regardless of whether a value was stored typed or as an ExtraProp. Controller-tier inheritance (value reaches a resource that sets nothing) is preserved. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com> * test(store): pin --unset-<drbd-opt> reverts typed field to inherited (C3) linstor rd/rg/r drbd-options --unset-<opt> deletes the flattened DrbdOptions/... key; the store transcode round-trip must clear the matching typed slot so the value reverts to inherited/default, while sibling overrides in the same section survive untouched. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com> * test(e2e): L6 cell + L7 replay for controller DRBD-options hierarchy (C1/C2) Add an operator-CLI convergence gate for the effectiveprops controller- tier precedence fix: - L6 cli-matrix/drbd-options-hierarchy-controller.sh: set max-buffers at the controller, autoplace an RD, assert drbdsetup show inherits it (C1), then set a closer RD-level max-buffers and assert it wins (C2). - L7 replay/drbd-options-hierarchy-controller.yaml: same sequence under the replay harness. - New additive await kind 'drbd_option' in operator-harness/lib.sh that probes drbdsetup show on a node's satellite pod, documented in replay-runner.sh. SHARED-HARNESS CHANGE (additive only). Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com> * test(rest): pin C4/C6 corner-case deltas + whitelist C4/C5/C6 Corner-cases C4 (connection-scoped resource drbd-peer-options), C5 (node-connection drbd-peer-options) and C6 (per-object-class option validation) are unimplemented / permissive on main. Document them as accepted deltas (#56/#57/#58) and pin current behavior: - C4: PUT to the bare resource-connections drbd-peer-options path is not handled (no silent persist) — canary flips if the surface gets wired. - C6: a net{}-class option on a volume-definition is accepted and stored permissively (upstream rejects with FAIL_INVLD_PROP) — canary flips when the option-class validator lands. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com> * fix(effectiveprops): preserve RG/RD non-DRBD ExtraProps in effective render The hierarchy-funnel refactor routed every scope's props through drbd.ResolveOptions, which by design only threads through NON-DRBD props from the most-specific (resource) scope. That silently dropped load-bearing non-DRBD ExtraProps carried on the RG/RD scopes — most importantly `FileSystem/Type` (and `FileSystem/MkfsParams`), stamped on the RD by `linstor rd set-property FileSystem/Type ext4` and by the linstor-csi CreateVolume path. The satellite gates its mkfs / `primary --force` seed path on `dr.GetProps()["FileSystem/Type"]` (hasFileSystemConfigured). With the key dropped, a fresh replica never mkfs-writes, the DRBD current-UUID never rotates past the deterministic day0 GI, and the controller's append-only RD.Spec.Initialized latch (ensureSkipInitSyncDecision) never flips — the `respawn-standalone-wedge` e2e fails in SETUP with "RD.Spec.Initialized never latched true after 3 UpToDate replicas". Re-overlay the non-DRBD ExtraProps from the RG then RD scope after the precedence walk (RD wins over RG; resource already won via ResolveOptions). DRBD keys are skipped — already resolved by the walk. The controller scope is deliberately not re-overlaid, preserving the C1 contract that cluster-wide non-DRBD knobs (Aux/zone) do not leak onto every resource. Adds TestResolve_RDNonDRBDExtraPropSurvives pinning the exact production RD shape (on-no-quorum typed, auto-quorum + FileSystem/Type in ExtraProps) that the prior analytic reconstruction missed. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com> --------- Signed-off-by: Andrei Kvapil <kvapss@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>

The Controller<RG<RD auto-diskful priority chain had no test for the MIDDLE (RG) layer: the existing RDWins test only covers RD-over-Controller. Add TestAutoDiskfulPropHierarchyRGWins (RG beats Controller when RD is unset) and TestAutoDiskfulPropHierarchyRDBeatsRG (RD beats both when all three are set), closing the lattice. Also document the auto-diskful trigger divergence (deficit-refill + immediate Primary-InUse promote vs upstream's timed Primary>N-min) and the unimplemented allow-cleanup gate as accepted delta #57, and the 1.34 stuck-toggle retry/cancel version-context as delta #56. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

…orner H) (#103) * fix(rest): normalize DRBD_DISKLESS to DISKLESS on resource create (H3) The modern `linstor r c <node> <rd> --drbd-diskless` CLI flag posts the wire flag DRBD_DISKLESS, while the deprecated `--diskless` alias posts the canonical DISKLESS (verified via `linstor --curl` against the upstream 1.33.2 oracle, client 1.27.1). blockstor's diskless-detection surface keys exclusively on the canonical DISKLESS spelling, so a replica requested with the recommended --drbd-diskless flag was mis-classified as diskful: the satellite would carve backing storage and the quorum/tiebreaker math would miscount it. Fold DRBD_DISKLESS into DISKLESS once at the create wire boundary so the rest of the pipeline sees a single spelling. De-duplicates when both spellings are present. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com> * test(auto-diskful): pin RG-scope priority in the property hierarchy (H4) The Controller<RG<RD auto-diskful priority chain had no test for the MIDDLE (RG) layer: the existing RDWins test only covers RD-over-Controller. Add TestAutoDiskfulPropHierarchyRGWins (RG beats Controller when RD is unset) and TestAutoDiskfulPropHierarchyRDBeatsRG (RD beats both when all three are set), closing the lattice. Also document the auto-diskful trigger divergence (deficit-refill + immediate Primary-InUse promote vs upstream's timed Primary>N-min) and the unimplemented allow-cleanup gate as accepted delta #57, and the 1.34 stuck-toggle retry/cancel version-context as delta #56. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com> * test(toggle-disk): pin --migrate-from sync-then-remove (H2) Add an L6 cli-matrix cell + L7 replay YAML for the toggle-disk --migrate-from migration (UG9 §"Migrating a resource to another node"). Both assert the sync-then-remove contract: the destination diskful replica is added (count 2->3) and reaches UpToDate BEFORE the migrate source is pruned, so the active diskful count never drops below the original 2 at any observed point. The landing pad uses --drbd-diskless to also exercise the H3 DRBD_DISKLESS normalisation on the same flow. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com> * test(toggle-disk): use --diskless alias for the H2 migrate landing pad The H2 sync-then-remove replay/cli-matrix created the diskless landing pad with the modern `--drbd-diskless` flag, which posts the wire flag DRBD_DISKLESS. blockstor's diskless-detection surface keys on the canonical DISKLESS spelling, and the DRBD_DISKLESS->DISKLESS wire normalisation is the H3 fix in this same branch — NOT yet rolled out to the dev stand. On the deployed (pre-H3) image the landing pad was mis-classified as diskful and never reported diskState Diskless, so the replica_diskless await timed out before the migration step ever ran. Switch the landing pad to the deprecated `--diskless` alias, which posts the canonical DISKLESS directly, so the H2 sync-then-remove migration contract is validated on the currently-deployed stand image. H3's DRBD_DISKLESS normalisation remains pinned by the L1 unit test pkg/rest/resource_create_drbd_diskless_test.go, the correct tier for a wire-boundary flag canonicalisation. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com> * test(replay): migrate-from asserts source no longer diskful, not absent The auto-tiebreaker legitimately re-occupies the vacated node after the diskful source is pruned (2 diskful = even parity), so resource_absent can never hold on a 3-worker stand. Assert replica_diskless on the source plus active_diskful_count=2 instead — the actual sync-then-remove contract. Live-stand evidence: controller logs 'migration complete: src pruned, dst UpToDate' while the witness lands back on the source node. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com> --------- Signed-off-by: Andrei Kvapil <kvapss@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>

gemini-code-assist Bot reviewed Jun 1, 2026

View reviewed changes

Andrei Kvapil (kvaps) marked this pull request as ready for review June 2, 2026 01:39

Andrei Kvapil (kvaps) merged commit 35d3fb8 into main Jun 2, 2026
37 of 40 checks passed

This was referenced Jun 3, 2026

chore(release): prepare v0.1.5 #91

Merged

test(placer): pin LINSTOR placement/autoplace corner-cases (campaign group D) #99

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(controller): honor r d --keep-tiebreaker so auto-witness survives#57

fix(controller): honor r d --keep-tiebreaker so auto-witness survives#57
Andrei Kvapil (kvaps) merged 1 commit into
mainfrom
fix/keep-tiebreaker-r-d

Andrei Kvapil (kvaps) commented Jun 1, 2026

Uh oh!

coderabbitai Bot commented Jun 1, 2026 •

edited

Loading

Review limit reached

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 1, 2026

Uh oh!

gemini-code-assist Bot Jun 1, 2026

Uh oh!

gemini-code-assist Bot Jun 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	deadline, err := time.Parse(time.RFC3339, raw)
	deadline, err := time.Parse(time.RFC3339Nano, raw)

Conversation

Andrei Kvapil (kvaps) commented Jun 1, 2026

Bug

Root cause

Fix

Tests

Stand verification

Uh oh!

coderabbitai Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Jun 1, 2026 •

edited

Loading