Skip to content

fix(controller): honor r d --keep-tiebreaker so auto-witness survives#57

Merged
Andrei Kvapil (kvaps) merged 1 commit into
mainfrom
fix/keep-tiebreaker-r-d
Jun 2, 2026
Merged

fix(controller): honor r d --keep-tiebreaker so auto-witness survives#57
Andrei Kvapil (kvaps) merged 1 commit into
mainfrom
fix/keep-tiebreaker-r-d

Conversation

@kvaps
Copy link
Copy Markdown
Member

Bug

linstor r d --keep-tiebreaker <one-of-diskful> <rd> on a 2-diskful + 1-witness shape silently reaped the auto-managed TIE_BREAKER witness right after the delete completed. Final topology was a solo diskful with NO witness — quorum=1, unprotected for ~5 min until stampTiebreakerSuppression expired.

The CLI flag documents itself as "Keeps the tiebreaker instead of accidentally deleting it. Also removes the RD property if necessary in order to keep the resource as a tiebreaker", but neither the REST DELETE handler nor the RD reconciler observed the operator intent.

Repro (hunt-v3 finding B.1, full evidence in /tmp/bug-hunt3-B1-keep-tiebreaker-strips-witness.md on the dev stand):

$L rd c hunt3-b1 -l drbd,storage
$L vd c hunt3-b1 100M
$L r c dev-worker-1 hunt3-b1 -s zfs-thin
$L r c dev-worker-2 hunt3-b1 -s zfs-thin
# state: w-1 diskful, w-2 diskful, w-3 auto-TIE_BREAKER
$L r d --keep-tiebreaker dev-worker-2 hunt3-b1

Controller log shows the carve-out reaping the witness:

ensureTiebreaker rd=hunt3-b1 replicas=2 diskful=1 witness=1 willCreate=false willRemove=true

Root cause

ensureTiebreaker reasoned purely from on-disk topology. The --keep-tiebreaker intent from the REST DELETE request wasn't plumbed to the controller, so shouldKeepExistingWitness hit the Bug-338 carve-out (diskful == 1 && nonWitnessDiskless == 0 ⇒ collapse) and removed the witness in the same reconcile that observed diskful drop from 2 to 1.

Fix

Mirror the existing suppression-annotation pattern (AutoTiebreakerSuppressedUntilAnnotation):

  • REST handleResourceDelete reads ?keep_tiebreaker=true from the URL and stamps a new KeepTiebreakerUntilAnnotation (blockstor.io/keep-tiebreaker-until) on the parent RD with a fresh 5-minute RFC3339 deadline BEFORE committing the Resource Delete. Stamp lands first so any Reconcile woken by the Delete event observes both signals atomically.
  • RD-side shouldTieBreakerExist now consults isKeepTiebreakerActive via a new keepExistingWitnessFor helper. When the annotation deadline is in the future AND at least one diskful peer survives, the existing witness is preserved regardless of the Bug-338 carve-out. The diskful floor matters because a lone witness with 0 diskful is 1 voter — strictly worse than no witness.
  • Expired and unparseable values are treated as "no override", so a forgotten or hand-edited annotation can't permanently freeze the auto-quorum invariant. After 5 minutes the normal Bug-338 collapse path resumes with no manual cleanup.

Tests

  • internal/controller/ensure_tiebreaker_test.go:
    • TestKeepTiebreakerAnnotation_HonoredWhileFresh — 1 diskful + 1 witness with fresh annotation → witness preserved.
    • TestKeepTiebreakerAnnotation_ExpiredFallsThrough — same topology, expired annotation → Bug-338 collapse resumes; unparseable values also treated as "no override".
  • pkg/rest/autoplace_test.go:
    • TestResourceDeleteKeepTiebreakerStampsOverrideDELETE …?keep_tiebreaker=true writes the annotation with a future RFC3339 deadline.
    • TestResourceDeleteWithoutKeepTiebreakerSkipsOverride — plain DELETE … does not stamp the override.
  • tests/e2e/keep-tiebreaker-3way.sh — real-DRBD QEMU-stand regression catcher: 2 diskful + 1 auto-TB topology, linstor r d --keep-tiebreaker, assert the surviving TB stays for a 30s stability tail. Required by the TieBreaker-e2e-must-exist rule (this class re-regressed before: Bugs 104/108/267/271/338/342).

Stand verification

Skipped — the dev stand was unreachable from this network at commit time. The unit + REST tests cover the wire path end-to-end (REST handler → store → reconciler decision), and the e2e script is ready to run on the QEMU stand once available.

Bug B.1 (hunt-v3): `linstor r d --keep-tiebreaker <one-of-diskful>` on
a 2-diskful + 1-witness RD silently reaped the auto-managed
TIE_BREAKER witness right after the delete completed. The CLI flag
documents itself as "Keeps the tiebreaker instead of accidentally
deleting it", but neither the REST DELETE handler nor the RD
reconciler observed the operator intent — `ensureTiebreaker` hit the
Bug-338 carve-out (diskful=1, witness=1, no non-witness diskless) and
collapsed the witness. The resulting solo-diskful shape ran quorum=1
unprotected for ~5 min until stampTiebreakerSuppression expired.

Mirror the suppression-annotation pattern: the REST handler reads
`?keep_tiebreaker=true` from the DELETE URL and stamps a
KeepTiebreakerUntilAnnotation on the parent RD with a fresh 5-minute
RFC3339 deadline before committing the Resource Delete. The RD-side
`shouldTieBreakerExist` (via the new `keepExistingWitnessFor` helper)
returns true while the deadline is in the future AND at least one
diskful peer survives — without a diskful peer the witness has
nothing to arbitrate, so the override is gated. Expired or
unparseable values are treated as "no override" so a forgotten or
hand-edited annotation can't permanently freeze the auto-quorum
invariant.

Tests:
- internal/controller/ensure_tiebreaker_test.go:
  TestKeepTiebreakerAnnotation_HonoredWhileFresh,
  TestKeepTiebreakerAnnotation_ExpiredFallsThrough.
- pkg/rest/autoplace_test.go:
  TestResourceDeleteKeepTiebreakerStampsOverride,
  TestResourceDeleteWithoutKeepTiebreakerSkipsOverride.
- tests/e2e/keep-tiebreaker-3way.sh: real-DRBD QEMU-stand regression
  catcher — 2 diskful + 1 auto-TB, run `r d --keep-tiebreaker`, assert
  the witness survives a 30s stability tail (per the MEMORY.md
  TieBreaker-e2e-must-exist rule; this class re-regressed before).

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 1, 2026

Warning

Review limit reached

@kvaps, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 21 minutes and 14 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 79bcb550-4989-42d8-b727-8d68fcb4784d

📥 Commits

Reviewing files that changed from the base of the PR and between 010e1bb and 6494c5b.

📒 Files selected for processing (7)
  • internal/controller/ensure_tiebreaker_test.go
  • internal/controller/export_test.go
  • internal/controller/resourcedefinition_controller.go
  • pkg/api/v1/annotations.go
  • pkg/rest/autoplace.go
  • pkg/rest/autoplace_test.go
  • tests/e2e/keep-tiebreaker-3way.sh
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/keep-tiebreaker-r-d

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements support for the --keep-tiebreaker flag during resource deletion (Bug B.1) by introducing a short-lived KeepTiebreakerUntilAnnotation annotation. This annotation prevents the reconciler from prematurely collapsing auto-managed TIE_BREAKER witnesses during diskful-to-single-replica transitions. The feedback suggests using time.RFC3339Nano instead of time.RFC3339 to parse the deadline timestamp more robustly against fractional seconds, and adding a check in the E2E test script to ensure the port-forwarding process binds successfully before proceeding.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

return false
}

deadline, err := time.Parse(time.RFC3339, raw)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using time.RFC3339 for parsing can fail if the annotation value contains fractional seconds (e.g., if it was written by a system that serializes timestamps with nanosecond precision, or if a user manually copied a timestamp from Kubernetes metadata like creationTimestamp which uses fractional seconds).

In Go, time.Parse(time.RFC3339Nano, raw) is fully backwards-compatible with standard RFC3339 (without fractional seconds) but also successfully parses timestamps with fractional seconds. Switching to time.RFC3339Nano makes the parsing significantly more robust against different timestamp formats.

Suggested change
deadline, err := time.Parse(time.RFC3339, raw)
deadline, err := time.Parse(time.RFC3339Nano, raw)

t.Fatalf("keep-tiebreaker annotation missing; annotations=%v", rd.Annotations)
}

deadline, err := time.Parse(time.RFC3339, raw)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For consistency and robustness against fractional seconds, use time.RFC3339Nano for parsing the deadline timestamp here as well.

Suggested change
deadline, err := time.Parse(time.RFC3339, raw)
deadline, err := time.Parse(time.RFC3339Nano, raw)

Comment on lines +93 to +98
for _ in $(seq 1 30); do
if curl -sf -m1 "http://localhost:$PF_PORT/v1/nodes" >/dev/null 2>&1; then
break
fi
sleep 0.5
done
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If the port-forward background process fails to start or bind to the port, the script will silently continue past this loop and fail later with a confusing connection error on the first linstor command.

Adding a check after the loop to verify that the port-forward actually bound successfully will make the script fail fast with a clear error message.

Suggested change
for _ in $(seq 1 30); do
if curl -sf -m1 "http://localhost:$PF_PORT/v1/nodes" >/dev/null 2>&1; then
break
fi
sleep 0.5
done
for _ in $(seq 1 30); do
if curl -sf -m1 "http://localhost:$PF_PORT/v1/nodes" >/dev/null 2>&1; then
break
fi
sleep 0.5
done
if ! curl -sf -m1 "http://localhost:$PF_PORT/v1/nodes" >/dev/null 2>&1; then
echo "FAIL: port-forward failed to bind to port $PF_PORT" >&2
exit 1
fi

@kvaps Andrei Kvapil (kvaps) marked this pull request as ready for review June 2, 2026 01:39
@kvaps Andrei Kvapil (kvaps) merged commit 35d3fb8 into main Jun 2, 2026
37 of 40 checks passed
Andrei Kvapil (kvaps) added a commit that referenced this pull request Jun 4, 2026
Corner-cases C4 (connection-scoped resource drbd-peer-options),
C5 (node-connection drbd-peer-options) and C6 (per-object-class option
validation) are unimplemented / permissive on main. Document them as
accepted deltas (#56/#57/#58) and pin current behavior:

- C4: PUT to the bare resource-connections drbd-peer-options path is not
  handled (no silent persist) — canary flips if the surface gets wired.
- C6: a net{}-class option on a volume-definition is accepted and stored
  permissively (upstream rejects with FAIL_INVLD_PROP) — canary flips
  when the option-class validator lands.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Andrei Kvapil (kvaps) added a commit that referenced this pull request Jun 4, 2026
Corner-cases C4 (connection-scoped resource drbd-peer-options),
C5 (node-connection drbd-peer-options) and C6 (per-object-class option
validation) are unimplemented / permissive on main. Document them as
accepted deltas (#56/#57/#58) and pin current behavior:

- C4: PUT to the bare resource-connections drbd-peer-options path is not
  handled (no silent persist) — canary flips if the surface gets wired.
- C6: a net{}-class option on a volume-definition is accepted and stored
  permissively (upstream rejects with FAIL_INVLD_PROP) — canary flips
  when the option-class validator lands.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Andrei Kvapil (kvaps) added a commit that referenced this pull request Jun 4, 2026
… scopes (corner C) (#98)

* fix(effectiveprops): controller-tier DRBD options must lose to closer scopes

controller drbd-options persists into ControllerConfig.Spec.ExtraProps
(raw, untyped), while rd/rg drbd-options are transcoded into the typed
Spec.DRBDOptions. The effective-props resolver copied the controller
ExtraProps AFTER the RG/RD/Resource typed+raw merge, so a cluster-wide
controller default wrongly overrode a closer scope's explicit override
— violating LINSTOR's 'closer to the resource wins' rule.

Flatten each scope (Controller/RG/RD/Resource) into a single DRBD-props
map (typed-emitted ∪ ExtraProps) and run them all through one
precedence walk (drbd.ResolveOptions), so closer scopes win regardless
of whether a value was stored typed or as an ExtraProp. Controller-tier
inheritance (value reaches a resource that sets nothing) is preserved.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

* test(store): pin --unset-<drbd-opt> reverts typed field to inherited (C3)

linstor rd/rg/r drbd-options --unset-<opt> deletes the flattened
DrbdOptions/... key; the store transcode round-trip must clear the
matching typed slot so the value reverts to inherited/default, while
sibling overrides in the same section survive untouched.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

* test(e2e): L6 cell + L7 replay for controller DRBD-options hierarchy (C1/C2)

Add an operator-CLI convergence gate for the effectiveprops controller-
tier precedence fix:

- L6 cli-matrix/drbd-options-hierarchy-controller.sh: set max-buffers at
  the controller, autoplace an RD, assert drbdsetup show inherits it (C1),
  then set a closer RD-level max-buffers and assert it wins (C2).
- L7 replay/drbd-options-hierarchy-controller.yaml: same sequence under
  the replay harness.
- New additive await kind 'drbd_option' in operator-harness/lib.sh that
  probes drbdsetup show on a node's satellite pod, documented in
  replay-runner.sh. SHARED-HARNESS CHANGE (additive only).

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

* test(rest): pin C4/C6 corner-case deltas + whitelist C4/C5/C6

Corner-cases C4 (connection-scoped resource drbd-peer-options),
C5 (node-connection drbd-peer-options) and C6 (per-object-class option
validation) are unimplemented / permissive on main. Document them as
accepted deltas (#56/#57/#58) and pin current behavior:

- C4: PUT to the bare resource-connections drbd-peer-options path is not
  handled (no silent persist) — canary flips if the surface gets wired.
- C6: a net{}-class option on a volume-definition is accepted and stored
  permissively (upstream rejects with FAIL_INVLD_PROP) — canary flips
  when the option-class validator lands.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

* fix(effectiveprops): preserve RG/RD non-DRBD ExtraProps in effective render

The hierarchy-funnel refactor routed every scope's props through
drbd.ResolveOptions, which by design only threads through NON-DRBD
props from the most-specific (resource) scope. That silently dropped
load-bearing non-DRBD ExtraProps carried on the RG/RD scopes — most
importantly `FileSystem/Type` (and `FileSystem/MkfsParams`), stamped
on the RD by `linstor rd set-property FileSystem/Type ext4` and by the
linstor-csi CreateVolume path.

The satellite gates its mkfs / `primary --force` seed path on
`dr.GetProps()["FileSystem/Type"]` (hasFileSystemConfigured). With the
key dropped, a fresh replica never mkfs-writes, the DRBD current-UUID
never rotates past the deterministic day0 GI, and the controller's
append-only RD.Spec.Initialized latch (ensureSkipInitSyncDecision)
never flips — the `respawn-standalone-wedge` e2e fails in SETUP with
"RD.Spec.Initialized never latched true after 3 UpToDate replicas".

Re-overlay the non-DRBD ExtraProps from the RG then RD scope after the
precedence walk (RD wins over RG; resource already won via
ResolveOptions). DRBD keys are skipped — already resolved by the walk.
The controller scope is deliberately not re-overlaid, preserving the
C1 contract that cluster-wide non-DRBD knobs (Aux/zone) do not leak
onto every resource.

Adds TestResolve_RDNonDRBDExtraPropSurvives pinning the exact
production RD shape (on-no-quorum typed, auto-quorum + FileSystem/Type
in ExtraProps) that the prior analytic reconstruction missed.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

---------

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Andrei Kvapil (kvaps) added a commit that referenced this pull request Jun 4, 2026
The Controller<RG<RD auto-diskful priority chain had no test for the
MIDDLE (RG) layer: the existing RDWins test only covers RD-over-Controller.
Add TestAutoDiskfulPropHierarchyRGWins (RG beats Controller when RD is
unset) and TestAutoDiskfulPropHierarchyRDBeatsRG (RD beats both when all
three are set), closing the lattice.

Also document the auto-diskful trigger divergence (deficit-refill +
immediate Primary-InUse promote vs upstream's timed Primary>N-min) and
the unimplemented allow-cleanup gate as accepted delta #57, and the
1.34 stuck-toggle retry/cancel version-context as delta #56.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Andrei Kvapil (kvaps) added a commit that referenced this pull request Jun 4, 2026
The Controller<RG<RD auto-diskful priority chain had no test for the
MIDDLE (RG) layer: the existing RDWins test only covers RD-over-Controller.
Add TestAutoDiskfulPropHierarchyRGWins (RG beats Controller when RD is
unset) and TestAutoDiskfulPropHierarchyRDBeatsRG (RD beats both when all
three are set), closing the lattice.

Also document the auto-diskful trigger divergence (deficit-refill +
immediate Primary-InUse promote vs upstream's timed Primary>N-min) and
the unimplemented allow-cleanup gate as accepted delta #57, and the
1.34 stuck-toggle retry/cancel version-context as delta #56.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Andrei Kvapil (kvaps) added a commit that referenced this pull request Jun 5, 2026
The Controller<RG<RD auto-diskful priority chain had no test for the
MIDDLE (RG) layer: the existing RDWins test only covers RD-over-Controller.
Add TestAutoDiskfulPropHierarchyRGWins (RG beats Controller when RD is
unset) and TestAutoDiskfulPropHierarchyRDBeatsRG (RD beats both when all
three are set), closing the lattice.

Also document the auto-diskful trigger divergence (deficit-refill +
immediate Primary-InUse promote vs upstream's timed Primary>N-min) and
the unimplemented allow-cleanup gate as accepted delta #57, and the
1.34 stuck-toggle retry/cancel version-context as delta #56.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Andrei Kvapil (kvaps) added a commit that referenced this pull request Jun 5, 2026
…orner H) (#103)

* fix(rest): normalize DRBD_DISKLESS to DISKLESS on resource create (H3)

The modern `linstor r c <node> <rd> --drbd-diskless` CLI flag posts the
wire flag DRBD_DISKLESS, while the deprecated `--diskless` alias posts
the canonical DISKLESS (verified via `linstor --curl` against the
upstream 1.33.2 oracle, client 1.27.1). blockstor's diskless-detection
surface keys exclusively on the canonical DISKLESS spelling, so a
replica requested with the recommended --drbd-diskless flag was
mis-classified as diskful: the satellite would carve backing storage
and the quorum/tiebreaker math would miscount it.

Fold DRBD_DISKLESS into DISKLESS once at the create wire boundary so the
rest of the pipeline sees a single spelling. De-duplicates when both
spellings are present.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

* test(auto-diskful): pin RG-scope priority in the property hierarchy (H4)

The Controller<RG<RD auto-diskful priority chain had no test for the
MIDDLE (RG) layer: the existing RDWins test only covers RD-over-Controller.
Add TestAutoDiskfulPropHierarchyRGWins (RG beats Controller when RD is
unset) and TestAutoDiskfulPropHierarchyRDBeatsRG (RD beats both when all
three are set), closing the lattice.

Also document the auto-diskful trigger divergence (deficit-refill +
immediate Primary-InUse promote vs upstream's timed Primary>N-min) and
the unimplemented allow-cleanup gate as accepted delta #57, and the
1.34 stuck-toggle retry/cancel version-context as delta #56.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

* test(toggle-disk): pin --migrate-from sync-then-remove (H2)

Add an L6 cli-matrix cell + L7 replay YAML for the toggle-disk
--migrate-from migration (UG9 §"Migrating a resource to another node").
Both assert the sync-then-remove contract: the destination diskful
replica is added (count 2->3) and reaches UpToDate BEFORE the migrate
source is pruned, so the active diskful count never drops below the
original 2 at any observed point. The landing pad uses --drbd-diskless
to also exercise the H3 DRBD_DISKLESS normalisation on the same flow.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

* test(toggle-disk): use --diskless alias for the H2 migrate landing pad

The H2 sync-then-remove replay/cli-matrix created the diskless landing
pad with the modern `--drbd-diskless` flag, which posts the wire flag
DRBD_DISKLESS. blockstor's diskless-detection surface keys on the
canonical DISKLESS spelling, and the DRBD_DISKLESS->DISKLESS wire
normalisation is the H3 fix in this same branch — NOT yet rolled out to
the dev stand. On the deployed (pre-H3) image the landing pad was
mis-classified as diskful and never reported diskState Diskless, so the
replica_diskless await timed out before the migration step ever ran.

Switch the landing pad to the deprecated `--diskless` alias, which posts
the canonical DISKLESS directly, so the H2 sync-then-remove migration
contract is validated on the currently-deployed stand image. H3's
DRBD_DISKLESS normalisation remains pinned by the L1 unit test
pkg/rest/resource_create_drbd_diskless_test.go, the correct tier for a
wire-boundary flag canonicalisation.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

* test(replay): migrate-from asserts source no longer diskful, not absent

The auto-tiebreaker legitimately re-occupies the vacated node after the
diskful source is pruned (2 diskful = even parity), so resource_absent
can never hold on a 3-worker stand. Assert replica_diskless on the
source plus active_diskful_count=2 instead — the actual sync-then-remove
contract. Live-stand evidence: controller logs 'migration complete: src
pruned, dst UpToDate' while the witness lands back on the source node.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

---------

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant