Skip to content

fix(rest): normalize DRBD_DISKLESS + pin toggle-disk state machine (corner H)#103

Merged
Andrei Kvapil (kvaps) merged 5 commits into
mainfrom
corner/h-toggle-disk-statemachine
Jun 5, 2026
Merged

fix(rest): normalize DRBD_DISKLESS + pin toggle-disk state machine (corner H)#103
Andrei Kvapil (kvaps) merged 5 commits into
mainfrom
corner/h-toggle-disk-statemachine

Conversation

@kvaps
Copy link
Copy Markdown
Member

Summary

Closes the group-H corner cases of the LINSTOR parity campaign (toggle-disk / migration state machine). One real wire-flag bug fixed (H3), two state-machine contracts pinned (H2/H4), one version-context divergence documented (H1). Diskless-detection and the --migrate-from sync-then-remove ordering are the load-bearing surfaces.

Verdict table

Item What Verdict Evidence
H1 Stuck-toggle state machine (r td retry / cancel) DELIBERATE DELTA (#67) Upstream ≥1.34.0 recovers a stuck toggle by re-issuing the SAME command (retry) or the OPPOSITE one (cancel). The dev-stand oracle is 1.33.2 (pre-1.34; rejects with a conflict). blockstor already satisfies the 1.34 contract — a repeated toggle is idempotent, the opposite-direction toggle drives the satellite demote path (detach + reclaimVolumesForDiskless), and it additionally accepts an explicit ?cancel=true (Bug 40) the upstream client never sends. Pinned by existing TestToggleDiskCancelStuckAddDiskScenario4W24 + TestBug267ToggleToDisklessReclaimsBackingVolume.
H2 toggle-disk --migrate-from is sync-then-remove MATCHES (pinned) The migrate SOURCE is removed only AFTER the destination reaches UpToDate; redundancy never dips below the original diskful count. Pinned at L6 (tests/e2e/cli-matrix/toggle-disk-migrate-from.sh) + L7 (replay/toggle-disk-migrate-from.yaml), validated on the stand.
H3 Deprecated --diskless alias vs modern --drbd-diskless DIVERGED (fixed) The modern r c --drbd-diskless posts wire flag DRBD_DISKLESS; the deprecated --diskless posts the canonical DISKLESS (verified via linstor --curl against the 1.33.2 oracle, client 1.27.1). blockstor's diskless-detection surface keys exclusively on DISKLESS, so a --drbd-diskless replica was mis-classified as diskful — the satellite would carve backing storage and the quorum/tiebreaker math would miscount it. Folded DRBD_DISKLESSDISKLESS at the create wire boundary. Pinned by pkg/rest/resource_create_drbd_diskless_test.go.
H4 DrbdOptions/auto-diskful trigger + priority hierarchy DELIBERATE DELTA (#68), hierarchy pinned blockstor honours the property and the RD>RG>Controller priority chain; the MIDDLE (RG) layer had no test. Added TestAutoDiskfulPropHierarchyRGWins + TestAutoDiskfulPropHierarchyRDBeatsRG, closing the lattice. The firing CONDITION (deficit-refill + immediate Primary-InUse promote) and the unimplemented auto-diskful-allow-cleanup trim are documented as accepted delta #68.

The fix (H3)

Diskless-detection across blockstor (the placer's splitByDiskless, the satellite's applyStorageIfDiskful, the quorum/tiebreaker arithmetic, the store's flag projection) keys exclusively on the canonical DISKLESS spelling. A resource created with the RECOMMENDED --drbd-diskless flag posts only DRBD_DISKLESS and was therefore treated as a diskful create. normalizeDisklessFlag folds DRBD_DISKLESS into DISKLESS once at the resource-create wire boundary (de-duplicating when both spellings are present), so the rest of the pipeline sees a single spelling.

Tests

  • L1: pkg/rest/resource_create_drbd_diskless_test.go (H3 wire-boundary normalisation), internal/controller/auto_diskful_timer_test.go (H4 RD>RG>Controller hierarchy).
  • L6: tests/e2e/cli-matrix/toggle-disk-migrate-from.sh (H2 sync-then-remove; redundancy floor asserted on every poll).
  • L7: tests/operator-harness/replay/toggle-disk-migrate-from.yaml (H2 convergence assertion).

Known deltas

Rows #67 (H1 stuck-toggle version-context) and #68 (H4 auto-diskful trigger condition) appended to docs/cli-parity-known-deltas.md at max+1 over main's numbering.

Stand validation

The H2 L6/L7 artifacts create the diskless landing pad with the deprecated --diskless alias (canonical DISKLESS) so they validate the sync-then-remove contract against the currently-deployed stand image, which predates the H3 fix. H3's DRBD_DISKLESS normalisation is pinned at the L1 unit tier (the correct level for a wire-boundary canonicalisation) and takes effect once the controller change rolls out.

Root-cause of the original replay failure (now fixed): the first stand run timed out at the landing-pad step —

  -> step: r-c-dst-drbd-diskless :: linstor resource create dev-worker-3 <rd> --drbd-diskless
    ASSERTION TIMEOUT: kind=replica_diskless ... node=dev-worker-3
FAIL: toggle-disk-migrate-from

— because the deployed (pre-H3) image keys diskless-detection on the canonical DISKLESS spelling (pkg/satellite/reconciler.go isDiskless, pkg/store/k8s/resources.go), so a --drbd-diskless replica (DRBD_DISKLESS) was mis-classified as diskful and diskState never reported Diskless. Switching the landing pad to the --diskless alias removes that dependency on an unrolled controller change; the migration step itself was unit-validated.

Stand re-validation status: BLOCKED-ON-INFRA at submit time. The dev stand entered an INFRA-owned reset during re-validation — a worker (dev-worker-1) failed to return Ready and .work/dev/kubeconfig was wiped mid-run (the runner reached the workflow but reported SKIP: workflow needs 3 workers once discovery lost the cluster). The corrected replay should be re-run on the stand once it is healthy; the fix is conclusive at the code + L1 + L6/L7-shape level.

Notes

No shared-harness changes (lib.sh / replay-runner.sh untouched). go build ./... + go test ./... green; golangci-lint run and shellcheck -S error clean for the diff.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 4, 2026

Warning

Review limit reached

@kvaps, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 2 minutes and 27 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 70a629b0-0267-4d61-99bc-6ad0f94de73a

📥 Commits

Reviewing files that changed from the base of the PR and between cfd5ce9 and a885cd9.

📒 Files selected for processing (6)
  • docs/cli-parity-known-deltas.md
  • internal/controller/auto_diskful_timer_test.go
  • pkg/rest/autoplace.go
  • pkg/rest/resource_create_drbd_diskless_test.go
  • tests/e2e/cli-matrix/toggle-disk-migrate-from.sh
  • tests/operator-harness/replay/toggle-disk-migrate-from.yaml
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch corner/h-toggle-disk-statemachine

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces changes to handle the modern --drbd-diskless flag by normalizing the DRBD_DISKLESS wire flag to the canonical DISKLESS flag at the API boundary, accompanied by comprehensive unit and E2E tests for the sync-then-remove migration behavior. It also adds tests to verify the priority hierarchy of the auto-diskful property (Controller < RG < RD). The review feedback suggests adding a defensive nil check for the resource pointer in the normalization function and capturing command output in a variable within the E2E bash script to prevent exit status masking.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread pkg/rest/autoplace.go
Comment on lines +2429 to +2430
func normalizeDisklessFlag(res *apiv1.Resource) {
if !slices.Contains(res.Flags, rscFlagDrbdDiskless) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To adhere to defensive programming practices, it is recommended to add a nil check for the res pointer before accessing its fields. This prevents potential runtime panics if the function is called with a nil argument in the future.

Suggested change
func normalizeDisklessFlag(res *apiv1.Resource) {
if !slices.Contains(res.Flags, rscFlagDrbdDiskless) {
func normalizeDisklessFlag(res *apiv1.Resource) {
if res == nil || !slices.Contains(res.Flags, rscFlagDrbdDiskless) {
return
}

exit 1
fi

if linstor_diskful_nodes "$RD" | grep -qx "$SRC"; then
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Running linstor_diskful_nodes directly inside the if condition pipeline masks any command failures (even with pipefail active, the if statement suppresses set -e exits). If linstor_diskful_nodes fails, the test could silently pass. Consider capturing the output in a variable first to ensure failures are caught by set -e.

Suggested change
if linstor_diskful_nodes "$RD" | grep -qx "$SRC"; then
nodes=$(linstor_diskful_nodes "$RD")
if echo "$nodes" | grep -qx "$SRC"; then

@kvaps
Copy link
Copy Markdown
Member Author

Stand validation root-caused: the earlier resource_absent failure was a wrong assertion, not a product gap — after the diskful source is pruned the auto-tiebreaker legitimately re-occupies the vacated node (2 diskful = even parity), so a Resource CRD for node1 exists again as a diskless TIE_BREAKER. The replay now asserts replica_diskless on the source + active_diskful_count=2 (the real sync-then-remove contract). Live stand on current main: controller logs migration complete: src pruned, dst UpToDate; the launcher will re-run the updated replay before merge.

@kvaps
Copy link
Copy Markdown
Member Author

Stand validation complete: the corrected replay (replica_diskless + active_diskful_count) PASSES on the live dev stand against current main — sync-then-remove contract confirmed at operator-CLI level (source's diskful replica pruned only after dst UpToDate; auto-tiebreaker legitimately re-occupies the vacated node).

@kvaps Andrei Kvapil (kvaps) marked this pull request as ready for review June 4, 2026 13:20
@kvaps Andrei Kvapil (kvaps) force-pushed the corner/h-toggle-disk-statemachine branch from 31a3c53 to 02049ba Compare June 4, 2026 13:22
Andrei Kvapil (kvaps) and others added 5 commits June 5, 2026 18:47
The modern `linstor r c <node> <rd> --drbd-diskless` CLI flag posts the
wire flag DRBD_DISKLESS, while the deprecated `--diskless` alias posts
the canonical DISKLESS (verified via `linstor --curl` against the
upstream 1.33.2 oracle, client 1.27.1). blockstor's diskless-detection
surface keys exclusively on the canonical DISKLESS spelling, so a
replica requested with the recommended --drbd-diskless flag was
mis-classified as diskful: the satellite would carve backing storage
and the quorum/tiebreaker math would miscount it.

Fold DRBD_DISKLESS into DISKLESS once at the create wire boundary so the
rest of the pipeline sees a single spelling. De-duplicates when both
spellings are present.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
The Controller<RG<RD auto-diskful priority chain had no test for the
MIDDLE (RG) layer: the existing RDWins test only covers RD-over-Controller.
Add TestAutoDiskfulPropHierarchyRGWins (RG beats Controller when RD is
unset) and TestAutoDiskfulPropHierarchyRDBeatsRG (RD beats both when all
three are set), closing the lattice.

Also document the auto-diskful trigger divergence (deficit-refill +
immediate Primary-InUse promote vs upstream's timed Primary>N-min) and
the unimplemented allow-cleanup gate as accepted delta #57, and the
1.34 stuck-toggle retry/cancel version-context as delta #56.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Add an L6 cli-matrix cell + L7 replay YAML for the toggle-disk
--migrate-from migration (UG9 §"Migrating a resource to another node").
Both assert the sync-then-remove contract: the destination diskful
replica is added (count 2->3) and reaches UpToDate BEFORE the migrate
source is pruned, so the active diskful count never drops below the
original 2 at any observed point. The landing pad uses --drbd-diskless
to also exercise the H3 DRBD_DISKLESS normalisation on the same flow.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
The H2 sync-then-remove replay/cli-matrix created the diskless landing
pad with the modern `--drbd-diskless` flag, which posts the wire flag
DRBD_DISKLESS. blockstor's diskless-detection surface keys on the
canonical DISKLESS spelling, and the DRBD_DISKLESS->DISKLESS wire
normalisation is the H3 fix in this same branch — NOT yet rolled out to
the dev stand. On the deployed (pre-H3) image the landing pad was
mis-classified as diskful and never reported diskState Diskless, so the
replica_diskless await timed out before the migration step ever ran.

Switch the landing pad to the deprecated `--diskless` alias, which posts
the canonical DISKLESS directly, so the H2 sync-then-remove migration
contract is validated on the currently-deployed stand image. H3's
DRBD_DISKLESS normalisation remains pinned by the L1 unit test
pkg/rest/resource_create_drbd_diskless_test.go, the correct tier for a
wire-boundary flag canonicalisation.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
The auto-tiebreaker legitimately re-occupies the vacated node after the
diskful source is pruned (2 diskful = even parity), so resource_absent
can never hold on a 3-worker stand. Assert replica_diskless on the
source plus active_diskful_count=2 instead — the actual sync-then-remove
contract. Live-stand evidence: controller logs 'migration complete: src
pruned, dst UpToDate' while the witness lands back on the source node.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
@kvaps Andrei Kvapil (kvaps) force-pushed the corner/h-toggle-disk-statemachine branch from 02049ba to a885cd9 Compare June 5, 2026 16:47
@kvaps Andrei Kvapil (kvaps) merged commit e26ee80 into main Jun 5, 2026
15 checks passed
Andrei Kvapil (kvaps) added a commit that referenced this pull request Jun 5, 2026
…nt (auto-TB parity)

After r d leaves a 2-diskful RD, auto-add-quorum-tiebreaker (default-on,
matching upstream LINSTOR) re-occupies the vacated node with a DISKLESS
TieBreaker witness for quorum. The resource_absent asserts on the vacated
node in remove-replica and r-c-with-tiebreaker-peer were therefore false
FAILs of correct, parity-matching behaviour. Flip them to tiebreaker_present
(same lesson as toggle-disk-migrate-from #103, n-autoplace-target-excludes
#109). The load-bearing assertions (re-spawn DISKFUL, no permanent
Connecting) are unchanged.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Andrei Kvapil (kvaps) added a commit that referenced this pull request Jun 5, 2026
…to-TB-aware assertions, slow-stand timeouts (#110)

* fix(operator-harness): vd_size_kib VOL env across pipe + {{rg}} substitution + expect_exit list

vd_size_kib passed the target volume number as a `VOL=... linstor_cli |
python3` env prefix. In a pipeline the prefix binds to the LEFT command,
not python3 on the right, so os.environ['VOL'] raised KeyError, the
parser fell through to print(0), and the assertion could never match —
the P0 vd-resize lifecycle (and vd-resize-thick) were permanently red.
Pass the volume number as argv to the parser instead.

Add {{rg}} substitution (resolved from vars.rg, default rg-<name>-<rand>)
so resource-group replays no longer create a group literally named
'{{rg}}'. Add list-form expect_exit so idempotent steps whose exit code
depends on shared-stand state (e.g. encryption create-passphrase: 0 on a
fresh controller, 10 when a passphrase already exists) can accept either.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

* test(operator-harness): correct CLI shapes in evacuate/luks/evict replays

- auto-diskful-evicted-node teardown used `node evacuate --restore`,
  which client 1.27.1 does not accept; the inverse of `node evacuate`
  is `node restore`.
- luks-encrypted-rd passed the passphrase as a positional to
  `encryption create-passphrase`; the CLI takes it via the -p flag.
  The literal value is inlined ({{passphrase}} was never a substituted
  var) and the step accepts exit 0 or 10 (passphrase may already exist
  on the shared stand).
- n-evict-tiebreaker-no-shuffle: document that `node evict` is driven
  through the harness linstor_cli shim (PUT /v1/nodes/<n>/evict), not a
  native client verb.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

* test(operator-harness): auto-diskful-evicted-node needs 4 nodes

The workflow places a diskful on node1/node2/node3, then evacuates
node3 and asserts the auto-diskful controller refills back to 3
diskful. On a 3-node stand the displaced replica has no healthy node
to land on (every peer already hosts one), so replica_count can never
return to 3 — the assertion times out as a false FAIL of correct
controller behaviour. Require min_nodes: 4 so the refill targets the
spare node4; on a 3-node stand the runner now SKIPs cleanly.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

* test(operator-harness): flip vacated-node asserts to tiebreaker_present (auto-TB parity)

After r d leaves a 2-diskful RD, auto-add-quorum-tiebreaker (default-on,
matching upstream LINSTOR) re-occupies the vacated node with a DISKLESS
TieBreaker witness for quorum. The resource_absent asserts on the vacated
node in remove-replica and r-c-with-tiebreaker-peer were therefore false
FAILs of correct, parity-matching behaviour. Flip them to tiebreaker_present
(same lesson as toggle-disk-migrate-from #103, n-autoplace-target-excludes
#109). The load-bearing assertions (re-spawn DISKFUL, no permanent
Connecting) are unchanged.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

* test(operator-harness): luks replay sets DrbdOptions/EncryptPassphrase before rd create

A LUKS-layered RD requires the controller property DrbdOptions/EncryptPassphrase
in addition to the cluster passphrase; without it the controller correctly
rejects rd-create -l drbd,luks,storage with exit 10 ("LUKS layer requires
DrbdOptions/EncryptPassphrase to be set first") -- an intentional guard, not
a product gap. Add the set-property step before rd-create, mirroring the green
L6 cell luks-rd-create-encrypted.sh. Idempotent on the shared stand.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

* test(operator-harness): raise slow-stand resync timeouts to 600s

The phase-3 relocate disk_state wait in r-full-lifecycle and the large-volume
all_uptodate waits in vd-resize-full-lifecycle provably time out only on the
stand's ~4 MB/s DRBD resync, not on a real product fault. Bump those ceilings
to 600s with a comment that this is slow-stand headroom (CI stands are faster;
the timeout is a ceiling, not a target). Assertion strictness is unchanged.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

* test(operator-harness): {{device}} substitution + fixture-presence SKIP gates

ps-cdp-zfs passed the literal {{device}} placeholder to the device-pool
create because substitute() never resolved it (same class as the {{rg}}
pass-through). Add {{device}} resolution from vars.device.

Add two opt-in prerequisite SKIP gates so a workflow that needs a stand
fixture the current stand lacks SKIPs cleanly (exit 0) instead of FAILing on
a missing fixture -- a missing fixture is "not exercisable here", not a
product bug:
  - prerequisites.storage_pool_min_nodes {name,min}: SKIP unless a LINSTOR
    pool is on >= min nodes (wired into vd-resize-thick, needs lvm-thick x2).
  - prerequisites.device_on_any_node <path>: SKIP unless the block device
    exists on a worker satellite (wired into ps-cdp-zfs, needs /dev/loop9).

Both gates are additive; workflows without the key are unaffected.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

* test(operator-harness): accept exit 10 for vd shrink rejection

The shrink-rejected step asserted exit 1, but the python-linstor client
surfaces an API-level rejection (shrink past DRBD metadata position) as
exit 10. Both 1 and 10 mean "rejected"; the load-bearing contract (size
unchanged at 4G) is pinned by the follow-up size-unchanged-after-reject
assertion. Validated on stand: shrink returns exit 10.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

* test(operator-harness): pin r-full-lifecycle phase-1 placement (deterministic relocate target)

Phase 1 used --auto-place=2, which picks the 2 diskful nodes
non-deterministically; when a diskful landed on node3 the phase-3
`r c node3` relocate hit "resource already exists" (exit 10) on the stand.
Pin the diskful pair to node1+node2 so auto-add-quorum-tiebreaker
deterministically lands the witness on node3, making node3 a valid relocate
target -- the same topology the L6 ground-truth cell r-full-lifecycle.sh
discovers dynamically. The pure autoplace-node-selection path stays covered
by autoplace-3r.yaml and the L6 cell. Also raise the phase-1 512M-resync
all_uptodate ceiling to 600s for the slow stand.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

---------

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant