v0.1.5

@kvaps released this 03 Jun 12:53
· 39 commits to main since this release
db2dafe

Large bug-fix release. Spans the REST API wire-validation surface (Bugs 356–383: input validation, typed FAIL_* envelopes, idempotency), the satellite DRBD / LUKS / metadata paths, and a multi-round operator-lifecycle bug-hunt that closed the four operator-reported DRBD lifecycle bugs the REST sweep missed plus their adjacent classes (Bugs 384–397). Every operator-facing fix lands with an L1/L2 unit/contract test, an L6 cli-matrix cell, and an L7 operator-replay workflow, validated on the live Talos+DRBD stand.

Fixed

  • Late vd c leaves the new volume Inconsistent on every replica (Bug 384, data integrity, #83) — adding a volume to an already-initialized multi-replica resource ran the seed path with isWinner=false unconditionally (first-activation election is gated on !rdInitialized), so no replica seeded the new volume UpToDate and it latched Inconsistent forever. The satellite now re-runs the lowest-node-id winner election per freshly-added volume, so exactly one replica becomes the SyncSource. Class regression of the Bug 79/332 family.
  • node evict demotes a healthy diskful replica to TieBreaker (Bug 385, #83) — ensureTiebreaker counted a witness stranded on a just-EVICTED node as live, so the witness was never relocated and a healthy diskful drifted into the tiebreaker role. Replicas on EVICTED/LOST nodes are now excluded from the witness/quorum decision, and stranded witnesses are reaped.
  • node restore does not recreate the auto-TieBreaker (Bug 386, #83) — the RD reconciler did not watch Node, so clearing the EVICTED flag never re-ran the tiebreaker invariant, leaving two diskful UpToDate with no witness (split-brain risk). Adds a Node watch.
  • r d of a diskful on a 2-diskful + 1-INACTIVE resource grows a useless TieBreaker (Bug 387, #83) — an INACTIVE (drbdadm down) replica is not a voting peer but was counted as a diskful, so the delete spuriously converted to a witness. INACTIVE replicas are excluded from the voting set.
  • node evacuate never prunes the source replica (Bug 389, #81) — evacuate gap-filled a replacement but never deleted the source on the drained node, leaving the resource permanently at place_count+1. Now does strict add-before-drop (prune only after the replacement reaches UpToDate) and derives the redundancy target from the current diskful count, so it works on RDs that inherit place_count=0 from DfltRscGrp.
  • auto-diskful ignores EVICTED/LOST nodes and INACTIVE replicas (Bug 390, #82) — the deficit count and promotion-candidate set treated drained-node and deactivated replicas as healthy diskful, masking deficits and promoting onto draining nodes. Both are now filtered.
  • Autoplace under-places when an INACTIVE replica is present (Bug 393, #85) — placer.countDiskfulReplicas counted INACTIVE replicas toward place_count, so a replacement active replica was never placed. INACTIVE is now excluded, mirroring the auto-diskful and tiebreaker invariants.
  • snapshot create fails on any resource with an INACTIVE replica (Bug 394, #86) — snapshot node-selection and the success denominator included the INACTIVE node, whose down DRBD device cannot ack the suspend-io barrier, aborting the whole group. INACTIVE replicas are excluded from snapshot targets.
  • Thick-LVM volume resize silently diverges the replicas (Bug 395, data integrity, #87) — drbdadm resize --assume-clean ran unconditionally; on a thick LVM pool the grown extents hold node-distinct stale content, so replicas disagreed on the grown region with no resync (out-of-sync 0) and a failover changed the bytes an application read. Resize is now provider-aware: zero-on-allocate providers (ZFS, thin, file) keep the --assume-clean fast path; thick LVM omits it so DRBD resyncs the grown region. Cozystack's default (ZFS) was unaffected.
  • Snapshot-restore onto a snapshot-less node (Bug 397, #89) — the explicit --node-name restore path did not constrain targets to the nodes that hold the snapshot (unlike the auto-place path), so a replica could be placed on a node lacking the data. The restore handler now rejects a snapshot-less target with a typed error, and the seed path refuses the skip-init-sync fast path for a blank-fallback replica so it SyncTargets the real copy; a legitimate all-clone restore keeps the fast path.
  • Tiebreaker / toggle-disk / LUKS / metadata satellite fixes — r toggle-disk --diskful no longer leaves a stale TIE_BREAKER flag on the promoted replica (#54); r d --keep-tiebreaker keeps the auto-witness instead of collapsing it (#57); r c retries through the tiebreaker-collapse race instead of failing (Bug 359, #61); a TB-relocate that wedged StandAlone on a both-disks-bitmap state now recovers (#53); r td --diskless closes the LUKS mapper so the backing zvol can be reclaimed (#55); and per-volume drbdadm create-md stops vd c on an existing RD from EBUSY-looping against vol-0's attached minor (Bug 332, #58).
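
Several of the fixes above share one rule: replicas on EVICTED/LOST nodes and INACTIVE replicas do not count as healthy diskful peers. The following is a minimal illustrative sketch of that rule, of a lowest-node-id election, and of a provider-aware resize decision. It is not the project's code: the Replica type, the flag encoding, the provider labels, and the choice to run the election only over healthy diskful replicas are all assumptions made for the example.

```python
from dataclasses import dataclass

# Node flags named in the notes; encoding them as strings is an assumption.
DRAINED_NODE_FLAGS = frozenset({"EVICTED", "LOST"})

# Hypothetical labels for the providers the notes call zero-on-allocate.
ZERO_ON_ALLOCATE = frozenset({"zfs", "lvm-thin", "file"})


@dataclass(frozen=True)
class Replica:
    """Hypothetical, simplified view of one replica of a resource."""
    node: str
    node_id: int
    diskful: bool = True
    inactive: bool = False               # deactivated replica (drbdadm down)
    node_flags: frozenset = frozenset()  # flags of the hosting node


def is_healthy_diskful(r: Replica) -> bool:
    """True only for an active diskful replica on a node that is not drained."""
    return r.diskful and not r.inactive and not (r.node_flags & DRAINED_NODE_FLAGS)


def healthy_diskful(replicas):
    """Replicas that should count toward place_count, quorum, and deficits."""
    return [r for r in replicas if is_healthy_diskful(r)]


def elect_seed_winner(replicas):
    """Pick the lowest node-id among healthy diskful replicas, or None."""
    return min(healthy_diskful(replicas), key=lambda r: r.node_id, default=None)


def resize_command(resource: str, provider: str):
    """Build (not run) a resize command; --assume-clean only when safe."""
    cmd = ["drbdadm", "resize"]
    if provider in ZERO_ON_ALLOCATE:
        cmd.append("--assume-clean")  # grown extents read as zeros everywhere
    cmd.append(resource)
    return cmd
```

Keeping the filter in one predicate means the placer, the tiebreaker check, and the auto-diskful deficit count cannot disagree about which replicas are live, which is the class of drift the bullets above describe.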

Fixed — REST API wire validation & idempotency

Closes Bugs 356–383: the REST surface now validates operator input at the wire boundary (before any partial state lands) and returns upstream-matching FAIL_* ApiCallRc envelopes instead of bare 200s or generic 500s.

  • Name & volume-number validation — RD/RG/Node names are capped at the 48-char k8s-label limit (Bug 360, #59) and invalid names are rejected on s r rst / rg spawn before partial state lands (#56); volume_number is validated in [0, 65535] at create (#60) and on vd d / vd l / vd m (Bug 365, #62); a non-numeric volume-number in the URL path returns an operator-grade envelope (Bug 380, #73).
  • Size, type & placement validation — non-positive volume_sizes in spawn (Bug 381, #74) and non-positive size_kib on a vd PUT regardless of --force (Bug 383, #75) are rejected; select_filter.place_count is validated at RG create + modify (Bug 367 / 361, #64); node Type is validated, defaulting empty to SATELLITE, at POST /v1/nodes (Bug 370, #65); a PUT resource-definition validates its resource_group (Bug 372, #66); net-interface PUT validates address + port at the wire (Bug 371 / 368 / 369, #63); the fresh-create pool resolver walks the RG StoragePoolList (Bug 364, #67).
  • Immutability & idempotency — StorDriver/* mutation is rejected on PUT storage-pools (Bug 373, #68) and storage-pool-definitions (Bug 375, #70); five bare-200 write endpoints now emit an ApiCallRc envelope (Bug 374, #69); drop-property (Bug 378, #71) and net-interface DELETE (Bug 379, #72) are idempotent on a missing parent.
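
The validation rules above can be pictured as a set of checks that all run before anything is persisted. The sketch below is illustrative only: the limits (48 characters, [0, 65535], positive sizes) come from the notes, while the WireValidationError class, the function names, and the placeholder code strings are hypothetical and do not reproduce the real FAIL_* ApiCallRc constants.

```python
import re

MAX_NAME_LEN = 48          # k8s-label cap cited in the notes
MAX_VOLUME_NUMBER = 65535  # upper bound of the valid volume_number range
_INT_RE = re.compile(r"-?[0-9]+")


class WireValidationError(ValueError):
    """Hypothetical stand-in for a typed FAIL_* envelope."""

    def __init__(self, code: str, message: str):
        super().__init__(message)
        self.code = code  # placeholder string, not a real ApiCallRc constant


def validate_name(name: str) -> str:
    """Reject empty names and names longer than the 48-character cap."""
    if not name or len(name) > MAX_NAME_LEN:
        raise WireValidationError("INVALID_NAME", f"name must be 1..{MAX_NAME_LEN} chars")
    return name


def validate_volume_number(n: int) -> int:
    """Reject volume numbers outside [0, 65535]."""
    if not 0 <= n <= MAX_VOLUME_NUMBER:
        raise WireValidationError("INVALID_VOLUME_NUMBER", f"{n} not in [0, {MAX_VOLUME_NUMBER}]")
    return n


def parse_volume_number(raw: str) -> int:
    """Parse a URL path segment; non-numeric input gets a typed error."""
    if not _INT_RE.fullmatch(raw):
        raise WireValidationError("INVALID_VOLUME_NUMBER", f"{raw!r} is not an integer")
    return validate_volume_number(int(raw))


def validate_size_kib(size_kib: int, force: bool = False) -> int:
    """Reject non-positive sizes; force is accepted but deliberately ignored."""
    if size_kib <= 0:
        raise WireValidationError("INVALID_SIZE", "size_kib must be positive")
    return size_kib


def validate_vd_request(name: str, raw_volume_number: str, size_kib: int, force: bool = False):
    """Run every check up front so a failure leaves no partial state behind."""
    return {
        "name": validate_name(name),
        "volume_number": parse_volume_number(raw_volume_number),
        "size_kib": validate_size_kib(size_kib, force),
    }
```

A caller would run validate_vd_request first and write state only from its return value, so a rejected request never reaches the store.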

Test infrastructure

  • L7 replay convergence assertions were silent no-ops (Bug 388, #83 / #84) — all_uptodate / wait_settle filtered replicas on spec.resourceName, but the CRD field is spec.resourceDefinitionName, so the most-used "did the cluster converge" check passed vacuously across every replay. The fix corrects the field, tolerates Diskless/TieBreaker rows, and gives no_orphans a settle window. (This immediately caught a real drop-without-add defect in the first Bug-389 fix.)
  • e2e flake hardening (Bugs 392 / 396 / 398, #84 / #88 / #90) — state-standalone-partition and siblings flaked under CI on three substrate-level read/scan races, none of them blockstor data bugs (DRBD partition recovery is forensically correct: the writer stays SyncSource, ZFS checksums clean). The connection-state waits now read kernel ground truth instead of the lagging CRD projection; the marker round-trip distinguishes a real (stable) on-disk corruption from a non-deterministic nested-QEMU read-path glitch; and the stand's Talos config narrows the LVM global_filter so the node-side pvscan no longer races the satellite for DRBD/dm/zvol/loop backing devices (open(/dev/loopN): Device or resource busy).
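
The vacuous pass in Bug 388 follows from a general property: an "all replicas are UpToDate" check over an empty selection is trivially true. The sketch below reproduces that failure mode and one possible guard. The two spec field names are taken from the notes; the object layout, the status.diskState key, the tolerated-state set, and the function names are assumptions for illustration, not the replay harness's actual code.

```python
# Hypothetical set of states accepted as converged, per "tolerate Diskless/TieBreaker rows".
CONVERGED_STATES = frozenset({"UpToDate", "Diskless", "TieBreaker"})


def select_replicas(objs, rd_name, field):
    """Return the objects whose spec[field] equals rd_name."""
    return [o for o in objs if o.get("spec", {}).get(field) == rd_name]


def naive_all_uptodate(objs, rd_name, field):
    """Buggy shape: all() over an empty selection returns True."""
    matched = select_replicas(objs, rd_name, field)
    return all(o["status"]["diskState"] == "UpToDate" for o in matched)


def guarded_all_converged(objs, rd_name, field="resourceDefinitionName"):
    """Fail loudly on an empty selection instead of passing vacuously."""
    matched = select_replicas(objs, rd_name, field)
    if not matched:
        raise AssertionError(f"no replicas matched {field}={rd_name!r}")
    return all(o["status"]["diskState"] in CONVERGED_STATES for o in matched)


replicas = [
    {"spec": {"resourceDefinitionName": "pvc-1"}, "status": {"diskState": "Inconsistent"}},
    {"spec": {"resourceDefinitionName": "pvc-1"}, "status": {"diskState": "UpToDate"}},
]

# Wrong field name: nothing matches, so the naive check reports convergence.
vacuous = naive_all_uptodate(replicas, "pvc-1", "resourceName")
# Correct field name: the Inconsistent replica is seen and the check fails.
real = guarded_all_converged(replicas, "pvc-1")
```

Requiring at least one match turns a misspelled selector into an immediate test failure rather than a check that can never fail.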