Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions docs/cli-parity-known-deltas.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ Row IDs match the command-catalogue indexes used by `cli-parity-refresh.sh` (see
| 70 | `rd lp <rd>` (inherited parent-scope props shown inline) | WIRE_SHAPE | permanent | Corner-case I2. Upstream LINSTOR treats inheritance as a *reference, not a copy*: a property set on a resource-group / controller affects existing children, but the child's `list-properties` does NOT show the inherited key inline (UG9 ~4587: "As the QoS properties are inherited and not copied, you will not see the property listed in any child objects"). BS deliberately INLINES inherited Controller/RG-scope keys into the RD's bare `props` map (stampRDEffectiveProps) so `linstor rd lp` renders them — the python CLI's `rd lp` table reads `props`, and surfacing the effective value there is more operator-friendly than an empty table. The inheritance-is-a-reference half MATCHES (props resolve at read time via effectiveprops.Resolve, so a post-spawn RG change is reflected); only the inline-visibility in `rd lp` diverges, by design. Sensitive controller-scope keys are still redacted (Bug 115). Pinned by `TestCornerI2_InheritanceIsReferenceAndShownInline`. |
| 71 | `controller TcpPortAutoRange` / DRBD port + minor base | BEHAVIOR | permanent | Corner-case K3. Upstream LINSTOR's `TcpPortAutoRange` defaults to 7000-7999 and DRBD minors to 1000+ (UG9 ~2225-2231). blockstor deliberately allocates DRBD TCP ports from 20000-20999 and minors from 20000-28999 (`drbd.DefaultPortMin/Max`, `dispatcher.derivePort/deriveMinor`) so a blockstor cluster can run on the same nodes as a live upstream LINSTOR without colliding on the shared kernel `/dev/drbd<N>` minor / TCP-port namespace. The override knob is LINSTOR-compatible (`linstor n set-property <node> DrbdOptions/TcpPortRange "<min>-<max>"` via `ParseRange`); only the DEFAULT base differs. Reuse-after-`rd d` MATCHES upstream: a deleted resource's port/minor re-enters the cluster taken-set and the deterministic lowest-free allocator hands it back. Pinned by `pkg/drbd/portpool_test.go::TestK3_{DefaultPortRangeIsBlockstorWindow,PortReusedAfterDelete}`. |
| 74 | `r td <node> <rd> --cancel` (stuck-toggle recovery) | BEHAVIOR | permanent | Corner-case H1. VERSION_CONTEXT: upstream LINSTOR ≥1.34.0 recovers a stuck toggle by re-issuing the SAME command (retry) or the OPPOSITE command (cancel). The dev-stand oracle is 1.33.2 (pre-1.34; rejects with a conflict). blockstor satisfies the 1.34 contract: a repeated toggle is idempotent (retry), and the opposite-direction toggle drives the satellite demote path (`applyStorageIfDiskful` → detach + `reclaimVolumesForDiskless`), unwinding the half-carved storage (cancel). blockstor ALSO accepts an explicit `?cancel=true` query (Bug 40) the upstream client never sends — a superset, not a divergence. Pinned by `pkg/rest/resource_toggle_disk_test.go::TestToggleDiskCancelStuckAddDiskScenario4W24` + `pkg/satellite/reconciler_toggle_diskless_test.go::TestBug267ToggleToDisklessReclaimsBackingVolume`. |
| 76 | FILE_THIN `.res` disk block — `rs-discard-granularity` omitted (loop-safety), `block-size` omitted, scope resource vs volume | WIRE_SHAPE | permanent | Corner-case Q3. For a FILE_THIN volume BS renders only `discard-zeroes-if-aligned no;` and DELIBERATELY OMITS `rs-discard-granularity`, even though the loop backing reports a non-zero DISC-GRAN and upstream 1.33.2 emits `rs-discard-granularity 4096;`. Reason: rendering the granularity on a loop-backed FILE_THIN volume regresses blockstor's day0 GI-skip on the mkfs/force-primary path — the elected winner's `mkfs` full-device discard, with rs-discard-granularity active, dirties the bitmap relative to the day0-seeded peers and forces a FULL initial SyncTarget that wedges (proven: instant convergence on LVM_THIN with the same disk block, full-resync wedge on FILE_THIN; the e2e respawn-standalone-wedge regression). The thin-aware-resync win (written-bytes-only resync) is retained on the discard-zero-safe BLOCK-device pools (LVM_THIN / ZFS / ZFS_THIN), where the option is both safe and effective. Additional accepted render-shape divergences: (a) upstream additionally emits `block-size 512;` (backing logical-block-size hint, unrelated to discard/resync); BS omits it — DRBD defaults to the device block size. (b) Upstream renders the disk block at VOLUME scope; BS at RESOURCE scope — functionally identical for single-volume resources. Pinned by `pkg/satellite/discardgran_test.go::TestAutoDiskOptions_FileThin4KLoop_NoGranularity` + `TestBuildResFile_FileThinDay0SkipRender` + `TestBuildResFile_LvmThinDay0SkipRender` (L1), `tests/e2e/cli-matrix/file-thin-rs-discard-granularity.sh` (L6), `tests/operator-harness/replay/file-thin-rs-discard-granularity.yaml` (L7). |
| 75 | `DrbdOptions/auto-diskful` trigger condition | BEHAVIOR | 2026-12-31 | Corner-case H4. PARTIAL: blockstor honours the property and the RD>RG>Controller priority hierarchy (`internal/controller/auto_diskful_controller.go`), but the firing CONDITION diverges from upstream's documented "Diskless node held DRBD Primary > N minutes". blockstor arms the N-minute timer on a diskful-replica DEFICIT (count < place_count) and additionally promotes a Primary-InUse diskless replica immediately (`ResourceReconciler.maybeAutoDiskful`, no timer, property-independent). The `DrbdOptions/auto-diskful-allow-cleanup` excess-secondary trim is unimplemented (reserved key only). Property plumbing + hierarchy pinned by `internal/controller/auto_diskful_timer_test.go`. Full timed-Primary state machine + cleanup gate tracked as follow-up. |

## Open (block merge until addressed)
Expand Down
183 changes: 122 additions & 61 deletions pkg/satellite/discardgran.go
Original file line number Diff line number Diff line change
Expand Up @@ -52,8 +52,9 @@ const (
diskOptRsDiscardGranularity = "rs-discard-granularity"
diskOptDiscardZeroesAligned = "discard-zeroes-if-aligned"

// drbdYes is DRBD's affirmative value for a boolean disk option.
// drbdYes / drbdNo are DRBD's boolean disk-option values.
drbdYes = "yes"
drbdNo = "no"
)

// DRBD's accepted bounds for rs-discard-granularity. Mirrors upstream
Expand All @@ -72,17 +73,23 @@ const (
// resource-scope `disk { }` block (which DRBD applies to every
// volume).
//
// It probes each local diskful volume's backing device and is
// deliberately CONSERVATIVE for the (today hypothetical) multi-volume
// case: the options are emitted only when EVERY local diskful volume's
// provider is discard-zero-safe; if any volume is thick/file/unknown
// the whole block is omitted (safe full resync for all). The
// rs-discard-granularity, when present, is the SMALLEST across the
// volumes so the single resource-scope value stays a valid multiple of
// every device's discard granularity — never producing an UNDER-aligned
// UNMAP. A volume whose device can't be probed simply contributes no
// granularity (discard-zeroes-if-aligned still applies; DRBD falls back
// to a full copy where it can't discard).
// It probes each local diskful volume's backing device. Each volume
// contributes a discard-zeroes-if-aligned flag (yes/no per provider
// kind) and, for the discard-zero-safe BLOCK-device provider set
// (LVM_THIN / ZFS / ZFS_THIN) whose device reports a non-zero discard
// granularity, an rs-discard-granularity value. Loop-backed FILE_THIN is
// excluded from the granularity (see autoDiskOptions for the loop+mkfs
// wedge rationale).
//
// The resource-scope collapse is CONSERVATIVE for the (today
// hypothetical) multi-volume case (mergeResourceDiscardOptions):
// discard-zeroes-if-aligned is "yes" only when EVERY volume is
// discard-zero-safe (one "no" pins the resource to "no"); the
// rs-discard-granularity is the SMALLEST across the volumes so the
// single resource-scope value stays a valid multiple of every device's
// discard granularity — never an UNDER-aligned UNMAP. A volume whose
// device can't be probed simply contributes no granularity (DRBD falls
// back to a full copy where it can't discard).
//
// A diskless local replica (no diskful volumes) returns nil — there is
// no backing device to discard against.
Expand Down Expand Up @@ -114,11 +121,6 @@ func (r *Reconciler) autoDiskOptionsForResource(
device := devices[vol.GetVolumeNumber()]

volOpts := autoDiskOptions(ctx, exec, provider.Kind(), device)
if len(volOpts) == 0 {
// This volume is not discard-zero-safe — omit the whole
// block so no volume gets an unsafe option.
return nil
}

mergeResourceDiscardOptions(out, volOpts)

Expand All @@ -133,21 +135,34 @@ func (r *Reconciler) autoDiskOptionsForResource(
}

// mergeResourceDiscardOptions folds one volume's auto disk options into
// the resource-scope accumulator. discard-zeroes-if-aligned is a flat
// flag (all discard-zero-safe volumes set "yes").
// rs-discard-granularity collapses to the SMALLEST seen value so the
// single resource-scope number is a valid multiple of every volume's
// backing-device granularity.
// the resource-scope accumulator (a single `disk { }` block DRBD applies
// to every volume of the resource):
//
// - discard-zeroes-if-aligned collapses CONSERVATIVELY to "yes" only
// when EVERY volume is discard-zero-safe; a single "no" volume pins
// the whole resource to "no". Otherwise a thick/file volume sharing
// the resource could be told an aligned discard yields zero when it
// does not — a data-safety violation. (Today resources are
// single-volume, but the merge must stay safe for the multi-volume
// future.)
// - rs-discard-granularity collapses to the SMALLEST seen value so the
// single resource-scope number stays a valid multiple of every
// volume's backing-device granularity (never an UNDER-aligned UNMAP).
func mergeResourceDiscardOptions(acc, volOpts map[string]string) {
for key, val := range volOpts {
if key != diskOptRsDiscardGranularity {
acc[key] = val

continue
}

prev, have := acc[key]
if !have || smallerNumeric(val, prev) {
switch key {
case diskOptRsDiscardGranularity:
prev, have := acc[key]
if !have || smallerNumeric(val, prev) {
acc[key] = val
}
case diskOptDiscardZeroesAligned:
// "no" wins: once any volume is unsafe the resource flag
// must stay "no". Only set "yes" if not yet pinned to "no".
if acc[key] != drbdNo {
acc[key] = val
}
default:
acc[key] = val
}
}
Expand All @@ -172,54 +187,100 @@ func smallerNumeric(left, right string) bool {
// (sans the `DrbdOptions/Disk/` prefix) ready to merge into the
// rendered `disk { }` block.
//
// Both options are gated on the SAME discard-zero-safe provider set
// (LVM_THIN / ZFS / ZFS_THIN). discard-zeroes-if-aligned mirrors
// upstream's CtrlRscCrtApiHelper switch; rs-discard-granularity is
// additionally gated on that set rather than on the device's reported
// DISC-GRAN alone — see autoDiskOptions's inline rationale: a loop-
// backed FILE_THIN device reports DISC-GRAN=4096 but rendering
// rs-discard-granularity there REGRESSES fresh-create convergence on
// the mkfs/force-primary path (the wedge that broke the e2e
// respawn-standalone-wedge scenario). Keeping the option on real
// block-device thin/ZFS pools retains the genuine thin-aware-resync win
// (partially-written volume resyncs ~only the written bytes) where it is
// both safe and effective.
//
// Data-safety contract — the optimisation may ONLY skip provably-zero
// ranges (thin discard), NEVER any written data. The gates here are
// the same ones upstream LINSTOR uses:
// ranges (thin discard), NEVER any written data:
//
// - discard-zeroes-if-aligned is enabled ONLY for provider kinds
// whose backing device deterministically reads/discards as zero in
// an ALIGNED region: LVM_THIN, ZFS, ZFS_THIN. NOT for FILE_THIN
// (loop-backed sparse files do not reliably honour aligned discard
// → zero), NOT for thick LVM / plain FILE / DISKLESS. This matches
// upstream's CtrlRscCrtApiHelper provider switch exactly. Note we
// deliberately do NOT reuse IsThinOrZFS here: that helper includes
// FILE_THIN (correct for the seed-GI skip-sync gate) but FILE_THIN
// must NOT get discard-zeroes-if-aligned.
// - discard-zeroes-if-aligned is `yes` ONLY for provider kinds whose
// backing device deterministically reads/discards as zero in an
// ALIGNED region: LVM_THIN, ZFS, ZFS_THIN. For every other kind
// (FILE_THIN loop-backed sparse file, thick LVM, plain FILE,
// unknown) it is `no` — matching upstream's CtrlRscCrtApiHelper
// switch, which renders an explicit `no`. Note we deliberately do
// NOT reuse IsThinOrZFS here: that helper includes FILE_THIN
// (correct for the seed-GI skip-sync gate) but FILE_THIN must NOT
// get discard-zeroes-if-aligned=yes.
//
// - rs-discard-granularity is set ONLY when the backing block device
// actually reports a non-zero discard granularity (lsblk DISC-GRAN
// > 0). A device that reports 0 does not support discard — setting
// a non-zero granularity there would make drbdsetup reject the
// option (or, worse, attempt unsupported UNMAPs). When the
// granularity can't be determined or is 0, we OMIT the key
// entirely and DRBD falls back to a full byte-copy resync — always
// safe.
// - rs-discard-granularity is set ONLY for the discard-zero-safe
// provider set AND when the backing block device reports a non-zero
// discard granularity (lsblk DISC-GRAN > 0). A device reporting 0
// does not support discard — emitting a non-zero granularity there
// would make drbdsetup reject the option, so we OMIT the key and
// DRBD falls back to a full byte-copy resync. FILE_THIN is excluded
// even though its loop backing reports a non-zero DISC-GRAN (see the
// inline mkfs/loop wedge rationale below).
//
// When in doubt the map omits a key (full transfer), never the
// reverse. An empty/nil return means "render no auto disk options".
// When in doubt the map omits the granularity key (full transfer),
// never the reverse. The discard-zeroes flag is ALWAYS present (yes or
// no), mirroring upstream which renders it explicitly. An empty return
// would mean "render no disk block"; this function returns at least the
// discard-zeroes flag for any diskful volume, so the block renders
// whenever a backing device is present.
func autoDiskOptions(
ctx context.Context,
exec storage.Exec,
providerKind string,
devicePath string,
) map[string]string {
if !discardZeroesIfAligned(providerKind) {
// Thick LVM / plain FILE / FILE_THIN / DISKLESS / unknown:
// the backing device cannot be trusted to discard-to-zero on
// an aligned range. Emit nothing — DRBD keeps its safe
// full-copy resync.
return nil
zeroesVal := drbdNo
if discardZeroesIfAligned(providerKind) {
zeroesVal = drbdYes
}

out := map[string]string{
diskOptDiscardZeroesAligned: drbdYes,
diskOptDiscardZeroesAligned: zeroesVal,
}

// rs-discard-granularity is emitted ONLY for a discard-zero-safe
// provider kind — i.e. a real block device whose aligned discard
// deterministically reads back zero (LVM_THIN / ZFS / ZFS_THIN).
//
// WHY this is gated on the SAME provider set as discard-zeroes (and
// NOT, as an earlier revision did, on the device's reported
// DISC-GRAN alone): a loop-backed FILE_THIN volume reports a non-zero
// lsblk DISC-GRAN (4096 on the stand) yet rendering
// rs-discard-granularity into its `disk { }` block REGRESSES the
// fresh-create convergence. When the elected day0 winner force-
// primaries to run `mkfs` (the FileSystem/Type path), mkfs issues a
// full-device discard; on the loop backing — which has a documented
// DRBD kernel write-path interaction (see tests/e2e/lib.sh) — that
// discard storm, with rs-discard-granularity active, dirties the
// bitmap relative to the day0-seeded peers and forces a FULL initial
// SyncTarget that then wedges (PausedSyncT dependency between the two
// targets). Proven on the dev stand + CI: an identical 3-replica
// create with FileSystem/Type=ext4 CONVERGES INSTANTLY on LVM_THIN
// (real block device, same full disk block) but FULL-resyncs and
// wedges on FILE_THIN (loop). The e2e `respawn-standalone-wedge`
// scenario (FILE_THIN + mkfs) failed for exactly this reason; the
// `replica-add-no-resync` scenario (FILE_THIN, NO mkfs) passed, so
// the trigger is the mkfs-discard × loop × rs-discard-granularity
// interaction, not the option in isolation.
//
// For FILE_THIN we therefore render only `discard-zeroes-if-aligned
// no` (DRBD's inert default) and OMIT the granularity — DRBD then
// does a full byte-copy resync on the rare loop relocate, which is
// always correct, while the day0 skip + mkfs convergence is
// preserved. The genuine thin-aware-resync win is retained where it
// is both safe AND effective: the real block-device thin/ZFS pools.
if !discardZeroesIfAligned(providerKind) {
return out
}

// Only attempt the granularity when we have a real device path to
// probe. A diskless local replica (devicePath == "") has no
// backing device — discard-zeroes-if-aligned alone is harmless
// (it governs how DRBD treats discards it receives) and we skip
// the granularity probe.
// backing device — skip the granularity probe.
if devicePath == "" {
return out
}
Expand Down
Loading