fix(snapshot): close corner-case snapshot edges G1-G5 (rollback/restore/autosnapshot/thick-pool)#100
Conversation
|
Warning Review limit reached
More reviews will be available in 17 minutes and 32 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (11)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces several fixes and enhancements to snapshot and restore behaviors, aligning them with upstream LINSTOR semantics. Specifically, it ensures that an auto-snapshot Keep value of <= 0 falls back to the default of 10, blocks snapshot restores when target volume definitions conflict, and rejects snapshot creation on thick storage pools that do not support copy-on-write. The review feedback suggests adding a defensive nil check for the resolved storage pool in nonSnapshotPoolLocations to prevent a potential nil pointer dereference.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| continue | ||
| } | ||
|
|
||
| if !pool.SupportsSnapshot { |
There was a problem hiding this comment.
Defensive programming check: If pool is a pointer type, it is possible for it to be nil (for example, in mock environments or custom store implementations where Get might return nil, nil). To prevent a potential nil pointer dereference panic, please add a check to ensure pool is not nil before accessing pool.SupportsSnapshot.
if pool == nil {
continue
}
if !pool.SupportsSnapshot {3a73df5 to
4adbf2a
Compare
…ack deltas (corner G5/G1/G2) Corner-case campaign items G5, G1, G2 (snapshot edges). G5: handleSnapshotCreate now refuses a snapshot when any targeted diskful replica lives in a storage pool whose provider cannot take copy-on-write snapshots (thick LVM, plain FILE, DISKLESS), mirroring upstream LINSTOR's controller-side FAIL_SNAPSHOTS_NOT_SUPPORTED (994) refusal. Pre-fix the REST surface was ungated and blockstor's thick-LVM provider would silently stage a 25%ORIGIN COW overlay (Bug 245 footgun). The gate is best-effort: an unresolvable pool passes through. G1/G2: pin the existing deliberate 501 rollback contract for the snapshot-less-node edge (upstream >=1.31.2 resurrects deleted replicas on rollback; blockstor refuses in-place rollback entirely and points at snapshot-restore-resource). InUse/Primary still 409s first. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
… restore-after-rd (corner G3) Corner-case campaign item G3 (two-phase restore edge cases). G3a: pin that snapshot restore works after the source RD's resources were all deleted (r d) — the surviving RD shell + snapshot restore into a new RD with replicas on the snapshot's recorded nodes. Matches the upstream guide: restore works 'even when the original resource has been removed from the nodes where the snapshots were taken'. G3b: snapshot-restore-volume-definition now refuses up front with a typed FAIL_EXISTS_VLM_DFN (502) envelope when the target RD already carries a volume-definition whose number collides with the snapshot's, naming the offending number — rather than letting the per-VD hydrate Create surface a bare 409 only after a partial mutation. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…ner G4) Corner-case campaign item G4 (AutoSnapshot semantics). The upstream administration guide is explicit: 'If AutoSnapshot/Keep is omitted (or <= 0), LINSTOR will keep the last 10 snapshots by default' (linstor-administration.adoc ~2467). Pre-fix blockstor treated a Keep value <= 0 as 'disable cleanup, keep everything', letting auto-snapshots accumulate unbounded — a silent divergence from upstream. pruneOldAutoSnapshots now falls back to DefaultAutoSnapshotKeep (10) on any non-positive Keep instead of returning early. RunEvery cadence, manual-snapshot exclusion, and the absent-Keep default were already correct and remain pinned. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
… G1/G2) Add row 56 to the parity accept-list: BS deliberately does not expose in-place snapshot rollback (501 + actionable text pointing at the safe snapshot-restore-resource path). zfs rollback / lvconvert --merge are destructive and upstream >=1.31.2 additionally resurrects deleted replicas on rollback — both footguns we refuse to make reachable. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…e conflict L6/L7 for corner-case G3b. The cli-matrix cell drives the operator CLI end-to-end: source RD + snapshot, target RD with a clashing volume 0, asserts `snapshot volume-definition restore` is rejected (non-zero) and the target's own VD is left untouched, then a control restore into a fresh empty RD succeeds. The L7 replay codifies the same operator sequence with the reject (exit 10) / accept contract. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…jection L6 for corner-case G5. The cell self-provisions a throwaway loop-backed thick LVM pool on worker-1 (the dev stand ships only thin pools), registers it as a blockstor `LVM` pool, spawns a tiny RD on it, and asserts `snapshot create` is rejected with a snapshots-not-supported error and leaves no orphan Snapshot CRD. Skips (exit 0) when a thick pool cannot be provisioned. No L7 replay: provisioning a thick pool requires node-level loop+VG setup the CLI-only replay-runner cannot drive — the L1 gate tests + this L6 cell cover the surface. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Scope refuseSnapshotOnNonSnapshotPool to the resolved snapshot target set (snap.Nodes) — the same set the offline pre-check uses — so a caller that snapshots only a snapshot-capable subset of an otherwise mixed-provider RD is not falsely refused. Empty target set means 'all diskful replicas' (the default-fan-out case). Adds a subset-scoping regression test. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Both snapshot L6 cells provision a single-replica diskful source, but called wait_uptodate (a 2-peer convergence helper that requires a third positional $peer and trips `set -u` with only two args). Switch to the single-node wait_disk_state poll. Found running on the dev stand. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
3d0f231 to
b187e2e
Compare
Corner-case campaign — PKG-G (snapshot edges)
Closes plan items G1–G5 of the LINSTOR corner-case campaign (snapshot rollback / restore / autosnapshot / thick-pool semantics). Doc references into
linstor-administration.adocs-linstor-snapshots.Rebased onto current
main(#94–#98). Delta row renumbered to #67 (docs/cli-parity-known-deltas.md).Verdict table (with oracle + stand evidence)
Oracle = upstream LINSTOR 1.33.2 (piraeus-server, FILE_THIN
pool) on the dev stand, captured on a healthy kernel. BS stand = deployedmainimage (does NOT yet carry this PR's fixes).snapshot rollbackwhile InUse/Primarysafety-snap-*and merges in place ("Resource already deployed as requested" x3 nodes) — destructive footgun confirmed.TestSnapshotRollback*.snapshot rollbackonto snapshot-less node(dev-worker-3) Volume number 0 ... [FILE THIN] created,Added disk on dev-worker-3,Resource adjusted.TestSnapshotRollbackSnapshotlessNodeReaches501_G2.r dof source resourcesr dof every source replica (RD shell + snapshot persist).ret_code -4611686018409823753=FAIL_EXISTS_VLM_DFN,A volume definition with the number 0 already exists in resource definition orc-g-r3. Control (empty RD) →ret_code 17563650success.FAIL_EXISTS_VLM_DFN(502) envelope naming the offending volume number, pre-checked before any Store mutation. L6 cell PASS on deployed main (reject + target VD untouched), L7 replay PASS.Keep <= 0Keepomitted or<= 0→ keep last 10 (was: keep everything). RunEvery + manual-snapshot exclusion already correct. Pinned L1 (stub-clock).snapshot createon a thick poolret_code -4611686018409823262=FAIL_SNAPSHOTS_NOT_SUPPORTED,Storage driver LVM does not support snapshots.(captured on a self-provisioned loop-backed thick LVM VG, torn down clean).handleSnapshotCreatenow refuses when any diskful replica sits in a non-snapshot-capable pool (thick LVM / plain FILE / DISKLESS) withFAIL_SNAPSHOTS_NOT_SUPPORTED(994). Pinned L1; oracle-validated.Validation classification
mainsnap-vd-restore-volume-conflict-rejected.shsnap-vd-restore-volume-conflict-rejected.yamlsnap-create-thick-pool-rejected.shKeep<=0fallbackNote on G3b classification: the operator-observable contract (reject the clashing restore, leave the target untouched, allow restore into an empty RD) already holds on the deployed
mainimage — so the L6/L7 artifacts PASS rather than reproduce a RED. This PR's net change for G3b is the typedFAIL_EXISTS_VLM_DFNenvelope (vs a bare 409) and the pre-Store conflict pre-check, both pinned at L1.Tests
Keep=0andKeep<0fall back to default 10.snap-vd-restore-volume-conflict-rejected.sh(G3b, stand PASS),snap-create-thick-pool-rejected.sh(G5, self-provisions a loop-backed thick LVM pool; SKIPs when the satellite won't register it).snap-vd-restore-volume-conflict-rejected.yaml(G3b) — runs against thin pools, stand PASS.go test ./...green;golangci-lint run0 new issues (both re-verified post-rebase).Stand validation outputs (deployed
mainimage)Notes / leftovers
blockstor-lvmVG, sostorage-pool create lvm <loop-vg>is refused at the discovery check. The G5 gate is therefore validated by the L1 create-gate tests and the upstream-oracle capture (which produced the exactFAIL_SNAPSHOTS_NOT_SUPPORTEDstring BS mirrors), not by a stand L6 PASS. The cell self-skips rather than false-FAIL.orc-g-) and BS test resources (prefixcli-matrix-g*) were all torn down; no leftover RDs, storage pools, VGs, or loop devices on either controller (verified post-run).Do NOT merge — draft for launcher sequencing.