fix(store): surface DELETING during two-phase RD/resource deletion (corner-cases E1/E3/E4) #94
Two-phase RD/Resource deletion (corner-case E4): when a CRD carries a DeletionTimestamp but its satellite-resource finalizer is still held (e.g. a downed satellite cannot drain DRBD), the object lingers in the store. Upstream LINSTOR surfaces this interim state as the DELETE flag, which the CLI renders as DELETING in the State column of `rd l` / `r l`. The k8s store projections (crdToWireRD / crdToWireResource) previously ignored DeletionTimestamp, so a finalizer-blocked object showed its normal flags and the operator had no CLI-visible signal that a delete was pending and stuck.

Add withDeletingFlag to stamp the upstream-canonical DELETE token onto the wire Flags slice when DeletionTimestamp is set.

Also pin the second half of corner-case E1: a single-replica `r d` must NOT trip the rd-d snapshot guard. It drops one per-node Resource and leaves the parent RD, the surviving replicas, and every Snapshot intact.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
L6 cli-matrix cells:
- rd-d-blocked-by-snapshot.sh (E1): rd d refused with a snapshot,
r d of a single replica allowed and snapshots survive, recovery
flow after the snapshot is dropped.
- rg-delete-with-rds-rejected.sh (E3): rg d refused with live RDs,
rd modify --resource-group escape, retroactive inheritance of the
new group's props (rd lp flips), then the empty group deletes.
- rd-d-deleting-surface.sh (E4): a finalizer-blocked replica surfaces
DELETING in r l during the two-phase delete; the held finalizer
blocks final removal. Simulates the stuck on-node teardown with a
test finalizer on a cce- Resource only (never stops a satellite).
L7 replay YAMLs (exit-code + convergence contracts):
- rd-d-blocked-by-snapshot.yaml (E1)
- rg-delete-with-rds-rejected.yaml (E3)
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
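The E3 cell's `rd lp` flip relies on inheritance being resolved at read time through the RD's current group, not copied when the RD is created. A toy model of that idea follows; the `ResourceGroup` / `ResourceDefinition` types and the `Aux/marker` key are hypothetical and do not reflect blockstor's actual data model.

```go
package main

import "fmt"

// ResourceGroup and ResourceDefinition are hypothetical, simplified types.
type ResourceGroup struct {
	Name  string
	Props map[string]string
}

type ResourceDefinition struct {
	Name  string
	Group *ResourceGroup // live reference, consulted on every read
	Props map[string]string
}

// EffectiveProp returns the RD's own value if set, otherwise the value from
// the group the RD currently points at.
func (rd *ResourceDefinition) EffectiveProp(key string) (string, bool) {
	if v, ok := rd.Props[key]; ok {
		return v, true
	}
	if rd.Group == nil {
		return "", false
	}
	v, ok := rd.Group.Props[key]
	return v, ok
}

func main() {
	src := &ResourceGroup{Name: "src", Props: map[string]string{"Aux/marker": "src"}}
	dst := &ResourceGroup{Name: "dst", Props: map[string]string{"Aux/marker": "dst"}}
	rd := &ResourceDefinition{Name: "rd1", Group: src, Props: map[string]string{}}

	before, _ := rd.EffectiveProp("Aux/marker")
	rd.Group = dst // stands in for `rd modify --resource-group dst`
	after, _ := rd.EffectiveProp("Aux/marker")
	fmt.Println(before, after) // src dst
}
```

A create-time copy would instead clone `src.Props` into `rd.Props`, and the second read would still return `src`.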
golinstor's `resource list -o json` output is a doubly-nested array with the per-replica resource flags under `rsc_flags` (not a flat `flags` key). Align the DELETING-surface probe with the shape the other cli-matrix cells already use.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
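A minimal Go sketch of such a probe, assuming the output is an array of arrays of replica objects. Only `rsc_flags` comes from this PR; the `name` and `node_name` keys, the `replicaHasFlag` helper, and the sample data are illustrative assumptions. It looks for the raw `DELETE` wire token, which the CLI renders as DELETING.

```go
package main

import (
	"encoding/json"
	"fmt"
	"slices"
)

// replica keeps only the fields the probe reads; name/node_name are assumed keys.
type replica struct {
	Name     string   `json:"name"`
	NodeName string   `json:"node_name"`
	RscFlags []string `json:"rsc_flags"`
}

// replicaHasFlag decodes the doubly-nested list and reports whether the
// replica of rsc on node carries flag in rsc_flags.
func replicaHasFlag(raw []byte, rsc, node, flag string) (bool, error) {
	var outer [][]replica
	if err := json.Unmarshal(raw, &outer); err != nil {
		return false, err
	}
	for _, inner := range outer {
		for _, r := range inner {
			if r.Name == rsc && r.NodeName == node {
				return slices.Contains(r.RscFlags, flag), nil
			}
		}
	}
	return false, nil
}

func main() {
	raw := []byte(`[[{"name":"cce-r1","node_name":"n1","rsc_flags":["DELETE"]},
	                 {"name":"cce-r1","node_name":"n2","rsc_flags":[]}]]`)
	ok, err := replicaHasFlag(raw, "cce-r1", "n1", "DELETE")
	fmt.Println(ok, err) // true <nil>
}
```

Decoding into `[][]replica` makes a flat-shaped payload fail with a decode error instead of silently matching nothing.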
Caution: Review failed. The pull request is closed.
Configuration used: defaults. Review profile: CHILL. Files selected for processing: 10.
Walkthrough
This PR implements a two-phase resource deletion mechanism for Blockstor by introducing a DELETE wire-level flag. The flag is projected from Kubernetes deletion timestamps into both ResourceDefinitions and Resources via a new …
Changes: Two-phase delete implementation and verification
Estimated code review effort: 3 (Moderate), about 25 minutes
Code Review
This pull request implements and tests several deletion semantics and corner cases in blockstor, specifically focusing on two-phase deletion visibility (surfacing the DELETE flag on resources and resource definitions with a DeletionTimestamp), snapshot-blocked resource definition deletion, and resource group deletion constraints. The changes include new helper functions, unit tests, and comprehensive E2E CLI-matrix and replay tests. The review feedback suggests optimizing the shell scripts by replacing subshell-spawning $(date +%s) calls with the Bash built-in $SECONDS variable inside wait loops to improve test execution efficiency in CI environments.
```bash
deadline=$(( $(date +%s) + 60 ))
snap_seen=false
while (( $(date +%s) < deadline )); do
```
Using `$(date +%s)` in a loop spawns a subshell and forks an external `date` process on every iteration, which adds avoidable overhead and can slow test execution in resource-constrained CI environments.
The Bash built-in `$SECONDS` variable is cheaper: it holds the number of seconds since the shell started (or since it was last assigned) and is read without spawning any process.
Suggested change:
```diff
-deadline=$(( $(date +%s) + 60 ))
+deadline=$(( SECONDS + 60 ))
 snap_seen=false
-while (( $(date +%s) < deadline )); do
+while (( SECONDS < deadline )); do
```
```bash
deadline=$(( $(date +%s) + 60 ))
while (( $(date +%s) < deadline )); do
```
Suggested change:
```diff
-deadline=$(( $(date +%s) + 60 ))
+deadline=$(( SECONDS + 60 ))
-while (( $(date +%s) < deadline )); do
+while (( SECONDS < deadline )); do
```
```bash
deadline=$(( $(date +%s) + 120 ))
while (( $(date +%s) < deadline )); do
```
Suggested change:
```diff
-deadline=$(( $(date +%s) + 120 ))
+deadline=$(( SECONDS + 120 ))
-while (( $(date +%s) < deadline )); do
+while (( SECONDS < deadline )); do
```
```bash
deadline=$(( $(date +%s) + 60 ))
deleting_seen=false
while (( $(date +%s) < deadline )); do
```
Suggested change:
```diff
-deadline=$(( $(date +%s) + 60 ))
+deadline=$(( SECONDS + 60 ))
 deleting_seen=false
-while (( $(date +%s) < deadline )); do
+while (( SECONDS < deadline )); do
```
```bash
deadline=$(( $(date +%s) + 120 ))
cleared=false
while (( $(date +%s) < deadline )); do
```
Suggested change:
```diff
-deadline=$(( $(date +%s) + 120 ))
+deadline=$(( SECONDS + 120 ))
 cleared=false
-while (( $(date +%s) < deadline )); do
+while (( SECONDS < deadline )); do
```
```bash
deadline=$(( $(date +%s) + 30 ))
flipped=false
while (( $(date +%s) < deadline )); do
```
Suggested change:
```diff
-deadline=$(( $(date +%s) + 30 ))
+deadline=$(( SECONDS + 30 ))
 flipped=false
-while (( $(date +%s) < deadline )); do
+while (( SECONDS < deadline )); do
```
Summary
Closes corner-case plan items E1, E3, E4 (deletion semantics) from the LINSTOR corner-case campaign. Verifies the documented LINSTOR deletion rules against blockstor, pins them with L1 + L6 + L7 tests, and fixes the one divergence (E4).
| Item | Documented LINSTOR rule | blockstor result |
|---|---|---|
| E1 | `rd d` is blocked by snapshots; `r d` of a single replica is NOT (snapshots survive) | The `rd d` guard is already present in `handleRDDelete` with upstream code `514\|MASK_ERROR` (HTTP 409). `r d` only drops one Resource and never touches Snapshots. Stand: rejection text `Cannot delete resource definition '<rd>' because it has snapshots.`; a single-replica `r d` succeeds and the snapshot survives. |
| E3 | `rg delete` with live RDs is rejected; `rd modify --resource-group` is the escape; the new group's props are inherited retroactively | Rejection text `Cannot delete resource group '<rg>' because it has existing resource definitions.` (code `501\|MASK_ERROR`). Reassign works; `rd lp` flips from the old group's marker to the new group's after the reassign, so inheritance is a live reference, not a create-time copy (answers the E3 ⚖️ question on the blockstor side). |
| E4 | `rd l` shows DELETING in the interim; a downed satellite blocks final removal | … `metadata.DeletionTimestamp`, so a finalizer-blocked RD/Resource showed its normal flags with no CLI-visible signal that a delete was pending. Now `withDeletingFlag` surfaces the upstream-canonical `DELETE` flag when `DeletionTimestamp` is set, so `linstor rd l` / `r l` render the State column as `DELETING` while a held finalizer (e.g. a downed satellite) blocks finalisation. |

Changes
- `pkg/api/v1`: add `ResourceFlagDelete = "DELETE"` (the upstream-canonical token; distinct from the internal `DELETING` peer-forget constant).
- `pkg/store/k8s`: `crdToWireRD` / `crdToWireResource` now stamp `DELETE` onto the wire `Flags` when the CRD carries a `DeletionTimestamp` (E4).
- L1: `withDeletingFlag` + projection unit tests; a new REST test pinning that a single-replica `r d` leaves snapshots intact (E1 second half).
- L6 cli-matrix cells: `rd-d-blocked-by-snapshot.sh` (E1), `rg-delete-with-rds-rejected.sh` (E3), `rd-d-deleting-surface.sh` (E4).
- L7 replay YAMLs: `rd-d-blocked-by-snapshot.yaml` (E1), `rg-delete-with-rds-rejected.yaml` (E3).

Test evidence
`go test ./...` green; `golangci-lint run`: 0 issues on touched packages.

Stand validation (dev, blockstor v0.1.9 + this branch's YAMLs/cells):
The E4 cli-matrix cell (`rd-d-deleting-surface.sh`) asserts the new `DELETE`-flag wire surface and therefore requires this branch's controller build to be deployed; it is pinned at L1 against the store projection. It safely simulates the "stuck on-node teardown" with a test finalizer on a `cce-` Resource only; it never stops a satellite.

Notes / leftovers
- … `cli-parity-known-deltas.md` row is added; E4 moves toward parity (a fix), not a deliberate delta.
- … `rd lp` flip).

Oracle (upstream LINSTOR 1.33.2) cross-check
Ran the same E1/E3 sequences against the upstream Java controller on the shared stand (FILE_THIN pool):
- E1: `rd d` with a snapshot present exits 10 with `Cannot delete resource definition 'orc-e-snap-rd' because it has snapshots.` Snapshots ARE supported on FILE_THIN upstream; a single-replica `r d` succeeded and the snapshot survived (`Successful` in `snapshot list`), matching blockstor exactly.
- E3: `rg d` with a live RD exits 10 with `Cannot delete resource group 'orc-e-src' because it has existing resource definitions.` Reassigning via `rd modify --resource-group` succeeded and re-stamped the new group's `StorPoolName` onto the RD (retroactive inheritance confirmed at the effective level).
- Upstream `rd lp` shows `No property map found for this entry.`: it does NOT inline inherited RG `Aux/*` markers at the RD scope. blockstor DOES inline them (Bug 105, deliberate python-CLI parity), which is what lets the L6 cell observe the `src`→`dst` marker flip directly. Inheritance itself is live in both; only the `rd lp` rendering differs, and that divergence is independent of these deletion-semantics items.

Summary by CodeRabbit
New Features
Tests