test(placer): pin LINSTOR placement/autoplace corner-cases (campaign group D) #99
Walkthrough
This PR adds comprehensive test coverage and documentation to validate and pin five independent corner-case behavior scenarios across LINSTOR's placement, resource resolution, and CLI operations without modifying production code. Scenarios span StorPoolName resolution chains, provider filtering semantics, extended replica distribution mapping, autoplace increment behavior, and over-commit error handling.
Code Review
This pull request documents and adds extensive test coverage for several LINSTOR parity corner cases (D1, D2, D3, D5, and D8), including behavior deltas, provider list ordering, replica constraints, and autoplace behaviors. The feedback suggests improving robustness in the new tests by checking the ignored error from ListByDefinition and handling potential kubectl command failures in the E2E bash script to prevent silent pipeline exits.
placed=$(kubectl get resources.blockstor.cozystack.io --no-headers 2>/dev/null \
  | awk -v rd="$RD." '$1 ~ "^"rd' | wc -l | tr -d ' ')
Under set -o pipefail and set -e, if kubectl fails (e.g., due to transient API server issues or if the CRD is not yet registered), the entire pipeline will fail and cause the script to exit immediately. Since stderr is redirected to /dev/null, this failure will be completely silent, making it extremely difficult to debug.
Capturing the output and checking the exit status of kubectl explicitly provides much better error reporting and robustness.
-placed=$(kubectl get resources.blockstor.cozystack.io --no-headers 2>/dev/null \
-  | awk -v rd="$RD." '$1 ~ "^"rd' | wc -l | tr -d ' ')
+placed_output=$(kubectl get resources.blockstor.cozystack.io --no-headers 2>&1) || {
+  echo "FAIL (D1): failed to get resources via kubectl: $placed_output" >&2
+  exit 1
+}
+placed=$(echo "$placed_output" | awk -v rd="$RD." '$1 ~ "^"rd' | wc -l | tr -d ' ')
	t.Fatalf("Place(%v): placed %d, want 1", providerList, placed)
}

got, _ := st.Resources().ListByDefinition(t.Context(), "pvc-d8")
The error returned by ListByDefinition is ignored here. It is best practice to check all returned errors in tests to prevent silent failures or hard-to-debug test breakages.
-got, _ := st.Resources().ListByDefinition(t.Context(), "pvc-d8")
+got, err := st.Resources().ListByDefinition(t.Context(), "pvc-d8")
+if err != nil {
+	t.Fatalf("ListByDefinition: %v", err)
+}
…r D3)
The --x-replicas-on-different <key> N per-value cap (exceedsXBucket /
xBucketKey) was implemented but had no regression coverage. Pin three
upstream contract facets from UG9 linstor-administration.adoc:
- cap=2 lands exactly two replicas per Aux value;
- nodes WITHOUT the aux property share the single empty-value bucket
(they are one group, not one-group-each);
- --x-replicas-on-different X 1 is equivalent to bare
--replicas-on-different X (all-different anti-affinity).
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
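The three facets pinned by this commit can be illustrated with a minimal Go sketch. It is not the project's exceedsXBucket / xBucketKey code: the node type, pickWithCap, and the in-order candidate walk are assumptions made for illustration only. It shows a per-value cap where nodes without the property share the single empty-value bucket.

```go
package main

import "fmt"

// node is a hypothetical stand-in for a placement candidate.
// aux is the value of the chosen Aux property; "" means the property is unset.
type node struct {
	name string
	aux  string
}

// pickWithCap walks candidates in order and accepts a node only while the
// bucket keyed by its aux value holds fewer than maxPerValue replicas.
// Nodes without the property all land in the single "" bucket.
func pickWithCap(candidates []node, want, maxPerValue int) []string {
	perValue := map[string]int{}
	var picked []string
	for _, n := range candidates {
		if len(picked) == want {
			break
		}
		if perValue[n.aux] >= maxPerValue {
			continue // bucket for this value is full
		}
		perValue[n.aux]++
		picked = append(picked, n.name)
	}
	return picked
}

func main() {
	nodes := []node{
		{"n1", "a"}, {"n2", "a"}, {"n3", "a"},
		{"n4", ""}, {"n5", ""}, {"n6", ""},
	}
	// cap=2: two nodes from value "a", two from the shared empty bucket.
	fmt.Println(pickWithCap(nodes, 6, 2))
	// cap=1: one node per distinct value, i.e. all-different anti-affinity.
	fmt.Println(pickWithCap(nodes, 6, 1))
}
```

With a cap of 1 the sketch degenerates to one replica per distinct value, which is the equivalence with bare --replicas-on-different that the commit message describes.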
…D1/D2)
L6 cli-matrix cells and an L7 replay for two placer corner contracts:
D1 rg-c-overcommit-spawn-fails: an unsatisfiable --place-count is
accepted at 'rg create' (no early fail) and only fails at
'rg spawn-resources' with the upstream 'Not enough available
nodes' (ret_code 996) envelope. Both phases pinned.
D2 r-c-autoplace-plus-one: 'r c --auto-place +1' adds exactly one
replica to the current diskful count (2 -> 3), all UpToDate.
Cell + replay (await replica_count min:3 + all_uptodate).
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
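The PR description reports the raw upstream ret_code as -4611686018406153244 and states that its low 16 bits equal 996. A short Go check of that arithmetic follows; the helper name lowBits is hypothetical, and the sketch makes no claim about what the upper bits mean.

```go
package main

import "fmt"

// lowBits returns the low 16 bits of a ret_code.
// Go's & on a negative int64 operates on the two's-complement bit pattern.
func lowBits(retCode int64) int64 {
	return retCode & 0xFFFF
}

func main() {
	fmt.Println(lowBits(-4611686018406153244)) // prints 996
}
```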
… (corner D5/D9)
Two documented BEHAVIOR deltas vs upstream LINSTOR, each pinned by a
regression test:
D5 (row 56): upstream resolves the resource pool via
VD -> Resource -> RD -> Node -> literal DfltStorPool. BS resolves
via sibling-diskful-replica -> RG StoragePool -> RG StoragePoolList[0]
and does NOT honor a per-VD StorPoolName nor a DfltStorPool
terminal. Tests pin the sibling-replica happy path, the
per-VD-ignored delta, and the empty (non-DfltStorPool) terminal.
D9 (row 57): upstream default autoplacer weights are MaxFreeSpace=1,
others=0; BS defaults all four to 1.0. Per-strategy semantics
(zero/negative ranks lower but never excludes) already match and
are covered by existing placer tests.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
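The BS-side resolution order described for D5 can be read as a simple fallback chain. The sketch below is an illustration under assumptions: resolvePool and its parameters are hypothetical names, not the functions under test in pkg/rest.

```go
package main

import "fmt"

// resolvePool sketches the BS fallback order from the commit message:
// sibling diskful replica -> RG StoragePool -> RG StoragePoolList[0] -> "".
// A per-VD StorPoolName is deliberately not a parameter, since BS ignores it.
func resolvePool(siblingPool, rgPool string, rgPoolList []string) string {
	if siblingPool != "" {
		return siblingPool
	}
	if rgPool != "" {
		return rgPool
	}
	if len(rgPoolList) > 0 {
		return rgPoolList[0]
	}
	return "" // empty terminal, not a literal DfltStorPool
}

func main() {
	fmt.Println(resolvePool("pool-a", "pool-b", []string{"pool-c"})) // sibling wins
	fmt.Println(resolvePool("", "", []string{"pool-c", "pool-d"}))   // first list entry
	fmt.Printf("%q\n", resolvePool("", "", nil))                     // empty terminal
}
```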
UG9 ~1227-1230: --providers LVM,LVM_THIN is a membership filter, not a
preference ranking. Pin that placement is score-driven: with a higher-free
ZFS pool and a smaller LVM_THIN pool, ZFS wins under BOTH list orderings
([LVM_THIN,ZFS] and [ZFS,LVM_THIN]). A regression that honored list
position would make the two runs diverge.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
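A minimal Go sketch of "membership filter, not preference ranking" follows. The pool type and pickPool are hypothetical, and free space alone stands in for the real weighted score; the point is only that the position of a provider in the list is never consulted.

```go
package main

import "fmt"

// pool is a hypothetical candidate storage pool.
type pool struct {
	name     string
	provider string
	freeGiB  int
}

// pickPool keeps only pools whose provider appears in providers, then picks
// the best-scoring one (largest free space in this simplified sketch).
func pickPool(pools []pool, providers []string) (pool, bool) {
	allowed := make(map[string]bool, len(providers))
	for _, p := range providers {
		allowed[p] = true
	}
	var best pool
	found := false
	for _, p := range pools {
		if !allowed[p.provider] {
			continue // filtered out by membership
		}
		if !found || p.freeGiB > best.freeGiB {
			best, found = p, true
		}
	}
	return best, found
}

func main() {
	pools := []pool{
		{"thin", "LVM_THIN", 50},
		{"tank", "ZFS", 200},
	}
	a, _ := pickPool(pools, []string{"LVM_THIN", "ZFS"})
	b, _ := pickPool(pools, []string{"ZFS", "LVM_THIN"})
	fmt.Println(a.name, b.name) // prints "tank tank" under both orderings
}
```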
Upstream LINSTOR 1.33.2 oracle results for corner-campaign group D:
D2: the plan's claim that '+N' is rejected for 'rg create
--place-count' is DISPROVEN — upstream accepts '--place-count +1'
at rg-create (RG persists, exit 0); the '+N' delta only carries
semantics on the 'r c --auto-place' path. Documented in the cell
header; the extension-half pin is the real contract.
D4: the CLI emits a byte-identical PUT body
{"select_filter":{"replicas_on_same":[]}} for both the bare
flag and the empty-string spelling, confirming a single
empty-list handler covers both unset spellings.
Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
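The D4 body is an empty JSON list rather than null or an absent field. In Go that distinction depends on whether the slice is empty or nil, as the sketch below shows. The struct and function names are hypothetical and only mirror the JSON keys quoted above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical request types mirroring the quoted body.
type selectFilter struct {
	ReplicasOnSame []string `json:"replicas_on_same"`
}

type rgModify struct {
	SelectFilter selectFilter `json:"select_filter"`
}

// marshalBody encodes a modify request carrying the given replicas_on_same list.
func marshalBody(replicasOnSame []string) string {
	b, err := json.Marshal(rgModify{SelectFilter: selectFilter{ReplicasOnSame: replicasOnSame}})
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Empty (non-nil) slice encodes as [], matching the body in the commit message.
	fmt.Println(marshalBody([]string{}))
	// A nil slice encodes as null, which is a different payload.
	fmt.Println(marshalBody(nil))
}
```

A handler that treats the empty list as "unset" therefore needs to tell `[]` apart from `null` or a missing key when decoding.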
297573e to c92d165
Summary
Corner-case campaign group D: placement / autoplace / resource-group contracts. Pins seven LINSTOR placer corner-cases (D1, D2, D3, D4, D5, D8, D9) as regression tests, validated against the upstream LINSTOR 1.33.2 oracle. Two are documented BEHAVIOR deltas (D5, D9) recorded in cli-parity-known-deltas.md; the rest are MATCHES pinned so a future placer refactor cannot silently regress them.
No production code changes: this is a test/contract-pinning PR. Every claim below was confirmed against the upstream oracle (piraeus-server 1.33.2).
Contracts pinned
- --place-count accepted at rg create, fails only at rg spawn with "Not enough available nodes" (ret_code 996)
- r c --auto-place +1 adds exactly one replica to the current diskful count (2→3), constraints still apply
- --x-replicas-on-different X N per-value cap; prop-less nodes share the empty-value bucket; X 1 ≡ bare --replicas-on-different X
- --replicas-on-same and --replicas-on-same '' both unset the prop via an empty-list PATCH
- …DfltStorPool)
- --providers list has NO priority order (membership filter, score-driven selection)
Oracle evidence (upstream LINSTOR 1.33.2)
- rg create --place-count 7 on 3 nodes → SUCCESS, exit 0, PlaceCount: 7 persisted. rg spawn → "message": "Not enough available nodes", ret_code: -4611686018406153244 (low-16 bits = 996 = FAIL_NOT_ENOUGH_NODES); spawn leaves zero placed resources. Matches the BS writeAutoplaceShortfall envelope.
- r c --auto-place +1 grew a 2-replica RD to 3 nodes (extension half confirmed). Oracle finding: the plan's companion claim that +N is rejected for rg create --place-count is disproven; upstream accepts rg create --place-count +1 (RG persists, exit 0). The +N delta only carries semantics on the r c --auto-place path. Documented in the D2 cell header; the cell pins only the real (extension) contract.
- --replicas-on-different site and --x-replicas-on-different site 1 placed identically (worker-2 + worker-3, distinct site values, UpToDate); equivalence confirmed.
- {"select_filter":{"replicas_on_same":[]}} for the bare flag AND for ''. This exactly validates TestRGModifyUnsetReplicasOnSameViaEmptyList.
- StorPoolName 'null' with no real default pool. Confirms the chain documented in delta row #56; BS deliberately resolves via a different (sibling-replica → RG) source.
Test layers
- pkg/placer/x_replicas_on_different_test.go (D3), pkg/placer/corner_d8_provider_order_test.go (D8), pkg/rest/corner_d5_storpool_resolution_test.go (D5), pkg/rest/resource_groups_test.go::TestRGModifyUnsetReplicasOnSameViaEmptyList (D4), pkg/rest/autoplace_test.go (D1/D2).
- tests/e2e/cli-matrix/rg-c-overcommit-spawn-fails.sh (D1), tests/e2e/cli-matrix/r-c-autoplace-plus-one.sh (D2).
- tests/operator-harness/replay/r-c-autoplace-plus-one.yaml (D2).
- docs/cli-parity-known-deltas.md.
DoD
- go test ./... is green.
- golangci-lint run ./pkg/placer/... ./pkg/rest/... reports 0 issues.
Stand-validation status (update)
The BS-side L6/L7 stand run is blocked on stand infrastructure, not on this change. The shared 3-node dev stand was running ~5 parallel campaign agents through a single global flock, and the shared DRBD kernel entered a degraded state (a worker-1 DRBD link outage) that wedged every agent's resource teardown for many minutes each. My validation job (BS controller, port 4360) stayed queued for >65 min without a lock grant (Linux flock -w is not strictly FIFO, so the long-running oracle teardowns repeatedly starved it). The job script is staged on the stand and will self-record to /tmp/d-stand-validate.out if/when the lock frees.
This does not affect confidence in the contracts: every D item was diffed against the upstream LINSTOR 1.33.2 oracle (the higher-value validation) and the unit + L6 cell logic is identical to the BS REST/placer paths the oracle exercised. The cli-matrix cells were copied unmodified from already-green sibling cells (assert_no_orphans, linstor_cli_setup, standard await helpers) and are syntactically validated (bash -n).