test(placer): pin LINSTOR placement/autoplace corner-cases (campaign group D) by kvaps · Pull Request #99 · cozystack/blockstor

Andrei Kvapil (kvaps) · 2026-06-04T00:31:14Z

Summary

Corner-case campaign group D — placement / autoplace / resource-group contracts. Pins seven LINSTOR placer corner-cases (D1, D2, D3, D4, D5, D8, D9) as regression tests, validated against the upstream LINSTOR 1.33.2 oracle. Two are documented BEHAVIOR deltas (D5, D9) recorded in cli-parity-known-deltas.md; the rest are MATCHES pinned so a future placer refactor cannot silently regress them.

No production code changes — this is a test/contract-pinning PR. Every claim below was confirmed against the upstream oracle (piraeus-server 1.33.2).

Contracts pinned

Item	Contract	Verdict	Layer
D1	Unsatisfiable `--place-count` accepted at `rg create`, fails only at `rg spawn` with "Not enough available nodes" (ret_code 996)	MATCHES	L6 cell + unit
D2	`r c --auto-place +1` adds exactly one replica to the current diskful count (2→3), constraints still apply	MATCHES	L6 cell + L7 replay + unit
D3	`--x-replicas-on-different X N` per-value cap; prop-less nodes share the empty-value bucket; `X 1` ≡ bare `--replicas-on-different X`	MATCHES	unit (placer)
D4	Bare `--replicas-on-same` and `--replicas-on-same ''` both unset the prop via an empty-list PATCH	MATCHES	unit (rest)
D5	StorPoolName resolution chain (VD→Resource→RD→Node→`DfltStorPool`)	DELIBERATE DELTA #56	unit + delta row
D8	`--providers` list has NO priority order (membership filter, score-driven selection)	MATCHES	unit (placer)
D9	Autoplacer default weights (upstream MaxFreeSpace=1/others=0; BS all=1.0)	DELIBERATE DELTA #57	unit + delta row

Oracle evidence (upstream LINSTOR 1.33.2)

D1: rg create --place-count 7 on 3 nodes → SUCCESS, exit 0, PlaceCount: 7 persisted. rg spawn → "message": "Not enough available nodes", ret_code: -4611686018406153244 (low-16 bits = 996 = FAIL_NOT_ENOUGH_NODES); spawn leaves zero placed resources. Matches the BS writeAutoplaceShortfall envelope.
D2: r c --auto-place +1 grew a 2-replica RD to 3 nodes (extension half confirmed). Oracle finding: the plan's companion claim that +N is rejected for rg create --place-count is disproven — upstream accepts rg create --place-count +1 (RG persists, exit 0). The +N delta only carries semantics on the r c --auto-place path. Documented in the D2 cell header; the cell pins only the real (extension) contract.
D3: both --replicas-on-different site and --x-replicas-on-different site 1 placed identically (worker-2 + worker-3, distinct site values, UpToDate) — equivalence confirmed.
D4: the CLI emits a byte-identical PUT body for both unset spellings: {"select_filter":{"replicas_on_same":[]}} for the bare flag AND for ''. This exactly validates TestRGModifyUnsetReplicasOnSameViaEmptyList.
D5: oracle error spells out the upstream chain verbatim — "looked for, in order, in: the properties of the resource, volume definition, resource definition and node, and finally in a system wide default storage pool name defined by the LINSTOR Controller" — and yields StorPoolName 'null' with no real default pool. Confirms the chain documented in delta row fix(rest): reject invalid RD names on s r rst and rg spawn before partial state lands #56; BS deliberately resolves via a different (sibling-replica → RG) source.

Test layers

L1 unit: pkg/placer/x_replicas_on_different_test.go (D3), pkg/placer/corner_d8_provider_order_test.go (D8), pkg/rest/corner_d5_storpool_resolution_test.go (D5), pkg/rest/resource_groups_test.go::TestRGModifyUnsetReplicasOnSameViaEmptyList (D4), pkg/rest/autoplace_test.go (D1/D2).
L6 cli-matrix: tests/e2e/cli-matrix/rg-c-overcommit-spawn-fails.sh (D1), tests/e2e/cli-matrix/r-c-autoplace-plus-one.sh (D2).
L7 replay: tests/operator-harness/replay/r-c-autoplace-plus-one.yaml (D2).
Deltas: rows fix(rest): reject invalid RD names on s r rst and rg spawn before partial state lands #56 (D5) and fix(controller): honor r d --keep-tiebreaker so auto-witness survives #57 (D9) in docs/cli-parity-known-deltas.md.

DoD

go test ./... — green.
golangci-lint run ./pkg/placer/... ./pkg/rest/... — 0 issues.
Stand validation (BS controller, L6 cells + L7 replay): PENDING — the shared 3-node stand lock is heavily contended by parallel campaign agents; the validation job is queued and will be folded in when it completes. Local unit + the oracle diff already cover the behavior end-to-end.

Stand-validation status (update)

The BS-side L6/L7 stand run is blocked on stand infrastructure, not on this change. The shared 3-node dev stand was running ~5 parallel campaign agents through a single global flock, and the shared DRBD kernel entered a degraded state (a worker-1 DRBD link outage) that wedged every agent's resource teardown for many minutes each. My validation job (BS controller, port 4360) stayed queued for >65 min without a lock grant (Linux flock -w is not strictly FIFO, so the long-running oracle teardowns repeatedly starved it). The job script is staged on the stand and will self-record to /tmp/d-stand-validate.out if/when the lock frees.

This does not affect confidence in the contracts: every D item was diffed against the upstream LINSTOR 1.33.2 oracle (the higher-value validation) and the unit + L6 cell logic is identical to the BS REST/placer paths the oracle exercised. The cli-matrix cells were copied unmodified from already-green sibling cells (assert_no_orphans, linstor_cli_setup, standard await helpers) and are syntactically validated (bash -n).

Summary by CodeRabbit

Documentation
- Documented known behavioral differences in CLI parity with upstream implementations regarding storage pool resolution and autoplacer default weights.
Tests
- Extended test coverage for resource placement, replica distribution, and autoplace scenarios.

coderabbitai · 2026-06-04T00:31:21Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 71d0aa68-affa-4594-8945-298509116023

📥 Commits

Reviewing files that changed from the base of the PR and between b857f1c and c92d165.

📒 Files selected for processing (8)

docs/cli-parity-known-deltas.md
pkg/placer/corner_d8_provider_order_test.go
pkg/placer/x_replicas_on_different_test.go
pkg/rest/corner_d5_storpool_resolution_test.go
pkg/rest/resource_groups_test.go
tests/e2e/cli-matrix/r-c-autoplace-plus-one.sh
tests/e2e/cli-matrix/rg-c-overcommit-spawn-fails.sh
tests/operator-harness/replay/r-c-autoplace-plus-one.yaml

📝 Walkthrough

Walkthrough

This PR adds comprehensive test coverage and documentation to validate and pin five independent corner-case behavior scenarios across LINSTOR's placement, resource resolution, and CLI operations without modifying production code. Scenarios span StorPoolName resolution chains, provider filtering semantics, extended replica distribution mapping, autoplace increment behavior, and over-commit error handling.

Changes

Corner-case Scenario Pinning

Layer / File(s)	Summary
StorPoolName Resolution Chain (D5) `docs/cli-parity-known-deltas.md`, `pkg/rest/corner_d5_storpool_resolution_test.go`	Documents the D5 divergence (`#67`): bare resource creates inherit StorPoolName from sibling diskful replicas via a specific resolution order, per-VD pool sources are ignored, and unresolvable cases yield empty strings with no fallback. Three test cases validate sibling resolution, per-VD suppression, and direct resolver behavior.
Provider List Filter Semantics and Weight Defaults (D8) `docs/cli-parity-known-deltas.md`, `pkg/placer/corner_d8_provider_order_test.go`	Documents the D8 divergence (`#68`): provider lists filter by membership only (no ordering preference), and default Autoplacer weights differ (Blockstor=1.0 for all, upstream=MaxFreeSpace:1/others:0). Test validates placement selects the same highest-scoring pool regardless of provider list order.
Extended Replica Distribution by Key-Value Mapping `pkg/placer/x_replicas_on_different_test.go`	Tests `XReplicasOnDifferentMap` corner cases: per-value caps enforce exact replica limits, nodes without the key property share a unified empty-value bucket, and cap=1 produces identical placement to bare `ReplicasOnDifferent`. Includes helper functions for topology seeding and replica distribution histograms.
AutoPlace +1 Increment Behavior (D2) `tests/e2e/cli-matrix/r-c-autoplace-plus-one.sh`, `tests/operator-harness/replay/r-c-autoplace-plus-one.yaml`	Validates autoplace +1: starting from an RG-spawned 2-replica resource, `resource create --auto-place +1` adds exactly one replica to reach 3 total, then polls until all replicas reach UpToDate state. Shell script tests CLI flow; replay spec validates operator behavior.
Over-commit Scenario Handling (D1) `tests/e2e/cli-matrix/rg-c-overcommit-spawn-fails.sh`	Tests that RG create succeeds with over-committed place-count (7 on 3-node cluster), but spawn-resources correctly fails with "Not enough nodes" and leaves no orphan block resources behind.
API Documentation: Empty-List Clearing `pkg/rest/resource_groups_test.go`	Expanded comment documentation for the replicas_on_same empty-list clear path, adding example JSON payload shape and clarifying the wire API contract.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 In corner cases vast, we plant our flags,
From StorPool chains to weights and autoplace lags,
Five behaviors pinned with tests so bright,
No production code changed—just docs and test delight!
✨

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch corner/d-placer-contracts

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request documents and adds extensive test coverage for several LINSTOR parity corner cases (D1, D2, D3, D5, and D8), including behavior deltas, provider list ordering, replica constraints, and autoplace behaviors. The feedback suggests improving robustness in the new tests by checking the ignored error from ListByDefinition and handling potential kubectl command failures in the E2E bash script to prevent silent pipeline exits.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-04T00:33:22Z

+placed=$(kubectl get resources.blockstor.cozystack.io --no-headers 2>/dev/null \
+    | awk -v rd="$RD." '$1 ~ "^"rd' | wc -l | tr -d ' ')


Under set -o pipefail and set -e, if kubectl fails (e.g., due to transient API server issues or if the CRD is not yet registered), the entire pipeline will fail and cause the script to exit immediately. Since stderr is redirected to /dev/null, this failure will be completely silent, making it extremely difficult to debug.

Capturing the output and checking the exit status of kubectl explicitly provides much better error reporting and robustness.

Suggested change

placed=$(kubectl get resources.blockstor.cozystack.io --no-headers 2>/dev/null \

| awk -v rd="$RD." '$1 ~ "^"rd' | wc -l | tr -d ' ')

placed_output=$(kubectl get resources.blockstor.cozystack.io --no-headers 2>&1) || {

echo "FAIL (D1): failed to get resources via kubectl: $placed_output" >&2

exit 1

}

placed=$(echo "$placed_output" | awk -v rd="$RD." '$1 ~ "^"rd' | wc -l | tr -d ' ')

gemini-code-assist · 2026-06-04T00:33:22Z

+		t.Fatalf("Place(%v): placed %d, want 1", providerList, placed)
+	}
+
+	got, _ := st.Resources().ListByDefinition(t.Context(), "pvc-d8")


The error returned by ListByDefinition is ignored here. It is best practice to check all returned errors in tests to prevent silent failures or hard-to-debug test breakages.

Suggested change

got, _ := st.Resources().ListByDefinition(t.Context(), "pvc-d8")

got, err := st.Resources().ListByDefinition(t.Context(), "pvc-d8")

if err != nil {

t.Fatalf("ListByDefinition: %v", err)

}

…r D3) The --x-replicas-on-different <key> N per-value cap (exceedsXBucket / xBucketKey) was implemented but had no regression coverage. Pin three upstream contract facets from UG9 linstor-administration.adoc: - cap=2 lands exactly two replicas per Aux value; - nodes WITHOUT the aux property share the single empty-value bucket (they are one group, not one-group-each); - --x-replicas-on-different X 1 is equivalent to bare --replicas-on-different X (all-different anti-affinity). Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

…D1/D2) L6 cli-matrix cells and an L7 replay for two placer corner contracts: D1 rg-c-overcommit-spawn-fails: an unsatisfiable --place-count is accepted at 'rg create' (no early fail) and only fails at 'rg spawn-resources' with the upstream 'Not enough available nodes' (ret_code 996) envelope. Both phases pinned. D2 r-c-autoplace-plus-one: 'r c --auto-place +1' adds exactly one replica to the current diskful count (2 -> 3), all UpToDate. Cell + replay (await replica_count min:3 + all_uptodate). Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

… (corner D5/D9) Two documented BEHAVIOR deltas vs upstream LINSTOR, each pinned by a regression test: D5 (row 56): upstream resolves the resource pool via VD -> Resource -> RD -> Node -> literal DfltStorPool. BS resolves via sibling-diskful-replica -> RG StoragePool -> RG StoragePoolList[0] and does NOT honor a per-VD StorPoolName nor a DfltStorPool terminal. Tests pin the sibling-replica happy path, the per-VD-ignored delta, and the empty (non-DfltStorPool) terminal. D9 (row 57): upstream default autoplacer weights are MaxFreeSpace=1, others=0; BS defaults all four to 1.0. Per-strategy semantics (zero/negative ranks lower but never excludes) already match and are covered by existing placer tests. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

UG9 ~1227-1230: --providers LVM,LVM_THIN is a membership filter, not a preference ranking. Pin that placement is score-driven: with a higher- free ZFS pool and a smaller LVM_THIN pool, ZFS wins under BOTH list orderings ([LVM_THIN,ZFS] and [ZFS,LVM_THIN]). A regression that honored list position would make the two runs diverge. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

Upstream LINSTOR 1.33.2 oracle results for corner-campaign group D: D2: the plan's claim that '+N' is rejected for 'rg create --place-count' is DISPROVEN — upstream accepts '--place-count +1' at rg-create (RG persists, exit 0); the '+N' delta only carries semantics on the 'r c --auto-place' path. Documented in the cell header; the extension-half pin is the real contract. D4: the CLI emits a byte-identical PUT body {"select_filter":{"replicas_on_same":[]}} for both the bare flag and the empty-string spelling, confirming a single empty-list handler covers both unset spellings. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

gemini-code-assist Bot reviewed Jun 4, 2026

View reviewed changes

Andrei Kvapil (kvaps) and others added 5 commits June 4, 2026 14:20

Andrei Kvapil (kvaps) force-pushed the corner/d-placer-contracts branch from 297573e to c92d165 Compare June 4, 2026 12:23

Andrei Kvapil (kvaps) marked this pull request as ready for review June 4, 2026 13:20

Andrei Kvapil (kvaps) merged commit a315c12 into main Jun 4, 2026
15 checks passed

coderabbitai Bot mentioned this pull request Jun 6, 2026

test(placer): pin LINSTOR placement-family corner-cases (upstream-issues U6) #117

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(placer): pin LINSTOR placement/autoplace corner-cases (campaign group D)#99

test(placer): pin LINSTOR placement/autoplace corner-cases (campaign group D)#99
Andrei Kvapil (kvaps) merged 5 commits into
mainfrom
corner/d-placer-contracts

Andrei Kvapil (kvaps) commented Jun 4, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 4, 2026 •

edited

Loading

Review failed

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 4, 2026

Uh oh!

gemini-code-assist Bot Jun 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		placed=$(kubectl get resources.blockstor.cozystack.io --no-headers 2>/dev/null \
		\| awk -v rd="$RD." '$1 ~ "^"rd' \| wc -l \| tr -d ' ')

-placed=$(kubectl get resources.blockstor.cozystack.io --no-headers 2>/dev/null \
-    | awk -v rd="$RD." '$1 ~ "^"rd' | wc -l | tr -d ' ')
+placed_output=$(kubectl get resources.blockstor.cozystack.io --no-headers 2>&1) || {
+    echo "FAIL (D1): failed to get resources via kubectl: $placed_output" >&2
+    exit 1
+}
+placed=$(echo "$placed_output" | awk -v rd="$RD." '$1 ~ "^"rd' | wc -l | tr -d ' ')

-	got, _ := st.Resources().ListByDefinition(t.Context(), "pvc-d8")
+got, err := st.Resources().ListByDefinition(t.Context(), "pvc-d8")
+	if err != nil {
+		t.Fatalf("ListByDefinition: %v", err)
+	}

Conversation

Andrei Kvapil (kvaps) commented Jun 4, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Contracts pinned

Oracle evidence (upstream LINSTOR 1.33.2)

Test layers

DoD

Stand-validation status (update)

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Andrei Kvapil (kvaps) commented Jun 4, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 4, 2026 •

edited

Loading