Skip to content

test(placer): pin LINSTOR placement-family corner-cases (upstream-issues U6)#117

Merged
Andrei Kvapil (kvaps) merged 4 commits into
mainfrom
issues/u6-placement-family
Jun 6, 2026
Merged

test(placer): pin LINSTOR placement-family corner-cases (upstream-issues U6)#117
Andrei Kvapil (kvaps) merged 4 commits into
mainfrom
issues/u6-placement-family

Conversation

@kvaps
Copy link
Copy Markdown
Member

@kvaps Andrei Kvapil (kvaps) commented Jun 6, 2026

Summary

Mines and pins the LINSTOR placement family of corner-cases from real upstream LINBIT/linstor-server user reports (campaign-2, agent U6), clean-room. Re-examines one previously-recorded deliberate delta and adds regression coverage at L1 + L6 + L7 for the placement/autoplace surface.

All seven scoped items resolve to MATCHES (no production-code change needed) — the placer already behaves correctly; this PR is the missing test coverage plus a documented re-examination verdict.

Verdict table

Item Topic Verdict Evidence
U89 / corner-D9 free-space under-weighting / hot-spotting MATCHES (delta re-examined, retained) composite() scores MaxFreeSpace = Free/Total — the only discriminator between identical pools, so the emptier pool wins the first placement; MinRscCount then spreads. No hot-spotting. known-deltas row 72 updated.
U88 / U113 constraint honored on extension MATCHES anti-affinity seeded from existing replicas (newStatetopologySeen); --auto-place +1 --replicas-on-different lands on a different site. Validated on stand (L7 PASS).
U139 / U94 "autoplaced on 0 nodes" must error MATCHES contradictory / unsatisfiable constraints under-place (placed < want) → REST 409 FAIL_NOT_ENOUGH_NODES (996), never success-on-zero. Validated on stand (L6 + L7 PASS, 996 envelope captured).
U253 rg modify --place-count UP converges MATCHES (already pinned) rgNeedsRebalance stamps rebalance-pending on increase; RGRebalanceReconciler refills. Existing TestRGRebalanceReconcilerSpawnsAdditionalReplicas + TestRGModifyStampsRebalanceAnnotationOnPlaceCountIncrease.
U201 place-count DOWN below current MATCHES (already pinned) scale-down is a clean no-op: rgNeedsRebalance returns false, placer is strictly additive (never trims, never throws). Existing TestRGRebalanceReconcilerIsAdditiveOnly + TestRGModifyNoAnnotationOnPlaceCountDecrease.
U467 rebalance × eviction MATCHES (covered) additive rebalance + eviction exclusion (disabledNodes) converge without flapping; covered by existing rebalance/eviction tests. A 3-worker replay variant was scoped but the existing auto-diskful-evicted-node coverage + additive-only invariant already pin the convergence; no new gap found.
U83 / U21 pool-pinned / diskless-pool autoplace MATCHES -s <pool> is a hard filter (matchesPoolFilter) — replicas land only on pool-hosting nodes even when a wrong-pool node has more free space. Validated on stand (L6 PASS). Diskless-only pool pin yields zero diskful candidates (oracle note below).

Tests added

L1

  • pkg/placer/issues_u6_placement_family_test.go — U89 ceteris-paribus emptier-wins + spread-not-hotspot, U88 constraint-on-extension, U139 contradictory + unsatisfiable under-place, U83 pool-pin.
  • pkg/rest/autoplace_issues_u6_test.go — U139 REST 409 FAIL_NOT_ENOUGH_NODES envelope (contradictory + unsatisfiable node-list).

L6 tests/e2e/cli-matrix/r-c-autoplace-pool-pinned.sh — U83 pool-pin honored + U139 contradictory-constraint fail-short.

L7

  • tests/operator-harness/replay/autoplace-contradictory-rejected.yaml
  • tests/operator-harness/replay/autoplace-constraint-on-extension.yaml

Docs — known-deltas row 72 (D9) updated with the U89 re-examination verdict.

Stand validation (dev stand, BS v0.1.10 baseline apiserver)

POOL-PINNED-CELL (U83 + U139):                   PASS
REPLAY autoplace-contradictory-rejected:         PASS
REPLAY autoplace-constraint-on-extension:        PASS

r-full-lifecycle (placer-surface gate) reaches Phases 1–5 green on lvm-thin (autoplace, physical-delete + re-create, relocate, diskless create — the entire placer-selection surface). It does NOT reach full-green because of a baseline stand DRBD-resync condition independent of this diff:

Both are toggle-disk / DRBD-resync convergence on the stand kernel, not placement, and reproduce against baseline production code (this PR changes zero production code — it is test-only). A fresh lvm-thin 2R autoplace reached UpToDate,UpToDate in ~8s during diagnosis, confirming the placer/selection path is healthy.

Notes

  • Test-only PR — no pkg//internal/ production code touched.
  • CLI arg-order: --replicas-on-same/--replicas-on-different take argparse nargs='*' and greedily eat the trailing positional RD name; the cell/replays keep a self-contained --storage-pool=<pool> as the last flag before the RD name (same class as the documented --storage-pool space-form quirk).

Oracle / open questions

  • Diskless-only-pool autoplace (--auto-place 1 -s DfltDisklessStorPool): BS's matchesPoolFilter drops the DISKLESS kind, so a pure diskless-pool pin yields zero diskful candidates. Upstream semantics for a diskless-pool-pinned autoplace (does it create a diskless replica, or fail?) were not oracle-diffed in this pass — flagged for a follow-up oracle comparison.

Summary by CodeRabbit

  • Documentation

    • Clarified autoplacer weight default behavior documentation, confirming equal-weight default is intentional.
  • Tests

    • Added comprehensive test coverage for autoplace functionality including pool-pinned placements, contradictory constraint handling, zone-based replica constraints, and edge case validation across unit, REST, CLI, and operator scenarios.

Andrei Kvapil (kvaps) and others added 4 commits June 6, 2026 05:52
…t-on-extension, no success-on-zero

Re-examines and pins the LINSTOR placement corner-cases mined from real
upstream user reports (campaign-2):

- U89 / corner-D9: re-exam of BS's equal-weight autoplacer default vs
  upstream MaxFreeSpace-dominant. Verdict: the equal-weight scorer does
  NOT hot-spot — MaxFreeSpace is the only discriminator between
  identical pools so the emptier pool wins ceteris paribus, and
  MinRscCount spreads subsequent replicas. Delta retained as deliberate;
  known-deltas row 72 records the conclusion.
- U88 / U113: replicas-on-different honored on an EXTENSION of an
  existing resource (anti-affinity seeded from existing replicas).
- U139 / U94: contradictory / unsatisfiable constraints must under-place
  (placed < want) and surface a 409 FAIL_NOT_ENOUGH_NODES, never a
  success-on-zero-nodes. Pinned at both the placer and REST layers.
- U83 / U21: pool-pinned autoplace (-s <pool>) lands only on nodes
  hosting that pool, even when a wrong-pool node has more free space.

L1 only in this commit; L6/L7 artifacts follow.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…-constraint reject, constraint-on-extension

L6 cli-matrix cell r-c-autoplace-pool-pinned.sh:
  - U83/U21: --auto-place -s <pool> lands replicas only on nodes hosting
    that pool (hard filter, not soft preference).
  - U139/U94: contradictory --replicas-on-same X --replicas-on-different X
    fails short with FAIL_NOT_ENOUGH_NODES (996), never success-on-zero.

L7 replay YAMLs:
  - autoplace-contradictory-constraints-rejected.yaml: operator-CLI repro
    of the upstream "successfully autoplaced on 0 nodes" complaint; the
    autoplace exits 10 and no diskful replicas leak.
  - autoplace-constraint-on-extension.yaml: --auto-place +1
    --replicas-on-different site honors the constraint against the
    EXISTING replica, converging to 2 UpToDate replicas across sites.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…unting

- pool-pinned cell: assert on DISKFUL count (auto-quorum adds a DISKLESS
  TIE_BREAKER witness so total resources = 3 for an auto-place=2 RD).
- contradictory + constraint-on-extension replays + cell: reorder so the
  self-contained --storage-pool=<pool> flag is the LAST token before the
  RD positional. python-linstor's --replicas-on-same/--replicas-on-different
  take nargs='*' and otherwise greedily consume the trailing RD name
  (the same class as the documented --storage-pool space-form quirk).

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…fier limit

The runner derives the resource name as replay-<workflow-name>-<rand>;
the old name pushed it to 56 chars, past the LINSTOR 48-char RD-name
limit (rd create -> exit 10 FAIL_INVLD_RSC_DFN_NAME). Rename the
workflow autoplace-contradictory-constraints-rejected ->
autoplace-contradictory-rejected so the generated name fits.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 6, 2026

Need an answer fast? Review this PR in Change Stack to ask focused questions about the PR or a changed range.

Review Change Stack

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a3bddd68-0191-4199-9111-0b2892d59ea5

📥 Commits

Reviewing files that changed from the base of the PR and between e070bd4 and e1e0f04.

📒 Files selected for processing (6)
  • docs/cli-parity-known-deltas.md
  • pkg/placer/issues_u6_placement_family_test.go
  • pkg/rest/autoplace_issues_u6_test.go
  • tests/e2e/cli-matrix/r-c-autoplace-pool-pinned.sh
  • tests/operator-harness/replay/autoplace-constraint-on-extension.yaml
  • tests/operator-harness/replay/autoplace-contradictory-rejected.yaml

📝 Walkthrough

Walkthrough

This PR adds comprehensive test coverage for autoplace constraint validation across the test pyramid: documentation justification, unit tests, REST API tests, E2E CLI tests, and operator harness replay scenarios. Tests validate pool-pinned placement, empty-pool preference, zone constraints, and contradictory constraint failure semantics.

Changes

Autoplace constraint validation tests

Layer / File(s) Summary
Autoplacer weight behavior justification
docs/cli-parity-known-deltas.md
Extended CLI parity delta justification for Autoplacer/Weights/* to include post-U89 re-examination confirming equal-weight default behavior is intentionally retained rather than changed to free-space-dominant.
Placement family unit tests
pkg/placer/issues_u6_placement_family_test.go
Introduced seedPoolWithFree helper and six test cases: U89/D9 tests verify empty-pool preference and no hot-spotting across multiple placements; U88 test verifies ReplicasOnDifferent zone constraint honored on extension; U139 tests verify contradictory constraints under-place and unsatisfiable node lists do not silently succeed; U83 test verifies pool-pinned autoplace lands only on pool-hosting nodes.
REST API constraint validation tests
pkg/rest/autoplace_issues_u6_test.go
Added HTTP-layer tests: TestU139ContradictoryConstraintsReturns409 validates 409 Conflict response with FAIL_NOT_ENOUGH_NODES (996) sub-code; TestU139UnsatisfiableNodeListReturns409 validates 409 Conflict for unsatisfiable node lists, both confirming no silent partial-success behavior.
E2E CLI pool-pinned and constraint tests
tests/e2e/cli-matrix/r-c-autoplace-pool-pinned.sh
End-to-end validation: U83 test discovers pool-hosting nodes and verifies replicas land only on nodes hosting the pinned storage pool; U139 test stamps distinct site labels and verifies contradictory constraints fail with "Not enough nodes" message without partial success.
Operator harness replay tests
tests/operator-harness/replay/autoplace-constraint-on-extension.yaml, tests/operator-harness/replay/autoplace-contradictory-rejected.yaml
Replay scenarios: first validates U88 ReplicasOnDifferent site constraint honored when extending resource with additional replica; second validates U139 contradictory same/different site constraints rejected with exit code 10 (FAIL_NOT_ENOUGH_NODES) without orphan resources.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • cozystack/blockstor#99: Originally added the controller set-property Autoplacer/Weights/* CLI parity delta; this PR extends that delta's justification after re-examining upstream issue U89.

Poem

🐇 With tests now stacked from base to peak,
We validate constraints both strong and weak,
Empty pools preferred, zones held tight,
Contradictions fail with all due might—
Blockstor's placement stands the test,
From unit to E2E, simply blessed! 🌟

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch issues/u6-placement-family

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces comprehensive unit, REST, and E2E/replay tests to pin and verify placement-family behaviors and address several upstream issues (such as U83, U88, U89, and U139). These tests ensure that pool-pinned autoplacements land on correct nodes, contradictory constraints fail with a 409 conflict instead of a silent success-on-zero, and constraints are honored during resource extensions. Additionally, documentation in docs/cli-parity-known-deltas.md was updated to reflect findings on default weights. A review comment identified a potential panic in TestU83PoolPinnedAutoplaceLandsOnlyOnPoolNodes where t.Errorf is used instead of t.Fatalf when checking the length of a slice before indexing it.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +417 to +419
if len(got) != 1 || got[0].NodeName != "n-has" {
t.Errorf("pool-pinned autoplace must land only on the pool's node: got %+v, want n-has/fast", got)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using t.Errorf here allows the test execution to continue even if len(got) is 0. This will cause a panic on line 421 when attempting to access got[0].Props because got is empty. Changing this to t.Fatalf will safely halt the test execution if the assertions fail.

Suggested change
if len(got) != 1 || got[0].NodeName != "n-has" {
t.Errorf("pool-pinned autoplace must land only on the pool's node: got %+v, want n-has/fast", got)
}
if len(got) != 1 || got[0].NodeName != "n-has" {
t.Fatalf("pool-pinned autoplace must land only on the pool's node: got %+v, want n-has/fast", got)
}

@kvaps Andrei Kvapil (kvaps) marked this pull request as ready for review June 6, 2026 04:53
@kvaps Andrei Kvapil (kvaps) merged commit 832f00e into main Jun 6, 2026
19 of 22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant