fix(rest): gate capacity on the real linstor-csi single-node create path (issue #45) by kvaps · Pull Request #51 · cozystack/blockstor

Andrei Kvapil (kvaps) · 2026-05-31T12:29:21Z

Summary

Closes #45 by gating capacity on the actual endpoint linstor-csi hits when a StorageClass sets nodeList + placementCount=1. PR #47 gated /autoplace and the single-node alias's bulk variant, but the real CSI request lands on POST /v1/resource-definitions/{rd}/resources/{node} via golinstor's Resources.Create, which routes to handleResourceCreateOnNode with no capacity check.

Phase 1 — empirical endpoint capture on stand

Apiserver access lines from a CreateVolume against linstor.csi.linbit.com/storagePool: lvm-thin, placementCount=1, nodeList=dev-worker-1:

POST /v1/resource-groups                                              201
POST /v1/resource-groups/sc-…/volume-groups                           201
PUT  /v1/resource-groups/sc-…                                         200
POST /v1/resource-definitions                                         201
POST /v1/resource-definitions/pvc-…/volume-definitions                200
POST /v1/resource-definitions/pvc-…/resources/dev-worker-1            201   <-- gate target
PUT  /v1/resource-definitions/pvc-…                                   200

No /spawn call. linstor-csi's manual scheduler (selected because nodeList flips PlacementPolicy = Manual in volume.Parameters) calls Resources.Create per node, which golinstor maps to the single-node alias above.
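To make the route distinction concrete, here is a minimal sketch of how the three placement-related POST paths differ. It is not blockstor's router: `createRoute` and `classifyCreatePath` are hypothetical names, `pvc-example` is a placeholder RD name, and the sketch assumes the autoplace route sits at `/v1/resource-definitions/{rd}/autoplace`, as in upstream LINSTOR's REST API.

```go
package main

import (
	"fmt"
	"strings"
)

// createRoute is a hypothetical label used only in this sketch.
type createRoute string

const (
	routeAutoplace  createRoute = "autoplace"
	routeBulkCreate createRoute = "bulk-create"
	routeNodeCreate createRoute = "single-node-create"
	routeOther      createRoute = "other"
)

// classifyCreatePath maps a POST path under /v1/resource-definitions to one
// of the routes above and returns the node name for the single-node alias.
func classifyCreatePath(path string) (createRoute, string) {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	// Expected prefix: v1 / resource-definitions / {rd} / ...
	if len(parts) < 4 || parts[0] != "v1" || parts[1] != "resource-definitions" {
		return routeOther, ""
	}
	switch {
	case len(parts) == 4 && parts[3] == "autoplace":
		return routeAutoplace, ""
	case len(parts) == 4 && parts[3] == "resources":
		return routeBulkCreate, ""
	case len(parts) == 5 && parts[3] == "resources":
		return routeNodeCreate, parts[4]
	}
	return routeOther, ""
}

func main() {
	// "pvc-example" is a placeholder RD name, not one taken from the capture.
	route, node := classifyCreatePath("/v1/resource-definitions/pvc-example/resources/dev-worker-1")
	fmt.Println(route, node)
}
```

A gate attached only to the autoplace route never sees a request classified as `single-node-create`, which is the gap the capture above exposes.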

Phase 2 — fix

Inline per-pool gate added in createOneResource (shared by both the bulk endpoint and the single-node alias):

Resolves the target pool via a 4-tier fallback (Props["StorPoolName"] → RD.Props["StorPoolName"] → RG.SelectFilter.StoragePool → RG.SelectFilter.StoragePoolList[0]). Tier 4 is new — linstor-csi's ToResourceGroupModify writes the SC pool into StoragePoolList, which the existing fallback chain doesn't read.
Reads pool.FreeCapacity directly (not computeSizeInfo). The cluster-wide MaxVlmSizeInKib aggregation would mask a full target pool behind sibling pools on other nodes — a 13 GiB lvm-thin on worker-1 at 100% used while worker-2's is empty must refuse r c worker-1 <rd> even though the cluster cap remains 13 GiB.
Skips for DISKLESS / TIE_BREAKER (no backing storage), unresolved pool (diskless fallback), and definitions-only creates (no VDs).
Returns 409 with the same RetCode bits + envelope shape that PR #47 (fix(rest): gate autoplace on cluster MaxVolumeSize) uses on /autoplace, so operators can classify both paths with one rule.
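The 4-tier fallback described above can be sketched as a pure function. This is an illustration under stated assumptions, not the PR's `resolveGatePoolName`: the `selectFilter` struct and the `resolveGatePool` signature are simplified stand-ins invented for this example. Only the tier order and the `StorPoolName` key come from the description.

```go
package main

import "fmt"

// selectFilter is a simplified stand-in for the resource group's select
// filter; only the two fields the fallback reads are modelled.
type selectFilter struct {
	StoragePool     string
	StoragePoolList []string
}

// resolveGatePool walks the four tiers in order and reports whether any
// of them named a pool. An unresolved pool means the gate is skipped.
func resolveGatePool(resProps, rdProps map[string]string, rg *selectFilter) (string, bool) {
	// Tier 1: explicit property on the resource being created.
	if p := resProps["StorPoolName"]; p != "" {
		return p, true
	}
	// Tier 2: property inherited from the resource definition.
	if p := rdProps["StorPoolName"]; p != "" {
		return p, true
	}
	if rg != nil {
		// Tier 3: singular pool on the resource group's select filter.
		if rg.StoragePool != "" {
			return rg.StoragePool, true
		}
		// Tier 4: first entry of the pool list, the shape linstor-csi writes.
		if len(rg.StoragePoolList) > 0 && rg.StoragePoolList[0] != "" {
			return rg.StoragePoolList[0], true
		}
	}
	return "", false
}

func main() {
	// CSI wire shape: empty request props, pool only in StoragePoolList.
	csi := &selectFilter{StoragePoolList: []string{"lvm-thin"}}
	name, ok := resolveGatePool(nil, nil, csi)
	fmt.Println(name, ok)
}
```

Without tier 4 the CSI-shaped request resolves to no pool, so the gate would take its unresolved-pool skip and never compare capacity.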

pkg/placer is NOT touched. A previous attempt exported placer.MatchesPoolFilter and that intercepted the migrate flow, regressing node-replace-hardware (the bundle was reverted in PR #47 final force-push). The migrate flow goes through Resource.Spec.Nodes reconciliation in the controller, not through this REST handler.

6 new unit tests in resource_create_issue_45_test.go: reject-on-full / allow-on-fit / skip-diskless / bulk-endpoint / RG-StoragePoolList-fallback / skip-no-VDs.
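The test cases listed above read naturally as a small decision table. The sketch below is a hypothetical reduction of the gate to one function: `gateInput`, `verdict`, and `decide` are names invented for this example, and treating "required equals free" as a fit is an assumption the PR text does not confirm.

```go
package main

import "fmt"

// gateInput collects what the gate needs to know about one create request.
type gateInput struct {
	Flags       []string // resource flags, e.g. "DISKLESS" or "TIE_BREAKER"
	PoolFound   bool     // false when no fallback tier named a pool
	RequiredKib int64    // 0 when the RD has no volume definitions yet
	FreeKib     int64    // free capacity of the exact (node, pool) target
}

type verdict string

const (
	verdictAllow  verdict = "allow"
	verdictReject verdict = "reject" // surfaces as a 409 in the real handler
	verdictSkip   verdict = "skip"   // gate does not apply
)

// decide applies the skip rules first, then the capacity comparison.
func decide(in gateInput) verdict {
	for _, f := range in.Flags {
		if f == "DISKLESS" || f == "TIE_BREAKER" {
			return verdictSkip // no backing storage to size against
		}
	}
	if !in.PoolFound || in.RequiredKib == 0 {
		return verdictSkip // unresolved pool or definitions-only create
	}
	if in.RequiredKib > in.FreeKib {
		return verdictReject
	}
	return verdictAllow
}

func main() {
	full := gateInput{PoolFound: true, RequiredKib: 1 << 20, FreeKib: 0}
	fmt.Println(decide(full))
}
```

Ordering the skip rules before the comparison keeps diskless and tie-breaker replicas creatable even when every pool on the node is full.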

Phase 3 — stand validation

observability-capacity-correlation on dev stand (lvm-thin, worker-1, 13 GiB total filled to 0 KiB free):

Level 1 (PVC pending + event)     : OK (capacity-keyword)
Level 2a (sp list free <100 MiB)  : free=0 KiB
Level 2b (autoplace rejection)    : OK
Level 3 (lvm-thin view)           : free=0 KiB
>> OBSERVABILITY-CAPACITY-CORRELATION OK

observability-capacity-correlation re-enabled in the e2e-piraeus job. node-replace-hardware stays excluded — its failure (SP stand.dev-worker-3 already exists after linstor n d --lost) is in the satellite/controller path (PR #48 follow-up), unrelated to this REST gate.

Test plan

go test ./pkg/rest/... — full suite green (58s)
golangci-lint run ./pkg/rest/... — 0 issues
pkg/placer/... tests untouched and green
Stand validation: observability-capacity-correlation PASSES on dev stand
Stand validation confirmed: the node-replace-hardware failure is the pre-existing PR #48 (fix(rest): refuse duplicate SP POST; strip internal annotations (bughunt v0.1.3 P1)) regression on storage-pool re-registration, NOT caused by this gate

Refs #45

Summary by CodeRabbit

New Features
- Storage pool capacity is now validated during resource creation. When a pool is full, requests are rejected with a structured error response, preventing resource persistence.
Tests
- Added comprehensive test suite verifying pool capacity validation behavior across various scenarios.
Chores
- Updated CI/E2E test workflows to enable additional observability test scenarios.

…ath (#45)

linstor-csi's `manual` scheduler (selected when a StorageClass sets `nodeList` + `placementCount=1`) fires `POST /v1/resource-definitions/{rd}/resources/{node}` via golinstor's `Resources.Create`, NOT `/autoplace`. The PR #47 capacity gate on `/autoplace` therefore never saw this traffic, and a CreateVolume against a now-full pool placed the replica anyway: the PVC reached Bound immediately and only failed later at satellite-side LV allocation.

Per Phase 1 capture on the dev stand the CSI flow is:

    POST /v1/resource-groups
    POST /v1/resource-groups/{rg}/volume-groups
    PUT  /v1/resource-groups/{rg}
    POST /v1/resource-definitions
    POST /v1/resource-definitions/{rd}/volume-definitions
    POST /v1/resource-definitions/{rd}/resources/<worker>   <-- here
    PUT  /v1/resource-definitions/{rd}

The fix adds an inline per-pool capacity check in `createOneResource`, shared by both the bulk `POST /v1/resource-definitions/{rd}/resources` and the single-node alias `POST /v1/resource-definitions/{rd}/resources/{node}`. The gate:

- Resolves the target pool from a four-tier fallback chain so the CSI wire shape (empty body, RG.SelectFilter.StoragePoolList=[<p>]) is honoured: explicit Props["StorPoolName"] → RD.Props → RG single StoragePool → RG StoragePoolList[0].
- Reads pool.FreeCapacity directly (NOT computeSizeInfo): this code path knows the EXACT (node, pool) target, and the cluster-wide MaxVlmSizeInKib aggregation would mask a full target pool behind sibling pools on other nodes. A 13 GiB lvm-thin on worker-1 at 100% while worker-2's lvm-thin is empty MUST refuse `r c worker-1 <rd>` even though the cluster cap remains 13 GiB.
- Skips for DISKLESS / TIE_BREAKER (no backing storage), unresolved pool (diskless fallback), and definitions-only creates (no VDs to size against).
- Returns a 409 with the same RetCode bits + envelope shape PR #47 uses on /autoplace, so operators classify both paths with one rule.
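The masking problem in the second bullet is easy to reproduce with numbers. The following is a minimal sketch, not the PR's code: `pool`, `clusterMaxFreeKib`, and `targetFreeKib` are hypothetical, the first function is only a simplified stand-in for the cluster-wide `MaxVlmSizeInKib` aggregation, and the 1 GiB request size is an assumed value.

```go
package main

import "fmt"

// pool is a simplified stand-in for one storage pool on one node.
type pool struct {
	Node    string
	Name    string
	FreeKib int64
}

const gibInKib = int64(1024 * 1024)

// clusterMaxFreeKib returns the largest free capacity among all pools with
// the given name, regardless of node (the cluster-wide view).
func clusterMaxFreeKib(pools []pool, name string) int64 {
	var best int64
	for _, p := range pools {
		if p.Name == name && p.FreeKib > best {
			best = p.FreeKib
		}
	}
	return best
}

// targetFreeKib returns the free capacity of the exact (node, pool) pair.
func targetFreeKib(pools []pool, node, name string) (int64, bool) {
	for _, p := range pools {
		if p.Node == node && p.Name == name {
			return p.FreeKib, true
		}
	}
	return 0, false
}

func main() {
	pools := []pool{
		{Node: "worker-1", Name: "lvm-thin", FreeKib: 0},             // 100% used
		{Node: "worker-2", Name: "lvm-thin", FreeKib: 13 * gibInKib}, // empty
	}
	required := 1 * gibInKib // assumed request size for the example

	free, _ := targetFreeKib(pools, "worker-1", "lvm-thin")
	fmt.Println("cluster-wide check passes:", required <= clusterMaxFreeKib(pools, "lvm-thin"))
	fmt.Println("per-pool check passes:    ", required <= free)
}
```

The cluster-wide comparison passes because worker-2 still has room, while the per-pool comparison fails for worker-1. The second result is the one a create pinned to worker-1 has to respect.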
Why this does NOT touch `pkg/placer`: a previous attempt exported `placer.MatchesPoolFilter` for `pkg/rest` to reuse, but that intercepted the migrate flow and regressed `node-replace-hardware` on lane 5 (the bundle was reverted in PR #47 final force-push). The inline check here lives entirely in `pkg/rest/autoplace.go`. The migrate flow goes through `Resource.Spec.Nodes` reconciliation in the controller, not through the REST `POST resources/{node}` handler, so it's untouched.

Why a new wire-shape function `resolveGatePoolName` is needed: the existing `resolveStorPoolForFreshCreate` fallback chain reads `rg.SelectFilter.StoragePool` (singular) but linstor-csi's ToResourceGroupModify maps SC `storagePool: <p>` to `SelectFilter.StoragePoolList`. Pre-fix the gate skipped because neither `res.Props["StorPoolName"]` nor `rg.SelectFilter.StoragePool` matched a CSI request. The gate-local resolver adds tier 4 (`StoragePoolList[0]`) without changing the existing fallback that other call sites depend on.

Tests: 6 new unit tests in `resource_create_issue_45_test.go` pin the reject / allow / skip-diskless / bulk-endpoint / RG-list-fallback / no-VDs cases. Existing pkg/rest + pkg/placer suites stay green.

Stand validation on dev (lvm-thin, worker-1, 13 GiB total → filled to 0 KiB free):

    Level 1 (PVC pending + event)     : OK (capacity-keyword)
    Level 2a (sp list free <100 MiB)  : free=0 KiB
    Level 2b (autoplace rejection)    : OK
    Level 3 (lvm-thin view)           : free=0 KiB
    >> OBSERVABILITY-CAPACITY-CORRELATION OK

`observability-capacity-correlation` re-enabled in the e2e-piraeus job. `node-replace-hardware` stays excluded; its failure (`SP stand.dev-worker-3 already exists` after `linstor n d --lost`) is in the satellite/controller path, unrelated to this REST gate.

Refs #45

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

coderabbitai · 2026-05-31T12:29:28Z

Caution

Review failed

The pull request is closed.

📥 Commits

Reviewing files that changed from the base of the PR and between b14cc30 and d88e046.

📒 Files selected for processing (3)

.github/workflows/pull-request.yml
pkg/rest/autoplace.go
pkg/rest/resource_create_issue_45_test.go

Walkthrough

This PR implements Issue #45 capacity gating for resource creation by adding per-pool capacity validation to the per-node create endpoint. Requests are rejected when the target pool is full, diskless replicas skip the gate, and the behavior is covered by tests before the corresponding E2E scenario is re-enabled.

Changes

Issue #45 Capacity Gating for Per-Node Resource Creation

| Layer / File(s) | Summary |
|---|---|
| Capacity gate validation and helper functions: `pkg/rest/autoplace.go` | `validateResourceCreateShape` validates request node name and naming boundaries; `rejectResourceCreateIfPoolFull` gates resource creation when pool capacity is exhausted; `resolveGatePoolName` resolves the target pool name via Resource/ResourceDefinition/ResourceGroup property fallback chain; `sumRDVolumeDefinitionsKib` computes required capacity as the maximum volume definition size. |
| Resource creation endpoint integration: `pkg/rest/autoplace.go` | `createOneResource` calls the new validation helper and inserts the capacity gate before persisting per-node resources, returning structured 409 shortfall responses when the pool is full. |
| Comprehensive capacity-gating test suite: `pkg/rest/resource_create_issue_45_test.go` | Six integration tests verify the gate rejects full-pool requests with 409 and no persistence, allows sufficient-capacity requests with 201 and persistence, skips the gate for diskless resources, rejects bulk operations without persisting any resources, resolves pool names from resource group storage pool lists, and treats the gate as a no-op when resource definitions have no volume definitions. |
| E2E scenario re-enablement in CI: `.github/workflows/pull-request.yml` | The `observability-capacity-correlation` test scenario is re-enabled in the `e2e-piraeus` job (where piraeus LinstorCluster CRD is available) while remaining excluded from the standard 6-lane E2E matrix, with comments noting that the capacity gating on the single-node create path allows the "PVC-stays-Pending" assertion to pass. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

cozystack/blockstor#47: Both PRs implement Issue #45 capacity shortfall logic using the same structured 409 "autoplace shortfall" envelope; #47 (fix(rest): gate autoplace on cluster MaxVolumeSize) gates handleAutoplace, while the main PR adds the corresponding per-node resource-create gate (with shared capacity/volume-definition reasoning) in pkg/rest/autoplace.go.
cozystack/blockstor#15: Both PRs modify the GitHub Actions pull-request.yml e2e workflow to change how scenario subsets are selected/passed into make ci-e2e (main PR re-enables observability-capacity-correlation only for e2e-piraeus, while the other PR removes hardcoded SCENARIOS and relies on dynamic discovery).

gemini-code-assist

Code Review

This pull request addresses Issue #45 by introducing a per-pool capacity gate (rejectResourceCreateIfPoolFull) on the direct resource creation path, ensuring that volume creation fails fast if the target storage pool lacks sufficient capacity. It also refactors existing validation logic into validateResourceCreateShape and adds comprehensive unit tests. The reviewer recommended renaming sumRDVolumeDefinitionsKib to maxRDVolumeDefinitionsKib to accurately reflect that it calculates the maximum volume size rather than the sum, preventing potential confusion.

gemini-code-assist · 2026-05-31T12:30:20Z

+		return false
+	}
+
+	requiredKib, err := s.sumRDVolumeDefinitionsKib(r.Context(), rdName)


The function sumRDVolumeDefinitionsKib actually returns the maximum (largest) volume size rather than the sum of all volume sizes. To prevent confusion and potential future bugs where a true sum might be expected, rename this function to maxRDVolumeDefinitionsKib.

Suggested change

-	requiredKib, err := s.sumRDVolumeDefinitionsKib(r.Context(), rdName)
+	requiredKib, err := s.maxRDVolumeDefinitionsKib(r.Context(), rdName)

gemini-code-assist · 2026-05-31T12:30:20Z

+// sumRDVolumeDefinitionsKib returns the largest VolumeDefinition's
+// SizeKib on the named RD. Every volume of an RD provisions against
+// the same pool (upstream LINSTOR contract), so the per-pool
+// capacity gate must clear the biggest of them. Returns 0 — no
+// filter — when the RD has no VDs yet. Mirrors
+// `Placer.requiredKib` exactly so the gate semantics agree with the
+// placer's own per-pool check on the autoplace path.
+func (s *Server) sumRDVolumeDefinitionsKib(ctx context.Context, rdName string) (int64, error) {


Rename the function definition and update its documentation to match the suggested name maxRDVolumeDefinitionsKib to accurately reflect that it calculates the maximum volume size rather than the sum.

Suggested change

-// sumRDVolumeDefinitionsKib returns the largest VolumeDefinition's
-// SizeKib on the named RD. Every volume of an RD provisions against
-// the same pool (upstream LINSTOR contract), so the per-pool
-// capacity gate must clear the biggest of them. Returns 0 — no
-// filter — when the RD has no VDs yet. Mirrors
-// `Placer.requiredKib` exactly so the gate semantics agree with the
-// placer's own per-pool check on the autoplace path.
-func (s *Server) sumRDVolumeDefinitionsKib(ctx context.Context, rdName string) (int64, error) {
+// maxRDVolumeDefinitionsKib returns the largest VolumeDefinition's
+// SizeKib on the named RD. Every volume of an RD provisions against
+// the same pool (upstream LINSTOR contract), so the per-pool
+// capacity gate must clear the biggest of them. Returns 0 — no
+// filter — when the RD has no VDs yet. Mirrors
+// Placer.requiredKib exactly so the gate semantics agree with the
+// placer's own per-pool check on the autoplace path.
+func (s *Server) maxRDVolumeDefinitionsKib(ctx context.Context, rdName string) (int64, error) {
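The reviewer's naming concern comes down to the difference between two reductions over the same sizes. A small sketch, with hypothetical helper names `maxKib` and `sumKib` and made-up volume sizes, shows how far the two can diverge for an RD with more than one volume definition:

```go
package main

import "fmt"

// maxKib returns the largest size, or 0 for an empty slice. This is the
// behaviour the doc comment in the PR describes.
func maxKib(sizes []int64) int64 {
	var largest int64
	for _, s := range sizes {
		if s > largest {
			largest = s
		}
	}
	return largest
}

// sumKib returns the total of all sizes, which is what a reader might
// expect from a function whose name starts with "sum".
func sumKib(sizes []int64) int64 {
	var total int64
	for _, s := range sizes {
		total += s
	}
	return total
}

func main() {
	// Two illustrative volume definitions: 1 GiB and 4 GiB, in KiB.
	sizes := []int64{1 * 1024 * 1024, 4 * 1024 * 1024}
	fmt.Println("max:", maxKib(sizes))
	fmt.Println("sum:", sumKib(sizes))
}
```

With a single volume definition the two agree, which is likely why the mismatch between the name and the behaviour is easy to miss.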

The previous commit re-enabled this scenario by dropping it from E2E_EXCLUDE, but it requires piraeus's LinstorCluster CRD and that CRD is installed only in the e2e-piraeus job; lane 5 hit "FAIL: LinstorCluster CRD (piraeus.io) absent". The scenario already runs (and now passes with the fix on this branch) in the piraeus interop job; this commit just stops it from being attempted on the bare-blockstor matrix.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

Andrei Kvapil (kvaps) mentioned this pull request May 31, 2026

bug(rest): autoplace path does not gate on FreeCapacity, accepts placement on full pool #45

Closed

gemini-code-assist Bot reviewed May 31, 2026

Andrei Kvapil (kvaps) marked this pull request as ready for review May 31, 2026 21:41

Andrei Kvapil (kvaps) merged commit 1ac478d into main May 31, 2026
27 of 28 checks passed

coderabbitai Bot mentioned this pull request Jun 2, 2026

fix(rest): walk RG StoragePoolList in fresh-create pool resolution (Bug 364) #67

Merged

Andrei Kvapil (kvaps) merged 2 commits into main from fix/issue-45-real-csi-path-final
