fix(rest): gate capacity on the real linstor-csi single-node create path (issue #45)#51
Conversation
…ath (#45) linstor-csi's `manual` scheduler — selected when a StorageClass sets `nodeList` + `placementCount=1` — fires `POST /v1/resource-definitions/{rd}/resources/{node}` via golinstor's `Resources.Create`, NOT `/autoplace`. The PR #47 capacity gate on `/autoplace` therefore never saw this traffic, and a CreateVolume against a now-full pool placed the replica anyway: the PVC reached Bound immediately and only failed later at satellite-side LV allocation. Per Phase 1 capture on the dev stand the CSI flow is: POST /v1/resource-groups POST /v1/resource-groups/{rg}/volume-groups PUT /v1/resource-groups/{rg} POST /v1/resource-definitions POST /v1/resource-definitions/{rd}/volume-definitions POST /v1/resource-definitions/{rd}/resources/<worker> <-- here PUT /v1/resource-definitions/{rd} The fix adds an inline per-pool capacity check in `createOneResource`, shared by both the bulk `POST /v1/resource-definitions/{rd}/resources` and the single-node alias `POST /v1/resource-definitions/{rd}/ resources/{node}`. The gate: - Resolves the target pool from a four-tier fallback chain so the CSI wire shape (empty body, RG.SelectFilter.StoragePoolList=[<p>]) is honoured: explicit Props["StorPoolName"] → RD.Props → RG single StoragePool → RG StoragePoolList[0]. - Reads pool.FreeCapacity directly (NOT computeSizeInfo): this code path knows the EXACT (node, pool) target, and the cluster-wide MaxVlmSizeInKib aggregation would mask a full target pool behind sibling pools on other nodes. A 13 GiB lvm-thin on worker-1 at 100% while worker-2's lvm-thin is empty MUST refuse `r c worker-1 <rd>` even though the cluster cap remains 13 GiB. - Skips for DISKLESS / TIE_BREAKER (no backing storage), unresolved pool (diskless fallback), and definitions-only creates (no VDs to size against). - Returns a 409 with the same RetCode bits + envelope shape PR #47 uses on /autoplace, so operators classify both paths with one rule. Why this does NOT touch `pkg/placer`: a previous attempt exported `placer.MatchesPoolFilter` for `pkg/rest` to reuse, but that intercepted the migrate flow and regressed `node-replace-hardware` on lane 5 (the bundle was reverted in PR #47 final force-push). The inline check here lives entirely in `pkg/rest/autoplace.go`. The migrate flow goes through `Resource.Spec.Nodes` reconciliation in the controller, not through the REST `POST resources/{node}` handler, so it's untouched. Why a new wire-shape function `resolveGatePoolName` is needed: the existing `resolveStorPoolForFreshCreate` fallback chain reads `rg.SelectFilter.StoragePool` (singular) but linstor-csi's ToResourceGroupModify maps SC `storagePool: <p>` to `SelectFilter.StoragePoolList`. Pre-fix the gate skipped because neither `res.Props["StorPoolName"]` nor `rg.SelectFilter.StoragePool` matched a CSI request. The gate-local resolver adds tier 4 (`StoragePoolList[0]`) without changing the existing fallback that other call sites depend on. Tests: 6 new unit tests in `resource_create_issue_45_test.go` pin the reject / allow / skip-diskless / bulk-endpoint / RG-list-fallback / no-VDs cases. Existing pkg/rest + pkg/placer suites stay green. Stand validation on dev (lvm-thin, worker-1, 13 GiB total → filled to 0 KiB free): Level 1 (PVC pending + event) : OK (capacity-keyword) Level 2a (sp list free <100 MiB) : free=0 KiB Level 2b (autoplace rejection) : OK Level 3 (lvm-thin view) : free=0 KiB >> OBSERVABILITY-CAPACITY-CORRELATION OK `observability-capacity-correlation` re-enabled in the e2e-piraeus job. `node-replace-hardware` stays excluded — its failure (`SP stand.dev-worker-3 already exists` after `linstor n d --lost`) is in the satellite/controller path, unrelated to this REST gate. Refs #45 Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughThis PR implements Issue ChangesIssue
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Possibly related PRs
Poem
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request addresses Issue #45 by introducing a per-pool capacity gate (rejectResourceCreateIfPoolFull) on the direct resource creation path, ensuring that volume creation fails fast if the target storage pool lacks sufficient capacity. It also refactors existing validation logic into validateResourceCreateShape and adds comprehensive unit tests. The reviewer recommended renaming sumRDVolumeDefinitionsKib to maxRDVolumeDefinitionsKib to accurately reflect that it calculates the maximum volume size rather than the sum, preventing potential confusion.
| return false | ||
| } | ||
|
|
||
| requiredKib, err := s.sumRDVolumeDefinitionsKib(r.Context(), rdName) |
There was a problem hiding this comment.
The function sumRDVolumeDefinitionsKib actually returns the maximum (largest) volume size rather than the sum of all volume sizes. To prevent confusion and potential future bugs where a true sum might be expected, rename this function to maxRDVolumeDefinitionsKib.
| requiredKib, err := s.sumRDVolumeDefinitionsKib(r.Context(), rdName) | |
| requiredKib, err := s.maxRDVolumeDefinitionsKib(r.Context(), rdName) |
| // sumRDVolumeDefinitionsKib returns the largest VolumeDefinition's | ||
| // SizeKib on the named RD. Every volume of an RD provisions against | ||
| // the same pool (upstream LINSTOR contract), so the per-pool | ||
| // capacity gate must clear the biggest of them. Returns 0 — no | ||
| // filter — when the RD has no VDs yet. Mirrors | ||
| // `Placer.requiredKib` exactly so the gate semantics agree with the | ||
| // placer's own per-pool check on the autoplace path. | ||
| func (s *Server) sumRDVolumeDefinitionsKib(ctx context.Context, rdName string) (int64, error) { |
There was a problem hiding this comment.
Rename the function definition and update its documentation to match the suggested name maxRDVolumeDefinitionsKib to accurately reflect that it calculates the maximum volume size rather than the sum.
| // sumRDVolumeDefinitionsKib returns the largest VolumeDefinition's | |
| // SizeKib on the named RD. Every volume of an RD provisions against | |
| // the same pool (upstream LINSTOR contract), so the per-pool | |
| // capacity gate must clear the biggest of them. Returns 0 — no | |
| // filter — when the RD has no VDs yet. Mirrors | |
| // `Placer.requiredKib` exactly so the gate semantics agree with the | |
| // placer's own per-pool check on the autoplace path. | |
| func (s *Server) sumRDVolumeDefinitionsKib(ctx context.Context, rdName string) (int64, error) { | |
| // maxRDVolumeDefinitionsKib returns the largest VolumeDefinition's | |
| // SizeKib on the named RD. Every volume of an RD provisions against | |
| // the same pool (upstream LINSTOR contract), so the per-pool | |
| // capacity gate must clear the biggest of them. Returns 0 — no | |
| // filter — when the RD has no VDs yet. Mirrors | |
| // Placer.requiredKib exactly so the gate semantics agree with the | |
| // placer's own per-pool check on the autoplace path. | |
| func (s *Server) maxRDVolumeDefinitionsKib(ctx context.Context, rdName string) (int64, error) { |
The previous commit re-enabled this scenario by dropping it from E2E_EXCLUDE, but it requires piraeus's LinstorCluster CRD and that CRD is installed only in the e2e-piraeus job — lane 5 hit "FAIL: LinstorCluster CRD (piraeus.io) absent". The scenario already runs (and now passes with the fix on this branch) in the piraeus interop job; this commit just stops it from being attempted on the bare-blockstor matrix. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Summary
Closes #45 by gating capacity on the actual endpoint linstor-csi hits when a StorageClass sets
nodeList+placementCount=1. PR #47 gated/autoplaceand the single-node alias's bulk variant, but the real CSI request lands onPOST /v1/resource-definitions/{rd}/resources/{node}via golinstor'sResources.Create, which routes tohandleResourceCreateOnNodewith no capacity check.Phase 1 — empirical endpoint capture on stand
Apiserver access lines from a CreateVolume against
linstor.csi.linbit.com/storagePool: lvm-thin,placementCount=1,nodeList=dev-worker-1:No
/spawncall. linstor-csi'smanualscheduler (selected becausenodeListflipsPlacementPolicy = Manualinvolume.Parameters) callsResources.Createper node, which golinstor maps to the single-node alias above.Phase 2 — fix
Inline per-pool gate added in
createOneResource(shared by both the bulk endpoint and the single-node alias):Props["StorPoolName"]→RD.Props["StorPoolName"]→RG.SelectFilter.StoragePool→RG.SelectFilter.StoragePoolList[0]). Tier 4 is new — linstor-csi'sToResourceGroupModifywrites the SC pool intoStoragePoolList, which the existing fallback chain doesn't read.pool.FreeCapacitydirectly (notcomputeSizeInfo). The cluster-wideMaxVlmSizeInKibaggregation would mask a full target pool behind sibling pools on other nodes — a 13 GiB lvm-thin on worker-1 at 100% used while worker-2's is empty must refuser c worker-1 <rd>even though the cluster cap remains 13 GiB./autoplace, so operators can classify both paths with one rule.pkg/placeris NOT touched. A previous attempt exportedplacer.MatchesPoolFilterand that intercepted the migrate flow, regressingnode-replace-hardware(the bundle was reverted in PR #47 final force-push). The migrate flow goes throughResource.Spec.Nodesreconciliation in the controller, not through this REST handler.6 new unit tests in
resource_create_issue_45_test.go: reject-on-full / allow-on-fit / skip-diskless / bulk-endpoint / RG-StoragePoolList-fallback / skip-no-VDs.Phase 3 — stand validation
observability-capacity-correlationon dev stand (lvm-thin, worker-1, 13 GiB total filled to 0 KiB free):observability-capacity-correlationre-enabled in the e2e-piraeus job.node-replace-hardwarestays excluded — its failure (SP stand.dev-worker-3 already existsafterlinstor n d --lost) is in the satellite/controller path (PR #48 follow-up), unrelated to this REST gate.Test plan
go test ./pkg/rest/...— full suite green (58s)golangci-lint run ./pkg/rest/...— 0 issuespkg/placer/...tests untouched and greenobservability-capacity-correlationPASSES on dev standnode-replace-hardwarefailure is the pre-existing PR fix(rest): refuse duplicate SP POST; strip internal annotations (bughunt v0.1.3 P1) #48 regression on storage-pool re-registration, NOT caused by this gateRefs #45
Summary by CodeRabbit
New Features
Tests
Chores