fix(rest): walk RG StoragePoolList in fresh-create pool resolution (Bug 364) by kvaps · Pull Request #67 · cozystack/blockstor

Andrei Kvapil (kvaps) · 2026-06-02T10:55:42Z

Summary

linstor r c <node> <rd> without --storage-pool against an RG that pins its default via select_filter.storage_pool_list (not select_filter.storage_pool) created a Resource with empty Props["StorPoolName"]. The satellite reconciler then had no pool to bind to and the replica wedged at "Provisioning" forever.

linstor-csi is the canonical caller for this path: it posts no body to the per-node resource-create endpoint and relies on RG-side propagation for the pool name. When the StorageClass sets linstor.csi.linbit.com/storagePool: <p>, linstor-csi's RGCreate path lands the value under SelectFilter.StoragePoolList[0] (not .StoragePool), so every Cozystack volume hits this code path.

Pre-fix resolveTakeoverStorPool only checked rg.SelectFilter.StoragePool. The matching resolveGatePoolName helper already tolerated the list tier; this fix brings the takeover resolver in line. ~5 lines of intent, mirroring upstream LINSTOR's CtrlRscCrtApiHelper.resolveStorPoolName tier ordering.

Reproducer

Live dev stand, Round 7 bug-hunt 2026-06-02:

$ curl -sS -X PUT http://.../v1/resource-groups/testrg \
    -d '{"select_filter":{"storage_pool":"","storage_pool_list":["lvm-thin"]}}'
200

$ linstor r c dev-worker-1 testlist
SUCCESS: resource(s) created on resource-definition: testlist

$ curl -sS http://.../v1/resource-definitions/testlist/resources | jq '.[].props'
null

Test plan

Unit tests cover the REST round-trip reproducer, the helper directly, single-StoragePool precedence over a list, and the empty-fallthrough
go test ./pkg/rest/ -timeout 240s passes
New e2e catcher tests/e2e/resource-create-pool-resolve.sh pins the fix on a live cluster (Props stamped + replica converges to UpToDate)
CI green

Summary by CodeRabbit

Release Notes

Bug Fixes
- Fixed resource provisioning failures that occurred when storage pools were specified using a list instead of a direct selection, which prevented resources from being properly assigned to the appropriate storage pool from the list.
Tests
- Added comprehensive testing including unit tests for different storage pool configuration scenarios and priority rules, plus end-to-end tests validating the fix works correctly on live clusters.

…ug 364) `linstor r c <node> <rd>` without `--storage-pool` against an RG that pins its default via `select_filter.storage_pool_list` (not `select_filter.storage_pool`) created a Resource with empty `Props["StorPoolName"]`. The satellite reconciler then had no pool to bind to and the replica wedged at "Provisioning" — visible to the operator only as a phantom replica that never reached UpToDate. linstor-csi is the canonical caller for this path: it posts no body to the per-node resource-create endpoint and relies on RG-side propagation for the pool name. When the StorageClass sets `linstor.csi.linbit.com/storagePool: <p>`, linstor-csi's RGCreate path lands the value under SelectFilter.StoragePoolList[0] (not .StoragePool), so every Cozystack volume hits this code path. Pre-fix `resolveTakeoverStorPool` (the fallback chain `resolveStorPoolForFreshCreate` walks before the satellite-side provisioning starts) only checked `rg.SelectFilter.StoragePool`, ignoring `rg.SelectFilter.StoragePoolList`. The matching `resolveGatePoolName` helper that gates per-pool capacity already tolerated the list tier; this fix brings the takeover resolver in line with the gate's existing semantics. Extends the helper with the `StoragePoolList[0]` fallback after the single-StoragePool check, mirroring upstream LINSTOR's CtrlRscCrtApiHelper.resolveStorPoolName tier ordering and the existing `resolveGatePoolName` walk. ~5 lines of intent. Unit tests cover the canonical reproducer (REST round-trip through the live handler), direct probes against `resolveTakeoverStorPool` for the StoragePoolList branch, the single-StoragePool precedence over a list, and the "no pool anywhere" empty-string fallthrough. The new e2e catcher (tests/e2e/resource-create-pool-resolve.sh) pins the fix on a live cluster: an RG with only storage_pool_list must drive Props["StorPoolName"] to the list's first entry and the satellite must converge to UpToDate. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

coderabbitai · 2026-06-02T10:55:50Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ab07396b-2c4e-4c2b-867e-069b8c500bcc

📥 Commits

Reviewing files that changed from the base of the PR and between 6abc0a0 and 45ac0e2.

📒 Files selected for processing (3)

pkg/rest/autoplace.go
pkg/rest/bug_364_resource_create_pool_resolve_test.go
tests/e2e/resource-create-pool-resolve.sh

📝 Walkthrough

Walkthrough

This PR fixes Bug 364 by updating the storage pool fallback chain in resolveTakeoverStorPool. When a ResourceGroup has no SelectFilter.StoragePool but does have SelectFilter.StoragePoolList, the function now returns the first list entry instead of leaving the pool unresolved. Unit tests and an end-to-end shell script validate the behavior across multiple scenarios.

Changes

Bug 364: Storage Pool Resolution from SelectFilter List

Layer / File(s)	Summary
Storage pool resolution fallback logic `pkg/rest/autoplace.go`	`resolveTakeoverStorPool` now consults `StoragePoolList[0]` when `SelectFilter.StoragePool` is empty, preventing provisioning stalls when pools are configured only via the list.
Unit tests for pool resolution behavior `pkg/rest/bug_364_resource_create_pool_resolve_test.go`	Four test cases validate the fix: HTTP resource-create path stamps `StorPoolName` from `StoragePoolList[0]`, direct helper call returns correct pool, precedence is correct when both fields are set, and empty-string behavior is preserved when neither is configured.
End-to-end cluster test for Bug 364 `tests/e2e/resource-create-pool-resolve.sh`	Shell test reproduces Bug 364 on a live cluster by seeding a resource group with `StoragePoolList` only, issuing resource-create with empty body, verifying `StorPoolName` is stamped to `StoragePoolList[0]`, and polling until replica state reaches `UpToDate`.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

cozystack/blockstor#51: Applies similar fallback-to-StoragePoolList[0] logic in the capacity-gate resolveGatePoolName path for pool-name resolution.

Poem

🐰 A pool by any other list shall flow,
When StoragePool's empty, we now know—
Pluck the first from the list below,
And watch those resources go! 🌊

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/bug-364-resource-create-pool-resolve

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request addresses Bug 364 by updating resolveTakeoverStorPool to fall back to StoragePoolList[0] when StoragePool is empty, ensuring that resources created without an explicit storage pool can correctly resolve their pool name from the Resource Group's defaults. Unit and end-to-end tests have been added to verify this behavior. The review feedback highlights two issues in the new e2e test script: first, the cleanup function terminates the port-forwarding process before sending the deletion requests, preventing proper cleanup; second, the script should fail early with a clear error message if port-forwarding fails to start.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-02T10:56:51Z

+cleanup() {
+    kill "$PF_PID" 2>/dev/null || true
+    wait "$PF_PID" 2>/dev/null || true
+    curl -s -X DELETE "http://localhost:$PF_PORT/v1/resource-definitions/bug364-rd" >/dev/null 2>&1 || true
+    curl -s -X DELETE "http://localhost:$PF_PORT/v1/resource-groups/bug364-rg" >/dev/null 2>&1 || true
+}


In the cleanup function, kill "$PF_PID" is called before the curl -s -X DELETE commands. This terminates the port-forwarding process, causing the subsequent curl deletion requests to fail with "Connection refused". As a result, the test resources (bug364-rd and bug364-rg) are not cleaned up from the cluster.

To fix this, perform the curl deletion requests before killing the port-forward process.

Suggested change

cleanup() {

kill "$PF_PID" 2>/dev/null || true

wait "$PF_PID" 2>/dev/null || true

curl -s -X DELETE "http://localhost:$PF_PORT/v1/resource-definitions/bug364-rd" >/dev/null 2>&1 || true

curl -s -X DELETE "http://localhost:$PF_PORT/v1/resource-groups/bug364-rg" >/dev/null 2>&1 || true

}

cleanup() {

if [[ -n "${PF_PORT:-}" ]]; then

curl -s -X DELETE "http://localhost:$PF_PORT/v1/resource-definitions/bug364-rd" >/dev/null 2>&1 || true

curl -s -X DELETE "http://localhost:$PF_PORT/v1/resource-groups/bug364-rg" >/dev/null 2>&1 || true

fi

if [[ -n "${PF_PID:-}" ]]; then

kill "$PF_PID" 2>/dev/null || true

wait "$PF_PID" 2>/dev/null || true

fi

}

gemini-code-assist · 2026-06-02T10:56:52Z

+for _ in $(seq 1 30); do
+    if curl -sf -m1 "http://localhost:$PF_PORT/v1/nodes" >/dev/null 2>&1; then
+        break
+    fi
+    sleep 0.5
+done


If the port-forwarding fails to start or bind, the wait loop will silently exhaust all 30 attempts and the script will continue, only to fail later with a less clear connection error. It is better to explicitly check if the port-forwarding succeeded and fail with a clear message and the port-forward logs if it didn't.

Suggested change

for _ in $(seq 1 30); do

if curl -sf -m1 "http://localhost:$PF_PORT/v1/nodes" >/dev/null 2>&1; then

break

fi

sleep 0.5

done

for i in {1..30}; do

if curl -sf -m1 "http://localhost:$PF_PORT/v1/nodes" >/dev/null 2>&1; then

break

fi

if [[ $i -eq 30 ]]; then

echo "FAIL: port-forward failed to start"

cat /tmp/bug364-pf.log

exit 1

fi

sleep 0.5

done

gemini-code-assist Bot reviewed Jun 2, 2026

View reviewed changes

Andrei Kvapil (kvaps) marked this pull request as ready for review June 2, 2026 11:57

Andrei Kvapil (kvaps) merged commit dc8799c into main Jun 2, 2026
15 checks passed

This was referenced Jun 4, 2026

fix(rest): normalize DRBD_DISKLESS + pin toggle-disk state machine (corner H) #103

Merged

fix(snapshot): close corner-case snapshot edges G1-G5 (rollback/restore/autosnapshot/thick-pool) #100

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(rest): walk RG StoragePoolList in fresh-create pool resolution (Bug 364)#67

fix(rest): walk RG StoragePoolList in fresh-create pool resolution (Bug 364)#67
Andrei Kvapil (kvaps) merged 1 commit into
mainfrom
fix/bug-364-resource-create-pool-resolve

Andrei Kvapil (kvaps) commented Jun 2, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 2, 2026 •

edited

Loading

Review failed

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Andrei Kvapil (kvaps) commented Jun 2, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Reproducer

Test plan

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Andrei Kvapil (kvaps) commented Jun 2, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 2, 2026 •

edited

Loading