fix(rest): cap RD/RG/Node names at 48 chars to match k8s label limit #59
Long RD names (>63 chars) bypassed the wire-side validation gate (which capped at 253, the k8s metadata.name regime) and then exploded inside the store layer because the k8s store writes the RD-name into `metadata.labels` (LabelResourceDefinition in pkg/store/k8s/resources.go). k8s label VALUES are bounded to 63 chars by apimachinery; anything longer leaks the raw apimachinery error through the next `r c` and leaves a zombie RD that accepts `vd c` but never accepts a replica.

Reproduction on dev stand (HEAD 6f69c56):

    $ linstor rd c $(printf 'a%.0s' {1..150})
    SUCCESS: resource definition created
    $ linstor vd c $LONG_NAME 64M
    SUCCESS: volume definition created
    $ linstor r c dev-worker-1 $LONG_NAME --storage-pool stand
    ERROR: store error: ...metadata.labels: Invalid value: "aaa...aaa": must be no more than 63 characters

Fix: lower maxLinstorName from 253 → 48. 48 also matches upstream LINSTOR's DRBD_RES_NAME_MAX so blockstor stays wire-compatible with any caller that relies on the upstream identifier limit. The wire gate now fires at rd-create / rg-spawn / s r rst time before any partial state lands.

Tests:
- unit: pkg/rest/input_validation_bug_360_test.go covers refusal matrix [49, 64, 150, 250], the 48-char boundary, and the rg-spawn entry point sharing the same gate
- e2e: tests/e2e/rd-name-length-48.sh drives the live REST surface against the dev stand, asserts no zombie RD CRD survives a refusal

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
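The 48-character boundary and the refusal matrix described above can be sketched locally as a plain length check. This is only an illustration, not the REST gate itself: `MAX_LINSTOR_NAME`, `make_name`, and `name_allowed` are hypothetical names introduced here, and the real cap is the `maxLinstorName` constant in the REST layer.

```shell
# Hypothetical local mirror of the wire-side cap. The variable name is an
# assumption; only the value 48 comes from the PR description.
MAX_LINSTOR_NAME=48

# make_name N: print a name made of N 'a' characters, the same printf trick
# the reproduction uses to build an over-long RD name.
make_name() {
  printf 'a%.0s' $(seq 1 "$1")
}

# name_allowed NAME: succeed only if NAME fits within the cap.
name_allowed() {
  [ "${#1}" -le "$MAX_LINSTOR_NAME" ]
}

# Walk the accepted boundary plus the refusal matrix from the unit test.
for n in 48 49 64 150 250; do
  if name_allowed "$(make_name "$n")"; then
    echo "$n: accepted"
  else
    echo "$n: refused"
  fi
done
```

Under these assumptions only the 48-character name is accepted. Checking length before any write is the point of the fix: the refusal happens before a partial RD can be stored.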
Walkthrough

The PR fixes Bug 360 by reducing the REST input validation maximum identifier length from 253 to 48 characters, aligning the REST rejection threshold with upstream LINSTOR constraints and preventing downstream Kubernetes label-length validation failures. Unit and e2e regression tests verify enforcement at both RD creation and RG spawn endpoints.

Changes: Bug 360 wire-gate enforcement
Code Review
This pull request addresses Bug 360 by capping the maximum Linstor name length at 48 characters to prevent validation failures when writing to Kubernetes labels. It includes a new unit test suite and an end-to-end test script. The review feedback suggests improving the E2E test script's cleanup logic by handling resource group deletion in the global cleanup function rather than overwriting the EXIT trap, which would otherwise discard the original cleanup steps.
    cleanup() {
      set +e
      kill "$PF_PID" 2>/dev/null || true
      # Best-effort cleanup if a borderline RD slipped through.
      for n in 48 49; do
        local name
        # shellcheck disable=SC2155
        name=$(printf 'a%.0s' $(seq 1 "$n"))
        kubectl delete resourcedefinition.blockstor.cozystack.io "$name" \
          --ignore-not-found --timeout=10s 2>/dev/null || true
      done
      set -e
    }
    trap cleanup EXIT
The cleanup function currently only handles the port-forward process and the borderline resource definitions. Since the resource group RG is created later in the script, we should update this global cleanup function to also delete the resource group if it has been defined. This avoids the need to overwrite the EXIT trap later, which would otherwise discard this cleanup logic.
Suggested change:

    cleanup() {
      set +e
      kill "$PF_PID" 2>/dev/null || true
      # Best-effort cleanup if a borderline RD slipped through.
      for n in 48 49; do
        local name
        # shellcheck disable=SC2155
        name=$(printf 'a%.0s' $(seq 1 "$n"))
        kubectl delete resourcedefinition.blockstor.cozystack.io "$name" \
          --ignore-not-found --timeout=10s 2>/dev/null || true
      done
      if [[ -n "${RG:-}" ]]; then
        kubectl delete resourcegroup "$RG" --ignore-not-found --timeout=10s >/dev/null 2>&1 || true
      fi
      set -e
    }
    trap cleanup EXIT
      selectFilter:
        placeCount: 1
    EOF
    trap 'kubectl delete resourcegroup '"$RG"' --ignore-not-found --timeout=10s >/dev/null 2>&1 || true; kill "$PF_PID" 2>/dev/null || true' EXIT
Overwriting the EXIT trap here discards the previously registered cleanup trap (defined on line 70). This means the original cleanup logic (killing the port-forward process and deleting the borderline resource definitions) will not execute if the script exits after this point. Since we updated the global cleanup function to handle the resource group deletion, we can safely remove this trap overwrite.
Suggested change (replaces the `trap` line above):

    # EXIT trap is handled globally by cleanup function
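The trap behaviour behind this review comment can be reproduced in isolation. The sketch below uses `echo` in place of the real `kill` and `kubectl` calls; `demo_overwrite`, `demo_single_handler`, and the `demo-rg` value are hypothetical names used only for illustration.

```shell
# Anti-pattern: the second `trap ... EXIT` replaces the first handler,
# so only the last registered command runs when the (sub)shell exits.
demo_overwrite() {
  (
    trap 'echo "kill port-forward and delete borderline RDs"' EXIT
    trap 'echo "delete resource group only"' EXIT
  )
}

# Suggested pattern: one handler registered once, which checks at exit
# time whether the later-created resource group needs deleting.
demo_single_handler() {
  (
    RG=""
    cleanup() {
      echo "kill port-forward and delete borderline RDs"
      if [ -n "${RG:-}" ]; then
        echo "delete resource group $RG"
      fi
    }
    trap cleanup EXIT
    RG="demo-rg"  # assigned later in the script, after the trap is set
  )
}

demo_overwrite
demo_single_handler
```

The first function prints only the resource-group line, which is the lost-cleanup case the reviewer describes. The second prints both lines, because the handler reads `RG` at exit time rather than at registration time.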
Bug 359: `linstor r d <last-extra-diskful>` immediately followed by `linstor r c <ex-tiebreaker-node>` races the RD reconciler's Bug-338 carve-out.

When the `r d` drops the diskful count to one, removeWitnesses deletes the TIE_BREAKER CRD on the ex-witness node. The kubectl Delete finishes synchronously from the reconciler's POV, but the apiserver still serves the CRD as existing (DeletionTimestamp set, finalizer pending) for tens of ms until the satellite strips its finalizer. During that window REST createOrPromoteResource sees AlreadyExists on Create and NotFound on Get (or NotFound from promote's PatchResourceSpec). Pre-fix it surfaced that as "404 not found", the same envelope shape as a real missing-RD or missing-pool class, which confused operators because they never asked for a promote.

The fix wraps createOrPromoteResource in a 5-attempt retry loop with a 200ms cadence. Both race surfaces (AlreadyExists+NotFound on the flags probe, and NotFound from promote) flag the attempt as retryable; the next attempt converges as a fresh Create once GC finishes. Exhausted retries surface a 503 envelope so CSI or operator tooling can distinguish a transient race from a true 404.

The companion e2e (tiebreaker-r-d-r-c-other-node) stabilises against the parallel ensureTiebreaker thrashing window: it waits up to 20s for the post-r-d controller topology to settle (witness count stable for at least 4s) before issuing the relocate, and retries the `r c` CLI call up to 5 times on a 503 envelope.

CI lane 4 (stand-caught PR #56, PR #59) is green across 6 consecutive fresh-stand runs on dev with this fix.

Signed-off-by: Andrei Kvapil <andrei.kvapil@aenix.io>
Co-authored-by: Claude <noreply@anthropic.com>
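The bounded retry described in this commit message (a fixed number of attempts, a fixed delay, retry only on the transient class) can be sketched in shell. This is not the e2e script's actual code: `retry_transient` and `TRANSIENT_RC` are hypothetical, and treating exit status 75 as "the call returned a 503 envelope" is an assumption made purely for the example.

```shell
# Hypothetical marker for a transient failure (e.g. a 503 envelope).
# The value 75 is an assumption chosen for this sketch.
TRANSIENT_RC=75

# retry_transient ATTEMPTS DELAY CMD [ARGS...]
# Runs CMD until it succeeds. Retries only when CMD exits with
# TRANSIENT_RC; any other non-zero status is returned immediately.
retry_transient() {
  local attempts=$1 delay=$2 i=1 rc
  shift 2
  while :; do
    rc=0
    "$@" || rc=$?
    # Success or a hard failure: stop right away.
    if [ "$rc" -ne "$TRANSIENT_RC" ]; then
      return "$rc"
    fi
    # Still transient on the last attempt: give up and report it.
    if [ "$i" -ge "$attempts" ]; then
      return "$rc"
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}
```

A call such as `retry_transient 5 0.2 some_command` would mirror the 5-attempt, 200ms cadence, assuming a `sleep` that accepts fractional seconds and a wrapper that maps the 503 envelope to the transient exit status. Keeping hard failures out of the retry path preserves the distinction the fix is after: a true 404 still fails fast.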
Bug 360 (hunt-v4)

Long RD names (>63 chars) bypassed the wire-side validation gate (which capped at 253, the k8s `metadata.name` regime) and then exploded inside the store layer because the k8s store writes the RD-name into `metadata.labels` (`LabelResourceDefinition` in `pkg/store/k8s/resources.go`). k8s label VALUES are bounded to 63 chars by apimachinery; anything longer leaks the raw apimachinery error through the next `r c` and leaves a zombie RD that accepts `vd c` but never accepts a replica.

Reproduction on dev stand (HEAD 6f69c5678)

Fix

Lower `maxLinstorName` from 253 → 48. 48 also matches upstream LINSTOR's `DRBD_RES_NAME_MAX` so blockstor stays wire-compatible with any caller that relies on the upstream identifier limit. The wire gate now fires at `rd c` / `rg spawn` / `s r rst` time before any partial state lands.

Tests

- `pkg/rest/input_validation_bug_360_test.go`: refusal matrix [49, 64, 150, 250], 48-char boundary acceptance, `rg spawn` entry point sharing the same gate.
- `tests/e2e/rd-name-length-48.sh`: drives the live REST surface against the dev stand, asserts no zombie RD CRD survives a refusal.

Verified locally on the dev stand.

Summary by CodeRabbit
Bug Fixes
Tests