fix(rest): cap RD/RG/Node names at 48 chars to match k8s label limit by kvaps · Pull Request #59 · cozystack/blockstor

Andrei Kvapil (kvaps) · 2026-06-02T03:23:50Z

Bug 360 (hunt-v4)

Long RD names (>63 chars) bypassed the wire-side validation gate (which capped at 253 — the k8s metadata.name regime) and then exploded inside the store layer because the k8s store writes the RD-name into metadata.labels (LabelResourceDefinition in pkg/store/k8s/resources.go). k8s label VALUES are bounded to 63 chars by apimachinery — anything longer leaks the raw apimachinery error through the next r c and leaves a zombie RD that accepts vd c but never accepts a replica.

Reproduction on dev stand (HEAD `6f69c5678`)

$ linstor rd c $(printf 'a%.0s' {1..150})
SUCCESS: resource definition created
$ linstor vd c $LONG_NAME 64M
SUCCESS: volume definition created
$ linstor r c dev-worker-1 $LONG_NAME --storage-pool stand
ERROR: store error: ...metadata.labels: Invalid value:
  "aaa...aaa": must be no more than 63 characters

Fix

Lower maxLinstorName from 253 → 48. 48 also matches upstream LINSTOR's DRBD_RES_NAME_MAX so blockstor stays wire-compatible with any caller that relies on the upstream identifier limit. The wire gate now fires at rd c / rg spawn / s r rst time before any partial state lands.

Tests

Unit: pkg/rest/input_validation_bug_360_test.go — refusal matrix [49, 64, 150, 250], 48-char boundary acceptance, rg spawn entry point sharing the same gate.
E2E: tests/e2e/rd-name-length-48.sh — drives the live REST surface against the dev stand, asserts no zombie RD CRD survives a refusal. Verified locally on the dev stand.

Summary by CodeRabbit

Bug Fixes
- Updated REST API validation to reject resource definition names exceeding 48 characters at the wire level, preventing downstream Kubernetes validation failures.
Tests
- Added regression tests to verify proper rejection of long resource definition names and boundary condition behavior.

Long RD names (>63 chars) bypassed the wire-side validation gate (which capped at 253 — the k8s metadata.name regime) and then exploded inside the store layer because the k8s store writes the RD-name into `metadata.labels` (LabelResourceDefinition in pkg/store/k8s/resources.go). k8s label VALUES are bounded to 63 chars by apimachinery — anything longer leaks the raw apimachinery error through the next `r c` and leaves a zombie RD that accepts `vd c` but never accepts a replica. Reproduction on dev stand (HEAD 6f69c56): $ linstor rd c $(printf 'a%.0s' {1..150}) SUCCESS: resource definition created $ linstor vd c $LONG_NAME 64M SUCCESS: volume definition created $ linstor r c dev-worker-1 $LONG_NAME --storage-pool stand ERROR: store error: ...metadata.labels: Invalid value: "aaa...aaa": must be no more than 63 characters Fix: lower maxLinstorName from 253 → 48. 48 also matches upstream LINSTOR's DRBD_RES_NAME_MAX so blockstor stays wire-compatible with any caller that relies on the upstream identifier limit. The wire gate now fires at rd-create / rg-spawn / s r rst time before any partial state lands. Tests: - unit: pkg/rest/input_validation_bug_360_test.go covers refusal matrix [49, 64, 150, 250], the 48-char boundary, and the rg-spawn entry point sharing the same gate - e2e: tests/e2e/rd-name-length-48.sh drives the live REST surface against the dev stand, asserts no zombie RD CRD survives a refusal Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

coderabbitai · 2026-06-02T03:23:57Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 619faedc-0096-4a60-ab03-18b27a09cdda

📥 Commits

Reviewing files that changed from the base of the PR and between 6f69c56 and 6652113.

📒 Files selected for processing (3)

pkg/rest/input_validation.go
pkg/rest/input_validation_bug_360_test.go
tests/e2e/rd-name-length-48.sh

📝 Walkthrough

Walkthrough

The PR fixes Bug 360 by reducing the REST input validation maximum identifier length from 253 to 48 characters, aligning the REST rejection threshold with upstream LINSTOR constraints and preventing downstream Kubernetes label-length validation failures. Unit and e2e regression tests verify enforcement at both RD creation and RG spawn endpoints.

Changes

Bug 360 wire-gate enforcement

Layer / File(s)	Summary
Input validation threshold and documentation `pkg/rest/input_validation.go`	Maximum allowed LINSTOR identifier length changed from 253 to 48 characters; updated comments clarify RFC 1123 subdomain interpretation and consistent failure mode before Kubernetes storage validation.
Unit tests for validation enforcement `pkg/rest/input_validation_bug_360_test.go`	Three test functions verify rejection of RD and RG spawn requests with names exceeding 48 characters, returning 4xx with "exceeds" rule message (no metadata.labels leakage), and confirm acceptance and persistence at the 48-character boundary.
E2E regression test for REST API enforcement `tests/e2e/rd-name-length-48.sh`	Bash regression script validates complete REST flow via port-forward: asserts 4xx rejection for lengths 49–250 with rule-text matching, verifies no zombie RD CRDs on rejection, confirms acceptance and Kubernetes visibility at 48-char boundary, and validates spawn endpoint behavior identically.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A rabbit hops through validation gates,
where names of length forty-eight await.
No longer chaos from Kubernetes woes,
the wire now rejects what quickly grows. 🐰✨
Bug 360 fixed—short names now prevail!

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/rd-name-length-48

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request addresses Bug 360 by capping the maximum Linstor name length at 48 characters to prevent validation failures when writing to Kubernetes labels. It includes a new unit test suite and an end-to-end test script. The review feedback suggests improving the E2E test script's cleanup logic by handling resource group deletion in the global cleanup function rather than overwriting the EXIT trap, which would otherwise discard the original cleanup steps.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-02T03:24:47Z

+cleanup() {
+    set +e
+    kill "$PF_PID" 2>/dev/null || true
+    # Best-effort cleanup if a borderline RD slipped through.
+    for n in 48 49; do
+        local name
+        # shellcheck disable=SC2155
+        name=$(printf 'a%.0s' $(seq 1 "$n"))
+        kubectl delete resourcedefinition.blockstor.cozystack.io "$name" \
+            --ignore-not-found --timeout=10s 2>/dev/null || true
+    done
+    set -e
+}
+trap cleanup EXIT


The cleanup function currently only handles the port-forward process and the borderline resource definitions. Since the resource group RG is created later in the script, we should update this global cleanup function to also delete the resource group if it has been defined. This avoids the need to overwrite the EXIT trap later, which would otherwise discard this cleanup logic.

Suggested change

cleanup() {

set +e

kill "$PF_PID" 2>/dev/null || true

# Best-effort cleanup if a borderline RD slipped through.

for n in 48 49; do

local name

# shellcheck disable=SC2155

name=$(printf 'a%.0s' $(seq 1 "$n"))

kubectl delete resourcedefinition.blockstor.cozystack.io "$name" \

--ignore-not-found --timeout=10s 2>/dev/null || true

done

set -e

}

trap cleanup EXIT

cleanup() {

set +e

kill "$PF_PID" 2>/dev/null || true

# Best-effort cleanup if a borderline RD slipped through.

for n in 48 49; do

local name

# shellcheck disable=SC2155

name=$(printf 'a%.0s' $(seq 1 "$n"))

kubectl delete resourcedefinition.blockstor.cozystack.io "$name" \\

--ignore-not-found --timeout=10s 2>/dev/null || true

done

if [[ -n "${RG:-}" ]]; then

kubectl delete resourcegroup "$RG" --ignore-not-found --timeout=10s >/dev/null 2>&1 || true

fi

set -e

}

trap cleanup EXIT

gemini-code-assist · 2026-06-02T03:24:47Z

+  selectFilter:
+    placeCount: 1
+EOF
+trap 'kubectl delete resourcegroup '"$RG"' --ignore-not-found --timeout=10s >/dev/null 2>&1 || true; kill "$PF_PID" 2>/dev/null || true' EXIT


Overwriting the EXIT trap here discards the previously registered cleanup trap (defined on line 70). This means the original cleanup logic (killing the port-forward process and deleting the borderline resource definitions) will not execute if the script exits after this point. Since we updated the global cleanup function to handle the resource group deletion, we can safely remove this trap overwrite.

Suggested change

trap 'kubectl delete resourcegroup '"$RG"' --ignore-not-found --timeout=10s >/dev/null 2>&1 || true; kill "$PF_PID" 2>/dev/null || true' EXIT

# EXIT trap is handled globally by cleanup function

Bug 359: linstor r d <last-extra-diskful> immediately followed by linstor r c <ex-tiebreaker-node> races the RD reconcilers Bug-338 carve-out. When the r d drops the diskful count to one, removeWitnesses Deletes the TIE_BREAKER CRD on the ex-witness-node. The kubectl Delete finishes synchronously from the reconcilers POV but the apiserver still serves the CRD as exists DeletionTimestamp set finalizer pending for tens of ms until the satellite strips its finalizer. During that window REST createOrPromoteResource sees AlreadyExists on Create and NotFound on Get (or NotFound from promotes PatchResourceSpec) pre-fix it surfaced that as 404 not found - the same envelope shape as a real missing-RD or missing-pool class which confused operators because they never asked for a promote. Fix wraps createOrPromoteResource in a 5-attempt retry loop with a 200ms cadence. Both race surfaces (AlreadyExists+NotFound on the flags probe and NotFound from promote) flag the attempt as retryable; the next attempt converges as a fresh Create once GC finishes. Exhausted retries surface a 503 envelope so CSI or operator tooling can distinguish a transient race from a true 404. The companion e2e (tiebreaker-r-d-r-c-other-node) stabilises against the parallel ensureTiebreaker thrashing window: it waits up to 20s for the post-r-d controller topology to settle (witness count stable for at least 4s) before issuing the relocate and retries the r c CLI call up to 5 times on a 503 envelope. CI lane 4 (stand-caught PR #56 PR #59) is green across 6 consecutive fresh-stand runs on dev with this fix. Signed-off-by: Andrei Kvapil <andrei.kvapil@aenix.io> Co-authored-by: Claude <noreply@anthropic.com>

gemini-code-assist Bot reviewed Jun 2, 2026

View reviewed changes

Andrei Kvapil (kvaps) marked this pull request as ready for review June 2, 2026 05:11

Andrei Kvapil (kvaps) merged commit f0be80f into main Jun 2, 2026
26 of 28 checks passed

Andrei Kvapil (kvaps) deleted the fix/rd-name-length-48 branch June 2, 2026 05:11

Andrei Kvapil (kvaps) mentioned this pull request Jun 2, 2026

fix(rest): retry r c on Bug 359 tiebreaker-collapse race #61

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(rest): cap RD/RG/Node names at 48 chars to match k8s label limit#59

fix(rest): cap RD/RG/Node names at 48 chars to match k8s label limit#59
Andrei Kvapil (kvaps) merged 1 commit into
mainfrom
fix/rd-name-length-48

Andrei Kvapil (kvaps) commented Jun 2, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 2, 2026 •

edited

Loading

Review failed

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	trap 'kubectl delete resourcegroup '"$RG"' --ignore-not-found --timeout=10s >/dev/null 2>&1 \|\| true; kill "$PF_PID" 2>/dev/null \|\| true' EXIT
	# EXIT trap is handled globally by cleanup function

Conversation

Andrei Kvapil (kvaps) commented Jun 2, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Bug 360 (hunt-v4)

Reproduction on dev stand (HEAD 6f69c5678)

Fix

Tests

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Andrei Kvapil (kvaps) commented Jun 2, 2026 •

edited by coderabbitai Bot

Loading

Reproduction on dev stand (HEAD `6f69c5678`)

coderabbitai Bot commented Jun 2, 2026 •

edited

Loading