Skip to content

fix(rest): cap RD/RG/Node names at 48 chars to match k8s label limit#59

Merged
Andrei Kvapil (kvaps) merged 1 commit into
mainfrom
fix/rd-name-length-48
Jun 2, 2026
Merged

fix(rest): cap RD/RG/Node names at 48 chars to match k8s label limit#59
Andrei Kvapil (kvaps) merged 1 commit into
mainfrom
fix/rd-name-length-48

Conversation

@kvaps
Copy link
Copy Markdown
Member

@kvaps Andrei Kvapil (kvaps) commented Jun 2, 2026

Bug 360 (hunt-v4)

Long RD names (>63 chars) bypassed the wire-side validation gate (which capped at 253 — the k8s metadata.name regime) and then exploded inside the store layer because the k8s store writes the RD-name into metadata.labels (LabelResourceDefinition in pkg/store/k8s/resources.go). k8s label VALUES are bounded to 63 chars by apimachinery — anything longer leaks the raw apimachinery error through the next r c and leaves a zombie RD that accepts vd c but never accepts a replica.

Reproduction on dev stand (HEAD 6f69c5678)

$ linstor rd c $(printf 'a%.0s' {1..150})
SUCCESS: resource definition created
$ linstor vd c $LONG_NAME 64M
SUCCESS: volume definition created
$ linstor r c dev-worker-1 $LONG_NAME --storage-pool stand
ERROR: store error: ...metadata.labels: Invalid value:
  "aaa...aaa": must be no more than 63 characters

Fix

Lower maxLinstorName from 253 → 48. 48 also matches upstream LINSTOR's DRBD_RES_NAME_MAX so blockstor stays wire-compatible with any caller that relies on the upstream identifier limit. The wire gate now fires at rd c / rg spawn / s r rst time before any partial state lands.

Tests

  • Unit: pkg/rest/input_validation_bug_360_test.go — refusal matrix [49, 64, 150, 250], 48-char boundary acceptance, rg spawn entry point sharing the same gate.
  • E2E: tests/e2e/rd-name-length-48.sh — drives the live REST surface against the dev stand, asserts no zombie RD CRD survives a refusal. Verified locally on the dev stand.

Summary by CodeRabbit

  • Bug Fixes

    • Updated REST API validation to reject resource definition names exceeding 48 characters at the wire level, preventing downstream Kubernetes validation failures.
  • Tests

    • Added regression tests to verify proper rejection of long resource definition names and boundary condition behavior.

Long RD names (>63 chars) bypassed the wire-side validation gate
(which capped at 253 — the k8s metadata.name regime) and then
exploded inside the store layer because the k8s store writes the
RD-name into `metadata.labels` (LabelResourceDefinition in
pkg/store/k8s/resources.go). k8s label VALUES are bounded to 63
chars by apimachinery — anything longer leaks the raw apimachinery
error through the next `r c` and leaves a zombie RD that accepts
`vd c` but never accepts a replica.

Reproduction on dev stand (HEAD 6f69c56):

  $ linstor rd c $(printf 'a%.0s' {1..150})
  SUCCESS: resource definition created
  $ linstor vd c $LONG_NAME 64M
  SUCCESS: volume definition created
  $ linstor r c dev-worker-1 $LONG_NAME --storage-pool stand
  ERROR: store error: ...metadata.labels: Invalid value:
    "aaa...aaa": must be no more than 63 characters

Fix: lower maxLinstorName from 253 → 48. 48 also matches upstream
LINSTOR's DRBD_RES_NAME_MAX so blockstor stays wire-compatible with
any caller that relies on the upstream identifier limit. The wire
gate now fires at rd-create / rg-spawn / s r rst time before any
partial state lands.

Tests:
  - unit: pkg/rest/input_validation_bug_360_test.go covers refusal
    matrix [49, 64, 150, 250], the 48-char boundary, and the
    rg-spawn entry point sharing the same gate
  - e2e:  tests/e2e/rd-name-length-48.sh drives the live REST
    surface against the dev stand, asserts no zombie RD CRD
    survives a refusal

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 2, 2026

Review Change Stack

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 619faedc-0096-4a60-ab03-18b27a09cdda

📥 Commits

Reviewing files that changed from the base of the PR and between 6f69c56 and 6652113.

📒 Files selected for processing (3)
  • pkg/rest/input_validation.go
  • pkg/rest/input_validation_bug_360_test.go
  • tests/e2e/rd-name-length-48.sh

📝 Walkthrough

Walkthrough

The PR fixes Bug 360 by reducing the REST input validation maximum identifier length from 253 to 48 characters, aligning the REST rejection threshold with upstream LINSTOR constraints and preventing downstream Kubernetes label-length validation failures. Unit and e2e regression tests verify enforcement at both RD creation and RG spawn endpoints.

Changes

Bug 360 wire-gate enforcement

Layer / File(s) Summary
Input validation threshold and documentation
pkg/rest/input_validation.go
Maximum allowed LINSTOR identifier length changed from 253 to 48 characters; updated comments clarify RFC 1123 subdomain interpretation and consistent failure mode before Kubernetes storage validation.
Unit tests for validation enforcement
pkg/rest/input_validation_bug_360_test.go
Three test functions verify rejection of RD and RG spawn requests with names exceeding 48 characters, returning 4xx with "exceeds" rule message (no metadata.labels leakage), and confirm acceptance and persistence at the 48-character boundary.
E2E regression test for REST API enforcement
tests/e2e/rd-name-length-48.sh
Bash regression script validates complete REST flow via port-forward: asserts 4xx rejection for lengths 49–250 with rule-text matching, verifies no zombie RD CRDs on rejection, confirms acceptance and Kubernetes visibility at 48-char boundary, and validates spawn endpoint behavior identically.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A rabbit hops through validation gates,
where names of length forty-eight await.
No longer chaos from Kubernetes woes,
the wire now rejects what quickly grows. 🐰✨
Bug 360 fixed—short names now prevail!

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/rd-name-length-48

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses Bug 360 by capping the maximum Linstor name length at 48 characters to prevent validation failures when writing to Kubernetes labels. It includes a new unit test suite and an end-to-end test script. The review feedback suggests improving the E2E test script's cleanup logic by handling resource group deletion in the global cleanup function rather than overwriting the EXIT trap, which would otherwise discard the original cleanup steps.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +57 to +70
cleanup() {
set +e
kill "$PF_PID" 2>/dev/null || true
# Best-effort cleanup if a borderline RD slipped through.
for n in 48 49; do
local name
# shellcheck disable=SC2155
name=$(printf 'a%.0s' $(seq 1 "$n"))
kubectl delete resourcedefinition.blockstor.cozystack.io "$name" \
--ignore-not-found --timeout=10s 2>/dev/null || true
done
set -e
}
trap cleanup EXIT
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The cleanup function currently only handles the port-forward process and the borderline resource definitions. Since the resource group RG is created later in the script, we should update this global cleanup function to also delete the resource group if it has been defined. This avoids the need to overwrite the EXIT trap later, which would otherwise discard this cleanup logic.

Suggested change
cleanup() {
set +e
kill "$PF_PID" 2>/dev/null || true
# Best-effort cleanup if a borderline RD slipped through.
for n in 48 49; do
local name
# shellcheck disable=SC2155
name=$(printf 'a%.0s' $(seq 1 "$n"))
kubectl delete resourcedefinition.blockstor.cozystack.io "$name" \
--ignore-not-found --timeout=10s 2>/dev/null || true
done
set -e
}
trap cleanup EXIT
cleanup() {
set +e
kill "$PF_PID" 2>/dev/null || true
# Best-effort cleanup if a borderline RD slipped through.
for n in 48 49; do
local name
# shellcheck disable=SC2155
name=$(printf 'a%.0s' $(seq 1 "$n"))
kubectl delete resourcedefinition.blockstor.cozystack.io "$name" \\
--ignore-not-found --timeout=10s 2>/dev/null || true
done
if [[ -n "${RG:-}" ]]; then
kubectl delete resourcegroup "$RG" --ignore-not-found --timeout=10s >/dev/null 2>&1 || true
fi
set -e
}
trap cleanup EXIT

selectFilter:
placeCount: 1
EOF
trap 'kubectl delete resourcegroup '"$RG"' --ignore-not-found --timeout=10s >/dev/null 2>&1 || true; kill "$PF_PID" 2>/dev/null || true' EXIT
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Overwriting the EXIT trap here discards the previously registered cleanup trap (defined on line 70). This means the original cleanup logic (killing the port-forward process and deleting the borderline resource definitions) will not execute if the script exits after this point. Since we updated the global cleanup function to handle the resource group deletion, we can safely remove this trap overwrite.

Suggested change
trap 'kubectl delete resourcegroup '"$RG"' --ignore-not-found --timeout=10s >/dev/null 2>&1 || true; kill "$PF_PID" 2>/dev/null || true' EXIT
# EXIT trap is handled globally by cleanup function

@kvaps Andrei Kvapil (kvaps) marked this pull request as ready for review June 2, 2026 05:11
@kvaps Andrei Kvapil (kvaps) merged commit f0be80f into main Jun 2, 2026
26 of 28 checks passed
@kvaps Andrei Kvapil (kvaps) deleted the fix/rd-name-length-48 branch June 2, 2026 05:11
Andrei Kvapil (kvaps) added a commit that referenced this pull request Jun 2, 2026
Bug 359: linstor r d <last-extra-diskful> immediately followed by
linstor r c <ex-tiebreaker-node> races the RD reconcilers
Bug-338 carve-out. When the r d drops the diskful count to one,
removeWitnesses Deletes the TIE_BREAKER CRD on the
ex-witness-node. The kubectl Delete finishes synchronously from
the reconcilers POV but the apiserver still serves the CRD as
exists DeletionTimestamp set finalizer pending for tens of
ms until the satellite strips its finalizer. During that window
REST createOrPromoteResource sees AlreadyExists on Create and
NotFound on Get (or NotFound from promotes PatchResourceSpec)
pre-fix it surfaced that as 404 not found - the same envelope
shape as a real missing-RD or missing-pool class which confused
operators because they never asked for a promote.

Fix wraps createOrPromoteResource in a 5-attempt retry loop with
a 200ms cadence. Both race surfaces (AlreadyExists+NotFound on
the flags probe and NotFound from promote) flag the attempt as
retryable; the next attempt converges as a fresh Create once GC
finishes. Exhausted retries surface a 503 envelope so CSI
or operator tooling can distinguish a transient race from a true
404.

The companion e2e (tiebreaker-r-d-r-c-other-node) stabilises
against the parallel ensureTiebreaker thrashing window: it waits
up to 20s for the post-r-d controller topology to settle (witness
count stable for at least 4s) before issuing the relocate and retries
the r c CLI call up to 5 times on a 503 envelope. CI lane 4
(stand-caught PR #56 PR #59) is green across 6 consecutive
fresh-stand runs on dev with this fix.

Signed-off-by: Andrei Kvapil <andrei.kvapil@aenix.io>
Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant