fix(manifests): prevent RWO PVC multi-attach errors during backend ro…#853
Gkrumbach07 merged 2 commits into main
Claude Code Review
Summary
This PR adds a RollingUpdate strategy (maxSurge=0, maxUnavailable=1) to the backend-api Deployment to prevent RWO PVC multi-attach errors on multi-node clusters. The change is minimal, correct, and directly solves a real operational problem. No security, architectural, or error-handling concerns are present.
Issues by Severity
Blocker Issues: None
Critical Issues: None
Major Issues: None
Minor Issues:
File: components/manifests/base/backend-deployment.yaml lines 9-13
The combination of RollingUpdate + maxSurge=0 + maxUnavailable=1 is functionally equivalent to Recreate for a replicas=1 deployment. The native Recreate strategy is the canonical Kubernetes pattern for single-replica RWO PVC workloads: it documents the intent without requiring readers to reason through the interaction of two numeric parameters, and uses 4 fewer lines. The current approach is correct; this is a readability observation only.
File: components/manifests/base/backend-deployment.yaml line 13
The comment 'Allow the old pod to terminate first' describes the desired outcome rather than the Kubernetes semantic. maxUnavailable=1 means at most 1 pod can be unavailable during the rollout. Given replicas=1 and maxSurge=0, this forces sequential terminate-then-replace behaviour, but that connection is non-obvious. A comment like 'With replicas=1 and maxSurge=0, forces terminate-before-replace' would be more precise. This is a minor nit; the existing comments together do convey the intent.
Positive Highlights
Recommendations
Reviewed by Claude Code against ambient-code/platform standards
…llouts
Set maxSurge=0 and maxUnavailable=1 on the backend-api deployment so the old pod is fully terminated (and its RWO volume detached) before the new pod is created. This prevents multi-attach errors on multi-node clusters where the new pod could be scheduled on a different node while the old pod still holds the volume.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
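As a rough illustration of the change this commit message describes, the strategy stanza might look like the fragment below. Only the maxSurge and maxUnavailable values, the backend-api name, and the single replica come from this conversation; the surrounding layout is an assumed minimal apps/v1 Deployment and is not the actual content of backend-deployment.yaml.

```yaml
# Sketch only: selector, pod template, and volume fields are omitted.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-api
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      # Never create an extra pod above the desired replica count.
      maxSurge: 0
      # Allow the single replica to be taken down during the rollout,
      # so the old pod is removed before its replacement is created.
      maxUnavailable: 1
```

With one replica, these two values leave the rollout no option but to scale the old ReplicaSet down before scaling the new one up. Kubernetes rejects a strategy in which both values are zero, so maxUnavailable has to be at least 1 here.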
29a4f1c to 4831e7f
Ambient Code Platform
Kubernetes-native AI automation platform that orchestrates agentic sessions through containerized microservices. Built with Go (backend, operator), NextJS + Shadcn (frontend), Python (runner), and Kubernetes CRDs.
Structure
Key Files
Session Flow
Commands
make build-all # Build all container images
make deploy # Deploy to cluster
make test # Run tests
make lint # Lint code
make kind-up # Start local Kind cluster
make test-e2e-local # Run E2E tests against Kind
Per-Component
# Backend / Operator (Go)
cd components/backend && gofmt -l . && go vet ./... && golangci-lint run
cd components/operator && gofmt -l . && go vet ./... && golangci-lint run
# Frontend
cd components/frontend && npm run build # Must pass with 0 errors, 0 warnings
# Runner (Python)
cd components/runners/ambient-runner && uv venv && uv pip install -e .
# Docs
cd docs && npm run dev # http://localhost:4321
Critical Context
Pre-commit Hooks
The project uses the pre-commit framework to run linters locally before every commit. Configuration lives in
Install
make setup-hooks
What Runs
On every
On every
Run Manually
make lint # All hooks, all files
pre-commit run gofmt-check --all-files # Single hook
pre-commit run --files path/to/file.go # Single file
Skip Hooks
git commit --no-verify # Skip pre-commit hooks
git push --no-verify # Skip pre-push hooks
Notes
Testing
More Info
See BOOKMARKS.md for architecture decisions, development context, code patterns, and component-specific guides.
Claude Code Review test
NOTE: Full review posted in follow-up comment below.
Claude Code Review
Summary
This PR adds a
Issues by Severity
Blocker Issues: None
Critical Issues: None
Major Issues: None
Minor Issues:
1. PR description does not match the actual implementation
2. Accepted downtime tradeoff is implicit
Positive Highlights
Recommendations
Reviewed by Claude Code using repository standards from
What Amber Loaded for Code Review
Amber automatically loaded these repository standards from the memory system:
Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360), not just generic code review guidelines.
Summary
Set strategy.type: Recreate on the backend-api deployment to prevent multi-attach errors on the backend-state-pvc (ReadWriteOnce) during rollouts on multi-node clusters.
Problem
The backend deployment uses a RWO PVC (backend-state-pvc) for session state persistence. With the default RollingUpdate strategy, the new pod can be scheduled on a different node before the old pod fully terminates and releases the volume, causing a multi-attach error that blocks the deployment.
Fix
Recreate strategy ensures the old pod is fully terminated (and its volume detached) before the new pod is created. This is the Kubernetes-recommended approach for deployments with RWO PVCs and is consistent with the existing replicas: 1 constraint.
Test plan
Fixes: RHOAIENG-52353
🤖 Generated with Claude Code
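For comparison, the Recreate strategy named in the Summary and Fix sections above could be expressed as in the sketch below. This is an assumed minimal fragment, not the merged manifest: the review comments in this conversation describe the merged change as RollingUpdate with maxSurge=0 and maxUnavailable=1 instead. The backend-api name and the single replica are taken from the description; everything else is illustrative.

```yaml
# Sketch only: selector, pod template, and the backend-state-pvc
# volume definition are omitted.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-api
spec:
  replicas: 1
  strategy:
    # Recreate takes no rollingUpdate parameters: existing pods are
    # removed before pods for the new revision are created.
    type: Recreate
```

Either form accepts a short window with no running backend pod during a rollout, which is the tradeoff for never having two pods compete for the ReadWriteOnce volume.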