Remove pod-affinity rules; rely on RWO PVC for co-location (#380)

t0mdavid-m merged 1 commit into `main` from
Conversation
The shared volume-group: workspaces label and required pod-affinity attracted every fork's workspace pods onto a single node per memory tier and deadlocked the first replica of any fork landing on an otherwise-empty tier (no peer pod for the required affinity to match). Per-fork RWO PVCs (<slug>-workspaces-pvc) already constrain all of a fork's workspace-using pods to the node the volume is attached to via the scheduler's VolumeBinding plugin, so the explicit affinity adds nothing on top. Removing it scopes co-location naturally to one fork and lets a fresh tier bootstrap without manual affinity-strip. NodeSelector continues to pick the memory tier; the RWO mount picks the specific node within that tier.
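The co-location mechanism described above can be sketched in manifest form. The names below (`myfork-workspaces-pvc`, `myfork-streamlit`, the `memory-tier` node label, image, and mount path) are illustrative stand-ins, not the repository's actual values; the real claims follow the per-fork `<slug>-workspaces-pvc` pattern:

```yaml
# Per-fork ReadWriteOnce PVC: once the volume is attached to a node, the
# scheduler's VolumeBinding plugin keeps every pod that mounts it on that node.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myfork-workspaces-pvc   # illustrative; real claims are <slug>-workspaces-pvc
spec:
  accessModes:
    - ReadWriteOnce             # node-level attach: all consumers land together
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myfork-streamlit
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myfork-streamlit
  template:
    metadata:
      labels:
        app: myfork-streamlit
    spec:
      nodeSelector:
        memory-tier: standard   # nodeSelector picks the memory tier...
      containers:
        - name: app
          image: example/streamlit:latest
          volumeMounts:
            - name: workspaces
              mountPath: /workspaces
      volumes:
        - name: workspaces
          persistentVolumeClaim:
            claimName: myfork-workspaces-pvc  # ...the RWO mount picks the node
```

No affinity stanza is needed: the shared `claimName` alone is the co-location constraint, and it is naturally scoped to one fork.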
📝 Walkthrough

This PR updates Kubernetes deployment manifests and documentation to replace pod-affinity scheduling with storage-based pod co-location.
🚥 Pre-merge checks: ✅ 5 of 5 passed
Caution: some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/kubernetes-deployment.md (1)
Line 93: ⚠️ Potential issue | 🟠 Major: Remove the stale "Pod affinity exists …" statement; it now conflicts with this PR.

Line 93 says pod affinity exists for warm WebSocket/cache behavior, but line 83 states there is no pod-affinity rule. Please reconcile this to a single source of truth (RWO-PVC-based co-location only).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/kubernetes-deployment.md` at line 93, Remove or update the stale sentence "Pod affinity exists to keep the WebSocket warm and reuse Streamlit's in-process script cache" so the doc consistently states that there is no pod-affinity rule and co-location is achieved only via the RWO-PVC (shared workspace PVC) and Redis; ensure the paragraph about sticky `stroute` cookies remains and replace the pod-affinity claim with a brief note that pod affinity is not used and RWO-PVC-based co-location is the source of shared state.
🧹 Nitpick comments (1)
.claude/skills/configure-k8s-deployment.md (1)
Lines 32-33: Broaden the recon check to all PVC consumers, not just Streamlit.

This check currently validates `claimName` only in `k8s/base/streamlit-deployment.yaml`; please also require a matching `claimName: workspaces-pvc` in `k8s/base/rq-worker-deployment.yaml` (and the cleanup job, if present) so drift cannot silently break co-location assumptions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/skills/configure-k8s-deployment.md around lines 32 - 33, The recon check currently only validates claimName in k8s/base/streamlit-deployment.yaml; extend it to validate the same claimName: workspaces-pvc in all PVC-consuming manifests (specifically k8s/base/rq-worker-deployment.yaml and any cleanup Job manifest), i.e., update the check logic that looks for claimName to scan both Deployment and Job resources and assert claimName === "workspaces-pvc" for each volume/volumeMount reference; also update tests/assertions for the recon check to include examples of rq-worker-deployment.yaml and the cleanup job so drift cannot silently break co-location assumptions.
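The broadened check would assert that every consumer manifest declares the same claim. A hypothetical excerpt of the `volumes` section the check should find in `k8s/base/rq-worker-deployment.yaml` (the volume name and surrounding layout are assumptions, only the `claimName` value comes from the review comment):

```yaml
# Hypothetical excerpt of k8s/base/rq-worker-deployment.yaml; the recon check
# should assert this claimName here, in streamlit-deployment.yaml, and in the
# cleanup job, so all PVC consumers stay pinned to the same node.
volumes:
  - name: workspaces
    persistentVolumeClaim:
      claimName: workspaces-pvc   # must match across all consumers
```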
📒 Files selected for processing (5)
- .claude/skills/configure-k8s-deployment.md
- docs/kubernetes-deployment.md
- k8s/base/cleanup-cronjob.yaml
- k8s/base/rq-worker-deployment.yaml
- k8s/base/streamlit-deployment.yaml
💤 Files with no reviewable changes (3)
- k8s/base/streamlit-deployment.yaml
- k8s/base/rq-worker-deployment.yaml
- k8s/base/cleanup-cronjob.yaml
Summary
Simplify pod co-location by removing explicit pod-affinity rules and relying instead on Kubernetes' native `VolumeBinding` scheduler plugin, which automatically pins pods that mount the same `ReadWriteOnce` PVC to the same node.

Changes

- Removed the pod-affinity rules from `streamlit-deployment.yaml`, `rq-worker-deployment.yaml`, and `cleanup-cronjob.yaml`
- Removed the `volume-group: workspaces` labels from all three deployments (no longer needed as affinity selectors)
- `NodeSelector` picks the eligible node set, and the RWO mount picks the specific node within that set
- Updated the documentation to drop the `volume-group` label and pod-affinity rule

Implementation Details
The Kubernetes scheduler's `VolumeBinding` plugin automatically ensures that once a `ReadWriteOnce` PVC is attached to a node, all subsequent pods mounting that PVC are scheduled to the same node. This eliminates the need for manual pod-affinity configuration while achieving the same co-location guarantee. The change reduces manifest complexity and makes the scheduling constraint more explicit and maintainable.

https://claude.ai/code/session_01HLxsvLLznn5WV42iGHBxiP
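For contrast, a sketch of the kind of required pod-affinity block this PR removes. The `volume-group: workspaces` label and the required-affinity semantics come from the PR description; the exact indentation and the `topologyKey` are assumptions about the original manifest layout:

```yaml
# Removed: required pod affinity keyed on the shared volume-group label.
# On an otherwise-empty memory tier there is no peer pod for the required
# rule to match, so a fork's first replica could never schedule: the
# bootstrap deadlock this PR eliminates.
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            volume-group: workspaces
        topologyKey: kubernetes.io/hostname   # assumed; co-locate on one node
```

Because the label was shared across forks, this rule also attracted every fork's workspace pods onto a single node per tier, which the per-fork RWO PVC scoping avoids.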