
Remove pod-affinity rules; rely on RWO PVC for co-location#380

Merged
t0mdavid-m merged 1 commit into main from claude/fix-pod-affinity-labels-YvnlN
Apr 27, 2026

Conversation

@t0mdavid-m t0mdavid-m commented Apr 27, 2026

Summary

Simplify pod co-location by removing explicit pod-affinity rules and relying instead on Kubernetes' native VolumeBinding scheduler plugin, which automatically pins pods that mount the same ReadWriteOnce PVC to the same node.

Changes

  • Remove pod-affinity rules from streamlit-deployment.yaml, rq-worker-deployment.yaml, and cleanup-cronjob.yaml
  • Remove volume-group: workspaces labels from all three deployments (no longer needed as affinity selectors)
  • Update documentation to explain the new co-location mechanism:
    • Pod co-location is now enforced by the shared RWO PVC mount, not by explicit affinity rules
    • Clarify that NodeSelector picks the eligible node set, and the RWO mount picks the specific node within that set
    • Emphasize that co-location applies within a fork (shared PVC), not across forks (each fork has its own PVC)
    • Update deployment descriptions to reference the RWO mount instead of the affinity label
  • Update skill guide to remove references to the volume-group label and pod-affinity rule
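For reference, the removed stanza in each manifest looked roughly like the following sketch (field values reconstructed for illustration; the exact manifests may have differed in detail):

```yaml
# Reconstructed sketch of the removed pieces (illustrative, not the exact diff).
metadata:
  labels:
    volume-group: workspaces            # removed: no longer needed as an affinity selector
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              volume-group: workspaces  # matched every fork's workspace pods, not just this fork's
          topologyKey: kubernetes.io/hostname
```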

Implementation Details

The Kubernetes scheduler's VolumeBinding plugin automatically ensures that once a ReadWriteOnce PVC is attached to a node, all subsequent pods mounting that PVC are scheduled to the same node. This eliminates the need for manual pod-affinity configuration while achieving the same co-location guarantee. The change reduces manifest complexity: the scheduling constraint now follows directly from the storage topology instead of a hand-maintained affinity rule, which makes it easier to keep correct.
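As a minimal sketch of the mechanism (names and sizes are illustrative; per the commit message, the real per-fork claim is named `<slug>-workspaces-pvc`):

```yaml
# A ReadWriteOnce claim can be attached to only one node at a time, so the
# scheduler's VolumeBinding plugin pins every pod that mounts it to that node.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspaces-pvc                  # per-fork; real names follow <slug>-workspaces-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi                     # illustrative size

# In each workload's pod template, mounting the claim is the only
# co-location mechanism required:
#   volumes:
#     - name: workspaces
#       persistentVolumeClaim:
#         claimName: workspaces-pvc
```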

https://claude.ai/code/session_01HLxsvLLznn5WV42iGHBxiP

Summary by CodeRabbit

  • Documentation

    • Updated Kubernetes deployment documentation to clarify current pod co-location mechanisms.
  • Chores

    • Simplified Kubernetes configuration by removing deprecated pod affinity constraints from multiple deployment manifests.

The shared volume-group: workspaces label and required pod-affinity
attracted every fork's workspace pods onto a single node per memory
tier and deadlocked the first replica of any fork landing on an
otherwise-empty tier (no peer pod for the required affinity to match).

Per-fork RWO PVCs (<slug>-workspaces-pvc) already constrain all of
a fork's workspace-using pods to the node the volume is attached to
via the scheduler's VolumeBinding plugin, so the explicit affinity
adds nothing on top. Removing it scopes co-location naturally to one
fork and lets a fresh tier bootstrap without manual affinity-strip.

NodeSelector continues to pick the memory tier; the RWO mount picks
the specific node within that tier.
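The division of labor described in that last paragraph can be sketched in a pod template (label key and claim name are illustrative, not taken from the actual manifests):

```yaml
spec:
  nodeSelector:
    memory-tier: high                   # picks the eligible node set (the memory tier)
  volumes:
    - name: workspaces
      persistentVolumeClaim:
        claimName: workspaces-pvc       # RWO attachment then picks the one node within that tier
```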

coderabbitai Bot commented Apr 27, 2026

📝 Walkthrough

This PR updates Kubernetes deployment manifests and documentation to replace pod-affinity scheduling with storage-based pod co-location. The volume-group labels and associated podAffinity rules are removed from three K8s manifests, with documentation clarifying that ReadWriteOnce PVC attachment now enforces node pinning instead.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Documentation**<br>`.claude/skills/configure-k8s-deployment.md`, `docs/kubernetes-deployment.md` | Updated the co-location explanation from hostname-based pod affinity to RWO-PVC-based storage enforcement. Removed references to volume-group labeling and podAffinity rules; clarified that shared per-fork PVC mounts ensure workspace pod co-location via Kubernetes' VolumeBinding plugin. |
| **Kubernetes Manifests**<br>`k8s/base/cleanup-cronjob.yaml`, `k8s/base/rq-worker-deployment.yaml`, `k8s/base/streamlit-deployment.yaml` | Removed `volume-group: workspaces` pod labels and podAffinity scheduling rules that enforced hostname-level co-scheduling. Container specs, volumes, environment variables, and job scheduling behavior remain unchanged. |

Possibly related PRs

  • Add Kubernetes deployment docs and refactor Claude skills #362: Both PRs directly modify the same Kubernetes deployment documentation and base manifests around workspace co-location; the earlier PR documented pod-affinity/volume-group rationale, while this PR removes those mechanisms in favor of PVC-based co-location.
  • Refactor K8s deployment to use memory-tier components #375: Both PRs modify the same streamlit-deployment.yaml and rq-worker-deployment.yaml manifests; while this PR removes volume-group labels and podAffinity rules, the earlier PR modified resource specifications and introduced nodeSelector patches.

Poem

🐰 Pod affinity labels fade away,
PVC storage saves the day!
Kubernetes binds each fork in place,
Through shared volumes, not label race.
Simpler schedules, cleaner state,
Co-location feels just great!

🚥 Pre-merge checks: ✅ 5 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely summarizes the main change: removing pod-affinity rules and relying on the RWO PVC for co-location, which directly aligns with all modified files and documentation updates. |
| Docstring coverage | ✅ Passed | No functions found in the changed files to evaluate; docstring coverage check skipped. |
| Linked issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out-of-scope changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |




@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/kubernetes-deployment.md (1)

93-93: ⚠️ Potential issue | 🟠 Major

Remove stale “Pod affinity exists …” statement; it now conflicts with this PR.

Line 93 says pod affinity exists for warm WebSocket/cache behavior, but Line 83 states there is no pod-affinity rule. Please reconcile this to a single source of truth (RWO-PVC-based co-location only).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/kubernetes-deployment.md` at line 93, Remove or update the stale
sentence "Pod affinity exists to keep the WebSocket warm and reuse Streamlit's
in-process script cache" so the doc consistently states that there is no
pod-affinity rule and co-location is achieved only via the RWO-PVC (shared
workspace PVC) and Redis; ensure the paragraph about sticky `stroute` cookies
remains and replace the pod-affinity claim with a brief note that pod affinity
is not used and RWO-PVC-based co-location is the source of shared state.
🧹 Nitpick comments (1)
.claude/skills/configure-k8s-deployment.md (1)

32-33: Broaden the recon check to all PVC consumers, not just Streamlit.

This check currently validates claimName only in k8s/base/streamlit-deployment.yaml; please also require matching claimName: workspaces-pvc in k8s/base/rq-worker-deployment.yaml (and cleanup job if present) so drift cannot silently break co-location assumptions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/configure-k8s-deployment.md around lines 32 - 33, The recon
check currently only validates claimName in k8s/base/streamlit-deployment.yaml;
extend it to validate the same claimName: workspaces-pvc in all PVC-consuming
manifests (specifically k8s/base/rq-worker-deployment.yaml and any cleanup Job
manifest), i.e., update the check logic that looks for claimName to scan both
Deployment and Job resources and assert claimName === "workspaces-pvc" for each
volume/volumeMount reference; also update tests/assertions for the recon check
to include examples of rq-worker-deployment.yaml and the cleanup job so drift
cannot silently break co-location assumptions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 857f910e-46dd-49cc-a173-ca22b453b46e

📥 Commits

Reviewing files that changed from the base of the PR and between 8e096ac and 8d76292.

📒 Files selected for processing (5)
  • .claude/skills/configure-k8s-deployment.md
  • docs/kubernetes-deployment.md
  • k8s/base/cleanup-cronjob.yaml
  • k8s/base/rq-worker-deployment.yaml
  • k8s/base/streamlit-deployment.yaml
💤 Files with no reviewable changes (3)
  • k8s/base/streamlit-deployment.yaml
  • k8s/base/rq-worker-deployment.yaml
  • k8s/base/cleanup-cronjob.yaml

@t0mdavid-m t0mdavid-m merged commit 9f819a8 into main Apr 27, 2026
8 checks passed
