Fix: Add node affinity to prevent Multi-Attach PVC errors #344
Provides cold-start implementation instructions for fixing AgenticSessions that get stuck in the Creating phase because of PVC Multi-Attach errors, which occur when init containers and main containers that use ReadWriteOnce (RWO) PVCs are scheduled on different nodes.
Claude Code Review

Summary
This PR adds implementation guidance for fixing Multi-Attach PVC errors when AgenticSessions get stuck in "Creating" phase. The PR contains only a documentation file with implementation instructions - no actual code changes yet. The proposed solution uses PodAffinity to co-locate all containers on the same node as the PVC attachment.

Issues by Severity

🚫 Blocker Issues
1. INCORRECT AFFINITY TYPE - WILL NOT SOLVE THE PROBLEM
2. DOCUMENTATION DESCRIBES NON-EXISTENT CODE

🔴 Critical Issues
3. MISSING ROOT CAUSE INVESTIGATION
4. AFFINITY IMPLEMENTATION VIOLATES KUBERNETES BEST PRACTICES
5. REAL ISSUE: VOLUME TOPOLOGY NOT CONSIDERED

🟡 Major Issues
6. INCORRECT IMPORT REFERENCE
7. DOCUMENTATION STRUCTURE DOESN'T MATCH REPO STANDARDS
8. TESTING SECTION LACKS VERIFICATION OF FIX

🔵 Minor Issues
9. ALTERNATIVE APPROACHES MISSING THE ACTUAL BEST PRACTICE
10. ROLLBACK INSTRUCTIONS INCOMPLETE

Positive Highlights
✅ Good Documentation Structure: The implementation prompt follows a clear problem/solution/testing format that's easy to follow.
✅ Rollback Considered: Including rollback instructions shows operational awareness.
✅ Multiple Approaches Evaluated: The document considers alternative approaches (RWX, single initContainer, node selector) which shows thoughtful analysis.
✅ Links to Issue: PR description properly references issue #298.

Recommendations
CRITICAL - DO NOT MERGE WITHOUT THESE CHANGES:

Overall Assessment: ❌ DO NOT MERGE - The proposed solution uses an incorrect Kubernetes API (PodAffinity instead of volume topology awareness) and is based on an incorrect understanding of the codebase (it references a non-existent temp-content initContainer). The PR needs fundamental rework of both the problem analysis and the proposed solution.

Next Steps:
Fixes #298
Problem
AgenticSessions get stuck in the "Creating" phase because the temp-content and Job pods are scheduled on different nodes while using RWO PVCs.
Solution
Add node affinity to ensure all containers schedule on the same node as the PVC attachment.
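To make the idea concrete, here is a minimal sketch of what a node-affinity fragment pinning a pod to a single node could look like, built as a plain Python dictionary. The field names (`nodeAffinity`, `requiredDuringSchedulingIgnoredDuringExecution`, `nodeSelectorTerms`, `matchExpressions`) and the `kubernetes.io/hostname` label are standard Kubernetes. The helper names `node_affinity_for` and `apply_affinity`, the example container, and the node name `worker-1` are hypothetical. How the node holding the PVC attachment would be discovered is not described in this PR, so it is simply passed in as a parameter. This is an illustration of the stated approach, not the project's implementation.

```python
import json


def node_affinity_for(node_name: str) -> dict:
    """Return a pod-spec `affinity` fragment that requires scheduling on one node.

    Assumption: the target node is identified by its `kubernetes.io/hostname` label.
    """
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": "kubernetes.io/hostname",
                                "operator": "In",
                                "values": [node_name],
                            }
                        ]
                    }
                ]
            }
        }
    }


def apply_affinity(pod_spec: dict, node_name: str) -> dict:
    """Return a shallow copy of `pod_spec` with the node affinity set (hypothetical helper)."""
    patched = dict(pod_spec)  # leave the caller's top-level dict untouched
    patched["affinity"] = node_affinity_for(node_name)
    return patched


if __name__ == "__main__":
    # Hypothetical pod spec and node name, for illustration only.
    spec = {"containers": [{"name": "runner", "image": "example/runner:latest"}]}
    print(json.dumps(apply_affinity(spec, "worker-1"), indent=2))
```

A hard requirement like this makes the pod unschedulable if the named node is unavailable, which is one reason the trade-offs of pinning should be weighed before adopting it.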
Implementation
See docs/implementation-prompts/fix-multi-attach-node-affinity.md for complete cold-startable implementation instructions.
Testing Needed
Note: This is a draft PR with implementation guidance. Code changes still need to be implemented.