Conversation

@jwm4 (Contributor) commented Oct 29, 2025

Yahav and I were both having trouble running vTeam locally with CRC. We were able to create projects and workspaces and start an ideate session, but when we get to the message tab, we see no messages and get no response when we send one.

I looked into the OpenShift console and saw errors like this:

[GIN] GET | 200 | 10.217.0.68 | /api/projects/lendyr-red-local/agentic-sessions/agentic-session-1761252727
[GIN] POST | 202 | 10.217.0.68 | /api/projects/lendyr-red-local/sessions/agentic-session-1761252727/messages
2025/10/28 18:08:25 persistMessageToS3: path: /workspace/sessions/agentic-session-1761252727/messages.jsonl
2025/10/28 18:08:25 persistMessage: open failed: open /workspace/sessions/agentic-session-1761252727/messages.jsonl: no such file or directory

This PR addresses the problem. It also adds a new CONTRIBUTING.md file with a section called "Local Development Setup" that covers getting and using CRC in detail. The CONTRIBUTING.md file contains a lot of generic contributing guidance as well, but I included it in this PR because the "Local Development Setup" section is important for replicating my results showing that this works.

I've addressed most of the CodeRabbit concerns. The remaining ones don't really apply here because these are CRC-specific local development manifests. The storage class is required because CRC has no default storage class (verified in testing); the production manifests live separately in components/manifests/.
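For anyone reproducing the storage class check, a quick way to confirm what the cluster offers is:

oc get storageclass

On my CRC instance this lists crc-csi-hostpath-provisioner with nothing marked as default, which is why the local PVC manifest sets storageClassName explicitly.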

Signed-off-by: Bill Murdock <bmurdock@redhat.com>
@github-actions (Contributor)

PR Review: Update file system config for CRC

Summary

This PR adds persistent volume support to the local development backend deployment to fix a file system issue where message persistence was failing with "no such file or directory" errors.

✅ Positive Aspects

  1. Aligns with Production Configuration: The changes mirror the production deployment pattern in components/manifests/, which already uses a PVC mounted at /workspace. This is good consistency.

  2. Correct PVC Configuration: The PVC spec matches the production workspace-pvc.yaml:

    • ReadWriteOnce access mode (appropriate for single backend replica)
    • 5Gi storage request
    • Proper labels for identification
  3. Proper Deployment Order: The script now creates the PVC before the backend deployment, which is the correct order of operations.

  4. Environment Variable Set: The STATE_BASE_DIR=/workspace env var is explicitly set, matching production configuration.
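For context, the wiring these points refer to looks roughly like this in the local backend deployment (an abridged sketch; the names are taken from the manifests discussed in this review, not a verbatim copy):

containers:
- name: backend
  env:
  - name: STATE_BASE_DIR
    value: /workspace            # matches the production default
  volumeMounts:
  - name: backend-state
    mountPath: /workspace        # where sessions/<id>/messages.jsonl is written
volumes:
- name: backend-state
  persistentVolumeClaim:
    claimName: backend-state-pvc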

🔍 Issues & Recommendations

Critical: Missing Directory Initialization

The root cause of your intermittent success is likely a race condition with directory creation. Looking at the backend code (components/backend/websocket/hub.go:165):

_ = os.MkdirAll(fmt.Sprintf("%s/sessions/%s", StateBaseDir, message.SessionID), 0o755)

This creates the session-specific directory, but there's a potential issue:

  1. Empty PVC: When CRC creates a new PVC, it starts completely empty
  2. Timing Issue: If a message arrives before the first MkdirAll call succeeds, or if the pod restarts, the directory might not exist
  3. Error Handling: The code silently ignores the MkdirAll error (_ = ), so failures aren't logged

Recommendation: Add an init container or startup command to ensure the base directory structure exists:

# In backend-deployment.yaml, add:
initContainers:
- name: init-workspace
  image: quay.io/ambient_code/vteam_backend:latest
  command: ['sh', '-c', 'mkdir -p /workspace/sessions && chmod 755 /workspace/sessions']
  volumeMounts:
  - name: backend-state
    mountPath: /workspace

Low: Inconsistent Label Values

The PVC uses app: vteam-backend but the deployment uses app: backend-api. While this doesn't break functionality, it's inconsistent with the production manifests.

Recommendation: Change backend-pvc.yaml line 6 to:

labels:
  app: backend-api  # Match deployment label
  component: state-storage

Info: Missing storageClassName

CRC will use its default storage class, but for clarity and to match production expectations, consider explicitly setting:

spec:
  storageClassName: crc-csi-hostpath-provisioner  # CRC's default
  accessModes:
    - ReadWriteOnce

Backend Code: Silent Error Handling

In components/backend/websocket/hub.go:165, the MkdirAll error is ignored:

_ = os.MkdirAll(fmt.Sprintf("%s/sessions/%s", StateBaseDir, message.SessionID), 0o755)

Recommendation: Log errors for debugging (separate PR):

if err := os.MkdirAll(fmt.Sprintf("%s/sessions/%s", StateBaseDir, message.SessionID), 0o755); err != nil {
    log.Printf("Failed to create session directory: %v", err)
    return
}

🧪 Testing Recommendations

  1. Clean State Test: Delete the PVC and redeploy to simulate a fresh environment:

    oc delete pvc backend-state-pvc -n vteam-dev
    make dev-stop && make dev-start
  2. Rapid Message Test: Send multiple messages quickly after session creation to test race conditions

  3. Pod Restart Test: Kill the backend pod after creating a session, verify messages persist

  4. Directory Permissions: Exec into the backend pod and verify:

    oc exec -it deployment/backend-api -- ls -la /workspace

📋 Pre-Merge Checklist

  • Add init container to create base directory structure
  • Fix label consistency (vteam-backend → backend-api)
  • Test with clean PVC state
  • Verify messages persist across pod restarts
  • Consider adding backend logging improvements (can be separate PR)

Summary

The PR is on the right track and correctly identifies the need for persistent storage. The main issue is likely empty PVC initialization causing intermittent failures. Adding an init container should resolve the "no such file or directory" errors consistently.

Recommended Action: Add the init container, test with a clean PVC, and this should be ready to merge.


Review based on CLAUDE.md guidelines and production deployment patterns

@github-actions (Contributor)

Code Review: Update file system config for crc

I've reviewed the changes in this PR. Overall, this is a solid fix for the filesystem persistence issue in the local CRC development environment. Here's my detailed feedback:

✅ What's Working Well

  1. Root Cause Identified: The PR correctly identifies and addresses the issue where /workspace/sessions directory doesn't exist, causing message persistence to fail.

  2. Init Container Pattern: Using an initContainer to create the directory structure is the right approach - it ensures the directory exists before the main backend container starts.

  3. PVC Creation Order: The PR correctly creates the PVC before deploying the backend, which prevents race conditions.

  4. Consistency with Production: The new backend-pvc.yaml matches the production manifest at components/manifests/workspace-pvc.yaml, ensuring dev/prod parity.

🔍 Code Quality Issues

1. Missing Storage Class (Medium Priority)

The PVC doesn't specify a storageClassName. While this will use the cluster default, it's better to be explicit for CRC:

spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""  # Use default storage class explicitly
  resources:
    requests:
      storage: 5Gi

Or for CRC-specific storage:

  storageClassName: crc-csi-hostpath-provisioner  # CRC's default

2. Init Container Permissions (Low Priority)

The init container command uses chmod 755. Consider:

  • Security: Should this be 750 or 700 to restrict access?
  • Ownership: The command doesn't set ownership, relying on the container's default UID. This is fine, but document it.

Current:

command: ['sh', '-c', 'mkdir -p /workspace/sessions && chmod 755 /workspace/sessions']

Consider:

command: ['sh', '-c', 'mkdir -p /workspace/sessions && chmod 750 /workspace/sessions']

3. Missing SecurityContext (Medium Priority)

According to the CLAUDE.md Backend Development Standards, Job pods should have SecurityContext. The backend deployment is missing this:

containers:
- name: backend
  image: ...
  securityContext:
    allowPrivilegeEscalation: false
    runAsNonRoot: true
    capabilities:
      drop: ["ALL"]

This should be added for production-readiness, though it may not be the cause of the current issue.

🐛 Potential Issues

1. Init Container Image Pull (High Priority)

The init container uses:

image: image-registry.openshift-image-registry.svc:5000/vteam-dev/vteam-backend:latest

This assumes the backend image is already built and available. If someone runs make dev-start on a fresh cluster, this will fail because:

  • The init container runs before the main container
  • The image is built via oc start-build which happens before the deployment
  • But there's still a potential race condition if the build isn't complete

Recommendation: Add error handling or use a busybox-based init container:

initContainers:
- name: init-workspace
  image: registry.access.redhat.com/ubi9/ubi-minimal:latest
  command: ['sh', '-c', 'mkdir -p /workspace/sessions && chmod 750 /workspace/sessions']
  volumeMounts:
  - name: backend-state
    mountPath: /workspace

This is more reliable and doesn't depend on the backend image.

2. Directory Creation in Application Code (Low Priority)

Looking at components/backend/websocket/hub.go:165:

_ = os.MkdirAll(fmt.Sprintf("%s/sessions/%s", StateBaseDir, message.SessionID), 0o755)

The code already tries to create the directory. The issue is that /workspace/sessions doesn't exist, so creating /workspace/sessions/SESSION_ID fails.

Why the init container works: It creates the parent /workspace/sessions directory.

Alternative approach: Fix the application code to create the full path:

sessionDir := fmt.Sprintf("%s/sessions/%s", StateBaseDir, message.SessionID)
if err := os.MkdirAll(sessionDir, 0o755); err != nil {
    log.Printf("Failed to create session directory: %v", err)
    return
}

This should already work, so the issue might be permissions on /workspace itself. Verify PVC mount permissions.

📋 Testing Recommendations

  1. Fresh Cluster Test: Test on a completely fresh CRC instance to verify the build order works correctly.

  2. Permission Verification: After deployment, exec into the backend pod and verify:

    oc exec -it deployment/vteam-backend -- ls -la /workspace
    oc exec -it deployment/vteam-backend -- ls -la /workspace/sessions
  3. Message Persistence: Create a session, send messages, restart the backend pod, and verify messages persist.

  4. PVC Cleanup: Test that PVC cleanup works when running make dev-clean.

🔒 Security Considerations

  1. No sensitive data concerns: The PVC stores session messages, which is appropriate.

  2. RBAC: The backend service account needs permissions to create PVCs if not already granted. Verify in components/scripts/local-dev/manifests/ that RBAC allows this.

  3. Resource limits: The backend deployment has appropriate resource limits (512Mi memory, 500m CPU).
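For the RBAC point (item 2), a quick check is an impersonated can-i query; the service account name below is an assumption, so substitute whatever the local-dev manifests actually create:

oc auth can-i create persistentvolumeclaims -n vteam-dev \
  --as=system:serviceaccount:vteam-dev:backend-api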

🎯 Performance Considerations

  1. Storage Size: 5Gi is reasonable for local dev. Monitor usage over time.

  2. ReadWriteOnce: Correctly limits to single replica, which matches the deployment spec (replicas: 1).

  3. No performance concerns: PVC I/O should be sufficient for local dev message persistence.

✨ Suggestions for Improvement

  1. Add comments explaining why the init container is needed:

    initContainers:
    # Create /workspace/sessions directory before backend starts
    # Required for message persistence in websocket/hub.go
    - name: init-workspace
  2. Document the storage requirement in the local dev docs or Makefile.

  3. Consider adding health check for the PVC mount in the readiness probe.
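For suggestion 3, one minimal way to surface a missing or read-only /workspace mount is an exec-based probe that writes and removes a marker file (a sketch only; the probe timings are arbitrary):

readinessProbe:
  exec:
    command: ['sh', '-c', 'touch /workspace/.probe && rm -f /workspace/.probe']
  initialDelaySeconds: 5
  periodSeconds: 30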

📝 Summary

Recommendation: Approve with minor changes

The PR solves the immediate problem and follows the right patterns. The main improvements needed are:

  1. Use a minimal/busybox image for the init container (higher reliability)
  2. Add SecurityContext to the backend container
  3. Add inline comments explaining the init container purpose
  4. Test on a fresh cluster to verify build ordering

Priority fixes before merge:

  • High: Switch to busybox/UBI-minimal for init container
  • Medium: Add SecurityContext

Can be addressed in follow-up PRs:

  • Storage class specification
  • Improved error handling in application code
  • Documentation updates

Great work identifying and fixing this issue! 🎉

@github-actions (Contributor)

Code Review: Update file system config for crc

Thanks for working on this issue! I've reviewed the PR and have several observations and suggestions.


✅ Positive Aspects

  1. Root Cause Identified: You correctly identified that the backend needs persistent storage for the /workspace/sessions directory where messages are stored (via persistMessageToS3 in websocket/hub.go:158-175).

  2. Follows Production Pattern: Your approach mirrors the production deployment in components/manifests/backend-deployment.yaml which already uses a PVC mounted at /workspace.

  3. STATE_BASE_DIR Environment Variable: Correctly added STATE_BASE_DIR=/workspace which the backend reads in server/k8s.go:67-70.


🔍 Issues and Recommendations

1. Init Container Security Context Missing

Issue: The init container lacks a securityContext, which violates Kubernetes security best practices outlined in CLAUDE.md.

From CLAUDE.md:

Always set SecurityContext for Job pods

Recommendation: Add security context to the init container:

initContainers:
- name: init-workspace
  image: image-registry.openshift-image-registry.svc:5000/vteam-dev/vteam-backend:latest
  command: ['sh', '-c', 'mkdir -p /workspace/sessions && chmod 755 /workspace/sessions']
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
        - ALL
    runAsNonRoot: true
  volumeMounts:
  - name: backend-state
    mountPath: /workspace

2. Init Container May Be Unnecessary

Issue: The persistMessageToS3 function already creates directories:

// websocket/hub.go:165
_ = os.MkdirAll(fmt.Sprintf("%s/sessions/%s", StateBaseDir, message.SessionID), 0o755)

Question: Is the init container actually needed, or does the issue stem from the PVC not being mounted at all in the original deployment?

Recommendation: Test whether removing the init container still works. If the backend creates directories correctly, the init container adds unnecessary complexity.

3. Missing Resource Limits

Issue: Neither the PVC creation step nor the init container have resource constraints.

Recommendation:

  • The PVC size (5Gi) seems reasonable for message storage, but should be validated against expected usage patterns.
  • Add resource requests/limits to the init container if kept:
initContainers:
- name: init-workspace
  # ...
  resources:
    requests:
      memory: "32Mi"
      cpu: "10m"
    limits:
      memory: "64Mi"
      cpu: "50m"

4. PVC Creation Timing

Issue: The PVC is created immediately before the backend deployment (crc-start.sh:307-311), which is correct. However, consider adding a wait condition to ensure the PVC is bound before deploying the backend.

Recommendation:

log "Creating backend PVC..."
oc apply -f "${MANIFESTS_DIR}/backend-pvc.yaml" -n "$PROJECT_NAME"

# Wait for PVC to be bound
log "Waiting for PVC to be bound..."
oc wait --for=jsonpath='{.status.phase}'=Bound pvc/backend-state-pvc -n "$PROJECT_NAME" --timeout=60s || {
  log "WARNING: PVC did not bind within 60 seconds"
}

5. Storage Class Not Specified

Issue: The PVC does not specify a storageClassName, relying on the cluster default. This might work but could cause issues if CRC's default storage class has limitations.

Recommendation: Explicitly set the storage class for CRC:

spec:
  storageClassName: crc-csi-hostpath-provisioner  # CRC's default
  accessModes:
    - ReadWriteOnce

🧪 Testing Recommendations

  1. Verify PVC Creation: After running make dev-start, check:

    oc get pvc -n vteam-dev
    oc describe pvc backend-state-pvc -n vteam-dev
  2. Check Volume Mount: Verify the volume is mounted in the backend pod:

    oc exec -n vteam-dev deployment/vteam-backend -- ls -la /workspace
    oc exec -n vteam-dev deployment/vteam-backend -- df -h | grep workspace
  3. Test Message Persistence: Create a session, send messages, then restart the backend pod and verify messages are still retrievable.

  4. Check Permissions: Ensure the backend process can write to /workspace/sessions:

    oc exec -n vteam-dev deployment/vteam-backend -- touch /workspace/sessions/test.txt

🐛 Potential Root Cause Analysis

Based on the error message you shared:

open /workspace/sessions/agentic-session-1761252727/messages.jsonl: no such file or directory

The issue is likely NOT that the directory does not exist (since os.MkdirAll is called), but rather:

  1. PVC was never mounted in the original local-dev deployment (confirmed by comparing to origin/main)
  2. The /workspace directory might be ephemeral (container filesystem), so restarts lose data
  3. Permission issues - the backend might not have write permissions to the container's /workspace directory

Your fix (adding the PVC) should resolve issue #1, which is the most likely culprit.


📋 Suggested Changes Summary

  1. ✅ Add PVC: Already done
  2. ✅ Mount PVC: Already done
  3. ⚠️ Add security context to init container
  4. ❓ Consider removing init container (test without it first)
  5. 🔧 Add PVC wait condition to deployment script
  6. 🔧 Explicitly set storage class in PVC spec
  7. 🧪 Add the testing steps above to validate the fix

🎯 Next Steps

  1. Address the security context issue (required by project standards)
  2. Test whether the init container is actually necessary
  3. Add the PVC wait condition for more robust deployments
  4. Consider adding a comment in backend-pvc.yaml explaining why this PVC is needed for local dev (mirrors production pattern)

Overall, this is a solid fix that aligns with the production deployment pattern. With the minor adjustments above, it should reliably solve the message persistence issue in CRC. Great debugging work! 🚀

@github-actions (Contributor)

Code Review: Update file system config for crc

Thank you for addressing the local development filesystem issue! This PR correctly identifies and fixes the root cause of the message persistence problem.

Positive Aspects

  1. Correct Root Cause Identification: The error was correctly traced to missing PVC mount
  2. Consistency with Production: Aligns with production deployment patterns
  3. Security Best Practices: InitContainer properly configured with minimal permissions
  4. Proper Resource Naming: Clear and descriptive naming

Critical Issues

1. StorageClassName Hardcoded to CRC

File: components/scripts/local-dev/manifests/backend-pvc.yaml:11

Problem: storageClassName: crc-csi-hostpath-provisioner is CRC-specific and will fail on other clusters.

Impact: Production PVC omits storageClassName to use cluster default. This divergence breaks portability.

Recommendation: Remove the storageClassName field entirely to match production.

2. No PVC Binding Verification

File: components/scripts/local-dev/crc-start.sh:307-308

Problem: No verification that PVC is bound before deploying backend.

Recommendation: Add wait condition to ensure PVC is bound before proceeding.
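A minimal sketch of that wait in crc-start.sh (PVCs report binding via status.phase rather than a condition, so the jsonpath form is the one that works; note that with a WaitForFirstConsumer storage class the PVC may stay Pending until the backend pod schedules, so treat a timeout as a warning rather than a hard failure):

oc wait --for=jsonpath='{.status.phase}'=Bound pvc/backend-state-pvc \
  -n "$PROJECT_NAME" --timeout=60s || log "WARNING: PVC not bound after 60s"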

3. InitContainer Image Not Pinned

File: components/scripts/local-dev/manifests/backend-deployment.yaml:20

Recommendation: Pin to specific version (e.g., ubi9/ubi-minimal:9.5) instead of :latest

Testing Recommendations

  1. Test PVC recreation workflow
  2. Test message persistence across backend restarts
  3. Test on fresh CRC instance
  4. Test multi-session scenarios

Summary

Overall Assessment: ✅ Approve with Minor Changes

This PR correctly fixes the root cause. Implementation is sound and follows security best practices.

Primary blocker: Remove hardcoded storageClassName to match production

Secondary improvements: Add PVC binding verification, update documentation

Great work tracking this down! Once storageClassName is addressed, this should resolve the issues.

Related Files

  • Production PVC: components/manifests/workspace-pvc.yaml
  • Production Backend: components/manifests/backend-deployment.yaml
  • Backend State Logic: components/backend/websocket/hub.go:158-175

@github-actions (Contributor)

Code Review: Update file system config for CRC

Thanks for working on this important fix! I've reviewed the PR and have several observations about the implementation. The core approach of adding a PVC for persistent storage is sound, but there are some areas that need attention.


🎯 Summary

This PR addresses the no such file or directory error when persisting messages in CRC environments by:

  1. Adding a PVC (backend-state-pvc) for persistent storage
  2. Mounting the PVC at /workspace in the backend pod
  3. Using an init container to pre-create the sessions directory
  4. Adding the comprehensive CONTRIBUTING.md documentation

What's Working Well

1. Root Cause Analysis

  • Correctly identified that the backend needs persistent storage for the /workspace/sessions/ directory
  • The error open /workspace/sessions/agentic-session-1761252727/messages.jsonl: no such file or directory stems from missing directory structure

2. CONTRIBUTING.md

  • Excellent addition! This is comprehensive and well-structured
  • Clear setup instructions for CRC 2.54.0 (addresses known certificate issues)
  • Detailed troubleshooting section
  • Good coverage of development workflows and hot-reloading

3. Init Container Pattern

  • Using an init container to pre-create directories is a valid Kubernetes pattern
  • Security context follows best practices (drop all capabilities)

⚠️ Issues & Concerns

1. Critical: Directory Creation Logic Issue

Problem: The init container creates /workspace/sessions, but the backend code in websocket/hub.go:165 already does this:

_ = os.MkdirAll(fmt.Sprintf("%s/sessions/%s", StateBaseDir, message.SessionID), 0o755)

Why it might still fail:

  • The init container runs as ubi-minimal with user 1000700000 (OpenShift random UID)
  • The backend pod runs as a different UID (also OpenShift random UID)
  • Directory ownership mismatch could prevent the backend from writing files

Recommended Fix:

initContainers:
- name: init-workspace
  image: registry.access.redhat.com/ubi9/ubi-minimal:latest
  command: ['sh', '-c', 'mkdir -p /workspace/sessions && chmod 777 /workspace/sessions']  # More permissive for multi-user
  securityContext:
    runAsNonRoot: true  # Add this for OpenShift compliance
    allowPrivilegeEscalation: false
    capabilities:
      drop:
      - ALL

Or better yet, remove the init container entirely and let the backend's MkdirAll handle it (it should work if PVC permissions are correct).


2. Medium: Storage Class Hardcoded

storageClassName: crc-csi-hostpath-provisioner

Issues:

  • This storage class is CRC-specific and won't work on:
    • Production OpenShift clusters (uses ocs-storagecluster-ceph-rbd or similar)
    • Other Kubernetes distributions
    • Cloud providers (AWS EBS, Azure Disk, GCP PD)

Recommended Fix:

# Remove storageClassName entirely to use cluster default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  # storageClassName: omit for default, or parameterize via env var

Or add logic in crc-start.sh to detect the environment and apply the correct storage class.
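A rough sketch of that detection for crc-start.sh (this assumes the PVC manifest ships with the storageClassName line commented out, as in the snippet above; the variable names are taken from the script excerpts earlier in this thread):

if oc get storageclass crc-csi-hostpath-provisioner >/dev/null 2>&1; then
  # CRC detected: set the CRC-specific storage class on the fly
  sed 's/# storageClassName:.*/storageClassName: crc-csi-hostpath-provisioner/' \
    "${MANIFESTS_DIR}/backend-pvc.yaml" | oc apply -n "$PROJECT_NAME" -f -
else
  # Any other cluster: rely on the default storage class
  oc apply -f "${MANIFESTS_DIR}/backend-pvc.yaml" -n "$PROJECT_NAME"
fi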


3. Medium: Security Context Missing in Backend Container

The backend container lacks a securityContext, which violates the patterns documented in CLAUDE.md:

Current:

containers:
- name: backend
  image: ...
  # No securityContext

Should be (per CLAUDE.md standards):

containers:
- name: backend
  image: ...
  securityContext:
    allowPrivilegeEscalation: false
    readOnlyRootFilesystem: false  # Can't be true due to /workspace writes
    capabilities:
      drop:
      - ALL
    runAsNonRoot: true  # OpenShift compliance

4. Low: STATE_BASE_DIR Environment Variable

The new STATE_BASE_DIR=/workspace env var is being set, but this is already the default in server/k8s.go:69:

StateBaseDir = os.Getenv("STATE_BASE_DIR")
if StateBaseDir == "" {
    StateBaseDir = "/workspace"  // Already defaults to /workspace
}

Impact: Harmless redundancy, but unnecessary.

Recommendation: Remove the env var unless you need a different path.


5. Low: PVC Size May Be Insufficient

storage: 5Gi

Considerations:

  • Messages for active sessions with large outputs could accumulate quickly
  • Multiple concurrent sessions multiply storage needs
  • No cleanup mechanism visible in the code

Recommendations:

  • Consider 10Gi or 20Gi for production workloads
  • Implement message retention/cleanup policies
  • Monitor PVC usage
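For the retention point, nothing like this exists in the backend today, but a throwaway sweep along these lines (the path and 30-day window are assumptions) could run as a CronJob or on startup:

package main

import (
    "log"
    "os"
    "path/filepath"
    "time"
)

func main() {
    const retentionDays = 30
    cutoff := time.Now().AddDate(0, 0, -retentionDays)
    root := "/workspace/sessions"

    // Walk the sessions tree and delete message files older than the cutoff.
    err := filepath.Walk(root, func(path string, info os.FileInfo, walkErr error) error {
        if walkErr != nil {
            return walkErr
        }
        if !info.IsDir() && info.ModTime().Before(cutoff) {
            log.Printf("removing %s (last modified %s)", path, info.ModTime().Format(time.RFC3339))
            return os.Remove(path)
        }
        return nil
    })
    if err != nil {
        log.Printf("retention sweep failed: %v", err)
    }
}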

6. Code Quality: Missing Error Handling

In websocket/hub.go:165, the directory creation error is silently ignored:

_ = os.MkdirAll(fmt.Sprintf("%s/sessions/%s", StateBaseDir, message.SessionID), 0o755)

Per CLAUDE.md standards:

"Never": Silent failures (always log errors)

Recommended Fix:

sessionsDir := fmt.Sprintf("%s/sessions/%s", StateBaseDir, message.SessionID)
if err := os.MkdirAll(sessionsDir, 0o755); err != nil {
    log.Printf("Failed to create sessions directory %s: %v", sessionsDir, err)
    return  // Don't attempt to write if directory creation failed
}

🔒 Security Review

✅ Good:

  • Init container drops all capabilities
  • Using allowPrivilegeEscalation: false
  • PVC uses ReadWriteOnce (appropriate for single replica)

⚠️ Needs Attention:

  • Backend container missing security context (see issue 3 above)
  • Directory permissions (755 vs 777) need testing with OpenShift random UIDs
  • No runAsNonRoot constraint (OpenShift will enforce, but should be explicit)

🚀 Performance Considerations

Positive:

  • PVC on CRC uses local hostpath storage (fast)
  • 5Gi should handle moderate message volumes

Potential Issues:

  • No message cleanup strategy: Messages accumulate indefinitely
  • JSONL append pattern: Sequential writes are fine, but no indexing for retrieval
  • Shared PVC: If you scale backend to multiple replicas, ReadWriteOnce will block scheduling

🧪 Testing Recommendations

Before merging, please verify:

  1. Basic Message Flow:

    make dev-start
    # Create session, send message, verify it appears in UI
  2. PVC Persistence:

    # Send messages
    oc delete pod -l app=vteam-backend -n vteam-dev
    # Wait for pod restart, verify messages still visible
  3. Directory Permissions:

    oc rsh deployment/vteam-backend -n vteam-dev
    ls -la /workspace/sessions
    # Verify backend can write files
  4. Error Handling:

    # Check logs for any permission errors
    make dev-logs-backend

📋 Action Items

Must Fix Before Merge:

  • Test with actual message flow (author mentioned "worked once, but can't replicate")
  • Fix directory ownership/permissions if backend still can't write
  • Add security context to backend container
  • Remove hardcoded storage class or make it conditional

Should Fix:

  • Add error logging for MkdirAll in websocket/hub.go:165
  • Consider removing init container (redundant with MkdirAll)
  • Test on production OpenShift to validate PVC provisioning

Nice to Have:

  • Implement message cleanup policy (retention period)
  • Add PVC usage monitoring/alerts
  • Document message persistence behavior in user docs

🎓 Learning Notes

This PR highlights a common Kubernetes pattern challenge: stateful data in containerized environments. Key lessons:

  1. Ephemeral vs Persistent Storage: Containers have ephemeral filesystems; use PVCs for data that must survive pod restarts
  2. OpenShift Random UIDs: Init containers and app containers run as different UIDs, requiring careful permission management
  3. Environment-Specific Configuration: Storage classes vary across environments (CRC vs production)


💡 Next Steps

  1. Address the "can't replicate success" issue - this suggests intermittent problems
  2. Check OpenShift console logs for permission denied errors
  3. Consider posting debug output from a failed attempt
  4. Test the permission fix (chmod 777 or removing init container)

Happy to help debug further if you share logs from a failed attempt! 🔍

@github-actions (Contributor)

PR Review: Update file system config for crc

I've reviewed PR #216 and the changes look good overall. The fix properly addresses the filesystem permission issue in CRC by adding a PVC and init container. Here's my detailed feedback:

✅ Strengths

  1. Root Cause Fix: The PR correctly identifies and fixes the "no such file or directory" error by ensuring the /workspace/sessions directory exists before the backend tries to write messages.

  2. Security Best Practices: The init container follows proper security patterns:

    • allowPrivilegeEscalation: false
    • Drops all capabilities
    • Uses Red Hat UBI minimal image
  3. Proper Ordering: The crc-start.sh script now creates the PVC before deploying the backend, ensuring the volume is available.

  4. Consistency with Production: The PVC configuration aligns with the production manifest (components/manifests/workspace-pvc.yaml).

  5. Excellent Documentation: The new CONTRIBUTING.md is comprehensive and provides clear CRC setup instructions that will help other developers reproduce the fix.

🔍 Issues & Recommendations

1. Security Context Missing on Backend Container (Medium Priority)

The production manifest (components/manifests/backend-deployment.yaml) doesn't have a securityContext on the backend container, but according to CLAUDE.md standards, it should:

CLAUDE.md requirement (line 1050-1057):

// Always set SecurityContext for Job pods
SecurityContext: &corev1.SecurityContext{
    AllowPrivilegeEscalation: boolPtr(false),
    ReadOnlyRootFilesystem:   boolPtr(false),  // Only if temp files needed
    Capabilities: &corev1.Capabilities{
        Drop: []corev1.Capability{"ALL"},
    },
},

Recommendation: Add a securityContext to the backend container in backend-deployment.yaml:

containers:
- name: backend
  image: image-registry.openshift-image-registry.svc:5000/vteam-dev/vteam-backend:latest
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
      - ALL
  # ... rest of container spec

This should be applied to both the local-dev and production manifests for consistency.

2. Init Container Permissions (Low Priority)

The init container runs as root (by default) to create directories. While this works, consider if it's necessary:

Current approach:

initContainers:
- name: init-workspace
  command: ['sh', '-c', 'mkdir -p /workspace/sessions && chmod 755 /workspace/sessions']

Alternative consideration: If the PVC mounts with proper permissions, the init container might not be needed at all since os.MkdirAll in line 165 of websocket/hub.go already creates directories:

_ = os.MkdirAll(fmt.Sprintf("%s/sessions/%s", StateBaseDir, message.SessionID), 0o755)

Question: Does CRC require the parent /workspace/sessions directory to exist before the backend can create subdirectories? If testing shows the init container is necessary, that's fine - just worth documenting why.
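(For reference, os.MkdirAll creates every missing parent, so a missing /workspace/sessions by itself would not make the backend's call fail; a quick standalone check, runnable anywhere:)

package main

import (
    "fmt"
    "os"
    "path/filepath"
)

func main() {
    dir, _ := os.MkdirTemp("", "workspace")
    defer os.RemoveAll(dir)
    // Neither "sessions" nor the session subdirectory exists yet.
    err := os.MkdirAll(filepath.Join(dir, "sessions", "agentic-session-123"), 0o755)
    fmt.Println("MkdirAll error:", err) // prints <nil>: parents are created as needed
}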

3. Storage Class Hardcoding (Medium Priority)

The PVC manifest hardcodes the CRC storage class:

storageClassName: crc-csi-hostpath-provisioner

Issue: This will only work in CRC environments. If someone tries to use these manifests elsewhere, it will fail.

Recommendation: Make this configurable or document it clearly. Options:

  • Add a comment explaining this is CRC-specific
  • Use a script variable or kustomize overlay
  • Document in CONTRIBUTING.md that this needs to be changed for other environments

The production manifest (components/manifests/workspace-pvc.yaml) doesn't specify storageClassName, which allows the cluster's default storage class to be used.

4. Error Handling in persistMessageToS3 (Low Priority)

Line 165 in websocket/hub.go silently ignores MkdirAll errors:

_ = os.MkdirAll(fmt.Sprintf("%s/sessions/%s", StateBaseDir, message.SessionID), 0o755)

While the subsequent OpenFile will fail if the directory can't be created, it would be better to log the error for debugging:

if err := os.MkdirAll(fmt.Sprintf("%s/sessions/%s", StateBaseDir, message.SessionID), 0o755); err != nil {
    log.Printf("persistMessage: MkdirAll failed: %v", err)
    return
}

This isn't a blocker for this PR since it's existing code, but worth noting for future improvements.

📝 Documentation Quality

The CONTRIBUTING.md file is excellent:

  • Clear CRC version recommendation (2.54.0) with rationale
  • Step-by-step setup instructions
  • Troubleshooting section
  • Hot-reload development workflow
  • Aligns with existing CLAUDE.md standards

🧪 Testing Validation

Question for the author: Have you tested that:

  1. Messages now persist correctly in CRC?
  2. Messages survive backend pod restarts (PVC persistence)?
  3. The init container successfully creates the directory structure?

⚡ Performance Considerations

No performance concerns - the changes are minimal and appropriate for local development.

🎯 Recommendations Summary

Before merging:

  1. ✅ Add securityContext to backend container (both local-dev and production manifests)
  2. ✅ Add comment explaining CRC-specific storage class or make it configurable

Optional improvements (can be separate PRs):
3. Consider if init container is truly necessary after adding PVC
4. Improve error logging in persistMessageToS3 (existing code improvement)

Conclusion

This is a solid fix that addresses a real issue preventing local development with CRC. The changes are well-thought-out and the documentation additions are valuable. With the minor security enhancements mentioned above, this will be ready to merge.

Great work identifying and fixing this issue! 🎉

@jwm4 marked this pull request as ready for review October 29, 2025 23:02

@Gkrumbach07 (Collaborator) left a comment


looks great

@Gkrumbach07 Gkrumbach07 merged commit 8cad2f2 into ambient-code:main Oct 31, 2025
19 checks passed
