Use native fsGroup for Hermes PVC ownership #514
PR #481 only repaired hermes-<id> volumes after hermes.Sync (master agent). Child agents live under agent-<name> and are provisioned by the controller or agent-factory without that path, so hermes-data stayed 1000:1000 while Hermes runs as 10000:10000 and crash-looped on Permission denied under /data/.hermes.
Extend EnsureHermesDataPVCOwnership to agent-<name>/hermes-data, call it from obol agent new and obol sell demo quant, and add obol agent repair-perms for factory-only creates that cannot docker-exec the k3d node from in-cluster.
Co-authored-by: Cursor <cursoragent@cursor.com>
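The failure mode described above can be illustrated with a simplified POSIX permission check. This is only a sketch: `canWrite` is a hypothetical helper, the directory modes (0755 before, 0775 after) are assumptions not stated in the PR, and the check ignores root, ACLs, and capabilities.

```go
package main

import "fmt"

// canWrite is a hypothetical helper that applies the simplified POSIX rule:
// use the owner bits if the uid matches, otherwise the group bits if any
// gid matches, otherwise the "other" bits.
func canWrite(ownerUID, ownerGID int, mode uint32, uid int, gids []int) bool {
	if uid == ownerUID {
		return mode&0o200 != 0
	}
	for _, g := range gids {
		if g == ownerGID {
			return mode&0o020 != 0
		}
	}
	return mode&0o002 != 0
}

func main() {
	hermesGroups := []int{10000} // Hermes runs as 10000:10000

	// Volume owned by 1000:1000 with an assumed mode of 0755:
	// Hermes falls through to the "other" bits, which lack write.
	fmt.Println(canWrite(1000, 1000, 0o755, 10000, hermesGroups))

	// Volume group-owned by 10000 and group-writable (assumed 0775):
	// Hermes matches the group and can write.
	fmt.Println(canWrite(1000, 10000, 0o775, 10000, hermesGroups))
}
```

Under these assumptions the first check fails and the second succeeds, which matches the Permission denied crash loop on volumes left at 1000:1000.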
Linux coverage validation
Validated PR head
Environment:
Commands run:
Full-suite note:
PR-specific Linux coverage checked:
Verdict: ✅ PR-specific Linux validation passes. The only observed full-suite failure is already present on
macOS/k3d validation report
Validated this PR on a local macOS arm64 host against the running k3d stack.
Go test and coverage
Command:
/usr/local/bin/go test -coverprofile=/tmp/obol-pr514/coverage.out ./internal/hermes ./internal/serviceoffercontroller ./internal/agentcrd ./cmd/obol -count=1
/usr/local/bin/go tool cover -func=/tmp/obol-pr514/coverage.out
Result: pass. Package coverage from the scoped run:
Live master Hermes validation
Built the PR binary locally and synced the existing stack-managed Hermes agent:
go build -o /tmp/obol-pr514/obol ./cmd/obol
/tmp/obol-pr514/obol agent sync --runtime hermes obol-agent
Result: pass. Live Deployment assertions:
{"fsGroup":10000,"fsGroupChangePolicy":"OnRootMismatch","runAsGroup":10000,"runAsUser":10000}
id
# uid=10000(hermes) gid=10000(hermes) groups=10000(hermes)
touch /data/.hermes/.pr514-fsgroup-smoke && rm /data/.hermes/.pr514-fsgroup-smoke
Live CRD child-agent validation
The running cluster initially had the older pinned
OBOL_DEVELOPMENT=true OBOL_FORCE_REBUILD_LOCAL_DEV_IMAGES=serviceoffer-controller /tmp/obol-pr514/obol stack up
Result: pass. Post-sync assertions:
Then created a temporary child Agent CR:
/tmp/obol-pr514/obol agent new pr514-smoke --model qwen3.5:9b --skills addresses --objective "PR 514 fsGroup smoke"
Live child Deployment assertions:
{"fsGroup":10000,"fsGroupChangePolicy":"OnRootMismatch","runAsGroup":10000,"runAsUser":10000}
id
# uid=10000(hermes) gid=10000(hermes) groups=10000(hermes)
touch /data/.hermes/.pr514-child-fsgroup-smoke && rm /data/.hermes/.pr514-child-fsgroup-smoke
Cleanup completed:
Notes
Summary
Why
The ownership fix belongs in the Kubernetes pod security context so kubelet applies it consistently to master and child Hermes PVC mounts. The previous host-side chown path added CLI surface area, host path plumbing, and child-agent-specific call sites.
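As a rough illustration of the pod security context this rationale refers to, the sketch below renders the four values asserted in the validation reports. The `podSecurityContext` struct and both function names are hypothetical, stdlib-only stand-ins. The actual change presumably sets these fields on the Kubernetes `PodSecurityContext` type in the Deployment spec, which is not shown in this conversation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// podSecurityContext is a hypothetical mirror of the four Kubernetes
// PodSecurityContext fields reported by the live Deployment assertions.
type podSecurityContext struct {
	FSGroup             int64  `json:"fsGroup"`
	FSGroupChangePolicy string `json:"fsGroupChangePolicy"`
	RunAsGroup          int64  `json:"runAsGroup"`
	RunAsUser           int64  `json:"runAsUser"`
}

// hermesSecurityContext returns the values reported for both the master
// and the child Hermes Deployments.
func hermesSecurityContext() podSecurityContext {
	return podSecurityContext{
		FSGroup:             10000,
		FSGroupChangePolicy: "OnRootMismatch",
		RunAsGroup:          10000,
		RunAsUser:           10000,
	}
}

// renderSecurityContext marshals the context to compact JSON so it can be
// compared against the assertion output.
func renderSecurityContext() string {
	out, err := json.Marshal(hermesSecurityContext())
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	fmt.Println(renderSecurityContext())
}
```

With `fsGroup` set, kubelet adjusts group ownership of the mounted volume when the pod starts, and `OnRootMismatch` limits the recursive change to volumes whose root directory does not already match. This is why no host-side chown or per-agent call site is needed.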
Validation