Align shared PVC writers via fsGroup 2000 and umask 002 #400

Merged
rockfordlhotka merged 1 commit into main from fix/shared-pvc-fsgroup on May 12, 2026

Conversation

@rockfordlhotka
Member

Summary

  • Aligns the three writers of /rockbot/shared — agent (UID 999), shared-cleanup cronjob (UID 0), ephemeral script pods (UID 1000) — under a common fsGroup (2000), so each can read/write/delete the others' files
  • Prefixes the script-pod sh command with umask 002 so new dirs/files are group-writable (combined with setgid from fsGroup, the agent can later clean up script-pod artifacts)
  • Bumps agent to 0.10.61 and chart to 0.10.16
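
To illustrate why the umask prefix matters, here is a minimal Python sketch (not code from this PR) that creates a directory under the common default umask 022 and under umask 002 and compares the resulting permission bits. The helper name `mode_of_new_dir` is hypothetical, and the sketch runs in a temporary directory rather than on the shared PVC.

```python
import os
import stat
import tempfile


def mode_of_new_dir(umask: int) -> int:
    """Create a directory under the given umask and return its rwx permission bits."""
    previous = os.umask(umask)  # os.umask returns the mask that was in effect
    try:
        with tempfile.TemporaryDirectory() as root:
            path = os.path.join(root, "artifact")
            os.makedirs(path)  # requested mode defaults to 0o777, then the umask is applied
            return stat.S_IMODE(os.stat(path).st_mode) & 0o777
    finally:
        os.umask(previous)  # restore the process-wide mask


default_mode = mode_of_new_dir(0o022)  # group loses the write bit
shared_mode = mode_of_new_dir(0o002)   # group keeps the write bit
print(oct(default_mode), oct(shared_mode))
```

With umask 022 the group write bit is masked off, which matches the mode 755 directories described in this PR; with umask 002 only the "other" write bit is removed.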

Why

The morning brief on 2026-05-12 failed when a subagent's execute_python_script called os.makedirs('/rockbot/shared/attachments/teams-bridge-triage-2026-05-12') and got PermissionError: [Errno 13]. Root cause: /rockbot/shared/attachments was drwxr-xr-x 999:999 and the script pod runs as UID 1000, so the script pod fell into the "other" permission class, which has no write bit. Three writers, three UIDs, and no shared group. The fix has already been verified live in the cluster; this PR commits what was deployed.
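
The failure and the fix can be modeled with a simplified owner/group/other check. This is a rough sketch of the classic POSIX rule, not the kernel's full logic (it ignores ACLs and the execute/search bit). The function `can_write` is hypothetical, and the script pod's group membership before the fix (`{1000}`) is an assumption, since the PR does not state it.

```python
def can_write(mode: int, owner_uid: int, owner_gid: int,
              uid: int, gids: set) -> bool:
    """Approximate the owner/group/other write check for a single directory."""
    if uid == 0:
        return True  # root bypasses mode bits (assuming CAP_DAC_OVERRIDE is retained)
    if uid == owner_uid:
        return bool(mode & 0o200)  # owner class
    if owner_gid in gids:
        return bool(mode & 0o020)  # group class
    return bool(mode & 0o002)      # other class


# Before: attachments/ is drwxr-xr-x 999:999 and the script pod runs as UID 1000.
# The pod's groups ({1000}) are assumed for illustration.
before = can_write(0o755, 999, 999, uid=1000, gids={1000})

# After: a script-pod directory is 1000:2000 with mode 2775, and the agent
# (UID 999) carries 2000 as a supplementary group via fsGroup.
after = can_write(0o2775, 1000, 2000, uid=999, gids={999, 2000})

print(before, after)
```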

Test plan

  • helm template renders fsGroup: 2000 on agent deployment, cleanup cronjob, and configmap (Scripts__Container__FsGroup)
  • dotnet test tests/RockBot.Scripts.Tests — 46 pass, 0 fail (existing Contains assertions unaffected by the umask 002 && prefix)
  • Deployed to live cluster (rockbot/default); agent pod's spec.securityContext.fsGroup = 2000, supplementary groups include 2000
  • Smoke pod (runAsUser:1000, fsGroup:2000) successfully created /rockbot/shared/attachments/teams-bridge-triage-smoke-test/hello.txt, under the same attachments/ directory where this morning's mkdir failed
  • After umask change: smoke pod creates dirs drwxrwsr-x 1000:2000 (mode 2775), and agent pod (UID 999, group 2000) can rm -rf them
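
As a quick cross-check of the notation in the last item, the octal mode 2775 on a directory can be decoded with Python's standard `stat` module. This sketch only interprets the mode bits reported above; it does not touch the cluster.

```python
import stat

# Directory type bit plus mode 2775 (setgid + rwxrwxr-x), as reported for the smoke pod's dirs.
mode = stat.S_IFDIR | 0o2775

rendered = stat.filemode(mode)            # symbolic form, as printed by ls -l
has_setgid = bool(mode & stat.S_ISGID)    # new entries inherit the directory's group
group_writable = bool(mode & stat.S_IWGRP)

print(rendered, has_setgid, group_writable)
```

The `s` in the group triplet is the setgid bit: entries created inside the directory inherit group 2000, and the group write bit lets any member of that group remove them.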

🤖 Generated with Claude Code

The agent (UID 999), shared-cleanup cronjob (UID 0), and ephemeral script
pods (UID 1000) all mount /rockbot/shared but had no shared group, so each
writer's dirs (mode 755) blocked the others. Script pods couldn't mkdir
inside attachments/, which broke the morning brief's Teams-bridge triage
this morning with PermissionError on /rockbot/shared/attachments/...

- Add shared.fsGroup (default 2000) to values.yaml
- Apply pod-level securityContext.fsGroup on agent deployment and cleanup
  cronjob templates
- Plumb FsGroup through ContainerScriptOptions and the configmap so the
  agent stamps the same fsGroup on script pods it creates
- Prefix the script pod sh command with 'umask 002 &&' so new files and
  dirs are group-writable; combined with setgid (from fsGroup), the three
  writers can now read, write, and delete each other's artifacts
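
The shape of the resulting script pod can be sketched as follows. This is an illustrative Python sketch only: the actual change lives in the agent's .NET code and Helm templates, and the function `build_script_pod`, the container name, the example command, and the placement of `runAsUser` at pod level are all assumptions, not taken from the repository.

```python
def build_script_pod(script_command: str, fs_group: int = 2000,
                     run_as_user: int = 1000) -> dict:
    """Build a minimal pod manifest fragment (hypothetical helper, not the agent's code)."""
    return {
        "spec": {
            # fsGroup adds the group to every container's supplementary groups.
            "securityContext": {"runAsUser": run_as_user, "fsGroup": fs_group},
            "containers": [{
                "name": "script",  # hypothetical container name
                # Prefix the original command so new files and dirs stay group-writable.
                "command": ["sh", "-c", f"umask 002 && {script_command}"],
            }],
        }
    }


pod = build_script_pod("python /tmp/run.py")  # example command is illustrative
command = pod["spec"]["containers"][0]["command"][2]
print(command)
```

Because the original command is kept intact after the prefix, substring-style assertions on the command still match.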

Bumps agent to 0.10.61 and chart to 0.10.16.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rockfordlhotka rockfordlhotka merged commit 0f49015 into main May 12, 2026
2 checks passed
@rockfordlhotka rockfordlhotka deleted the fix/shared-pvc-fsgroup branch May 12, 2026 20:53