feat(security): Restricted Pod Security Standard across embedded workloads #521

Closed
bussyjd wants to merge 1 commit into main from feat/restricted-pss-sweep
bussyjd commented May 23, 2026

Why

Architecture review surfaced this as P0 security: grep across every embedded Deployment manifest returned zero hits for securityContext, runAsNonRoot, readOnlyRootFilesystem, or seccompProfile. The serviceoffer-controller Dockerfile uses gcr.io/distroless/static-debian12 (not :nonroot) — it runs as UID 0. Container escape via a Go runtime CVE on a UID-0, no-seccomp, no-cap-drop, RW-rootfs container is the easiest path to host pivot on k3s single-node.

Before

   Every embedded pod:
     runAsUser:        not set        -> UID 0 (root) inside container
     runAsNonRoot:     false (default)
     readOnlyRootFs:   false           -> can scribble /usr, /etc, anywhere
     capabilities:     [default]       -> keeps DAC_OVERRIDE, NET_BIND_SERVICE, etc.
     seccompProfile:   Unconfined      -> all syscalls allowed
     namespace PSS:    none            -> no admission gate

   Result: a Go runtime CVE in the controller -> root in pod -> trivially
           escape to host on single-node k3s.

After

   Every embedded pod:
     runAsUser:        65532
     runAsNonRoot:     true
     readOnlyRootFs:   true (named emptyDir where writes are needed)
     capabilities:     drop: [ALL]
     seccompProfile:   RuntimeDefault
     namespace PSS:    enforce: restricted (apiserver-side admission gate)

   serviceoffer-controller Dockerfile:
     gcr.io/distroless/static-debian12 -> :nonroot variant
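
As a rough illustration of the "After" settings, a Deployment that satisfies them could look like the sketch below. The Deployment name, labels, and image reference are placeholders and are not taken from this PR's manifests; only the securityContext values mirror the list above. Note that readOnlyRootFilesystem is not itself required by the Restricted profile; it is extra hardening applied on top.

```yaml
# Hypothetical Deployment: name, labels, and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-controller
  namespace: x402
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-controller
  template:
    metadata:
      labels:
        app: example-controller
    spec:
      # Pod-level settings apply to every container in the pod.
      securityContext:
        runAsNonRoot: true
        runAsUser: 65532
        runAsGroup: 65532
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: controller
          image: example.invalid/controller:latest   # placeholder image
          # Container-level settings; these fields cannot be set at pod level.
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
```

Splitting the fields this way keeps UID, GID, and seccomp in one place per pod, while the fields that only exist on containers are repeated for each container.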

Files

  • Dockerfile.serviceoffer-controller (:nonroot base, UID 65532)
  • internal/embed/infrastructure/base/templates/x402.yaml — verifier + controller pod/container securityContext, x402 ns PSS label
  • internal/embed/infrastructure/base/templates/llm.yaml — litellm + x402-buyer securityContext, litellm-tmp + litellm-home emptyDir mounts (HOME / XDG_CACHE_HOME / HF_HOME redirected onto them), llm ns PSS label
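
For readers unfamiliar with the emptyDir pattern mentioned for litellm, a pod template fragment along these lines is one way to wire it up. The volume names, /tmp, and /home/litellm come from the description above; the XDG_CACHE_HOME and HF_HOME values are assumptions, since the description does not state the exact paths used.

```yaml
# Pod template fragment (sketch). Only the volume names, /tmp, and
# /home/litellm are taken from the PR description.
spec:
  containers:
    - name: litellm
      env:
        - name: HOME
          value: /home/litellm
        - name: XDG_CACHE_HOME
          value: /home/litellm/.cache               # assumed path
        - name: HF_HOME
          value: /home/litellm/.cache/huggingface   # assumed path
      volumeMounts:
        # Writable scratch space on top of the read-only root filesystem.
        - name: litellm-tmp
          mountPath: /tmp
        - name: litellm-home
          mountPath: /home/litellm
  volumes:
    - name: litellm-tmp
      emptyDir: {}
    - name: litellm-home
      emptyDir: {}
```

Both volumes are discarded when the pod is deleted, so any cache written there is rebuilt on the next cold start.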

Survey

| Deployment | File | Container UID (before) | Writes to disk | Now under Restricted |
| --- | --- | --- | --- | --- |
| x402-verifier | x402.yaml | 65532 (Dockerfile already :nonroot) | none (mounts RO ConfigMaps) | yes |
| serviceoffer-controller | x402.yaml | 0 (root) | none | yes (Dockerfile flipped to :nonroot) |
| litellm (Python) | llm.yaml | per upstream image | /tmp, $HOME, HF cache | yes (emptyDir at /tmp + /home/litellm) |
| x402-buyer sidecar | llm.yaml | 65532 (Dockerfile already :nonroot) | /state (already emptyDir) | yes |
| local-path-provisioner | local-path.yaml (kube-system) | | | out of scope (k3d-managed) |
| hermes default agent | internal/hermes/hermes.go (generated) | configurable UID; init-hermes-perms runs as UID 0 for chown | /data (PVC) | out of scope (dynamic, init container legitimately root) |
| cloudflared | internal/embed/infrastructure/cloudflared/... | | | out of scope (separate Helm chart) |

No third-party image had to stay root.

What may break

  • LiteLLM (Python) writes outside /tmp or $HOME now fail with EROFS. Mitigated by the two emptyDir mounts plus HOME=/home/litellm, XDG_CACHE_HOME, and HF_HOME. Watch release-smoke for "Read-only file system" errors on the first paid call.
  • PSS ns label is enforce: restricted, so a future Deployment edit that omits per-pod securityContext has its pods rejected at admission (the Deployment object itself is still accepted, with a warning, but the rollout stalls). Intentional.
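
For reference, the namespace-level gate is expressed through Pod Security Admission labels. A minimal sketch for the x402 namespace, assuming the enforce, audit, and warn levels are all set to restricted as the commit message states:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: x402
  labels:
    # Reject pods that violate the Restricted profile.
    pod-security.kubernetes.io/enforce: restricted
    # Record violations in the audit log and surface them as client warnings.
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```

The llm namespace would carry the same three labels.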

Test plan

  • go build ./... clean
  • go test ./internal/embed/... ./internal/x402/... ./internal/serviceoffercontroller/... — green
  • Full go test ./... — only pre-existing failure (TestWarnIfNoChatModel_EmitsWarnWhenNoModels in internal/stack) reproduces on origin/main, unrelated
  • YAML parses with PyYAML loader
  • Manual on next stack up: kubectl get pods -A confirms all run; kubectl logs clean for litellm cold start
  • Release-smoke: flow-11 (USDC live Base Sepolia) + flow-14 (OBOL Permit2) green

Reference

…loads

Brings every embedded Deployment shipped by obol-stack up to PSS Restricted:
  - runAsNonRoot: true with fixed non-zero UID/GID (65532)
  - allowPrivilegeEscalation: false
  - capabilities.drop: [ALL]
  - seccompProfile: RuntimeDefault
  - readOnlyRootFilesystem: true (with named emptyDir mounts where Python
    needs writeable /tmp and HOME/.cache)

PSS labels (enforce=restricted, audit/warn=restricted) added to the x402
and llm namespaces so future Deployment edits that omit per-pod
securityContext are rejected at admission.

Also switches the serviceoffer-controller Dockerfile from
gcr.io/distroless/static-debian12 (UID 0) to ...:nonroot (UID 65532).
Container escape via a Go runtime CVE on a UID-0 / no-seccomp /
no-cap-drop / RW-rootfs container was the easiest path to host pivot
on k3s single-node; this closes it.

Files touched:
  - Dockerfile.serviceoffer-controller (:nonroot base)
  - internal/embed/infrastructure/base/templates/x402.yaml
    (verifier + controller securityContext blocks, x402 ns PSS label)
  - internal/embed/infrastructure/base/templates/llm.yaml
    (litellm + x402-buyer securityContext, litellm-tmp + litellm-home
     emptyDir mounts with HOME/XDG_CACHE_HOME/HF_HOME redirection,
     llm ns PSS label)

Scope notes:
  - local-path-provisioner lives in kube-system (k3d-managed); not
    relabeled per PSS guidance to skip system namespaces.
  - hermes-obol-agent runtime is generated dynamically by
    serviceoffer-controller (internal/serviceoffercontroller/agent_render.go
    and internal/hermes/hermes.go), not from the embedded templates;
    its init-hermes-perms initContainer legitimately runs as UID 0
    for /data chown and is intentionally left out of this PR's scope.
  - cloudflared chart (internal/embed/infrastructure/cloudflared/...)
    is a separate Helm chart and not in this PR's file list.

What may break:
  - LiteLLM with readOnlyRootFilesystem may fail if it writes outside
    /tmp or $HOME. Watch the next release-smoke for read-only file system
    (EROFS) or permission-denied errors and add named emptyDir mounts for
    any new write paths.

bussyjd commented May 24, 2026

Superseded by bundle PR #536 — closing in favor of the consolidated merge target. Original branch and history preserved.

bussyjd closed this May 24, 2026