Docker image secret-leak audit: findings and follow-ups #199

@vredchenko

Summary

Read-only audit of every Dockerfile, docker-compose file, locally-built image, and CI workflow across smartem-decisions, smartem-devtools, and smartem-frontend, looking for credentials baked into image layers. No rebuilds, no pushes, no rotations. Current Dockerfiles produce clean images; one leak exists in an already-published GHCR image from June 2025, and a handful of hardening warnings apply across the three repos.

This issue is filed in smartem-devtools because that's where workspace-wide concerns live. The actual leak is in ghcr.io/diamondlightsource/smartem-decisions:latest, so remediation work will land in smartem-decisions.

Full report (local, in workspace scratch, not in any repo): tmp/docker-audit/docs/security/docker-audit-2026-05-14.md. Audit scratch (extracted rootfs, raw .env from the leaked image) is at tmp/docker-audit/rootfs-*/ and tmp/docker-audit/leaked.env* — should be deleted once this issue is triaged.

Scope

  • Dockerfiles reviewed: smartem-decisions/Dockerfile, smartem-decisions/Dockerfile.dev, smartem-frontend/Dockerfile
  • docker-compose reviewed: smartem-devtools/keycloak-mock/docker-compose.yml
  • Images scanned: ghcr.io/diamondlightsource/smartem-decisions:latest, smartem-decisions:latest (local, 6 months old), smartem-decisions:keycloak-local (local, current)
  • CI workflows reviewed: 9 across the three repos. Only smartem-decisions/.github/workflows/release-smartem-decisions.yml builds and publishes a container image.
  • smartem-devtools has no Dockerfile; smartem-frontend has a Dockerfile but no CI builds it.

Methodology

  1. Static review of each Dockerfile against the checklist in the original audit brief (COPY hygiene, ARG-as-secret, USER directive, multi-stage hygiene, .dockerignore coverage).
  2. docker history --no-trunc and docker inspect for each local image; checked Env, Labels, Cmd, Entrypoint, User, and every layer's CreatedBy.
  3. Filesystem layer scan via docker create + docker export (per-image rootfs) + gitleaks detect --no-git. trufflehog 3.94.3 was unusable for this — its docker --image subcommand insists on registry-pulling rather than using the local daemon, and the GHCR repo is private.
  4. CI grep for build-arg, BUILD_ARG, NPM_TOKEN, GH_TOKEN, GITHUB_TOKEN, AWS_*, DOCKER_PASSWORD, REGISTRY_PASSWORD.
  5. For the one leak found: registry-visibility check (curl -sI ghcr.io/v2/.../manifests/latest), provenance check via git show <rev>:.env.example, hash-preimage check.
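Step 3 can be partly illustrated without gitleaks. The sketch below is not the tooling used in the audit; it is a minimal Python stand-in that reads a rootfs tarball (the format `docker export` writes) and lists `.env`-style files. The function name `find_env_files` and the filename pattern are assumptions made for illustration.

```python
import io
import re
import tarfile

# Matches ".env" and ".env.<suffix>" as the final path component only.
ENV_NAME = re.compile(r"(^|/)\.env(\.[^/]*)?$")


def find_env_files(tar_stream):
    """Return (path, size) for every .env-style regular file in a rootfs tarball.

    tar_stream is any binary file object, for example sys.stdin.buffer when
    the output of `docker export` is piped in.
    """
    hits = []
    # "r|*" reads the archive as a forward-only stream, so the tarball
    # never has to be written to disk or be seekable.
    with tarfile.open(fileobj=tar_stream, mode="r|*") as archive:
        for member in archive:
            if member.isfile() and ENV_NAME.search(member.name):
                hits.append((member.name, member.size))
    return hits
```

One possible use is to pipe `docker export` for a container created from the image into a small script that calls `find_env_files(sys.stdin.buffer)`. This only finds files by name; a content scanner such as gitleaks is still needed for secrets embedded in other files.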

Findings

Critical (contingent on one DLS judgement call)

.env and .env.example baked into ghcr.io/diamondlightsource/smartem-decisions:latest

  • Image labels: revision=26a8be4274c505ac01a2d0c7943af398ba44d0bd, version=test-v0.0.2, created=2025-06-02T11:09:29Z.
  • Layer 5 of the image (COPY /app/, ~6.26 MB) contains /app/.env and /app/.env.example — byte-identical, 1466 bytes each.
  • Config.Cmd = ["cd /app && source .env && python -m smartem_decisions.model.database && uvicorn ..."] — the container actively sources .env at boot.
  • Audience for the leak: the GHCR repo is private. An anonymous manifest fetch returns HTTP 401, and the ghcr.io/token endpoint refuses to issue an anonymous bearer token for the pull scope. Only DLS-authorised GHCR pullers can extract the file, which limits exposure and lowers urgency.
  • What is in the leaked .env (fingerprints, not raw values):
    • POSTGRES_USER = literal username — placeholder, not sensitive
    • POSTGRES_PASSWORD = literal password — placeholder, not sensitive
    • RABBITMQ_USER = literal username — placeholder
    • RABBITMQ_PASSWORD = literal password — placeholder
    • GRAYLOG_ROOT_PASSWORD_SHA2 = 64-char SHA-256, fp "e3c…951". Verified placeholder — equal to echo -n "yourpassword" | sha256sum byte-for-byte. No rotation needed.
    • GRAYLOG_PASSWORD_SECRET = 96-char pwgen-shaped string, fp "bQai…Prx". Unresolved. Looks real, but I can't tell from the file alone whether it was ever deployed as a Graylog cluster secret. This is the single judgement call below.
  • Provenance: the same values exist in the smartem-decisions git history at revision 26a8be4 (git show 26a8be4:.env.example reproduces them verbatim), so anyone with repo read access can already read them from history. The image is not the exclusive leak channel.
  • Current state: today's Dockerfile (production) installs from PyPI and only copies alembic.ini + entrypoint.sh. Today's Dockerfile.dev does COPY . . but .dockerignore excludes .env*. The fresh build smartem-decisions:keycloak-local has 0 leaks per gitleaks. Today's entrypoint.sh sources .env only when KUBERNETES_SERVICE_HOST is unset (defensive).
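The hash-preimage check and the fingerprint redaction used in the list above can be sketched as follows. This is an illustration under assumptions, not the audit's actual script: the helper names `fingerprint` and `matching_placeholder` are hypothetical, and the default of 3 leading and 3 trailing characters is just one choice (the issue uses both 3 and 4 leading characters).

```python
import hashlib


def fingerprint(value, head=3, tail=3):
    """Shorten a secret to its first and last few characters for reporting."""
    if len(value) <= head + tail:
        # Too short to truncate without revealing the whole value.
        return "…"
    return f"{value[:head]}…{value[-tail:]}"


def matching_placeholder(hex_digest, candidates):
    """Return the candidate whose SHA-256 equals hex_digest, or None.

    Mirrors `echo -n "<candidate>" | sha256sum`: the candidate is hashed
    without a trailing newline.
    """
    wanted = hex_digest.strip().lower()
    for candidate in candidates:
        if hashlib.sha256(candidate.encode("utf-8")).hexdigest() == wanted:
            return candidate
    return None
```

Called with the GRAYLOG_ROOT_PASSWORD_SHA2 value and a short list of documentation placeholders such as "yourpassword", a non-None result supports the "verified placeholder" conclusion. A None result proves nothing either way, which is why GRAYLOG_PASSWORD_SECRET remains a judgement call.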

Warnings

  1. The local smartem-decisions:latest image (6 months old) carries /app/.secrets.baseline, the detect-secrets metadata file that lists where secret-shaped strings live in the codebase, with file, line, and hash. It holds no raw credentials, but it is a map of where to look. gitleaks flagged 2 generic-api-key hits in this file. Newer builds don't ship it.

  2. smartem-frontend/.dockerignore is dangerously thin — only 4 entries (.react-router, build, node_modules, README.md). The Dockerfile does COPY . /app in two stages. The repo currently has apps/smartem/.env.local. The final image only ships /app/build, but: (a) intermediate stages carry raw .env.local and would leak if registry-backed build cache is enabled; (b) Vite inlines VITE_* into the bundle. Today's VITE_* values are public-by-design OIDC public-client fields (VITE_KEYCLOAK_URL, VITE_KEYCLOAK_REALM, VITE_KEYCLOAK_CLIENT_ID, VITE_AUTH_ENABLED), so no current leak — but no guardrail either. The Dockerfile is not built by any CI workflow, so there is nothing currently published to remediate.

  3. smartem-frontend Dockerfile has no CMD/ENTRYPOINT and no USER — final stage copies artefacts but never declares how to serve them; runs as root by default. Not a secrets issue, flagged for hygiene.

  4. smartem-decisions/Dockerfile.dev final stage copies /app/ wholesale (line 81, COPY --from=build /app/ /app/). Current .dockerignore is comprehensive, so this is image bloat rather than a secrets leak today, but every new file at the repo root is one missing .dockerignore rule away from leaking.

  5. Neither smartem-decisions/Dockerfile nor Dockerfile.dev sets USER in the final stage. The conditional useradd only fires when groupid != 0, and USER is never declared. Documented as intentional ("suitable for CI/CD pipelines and local development"); flagged for awareness if the K8s runAsUser override is ever removed.

  6. smartem-devtools/.env.k8s.development sits unencrypted at the repo root with real DOCKER_USERNAME, POSTGRES_PASSWORD, RABBITMQ_PASSWORD. Properly gitignored, and the repo has no Dockerfile so nothing can bake it in. Flagged for awareness — if a Dockerfile is ever added here, this file plus a thin .dockerignore becomes a future leak.
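Several of the warnings above come down to whether a given path is covered by .dockerignore. The sketch below is a simplified approximation of Docker's matching rules (root-relative patterns, `*` not crossing `/`, `**` spanning directories, `!` exceptions, last matching rule wins). It is not Docker's implementation and may differ in edge cases; the function names are hypothetical.

```python
import re


def _pattern_to_regex(pattern):
    """Translate one .dockerignore-style glob into a compiled regex (simplified)."""
    pattern = pattern.strip().strip("/")
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")   # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")         # anything, including "/"
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")      # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def is_ignored(path, patterns):
    """Return True if path (relative to the build context) would be excluded."""
    parts = path.strip("/").split("/")
    # A rule that matches a parent directory also excludes everything under it.
    prefixes = ["/".join(parts[: n + 1]) for n in range(len(parts))]
    ignored = False
    for raw in patterns:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        negate = line.startswith("!")
        if negate:
            line = line[1:]
        regex = _pattern_to_regex(line)
        if any(regex.fullmatch(prefix) for prefix in prefixes):
            ignored = not negate  # the last matching rule decides
    return ignored
```

With the four smartem-frontend entries listed in warning 2, `is_ignored("apps/smartem/.env.local", ...)` is False under this model. Because patterns are root-relative, a plain `.env*` rule would not cover that nested file either; a rule of the form `**/.env*` would, under these simplified semantics.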

CI / build-arg review

  • The only build-arg in any workflow is SMARTEM_VERSION=... in smartem-decisions/.github/workflows/release-smartem-decisions.yml:471 — a non-secret version string.
  • All GITHUB_TOKEN / GH_TOKEN references are runtime-only (passed via env: to docker/login-action or softprops/action-gh-release); none are passed as --build-arg.
  • No NPM_TOKEN, AWS_*, DOCKER_PASSWORD, REGISTRY_PASSWORD patterns.
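The review above was done with grep. As a rough equivalent, the sketch below pulls build-arg names out of workflow text and flags any that match the secret-name patterns from the methodology. It is an assumption-laden illustration: it reads the file line by line rather than parsing YAML, it only recognises `--build-arg NAME=...` on command lines and `NAME=...` lines under a `build-args:` key, and the function names are hypothetical.

```python
import re

# Secret-shaped names taken from the grep list in the methodology.
SECRET_NAME = re.compile(
    r"^(NPM_TOKEN|GH_TOKEN|GITHUB_TOKEN|AWS_\w+|DOCKER_PASSWORD|REGISTRY_PASSWORD)$"
)
CLI_ARG = re.compile(r"--build-arg[ =]+([A-Za-z_][A-Za-z0-9_]*)")
BLOCK_ARG = re.compile(r"^\s+([A-Za-z_][A-Za-z0-9_]*)=")


def build_arg_names(workflow_text):
    """Collect build-arg names from CLI flags and from `build-args:` blocks."""
    names = []
    in_block = False
    block_indent = 0
    for line in workflow_text.splitlines():
        names.extend(CLI_ARG.findall(line))
        stripped = line.strip()
        indent = len(line) - len(line.lstrip())
        if re.match(r"build-args\s*:", stripped):
            in_block = True
            block_indent = indent
            continue
        if in_block:
            if stripped and indent <= block_indent:
                in_block = False  # dedent ends the block scalar
            else:
                match = BLOCK_ARG.match(line)
                if match:
                    names.append(match.group(1))
    return names


def secret_build_args(workflow_text):
    """Return only the build-arg names that look like credentials."""
    return [n for n in build_arg_names(workflow_text) if SECRET_NAME.match(n)]
```

For a workflow whose only build-arg is SMARTEM_VERSION, `secret_build_args` returns an empty list, which matches the finding above. Name matching cannot catch a secret passed under an innocuous name, so it complements manual review rather than replacing it.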

Tooling notes

  • trufflehog v3.94.3 unusable here — docker --image insists on registry-pulling, ignoring the local daemon. Worked around with docker create + docker export + gitleaks 8.30.0.
  • trivy and dockle not installed; skipped per audit brief.

Recommended next actions

Ordered. Nothing applied — left as a checklist.

  • One judgement call on Graylog: confirm whether GRAYLOG_PASSWORD_SECRET fp "bQai…Prx" was ever used as a real Graylog cluster password_secret. If yes, rotate (accepting that all encrypted Graylog rows and active sessions invalidate, per the in-file comment); if no, the leak is fully placeholder.
  • Clean up the published GHCR image regardless of the above. Untag/delete ghcr.io/diamondlightsource/smartem-decisions:latest and the test-v0.0.2 digest, then republish latest from the current Dockerfile.
  • Decide on git-history cleanup for smartem-decisions — only needed if the Graylog secret turns out to have been real. Either accept the historical leak (after rotation) or scrub with git filter-repo / BFG.
  • Harden smartem-frontend/.dockerignore before that image is ever built and published. Deny-list the standard sensitive-file set or flip to allow-list.
  • Trim smartem-decisions/Dockerfile.dev final stage to copy only /venv/, entrypoint.sh, and alembic.ini rather than the whole /app/. Defence in depth.
  • Delete the audit scratch files once this issue has been picked up: rm -rf /home/vredchenko/dev/ERIC/tmp/docker-audit/rootfs-* /home/vredchenko/dev/ERIC/tmp/docker-audit/leaked.env /home/vredchenko/dev/ERIC/tmp/docker-audit/leaked.env.example. The report itself (docker-audit-2026-05-14.md) can stay or be moved into a real docs/security/ location.

Conversation-log hygiene heads-up

During provenance verification I ran git show 26a8be4:.env.example, which printed the raw GRAYLOG_PASSWORD_SECRET value (96 chars between quotes) into the audit session's tool output. The report and this issue redact to fingerprints, but the raw value lives in the Claude Code transcript for that session. If any part of that transcript gets pasted into another ticket, Slack message, PR description, or another LLM, scrub that one tool-output block first.
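Scrubbing a transcript block before sharing it can be partly automated. The sketch below is a minimal, hypothetical helper that assumes the sensitive values appear as `KEY=value` or `KEY="value"` lines, as they would in printed .env content. It replaces the value of each named key with a short fingerprint and leaves other lines untouched.

```python
import re

# One assignment per line, optionally prefixed with "export" and optionally quoted.
ASSIGNMENT = re.compile(
    r'^(?P<head>\s*(?:export\s+)?(?P<key>[A-Za-z_][A-Za-z0-9_]*)=)'
    r'(?P<q>["\']?)(?P<val>.*?)(?P=q)\s*$'
)


def redact(text, keys, head=4, tail=3):
    """Replace the values of the given keys with a fingerprint placeholder."""
    out = []
    for line in text.splitlines():
        match = ASSIGNMENT.match(line)
        if match and match.group("key") in keys:
            value = match.group("val")
            if len(value) > head + tail:
                shown = f"<redacted {value[:head]}…{value[-tail:]}>"
            else:
                # Too short to fingerprint without revealing it.
                shown = "<redacted>"
            quote = match.group("q")
            line = f"{match.group('head')}{quote}{shown}{quote}"
        out.append(line)
    return "\n".join(out)
```

For example, `redact(block, {"GRAYLOG_PASSWORD_SECRET"})` keeps the placeholder lines readable while removing the 96-character value. It only catches values in assignment form, so a manual read-through of the block is still needed.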

Metadata

Labels

  • research: Investigation, spikes, or proof-of-concept work
  • security: Security fixes, audits, or vulnerability remediation
  • smartem-backend: Core backend services, messaging, and persistence layer
