Docker image secret-leak audit: findings and follow-ups #199

## Summary

Read-only audit of every Dockerfile, docker-compose file, locally-built image, and CI workflow across smartem-decisions, smartem-devtools, and smartem-frontend, looking for credentials baked into image layers. No rebuilds, no pushes, no rotations. Current Dockerfiles produce clean images; one leak exists in an already-published GHCR image from June 2025, and six hardening warnings apply across the three repos.

This issue is filed in smartem-devtools because that's where workspace-wide concerns live. The actual leak is in `ghcr.io/diamondlightsource/smartem-decisions:latest`, so remediation work will land in **smartem-decisions**.

Full report (local, in workspace scratch, not in any repo): `tmp/docker-audit/docs/security/docker-audit-2026-05-14.md`. Audit scratch (extracted rootfs, raw `.env` from the leaked image) is at `tmp/docker-audit/rootfs-*/` and `tmp/docker-audit/leaked.env*`; it should be deleted once this issue is triaged.

## Scope

- Dockerfiles reviewed: `smartem-decisions/Dockerfile`, `smartem-decisions/Dockerfile.dev`, `smartem-frontend/Dockerfile`
- docker-compose reviewed: `smartem-devtools/keycloak-mock/docker-compose.yml`
- Images scanned: `ghcr.io/diamondlightsource/smartem-decisions:latest`, `smartem-decisions:latest` (local, 6 months old), `smartem-decisions:keycloak-local` (local, current)
- CI workflows reviewed: 9 across the three repos. Only `smartem-decisions/.github/workflows/release-smartem-decisions.yml` builds and publishes a container image.
- smartem-devtools has no Dockerfile; smartem-frontend has a Dockerfile but no CI builds it.

## Methodology

1. Static review of each Dockerfile against the checklist in the original audit brief (COPY hygiene, ARG-as-secret, USER directive, multi-stage hygiene, .dockerignore coverage).
2. `docker history --no-trunc` and `docker inspect` for each local image; checked Env, Labels, Cmd, Entrypoint, User, and every layer's CreatedBy.
3. Filesystem layer scan via `docker create` + `docker export` (per-image rootfs) + `gitleaks detect --no-git`. trufflehog 3.94.3 was unusable for this — its `docker --image` subcommand insists on registry-pulling rather than using the local daemon, and the GHCR repo is private.
4. CI grep for `build-arg`, `BUILD_ARG`, `NPM_TOKEN`, `GH_TOKEN`, `GITHUB_TOKEN`, `AWS_*`, `DOCKER_PASSWORD`, `REGISTRY_PASSWORD`.
5. For the one leak found: registry-visibility check (`curl -sI ghcr.io/v2/.../manifests/latest`), provenance check via `git show <rev>:.env.example`, hash-preimage check.
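
Step 3 can be sketched roughly as follows. This is a minimal illustration, not the exact commands run during the audit: the helper names (`scan_commands`, `run_scan`), the scratch layout (`rootfs.tar`, `rootfs/`, `gitleaks.json`), and the choice of a JSON report are assumptions. It expects `docker`, `tar`, and `gitleaks` on `PATH`.

```python
import subprocess
from pathlib import Path


def scan_commands(container_id: str, workdir: Path) -> list[list[str]]:
    """Build the export, unpack, scan, and cleanup commands for one container.

    Hypothetical helper: file names under `workdir` are illustrative only.
    """
    tarball = workdir / "rootfs.tar"
    rootfs = workdir / "rootfs"
    return [
        # Write the container's merged filesystem to a single tarball.
        ["docker", "export", "--output", str(tarball), container_id],
        # Unpack it so gitleaks can walk ordinary files.
        ["tar", "-xf", str(tarball), "-C", str(rootfs)],
        # --no-git scans the directory as plain files rather than git history.
        ["gitleaks", "detect", "--no-git", "--source", str(rootfs),
         "--report-format", "json",
         "--report-path", str(workdir / "gitleaks.json")],
        # Remove the stopped container that was created only for the export.
        ["docker", "rm", container_id],
    ]


def run_scan(image: str, workdir: Path) -> int:
    """Create a stopped container from `image`, export it, and scan it.

    Returns the gitleaks exit code (non-zero by default when it reports findings).
    """
    (workdir / "rootfs").mkdir(parents=True, exist_ok=True)
    created = subprocess.run(
        ["docker", "create", image], check=True, capture_output=True, text=True
    )
    container_id = created.stdout.strip()
    export, unpack, scan, cleanup = scan_commands(container_id, workdir)
    try:
        subprocess.run(export, check=True)
        subprocess.run(unpack, check=True)
        return subprocess.run(scan, check=False).returncode
    finally:
        subprocess.run(cleanup, check=False)
```

Because `docker export` yields the merged filesystem, a file added in one layer and deleted in a later one would not show up in this scan; the `docker history --no-trunc` pass in step 2 is what covers layer-level metadata.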

## Findings

### Critical (contingent on one DLS judgement call)

**`.env` and `.env.example` baked into `ghcr.io/diamondlightsource/smartem-decisions:latest`**

- Image labels: `revision=26a8be4274c505ac01a2d0c7943af398ba44d0bd`, `version=test-v0.0.2`, `created=2025-06-02T11:09:29Z`.
- Layer 5 of the image (`COPY /app/`, ~6.26 MB) contains `/app/.env` and `/app/.env.example` — byte-identical, 1466 bytes each.
- `Config.Cmd` = `["cd /app && source .env && python -m smartem_decisions.model.database && uvicorn ..."]` — the container actively sources `.env` at boot.
- **Audience for the leak**: GHCR repo is private. Anonymous manifest fetch returns `HTTP 401`; the `ghcr.io/token` endpoint refuses to issue an anonymous bearer token for the pull scope. Only DLS-authorised GHCR pullers can extract the file. This limits exposure to that group and significantly lowers the urgency.
- **What is in the leaked `.env`** (fingerprints, not raw values):
  - `POSTGRES_USER` = literal `username` — placeholder, not sensitive
  - `POSTGRES_PASSWORD` = literal `password` — placeholder, not sensitive
  - `RABBITMQ_USER` = literal `username` — placeholder
  - `RABBITMQ_PASSWORD` = literal `password` — placeholder
  - `GRAYLOG_ROOT_PASSWORD_SHA2` = 64-char SHA-256, fp `"e3c…951"`. **Verified placeholder** — equal to `echo -n "yourpassword" | sha256sum` byte-for-byte. No rotation needed.
  - `GRAYLOG_PASSWORD_SECRET` = 96-char `pwgen`-shaped string, fp `"bQai…Prx"`. **Unresolved.** Looks real, but I can't tell from the file alone whether it was ever deployed as a Graylog cluster secret. This is the single judgement call below.
- **Provenance**: the same values exist in the smartem-decisions git history at revision `26a8be4` (`git show 26a8be4:.env.example` reproduces them verbatim), so anyone with repo read access already sees them via `git log`. The image is not the exclusive leak channel.
- **Current state**: today's Dockerfile (production) installs from PyPI and only copies `alembic.ini` + `entrypoint.sh`. Today's `Dockerfile.dev` does `COPY . .` but `.dockerignore` excludes `.env*`. The fresh build `smartem-decisions:keycloak-local` has **0 leaks** per gitleaks. Today's `entrypoint.sh` sources `.env` only when `KUBERNETES_SERVICE_HOST` is unset (defensive).
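
The hash-preimage check used for `GRAYLOG_ROOT_PASSWORD_SHA2` can be reproduced along these lines. This is a sketch under assumptions: the function name `matching_plaintext` and the candidate list are hypothetical, and the digest below is recomputed from the placeholder as a stand-in rather than read from the leaked file.

```python
import hashlib
from typing import Optional


def matching_plaintext(sha256_hex: str, candidates: list[str]) -> Optional[str]:
    """Return the candidate whose SHA-256 equals `sha256_hex`, or None."""
    target = sha256_hex.strip().lower()
    for candidate in candidates:
        # Same input as `echo -n "<candidate>" | sha256sum`: no trailing newline.
        if hashlib.sha256(candidate.encode("utf-8")).hexdigest() == target:
            return candidate
    return None


# Stand-in digest for illustration; in the audit it came from /app/.env.
digest = hashlib.sha256(b"yourpassword").hexdigest()
print(matching_plaintext(digest, ["admin", "password", "yourpassword"]))
```

A match against a documented placeholder shows the field was never customised. The same approach does not apply to `GRAYLOG_PASSWORD_SECRET`, which is a raw value rather than a digest, so that field remains the judgement call.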

### Warnings

1. **`smartem-decisions:latest` local (6 months old) carries `/app/.secrets.baseline`** — detect-secrets metadata listing where secret-shaped strings live in the codebase with file/line/hash. Not raw credentials but a recipe of where to look. gitleaks flagged 2 generic-api-key hits in this file. Newer builds don't ship it.

2. **`smartem-frontend/.dockerignore` is dangerously thin** — only 4 entries (`.react-router`, `build`, `node_modules`, `README.md`). The Dockerfile does `COPY . /app` in two stages. The repo currently has `apps/smartem/.env.local`. The final image only ships `/app/build`, but: (a) intermediate stages carry raw `.env.local` and would leak if registry-backed build cache is enabled; (b) Vite inlines `VITE_*` into the bundle. Today's `VITE_*` values are public-by-design OIDC public-client fields (`VITE_KEYCLOAK_URL`, `VITE_KEYCLOAK_REALM`, `VITE_KEYCLOAK_CLIENT_ID`, `VITE_AUTH_ENABLED`), so no current leak — but no guardrail either. The Dockerfile is **not built by any CI workflow**, so there is nothing currently published to remediate.

3. **`smartem-frontend` Dockerfile has no CMD/ENTRYPOINT and no USER** — final stage copies artefacts but never declares how to serve them; runs as root by default. Not a secrets issue, flagged for hygiene.

4. **`smartem-decisions/Dockerfile.dev` final stage copies `/app/` wholesale** (line 81, `COPY --from=build /app/ /app/`). Current `.dockerignore` is comprehensive, so today this is image bloat rather than a secrets leak, but every new file at the repo root is one missing `.dockerignore` rule away from leaking into the image.

5. **Neither `smartem-decisions/Dockerfile` nor `Dockerfile.dev` sets `USER`** in the final stage. The conditional `useradd` only fires when `groupid != 0`, and `USER` is never declared. Documented as intentional ("suitable for CI/CD pipelines and local development"); flagged for awareness if the K8s `runAsUser` override is ever removed.

6. **`smartem-devtools/.env.k8s.development`** sits unencrypted at the repo root with real `DOCKER_USERNAME`, `POSTGRES_PASSWORD`, `RABBITMQ_PASSWORD`. Properly gitignored, and the repo has no Dockerfile so nothing can bake it in. Flagged for awareness — if a Dockerfile is ever added here, this file plus a thin `.dockerignore` becomes a future leak.
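
For warnings 2, 4, and 6, a rough way to see whether a given file would reach the build context is to evaluate the `.dockerignore` rules against its path. The matcher below is an approximation written for illustration, not Docker's implementation: it handles `*`, `?`, `**`, `!` negation, and last-match-wins, and ignores character classes and other edge cases. The function names are hypothetical.

```python
import re
from pathlib import PurePosixPath


def _to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified .dockerignore pattern into a regex."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # any number of leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # a single * does not cross a path separator
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def is_ignored(path: str, dockerignore_lines: list[str]) -> bool:
    """Approximate .dockerignore evaluation: the last matching rule wins."""
    parts = PurePosixPath(path).parts
    # A rule matching a parent directory also covers everything beneath it.
    prefixes = ["/".join(parts[: n + 1]) for n in range(len(parts))]
    ignored = False
    for raw in dockerignore_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        negated = line.startswith("!")
        regex = _to_regex(line.lstrip("!").strip("/"))
        if any(regex.fullmatch(prefix) for prefix in prefixes):
            ignored = not negated
    return ignored


# The four entries reported for smartem-frontend/.dockerignore.
frontend_rules = [".react-router", "build", "node_modules", "README.md"]
print(is_ignored("apps/smartem/.env.local", frontend_rules))
print(is_ignored("apps/smartem/.env.local", frontend_rules + ["**/.env*"]))
```

Unlike `.gitignore`, a `.dockerignore` pattern without a slash is matched relative to the context root only, so a bare `.env*` rule would not cover `apps/smartem/.env.local`; a `**/` prefix is needed for nested files.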

## CI / build-arg review

- The only `build-arg` in any workflow is `SMARTEM_VERSION=...` in `smartem-decisions/.github/workflows/release-smartem-decisions.yml:471` — a non-secret version string.
- All `GITHUB_TOKEN` / `GH_TOKEN` references are runtime-only (passed via `env:` to `docker/login-action` or `softprops/action-gh-release`); none are passed as `--build-arg`.
- No `NPM_TOKEN`, `AWS_*`, `DOCKER_PASSWORD`, `REGISTRY_PASSWORD` patterns.
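
The workflow grep behind this review (step 4 of the methodology) amounts to something like the sketch below. The pattern list is taken from that step; the function names and the assumption that workflows live under `.github/workflows/` with a `.yml` or `.yaml` extension are illustrative.

```python
import re
from pathlib import Path

# Patterns from methodology step 4; AWS_* is expressed as a prefix match.
SECRET_PATTERNS = re.compile(
    r"build-arg|BUILD_ARG|NPM_TOKEN|GH_TOKEN|GITHUB_TOKEN"
    r"|AWS_\w+|DOCKER_PASSWORD|REGISTRY_PASSWORD"
)


def grep_workflow(text: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) pairs mentioning any pattern."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if SECRET_PATTERNS.search(line)
    ]


def grep_repo(repo_root: Path) -> dict[str, list[tuple[int, str]]]:
    """Scan every workflow file in one checkout and keep files with hits."""
    hits: dict[str, list[tuple[int, str]]] = {}
    workflows = repo_root / ".github" / "workflows"
    for path in sorted(workflows.glob("*.y*ml")):
        found = grep_workflow(path.read_text(encoding="utf-8"))
        if found:
            hits[str(path)] = found
    return hits
```

A hit is only a lead. Each one still needs a manual read to tell a build-time argument (which ends up in image history) from a runtime `env:` value (which does not), which is the distinction drawn in the bullets above.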

## Tooling notes

- trufflehog v3.94.3 unusable here — `docker --image` insists on registry-pulling, ignoring the local daemon. Worked around with `docker create` + `docker export` + gitleaks 8.30.0.
- trivy and dockle not installed; skipped per audit brief.

## Recommended next actions

Listed in priority order. Nothing has been applied; this is left as a checklist.

- [ ] **One judgement call on Graylog**: confirm whether `GRAYLOG_PASSWORD_SECRET` fp `"bQai…Prx"` was ever used as a real Graylog cluster `password_secret`. If yes, rotate (accepting that all encrypted Graylog rows and active sessions invalidate, per the in-file comment); if no, the leak is fully placeholder.
- [ ] **Clean up the published GHCR image** regardless of the above. Untag/delete `ghcr.io/diamondlightsource/smartem-decisions:latest` and the `test-v0.0.2` digest, then republish `latest` from the current Dockerfile.
- [ ] **Decide on git-history cleanup** for `smartem-decisions` — only needed if the Graylog secret turns out to have been real. Either accept the historical leak (after rotation) or scrub with `git filter-repo` / BFG.
- [ ] **Harden `smartem-frontend/.dockerignore`** before that image is ever built and published. Deny-list the standard sensitive-file set or flip to allow-list.
- [ ] **Trim `smartem-decisions/Dockerfile.dev` final stage** to copy only `/venv/`, `entrypoint.sh`, and `alembic.ini` rather than the whole `/app/`. Defence in depth.
- [ ] **Delete the audit scratch files** once this issue has been picked up: `rm -rf /home/vredchenko/dev/ERIC/tmp/docker-audit/rootfs-* /home/vredchenko/dev/ERIC/tmp/docker-audit/leaked.env /home/vredchenko/dev/ERIC/tmp/docker-audit/leaked.env.example`. The report itself (`docker-audit-2026-05-14.md`) can stay or be moved into a real `docs/security/` location.

## Conversation-log hygiene heads-up

During provenance verification I ran `git show 26a8be4:.env.example`, which printed the raw `GRAYLOG_PASSWORD_SECRET` value (96 chars between quotes) into the audit session's tool output. The report and this issue redact to fingerprints, but the raw value lives in the Claude Code transcript for that session. If any part of that transcript gets pasted into another ticket, Slack message, PR description, or another LLM, scrub that one tool-output block first.
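
If a transcript excerpt does need to be shared, a mechanical pass like the following can reduce the risk before a manual read. It is a sketch under a stated assumption: that the values of concern are unbroken alphanumeric runs of 64 or more characters, which covers a 64-character SHA-256 hex digest and a 96-character `pwgen`-style string. The name `scrub` and the threshold are hypothetical choices.

```python
import re

# Assumption: secrets of interest are unbroken alphanumeric runs of 64+ chars.
# A 40-character git commit hash is deliberately left untouched.
LONG_TOKEN = re.compile(r"[A-Za-z0-9]{64,}")


def scrub(text: str) -> str:
    """Replace each long token with a fingerprint: first 4 and last 3 characters."""
    return LONG_TOKEN.sub(lambda m: f"{m.group()[:4]}…{m.group()[-3:]}", text)
```

This does not catch secrets that contain punctuation or are shorter than the threshold, so it complements rather than replaces removing the affected tool-output block.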

