fix(ci): defense-in-depth filename guard for .env* files (bug_env_file_corrupted_during_session)#94
Conversation
Adds scripts/ci/check-no-env-files.sh + a self-test driver + a new secrets-files-guard job in pr.yml. Fails any PR that adds or modifies a .env-family file other than .env.example, regardless of file content. Defense-in-depth for bug_env_file_corrupted_during_session: catches the near-miss where a rotated .env.old slipped past .gitignore (or where .gitignore is regressed / git add -f is used). Gitleaks in pre-commit is content-based and would pass an empty .env.old; this guard is filename- based and catches the structural signal. Pure bash, no python/pytest dependency. 15 canned regression cases run as a workflow step before the real-diff check, so a broken regex fails immediately. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…angential idea files
bug_fix.md documents the locked design for the secrets-files-guard job
shipped in the previous commit: detection mechanism, regex shape,
failure mode, position in the workflow, and the open question on what
local tool originally renamed .env -> .env.old (deferred — cannot
reproduce 4 days post-incident).
Tangential follow-ups captured per CLAUDE.md "Tangential discoveries":
- chore_ci_gitignore_paths_ignore_gap — pr.yml paths-ignore skips the
whole workflow on .gitignore-only PRs, undermining the guard.
- chore_ci_gitleaks_workflow_step — gitleaks runs only in pre-commit
locally; CI should also content-scan to complement the filename guard.
Includes MVP1 dashboard regen for the new planned-feature folders.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The unquoted ${FORBIDDEN} expansion in check-no-env-files.sh is deliberate
— newline-splitting drives printf format-reuse so each forbidden file
prints on its own bullet line. Quoting would collapse the output to a
single line. Annotate the intent so future shellcheck runs don't re-flag.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request introduces a CI guard to prevent sensitive .env files from being committed, featuring a detection script, regression tests, and updated project documentation. Reviewers noted that the required updates to the GitHub Actions workflow file are missing. Additionally, suggestions were provided to enhance the guard script's robustness by including renamed and copied files in the diff check and improving the handling of file paths that may contain spaces.
| 4. **Workflow position — standalone `secrets-files-guard` job, ~20s, parallel | ||
| with `backend`/`frontend`/`docker`.** Independent PR check; no service | ||
| containers needed. Cites: existing job-per-concern pattern in pr.yml. |
There was a problem hiding this comment.
| pull_request) | ||
| local base="${GITHUB_BASE_REF:-main}" | ||
| git fetch --no-tags --depth=50 origin "${base}" >/dev/null 2>&1 || true | ||
| git diff --name-only --diff-filter=AM "origin/${base}...HEAD" |
There was a problem hiding this comment.
The --diff-filter=AM only includes Added and Modified files. It is recommended to also include Renamed (R) and Copied (C) files to ensure that forbidden files introduced through these operations (e.g., renaming a tracked file to .env) are also caught by the guard.
| git diff --name-only --diff-filter=AM "origin/${base}...HEAD" | |
| git diff --name-only --diff-filter=ACMR "origin/${base}...HEAD" |
| git diff --name-only --diff-filter=AM "origin/${base}...HEAD" | ||
| ;; | ||
| push) | ||
| git diff --name-only --diff-filter=AM HEAD~1 HEAD |
| # shellcheck disable=SC2086 # Word-split is intentional: one bullet per file via printf format-reuse. | ||
| printf ' - %s\n' ${FORBIDDEN} |
There was a problem hiding this comment.
Relying on word splitting for ${FORBIDDEN} will fail if any file path contains spaces. While .env files typically don't have spaces, it's better to handle the newline-separated list robustly using sed or a while read loop.
| # shellcheck disable=SC2086 # Word-split is intentional: one bullet per file via printf format-reuse. | |
| printf ' - %s\n' ${FORBIDDEN} | |
| printf '%s' "${FORBIDDEN}" | sed 's/^/ - /' |
…emini #2/3/4) Accepted findings from Gemini Code Assist on PR #94: - --diff-filter=AM -> ACMR. High-similarity `git mv foo.txt .env.old` is reported as Renamed by git, not Added; AM alone would miss it. ACMR (Added, Copied, Modified, Renamed) closes the rename/copy bypass. - Replace unquoted ${FORBIDDEN} word-split with `sed 's/^/ - /'`. The regex's `[^/]+` permits spaces in `.env.<...>` filenames; the previous word-splitting approach would mis-split on space. The shellcheck SC2086 disable is no longer needed and is removed. - New regression case `.env.foo bar.bak` proves the space-handling fix. All 16 cases pass. Rejected finding #1 (Gemini High): claimed pr.yml workflow update was missing. Counter-evidence: `gh pr view 94 --json files` shows .github/workflows/pr.yml with +22/-0 in this PR (the secrets-files-guard job). Gemini reviewed hunks without seeing the workflow change in the same PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review adjudication (Gemini Code Assist)Commit landing fixes: Gemini Code Assist (4 findings)
Outcomes
CI re-run on |
Accepted GPT-5.5 final-review finding #1: the implicit `git fetch origin <base>` does not deterministically create refs/remotes/origin/<base>; under some actions/checkout modes (especially forked PRs) the diff would silently fail to enumerate changes and the guard would no-op. Replace with an explicit refspec and add an `origin/<base>` rev-parse check that exits 2 with a clear error if resolution fails — refusing to silently skip the guard. CI runs prior to this fix passed because actions/checkout's fetch-depth=50 already populated origin/main in the typical case; the fix removes the brittleness rather than fixing a current-observable bug. Deferred GPT-5.5 #2 (workflow paths-ignore inheritance): already captured as chore_ci_gitignore_paths_ignore_gap. Counter-evidence: bug_fix.md Tangential Observations §1. Deferred GPT-5.5 #3 (regex misses .env-old / .env_bak / .env~): captured as a new chore_env_guard_extend_deny_pattern idea file. Not a regression of this PR — the gitignore and the guard are currently consistent (both dotted-only); broadening one requires broadening the other, and that paired change deserves its own focused PR. Includes MVP1 dashboard regen for the new chore_env_guard_extend_deny_pattern folder. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review adjudication (GPT-5.5 final review)Commit landing fixes: GPT-5.5 final review (3 findings)
Outcomes
CI re-run on Ready for human review + merge. |
Total run drops from ~7m20s to an expected ~5m45s by addressing the three highest-impact-per-risk targets identified from per-step timing on PRs #94 / #95: 1. **Drop `needs: [backend, frontend]` from the docker job** (~65s). buildx tests whether the Dockerfile builds; it does not consume test artifacts. The dependency was defensive, not necessary. With it removed, docker runs in parallel with backend/frontend instead of being serialized after them — the critical path was previously backend (6m06s) → docker (1m05s); now it's just backend. 2. **Cache .mypy_cache + .ruff_cache across runs** (~20s). Both tools invalidate per-file internally — the cache directory is safe to persist. Cold mypy is ~27s today; warm typically ~5s. Keyed on uv.lock + pyproject.toml (dep/tool-config changes); a restore-keys fallback to the runner-OS prefix means partial hits when only one input changes. 3. **Lower ES + OpenSearch heap from 512m → 256m** (~10-15s). The CI integration-test workload uses small synthetic indexes without concurrency; 256m is amply sufficient. Halving the heap makes container startup + health-check completion faster. Risk profile: all three changes are low-risk additive YAML edits. - #1 (docker parallel): zero risk — buildx never depended on test outcomes. - #2 (cache step): cache-miss path is identical to today (cold run). - #3 (heap halving): only risk is OOM during a test; if observed, trivially revert this single line. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eedup wave entry (#98) Closes out the bug by moving the folder from planned_features/ to implemented_features/2026_05_13_bug_env_file_corrupted_during_session/. The defense-in-depth scope shipped in PR #94 (filename-pattern CI guard catching `.env*` files in PR diffs). The original local-tooling rename event from 2026-05-09 remains undetermined — captured as an open question in bug_fix.md for user-side investigation (cannot reproduce 4 days post-incident from agent side). State.md updates: - New "Most recent meaningful changes" entry covering PRs #94 / #95 / #96 / #97 (CI-speedup wave: defense-in-depth env guard, --ship flag, 3 critical-path optimizations cutting wall time 7m20s → 5m55s, unit-test fast lane). - Bug removed from "Tangential bugs captured during the bootstrap" list; marked resolved with pointer to the implemented_features folder. - Last-updated header refreshed. Includes MVP1 dashboard regen for the folder relocation. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…oader regex (#99) * infra(ci): extract secrets-files-guard to own workflow + add gitleaks (chore_ci_gitignore_paths_ignore_gap + chore_ci_gitleaks_workflow_step) Two CI defense-in-depth chores resolved together because they share a workflow file: **chore_ci_gitignore_paths_ignore_gap:** the `secrets-files-guard` job was inheriting pr.yml's `paths-ignore` filter, so a PR that ONLY touched `.gitignore` or `docs/**` would skip the entire workflow and the guard would never run. Failure mode: a malicious or accidental `.gitignore` regression removing `.env.old` slips past CI; subsequent PR adding `.env.old` under `docs/` also slips past. Fix: split secrets-files-guard out to `.github/workflows/secrets-defense.yml` with no `paths-ignore` so it triggers on every PR + push to main regardless of which paths changed. pr.yml retains a pointer comment so future maintainers find the surface. **chore_ci_gitleaks_workflow_step:** gitleaks was configured in `.pre-commit-config.yaml` v8.21.2 locally but never invoked from CI. Pre-commit is opt-in — a contributor without `pre-commit install` had no content scan at all. Filename guard catches `.env*` patterns; gitleaks catches `OPENAI_API_KEY=sk-...` content in any path. They're complementary, not redundant. Fix: new `gitleaks` job in the same secrets-defense workflow. Uses the official binary release at v8.21.2 (pinned to match pre-commit), NOT gitleaks-action@v2 — the action requires a paid GITLEAKS_LICENSE for private/org repos (SoundMindsAI/relyloop is private per `gh repo view --json visibility`). Binary install is license-free. Both jobs run in parallel, ≤30s each. Off the critical path entirely. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): broaden env-guard deny regex to match non-dotted backups (chore_env_guard_extend_deny_pattern) Extends the .env filename guard from matching only dotted `.env.<x>` variants to also match `.env-<x>`, `.env_<x>`, `.env~` (vim/emacs backup suffix). Parallel `.gitignore` update so the local-git defense and the CI defense agree on the same pattern. Source: deferred GPT-5.5 Low-severity finding #3 on PR #94. Closed as a follow-up because the gitignore and the guard were both dotted- only originally — broadening required updating both in lockstep, which deserved its own focused PR. **Bug found while broadening: silent regex range trap.** The original draft used `[._\-~]` as the separator class, expecting it to mean "literal `.`, `_`, `-`, `~`". POSIX regex character classes treat `-` as a range operator between two characters; `[._\-~]` parses as "literal `.`, `_`, then RANGE from `\` (92) to `~` (126)". That range matches every letter but does NOT match a literal `-`. So `.env-old` SILENTLY did not match the supposedly-broader pattern, while `.environment.yml` did (because `i` is in the range). Caught by the new regression cases for `.env-old` and `.environment.yml` running together — without both, neither failure mode is visible. Fix: put `-` at the end of the bracket expression (`[._~-]`) where it's unambiguously a literal. Added an explicit code comment explaining the trap so future maintainers don't re-introduce it. 6 new regression cases (4 deny, 2 allow): `.env-old`, `.env_bak`, `.env~`, `backend/.env-prod`, `.envoy.conf`, `.env3.yml`. All 22 cases green; cold local run completes in <0.5s. `.gitignore` extended with broad globs `.env.*`, `.env-*`, `.env_*`, `.env~` to mirror the guard. Canonical incident spellings kept as explicit entries with a comment pointing maintainers at the documentation reason. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(gitignore): .env~ → .env~* glob for parity with deny regex (Gemini #1) Accepted Gemini Medium finding on PR #99: the `.gitignore` entry `.env~` only ignored the literal filename, but the CI deny regex `\.env([._~-][^/]*)?` catches any tilde-suffixed variant (`.env~`, `.env~bak`, `.env~1`, etc.). Without the trailing wildcard, a vim or emacs backup variant like `.env~1` would pass `git status` cleanly locally but be rejected by CI — confusing operator experience. Fix: `.env~` → `.env~*` (matches `.env~` itself plus any suffix). New regression case `.env~bak` locks the parity in (23 cases now; all pass). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
secrets-files-guardCI job inpr.ymlfails any PR that adds or modifies a.env-family file other than.env.example. Catches the failure mode where.gitignoreis regressed orgit add -fis used — even when gitleaks would pass (e.g. an empty.env.oldcarries no detectable secret pattern but still signals a rotated key).scripts/ci/check-no-env-files.sh(pure bash, regex denylist + allowlist) +scripts/ci/test_check_no_env_files.sh(15 canned regression cases run as a workflow step before the real-diff check, so a broken regex fails immediately).chore_ci_gitignore_paths_ignore_gap(paths-ignore skips workflow on.gitignore-only PRs) andchore_ci_gitleaks_workflow_step(gitleaks runs in pre-commit but not in CI).Closes the near-miss documented in
bug_env_file_corrupted_during_session/bug_fix.md—.envwas rotated to.env.oldon a developer's machine duringinfra_foundationPR #4 work;.env.oldwas not in.gitignoreat the time and a naivegit add -Awould have staged a real OpenAI key. The original local-tooling rename cause is undetermined and deferred (cannot reproduce 4 days post-incident).Test plan
bash scripts/ci/test_check_no_env_files.sh— 15/15 pass locally (.env.exampleallowed; bare.env,.env.old,.env.bak,.env.local,.envrc,.env.production, nested variants, mixed legit+bad all rejected with exit 1).venv/bin/ruff format --check backend/— CI parity.venv/bin/ruff check .— all checks passeduv run mypy --strict --config-file=pyproject.toml backend/— no issuesshellcheck scripts/ci/*.sh— clean (one intentionalSC2086word-split annotated)python3 -c "import yaml; yaml.safe_load(open('.github/workflows/pr.yml'))"— workflow parsessecrets-files-guardjob and it self-tests against the canned inputs before scanning the real diff🤖 Generated with Claude Code