Skip to content

fix(state): stop persisting project_root — prevents leaking absolute paths into committable state#55

Merged
gbrbks merged 1 commit into
mainfrom
fix/drop-project-root-from-state
Apr 24, 2026
Merged

fix(state): stop persisting project_root — prevents leaking absolute paths into committable state#55
gbrbks merged 1 commit into
mainfrom
fix/drop-project-root-from-state

Conversation

@gbrbks
Copy link
Copy Markdown
Collaborator

@gbrbks gbrbks commented Apr 24, 2026

Summary

intent_layer.py::save-run-context was persisting the absolute project path into .archie/deep_scan_state.json under run_context.project_root. Real example from the openmeter project:

"run_context": {
  "project_root": "/Users/hamutarto/DEV/gbr/openmeter",
  ...
}

Users who commit .archie/ (several do, and we encourage it for reproducible blueprints across a team) were silently publishing machine-specific info — user home directories, internal repo layouts, etc.

Audit confirmed this was the ONLY such leak in Archie's output files — blueprint.json, findings.json, scan.json, enrichments/, CLAUDE.md, AGENTS.md, and .claude/rules/ are all clean.

The fix

  1. save-run-context no longer writes project_root. The flag is silently accepted for backward-compat with older installs, but the value is discarded.

  2. Defensive cleanup on next write. Any pre-existing project_root in the state file gets scrubbed the next time save-run-context runs. Users with legacy leaked values recover automatically — no manual git checkout or file editing required.

  3. Resume Prelude sets PROJECT_ROOT="$PWD" directly. The field doesn't need to be persisted because the Resume Prelude is always called from inside /archie-deep-scan, and the state file itself lives under <root>/.archie/deep_scan_state.json — which means to even find the state, the caller has to already be at (or have access to) the root. If $PWD differs from where .archie/ lives (user invoked /archie-deep-scan --continue from a subdirectory), the inspect call below returns null, LAST=0, and the Resume Prelude falls through to a fresh Phase 0 — safe default, no corruption.

  4. Doc text made honest about the CWD assumption. It's a convention (how Claude Code is typically invoked), not a hard contract. Symlinked .archie/ directories resolve correctly because $PWD/.archie/* follows symlinks on read.

  5. Removed the --project-root "$PWD" line from the save-run-context call in archie-deep-scan.md Step 0.75.

Tests — 5 new

tests/test_intent_layer_run_context.py:

  • test_save_run_context_does_not_persist_project_root — pass --project-root /Users/someone/..., assert the value does NOT land in the file. Core privacy guarantee.
  • test_save_run_context_persists_expected_fields — scope, intent_layer, scan_mode, monorepo_type, start_step, workspaces still round-trip correctly.
  • test_save_run_context_scrubs_legacy_project_root_from_existing_state — pre-existing file with a leaked project_root gets cleaned on next write.
  • test_save_run_context_accepts_project_root_without_failing — backward-compat: the flag is still accepted without error.
  • test_save_run_context_does_not_leak_any_machine_path — general guard: scan the output file for any substring matching /Users/, /home/, /root/, /var/folders/, C:\\Users\\.

All 5 pass. Full regression suite (70 tests) clean.

Validated on openmeter

Before (the file user actually had on disk):

"run_context": {
  "project_root": "/Users/hamutarto/DEV/gbr/openmeter",
  "scope": "whole",
  ...
}

After syncing the fixed intent_layer.py and running save-run-context:

"run_context": {
  "scope": "whole",
  "intent_layer": "no",
  "scan_mode": "full",
  "monorepo_type": "none",
  "start_step": 1,
  "workspaces": []
}

Broad scan confirmed zero occurrences of gbr, /Users/, /home/, /root/, or /var/folders/ anywhere in openmeter's .archie/, CLAUDE.md, AGENTS.md, or .claude/rules/ after the scrub. The committable state is now fully machine-agnostic.

Test plan

  • python3 -m pytest tests/test_intent_layer_run_context.py -v → 5 passed
  • python3 -m pytest tests/ test_intent_layer_* test_upload.py test_share_setup.py test_renderer_merge.py -q → 70 passed (no regressions)
  • python3 scripts/verify_sync.py → 20 scripts, 5 commands in sync
  • End-to-end on openmeter: fix synced, state scrubbed, grep /Users\|/home\|gbr returns zero hits

Release

Ship as 2.4.4 — patch bump. Privacy / correctness fix with zero functional regression. Backward compatible for anyone passing --project-root (silently accepted + discarded).

🤖 Generated with Claude Code

…paths into committable state

The intent_layer `save-run-context` subcommand was persisting the absolute
project path (e.g. `/Users/hamutarto/DEV/gbr/openmeter`) into
`.archie/deep_scan_state.json` under `run_context.project_root`. Users who
commit `.archie/` (several already do, and we encourage the pattern for
reproducible blueprints across a team) were silently publishing machine-
specific paths — user home dirs, internal repo layouts, etc.

Fix:

- save-run-context no longer writes `project_root`. The flag is silently
  accepted for backward-compat with older slash-command prose, but the
  value is discarded.
- Any pre-existing `project_root` in the file gets scrubbed on the next
  write (defensive: users with legacy leaked values recover automatically
  after their next deep-scan step completion, without any manual git
  checkout / file editing).
- archie-deep-scan.md Resume Prelude now sets PROJECT_ROOT="$PWD" directly
  instead of reading from state. In the common case (slash command invoked
  from repo root, which is how Claude Code is typically used) this is
  correct. If $PWD differs from where .archie/ lives, the state-read below
  fails, LAST=0, and the Resume Prelude falls through to a fresh Phase 0
  with no corruption — safe default.
- Removed the `--project-root "$PWD"` line from the save-run-context call
  in archie-deep-scan.md Step 0.75.

Doc text in archie-deep-scan.md was also made honest about the CWD
assumption — it's a convention (how claude is typically invoked), not a
hard contract. Symlinked .archie/ directories resolve correctly because
$PWD/.archie/* follows symlinks on read.

Tests: 5 new in test_intent_layer_run_context.py covering: (1) the
privacy guarantee (passed --project-root does not land in the file),
(2) non-project_root fields still round-trip correctly, (3) legacy
leaked values in pre-existing state get scrubbed on next write, (4)
backward-compat (the flag is accepted without error), (5) general leak
guard — no string starting with /Users/, /home/, /root/, /var/folders/,
or C:\Users\ appears anywhere in the output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 24, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
archie Ready Ready Preview, Comment Apr 24, 2026 6:59pm

gbrbks added a commit to gbrbks/openmeter that referenced this pull request Apr 24, 2026
…e.json

Older Archie (pre-2.4.4) was persisting the absolute project path into
.archie/deep_scan_state.json under run_context.project_root, which leaked
"/Users/hamutarto/DEV/gbr/openmeter" into the committed state file.

Archie 2.4.4 stopped writing this field (BitRaptors/Archie#55). The
existing leaked value is removed here so the committable state stays
machine-agnostic and portable across dev laptops / CI runners.

No other paths leaked — verified with grep for /Users/, /home/, /root/,
/var/folders/, and "gbr" across .archie/, CLAUDE.md, AGENTS.md, .claude/.
Every remaining "/" string comes from actual gitignore content or doc
examples, not from environment-specific paths.
@gbrbks gbrbks merged commit d77bcc1 into main Apr 24, 2026
4 checks passed
@gbrbks gbrbks deleted the fix/drop-project-root-from-state branch April 24, 2026 19:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant