feat: harden prompt guard with symlink rejection and new-file detection #40

Merged: fazxes merged 2 commits into main from feat/prompt-guard-hardening on Apr 5, 2026

@fazxes fazxes commented Apr 5, 2026

Summary

  • Symlink rejection: read_repo_instructions() now checks is_symlink() before reading instruction files. Prevents path-traversal attacks where a malicious repo symlinks CLAUDE.md to sensitive files like /etc/passwd.
  • New-file detection: Prompt self-modification guard now snapshots docs/prompt/ directory listings and detects new files created during a cycle. Catches agents creating prompt files that bypass the fixed PROMPT_GUARD_FILES list.
  • Task docs: always use make check + never push directly to main #37
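The symlink check described above can be sketched as follows. This is a minimal illustration, not the PR's code: `read_instruction_file` and the file names in the demo are hypothetical, and the real `read_repo_instructions()` signature and budget handling are not shown in this conversation.

```python
import logging
import os
import tempfile
from pathlib import Path
from typing import Optional

logger = logging.getLogger("prompt_guard_sketch")


def read_instruction_file(path: Path) -> Optional[str]:
    """Hypothetical helper: return the file's text, or None if it must not be read."""
    # Check is_symlink() first. is_file() follows links, so it is True for a
    # symlink to a regular file and False for a dangling one; neither case
    # should be read.
    if path.is_symlink():
        logger.warning("refusing to read symlinked instruction file: %s", path)
        return None
    if not path.is_file():
        return None
    return path.read_text(encoding="utf-8", errors="replace")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        (repo / "secret.txt").write_text("not for the prompt")
        (repo / "README.md").write_text("regular file")
        # Hypothetical instruction files: one valid symlink, one dangling.
        os.symlink(repo / "secret.txt", repo / "CLAUDE.md")
        os.symlink(repo / "missing.md", repo / "NOTES.md")
        for name in ("CLAUDE.md", "NOTES.md", "README.md"):
            print(name, "->", read_instruction_file(repo / name))
```

Note that `Path.is_symlink()` inspects only the final path component, so a regular file reached through a symlinked parent directory would pass this particular check.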

Test plan

  • 5 Python tests: symlink rejected with warning, no budget consumption, regular files still read, broken symlink rejected, nested symlink rejected
  • 4 bash subprocess tests: new file in existing directory, new file when directory empty, no false positives, new directory created during cycle
  • make check: 616 passing, all checks green

fazxes added 2 commits April 4, 2026 22:56
Task #37: Two security improvements for running against untrusted repos:

1. read_repo_instructions() now rejects symlinks before reading. Checks
   is_symlink() before is_file() to catch both valid and dangling symlinks.
   Prevents path-traversal attacks where CLAUDE.md -> /etc/passwd.

2. Prompt self-modification guard now detects new files in docs/prompt/
   created during a cycle. Snapshots directory listings before the cycle
   and compares after. Catches agents creating prompt files that bypass
   the fixed PROMPT_GUARD_FILES list.

+9 tests (5 Python symlink, 4 bash guard), 616 total passing.

Code review fixes:
1. Add LC_ALL=C to sort calls feeding comm, ensuring consistent
   collation across macOS/Linux locales.
2. Replace echo with printf in comm input to avoid spurious
   blank-line entry when directory listing is empty.
3. Add test for empty directory with no changes (false-positive
   prevention).

617 tests passing.
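
The snapshot-and-compare step in these commits is implemented in bash with ls, sort, and comm. As a rough Python analogue (the function names here are hypothetical and this is not the PR's code), comparing sets of entry names gives the same "what is new" answer without depending on sort collation, and an empty directory naturally yields an empty set rather than a blank entry:

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, FrozenSet, Iterable, List


def snapshot_listings(dirs: Iterable[Path]) -> Dict[Path, FrozenSet[str]]:
    """Record the top-level entry names of each watched directory."""
    snapshot = {}
    for d in dirs:
        # A directory that does not exist yet is recorded as empty, so a
        # directory created during the cycle shows up as all-new entries.
        snapshot[d] = frozenset(os.listdir(d)) if d.is_dir() else frozenset()
    return snapshot


def new_entries(before: Dict[Path, FrozenSet[str]],
                after: Dict[Path, FrozenSet[str]]) -> List[str]:
    """Return paths present in `after` but not in `before`, sorted per directory."""
    added = []
    for d, names in after.items():
        for name in sorted(names - before.get(d, frozenset())):
            added.append(str(d / name))
    return added


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        watched = Path(tmp) / "docs" / "prompt"
        watched.mkdir(parents=True)
        before = snapshot_listings([watched])
        (watched / "extra.md").write_text("created during the cycle")
        print(new_entries(before, snapshot_listings([watched])))
```

Like the reviewed bash version, this sketch looks only at top-level entries of each watched directory.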
@fazxes fazxes merged commit 3f3ba4e into main Apr 5, 2026
2 checks passed
@fazxes fazxes deleted the feat/prompt-guard-hardening branch April 5, 2026 03:00

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b8dd2768f1

Comment thread scripts/lib-agent.sh Outdated
Comment on lines +119 to +121
current_listing=$(ls -1 "$current_dir" 2>/dev/null | sort)
local new_files
new_files=$(comm -13 "$listing" <(echo "$current_listing"))

P1: Detect nested prompt files when checking for new files

The new-file guard compares only top-level entries of each watched directory (ls -1), so adding a file inside an existing subdirectory is not detected. In this repo, docs/prompt/feedback/ already exists; creating docs/prompt/feedback/evil.md after snapshot leaves the top-level listing unchanged, comm reports no additions, and check_prompt_integrity returns success. This creates a bypass for the self-modification guard in exactly the scenario this change is meant to block.
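
The bypass described in this comment can be reproduced with a small Python sketch, assuming a hypothetical helper `snapshot_tree` that walks the watched directory recursively. It is an illustration of the idea, not the fix that was applied to scripts/lib-agent.sh:

```python
import os
import tempfile
from pathlib import Path
from typing import FrozenSet


def snapshot_tree(root: Path) -> FrozenSet[str]:
    """Record every file and directory under root, as paths relative to root."""
    entries = set()
    # os.walk does not follow directory symlinks by default and yields
    # nothing when root does not exist.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            entries.add(os.path.relpath(os.path.join(dirpath, name), root))
    return frozenset(entries)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        prompt_dir = Path(tmp) / "docs" / "prompt"
        (prompt_dir / "feedback").mkdir(parents=True)

        top_before = set(os.listdir(prompt_dir))   # what `ls -1` would see
        tree_before = snapshot_tree(prompt_dir)

        # File added inside an existing subdirectory after the snapshot.
        (prompt_dir / "feedback" / "evil.md").write_text("injected")

        print("top-level diff:", set(os.listdir(prompt_dir)) - top_before)
        print("recursive diff:", set(snapshot_tree(prompt_dir) - tree_before))
```

The top-level comparison reports nothing new, while the recursive comparison reports the nested file.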

fazxes added a commit that referenced this pull request Apr 5, 2026
Task #40: assert AGENT_DEFAULT_MODELS matches DEFAULT_CONFIG
Task #41: assert cache_read pricing for gpt-5.4-mini/nano
fazxes added a commit that referenced this pull request Apr 5, 2026
… to #42

Task #40 was accidentally overwritten by PR #44. Restored original
CONTRIBUTING.md task content and moved the AGENT_DEFAULT_MODELS sync
test task to #42.
fazxes added a commit that referenced this pull request Apr 5, 2026
fix: restore task #40 and renumber review follow-up to #42
fazxes added a commit that referenced this pull request Apr 5, 2026
5-agent audit identified task selection as mesa-optimization: agent
optimizes session success over project progress. 5 more agents audited
the fix and found 7 issues, all resolved.

Phase 1 — Prompt fixes:
- Remove "smaller in scope" incentive from evolve-auto.md
- Queue order authoritative, handoff advisory
- "Tasks I Did NOT Pick and Why" in every handoff
- Tracker delta required in session reports
- Staleness multiplier (5+ sessions = 2x priority)
- All-integration-tasks edge case handled

Phase 2 — Task queue:
- environment: internal | integration tags
- blocked_reason subtypes: environment, dependency, design
- .next-id for atomic task ID allocation
- archive/ for done tasks (daemon auto-archives)
- Tagged #12, #28, #29 blocked-environment
- Fixed #43 broken frontmatter (duplicate of #40)

Phase 3 — Overseer avoidance detection:
- 6 new checks: stale tasks, cherry-picking, stuck integration,
  weak blocks, max attempts, skip accountability

Learnings index:
- INDEX.md with categorized one-line summaries (31/31 matched)
- Agent reads index, opens files only when relevant
- Updated CLAUDE.md, evolve.md Step 1 and Step 6l

Meta layer vision (8 new tasks #46-#53):
- #46 Healer — between-session trend observer
- #47 Multi-agent PR review panel
- #48 Human escalation (gh issue create + webhook)
- #49 Self-evaluation loop against real repos
- #50 Prompt self-refinement via strategist
- #51 Cross-session cost intelligence
- #52 Codebase world model (MODULE_MAP.md)
- #53 Agent generates its own tasks across all dimensions

Also: OPERATIONS.md stale task queue + --squash example fixed
fazxes added a commit that referenced this pull request Apr 5, 2026
)

Agent-facing contributor guide covering quality gates, branch/PR conventions,
registration checklist, and resident daemon handling of external PRs.
Also archives completed tasks #31, #33, #61.
fazxes added a commit that referenced this pull request Apr 5, 2026
Created tasks #89 (OPERATIONS.md module table sync) and #90
(detect_file_conflicts failed-task scanning) from PR #72 code review.