security: close prompt-alert pre-poisoning path (pentest W1)#173
Merged
Conversation
daemon.sh now runs rm -f "$PROMPT_ALERT" immediately before check_prompt_integrity in the pentest preflight block. docs/sessions/ is gitignored so reset_repo_state (git clean -fd) does NOT remove prompt-alert.md. A pentest agent that wrote to the file during its 120-turn window would have had that content injected into the builder prompt on the next cycle as a legitimate alert. check_prompt_integrity is now the only writer: any pentest-agent-written content is cleared first. 2 regression tests. W2 (webhook SSRF via PR non-merge filter) tracked as low-priority task #186 -- no fix urgency per pentest report.
fazxes
added a commit
that referenced
this pull request
Apr 9, 2026
…one) Queue before: 72 pending + 9 wontfix-in-active-dir Queue after: 65 pending + 0 wontfix (all converted to done for archiving) Merged into primary tasks (5 closures): - #175 -> #174: both add tests to TestAuthFailureDetection, same PR - #163 -> #162: both are scoring module tests from PR #158 review, same PR - #124 -> #122: both validate doc snapshot consistency, same PR scope - #196 -> #173: both add entries to PROMPT_GUARD_FILES in lib-agent.sh - #180 -> #179: both touch _is_valid_eval_file() in pick-role.py, same PR Closed as obsolete (1): - #78: references non-existent "evolve.md Step 8" and the multi-agent review panel replaced by unified review in PR #107 Closed as low-value (1): - #230: _DELEGATION_ROLE_MAP covers all 8 current agent types; new agent types require major framework work making the map update obvious Converted wontfix -> done for archiving (9): - #77, #80, #107, #111, #115, #119, #127, #129, #134 All had wontfix status with rationale already documented; changed to done so daemon's archive_done_tasks() housekeeping removes them
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
rm -f "$PROMPT_ALERT"added immediately beforecheck_prompt_integrityin the pentest preflight block indaemon.shdocs/sessions/prompt-alert.md(gitignored, survivesgit clean -fd) to inject arbitrary content into the builder prompt on the next cycleTestPentestAgentPromptAlertPoisoningGuardTest plan
make checkpasses (1104 tests)test_rm_prompt_alert_present_before_check_prompt_integrity: verifiesrm -fprecedescheck_prompt_integritytest_rm_prompt_alert_is_adjacent_to_check_prompt_integrity: verifies they stay within 10 lines