Skip to content

Harden review prompt against XML tag injection#90

Merged
dcellison merged 2 commits intomainfrom
fix/review-prompt-hardening
Mar 19, 2026
Merged

Harden review prompt against XML tag injection#90
dcellison merged 2 commits intomainfrom
fix/review-prompt-hardening

Conversation

@dcellison
Copy link
Copy Markdown
Owner

Summary

  • Replace predictable XML tags (<pr-description>, <diff>, <spec>, etc.) with randomly generated boundary delimiters in build_review_prompt()
  • Each block gets a unique token via secrets.token_hex(4), generated fresh per invocation
  • An attacker who controls PR content (title, description, diff) cannot predict the closing delimiter to break out of their data block and inject prompt instructions
  • Updated preamble to reference boundary format instead of XML blocks
  • All six data blocks hardened: PR metadata, description, spec, conventions, prior review thread, and diff

Fixes #89

Test plan

  • Basic prompt structure uses boundary delimiters, not XML tags
  • Spec content wrapped in SPEC boundary
  • Conventions content wrapped in CONVENTIONS boundary
  • Prior comments wrapped in PRIOR_REVIEW_THREAD boundary
  • Diff truncation still works correctly
  • Optional blocks omitted when content is None
  • Block ordering preserved (conventions before prior comments before diff)
  • Each block in a single prompt gets a unique token (6 blocks, 6 different tokens)
  • Tokens change between invocations (not hardcoded)
  • All 904 tests pass

Replace predictable XML tags (<pr-description>, <diff>, <spec>, etc.)
with randomly generated boundary delimiters (MIME-style). Each block
gets a unique token via secrets.token_hex(4), so an attacker who
controls PR content cannot predict or forge a closing delimiter to
break out of their data block and inject instructions.

Fixes #89
@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

Review: PR #90 — Harden review prompt against XML tag injection

The core fix is sound. secrets.token_hex(4) gives 32 bits of entropy per block — more than sufficient to make boundary forgery infeasible. Token generation is correctly scoped inside _boundary() so each call produces a fresh value, and optional blocks only generate tokens when they're actually used.

All seven acceptance criteria from the spec are satisfied.


Suggestion — review.py:420

The three unconditional boundary tokens (meta_begin/meta_end, desc_begin/desc_end, diff_begin/diff_end) are generated before the truncation check. This is harmless but slightly out of logical order — the truncation block operates on diff, which isn't wrapped until much later. Not a bug, just worth noting.


Suggestion — tests/test_review.py:268, 283

Both new tests do import re inside the method body:

def test_each_block_has_unique_token(self):
    ...
    import re
    tokens = re.findall(...)

re should be imported at the module level with the rest of the test file's imports. Inline imports inside test methods are a style violation in most Python conventions (PEP 8 says imports go at the top). No functional impact since Python caches module imports, but it's inconsistent with the rest of the file.


Note — test_tokens_change_between_invocations

The comment "statistically near-certain" is accurate. With 6 blocks × 32-bit tokens per prompt, the probability all 6 tokens happen to collide between two invocations is roughly 1 in 2^32. Acceptable for a test. If this ever becomes a CI flake concern (it won't), the fix would be to mock secrets.token_hex in one call.


Clean otherwise. The preamble update correctly describes the boundary format to the model, the ordering test (conv_end < prior_start < diff_start) correctly uses END CONVENTIONS and BEGIN PRIOR_REVIEW_THREAD as position markers (not BEGIN CONVENTIONS/END DIFF which would be further out), and no XML tags remain for any of the six hardened blocks.

@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

Review: PR #90 — Harden review prompt against XML tag injection

The prior review's only actionable suggestion (moving import re to module level) has been addressed — it now appears at line 4 of the test file alongside the other imports.

All seven acceptance criteria from the spec are satisfied. Checking each:

  • No XML tags remain for any of the six blocks ✓
  • Each block gets a unique token via a fresh _boundary() call ✓
  • Tokens are scoped inside the function, regenerated per invocation ✓
  • Preamble correctly describes the boundary format ✓
  • Existing tests updated for the new delimiter pattern ✓
  • test_each_block_has_unique_token and test_tokens_change_between_invocations added ✓
  • conventions block hardened despite lower risk ✓

One minor thing worth noting (not a blocking issue):

Suggestion — tests/test_review.py regex pattern

tokens = re.findall(r"--- BEGIN \w+ ([0-9a-f]{8}) ---", prompt)

\w+ matches word characters including underscores, so it correctly handles PR_METADATA, PR_DESCRIPTION, and PRIOR_REVIEW_THREAD. No bug — just confirming it works as intended for hyphenated-looking names, which these aren't (they use underscores).

Clean otherwise. The _boundary() helper is well-scoped, optional blocks generate tokens only when actually used (no wasted entropy), and the test for invocation-level freshness correctly uses list inequality on the token arrays.

@dcellison dcellison merged commit 52816b9 into main Mar 19, 2026
1 check passed
@dcellison dcellison deleted the fix/review-prompt-hardening branch March 19, 2026 13:27
dcellison pushed a commit that referenced this pull request Mar 19, 2026
…ersal fix

Reverts the issue-body fetching from PR #88 and restores local
filesystem spec loading. The key change from the original code:
load_spec() now contains paths within the repo root using
Path.resolve().relative_to() to prevent path traversal attacks.

Why revert: piping untrusted external content (GitHub issue bodies)
directly into a Claude session is a prompt injection surface. The
boundary tokens from PR #90 prevent structural injection (delimiter
escape) but not semantic injection (content inside the boundary
influencing Claude's behavior). The security principle: don't build
pipelines from external content to LLM sessions. A human reads the
issue, copies relevant content to a local spec file, and references
it in the PR. The human is the firewall.

This also avoids establishing a pattern that future agents with more
capabilities could inherit. The review agent today is toolless and
non-interactive, but the pattern of 'fetch external content, pipe to
LLM' is dangerous to normalize.

What stays from PRs #88/#90:
- Boundary tokens (PR #90) for PR descriptions and diffs (these MUST
  be fed to the agent since they are what is being reviewed)
- _resolve_local_repo() improvement (better than old home_repo_name)
- Prior comment awareness, issue triage agent, etc. (unrelated)

Fixes #91
dcellison added a commit that referenced this pull request Mar 19, 2026
…ersal fix (#92)

* Remove issue-body fetching, restore local spec loading with path traversal fix

Reverts the issue-body fetching from PR #88 and restores local
filesystem spec loading. The key change from the original code:
load_spec() now contains paths within the repo root using
Path.resolve().relative_to() to prevent path traversal attacks.

Why revert: piping untrusted external content (GitHub issue bodies)
directly into a Claude session is a prompt injection surface. The
boundary tokens from PR #90 prevent structural injection (delimiter
escape) but not semantic injection (content inside the boundary
influencing Claude's behavior). The security principle: don't build
pipelines from external content to LLM sessions. A human reads the
issue, copies relevant content to a local spec file, and references
it in the PR. The human is the firewall.

This also avoids establishing a pattern that future agents with more
capabilities could inherit. The review agent today is toolless and
non-interactive, but the pattern of 'fetch external content, pipe to
LLM' is dangerous to normalize.

What stays from PRs #88/#90:
- Boundary tokens (PR #90) for PR descriptions and diffs (these MUST
  be fed to the agent since they are what is being reviewed)
- _resolve_local_repo() improvement (better than old home_repo_name)
- Prior comment awareness, issue triage agent, etc. (unrelated)

Fixes #91

* Fix None description crash and branch spec containment

- Guard resolve_spec_from_body against None description (GitHub sends
  null for PRs with no body, causing AttributeError on splitlines).
- Add Path.resolve().relative_to() containment check to strategy 2
  (branch-based spec path) matching strategy 1. A misconfigured
  spec_dir pointing outside the repo would otherwise leak files.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Harden review prompt against XML tag injection

2 participants