Harden review prompt against XML tag injection by dcellison · Pull Request #90 · dcellison/kai

dcellison · 2026-03-19T13:22:25Z

Summary

Replace predictable XML tags (<pr-description>, <diff>, <spec>, etc.) with randomly generated boundary delimiters in build_review_prompt()
Each block gets a unique token via secrets.token_hex(4), generated fresh per invocation
An attacker who controls PR content (title, description, diff) cannot predict the closing delimiter to break out of their data block and inject prompt instructions
Updated preamble to reference boundary format instead of XML blocks
All six data blocks hardened: PR metadata, description, spec, conventions, prior review thread, and diff

Fixes #89

Test plan

Replace predictable XML tags (<pr-description>, <diff>, <spec>, etc.) with randomly generated boundary delimiters (MIME-style). Each block gets a unique token via secrets.token_hex(4), so an attacker who controls PR content cannot predict or forge a closing delimiter to break out of their data block and inject instructions. Fixes #89

dcellison · 2026-03-19T13:23:11Z

Review by Kai

Review: PR #90 — Harden review prompt against XML tag injection

The core fix is sound. secrets.token_hex(4) gives 32 bits of entropy per block — more than sufficient to make boundary forgery infeasible. Token generation is correctly scoped inside _boundary() so each call produces a fresh value, and optional blocks only generate tokens when they're actually used.

All seven acceptance criteria from the spec are satisfied.

Suggestion — review.py:420

The three unconditional boundary tokens (meta_begin/meta_end, desc_begin/desc_end, diff_begin/diff_end) are generated before the truncation check. This is harmless but slightly out of logical order — the truncation block operates on diff, which isn't wrapped until much later. Not a bug, just worth noting.

Suggestion — tests/test_review.py:268, 283

Both new tests do import re inside the method body:

def test_each_block_has_unique_token(self):
    ...
    import re
    tokens = re.findall(...)

re should be imported at the module level with the rest of the test file's imports. Inline imports inside test methods are a style violation in most Python conventions (PEP 8 says imports go at the top). No functional impact since Python caches module imports, but it's inconsistent with the rest of the file.

Note — test_tokens_change_between_invocations

The comment "statistically near-certain" is accurate. With 6 blocks × 32-bit tokens per prompt, the probability all 6 tokens happen to collide between two invocations is roughly 1 in 2^32. Acceptable for a test. If this ever becomes a CI flake concern (it won't), the fix would be to mock secrets.token_hex in one call.

Clean otherwise. The preamble update correctly describes the boundary format to the model, the ordering test (conv_end < prior_start < diff_start) correctly uses END CONVENTIONS and BEGIN PRIOR_REVIEW_THREAD as position markers (not BEGIN CONVENTIONS/END DIFF which would be further out), and no XML tags remain for any of the six hardened blocks.

dcellison · 2026-03-19T13:26:22Z

Review by Kai

Review: PR #90 — Harden review prompt against XML tag injection

The prior review's only actionable suggestion (moving import re to module level) has been addressed — it now appears at line 4 of the test file alongside the other imports.

All seven acceptance criteria from the spec are satisfied. Checking each:

No XML tags remain for any of the six blocks ✓
Each block gets a unique token via a fresh _boundary() call ✓
Tokens are scoped inside the function, regenerated per invocation ✓
Preamble correctly describes the boundary format ✓
Existing tests updated for the new delimiter pattern ✓
test_each_block_has_unique_token and test_tokens_change_between_invocations added ✓
conventions block hardened despite lower risk ✓

One minor thing worth noting (not a blocking issue):

Suggestion — tests/test_review.py regex pattern

tokens = re.findall(r"--- BEGIN \w+ ([0-9a-f]{8}) ---", prompt)

\w+ matches word characters including underscores, so it correctly handles PR_METADATA, PR_DESCRIPTION, and PRIOR_REVIEW_THREAD. No bug — just confirming it works as intended for hyphenated-looking names, which these aren't (they use underscores).

Clean otherwise. The _boundary() helper is well-scoped, optional blocks generate tokens only when actually used (no wasted entropy), and the test for invocation-level freshness correctly uses list inequality on the token arrays.

…ersal fix Reverts the issue-body fetching from PR #88 and restores local filesystem spec loading. The key change from the original code: load_spec() now contains paths within the repo root using Path.resolve().relative_to() to prevent path traversal attacks. Why revert: piping untrusted external content (GitHub issue bodies) directly into a Claude session is a prompt injection surface. The boundary tokens from PR #90 prevent structural injection (delimiter escape) but not semantic injection (content inside the boundary influencing Claude's behavior). The security principle: don't build pipelines from external content to LLM sessions. A human reads the issue, copies relevant content to a local spec file, and references it in the PR. The human is the firewall. This also avoids establishing a pattern that future agents with more capabilities could inherit. The review agent today is toolless and non-interactive, but the pattern of 'fetch external content, pipe to LLM' is dangerous to normalize. What stays from PRs #88/#90: - Boundary tokens (PR #90) for PR descriptions and diffs (these MUST be fed to the agent since they are what is being reviewed) - _resolve_local_repo() improvement (better than old home_repo_name) - Prior comment awareness, issue triage agent, etc. (unrelated) Fixes #91

…ersal fix (#92) * Remove issue-body fetching, restore local spec loading with path traversal fix Reverts the issue-body fetching from PR #88 and restores local filesystem spec loading. The key change from the original code: load_spec() now contains paths within the repo root using Path.resolve().relative_to() to prevent path traversal attacks. Why revert: piping untrusted external content (GitHub issue bodies) directly into a Claude session is a prompt injection surface. The boundary tokens from PR #90 prevent structural injection (delimiter escape) but not semantic injection (content inside the boundary influencing Claude's behavior). The security principle: don't build pipelines from external content to LLM sessions. A human reads the issue, copies relevant content to a local spec file, and references it in the PR. The human is the firewall. This also avoids establishing a pattern that future agents with more capabilities could inherit. The review agent today is toolless and non-interactive, but the pattern of 'fetch external content, pipe to LLM' is dangerous to normalize. What stays from PRs #88/#90: - Boundary tokens (PR #90) for PR descriptions and diffs (these MUST be fed to the agent since they are what is being reviewed) - _resolve_local_repo() improvement (better than old home_repo_name) - Prior comment awareness, issue triage agent, etc. (unrelated) Fixes #91 * Fix None description crash and branch spec containment - Guard resolve_spec_from_body against None description (GitHub sends null for PRs with no body, causing AttributeError on splitlines). - Add Path.resolve().relative_to() containment check to strategy 2 (branch-based spec path) matching strategy 1. A misconfigured spec_dir pointing outside the repo would otherwise leak files.

Move inline re import to module level

a5daa48

dcellison merged commit 52816b9 into main Mar 19, 2026
1 check passed

dcellison deleted the fix/review-prompt-hardening branch March 19, 2026 13:27

dcellison mentioned this pull request Mar 19, 2026

Remove issue-body fetching from review agent #91

Closed

dcellison mentioned this pull request Mar 19, 2026

Remove issue-body fetching, restore local spec loading with path traversal fix #92

Merged

7 tasks

This was referenced Mar 19, 2026

Clean up spec terminology in public docs and comments #93

Merged

Harden triage prompt against injection via randomized boundaries #115

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Harden review prompt against XML tag injection#90

Harden review prompt against XML tag injection#90
dcellison merged 2 commits intomainfrom
fix/review-prompt-hardening

dcellison commented Mar 19, 2026

Uh oh!

dcellison commented Mar 19, 2026

Uh oh!

dcellison commented Mar 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dcellison commented Mar 19, 2026

Summary

Test plan

Uh oh!

dcellison commented Mar 19, 2026

Review by Kai

Review: PR #90 — Harden review prompt against XML tag injection

Uh oh!

dcellison commented Mar 19, 2026

Review by Kai

Review: PR #90 — Harden review prompt against XML tag injection

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants