Harden triage prompt against injection via randomized boundaries by dcellison · Pull Request #115 · dcellison/kai

dcellison · 2026-03-23T12:48:55Z

Summary

Replaces static, predictable XML delimiters in build_triage_prompt() with randomly generated boundary tokens, closing a prompt injection vulnerability where an attacker could include </issue-body> in their GitHub issue to break out of the data context and inject arbitrary instructions.

The vulnerability

build_triage_prompt() wrapped untrusted user content in static XML tags:

<issue-body>
  attacker-controlled content here
</issue-body>

An attacker could craft an issue body containing </issue-body> followed by prompt instructions. Claude would interpret the injected text as system-level instructions rather than user data.

The fix

Same pattern as review.py (PR #90): each data block gets a unique random boundary token via secrets.token_hex(4):

--- BEGIN ISSUE_BODY a7f3b2c9 ---
  attacker-controlled content here
--- END ISSUE_BODY a7f3b2c9 ---

The token is generated fresh for each prompt invocation, and each of the four blocks gets its own unique token. An attacker cannot predict or forge a closing delimiter.

Changes

prompt_utils.py (new): Shared make_boundary() helper extracted from review.py's nested function. Both review.py and triage.py now import from this shared module, eliminating duplication of security-relevant code.
triage.py: Replaced all four static XML tag pairs with make_boundary() calls, updated preamble instruction text, updated docstrings.
review.py: Removed nested _boundary() function, now imports make_boundary from prompt_utils.
test_triage.py: Updated prompt structure assertions from XML tags to boundary patterns, added TestBoundaryHelper class (token uniqueness, format, per-block uniqueness).

What stayed the same

Prompt output format (JSON schema, label guidelines) - unchanged
_parse_triage_json - unchanged (downstream of prompt format)
No change to boundary token entropy (32 bits, sufficient for the threat model)

Fixes #110

Test plan

Boundary delimiters present for all four blocks
No static XML tags remain in prompt output
Preamble references "boundary" not "XML"
Issue content still present in output
make_boundary() produces unique tokens across calls
make_boundary() output matches expected format
Each block in a single prompt gets a different token
grep '<issue-' src/kai/triage.py returns zero matches
All 1012 tests pass (3 new)

Replace static XML delimiters (<issue-body>, <issue-metadata>, etc.) with randomly generated boundary tokens using the same secrets.token_hex(4) pattern from review.py. Each block gets a unique 8-char hex token, making it computationally infeasible for injected content to forge a closing delimiter. Fixes #110

dcellison · 2026-03-23T12:49:52Z

Review by Kai

PR Review: Harden triage prompt against injection via randomized boundaries

The fix is correct and the approach is sound. A few findings:

Suggestion — `import re` inside test method

tests/test_triage.py, test_each_block_unique_in_prompt

def test_each_block_unique_in_prompt(self):
    import re   # ← local import

re is a stdlib module and should be at the top of the file with the other imports. Minor style violation but worth fixing for consistency.

Suggestion — 32-bit token entropy is minimal

triage.py, _boundary(): secrets.token_hex(4) gives 4 bytes = 32 bits of entropy per token. The threat model (attacker submits a static issue body and must guess a token they cannot observe) makes this safe in practice, but token_hex(8) (64 bits, same as review.py-style tokens are often sized) costs nothing and removes any future concern about brute force if the system ever logs prompts or allows retries. Worth a note, though not a blocker.

Suggestion — probabilistic test is acknowledged but undocumented

tests/test_triage.py, test_unique_tokens: The comment says "statistically near-certain" but token_hex(4) has a 1-in-2³² chance of collision between two consecutive calls, making this test technically flaky. At 32 bits this is negligible for CI, but if token entropy is ever reduced (e.g., a misconfigured mock), the test would silently pass. Replacing with a deterministic structure check (e.g., asserting the token substring differs) or patching secrets.token_hex in this test would make the assertion airtight. Low priority.

Clean

No bugs or logic errors.
_boundary() is correctly scoped and independently testable.
All four blocks get distinct tokens per invocation — verified by the new test_each_block_unique_in_prompt test.
_parse_triage_json is unaffected (correct — it's downstream of prompt format).
Docstring accuracy: "8-character hex token" matches token_hex(4) output.
No XML tags remain in the prompt path.

- Move _boundary() to kai.prompt_utils.make_boundary() so both review.py and triage.py use the same function. Eliminates the duplication of security-relevant code between the two agents. - review.py: remove nested _boundary(), remove secrets import, import make_boundary from prompt_utils. - triage.py: remove local _boundary(), remove secrets import, import make_boundary from prompt_utils. - Move 'import re' to module level in test_triage.py.

dcellison · 2026-03-23T12:58:35Z

Review by Kai

PR Review: Harden triage prompt against injection via randomized boundaries (revised)

The refactoring is correct. Prior suggestions (import re at top level) are addressed. A couple of small new findings:

Suggestion — PR description contradicts the implementation

PR description, "What stayed the same" section:

No shared module extraction (_boundary is 2 lines; a shared module adds import complexity for no gain)

The diff creates prompt_utils.py and migrates both review.py and triage.py to import from it — which is the right call. This line is stale and should be removed or replaced with a note explaining the change of approach (the prior review presumably prompted it).

Suggestion — stale function name in test docstring

tests/test_triage.py, TestBoundaryHelper.test_unique_tokens:

def test_unique_tokens(self):
    """Each call to _boundary produces a different token."""

The docstring references _boundary (the old private helper) rather than make_boundary. Should be updated to match.

Clean

make_boundary in prompt_utils.py is identical in behavior to the removed _boundary closures in both modules — no regression.
All four triage blocks correctly receive distinct tokens per invocation.
review.py refactor is mechanical and correct; no behavioral change.
re import is now at module level in test_triage.py as previously requested.
test_each_block_unique_in_prompt regex ([0-9a-f]{8}) correctly matches token_hex(4) output.
Docstring in prompt_utils.py ("8-character hex token") accurately describes token_hex(4) output.

dcellison · 2026-03-23T13:01:46Z

Review by Kai

PR Review: Harden triage prompt against injection via randomized boundaries (round 3)

Both suggestions from the prior round are addressed: test_unique_tokens docstring now correctly names make_boundary, and re was already at module level in the previous round.

Suggestion — `TestBoundaryHelper` lives in the wrong test file

tests/test_triage.py, TestBoundaryHelper

test_unique_tokens and test_format test make_boundary from prompt_utils.py directly, with no triage-specific setup. review.py also calls make_boundary, so if someone later adds boundary tests from the review side they'll either duplicate these or hunt for them in test_triage.py. A tests/test_prompt_utils.py would be the natural home; the test_each_block_unique_in_prompt test (which exercises triage integration specifically) can stay here. Minor—no behavioral impact.

Clean

make_boundary extraction is correct and the callers in both review.py and triage.py are mechanically identical to the removed closures — no behavioral regression.
All four triage blocks receive distinct tokens; regex \w+ in test_each_block_unique_in_prompt correctly matches labels using [A-Z_].
secrets import removed from review.py without leaving a stale reference.
Docstrings in prompt_utils.py, triage.py, and the test class are accurate and consistent.
No XML tags remain in the prompt path (negative assertions in the test confirm this).

Tests for the shared make_boundary helper belong with the module they test, not in triage-specific tests. The triage integration test (test_each_block_unique_in_prompt) stays in test_triage.py.

dcellison · 2026-03-23T13:09:24Z

Review by Kai

PR Review: Harden triage prompt against injection via randomized boundaries (round 4)

The round 3 suggestion about TestBoundaryHelper is addressed: the pure make_boundary unit tests moved to test_prompt_utils.py as TestMakeBoundary, and the triage-integration test (test_each_block_unique_in_prompt) stays in test_triage.py. Correct split.

Two new small findings:

Suggestion — `TestBoundaryHelper` class name is now a misnomer

tests/test_triage.py, TestBoundaryHelper

The class was named for the old _boundary helper, which no longer exists. It now contains one test that exercises the triage prompt integration. The name no longer describes what it's testing. TestBuildTriagePromptBoundaries or folding test_each_block_unique_in_prompt into the existing TestBuildTriagePrompt class would be cleaner. Low priority.

Suggestion — `test_format` doesn't verify begin/end token pairing

tests/test_prompt_utils.py, TestMakeBoundary.test_format

def test_format(self):
    begin, end = make_boundary("ISSUE_BODY")
    assert begin.startswith("--- BEGIN ISSUE_BODY ")
    assert begin.endswith(" ---")
    assert end.startswith("--- END ISSUE_BODY ")
    assert end.endswith(" ---")

This verifies the prefix/suffix but not that the token embedded in begin matches the token in end. If make_boundary were refactored to call token_hex twice, the delimiters would become unpaired (begin and end with different tokens), and this test would still pass. Extracting the token from begin and asserting it appears in end would close that gap:

token = begin.split()[-2]  # "--- BEGIN LABEL token ---"
assert end == f"--- END ISSUE_BODY {token} ---"

Clean

prompt_utils.py implementation is correct; single token variable guarantees begin/end share the same token.
review.py and triage.py callers are mechanically equivalent to the removed closures — no behavioral regression.
PR description "What stayed the same" stale line from round 2 is gone.
Negative assertions in test_contains_boundary_delimiters_and_schema confirm no XML tags remain.
test_each_block_unique_in_prompt regex [0-9a-f]{8} correctly matches token_hex(4) output; 4 unique tokens asserted.

- Rename TestBoundaryHelper to TestBuildTriagePromptBoundaries (class now contains only the triage integration test). - Add token pairing assertion to test_format: extract the token from the begin string and verify it matches the end string.

dcellison · 2026-03-23T13:12:19Z

Review by Kai

PR Review: Harden triage prompt against injection via randomized boundaries (round 5)

Both round 4 suggestions are addressed. The PR is clean.

Clean

TestBoundaryHelper is renamed TestBuildTriagePromptBoundaries and now contains only test_each_block_unique_in_prompt — the name accurately describes what it tests.
test_format_and_token_pairing extracts the token from begin and asserts end == f"--- END ISSUE_BODY {token} ---", closing the pairing gap from round 4.
token = begin.split()[-2] correctly indexes ["---", "BEGIN", "ISSUE_BODY", "<token>", "---"] — no off-by-one.
make_boundary implementation is correct: single token variable shared between begin and end strings.
review.py refactor is mechanical — no behavioral change.
Negative XML assertions in test_contains_boundary_delimiters_and_schema confirm no static tags remain in the prompt path.
secrets import correctly removed from review.py with no stale reference.

Ready to merge.

Fix stale _boundary reference in test docstring

4b4de77

Move make_boundary unit tests to test_prompt_utils.py

b0672fb

Tests for the shared make_boundary helper belong with the module they test, not in triage-specific tests. The triage integration test (test_each_block_unique_in_prompt) stays in test_triage.py.

dcellison merged commit 102f8f4 into main Mar 23, 2026
1 check passed

dcellison deleted the fix/harden-triage-boundaries branch March 23, 2026 13:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Harden triage prompt against injection via randomized boundaries#115

Harden triage prompt against injection via randomized boundaries#115
dcellison merged 5 commits intomainfrom
fix/harden-triage-boundaries

dcellison commented Mar 23, 2026 •

edited

Loading

Uh oh!

dcellison commented Mar 23, 2026

Uh oh!

dcellison commented Mar 23, 2026

Uh oh!

dcellison commented Mar 23, 2026

Uh oh!

dcellison commented Mar 23, 2026

Uh oh!

dcellison commented Mar 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dcellison commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

The vulnerability

The fix

Changes

What stayed the same

Test plan

Uh oh!

dcellison commented Mar 23, 2026

Review by Kai

PR Review: Harden triage prompt against injection via randomized boundaries

Suggestion — import re inside test method

Suggestion — 32-bit token entropy is minimal

Suggestion — probabilistic test is acknowledged but undocumented

Clean

Uh oh!

dcellison commented Mar 23, 2026

Review by Kai

PR Review: Harden triage prompt against injection via randomized boundaries (revised)

Suggestion — PR description contradicts the implementation

Suggestion — stale function name in test docstring

Clean

Uh oh!

dcellison commented Mar 23, 2026

Review by Kai

PR Review: Harden triage prompt against injection via randomized boundaries (round 3)

Suggestion — TestBoundaryHelper lives in the wrong test file

Clean

Uh oh!

dcellison commented Mar 23, 2026

Review by Kai

PR Review: Harden triage prompt against injection via randomized boundaries (round 4)

Suggestion — TestBoundaryHelper class name is now a misnomer

Suggestion — test_format doesn't verify begin/end token pairing

Clean

Uh oh!

dcellison commented Mar 23, 2026

Review by Kai

PR Review: Harden triage prompt against injection via randomized boundaries (round 5)

Clean

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

dcellison commented Mar 23, 2026 •

edited

Loading

Suggestion — `import re` inside test method

Suggestion — `TestBoundaryHelper` lives in the wrong test file

Suggestion — `TestBoundaryHelper` class name is now a misnomer

Suggestion — `test_format` doesn't verify begin/end token pairing