Skip to content

Harden triage prompt against injection via randomized boundaries#115

Merged
dcellison merged 5 commits intomainfrom
fix/harden-triage-boundaries
Mar 23, 2026
Merged

Harden triage prompt against injection via randomized boundaries#115
dcellison merged 5 commits intomainfrom
fix/harden-triage-boundaries

Conversation

@dcellison
Copy link
Copy Markdown
Owner

@dcellison dcellison commented Mar 23, 2026

Summary

Replaces static, predictable XML delimiters in build_triage_prompt() with randomly generated boundary tokens, closing a prompt injection vulnerability where an attacker could include </issue-body> in their GitHub issue to break out of the data context and inject arbitrary instructions.

The vulnerability

build_triage_prompt() wrapped untrusted user content in static XML tags:

<issue-body>
  attacker-controlled content here
</issue-body>

An attacker could craft an issue body containing </issue-body> followed by prompt instructions. Claude would interpret the injected text as system-level instructions rather than user data.

The fix

Same pattern as review.py (PR #90): each data block gets a unique random boundary token via secrets.token_hex(4):

--- BEGIN ISSUE_BODY a7f3b2c9 ---
  attacker-controlled content here
--- END ISSUE_BODY a7f3b2c9 ---

The token is generated fresh for each prompt invocation, and each of the four blocks gets its own unique token. An attacker cannot predict or forge a closing delimiter.

Changes

  • prompt_utils.py (new): Shared make_boundary() helper extracted from review.py's nested function. Both review.py and triage.py now import from this shared module, eliminating duplication of security-relevant code.
  • triage.py: Replaced all four static XML tag pairs with make_boundary() calls, updated preamble instruction text, updated docstrings.
  • review.py: Removed nested _boundary() function, now imports make_boundary from prompt_utils.
  • test_triage.py: Updated prompt structure assertions from XML tags to boundary patterns, added TestBoundaryHelper class (token uniqueness, format, per-block uniqueness).

What stayed the same

  • Prompt output format (JSON schema, label guidelines) - unchanged
  • _parse_triage_json - unchanged (downstream of prompt format)
  • No change to boundary token entropy (32 bits, sufficient for the threat model)

Fixes #110

Test plan

  • Boundary delimiters present for all four blocks
  • No static XML tags remain in prompt output
  • Preamble references "boundary" not "XML"
  • Issue content still present in output
  • make_boundary() produces unique tokens across calls
  • make_boundary() output matches expected format
  • Each block in a single prompt gets a different token
  • grep '<issue-' src/kai/triage.py returns zero matches
  • All 1012 tests pass (3 new)

Replace static XML delimiters (<issue-body>, <issue-metadata>, etc.)
with randomly generated boundary tokens using the same
secrets.token_hex(4) pattern from review.py. Each block gets a
unique 8-char hex token, making it computationally infeasible for
injected content to forge a closing delimiter.

Fixes #110
@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

PR Review: Harden triage prompt against injection via randomized boundaries

The fix is correct and the approach is sound. A few findings:


Suggestion — import re inside test method

tests/test_triage.py, test_each_block_unique_in_prompt

def test_each_block_unique_in_prompt(self):
    import re   # ← local import

re is a stdlib module and should be at the top of the file with the other imports. Minor style violation but worth fixing for consistency.


Suggestion — 32-bit token entropy is minimal

triage.py, _boundary(): secrets.token_hex(4) gives 4 bytes = 32 bits of entropy per token. The threat model (attacker submits a static issue body and must guess a token they cannot observe) makes this safe in practice, but token_hex(8) (64 bits, same as review.py-style tokens are often sized) costs nothing and removes any future concern about brute force if the system ever logs prompts or allows retries. Worth a note, though not a blocker.


Suggestion — probabilistic test is acknowledged but undocumented

tests/test_triage.py, test_unique_tokens: The comment says "statistically near-certain" but token_hex(4) has a 1-in-2³² chance of collision between two consecutive calls, making this test technically flaky. At 32 bits this is negligible for CI, but if token entropy is ever reduced (e.g., a misconfigured mock), the test would silently pass. Replacing with a deterministic structure check (e.g., asserting the token substring differs) or patching secrets.token_hex in this test would make the assertion airtight. Low priority.


Clean

  • No bugs or logic errors.
  • _boundary() is correctly scoped and independently testable.
  • All four blocks get distinct tokens per invocation — verified by the new test_each_block_unique_in_prompt test.
  • _parse_triage_json is unaffected (correct — it's downstream of prompt format).
  • Docstring accuracy: "8-character hex token" matches token_hex(4) output.
  • No XML tags remain in the prompt path.

- Move _boundary() to kai.prompt_utils.make_boundary() so both
  review.py and triage.py use the same function. Eliminates the
  duplication of security-relevant code between the two agents.
- review.py: remove nested _boundary(), remove secrets import,
  import make_boundary from prompt_utils.
- triage.py: remove local _boundary(), remove secrets import,
  import make_boundary from prompt_utils.
- Move 'import re' to module level in test_triage.py.
@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

PR Review: Harden triage prompt against injection via randomized boundaries (revised)

The refactoring is correct. Prior suggestions (import re at top level) are addressed. A couple of small new findings:


Suggestion — PR description contradicts the implementation

PR description, "What stayed the same" section:

No shared module extraction (_boundary is 2 lines; a shared module adds import complexity for no gain)

The diff creates prompt_utils.py and migrates both review.py and triage.py to import from it — which is the right call. This line is stale and should be removed or replaced with a note explaining the change of approach (the prior review presumably prompted it).


Suggestion — stale function name in test docstring

tests/test_triage.py, TestBoundaryHelper.test_unique_tokens:

def test_unique_tokens(self):
    """Each call to _boundary produces a different token."""

The docstring references _boundary (the old private helper) rather than make_boundary. Should be updated to match.


Clean

  • make_boundary in prompt_utils.py is identical in behavior to the removed _boundary closures in both modules — no regression.
  • All four triage blocks correctly receive distinct tokens per invocation.
  • review.py refactor is mechanical and correct; no behavioral change.
  • re import is now at module level in test_triage.py as previously requested.
  • test_each_block_unique_in_prompt regex ([0-9a-f]{8}) correctly matches token_hex(4) output.
  • Docstring in prompt_utils.py ("8-character hex token") accurately describes token_hex(4) output.

@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

PR Review: Harden triage prompt against injection via randomized boundaries (round 3)

Both suggestions from the prior round are addressed: test_unique_tokens docstring now correctly names make_boundary, and re was already at module level in the previous round.


Suggestion — TestBoundaryHelper lives in the wrong test file

tests/test_triage.py, TestBoundaryHelper

test_unique_tokens and test_format test make_boundary from prompt_utils.py directly, with no triage-specific setup. review.py also calls make_boundary, so if someone later adds boundary tests from the review side they'll either duplicate these or hunt for them in test_triage.py. A tests/test_prompt_utils.py would be the natural home; the test_each_block_unique_in_prompt test (which exercises triage integration specifically) can stay here. Minor—no behavioral impact.


Clean

  • make_boundary extraction is correct and the callers in both review.py and triage.py are mechanically identical to the removed closures — no behavioral regression.
  • All four triage blocks receive distinct tokens; regex \w+ in test_each_block_unique_in_prompt correctly matches labels using [A-Z_].
  • secrets import removed from review.py without leaving a stale reference.
  • Docstrings in prompt_utils.py, triage.py, and the test class are accurate and consistent.
  • No XML tags remain in the prompt path (negative assertions in the test confirm this).

Tests for the shared make_boundary helper belong with the module
they test, not in triage-specific tests. The triage integration
test (test_each_block_unique_in_prompt) stays in test_triage.py.
@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

PR Review: Harden triage prompt against injection via randomized boundaries (round 4)

The round 3 suggestion about TestBoundaryHelper is addressed: the pure make_boundary unit tests moved to test_prompt_utils.py as TestMakeBoundary, and the triage-integration test (test_each_block_unique_in_prompt) stays in test_triage.py. Correct split.

Two new small findings:


Suggestion — TestBoundaryHelper class name is now a misnomer

tests/test_triage.py, TestBoundaryHelper

The class was named for the old _boundary helper, which no longer exists. It now contains one test that exercises the triage prompt integration. The name no longer describes what it's testing. TestBuildTriagePromptBoundaries or folding test_each_block_unique_in_prompt into the existing TestBuildTriagePrompt class would be cleaner. Low priority.


Suggestion — test_format doesn't verify begin/end token pairing

tests/test_prompt_utils.py, TestMakeBoundary.test_format

def test_format(self):
    begin, end = make_boundary("ISSUE_BODY")
    assert begin.startswith("--- BEGIN ISSUE_BODY ")
    assert begin.endswith(" ---")
    assert end.startswith("--- END ISSUE_BODY ")
    assert end.endswith(" ---")

This verifies the prefix/suffix but not that the token embedded in begin matches the token in end. If make_boundary were refactored to call token_hex twice, the delimiters would become unpaired (begin and end with different tokens), and this test would still pass. Extracting the token from begin and asserting it appears in end would close that gap:

token = begin.split()[-2]  # "--- BEGIN LABEL token ---"
assert end == f"--- END ISSUE_BODY {token} ---"

Clean

  • prompt_utils.py implementation is correct; single token variable guarantees begin/end share the same token.
  • review.py and triage.py callers are mechanically equivalent to the removed closures — no behavioral regression.
  • PR description "What stayed the same" stale line from round 2 is gone.
  • Negative assertions in test_contains_boundary_delimiters_and_schema confirm no XML tags remain.
  • test_each_block_unique_in_prompt regex [0-9a-f]{8} correctly matches token_hex(4) output; 4 unique tokens asserted.

- Rename TestBoundaryHelper to TestBuildTriagePromptBoundaries
  (class now contains only the triage integration test).
- Add token pairing assertion to test_format: extract the token
  from the begin string and verify it matches the end string.
@dcellison
Copy link
Copy Markdown
Owner Author

Review by Kai

PR Review: Harden triage prompt against injection via randomized boundaries (round 5)

Both round 4 suggestions are addressed. The PR is clean.


Clean

  • TestBoundaryHelper is renamed TestBuildTriagePromptBoundaries and now contains only test_each_block_unique_in_prompt — the name accurately describes what it tests.
  • test_format_and_token_pairing extracts the token from begin and asserts end == f"--- END ISSUE_BODY {token} ---", closing the pairing gap from round 4.
  • token = begin.split()[-2] correctly indexes ["---", "BEGIN", "ISSUE_BODY", "<token>", "---"] — no off-by-one.
  • make_boundary implementation is correct: single token variable shared between begin and end strings.
  • review.py refactor is mechanical — no behavioral change.
  • Negative XML assertions in test_contains_boundary_delimiters_and_schema confirm no static tags remain in the prompt path.
  • secrets import correctly removed from review.py with no stale reference.

Ready to merge.

@dcellison dcellison merged commit 102f8f4 into main Mar 23, 2026
1 check passed
@dcellison dcellison deleted the fix/harden-triage-boundaries branch March 23, 2026 13:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Harden triage prompt against injection via randomized boundaries

2 participants