Harden triage prompt against injection via randomized boundaries#115
Harden triage prompt against injection via randomized boundaries#115
Conversation
Replace static XML delimiters (<issue-body>, <issue-metadata>, etc.) with randomly generated boundary tokens using the same secrets.token_hex(4) pattern from review.py. Each block gets a unique 8-char hex token, making it computationally infeasible for injected content to forge a closing delimiter. Fixes #110
Review by KaiPR Review: Harden triage prompt against injection via randomized boundariesThe fix is correct and the approach is sound. A few findings: Suggestion —
|
- Move _boundary() to kai.prompt_utils.make_boundary() so both review.py and triage.py use the same function. Eliminates the duplication of security-relevant code between the two agents. - review.py: remove nested _boundary(), remove secrets import, import make_boundary from prompt_utils. - triage.py: remove local _boundary(), remove secrets import, import make_boundary from prompt_utils. - Move 'import re' to module level in test_triage.py.
Review by KaiPR Review: Harden triage prompt against injection via randomized boundaries (revised)The refactoring is correct. Prior suggestions ( Suggestion — PR description contradicts the implementationPR description, "What stayed the same" section:
The diff creates Suggestion — stale function name in test docstring
def test_unique_tokens(self):
"""Each call to _boundary produces a different token."""The docstring references Clean
|
Review by KaiPR Review: Harden triage prompt against injection via randomized boundaries (round 3)Both suggestions from the prior round are addressed: Suggestion —
|
Tests for the shared make_boundary helper belong with the module they test, not in triage-specific tests. The triage integration test (test_each_block_unique_in_prompt) stays in test_triage.py.
Review by KaiPR Review: Harden triage prompt against injection via randomized boundaries (round 4)The round 3 suggestion about Two new small findings: Suggestion —
|
- Rename TestBoundaryHelper to TestBuildTriagePromptBoundaries (class now contains only the triage integration test). - Add token pairing assertion to test_format: extract the token from the begin string and verify it matches the end string.
Review by KaiPR Review: Harden triage prompt against injection via randomized boundaries (round 5)Both round 4 suggestions are addressed. The PR is clean. Clean
Ready to merge. |
Summary
Replaces static, predictable XML delimiters in
build_triage_prompt()with randomly generated boundary tokens, closing a prompt injection vulnerability where an attacker could include</issue-body>in their GitHub issue to break out of the data context and inject arbitrary instructions.The vulnerability
build_triage_prompt()wrapped untrusted user content in static XML tags:An attacker could craft an issue body containing
</issue-body>followed by prompt instructions. Claude would interpret the injected text as system-level instructions rather than user data.The fix
Same pattern as review.py (PR #90): each data block gets a unique random boundary token via
secrets.token_hex(4):The token is generated fresh for each prompt invocation, and each of the four blocks gets its own unique token. An attacker cannot predict or forge a closing delimiter.
Changes
prompt_utils.py(new): Sharedmake_boundary()helper extracted from review.py's nested function. Both review.py and triage.py now import from this shared module, eliminating duplication of security-relevant code.triage.py: Replaced all four static XML tag pairs withmake_boundary()calls, updated preamble instruction text, updated docstrings.review.py: Removed nested_boundary()function, now importsmake_boundaryfromprompt_utils.test_triage.py: Updated prompt structure assertions from XML tags to boundary patterns, addedTestBoundaryHelperclass (token uniqueness, format, per-block uniqueness).What stayed the same
_parse_triage_json- unchanged (downstream of prompt format)Fixes #110
Test plan
make_boundary()produces unique tokens across callsmake_boundary()output matches expected formatgrep '<issue-' src/kai/triage.pyreturns zero matches