Replace local spec file loading with issue-body fetching #88
Remove the filesystem-based spec resolution (resolve_spec_from_body, resolve_spec_from_branch, load_spec), which had a path traversal vulnerability and exposed internal directory structure in PR bodies.

Replace it with load_spec_from_issue(), which extracts fixes/closes/resolves #N references from the PR description and fetches the linked issue bodies via the GitHub API. Multiple issue references are concatenated.

Also removes the SPEC_DIR env var and spec_dir config field, since spec files are no longer loaded from the local filesystem.

Fixes #87
Review by Kai

PR Review: Replace local spec file loading with issue-body fetching

Overall this is a clean, well-motivated change. The security fix (removing path traversal) is correct and the new approach is simpler. A few issues worth addressing:

Warning:
- Use .body // empty string jq coercion so null issue bodies become empty strings instead of the literal string 'null'
- Restrict regex to horizontal whitespace ([^\S\n]+) so keywords and issue refs on separate lines do not match
- Add 15s timeout on gh api calls to prevent hung invocations from blocking reviews indefinitely
- Add tests for null body, newline regex, and timeout handling
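The horizontal-whitespace restriction in the commit above can be illustrated with a minimal sketch. The pattern and the function name `extract_issue_numbers` are assumptions made for illustration; the regex actually used in the PR is not shown in this conversation.

```python
import re
from typing import List, Optional

# Assumed pattern: a closing keyword, then one or more whitespace characters
# that are NOT newlines ([^\S\n]+), then "#N". A keyword at the end of one
# line followed by "#N" at the start of the next therefore does not match.
ISSUE_REF = re.compile(r"\b(?:fixes|closes|resolves)[^\S\n]+#(\d+)", re.IGNORECASE)


def extract_issue_numbers(description: Optional[str]) -> List[int]:
    """Return referenced issue numbers in order of first appearance."""
    if not description:
        # GitHub sends null for PRs with no body, so treat None as empty.
        return []
    numbers = (int(match) for match in ISSUE_REF.findall(description))
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(numbers))
```

With this pattern, `extract_issue_numbers("Fixes #87")` yields `[87]`, while a keyword and an issue reference split across two lines yield no match.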
Review by Kai

PR Review: Replace local spec file loading with issue-body fetching

All four issues from the prior review have been addressed:

One new issue:

Warning:
Re: On Python 3.11+, |
…ersal fix (#92)

* Remove issue-body fetching, restore local spec loading with path traversal fix

  Reverts the issue-body fetching from PR #88 and restores local filesystem spec loading. The key change from the original code: load_spec() now contains paths within the repo root using Path.resolve().relative_to() to prevent path traversal attacks.

  Why revert: piping untrusted external content (GitHub issue bodies) directly into a Claude session is a prompt injection surface. The boundary tokens from PR #90 prevent structural injection (delimiter escape) but not semantic injection (content inside the boundary influencing Claude's behavior).

  The security principle: don't build pipelines from external content to LLM sessions. A human reads the issue, copies relevant content to a local spec file, and references it in the PR. The human is the firewall.

  This also avoids establishing a pattern that future agents with more capabilities could inherit. The review agent today is toolless and non-interactive, but the pattern of 'fetch external content, pipe to LLM' is dangerous to normalize.

  What stays from PRs #88/#90:
  - Boundary tokens (PR #90) for PR descriptions and diffs (these MUST be fed to the agent since they are what is being reviewed)
  - _resolve_local_repo() improvement (better than old home_repo_name)
  - Prior comment awareness, issue triage agent, etc. (unrelated)

  Fixes #91

* Fix None description crash and branch spec containment

  - Guard resolve_spec_from_body against None description (GitHub sends null for PRs with no body, causing AttributeError on splitlines).
  - Add Path.resolve().relative_to() containment check to strategy 2 (branch-based spec path) matching strategy 1. A misconfigured spec_dir pointing outside the repo would otherwise leak files.
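The Path.resolve().relative_to() containment check described in the commit above might look roughly like the following. This is a sketch under stated assumptions, not the code from the PR: the signature of `load_spec` and the choice to return None on rejection are hypothetical.

```python
from pathlib import Path
from typing import Optional


def load_spec(repo_root: Path, spec_path: str) -> Optional[str]:
    """Read a spec file only if it resolves to a location inside repo_root.

    Hypothetical signature; returning None for rejected paths is an assumption.
    """
    root = repo_root.resolve()
    # resolve() collapses ".." segments and follows symlinks, so the check
    # below applies to the real target rather than to the string as written.
    candidate = (root / spec_path).resolve()
    try:
        # relative_to() raises ValueError when candidate is not under root.
        candidate.relative_to(root)
    except ValueError:
        return None
    if not candidate.is_file():
        return None
    return candidate.read_text(encoding="utf-8")
```

Under this check, a reference such as `../../etc/kai/env` resolves to a path outside the repo root and is rejected before any file is read. An absolute path is handled the same way, because joining an absolute path onto `root` discards `root` entirely.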
Summary
- Remove the filesystem-based spec resolution (`resolve_spec_from_body`, `resolve_spec_from_branch`, `load_spec`), which had a path traversal vulnerability (`spec: ../../etc/kai/env` could read arbitrary files) and exposed internal directory structure in PR bodies
- Replace it with `load_spec_from_issue()`, which extracts `fixes #N` / `closes #N` / `resolves #N` references from the PR description and fetches the linked issue bodies via the GitHub API
- Remove the `SPEC_DIR` env var, `spec_dir` config field, and all associated passthrough in webhook.py and install.py

Fixes #87
Test plan
- `spec:` marker is inert (not matched)