v0.2.0
This release makes the bot interactive and controllable from the PR — conversational replies, comment commands, context-aware labels, and configurable triggers — and hardens startup and the manual-review path.
Added
- Conversational replies: a maintainer can `@thrillhousebot` anywhere in a PR thread, including as a reply to one of the bot's review findings, and the bot answers in context, pulling in the original finding, the surrounding diff, and the prior thread replies instead of having to re-run the whole review. An explicit `@`-mention is required; a bare reply on a thread (even the bot's own finding) does not pull it in. Replies are posted back into the same review thread (or as a PR comment for top-level mentions), gated to the same write-access/allowlisted users as a manual `/review`, and can be turned off with `REVIEW_CONVERSATIONAL_REPLIES_ENABLED=false`. Requires subscribing the GitHub App to the new `pull_request_review_comment` event (added to `manifest.json`) (#31, #202)
- Comment commands: drive the bot from a PR with `/help`, `/summary`, `/resolve`, `/pause`, and `/resume` (each also accepts the `@Thrillhousebot <command>` mention form). `/pause` silences the bot on a PR (skipping automatic reviews and conversational replies, and ignoring `/review` and `/summary`) until `/resume`; `/resolve` resolves the bot's open finding threads; `/summary` posts the PR summary if one was not generated yet. Every command except `/help` requires repository write access (#32)
- Context-aware PR labels (opt-in): the model is shown the repository's existing labels and picks the few that best describe the change. Off by default (`REVIEW_LABELS_ENABLED`); when on, it either posts a one-line suggestion comment or applies the labels (`REVIEW_LABELS_APPLY`), with optional creation of new labels (`REVIEW_LABELS_ALLOW_CREATE`) and a per-PR cap (`REVIEW_LABELS_MAX`, default 3). Labelling is best-effort and never blocks a review (#61)
- Configurable review triggers: narrow which pull requests are auto-reviewed by skipping drafts (`WEBHOOK_SKIP_DRAFTS`), gating on labels (`WEBHOOK_REQUIRED_LABELS` / `WEBHOOK_EXCLUDED_LABELS`), and filtering by base-branch glob (`WEBHOOK_BASE_BRANCHES` / `WEBHOOK_IGNORED_BASE_BRANCHES`). Base-branch globs are gitignore-style, so `*` does not cross `/`; use `**` to span slashes (e.g. `dependabot/**`, or `**` alone for every branch). Defaults review every PR, matching prior behavior; a manual `/review` always bypasses the filters (#40)
- Review on ready-for-review: a draft PR marked "Ready for review" is reviewed immediately, pairing with `WEBHOOK_SKIP_DRAFTS` so drafts can be skipped until they are ready (#72)
- Fail-fast configuration validation: required configuration (`GITHUB_APP_ID`, `GITHUB_PRIVATE_KEY`, `GITHUB_WEBHOOK_SECRET`, `AI_API_KEY`) is validated at startup, and the app refuses to boot with a single message naming every missing or malformed value (including a non-numeric App id or a private key that is not valid PEM RSA) instead of failing later on the first webhook (#27)
- Configurable bot identity: the bot's own account login(s) are configurable via `GITHUB_BOT_LOGINS`, so loop protection, `/resolve`, summary deduplication, and follow-up finding tracking all keep recognizing the bot's own activity when the App is deployed under a different slug (#165, #201)
- Reviewer flags single-page collection fetches: the review prompt now has a pagination/truncation dimension, so a diff that lists a paginated collection (a GitHub REST endpoint or a GraphQL connection) and then consumes the result as if complete (searched, counted, iterated, or used to drive an action like `/resolve`) without walking every page is reported as a silent-truncation finding. The bot had been catching one such case while missing analogous REST and GraphQL ones (including in the same PR) because no dimension prompted the pattern; severity scales with what is dropped and confidence stays calibrated for the page-size assumption (#166)
- Reviewer rejects refuted runtime-crash claims: the review prompt now traces an alleged runtime failure (`NullPointerException`, index-out-of-bounds, and the like) from the enclosing method's entry down to the flagged line, and discards the finding when an in-diff guard makes that line unreachable for the claimed input: an earlier return/continue/throw, or a null/range check on a value derived from the flagged one. This removes a recurring class of confident false-positive crash findings the reviewer raised against code that already guards the condition (#112)
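The base-branch glob rule above (`*` stays within one path segment, `**` spans `/`) can be illustrated with a small translator from glob to regex. This is only a sketch of that one distinction, not the bot's actual matcher: the class name `BaseBranchGlob` and its methods are hypothetical, and the full gitignore pattern rules (negation, anchoring, character classes) are deliberately left out.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the project: models only the "*" vs "**" rule.
final class BaseBranchGlob {
    private BaseBranchGlob() {}

    /** Translates a simplified gitignore-style glob into a regex. */
    static Pattern compile(String glob) {
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < glob.length()) {
            char c = glob.charAt(i);
            if (c == '*') {
                // "**" may cross "/", a single "*" may not.
                boolean doubleStar = i + 1 < glob.length() && glob.charAt(i + 1) == '*';
                regex.append(doubleStar ? ".*" : "[^/]*");
                i += doubleStar ? 2 : 1;
            } else if (c == '?') {
                regex.append("[^/]"); // one character, never a slash
                i++;
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
                i++;
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** True when the whole branch name matches the glob. */
    static boolean matches(String glob, String branch) {
        return compile(glob).matcher(branch).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("release/*", "release/1.x"));            // true
        System.out.println(matches("release/*", "release/1.x/hotfix"));     // false
        System.out.println(matches("dependabot/**", "dependabot/npm/foo")); // true
        System.out.println(matches("**", "feature/a/b"));                   // true
    }
}
```

Under this reading, a pattern such as `release/*` would select `release/1.x` but not a nested branch like `release/1.x/hotfix`, which is why `**` is the form to reach for when branch names contain slashes.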
Changed
- Manual-trigger authorization is time-bounded: the write-access check for a manual `/review` (installation-token mint + collaborator-permission call) now runs under a configurable timeout (`MANUAL_TRIGGER_AUTH_TIMEOUT`, default `5s`) on the webhook ack thread and fails closed if GitHub is too slow, so a degraded GitHub can no longer tie up a webhook worker past the delivery SLA (#92)
- CI (actionlint guardrail): workflows and the consolidated Trivy composite action are linted (including inline shell via shellcheck), with the release-gate scan path mirrored so it is validated on PR CI (#93)
- CI (faster pipeline): SpotBugs moved off the test job's critical path into the parallel lint job, the test job collapsed into a single Maven reactor, and the native build + image publish skipped for docs-only pushes to `main` (#170)
- CI (SonarCloud scoping): the Sonar scan runs only on `main` and same-repo pull requests (matching the SonarCloud community plan), and a `.dockerignore` keeps the Docker build context small (#165)
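The "time-bounded, fails closed" behavior of the manual-trigger check can be sketched as follows. This is an illustration under assumptions rather than the project's code: `ManualTriggerAuth`, `isAuthorized`, and the `hasWriteAccess` supplier (standing in for the token mint plus collaborator-permission call) are hypothetical names.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical sketch: bound a permission check by a timeout and deny on any doubt.
final class ManualTriggerAuth {
    private ManualTriggerAuth() {}

    /**
     * Runs the (assumed) remote permission check asynchronously and waits at most
     * {@code timeout}. Timeouts, errors, and interruption all yield {@code false}.
     */
    static boolean isAuthorized(Supplier<Boolean> hasWriteAccess, Duration timeout) {
        CompletableFuture<Boolean> check = CompletableFuture.supplyAsync(hasWriteAccess);
        try {
            return Boolean.TRUE.equals(check.get(timeout.toMillis(), TimeUnit.MILLISECONDS));
        } catch (TimeoutException | ExecutionException e) {
            // Marks the future cancelled; it does not interrupt the running task.
            check.cancel(true);
            return false; // fail closed: too slow or failed means "not authorized"
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            check.cancel(true);
            return false;
        }
    }

    public static void main(String[] args) {
        // A fast, positive check is allowed through.
        System.out.println(isAuthorized(() -> true, Duration.ofSeconds(5)));
    }
}
```

The design point is that the caller never waits longer than the configured bound, and every non-success path maps to a denial, so a slow upstream cannot turn into an accidental grant.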
Fixed
- AI prompts dropped every context variable but the first: each AI service (`PrReviewer`, `ReplyAssistant`, `FindingVerifier`) declared `@UserMessage` on a method parameter, which makes quarkus-langchain4j send only that parameter's raw value as the user message and never render the prompt template. So reviews ran on the diff alone, silently ignoring the repository instructions (`.github/thrillhousebot.md`), project stack, PR title/description, base comparison, related tests, and previous findings; the finding verifier audited candidates without the diff; and conversational replies saw only the maintainer's question with no diff, finding, or thread. Moved `@UserMessage` to the method so every `@V` variable is interpolated, and reduced `PromptTemplateEscaper` to marker-neutralization (its Qute unparsed-section wrapper was never stripped for data-bound values and corrupted any content containing `|}`). Added end-to-end and structural regression tests that pin the rendered prompt (#186)
- Reviewer corrupted the marker-handling code it was reviewing: the prompt-injection defense rewrote the diff-section delimiters (`<<<DIFF_START>>>` / `<<<DIFF_END>>>`) found inside the diff, so whenever the bot reviewed code that legitimately contains those markers (the escaper, the prompt templates, and any PR that edits them) it saw altered source. That produced false "contradictory assertion"/no-op findings and silently degraded review accuracy of exactly those files. Replaced the fixed delimiters with a per-review unguessable random fence around the diff (the "random sequence enclosure"/spotlighting defense) and now pass the diff byte-exact; the small prose context slots keep the lightweight marker-neutralization as defense-in-depth (#187)
- Large PRs were silently truncated to 30 files: `getPullRequestFiles` fetched only GitHub's default first page, so any PR with more than 30 changed files was reviewed (and described / changelog'd / replied to) on a partial diff, with no warning. It now paginates (100 files per page, bounded at 30 pages) so the whole diff is assembled before review (#190)
- False "undefined / missing symbol" findings when the definition is just outside the diff: a finding could confidently flag a variable, env var, import, or config key as undefined/unset when its definition sat in the same file a few unchanged lines outside the diff hunk's context window. GitHub serves only ~3 lines of context, so the definition was never in the reviewed material (a CRITICAL false positive on `release.yml` in PR #88 claimed `NEXT`/`TAG` were undefined when the step's `env:` block defined them). The reviewer now treats an unseen definition as unconfirmed rather than absent, and the verifier rejects an "undefined / missing symbol" finding only when the scope its definition would occupy isn't shown in the material (an unverifiable claim); a genuinely missing symbol that the diff does demonstrate (e.g. the diff removes the definition) still stands (#192)
- Approval gating ignored ruleset-based branch protection: CI-aware approval gating resolved the required status checks only from classic branch protection, so a repository that protects its base branch with a repository/organization ruleset (the modern mechanism) silently fell back to gating approvals on every check instead of the actual required set. Required contexts are now unioned from rulesets and classic protection both (#178)
- Duplicate "no issues, but CI pending" message on a clean first review: when a PR had no findings but a required check was still pending or failing, the bot posted the held-back notice twice: once in the PR summary's CI-status table and again as a separate COMMENT review restating it. The redundant COMMENT review is now skipped when a first review is held back solely by CI; an unresolved prior finding, a follow-up review, or a `REQUEST_CHANGES` verdict still posts it (#175)
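The bounded pagination described for large PRs (100 files per page, at most 30 pages) can be sketched as a simple loop. This is an assumed shape, not the actual `getPullRequestFiles` implementation: `PagedFiles`, `fetchAll`, and the `fetchPage` callback (standing in for one GitHub "list pull request files" request) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical sketch of bounded page walking; the HTTP call is abstracted away.
final class PagedFiles {
    static final int PER_PAGE = 100; // page size named in the changelog entry
    static final int MAX_PAGES = 30; // upper bound named in the changelog entry

    private PagedFiles() {}

    /**
     * Collects every page until a short page signals the end or the page bound is hit.
     * {@code fetchPage.apply(page, perPage)} is assumed to return that page's items
     * (1-based page index), or an empty list past the end.
     */
    static <T> List<T> fetchAll(BiFunction<Integer, Integer, List<T>> fetchPage) {
        List<T> all = new ArrayList<>();
        for (int page = 1; page <= MAX_PAGES; page++) {
            List<T> batch = fetchPage.apply(page, PER_PAGE);
            all.addAll(batch);
            if (batch.size() < PER_PAGE) {
                break; // a short (or empty) page means there is nothing further to fetch
            }
        }
        return all;
    }
}
```

With a fake source of 250 items this loop would make three calls and return all 250; the page bound keeps a misbehaving or very large listing from looping indefinitely.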
Documentation
- Repository review guidance: added a dogfooded `.github/thrillhousebot.md` with GitHub platform facts and review heuristics for this codebase, so the bot stops repeating known false positives and primes recurring misses (#168)
- Documented the new v0.2.0 configuration keys in `README.md` and `.env.example`: review triggers, PR labels, conversational replies, `MANUAL_TRIGGER_AUTH_TIMEOUT`, and `GITHUB_BOT_LOGINS` (#165)