
ci: add LLM PR review workflow with manual trigger#39

Merged
jorben merged 6 commits into master from ci/llm-pr-review-manual-trigger
Feb 13, 2026

@jorben jorben commented Feb 13, 2026

Summary

  • Add GitHub Actions workflow for automated LLM-powered PR code review using Anthropic Messages API
  • Support both automatic trigger on PR events (opened, synchronize, reopened) and manual trigger via workflow_dispatch with PR number input

Changes

  • New workflow file .github/workflows/llm-pr-review.yml
  • workflow_dispatch input: pr_number (required, type: number)
  • PR_NUMBER env and concurrency.group use || fallback to support both trigger modes
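
Inside a run step, the same either-or fallback can be mirrored in shell. The following is a minimal sketch rather than the workflow's actual code; `resolve_pr_number`, `EVENT_PR_NUMBER`, and `INPUT_PR_NUMBER` are hypothetical names chosen for illustration.

```shell
# resolve_pr_number EVENT_VALUE INPUT_VALUE
# Prints the first non-empty value, mirroring an `a || b` fallback.
# Fails with an ::error:: annotation when both values are empty.
resolve_pr_number() {
  local from_event="${1:-}" from_input="${2:-}"
  local pr="${from_event:-$from_input}"
  if [ -z "$pr" ]; then
    echo "::error::No PR number available from either trigger" >&2
    return 1
  fi
  printf '%s\n' "$pr"
}

# Usage sketch (hypothetical variable names, both may be unset):
# PR_NUMBER=$(resolve_pr_number "${EVENT_PR_NUMBER:-}" "${INPUT_PR_NUMBER:-}")
```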

Test plan

  • Verify workflow triggers automatically on new PR to master
  • Verify manual trigger works via Actions tab with a valid PR number
  • Verify concurrency grouping correctly deduplicates runs

🤖 Generated with Claude Code

jorben and others added 4 commits February 13, 2026 23:56
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Some models return thinking blocks before text content, causing
the review text extraction to fail when using content[0].

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
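
One way to avoid depending on `content[0]` is to select the first block whose type is `text`. This is a sketch of that idea, not necessarily the expression used in the commit; the function name and the sample response are made up for illustration.

```shell
# extract_review_text: reads a Messages API response on stdin and prints the
# text of the first content block whose type is "text". Prints an empty line
# when there is no such block or no content array at all.
extract_review_text() {
  jq -r '[.content[]? | select(.type == "text") | .text] | first // ""'
}

# Sample response (made up for illustration) with a thinking block first.
sample_response='{"content":[{"type":"thinking","thinking":"..."},{"type":"text","text":"Looks good."}]}'
review_text=$(printf '%s' "$sample_response" | extract_review_text)
printf '%s\n' "$review_text"
```

Because the filter collects matching blocks into an array before taking `first`, an empty or missing `content` yields an empty string instead of the literal `null`, so a later `[ -z "$review_text" ]` check still works.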

@github-actions github-actions bot left a comment

Summary

This PR adds a GitHub Actions workflow for automated LLM-powered PR code reviews using Anthropic's Messages API. The implementation supports both automatic triggers on PR events and manual triggers via workflow dispatch. Overall the approach is sound with good error handling, but there are a few issues to address.

Critical

  • No critical issues found - The workflow uses appropriate permissions, has error handling, and implements concurrency controls correctly.

Important

  • [llm-pr-review.yml:24] Unnecessary full-history fetch: fetch-depth: 0 fetches the entire git history, but the workflow only uses the GitHub API to fetch the PR diff. This adds time to every run. Consider removing the setting, or using fetch-depth: 1 if any git operations are needed at all.

  • [llm-pr-review.yml:72] jq first() safety: On an empty array, first yields null rather than a text value, and jq -r prints that as the string null unless a // "" fallback is applied. The code checks if [ -z "$review_text" ] afterwards, but it's cleaner to handle this in jq:

    review_text=$(printf "%s" "$llm_response" | jq -r '.content | map(select(.type == "text")) | first?.text // ""')

    Note: The current expression is confusing rather than broken: first returns an object, not a string, so .text should be applied after first, with // "" as the final fallback. It works because of how jq evaluates the expression, but the intent is hard to read.

Suggestion

  • [llm-pr-review.yml:45] Character limit consideration: The 60,000 character limit is reasonable but arbitrary. Consider documenting this limit or making it configurable via a workflow variable.

  • [llm-pr-review.yml:90] max_tokens hardcoded: The max_tokens: 8192 is hardcoded. Consider making this configurable via workflow variables if different models require different values.

  • [llm-pr-review.yml:38] API retry logic: The workflow makes multiple API calls (GitHub API for PR metadata, diff, and Anthropic API) without retry logic. Consider adding retries for transient failures, especially for the LLM API call which could fail due to rate limits.
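
The retry suggestion above could be sketched as a small wrapper with exponential backoff. This is an illustrative helper, not code from the workflow; the name `retry` and its argument order are assumptions.

```shell
# retry MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached, doubling the
# delay after each failure. Returns the exit status of the last attempt.
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1 status=0
  while true; do
    "$@" && return 0
    status=$?
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "::error::Command failed after ${attempt} attempts: $1" >&2
      return "$status"
    fi
    echo "Attempt ${attempt} failed (exit ${status}); retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}

# Usage sketch: wrap an API call so transient failures are retried.
# llm_response=$(retry 3 2 curl -sS --fail ...)
```

curl also has a built-in `--retry` option for transient errors; a wrapper like this is mainly useful when the same policy should apply to several different commands.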

Praise

  • [llm-pr-review.yml:19-20] Concurrency control: Excellent use of cancel-in-progress: true to deduplicate workflow runs and prevent redundant reviews.

  • [llm-pr-review.yml:22-23] Minimal permissions: The workflow uses the principle of least privilege with only contents: read and pull-requests: write.

  • [llm-pr-review.yml:29-32] Shell safety: Good use of set -euo pipefail and proper IFS setting for predictable shell behavior.

  • [llm-pr-review.yml:82-86] Error handling: Comprehensive error handling with specific error messages for each API call failure.

  • [llm-pr-review.yml:46-70] System prompt: Well-structured system prompt that clearly defines review criteria and expected output format.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@github-actions github-actions bot left a comment

Summary

A well-structured workflow for LLM-powered PR review. There are a few security and robustness concerns worth addressing before merging.

Critical

  • [.github/workflows/llm-pr-review.yml:97] Passing the diff via a command-line argument risks exceeding OS argument length limits: jq -n with --arg diff passes the entire (up to 60KB) diff as a CLI argument. On Linux, ARG_MAX is typically 2MB, so 60KB is fine in practice, but a safer approach is to have jq read the diff from a file with --rawfile. More critically, if $request_body exceeds the argument limit when passed to curl -d, the call will fail with an "Argument list too long" error. Consider writing the request body to a temp file and using curl -d @file.
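
The file-based approach could look roughly like the sketch below. It assumes jq 1.6 or later for `--rawfile`; the function name, the prompt wording, and the `$API_BASE/v1/messages` URL composition are illustrative assumptions, not taken from the workflow file.

```shell
# build_request_body MODEL DIFF_FILE OUT_FILE
# Writes a Messages API request body to OUT_FILE. The diff is read from
# DIFF_FILE by jq itself (--rawfile), so it never appears in an argument list.
build_request_body() {
  local model="$1" diff_file="$2" out_file="$3"
  jq -n \
    --arg model "$model" \
    --rawfile diff "$diff_file" \
    '{
      model: $model,
      max_tokens: 8192,
      messages: [{role: "user", content: ("Review this diff:\n" + $diff)}]
    }' > "$out_file"
}

# Usage sketch (not executed here; the URL composition is an assumption):
# body_file=$(mktemp)
# build_request_body "$MODEL_ID" pr.diff "$body_file"
# curl -sS "$API_BASE/v1/messages" \
#   -H "x-api-key: $API_KEY" \
#   -H "anthropic-version: 2023-06-01" \
#   -H "content-type: application/json" \
#   --data-binary "@$body_file"
```

`--data-binary @file` sends the file unchanged, whereas `-d @file` strips newlines and carriage returns; for a JSON body either works, but the former avoids surprises.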

Important

  • [.github/workflows/llm-pr-review.yml:42] Sensitive API key could leak in debug/error output: If the curl call to the LLM API fails and Actions step debug logging is enabled, the request (including headers with $API_KEY) may be logged. Values that come from secrets.* are masked automatically, but only when the literal secret value appears in logs; partial matches or encoded forms may not be caught. Consider explicitly masking any derived form of $API_KEY as well.

  • [.github/workflows/llm-pr-review.yml:68] Trimming the diff by raw byte count with head -c can split a multi-byte UTF-8 character: This produces invalid UTF-8 at the boundary, which could cause jq to reject the input or produce garbled output. Consider truncating on a line boundary instead (e.g., head -c $max_diff_chars | head -n -1 to drop the trailing partial line, or count lines instead of characters).

  • [.github/workflows/llm-pr-review.yml:41] No validation of PR_NUMBER from workflow_dispatch: Although typed as number, a malicious or accidental input could still be non-numeric in edge cases. Add a quick validation step:

    if ! [[ "$PR_NUMBER" =~ ^[0-9]+$ ]]; then
      echo "::error::Invalid PR number"; exit 1
    fi

Suggestion

  • [.github/workflows/llm-pr-review.yml:108] Consider logging LLM API usage metadata (e.g., usage.input_tokens, usage.output_tokens) from the response for cost monitoring. A simple jq '.usage' after the call would suffice.

  • [.github/workflows/llm-pr-review.yml:34] fetch-depth: 1 is fine since the diff comes from the API, but consider adding a comment clarifying that the checkout is only needed for workflow context (or remove it entirely if no local files are used in the step).

  • [.github/workflows/llm-pr-review.yml:120] Consider gating the review post on diff size—if the diff was truncated, prepend a note like > ⚠️ Diff was truncated to 60k characters so reviewers know the LLM didn't see the full change.
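
The usage-logging and truncation-notice suggestions above might be sketched as two small helpers. Both function names and the exact wording of the notice are hypothetical; the `usage.input_tokens` and `usage.output_tokens` fields are the ones named in the suggestion.

```shell
# log_usage: reads a Messages API response on stdin and prints token counts,
# falling back to "n/a" when a field is missing.
log_usage() {
  jq -r '"LLM usage: input_tokens=\(.usage.input_tokens // "n/a") output_tokens=\(.usage.output_tokens // "n/a")"'
}

# with_truncation_note ORIGINAL_LENGTH LIMIT REVIEW_TEXT
# Prints REVIEW_TEXT, preceded by a warning when the diff was longer than LIMIT.
with_truncation_note() {
  local original_length="$1" limit="$2" review_text="$3"
  if [ "$original_length" -gt "$limit" ]; then
    printf '> ⚠️ Diff was truncated to %s characters; the review covers only part of the change.\n\n' "$limit"
  fi
  printf '%s\n' "$review_text"
}
```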

Praise

  • [.github/workflows/llm-pr-review.yml:16-18] Good use of concurrency with cancel-in-progress to avoid redundant reviews on rapid pushes.
  • [.github/workflows/llm-pr-review.yml:45-48] Proper set -euo pipefail and consistent error handling with ::error:: annotations throughout—solid defensive scripting.
  • [.github/workflows/llm-pr-review.yml:20-22] Minimal permissions scoping (contents: read, pull-requests: write) follows least-privilege nicely.

- Truncate on line boundary to avoid splitting multi-byte UTF-8 chars
- Wrap diff in code fence for clearer LLM parsing
- Add hunk header guidance in system prompt for accurate line references

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
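
Line-boundary truncation as described in this commit could be sketched as follows. This is an illustration of the technique, not the committed code; `truncate_on_line_boundary` is a hypothetical helper name.

```shell
# truncate_on_line_boundary MAX_BYTES
# Copies stdin to stdout. Input within MAX_BYTES passes through unchanged;
# longer input is cut at MAX_BYTES and its last, possibly partial, line is
# dropped so that only whole lines remain.
truncate_on_line_boundary() {
  local max_bytes="$1" input total
  input=$(cat)
  total=$(( $(printf '%s' "$input" | wc -c) ))
  if [ "$total" -le "$max_bytes" ]; then
    printf '%s\n' "$input"
  else
    # head -c may cut mid-line (or mid-character); sed '$d' removes that
    # final line entirely.
    printf '%s' "$input" | head -c "$max_bytes" | sed '$d'
  fi
}

# Usage sketch:
# pr_diff_trimmed=$(printf '%s' "$pr_diff" | truncate_on_line_boundary 60000)
```

Checking the length first matters: applying `sed '$d'` unconditionally would also remove the last line of a diff that was never truncated.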

@github-actions github-actions bot left a comment

Summary

The workflow implements an LLM-powered PR review using Anthropic's Messages API with both automatic and manual trigger support. Overall the implementation is sound, but there are some important security and correctness considerations around JSON escaping and error handling.

Critical

  • [52-56] Potential JSON injection with unescaped PR metadata: The pr_title, pr_body, and pr_diff_trimmed variables are inserted into the user message without explicit validation. While jq --arg handles basic escaping, malformed or extremely large PR titles/bodies could potentially cause issues. Consider adding length limits or explicit escaping validation before constructing the request body.

Important

  • [45] Missing API_BASE validation: If ANTHROPIC_BASE_URL variable is not set, the curl command will fail with an unclear error message. Consider adding validation:

    if [ -z "$API_BASE" ]; then
      echo "::error::ANTHROPIC_BASE_URL variable not set"
      exit 1
    fi

  • [45] Missing MODEL_ID validation: Similar issue if ANTHROPIC_MODEL variable is not configured.

  • [90] No idempotency for manual triggers: When using workflow_dispatch, the workflow could be run multiple times for the same PR, creating duplicate reviews. Consider checking for existing reviews before posting or using a separate "request review" approach for manual triggers.

  • [120] Silent failure on review post: The final curl uses >/dev/null which discards the response. A failed review post could go unnoticed. Consider logging the response or at least checking the HTTP status code explicitly.
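
Checking the HTTP status of the final post, as suggested in the last item, might look like the sketch below. The helper names are hypothetical, and any URL or headers passed in are up to the caller; nothing here is taken from the workflow file.

```shell
# is_success_status CODE: succeeds only for 2xx HTTP status codes.
is_success_status() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# post_and_check URL BODY_FILE [CURL_ARGS...]
# POSTs BODY_FILE to URL, keeps the response body in a temp file, and fails
# with an ::error:: annotation (plus the response body) when the status is
# not 2xx.
post_and_check() {
  local url="$1" body_file="$2"
  shift 2
  local response_file http_code
  response_file=$(mktemp)
  http_code=$(curl -sS -o "$response_file" -w '%{http_code}' \
    "$@" --data-binary "@$body_file" "$url") || http_code="000"
  if ! is_success_status "$http_code"; then
    echo "::error::Posting review failed with HTTP ${http_code}" >&2
    cat "$response_file" >&2
    rm -f "$response_file"
    return 1
  fi
  rm -f "$response_file"
}

# Usage sketch (REVIEW_URL is a hypothetical variable):
# post_and_check "$REVIEW_URL" "$review_body_file" \
#   -H "Authorization: Bearer $GITHUB_TOKEN" \
#   -H "Accept: application/vnd.github+json"
```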

Suggestion

  • [35] Diff truncation could break line integrity: head -c $max_diff_chars can cut in the middle of a line, and the trailing sed '$d' then removes the last line whether or not it was complete. Consider accumulating whole lines up to the limit instead:

    pr_diff_trimmed=$(printf "%s" "$pr_diff" | awk -v max="$max_diff_chars" '{ n += length($0) + 1; if (n > max) exit; print }')

  • [21-22] Add timeout-minutes: Consider adding timeout-minutes: 10 to the job to prevent runaway workflows from consuming resources.

  • Consider adding a step to label the PR (e.g., "llm-reviewed") to track which PRs have been reviewed.

Praise

  • [21-23] Concurrency group correctly handles both trigger modes with || fallback, and cancel-in-progress: true properly deduplicates workflow runs.

  • [24-25] Permissions are properly scoped to minimum required: contents: read and pull-requests: write only.

  • [44] Good use of set -euo pipefail for strict error handling.

  • [32] The system prompt is well-crafted with clear formatting instructions for the LLM output.

  • [93-96] Proper error handling with informative error messages when the LLM response is empty or malformed.

@jorben jorben merged commit 500bc41 into master Feb 13, 2026
3 checks passed