Code Review Agent

An AI-powered code review tool that analyzes Python codebases for documentation gaps, style issues, algorithm improvements, security vulnerabilities, and maintainability concerns.

Bot vs Agent Mode

This tool has two modes:

| | Bot Mode (`code_reviewer.py`) | Agent Mode (`agent_reviewer.py`) |
|---|---|---|
| Workflow | Linear: fetch → review → post | Dynamic: observe → reason → act → iterate |
| Decision Making | None - follows script | Reasons about what to focus on |
| Context | Reviews files in isolation | Fetches related files for context |
| Self-Correction | None | Critiques and filters its own findings |
| Speed | Faster, cheaper | Slower, more thorough |
| Best For | Quick checks, CI/CD | Complex PRs, thorough reviews |

Quick Start

Bot Mode (Fast, Simple)

# Review a PR
python code_reviewer.py --github owner/repo --pr 123

# Review local files
python code_reviewer.py src/

Agent Mode (Thorough, Intelligent)

# Let the agent reason about the PR
python agent_reviewer.py --github owner/repo --pr 123

# Verbose mode to see the agent's reasoning
python agent_reviewer.py --github owner/repo --pr 123 -v

What the Agent Does Differently

The agent follows an observe → reason → act → iterate loop:

1. OBSERVE: Analyze PR context (title, description, file types, size)
   💭 "This PR adds authentication code and touches 3 files..."

2. REASON: Decide review strategy
   💭 "Security should be the focus. Let me also fetch the base User class..."

3. ACT: Fetch context, perform focused review
   🔧 Fetching related files for context
   🔧 Reviewing with focus on: security, error_handling

4. ITERATE: Self-critique findings
   💭 "Finding #3 is too pedantic, removing it..."

5. POST: Final review with summary
   ✅ Posted 4 high-quality findings
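
The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in `agent_reviewer.py`: `AgentState`, `choose_next_action`, `run_tool`, and the fixed `PLAN` order are hypothetical stand-ins for the model's reasoning, and the tool results are faked. Only the tool names are taken from the example session in this README.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Hypothetical working memory for one review run."""
    history: list = field(default_factory=list)   # tools executed so far
    findings: list = field(default_factory=list)  # candidate review findings
    posted: bool = False


# Tool names mirror the example session; the fixed order is an assumption.
PLAN = ["analyze_pr_context", "fetch_changed_files", "review_code",
        "self_critique", "post_review"]


def choose_next_action(state: AgentState) -> str:
    """REASON: pick the first planned tool that has not run yet.

    A real agent would ask the model here instead of walking a list.
    """
    for tool in PLAN:
        if tool not in state.history:
            return tool
    return "post_review"


def run_tool(tool: str, state: AgentState) -> None:
    """ACT: apply a (faked) tool result to the state."""
    if tool == "review_code":
        state.findings = [
            {"severity": "error", "message": "Possible hardcoded secret"},
            {"severity": "info", "message": "Variable name is short"},
        ]
    elif tool == "self_critique":
        # ITERATE: drop low-value findings before posting.
        state.findings = [f for f in state.findings if f["severity"] != "info"]
    elif tool == "post_review":
        state.posted = True
    state.history.append(tool)


def run_agent(max_iterations: int = 10) -> AgentState:
    """Loop until the review is posted or the iteration budget runs out."""
    state = AgentState()
    for _ in range(max_iterations):
        if state.posted:
            break
        run_tool(choose_next_action(state), state)
    return state
```

The iteration cap is the important design choice: it bounds cost even if the reasoning step never decides to post.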

Example Agent Session

$ python agent_reviewer.py --github mcsimien/myrepo --pr 42 -v

============================================================
🤖 Agentic Code Review - Starting
============================================================

📍 Iteration 1
  🔧 Executing: analyze_pr_context
  💭 This PR adds a new payment processing module. Security is critical.

📍 Iteration 2
  🔧 Executing: fetch_changed_files
  🔧 Executing: fetch_related_files
     Input: {"file_paths": ["src/models/user.py"], "reason": "Need to understand User model for auth check"}

📍 Iteration 3
  🔧 Executing: review_code
     Input: {"files": ["src/payment.py"], "focus_areas": ["security", "error_handling"]}

📍 Iteration 4
  🔧 Executing: self_critique
  💭 Removed 2 low-value findings about variable naming

📍 Iteration 5
  🔧 Executing: post_review

✅ Review posted successfully!

============================================================
📊 Agent Summary
============================================================
Iterations: 5
Files reviewed: 2
Related files fetched: 1
Final findings: 4
Review posted: True

Quick Start (Local Files)

# Review a single file
python code_reviewer.py my_script.py

# Review an entire directory
python code_reviewer.py src/

# Use custom rules
python code_reviewer.py . --rules rules.yaml

# Output as JSON (for CI integration)
python code_reviewer.py src/ --output json

GitHub Integration

Post review comments directly to GitHub Pull Requests:

# Auto-fetch code from PR (no local files needed!)
python code_reviewer.py --github owner/repo --pr 123

# With local code (for testing before pushing)
python code_reviewer.py src/ --github myorg/myrepo --pr 45

# Summary only (no inline comments)
python code_reviewer.py --github owner/repo --pr 123 --no-inline

Monitor for New PRs

Automatically watch repositories and review new PRs as they're opened:

# Start the monitor daemon
python pr_monitor.py --repo owner/repo

# Monitor multiple repos
python pr_monitor.py --repo owner/repo1 --repo owner/repo2

# Custom check interval (2 minutes)
python pr_monitor.py --repo owner/repo --interval 120

# Run once and exit (for cron jobs)
python pr_monitor.py --repo owner/repo --once

# Verbose output
python pr_monitor.py --repo owner/repo -v

The monitor:

- Checks for open PRs at regular intervals
- Tracks which PRs have been reviewed (stored in ~/.code_review_agent/reviewed_prs.json)
- Re-reviews PRs when new commits are pushed
- Skips PRs with no Python files
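
One way to get this behavior is to remember the head commit SHA of each reviewed PR. The sketch below assumes the state file maps `"owner/repo#number"` to that SHA. The actual layout of `reviewed_prs.json` is not documented here, and the function names are hypothetical.

```python
import json
from pathlib import Path


def load_state(path: Path) -> dict:
    """Return {"owner/repo#number": head_sha}, or {} if no state file exists."""
    if path.exists():
        return json.loads(path.read_text())
    return {}


def needs_review(state: dict, repo: str, pr_number: int,
                 head_sha: str, changed_files: list) -> bool:
    """Decide whether a PR should be (re-)reviewed."""
    # Skip PRs with no Python files.
    if not any(name.endswith(".py") for name in changed_files):
        return False
    # A new or changed head SHA means unreviewed commits were pushed.
    return state.get(f"{repo}#{pr_number}") != head_sha


def mark_reviewed(state: dict, path: Path, repo: str,
                  pr_number: int, head_sha: str) -> None:
    """Record the reviewed commit and persist the state file."""
    state[f"{repo}#{pr_number}"] = head_sha
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state, indent=2))
```

Keying on the head SHA rather than the PR number alone is what makes re-review on new commits fall out naturally.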

GitHub Setup

You have two options for authentication:

Option A: Personal Access Token (Quick Setup)

Comments will appear as your username.

1. Go to github.com/settings/tokens
2. Click "Generate new token (classic)"
3. Select scope: repo

Set the token:

export GITHUB_TOKEN="ghp_your_token_here"

Option B: GitHub App (Custom Bot Name)

Comments will appear as CodeReviewAgent[bot] (or whatever you name your app).

1. Create the App at github.com/settings/apps/new:
   - Name: CodeReviewAgent (or your preferred name)
   - Homepage URL: any URL (can be your repo)
   - Uncheck "Webhook → Active"
   - Permissions:
     - Pull requests: Read & Write
     - Contents: Read
   - Click "Create GitHub App"
2. Get your App ID from the app's settings page (shown near the top)
3. Generate a Private Key:
   - Scroll to "Private keys" section
   - Click "Generate a private key"
   - Save the downloaded .pem file securely
4. Install the App:
   - Go to your app's settings → "Install App"
   - Install on your repository (or all repositories)
   - Note the Installation ID from the URL: github.com/settings/installations/INSTALLATION_ID

Set environment variables:

export GITHUB_APP_ID="123456"
export GITHUB_APP_PRIVATE_KEY_PATH="/path/to/your-app.private-key.pem"
export GITHUB_APP_INSTALLATION_ID="12345678"

Install the cryptography package:
```
pip install cryptography
```

Run the review:

python code_reviewer.py src/ --github owner/repo --pr 123
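
For background on why Option B needs an App ID, a private key, and the cryptography package: a GitHub App authenticates by signing a short-lived JWT (RS256) with its private key, then exchanging it for an installation token. The sketch below shows that JWT construction. It is an illustration only, since how `github_integration.py` does this is not shown here, and all function names are hypothetical.

```python
import base64
import json


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def build_claims(app_id: str, now: int) -> dict:
    """JWT claims for a GitHub App.

    iat is backdated 60 seconds to tolerate clock drift; exp stays under
    GitHub's 10-minute maximum.
    """
    return {"iat": now - 60, "exp": now + 9 * 60, "iss": app_id}


def signing_input(claims: dict) -> bytes:
    """Return the 'header.payload' bytes that get signed."""
    header = {"alg": "RS256", "typ": "JWT"}
    parts = [
        _b64url(json.dumps(obj, separators=(",", ":")).encode("utf-8"))
        for obj in (header, claims)
    ]
    return ".".join(parts).encode("ascii")


def sign_jwt(claims: dict, private_key_pem: bytes) -> str:
    """Sign the claims with the App's RSA private key (.pem contents)."""
    # Imported lazily so the helpers above work without the package installed.
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import padding

    key = serialization.load_pem_private_key(private_key_pem, password=None)
    message = signing_input(claims)
    signature = key.sign(message, padding.PKCS1v15(), hashes.SHA256())
    return message.decode("ascii") + "." + _b64url(signature)
```

The resulting JWT is sent as a `Bearer` token to `POST /app/installations/{installation_id}/access_tokens`, which returns the installation token used for the actual review calls.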

What Gets Posted

- Summary comment with counts of errors, warnings, and info messages
- Inline comments on specific lines that are part of the PR diff
- Review status: "Request Changes" if errors found, "Comment" otherwise
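
The status rule maps onto the `event` field of GitHub's pull request review API, which accepts `REQUEST_CHANGES`, `COMMENT`, or `APPROVE`. A minimal sketch follows, assuming each finding is a dict with a `severity` key. That shape and the function names are assumptions, not the tool's actual data model.

```python
from collections import Counter


def review_event(findings: list) -> str:
    """Pick the GitHub review event: request changes only when errors exist."""
    has_error = any(f["severity"] == "error" for f in findings)
    return "REQUEST_CHANGES" if has_error else "COMMENT"


def summary_counts(findings: list) -> str:
    """Build the counts line for the summary comment."""
    counts = Counter(f["severity"] for f in findings)
    # Counter returns 0 for severities that never occurred.
    return (f"{counts['error']} Error(s), "
            f"{counts['warning']} Warning(s), "
            f"{counts['info']} Info")
```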

Example Output on GitHub

The review will appear as:

🤖 Code Review Agent Report

❌ 2 Error(s) ⚠️ 3 Warning(s) ℹ️ 1 Info

📄 src/utils.py

❌ security Line 42: Possible hardcoded secret detected

💡 Use environment variables or a secrets manager instead

⚠️ algorithm Line 15: Nested loop detected - potential O(n²) complexity

💡 Consider if this can be optimized with a different data structure

Current Features (MVP)

- Single file and directory scanning
- Configurable rules via YAML
- Mock review mode (works without API)
- Basic heuristic checks:
  - Missing docstrings
  - Line length violations
  - Nested loop detection
  - Hardcoded secrets detection
- Text and JSON output formats
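
To give a feel for the line-based heuristics, here is a minimal sketch of the line length and hardcoded secret checks. The 100-character limit, the regex, and the finding tuple layout are illustrative assumptions. The real thresholds live in `rules.yaml` and may differ.

```python
import re

MAX_LINE_LENGTH = 100  # assumed limit; the real value would come from rules.yaml

# Assumed pattern: a secret-looking name assigned a quoted string literal.
SECRET_PATTERN = re.compile(
    r"""(?i)(password|passwd|secret|api_key|token)\s*=\s*["'][^"']+["']"""
)


def check_lines(source: str) -> list:
    """Return (line_number, category, message) tuples for simple line checks."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        if len(line) > MAX_LINE_LENGTH:
            findings.append((number, "style", "Line too long"))
        if SECRET_PATTERN.search(line):
            findings.append(
                (number, "security", "Possible hardcoded secret detected")
            )
    return findings
```

A regex like this is deliberately loose, so expect false positives (for example test fixtures), which is why the message says "possible".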

Setup

Prerequisites

pip install pyyaml  # For YAML config support

Enable Claude API (Optional)

Get an API key from console.anthropic.com

Set the environment variable:

export ANTHROPIC_API_KEY="your-key-here"

Install the SDK:
```
pip install anthropic
```
In code_reviewer.py, set HAS_ANTHROPIC = True and uncomment the API code

Expansion Roadmap

Phase 1: Enhanced Local Analysis (No API Required)

Goal: Make the tool useful even without API access

- AST-based analysis - Use Python's ast module for accurate parsing
  - Proper function/class detection
  - Accurate line numbers
  - Import analysis
  - Complexity calculation (cyclomatic)
- More heuristic checks:
  - Unused imports
  - Unused variables
  - Duplicate code detection (simple)
  - TODO/FIXME tracking
- Type hint validation - Check for missing type annotations

Files to create:

analyzers/
    __init__.py
    ast_analyzer.py    # AST-based code parsing
    complexity.py      # Cyclomatic complexity
    security.py        # Security pattern matching

Phase 2: Full Claude Integration

Goal: Leverage Claude for intelligent, context-aware review

- API integration - Uncomment and test Claude API calls
- Chunking strategy - Handle large files (split by function/class)
- Context window management - Stay within token limits
- Caching - Don't re-review unchanged files
- Cost tracking - Monitor API usage

Key code changes:

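# Add caching
import hashlib
import json
from pathlib import Path
from typing import Optional

CACHE_DIR = Path(".code_review_cache")

def get_cache_key(code: str, rules: dict) -> str:
    # Hash the code together with the rules so a change to either invalidates the entry
    content = code + json.dumps(rules, sort_keys=True)
    return hashlib.sha256(content.encode()).hexdigest()

def check_cache(cache_key: str) -> Optional[dict]:
    # Returns the cached review as a plain dict, or None on a cache miss.
    # Converting the dict back into a ReviewResult is left to the caller.
    cache_file = CACHE_DIR / f"{cache_key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    return None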
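
The chunking strategy from the list above (split by function/class) could be sketched with the `ast` module as follows. `chunk_source` is a hypothetical helper. It requires Python 3.8+ for `end_lineno` and skips module-level statements, which a real implementation would likely gather into a chunk of their own.

```python
import ast


def chunk_source(source: str) -> list:
    """Split source into (name, text) chunks, one per top-level def or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Start at the first decorator, if any, so the chunk is complete.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            text = "\n".join(lines[start - 1:node.end_lineno])
            chunks.append((node.name, text))
    return chunks
```

Each chunk can then be hashed with `get_cache_key` on its own, so editing one function does not invalidate the cached reviews of the others.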

Phase 3: Git Integration

Goal: Focus reviews on what changed

- Diff-based review - Only review changed lines
- Pre-commit hook - Review before commits
- PR review mode - Compare branches
- Blame-aware context - Show who wrote problematic code

New CLI options:

# Review only staged changes
python code_reviewer.py --staged

# Review changes since last commit
python code_reviewer.py --diff HEAD~1

# Compare branches
python code_reviewer.py --compare main feature-branch

Files to create:

git_integration/
    __init__.py
    diff_parser.py     # Parse git diffs
    hooks.py           # Pre-commit hook setup
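
A possible core for `diff_parser.py` is a function that maps each file in a unified diff to the line numbers added in its new version. The sketch below assumes standard `git diff` output. The function name and return shape are hypothetical.

```python
import re

# Matches "@@ -old_start[,old_count] +new_start[,new_count] @@".
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def changed_lines(diff_text: str) -> dict:
    """Map each file path to the set of line numbers added in the new version."""
    result = {}
    current = None            # path of the file the current hunk belongs to
    old_left = new_left = 0   # lines still expected in the current hunk
    new_no = 0                # current line number in the new file
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            if line.startswith("+"):
                if current is not None:
                    result.setdefault(current, set()).add(new_no)
                new_no += 1
                new_left -= 1
            elif line.startswith("-"):
                old_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:  # context line, present in both versions
                new_no += 1
                old_left -= 1
                new_left -= 1
            continue
        if line.startswith("+++ "):
            path = line[4:].strip()
            if path == "/dev/null":
                current = None  # file was deleted
            else:
                current = path[2:] if path.startswith("b/") else path
            continue
        match = HUNK_HEADER.match(line)
        if match:
            old_left = int(match.group(1) or 1)
            new_no = int(match.group(2))
            new_left = int(match.group(3) or 1)
    return result
```

Counting lines from the hunk header, instead of guessing from prefixes alone, avoids misreading added lines whose content happens to start with `++`.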

Phase 4: CI/CD Integration

Goal: Automate reviews in your pipeline

- GitHub Action - Run on PRs automatically
- Exit codes - Fail builds on errors
- PR comments - Post findings as review comments
- Status checks - Block merges on critical issues
- Baseline mode - Only flag new issues

Create GitHub Action:

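# .github/workflows/code-review.yml
name: Code Review
on: [pull_request]
jobs:
  review:
    runs-on: ubuntu-latest
    # Posting review comments needs write access to pull requests
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install -r requirements.txt
      - run: python code_reviewer.py src/ --output json > review.json
      - name: Post Review Comments
        run: python scripts/post_github_comments.py review.json
        env:
          # Token the posting script authenticates with (see GitHub Setup)
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}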
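
The exit code and baseline items could be sketched as below. Both functions are hypothetical, and they assume the JSON output is a list of findings with `file`, `category`, `message`, and `severity` keys. The actual schema of `--output json` is not documented here.

```python
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2}


def exit_code(findings: list, fail_on: str = "error") -> int:
    """Return 1 if any finding is at or above the fail_on severity, else 0."""
    threshold = SEVERITY_RANK[fail_on]
    worst = max(
        (SEVERITY_RANK.get(f.get("severity"), 0) for f in findings),
        default=-1,  # no findings at all
    )
    return 1 if worst >= threshold else 0


def new_findings(current: list, baseline: list) -> list:
    """Baseline mode: keep only findings that are absent from the baseline.

    Line numbers are left out of the identity on purpose, so unrelated
    edits that shift code do not make old findings look new.
    """
    def identity(finding: dict) -> tuple:
        return (finding["file"], finding["category"], finding["message"])

    seen = {identity(f) for f in baseline}
    return [f for f in current if identity(f) not in seen]
```

A CI step would then call `sys.exit(exit_code(new_findings(current, baseline)))` so that only newly introduced errors fail the build.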

Phase 5: Advanced Analysis

Goal: Deeper, more valuable insights

- Cross-file analysis - Track dependencies, find unused exports
- Historical tracking - Trend analysis over time
- Learning from feedback - Remember suppressed warnings
- Custom rules engine - Define patterns in YAML/regex
- Multi-language support - JavaScript, TypeScript, C++

Example custom rule:

custom:
  enabled: true
  patterns:
    - name: deprecated_api
      pattern: "requests\\.get\\("
      message: "Use httpx instead of requests for async support"
      severity: info
      
    - name: company_standard
      pattern: "print\\("
      message: "Use logging module instead of print statements"
      severity: warning
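
A rules engine for this config can be small. The sketch below applies the two patterns above line by line. The rules are written as a Python list to keep the example dependency-free (in practice they would come from `yaml.safe_load`), and `apply_rules` and the finding layout are hypothetical.

```python
import re

# The same rules as the YAML example, after parsing.
RULES = [
    {
        "name": "deprecated_api",
        "pattern": r"requests\.get\(",
        "message": "Use httpx instead of requests for async support",
        "severity": "info",
    },
    {
        "name": "company_standard",
        "pattern": r"print\(",
        "message": "Use logging module instead of print statements",
        "severity": "warning",
    },
]


def apply_rules(source: str, rules: list) -> list:
    """Return one finding per (line, rule) pair whose regex matches."""
    compiled = [(rule, re.compile(rule["pattern"])) for rule in rules]
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        for rule, regex in compiled:
            if regex.search(line):
                findings.append({
                    "line": number,
                    "rule": rule["name"],
                    "severity": rule["severity"],
                    "message": rule["message"],
                })
    return findings
```

Plain regexes also match inside comments and strings, and `print\(` matches `pprint(` too, so anchored patterns or an AST check are worth considering for rules that matter.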

Phase 6: Team Features

Goal: Make it useful for teams

- Shared rule configs - Pull from central repo
- Metrics dashboard - Track code quality over time
- Team baselines - Different standards per project
- Review assignments - Route issues to owners
- Suppression comments - # noqa: DOC001
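
Suppression comments could be parsed as sketched below. It borrows the convention used by flake8, where a bare `# noqa` silences everything on the line and `# noqa: CODE` silences only the listed codes. Whether this tool adopts the same semantics is an assumption, and only `DOC001` appears in this README, so any other code name is hypothetical.

```python
import re

NOQA = re.compile(
    r"#\s*noqa(?::\s*(?P<codes>[A-Z]+[0-9]+(?:\s*,\s*[A-Z]+[0-9]+)*))?",
    re.IGNORECASE,
)


def suppressed_codes(line: str):
    """Return None without a noqa comment, an empty set for a bare noqa,
    otherwise the set of listed codes."""
    match = NOQA.search(line)
    if match is None:
        return None
    codes = match.group("codes")
    if codes is None:
        return set()
    return {code.strip().upper() for code in codes.split(",")}


def is_suppressed(line: str, code: str) -> bool:
    """True if a finding with this code should be dropped for this line."""
    codes = suppressed_codes(line)
    if codes is None:
        return False
    return not codes or code in codes
```

This regex does not tell a real comment from `# noqa` inside a string literal. Using the `tokenize` module to look only at comment tokens would close that gap.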

Project Structure (Target)

code_review_agent/
├── code_reviewer.py       # Main CLI entry point
├── rules.yaml             # Default rules
├── requirements.txt
├── README.md
│
├── analyzers/             # Analysis modules
│   ├── __init__.py
│   ├── ast_analyzer.py
│   ├── complexity.py
│   ├── documentation.py
│   ├── security.py
│   └── style.py
│
├── integrations/          # External integrations
│   ├── __init__.py
│   ├── claude_client.py   # API wrapper with caching
│   ├── git_integration.py
│   └── github_action.py
│
├── output/                # Output formatters
│   ├── __init__.py
│   ├── text.py
│   ├── json_format.py
│   ├── markdown.py
│   └── github_comments.py
│
└── tests/                 # Test suite
    ├── test_analyzers.py
    ├── test_integration.py
    └── fixtures/
        └── sample_code.py

Next Steps

1. Run the MVP on one of your existing Python projects
2. Identify gaps - What issues does it miss that you care about?
3. Pick one Phase 1 item - I recommend AST analysis first
4. Iterate - Add features based on what you actually need

Example Output

============================================================
Review: example.py
============================================================

Found 3 potential issues.

✗ [ERROR] [security]
  Line 42: Possible hardcoded secret detected
  → Use environment variables or a secrets manager instead

⚠ [WARNING] [algorithm]
  Line 15: Nested loop detected - potential O(n²) complexity
  → Consider if this can be optimized with a different data structure

⚠ [WARNING] [documentation]
  Functions appear to be missing docstrings
  → Add docstrings describing function purpose, parameters, and return values

============================================================
