MCSimien/Code_Review_Agent

Code Review Agent

An AI-powered code review tool that analyzes Python codebases for documentation gaps, style issues, algorithm improvements, security vulnerabilities, and maintainability concerns.

Bot vs Agent Mode

This tool has two modes:

|                 | Bot Mode (code_reviewer.py)   | Agent Mode (agent_reviewer.py)            |
|-----------------|-------------------------------|-------------------------------------------|
| Workflow        | Linear: fetch → review → post | Dynamic: observe → reason → act → iterate |
| Decision Making | None - follows script         | Reasons about what to focus on            |
| Context         | Reviews files in isolation    | Fetches related files for context         |
| Self-Correction | None                          | Critiques and filters its own findings    |
| Speed           | Faster, cheaper               | Slower, more thorough                     |
| Best For        | Quick checks, CI/CD           | Complex PRs, thorough reviews             |

Quick Start

Bot Mode (Fast, Simple)

# Review a PR
python code_reviewer.py --github owner/repo --pr 123

# Review local files
python code_reviewer.py src/

Agent Mode (Thorough, Intelligent)

# Let the agent reason about the PR
python agent_reviewer.py --github owner/repo --pr 123

# Verbose mode to see the agent's reasoning
python agent_reviewer.py --github owner/repo --pr 123 -v

What the Agent Does Differently

The agent follows an observe → reason → act → iterate loop:

1. OBSERVE: Analyze PR context (title, description, file types, size)
   💭 "This PR adds authentication code and touches 3 files..."

2. REASON: Decide review strategy
   💭 "Security should be the focus. Let me also fetch the base User class..."

3. ACT: Fetch context, perform focused review
   🔧 Fetching related files for context
   🔧 Reviewing with focus on: security, error_handling

4. ITERATE: Self-critique findings
   💭 "Finding #3 is too pedantic, removing it..."

5. POST: Final review with summary
   ✅ Posted 4 high-quality findings
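
The loop above can be sketched as a small state machine. This is only an illustration, not the code in agent_reviewer.py: `AgentState`, `run_agent`, and the injected `review` callable are hypothetical names, and a keyword heuristic stands in for the model's reasoning step.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentState:
    """Working memory carried between iterations (hypothetical structure)."""
    context: dict = field(default_factory=dict)
    focus_areas: list = field(default_factory=list)
    findings: list = field(default_factory=list)
    posted: bool = False
    iterations: int = 0


def run_agent(pr: dict, review: Callable[[dict, list], list],
              max_iterations: int = 10) -> AgentState:
    """Walk observe -> reason -> act -> iterate -> post, one step per iteration."""
    state = AgentState()
    step = "observe"
    while step != "done" and state.iterations < max_iterations:
        state.iterations += 1
        if step == "observe":
            # Collect cheap signals about the PR before reviewing anything.
            state.context = {"title": pr["title"], "files": pr["files"]}
            step = "reason"
        elif step == "reason":
            # Assumption: a keyword check stands in for the model's decision.
            title = state.context["title"].lower()
            sensitive = any(word in title for word in ("auth", "payment"))
            state.focus_areas = (["security", "error_handling"]
                                 if sensitive else ["style"])
            step = "act"
        elif step == "act":
            state.findings = review(pr, state.focus_areas)
            step = "iterate"
        elif step == "iterate":
            # Self-critique: drop findings judged too minor to post.
            state.findings = [f for f in state.findings
                              if f["severity"] != "info"]
            step = "post"
        elif step == "post":
            state.posted = True
            step = "done"
    return state
```

The `max_iterations` cap is there so that a loop driven by a real model, which may keep requesting more context, still terminates.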

Example Agent Session

$ python agent_reviewer.py --github mcsimien/myrepo --pr 42 -v

============================================================
🤖 Agentic Code Review - Starting
============================================================

📍 Iteration 1
  🔧 Executing: analyze_pr_context
  💭 This PR adds a new payment processing module. Security is critical.

📍 Iteration 2
  🔧 Executing: fetch_changed_files
  🔧 Executing: fetch_related_files
     Input: {"file_paths": ["src/models/user.py"], "reason": "Need to understand User model for auth check"}

📍 Iteration 3
  🔧 Executing: review_code
     Input: {"files": ["src/payment.py"], "focus_areas": ["security", "error_handling"]}

📍 Iteration 4
  🔧 Executing: self_critique
  💭 Removed 2 low-value findings about variable naming

📍 Iteration 5
  🔧 Executing: post_review

✅ Review posted successfully!

============================================================
📊 Agent Summary
============================================================
Iterations: 5
Files reviewed: 2
Related files fetched: 1
Final findings: 4
Review posted: True

Basic Usage

# Review a single file
python code_reviewer.py my_script.py

# Review an entire directory
python code_reviewer.py src/

# Use custom rules
python code_reviewer.py . --rules rules.yaml

# Output as JSON (for CI integration)
python code_reviewer.py src/ --output json

GitHub Integration

Post review comments directly to GitHub Pull Requests:

# Auto-fetch code from PR (no local files needed!)
python code_reviewer.py --github owner/repo --pr 123

# With local code (for testing before pushing)
python code_reviewer.py src/ --github myorg/myrepo --pr 45

# Summary only (no inline comments)
python code_reviewer.py --github owner/repo --pr 123 --no-inline

Monitor for New PRs

Automatically watch repositories and review new PRs as they're opened:

# Start the monitor daemon
python pr_monitor.py --repo owner/repo

# Monitor multiple repos
python pr_monitor.py --repo owner/repo1 --repo owner/repo2

# Custom check interval (2 minutes)
python pr_monitor.py --repo owner/repo --interval 120

# Run once and exit (for cron jobs)
python pr_monitor.py --repo owner/repo --once

# Verbose output
python pr_monitor.py --repo owner/repo -v

The monitor:

  • Checks for open PRs at regular intervals
  • Tracks which PRs have been reviewed (stored in ~/.code_review_agent/reviewed_prs.json)
  • Re-reviews PRs when new commits are pushed
  • Skips PRs with no Python files
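
One way to get the "re-review when new commits are pushed" behavior is to remember the head commit SHA of each reviewed PR. The sketch below assumes a flat JSON mapping of `"owner/repo#number"` to SHA; the actual layout of reviewed_prs.json is not documented here, and the function names are hypothetical.

```python
import json
from pathlib import Path


def load_state(path: Path) -> dict:
    """Load the reviewed-PR map, or start empty if the file does not exist."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def needs_review(state: dict, repo: str, pr_number: int, head_sha: str) -> bool:
    """True for an unseen PR, or when its head commit changed since last review."""
    return state.get(f"{repo}#{pr_number}") != head_sha


def mark_reviewed(state: dict, repo: str, pr_number: int,
                  head_sha: str, path: Path) -> None:
    """Record the reviewed commit and persist the map to disk."""
    state[f"{repo}#{pr_number}"] = head_sha
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
```

Keying on the head SHA rather than the PR number alone means a force-push or a new commit automatically makes the PR eligible again.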

GitHub Setup

You have two options for authentication:

Option A: Personal Access Token (Quick Setup)

Comments will appear under your own username.

  1. Go to github.com/settings/tokens
  2. Click "Generate new token (classic)"
  3. Select scope: repo
  4. Set the token:
    export GITHUB_TOKEN="ghp_your_token_here"

Option B: GitHub App (Custom Bot Name)

Comments will appear as CodeReviewAgent[bot] (or whatever you name your app).

  1. Create the App at github.com/settings/apps/new:

    • Name: CodeReviewAgent (or your preferred name)
    • Homepage URL: any URL (can be your repo)
    • Uncheck "Webhook → Active"
    • Permissions:
      • Pull requests: Read & Write
      • Contents: Read
    • Click "Create GitHub App"
  2. Get your App ID from the app's settings page (shown near the top)

  3. Generate a Private Key:

    • Scroll to "Private keys" section
    • Click "Generate a private key"
    • Save the downloaded .pem file securely
  4. Install the App:

    • Go to your app's settings → "Install App"
    • Install on your repository (or all repositories)
    • Note the Installation ID from the URL: github.com/settings/installations/INSTALLATION_ID
  5. Set environment variables:

    export GITHUB_APP_ID="123456"
    export GITHUB_APP_PRIVATE_KEY_PATH="/path/to/your-app.private-key.pem"
    export GITHUB_APP_INSTALLATION_ID="12345678"
  6. Install the cryptography package:

    pip install cryptography
  7. Run the review:

    python code_reviewer.py src/ --github owner/repo --pr 123

What Gets Posted

  • Summary comment with counts of errors, warnings, and info messages
  • Inline comments on specific lines that are part of the PR diff
  • Review status: "Request Changes" if errors found, "Comment" otherwise
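
The status rule above maps onto the `event` values accepted by GitHub's pull request review API (`REQUEST_CHANGES` and `COMMENT`). A minimal sketch, assuming each finding is a dict with a `severity` key; the function name and summary wording are illustrative:

```python
from collections import Counter


def summarize(findings: list) -> tuple:
    """Pick the review event and build a one-line summary from severities."""
    counts = Counter(f["severity"] for f in findings)
    # Any error blocks the PR; warnings and info only leave a comment.
    event = "REQUEST_CHANGES" if counts["error"] > 0 else "COMMENT"
    body = (f"{counts['error']} Error(s), "
            f"{counts['warning']} Warning(s), "
            f"{counts['info']} Info")
    return event, body
```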

Example Output on GitHub

The review will appear as:

🤖 Code Review Agent Report

❌ 2 Error(s) ⚠️ 3 Warning(s) ℹ️ 1 Info

📄 src/utils.py

  • ❌ security Line 42: Possible hardcoded secret detected
    • 💡 Use environment variables or a secrets manager instead
  • ⚠️ algorithm Line 15: Nested loop detected - potential O(n²) complexity
    • 💡 Consider if this can be optimized with a different data structure

Current Features (MVP)

  • Single file and directory scanning
  • Configurable rules via YAML
  • Mock review mode (works without API)
  • Basic heuristic checks:
    • Missing docstrings
    • Line length violations
    • Nested loop detection
    • Hardcoded secrets detection
  • Text and JSON output formats

Setup

Prerequisites

pip install pyyaml  # For YAML config support

Enable Claude API (Optional)

  1. Get an API key from console.anthropic.com
  2. Set the environment variable:
    export ANTHROPIC_API_KEY="your-key-here"
  3. Install the SDK:
    pip install anthropic
  4. In code_reviewer.py, set HAS_ANTHROPIC = True and uncomment the API code

Expansion Roadmap

Phase 1: Enhanced Local Analysis (No API Required)

Goal: Make the tool useful even without API access

  • AST-based analysis - Use Python's ast module for accurate parsing

    • Proper function/class detection
    • Accurate line numbers
    • Import analysis
    • Complexity calculation (cyclomatic)
  • More heuristic checks:

    • Unused imports
    • Unused variables
    • Duplicate code detection (simple)
    • TODO/FIXME tracking
  • Type hint validation - Check for missing type annotations
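
As a starting point for the AST-based checks, the sketch below uses only the standard `ast` module to find definitions without docstrings and loops nested inside other loops. The function names are hypothetical and are not taken from the existing code.

```python
import ast


def find_missing_docstrings(source: str) -> list:
    """Return (name, line) for each function or class without a docstring."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if ast.get_docstring(node) is None:
                missing.append((node.name, node.lineno))
    return sorted(missing, key=lambda item: item[1])


def find_nested_loops(source: str) -> list:
    """Return line numbers of for/while loops located inside another loop."""
    lines = set()
    for outer in ast.walk(ast.parse(source)):
        if isinstance(outer, (ast.For, ast.While)):
            for inner in ast.walk(outer):
                if inner is not outer and isinstance(inner, (ast.For, ast.While)):
                    lines.add(inner.lineno)
    return sorted(lines)
```

Because the parser does the work, strings and comments that merely contain the word `for` or `def` are never mistaken for code, which is the main weakness of line-based heuristics.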

Files to create:

analyzers/
    __init__.py
    ast_analyzer.py    # AST-based code parsing
    complexity.py      # Cyclomatic complexity
    security.py        # Security pattern matching

Phase 2: Full Claude Integration

Goal: Leverage Claude for intelligent, context-aware review

  • API integration - Uncomment and test Claude API calls
  • Chunking strategy - Handle large files (split by function/class)
  • Context window management - Stay within token limits
  • Caching - Don't re-review unchanged files
  • Cost tracking - Monitor API usage

Key code changes:

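# Add caching
import hashlib
import json
from pathlib import Path
from typing import Optional

CACHE_DIR = Path(".code_review_cache")

def get_cache_key(code: str, rules: dict) -> str:
    # Hash code and rules together so a change to either one invalidates the entry
    content = code + json.dumps(rules, sort_keys=True)
    return hashlib.sha256(content.encode()).hexdigest()

def check_cache(cache_key: str) -> Optional[dict]:
    # Returns the cached review as a plain dict (convert to ReviewResult as needed),
    # or None on a cache miss
    cache_file = CACHE_DIR / f"{cache_key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    return None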
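
For the chunking strategy listed above (split by function/class), one possible approach is to let the `ast` module locate top-level definitions and slice the source accordingly. This is a sketch under that assumption; `chunk_by_definition` is a hypothetical name, and module-level statements outside any definition are simply skipped here.

```python
import ast


def chunk_by_definition(source: str) -> list:
    """Split a module into (name, source_text) chunks, one per top-level def/class."""
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment returns the exact text spanned by the node
            # (decorators are not included in that span).
            chunks.append((node.name, ast.get_source_segment(source, node)))
    return chunks
```

Each chunk can then be hashed and cached independently, so editing one function does not force a re-review of the whole file.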

Phase 3: Git Integration

Goal: Focus reviews on what changed

  • Diff-based review - Only review changed lines
  • Pre-commit hook - Review before commits
  • PR review mode - Compare branches
  • Blame-aware context - Show who wrote problematic code

New CLI options:

# Review only staged changes
python code_reviewer.py --staged

# Review changes since last commit
python code_reviewer.py --diff HEAD~1

# Compare branches
python code_reviewer.py --compare main feature-branch

Files to create:

git_integration/
    __init__.py
    diff_parser.py     # Parse git diffs
    hooks.py           # Pre-commit hook setup
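
A diff parser for this phase mostly needs to answer one question: which lines of the new file were added or changed? The sketch below handles a single-file unified diff (for example the output of `git diff -- path`); `changed_lines` is a hypothetical name, and multi-file diffs would need to be split per file first.

```python
import re

# Matches hunk headers such as "@@ -10,4 +12,6 @@" and captures the new-file start line.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def changed_lines(diff_text: str) -> set:
    """Return new-file line numbers of added lines in a single-file unified diff."""
    changed = set()
    new_line = 0
    in_hunk = False
    for line in diff_text.splitlines():
        match = HUNK_HEADER.match(line)
        if match:
            new_line = int(match.group(1))
            in_hunk = True
        elif not in_hunk:
            continue  # file headers (---/+++) appear before the first hunk
        elif line.startswith("+"):
            changed.add(new_line)
            new_line += 1
        elif line.startswith("-") or line.startswith("\\"):
            continue  # removals and "\ No newline" markers do not advance the new file
        else:
            new_line += 1  # context line
    return changed
```

Findings can then be filtered to lines in this set, which is also the condition for posting inline comments on a PR diff.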

Phase 4: CI/CD Integration

Goal: Automate reviews in your pipeline

  • GitHub Action - Run on PRs automatically
  • Exit codes - Fail builds on errors
  • PR comments - Post findings as review comments
  • Status checks - Block merges on critical issues
  • Baseline mode - Only flag new issues

Create GitHub Action:

# .github/workflows/code-review.yml
name: Code Review
on: [pull_request]
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install -r requirements.txt
      - run: python code_reviewer.py src/ --output json > review.json
      - name: Post Review Comments
        run: python scripts/post_github_comments.py review.json
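
To make the workflow fail the build on errors, a step can turn the findings into a process exit code. The sketch below assumes the JSON output is a list of findings that each carry a `severity` of `info`, `warning`, or `error`; that schema and the `fail_on` parameter are assumptions, not documented behavior.

```python
def exit_code_for(findings: list, fail_on: str = "error") -> int:
    """Return 1 if any finding is at or above the fail_on severity, else 0."""
    order = {"info": 0, "warning": 1, "error": 2}
    threshold = order[fail_on]
    # Unknown or missing severities are treated as "info".
    worst = max((order.get(f.get("severity", "info"), 0) for f in findings),
                default=-1)
    return 1 if worst >= threshold else 0
```

A CI step could load review.json, pass the parsed list to this function, and hand the result to `sys.exit`.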

Phase 5: Advanced Analysis

Goal: Deeper, more valuable insights

  • Cross-file analysis - Track dependencies, find unused exports
  • Historical tracking - Trend analysis over time
  • Learning from feedback - Remember suppressed warnings
  • Custom rules engine - Define patterns in YAML/regex
  • Multi-language support - JavaScript, TypeScript, C++

Example custom rule:

custom:
  enabled: true
  patterns:
    - name: deprecated_api
      pattern: "requests\\.get\\("
      message: "Use httpx instead of requests for async support"
      severity: info
      
    - name: company_standard
      pattern: "print\\("
      message: "Use logging module instead of print statements"
      severity: warning
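
A rules engine for patterns like these can be quite small. In the sketch below the rules are written as Python dicts mirroring the YAML above so the example runs without PyYAML; `apply_rules` and the shape of the returned findings are hypothetical.

```python
import re

# Same rules as the YAML example, expressed as plain dicts.
RULES = [
    {"name": "deprecated_api", "pattern": r"requests\.get\(",
     "message": "Use httpx instead of requests for async support",
     "severity": "info"},
    {"name": "company_standard", "pattern": r"print\(",
     "message": "Use logging module instead of print statements",
     "severity": "warning"},
]


def apply_rules(source: str, rules: list) -> list:
    """Run each regex rule against every line and collect the matches."""
    compiled = [(rule, re.compile(rule["pattern"])) for rule in rules]
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule, regex in compiled:
            if regex.search(line):
                findings.append({"rule": rule["name"], "line": lineno,
                                 "severity": rule["severity"],
                                 "message": rule["message"]})
    return findings
```

Line-based regexes also match inside comments and strings, and `print\(` matches `pprint(` as well, so patterns may need word boundaries (for example `\bprint\(`) to keep noise down.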

Phase 6: Team Features

Goal: Make it useful for teams

  • Shared rule configs - Pull from central repo
  • Metrics dashboard - Track code quality over time
  • Team baselines - Different standards per project
  • Review assignments - Route issues to owners
  • Suppression comments - # noqa: DOC001
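
Suppression comments could be honored with a small filter applied before findings are reported. This sketch assumes rule codes look like `DOC001` and that a bare `# noqa` silences every rule on that line (the convention flake8 uses); both are assumptions about a feature that is not implemented yet.

```python
import re

# Matches "# noqa" optionally followed by ": CODE1, CODE2".
NOQA = re.compile(
    r"#\s*noqa(?::\s*(?P<codes>[A-Z]+[0-9]+(?:\s*,\s*[A-Z]+[0-9]+)*))?",
    re.IGNORECASE,
)


def is_suppressed(line: str, code: str) -> bool:
    """True if the source line carries a noqa comment covering this rule code."""
    match = NOQA.search(line)
    if match is None:
        return False
    codes = match.group("codes")
    if codes is None:
        return True  # bare "# noqa" suppresses everything on the line
    return code.upper() in {c.strip().upper() for c in codes.split(",")}
```

The regex is applied to the raw line, so a `# noqa` that appears inside a string literal would also count; tokenizing the file first would avoid that.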

Project Structure (Target)

code_review_agent/
├── code_reviewer.py       # Main CLI entry point
├── rules.yaml             # Default rules
├── requirements.txt
├── README.md
│
├── analyzers/             # Analysis modules
│   ├── __init__.py
│   ├── ast_analyzer.py
│   ├── complexity.py
│   ├── documentation.py
│   ├── security.py
│   └── style.py
│
├── integrations/          # External integrations
│   ├── __init__.py
│   ├── claude_client.py   # API wrapper with caching
│   ├── git_integration.py
│   └── github_action.py
│
├── output/                # Output formatters
│   ├── __init__.py
│   ├── text.py
│   ├── json_format.py
│   ├── markdown.py
│   └── github_comments.py
│
└── tests/                 # Test suite
    ├── test_analyzers.py
    ├── test_integration.py
    └── fixtures/
        └── sample_code.py

Next Steps

  1. Run the MVP on one of your existing Python projects
  2. Identify gaps - What issues does it miss that you care about?
  3. Pick one Phase 1 item - I recommend AST analysis first
  4. Iterate - Add features based on what you actually need

Example Output

============================================================
Review: example.py
============================================================

Found 3 potential issues.

✗ [ERROR] [security]
  Line 42: Possible hardcoded secret detected
  → Use environment variables or a secrets manager instead

⚠ [WARNING] [algorithm]
  Line 15: Nested loop detected - potential O(n²) complexity
  → Consider if this can be optimized with a different data structure

⚠ [WARNING] [documentation]
  Functions appear to be missing docstrings
  → Add docstrings describing function purpose, parameters, and return values

============================================================
