Catch bugs before your teammates do. An intelligent GitHub bot that provides context-aware code reviews using AI and retrieval-augmented generation.
- Ollama Support: Run completely offline with local LLMs for private repositories
- Enterprise Ready: Zero code leakage to external APIs when using privacy mode
- Configurable Providers: Switch between OpenAI and local models seamlessly
- RAG Pipeline: Understands your entire codebase, not just the diff
- Hybrid Retrieval: Combines vector search with keyword matching for better context
- Call Graph Analysis: Tracks function relationships and dependencies
- Semantic Code Understanding: Uses Tree-sitter for accurate AST parsing
- Secret Detection: Catches API keys, passwords, and tokens before commit
- Vulnerability Analysis: Detects SQL injection, XSS, and other security flaws
- Pattern Recognition: Learns your codebase's security patterns
- User Reactions: Learns from 👍/👎 feedback to improve review quality
- Confidence Scoring: Only surfaces high-confidence suggestions
- Severity Levels: 🔴 Critical, 🟡 Warning, 🔵 Suggestion classifications
- Live Dashboard: Real-time metrics and performance tracking
- Dockerized Deployment: One-command deployment with Docker Compose
- Caching Strategy: Efficient repo caching and embedding reuse
- Rate Limiting: Smart backoff to handle API limits gracefully
- Health Monitoring: Built-in health checks and monitoring
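
Hybrid retrieval needs some way to merge the vector-search ranking with the keyword ranking. One common option is reciprocal rank fusion (RRF). The README does not say which method the bot uses, so the sketch below is an assumption: `fuseRankings`, the constant `k = 60`, and the `db/users.js` path are illustrative and not taken from the project.

```typescript
// Hypothetical sketch: merge ranked result lists with reciprocal rank fusion (RRF).
// Each input list is ordered best-first and contains document identifiers.
function fuseRankings(lists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks start at 1
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Illustrative inputs: one ranking from vector search, one from keyword search.
const vectorHits = ["auth/validators.js", "utils/pricing.js", "utils/shipping.js"];
const keywordHits = ["utils/pricing.js", "db/users.js", "auth/validators.js"];
const fused = fuseRankings([vectorHits, keywordHits]);
```

Documents that rank well in both lists (here `utils/pricing.js` and `auth/validators.js`) end up ahead of documents that appear in only one.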
| Metric | OpenAI GPT-4 | Ollama Llama3.1:8b | Traditional Tools |
|---|---|---|---|
| Precision | 74% | 68% | 45% |
| Recall | 52% | 48% | 30% |
| Helpful Rate | 78% | 71% | 42% |
| Avg Latency | 4.2s | 8.1s | 2.1s |
| Cost per PR | $0.08 | $0.00 | $0.00 |
| Security Issues Caught | 89% | 82% | 35% |
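
The README does not define how precision and recall are measured, so the following assumes the standard definitions. The counts are made up to show the arithmetic and are not benchmark data; `ReviewCounts` and the field names are hypothetical.

```typescript
// Hypothetical sketch of the standard precision/recall definitions.
interface ReviewCounts {
  truePositives: number;  // bot comments that flagged a real issue
  falsePositives: number; // bot comments that flagged a non-issue
  falseNegatives: number; // real issues the bot missed
}

// Share of the bot's comments that were correct.
function precision(c: ReviewCounts): number {
  const flagged = c.truePositives + c.falsePositives;
  return flagged === 0 ? 0 : c.truePositives / flagged;
}

// Share of the real issues that the bot found.
function recall(c: ReviewCounts): number {
  const actual = c.truePositives + c.falseNegatives;
  return actual === 0 ? 0 : c.truePositives / actual;
}

// Illustrative counts only.
const exampleCounts: ReviewCounts = { truePositives: 3, falsePositives: 1, falseNegatives: 3 };
// precision = 3 / (3 + 1) = 0.75, recall = 3 / (3 + 3) = 0.5
```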
```mermaid
flowchart TD
    subgraph GitHub
        PR[Pull Request] --> WH[Webhook]
    end
    subgraph "PR Reviewer Bot"
        WH --> Handler[Event Handler]
        Handler --> Clone[Repo Cloner]
        Clone --> Cache[(Repo Cache<br/>LRU)]
        Clone --> Parser[Tree-sitter Parser]
        Parser --> Indexer[Code Indexer]
        Handler --> Diff[Diff Parser]
        Diff --> Security[Security Scanner]
        Indexer --> VDB[(ChromaDB<br/>Vector Store)]
        Diff --> Retriever[Hybrid Retriever<br/>Vector + BM25]
        Retriever --> VDB
        Security --> LLM{LLM Provider}
        Retriever --> LLM
        LLM --> OpenAI[OpenAI GPT-4]
        LLM --> Ollama[Ollama Local]
        LLM --> Formatter[Review Formatter]
        Formatter --> Review[GitHub Review<br/>Inline Comments]
    end
    subgraph "Analytics & Feedback"
        Review --> Reactions[👍/👎 Reactions]
        Reactions --> FDB[(Feedback DB)]
        FDB --> Dashboard[Live Dashboard]
    end
    style LLM fill:#e1f5fe
    style VDB fill:#f3e5f5
    style Cache fill:#e8f5e8
```
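
The "LLM Provider" decision node in the diagram could be backed by a small provider abstraction. The sketch below is an assumption about how that might look, not the project's actual interface: `LLMProvider`, the stub classes, and `selectProvider` are hypothetical names. It only mirrors two documented behaviours: `LLM_PROVIDER` defaults to OpenAI, and `OPENAI_API_KEY` is required when OpenAI is used.

```typescript
// Hypothetical sketch: none of these names come from the project's source.
interface LLMProvider {
  readonly name: string;
  review(prompt: string): Promise<string>;
}

// Stub providers return canned text so the sketch runs without network access.
class StubOpenAIProvider implements LLMProvider {
  readonly name = "openai";
  review(prompt: string): Promise<string> {
    return Promise.resolve(`[openai stub] ${prompt}`);
  }
}

class StubOllamaProvider implements LLMProvider {
  readonly name = "ollama";
  review(prompt: string): Promise<string> {
    return Promise.resolve(`[ollama stub] ${prompt}`);
  }
}

type Env = Record<string, string | undefined>;

// Picks a provider from environment-style settings.
function selectProvider(env: Env): LLMProvider {
  const choice = (env.LLM_PROVIDER ?? "openai").toLowerCase();
  switch (choice) {
    case "ollama":
      return new StubOllamaProvider();
    case "openai":
      if (!env.OPENAI_API_KEY) {
        throw new Error("OPENAI_API_KEY is required when LLM_PROVIDER=openai");
      }
      return new StubOpenAIProvider();
    default:
      throw new Error(`Unknown LLM_PROVIDER: ${choice}`);
  }
}
```

Passing the environment in as a plain object keeps the selection logic easy to test; a real entry point would presumably pass `process.env`.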
- Node.js ≥ 20.0.0
- Docker & Docker Compose
- GitHub App credentials
```bash
# Clone the repository
git clone https://github.com/awhvish/PR-Reviewer.git
cd PR-Reviewer

# Install dependencies
npm install

# Start services
docker-compose up -d

# Configure environment (see INSTALL.md for details)
cp .env.example .env
# Edit .env with your credentials

# Run the bot
npm run dev
```

For detailed setup instructions, see INSTALL.md.
```bash
# Required - GitHub App
APP_ID=your_github_app_id
PRIVATE_KEY="-----BEGIN RSA PRIVATE KEY-----\n..."
WEBHOOK_SECRET=your_webhook_secret

# Optional - LLM Provider (defaults to OpenAI)
LLM_PROVIDER=openai # or "ollama" for privacy mode
OPENAI_API_KEY=sk-... # Required if using OpenAI

# Optional - Vector Database
CHROMA_HOST=localhost
CHROMA_PORT=8000

# Optional - Performance
MAX_REPO_SIZE_GB=5
CONFIDENCE_THRESHOLD=50
RATE_LIMIT_RPM=60
```

For sensitive repositories, use Ollama for 100% local processing:
```bash
# Install and start Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama serve

# Pull required models
ollama pull llama3.1:8b
ollama pull nomic-embed-text

# Configure privacy mode
echo "LLM_PROVIDER=ollama" >> .env

# Restart the bot
npm restart
```

When you open a PR, the bot automatically reviews it and posts a comment like this:
🤖 **AI Code Review**

🔴 **1 Critical Issue** • 🟡 **2 Warnings** • 🔵 **1 Suggestion**

---

🔴 **CRITICAL** (92% confidence)
**Possible SQL injection vulnerability**

```suggestion
- const result = await db.query(`SELECT * FROM users WHERE id = ${userId}`);
+ const result = await db.query('SELECT * FROM users WHERE id = ?', [userId]);
```

🟡 **WARNING** (78% confidence)
**Missing error handling for async operation**
This function calls `getUserById` but doesn't handle potential failures. Consider adding try-catch.

🔵 **SUGGESTION** (65% confidence)
**Consider using consistent naming**
Based on your codebase patterns, consider renaming `validateInput` to `validateUserInput` to match the convention used in `auth/validators.js`.
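
The way findings are selected for a review like this can be sketched as a filter on `CONFIDENCE_THRESHOLD` followed by a sort on severity. This is an assumption about the behaviour, not the project's code: the `Finding` type, `selectFindings`, the inclusive comparison, and the fourth low-confidence finding are all hypothetical.

```typescript
// Hypothetical sketch: filter findings by confidence, then order by severity.
type Severity = "critical" | "warning" | "suggestion";

interface Finding {
  severity: Severity;
  confidence: number; // percentage, 0-100
  title: string;
}

const SEVERITY_ORDER: Record<Severity, number> = { critical: 0, warning: 1, suggestion: 2 };

function selectFindings(findings: Finding[], threshold: number): Finding[] {
  return findings
    .filter((f) => f.confidence >= threshold) // assumed inclusive threshold
    .sort(
      (a, b) =>
        SEVERITY_ORDER[a.severity] - SEVERITY_ORDER[b.severity] ||
        b.confidence - a.confidence,
    );
}

// The first three findings mirror the sample review; the last one is invented.
const sampleFindings: Finding[] = [
  { severity: "suggestion", confidence: 65, title: "Consider using consistent naming" },
  { severity: "critical", confidence: 92, title: "Possible SQL injection vulnerability" },
  { severity: "warning", confidence: 40, title: "Illustrative low-confidence finding" },
  { severity: "warning", confidence: 78, title: "Missing error handling for async operation" },
];

const shownAtDefault = selectFindings(sampleFindings, 50);
const shownAtStrict = selectFindings(sampleFindings, 70);
```

With a threshold of 50 the three findings from the sample review survive; raising it to 70 also drops the 65% naming suggestion.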
### Security Scanning Results
````markdown
🛡️ **Security Scan Results**

⚠️ Found 1 security issue:

**Line 23**: Potential secret detected
const API_KEY = "sk-1234567890abcdef"; // 🔴 This looks like an OpenAI API key

**Recommendation**: Move to environment variables
```suggestion
const API_KEY = process.env.OPENAI_API_KEY;
```
````
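
Secret detection of this kind is often done with regular expressions run line by line. The sketch below is a minimal illustration under that assumption. The two patterns, `scanForSecrets`, and the type names are hypothetical and are not the contents of `secretScanner.ts`.

```typescript
// Hypothetical sketch of pattern-based secret detection.
interface SecretPattern {
  name: string;
  regex: RegExp;
}

interface SecretMatch {
  line: number; // 1-based line number
  pattern: string;
}

// Illustrative patterns only; a real scanner would need a larger, tuned set.
const SECRET_PATTERNS: SecretPattern[] = [
  { name: "OpenAI-style API key", regex: /\bsk-[A-Za-z0-9]{16,}\b/ },
  { name: "AWS access key ID", regex: /\bAKIA[0-9A-Z]{16}\b/ },
];

function scanForSecrets(source: string): SecretMatch[] {
  const matches: SecretMatch[] = [];
  source.split("\n").forEach((text, index) => {
    for (const p of SECRET_PATTERNS) {
      // The regexes have no "g" flag, so test() keeps no state between calls.
      if (p.regex.test(text)) {
        matches.push({ line: index + 1, pattern: p.name });
      }
    }
  });
  return matches;
}

const scannedSample = [
  'const API_KEY = "sk-1234567890abcdef";',
  "const API_KEY = process.env.OPENAI_API_KEY;",
].join("\n");
const secretMatches = scanForSecrets(scannedSample);
```

Only the hard-coded key on the first line is reported; the environment-variable version on the second line passes.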
### Context-Aware Suggestions
The bot understands your codebase patterns:
```markdown
🧠 **Context-Aware Analysis**

I noticed you're modifying the `calculateTax` function. Based on similar functions in your codebase:

- `calculateShipping` (utils/shipping.js:45) handles edge cases for zero amounts
- `calculateDiscount` (utils/pricing.js:23) includes input validation
- Both functions use the same error handling pattern

Consider applying similar patterns to maintain consistency.
```
Access the live dashboard at http://localhost:3000/dashboard:
```
┌─────────────────────────────────────────────────────────────┐
│ PR Reviewer Dashboard                                       │
├─────────────┬──────────────┬─────────────┬──────────────────┤
│ Reviews     │ Helpful Rate │ Avg Latency │ Cost Today       │
│ 47 today    │ 78%          │ 4.2s        │ $0.83            │
├─────────────┴──────────────┴─────────────┴──────────────────┤
│ [7-day trend chart showing review quality over time]        │
├─────────────────────────────────────────────────────────────┤
│ Recent Reviews                                              │
│ ├─ user/repo#123: 3 issues (1 🔴, 2 🟡) - 2 min ago         │
│ ├─ org/project#456: 1 issue (1 🔵) - 15 min ago             │
│ └─ team/app#789: 0 issues - 1 hour ago                      │
├─────────────────────────────────────────────────────────────┤
│ Top Issue Categories This Week                              │
│ ├─ Security Issues: 23 (15% change)                         │
│ ├─ Logic Errors: 18 (5% change)                             │
│ ├─ Performance: 12 (8% change)                              │
│ └─ Style Issues: 9 (20% change)                             │
└─────────────────────────────────────────────────────────────┘
```
```
pr-reviewer/
├── src/
│   ├── index.ts              # Entry point
│   ├── github/
│   │   ├── webhook.ts        # Event handlers
│   │   └── comments.ts       # Review posting
│   ├── git/
│   │   └── cloner.ts         # Repository cloning with cache
│   ├── parsing/
│   │   ├── treeSitter.ts     # AST parsing
│   │   └── codeChunker.ts    # Code chunking for RAG
│   ├── rag/
│   │   ├── embeddings.ts     # Embedding generation
│   │   ├── vectorStore.ts    # ChromaDB operations
│   │   ├── retriever.ts      # Hybrid search
│   │   └── indexer.ts        # Full indexing pipeline
│   ├── llm/
│   │   ├── provider.ts       # Abstract LLM interface
│   │   ├── openai.ts         # OpenAI provider
│   │   └── ollama.ts         # Ollama local provider
│   ├── security/
│   │   ├── secretScanner.ts  # Secret detection
│   │   └── vulnScanner.ts    # Vulnerability scanning
│   ├── review/
│   │   ├── diffParser.ts     # PR diff parsing
│   │   └── generator.ts      # Review generation
│   ├── feedback/
│   │   ├── collector.ts      # User feedback handling
│   │   └── db.ts             # Feedback storage
│   └── dashboard/
│       ├── server.ts         # Express dashboard
│       └── views/            # Dashboard UI
├── scripts/
│   └── evaluate.ts           # Benchmark evaluation
├── docker-compose.yml        # Full stack deployment
├── Dockerfile                # Production container
└── package.json
```
```bash
# Development
npm run dev          # Start with hot reload
npm run build        # Compile TypeScript
npm run type-check   # Type checking only

# Services
docker-compose up -d   # Start ChromaDB + services
docker-compose down    # Stop all services

# Privacy Mode
ollama serve           # Start local LLM server
LLM_PROVIDER=ollama npm run dev

# Utilities
npm run index -- --repo owner/repo   # Manually index repository
npm run clean-cache                  # Clear repository cache
npm run feedback-report              # Show user feedback stats

# Testing & Evaluation
npm run test       # Run test suite
npm run evaluate   # Run benchmark evaluation
npm run lint       # ESLint checking
```

```typescript
// src/parsing/treeSitter.ts
// Class member (excerpt): maps file extensions to Tree-sitter grammar packages.
private GRAMMAR_MAP: Record<string, string> = {
  ".js": "tree-sitter-javascript",
  ".ts": "tree-sitter-typescript",
  ".py": "tree-sitter-python",
  ".go": "tree-sitter-go",
  ".rs": "tree-sitter-rust", // Add new language
  ".rb": "tree-sitter-ruby", // Add new language
  // ...
};
```

| Language | Tree-sitter | Security Scanning | Embedding Support |
|---|---|---|---|
| JavaScript | ✅ | ✅ (XSS, injection) | ✅ |
| TypeScript | ✅ | ✅ (XSS, injection) | ✅ |
| Python | ✅ | ✅ (SQL injection) | ✅ |
| Go | ✅ | ✅ (secrets) | ✅ |
| Java | ✅ | ✅ (SQL injection) | ✅ |
| C/C++ | β | β | |
| Rust | β |
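
An extension-to-grammar map like `GRAMMAR_MAP` is typically consulted once per file. The sketch below shows one way such a lookup could work; `grammarForFile` is a hypothetical helper, and the map is declared at module level here only to keep the example self-contained.

```typescript
// Hypothetical sketch: resolve a Tree-sitter grammar package from a file path.
const GRAMMAR_BY_EXTENSION: Record<string, string> = {
  ".js": "tree-sitter-javascript",
  ".ts": "tree-sitter-typescript",
  ".py": "tree-sitter-python",
  ".go": "tree-sitter-go",
};

function grammarForFile(filePath: string): string | undefined {
  // Look only at the final path segment so dots in directory names are ignored.
  const base = filePath.slice(filePath.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  if (dot <= 0) return undefined; // no extension, or a dotfile such as ".env"
  const ext = base.slice(dot).toLowerCase();
  return GRAMMAR_BY_EXTENSION[ext];
}
```

Files with no matching entry return `undefined`, which a caller could use to skip AST parsing for unsupported languages.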
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
- New Language Support: Add Tree-sitter grammars for more languages
- Security Patterns: Expand vulnerability detection patterns
- LLM Providers: Add support for new AI providers (Anthropic, Cohere, etc.)
- Evaluation: Expand benchmark test cases
- UI Improvements: Enhance the dashboard interface
| Component | Memory | CPU | Storage |
|---|---|---|---|
| Main App | 512MB | 1 vCPU | 100MB |
| ChromaDB | 2GB | 2 vCPU | 50GB |
| Ollama (optional) | 8GB | 4 vCPU | 10GB |
- Repository Cache: Configured for 5GB max, auto-cleanup with LRU
- Embedding Cache: Content-hash based, saves 90% of API costs
- Rate Limiting: Built-in exponential backoff for API limits
- Horizontal Scaling: Stateless design supports multiple instances
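
Exponential backoff, as mentioned in the rate-limiting note above, doubles the wait after each failed attempt up to a cap. The sketch below illustrates the idea under assumed values: the 500 ms base, the 30 s cap, and the names `backoffDelayMs` and `withRetry` are hypothetical, not the bot's actual settings.

```typescript
// Hypothetical sketch of exponential backoff with a maximum delay.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  // attempt 0 -> baseMs, attempt 1 -> 2 * baseMs, attempt 2 -> 4 * baseMs, ...
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retries an async operation, sleeping between failed attempts.
// The sleep function is injectable so tests can avoid real waiting.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```

Production implementations usually add random jitter to the delay and retry only on rate-limit or transient errors; both are omitted here for brevity.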
Bot not responding to PRs:

```bash
# Check webhook delivery in GitHub App settings
# Verify SMEE_URL is correctly configured
curl -X POST "$SMEE_URL" -d '{"test": true}'
```

ChromaDB connection errors:

```bash
# Restart ChromaDB container
docker-compose restart chromadb
# Check logs
docker-compose logs chromadb
```

High OpenAI costs:

```bash
# Enable embedding cache
echo "ENABLE_EMBEDDING_CACHE=true" >> .env
# Or switch to Ollama for free inference
echo "LLM_PROVIDER=ollama" >> .env
```

Poor review quality:

```bash
# Adjust confidence threshold
echo "CONFIDENCE_THRESHOLD=70" >> .env
# Check feedback metrics in dashboard (macOS; use xdg-open on Linux)
open http://localhost:3000/dashboard
```

```bash
# Enable verbose logging
DEBUG=pr-reviewer* npm run dev
# View detailed request logs
DEBUG=pr-reviewer:api* npm run dev
# Monitor embedding generation
DEBUG=pr-reviewer:embeddings* npm run dev
```

This project is licensed under the MIT License. See the LICENSE file for details.
- Tree-sitter for AST parsing
- ChromaDB for vector storage
- Probot for GitHub App framework
- Ollama for local LLM support
- Installation Guide
- API Documentation
- Contributing Guidelines
- Evaluation Results
- Architecture Deep Dive
Built with ❤️ for the developer community
Star this project if it helped you catch bugs faster! ⭐
