CodeRAG - Code Q&A with Verifiable Citations

RAG-based Q&A system for code repositories that provides grounded answers with verifiable citations.

🚀 Quick Start

# Install
pip install code-rag-me

# Configure (get a free API key from https://console.groq.com/keys)
coderag setup

# Start web interface
coderag serve

Open http://localhost:8000 to use the web interface.

Claude Desktop Integration (MCP)

# Auto-configure Claude Desktop
coderag mcp-install

# Restart Claude Desktop

Now you can use CodeRAG directly in Claude Desktop!

✨ Features

Grounded Responses: Every answer includes citations to source code [file:start-end]
Cloud or Local LLM: Use Groq (free), OpenAI, Anthropic, or run locally with GPU
GitHub Integration: Index any public GitHub repository
MCP Support: Integrate directly with Claude Desktop
Semantic Chunking: Tree-sitter for Python, text fallback for other languages
Web Interface: Gradio UI for easy interaction
REST API: Programmatic access for integration
CLI: Full command-line interface
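
To illustrate what semantic chunking means for Python files, here is a minimal sketch that splits source into one chunk per top-level function or class and records the line range that a citation would later point to. It uses the standard-library ast module as a stand-in for Tree-sitter, so it is not CodeRAG's actual chunker; the Chunk class and chunk_python function are hypothetical names.

```python
import ast
from dataclasses import dataclass


@dataclass
class Chunk:
    name: str
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    text: str


def chunk_python(source: str) -> list[Chunk]:
    """Return one chunk per top-level function or class definition."""
    lines = source.splitlines()
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno/end_lineno come from the parser (Python 3.8+)
            text = "\n".join(lines[node.lineno - 1 : node.end_lineno])
            chunks.append(Chunk(node.name, node.lineno, node.end_lineno, text))
    return chunks


sample = "import os\n\ndef login(user):\n    return user\n\nclass Session:\n    pass\n"
for chunk in chunk_python(sample):
    print(f"{chunk.name}: lines {chunk.start_line}-{chunk.end_line}")
```

Code outside any function or class (such as the import line in the sample) is skipped here; a real chunker would need a fallback for it, similar to the text fallback mentioned above.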

📋 CLI Commands

coderag setup              # Configure LLM provider and API key
coderag serve              # Start web server
coderag mcp-install        # Configure Claude Desktop for MCP
coderag mcp-run            # Run MCP server (used by Claude Desktop)
coderag index <url>        # Index a GitHub repository
coderag query <repo> "<question>"  # Ask a question about code
coderag repos              # List indexed repositories
coderag doctor             # Diagnose setup issues

🔧 Installation

Linux

Arch Linux / Manjaro

Arch Linux marks its system Python as externally managed (PEP 668), so a plain pip install outside a virtual environment is refused. Use one of these methods:

Option A: pipx (Recommended for CLI tools)

sudo pacman -S python-pipx
pipx install code-rag-me

Option B: Virtual environment

python -m venv ~/.local/share/coderag-venv
source ~/.local/share/coderag-venv/bin/activate
pip install code-rag-me

To make coderag available without activating the environment each time, add this alias to your ~/.bashrc or ~/.zshrc:

alias coderag="~/.local/share/coderag-venv/bin/coderag"

Debian / Ubuntu

# Install Python and pip
sudo apt update
sudo apt install python3 python3-pip python3-venv

# Option A: pipx (Recommended)
sudo apt install pipx
pipx install code-rag-me

# Option B: Virtual environment
python3 -m venv ~/.local/share/coderag-venv
source ~/.local/share/coderag-venv/bin/activate
pip install code-rag-me

Fedora / RHEL / CentOS

# Install Python and pip
sudo dnf install python3 python3-pip

# Option A: pipx (Recommended)
sudo dnf install pipx
pipx install code-rag-me

# Option B: Virtual environment
python3 -m venv ~/.local/share/coderag-venv
source ~/.local/share/coderag-venv/bin/activate
pip install code-rag-me

Other Linux Distributions

# Create virtual environment
python3 -m venv ~/.local/share/coderag-venv
source ~/.local/share/coderag-venv/bin/activate
pip install code-rag-me

macOS

Option A: pipx (Recommended)

# Install pipx via Homebrew
brew install pipx
pipx ensurepath
pipx install code-rag-me

Option B: Virtual environment

python3 -m venv ~/.local/share/coderag-venv
source ~/.local/share/coderag-venv/bin/activate
pip install code-rag-me

Option C: Homebrew Python

brew install python@3.11
python3.11 -m pip install code-rag-me

Windows

Option A: pipx (Recommended)

# Install pipx
pip install pipx
pipx ensurepath

# Install CodeRAG
pipx install code-rag-me

Option B: Virtual environment

# Create virtual environment (Command Prompt; in PowerShell, use "$env:USERPROFILE\coderag-venv")
python -m venv %USERPROFILE%\coderag-venv

# Activate (Command Prompt)
%USERPROFILE%\coderag-venv\Scripts\activate.bat

# Activate (PowerShell)
& "$env:USERPROFILE\coderag-venv\Scripts\Activate.ps1"

# Install
pip install code-rag-me

Option C: Direct install (not recommended)

pip install code-rag-me

Note: On Windows, if PowerShell refuses to run the activation script, you may need to run PowerShell as Administrator or enable script execution with Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser.

From Source

git clone https://github.com/Sebastiangmz/CodeRAG.git
cd CodeRAG
pip install -e .
coderag setup

Docker

git clone https://github.com/Sebastiangmz/CodeRAG.git
cd CodeRAG
docker compose up

Post-Installation

After installing, configure your LLM provider:

coderag setup

This will prompt you to:

Choose an LLM provider (Groq recommended - free tier available)
Enter your API key (get one at https://console.groq.com/keys)
Configure optional settings

📖 Usage Examples

Web Interface

Run coderag serve
Open http://localhost:8000
Go to "Index Repository" → Enter GitHub URL → Click "Index"
Go to "Ask Questions" → Select repo → Ask questions

Command Line

# Index a repository
coderag index https://github.com/owner/repo

# Ask questions
coderag query abc12345 "How does authentication work?"

# List repositories
coderag repos

REST API

# Index repository
curl -X POST http://localhost:8000/api/v1/repos/index \
  -H "Content-Type: application/json" \
  -d '{"url": "https://github.com/owner/repo"}'

# Query
curl -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{"question": "How does X work?", "repo_id": "abc12345"}'
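
The same two calls can be made from Python. The sketch below uses only the standard library and the endpoints and JSON fields shown in the curl commands; the helper names build_request and post_json are hypothetical, and the response is returned as decoded JSON without assuming any particular schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default address used by `coderag serve`


def build_request(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request for a CodeRAG endpoint without sending it."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict, timeout: float = 60.0):
    """Send the request and decode the JSON body (no response schema assumed)."""
    with urllib.request.urlopen(build_request(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# With the server running, the calls mirror the curl examples:
#   post_json("/api/v1/repos/index", {"url": "https://github.com/owner/repo"})
#   post_json("/api/v1/query", {"question": "How does X work?", "repo_id": "abc12345"})
```

Separating build_request from post_json keeps the request construction easy to inspect or unit-test without a running server.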

Claude Desktop (MCP)

After running coderag mcp-install and restarting Claude Desktop:

You: Use coderag to index https://github.com/owner/repo

Claude: I'll index that repository for you...
        ✅ Indexed! 150 files, 1,234 chunks.

You: How does the authentication system work?

Claude: Based on the code, authentication is handled in...
        [src/auth/handler.py:45-78]

⚙️ Configuration

Environment Variables

# LLM Provider (groq, openai, anthropic, openrouter, together, local)
MODEL_LLM_PROVIDER=groq
MODEL_LLM_API_KEY=your-api-key

# Embeddings (runs locally on CPU by default)
MODEL_EMBEDDING_DEVICE=auto  # auto, cuda, or cpu

# Server
SERVER_HOST=0.0.0.0
SERVER_PORT=8000
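
As an illustration of how these variables might be consumed, the following sketch reads them into a typed settings object. It is not CodeRAG's own loader: the Settings class and load_settings function are hypothetical, and the fallback values are simply the example values listed above, assumed here to be defaults.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    llm_provider: str
    llm_api_key: str
    embedding_device: str
    host: str
    port: int


def load_settings(env=None) -> Settings:
    """Read the documented variables from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    return Settings(
        llm_provider=env.get("MODEL_LLM_PROVIDER", "groq"),         # assumed default
        llm_api_key=env.get("MODEL_LLM_API_KEY", ""),
        embedding_device=env.get("MODEL_EMBEDDING_DEVICE", "auto"),  # assumed default
        host=env.get("SERVER_HOST", "0.0.0.0"),                      # assumed default
        port=int(env.get("SERVER_PORT", "8000")),                    # assumed default
    )


settings = load_settings({"MODEL_LLM_PROVIDER": "openai", "SERVER_PORT": "9000"})
print(settings.llm_provider, settings.port)
```

Passing a plain dictionary instead of os.environ makes it easy to try different configurations without touching the shell.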

Config File

Configuration is stored in ~/.config/coderag/config.json after running coderag setup.

🏗️ Architecture

┌─────────────────────────────────────────────────────────────┐
│                         User Interface                       │
│              (Gradio UI / REST API / MCP / CLI)             │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────┴──────────────────────────────────────┐
│                     Ingestion Pipeline                        │
│  GitHub Clone → File Filter → Chunker (Tree-sitter/Text)    │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────┴──────────────────────────────────────┐
│                   Indexing & Storage                          │
│      Embeddings (nomic-embed) → ChromaDB (Cosine)           │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────┴──────────────────────────────────────┐
│                    Retrieval & Generation                     │
│   Query → Top-K Search → LLM (Cloud/Local) → Response       │
└──────────────────────────────────────────────────────────────┘
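
The "Top-K Search" step in the diagram can be pictured as ranking stored chunk vectors by cosine similarity to the query vector. The sketch below is a brute-force, pure-Python stand-in for what the vector store does, not the ChromaDB implementation; the three-dimensional vectors and chunk IDs are made-up toy values.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed, k=3):
    """indexed is a list of (chunk_id, vector); return the k best (chunk_id, score) pairs."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in indexed]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy index: hypothetical chunk IDs with hand-written embeddings
index = [
    ("src/auth.py:45-78", [0.9, 0.1, 0.0]),
    ("src/db.py:10-30", [0.0, 1.0, 0.0]),
    ("src/cli.py:1-20", [0.5, 0.5, 0.0]),
]
for chunk_id, score in top_k([1.0, 0.0, 0.0], index, k=2):
    print(f"{chunk_id}  score={score:.3f}")
```

In the real pipeline the retrieved chunks, together with their file paths and line ranges, would be passed to the LLM as context so that the answer can cite them.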

📁 Project Structure

src/coderag/
├── cli.py          # Unified CLI
├── ingestion/      # Repository loading and chunking
├── indexing/       # Embeddings and vector storage
├── retrieval/      # Semantic search
├── generation/     # LLM inference and citations
├── mcp/            # Model Context Protocol server
├── ui/             # Gradio web interface
├── api/            # REST API endpoints
└── models/         # Data models

🧪 Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/

# Format code
black src/ tests/

# Lint
ruff check src/ tests/

# Type check
mypy src/

📊 Performance

Indexing: ~1000 files in < 5 minutes
Query: Response in < 10 seconds
Embeddings: Runs on CPU (~275MB model)
LLM: Cloud (instant) or Local (requires 8GB+ VRAM)

📝 Citation Format

All responses include citations:

[file_path:start_line-end_line]

Example:

The authentication logic is in the login() function [src/auth.py:45-78].
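
Because the format is regular, citations can be pulled out of an answer and checked mechanically. The following sketch assumes repository-relative paths without colons or square brackets; the names CITATION_RE, Citation, extract_citations, and is_valid are hypothetical and not part of CodeRAG's API.

```python
import re
from typing import NamedTuple

# [file_path:start_line-end_line], path may not contain ':' or brackets
CITATION_RE = re.compile(r"\[([^\[\]:]+):(\d+)-(\d+)\]")


class Citation(NamedTuple):
    path: str
    start_line: int
    end_line: int


def extract_citations(answer: str) -> list[Citation]:
    """Find every citation in an answer, in order of appearance."""
    return [Citation(p, int(s), int(e)) for p, s, e in CITATION_RE.findall(answer)]


def is_valid(citation: Citation, file_line_count: int) -> bool:
    """A citation is plausible if its range is ordered and inside the file."""
    return 1 <= citation.start_line <= citation.end_line <= file_line_count


answer = "The authentication logic is in the login() function [src/auth.py:45-78]."
for cite in extract_citations(answer):
    print(cite.path, cite.start_line, cite.end_line)
```

A check like is_valid only confirms that the cited range exists; confirming that the cited lines actually support the claim still requires reading them.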

🐛 Troubleshooting

Run diagnostics:

coderag doctor

Common Issues

`externally-managed-environment` error (Linux)

error: externally-managed-environment
× This environment is externally managed

This happens on modern Linux distributions (Arch, Fedora 38+, Ubuntu 23.04+) that implement PEP 668. Solution: use pipx or a virtual environment. See the Installation section for your distribution.
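
To check ahead of time whether pip will refuse to install, you can look for the marker file that PEP 668 defines. This is a small sketch using only the standard library; the function names are hypothetical, and the stdlib_dir parameter exists only to make the check easy to test.

```python
import os
import sys
import sysconfig


def in_virtualenv() -> bool:
    """True when the interpreter is running inside a venv."""
    return sys.prefix != sys.base_prefix


def is_externally_managed(stdlib_dir=None) -> bool:
    """PEP 668: a file named EXTERNALLY-MANAGED in the stdlib directory."""
    stdlib_dir = stdlib_dir or sysconfig.get_path("stdlib")
    return os.path.isfile(os.path.join(stdlib_dir, "EXTERNALLY-MANAGED"))


if in_virtualenv():
    print("Virtual environment active: pip installs into the venv.")
elif is_externally_managed():
    print("System Python is externally managed: use pipx or a virtual environment.")
else:
    print("No PEP 668 marker found for this interpreter.")
```

The virtual-environment check comes first because the marker only matters when installing into the system interpreter.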

No API key configured

coderag setup  # Run interactive setup

CUDA / GPU errors

If you don't have a GPU or encounter CUDA errors:

export MODEL_EMBEDDING_DEVICE=cpu
coderag serve

Or add this line to your .env file:

MODEL_EMBEDDING_DEVICE=cpu
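
For context, a setting like auto is typically resolved to a concrete device at startup. The sketch below shows one plausible way to do that, assuming the embedding model runs on PyTorch and that auto means "use CUDA when available, otherwise CPU"; both are assumptions, and resolve_device is a hypothetical helper rather than CodeRAG's code.

```python
def resolve_device(setting: str = "auto") -> str:
    """Map 'auto', 'cuda', or 'cpu' to a concrete device name."""
    setting = setting.strip().lower()
    if setting in ("cpu", "cuda"):
        return setting  # explicit choice wins
    if setting != "auto":
        raise ValueError(f"unsupported device setting: {setting!r}")
    try:
        import torch  # assumption: embeddings run on PyTorch
    except (ImportError, OSError):
        return "cpu"  # PyTorch missing or unloadable: fall back to CPU
    return "cuda" if torch.cuda.is_available() else "cpu"


print(resolve_device("cpu"))
```

Setting the variable to cpu bypasses the detection step entirely, which is why it is a reliable workaround for CUDA errors.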

Claude Desktop not detecting MCP

Run coderag mcp-install
Completely quit Claude Desktop (not just close the window)
Restart Claude Desktop
Check the MCP icon in Claude Desktop settings

Permission denied on Linux/macOS

# If using pipx
pipx ensurepath
source ~/.bashrc  # or ~/.zshrc

# If using venv, make sure it's activated
source ~/.local/share/coderag-venv/bin/activate

PowerShell execution policy (Windows)

Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

📄 License

MIT License - see LICENSE file

🤝 Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests
Submit a pull request

🙏 Acknowledgments

Groq for fast, free LLM inference
nomic-embed-text by Nomic AI
ChromaDB for vector storage
Tree-sitter for code parsing
MCP by Anthropic
