Turn any codebase into a searchable knowledge base for AI coding assistants.
Most RAG frameworks require heavy infrastructure — vector databases, LangChain, cloud APIs. mcp-rag takes a different approach: a single Python pipeline that chunks your code, embeds it locally via Ollama, stores everything in SQLite, and serves hybrid semantic + keyword search over the Model Context Protocol (MCP).
One config file. No cloud dependencies. Works with any MCP client.
```
Your Code ──→ Chunkers ──→ Ollama Embeddings ──→ SQLite + FTS5 ──→ MCP Server ──→ Claude Code
              (AST-aware)  (local, private)      (vector + text)   (search + lookup tools)
```
|  | mcp-rag | LangChain / LlamaIndex | Cloud RAG (Pinecone, etc.) |
|---|---|---|---|
| Dependencies | 3 Python packages + Ollama | Dozens of packages, complex chains | Cloud account, API keys, billing |
| Infrastructure | SQLite (zero config) | Requires external vector DB | Managed service |
| Privacy | 100% local, nothing leaves your machine | Depends on provider choices | Data sent to cloud |
| MCP native | Built as an MCP server from the ground up | Requires adapter/wrapper | Requires adapter/wrapper |
| Config model | One JSON file drives everything | Code changes per project | Dashboard + code changes |
```bash
# Clone and install
git clone https://github.com/JMRussas/mcp-rag.git
cd mcp-rag
pip install -r requirements.txt

# Pull the embedding model (one time)
ollama pull nomic-embed-text

# Configure — point at your codebase
cp config.example.json config.json
# Edit config.json: set repo paths, source tags, chunker types

# Build the index
python pipeline.py rebuild

# Register with Claude Code
claude mcp add my-project-rag -s user -- python /path/to/mcp-rag/server.py
```

Or use Docker (set `ollama.host` to `http://ollama:11434` in your config.json):
```bash
cp config.example.json config.json
# Edit config.json: set repo paths and ollama.host to http://ollama:11434
docker compose up --build
```
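The server works with any MCP client, not just Claude Code. For a JSON-configured client (a Claude Desktop-style `mcpServers` entry, assuming the default stdio transport), registration might look like this:

```json
{
  "mcpServers": {
    "my-project-rag": {
      "command": "python",
      "args": ["/path/to/mcp-rag/server.py"]
    }
  }
}
```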
```
Source Files     Pipeline                    MCP Server
============     ======================      ==============================
.py files  \                                 ┌─ search_<project> (semantic)
.cs files   ├─→  Chunkers ──→ JSONL ──→      │     Ollama embed query
.md files  /     (pluggable)                 │     ──→ cosine similarity over numpy matrix
any code                                     │     ──→ source/module filtering
                                             │
                                             └─ lookup_<project> (keyword)
                                                   exact type_name match
                                                   ──→ case-insensitive LIKE
                                                   ──→ FTS5 full-text search
                                             │
                                             ▼
                                Claude Code / any MCP client
```
Pipeline: Parses source files with language-aware chunkers, generates embeddings via Ollama, and stores vectors + metadata in SQLite with FTS5 full-text indexing.
Server: Loads all embeddings into a pre-normalized numpy matrix at startup. Semantic search computes cosine similarity via dot product. Keyword lookup uses a three-tier fallback (exact match, partial match, FTS5). Tool names, descriptions, and server identity are entirely config-driven.
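As a rough illustration of the semantic path, here is a minimal sketch (not the actual server code; the table and column names are assumptions, and the query vector is taken as input rather than fetched from Ollama):

```python
# Sketch: pre-normalizing the embedding matrix turns cosine similarity
# into a single dot product per query.
import sqlite3
import numpy as np

DIM = 768  # nomic-embed-text dimensionality (from config)

con = sqlite3.connect("data/rag.db")
rows = con.execute("SELECT id, embedding FROM chunks").fetchall()  # hypothetical schema
ids = [row[0] for row in rows]

# Embeddings are float32 blobs; one concatenation + reshape yields the matrix.
matrix = np.frombuffer(b"".join(row[1] for row in rows), dtype=np.float32)
matrix = matrix.reshape(-1, DIM).copy()
matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)  # normalize once at startup

def search(query_vec: np.ndarray, top_k: int = 8) -> list[tuple[str, float]]:
    q = query_vec.astype(np.float32)
    q /= np.linalg.norm(q)
    scores = matrix @ q  # cosine similarity reduces to a dot product
    top = np.argsort(scores)[::-1][:top_k]
    return [(ids[i], float(scores[i])) for i in top]
```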
- Config-driven tool identity. MCP tool names and descriptions come from config.json, not code. The same server binary serves any project without modification.
- Hybrid search. Semantic similarity for "find code that does X" plus a three-tier keyword fallback for "show me class Y". Both are exposed as separate MCP tools (a sketch of the fallback follows this list).
- Pluggable chunker registry. Language-specific parsers (Python AST, C# brace-depth tracking) self-register at import time. Adding a new language is one file.
- Atomic rebuilds. The pipeline writes to a temp DB, then swaps it in, so the MCP server stays running during re-indexing.
- Zero-copy embeddings. Stored as binary blobs in SQLite, loaded once into a numpy array. No JSON serialization overhead at query time.
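The three-tier keyword fallback can be sketched like this (illustrative schema; the real table and column names may differ):

```python
# Sketch of the exact → LIKE → FTS5 fallback; "chunks"/"chunks_fts" are assumed names.
import sqlite3

def lookup(con: sqlite3.Connection, name: str, limit: int = 8) -> list[tuple]:
    tiers = [
        # 1. exact type_name match
        ("SELECT * FROM chunks WHERE type_name = ? LIMIT ?", (name, limit)),
        # 2. partial match (SQLite LIKE is case-insensitive for ASCII by default)
        ("SELECT * FROM chunks WHERE type_name LIKE ? LIMIT ?", (f"%{name}%", limit)),
        # 3. FTS5 full-text search (quoted so the input is treated as a literal phrase)
        ("SELECT c.* FROM chunks c JOIN chunks_fts f ON c.rowid = f.rowid "
         "WHERE chunks_fts MATCH ? LIMIT ?", (f'"{name}"', limit)),
    ]
    for sql, params in tiers:
        rows = con.execute(sql, params).fetchall()
        if rows:
            return rows
    return []
```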
| Type | Language | Strategy |
|---|---|---|
| `python` | Python (.py) | AST-based: one chunk per top-level class/function, preserves imports as context |
| `csharp` | C# (.cs) | Brace-depth tracking: one chunk per type definition, extracts namespace + doc headers |
| `digest` | Module definitions (.digest) | Two-pass depth-aware module path tracking for nested hierarchies |
| `markdown` | Markdown (.md) | Split on headings (h1-h3), strips YAML frontmatter (sketched below) |
| `code` | Any text file | One chunk per file, binary detection, category from directory structure |
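For example, the markdown strategy amounts to roughly the following (a simplified sketch, not the actual chunkers/markdown.py):

```python
# Simplified heading-based markdown chunking: strip YAML frontmatter,
# then split immediately before every h1-h3 heading.
import re

def chunk_markdown(text: str) -> list[str]:
    text = re.sub(r"\A---\n.*?\n---\n", "", text, flags=re.DOTALL)  # YAML frontmatter
    sections = re.split(r"(?m)^(?=#{1,3} )", text)                  # split before #, ##, ###
    return [s.strip() for s in sections if s.strip()]
```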
Adding a new language is a single file that registers itself with the chunker registry:

```python
# chunkers/rust_chunker.py
from pathlib import Path

from chunkers import register_chunker
from chunkers.base import BaseChunker

class RustChunker(BaseChunker):
    def chunk_directory(self, source_dir: Path, repo_config: dict) -> list[dict]:
        source_tag = repo_config.get("source_tag", "rust")
        chunks = []
        for rs_file in sorted(source_dir.rglob("*.rs")):
            chunks.append({
                "id": f"rust:{source_tag}:{rs_file.stem}",
                "text": rs_file.read_text(),
                "source": source_tag,
                "module_path": "",
                "type_name": "",
                "category": "",
                "heading": "",
                "file_path": str(rs_file.relative_to(source_dir)),
            })
        return chunks

register_chunker("rust", RustChunker)
```

Add the import to chunkers/__init__.py and use `"type": "rust"` in your config.
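The wiring step is a one-line import (a sketch; the real chunkers/__init__.py also defines the registry itself, so this import must come after `register_chunker` is defined):

```python
# chunkers/__init__.py (sketch): the import alone triggers the module's
# register_chunker("rust", RustChunker) call at import time.
from . import rust_chunker  # noqa: F401
```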
All tool names, descriptions, and server identity are config-driven:
```jsonc
{
  "mcp": {
    "server_name": "my-project-rag",
    "search_tool": {
      "name": "search_myproject",           // tool name (unique across MCP servers)
      "description": "Search my project."   // shown to the AI as tool description
    },
    "lookup_tool": {
      "name": "lookup_myproject",
      "description": "Look up a type by name."
    }
  },
  "ollama": {
    "host": "http://localhost:11434",
    "embed_model": "nomic-embed-text",      // 768-dimensional embeddings
    "embed_timeout": 30.0
  },
  "database": { "path": "data/rag.db" },
  "search": {
    "default_top_k": 8,
    "max_top_k": 20,
    "embed_dimensions": 768
  },
  "sources": {
    "repos_dir": "data/repos",
    "chunks_path": "data/chunks.jsonl"
  },
  "repos": [
    {
      "name": "src",
      "path": "/path/to/source",            // local path
      "type": "python",                     // chunker type
      "source_tag": "src"                   // for filtering results
    },
    {
      "name": "wiki",
      "url": "https://github.com/org/wiki.git",  // or a git URL
      "local_dir": "wiki",
      "type": "markdown",
      "source_tag": "wiki"
    }
  ]
}
```
See examples/ for complete Python, C#, and docs-only configurations.
```bash
python pipeline.py clone     # Clone/update git-based repos
python pipeline.py chunk     # Run chunkers → data/chunks.jsonl
python pipeline.py embed     # Generate embeddings → data/rag.db
python pipeline.py rebuild   # All three steps in sequence
python pipeline.py stats     # Print database statistics
```

```
mcp-rag/
├── server.py             MCP server — semantic search + keyword lookup
├── pipeline.py           CLI pipeline — chunk, embed, rebuild, stats
├── config.example.json   Configuration template
├── chunkers/
│   ├── __init__.py       Chunker registry (register/lookup by name)
│   ├── base.py           Abstract base class + shared utilities
│   ├── python_chunker.py Python AST chunker
│   ├── csharp.py         C# brace-depth chunker
│   ├── digest.py         Nested module definition parser
│   ├── markdown.py       Heading-based markdown splitter
│   └── code.py           Generic whole-file chunker
├── tests/                Pytest suite (61 tests)
├── examples/             Ready-to-use configs (Python, C#, docs)
├── Dockerfile            Container build
├── docker-compose.yml    One-command setup with Ollama
└── .github/workflows/    CI (lint + test)
```
Built as part of a local AI development infrastructure, extracted and open-sourced as a standalone tool. Development uses a structured review process — each commit addresses specific findings from code review passes (SQL injection safety, transaction correctness, logging hygiene). 61 tests with CI running lint (ruff) and pytest on every push.
See commit history for the review-driven development trail.
- All embeddings loaded into memory at startup — practical up to ~50k chunks (50,000 chunks × 768 float32 dims ≈ 150 MB)
- Full rebuild each time (no incremental re-indexing)
- Chunkers use AST/regex parsing, not full language servers
- Single Ollama instance for embedding
Python 3.11+ · FastMCP · SQLite + FTS5 · Ollama · nomic-embed-text · numpy
MIT
{ "mcp": { "server_name": "my-project-rag", "search_tool": { "name": "search_myproject", // tool name (unique across MCP servers) "description": "Search my project." // shown to the AI as tool description }, "lookup_tool": { "name": "lookup_myproject", "description": "Look up a type by name." } }, "ollama": { "host": "http://localhost:11434", "embed_model": "nomic-embed-text", // 768-dimensional embeddings "embed_timeout": 30.0 }, "database": { "path": "data/rag.db" }, "search": { "default_top_k": 8, "max_top_k": 20, "embed_dimensions": 768 }, "sources": { "repos_dir": "data/repos", "chunks_path": "data/chunks.jsonl" }, "repos": [ { "name": "src", "path": "/path/to/source", // local path "type": "python", // chunker type "source_tag": "src" // for filtering results }, { "name": "wiki", "url": "https://github.com/org/wiki.git", // or a git URL "local_dir": "wiki", "type": "markdown", "source_tag": "wiki" } ] }