Local-first RAG MCP Server for Verse / UEFN Documentation
Fully air-gapped · Ollama embeddings · pgvector · Model Context Protocol
verse-rag is a production-ready Model Context Protocol (MCP) server that gives AI coding assistants the ability to crawl, index, and semantically search documentation — entirely on local infrastructure.
Originally forked from coleam00/mcp-crawl4ai-rag, this fork is a ground-up rewrite of the storage and embedding layers:
| Upstream | This fork |
|---|---|
| OpenAI embeddings | Ollama (`qwen3-embedding:8b`, 4096 dims) |
| Supabase cloud | Self-hosted pgvector + PostgREST |
| `supabase-py` client | Direct `httpx` calls to PostgREST |
| Blocking embed + store | Fire-and-forget background tasks |
The result is a stack that runs entirely offline with no external API calls, no cloud dependencies, and no API costs.
```
Claude Code / AI Client
       │ SSE (port 8051)
       ▼
┌─────────────┐
│  MCP Server │  crawl4ai_mcp.py — FastMCP + Crawl4AI
└──────┬──────┘
       │ httpx
       ▼
┌──────────────┐       ┌─────────────────┐
│  PostgREST   │──────▶│  PostgreSQL 16  │
│  (REST API)  │       │   + pgvector    │
└──────────────┘       └─────────────────┘
       │
       │ /api/embed
       ▼
┌─────────────┐
│   Ollama    │  qwen3-embedding:8b (4096 dims)
└─────────────┘
```
Services (Docker Compose):
| Container | Image | Role |
|---|---|---|
| `verse-rag-db` | `pgvector/pgvector:pg16` | Vector store |
| `verse-rag-postgrest` | `postgrest/postgrest:v12.2.0` | REST API over PostgreSQL |
| `verse-rag-mcp` | `verse-rag:latest` (local build) | MCP server |
Ollama runs as a separate service on the host (or in another container) — this stack connects to it over the `ollama` Docker network.
| Tool | Description |
|---|---|
| `crawl_single_page` | Crawl one URL and queue it for indexing |
| `smart_crawl_url` | Recursively crawl a site, following internal links |
| `crawl_verse_docs` | Shortcut to crawl the official Verse/UEFN documentation |
| `perform_rag_query` | Semantic (or hybrid) search with optional source filtering |
| `get_available_sources` | List all indexed sources in the database |
All crawl tools return immediately — embedding and storage run in background tasks so the MCP call never blocks waiting for Ollama.
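The fire-and-forget pattern can be sketched as follows. This is a minimal illustration, not the server's actual code: `embed_and_store` and the in-memory `stored` dict are stand-ins for the real Ollama and PostgREST calls.

```python
import asyncio

_background: set = set()   # keep strong refs so pending tasks aren't garbage-collected
stored: dict = {}          # stand-in for the pgvector table

async def embed_and_store(url: str, content: str) -> None:
    """Stand-in for the real embed (Ollama) + upsert (PostgREST) pipeline."""
    await asyncio.sleep(0.01)  # simulate slow embedding work
    stored[url] = content

async def crawl_single_page(url: str) -> dict:
    content = f"crawled:{url}"  # stand-in for the Crawl4AI fetch
    task = asyncio.create_task(embed_and_store(url, content))
    _background.add(task)
    task.add_done_callback(_background.discard)
    # Return immediately -- the MCP caller never waits for Ollama.
    return {"status": "queued", "url": url}

async def main() -> dict:
    result = await crawl_single_page("https://example.com/docs")
    await asyncio.gather(*_background)  # only this demo waits; the server does not
    return result

if __name__ == "__main__":
    print(asyncio.run(main()))
```

Keeping a reference to each task (and discarding it on completion) matters: `asyncio` only holds weak references to tasks, so an unreferenced background task can be garbage-collected mid-flight.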
- Docker and Docker Compose
- Ollama running with `qwen3-embedding:8b` pulled:

  ```bash
  ollama pull qwen3-embedding:8b
  ```

- Three external Docker networks: `mcps`, `nginx`, `ollama`:

  ```bash
  docker network create mcps
  docker network create nginx
  docker network create ollama
  ```
```bash
git clone https://github.com/berry-13/verse-rag.git
cd verse-rag
docker build -f docker/Dockerfile -t verse-rag:latest .
docker compose -f docker/compose.yml up -d
```

This starts:

- PostgreSQL with pgvector (`verse-rag-db`)
- PostgREST auto-REST layer (`verse-rag-postgrest`)
- The MCP server on port `8051` (`verse-rag-mcp`)
Claude Code:
```bash
claude mcp add-json verse-rag '{"type":"sse","url":"http://localhost:8051/sse"}' --scope user
```

Any SSE-compatible client:

```json
{
  "mcpServers": {
    "verse-rag": {
      "transport": "sse",
      "url": "http://localhost:8051/sse"
    }
  }
}
```

All configuration is via environment variables. The defaults in `docker/compose.yml` are ready to use for a standard local setup.
| Variable | Default | Description |
|---|---|---|
| `OLLAMA_BASE_URL` | `http://ollama:11434` | Ollama endpoint |
| `EMBEDDING_MODEL` | `qwen3-embedding:8b` | Ollama embedding model |
| `SUPABASE_URL` | `http://postgrest:3000` | PostgREST endpoint |
| `SUPABASE_SERVICE_KEY` | (JWT in `compose.yml`) | PostgREST JWT token |
| `USE_HYBRID_SEARCH` | `true` | Combine vector + full-text search |
| `USE_RERANKING` | `true` | Cross-encoder reranking of results |
| `HOST` | `0.0.0.0` | MCP server bind address |
| `PORT` | `8051` | MCP server port |
| `TRANSPORT` | `sse` | MCP transport (`sse` or `stdio`) |
| `MAX_CRAWL_DEPTH` | `3` | Maximum recursive crawl depth |
| `MAX_CONCURRENT_CRAWLS` | `5` | Parallel crawl workers |
Combines pgvector cosine similarity with PostgreSQL full-text search (`tsvector`). Results are scored as a weighted combination (0.7 vector + 0.3 text) using a `FULL OUTER JOIN` in the `hybrid_search_crawled_pages` SQL function.
Best for technical documentation where exact term matches (function names, keywords) matter alongside semantic similarity.
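The weighted combination can be sketched in Python (the SQL function does this server-side; a row missing from one side of the `FULL OUTER JOIN` simply contributes a score of zero for that signal):

```python
def hybrid_score(vector_score, text_score, vector_weight=0.7):
    """Combine vector and full-text rankings. Either side may be None
    when a row appears in only one result set (the FULL OUTER JOIN case)."""
    v = vector_score if vector_score is not None else 0.0
    t = text_score if text_score is not None else 0.0
    return vector_weight * v + (1.0 - vector_weight) * t

# A chunk matching both signals outranks one matching only the vector side:
both = hybrid_score(0.9, 0.8)       # 0.7 * 0.9 + 0.3 * 0.8
vec_only = hybrid_score(0.9, None)  # 0.7 * 0.9
```

This is why hybrid search helps with identifiers like `OnBegin` or `creative_device`: the full-text side rewards an exact token match even when the embedding similarity is middling.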
After initial retrieval, applies a cross-encoder model (cross-encoder/ms-marco-MiniLM-L-6-v2) to re-score and reorder results against the original query. Runs locally on CPU with no API cost. Adds ~100–200 ms to query latency in exchange for meaningfully better result ordering.
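Reranking is conceptually simple: score each (query, document) pair and re-sort. The sketch below injects the scorer so it stays runnable; in the real pipeline the scorer would be a cross-encoder's batch predict call (e.g. `CrossEncoder.predict` from sentence-transformers for `cross-encoder/ms-marco-MiniLM-L-6-v2`), while the word-overlap scorer here is a toy stand-in.

```python
def rerank(query: str, results: list, score_fn) -> list:
    """Re-order retrieval results by pair scores.

    score_fn takes a list of (query, document) pairs and returns one
    score per pair -- the same shape a cross-encoder predict gives.
    """
    pairs = [(query, doc) for doc in results]
    scores = score_fn(pairs)
    ranked = sorted(zip(results, scores), key=lambda x: x[1], reverse=True)
    return [doc for doc, _ in ranked]

# Toy scorer: count shared words. A real cross-encoder runs a transformer
# over each pair, which is why it adds ~100-200 ms of latency.
def toy_score(pairs):
    return [len(set(q.split()) & set(d.split())) for q, d in pairs]

docs = ["spawn a prop in Verse", "editor UI overview", "Verse spawn prop example"]
reranked = rerank("spawn prop", docs, toy_score)
```

Because the cross-encoder sees query and document together, it can demote chunks that are topically close but don't actually answer the query — something a bi-encoder similarity score can't distinguish.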
The `init.sql` creates:

- `crawled_pages` — main table with `vector(4096)` embeddings (matched to `qwen3-embedding:8b`), source labelling, and a `UNIQUE(url, chunk_number)` constraint for idempotent upserts
- `match_crawled_pages()` — vector similarity search function
- `hybrid_search_crawled_pages()` — combined vector + full-text search function
- GIN index on `content` for full-text search
- B-tree index on `source_id` for source filtering

> **Note:** `ivfflat`/`hnsw` indexes are limited to 2000 dimensions. At 4096 dims, queries use a sequential scan — acceptable for documentation-scale datasets.
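PostgREST exposes each SQL function as an `/rpc/<name>` endpoint, so a vector search is a single authenticated POST. The sketch below only builds the request; the parameter names (`query_embedding`, `match_count`) are assumptions — check the actual function signature in `docker/init.sql`:

```python
def match_pages_request(base_url: str, jwt: str,
                        query_embedding: list, match_count: int = 5) -> dict:
    """Build (but don't send) the PostgREST RPC call for match_crawled_pages().

    The returned dict can be passed straight to httpx.post(**req).
    Parameter names are illustrative assumptions -- see docker/init.sql.
    """
    return {
        "url": f"{base_url}/rpc/match_crawled_pages",
        "headers": {
            "Authorization": f"Bearer {jwt}",
            "Content-Type": "application/json",
        },
        "json": {
            "query_embedding": query_embedding,
            "match_count": match_count,
        },
    }
```

PostgREST validates the JWT against `SUPABASE_SERVICE_KEY`'s signing secret, which is why the compose file ships a matching token.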
- Batch embedding: all chunks from a page are embedded in a single `/api/embed` call to Ollama
- Batch upsert: all records are written in a single `POST` to PostgREST with `Prefer: resolution=merge-duplicates`
- Semaphore: embedding requests are serialized through `asyncio.Semaphore(1)` since Ollama processes one embedding job at a time
- Fire-and-forget: crawl tools return immediately; a background `asyncio.Task` handles embedding + storage. Monitor progress with:

  ```bash
  docker logs verse-rag-mcp -f
  ```
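The batching and semaphore points above can be sketched together. Ollama's `/api/embed` endpoint accepts a list under `"input"` and returns one vector per item, so a whole page costs one round trip; the HTTP client is injected here so the sketch stays self-contained (`client_post` stands in for something like `httpx.AsyncClient.post`):

```python
import asyncio

EMBED_SEMAPHORE = asyncio.Semaphore(1)  # Ollama runs one embedding job at a time

def embed_payload(model: str, chunks: list) -> dict:
    """Request body for a single Ollama /api/embed call covering a whole page."""
    return {"model": model, "input": chunks}

def upsert_headers() -> dict:
    """PostgREST headers for an idempotent batch upsert
    (merges on the UNIQUE(url, chunk_number) constraint)."""
    return {"Prefer": "resolution=merge-duplicates"}

async def embed_page(client_post, chunks: list) -> list:
    """Embed all chunks of one page in a single call, serialized
    so concurrent crawls don't queue-jump each other at Ollama."""
    async with EMBED_SEMAPHORE:
        return await client_post("/api/embed",
                                 embed_payload("qwen3-embedding:8b", chunks))
```

Serializing at the client rather than letting Ollama queue internally keeps per-request latency predictable and avoids timeouts when several pages finish crawling at once.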
Only the last Docker layer (`COPY src/`) is invalidated on source changes, so rebuilds are fast:

```bash
docker build -f docker/Dockerfile -t verse-rag:latest . && \
  docker compose -f docker/compose.yml up -d mcp
```

```
verse-rag/
├── src/
│   ├── crawl4ai_mcp.py     # MCP server, tool definitions, lifespan
│   └── utils.py            # Embeddings, chunking, PostgREST client
├── docker/
│   ├── Dockerfile          # Build definition
│   ├── compose.yml         # Full stack definition
│   └── init.sql            # PostgreSQL schema + functions
├── knowledge_graphs/       # Optional Neo4j hallucination detection
└── pyproject.toml
```
CUDA unavailable / Ollama SIGABRT
The GPU may be in Exclusive_Process mode. Reset it:
```bash
sudo nvidia-smi -c 0
```

This setting does not persist: it reverts on reboot and must be re-applied after driver reloads.
MCP tools hang after container restart
Claude Code caches SSE session IDs. After restarting the MCP container, restart Claude Code to clear the stale session before using any tools.
Embeddings failing silently
Check background task output:
```bash
docker logs verse-rag-mcp -f | grep -E "✅|❌"
```

MIT