DocCompass is a powerful, modular platform designed to ingest, index, and serve documentation for multiple frameworks. It leverages the Model Context Protocol (MCP) to provide intelligent, structured knowledge access for developers, teams, and AI agents.
Think of it as your "Personal Documentation Concierge": automatically crawling complex documentation sites, parsing them into high-quality semantic sections, and making them instantly searchable via vector embeddings.
- Adaptive Documentation Ingestion: Automatically crawls and ingests documentation from any provided base URL with configurable depth using Crawl4AI.
- Intelligent Hierarchical Parsing: Breaks down documentation into logical sections while maintaining parent-child relationships, ensuring context is preserved.
- Semantic & Keyword Search: Optimized search using PGVector for semantic retrieval with keyword fallback.
- Delta Sync & Deduplication: Smart ingestion that only updates changed sections, minimizing overhead and embedding costs.
- Robust MCP Integration: Full compatibility with the Model Context Protocol, allowing IDEs like VS Code and Cursor to "read" documentation through your local gateway.
- Operator Dashboard: A sleek, monospace UI to track ingestion jobs, browse indexed documentation, and manage resources.
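The hierarchical-parsing and delta-sync ideas above can be sketched roughly as follows. This is a minimal illustration, not DocCompass's actual implementation; the `Section` class and function names are hypothetical:

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

@dataclass
class Section:
    """One logical documentation section, keeping a parent link so context survives chunking."""
    path: str                      # e.g. "guide/installation"
    title: str
    content: str
    parent: Optional[str] = None   # path of the enclosing section, if any

def content_hash(text: str) -> str:
    """Stable content hash used for delta sync: unchanged sections keep the same hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def changed_sections(new: list, stored_hashes: dict) -> list:
    """Return only sections whose content differs from what is already indexed,
    so embeddings are regenerated only for real changes."""
    return [s for s in new if stored_hashes.get(s.path) != content_hash(s.content)]
```

With this shape, a re-ingestion of an unchanged page produces an empty `changed_sections` result and no new embedding calls.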
The gateway is built with a modern, high-performance stack:
- Backend: FastAPI with FastMCP for the core service.
- Database: PostgreSQL with PGVector for vector storage.
- Task Queue: Celery + Redis for robust asynchronous ingestion pipelines.
- ORM: SQLModel for type-safe database interactions.
- Ingestion: Crawl4AI for high-fidelity web scraping.
- Package Manager: uv for blazing-fast Python dependency management.
User triggers url -> Crawl4AI fetches pages -> Hierarchical Parser chunks content -> Provider (Bedrock/OpenAI) generates embeddings -> PGVector stores indices -> MCP Server serves content.
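The pipeline above can be expressed as a chain of stage functions. The sketch below stubs every stage (the real system uses Crawl4AI, a Bedrock/OpenAI SDK, and PGVector in their place); only the control flow is meant to be representative:

```python
def crawl(base_url: str, max_depth: int = 3) -> list:
    """Fetch raw pages (stub; Crawl4AI does this in DocCompass)."""
    return [f"<page from {base_url}>"]

def parse(pages: list) -> list:
    """Chunk pages into hierarchical sections (stub)."""
    return [p.strip("<>") for p in pages]

def embed(sections: list) -> list:
    """Generate one embedding vector per section (stub: dummy vectors)."""
    return [[float(len(s))] for s in sections]

def store(sections: list, vectors: list) -> int:
    """Persist sections and vectors (stub: just reports the row count)."""
    return len(sections)

def ingest(base_url: str) -> int:
    """Run the full pipeline: crawl -> parse -> embed -> store."""
    pages = crawl(base_url)
    sections = parse(pages)
    vectors = embed(sections)
    return store(sections, vectors)
```

In the real service this chain runs asynchronously inside a Celery task, and the MCP server reads from the stored index rather than from the pipeline directly.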
- Docker + Docker Compose
- uv (optional, for local development)
- Clone the repository
- Setup Environment Variables:
  ```
  cp .env.example .env
  ```
  Edit `.env` to configure your embedding provider (AWS Bedrock or OpenAI).
- Start the Stack:
  ```
  USE_FRONTEND=true make up
  ```
  This will start the Database, Redis, Migrations (one-shot), Backend, Celery Worker, and Frontend. If you only want to run the backend and use the CLI, simply run `make up`.
| Service | URL | Description |
|---|---|---|
| Backend Health | http://localhost:8000/health | Service status & dependency check |
| Interactive Docs | http://localhost:8000/docs | Swagger UI for API exploration |
| MCP Endpoint | http://localhost:8000/mcp | The transport URL for MCP clients |
| Dashboard | http://localhost:3000 | Management UI |
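A quick way to script a readiness check against the backend health endpoint above, using only the standard library (a sketch; the JSON body returned by `/health` is not specified here, so only the status code is checked):

```python
import urllib.request

def health_url(base: str) -> str:
    """Build the health endpoint URL from a backend base URL."""
    return base.rstrip("/") + "/health"

def backend_is_up(base: str = "http://localhost:8000") -> bool:
    """Return True if the backend answers 200 on /health, False otherwise."""
    try:
        with urllib.request.urlopen(health_url(base), timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False
```

This is handy in CI or in a wait-for-backend loop before running the CLI against a freshly started stack.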
| Variable | Default | Description |
|---|---|---|
| `MCP_SERVER_TOKEN` | `super-secret-token` | Bearer token for MCP authentication |
| `EMBEDDING_MODEL` | `bedrock:...` | Model for vectorization (Bedrock or OpenAI) |
| `EMBEDDING_TOKEN_LIMIT` | `8192` | Max tokens your embedding model accepts |
| `AWS_REGION` | `us-east-1` | AWS region for Bedrock (if used) |
| `POSTGRES_CONNECTION_STRING` | `postgresql+psycopg://...` | DB connection string |
Tip
To use OpenAI, uncomment OPENAI_API_KEY in your .env and update EMBEDDING_MODEL to a valid OpenAI model string (e.g., openai:text-embedding-3-small).
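For example, a `.env` configured for OpenAI might contain lines like these (illustrative values; the API key is a placeholder you must replace):

```
EMBEDDING_MODEL=openai:text-embedding-3-small
OPENAI_API_KEY=<your-openai-api-key>
EMBEDDING_TOKEN_LIMIT=8192
```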
Add the following to your MCP settings (e.g., ~/.../mcp_settings.json or equivalent):
```json
{
  "mcpServers": {
    "framework-documentations-mcp-server": {
      "serverUrl": "http://localhost:8000/mcp",
      "headers": {
        "Authorization": "Bearer super-secret-token"
      }
    }
  }
}
```

Note: The exact configuration depends on your client's transport support. The gateway supports HTTP transport.
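Programmatic clients authenticate the same way the JSON settings above do: by sending the bearer token on every request. A minimal sketch (the token value comes from `MCP_SERVER_TOKEN` in your `.env`; `mcp_headers` is a hypothetical helper, not part of the gateway):

```python
def mcp_headers(token: str) -> dict:
    """Headers an MCP-over-HTTP client must send to the gateway."""
    return {"Authorization": f"Bearer {token}"}

# Attach these headers to whatever HTTP client your MCP integration uses:
headers = mcp_headers("super-secret-token")
```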
DocCompass includes a standalone asynchronous CLI powered by Typer that connects to the backend and exposes core functionality directly in your terminal.
To install the CLI globally using uv:
```
make install-cli
```

(Ensure your `~/.local/bin` is in your `$PATH` to use the `doccompass` command from anywhere.)
Set the backend URL (defaults to http://localhost:8000):

```
doccompass config --set-backend-url http://localhost:8000
```

- Ingest Docs: `doccompass ingestion run <url> [--max-depth 3]`
- List Jobs: `doccompass ingestion list`
- Browse Docs: `doccompass docs list`
- Tree View: `doccompass docs tree <id>`
- Search Docs: `doccompass docs search <id> "query"`
- Get Content: `doccompass docs content <id> <path>`
DocCompass is designed to be agent-friendly. We provide a set of Agent Skills that allow AI coding assistants (like AntiGravity, Cursor, or VS Code Copilot) to intelligently interact with the gateway via the CLI.
These skills provide structured instructions and logic for agents to:
- Discover available documentation sets.
- Search semantically within those sets.
- Retrieve specific markdown content for context.
Point your agent to the following skill definitions within this repository:
- List Available Docs: Guides the agent on how to find what documentation is currently indexed.
- Search Documentation: Provides a step-by-step workflow for semantic search and content extraction.
By using these skills, your AI assistant can act as an expert on any framework you've ingested into DocCompass.
If you prefer to run services outside of Docker for development:
Backend:

```
cd backend
uv sync
uv run alembic upgrade head
uv run python -m app.main
```

Frontend:

```
cd frontend
npm install
npm start
```

Celery Worker:

```
uv run celery -A app.tasks worker --loglevel=info
```
We use pytest for backend verification.
```
cd backend
uv run pytest
```

- CLI tool for DocCompass
- Cron jobs for already-ingested documentation, enabling periodic fetching and syncing.
- Better user experience for the progress indicator. Currently, a fixed weight is assigned to each stage, which makes it difficult for the end user to predict the ETA.
- Implement a tool to allow agents to get sections by URLs for easier backtracking based on links provided within certain sections.
- Standalone CLI tool (synchronous) for low-resource environments.
For any suggestions, feel free to create Issues within the repository!