dingavinga1/doccompass

DocCompass 🚀

DocCompass is a powerful, modular platform designed to ingest, index, and serve documentation for multiple frameworks. It leverages the Model Context Protocol (MCP) to provide intelligent, structured knowledge access for developers, teams, and AI agents.

Think of it as your "Personal Documentation Concierge": it automatically crawls complex documentation sites, parses them into high-quality semantic sections, and makes them instantly searchable via vector embeddings.


🌟 Key Features

  • Adaptive Documentation Ingestion: Automatically crawls and ingests documentation from any provided base URL, to a configurable depth, using Crawl4AI.
  • Intelligent Hierarchical Parsing: Breaks down documentation into logical sections while maintaining parent-child relationships, ensuring context is preserved.
  • Semantic & Keyword Search: Optimized search using PGVector for semantic retrieval with keyword fallback.
  • Delta Sync & Deduplication: Smart ingestion that only updates changed sections, minimizing overhead and embedding costs.
  • Robust MCP Integration: Full compatibility with the Model Context Protocol, allowing IDEs like VS Code and Cursor to "read" documentation through your local gateway.
  • Operator Dashboard: A sleek, monospace UI to track ingestion jobs, browse indexed documentation, and manage resources.
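The hierarchical-parsing idea above can be illustrated with a minimal sketch: split a document on headings while tracking each section's parent path. This is a hypothetical helper, not DocCompass's actual parser, which is more sophisticated:

```python
import re

def parse_sections(markdown: str) -> list[dict]:
    """Split a markdown document into sections, recording each section's
    parent heading path so the hierarchy is preserved. Illustrative only."""
    sections = []
    stack = []  # (level, title) of currently open headings
    current = None
    for line in markdown.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)", line)
        if m:
            level, title = len(m.group(1)), m.group(2).strip()
            # Close any headings at the same or deeper level.
            while stack and stack[-1][0] >= level:
                stack.pop()
            parent = "/".join(t for _, t in stack) or None
            stack.append((level, title))
            current = {"title": title, "parent": parent, "body": []}
            sections.append(current)
        elif current is not None:
            current["body"].append(line)
    for s in sections:
        s["body"] = "\n".join(s["body"]).strip()
    return sections

doc = "# Guide\nIntro.\n## Install\nRun pip.\n## Usage\nCall it.\n"
parsed = parse_sections(doc)  # three sections; the last two point to "Guide"
```

Keeping the parent path alongside each chunk is what lets search results carry enough context to be useful on their own.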

๐Ÿ—๏ธ Architecture & Technology Stack

The gateway is built with a modern, high-performance stack:

  • Backend: FastAPI with FastMCP for the core service.
  • Database: PostgreSQL with PGVector for vector storage.
  • Task Queue: Celery + Redis for robust asynchronous ingestion pipelines.
  • ORM: SQLModel for type-safe database interactions.
  • Ingestion: Crawl4AI for high-fidelity web scraping.
  • Package Manager: uv for blazing-fast Python dependency management.

Data Flow

User submits a URL -> Crawl4AI fetches pages -> hierarchical parser chunks content -> embedding provider (Bedrock/OpenAI) generates embeddings -> PGVector stores the embeddings -> MCP server serves content.
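Before the embedding step, the delta-sync stage decides what actually needs (re-)embedding. A rough, hypothetical sketch of that decision using content hashes (the real pipeline keeps these hashes in Postgres):

```python
import hashlib

def sha(text: str) -> str:
    """Content fingerprint for a section body."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def delta_plan(stored: dict[str, str], crawled: dict[str, str]):
    """Compare stored hashes against freshly crawled sections and return
    (changed, unchanged, removed) section paths. Illustrative helper only."""
    changed = [p for p, body in crawled.items() if stored.get(p) != sha(body)]
    unchanged = [p for p, body in crawled.items() if stored.get(p) == sha(body)]
    removed = [p for p in stored if p not in crawled]
    return changed, unchanged, removed

stored = {"guide/install": sha("Run pip."), "guide/old": sha("Gone.")}
crawled = {"guide/install": "Run pip.", "guide/usage": "Call it."}
changed, unchanged, removed = delta_plan(stored, crawled)
```

Only the `changed` paths would be sent to the embedding provider, which is what keeps re-ingestion overhead and embedding costs low.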


🚦 Getting Started

Prerequisites

  • Docker and Docker Compose (the quick start runs everything in containers)
  • make (the repository's Makefile drives the stack)

Quick Start (Docker Compose)

  1. Clone the repository
  2. Set up environment variables:
    cp .env.example .env
    Edit .env to configure your embedding provider (AWS Bedrock or OpenAI).
  3. Start the stack:
    USE_FRONTEND=true make up
    This starts the database, Redis, migrations (one-shot), backend, Celery worker, and frontend. If you only want to run the backend and use the CLI, simply run make up.

Verify Services

Service           URL                           Description
Backend Health    http://localhost:8000/health  Service status & dependency check
Interactive Docs  http://localhost:8000/docs    Swagger UI for API exploration
MCP Endpoint      http://localhost:8000/mcp     The transport URL for MCP clients
Dashboard         http://localhost:3000         Management UI

🛠️ Configuration Guide (.env)

Variable                    Default                   Description
MCP_SERVER_TOKEN            super-secret-token        Bearer token for MCP authentication
EMBEDDING_MODEL             bedrock:...               Model for vectorization (Bedrock or OpenAI)
EMBEDDING_TOKEN_LIMIT       8192                      Max tokens your embedding model accepts
AWS_REGION                  us-east-1                 AWS region for Bedrock (if used)
POSTGRES_CONNECTION_STRING  postgresql+psycopg://...  DB connection string

Tip

To use OpenAI, uncomment OPENAI_API_KEY in your .env and update EMBEDDING_MODEL to a valid OpenAI model string (e.g., openai:text-embedding-3-small).
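Both examples follow a provider:model convention for EMBEDDING_MODEL. A small, hypothetical validator for that convention (not the gateway's actual parsing code):

```python
def parse_embedding_model(spec: str) -> tuple[str, str]:
    """Split an EMBEDDING_MODEL value such as 'openai:text-embedding-3-small'
    into (provider, model). Sketch based on the examples above."""
    provider, sep, model = spec.partition(":")
    if not sep or not model:
        raise ValueError(f"expected 'provider:model', got {spec!r}")
    if provider not in {"bedrock", "openai"}:
        raise ValueError(f"unsupported provider {provider!r}")
    return provider, model

provider, model = parse_embedding_model("openai:text-embedding-3-small")
```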


🔌 MCP Integration

VS Code / AntiGravity

Add the following to your MCP settings (e.g., ~/.../mcp_settings.json or equivalent):

{
  "mcpServers": {
      "framework-documentations-mcp-server": {
        "serverUrl": "http://localhost:8000/mcp",
        "headers": {
        "Authorization": "Bearer super-secret-token"
      }
    }
  }
}

Note: The exact configuration depends on your client's transport support. The gateway supports HTTP transport.
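On the server side, the Authorization header shown above is validated against MCP_SERVER_TOKEN. A toy version of that check (the real one lives in the FastAPI middleware):

```python
def check_bearer(headers: dict[str, str], expected_token: str) -> bool:
    """Return True if the request carries a valid 'Bearer <token>' header
    matching the configured MCP_SERVER_TOKEN. Illustrative sketch only."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    return scheme == "Bearer" and token == expected_token

ok = check_bearer({"Authorization": "Bearer super-secret-token"},
                  "super-secret-token")
```

If a client cannot send custom headers, requests will fail this check, so confirm header support before wiring up the endpoint.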


💻 CLI Usage

DocCompass includes a standalone asynchronous CLI powered by Typer that connects to the backend and exposes core functionality directly in your terminal.

Installation

To install the CLI globally using uv:

make install-cli

(Ensure your ~/.local/bin is in your $PATH to use the doccompass command from anywhere).

Configuration

Set the backend URL (defaults to http://localhost:8000):

doccompass config --set-backend-url http://localhost:8000

Key Commands

  • Ingest Docs: doccompass ingestion run <url> [--max-depth 3]
  • List Jobs: doccompass ingestion list
  • Browse Docs: doccompass docs list
  • Tree View: doccompass docs tree <id>
  • Search Docs: doccompass docs search <id> "query"
  • Get Content: doccompass docs content <id> <path>
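The search command falls back to keyword matching when semantic retrieval is unavailable. A toy stand-in for that fallback ranking, by raw term frequency (the gateway's actual scoring differs):

```python
def keyword_search(sections: dict[str, str], query: str,
                   limit: int = 5) -> list[str]:
    """Rank section paths by how often the query terms occur in the body.
    Hypothetical sketch of a keyword fallback, not the real implementation."""
    terms = query.lower().split()
    scored = sorted(
        ((sum(body.lower().count(t) for t in terms), path)
         for path, body in sections.items()),
        reverse=True,
    )
    return [path for score, path in scored if score > 0][:limit]

hits = keyword_search(
    {"guide/routing": "FastAPI routing and routers",
     "guide/tasks": "Celery workers"},
    "routing",
)
```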

🤖 Agent Skills

DocCompass is designed to be agent-friendly. We provide a set of Agent Skills that allow AI coding assistants (like AntiGravity, Cursor, or VS Code Copilot) to intelligently interact with the gateway via the CLI.

These skills provide structured instructions and logic for agents to:

  1. Discover available documentation sets.
  2. Search semantically within those sets.
  3. Retrieve specific markdown content for context.

Enabling Agent Skills

Point your agent to the following skill definitions within this repository:

By using these skills, your AI assistant can act as an expert on any framework you've ingested into DocCompass.


🖥️ Local Development

If you prefer to run services outside of Docker for development:

Backend

  1. cd backend
  2. uv sync
  3. uv run alembic upgrade head
  4. uv run python -m app.main

Frontend

  1. cd frontend
  2. npm install
  3. npm start

Celery Worker

  1. uv run celery -A app.tasks worker --loglevel=info

🧪 Testing

We use pytest for backend verification.

cd backend
uv run pytest

Implementation Roadmap / To-Do List

  • CLI tool for DocCompass
  • Cron jobs for existing documentation sets, for periodic re-fetching and syncing.
  • Better user experience for the progress indicator. Currently, a fixed weight is assigned to each stage, which makes it hard for the end user to predict the ETA.
  • A tool that lets agents fetch sections by URL, for easier backtracking via links embedded within sections.
  • Standalone CLI tool (synchronous) for low-resource environments.

For any suggestions, feel free to create Issues within the repository!
