dingavinga1/doccompass

DocCompass 🚀

DocCompass is a powerful, modular platform designed to ingest, index, and serve documentation for multiple frameworks. It leverages the Model Context Protocol (MCP) to provide intelligent, structured knowledge access for developers, teams, and AI agents.

Think of it as your "Personal Documentation Concierge": it automatically crawls complex documentation sites, parses them into high-quality semantic sections, and makes them instantly searchable via vector embeddings.


🌟 Key Features

  • Adaptive Documentation Ingestion: Automatically crawls and ingests documentation from any provided base URL, to a configurable depth, using Crawl4AI.
  • Intelligent Hierarchical Parsing: Breaks down documentation into logical sections while maintaining parent-child relationships, ensuring context is preserved.
  • Semantic & Keyword Search: Optimized search using PGVector for semantic retrieval with keyword fallback.
  • Delta Sync & Deduplication: Smart ingestion that only updates changed sections, minimizing overhead and embedding costs.
  • Robust MCP Integration: Full compatibility with the Model Context Protocol, allowing IDEs like VS Code and Cursor to "read" documentation through your local gateway.
  • Operator Dashboard: A sleek, monospace UI to track ingestion jobs, browse indexed documentation, and manage resources.
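The hierarchical-parsing idea above can be illustrated with a minimal sketch: split a document on headings while tracking each section's parent path. This is a hypothetical helper, not DocCompass's actual parser, which is more sophisticated:

```python
import re

def parse_sections(markdown: str) -> list[dict]:
    """Split a markdown document into sections, recording each section's
    parent heading path so the hierarchy is preserved. Illustrative only."""
    sections = []
    stack = []  # (level, title) of currently open headings
    current = None
    for line in markdown.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)", line)
        if m:
            level, title = len(m.group(1)), m.group(2).strip()
            # Close any headings at the same or deeper level.
            while stack and stack[-1][0] >= level:
                stack.pop()
            parent = "/".join(t for _, t in stack) or None
            stack.append((level, title))
            current = {"title": title, "parent": parent, "body": []}
            sections.append(current)
        elif current is not None:
            current["body"].append(line)
    for s in sections:
        s["body"] = "\n".join(s["body"]).strip()
    return sections

doc = "# Guide\nIntro.\n## Install\nRun pip.\n## Usage\nCall it.\n"
parsed = parse_sections(doc)  # three sections; the last two point to "Guide"
```

Keeping the parent path alongside each chunk is what lets search results carry enough context to be useful on their own.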

๐Ÿ—๏ธ Architecture & Technology Stack

The gateway is built with a modern, high-performance stack:

  • Backend: FastAPI with FastMCP for the core service.
  • Database: PostgreSQL with PGVector for vector storage.
  • Task Queue: Celery + Redis for robust asynchronous ingestion pipelines.
  • ORM: SQLModel for type-safe database interactions.
  • Ingestion: Crawl4AI for high-fidelity web scraping.
  • Package Manager: uv for blazing-fast Python dependency management.

Data Flow

User submits a URL -> Crawl4AI fetches pages -> hierarchical parser chunks content -> embedding provider (Bedrock/OpenAI) generates embeddings -> PGVector stores the embeddings -> MCP server serves content.
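Before the embedding step, the delta-sync stage decides what actually needs (re-)embedding. A rough, hypothetical sketch of that decision using content hashes (the real pipeline keeps these hashes in Postgres):

```python
import hashlib

def sha(text: str) -> str:
    """Content fingerprint for a section body."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def delta_plan(stored: dict[str, str], crawled: dict[str, str]):
    """Compare stored hashes against freshly crawled sections and return
    (changed, unchanged, removed) section paths. Illustrative helper only."""
    changed = [p for p, body in crawled.items() if stored.get(p) != sha(body)]
    unchanged = [p for p, body in crawled.items() if stored.get(p) == sha(body)]
    removed = [p for p in stored if p not in crawled]
    return changed, unchanged, removed

stored = {"guide/install": sha("Run pip."), "guide/old": sha("Gone.")}
crawled = {"guide/install": "Run pip.", "guide/usage": "Call it."}
changed, unchanged, removed = delta_plan(stored, crawled)
```

Only the `changed` paths would be sent to the embedding provider, which is what keeps re-ingestion overhead and embedding costs low.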


🚦 Getting Started

Prerequisites

  • Docker and Docker Compose (the quick start runs everything in containers)
  • make (the repository's Makefile drives the stack)

Quick Start (Docker Compose)

  1. Clone the repository
  2. Set up environment variables:
    cp .env.example .env
    Edit .env to configure your embedding provider (AWS Bedrock or OpenAI).
  3. Start the stack:
    USE_FRONTEND=true make up
    This starts the database, Redis, migrations (one-shot), backend, Celery worker, and frontend. If you only want to run the backend and use the CLI, simply run make up.

Verify Services

Service           URL                           Description
Backend Health    http://localhost:8000/health  Service status & dependency check
Interactive Docs  http://localhost:8000/docs    Swagger UI for API exploration
MCP Endpoint      http://localhost:8000/mcp     The transport URL for MCP clients
Dashboard         http://localhost:3000         Management UI

🛠️ Configuration Guide (.env)

Variable                    Default                   Description
MCP_SERVER_TOKEN            super-secret-token        Bearer token for MCP authentication
EMBEDDING_MODEL             bedrock:...               Model for vectorization (Bedrock or OpenAI)
EMBEDDING_TOKEN_LIMIT       8192                      Max tokens your embedding model accepts
AWS_REGION                  us-east-1                 AWS region for Bedrock (if used)
POSTGRES_CONNECTION_STRING  postgresql+psycopg://...  DB connection string

Tip

To use OpenAI, uncomment OPENAI_API_KEY in your .env and update EMBEDDING_MODEL to a valid OpenAI model string (e.g., openai:text-embedding-3-small).
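Both examples follow a provider:model convention for EMBEDDING_MODEL. A small, hypothetical validator for that convention (not the gateway's actual parsing code):

```python
def parse_embedding_model(spec: str) -> tuple[str, str]:
    """Split an EMBEDDING_MODEL value such as 'openai:text-embedding-3-small'
    into (provider, model). Sketch based on the examples above."""
    provider, sep, model = spec.partition(":")
    if not sep or not model:
        raise ValueError(f"expected 'provider:model', got {spec!r}")
    if provider not in {"bedrock", "openai"}:
        raise ValueError(f"unsupported provider {provider!r}")
    return provider, model

provider, model = parse_embedding_model("openai:text-embedding-3-small")
```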


🔌 MCP Integration

VS Code / AntiGravity

Add the following to your MCP settings (e.g., ~/.../mcp_settings.json or equivalent):

{
  "mcpServers": {
      "framework-documentations-mcp-server": {
        "serverUrl": "http://localhost:8000/mcp",
        "headers": {
        "Authorization": "Bearer super-secret-token"
      }
    }
  }
}

Note: The exact configuration depends on your client's transport support. The gateway supports HTTP transport.
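On the server side, the Authorization header shown above is validated against MCP_SERVER_TOKEN. A toy version of that check (the real one lives in the FastAPI middleware):

```python
def check_bearer(headers: dict[str, str], expected_token: str) -> bool:
    """Return True if the request carries a valid 'Bearer <token>' header
    matching the configured MCP_SERVER_TOKEN. Illustrative sketch only."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    return scheme == "Bearer" and token == expected_token

ok = check_bearer({"Authorization": "Bearer super-secret-token"},
                  "super-secret-token")
```

If a client cannot send custom headers, requests will fail this check, so confirm header support before wiring up the endpoint.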


💻 CLI Usage

DocCompass includes a standalone asynchronous CLI powered by Typer that connects to the backend and exposes core functionality directly in your terminal.

Installation

To install the CLI globally using uv:

make install-cli

(Ensure your ~/.local/bin is in your $PATH to use the doccompass command from anywhere).

Configuration

Set the backend URL (defaults to http://localhost:8000):

doccompass config --set-backend-url http://localhost:8000

Key Commands

  • Ingest Docs: doccompass ingestion run <url> [--max-depth 3]
  • List Jobs: doccompass ingestion list
  • Browse Docs: doccompass docs list
  • Tree View: doccompass docs tree <id>
  • Search Docs: doccompass docs search <id> "query"
  • Get Content: doccompass docs content <id> <path>
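The search command falls back to keyword matching when semantic retrieval is unavailable. A toy stand-in for that fallback ranking, by raw term frequency (the gateway's actual scoring differs):

```python
def keyword_search(sections: dict[str, str], query: str,
                   limit: int = 5) -> list[str]:
    """Rank section paths by how often the query terms occur in the body.
    Hypothetical sketch of a keyword fallback, not the real implementation."""
    terms = query.lower().split()
    scored = sorted(
        ((sum(body.lower().count(t) for t in terms), path)
         for path, body in sections.items()),
        reverse=True,
    )
    return [path for score, path in scored if score > 0][:limit]

hits = keyword_search(
    {"guide/routing": "FastAPI routing and routers",
     "guide/tasks": "Celery workers"},
    "routing",
)
```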

🤖 Agent Skills

DocCompass is designed to be agent-friendly. We provide a set of Agent Skills that allow AI coding assistants (like AntiGravity, Cursor, or VS Code Copilot) to intelligently interact with the gateway via the CLI.

These skills provide structured instructions and logic for agents to:

  1. Discover available documentation sets.
  2. Search semantically within those sets.
  3. Retrieve specific markdown content for context.

Enabling Agent Skills

Point your agent to the following skill definitions within this repository:

By using these skills, your AI assistant can act as an expert on any framework you've ingested into DocCompass.


🖥️ Local Development

If you prefer to run services outside of Docker for development:

Backend

  1. cd backend
  2. uv sync
  3. uv run alembic upgrade head
  4. uv run python -m app.main

Frontend

  1. cd frontend
  2. npm install
  3. npm start

Celery Worker

  1. uv run celery -A app.tasks worker --loglevel=info

🧪 Testing

We use pytest for backend verification.

cd backend
uv run pytest

Implementation Roadmap / To-Do List

  • CLI tool for DocCompass
  • Cron jobs for existing documentation sets, for periodic re-fetching and syncing.
  • Better user experience for the progress indicator. Currently, a fixed weight is assigned to each stage, which makes it hard for the end user to predict the ETA.
  • A tool that lets agents fetch sections by URL, for easier backtracking via links embedded within sections.
  • Standalone CLI tool (synchronous) for low-resource environments.

For any suggestions, feel free to create Issues within the repository!
