
Docs Vector MCP

Vectorize GitHub tool documentation and provide an MCP (Model Context Protocol) interface for AI Agents.

Features

  • 🔄 Auto-fetch from GitHub - Automatically crawls and extracts documentation from GitHub repositories
  • 🧠 Vector Embeddings - Uses OpenAI embeddings to store documentation in a vector database
  • 🔍 Semantic Search - Find relevant documentation using natural language queries
  • 🔌 MCP Protocol - Standard Model Context Protocol interface for AI Agents
  • 🎨 Modern Web UI - Built with Next.js 15 + TailwindCSS

Architecture

┌─────────────┐    ┌──────────────┐    ┌──────────────┐    ┌────────────┐
│ GitHub Repo │ →  │  Crawl Docs  │ →  │ Split Chunks │ →  │  Embedding │
└─────────────┘    └──────────────┘    └──────────────┘    └────────────┘
                          ↓
                    ┌──────────────┐
                    │ Vector DB    │ ←  Query  ┌──────────┐
                    │  (Upstash)   │ →  Result │ AI Agent │
                    └──────────────┘           └──────────┘
                          ↑
                     ┌───────────┐
                     │  MCP API  │
                     └───────────┘
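
The first three stages above amount to fetching files and splitting them into fixed-size, overlapping windows. A minimal sketch of the chunking step (the function name and the size/overlap defaults are illustrative, not the actual `lib/text-processor.ts` exports):

```typescript
interface Chunk {
  text: string;   // the chunk contents
  source: string; // path of the file the chunk came from
}

// Split a document into overlapping windows so that sentences cut at a
// chunk boundary still appear intact in the neighboring chunk.
function splitIntoChunks(
  text: string,
  source: string,
  size = 800,
  overlap = 200,
): Chunk[] {
  const chunks: Chunk[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push({ text: text.slice(start, start + size), source });
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

Larger overlap improves recall at chunk boundaries at the cost of storing (and embedding) more duplicated text.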

Tech Stack

  • Framework: Next.js 15 + TypeScript + TailwindCSS
  • Vector Database: Upstash Vector (serverless, perfect for Cloudflare deployment)
  • Embeddings: OpenAI text-embedding-3-small
  • GitHub API: Octokit
  • MCP: @modelcontextprotocol/sdk

Environment Variables

Create a .env.local file:

# GitHub (optional but recommended for higher rate limits)
GITHUB_TOKEN=your_github_token

# OpenAI
OPENAI_API_KEY=your_openai_api_key

# Upstash Vector
UPSTASH_VECTOR_REST_URL=your_upstash_vector_url
UPSTASH_VECTOR_REST_TOKEN=your_upstash_vector_token
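
Since the OpenAI and Upstash variables are required at runtime, it helps to validate them once at startup rather than failing later with an opaque 401. A minimal sketch (the helper is hypothetical; call it with `process.env` before creating any clients):

```typescript
// Names of the variables the app cannot run without.
// GITHUB_TOKEN is deliberately omitted: it is optional.
const REQUIRED = [
  "OPENAI_API_KEY",
  "UPSTASH_VECTOR_REST_URL",
  "UPSTASH_VECTOR_REST_TOKEN",
] as const;

// Throw a single error listing every missing variable at once.
function assertEnv(env: Record<string, string | undefined>): void {
  const missing = REQUIRED.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}
```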

Getting Started

Install dependencies

npm install

Run development server

npm run dev

Open http://localhost:3000 in your browser.

CLI Usage

Index a GitHub repository

npx tsx cli/index.ts index <owner> <repo> [branch]

Example:

npx tsx cli/index.ts index openai openai-python main

Search indexed documentation

npx tsx cli/index.ts search "how to use embeddings"

Show statistics

npx tsx cli/index.ts stats

Clear all indexed documents

npx tsx cli/index.ts clear

Start MCP server (for AI Agent connection)

npx tsx cli/index.ts mcp

MCP Integration

Add this configuration to any AI Agent that supports MCP:

{
  "mcpServers": {
    "docs-vector": {
      "command": "node",
      "args": [
        "path/to/docs-vector-mcp/dist/cli/index.js",
        "mcp"
      ],
      "env": {
        "OPENAI_API_KEY": "<your-openai-api-key>",
        "UPSTASH_VECTOR_REST_URL": "<your-upstash-url>",
        "UPSTASH_VECTOR_REST_TOKEN": "<your-upstash-token>"
      }
    }
  }
}

Available MCP Tools

  1. search_docs - Search documentation semantically

    • Parameters:
      • query (string): The search query
      • limit (number, optional): Maximum number of results (1-20, default 5)
  2. get_stats - Get statistics about stored documentation

    • No parameters
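
Under the hood, an MCP client invokes these tools with a JSON-RPC 2.0 `tools/call` request. A sketch of what a `search_docs` call looks like on the wire (the `id` and argument values are examples):

```typescript
// Shape of an MCP tools/call request, per JSON-RPC 2.0.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Example invocation of search_docs with an optional limit.
const request: ToolCallRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "search_docs",
    arguments: { query: "how to use embeddings", limit: 5 },
  },
};
```

The MCP SDK serializes and transports these messages for you; you only implement the tool handlers.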

Deployment

Cloudflare Pages

This project is optimized for Cloudflare Pages deployment:

  1. Push your code to GitHub
  2. Connect your repository to Cloudflare Pages
  3. Set build command: npm install && npx next build
  4. Set output directory: .next
  5. Add all environment variables in Cloudflare dashboard
  6. Deploy!

CI/CD with GitHub Actions

A sample workflow is included in .github/workflows/deploy.yml that automatically deploys to Cloudflare Pages on every push to the main branch.

Project Structure

docs-vector-mcp/
├── app/                    # Next.js app router
│   ├── api/               # API routes
│   │   ├── index/         # Indexing endpoint
│   │   ├── search/        # Search endpoint
│   │   └── stats/         # Stats endpoint
│   ├── globals.css        # Global styles
│   ├── layout.tsx         # Root layout
│   └── page.tsx           # Home page
├── components/            # React components
│   ├── IndexForm.tsx      # Repository indexing form
│   └── SearchForm.tsx     # Search form
├── lib/                   # Core libraries
│   ├── github.ts          # GitHub fetcher
│   ├── text-processor.ts  # Text chunking
│   ├── embedding.ts       # Embedding generator
│   ├── vector-store.ts    # Vector storage
│   ├── mcp-server.ts      # MCP server
│   └── docs-service.ts    # Service orchestrator
├── cli/                   # CLI entry
│   └── index.ts           # CLI main
├── .github/
│   └── workflows/         # GitHub Actions
├── next.config.ts         # Next.js config
├── tailwind.config.ts     # Tailwind config
└── package.json           # Dependencies

How It Works

  1. Add Repository: You input a GitHub repository that contains tool documentation
  2. Crawling: The system fetches all documentation files (.md, .mdx, .rst, .txt, etc.) from the repo
  3. Processing: Text is cleaned and split into overlapping chunks
  4. Embedding: OpenAI generates vector embeddings for each chunk
  5. Storage: Vectors are stored in Upstash Vector database
  6. Search: When an AI Agent asks a question, the query is embedded and similar documents are retrieved
  7. Response: Relevant documentation snippets are returned to the AI Agent for answering
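
Steps 4–6 reduce to nearest-neighbor search: the query and every chunk live in the same embedding space, and similarity is typically measured with cosine similarity. Upstash Vector performs this server-side; the sketch below only illustrates the ranking:

```typescript
// Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface Stored { text: string; vector: number[] }

// Score every stored chunk against the embedded query and keep the best k.
function topK(query: number[], docs: Stored[], k = 5): Stored[] {
  return docs
    .map((d) => ({ d, score: cosine(query, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((s) => s.d);
}
```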

License

MIT
