Vectorize GitHub tool documentation and provide an MCP (Model Context Protocol) interface for AI agents.
- 🔄 Auto-fetch from GitHub - Automatically crawls and extracts documentation from GitHub repositories
- 🧠 Vector Embeddings - Uses OpenAI embeddings to store documentation in vector database
- 🔍 Semantic Search - Find relevant documentation using natural language queries
- 🔌 MCP Protocol - Standard Model Context Protocol interface for AI Agents
- 🎨 Modern Web UI - Built with Next.js 15 + TailwindCSS
```
┌─────────────┐     ┌──────────────┐     ┌──────────────┐     ┌────────────┐
│ GitHub Repo │ ──→ │  Crawl Docs  │ ──→ │ Split Chunks │ ──→ │ Embedding  │
└─────────────┘     └──────────────┘     └──────────────┘     └────────────┘
                                                                     ↓
                    ┌──────────────┐   ← Query    ┌──────────┐
                    │  Vector DB   │              │ AI Agent │
                    │  (Upstash)   │   Result →   │          │
                    └──────────────┘              └──────────┘
                            ↑
                     ┌───────────┐
                     │  MCP API  │
                     └───────────┘
```
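The stages in the diagram map onto small, composable interfaces. The TypeScript sketch below is illustrative only; the type and function names are hypothetical, not the project's actual exports:

```typescript
// Illustrative type sketch of the pipeline stages; names are hypothetical.
interface Chunk {
  id: string;
  text: string;
  source: string; // file path within the repository
}

interface SearchResult {
  chunk: Chunk;
  score: number; // similarity score returned by the vector store
}

// Each stage consumes the previous stage's output.
type CrawlDocs = (owner: string, repo: string, branch?: string) => Promise<string[]>;
type SplitChunks = (doc: string, source: string) => Chunk[];
type Embed = (texts: string[]) => Promise<number[][]>;
type Search = (query: string, limit?: number) => Promise<SearchResult[]>;
```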
- Framework: Next.js 15 + TypeScript + TailwindCSS
- Vector Database: Upstash Vector (serverless, perfect for Cloudflare deployment)
- Embeddings: OpenAI text-embedding-3-small
- GitHub API: Octokit
- MCP: @modelcontextprotocol/sdk
Create a `.env.local` file:

```bash
# GitHub (optional but recommended for higher rate limits)
GITHUB_TOKEN=your_github_token

# OpenAI
OPENAI_API_KEY=your_openai_api_key

# Upstash Vector
UPSTASH_VECTOR_REST_URL=your_upstash_vector_url
UPSTASH_VECTOR_REST_TOKEN=your_upstash_vector_token
```

Then install dependencies and start the dev server:

```bash
npm install
npm run dev
```

Open http://localhost:3000 in your browser.
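It can help to fail fast at startup when configuration is missing. A minimal sketch, assuming the standard Upstash REST variable names; `missingEnv` is a hypothetical helper, not part of the project:

```typescript
// Sketch: collect required variables that are absent, so startup can fail fast.
// Variable names assume the standard Upstash REST naming convention.
const REQUIRED = ["OPENAI_API_KEY", "UPSTASH_VECTOR_REST_URL", "UPSTASH_VECTOR_REST_TOKEN"];

function missingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED.filter((name) => !env[name]);
}

// At startup, for example:
// const missing = missingEnv(process.env);
// if (missing.length > 0) throw new Error(`Missing env vars: ${missing.join(", ")}`);
```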
Index a repository:

```bash
npx tsx cli/index.ts index <owner> <repo> [branch]
```

Example:

```bash
npx tsx cli/index.ts index openai openai-python main
```

Search the indexed documentation:

```bash
npx tsx cli/index.ts search "how to use embeddings"
```

Show statistics:

```bash
npx tsx cli/index.ts stats
```

Clear the vector store:

```bash
npx tsx cli/index.ts clear
```

Run the MCP server:

```bash
npx tsx cli/index.ts mcp
```

Add this configuration to any AI agent that supports MCP:
```json
{
  "mcpServers": {
    "docs-vector": {
      "command": "node",
      "args": [
        "path/to/docs-vector-mcp/dist/cli/index.js",
        "mcp"
      ],
      "env": {
        "OPENAI_API_KEY": "<your-openai-api-key>",
        "UPSTASH_VECTOR_REST_URL": "<your-upstash-url>",
        "UPSTASH_VECTOR_REST_TOKEN": "<your-upstash-token>"
      }
    }
  }
}
```
- `search_docs` - Search documentation semantically
  - Parameters:
    - `query` (string): The search query
    - `limit` (number, optional): Maximum number of results (1-20, default 5)
- `get_stats` - Get statistics about stored documentation
  - No parameters
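For reference, a `tools/call` request for `search_docs` looks like this on the wire (JSON-RPC 2.0, per the MCP specification); the `query` and `limit` values are just examples:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search_docs",
    "arguments": {
      "query": "how to use embeddings",
      "limit": 5
    }
  }
}
```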
This project is optimized for Cloudflare Pages deployment:
- Push your code to GitHub
- Connect your repository to Cloudflare Pages
- Set the build command: `npm install && npx next build`
- Set the output directory: `.next`
- Add all environment variables in the Cloudflare dashboard
- Deploy!
A sample workflow is included in `.github/workflows/deploy.yml` that automatically deploys to Cloudflare Pages on every push to the `main` branch.
```
docs-vector-mcp/
├── app/                    # Next.js app router
│   ├── api/                # API routes
│   │   ├── index/          # Indexing endpoint
│   │   ├── search/         # Search endpoint
│   │   └── stats/          # Stats endpoint
│   ├── globals.css         # Global styles
│   ├── layout.tsx          # Root layout
│   └── page.tsx            # Home page
├── components/             # React components
│   ├── IndexForm.tsx       # Repository indexing form
│   └── SearchForm.tsx      # Search form
├── lib/                    # Core libraries
│   ├── github.ts           # GitHub fetcher
│   ├── text-processor.ts   # Text chunking
│   ├── embedding.ts        # Embedding generator
│   ├── vector-store.ts     # Vector storage
│   ├── mcp-server.ts       # MCP server
│   └── docs-service.ts     # Service orchestrator
├── cli/                    # CLI entry
│   └── index.ts            # CLI main
├── .github/
│   └── workflows/          # GitHub Actions
├── next.config.ts          # Next.js config
├── tailwind.config.ts      # Tailwind config
└── package.json            # Dependencies
```
- Add Repository: You input a GitHub repository that contains tool documentation
- Crawling: The system fetches all documentation files (`.md`, `.mdx`, `.rst`, `.txt`, etc.) from the repository
- Processing: Text is cleaned and split into overlapping chunks
- Embedding: OpenAI generates vector embeddings for each chunk
- Storage: Vectors are stored in Upstash Vector database
- Search: When an AI Agent asks a question, the query is embedded and similar documents are retrieved
- Response: Relevant documentation snippets are returned to the AI Agent for answering
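The processing step above (splitting text into overlapping chunks) can be sketched as a simple character-window splitter. The chunk size and overlap below are illustrative, not the project's actual settings:

```typescript
// Minimal sketch of overlapping chunking; sizes are illustrative defaults.
function splitIntoChunks(text: string, chunkSize = 800, overlap = 100): string[] {
  if (chunkSize <= overlap) throw new Error("chunkSize must exceed overlap");
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
    // Step forward by (chunkSize - overlap) so consecutive chunks share context.
    start += chunkSize - overlap;
  }
  return chunks;
}
```

The overlap preserves context across chunk boundaries, so a sentence split mid-chunk still appears intact in the neighboring chunk.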
MIT