adgregory/ai-mcp

AI Knowledge MCP

An MCP (Model Context Protocol) server that provides RAG-powered access to AI/ML framework documentation and source code. Use it with Claude Code, Claude Desktop, or any MCP-compatible client to search across curated AI engineering knowledge.

Features

  • Semantic search across documentation and code from popular AI/ML frameworks
  • Domain-aware chunking for Markdown, Python, and YAML files
  • Pre-indexed repositories: Axolotl, LangGraph, CrewAI, Agent Lightning
  • Vector search powered by LanceDB with Gemini embeddings

Indexed Domains

Domain           Repository                 Description
axolotl          axolotl-ai-cloud/axolotl   Fine-tuning framework for LLMs
langgraph        langchain-ai/langgraph     Agentic workflow framework
crewai           crewAIInc/crewAI           Multi-agent orchestration
agent-lightning  microsoft/agent-lightning  Microsoft's agent framework

Installation

Prerequisites

  • Python 3.13+
  • uv package manager
  • Google Cloud credentials (for Gemini embeddings)
  • Task (optional task runner, used by the task commands below)

Setup

# Clone the repository
git clone https://github.com/your-username/ai-mcp.git
cd ai-mcp

# Install dependencies
uv sync

# Index the knowledge base (requires Google Cloud auth)
task index

Usage

MCP Tools

The server exposes three tools:

search_knowledge

Search the knowledge base using natural language queries.

search_knowledge(
    query: str,              # Natural language search query
    domain: str | None,      # Filter by domain (optional)
    limit: int = 10          # Max results to return
)
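
For clients that talk to the server over raw JSON-RPC, the signature above maps onto an MCP `tools/call` request. The sketch below only builds the request payload and does not contact a server. The helper names `build_tool_call` and `search_request` are hypothetical and not part of this repository.

```python
import json
from itertools import count
from typing import Optional

_request_ids = count(1)  # each JSON-RPC request in a session needs its own id


def build_tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def search_request(query: str, domain: Optional[str] = None, limit: int = 10) -> dict:
    """Mirror the search_knowledge parameters; leave out `domain` when unset."""
    arguments = {"query": query, "limit": limit}
    if domain is not None:
        arguments["domain"] = domain
    return build_tool_call("search_knowledge", arguments)


if __name__ == "__main__":
    request = search_request("How do I configure LoRA?", domain="axolotl", limit=5)
    print(json.dumps(request, indent=2))
```

Most users will not need this, since Claude Code and Claude Desktop construct these requests themselves once the server is registered.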

list_domains

List available domains and their statistics.

list_domains()
# Returns: {"domains": [...], "stats": {...}, "total_chunks": int}

get_source

Retrieve a specific chunk by ID.

get_source(source_id: str)
# Returns: Full chunk data including content and metadata

Claude Code Integration

Add to your ~/.claude.json:

{
  "mcpServers": {
    "ai-knowledge": {
      "command": "/path/to/ai-mcp/scripts/mcp-wrapper.sh",
      "args": []
    }
  }
}

Alternatively, run the server through uv directly. uv must be on the PATH that the client sees, which may require launching it from the VS Code terminal:

{
  "mcpServers": {
    "ai-knowledge": {
      "command": "uv",
      "args": ["run", "ai-knowledge-mcp"],
      "cwd": "/path/to/ai-mcp"
    }
  }
}
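
If you prefer to script the change rather than edit the file by hand, the entry can be merged into an existing config without disturbing other keys. This is a minimal sketch. `add_mcp_server` and `update_config_file` are hypothetical helpers, and the sketch assumes the config file is plain JSON with an optional top-level `mcpServers` object.

```python
import json
from pathlib import Path


def add_mcp_server(config: dict, name: str, command: str, args: list, cwd=None) -> dict:
    """Return a copy of `config` with one entry added under "mcpServers"."""
    servers = dict(config.get("mcpServers", {}))  # copy so the input is not mutated
    entry = {"command": command, "args": list(args)}
    if cwd is not None:
        entry["cwd"] = cwd
    servers[name] = entry
    return {**config, "mcpServers": servers}


def update_config_file(path: Path, **entry) -> None:
    """Read a JSON config file, add the server entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_mcp_server(config, **entry), indent=2) + "\n")


if __name__ == "__main__":
    # Prints the merged config instead of writing to disk.
    merged = add_mcp_server({}, "ai-knowledge", "uv", ["run", "ai-knowledge-mcp"], cwd="/path/to/ai-mcp")
    print(json.dumps(merged, indent=2))
```

Back up the existing file before calling `update_config_file` on it.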

Development Server

# Run the MCP server locally
task dev

# Or directly
uv run ai-knowledge-mcp

Architecture

src/ai_knowledge_mcp/
├── server.py           # FastMCP server entry point
├── config.py           # Configuration and domain definitions
├── chunking/           # Content chunkers
│   ├── markdown.py     # Markdown/MDX/QMD chunker
│   ├── code.py         # Python AST-aware chunker
│   └── yaml_chunker.py # YAML config chunker
├── ingestion/
│   ├── github.py       # GitHub repo fetcher
│   └── indexer.py      # Embedding and indexing pipeline
├── storage/
│   ├── embeddings.py   # Gemini embedding client
│   └── vectorstore.py  # LanceDB vector store
└── tools/
    └── search.py       # Search tool implementation
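
To illustrate what a syntax-aware Python chunker does, the sketch below splits a module into one chunk per top-level function or class. It uses the standard library `ast` module as a stand-in. The project lists Tree-sitter for parsing, so the real `code.py` may work differently. `CodeChunk` and `chunk_python` are hypothetical names.

```python
import ast
from dataclasses import dataclass


@dataclass
class CodeChunk:
    name: str
    kind: str        # "function" or "class"
    start_line: int  # 1-indexed, inclusive
    end_line: int    # 1-indexed, inclusive
    text: str


def chunk_python(source: str) -> list:
    """Return one CodeChunk per top-level function or class in `source`."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:  # top-level statements only; methods stay inside their class
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue  # imports, assignments, etc. are skipped in this sketch
        text = "\n".join(lines[node.lineno - 1 : node.end_lineno])
        chunks.append(CodeChunk(node.name, kind, node.lineno, node.end_lineno, text))
    return chunks
```

Splitting on definition boundaries keeps each chunk self-contained, which tends to give more useful search hits than fixed-size windows that cut functions in half.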

How It Works

  1. Indexing: Repositories are cloned and their files are chunked by type (Markdown sections, Python functions and classes, YAML documents)
  2. Embedding: Chunks are embedded using Gemini's text-embedding-004 model via Vertex AI
  3. Storage: Vectors are stored in LanceDB for fast similarity search
  4. Search: Queries are embedded and matched against stored vectors using cosine similarity
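
The four steps above can be sketched end to end in a few lines. This toy version is an illustration only: `toy_embed` is a bag-of-words stand-in for the Gemini embedding call, an in-memory list stands in for LanceDB, and the Markdown splitter is naive about `#` lines inside code fences. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_markdown(text: str) -> list:
    """Split Markdown into sections, starting a new chunk at each heading line."""
    chunks, current = [], []
    for line in text.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, chunks: list, limit: int = 10) -> list:
    """Return up to `limit` (score, chunk) pairs, highest similarity first."""
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]
```

A real deployment embeds each chunk once at index time and stores the vectors, so only the query needs to be embedded per search.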

Tasks

task install          # Install dependencies
task dev              # Run MCP server
task index            # Index all repositories
task index:axolotl    # Index only Axolotl
task index:langgraph  # Index only LangGraph
task index:crewai     # Index only CrewAI
task lint             # Run linter
task format           # Format code
task test             # Run tests
task clean            # Remove generated files

Configuration

Environment Variables

Variable               Description                Default
GOOGLE_CLOUD_PROJECT   GCP project for Vertex AI  Required
GOOGLE_CLOUD_LOCATION  GCP region                 us-central1
FASTMCP_LOG_ENABLED    Enable FastMCP logging     true
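
As a sketch of how these variables and their defaults could be read, the snippet below fails early when the required project is missing. `Settings` and `load_settings` are hypothetical names, and the set of strings treated as "true" is an assumption, not something this README specifies.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    project: str
    location: str
    log_enabled: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the documented defaults."""
    project = env.get("GOOGLE_CLOUD_PROJECT")
    if not project:
        raise RuntimeError("GOOGLE_CLOUD_PROJECT must be set")
    # Assumption: "1", "true", and "yes" (any case) enable logging.
    log_flag = env.get("FASTMCP_LOG_ENABLED", "true").strip().lower()
    return Settings(
        project=project,
        location=env.get("GOOGLE_CLOUD_LOCATION", "us-central1"),
        log_enabled=log_flag in {"1", "true", "yes"},
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests.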

Adding New Repositories

Edit src/ai_knowledge_mcp/config.py:

RepoConfig(
    owner="org-name",
    repo="repo-name",
    domain="your-domain",
    description="Description for the domain",
    include_patterns=["docs/**/*.md", "src/**/*.py"],
    exclude_patterns=["**/test*"],
)

Then run task index to rebuild the knowledge base.
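
To preview which files a pair of include and exclude patterns might select, the following sketch filters a list of repository-relative paths. It uses `fnmatch.fnmatchcase` from the standard library, which is an assumption about matching semantics: the indexer's actual matching rules are not documented here and may differ. In `fnmatch`, `*` also matches `/`, so `docs/**/*.md` in this sketch requires at least one subdirectory under `docs/`. `select_files` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def select_files(paths, include_patterns, exclude_patterns=()):
    """Keep paths that match any include pattern and no exclude pattern."""
    selected = []
    for path in paths:
        included = any(fnmatchcase(path, p) for p in include_patterns)
        excluded = any(fnmatchcase(path, p) for p in exclude_patterns)
        if included and not excluded:
            selected.append(path)
    return selected


if __name__ == "__main__":
    candidates = [
        "docs/guide/intro.md",
        "src/pkg/core.py",
        "src/pkg/test_core.py",
        "README.md",
    ]
    print(select_files(candidates, ["docs/**/*.md", "src/**/*.py"], ["**/test*"]))
```

Running a check like this before `task index` can catch patterns that match nothing or sweep in test files.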

Tech Stack

  • FastMCP - MCP server framework
  • LanceDB - Embedded vector database
  • Gemini - Text embeddings via Vertex AI
  • uv - Python package manager
  • Tree-sitter - Code parsing for Python chunking

License

MIT
