An MCP (Model Context Protocol) server that provides RAG-powered access to AI/ML framework documentation and source code. Use it with Claude Code, Claude Desktop, or any MCP-compatible client to search across curated AI engineering knowledge.
- Semantic search across documentation and code from popular AI/ML frameworks
- Domain-aware chunking for Markdown, Python, and YAML files
- Pre-indexed repositories: Axolotl, LangGraph, CrewAI, Agent Lightning
- Vector search powered by LanceDB with Gemini embeddings
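The idea behind domain-aware chunking can be pictured with a minimal sketch (a hypothetical simplification, not the project's actual chunker) that splits a Markdown document into one chunk per heading section:

```python
import re

def chunk_markdown(text: str) -> list[dict]:
    """Split Markdown into one chunk per heading section (simplified sketch).

    Real chunkers also track heading hierarchy and enforce size limits.
    """
    chunks: list[dict] = []
    heading, lines = None, []

    def flush() -> None:
        body = "\n".join(lines).strip()
        if heading is not None or body:
            chunks.append({"heading": heading, "content": body})

    for line in text.splitlines():
        if re.match(r"^#{1,6}\s", line):  # a new section starts at each heading
            flush()
            heading, lines = line.lstrip("#").strip(), []
        else:
            lines.append(line)
    flush()
    return chunks

sections = chunk_markdown("# Intro\nHello.\n\n## Usage\nRun it.")
# sections[0] == {"heading": "Intro", "content": "Hello."}
```

A production chunker would also attach metadata (file path, heading path) to each chunk so search results can link back to their source.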
| Domain | Repository | Description |
|---|---|---|
| `axolotl` | axolotl-ai-cloud/axolotl | Fine-tuning framework for LLMs |
| `langgraph` | langchain-ai/langgraph | Agentic workflow framework |
| `crewai` | crewAIInc/crewAI | Multi-agent orchestration |
| `agent-lightning` | microsoft/agent-lightning | Microsoft's agent framework |
- Python 3.13+
- uv package manager
- Google Cloud credentials (for Gemini embeddings)
- Task (optional, used as the task runner)
```bash
# Clone the repository
git clone https://github.com/your-username/ai-mcp.git
cd ai-mcp

# Install dependencies
uv sync

# Index the knowledge base (requires Google Cloud auth)
task index
```

The server exposes three tools:
Search the knowledge base using natural language queries.

```python
search_knowledge(
    query: str,          # Natural language search query
    domain: str | None,  # Filter by domain (optional)
    limit: int = 10,     # Max results to return
)
```

List available domains and their statistics.

```python
list_domains()
# Returns: {"domains": [...], "stats": {...}, "total_chunks": int}
```

Retrieve a specific chunk by ID.

```python
get_source(source_id: str)
# Returns: full chunk data including content and metadata
```

Add to your `~/.claude.json`:
```json
{
  "mcpServers": {
    "ai-knowledge": {
      "command": "/path/to/ai-mcp/scripts/mcp-wrapper.sh",
      "args": []
    }
  }
}
```

Or using `uv` directly (may require the VS Code terminal so that `uv` is on `PATH`):
```json
{
  "mcpServers": {
    "ai-knowledge": {
      "command": "uv",
      "args": ["run", "ai-knowledge-mcp"],
      "cwd": "/path/to/ai-mcp"
    }
  }
}
```

```bash
# Run the MCP server locally
task dev

# Or directly
uv run ai-knowledge-mcp
```

```
src/ai_knowledge_mcp/
├── server.py            # FastMCP server entry point
├── config.py            # Configuration and domain definitions
├── chunking/            # Content chunkers
│   ├── markdown.py      # Markdown/MDX/QMD chunker
│   ├── code.py          # Python AST-aware chunker
│   └── yaml_chunker.py  # YAML config chunker
├── ingestion/
│   ├── github.py        # GitHub repo fetcher
│   └── indexer.py       # Embedding and indexing pipeline
├── storage/
│   ├── embeddings.py    # Gemini embedding client
│   └── vectorstore.py   # LanceDB vector store
└── tools/
    └── search.py        # Search tool implementation
```
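Once the server is running, an MCP client invokes a tool by sending a JSON-RPC `tools/call` request over the transport. A hypothetical request for `search_knowledge` (query and arguments are illustrative) looks like:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search_knowledge",
    "arguments": {
      "query": "how do I configure LoRA fine-tuning?",
      "domain": "axolotl",
      "limit": 3
    }
  }
}
```

Clients like Claude Code construct these requests for you; the fragment is only useful for debugging the server by hand.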
- Indexing: Repositories are cloned, files are chunked based on type (markdown sections, Python functions/classes, YAML documents)
- Embedding: Chunks are embedded using Gemini's `text-embedding-004` model via Vertex AI
- Storage: Vectors are stored in LanceDB for fast similarity search
- Search: Queries are embedded and matched against stored vectors using cosine similarity
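The retrieval step reduces to cosine similarity between the query vector and each stored chunk vector. A toy sketch with hand-made 3-dimensional vectors (real embeddings have hundreds of dimensions, and LanceDB does this search with an index rather than a linear scan):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot product over norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy "index": chunk id -> embedding (hand-made values, purely illustrative)
index = {
    "axolotl/lora.md#setup": [0.9, 0.1, 0.0],
    "langgraph/graph.py#add_node": [0.1, 0.9, 0.2],
}
query = [0.8, 0.2, 0.1]

best = max(index, key=lambda cid: cosine_similarity(query, index[cid]))
# best == "axolotl/lora.md#setup": its vector points nearly the same way as the query
```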
```bash
task install          # Install dependencies
task dev              # Run MCP server
task index            # Index all repositories
task index:axolotl    # Index only Axolotl
task index:langgraph  # Index only LangGraph
task index:crewai     # Index only CrewAI
task lint             # Run linter
task format           # Format code
task test             # Run tests
task clean            # Remove generated files
```

| Variable | Description | Default |
|---|---|---|
| `GOOGLE_CLOUD_PROJECT` | GCP project for Vertex AI | Required |
| `GOOGLE_CLOUD_LOCATION` | GCP region | `us-central1` |
| `FASTMCP_LOG_ENABLED` | Enable FastMCP logging | `true` |
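A typical pre-indexing setup might look like this (the project ID is a placeholder; the authentication and indexing commands are shown as comments since both require credentials):

```shell
# Required: GCP project used for Vertex AI embedding calls
export GOOGLE_CLOUD_PROJECT="my-gcp-project"

# Optional overrides (defaults shown in the table above)
export GOOGLE_CLOUD_LOCATION="us-central1"
export FASTMCP_LOG_ENABLED="true"

# Then authenticate and build the index:
#   gcloud auth application-default login
#   task index
```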
Edit `src/ai_knowledge_mcp/config.py`:

```python
RepoConfig(
    owner="org-name",
    repo="repo-name",
    domain="your-domain",
    description="Description for the domain",
    include_patterns=["docs/**/*.md", "src/**/*.py"],
    exclude_patterns=["**/test*"],
)
```

Then run `task index` to rebuild the knowledge base.
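Before re-indexing, the `include_patterns`/`exclude_patterns` globs can be sanity-checked with Python's `fnmatch` module. This is a rough approximation (`fnmatch` lets `*` cross `/`, so it over-matches compared with strict `**` glob semantics) and `selected` is a hypothetical helper, not the indexer's real filter:

```python
from fnmatch import fnmatchcase

include = ["docs/**/*.md", "src/**/*.py"]
exclude = ["**/test*"]

def selected(path: str) -> bool:
    """Approximate the indexer's file filter: included and not excluded."""
    return any(fnmatchcase(path, p) for p in include) and not any(
        fnmatchcase(path, p) for p in exclude
    )

print(selected("docs/guide/intro.md"))  # True: matches docs include, no exclude
print(selected("src/tests/utils.py"))   # False: excluded by "**/test*"
print(selected("README.md"))            # False: matches no include pattern
```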
- FastMCP - MCP server framework
- LanceDB - Embedded vector database
- Gemini - Text embeddings via Vertex AI
- uv - Python package manager
- Tree-sitter - Code parsing for Python chunking
MIT