AI-Powered Markdown Documentation Management System
A documentation management system that combines semantic search, AI-assisted code exploration, and multiple interaction interfaces. It manages markdown documentation with vector-based search and can fetch additional context from source code.
- Markdown Documentation Storage: Central repository for markdown documentation in the md-docs/ directory
- Vector-Based Semantic Search: Natural language queries with AI-powered query expansion and filtering
- Smart Chunking: Intelligent content splitting that preserves code blocks and tables
- File Operations: Create, read, edit, rename, and delete documentation files
- Auto-Update Embeddings: Automatically update search index after file changes
- Code Search: Powerful ripgrep/grep integration for pattern matching across source code
- AI-Powered Exploration: LLM-generated summaries of files and folders
- Folder Visualization: Directory tree structure with file sizes
- File Pattern Matching: Find files using glob patterns
- Telegram Bot: Interactive chat-based documentation assistant
- MCP Servers: 2 Model Context Protocol servers for Claude Desktop and other MCP clients
- CLI Tools: 11+ standalone command-line tools for all operations
- Programmatic API: TypeScript API for custom integrations
- Docker Support: Production-ready containerized deployment
- Technical Documentation Repository: Central storage for API docs, guides, tutorials
- Knowledge Base: Company wikis, internal documentation, process guides
- Semantic Search: Find relevant documentation using natural language queries
- Documentation Editing: Create, edit, rename, delete documentation files programmatically
- Context Enhancement: Fetch source code context to supplement documentation answers
- Code Search: Find implementations, patterns, and examples in your codebase
- Project Discovery: AI-powered exploration of unfamiliar codebases
- Architecture Visualization: Generate folder structure views of projects
- AI Documentation Assistant: Telegram bot that answers questions using docs + code
- Claude Desktop Integration: MCP servers for seamless Claude integration
- Custom Tools: Build your own documentation workflows using the API
npm install
Create a .env file with your OpenAI API key:
cp .env.example .env
Edit .env and add your API key:
OPENAI_API_KEY=your_openai_api_key_here
Place your markdown documentation files in the md-docs/ directory:
mkdir -p md-docs
# Add your .md files to md-docs/
Or create files programmatically:
npm run docs:create -- my-guide.md "# My Guide\n\nContent here..."
Process your markdown files to create vector embeddings for semantic search:
npm run embed
This creates a SQLite vector database at ./vector.db.
Search your documentation using natural language queries:
npm run docs:search -- "How to configure authentication"
Explore your documentation and code:
# Search your codebase
npm run code:search -- "class.*Service" src --file-pattern="*.ts"
# Explore a folder with AI summaries
npm run code:explore -- src/tools 1
# Visualize directory structure
npm run code:tree -- src 3 --show-sizes
npm run embed
- Reads all .md files from md-docs/
- Intelligently chunks content using semantic boundaries (paragraphs, sentences, code blocks, tables)
- Preserves code blocks and markdown tables intact for better context
- Generates embeddings using OpenAI API
- Stores in SQLite vector database with rich metadata (source, content type, size)
- Metadata enables filtering by code/table presence and source tracking
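The chunking step above can be sketched as follows. This is a simplified illustration, not the project's actual chunker: the names `splitIntoBlocks`, `chunkMarkdown`, and `Chunk` are hypothetical, the `chunkSize` and `overlap` parameters are assumed to correspond to `CHUNK_SIZE` and `CHUNK_OVERLAP`, and overlap is approximated by repeating the previous short prose block rather than a fixed number of characters. A markdown table stays intact here simply because it contains no blank lines.

```typescript
// Sketch only: a simplified chunker, not the project's actual implementation.
const FENCE = "`".repeat(3); // markdown code-fence marker

interface Chunk {
  content: string;
  hasCode: boolean; // metadata flag, usable for filtering search results
}

// Split markdown into blocks. A fenced code block always stays one block,
// even when it contains blank lines; prose is split on blank lines.
function splitIntoBlocks(markdown: string): string[] {
  const blocks: string[] = [];
  let current: string[] = [];
  let inFence = false;

  const flush = () => {
    const text = current.join("\n").trim();
    if (text) blocks.push(text);
    current = [];
  };

  for (const line of markdown.split("\n")) {
    const isFence = line.trim().startsWith(FENCE);
    if (isFence && !inFence) {
      flush(); // close the paragraph that precedes the code block
      inFence = true;
      current.push(line);
    } else if (isFence && inFence) {
      current.push(line);
      inFence = false;
      flush(); // the code block is complete
    } else if (!inFence && line.trim() === "") {
      flush(); // a blank line ends a paragraph
    } else {
      current.push(line);
    }
  }
  flush();
  return blocks;
}

// Pack whole blocks into chunks of at most chunkSize characters.
// A single block larger than chunkSize is kept whole rather than cut.
function chunkMarkdown(markdown: string, chunkSize = 1000, overlap = 200): Chunk[] {
  const chunks: Chunk[] = [];
  let buffer: string[] = [];

  const emit = () => {
    if (buffer.length === 0) return;
    const content = buffer.join("\n\n");
    chunks.push({ content, hasCode: content.includes(FENCE) });
  };

  for (const block of splitIntoBlocks(markdown)) {
    const projected = [...buffer, block].join("\n\n").length;
    if (buffer.length > 0 && projected > chunkSize) {
      emit();
      const last = buffer[buffer.length - 1];
      // Assumed overlap strategy: repeat the previous block if it is short prose.
      buffer = last.length <= overlap && !last.startsWith(FENCE) ? [last] : [];
    }
    buffer.push(block);
  }
  emit();
  return chunks;
}
```

Working on whole blocks means a chunk can exceed the size limit when a single code block is large; that trade-off favours intact code over uniform chunk sizes.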
npm run search -- "your query"
Options:
- --limit=N - Return top N results (default: 5)
Example:
npm run search -- "user authentication setup" --limit=10
Run as an MCP server for integration with Claude Desktop or other MCP clients:
npm run mcp
The MCP server implements an intelligent three-stage RAG pipeline:
- Query Expansion: LLM generates 3-5 variations of your query with related terms
- Vector Search: Searches database with all query variations (up to 20 results)
- LLM Filtering: Analyzes and returns only the 3-5 most relevant results
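The three stages above can be sketched as a small orchestration function. This is a minimal illustration under stated assumptions, not the server's actual code: `ragSearch`, `mergeHits`, `PipelineDeps`, and `SearchHit` are hypothetical names, the LLM and the vector store are injected as plain functions, and a higher `score` is assumed to mean a closer match.

```typescript
// Sketch only: all names here are hypothetical, not the project's API.
interface SearchHit {
  id: string;
  content: string;
  score: number; // assumption: higher means more similar
}

interface PipelineDeps {
  expandQuery: (query: string) => Promise<string[]>; // stage 1 (LLM)
  vectorSearch: (query: string, limit: number) => Promise<SearchHit[]>; // stage 2
  filterRelevant: (query: string, hits: SearchHit[]) => Promise<SearchHit[]>; // stage 3 (LLM)
}

// Merge result sets from several query variations: de-duplicate by id,
// keep the best score per id, and return the top `limit` hits.
function mergeHits(resultSets: SearchHit[][], limit: number): SearchHit[] {
  const best = new Map<string, SearchHit>();
  for (const hits of resultSets) {
    for (const hit of hits) {
      const seen = best.get(hit.id);
      if (!seen || hit.score > seen.score) best.set(hit.id, hit);
    }
  }
  return Array.from(best.values())
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

async function ragSearch(
  query: string,
  deps: PipelineDeps,
  candidateLimit = 20,
  finalLimit = 5
): Promise<SearchHit[]> {
  // Stage 1: the original query plus LLM-generated variations.
  const variations = [query, ...(await deps.expandQuery(query))];

  // Stage 2: search with every variation, then merge into one candidate list.
  const resultSets = await Promise.all(
    variations.map((q) => deps.vectorSearch(q, candidateLimit))
  );
  const candidates = mergeHits(resultSets, candidateLimit);

  // Stage 3: let the LLM keep only the relevant candidates.
  const relevant = await deps.filterRelevant(query, candidates);
  return relevant.slice(0, finalLimit);
}
```

Injecting the three stages as functions keeps the orchestration testable with stubs, with no API key or database required.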
Edit your Claude Desktop config file:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
Add this configuration (use absolute paths):
{
  "mcpServers": {
    "docs-rag": {
      "command": "npx",
      "args": [
        "tsx",
        "/absolute/path/to/md-docs-agent/src/mcp/mcp-server.ts"
      ],
      "env": {
        "OPENAI_API_KEY": "your_openai_api_key_here",
        "VECTOR_DB_PATH": "/absolute/path/to/md-docs-agent/vector.db",
        "OPENAI_EMBEDDING_MODEL": "text-embedding-3-small",
        "OPENAI_MODEL": "gpt-4o-mini"
      }
    }
  }
}
Restart Claude Desktop, then ask: "Can you search the docs for authentication setup?"
Run the interactive Telegram bot to provide documentation assistance:
npm run bot
The Telegram bot provides an interactive way to search and explore documentation:
Features:
- Natural language question answering
- Enhanced context retrieval (reads full files for top results)
- Source file references in responses
- Telegram-compatible Markdown formatting
- Returns "No information in documentation" when no relevant content is found
Setup:
1. Create a Telegram bot via @BotFather:
   - Send /newbot to BotFather
   - Follow the instructions to create your bot
   - Copy the bot token
2. Add the bot token to your .env file:
   TELEGRAM_BOT_TOKEN=your_telegram_bot_token_here
3. Start the bot:
   npm run bot
4. Open Telegram and start a conversation with your bot!
Bot Commands:
- /start - Welcome message and introduction
- /help - Usage instructions and example questions
- Any text message - Search documentation and get answers
How it works:
- User sends a question via Telegram
- Bot searches the vector database using semantic search
- For top 3 results, bot fetches full file content
- LLM generates comprehensive answer with full context
- Response is formatted in Telegram-compatible HTML and sent back
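The flow above can be sketched as a single function with its dependencies injected. This is an illustration under stated assumptions rather than the bot's actual handler: `answerQuestion`, `AnswerDeps`, `DocHit`, and `escapeHtml` are hypothetical names, and the fallback string is the one quoted in the feature list. Escaping `&`, `<`, and `>` is what Telegram's HTML parse mode requires for text outside tags.

```typescript
// Sketch only: all names here are hypothetical, not the project's API.
interface DocHit {
  source: string; // path of the file the chunk came from
  content: string;
}

interface AnswerDeps {
  search: (question: string) => Promise<DocHit[]>;
  readFile: (path: string) => Promise<string>;
  generateAnswer: (question: string, context: string) => Promise<string>;
}

const NO_INFO = "No information in documentation";

// Escape the three characters that Telegram's HTML parse mode treats as markup.
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

async function answerQuestion(question: string, deps: AnswerDeps): Promise<string> {
  // Semantic search over the vector database.
  const hits = await deps.search(question);
  if (hits.length === 0) return NO_INFO;

  // Read the full files behind the top 3 results, each file only once.
  const sources = Array.from(new Set(hits.slice(0, 3).map((h) => h.source)));
  const files = await Promise.all(sources.map((s) => deps.readFile(s)));
  const context = sources.map((s, i) => `# ${s}\n${files[i]}`).join("\n\n");

  // Generate the answer from the assembled context.
  const answer = await deps.generateAnswer(question, context);

  // Format as Telegram-compatible HTML with source references.
  const refs = sources.map((s) => `<code>${escapeHtml(s)}</code>`).join(", ");
  return `${escapeHtml(answer)}\n\n<b>Sources:</b> ${refs}`;
}
```

With Telegraf, a string built this way would typically be sent with `ctx.reply(text, { parse_mode: "HTML" })`.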
Architecture: The bot uses a multi-MCP architecture:
- docs-rag MCP: Semantic search with query expansion
- tools MCP: Code search, file operations, and folder exploration
File reading is done directly via fetchFile() for better performance.
Configuration is managed via mcp-servers.json.
The project includes 11 reusable tools organized by purpose:
# Semantic search in documentation
npm run search -- "authentication" --limit=5
# Read file content
npm run tool:fetch-file -- path/to/file.md
# Create a new documentation file
npm run tool:create-file -- new-guide.md "# New Guide\n\nContent here..."
# Edit existing file (replace, insert, append, prepend, delete)
npm run tool:edit-file -- guide.md replace --search="old text" --replace="new text"
# Rename or move a file
npm run tool:rename-file -- old-name.md new-name.md
# Delete a file (with backup)
npm run tool:delete-file -- obsolete.md
# Update embeddings after file changes
npm run tool:update-embeddings -- path/to/changed-file.md
# Search code with ripgrep/grep
npm run tool:search -- "class.*Auth" src --file-pattern="*.ts"
# Find files by glob pattern
npm run tool:find-files -- src "*.ts" 10
# AI-powered folder exploration (generates summaries)
npm run tool:explore -- src/tools 1
# Directory tree visualization
npm run tool:tree -- src 3 --show-sizes
import {
  // Documentation Management
  fetchFile, createFile, editFile, renameFile, deleteFile,
  semanticSearch, updateEmbeddings,
  // Code Exploration
  searchCode, exploreFolder, getFolderStructure, findFiles
} from './src/tools';
// === Documentation Management ===
// Read a file
const file = fetchFile({ filePath: 'guide.md' });
console.log(file.content);
// Create a new file
const created = createFile({
  filePath: 'new-guide.md',
  content: '# New Guide\n\nContent...'
});
// Edit file (replace, insert, append, prepend, delete)
const edited = editFile({
  filePath: 'guide.md',
  operation: 'replace',
  search: 'old text',
  replace: 'new text'
});
// Rename/move a file
const renamed = renameFile({ oldPath: 'old.md', newPath: 'new.md' });
// Delete a file (creates backup)
const deleted = deleteFile({ filePath: 'obsolete.md' });
// Semantic search
const docs = await semanticSearch({ query: 'authentication', limit: 5 });
console.log(docs.results);
// Update embeddings after changes
const updated = await updateEmbeddings({ filePath: 'guide.md' });
// === Code Exploration ===
// Search code patterns
const code = searchCode({
  pattern: 'class.*Auth',
  path: 'src',
  filePattern: '*.ts'
});
// Find files by pattern
const files = findFiles({ path: 'src', pattern: '*.ts', maxResults: 10 });
// AI-powered folder exploration
const exploration = await exploreFolder({
  folderPath: 'src/tools',
  maxDepth: 1
});
// Get folder structure
const tree = getFolderStructure({
  path: 'src',
  maxDepth: 3,
  showSizes: true
});
Run integration tests:
npm test
Deploy the Telegram bot using Docker:
# 1. Clone repository
git clone <repository-url>
cd md-docs-agent
# 2. Configure environment
cp docker/.env.example .env
nano .env # Add your API keys
# 3. Start bot
docker-compose -f docker/docker-compose.yml up -d
# View logs
docker-compose -f docker/docker-compose.yml logs -f
# Stop bot
docker-compose -f docker/docker-compose.yml down
# Restart after changes
docker-compose -f docker/docker-compose.yml restart
# Update documentation
cp new-docs/* md-docs/
docker-compose -f docker/docker-compose.yml run --rm telegram-bot npx tsx src/embed.ts
docker-compose -f docker/docker-compose.yml restart
See DEPLOY.md and docker/README.md for detailed deployment instructions.
Configure the tool via .env file:
| Variable | Default | Description |
|---|---|---|
| OPENAI_API_KEY | - | Your OpenAI API key (required) |
| OPENAI_EMBEDDING_MODEL | text-embedding-3-small | OpenAI embedding model |
| OPENAI_MODEL | gpt-4o-mini | LLM for query expansion and filtering |
| TELEGRAM_BOT_TOKEN | - | Telegram bot token from @BotFather (required for bot) |
| TELEGRAM_ALLOWED_CHAT_IDS | - | Comma-separated list of allowed chat IDs (optional access control) |
| VECTOR_DB_PATH | ./vector.db | Path to SQLite database |
| CHUNK_SIZE | 1000 | Text chunk size in characters |
| CHUNK_OVERLAP | 200 | Overlap between chunks |
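As an illustration of how these variables and defaults fit together, the sketch below reads them from an environment-like object. It is not the project's actual loader: `loadConfig`, `AppConfig`, and the validation rules (including the check that the overlap is smaller than the chunk size) are assumptions made for the example.

```typescript
// Sketch only: loadConfig and AppConfig are hypothetical names.
interface AppConfig {
  openaiApiKey: string;
  embeddingModel: string;
  model: string;
  telegramBotToken?: string; // only needed when running the bot
  allowedChatIds: number[]; // empty list means no chat restriction
  vectorDbPath: string;
  chunkSize: number;
  chunkOverlap: number;
}

type Env = Record<string, string | undefined>;

// Parse a non-negative integer, falling back to the default when unset.
function parseCount(value: string | undefined, fallback: number, name: string): number {
  if (value === undefined || value.trim() === "") return fallback;
  const parsed = Number(value);
  if (!Number.isInteger(parsed) || parsed < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${value}"`);
  }
  return parsed;
}

// Parse "123, -456" into [123, -456]; Telegram group chat IDs can be negative.
function parseChatIds(value: string | undefined): number[] {
  if (!value) return [];
  return value
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part !== "")
    .map((part) => {
      const id = Number(part);
      if (!Number.isInteger(id)) throw new Error(`Invalid chat ID: "${part}"`);
      return id;
    });
}

function loadConfig(env: Env): AppConfig {
  const openaiApiKey = env.OPENAI_API_KEY;
  if (!openaiApiKey) throw new Error("OPENAI_API_KEY is required");

  const chunkSize = parseCount(env.CHUNK_SIZE, 1000, "CHUNK_SIZE");
  const chunkOverlap = parseCount(env.CHUNK_OVERLAP, 200, "CHUNK_OVERLAP");
  // Assumed sanity check: an overlap as large as the chunk makes no progress.
  if (chunkOverlap >= chunkSize) {
    throw new Error("CHUNK_OVERLAP must be smaller than CHUNK_SIZE");
  }

  return {
    openaiApiKey,
    embeddingModel: env.OPENAI_EMBEDDING_MODEL || "text-embedding-3-small",
    model: env.OPENAI_MODEL || "gpt-4o-mini",
    telegramBotToken: env.TELEGRAM_BOT_TOKEN,
    allowedChatIds: parseChatIds(env.TELEGRAM_ALLOWED_CHAT_IDS),
    vectorDbPath: env.VECTOR_DB_PATH || "./vector.db",
    chunkSize,
    chunkOverlap,
  };
}
```

In a Node.js process this would be called as `loadConfig(process.env)` after the .env file has been loaded.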
The system is built around two core capabilities: documentation management and code exploration, accessible through multiple interfaces.
┌───────────────────────────────────────────────────────────┐
│                      User Interfaces                      │
│ Telegram Bot │ MCP Servers │ CLI Tools │ Programmatic API │
└───────────────────────────────────────────────────────────┘
                              │
┌─────────────────────────────┴─────────────────────────────┐
│                     Core Capabilities                     │
├─────────────────────────────┬─────────────────────────────┤
│  Documentation Management   │  Code Exploration           │
│  • Semantic Search (RAG)    │  • Pattern Search (grep)    │
│  • File Operations          │  • AI Exploration (LLM)     │
│  • Vector Embeddings        │  • Folder Visualization     │
│  • Auto-Update Index        │  • File Discovery           │
└─────────────────────────────┴─────────────────────────────┘
                              │
┌─────────────────────────────┴─────────────────────────────┐
│                       Storage Layer                       │
│        Vector DB (SQLite) │ md-docs/ │ Source Code        │
└───────────────────────────────────────────────────────────┘
1. Tools Layer (src/tools/) - Reusable, testable utilities
   - Documentation: fetch, create, edit, rename, delete, semantic search, update embeddings
   - Code Exploration: code search, find files, explore folder, folder structure
   - Each tool works standalone via CLI or programmatically
2. Core Layer (src/core/) - Business logic
   - Search orchestration with query expansion and LLM filtering
   - Answer generation with context assembly
   - Prompt template management
3. MCP Layer (src/mcp/) - Protocol servers for external integrations
   - mcp-server: RAG search with query expansion and filtering
   - tools-mcp-server: All 11 tools exposed via MCP protocol
   - mcp-client: Multi-server connection manager
4. Interface Layer - Multiple ways to interact
   - Telegram Bot: Chat-based documentation assistant
   - MCP Servers: Integration with Claude Desktop and other MCP clients
   - CLI Tools: Command-line interface for all operations
   - Programmatic API: TypeScript/JavaScript library
- Dual Purpose: Both documentation management and code exploration in one system
- Modular: Each component has a single, well-defined responsibility
- Flexible: Multiple interfaces share the same core capabilities
- Reusable: Tools work standalone, via MCP, or programmatically
- Testable: Clean separation of concerns with dependency injection
- Extensible: Easy to add new tools, interfaces, or integrations
- TypeScript - Type-safe JavaScript
- OpenAI API - Text embeddings and LLM operations
- SQLite Vector - Vector database with vec0 extension
- better-sqlite3 - SQLite driver
- MCP SDK - Model Context Protocol
- Telegraf - Telegram bot framework
- Docker - Containerization and deployment
ISC