MD Docs Agent

AI-Powered Markdown Documentation Management System

A documentation management system that combines semantic search, AI-assisted code exploration, and several interaction interfaces. It indexes your markdown documentation for vector-based search and can fetch additional context from source code to supplement answers.

Core Features

Documentation Management

  • Markdown Documentation Storage: Central repository for markdown documentation in md-docs/ directory
  • Vector-Based Semantic Search: Natural language queries with AI-powered query expansion and filtering
  • Smart Chunking: Intelligent content splitting that preserves code blocks and tables
  • File Operations: Create, read, edit, rename, and delete documentation files
  • Auto-Update Embeddings: Automatically updates the search index after file changes

Source Code Exploration

  • Code Search: Powerful ripgrep/grep integration for pattern matching across source code
  • AI-Powered Exploration: LLM-generated summaries of files and folders
  • Folder Visualization: Directory tree structure with file sizes
  • File Pattern Matching: Find files using glob patterns

Interfaces & Integration

  • Telegram Bot: Interactive chat-based documentation assistant
  • MCP Servers: 2 Model Context Protocol servers for Claude Desktop and other MCP clients
  • CLI Tools: 11 standalone command-line tools covering all operations
  • Programmatic API: TypeScript API for custom integrations
  • Docker Support: Production-ready containerized deployment

Use Cases

Documentation Management

  • Technical Documentation Repository: Central storage for API docs, guides, tutorials
  • Knowledge Base: Company wikis, internal documentation, process guides
  • Semantic Search: Find relevant documentation using natural language queries
  • Documentation Editing: Create, edit, rename, delete documentation files programmatically

Code Exploration

  • Context Enhancement: Fetch source code context to supplement documentation answers
  • Code Search: Find implementations, patterns, and examples in your codebase
  • Project Discovery: AI-powered exploration of unfamiliar codebases
  • Architecture Visualization: Generate folder structure views of projects

Integrated Workflows

  • AI Documentation Assistant: Telegram bot that answers questions using docs + code
  • Claude Desktop Integration: MCP servers for seamless Claude integration
  • Custom Tools: Build your own documentation workflows using the API

Quick Start

1. Install Dependencies

npm install

2. Configure Environment

Create a .env file with your OpenAI API key:

cp .env.example .env

Edit .env and add your API key:

OPENAI_API_KEY=your_openai_api_key_here

3. Add Your Documentation

Place your markdown documentation files in the md-docs/ directory:

mkdir -p md-docs
# Add your .md files to md-docs/

Or create files programmatically:

npm run docs:create -- my-guide.md "# My Guide\n\nContent here..."

4. Generate Embeddings

Process your markdown files to create vector embeddings for semantic search:

npm run embed

This creates a SQLite vector database at ./vector.db.

5. Search Documentation

Search your documentation using natural language queries:

npm run docs:search -- "How to configure authentication"

6. Start Using the Tools

Explore your documentation and code:

# Search your codebase
npm run code:search -- "class.*Service" src --file-pattern="*.ts"

# Explore a folder with AI summaries
npm run code:explore -- src/tools 1

# Visualize directory structure
npm run code:tree -- src 3 --show-sizes

Usage

Embedding Generation

npm run embed
  • Reads all .md files from md-docs/
  • Intelligently chunks content using semantic boundaries (paragraphs, sentences, code blocks, tables)
  • Preserves code blocks and markdown tables intact for better context
  • Generates embeddings using OpenAI API
  • Stores in SQLite vector database with rich metadata (source, content type, size)
  • Metadata enables filtering by code/table presence and source tracking
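
The chunking behavior described above can be illustrated with a minimal sketch. It is not the project's implementation: the function names `splitIntoBlocks` and `chunkMarkdown` are hypothetical, overlap handling is omitted, and the only assumptions are that fenced code blocks are atomic and that paragraphs (including tables, which contain no blank lines) are separated by blank lines.

```typescript
// Build the fence marker at runtime so this example can live inside a fenced block.
const FENCE = "`".repeat(3);

// Split markdown into blocks: fenced code blocks stay whole, and everything
// else is split on blank lines (a table has no blank lines, so it stays whole).
function splitIntoBlocks(markdown: string): string[] {
  const blocks: string[] = [];
  let current: string[] = [];
  let inFence = false;

  const flush = (): void => {
    const block = current.join("\n").trim();
    if (block) blocks.push(block);
    current = [];
  };

  for (const line of markdown.split("\n")) {
    const isFence = line.trim().indexOf(FENCE) === 0;
    if (isFence && !inFence) {
      flush(); // close the paragraph that precedes the code block
      inFence = true;
      current.push(line);
    } else if (isFence && inFence) {
      current.push(line);
      inFence = false;
      flush(); // emit the whole code block as one unit
    } else if (!inFence && line.trim() === "") {
      flush(); // a blank line outside a fence ends a paragraph
    } else {
      current.push(line);
    }
  }
  flush();
  return blocks;
}

// Greedily pack blocks into chunks of at most `chunkSize` characters.
// A single block larger than `chunkSize` becomes its own oversized chunk
// rather than being cut in the middle.
function chunkMarkdown(markdown: string, chunkSize = 1000): string[] {
  const chunks: string[] = [];
  let buffer = "";
  for (const block of splitIntoBlocks(markdown)) {
    const candidate = buffer ? buffer + "\n\n" + block : block;
    if (buffer && candidate.length > chunkSize) {
      chunks.push(buffer);
      buffer = block;
    } else {
      buffer = candidate;
    }
  }
  if (buffer) chunks.push(buffer);
  return chunks;
}
```

Keeping an oversized code block whole trades a strict size limit for intact context, which matches the stated goal of preserving code blocks and tables.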

Semantic Search

npm run search -- "your query"

Options:

  • --limit=N - Return top N results (default: 5)

Example:

npm run search -- "user authentication setup" --limit=10
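
As a rough illustration of what ranking by embedding similarity involves, here is a minimal in-memory sketch. It assumes cosine similarity as the distance measure and a hypothetical `StoredChunk` shape; the project's actual SQLite vector query and schema may differ.

```typescript
// Hypothetical shape of a stored chunk; the real schema may differ.
interface StoredChunk {
  source: string;      // file the chunk came from
  text: string;        // chunk content
  embedding: number[]; // vector produced by the embedding model
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank chunks by similarity to the query vector and keep the top `limit`,
// mirroring the --limit option (default 5).
function topMatches(
  queryVector: number[],
  chunks: StoredChunk[],
  limit = 5,
): { chunk: StoredChunk; score: number }[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryVector, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```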

MCP Server

Run as an MCP server for integration with Claude Desktop or other MCP clients:

npm run mcp

The MCP server implements an intelligent three-stage RAG pipeline:

  1. Query Expansion: LLM generates 3-5 variations of your query with related terms
  2. Vector Search: Searches database with all query variations (up to 20 results)
  3. LLM Filtering: Analyzes and returns only the 3-5 most relevant results
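
The three stages can be sketched as a small orchestration function. This is an illustration under stated assumptions, not the server's code: `SearchHit`, `PipelineDeps`, and `ragSearch` are hypothetical names, and the LLM and database calls are injected so the sketch stays independent of any particular client.

```typescript
// Hypothetical result shape; the project's real types may differ.
interface SearchHit {
  id: string;
  text: string;
  score: number;
}

// Each stage is injected, so real LLM and vector-database clients can be
// swapped in without changing the orchestration.
interface PipelineDeps {
  expandQuery: (query: string) => Promise<string[]>;                      // stage 1
  vectorSearch: (query: string, limit: number) => Promise<SearchHit[]>;   // stage 2
  filterHits: (query: string, hits: SearchHit[]) => Promise<SearchHit[]>; // stage 3
}

async function ragSearch(
  query: string,
  deps: PipelineDeps,
  maxCandidates = 20,
): Promise<SearchHit[]> {
  // Stage 1: the original query plus its generated variations.
  const variations = [query].concat(await deps.expandQuery(query));

  // Stage 2: search with every variation, keeping the best score per chunk id.
  const best = new Map<string, SearchHit>();
  for (const variation of variations) {
    for (const hit of await deps.vectorSearch(variation, maxCandidates)) {
      const seen = best.get(hit.id);
      if (!seen || hit.score > seen.score) best.set(hit.id, hit);
    }
  }
  const candidates = Array.from(best.values())
    .sort((a, b) => b.score - a.score)
    .slice(0, maxCandidates);

  // Stage 3: the filter decides which candidates are relevant enough to return.
  return deps.filterHits(query, candidates);
}
```

Deduplicating by chunk id before filtering keeps the candidate list within the 20-result budget even when several query variations return the same chunk.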

Configure in Claude Desktop

Edit your Claude Desktop config file:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

Add this configuration (use absolute paths):

{
  "mcpServers": {
    "docs-rag": {
      "command": "npx",
      "args": [
        "tsx",
        "/absolute/path/to/md-docs-agent/src/mcp/mcp-server.ts"
      ],
      "env": {
        "OPENAI_API_KEY": "your_openai_api_key_here",
        "VECTOR_DB_PATH": "/absolute/path/to/md-docs-agent/vector.db",
        "OPENAI_EMBEDDING_MODEL": "text-embedding-3-small",
        "OPENAI_MODEL": "gpt-4o-mini"
      }
    }
  }
}

Restart Claude Desktop, then ask: "Can you search the docs for authentication setup?"

Telegram Bot

Run the interactive Telegram bot to provide documentation assistance:

npm run bot

The bot lets you search and explore documentation from a chat.

Features:

  • Natural language question answering
  • Enhanced context retrieval (reads full files for top results)
  • Source file references in responses
  • Telegram-compatible Markdown formatting
  • Returns "No information in documentation" when no relevant content is found

Setup:

  1. Create a Telegram bot via @BotFather:

    • Send /newbot to BotFather
    • Follow the instructions to create your bot
    • Copy the bot token
  2. Add the bot token to your .env file:

    TELEGRAM_BOT_TOKEN=your_telegram_bot_token_here
    
  3. Start the bot:

    npm run bot
  4. Open Telegram and start a conversation with your bot!

Bot Commands:

  • /start - Welcome message and introduction
  • /help - Usage instructions and example questions
  • Any text message - Search documentation and get answers

How it works:

  1. User sends a question via Telegram
  2. Bot searches the vector database using semantic search
  3. For top 3 results, bot fetches full file content
  4. LLM generates comprehensive answer with full context
  5. Response is formatted in Telegram-compatible HTML and sent back
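
Step 3 of the flow above (fetching full files for the top results) might look roughly like the sketch below. The `DocHit` shape, the `buildContext` name, the header format, and the choice to deduplicate by source file are all assumptions for illustration; the file reader is injected, and in this project it could be a thin wrapper around `fetchFile`.

```typescript
// Hypothetical search result shape: only the source path and score matter here.
interface DocHit {
  source: string;
  score: number;
}

// Collect full-file context for the best-scoring distinct source files.
function buildContext(
  hits: DocHit[],
  readFile: (path: string) => string,
  topFiles = 3,
): string {
  const sources: string[] = [];
  // Copy before sorting so the caller's array is not mutated.
  for (const hit of hits.slice().sort((a, b) => b.score - a.score)) {
    if (sources.indexOf(hit.source) === -1) sources.push(hit.source);
    if (sources.length === topFiles) break;
  }
  // Label each file so the LLM can cite its source in the answer.
  return sources
    .map((source) => "--- " + source + " ---\n" + readFile(source))
    .join("\n\n");
}
```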

Architecture: The bot uses a multi-MCP architecture:

  • docs-rag MCP: Semantic search with query expansion
  • tools MCP: Code search, file operations, and folder exploration

File reading is done directly via fetchFile() for better performance.

Configuration is managed via mcp-servers.json.

CLI Tools

The project includes 11 reusable tools organized by purpose:

Documentation Management Tools

# Semantic search in documentation
npm run search -- "authentication" --limit=5

# Read file content
npm run tool:fetch-file -- path/to/file.md

# Create a new documentation file
npm run tool:create-file -- new-guide.md "# New Guide\n\nContent here..."

# Edit existing file (replace, insert, append, prepend, delete)
npm run tool:edit-file -- guide.md replace --search="old text" --replace="new text"

# Rename or move a file
npm run tool:rename-file -- old-name.md new-name.md

# Delete a file (with backup)
npm run tool:delete-file -- obsolete.md

# Update embeddings after file changes
npm run tool:update-embeddings -- path/to/changed-file.md

Code Exploration Tools

# Search code with ripgrep/grep
npm run tool:search -- "class.*Auth" src --file-pattern="*.ts"

# Find files by glob pattern
npm run tool:find-files -- src "*.ts" 10

# AI-powered folder exploration (generates summaries)
npm run tool:explore -- src/tools 1

# Directory tree visualization
npm run tool:tree -- src 3 --show-sizes

Programmatic Usage

import {
  // Documentation Management
  fetchFile, createFile, editFile, renameFile, deleteFile,
  semanticSearch, updateEmbeddings,
  // Code Exploration
  searchCode, exploreFolder, getFolderStructure, findFiles
} from './src/tools';

// === Documentation Management ===

// Read a file
const file = fetchFile({ filePath: 'guide.md' });
console.log(file.content);

// Create a new file
const created = createFile({
  filePath: 'new-guide.md',
  content: '# New Guide\n\nContent...'
});

// Edit file (replace, insert, append, prepend, delete)
const edited = editFile({
  filePath: 'guide.md',
  operation: 'replace',
  search: 'old text',
  replace: 'new text'
});

// Rename/move a file
const renamed = renameFile({ oldPath: 'old.md', newPath: 'new.md' });

// Delete a file (creates backup)
const deleted = deleteFile({ filePath: 'obsolete.md' });

// Semantic search
const docs = await semanticSearch({ query: 'authentication', limit: 5 });
console.log(docs.results);

// Update embeddings after changes
const updated = await updateEmbeddings({ filePath: 'guide.md' });

// === Code Exploration ===

// Search code patterns
const code = searchCode({
  pattern: 'class.*Auth',
  path: 'src',
  filePattern: '*.ts'
});

// Find files by pattern
const files = findFiles({ path: 'src', pattern: '*.ts', maxResults: 10 });

// AI-powered folder exploration
const exploration = await exploreFolder({
  folderPath: 'src/tools',
  maxDepth: 1
});

// Get folder structure
const tree = getFolderStructure({
  path: 'src',
  maxDepth: 3,
  showSizes: true
});

Testing

Run integration tests:

npm test

Docker Deployment

Deploy the Telegram bot using Docker:

Quick Start

# 1. Clone repository
git clone <repository-url>
cd md-docs-agent

# 2. Configure environment
cp docker/.env.example .env
nano .env  # Add your API keys

# 3. Start bot
docker-compose -f docker/docker-compose.yml up -d

Management

# View logs
docker-compose -f docker/docker-compose.yml logs -f

# Stop bot
docker-compose -f docker/docker-compose.yml down

# Restart after changes
docker-compose -f docker/docker-compose.yml restart

# Update documentation
cp new-docs/* md-docs/
docker-compose -f docker/docker-compose.yml run --rm telegram-bot npx tsx src/embed.ts
docker-compose -f docker/docker-compose.yml restart

See DEPLOY.md and docker/README.md for detailed deployment instructions.

Configuration

Configure the tool via a .env file:

| Variable                  | Default                | Description                                                      |
|---------------------------|------------------------|------------------------------------------------------------------|
| OPENAI_API_KEY            | -                      | Your OpenAI API key (required)                                   |
| OPENAI_EMBEDDING_MODEL    | text-embedding-3-small | OpenAI embedding model                                           |
| OPENAI_MODEL              | gpt-4o-mini            | LLM for query expansion and filtering                            |
| TELEGRAM_BOT_TOKEN        | -                      | Telegram bot token from @BotFather (required for bot)            |
| TELEGRAM_ALLOWED_CHAT_IDS | -                      | Comma-separated list of allowed chat IDs (optional access control) |
| VECTOR_DB_PATH            | ./vector.db            | Path to SQLite database                                          |
| CHUNK_SIZE                | 1000                   | Text chunk size in characters                                    |
| CHUNK_OVERLAP             | 200                    | Overlap between chunks                                           |
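
For custom integrations, the variables and defaults listed above could be read into a typed object along these lines. This is a sketch, not the project's loader: `AppConfig` and `loadConfig` are hypothetical names, the environment is passed in as a plain object (for example `process.env`), and treating an empty chat ID list as "no restriction" is an assumption.

```typescript
interface AppConfig {
  openaiApiKey: string;
  embeddingModel: string;
  chatModel: string;
  vectorDbPath: string;
  chunkSize: number;
  chunkOverlap: number;
  allowedChatIds: string[]; // empty list is assumed to mean "no restriction"
}

type Env = Record<string, string | undefined>;

// Parse a non-negative integer, falling back to the default on missing or invalid input.
function parseNonNegativeInt(value: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(value || "", 10);
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : fallback;
}

function loadConfig(env: Env): AppConfig {
  const openaiApiKey = env.OPENAI_API_KEY;
  if (!openaiApiKey) throw new Error("OPENAI_API_KEY is required");
  return {
    openaiApiKey,
    embeddingModel: env.OPENAI_EMBEDDING_MODEL || "text-embedding-3-small",
    chatModel: env.OPENAI_MODEL || "gpt-4o-mini",
    vectorDbPath: env.VECTOR_DB_PATH || "./vector.db",
    chunkSize: parseNonNegativeInt(env.CHUNK_SIZE, 1000),
    chunkOverlap: parseNonNegativeInt(env.CHUNK_OVERLAP, 200),
    // Split the comma-separated list, trimming whitespace and dropping empty entries.
    allowedChatIds: (env.TELEGRAM_ALLOWED_CHAT_IDS || "")
      .split(",")
      .map((id) => id.trim())
      .filter((id) => id.length > 0),
  };
}
```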

Architecture

The system is built around two core capabilities: documentation management and code exploration, accessible through multiple interfaces.

System Overview

┌─────────────────────────────────────────────────────────────┐
│                    User Interfaces                          │
│  Telegram Bot │ MCP Servers │ CLI Tools │ Programmatic API │
└─────────────────────────────────────────────────────────────┘
                              │
┌─────────────────────────────┴─────────────────────────────┐
│                     Core Capabilities                       │
├─────────────────────────────┬─────────────────────────────┤
│  Documentation Management   │   Code Exploration          │
│  • Semantic Search (RAG)    │   • Pattern Search (grep)   │
│  • File Operations          │   • AI Exploration (LLM)    │
│  • Vector Embeddings        │   • Folder Visualization    │
│  • Auto-Update Index        │   • File Discovery          │
└─────────────────────────────┴─────────────────────────────┘
                              │
┌─────────────────────────────┴─────────────────────────────┐
│                      Storage Layer                          │
│    Vector DB (SQLite)  │  md-docs/  │  Source Code        │
└─────────────────────────────────────────────────────────────┘

Layered Architecture

  1. Tools Layer (src/tools/) - Reusable, testable utilities

    • Documentation: fetch, create, edit, rename, delete, semantic search, update embeddings
    • Code Exploration: code search, find files, explore folder, folder structure
    • Each tool works standalone via CLI or programmatically
  2. Core Layer (src/core/) - Business logic

    • Search orchestration with query expansion and LLM filtering
    • Answer generation with context assembly
    • Prompt template management
  3. MCP Layer (src/mcp/) - Protocol servers for external integrations

    • mcp-server: RAG search with query expansion and filtering
    • tools-mcp-server: All 11 tools exposed via MCP protocol
    • mcp-client: Multi-server connection manager
  4. Interface Layer - Multiple ways to interact

    • Telegram Bot: Chat-based documentation assistant
    • MCP Servers: Integration with Claude Desktop and other MCP clients
    • CLI Tools: Command-line interface for all operations
    • Programmatic API: TypeScript/JavaScript library

Key Benefits

  • Dual Purpose: Both documentation management and code exploration in one system
  • Modular: Each component has a single, well-defined responsibility
  • Flexible: Multiple interfaces share the same core capabilities
  • Reusable: Tools work standalone, via MCP, or programmatically
  • Testable: Clean separation of concerns with dependency injection
  • Extensible: Easy to add new tools, interfaces, or integrations

Technology Stack

License

ISC
