mdrag

Give any local Markdown folder a semantic-search MCP server. Runs entirely offline.

Turn ~/Desktop/sales/, ~/Desktop/notes/, or any directory full of Markdown files into a searchable knowledge base that Claude Code, Cursor, Cline, and other MCP clients can query with natural-language questions.

  • πŸ—‚ Multi-vault: one MCP server manages many doc folders, each a separate "vault"
  • πŸ”’ Fully local: no API keys, no cloud β€” embeddings run on your machine
  • ⚑ Incremental indexing: only re-embed files that changed
  • 🧠 Any embedding model: default is Chinese-optimized bge-small-zh-v1.5; English / multilingual models work too
  • πŸ“¦ Self-contained: each vault's vector DB lives inside the folder (.mdrag/), move it anywhere

Installation

pip install mdrag

Requires Python ≥ 3.10.


Quickstart (3 steps)

Let's say Bob has a folder ~/Desktop/sales/ full of meeting notes, proposals, and competitor research in Markdown.

1. Register the MCP server (once, globally)

claude mcp add mdrag --scope user -- mdrag serve

This tells Claude Code: "there is an MCP server called mdrag; launch it with mdrag serve when needed." You only need to do this once per machine.

2. Register your doc folder as a vault

mdrag vault add sales ~/Desktop/sales

The first time you run this, a ~100 MB embedding model is downloaded (once only); then every .md file under ~/Desktop/sales/ is indexed. A .mdrag/ subfolder is created inside sales/ to hold the vector database.

3. Use it from Claude Code

Open Claude Code in any project. Ask:

"Use the mdrag MCP to search my sales vault for the Q4 pipeline review"

Claude will call mcp__mdrag__search(vault="sales", query="Q4 pipeline review") and return the top matching documents.
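Under the hood, that tool call travels as a JSON-RPC request over stdio. A minimal sketch of the message an MCP client sends (the envelope follows the MCP specification's tools/call method; the tool name and arguments mirror the search call above):

```python
import json

# JSON-RPC 2.0 "tools/call" request as an MCP client would send it over
# stdio. The tool name and arguments match the search call shown above.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search",
        "arguments": {"vault": "sales", "query": "Q4 pipeline review"},
    },
}
print(json.dumps(request))
```

The server replies with a matching JSON-RPC response containing the tool's result; the client (Claude Code, Cursor, etc.) handles all of this for you.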


Adding another folder

No new MCP config is needed; just register another vault:

mdrag vault add marketing ~/Desktop/marketing
mdrag vault add notes ~/Documents/notes

All vaults are visible through the same MCP server. Claude calls:

mcp__mdrag__list_vaults()                          → see all vaults
mcp__mdrag__search(vault="marketing", query="...")
mcp__mdrag__search(vault="notes", query="...")

CLI reference

mdrag serve                          Start the MCP stdio server
mdrag vault add NAME PATH            Register a directory and index it
mdrag vault list                     Show all vaults
mdrag vault info NAME                Show vault details
mdrag vault reindex NAME [--full]    Re-index (incremental or full)
mdrag vault remove NAME [--purge]    Unregister (and optionally delete .mdrag/)

Common options:

  • --model MODEL_NAME on vault add β€” pick a different embedding model
  • --no-index on vault add β€” skip initial indexing (useful when first adding, want to index later)
  • --full on vault reindex β€” rebuild from scratch (required after changing the model)

MCP tools exposed

When mdrag serve is running, these tools are available to the AI client:

| Tool | Purpose |
| --- | --- |
| list_vaults() | List all registered vaults with their stats |
| search(vault, query, top_k=5, tags="") | Semantic search within a vault, with an optional tag filter |
| get_doc(vault, path) | Read the full content of a document |
| list_tags(vault) | List all frontmatter tags in a vault with counts |

Frontmatter (optional)

If your Markdown files have YAML frontmatter, mdrag will use it:

---
title: Q4 Pipeline Review
tags: [sales, forecast, 2026-q4]
summary: Overview of deals in play for Q4 2026.
---

# Q4 Pipeline Review
...
  • title β€” used as the result title (falls back to filename)
  • tags β€” searchable via the tags parameter of search
  • summary β€” shown in search results

No frontmatter? It still works: mdrag auto-generates a preview from the file body.
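To make the behavior concrete, here is a stdlib-only sketch of frontmatter extraction with the preview fallback. This is an illustration, not mdrag's actual parser: it handles only the flat `key: value` and `tags: [a, b]` forms shown above, not full YAML.

```python
import re

def parse_front_matter(text: str) -> tuple[dict, str]:
    """Split a Markdown file into (metadata dict, body).

    Minimal sketch: supports flat `key: value` pairs and inline
    `tags: [a, b]` lists, and falls back to a body preview when no
    frontmatter (or no summary) is present.
    """
    meta: dict = {}
    body = text
    m = re.match(r"\A---\n(.*?)\n---\n(.*)", text, re.DOTALL)
    if m:
        block, body = m.group(1), m.group(2)
        for line in block.splitlines():
            if ":" not in line:
                continue
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            if value.startswith("[") and value.endswith("]"):
                meta[key] = [v.strip() for v in value[1:-1].split(",")]
            else:
                meta[key] = value
    # No summary in frontmatter: auto-generate a preview from the body.
    if body.strip():
        meta.setdefault("summary", body.lstrip().splitlines()[0])
    return meta, body

doc = """---
title: Q4 Pipeline Review
tags: [sales, forecast, 2026-q4]
summary: Overview of deals in play for Q4 2026.
---

# Q4 Pipeline Review
"""
meta, body = parse_front_matter(doc)
print(meta["title"], meta["tags"])
```

A real parser would use a YAML library; the point here is just the split between metadata and body, and the preview fallback.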


Embedding models

| Language | Recommended model | Notes |
| --- | --- | --- |
| Chinese | BAAI/bge-small-zh-v1.5 (default) | ~100 MB, CPU-friendly |
| English | BAAI/bge-small-en-v1.5 | Same family, English |
| Multilingual | paraphrase-multilingual-MiniLM-L12-v2 | For mixed-language vaults |
| Higher accuracy | BAAI/bge-base-zh-v1.5 or -en | ~400 MB, noticeably slower |

Change the model when registering a vault:

mdrag vault add notes ~/Documents/notes --model BAAI/bge-small-en-v1.5

After changing the model on an existing vault (edit ~/.mdrag/vaults.yaml), run a full rebuild:

mdrag vault reindex notes --full

How it works

 ┌────────────────────┐        ┌──────────────────────┐
 │ ~/Desktop/sales/   │        │ ~/.mdrag/            │
 │   meeting-01.md    │        │   vaults.yaml        │  ← registry
 │   proposal.md      │        └──────────────────────┘
 │   .mdrag/          │ ← LanceDB vector store (per-vault)
 │     docs.lance/    │
 └──────────┬─────────┘
            │
            │ mdrag serve
            ▼
 ┌──────────────────────────┐
 │   FastMCP stdio server   │
 │   tools:                 │
 │     search / get_doc /   │
 │     list_vaults /        │
 │     list_tags            │
 └──────────┬───────────────┘
            │ MCP protocol (stdio / JSON-RPC)
            ▼
     Claude Code / Cursor / Cline

  - The vault registry lives at ~/.mdrag/vaults.yaml
  - Each vault's vector database lives inside the vault directory at .mdrag/: self-contained and portable
  - Embeddings use sentence-transformers and are stored in LanceDB
  - The MCP server is built on FastMCP

FAQ

How do I update the index after editing files?

mdrag vault reindex sales

It's incremental: only files whose mtime has changed are re-embedded.
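The mtime check can be sketched as a comparison against the timestamps recorded at the last index run. This is assumed logic for illustration, not mdrag's actual implementation:

```python
import os

def files_needing_reembed(paths: list[str],
                          last_indexed: dict[str, float]) -> list[str]:
    """Return the files whose on-disk mtime is newer than the recorded one.

    `last_indexed` maps path -> mtime captured at the previous index run;
    unknown paths count as changed, so new files are always picked up.
    """
    changed = []
    for path in paths:
        mtime = os.stat(path).st_mtime
        if last_indexed.get(path, 0.0) < mtime:
            changed.append(path)
    return changed
```

Only the returned files are re-embedded; everything else keeps its existing vectors, which is why a reindex after a small edit is near-instant.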

Can I automate re-indexing?

Yes. Add to cron (Linux/macOS):

0 * * * * /path/to/mdrag vault reindex sales

Or use launchd on macOS / Task Scheduler on Windows.

Does it support PDF, DOCX, etc.?

Not yet. Convert to Markdown first (e.g. with pandoc) and point mdrag at the result.

Model download is slow / fails

If you're in China, set a HuggingFace mirror:

export HF_ENDPOINT=https://hf-mirror.com
mdrag vault add sales ~/Desktop/sales

Where is the vector data stored?

  • Vault registry: ~/.mdrag/vaults.yaml
  • Each vault's vectors: <vault_path>/.mdrag/docs.lance/

Can I share a vault across machines?

Yes: the .mdrag/ folder is self-contained. Sync the whole vault directory (via Dropbox, rsync, git-lfs, whatever), then run mdrag vault add <name> <path> on the other machine. No re-indexing is needed as long as the embedding model matches.


Integrations

Claude Code

claude mcp add mdrag --scope user -- mdrag serve

Or manually in ~/.mcp.json:

{
  "mcpServers": {
    "mdrag": {
      "command": "mdrag",
      "args": ["serve"]
    }
  }
}

Cursor / Cline / other MCP clients

Add the same stdio command to your client's MCP configuration. The command is mdrag serve; it communicates over stdio following the MCP protocol.


Development

git clone https://github.com/andyleimc-source/mdrag
cd mdrag
python -m venv .venv
.venv/bin/pip install -e .[dev]
.venv/bin/pytest

Try the example vault shipped in the repo:

mdrag vault add demo ./examples/sample-vault
mdrag vault list

License

MIT: do whatever you want with it.
