
Semantic code search server for MCP, powered by vector embeddings and Qdrant

kuyermqi/deepmatch-mcp

DeepMatch MCP

A Model Context Protocol (MCP) server for semantic code search using vector embeddings. Index your codebase and search with natural language queries.

Features

  • Semantic Code Search: Find code by meaning, not just keywords
  • Multiple Embedding Providers: OpenAI, Ollama, Gemini, OpenAI-compatible APIs
  • Real-time File Watching: Automatically re-index on file changes
  • Multi-repository Support: Index multiple directories simultaneously
  • Smart Filtering: Respects .gitignore, skips binary files and common build directories
  • MCP Protocol: Works with any MCP-compatible client (Claude Desktop, etc.)

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        deepmatch-mcp                            │
├─────────────────────────────────────────────────────────────────┤
│  CLI Entrypoint (src/cli.ts)                                    │
│  - Parses config from CLI flags and environment variables       │
│  - Orchestrates startup: scan → index → watch → serve           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │   Config    │  │  Providers  │  │     Vector Store        │  │
│  │  (Zod)      │  │ (Embedders) │  │      (Qdrant)           │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
│                                                                 │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │  Scanner    │  │  Chunker    │  │     Index Manager       │  │
│  │ (Directory) │  │ (Line-based)│  │   (Batch Processing)    │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
│                                                                 │
│  ┌─────────────┐  ┌───────────────────────────────────────────┐ │
│  │  Watcher    │  │              MCP Server                   │ │
│  │ (Chokidar)  │  │  (stdio transport, 'search' tool)         │ │
│  └─────────────┘  └───────────────────────────────────────────┘ │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
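The startup order shown in the CLI box (scan → index → watch → serve) can be pictured as a chain of awaited stages. This is an illustrative sketch only: the `Stages` interface, its method names, and the `startup` function are hypothetical stand-ins for the modules in the diagram, not exports of src/cli.ts.

```typescript
// Hypothetical stage interface: each method stands in for one module in the diagram.
interface Stages {
  scan(paths: string[]): Promise<string[]>; // Scanner: list indexable files
  index(files: string[]): Promise<number>;  // Index Manager: embed + upsert, returns chunk count
  watch(paths: string[]): Promise<void>;    // Watcher: start re-indexing on change
  serve(): Promise<void>;                   // MCP Server: start the stdio transport
}

// Run the stages strictly in order; a rejection in any stage aborts startup.
async function startup(paths: string[], stages: Stages): Promise<number> {
  const files = await stages.scan(paths);
  const chunkCount = await stages.index(files);
  await stages.watch(paths); // watch only after the initial index is complete
  await stages.serve();
  return chunkCount;
}
```

Starting the watcher after the initial index means changes made during the first scan are not double-processed; the trade-off is that edits made while indexing runs may be missed until the file changes again.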

Module Overview

Module     Path             Description
Config     src/config/      CLI/ENV parsing with Zod validation
Providers  src/providers/   Embedding providers (OpenAI, Ollama, Gemini, OpenAI-compatible)
Store      src/store/       Qdrant vector database wrapper
Chunker    src/chunker/     Line-based text chunking with configurable limits
Scanner    src/scanner/     Directory traversal with .gitignore support
Indexer    src/indexer/     Batch embedding and vector upsert orchestration
Watcher    src/watcher/     File change detection with debouncing
MCP        src/mcp/         MCP stdio server with search tool
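The Watcher's debouncing can be illustrated as coalescing a burst of file events into a single re-index call once the burst goes quiet. This is a minimal sketch, not the project's file-watcher.ts: the `ChangeDebouncer` class, its `delayMs` parameter, and the `onFlush` callback are hypothetical.

```typescript
type ChangeKind = "add" | "change" | "unlink";

// Collects file events and hands them to onFlush once no new event has
// arrived for delayMs milliseconds. A later event for the same path
// overwrites an earlier one, so each file is handled at most once per burst.
class ChangeDebouncer {
  private pending = new Map<string, ChangeKind>();
  private timer: ReturnType<typeof setTimeout> | undefined;

  constructor(
    private readonly delayMs: number,
    private readonly onFlush: (changes: Map<string, ChangeKind>) => void,
  ) {}

  push(path: string, kind: ChangeKind): void {
    this.pending.set(path, kind);
    if (this.timer !== undefined) clearTimeout(this.timer); // restart the quiet period
    this.timer = setTimeout(() => this.flush(), this.delayMs);
  }

  // Deliver everything collected so far; a no-op when nothing is pending.
  flush(): void {
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = undefined;
    if (this.pending.size === 0) return;
    const batch = this.pending;
    this.pending = new Map();
    this.onFlush(batch);
  }
}
```

In a Chokidar-based watcher, `push` would be called from the `'add'`, `'change'`, and `'unlink'` event handlers, and `onFlush` would re-chunk changed files and delete vectors for removed ones.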

Installation

# Install dependencies
npm install

# Build
npm run build

Usage

Prerequisites

  1. Qdrant - Vector database (default: http://localhost:6333)

    # Using Docker
    docker run -p 6333:6333 qdrant/qdrant
  2. Embedding Provider - One of:

    • OpenAI API key
    • Ollama running locally
    • Gemini API key
    • Any OpenAI-compatible API

CLI Options

npx deepmatch-mcp [options]

Options:
  --path <path>              Repository path to index (repeatable)
  --provider <provider>      Embedding provider: openai|ollama|gemini|openai-compatible
  --model <model>            Embedding model name
  --embedding-dim <dim>      Embedding dimension (auto-detected if not set)
  --batch-size <size>        Batch size for embeddings (default: 60)
  --max-files <count>        Maximum files to index (default: 50000)
  --qdrant-url <url>         Qdrant server URL (default: http://localhost:6333)
  --qdrant-key <key>         Qdrant API key
  --openai-key <key>         OpenAI API key
  --ollama-url <url>         Ollama server URL
  --gemini-key <key>         Gemini API key
  --openai-compat-base-url <url>  OpenAI-compatible base URL
  --openai-compat-key <key>       OpenAI-compatible API key
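`--embedding-dim` is auto-detected when unset. One common way to do this, assumed here rather than taken from the source, is to embed a short probe string once and read the length of the returned vector, which then sizes the Qdrant collection. The `Embedder` type and `detectDimension` helper are hypothetical.

```typescript
// Hypothetical embedder shape: one vector per input text.
type Embedder = { embed(texts: string[]): Promise<number[][]> };

// Use the configured dimension if given; otherwise probe the provider once.
async function detectDimension(embedder: Embedder, configured?: number): Promise<number> {
  if (configured !== undefined) return configured;
  const [vector] = await embedder.embed(["dimension probe"]);
  if (!vector || vector.length === 0) {
    throw new Error("Provider returned no embedding; set --embedding-dim explicitly");
  }
  return vector.length;
}
```

The probe costs one extra embedding request at startup, which is why an explicit `--embedding-dim` can still be useful for providers with slow or metered APIs.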

Environment Variables

All CLI options can be set via environment variables:

Variable                            CLI Flag
DEEPMATCH_PATHS                     --path
DEEPMATCH_PROVIDER                  --provider
DEEPMATCH_MODEL                     --model
DEEPMATCH_EMBEDDING_DIM             --embedding-dim
DEEPMATCH_BATCH_SIZE                --batch-size
DEEPMATCH_MAX_FILES                 --max-files
DEEPMATCH_QDRANT_URL                --qdrant-url
DEEPMATCH_QDRANT_API_KEY            --qdrant-key
DEEPMATCH_OPENAI_API_KEY            --openai-key
DEEPMATCH_OLLAMA_URL                --ollama-url
DEEPMATCH_GEMINI_API_KEY            --gemini-key
DEEPMATCH_OPENAI_COMPAT_BASE_URL    --openai-compat-base-url
DEEPMATCH_OPENAI_COMPAT_API_KEY     --openai-compat-key

CLI flags take precedence over environment variables.
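The precedence rule (CLI flag, then environment variable, then built-in default) can be sketched as follows. This is a simplified, dependency-free stand-in for the Zod-based parser in src/config/ that covers only a few options; the `resolveConfig` function is hypothetical, and so is the comma separator assumed for multiple entries in DEEPMATCH_PATHS.

```typescript
interface PartialFlags {
  path?: string[];
  provider?: string;
  batchSize?: number;
  qdrantUrl?: string;
}

interface ResolvedConfig {
  paths: string[];
  provider: string | undefined;
  batchSize: number;
  qdrantUrl: string;
}

// Resolve each option as: CLI flag ?? environment variable ?? default.
// Type validation (e.g. rejecting a non-numeric batch size) is left to the schema.
function resolveConfig(flags: PartialFlags, env: Record<string, string | undefined>): ResolvedConfig {
  const envPaths = env.DEEPMATCH_PATHS
    ?.split(",") // assumption: multiple paths are comma-separated
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  const envBatch = env.DEEPMATCH_BATCH_SIZE !== undefined ? Number(env.DEEPMATCH_BATCH_SIZE) : undefined;

  return {
    paths: flags.path ?? envPaths ?? [],
    provider: flags.provider ?? env.DEEPMATCH_PROVIDER,
    batchSize: flags.batchSize ?? envBatch ?? 60,
    qdrantUrl: flags.qdrantUrl ?? env.DEEPMATCH_QDRANT_URL ?? "http://localhost:6333",
  };
}
```

Using `??` rather than `||` keeps a flag that was explicitly set to a falsy value (such as an empty string) from silently falling through to the environment.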

Examples

With OpenAI:

npx deepmatch-mcp \
  --path /path/to/your/repo \
  --provider openai \
  --openai-key sk-xxx

With Ollama:

npx deepmatch-mcp \
  --path /path/to/repo1 \
  --path /path/to/repo2 \
  --provider ollama \
  --ollama-url http://localhost:11434 \
  --model qwen3-embedding:0.6b

With environment variables:

export DEEPMATCH_PATHS="/path/to/repo"
export DEEPMATCH_PROVIDER="openai"
export DEEPMATCH_OPENAI_API_KEY="sk-xxx"
npx deepmatch-mcp

MCP Client Configuration

For Claude Desktop, add the following to your MCP settings (claude_desktop_config.json):

{
  "mcpServers": {
    "deepmatch": {
      "command": "npx",
      "args": ["deepmatch-mcp", "--path", "/path/to/repo", "--provider", "openai"],
      "env": {
        "DEEPMATCH_OPENAI_API_KEY": "sk-xxx"
      }
    }
  }
}

MCP Tools

search

Search for code using semantic similarity.

Input Schema:

Parameter  Type      Required  Description
query      string    Yes       Natural language search query
limit      number    No        Max results (1-50, default: 10)
paths      string[]  No        Filter to specific repository paths
minScore   number    No        Minimum similarity score (0-1)
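The constraints in the input schema (required `query`, integer `limit` from 1 to 50 defaulting to 10, `minScore` between 0 and 1) could be enforced as below. The `parseSearchInput` function is a hypothetical, dependency-free illustration of these rules; the server's actual validation code may differ.

```typescript
interface SearchInput {
  query: string;
  limit: number;
  paths?: string[];
  minScore?: number;
}

// Validate raw tool arguments against the documented schema, filling in defaults.
function parseSearchInput(raw: Record<string, unknown>): SearchInput {
  const { query, limit = 10, paths, minScore } = raw;
  if (typeof query !== "string" || query.trim() === "") {
    throw new Error("query must be a non-empty string");
  }
  if (typeof limit !== "number" || !Number.isInteger(limit) || limit < 1 || limit > 50) {
    throw new Error("limit must be an integer between 1 and 50");
  }
  if (paths !== undefined && !(Array.isArray(paths) && paths.every((p) => typeof p === "string"))) {
    throw new Error("paths must be an array of strings");
  }
  if (minScore !== undefined && (typeof minScore !== "number" || minScore < 0 || minScore > 1)) {
    throw new Error("minScore must be a number between 0 and 1");
  }
  return { query, limit, paths: paths as string[] | undefined, minScore: minScore as number | undefined };
}
```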

Output:

{
  "total_count": 5,
  "items": [
    {
      "filePath": "/repo/src/auth.ts",
      "repoPath": "/repo",
      "startLine": 10,
      "endLine": 25,
      "codeChunk": "function authenticate(token: string) {...}",
      "score": 0.92
    }
  ]
}
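The output fields map directly onto a TypeScript type. The sketch below shows how `minScore`, `paths`, and `limit` could shape raw hits into this response, assuming the hits already arrive sorted by descending score and that `paths` matches `repoPath` exactly. `SearchHit`, `SearchResponse`, and `shapeResults` are hypothetical names, and treating `total_count` as the number of returned items is also an assumption.

```typescript
interface SearchHit {
  filePath: string;
  repoPath: string;
  startLine: number;
  endLine: number;
  codeChunk: string;
  score: number;
}

interface SearchResponse {
  total_count: number;
  items: SearchHit[];
}

// Apply the optional filters to hits already sorted by descending score.
function shapeResults(
  hits: SearchHit[],
  opts: { limit: number; paths?: string[]; minScore?: number },
): SearchResponse {
  const items = hits
    .filter((h) => opts.minScore === undefined || h.score >= opts.minScore)
    .filter((h) => opts.paths === undefined || opts.paths.includes(h.repoPath))
    .slice(0, opts.limit);
  // Assumption: total_count reports the number of returned items.
  return { total_count: items.length, items };
}
```

In practice these filters are better pushed down into the Qdrant query itself (its search API accepts a score threshold, a payload filter, and a limit), so that filtering does not shrink an already-truncated result list.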

Local Development

Setup

# Clone and install
git clone https://github.com/657KB/deepmatch-mcp
cd deepmatch-mcp
npm install

Development Workflow

# Run tests (TDD)
npm test

# Run tests in watch mode
npx vitest

# Build TypeScript
npm run build

# Test the CLI
node dist/cli.js --help

Project Structure

src/
├── cli.ts                 # Main entry point
├── config/
│   ├── schema.ts          # Zod schemas and defaults
│   ├── index.ts           # CLI/ENV parsing
│   └── config.test.ts
├── providers/
│   ├── types.ts           # IEmbedder interface
│   ├── embedders.ts       # Provider implementations
│   ├── index.ts
│   └── embedders.test.ts
├── store/
│   ├── types.ts           # IVectorStore interface
│   ├── qdrant.ts          # Qdrant implementation
│   ├── index.ts
│   └── qdrant.test.ts
├── chunker/
│   ├── extensions.ts      # Supported file extensions
│   ├── chunker.ts         # Line-based chunking
│   ├── index.ts
│   └── chunker.test.ts
├── scanner/
│   ├── scanner.ts         # Directory traversal
│   ├── index.ts
│   └── scanner.test.ts
├── indexer/
│   ├── index-manager.ts   # Batch indexing orchestration
│   ├── index.ts
│   └── index-manager.test.ts
├── watcher/
│   ├── file-watcher.ts    # Chokidar file watching
│   ├── index.ts
│   └── file-watcher.test.ts
└── mcp/
    ├── server.ts          # MCP server + search tool
    ├── index.ts
    └── server.test.ts
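The index manager batches work through the `IEmbedder` and `IVectorStore` interfaces defined in providers/types.ts and store/types.ts. Their exact signatures are not documented here, so the shapes below (`EmbedderLike`, `VectorStoreLike`) are assumptions, as are the `indexChunks` helper and its point-ID scheme. The sketch only illustrates grouping chunks into batches of `batchSize` (60 by default) and upserting each batch's vectors.

```typescript
// Hypothetical shapes; the real IEmbedder / IVectorStore may differ.
interface EmbedderLike { embed(texts: string[]): Promise<number[][]>; }
interface VectorPoint {
  id: string;
  vector: number[];
  payload: { filePath: string; startLine: number; endLine: number; codeChunk: string };
}
interface VectorStoreLike { upsert(points: VectorPoint[]): Promise<void>; }
interface Chunk { filePath: string; startLine: number; endLine: number; text: string; }

// Split an array into consecutive groups of at most `size` elements.
function toBatches<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) batches.push(items.slice(i, i + size));
  return batches;
}

// Embed and upsert one batch at a time, so only one embedding request is in flight.
async function indexChunks(
  chunks: Chunk[], embedder: EmbedderLike, store: VectorStoreLike, batchSize = 60,
): Promise<number> {
  let upserted = 0;
  for (const batch of toBatches(chunks, batchSize)) {
    const vectors = await embedder.embed(batch.map((c) => c.text));
    if (vectors.length !== batch.length) throw new Error("embedder returned the wrong number of vectors");
    const points: VectorPoint[] = batch.map((c, i) => ({
      // Hypothetical ID scheme. Qdrant only accepts unsigned integers or UUIDs
      // as point IDs, so a real store would hash this key into a UUID.
      id: `${c.filePath}:${c.startLine}-${c.endLine}`,
      vector: vectors[i],
      payload: { filePath: c.filePath, startLine: c.startLine, endLine: c.endLine, codeChunk: c.text },
    }));
    await store.upsert(points);
    upserted += points.length;
  }
  return upserted;
}
```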

Running Tests

# Run all tests
npm test

# Run specific test file
npx vitest src/chunker/chunker.test.ts

# Run with coverage
npx vitest --coverage

Configuration Defaults (Roo-Code Aligned)

Parameter          Default                  Description
batchSize          60                       Embedding batch size
maxFiles           50,000                   Maximum files to index
chunkMin           50                       Minimum chunk size (chars)
chunkMax           1,000                    Maximum chunk size (chars)
chunkMaxTolerance  1.15                     Tolerance factor on chunkMax (1,000 × 1.15 = 1,150 chars)
chunkRebalanceMin  200                      Minimum remainder (chars) after a split; a shorter remainder triggers a rebalance
qdrantUrl          http://localhost:6333    Qdrant server URL
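One simplified reading of these chunking defaults is the following line-based strategy: lines accumulate until the next one would push the chunk past chunkMax × chunkMaxTolerance (1,150 chars); if the text left after that split would be shorter than chunkRebalanceMin, the split point moves earlier so the file does not end in a tiny fragment. This is a hypothetical sketch, not the project's chunker.ts. `chunkByLines` and its types are illustrative names, dropping chunks shorter than chunkMin is an assumption, and oversized single lines are kept whole here rather than split.

```typescript
interface ChunkOptions { min: number; max: number; tolerance: number; rebalanceMin: number; }
interface LineChunk { startLine: number; endLine: number; text: string; } // 1-based, inclusive

const CHUNK_DEFAULTS: ChunkOptions = { min: 50, max: 1000, tolerance: 1.15, rebalanceMin: 200 };

function chunkByLines(content: string, opts: ChunkOptions = CHUNK_DEFAULTS): LineChunk[] {
  const lines = content.split("\n");
  // prefix[j] = total length of lines[0..j-1], counting one newline per line.
  const prefix: number[] = [0];
  for (const line of lines) prefix.push(prefix[prefix.length - 1] + line.length + 1);
  const len = (from: number, to: number): number => prefix[to + 1] - prefix[from];

  const effectiveMax = opts.max * opts.tolerance;
  const last = lines.length - 1;
  const chunks: LineChunk[] = [];
  const emit = (from: number, to: number): void => {
    if (len(from, to) < opts.min) return; // assumption: undersized chunks are dropped
    chunks.push({ startLine: from + 1, endLine: to + 1, text: lines.slice(from, to + 1).join("\n") });
  };

  let start = 0; // first line of the chunk being built
  for (let i = 0; i <= last; i++) {
    // Would adding line i overflow the current chunk lines[start..i-1]?
    if (i > start && len(start, i) > effectiveMax) {
      let split = i - 1; // default: close the chunk just before line i
      // Rebalance: if the rest of the file is too short, end this chunk earlier
      // so the next one has at least rebalanceMin characters.
      if (len(i, last) < opts.rebalanceMin) {
        for (let k = i - 2; k >= start; k--) {
          if (len(start, k) >= opts.min && len(k + 1, last) >= opts.rebalanceMin) {
            split = k;
            break;
          }
        }
      }
      emit(start, split);
      start = split + 1; // line i now belongs to the new chunk
    }
  }
  if (start <= last) emit(start, last);
  return chunks;
}
```

For example, twelve 99-character lines split into lines 1-10 and 11-12 rather than 1-11 and a lone 100-character line 12.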

File Filtering

Supported Extensions

TypeScript, JavaScript, Python, Java, C/C++, C#, Go, Rust, Ruby, PHP, Swift, Kotlin, Scala, Lua, R, Perl, Shell, SQL, HTML, CSS, JSON, YAML, XML, Markdown, Vue, Svelte

Excluded Directories

node_modules, dist, build, target, .git, hidden directories, __pycache__, venv, .next, .nuxt, coverage, vendor, Pods, .gradle, .idea, .vscode

Additional Filters

  • Files larger than 1MB are skipped
  • .gitignore rules are respected (stacked for nested directories)
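These filtering rules can be combined into a single predicate. The sketch below is hypothetical: `shouldIndex` and `FileInfo` are illustrative names, the extension set is a small subset of the supported list, reading "1MB" as 1,048,576 bytes is an assumption, and .gitignore matching is left out.

```typescript
// Hypothetical subset of the supported extensions.
const EXTENSIONS = new Set([".ts", ".js", ".py", ".go", ".rs", ".java", ".md"]);
// .git, .next, .nuxt, .gradle, .idea and .vscode are covered by the hidden-directory rule.
const EXCLUDED_DIRS = new Set([
  "node_modules", "dist", "build", "target", "__pycache__", "venv", "coverage", "vendor", "Pods",
]);
const MAX_BYTES = 1024 * 1024; // assumption: "1MB" means 1 MiB

interface FileInfo {
  relativePath: string; // POSIX-style path relative to the repository root
  sizeBytes: number;
}

function shouldIndex(file: FileInfo): boolean {
  const parts = file.relativePath.split("/");
  const fileName = parts[parts.length - 1];
  const dirs = parts.slice(0, -1);
  // Skip anything under an excluded or hidden directory.
  if (dirs.some((d) => EXCLUDED_DIRS.has(d) || d.startsWith("."))) return false;
  if (file.sizeBytes > MAX_BYTES) return false;
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0) return false; // no extension, or a name like ".env" whose only dot is leading
  return EXTENSIONS.has(fileName.slice(dot).toLowerCase());
}
```

An extension allowlist also takes care of binary files (images, archives, compiled objects) without having to sniff file contents.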

License

MIT
