cyngielson/codebase-search-mcp

rust-mcp — Local Semantic Code Search via MCP

A high-performance MCP (Model Context Protocol) server that indexes your codebase into a local PostgreSQL + pgvector database and exposes semantic search to AI assistants (Claude Desktop, VS Code Copilot, etc.) — no cloud services required, no per-query cost.

Why?

                 rust-mcp (this)                               Zilliz Cloud + OpenAI
Embeddings       Local (FastEmbed / ONNX)                      OpenAI API — ~$0.0001/1k tokens
Vector storage   Self-hosted PostgreSQL                        Zilliz Cloud — $25+/month
Privacy          100% local — code never leaves your machine   Code sent to OpenAI + Milvus
Index speed      ~12 000 chunks / 30 s                         Network-bound
Offline use      Yes                                           No

Features

  • Semantic search — find code by meaning, not just keywords ("find payment logic" finds process_transaction())
  • Multi-provider embeddings — FastEmbed (default), LocalONNX, Ollama, OpenAI, HuggingFace
  • Rich search filters — file extensions, path pattern (SQL LIKE), tags, date ranges, similarity threshold
  • Auto-tagging — chunks auto-tagged as api_endpoint, database_model, authentication, etc.
  • Multi-project — index multiple repos into one DB, search per-project or across all
  • pgvector indexes — cosine-similarity search via IVFFlat (fast build) or HNSW (better recall)
  • MCP protocol — works natively with Claude Desktop and VS Code Copilot Chat

Requirements

  • Rust 1.75+
  • PostgreSQL 14+ with pgvector extension
  • (Optional) Ollama for local Ollama embeddings

Quick Start

1. Install pgvector

# Ubuntu / Debian
sudo apt install postgresql-16-pgvector

# macOS (Homebrew)
brew install pgvector

# Windows: see https://github.com/pgvector/pgvector#windows

2. Create database

CREATE DATABASE claude_context;

Apply the schema (or let the server auto-initialize on first run):

psql -d claude_context -f schema.sql

3. Configure environment

cp .env.example .env
# Edit .env — set POSTGRES_URL at minimum

4. Build

cargo build --release --bin rust-mcp-server

5. Wire into your MCP client

Claude Desktop — add to claude_desktop_config.json:

{
  "mcpServers": {
    "vector-database": {
      "command": "C:/path/to/rust-mcp-server.exe",
      "args": ["local-onnx"],
      "env": {
        "POSTGRES_URL": "postgresql://postgres:password@localhost:5432/claude_context",
        "RUST_LOG": "info"
      }
    }
  }
}

VS Code Copilot — add to mcp.json (user settings or workspace):

{
  "servers": {
    "vector-database": {
      "type": "stdio",
      "command": "C:/path/to/rust-mcp-server.exe",
      "args": ["local-onnx"],
      "env": {
        "POSTGRES_URL": "postgresql://postgres:password@localhost:5432/claude_context",
        "RUST_LOG": "info"
      }
    }
  }
}

The local-onnx argument selects the bundled ONNX model (mxbai-embed-large-v1, 1024 dimensions). No additional setup is needed beyond having the model in the models/ directory (see Embedding Providers below).


MCP Tools Reference

Once connected, the following tools are available to your AI assistant:

Indexing

Tool                                 Description
vector-index_codebase_incremental    Index a project (or update existing). Creates the project if it does not exist.
vector-reindex_local_onnx            Re-embed all chunks for an existing project using the local ONNX model.
vector-remove_project                Delete a project and all its chunks from the database.
vector-list_projects                 List all indexed projects with chunk counts and metadata.
vector-get_project_stats             Detailed stats for one project (chunk count, file types, tags, etc.).

Searching

Tool                          Description
vector-advanced_search_code   Semantic search with full filter support (main search tool).
vector-search_code            Simple semantic search — query + limit only.
vector-search_by_tags         Find chunks matching specific auto-tags.
vector-find_similar_code      Find code similar to a given snippet.

Utilities

Tool                      Description
vector-get_chunk_by_id    Retrieve a specific chunk by its database ID.
vector-get_file_chunks    Get all chunks for a specific file path.
vector-database_health    Check PostgreSQL + pgvector connectivity and index status.
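Under the hood, an MCP client invokes each of these tools with a JSON-RPC 2.0 tools/call request over stdio. A minimal sketch of such a payload (the id and argument values are illustrative; the exact envelope is defined by the MCP specification):

```python
import json

# Sketch of the JSON-RPC 2.0 envelope an MCP client writes to the server's
# stdin to invoke a tool. The tool name and arguments follow the tables above.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "vector-search_code",
        "arguments": {"query": "authentication middleware", "limit": 5},
    },
}

payload = json.dumps(request)
print(payload)
```

Your MCP client builds and sends these requests for you; the tables above are what matters when phrasing requests to the assistant.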

Usage Examples

Index a project

Ask your AI assistant:

"Index my project at /home/user/myrepo as my-api"

The assistant calls vector-index_codebase_incremental:

{
  "path": "/home/user/myrepo",
  "project_name": "my-api"
}

Basic semantic search

"Find authentication middleware in my-api"

{
  "path": "/home/user/myrepo",
  "query": "authentication middleware token validation",
  "limit": 10
}

Search with filters

By file extension:

{
  "path": "/home/user/myrepo",
  "query": "database connection pool",
  "file_extensions": ["py", "rs"],
  "limit": 10
}

By path pattern (SQL LIKE syntax):

{
  "path": "/home/user/myrepo",
  "query": "surge pricing calculation",
  "path_pattern": "%pricing%",
  "file_extensions": ["py"],
  "limit": 10
}

Narrow to a specific module:

{
  "path": "/home/user/myrepo",
  "query": "A/B test variant selection",
  "path_pattern": "%smart_engine%",
  "limit": 10
}

By auto-tags:

{
  "path": "/home/user/myrepo",
  "query": "payment processing",
  "tags": ["payment_system", "api_endpoint"],
  "tag_logic": "AND",
  "limit": 10
}

With similarity threshold (filter out weak matches):

{
  "path": "/home/user/myrepo",
  "query": "order state machine transitions",
  "min_similarity": 0.15,
  "limit": 20
}

Search tips

  • No results? Try a broader query. Semantic search matches concepts, not exact keywords.
  • Too many irrelevant results? Add path_pattern to scope the search to a module, or raise min_similarity.
  • Large codebase (50k+ chunks)? Always use path_pattern or file_extensions to reduce noise.
  • path_pattern uses SQL LIKE syntax: %auth% matches any path containing "auth", %modules/api/% matches that subdirectory.
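The LIKE semantics in the last tip can be tried directly; a quick sketch using SQLite's LIKE (same % wildcard behavior as PostgreSQL, though SQLite's LIKE is case-insensitive by default):

```python
import sqlite3

# Demonstrate the SQL LIKE semantics that path_pattern uses, with an
# in-memory SQLite table of illustrative file paths.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (file_path TEXT)")
conn.executemany(
    "INSERT INTO chunks VALUES (?)",
    [("src/auth/middleware.py",), ("modules/api/routes.py",), ("docs/readme.md",)],
)

# %auth% — any path containing "auth"
auth = [r[0] for r in conn.execute(
    "SELECT file_path FROM chunks WHERE file_path LIKE '%auth%'")]

# %modules/api/% — any path under that subdirectory
api = [r[0] for r in conn.execute(
    "SELECT file_path FROM chunks WHERE file_path LIKE '%modules/api/%'")]
```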

Embedding Providers

Provider      Quality     Speed       Requires
local-onnx    High        Fast        ONNX model in models/ dir (recommended)
fastembed     High        Very fast   Nothing — bundled model
ollama        High        Medium      Ollama running locally
openai        Very high   Network     OPENAI_API_KEY
huggingface   High        Network     HUGGINGFACE_API_KEY

Default model: mixedbread-ai/mxbai-embed-large-v1 @ 1024 dimensions.
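At 1024 dimensions, raw vector storage adds up quickly. A back-of-the-envelope estimate, assuming pgvector's 4-byte float elements and ignoring per-row overhead and index size:

```python
# Rough storage estimate for 1024-dimensional float32 embeddings.
dims = 1024
bytes_per_float = 4                        # pgvector stores 4-byte floats
bytes_per_vector = dims * bytes_per_float  # 4096 bytes per chunk
chunks = 500_000
raw_gib = chunks * bytes_per_vector / 2**30
print(f"{bytes_per_vector} B/vector, ~{raw_gib:.2f} GiB for {chunks:,} chunks")
```

So half a million chunks is roughly 2 GiB of raw vectors before index overhead — comfortable for a self-hosted PostgreSQL instance.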

Pass the provider as the first argument to the server binary, e.g.:

rust-mcp-server.exe local-onnx
rust-mcp-server.exe ollama
rust-mcp-server.exe openai

Auto-Tags

During indexing, each chunk is automatically tagged based on its content and path. You can filter searches by tag:

Tag                Detected when
api_endpoint       Flask/FastAPI route decorators, Express handlers
api_layer          Files named api_*, routes_*, views_*
database_model     SQLAlchemy models, ORM classes, schema definitions
database_schema    SQL DDL statements, migration files
authentication     Login, token, JWT, OAuth patterns
business_logic     Core logic files not matching other categories
order_management   Order, booking, ride, dispatch patterns
payment_system     Payment, invoice, billing, transaction patterns
configuration      Config files, settings, environment loading
test_code          Files in tests/, test_* prefix/suffix
documentation      Markdown, docstrings, README files
filetype_{ext}     One tag per file extension (e.g. filetype_py)
module_{dir}       Top-level directory name (e.g. module_src)
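Conceptually this is rule-based tagging over path and content. A hypothetical sketch of a few of the rules above (the server's actual rules and keyword lists may differ):

```python
from pathlib import Path

def auto_tags(path: str, content: str) -> set[str]:
    """Hypothetical rule-based tagger mirroring a subset of the table above."""
    tags = set()
    p = Path(path)
    lower = content.lower()
    # api_endpoint: route-decorator patterns
    if any(k in lower for k in ("@app.route", "@router.", "app.get(")):
        tags.add("api_endpoint")
    # authentication: login/token/JWT/OAuth patterns
    if any(k in lower for k in ("login", "jwt", "oauth", "token")):
        tags.add("authentication")
    # test_code: tests/ directory or test_* filename
    if (p.parts and p.parts[0] == "tests") or p.name.startswith("test_"):
        tags.add("test_code")
    # filetype_{ext} and module_{dir}
    if p.suffix:
        tags.add(f"filetype_{p.suffix.lstrip('.')}")
    if len(p.parts) > 1:
        tags.add(f"module_{p.parts[0]}")
    return tags
```

For example, a file src/auth.py containing a login function would pick up authentication, filetype_py, and module_src.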

Performance

Benchmarked on e5-2680c2 / 32 GB RAM:

  • Indexing: ~12 000 chunks in ~30 seconds (local-onnx, CPU)
  • Search: < 50 ms cosine similarity over 500k+ chunks (IVFFlat index)

Indexing throughput depends heavily on file count and chunk size. A 43k-file Python project (49k chunks) indexes in ~143 seconds.
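As a sanity check, the two figures above imply broadly similar per-chunk throughput:

```python
# Throughput implied by the two benchmarks quoted above.
small = 12_000 / 30    # chunks/s for the ~12 000-chunk run
large = 49_000 / 143   # chunks/s for the 49k-chunk Python project
print(f"{small:.0f} vs {large:.0f} chunks/s")
```

About 400 vs. about 340 chunks per second, so indexing scales roughly linearly in chunk count on this hardware.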


Troubleshooting

Server fails to start — "extension vector does not exist"

# Enable pgvector in your database
psql -d claude_context -c "CREATE EXTENSION IF NOT EXISTS vector;"

Search returns unrelated results

  • Add path_pattern to narrow the search scope
  • Increase min_similarity (try 0.1 or 0.2)
  • Use more specific query terms — describe the behavior, not just names

"Project not found" on reindex

  • Use vector-index_codebase_incremental first to create the project, then vector-reindex_local_onnx for subsequent updates.

Verify pgvector is installed

psql -d claude_context -c "SELECT * FROM pg_extension WHERE extname = 'vector';"

Verify Ollama is running (if using Ollama provider)

curl http://localhost:11434/api/tags

License

MIT

Contributing

PRs welcome. Run cargo clippy -- -D warnings and cargo test before submitting.
