AustinSchoen/codebase-intel

Codebase Intelligence

An MCP server that indexes your codebases and gives AI agents deep understanding of your code. It parses source files with Tree-sitter, generates embeddings with Voyage AI, and stores everything in Qdrant + PostgreSQL for fast hybrid search. Connect it to Claude Code (or any MCP client) and your agent can search code semantically, look up symbols, trace references, explore class hierarchies, and get architectural summaries.

Features

  • 10 language parsers — Go, Python, TypeScript, JavaScript, Rust, C, C++, Kotlin, Swift, Dart
  • Hybrid search — dense (Voyage AI) + sparse vectors via Qdrant
  • Symbol intelligence — fuzzy lookup, cross-references, class hierarchies
  • Architectural summaries — module and subsystem explanations (via Claude)
  • Unreal Engine aware — parses UCLASS / UPROPERTY / UFUNCTION reflection macros and .Build.cs module graphs
  • Incremental indexing — only re-processes changed files
  • Multi-codebase — one server serves many indexed projects
  • Daemon mode — file watcher with auto-reindex on changes
  • HTTP + stdio transports — works as a local MCP or shared team server

Quick Start

# 1. Clone
git clone https://github.com/AustinSchoen/codebase-intel.git
cd codebase-intel

# 2. Run setup (starts Qdrant, Postgres, and the MCP server)
./setup.sh

# 3. Create a codebase config
cp configs/example-codebase.yaml configs/my-project.yaml
# Edit: set path, name, and languages

# 4. Index
./scripts/index.sh configs/my-project.yaml

# 5. Wire it to Claude Code (setup.sh prints this command with the token filled in)
claude mcp add codebase-intel \
  --transport http \
  --url http://localhost:8090/mcp \
  --header "Authorization: Bearer <MCP_API_KEY from setup.sh output>"

Setup prompts for your Voyage AI API key and generates everything else. Your Claude Code session now has search_code, get_symbol, get_references, and the rest of the MCP tools below.

Indexing from multiple machines. To run the server on one host and have other machines (laptops, dev VMs) push their codebases to it, see docs/DEPLOYMENT.md. Each machine needs only git clone + ./setup-indexer.sh; the indexer auto-discovers the server on the LAN via mDNS.

MCP Tools

Tool                   Description
list_codebases         List all indexed codebases with stats
search_code            Hybrid semantic + keyword code search
get_symbol             Fuzzy symbol lookup (functions, classes, structs)
list_modules           List top-level modules with file counts
get_references         Find callers and references to a symbol
get_class_hierarchy    Traverse inheritance trees
get_file_context       File symbols, dependents, and content
get_module_summary     Architectural summary of a module
explain_subsystem      AI-generated explanation of a subsystem
generate_claude_md     Generate a CLAUDE.md for a codebase
reindex                Trigger incremental or full reindex
reindex_status         Check reindex progress
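
Clients that talk to the HTTP endpoint directly, rather than through Claude Code, invoke a tool with a JSON-RPC 2.0 tools/call request, as defined by the MCP specification. The Python sketch below builds such a request for search_code and attaches the bearer token. The argument names (query, codebase) are assumptions for illustration and are not taken from this project; the schema returned by tools/list is the authoritative source. An MCP session normally begins with an initialize request, and whether this server accepts a bare tools/call is not stated here.

```python
import json
import urllib.request

MCP_URL = "http://localhost:8090/mcp"  # endpoint shown in Quick Start


def build_tool_call(name, arguments, request_id=1):
    """Return the JSON-RPC 2.0 body for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def build_request(body, api_key, url=MCP_URL):
    """Wrap a JSON-RPC body in an HTTP POST that carries the bearer token."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # MCP's HTTP transport may answer with JSON or an event stream.
            "Accept": "application/json, text/event-stream",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def call_tool(name, arguments, api_key, url=MCP_URL):
    """Send the request and return the raw response text (JSON or SSE)."""
    req = build_request(build_tool_call(name, arguments), api_key, url)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

With a running server, call_tool("search_code", {"query": "retry logic", "codebase": "my-project"}, api_key) would send the request using the MCP_API_KEY value printed by setup.sh; the argument names remain hypothetical.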

Architecture

The indexer is a thin HTTP client: it walks the codebase, reads files, and ships content to the server. The server owns the indexing pipeline (parse → chunk → embed → store) and the credentials for Voyage, Qdrant, and Postgres. Remote indexer hosts only need the MCP server URL and a bearer token — they never touch the backends directly.

┌─────────────────────────────────────────────────────────┐
│                  Claude Code / MCP Client                │
└──────────────────────────┬──────────────────────────────┘
                           │ HTTP :8090
┌──────────────────────────▼──────────────────────────────┐
│                     MCP Server (Go)                      │
│  tools/list · tools/call · /health · /ready · /mcp/...   │
│  Pipeline: parse → chunk → embed (Voyage) → store        │
└─────┬──────────────────────────────────────┬────────────┘
      │                                      │
┌─────▼─────────┐                   ┌────────▼────────────┐
│  Qdrant       │                   │  PostgreSQL 17      │
│  (vectors)    │                   │  (symbols, refs,    │
│               │                   │   summaries)        │
└───────────────┘                   └─────────────────────┘

                  ▲ POST /mcp/indexer/files
                  │   (file content, JSON-encoded)
                  │
┌─────────────────┴───────────────────────────────────────┐
│           Indexer Daemon (thin HTTP client)              │
│  Walks codebase · fsnotify watcher · base64 + POST       │
│  Modes: one-shot, daemon (file watch + auto-reindex)     │
└─────────────────────────────────────────────────────────┘
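
As a rough illustration of the indexer side of this diagram, the Python sketch below walks a codebase, skips excluded globs, and wraps each file's bytes in a base64-encoded, JSON-serialisable payload. The actual indexer is a separate binary, and the request schema for POST /mcp/indexer/files is not documented here, so the field names (codebase, path, content_base64) and the fnmatch-style glob matching are assumptions.

```python
import base64
import fnmatch
import os


def iter_files(root, exclude_patterns=()):
    """Yield file paths relative to root, skipping any that match an excluded glob."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            rel = os.path.relpath(os.path.join(dirpath, filename), root)
            rel = rel.replace(os.sep, "/")  # normalise separators for matching
            if any(fnmatch.fnmatch(rel, pattern) for pattern in exclude_patterns):
                continue
            yield rel


def build_file_payload(codebase, root, rel_path):
    """Read one file and return a payload entry with base64-encoded content."""
    with open(os.path.join(root, rel_path), "rb") as fh:
        content = fh.read()
    return {
        "codebase": codebase,  # hypothetical field name
        "path": rel_path,      # hypothetical field name
        "content_base64": base64.b64encode(content).decode("ascii"),  # hypothetical
    }
```

Base64 keeps arbitrary file bytes safe inside a JSON string, which matches the "base64 + POST" step in the diagram; the server can then decode and run the parse, chunk, embed, and store stages without the client holding any backend credentials.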

Configuration

Server config (configs/server.yaml)

Generated by setup.sh. Configures connections to Qdrant, Postgres, and optional features. Does not contain a codebase section — the server is multi-codebase.

Codebase config (configs/my-project.yaml)

One per codebase. See configs/example-codebase.yaml for all options. Key fields:

Field                        Description
codebase.path                Absolute path to the codebase root
codebase.name                Unique identifier used in MCP tool calls
codebase.languages           Languages to parse
codebase.exclude_patterns    Glob patterns to skip
indexing.incremental         Skip unchanged files (default: true)
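
Incremental indexing relies on content hashes to decide which files to skip. A minimal Python sketch of that idea follows; the choice of SHA-256 and the shape of the stored hash map are assumptions, since the project's actual hashing scheme is not described here.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hex digest identifying a file's content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def plan_incremental(current, known_hashes):
    """Compare current file contents against hashes from the previous run.

    current:      dict of path -> file bytes as they exist now
    known_hashes: dict of path -> hex digest recorded at the last index
    Returns (to_index, to_remove), each a sorted list of paths.
    """
    to_index = sorted(
        path for path, data in current.items()
        if known_hashes.get(path) != content_hash(data)
    )
    to_remove = sorted(path for path in known_hashes if path not in current)
    return to_index, to_remove
```

A full reindex corresponds to calling plan_incremental with an empty known_hashes map, so every file is treated as changed.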

Environment variables

Variable          Purpose
VOYAGE_API_KEY    Voyage AI API key for embeddings
QDRANT_API_KEY    Qdrant auth (auto-generated)
CI_PG_USER        Postgres username (default: codebase_intel)
CI_PG_PASSWORD    Postgres password (auto-generated)
MCP_API_KEY       Bearer token for MCP HTTP API (auto-generated)

Troubleshooting

Server won't start

docker compose logs server    # check server logs
docker compose logs postgres  # check postgres logs
docker compose logs qdrant    # check qdrant logs

The server fails fast on startup if Postgres is unreachable — letting it run with a broken metadata store would silently degrade the MCP tools that depend on it. Docker's restart policy handles transient Postgres unavailability.

Health vs readiness

  • GET /health — liveness probe, always 200 while the HTTP server is up
  • GET /ready — readiness probe, 200 only if Postgres and Qdrant are both reachable right now; 503 with a per-backend status object otherwise

curl -s http://localhost:8090/ready
# {"ready":true,"backends":{"postgres":{"ok":true},"qdrant":{"ok":true}}}
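
A deploy script or monitor can turn that response into a list of unhealthy backends. The Python sketch below assumes the 503 body has the same shape as the 200 example above, with "ok" set to false for an unreachable backend; the exact 503 payload is not shown in this README.

```python
import json


def failing_backends(status_code, body):
    """Return the sorted names of backends that do not report ok in a /ready response."""
    report = json.loads(body)
    backends = report.get("backends", {})
    failing = sorted(name for name, state in backends.items() if not state.get("ok"))
    if status_code != 200 and not failing:
        # Not ready, but the body names no culprit: flag it rather than report healthy.
        failing = ["unknown"]
    return failing
```

An empty list means the server is ready to serve tool calls; anything else names the backend logs worth checking first.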

Indexer can't connect to Postgres/Qdrant

The indexer runs on the host and connects via localhost. Make sure the codebase config uses localhost (not postgres/qdrant) for store URLs.

Re-run migrations

source .env && ./bin/codebase-intel-indexer -config configs/my-project.yaml -migrate

Full reindex (ignore content hashes)

./scripts/index.sh configs/my-project.yaml --reindex

Reset everything

docker compose down -v   # removes volumes (all data)
./setup.sh               # fresh start

Contributing

See CONTRIBUTING.md for development setup, project layout, and conventions. Bug reports and PRs welcome.

License

MIT © Austin Schoen
