CodeGraph
Live, queryable knowledge graph for your codebase.
Turn any JS/TS/Python codebase into a live, queryable knowledge graph — then give your AI assistant a way to navigate it.
CodeGraph parses your repository with tree-sitter and stores the resulting symbols, relationships, and vector embeddings in an embedded Kuzu graph database. It then exposes a local MCP server that Claude Code, Cursor, and Windsurf can call to answer structural questions about your code.
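Zero infrastructure. The graph lives at `~/.codegraph/`. No Docker, no database server, no cloud backend: apart from the LLM provider you configure (which can be a local Ollama instance), everything runs on your machine.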
Once connected, ask your AI assistant questions like:

- "What calls `useAuth` in this repo?"
- "Show me the full component tree rooted at `App`."
- "What's the blast radius of renaming `formatPrice`?"
- "Find all symbols semantically similar to 'JWT auth helper'."
- "What are the transitive dependencies of `src/lib/db.ts`?"
Behind the scenes, the assistant picks from 10 typed MCP tools that translate to Cypher queries (or vector searches) against your indexed graph, so answers about code structure come from the graph itself rather than from the model's guesses.
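To make "typed tools that translate to Cypher" concrete, here is a minimal TypeScript sketch of how a `find_callers`-style tool could turn typed arguments into a fixed, parameterized query. The `FindCallersArgs` shape, the `filePath` property, and the query text are assumptions for illustration, not CodeGraph's actual schema or implementation.

```typescript
// Hypothetical argument shape for a find_callers-style tool.
interface FindCallersArgs {
  name: string; // symbol to look up, e.g. "useAuth"
  limit?: number; // maximum number of rows to return
}

// Query text plus the parameter values bound at execution time.
interface CypherQuery {
  text: string;
  params: Record<string, unknown>;
}

// Build a read-only, parameterized Cypher query. The symbol name is bound
// as $name, so user input never becomes part of the query text.
function findCallersQuery(args: FindCallersArgs): CypherQuery {
  if (args.name.trim() === "") {
    throw new Error("name must be non-empty");
  }
  // Default to 50 rows and clamp to [1, 500] before inlining the integer.
  const requested =
    args.limit !== undefined && Number.isFinite(args.limit) ? Math.trunc(args.limit) : 50;
  const limit = Math.min(Math.max(requested, 1), 500);
  const text =
    "MATCH (caller)-[:CALLS]->(callee) " +
    "WHERE callee.name = $name " +
    "RETURN caller.name AS caller, caller.filePath AS file " +
    `LIMIT ${limit}`;
  return { text, params: { name: args.name } };
}

const q = findCallersQuery({ name: "useAuth", limit: 10 });
console.log(q.text);
console.log(q.params); // { name: 'useAuth' }
```

Keeping the query text fixed and passing only parameters is what makes a typed tool predictable: the model chooses which tool and which arguments, not what Cypher runs.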
Install globally with npm:

```bash
npm i -g @leanlabsinnov/codegraph
```

Requires Node.js 20+. Works on macOS, Linux, and Windows.
The fastest way to get started is `codegraph run`, a single command that handles setup, indexing, and serving:

```bash
codegraph run ~/my-project           # setup + index + serve
codegraph run ~/my-project --watch   # …and auto re-index on file changes
```

It prompts for an LLM provider and API key if you haven't configured one yet, runs a quick self-test, incrementally indexes the repo, and boots the MCP server.
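How incremental indexing decides which files changed isn't documented here. One common technique, shown below as an assumption rather than CodeGraph's actual logic, is to keep a SHA-256 hash per file from the previous run, re-index only files whose hash differs, and delete rows for files that have disappeared. `HashManifest` and `computeDelta` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// path -> SHA-256 hex digest of the file contents at the previous run.
type HashManifest = Map<string, string>;

interface Delta {
  changed: string[]; // new or modified: re-parse and re-embed
  removed: string[]; // deleted since the last run: drop their graph rows
}

function sha256(contents: string): string {
  return createHash("sha256").update(contents).digest("hex");
}

// Diff the current file contents against the previous manifest and return
// both the delta and the manifest to persist for the next run.
function computeDelta(
  previous: HashManifest,
  current: Map<string, string>, // path -> file contents
): { delta: Delta; next: HashManifest } {
  const next: HashManifest = new Map();
  const changed: string[] = [];
  for (const [path, contents] of current) {
    const hash = sha256(contents);
    next.set(path, hash);
    if (previous.get(path) !== hash) changed.push(path);
  }
  const removed = [...previous.keys()].filter((p) => !current.has(p));
  return { delta: { changed, removed }, next };
}

const prev: HashManifest = new Map([
  ["src/a.ts", sha256("export const a = 1;")],
  ["src/b.ts", sha256("export const b = 2;")],
]);
const { delta } = computeDelta(
  prev,
  new Map([
    ["src/a.ts", "export const a = 1;"], // unchanged
    ["src/c.ts", "export const c = 3;"], // new file
  ]),
);
console.log(delta); // { changed: [ 'src/c.ts' ], removed: [ 'src/b.ts' ] }
```

Hashing contents rather than comparing modification times avoids re-indexing files that were touched but not edited (for example, by a branch switch that restores identical content).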
To run each step individually:

```bash
# 1. Pick an LLM provider
codegraph config llm set byo-openai   # also: byo-anthropic, byo-google, local-ollama
export OPENAI_API_KEY=sk-...

# 2. Verify the connection (5-token generation + 1 embedding round-trip)
codegraph config llm test

# 3. Index a repo: parses, extracts symbols/edges, and embeds everything
codegraph index ~/my-project

# 4. Boot the MCP server
codegraph serve
# → MCP server: http://127.0.0.1:3748/mcp
# → Bearer token: see ~/.codegraph/config.json
```

Then point your AI client at `http://127.0.0.1:3748/mcp` with the bearer token. See `docs/clients.md` for copy-paste config snippets for Claude Code, Cursor, and Windsurf.
| Command | Description |
|---|---|
| `codegraph run <path>` | All-in-one: setup, incremental index, serve. Add `--watch` to auto re-index on changes |
| `codegraph run <path> --watch` | Same as above, plus watches for file changes with a 2 s debounce |
| `codegraph index <path>` | Walk the repo, parse JS/TS/Python, embed every symbol, write to the graph |
| `codegraph index <path> --incremental` | Only re-index files that changed since the last run |
| `codegraph index <path> --no-embed` | Parse only: faster, but semantic search is disabled |
| `codegraph status <path>` | Node/edge counts and embedding coverage for the indexed repo |
| `codegraph wipe [path]` | Delete one repo's graph rows, or the whole graph directory (`--yes` skips confirmation) |
| `codegraph serve [--port N] [--host H]` | Boot the MCP server (default port 3748) |
| `codegraph doctor` | Health check: Node version, config, API keys, Kuzu write, LLM round-trip |
| `codegraph config show` | Print the resolved `~/.codegraph/config.json` |
| `codegraph config llm set [preset]` | Switch LLM preset (interactive picker when no argument is given) |
| `codegraph config llm test` | Round-trip the configured provider: one generation + one embedding |
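The 2 s debounce on `--watch` means a burst of saves (say, a branch checkout touching many files) should trigger one re-index instead of one per file event. A minimal trailing-edge debounce that also collects the changed paths could look like this; `makeDebouncedReindex` and its callback are hypothetical, not part of the CLI.

```typescript
// Collect file-change events and fire `reindex` once, `delayMs` after the
// last event in a burst (trailing-edge debounce).
function makeDebouncedReindex(
  reindex: (paths: string[]) => void,
  delayMs = 2000,
): (path: string) => void {
  const pending = new Set<string>();
  let timer: ReturnType<typeof setTimeout> | undefined;

  return (path: string) => {
    pending.add(path);
    if (timer !== undefined) clearTimeout(timer); // restart the quiet period
    timer = setTimeout(() => {
      const batch = [...pending];
      pending.clear();
      timer = undefined;
      reindex(batch);
    }, delayMs);
  };
}

// Three rapid events produce a single re-index of both files, ~2 s later.
const onChange = makeDebouncedReindex((paths) => console.log("re-index", paths));
onChange("src/a.ts");
onChange("src/b.ts");
onChange("src/a.ts");
```

Collecting paths in a `Set` keeps the eventual re-index incremental: it only receives the distinct files touched during the quiet period.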
The server exposes 10 tools over SSE on `http://127.0.0.1:3748/mcp`:

| Tool | Description |
|---|---|
| `search_symbol` | Find symbols by name (exact or prefix match), with optional kind/path filters |
| `find_file` | Locate files by path fragment |
| `search_semantic` | Vector similarity search across all embedded symbols |
| `get_file_context` | All imports, exports, and defined symbols for a file |
| `find_callers` | Who calls a given function or symbol (via `CALLS` edges) |
| `get_component_tree` | Recursive `RENDERS` descendants from a root component |
| `affected_by` | Nodes reachable from a symbol via `CALLS`/`IMPORTS`/`RENDERS` edges |
| `get_dependencies` | Direct and transitive `IMPORTS` of a file |
| `blast_radius` | Count of upstream dependents, via reverse BFS over `CALLS` + `IMPORTS` + `RENDERS` edges |
| `nl_query` | Natural language → Cypher via the LLM, then validated by a read-only guard and executed |
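`blast_radius` is described as a reverse BFS over `CALLS`, `IMPORTS`, and `RENDERS` edges. The in-memory TypeScript sketch below illustrates that traversal on a plain edge list (the real tool queries Kuzu); the `Edge` shape and the example symbols are hypothetical. Edges of other kinds, such as `DEFINES`, are skipped, and the `seen` set stops cycles from looping forever.

```typescript
type EdgeKind = "CALLS" | "IMPORTS" | "RENDERS" | "DEFINES" | "EXPORTS" | "INHERITS";

interface Edge {
  from: string; // dependent (e.g. the caller)
  to: string; // dependency (e.g. the callee)
  kind: EdgeKind;
}

const DEFAULT_KINDS: ReadonlySet<EdgeKind> = new Set<EdgeKind>(["CALLS", "IMPORTS", "RENDERS"]);

// Count every node that transitively depends on `target` by walking edges
// backwards (to -> from), breadth-first, visiting each node once.
function blastRadius(edges: Edge[], target: string, kinds = DEFAULT_KINDS): number {
  const dependents = new Map<string, string[]>();
  for (const e of edges) {
    if (!kinds.has(e.kind)) continue;
    const list = dependents.get(e.to) ?? [];
    list.push(e.from);
    dependents.set(e.to, list);
  }
  const seen = new Set<string>([target]);
  const queue: string[] = [target];
  while (queue.length > 0) {
    const node = queue.shift()!;
    for (const up of dependents.get(node) ?? []) {
      if (!seen.has(up)) {
        seen.add(up);
        queue.push(up);
      }
    }
  }
  return seen.size - 1; // exclude the target itself
}

const edges: Edge[] = [
  { from: "CartTotal", to: "formatPrice", kind: "CALLS" },
  { from: "Cart", to: "CartTotal", kind: "RENDERS" },
  { from: "App", to: "Cart", kind: "RENDERS" },
  { from: "src/checkout.ts", to: "formatPrice", kind: "IMPORTS" },
  { from: "src/utils.ts", to: "formatPrice", kind: "DEFINES" }, // ignored
];
console.log(blastRadius(edges, "formatPrice")); // 4
```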
`codegraph config llm set` accepts four presets:

| Preset | Generation model | Embedding model | Dimensions |
|---|---|---|---|
| `byo-openai` | `gpt-4o-mini` | `text-embedding-3-small` | 1536 |
| `byo-anthropic` | `claude-3-5-haiku-latest` | `text-embedding-3-small` (OpenAI) | 1536 |
| `byo-google` | `gemini-1.5-flash-latest` | `text-embedding-004` | 768 |
| `local-ollama` | `qwen2.5-coder:14b` | `nomic-embed-text` | 768 |
Switch providers with `codegraph config llm set`. Switching providers triggers a re-embed: every vector is tagged with `provider:model:dimension`, so vectors from a different model are never silently mixed into search results.
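A sketch of how a `provider:model:dimension` tag can keep mismatched vectors out of search, assuming each stored vector carries its tag. The `StoredVector` shape, the helper names, and the provider strings in the tags are illustrative; cosine similarity is used as a common choice, not a documented detail of CodeGraph.

```typescript
// A stored embedding, tagged with the model that produced it.
interface StoredVector {
  symbolId: string;
  tag: string; // "provider:model:dimension"
  values: number[];
}

function embeddingTag(provider: string, model: string, dimension: number): string {
  return `${provider}:${model}:${dimension}`;
}

// Cosine similarity; returns 0 if either vector has zero magnitude.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Rank only vectors whose tag matches the active provider/model/dimension,
// so embeddings left over from a previous provider never reach the results.
function search(store: StoredVector[], query: number[], activeTag: string, k = 5) {
  return store
    .filter((v) => v.tag === activeTag && v.values.length === query.length)
    .map((v) => ({ symbolId: v.symbolId, score: cosine(query, v.values) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy 3-dimensional vectors; the real presets use 768 or 1536 dimensions.
const tag = embeddingTag("ollama", "nomic-embed-text", 3);
const store: StoredVector[] = [
  { symbolId: "useAuth", tag, values: [1, 0, 0] },
  { symbolId: "formatPrice", tag, values: [0, 1, 0] },
  { symbolId: "staleAuth", tag: "openai:text-embedding-3-small:3", values: [1, 0, 0] },
];
console.log(search(store, [0.9, 0.1, 0], tag).map((r) => r.symbolId)); // [ 'useAuth', 'formatPrice' ]
```

`staleAuth` has an identical vector to `useAuth` but a different tag, so it is excluded rather than ranked.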
```text
codegraph CLI
      │
      ▼
ingestion ──── web-tree-sitter (parse JS/TS/Python)
      │   ──── LLM router (embed all non-File symbols)
      │
      ▼
Kuzu graph DB  (~/.codegraph/graph)
      │   ──── Symbol nodes (File, Function, Class, Interface,
      │                      Component, Route, Variable)
      │   ──── Rel tables (IMPORTS, CALLS, RENDERS,
      │                    INHERITS, DEFINES, EXPORTS)
      │
      ▼
MCP server  (SSE · http://127.0.0.1:3748/mcp)
      │   ──── 10 MCP tools (typed Cypher + vector search)
      │   ──── in-memory LRU result cache (30 s TTL)
      │   ──── bearer-token auth
      ▼
Claude Code / Cursor / Windsurf
```
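The MCP server keeps an in-memory LRU result cache with a 30 s TTL. One minimal way to combine both policies in TypeScript is to rely on `Map` insertion order for recency and store an expiry time per entry. This `TtlLruCache` class is an illustrative sketch, not the server's code; the capacity is an assumed parameter because the docs don't state one, and the cache-key scheme in the usage line is hypothetical.

```typescript
class TtlLruCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private readonly maxEntries: number,
    private readonly ttlMs = 30_000,
    private readonly now: () => number = Date.now, // injectable clock for tests
  ) {}

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    // Re-insert to mark as most recently used (Map keeps insertion order).
    this.entries.delete(key);
    this.entries.set(key, hit);
    return hit.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    // Evict least recently used entries (front of the Map) over capacity.
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }
}

// Hypothetical key scheme: tool name plus its JSON-serialized arguments.
const cache = new TtlLruCache<string[]>(100);
const key = JSON.stringify(["find_callers", { name: "useAuth" }]);
cache.set(key, ["LoginForm", "Navbar"]);
console.log(cache.get(key)); // [ 'LoginForm', 'Navbar' ]
```

A short TTL suits a graph that can be re-indexed underneath the server: stale answers age out within 30 s without explicit invalidation.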
The ingestion pipeline runs in six stages:

- Walk: gitignore-aware file walk, filtered to `.ts`, `.tsx`, `.js`, `.jsx`, and `.py` files
- Parse: per-file `web-tree-sitter` parse with lazy WASM grammar loading
- Extract: 5-pass AST extraction per JS/TS file:
  - Declarations → nodes + `DEFINES`/`EXPORTS`/`INHERITS` edges
  - Import statements → `IMPORTS` edges
  - Call expressions → `CALLS` edges
  - JSX elements → `RENDERS` edges
  - Route detection (Express + Next.js App/Pages router)
- Resolve: cross-file edge resolution, with tsconfig path alias support
- Embed: batches of 100 symbols per LLM call, each formatted as `"${kind} ${name}\n${signature}\n${leadingComment}"`
- Write: `deleteByRepo()` + `upsertNodes()` + `upsertEdges()` in Kuzu
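The Embed stage can be sketched as follows: format each symbol with the documented template, skip `File` nodes (the architecture notes that only non-File symbols are embedded), and send batches of 100. The `SymbolInfo` shape and the `embedBatch` callback are hypothetical stand-ins for the ingestion types and the LLM router.

```typescript
interface SymbolInfo {
  id: string;
  kind: string; // e.g. "Function", "Component", "File"
  name: string;
  signature: string;
  leadingComment: string;
}

const BATCH_SIZE = 100;

// Text sent to the embedding model for one symbol.
function embeddingText(s: SymbolInfo): string {
  return `${s.kind} ${s.name}\n${s.signature}\n${s.leadingComment}`;
}

// Split an array into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Embed every non-File symbol, one provider call per batch of up to 100.
async function embedSymbols(
  symbols: SymbolInfo[],
  embedBatch: (texts: string[]) => Promise<number[][]>, // hypothetical router call
): Promise<Map<string, number[]>> {
  const vectors = new Map<string, number[]>();
  const embeddable = symbols.filter((s) => s.kind !== "File");
  for (const batch of chunk(embeddable, BATCH_SIZE)) {
    const result = await embedBatch(batch.map(embeddingText));
    batch.forEach((s, i) => {
      const vec = result[i];
      if (!vec) throw new Error(`missing vector for ${s.id}`);
      vectors.set(s.id, vec);
    });
  }
  return vectors;
}
```

Batching keeps the number of provider round-trips proportional to symbols / 100 rather than to the symbol count, which matters for repos with tens of thousands of symbols.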
```text
codegraph/
├── packages/
│   ├── cli/          @leanlabsinnov/codegraph — published CLI (bundles all below)
│   ├── ingestion/    @codegraph/ingestion — tree-sitter parse + embed engine
│   ├── graph-db/     @codegraph/graph-db — Kuzu embedded DB client
│   ├── mcp-server/   @codegraph/mcp-server — MCP SSE server + 10 tools
│   ├── llm-router/   @codegraph/llm-router — multi-provider LLM abstraction
│   └── shared/       @codegraph/shared — types, schemas, constants
├── docs/
│   └── clients.md    — client setup (Claude Code, Cursor, Windsurf)
├── fixtures/
│   ├── sample-app/   — deterministic Next.js + Express test fixture
│   └── sample-python/
└── scripts/
    ├── smoke-mcp.ts
    └── smoke-tree-sitter.ts
```
To build from source, you need:

- Node.js 20+
- pnpm 9+

```bash
git clone https://github.com/Cirilcetra/codegraph.git
cd codegraph
pnpm install
cp .env.example .env   # add your API key
pnpm build
```

| Script | Description |
|---|---|
| `pnpm build` | Build all packages |
| `pnpm dev` | Watch-mode build across all packages |
| `pnpm test` | Run all tests (vitest) |
| `pnpm test:watch` | Watch-mode tests |
| `pnpm typecheck` | Type-check all packages |
| `pnpm lint` | Biome lint |
| `pnpm format` | Biome format (write) |
| `pnpm smoke` | Run both smoke tests |
To run the locally built CLI:

```bash
pnpm build
node packages/cli/dist/cli.js serve
```

If something goes wrong, run `codegraph doctor` first: it catches the most common problems (missing API key, unwritable storage, wrong Node version).

For client-specific issues (token config, SSE connection, Cursor MCP setup), see `docs/clients.md`.
Roadmap:

- [x] Incremental delta re-indexing (`codegraph run --watch` / `codegraph index --incremental`)
- [x] All-in-one `codegraph run` command with auto-setup, serve, and file watcher
- [ ] HNSW vector index (blocked on Kuzu upstream fixes #5965 / #6040)
- [ ] Web-based graph visualizer (Phase 4)
- [ ] Managed hosted option
PRs welcome. Please run `pnpm lint && pnpm typecheck && pnpm test` before opening one.
MIT — see LICENSE