CodeGraph

   ___          _        ___                 _
  / __\___   __| | ___  / _ \_ __ __ _ _ __ | |__
 / /  / _ \ / _` |/ _ \/ /_\/ '__/ _` | '_ \| '_ \
/ /__| (_) | (_| |  __/ /_\\| | | (_| | |_) | | | |
\____/\___/ \__,_|\___\____/|_|  \__,_| .__/|_| |_|
                                      |_|

Live, queryable knowledge graph for your codebase.

Turn any JS/TS/Python codebase into a live, queryable knowledge graph — then give your AI assistant a way to navigate it.

CodeGraph indexes your repository using tree-sitter into an embedded Kuzu graph database with vector embeddings, then exposes a local MCP server that Claude Code, Cursor, and Windsurf can call to answer structural questions about your code.

Zero infrastructure. The graph lives at ~/.codegraph/. No Docker, no external services, no cloud.

What you can ask

Once connected, ask your AI assistant questions like:

"What calls useAuth in this repo?"
"Show me the full component tree rooted at App."
"What's the blast radius of renaming formatPrice?"
"Find all symbols semantically similar to 'JWT auth helper'."
"What are the transitive dependencies of src/lib/db.ts?"

Behind the scenes, the assistant picks from 10 typed MCP tools that translate to Cypher queries against your indexed graph, so answers about code structure come from the graph itself rather than from the model's guesswork.
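
As a rough illustration of what "typed tool to Cypher" can look like, the sketch below maps a find_callers-style input to a parameterized query. Only the CALLS relationship name comes from this README; the Symbol node label, the name property, the limit option, and buildFindCallersQuery itself are assumptions made for the example, not CodeGraph's actual schema or code.

```typescript
// Hypothetical input shape for a find_callers-style tool call.
interface FindCallersInput {
  name: string;
  limit?: number;
}

interface CypherQuery {
  text: string;
  params: { name: string };
}

// Builds a parameterized query: the symbol name travels as $name and is
// never concatenated into the query text. The limit is inlined only after
// being coerced to an integer between 1 and 500.
function buildFindCallersQuery(input: FindCallersInput): CypherQuery {
  const requested =
    input.limit !== undefined && isFinite(input.limit)
      ? Math.floor(input.limit)
      : 50;
  const limit = Math.min(Math.max(requested, 1), 500);
  return {
    text:
      "MATCH (caller:Symbol)-[:CALLS]->(callee:Symbol) " +
      "WHERE callee.name = $name " +
      `RETURN caller.name AS caller LIMIT ${limit}`,
    params: { name: input.name },
  };
}

const query = buildFindCallersQuery({ name: "useAuth" });
console.log(query.text);
console.log(query.params);
```

Because the tool input is typed and the query text is fixed, the model only chooses which tool to call and with which arguments; it does not write the query.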

Install

npm i -g @leanlabsinnov/codegraph

Requires Node.js 20+. Works on macOS, Linux, and Windows.

Quickstart

The fastest way to get started is codegraph run — a single command that handles setup, indexing, and serving:

codegraph run ~/my-project             # setup + index + serve
codegraph run ~/my-project --watch     # …and auto re-index on file changes

It will prompt for an LLM provider and API key if you haven't configured one yet, run a quick self-test, incrementally index the repo, and boot the MCP server.
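
This README does not describe how changed files are detected. One common approach, sketched here purely as an assumption, is to keep a manifest of per-file fingerprints from the previous run and compare it with the current one. The Manifest type and diffManifests function are hypothetical names.

```typescript
// Maps a repo-relative path to a content fingerprint (for example a hash).
type Manifest = Record<string, string>;

interface ManifestDiff {
  changed: string[]; // new or modified files: re-parse and re-embed
  removed: string[]; // deleted files: drop their nodes and edges
}

// Own-property check, so paths such as "constructor" behave correctly.
function hasPath(manifest: Manifest, path: string): boolean {
  return Object.prototype.hasOwnProperty.call(manifest, path);
}

function diffManifests(previous: Manifest, current: Manifest): ManifestDiff {
  const changed = Object.keys(current)
    .filter((p) => !hasPath(previous, p) || previous[p] !== current[p])
    .sort();
  const removed = Object.keys(previous)
    .filter((p) => !hasPath(current, p))
    .sort();
  return { changed, removed };
}

const diff = diffManifests(
  { "src/a.ts": "h1", "src/b.ts": "h2", "tools/c.py": "h3" },
  { "src/a.ts": "h1", "src/b.ts": "h2-edited", "src/d.tsx": "h4" }
);
console.log(diff.changed); // [ 'src/b.ts', 'src/d.tsx' ]
console.log(diff.removed); // [ 'tools/c.py' ]
```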

Manual setup (step by step)

# 1. Pick an LLM provider
codegraph config llm set byo-openai        # also: byo-anthropic, byo-google, local-ollama
export OPENAI_API_KEY=sk-...

# 2. Verify the connection (5-token gen + 1 embedding round-trip)
codegraph config llm test

# 3. Index a repo — parses, extracts symbols/edges, and embeds everything
codegraph index ~/my-project

# 4. Boot the MCP server
codegraph serve
# → MCP server: http://127.0.0.1:3748/mcp
# → Bearer token: see ~/.codegraph/config.json

Then point your AI client at http://127.0.0.1:3748/mcp with the bearer token. See docs/clients.md for copy-paste config snippets for Claude Code, Cursor, and Windsurf.

Commands

Command	Description
`codegraph run <path>`	All-in-one: setup, incremental index, serve. Add `--watch` to auto re-index on changes
`codegraph run <path> --watch`	Same as above, plus watches for file changes with 2s debounce
`codegraph index <path>`	Walk the repo, parse JS/TS/Python, embed every symbol, write to the graph
`codegraph index <path> --incremental`	Only re-index files that changed since last run
`codegraph index <path> --no-embed`	Parse only — faster, semantic search disabled
`codegraph status <path>`	Node/edge counts and embedding coverage for the indexed repo
`codegraph wipe [path]`	Delete a repo's graph rows (`--yes` skips confirmation), or the whole graph dir
`codegraph serve [--port N] [--host H]`	Boot the MCP server (default port 3748)
`codegraph doctor`	Health check: Node version, config, API keys, Kuzu write, LLM round-trip
`codegraph config show`	Print the resolved `~/.codegraph/config.json`
`codegraph config llm set [preset]`	Switch LLM preset (interactive picker when no arg)
`codegraph config llm test`	Round-trip the configured provider — one gen + one embed
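
To make the 2s debounce on --watch concrete, here is a minimal sketch of the idea: collect change events and release them as one batch only after the file system has been quiet for the debounce window. ChangeBatcher is a hypothetical name, and the clock is passed in explicitly to keep the example deterministic; a real watcher would presumably drive this from a timer.

```typescript
// Collects changed paths and releases them as a single batch once no new
// event has arrived for `quietMs` milliseconds.
class ChangeBatcher {
  private pending: string[] = [];
  private lastEventAt = 0;
  private readonly quietMs: number;

  constructor(quietMs: number = 2000) {
    this.quietMs = quietMs;
  }

  // Record a changed path observed at time `now` (in milliseconds).
  record(path: string, now: number): void {
    if (this.pending.indexOf(path) === -1) this.pending.push(path);
    this.lastEventAt = now;
  }

  // Returns the batch to re-index, or an empty array while events are
  // still arriving inside the quiet window.
  flush(now: number): string[] {
    if (this.pending.length === 0) return [];
    if (now - this.lastEventAt < this.quietMs) return [];
    const batch = this.pending;
    this.pending = [];
    return batch;
  }
}

const batcher = new ChangeBatcher(2000);
batcher.record("src/a.ts", 0);
batcher.record("src/b.ts", 1500);
console.log(batcher.flush(3000)); // [] (only 1500 ms since the last event)
console.log(batcher.flush(3500)); // [ 'src/a.ts', 'src/b.ts' ]
```

Batching this way means a save-all in an editor or a branch switch triggers one re-index instead of one per file.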

MCP Tools

The server exposes 10 tools over SSE on http://127.0.0.1:3748/mcp:

Tool	Description
`search_symbol`	Find symbols by name — exact, prefix, optional kind/path filter
`find_file`	Locate files by path fragment
`search_semantic`	Vector similarity search across all embedded symbols
`get_file_context`	All imports, exports, and defined symbols for a file
`find_callers`	Who calls a given function or symbol (via `CALLS` edges)
`get_component_tree`	Recursive `RENDERS` descendants from a root component
`affected_by`	Nodes reachable from a symbol via `CALLS`/`IMPORTS`/`RENDERS`
`get_dependencies`	Direct and transitive `IMPORTS` of a file
`blast_radius`	Reverse-BFS upstream dependent count (`CALLS + IMPORTS + RENDERS`)
`nl_query`	Natural language → Cypher via LLM → validated → executed (read-only guard)
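
The reverse-BFS idea behind blast_radius can be sketched in memory as follows. This is an illustration of the technique under assumed data shapes, not the server's implementation (which, per the description above, runs against the graph database). The Edge interface and blastRadius function are hypothetical.

```typescript
interface Edge {
  from: string; // the dependent: caller, importer, or parent component
  to: string; // the thing being depended on
  kind: "CALLS" | "IMPORTS" | "RENDERS";
}

// Returns every node that depends on `target`, directly or transitively,
// by walking edges backwards in breadth-first order.
function blastRadius(edges: Edge[], target: string): string[] {
  // Reverse adjacency: node -> nodes that depend on it directly.
  const dependents = new Map<string, string[]>();
  edges.forEach((edge) => {
    const list = dependents.get(edge.to) ?? [];
    list.push(edge.from);
    dependents.set(edge.to, list);
  });

  const seen = new Set<string>([target]);
  const queue: string[] = [target];
  const affected: string[] = [];
  while (queue.length > 0) {
    const node = queue.shift() as string;
    (dependents.get(node) ?? []).forEach((dependent) => {
      if (!seen.has(dependent)) {
        seen.add(dependent); // guards against cycles
        affected.push(dependent);
        queue.push(dependent);
      }
    });
  }
  return affected;
}

const edges: Edge[] = [
  { from: "Cart", to: "formatPrice", kind: "CALLS" },
  { from: "Checkout", to: "Cart", kind: "RENDERS" },
  { from: "App", to: "Checkout", kind: "RENDERS" },
  { from: "formatPrice", to: "roundCents", kind: "CALLS" },
];
const upstream = blastRadius(edges, "formatPrice");
console.log(upstream, upstream.length); // [ 'Cart', 'Checkout', 'App' ] 3
```

Note that roundCents is not in the result: it is something formatPrice depends on, not something that depends on formatPrice.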

LLM Providers

Preset	Generation model	Embedding model	Dimensions
`byo-openai`	`gpt-4o-mini`	`text-embedding-3-small`	1536
`byo-anthropic`	`claude-3-5-haiku-latest`	`text-embedding-3-small` (OpenAI)	1536
`byo-google`	`gemini-1.5-flash-latest`	`text-embedding-004`	768
`local-ollama`	`qwen2.5-coder:14b`	`nomic-embed-text`	768

Switch providers with codegraph config llm set. Switching providers triggers a re-embed: every vector is tagged with provider:model:dimension, so mismatched vectors never silently pollute search results.
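
A minimal sketch of how a provider:model:dimension tag can keep stale vectors out of search, assuming the tag is stored next to each vector. The interfaces, the function names, and the provider strings ("openai", "google") are illustrative assumptions; only the tag layout comes from the text above.

```typescript
interface EmbeddingConfig {
  provider: string;
  model: string;
  dimension: number;
}

interface StoredVector {
  symbolId: string;
  tag: string; // "provider:model:dimension"
  values: number[];
}

function embeddingTag(config: EmbeddingConfig): string {
  return `${config.provider}:${config.model}:${config.dimension}`;
}

// Keeps only vectors produced by the currently active configuration, so
// vectors from a previous provider are ignored rather than compared.
function searchableVectors(
  vectors: StoredVector[],
  active: EmbeddingConfig
): StoredVector[] {
  const tag = embeddingTag(active);
  return vectors.filter((vector) => vector.tag === tag);
}

const active: EmbeddingConfig = {
  provider: "openai",
  model: "text-embedding-3-small",
  dimension: 1536,
};
console.log(embeddingTag(active)); // openai:text-embedding-3-small:1536
```

Comparing whole tags, rather than parsing them, also avoids trouble with model names that themselves contain a colon.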

Architecture

codegraph CLI
     │
     ▼
 ingestion  ──── web-tree-sitter (parse JS/TS/Python)
     │       ──── LLM router (embed all non-File symbols)
     │
     ▼
 Kuzu graph DB  (~/.codegraph/graph)
     │       ──── Symbol nodes  (File, Function, Class, Interface,
     │                           Component, Route, Variable)
     │       ──── Rel tables    (IMPORTS, CALLS, RENDERS,
     │                           INHERITS, DEFINES, EXPORTS)
     │
     ▼
 MCP server  (SSE · http://127.0.0.1:3748/mcp)
     │       ──── 10 MCP tools (typed Cypher + vector search)
     │       ──── in-memory LRU result cache (30 s TTL)
     │       ──── bearer-token auth
     ▼
 Claude Code / Cursor / Windsurf
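
For readers unfamiliar with the "in-memory LRU result cache (30 s TTL)" box in the diagram, the sketch below shows one simple way to combine least-recently-used eviction with a time-to-live. TtlLruCache is a hypothetical class, and its capacity and key format are assumptions; this is not the server's code.

```typescript
// LRU cache with per-entry expiry. Relies on Map preserving insertion
// order: the first key is always the least recently used one.
// Assumes maxEntries is at least 1.
class TtlLruCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;

  constructor(maxEntries: number, ttlMs: number = 30000) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
  }

  get(key: string, now: number): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (now >= entry.expiresAt) {
      this.entries.delete(key); // expired: treat as a miss
      return undefined;
    }
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V, now: number): void {
    this.entries.delete(key);
    if (this.entries.size >= this.maxEntries) {
      const oldest = this.entries.keys().next();
      if (!oldest.done) this.entries.delete(oldest.value);
    }
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}

const cache = new TtlLruCache<string[]>(100);
cache.set("find_callers:useAuth", ["LoginForm", "Navbar"], 0);
console.log(cache.get("find_callers:useAuth", 10000)); // [ 'LoginForm', 'Navbar' ]
console.log(cache.get("find_callers:useAuth", 30000)); // undefined (expired)
```

A short TTL keeps repeated tool calls within one conversation cheap while limiting how long results can lag behind a re-index.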

How indexing works

1. Walk: gitignore-aware file walk, filtered to .ts/.tsx/.js/.jsx/.py
2. Parse: per-file web-tree-sitter parse with lazy WASM grammar loading
3. Extract: 5-pass AST extraction per JS/TS file:
   - Declarations → nodes + DEFINES/EXPORTS/INHERITS edges
   - Import statements → IMPORTS edges
   - Call expressions → CALLS edges
   - JSX elements → RENDERS edges
   - Route detection (Express + Next.js App/Pages router)
4. Resolve: cross-file edge resolution, tsconfig path alias support
5. Embed: batches of 100 symbols per LLM call, each formatted as "${kind} ${name}\n${signature}\n${leadingComment}"
6. Write: deleteByRepo() + upsertNodes() + upsertEdges() in Kuzu
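
The Embed step can be sketched as two small helpers: one that builds the text sent to the embedding model using the format string quoted in that step, and one that splits symbols into batches of 100. The SymbolInfo interface and the function names are assumptions for illustration, and the actual LLM call is left out.

```typescript
interface SymbolInfo {
  kind: string;
  name: string;
  signature: string;
  leadingComment: string;
}

// Mirrors the "${kind} ${name}\n${signature}\n${leadingComment}" format.
function embedText(symbol: SymbolInfo): string {
  return `${symbol.kind} ${symbol.name}\n${symbol.signature}\n${symbol.leadingComment}`;
}

// Splits items into consecutive batches of at most `size` elements.
function toBatches<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const symbols: SymbolInfo[] = [];
for (let i = 0; i < 250; i++) {
  symbols.push({
    kind: "function",
    name: `helper${i}`,
    signature: "(input: string) => string",
    leadingComment: "Example symbol.",
  });
}
const batches = toBatches(symbols.map(embedText), 100);
console.log(batches.map((batch) => batch.length)); // [ 100, 100, 50 ]
```

With 250 symbols this produces three embedding calls instead of 250.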

Project Structure

codegraph/
├── packages/
│   ├── cli/          @leanlabsinnov/codegraph — published CLI (bundles all below)
│   ├── ingestion/    @codegraph/ingestion      — tree-sitter parse + embed engine
│   ├── graph-db/     @codegraph/graph-db       — Kuzu embedded DB client
│   ├── mcp-server/   @codegraph/mcp-server     — MCP SSE server + 10 tools
│   ├── llm-router/   @codegraph/llm-router     — multi-provider LLM abstraction
│   └── shared/       @codegraph/shared         — types, schemas, constants
├── docs/
│   └── clients.md    — client setup (Claude Code, Cursor, Windsurf)
├── fixtures/
│   ├── sample-app/   — deterministic Next.js + Express test fixture
│   └── sample-python/
└── scripts/
    ├── smoke-mcp.ts
    └── smoke-tree-sitter.ts

Development

Prerequisites

Node.js 20+
pnpm 9+

Setup

git clone https://github.com/Cirilcetra/codegraph.git
cd codegraph
pnpm install
cp .env.example .env   # add your API key
pnpm build

Scripts

Script	Description
`pnpm build`	Build all packages
`pnpm dev`	Watch-mode build across all packages
`pnpm test`	Run all tests (vitest)
`pnpm test:watch`	Watch-mode tests
`pnpm typecheck`	Type-check all packages
`pnpm lint`	Biome lint
`pnpm format`	Biome format (write)
`pnpm smoke`	Run both smoke tests

Running the MCP server locally

pnpm build
node packages/cli/dist/cli.js serve

Troubleshooting

Run codegraph doctor first. It catches the most common problems: a missing API key, unwritable storage, or the wrong Node version.

For client-specific issues (token config, SSE connection, Cursor MCP setup), see docs/clients.md.

Roadmap

Incremental delta re-indexing (codegraph run --watch / codegraph index --incremental)
All-in-one codegraph run command with auto-setup, serve, and file watcher
HNSW vector index (blocked on Kuzu upstream fixes #5965 / #6040)
Web-based graph visualizer (Phase 4)
Managed hosted option

Contributing

PRs welcome. Please run pnpm lint && pnpm typecheck && pnpm test before opening one.

License

MIT — see LICENSE
