   ___          _        ___                 _
  / __\___   __| | ___  / _ \_ __ __ _ _ __ | |__
 / /  / _ \ / _` |/ _ \/ /_\/ '__/ _` | '_ \| '_ \
/ /__| (_) | (_| |  __/ /_\\| | | (_| | |_) | | | |
\____/\___/ \__,_|\___\____/|_|  \__,_| .__/|_| |_|
                                      |_|

Live, queryable knowledge graph for your codebase.

CodeGraph

Turn any JS/TS/Python codebase into a live, queryable knowledge graph — then give your AI assistant a way to navigate it.

CodeGraph indexes your repository using tree-sitter into an embedded Kuzu graph database with vector embeddings, then exposes a local MCP server that Claude Code, Cursor, and Windsurf can call to answer structural questions about your code.

Zero infrastructure. The graph lives at ~/.codegraph/. No Docker, no external services, no cloud.


What you can ask

Once connected, ask your AI assistant questions like:

  • "What calls useAuth in this repo?"
  • "Show me the full component tree rooted at App."
  • "What's the blast radius of renaming formatPrice?"
  • "Find all symbols semantically similar to 'JWT auth helper'."
  • "What are the transitive dependencies of src/lib/db.ts?"

Behind the scenes, the assistant picks from 10 typed MCP tools that translate to Cypher queries against your indexed graph, so answers about code structure come from the index rather than from the model's guesses.


Install

npm i -g @leanlabsinnov/codegraph

Requires Node.js 20+. Works on macOS, Linux, and Windows.


Quickstart

The fastest way to get started is codegraph run — a single command that handles setup, indexing, and serving:

codegraph run ~/my-project             # setup + index + serve
codegraph run ~/my-project --watch     # …and auto re-index on file changes

It will prompt for an LLM provider and API key if you haven't configured one yet, run a quick self-test, incrementally index the repo, and boot the MCP server.

Manual setup (step by step)

# 1. Pick an LLM provider
codegraph config llm set byo-openai        # also: byo-anthropic, byo-google, local-ollama
export OPENAI_API_KEY=sk-...

# 2. Verify the connection (5-token gen + 1 embedding round-trip)
codegraph config llm test

# 3. Index a repo — parses, extracts symbols/edges, and embeds everything
codegraph index ~/my-project

# 4. Boot the MCP server
codegraph serve
# → MCP server: http://127.0.0.1:3748/mcp
# → Bearer token: see ~/.codegraph/config.json

Then point your AI client at http://127.0.0.1:3748/mcp with the bearer token. See docs/clients.md for copy-paste config snippets for Claude Code, Cursor, and Windsurf.
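
As a rough illustration of what "point your client at the URL with the bearer token" amounts to, the sketch below builds the endpoint URL and an Authorization header. It assumes the token is sent in the standard "Authorization: Bearer <token>" form; the helper name mcpEndpoint and the McpEndpoint shape are hypothetical and not part of CodeGraph. docs/clients.md remains the reference for real client configuration.

```typescript
// Hypothetical helper (not part of CodeGraph): describes where an MCP client
// should connect and which header carries the bearer token.
interface McpEndpoint {
  url: string;
  headers: Record<string, string>;
}

function mcpEndpoint(token: string, host = "127.0.0.1", port = 3748): McpEndpoint {
  const trimmed = token.trim();
  if (trimmed === "") {
    throw new Error("bearer token is empty");
  }
  return {
    url: `http://${host}:${port}/mcp`,
    // Assumption: the server expects the standard Bearer scheme.
    headers: { Authorization: `Bearer ${trimmed}` },
  };
}

// "example-token" is a placeholder, not a real token.
const exampleEndpoint = mcpEndpoint("example-token");
```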


Commands

Command                                 Description
--------------------------------------  ------------------------------------------------------------
codegraph run <path>                    All-in-one: setup, incremental index, serve. Add --watch to auto re-index on changes
codegraph run <path> --watch            Same as above, plus watches for file changes with 2s debounce
codegraph index <path>                  Walk the repo, parse JS/TS/Python, embed every symbol, write to the graph
codegraph index <path> --incremental    Only re-index files that changed since last run
codegraph index <path> --no-embed       Parse only: faster, but semantic search is disabled
codegraph status <path>                 Node/edge counts and embedding coverage for the indexed repo
codegraph wipe [path]                   Delete a repo's graph rows (--yes skips confirmation), or the whole graph dir
codegraph serve [--port N] [--host H]   Boot the MCP server (default port 3748)
codegraph doctor                        Health check: Node version, config, API keys, Kuzu write, LLM round-trip
codegraph config show                   Print the resolved ~/.codegraph/config.json
codegraph config llm set [preset]       Switch LLM preset (interactive picker when no arg)
codegraph config llm test               Round-trip the configured provider (one gen + one embed)
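
To make the 2s debounce used by --watch concrete, here is a minimal sketch of the idea: file changes are collected, and a re-index batch is released only once no change has arrived for the quiet period. This is an illustration under assumptions, not CodeGraph's watcher; the class name ChangeDebouncer and its methods are hypothetical, and time is passed in explicitly so the logic can be followed without real timers.

```typescript
// Hypothetical sketch of a quiet-period debounce for file-change events.
class ChangeDebouncer {
  private pending = new Set<string>();
  private lastChangeAt = 0;
  private quietMs: number;

  constructor(quietMs = 2000) {
    this.quietMs = quietMs;
  }

  // Record a changed path; every new change restarts the quiet period.
  record(path: string, nowMs: number): void {
    this.pending.add(path);
    this.lastChangeAt = nowMs;
  }

  // Return the batch to re-index if the quiet period has elapsed, else [].
  flushIfQuiet(nowMs: number): string[] {
    if (this.pending.size === 0 || nowMs - this.lastChangeAt < this.quietMs) {
      return [];
    }
    const batch = Array.from(this.pending).sort();
    this.pending.clear();
    return batch;
  }
}
```

Saving the same file several times in quick succession therefore produces a single batch containing that path once, rather than one re-index per save.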

MCP Tools

The server exposes 10 tools over SSE on http://127.0.0.1:3748/mcp:

Tool                Description
------------------  ------------------------------------------------------------
search_symbol       Find symbols by name (exact or prefix), with optional kind/path filter
find_file           Locate files by path fragment
search_semantic     Vector similarity search across all embedded symbols
get_file_context    All imports, exports, and defined symbols for a file
find_callers        Who calls a given function or symbol (via CALLS edges)
get_component_tree  Recursive RENDERS descendants from a root component
affected_by         Nodes reachable from a symbol via CALLS/IMPORTS/RENDERS
get_dependencies    Direct and transitive IMPORTS of a file
blast_radius        Reverse-BFS upstream dependent count (CALLS + IMPORTS + RENDERS)
nl_query            Natural language → Cypher via LLM → validated → executed (read-only guard)
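
The reverse BFS behind blast_radius can be pictured with a small in-memory sketch. The real tool runs against the Kuzu graph; the version below only illustrates the traversal, and the Edge shape, the blastRadius function, and the sample symbol names are assumptions made for the example.

```typescript
// Illustrative only: edges point from the dependent to the thing it uses.
type EdgeKind = "CALLS" | "IMPORTS" | "RENDERS";

interface Edge {
  from: string;
  to: string;
  kind: EdgeKind;
}

// Collect every node that directly or transitively depends on `target`.
function blastRadius(edges: Edge[], target: string): Set<string> {
  // Reverse adjacency: node -> nodes that depend on it.
  const dependents = new Map<string, string[]>();
  for (const edge of edges) {
    const list = dependents.get(edge.to) ?? [];
    list.push(edge.from);
    dependents.set(edge.to, list);
  }

  const seen = new Set<string>();
  const queue: string[] = [target];
  while (queue.length > 0) {
    const node = queue.shift();
    if (node === undefined) break;
    for (const dependent of dependents.get(node) ?? []) {
      if (dependent !== target && !seen.has(dependent)) {
        seen.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return seen;
}

const sampleEdges: Edge[] = [
  { from: "Cart", to: "formatPrice", kind: "CALLS" },
  { from: "Checkout", to: "Cart", kind: "RENDERS" },
  { from: "App", to: "Checkout", kind: "RENDERS" },
  { from: "formatPrice", to: "roundCents", kind: "CALLS" },
];
// Upstream dependents of formatPrice: Cart, Checkout, App (roundCents is downstream).
const sampleUpstream = blastRadius(sampleEdges, "formatPrice");
```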

LLM Providers

Preset         Generation model         Embedding model                  Dimensions
-------------  -----------------------  -------------------------------  ----------
byo-openai     gpt-4o-mini              text-embedding-3-small           1536
byo-anthropic  claude-3-5-haiku-latest  text-embedding-3-small (OpenAI)  1536
byo-google     gemini-1.5-flash-latest  text-embedding-004               768
local-ollama   qwen2.5-coder:14b        nomic-embed-text                 768

Switch providers with codegraph config llm set. Switching providers triggers a re-embed: every vector is tagged with provider:model:dimension, so vectors produced by a different model are never silently mixed into search results.
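
The provider:model:dimension tag can be sketched as follows. This is a minimal illustration of the idea, assuming a vector is considered usable only when its tag matches the active embedding configuration; the type names, the function names, and the provider strings ("openai", "google") are hypothetical rather than CodeGraph's actual identifiers.

```typescript
// Hypothetical shapes for illustration.
interface EmbeddingConfig {
  provider: string;
  model: string;
  dimension: number;
}

interface StoredVector {
  symbolId: string;
  tag: string; // "provider:model:dimension"
  values: number[];
}

function embeddingTag(config: EmbeddingConfig): string {
  return `${config.provider}:${config.model}:${config.dimension}`;
}

// Keep only vectors produced by the active configuration, so vectors from
// another model or dimension never take part in similarity search.
function usableVectors(vectors: StoredVector[], active: EmbeddingConfig): StoredVector[] {
  const activeTag = embeddingTag(active);
  return vectors.filter(
    (v) => v.tag === activeTag && v.values.length === active.dimension,
  );
}
```

Comparing vectors of different dimensions or from different models would produce meaningless similarity scores, which is why a mismatch is treated as "needs re-embedding" rather than as a search candidate.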


Architecture

codegraph CLI
     │
     ▼
 ingestion  ──── web-tree-sitter (parse JS/TS/Python)
     │       ──── LLM router (embed all non-File symbols)
     │
     ▼
 Kuzu graph DB  (~/.codegraph/graph)
     │       ──── Symbol nodes  (File, Function, Class, Interface,
     │                           Component, Route, Variable)
     │       ──── Rel tables    (IMPORTS, CALLS, RENDERS,
     │                           INHERITS, DEFINES, EXPORTS)
     │
     ▼
 MCP server  (SSE · http://127.0.0.1:3748/mcp)
     │       ──── 10 MCP tools (typed Cypher + vector search)
     │       ──── in-memory LRU result cache (30 s TTL)
     │       ──── bearer-token auth
     ▼
 Claude Code / Cursor / Windsurf
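
The "in-memory LRU result cache (30 s TTL)" in the diagram could look roughly like the sketch below. It is not the server's implementation: the class name TtlLruCache, the capacity parameter, and the injectable clock are assumptions made so the behaviour (evict the least recently used entry, drop entries older than the TTL) is easy to follow.

```typescript
// Illustrative LRU cache with a per-entry time-to-live.
class TtlLruCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private maxEntries: number;
  private ttlMs: number;
  private now: () => number;

  constructor(maxEntries: number, ttlMs = 30_000, now: () => number = () => Date.now()) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(key); // expired
      return undefined;
    }
    // Map preserves insertion order, so re-inserting marks the entry
    // as most recently used. The TTL is not refreshed on reads.
    this.entries.delete(key);
    this.entries.set(key, hit);
    return hit.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}
```

A short TTL keeps repeated tool calls within one assistant turn cheap while limiting how long results can lag behind a re-index.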

How indexing works

  1. Walk — gitignore-aware file walk, filtered to .ts/.tsx/.js/.jsx/.py
  2. Parse — per-file web-tree-sitter parse with lazy WASM grammar loading
  3. Extract — 5-pass AST extraction per JS/TS file:
    • Declarations → nodes + DEFINES/EXPORTS/INHERITS edges
    • Import statements → IMPORTS edges
    • Call expressions → CALLS edges
    • JSX elements → RENDERS edges
    • Route detection (Express + Next.js App/Pages router)
  4. Resolve — cross-file edge resolution, tsconfig path alias support
  5. Embed — batch of 100 symbols per LLM call, format: "${kind} ${name}\n${signature}\n${leadingComment}"
  6. Write — deleteByRepo() + upsertNodes() + upsertEdges() in Kuzu
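
The Embed step can be sketched in a few lines: each symbol is rendered into the text format quoted in step 5, and the symbols are split into batches of 100 before being sent to the embedding model. The SymbolInfo shape and the function names below are assumptions for illustration; only the string format and the batch size come from the steps above.

```typescript
// Hypothetical symbol shape; field names mirror the format string in step 5.
interface SymbolInfo {
  kind: string;
  name: string;
  signature: string;
  leadingComment: string;
}

// Text sent to the embedding model for one symbol.
function embeddingText(symbol: SymbolInfo): string {
  return `${symbol.kind} ${symbol.name}\n${symbol.signature}\n${symbol.leadingComment}`;
}

// Split items into consecutive batches of at most `size` elements.
function toBatches<T>(items: T[], size = 100): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("batch size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const sampleSymbol: SymbolInfo = {
  kind: "Function",
  name: "formatPrice",
  signature: "formatPrice(cents: number): string",
  leadingComment: "// Formats an integer amount of cents as a price string.",
};
const sampleText = embeddingText(sampleSymbol);
```

With 250 symbols this yields three embedding calls, of 100, 100, and 50 symbols.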

Project Structure

codegraph/
├── packages/
│   ├── cli/          @leanlabsinnov/codegraph — published CLI (bundles all below)
│   ├── ingestion/    @codegraph/ingestion      — tree-sitter parse + embed engine
│   ├── graph-db/     @codegraph/graph-db       — Kuzu embedded DB client
│   ├── mcp-server/   @codegraph/mcp-server     — MCP SSE server + 10 tools
│   ├── llm-router/   @codegraph/llm-router     — multi-provider LLM abstraction
│   └── shared/       @codegraph/shared         — types, schemas, constants
├── docs/
│   └── clients.md    — client setup (Claude Code, Cursor, Windsurf)
├── fixtures/
│   ├── sample-app/   — deterministic Next.js + Express test fixture
│   └── sample-python/
└── scripts/
    ├── smoke-mcp.ts
    └── smoke-tree-sitter.ts

Development

Prerequisites

  • Node.js 20+
  • pnpm 9+

Setup

git clone https://github.com/Cirilcetra/codegraph.git
cd codegraph
pnpm install
cp .env.example .env   # add your API key
pnpm build

Scripts

Script           Description
---------------  ------------------------------------
pnpm build       Build all packages
pnpm dev         Watch-mode build across all packages
pnpm test        Run all tests (vitest)
pnpm test:watch  Watch-mode tests
pnpm typecheck   Type-check all packages
pnpm lint        Biome lint
pnpm format      Biome format (write)
pnpm smoke       Run both smoke tests

Running the MCP server locally

pnpm build
node packages/cli/dist/cli.js serve

Troubleshooting

Run codegraph doctor first. It catches the most common issues: a missing API key, unwriteable storage, or the wrong Node version.

For client-specific issues (token config, SSE connection, Cursor MCP setup), see docs/clients.md.


Roadmap

  • Incremental delta re-indexing (codegraph run --watch / codegraph index --incremental)
  • All-in-one codegraph run command with auto-setup, serve, and file watcher
  • HNSW vector index (blocked on Kuzu upstream fixes #5965 / #6040)
  • Web-based graph visualizer (Phase 4)
  • Managed hosted option

Contributing

PRs welcome. Please run pnpm lint && pnpm typecheck && pnpm test before opening one.


License

MIT — see LICENSE
