R-Dson/pi-codebase

codebase-memory — pi-coding-agent extension

A minimal port of codebase-memory-mcp as a pi-coding-agent extension.

Instead of a Go binary + tree-sitter + SQLite, the extension runs entirely inside the Node.js process that already hosts pi:

| MCP component | Extension equivalent |
| --- | --- |
| Go binary + CGO | Node.js built-ins, zero native deps |
| tree-sitter AST | Per-language regex, line-by-line with quick-filter |
| Content-hash incremental index | MD5 per file, async stat batch, same skip-if-unchanged logic |
| SQLite WAL database | .pi-codebase.bin next to your project (v8 serialized) |
| 11 MCP tools via stdio | 5 pi tools + 1 slash-command |

Installation

# Install directly from GitHub:
pi install git:github.com/R-Dson/pi-codebase

# Or via npm:
pi install npm:pi-codebase-memory

Tools

codebase_index

Full scan — walks the project, extracts symbols, writes .pi-codebase.bin.

codebase_index()
codebase_index({ root_path: "/my/app" })

Supported languages: TypeScript · JavaScript · Python · Go · Rust · Java · C# · PHP · C · C++ · Ruby · Swift · Kotlin · Shell · Perl · Dart · Lua · Scala · R

Ignored directories: node_modules, .git, dist, build, .next, __pycache__, target, .cache, vendor, .venv, venv, coverage, .nyc_output, out


codebase_update (incremental)

Re-parses only files whose MD5 content hash has changed since the last index run. Unchanged files are reused verbatim — identical to the MCP's incremental reindex strategy. Falls back to a full scan when no prior index exists.

codebase_update()                        # check everything since last run
codebase_update({ root_path: "/my/app" })

Output tells you how many files were +added, -removed, or ~changed.
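
The hash-diff step described above can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: the `HashMap` shape and the names `md5`, `diffHashes`, and `summarize` are hypothetical, and only `createHash` from Node's `crypto` module is a real API.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape: relative file path -> MD5 hex digest of its content.
type HashMap = Map<string, string>;

interface IndexDiff {
  added: string[];
  removed: string[];
  changed: string[];
  unchanged: string[];
}

// MD5 of a file's content, hex-encoded.
function md5(content: string | Uint8Array): string {
  return createHash("md5").update(content).digest("hex");
}

// Compare the hashes stored by the previous run with freshly computed ones.
function diffHashes(prev: HashMap, next: HashMap): IndexDiff {
  const diff: IndexDiff = { added: [], removed: [], changed: [], unchanged: [] };
  next.forEach((hash, file) => {
    const old = prev.get(file);
    if (old === undefined) diff.added.push(file);
    else if (old !== hash) diff.changed.push(file);
    else diff.unchanged.push(file); // symbols can be reused without re-parsing
  });
  prev.forEach((_hash, file) => {
    if (!next.has(file)) diff.removed.push(file);
  });
  return diff;
}

// Summary in the +added / -removed / ~changed style mentioned above.
function summarize(d: IndexDiff): string {
  return `+${d.added.length} -${d.removed.length} ~${d.changed.length}`;
}
```

Only the files in `added` and `changed` would need to be re-parsed; entries in `unchanged` keep their previously extracted symbols.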


codebase_search

Query the in-memory index — much faster than grep for structural questions. Equivalent to search_graph in the MCP.

codebase_search({ query: "Handler" })                     # name regex
codebase_search({ kind: "class" })                        # by kind
codebase_search({ query: "process", file_pattern: "api" })
codebase_search({ kind: "function", limit: 100 })

Supported kinds: function · method · class · interface · type · variable · struct · enum · trait · module · route · http_call · macro · protocol · extension · object
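
To illustrate how the parameters combine, here is a rough sketch of an in-memory filter. The `SymbolEntry` record, the `searchSymbols` name, the default limit of 50, case-sensitive name matching, and substring matching for `file_pattern` are all assumptions made for the example; the extension's real behaviour may differ.

```typescript
// Hypothetical symbol record; the real index shape may differ.
interface SymbolEntry {
  name: string;
  kind: string;
  file: string;
  line: number;
}

interface SearchParams {
  query?: string;        // regex matched against the symbol name
  kind?: string;         // exact kind, e.g. "class"
  file_pattern?: string; // treated here as a plain substring of the path (assumption)
  limit?: number;        // default of 50 is an assumption
}

function searchSymbols(index: SymbolEntry[], p: SearchParams): SymbolEntry[] {
  const nameRe = p.query ? new RegExp(p.query) : null;
  const limit = p.limit ?? 50;
  const out: SymbolEntry[] = [];
  for (const s of index) {
    if (out.length >= limit) break;
    // Cheap equality and substring checks run before the regex.
    if (p.kind && s.kind !== p.kind) continue;
    if (p.file_pattern && !s.file.includes(p.file_pattern)) continue;
    if (nameRe && !nameRe.test(s.name)) continue;
    out.push(s);
  }
  return out;
}
```

All filters are ANDed, so `{ query: "process", file_pattern: "api" }` returns only symbols whose name matches `process` and whose path contains `api`.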


codebase_refs

Find every usage of a symbol across the project. Equivalent to trace_call_path(direction="inbound") in the MCP.

Search back-end priority:

  1. ripgrep (rg): if installed; fastest, cross-platform including native Windows
  2. grep: Unix (Linux / macOS / WSL)
  3. Pure Node.js: always available; slower on large trees but works everywhere

codebase_refs({ symbol: "processOrder" })
codebase_refs({ symbol: "UserService", file_pattern: "*.ts" })
codebase_refs({ symbol: "main", limit: 200 })
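
The back-end priority above could be expressed as a small selection function. This is a sketch under assumptions: `pickBackend` and `commandExists` are hypothetical names, and probing with `--version` is just one plausible way to detect a binary, not necessarily what the extension does.

```typescript
import { spawnSync } from "node:child_process";

type Backend = "rg" | "grep" | "node";

// Probe for a binary by running `<cmd> --version` (assumed detection strategy).
function commandExists(cmd: string): boolean {
  const result = spawnSync(cmd, ["--version"], { stdio: "ignore" });
  return result.error === undefined && result.status === 0;
}

// Priority: ripgrep first, then grep on Unix-like systems, then pure Node.js.
function pickBackend(
  platform: string,
  hasCommand: (cmd: string) => boolean = commandExists,
): Backend {
  if (hasCommand("rg")) return "rg";
  if (platform !== "win32" && hasCommand("grep")) return "grep";
  return "node";
}
```

Calling `pickBackend(process.platform)` would probe the current machine; passing a custom `hasCommand` keeps the decision logic testable without spawning processes.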

codebase_schema

High-level overview: file counts per language, symbol counts per kind, index age, root directory listing. Equivalent to get_graph_schema.

codebase_schema()

The output also reports the platform and which search back-ends are active (find, grep, rg), so you know exactly what the extension is using.


Command

/codebase   →  index status (root, file/symbol count, age, platform info)

What gets extracted

| Language | Kinds |
| --- | --- |
| TypeScript / TSX | function, arrow function, class, interface, type, enum, method, route, http_call |
| JavaScript / JSX | function, arrow function, class, method, route, http_call |
| Python | function, method, class, route |
| Go | function, method, struct, interface, type |
| Rust | function, struct, enum, trait, type, module |
| Java | class, interface, enum, method, route |
| C# | class, interface, enum, struct, function, route |
| PHP | function, class, interface, route |
| C | function, struct, enum, type, macro |
| C++ | class, struct, enum, function, method, type, macro |
| Ruby | class, module, method |
| Swift | class, struct, protocol, enum, function, method, type, extension |
| Kotlin | class, interface, function, method, type, enum, object |
| Shell | function |
| Perl | function, module, class |
| Dart | class, function, method, enum, type, mixin |
| Lua | function, module |
| Scala | class, object, trait, function, method, type, enum |
| R | function |

Signatures are captured up to 200 characters, enough to show full generic bounds in Rust (pub fn foo<T: Serialize + Clone>()) and long Java return types.
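
As an illustration of line-by-line regex extraction with a quick-filter and a 200-character signature cap, consider the sketch below. The `LanguageSpec` shape, the `rustSpec` patterns, and `extractSymbols` are hypothetical and far simpler than real language specs; they only show the general technique.

```typescript
interface Pattern {
  kind: string;
  re: RegExp; // capture group 1 is the symbol name
}

interface LanguageSpec {
  quickFilter: RegExp; // cheap pre-check applied to every line
  patterns: Pattern[]; // more specific patterns, tried in order
}

interface ExtractedSymbol {
  name: string;
  kind: string;
  line: number;
  signature: string;
}

const MAX_SIGNATURE = 200;

// Illustrative Rust-like spec; not the extension's actual patterns.
const rustSpec: LanguageSpec = {
  quickFilter: /\b(fn|struct)\b/,
  patterns: [
    { kind: "function", re: /^\s*(?:pub\s+)?fn\s+([A-Za-z_]\w*)/ },
    { kind: "struct", re: /^\s*(?:pub\s+)?struct\s+([A-Za-z_]\w*)/ },
  ],
};

function extractSymbols(source: string, spec: LanguageSpec): ExtractedSymbol[] {
  const out: ExtractedSymbol[] = [];
  const lines = source.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    // Most lines fail the quick-filter and never reach the pattern loop.
    if (!spec.quickFilter.test(line)) continue;
    for (const { kind, re } of spec.patterns) {
      const m = re.exec(line);
      if (m) {
        out.push({
          name: m[1],
          kind,
          line: i + 1,
          signature: line.trim().slice(0, MAX_SIGNATURE),
        });
        break; // first matching pattern wins
      }
    }
  }
  return out;
}
```

Because each line is matched independently, a syntax error elsewhere in the file does not prevent later declarations from being found.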


Platform support

| Environment | Discovery | Symbol extraction | Reference search |
| --- | --- | --- | --- |
| Linux / macOS | find (fast) | Node.js regex | rg → grep |
| WSL | find (fast) | Node.js regex | rg → grep |
| Native Windows | Node.js walk | Node.js regex | rg → JS scan |

Install ripgrep (winget install ripgrep / brew install ripgrep / apt install ripgrep) to get the fastest reference search on all platforms.


Persistence & incremental workflow

# Day 1 — initial index
codebase_index()          →  writes .pi-codebase.bin

# Day 2, session start    →  index reloaded automatically from .pi-codebase.bin

# After editing a few files
codebase_update()         →  only changed files are re-parsed (hash diff)

# After a big refactor
codebase_index()          →  full re-scan (safe to run at any time)

Add .pi-codebase.bin to .gitignore if you prefer not to commit it:

echo ".pi-codebase.bin" >> .gitignore
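
For readers curious how a v8-serialized index file might be written and reloaded, here is a minimal sketch using Node's built-in `v8.serialize` and `v8.deserialize`. The `CodebaseIndex` shape and the function names are assumptions for illustration; the actual layout of .pi-codebase.bin is not documented here.

```typescript
import { serialize, deserialize } from "node:v8";
import { readFileSync, writeFileSync } from "node:fs";

// Hypothetical index shape, for illustration only.
interface CodebaseIndex {
  version: number;
  root: string;
  files: Map<string, { hash: string; symbols: string[] }>;
}

// v8 serialization keeps Maps and nested objects intact, unlike JSON.stringify.
function encodeIndex(index: CodebaseIndex): Buffer {
  return serialize(index);
}

function decodeIndex(data: Buffer): CodebaseIndex {
  return deserialize(data) as CodebaseIndex;
}

function saveIndex(path: string, index: CodebaseIndex): void {
  writeFileSync(path, encodeIndex(index));
}

function loadIndex(path: string): CodebaseIndex {
  return decodeIndex(readFileSync(path));
}
```

A session start could then call something like `loadIndex(".pi-codebase.bin")` and fall back to a full scan if the file is missing or unreadable.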

Workflow examples

# Structural overview of an unfamiliar repo
You: "What does this codebase look like?"
  → codebase_index() then codebase_schema()

# Find all HTTP handlers
You: "Where are the route handlers?"
  → codebase_search({ query: "Handler|Route|Controller", kind: "function" })

# Call-site tracing
You: "What calls processPayment?"
  → codebase_refs({ symbol: "processPayment" })

# Dead-code hint
You: "Find all exported functions in the billing package"
  → codebase_search({ query: "^[A-Z]", kind: "function", file_pattern: "billing" })

# After editing
You: "I just moved some files around, update the index"
  → codebase_update()

Comparison with codebase-memory-mcp

| Feature | MCP | This extension |
| --- | --- | --- |
| Requires Go + CGO | | ❌ (zero external deps) |
| tree-sitter AST accuracy | | ⚠️ regex (resilient to syntax errors) |
| Content-hash incremental index | | ✅ MD5, async stat batch, same strategy |
| Call-graph edges (multi-hop) | | ❌ (use codebase_refs for single-hop) |
| Cross-service HTTP linking | | |
| Cypher-like query language | | |
| Dead-code detection | | |
| Works inside pi without MCP | | |
| Modular, no build step | | |
| Persistent index | | |
| Reference search | | ✅ (rg / grep / JS) |
| Symbol search | | |
| Schema / overview | | |
| Windows support | ❌ (WSL only) | ✅ (native + WSL) |
| Resilient to broken syntax | ⚠️ | ✅ regex keeps working |

File structure

codebase-memory/
├── index.ts       # Entry point — state, events, registration (~140 lines)
├── types.ts       # Interfaces, constants, language specs with quickFilter (~420 lines)
├── indexing.ts    # File discovery, symbol extraction, full/incremental index (~340 lines)
├── search.ts      # ripgrep / grep / JS reference search (~90 lines)
└── tools.ts       # Helpers + 5 tool registrations with renderers (~480 lines)

Performance

The indexer is optimized for speed:

  • Concurrent I/O: a semaphore-based worker pool keeps N files in flight simultaneously
  • Async stat batch: incrementalIndex stats all files in parallel, not sequentially
  • Per-language quick-filter: a single cheap regex skips ~85% of lines before running expensive pattern matches
  • Native crypto: MD5 via Node's C++ crypto module (hardware-accelerated)
  • v8 serialization: binary index persistence is 5–10× faster than JSON
  • Auto-tuned thread pool: UV_THREADPOOL_SIZE set to max(cpus × 2, 32) at startup
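
The semaphore-based worker pool from the first bullet can be approximated as follows. This is a generic sketch of the technique, not the extension's code; `Semaphore` and `mapLimit` are hypothetical names.

```typescript
// Counting semaphore: at most `permits` holders at any time.
class Semaphore {
  private permits: number;
  private queue: Array<() => void> = [];

  constructor(permits: number) {
    this.permits = permits;
  }

  get available(): number {
    return this.permits;
  }

  get waiting(): number {
    return this.queue.length;
  }

  acquire(): Promise<void> {
    if (this.permits > 0) {
      this.permits--;
      return Promise.resolve();
    }
    // No permit free: park the caller until release() wakes it.
    return new Promise<void>((resolve) => this.queue.push(resolve));
  }

  release(): void {
    const next = this.queue.shift();
    if (next) next(); // hand the permit directly to the next waiter
    else this.permits++;
  }
}

// Run fn over all items with at most `limit` calls in flight.
async function mapLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const sem = new Semaphore(limit);
  return Promise.all(
    items.map(async (item) => {
      await sem.acquire();
      try {
        return await fn(item);
      } finally {
        sem.release();
      }
    }),
  );
}
```

With a helper like this, reading and hashing files could be written as `mapLimit(paths, 64, readAndHash)`, where `readAndHash` and the limit of 64 are placeholders; results come back in input order because `Promise.all` preserves it.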

Real-world result on the Linux kernel (64,770 files, 7M+ symbols): ~24 seconds on a modern machine.


License

MIT

About

A fast, lightweight codebase indexing and search extension for pi-coding-agent. It provides instant symbol navigation, incremental re-indexing via content hashing, and reference search (leveraging ripgrep/grep), with zero runtime dependencies.
