refact-planner edited this page Jun 7, 2026 · 1 revision

AST

Tree-sitter indexing, cross-reference storage, and AST-aware code skeletons for search and embeddings.

Languages and parsers

The AST subsystem covers 8 languages: C, C++, Python, Java, Kotlin, JavaScript, Rust, and TypeScript. It uses 7 tree-sitter parsers because C and C++ share one parser.
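The language-to-parser relationship can be pictured as a small lookup table. This is only an illustrative sketch: the parser identifiers below are hypothetical names chosen for the example, not the ones the engine uses.

```python
# Hypothetical parser identifiers; the real names are not given on this page.
LANGUAGE_TO_PARSER = {
    "C": "cpp",        # C and C++ share one parser
    "C++": "cpp",
    "Python": "python",
    "Java": "java",
    "Kotlin": "kotlin",
    "JavaScript": "javascript",
    "Rust": "rust",
    "TypeScript": "typescript",
}


def parser_for(language: str) -> str:
    """Return the parser id for a supported language, or raise KeyError."""
    return LANGUAGE_TO_PARSER[language]


language_count = len(LANGUAGE_TO_PARSER)
parser_count = len(set(LANGUAGE_TO_PARSER.values()))
print(language_count, "languages,", parser_count, "parsers")
```

Counting distinct values rather than keys is what gives 7 parsers for 8 languages.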

Indexing flow

AST indexing is a two-phase process:

  1. Parse files and store the extracted symbols.
  2. Link cross-references between definitions and usages.

Indexing runs in the background in batches, so large workspaces are processed incrementally instead of blocking the main request path.
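The two phases and the batching can be sketched roughly as follows. This is a toy model, not the engine's code: the "def name" / "use name" line format, the function names, and the batch size are all assumptions made for illustration, and the background scheduling is not modeled.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def index_workspace(files: Dict[str, str], batch_size: int = 2):
    """Toy two-phase index over files made of 'def X' and 'use X' lines."""
    definitions: Dict[str, str] = {}      # symbol -> path that defines it
    pending: List[Tuple[str, str]] = []   # (path, symbol) usages not yet linked

    # Phase 1: parse files batch by batch and store the extracted symbols.
    for batch in batched(sorted(files.items()), batch_size):
        for path, text in batch:
            for line in text.splitlines():
                kind, _, name = line.partition(" ")
                if kind == "def":
                    definitions[name] = path
                elif kind == "use":
                    pending.append((path, name))

    # Phase 2: link usages to definitions once every definition is known.
    links: Dict[str, List[str]] = {}
    for path, name in pending:
        if name in definitions:
            links.setdefault(name, []).append(path)
    return definitions, links


files = {
    "a.txt": "def foo\nuse bar",
    "b.txt": "def bar\nuse foo",
    "c.txt": "use foo\nuse missing",
}
definitions, links = index_workspace(files)
```

Linking in a separate pass is what lets `a.txt` reference `bar` even though `b.txt`, which defines it, is parsed later. Usages of unknown symbols such as `missing` are simply left unlinked in this sketch.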

Storage model

AST data is stored in LMDB. Each key starts with a prefix that identifies the kind of record:

  • d| for definitions
  • c| for fuzzy lookup
  • u| for back-links
  • classes| for inheritance
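To show why prefixed keys work well in an ordered key-value store, here is a minimal in-memory sketch of a prefix scan. It does not use LMDB itself; a sorted list stands in for the ordered keyspace. Only the four prefixes come from this page: everything after the prefix in each key, and every value, is a hypothetical format invented for the example.

```python
import bisect
from typing import Dict, Iterator, List, Tuple


class OrderedStore:
    """Tiny stand-in for an ordered key-value store with prefix scans."""

    def __init__(self) -> None:
        self._keys: List[bytes] = []          # kept sorted
        self._values: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_prefix(self, prefix: bytes) -> Iterator[Tuple[bytes, bytes]]:
        """Yield (key, value) pairs whose key starts with `prefix`, in order."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._values[key]


store = OrderedStore()
# Key bodies and values below are hypothetical, not the engine's real format.
store.put(b"d|pkg::Widget", b"widget.py:10")
store.put(b"d|pkg::Base", b"base.py:1")
store.put(b"c|Widget", b"pkg::Widget")
store.put(b"u|pkg::Widget|main.py:3", b"")
store.put(b"classes|pkg::Base|pkg::Widget", b"")

definition_keys = [key for key, _ in store.scan_prefix(b"d|")]
fuzzy_entries = list(store.scan_prefix(b"c|"))
inheritance_keys = [key for key, _ in store.scan_prefix(b"classes|")]
```

Because the `|` delimiter is part of the prefix, a scan for `c|` does not pick up `classes|` keys even though both begin with the letter `c`.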

Skeletonization

The skeletonizer builds abbreviated code views from declarations and selected members. These reduced snippets are used as embedding-friendly text, preserving structure while trimming implementation detail.
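The idea can be illustrated with a small sketch that keeps declarations and drops bodies. Note the substitution: this example uses Python's standard `ast` module on Python source rather than tree-sitter, and the `skeletonize` function and its output format are assumptions, not the engine's actual skeletonizer. It requires Python 3.9 or later for `ast.unparse`.

```python
import ast
from typing import List


def skeletonize(source: str) -> str:
    """Return class and function declarations with bodies replaced by '...'."""
    lines: List[str] = []

    def visit(node: ast.AST, depth: int) -> None:
        pad = "    " * depth
        if isinstance(node, ast.ClassDef):
            bases = ", ".join(ast.unparse(base) for base in node.bases)
            header = f"class {node.name}({bases}):" if bases else f"class {node.name}:"
            lines.append(pad + header)
            # Keep only nested declarations as the "selected members".
            members = [
                child for child in node.body
                if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
            ]
            for member in members:
                visit(member, depth + 1)
            if not members:
                lines.append(pad + "    ...")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            keyword = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
            returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
            lines.append(f"{pad}{keyword} {node.name}({ast.unparse(node.args)}){returns}: ...")

    for top_level in ast.parse(source).body:
        visit(top_level, 0)
    return "\n".join(lines)


SAMPLE = '''
class Shape(Base):
    def area(self, scale: float) -> float:
        total = self.w * self.h
        return total * scale

def helper(x):
    return x + 1
'''

skeleton = skeletonize(SAMPLE)
print(skeleton)
```

The resulting text keeps names, signatures, and nesting, which is the structural signal an embedding model is most likely to benefit from, while the statement-level detail is dropped.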

HTTP endpoints

The engine exposes AST-related HTTP endpoints under /ast-*, including:

  • /ast-file-symbols
  • /ast-status

/ast-file-symbols returns symbol information and /ast-status reports indexing status, both from the AST service that is currently running.
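A client would typically start by building the endpoint URLs from the engine's base address. The sketch below covers only that step. The base URL is a placeholder, the helper name is hypothetical, and the HTTP method and request payload are deliberately left out because this page does not specify them.

```python
from urllib.parse import urljoin

# Endpoint paths listed on this page.
AST_ENDPOINTS = ("/ast-file-symbols", "/ast-status")


def endpoint_url(base_url: str, endpoint: str) -> str:
    """Join a base address and one of the known AST endpoint paths."""
    if endpoint not in AST_ENDPOINTS:
        raise ValueError(f"unknown AST endpoint: {endpoint}")
    return urljoin(base_url.rstrip("/") + "/", endpoint.lstrip("/"))


# Placeholder address; substitute wherever the engine is actually listening.
BASE_URL = "http://127.0.0.1:8001"
status_url = endpoint_url(BASE_URL, "/ast-status")
symbols_url = endpoint_url(BASE_URL, "/ast-file-symbols")
```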
