Local-first semantic knowledge graph engine for LLM agents.
Atlas parses source code with tree-sitter, stores deterministic code facts in a local SQLite database, and exposes those facts through an interactive TUI, a CLI, and an MCP server. It is built for agents and developers that need reliable codebase context: symbol search, callers/callees, dependency edges, impact analysis, point inspection, bounded variable and caller tracing, forward call-chain queries, and C/C++ function-pointer dispatch annotations.
source code ──parse/extract──▶ .atlas/atlas.db ──query──▶ TUI / CLI / MCP
tree-sitter facts              SQLite source of truth     agent & developer context
- Features
- Install
- Quick start
- CLI
- MCP server
- Architecture
- Supported languages
- Documentation
- Development
- Known limitations
- License
- Local-first: writes all index data to `<project>/.atlas/atlas.db`; no cloud service required.
- Deterministic extraction: tree-sitter AST queries and stable blake3-based IDs instead of model guesses.
- Incremental sync: content-hash based dirty-file detection with Git-aware file discovery.
- Interactive TUI: keyboard-driven terminal UI with symbol search, detail view (Overview / Callers / Callees / Source tabs), and caller trace, launched via bare `atlas` with auto-indexing on empty databases.
- Agent-native MCP: stdio MCP server exposing 18 bounded tools for search, graph, dependencies, trace, semantic analysis, background tasks, and project management.
- Graph + trace queries: callers, callees, shortest path, impact, source-position lookup, variable origin tracing, and caller-path tracing.
- Explicit capability boundaries: language capability metadata and trace diagnostics report partial results instead of silently overclaiming precision.
- Rust 1.85+ (Rust edition 2024)
- Git, recommended for file discovery (`atlas` falls back to filesystem traversal when needed)
git clone https://github.com/LordCasser/atlas.git
cd atlas
cargo build --release -p atlas-cli --features "all-languages,mcp"

The binary is generated at `target/release/atlas`.

You can also install the local binary into Cargo's bin directory:

cargo install --path crates/atlas-cli --features "all-languages,mcp"

# Run from your project root
# Auto-initialize the SQLite schema and build the index
atlas index
# Check project health
atlas status
atlas doctor
# Launch the interactive TUI (search, symbol detail, caller trace)
atlas

All commands accept `--project <path>` when running from outside the project directory (supports both relative and absolute paths). The MCP server uses the client's current working directory.
| Command | Purpose |
|---|---|
| `atlas` (no subcommand) | Launch the interactive TUI: symbol search, detail view (Overview/Callers/Callees/Source), and caller trace. Auto-indexes on first run. |
| `atlas index` | Auto-initialize the `.atlas/` schema, then discover and index source files. Supports `--include`, `--exclude`, `--scope`, and `--analysis` (`manifest`, `structural`, or `full`). |
| `atlas sync` | Incrementally update the index after file changes. Supports `--analysis`. |
| `atlas status` | Show file, symbol, edge, database, and capability statistics. |
| `atlas doctor` | Check schema, SQLite/FTS5, grammar, and capability readiness. |
| `atlas files` | List indexed files with language and parse status. |
| `atlas mcp` | Start the stdio MCP server. Requires the `mcp` Cargo feature. |
The MCP server auto-initializes the database when starting from a fresh project, so you only need to ensure files are indexed:
# From your project root:
atlas index # (first time) OR atlas sync (incremental)
atlas mcp # auto-creates .atlas/ if missing

MCP reads an existing `.atlas/atlas.db`. Re-run `atlas sync` or `atlas index` after code changes.
Atlas MCP uses the client's current working directory. Configure the MCP server
without a project path, and start the client from the repository you want Atlas
to inspect. You can also switch projects at runtime with the project MCP tool
using action: "open".
Config files by client:
| Client | Global config | Project config |
|---|---|---|
| Claude Code | `~/.claude.json` | `.claude/settings.local.json` |
| Codex CLI | `~/.codex/config.toml` | - |
| OpenCode | `~/.config/opencode/opencode.json` | `opencode.json` in the project root |
| Claude Desktop | `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) | - |
| Cursor | Cursor Settings -> MCP -> Add new MCP server | `.cursor/mcp.json` |
Claude/Cursor-style clients use `mcpServers` with `command` and `args`. OpenCode uses its own `mcp` object: each server is `type: "local"` and `command` is a single array containing the executable and arguments.
Use the same no-project configuration for every repository.
Claude Code (~/.claude.json):
{
  "mcpServers": {
    "atlas": {
      "command": "/path/to/atlas",
      "args": ["mcp"]
    }
  }
}

OpenCode (~/.config/opencode/opencode.json):
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "atlas": {
      "type": "local",
      "command": ["/path/to/atlas", "mcp"],
      "enabled": true
    }
  }
}

Codex CLI (~/.codex/config.toml):
[mcp_servers.atlas]
command = "/path/to/atlas"
args = ["mcp"]
enabled = true

| Group | MCP tools |
|---|---|
| Project management | project, index |
| Symbol search/detail | search, symbol |
| Graph navigation | calls, path, explore, impact |
| Trace | trace |
| File dependencies | file_dependencies |
| Semantic analysis | lifecycle, branch_diff, domain_rules |
| Background tasks | tasks, task_status, wait_for_task, resume_task |
| FP dispatch (C/C++) | fp_dispatches |
`project(action="open")` supports switching the active project at runtime. It defaults to `storage: "memory"` and `scan_files: false` for zero-footprint, fast project switching. Use `background: true` for large trees; then call `task_status` or `wait_for_task` with the returned `task_id`. `project` activates a project but does not index it; call `index` afterwards.
Trace tools return the TraceQueryResponse<T> envelope documented in docs/trace-contract.md: ok, kind, capability, partial_result, diagnostics, and result.
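A client may want to check the envelope before trusting `result`. The sketch below assumes the response arrives as a JSON object carrying the six fields listed above. The sample payload, the `kind` value, and the shape of `diagnostics` are invented for illustration; `docs/trace-contract.md` remains the authority on field types.

```python
import json

# Field names come from the envelope description above; value types and the
# sample payload below are assumptions made for illustration only.
ENVELOPE_FIELDS = ("ok", "kind", "capability", "partial_result", "diagnostics", "result")


def read_trace_response(raw: str) -> dict:
    """Parse a trace response and classify how far it can be trusted."""
    response = json.loads(raw)
    missing = [name for name in ENVELOPE_FIELDS if name not in response]
    if missing:
        raise ValueError(f"not a trace envelope, missing fields: {missing}")
    if not response["ok"]:
        status = "failed"
    elif response["partial_result"]:
        status = "partial"  # usable, but read the diagnostics first
    else:
        status = "complete"
    return {
        "status": status,
        "diagnostics": response["diagnostics"],
        "result": response["result"],
    }


# Hypothetical payload; real responses follow docs/trace-contract.md.
sample = json.dumps({
    "ok": True,
    "kind": "example_kind",
    "capability": "DataflowFull",
    "partial_result": True,
    "diagnostics": ["example diagnostic"],
    "result": {"paths": []},
})
summary = read_trace_response(sample)
```

Treating `partial_result` as a separate state, rather than folding it into success, keeps an agent from presenting an incomplete trace as a full one.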
Atlas is a Rust workspace with 15 Cargo packages. The public entry points are atlas-cli (CLI + TUI), atlas-mcp, and the atlas-engine facade. Engine internals are split by responsibility so extraction, persistence, graph construction, search, context, and trace can evolve independently.
atlas/
├── crates/
│ ├── atlas-cli # CLI binary + TUI (ratatui) + command dispatch
│ ├── atlas-mcp # stdio MCP server powered by rmcp + Atlas tool router
│ └── atlas-engine # public facade crate re-exporting core APIs
│ └── crates/
│ ├── types # IDs, IR records, language/capability metadata
│ ├── workspace # project root and source-path abstractions
│ ├── db # SQLite schema, Store, readers/writers
│ ├── extraction # tree-sitter frontends, SCM queries, scopes, bindings, dataflow, CFG
│ ├── resolution # reference/import/include/path-alias resolution
│ ├── graph # symbol edge builder, graph snapshot, graph traversal engine
│ ├── analysis # trace engine, variable slicing, caller-path analysis
│ ├── domain_rules # domain-specific semantic rules and rule learning
│ ├── search # FTS5 + LIKE + fuzzy search and query parsing
│ ├── context # agent-facing Markdown context builder
│ ├── filesync # file discovery, content hashing, incremental sync, locks
│ └── lazy # on-demand dataflow job planning and loading
├── docs/ # architectural and release documentation
├── skills/atlas/ # Agent Skill for using Atlas
├── Cargo.toml # workspace manifest
└── README.md
1. Discover files
└─ Git-aware discovery + include/exclude filters
2. Parse and extract
└─ tree-sitter frontends produce FileFacts: symbols, scopes, refs, imports, callsites, bindings, dataflow, CFG
3. Persist facts
└─ SQLite tables under .atlas/atlas.db are the source of truth
4. Resolve references
└─ scope/container/import/include/project-name matching; unresolved facts keep diagnostics instead of failing indexing
5. Build graph
└─ resolved refs and callsites become symbol_edges; GraphSnapshot accelerates read-only traversal
6. Serve queries
└─ TUI, CLI commands, and MCP tools call SearchEngine, GraphEngine, ContextBuilder, and TraceEngine
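Incremental sync narrows steps 1 and 2 to files whose content changed. A minimal sketch of content-hash dirty detection follows. SHA-256 is a stand-in, because the hash Atlas uses for file content is not stated here, and `known_hashes` is a hypothetical in-memory substitute for whatever the `files` table stores.

```python
import hashlib
from pathlib import Path


def content_hash(path: Path) -> str:
    """Hash the file bytes. SHA-256 is an assumption, not Atlas's documented choice."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def dirty_files(paths: list[Path], known_hashes: dict[str, str]) -> list[Path]:
    """Return files that are new or whose content hash differs from the stored one."""
    return [p for p in paths if known_hashes.get(str(p)) != content_hash(p)]
```

Comparing hashes instead of modification times means a file that was touched but not changed is skipped, while a file restored with an old timestamp is still picked up.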
atlas-cli ──▶ atlas-engine, atlas-mcp
atlas-mcp ──▶ atlas-engine
atlas-engine facade ──▶ types, workspace, db, extraction, resolution,
graph, analysis, domain_rules, search, context,
filesync, lazy
engine internals stay acyclic:
types/workspace/db ─▶ extraction/resolution/graph/analysis/domain_rules/search/context/filesync/lazy ─▶ facade/API
Atlas stores index data in .atlas/atlas.db (schema version 1). Core tables include:
files symbols scopes references
imports symbol_edges callsites bindings
binding_uses data_nodes dataflow_edges cfg_nodes
cfg_edges function_summaries summary_param_reaches summary_return_sources
summary_call_arg_sources extraction_state extraction_jobs project_metadata
symbols_fts function_pointer_annotations
SQLite is the durable source of truth. In-memory graph snapshots are query accelerators and can be rebuilt from the database.
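Because the index is an ordinary SQLite file, it can be inspected with any SQLite client. The sketch below lists the tables in a database opened read-only; it makes no assumption about column layouts, which are documented in `docs/architecture.md`. The `list_tables` helper is hypothetical and not part of Atlas.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path: Path) -> list[str]:
    """Return table names from a SQLite file, opened read-only."""
    # mode=ro keeps this helper from creating or modifying the index.
    uri = f"{db_path.resolve().as_uri()}?mode=ro"
    connection = sqlite3.connect(uri, uri=True)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    db = Path(".atlas/atlas.db")  # default location described above
    if db.exists():
        print("\n".join(list_tables(db)))
```

On a real index the output will likely include FTS5 shadow tables alongside `symbols_fts`, since SQLite stores them as regular tables.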
Default build:
| Language | Extensions | Capability level |
|---|---|---|
| TypeScript | `.ts`, `.tsx` | DataflowFull |
| JavaScript | `.js`, `.jsx`, `.mjs`, `.cjs` | DataflowFull |
| Python | `.py`, `.pyi`, `.pyx` | DataflowFull |
all-languages build:
| Language | Extensions | Capability level |
|---|---|---|
| Java | `.java` | DataflowFull |
| C | `.c`, `.h` | DataflowFull |
| C++ | `.cpp`, `.cc`, `.cxx`, `.hpp`, `.hh`, `.hxx` | DataflowFull |
| ArkTS | `.ets`, `.sts` | DataflowFull via TypeScript grammar |
| Go | `.go` | DataflowFull |
| C# | `.cs` | DataflowFull |
| Rust | `.rs` | DataflowFull |
| PHP | `.php` | DataflowFull |
| Ruby | `.rb` | DataflowFull |
| Kotlin | `.kt`, `.kts` | DataflowFull |
| Cangjie | `.cj`, `.cangjie` | DataflowFull |
Build variants:
cargo build --release -p atlas-cli
cargo build --release -p atlas-cli --features all-languages
cargo build --release -p atlas-cli --features "all-languages,mcp"

Maintained documents:
- `docs/architecture.md`: authoritative architecture (constraints, modules, schema, dataflow, capability profiles, design decisions).
- `docs/requirements.md`: product scope and acceptance criteria.
- `docs/roadmap.md`: current and future work.
- `docs/testing.md`: test layers, feature matrix, and release checks.
- `docs/performance.md`: measured performance baselines.
- `docs/trace-contract.md`: frozen trace JSON contract and diagnostics model.
- `skills/atlas/SKILL.md`: Agent Skill for using Atlas from another agent.
# Default tests: TypeScript, JavaScript, Python
cargo test
# Full CLI + MCP + all non-experimental language features
cargo test -p atlas-cli --features "all-languages,mcp"
# Build release binary with MCP
cargo build --release -p atlas-cli --features "all-languages,mcp"

Conventions:
- Keep crate dependencies acyclic and aligned with the architecture above.
- Add or update fixtures when changing extraction, resolution, graph, or trace behavior.
- Update `docs/trace-contract.md` and tests when trace response fields or diagnostics change.
- Update `docs/architecture.md` when implemented module boundaries, schema, CLI, MCP, or analysis behavior changes.
- Keep release-facing documentation in `docs/`; delete obsolete content rather than accumulating stale docs.
- Atlas performs best-effort semantic analysis, not compiler-grade type checking.
- C/C++ preprocessing is not expanded; include analysis is based on indexed directives and paths.
- Java classpath, Maven, and Gradle resolution are not fully modeled.
- Python dynamic runtime constructs and generated symbols are outside the static extraction model.
- TypeScript barrel/re-export chains use best-effort name fallback rather than a full export graph.
- Dataflow and trace precision varies by language; inspect `atlas doctor` or trace capability metadata before relying on a trace result.
- MCP serves a local SQLite index; run `atlas sync` or `atlas index` after source changes.
- Call edges (`Calls`, `Instantiates`, `Implements`) are only created when both the caller and callee are indexed project symbols. External library calls (e.g., `useState` from `react`, `printf` from `stdio.h`) do not produce edges. See Edge visibility for details.
Atlas builds its code facts entirely from tree-sitter's Concrete Syntax Tree (CST). Here is the pipeline from raw source to traceable dataflow:
source code
→ tree_sitter::Parser (per-language grammar)
→ tree_sitter::Tree (CST)
Tree-sitter is an incremental, error-tolerant parser. Atlas uses 14 language grammars (TypeScript, JavaScript, Python, Java, C, C++, Go, C#, Rust, PHP, Ruby, Kotlin, ArkTS, Cangjie), each compiled from a grammar.js into a parser. Parsing is done per-file via a thread-local Parser to avoid allocation overhead.
CST root node
→ tree_sitter::Query (per-language .scm queries)
→ (capture_name, Node) pairs
Four tree-sitter queries run against every file:
| Query | .scm file | Captures |
|---|---|---|
| definitions | `definitions.scm` | `(class_declaration) @definition.class`, `(function_declaration) @definition.function`, etc. |
| references | `references.scm` | `(call_expression) @reference.call`, `(member_expression) @reference.field`, etc. |
| imports | `imports.scm` | `(import_statement) @import`, module path extraction |
| scopes | `scopes.scm` | `(function_declaration) @scope`, `(block) @scope`, etc. |
Each capture includes its byte range and source text from the CST node. Queries are compiled once per language, then executed against every parsed file via QueryCursor::captures().
(capture_name, Node) pairs
→ LanguageAdapter::normalize()
→ Symbol, Reference, Import, ScopeDef (deterministic ID via blake3)
Each language has a LanguageAdapter that maps tree-sitter capture names to Atlas types. For example, a @definition.function capture becomes a Symbol with SymbolKind::Function, and its qualified name is built by walking child_by_field_name("name") up the CST. All IDs are deterministic — the same file always produces the same facts.
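The determinism property can be illustrated with a content-addressed ID. This is a sketch under stated assumptions: the fields that Atlas feeds into the hash are not given here, so the choice of `file_path`, `kind`, and `qualified_name` is hypothetical, and BLAKE2b stands in for blake3, which is not in Python's standard library.

```python
import hashlib


def stable_id(file_path: str, kind: str, qualified_name: str) -> str:
    """Derive an ID purely from identifying fields, so re-indexing reproduces it.

    Assumptions: the field choice, the NUL separator, and the 16-byte digest
    are illustrative. BLAKE2b is a stand-in for blake3.
    """
    hasher = hashlib.blake2b(digest_size=16)
    for part in (file_path, kind, qualified_name):
        hasher.update(part.encode("utf-8"))
        hasher.update(b"\x00")  # separator so ("ab", "c") differs from ("a", "bc")
    return hasher.hexdigest()


first = stable_id("src/app.ts", "function", "app.main")
second = stable_id("src/app.ts", "function", "app.main")
other = stable_id("src/app.ts", "function", "app.helper")
```

Here `first` and `second` are equal while `other` differs, so rows keyed by such IDs can be compared across index runs without a counter or random seed.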
Symbols + Scopes
→ LexicalBinder (walks CST for `(identifier) @binding.use`)
→ BindingDef (declaration sites) + BindingUse (usage sites)
The LexicalBinder scans every identifier in the AST. For each usage, it walks the scope chain upward to find the nearest enclosing declaration with a matching name. This produces BindingDef/BindingUse pairs that connect variable uses to their definitions within the same file.
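The scope-chain walk described above can be sketched in a few lines. The `Scope` class and `resolve` function are hypothetical names for illustration, not the `LexicalBinder` API, and declarations are reduced to a name-to-line mapping.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scope:
    """A lexical scope with its declarations and a link to the enclosing scope."""
    name: str
    parent: Optional["Scope"] = None
    declarations: dict[str, int] = field(default_factory=dict)  # name -> declaration line


def resolve(scope: Optional[Scope], identifier: str) -> Optional[tuple[str, int]]:
    """Walk outward from the use site; the innermost matching declaration wins."""
    while scope is not None:
        if identifier in scope.declarations:
            return scope.name, scope.declarations[identifier]
        scope = scope.parent
    return None  # unresolved within this file


module = Scope("module", declarations={"config": 1, "count": 2})
function = Scope("load", parent=module, declarations={"count": 5})
block = Scope("if-block", parent=function)
```

`resolve(block, "count")` returns `("load", 5)` because the function-level declaration shadows the module-level one, while `resolve(block, "config")` falls through to `("module", 1)`.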
CST root + Bindings + Scopes
→ DataFlowBuilder (walks AST for assignment, call, field access, return patterns)
→ DataNode + DataFlowEdge
The DataFlowBuilder does NOT use tree-sitter queries — it walks the CST directly via Node::child(), child_by_field_name(), and named_children(). For each language, it pattern-matches against known AST node types:
| Pattern | AST nodes matched | Produces |
|---|---|---|
| Assignment | `variable_declaration`, `assignment_expression` | `Assign` edge: RHS → LHS |
| Call arguments | `call_expression` → `arguments` → children | `ArgToCall` edge: arg → call parameter slot |
| Field access | `member_expression` → `property_identifier` | `FieldLoad`/`FieldStore` edges |
| Return values | `return_statement` → child expression | `ReturnValue` edge |
| Destructuring | `pattern_list`, `tuple_pattern`, `object_pattern` | Multi-target `Assign` edges |
DataNode records the source location (byte range), kind (Local, Param, Field, CallArg, Return, Expr), and function scope. DataFlowEdge connects a source node to a target node with a directed kind and confidence score.
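To make the Assignment and Destructuring rows concrete, here is a small walker that emits `Assign` edges by pattern-matching node types. It uses Python's built-in `ast` module as a stand-in for a tree-sitter CST, and the `Edge` record is a simplified, hypothetical version of `DataFlowEdge` without byte ranges or confidence.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    kind: str
    source: str  # name read on the right-hand side
    target: str  # name written on the left-hand side


def assign_edges(source_code: str) -> list[Edge]:
    """Emit one Assign edge per (RHS name, LHS name) pair in each assignment."""
    edges = []
    for node in ast.walk(ast.parse(source_code)):
        if not isinstance(node, ast.Assign):
            continue
        reads = [n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)]
        for target in node.targets:
            # Tuple targets cover destructuring such as `left, right = pair`.
            writes = [n.id for n in ast.walk(target) if isinstance(n, ast.Name)]
            edges.extend(Edge("Assign", r, w) for r in reads for w in writes)
    return edges


edges = assign_edges("total = price + tax\nleft, right = pair\n")
```

The destructuring case links every right-hand name to every left-hand name, which over-approximates the flow. A per-edge confidence score is one way to record that imprecision.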
CST root (per function)
→ CfgBuilder (walks function body, matching branch/loop/break AST patterns)
→ CfgNode + CfgEdge (Entry → blocks → Exit)
CFG construction walks the function AST, identifying control-flow splits (if_statement, switch_case, try_statement, for_statement, while_statement) and building a graph of basic blocks. Each CfgNode records the byte range it covers, and CfgEdge connects predecessor → successor. CFG is available for TypeScript, JavaScript, Python, Java, C, C++, Go, Rust, and Cangjie. C#, PHP, Ruby, and Kotlin do not yet have CFG support.
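The following sketch shows the Entry → blocks → Exit shape for a function with one `if`/`else`. It is simplified on purpose: it uses Python's `ast` module instead of tree-sitter, creates one node per statement instead of merged basic blocks, and handles only simple statements, `if`, and `return`. The `build_cfg` helper and its node labels are hypothetical.

```python
import ast


def build_cfg(function_source: str) -> list[tuple[str, str]]:
    """Return CFG edges for a function whose body holds simple statements and if/else."""
    function = ast.parse(function_source).body[0]
    edges: list[tuple[str, str]] = []

    def label(node: ast.stmt) -> str:
        return f"{type(node).__name__}@{node.lineno}"

    def walk(statements: list[ast.stmt], predecessors: list[str]) -> list[str]:
        """Wire statements in order; return the nodes that flow out of the list."""
        for statement in statements:
            current = label(statement)
            edges.extend((p, current) for p in predecessors)
            if isinstance(statement, ast.If):
                then_exits = walk(statement.body, [current])
                # Without an else branch, the condition itself falls through.
                else_exits = walk(statement.orelse, [current]) if statement.orelse else [current]
                predecessors = then_exits + else_exits
            elif isinstance(statement, ast.Return):
                edges.append((current, "Exit"))
                predecessors = []  # nothing after a return is reachable from it
            else:
                predecessors = [current]
        return predecessors

    edges.extend((p, "Exit") for p in walk(function.body, ["Entry"]))
    return edges


cfg = build_cfg(
    "def pick(x):\n"
    "    if x > 0:\n"
    "        y = 1\n"
    "    else:\n"
    "        y = 2\n"
    "    return y\n"
)
```

For `pick`, the condition node has two successors (one per branch) and both branches rejoin at the `return`, which is the only predecessor of `Exit`.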
Atlas only creates call edges (Calls, Instantiates, Implements) when both the caller and the callee are indexed symbols in the project. If a reference resolves to a symbol outside the project — for example, an import from an external package like react, lodash, or std — no edge is produced.
How this works:
- Resolution phase: each reference is resolved against the project's symbol table. External imports (e.g., `import { useState } from 'react'`) cannot be resolved because the target symbols are not indexed. These references remain unresolved.
- Edge building phase: `GraphBuilder::create_edges_for_reference` verifies that the resolved target symbol exists in the store via `find_symbol_by_id`. If the target symbol is not found (external / not indexed), no incoming edge is added to the project's call graph. Similarly, edges require the source symbol (the enclosing function/class containing the reference) to exist; top-level statements without a containing symbol produce no edges.
Implications:
| Scenario | Edge created? |
|---|---|
| `foo()` where `foo` is defined in the project | ✅ |
| `foo()` where `foo` is imported from an external package | ❌ |
| `new Foo()` where `Foo` is a class defined in the project | ✅ |
| `useState()` where `useState` comes from `react` | ❌ |
| Top-level expression call (no enclosing function/class) | ❌ |
This design ensures the call graph is self-contained — all edges point to symbols that the user can inspect, trace, and navigate within their own codebase. External API calls are intentionally excluded to keep the graph focused on project-internal structure.
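The rule in the table reduces to a two-sided membership check. The sketch below is not `GraphBuilder::create_edges_for_reference`; `create_edge` and the symbol names are hypothetical, and a plain set stands in for the symbol store.

```python
from typing import Optional


def create_edge(
    symbols: set[str],
    source: Optional[str],
    target: Optional[str],
    kind: str = "Calls",
) -> Optional[tuple[str, str, str]]:
    """Return an edge tuple only when both endpoints are indexed project symbols."""
    if source is None or source not in symbols:
        return None  # top-level call: no enclosing function or class
    if target is None or target not in symbols:
        return None  # unresolved or external target, such as react's useState
    return (source, kind, target)


indexed = {"app.render", "app.foo", "app.Foo"}
```

With this set, `create_edge(indexed, "app.render", "app.foo")` yields a `Calls` edge, while passing `None` as the target (an unresolved external import) or as the source (a top-level call) yields no edge.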
Symbol graph + DataFlow graph + CFG
→ TraceEngine (backward slice from user-specified location)
→ TracePath (step-by-step provenance: kind, range, file, confidence, evidence)
The TraceEngine combines symbol-level call graphs with intra-procedural dataflow. At call boundaries, it uses persistent summary tables (function_summaries, summary_param_reaches, summary_return_sources, summary_call_arg_sources) with CrossFunctionBridge to bridge ArgToParam and ReturnToCall edges across function boundaries without re-extracting dataflow.
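A bounded backward slice can be sketched as a breadth-first walk against edge direction. This is an illustration of the idea only: the `backward_slice` helper is hypothetical, edges are plain `(source, kind, target)` tuples without summaries or confidence, and the node names are invented. The edge kinds are those named in this document.

```python
from collections import deque


def backward_slice(
    edges: list[tuple[str, str, str]], start: str, max_depth: int
) -> list[tuple[str, str, int]]:
    """Walk against edge direction from `start`, returning (node, edge kind, depth) steps."""
    incoming: dict[str, list[tuple[str, str]]] = {}
    for source, kind, target in edges:
        incoming.setdefault(target, []).append((source, kind))

    steps, seen, queue = [], {start}, deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # bounded: stop expanding at the depth limit
        for source, kind in incoming.get(node, []):
            if source not in seen:  # the seen set also guards against cycles
                seen.add(source)
                steps.append((source, kind, depth + 1))
                queue.append((source, depth + 1))
    return steps


flow = [
    ("request.body", "Assign", "payload"),
    ("payload", "ArgToCall", "save#arg0"),
    ("save#arg0", "ArgToParam", "save.record"),
    ("save.record", "FieldStore", "db.row"),
]
```

`backward_slice(flow, "db.row", 2)` stops after two steps, at the call argument. A caller that hits the bound would report the result as partial instead of claiming the origin was found.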
| Component | Crate | Key files |
|---|---|---|
| Grammar registry | extraction | `grammar.rs` |
| Queries | extraction | `queries/<lang>/*.scm` |
| Language adapters | extraction | `languages/<lang>.rs` |
| Normalize pipeline | extraction | `extract.rs` |
| Query helpers | extraction | `query_helpers.rs` |
| Lexical binding | extraction | `lexical_binder.rs` |
| DataFlow builder | extraction | `dataflow_builder.rs` |
| CFG builder | extraction | `cfg_builder.rs` |
| Capability profiles | types | `capability.rs` |
| Trace engine | analysis | `trace/engine.rs` |
MIT. See LICENSE.