A self-hostable RAG / long-term-memory server for AI agents, written in Rust.
It stores documents, splits them into overlapping chunks, embeds each chunk locally, and indexes them with sqlite-vec (vector ANN) + FTS5 (BM25 keyword) for hybrid retrieval — then answers questions with RAG. Everything lives in a single SQLite file. Plug it into any agent via an OpenAI-compatible HTTP API, an MCP server (Claude Code, Cursor, …), or the CLI.
It is frictionless: `layer0 serve` auto-installs llama.cpp, auto-downloads
the default models, and starts the local sidecar for you. After that
first-run setup, it runs fully offline with no further configuration.

- Chunked retrieval: documents are chunked and embedded per chunk; RAG uses the matched chunk for tight context.
- sqlite-vec ANN index: cosine KNN over a `vec0` virtual table, not a brute-force scan.
- Configurable RAG modes: `hybrid` (vector + knowledge graph, then rerank; the default), `vector` (semantic only), or `graph` (graph-led). Vector + FTS5 BM25 results are fused with Reciprocal Rank Fusion.
- Knowledge graph, auto-built at ingest: entities + relationships are extracted by the chat LLM when documents are stored, so graph/hybrid retrieval has real data (not just manually added nodes).
- Local-first, zero-config: `serve` installs llama.cpp, downloads the default embedding (nomic) + chat (gemma-4-E4B) models, and starts the sidecar(s).
- Flexible chat backend: resolves ACP (planned) → a remote backend like Claude (when an API key is set) → a local gemma model. No key required.
- OpenAI-compatible API, MCP server, and a CLI (incl. a `layer0 config` TUI for editing settings).
- Optional API-key auth, multi-database / multi-collection scoping.
- Self-update from GitHub releases (`layer0 update`), configurable.
- Single SQLite database: no external services.

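
To make the fusion step in hybrid retrieval concrete, here is a minimal Python sketch of Reciprocal Rank Fusion over two ranked lists of chunk IDs. It is an illustration under stated assumptions, not layer0's Rust implementation: the function name, the chunk IDs, and the constant `k = 60` (a commonly used default) are all assumed.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list, best first.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = {}
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] = scores.get(item_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is stable.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


vector_hits = ["c1", "c2", "c3"]   # hypothetical chunk IDs from the ANN index
keyword_hits = ["c3", "c1", "c4"]  # hypothetical chunk IDs from BM25
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Chunks that rank well in both lists rise to the top, while a chunk found by only one retriever still survives in the fused list.
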

Repository layout:

    layer0/
      crates/
        layer0-core/      DB, chunking, embeddings, sqlite-vec, graph, RAG, LLM client, installer, updater
        layer0-server/    HTTP API server (OpenAI-compatible) + auth + bootstrap
        layer0-cli/       CLI (layer0 binary)
        layer0-mcp/       MCP server for Claude Code and other agents
      skills/             agentskills.io skills (installable via `npx skills`)
      .github/workflows/  CI (all platforms) + release (builds + GitHub Release)


The chat backend resolves in this order:

- ACP client (planned): an editor/agent drives generation over the Agent Client Protocol (no model needed locally).
- Remote backend: used only when a key is available (e.g. `ANTHROPIC_API_KEY`). Defaults to Claude via Anthropic's OpenAI-compatible endpoint. Any OpenAI-compatible server works.
- Local gemma fallback: when no key is set, a local gemma model served by the llama.cpp sidecar handles chat, fully offline.

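
The resolution order can be pictured as a short decision function. This Python sketch is only a model of the rule described above: the function name, the return labels, and the `acp_available` flag are hypothetical, and it checks only `ANTHROPIC_API_KEY`, which the text gives as one example of a key.

```python
def resolve_chat_backend(env, acp_available=False):
    """Return which chat backend would be picked: 'acp', 'remote', or 'local'."""
    if acp_available:                 # ACP is planned, so this stays False for now
        return "acp"
    if env.get("ANTHROPIC_API_KEY"):  # a non-empty key selects the remote backend
        return "remote"
    return "local"                    # local gemma via the llama.cpp sidecar


choice = resolve_chat_backend({"ANTHROPIC_API_KEY": "sk-example"})
```
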
Embeddings are local by default (nomic via the sidecar). If you point
`[llm].base_url` at a remote embeddings endpoint, the sidecar is skipped
automatically.

Build from source:

    cargo build --release
    # binaries: target/release/{layer0, layer0-server, layer0-mcp}

Requires Rust stable + a C toolchain (MSVC on Windows, gcc/clang elsewhere).
SQLite is bundled. Prebuilt archives are on the GitHub Releases page, named
`layer0-<target-triple>.{zip,tar.gz}` and containing all three binaries.

Initialize the config:

    layer0 init

Writes `~/.layer0/config.toml`, creates data dirs, and generates
`.claude/mcp.json` + `.cursor/mcp.json` in the current directory.

Start the server:

    layer0 serve

On first run this installs llama.cpp, downloads the default embedding model
(`nomic-embed-text-v1.5`) and, if no chat key is set, the local chat model
(`gemma-4-E4B-it`), starts the sidecar(s), and serves on
`http://127.0.0.1:8080`. To use Claude instead of local gemma, set
`ANTHROPIC_API_KEY` before serving.

Store, search, and ask from the CLI:

    layer0 store "layer0 indexes chunks with sqlite-vec."
    layer0 search "vector search"
    layer0 ask "What does layer0 use for vector search?"
    layer0 status

Global config: `~/.layer0/config.toml` (see `config/default.toml` for the
fully-commented template). Environment overrides use the `LAYER0__` prefix
(a double underscore separates nested keys, e.g. `LAYER0__SERVER__PORT`).
Edit it interactively with `layer0 config` (a ratatui TUI) or by hand.
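
As an illustration of the naming rule for environment overrides, the Python sketch below maps `LAYER0__`-prefixed variables onto nested config keys. It is a model of the convention, not the project's loader: the function name is hypothetical, and lower-casing the key segments and leaving values as strings are assumptions.

```python
def env_overrides(environ, prefix="LAYER0__"):
    """Turn LAYER0__SECTION__KEY=value variables into a nested dict."""
    overrides = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        # A double underscore separates nesting levels; single underscores stay.
        path = [part.lower() for part in name[len(prefix):].split("__") if part]
        if not path:
            continue
        node = overrides
        for key in path[:-1]:
            if not isinstance(node.get(key), dict):
                node[key] = {}
            node = node[key]
        node[path[-1]] = value  # values are kept as strings in this sketch
    return overrides


example = env_overrides({
    "LAYER0__SERVER__PORT": "9090",
    "LAYER0__CHUNKING__CHUNK_SIZE": "512",
    "HOME": "/root",  # ignored: no LAYER0__ prefix
})
```
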
Key sections: `[server]` (host/port/cors, optional `api_key`), `[llm]` (local
embeddings backend), `[chat]` (remote chat backend), `[embeddings]` (dimensions,
which must match the model; nomic = 768), `[chunking]` (chunk_size/overlap),
`[rag]` (mode = hybrid/vector/graph, rerank, extract_graph), `[installer]`
(model repos/files, ports, auto_start), `[update]` (repo, auto_check,
auto_update).
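
To show how the `[chunking]` size and overlap settings interact, here is a minimal sliding-window chunker in Python. It is a sketch, not layer0's chunker: it counts characters (the real unit may be tokens or something else), and the function and parameter names are hypothetical.

```python
def chunk_text(text, chunk_size, overlap):
    """Split text into windows of chunk_size that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


pieces = chunk_text("abcdefghij", chunk_size=4, overlap=2)
```

Each chunk repeats the tail of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.
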
Set `[server].api_key` to require `X-API-Key` (or `Authorization: Bearer`) on
every request except `/health`. Unset = open (local default).
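
The auth rule can be summarized as a small predicate. The Python sketch below models the behavior described above; it is not the server's code, and the function name, the case-insensitive header lookup, and the constant-time comparison are assumptions.

```python
import hmac


def is_authorized(path, headers, api_key):
    """Return True if a request passes the API-key rule described above."""
    if api_key is None:       # no key configured: the server is open
        return True
    if path == "/health":     # the liveness probe never needs a key
        return True
    lowered = {name.lower(): value for name, value in headers.items()}
    expected = api_key.encode("utf-8")
    if hmac.compare_digest(lowered.get("x-api-key", "").encode("utf-8"), expected):
        return True
    scheme, _, token = lowered.get("authorization", "").partition(" ")
    return scheme.lower() == "bearer" and hmac.compare_digest(
        token.strip().encode("utf-8"), expected
    )


allowed = is_authorized("/v1/search", {"Authorization": "Bearer s3cret"}, "s3cret")
```
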

Base: `http://localhost:8080`. Highlights:

    POST       /v1/documents                      store (auto-chunked + embedded)
    POST       /v1/search                         hybrid search (vector + BM25 [+ graph] [+ rerank])
    POST       /v1/rag                            answer grounded in memory
    GET/DELETE /v1/documents[/:id]                list / fetch / delete
               /v1/graph/...                      nodes, edges, BFS query
    POST       /v1/embeddings                     OpenAI-compatible
    POST       /v1/chat/completions               OpenAI-compatible (routes to the chat backend)
               /v1/db/:database/:collection/...   scoped variants of the above
    GET        /v1/stats                          counts
    GET        /health                            liveness (no auth)

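
As a rough client-side illustration, the Python sketch below builds POST requests for two of the endpoints above using only the standard library. The request body fields (`content`, `query`) are hypothetical placeholders, since the schema is not given here; the helper name and the example key are also assumptions.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"


def build_request(path, payload, api_key=None):
    """Build (but do not send) a JSON POST request for a layer0 endpoint."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["X-API-Key"] = api_key  # only needed when [server].api_key is set
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# The field names "content" and "query" are hypothetical.
store_req = build_request("/v1/documents", {"content": "layer0 indexes chunks with sqlite-vec."})
search_req = build_request("/v1/search", {"query": "vector search"}, api_key="s3cret")

# To send one against a running server:
#   with urllib.request.urlopen(store_req) as resp:
#       print(resp.read().decode("utf-8"))
```
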

Start the MCP server:

    layer0 mcp   # stdio JSON-RPC 2.0

Tools: `store_memory`, `search_memory`, `rag_query`, `get_document`,
`delete_memory`, `graph_query`, `memory_stats`. `layer0 init` writes the
client config; or add it manually to `.claude/mcp.json` / `.cursor/mcp.json`.
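
For orientation, this Python sketch shows roughly what a tool invocation looks like on the wire. The `tools/call` method and the `name`/`arguments` parameter shape come from the MCP specification; the `query` argument name is a hypothetical placeholder, and a real client would complete the MCP `initialize` handshake before sending it.

```python
import json


def tool_call(request_id, tool, arguments):
    """Encode one MCP tools/call request as a newline-terminated JSON-RPC line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message) + "\n"  # stdio transport: one JSON message per line


# The argument name "query" is hypothetical; check the tool's published schema.
line = tool_call(1, "search_memory", {"query": "vector search"})
```
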

`skills/` contains agentskills.io-compatible skills
(`layer0-setup`, `layer0-memory`); install them into any skills-aware
agent (e.g. `npx skills add <repo>`).

Update in place:

    layer0 update   # self-update from the latest GitHub release

`[update].auto_check` logs when a newer release exists on `serve`;
`[update].auto_update` applies it on startup (takes effect on next restart).

GitHub Actions build and test on Linux/macOS/Windows. Pushing a `v*.*.*` tag
builds release binaries for five targets (linux x64/arm64, macOS x64/arm64,
windows x64) and publishes them to a GitHub Release. Release asset names embed
the Rust target triple, which the self-updater matches.
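
The asset-matching idea can be sketched in a few lines of Python. This is an illustration of matching on the `layer0-<target-triple>.{zip,tar.gz}` naming pattern, not the updater's actual logic: the function name, the extension preference order, and the example target triples are assumptions.

```python
def pick_asset(asset_names, target_triple):
    """Return the release asset built for target_triple, or None if absent."""
    for extension in (".tar.gz", ".zip"):  # preference order is an assumption
        candidate = f"layer0-{target_triple}{extension}"
        if candidate in asset_names:
            return candidate
    return None


# Example triples are illustrative; the real release targets may differ.
assets = [
    "layer0-x86_64-unknown-linux-gnu.tar.gz",
    "layer0-x86_64-pc-windows-msvc.zip",
]
chosen = pick_asset(assets, "x86_64-pc-windows-msvc")
```
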

| Table | Contents |
|---|---|
| `documents` | Source documents + metadata (FTS5 mirror in `documents_fts`) |
| `chunks` | Per-document chunks (the retrieval unit) |
| `vec_chunks` | sqlite-vec `vec0` cosine index over chunk embeddings |
| `graph_nodes` / `graph_edges` | Knowledge graph |
| `databases` / `collections` | Named scopes |
| `models` | Model registry |

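
To illustrate what a BFS query over the knowledge graph involves, here is a depth-limited breadth-first search in Python over a plain list of edges. It does not read the `graph_edges` table or reflect its columns: the edge tuples, the choice to follow edges in one direction only, and the function name are all assumptions.

```python
from collections import deque


def bfs_neighbors(edges, start, max_depth):
    """Return {node: depth} for every node within max_depth hops of start."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)  # directed: source -> target
    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in depths:
                depths[neighbor] = depths[node] + 1
                queue.append(neighbor)
    return depths


# Hypothetical entity graph, as might be extracted at ingest.
edges = [("layer0", "sqlite-vec"), ("sqlite-vec", "vec0"),
         ("vec0", "cosine"), ("layer0", "FTS5")]
reachable = bfs_neighbors(edges, "layer0", max_depth=2)
```
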
MIT