🌐 layer0

A self-hostable RAG / long-term-memory server for AI agents, written in Rust.

It stores documents, splits them into overlapping chunks, embeds each chunk locally, and indexes them with sqlite-vec (vector ANN) + FTS5 (BM25 keyword) for hybrid retrieval — then answers questions with RAG. Everything lives in a single SQLite file. Plug it into any agent via an OpenAI-compatible HTTP API, an MCP server (Claude Code, Cursor, …), or the CLI.

It is frictionless: layer0 serve auto-installs llama.cpp, auto-downloads the default models, and starts the local sidecar for you. Once those first-run downloads finish, it runs fully offline with no further setup.

Features

Chunked retrieval — documents are chunked and embedded per chunk; RAG uses the matched chunk for tight context.
sqlite-vec ANN index — cosine KNN over a vec0 virtual table, not a brute force scan.
Configurable RAG modes — hybrid (vector + knowledge graph, then rerank, default), vector (semantic only), or graph (graph-led). Vector + FTS5 BM25 are fused with Reciprocal Rank Fusion.
Knowledge graph, auto-built at ingest — entities + relationships are extracted by the chat LLM when documents are stored, so graph/hybrid retrieval has real data (not just manually-added nodes).
Local-first, zero-config — serve installs llama.cpp, downloads the default embedding (nomic) + chat (gemma-4-E4B) models, and starts the sidecar(s).
Flexible chat backend — resolves ACP (planned) → a remote backend like Claude (when an API key is set) → a local gemma model. No key required.
OpenAI-compatible API, MCP server, and a CLI (incl. a layer0 config TUI for editing settings).
Optional API-key auth, multi-database / multi-collection scoping.
Self-update from GitHub releases (layer0 update), configurable.
Single SQLite database — no external services.
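
To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists of chunk ids (one from the vector index, one from BM25). It is an illustration only, not layer0's actual code: the function name rrf_fuse, the u64 ids, and the constant k are assumptions (k = 60 is a common choice; the value layer0 uses is not stated here).

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of chunk ids with Reciprocal Rank Fusion.
/// Hypothetical helper: the name and signature are illustrative.
pub fn rrf_fuse(rankings: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, id) in ranking.iter().enumerate() {
            // Ranks are 1-based: the top hit of a list contributes 1 / (k + 1).
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + idx as f64 + 1.0);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties are broken by id so the order is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

A chunk that appears near the top of both lists outranks one that appears in only a single list, which is why RRF needs no score normalization between the vector and keyword backends.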

Architecture

layer0/
  crates/
    layer0-core/    DB, chunking, embeddings, sqlite-vec, graph, RAG, LLM client, installer, updater
    layer0-server/  HTTP API server (OpenAI-compatible) + auth + bootstrap
    layer0-cli/     CLI (layer0 binary)
    layer0-mcp/     MCP server for Claude Code and other agents
  skills/              agentskills.io skills (installable via `npx skills`)
  .github/workflows/   CI (all platforms) + release (builds + GitHub Release)

How chat is resolved

ACP client — planned: an editor/agent drives generation over the Agent Client Protocol (no model needed locally).
Remote backend — used only when a key is available (e.g. ANTHROPIC_API_KEY). Defaults to Claude via Anthropic's OpenAI-compatible endpoint. Any OpenAI-compatible server works.
Local gemma fallback — when no key is set, a local gemma model served by the llama.cpp sidecar handles chat, fully offline.

Embeddings are always local (nomic via the sidecar), unless you point [llm].base_url at a remote embeddings endpoint — in which case the sidecar is skipped automatically.
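
The chat resolution order above can be sketched as a small decision function. This is a hedged illustration, not the project's implementation: the ChatBackend enum, the resolve_chat_backend name, and the choice to treat a blank key as unset are all assumptions.

```rust
/// Hypothetical model of the three chat backends described above.
#[derive(Debug, PartialEq)]
pub enum ChatBackend {
    /// Planned: generation is driven by an ACP client.
    Acp,
    /// A remote OpenAI-compatible backend, used only when a key is available.
    Remote { api_key: String },
    /// Local gemma model served by the llama.cpp sidecar.
    LocalGemma,
}

/// Resolve in priority order: ACP, then remote (if a key is set), then local gemma.
pub fn resolve_chat_backend(acp_available: bool, api_key: Option<&str>) -> ChatBackend {
    if acp_available {
        return ChatBackend::Acp;
    }
    // Assumption: an empty or whitespace-only key counts as "no key set".
    match api_key.map(str::trim).filter(|k| !k.is_empty()) {
        Some(key) => ChatBackend::Remote { api_key: key.to_string() },
        None => ChatBackend::LocalGemma,
    }
}
```

In practice the key would come from something like std::env::var("ANTHROPIC_API_KEY").ok(); with no key present the function falls through to the local model, matching the "no key required" behavior.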

Quick start

1. Build (or grab a release)

cargo build --release
# binaries: target/release/{layer0, layer0-server, layer0-mcp}

Requires Rust stable + a C toolchain (MSVC on Windows, gcc/clang elsewhere). SQLite is bundled. Prebuilt archives are on the GitHub Releases page, named layer0-<target-triple>.{zip,tar.gz} and containing all three binaries.

2. Initialize

layer0 init

Writes ~/.layer0/config.toml, creates data dirs, and generates .claude/mcp.json + .cursor/mcp.json in the current directory.

3. Serve (frictionless)

layer0 serve

On first run this installs llama.cpp, downloads the default embedding model (nomic-embed-text-v1.5) and — if no chat key is set — the local chat model (gemma-4-E4B-it), starts the sidecar(s), and serves on http://127.0.0.1:8080. To use Claude instead of local gemma, set ANTHROPIC_API_KEY before serving.

4. Use it

layer0 store "layer0 indexes chunks with sqlite-vec."
layer0 search "vector search"
layer0 ask "What does layer0 use for vector search?"
layer0 status

Configuration

Global config: ~/.layer0/config.toml (see config/default.toml for the fully-commented template). Environment overrides use the LAYER0__ prefix (double underscore separates nested keys, e.g. LAYER0__SERVER__PORT).
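
As an illustration of the naming rule, the sketch below maps an override variable to the nested config key it targets. The function name env_to_key_path is hypothetical, and lowercasing the segments is an assumption about how keys are matched; layer0's real loader may differ.

```rust
/// Map an override such as LAYER0__SERVER__PORT to the key path ["server", "port"].
/// Returns None for variables that do not carry the LAYER0__ prefix.
pub fn env_to_key_path(var: &str) -> Option<Vec<String>> {
    let rest = var.strip_prefix("LAYER0__")?;
    // A double underscore separates nesting levels; single underscores stay in the key.
    let parts: Vec<String> = rest
        .split("__")
        .map(|segment| segment.to_ascii_lowercase())
        .collect();
    if parts.iter().any(|segment| segment.is_empty()) {
        return None;
    }
    Some(parts)
}
```

Under these assumptions LAYER0__SERVER__API_KEY resolves to [server].api_key, because only the double underscore is treated as a separator.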

Edit it interactively with layer0 config (a ratatui TUI) or by hand.

Key sections:

[server] host/port/cors, optional api_key
[llm] local embeddings backend
[chat] remote chat backend
[embeddings] dimensions (must match the model; nomic = 768)
[chunking] chunk_size/overlap
[rag] mode = hybrid/vector/graph, rerank, extract_graph
[installer] model repos/files, ports, auto_start
[update] repo, auto_check, auto_update
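
To show how the [chunking] settings interact, here is a minimal sliding-window sketch. It measures chunk_size and overlap in characters, which is an assumption (the real unit could be tokens or bytes), and the function name chunk_text is illustrative rather than layer0's API.

```rust
/// Split text into windows of `chunk_size` characters, where each window
/// repeats the last `overlap` characters of the previous one.
pub fn chunk_text(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(
        chunk_size > 0 && overlap < chunk_size,
        "overlap must be smaller than chunk_size"
    );
    // Work on chars so multi-byte UTF-8 sequences are never split.
    let chars: Vec<char> = text.chars().collect();
    let step = chunk_size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break; // the final window reached the end of the text
        }
        start += step;
    }
    chunks
}
```

A larger overlap keeps sentences that straddle a boundary retrievable from either neighbor, at the cost of more chunks to embed and store.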

Auth

Set [server].api_key to require an X-API-Key header (or Authorization: Bearer <key>) on every request except /health. When the key is unset, the API is open, which is the default for local use.
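
The rule above can be expressed as a small predicate. This is a sketch under stated assumptions, not the server's actual middleware: the is_authorized name and the header-slice representation are hypothetical, and a production check would normally use a constant-time comparison.

```rust
/// Decide whether a request may proceed.
/// `headers` is a list of (name, value) pairs; `configured_key` mirrors [server].api_key.
pub fn is_authorized(path: &str, headers: &[(&str, &str)], configured_key: Option<&str>) -> bool {
    let expected = match configured_key {
        Some(key) => key,
        None => return true, // no key configured: the API is open
    };
    if path == "/health" {
        return true; // liveness probe is exempt from auth
    }
    headers.iter().any(|(name, value)| {
        // Header names are case-insensitive.
        if name.eq_ignore_ascii_case("x-api-key") {
            *value == expected
        } else if name.eq_ignore_ascii_case("authorization") {
            value.strip_prefix("Bearer ") == Some(expected)
        } else {
            false
        }
    })
}
```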

HTTP API

Base: http://localhost:8080. Highlights:

POST /v1/documents              store (auto-chunked + embedded)
POST /v1/search                 hybrid search (vector + BM25 [+ graph] [+ rerank])
POST /v1/rag                    answer grounded in memory
GET/DELETE /v1/documents[/:id]  list / fetch / delete
/v1/graph/...                   nodes, edges, BFS query
POST /v1/embeddings             OpenAI-compatible
POST /v1/chat/completions       OpenAI-compatible (routes to the chat backend)
/v1/db/:database/:collection/... scoped variants of the above
GET  /v1/stats                  counts
GET  /health                    liveness (no auth)
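
For the scoped variants, a client might derive the path from an unscoped route as sketched below. This assumes the scoped form simply inserts db/<database>/<collection> after /v1/, which is one reading of the listing above and not confirmed; scoped_path is a hypothetical helper, and it does no percent-encoding of the names.

```rust
/// Build a scoped route such as /v1/db/work/notes/search from /v1/search.
/// Assumption: scoped routes reuse the tail of the unscoped route unchanged.
pub fn scoped_path(database: &str, collection: &str, route: &str) -> String {
    // Accept either "/v1/search" or a bare tail like "search".
    let tail = route
        .strip_prefix("/v1/")
        .unwrap_or(route)
        .trim_start_matches('/');
    format!("/v1/db/{database}/{collection}/{tail}")
}
```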

MCP

layer0 mcp        # stdio JSON-RPC 2.0

Tools: store_memory, search_memory, rag_query, get_document, delete_memory, graph_query, memory_stats. layer0 init writes the client config; or add it manually to .claude/mcp.json / .cursor/mcp.json.
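
For orientation, this is roughly what a tool invocation looks like on the wire. MCP clients call tools with the JSON-RPC method tools/call, passing the tool name and an arguments object. The sketch builds such a message for search_memory by hand; the query argument name is an assumption, since the tool's input schema is not listed here, and both function names are hypothetical.

```rust
/// Escape a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a JSON-RPC 2.0 `tools/call` request for the search_memory tool.
/// Assumption: the tool takes a single string argument named "query".
pub fn search_memory_request(id: u64, query: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"search_memory","arguments":{{"query":"{}"}}}}}}"#,
        id,
        json_escape(query)
    )
}
```

An MCP client normally performs an initialize handshake before calling tools, so this message alone is not a complete session.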

Skills

skills/ contains agentskills.io-compatible skills (layer0-setup, layer0-memory) — install them into any skills-aware agent (e.g. npx skills add <repo>).

Updating

layer0 update     # self-update from the latest GitHub release

With [update].auto_check enabled, layer0 serve logs when a newer release exists. With [update].auto_update enabled, the update is applied on startup and takes effect on the next restart.
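
An update check ultimately comes down to comparing the running version with the latest release tag. The sketch below shows one way to do that for tags of the form v1.2.3; it is an assumption about the approach, not the updater's real code, and parse_tag and is_newer are hypothetical names. Tags with pre-release suffixes are deliberately rejected rather than guessed at.

```rust
/// Parse a tag like "v1.2.3" (or "1.2.3") into (major, minor, patch).
pub fn parse_tag(tag: &str) -> Option<(u64, u64, u64)> {
    let mut parts = tag.strip_prefix('v').unwrap_or(tag).split('.');
    let major: u64 = parts.next()?.parse().ok()?;
    let minor: u64 = parts.next()?.parse().ok()?;
    let patch: u64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some((major, minor, patch))
}

/// True when `latest` is a strictly newer version than `current`.
/// Unparseable tags never trigger an update.
pub fn is_newer(current: &str, latest: &str) -> bool {
    match (parse_tag(current), parse_tag(latest)) {
        // Tuples compare lexicographically: major, then minor, then patch.
        (Some(cur), Some(new)) => new > cur,
        _ => false,
    }
}
```

Comparing numeric tuples instead of strings avoids the classic mistake of ranking 0.9.10 above 0.10.0.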

Releases & CI

GitHub Actions build and test on Linux/macOS/Windows. Pushing a v*.*.* tag builds release binaries for five targets (linux x64/arm64, macOS x64/arm64, windows x64) and publishes them to a GitHub Release. Release asset names embed the Rust target triple, which the self-updater matches.
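
The asset-matching step can be sketched as follows. The naming pattern layer0-<target-triple>.{zip,tar.gz} comes from the release description; which platforms get which extension is an assumption (here: .zip for Windows, .tar.gz elsewhere), the triples in the usage note are illustrative, and both function names are hypothetical.

```rust
/// Expected archive name for a Rust target triple.
pub fn release_asset_name(target_triple: &str) -> String {
    // Assumption: Windows targets ship as .zip, everything else as .tar.gz.
    let ext = if target_triple.contains("windows") { "zip" } else { "tar.gz" };
    format!("layer0-{target_triple}.{ext}")
}

/// Pick the asset for this platform out of a release's asset names, if present.
pub fn pick_asset<'a>(assets: &[&'a str], target_triple: &str) -> Option<&'a str> {
    let wanted = release_asset_name(target_triple);
    assets.iter().copied().find(|name| *name == wanted.as_str())
}
```

For example, pick_asset(&names, "aarch64-apple-darwin") would return layer0-aarch64-apple-darwin.tar.gz when that file is among the release assets, and None otherwise.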

Database schema (single SQLite file)

Table	Contents
`documents`	Source documents + metadata (FTS5 mirror in `documents_fts`)
`chunks`	Per-document chunks (the retrieval unit)
`vec_chunks`	sqlite-vec `vec0` cosine index over chunk embeddings
`graph_nodes` / `graph_edges`	Knowledge graph
`databases` / `collections`	Named scopes
`models`	Model registry
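
To illustrate how a BFS query over the graph tables might behave, here is an in-memory sketch that walks edges outward from a start node up to a depth limit. It treats each row of graph_edges as a directed (from, to) pair of node ids, which is an assumption about the schema, and the bfs function is illustrative rather than layer0's query code.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first traversal from `start`, returning (node, depth) in visit order.
/// Nodes deeper than `max_depth` are not visited.
pub fn bfs(edges: &[(u64, u64)], start: u64, max_depth: usize) -> Vec<(u64, usize)> {
    // Build an adjacency list from the edge rows.
    let mut adjacency: HashMap<u64, Vec<u64>> = HashMap::new();
    for &(from, to) in edges {
        adjacency.entry(from).or_default().push(to);
    }
    let mut seen = HashSet::from([start]);
    let mut queue = VecDeque::from([(start, 0usize)]);
    let mut order = Vec::new();
    while let Some((node, depth)) = queue.pop_front() {
        order.push((node, depth));
        if depth == max_depth {
            continue; // do not expand past the depth limit
        }
        if let Some(neighbors) = adjacency.get(&node) {
            for &next in neighbors {
                // `insert` returns false for nodes already queued, so cycles terminate.
                if seen.insert(next) {
                    queue.push_back((next, depth + 1));
                }
            }
        }
    }
    order
}
```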

License

MIT
