Long-term memory for AI agents. Hierarchical, perspectival, aging knowledge.
Status: 0.1.0 — MCP tool & CLI for local use. APIs are evolving. Author: Florian Burka, developed in dialogue with Claude
VecLayer organizes knowledge as a hierarchy: summaries over summaries, at arbitrary depth, from different perspectives on the same raw data. A search starts with the overview and drills down on demand — much like human remembering.
Instead of flat chunk lists or key-value stores, VecLayer provides structured, aging, self-describing memory. From the statistical shape of all memories — embedding clusters weighted by salience — an identity emerges organically.
Summaries are not a feature alongside others — they are the memory itself. The hierarchy that makes RAG better (overview before detail, navigation instead of flat lists) is the same structure from which identity emerges. And personality is not shaped by what you do often, but by what moved you. That is why salience measures significance, not frequency.
- Semantic search with hierarchy — query your documents and get results organized by document → section → paragraph, not flat chunk lists
- Persistent AI memory — an MCP server that gives coding assistants (Claude, etc.) long-term memory across sessions
- Automatic aging — important knowledge stays present, unused knowledge naturally fades
- Identity from memory — on connect, an agent receives a priming of who it is: core knowledge, open threads, recent learnings
- One primitive: Entry — Everything is an Entry. Four types: `raw`, `summary`, `meta`, `impression`. ID = `sha256(content)`, first 7 hex chars in CLI (like git). Identical content = identical ID = idempotent (see the example after this list).
- Seven perspectives — `intentions`, `people`, `temporal`, `knowledge`, `decisions`, `learnings`, `session`. Each perspective has hints for LLMs. Extensible with custom perspectives.
- Memory aging — RRD-inspired access tracking with fixed time windows. Important stays present, unused fades. Configurable degradation rules.
- Salience — Measures significance, not frequency. Composite of interaction density (0.5), perspective spread (0.25), and revision activity (0.25); a worked example follows below. High-salience entries survive aging.
- Identity — Emerges from salience-weighted embedding centroids per perspective. On connect, the agent receives a priming: core knowledge, open threads, recent learnings. The moment an agent wakes up and knows itself.
- Sleep cycle — Optional LLM-powered consolidation: reflect → think → add → compact. Most think actions are mechanical; only reflection and consolidation require an LLM.
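A quick illustration of the content-addressed IDs (an illustrative sketch; exact byte handling, e.g. trailing newlines, is an assumption here):

```bash
# Same content, same ID: hash the raw content, take the first 7 hex chars (like git)
printf 'Core decision: Rust' | sha256sum | cut -c1-7
```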
For technical details see ARCHITECTURE.md.
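As formulas (notation ours, weights from the list above), with interaction density $d$, perspective spread $p$, and revision activity $r$, and the per-perspective identity centroid $c$ over entry embeddings $e_i$ weighted by their saliences $w_i$:

$$\mathrm{salience} = 0.5\,d + 0.25\,p + 0.25\,r \qquad c = \frac{\sum_i w_i\, e_i}{\sum_i w_i}$$

For example, an entry with $d = 0.8$, $p = 0.4$, $r = 0.2$ scores $0.5 \cdot 0.8 + 0.25 \cdot 0.4 + 0.25 \cdot 0.2 = 0.55$.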
```bash
# Initialize a new VecLayer store
veclayer init

# Store knowledge
veclayer store ./docs                 # files/directories
veclayer store "Core decision: Rust"  # single text
veclayer store --perspective decisions "We chose Turso over Postgres"

# Recall
veclayer recall "architecture decisions"
veclayer recall --perspective decisions "backend"

# Drill down
veclayer focus abc1234

# Start server (MCP/HTTP)
veclayer serve
```

VecLayer resolves configuration in this order (highest priority first): CLI flags → environment variables → user config path overrides → user config globals → project-local config → git auto-detection → defaults.
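For example (a sketch using flags and variables documented later in this README):

```bash
# An environment variable overrides the config file's embedder setting
VECLAYER_EMBEDDER=ollama veclayer recall "backend"

# A CLI flag overrides both: use an alternate data directory for one invocation
veclayer -d /tmp/scratch-store recall "backend"
```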
```toml
# Global defaults
data_dir = "~/.local/share/veclayer"

[llm]
provider = "ollama"
model = "llama3.2"
base_url = "http://localhost:11434"
# api_key = "sk-..."  # for OpenAI-compatible providers (use HTTPS!)

# Project-specific overrides — matched by path glob and/or git remote
[[match]]
path = "~/work/myproject"
data_dir = "~/work/myproject/.veclayer-data"

[[match]]
git_remote = "github\\.com/myorg/.*"
data_dir = "~/.local/share/veclayer/myorg"

[match.llm]
model = "codellama"
```

VecLayer supports local and external embedding backends.
| Model | Dimension | Config value |
|---|---|---|
| BAAI/bge-small-en-v1.5 (default) | 384 | Xenova/bge-small-en-v1.5 |
| BAAI/bge-base-en-v1.5 | 768 | Xenova/bge-base-en-v1.5 |
| BAAI/bge-large-en-v1.5 | 1024 | Xenova/bge-large-en-v1.5 |
| all-MiniLM-L6-v2 | 384 | Xenova/all-MiniLM-L6-v2 |
Models download automatically on first use.
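Selecting a non-default local model might look like the sketch below (hypothetical: this assumes the local backend reads the same `[embedder]` keys as the external backend in the next section; verify against your config schema):

```toml
[embedder]
model = "Xenova/bge-base-en-v1.5"   # config value from the table above
dimension = 768                     # must match the model's output dimension
```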
Use an external HTTP server for GPU-accelerated embeddings. VecLayer tries Ollama format first (/api/embed), then falls back to OpenAI-compatible (/v1/embeddings).
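For reference, the two request shapes look like this (host, port, and model names are examples):

```bash
# Ollama-style endpoint, tried first
curl http://localhost:11434/api/embed \
  -d '{"model": "nomic-embed-text", "input": "hello world"}'

# OpenAI-compatible fallback
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "BAAI/bge-small-en-v1.5", "input": "hello world"}'
```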
Config (`.veclayer/config.toml` or `~/.config/veclayer/config.toml`):

```toml
[embedder]
type = "ollama"
model = "nomic-embed-text"          # or any model your server supports
base_url = "http://localhost:11434"
dimension = 768                     # must match the model's output dimension
```

Environment variables (override config):

```bash
VECLAYER_EMBEDDER=ollama
VECLAYER_OLLAMA_MODEL=nomic-embed-text
VECLAYER_OLLAMA_URL=http://localhost:11434
VECLAYER_OLLAMA_DIMENSION=768
```
TEI example (Hugging Face Text Embeddings Inference):

```bash
docker run --gpus all -p 8080:80 ghcr.io/huggingface/text-embeddings-inference \
  --model-id BAAI/bge-small-en-v1.5
```

Then in `config.toml`:

```toml
[embedder]
type = "ollama"
base_url = "http://localhost:8080"
model = "BAAI/bge-small-en-v1.5"
dimension = 384
```

VecLayer provides an MCP server for integration with Claude Code and Opencode.
Ensure veclayer is installed and available in your PATH:

```bash
cargo install --path .
```

First run downloads the embedding model (~130 MB). See First Run for details.

Single project (store in project directory):

```bash
claude mcp add memory -- veclayer serve --mcp-stdio
```

Multi-project setup with shared data directory:

```bash
# Add for each project with project-scoped memory (data directory is auto-created)
claude mcp add memory -- veclayer -d ~/.veclayer/data serve --mcp-stdio --project myapp
```

Opencode uses a similar MCP configuration format. Check the Opencode documentation for the current config path and schema.
Example configurations are available in `.claude/settings.json.example` (single-project) and `.claude/settings.json.example.multi-project` (multi-project).
Single project:
{
"mcpServers": {
"memory": {
"command": "veclayer",
"args": ["serve", "--mcp-stdio"]
}
}
}Multi-project (replace /home/you with your actual home directory — tilde ~ is not expanded in JSON):
{
"mcpServers": {
"memory": {
"command": "veclayer",
"args": ["-d", "/home/you/.veclayer/data", "serve", "--mcp-stdio", "--project", "myapp"]
}
}
}Use a single shared data directory with per-project MCP instances for isolation.
- One shared data directory (`~/.veclayer/data`)
- Each project gets its own MCP instance with `--project <name>`
- Project entries stay scoped to that project
- Personal entries (with `scope: "personal"`) are visible across all projects
- Identity priming is computed from project-scoped + personal entries
```bash
# Project A: frontend (data directory is auto-created with 0700 permissions)
cd ~/projects/frontend
claude mcp add memory -- veclayer -d ~/.veclayer/data serve --mcp-stdio --project frontend

# Project B: backend
cd ~/projects/backend
claude mcp add memory -- veclayer -d ~/.veclayer/data serve --mcp-stdio --project backend
```

Store knowledge that follows you across projects with `scope: "personal"`:
```json
{
  "content": "I prefer Rust for systems programming due to safety and performance",
  "scope": "personal",
  "perspectives": ["learnings"]
}
```

Project-scoped knowledge:

```json
{
  "content": "Frontend uses React with TypeScript",
  "scope": "project",
  "perspectives": ["knowledge"]
}
```

| Command | Description |
|---|---|
| `init` | Initialize a new VecLayer store |
| `store` | Store knowledge (text, file, directory) |
| `recall` | Semantic search with perspective filter |
| `focus` | Drill into an entry, show children |
| `reflect` | Identity snapshot, salience ranking, archive candidates |
| `think` | Curate: promote, demote, relate, discover, aging, LLM consolidation |
| `serve` | Start MCP/HTTP server |
| `status` | Store statistics |
| `perspective` | Manage perspectives (list, add, remove) |
| `history` | Show version/relation history of an entry |
| `archive` | Demote entries to `deep_only` visibility |
| `export` | Export entries to JSONL |
| `import` | Import entries from JSONL |
Aliases: `add` = `store`, `search`/`s` = `recall`, `f` = `focus`, `id` = `reflect`
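For example, these two invocations are equivalent:

```bash
veclayer recall "architecture decisions"
veclayer s "architecture decisions"     # alias for recall
```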
- Rust toolchain (stable, edition 2021+)
- protoc (Protocol Buffers compiler) — required by LanceDB
  - Debian/Ubuntu: `apt-get install protobuf-compiler`
  - macOS: `brew install protobuf`
- Internet access during first build — `ort-sys` downloads ONNX Runtime (~19 MB)
```bash
cargo build              # debug build
cargo build --release    # optimized build
cargo install --path .   # install to PATH
```

On first use, VecLayer downloads the embedding model (BAAI/bge-small-en-v1.5, ~130 MB via HuggingFace) to a local cache (`.fastembed_cache/` relative to the working directory). This requires internet access.
```bash
veclayer init
veclayer store "test"   # triggers model download on first run
```

`Failed to initialize FastEmbed: Failed to retrieve onnx/model.onnx`
The embedding model couldn't be downloaded. Common causes:
- No internet access or corporate TLS proxy intercepting HTTPS
- Fix: manually download the model files from `Xenova/bge-small-en-v1.5` on HuggingFace and place them in `.fastembed_cache/models--Xenova--bge-small-en-v1.5/snapshots/<commit_hash>/`
`Could not find protoc`
Install the Protocol Buffers compiler (see prerequisites above).
`Failed to connect to Ollama`
The `think` cycle and cluster summarization require a running Ollama instance. These features are optional — `store`, `recall`, `focus`, and all non-LLM commands work without it.
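To enable them, start Ollama and pull a model (assuming a standard Ollama install; `llama3.2` matches the config example above):

```bash
ollama pull llama3.2
ollama serve    # only needed if Ollama isn't already running as a service
```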
| Feature | Default | What it enables |
|---|---|---|
| `llm` | Yes | LLM-powered summarization, clustering, think/consolidation cycle |
| `sync` | No | Cross-store synchronization (experimental) |
Build without LLM dependencies:
```bash
cargo build --no-default-features
```

All core functionality (store, recall, focus, perspectives, aging, identity) works without the `llm` feature. Only summarization and the `think` command require it.
| Component | Technology |
|---|---|
| Language | Rust |
| Storage | LanceDB (local cache & indices) |
| Embeddings | fastembed (CPU, ONNX) — trait-based, swappable |
| Parsing | pulldown-cmark (Markdown), extensible |
| Server | axum (MCP + HTTP) |
| CLI | clap v4 |
| Config | TOML + ENV overrides (12-Factor) |
Phases 1–5.5 complete: core model, perspectives, aging/salience, identity, think cycle, tool ergonomics. Next up: Phase 6 (UCAN sharing). See Issues for the full roadmap.
The following are known issues tracked for future releases:
- HTTP server has no authentication — The REST API (`veclayer serve --http`) binds to localhost with restricted CORS but has no auth tokens. Do not expose it to untrusted networks. Auth is planned for a future release.
- API keys stored as plain strings — LLM API keys (for OpenAI-compatible providers) are held in memory as `String` without zeroing on drop. Acceptable for CLI use; not suitable for long-running shared server deployments without additional safeguards.
- Test env var manipulation — Some tests use `std::env::set_var`, which became `unsafe` in Rust 1.83+. These tests use `serial_test` for isolation but will need `unsafe` blocks in a future Rust edition.
- chunk.rs scope — The core `chunk` module (1000+ lines) is planned for decomposition before v0.2: `ChunkRelation` and relation constants will move to the `relations` module.
Explicitly rejected approaches — documented and reasoned, not forgotten.
| Rejected | Instead | Why |
|---|---|---|
| JSON annotations on entries | Content carries the semantics | No schema drift from optional fields |
| Paths as sole structure | Perspectives | Same entry, different views |
| Tags | Perspectives with hints | Tags are flat and unexplained |
| Separate vector spaces for emotions | Salience as composite score | One space, different weightings |
| S3 backends | Local files + Turso/pgvector | Simplicity, latency, offline capability |
| ACLs | UCAN | Decentralized, delegatable, offline-verifiable |
| Bearer tokens | UCAN with DID | Cryptographic, attenuatable |
| Static tool descriptions | Dynamic priming | Personalized per agent and session |
| Leaf/node separation | Everything is an Entry | One primitive, four types |
| "Trees" as concept | Perspectives | Trees are rigid, perspectives are views |
| Graph database | Relations on entries | The graph reveals itself in visualization |
| Metadata fields for emotions | Perspectives + content | The perspective is the semantics |
| Tool call hooks for auto-capture | Behavioral hints in priming | Intelligence stays with the agent |
MIT