aivmem/avm

AVM - AI Virtual Memory

AVM: a local, zero-cost shared memory system for multiple agents. Semantic search, FUSE mount, private + shared isolation.

Core Value

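  • Local shared memory for multiple agents - several agents share one memory layer, and their private spaces do not interfere with each other
  • Zero cost - local sentence-transformers (all-MiniLM-L6-v2), no API key required, no network dependency
  • Semantic search - vector similarity rather than keyword matching: "Iran military conflict" can find "tensions in the Middle East"
  • FUSE mount - operate on memory directly with cat/echo/ls; shell scripts and any other tool can use it
  • Multi-agent isolation - private space (/private/) + shared space (/memory/shared/), so collaboration does not get mixed up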

Why You Need AVM

The Problem: LLMs forget everything between sessions. Context windows are limited. RAG retrieves chunks, not structured knowledge.

AVM solves this:

| Challenge | Without AVM | With AVM |
|---|---|---|
| Multi-agent sync | Copy-paste, version chaos | Shared namespaces, :delta for changes |
| Memory isolation | All-or-nothing access | Private + shared, per-agent permissions |
| Context limits | Fixed window, truncate | Token-aware recall, fit any budget |
| Knowledge structure | Flat vector chunks | Linked graph, typed relationships |
| Discovery | Need exact keywords | Semantic search + browse/explore/timeline |

Real examples:

# Trading agent remembers across sessions
trader.remember("NVDA RSI at 72, overbought", importance=0.9, tags=["market"])
# 3 months later...
trader.recall("what did I observe about NVDA?", max_tokens=500)

# When the agent forgets what it knows, it can browse by topic and by time
trader.topics()      # "technical: 12, crypto: 8, macro: 5"
trader.timeline(7)   # "Mon: BTC signal, Tue: Fed notes..."

# Multi-agent collaboration
analyst.remember("SPY pattern", namespace="shared")
trader.recall("market patterns")  # sees analyst's shared memory

When to Use AVM

Best for:

  • 📦 Shared knowledge — Company docs, cron configs, market analysis that multiple agents access
  • 🤝 Multi-agent collaboration — Agent A writes analysis, Agent B recalls it
  • 🔄 Incremental sync — Read only changes since last read with :delta
  • 🗂️ External references — Paths, schedules, entity descriptions (not file content itself)

Not needed for:

  • 🔒 Private agent memory — Most agent frameworks have built-in memory tools
  • 📄 Code indexing — IDEs and LSP do this better
  • 📝 Ephemeral notes — Use TTL or just don't store

Rule of thumb: If only one agent needs it, use /private/ (auto-scoped to your agent). If multiple agents need it, put it in /memory/shared/.

AVM vs MemGPT

| | MemGPT/Letta | AVM |
|---|---|---|
| Philosophy | LLM manages its own memory | Explicit API, you control |
| Memory decisions | LLM decides when to store/retrieve | Agent calls remember()/recall() |
| Architecture | Agent framework | Pure storage layer |
| LLM dependency | Needs LLM for every memory op | No LLM needed |
| Multi-agent | Single agent focus | Built-in isolation + sharing |
| Interface | Python SDK | FUSE mount, MCP, CLI, Python |
| Integration | Self-contained | Works with shell, editors, any tool |

Analogy:

  • MemGPT = Autopilot (LLM drives)
  • AVM = Manual transmission (you drive)

When to use which:

  • MemGPT: Want autonomous memory, single agent, hands-off
  • AVM: Want explicit control, multi-agent, integrate with existing tools

They can work together: Use AVM as storage backend, add MemGPT-style logic on top for automatic memory management.

🎮 See it in action
    ╔═══════════════════════════════════════════════════════════╗
    ║     █████╗ ██╗   ██╗███╗   ███╗                          ║
    ║    ██╔══██╗██║   ██║████╗ ████║                          ║
    ║    ███████║██║   ██║██╔████╔██║                          ║
    ║    ██╔══██║╚██╗ ██╔╝██║╚██╔╝██║                          ║
    ║    ██║  ██║ ╚████╔╝ ██║ ╚═╝ ██║                          ║
    ║    AI Virtual Memory - Playground                         ║
    ╚═══════════════════════════════════════════════════════════╝

============================================================
  1. BASIC READ/WRITE
============================================================
✓ Written: /memory/lessons/risk_management.md
✓ Written: /memory/market/NVDA_analysis.md

📌 Read content:
   # Risk Management Rules
   ## Position Sizing
   - Never risk more than 2% of portfolio on a single trade
   - Use stop-loss orders religiously

============================================================
  2. FULL-TEXT SEARCH
============================================================
📌 Search: 'RSI overbought':
   [0.85] /memory/lessons/risk_management.md
   [0.72] /memory/market/NVDA_analysis.md

============================================================
  3. KNOWLEDGE GRAPH (LINKING)
============================================================
✓ Linked: NVDA_analysis → risk_management (related)

📌 Links from risk_management.md:
   → /memory/market/NVDA_analysis.md (related)

============================================================
  4. AGENT MEMORY (TOKEN-AWARE RECALL)
============================================================
✓ Remembered: NVDA warning (importance: 0.9)
✓ Remembered: BTC observation (importance: 0.7)

📌 Recall: 'NVDA risk' (max 500 tokens):
   ## Relevant Memory (2 items, ~120 tokens)
   [/memory/private/trader/nvda_warning.md] (0.92)
   NVDA showing weakness. RSI at 72, reduce exposure.

============================================================
  5. MULTI-AGENT ISOLATION
============================================================
✓ Analyst stored: SPY pattern (private to analyst)

📌 Trader tries to recall analyst's memory:
   Cannot access - private to analyst

📌 Trader stats: Private: 3
📌 Analyst stats: Private: 1

============================================================
  6. INCREMENTAL COLLABORATION
============================================================
# Analyst updates shared report
$ echo "New finding" >> /shared/report.md

# Trader reads only the changes
$ cat /shared/report.md:delta
# v3 (2026-03-07 10:30)
--- +++ @@ -5 +5,2 @@
+New finding

# Next read shows no changes
$ cat /shared/report.md:delta
(no changes)

============================================================
  7. METADATA & TAGS
============================================================
📌 Tag Cloud:
   market: 2, nvda: 1, warning: 1, btc: 1

============================================================
  8. NAVIGATION & DISCOVERY
============================================================
📌 Topics:
   📁 private: 3 memories
   🏷️ market: 2, technical: 1, crypto: 1

📌 Timeline (today):
   [14:30] nvda_alert: NVDA RSI at 72...
   [14:25] btc_note: BTC holding $65K...

📌 Workflow: topics() → browse() → explore() → recall()

============================================================
  DEMO COMPLETE 🎉
============================================================

Run it yourself:

pip install -e .
python playground.py

Performance

Benchmarked on Apple M2 Pro, 16GB RAM, macOS 15.7, Python 3.13, SQLite 3.45 (WAL mode).

| Metric | Value | Notes |
|---|---|---|
| Write throughput | 468 ops/s | WAL + async embedding |
| Read throughput (hot) | 724,000 ops/s | LRU cache hit |
| Read throughput (cold) | 3,300 ops/s | Cache miss → SQLite |
| Search throughput | 2,000 ops/s | FTS5 full-text |
| Cache hit rate | 95% | Zipf access pattern |
| Token savings | 97%+ | vs. loading all memories |

Key findings:

  • LRU cache is the dominant optimization — 420x read improvement
  • Multi-agent contention — SQLite write lock serializes writes; per-agent throughput drops linearly with agent count
  • Cold start — First query ~6x slower due to embedding model initialization
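
The interaction between an LRU cache and a Zipf access pattern can be explored with a small simulation. This is a minimal sketch and not AVM's cache: `LRUCache`, `zipf_hit_rate`, and all parameter values (key count, capacity, skew `s`) are assumptions chosen for illustration, so the printed hit rates will not match the benchmark figures above.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache that counts hits and misses (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        if key in self.data:
            self.data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.data[key]
        self.misses += 1
        value = load(key)  # simulate the slow path (e.g. a database read)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used key
        return value


def zipf_hit_rate(n_keys=1000, capacity=100, n_reads=20000, s=1.2, seed=0):
    """Replay a Zipf-distributed read trace and return the cache hit rate."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** s) for rank in range(1, n_keys + 1)]
    trace = rng.choices(range(n_keys), weights=weights, k=n_reads)
    cache = LRUCache(capacity)
    for key in trace:
        cache.get(key, load=lambda k: f"node-{k}")
    return cache.hits / n_reads


small = zipf_hit_rate(capacity=10)
large = zipf_hit_rate(capacity=100)
print(f"hit rate with capacity 10: {small:.2f}, with capacity 100: {large:.2f}")
```

Because a few keys dominate a Zipf trace, even a cache much smaller than the key space serves most reads, which is consistent with the cache being the dominant read optimization.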

See detailed benchmarks and ablation study for full analysis.

Multi-Agent Discovery

| Method | Hops | Latency | Architecture |
|---|---|---|---|
| Traditional recall | 4 | ~3.5ms | Per-agent search |
| TopicIndex | 1 | ~0.5ms | Pre-computed index |
| Librarian | 1 | ~1.7ms | Centralized router |
| Gossip | 1 | ~0.5ms | Decentralized bloom filters |
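
To make the "decentralized bloom filters" row concrete, the sketch below shows how an agent could advertise its topics as a compact Bloom filter digest that peers check before asking for a memory. It is a generic Bloom filter written under assumptions, not AVM's Gossip implementation: `BloomFilter`, `agents_to_ask`, the filter size, and the hash scheme are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def agents_to_ask(topic, digests):
    """Return the agents whose digest says they may know about the topic."""
    return [agent for agent, bf in digests.items() if bf.might_contain(topic)]


analyst = BloomFilter()
for topic in ("spy", "macro"):
    analyst.add(topic)

trader = BloomFilter()
for topic in ("nvda", "btc"):
    trader.add(topic)

digests = {"analyst": analyst, "trader": trader}
print(agents_to_ask("spy", digests))
```

A digest like this is cheap to exchange between agents; a positive answer still has to be confirmed by an actual lookup because Bloom filters can report false positives.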

Features

  • FUSE Mount - Mount as filesystem, use ls, cat, echo
  • Virtual Nodes - Access metadata via :meta, :links, :tags
  • MCP Server - Integrate with AI agents via MCP protocol
  • Agent Memory - Token-aware recall with scoring strategies
  • Multi-Agent - Permissions, quotas, audit logging
  • Tell System - Cross-agent messaging with priority levels (urgent/normal/low), webhook delivery
  • Full-Text Search - FTS5 (English recommended; Chinese lacks tokenizer support)
  • Semantic Search - Local embedding (all-MiniLM-L6-v2), zero API cost, auto-index on write
  • FAISS Index - High-performance vector search (21x faster than SQLite brute force)
  • Hybrid Search - Combines FTS + semantic for best precision/recall tradeoff
  • TopicIndex - O(1) recall for known topics, reduces hop count from 4 to 1
  • Librarian - Global knowledge router for multi-agent discovery (95% hop reduction)
  • Gossip Protocol - Decentralized agent discovery using bloom filter digests
  • Memory Consolidation - Sleep-like memory processing: decay, merge, summarize
  • Subscriptions - Path pattern monitoring with webhook push notifications
  • Memory Digest - Daily/on-demand summaries of recent activity
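
The Hybrid Search feature combines a full-text ranking with a semantic ranking. The source does not say how AVM fuses the two, so the sketch below uses reciprocal rank fusion (RRF), one common fusion technique, purely as an illustration. The function name, the constant `k=60`, and the example paths are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of paths into one list, best first.

    Each path scores sum(1 / (k + rank)) over the lists that contain it.
    """
    scores = {}
    for ranking in rankings:
        for rank, path in enumerate(ranking, start=1):
            scores[path] = scores.get(path, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by path for a stable result.
    return sorted(scores, key=lambda p: (-scores[p], p))


# Hypothetical result lists from a full-text search and a semantic search.
fts_hits = ["/memory/market/NVDA.md", "/memory/lessons/risk.md"]
semantic_hits = ["/memory/lessons/nvda_q4.md", "/memory/market/NVDA.md"]

fused = reciprocal_rank_fusion([fts_hits, semantic_hits])
print(fused)
```

A document found by both searches accumulates score from both lists, so it ranks ahead of documents found by only one.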

Install

pip install -e .

# For FUSE mount (optional)
pip install fusepy
# macOS: brew install macfuse
# Linux: apt install fuse3

Quick Start

Python API

from avm import AVM

avm = AVM()

# Read/Write
avm.write("/memory/lesson.md", "# Trading Lesson\n\nRSI > 70 = overbought")
node = avm.read("/memory/lesson.md")

# Search
results = avm.search("RSI")

# Agent Memory
mem = avm.agent_memory("akashi")
mem.remember("NVDA showing weakness", tags=["market", "nvda"])
context = mem.recall("NVDA risk", max_tokens=4000)
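
The `max_tokens` argument implies that recall has to fit results into a budget. A minimal sketch of one way to do that is greedy packing by relevance score. This is not AVM's scoring strategy: `Memory`, `estimate_tokens`, `pack_within_budget`, and the rough "4 characters per token" heuristic are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    path: str
    text: str
    score: float  # relevance to the query, higher is better


def estimate_tokens(text):
    """Crude token estimate: roughly 4 characters per token (assumption)."""
    return max(1, len(text) // 4)


def pack_within_budget(memories, max_tokens):
    """Pick the highest-scoring memories that fit in the token budget."""
    chosen, used = [], 0
    for memory in sorted(memories, key=lambda m: m.score, reverse=True):
        cost = estimate_tokens(memory.text)
        if used + cost <= max_tokens:
            chosen.append(memory)
            used += cost
    return chosen, used


candidates = [
    Memory("/memory/private/trader/nvda_warning.md", "a" * 40, 0.9),
    Memory("/memory/private/trader/long_report.md", "b" * 80, 0.8),
    Memory("/memory/private/trader/btc_note.md", "c" * 40, 0.5),
]
chosen, used = pack_within_budget(candidates, max_tokens=25)
print([m.path for m in chosen], used)
```

Here the long report is skipped because it would overflow the budget, while the shorter, lower-scoring note still fits.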

CLI

# Read/Write
avm read /memory/lesson.md
avm write /memory/lesson.md --content "New lesson"

# Full-text search
avm search "RSI"

# Move / rename (DB-level, no FUSE required)
avm mv /memory/old-name.md /memory/new-name.md        # single node
avm mv /memory/news- /memory/archive/news-            # prefix tree (all children)
avm mv /memory/2024/ /archive/2024/                   # directory-style move

# Semantic search (embedding)
avm semantic "Iran conflict news"           # semantic similarity
avm semantic "BTC market" --limit 5         # limit results
avm semantic "trading" --agent akashi       # agent context

# Agent Memory (token-aware recall, hybrid FTS+embedding)
avm recall "NVDA risk" --agent akashi --max-tokens 4000

FUSE Mount

Mount AVM as a filesystem for shell access.

Requirements:

  • macOS: brew install macfuse (approve system extension in System Settings → Privacy & Security)
  • Linux: apt install fuse3
# Configure mounts in ~/.config/avm/mounts.yaml
# Example:
#   mounts:
#     - mountpoint: ~/.openclaw/workspace/avm
#       agent_id: myagent

# Start daemon (manages all mounts)
# Recommended: use launchd/systemd for auto-start on login
avm-daemon start --daemon   # background (double-fork)
avm-daemon start            # foreground (for launchd/systemd managed processes)

# Check status
avm-daemon status

# Reload config
avm-daemon reload

# Stop daemon
avm-daemon stop

# Use standard shell commands (the paths below assume the mountpoint is /mnt/avm)
ls /mnt/avm/memory/
cat /mnt/avm/memory/lesson.md
echo "New insight" > /mnt/avm/memory/log.md

# Virtual nodes (append suffix to any file path)
cat /mnt/avm/memory/lesson.md:meta      # Metadata (JSON)
cat /mnt/avm/memory/lesson.md:links     # Related nodes
cat /mnt/avm/memory/lesson.md:tags      # Tags
cat /mnt/avm/memory/lesson.md:ttl       # Time-to-live
cat /mnt/avm/memory/lesson.md:history   # Version history
cat /mnt/avm/memory/:list               # Directory listing
cat '/mnt/avm/memory/:list?limit=10'    # Paginated
cat '/mnt/avm/memory/:list?tag=work'    # Filter by tag
cat '/mnt/avm/memory/:changes?minutes=5' # Recent changes
cat /mnt/avm/memory/:stats              # Statistics
cat "/mnt/avm/:search?q=RSI"            # Search
cat "/mnt/avm/:recall?q=NVDA"           # Token-aware recall

# Shortcuts - quick access via @xxx prefix
cat /mnt/avm/memory/:list               # Shows: @abc  lesson.md  Risk management...
cat /mnt/avm/@abc                       # Access file by shortcut
cat /mnt/avm/@abc:meta                  # Works with suffixes too
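
The virtual-node paths above follow a `path:suffix?query` shape. As an illustration of how such a path could be split into its parts, here is a small parser. It is a sketch under assumptions (the function name `parse_virtual_path` is hypothetical, and the mountpoint prefix is assumed to be stripped already), not the parsing code AVM uses.

```python
from urllib.parse import parse_qsl


def parse_virtual_path(path):
    """Split 'dir/file.md:suffix?key=value' into (node_path, suffix, params).

    Returns (path, None, {}) when the last path segment has no ':' suffix.
    """
    base, _, query = path.partition("?")      # query string, if any
    head, sep, last = base.rpartition("/")    # only the last segment may carry a suffix
    name, colon, suffix = last.partition(":")
    if not colon:
        return path, None, {}
    return head + sep + name, suffix, dict(parse_qsl(query))


for example in (
    "/memory/lesson.md:meta",
    "/memory/:list?limit=10",
    "/:search?q=RSI",
    "/memory/lesson.md",
):
    print(example, "->", parse_virtual_path(example))
```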

MCP Server

# Start MCP server
avm-mcp --user akashi
# mcp_servers.yaml
avm-memory:
  command: avm-mcp
  args: ["--user", "akashi"]

MCP Tools:

| Tool | Description |
|---|---|
| avm_recall | Token-controlled memory retrieval |
| avm_browse | Get paths + summaries (two-phase) |
| avm_fetch | Get full content of selected paths |
| avm_remember | Store memory with tags/importance |
| avm_search | Full-text search |
| avm_list | List by prefix |
| avm_read | Read specific path |
| avm_tags | Tag cloud |
| avm_recent | Time-based queries |
| avm_stats | Statistics |

Navigation & Discovery

When an agent forgets context or doesn't know keywords, use navigation methods:

mem = avm.agent_memory("trader")

# 1. Topic overview - see what's in memory
mem.topics()
# ## Memory Topics
# ### By Category:
#   📁 private: 15 memories
# ### By Tag:
#   🏷️ technical: 4 occurrences
#   🏷️ crypto: 3 occurrences

# 2. Browse tree - drill down without keywords
mem.browse("/memory", depth=2)
# 📁 private (15)
#   📁 trader (15)

# 3. Timeline - "what did I observe recently?"
mem.timeline(days=7, limit=10)
# ## Timeline (last 7 days)
# ### 2026-03-05
#   [14:30] nvda_rsi: NVDA RSI at 72...
#   [14:25] btc_support: BTC holding $65K...

# 4. Graph exploration - follow links
mem.explore("/memory/private/trader/nvda.md", depth=2)
# ## Starting from: .../nvda.md
# ### Hop 1:
#   [related] .../macd_analysis.md
# ### Hop 2:
#   [derived] .../trading_signal.md

Workflow: topics() → browse() → explore() → recall()

Configuration

# config.yaml
providers:
  # HTTP API
  - pattern: "/live/prices/{symbol}"
    handler: http
    config:
      url: "https://api.example.com/prices/${symbol}"
      headers:
        Authorization: "Bearer ${API_KEY}"
    ttl: 60

  # Script
  - pattern: "/system/status"
    handler: script
    config:
      command: "uptime"

  # Plugin
  - pattern: "/live/indicators/*"
    handler: plugin
    config:
      plugin: "my_plugins.talib"

permissions:
  - pattern: "/memory/*"
    access: rw
  - pattern: "/live/*"
    access: ro

default_access: ro
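
To illustrate how the `permissions` rules and `default_access` could be evaluated, the sketch below resolves a path against glob patterns. The first-match-wins order, the use of `fnmatch`-style globbing (where `*` also matches `/`), and the names `resolve_access` and `can_write` are assumptions; the source does not specify AVM's matching rules.

```python
from fnmatch import fnmatchcase

# Mirrors the permissions section of the config above.
RULES = [
    ("/memory/*", "rw"),
    ("/live/*", "ro"),
]
DEFAULT_ACCESS = "ro"


def resolve_access(path, rules=RULES, default=DEFAULT_ACCESS):
    """Return the access string of the first matching rule, else the default."""
    for pattern, access in rules:
        if fnmatchcase(path, pattern):
            return access
    return default


def can_write(path):
    return "w" in resolve_access(path)


for path in ("/memory/lesson.md", "/live/prices/NVDA", "/system/status"):
    print(path, resolve_access(path), "writable" if can_write(path) else "read-only")
```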

Handlers

| Handler | Description |
|---|---|
| file | Local filesystem |
| http | REST API calls |
| script | Execute commands |
| plugin | Python plugins |
| sqlite | Database queries |
| index | Structured index with status tracking |

Index Handler (CLI/MCP only)

Track project files and extract code signatures:

# Via CLI
avm index scan myapp /path/to/project
avm index status myapp
avm index sigs myapp

Note: The index handler is not exposed via the FUSE mount; use the CLI or MCP instead.

Custom Handlers

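import redis  # third-party client: pip install redis

from avm import BaseHandler, register_handler

class RedisHandler(BaseHandler):
    # Assumption: a Redis server on localhost:6379; adjust for your setup.
    client = redis.Redis(decode_responses=True)

    def read(self, path, context):
        # Use the node path as the Redis key; returns None if the key is missing.
        return self.client.get(path)

register_handler('redis', RedisHandler)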

Virtual Nodes

Access metadata via special suffixes:

| Suffix | Read | Write |
|---|---|---|
| :meta | JSON metadata | Update metadata |
| :links | Related nodes | Add links |
| :tags | Tags (comma-separated) | Set tags |
| :shared | Shared-with agents | Set agents |
| :ttl | Time remaining | Set expiration (5m/2h/1d/never) |
| :history | Change history (version, time, type) | - |
| :path | Relative path | - |
| :info | Available suffixes | - |
| :data | Raw content | - |
| :list | Directory listing | - |
| :list?limit=N&offset=M | Paginated listing | - |
| :list?q=keyword | Search + list | - |
| :list?tag=xxx | Filter by tag | - |
| :changes?minutes=N | Recently modified files | - |
| :delta | Diff since last read (auto-marks) | - |
| :mark | Read position (version) | Update marker |
| :stats | Statistics | - |
| :search?q= | Search results | - |
| :recall?q= | Token-aware recall | - |
| :inbox | Unread messages | Mark all read |
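
The `:delta` and `:mark` suffixes describe a per-reader position: each read returns the diff since that reader's last read and then advances the marker. The sketch below models that behaviour in memory with `difflib`. The class `VersionedFile` and its methods are hypothetical and only illustrate the idea; AVM's storage and diff format may differ.

```python
import difflib


class VersionedFile:
    """Keeps every version of a file plus one read marker per agent."""

    def __init__(self):
        self.versions = [""]  # versions[i] is the content at version i
        self.marks = {}       # agent_id -> last version that agent has read

    def write(self, content):
        self.versions.append(content)

    def append(self, text):
        self.write(self.versions[-1] + text)

    def read_delta(self, agent_id):
        """Return a unified diff since the agent's last read, then auto-mark."""
        last = self.marks.get(agent_id, 0)
        current = len(self.versions) - 1
        self.marks[agent_id] = current
        if last == current:
            return "(no changes)"
        diff = difflib.unified_diff(
            self.versions[last].splitlines(),
            self.versions[current].splitlines(),
            fromfile=f"v{last}",
            tofile=f"v{current}",
            lineterm="",
        )
        return "\n".join(diff)


report = VersionedFile()
report.write("Initial analysis\n")
print(report.read_delta("trader"))   # everything is new to the trader
report.append("New finding\n")
print(report.read_delta("trader"))   # only the appended line
print(report.read_delta("trader"))   # nothing new since the last read
```

Because markers are stored per agent, one agent reading the delta does not hide changes from another agent.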

High-Performance Vector Search

AVM supports multiple vector storage backends for semantic search:

SQLite (default)

Brute-force cosine similarity, good for <5k documents:

from avm.embedding import EmbeddingStore, LocalEmbedding
store = EmbeddingStore(avm_store, LocalEmbedding())
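
Brute-force cosine search simply scores the query against every stored vector and keeps the best `k`, which is why it is exact but scales linearly with the number of documents. A minimal, dependency-free sketch follows; the function names and the 3-dimensional toy vectors are assumptions (real all-MiniLM-L6-v2 embeddings have 384 dimensions), and this is not the `EmbeddingStore` code itself.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_search(query_vec, index, k=5):
    """Score every stored vector against the query; return the top k."""
    scored = [(cosine(query_vec, vec), path) for path, vec in index.items()]
    scored.sort(reverse=True)
    return scored[:k]


# Toy index with made-up vectors.
index = {
    "/memory/a.md": [1.0, 0.0, 0.0],
    "/memory/b.md": [0.9, 0.1, 0.0],
    "/memory/c.md": [0.0, 1.0, 0.0],
}
results = brute_force_search([1.0, 0.0, 0.0], index, k=2)
for score, path in results:
    print(f"[{score:.2f}] {path}")
```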

FAISS (recommended for scale)

21x faster than SQLite, supports exact and approximate search:

from avm.faiss_store import FAISSEmbeddingStore, get_faiss_store
from avm.embedding import LocalEmbedding

# Flat index (exact, <10k docs)
store = FAISSEmbeddingStore(avm_store, LocalEmbedding(), index_type="flat")

# HNSW index (approximate, >10k docs)
store = FAISSEmbeddingStore(avm_store, LocalEmbedding(), index_type="hnsw")

# Batch index documents
store.add_nodes(nodes)
store.save()

# Search
results = store.search("market analysis", k=5)

Benchmark (2000 documents):

| Backend | Query Latency | Recall |
|---|---|---|
| SQLite | 58ms | 100% |
| FAISS Flat | 2.7ms | 100% |
| FAISS HNSW | 2.7ms | ~90% |

Subscriptions & Webhooks

Monitor path patterns for changes:

# Subscribe with webhook
avm subscribe "/memory/shared/market/*" -a trader -m realtime -w http://localhost:3000/hook

# Subscribe with throttling (batches updates)
avm subscribe "/memory/shared/*" -a analyst -m throttled -t 60

# List subscriptions
avm subscriptions --agent trader

# Unsubscribe
avm unsubscribe "/memory/shared/market/*" -a trader

Webhook payload:

{
  "event": "write",
  "path": "/memory/shared/market/nvda.md",
  "pattern": "/memory/shared/market/*",
  "agent_id": "trader",
  "timestamp": "2026-03-23T09:15:00Z"
}
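
The payload fields above suggest how a write event is matched to subscriptions: each subscription whose pattern matches the written path gets one payload. The sketch below reproduces that shape. The matching semantics (`fnmatch`-style globbing, where `*` also matches `/`) and the names `subscriptions` and `payloads_for` are assumptions, not AVM internals.

```python
import json
from datetime import datetime, timezone
from fnmatch import fnmatchcase

# Hypothetical in-memory subscription table.
subscriptions = [
    {"pattern": "/memory/shared/market/*", "agent_id": "trader"},
    {"pattern": "/memory/shared/*", "agent_id": "analyst"},
]


def payloads_for(event, path, subs, now=None):
    """Build one webhook payload per subscription whose pattern matches path."""
    now = now or datetime.now(timezone.utc)
    timestamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    return [
        {
            "event": event,
            "path": path,
            "pattern": sub["pattern"],
            "agent_id": sub["agent_id"],
            "timestamp": timestamp,
        }
        for sub in subs
        if fnmatchcase(path, sub["pattern"])
    ]


when = datetime(2026, 3, 23, 9, 15, tzinfo=timezone.utc)
payloads = payloads_for("write", "/memory/shared/market/nvda.md", subscriptions, now=when)
print(json.dumps(payloads, indent=2))
```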

Cross-Agent Messaging (Tell)

Send important messages to other agents:

# Send urgent message (injected into recipient's next read)
echo "DB schema changed!" > "/mnt/avm/tell/kearsarge?priority=urgent"   # quoted so the shell does not glob the "?"

# Send normal message
echo "FYI: New API deployed" > /mnt/avm/tell/kearsarge

# Broadcast to all agents
echo "Team meeting at 3pm" > /mnt/avm/tell/@all

# Check your inbox
cat /mnt/avm/:inbox

# Mark all as read
cat "/mnt/avm/:inbox?mark=read"

Priority levels:

  • urgent - Injected into next file read (any file)
  • normal - Shown in :inbox
  • low - Only shown when explicitly reading :inbox

Two-Phase Retrieval

For large result sets, use two-phase retrieval to save tokens:

# Phase 1: Get paths + summaries (~200 tokens)
cat "/mnt/avm/memory/:search?q=NVDA"
# → [0.85] /memory/market/NVDA.md
# →     RSI overbought warning...
# → [0.72] /memory/lessons/nvda_q4.md
# →     Down 15% after Q4 earnings...

# Phase 2: Get selected content (~300 tokens)
cat /mnt/avm/memory/market/NVDA.md

# Total: 500 tokens vs 2000 tokens (75% saved)
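
The same browse-then-fetch idea can be sketched in Python: phase 1 returns only paths and one-line summaries, and phase 2 loads full content for the paths the agent selects. The in-memory `DOCS` store, the substring match, and the functions `summary`, `browse`, and `fetch` are assumptions for illustration; they stand in for AVM's search and are not its API.

```python
# Hypothetical in-memory store: first line is a summary, the rest is detail.
DOCS = {
    "/memory/market/NVDA.md": "RSI overbought warning\n" + "Detailed notes. " * 50,
    "/memory/lessons/nvda_q4.md": "Down 15% after Q4 earnings\n" + "Post-mortem. " * 50,
    "/memory/market/BTC.md": "BTC holding support\n" + "Chart notes. " * 50,
}


def summary(text, limit=80):
    """Use the first line, truncated, as a cheap summary."""
    return text.splitlines()[0][:limit]


def browse(query):
    """Phase 1: return (path, summary) pairs for matching documents."""
    q = query.lower()
    return [
        (path, summary(text))
        for path, text in DOCS.items()
        if q in path.lower() or q in text.lower()
    ]


def fetch(paths):
    """Phase 2: return full content only for the selected paths."""
    return {path: DOCS[path] for path in paths}


hits = browse("NVDA")
selected = fetch([hits[0][0]])

phase1_chars = sum(len(p) + len(s) for p, s in hits)
phase2_chars = sum(len(t) for t in selected.values())
all_chars = sum(len(DOCS[p]) for p, _ in hits)
print(f"two-phase: {phase1_chars + phase2_chars} chars, load-everything: {all_chars} chars")
```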

Linux-Style Permissions

avm.init_permissions({
    "users": {
        "akashi": {
            "groups": ["trading", "admin"],
            "capabilities": ["search_all", "write", "sudo"]
        },
        "guest": {
            "groups": [],
            "capabilities": []
        }
    }
})

# Check permissions
user = avm.get_user("akashi")
avm.check_permission(user, "/memory/private/akashi/note.md", "write")

# API keys for skills
key = avm.create_api_key(user, paths=["/memory/*"], actions=["read"])

Multi-Bot Architecture

┌─────────────────────────────────────────┐
│           Application                   │
├─────────────────────────────────────────┤
│ Akashi → avm-mcp --user akashi ─┐       │
│ Yuze   → avm-mcp --user yuze   ─┼─→ DB  │
│ Laffey → avm-mcp --user laffey ─┘       │
└─────────────────────────────────────────┘
  • Each bot runs its own MCP process
  • A shared database provides cross-bot memory
  • Authentication happens at startup, so no token is needed per request

Database

Default location: ~/.local/share/avm/avm.db

Override:

avm --db /path/to/custom.db read /memory/note.md
XDG_DATA_HOME=/custom/path avm read /memory/note.md

FUSE Daemon Architecture

The daemon manages multiple FUSE mounts as separate fork()ed child processes. Each child gets its own /dev/macfuseN slot and is started serially (the parent polls stat().st_dev to confirm the previous mount is live before forking the next).
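
The `st_dev` check works because a mounted directory reports a different device ID than its parent directory. A minimal sketch of such a poll is shown below. The function names, timeout, and polling interval are assumptions, and the daemon's actual loop may differ.

```python
import os
import tempfile
import time


def is_mount_live(mountpoint):
    """True if the directory sits on a different device than its parent."""
    try:
        here = os.stat(mountpoint).st_dev
        parent = os.stat(os.path.dirname(os.path.abspath(mountpoint))).st_dev
    except OSError:
        return False  # missing path or stat failure: treat as not mounted
    return here != parent


def wait_for_mount(mountpoint, timeout=5.0, interval=0.05):
    """Poll until the mount is live or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_mount_live(mountpoint):
            return True
        time.sleep(interval)
    return is_mount_live(mountpoint)


# Demo: a plain directory is never reported as a live mount.
with tempfile.TemporaryDirectory() as tmp:
    plain_dir = os.path.join(tmp, "not_mounted")
    os.mkdir(plain_dir)
    demo_result = wait_for_mount(plain_dir, timeout=0.2)
print("plain directory mounted?", demo_result)
```

A parent process could call `wait_for_mount` after each fork and only start the next child once it returns True, which gives the serial start-up described above.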

GPU Embedding (macOS MPS)

os.fork() invalidates the Apple GPU (MPS/XPC) context in the child. AVM solves this with a per-child multiprocessing.Pipe proxy:

parent (MPS GPU)                    child(akashi)
  LocalEmbedding(MPS)  ←──────────  PipeEmbeddingProxy(child_conn)
  EmbeddingPipeServer  ──send/recv──  encode("text") → [0.1, -0.3, ...]
  • Model loaded once in the parent, shared across all children
  • Each child has its own isolated Pipe fd pair — no cross-agent access
  • avm recall / avm semantic run in the main process and use MPS directly
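
The request/response pattern behind the Pipe proxy can be sketched with `multiprocessing.Pipe`. To keep the example runnable anywhere, the serving side runs in a thread instead of across a real `fork()`, and `fake_encode` stands in for the embedding model; both are assumptions. `PipeEmbeddingProxy` borrows its name from the diagram above, but the code is illustrative and not AVM's implementation.

```python
import multiprocessing
import threading


def fake_encode(text):
    """Stand-in for the real embedding model (assumption for this sketch)."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


def serve_embeddings(conn, encode):
    """Model-owning side: answer encode requests until told to stop."""
    while True:
        try:
            text = conn.recv()
        except EOFError:
            break
        if text is None:  # sentinel: shut down
            break
        conn.send(encode(text))


class PipeEmbeddingProxy:
    """Client side: looks like an embedder but forwards work over a pipe."""

    def __init__(self, conn):
        self.conn = conn

    def encode(self, text):
        self.conn.send(text)
        return self.conn.recv()


parent_conn, child_conn = multiprocessing.Pipe()
server = threading.Thread(
    target=serve_embeddings, args=(parent_conn, fake_encode), daemon=True
)
server.start()

proxy = PipeEmbeddingProxy(child_conn)
vector = proxy.encode("NVDA risk")
print(vector)

child_conn.send(None)  # ask the server loop to exit
server.join(timeout=1)
```

The client never loads a model of its own, so only the side that owns the model needs a valid GPU context.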

Versions

  • v1.3.0 - GPU Pipe proxy, avm mv, fork-based daemon with st_dev polling
  • v1.2.0 - FAISS vector search (21x speedup), webhooks, hybrid search
  • v1.1.0 - TopicIndex O(1) recall, Gossip protocol, memory consolidation
  • v0.9.0 - Rename to AVM, FUSE mount with virtual nodes
  • v0.8.0 - Two-phase retrieval (browse + fetch)
  • v0.7.0 - Linux-style permissions, MCP server
  • v0.6.0 - Advanced features (sync, tags, export)
  • v0.5.0 - Multi-agent support
  • v0.4.0 - Agent Memory (token-aware recall)
  • v0.3.0 - Linked Retrieval + Document Synthesis
  • v0.2.0 - Config-driven providers/permissions
  • v0.1.0 - Core VFS

License

MIT
