avisual memories

The only AI memory system that prevents mistakes before they happen.

Completely free and open source. No API keys, no subscriptions, no paywall. Everything runs locally on your machine.

A brain-like memory system for AI coding agents that gives them persistent, searchable long-term memory with neural-inspired retrieval — plus antipatterns that proactively warn agents when they're about to repeat past mistakes.

Works with any MCP-compatible client — Claude Code, Cursor, Windsurf, VS Code (Copilot), Cline, and more. Instead of starting every conversation from scratch, memories recalls relevant context from past sessions — what you worked on, what went wrong, what patterns you use — and surfaces it when needed.

See also Nate B Jones' similar approach to persistent AI memory.

Early release (March 2026) — this project is under active development. The main branch is the stable release; active development happens on the dev branch in the development repo. Contributions, feedback, and issues are welcome.

Why avisual memories?

Unlike basic RAG systems or commercial memory APIs, avisual memories is built like an actual brain:

Where alternatives like Mem0 ($249/mo), LangChain Memory, and ChromaDB cover only parts of this list, avisual memories combines:

  • Antipatterns (proactive warnings)
  • Spreading activation (neural retrieval)
  • Hebbian learning (auto-connection)
  • Memory consolidation (decay/merge)
  • Local-first (no API costs)
  • Open source (Apache 2.0)
  • Multi-agent memory sharing
  • Works with any AI agent

The antipatterns feature alone saves hours of debugging. When your agent tries to repeat a mistake (like using rm instead of trash, or browser automation that got blocked), memories surfaces a warning before the command runs.
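
Conceptually, the pre-execution check works like the sketch below. This is a toy substring matcher with invented triggers and lessons; the real system recalls antipatterns semantically via embeddings:

```python
def check_antipatterns(command, antipatterns):
    """Return the lessons of stored antipatterns whose trigger matches a command.

    `antipatterns` is a list of (trigger, lesson) pairs. This toy version uses
    substring matching; memories itself matches semantically, not literally.
    """
    return [lesson for trigger, lesson in antipatterns if trigger in command]

# Hypothetical stored antipatterns:
warnings = check_antipatterns(
    "rm -rf build/",
    [
        ("rm ", "Use `trash` instead of `rm`; deletions are unrecoverable."),
        ("headless chrome", "Browser automation on this site got blocked last time."),
    ],
)
# `warnings` now holds the rm lesson, surfaced before the command runs.
```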

How it works

Memories stores knowledge as atoms (discrete facts, experiences, skills, antipatterns) connected by synapses (weighted relationships). Retrieval uses spreading activation — when you recall one memory, activation flows through connected memories like neural pathways, surfacing related knowledge you didn't explicitly search for.
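
A minimal sketch of spreading activation (the decay rate, depth, and graph here are illustrative, not the project's actual tuning):

```python
def spread_activation(synapses, seed, depth=2, decay=0.5):
    """Propagate activation outward from a recalled atom.

    `synapses` maps atom id -> list of (neighbour id, synapse weight) pairs.
    Each hop multiplies the signal by the synapse weight and a decay factor;
    an atom keeps the strongest signal that reaches it.
    """
    activation = {seed: 1.0}
    frontier = [seed]
    for _ in range(depth):
        next_frontier = []
        for atom in frontier:
            for neighbour, weight in synapses.get(atom, []):
                signal = activation[atom] * weight * decay
                if signal > activation.get(neighbour, 0.0):
                    activation[neighbour] = signal
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return activation

# Recalling "auth-bug" also surfaces the connected fix and a related note,
# even though neither was explicitly searched for:
graph = {
    "auth-bug": [("jwt-fix", 0.8), ("env-config", 0.4)],
    "jwt-fix": [("token-expiry", 0.9)],
}
scores = spread_activation(graph, "auth-bug")
```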

The system exposes a standard MCP server with 13 tools (remember, recall, connect, forget, amend, reflect, status, pathway, stats, create_task, update_task, list_tasks, stale_memories) that work with any MCP-compatible client.

For Claude Code users, optional hooks provide deep integration — automatic context injection on every prompt, learning from tool outputs, Hebbian strengthening at session end, and sub-agent memory propagation. Hooks are Claude Code-specific; the MCP tools work everywhere.

Architecture

┌─────────────────────────────────────────────────┐
│                 Any MCP Client                  │
│  Claude Code · Cursor · Windsurf · VS Code · …  │
│                                                 │
│  MCP Tools: remember, recall, connect, forget,  │
│    amend, reflect, status, pathway, stats,      │
│    create_task, update_task, list_tasks,        │
│    stale_memories                               │
│                                                 │
│  Claude Code only (optional):                   │
│    Hooks ──→ auto-recall, learn from errors,    │
│              Hebbian learning, sub-agent merge  │
└──────────────────────┬──────────────────────────┘
                       │ MCP (stdio)
┌──────────────────────▼──────────────────────────┐
│                memories server                  │
│                                                 │
│  Brain ──→ Retrieval (spreading activation)     │
│        ──→ Learning (Hebbian, auto-linking,     │
│             supersession, novelty gating)       │
│        ──→ Consolidation (decay, merge, prune,  │
│             LTD, STC, abstraction, feedback)    │
│        ──→ Context (budget compression)         │
│                                                 │
│  Storage: SQLite + sqlite-vec + FTS5            │
│  Embeddings: Ollama (nomic-embed-text, 768-dim) │
└─────────────────────────────────────────────────┘

Learning pipeline

When you remember() an atom, the learning engine automatically:

  • Auto-links — vector search finds related atoms and creates typed synapses (related-to, caused-by, elaborates, warns-against, contradicts)
  • Detects supersession — near-duplicate atoms (>0.9 similarity, same type) get a supersedes synapse; the older atom's confidence is reduced
  • Gates novelty — hooks check whether incoming content is genuinely new before storing, avoiding redundant atoms
  • Suggests regions — infers the best region from project context, keyword matching, or majority vote of similar atoms
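
In outline, the auto-link and supersession steps amount to a similarity pass over existing atoms. The 0.9 supersession cutoff matches the text above; the 0.7 "related" threshold and the data shapes are assumptions for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def plan_synapses(new_atom, existing, related_threshold=0.7, supersede_threshold=0.9):
    """Decide which synapses a newly remembered atom should receive.

    Near-duplicates (> supersede_threshold, same type) yield a `supersedes`
    link; other sufficiently similar atoms yield a plain `related-to` link.
    """
    synapses = []
    for other in existing:
        sim = cosine(new_atom["embedding"], other["embedding"])
        if sim > supersede_threshold and other["type"] == new_atom["type"]:
            synapses.append(("supersedes", other["id"]))
        elif sim > related_threshold:
            synapses.append(("related-to", other["id"]))
    return synapses

# A new fact nearly identical to atom "a" supersedes it; "b" is merely related:
new = {"type": "fact", "embedding": [1.0, 0.0]}
existing = [
    {"id": "a", "type": "fact", "embedding": [1.0, 0.05]},
    {"id": "b", "type": "skill", "embedding": [0.8, 0.6]},
]
links = plan_synapses(new, existing)
```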

Consolidation (reflect)

Calling reflect() runs a full consolidation cycle — 16 operations modelled on biological memory consolidation:

Phase Operations
Tune Auto-tune retrieval scoring weights based on feedback signals
Reclassify Fix misclassified antipatterns; apply user feedback (good/bad)
Resolve Settle contradiction pairs; reconsolidate superseded atoms
Decay Reduce confidence of stale atoms and synapse strengths
Prune Remove weak synapses, stale warns-against links, dormant connections
LTD & STC Long-term depression for unactivated synapses; expire STC tags
Abstract Cluster similar experiences into semantic summary atoms
Merge Unify exact and near-duplicate atoms (hash-based then embedding-based)
Promote Boost confidence of frequently accessed atoms
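
The decay phase, for instance, can be modelled as exponential confidence decay. The 30-day half-life and 0.1 floor below are illustrative defaults, not the project's actual parameters:

```python
def decayed_confidence(confidence, days_since_access, half_life_days=30.0, floor=0.1):
    """Halve an atom's confidence every `half_life_days` without access,
    never dropping below a floor (atoms fade rather than vanish outright)."""
    factor = 0.5 ** (days_since_access / half_life_days)
    return max(floor, confidence * factor)

# An atom untouched for one half-life keeps half its confidence:
stale = decayed_confidence(0.8, days_since_access=30)   # 0.4
fresh = decayed_confidence(0.8, days_since_access=0)    # 0.8
```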

Quick Start

Install from Source

# 1. Clone and install
git clone https://github.com/avisual/memories-plugin.git
cd memories-plugin
uv sync

# 2. Run interactive setup
uv run python -m memories setup --interactive

The setup wizard will:

  • ✓ Check if Ollama is installed (and tell you how to install it)
  • ✓ Start Ollama daemon if needed (with your permission)
  • ✓ Download the embedding model (with your permission)
  • ✓ Create the database directory
  • ✓ Register the MCP server in Claude Code (with your permission)
  • ✓ Optionally configure hooks for Claude Code deep integration
  • ✓ Run a health check

Start a new session in your MCP client and memories will be active!

Tell Your AI Agent to Install This

Copy and paste this prompt to Claude Code (or any AI coding agent):

Install the memories MCP server for me. Run these commands:

1. Clone: git clone https://github.com/avisual/memories-plugin.git && cd memories-plugin && uv sync
2. Install Ollama if needed: brew install ollama && ollama serve &
3. Pull embedding model: ollama pull nomic-embed-text
4. Run setup: uv run python -m memories setup --non-interactive
5. Verify: uv run python -m memories diagnose

Then restart and I'll have persistent memory!

This gives the agent explicit commands to run: no ambiguity, no recursion.

Installation

Prerequisites

  • Python 3.13+
  • uv — Python package manager
  • Ollama — local embedding model server
  • Any MCP client: Claude Code, Cursor, Windsurf, VS Code with Copilot, Cline, or any other MCP-compatible environment

1. Clone and install

git clone https://github.com/avisual/memories-plugin.git
cd memories-plugin
uv sync

2. Run setup

# Interactive setup with prompts
uv run python -m memories setup

# Non-interactive setup (automated)
uv run python -m memories setup --non-interactive

3. Verify installation

# Run diagnostics
uv run python -m memories diagnose

# Or run health check
uv run python -m memories health

You should see all components marked as [ok].

Manual Setup (Alternative)

If you prefer manual configuration, follow these steps:

Install and start Ollama

# macOS
brew install ollama
ollama serve &

# Pull the embedding model
ollama pull nomic-embed-text

Verify the installation

uv run python -m memories health

You should see:

memories health check:
  atoms: 0
  synapses: 0
  regions: 0
  db_size: 0.05 MB
  ollama: healthy
  model: nomic-embed-text

Register the MCP server

Add to your MCP client's configuration. The server command is the same for all clients:

Claude Code (~/.claude.json):

{
  "mcpServers": {
    "memories": {
      "type": "stdio",
      "command": "/path/to/memories-plugin/.venv/bin/python",
      "args": ["-m", "memories"]
    }
  }
}

Cursor (Settings → MCP → Add Server):

{
  "memories": {
    "command": "/path/to/memories-plugin/.venv/bin/python",
    "args": ["-m", "memories"]
  }
}

VS Code / Cline (.vscode/mcp.json or Cline MCP settings):

{
  "servers": {
    "memories": {
      "command": "/path/to/memories-plugin/.venv/bin/python",
      "args": ["-m", "memories"]
    }
  }
}

Replace /path/to/memories-plugin with your actual clone path.

Configure hooks (Claude Code only)

For Claude Code users, add to your ~/.claude/settings.json:

{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "/path/to/memories/.venv/bin/python -m memories hook prompt-submit",
            "timeout": 15
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Bash|Write|Edit|MultiEdit|NotebookEdit",
        "hooks": [
          {
            "type": "command",
            "command": "/path/to/memories/.venv/bin/python -m memories hook post-tool",
            "timeout": 10
          }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "/path/to/memories/.venv/bin/python -m memories hook stop",
            "timeout": 30
          }
        ]
      }
    ]
  }
}

Replace /path/to/memories with your actual clone path.

Start a new session

The memory system is now active. In Claude Code with hooks, relevant memories are automatically recalled and injected on each prompt. In other MCP clients, use the recall tool to search memories and remember to store them.

MCP Tools

Once registered, your AI agent can use these tools directly in any MCP client:

Tool Description
remember Store a new memory atom. Auto-creates synaptic connections to related memories.
recall Search memories using semantic similarity with spreading activation.
connect Create or strengthen a connection between two memories.
forget Soft-delete (recoverable) or hard-delete a memory.
amend Update an existing memory. Re-embeds and re-links if content changes.
reflect Run memory consolidation — decay, prune, merge, promote, and more. Like sleep.
status Get system health: atom/synapse counts, regions, DB size, Ollama status.
pathway Visualize the connection graph radiating from a specific memory.
stats Hook invocation statistics, relevance scores, latency breakdowns.
create_task Create a task atom with lifecycle tracking (pending/in_progress/completed).
update_task Update task status; optionally flags linked memories as stale.
list_tasks List task atoms with optional status and region filters.
stale_memories Find memories linked to completed tasks that may be outdated.
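
Under the hood, an MCP client invokes these tools with a standard JSON-RPC `tools/call` request over stdio. The sketch below builds one for `recall`; the argument names are illustrative, so check the input schema the server advertises via `tools/list`:

```python
import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "recall",
        # Argument names are hypothetical; the server's tools/list response
        # declares the real input schema for each tool.
        "arguments": {"query": "how did we fix the auth token bug?"},
    },
}
print(json.dumps(request))
```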

Hooks (Claude Code only)

Hooks provide deep integration with Claude Code — automatic recall, learning from errors, and Hebbian strengthening. They're optional; the MCP tools work without them in any client. If you use a different MCP client, you get the full tool suite but without automatic background learning.

Hooks run automatically during Claude Code sessions:

Hook Event What it does
session-start SessionStart Initializes brain, starts session, recalls project-specific memories
prompt-submit UserPromptSubmit Recalls relevant memories (project-scoped + cross-project) and injects them as context before every prompt
pre-tool PreToolUse Recalls antipattern warnings before Bash/Task execution; captures intent as atoms
post-tool PostToolUse Learns from Bash errors, file edits, and tool outputs (novelty-gated before storing)
post-tool-failure PostToolUseFailure Captures tool failures as antipatterns or experiences
stop Stop Reads session transcript, applies Hebbian learning, propagates sub-agent atoms to parent session
subagent-stop SubagentStop Same as stop — runs in sub-agent (Task) sessions and merges atoms into the parent's learning graph
subagent-start SubagentStart Captures sub-agent delegation patterns as insights
pre-compact PreCompact Checkpoints Hebbian learning mid-session so atoms aren't lost if context compacts before stop fires
session-end SessionEnd Safety net: final Hebbian pass if Stop hook was missed
permission-request PermissionRequest Records permission requests (antipattern if dangerous, else experience)
task-completed TaskCompleted Records task completion milestones as experience atoms
notification Notification Captures elicitation dialog notifications (user clarification requests)

Sub-agent learning

When Claude Code spawns sub-agents (via the Task tool), each sub-agent runs its own stop hook. Memories detects the parent session automatically (by project + recency) and merges the sub-agent's atoms into the parent's co-activation graph. The parent's final stop then runs one consolidated Hebbian pass linking everything together.

Configuration

All settings use environment variables with the MEMORIES_ prefix:

Variable Default Description
MEMORIES_OLLAMA_URL http://localhost:11434 Ollama server URL
MEMORIES_EMBEDDING_MODEL nomic-embed-text Embedding model name
MEMORIES_EMBEDDING_DIMS 768 Embedding dimensions
MEMORIES_DB_PATH ~/.memories/memories.db Database file path
MEMORIES_CONTEXT_WINDOW_TOKENS 200000 Model context window size
MEMORIES_HOOK_BUDGET_PCT 0.02 Default hook injection budget (% of context window). Session-start uses 3%, prompt-submit 2%, pre-tool 0.5%.
MEMORIES_DEDUP_THRESHOLD 0.92 Cosine similarity above which a new atom is skipped as a near-duplicate
MEMORIES_REGION_DIVERSITY_CAP 2 Maximum atoms per project returned in a single retrieval pass
MEMORIES_DISTILL_THINKING false Use a local LLM to extract atomic facts from Claude thinking blocks
MEMORIES_DISTILL_MODEL llama3.2:3b Ollama model used for fact extraction (any generative model works)

Nested config uses double underscores:

# Change spreading activation depth
export MEMORIES_RETRIEVAL__SPREAD_DEPTH=3

# Change Hebbian learning increment
export MEMORIES_LEARNING__HEBBIAN_INCREMENT=0.1
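
A sketch of how the double-underscore convention folds flat environment variables into nested settings (the actual loader may differ; libraries like pydantic-settings handle this natively via a nested delimiter):

```python
def load_settings(environ, prefix="MEMORIES_"):
    """Fold prefixed env vars into a nested dict; `__` marks one nesting level."""
    settings = {}
    for key, value in environ.items():
        if not key.startswith(prefix):
            continue
        parts = key[len(prefix):].lower().split("__")
        node = settings
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return settings

# MEMORIES_RETRIEVAL__SPREAD_DEPTH=3 becomes {"retrieval": {"spread_depth": "3"}}:
settings = load_settings({
    "MEMORIES_RETRIEVAL__SPREAD_DEPTH": "3",
    "MEMORIES_DB_PATH": "/tmp/memories.db",
    "PATH": "/usr/bin",  # ignored: no MEMORIES_ prefix
})
```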

Atomic fact extraction (optional)

When MEMORIES_DISTILL_THINKING=true, the stop hook uses a local Ollama generative model to extract 2–5 discrete facts from each Claude thinking block, storing each as a separate atom. This produces a denser, more precisely linked memory graph at the cost of additional Ollama inference time.

# Enable with default model (llama3.2:3b)
ollama pull llama3.2:3b
export MEMORIES_DISTILL_THINKING=true

# Or use a different model
export MEMORIES_DISTILL_MODEL=mistral:7b

CLI Commands

Run python -m memories <command> (or uv run python -m memories <command> from source):

Command Description
setup Interactive or non-interactive setup wizard
health Quick health check (DB, Ollama, model)
diagnose Full diagnostics across all components
stats Session stats, hook performance, top atoms, latency
eval Show exactly what Claude sees for a given prompt (dry-run hook injection)
feedback Mark a recalled atom as good or bad (feedback <atom_id> good|bad)
backfill Scan all ~/.claude/projects/ transcripts and store novel insights as atoms. Auto-relinks the graph when done. Safe to run repeatedly.
relink Re-run auto_link for every atom to fill any missing synapses. Idempotent — existing synapses are strengthened, not duplicated.
normalise Rename fragmented region aliases to canonical names (e.g. merges general, project:git, project:utils).
reatomise Split large blob atoms into 2–5 atomic facts using a local LLM, soft-delete the originals, then auto-relink the graph. Requires Ollama.
migrate Import atoms from a legacy claude-mem database

Verifying injection

To see exactly what memories are injected for a given prompt:

echo '{"session_id":"x","prompt":"YOUR QUESTION","cwd":"'$(pwd)'"}' | \
  python -m memories hook prompt-submit

Backfilling historical transcripts

# Basic backfill (novelty-gated, safe to re-run)
python -m memories backfill

# With verbose output and LLM fact extraction
MEMORIES_DISTILL_THINKING=true python -m memories backfill --verbose

Maintaining graph quality

# After a backfill, fix any region fragmentation:
python -m memories normalise --verbose

# Split large blob atoms into atomic facts (requires Ollama):
python -m memories reatomise --verbose

# Re-wire the whole graph (runs auto_link for every atom):
python -m memories relink --verbose

Memory Types

Type Description
fact A verified piece of knowledge
experience Something learned from practice
skill A how-to or technique
preference A personal or project preference
insight A derived observation or conclusion
antipattern A known mistake to avoid (surfaced as warnings during recall)
task A tracked task with lifecycle (pending → in_progress → completed)

Data Storage

All data is stored locally in ~/.memories/:

  • memories.db — SQLite database with sqlite-vec for vector search and FTS5 for keyword search
  • backups/ — Automatic backups (configurable count, default 5)

No data leaves your machine. Embeddings are generated locally via Ollama.

Migration from claude-mem

If you have existing observations from the claude-mem plugin:

uv run python -m memories migrate --source ~/.claude-mem/claude-mem.db

Use --dry-run to preview without making changes.

Development

# Install with dev dependencies
uv sync --group dev

# Run unit tests (no external services needed)
uv run pytest -m "not integration"

# Run a specific test file
uv run pytest tests/test_retrieval.py

# Health check
uv run python -m memories health

Integration tests

Integration tests run against a real Ollama server with nomic-embed-text and are executed automatically in CI. To run them locally:

# Start Ollama and pull the embedding model
ollama serve &
ollama pull nomic-embed-text

# Run integration tests only
uv run pytest -m integration -v

# Run everything (unit + integration)
uv run pytest

The integration suite (tests/test_integration_recall.py) covers:

  • Embedding quality — dimensionality, cache consistency, semantic similarity
  • Vector & FTS search — relevance ranking, latency
  • Full recall pipeline — spreading activation, antipattern surfacing, region filters
  • Learning pipeline — auto_link synapse creation, supersession detection, warns-against
  • Consolidation — near-duplicate merging with real embedding similarity
  • Dedup — pre-insertion deduplication in remember()
  • Batch operations — batch vs single embedding consistency, vector updates
  • Larger graph recall — 15-atom, 3-cluster graph with cross-cluster hub activation

Tests are auto-skipped when Ollama is unavailable.

Troubleshooting

Multi-window support

Multiple MCP client windows can share memories simultaneously. Each window spawns its own server process; they all access the same ~/.memories/memories.db safely via SQLite WAL mode.
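
A minimal illustration of the same pattern with Python's stdlib sqlite3 (the temp path and toy `atoms` schema are stand-ins; the real database lives at ~/.memories/memories.db with its own schema):

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "memories.db")  # stand-in path

# Two independent connections, as two MCP server processes would have.
# The timeout makes each connection wait out brief write locks.
writer = sqlite3.connect(db_path, timeout=5.0)
reader = sqlite3.connect(db_path, timeout=5.0)

# WAL mode lets readers see consistent data while the writer commits.
mode = writer.execute("PRAGMA journal_mode = WAL").fetchone()[0]

writer.execute("CREATE TABLE IF NOT EXISTS atoms (id INTEGER PRIMARY KEY, content TEXT)")
writer.execute("INSERT INTO atoms (content) VALUES ('hello')")
writer.commit()

# The second connection sees the committed write immediately.
count = reader.execute("SELECT COUNT(*) FROM atoms").fetchone()[0]
```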

To see how many servers are running:

uv run python -m memories diagnose

"Ollama server unreachable"

Make sure Ollama is running:

# Check if Ollama is running
curl http://localhost:11434/api/tags

# Start Ollama manually
ollama serve

# Or as a service (macOS)
brew services start ollama

"Model not found: nomic-embed-text"

Pull the embedding model:

ollama pull nomic-embed-text

"Health check failed"

Run full diagnostics:

uv run python -m memories diagnose

This will check:

  • MCP server status
  • Ollama installation and daemon
  • Required model availability
  • Database health
  • Configuration files

Database issues

If you encounter database corruption or lock issues:

# Check database status
uv run python -m memories health

# Backup and reset (WARNING: deletes all memories)
mv ~/.memories/memories.db ~/.memories/memories.db.backup
uv run python -m memories health  # Creates fresh DB

Uninstall

To remove memories from Claude Code configuration (other MCP clients: remove the server entry from your MCP settings):

uv run python -m memories setup --uninstall

This removes MCP server registration and hooks but keeps your ~/.memories/ data directory. To delete all data:

rm -rf ~/.memories/

Contributing

Contributions welcome! Please:

  1. Fork the repo
  2. Create a feature branch
  3. Add tests for new features
  4. Ensure all tests pass: uv run pytest
  5. Submit a PR

Acknowledgments

This project builds on excellent open-source work:

Project By License Role
MCP Python SDK Anthropic MIT Model Context Protocol server framework
sqlite-vec Alex Garcia MIT Vector similarity search in SQLite
Ollama Ollama MIT Local model serving
nomic-embed-text Nomic AI Apache-2.0 Default embedding model
httpx Encode BSD-3 Async HTTP client
NumPy NumPy Developers BSD-3 Vector operations

See NOTICE for full attribution details.

License

Apache 2.0. See the LICENSE file for details.
