Languages: English · 简体中文 · Español · 日本語 · Русский
Qdrant-backed dual-memory for AI coding agents
Give Claude Code, Cursor, and OpenCode persistent semantic + structural memory across every project.
👋 Built by Dzmitry Sukhau — AI-native Solution / Software Architect / CTO
Available for consulting on AI products, integrating AI into existing products, and business-process automation.
If you're shipping LLM features, evaluating retrieval pipelines, hardening agentic systems, or building an AI-first product from scratch — let's talk.
supamem is a single-binary CLI that wires up a production-grade memory layer for any AI coding
assistant. Drop it into a fresh repo, run supamem init, and your agents instantly gain:
- 🔍 Semantic search over project notes, ADRs, decisions, and past conversations (hybrid sparse+dense retrieval)
- 🤖 MCP server that any compatible client (Claude Code, Cursor, OpenCode) can talk to
- 🪝 Per-client hooks that auto-load relevant memory at session start and on file edits
- 📊 Welford usage stats so you can see what memory is actually being recalled
- 🧪 Eval harness with a 33-query golden corpus to detect retrieval regressions
Battle-tested inside SoftChat (Phases 80.1–80.5) before being extracted into a standalone package every team can adopt.
The problem: Coding agents have no memory between sessions. Every time you open a new conversation in Claude Code / Cursor / OpenCode, the model has zero context about your codebase, past decisions, ADRs, known issues, or conventions. So either:
- You re-paste 5–15 KB of context at the start of every session (slow, error-prone, costly), or
- You let the agent flounder — it grep-walks the repo, asks redundant questions, forgets last week's decisions, and rediscovers the same gotchas you already documented six months ago.
The fix: A persistent semantic + structural memory layer that automatically retrieves the right 1–2 KB of context for the current prompt — no manual pasting, no re-explaining, no context blow-out.
Phase 80.1 bench (33 labeled goldens, real Claude Code sessions): −78.5% tokens vs naive whole-doc retrieval at the same recall, p95 73 ms end-to-end.
The full evaluation is the same one we ran inside SoftChat to lock the production pipeline. Methodology: 33 representative dev queries → 4 retrieval arms compared (baseline_union, tuned_current, tuned_hybrid, mem0_vector) → token count + recall CI + latency measured per arm.
Numbers below are per typical 30-turn Claude Code session assuming a real codebase with ~50 ADRs / insights / rules (≈ what SoftChat ships). YMMV — but the ratio between arms holds.
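The per-arm measurement loop is simple enough to sketch. This is a toy harness, not supamem's eval code: the arm names match the doc, but the queries, precomputed results, and word-based token estimate are illustrative stand-ins.

```python
# Toy regression harness: compare retrieval "arms" on a golden query set.
# Arm results and the whitespace-word token estimate are illustrative
# stand-ins, not supamem's real pipeline.
goldens = [
    {"query": "how do we handle billing webhooks?", "expected": "adr-billing"},
    {"query": "ChatService.generate timeout", "expected": "note-timeouts"},
]

# Precomputed results per arm: query -> list of (doc_id, retrieved_text)
arms = {
    "baseline_union": {
        "how do we handle billing webhooks?": [("adr-billing", "long " * 500)],
        "ChatService.generate timeout": [("note-timeouts", "long " * 500)],
    },
    "tuned_hybrid": {
        "how do we handle billing webhooks?": [("adr-billing", "short " * 100)],
        "ChatService.generate timeout": [("note-timeouts", "short " * 100)],
    },
}

def score_arm(results):
    """Return (recall, mean tokens/query) for one arm over the goldens."""
    hits, tokens = 0, 0
    for g in goldens:
        retrieved = results[g["query"]]
        hits += any(doc_id == g["expected"] for doc_id, _ in retrieved)
        tokens += sum(len(text.split()) for _, text in retrieved)
    return hits / len(goldens), tokens / len(goldens)

for name, results in arms.items():
    recall, mean_tokens = score_arm(results)
    print(f"{name}: recall={recall:.2f} tokens/query={mean_tokens:.0f}")
```

The real harness additionally tracks latency per arm and a confidence interval on recall, as described above.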
| Approach | Tokens/turn | Tokens/30-turn session | Notes |
|---|---|---|---|
| ❌ No memory layer | ≈ 0 auto-injected, but you paste context manually | 30,000–80,000 (manual paste, repeated) | You spend cognitive load on copying instead of building |
| ⚠️ Naive whole-doc RAG | ~5,800 | ~174,000 | Bloated, recalls big files when you only needed a paragraph |
| ✅ supamem tuned_hybrid | ~1,250 | ~37,500 | Same recall, −78.5% tokens vs naive RAG |
Anthropic API list pricing (Mar 2026): Sonnet 4.6 = $3 / Mtok input · Opus 4.7 = $15 / Mtok input.
| Model | Tokens saved/session vs naive RAG | Cost saved/session | Monthly (110 sessions) |
|---|---|---|---|
| Sonnet 4.6 | 136,500 | $0.41 | ~$45/dev |
| Opus 4.7 | 136,500 | $2.05 | ~$225/dev |
A 10-engineer team running Opus saves ~$2,250/month on input tokens alone — without counting the cost of slower iteration, lost decisions, and time spent re-pasting context. Output token savings (less hallucination, fewer back-and-forth turns) compound on top.
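The arithmetic behind these tables is straightforward; a quick sanity-check using the per-turn numbers and list prices stated above:

```python
# Reproduce the savings math from the tables above (prices and counts
# are taken from the surrounding text, not computed by supamem).
price_per_mtok = {"Sonnet 4.6": 3.00, "Opus 4.7": 15.00}  # $ per 1M input tokens

# naive RAG (~5,800/turn) vs tuned_hybrid (~1,250/turn) over a 30-turn session
tokens_saved_per_session = (5_800 - 1_250) * 30  # = 136,500

for model, price in price_per_mtok.items():
    per_session = tokens_saved_per_session / 1_000_000 * price
    monthly = per_session * 110  # 110 sessions/dev/month
    print(f"{model}: ${per_session:.2f}/session, ~${monthly:.0f}/dev/month")
```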
| | No memory | Naive RAG | mem0 / atomic facts | supamem (tuned_hybrid) |
|---|---|---|---|---|
| Auto-inject on session start | ❌ | ✅ | ✅ | ✅ |
| Hybrid sparse+dense retrieval | ❌ | ❌ | ❌ | ✅ |
| Code-identifier preservation | ❌ | ✅ | ❌ (drops names) | ✅ |
| Locked schema + golden eval | ❌ | ❌ | ❌ | ✅ |
| Multi-client (Claude/Cursor/OpenCode) | ❌ | ❌ | ✅ | ✅ |
| p95 latency | n/a | ~120 ms | ~80 ms | 73 ms |
| Token bloat | High (manual) | Highest | Low but lossy | Lowest with full recall |
Why hybrid? BM25 catches exact identifiers (ChatService.generate, env-var names,
file paths) that dense embeddings smear. Dense catches semantic intent ("how do we
handle billing webhooks?") that BM25 misses. RRF fusion combines both rankings so you
get the best of each.
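Reciprocal rank fusion itself fits in a few lines. A minimal sketch (k=60 is the common RRF default; supamem's exact fusion parameters aren't shown here, and the example rankings are made up):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: each document scores sum(1 / (k + rank))
    across all input rankings; higher combined score ranks first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

sparse = ["ChatService.generate", "billing_webhook", "env_vars"]      # BM25 order
dense = ["billing_webhook", "retry_policy", "ChatService.generate"]   # dense order
print(rrf_fuse([sparse, dense]))
# A doc ranked well by BOTH arms (billing_webhook) beats a doc ranked
# first by only one arm.
```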
Why not mem0? mem0's atomic-fact extraction loses code identifiers — recall on the 33-query bench was 0.015 (effectively zero). Great for personal CRM-style memory, not for code-aware retrieval.
# 1. Install (uv is the fastest path)
uv tool install supamem
# 2. Start Qdrant (one-time, ~30s)
docker run -d -p 6333:6333 -p 6334:6334 -v $HOME/.qdrant:/qdrant/storage qdrant/qdrant:latest
# 3. Bootstrap your project
cd your-project
supamem init
# 4. Wire it into your AI client
supamem install --client claude-code # or cursor, opencode
# 5. Confirm everything is healthy
supamem doctor
That's it. Open Claude Code (or your preferred client) inside the project — the memory tool is already on the menu. ✨
Run supamem live in a side terminal to watch every retrieval call as it happens — perfect alongside Claude Code / Cursor / OpenCode for instant visibility into the silent PreToolUse-hook injections (which save tokens by NOT showing UI).
The SessionStart banner (v0.1.4+) also lands a one-line status in your AI client at session open: 🧠 supamem v0.1.4 · <collection> · <N> chunks · audit <path> — auto-detects Claude Code / Cursor / OpenCode via env vars.
🎬 Interactive demo:
supamem-live.cast — drop into asciinema.org/player or run locally with asciinema play docs/media/supamem-live.cast.
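The core of tailing an audit JSONL is a follow-read loop. A minimal sketch of the idea (not supamem's implementation: the real `supamem live` also handles rotation, terminal resize, and pipe-safe output, and the event shape below is an assumption):

```python
import json
import time

def tail_jsonl(path, poll=0.5, from_start=False):
    """Yield parsed events appended to a JSONL audit file, tail -f style.
    Simplified stand-in: no rotation or resize handling."""
    with open(path) as f:
        if not from_start:
            f.seek(0, 2)  # jump to end of file; only watch new events
        while True:
            line = f.readline()
            if not line:
                time.sleep(poll)  # nothing new yet; wait and retry
                continue
            yield json.loads(line)
```

Each yielded event can then be rendered into a dashboard row (tool name, latency, hit count, and so on).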
| Feature | Description |
|---|---|
| 🔍 Hybrid retrieval | Tuned sparse (BM25) + dense (MiniLM) fusion, locked schema D-25 |
| 📚 Markdown chunker | Header-aware, 200-token chunks with 250-token soft max (T-1) |
| 🤖 MCP server | stdio (default) and http transports, official mcp SDK |
| 🪝 Multi-client hooks | Claude Code session-start, OpenCode session-start, Cursor MDC |
| 🧰 One-command install | Atomic config patching with auto-backup and rollback |
| 🩺 `supamem doctor` | Probe Qdrant, resolve config chain, surface version drift |
| 👀 `supamem live` | Rich-Live terminal dashboard tailing the audit JSONL — real-time visibility into retrieval calls (v0.1.4+) |
| 🎬 SessionStart banner | One-line cross-client banner injected at session open (Claude Code / Cursor / OpenCode), v0.1.4+ |
| 📊 Welford counters | Track recall rate, latency, query volume per project |
| 🧪 Eval harness | 33-query golden corpus + regression detection |
| 🔁 Brownfield migration | Detect existing dev_memory and migrate non-destructively |
| 🎨 Stylish CLI | Rich-powered spinners, panels, and color so you always see progress |
You only really need two things: Python 3.12+ and Qdrant. Everything else is optional.
🐍 Python 3.12+ · click to expand install commands
# macOS (Homebrew)
brew install python@3.12
# Linux (Ubuntu/Debian)
sudo apt install python3.12 python3.12-venv
# Windows (PowerShell)
winget install Python.Python.3.12
We strongly recommend installing uv — the fastest Python package manager:
# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
🗄️ Qdrant 1.10+ · vector database (required)
The simplest path is Docker:
docker run -d --name qdrant \
-p 6333:6333 -p 6334:6334 \
-v $HOME/.qdrant:/qdrant/storage \
qdrant/qdrant:latest
Or with docker compose:
services:
qdrant:
image: qdrant/qdrant:latest
ports: ["6333:6333", "6334:6334"]
volumes: ["./qdrant_data:/qdrant/storage"]
restart: unless-stopped
Don't have Docker? Run a managed cluster on Qdrant Cloud (free tier
available) and point supamem at the URL via supamem init.
🤖 An MCP-compatible client · pick at least one
| Client | Install | Notes |
|---|---|---|
| Claude Code | `npm install -g @anthropic-ai/claude-code` | First-class MCP support |
| Cursor | Download from cursor.com | Uses MDC rules + MCP |
| OpenCode | `curl -fsSL https://opencode.ai/install \| bash` | Open-source TUI, MCP native |
# Recommended: uv (fastest, isolated)
uv tool install supamem
# Alternative: pipx (also isolated)
pipx install supamem
# Plain pip (in a venv)
pip install supamem
Verify:
supamem --version
You should see a colorful banner and the credit line. 🎨
Latest: v0.1.4 is published on PyPI. Released via Trusted Publisher OIDC — every wheel is provenance-attested.
| Command | Purpose |
|---|---|
| `supamem init` | Greenfield bootstrap — probes Qdrant, creates collection, writes .supamem/config.toml |
| `supamem install --client <name>` | Patch a client config (claude-code, cursor, opencode) — atomic with backup |
| `supamem index` | Embed dev memories into Qdrant using the locked tuned-hybrid pipeline (D-25) |
| `supamem mcp-server` | Run the MCP server (`--transport stdio` default; `--transport http` for HTTP) |
| `supamem hook <client>` | Per-client session/edit hooks (called by the client itself) |
| `supamem doctor` | 🩺 Probe Qdrant, print resolved config chain, report version drift |
| `supamem stats` | Welford schema-v2 usage counters from .supamem/state/ |
| `supamem live` | 👀 Live dashboard tailing the audit JSONL — pipe-safe (plain JSONL when not a TTY); handles rotation, resize, Ctrl-C |
| `supamem migrate` | Brownfield migration from a pre-existing dev_memory collection |
| `supamem eval` | Run the regression harness against the bundled 33-query golden corpus |
| `supamem uninstall --client <name>` | Reverse `supamem install` cleanly |
Every long-running command shows a live spinner with elapsed time so you always know it's
working. Use --help on any subcommand for details.
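The Welford counters behind `supamem stats` use Welford's online algorithm, which maintains a running mean and variance in O(1) memory — a good fit for long-lived per-project counters. A minimal sketch (not supamem's schema-v2 implementation; the latency samples are made up):

```python
class Welford:
    """Single-pass running mean/variance via Welford's online algorithm."""

    def __init__(self):
        self.n, self.mean, self._m2 = 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n          # running mean update
        self._m2 += delta * (x - self.mean)  # sum of squared deviations

    @property
    def variance(self):
        return self._m2 / self.n if self.n else 0.0

# Example: accumulate retrieval latencies one sample at a time.
latency = Welford()
for ms in (71, 68, 80, 73):
    latency.update(ms)
print(f"mean={latency.mean:.1f}ms variance={latency.variance:.1f}")
```

Because only `n`, `mean`, and `_m2` need persisting, the counters survive across sessions without storing raw samples.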
Claude Code
supamem install --client claude-code
Adds an entry to ~/.claude.json under mcpServers and registers a session-start hook under
~/.claude/hooks/. Preview without applying with --dry-run.
Cursor
supamem install --client cursor
Patches .cursor/mcp.json and writes .cursor/rules/dual-memory.mdc.
OpenCode
supamem install --client opencode
Updates ~/.config/opencode/opencode.json and writes a session-start hook to
~/.config/opencode/hooks/.
┌─────────────────┐ MCP/stdio ┌─────────────────┐ REST ┌─────────────┐
│ Claude / Cursor │ ───────────────► │ supamem MCP │ ─────────► │ Qdrant │
│ / OpenCode │ ◄─────────────── │ server │ ◄───────── │ (vectors) │
└─────────────────┘ └─────────────────┘ └─────────────┘
│ ▲
│ session-start hook │ tuned-hybrid retrieval
▼ │ (BM25 + MiniLM fusion)
┌─────────────────┐ │
│ supamem hook │ ─────────────────────────┘
│ (auto-recall) │
└─────────────────┘
- Indexer chunks Markdown by header (T-1 chunker, 200-token target / 250 soft max)
- Embedders produce sparse (BM25) and dense (MiniLM-L6) vectors
- Retrieval runs both arms in parallel, fuses with reciprocal rank fusion, returns top-k
- MCP server exposes `dual_memory_search` (read) and `dual_memory_write` (write/idempotent agent-memory persistence) — plus `qdrant_find` and `qdrant_store` as drop-in aliases for users coming from upstream `mcp-server-qdrant` (disable with `SUPAMEM_QDRANT_ALIASES=0`)
- Hooks call `supamem hook <client>` at the right moment, so memory loads transparently
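The header-aware chunking step can be sketched in a few lines. This is an illustrative stand-in for the T-1 chunker, not its actual code: tokens are approximated by whitespace-split words, and the packing heuristic is a simplification.

```python
import re

def chunk_markdown(text, target=200, soft_max=250):
    """Split Markdown at headers, then pack each section into ~target-"token"
    chunks, allowing up to soft_max to avoid tiny trailing chunks.
    Word-count tokens are an approximation (assumption, not supamem's T-1)."""
    # Split immediately before every ATX header line, keeping the header
    # with the text that follows it.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks = []
    for sec in sections:
        words = sec.split()
        i = 0
        while i < len(words):
            end = i + target
            if len(words) - i <= soft_max:
                end = len(words)  # absorb a short tail instead of splitting it
            chunks.append(" ".join(words[i:end]))
            i = end
    return chunks

doc = "# Alpha\none two three\n# Beta\nfour five"
print(len(chunk_markdown(doc)))  # one chunk per short header section
```

Keeping the header attached to each chunk is what makes retrieval results self-describing: a recalled paragraph always carries its section title.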
We welcome PRs! Quick start:
git clone https://github.com/dzmitrys-dev/supamem.git
cd supamem
uv venv && source .venv/bin/activate
uv pip install -e ".[dev]"
pytest
ruff check .
Coming from an in-tree dev_memory setup? See MIGRATION.md.
MIT — see LICENSE.
Russian-language AI chat platform · AI-first product engineering
supamem was extracted from SoftChat's production memory stack so every team can run on the same
battle-tested pipeline. If it makes your agents smarter, give us a ⭐ — and check out what we
build with it.
Made with care in Belarus 🇧🇾 · app.softchat.ru · softskillz.ai