Mimir

Observable long-term memory infrastructure for AI systems.

Mimir prioritizes measurable retrieval behavior, transparent tradeoffs, and observable memory operations over opaque “magic memory” claims.

Mimir is an open-source memory and retrieval orchestration platform for MCP-compatible agents. It provides lifecycle-aware memory management, multi-provider retrieval with inspectable pipelines, trust-weighted ranking, and comparative benchmarks — so recall quality is measurable, not marketed.

Self-hostable. No cloud dependency. OAuth or API-key auth for Cursor, Claude Code, and other MCP clients.

What Mimir is

Mimir stores episodic, semantic, and procedural memories, orchestrates retrieval across six providers, applies lifecycle and trust policies, and returns token-budgeted context with a debug trace explaining what was ranked and why.

It is infrastructure for long-horizon agent memory — not a black-box RAG wrapper and not a simulation of human cognition.

What makes it different

Theme	What you get
Transparency	Per-recall debug: providers used, exclusions, agreement scores, token cost
Observability	Telemetry dashboard: provider usefulness, drift, retrieval heatmaps
Benchmarkability	Fixture harness comparing `naive_rag`, `vector_only`, `conversational`, and `mimir`
Lifecycle	Active → aging → stale → archived; stale suppression at retrieval time
Trust	Trust scores, verification status, quarantine — retrieval is trust-weighted
Token efficiency	Context builder enforces budgets; benchmarks report token cost per query

Benchmarks

Comparative retrieval evaluation lives under benchmarks/retrieval/.

Resource	Description
benchmarks/retrieval/README.md	Systems, metrics, honesty policy
benchmarks/retrieval/reports/latest.md	Latest local run (regenerate with `make bench-retrieval`)
benchmarks/retrieval/reports/sample_v1.md	Sample output for GitHub visitors (fixture run, timestamped)
docs/BENCHMARK_WALKTHROUGH.md	How to run, read reports, and interpret weak spots
docs/TOKEN_EFFICIENCY.md	Measured token cost vs MRR (fixture data)
docs/RETRIEVAL_TRACE_GUIDE.md	Export and interpret retrieval traces
docs/COLD_START_VALIDATION.md	Clean-machine Docker validation

make bench-retrieval
# or: python -m benchmarks.retrieval.runners.cli --seed 42

Policy: comparisons are fixture-based and come with documented weak spots (e.g. orchestration latency). No fabricated superiority claims. See the disclaimer in the sample report before citing numbers externally.
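To make the reported metrics concrete, here is a minimal sketch of how mean reciprocal rank (MRR) and mean token cost could be computed from per-query results. The record keys (`ranked_ids`, `relevant_ids`, `tokens`) and the sample values are illustrative assumptions, not the harness's actual report schema or real benchmark numbers.

```python
from statistics import mean


def reciprocal_rank(ranked_ids, relevant_ids):
    """Return 1/rank of the first relevant hit, or 0.0 if none was retrieved."""
    for rank, memory_id in enumerate(ranked_ids, start=1):
        if memory_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def summarize(runs):
    """Aggregate per-query records into MRR and mean token cost.

    Each record is a dict with hypothetical keys:
    ranked_ids (list), relevant_ids (set), tokens (int).
    """
    return {
        "mrr": mean(reciprocal_rank(r["ranked_ids"], r["relevant_ids"]) for r in runs),
        "mean_tokens": mean(r["tokens"] for r in runs),
    }


# Made-up data for illustration only; these are not benchmark results.
example_runs = [
    {"ranked_ids": ["m3", "m1", "m7"], "relevant_ids": {"m1"}, "tokens": 400},
    {"ranked_ids": ["m2", "m9"], "relevant_ids": {"m2"}, "tokens": 600},
    {"ranked_ids": ["m4"], "relevant_ids": {"m8"}, "tokens": 200},
]
summary = summarize(example_runs)
```

Reading the two numbers side by side is the point: a system can raise MRR while also spending more tokens per query, which is the tradeoff docs/TOKEN_EFFICIENCY.md reports.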

Current limitations

Honest scope for v0.1.0-rc — not a production-scale evaluation platform yet.

Benchmarks are fixture-based (standard_v1 / standard_v2), not production corpora at scale
Retrieval latency is often higher than naive RAG / vector-only due to multi-provider orchestration
Token efficiency varies by scenario; Mimir may inject more tokens than baselines while improving rank quality (see docs/TOKEN_EFFICIENCY.md)
standard_v2 is synthetic but noisier than v1 — still deterministic, not real user traffic
Experimental lifecycle / consolidation subsystems exist in-tree but are not required for core OSS memory + retrieval
Dashboard trace UX is still improving; per-provider candidate replay is most complete on a live `POST /api/events/recall` request that sets `token_budget`

Quick start (Docker) — recommended

git clone https://github.com/SketchOTP/mimir
cd mimir
cp .env.example .env
docker compose up -d
./scripts/doctor.sh

Resource	URL
API health	http://127.0.0.1:8787/health
Dashboard	http://127.0.0.1:5173
Telemetry	http://127.0.0.1:5173/telemetry (after recalls)
API docs	http://127.0.0.1:8787/api/docs

Timing: ~20–40s startup with pre-built images; first docker compose build may take several minutes (embeddings stack).
RAM: ~4 GB minimum, ~8 GB comfortable. CPU-only is supported.
Details: docs/COLD_START_VALIDATION.md

Optional: create an API-key owner (for multi-user deployments):

docker compose exec api python -m mimir.auth.create_owner \
  --email you@example.com --display-name "Your Name"

Healthchecks

curl -s http://127.0.0.1:8787/health
curl -s http://127.0.0.1:8787/api/telemetry/retrieval/stats
make bench-retrieval   # or: python3 -m benchmarks.retrieval.runners.cli
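The same two HTTP checks can be scripted. This is a small sketch using only the Python standard library; it checks status codes and makes no assumption about the response bodies. The helper names (`http_status`, `check_endpoints`) are hypothetical and not part of Mimir.

```python
import urllib.error
import urllib.request

BASE_URL = "http://127.0.0.1:8787"
ENDPOINTS = ["/health", "/api/telemetry/retrieval/stats"]


def http_status(url, timeout=5.0):
    """Return the HTTP status code for a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None


def check_endpoints(base_url=BASE_URL, endpoints=ENDPOINTS, fetch=http_status):
    """Map each endpoint path to its status code (None means unreachable)."""
    return {path: fetch(base_url + path) for path in endpoints}
```

Calling `check_endpoints()` against a running stack returns a dict keyed by path; the `fetch` parameter exists so the function can be exercised without a live server.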

Add to Cursor — Settings → MCP → Add Server:

{
  "mcpServers": {
    "mimir": {
      "url": "http://127.0.0.1:8787/mcp"
    }
  }
}

For SSH or headless setups, use Bearer API-key auth — see docs/CURSOR_MCP_SETUP.md.

Quick start (local dev)

python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env
alembic upgrade head
make dev    # API :8787
make web    # UI :5173 (optional, second terminal)

The first recall may need to load the embedding model (~30s on CPU). Use examples/retrieval_debugger/ to inspect traces.

Integration examples

Example	Path
OpenAI chat + memory	examples/openai_chat_memory/
Claude + memory	examples/claude_memory/
Local LLM (Ollama)	examples/local_llm_memory/
Agent recall loop	examples/agent_memory_loop/
Trace debugger	examples/retrieval_debugger/
Token budgeting	examples/token_budgeting/

Architecture

High-level map (detail: ARCHITECTURE.md):

┌──────────────┐     ┌─────────────────────────────────────────────┐
│ MCP / REST   │────▶│ FastAPI — auth, MCP HTTP, memory API        │
│ clients      │     └──────────────┬──────────────────────────────┘
└──────────────┘                    │
                    ┌───────────────┼───────────────┐
                    ▼               ▼               ▼
             ┌──────────┐   ┌─────────────┐  ┌──────────┐
             │ Ingestion│   │ Retrieval   │  │ Graph    │
             │ + layers │   │ orchestrator│  │ memory   │
             └────┬─────┘   └──────┬──────┘  └────┬─────┘
                  │                │              │
                  └────────────────┼──────────────┘
                                   ▼
                    ┌──────────────────────────────┐
                    │ SQLite/Postgres + ChromaDB   │
                    │ lifecycle · telemetry · API  │
                    └──────────────────────────────┘
                                   ▲
                    ┌──────────────┴──────────────┐
                    │ worker — consolidate,       │
                    │ lifecycle, reflect, graph   │
                    └─────────────────────────────┘

Retrieval flow

query → task categorization → adaptive provider weights
     → 6 providers (async) → merge / dedupe → trust & lifecycle filter
     → confidence scoring → token budget → context string + debug trace
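The later stages of this flow (merge / dedupe, lifecycle filter, trust weighting, token budget, debug trace) can be sketched in a few lines. This is an illustration under stated assumptions, not Mimir's implementation: the `Candidate` fields, the provider names, the weighted-sum fusion rule, and the whitespace token count are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    memory_id: str
    text: str
    provider: str   # hypothetical provider name
    score: float    # provider-local relevance in [0, 1]
    trust: float    # trust score in [0, 1]
    state: str      # "active", "aging", "stale", or "archived"


def fuse(candidates, weights, token_budget):
    """Merge, filter, rank, and budget candidates; return (context, debug)."""
    merged = {}
    excluded = {}
    for c in candidates:
        # Lifecycle filter: suppress stale and archived memories.
        if c.state in ("stale", "archived"):
            excluded[c.memory_id] = c.state
            continue
        # Dedupe by memory_id, summing weighted scores across providers.
        entry = merged.setdefault(
            c.memory_id,
            {"text": c.text, "trust": c.trust, "score": 0.0, "providers": []},
        )
        entry["score"] += weights.get(c.provider, 0.0) * c.score
        entry["providers"].append(c.provider)

    # Trust-weighted ranking, highest first.
    ranked = sorted(
        merged.items(),
        key=lambda item: item[1]["score"] * item[1]["trust"],
        reverse=True,
    )

    chosen, used = [], 0
    for memory_id, entry in ranked:
        cost = len(entry["text"].split())  # crude stand-in for a tokenizer
        if used + cost > token_budget:
            excluded[memory_id] = "over_budget"
            continue
        chosen.append(memory_id)
        used += cost

    context = "\n".join(merged[m]["text"] for m in chosen)
    debug = {
        "ranked_ids": chosen,
        "excluded": excluded,
        "tokens_used": used,
        "agreement": {m: len(merged[m]["providers"]) for m in chosen},
    }
    return context, debug


candidates = [
    Candidate("m1", "User prefers dark mode", "vector", 0.9, 0.8, "active"),
    Candidate("m1", "User prefers dark mode", "keyword", 0.6, 0.8, "active"),
    Candidate("m2", "Old deploy notes from last year", "vector", 0.95, 0.9, "stale"),
    Candidate("m3", "Deploy with make gate before release", "graph", 0.7, 0.5, "active"),
]
weights = {"vector": 0.5, "keyword": 0.3, "graph": 0.2}
context, debug = fuse(candidates, weights, token_budget=8)
```

In this example `m2` is dropped for being stale even though it has the highest raw score, and `m3` is dropped because it would exceed the budget; both reasons land in the debug structure rather than disappearing silently.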

Observability

Mimir surfaces retrieval behavior in the API and dashboard — not only final context text.

Signal	Where
Retrieval traces	`debug` on recall (providers, exclusions, ranked IDs)
Confidence / agreement	Cross-provider agreement in debug + telemetry
Token usage	Per-session token cost vs budget
Provider weighting	Adaptive weights by task category; dashboard provider stats
Lifecycle metadata	Memory state, trust, verification on browse views
Weak spots	Auto-listed in benchmark reports when Mimir underperforms baselines

Open the Telemetry page in the web UI after a few recall operations, or inspect the `debug` field of recall responses from the API.
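When inspecting recall debug output by hand, a small summarizer can help. The sketch below assumes a trace shaped like the signals listed above; the key names (`providers`, `exclusions`, `ranked_ids`, `token_cost`, `reason`) are hypothetical, so check an actual response before relying on them.

```python
from collections import Counter


def summarize_trace(debug, token_budget):
    """Return a short, human-readable summary of one recall debug trace."""
    providers = debug.get("providers", [])
    exclusions = debug.get("exclusions", [])
    ranked_ids = debug.get("ranked_ids", [])
    lines = [
        f"providers: {', '.join(providers) or 'none'}",
        f"ranked: {len(ranked_ids)} kept, {len(exclusions)} excluded",
        f"tokens: {debug.get('token_cost', 0)}/{token_budget}",
    ]
    # Group exclusions by reason so suppressed memories stay visible.
    reasons = Counter(item.get("reason", "unknown") for item in exclusions)
    if reasons:
        detail = ", ".join(f"{reason}={n}" for reason, n in sorted(reasons.items()))
        lines.append(f"excluded by reason: {detail}")
    return "\n".join(lines)


# Hand-written sample trace for illustration; not real Mimir output.
sample_debug = {
    "providers": ["vector", "graph"],
    "exclusions": [
        {"id": "m2", "reason": "stale"},
        {"id": "m5", "reason": "quarantined"},
        {"id": "m6", "reason": "stale"},
    ],
    "ranked_ids": ["m1", "m3"],
    "token_cost": 310,
}
report = summarize_trace(sample_debug, token_budget=500)
```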

Capabilities

Capability	What it means
Three memory layers	Episodic, semantic, procedural — auto-classified at write time
Knowledge graph	Entities and relationships; graph-aware retrieval
Multi-source retrieval	Six providers fused with adaptive, task-aware weights
Trust scoring	Trust, confidence, verification status — retrieval is trust-weighted
Adversarial quarantine	Seven pattern classes blocked before storage
Memory lifecycle	Four-stage state machine with recency and retrieval boosts
Offline consolidation	Nightly dedup, chain compression, trust updates from feedback
Reflection + contradictions	Async contradiction detection and improvement proposals
Skills + approvals	Reusable procedures and human gates for high-risk actions
OAuth 2.1 / PKCE + API keys	Browser OAuth or Bearer for SSH/headless
Multi-user isolation	`user_id` scoped across stores and workers
React PWA dashboard	Memories, telemetry, approvals, timeline
REST + MCP + Python SDK	HTTP, MCP Streamable HTTP, programmatic SDK

MCP tools

Tool	What it does
`memory.remember`	Store an event or fact; layer auto-classified
`memory.recall`	Retrieve relevant memories — token-budgeted context + debug
`memory.search`	Semantic search with optional layer filter
`memory.record_outcome`	Record task outcome; feeds trust and reflection
`skill.list`	List approved procedures for the project
`approval.request` / `approval.status`	Human gate for high-risk actions
`reflection.log`	Log observations for offline analysis
`improvement.propose`	Propose system-level changes (approval required)

Design rationale (selected)

Technical influences — stated as engineering choices, not biological claims:

Layered stores — separate write/retrieval paths for events, facts, and procedures (Tulving-style taxonomy as data model, not cognition simulation).
Offline consolidation — nightly integration of episodic traces into durable knowledge without silent deletion of high-trust items.
Multi-provider retrieval — mixture-of-experts style routing; weights adapt from task outcome feedback.
Lifecycle + forgetting — recency, retrieval frequency, and trust modulate active vs stale vs archived.
Quarantine at write time — injection, credential, and policy-overwrite patterns blocked before storage.
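As one way to picture "weights adapt from task outcome feedback", here is a multiplicative-update sketch: providers that contributed to a successful recall gain weight, providers that contributed to a failed one lose weight, and the result is renormalized. The update rule, learning rate, and provider names are assumptions chosen for illustration, not Mimir's actual adaptation logic.

```python
def update_weights(weights, contributions, success, learning_rate=0.1):
    """Return new normalized provider weights after one outcome.

    weights:       provider -> current weight
    contributions: provider -> share of the final context it supplied, in [0, 1]
    success:       True if the task outcome was recorded as positive
    """
    signal = 1.0 if success else -1.0
    raw = {
        provider: max(1e-6, w * (1.0 + learning_rate * signal * contributions.get(provider, 0.0)))
        for provider, w in weights.items()
    }
    total = sum(raw.values())
    return {provider: value / total for provider, value in raw.items()}


# One weight table per task category (category and provider names are hypothetical).
weights_by_category = {"debugging": {"vector": 0.5, "keyword": 0.5}}

after_success = update_weights(
    weights_by_category["debugging"], {"vector": 1.0}, success=True
)
after_failure = update_weights(
    weights_by_category["debugging"], {"vector": 1.0}, success=False
)
```

The floor of `1e-6` keeps a provider from being zeroed out permanently, so it can recover if later outcomes favor it.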

Experimental subsystems (consolidation research, architecture governor) are documented under docs/architecture/ and are not required for core OSS memory + retrieval.

Memory layers (reference)

Layer	Use
Episodic	Session events, outcomes, temporal logs
Semantic	Facts, preferences, rules, identity
Procedural	Workflows, runbooks, promoted patterns
Graph	Entity relationships and multi-hop context (the knowledge graph, listed separately from the three auto-classified layers)

Background workers

Worker	Schedule	Role
`consolidator`	Nightly	Dedup, chain compression, trust from feedback
`reflector`	Every 30 min	Contradictions, improvement proposals
`lifecycle`	Nightly	Aging, decay, supersession, deletion
`procedural_promoter`	Nightly	Promote validated episodic patterns
`graph_builder`	Nightly	Extract graph from corpus
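The kind of decision the `lifecycle` worker makes can be illustrated with a small state function: idle time pushes a memory from active toward archived, while retrievals offset some of that idleness. The thresholds and the boost value are hypothetical, picked only to make the example readable, and trust is left out for brevity.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds in days of effective idleness; not Mimir's real values.
AGING_AFTER = 30
STALE_AFTER = 90
ARCHIVE_AFTER = 365
BOOST_DAYS_PER_RETRIEVAL = 2  # each retrieval offsets this many idle days


def lifecycle_state(last_retrieved, now, retrieval_count=0):
    """Classify a memory as active, aging, stale, or archived."""
    idle_days = (now - last_retrieved).days
    effective = max(0, idle_days - retrieval_count * BOOST_DAYS_PER_RETRIEVAL)
    if effective >= ARCHIVE_AFTER:
        return "archived"
    if effective >= STALE_AFTER:
        return "stale"
    if effective >= AGING_AFTER:
        return "aging"
    return "active"


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
rarely_used = lifecycle_state(now - timedelta(days=100), now)
often_used = lifecycle_state(now - timedelta(days=100), now, retrieval_count=10)
```

With these sample values, the same 100 idle days yield `stale` for a memory that was never retrieved and `aging` for one retrieved ten times.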

Safety

Quarantine is permanent: updates cannot reactivate quarantined memories
System mutation endpoints are off by default in production
High-trust memories are not silently deleted by the consolidator
Cross-user isolation is enforced at the DB layer

See docs/SECURITY.md · docs/MULTI_USER_SECURITY.md
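A write-time quarantine check can be pictured as a pattern screen that runs before a memory is stored. The three pattern classes and their regular expressions below are simplified, hypothetical examples; they are not the seven classes Mimir ships, and real detection would need to be considerably more robust.

```python
import re

# Hypothetical pattern classes for illustration only.
PATTERNS = {
    "prompt_injection": re.compile(
        r"ignore (all )?(previous|prior) instructions", re.IGNORECASE
    ),
    "credential": re.compile(
        r"\b(api[_-]?key|password|secret)\s*[:=]\s*\S+", re.IGNORECASE
    ),
    "policy_overwrite": re.compile(
        r"\b(disable|override|bypass)\b.*\b(policy|approval|safety)\b", re.IGNORECASE
    ),
}


def screen(text):
    """Return the names of all pattern classes that match the text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]


def should_quarantine(text):
    """A memory is quarantined if any pattern class matches."""
    return bool(screen(text))
```

Returning the matched class names, rather than a bare boolean, keeps the reason for a quarantine inspectable.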

Documentation

Doc	Purpose
ARCHITECTURE.md	Subsystems, retrieval pipeline, lifecycle, benchmarks
docs/BENCHMARK_WALKTHROUGH.md	Run and interpret retrieval benchmarks
CONTRIBUTING.md	Setup, tests, PR expectations
ROADMAP.md	Near-term OSS priorities
BENCHMARK_RESULTS.md	Eval harness + retrieval numbers (evidence policy)
docs/SELF_HOSTING.md	Local and production deployment
docs/CURSOR_MCP_SETUP.md	Cursor MCP configuration

Development

make dev              # API hot-reload :8787
make web              # Vite dev :5173
make test             # pytest
make evals            # 8-suite eval harness
make bench-retrieval  # comparative retrieval benchmarks
make gate             # release gate

Tech stack

Layer	Technology
API	Python 3.12, FastAPI, Uvicorn
Storage	SQLAlchemy 2 async, SQLite / Postgres, ChromaDB
Jobs	APScheduler
Frontend	React 18, TypeScript, Vite, PWA
Auth	OAuth 2.1 / PKCE, API keys
MCP	Streamable HTTP (JSON-RPC 2.0)

License

Apache-2.0 — see LICENSE.

Security: GitHub Security Advisory · docs/SECURITY.md
