A memory system that lets your chatbot truly know you, and maybe itself.
- Intensity gating. Not everything is equally worth remembering: the more trivial an input, the more accumulation is needed before it forms a memory entry.
- Input decomposition. Input is decomposed into atomic segments that preserve their original meaning; only highly related segments cluster into episodes, which enables high-precision retrieval.
- Replayability. Human memory functions like an event-sourcing system: replaying the same experiences yields a nearly identical personality.
- Lightweight, with minimal LLM calls.
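The intensity-gating idea can be sketched roughly as follows. The threshold, pool size, and all names here are illustrative assumptions, not the library's actual API:

```python
from dataclasses import dataclass, field

THETA = 0.6  # hypothetical intensity threshold (ι ≥ θ goes straight to the archive)

@dataclass
class MemoryStore:
    archive: list[str] = field(default_factory=list)  # vivid signals, stored immediately
    pool: list[str] = field(default_factory=list)     # weak signals, awaiting accumulation
    pool_capacity: int = 5                            # how much must gather before clustering

    def ingest(self, segment: str, intensity: float) -> None:
        if intensity >= THETA:
            self.archive.append(segment)  # vivid signal: archive right away
        else:
            self.pool.append(segment)     # weak signal: pool it
            if len(self.pool) >= self.pool_capacity:
                # enough gathered: cluster the pooled segments into one entry
                self.archive.append(" | ".join(self.pool))
                self.pool.clear()

store = MemoryStore()
store.ingest("user's dog is named Rex", 0.9)  # vivid
store.ingest("said 'lol' again", 0.1)         # weak, pooled
```

The point of the gate is that trivia only costs an archive write once enough of it accumulates to say something meaningful.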
paper
```
                [ raw input ]
                      │
                ┌─────▼─────┐
                │ decompose │
                └─────┬─────┘
                      │
     ┌────────────────┴────────────────┐
   ι ≥ θ                             ι < θ
     │                                 │
vivid signal                      weak signal
     │                                 │
┌────▼────┐                     ┌──────▼──────┐
│ archive │                     │    pool     │
└────┬────┘                     │  · · · · ·  │
     │                          │  · · · · ·  │
     │                          │  · · · · ·  │
     │                          └──────┬──────┘
     │                                 │
     │                         enough gathered?
     │                                 │
     │                 no ─────────────┘
     │                                 │ yes
     │                          ┌──────▼──────┐
     │                          │   cluster   │
     │                          │   + score   │
     │                          └──────┬──────┘
     │                                 │
     └────────────────┬────────────────┘
                      │
                ┌─────▼─────┐
                │  episode  │  immutable
                └─────┬─────┘
                      │
           same topic chain? (NLI)
         yes ┌────────┴────────┐ no
             │                 │
┌────────────▼──┐          ┌───▼─────────┐
│   ... ──► e   │          │      e      │
└────────────┬──┘          └───┬─────────┘
 joins chain │                 │ new chain
             │                 │
     ┌───────▼─────────────────▼───────┐
     │            snapshot             │
     │   centroid     │    summary     │
     │   (eager)      │    (lazy)      │
     └────────────────┬────────────────┘
                      │
                 [ retrieve ]
                      │
     ┌────────────────┴────────────────┐
  default                           verbose
     │                                 │
snapshot summary          with episode chain + labels
```
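The "same topic chain?" step above can be sketched as follows. `nli_entails` is a toy word-overlap stand-in for a real cross-encoder NLI call (e.g. a TEI `/predict` request), and all names are assumptions:

```python
# Toy sketch of the "same topic chain?" decision from the diagram above.
def nli_entails(a: str, b: str) -> bool:
    """Stand-in for an NLI call; a real system would query a cross-encoder."""
    return len(set(a.lower().split()) & set(b.lower().split())) >= 2

def assign_chain(episode: str, chains: list[list[str]]) -> list[str]:
    """Append the episode to the first chain whose tail it relates to,
    otherwise start a new chain."""
    for chain in chains:
        if nli_entails(chain[-1], episode):
            chain.append(episode)  # joins the existing chain
            return chain
    chains.append([episode])       # no related chain: start a new one
    return chains[-1]
```

Comparing against the chain tail keeps the check cheap; a real implementation could also compare against a chain summary or centroid.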
1. Run FalkorDB:

```shell
docker run -e REDIS_ARGS="--appendonly yes --appendfsync everysec" -v <PATH>:/var/lib/falkordb/data -p 3000:3000 -p 6379:6379 -d --name falkordb falkordb/falkordb
```

2. Run a text-embeddings server (note: the command below is the CPU version), or use an embedding cloud API:

```shell
docker run --name tei-embedding -d -p 8997:80 -v <PATH>:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-latest --model-id nomic-ai/nomic-embed-text-v1.5
```

3. If you want NLI, run an NLI server (note: the command below is the CPU version):

```shell
docker run --name tei-nli -d -p 8999:80 -v <PATH>:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-latest --model-id cross-encoder/nli-deberta-v3-small
```

4. Configure your `.env`:

```shell
# Provider to use: anthropic | openai | gemini (default: anthropic)
BUBBLE_LLM_PROVIDER=anthropic

# Anthropic (install: pip install 'bubble-memory[anthropic]')
ANTHROPIC_API_KEY=

# OpenAI-compatible: works with OpenAI, DeepSeek, Groq, Ollama, etc.
# (install: pip install 'bubble-memory[openai]')
# OPENAI_API_KEY=
# OPENAI_BASE_URL=https://api.openai.com/v1

# Google Gemini (install: pip install 'bubble-memory[gemini]')
# GEMINI_API_KEY=

FALKORDB_HOST=localhost
FALKORDB_PORT=6379

BUBBLE_EMBED_DIM=768
BUBBLE_EMBED_ENDPOINT=http://localhost:8997/v1/embeddings

# If you have NLI set up
BUBBLE_ENABLE_NLI=true
BUBBLE_NLI_ENDPOINT=http://localhost:8999/predict
```

5. Install:

```shell
pip install bubble-memory
```
or

```shell
uv add bubble-memory
```

Usage:

```python
import bubble

bubble.process(user_id, content, prior)
```

`prior` is the context of the content, for example prior messages.

```python
import bubble

memory_user = await bubble.retrieve(user_id, query)
```

Memory episodes are archived in `<project root>/data/archive` as JSONL. You can reconstruct your memory graph with a single command, with no LLM calls:

```shell
python -m bubble.main replay <user_id>
```

See `.env.example` for ALL tunable arguments.
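Because the archive is an append-only JSONL log, replay amounts to re-reading it in order. A minimal sketch, where the per-user file layout and field names are assumptions rather than bubble's actual on-disk format:

```python
import json
from pathlib import Path

def replay(user_id: str, data_dir: Path = Path("data/archive")) -> list[dict]:
    """Rebuild a user's episodes by re-reading the append-only JSONL log in order."""
    path = data_dir / f"{user_id}.jsonl"  # assumed per-user file layout
    episodes = []
    with path.open() as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                episodes.append(json.loads(line))
    return episodes
```

This is the event-sourcing property from the feature list: the log is the source of truth, and the graph is a deterministic function of it.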
Limitations:

- Doesn't support long-context memory; the focus is on facts/preferences and personality consolidation.
- The tunable variables in the current promotion formula might not be optimal.
- I use `[Name]` to identify the speaker in group chats; swap out `bubble.llm.prompts.DECOMPOSE_SYSTEM` if that doesn't fit your use case.