A brain-inspired always-on agent memory that folds continuously arriving events into self-emerging cognitive structure — designed for the next generation of proactive assistants.
- 🎯 Highlights
- 🧠 Concepts in 60 seconds
- 🎬 Demo
- 🛠️ Installation
- 🚀 Quick Start
- ⚙️ Key Configurations
- 🔁 Benchmark Evaluation
- 📂 Project Structure
- 🔗 Citation
- 📜 License
## 🎯 Highlights

- 🔮 **Proactive Memory.** Proactivity is a property of the memory substrate, not the agent's policy — goals emerge from the topology that accumulates the conditions for them.
- 🧠 **Architecture.** A tri-layered substrate extending Complementary Learning Systems with a prefrontal Intent layer — events fold into concepts, concepts crystallize into intents, surfaced through a hierarchical context window.
- 🌱 **Conceptual Bootstrapping.** Accumulation, compression, decay, completion — four structural debts of a streaming event log, resolved as transparent graph rewrites: test-time learning without gradient updates or surface text rewriting.
- 📊 **Evaluation.** CogEval-Bench isolates proactive emergence from retrieval accuracy; seven downstream benchmarks confirm the substrate stays robust on conventional memory tasks.
## 🧠 Concepts in 60 seconds

CogniFold ingests an asynchronous event stream and folds it into a typed concept graph. Four node types — the first three mirror Complementary Learning Systems (CLS) theory:
| Node | ID prefix | Layer | Role |
|---|---|---|---|
| `event` | `e-` | Hippocampal | Episodic trace — each input committed verbatim |
| `concept` | `c-` | Neocortical | Semantic pattern abstracted from recurring events |
| `intent` | `i-` | Prefrontal | Crystallizes when a concept cluster crosses a density threshold — this is what makes memory proactive |
| `time` | `t-` | — | Temporal anchor (deadlines, scheduled times) |
Eight typed/weighted edges (GROUNDS, CAUSES, TRIGGERS, REINFORCES, PART_OF, DERIVED_FROM, DEADLINE_FOR, RELATED_TO) wire them. Two ways to read the graph:
- **Proactive Context Window** (no query asked) — read the live `immediate` / `working` / `background` bands; intents surface on their own.
- **Memory Query Agent** (explicit query) — retrieve via `bm25` / `semantic` / `hybrid` modes, optionally wrapped in an agentic multi-round loop.
Details and tunables: ⚙️ Key Configurations.
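As a mental model, the node and edge schema above fits in a few lines of plain Python. This is a toy sketch for intuition only — `Node`, `Graph`, and their methods here are illustrative stand-ins, not the CogniFold API:

```python
from dataclasses import dataclass, field

# ID prefixes from the node-type table above
PREFIX = {"event": "e-", "concept": "c-", "intent": "i-", "time": "t-"}

@dataclass
class Node:
    type: str     # "event" | "concept" | "intent" | "time"
    title: str
    id: str

@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)   # (src_id, EDGE_TYPE, dst_id, weight)
    _counts: dict = field(default_factory=dict)

    def add_node(self, type_: str, title: str) -> Node:
        n = self._counts.get(type_, 0) + 1      # per-type counter drives the ID
        self._counts[type_] = n
        node = Node(type_, title, f"{PREFIX[type_]}{n}")
        self.nodes[node.id] = node
        return node

    def add_edge(self, src: Node, edge_type: str, dst: Node, weight: float = 1.0):
        self.edges.append((src.id, edge_type, dst.id, weight))

g = Graph()
run = g.add_node("event", "Morning run in the park")    # hippocampal trace
habit = g.add_node("concept", "regular exercise")       # neocortical pattern
g.add_edge(habit, "GROUNDS", run)   # concept is grounded by the episodic event
print(run.id, habit.id)             # e-1 c-1
```

The point of the sketch: IDs encode the layer, and all structure lives in typed, weighted edges rather than in rewritten text.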
## 🎬 Demo

1. **Proactive memory in motion.** The graph folds events, crystallizes concepts, and surfaces intents.

   Demo.mp4

2. **Substrate across narratives.** *I, Robot* (top) and *Currency Wars* (bottom), two stream snapshots each.
## 🛠️ Installation

| Requirement | Notes |
|---|---|
| Python ≥ 3.11 | 3.14 tested in CI |
| `uv` (recommended) or `pip` | `uv` gives ~10× faster installs |
| LLM API key (optional) | Google `GOOGLE_API_KEY` or OpenAI `OPENAI_API_KEY` — only needed for agent / semantic retrieval / agentic mode |
```bash
# 1. Clone the repository
git clone https://github.com/MergeFold/CogniFold.git
cd CogniFold

# 2. Install (pick one)
uv sync                                     # fastest, uses uv.lock
pip install -e ".[agent,service]"           # core + agent + HTTP service
pip install -e ".[dev,agent,service,viz]"   # everything (dev tools, viz, FAISS)

# 3. Configure API keys
cp .env.example .env
# edit .env and set GOOGLE_API_KEY or OPENAI_API_KEY
```

## 🚀 Quick Start

```bash
# 1. Generate a sample timeline (a saved demo is also under data/generated/)
cognifold generate --domain personal-timeline --persona software_engineer --events 50

# 2. Build the concept graph
cognifold run data/generated/alex_chen_timeline.json --save-graph output/graph.json

# 3. Query the graph
cognifold query --graph output/graph.json --retrieval bm25 "morning routine"

# 4. Replay the graph evolution as an interactive HTML
cognifold replay logs/replay_alex_chen_timeline_*.jsonl -o output/replay.html --open
```

The same graph is readable from Python:

```python
from cognifold import NodeType
from cognifold.graph.persistence import load_graph
from cognifold.scoring.hierarchical import HierarchicalContextSelector

# Load a previously saved graph
graph = load_graph("output/graph.json")
print(f"nodes={graph.node_count} edges={graph.edge_count}")

# Read the live, always-on context window — no query asked!
context = HierarchicalContextSelector().select_context(graph)
print(f"\nimmediate ({context.immediate.node_count} nodes — top-of-mind):")
for n in context.immediate.nodes[:5]:
    print(f"  [{n.type.value}] {n.data.get('title', n.id)}")
print(f"\nworking ({context.working.node_count} nodes — active patterns)")
print(f"background ({context.background.node_count} nodes — historical)")

# Emergent intents surface here without anyone asking
intents = graph.get_nodes_by_type(NodeType.INTENT)
print(f"\n{len(intents)} intents emerged from the graph state:")
for i in intents[:5]:
    print(f"  [{i.id}] {i.data.get('title', '?')} status={i.data.get('status', '?')}")

# Example output (50-event personal timeline):
# nodes=78 edges=124
#
# immediate (8 nodes — top-of-mind):
#   [event] Met with team about Q3 plan
#   [intent] Schedule follow-up with marketing
#   [concept] product launch coordination
#   [event] Coffee with Sarah at Blue Bottle
#   [event] Reviewed candidate resume
#
# working (23 nodes — active patterns)
# background (47 nodes — historical)
#
# 3 intents emerged from the graph state:
#   [i-7] Schedule follow-up with marketing status=pending
#   [i-12] Buy birthday gift for Sarah status=pending
#   [i-15] Q3 OKR review prep status=in_progress
```

For an explicit question, use the Memory Query Agent:

```python
from cognifold.query.agent import MemoryQueryAgent
from cognifold.query.config import QueryConfig

agent = MemoryQueryAgent(graph, config=QueryConfig(retrieval_mode="hybrid"))
result = agent.query("What did I commit to about exercise?")
print(result.context_text)
```

Or run CogniFold as an HTTP service:

```bash
./scripts/start_server.sh                     # default :8000
cognifold client --url http://localhost:8000  # interactive REPL

# Or hit the API directly
curl -X POST http://localhost:8000/api/v1/sessions
curl http://localhost:8000/docs               # OpenAPI / Swagger UI
```

## ⚙️ Key Configurations

Set via `QueryConfig(retrieval_mode=...)`. The four modes select the entry point into the graph for an explicit query:
| Mode | When to use | Needs LLM key? |
|---|---|---|
| `legacy` | original keyword matching, minimal dependencies | No |
| `bm25` | TF-IDF inverted index; fast and deterministic | No |
| `semantic` | embedding-based vector search | Yes (Google / OpenAI) |
| `hybrid` (default) | BM25 + semantic via RRF fusion; best general accuracy | Yes — auto-degrades to BM25 if no embedder |
For hard multi-hop queries, wrap with `AgenticRetriever`: it runs `hybrid` first, asks an LLM whether the result is sufficient, and if not, expands the query in parallel and re-ranks via RRF.
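Reciprocal Rank Fusion itself is tiny. The sketch below is illustrative, not the library's code: each ranked list contributes `1 / (k + rank)` per document, and the summed scores decide the fused order, so items ranked well by both retrievers float to the top:

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked id lists (best first) via Reciprocal Rank Fusion.

    k=60 is the constant commonly used in the RRF literature.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # each list votes with a reciprocal-rank score
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["e-3", "e-7", "c-2"]       # lexical ranking
semantic_hits = ["c-2", "e-3", "i-1"]   # embedding ranking
print(rrf_fuse([bm25_hits, semantic_hits]))   # ['e-3', 'c-2', 'e-7', 'i-1']
```

Note how `e-3` wins: it is near the top of both lists, while `c-2` leads only one of them.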
`HierarchicalContextSelector().select_context(graph)` returns three bands, each a different attention regime:

| Band | Default size | Score weights |
|---|---|---|
| `immediate` | 10% of window | recency 0.7 + urgency 0.3 |
| `working` | 30% of window | PageRank 0.5 + recency 0.3 + type 0.2 (favors concepts) |
| `background` | 50% of window | PageRank 0.8 + diversity 0.2 |
The window can be read at any time — no query is required. Intents that crossed the crystallization threshold appear in `immediate` automatically; concepts that are being reinforced live in `working`; durable structure sinks to `background`.
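In pure Python the band split reduces to greedy selection under per-band score functions. A toy sketch using the weights from the table above — the node dicts and this `select_context` are illustrative stand-ins, not the real `HierarchicalContextSelector`:

```python
def select_context(nodes, window=10):
    # band sizes: 10% / 30% / 50% of the window, per the table above
    sizes = {"immediate": window // 10,
             "working": window * 3 // 10,
             "background": window // 2}
    score = {
        "immediate":  lambda n: 0.7 * n["recency"] + 0.3 * n["urgency"],
        "working":    lambda n: 0.5 * n["pagerank"] + 0.3 * n["recency"] + 0.2 * n["type_w"],
        "background": lambda n: 0.8 * n["pagerank"] + 0.2 * n["diversity"],
    }
    bands, remaining = {}, list(nodes)
    for band in ("immediate", "working", "background"):
        remaining.sort(key=score[band], reverse=True)   # re-rank under this band's regime
        bands[band], remaining = remaining[:sizes[band]], remaining[sizes[band]:]
    return bands

nodes = [
    {"id": "i-1", "recency": 0.9, "urgency": 0.9, "pagerank": 0.2, "type_w": 0.5, "diversity": 0.1},
    {"id": "c-1", "recency": 0.4, "urgency": 0.1, "pagerank": 0.9, "type_w": 1.0, "diversity": 0.3},
    {"id": "e-1", "recency": 0.8, "urgency": 0.2, "pagerank": 0.1, "type_w": 0.2, "diversity": 0.2},
]
bands = select_context(nodes)
print([n["id"] for n in bands["immediate"]])   # ['i-1'] — the urgent intent is top-of-mind
```

The key design point survives the simplification: a node is never scored once globally; each band re-ranks the remainder under its own attention regime.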
## 🔁 Benchmark Evaluation

| Flag | Purpose |
|---|---|
| `--event-stream` | enable inter-session consolidation (`merge_similar_concepts` + `prune_orphan_concepts`); required for paper-grade LoCoMo |
| `--query-mode {base, rag, episodic, mergefold}` | ablation switch: `mergefold` = full CogniFold; the others are baselines |
| `--disable-concepts` | events-only baseline (skips concept formation) |
| `--model openai:gpt-4o-mini` | reader model |
| `--judge-model gpt-4o-mini` | LLM-as-judge for QA scoring (auto-derived from `--model` if omitted) |
| `--limit N` | cap the number of examples (smoke-testing) |
| `--no-llm-eval` | skip the LLM judging step (use exact-match / F1 only) |

Environment overrides accepted by `scripts/reproduce.sh`: `MODEL=...`, plus the LLM keys `OPENAI_API_KEY` / `GOOGLE_API_KEY` (from `.env`).
One wrapper for everything — sane defaults, dataset auto-downloaded on first run, paper-faithful flags applied per benchmark.
```bash
# canonical run: LoCoMo full 10-conversation Mem0 protocol (≈ 1 h on gpt-4o-mini)
bash scripts/reproduce.sh

# any single benchmark (paper order — CogEval-Bench first, LoCoMo second)
bash scripts/reproduce.sh cogeval      # CogEval-Bench (structural diagnostic; the proactive thesis)
bash scripts/reproduce.sh locomo       # LoCoMo (default; Mem0 protocol)
bash scripts/reproduce.sh musique      # multi-hop QA
bash scripts/reproduce.sh narrativeqa  # narrative comprehension
bash scripts/reproduce.sh tomi         # theory of mind
bash scripts/reproduce.sh babilong     # long-context fact extraction
bash scripts/reproduce.sh mutual       # dialogue coherence
bash scripts/reproduce.sh streamingqa  # streaming temporal QA

# all 8 paper benchmarks back-to-back (CogEval + 7 downstream; uses MODEL env to override)
bash scripts/reproduce.sh all
```

Each run writes `benchmarks/<name>/output/benchmark_results.json`. Override the reader model via env: `MODEL=openai:gpt-4o bash scripts/reproduce.sh locomo`. The `--event-stream` flag is automatically applied to LoCoMo (it gates the inter-session consolidation pass central to the always-on memory thesis); all other benchmarks discharge consolidation through the shared `base_runner` post-ingestion hook.
## 📂 Project Structure

```text
cognifold/
├── src/cognifold/            # core library (20 submodules)
│   ├── __init__.py
│   ├── __main__.py
│   ├── config.py
│   ├── logging.py
│   ├── agent/                # LangGraph agent, prompts, sections, domain configs
│   ├── cli/                  # CLI commands
│   ├── embeddings/           # Gemini / OpenAI providers, optional FAISS ANN
│   ├── executor/             # Plan execution with validation and rollback
│   ├── generator/            # Event generation (4 domains)
│   ├── graph/                # NetworkX wrapper, persistence, validation, metrics
│   ├── importers/            # Data importers (wiki)
│   ├── intent/               # Intent-to-action system: queue, executor, calibrator
│   ├── models/               # Pydantic schemas (Event, Node, Edge, UpdatePlan)
│   ├── pipeline/             # Pipeline orchestration (classic + layered)
│   ├── query/                # Query agent, strategies, assembly, LLM utilities
│   ├── replay/               # Graph evolution logging + interactive HTML
│   ├── retrieval/            # BM25, hybrid, agentic multi-round, cross-encoder
│   ├── scoring/              # PageRank, hierarchical context, node ranking
│   ├── service/              # HTTP service (FastAPI) — sessions, routes, auth, stores
│   ├── simulator/            # Timeline processing, visualization
│   ├── symbolic/             # Symbolic belief tracker, cognition / intent routers
│   ├── temporal/             # Temporal entity extraction, date parsing
│   ├── trace/                # Tracing / instrumentation
│   └── utils/                # Shared utilities (LLM metrics, budget, embeddings)
├── benchmarks/               # 8 benchmark runners + shared base-runner library
│   ├── shared/               # base_runner, baseline_runner, graph_evolution_tracker
│   ├── babilong/  cogeval/  locomo/  musique/
│   └── mutual/  narrativeqa/  streamingqa/  tomi/
├── configs/                  # per-benchmark prompt profiles (YAML)
├── examples/                 # sample timelines + replay HTML for 4 domains
├── scripts/                  # auxiliary scripts (LoCoMo audit-protocol rejudge, …)
├── docs/                     # ARCHITECTURE.md · BENCHMARK.md · PROMPTS.md
├── .github/                  # CI / CD workflows
├── cognifold                 # CLI entry-point shell launcher
├── config.example.yaml       # example application config
├── .env.example              # example environment file
├── generate_demo.py          # one-shot demo-graph generator
├── test_benchmarks.py        # smoke tests for the benchmark runners
├── pyproject.toml
├── uv.lock
├── Makefile
├── README.md
├── LICENSE
└── .gitignore
```
## 🔗 Citation

A BibTeX entry for the accompanying paper will be added here once the paper is publicly released.
## 📜 License

Apache-2.0 — see LICENSE.

