[Study] AI memory systems landscape — Nakajima/Opus 4.6 research article by codenamev · Pull Request #6 · codenamev/claude_memory

codenamev · 2026-05-25T15:25:47Z

Meta-study of the 2026-03-26 article surveying 7 memory benchmarks
and ~12 memory systems (Hindsight, Zep/Graphiti, MemGPT/Letta, Mem0,
Cognee, HippoRAG, etc.).

Headline finding: ClaudeMemory sits architecturally closest to Mem0
(49% on LongMemEval). Two unforced gaps separate us from Zep-class
systems (71.2%) — we already store the graph but don't traverse it
at query time, and we have temporal columns we don't rank by.
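
A minimal sketch of what ranking on the temporal columns could look like, assuming a fact record that carries world-time validity bounds and a separate ingest timestamp. The `Fact` fields, `valid_at`, and `temporal_rank` are hypothetical names for illustration, not the actual ClaudeMemory schema or API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Fact:
    # All field names are assumptions for illustration only.
    id: str
    valid_from: datetime            # world time: when the fact became true
    valid_to: Optional[datetime]    # world time: when it stopped being true (None = still true)
    ingested_at: datetime           # ingest time: when the system recorded it


def valid_at(fact: Fact, as_of: datetime) -> bool:
    """True if the fact holds in the world at `as_of` (half-open interval)."""
    return fact.valid_from <= as_of and (fact.valid_to is None or as_of < fact.valid_to)


def temporal_rank(facts: List[Fact], as_of: datetime) -> List[Fact]:
    """Drop facts not valid at `as_of`, then put the most recently valid first."""
    live = [f for f in facts if valid_at(f, as_of)]
    return sorted(live, key=lambda f: f.valid_from, reverse=True)


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


facts = [
    Fact("old-job", utc(2024, 1, 1), utc(2025, 6, 1), utc(2024, 1, 5)),
    Fact("new-job", utc(2025, 6, 1), None, utc(2025, 6, 3)),
    Fact("home-city", utc(2020, 3, 1), None, utc(2024, 1, 5)),
]

ranked_now = [f.id for f in temporal_rank(facts, utc(2026, 5, 1))]
ranked_past = [f.id for f in temporal_rank(facts, utc(2025, 1, 1))]
```

Filtering uses world time only; `ingested_at` is kept apart so that "what was true then" and "what did we know then" stay separate questions.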

Four new High Priority items in improvements.md:

- #64 graph traversal as third RRF source
- #65 temporal-aware retrieval strategy
- #66 bi-temporal schema cleanup (world vs ingest time)
- #67 LongMemEval benchmark integration
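
For #64, a rough sketch of Reciprocal Rank Fusion over three ranked lists, assuming the two existing sources are a lexical search and a vector search and that graph traversal yields its own ranked list of fact ids. The function name, the ids, and the conventional `k=60` constant are illustrative assumptions, not the project's implementation.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_fuse(ranked_lists: List[List[str]], k: int = 60) -> List[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank_of_d)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical ranked outputs from each retrieval source.
lexical = ["f1", "f2", "f3"]
vector = ["f2", "f1", "f4"]
graph = ["f4", "f2", "f5"]   # assumed result of walking edges from matched entities

fused = rrf_fuse([lexical, vector, graph])
```

Because RRF uses only ranks, the graph source can be added without calibrating its scores against the other two.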

Promotes existing #57 (provenance-strength-aware ranking) Medium→High
as the soft version of Hindsight's epistemic separation pattern.

Features to avoid: cross-encoder LLM reranking, full 4-column
Graphiti timestamps, cloud-required graph DBs, LoCoMo for
cross-vendor comparison (article itself discredits it).

See docs/influence/ai-memory-systems-2026.md.

https://claude.ai/code/session_01HWt4E8LyPnkrfctgGieupz


The original 0.12 plan was "Release Discipline" (#6 scoreboard + #11 API audit + #12 smoke gate). All three landed on time. Since then OTel ingestion (~15 commits, schema v18, new public surface) and the audit toolkit + contamination guardrails (this week's work) also landed. Both were unplanned, but both serve the 1.0 visibility and stability pillars directly.

Re-anchors the punchlist on three explicit 1.0 pillars (stability, visibility, long-horizon quality) so prioritization decisions are defensible.

Adds #13 (audit toolkit) and #14 (OTel) as canonical 0.12 entries. Marks #3 (harm corpus expansion) and #4 (CLAUDE.md baseline) as in-progress Path B blockers, the remaining work before 0.12 tags.

Updates velocity table: 0.12 widened ~1.5w → ~4w, 1.0 calendar shifted ~3w later, soak window held at 2-3w.

CHANGELOG [Unreleased] gains entries for the audit toolkit and the contamination guardrails alongside the existing OTel/smoke-gate/stability-audit/scoreboard items.

codenamev merged commit 75b684e into main May 25, 2026
1 check passed

codenamev deleted the claude/ai-memory-systems-research-BDgTO branch May 25, 2026 15:29

codenamev merged 1 commit into main from claude/ai-memory-systems-research-BDgTO
