[Study] AI memory systems landscape: Nakajima/Opus 4.6 research article #6
Merged
Meta-study of the 2026-03-26 article surveying 7 memory benchmarks and ~12 memory systems (Hindsight, Zep/Graphiti, MemGPT/Letta, Mem0, Cognee, HippoRAG, etc.).

Headline finding: ClaudeMemory sits architecturally closest to Mem0 (49% on LongMemEval). Two unforced gaps separate us from Zep-class systems (71.2%): we already store the graph but don't traverse it at query time, and we have temporal columns we don't rank by.

Four new High Priority items in improvements.md:

- #64 graph traversal as third RRF source
- #65 temporal-aware retrieval strategy
- #66 bi-temporal schema cleanup (world vs ingest time)
- #67 LongMemEval benchmark integration

Promotes existing #57 (provenance-strength-aware ranking) Medium→High as the soft version of Hindsight's epistemic separation pattern.

Features to avoid: cross-encoder LLM reranking, full 4-column Graphiti timestamps, cloud-required graph DBs, LoCoMo for cross-vendor comparison (article itself discredits it).

See docs/influence/ai-memory-systems-2026.md.

https://claude.ai/code/session_01HWt4E8LyPnkrfctgGieupz
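For readers unfamiliar with the fusion step that #64 refers to, here is a minimal sketch of reciprocal rank fusion (RRF) with a graph-traversal ranking added as a third source. It is illustrative only and not ClaudeMemory's implementation: the source names (`fts`, `vector`, `graph`), the fact IDs, and the `rrf_fuse` helper are all hypothetical, and `k=60` is the commonly used RRF constant rather than a value taken from this project.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked ID lists with reciprocal rank fusion.

    rankings: dict mapping a source name to a list of IDs, best first.
    Each source contributes 1 / (k + rank) per ID, with rank starting at 1.
    Returns IDs sorted by descending fused score (ties broken by ID).
    """
    scores = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, item_id in enumerate(ranked_ids, start=1):
            scores[item_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical rankings; the two existing sources are assumed, not confirmed.
rankings = {
    "fts": ["f1", "f2", "f3"],     # assumed full-text ranking
    "vector": ["f2", "f4", "f1"],  # assumed embedding-similarity ranking
    "graph": ["f5", "f2"],         # assumed graph-traversal ranking (the new source)
}

fused = rrf_fuse(rankings)
print(fused)  # ['f2', 'f1', 'f5', 'f4', 'f3']
```

In this toy input, `f5` is found only by the graph source yet still lands third after fusion, which is the kind of recall a query-time traversal would be expected to add.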
codenamev added a commit that referenced this pull request Jun 1, 2026
The original 0.12 plan was "Release Discipline" (#6 scoreboard + #11 API audit + #12 smoke gate). All three landed on time. Since then, OTel ingestion (~15 commits, schema v18, new public surface) and the audit toolkit + contamination guardrails (this week's work) also landed. Both were unplanned, but both serve the 1.0 visibility and stability pillars directly.

Re-anchors the punchlist on three explicit 1.0 pillars (stability, visibility, long-horizon quality) so prioritization decisions are defensible.

Adds #13 (audit toolkit) and #14 (OTel) as canonical 0.12 entries. Marks #3 (harm corpus expansion) and #4 (CLAUDE.md baseline) as in-progress Path B blockers, the remaining work before 0.12 tags.

Updates velocity table: 0.12 widened ~1.5w → ~4w, 1.0 calendar shifted ~3w later, soak window held at 2-3w.

CHANGELOG [Unreleased] gains entries for the audit toolkit and the contamination guardrails alongside the existing OTel/smoke-gate/stability-audit/scoreboard items.