Feat: MEM-59 — extract.v5 granularity-aware dedup #183
Merged
…_session_assistant)

Paired follow-up to MEM-57. MEM-57's pre-extraction dedup context won LOCOMO big (+10.3) but introduced a granularity-blindness regression on LME single_session_assistant (74.2 peak → 57.6): when <related_memories> held a SUMMARY of a list and the input held the atomic items, the extractor dropped the items as paraphrases of the summary.

v5 adds a granularity carve-out to the <related_memories> dedup rules: specific atomic facts (names, numbers, list items, quotes, dates, titles) are extracted even when the context holds only a summary/generalisation of the same topic, plus a worked summary-vs-atomic example. MEM-57's exact-paraphrase dedup (the mechanism behind the LOCOMO win) is preserved explicitly.

Pure prompt change. No infra, no latency, parser unchanged. Bumps FACT_EXTRACTION_PROMPT_VERSION extract.v4 → extract.v5.

## Benchmark validation (baseline preset, vs v4 / MEM-57)

LongMemEval (the recovery target):
- single_session_assistant: 57.6 → 80.9 (+23.3), above MEM-55 v2's 74.2 peak
- multi_session: 82.5 → 80.8 (−1.7)
- preference: 80.2 → 80.8 (+0.6)
- knowledge_update: 86.5 → 84.3 (−2.2)
- single_session_user: 96.1 → 95.5 (−0.6)
- temporal: 59.5 → 60.1 (+0.6)
- Overall: 76.0 → 77.9 (+1.9)

LOCOMO (no-regression check; held flat, all within ±2-3 J noise):
- single_hop: 67.3 → 64.8 (−2.5)
- multi_hop: 56.7 → 57.9 (+1.2)
- open_domain: 71.5 → 70.2 (−1.3)
- adversarial: 82.4 → 81.9 (−0.5)
- temporal: 45.8 → 45.2 (−0.6)
- Overall: 68.5 → 67.4 (−1.1)

Closes the cycle-13 RAG work: MEM-57 + MEM-59 deliver both wins as a pair, LOCOMO +10.3 (MEM-57) and LME single_session_assistant fully recovered + overall up (MEM-59).

## Tests

227/227 pass. New test parse_extracted_facts_handles_v5_granularity_extraction pins the multi-atomic-item output round-trips through the parser.

Closes MEM-59.
…sset

Deep-review follow-up. The existing parse_extracted_facts_handles_v5_granularity_extraction test only exercises the (unchanged) parser, so it would still pass if a future edit silently deleted the granularity rule or worked example from prompts/extract.txt, re-introducing the LME single_session_assistant regression with no test signal.

Add extract_prompt_asset_contains_v5_granularity_carveout, which asserts the embedded prompt asset still contains:
- the granularity rule
- the worked summary-vs-atomic example (incl. its TAB-separated output line, doubling as a tab-integrity guard)
- v4's preserved exact-paraphrase dedup rule
- that the version const tracks at extract.v5

Pure test addition: no behavior change, prompt unchanged (still the exact text that produced the validated MEM-59 benchmark numbers). 228/228 pass.
ducnmm
approved these changes
May 22, 2026
Summary
Why
MEM-57 (#178) added a pre-extraction dedup step: before the extractor LLM turns new conversation text into stored facts, the server retrieves the user's most-similar existing memories and passes them in a `<related_memories>` block so the extractor can skip duplicates. The dedup instruction was broad: "do not re-extract a fact already present (exact match or close paraphrase)."

That broad rule has a granularity blind spot. Deduplication should operate at a fixed level of abstraction, but a summary and its constituent items live at different levels:

- `<related_memories>` may hold a summary / generalisation, e.g. "Assistant provided a list of posture videos."
- The new input may hold the atomic items that the summary generalises.

Under the broad rule the extractor treats each atomic item as a "close paraphrase" of the summary and suppresses it. The result is lossy ingestion: the summary is stored, the discrete sub-facts (titles, numbers, names, list entries, quotes) are silently dropped and become unrecallable. A summary does not semantically cover its items ("a list of language apps" does not contain "Memrise uses mnemonics"), so suppressing the items loses real information.
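For orientation, the pre-extraction step described above can be pictured as plain string assembly. This is only a sketch: `build_extraction_input`, the one-memory-per-bullet layout, and the closing tag are assumptions for illustration, not the server's actual code or prompt format.

```rust
/// Hypothetical helper (not from the PR): wraps already-retrieved memories in a
/// `<related_memories>` block and appends the new conversation text.
/// The bullet layout and closing tag are assumptions.
fn build_extraction_input(related: &[&str], conversation: &str) -> String {
    let mut out = String::new();
    // Only emit the block when retrieval returned something.
    if !related.is_empty() {
        out.push_str("<related_memories>\n");
        for memory in related {
            out.push_str("- ");
            out.push_str(memory);
            out.push('\n');
        }
        out.push_str("</related_memories>\n\n");
    }
    out.push_str(conversation);
    out
}
```

With a summary such as "Assistant provided a list of posture videos." in the block and the individual items in `conversation`, this is exactly the situation in which the broad v4 rule dropped the items.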
What
A prompt-only change, `extract.v4 → extract.v5`, that makes the dedup rule granularity-aware:

- Specific atomic facts (names, numbers, list items, quotes, dates, titles) are extracted even when `<related_memories>` holds only a summary or generalisation of the same topic. Suppression applies to same-level paraphrases, not to specific items under a general entry.
- `FACT_EXTRACTION_PROMPT_VERSION`: `extract.v4 → extract.v5` (surfaced on `/health`, recorded in run artifacts via MEM-56 so the prompt version is attributable).

Solution
The algorithm/theory. Frame the existing memories as a deduplication filter over candidate facts. The filter's predicate was "is this candidate semantically equivalent to an existing entry?" — but it was being applied across abstraction levels, where "equivalent" is ill-defined. A summary subsumes its items in topic but not in information content; deduplicating an item against its summary discards the item's incremental information. v5 narrows the predicate to same-level equivalence: suppress a candidate only when an existing entry conveys the same specific information, not merely the same topic. Atomic specifics under a general entry are, by definition, new information and pass the filter.
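The difference between the two predicates can be shown with a toy model. In the real system the decision is made by the extractor LLM following the prompt, so everything below is an analogy under stated assumptions: the `Fact` struct, the explicit `Level` tag, topic equality as a stand-in for "close paraphrase", and normalised string equality as a stand-in for "conveys the same specific information" are all hypothetical.

```rust
/// Hypothetical abstraction level of a stored or candidate fact.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Level {
    Summary,
    Atomic,
}

/// Hypothetical fact record; the real extractor works on free text.
#[derive(Debug, Clone, Copy)]
struct Fact<'a> {
    topic: &'a str,
    level: Level,
    text: &'a str,
}

/// Crude stand-in for semantic equivalence: case- and whitespace-insensitive match.
fn normalise(text: &str) -> String {
    text.split_whitespace()
        .map(|word| word.to_lowercase())
        .collect::<Vec<_>>()
        .join(" ")
}

/// Models the broad v4 behaviour: any existing entry on the same topic suppresses.
fn broad_suppresses(candidate: &Fact<'_>, existing: &[Fact<'_>]) -> bool {
    existing.iter().any(|e| e.topic == candidate.topic)
}

/// Models the v5 predicate: suppress only on a same-level match with the same content.
fn same_level_suppresses(candidate: &Fact<'_>, existing: &[Fact<'_>]) -> bool {
    existing
        .iter()
        .any(|e| e.level == candidate.level && normalise(e.text) == normalise(candidate.text))
}
```

Given a stored summary on the topic "language apps", an atomic candidate "Memrise uses mnemonics" is suppressed by `broad_suppresses` but passes `same_level_suppresses`, while a re-statement of the summary itself is still suppressed by both.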
Why prompt-only. The dedup logic lives entirely in the extractor's instructions: the retrieval path, the `<related_memories>` plumbing, the parser, and the `BUCKET<TAB>FACT_TEXT` output contract are all unchanged from MEM-57. The fix is a refinement of the instruction the LLM follows, so it is expressed as prompt text plus a worked example, not code. This keeps the change minimal and free of runtime/behavioural risk to the request path.

Technical change
| File | Change |
| --- | --- |
| `services/server/src/services/prompts/extract.txt` | Granularity carve-out added to the `<related_memories>` block; tightened the exact-match rule with an inline example; new worked summary-vs-atomic example (TAB-separated output, matching the existing format). |
| `services/server/src/services/extractor.rs` | `FACT_EXTRACTION_PROMPT_VERSION` bumped to `extract.v5`; doc comments updated to describe the v5 carve-out and reference the version const instead of a hardcoded string. |

No schema change, no migration, no new dependency, no change to retrieval/ranking/storage. Parser and output format unchanged.
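Because the parser is unchanged, a v5 response that lists several atomic items is just more lines in the same `BUCKET<TAB>FACT_TEXT` shape. The sketch below shows why that needs no parser change. It is not the project's `parse_extracted_facts`: the function name, the `ExtractedFact` struct, and the policy of skipping malformed lines are assumptions.

```rust
/// Hypothetical parsed record for one `BUCKET<TAB>FACT_TEXT` line.
#[derive(Debug, PartialEq, Eq)]
struct ExtractedFact {
    bucket: String,
    text: String,
}

/// Illustrative parser: one fact per line, split on the first TAB.
/// Lines without a TAB, or with an empty side, are skipped (an assumed policy).
fn parse_tab_separated_facts(output: &str) -> Vec<ExtractedFact> {
    output
        .lines()
        .filter_map(|line| {
            let (bucket, text) = line.split_once('\t')?;
            let (bucket, text) = (bucket.trim(), text.trim());
            if bucket.is_empty() || text.is_empty() {
                return None;
            }
            Some(ExtractedFact {
                bucket: bucket.to_string(),
                text: text.to_string(),
            })
        })
        .collect()
}
```

Each atomic item arrives as its own line, so a multi-item output parses into one record per item. The TAB is the only structural character, which is why the prompt-asset test also guards the TAB in the worked example.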
Types of Changes
Testing
Full server suite passes (228/228); clippy clean on the changed file. Two extractor tests cover this change:
- `parse_extracted_facts_handles_v5_granularity_extraction`: a multi-atomic-item extractor output round-trips correctly through the parser.
- `extract_prompt_asset_contains_v5_granularity_carveout`: pins the granularity rule, the worked example (incl. its TAB separator), and the preserved exact-paraphrase rule in the embedded prompt asset, so a future edit cannot silently remove them.

Checklist
Related Issues
Additional Notes
<related_memories>") is planned for the next prompt cycle. Left out here to keep this change to the version that was evaluated end-to-end.