Feat: MEM-59 — extract.v5 granularity-aware dedup by hungtranphamminh · Pull Request #183 · MystenLabs/MemWal

hungtranphamminh · 2026-05-22T01:42:37Z

Summary

Why

MEM-57 (#178) added a pre-extraction dedup step: before the extractor LLM turns new conversation text into stored facts, the server retrieves the user's most-similar existing memories and passes them in a <related_memories> block so the extractor can skip duplicates. The dedup instruction was broad — "do not re-extract a fact already present (exact match or close paraphrase)."

That broad rule has a granularity blind spot. Deduplication should operate at a fixed level of abstraction, but a summary and its constituent items live at different levels:

- <related_memories> may hold a summary / generalisation, e.g. "Assistant provided a list of posture videos."
- The new input may hold the atomic items under that summary, e.g. the actual video titles.

Under the broad rule the extractor treats each atomic item as a "close paraphrase" of the summary and suppresses it. The result is lossy ingestion: the summary is stored, the discrete sub-facts (titles, numbers, names, list entries, quotes) are silently dropped and become unrecallable. A summary does not semantically cover its items — "a list of language apps" does not contain "Memrise uses mnemonics" — so suppressing the items loses real information.

What

A prompt-only change, extract.v4 → extract.v5, that makes the dedup rule granularity-aware:

- Carve-out rule: atomic facts (names, numbers, list items, quotes, dates, titles, specific phrases) must still be extracted when <related_memories> holds only a summary or generalisation of the same topic. Suppression applies to same-level paraphrases, not to specific items under a general entry.
- Worked example: a summary-in-context / atomic-items-in-input case with the expected output, so the model has a concrete demonstration of the distinction.
- Preserves MEM-57's exact-paraphrase suppression explicitly (same-level duplicates are still skipped; that behaviour is the point of the dedup step and is unchanged).
- Bumps FACT_EXTRACTION_PROMPT_VERSION extract.v4 → extract.v5 (surfaced on /health, recorded in run artifacts via MEM-56 so the prompt version is attributable).

Solution

The algorithm/theory. Frame the existing memories as a deduplication filter over candidate facts. The filter's predicate was "is this candidate semantically equivalent to an existing entry?" — but it was being applied across abstraction levels, where "equivalent" is ill-defined. A summary subsumes its items in topic but not in information content; deduplicating an item against its summary discards the item's incremental information. v5 narrows the predicate to same-level equivalence: suppress a candidate only when an existing entry conveys the same specific information, not merely the same topic. Atomic specifics under a general entry are, by definition, new information and pass the filter.
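As a toy model of the predicate change described above, the broad and narrowed rules can be contrasted in code. This is only a sketch: the real filter is applied by the extractor LLM following prompt text, not by a function. The `Fact` and `Level` types, the `topic` field, and normalised string equality as a stand-in for semantic equivalence are all assumptions made for illustration.

```rust
/// Abstraction level of a fact (hypothetical type, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Level {
    Summary,
    Atomic,
}

/// A stored memory or a candidate fact (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Fact {
    topic: &'static str,
    level: Level,
    text: &'static str,
}

/// Lowercase and collapse whitespace. A crude stand-in for
/// "conveys the same specific information".
fn normalize(s: &str) -> String {
    let words: Vec<String> = s.split_whitespace().map(|w| w.to_lowercase()).collect();
    words.join(" ")
}

/// Broad rule: any existing entry on the same topic suppresses the
/// candidate, regardless of abstraction level.
fn suppress_broad(candidate: &Fact, existing: &[Fact]) -> bool {
    existing.iter().any(|e| e.topic == candidate.topic)
}

/// Narrowed rule: suppress only when an existing entry at the same level
/// carries the same specific information.
fn suppress_same_level(candidate: &Fact, existing: &[Fact]) -> bool {
    existing
        .iter()
        .any(|e| e.level == candidate.level && normalize(e.text) == normalize(candidate.text))
}
```

With a stored summary "a list of language apps" and a candidate "Memrise uses mnemonics", `suppress_broad` drops the candidate while `suppress_same_level` lets it through; a candidate that repeats an already stored atomic fact is still suppressed by both.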

Why prompt-only. The dedup logic lives entirely in the extractor's instructions — the retrieval path, the <related_memories> plumbing, the parser, and the BUCKET<TAB>FACT_TEXT output contract are all unchanged from MEM-57. The fix is a refinement of the instruction the LLM follows, so it is expressed as prompt text plus a worked example, not code. This keeps the change minimal and free of runtime/behavioural risk to the request path.
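For readers unfamiliar with the BUCKET<TAB>FACT_TEXT contract: each output line carries one fact, with a bucket label and the fact text separated by a tab. A minimal sketch of parsing that shape follows. `parse_tab_lines` is a hypothetical helper written for this illustration, not the project's actual parser, and the policy of silently skipping malformed lines is an assumption.

```rust
/// Parse extractor output in the BUCKET<TAB>FACT_TEXT shape into
/// (bucket, fact) pairs. Hypothetical helper; the skip-on-malformed
/// policy is an assumption, not the project's documented behaviour.
fn parse_tab_lines(output: &str) -> Vec<(String, String)> {
    output
        .lines()
        .filter_map(|line| {
            // Split on the first tab only, so the fact text may itself contain tabs.
            let (bucket, fact) = line.split_once('\t')?;
            let (bucket, fact) = (bucket.trim(), fact.trim());
            if bucket.is_empty() || fact.is_empty() {
                None
            } else {
                Some((bucket.to_string(), fact.to_string()))
            }
        })
        .collect()
}
```

Because v5 only changes what the model is told to emit, several atomic items under one summary simply appear as several such lines; nothing in this shape needs to change.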

Technical change

| Area | Change |
| --- | --- |
| `services/server/src/services/prompts/extract.txt` | Granularity carve-out rule added to the `<related_memories>` block; tightened the exact-match rule with an inline example; new worked summary-vs-atomic example (TAB-separated output, matching the existing format). |
| `services/server/src/services/extractor.rs` | `FACT_EXTRACTION_PROMPT_VERSION` bumped to `extract.v5`; doc comments updated to describe the v5 carve-out and reference the version const instead of a hardcoded string. |
| Tests | Two unit tests (see Testing). |

No schema change, no migration, no new dependency, no change to retrieval/ranking/storage. Parser and output format unchanged.

Types of Changes

Testing

I have tested this code locally
I have added/updated unit tests
I have added/updated integration tests
I have tested in multiple browsers (if applicable)

Full server suite passes (228/228); clippy clean on the changed file. Two extractor tests cover this change:

parse_extracted_facts_handles_v5_granularity_extraction — a multi-atomic-item extractor output round-trips correctly through the parser.
extract_prompt_asset_contains_v5_granularity_carveout — pins the granularity rule, the worked example (incl. its TAB separator), and the preserved exact-paraphrase rule in the embedded prompt asset, so a future edit cannot silently remove them.
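The asset-pinning idea behind the second test can be sketched as a check that a prompt string still contains every required fragment. `missing_fragments` and the fragment strings used with it are hypothetical; the real test's assertions and the way the prompt asset is embedded are not shown in this PR description.

```rust
/// Return the required fragments that are absent from `prompt`.
/// Hypothetical helper illustrating the pinning technique.
fn missing_fragments<'a>(prompt: &str, required: &[&'a str]) -> Vec<&'a str> {
    required
        .iter()
        .copied()
        .filter(|frag| !prompt.contains(*frag))
        .collect()
}
```

Pinning a fragment that includes the literal tab of the worked example's output line means the check also fails if an editor converts that tab to spaces, which is the tab-integrity guard the test description mentions.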

Checklist

My code follows the code style of this project
My change requires a change to the documentation
I have updated the documentation accordingly
I have added tests to cover my changes
All new and existing tests passed

Related Issues

Related to Feat: MEM-57 — pre-extraction dedup context (Mem0 v3 pattern) + extract.v4 #178 (MEM-57 pre-extraction context — the change this refines)

Additional Notes

Paired follow-up to MEM-57. MEM-57 shipped the pre-extraction dedup with this granularity gap documented; this PR is the scoped refinement.
Deferred to the next prompt iteration: the carve-out's wording ("more-specific … MUST extract") could, in principle, let a model rationalise re-emitting a genuine same-level duplicate as "more specific." A tighter phrasing (gate the carve-out on "not itself individually present in <related_memories>") is planned for the next prompt cycle. Left out here to keep this change to the version that was evaluated end-to-end.
Reviewed via a multi-agent deep review (prompt-logic correctness, code/test integrity) — no blockers.

…_session_assistant)

Paired follow-up to MEM-57. MEM-57's pre-extraction dedup context won LOCOMO big (+10.3) but introduced a granularity-blindness regression on LME single_session_assistant (74.2 peak → 57.6): when <related_memories> held a SUMMARY of a list and the input held the atomic items, the extractor dropped the items as paraphrases of the summary.

v5 adds a granularity carve-out to the <related_memories> dedup rules: specific atomic facts (names, numbers, list items, quotes, dates, titles) are extracted even when the context holds only a summary/generalisation of the same topic, plus a worked summary-vs-atomic example. MEM-57's exact-paraphrase dedup (the mechanism behind the LOCOMO win) is preserved explicitly.

Pure prompt change. No infra, no latency, parser unchanged. Bumps FACT_EXTRACTION_PROMPT_VERSION extract.v4 → extract.v5.

## Benchmark validation (baseline preset, vs v4 / MEM-57)

LongMemEval (the recovery target):
- single_session_assistant: 57.6 → 80.9 (+23.3), above MEM-55 v2's 74.2 peak
- multi_session: 82.5 → 80.8 (−1.7)
- preference: 80.2 → 80.8 (+0.6)
- knowledge_update: 86.5 → 84.3 (−2.2)
- single_session_user: 96.1 → 95.5 (−0.6)
- temporal: 59.5 → 60.1 (+0.6)
- Overall: 76.0 → 77.9 (+1.9)

LOCOMO (no-regression check; held flat, all within ±2-3 J noise):
- single_hop: 67.3 → 64.8 (−2.5)
- multi_hop: 56.7 → 57.9 (+1.2)
- open_domain: 71.5 → 70.2 (−1.3)
- adversarial: 82.4 → 81.9 (−0.5)
- temporal: 45.8 → 45.2 (−0.6)
- Overall: 68.5 → 67.4 (−1.1)

Closes the cycle-13 RAG work: MEM-57 + MEM-59 deliver both wins as a pair: LOCOMO +10.3 (MEM-57) and LME single_session_assistant fully recovered + overall up (MEM-59).

## Tests

227/227 pass. New test parse_extracted_facts_handles_v5_granularity_extraction pins the multi-atomic-item output round-trips through the parser.

Closes MEM-59.

…sset

Deep-review follow-up. The existing parse_extracted_facts_handles_v5_granularity_extraction test only exercises the (unchanged) parser, so it would still pass if a future edit silently deleted the granularity rule or worked example from prompts/extract.txt, re-introducing the LME single_session_assistant regression with no test signal.

Add extract_prompt_asset_contains_v5_granularity_carveout, which asserts the embedded prompt asset still contains: the granularity rule, the worked summary-vs-atomic example (incl. its TAB-separated output line, doubling as a tab-integrity guard), v4's preserved exact-paraphrase dedup rule, and that the version const tracks at extract.v5.

Pure test addition: no behavior change, prompt unchanged (still the exact text that produced the validated MEM-59 benchmark numbers). 228/228 pass.

hungtranphamminh added 2 commits May 21, 2026 22:43

ducnmm approved these changes May 22, 2026

hungtranphamminh merged commit de17c98 into dev May 22, 2026
8 checks passed

hungtranphamminh deleted the feat/MEM-59-extract-v5-granularity-dedup branch May 22, 2026 05:57

hungtranphamminh mentioned this pull request May 22, 2026

Fix: ENG-1785 — apply composite ranker to manual recall (parity with non-manual) #185

Merged

18 tasks
