Feat: MEM-59 — extract.v5 granularity-aware dedup #183

Merged
hungtranphamminh merged 2 commits into dev from feat/MEM-59-extract-v5-granularity-dedup on May 22, 2026

@hungtranphamminh (Collaborator)

Summary

Why

MEM-57 (#178) added a pre-extraction dedup step: before the extractor LLM turns new conversation text into stored facts, the server retrieves the user's most-similar existing memories and passes them in a <related_memories> block so the extractor can skip duplicates. The dedup instruction was broad — "do not re-extract a fact already present (exact match or close paraphrase)."

That broad rule has a granularity blind spot. Deduplication should operate at a fixed level of abstraction, but a summary and its constituent items live at different levels:

  • <related_memories> may hold a summary / generalisation — e.g. "Assistant provided a list of posture videos."
  • The new input may hold the atomic items under that summary — e.g. the actual video titles.

Under the broad rule the extractor treats each atomic item as a "close paraphrase" of the summary and suppresses it. The result is lossy ingestion: the summary is stored, the discrete sub-facts (titles, numbers, names, list entries, quotes) are silently dropped and become unrecallable. A summary does not semantically cover its items — "a list of language apps" does not contain "Memrise uses mnemonics" — so suppressing the items loses real information.

What

A prompt-only change, extract.v4 → extract.v5, that makes the dedup rule granularity-aware:

  1. Carve-out rule — atomic facts (names, numbers, list items, quotes, dates, titles, specific phrases) must still be extracted when <related_memories> holds only a summary or generalisation of the same topic. Suppression applies to same-level paraphrases, not to specific items under a general entry.
  2. Worked example — a summary-in-context / atomic-items-in-input case with the expected output, so the model has a concrete demonstration of the distinction.
  3. Preserves MEM-57's exact-paraphrase suppression explicitly (same-level duplicates are still skipped — that behaviour is the point of the dedup step and is unchanged).
  4. Bumps FACT_EXTRACTION_PROMPT_VERSION extract.v4 → extract.v5 (surfaced on /health, recorded in run artifacts via MEM-56 so the prompt version is attributable).
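Routing every consumer through one const is what keeps the prompt version attributable. Below is a minimal sketch of that pattern. Only the const name and its value come from this PR. The `health_body` and `run_artifact_tag` helpers and the JSON field names are hypothetical and may not match the real /health handler or the MEM-56 artifact format.

```rust
/// Single source of truth for the prompt version (name and value from this PR).
pub const FACT_EXTRACTION_PROMPT_VERSION: &str = "extract.v5";

/// Hypothetical /health body that echoes the active prompt version.
/// Field names are illustrative, not the real response schema.
pub fn health_body() -> String {
    format!(
        r#"{{"status":"ok","fact_extraction_prompt_version":"{}"}}"#,
        FACT_EXTRACTION_PROMPT_VERSION
    )
}

/// Hypothetical run-artifact tag, so each benchmark run records which prompt produced it.
pub fn run_artifact_tag(run_id: &str) -> String {
    format!("{}:{}", run_id, FACT_EXTRACTION_PROMPT_VERSION)
}

fn main() {
    println!("{}", health_body());
    println!("{}", run_artifact_tag("lme-baseline"));
}
```

The PR's doc comments also reference the const instead of a hardcoded string. With that in place, a version bump is a one-line change that propagates everywhere.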

Solution

The algorithm/theory. Frame the existing memories as a deduplication filter over candidate facts. The filter's predicate was "is this candidate semantically equivalent to an existing entry?" — but it was being applied across abstraction levels, where "equivalent" is ill-defined. A summary subsumes its items in topic but not in information content; deduplicating an item against its summary discards the item's incremental information. v5 narrows the predicate to same-level equivalence: suppress a candidate only when an existing entry conveys the same specific information, not merely the same topic. Atomic specifics under a general entry are, by definition, new information and pass the filter.
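The change itself is prompt text, but the shift in the predicate can be modelled in a few lines. This is a hypothetical sketch with several stand-ins:

- Exact string equality on `info` stands in for the LLM's judgment of "same specific information".
- `Level`, `Fact`, and `keep` are illustrative names.
- The video titles are placeholders.

```rust
/// Abstraction level of a stored memory or a candidate fact.
#[derive(Clone, Copy, PartialEq, Eq)]
enum Level {
    Summary,
    Atomic,
}

/// A memory or candidate: its topic, its level, and the specific information it carries.
struct Fact {
    topic: &'static str,
    level: Level,
    info: &'static str,
}

/// v4-style broad rule: anything on the same topic counts as a "close paraphrase".
fn suppressed_v4(candidate: &Fact, existing: &[Fact]) -> bool {
    existing.iter().any(|e| e.topic == candidate.topic)
}

/// v5-style rule: suppress only when a same-level entry conveys the same information.
fn suppressed_v5(candidate: &Fact, existing: &[Fact]) -> bool {
    existing
        .iter()
        .any(|e| e.level == candidate.level && e.info == candidate.info)
}

/// Return the info of every candidate that passes the dedup filter.
fn keep(
    candidates: &[Fact],
    existing: &[Fact],
    suppressed: fn(&Fact, &[Fact]) -> bool,
) -> Vec<&'static str> {
    candidates
        .iter()
        .filter(|c| !suppressed(*c, existing))
        .map(|c| c.info)
        .collect()
}

/// <related_memories> holds only the summary.
const EXISTING: [Fact; 1] = [Fact {
    topic: "posture videos",
    level: Level::Summary,
    info: "Assistant provided a list of posture videos",
}];

/// New input: a same-level duplicate of the summary plus two atomic items (placeholder titles).
const CANDIDATES: [Fact; 3] = [
    Fact { topic: "posture videos", level: Level::Summary, info: "Assistant provided a list of posture videos" },
    Fact { topic: "posture videos", level: Level::Atomic, info: "Video title A" },
    Fact { topic: "posture videos", level: Level::Atomic, info: "Video title B" },
];

fn main() {
    println!("v4 keeps {:?}", keep(&CANDIDATES, &EXISTING, suppressed_v4));
    println!("v5 keeps {:?}", keep(&CANDIDATES, &EXISTING, suppressed_v5));
}
```

Under the broad rule all three candidates are dropped. Under the v5 rule the same-level duplicate is still dropped, which is MEM-57's behaviour, but both atomic titles pass the filter.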

Why prompt-only. The dedup logic lives entirely in the extractor's instructions — the retrieval path, the <related_memories> plumbing, the parser, and the BUCKET<TAB>FACT_TEXT output contract are all unchanged from MEM-57. The fix is a refinement of the instruction the LLM follows, so it is expressed as prompt text plus a worked example, not code. This keeps the change minimal and free of runtime/behavioural risk to the request path.
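For reference, the unchanged BUCKET<TAB>FACT_TEXT contract amounts to a line-oriented split on the first TAB. The sketch below is hypothetical and is not the repository's `parse_extracted_facts`. The real parser may validate buckets or handle malformed lines differently. `parse_bucket_lines` and the `item` bucket are illustrative names.

```rust
/// Parse extractor output where each line is BUCKET<TAB>FACT_TEXT.
/// Lines without a TAB, or with an empty bucket or fact, are skipped.
fn parse_bucket_lines(output: &str) -> Vec<(String, String)> {
    output
        .lines() // also strips a trailing "\r" from "\r\n" line endings
        .filter_map(|line| {
            // Split on the first TAB only; any later TABs stay in the fact text.
            let (bucket, fact) = line.split_once('\t')?;
            let (bucket, fact) = (bucket.trim(), fact.trim());
            if bucket.is_empty() || fact.is_empty() {
                None
            } else {
                Some((bucket.to_string(), fact.to_string()))
            }
        })
        .collect()
}

fn main() {
    // Multi-atomic-item output, shaped like the v5 worked example (placeholder bucket and titles).
    let output = "item\tVideo title A\nitem\tVideo title B\n\nno tab on this line\n";
    for (bucket, fact) in parse_bucket_lines(output) {
        println!("{} -> {}", bucket, fact);
    }
}
```

Each extracted fact is one line. A prompt that emits more atomic lines therefore needs no parser change, which is why v5 could stay prompt-only.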

Technical change

| Area | Change |
| --- | --- |
| services/server/src/services/prompts/extract.txt | Granularity carve-out rule added to the <related_memories> block; exact-match rule tightened with an inline example; new worked summary-vs-atomic example (TAB-separated output, matching the existing format). |
| services/server/src/services/extractor.rs | FACT_EXTRACTION_PROMPT_VERSION bumped to extract.v5; doc comments updated to describe the v5 carve-out and to reference the version const instead of a hardcoded string. |
| Tests | Two unit tests (see Testing). |

No schema change, no migration, no new dependency, no change to retrieval/ranking/storage. Parser and output format unchanged.

Types of Changes

  • Breaking change
  • New feature (non-breaking change which adds functionality)
  • Bug fix
  • Performance optimization
  • Refactor
  • Library update
  • Documentation
  • Test (non-breaking change related to testing)
  • Security awareness

Testing

  • I have tested this code locally
  • I have added/updated unit tests
  • I have added/updated integration tests
  • I have tested in multiple browsers (if applicable)

Full server suite passes (228/228); clippy clean on the changed file. Two extractor tests cover this change:

  • parse_extracted_facts_handles_v5_granularity_extraction — a multi-atomic-item extractor output round-trips correctly through the parser.
  • extract_prompt_asset_contains_v5_granularity_carveout — pins the granularity rule, the worked example (incl. its TAB separator), and the preserved exact-paraphrase rule in the embedded prompt asset, so a future edit cannot silently remove them.
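The second test guards prompt text rather than code. Below is a hedged sketch of the idea. The real test checks the embedded asset. Here, `PROMPT_ASSET` and the marker strings are hypothetical placeholders, not the actual wording of prompts/extract.txt. One marker is a TAB-separated example line, so an editor that converts tabs to spaces also trips the check.

```rust
/// Placeholder for the embedded prompt asset (the real test would embed
/// prompts/extract.txt, e.g. with include_str!). The wording here is invented.
const PROMPT_ASSET: &str = "\
RULE: still extract atomic facts when <related_memories> holds only a summary.
EXAMPLE OUTPUT:
item\tVideo title A
RULE: skip exact matches and same-level close paraphrases.
";

const FACT_EXTRACTION_PROMPT_VERSION: &str = "extract.v5";

/// Markers that must survive any future prompt edit (hypothetical strings).
const MARKERS: [&str; 3] = [
    "holds only a summary",         // granularity carve-out
    "item\tVideo title A",          // worked example line, including its TAB
    "same-level close paraphrases", // preserved exact-paraphrase rule
];

/// Return every marker that is missing from the prompt.
fn missing_markers<'a>(prompt: &str, markers: &[&'a str]) -> Vec<&'a str> {
    markers.iter().copied().filter(|m| !prompt.contains(*m)).collect()
}

fn main() {
    let missing = missing_markers(PROMPT_ASSET, &MARKERS);
    assert!(missing.is_empty(), "prompt asset lost: {:?}", missing);
    assert_eq!(FACT_EXTRACTION_PROMPT_VERSION, "extract.v5");
    println!("prompt asset pins intact");
}
```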

Checklist

  • My code follows the code style of this project
  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

Related Issues

Additional Notes

  • Paired follow-up to MEM-57. MEM-57 shipped the pre-extraction dedup with this granularity gap documented; this PR is the scoped refinement.
  • Deferred to the next prompt iteration: the carve-out's wording ("more-specific … MUST extract") could, in principle, let a model rationalise re-emitting a genuine same-level duplicate as "more specific." A tighter phrasing (gate the carve-out on "not itself individually present in <related_memories>") is planned for the next prompt cycle. It is left out here so that this change ships exactly the prompt version that was evaluated end-to-end.
  • Reviewed via a multi-agent deep review (prompt-logic correctness, code/test integrity) — no blockers.

…_session_assistant)

Paired follow-up to MEM-57. MEM-57's pre-extraction dedup context delivered
a large LOCOMO gain (+10.3) but introduced a granularity-blindness
regression on LME single_session_assistant (74.2 peak → 57.6): when
<related_memories> held a SUMMARY of a list and the input held the atomic
items, the extractor dropped the items as paraphrases of the summary.

v5 adds a granularity carve-out to the <related_memories> dedup rules:
specific atomic facts (names, numbers, list items, quotes, dates, titles)
are extracted even when the context holds only a summary/generalisation
of the same topic — plus a worked summary-vs-atomic example. MEM-57's
exact-paraphrase dedup (the mechanism behind the LOCOMO win) is preserved
explicitly.

Pure prompt change: no infra changes, no added latency, parser unchanged.
Bumps FACT_EXTRACTION_PROMPT_VERSION extract.v4 → extract.v5.

## Benchmark validation (baseline preset, vs v4 / MEM-57)

LongMemEval — the recovery target:
  - single_session_assistant:  57.6 → 80.9  (+23.3) — above MEM-55 v2's 74.2 peak
  - multi_session:             82.5 → 80.8  (−1.7)
  - preference:                80.2 → 80.8  (+0.6)
  - knowledge_update:          86.5 → 84.3  (−2.2)
  - single_session_user:       96.1 → 95.5  (−0.6)
  - temporal:                  59.5 → 60.1  (+0.6)
  - Overall:                   76.0 → 77.9  (+1.9)

LOCOMO — no-regression check (held flat, all within ±2-3 J noise):
  - single_hop:    67.3 → 64.8  (−2.5)
  - multi_hop:     56.7 → 57.9  (+1.2)
  - open_domain:   71.5 → 70.2  (−1.3)
  - adversarial:   82.4 → 81.9  (−0.5)
  - temporal:      45.8 → 45.2  (−0.6)
  - Overall:       68.5 → 67.4  (−1.1)
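The reported deltas and the noise claim can be rechecked mechanically. Below is a small sketch with scores copied from the two lists above. The 3.0 J band is an assumption taken from the upper end of the stated ±2-3 J noise. The helper names are illustrative.

```rust
/// (category, v4 / MEM-57 score, v5 / MEM-59 score), baseline preset.
const LME: [(&str, f64, f64); 7] = [
    ("single_session_assistant", 57.6, 80.9),
    ("multi_session", 82.5, 80.8),
    ("preference", 80.2, 80.8),
    ("knowledge_update", 86.5, 84.3),
    ("single_session_user", 96.1, 95.5),
    ("temporal", 59.5, 60.1),
    ("overall", 76.0, 77.9),
];

const LOCOMO: [(&str, f64, f64); 6] = [
    ("single_hop", 67.3, 64.8),
    ("multi_hop", 56.7, 57.9),
    ("open_domain", 71.5, 70.2),
    ("adversarial", 82.4, 81.9),
    ("temporal", 45.8, 45.2),
    ("overall", 68.5, 67.4),
];

/// Score change rounded to one decimal, the precision the scores are reported at.
fn delta(before: f64, after: f64) -> f64 {
    ((after - before) * 10.0).round() / 10.0
}

/// Categories whose absolute change exceeds the given noise band.
fn outside_noise(rows: &[(&'static str, f64, f64)], band: f64) -> Vec<&'static str> {
    rows.iter()
        .filter(|(_, before, after)| delta(*before, *after).abs() > band)
        .map(|(name, _, _)| *name)
        .collect()
}

fn main() {
    for (name, before, after) in LME.iter().chain(LOCOMO.iter()) {
        println!("{}: {:+.1}", name, delta(*before, *after));
    }
    println!("LME outside 3.0: {:?}", outside_noise(&LME, 3.0));
    println!("LOCOMO outside 3.0: {:?}", outside_noise(&LOCOMO, 3.0));
}
```

With a 3.0 band, no LOCOMO category moves outside the noise, and single_session_assistant is the only LME category that does. With a 2.0 band, LOCOMO single_hop (−2.5) also falls outside. The "flat" reading therefore leans on the upper end of the stated range for that one category.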

Closes the cycle-13 RAG work: MEM-57 + MEM-59 deliver both wins as a
pair — LOCOMO +10.3 (MEM-57) and LME single_session_assistant fully
recovered + overall up (MEM-59).

## Tests

227/227 pass. New test parse_extracted_facts_handles_v5_granularity_extraction
pins that multi-atomic-item extractor output round-trips through the parser.

Closes MEM-59.
…sset

Deep-review follow-up. The existing
parse_extracted_facts_handles_v5_granularity_extraction test only
exercises the (unchanged) parser, so it
would still pass if a future edit silently deleted the granularity rule
or worked example from prompts/extract.txt — re-introducing the LME
single_session_assistant regression with no test signal.

Add extract_prompt_asset_contains_v5_granularity_carveout, which asserts
the embedded prompt asset still contains: the granularity rule, the
worked summary-vs-atomic example (incl. its TAB-separated output line,
doubling as a tab-integrity guard), v4's preserved exact-paraphrase
dedup rule, and that the version const tracks at extract.v5.

Pure test addition — no behavior change, prompt unchanged (still the
exact text that produced the validated MEM-59 benchmark numbers).
228/228 pass.
hungtranphamminh merged commit de17c98 into dev on May 22, 2026
8 checks passed
hungtranphamminh deleted the feat/MEM-59-extract-v5-granularity-dedup branch on May 22, 2026 05:57