fix(C12.a): wrapper-extracted facts now populate the facts table + dedup runs extraction#58
Merged
Merged
Conversation
…x latent fact_recall row.get crash
THE BUG (C12.a)
Mnemosyne.remember(extract=True) reimplemented half of beam's
_extract_and_store_facts helper inline — calling triples.add_facts
but never _store_facts_in_table. Result: wrapper-extracted facts
were visible through recall() (which scores fact triples) but
invisible through fact_recall() (which queries the facts table).
Same anti-pattern as C26 / C23 / C20: tests instantiated BeamMemory
directly and never exercised the wrapper extract path. Zero coverage
for Mnemosyne.remember(extract=True) on either the triples or facts
side.
CHANGES
1. Mnemosyne.remember (mnemosyne/core/memory.py)
Pass extract/extract_entities through to the FIRST self.beam.remember()
call. Drop the wrapper's two inline extract blocks (~25 lines) so all
extraction routes through BeamMemory's canonical helpers
(_extract_and_store_entities, _extract_and_store_facts) plus NAI-5's
_ingest_graph_and_veracity. The wrapper is now a thin shim around
Beam for extraction concerns; its only remaining job is the legacy
`memories` dual-write.
Why first call, not second: extraction must run once. Passing flags
to the first call means the dedup-only second call doesn't re-run
_extract_and_store_facts and double-write the facts table.
2. fact_recall row access (mnemosyne/core/beam.py:2160-2174)
sqlite3.Row supports bracket access but NOT .get(); the existing
code called row.get("confidence", 0.5) etc. and crashed
AttributeError the moment the facts table contained rows. The bug
was latent because the wrapper never populated the facts table
(the C12.a bug above masked it). Once C12.a is fixed and rows
exist, fact_recall would crash on the first iteration.
Fix: convert raw_row to dict at the top of the loop so the
.get(...) calls work as the author intended. Also tighten the
None handling for confidence (was using `or 0.5` which would
misroute confidence=0.0 as 0.5).
NEW REGRESSION TESTS
tests/test_extract_parity.py — 5 tests:
* test_wrapper_extract_writes_facts_table — RED before fix
* test_wrapper_extract_still_writes_triples_table — regression guard
* test_wrapper_extracted_fact_is_visible_via_fact_recall — end-to-end
contract; required BOTH the wrapper fix AND the fact_recall fix
to pass
* test_wrapper_and_direct_paths_produce_same_table_state — locks in
the parity contract v2 plan §C12.a calls out
* test_wrapper_extract_entities_writes_mention_triples — adjacent
parity check for the entity-extraction path which Option A also
delegates to BeamMemory
DEFERRED
- C12.b (provider REMEMBER_SCHEMA missing extract / bank / metadata /
author_* / channel_*) — separate ledger item.
- C14 (public-recall surfaces fact-derived hits with provenance) —
separate ledger item; partially relieved by NAI-5's fact voice
in recall scoring.
VERIFICATION
tests/test_extract_parity.py: 5 passed (new file, 3 RED before fix,
1 RED before fact_recall fix)
broader sweep (extraction + entities + beam): 119 passed
full suite excluding LLM-dependent tests: 469 passed
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nt beam call, normalize row access CROSS-MODEL ADVERSARIAL REVIEW Both Claude and Codex flagged the same primary regression: my initial fix moved extraction inside BeamMemory.remember, which has an early-return on duplicate content (beam.py:942-963). The early-return runs UPDATE + _ingest_graph_and_veracity and returns BEFORE the extract blocks at lines 985-988. So `mem.remember(same_content, extract=True)` after a duplicate silently skipped extraction entirely — breaking the backfill scenario this PR was supposed to enable. Both reviewers also flagged a related issue: the wrapper made TWO beam.remember calls per Mnemosyne.remember, both of which run _ingest_graph_and_veracity. The second call duplicated graph_edges (PK is autoincrement, not content-derived) and bumped mention_count for what was a single user-level write. CHANGES 1. Dedup branch runs extraction (CRITICAL — addresses Codex finding AxDSan#1) mnemosyne/core/beam.py:961-970 — when _find_duplicate fires, also call _extract_and_store_entities and _extract_and_store_facts before returning. Extract=True now means "extraction always runs" regardless of insert vs dedup, matching the wrapper's pre-fix inline behavior. 2. Drop redundant second beam.remember call (addresses Claude finding AxDSan#1 and Codex finding AxDSan#3) mnemosyne/core/memory.py — the wrapper used to call beam.remember twice per remember (once for ID generation, once for "ID-sync dedup"). Now that the first call accepts extract flags and produces the working memory row with the right ID, the second call only runs the dedup branch redundantly + double-fires _ingest_graph_and_veracity. Removed entirely. Fewer surprises, fewer duplicate graph edges, fewer mention_count inflations. 3. Normalize fact_recall row access (addresses Claude finding AxDSan#4) mnemosyne/core/beam.py:2160-2178 — replaced inconsistent `or ""` collapse with the same `is not None` pattern used for confidence. Also extracted intermediate locals so each row.get is computed once. NEW TEST tests/test_extract_parity.py::test_extract_runs_on_dedup_for_backfill Locks in the dedup-extraction contract by exactly the regression scenario Codex found: pre-existing working_memory row (extract=False), then mem.remember(same_content, extract=True) — asserts dedup fires AND facts table populates. Without the fix this would fail silently. DEFERRED (per /review notes) - C12.a.cognee filed in ledger: cognee.py importer has the same latent sqlite3.Row.get() bug the C12.a fix surfaced for fact_recall, currently masked by `except Exception: pass`. Separate small follow-up PR. - NaN confidence handling — edge case, low value. - Double-call execution count nit (perf) — minor. VERIFICATION tests/test_extract_parity.py: 6 passed (was 5, +1 backfill test) full suite excluding LLM-dependent tests: 470 passed Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3 tasks
AxDSan
pushed a commit
that referenced
this pull request
May 11, 2026
THE BUG
mnemosyne/core/importers/cognee.py:106-109 set
`conn.row_factory = sqlite3.Row` and then called `row.get("id", "")`
etc. on each row. sqlite3.Row supports bracket access but NOT `.get()`,
so each iteration raised AttributeError. The surrounding
`except Exception: pass` swallowed it silently, so direct cognee
imports always returned `[]` regardless of how much data was in the
data_chunks table.
Same pattern as the latent fact_recall row.get bug fixed in PR #58
(C12.a). Both Claude and Codex adversarial reviewers found this
during /review for #58 and recommended a separate small PR.
THE FIX
Convert raw_row to dict at the top of the per-row loop so the existing
.get() calls work as the author intended. Also normalized the metadata
fallbacks to use the same `or ""` pattern (consistent with how content
falls back text -> content -> "").
REGRESSION TESTS
tests/test_importers/test_cognee.py — 3 tests:
* test_extract_direct_returns_rows_from_data_chunks (RED before fix)
* test_extract_direct_handles_null_text_falls_back_to_content (RED before fix)
* test_extract_direct_returns_empty_when_db_missing (defensive contract,
GREEN both before and after — locks in the existing missing-DB behavior)
VERIFICATION
tests/test_importers/test_cognee.py: 3 passed
full suite excluding LLM-dependent tests: 467 passed
LEDGER
.hermes/ledger/memory-contract.md C12.a.cognee row will move to PR_OPEN
when this lands.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Mnemosyne.remember(extract=True)now writes to thefactstable the same wayBeamMemory.remember(extract=True)already did. Previously the wrapper reimplemented half of beam's canonical_extract_and_store_factshelper inline, callingtriples.add_facts(...)but never_store_facts_in_table(...). Wrapper-extracted facts were visible throughrecall()(which scores fact triples) but invisible throughfact_recall()(which queries the facts table).BeamMemory.remember(drops ~25 lines of duplicated logic).BeamMemory.remember's dedup early-return now also runs extraction so backfill calls work (mem.remember(same_content, extract=True)on an already-existing row populates the facts table).fact_recallwas crashing onsqlite3.Row.get(...)once rows existed — latent bug, masked while the wrapper kept the table empty. Converted to dict at the top of the per-row loop. Also normalized the None-handling for confidence/subject/predicate.tests/test_extract_parity.py: 3 RED before the wrapper fix, 1 RED before thefact_recallfix, 1 RED before the dedup-extract fix.The bug
Mnemosyne stores knowledge graph facts in two places:
triplestablerecall()— scores fact triples as part of NAI-5's "fact voice"factstablefact_recall()— public API for structured fact retrievalThe canonical helper
_extract_and_store_factsinmnemosyne/core/beam.py:597writes to BOTH:Mnemosyne.remember(extract=True)reimplemented half of this inline atmemory.py:299-309:Result: a memory remembered through the wrapper showed up in
recall()(because triples are populated and NAI-5 scores them) butfact_recall()returned nothing for the same memory.Why it slipped through
Same anti-pattern as C26 / C23 / C20: tests called
BeamMemory.remember(extract=True)directly withdb_path=tmpand never went throughMnemosyne.remember(extract=True). The wrapper's parallel extract path had zero coverage.tests/test_extraction_integration.pyexists but it manually injects facts viaTripleStore.add_facts(...)rather than exercisingextract=Trueend-to-end through the wrapper.The fix (commit 1: ab4a81b)
Mnemosyne.remember(memory.py) — passextract=extract, extract_entities=extract_entitiesto the firstself.beam.remember(...)call. Drop the wrapper's two inline extract blocks (~25 lines). All extraction now routes through BeamMemory's canonical helpers (_extract_and_store_entities,_extract_and_store_facts) plus NAI-5's_ingest_graph_and_veracity. The wrapper becomes a thin shim — its only remaining job is the legacymemoriesdual-write.fact_recallrow access (beam.py:2160-2174) —sqlite3.Rowsupports bracket access but NOT.get(). The existing code calledrow.get("confidence", 0.5)etc. and crashedAttributeErrorthe moment the facts table contained rows. The bug was unreachable while the wrapper bug above kept the table empty. Once C12.a was fixed and rows existed,fact_recallwould crash on the first iteration. Convert to dict at the top of the loop so.get(...)works as the author intended.Pre-landing review (commit 2: 85db0dd)
Cross-model adversarial review (Claude subagent + Codex CLI). Both flagged the same primary regression introduced by commit 1:
Both also flagged a related issue: the wrapper made TWO
beam.remembercalls perMnemosyne.remember, both of which run_ingest_graph_and_veracity. The second call duplicated graph_edges (autoincrement PK) and bumpedmention_countfor what was a single user-level write.Fixes applied in commit 2:
Dedup branch runs extraction (CRITICAL — addresses Codex's primary finding).
mnemosyne/core/beam.py— when_find_duplicatefires, the dedup branch now also calls_extract_and_store_entitiesand_extract_and_store_factsbefore returning.extract=Truenow means "extraction always runs" regardless of insert vs dedup, matching the wrapper's pre-fix inline behavior.Drop redundant second
beam.remembercall. The wrapper used to callbeam.remembertwice per remember (once for ID generation, once for "ID-sync dedup"). Now that the first call accepts extract flags and produces the working_memory row with the right ID, the second call only ran the dedup branch redundantly + double-fired_ingest_graph_and_veracity. Removed entirely.Normalize
fact_recallrow access. Replaced inconsistentor ""collapse with the sameis not Nonepattern used for confidence. Extracted intermediate locals so eachrow.getis computed once.New backfill regression test
tests/test_extract_parity.py::test_extract_runs_on_dedup_for_backfilllocks in the dedup-extraction contract by exactly the regression scenario the adversarial review found: pre-existing working_memory row (fromBeamMemory.remember(extract=False)), thenmem.remember(same_content, extract=True)— asserts dedup fires AND facts table populates. Without commit 2's fix this would fail silently; with the fix it's GREEN.Behavior change worth flagging
Pre-fix:
Mnemosyne.remember(extract=True)with already-existing working_memory content → wrapper's inline extract block ran AFTER both beam.remember calls regardless of dedup. Triples populated; facts table never.Post-fix:
Mnemosyne.remember(extract=True)with already-existing content → first beam.remember hits dedup, runs extraction (including facts table). Single user-level call → single set of extraction effects.This is the intended behavior for backfill scenarios. Anyone who depended on "extract=True is a no-op on duplicate content" (no callers found) would see facts/triples now actually appear.
Deferred (filed separately)
.hermes/ledger/memory-contract.mdlocally):mnemosyne/core/importers/cognee.py:106-109has the same latentsqlite3.Row.get()pattern asfact_recall. Currently masked byexcept Exception: pass. Both adversarial reviewers flagged it independently. Small follow-up PR.dict.getperf nit — bounded bytop_k, not a real concern.Test plan
uv run pytest tests/test_extract_parity.py -q(6 passed locally)uv run pytest -q --ignore=tests/test_local_llm.py --ignore=tests/test_llm_backends.py(470 passed locally)mem.remember("Alice was born in Boston.", extract=True)thenmem.beam.fact_recall("alice")returns the fact (was empty pre-fix).extract=False, then re-remember the same content withextract=True. Verify the facts table populates after the second call.fact_recallreturns sensibly withoutAttributeError.Verification
```
tests/test_extract_parity.py: 6 passed (new file)
broader sweep (extraction + entities + beam): 119 passed
full suite excluding LLM-dependent tests: 470 passed
```