fix(C12.a): wrapper-extracted facts now populate the facts table + dedup runs extraction by kohai-ut · Pull Request #58 · AxDSan/mnemosyne

kohai-ut · 2026-05-09T22:50:49Z

Summary

Mnemosyne.remember(extract=True) now writes to the facts table the same way BeamMemory.remember(extract=True) already did. Previously the wrapper reimplemented half of beam's canonical _extract_and_store_facts helper inline, calling triples.add_facts(...) but never _store_facts_in_table(...). Wrapper-extracted facts were visible through recall() (which scores fact triples) but invisible through fact_recall() (which queries the facts table).
Three bundled fixes:
1. Wrapper delegates extraction to BeamMemory.remember (drops ~25 lines of duplicated logic).
2. BeamMemory.remember's dedup early-return now also runs extraction so backfill calls work (mem.remember(same_content, extract=True) on an already-existing row populates the facts table).
3. fact_recall was crashing on sqlite3.Row.get(...) once rows existed — latent bug, masked while the wrapper kept the table empty. Converted to dict at the top of the per-row loop. Also normalized the None-handling for confidence/subject/predicate.
6 regression tests in tests/test_extract_parity.py: 3 RED before the wrapper fix, 1 RED before the fact_recall fix, 1 RED before the dedup-extract fix.

The bug

Mnemosyne stores knowledge graph facts in two places:

Storage	Used by
`triples` table	`recall()` — scores fact triples as part of NAI-5's "fact voice"
`facts` table	`fact_recall()` — public API for structured fact retrieval

The canonical helper _extract_and_store_facts in mnemosyne/core/beam.py:597 writes to BOTH:

def _extract_and_store_facts(beam, memory_id, content, source=""):
    facts = extract_facts_safe(content)
    if not facts:
        return
    triples = TripleStore(db_path=beam.db_path)
    triples.add_facts(memory_id, facts, source=source, confidence=0.7)
    _store_facts_in_table(beam, memory_id, content, source, facts)

Mnemosyne.remember(extract=True) reimplemented half of this inline at memory.py:299-309:

if extract:
    try:
        from mnemosyne.core.extraction import extract_facts_safe
        from mnemosyne.core.triples import TripleStore
        facts = extract_facts_safe(content)
        if facts:
            triples = TripleStore(db_path=self.db_path)
            triples.add_facts(memory_id, facts, source=source, confidence=0.7)
                                 # ← never called _store_facts_in_table
    except Exception:
        pass

Result: a memory remembered through the wrapper showed up in recall() (because triples are populated and NAI-5 scores them) but fact_recall() returned nothing for the same memory.

Why it slipped through

Same anti-pattern as C26 / C23 / C20: tests called BeamMemory.remember(extract=True) directly with db_path=tmp and never went through Mnemosyne.remember(extract=True). The wrapper's parallel extract path had zero coverage. tests/test_extraction_integration.py exists but it manually injects facts via TripleStore.add_facts(...) rather than exercising extract=True end-to-end through the wrapper.

The fix (commit 1: `ab4a81b`)

Mnemosyne.remember (memory.py) — pass extract=extract, extract_entities=extract_entities to the first self.beam.remember(...) call. Drop the wrapper's two inline extract blocks (~25 lines). All extraction now routes through BeamMemory's canonical helpers (_extract_and_store_entities, _extract_and_store_facts) plus NAI-5's _ingest_graph_and_veracity. The wrapper becomes a thin shim — its only remaining job is the legacy memories dual-write.

fact_recall row access (beam.py:2160-2174) — sqlite3.Row supports bracket access but NOT .get(). The existing code called row.get("confidence", 0.5) etc. and crashed AttributeError the moment the facts table contained rows. The bug was unreachable while the wrapper bug above kept the table empty. Once C12.a was fixed and rows existed, fact_recall would crash on the first iteration. Convert to dict at the top of the loop so .get(...) works as the author intended.

Pre-landing review (commit 2: `85db0dd`)

Cross-model adversarial review (Claude subagent + Codex CLI). Both flagged the same primary regression introduced by commit 1:

BeamMemory.remember has an early-return on duplicate content (beam.py:942-963). The early-return runs UPDATE + _ingest_graph_and_veracity and returns BEFORE the extract blocks at lines 985-988. So mem.remember(same_content, extract=True) after a duplicate silently skipped extraction entirely — breaking backfill on pre-existing databases.

Both also flagged a related issue: the wrapper made TWO beam.remember calls per Mnemosyne.remember, both of which run _ingest_graph_and_veracity. The second call duplicated graph_edges (autoincrement PK) and bumped mention_count for what was a single user-level write.

Fixes applied in commit 2:

Dedup branch runs extraction (CRITICAL — addresses Codex's primary finding). mnemosyne/core/beam.py — when _find_duplicate fires, the dedup branch now also calls _extract_and_store_entities and _extract_and_store_facts before returning. extract=True now means "extraction always runs" regardless of insert vs dedup, matching the wrapper's pre-fix inline behavior.
Drop redundant second beam.remember call. The wrapper used to call beam.remember twice per remember (once for ID generation, once for "ID-sync dedup"). Now that the first call accepts extract flags and produces the working_memory row with the right ID, the second call only ran the dedup branch redundantly + double-fired _ingest_graph_and_veracity. Removed entirely.
Normalize fact_recall row access. Replaced inconsistent or "" collapse with the same is not None pattern used for confidence. Extracted intermediate locals so each row.get is computed once.

New backfill regression test

tests/test_extract_parity.py::test_extract_runs_on_dedup_for_backfill locks in the dedup-extraction contract by exactly the regression scenario the adversarial review found: pre-existing working_memory row (from BeamMemory.remember(extract=False)), then mem.remember(same_content, extract=True) — asserts dedup fires AND facts table populates. Without commit 2's fix this would fail silently; with the fix it's GREEN.

Behavior change worth flagging

Pre-fix: Mnemosyne.remember(extract=True) with already-existing working_memory content → wrapper's inline extract block ran AFTER both beam.remember calls regardless of dedup. Triples populated; facts table never.

Post-fix: Mnemosyne.remember(extract=True) with already-existing content → first beam.remember hits dedup, runs extraction (including facts table). Single user-level call → single set of extraction effects.

This is the intended behavior for backfill scenarios. Anyone who depended on "extract=True is a no-op on duplicate content" (no callers found) would see facts/triples now actually appear.

Deferred (filed separately)

C12.a.cognee (added to .hermes/ledger/memory-contract.md locally): mnemosyne/core/importers/cognee.py:106-109 has the same latent sqlite3.Row.get() pattern as fact_recall. Currently masked by except Exception: pass. Both adversarial reviewers flagged it independently. Small follow-up PR.
NaN confidence handling — edge case noted by Claude, not exercised by current writers.
Per-row dict.get perf nit — bounded by top_k, not a real concern.

Test plan

uv run pytest tests/test_extract_parity.py -q (6 passed locally)
uv run pytest -q --ignore=tests/test_local_llm.py --ignore=tests/test_llm_backends.py (470 passed locally)
Manual: mem.remember("Alice was born in Boston.", extract=True) then mem.beam.fact_recall("alice") returns the fact (was empty pre-fix).
Manual backfill: write a memory with extract=False, then re-remember the same content with extract=True. Verify the facts table populates after the second call.
Manual fact_recall under load: with several extract=True memories, fact_recall returns sensibly without AttributeError.

Verification

```
tests/test_extract_parity.py: 6 passed (new file)
broader sweep (extraction + entities + beam): 119 passed
full suite excluding LLM-dependent tests: 470 passed
```

…x latent fact_recall row.get crash THE BUG (C12.a) Mnemosyne.remember(extract=True) reimplemented half of beam's _extract_and_store_facts helper inline — calling triples.add_facts but never _store_facts_in_table. Result: wrapper-extracted facts were visible through recall() (which scores fact triples) but invisible through fact_recall() (which queries the facts table). Same anti-pattern as C26 / C23 / C20: tests instantiated BeamMemory directly and never exercised the wrapper extract path. Zero coverage for Mnemosyne.remember(extract=True) on either the triples or facts side. CHANGES 1. Mnemosyne.remember (mnemosyne/core/memory.py) Pass extract/extract_entities through to the FIRST self.beam.remember() call. Drop the wrapper's two inline extract blocks (~25 lines) so all extraction routes through BeamMemory's canonical helpers (_extract_and_store_entities, _extract_and_store_facts) plus NAI-5's _ingest_graph_and_veracity. The wrapper is now a thin shim around Beam for extraction concerns; its only remaining job is the legacy `memories` dual-write. Why first call, not second: extraction must run once. Passing flags to the first call means the dedup-only second call doesn't re-run _extract_and_store_facts and double-write the facts table. 2. fact_recall row access (mnemosyne/core/beam.py:2160-2174) sqlite3.Row supports bracket access but NOT .get(); the existing code called row.get("confidence", 0.5) etc. and crashed AttributeError the moment the facts table contained rows. The bug was latent because the wrapper never populated the facts table (the C12.a bug above masked it). Once C12.a is fixed and rows exist, fact_recall would crash on the first iteration. Fix: convert raw_row to dict at the top of the loop so the .get(...) calls work as the author intended. Also tighten the None handling for confidence (was using `or 0.5` which would misroute confidence=0.0 as 0.5). NEW REGRESSION TESTS tests/test_extract_parity.py — 5 tests: * test_wrapper_extract_writes_facts_table — RED before fix * test_wrapper_extract_still_writes_triples_table — regression guard * test_wrapper_extracted_fact_is_visible_via_fact_recall — end-to-end contract; required BOTH the wrapper fix AND the fact_recall fix to pass * test_wrapper_and_direct_paths_produce_same_table_state — locks in the parity contract v2 plan §C12.a calls out * test_wrapper_extract_entities_writes_mention_triples — adjacent parity check for the entity-extraction path which Option A also delegates to BeamMemory DEFERRED - C12.b (provider REMEMBER_SCHEMA missing extract / bank / metadata / author_* / channel_*) — separate ledger item. - C14 (public-recall surfaces fact-derived hits with provenance) — separate ledger item; partially relieved by NAI-5's fact voice in recall scoring. VERIFICATION tests/test_extract_parity.py: 5 passed (new file, 3 RED before fix, 1 RED before fact_recall fix) broader sweep (extraction + entities + beam): 119 passed full suite excluding LLM-dependent tests: 469 passed Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nt beam call, normalize row access CROSS-MODEL ADVERSARIAL REVIEW Both Claude and Codex flagged the same primary regression: my initial fix moved extraction inside BeamMemory.remember, which has an early-return on duplicate content (beam.py:942-963). The early-return runs UPDATE + _ingest_graph_and_veracity and returns BEFORE the extract blocks at lines 985-988. So `mem.remember(same_content, extract=True)` after a duplicate silently skipped extraction entirely — breaking the backfill scenario this PR was supposed to enable. Both reviewers also flagged a related issue: the wrapper made TWO beam.remember calls per Mnemosyne.remember, both of which run _ingest_graph_and_veracity. The second call duplicated graph_edges (PK is autoincrement, not content-derived) and bumped mention_count for what was a single user-level write. CHANGES 1. Dedup branch runs extraction (CRITICAL — addresses Codex finding AxDSan#1) mnemosyne/core/beam.py:961-970 — when _find_duplicate fires, also call _extract_and_store_entities and _extract_and_store_facts before returning. Extract=True now means "extraction always runs" regardless of insert vs dedup, matching the wrapper's pre-fix inline behavior. 2. Drop redundant second beam.remember call (addresses Claude finding AxDSan#1 and Codex finding AxDSan#3) mnemosyne/core/memory.py — the wrapper used to call beam.remember twice per remember (once for ID generation, once for "ID-sync dedup"). Now that the first call accepts extract flags and produces the working memory row with the right ID, the second call only runs the dedup branch redundantly + double-fires _ingest_graph_and_veracity. Removed entirely. Fewer surprises, fewer duplicate graph edges, fewer mention_count inflations. 3. Normalize fact_recall row access (addresses Claude finding AxDSan#4) mnemosyne/core/beam.py:2160-2178 — replaced inconsistent `or ""` collapse with the same `is not None` pattern used for confidence. Also extracted intermediate locals so each row.get is computed once. NEW TEST tests/test_extract_parity.py::test_extract_runs_on_dedup_for_backfill Locks in the dedup-extraction contract by exactly the regression scenario Codex found: pre-existing working_memory row (extract=False), then mem.remember(same_content, extract=True) — asserts dedup fires AND facts table populates. Without the fix this would fail silently. DEFERRED (per /review notes) - C12.a.cognee filed in ledger: cognee.py importer has the same latent sqlite3.Row.get() bug the C12.a fix surfaced for fact_recall, currently masked by `except Exception: pass`. Separate small follow-up PR. - NaN confidence handling — edge case, low value. - Double-call execution count nit (perf) — minor. VERIFICATION tests/test_extract_parity.py: 6 passed (was 5, +1 backfill test) full suite excluding LLM-dependent tests: 470 passed Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

THE BUG mnemosyne/core/importers/cognee.py:106-109 set `conn.row_factory = sqlite3.Row` and then called `row.get("id", "")` etc. on each row. sqlite3.Row supports bracket access but NOT `.get()`, so each iteration raised AttributeError. The surrounding `except Exception: pass` swallowed it silently, so direct cognee imports always returned `[]` regardless of how much data was in the data_chunks table. Same pattern as the latent fact_recall row.get bug fixed in PR #58 (C12.a). Both Claude and Codex adversarial reviewers found this during /review for #58 and recommended a separate small PR. THE FIX Convert raw_row to dict at the top of the per-row loop so the existing .get() calls work as the author intended. Also normalized the metadata fallbacks to use the same `or ""` pattern (consistent with how content falls back text -> content -> ""). REGRESSION TESTS tests/test_importers/test_cognee.py — 3 tests: * test_extract_direct_returns_rows_from_data_chunks (RED before fix) * test_extract_direct_handles_null_text_falls_back_to_content (RED before fix) * test_extract_direct_returns_empty_when_db_missing (defensive contract, GREEN both before and after — locks in the existing missing-DB behavior) VERIFICATION tests/test_importers/test_cognee.py: 3 passed full suite excluding LLM-dependent tests: 467 passed LEDGER .hermes/ledger/memory-contract.md C12.a.cognee row will move to PR_OPEN when this lands. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

kohai-ut and others added 2 commits May 9, 2026 16:32

kohai-ut mentioned this pull request May 9, 2026

fix(C12.a.cognee): convert sqlite3.Row to dict in cognee direct importer #59

Merged

3 tasks

AxDSan merged commit a158ab2 into AxDSan:main May 11, 2026
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(C12.a): wrapper-extracted facts now populate the facts table + dedup runs extraction#58

fix(C12.a): wrapper-extracted facts now populate the facts table + dedup runs extraction#58
AxDSan merged 2 commits into
AxDSan:mainfrom
kohai-ut:fix/c12a-extract-facts-table

kohai-ut commented May 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

kohai-ut commented May 9, 2026

Summary

The bug

Why it slipped through

The fix (commit 1: ab4a81b)

Pre-landing review (commit 2: 85db0dd)

New backfill regression test

Behavior change worth flagging

Deferred (filed separately)

Test plan

Verification

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

The fix (commit 1: `ab4a81b`)

Pre-landing review (commit 2: `85db0dd`)