Skip to content

fix(C12.a): wrapper-extracted facts now populate the facts table + dedup runs extraction#58

Merged
AxDSan merged 2 commits into
AxDSan:mainfrom
kohai-ut:fix/c12a-extract-facts-table
May 11, 2026
Merged

fix(C12.a): wrapper-extracted facts now populate the facts table + dedup runs extraction#58
AxDSan merged 2 commits into
AxDSan:mainfrom
kohai-ut:fix/c12a-extract-facts-table

Conversation

@kohai-ut
Copy link
Copy Markdown
Contributor

@kohai-ut kohai-ut commented May 9, 2026

Summary

  • Mnemosyne.remember(extract=True) now writes to the facts table the same way BeamMemory.remember(extract=True) already did. Previously the wrapper reimplemented half of beam's canonical _extract_and_store_facts helper inline, calling triples.add_facts(...) but never _store_facts_in_table(...). Wrapper-extracted facts were visible through recall() (which scores fact triples) but invisible through fact_recall() (which queries the facts table).
  • Three bundled fixes:
    1. Wrapper delegates extraction to BeamMemory.remember (drops ~25 lines of duplicated logic).
    2. BeamMemory.remember's dedup early-return now also runs extraction so backfill calls work (mem.remember(same_content, extract=True) on an already-existing row populates the facts table).
    3. fact_recall was crashing on sqlite3.Row.get(...) once rows existed — latent bug, masked while the wrapper kept the table empty. Converted to dict at the top of the per-row loop. Also normalized the None-handling for confidence/subject/predicate.
  • 6 regression tests in tests/test_extract_parity.py: 3 RED before the wrapper fix, 1 RED before the fact_recall fix, 1 RED before the dedup-extract fix.

The bug

Mnemosyne stores knowledge graph facts in two places:

Storage Used by
triples table recall() — scores fact triples as part of NAI-5's "fact voice"
facts table fact_recall() — public API for structured fact retrieval

The canonical helper _extract_and_store_facts in mnemosyne/core/beam.py:597 writes to BOTH:

def _extract_and_store_facts(beam, memory_id, content, source=""):
    facts = extract_facts_safe(content)
    if not facts:
        return
    triples = TripleStore(db_path=beam.db_path)
    triples.add_facts(memory_id, facts, source=source, confidence=0.7)
    _store_facts_in_table(beam, memory_id, content, source, facts)

Mnemosyne.remember(extract=True) reimplemented half of this inline at memory.py:299-309:

if extract:
    try:
        from mnemosyne.core.extraction import extract_facts_safe
        from mnemosyne.core.triples import TripleStore
        facts = extract_facts_safe(content)
        if facts:
            triples = TripleStore(db_path=self.db_path)
            triples.add_facts(memory_id, facts, source=source, confidence=0.7)
                                 # ← never called _store_facts_in_table
    except Exception:
        pass

Result: a memory remembered through the wrapper showed up in recall() (because triples are populated and NAI-5 scores them) but fact_recall() returned nothing for the same memory.

Why it slipped through

Same anti-pattern as C26 / C23 / C20: tests called BeamMemory.remember(extract=True) directly with db_path=tmp and never went through Mnemosyne.remember(extract=True). The wrapper's parallel extract path had zero coverage. tests/test_extraction_integration.py exists but it manually injects facts via TripleStore.add_facts(...) rather than exercising extract=True end-to-end through the wrapper.

The fix (commit 1: ab4a81b)

Mnemosyne.remember (memory.py) — pass extract=extract, extract_entities=extract_entities to the first self.beam.remember(...) call. Drop the wrapper's two inline extract blocks (~25 lines). All extraction now routes through BeamMemory's canonical helpers (_extract_and_store_entities, _extract_and_store_facts) plus NAI-5's _ingest_graph_and_veracity. The wrapper becomes a thin shim — its only remaining job is the legacy memories dual-write.

fact_recall row access (beam.py:2160-2174)sqlite3.Row supports bracket access but NOT .get(). The existing code called row.get("confidence", 0.5) etc. and crashed AttributeError the moment the facts table contained rows. The bug was unreachable while the wrapper bug above kept the table empty. Once C12.a was fixed and rows existed, fact_recall would crash on the first iteration. Convert to dict at the top of the loop so .get(...) works as the author intended.

Pre-landing review (commit 2: 85db0dd)

Cross-model adversarial review (Claude subagent + Codex CLI). Both flagged the same primary regression introduced by commit 1:

BeamMemory.remember has an early-return on duplicate content (beam.py:942-963). The early-return runs UPDATE + _ingest_graph_and_veracity and returns BEFORE the extract blocks at lines 985-988. So mem.remember(same_content, extract=True) after a duplicate silently skipped extraction entirely — breaking backfill on pre-existing databases.

Both also flagged a related issue: the wrapper made TWO beam.remember calls per Mnemosyne.remember, both of which run _ingest_graph_and_veracity. The second call duplicated graph_edges (autoincrement PK) and bumped mention_count for what was a single user-level write.

Fixes applied in commit 2:

  1. Dedup branch runs extraction (CRITICAL — addresses Codex's primary finding). mnemosyne/core/beam.py — when _find_duplicate fires, the dedup branch now also calls _extract_and_store_entities and _extract_and_store_facts before returning. extract=True now means "extraction always runs" regardless of insert vs dedup, matching the wrapper's pre-fix inline behavior.

  2. Drop redundant second beam.remember call. The wrapper used to call beam.remember twice per remember (once for ID generation, once for "ID-sync dedup"). Now that the first call accepts extract flags and produces the working_memory row with the right ID, the second call only ran the dedup branch redundantly + double-fired _ingest_graph_and_veracity. Removed entirely.

  3. Normalize fact_recall row access. Replaced inconsistent or "" collapse with the same is not None pattern used for confidence. Extracted intermediate locals so each row.get is computed once.

New backfill regression test

tests/test_extract_parity.py::test_extract_runs_on_dedup_for_backfill locks in the dedup-extraction contract by exactly the regression scenario the adversarial review found: pre-existing working_memory row (from BeamMemory.remember(extract=False)), then mem.remember(same_content, extract=True) — asserts dedup fires AND facts table populates. Without commit 2's fix this would fail silently; with the fix it's GREEN.

Behavior change worth flagging

Pre-fix: Mnemosyne.remember(extract=True) with already-existing working_memory content → wrapper's inline extract block ran AFTER both beam.remember calls regardless of dedup. Triples populated; facts table never.

Post-fix: Mnemosyne.remember(extract=True) with already-existing content → first beam.remember hits dedup, runs extraction (including facts table). Single user-level call → single set of extraction effects.

This is the intended behavior for backfill scenarios. Anyone who depended on "extract=True is a no-op on duplicate content" (no callers found) would see facts/triples now actually appear.

Deferred (filed separately)

  • C12.a.cognee (added to .hermes/ledger/memory-contract.md locally): mnemosyne/core/importers/cognee.py:106-109 has the same latent sqlite3.Row.get() pattern as fact_recall. Currently masked by except Exception: pass. Both adversarial reviewers flagged it independently. Small follow-up PR.
  • NaN confidence handling — edge case noted by Claude, not exercised by current writers.
  • Per-row dict.get perf nit — bounded by top_k, not a real concern.

Test plan

  • uv run pytest tests/test_extract_parity.py -q (6 passed locally)
  • uv run pytest -q --ignore=tests/test_local_llm.py --ignore=tests/test_llm_backends.py (470 passed locally)
  • Manual: mem.remember("Alice was born in Boston.", extract=True) then mem.beam.fact_recall("alice") returns the fact (was empty pre-fix).
  • Manual backfill: write a memory with extract=False, then re-remember the same content with extract=True. Verify the facts table populates after the second call.
  • Manual fact_recall under load: with several extract=True memories, fact_recall returns sensibly without AttributeError.

Verification

```
tests/test_extract_parity.py: 6 passed (new file)
broader sweep (extraction + entities + beam): 119 passed
full suite excluding LLM-dependent tests: 470 passed
```

kohai-ut and others added 2 commits May 9, 2026 16:32
…x latent fact_recall row.get crash

THE BUG (C12.a)
Mnemosyne.remember(extract=True) reimplemented half of beam's
_extract_and_store_facts helper inline — calling triples.add_facts
but never _store_facts_in_table. Result: wrapper-extracted facts
were visible through recall() (which scores fact triples) but
invisible through fact_recall() (which queries the facts table).

Same anti-pattern as C26 / C23 / C20: tests instantiated BeamMemory
directly and never exercised the wrapper extract path. Zero coverage
for Mnemosyne.remember(extract=True) on either the triples or facts
side.

CHANGES

1. Mnemosyne.remember (mnemosyne/core/memory.py)
   Pass extract/extract_entities through to the FIRST self.beam.remember()
   call. Drop the wrapper's two inline extract blocks (~25 lines) so all
   extraction routes through BeamMemory's canonical helpers
   (_extract_and_store_entities, _extract_and_store_facts) plus NAI-5's
   _ingest_graph_and_veracity. The wrapper is now a thin shim around
   Beam for extraction concerns; its only remaining job is the legacy
   `memories` dual-write.

   Why first call, not second: extraction must run once. Passing flags
   to the first call means the dedup-only second call doesn't re-run
   _extract_and_store_facts and double-write the facts table.

2. fact_recall row access (mnemosyne/core/beam.py:2160-2174)
   sqlite3.Row supports bracket access but NOT .get(); the existing
   code called row.get("confidence", 0.5) etc. and crashed
   AttributeError the moment the facts table contained rows. The bug
   was latent because the wrapper never populated the facts table
   (the C12.a bug above masked it). Once C12.a is fixed and rows
   exist, fact_recall would crash on the first iteration.

   Fix: convert raw_row to dict at the top of the loop so the
   .get(...) calls work as the author intended. Also tighten the
   None handling for confidence (was using `or 0.5` which would
   misroute confidence=0.0 as 0.5).

NEW REGRESSION TESTS

tests/test_extract_parity.py — 5 tests:
  * test_wrapper_extract_writes_facts_table — RED before fix
  * test_wrapper_extract_still_writes_triples_table — regression guard
  * test_wrapper_extracted_fact_is_visible_via_fact_recall — end-to-end
    contract; required BOTH the wrapper fix AND the fact_recall fix
    to pass
  * test_wrapper_and_direct_paths_produce_same_table_state — locks in
    the parity contract v2 plan §C12.a calls out
  * test_wrapper_extract_entities_writes_mention_triples — adjacent
    parity check for the entity-extraction path which Option A also
    delegates to BeamMemory

DEFERRED
- C12.b (provider REMEMBER_SCHEMA missing extract / bank / metadata /
  author_* / channel_*) — separate ledger item.
- C14 (public-recall surfaces fact-derived hits with provenance) —
  separate ledger item; partially relieved by NAI-5's fact voice
  in recall scoring.

VERIFICATION
tests/test_extract_parity.py: 5 passed (new file, 3 RED before fix,
1 RED before fact_recall fix)
broader sweep (extraction + entities + beam): 119 passed
full suite excluding LLM-dependent tests: 469 passed

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nt beam call, normalize row access

CROSS-MODEL ADVERSARIAL REVIEW
Both Claude and Codex flagged the same primary regression: my initial
fix moved extraction inside BeamMemory.remember, which has an early-return
on duplicate content (beam.py:942-963). The early-return runs UPDATE +
_ingest_graph_and_veracity and returns BEFORE the extract blocks at lines
985-988. So `mem.remember(same_content, extract=True)` after a duplicate
silently skipped extraction entirely — breaking the backfill scenario
this PR was supposed to enable.

Both reviewers also flagged a related issue: the wrapper made TWO
beam.remember calls per Mnemosyne.remember, both of which run
_ingest_graph_and_veracity. The second call duplicated graph_edges (PK
is autoincrement, not content-derived) and bumped mention_count for
what was a single user-level write.

CHANGES

1. Dedup branch runs extraction (CRITICAL — addresses Codex finding AxDSan#1)
   mnemosyne/core/beam.py:961-970 — when _find_duplicate fires, also
   call _extract_and_store_entities and _extract_and_store_facts before
   returning. Extract=True now means "extraction always runs" regardless
   of insert vs dedup, matching the wrapper's pre-fix inline behavior.

2. Drop redundant second beam.remember call (addresses Claude finding AxDSan#1
   and Codex finding AxDSan#3)
   mnemosyne/core/memory.py — the wrapper used to call beam.remember
   twice per remember (once for ID generation, once for "ID-sync dedup").
   Now that the first call accepts extract flags and produces the working
   memory row with the right ID, the second call only runs the dedup
   branch redundantly + double-fires _ingest_graph_and_veracity. Removed
   entirely. Fewer surprises, fewer duplicate graph edges, fewer
   mention_count inflations.

3. Normalize fact_recall row access (addresses Claude finding AxDSan#4)
   mnemosyne/core/beam.py:2160-2178 — replaced inconsistent `or ""`
   collapse with the same `is not None` pattern used for confidence.
   Also extracted intermediate locals so each row.get is computed once.

NEW TEST

tests/test_extract_parity.py::test_extract_runs_on_dedup_for_backfill
   Locks in the dedup-extraction contract by exactly the regression
   scenario Codex found: pre-existing working_memory row (extract=False),
   then mem.remember(same_content, extract=True) — asserts dedup fires
   AND facts table populates. Without the fix this would fail silently.

DEFERRED (per /review notes)
- C12.a.cognee filed in ledger: cognee.py importer has the same latent
  sqlite3.Row.get() bug the C12.a fix surfaced for fact_recall, currently
  masked by `except Exception: pass`. Separate small follow-up PR.
- NaN confidence handling — edge case, low value.
- Double-call execution count nit (perf) — minor.

VERIFICATION
tests/test_extract_parity.py: 6 passed (was 5, +1 backfill test)
full suite excluding LLM-dependent tests: 470 passed

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AxDSan pushed a commit that referenced this pull request May 11, 2026
THE BUG
mnemosyne/core/importers/cognee.py:106-109 set
`conn.row_factory = sqlite3.Row` and then called `row.get("id", "")`
etc. on each row. sqlite3.Row supports bracket access but NOT `.get()`,
so each iteration raised AttributeError. The surrounding
`except Exception: pass` swallowed it silently, so direct cognee
imports always returned `[]` regardless of how much data was in the
data_chunks table.

Same pattern as the latent fact_recall row.get bug fixed in PR #58
(C12.a). Both Claude and Codex adversarial reviewers found this
during /review for #58 and recommended a separate small PR.

THE FIX
Convert raw_row to dict at the top of the per-row loop so the existing
.get() calls work as the author intended. Also normalized the metadata
fallbacks to use the same `or ""` pattern (consistent with how content
falls back text -> content -> "").

REGRESSION TESTS
tests/test_importers/test_cognee.py — 3 tests:
  * test_extract_direct_returns_rows_from_data_chunks (RED before fix)
  * test_extract_direct_handles_null_text_falls_back_to_content (RED before fix)
  * test_extract_direct_returns_empty_when_db_missing (defensive contract,
    GREEN both before and after — locks in the existing missing-DB behavior)

VERIFICATION
tests/test_importers/test_cognee.py: 3 passed
full suite excluding LLM-dependent tests: 467 passed

LEDGER
.hermes/ledger/memory-contract.md C12.a.cognee row will move to PR_OPEN
when this lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@AxDSan AxDSan merged commit a158ab2 into AxDSan:main May 11, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants