fix: cascade-clean orphaned vec_episodes rows on import_from_dict force-overwrite #87
Merged
Conversation
fix: cascade-clean orphaned vec_episodes rows on import_from_dict force-overwrite

`vec_episodes` is a sqlite-vec virtual table keyed by `episodic_memory.rowid` (AUTOINCREMENT INTEGER PK). When `BeamMemory.import_from_dict(force=True)` DELETEs an existing episodic_memory row to replace it, the new row gets a fresh rowid via `cursor.lastrowid` — leaving the old row's vector embedding stranded in vec_episodes, pointing at a rowid that AUTOINCREMENT will never re-issue. Operators with high import churn (backup/restore cycles, multi-source imports, periodic sync) accumulate dead vec_episodes entries indefinitely.

Bug originally surfaced by the Codex adversarial /review on PR AxDSan#84 as a deferred follow-up — not E2.a.5's bug, but a sibling storage-hygiene concern in the same code area.

Fix:
- In `import_from_dict`'s episodic_memory loop, when an existing row is found and `force=True`, run `DELETE FROM vec_episodes WHERE rowid = existing["rowid"]` BEFORE deleting the episodic_memory row.
- Guarded by `_vec_available(self.conn)` — sqlite-vec is optional.
- Hoisted `vec_ok = _vec_available(self.conn)` from before the embeddings-import loop to before the episodic_memory loop so both code paths share one check.
- Broad `except sqlite3.Error` (not just OperationalError) catches all sqlite3 exception subclasses with logging — `working_memory` was already committed, so propagating any cleanup error would abort the import mid-loop, leaving partial state. Best-effort cleanup: log + continue. Data integrity > orphan cleanup.

The /review army (Codex structured GATE FAIL + Claude adversarial CRITICAL/HIGH) caught two issues in commit 1's initial test:

CRITICAL (2-source: Codex P2 + Claude C1) — the "best-effort failure" test dropped vec_episodes before calling import, but the production `vec_ok = _vec_available()` check then returned False and the cascade was SKIPPED entirely. The try/except path was never exercised — the test would have passed even with the try/except removed. Fixed by monkey-patching `_vec_available` to return True even after the table is dropped, so the cascade runs, the DELETE fails, and the try/except is actually exercised.

HIGH (Claude H1) — every other test used `"episodic_embeddings": []`. The real-world payload carries embeddings, and the interesting case is: (a) the cascade cleans the old vec entry, (b) the INSERT assigns a new rowid, (c) the embeddings section maps old-payload-rowid → new rowid → reinserts. Added `test_import_from_dict_force_with_new_embeddings_no_orphan` to verify the full round-trip produces exactly 1 episodic_memory row + 1 vec_episodes row with vec_episodes.rowid == episodic_memory.rowid.

HIGH (Claude H2) — `except sqlite3.OperationalError` was too narrow. Other sqlite3.Error subclasses (DatabaseError, NotSupportedError, IntegrityError from a corrupted vec0 shadow, etc.) would propagate and abort the import mid-loop with working_memory already committed — partial state. Broadened to `except sqlite3.Error` with a WARNING log so operators can see when the cleanup failed.

Out of scope (documented, separate tickets):
- `Mnemosyne.forget()` doesn't cascade to episodic_memory at all (C17 in ledger) — pre-existing, different concern
- `memory_embeddings` fallback table gets stale content on force-overwrite — different orphan shape, separate concern
- `binary_vector` column NULL after force-INSERT — pre-existing
- Concurrent `import_from_dict` race — `import_from_dict` is not advertised as concurrent-safe; out of scope

5 new tests in `tests/test_orphan_vec_episodes_cleanup.py`:
- test_import_from_dict_force_cleans_vec_episodes_orphan
- test_import_from_dict_no_force_does_not_touch_vec_episodes
- test_import_from_dict_force_idempotent_no_orphan_accumulation
- test_import_from_dict_cleanup_failure_is_best_effort
- test_import_from_dict_force_with_new_embeddings_no_orphan

29 tests pass (5 new + 24 existing import/integration/E4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
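The stranding mechanism is easy to reproduce with plain `sqlite3`, no sqlite-vec required. In the sketch below an ordinary table stands in for the vec0 virtual table, and the schema and column names are illustrative, not the project's actual DDL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE episodic_memory "
    "(id INTEGER PRIMARY KEY AUTOINCREMENT, key TEXT, body TEXT)"
)
# Stand-in for the sqlite-vec virtual table, keyed by episodic_memory's rowid.
conn.execute("CREATE TABLE vec_episodes (rowid INTEGER PRIMARY KEY, embedding BLOB)")

cur = conn.execute("INSERT INTO episodic_memory (key, body) VALUES ('e1', 'old')")
old_rowid = cur.lastrowid
conn.execute("INSERT INTO vec_episodes (rowid, embedding) VALUES (?, x'00')", (old_rowid,))

# Force-overwrite without the cascade: DELETE + fresh INSERT yields a new
# rowid, because AUTOINCREMENT never re-issues a previously used rowid.
conn.execute("DELETE FROM episodic_memory WHERE key = 'e1'")
new_rowid = conn.execute(
    "INSERT INTO episodic_memory (key, body) VALUES ('e1', 'new')"
).lastrowid
assert new_rowid != old_rowid

# The old embedding row is now permanently orphaned.
orphans = conn.execute(
    "SELECT COUNT(*) FROM vec_episodes "
    "WHERE rowid NOT IN (SELECT id FROM episodic_memory)"
).fetchone()[0]
print(orphans)  # 1
```

Each force-overwrite of the same logical key repeats this, which is why high-churn import cycles accumulate orphans without bound.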
TL;DR
`vec_episodes` is a sqlite-vec virtual table keyed by `episodic_memory.rowid` (AUTOINCREMENT). When `BeamMemory.import_from_dict(force=True)` DELETEs an existing episodic_memory row to replace it, the new row gets a fresh rowid via `cursor.lastrowid` — leaving the old row's vector embedding stranded in vec_episodes, pointing at a rowid that AUTOINCREMENT will never re-issue. Operators with high import churn accumulate dead vec_episodes entries indefinitely.

This PR adds a cascade `DELETE FROM vec_episodes WHERE rowid = existing["rowid"]` BEFORE the episodic_memory DELETE, plus broadens the cleanup exception catch to `sqlite3.Error` so a cleanup failure doesn't abort the import mid-loop. 5 regression tests, all passing. Originally flagged by the Codex adversarial review on PR #84 as a deferred follow-up.
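The corrected ordering can be sketched with plain `sqlite3` (an ordinary table stands in for the vec0 virtual table; schema and names are illustrative): the vec row is removed before the episodic_memory DELETE, and the embedding is re-inserted under the fresh rowid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE episodic_memory (id INTEGER PRIMARY KEY AUTOINCREMENT, key TEXT)")
conn.execute("CREATE TABLE vec_episodes (rowid INTEGER PRIMARY KEY, embedding BLOB)")
old = conn.execute("INSERT INTO episodic_memory (key) VALUES ('e1')").lastrowid
conn.execute("INSERT INTO vec_episodes VALUES (?, x'00')", (old,))

# Force-overwrite path with the cascade:
conn.execute("DELETE FROM vec_episodes WHERE rowid = ?", (old,))  # cascade FIRST
conn.execute("DELETE FROM episodic_memory WHERE key = 'e1'")
new = conn.execute("INSERT INTO episodic_memory (key) VALUES ('e1')").lastrowid
conn.execute("INSERT INTO vec_episodes VALUES (?, x'01')", (new,))  # remapped embedding

# Exactly one aligned pair remains: vec_episodes.rowid == episodic_memory.id.
rows = conn.execute(
    "SELECT e.id, v.rowid FROM episodic_memory e "
    "JOIN vec_episodes v ON v.rowid = e.id"
).fetchall()
print(rows)
```

This is the invariant the new round-trip test checks: one episodic_memory row, one vec_episodes row, equal rowids.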
Why this matters
Storage hygiene: long-running deployments that do regular import/export cycles (backups, multi-source sync) accumulate orphaned vec_episodes rows that never get cleaned. Each entry is ~768 bytes (float32) or ~48 bytes (binary quantized) — small per row but unbounded growth at scale.
This is the only DELETE FROM episodic_memory site in production code (verified via grep).
`Mnemosyne.forget()` doesn't currently cascade to episodic_memory at all — that's the separate C17 concern in the ledger.

What this PR does
Single commit:
- `import_from_dict` cascade-cleans `vec_episodes` before the episodic_memory DELETE on the force-overwrite path
- Guarded by `_vec_available(self.conn)` — sqlite-vec is optional
- Hoisted `vec_ok = _vec_available(self.conn)` from before the embeddings-import loop to before the episodic_memory loop — both code paths now share one check
- `except sqlite3.Error` (not narrow `OperationalError`) catches all sqlite3 exception subclasses with a WARNING log — `working_memory` was already committed at line 3978, so propagating a cleanup error would abort the import mid-loop with partial state

/review army findings (all addressed)
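A minimal sketch of the guarded, best-effort cleanup described above. The function name, signature, and return values are illustrative, not the project's actual internals; only the control flow (availability guard, broad `sqlite3.Error` catch, WARNING log, no re-raise) mirrors the PR.

```python
import logging
import sqlite3

log = logging.getLogger("beam.import")  # hypothetical logger name

def cascade_clean_vec_episode(conn, old_rowid, vec_ok):
    """Delete the stale vec_episodes row before the episodic_memory DELETE.

    Best-effort: any sqlite3.Error is logged and swallowed, because
    working_memory is already committed at this point and propagating
    would abort the import mid-loop, leaving partial state.
    """
    if not vec_ok:  # sqlite-vec is optional; skip the cascade when absent
        return False
    try:
        conn.execute("DELETE FROM vec_episodes WHERE rowid = ?", (old_rowid,))
        return True
    except sqlite3.Error as cleanup_exc:  # broad on purpose (H2)
        log.warning("vec_episodes cleanup failed for rowid %s: %s",
                    old_rowid, cleanup_exc)
        return False
```

Catching `sqlite3.Error` rather than `OperationalError` matters because `IntegrityError`, `DatabaseError`, and `NotSupportedError` are siblings of `OperationalError`, not subclasses of it.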
- CRITICAL (Codex P2 + Claude C1): the initial best-effort test dropped `vec_episodes` before calling import, causing a `vec_ok=False` short-circuit instead of exercising the try/except
- Fixed by monkey-patching `_vec_available` to return True after dropping the table, so the cascade runs and the DELETE fails, exercising the try/except
- HIGH (Claude H1): no test exercised a non-empty `episodic_embeddings` payload — every other test used `[]`, leaving the `old_to_new_rowid` mapping path uncovered for the force-overwrite case; added `test_import_from_dict_force_with_new_embeddings_no_orphan`
- HIGH (Claude H2): `except sqlite3.OperationalError` was too narrow — other sqlite3 exception subclasses propagate and abort the import mid-loop; broadened to `except sqlite3.Error as cleanup_exc` with a WARNING log

What is NOT in this PR (intentional, documented)
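The CRITICAL finding is worth a toy illustration: with the table dropped, an honest availability check returns False and the except branch never runs, so the original test was green regardless. The names below mirror the description, but the code is a sketch, not the repo's actual test.

```python
import sqlite3
from unittest import mock

def _vec_available(conn):
    """Toy availability check: does a vec_episodes table exist?"""
    row = conn.execute(
        "SELECT name FROM sqlite_master WHERE name = 'vec_episodes'"
    ).fetchone()
    return row is not None

def cleanup(conn, rowid, vec_available=_vec_available):
    if not vec_available(conn):
        return "skipped"          # short-circuit that hid the bug in the old test
    try:
        conn.execute("DELETE FROM vec_episodes WHERE rowid = ?", (rowid,))
        return "cleaned"
    except sqlite3.Error:
        return "best-effort"      # the branch the fixed test must reach

conn = sqlite3.connect(":memory:")  # vec_episodes never created ~ dropped table
print(cleanup(conn, 1))             # skipped: try/except never exercised
# Monkey-patched check forces the cascade to run so the DELETE actually fails:
print(cleanup(conn, 1, vec_available=mock.Mock(return_value=True)))  # best-effort
```

Only the monkey-patched call reaches the except branch, which is exactly what the fixed `test_import_from_dict_cleanup_failure_is_best_effort` arranges.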
- `Mnemosyne.forget()` doesn't cascade to episodic_memory at all (C17 in ledger) — pre-existing, different concern
- `memory_embeddings` fallback table gets stale content on force-overwrite — different orphan shape, separate concern
- `binary_vector` column NULL after force-INSERT — pre-existing, not affected by this PR
- Concurrent `import_from_dict` race — `import_from_dict` is not advertised as concurrent-safe; out of scope

Test plan
- `tests/test_orphan_vec_episodes_cleanup.py`: pass
- `tests/test_integration.py`: pass
- `test_beam.py -k "import or export"` run: green

🤖 Generated with Claude Code