
fix: cascade-clean orphaned vec_episodes rows on import_from_dict force-overwrite#87

Merged

AxDSan merged 1 commit into AxDSan:main from kohai-ut:fix/orphan-vec-episodes-cleanup on May 12, 2026

Conversation

@kohai-ut
Contributor

TL;DR

vec_episodes is a sqlite-vec virtual table keyed by episodic_memory.rowid (AUTOINCREMENT). When BeamMemory.import_from_dict(force=True) DELETEs an existing episodic_memory row to replace it, the new row gets a fresh rowid via cursor.lastrowid — leaving the old row's vector embedding stranded in vec_episodes pointing at a rowid that AUTOINCREMENT will never re-issue. Operators with high import churn accumulate dead vec_episodes entries indefinitely.

This PR adds a cascade DELETE FROM vec_episodes WHERE rowid = existing["rowid"] BEFORE the episodic_memory DELETE, plus broadens the cleanup exception catch to sqlite3.Error so a cleanup failure doesn't abort the import mid-loop.
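The orphan mechanism described above can be reproduced in a few lines. This is a minimal, self-contained sketch: plain sqlite3 tables stand in for the real schema (the actual `vec_episodes` is a sqlite-vec virtual table, not needed here), and the column names are illustrative, not BeamMemory's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE episodic_memory "
             "(id INTEGER PRIMARY KEY AUTOINCREMENT, uuid TEXT UNIQUE, content TEXT)")
conn.execute("CREATE TABLE vec_episodes (rowid INTEGER PRIMARY KEY, embedding BLOB)")

# Initial import: the episode gets rowid 1, its embedding is keyed to it.
cur = conn.execute("INSERT INTO episodic_memory (uuid, content) VALUES ('e1', 'v1')")
conn.execute("INSERT INTO vec_episodes (rowid, embedding) VALUES (?, x'00')",
             (cur.lastrowid,))

# Force re-import WITHOUT the cascade: DELETE + INSERT yields a fresh rowid.
conn.execute("DELETE FROM episodic_memory WHERE uuid = 'e1'")
cur = conn.execute("INSERT INTO episodic_memory (uuid, content) VALUES ('e1', 'v2')")
new_rowid = cur.lastrowid  # AUTOINCREMENT never re-issues 1, so this is 2

# The old embedding now points at a rowid no episode will ever have again.
orphans = conn.execute(
    "SELECT COUNT(*) FROM vec_episodes v "
    "LEFT JOIN episodic_memory e ON e.id = v.rowid WHERE e.id IS NULL"
).fetchone()[0]
print(new_rowid, orphans)  # 2 1
```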

5 regression tests, all passing. Originally flagged by Codex adversarial on PR #84 as a deferred follow-up.

Why this matters

Storage hygiene: long-running deployments that do regular import/export cycles (backups, multi-source sync) accumulate orphaned vec_episodes rows that never get cleaned. Each entry is ~768 bytes (float32) or ~48 bytes (binary quantized) — small per row but unbounded growth at scale.

This is the only DELETE FROM episodic_memory site in production code (verified via grep). Mnemosyne.forget() doesn't currently cascade to episodic_memory at all — that's the separate C17 concern in the ledger.

What this PR does

Single commit:

  • import_from_dict cascade-cleans vec_episodes before the episodic_memory DELETE on the force-overwrite path
  • Guarded by _vec_available(self.conn) — sqlite-vec is optional
  • Hoisted vec_ok = _vec_available(self.conn) from before the embeddings-import loop to before the episodic_memory loop — both code paths now share one check
  • Broad except sqlite3.Error (not the narrower OperationalError) catches every sqlite3 exception subclass and logs a WARNING — working_memory was already committed at line 3978, so propagating a cleanup error would abort the import mid-loop with partial state

/review army findings (all addressed)

| Finding | Sources | Fix |
| --- | --- | --- |
| Initial "best-effort failure" test dropped vec_episodes before calling import, causing a vec_ok=False short-circuit instead of exercising the try/except | 2-source: Codex structured P2 GATE FAIL + Claude C1 CRITICAL | Test now monkey-patches _vec_available to return True after dropping the table, so the cascade runs and the DELETE fails, exercising the try/except |
| Missing end-to-end test with a non-empty episodic_embeddings payload — every other test used [], leaving the old_to_new_rowid mapping path uncovered for the force-overwrite case | Claude H1 | Added test_import_from_dict_force_with_new_embeddings_no_orphan |
| except sqlite3.OperationalError too narrow — other sqlite3 exception subclasses propagate and abort the import mid-loop | Claude H2 | Broadened to except sqlite3.Error as cleanup_exc with a WARNING log |
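The first finding can be shown in miniature: with `vec_episodes` dropped, a real availability check would return False and skip the cascade entirely, so the fixed test forces the check to True; the cascade DELETE then hits the missing table and raises, which is exactly the failure the try/except must absorb. Plain sqlite3 sketch, table name per the PR:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vec_episodes (rowid INTEGER PRIMARY KEY, embedding BLOB)")
conn.execute("DROP TABLE vec_episodes")  # simulate the test's dropped table

try:
    # With _vec_available monkey-patched to True, this line now runs.
    conn.execute("DELETE FROM vec_episodes WHERE rowid = 1")
    raised = None
except sqlite3.Error as exc:
    raised = type(exc).__name__
print(raised)  # OperationalError ("no such table"), a sqlite3.Error subclass
```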

What is NOT in this PR (intentional, documented)

  • Mnemosyne.forget() doesn't cascade to episodic_memory at all (C17 in ledger) — pre-existing different concern
  • memory_embeddings fallback table gets stale content on force-overwrite — different orphan shape, separate concern
  • binary_vector column NULL after force-INSERT — pre-existing, not affected by this PR
  • Concurrent import_from_dict race — import_from_dict is not advertised as concurrent-safe; out of scope

Test plan

  • All 5 tests in tests/test_orphan_vec_episodes_cleanup.py pass
  • All 20 tests in tests/test_integration.py pass
  • Targeted test_beam.py -k "import or export" run: green
  • Codex structured /review: GATE PASS after commit 1's fix
  • Claude adversarial /review: CRITICAL + 2 HIGH (all addressed)
  • CI: full suite green on Python 3.9 / 3.10 / 3.11 / 3.12

🤖 Generated with Claude Code

fix: cascade-clean orphaned vec_episodes rows on import_from_dict force-overwrite

`vec_episodes` is a sqlite-vec virtual table keyed by
`episodic_memory.rowid` (AUTOINCREMENT INTEGER PK). When
`BeamMemory.import_from_dict(force=True)` DELETEs an existing
episodic_memory row to replace it, the new row gets a fresh rowid
via `cursor.lastrowid` — leaving the old row's vector embedding
stranded in vec_episodes pointing at a rowid that AUTOINCREMENT
will never re-issue. Operators with high import churn (backup/
restore cycles, multi-source imports, periodic sync) accumulate
dead vec_episodes entries indefinitely.

Bug originally surfaced by Codex adversarial /review on PR AxDSan#84
as a deferred follow-up — not E2.a.5's bug, but a sibling
storage-hygiene concern in the same code area.

Fix:
- In `import_from_dict`'s episodic_memory loop, when an existing
  row is found and `force=True`, DELETE FROM vec_episodes
  WHERE rowid = existing["rowid"] BEFORE deleting the
  episodic_memory row.
- Guarded by `_vec_available(self.conn)` — sqlite-vec is optional.
- Hoisted `vec_ok = _vec_available(self.conn)` from before the
  embeddings-import loop to before the episodic_memory loop so
  both code paths share one check.
- Broad `except sqlite3.Error` (not just OperationalError) catches
  all sqlite3 exception subclasses with logging — `working_memory`
  was already committed, so propagating any cleanup error would
  abort the import mid-loop leaving partial state. Best-effort
  cleanup: log + continue. Data integrity > orphan cleanup.

/review army (Codex structured GATE FAIL + Claude adversarial
CRITICAL/HIGH) caught two issues in commit 1's initial test:

CRITICAL (2-source: Codex P2 + Claude C1) — the "best-effort
failure" test dropped vec_episodes before calling import, but the
production `vec_ok = _vec_available()` check then returned False
and the cascade was SKIPPED entirely. The try/except path was
never exercised — the test would have passed even with the
try/except removed. Fixed by monkey-patching `_vec_available` to
return True even after the table is dropped, so the cascade runs,
the DELETE fails, and the try/except is actually exercised.

HIGH (Claude H1) — every other test used `"episodic_embeddings":
[]`. The real-world payload carries embeddings, and the
interesting case is: (a) cascade cleans old vec entry, (b)
INSERT assigns new rowid, (c) embeddings section maps old-payload-
rowid → new-rowid → reinserts. Added
`test_import_from_dict_force_with_new_embeddings_no_orphan` to
verify the full round-trip produces exactly 1 episodic_memory
row + 1 vec_episodes row with vec_episodes.rowid ==
episodic_memory.rowid.
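The (a)→(b)→(c) round-trip the H1 test covers can be sketched as follows. A minimal sketch: the schema and the `old_to_new_rowid` dict follow the PR's description of how `import_from_dict` rekeys the embeddings payload, not the actual implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE episodic_memory "
             "(id INTEGER PRIMARY KEY AUTOINCREMENT, uuid TEXT)")
conn.execute("CREATE TABLE vec_episodes (rowid INTEGER PRIMARY KEY, embedding BLOB)")

# Existing state: one episode at rowid 1 with its embedding.
conn.execute("INSERT INTO episodic_memory (uuid) VALUES ('e1')")
conn.execute("INSERT INTO vec_episodes VALUES (1, x'aa')")

old_to_new_rowid = {}
# (a) cascade cleans the old vec entry, then the episodic row is replaced
conn.execute("DELETE FROM vec_episodes WHERE rowid = 1")
conn.execute("DELETE FROM episodic_memory WHERE id = 1")
# (b) the INSERT assigns a fresh rowid; record the mapping
cur = conn.execute("INSERT INTO episodic_memory (uuid) VALUES ('e1')")
old_to_new_rowid[1] = cur.lastrowid
# (c) the embeddings section reinserts under the remapped rowid
for old_rowid, blob in [(1, b"\xaa")]:  # payload keyed by old rowids
    conn.execute("INSERT INTO vec_episodes VALUES (?, ?)",
                 (old_to_new_rowid[old_rowid], blob))

rows = conn.execute("SELECT rowid FROM vec_episodes").fetchall()
ids = conn.execute("SELECT id FROM episodic_memory").fetchall()
print(rows, ids)  # exactly one row each, with matching rowid 2
```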

HIGH (Claude H2) — `except sqlite3.OperationalError` was too
narrow. Other sqlite3.Error subclasses (DatabaseError,
NotSupportedError, IntegrityError from a corrupted vec0 shadow,
etc.) would propagate and abort the import mid-loop with
working_memory already committed — partial state. Broadened to
`except sqlite3.Error` with a WARNING log so operators can see
when the cleanup failed.
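The hierarchy behind this fix is worth making explicit: in the standard library's sqlite3 module, OperationalError, IntegrityError, and NotSupportedError are siblings under DatabaseError, so an `except sqlite3.OperationalError` clause misses the other subclasses, while `except sqlite3.Error` (the common base) catches them all.

```python
import sqlite3

# OperationalError does NOT cover its siblings...
caught_by_operational = issubclass(sqlite3.IntegrityError, sqlite3.OperationalError)
# ...but sqlite3.Error is the base of every concrete sqlite3 exception.
caught_by_error = all(
    issubclass(exc, sqlite3.Error)
    for exc in (sqlite3.OperationalError, sqlite3.DatabaseError,
                sqlite3.IntegrityError, sqlite3.NotSupportedError)
)
print(caught_by_operational, caught_by_error)  # False True
```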

Out of scope (documented, separate tickets):
- `Mnemosyne.forget()` doesn't cascade to episodic_memory at all
  (C17 in ledger) — pre-existing different concern
- `memory_embeddings` fallback table gets stale content on force-
  overwrite — different orphan shape, separate concern
- `binary_vector` column NULL after force-INSERT — pre-existing
- Concurrent `import_from_dict` race — `import_from_dict` is not
  advertised as concurrent-safe; out of scope

5 new tests in `tests/test_orphan_vec_episodes_cleanup.py`:
- test_import_from_dict_force_cleans_vec_episodes_orphan
- test_import_from_dict_no_force_does_not_touch_vec_episodes
- test_import_from_dict_force_idempotent_no_orphan_accumulation
- test_import_from_dict_cleanup_failure_is_best_effort
- test_import_from_dict_force_with_new_embeddings_no_orphan

29 tests pass (5 new + 24 existing import/integration/E4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@AxDSan AxDSan merged commit 373e056 into AxDSan:main May 12, 2026
5 checks passed
@kohai-ut kohai-ut deleted the fix/orphan-vec-episodes-cleanup branch May 12, 2026 18:10