fix(core): purge SQLite search_index on project delete #832
SQLite stores search_index as an FTS5 virtual table, which can't carry a foreign key, so the ON DELETE CASCADE from search_index.project_id to project.id only applies on Postgres. On SQLite, deleting a project left its FTS rows behind — and when auto-increment handed the same id to a new project, the leftover rows masqueraded as the new tenant's data and leaked into searches scoped to that project.

- ProjectRepository.delete now explicitly purges search_index and search_vector_chunks for the project id in the same session before the ORM delete. Idempotent on Postgres (the cascade FK still runs).
- A one-time cleanup migration sweeps two leftover shapes: rows whose project_id is gone, and rows whose entity_id is gone (the larger class from id reuse). Guarded by table-existence checks so fresh SQLite installs — where search_index is created at runtime by init_search_index, not by Alembic — don't fail the upgrade.
- A regression test seeds both derived tables, calls remove_project, and asserts both come out clean. Verified red on pre-fix code.

Signed-off-by: phernandez <paul@basicmachines.co>
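The failure mode is easy to reproduce with plain sqlite3. A minimal sketch, with an ordinary table standing in for the FTS5 virtual table (the real search_index is created by init_search_index; the schema here is illustrative) — the point is only that no foreign key ties the rows to project.id:

```python
import sqlite3

# Plain table stands in for the FTS5 virtual table, which cannot carry an FK.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE search_index (title TEXT, project_id INTEGER);
""")
conn.execute("INSERT INTO project (name) VALUES ('tenant-a')")   # gets id 1
conn.execute("INSERT INTO search_index VALUES ('tenant-a secret note', 1)")

conn.execute("DELETE FROM project WHERE id = 1")                 # FTS rows survive
conn.execute("INSERT INTO project (name) VALUES ('tenant-b')")   # id 1 is reused

# A search scoped to the new project now surfaces the old tenant's rows.
leaked = conn.execute(
    "SELECT title FROM search_index WHERE project_id = "
    "(SELECT id FROM project WHERE name = 'tenant-b')"
).fetchall()
print(leaked)  # the orphaned tenant-a row masquerades as tenant-b data
```

Without AUTOINCREMENT, SQLite assigns max(rowid)+1, so deleting the only project and inserting a new one hands out id 1 again, which is exactly the reuse path described above.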
Claude finished @phernandez's task in 3m 25s
PR Review: fix(core): purge SQLite search_index on project delete
Summary

Solid, well-scoped fix. The root cause diagnosis is accurate (SQLite FTS5 can't carry FKs → orphan rows survive project deletion → auto-increment id reuse leaks prior tenant's content into new project). The fix is minimal, idempotent on Postgres, and documented in the literate-programming style required by CLAUDE.md. A few minor items worth addressing before merge.

Code Quality & Standards
The base class has:

```python
# Base class has:
logger.debug(f"Deleting {self.Model.__name__}: {entity_id}")
# ... delete ...
logger.debug(f"Deleted {self.Model.__name__}: {entity_id}")
# Override has: (nothing)
```

Testing & Documentation
```python
# project_repository.py:141-143
except NoResultFound:
    return False  # ← is this covered?
```

Basic Memory Architecture
Execution order is correct —

Migration

Security & Performance
Compatibility
Minor Observation:
|
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a85767ad92
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```python
    {"project_id": entity_id},
)
await session.execute(
    text("DELETE FROM search_vector_chunks WHERE project_id = :project_id"),
```
Guard vector-chunk delete when semantic tables are absent
ProjectRepository.delete now unconditionally executes DELETE FROM search_vector_chunks, but on SQLite that table is only created when semantic search is enabled (SQLiteSearchRepository.init_search_index calls _ensure_vector_tables() only under that flag). For installs where semantic dependencies are missing or disabled, deleting any project will raise no such table: search_vector_chunks and abort the removal flow. Add a table-existence/dialect guard (similar to the migration) before issuing this delete.
search_index is created lazily by SearchRepository.init_search_index, and search_vector_chunks only materializes once semantic search initializes, so either table may be absent on minimal test DBs. The previous version of ProjectRepository.delete unconditionally issued DELETEs against both and crashed CLI integration tests on Postgres and SQLite where search_vector_chunks hadn't been created yet:

    relation "search_vector_chunks" does not exist
    no such table: search_vector_chunks

Inspect the connection's tables once per call and only delete from whichever derived tables are present. Idempotent on every backend.

Signed-off-by: phernandez <paul@basicmachines.co>
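The guard this commit describes can be sketched with plain sqlite3. The real code inspects tables via SQLAlchemy's inspector inside session.run_sync; the helper name below is hypothetical:

```python
import sqlite3

def purge_project_rows(conn: sqlite3.Connection, project_id: int) -> list[str]:
    """Delete a project's rows from whichever derived tables exist.

    Illustrative stand-in for the SQLAlchemy version: inspect the
    connection's tables once per call, then DELETE only from tables
    that are actually present.
    """
    existing = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    purged = []
    for table in ("search_index", "search_vector_chunks"):
        if table in existing:
            conn.execute(f"DELETE FROM {table} WHERE project_id = ?", (project_id,))
            purged.append(table)
    return purged

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search_index (title TEXT, project_id INTEGER)")
# search_vector_chunks intentionally absent: semantic search never initialized.
present = purge_project_rows(conn, 2)
print(present)  # no crash on the missing table
```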
Claude finished @phernandez's task in 1m 55s
PR Review: fix(core): purge SQLite search_index on project delete
Summary

The second commit (

Code Quality & Standards
Still missing: the base class logging.

```python
# base class — repository.py:337
logger.debug(f"Deleting {self.Model.__name__}: {entity_id}")
# ...
logger.debug(f"Deleted {self.Model.__name__}: {entity_id}")
# ProjectRepository.delete — neither call is present
```

Testing & Documentation
```python
# project_repository.py:147-148
except NoResultFound:
    return False  # ← no test
```

Basic Memory Architecture
The table-existence check (

Migration

The previous review flagged an inconsistency: the

Security & Performance
Compatibility
Verdict

Two items remain from the previous review — both are minor but affect 100% coverage compliance:
The Codex P1 issue (unconditional delete against a potentially absent table) is resolved by commit
💡 Codex Review
Reviewed commit: 1768de8571
```python
existing_tables = await session.run_sync(
    lambda sync_session: set(sa_inspect(sync_session.connection()).get_table_names())
)
for table in ("search_index", "search_vector_chunks"):
```
Purge sqlite-vec embeddings during project delete
When semantic search is enabled on SQLite, this purge removes search_vector_chunks but leaves search_vector_embeddings rows behind for the deleted chunks. Those orphan embeddings are still scanned by _run_vector_query (which fetches top-k from search_vector_embeddings before joining chunks), so enough stale vectors can consume candidate slots and reduce or eliminate semantic results for active projects after repeated project deletions. Include an embeddings cleanup step (rowids matching deleted chunk ids) in the same transaction.
Two follow-up fixes on the same branch:

1. **Purge sqlite-vec embeddings during project delete** (Codex P2)

   sqlite-vec stores vectors in a vec0 virtual table keyed by chunk rowid with no cascade. The previous purge removed search_vector_chunks but left the embeddings behind; `_run_vector_query` then keeps returning stale vectors that crowd live results.

   ProjectRepository.delete now deletes embeddings first (using the same rowid-IN-chunks pattern as SQLiteSearchRepository.delete_project_vector_rows), then the chunk rows. Both deletes are skipped if the underlying table is absent on a given install. New test test_remove_project_purges_vector_embeddings covers the happy path and skips cleanly when the embeddings table isn't initialized.

2. **Fix search_all_projects=True on local installs**

   `_search_all_projects` recurses into search_notes with both project= and project_id= set. project_id (external UUID) routes through the cloud v2 API path, which 401s on local installs because there's no JWT to present — so the inner calls silently failed and the merged result list stayed empty.

   The fan-out now mirrors get_project_client's cloud_available composite (factory mode OR explicit --cloud OR has_cloud_credentials). When that composite is false we forward project= only and take the name-routed local-ASGI path. Cloud disambiguation still works because the project name in project_ref is already the workspace/project qualified_name.

   The existing cloud-style fan-out tests now go through a cloud_routing fixture that pins the three signals; a new local_routing test confirms project_id is dropped when no cloud route is available.

Signed-off-by: phernandez <paul@basicmachines.co>
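The embeddings-before-chunks ordering from point 1 can be sketched in plain sqlite3. Schemas are simplified stand-ins; the real embeddings table is a vec0 virtual table keyed by the chunk's rowid:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE search_vector_chunks (id INTEGER PRIMARY KEY, project_id INTEGER);
    -- stand-in for the vec0 virtual table: keyed by chunk rowid, no cascade
    CREATE TABLE search_vector_embeddings (rowid_key INTEGER PRIMARY KEY, vec BLOB);
""")
conn.execute("INSERT INTO search_vector_chunks VALUES (10, 2), (11, 2), (12, 3)")
conn.executemany(
    "INSERT INTO search_vector_embeddings VALUES (?, x'00')", [(10,), (11,), (12,)]
)

# Embeddings first (they are keyed by chunk id), then the chunks themselves;
# reversing the order would lose the id list the embeddings delete depends on.
conn.execute(
    "DELETE FROM search_vector_embeddings WHERE rowid_key IN "
    "(SELECT id FROM search_vector_chunks WHERE project_id = ?)",
    (2,),
)
conn.execute("DELETE FROM search_vector_chunks WHERE project_id = ?", (2,))

remaining = conn.execute("SELECT rowid_key FROM search_vector_embeddings").fetchall()
print(remaining)  # only the other project's embedding survives
```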
Claude finished @phernandez's task in 2m 21s
PR Review: fix(core): purge SQLite search_index on project delete
Summary

Commit

Code Quality & Standards
Still missing: the base class logging.

```python
# repository.py:337
logger.debug(f"Deleting {self.Model.__name__}: {entity_id}")
# repository.py:346
logger.debug(f"Deleted {self.Model.__name__}: {entity_id}")
# project_repository.py — no logger import, no debug calls
```

Testing & Documentation
```python
# project_repository.py:152-153
except NoResultFound:
    return False  # ← still no test
```

Basic Memory Architecture
Ordering in
Minor gap: migration doesn't sweep orphaned
Security & Performance
Compatibility
Verdict

Two items remain open across three review cycles — both affect 100% coverage compliance:
The Codex P1 and P2 issues are fully resolved. The multi-project search feature is well-implemented and thoroughly tested. Everything else is clean.
💡 Codex Review
Reviewed commit: 2bd6552e84
```python
"DELETE FROM search_vector_embeddings WHERE rowid IN ("
"SELECT id FROM search_vector_chunks "
"WHERE project_id = :project_id)"
```
Gate sqlite-vec rowid delete to SQLite backends
The new embedding purge issues DELETE ... WHERE rowid IN (...), which is SQLite-specific; on Postgres, search_vector_embeddings uses chunk_id (not rowid), so deleting any project on a semantic-enabled Postgres install will raise a SQL error and abort ProjectRepository.delete. This path is reached whenever both vector tables exist, so project deletion regresses in that environment unless the delete is dialect-gated or uses the Postgres schema.
```sql
DELETE FROM search_vector_chunks
WHERE entity_id NOT IN (SELECT id FROM entity)
```
Remove orphaned embeddings in SQLite orphan cleanup migration
This migration deletes orphaned search_vector_chunks rows but never removes the corresponding search_vector_embeddings entries. In SQLite vector search, top-k candidates are pulled from search_vector_embeddings before joining chunks, so leftover embeddings from rows deleted here can still consume candidate slots and reduce semantic recall after upgrade. The migration should delete embeddings for chunk ids being removed in the same upgrade step.
The previous fix issued an unconditional
`DELETE FROM search_vector_embeddings WHERE rowid IN (...)` during project
deletion, which 500'd on Postgres:

    column "rowid" does not exist
Postgres uses `chunk_id` (a real column with an FK to
search_vector_chunks.id ON DELETE CASCADE), so it doesn't need or accept
the SQLite query at all. The cascade picture by backend:
- search_index → project: Postgres has FK CASCADE; SQLite FTS5 virtual
table can't carry FKs and needs explicit cleanup.
- search_vector_chunks → project: neither backend has an FK, so both
need an explicit DELETE.
- search_vector_embeddings → search_vector_chunks: Postgres has FK
CASCADE on chunk_id; SQLite vec0 virtual table is keyed by rowid
with no cascade, so embeddings must be purged before the chunks.
Branch is now: chunks DELETE runs on both backends; search_index and
the vec0 embeddings DELETE only run on SQLite.
Verified locally against both backends (SQLite: 31 passed, Postgres
via testcontainers: 26 passed) for tests/services/test_project_removal_bug.py
and the CLI/MCP project-management integration suites.
Signed-off-by: phernandez <paul@basicmachines.co>
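The cascade picture above reduces to a small per-dialect dispatch. A sketch of the statement selection only (names are illustrative; the real logic lives inline in ProjectRepository.delete):

```python
def purge_statements(dialect: str) -> list[str]:
    """Return the per-backend purge DELETEs, in execution order.

    Sketch of the backend split described above: chunks are purged on
    both backends; the vec0 embeddings purge and the FTS search_index
    purge are SQLite-only, since Postgres covers both via FK cascades.
    """
    stmts = []
    if dialect == "sqlite":
        # Must run before the chunk delete: the subquery needs the chunk ids.
        stmts.append(
            "DELETE FROM search_vector_embeddings WHERE rowid IN "
            "(SELECT id FROM search_vector_chunks WHERE project_id = :project_id)"
        )
    stmts.append("DELETE FROM search_vector_chunks WHERE project_id = :project_id")
    if dialect == "sqlite":
        stmts.append("DELETE FROM search_index WHERE project_id = :project_id")
    return stmts

print(len(purge_statements("sqlite")), len(purge_statements("postgresql")))  # 3 1
```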
Claude finished @phernandez's task in 1m 56s
PR Review: fix(core): purge SQLite search_index on project delete
Summary

Commit

Code Quality & Standards
Still missing:
```python
# repository.py:337
logger.debug(f"Deleting {self.Model.__name__}: {entity_id}")
# repository.py:346
logger.debug(f"Deleted {self.Model.__name__}: {entity_id}")
```

The override emits nothing. Add

Testing & Documentation
```python
# project_repository.py:152-154
except NoResultFound:
    return False  # ← no test in any test file
```

Basic Memory Architecture
Minor: migration doesn't sweep orphaned
Security & Performance
Compatibility
Verdict

Two items remain open across four review cycles — both affect 100% coverage compliance:
The Codex P1 dialect-gate regression is fully resolved by
💡 Codex Review
Reviewed commit: d3619f97f9
```python
if _table_exists(connection, "search_vector_chunks"):
    op.execute(
        """
        DELETE FROM search_vector_chunks
        WHERE project_id NOT IN (SELECT id FROM project)
        """
    )
    op.execute(
        """
        DELETE FROM search_vector_chunks
        WHERE entity_id NOT IN (SELECT id FROM entity)
        """
```
Purge sqlite-vec embeddings during orphan migration
The migration deletes orphaned rows from search_vector_chunks but never removes matching rows from search_vector_embeddings. On SQLite semantic-search installs, vec0 candidates are read from search_vector_embeddings first, so stale embeddings for deleted chunks can still consume top-k slots and reduce semantic recall after upgrade. Add an embeddings cleanup step keyed by the chunk ids being removed in this migration.
On Windows SQLite CI the embeddings cleanup hits

    sqlite3.OperationalError: no such module: vec0
    [SQL: DELETE FROM search_vector_embeddings WHERE rowid IN (...)]

because the sqlite-vec extension isn't loaded into the connection (some Windows Python builds don't expose enable_load_extension on sqlite3.Connection — see #711). The vec0 virtual table is registered in sqlite_master, so the table-existence check passes, but any access to it fails until the module is loaded.

If vec0 isn't loadable, semantic search was never able to write embeddings, so there's nothing to clean up. Wrap the embeddings DELETE in a try/except that swallows OperationalError for vec0 and logs a debug line — the chunk DELETE below still runs.

The KeyError from test_mcp_sse_forces_local on 3.14 SQLite is an unrelated transient — same code passes on 3.12/3.13 SQLite, Postgres 3.14, and locally on Python 3.14. Will let CI re-run after this push.

Signed-off-by: phernandez <paul@basicmachines.co>
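The swallow-only-vec0 pattern can be sketched generically. Here sqlite3.OperationalError stands in for the SQLAlchemy-wrapped error the migration actually sees, and the function name is hypothetical:

```python
import sqlite3

def purge_embeddings_tolerantly(conn: sqlite3.Connection, project_id: int) -> bool:
    """Attempt the embeddings DELETE; skip quietly only if vec0 is the problem.

    Returns True if the purge ran, False if it was skipped. Sketch only:
    the real code also emits a debug log line before continuing.
    """
    try:
        conn.execute(
            "DELETE FROM search_vector_embeddings WHERE rowid IN "
            "(SELECT id FROM search_vector_chunks WHERE project_id = ?)",
            (project_id,),
        )
        return True
    except sqlite3.OperationalError as exc:
        if "vec0" not in str(exc):
            raise  # unrelated failure: surface it
        return False  # vec0 missing, so nothing was ever written; safe to skip

# A missing table is NOT a vec0 module error, so it propagates:
conn = sqlite3.connect(":memory:")
try:
    purge_embeddings_tolerantly(conn, 1)
    skipped_silently = True
except sqlite3.OperationalError:
    skipped_silently = False
print(skipped_silently)  # only "no such module: vec0" errors are swallowed
```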
Claude finished @phernandez's task in 1m 37s
PR Review: fix(core): purge SQLite search_index on project delete
Summary

Commit

Two items flagged across five consecutive review cycles remain open.

Code Quality & Standards
Still missing:
```python
# repository.py:337
logger.debug(f"Deleting {self.Model.__name__}: {entity_id}")
# ... delete logic ...
logger.debug(f"Deleted {self.Model.__name__}: {entity_id}")
```

The override has a

```python
async def delete(self, entity_id: int) -> bool:
    logger.debug(f"Deleting Project and search rows for project_id: {entity_id}")
    async with db.scoped_session(self.session_maker) as session:
        ...
        await session.delete(project)
        logger.debug(f"Deleted Project and search rows for project_id: {entity_id}")
        return True
```

Testing & Documentation
```python
# project_repository.py:154-155
except NoResultFound:
    return False  # ← no test across all five review cycles
```

Adding one focused async test would close this:

```python
@pytest.mark.asyncio
async def test_remove_project_delete_returns_false_for_missing_id(project_service):
    result = await project_service.repository.delete(9_999_999)
    assert result is False
```

Basic Memory Architecture
The

Security & Performance
Compatibility
Verdict

Two items remain open across five review cycles — both affect 100% coverage compliance:
All Codex P1/P2 issues, the multi-project search feature, and the new vec0 tolerance are correct and well-implemented. Everything else is clean.
💡 Codex Review
Reviewed commit: 43c50ea683
```python
except OperationalError as exc:
    # Trigger: the vec0 SQLite extension isn't loaded into this
    # connection — common on Windows builds where
    # enable_load_extension is unavailable (#711).
    # Why: the embeddings table is registered as a vec0 virtual
    # table, so any access (even a DELETE) needs the module
    # loaded. If it isn't, no row was ever inserted either, so
    # there's nothing to leak.
    # Outcome: log and continue — the chunk DELETE below still
    # runs, and the absence of vec0 means no orphan vectors.
    if "vec0" not in str(exc):
        raise
```
Ensure sqlite-vec is loaded before skipping embedding purge
ProjectRepository.delete now swallows OperationalError when deleting from search_vector_embeddings fails with vec0, but sqlite_search_repository._ensure_sqlite_vec_loaded explicitly documents that sqlite-vec is loaded per connection (not globally), so this error can happen on a different pooled connection even when embeddings already exist. In that case project deletion leaves orphan vectors behind, and later semantic searches on a vec-enabled connection can still consume those stale candidates. Please load sqlite-vec for this session (or fail the delete) instead of treating every vec0 error as “nothing to purge.”
Three follow-ups from PR review:

**Codex P2 — Load sqlite-vec on the delete session.** The previous revision swallowed every `vec0` OperationalError as "no embeddings exist", but sqlite-vec is loaded **per connection**: a pooled connection that hosts ProjectRepository.delete may not have vec0 loaded even when another connection successfully wrote embeddings. That would silently leave orphan vectors behind. The new `_load_sqlite_vec_on_session` helper mirrors SQLiteSearchRepository._ensure_sqlite_vec_loaded as a free function and tries to load the extension on the current session. Only when the load itself fails — because the Python build lacks enable_load_extension, or the sqlite_vec package isn't installed — do we skip the embeddings DELETE. Every connection in the pool shares the same interpreter, so in that case no embeddings could have been written from any connection and skipping is safe.

**Claude review — Missing logger.debug calls.** The override now logs at entry, when the project id isn't found, and after the ORM delete, matching the base Repository.delete contract.

**Claude review — NoResultFound branch uncovered.** New test `test_delete_returns_false_for_missing_project_id` asserts the False return for a nonexistent project id.

Verified locally: 32 passed / 1 skipped (SQLite), 27 passed / 1 skipped (Postgres via testcontainers).

Signed-off-by: phernandez <paul@basicmachines.co>
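A hedged sketch of the per-connection load the commit describes. The `sqlite_vec` package and its `load()` entry point are real, but the helper name and return contract here are illustrative, not the actual `_load_sqlite_vec_on_session`:

```python
import sqlite3

def try_load_sqlite_vec(conn: sqlite3.Connection) -> bool:
    """Best-effort: load the sqlite-vec extension on THIS connection.

    Returns False in the two cases where no connection in the pool could
    ever have written embeddings: the sqlite_vec package isn't installed,
    or the Python build lacks enable_load_extension (#711).
    """
    try:
        import sqlite_vec  # optional dependency; may be absent
    except ImportError:
        return False
    if not hasattr(conn, "enable_load_extension"):  # some Windows builds
        return False
    try:
        conn.enable_load_extension(True)
        sqlite_vec.load(conn)  # registers the vec0 module on this connection
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        conn.enable_load_extension(False)

conn = sqlite3.connect(":memory:")
loaded = try_load_sqlite_vec(conn)
# True → safe to purge embeddings; False → skip, nothing could have been written
```

Because extension loading is per connection, the helper must run on the same session that issues the embeddings DELETE, not on whichever connection happened to initialize search.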
Claude finished @phernandez's task in 1m 36s
PR Review: fix(core): purge SQLite search_index on project delete
Summary

Commit

Code Quality & Standards
The new helper (lines 17–73) explicitly loads vec0 on the same session's connection before the embeddings purge — addressing the Codex P2 concern that a pooled connection different from the one that wrote embeddings might not have the extension loaded. The docstring explains the per-connection constraint and mirrors the rationale in

Testing & Documentation
Basic Memory Architecture
Ordering is still correct:

Remaining minor gap: migration doesn't sweep orphaned
Security & Performance
Compatibility
Verdict

All issues across six review cycles are resolved. The PR is correct, complete, and ready to merge.
The only deferred item — orphaned
💡 Codex Review
Reviewed commit: e8c11b843d
```sql
WHERE entity_id IS NOT NULL
  AND entity_id NOT IN (SELECT id FROM entity)
```
Purge reused-ID stale rows during orphan migration
Tighten the orphan predicates so they verify row ownership, not just ID existence. The current cleanup deletes rows only when project_id or entity_id is missing, so stale search_index/search_vector_chunks rows survive whenever both IDs have already been reused by newer records (a case SQLite rowid reuse allows, and the migration itself is addressing). In that scenario the leaked rows still satisfy both NOT IN checks and remain searchable under the new project, so the migration can leave cross-project contamination behind after upgrade.
Promotes the Unreleased breaking-change note and adds the full v0.21.0 section covering ~80 commits since v0.20.3: workspace-routing fixes across MCP/CLI/API, recent_activity ordering and search opt-in changes, sync hardening, sqlite-vec graceful degrade, perf wins on CLI startup and sync, and the project-delete cleanup landed in #832.

Signed-off-by: phernandez <paul@basicmachines.co>
Summary
`search_index` is an FTS5 virtual table and can't carry an FK, so the `ON DELETE CASCADE` from `search_index.project_id → project.id` only runs on Postgres. On SQLite, deleting a project left its FTS rows behind — and when auto-increment reused the id for a new project, those leftover rows surfaced as the new tenant's data and leaked into `search_notes(project=...)` results.

- `ProjectRepository.delete` now explicitly purges `search_index` and `search_vector_chunks` for the project id in the same session before the ORM delete. Idempotent on Postgres (the cascade FK still runs).
- A migration (`n7i8j9k0l1m2`) does a one-time sweep for existing installs that already have orphans. Catches both shapes: rows whose `project_id` is gone, and the larger class of rows whose `entity_id` is gone (from id reuse). Guarded by table-existence checks so fresh SQLite installs — where `search_index` is created at runtime by `init_search_index`, not by Alembic — don't fail the upgrade.

How it surfaced
`search_notes(project="basic-memory-testing-...", query="spec", search_type="title")` returned 10 results, all from the `main` project, with the response header still claiming the test project. Diagnosis: project_id 2 had ~1k FTS rows referencing `entity_id`s that no longer existed in `entity` — orphans from a previously-deleted project whose id was reused. The repo's WHERE clause filters by `search_index.project_id`, so the rows looked legitimate.

Test plan
- `tests/services/test_project_removal_bug.py::test_remove_project_purges_search_rows` — seeds both derived tables, calls `remove_project`, asserts both come out clean. Verified red on pre-fix code (search_index still has 1 row for deleted project_id=2), green with fix.
- `tests/services/` + `tests/repository/` full sweep: 616 passed / 19 skipped / 0 failed.
- `search_notes(project=test_proj, query="spec", search_type="title")` returns "No results found"; `query="basic"` returns only the 5 test-project notes.