fix(core): purge SQLite search_index on project delete #832
SQLite stores search_index as an FTS5 virtual table, which can't carry a foreign key, so the ON DELETE CASCADE from search_index.project_id to project.id only applies on Postgres. On SQLite, deleting a project left its FTS rows behind — and when auto-increment handed the same id to a new project, the leftover rows masqueraded as the new tenant's data and leaked into searches scoped to that project.

- ProjectRepository.delete now explicitly purges search_index and search_vector_chunks for the project id in the same session before the ORM delete. Idempotent on Postgres (the cascade FK still runs).
- A one-time cleanup migration sweeps two leftover shapes: rows whose project_id is gone, and rows whose entity_id is gone (the larger class from id reuse). Guarded by table-existence checks so fresh SQLite installs — where search_index is created at runtime by init_search_index, not by Alembic — don't fail the upgrade.
- A regression test seeds both derived tables, calls remove_project, and asserts both come out clean. Verified red on pre-fix code.

Signed-off-by: phernandez <paul@basicmachines.co>
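The failure mode is easy to reproduce with plain sqlite3. A minimal sketch, with an ordinary table standing in for the FTS5 virtual table (the real search_index is created by init_search_index; the schema here is illustrative) — the point is only that no foreign key ties the rows to project.id:

```python
import sqlite3

# Plain table stands in for the FTS5 virtual table, which cannot carry an FK.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE search_index (title TEXT, project_id INTEGER);
""")
conn.execute("INSERT INTO project (name) VALUES ('tenant-a')")   # gets id 1
conn.execute("INSERT INTO search_index VALUES ('tenant-a secret note', 1)")

conn.execute("DELETE FROM project WHERE id = 1")                 # FTS rows survive
conn.execute("INSERT INTO project (name) VALUES ('tenant-b')")   # id 1 is reused

# A search scoped to the new project now surfaces the old tenant's rows.
leaked = conn.execute(
    "SELECT title FROM search_index WHERE project_id = "
    "(SELECT id FROM project WHERE name = 'tenant-b')"
).fetchall()
print(leaked)  # the orphaned tenant-a row masquerades as tenant-b data
```

Without AUTOINCREMENT, SQLite assigns max(rowid)+1, so deleting the only project and inserting a new one hands out id 1 again, which is exactly the reuse path described above.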
Claude finished @phernandez's task in 3m 25s
PR Review: fix(core): purge SQLite search_index on project delete
Summary

Solid, well-scoped fix. The root cause diagnosis is accurate (SQLite FTS5 can't carry FKs → orphan rows survive project deletion → auto-increment id reuse leaks prior tenant's content into new project). The fix is minimal, idempotent on Postgres, and documented in the literate-programming style required by CLAUDE.md. A few minor items worth addressing before merge.

Code Quality & Standards
The base class has:

```python
# Base class has:
logger.debug(f"Deleting {self.Model.__name__}: {entity_id}")
# ... delete ...
logger.debug(f"Deleted {self.Model.__name__}: {entity_id}")
# Override has: (nothing)
```

Testing & Documentation
```python
# project_repository.py:141-143
except NoResultFound:
    return False  # ← is this covered?
```

Basic Memory Architecture
Execution order is correct —

Migration

Security & Performance
Compatibility
Minor Observation:
|
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a85767ad92
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```python
    {"project_id": entity_id},
)
await session.execute(
    text("DELETE FROM search_vector_chunks WHERE project_id = :project_id"),
```
Guard vector-chunk delete when semantic tables are absent
ProjectRepository.delete now unconditionally executes DELETE FROM search_vector_chunks, but on SQLite that table is only created when semantic search is enabled (SQLiteSearchRepository.init_search_index calls _ensure_vector_tables() only under that flag). For installs where semantic dependencies are missing or disabled, deleting any project will raise no such table: search_vector_chunks and abort the removal flow. Add a table-existence/dialect guard (similar to the migration) before issuing this delete.
search_index is created lazily by SearchRepository.init_search_index, and search_vector_chunks only materializes once semantic search initializes, so either table may be absent on minimal test DBs. The previous version of ProjectRepository.delete unconditionally issued DELETEs against both and crashed CLI integration tests on Postgres and SQLite where search_vector_chunks hadn't been created yet:

    relation "search_vector_chunks" does not exist
    no such table: search_vector_chunks

Inspect the connection's tables once per call and only delete from whichever derived tables are present. Idempotent on every backend.

Signed-off-by: phernandez <paul@basicmachines.co>
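The guard this commit describes can be sketched with plain sqlite3. The real code inspects tables via SQLAlchemy's inspector inside session.run_sync; the helper name below is hypothetical:

```python
import sqlite3

def purge_project_rows(conn: sqlite3.Connection, project_id: int) -> list[str]:
    """Delete a project's rows from whichever derived tables exist.

    Illustrative stand-in for the SQLAlchemy version: inspect the
    connection's tables once per call, then DELETE only from tables
    that are actually present.
    """
    existing = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    purged = []
    for table in ("search_index", "search_vector_chunks"):
        if table in existing:
            conn.execute(f"DELETE FROM {table} WHERE project_id = ?", (project_id,))
            purged.append(table)
    return purged

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search_index (title TEXT, project_id INTEGER)")
# search_vector_chunks intentionally absent: semantic search never initialized.
present = purge_project_rows(conn, 2)
print(present)  # no crash on the missing table
```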
Claude finished @phernandez's task in 1m 55s
PR Review: fix(core): purge SQLite search_index on project delete
Summary

The second commit (

Code Quality & Standards
Still missing: the base class logging.

```python
# base class — repository.py:337
logger.debug(f"Deleting {self.Model.__name__}: {entity_id}")
# ...
logger.debug(f"Deleted {self.Model.__name__}: {entity_id}")
# ProjectRepository.delete — neither call is present
```

Testing & Documentation
```python
# project_repository.py:147-148
except NoResultFound:
    return False  # ← no test
```

Basic Memory Architecture
The table-existence check (

Migration

The previous review flagged an inconsistency: the

Security & Performance
Compatibility
Verdict

Two items remain from the previous review — both are minor but affect 100% coverage compliance:
The Codex P1 issue (unconditional delete against a potentially absent table) is resolved by commit
💡 Codex Review
Reviewed commit: 1768de8571
```python
existing_tables = await session.run_sync(
    lambda sync_session: set(sa_inspect(sync_session.connection()).get_table_names())
)
for table in ("search_index", "search_vector_chunks"):
```
Purge sqlite-vec embeddings during project delete
When semantic search is enabled on SQLite, this purge removes search_vector_chunks but leaves search_vector_embeddings rows behind for the deleted chunks. Those orphan embeddings are still scanned by _run_vector_query (which fetches top-k from search_vector_embeddings before joining chunks), so enough stale vectors can consume candidate slots and reduce or eliminate semantic results for active projects after repeated project deletions. Include an embeddings cleanup step (rowids matching deleted chunk ids) in the same transaction.
Two follow-up fixes on the same branch:

1. **Purge sqlite-vec embeddings during project delete** (Codex P2)

   sqlite-vec stores vectors in a vec0 virtual table keyed by chunk rowid with no cascade. The previous purge removed search_vector_chunks but left the embeddings behind; `_run_vector_query` then keeps returning stale vectors that crowd live results.

   ProjectRepository.delete now deletes embeddings first (using the same rowid-IN-chunks pattern as SQLiteSearchRepository.delete_project_vector_rows), then the chunk rows. Both deletes are skipped if the underlying table is absent on a given install. New test test_remove_project_purges_vector_embeddings covers the happy path and skips cleanly when the embeddings table isn't initialized.

2. **Fix search_all_projects=True on local installs**

   `_search_all_projects` recurses into search_notes with both project= and project_id= set. project_id (external UUID) routes through the cloud v2 API path, which 401s on local installs because there's no JWT to present — so the inner calls silently failed and the merged result list stayed empty.

   The fan-out now mirrors get_project_client's cloud_available composite (factory mode OR explicit --cloud OR has_cloud_credentials). When that composite is false we forward project= only and take the name-routed local-ASGI path. Cloud disambiguation still works because the project name in project_ref is already the workspace/project qualified_name.

   The existing cloud-style fan-out tests now go through a cloud_routing fixture that pins the three signals; a new local_routing test confirms project_id is dropped when no cloud route is available.

Signed-off-by: phernandez <paul@basicmachines.co>
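The embeddings-before-chunks ordering from point 1 can be sketched in plain sqlite3. Schemas are simplified stand-ins; the real embeddings table is a vec0 virtual table keyed by the chunk's rowid:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE search_vector_chunks (id INTEGER PRIMARY KEY, project_id INTEGER);
    -- stand-in for the vec0 virtual table: keyed by chunk rowid, no cascade
    CREATE TABLE search_vector_embeddings (rowid_key INTEGER PRIMARY KEY, vec BLOB);
""")
conn.execute("INSERT INTO search_vector_chunks VALUES (10, 2), (11, 2), (12, 3)")
conn.executemany(
    "INSERT INTO search_vector_embeddings VALUES (?, x'00')", [(10,), (11,), (12,)]
)

# Embeddings first (they are keyed by chunk id), then the chunks themselves;
# reversing the order would lose the id list the embeddings delete depends on.
conn.execute(
    "DELETE FROM search_vector_embeddings WHERE rowid_key IN "
    "(SELECT id FROM search_vector_chunks WHERE project_id = ?)",
    (2,),
)
conn.execute("DELETE FROM search_vector_chunks WHERE project_id = ?", (2,))

remaining = conn.execute("SELECT rowid_key FROM search_vector_embeddings").fetchall()
print(remaining)  # only the other project's embedding survives
```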
Claude finished @phernandez's task in 2m 21s
PR Review: fix(core): purge SQLite search_index on project delete
Summary

Commit

Code Quality & Standards
Still missing: the base class logging.

```python
# repository.py:337
logger.debug(f"Deleting {self.Model.__name__}: {entity_id}")
# repository.py:346
logger.debug(f"Deleted {self.Model.__name__}: {entity_id}")
# project_repository.py — no logger import, no debug calls
```

Testing & Documentation
```python
# project_repository.py:152-153
except NoResultFound:
    return False  # ← still no test
```

Basic Memory Architecture
Ordering in
Minor gap: migration doesn't sweep orphaned
Security & Performance
Compatibility
Verdict

Two items remain open across three review cycles — both affect 100% coverage compliance:
The Codex P1 and P2 issues are fully resolved. The multi-project search feature is well-implemented and thoroughly tested. Everything else is clean.
💡 Codex Review
Reviewed commit: 2bd6552e84
```python
"DELETE FROM search_vector_embeddings WHERE rowid IN ("
"SELECT id FROM search_vector_chunks "
"WHERE project_id = :project_id)"
```
Gate sqlite-vec rowid delete to SQLite backends
The new embedding purge issues DELETE ... WHERE rowid IN (...), which is SQLite-specific; on Postgres, search_vector_embeddings uses chunk_id (not rowid), so deleting any project on a semantic-enabled Postgres install will raise a SQL error and abort ProjectRepository.delete. This path is reached whenever both vector tables exist, so project deletion regresses in that environment unless the delete is dialect-gated or uses the Postgres schema.
```sql
DELETE FROM search_vector_chunks
WHERE entity_id NOT IN (SELECT id FROM entity)
```
Remove orphaned embeddings in SQLite orphan cleanup migration
This migration deletes orphaned search_vector_chunks rows but never removes the corresponding search_vector_embeddings entries. In SQLite vector search, top-k candidates are pulled from search_vector_embeddings before joining chunks, so leftover embeddings from rows deleted here can still consume candidate slots and reduce semantic recall after upgrade. The migration should delete embeddings for chunk ids being removed in the same upgrade step.
The previous fix issued an unconditional
`DELETE FROM search_vector_embeddings WHERE rowid IN (...)` during project
deletion, which 500'd on Postgres:

    column "rowid" does not exist
Postgres uses `chunk_id` (a real column with an FK to
search_vector_chunks.id ON DELETE CASCADE), so it doesn't need or accept
the SQLite query at all. The cascade picture by backend:
- search_index → project: Postgres has FK CASCADE; SQLite FTS5 virtual
table can't carry FKs and needs explicit cleanup.
- search_vector_chunks → project: neither backend has an FK, so both
need an explicit DELETE.
- search_vector_embeddings → search_vector_chunks: Postgres has FK
CASCADE on chunk_id; SQLite vec0 virtual table is keyed by rowid
with no cascade, so embeddings must be purged before the chunks.
Branch is now: chunks DELETE runs on both backends; search_index and
the vec0 embeddings DELETE only run on SQLite.
Verified locally against both backends (SQLite: 31 passed, Postgres
via testcontainers: 26 passed) for tests/services/test_project_removal_bug.py
and the CLI/MCP project-management integration suites.
Signed-off-by: phernandez <paul@basicmachines.co>
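The cascade picture above reduces to a small per-dialect dispatch. A sketch of the statement selection only (names are illustrative; the real logic lives inline in ProjectRepository.delete):

```python
def purge_statements(dialect: str) -> list[str]:
    """Return the per-backend purge DELETEs, in execution order.

    Sketch of the backend split described above: chunks are purged on
    both backends; the vec0 embeddings purge and the FTS search_index
    purge are SQLite-only, since Postgres covers both via FK cascades.
    """
    stmts = []
    if dialect == "sqlite":
        # Must run before the chunk delete: the subquery needs the chunk ids.
        stmts.append(
            "DELETE FROM search_vector_embeddings WHERE rowid IN "
            "(SELECT id FROM search_vector_chunks WHERE project_id = :project_id)"
        )
    stmts.append("DELETE FROM search_vector_chunks WHERE project_id = :project_id")
    if dialect == "sqlite":
        stmts.append("DELETE FROM search_index WHERE project_id = :project_id")
    return stmts

print(len(purge_statements("sqlite")), len(purge_statements("postgresql")))  # 3 1
```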
Claude finished @phernandez's task in 1m 56s
PR Review: fix(core): purge SQLite search_index on project delete
Summary

Commit

Code Quality & Standards
Still missing:
```python
# repository.py:337
logger.debug(f"Deleting {self.Model.__name__}: {entity_id}")
# repository.py:346
logger.debug(f"Deleted {self.Model.__name__}: {entity_id}")
```

The override emits nothing. Add

Testing & Documentation
```python
# project_repository.py:152-154
except NoResultFound:
    return False  # ← no test in any test file
```

Basic Memory Architecture
Minor: migration doesn't sweep orphaned
Security & Performance
Compatibility
Verdict

Two items remain open across four review cycles — both affect 100% coverage compliance:
The Codex P1 dialect-gate regression is fully resolved by
💡 Codex Review
Reviewed commit: d3619f97f9
```python
if _table_exists(connection, "search_vector_chunks"):
    op.execute(
        """
        DELETE FROM search_vector_chunks
        WHERE project_id NOT IN (SELECT id FROM project)
        """
    )
    op.execute(
        """
        DELETE FROM search_vector_chunks
        WHERE entity_id NOT IN (SELECT id FROM entity)
        """
```
Purge sqlite-vec embeddings during orphan migration
The migration deletes orphaned rows from search_vector_chunks but never removes matching rows from search_vector_embeddings. On SQLite semantic-search installs, vec0 candidates are read from search_vector_embeddings first, so stale embeddings for deleted chunks can still consume top-k slots and reduce semantic recall after upgrade. Add an embeddings cleanup step keyed by the chunk ids being removed in this migration.
On Windows SQLite CI the embeddings cleanup hits

    sqlite3.OperationalError: no such module: vec0
    [SQL: DELETE FROM search_vector_embeddings WHERE rowid IN (...)]

because the sqlite-vec extension isn't loaded into the connection (some Windows Python builds don't expose enable_load_extension on sqlite3.Connection — see #711). The vec0 virtual table is registered in sqlite_master, so the table-existence check passes, but any access to it fails until the module is loaded.

If vec0 isn't loadable, semantic search was never able to write embeddings, so there's nothing to clean up. Wrap the embeddings DELETE in a try/except that swallows OperationalError for vec0 and logs a debug line — the chunk DELETE below still runs.

The KeyError from test_mcp_sse_forces_local on 3.14 SQLite is an unrelated transient — same code passes on 3.12/3.13 SQLite, Postgres 3.14, and locally on Python 3.14. Will let CI re-run after this push.

Signed-off-by: phernandez <paul@basicmachines.co>
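The swallow-only-vec0 pattern can be sketched generically. Here sqlite3.OperationalError stands in for the SQLAlchemy-wrapped error the migration actually sees, and the function name is hypothetical:

```python
import sqlite3

def purge_embeddings_tolerantly(conn: sqlite3.Connection, project_id: int) -> bool:
    """Attempt the embeddings DELETE; skip quietly only if vec0 is the problem.

    Returns True if the purge ran, False if it was skipped. Sketch only:
    the real code also emits a debug log line before continuing.
    """
    try:
        conn.execute(
            "DELETE FROM search_vector_embeddings WHERE rowid IN "
            "(SELECT id FROM search_vector_chunks WHERE project_id = ?)",
            (project_id,),
        )
        return True
    except sqlite3.OperationalError as exc:
        if "vec0" not in str(exc):
            raise  # unrelated failure: surface it
        return False  # vec0 missing, so nothing was ever written; safe to skip

# A missing table is NOT a vec0 module error, so it propagates:
conn = sqlite3.connect(":memory:")
try:
    purge_embeddings_tolerantly(conn, 1)
    skipped_silently = True
except sqlite3.OperationalError:
    skipped_silently = False
print(skipped_silently)  # only "no such module: vec0" errors are swallowed
```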
Claude finished @phernandez's task in 1m 37s
PR Review: fix(core): purge SQLite search_index on project delete
Summary

Commit

Two items flagged across five consecutive review cycles remain open.

Code Quality & Standards
Still missing:
```python
# repository.py:337
logger.debug(f"Deleting {self.Model.__name__}: {entity_id}")
# ... delete logic ...
logger.debug(f"Deleted {self.Model.__name__}: {entity_id}")
```

The override has a

```python
async def delete(self, entity_id: int) -> bool:
    logger.debug(f"Deleting Project and search rows for project_id: {entity_id}")
    async with db.scoped_session(self.session_maker) as session:
        ...
        await session.delete(project)
        logger.debug(f"Deleted Project and search rows for project_id: {entity_id}")
        return True
```

Testing & Documentation
```python
# project_repository.py:154-155
except NoResultFound:
    return False  # ← no test across all five review cycles
```

Adding one focused async test would close this:

```python
@pytest.mark.asyncio
async def test_remove_project_delete_returns_false_for_missing_id(project_service):
    result = await project_service.repository.delete(9_999_999)
    assert result is False
```

Basic Memory Architecture
The

Security & Performance
Compatibility
Verdict

Two items remain open across five review cycles — both affect 100% coverage compliance:
All Codex P1/P2 issues, the multi-project search feature, and the new vec0 tolerance are correct and well-implemented. Everything else is clean.
💡 Codex Review
Reviewed commit: 43c50ea683
```python
except OperationalError as exc:
    # Trigger: the vec0 SQLite extension isn't loaded into this
    # connection — common on Windows builds where
    # enable_load_extension is unavailable (#711).
    # Why: the embeddings table is registered as a vec0 virtual
    # table, so any access (even a DELETE) needs the module
    # loaded. If it isn't, no row was ever inserted either, so
    # there's nothing to leak.
    # Outcome: log and continue — the chunk DELETE below still
    # runs, and the absence of vec0 means no orphan vectors.
    if "vec0" not in str(exc):
        raise
```
Ensure sqlite-vec is loaded before skipping embedding purge
ProjectRepository.delete now swallows OperationalError when deleting from search_vector_embeddings fails with vec0, but sqlite_search_repository._ensure_sqlite_vec_loaded explicitly documents that sqlite-vec is loaded per connection (not globally), so this error can happen on a different pooled connection even when embeddings already exist. In that case project deletion leaves orphan vectors behind, and later semantic searches on a vec-enabled connection can still consume those stale candidates. Please load sqlite-vec for this session (or fail the delete) instead of treating every vec0 error as “nothing to purge.”
Three follow-ups from PR review:

**Codex P2 — Load sqlite-vec on the delete session.** The previous revision swallowed every `vec0` OperationalError as "no embeddings exist", but sqlite-vec is loaded **per connection**: a pooled connection that hosts ProjectRepository.delete may not have vec0 loaded even when another connection successfully wrote embeddings. That would silently leave orphan vectors behind. The new `_load_sqlite_vec_on_session` helper mirrors SQLiteSearchRepository._ensure_sqlite_vec_loaded as a free function and tries to load the extension on the current session. Only when the load itself fails — because the Python build lacks enable_load_extension, or the sqlite_vec package isn't installed — do we skip the embeddings DELETE. Every connection in the pool shares the same interpreter, so in that case no embeddings could have been written from any connection and skipping is safe.

**Claude review — Missing logger.debug calls.** The override now logs at entry, when the project id isn't found, and after the ORM delete, matching the base Repository.delete contract.

**Claude review — NoResultFound branch uncovered.** New test `test_delete_returns_false_for_missing_project_id` asserts the False return for a nonexistent project id.

Verified locally: 32 passed / 1 skipped (SQLite), 27 passed / 1 skipped (Postgres via testcontainers).

Signed-off-by: phernandez <paul@basicmachines.co>
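A hedged sketch of the per-connection load the commit describes. The `sqlite_vec` package and its `load()` entry point are real, but the helper name and return contract here are illustrative, not the actual `_load_sqlite_vec_on_session`:

```python
import sqlite3

def try_load_sqlite_vec(conn: sqlite3.Connection) -> bool:
    """Best-effort: load the sqlite-vec extension on THIS connection.

    Returns False in the two cases where no connection in the pool could
    ever have written embeddings: the sqlite_vec package isn't installed,
    or the Python build lacks enable_load_extension (#711).
    """
    try:
        import sqlite_vec  # optional dependency; may be absent
    except ImportError:
        return False
    if not hasattr(conn, "enable_load_extension"):  # some Windows builds
        return False
    try:
        conn.enable_load_extension(True)
        sqlite_vec.load(conn)  # registers the vec0 module on this connection
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        conn.enable_load_extension(False)

conn = sqlite3.connect(":memory:")
loaded = try_load_sqlite_vec(conn)
# True → safe to purge embeddings; False → skip, nothing could have been written
```

Because extension loading is per connection, the helper must run on the same session that issues the embeddings DELETE, not on whichever connection happened to initialize search.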
Claude finished @phernandez's task in 1m 36s
PR Review: fix(core): purge SQLite search_index on project delete
Summary

Commit

Code Quality & Standards
The new helper (lines 17–73) explicitly loads vec0 on the same session's connection before the embeddings purge — addressing the Codex P2 concern that a pooled connection different from the one that wrote embeddings might not have the extension loaded. The docstring explains the per-connection constraint and mirrors the rationale in

Testing & Documentation
Basic Memory Architecture
Ordering is still correct:

Remaining minor gap: migration doesn't sweep orphaned
Security & Performance
Compatibility
Verdict

All issues across six review cycles are resolved. The PR is correct, complete, and ready to merge.
The only deferred item — orphaned
💡 Codex Review
Reviewed commit: e8c11b843d
```sql
WHERE entity_id IS NOT NULL
  AND entity_id NOT IN (SELECT id FROM entity)
```
Purge reused-ID stale rows during orphan migration
Tighten the orphan predicates so they verify row ownership, not just ID existence. The current cleanup deletes rows only when project_id or entity_id is missing, so stale search_index/search_vector_chunks rows survive whenever both IDs have already been reused by newer records (a case SQLite rowid reuse allows, and the migration itself is addressing). In that scenario the leaked rows still satisfy both NOT IN checks and remain searchable under the new project, so the migration can leave cross-project contamination behind after upgrade.
Promotes the Unreleased breaking-change note and adds the full v0.21.0 section covering ~80 commits since v0.20.3: workspace-routing fixes across MCP/CLI/API, recent_activity ordering and search opt-in changes, sync hardening, sqlite-vec graceful degrade, perf wins on CLI startup and sync, and the project-delete cleanup landed in #832.

Signed-off-by: phernandez <paul@basicmachines.co>
Summary
`search_index` is an FTS5 virtual table and can't carry an FK, so the `ON DELETE CASCADE` from `search_index.project_id → project.id` only runs on Postgres. On SQLite, deleting a project left its FTS rows behind — and when auto-increment reused the id for a new project, those leftover rows surfaced as the new tenant's data and leaked into `search_notes(project=...)` results.

- `ProjectRepository.delete` now explicitly purges `search_index` and `search_vector_chunks` for the project id in the same session before the ORM delete. Idempotent on Postgres (the cascade FK still runs).
- A migration (`n7i8j9k0l1m2`) does a one-time sweep for existing installs that already have orphans. Catches both shapes: rows whose `project_id` is gone, and the larger class of rows whose `entity_id` is gone (from id reuse). Guarded by table-existence checks so fresh SQLite installs — where `search_index` is created at runtime by `init_search_index`, not by Alembic — don't fail the upgrade.

How it surfaced
`search_notes(project="basic-memory-testing-...", query="spec", search_type="title")` returned 10 results, all from the `main` project, with the response header still claiming the test project. Diagnosis: project_id 2 had ~1k FTS rows referencing `entity_id`s that no longer existed in `entity` — orphans from a previously-deleted project whose id was reused. The repo's WHERE clause filters by `search_index.project_id`, so the rows looked legitimate.

Test plan
- `tests/services/test_project_removal_bug.py::test_remove_project_purges_search_rows` — seeds both derived tables, calls `remove_project`, asserts both come out clean. Verified red on pre-fix code (search_index still has 1 row for deleted project_id=2), green with fix.
- `tests/services/` + `tests/repository/` full sweep: 616 passed / 19 skipped / 0 failed.
- `search_notes(project=test_proj, query="spec", search_type="title")` returns "No results found"; `query="basic"` returns only the 5 test-project notes.