
fix(core): purge SQLite search_index on project delete #832

Merged
phernandez merged 6 commits into main from fix/sqlite-search-index-orphan-cleanup on May 16, 2026

Conversation

@phernandez (Member)
Summary

  • SQLite's search_index is an FTS5 virtual table and can't carry an FK, so the ON DELETE CASCADE from search_index.project_id → project.id only runs on Postgres. On SQLite, deleting a project left its FTS rows behind — and when auto-increment reused the id for a new project, those leftover rows surfaced as the new tenant's data and leaked into search_notes(project=...) results.
  • ProjectRepository.delete now explicitly purges search_index and search_vector_chunks for the project id in the same session before the ORM delete. Idempotent on Postgres (the cascade FK still runs).
  • New migration (n7i8j9k0l1m2) does a one-time sweep for existing installs that already have orphans. Catches both shapes: rows whose project_id is gone, and the larger class of rows whose entity_id is gone (from id reuse). Guarded by table-existence checks so fresh SQLite installs — where search_index is created at runtime by init_search_index, not by Alembic — don't fail the upgrade.
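The two orphan shapes the sweep targets can be reproduced in miniature. This is an illustrative sketch with toy schemas and data, not the actual migration code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (id INTEGER PRIMARY KEY);
    CREATE TABLE entity (id INTEGER PRIMARY KEY);
    CREATE TABLE search_index (project_id INTEGER, entity_id INTEGER);
    INSERT INTO project (id) VALUES (1);
    INSERT INTO entity (id) VALUES (10);
    -- one live row, one project orphan, one entity orphan (from id reuse)
    INSERT INTO search_index VALUES (1, 10), (2, 10), (1, 99);
""")
# Shape 1: rows whose project_id no longer resolves
conn.execute("DELETE FROM search_index WHERE project_id NOT IN (SELECT id FROM project)")
# Shape 2: rows whose entity_id no longer resolves (the larger class)
conn.execute("DELETE FROM search_index WHERE entity_id NOT IN (SELECT id FROM entity)")
print(conn.execute("SELECT project_id, entity_id FROM search_index").fetchall())  # [(1, 10)]
```

Only the row whose project and entity both still exist survives the two sweeps.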

How it surfaced

search_notes(project="basic-memory-testing-...", query="spec", search_type="title") returned 10 results, all from the main project, with the response header still claiming the test project. Diagnosis: project_id 2 had ~1k FTS rows referencing entity_ids that no longer existed in entity — orphans from a previously-deleted project whose id was reused. The repo's WHERE clause filters by search_index.project_id, so the rows looked legitimate.
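The diagnosis boils down to counting FTS rows whose entity_id no longer resolves. Against a simplified toy schema (not the real repo code), the check looks like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (id INTEGER PRIMARY KEY);
    CREATE TABLE search_index (project_id INTEGER, entity_id INTEGER);
    INSERT INTO entity (id) VALUES (10);
    -- project_id 2 rows pointing at entities deleted long ago
    INSERT INTO search_index VALUES (2, 10), (2, 98), (2, 99);
""")
orphans = conn.execute("""
    SELECT COUNT(*) FROM search_index
    WHERE project_id = 2 AND entity_id NOT IN (SELECT id FROM entity)
""").fetchone()[0]
print(orphans)  # 2 orphaned FTS rows masquerading as project 2's data
```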

Test plan

  • tests/services/test_project_removal_bug.py::test_remove_project_purges_search_rows — seeds both derived tables, calls remove_project, asserts both come out clean. Verified red on pre-fix code (search_index still has 1 row for deleted project_id=2), green with fix.
  • tests/services/ + tests/repository/ full sweep: 616 passed / 19 skipped / 0 failed.
  • Live MCP roundtrip: after running the cleanup against my local DB, search_notes(project=test_proj, query="spec", search_type="title") returns "No results found"; query="basic" returns only the 5 test-project notes.

SQLite stores search_index as an FTS5 virtual table, which can't carry a
foreign key, so the ON DELETE CASCADE from search_index.project_id to
project.id only applies on Postgres. On SQLite, deleting a project left
its FTS rows behind — and when auto-increment handed the same id to a
new project, the leftover rows masqueraded as the new tenant's data and
leaked into searches scoped to that project.

- ProjectRepository.delete now explicitly purges search_index and
  search_vector_chunks for the project id in the same session before
  the ORM delete. Idempotent on Postgres (the cascade FK still runs).
- One-time cleanup migration sweeps two leftover shapes: rows whose
  project_id is gone, and rows whose entity_id is gone (the larger
  class from id reuse). Guarded by table-existence checks so fresh
  SQLite installs — where search_index is created at runtime by
  init_search_index, not by Alembic — don't fail the upgrade.
- Regression test seeds both derived tables, calls remove_project,
  and asserts both come out clean. Verified red on pre-fix code.

Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions (Contributor)

github-actions Bot commented May 16, 2026

Claude finished @phernandez's task in 3m 25s


PR Review: fix(core): purge SQLite search_index on project delete

  • Gather context and read changed files
  • Review code quality & standards
  • Review testing & documentation
  • Review Basic Memory architecture
  • Review security & performance
  • Review compatibility
  • Post final summary

Summary

Solid, well-scoped fix. The root cause diagnosis is accurate (SQLite FTS5 can't carry FKs → orphan rows survive project deletion → auto-increment id reuse leaks prior tenant's content into new project). The fix is minimal, idempotent on Postgres, and documented in the literate-programming style required by CLAUDE.md. A few minor items worth addressing before merge.


Code Quality & Standards

Item Status
Follows Basic Memory coding conventions (CLAUDE.md) ✅
Python 3.12+ type annotations and async patterns ✅
SQLAlchemy 2.0 best practices ✅
100-character line length maintained ✅
No commented-out code blocks ✅

project_repository.py — Missing logger calls

The base Repository.delete emits logger.debug() before and after deletion. The ProjectRepository.delete override drops those calls. This makes debugging harder when tracing delete flows in logs. Minor, but inconsistent with the base class pattern.

# Base class has:
logger.debug(f"Deleting {self.Model.__name__}: {entity_id}")
# ... delete ...
logger.debug(f"Deleted {self.Model.__name__}: {entity_id}")

# Override has: (nothing)



Testing & Documentation

Item Status
Unit tests for new functions ✅
Test coverage for edge cases ⚠️ See below
100% test coverage maintained ❓ Needs verification
Documentation updated ✅

test_project_removal_bug.py — NoResultFound branch not covered

ProjectRepository.delete has a return False path when the project ID doesn't exist (lines 141-143). The new test only exercises the happy path (project exists). The base Repository.delete has the same pattern, but since ProjectRepository overrides it, that branch needs its own coverage.

# project_repository.py:141-143
except NoResultFound:
    return False  # ← is this covered?



Basic Memory Architecture

Item Status
Database changes include Alembic migration ✅
Preserves local-first architecture ✅
Knowledge graph consistency maintained ✅
Atomicity of delete + FTS purge ✅ Single session context
Idempotent on Postgres ✅ Explicit delete + FK cascade = no-op

Execution order is correct — session.delete(project) defers the ORM DELETE to flush time (session exit), so the two raw SQL deletes against search_index and search_vector_chunks run before the project row is removed. On Postgres where the FK cascade exists, the cascade fires after the explicit deletes but finds nothing — correct behavior.

Migration _table_exists guard is correct — FTS5 virtual tables appear in SQLite's sqlite_master and are returned by SQLAlchemy inspector's get_table_names(), so fresh installs where search_index doesn't exist yet will skip the sweep without error.
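This behavior is easy to confirm with the stdlib sqlite3 module, assuming the Python build ships the FTS5 module (a standalone demo, not the migration's actual helper):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE search_index USING fts5(title, content)")
# search_vector_chunks deliberately not created, as on installs
# without semantic search

tables = {row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")}
print("search_index" in tables)          # True: virtual tables are listed
print("search_vector_chunks" in tables)  # False: sweep is skipped safely
```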


Security & Performance

Item Status
No hardcoded secrets or credentials ✅
Parameterized queries (no SQL injection) ✅
Proper error handling ✅
No sensitive data in logs ✅

Compatibility

Item Status
No emoji/unicode in console output ✅
Windows path compatibility ✅ N/A for this change

Minor Observation: entity_id IS NOT NULL guard inconsistency in migration

src/basic_memory/alembic/versions/n7i8j9k0l1m2_cleanup_sqlite_search_orphans.py:61-66

The search_index sweep correctly guards with entity_id IS NOT NULL:

-- search_index sweep:
WHERE entity_id IS NOT NULL AND entity_id NOT IN (SELECT id FROM entity)

-- search_vector_chunks sweep:
WHERE entity_id NOT IN (SELECT id FROM entity)  -- no IS NOT NULL guard

If search_vector_chunks.entity_id is declared NOT NULL at the schema level, this is fine — the subquery SELECT id FROM entity will never return NULLs (PK), so the NOT IN is safe. If entity_id can be NULL in search_vector_chunks, the NOT IN would silently skip those rows (SQL NOT IN returns NULL/false when the value is NULL). Worth confirming against the model definition, but likely not a practical issue.
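The NOT IN null-semantics hazard described above can be demonstrated directly, using a toy schema where entity_id is deliberately nullable (the review later confirms the real models declare it NOT NULL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (id INTEGER PRIMARY KEY);
    CREATE TABLE chunks (entity_id INTEGER);  -- nullable on purpose
    INSERT INTO entity (id) VALUES (1);
    INSERT INTO chunks VALUES (1), (99), (NULL);
""")
conn.execute("DELETE FROM chunks WHERE entity_id NOT IN (SELECT id FROM entity)")
# NULL NOT IN (...) evaluates to NULL, not true, so the NULL row survives
print(conn.execute("SELECT entity_id FROM chunks").fetchall())  # [(1,), (None,)]
```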


Verdict

Approve with minor suggestions. The fix is correct, idempotent, and well-tested for the primary scenario. The logger omission and missing NoResultFound coverage are the only items that touch 100% coverage compliance. Everything else is clean.


@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a85767ad92

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

        {"project_id": entity_id},
    )
    await session.execute(
        text("DELETE FROM search_vector_chunks WHERE project_id = :project_id"),

P1: Guard vector-chunk delete when semantic tables are absent

ProjectRepository.delete now unconditionally executes DELETE FROM search_vector_chunks, but on SQLite that table is only created when semantic search is enabled (SQLiteSearchRepository.init_search_index calls _ensure_vector_tables() only under that flag). For installs where semantic dependencies are missing or disabled, deleting any project will raise no such table: search_vector_chunks and abort the removal flow. Add a table-existence/dialect guard (similar to the migration) before issuing this delete.


search_index is created lazily by SearchRepository.init_search_index and
search_vector_chunks only materializes once semantic search initializes,
so either table may be absent on minimal test DBs. The previous version
of ProjectRepository.delete unconditionally issued DELETEs against both
and crashed CLI integration tests on Postgres and SQLite where
search_vector_chunks hadn't been created yet:

  relation "search_vector_chunks" does not exist
  no such table: search_vector_chunks

Inspect the connection's tables once per call and only delete from
whichever derived tables are present. Idempotent on every backend.

Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions (Contributor)

github-actions Bot commented May 16, 2026

Claude finished @phernandez's task in 1m 55s


PR Review: fix(core): purge SQLite search_index on project delete

  • Gather context and read changed files
  • Review code quality & standards
  • Review testing & documentation
  • Review Basic Memory architecture
  • Review security & performance
  • Review compatibility
  • Post final summary

Summary

The second commit (1768de8) addresses the Codex P1 issue: ProjectRepository.delete now uses sa_inspect to check for table existence before issuing the DELETE statements, so installs where search_vector_chunks hasn't been initialized yet won't break on project removal. Two minor items from the previous review are still open.


Code Quality & Standards

Item Status
Follows Basic Memory coding conventions (CLAUDE.md) ✅
Python 3.12+ type annotations and async patterns ✅
SQLAlchemy 2.0 best practices (run_sync for sync inspect in async context) ✅
100-character line length maintained ✅
No commented-out code blocks ✅

Still missing: logger.debug calls in ProjectRepository.delete

The base Repository.delete (repository.py:337,346) emits debug logs at the start and end of every delete. The override drops both:

# base class — repository.py:337
logger.debug(f"Deleting {self.Model.__name__}: {entity_id}")
# ...
logger.debug(f"Deleted {self.Model.__name__}: {entity_id}")

# ProjectRepository.delete — neither call is present



Testing & Documentation

Item Status
Unit tests for new functions ✅
Test coverage for happy path ✅
100% test coverage maintained ⚠️ See below
Documentation / docstrings ✅

NoResultFound branch still uncovered

ProjectRepository.delete returns False when the project doesn't exist (line 148), but there is no test exercising that path. The previous review flagged this; it's still open.

# project_repository.py:147-148
except NoResultFound:
    return False  # ← no test



Basic Memory Architecture

Item Status
Alembic migration included ✅
Local-first architecture preserved ✅
Atomicity: FTS purge + ORM delete in one session ✅
Idempotent on Postgres ✅
Codex P1 resolved (table-existence guard in delete) ✅ Commit 1768de8

The table-existence check (sa_inspect(connection).get_table_names()) is the right approach and matches the migration's own _table_exists guard. FTS5 virtual tables appear in sqlite_master and are returned by the inspector, so the guard correctly handles both the absent-table case and the present-but-empty case.

Migration entity_id IS NOT NULL consistency — confirmed non-issue

The previous review flagged an inconsistency: the search_index sweep guards with entity_id IS NOT NULL AND entity_id NOT IN (...), while the search_vector_chunks sweep omits the guard. Verified against the DDL in models/search.py: entity_id INTEGER NOT NULL on both Postgres and SQLite schemas. Since entity_id can never be NULL, the NOT IN subquery is safe without the guard.


Security & Performance

Item Status
No hardcoded secrets or credentials ✅
Parameterized queries (:project_id binding, no SQL injection) ✅
Proper error handling ✅
No sensitive data in logs ✅

Compatibility

Item Status
No emoji/unicode in console/log output ✅
Windows path compatibility ✅ N/A for this change

Verdict

Two items remain from the previous review — both are minor but affect 100% coverage compliance:

  1. Add logger.debug calls to ProjectRepository.delete to match the base class pattern.
  2. Add a test for ProjectRepository.delete returning False (nonexistent project id).

The Codex P1 issue (unconditional delete against a potentially absent table) is resolved by commit 1768de8. Everything else is clean.


@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1768de8571


existing_tables = await session.run_sync(
    lambda sync_session: set(sa_inspect(sync_session.connection()).get_table_names())
)
for table in ("search_index", "search_vector_chunks"):

P2: Purge sqlite-vec embeddings during project delete

When semantic search is enabled on SQLite, this purge removes search_vector_chunks but leaves search_vector_embeddings rows behind for the deleted chunks. Those orphan embeddings are still scanned by _run_vector_query (which fetches top-k from search_vector_embeddings before joining chunks), so enough stale vectors can consume candidate slots and reduce or eliminate semantic results for active projects after repeated project deletions. Include an embeddings cleanup step (rowids matching deleted chunk ids) in the same transaction.


Two follow-up fixes on the same branch:

1. **Purge sqlite-vec embeddings during project delete** (Codex P2)
   sqlite-vec stores vectors in a vec0 virtual table keyed by chunk rowid
   with no cascade. The previous purge removed search_vector_chunks but
   left the embeddings behind; `_run_vector_query` then keeps returning
   stale vectors that crowd live results.

   ProjectRepository.delete now deletes embeddings first (using the same
   rowid-IN-chunks pattern as SQLiteSearchRepository.delete_project_vector_rows),
   then the chunk rows. Both deletes are skipped if the underlying table
   is absent on a given install. New test test_remove_project_purges_vector_embeddings
   covers the happy path and skips cleanly when the embeddings table
   isn't initialized.

2. **Fix search_all_projects=True on local installs**
   `_search_all_projects` recurses into search_notes with both project=
   and project_id= set. project_id (external UUID) routes through the
   cloud v2 API path, which 401s on local installs because there's no
   JWT to present — so the inner calls silently failed and the merged
   result list stayed empty.

   The fan-out now mirrors get_project_client's cloud_available composite
   (factory mode OR explicit --cloud OR has_cloud_credentials). When that
   composite is false we forward project= only and take the name-routed
   local-ASGI path. Cloud disambiguation still works because the project
   name in project_ref is already the workspace/project qualified_name.

   The existing cloud-style fan-out tests now go through a cloud_routing
   fixture that pins the three signals; a new local_routing test confirms
   project_id is dropped when no cloud route is available.

Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions (Contributor)

github-actions Bot commented May 16, 2026

Claude finished @phernandez's task in 2m 21s


PR Review: fix(core): purge SQLite search_index on project delete

  • Gather context and read changed files
  • Review code quality & standards
  • Review testing & documentation
  • Review Basic Memory architecture
  • Review security & performance
  • Review compatibility
  • Post final summary

Summary

Commit 2bd6552 ("address PR review feedback — P2 + multi-project search") finishes the last two Codex issues and adds a substantial multi-project search_notes feature. Two items flagged in the previous two reviews are still open.


Code Quality & Standards

Item Status
Follows Basic Memory coding conventions (CLAUDE.md) ✅
Python 3.12+ type annotations and async patterns ✅
SQLAlchemy 2.0 best practices (run_sync for sync inspect in async) ✅
100-character line length maintained ✅
No commented-out code blocks ✅
Literate-programming comments (Trigger / Why / Outcome) ✅

Still missing: logger.debug calls in ProjectRepository.delete (third review)

The base Repository.delete (repository.py:337,346) emits two debug log lines. The override still has no logger import and no debug calls:

# repository.py:337
logger.debug(f"Deleting {self.Model.__name__}: {entity_id}")
# repository.py:346
logger.debug(f"Deleted {self.Model.__name__}: {entity_id}")

# project_repository.py — no logger import, no debug calls



Testing & Documentation

Item Status
Unit tests for new functions/methods ✅
Multi-project search tests (5 cases, cloud + local routing) ✅
Vector embeddings cleanup test (test_remove_project_purges_vector_embeddings) ✅ Resolved P2
NoResultFound branch covered ❌ See below
100% coverage maintained ⚠️ Blocked by above

NoResultFound branch still uncovered (third review)

ProjectRepository.delete returns False at line 153 when the project id doesn't exist, but no test exercises this path. The base class has its own test for this, but since the override skips super().delete(), that coverage doesn't apply here.

# project_repository.py:152-153
except NoResultFound:
    return False  # ← still no test



Basic Memory Architecture

Item Status
Alembic migration included ✅
Local-first architecture preserved ✅
Atomicity: FTS purge + embeddings purge + ORM delete in one session ✅
Idempotent on Postgres (FK cascade still runs after explicit DELETE) ✅
Codex P1 resolved (table-existence guard via sa_inspect) ✅
Codex P2 resolved (search_vector_embeddings swept before chunks) ✅ Commit 2bd6552
Multi-project search_all_projects fan-out ✅
Cloud routing composite correctly mirrors get_project_client ✅

Ordering in ProjectRepository.delete is correct: embeddings deleted first (by chunk rowid) while chunk rows still exist, then chunks deleted, then project ORM-deleted at session flush. The vec0 table has no cascade, so this explicit ordering is required.

_search_all_projects cloud-routing guard is correct: the use_cloud_routing composite (is_factory_mode OR (_explicit_routing AND NOT _force_local_mode) OR has_cloud_credentials) mirrors get_project_client's routing logic. Forwarding project_id (external UUID) only on cloud routes prevents 401s on local installs.
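As a plain predicate, the composite the review describes looks like this. A sketch only: the signal names come from the review text and are not verified against the source.

```python
def use_cloud_routing(
    is_factory_mode: bool,
    explicit_routing: bool,
    force_local_mode: bool,
    has_cloud_credentials: bool,
) -> bool:
    # Any cloud signal wins, except an explicit route overridden by forced-local
    return (
        is_factory_mode
        or (explicit_routing and not force_local_mode)
        or has_cloud_credentials
    )

# A bare local install has none of the signals, so project_id is dropped
# and the fan-out takes the name-routed local-ASGI path.
print(use_cloud_routing(False, False, False, False))  # False
```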

Minor gap: migration doesn't sweep orphaned search_vector_embeddings

n7i8j9k0l1m2 sweeps orphans from search_index and search_vector_chunks, but not from search_vector_embeddings. After the migration purges orphaned chunks, their corresponding embeddings in the vec0 table remain behind. Because _run_vector_query joins on chunk rowid before filtering by project, orphan embeddings that lack matching chunks won't produce incorrect results — but they do consume top-k candidate slots and reduce effective search capacity over time on installs with semantic search. Adding a vec0-compatible DELETE in the migration would be the complete fix, though vec0 tables have non-standard SQL constraints that may require dialect-specific handling.


Security & Performance

Item Status
No hardcoded secrets or credentials ✅
All SQL uses parameterized bindings (:project_id, :pid, etc.) ✅
Proper error handling in multi-project fan-out (per-project try/except) ✅
No sensitive data in logs ✅

Compatibility

Item Status
No emoji/unicode in console or log output ✅
Windows path compatibility ✅ N/A for this change

Verdict

Two items remain open across three review cycles — both affect 100% coverage compliance:

  1. Add from loguru import logger and two logger.debug calls to ProjectRepository.delete to match the base class.
  2. Add a test for the NoResultFound → return False branch at project_repository.py:153.

The Codex P1 and P2 issues are fully resolved. The multi-project search feature is well-implemented and thoroughly tested. Everything else is clean.


@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2bd6552e84


Comment on lines +171 to +173
"DELETE FROM search_vector_embeddings WHERE rowid IN ("
"SELECT id FROM search_vector_chunks "
"WHERE project_id = :project_id)"

P1: Gate sqlite-vec rowid delete to SQLite backends

The new embedding purge issues DELETE ... WHERE rowid IN (...), which is SQLite-specific; on Postgres, search_vector_embeddings uses chunk_id (not rowid), so deleting any project on a semantic-enabled Postgres install will raise a SQL error and abort ProjectRepository.delete. This path is reached whenever both vector tables exist, so project deletion regresses in that environment unless the delete is dialect-gated or uses the Postgres schema.


Comment on lines +78 to +80
DELETE FROM search_vector_chunks
WHERE entity_id NOT IN (SELECT id FROM entity)
"""

P2: Remove orphaned embeddings in SQLite orphan cleanup migration

This migration deletes orphaned search_vector_chunks rows but never removes the corresponding search_vector_embeddings entries. In SQLite vector search, top-k candidates are pulled from search_vector_embeddings before joining chunks, so leftover embeddings from rows deleted here can still consume candidate slots and reduce semantic recall after upgrade. The migration should delete embeddings for chunk ids being removed in the same upgrade step.


The previous fix issued an unconditional
`DELETE FROM search_vector_embeddings WHERE rowid IN (...)` during project
deletion, which 500'd on Postgres:

  column "rowid" does not exist

Postgres uses `chunk_id` (a real column with an FK to
search_vector_chunks.id ON DELETE CASCADE), so it doesn't need or accept
the SQLite query at all. The cascade picture by backend:

  - search_index → project: Postgres has FK CASCADE; SQLite FTS5 virtual
    table can't carry FKs and needs explicit cleanup.
  - search_vector_chunks → project: neither backend has an FK, so both
    need an explicit DELETE.
  - search_vector_embeddings → search_vector_chunks: Postgres has FK
    CASCADE on chunk_id; SQLite vec0 virtual table is keyed by rowid
    with no cascade, so embeddings must be purged before the chunks.

Branch is now: chunks DELETE runs on both backends; search_index and
the vec0 embeddings DELETE only run on SQLite.
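On the SQLite side, the rowid-keyed ordering can be sketched with a plain table standing in for the vec0 virtual table (toy schema; the real embeddings table requires the sqlite-vec extension):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE search_vector_chunks (id INTEGER PRIMARY KEY, project_id INTEGER);
    -- stand-in for vec0: keyed by rowid, no FK, no cascade
    CREATE TABLE search_vector_embeddings (vec BLOB);
    INSERT INTO search_vector_chunks VALUES (1, 2), (2, 2), (3, 7);
    INSERT INTO search_vector_embeddings (rowid, vec) VALUES (1, x'00'), (2, x'00'), (3, x'00');
""")
# Embeddings first, while the chunk rows that key them still exist...
conn.execute(
    "DELETE FROM search_vector_embeddings WHERE rowid IN ("
    "SELECT id FROM search_vector_chunks WHERE project_id = ?)", (2,))
# ...then the chunks themselves
conn.execute("DELETE FROM search_vector_chunks WHERE project_id = ?", (2,))
print(conn.execute("SELECT rowid FROM search_vector_embeddings").fetchall())  # [(3,)]
```

Reversing the order would delete the chunks before the subquery could resolve their ids, stranding the embeddings.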

Verified locally against both backends (SQLite: 31 passed, Postgres
via testcontainers: 26 passed) for tests/services/test_project_removal_bug.py
and the CLI/MCP project-management integration suites.

Signed-off-by: phernandez <paul@basicmachines.co>
@github-actions (Contributor)

github-actions Bot commented May 16, 2026

Claude finished @phernandez's task in 1m 56s


PR Review: fix(core): purge SQLite search_index on project delete

  • Gather context and read changed files
  • Review code quality & standards
  • Review testing & documentation
  • Review Basic Memory architecture
  • Review security & performance
  • Review compatibility
  • Post final summary

Summary

Commit d3619f9 ("dialect-gate sqlite-vec embeddings purge") resolves the Codex P1 regression — the rowid-based embedding delete is now correctly gated to SQLite only, so Postgres project deletion no longer fails with a SQL error. Two items flagged across four consecutive review cycles remain open.


Code Quality & Standards

Item Status
Follows Basic Memory coding conventions (CLAUDE.md) ✅
Python 3.12+ type annotations and async patterns ✅
SQLAlchemy 2.0 best practices (run_sync for sync inspect in async) ✅
100-character line length maintained ✅
No commented-out code blocks ✅
Literate-programming comments (Trigger / Why / Outcome) ✅

Still missing: logger.debug calls in ProjectRepository.delete (fourth review)

src/basic_memory/repository/project_repository.py has no from loguru import logger import and no debug log calls. The base Repository.delete (repository.py:337,346) emits two lines:

# repository.py:337
logger.debug(f"Deleting {self.Model.__name__}: {entity_id}")
# repository.py:346
logger.debug(f"Deleted {self.Model.__name__}: {entity_id}")

The override emits nothing. Add from loguru import logger to the imports and two matching calls.



Testing & Documentation

Item Status
Unit tests for new functions/methods ✅
Multi-project search tests (5 cases, cloud + local routing) ✅
Vector embeddings cleanup test (test_remove_project_purges_vector_embeddings) ✅
Happy-path search row purge test ✅
NoResultFound branch covered ❌ See below
100% coverage maintained ⚠️ Blocked by above

NoResultFound branch still uncovered (fourth review)

ProjectRepository.delete returns False at line 153 when the project id doesn't exist. There is still no test for this path. The base class has its own test for this pattern, but since the override calls scalars().one() directly (bypassing super().delete()), that coverage does not transfer.

# project_repository.py:152-154
except NoResultFound:
    return False  # ← no test in any test file



Basic Memory Architecture

Item Status
Alembic migration included ✅
Local-first architecture preserved ✅
Atomicity: FTS purge + embeddings purge + ORM delete in one session ✅
Idempotent on Postgres (FK cascade still runs after explicit DELETE) ✅
Codex P1 resolved (rowid delete gated to SQLite via is_sqlite flag) ✅ Commit d3619f9
Codex P2: orphan embeddings not swept in migration ⚠️ Minor — see below

d3619f9 deletion ordering is correct: SQLite path deletes embeddings by chunk rowid while chunk rows still exist (project_repository.py:177-185), then deletes chunks (project_repository.py:186-189). The vec0 table has no cascade, so this ordering is required and correctly implemented.

Minor: migration doesn't sweep orphaned search_vector_embeddings

n7i8j9k0l1m2 sweeps search_index and search_vector_chunks orphans but leaves any corresponding search_vector_embeddings rows behind. Orphan embeddings whose chunk rows are deleted won't produce wrong results (the join in _run_vector_query filters them out), but they do consume top-k candidate slots on installs with semantic search enabled, reducing effective recall over time. This is not a correctness regression — it's a cleanliness gap in the one-time upgrade sweep. Acceptable to defer to a follow-up, but worth noting.


Security & Performance

Item Status
No hardcoded secrets or credentials ✅
All SQL uses parameterized bindings (:project_id, :pid, etc.) ✅
Error handling in multi-project fan-out (per-project try/except + logger.warning) ✅
No sensitive data in logs ✅

Compatibility

Item Status
No emoji/unicode in console or log output ✅
Windows path compatibility ✅ N/A for this change

Verdict

Two items remain open across four review cycles — both affect 100% coverage compliance:

  1. Add from loguru import logger and two logger.debug calls to ProjectRepository.delete to match the base class (repository.py:337,346).
  2. Add a test for ProjectRepository.delete returning False when called with a nonexistent project id (covers the NoResultFound branch at project_repository.py:153).

The Codex P1 dialect-gate regression is fully resolved by d3619f9. The multi-project search feature and the vector embeddings purge are both correct and well-tested. Everything else is clean.


@chatgpt-codex-connector (Bot) left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d3619f97f9


Comment on lines +69 to +80
if _table_exists(connection, "search_vector_chunks"):
    op.execute(
        """
        DELETE FROM search_vector_chunks
        WHERE project_id NOT IN (SELECT id FROM project)
        """
    )
    op.execute(
        """
        DELETE FROM search_vector_chunks
        WHERE entity_id NOT IN (SELECT id FROM entity)
        """

P2: Purge sqlite-vec embeddings during orphan migration

The migration deletes orphaned rows from search_vector_chunks but never removes matching rows from search_vector_embeddings. On SQLite semantic-search installs, vec0 candidates are read from search_vector_embeddings first, so stale embeddings for deleted chunks can still consume top-k slots and reduce semantic recall after upgrade. Add an embeddings cleanup step keyed by the chunk ids being removed in this migration.


On Windows SQLite CI the embeddings cleanup hits

  sqlite3.OperationalError: no such module: vec0
  [SQL: DELETE FROM search_vector_embeddings WHERE rowid IN (...)]

because the sqlite-vec extension isn't loaded into the connection
(some Windows Python builds don't expose enable_load_extension on
sqlite3.Connection — see #711). The vec0 virtual table is registered
in sqlite_master, so the table-existence check passes, but any access
to it fails until the module is loaded.

If vec0 isn't loadable, semantic search was never able to write
embeddings, so there's nothing to clean up. Wrap the embeddings DELETE
in a try/except that swallows OperationalError for vec0 and logs a
debug line — the chunk DELETE below still runs.

The KeyError from test_mcp_sse_forces_local on 3.14 SQLite is an
unrelated transient — same code passes on 3.12/3.13 SQLite, Postgres
3.14, and locally on Python 3.14. Will let CI re-run after this push.

Signed-off-by: phernandez <paul@basicmachines.co>
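The try/except described in the commit message above can be sketched like this. It is a simplified, synchronous illustration with an invented function name; the real code lives in ProjectRepository.delete and catches SQLAlchemy's OperationalError rather than sqlite3's.

```python
import sqlite3


def delete_embeddings_tolerating_missing_vec0(
    conn: sqlite3.Connection, chunk_rowids: list[int]
) -> None:
    """Delete embeddings for the given chunk rowids, tolerating the
    'no such module: vec0' failure: if the extension was never loadable,
    no embeddings were ever written, so there is nothing to purge."""
    if not chunk_rowids:
        return
    placeholders = ", ".join("?" for _ in chunk_rowids)
    try:
        conn.execute(
            f"DELETE FROM search_vector_embeddings WHERE rowid IN ({placeholders})",
            chunk_rowids,
        )
    except sqlite3.OperationalError as exc:
        if "vec0" not in str(exc):
            raise  # a genuine SQL error, not the missing-extension case
```

Note that, as the later Codex P2 comment points out, treating every vec0 error as "nothing to purge" is too broad when the extension is loaded per connection; the follow-up commit replaces this with an explicit load attempt.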
@github-actions

github-actions Bot commented May 16, 2026

Claude finished @phernandez's task in 1m 37s


PR Review: fix(core): purge SQLite search_index on project delete

  • Gather context and read changed files
  • Review code quality & standards
  • Review testing & documentation
  • Review Basic Memory architecture
  • Review security & performance
  • Review compatibility
  • Post final summary

Summary

Commit 43c50ea ("tolerate missing vec0 module during project delete") adds OperationalError handling in ProjectRepository.delete so that environments where the vec0 SQLite extension isn't loaded don't fail project deletion — the purge is skipped with a logger.debug call and execution continues normally. That's the correct approach and the comment block explains the constraint well.

Two items flagged across five consecutive review cycles remain open.


Code Quality & Standards

  • ✅ Follows Basic Memory coding conventions (CLAUDE.md)
  • ✅ Python 3.12+ type annotations and async patterns
  • ✅ SQLAlchemy 2.0 best practices (run_sync for sync inspect in async)
  • ✅ 100-character line length maintained
  • ✅ No commented-out code blocks
  • ✅ Literate-programming comments (Trigger / Why / Outcome)

Still missing: logger.debug calls in ProjectRepository.delete (fifth review)

from loguru import logger is now imported (project_repository.py:7), which resolves the import gap flagged last cycle. However, the delete method still emits no debug logs at its entry and exit points. The base Repository.delete (repository.py:337,346) wraps every deletion with:

# repository.py:337
logger.debug(f"Deleting {self.Model.__name__}: {entity_id}")
# ... delete logic ...
logger.debug(f"Deleted {self.Model.__name__}: {entity_id}")

The override has a logger.debug call for the vec0 skip case (line 200) but nothing at the broader method boundary. Add two calls to match the base class:

async def delete(self, entity_id: int) -> bool:
    logger.debug(f"Deleting Project and search rows for project_id: {entity_id}")
    async with db.scoped_session(self.session_maker) as session:
        ...
        await session.delete(project)
        logger.debug(f"Deleted Project and search rows for project_id: {entity_id}")
        return True



Testing & Documentation

  • ✅ Unit tests for new functions/methods
  • ✅ Multi-project search tests (5 cases, cloud + local routing)
  • ✅ Vector embeddings cleanup test (test_remove_project_purges_vector_embeddings)
  • ✅ vec0 missing extension path tested (commit 43c50ea)
  • ❌ NoResultFound branch covered (see below)
  • ⚠️ 100% coverage maintained (blocked by above)

NoResultFound branch still uncovered (fifth review)

ProjectRepository.delete returns False at line 155 when the project id doesn't exist. test_project_removal_bug.py now has three tests — the happy path, vector embeddings cleanup, and FTS row purge — but none hit the except NoResultFound path:

# project_repository.py:154-155
except NoResultFound:
    return False  # ← no test across all five review cycles

Adding one focused async test would close this:

@pytest.mark.asyncio
async def test_remove_project_delete_returns_false_for_missing_id(project_service):
    result = await project_service.repository.delete(9_999_999)
    assert result is False



Basic Memory Architecture

  • ✅ Alembic migration included
  • ✅ Local-first architecture preserved
  • ✅ Atomicity: FTS purge + embeddings purge + ORM delete in one session
  • ✅ Idempotent on Postgres (FK cascade still runs after explicit DELETE)
  • ✅ Codex P1 resolved (table-existence guard via sa_inspect)
  • ✅ Codex P2 resolved (search_vector_embeddings swept before chunks)
  • ✅ Codex P1 dialect-gate regression resolved (rowid delete gated to SQLite)
  • ✅ vec0 missing extension tolerance resolved (commit 43c50ea)

The OperationalError guard at lines 188-203 is correct: checking "vec0" not in str(exc) re-raises genuine SQL errors while swallowing only the extension-not-loaded case. The comment block follows the literate-programming style required by CLAUDE.md.


Security & Performance

  • ✅ No hardcoded secrets or credentials
  • ✅ All SQL uses parameterized bindings (:project_id, etc.)
  • ✅ Proper error handling in multi-project fan-out (per-project try/except + logger.warning)
  • ✅ No sensitive data in logs

Compatibility

  • ✅ No emoji/unicode in console or log output
  • ✅ Windows path compatibility (N/A for this change)

Verdict

Two items remain open across five review cycles — both affect 100% coverage compliance:

  1. Add logger.debug calls at the start and end of ProjectRepository.delete to match the base class pattern (repository.py:337,346). The loguru import is already in place.
  2. Add a one-line async test for ProjectRepository.delete returning False when called with a nonexistent project id (covers the NoResultFound branch at project_repository.py:155).

All Codex P1/P2 issues, the multi-project search feature, and the new vec0 tolerance are correct and well-implemented. Everything else is clean.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 43c50ea683


Comment on lines +188 to +199
except OperationalError as exc:
    # Trigger: the vec0 SQLite extension isn't loaded into this
    # connection — common on Windows builds where
    # enable_load_extension is unavailable (#711).
    # Why: the embeddings table is registered as a vec0 virtual
    # table, so any access (even a DELETE) needs the module
    # loaded. If it isn't, no row was ever inserted either, so
    # there's nothing to leak.
    # Outcome: log and continue — the chunk DELETE below still
    # runs, and the absence of vec0 means no orphan vectors.
    if "vec0" not in str(exc):
        raise


P2 Badge Ensure sqlite-vec is loaded before skipping embedding purge

ProjectRepository.delete now swallows OperationalError when deleting from search_vector_embeddings fails with vec0, but sqlite_search_repository._ensure_sqlite_vec_loaded explicitly documents that sqlite-vec is loaded per connection (not globally), so this error can happen on a different pooled connection even when embeddings already exist. In that case project deletion leaves orphan vectors behind, and later semantic searches on a vec-enabled connection can still consume those stale candidates. Please load sqlite-vec for this session (or fail the delete) instead of treating every vec0 error as “nothing to purge.”


Three follow-ups from PR review:

**Codex P2 — Load sqlite-vec on the delete session.** The previous
revision swallowed every `vec0` OperationalError as "no embeddings
exist", but sqlite-vec is loaded **per connection**: a pooled
connection that hosts ProjectRepository.delete may not have vec0
loaded even when another connection successfully wrote embeddings.
That would silently leave orphan vectors behind.

The new `_load_sqlite_vec_on_session` helper mirrors
SQLiteSearchRepository._ensure_sqlite_vec_loaded as a free function
and tries to load the extension on the current session. Only when the
load itself fails — because the Python build lacks
enable_load_extension, or the sqlite_vec package isn't installed — do
we skip the embeddings DELETE. Every connection in the pool shares the
same interpreter, so in that case no embeddings could have been
written from any connection and skipping is safe.

**Claude review — Missing logger.debug calls.** The override now logs
at entry, when the project id isn't found, and after the ORM delete,
matching the base Repository.delete contract.

**Claude review — NoResultFound branch uncovered.** New test
`test_delete_returns_false_for_missing_project_id` asserts the False
return for a nonexistent project id.

Verified locally: 32 passed / 1 skipped (SQLite), 27 passed / 1
skipped (Postgres via testcontainers).

Signed-off-by: phernandez <paul@basicmachines.co>
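A free-standing sketch of what such a load helper can look like, assuming the optional sqlite_vec Python package and a raw sqlite3 connection (the real helper works on the async session's underlying connection, and the name here is illustrative):

```python
import sqlite3


def load_sqlite_vec(conn: sqlite3.Connection) -> bool:
    """Try to load the sqlite-vec extension on *this* connection.
    Returns False when the package is absent, the Python build lacks
    enable_load_extension (#711), or the load itself fails; in all of
    those cases no embeddings could have been written from any
    connection in this interpreter, so skipping the purge is safe."""
    try:
        import sqlite_vec  # optional dependency; may be absent
    except ImportError:
        return False
    if not hasattr(conn, "enable_load_extension"):
        return False  # some Windows sqlite3 builds omit this method
    try:
        conn.enable_load_extension(True)
        sqlite_vec.load(conn)
        conn.enable_load_extension(False)
        return True
    except sqlite3.OperationalError:
        return False
```

The boolean return lets the caller decide between running the embeddings DELETE and logging a debug line, instead of inferring "nothing to purge" from an error message.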
@github-actions

github-actions Bot commented May 16, 2026

Claude finished @phernandez's task in 1m 36s


PR Review: fix(core): purge SQLite search_index on project delete

  • Gather context and read changed files
  • Review code quality & standards
  • Review testing & documentation
  • Review Basic Memory architecture
  • Review security & performance
  • Review compatibility
  • Post final summary

Summary

Commit e8c11b8 ("load sqlite-vec before embeddings purge; add coverage") closes all items from the previous five review cycles. Both of the long-standing open issues are resolved, and the Codex P2 feedback about per-connection extension loading is addressed properly.


Code Quality & Standards

  • ✅ Follows Basic Memory coding conventions (CLAUDE.md)
  • ✅ Python 3.12+ type annotations and async patterns
  • ✅ SQLAlchemy 2.0 best practices (run_sync for sync inspect in async)
  • ✅ 100-character line length maintained
  • ✅ No commented-out code blocks
  • ✅ Literate-programming comments (Trigger / Why / Outcome)

logger.debug calls resolved (sixth cycle — closed)

project_repository.py:207 and project_repository.py:259 now emit matching debug lines at method entry and exit, consistent with the base Repository.delete pattern. The NoResultFound path also logs at line 215 before returning False.

_load_sqlite_vec_on_session helper is well-designed

The new helper (lines 17–73) explicitly loads vec0 on the same session's connection before the embeddings purge — addressing the Codex P2 concern that a pooled connection different from the one that wrote embeddings might not have the extension loaded. The docstring explains the per-connection constraint and mirrors the rationale in SQLiteSearchRepository._ensure_sqlite_vec_loaded. The three failure modes (package absent, enable_load_extension unavailable, load failure) each log at the appropriate level and return False so the purge step is safely skipped rather than failing the whole delete.


Testing & Documentation

  • ✅ Unit tests for new functions/methods
  • ✅ Multi-project search tests (5 cases, cloud + local routing)
  • ✅ Vector embeddings cleanup test
  • ✅ NoResultFound branch covered (resolved)
  • ✅ 100% coverage maintained

test_delete_returns_false_for_missing_project_id (lines 258–266) — closed

await project_service.repository.delete(9_999_999) + assert result is False covers the except NoResultFound: return False branch at project_repository.py:215. All five tests now pass.


Basic Memory Architecture

  • ✅ Alembic migration included
  • ✅ Local-first architecture preserved
  • ✅ Atomicity: FTS purge + embeddings purge + ORM delete in one session
  • ✅ Idempotent on Postgres (FK cascade still runs after explicit DELETE)
  • ✅ Codex P1: table-existence guard via sa_inspect
  • ✅ Codex P2: search_vector_embeddings swept before chunks
  • ✅ Codex P1 dialect-gate regression: rowid delete gated to SQLite
  • ✅ vec0 missing extension tolerance: OperationalError swallowed correctly
  • ✅ Codex P2: vec0 loaded on this session's connection before purge (commit e8c11b8)

Ordering is still correct: _load_sqlite_vec_on_session → embeddings deleted by chunk rowid → chunks deleted → ORM delete on session flush. The vec0 table has no cascade, so this explicit ordering is required and correctly implemented.
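That ordering can be condensed into a sketch. Plain tables stand in for the FTS5 and vec0 virtual tables, the function name is invented, and the real path runs inside one async SQLAlchemy session rather than on a raw connection:

```python
import sqlite3


def delete_project_rows(
    conn: sqlite3.Connection, project_id: int, vec_loaded: bool
) -> None:
    """Sketch of the delete ordering: embeddings (keyed by chunk rowid)
    first, then chunks, then the FTS rows, then the project row itself."""
    chunk_ids = [
        row[0]
        for row in conn.execute(
            "SELECT rowid FROM search_vector_chunks WHERE project_id = ?",
            (project_id,),
        )
    ]
    if vec_loaded and chunk_ids:
        placeholders = ", ".join("?" for _ in chunk_ids)
        conn.execute(
            f"DELETE FROM search_vector_embeddings WHERE rowid IN ({placeholders})",
            chunk_ids,
        )
    conn.execute("DELETE FROM search_vector_chunks WHERE project_id = ?", (project_id,))
    conn.execute("DELETE FROM search_index WHERE project_id = ?", (project_id,))
    conn.execute("DELETE FROM project WHERE id = ?", (project_id,))
```

Collecting the chunk rowids before any DELETE is the load-bearing step: once the chunks are gone, the embeddings they key can no longer be found.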

Remaining minor gap: migration doesn't sweep orphaned search_vector_embeddings

n7i8j9k0l1m2 sweeps search_index and search_vector_chunks orphans but leaves corresponding search_vector_embeddings rows behind. As noted in previous reviews, orphan embeddings don't produce wrong results (the join in _run_vector_query filters them out), but they consume top-k candidate slots on semantic-search installs. This is acceptable to defer — the upgrade sweep is a best-effort cleanup, not a correctness requirement, and the per-project delete path now handles this correctly going forward.


Security & Performance

  • ✅ No hardcoded secrets or credentials
  • ✅ All SQL uses parameterized bindings (:project_id, etc.)
  • ✅ Error handling in multi-project fan-out (per-project try/except + logger.warning)
  • ✅ No sensitive data in logs

Compatibility

  • ✅ No emoji/unicode in console or log output
  • ✅ Windows path compatibility (N/A for this change)

Verdict

All issues across six review cycles are resolved. The PR is correct, complete, and ready to merge.

  • Root-cause fix: SQLite FTS5 orphan rows are now purged on project delete (idempotent on Postgres).
  • _load_sqlite_vec_on_session correctly handles the per-connection vec0 loading constraint.
  • All four derived table cleanups are in place: search_index, search_vector_embeddings, search_vector_chunks, ORM delete.
  • Both previously open items (logger.debug calls and NoResultFound coverage) are closed.
  • Multi-project search_notes fan-out is well-implemented and thoroughly tested.

The only deferred item — orphaned search_vector_embeddings in the one-time migration — is a minor cleanliness gap, not a correctness regression.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e8c11b843d


Comment on lines +64 to +65
WHERE entity_id IS NOT NULL
  AND entity_id NOT IN (SELECT id FROM entity)


P2 Badge Purge reused-ID stale rows during orphan migration

Tighten the orphan predicates so they verify row ownership, not just ID existence. The current cleanup deletes rows only when project_id or entity_id is missing, so stale search_index/search_vector_chunks rows survive whenever both IDs have already been reused by newer records (a case that SQLite rowid reuse allows and that this migration is meant to address). In that scenario the leaked rows still satisfy both NOT IN checks and remain searchable under the new project, so the migration can leave cross-project contamination behind after upgrade.

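A tightened predicate along the lines Codex suggests could look like the following. This sketch assumes entity carries a project_id column (an assumption not confirmed in this thread) and verifies ownership with a correlated EXISTS instead of bare ID existence; the function name is illustrative.

```python
import sqlite3


def sweep_stale_search_rows(conn: sqlite3.Connection) -> None:
    """Delete search_index rows whose (entity_id, project_id) pair no
    longer matches a live entity owned by that project. This also
    catches rows where BOTH ids were reused by newer records, which
    plain NOT IN existence checks would miss."""
    conn.execute(
        """
        DELETE FROM search_index
        WHERE entity_id IS NOT NULL
          AND NOT EXISTS (
              SELECT 1 FROM entity e
              WHERE e.id = search_index.entity_id
                AND e.project_id = search_index.project_id
          )
        """
    )
```

A row left behind by a deleted project whose id was reused passes both "id exists" checks, yet fails the ownership check because the resurrected entity id belongs to a different project.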

@phernandez phernandez merged commit 9e3fe26 into main May 16, 2026
27 checks passed
@phernandez phernandez deleted the fix/sqlite-search-index-orphan-cleanup branch May 16, 2026 21:47
phernandez added a commit that referenced this pull request May 16, 2026
Promotes the Unreleased breaking-change note and adds the full v0.21.0
section covering ~80 commits since v0.20.3: workspace-routing fixes
across MCP/CLI/API, recent_activity ordering and search opt-in changes,
sync hardening, sqlite-vec graceful degrade, perf wins on CLI startup
and sync, and the project-delete cleanup landed in #832.

Signed-off-by: phernandez <paul@basicmachines.co>