Add staged background ingest promotion #34

Merged
brianmeyer merged 1 commit into master from codex/rec-78-background-ingest-consistency on May 17, 2026

@brianmeyer
Owner

Summary

  • stage document, video, audio, and conversation reindexes as hidden batches before promotion
  • filter hidden rows from search, memory listing/get, graph navigation, and visible counts
  • update docs and add regression coverage for hidden batches and conversation child replacement
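
The staging flow in the summary can be sketched as follows. This is a minimal illustration, not the code in this PR: it uses the standard-library `sqlite3` module as a stand-in store, and the `chunks` table, the `active` column, and the helper names `make_store`, `stage_batch`, `promote_batch`, and `visible_bodies` are all hypothetical.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    """Create an in-memory store. The schema is hypothetical, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE chunks (doc_id TEXT, batch_id TEXT, active INTEGER, body TEXT)"
    )
    return conn


def stage_batch(conn: sqlite3.Connection, doc_id: str, batch_id: str, bodies: list) -> None:
    """Write the reindexed rows as hidden (active = 0) so readers never see a partial batch."""
    with conn:
        conn.executemany(
            "INSERT INTO chunks VALUES (?, ?, 0, ?)",
            [(doc_id, batch_id, body) for body in bodies],
        )


def promote_batch(conn: sqlite3.Connection, doc_id: str, batch_id: str) -> None:
    """Swap visibility inside one transaction: drop the old rows, reveal the staged ones."""
    with conn:  # commits on success, rolls back if either statement raises
        conn.execute(
            "DELETE FROM chunks WHERE doc_id = ? AND batch_id != ? AND active = 1",
            (doc_id, batch_id),
        )
        conn.execute(
            "UPDATE chunks SET active = 1 WHERE doc_id = ? AND batch_id = ?",
            (doc_id, batch_id),
        )


def visible_bodies(conn: sqlite3.Connection, doc_id: str) -> list:
    """Read path: every query filters on active = 1, so hidden batches stay invisible."""
    rows = conn.execute(
        "SELECT body FROM chunks WHERE doc_id = ? AND active = 1 ORDER BY rowid",
        (doc_id,),
    )
    return [row[0] for row in rows]
```

Until `promote_batch` runs, `visible_bodies` keeps returning the previous version of the document, which is the property the hidden-batch filtering is meant to provide.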

Validation

  • python3 -m compileall -q src tests
  • python3 -m pytest -q
  • bash tests/uat/test_mcp_server.sh
  • .venv/bin/python -m pip wheel . -w /tmp/recallforge-wheel-rec78
  • PYTHONPATH=/tmp/recallforge-twine .venv/bin/python -m twine check /tmp/recallforge-wheel-rec78/recallforge-0.3.0-py3-none-any.whl

@brianmeyer brianmeyer merged commit 13334eb into master May 17, 2026
4 checks passed
@brianmeyer brianmeyer deleted the codex/rec-78-background-ingest-consistency branch May 17, 2026 19:57

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ce589e1e65

Comment on lines +558 to +560
except Exception:
    self._backend.delete_index_batch(batch_id)
    raise

P1: Make batch-failure cleanup safe after partial promotion

This handler always calls delete_index_batch when promote_index_batch raises, but promotion itself updates visibility across multiple tables in sequence (activate new rows, then deactivate old rows). If an error happens mid-promotion (for example after old embeddings are deactivated but before all tables finish), this cleanup can delete the newly activated batch and leave part of the old memory already hidden, producing an inconsistent or unreadable memory state. The failure path should either be transactional/rollback-safe or avoid deleting staged rows once promotion has started mutating active rows.
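
One possible shape for a rollback-safe failure path is sketched below. It is an illustration under stated assumptions, not the repository's backend: `FakeBackend` is a hypothetical in-memory stand-in, and `snapshot`, `restore`, and `promote_safely` are invented for the example. Only the names `promote_index_batch` and `delete_index_batch` come from the PR.

```python
class PromotionError(RuntimeError):
    """Raised by the fake backend to simulate a failure in the middle of promotion."""


class FakeBackend:
    """Hypothetical stand-in. rows maps row_id -> (batch_id, active)."""

    def __init__(self, rows, fail_mid_promotion=False):
        self.rows = dict(rows)
        self.fail_mid_promotion = fail_mid_promotion

    def snapshot(self):
        # Assumption: the backend can capture visibility state before promotion.
        return dict(self.rows)

    def restore(self, snap):
        self.rows = dict(snap)

    def promote_index_batch(self, batch_id):
        # Step 1: activate the staged rows.
        for row_id, (batch, _active) in list(self.rows.items()):
            if batch == batch_id:
                self.rows[row_id] = (batch, True)
        if self.fail_mid_promotion:
            raise PromotionError("failed between activate and deactivate")
        # Step 2: deactivate rows from older batches.
        for row_id, (batch, _active) in list(self.rows.items()):
            if batch != batch_id:
                self.rows[row_id] = (batch, False)

    def delete_index_batch(self, batch_id):
        self.rows = {rid: v for rid, v in self.rows.items() if v[0] != batch_id}


def promote_safely(backend, batch_id):
    """Undo partial visibility changes before deleting the staged batch."""
    snap = backend.snapshot()
    try:
        backend.promote_index_batch(batch_id)
    except Exception:
        backend.restore(snap)                 # old rows are visible again
        backend.delete_index_batch(batch_id)  # now safe to drop the staged rows
        raise
```

The ordering in the handler is the point: restoring first means the delete can only remove rows that are hidden again, so the previous version of the memory stays readable after a failed promotion.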

Comment on lines +2747 to +2751
self._embeddings_table.search()
    .where(active_row_filter())
    .select(["hash_seq"])
    .limit(10_000_000)
    .to_list()

P2: Avoid capped in-memory scans for count APIs

This count implementation pulls row IDs into Python and calls len(...) on the result, with a hard .limit(10_000_000). It underreports any store larger than 10M rows and can cause large memory and latency spikes before falling into the broad except path. That turns counting into an O(N) materialization step and can return an incorrect 0 on heavy datasets. Counting should stay backend-side (a filtered count or aggregate) without a hard truncation cap.
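
The difference between the two approaches can be sketched with the standard-library `sqlite3` module as a stand-in. The `embeddings` schema, the `active` column, and the helper names here are hypothetical. If the real table is a LanceDB table, its `count_rows(filter=...)` method may be the natural equivalent, but that is an assumption about the backend rather than something this PR confirms.

```python
import sqlite3


def make_demo_store(active: int, hidden: int) -> sqlite3.Connection:
    """Build an in-memory table with the given number of visible and hidden rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE embeddings (hash_seq TEXT, active INTEGER)")
    rows = [(f"a{i}", 1) for i in range(active)] + [(f"h{i}", 0) for i in range(hidden)]
    with conn:
        conn.executemany("INSERT INTO embeddings VALUES (?, ?)", rows)
    return conn


def count_active_capped(conn: sqlite3.Connection, cap: int) -> int:
    """The flagged pattern: materialize up to `cap` rows in Python, then len()."""
    rows = conn.execute(
        "SELECT hash_seq FROM embeddings WHERE active = 1 LIMIT ?", (cap,)
    ).fetchall()
    return len(rows)  # silently wrong once the store holds more than `cap` rows


def count_active(conn: sqlite3.Connection) -> int:
    """Backend-side count: the engine aggregates, and only one integer reaches Python."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM embeddings WHERE active = 1"
    ).fetchone()
    return n
```

With five visible rows and a cap of three, `count_active_capped` reports 3 while `count_active` reports 5, which is the underreporting described above at a much smaller scale.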
