feat: auto-reindex search/tags/qa after ingest #10
Merged
run_ingest now refreshes the FTS5 search index, tag index, and Q&A
index for every project that gained new messages, so users no longer
have to POST to /api/{search,tags,qa}/reindex after new sessions land.
Per-project re-index, not full reindex_all: only slugs that gained
messages are re-indexed. Each service is invoked in its own try/except so a beta
service failure (tags/qa) cannot break ingest, and search itself
fails soft. Gated by a new auto_reindex_on_ingest setting (default
True, env AUTO_REINDEX_ON_INGEST) for opt-out.
The hook lives in run_ingest itself (single source of truth) so all
four call sites — server.py startup, cli.py reindex, two routes/data.py
refresh handlers — pick it up automatically.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
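The fail-soft, per-project behaviour described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the merged code: the helper name auto_reindex_touched appears in the PR description, but the signature, the `services` mapping, and the returned failure report are assumptions made for the example.

```python
import logging
from typing import Callable, Dict, Iterable, List

log = logging.getLogger("ingest.auto_reindex")


def auto_reindex_touched(
    touched_slugs: Iterable[str],
    services: Dict[str, Callable[[str], None]],  # hypothetical: name -> per-project indexer
    enabled: bool = True,  # stands in for the auto_reindex_on_ingest setting
) -> Dict[str, List[str]]:
    """Re-index only the touched projects and report failures per service."""
    failures: Dict[str, List[str]] = {name: [] for name in services}
    if not enabled:
        return failures  # opt-out: do nothing
    for slug in touched_slugs:
        for name, index_project in services.items():
            try:
                index_project(slug)
            except Exception:
                # Fail soft: log and continue, so one service cannot break ingest.
                log.exception("auto-reindex: %s failed for %s", name, slug)
                failures[name].append(slug)
    return failures
```

Wrapping each service call separately, rather than the whole loop, is what keeps a failing tag or Q&A indexer from preventing the search index refresh for the same project.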
Pyright flagged the generator-yielding fixture as returning Connection when it actually yields one. Use Iterator[sqlite3.Connection] to match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
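As an illustration of the annotation fix, a generator that yields a connection is typed by what it yields, not by the yielded value itself. The sketch below is an assumption about the fixture's shape (the in-memory database and the name `conn` are illustrative); in a test suite the function would additionally carry the `@pytest.fixture` decorator, which is omitted here so the snippet runs without pytest.

```python
import sqlite3
from typing import Iterator


def conn() -> Iterator[sqlite3.Connection]:
    """Yield a connection and close it on teardown.

    Annotating the return type as sqlite3.Connection would be wrong:
    calling this function returns a generator, which yields the connection.
    """
    connection = sqlite3.connect(":memory:")
    try:
        yield connection
    finally:
        connection.close()
```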
0bserver07 added a commit that referenced this pull request on May 14, 2026
0bserver07 added a commit that referenced this pull request on May 20, 2026
* feat(ingest): auto-reindex search/tags/qa after ingest
* test(ingest): annotate conn fixture as Iterator[Connection]
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
0bserver07 added a commit that referenced this pull request on May 20, 2026
…use-embeddings)

Closes HANDOFF follow-up #10. Adds an opt-in semantic-search mode to search-past-decisions (and its MCP counterpart) via local sentence-transformers embeddings. The substring filter still runs first; --use-embeddings only re-ranks the candidate set, never widens it. sentence-transformers lands as an optional `stackunderflow[embeddings]` extra, so a user who never flags semantic mode pays zero (no torch import, no model load).

- New optional extra in pyproject.toml: `embeddings = ["sentence-transformers>=2.7,<3"]`.
- Default model `sentence-transformers/all-MiniLM-L6-v2` (90 MB, 384-dim); override via STACKUNDERFLOW_EMBED_MODEL or --embed-model. Process-wide cache so a single agent invocation only loads the model once.
- New `stackunderflow/services/discovery_embeddings.py`: pull-through cache (compute_or_load), query embed, cosine scoring. Lazy imports throughout; MissingEmbeddingsDependencyError surfaces a clean install hint when the extra isn't installed.
- Re-ranking math: cosine (dot product over unit-normalised vectors) mapped to [0, 1]. Replaces the LIKE-density relevance term in pack_within_budget; recency + cost weights unchanged.
- Migration v014: additive `discovery_embeddings` table keyed on (session_id, message_id, model_name), IF NOT EXISTS-guarded, with indexes on session_id and (message_id, model_name). schema.CURRENT_VERSION bumps to 14.
- CLI: --use-embeddings + --embed-model flags on search-past-decisions. JSON output gains embedding_score: float in [0, 1]; text appends cos=X.XX to the headline. Substring-mode JSON shape unchanged (embedding_score omitted when None).
- MCP: search_past_decisions gains use_embeddings: bool + embed_model: str | None args. Missing-dep error surfaces as a ValueError (with install hint) for clean JSON-RPC failure.
- 44 new tests: 21 in test_discovery_embeddings (cache hit/miss/mix, cosine math, dim mismatch, missing dep), 7 in test_search_past_decisions_embeddings (re-rank changes order, doesn't widen, surfaces score, propagates ImportError), 12 in test_migration_v014 (shape, PK, indexes, idempotency, partial-apply recovery), 4 in test_discovery_cli (JSON score, text cos=, missing-dep exit, no-load-when-off). Stubbed via _DeterministicStub / _OrderedStub classes, so the real 90 MB MiniLM model never loads during tests.
- Docs: cli-reference.md, mcp.md, skills.md updated. CHANGELOG entry under [Unreleased].

Tests: 2399 pass (baseline 2355 + 44). Ruff baseline preserved (37).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
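The re-ranking step described above (cosine as a dot product over unit-normalised vectors, mapped to [0, 1], applied only to the substring-filtered candidates) might look like the sketch below. The commit does not say how the cosine is mapped to [0, 1]; the linear map (cos + 1) / 2 used here is an assumption, as are the function names and the (id, vector) candidate shape. Plain Python lists are used so the example runs without any embedding library.

```python
import math
from typing import List, Sequence, Tuple


def unit(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


def cosine01(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity mapped to [0, 1] via (cos + 1) / 2 (assumed mapping)."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    dot = sum(x * y for x, y in zip(unit(a), unit(b)))
    dot = max(-1.0, min(1.0, dot))  # guard against floating-point drift
    return (dot + 1.0) / 2.0


def rerank(
    candidates: Sequence[Tuple[str, Sequence[float]]],  # hypothetical (id, embedding) pairs
    query_vec: Sequence[float],
) -> List[Tuple[str, float]]:
    """Re-order the existing candidates by score; never adds new ones."""
    scored = [(cid, cosine01(vec, query_vec)) for cid, vec in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because `rerank` only iterates over the candidates it is given, the result set can change order but cannot grow, which matches the "re-ranks, never widens" behaviour in the commit message.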
Problem
Search, tag, and Q&A indexes go stale every time new sessions land via run_ingest. Today users have to manually POST to /api/{search,tags,qa}/reindex after every ingest, which is surprising and easy to forget.

Change
- auto_reindex_touched() helper in stackunderflow/ingest/__init__.py runs after run_ingest finishes.
- Per-project re-index (index_project), not full reindex_all. Only touches projects whose pre→post message count actually grew.
- auto_reindex_on_ingest setting (default True, env AUTO_REINDEX_ON_INGEST) for opt-out.
- Handles UNIQUE(provider, slug) correctly: the same slug across Claude+Codex maps to one logical project.

Evidence
- tests/stackunderflow/ingest/test_auto_reindex.py (touched-project, no-op when no new messages, fail-soft, opt-out, real-SearchService end-to-end).
- pytest tests/ -q: 425 passed, 2 skipped (baseline 420).
- Ingested -Users-yadkonrad-dev-dev-year25-dec25-ClaudeCodePlugins (193 records) into an isolated $HOME. Search for "claude" returned 44 hits + 133 FTS rows without any manual reindex POST.

Notes
Not pushed previously (worktree-isolated agent run). Branch renamed from worktree-agent-*.
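The rule from the Change section (re-index only projects whose message count grew, treating the same slug across providers as one logical project) can be sketched as a comparison of two count snapshots. The function name `touched_slugs` and the snapshot shape, a mapping from (provider, slug) to message count, are assumptions for illustration and are not taken from the merged code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical snapshot shape: message counts keyed by (provider, slug).
Counts = Dict[Tuple[str, str], int]


def per_slug(counts: Counts) -> Dict[str, int]:
    """Collapse provider-level rows into one total per logical project slug."""
    totals: Dict[str, int] = defaultdict(int)
    for (_provider, slug), n in counts.items():
        totals[slug] += n
    return dict(totals)


def touched_slugs(before: Counts, after: Counts) -> List[str]:
    """Return slugs whose total message count grew between the snapshots."""
    pre, post = per_slug(before), per_slug(after)
    return sorted(slug for slug, n in post.items() if n > pre.get(slug, 0))


before = {("claude", "p"): 3, ("codex", "p"): 2, ("claude", "q"): 5}
after = {("claude", "p"): 3, ("codex", "p"): 4, ("claude", "q"): 5, ("codex", "r"): 1}
print(touched_slugs(before, after))  # ['p', 'r']
```

Here "p" is reported once even though only its Codex row grew, and "q" is skipped because its count is unchanged.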