feat: auto-reindex search/tags/qa after ingest by 0bserver07 · Pull Request #10 · 0bserver07/StackUnderflow

0bserver07 · 2026-04-29T14:56:00Z

Problem

Search, tag, and Q&A indexes go stale every time new sessions land via run_ingest. Today users have to manually POST to /api/{search,tags,qa}/reindex after every ingest — surprising and easy to forget.

Change

New auto_reindex_touched() helper in stackunderflow/ingest/__init__.py runs after run_ingest finishes.
Per-project reindex (via index_project), not full reindex_all. Only touches projects whose pre→post message count actually grew.
Each service (search/tag/qa) wrapped in its own try/except — a beta-service failure cannot break ingest or the other services.
New auto_reindex_on_ingest setting (default True, env AUTO_REINDEX_ON_INGEST) for opt-out.
Handles UNIQUE(provider, slug) correctly — same slug across Claude+Codex maps to one logical project.
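
The touched-project selection and per-service isolation described above can be sketched roughly as follows. This is a minimal illustration, not the code in stackunderflow/ingest/__init__.py: the function names `touched_slugs` and `reindex_touched`, the `Indexer` type, and the idea of passing message-count snapshots keyed by slug are all assumptions made for the example. Keying the snapshots by slug is one way to treat the same slug from two providers as a single logical project.

```python
import logging
from typing import Callable, Mapping

logger = logging.getLogger(__name__)

# Hypothetical type: a per-project reindex callable that takes a project slug.
Indexer = Callable[[str], None]


def touched_slugs(before: Mapping[str, int], after: Mapping[str, int]) -> list[str]:
    """Return the slugs whose message count grew between the two snapshots."""
    return sorted(slug for slug, count in after.items() if count > before.get(slug, 0))


def reindex_touched(
    before: Mapping[str, int],
    after: Mapping[str, int],
    services: Mapping[str, Indexer],
) -> dict[str, list[str]]:
    """Reindex every touched slug in every service and report failures per service."""
    failures: dict[str, list[str]] = {name: [] for name in services}
    for slug in touched_slugs(before, after):
        for name, index_project in services.items():
            # Each call is isolated so one failing service cannot stop the
            # others or propagate an exception back into the ingest run.
            try:
                index_project(slug)
            except Exception:
                logger.exception("reindex failed: service=%s slug=%s", name, slug)
                failures[name].append(slug)
    return failures
```

Returning the failures instead of raising keeps the ingest path fail-soft while still letting a caller log or surface which services fell behind.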

Evidence

5 new tests under tests/stackunderflow/ingest/test_auto_reindex.py (touched-project, no-op when no new messages, fail-soft, opt-out, real-SearchService end-to-end).
pytest tests/ -q: 425 passed, 2 skipped (baseline 420).
Smoke test: ingested -Users-yadkonrad-dev-dev-year25-dec25-ClaudeCodePlugins (193 records) into isolated $HOME. Search for "claude" returned 44 hits + 133 FTS rows without any manual reindex POST.

Notes

Not pushed previously (worktree-isolated agent run). Branch renamed from worktree-agent-*.

run_ingest now refreshes the FTS5 search index, tag index, and Q&A index for every project that gained new messages, so users no longer have to POST to /api/{search,tags,qa}/reindex after new sessions land. The reindex is per-project, not a full reindex_all: only slugs that gained messages are reindexed. Each service is invoked in its own try/except so a beta service failure (tags/qa) cannot break ingest, and search itself fails soft. Gated by a new auto_reindex_on_ingest setting (default True, env AUTO_REINDEX_ON_INGEST) for opt-out. The hook lives in run_ingest itself (single source of truth), so all four call sites (server.py startup, cli.py reindex, and the two routes/data.py refresh handlers) pick it up automatically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
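
As an illustration of the opt-out gate, the sketch below reads AUTO_REINDEX_ON_INGEST and defaults to enabled. The helper name `auto_reindex_enabled` and the set of strings treated as "off" are assumptions for this example; the PR does not say how the project's settings layer parses the variable.

```python
import os
from typing import Mapping, Optional

# Assumed spellings of "off"; the real settings layer may accept others.
_FALSY = {"0", "false", "no", "off"}


def auto_reindex_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True unless AUTO_REINDEX_ON_INGEST is set to a falsy value."""
    env = os.environ if environ is None else environ
    raw = env.get("AUTO_REINDEX_ON_INGEST")
    if raw is None or not raw.strip():
        return True  # default True, matching the documented default
    return raw.strip().lower() not in _FALSY
```

A caller would check this once at the end of the ingest run and skip the reindex step when it returns False.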

Pyright flagged the generator-yielding fixture as returning Connection when it actually yields one. Use Iterator[sqlite3.Connection] to match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
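
The annotation change can be illustrated with a small generator of the same shape. This is a sketch rather than the fixture from the test suite: the in-memory database and the body are assumptions, and the `@pytest.fixture` decorator is left as a comment so the snippet runs with the standard library alone.

```python
import sqlite3
from collections.abc import Iterator


# In a test module this function would carry the @pytest.fixture decorator.
def conn() -> Iterator[sqlite3.Connection]:
    """Yield an in-memory connection and close it once the consumer is done."""
    connection = sqlite3.connect(":memory:")
    try:
        yield connection
    finally:
        connection.close()
```

Because the function body contains `yield`, calling it produces a generator, so a return annotation of `sqlite3.Connection` would misdescribe it; `Iterator[sqlite3.Connection]` states what is yielded.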


* feat(ingest): auto-reindex search/tags/qa after ingest

  run_ingest now refreshes the FTS5 search index, tag index, and Q&A index for every project that gained new messages, so users no longer have to POST to /api/{search,tags,qa}/reindex after new sessions land. The reindex is per-project, not a full reindex_all: only slugs that gained messages are reindexed. Each service is invoked in its own try/except so a beta service failure (tags/qa) cannot break ingest, and search itself fails soft. Gated by a new auto_reindex_on_ingest setting (default True, env AUTO_REINDEX_ON_INGEST) for opt-out. The hook lives in run_ingest itself (single source of truth), so all four call sites (server.py startup, cli.py reindex, and the two routes/data.py refresh handlers) pick it up automatically.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(ingest): annotate conn fixture as Iterator[Connection]

  Pyright flagged the generator-yielding fixture as returning Connection when it actually yields one. Use Iterator[sqlite3.Connection] to match.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…use-embeddings)

Closes HANDOFF follow-up #10. Adds an opt-in semantic-search mode to search-past-decisions (and its MCP counterpart) via local sentence-transformers embeddings. The substring filter still runs first; --use-embeddings only re-ranks the candidate set, never widens it. sentence-transformers lands as an optional `stackunderflow[embeddings]` extra, so a user who never enables semantic mode pays nothing (no torch import, no model load).

- New optional extra in pyproject.toml: `embeddings = ["sentence-transformers>=2.7,<3"]`.
- Default model `sentence-transformers/all-MiniLM-L6-v2` (90 MB, 384-dim); override via STACKUNDERFLOW_EMBED_MODEL or --embed-model. Process-wide cache so a single agent invocation only loads the model once.
- New `stackunderflow/services/discovery_embeddings.py`: pull-through cache (compute_or_load), query embed, cosine scoring. Lazy imports throughout; MissingEmbeddingsDependencyError surfaces a clean install hint when the extra isn't installed.
- Re-ranking math: cosine (dot product over unit-normalised vectors) mapped to [0, 1]. Replaces the LIKE-density relevance term in pack_within_budget; recency + cost weights unchanged.
- Migration v014: additive `discovery_embeddings` table keyed on (session_id, message_id, model_name), IF NOT EXISTS-guarded, with indexes on session_id and (message_id, model_name). schema.CURRENT_VERSION bumps to 14.
- CLI: --use-embeddings + --embed-model flags on search-past-decisions. JSON output gains embedding_score: float in [0, 1]; text output appends cos=X.XX to the headline. Substring-mode JSON shape unchanged (embedding_score omitted when None).
- MCP: search_past_decisions gains use_embeddings: bool + embed_model: str | None args. Missing-dep error surfaces as a ValueError (with install hint) for clean JSON-RPC failure.
- 44 new tests: 21 in test_discovery_embeddings (cache hit/miss/mix, cosine math, dim mismatch, missing dep), 7 in test_search_past_decisions_embeddings (re-rank changes order, doesn't widen, surfaces score, propagates ImportError), 12 in test_migration_v014 (shape, PK, indexes, idempotency, partial-apply recovery), 4 in test_discovery_cli (JSON score, text cos=, missing-dep exit, no-load-when-off). Stubbed via _DeterministicStub / _OrderedStub classes; the real 90 MB MiniLM model never loads during tests.
- Docs: cli-reference.md, mcp.md, skills.md updated. CHANGELOG entry under [Unreleased].

Tests: 2399 pass (baseline 2355 + 44). Ruff baseline preserved (37).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
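
The re-ranking math mentioned in the commit message can be sketched in plain Python. This is an illustration under stated assumptions, not the code in discovery_embeddings.py: the function names are hypothetical, and the affine mapping (cos + 1) / 2 is one plausible way to map cosine similarity onto [0, 1], since the commit message does not say which mapping is used. Only the candidates passed in are scored, so the set is re-ordered but never widened.

```python
import math
from typing import Mapping, Sequence


def unit(vec: Sequence[float]) -> list[float]:
    """Scale a vector to unit length; reject the zero vector."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def embedding_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of a and b, mapped from [-1, 1] to [0, 1]."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    cos = sum(x * y for x, y in zip(unit(a), unit(b)))
    # Assumed mapping: affine rescale, clamped to absorb rounding error.
    return min(1.0, max(0.0, (cos + 1.0) / 2.0))


def rerank(
    query: Sequence[float],
    candidates: Mapping[str, Sequence[float]],
) -> list[tuple[str, float]]:
    """Score only the given candidates and return them best first."""
    scored = [(cid, embedding_score(query, vec)) for cid, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In the described design the candidate mapping would come from the substring filter, and the resulting score would stand in for the LIKE-density relevance term while the recency and cost weights stay as they were.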


0bserver07 and others added 2 commits April 29, 2026 10:40

test(ingest): annotate conn fixture as Iterator[Connection]

9e58be2


0bserver07 mentioned this pull request Apr 29, 2026

fix: ruff E501 in services reindex error paths #14

Merged

0bserver07 merged commit 75d2a9f into main Apr 29, 2026
8 of 9 checks passed

0bserver07 added a commit that referenced this pull request May 14, 2026

merge: feat/discovery-embeddings — opt-in --use-embeddings semantic s…

ac410d6

…earch (HANDOFF #10)

0bserver07 added a commit that referenced this pull request May 14, 2026

docs(handoff): mark item #10 closed (discovery embeddings)

3f2d711

0bserver07 added a commit that referenced this pull request May 20, 2026

merge: feat/discovery-embeddings — opt-in --use-embeddings semantic s…

8b5634b

…earch (HANDOFF #10)

0bserver07 added a commit that referenced this pull request May 20, 2026

docs(handoff): mark item #10 closed (discovery embeddings)

5e3cc72

0bserver07 deleted the feat/auto-reindex-on-ingest branch May 20, 2026 03:02

0bserver07 merged 2 commits into main from feat/auto-reindex-on-ingest
