feat: auto-reindex search/tags/qa after ingest#10

Merged
0bserver07 merged 2 commits into main from feat/auto-reindex-on-ingest on Apr 29, 2026


@0bserver07
Owner

Problem

Search, tag, and Q&A indexes go stale every time new sessions land via run_ingest. Today users have to manually POST to /api/{search,tags,qa}/reindex after every ingest — surprising and easy to forget.

Change

  • New auto_reindex_touched() helper in stackunderflow/ingest/__init__.py runs after run_ingest finishes.
  • Per-project reindex (via index_project), not full reindex_all. Only touches projects whose pre→post message count actually grew.
  • Each service (search/tag/qa) wrapped in its own try/except — a beta-service failure cannot break ingest or the other services.
  • New auto_reindex_on_ingest setting (default True, env AUTO_REINDEX_ON_INGEST) for opt-out.
  • Handles UNIQUE(provider, slug) correctly — same slug across Claude+Codex maps to one logical project.
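
The fail-soft behaviour described in the list above could look roughly like the sketch below. It is not the code from this PR: the signature, the `services` mapping, and the return value are assumptions made for illustration. Only the names `auto_reindex_touched` and `index_project` come from the description.

```python
import logging
from typing import Callable, Iterable

logger = logging.getLogger(__name__)

# Hypothetical shape: each service exposes a per-project index callable.
IndexFn = Callable[[str], None]


def auto_reindex_touched(
    touched_slugs: Iterable[str],
    services: dict[str, IndexFn],
    enabled: bool = True,
) -> dict[str, list[str]]:
    """Re-index every touched project in every service, failing soft.

    Returns service name -> slugs that were re-indexed successfully.
    """
    done: dict[str, list[str]] = {name: [] for name in services}
    if not enabled:  # opt-out path, e.g. AUTO_REINDEX_ON_INGEST disabled
        return done
    for slug in touched_slugs:
        for name, index_project in services.items():
            try:
                index_project(slug)
                done[name].append(slug)
            except Exception:
                # A failing service must not break ingest or the other services.
                logger.exception("auto-reindex failed: service=%s slug=%s", name, slug)
    return done
```

Catching the exception inside the innermost loop is what keeps a tags or Q&A failure from preventing the search re-index for the same slug.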

Evidence

  • 5 new tests under tests/stackunderflow/ingest/test_auto_reindex.py (touched-project, no-op when no new messages, fail-soft, opt-out, real-SearchService end-to-end).
  • pytest tests/ -q: 425 passed, 2 skipped (baseline 420).
  • Smoke test: ingested -Users-yadkonrad-dev-dev-year25-dec25-ClaudeCodePlugins (193 records) into isolated $HOME. Search for "claude" returned 44 hits + 133 FTS rows without any manual reindex POST.

Notes

Not pushed previously (worktree-isolated agent run). Branch renamed from worktree-agent-*.

0bserver07 and others added 2 commits April 29, 2026 10:40
run_ingest now refreshes the FTS5 search index, tag index, and Q&A
index for every project that gained new messages, so users no longer
have to POST to /api/{search,tags,qa}/reindex after new sessions land.

Per-project re-index, not a full reindex_all: only slugs that gained
new messages are re-indexed. Each service is invoked in its own try/except so a beta
service failure (tags/qa) cannot break ingest, and search itself
fails soft. Gated by a new auto_reindex_on_ingest setting (default
True, env AUTO_REINDEX_ON_INGEST) for opt-out.

The hook lives in run_ingest itself (single source of truth) so all
four call sites — server.py startup, cli.py reindex, two routes/data.py
refresh handlers — pick it up automatically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
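
One way to find "every project that gained new messages" is to snapshot per-slug message counts before and after ingest and compare them. The sketch below assumes a hypothetical schema (`projects(id, provider, slug)` with `UNIQUE(provider, slug)` and `messages(id, project_id)`); the real table and column names are not shown in this PR. Grouping by slug alone is what folds a Claude project and a Codex project with the same slug into one logical project.

```python
import sqlite3


def message_counts_by_slug(conn: sqlite3.Connection) -> dict[str, int]:
    """Count messages per slug, merging providers that share a slug.

    Table and column names here are assumptions, not the project's schema.
    """
    rows = conn.execute(
        """
        SELECT p.slug, COUNT(m.id)
        FROM projects AS p
        LEFT JOIN messages AS m ON m.project_id = p.id
        GROUP BY p.slug
        """
    ).fetchall()
    return {slug: count for slug, count in rows}


def touched_slugs(before: dict[str, int], after: dict[str, int]) -> list[str]:
    """Return slugs whose message count grew between the two snapshots."""
    return sorted(s for s, n in after.items() if n > before.get(s, 0))
```

A slug that exists only in the `after` snapshot counts as touched as soon as it has at least one message; a slug whose count is unchanged is skipped, which gives the no-op behaviour when nothing new was ingested.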
Pyright flagged the generator-yielding fixture as returning Connection
when it actually yields one. Use Iterator[sqlite3.Connection] to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
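
For context, the annotation fix in the second commit follows the usual pattern for generator-style fixtures: a function that yields a value is a generator, so its return annotation is `Iterator[...]`, not the yielded type. A minimal sketch, with the `@pytest.fixture` decorator left off so it runs on the standard library alone (the fixture body is an assumption, not the test file's actual code):

```python
import sqlite3
from collections.abc import Iterator


# In a test module this function would carry the @pytest.fixture decorator.
def conn() -> Iterator[sqlite3.Connection]:
    """Yield an in-memory connection and close it once the consumer is done."""
    connection = sqlite3.connect(":memory:")
    try:
        yield connection
    finally:
        connection.close()
```

Annotating the function as `-> sqlite3.Connection` would describe the yielded value, not what the function returns, which is the mismatch Pyright reported.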
@0bserver07 0bserver07 merged commit 75d2a9f into main Apr 29, 2026
8 of 9 checks passed
0bserver07 added a commit that referenced this pull request May 14, 2026
0bserver07 added a commit that referenced this pull request May 20, 2026
* feat(ingest): auto-reindex search/tags/qa after ingest
* test(ingest): annotate conn fixture as Iterator[Connection]

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
0bserver07 added a commit that referenced this pull request May 20, 2026
…use-embeddings)

Closes HANDOFF follow-up #10. Adds an opt-in semantic-search mode to
search-past-decisions (and its MCP counterpart) via local
sentence-transformers embeddings. The substring filter still runs first;
--use-embeddings only re-ranks the candidate set, never widens it.
sentence-transformers lands as an optional `stackunderflow[embeddings]`
extra — a user who never flags semantic mode pays zero (no torch import,
no model load).
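
The "filter first, re-rank only" contract in the paragraph above can be illustrated with a small sketch. The function name, signature, and the pluggable `score` callable are hypothetical; they are not the search-past-decisions implementation.

```python
from typing import Callable, Optional, Sequence


def search(
    query: str,
    docs: Sequence[str],
    score: Optional[Callable[[str, str], float]] = None,
) -> list[str]:
    """Substring filter first; an optional scorer only reorders the survivors."""
    needle = query.lower()
    candidates = [d for d in docs if needle in d.lower()]
    if score is None:
        # Substring mode: original order, no scorer is ever called.
        return candidates
    # Re-rank mode: same candidate set, highest score first.
    return sorted(candidates, key=lambda d: score(query, d), reverse=True)
```

Because the scorer is applied only to `candidates`, a document that fails the substring filter can never appear in the results, however similar the scorer thinks it is.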

- New optional extra in pyproject.toml: `embeddings = ["sentence-transformers>=2.7,<3"]`.
- Default model `sentence-transformers/all-MiniLM-L6-v2` (90 MB, 384-dim);
  override via STACKUNDERFLOW_EMBED_MODEL or --embed-model. Process-wide
  cache so a single agent invocation only loads the model once.
- New `stackunderflow/services/discovery_embeddings.py`: pull-through
  cache (compute_or_load), query embed, cosine scoring. Lazy imports
  throughout; MissingEmbeddingsDependencyError surfaces a clean install
  hint when the extra isn't installed.
- Re-ranking math: cosine (dot product over unit-normalised vectors)
  mapped to [0, 1]. Replaces the LIKE-density relevance term in
  pack_within_budget; recency + cost weights unchanged.
- Migration v014: additive `discovery_embeddings` table keyed on
  (session_id, message_id, model_name), IF NOT EXISTS-guarded, with
  indexes on session_id and (message_id, model_name). schema.CURRENT_VERSION
  bumps to 14.
- CLI: --use-embeddings + --embed-model flags on search-past-decisions.
  JSON output gains embedding_score: float in [0, 1]; text appends
  cos=X.XX to the headline. Substring-mode JSON shape unchanged
  (embedding_score omitted when None).
- MCP: search_past_decisions gains use_embeddings: bool + embed_model:
  str | None args. Missing-dep error surfaces as a ValueError (with
  install hint) for clean JSON-RPC failure.
- 44 new tests: 21 in test_discovery_embeddings (cache hit/miss/mix,
  cosine math, dim mismatch, missing dep), 7 in
  test_search_past_decisions_embeddings (re-rank changes order, doesn't
  widen, surfaces score, propagates ImportError), 12 in
  test_migration_v014 (shape, PK, indexes, idempotency, partial-apply
  recovery), 4 in test_discovery_cli (JSON score, text cos=, missing-dep
  exit, no-load-when-off). Stubbed via _DeterministicStub / _OrderedStub
  classes — the real 90 MB MiniLM model never loads during tests.
- Docs: cli-reference.md, mcp.md, skills.md updated. CHANGELOG entry
  under [Unreleased].
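
As an illustration of the re-ranking math bullet above, here is a pure-Python sketch of cosine similarity computed as a dot product over unit-normalised vectors. The commit message does not say how the result is mapped to [0, 1]; this sketch assumes the affine map (cos + 1) / 2, and the function names are hypothetical.

```python
import math
from typing import Sequence


def unit(v: Sequence[float]) -> list[float]:
    """Scale a vector to unit length; a zero vector stays all zeros."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else [0.0 for _ in v]


def cosine01(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity mapped from [-1, 1] to [0, 1] (assumed mapping)."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(unit(a), unit(b)))
    dot = max(-1.0, min(1.0, dot))  # guard against floating-point drift
    return (dot + 1.0) / 2.0
```

Under this mapping identical directions score 1.0, opposite directions 0.0, and orthogonal vectors 0.5.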

Tests: 2399 pass (baseline 2355 + 44). Ruff baseline preserved (37).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
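
An additive, idempotent migration of the kind described for v014 might be sketched as follows. The table name, key columns, and indexed columns follow the commit message; the `vector` column, the column types, and the index names are assumptions.

```python
import sqlite3

# Every statement is IF NOT EXISTS-guarded so re-running is a no-op.
MIGRATION_SQL = """
CREATE TABLE IF NOT EXISTS discovery_embeddings (
    session_id TEXT NOT NULL,
    message_id TEXT NOT NULL,
    model_name TEXT NOT NULL,
    vector     BLOB NOT NULL,  -- hypothetical payload column
    PRIMARY KEY (session_id, message_id, model_name)
);
CREATE INDEX IF NOT EXISTS idx_discovery_embeddings_session
    ON discovery_embeddings (session_id);
CREATE INDEX IF NOT EXISTS idx_discovery_embeddings_message_model
    ON discovery_embeddings (message_id, model_name);
"""


def apply_migration(conn: sqlite3.Connection) -> None:
    """Create the embeddings table and its indexes if they are missing."""
    conn.executescript(MIGRATION_SQL)
```

Guarding each statement separately also covers a partial apply: if the table exists but an index does not, a second run creates only the missing index.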
0bserver07 added a commit that referenced this pull request May 20, 2026
@0bserver07 0bserver07 deleted the feat/auto-reindex-on-ingest branch May 20, 2026 03:02