Spec A: TheoremSearch backend for the librarian + mathematics as 9th default field

**Parent**: #111. **Type**: spec-005 amendment (not a new spec dir — Q1 anticipated backend expansion).

## Goal

Add [TheoremSearch](https://www.theoremsearch.com/) as a third candidate-source backend to the librarian agent (alongside Semantic Scholar + arXiv), and add `mathematics` as the 9th default field so llmXive can do math-theory projects. Output contract unchanged — TheoremSearch `Candidate`s flow through the existing verify chain (URL resolves → title-overlap ≥0.7 → summary-grounded ≥0.5 → LLM relevance judge).

## Scope (in)

1. **`mathematics` as the 9th default field.** Add to the librarian's `DEFAULT_FIELDS` (`tests/phase2/test_librarian_cross_domain.py`), the brainstorm agent's field enum, and any field-list constant. Brainstorm **5 seed math projects** so the cross-domain test has math projects to run against (per maintainer answer #3 on #111).
2. **`src/llmxive/librarian/theoremsearch.py`** — `TheoremSearchClient.search(term, *, limit=10) -> list[Candidate]`:
   - `POST https://api.theoremsearch.com/search` with `{query: term, limit: ...}`
   - For each result where `paper.source == "arXiv"`: extract `paper.paper_id`, strip the `v\d+` suffix, call the librarian's existing `ArxivClient.get_by_id(arxiv_id)` for the full record, emit `Candidate(backend=\"theoremsearch\", primary_pointer=arxiv_id, claimed_title=..., claimed_authors=..., claimed_year=..., claimed_venue=\"arXiv\", claimed_abstract=...)`.
   - **Skip non-arXiv sources** (ProofWiki, Stacks Project — no DOI/arXiv ID; reserved for Spec B per maintainer answer #4).
   - Token-bucket rate limiter (~1 req/2s, same pattern as `ArxivClient`).
   - Errors → `TransientBackendError` so the router falls through to SS+arXiv; the librarian never *depends* on TheoremSearch being up.
3. **LLM math-classifier** (`src/llmxive/librarian/math_classifier.py` or folded into `query_extractor.py`) — `is_math_theory_question(term, idea_body_excerpt, *, model, ...) -> bool` via a single LLM call ("Is this a pure-mathematics question about theorems, proofs, or formal mathematical structures? yes/no"). Fail-open to `False`. **Cache the verdict per-project** (per maintainer answer #2) — keyed by (project_id or question_hash, prompt_version) so re-runs are stable.
4. **Wire into `LibrarianAgent.invoke()`**, after the existing extracted-query searches:
   ```
   if field == \"mathematics\":
       candidates += TheoremSearchClient().search(term)        # always
   elif _is_math_theory_question(term, idea_body_excerpt):     # cached LLM classifier
       candidates += TheoremSearchClient().search(term)
   ```
   Thread new candidates into the existing merge → verify → judge chain (no chain changes — TheoremSearch `Candidate`s carry arXiv IDs the verify chain already handles; `merge_candidates` dedups by `primary_pointer`).
5. **Bump librarian `prompt_version`** (the classifier is a new LLM call) → cache-invalidating; re-run cross-domain + PROJ-261/262 re-validation.
6. **Tests**: `tests/phase2/test_theoremsearch.py` (parser tests on recorded `/search` response — arXiv vs ProofWiki branching; real-API smoke gated on network; `Candidate` mapping; dedup), `tests/phase2/test_math_classifier.py` (parser + real-LLM smoke: math question → True, non-math → False), add a `mathematics` parametrization to `test_librarian_cross_domain.py`.
7. **Docs**: update `agents/registry.yaml` librarian entry comment (3 backends + math classifier), the diagnostic (`notes/2026-05-07-spec-005-librarian-diagnostic.md`), and `specs/005-librarian-agent/spec.md` Q1 clarification (it anticipated "future spec may expand the backend list" — this is that amendment).

## Scope (out — deferred to Spec B, see #114)

- Theorem-*statement* artifacts (`## Related theorems` idea.md subsection)
- Non-arXiv TheoremSearch sources (ProofWiki, Stacks Project)
- The `theorem_searcher` agent itself

## Verified API facts (from live probing — see #111)

- `POST /search` → `{theorems: [{theorem_id, name, body, slogan, theorem_type, link, paper: {paper_id, source, title, authors, link, summary, journal_ref, primary_category, categories, citations, year, journal_published}, similarity, score, has_metadata}]}`
- arXiv-sourced: `paper.paper_id` = versioned arXiv ID (`1306.5434v2`), `paper.link` = `http://arxiv.org/abs/<id>`, `paper.year` present, `paper.source == \"arXiv\"`
- No auth, no documented rate limits. OpenAPI at `https://api.theoremsearch.com/openapi.json`. MCP server at `/mcp`.
- Stability caveat: small academic project ([uw-math-ai/TheoremSearch](https://github.com/uw-math-ai/TheoremSearch), 8 stars, no license, no SLA) — mitigated by the fall-through-to-SS+arXiv design.

## Acceptance

- The librarian, invoked with `field=\"mathematics\"` on a real math arXiv question, returns ≥1 verified citation that came via the TheoremSearch backend (visible in `verification_log` / `Candidate.backend`).
- A non-math-field question that the classifier flags as math-theory also triggers the TheoremSearch path; a clearly-non-math question does not.
- Cross-domain test passes for all 9 fields (or skips `mathematics` cleanly if no math project is brainstormed yet — same pattern as the existing "no brainstormed projects found" skip).
- PROJ-261 + PROJ-262 re-validate `verified` under the new librarian version (no regression on non-math projects).
- All Phase 2 tests pass; ruff clean.

## Constitution note

Per Principle I (single source of truth): TheoremSearch is a *backend*, not a competing agent. The librarian remains the single entry point and single verifier for literature search.

🤖 Generated with [Claude Code](https://claude.com/claude-code)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Spec A: TheoremSearch backend for the librarian + mathematics as 9th default field #113

Goal

Scope (in)

Scope (out — deferred to Spec B, see #114)

Verified API facts (from live probing — see #111)

Acceptance

Constitution note

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Spec A: TheoremSearch backend for the librarian + mathematics as 9th default field #113

Description

Goal

Scope (in)

Scope (out — deferred to Spec B, see #114)

Verified API facts (from live probing — see #111)

Acceptance

Constitution note

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions