You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Parent: #111. Type: spec-005 amendment (not a new spec dir — Q1 anticipated backend expansion).
Goal
Add TheoremSearch as a third candidate-source backend to the librarian agent (alongside Semantic Scholar + arXiv), and add mathematics as the 9th default field so llmXive can do math-theory projects. Output contract unchanged — TheoremSearch Candidates flow through the existing verify chain (URL resolves → title-overlap ≥0.7 → summary-grounded ≥0.5 → LLM relevance judge).
Scope (in)
mathematics as the 9th default field. Add to the librarian's DEFAULT_FIELDS (tests/phase2/test_librarian_cross_domain.py), the brainstorm agent's field enum, and any field-list constant. Brainstorm 5 seed math projects so the cross-domain test has math projects to run against (per maintainer answer Quantum Cognition in LLMs: Superposition States for Ambiguous Reasoning #3 on integrate theoremsearch #111).
POST https://api.theoremsearch.com/search with {query: term, limit: ...}
For each result where paper.source == "arXiv": extract paper.paper_id, strip the v\d+ suffix, call the librarian's existing ArxivClient.get_by_id(arxiv_id) for the full record, emit Candidate(backend=\"theoremsearch\", primary_pointer=arxiv_id, claimed_title=..., claimed_authors=..., claimed_year=..., claimed_venue=\"arXiv\", claimed_abstract=...).
Token-bucket rate limiter (~1 req/2s, same pattern as ArxivClient).
Errors → TransientBackendError so the router falls through to SS+arXiv; the librarian never depends on TheoremSearch being up.
LLM math-classifier (src/llmxive/librarian/math_classifier.py or folded into query_extractor.py) — is_math_theory_question(term, idea_body_excerpt, *, model, ...) -> bool via a single LLM call ("Is this a pure-mathematics question about theorems, proofs, or formal mathematical structures? yes/no"). Fail-open to False. Cache the verdict per-project (per maintainer answer Neural Narrative Networks: Brain-Inspired Story Generation and Comprehension #2) — keyed by (project_id or question_hash, prompt_version) so re-runs are stable.
Wire into LibrarianAgent.invoke(), after the existing extracted-query searches:
if field == \"mathematics\":
candidates += TheoremSearchClient().search(term) # always
elif _is_math_theory_question(term, idea_body_excerpt): # cached LLM classifier
candidates += TheoremSearchClient().search(term)
Thread new candidates into the existing merge → verify → judge chain (no chain changes — TheoremSearch Candidates carry arXiv IDs the verify chain already handles; merge_candidates dedups by primary_pointer).
Bump librarian prompt_version (the classifier is a new LLM call) → cache-invalidating; re-run cross-domain + PROJ-261/262 re-validation.
Tests: tests/phase2/test_theoremsearch.py (parser tests on recorded /search response — arXiv vs ProofWiki branching; real-API smoke gated on network; Candidate mapping; dedup), tests/phase2/test_math_classifier.py (parser + real-LLM smoke: math question → True, non-math → False), add a mathematics parametrization to test_librarian_cross_domain.py.
Docs: update agents/registry.yaml librarian entry comment (3 backends + math classifier), the diagnostic (notes/2026-05-07-spec-005-librarian-diagnostic.md), and specs/005-librarian-agent/spec.md Q1 clarification (it anticipated "future spec may expand the backend list" — this is that amendment).
No auth, no documented rate limits. OpenAPI at https://api.theoremsearch.com/openapi.json. MCP server at /mcp.
Stability caveat: small academic project (uw-math-ai/TheoremSearch, 8 stars, no license, no SLA) — mitigated by the fall-through-to-SS+arXiv design.
Acceptance
The librarian, invoked with field=\"mathematics\" on a real math arXiv question, returns ≥1 verified citation that came via the TheoremSearch backend (visible in verification_log / Candidate.backend).
A non-math-field question that the classifier flags as math-theory also triggers the TheoremSearch path; a clearly-non-math question does not.
Cross-domain test passes for all 9 fields (or skips mathematics cleanly if no math project is brainstormed yet — same pattern as the existing "no brainstormed projects found" skip).
PROJ-261 + PROJ-262 re-validate verified under the new librarian version (no regression on non-math projects).
All Phase 2 tests pass; ruff clean.
Constitution note
Per Principle I (single source of truth): TheoremSearch is a backend, not a competing agent. The librarian remains the single entry point and single verifier for literature search.
Parent: #111. Type: spec-005 amendment (not a new spec dir — Q1 anticipated backend expansion).
Goal
Add TheoremSearch as a third candidate-source backend to the librarian agent (alongside Semantic Scholar + arXiv), and add
mathematicsas the 9th default field so llmXive can do math-theory projects. Output contract unchanged — TheoremSearchCandidates flow through the existing verify chain (URL resolves → title-overlap ≥0.7 → summary-grounded ≥0.5 → LLM relevance judge).Scope (in)
mathematicsas the 9th default field. Add to the librarian'sDEFAULT_FIELDS(tests/phase2/test_librarian_cross_domain.py), the brainstorm agent's field enum, and any field-list constant. Brainstorm 5 seed math projects so the cross-domain test has math projects to run against (per maintainer answer Quantum Cognition in LLMs: Superposition States for Ambiguous Reasoning #3 on integrate theoremsearch #111).src/llmxive/librarian/theoremsearch.py—TheoremSearchClient.search(term, *, limit=10) -> list[Candidate]:POST https://api.theoremsearch.com/searchwith{query: term, limit: ...}paper.source == "arXiv": extractpaper.paper_id, strip thev\d+suffix, call the librarian's existingArxivClient.get_by_id(arxiv_id)for the full record, emitCandidate(backend=\"theoremsearch\", primary_pointer=arxiv_id, claimed_title=..., claimed_authors=..., claimed_year=..., claimed_venue=\"arXiv\", claimed_abstract=...).ArxivClient).TransientBackendErrorso the router falls through to SS+arXiv; the librarian never depends on TheoremSearch being up.src/llmxive/librarian/math_classifier.pyor folded intoquery_extractor.py) —is_math_theory_question(term, idea_body_excerpt, *, model, ...) -> boolvia a single LLM call ("Is this a pure-mathematics question about theorems, proofs, or formal mathematical structures? yes/no"). Fail-open toFalse. Cache the verdict per-project (per maintainer answer Neural Narrative Networks: Brain-Inspired Story Generation and Comprehension #2) — keyed by (project_id or question_hash, prompt_version) so re-runs are stable.LibrarianAgent.invoke(), after the existing extracted-query searches:Candidates carry arXiv IDs the verify chain already handles;merge_candidatesdedups byprimary_pointer).prompt_version(the classifier is a new LLM call) → cache-invalidating; re-run cross-domain + PROJ-261/262 re-validation.tests/phase2/test_theoremsearch.py(parser tests on recorded/searchresponse — arXiv vs ProofWiki branching; real-API smoke gated on network;Candidatemapping; dedup),tests/phase2/test_math_classifier.py(parser + real-LLM smoke: math question → True, non-math → False), add amathematicsparametrization totest_librarian_cross_domain.py.agents/registry.yamllibrarian entry comment (3 backends + math classifier), the diagnostic (notes/2026-05-07-spec-005-librarian-diagnostic.md), andspecs/005-librarian-agent/spec.mdQ1 clarification (it anticipated "future spec may expand the backend list" — this is that amendment).Scope (out — deferred to Spec B, see #114)
## Related theoremsidea.md subsection)theorem_searcheragent itselfVerified API facts (from live probing — see #111)
POST /search→{theorems: [{theorem_id, name, body, slogan, theorem_type, link, paper: {paper_id, source, title, authors, link, summary, journal_ref, primary_category, categories, citations, year, journal_published}, similarity, score, has_metadata}]}paper.paper_id= versioned arXiv ID (1306.5434v2),paper.link=http://arxiv.org/abs/<id>,paper.yearpresent,paper.source == \"arXiv\"https://api.theoremsearch.com/openapi.json. MCP server at/mcp.Acceptance
field=\"mathematics\"on a real math arXiv question, returns ≥1 verified citation that came via the TheoremSearch backend (visible inverification_log/Candidate.backend).mathematicscleanly if no math project is brainstormed yet — same pattern as the existing "no brainstormed projects found" skip).verifiedunder the new librarian version (no regression on non-math projects).Constitution note
Per Principle I (single source of truth): TheoremSearch is a backend, not a competing agent. The librarian remains the single entry point and single verifier for literature search.
🤖 Generated with Claude Code