Skip to content

Spec A: TheoremSearch backend for the librarian + mathematics as 9th default field #113

@jeremymanning

Description

@jeremymanning

Parent: #111. Type: spec-005 amendment (not a new spec dir — Q1 anticipated backend expansion).

Goal

Add TheoremSearch as a third candidate-source backend to the librarian agent (alongside Semantic Scholar + arXiv), and add mathematics as the 9th default field so llmXive can do math-theory projects. Output contract unchanged — TheoremSearch Candidates flow through the existing verify chain (URL resolves → title-overlap ≥0.7 → summary-grounded ≥0.5 → LLM relevance judge).

Scope (in)

  1. mathematics as the 9th default field. Add to the librarian's DEFAULT_FIELDS (tests/phase2/test_librarian_cross_domain.py), the brainstorm agent's field enum, and any field-list constant. Brainstorm 5 seed math projects so the cross-domain test has math projects to run against (per maintainer answer Quantum Cognition in LLMs: Superposition States for Ambiguous Reasoning #3 on integrate theoremsearch #111).
  2. src/llmxive/librarian/theoremsearch.pyTheoremSearchClient.search(term, *, limit=10) -> list[Candidate]:
    • POST https://api.theoremsearch.com/search with {query: term, limit: ...}
    • For each result where paper.source == "arXiv": extract paper.paper_id, strip the v\d+ suffix, call the librarian's existing ArxivClient.get_by_id(arxiv_id) for the full record, emit Candidate(backend=\"theoremsearch\", primary_pointer=arxiv_id, claimed_title=..., claimed_authors=..., claimed_year=..., claimed_venue=\"arXiv\", claimed_abstract=...).
    • Skip non-arXiv sources (ProofWiki, Stacks Project — no DOI/arXiv ID; reserved for Spec B per maintainer answer The Binding Problem in LLMs: Implementing Synchronized Oscillations for Feature Integration #4).
    • Token-bucket rate limiter (~1 req/2s, same pattern as ArxivClient).
    • Errors → TransientBackendError so the router falls through to SS+arXiv; the librarian never depends on TheoremSearch being up.
  3. LLM math-classifier (src/llmxive/librarian/math_classifier.py or folded into query_extractor.py) — is_math_theory_question(term, idea_body_excerpt, *, model, ...) -> bool via a single LLM call ("Is this a pure-mathematics question about theorems, proofs, or formal mathematical structures? yes/no"). Fail-open to False. Cache the verdict per-project (per maintainer answer Neural Narrative Networks: Brain-Inspired Story Generation and Comprehension #2) — keyed by (project_id or question_hash, prompt_version) so re-runs are stable.
  4. Wire into LibrarianAgent.invoke(), after the existing extracted-query searches:
    if field == \"mathematics\":
        candidates += TheoremSearchClient().search(term)        # always
    elif _is_math_theory_question(term, idea_body_excerpt):     # cached LLM classifier
        candidates += TheoremSearchClient().search(term)
    
    Thread new candidates into the existing merge → verify → judge chain (no chain changes — TheoremSearch Candidates carry arXiv IDs the verify chain already handles; merge_candidates dedups by primary_pointer).
  5. Bump librarian prompt_version (the classifier is a new LLM call) → cache-invalidating; re-run cross-domain + PROJ-261/262 re-validation.
  6. Tests: tests/phase2/test_theoremsearch.py (parser tests on recorded /search response — arXiv vs ProofWiki branching; real-API smoke gated on network; Candidate mapping; dedup), tests/phase2/test_math_classifier.py (parser + real-LLM smoke: math question → True, non-math → False), add a mathematics parametrization to test_librarian_cross_domain.py.
  7. Docs: update agents/registry.yaml librarian entry comment (3 backends + math classifier), the diagnostic (notes/2026-05-07-spec-005-librarian-diagnostic.md), and specs/005-librarian-agent/spec.md Q1 clarification (it anticipated "future spec may expand the backend list" — this is that amendment).

Scope (out — deferred to Spec B, see #114)

  • Theorem-statement artifacts (## Related theorems idea.md subsection)
  • Non-arXiv TheoremSearch sources (ProofWiki, Stacks Project)
  • The theorem_searcher agent itself

Verified API facts (from live probing — see #111)

  • POST /search{theorems: [{theorem_id, name, body, slogan, theorem_type, link, paper: {paper_id, source, title, authors, link, summary, journal_ref, primary_category, categories, citations, year, journal_published}, similarity, score, has_metadata}]}
  • arXiv-sourced: paper.paper_id = versioned arXiv ID (1306.5434v2), paper.link = http://arxiv.org/abs/<id>, paper.year present, paper.source == \"arXiv\"
  • No auth, no documented rate limits. OpenAPI at https://api.theoremsearch.com/openapi.json. MCP server at /mcp.
  • Stability caveat: small academic project (uw-math-ai/TheoremSearch, 8 stars, no license, no SLA) — mitigated by the fall-through-to-SS+arXiv design.

Acceptance

  • The librarian, invoked with field=\"mathematics\" on a real math arXiv question, returns ≥1 verified citation that came via the TheoremSearch backend (visible in verification_log / Candidate.backend).
  • A non-math-field question that the classifier flags as math-theory also triggers the TheoremSearch path; a clearly-non-math question does not.
  • Cross-domain test passes for all 9 fields (or skips mathematics cleanly if no math project is brainstormed yet — same pattern as the existing "no brainstormed projects found" skip).
  • PROJ-261 + PROJ-262 re-validate verified under the new librarian version (no regression on non-math projects).
  • All Phase 2 tests pass; ruff clean.

Constitution note

Per Principle I (single source of truth): TheoremSearch is a backend, not a competing agent. The librarian remains the single entry point and single verifier for literature search.

🤖 Generated with Claude Code

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions