chore(rag): v0.34.0 — delete LLMJudgeReranker outright by cipher813 · Pull Request #71 · cipher813/alpha-engine-lib

cipher813 · 2026-05-25T19:29:43Z

Summary

Removes the LLM-as-judge RAG reranker per Brian directive 2026-05-25: `[[preference_llm_calls_confined_to_research_module]]` strict guardrail + operator-validated no-lift finding + tier-5-not-SOTA implementation framing.

Why now

1. Operator-validated regressor. 2026-05-12 eval (EXPERIMENTS.md): -14.2% recall@10 vs hybrid w=0.7 baseline on SEC-filings corpus.

2. Sub-SOTA implementation. Tier-5 SOTA at best — LLM-as-judge reranking is for novel rubrics without training labels, not general relevance scoring. Single-integer 1-5 parse with neutral-3 fallback, sequential per-candidate calls (no listwise batching), no rationale, hardcoded model snapshot.

3. Architectural exposure. Strict-rule reading: any LLM call site in the lib is a latent breach — a future caller setting `RAG_RERANK=llm_judge` outside research would slip a covert LLM call past the guardrail.
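For reference, the recall@10 delta quoted in item 1 can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes the -14.2% figure is a relative change against the baseline (the PR does not say whether it is relative or absolute).

```python
from typing import Iterable, Sequence, Set, Tuple


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)


def mean_recall_at_k(runs: Iterable[Tuple[Sequence[str], Set[str]]], k: int = 10) -> float:
    """Average recall@k over (ranked_ids, relevant_ids) pairs, one pair per query."""
    scores = [recall_at_k(ranked, relevant, k) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


def relative_change_pct(candidate: float, baseline: float) -> float:
    """Relative change of candidate vs baseline, in percent (assumed convention)."""
    if baseline == 0:
        raise ValueError("baseline recall is zero; relative change is undefined")
    return (candidate - baseline) / baseline * 100.0
```

With this convention, a reranked condition counts as a regressor whenever `relative_change_pct(reranked_recall, baseline_recall)` is negative.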

Institutional rerank-revisit path (recorded)

Domain-finetune `CrossEncoderReranker` on operator-labeled (query, doc, relevance) triples mined from production retrieval logs. P2 ROADMAP entry filed via companion alpha-engine-config PR.
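A minimal sketch of the triple-mining step, under stated assumptions: the JSON-lines log layout (`query`, `hits`, `doc_id`, `text`), the label store keyed by `(query, doc_id)`, and every name below are hypothetical and do not come from this repository. The finetuning loop that would consume the triples is not shown.

```python
import json
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class LabeledTriple:
    """One (query, doc, relevance) training example."""
    query: str
    doc: str
    relevance: float


def mine_triples(
    log_lines: Iterable[str],
    labels: Dict[Tuple[str, str], float],
) -> List[LabeledTriple]:
    """Join retrieval-log records with operator labels.

    Assumes each log line is a JSON object with a "query" string and a
    "hits" list of {"doc_id", "text"} objects (hypothetical layout).
    Hits without an operator label are skipped; duplicates are dropped.
    """
    triples: List[LabeledTriple] = []
    seen: Set[Tuple[str, str]] = set()
    for line in log_lines:
        record = json.loads(line)
        query = record["query"]
        for hit in record["hits"]:
            key = (query, hit["doc_id"])
            if key in labels and key not in seen:
                seen.add(key)
                triples.append(LabeledTriple(query, hit["text"], labels[key]))
    return triples
```

Keeping unlabeled hits out of the training set is a deliberate choice in this sketch: only operator judgments become supervision, so retrieval rank is never treated as a relevance signal.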

Removed

- `LLMJudgeReranker` class, `_DEFAULT_LLM_RUBRIC`, `_default_llm_judge_factory`, `_LLM_JUDGE_FACTORY`
- `"llm_judge"` branch in `get_reranker`
- `TestLLMJudgeReranker` + `_mock_anthropic_client` helper

Kept

- `CrossEncoderReranker` (zero LLM, same protocol shape for future domain-finetune retry)
- `RerankCache`, `Reranker` protocol, `_attach_and_sort`
- `get_reranker` with `"cross_encoder"` as sole supported name (explicit `"llm_judge" was removed v0.34.0` docstring)
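The simplified registry described above might look roughly like the following. This is a sketch, not the library's code: the `rerank` method signature, the token-overlap stand-in for the cross-encoder model, and the `ValueError` exception type are all assumptions. Only the `"cross_encoder"` name, the `Unknown reranker` message, and the docstring note come from this PR.

```python
from typing import List, Protocol, Sequence


class Reranker(Protocol):
    """Assumed protocol shape: reorder candidates by relevance to the query."""

    def rerank(self, query: str, candidates: Sequence[str]) -> List[str]: ...


class CrossEncoderReranker:
    """Stand-in for the real class; token overlap replaces the cross-encoder score."""

    def rerank(self, query: str, candidates: Sequence[str]) -> List[str]:
        query_tokens = set(query.lower().split())
        # Higher overlap sorts first; ties keep their original order.
        return sorted(
            candidates,
            key=lambda c: len(query_tokens & set(c.lower().split())),
            reverse=True,
        )


def get_reranker(name: str) -> Reranker:
    """Return the reranker registered under ``name``.

    ``"cross_encoder"`` is the only supported name.
    ``"llm_judge"`` was removed v0.34.0.
    """
    if name == "cross_encoder":
        return CrossEncoderReranker()
    # Exception type is an assumption; the PR only states the message text.
    raise ValueError(f"Unknown reranker: {name!r}")
```

With a single supported name there is no LLM-backed branch left to select, so a caller setting `RAG_RERANK=llm_judge` fails loudly instead of silently issuing an LLM call.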

Test plan

- Suite 741 → 738 (-3 deleted LLMJudge tests)
- CrossEncoderReranker tests unchanged
- `get_reranker("llm_judge")` raises `Unknown reranker` per the simplified registry
- Companion alpha-engine-research PR drops the `"hybrid w=0.7 + llm rerank"` eval Condition + bumps lib pin v0.33→v0.34

🤖 Generated with Claude Code

Removes the LLM-as-judge RAG reranker class + factory + registry branch + tests.

Per Brian directive 2026-05-25: ``[[preference_llm_calls_confined_to_research_module]]`` strict guardrail + operator-validated no-lift finding on the SEC-filings corpus + tier-5-not-SOTA implementation framing.

**Why now:**

1. **Operator-validated regressor.** 2026-05-12 eval (recorded in ``alpha-engine-config/private-docs/EXPERIMENTS.md``) measured -14.2% recall@10 vs the hybrid w=0.7 baseline. The cross-encoder variant also regressed (-33.3% recall@10) but stays in the lib (no LLM exposure; same protocol surface for future revisits).

2. **Sub-SOTA implementation.** Tier-5 SOTA at best: LLM-as-judge reranking is for cases that need novel rubrics with no training labels (e.g., "rerank by recency-weighted financial materiality"), not for general relevance scoring. Single-integer 1-5 parse with neutral-3 fallback, per-candidate sequential calls (no listwise batching), no rationale capture, hardcoded model snapshot: these are all known anti-patterns vs. modern LLM-rerank surveys.

3. **Architectural exposure.** Strict-rule reading of ``[[preference_llm_calls_confined_to_research_module]]`` flags any LLM call site in the lib as a latent breach. Even default-off, a future caller setting ``RAG_RERANK=llm_judge`` outside research would slip a covert LLM call past the guardrail. Outright deletion is structurally compliant.

**Institutional rerank-revisit path** (recorded for the future): domain-finetune ``CrossEncoderReranker`` on operator-labeled (query, doc, relevance) triples mined from production retrieval logs. That's the SOTA pattern for institutional RAG reranking: finetuned CE models lift +5-15% recall@10 on domain corpora vs general-purpose CE. A P2 ROADMAP entry covering scope + gate is filed via ``alpha-engine-config`` companion PR.

**Removed surfaces:**

- ``LLMJudgeReranker`` class
- ``_DEFAULT_LLM_RUBRIC`` constant
- ``_default_llm_judge_factory`` + ``_LLM_JUDGE_FACTORY`` global
- ``"llm_judge"`` branch in ``get_reranker``
- ``Callable`` typing import (unused after factory removal)
- ``os`` import in this module (factory was the only consumer)
- ``TestLLMJudgeReranker`` class + ``_mock_anthropic_client`` helper

**Kept:**

- ``CrossEncoderReranker`` (zero LLM exposure, regressor on SEC but same protocol shape for future domain-finetune retry)
- ``RerankCache`` (still used by CE)
- ``Reranker`` protocol + ``_attach_and_sort`` helper
- ``get_reranker`` with ``"cross_encoder"`` as the sole supported name (and an explicit ``"llm_judge" was removed v0.34.0`` docstring note)

**Consumer updates** (separate PRs this session):

- alpha-engine-research: bump lib pin v0.33→v0.34, drop the ``"hybrid w=0.7 + llm rerank"`` entry from ``evals/rag_retrieval.py`` ``DEFAULT_CONDITIONS``, update ``qual_tools.py`` env-var docstring
- alpha-engine-config: P2 ROADMAP entry + EXPERIMENTS.md update

Suite 741 → 738 (-3: ``test_parses_haiku_integer_response``, ``test_cache_hit_skips_llm_call``, ``test_parse_failure_returns_neutral_three``).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cipher813 merged commit 686649f into main May 25, 2026
5 checks passed

cipher813 deleted the chore/delete-llm-judge-reranker-v0.34.0 branch May 25, 2026 20:10

cipher813 merged 1 commit into main from chore/delete-llm-judge-reranker-v0.34.0
