
chore(rag): v0.34.0 — delete LLMJudgeReranker outright #71

Merged
cipher813 merged 1 commit into `main` from `chore/delete-llm-judge-reranker-v0.34.0` on May 25, 2026

@cipher813
Owner

Summary

Removes the LLM-as-judge RAG reranker, per Brian's 2026-05-25 directive, on three grounds: the strict `[[preference_llm_calls_confined_to_research_module]]` guardrail, an operator-validated no-lift finding, and the assessment that the implementation is tier-5, not SOTA.

Why now

1. Operator-validated regressor. The 2026-05-12 eval (EXPERIMENTS.md) measured -14.2% recall@10 vs. the hybrid w=0.7 baseline on the SEC-filings corpus.

2. Sub-SOTA implementation. Tier-5 SOTA at best: LLM-as-judge reranking is meant for novel rubrics that lack training labels, not for general relevance scoring. The implementation used a single-integer 1-5 parse with a neutral-3 fallback, sequential per-candidate calls (no listwise batching), no rationale capture, and a hardcoded model snapshot.

3. Architectural exposure. Under a strict reading of the guardrail, any LLM call site in the lib is a latent breach: a future caller setting `RAG_RERANK=llm_judge` outside the research module would slip a covert LLM call past the guardrail.
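A minimal sketch of the two quantities item 1 compares, for readers outside the eval. It assumes that "hybrid w=0.7" means a convex blend of min-max-normalized dense and lexical scores with w on the dense side; the PR does not define the fusion, and `hybrid_rank`, `recall_at_k`, and `relative_change` are hypothetical names, not the eval harness's API. The PR also does not say whether -14.2% is a relative or an absolute change, so `relative_change` shows only one possible reading.

```python
from typing import Dict, List, Sequence, Set


def _min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; an empty input stays empty, a constant one maps to 0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {doc: (s - lo) / span if span else 0.0 for doc, s in scores.items()}


def hybrid_rank(dense: Dict[str, float], lexical: Dict[str, float], w: float = 0.7) -> List[str]:
    """Hypothetical hybrid fusion: w * dense + (1 - w) * lexical after min-max scaling.

    A doc missing from one retriever contributes 0 from that side.
    """
    d, lex = _min_max(dense), _min_max(lexical)
    docs = set(d) | set(lex)
    fused = {doc: w * d.get(doc, 0.0) + (1 - w) * lex.get(doc, 0.0) for doc in docs}
    # Highest fused score first; doc id breaks ties deterministically.
    return sorted(docs, key=lambda doc: (-fused[doc], doc))


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant docs that appear in the top-k of a ranking."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def relative_change(candidate: float, baseline: float) -> float:
    """Relative change of a candidate metric vs. a baseline (e.g. -0.2 means -20%)."""
    return (candidate - baseline) / baseline


if __name__ == "__main__":
    ranking = hybrid_rank({"a": 0.9, "b": 0.5, "c": 0.1}, {"a": 1.0, "c": 3.0, "d": 2.0})
    print(ranking)  # ['a', 'b', 'c', 'd']
    print(recall_at_k(ranking, {"a", "d"}, k=2))  # 0.5
```

In a real eval, recall@10 would be averaged over all queries for each condition before comparing a reranked condition against the hybrid baseline.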

Institutional rerank-revisit path (recorded)

Domain-finetune `CrossEncoderReranker` on operator-labeled (query, doc, relevance) triples mined from production retrieval logs. A P2 ROADMAP entry is filed via a companion alpha-engine-config PR.
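A hedged sketch of the mining step in this revisit path. It assumes a hypothetical log row that carries an operator's graded label (0-3 here); the real log schema, label scale, and aggregation rule are not specified in this PR. The sketch drops unlabeled rows, averages duplicate labels for the same (query, doc) pair, and scales them into [0, 1].

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class RetrievalLogRecord:
    """Hypothetical shape of one production retrieval-log row."""

    query: str
    doc_id: str
    doc_text: str
    operator_label: Optional[int]  # graded relevance 0..max_label, None if unlabeled


def mine_triples(
    records: Iterable[RetrievalLogRecord], max_label: int = 3
) -> List[Tuple[str, str, float]]:
    """Collapse labeled log rows into (query, doc, relevance) triples.

    Unlabeled rows are skipped. Repeated (query, doc_id) pairs have their
    labels averaged, then scaled into [0, 1] for use as regression targets.
    """
    labels: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    texts: Dict[str, str] = {}
    for rec in records:
        if rec.operator_label is None:
            continue
        labels[(rec.query, rec.doc_id)].append(rec.operator_label)
        texts[rec.doc_id] = rec.doc_text  # last-seen text wins for a doc_id
    triples: List[Tuple[str, str, float]] = []
    # Sorted keys make the output order deterministic across runs.
    for (query, doc_id), vals in sorted(labels.items()):
        relevance = sum(vals) / len(vals) / max_label
        triples.append((query, texts[doc_id], relevance))
    return triples
```

The resulting triples are the kind of input a cross-encoder fine-tuning loop (for example one built on sentence-transformers' `CrossEncoder`) would consume; that training step is left out here.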

Removed

  • `LLMJudgeReranker` class, `_DEFAULT_LLM_RUBRIC`, `_default_llm_judge_factory`, `_LLM_JUDGE_FACTORY`
  • `"llm_judge"` branch in `get_reranker`
  • `TestLLMJudgeReranker` + `_mock_anthropic_client` helper

Kept

  • `CrossEncoderReranker` (zero LLM, same protocol shape for future domain-finetune retry)
  • `RerankCache`, `Reranker` protocol, `_attach_and_sort`
  • `get_reranker` with `"cross_encoder"` as the sole supported name (its docstring explicitly notes `"llm_judge" was removed v0.34.0`)

Test plan

  • Suite 741 → 738 (-3 deleted LLMJudge tests)
  • CrossEncoderReranker tests unchanged
  • `get_reranker("llm_judge")` raises `Unknown reranker` per the simplified registry
  • Companion alpha-engine-research PR drops the `"hybrid w=0.7 + llm rerank"` eval condition and bumps the lib pin v0.33→v0.34

🤖 Generated with Claude Code

Removes the LLM-as-judge RAG reranker class, factory, registry
branch, and tests, per Brian's 2026-05-25 directive, on three grounds:
the strict ``[[preference_llm_calls_confined_to_research_module]]``
guardrail, an operator-validated no-lift finding on the SEC-filings
corpus, and the assessment that the implementation is tier-5, not SOTA.

**Why now:**

1. **Operator-validated regressor.** 2026-05-12 eval (recorded in
   ``alpha-engine-config/private-docs/EXPERIMENTS.md``) measured
   -14.2% recall@10 vs the hybrid w=0.7 baseline. The cross-encoder
   variant also regressed (-33.3% recall@10) but stays in the lib
   (no LLM exposure; same protocol surface for future revisits).

2. **Sub-SOTA implementation.** Tier-5 SOTA at best: LLM-as-judge
   reranking is for cases that need novel rubrics with no training
   labels (e.g., "rerank by recency-weighted financial materiality"),
   not for general relevance scoring. The implementation used a
   single-integer 1-5 parse with a neutral-3 fallback, per-candidate
   sequential calls (no listwise batching), no rationale capture, and a
   hardcoded model snapshot, all known anti-patterns by the standards
   of modern LLM-rerank surveys.

3. **Architectural exposure.** A strict reading of
   ``[[preference_llm_calls_confined_to_research_module]]`` treats any
   LLM call site in the lib as a latent breach: even with the reranker
   default-off, a future caller setting ``RAG_RERANK=llm_judge``
   outside the research module would slip a covert LLM call past the
   guardrail. Outright deletion makes the lib structurally compliant.
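To make the anti-pattern in item 2 concrete, here is a reconstruction of the parse behaviour as described (first 1-5 integer in the reply, else a neutral 3). It is not the deleted code; the regex and the names ``parse_judge_score`` and ``parse_judge_score_strict`` are assumptions. The strict variant shows why the fallback hides failures: the replies the default parser scores as 3 come back as ``None``.

```python
import re
from typing import Optional

# Assumption: the judge was asked for a bare 1-5 relevance integer.
_SCORE_RE = re.compile(r"\b([1-5])\b")
NEUTRAL_SCORE = 3


def parse_judge_score(reply: str) -> int:
    """Reconstructed anti-pattern: first standalone 1-5 digit, else a neutral 3.

    A refusal, a truncated reply, or an off-scale number all collapse to 3,
    indistinguishable from a genuine "somewhat relevant" judgment.
    """
    match = _SCORE_RE.search(reply)
    return int(match.group(1)) if match else NEUTRAL_SCORE


def parse_judge_score_strict(reply: str) -> Optional[int]:
    """Variant that surfaces parse failures as None instead of hiding them."""
    match = _SCORE_RE.search(reply)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    replies = ["4", "Score: 5/5", "I cannot assess this document.", "Relevance: 7"]
    print([parse_judge_score(r) for r in replies])         # [4, 5, 3, 3]
    print([parse_judge_score_strict(r) for r in replies])  # [4, 5, None, None]
```

With only five possible values and silent fallbacks, many candidates tie, which leaves a sort with little signal to work with.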

**Institutional rerank-revisit path** (recorded for the future):
domain-finetune ``CrossEncoderReranker`` on operator-labeled
(query, doc, relevance) triples mined from production retrieval logs.
That is the SOTA pattern for institutional RAG reranking: finetuned
CE models lift recall@10 by +5-15% on domain corpora vs. general-purpose
CE. A P2 ROADMAP entry covering scope and gate is filed via a companion
``alpha-engine-config`` PR.

**Removed surfaces:**

- ``LLMJudgeReranker`` class
- ``_DEFAULT_LLM_RUBRIC`` constant
- ``_default_llm_judge_factory`` + ``_LLM_JUDGE_FACTORY`` global
- ``"llm_judge"`` branch in ``get_reranker``
- ``Callable`` typing import (unused after factory removal)
- ``os`` import in this module (factory was the only consumer)
- ``TestLLMJudgeReranker`` class + ``_mock_anthropic_client`` helper

**Kept:**

- ``CrossEncoderReranker`` (zero LLM exposure; regressed on SEC but
  keeps the same protocol shape for a future domain-finetune retry)
- ``RerankCache`` (still used by CE)
- ``Reranker`` protocol + ``_attach_and_sort`` helper
- ``get_reranker`` with ``"cross_encoder"`` as the sole supported name
  (and an explicit ``"llm_judge" was removed v0.34.0`` docstring note)
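``RerankCache`` stays because the cross-encoder still uses it. Its implementation is not shown in this PR, so the following is a hypothetical in-memory version: keys hash (model, query, doc) with length-prefixed fields, and ``score_with_cache`` sends only cache misses to the scorer in one batch. Including the model name in the key is an assumption, chosen so that a future finetuned CE cannot read scores cached for the general-purpose one.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Sequence, Tuple, cast

PairScorer = Callable[[List[Tuple[str, str]]], List[float]]


class RerankCache:
    """Hypothetical in-memory score cache keyed on (model, query, doc)."""

    def __init__(self) -> None:
        self._scores: Dict[str, float] = {}

    @staticmethod
    def key(model: str, query: str, doc: str) -> str:
        # Length-prefix each field so ("ab", "c") and ("a", "bc") never collide.
        payload = "".join(f"{len(part)}:{part}" for part in (model, query, doc))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, model: str, query: str, doc: str) -> Optional[float]:
        return self._scores.get(self.key(model, query, doc))

    def put(self, model: str, query: str, doc: str, score: float) -> None:
        self._scores[self.key(model, query, doc)] = score


def score_with_cache(
    cache: RerankCache, model: str, query: str, docs: Sequence[str], score_pairs: PairScorer
) -> List[float]:
    """Return one score per doc, sending only cache misses to score_pairs in a single batch."""
    scores: List[Optional[float]] = [cache.get(model, query, d) for d in docs]
    misses = [i for i, s in enumerate(scores) if s is None]
    if misses:
        fresh = score_pairs([(query, docs[i]) for i in misses])
        if len(fresh) != len(misses):
            raise ValueError("scorer must return one score per pair")
        for i, s in zip(misses, fresh):
            cache.put(model, query, docs[i], s)
            scores[i] = s
    # Every slot is filled at this point, either from the cache or from the scorer.
    return cast(List[float], scores)
```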

**Consumer updates** (separate PRs this session):
- alpha-engine-research: bump lib pin v0.33→v0.34, drop the
  ``"hybrid w=0.7 + llm rerank"`` entry from ``DEFAULT_CONDITIONS`` in
  ``evals/rag_retrieval.py``, update the ``qual_tools.py`` env-var docstring
- alpha-engine-config: P2 ROADMAP entry + EXPERIMENTS.md update
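On the consumer side, a caller that reads ``RAG_RERANK`` can fail fast at startup rather than discovering the removal deep inside retrieval. A minimal sketch, assuming unset or empty means reranking is off (consistent with the reranker being default-off); ``resolve_rerank_setting`` is a hypothetical helper, not a function in ``qual_tools.py``.

```python
import os
from typing import Mapping, Optional

SUPPORTED_RERANKERS = frozenset({"cross_encoder"})
REMOVED_RERANKERS = {"llm_judge": "v0.34.0"}  # name -> lib version that removed it


def resolve_rerank_setting(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Read RAG_RERANK and fail fast on removed or unknown names.

    Returns None when reranking is disabled (variable unset or blank).
    """
    raw = env.get("RAG_RERANK", "").strip()
    if not raw:
        return None
    if raw in REMOVED_RERANKERS:
        raise ValueError(
            f"RAG_RERANK={raw!r} was removed in {REMOVED_RERANKERS[raw]}; "
            "use 'cross_encoder' or unset it"
        )
    if raw not in SUPPORTED_RERANKERS:
        raise ValueError(f"Unknown RAG_RERANK value: {raw!r}")
    return raw
```

Taking the environment as a ``Mapping`` argument keeps the helper testable without mutating ``os.environ``.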

Suite 741 → 738 (-3: ``test_parses_haiku_integer_response``,
``test_cache_hit_skips_llm_call``, ``test_parse_failure_returns_neutral_three``).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 merged commit 686649f into main on May 25, 2026
5 checks passed
cipher813 deleted the chore/delete-llm-judge-reranker-v0.34.0 branch on May 25, 2026 at 20:10