feat: llm judge for docs retrieval #32

enitrat · 2025-07-25T20:26:56Z

PR Summary: Introduction of a Retrieval Judge to the RAG Pipeline

This PR adds a RetrievalJudge to the RAG pipeline. The judge uses an LLM to score each document retrieved from the vector store and discards those it rates as irrelevant, so the generation model receives more relevant context and can produce more accurate answers.

This change also includes a significant refactoring of the RAG pipeline's unit tests, introducing pytest fixtures for better modularity and adding comprehensive test coverage for the new judging functionality.

Key Changes

New RetrievalJudge Module:
- A new dspy.Module (retrieval_judge.py) is implemented.
- It uses an LLM (Gemini Flash) to rate each retrieved document's relevance on a scale of 0.0 to 1.0 against the user's query.
- Documents scoring below a configurable threshold (defaulting to 0.4) are filtered out.
- It operates in parallel to efficiently score multiple documents.
Pipeline Integration:
- The RetrievalJudge is integrated into the RagPipeline and runs immediately after the DocumentRetriever and before the GenerationProgram.
- Token usage from the judge is now tracked and included in the total get_lm_usage() report.
Optimizations and Resiliency:
- Template Passthrough: Standard contract and test templates are automatically kept without being sent to the judge, saving latency and cost.
- Robust Error Handling: If the RetrievalJudge fails for any reason, the pipeline logs a warning and proceeds with all retrieved documents, so the user still receives an answer.
Test Suite Overhaul:
- A new test file (test_retrieval_judge.py) provides focused unit tests for the judge's logic, including score clamping, error handling, and template passthrough.
- The main pipeline test file (test_rag_pipeline.py) has been completely refactored to use pytest fixtures, improving readability and maintainability.
- A new test suite (TestRagPipelineWithJudge) specifically validates the pipeline's behavior with the judge enabled, covering filtering, error fallbacks, and metadata enrichment.
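
The judging behaviour listed above (parallel scoring, clamping to the 0.0 to 1.0 range, the 0.4 default threshold, and template passthrough) can be sketched roughly as follows. This is a minimal illustration in plain asyncio, not the code in retrieval_judge.py: the `Document` class, `TEMPLATE_TITLES`, the `judge_score` metadata key, and the `score_fn` callable standing in for the LLM call are all assumptions made for the example.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Optional


@dataclass
class Document:
    """Hypothetical document type; the real pipeline's type may differ."""
    content: str
    metadata: dict = field(default_factory=dict)


# Assumption: templates are recognised by title. The PR does not say how.
TEMPLATE_TITLES = {"Contract Template", "Test Template"}

# Stand-in for the LLM call: takes (query, document), returns a raw score.
ScoreFn = Callable[[str, Document], Awaitable[float]]


def clamp_score(raw: float) -> float:
    """Force a raw model score into the 0.0 to 1.0 range."""
    return max(0.0, min(1.0, raw))


async def judge_documents(
    query: str,
    documents: list[Document],
    score_fn: ScoreFn,
    threshold: float = 0.4,
) -> list[Document]:
    """Score documents concurrently and keep those at or above the threshold."""

    async def rate(doc: Document) -> Optional[float]:
        # Templates bypass the judge entirely; None marks "kept without a score".
        if doc.metadata.get("title") in TEMPLATE_TITLES:
            return None
        return clamp_score(await score_fn(query, doc))

    # gather() runs the scoring calls concurrently and preserves input order.
    scores = await asyncio.gather(*(rate(doc) for doc in documents))

    kept = []
    for doc, score in zip(documents, scores):
        if score is None:
            kept.append(doc)
        elif score >= threshold:
            doc.metadata["judge_score"] = score  # hypothetical metadata key
            kept.append(doc)
    return kept


if __name__ == "__main__":
    async def fake_score(query: str, doc: Document) -> float:
        # Toy scorer used only for this demo, in place of a real LLM.
        return 0.9 if "storage" in doc.content else 0.1

    docs = [
        Document("How to declare storage variables"),
        Document("Unrelated release notes"),
        Document("Template body", {"title": "Contract Template"}),
    ]
    kept = asyncio.run(judge_documents("How do I declare storage?", docs, fake_score))
    print([d.content for d in kept])
```

Passing the scorer in as a callable keeps the filtering logic testable without a live model, which is in the spirit of the focused unit tests described for test_retrieval_judge.py.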

Updated RAG Pipeline Flow

The diagram below illustrates the new RAG pipeline, highlighting the addition of the Retrieval Judge step.

flowchart TD
    subgraph RAG Pipeline
        A[User Query] --> B("Query Processor");
        B -- Semantic Queries --> C("Document Retriever");
        C -- Retrieves from --> D[Vector DB];
        C -- Retrieved Docs --> E{{"Retrieval Judge"}};
        E -- Filters using LLM --> F[Filtered, Relevant Docs];
        F -- High-Quality Context --> G("Generation Program");
        G -- Final Answer --> H[User Response];
    end

    style E fill:#cde4ff,stroke:#0066ff,stroke-width:2px,stroke-dasharray: 5 5

Component Interaction

This sequence diagram shows how the RagPipeline orchestrates the new retrieval and judging process.

sequenceDiagram
    User->>Pipeline: forward(query)
    Pipeline->>Retriever: aforward(query)
    Retriever-->>Pipeline: Returns list of documents
    
    Note over Pipeline, Judge: Pipeline invokes the new Judge
    Pipeline->>Judge: aforward(query, documents)
    Judge-->>Pipeline: Returns filtered list of documents
    
    Pipeline->>Generator: aforward(query, filtered_documents)
    Generator-->>Pipeline: Returns final answer
    Pipeline-->>User: Streams final answer
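
The orchestration in the sequence diagram, combined with the fallback described under Key Changes, could look something like the sketch below. It is an assumption-laden illustration rather than the actual RagPipeline code: `run_pipeline` and the three callables are hypothetical stand-ins for the Retriever, Judge, and Generator participants, and documents are plain strings to keep the example short.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger(__name__)

# Hypothetical stand-ins for the three participants in the diagram.
RetrieveFn = Callable[[str], Awaitable[list[str]]]
JudgeFn = Callable[[str, list[str]], Awaitable[list[str]]]
GenerateFn = Callable[[str, list[str]], Awaitable[str]]


async def run_pipeline(
    query: str,
    retrieve: RetrieveFn,
    judge: JudgeFn,
    generate: GenerateFn,
) -> str:
    """Retrieve, judge, then generate; fall back to all documents if judging fails."""
    documents = await retrieve(query)

    try:
        context = await judge(query, documents)
    except Exception as exc:
        # Judge failure must not block the answer: warn and keep everything.
        logger.warning(
            "Retrieval judge failed (%s); using all %d retrieved documents",
            exc,
            len(documents),
        )
        context = documents

    return await generate(query, context)
```

Catching the exception at the pipeline level, rather than inside the judge, means any failure mode of the judge (timeout, malformed model output, provider error) degrades to the pre-judge behaviour instead of an error for the user.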

enitrat changed the base branch from feat/migrate-dspy to main July 30, 2025 15:29

enitrat force-pushed the feat/llm-judge-docs branch 2 times, most recently from c339e58 to b0ed8d1 Compare July 30, 2025 15:33

enitrat mentioned this pull request Jul 30, 2025

fix(refacto): token tracking in response and readme #34

Merged

enitrat force-pushed the feat/llm-judge-docs branch from dfb17fe to bb56031 Compare July 31, 2025 17:59

enitrat added 9 commits July 31, 2025 18:59

migrate to DSPy

586ee0b

dspy migration

f0f12fb

feat: add llm judge for retrieved docs

05344f3

use gemini flashlite to triage docs

257f67e

remove useless config

c80515e

fix contract/test examples

b2c2729

fix test

c874195

fix tests post-rebase

107cf8a

fix tests

2a90d98

enitrat force-pushed the feat/llm-judge-docs branch from bb56031 to 2a90d98 Compare July 31, 2025 17:59

enitrat requested a review from ijusttookadnatest July 31, 2025 18:00

enitrat merged commit 17f12fc into main Jul 31, 2025
3 checks passed
