feat: llm judge for docs retrieval #32
Merged
PR Summary: Introduction of a Retrieval Judge to the RAG Pipeline
This PR introduces a new, critical component to the RAG pipeline: a RetrievalJudge. The primary goal is to enhance the quality and relevance of the context provided to the generation model. It acts as an intelligent filter, using an LLM to score and discard irrelevant documents retrieved from the vector store, leading to more accurate and contextually aware final answers.

This change also includes a significant refactoring of the RAG pipeline's unit tests, introducing pytest fixtures for better modularity and adding comprehensive test coverage for the new judging functionality.

Key Changes
New RetrievalJudge Module:
- A dspy.Module (retrieval_judge.py) is implemented.
- […] 0.4) are filtered out.

Pipeline Integration:
- The RetrievalJudge is seamlessly integrated into the RagPipeline, running immediately after the DocumentRetriever and before the GenerationProgram.
- […] get_lm_usage() report.

Optimizations and Resiliency:
- If the RetrievalJudge fails for any reason, it logs a warning and proceeds with all retrieved documents, ensuring the user still receives an answer.

Test Suite Overhaul:
- A test file (test_retrieval_judge.py) provides focused unit tests for the judge's logic, including score clamping, error handling, and template passthrough.
- The pipeline test file (test_rag_pipeline.py) has been completely refactored to use pytest fixtures, improving readability and maintainability.
- A test class (TestRagPipelineWithJudge) specifically validates the pipeline's behavior with the judge enabled, covering filtering, error fallbacks, and metadata enrichment.

Updated RAG Pipeline Flow
The diagram below illustrates the new RAG pipeline, highlighting the addition of the Retrieval Judge step.
flowchart TD
    subgraph RAG Pipeline
        A[User Query] --> B("1Query Processor");
        B -- Semantic Queries --> C("Document Retriever");
        C -- Retrieves from --> D[Vector DB];
        C -- Retrieved Docs --> E{{"Retrieval Judge"}};
        E -- Filters using LLM --> F[Filtered, Relevant Docs];
        F -- High-Quality Context --> G("Generation Program");
        G -- Final Answer --> H[User Response];
    end
    style E fill:#cde4ff,stroke:#0066ff,stroke-width:2px,stroke-dasharray: 5 5

Component Interaction
This sequence diagram shows how the RagPipeline orchestrates the new retrieval and judging process.

sequenceDiagram
    User->>Pipeline: forward(query)
    Pipeline->>Retriever: aforward(query)
    Retriever-->>Pipeline: Returns list of documents
    Note over Pipeline, Judge: Pipeline invokes the new Judge
    Pipeline->>Judge: aforward(query, documents)
    Judge-->>Pipeline: Returns filtered list of documents
    Pipeline->>Generator: aforward(query, filtered_documents)
    Generator-->>Pipeline: Returns final answer
    Pipeline-->>User: Streams final answer