In [None]:
# Retrieval is the process of fetching relevant information from an external data source (like a document database or vector store) to help a language model generate more accurate, grounded, and context-aware responses.

In [None]:
from dotenv import load_dotenv
load_dotenv() # load environment variables

### Problem Statement

Retrieval is a critical step in RAG pipelines. If the retriever fails to surface relevant documents, the generation stage will be starved of the context it needs — even if the answer exists somewhere in the data.

**Example:**

User query:
- “Can I bring my dog on the plane?”

Problem:
- The relevant document says: “Small pets are allowed in the cabin if kept in an airline-approved carrier.”
- The word “dog” never appears — only “pets” or “animals.”

Retrieval Failure:
- A naive retriever (for example, one that only matches keywords, or one with a weak embedding model) might not treat “dog” and “pet” as semantically similar.
- The relevant document is not retrieved, even though it answers the question.

Result:
- The LLM, lacking the key passage, responds: “I couldn’t find any information about dogs on the plane.”
- Users perceive this as a model failure, but the real issue was the retriever’s blind spot.
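
The blind spot is easy to reproduce with a toy keyword-overlap scorer. The sketch below is an illustration only: the second document, the stopword list, and the scoring function are assumptions made up for this example, not part of any particular retriever.

```python
import re

# Toy corpus: the first entry is the passage quoted above,
# the second is an invented distractor.
DOCS = [
    "Small pets are allowed in the cabin if kept in an airline-approved carrier.",
    "Each passenger may bring one carry-on bag on the plane.",
]

# Small hand-picked stopword list (an assumption for this example).
STOPWORDS = {"a", "an", "the", "i", "my", "on", "in", "if",
             "can", "are", "may", "one", "each"}

def tokenize(text):
    """Lowercase, split on non-letters, and drop stopwords."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}

def keyword_score(query, doc):
    """Count the distinct content words shared by query and document."""
    return len(tokenize(query) & tokenize(doc))

query = "Can I bring my dog on the plane?"
scores = [keyword_score(query, d) for d in DOCS]
best = DOCS[scores.index(max(scores))]

print(scores)  # [0, 2]
print(best)    # the carry-on bag document, not the pet policy
```

The pet policy scores zero because it shares no content word with the query, while the unrelated baggage document wins on “bring” and “plane”.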

### Strategy #1: Ranking

![Image](rsc/jupyter/rag-fusion.png)

**Idea**: A post-processing step in which the initial set of retrieved documents is re-scored and re-ordered by relevance to the user’s query, typically using a more powerful model than the one used for initial retrieval.

---

Three strategies that use this idea:
- **Re-Rank**: Score and reorder initially retrieved documents using a query-document relevance model (e.g., a cross-encoder).
- **RankGPT**: Use a language model itself to rank candidate documents by how well they answer the query — using generation and reasoning rather than embeddings.
- **RAG-Fusion**: Run multiple diverse queries (original + variations) through the retriever, then fuse the results for better coverage (we covered this in "query translation").
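
For the fusion step, one commonly used method is reciprocal rank fusion (RRF). The sketch below assumes RRF with the conventional constant `k=60`; the document IDs and rankings are hypothetical stand-ins for what a retriever might return for the original query and two rewrites.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in
    (rank starts at 1). k=60 is a common default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical retrieval results for three query variants.
rankings = [
    ["baggage_policy", "pet_policy", "seat_map"],
    ["pet_policy", "service_animals", "baggage_policy"],
    ["pet_policy", "baggage_policy"],
]
fused = reciprocal_rank_fusion(rankings)
print(fused)
# ['pet_policy', 'baggage_policy', 'service_animals', 'seat_map']
```

A document that appears near the top of several lists outranks one that tops a single list, which is why fusing query variants tends to improve coverage.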

### Strategy #2: CRAG

![Image](rsc/jupyter/crag.png)

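**Idea**: A relevance check (often the language model itself, or a lightweight evaluator) identifies retrieval errors and corrects them before the answer is generated, for example by rewriting the query or pulling from another source such as web search. This gives the pipeline a second chance if the first retrieval step was inadequate.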
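
One way to picture the corrective step is to grade what was retrieved and fall back to a second source when nothing passes. The sketch below is a simplified illustration, not the CRAG implementation: `grade`, `corrective_retrieve`, the threshold, and both stub retrievers are hypothetical, and a real grader would likely be an LLM prompt or a trained evaluator rather than word overlap.

```python
def grade(query, doc):
    """Hypothetical relevance grader: fraction of query words found in the doc."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def corrective_retrieve(query, retrieve, fallback, threshold=0.5):
    """Keep documents that pass the grader; use the fallback if none do."""
    docs = retrieve(query)
    kept = [d for d in docs if grade(query, d) >= threshold]
    if kept:
        return kept, "primary"
    return fallback(query), "fallback"

# Stub retrievers standing in for a vector store and a secondary source.
def primary(query):
    return ["checked baggage fees apply per bag"]

def secondary(query):
    return ["small pets are allowed in the cabin in an approved carrier"]

docs, source = corrective_retrieve("pets in the cabin", primary, secondary)
print(source)  # fallback
print(docs)
```

Here the primary result shares no words with the query, so it is discarded and the secondary source supplies the context instead.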

### Strategy #3: Self-RAG

![Image](rsc/jupyter/self-rag.png)

**Idea**: The language model itself actively critiques and improves its own retrieval process, often by reformulating the query, selecting better documents, or rerunning retrieval — all within a single inference loop.
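
The critique-and-retry loop described above can be sketched as follows. This is a simplified control-flow illustration under stated assumptions: `retrieve`, `critique`, and `rewrite` are hypothetical callables (in a real system the last two would be calls to the language model), and the toy index exists only to make the example runnable.

```python
def self_correcting_retrieve(query, retrieve, critique, rewrite, max_rounds=3):
    """Retrieve, critique the results, and retry with a rewritten query."""
    current = query
    for round_no in range(1, max_rounds + 1):
        docs = retrieve(current)
        # Results are judged against the ORIGINAL question, not the rewrite.
        if critique(query, docs):
            return docs, current, round_no
        current = rewrite(current)
    return [], current, max_rounds  # give up after max_rounds attempts

# Toy exact-term index standing in for a real retriever.
INDEX = {"pet": ["Small pets are allowed in the cabin in an approved carrier."]}

def retrieve(q):
    words = q.lower().split()
    return [doc for term, docs in INDEX.items() if term in words for doc in docs]

def critique(question, docs):
    # Stand-in for an LLM judgment: accept any non-empty result.
    return len(docs) > 0

def rewrite(q):
    # Stand-in for an LLM query rewrite.
    return q.lower().replace("dog", "pet")

docs, final_query, rounds = self_correcting_retrieve(
    "Can I bring my dog", retrieve, critique, rewrite
)
print(rounds, final_query)  # 2 can i bring my pet
```

The first round finds nothing for “dog”, the critique rejects the empty result, and the rewritten query succeeds on the second round. Capping the loop with `max_rounds` keeps a persistently failing query from retrying forever.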