# **Retrieval-Augmented Generation (RAG) & Advanced Retrieval Techniques**

## **1. Introduction to RAG**
### **What is Retrieval-Augmented Generation (RAG)?**
- RAG combines information retrieval and language generation to improve response accuracy.  
- It retrieves relevant documents from a knowledge base before generating a response.  
- Used for **Question Answering, Summarization, and Chatbots.**  

### **How RAG Works**
1. **Query Encoding:** Converts user queries into vector representations.  
2. **Document Retrieval:** Searches a vector database for relevant information.  
3. **Response Generation:** Uses a language model to generate an answer based on retrieved context.  
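
The three steps above can be sketched without any external service. This is a minimal illustration, not the pipeline used later in this notebook: `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical helpers, and the bag-of-words "embedding" is only a stand-in for a real embedding model. The final generation step is left as a prompt string that would be passed to a language model.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Steps 1 and 2: encode the query, then rank documents by similarity."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:top_k]

def build_prompt(query: str, context: list[str]) -> str:
    """Step 3 (input side): assemble the prompt a language model would receive."""
    joined = "\n".join(context)
    return f"Answer the following question based on this context:\n\n{joined}\n\nQuestion: {query}\n"

docs = [
    "task decomposition splits a large task into smaller steps",
    "vector databases store embeddings for similarity search",
    "bm25 is a sparse keyword ranking function",
]
query = "what is task decomposition"
context = retrieve(query, docs, top_k=1)
prompt = build_prompt(query, context)
print(prompt)
```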


## **2. Retrieval in RAG**
### **Indexing Documents**
- Documents are split into **chunks** for efficient retrieval.  
- Indexed using **vector databases** (e.g., **Chroma, FAISS, Weaviate**).  
- Each chunk is converted into **vector embeddings** using models like **OpenAI Embeddings** or **Cohere Embeddings**.  
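
To make the chunking step concrete, here is a small sketch of fixed-size chunking with overlap. It is a simplification under stated assumptions: `chunk_tokens` is a hypothetical helper that works on whitespace tokens, whereas a production splitter would count model tokens and respect text boundaries.

```python
def chunk_tokens(tokens: list[str], chunk_size: int, chunk_overlap: int) -> list[list[str]]:
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(tokens):
            break
    return chunks

tokens = "a b c d e f g h i j".split()
chunks = chunk_tokens(tokens, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

The overlap means the last token of one chunk reappears as the first token of the next, so a sentence cut at a chunk boundary is still partly visible in both chunks.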

### **Retrieving Relevant Context**
- Uses **similarity search** to find documents relevant to a query.  
- Retrieval models:
  - **Dense retrieval** (embedding similarity search, e.g., via FAISS or ChromaDB)
  - **Sparse retrieval** (e.g., BM25)
  - **Hybrid retrieval** (combining dense and sparse methods)
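
The three retrieval styles can be compared in a short sketch. It assumes a plain BM25 scorer for the sparse side, made-up similarity values for the dense side, and a weighted sum of min-max normalized scores for the hybrid. The names `bm25_scores`, `min_max`, `hybrid_scores`, and the weight `alpha` are illustrative choices, not part of any library used in this notebook.

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Sparse retrieval: score each document against the query with BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

def min_max(values: list[float]) -> list[float]:
    """Rescale scores to [0, 1] so dense and sparse scores are comparable."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

def hybrid_scores(dense: list[float], sparse: list[float], alpha: float = 0.5) -> list[float]:
    """Hybrid retrieval: weighted sum of normalized dense and sparse scores."""
    return [alpha * d + (1 - alpha) * s for d, s in zip(min_max(dense), min_max(sparse))]

docs = [
    "bm25 ranks documents by keyword overlap",
    "dense retrieval compares embedding vectors",
    "hybrid search mixes both signals",
]
sparse = bm25_scores("keyword overlap", docs)
dense = [0.6, 0.9, 0.2]  # hypothetical cosine similarities from an embedding model
combined = hybrid_scores(dense, sparse, alpha=0.5)
print(combined)
```

Here the first document wins the hybrid ranking because it matches the query keywords exactly and also has a reasonable dense score, even though the second document has the highest dense score alone.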

In [None]:
! pip install langchain_community tiktoken langchain-openai langchainhub chromadb langchain cohere python-dotenv beautifulsoup4

In [None]:
import os
from dotenv import load_dotenv
load_dotenv()  

os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'

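# API keys read from the environment (.env); warn early if any are missing
LANGCHAIN_API_KEY = os.getenv("LANGCHAIN_API_KEY")
GROQ_API_KEY = os.getenv("GROQ_API_KEY")
COHERE_API_KEY = os.getenv("COHERE_API_KEY")
# OPENAI_API_KEY is also needed by OpenAIEmbeddings and ChatOpenAI used below
for name in ("LANGCHAIN_API_KEY", "COHERE_API_KEY", "OPENAI_API_KEY"):
    if not os.getenv(name):
        print(f"Warning: {name} is not set")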


## **3. Re-Ranking in RAG**
### **What is Re-Ranking?**
- Improves the ranking of retrieved documents based on relevance.  
- Helps when **retrieved documents are not in the best order** for generation.  

### **Re-Ranking Approaches**
#### 1. **Reciprocal Rank Fusion (RRF)**
   - Combines multiple ranked lists into one.  
   - Assigns higher scores to documents appearing in multiple lists.  



   **Implementation:**

> ##### **We start by preparing the retriever**

In [None]:
# Load blog
import bs4
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
blog_docs = loader.load()

# Split
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=300, 
    chunk_overlap=50)

# Make splits
splits = text_splitter.split_documents(blog_docs)

# Index
from langchain_openai import OpenAIEmbeddings
# from langchain_cohere import CohereEmbeddings
from langchain_community.vectorstores import Chroma
vectorstore = Chroma.from_documents(documents=splits, 
                                    # embedding=CohereEmbeddings()
                                    embedding=OpenAIEmbeddings())


retriever = vectorstore.as_retriever()

### **Create Chat Chain**

In [None]:
from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# RAG-Fusion
template = """You are a helpful assistant that generates multiple search queries based on a single input query. \n
Generate multiple search queries related to: {question} \n
Output (4 queries):"""
prompt_rag_fusion = ChatPromptTemplate.from_template(template)

generate_queries = (
    prompt_rag_fusion 
    | ChatOpenAI(temperature=0)
    | StrOutputParser() 
    | (lambda x: [q.strip() for q in x.split("\n") if q.strip()])  # drop blank lines
)

### **Reciprocal Rank Fusion (RRF) Implementation for Re-Ranking in RAG**

In [None]:
from langchain.load import dumps, loads

def reciprocal_rank_fusion(results: list[list], k=60):
    """ Reciprocal_rank_fusion that takes multiple lists of ranked documents 
        and an optional parameter k used in the RRF formula """
    
    # Initialize a dictionary to hold fused scores for each unique document
    fused_scores = {}

    # Iterate through each list of ranked documents
    for docs in results:
        # Iterate through each document in the list, with its rank (position in the list)
        for rank, doc in enumerate(docs):
            # Convert the document to a string format to use as a key (assumes documents can be serialized to JSON)
            doc_str = dumps(doc)
            # If the document is not yet in the fused_scores dictionary, add it with an initial score of 0
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            # Update the score using the RRF formula: 1 / (rank + k); rank is 0-based here
            fused_scores[doc_str] += 1 / (rank + k)

    # Sort the documents based on their fused scores in descending order to get the final reranked results
    reranked_results = [
        (loads(doc), score)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]

    # Return the reranked results as a list of tuples, each containing the document and its fused score
    return reranked_results
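
To see the arithmetic behind the function above, here is the same scoring rule applied to plain string IDs, so it runs without LangChain serialization. `rrf` is a hypothetical stripped-down variant written for illustration; it keeps the 0-based rank and the default `k=60` used above.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs using 1 / (rank + k)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):  # rank starts at 0
            scores[doc_id] = scores.get(doc_id, 0.0) + 1 / (rank + k)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

rankings = [
    ["A", "B", "C"],
    ["B", "C", "D"],
    ["B", "A"],
]
fused = rrf(rankings)
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Document `B` ranks first because it appears in all three lists (score 1/61 + 1/60 + 1/60), while `D` ranks last because it appears only once, in third position (score 1/62).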


### **Test**

In [None]:

question = "What is task decomposition for LLM agents?"
retrieval_chain_rag_fusion = generate_queries | retriever.map() | reciprocal_rank_fusion
docs = retrieval_chain_rag_fusion.invoke({"question": question})
len(docs)

In [None]:
from operator import itemgetter
from langchain_core.runnables import RunnablePassthrough

# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

llm = ChatOpenAI(temperature=0)

final_rag_chain = (
    {"context": retrieval_chain_rag_fusion, 
     "question": itemgetter("question")} 
    | prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke({"question":question})

#### 2. **Cohere Re-Rank**
   - Uses a **pre-trained ranking model** from Cohere.  
   - Scores retrieved documents and sorts them by relevance.  

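**Formula (for comparison, the RRF score from the previous subsection):**  

$$
\text{Score}(d) = \sum_{r \in R} \frac{1}{k + \text{rank}_r(d)}
$$

(where **R** is the set of ranked lists, **rank_r(d)** is the position of document *d* in list *r*, and **k** is a smoothing factor). Cohere Re-Rank does not use this formula; it scores each query-document pair with a learned relevance model.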


   **Implementation:**

In [None]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CohereRerank

In [None]:
# Retrieve a wider candidate set (k=10), then let the Cohere model re-rank it
retriever = vectorstore.as_retriever(search_kwargs={"k": 10})
# CohereRerank reads COHERE_API_KEY from the environment
compressor = CohereRerank()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

# invoke() replaces the deprecated get_relevant_documents()
compressed_docs = compression_retriever.invoke(question)
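
The retrieve-wide-then-re-rank pattern used here can be sketched independently of Cohere. In this illustration, `rerank` and `overlap_score` are hypothetical helpers; `overlap_score` is a crude word-overlap stand-in for a learned relevance model and says nothing about how Cohere's model actually scores documents.

```python
from typing import Callable

def rerank(
    query: str,
    docs: list[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 3,
) -> list[tuple[str, float]]:
    """Score every candidate against the query, sort by score, keep the top_n."""
    scored = [(doc, score_fn(query, doc)) for doc in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

def overlap_score(query: str, doc: str) -> float:
    """Stand-in relevance model: fraction of query words that occur in the document."""
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q) if q else 0.0

candidates = [
    "agents use memory and tools",
    "task decomposition breaks a task into subgoals",
    "chain of thought helps with decomposition",
]
top = rerank("task decomposition", candidates, overlap_score, top_n=2)
for doc, score in top:
    print(score, doc)
```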

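## **4. CRAG (Corrective Retrieval-Augmented Generation)**
### **What is CRAG?**
- Enhances RAG by adding a **retrieval evaluator** that grades each retrieved document for relevance before generation.  
- Keeps documents judged relevant and discards noisy or off-topic ones, so the context **contains high-value information** instead of noise.  
- When retrieval is judged poor, it can **fall back to another source** (such as web search) rather than generating from weak context.  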


### **CRAG vs Standard RAG**
| Feature       | Standard RAG | CRAG |
|--------------|-------------|------|
| Simple Retrieval | ✅ | ✅ |
| Context Filtering | ❌ | ✅ |
| Re-Ranking | ❌ | ✅ |
| Knowledge Optimization | ❌ | ✅ |
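
A rough sketch of the corrective flow, under stated assumptions: `corrective_retrieve` and the three toy callables (`retrieve`, `grade`, `web_search`) are hypothetical stand-ins. In the linked LangGraph notebooks the grader is an LLM call and the fallback is a real web search tool.

```python
from typing import Callable

def corrective_retrieve(
    query: str,
    retrieve: Callable[[str], list[str]],
    grade: Callable[[str, str], bool],
    fallback_search: Callable[[str], list[str]],
    min_relevant: int = 1,
) -> tuple[list[str], str]:
    """Grade retrieved documents; use the fallback source when too few are relevant."""
    docs = retrieve(query)
    relevant = [d for d in docs if grade(query, d)]
    if len(relevant) >= min_relevant:
        return relevant, "vectorstore"
    return relevant + fallback_search(query), "fallback"

knowledge_base = [
    "task decomposition splits work into steps",
    "agents can call external tools",
]

def retrieve(query: str) -> list[str]:
    return knowledge_base  # toy retriever: returns every document

def grade(query: str, doc: str) -> bool:
    # Toy grader: relevant if any query word appears in the document
    return any(word in doc.split() for word in query.lower().split())

def web_search(query: str) -> list[str]:
    return [f"web result for: {query}"]  # placeholder, no network call

docs_a, source_a = corrective_retrieve("task decomposition", retrieve, grade, web_search)
docs_b, source_b = corrective_retrieve("quantum gravity", retrieve, grade, web_search)
print(source_a, docs_a)
print(source_b, docs_b)
```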

### **Resources for CRAG**
- 📺 **Deep Dive Video:** [YouTube](https://www.youtube.com/watch?v=E2shqsYwxck)  
- 📄 **Notebooks:**
  - [LangGraph CRAG](https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_crag.ipynb)  
  - [LangGraph CRAG Mistral](https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_crag_mistral.ipynb)  

---

## **5. Self-RAG**
### **What is Self-RAG?**
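- A **self-reflective RAG approach** in which the model **critiques its own retrieval and generation** while answering.  
- Uses **feedback loops** (grading the retrieved documents and the generated answer) to decide whether to retrieve again or revise the query.  

### **How Self-RAG Works**
1. **Initial Retrieval:** Retrieve documents for a query.  
2. **Generation & Feedback:** Generate a response and assess whether it is supported by the retrieved documents and answers the question.  
3. **Self-Correction:** If the assessment fails, rewrite the query or retrieve again, then regenerate.  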
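
The loop described above can be sketched with plain callables. This is an illustrative assumption about control flow only: `self_reflective_answer` and the toy `retrieve`, `generate`, `is_grounded`, and `rewrite` functions are hypothetical, and in practice each of them would be backed by a retriever or an LLM call.

```python
from typing import Callable

def self_reflective_answer(
    question: str,
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str, list[str]], str],
    is_grounded: Callable[[str, list[str]], bool],
    rewrite: Callable[[str], str],
    max_attempts: int = 3,
) -> tuple[str, int]:
    """Retrieve, generate, check the answer, and retry with a rewritten query if needed."""
    query = question
    answer = ""
    for attempt in range(1, max_attempts + 1):
        docs = retrieve(query)
        answer = generate(question, docs)
        if is_grounded(answer, docs):
            return answer, attempt
        query = rewrite(query)  # feedback step: change what is retrieved next time
    return answer, max_attempts

corpus = {"llm agent planning": ["planning lets an agent break tasks into steps"]}

def retrieve(query: str) -> list[str]:
    return corpus.get(query, [])  # toy retriever: exact-match lookup

def generate(question: str, docs: list[str]) -> str:
    return docs[0] if docs else "I do not know"

def is_grounded(answer: str, docs: list[str]) -> bool:
    return answer in docs  # toy check: the answer must come from a retrieved document

def rewrite(query: str) -> str:
    return "llm agent planning"  # toy rewriter with a fixed output

answer, attempts = self_reflective_answer(
    "how do agents plan?", retrieve, generate, is_grounded, rewrite
)
print(attempts, answer)
```

The first attempt retrieves nothing, so the answer fails the grounding check; the rewritten query succeeds on the second attempt.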

### **Resources for Self-RAG**
- 📄 **Notebooks:**  
  - [Self-RAG Example](https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_self_rag_mistral_nomic.ipynb)  

---

## **6. Impact of Long Context on Retrieval**
### **Challenges with Long Context**
- Large context windows **increase computation cost.**  
- **Irrelevant information** can dilute useful context.  
- **Token limits** on the context window force truncation, which can drop relevant passages and lower response quality.  

### **Solutions**
1. **Efficient Chunking:**  
   - Use **RecursiveCharacterTextSplitter**, which tries paragraph breaks first, then line breaks, then spaces, so chunks tend to end on natural boundaries.  

2. **Adaptive Context Selection:**  
   - Rank retrieved documents based on **semantic similarity** rather than just keyword matching.  
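
One simple way to keep a long context under control is to fill a fixed token budget from the top of the ranked list. The sketch below assumes the chunks are already ranked by relevance and counts tokens by whitespace; `select_context` is a hypothetical helper, and a real pipeline would count tokens with the model's own tokenizer.

```python
from typing import Callable

def select_context(
    ranked_chunks: list[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),
) -> list[str]:
    """Greedily keep the highest-ranked chunks that still fit in the token budget."""
    selected: list[str] = []
    used = 0
    for chunk in ranked_chunks:
        cost = count_tokens(chunk)
        if used + cost > max_tokens:
            continue  # skip chunks that do not fit; a shorter later chunk may still fit
        selected.append(chunk)
        used += cost
    return selected

ranked = ["one two three four", "five six seven eight nine", "ten eleven"]
picked = select_context(ranked, max_tokens=7)
print(picked)
```

With a budget of 7 tokens, the first chunk (4 tokens) is kept, the second (5 tokens) is skipped because it would exceed the budget, and the third (2 tokens) still fits.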

### **Resources on Long Context**
- 📺 **Deep Dive Video:** [YouTube](https://www.youtube.com/watch?v=SsHUNfhF32s)  
- 📑 **Slides:** [Google Slides](https://docs.google.com/presentation/d/1mJUiPBdtf58NfuSEQ7pVSEQ2Oqmek7F1i4gBwR6JDss/edit#slide=id.g26c0cb8dc66_0_0)  

---

# **Conclusion**
### **Key Takeaways**
- **RAG** enhances language models by incorporating external knowledge.  
- **Re-Ranking** improves retrieved document quality.  
- **CRAG** optimizes retrieval by filtering low-value context.  
- **Self-RAG** has the model critique its own retrieval and generation, retrying when the result is poorly supported.  
- **Long Context** can degrade retrieval efficiency but can be managed using **chunking and ranking techniques.**  

---

## **Further Reading & References**
-  **LangChain RAG Guide:** [LangChain Docs](https://python.langchain.com/docs/use_cases/question_answering)  
-  **LangGraph RAG Examples:** [GitHub](https://github.com/langchain-ai/langgraph/tree/main/examples/rag)  
-  **YouTube Lectures:**  
  - [CRAG Deep Dive](https://www.youtube.com/watch?v=E2shqsYwxck)  
  - [Impact of Long Context](https://www.youtube.com/watch?v=SsHUNfhF32s)