#### **Re-ranking methods in RAG**

#### **Overview**

- Reranking is a crucial step in Retrieval-Augmented Generation (RAG) systems that aims to improve the relevance and quality of retrieved documents.

- It involves reassessing and reordering initially retrieved documents to ensure that the most pertinent information is prioritized for subsequent processing or presentation.

#### **Motivation** 

- The primary motivation for reranking in RAG systems is to overcome the limitations of initial retrieval methods, which often rely on simpler similarity metrics such as embedding distance.

- Reranking allows for more sophisticated relevance assessment, taking into account nuanced relationships between queries and documents that might be missed by traditional retrieval techniques.

- This process aims to enhance the overall performance of RAG systems by ensuring that the most relevant information is used in the generation phase.

#### **Key Components** 

Reranking systems typically include the following components:

1) Initial Retriever: Often a vector store using embedding-based similarity search.

2) Reranking Model: One of the following:
    - A Large Language Model (LLM) prompted to score relevance
    - A Cross-Encoder model trained specifically for relevance assessment

3) Scoring Mechanism: A method that assigns a relevance score to each query-document pair.

4) Sorting and Selection Logic: Reorders the documents by their new scores and keeps the top-k.
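
The four components above can be sketched in a few lines. This is a minimal illustration, not the implementation used later in this notebook: `ScoredDoc`, `keyword_overlap_score`, and `rerank` are hypothetical names, and the keyword-overlap scorer is only a toy stand-in for an LLM or Cross-Encoder score.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class ScoredDoc:
    text: str
    score: float


def keyword_overlap_score(query: str, doc: str) -> float:
    """Toy scorer (assumption): fraction of query words that appear in the doc."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)


def rerank(
    query: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float] = keyword_overlap_score,
    top_k: int = 2,
) -> List[ScoredDoc]:
    """Score every candidate against the query, sort descending, keep top_k."""
    scored = [ScoredDoc(text=doc, score=score_fn(query, doc)) for doc in candidates]
    # list.sort is stable, so tied documents keep the initial retriever's order
    scored.sort(key=lambda d: d.score, reverse=True)
    return scored[:top_k]


# Candidates as they might come back from the initial retriever
candidates = [
    "Economic transformation and green jobs",
    "Climate change refers to long-term changes in the global climate",
    "Wind patterns and precipitation",
]
top_docs = rerank("what is climate change", candidates, top_k=2)
for d in top_docs:
    print(f"{d.score:.2f}  {d.text}")
```

Swapping `score_fn` for an LLM-based or Cross-Encoder scorer changes the scoring mechanism without touching the sorting and selection logic.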

#### **Benefits of this Approach** 

- Improved Relevance: By using more sophisticated models, reranking can capture subtle relevance factors.

- Flexibility: Different reranking methods can be applied based on specific needs and resources.

- Enhanced Context Quality: Providing more relevant documents to the RAG system improves the quality of generated responses.

- Reduced Noise: Reranking helps filter out less relevant information, focusing on the most pertinent content.

#### **Conclusion** 

- Reranking can substantially improve the quality of the information retrieved in a RAG system.

- Whether using LLM-based scoring or specialized Cross-Encoder models, reranking allows for more nuanced and accurate assessment of document relevance.

- More relevant context generally leads to better generated answers, which is why reranking is a common component of advanced RAG implementations.

---

#### **LLM used**

In [1]:
from langchain_ollama import ChatOllama 

llm = ChatOllama(
    model='llama3.2',
    temperature=0,
    verbose=True
)

llm.invoke("Hey What are you doing right now")



AIMessage(content='I\'m just a language model, I don\'t have personal experiences or emotions like humans do. However, I am currently:\n\n1. Processing your question and generating a response.\n2. Waiting for any additional input from you to continue our conversation.\n3. Running on computer servers, responding to queries from users like you.\n\nIn other words, I\'m always "on" and ready to help with any questions or topics you\'d like to discuss!', additional_kwargs={}, response_metadata={'model': 'llama3.2', 'created_at': '2025-12-22T17:00:13.977262Z', 'done': True, 'done_reason': 'stop', 'total_duration': 25815412875, 'load_duration': 3755575375, 'prompt_eval_count': 32, 'prompt_eval_duration': 12692930000, 'eval_count': 90, 'eval_duration': 6515591207, 'logprobs': None, 'model_name': 'llama3.2', 'model_provider': 'ollama'}, id='lc_run--019b4700-b82b-7da0-a509-41619d6fd3f8-0', usage_metadata={'input_tokens': 32, 'output_tokens': 90, 'total_tokens': 122})

--- 

#### **Embedding model**

In [2]:
from langchain_huggingface import HuggingFaceEmbeddings
import time 

# this is all-MiniLM-L6-v2 model 
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

text = "This is a test document."

start = time.time()
query_result = embedding_model.embed_query(text)
total_time = time.time() - start
print(f"Embedding dimension : {len(query_result)}")
print(f"Time taken to convert text to embedding : {total_time:.2f} sec")
# show only the first 100 characters of the stringified vector
print(str(query_result)[:100] + "...")

Embedding dimension : 384
Time taken to convert text to embedding : 0.40 sec
[-0.0383385606110096, 0.1234646886587143, -0.02864295430481434, 0.05365273356437683, 0.0088453618809...


---

#### **Load the Data**

In [9]:
from langchain_community.document_loaders import PyPDFLoader 

file_path = '../data/Understanding_Climate_Change.pdf'

loader = PyPDFLoader(file_path)
docs = loader.load()

print(f"Number of docs : {len(docs)}")

total_words = 0

for doc in docs:
    # split on single spaces only, so newlines do not count as word boundaries
    total_words += len(doc.page_content.split(' '))

print(f"Average count of words in a doc : {round(total_words/len(docs))}")



Number of docs : 33
Average count of words in a doc : 280


--- 

#### **Creating Chunks**

In [11]:
from langchain_text_splitters import RecursiveCharacterTextSplitter 

text_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=50)

chunks = text_splitter.split_documents(docs)

print(f"Total number of chunks : {len(chunks)}")

Total number of chunks : 215


--- 

#### **Creating Vector Store and Retriever**

In [13]:
import faiss
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores import FAISS

index = faiss.IndexFlatL2(len(embedding_model.embed_query("hello world")))

vector_store = FAISS(
    embedding_function=embedding_model,
    index=index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={},
)

## adding chunks to the vectorstore 
vector_store.add_documents(chunks)

['9d50aa77-fa64-4b69-baee-22f8c92024bb',
 '0fc9495f-c7bb-40d5-9949-cbcb85f81047',
 '560d65b6-ab1d-426f-8892-aa8ea796a137',
 '7202f5dc-411a-4ed7-8d4e-bbbf0b7177b5',
 '185e4018-051e-4e4a-b7cc-f7e2c4566096',
 '8b08e520-7c52-4f7f-91cc-ac1625f7a3ac',
 'eec75510-9fcc-4e60-bb16-9c1d9ba3c5f0',
 '226d47a1-fcd5-4e0f-83e8-96aec466e03d',
 '96085cff-ceb3-4cec-a331-c1689c1adad9',
 'd130f4bc-07cb-4e03-89e5-b51eeaa9060b',
 'b3794d0a-4c71-4806-836e-dc9f1c905eca',
 '55db4db7-90e6-450e-b2eb-5444bd11e8c8',
 '2bdbfc88-5c95-4f81-acb5-cd39c690cd43',
 '0d6dcc17-d260-4378-832c-27c0c02df320',
 'e0946cb3-75a9-4596-9131-706a2fc50c01',
 '59fc0849-ec1e-4984-8d55-bcf140b473d0',
 'c9a293aa-1044-4fff-9992-e20252c2c601',
 '21406c96-3df8-451f-83cd-5b2d8695a94a',
 '4aaee346-d0b4-45fa-b9cf-0883b001eeac',
 '04bd0f73-9521-4b47-9da1-581a2fd04bce',
 '7069a6d1-5d66-402a-8cdf-4c3a778118a7',
 'd379ead9-2572-49ad-bbae-c83966347fa8',
 '79f7c8e7-c6cb-4d2d-a92f-d74e758ca6b9',
 'e771ec80-f967-4579-b738-6148362a3d63',
 '6345e7d7-42a8-

In [14]:
# adding a retriever

retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 2})

# try the retriever 
relevant_docs = retriever.invoke("What is Climate Change?")

for doc in relevant_docs:
    print("="*89)
    print(doc.page_content)

=========================================================================================
Understanding Climate Change 
Chapter 1: Introduction to Climate Change 
Climate change refers to significant, long-term changes in the global climate. The term 
"global climate" encompasses the planet's overall weather patterns, including temperature, 
precipitation, and wind patterns, over an extended period. Over the past century, human
=========================================================================================
Chapter 14: Climate Change and the Economy 
Economic Transformation


---

#### **Two types of Reranking**

- **LLM based**
    - Uses only an LLM with a scoring prompt that takes the query and a document.
    - The docs are sorted by the assigned scores.
    - The top-k docs are returned.

    - Cons:
        - Expensive 💰
        - Slow ⏳
        - Hard to scale (O(K) LLM calls)

- **CrossEncoder reranker**
    - A neural model trained explicitly for reranking that encodes query and document together.
    - Pros
        - Much faster than general LLMs
        - Cheaper
        - Very strong ranking accuracy
        - Deterministic scores
    - Cons
        - Still O(K) forward passes
        - Less flexible than promptable LLMs
        - Limited context window
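
As a rough illustration of the Cross-Encoder variant, the sketch below scores all K (query, document) pairs in a single batched `predict` call and then sorts. The helper names `cross_encoder_rerank` and `load_cross_encoder` are hypothetical, and the checkpoint `cross-encoder/ms-marco-MiniLM-L-6-v2` is an assumed choice, not necessarily the model used in this notebook. The model is passed in as an argument, so any object with a compatible `predict` method works.

```python
from typing import List, Protocol, Sequence, Tuple


class PairScorer(Protocol):
    """Anything exposing a CrossEncoder-style predict(pairs) method."""

    def predict(self, pairs: Sequence[Tuple[str, str]]) -> Sequence[float]: ...


def cross_encoder_rerank(
    query: str,
    docs: Sequence[str],
    model: PairScorer,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (doc, score) pairs, highest score first."""
    if not docs:
        return []
    # One (query, doc) pair per candidate: still K forward passes, but batched
    pairs = [(query, doc) for doc in docs]
    scores = model.predict(pairs)
    ranked = sorted(
        zip(docs, (float(s) for s in scores)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top_k]


def load_cross_encoder(model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2"):
    """Assumed setup: requires sentence-transformers and downloads the model."""
    from sentence_transformers import CrossEncoder  # imported lazily on purpose

    return CrossEncoder(model_name)
```

With the library installed, usage could look like `cross_encoder_rerank("What is Climate Change?", [d.page_content for d in relevant_docs], load_cross_encoder(), top_k=2)`. A common pattern is to retrieve more candidates than needed from the vector store and let the reranker cut them down to the final top-k.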

---

**LLM Based**

In [None]:
from pydantic import BaseModel