#### **Re-ranking methods in RAG**

#### **Overview** 

- Reranking is a crucial step in Retrieval-Augmented Generation (RAG) systems that aims to improve the relevance and quality of retrieved documents.

- It involves reassessing and reordering initially retrieved documents to ensure that the most pertinent information is prioritized for subsequent processing or presentation.

#### **Motivation** 

- The primary motivation for reranking in RAG systems is to overcome limitations of initial retrieval methods, which often rely on simple similarity metrics such as embedding distance.

- Reranking allows for more sophisticated relevance assessment, taking into account nuanced relationships between queries and documents that might be missed by traditional retrieval techniques.

- This process aims to enhance the overall performance of RAG systems by ensuring that the most relevant information is used in the generation phase.

#### **Key Components** 

Reranking systems typically include the following components:

1) Initial Retriever: Often a vector store using embedding-based similarity search.

2) Reranking Model: Typically one of the following:
    - A Large Language Model (LLM) prompted to score relevance
    - A Cross-Encoder model specifically trained for relevance assessment

3) Scoring Mechanism: A method to assign relevance scores to documents.

4) Sorting and Selection Logic: Reorders documents by the new scores and keeps the top-k.
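
The four components can be sketched end to end in plain Python. This is a minimal illustration rather than the notebook's implementation: `retrieve`, `keyword_overlap_score`, and `rerank` are hypothetical names, and the word-overlap scorer is only a stand-in for an LLM or Cross-Encoder.

```python
from typing import Callable, List, Tuple


def retrieve(query: str, corpus: List[str], k: int) -> List[str]:
    """Stand-in initial retriever: returns the first k documents, unranked."""
    return corpus[:k]


def keyword_overlap_score(query: str, doc: str) -> float:
    """Toy scoring mechanism: fraction of query words that appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words) if query_words else 0.0


def rerank(
    query: str,
    docs: List[str],
    score_fn: Callable[[str, str], float],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Sorting and selection logic: score every candidate, sort, keep top_k."""
    scored = [(doc, score_fn(query, doc)) for doc in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


corpus = [
    "Cities face heat waves",
    "Climate change is a long-term shift in global climate",
    "Oceans absorb carbon dioxide",
]
candidates = retrieve("what is climate change", corpus, k=3)
top = rerank("what is climate change", candidates, keyword_overlap_score, top_k=1)
print(top)
```

Because the scorer is passed in as `score_fn`, the same sorting and selection code works whether the scores come from an LLM, a Cross-Encoder, or a toy function like the one here.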

#### **Benefits of this Approach** 

- Improved Relevance: By using more sophisticated models, reranking can capture subtle relevance factors.

- Flexibility: Different reranking methods can be applied based on specific needs and resources.

- Enhanced Context Quality: Providing more relevant documents to the RAG system improves the quality of generated responses.

- Reduced Noise: Reranking helps filter out less relevant information, focusing on the most pertinent content.

#### **Conclusion** 

- Reranking is a powerful technique in RAG systems that significantly enhances the quality of retrieved information. 

- Whether using LLM-based scoring or specialized Cross-Encoder models, reranking allows for more nuanced and accurate assessment of document relevance.

- This improved relevance typically leads to better downstream generation, at the cost of extra latency and compute per query, which makes reranking a common component in advanced RAG implementations.

---

#### **LLM used**

In [24]:
from langchain_ollama import ChatOllama 

llm = ChatOllama(
    model='llama3.2',
    temperature=0,
    verbose=True
)

llm.invoke("Hey What are you doing right now")

AIMessage(content='I\'m just a language model, I don\'t have personal experiences or emotions like humans do. However, I am currently:\n\n1. Processing your question and preparing to respond.\n2. Running on computer servers, responding to queries from users like you.\n3. Continuously learning and improving my knowledge base through machine learning algorithms.\n\nI\'m always "on" and ready to help with any questions or topics you\'d like to discuss! What about you? What are you doing right now?', additional_kwargs={}, response_metadata={'model': 'llama3.2', 'created_at': '2025-12-25T15:16:15.715686Z', 'done': True, 'done_reason': 'stop', 'total_duration': 23676567541, 'load_duration': 1218095333, 'prompt_eval_count': 32, 'prompt_eval_duration': 15533685625, 'eval_count': 98, 'eval_duration': 6921936541, 'logprobs': None, 'model_name': 'llama3.2', 'model_provider': 'ollama'}, id='lc_run--019b5614-a438-7fc2-b85d-be2d0456d34c-0', usage_metadata={'input_tokens': 32, 'output_tokens': 98, 't

--- 

#### **Embedding model**

In [25]:
from langchain_huggingface import HuggingFaceEmbeddings
import time 

# this is all-MiniLM-L6-v2 model 
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

text = "This is a test document."

start = time.time()
query_result = embedding_model.embed_query(text)
total_time = time.time() - start
print(f"Length of text embedding : {len(query_result)}")
print(f"Time taken to convert text to embedding : {total_time :.2f} sec")
# show only the first 100 characters of the stringified vector
print(str(query_result)[:100] + "...")

Length of text embedding : 384
Time taken to convert text to embedding : 0.12 sec
[-0.0383385606110096, 0.1234646886587143, -0.02864295430481434, 0.05365273356437683, 0.0088453618809...


---

#### **Load the Data**

In [26]:
from langchain_community.document_loaders import PyPDFLoader 

file_path = '../data/Understanding_Climate_Change.pdf'

loader = PyPDFLoader(file_path)
docs = loader.load()

print(f"Number of docs : {len(docs)}")

avg_doc_words = 0

for doc in docs:
    total_words_in_doc = len(doc.page_content.split(' '))
    avg_doc_words += total_words_in_doc

print(f"Average count of words in a doc : {round(avg_doc_words/len(docs))}")



Number of docs : 33
Average count of words in a doc : 280


--- 

#### **Creating Chunks**

In [27]:
from langchain_text_splitters import RecursiveCharacterTextSplitter 

text_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=50)

chunks = text_splitter.split_documents(docs)

print(f"Total number of chunks : {len(chunks)}")

Total number of chunks : 215
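
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch using fixed-size character windows. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter tries to break on natural separators first); `sliding_chunks` is a hypothetical helper that only illustrates how the two parameters interact.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters, each sharing
    `overlap` characters with the previous window."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


print(sliding_chunks("abcdefghij", chunk_size=4, overlap=1))
```

Each window starts `chunk_size - overlap` characters after the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.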


--- 

#### **Creating Vectorstore and retriever**

In [28]:
import faiss
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores import FAISS

index = faiss.IndexFlatL2(len(embedding_model.embed_query("hello world")))

vector_store = FAISS(
    embedding_function=embedding_model,
    index=index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={},
)

## adding chunks to the vectorstore 
vector_store.add_documents(chunks)

['f1b1f535-9e42-4bd6-aafe-d809c0619f62',
 '69b45fba-0a4c-494f-bb26-fc84d70aa0eb',
 'bf64dec9-a6c0-43a2-8613-92ca6c9a5077',
 '08a4a5af-6aad-4185-beb8-543c62bdc6d3',
 '6f5cd744-8cfb-4829-a1dd-7c74ab5d09fa',
 '8e7372fa-9222-477f-975b-a65ea687448a',
 '6e532d39-f85a-40bb-acb0-ec39dccf6482',
 '2ae80f52-ddbb-4f6e-8e02-ea055aefd6dc',
 'd658f334-8c6c-40f4-9b12-19941954bfe4',
 '779b3bbe-31e6-4210-b090-07cad85e142a',
 'adb106a3-4beb-424b-901f-5fc1ead6372d',
 'f6dc73cc-e5d6-458f-a4b4-ea654c1913b9',
 '0c788f6b-2c04-44cf-bbaf-e802b4527719',
 '97f8f9de-e9a8-4ef5-a17e-c29d01d4e980',
 '1e08b214-c9a5-4881-b44a-ce1eb70229fc',
 'd27147f0-bbbd-406c-8bb7-708be0851dad',
 '47762291-d8d6-45c6-9483-0c6e7991e63a',
 '02be4837-936e-4300-b7f4-d72a478af01b',
 '1ee63439-cb9a-4f6a-91b1-98e154f6a227',
 'f614f4f7-ea7e-43b8-87ed-ff187e568653',
 '6c1a9f2e-0dfd-42c7-ab4c-c981655d8de1',
 '5d32612f-a304-4199-8965-5f4d4d988d76',
 '66deb4c6-0725-4995-b73e-7354022f2bee',
 '7836ec97-9949-47bb-a7df-4c7af09f11ab',
 'e3845b00-9be7-

In [29]:
# adding a retriever

retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 5})

# try the retriever 
relevant_docs = retriever.invoke("What is Climate Change?")

for doc in relevant_docs:
    print("="*89)
    print(doc.page_content)

Understanding Climate Change 
Chapter 1: Introduction to Climate Change 
Climate change refers to significant, long-term changes in the global climate. The term 
"global climate" encompasses the planet's overall weather patterns, including temperature, 
precipitation, and wind patterns, over an extended period. Over the past century, human
Chapter 14: Climate Change and the Economy 
Economic Transformation
and infrastructure. Cities are particularly vulnerable due to the "urban heat island" effect. 
Heatwaves can lead to heat-related illnesses and exacerbate existing health conditions. 
Changing Seasons 
Climate change is altering the timing and length of seasons, affecting ecosystems and human 
activities. For example, spring is arriving earlier, and winters are becoming shorter and
of biodiversity and disrupt ecological balance. 
Marine Ecosystems 
Marine ecosystems are highly vulnerable to climate change. Rising sea temperatures, ocean 
acidification, and changing currents affect ma
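
The retriever above uses `search_type="mmr"` (maximal marginal relevance), which trades relevance to the query against redundancy among the documents already selected. The sketch below shows the idea on plain Python lists. It is an illustration, not the FAISS or LangChain implementation: `cosine` and `mmr_select` are hypothetical helpers, and only the parameter name `lambda_mult` is borrowed from LangChain's MMR search options.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(
    query_vec: Sequence[float],
    doc_vecs: List[Sequence[float]],
    k: int,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Greedily pick k document indices, balancing relevance and diversity."""
    selected: List[int] = []
    candidates = list(range(len(doc_vecs)))

    def mmr_score(i: int) -> float:
        relevance = cosine(query_vec, doc_vecs[i])
        # similarity to the closest document that was already picked
        redundancy = max(
            (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


query_vec = [1.0, 1.0]
doc_vecs = [[1.0, 0.9], [1.0, 0.8], [0.2, 1.0]]  # docs 0 and 1 are near-duplicates
print(mmr_select(query_vec, doc_vecs, k=2))                    # balanced
print(mmr_select(query_vec, doc_vecs, k=2, lambda_mult=1.0))   # relevance only
```

With `lambda_mult=1.0` the redundancy term vanishes and the selection is a plain similarity ranking. Lower values favor diversity, which is one reason an MMR retriever can return passages that are only loosely related to the query and why a reranking step afterwards is useful.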

---

#### **Two types of Reranking**

- **LLM based**
    - An LLM is given a scoring prompt containing the query and one document.
    - The documents are sorted by the scores the LLM assigns.
    - The top-k documents are kept.

    - Cons : 
        - Expensive 💰
        - Slow ⏳
        - Hard to scale (O(K) LLM calls)

- **CrossEncoder reranker**
    - A neural model trained explicitly for reranking that encodes query and document together.
    - Pros
        - Much faster than general LLMs
        - Cheaper
        - Very strong ranking accuracy
        - Deterministic scores
    - Cons
        - Still O(K) forward passes
        - Less flexible than promptable LLMs
        - Limited context window
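
This notebook only implements the LLM-based variant, so here is a hedged sketch of what the Cross-Encoder variant could look like. It assumes the `sentence-transformers` package and the public `cross-encoder/ms-marco-MiniLM-L-6-v2` checkpoint; neither is used elsewhere in this notebook. `rerank_with_pair_scorer` and `load_cross_encoder_scorer` are hypothetical helper names, and the scorer is passed in as a callable so the ranking logic can be exercised without downloading a model.

```python
from typing import Callable, List, Sequence, Tuple

# A pair scorer takes a list of (query, document) pairs and returns one score per pair.
PairScorer = Callable[[List[Tuple[str, str]]], Sequence[float]]


def rerank_with_pair_scorer(
    query: str, docs: List[str], predict: PairScorer, top_k: int
) -> List[Tuple[str, float]]:
    """Score every (query, doc) pair in one batch and keep the top_k documents."""
    if not docs:
        return []
    pairs = [(query, doc) for doc in docs]
    scores = predict(pairs)
    ranked = sorted(zip(docs, scores), key=lambda item: item[1], reverse=True)
    return [(doc, float(score)) for doc, score in ranked[:top_k]]


def load_cross_encoder_scorer(
    model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2",
) -> PairScorer:
    """Assumption: sentence-transformers is installed and the model can be downloaded."""
    from sentence_transformers import CrossEncoder  # imported lazily on purpose

    model = CrossEncoder(model_name)
    return model.predict


# Example with a fake scorer (document length) so the sketch runs offline.
fake_predict = lambda pairs: [len(doc) for _, doc in pairs]
print(rerank_with_pair_scorer("q", ["a", "abc", "ab"], fake_predict, top_k=2))
```

With the real model, a call might look like `rerank_with_pair_scorer(query, [d.page_content for d in retrieved_docs], load_cross_encoder_scorer(), top_k=2)`. The raw scores are model-specific relevance values, not the 0 to 10 scale used in the LLM prompt, so only their ordering is comparable.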

---

**LLM Based**

In [30]:
from pydantic import BaseModel, Field
from typing import Annotated 
from langchain_core.prompts import PromptTemplate

# Data Validation
class RerankerDataModel(BaseModel):
    """ 
    Assigns a score on scale of 0 to 10 for the relevance of query to document.
    """
    score: Annotated[int, Field(description="a score on scale of 0 to 10 for the relevance of query to document")]

# configuring LLM with structured output 
scoring_llm = llm.with_structured_output(RerankerDataModel)

# prompt template for scoring 
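scoring_template_content = """
    You are a good evaluator. 
    You are provided with a query : {query}
    and a document : {doc}.

    Score how relevant the document is to the query on a scale of 0 to 10,
    where 0 means irrelevant and 10 means it fully answers the query.
"""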

scoring_template = PromptTemplate(
    template=scoring_template_content,
    input_variables=['query', 'doc']
)

scoring_chain = scoring_template | scoring_llm 


In [35]:
# let's score each retrieved document with the LLM 
query = "What is Climate Change?"

retrieved_docs = retriever.invoke(query)

docs_scores = []
for doc in retrieved_docs:
    score_result = scoring_chain.invoke({'query' : query, 'doc' : doc.page_content})
    doc_score = (doc.page_content, score_result.score)
    docs_scores.append(doc_score)
    print(f"Doc : {doc.page_content[:50]}")
    print(f"Score : {score_result.score}")
    print("="*89)

Doc : Understanding Climate Change 
Chapter 1: Introduct
Score : 8
Doc : Chapter 14: Climate Change and the Economy 
Econom
Score : 8
Doc : and infrastructure. Cities are particularly vulner
Score : 6
Doc : of biodiversity and disrupt ecological balance. 
M
Score : 6
Doc : Mitigation involves reducing or preventing the emi
Score : 6
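
The loop above makes one LLM call per document, one after another. The number of calls stays at K, but they can overlap in time if the model backend accepts parallel requests. Below is a minimal sketch using a thread pool; `score_concurrently` is a hypothetical helper, and `score_fn` stands in for something like `lambda q, d: scoring_chain.invoke({'query': q, 'doc': d}).score`. LangChain runnables also provide a `batch` method that may serve the same purpose.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def score_concurrently(
    query: str,
    docs: List[str],
    score_fn: Callable[[str, str], int],
    max_workers: int = 4,
) -> List[Tuple[str, int]]:
    """Score documents in parallel threads; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as `docs`
        scores = list(pool.map(lambda doc: score_fn(query, doc), docs))
    return list(zip(docs, scores))


# Fake scorer (document length) so the sketch runs without an LLM.
print(score_concurrently("q", ["a", "bbb"], lambda q, d: len(d)))
```

Whether this actually reduces latency depends on the backend: a local Ollama server may still process requests one at a time.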


In [45]:
## rerank the docs 
rerank_docs = sorted(docs_scores, key=lambda x: x[1], reverse=True)

for i, doc in enumerate(rerank_docs):
    print(f"Doc : {i+1}, Score : {doc[1]}")

Doc : 1, Score : 8
Doc : 2, Score : 8
Doc : 3, Score : 6
Doc : 4, Score : 6
Doc : 5, Score : 6


Now we can pass the top-k documents from the reranked list to the generation step.
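
Selecting the top-k can also be done without sorting the full list. The sketch below uses `heapq.nlargest` from the standard library; `top_k_by_score` is a hypothetical helper, the scores mirror the run above, and the document labels are placeholders.

```python
import heapq
from typing import List, Tuple

# (document label, score) pairs in retrieval order
docs_scores = [("doc 1", 8), ("doc 2", 8), ("doc 3", 6), ("doc 4", 6), ("doc 5", 6)]


def top_k_by_score(scored_docs: List[Tuple[str, int]], k: int) -> List[Tuple[str, int]]:
    """Return the k highest-scoring pairs, best first."""
    return heapq.nlargest(k, scored_docs, key=lambda pair: pair[1])


top_two = top_k_by_score(docs_scores, 2)
print(top_two)
```

With integer scores on a 0 to 10 scale, ties are common. Both `sorted(..., reverse=True)` and `heapq.nlargest` keep tied documents in their original retrieval order, so the initial retriever's ranking acts as the tie-breaker.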

#### **Custom Retriever which contains Reranking logic**

In [46]:
class CustomRetriever:
    def __init__(self, retriever, top_k=1):
        self.retriever = retriever 
        self.top_k = top_k

    def _get_retrieved_docs(self, query):
        retrieved_docs = self.retriever.invoke(query)
        return retrieved_docs 
    
    def _get_relevancy_score(self, query, docs):
        score_docs = []
        for doc in docs:
            score_result = scoring_chain.invoke({'query' : query, 'doc' : doc.page_content})
            doc_score = (doc.page_content, score_result.score)
            score_docs.append(doc_score)
        
        return score_docs 
    
    def _get_reranked_docs(self, score_docs):
        reranked_docs = sorted(score_docs, key=lambda x: x[1], reverse=True)
        return reranked_docs[:self.top_k]
    
    def main(self, query):
        retrieved_docs = self._get_retrieved_docs(query)
        score_docs = self._get_relevancy_score(query, retrieved_docs)
        reranked_docs = self._get_reranked_docs(score_docs)
        
        return reranked_docs 
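
`CustomRetriever` reads `scoring_chain` from the notebook's global scope and keeps only `page_content`, so document metadata such as page numbers is lost. One possible variation, sketched below under stated assumptions, injects the scorer as a callable and keeps the whole document object. `Passage` is a hypothetical stand-in for a LangChain `Document`, and `InjectedRerankRetriever` is an illustrative name, not part of any library.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Passage:
    """Hypothetical stand-in for a LangChain Document (text plus metadata)."""
    page_content: str
    metadata: Dict[str, object] = field(default_factory=dict)


class InjectedRerankRetriever:
    """Retrieve, score, and rerank with both dependencies passed in explicitly."""

    def __init__(
        self,
        retrieve_fn: Callable[[str], List[Passage]],
        score_fn: Callable[[str, str], float],
        top_k: int = 1,
    ):
        self.retrieve_fn = retrieve_fn
        self.score_fn = score_fn
        self.top_k = top_k

    def invoke(self, query: str) -> List[Tuple[Passage, float]]:
        candidates = self.retrieve_fn(query)
        scored = [(p, self.score_fn(query, p.page_content)) for p in candidates]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[: self.top_k]


# Fake retriever and scorer so the sketch runs without a vector store or an LLM.
passages = [Passage("short", {"page": 1}), Passage("a longer passage", {"page": 7})]
reranker = InjectedRerankRetriever(
    retrieve_fn=lambda q: passages,
    score_fn=lambda q, text: float(len(text)),
    top_k=1,
)
best_passage, best_score = reranker.invoke("any query")[0]
print(best_passage.metadata, best_score)
```

In the notebook's setting, `retrieve_fn` could be `retriever.invoke` and `score_fn` a small wrapper around `scoring_chain`, which also makes it easy to swap in a Cross-Encoder scorer later.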


In [65]:
from langchain_core.prompts import PromptTemplate

class RAG:
    def __init__(self, llm, custom_retriever):
        self.llm = llm
        self.retriever = custom_retriever

    def get_template(self):
        template_text = """ 
                        You are a helpful assistant. Answer the user's query using the provided context.
                        Query: {query}
                        Context: {context}
                        """
        prompt_template = PromptTemplate(
            template=template_text,
            input_variables=['query', 'context']
        )

        return prompt_template
    
    def rag_chain(self, query, context):
        prompt = self.get_template()

        chain = prompt | self.llm 
        response = chain.invoke({'query' : query, 'context' : context})
        return response

    def main(self, query):
        retrieved_docs = self.retriever.main(query)
        
        context = ""
        for doc in retrieved_docs:
            context += doc[0]
            context += "\n"
        
        response = self.rag_chain(query, context)
        return response.content
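
The context-building loop in `RAG.main` can be factored into a small helper that joins passages with a clear separator and caps the total length, which helps when `top_k` grows. This is a sketch only: `build_context`, its `separator`, and the `max_chars` budget are hypothetical and not part of the notebook.

```python
from typing import Iterable, Tuple


def build_context(
    scored_docs: Iterable[Tuple[str, float]],
    separator: str = "\n\n",
    max_chars: int = 2000,
) -> str:
    """Join passage texts (best first) until the character budget is used up."""
    parts = []
    used = 0
    for text, _score in scored_docs:
        text = text.strip()
        # count the separator only when a passage already precedes this one
        extra = len(text) + (len(separator) if parts else 0)
        if used + extra > max_chars:
            break
        parts.append(text)
        used += extra
    return separator.join(parts)


print(build_context([("first passage ", 8), ("second passage", 6)]))
```

Because the input is already sorted by score, any passages dropped by the budget are the least relevant ones.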


In [66]:
if __name__ == "__main__":
    custom_retriever = CustomRetriever(retriever, top_k=2)
    rag = RAG(llm, custom_retriever)
    query = input("Write your query : ")
    print(f"Query : {query}")
    response = rag.main(query)
    print("="*89)
    print(f"Response : {response}")

Query : What is Climate Chage?
Response : I'd be happy to help you understand Climate Change.

So, according to the provided context, Climate Change refers to significant, long-term changes in the global climate. This includes changes in:

1. Temperature
2. Precipitation (amount of rainfall or snowfall)
3. Wind patterns

These changes occur over an extended period, and they are considered "significant" because they have a substantial impact on our planet's weather patterns.

In simple terms, Climate Change is about the long-term shifts in the Earth's climate system, which can be caused by human activities and natural factors.

Would you like to know more about the causes of Climate Change or its effects?
