# RAG Fusion Implementation: A Guide to Advanced Retrieval-Augmented Generation

<a href="https://github.com/adithya-s-k/AI-Engineering.academy">
<img src="https://raw.githubusercontent.com/adithya-s-k/AI-Engineering.academy/main/assets/banner.png" width="50%">
</a>

Welcome to the RAG Fusion Implementation guide! This notebook is designed to introduce you to the concept of RAG Fusion, an advanced technique that enhances the traditional Retrieval-Augmented Generation (RAG) approach. We'll provide a step-by-step walkthrough of implementing a RAG Fusion system.

## Introduction

RAG Fusion builds on Retrieval-Augmented Generation (RAG). Instead of retrieving with the user's query alone, it asks a language model to generate several related queries (query expansion), runs a retrieval for each of them, and merges the ranked result lists with Reciprocal Rank Fusion (RRF) before the final answer is generated. The aim is to cover more aspects of the user's intent and to give the model more diverse, relevant context than a single retrieval provides.

This notebook aims to provide a clear and comprehensive introduction to RAG Fusion, suitable for those who are familiar with basic RAG and want to explore more advanced implementations.

## Getting Started

To get the most out of this notebook, you should have a good understanding of Python and be familiar with basic RAG concepts. Don't worry if some advanced ideas are new to you – we'll guide you through each step of the RAG Fusion process!

### Prerequisites

- Python 3.9+
- Jupyter Notebook or JupyterLab
- Familiarity with basic RAG concepts
- Understanding of vector databases and embeddings
- Basic knowledge of natural language processing (NLP) concepts

## Notebook Contents

Our notebook is structured into the following main sections:

1. **Environment Set Up**: We'll guide you through setting up your Python environment with all the necessary libraries and dependencies for RAG Fusion.

2. **Query Expansion**: Learn how to generate multiple queries from a single user input to capture different aspects of the user's intent.

3. **Multiple Retrievals**: Understand how to perform and manage multiple retrieval operations using various queries.

4. **Reciprocal Rank Fusion (RRF)**: Dive into the RRF algorithm and how it's used to combine results from multiple retrievals effectively.

5. **Context Selection and Reranking**: Explore techniques for selecting diverse and relevant contexts from the fused retrieval results.

6. **Enhanced Prompting**: Learn how to craft effective prompts that leverage the diverse contexts obtained through RAG Fusion.

7. **RAG Fusion Pipeline**: We'll walk you through the process of setting up a complete RAG Fusion pipeline, integrating all the components.

8. **Advanced Topics and Optimizations**: Explore advanced concepts like dynamic query expansion, adaptive fusion techniques, and performance optimizations.

By the end of this notebook, you'll have a solid understanding of RAG Fusion and be able to implement this advanced technique in your own projects. Let's dive in and explore the cutting edge of Retrieval-Augmented Generation!

## Set up environment

In [50]:
# Import necessary libraries
import os
from typing import List, Dict
from dotenv import load_dotenv
from IPython.display import Markdown, display

# OpenAI import
from openai import OpenAI

# LlamaIndex imports
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Document
from llama_index.core import Settings
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.embeddings.fastembed import FastEmbedEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.base.llms.types import ChatMessage, MessageRole

# Qdrant client import
import qdrant_client

In [51]:
# Load environment variables
load_dotenv()

# Get OpenAI API key from environment variables
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
if OPENAI_API_KEY is None:
    raise Exception("No OpenAI API key found. Please set it as an environment variable.")

# Initialize OpenAI client
client = OpenAI(api_key=OPENAI_API_KEY)

# Set the embedding model
Settings.embed_model = FastEmbedEmbedding(model_name="BAAI/bge-base-en-v1.5")

## Function to generate queries using OpenAI's ChatGPT


In [52]:
def generate_queries_chatgpt(original_query: str) -> List[str]:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant that generates multiple search queries based on a single input query."},
            {"role": "user", "content": f"Generate multiple search queries related to: {original_query}"},
            {"role": "user", "content": "OUTPUT (4 queries):"}
        ]
    )
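    # Split the completion into one query per line and drop blank lines,
    # so that empty strings are never sent to the retriever
    lines = response.choices[0].message.content.strip().split("\n")
    generated_queries = [line.strip() for line in lines if line.strip()]
    return generated_queries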
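
The model usually returns its queries as a numbered or bulleted list, so the list markers end up inside the query strings (the output further down shows a query such as `'1. Samarth CGPA transcript'`). The sketch below shows one possible way to clean the raw completion before searching. The helper name `clean_generated_queries` and the sample text are hypothetical and not part of the notebook.

```python
import re
from typing import List


def clean_generated_queries(raw_text: str, max_queries: int = 4) -> List[str]:
    """Turn a raw LLM completion into a list of bare query strings."""
    queries: List[str] = []
    for line in raw_text.splitlines():
        # Drop list markers such as "1. ", "2) ", "- " or "* " at the start of a line.
        line = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line)
        # Drop surrounding whitespace and quote characters.
        line = line.strip().strip("\"'")
        # Skip blank lines and exact duplicates.
        if line and line not in queries:
            queries.append(line)
    return queries[:max_queries]


# Made-up completion text, used only to exercise the helper.
raw = '1. Samarth CGPA transcript\n\n2. "Samarth academic record"\n- Samarth grades'
cleaned = clean_generated_queries(raw)
print(cleaned)
```

Whether to strip the markers is a design choice: the embedding model tolerates them, but clean queries make the debug output easier to read.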

## Function to perform vector search


In [53]:
def vector_search(query: str, index: VectorStoreIndex) -> Dict[str, float]:
    retriever = index.as_retriever(similarity_top_k=5)
    nodes = retriever.retrieve(query)
    return {node.node.get_content(): node.score for node in nodes}
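
`vector_search` returns a dictionary that maps each retrieved chunk's text to its similarity score. Because the dictionary is keyed by chunk text, two chunks with identical text collapse into one entry. As a rough illustration of what a top-k similarity search does, here is a toy version in plain Python. It assumes cosine similarity and uses made-up two-dimensional vectors; the real retriever uses the embedding model and the distance metric configured in the vector store.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def toy_vector_search(
    query_vec: List[float], chunk_vecs: Dict[str, List[float]], top_k: int = 2
) -> Dict[str, float]:
    """Return the top_k chunks by similarity, in the same shape as vector_search."""
    scores = {text: cosine_similarity(query_vec, vec) for text, vec in chunk_vecs.items()}
    best = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]
    return dict(best)


# Hypothetical chunk embeddings, for illustration only.
chunks = {
    "education chunk": [1.0, 0.0],
    "projects chunk": [0.0, 1.0],
    "skills chunk": [1.0, 1.0],
}
result = toy_vector_search([1.0, 0.0], chunks, top_k=2)
print(result)
```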

## Reciprocal Rank Fusion algorithm


In [54]:
def reciprocal_rank_fusion(search_results_dict: Dict[str, Dict[str, float]], k: int = 60) -> Dict[str, float]:
    fused_scores = {}
    print("Initial individual search result ranks:")
    for query, doc_scores in search_results_dict.items():
        print(f"For query '{query}': {doc_scores}")
        
    for query, doc_scores in search_results_dict.items():
        for rank, (doc, score) in enumerate(sorted(doc_scores.items(), key=lambda x: x[1], reverse=True)):
            if doc not in fused_scores:
                fused_scores[doc] = 0
            previous_score = fused_scores[doc]
            fused_scores[doc] += 1 / (rank + k)
            print(f"Updating score for {doc} from {previous_score} to {fused_scores[doc]} based on rank {rank} in query '{query}'")

    reranked_results = {doc: score for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)}
    print("Final reranked results:", reranked_results)
    return reranked_results
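
To make the scoring concrete: for each query, the function above sorts the documents by similarity and adds `1 / (rank + k)` to each document's fused score, where `rank` starts at 0. The commonly cited RRF formulation counts ranks from 1, so the version here is equivalent to that formula with `k = 59`. The sketch below repeats the same computation without the debug printing on a small made-up example. The name `rrf_quiet` and the toy scores are illustrative only.

```python
from typing import Dict


def rrf_quiet(search_results: Dict[str, Dict[str, float]], k: int = 60) -> Dict[str, float]:
    """Same scoring as reciprocal_rank_fusion above, without the print statements."""
    fused: Dict[str, float] = {}
    for doc_scores in search_results.values():
        ranked = sorted(doc_scores.items(), key=lambda item: item[1], reverse=True)
        for rank, (doc, _score) in enumerate(ranked):  # rank starts at 0, as above
            fused[doc] = fused.get(doc, 0.0) + 1 / (rank + k)
    return dict(sorted(fused.items(), key=lambda item: item[1], reverse=True))


# Two made-up result lists that share one document.
toy_results = {
    "query A": {"doc1": 0.9, "doc2": 0.8},
    "query B": {"doc2": 0.7, "doc3": 0.6},
}
fused = rrf_quiet(toy_results)
for doc, score in fused.items():
    print(f"{doc}: {score:.4f}")
# doc2: 0.0331  (1/61 from query A + 1/60 from query B)
# doc1: 0.0167  (1/60 from query A)
# doc3: 0.0164  (1/61 from query B)
```

`doc2` is never ranked above `doc1` by query A, yet it wins after fusion because it appears in both result lists. This is the effect RAG Fusion relies on: documents that are relevant to several phrasings of the question rise to the top.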

## Load the Data

In [55]:
reader = SimpleDirectoryReader("data", recursive=True)
documents = reader.load_data(show_progress=True)
documents = Document(text="\n\n".join([doc.text for doc in documents]))

Loading files: 100%|██████████| 1/1 [00:00<00:00,  8.30file/s]


## Setting up Vector Database

We will use Qdrant as the vector database. There are four ways to initialize the Qdrant client:

1. In-memory
```python
client = qdrant_client.QdrantClient(location=":memory:")
```
2. Disk
```python
client = qdrant_client.QdrantClient(path="./data")
```
3. Self hosted or Docker
```python
client = qdrant_client.QdrantClient(
    # url="http://<host>:<port>"
    host="localhost",
    port=6333,
)
```

4. Qdrant cloud
```python
client = qdrant_client.QdrantClient(
    url=QDRANT_CLOUD_ENDPOINT,
    api_key=QDRANT_API_KEY,
)
```
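
One possible way to switch between these modes without editing the notebook is to derive the client arguments from environment variables. The sketch below only builds the keyword arguments, so it runs without a Qdrant installation. The variable names `QDRANT_URL`, `QDRANT_API_KEY` and `QDRANT_PATH` and the helper `qdrant_client_kwargs` are assumptions made for this example, not names defined by Qdrant or by this notebook.

```python
import os
from typing import Dict, Mapping, Optional


def qdrant_client_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Pick QdrantClient keyword arguments based on (assumed) environment variables."""
    env = os.environ if env is None else env
    if env.get("QDRANT_URL"):
        # Self-hosted server or Qdrant Cloud; the API key is optional for local servers.
        kwargs = {"url": env["QDRANT_URL"]}
        if env.get("QDRANT_API_KEY"):
            kwargs["api_key"] = env["QDRANT_API_KEY"]
        return kwargs
    if env.get("QDRANT_PATH"):
        # Local on-disk storage.
        return {"path": env["QDRANT_PATH"]}
    # Fall back to the in-memory mode.
    return {"location": ":memory:"}


print(qdrant_client_kwargs({}))
# Usage (requires the qdrant-client package):
# qdrant = qdrant_client.QdrantClient(**qdrant_client_kwargs())
```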

For this notebook we will use the in-memory client.

In [56]:
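# Use a distinct variable name so the qdrant_client module is not shadowed
# (re-running this cell would otherwise fail)
qdrant = qdrant_client.QdrantClient(location=":memory:")
vector_store = QdrantVectorStore(client=qdrant, collection_name="RAG_Fusion")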

### Ingest Data into vector DB

In [58]:
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=1024, chunk_overlap=20),
        Settings.embed_model,
    ],
    vector_store=vector_store,
)

nodes = pipeline.run(documents=[documents], show_progress=True)
print("Number of chunks added to vector DB:", len(nodes))

Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 193.16it/s]
Generating embeddings: 100%|██████████| 2/2 [00:00<00:00,  2.51it/s]

Number of chunks added to vector DB: 2
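
The `chunk_overlap` setting makes neighbouring chunks share some text, so that a sentence cut at a chunk boundary still appears with its context in one of the two chunks. `SentenceSplitter` counts tokens and tries to respect sentence boundaries. The sketch below is a deliberately simplified stand-in that uses fixed windows over words, only to illustrate how size and overlap interact. The function `sliding_window_chunks` is hypothetical and is not how the library is implemented.

```python
from typing import List


def sliding_window_chunks(words: List[str], chunk_size: int, chunk_overlap: int) -> List[List[str]]:
    """Split words into windows of chunk_size that share chunk_overlap words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[List[str]] = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


words = [f"w{i}" for i in range(10)]
windows = sliding_window_chunks(words, chunk_size=4, chunk_overlap=1)
for window in windows:
    print(window)
# ['w0', 'w1', 'w2', 'w3']
# ['w3', 'w4', 'w5', 'w6']
# ['w6', 'w7', 'w8', 'w9']
```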





## Setting Up Retriever

In [59]:
# Create index
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

## ChatEngineInterface with RAG Fusion

In [60]:
class ChatEngineInterface:
    def __init__(self, index: VectorStoreIndex):
        self.index = index
        self.chat_history: List[ChatMessage] = []

    def display_message(self, role: str, content: str):
        if role == "USER":
            display(Markdown(f"**Human:** {content}"))
        else:
            display(Markdown(f"**AI:** {content}"))

    def chat(self, message: str) -> str:
        user_message = ChatMessage(role=MessageRole.USER, content=message)
        self.chat_history.append(user_message)
        
        # Generate multiple queries
        generated_queries = generate_queries_chatgpt(message)
        
        # Perform vector search for each query
        all_results = {}
        for query in generated_queries:
            search_results = vector_search(query, self.index)
            all_results[query] = search_results
        
        # Apply Reciprocal Rank Fusion
        reranked_results = reciprocal_rank_fusion(all_results)
        
        # Use reranked results to generate response
        top_docs = list(reranked_results.keys())[:3]  # Get top 3 documents
        context = "\n".join(top_docs)
        
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a helpful assistant that answers questions based on the given context."},
                {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {message}"}
            ]
        )
        
        ai_response = response.choices[0].message.content.strip()
        ai_message = ChatMessage(role=MessageRole.ASSISTANT, content=ai_response)
        self.chat_history.append(ai_message)
        
        self.display_message("USER", message)
        self.display_message("ASSISTANT", ai_response)
        
        print("\n" + "-"*50 + "\n")  # Separator for readability

        return ai_response

    def get_chat_history(self) -> List[ChatMessage]:
        return self.chat_history
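
The core of `chat` is a three-step flow: expand the question into several queries, search once per query, and fuse the result lists into a single context. The sketch below isolates that flow in a function that receives each step as a parameter, so it can be exercised offline with stand-in functions and no API key. All names here (`rag_fusion_context`, `fake_generate`, `fake_search`, `simple_fuse`) are hypothetical and only illustrate the structure.

```python
from typing import Callable, Dict, List


def rag_fusion_context(
    question: str,
    generate_queries: Callable[[str], List[str]],
    search: Callable[[str], Dict[str, float]],
    fuse: Callable[[Dict[str, Dict[str, float]]], Dict[str, float]],
    top_n: int = 3,
) -> str:
    """Run query expansion, retrieval and fusion, then join the top_n chunks."""
    all_results = {query: search(query) for query in generate_queries(question)}
    fused = fuse(all_results)
    return "\n".join(list(fused)[:top_n])


# Stand-ins so the flow can be run without OpenAI or Qdrant.
def fake_generate(question: str) -> List[str]:
    return [question, question.lower()]


def fake_search(query: str) -> Dict[str, float]:
    return {"chunk about grades": 0.9, f"chunk for {query}": 0.5}


def simple_fuse(results: Dict[str, Dict[str, float]], k: int = 60) -> Dict[str, float]:
    fused: Dict[str, float] = {}
    for doc_scores in results.values():
        ranked = sorted(doc_scores, key=doc_scores.get, reverse=True)
        for rank, doc in enumerate(ranked):  # 0-based rank, as in the notebook
            fused[doc] = fused.get(doc, 0.0) + 1 / (rank + k)
    return dict(sorted(fused.items(), key=lambda item: item[1], reverse=True))


context = rag_fusion_context("What is the CGPA?", fake_generate, fake_search, simple_fuse, top_n=1)
print(context)
```

With the notebook's own components, the same function could be called with `generate_queries_chatgpt`, `lambda q: vector_search(q, index)` and `reciprocal_rank_fusion` in place of the stand-ins.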

In [61]:
# Usage
chat_interface = ChatEngineInterface(index)

In [62]:
# Example usage
chat_interface.chat("What is Samarth's CGPA?")
chat_interface.chat("What are all the AI projects done by samarth?")

Initial individual search result ranks:
For query '1. Samarth CGPA transcript': {'PES University EC Campus\n2022 - Present  | 8.56 CGPA\nGEAR Innovative International School\nChinmaya Vidyalaya, Koramangala\n2010 - 2020 | Grade X ICSE - 93%Profile\nsamarthprakash8@gmail.com\nhttps://samarth.arthttps://github.com/samarth777\n+91 7337610771\n2020 - 2022 | Grade XII CBSE - 87%1st Sem     | 8.23 SGPA\n2nd Sem  | 8.82 SGPA\n3rd Sem   | 8.63 SGPA\nAwarded CNR Scholarship in Semester 3https://www.linkedin.com/in/samarth-\np-3964721b3/Experience\nSummer Intern,': 0.7078029412854389, "Certifications/CoursesSA M A RTH  P\n3rd Year, B. Tech CSE at PES University, EC Campus\nSkills\nProficient in Python, C, C++, JavaScript\nIn depth knowledge of Machine\nlearning Algorithms, GenAI, LLMs,\nRAG, Agentic AI Systems, Diffusion\nModels\nFull stack deveopment with Flutter,\nHTML, CSS, JavaScript, React Node.js,\nNext.js, MongoDB, Firebase, Flask,\nDjango\nExperience in IoT and robotics, deep\nunderstand

**Human:** What is Samarth's CGPA?

**AI:** Samarth's CGPA is 8.56 at PES University EC Campus where he is pursuing B. Tech CSE from 2022 to present.


--------------------------------------------------

Initial individual search result ranks:
For query '1. "List of AI projects by Samarth"': {"Certifications/CoursesSA M A RTH  P\n3rd Year, B. Tech CSE at PES University, EC Campus\nSkills\nProficient in Python, C, C++, JavaScript\nIn depth knowledge of Machine\nlearning Algorithms, GenAI, LLMs,\nRAG, Agentic AI Systems, Diffusion\nModels\nFull stack deveopment with Flutter,\nHTML, CSS, JavaScript, React Node.js,\nNext.js, MongoDB, Firebase, Flask,\nDjango\nExperience in IoT and robotics, deep\nunderstanding of embedded systems\nnotably Arduino\nFoundational understanding of\nQuantum Physics principles and\npractical experience using Qiskit for\nQuantum Computing \nSkilled in fine arts especially acrylic\nand watercolor paintings\nEducation Background\nB Tech CSE, PES University EC Campus\n2022 - Present  | 8.56 CGPA\nGEAR Innovative International School\nChinmaya Vidyalaya, Koramangala\n2010 - 2020 | Grade X ICSE - 93%Profile\nsamarth

**Human:** What are all the AI projects done by samarth?

**AI:** The AI projects done by Samarth are:

1. Gen AI Project for Bosch under the University Connect Program
2. AlienWear - AI-powered fashion e-commerce platform
3. KissanDial - Voice call-based AI agent assistant


--------------------------------------------------



'The AI projects done by Samarth are:\n\n1. Gen AI Project for Bosch under the University Connect Program\n2. AlienWear - AI-powered fashion e-commerce platform\n3. KissanDial - Voice call-based AI agent assistant'

In [63]:
# To view chat history:
history = chat_interface.get_chat_history()
for message in history:
    print(f"{message.role}: {message.content}")

MessageRole.USER: What is Samarth's CGPA?
MessageRole.ASSISTANT: Samarth's CGPA is 8.56 at PES University EC Campus where he is pursuing B. Tech CSE from 2022 to present.
MessageRole.USER: What are all the AI projects done by samarth?
MessageRole.ASSISTANT: The AI projects done by Samarth are:

1. Gen AI Project for Bosch under the University Connect Program
2. AlienWear - AI-powered fashion e-commerce platform
3. KissanDial - Voice call-based AI agent assistant
