<a href="https://mng.bz/8wdg" target="_blank">
    <img src="../../Assets/Images/NewMEAPHeader.png" alt="New MEAP" style="width: 100%;" />
</a>

# Chapter 06 - Progression of RAG Systems: Naïve to Advanced, and Modular RAG

So far, we have covered the utility of RAG and the development and evaluation of a basic RAG system. This basic approach, known as Naïve RAG, is generally inadequate for production-grade systems.

<a href="https://mng.bz/8wdg" target="_blank">
    <img src="../../Assets/Images/6.1.png" alt="Naive RAG Challenges" style="width: 100%;" />
</a>


In this chapter, we will focus on more advanced RAG techniques that make RAG viable in production. Let's begin by installing dependencies.

## Installing Dependencies

All the libraries needed to run this notebook, along with their versions, are listed in the __requirements.txt__ file in the root directory of this repository.

Go to the root directory and run the following command to install the libraries:

```
pip install -r requirements.txt
```

This is the recommended method of installing the dependencies.

___
Alternatively, you can run the command from this notebook. The relative path may vary.

In [2]:
%pip install -r ../../requirements.txt --quiet


Note: you may need to restart the kernel to use updated packages.


## Advanced RAG Techniques

Advanced techniques in RAG have continued to emerge since the earliest experiments with Naïve RAG. These techniques can be grouped into three stages:
1. Pre-retrieval Stage: As the name suggests, these are interventions applied before the retriever comes into action. They broadly cover two aspects:
    - Index Optimization – The way documents are stored in the knowledge base
    - Query Optimization – Optimizing the user query so that it aligns better with the retrieval and generation tasks
2. Retrieval Stage: Certain strategies can improve the recall and precision of the retrieval process. These go beyond the capability of the underlying retrieval algorithms that we discussed in Chapter 4.
3. Post-retrieval Stage: Once the information has been retrieved, the context can be further optimized to better align with the generation task and the downstream LLM.
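
To make the division of responsibilities concrete, the three stages can be sketched as three pluggable functions chained together. This is only an illustrative skeleton: the function names (`run_rag_pipeline`, `toy_optimize`, `toy_retrieve`, `toy_refine`) and the tiny in-memory knowledge base are assumptions made for this sketch, not part of the notebook.

```python
from typing import Callable, List


def run_rag_pipeline(
    query: str,
    optimize_query: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    refine_context: Callable[[str, List[str]], List[str]],
) -> List[str]:
    """Chain the three stages and return the context that would go to the LLM."""
    optimized = optimize_query(query)         # pre-retrieval stage
    candidates = retrieve(optimized)          # retrieval stage
    return refine_context(query, candidates)  # post-retrieval stage


# Toy stand-ins for each stage (hypothetical, for illustration only)
knowledge_base = [
    "Kohli scored 765 runs",
    "Australia won the final",
    "Shami took 24 wickets",
]


def toy_optimize(query: str) -> str:
    # Normalize the query before retrieval
    return query.lower().strip()


def toy_retrieve(query: str) -> List[str]:
    # Keep documents that contain at least one query word
    return [d for d in knowledge_base if any(w in d.lower() for w in query.split())]


def toy_refine(query: str, docs: List[str]) -> List[str]:
    # Keep only the first candidate as the final context
    return docs[:1]


context = run_rag_pipeline("  Kohli runs ", toy_optimize, toy_retrieve, toy_refine)
print(context)
```

Each technique in the rest of this chapter can be read as a more capable replacement for one of these three stand-in functions.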


<a href="https://mng.bz/8wdg" target="_blank">
    <img src="../../Assets/Images/6.2.png" alt="Naive RAG Challenges" style="width: 50%;" />
</a>

We will explore these techniques one by one.

To initialize the __OpenAI client__, we need to pass the API key.

The recommended approach is to store the API key in a .env file and load it from there.

Install the __dotenv__ library (published on PyPI as python-dotenv).

_The dotenv library is a popular tool used in various programming languages, including Python and Node.js, to manage environment variables in development and deployment environments. It allows developers to load environment variables from a .env file into their application's environment._

- Create a file named .env in the root directory of your project.
- Inside the .env file, define environment variables in the format VARIABLE_NAME=value.

For example:

OPENAI_API_KEY=YOUR API KEY

In [3]:
from dotenv import load_dotenv
import os

if load_dotenv():
    print("Success: .env file found with some environment variables")
else:
    print("Caution: No environment variables found. Please create .env file in the root directory or add environment variables in the .env file")

Success: .env file found with some environment variables


In [4]:
import openai
from openai import OpenAI

# Read the key without raising a KeyError when it is missing
api_key = os.environ.get("OPENAI_API_KEY")

if api_key:
    # The client reads OPENAI_API_KEY from the environment by default
    client = OpenAI()
    try:
        client.models.list()
        print("OPENAI_API_KEY is set and is valid")
    # Catch the specific errors first: both are subclasses of openai.APIError
    except openai.RateLimitError as e:
        print(f"OpenAI API request exceeded rate limit: {e}")
    except openai.APIConnectionError as e:
        print(f"Failed to connect to OpenAI API: {e}")
    except openai.APIError as e:
        print(f"OpenAI API returned an API Error: {e}")
else:
    print("Please set your OpenAI API key as the environment variable OPENAI_API_KEY")



OPENAI_API_KEY is set and is valid


## 1. Pre-retrieval Stage

The primary objective of employing pre-retrieval techniques is to facilitate better retrieval. Retrieval failures can happen for two reasons:

- The knowledge base is not suited for retrieval

- The retriever doesn’t completely understand the input query

### 1.1 INDEX OPTIMIZATION

The objective of index optimization is to set up the knowledge base for better retrieval.

#### Context Enriched Chunking

This method adds a summary of the larger document to each chunk to enrich the context of the smaller chunk.

In [5]:
# Import FAISS class from vectorstore library
from langchain_community.vectorstores import FAISS

# Import OpenAIEmbeddings from the library
from langchain_openai import OpenAIEmbeddings

# Instantiate the embeddings object
embeddings=OpenAIEmbeddings(model="text-embedding-3-small")


In [6]:
from langchain_community.document_loaders import AsyncHtmlLoader
from langchain_community.document_transformers import Html2TextTransformer

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [7]:
url="https://en.wikipedia.org/wiki/2023_Cricket_World_Cup"

In [8]:
loader = AsyncHtmlLoader(url)
data = loader.load()
html2text = Html2TextTransformer()
data_transformed = html2text.transform_documents(data)

Fetching pages:   0%|          | 0/1 [00:00<?, ?it/s]

Fetching pages: 100%|##########| 1/1 [00:00<00:00,  3.24it/s]


In [9]:
document_text=data_transformed[0].page_content

In [10]:
summary_prompt = f"Summarize the given document in a single paragraph\ndocument: {document_text}"

In [11]:
# Importing the ChatOpenAI library
from langchain_openai import ChatOpenAI

# Set up LLM 
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2
)

#Craft the prompt message
messages=[("human",summary_prompt)]


# Invoke the LLM
ai_msg = llm.invoke(messages)

# Extract the answer from the response object
answer=ai_msg.content

import textwrap
print(textwrap.fill(answer, width=80))

The 2023 ICC Men's Cricket World Cup, held in India from October 5 to November
19, 2023, marked the 13th edition of this prestigious One Day International
(ODI) tournament, featuring ten national teams. Australia emerged as champions,
claiming their sixth title by defeating India in the final at the Narendra Modi
Stadium in Ahmedabad. The tournament attracted a record attendance of over 1.25
million spectators and set viewership records in India, with 518 million viewers
for the final. Virat Kohli was named Player of the Series, scoring the most runs
(765), while Mohammed Shami led in wickets taken (24). The event was initially
scheduled for early 2023 but was postponed due to the COVID-19 pandemic, and it
introduced new penalties for slow over-rates. The tournament format included a
round-robin group stage followed by knockout rounds, culminating in a highly
anticipated final.


In [12]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Set the RecursiveCharacterTextSplitter parameters
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,  # Maximum number of characters in each chunk
    chunk_overlap=200,  # Number of overlapping characters between chunks
)

# Create chunks
chunks = text_splitter.split_text(data_transformed[0].page_content)

In [17]:
print(len(chunks))

70


In [None]:
context_enriched_chunks = [answer + "\n" + chunk for chunk in chunks]

In [19]:
print(len(context_enriched_chunks))

70
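
Because the summary is concatenated directly into each chunk, the same summary text comes back with every retrieved chunk. One possible variation, sketched below as an assumption rather than as the notebook's method, is to keep the enriched text for embedding while also storing the original chunk, so that either version can be passed to the LLM later. The helper name `enrich_chunks` and the record keys are hypothetical.

```python
from typing import Dict, List


def enrich_chunks(summary: str, chunks: List[str]) -> List[Dict[str, str]]:
    """Pair each chunk with the document summary, keeping the original text."""
    records = []
    for position, chunk in enumerate(chunks):
        records.append(
            {
                "chunk_id": f"chunk_{position}",
                # Text that would be embedded: summary followed by the chunk
                "text_to_embed": f"{summary}\n{chunk}",
                # Untouched chunk, useful when building the final prompt
                "original_chunk": chunk,
            }
        )
    return records


# Toy inputs standing in for the LLM summary and the real chunks
records = enrich_chunks(
    "Summary of the 2023 World Cup.",
    ["Final scorecard", "Group stage table"],
)
print(records[0]["text_to_embed"])
```

With this layout, the enriched text drives similarity search while the generation prompt can include the summary once, followed by the original chunks.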


In [22]:
from langchain_openai import OpenAIEmbeddings

# Instantiate the embeddings object
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

import faiss
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores import FAISS

index = faiss.IndexFlatIP(1536)

vector_store = FAISS(
    embedding_function=embeddings,
    index=index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={},
)

vector_store.add_texts(texts=context_enriched_chunks)

['0fe322d5-19c7-435d-9e31-0816850fcb86',
 '42343421-501e-4057-bec9-ae4401a403e1',
 'bf8d26d4-eb96-44a7-ac81-65cacd7f9cca',
 '50f75d67-5b7a-485b-9b55-ea5b9a0a552f',
 'd23a17e1-79f8-466d-8074-e90fdee1c47a',
 'd14b38ca-f394-4212-95e6-c5aa97a2b37e',
 '1f6dc75b-d2e4-452b-881a-ca8fbd4bf77e',
 '2199d18e-76e0-4677-8e16-b0676e3622d5',
 '50cec4ba-a827-4794-b8f9-30d096bdd51d',
 '72d7001d-8e30-42c7-8f3c-c1eeaf28aee1',
 '8462aab0-71ff-4e87-b773-eb3d86e2f37b',
 '7aeacd45-b1b5-45fd-93c4-9888aaeeb637',
 'efcd743c-ec80-4523-b20c-ada55f374a98',
 '4650499b-81b4-44f0-b39e-3d7e24a59a37',
 '9b4ab0ab-79ed-441a-8d46-646e274056c2',
 'b8373fad-f7eb-4f6c-bbf7-4e3519a1aec9',
 'f0c8b2b7-277f-41fb-9858-a70e819caea4',
 '6bf6fc2a-7f28-4c92-8ad1-6469420eecbc',
 'ec1c567d-6ba9-41a6-b344-1bd5b23eef73',
 '39b6c881-3713-4f90-8def-2b56fd69815b',
 'c75e669b-4a23-4a6c-aa74-780b63ec84ab',
 '9ebca45a-1b5a-4d3a-805f-e5b3ea0cff3a',
 'bd023640-0d35-4716-b746-71c54d2b9b0c',
 '8e3bd3f2-dd1f-4e21-8833-2a5d0b736032',
 '42fbefb0-a11d-

In [23]:
query = "What records did Virat Kohli make?"
retrieved_docs = vector_store.similarity_search(query, k=2)


In [26]:
print(retrieved_docs[0].page_content)

The 2023 ICC Men's Cricket World Cup, held in India from October 5 to November 19, 2023, marked the 13th edition of this prestigious One Day International (ODI) tournament, featuring ten national teams. Australia emerged as champions, claiming their sixth title by defeating India in the final at the Narendra Modi Stadium in Ahmedabad. The tournament attracted a record attendance of over 1.25 million spectators and set viewership records in India, with 518 million viewers for the final. Virat Kohli was named Player of the Series, scoring the most runs (765), while Mohammed Shami led in wickets taken (24). The event was initially scheduled for early 2023 but was postponed due to the COVID-19 pandemic, and it introduced new penalties for slow over-rates. The tournament format included a round-robin group stage followed by knockout rounds, culminating in a highly anticipated final.
Main article: 2023 Cricket World Cup final

19 November 2023  
14:00 (D/N)  
Scorecard  
---  
**India **  
2

---


#### Metadata Enhancement

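This method attaches structured metadata to each chunk. In the example below, an LLM extracts players, teams, and keywords from every chunk and from the user query, and the extracted metadata is then used to filter chunks during retrieval.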
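
Metadata-based filtering ultimately comes down to an overlap check between the metadata extracted from the query and the metadata stored with each chunk. The sketch below isolates that check in plain Python, using the same `player_N`, `team_N`, `keyword_N` key layout as this notebook. The helper names `collect_values` and `metadata_overlaps`, the lowercase normalization, and the sample dictionaries are assumptions added for illustration.

```python
from typing import Dict, Set

FIELDS = ("player", "team", "keyword")


def collect_values(metadata: Dict[str, str], field: str, slots: int = 5) -> Set[str]:
    """Gather the non-empty values of field_1..field_N, lowercased for matching."""
    values = set()
    for i in range(1, slots + 1):
        # .get avoids a KeyError when the LLM omits a key
        value = metadata.get(f"{field}_{i}", "")
        if value:
            values.add(value.strip().lower())
    return values


def metadata_overlaps(query_md: Dict[str, str], doc_md: Dict[str, str]) -> bool:
    """Return True if query and document share any player, team, or keyword."""
    return any(
        collect_values(query_md, field) & collect_values(doc_md, field)
        for field in FIELDS
    )


# Hypothetical metadata, shaped like the LLM output in this notebook
query_md = {"player_1": "virat kohli", "keyword_1": "records"}
doc_md = {"player_1": "Virat Kohli", "player_2": "Mohammed Shami", "team_1": "India"}
other_md = {"team_1": "Afghanistan", "keyword_1": "qualification"}

print(metadata_overlaps(query_md, doc_md))
print(metadata_overlaps(query_md, other_md))
```

Normalizing case before comparing is a design choice made here because LLM-extracted values may differ in capitalization between the query and the chunks.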

In [27]:
import faiss
from langchain_community.vectorstores import FAISS
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document
from langchain_community.document_loaders import AsyncHtmlLoader
from langchain_community.document_transformers import Html2TextTransformer
from langchain.text_splitter import RecursiveCharacterTextSplitter
from openai import OpenAI
from langchain_openai import ChatOpenAI

# Initialize the OpenAI client
client = OpenAI()

# Function to extract fixed metadata using GPT-4o-mini with JSON response
def extract_fixed_metadata_from_chunk(chunk_text):
    prompt = f"""
    Extract the following fixed metadata in JSON format from the given text:
    {{
      "player_1": "",
      "player_2": "",
      "player_3": "",
      "player_4": "",
      "player_5": "",
      "team_1": "",
      "team_2": "",
      "team_3": "",
      "team_4": "",
      "team_5": "",
      "keyword_1": "",
      "keyword_2": "",
      "keyword_3": "",
      "keyword_4": "",
      "keyword_5": ""
    }}
    Here's the text:
    {chunk_text}
    """

    llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2
    )

    json_llm = llm.bind(response_format={"type": "json_object"})


    #Craft the prompt message
    messages=[("human",prompt)]


    # Invoke the LLM
    ai_msg = json_llm.invoke(messages)
    

    
    # Extract the response in JSON format
    metadata_response = ai_msg.content
    print(metadata_response)
    try:
        # Parse the JSON string into a dictionary; json.loads is safer than eval
        # and handles JSON literals such as true, false, and null
        import json
        metadata = json.loads(metadata_response)
    except Exception as e:
        print(f"Error parsing metadata: {e}")
        metadata = {
            "player_1": "", "player_2": "", "player_3": "", "player_4": "", "player_5": "",
            "team_1": "", "team_2": "", "team_3": "", "team_4": "", "team_5": "",
            "keyword_1": "", "keyword_2": "", "keyword_3": "", "keyword_4": "", "keyword_5": ""
        }
    return metadata

# Step 1: Load data from a URL (Wikipedia page)
url = "https://en.wikipedia.org/wiki/2023_Cricket_World_Cup"
loader = AsyncHtmlLoader(url)
data = loader.load()

# Step 2: Transform the HTML content to plain text
html2text = Html2TextTransformer()
data_transformed = html2text.transform_documents(data)

# Step 3: Split the text into smaller chunks using RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=10000,  # Number of characters in each chunk
    chunk_overlap=200  # Number of overlapping characters between chunks
)
chunks = text_splitter.split_text(data_transformed[0].page_content)

# Step 4: Initialize OpenAI Embeddings model
embedding_model = OpenAIEmbeddings(model="text-embedding-3-small")

# Step 5: Initialize FAISS index for L2 (Euclidean) distance
embedding_dim = len(embedding_model.embed_query("hello world"))
index = faiss.IndexFlatL2(embedding_dim)

# Step 6: Initialize the InMemoryDocstore to store documents and metadata in memory
docstore = InMemoryDocstore()

# Step 7: Create FAISS vector store using the embedding function, FAISS index, and docstore
vector_store = FAISS(
    embedding_function=embedding_model,
    index=index,
    docstore=docstore,
    index_to_docstore_id={}
)

# Step 8: Add chunks (documents) with extracted metadata and embeddings to FAISS vector store
documents = []
for i, chunk in enumerate(chunks):
    # Extract fixed metadata using the LLM
    extracted_metadata = extract_fixed_metadata_from_chunk(chunk)
    
    # Create a document object with both the chunk content and the extracted metadata
    document = Document(
        page_content=chunk, 
        metadata={
            "source": url, 
            "category": "cricket world cup",
            "extracted_metadata": extracted_metadata  # Store the structured metadata
        }
    )
    
    # Append the document to the list
    documents.append(document)

# Create unique IDs for each chunk
ids = [f"chunk_{i}" for i in range(len(chunks))]

# Add the documents and their embeddings to the FAISS vector store
vector_store.add_documents(documents=documents, ids=ids)

# Step 9: Define a function to extract metadata from a query
def extract_fixed_metadata_from_query(query_text):
    prompt = f"""
    Extract the following fixed metadata in JSON format from the query:
    {{
      "player_1": "",
      "player_2": "",
      "player_3": "",
      "player_4": "",
      "player_5": "",
      "team_1": "",
      "team_2": "",
      "team_3": "",
      "team_4": "",
      "team_5": "",
      "keyword_1": "",
      "keyword_2": "",
      "keyword_3": "",
      "keyword_4": "",
      "keyword_5": ""
    }}
    Here's the query:
    {query_text}
    """
    

    llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2
    )

    json_llm = llm.bind(response_format={"type": "json_object"})


    #Craft the prompt message
    messages=[("human",prompt)]


    # Invoke the LLM
    ai_msg = json_llm.invoke(messages)




    # Extract the response in JSON format
    metadata_response = ai_msg.content
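    try:
        # Parse the JSON string into a dictionary; json.loads is safer than eval
        # and handles JSON literals such as true, false, and null
        import json
        metadata = json.loads(metadata_response)
    except Exception as e: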
        print(f"Error parsing metadata: {e}")
        metadata = {
            "player_1": "", "player_2": "", "player_3": "", "player_4": "", "player_5": "",
            "team_1": "", "team_2": "", "team_3": "", "team_4": "", "team_5": "",
            "keyword_1": "", "keyword_2": "", "keyword_3": "", "keyword_4": "", "keyword_5": ""
        }
    return metadata

# Step 10: Extract metadata from the query
query = "Virat Kohli records in 2023 Cricket World Cup"
query_metadata = extract_fixed_metadata_from_query(query)

# Step 11: Define a metadata filter based on the query's extracted metadata
def metadata_filter(doc_metadata):
    query_players = {query_metadata[f"player_{i}"] for i in range(1, 6) if query_metadata[f"player_{i}"]}
    query_teams = {query_metadata[f"team_{i}"] for i in range(1, 6) if query_metadata[f"team_{i}"]}
    query_keywords = {query_metadata[f"keyword_{i}"] for i in range(1, 6) if query_metadata[f"keyword_{i}"]}
    doc_players = {doc_metadata["extracted_metadata"][f"player_{i}"] for i in range(1, 6) if doc_metadata["extracted_metadata"][f"player_{i}"]}
    doc_teams = {doc_metadata["extracted_metadata"][f"team_{i}"] for i in range(1, 6) if doc_metadata["extracted_metadata"][f"team_{i}"]}
    doc_keywords = {doc_metadata["extracted_metadata"][f"keyword_{i}"] for i in range(1, 6) if doc_metadata["extracted_metadata"][f"keyword_{i}"]}
    
    # Check if there's any overlap between the query metadata and document metadata
    return bool(query_players & doc_players or query_teams & doc_teams or query_keywords & doc_keywords)

# Step 12: Perform a similarity search on the stored chunks with the metadata filter
results = vector_store.similarity_search(query=query, k=3, filter=metadata_filter)

# Step 13: Display the results with metadata
for doc in results:
    print(f"Document: {doc.page_content}")
    print(f"Metadata: {doc.metadata}")


Fetching pages:   0%|          | 0/1 [00:00<?, ?it/s]

Fetching pages: 100%|##########| 1/1 [00:00<00:00,  2.65it/s]


{
  "player_1": "Virat Kohli",
  "player_2": "Mohammed Shami",
  "player_3": "",
  "player_4": "",
  "player_5": "",
  "team_1": "Australia",
  "team_2": "India",
  "team_3": "New Zealand",
  "team_4": "South Africa",
  "team_5": "",
  "keyword_1": "2023 Cricket World Cup",
  "keyword_2": "One Day International",
  "keyword_3": "International Cricket Council",
  "keyword_4": "Knockout stage",
  "keyword_5": "Prize money"
}
{
  "player_1": "",
  "player_2": "",
  "player_3": "",
  "player_4": "",
  "player_5": "",
  "team_1": "India",
  "team_2": "Afghanistan",
  "team_3": "Australia",
  "team_4": "Bangladesh",
  "team_5": "England",
  "keyword_1": "2023 Cricket World Cup",
  "keyword_2": "qualification",
  "keyword_3": "Super League",
  "keyword_4": "Sri Lanka",
  "keyword_5": "Netherlands"
}
{
  "player_1": "India",
  "player_2": "South Africa",
  "player_3": "Australia",
  "player_4": "New Zealand",
  "player_5": "Pakistan",
  "team_1": "Afghanistan",
  "team_2": "England",
  "team_3

### 1.2 QUERY OPTIMIZATION

The objective of this stage is to optimize the input user query so that it is better suited to the retrieval task.

#### Query Expansion

In query expansion, the original user query is enriched with the aim of retrieving more relevant information. This helps in increasing the recall of the system and overcomes the challenge of incomplete or very brief user queries.

In [28]:
original_query="How does climate change affect polar bears?"
num=5

In [29]:
response_structure='''
{
    "queries": [
        {"query": "query"},
        ...
    ]
}
'''

In [30]:
expansion_prompt=f"Generate {num} variations of the following query: {original_query}. Respond in JSON format.\nStick to this structure:\n{response_structure}"


In [31]:
step_back_expansion_prompt = f"Given the query: '{original_query}', generate a more abstract, higher-level conceptual query."

In [32]:
sub_query_expansion_prompt=f"Break down the following query into {num} sub-queries targeting different aspects of the query: '{original_query}'. Respond in JSON format."


In [33]:
# Importing the OpenAI library
from openai import OpenAI

# Instantiate the OpenAI client
client = OpenAI()

# Make the API call passing the augmented prompt to the LLM
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": expansion_prompt}
    ],
    response_format={"type": "json_object"}
)

# Extract the answer from the response object
answer=response.choices[0].message.content

In [34]:
print(answer)

{
    "queries": [
        {
            "query": "What impact does climate change have on polar bear populations?"
        },
        {
            "query": "In what ways does climate change influence the habitat of polar bears?"
        },
        {
            "query": "How are polar bears being affected by global warming?"
        },
        {
            "query": "What are the consequences of climate change for polar bear survival?"
        },
        {
            "query": "How is the behavior of polar bears changing due to climate change?"
        }
    ]
}
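
The expanded queries are only useful once they are run through the retriever and their results merged. A minimal sketch of that step is given below, assuming the JSON structure shown above. The helper names `parse_queries` and `multi_query_retrieve`, the keyword-based `toy_retrieve` function, and the three-entry corpus are hypothetical stand-ins for a real retriever.

```python
import json
from typing import Callable, List


def parse_queries(raw_json: str) -> List[str]:
    """Extract the query strings from the LLM's JSON response."""
    payload = json.loads(raw_json)
    return [item["query"] for item in payload.get("queries", []) if item.get("query")]


def multi_query_retrieve(
    queries: List[str], retrieve: Callable[[str], List[str]], k: int = 2
) -> List[str]:
    """Retrieve for every query and merge results, dropping duplicates."""
    seen = set()
    merged = []
    for query in queries:
        for doc in retrieve(query)[:k]:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# Toy response and corpus (assumptions for illustration)
raw = '{"queries": [{"query": "polar bear habitat loss"}, {"query": "polar bear food supply"}]}'
corpus = {
    "habitat": "Sea ice is shrinking.",
    "food": "Seals are harder to hunt.",
    "bear": "Polar bears live in the Arctic.",
}


def toy_retrieve(query: str) -> List[str]:
    # Return every document whose key appears in the query text
    return [text for key, text in corpus.items() if key in query]


expanded = parse_queries(raw)
results = multi_query_retrieve(expanded, toy_retrieve)
print(results)
```

Merging with de-duplication is what raises recall here: each variation can surface documents the others miss, while repeated hits are kept only once.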


In [35]:


# Make the API call passing the augmented prompt to the LLM
response = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=	[
    {"role": "user", "content": step_back_expansion_prompt}
  ]
)

# Extract the answer from the response object
answer=response.choices[0].message.content

In [36]:
print(answer)

"What are the broader ecological and environmental impacts of climate change on specialized species in arctic ecosystems?"


In [37]:

# Make the API call passing the augmented prompt to the LLM
response = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=	[
    {"role": "user", "content": sub_query_expansion_prompt}
  ],
  response_format={ "type": "json_object" }
)

# Extract the answer from the response object
answer=response.choices[0].message.content

In [38]:
print(answer)

{
  "sub_queries": [
    {
      "id": 1,
      "focus": "Impact on Habitat",
      "query": "What changes in habitat occur for polar bears due to climate change?"
    },
    {
      "id": 2,
      "focus": "Food Availability",
      "query": "How does climate change affect the availability of food sources for polar bears?"
    },
    {
      "id": 3,
      "focus": "Reproductive Health",
      "query": "What are the effects of climate change on the reproductive health and population dynamics of polar bears?"
    },
    {
      "id": 4,
      "focus": "Behavioral Changes",
      "query": "How does climate change influence the behavior and migration patterns of polar bears?"
    },
    {
      "id": 5,
      "focus": "Long-term Survival",
      "query": "What are the long-term implications of climate change on the survival of polar bear populations?"
    }
  ]
}


#### Query Transformation

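Unlike query expansion, query transformation does not retrieve on the original user query. Retrieval instead happens on a transformed query that is better suited to the retriever. The example below generates a hypothetical answer to the query and embeds it, an approach known as HyDE (Hypothetical Document Embeddings).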

In [39]:
original_query="How does climate change affect polar bears?"

In [40]:
system_prompt="You are an expert in climate change and arctic life."
hyde_prompt=f"Generate an answer to the question: {original_query}"

In [41]:

# Make the API call passing the augmented prompt to the LLM
response = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=	[
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": hyde_prompt}
  ]
)

# Extract the answer from the response object
answer=response.choices[0].message.content

In [42]:
print(answer)

Climate change significantly affects polar bears primarily through the loss of sea ice, which is crucial for their survival. Here are some key ways in which climate change impacts polar bears:

1. **Loss of Habitat**: Polar bears depend on sea ice as a platform for hunting seals, their primary food source. As global temperatures rise, sea ice is melting at an alarming rate during the summer months and forming later in the fall. This reduced availability of ice means bears have to swim longer distances to find food.

2. **Decreased Prey Availability**: As ice melts, the seals that polar bears hunt also face challenges. With the loss of stable ice platforms, seal populations may decrease, ultimately leading to food scarcity for polar bears. This reduced access to prey impacts their health, reproduction, and survival rates.

3. **Increased Energy Expenditure**: With the melting ice, polar bears may have to travel further in search of food. This increased energy expenditure can lead to fat

In [44]:
# Initialize the OpenAIEmbeddings model
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Create embedding for the hypothetical answer
hyde_embedding = embeddings.embed_query(answer)

# Check and print the dimension of the embedding
embedding_dimension = len(hyde_embedding)
print(f"The embedding dimension is: {embedding_dimension}")


The embedding dimension is: 1536
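
Once the hypothetical answer has been embedded, that vector takes the place of the query embedding in the similarity search. The sketch below illustrates the idea with toy 3-dimensional vectors in place of real 1536-dimensional embeddings. The function names, the toy index, and the toy vector values are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_by_vector(
    query_vector: Sequence[float],
    index: List[Tuple[str, Sequence[float]]],
    k: int = 1,
) -> List[Tuple[str, float]]:
    """Rank indexed texts by similarity to the given vector and keep the top k."""
    scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy index: (text, embedding) pairs with made-up vectors
toy_index = [
    ("Sea ice loss reduces hunting grounds.", [0.9, 0.1, 0.0]),
    ("Cricket World Cup final scorecard.", [0.0, 0.2, 0.9]),
]

# Stand-in for the embedding of the hypothetical answer
toy_hyde_embedding = [0.8, 0.2, 0.1]

top = search_by_vector(toy_hyde_embedding, toy_index, k=1)
print(top[0][0])
```

With a LangChain vector store, the equivalent step would likely be a call to `similarity_search_by_vector` with `hyde_embedding` as the input, assuming the knowledge base was embedded with the same model.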


## 2. Retrieval Strategies

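Interventions in the pre-retrieval stage can bring significant improvements in the performance of the RAG system if the query and the knowledge base become well aligned with the retrieval algorithm. The retrieval step itself can also be improved. Hybrid retrieval, shown in the example below, combines dense (embedding-based) search with sparse (keyword-based, BM25) search.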

#### Hybrid Retrieval

In [46]:
import faiss
from langchain_community.vectorstores import FAISS
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document
from langchain_community.document_loaders import AsyncHtmlLoader
from langchain_community.document_transformers import Html2TextTransformer
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.retrievers import BM25Retriever

# Step 1: Load data from a URL (Wikipedia page)
url = "https://en.wikipedia.org/wiki/2023_Cricket_World_Cup"
loader = AsyncHtmlLoader(url)
data = loader.load()

# Step 2: Transform the HTML content to plain text
html2text = Html2TextTransformer()
data_transformed = html2text.transform_documents(data)

# Step 3: Split the text into smaller chunks using RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,  # Number of characters in each chunk
    chunk_overlap=200  # Number of overlapping characters between chunks
)
chunks = text_splitter.split_text(data_transformed[0].page_content)

# Step 4: Dense Retrieval (FAISS + OpenAI Embeddings)

# Initialize OpenAI Embeddings model for dense retrieval
embedding_model = OpenAIEmbeddings(model="text-embedding-ada-002")

# Initialize FAISS index for dense retrieval
embedding_dim = len(embedding_model.embed_query("hello world"))
index = faiss.IndexFlatL2(embedding_dim)

# Create an in-memory document store to support adding documents
docstore = InMemoryDocstore()

# Initialize FAISS vector store
vector_store = FAISS(embedding_function=embedding_model, index=index, docstore=docstore, index_to_docstore_id={})

# Add chunks to FAISS vector store
documents = [Document(page_content=chunk) for chunk in chunks]
vector_store.add_documents(documents)

# Step 5: Sparse Retrieval (BM25 using LangChain's BM25Retriever)

# Initialize BM25Retriever
bm25_retriever = BM25Retriever.from_documents(documents)

# Step 6: Hybrid Retrieval Strategy

def hybrid_search(query, k=5):
    # Step 6.1: Perform dense retrieval using FAISS
    dense_results = vector_store.similarity_search(query=query, k=k)
    
    # Step 6.2: Perform sparse retrieval using BM25Retriever
    # (invoke replaces the deprecated get_relevant_documents)
    sparse_results = bm25_retriever.invoke(query)
    
    # Limit sparse results to top-k
    sparse_results = sparse_results[:k]
    
    # Step 6.3: Combine dense and sparse results
    combined_results = []
    for dense_doc in dense_results:
        combined_results.append(("dense", dense_doc.page_content))

    for sparse_doc in sparse_results:
        combined_results.append(("sparse", sparse_doc.page_content))

    # Optionally, re-rank or further process combined results
    return combined_results

# Step 7: Perform a hybrid search
query = "Virat Kohli records in 2023 Cricket World Cup"
hybrid_results = hybrid_search(query)

# Step 8: Display the results
for retrieval_type, result in hybrid_results:
    print(f"Retrieval Type: {retrieval_type}")
    print(f"Result: {result}\n")


Fetching pages: 100%|##########| 1/1 [00:00<00:00,  3.25it/s]


Retrieval Type: dense
Result: 46. **^** "It's official! India set up 2023 World Cup semi-final against New Zealand in 2019 rematch; Pakistan knocked out". _Hindustan Times_. 11 November 2023. Archived from the original on 14 November 2023. Retrieved 12 November 2023.
  47. **^** "2023 World Cup Cricket Batting Records & Stats runs". _ESPNcricinfo_. Archived from the original on 18 October 2023. Retrieved 19 October 2023.

Retrieval Type: dense
Result: 48. **^** "2023 World Cup Cricket bowling Records & Stats wickets". _ESPNcricinfo_. Archived from the original on 9 October 2023. Retrieved 10 October 2023.
  49. **^** "India star named Player of the Tournament at ICC Men's Cricket World Cup". _Cricket World Cup_. Archived from the original on 19 November 2023. Retrieved 19 November 2023.

Retrieval Type: dense
Result: Main article: 2023 Cricket World Cup final

19 November 2023  
14:00 (D/N)  
Scorecard  
---  
**India **  
240 (50 overs) | **v** | **Australia**  
241/4 (43 overs)  
---
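
The `hybrid_search` function above simply concatenates the dense and sparse results and leaves re-ranking as an optional step. One common way to merge two ranked lists into a single ranking is Reciprocal Rank Fusion (RRF), sketched below. This is a swapped-in technique, not something the notebook implements. The function name, the document IDs, and the choice of the constant `c = 60` (a frequently used default) are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(ranked_lists: List[List[str]], c: int = 60) -> List[str]:
    """Fuse several rankings: each document scores 1 / (c + rank) per list."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (c + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical rankings from a dense and a sparse retriever
dense = ["doc_a", "doc_b", "doc_c"]
sparse = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([dense, sparse])
print(fused)
```

Documents that appear in both lists (`doc_a`, `doc_c`) rise above those found by only one retriever, which is the behavior a hybrid strategy usually wants. RRF uses only rank positions, so the dense and sparse scores do not need to be on the same scale.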



## 3. Post-retrieval Stage

At the post-retrieval stage the approaches of reranking and compression help in providing better context to the LLM for generation.
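
Reranking reorders the retrieved documents with a second, usually more precise, relevance score and keeps only the best few. The sketch below shows the shape of that step. The scoring function here is a deliberately simple token-overlap measure standing in for a real reranking model such as a cross-encoder, and all names and sample documents are assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    docs: List[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 2,
) -> List[Tuple[str, float]]:
    """Score every document against the query and keep the top_n best."""
    scored = [(doc, score_fn(query, doc)) for doc in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def token_overlap_score(query: str, doc: str) -> float:
    """Toy relevance score: fraction of query tokens that appear in the doc."""
    query_tokens = set(query.lower().split())
    doc_tokens = set(doc.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & doc_tokens) / len(query_tokens)


# Hypothetical retrieved documents, in their original retrieval order
docs = [
    "Australia won the final in Ahmedabad",
    "Virat Kohli scored 765 runs",
    "Kohli was named Player of the Series",
]

top_docs = rerank("Virat Kohli runs", docs, token_overlap_score, top_n=2)
for doc, score in top_docs:
    print(f"{score:.2f}  {doc}")
```

In practice, `score_fn` would be replaced by a model that reads the query and the document together. The surrounding logic of scoring, sorting, and truncating stays the same.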

#### Compression

In prompt compression, language models are used to detect and remove unimportant and irrelevant tokens from the retrieved context.

In [47]:
document_to_compress=retrieved_docs[0].page_content

In [48]:
compress_prompt = f"Compress the following document into very short sentences, retaining only the extremely essential information:\n\n{document_to_compress}"

In [49]:
# Make the API call passing the augmented prompt to the LLM
response = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=	[
    {"role": "user", "content": compress_prompt}
  ]
)

# Extract the answer from the response object
answer=response.choices[0].message.content

In [50]:
print(textwrap.fill(answer, width=80))

The 2023 ICC Men's Cricket World Cup took place in India from October 5 to
November 19, 2023. It was the 13th ODI tournament with ten teams. Australia won,
defeating India in the final. The match was held at the Narendra Modi Stadium in
Ahmedabad. Attendance exceeded 1.25 million, and the final had 518 million
viewers. Virat Kohli was Player of the Series with 765 runs. Mohammed Shami took
the most wickets, with 24. The event was postponed from early 2023 due to
COVID-19. New penalties for slow over-rates were introduced. The format included
a round-robin stage and knockout rounds.  Final Match: - Date: 19 November 2023
- India: 240 (50 overs) - Australia: 241/4 (43 overs) - Australia won by 6
wickets.  Top Performers: - Most Runs: Virat Kohli (765), Rohit Sharma (597),
Quinton de Kock (594). - Most Wickets: Mohammed Shami (24), Adam Zampa (23),
Dilshan Madushanka (21).


In [51]:
print(textwrap.fill(document_to_compress, width=80))

The 2023 ICC Men's Cricket World Cup, held in India from October 5 to November
19, 2023, marked the 13th edition of this prestigious One Day International
(ODI) tournament, featuring ten national teams. Australia emerged as champions,
claiming their sixth title by defeating India in the final at the Narendra Modi
Stadium in Ahmedabad. The tournament attracted a record attendance of over 1.25
million spectators and set viewership records in India, with 518 million viewers
for the final. Virat Kohli was named Player of the Series, scoring the most runs
(765), while Mohammed Shami led in wickets taken (24). The event was initially
scheduled for early 2023 but was postponed due to the COVID-19 pandemic, and it
introduced new penalties for slow over-rates. The tournament format included a
round-robin group stage followed by knockout rounds, culminating in a highly
anticipated final. Main article: 2023 Cricket World Cup final  19 November 2023
14:00 (D/N)   Scorecard   ---   **India **   240
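
Because an LLM-based compressor can rephrase or even lengthen text, it can be worth checking how much shorter the compressed context actually is. The sketch below computes a simple word-based compression ratio. The function name and the two sample strings are assumptions. A token count from the model's tokenizer would be more accurate than a word count.

```python
def compression_ratio(original: str, compressed: str) -> float:
    """Return compressed length divided by original length, counted in words."""
    original_words = len(original.split())
    if original_words == 0:
        return 0.0
    return len(compressed.split()) / original_words


# Toy strings standing in for document_to_compress and the LLM's answer
original = "The tournament attracted a record attendance of over 1.25 million spectators"
compressed = "Attendance exceeded 1.25 million"

ratio = compression_ratio(original, compressed)
print(f"Compressed text is {ratio:.0%} of the original length")
```

A ratio close to or above 1.0 suggests the compression prompt is not removing much, and the prompt or the model may need adjusting.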

---

<img src="../../Assets/Images/profile_s.png" width=100> 

Hi! I'm Abhinav! I am an entrepreneur and Vice President of Artificial Intelligence at Yarnit. I have spent over 15 years in consulting and leadership roles in data science, machine learning and AI. My current focus is on applied Generative AI, solving enterprise needs through contextual intelligence. I'm passionate about AI advancements and constantly explore emerging technologies to push the boundaries and create a positive impact in the world. Let’s build the future, together!

[If you haven't already, please subscribe to the MEAP of A Simple Guide to Retrieval Augmented Generation here](https://mng.bz/8wdg)

<a href="https://mng.bz/8wdg" target="_blank">
    <img src="../../Assets/Images/NewMEAPFooter.png" alt="New MEAP" style="width: 100%;" />
</a>

#### If you'd like to chat, I'd be very happy to connect

[![GitHub followers](https://img.shields.io/badge/Github-000000?style=for-the-badge&logo=github&logoColor=black&color=orange)](https://github.com/abhinav-kimothi)
[![LinkedIn](https://img.shields.io/badge/LinkedIn-000000?style=for-the-badge&logo=linkedin&logoColor=orange&color=black)](https://www.linkedin.com/comm/mynetwork/discovery-see-all?usecase=PEOPLE_FOLLOWS&followMember=abhinav-kimothi)
[![Medium](https://img.shields.io/badge/Medium-000000?style=for-the-badge&logo=medium&logoColor=black&color=orange)](https://medium.com/@abhinavkimothi)
[![Insta](https://img.shields.io/badge/Instagram-000000?style=for-the-badge&logo=instagram&logoColor=orange&color=black)](https://www.instagram.com/akaiworks/)
[![Mail](https://img.shields.io/badge/email-000000?style=for-the-badge&logo=gmail&logoColor=black&color=orange)](mailto:abhinav.kimothi.ds@gmail.com)
[![X](https://img.shields.io/badge/Follow-000000?style=for-the-badge&logo=X&logoColor=orange&color=black)](https://twitter.com/abhinav_kimothi)
[![Linktree](https://img.shields.io/badge/Linktree-000000?style=for-the-badge&logo=linktree&logoColor=black&color=orange)](https://linktr.ee/abhinavkimothi)
[![Gumroad](https://img.shields.io/badge/Gumroad-000000?style=for-the-badge&logo=gumroad&logoColor=orange&color=black)](https://abhinavkimothi.gumroad.com/)

---