# Introducing Naïve, Advanced, and Modular RAG

Copyright 2024, Denis Rothman

This notebook introduces Naïve, Advanced, and Modular RAG through basic educational examples.

It explores keyword matching, vector search, and index-based retrieval methods. Using OpenAI's GPT models, it generates responses based on input queries and retrieved documents.

The modular RAG system offers flexibility in selecting retrieval strategies, allowing adaptation to various tasks and data characteristics.

**Summary**

- Introduction of Naïve, Advanced, and Modular RAG
- Environment setup for OpenAI API integration
- Generator function using GPT models
- Formatted response printing
- **Data** setup with a list of documents (db_records)
- **Query** user request
- **1.Naïve RAG**:
  - Keyword search and matching function
  - Augmented input creation
  - Generation with GPT
- **2.Advanced RAG**:
  - Vector search:
    - Cosine similarity calculation
    - Augmented input creation
    - Generation with GPT
  - Index-based retrieval:
    - Setup of TF-IDF vectorizer and matrix
    - Cosine similarity calculation
    - Augmented input creation
    - Generation with GPT
- **3.Modular RAG Retriever**:
  - RetrievalComponent class with methods for keyword, vector, and indexed search
  - Usage example with different retrieval methods
  - Augmented input creation
  - Generation with GPT
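
The three stages that recur in every section of this notebook (retrieve, augment, generate) can be sketched in a few lines. This is only an illustration under stated assumptions, not the notebook's implementation: `toy_retrieve`, `toy_generate`, and `rag_pipeline` are hypothetical names, and the generator is a stub standing in for the GPT call.

```python
def toy_retrieve(query, records):
    """Return the record sharing the most lowercase words with the query."""
    query_words = set(query.lower().split())
    return max(records, key=lambda r: len(query_words.intersection(r.lower().split())))

def toy_generate(prompt):
    """Stub generator (assumption): a real system would call a language model here."""
    return f"[generated from {len(prompt)} characters of input]"

def rag_pipeline(query, records, retrieve=toy_retrieve, generate=toy_generate):
    record = retrieve(query, records)         # 1. retrieve
    augmented_input = f"{query} {record}"     # 2. augment
    return record, generate(augmented_input)  # 3. generate

records = ["RAG combines retrieval with generation.", "Bananas are yellow."]
record, answer = rag_pipeline("what is rag", records)
print(record)
print(answer)
```

Passing the retriever and generator as arguments is one way to keep the stages swappable, which is the idea the modular section develops further.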

# The Environment

In [None]:
!pip install openai==1.19.0

Collecting openai==1.19.0
  Downloading openai-1.19.0-py3-none-any.whl (292 kB)
Collecting httpx<1,>=0.23.0 (from openai==1.19.0)
  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai==1.19.0)
  Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai==1.19.0)
  Downloading h11-0.14.0-py3-none-any.whl (58 kB)
Installing collected packages: h11, httpcore, httpx, openai
Successfully instal

In [None]:
#API Key
# Store your key in a file and read it (you can type it directly in the notebook, but it will be visible to anyone next to you)
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# Read the key from the first line of the file; strip() drops the trailing newline
with open("drive/MyDrive/files/api_key.txt", "r") as f:
    API_KEY = f.readline().strip()

#The OpenAI Key
import os
import openai
os.environ['OPENAI_API_KEY'] =API_KEY
openai.api_key = os.getenv("OPENAI_API_KEY")
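
Reading a key from a file can leave stray whitespace in the string, and a notebook run outside Colab may already have the key in its environment. The sketch below shows one possible loader that covers both cases. `load_api_key` is a hypothetical helper, not part of the OpenAI library, and the file path is whatever the caller supplies.

```python
import os

def load_api_key(path, env_var="OPENAI_API_KEY"):
    """Return the key from the environment if set, else from the first line of `path`."""
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    with open(path, "r") as f:
        key = f.readline().strip()  # strip() removes the trailing newline
    if not key:
        raise ValueError(f"No API key found in {env_var} or {path}")
    return key
```

A call such as `load_api_key("drive/MyDrive/files/api_key.txt")` would then replace the manual open/read/close sequence.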

# The Generator


In [None]:
import openai
from openai import OpenAI
import time

client = OpenAI()
gptmodel="gpt-4o" # or select gpt-3.5-turbo
start_time = time.time()  # Start timing before the request

def call_gpt4_with_full_text(itext):
    # Accept a single string or a list of lines; calling join() on a plain
    # string would insert a newline between every character
    text_input = itext if isinstance(itext, str) else '\n'.join(itext)
    prompt = f"Please elaborate on the following content:\n{text_input}"

    try:
        response = client.chat.completions.create(
            model=gptmodel,
            messages=[
                {"role": "system", "content": "You are an expert Natural Language Processing exercise expert."},
                {"role": "assistant", "content": "1.You can explain read the input and answer in detail"},
                {"role": "user", "content": prompt}
            ],
            temperature=0.1  # Add the temperature parameter here and other parameters you need
        )
        return response.choices[0].message.content.strip()
    except Exception as e:
        # Return the error text so the notebook keeps running
        return str(e)
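
The generator cell imports `time` and records `start_time`, which suggests measuring how long a request takes. A minimal sketch of that measurement follows. `timed_call` and `fake_generator` are hypothetical names, `time.perf_counter` is used here as an assumption in place of `time.time`, and the stand-in generator lets the example run without an API key.

```python
import time

def timed_call(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

def fake_generator(text):
    """Stand-in for the GPT call (assumption): sleeps briefly, then echoes in upper case."""
    time.sleep(0.01)
    return text.upper()

response, seconds = timed_call(fake_generator, "define a rag store")
print(f"Response time: {seconds:.2f} seconds")
```

Wrapping the call rather than setting a timestamp at import time keeps the measurement tied to a single request.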

### Formatted response

In [None]:
import textwrap

def print_formatted_response(response):
    # Define the width for wrapping the text
    wrapper = textwrap.TextWrapper(width=80)  # Set to 80 columns wide, but adjust as needed
    wrapped_text = wrapper.fill(text=response)

    # Print the formatted response with a header and footer
    print("Response:")
    print("---------------")
    print(wrapped_text)
    print("---------------\n")
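
By default, `textwrap` treats newlines as ordinary whitespace and replaces them with spaces, so paragraph breaks and Markdown headings in a model response end up on the same wrapped line. If the original line breaks matter, one option is to wrap each line separately. The helper name `wrap_preserving_lines` is hypothetical.

```python
import textwrap

def wrap_preserving_lines(text, width=80):
    """Wrap each input line separately so existing line breaks survive."""
    return "\n".join(textwrap.fill(line, width=width) for line in text.splitlines())

sample = "### Heading\n\nFirst paragraph of the response."
print(textwrap.fill(sample, width=80))          # line breaks become spaces
print(wrap_preserving_lines(sample, width=80))  # line breaks are kept
```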

# The Data

In [None]:
db_records = [
    "Retrieval Augmented Generation (RAG) represents a sophisticated hybrid approach in the field of artificial intelligence, particularly within the realm of natural language processing (NLP).",
    "It innovatively combines the capabilities of neural network-based language models with retrieval systems to enhance the generation of text, making it more accurate, informative, and contextually relevant.",
    "This methodology leverages the strengths of both generative and retrieval architectures to tackle complex tasks that require not only linguistic fluency but also factual correctness and depth of knowledge.",
    "At the core of Retrieval Augmented Generation (RAG) is a generative model, typically a transformer-based neural network, similar to those used in models like GPT (Generative Pre-trained Transformer) or BERT (Bidirectional Encoder Representations from Transformers).",
    "This component is responsible for producing coherent and contextually appropriate language outputs based on a mixture of input prompts and additional information fetched by the retrieval component.",
    "Complementing the language model is the retrieval system, which is usually built on a database of documents or a corpus of texts.",
    "This system uses techniques from information retrieval to find and fetch documents that are relevant to the input query or prompt.",
    "The mechanism of relevance determination can range from simple keyword matching to more complex semantic search algorithms which interpret the meaning behind the query to find the best matches.",
    "This component merges the outputs from the language model and the retrieval system.",
    "It effectively synthesizes the raw data fetched by the retrieval system into the generative process of the language model.",
    "The integrator ensures that the information from the retrieval system is seamlessly incorporated into the final text output, enhancing the model's ability to generate responses that are not only fluent and grammatically correct but also rich in factual details and context-specific nuances.",
    "When a query or prompt is received, the system first processes it to understand the requirement or the context.",
    "Based on the processed query, the retrieval system searches through its database to find relevant documents or information snippets.",
    "This retrieval is guided by the similarity of content in the documents to the query, which can be determined through various techniques like vector embeddings or semantic similarity measures.",
    "The retrieved documents are then fed into the language model.",
    "In some implementations, this integration happens at the token level, where the model can access and incorporate specific pieces of information from the retrieved texts dynamically as it generates each part of the response.",
    "The language model, now augmented with direct access to retrieved information, generates a response.",
    "This response is not only influenced by the training of the model but also by the specific facts and details contained in the retrieved documents, making it more tailored and accurate.",
    "By directly incorporating information from external sources, Retrieval Augmented Generation (RAG) models can produce responses that are more factual and relevant to the given query.",
    "This is particularly useful in domains like medical advice, technical support, and other areas where precision and up-to-date knowledge are crucial.",
    "Retrieval Augmented Generation (RAG) systems can dynamically adapt to new information since they retrieve data in real-time from their databases.",
    "This allows them to remain current with the latest knowledge and trends without needing frequent retraining.",
    "With access to a wide range of documents, Retrieval Augmented Generation (RAG) systems can provide detailed and nuanced answers that a standalone language model might not be capable of generating based solely on its pre-trained knowledge.",
    "While Retrieval Augmented Generation (RAG) offers substantial benefits, it also comes with its challenges.",
    "These include the complexity of integrating retrieval and generation systems, the computational overhead associated with real-time data retrieval, and the need for maintaining a large, up-to-date, and high-quality database of retrievable texts.",
    "Furthermore, ensuring the relevance and accuracy of the retrieved information remains a significant challenge, as does managing the potential for introducing biases or errors from the external sources.",
    "In summary, Retrieval Augmented Generation represents a significant advancement in the field of artificial intelligence, merging the best of retrieval-based and generative technologies to create systems that not only understand and generate natural language but also deeply comprehend and utilize the vast amounts of information available in textual form."
]

In [None]:
import textwrap
paragraph = ' '.join(db_records)
wrapped_text = textwrap.fill(paragraph, width=80)
print(wrapped_text)

Retrieval Augmented Generation (RAG) represents a sophisticated hybrid approach
in the field of artificial intelligence, particularly within the realm of
natural language processing (NLP). It innovatively combines the capabilities of
neural network-based language models with retrieval systems to enhance the
generation of text, making it more accurate, informative, and contextually
relevant. This methodology leverages the strengths of both generative and
retrieval architectures to tackle complex tasks that require not only linguistic
fluency but also factual correctness and depth of knowledge. At the core of
Retrieval Augmented Generation (RAG) is a generative model, typically a
transformer-based neural network, similar to those used in models like GPT
(Generative Pre-trained Transformer) or BERT (Bidirectional Encoder
Representations from Transformers). This component is responsible for producing
coherent and contextually appropriate language outputs based on a mixture of
input prompts

## The Query

In [None]:
query = "define a rag store"

### Generation without augmentation

In [None]:
# Call the function and print the result
gpt4_response = call_gpt4_with_full_text(query)
# Assuming 'gpt4_response' contains the response from the previous GPT-4 call
print_formatted_response(gpt4_response)

Response:
---------------
It seems like you're asking for an elaboration on a fragmented input that
appears to spell out "define a rag store." If that's correct, here's a detailed
explanation:  ### Define a Rag Store  **Definition:** A "rag store" typically
refers to a store or a place where old clothes, rags, or fabric remnants are
sold. These items can be used for various purposes such as cleaning, crafting,
or even recycled into new products. Rag stores might also deal in second-hand
clothing, vintage fabrics, and other textile-related items.  **Purpose and
Uses:** 1. **Cleaning Supplies:** Rags are commonly used in both industrial and
domestic settings for cleaning because they are absorbent and reusable. 2.
**Crafting Materials:** Artists and crafters often use old fabrics and rags
sourced from rag stores for projects like quilting, patchwork, or creative
fashion design. 3. **Recycling and Upcycling:** Environmentally conscious
individuals and businesses buy old fabrics to recycle

# 1.Naïve Retrieval Augmented Generation (RAG)

## Keyword search and matching

In [None]:
def find_best_match_keyword_search(query, db_records):
    best_score = 0
    best_record = None

    # Split the query into individual keywords
    query_keywords = set(query.lower().split())

    # Iterate through each record in db_records
    for record in db_records:
        # Split the record into keywords
        record_keywords = set(record.lower().split())

        # Calculate the number of common keywords
        common_keywords = query_keywords.intersection(record_keywords)
        current_score = len(common_keywords)

        # Update the best score and record if the current score is higher
        if current_score > best_score:
            best_score = current_score
            best_record = record

    return best_score, best_record

# Assuming 'query' and 'db_records' are defined in previous cells in your Colab notebook
best_keyword_score, best_matching_record = find_best_match_keyword_search(query, db_records)

print(f"Best Keyword Score: {best_keyword_score}")
#print(f"Best Matching Record: {best_matching_record}")
print_formatted_response(best_matching_record)

Best Keyword Score: 1
Response:
---------------
Retrieval Augmented Generation (RAG) represents a sophisticated hybrid approach
in the field of artificial intelligence, particularly within the realm of
natural language processing (NLP).
---------------
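
The score of 1 comes from a match on the article "a". Because `split()` keeps punctuation attached, the record contains the token `(rag)` rather than `rag`, so the one informative query word never matches. The sketch below shows one possible remedy: tokenize with a regular expression and drop a few stop words. `STOP_WORDS` and `tokenize` are hypothetical, and the stop word list is deliberately tiny.

```python
import re

STOP_WORDS = {"a", "an", "the", "of", "in", "to", "and", "is"}  # illustrative, not exhaustive

def tokenize(text):
    """Lowercase, keep alphanumeric runs only, and drop stop words."""
    return {tok for tok in re.findall(r"[a-z0-9]+", text.lower()) if tok not in STOP_WORDS}

query = "define a rag store"
record = "Retrieval Augmented Generation (RAG) represents a sophisticated hybrid approach"

naive_overlap = set(query.lower().split()).intersection(record.lower().split())
clean_overlap = tokenize(query).intersection(tokenize(record))
print(naive_overlap)  # {'a'}
print(clean_overlap)  # {'rag'}
```

The score is still 1 in both cases, but the second match is on a content word, which is what a keyword retriever is meant to reward.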



## Augmented input

In [None]:
augmented_input=query+" "+best_matching_record

In [None]:
print_formatted_response(augmented_input)

Response:
---------------
define a rag store Retrieval Augmented Generation (RAG) represents a
sophisticated hybrid approach in the field of artificial intelligence,
particularly within the realm of natural language processing (NLP).
---------------
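
Plain concatenation leaves it to the model to work out where the query ends and the retrieved text begins. It also fails with a `TypeError` when the keyword search finds no match and returns `None`. A labeled template is one hedged alternative. `build_augmented_input` and the label wording are assumptions made for illustration.

```python
def build_augmented_input(query, record):
    """Combine the query and the retrieved text with explicit labels.

    If nothing was retrieved (record is None), fall back to the query alone.
    """
    if record is None:
        return f"Query: {query}"
    return f"Query: {query}\nRetrieved context: {record}"

print(build_augmented_input("define a rag store", "RAG combines retrieval and generation."))
print(build_augmented_input("define a rag store", None))
```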



## Generation

In [None]:
# Call the function and print the result
gpt4_response = call_gpt4_with_full_text(augmented_input)
# Assuming 'gpt4_response' contains the response from the previous GPT-4 call
print_formatted_response(gpt4_response)

Response:
---------------
The term "Retrieval Augmented Generation (RAG)" refers to an advanced hybrid
approach in the field of artificial intelligence, particularly within the realm
of natural language processing (NLP). This method combines the strengths of two
major components in AI: retrieval systems and generative models.  **Retrieval
Systems**: These are designed to fetch relevant information from a large dataset
or database. In the context of NLP, retrieval systems are used to find text
segments that are relevant to a given query. This is crucial for tasks where the
answer needs to be supported by specific data, such as in question answering
systems.  **Generative Models**: These models are capable of generating coherent
text based on the input they receive. In NLP, generative models are often used
for tasks like text completion, summarization, and translation. They are trained
on large corpora of text and learn to predict the probability of a sequence of
words.  **Hybrid Approac

# 2.Advanced Retrieval Augmented Generation (RAG)

## 2.1.Vector search

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def calculate_cosine_similarity(text1, text2):
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform([text1, text2])
    similarity = cosine_similarity(tfidf[0:1], tfidf[1:2])
    return similarity[0][0]

def find_best_match(text_input, records):
    best_score = 0
    best_record = None
    for record in records:
        current_score = calculate_cosine_similarity(text_input, record)
        if current_score > best_score:
            best_score = current_score
            best_record = record
    return best_score, best_record

best_similarity_score, best_matching_record = find_best_match(query, db_records)

print(f"Best Cosine Similarity Score: {best_similarity_score:.3f}")
print_formatted_response(best_matching_record)

Best Cosine Similarity Score: 0.087
Response:
---------------
While Retrieval Augmented Generation (RAG) offers substantial benefits, it also
comes with its challenges.
---------------
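
Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it measures direction rather than magnitude. To make the arithmetic visible, the sketch below computes it with the standard library on raw term counts. This is a simplification: it does not apply the TF-IDF weighting used in the cell above, and `cosine_similarity_counts` is a hypothetical name.

```python
import math
from collections import Counter

def cosine_similarity_counts(text1, text2):
    """Cosine similarity between raw term-count vectors of two texts."""
    v1 = Counter(text1.lower().split())
    v2 = Counter(text2.lower().split())
    shared_terms = set(v1).intersection(v2)
    dot = sum(v1[t] * v2[t] for t in shared_terms)
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty text has no direction
    return dot / (norm1 * norm2)

print(cosine_similarity_counts("rag store", "rag store"))   # close to 1.0
print(cosine_similarity_counts("rag store", "blue sky"))    # 0.0, no shared terms
print(cosine_similarity_counts("rag store", "rag system"))  # close to 0.5
```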



### Augmented input

In [None]:
augmented_input=query+" "+best_matching_record

In [None]:
print_formatted_response(augmented_input)

Response:
---------------
define a rag store While Retrieval Augmented Generation (RAG) offers substantial
benefits, it also comes with its challenges.
---------------



### Generation

In [None]:
# Call the function and print the result
gpt4_response = call_gpt4_with_full_text(augmented_input)
# Assuming 'gpt4_response' contains the response from the previous GPT-4 call
print_formatted_response(gpt4_response)

Response:
---------------
The content you've provided seems to be about "Retrieval Augmented Generation
(RAG)" which is a technique used in the field of Natural Language Processing
(NLP). Let's define RAG and discuss its benefits and challenges in a more
coherent form.  **Retrieval Augmented Generation (RAG)**: RAG is an NLP
technique that combines the power of language models with information retrieval
methods to enhance the generation of text. This approach typically involves
retrieving relevant documents or data from a large corpus and then using this
information to assist a generative model in producing more accurate and
contextually relevant responses.  **Benefits of RAG**: 1. **Enhanced Accuracy
and Relevance**: By retrieving information from relevant documents, RAG can
generate responses that are not only contextually appropriate but also factually
accurate. 2. **Richer Content**: The integration of retrieved data allows the
model to produce richer and more detailed content, whi

## 2.2.Index-based search

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def setup_vectorizer(records):
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform(records)
    return vectorizer, tfidf_matrix

def find_best_match(query, vectorizer, tfidf_matrix):
    query_tfidf = vectorizer.transform([query])
    similarities = cosine_similarity(query_tfidf, tfidf_matrix)
    best_index = similarities.argmax()  # Get the index of the highest similarity score
    best_score = similarities[0, best_index]
    return best_score, best_index

vectorizer, tfidf_matrix = setup_vectorizer(db_records)

best_similarity_score, best_index = find_best_match(query, vectorizer, tfidf_matrix)
best_matching_record = db_records[best_index]

print(f"Best Cosine Similarity Score: {best_similarity_score:.3f}")
#print(f"Best Matching Record: {best_matching_record}")
print_formatted_response(best_matching_record)

Best Cosine Similarity Score: 0.216
Response:
---------------
While Retrieval Augmented Generation (RAG) offers substantial benefits, it also
comes with its challenges.
---------------
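
Because the index stores one row per record, the same similarity vector can rank every record, not only the best one. The sketch below returns the top `k` matches on a toy corpus. `find_top_k` is a hypothetical helper, and the three-sentence corpus is invented for the example.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def find_top_k(query, vectorizer, tfidf_matrix, k=3):
    """Return (index, score) pairs for the k most similar records, best first."""
    query_tfidf = vectorizer.transform([query])
    similarities = cosine_similarity(query_tfidf, tfidf_matrix).ravel()
    top_indices = np.argsort(similarities)[::-1][:k]  # descending order
    return [(int(i), float(similarities[i])) for i in top_indices]

toy_records = ["cats chase mice", "dogs chase cats", "stocks fell sharply today"]
toy_vectorizer = TfidfVectorizer()
toy_matrix = toy_vectorizer.fit_transform(toy_records)  # fit once, query many times

top_matches = find_top_k("cats and mice", toy_vectorizer, toy_matrix, k=2)
for index, score in top_matches:
    print(f"{score:.3f}  {toy_records[index]}")
```

Returning several candidates gives the augmentation step more than one record to work with when the best score is low.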



In [None]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

def setup_vectorizer(records):
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform(records)

    # Convert the TF-IDF matrix to a DataFrame for display purposes
    tfidf_df = pd.DataFrame(tfidf_matrix.toarray(), columns=vectorizer.get_feature_names_out())

    # Display the DataFrame
    print(tfidf_df)

    return vectorizer, tfidf_matrix

vectorizer, tfidf_matrix = setup_vectorizer(db_records)

     ability    access  accuracy  accurate    adapt  additional  advancement  \
0   0.000000  0.000000   0.00000  0.000000  0.00000    0.000000     0.000000   
1   0.000000  0.000000   0.00000  0.216814  0.00000    0.000000     0.000000   
2   0.000000  0.000000   0.00000  0.000000  0.00000    0.000000     0.000000   
3   0.000000  0.000000   0.00000  0.000000  0.00000    0.000000     0.000000   
4   0.000000  0.000000   0.00000  0.000000  0.00000    0.236798     0.000000   
5   0.000000  0.000000   0.00000  0.000000  0.00000    0.000000     0.000000   
6   0.000000  0.000000   0.00000  0.000000  0.00000    0.000000     0.000000   
7   0.000000  0.000000   0.00000  0.000000  0.00000    0.000000     0.000000   
8   0.000000  0.000000   0.00000  0.000000  0.00000    0.000000     0.000000   
9   0.000000  0.000000   0.00000  0.000000  0.00000    0.000000     0.000000   
10  0.186722  0.000000   0.00000  0.000000  0.00000    0.000000     0.000000   
11  0.000000  0.000000   0.00000  0.0000
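
Each cell in this table is a TF-IDF weight. With the default `TfidfVectorizer` settings (smoothed IDF and L2 row normalization), scikit-learn documents the computation as follows, where $n$ is the number of records, $\mathrm{df}(t)$ is the number of records containing term $t$, and $\mathrm{tf}(t, d)$ is the raw count of $t$ in record $d$:

```latex
\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1,
\qquad
w(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t),
\qquad
\hat{w}(t, d) = \frac{w(t, d)}{\sqrt{\sum_{t'} w(t', d)^{2}}}
```

A term that appears in few records therefore gets a larger weight than one that appears in most of them, and every non-empty row has unit length, so the cosine similarity between two rows reduces to their dot product.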

### Augmented input

In [None]:
augmented_input=query+" "+best_matching_record

In [None]:
print_formatted_response(augmented_input)

Response:
---------------
define a rag store While Retrieval Augmented Generation (RAG) offers substantial
benefits, it also comes with its challenges.
---------------



### Generation

In [None]:
# Call the function and print the result
gpt4_response = call_gpt4_with_full_text(augmented_input)
# Assuming 'gpt4_response' contains the response from the previous GPT-4 call
print_formatted_response(gpt4_response)

Response:
---------------
The content you've provided seems to be about "Retrieval Augmented Generation
(RAG)" which is a technique used in the field of Natural Language Processing
(NLP). Let's define RAG and discuss its benefits and challenges in more detail.
### Definition of Retrieval Augmented Generation (RAG)  Retrieval Augmented
Generation (RAG) is a hybrid approach that combines the power of retrieval-based
and generative NLP models. In this approach, a retrieval system is first used to
fetch relevant documents or information from a large corpus or database. This
retrieved information is then fed into a generative model, which uses it to
generate responses or complete tasks. This method leverages both the precision
of retrieval systems in finding relevant information and the flexibility of
generative models in producing coherent and contextually appropriate text.  ###
Benefits of RAG  1. **Improved Accuracy**: By using relevant context from
retrieved documents, RAG can generate 

# 3.Modular RAG Retriever

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

class RetrievalComponent:
    def __init__(self, method='vector'):
        self.method = method
        self.documents = []  # Filled in by fit()
        if self.method == 'vector' or self.method == 'indexed':
            self.vectorizer = TfidfVectorizer()
            self.tfidf_matrix = None

    def fit(self, records):
        # Keep the records on the instance so every search method can return them
        self.documents = records
        if self.method == 'vector' or self.method == 'indexed':
            self.tfidf_matrix = self.vectorizer.fit_transform(records)

    def retrieve(self, query):
        if self.method == 'keyword':
            return self.keyword_search(query)
        elif self.method == 'vector':
            return self.vector_search(query)
        elif self.method == 'indexed':
            return self.indexed_search(query)
        else:
            raise ValueError(f"Unknown retrieval method: {self.method}")

    def keyword_search(self, query):
        best_score = 0
        best_record = None
        query_keywords = set(query.lower().split())
        for doc in self.documents:
            doc_keywords = set(doc.lower().split())
            common_keywords = query_keywords.intersection(doc_keywords)
            score = len(common_keywords)
            if score > best_score:
                best_score = score
                best_record = doc
        return best_record

    def vector_search(self, query):
        query_tfidf = self.vectorizer.transform([query])
        similarities = cosine_similarity(query_tfidf, self.tfidf_matrix)
        best_index = similarities.argmax()
        # Return from the fitted records rather than the global db_records
        return self.documents[best_index]

    def indexed_search(self, query):
        # Assuming the tfidf_matrix is precomputed and stored
        query_tfidf = self.vectorizer.transform([query])
        similarities = cosine_similarity(query_tfidf, self.tfidf_matrix)
        best_index = similarities.argmax()
        return self.documents[best_index]
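
The same selection logic can also be expressed as a dictionary that maps a method name to a scoring function, which makes adding a strategy a one-line change. The sketch below is an alternative design, not the notebook's class. `SCORERS`, `overlap_count`, `jaccard`, and `retrieve` are hypothetical names, and the Jaccard scorer is included only to show a second pluggable strategy.

```python
def overlap_count(query, doc):
    """Number of lowercase words shared by query and document."""
    return len(set(query.lower().split()).intersection(doc.lower().split()))

def jaccard(query, doc):
    """Shared words divided by all distinct words in query and document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    all_words = q.union(d)
    return len(q.intersection(d)) / len(all_words) if all_words else 0.0

SCORERS = {"keyword": overlap_count, "jaccard": jaccard}  # hypothetical registry

def retrieve(query, documents, method="keyword"):
    """Return the document with the highest score under the chosen method."""
    if method not in SCORERS:
        raise ValueError(f"Unknown retrieval method: {method!r}")
    scorer = SCORERS[method]
    return max(documents, key=lambda doc: scorer(query, doc))

documents = ["rag combines retrieval and generation", "a store sells goods"]
for method in SCORERS:
    print(method, "->", retrieve("define a rag store", documents, method))
```

Both strategies pick the second document here, because it shares two words with the query while the first shares one.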

### Modular RAG Strategies

In [None]:
# Usage example
retrieval = RetrievalComponent(method='vector')  # Choose from 'keyword', 'vector', 'indexed'
retrieval.fit(db_records)
best_matching_record = retrieval.retrieve(query)

#print(f"Best Matching Record: {best_matching_record}")
print_formatted_response(best_matching_record)

Response:
---------------
While Retrieval Augmented Generation (RAG) offers substantial benefits, it also
comes with its challenges.
---------------



### Augmented Input

In [None]:
augmented_input=query+" "+best_matching_record

In [None]:
print_formatted_response(augmented_input)

Response:
---------------
define a rag store While Retrieval Augmented Generation (RAG) offers substantial
benefits, it also comes with its challenges.
---------------



### Generation

In [None]:
# Call the function and print the result
gpt4_response = call_gpt4_with_full_text(augmented_input)
# Assuming 'gpt4_response' contains the response from the previous GPT-4 call
print_formatted_response(gpt4_response)

Response:
---------------
The content you've provided seems to be about "Retrieval Augmented Generation
(RAG)" and mentions its benefits and challenges. Let's elaborate on this
concept:  **Retrieval Augmented Generation (RAG)** is a technique used in
natural language processing that combines the capabilities of pre-trained
language models with information retrieval methods to enhance the generation of
text. This approach allows the model to dynamically retrieve external knowledge
from a large corpus of documents and use this information to generate more
accurate, informative, and contextually relevant responses.  ### Benefits of
RAG: 1. **Enhanced Knowledge**: RAG models can access a vast amount of
information beyond what they were originally trained on. This allows them to
provide responses that are not only contextually relevant but also factually
accurate, drawing from up-to-date and expansive databases. 2. **Improved
Contextual Relevance**: By retrieving information related to the 