# **Advanced RAG element effectively used**

The first chatbot was not able to answer the following questions:

*   What is the Parent-Child Document Retriever?
*   Why do RAG applications benefit from using a Parent-Child Document Retriever?

The final chatbot has improved by following these steps:
*   Splitting text into sentences.
*   Duplicating the first and last page texts.
*   Using the Parent-Child Document Retriever.

**After implementing these steps, the second chatbot can answer the questions that the first model could not.**

In [1]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [2]:
%pip install -U --quiet langchain-google-genai langchain tiktoken pypdf sentence_transformers chromadb langchain_community PyPDF2

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m974.6/974.6 kB[0m [31m8.6 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.1/1.1 MB[0m [31m13.3 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m290.4/290.4 kB[0m [31m14.5 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m227.1/227.1 kB[0m [31m19.4 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m559.5/559.5 kB[0m [31m16.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.2/2.2 MB[0m [31m26.4 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m232.6/232.6 kB[0m [31m22.6 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m321.8/321.8 kB[0m [31m27.3 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━


*   paraphrase-multilingual-MiniLM-L12-v2: A multilingual sentence embedding model that excels at capturing semantic similarity between sentences.
*   all-mpnet-base-v2: A versatile sentence embedding model suitable for various tasks, including text classification and semantic search.
*   multi-qa-mpnet-base-dot-v1: A model specifically designed for calculating question-answer similarity, useful for chatbots and question answering systems.


In [3]:
from langchain.embeddings import HuggingFaceEmbeddings

model_name = "all-mpnet-base-v2"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': True}
hf = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)

  warn_deprecated(
  from tqdm.autonotebook import tqdm, trange
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/10.6k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]



config.json:   0%|          | 0.00/571 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/438M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/363 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/239 [00:00<?, ?B/s]

1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

### **First Chat Bot**

In [4]:
import os
from google.colab import drive
from IPython.display import Markdown
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnableMap
from huggingface_hub import hf_hub_download
from concurrent.futures import ThreadPoolExecutor, as_completed

In [19]:
def question_answering(quest):
    # Mount Google Drive
    drive.mount('/content/drive')

    # Set Google API Key
    os.environ["GOOGLE_API_KEY"] = "Your GOOGLE API KEY"

    # Initialize Language Model
    llm = ChatGoogleGenerativeAI(model="gemini-pro")

    # Load and split PDF using DataLoader_sentence_chunks
    pdf_path = "/content/drive/MyDrive/Colab Notebooks/GenAI_Handbook.pdf"
    loader = PyPDFLoader(pdf_path)
    pages = loader.load_and_split()

    # Split text into chunks
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    texts = text_splitter.split_documents(pages)


    # Initialize HuggingFace Embeddings
    model_name = "all-mpnet-base-v2"
    hf = HuggingFaceEmbeddings(
        model_name=model_name,
        model_kwargs={'device': 'cpu'},
        encode_kwargs={'normalize_embeddings': True}
    )

    # Create a document search index
    docsearch = Chroma.from_documents(texts, hf)

    # Configure the retriever
    retriever = docsearch.as_retriever(
        search_type="mmr",
        search_kwargs={"k": 3, "fetch_k": 28}
    )

    # Retrieve relevant documents based on a query
    retriever.get_relevant_documents(quest)

    # Define the prompt template
    template = """Answer the question based only on the following context:
    {context}

    Question: {question}
    """
    prompt = ChatPromptTemplate.from_template(template)

    # Initialize the language model for question answering
    gemini = ChatGoogleGenerativeAI(model="gemini-pro", temperature=0)

    # Define the processing chain
    chain = RunnableMap({
        "context": lambda x: retriever.get_relevant_documents(x['question']),
        "question": lambda x: x['question']
    }) | prompt | gemini

    # Run the chain with a sample question
    response = chain.invoke({'question': quest})
    return response.content

In [11]:
quest = "what is Token Consumption?"
question_answering(quest)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).




The process for calculating the number of input and output tokens.

In [29]:
quest = "what do developers do to reduce the number of LLM requests?"
question_answering(quest)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).




'Developers can reduce the number of LLM requests by restricting the number of output tokens generated by the LLM, implementing a process to determine the number of input and output tokens consumed by a request, and using chunking to split content into smaller chunks that are semantically related.'

In [33]:
quest = "what is Parent-Child Document Retriever?"
question_answering(quest) #??????????????????

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).




'The provided context does not mention anything about Parent-Child Document Retriever, so I cannot answer this question from the provided context.'

In [34]:
quest = "Why do the RAG applications benefit from using a Parent-Child Document Retriever?"
question_answering(quest) #??????????????????

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).




'The provided context does not mention anything about RAG applications or the benefits of using a Parent-Child Document Retriever in such applications.'

### **Second Chat Bot**

The final chatbot has improved by following these steps:

*   Split text into sentences
*   Duplicate first and last page texts
*   Use Parent-Child Document Retriever

In [35]:
import os
import re
import PyPDF2
import pandas as pd
from google.colab import drive
from markdown import Markdown
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnableMap
from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import ThreadPoolExecutor, as_completed

In [36]:
# Define a Document class to store page content and metadata
class Document:
    def __init__(self, page_content, metadata):
        self.page_content = page_content
        self.metadata = metadata

    def __repr__(self):
        return f"Document(page_content={self.page_content!r}, metadata={self.metadata})"

# Define a DataLoader class to handle the loading and chunking of PDF data
class DataLoaderParentChildChunks:
    def __init__(self, input_file, parent_chunk_size, child_chunk_size, chunk_overlap):
        self.input_file = input_file
        self.parent_chunk_size = parent_chunk_size
        self.child_chunk_size = child_chunk_size
        self.chunk_overlap = chunk_overlap

    # Get the total number of pages in the PDF
    def get_total_pages(self):
        with open(self.input_file, 'rb') as file:
            reader = PyPDF2.PdfReader(file)
            total_pages = len(reader.pages)
        return total_pages

    # Load a specific page from the PDF
    def load_pdf_page(self, page_num):
        with open(self.input_file, 'rb') as file:
            reader = PyPDF2.PdfReader(file)
            page = reader.pages[page_num]
            page_text = page.extract_text()
        return page_text, page_num + 1

    # Split text into sentences using regular expressions
    def split_into_sentences(self, text):
        sentence_endings = re.compile(r'(?<=[.!?])\s+(?=[A-Z])')
        return sentence_endings.split(text)

    # Create chunks from sentences with specified chunk size and overlap
    def create_chunks(self, sentences, page_numbers, chunk_size):
        chunks = []
        num_sentences = len(sentences)
        step = chunk_size - self.chunk_overlap
        for i in range(0, num_sentences, step):
            chunk_sentences = sentences[i:i + chunk_size]
            chunk_pages = page_numbers[i:i + chunk_size]
            chunk = ' '.join(chunk_sentences)
            if chunk:
                chunks.append({
                    'Text': chunk,
                    'Source': self.input_file,
                    'Page': ', '.join(map(str, sorted(set(chunk_pages))))
                })
        return chunks

    # Main function to load PDF, split into sentences, and create chunks
    def run(self, num_pages=None):
        total_pages = self.get_total_pages()
        if num_pages is None:
            num_pages = total_pages

        combined_text = ""
        page_texts = []

        # Use ThreadPoolExecutor to load pages in parallel
        with ThreadPoolExecutor() as executor:
            future_to_page = {executor.submit(self.load_pdf_page, page_num): page_num for page_num in range(min(num_pages, total_pages))}
            for future in as_completed(future_to_page):
                page_text, page_num = future.result()
                page_texts.append((page_text, page_num))
                combined_text += page_text + " "

        # Add first and last page texts again for context
        if total_pages > 0:
            first_page_text = self.load_pdf_page(0)[0]
            last_page_text = self.load_pdf_page(total_pages - 1)[0]
            page_texts.insert(1, (first_page_text, 1))
            page_texts.append((last_page_text, total_pages))
            combined_text = first_page_text + " " + combined_text + " " + last_page_text

        # Split combined text into sentences and track page numbers
        sentences = []
        page_numbers = []
        for page_text, page_num in page_texts:
            page_sentences = self.split_into_sentences(page_text)
            sentences.extend(page_sentences)
            page_numbers.extend([page_num] * len(page_sentences))

        # Create parentchunks and child chunks
        parent_chunks = self.create_chunks(sentences, page_numbers, self.parent_chunk_size)
        child_chunks = self.create_chunks(sentences, page_numbers, self.child_chunk_size)

        # Create document texts from the child chunks
        child_documents = []
        for idx, row in pd.DataFrame(child_chunks).iterrows():
            page_content = row['Text']
            metadata = {'source': row['Source'], 'page': row['Page']}
            child_documents.append(Document(page_content=page_content, metadata=metadata))

        # Create document texts from the parent chunks
        parent_documents = []
        for idx, row in pd.DataFrame(parent_chunks).iterrows():
            page_content = row['Text']
            metadata = {'source': row['Source'], 'page': row['Page']}
            parent_documents.append(Document(page_content=page_content, metadata=metadata))

        return child_documents, parent_documents

In [48]:
def question_answering(quest):
    # Mount Google Drive
    drive.mount('/content/drive')

    os.environ["GOOGLE_API_KEY"] = "Your GOOGLE API KEY"

    # Load and split PDF using DataLoaderParentChildChunks
    pdf_path = "/content/drive/MyDrive/Colab Notebooks/GenAI_Handbook.pdf"
    loader = DataLoaderParentChildChunks(pdf_path, parent_chunk_size=500, child_chunk_size=200, chunk_overlap=100)
    child_documents, parent_documents = loader.run()

    # Initialize HuggingFace Embeddings
    model_name = "all-mpnet-base-v2"
    hf = HuggingFaceEmbeddings(
        model_name=model_name,
        model_kwargs={'device': 'cpu'},
        encode_kwargs={'normalize_embeddings': True}
    )

    # Create a document search index and save embeddings in vector DB
    child_docsearch = Chroma.from_documents(child_documents, hf)
    parent_docsearch = Chroma.from_documents(parent_documents, hf)

    # Configure the retrieve
    retriever = child_docsearch.as_retriever(
        search_type="mmr",
        search_kwargs={"k": 5, "fetch_k": 152}
    )

    template = """Answer the question based only on the following context:
    {context}

    Question: {question}
    """
    prompt = ChatPromptTemplate.from_template(template)

    gemini = ChatGoogleGenerativeAI(model="gemini-pro", temperature=0)

    chain = RunnableMap({
        "context": lambda x: retriever.get_relevant_documents(x['question']),
        "question": lambda x: x['question']
    }) | prompt | gemini

    response = chain.invoke({'question': quest})

    # Find the parent chunks related to the retrieved child chunks
    relevant_child_chunks = retriever.get_relevant_documents(quest)
    relevant_parent_chunks = []
    for child_chunk in relevant_child_chunks:
        parent_docs = parent_docsearch.as_retriever(
            search_type="mmr",
            search_kwargs={"k": 5}
        ).get_relevant_documents(child_chunk.page_content)
        relevant_parent_chunks.extend(parent_docs)

    # Ensure relevant_parent_chunks is unique
    unique_parent_chunks = {doc.page_content: doc for doc in relevant_parent_chunks}.values()

    # Update the context in the template with the unique parent chunks
    parent_context = ' '.join([doc.page_content for doc in unique_parent_chunks])
    template = template.replace("{context}", parent_context)

    response = chain.invoke({'question': quest})
    return response.content


In [49]:
quest = "what is Token Consumption?"
question_answering(quest)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).




'The process for calculating the number of input and output tokens.'

In [51]:
quest = "what do developers do to reduce the number of LLM requests?"
question_answering(quest)


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).




'To reduce the number of LLM requests, developers can batch requests or implement delays between subsequent requests.'

In [53]:
quest = "what is Parent-Child Document Retriever?"
question_answering(quest)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).




'The Parent-Child Document Retriever is a component of the RAG application that enables the application to realize the benefits of both smaller and larger chunks. The smaller chunks contain less noise and are used to create vector embeddings, which are used during similarity search. The larger chunks provide the necessary context to the LLM for it to generate an informed response.'

In [52]:
quest = "Why do the RAG applications benefit from using a Parent-Child Document Retriever?"
question_answering(quest)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).




'RAG applications benefit from using a Parent-Child Document Retriever because it utilizes a multi-level chunking strategy. This strategy involves first performing vector similarity search, retrieving the chunks associated with the most similar vectors, and then including these chunks in the LLM prompt. Vector embeddings account for semantics at both a local level (adjacent words) and a global level (entire paragraphs/pages). This can help to reduce the amount of noise in the vector embeddings for large chunks, which can lead to improved similarity search performance.'