## PDF Q&A RAG from Scratch with Google Gemini and ChromaDB ##

We will follow the steps below. I will not use LangChain or any other ready-made RAG library in this video, so that the core workflow stays visible.

1. Download the PDF of interest using `requests`.
2. Load the PDF.
3. Chunk the PDF text. We will write our own splitter.
4. Use the Google Gemini embedding function to create embeddings.
5. Create a Chroma DB collection with a name and an embedding function.
6. Ingest the documents into Chroma.
7. Based on the query, find the relevant passages by querying Chroma.
8. Send the passages as context, together with the user's question, to a Gemini 1.5 model (the code below uses `gemini-1.5-flash-latest`).
9. Output the results.
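
As a rough orientation, the steps above can be sketched end to end without any external service. In this sketch the relevance score is a toy word-overlap count and no model is called; `word_overlap` and `toy_rag` are stand-ins invented for illustration, not the Gemini or Chroma calls used later in this notebook.

```python
def word_overlap(question, passage):
  # Toy relevance score: number of distinct words shared by question and passage.
  return len(set(question.lower().split()) & set(passage.lower().split()))

def toy_rag(question, chunks, n_results=1):
  # Steps 4-7 collapsed: score every chunk against the question and keep the best ones.
  ranked = sorted(chunks, key=lambda chunk: word_overlap(question, chunk), reverse=True)
  context = ranked[:n_results]
  # Step 8: a real pipeline would send this prompt to an LLM; here it is only returned.
  prompt = f"Question: {question}\nContext: {' '.join(context)}\nAnswer:"
  return context, prompt

toy_chunks = [
  "the encoder maps the input sequence to representations",
  "the decoder generates the output sequence one token at a time",
  "training used eight gpus",
]
toy_context, toy_prompt = toy_rag("what does the encoder do", toy_chunks)
```

The rest of the notebook replaces the word-overlap score with embedding similarity and the returned prompt with a real model call.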

In [None]:
!pip install requests
!pip install PyPDF2
!pip install google-generativeai
!pip install chromadb

Collecting PyPDF2
  Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
Installing collected packages: PyPDF2
Successfully installed PyPDF2-3.0.1
Collecting chromadb
  Downloading chromadb-0.5.3-py3-none-any.whl (559 kB)
Collecting chroma-hnswlib==0.7.3 (from chromadb)
  Downloading chroma_hnswlib-0.7.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.4 MB)
Collecting fastapi>=0.95.2 (from chromadb)
  Downloading fastapi-0.111.0-py3-none-any.whl (91 kB)
Collecting uvicorn[standard]

In [None]:
# configure google gemini
import google.generativeai as genai

from google.colab import userdata
genai.configure(api_key=userdata.get("GOOGLE_API_KEY"))
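
`google.colab.userdata` is only available inside Colab. Outside Colab, one option is to read the key from an environment variable. The helper below is a minimal sketch; `read_api_key` is a hypothetical name, and it assumes the variable has already been exported in the shell.

```python
import os

def read_api_key(name="GOOGLE_API_KEY", env=None):
  # Hypothetical helper: look the key up in the environment and fail loudly if it is missing.
  env = os.environ if env is None else env
  key = env.get(name, "").strip()
  if not key:
    raise RuntimeError(f"Set the {name} environment variable before running the notebook.")
  return key

# Usage outside Colab (assumes the variable is set):
# genai.configure(api_key=read_api_key())
```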

In [None]:
for m in genai.list_models():
  if "generateContent" in m.supported_generation_methods:
    print(m.name)

models/gemini-1.0-pro
models/gemini-1.0-pro-001
models/gemini-1.0-pro-latest
models/gemini-1.0-pro-vision-latest
models/gemini-1.5-flash
models/gemini-1.5-flash-001
models/gemini-1.5-flash-latest
models/gemini-1.5-pro
models/gemini-1.5-pro-001
models/gemini-1.5-pro-latest
models/gemini-pro
models/gemini-pro-vision


In [None]:
import requests

# download_pdf takes a url and a save_path, downloads the PDF from the url
# and saves it at the path specified.

def download_pdf(url, save_path):
  # the timeout keeps the cell from hanging forever if the server does not respond
  response = requests.get(url, timeout=30)
  response.raise_for_status()  # raise an error for 4xx/5xx responses

  with open(save_path, "wb") as file:
    file.write(response.content)

In [None]:
url = "https://arxiv.org/pdf/1706.03762"  # the /pdf/ URL serves the PDF itself; /abs/ is the HTML abstract page
save_path = "attention_is_all_you_need.pdf"

download_pdf(url, save_path)
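
`requests.get` saves whatever the server returns, including an HTML page if the URL does not point at the PDF itself. A cheap sanity check is to look for the `%PDF-` marker that PDF files begin with. The helper name `looks_like_pdf` is an assumption made for this sketch.

```python
def looks_like_pdf(data: bytes) -> bool:
  # PDF files carry the marker "%PDF-" at (or very near) the start of the file.
  return b"%PDF-" in data[:1024]

# Usage with the file saved above:
# with open(save_path, "rb") as f:
#   print(looks_like_pdf(f.read(1024)))
```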

In [None]:
from PyPDF2 import PdfReader

# load_pdf takes the path to a PDF file as file_path and
# extracts the text of every page using PdfReader.

def load_pdf(file_path):
  pdf_reader = PdfReader(file_path)
  text = ""
  for page in pdf_reader.pages:
    text += page.extract_text()
  return text


In [None]:
pdf_text = load_pdf(save_path)


In [None]:
pdf_text[:500]

'Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.comNoam Shazeer∗\nGoogle Brain\nnoam@google.comNiki Parmar∗\nGoogle Research\nnikip@google.comJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.comAidan N. Gomez∗ †\nUniversity of Toronto\naidan@cs.toronto.eduŁukasz Kaise'
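
The extracted text keeps the PDF's hard line breaks (`\n`) inside sentences. One optional cleanup, sketched below, collapses every run of whitespace into a single space before chunking. The function name `normalize_whitespace` is illustrative, and whether this improves retrieval is not tested in this notebook; the cells that follow use the raw text.

```python
import re

def normalize_whitespace(text):
  # Replace every run of whitespace (including hard line breaks) with a single space.
  return re.sub(r"\s+", " ", text).strip()

sample = "Attention Is All You Need\nAshish Vaswani\nGoogle  Brain\n"
cleaned = normalize_whitespace(sample)
```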

In [None]:
len(pdf_text)

39472

In [None]:
# split_text_recursively takes the original text to be split, the max_length
# of each chunk, and chunk_overlap, the number of characters shared between
# two consecutive chunks.
# Note: despite its name, the function is iterative; it walks through the text in a loop.

def split_text_recursively(text, max_length=1000, chunk_overlap=0):
  chunks = []
  start = 0 # start at the beginning
  text_length = len(text) # length of the text provided
  while start < text_length: # keep going until we have looked at all the text
    end = start + max_length
    if end < text_length: # if we are not yet at the end of the text
      end = text.rfind(' ', start, end) + 1 # end the chunk just after the last space in the window

      if end <= start: # if there is no space, then just split at the max length
        end = start + max_length
    chunk = text[start:end].strip() # take the text from start to end and trim surrounding whitespace

    if chunk:
      chunks.append(chunk)

    next_start = end - chunk_overlap # step forward, keeping chunk_overlap characters of context
    if next_start <= start: # guard: an overlap as large as the chunk would otherwise loop forever
      next_start = end
    start = next_start

  return chunks
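
One property of the loop above is worth knowing: with a non-zero `chunk_overlap`, the start position after the final chunk can still fall inside the text, which yields one extra short chunk that repeats the tail of the previous one. The variant below, named `split_text_with_overlap` purely for illustration, stops as soon as a chunk reaches the end of the text. It is a sketch, not a drop-in replacement: using it would change the chunk counts shown later in this notebook.

```python
def split_text_with_overlap(text, max_length=1000, chunk_overlap=0):
  if chunk_overlap >= max_length:
    raise ValueError("chunk_overlap must be smaller than max_length")
  chunks = []
  start = 0
  while start < len(text):
    end = min(start + max_length, len(text))
    if end < len(text):
      # Prefer to break just after the last space inside the window.
      space = text.rfind(" ", start, end)
      if space > start:
        end = space + 1
    chunk = text[start:end].strip()
    if chunk:
      chunks.append(chunk)
    if end >= len(text):
      break  # the final chunk has been emitted, so no duplicate tail is produced
    # Step back by the overlap, but always move forward by at least one character.
    start = max(end - chunk_overlap, start + 1)
  return chunks

demo = split_text_with_overlap("aaaa bbbb cccc dddd", max_length=10, chunk_overlap=3)
# demo == ['aaaa bbbb', 'bb cccc', 'cc dddd']
```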





In [None]:
chunks = split_text_recursively(pdf_text, max_length=2000, chunk_overlap=200)

In [None]:
len(chunks)

23
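
As a sanity check on that number: each chunk after the first advances by at most `max_length - chunk_overlap` characters, so the text length gives a lower bound on the chunk count. The helper below is illustrative (`min_chunk_count` is not part of the notebook). For 39472 characters with `max_length=2000` and `chunk_overlap=200` it gives 22. A slightly higher observed count is expected, because chunks are cut back to the nearest space and the loop can emit a short trailing chunk.

```python
import math

def min_chunk_count(text_length, max_length, chunk_overlap):
  # Fewest chunks that can cover the text when consecutive chunks share chunk_overlap characters.
  # Assumes chunk_overlap < max_length.
  if text_length <= max_length:
    return 1 if text_length > 0 else 0
  stride = max_length - chunk_overlap
  return math.ceil((text_length - chunk_overlap) / stride)

estimate = min_chunk_count(39472, max_length=2000, chunk_overlap=200)
# estimate == 22
```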

In [None]:
chunks[0]

'Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.comNoam Shazeer∗\nGoogle Brain\nnoam@google.comNiki Parmar∗\nGoogle Research\nnikip@google.comJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.comAidan N. Gomez∗ †\nUniversity of Toronto\naidan@cs.toronto.eduŁukasz Kaiser∗\nGoogle Brain\nlukaszkaiser@google.com\nIllia Polosukhin∗ ‡\nillia.polosukhin@gmail.com\nAbstract\nThe dominant sequence transduction models are based on complex recurrent or\nconvolutional neural networks that include an encoder and a decoder. The best\nperforming models also connect the encoder and decoder through an attention\nmechanism. We propose a new simple network architecture, the Transformer,\nbased solely on attention mechanisms, dispensing with recurrence and con

In [None]:
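import chromadb
import chromadb.utils.embedding_functions as embedding_functions

# embedding function that calls the Google Generative AI embedding API
google_ef = embedding_functions.GoogleGenerativeAiEmbeddingFunction(api_key=userdata.get("GOOGLE_API_KEY"))
# persistent client: the collection is stored on disk under embeddings/gemini
client = chromadb.PersistentClient(path="embeddings/gemini")

collection = client.get_or_create_collection(name="pdf_rag", embedding_function=google_ef)

# add each chunk, using its position in the list as the id
for i, d in enumerate(chunks):
  collection.add(documents=[d], ids=[str(i)])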

In [None]:
collection.count()

23
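
Because the collection is persisted on disk, re-running the ingestion cell reuses the IDs `"0"`, `"1"`, and so on, even if the chunks themselves have changed (for example after changing `max_length`). One possible refinement, sketched here as an assumption rather than something this notebook does, is to build IDs from both the position and a hash of the chunk text and to write with `collection.upsert`. `make_chunk_ids` is a hypothetical helper.

```python
import hashlib

def make_chunk_ids(chunks, prefix="chunk"):
  # Combine the position with a short content hash, so a changed chunk gets a new ID.
  ids = []
  for i, chunk in enumerate(chunks):
    digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()[:12]
    ids.append(f"{prefix}-{i}-{digest}")
  return ids

example_ids = make_chunk_ids(["first passage", "second passage"])

# With the Chroma collection, the IDs could then be used like this:
# collection.upsert(documents=chunks, ids=make_chunk_ids(chunks))
```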

In [None]:
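def build_escaped_context(context):
  # join the passages into one string, separated by blank lines
  # (despite the name, no characters are actually escaped)
  escaped_context = ""
  for item in context:
    escaped_context += item + "\n\n"
  return escaped_context

def find_relevant_context(query, db, n_results=3):
  # ask Chroma for the n_results passages closest to the query
  results = db.query(query_texts=[query], n_results=n_results)
  # results['documents'] holds one list of passages per query; we sent a single query
  escaped_context = build_escaped_context(results['documents'][0])
  return escaped_context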
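
Conceptually, `db.query` embeds the query text with the collection's embedding function and returns the stored documents whose vectors are closest to it. The sketch below imitates that with hand-written three-dimensional vectors and cosine similarity. The vectors, the `top_k` helper and the choice of cosine similarity are all assumptions for illustration; Chroma's actual index and distance metric may differ.

```python
import math

def cosine_similarity(a, b):
  dot = sum(x * y for x, y in zip(a, b))
  norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
  return dot / norm if norm else 0.0

def top_k(query_vector, stored, k=3):
  # stored is a list of (document, vector) pairs; return the k most similar documents.
  ranked = sorted(stored, key=lambda pair: cosine_similarity(query_vector, pair[1]), reverse=True)
  return [doc for doc, _ in ranked[:k]]

stored = [
  ("encoder passage", [0.9, 0.1, 0.0]),
  ("decoder passage", [0.1, 0.9, 0.0]),
  ("training passage", [0.0, 0.1, 0.9]),
]
nearest = top_k([1.0, 0.2, 0.0], stored, k=2)
# nearest == ['encoder passage', 'decoder passage']
```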

In [None]:
context = find_relevant_context("role of encoders", collection )

In [None]:
context

'different tasks.\n15\n\nence. This mimics the\ntypical encoder-decoder attention mechanisms in sequence-to-sequence models such as\n[38, 2, 9].\n•The encoder contains self-attention layers. In a self-attention layer all of the keys, values\nand queries come from the same place, in this case, the output of the previous layer in the\nencoder. Each position in the encoder can attend to all positions in the previous layer of the\nencoder.\n•Similarly, self-attention layers in the decoder allow each position in the decoder to attend to\nall positions in the decoder up to and including that position. We need to prevent leftward\ninformation flow in the decoder to preserve the auto-regressive property. We implement this\ninside of scaled dot-product attention by masking out (setting to −∞) all values in the input\nof the softmax which correspond to illegal connections. See Figure 2.\n3.3 Position-wise Feed-Forward Networks\nIn addition to attention sub-layers, each of the layers in our encod

In [None]:
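# raw passages returned by Chroma, before they are joined into one string
collection.query(query_texts=["role of encoders"], n_results=3)["documents"][0]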

['different tasks.\n15',
 'ence. This mimics the\ntypical encoder-decoder attention mechanisms in sequence-to-sequence models such as\n[38, 2, 9].\n•The encoder contains self-attention layers. In a self-attention layer all of the keys, values\nand queries come from the same place, in this case, the output of the previous layer in the\nencoder. Each position in the encoder can attend to all positions in the previous layer of the\nencoder.\n•Similarly, self-attention layers in the decoder allow each position in the decoder to attend to\nall positions in the decoder up to and including that position. We need to prevent leftward\ninformation flow in the decoder to preserve the auto-regressive property. We implement this\ninside of scaled dot-product attention by masking out (setting to −∞) all values in the input\nof the softmax which correspond to illegal connections. See Figure 2.\n3.3 Position-wise Feed-Forward Networks\nIn addition to attention sub-layers, each of the layers in our enc

In [None]:
len(context)

4021

In [None]:
escaped_context = context  # context is already the joined string returned by find_relevant_context

In [None]:
print(escaped_context)

different tasks.
15

ence. This mimics the
typical encoder-decoder attention mechanisms in sequence-to-sequence models such as
[38, 2, 9].
•The encoder contains self-attention layers. In a self-attention layer all of the keys, values
and queries come from the same place, in this case, the output of the previous layer in the
encoder. Each position in the encoder can attend to all positions in the previous layer of the
encoder.
•Similarly, self-attention layers in the decoder allow each position in the decoder to attend to
all positions in the decoder up to and including that position. We need to prevent leftward
information flow in the decoder to preserve the auto-regressive property. We implement this
inside of scaled dot-product attention by masking out (setting to −∞) all values in the input
of the softmax which correspond to illegal connections. See Figure 2.
3.3 Position-wise Feed-Forward Networks
In addition to attention sub-layers, each of the layers in our encoder and decoder co

In [None]:
def create_prompt_for_gemini(query, context):
  prompt = f"""
  You are a helpful agent that answers questions using the text from the context below.
  Both the question and the context are shared with you and you should answer the
  question based on the context. If the context does not have enough information
  for you to answer the question correctly, mention the absence of relevant
  context as part of your answer.

  Question : {query}
  \n
  Context : {context}
  \n
  Answer :
  """
  return prompt

In [None]:
def generate_answer_from_gemini(prompt):
  model = genai.GenerativeModel('gemini-1.5-flash-latest')
  # returns the full response object; the answer string is in result.text
  result = model.generate_content(prompt)
  return result


In [None]:
prompt = create_prompt_for_gemini("role of encoders", context)

In [None]:
print(prompt)


  You are a helpful agent that answers questions using the text from the context below.
  Both the question and the context are shared with you and you should answer the
  question based on the context. If the context does not have enough information
  for you to answer the question correctly, mention the absence of relevant
  context as part of your answer.

  Question : role of encoders
  

  Context : different tasks.
15

ence. This mimics the
typical encoder-decoder attention mechanisms in sequence-to-sequence models such as
[38, 2, 9].
•The encoder contains self-attention layers. In a self-attention layer all of the keys, values
and queries come from the same place, in this case, the output of the previous layer in the
encoder. Each position in the encoder can attend to all positions in the previous layer of the
encoder.
•Similarly, self-attention layers in the decoder allow each position in the decoder to attend to
all positions in the decoder up to and including that positi

In [None]:
answer = generate_answer_from_gemini(prompt)
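
The retrieval, prompt and generation cells can be folded into a single function. The sketch below takes the retriever, prompt builder and generator as plain callables so the flow can be exercised without network access. `answer_question` and the three stub functions are hypothetical names; in this notebook the real arguments would be something like `lambda q: find_relevant_context(q, collection)`, `create_prompt_for_gemini` and `lambda p: generate_answer_from_gemini(p).text`.

```python
def answer_question(query, retrieve, generate, build_prompt):
  # retrieve: query -> context string
  # build_prompt: (query, context) -> prompt string
  # generate: prompt -> answer text
  context = retrieve(query)
  if not context.strip():
    return "No relevant context was found for this question."
  return generate(build_prompt(query, context))

# Stubs standing in for Chroma and Gemini so the flow can be tried offline.
def stub_retrieve(query):
  return "The encoder contains self-attention layers."

def stub_prompt(query, context):
  return f"Question: {query}\nContext: {context}\nAnswer:"

def stub_generate(prompt):
  return f"(stub answer based on {len(prompt)} prompt characters)"

reply = answer_question("role of encoders", stub_retrieve, stub_generate, stub_prompt)
```

Passing the pieces in as arguments also makes it easy to swap the model or the vector store later without touching the control flow.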

In [None]:
print(answer.text)

The context describes the role of encoders in the context of  "encoder-decoder attention mechanisms" within sequence-to-sequence models.  The text highlights that:

* **The encoder contains self-attention layers.**  These layers allow each position in the encoder to attend to all other positions within the previous layer of the encoder. 
* **Encoders use a fully connected feed-forward network** applied to each position separately. 

While the context outlines the key components and functions of encoders, it doesn't explicitly state a specific "role" for them.  

**In general, encoders are responsible for transforming input data into a meaningful representation that can be used by the decoder.**  They are often used to process sequential data like text, making them a crucial part of tasks like machine translation, text summarization, and question answering. 

