## PDF chatbot using gemini-1.5-pro

1. LOAD THE PDF (ONE DOCUMENT PER PAGE)
2. SPLIT THE DATA INTO SMALLER CHUNKS WITH chunk_overlap AND SAVE INTO docs
3. CONVERT THE DOCS INTO VECTOR EMBEDDINGS
4. APPLY THE EMBEDDINGS TO 'docs' AND STORE THEM IN THE CHROMA DB
5. FETCH THE DOCS RELATED TO QUESTIONS
6. CREATE PROMPT
7. CREATE A CHAIN TO EXECUTE THE TASK
8. INPUT A QUESTION



In [None]:
!pip install langchain langchain_community langchain-google-genai python-dotenv streamlit langchain_experimental sentence-transformers langchain_chroma langchainhub pypdf rapidocr-onnxruntime

In [35]:
# 1. Load the PDF; PyPDFLoader returns one Document per page
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("yolov9_paper.pdf")
data = loader.load()  # list of Documents, one per page

In [43]:
# The PDF has 18 pages, and each page is a separate Document in data
print("Total pages:",len(data))
data[0]

Total pages: 18


Document(metadata={'source': 'yolov9_paper.pdf', 'page': 0}, page_content='YOLOv9: Learning What You Want to Learn\nUsing Programmable Gradient Information\nChien-Yao Wang1,2, I-Hau Yeh2, and Hong-Yuan Mark Liao1,2,3\n1Institute of Information Science, Academia Sinica, Taiwan\n2National Taipei University of Technology, Taiwan\n3Department of Information and Computer Engineering, Chung Yuan Christian University, Taiwan\nkinyiu@iis.sinica.edu.tw, ihyeh@emc.com.tw, and liao@iis.sinica.edu.tw\nAbstract\nToday’s deep learning methods focus on how to design\nthe most appropriate objective functions so that the pre-\ndiction results of the model can be closest to the ground\ntruth. Meanwhile, an appropriate architecture that can\nfacilitate acquisition of enough information for prediction\nhas to be designed. Existing methods ignore a fact that\nwhen input data undergoes layer-by-layer feature extrac-\ntion and spatial transformation, large amount of informa-\ntion will be lost. This paper wi

In [38]:
# 2. SPLIT THE DATA INTO SMALLER CHUNKS WITH chunk_overlap AND SAVE INTO docs

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split the pages into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # maximum characters per chunk
    chunk_overlap=200,  # characters shared between adjacent chunks
)

docs = text_splitter.split_documents(data)

# Smaller chunks let the retriever pick only the text relevant to a question
print("Total number of documents: ", len(docs))

Total number of documents:  96
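
To make the effect of `chunk_overlap` concrete, here is a minimal pure-Python sketch of fixed-window splitting. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraph and line breaks); the helper name `split_with_overlap` and the sample string are hypothetical and exist only for illustration.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters.

    Consecutive windows start (chunk_size - chunk_overlap) characters apart,
    so each window repeats the last chunk_overlap characters of the previous one.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Hypothetical sample input, small enough to inspect by eye
sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.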


In [45]:
docs[0]

Document(metadata={'source': 'yolov9_paper.pdf', 'page': 0}, page_content='YOLOv9: Learning What You Want to Learn\nUsing Programmable Gradient Information\nChien-Yao Wang1,2, I-Hau Yeh2, and Hong-Yuan Mark Liao1,2,3\n1Institute of Information Science, Academia Sinica, Taiwan\n2National Taipei University of Technology, Taiwan\n3Department of Information and Computer Engineering, Chung Yuan Christian University, Taiwan\nkinyiu@iis.sinica.edu.tw, ihyeh@emc.com.tw, and liao@iis.sinica.edu.tw\nAbstract\nToday’s deep learning methods focus on how to design\nthe most appropriate objective functions so that the pre-\ndiction results of the model can be closest to the ground\ntruth. Meanwhile, an appropriate architecture that can\nfacilitate acquisition of enough information for prediction\nhas to be designed. Existing methods ignore a fact that\nwhen input data undergoes layer-by-layer feature extrac-\ntion and spatial transformation, large amount of informa-\ntion will be lost. This paper wi

In [10]:
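import os
# Placeholder: replace "API-KEY" with a real key, or skip this cell and keep the key in .env.
# load_dotenv() does not override a variable that is already set, so a value set here takes precedence.
os.environ["GOOGLE_API_KEY"] = "API-KEY"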

In [46]:
# 3. CONVERT THE DOCS INTO VECTOR EMBEDDINGS

from langchain_chroma import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings

# Load the Gemini API key from the .env file
from dotenv import load_dotenv
load_dotenv()

# To get an API key, go to https://ai.google.dev/gemini-api/docs/api-key,
# generate a Google AI API key, and paste it into the .env file.

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vector = embeddings.embed_query("hello, world!")  # THIS IS JUST FOR DEMO
vector[:5]

[0.05168594419956207,
 -0.030764883384108543,
 -0.03062233328819275,
 -0.02802734449505806,
 0.01813092641532421]

In [47]:
# 4. APPLY THE EMBEDDINGS TO 'docs' AND STORE THEM IN THE CHROMA DB
# Pass the chunks and the embedding model

vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=embeddings,  # reuse the embedding model created in step 3
)

In [48]:
# 5. FETCH THE DOCS RELATED TO QUESTIONS

retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 10})

# User's question
# Out of all 96 chunks, retrieve the 10 most similar to the question
retrieved_docs = retriever.invoke("What is Real-time Unauthorized Access Detection?")

len(retrieved_docs)

10
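
Conceptually, a similarity retriever embeds the question, scores it against every stored chunk vector, and returns the `k` best matches. The sketch below shows that idea with cosine similarity on toy 3-dimensional vectors. It is an assumption-laden illustration, not what Chroma does internally: the distance metric and index Chroma uses depend on its configuration, and the names `cosine_similarity`, `top_k`, `toy_chunks`, and `toy_query` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int) -> list[str]:
    """Return the names of the k chunks most similar to the query, best first."""
    ranked = sorted(
        chunk_vecs,
        key=lambda name: cosine_similarity(query_vec, chunk_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy vectors standing in for real embeddings (which have hundreds of dimensions)
toy_chunks = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.9, 0.1, 0.0],
    "chunk_c": [0.0, 1.0, 0.0],
    "chunk_d": [0.0, 0.0, 1.0],
}
toy_query = [1.0, 0.2, 0.0]

best = top_k(toy_query, toy_chunks, k=2)
print(best)
```

Note that a retriever of this kind always returns `k` chunks, even when none of them is actually relevant to the question, so it is worth reading the retrieved text before trusting an answer built on it.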

In [49]:
print(retrieved_docs[5].page_content)

ter performance than past artificial intelligence systems in
various fields, such as computer vision, language process-
ing, and speech recognition. In recent years, researchers
Figure 1. Comparisons of the real-time object detecors on MS
COCO dataset. The GELAN and PGI-based object detection
method surpassed all previous train-from-scratch methods in terms
of object detection performance. In terms of accuracy, the new
method outperforms RT DETR [43] pre-trained with a large
dataset, and it also outperforms depth-wise convolution-based de-
sign YOLO MS [7] in terms of parameters utilization.
in the field of deep learning have mainly focused on how
to develop more powerful system architectures and learn-
ing methods, such as CNNs [21–23, 42, 55, 71, 72], Trans-
formers [8, 9, 40, 41, 60, 69, 70], Perceivers [26, 26, 32, 52,
56, 81, 81], and Mambas [17, 38, 80]. In addition, some
researchers have tried to develop more general objective


In [28]:
# Create the chat model
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-pro",
    temperature=0.3,  # lower = more focused and deterministic; higher = more varied output
    max_tokens=500,   # upper limit on the output length, in tokens (not words)
)

In [29]:
# 6. CREATE PROMPT

from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"   # placeholder filled with the text of the retrieved chunks
)

# Create the prompt template
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),  # placeholder filled with the user's question
    ]
)
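
The two placeholders are filled at run time: `{context}` with the text of the retrieved chunks and `{input}` with the question. The following plain-Python sketch imitates that "stuffing" step with `str.format`. It assumes the chunks are joined with a blank line between them; the shortened template, the helper `build_messages`, and the sample strings are hypothetical and are not part of LangChain.

```python
# Shortened, hypothetical version of the system prompt used in this notebook
SYSTEM_TEMPLATE = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question.\n\n{context}"
)


def build_messages(chunk_texts: list[str], question: str) -> list[tuple[str, str]]:
    """Fill the template: all chunk texts go into {context}, the question is the human turn."""
    context = "\n\n".join(chunk_texts)  # assumed separator between chunks
    return [
        ("system", SYSTEM_TEMPLATE.format(context=context)),
        ("human", question),
    ]


messages = build_messages(
    ["First retrieved chunk.", "Second retrieved chunk."],
    "What does the paper propose?",
)
for role, text in messages:
    print(f"[{role}] {text}")
```

Because every retrieved chunk is pasted into a single prompt, the values of `k` and `chunk_size` together determine how much text the model has to read for each question.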

In [30]:
# 7. CREATE A CHAIN TO EXECUTE THE TASK

question_answer_chain = create_stuff_documents_chain(
    llm,     # chat model
    prompt)  # prompt template

rag_chain = create_retrieval_chain(
    retriever,  # supplies the top 10 chunks for each question
    question_answer_chain)

In [60]:
#  8. INPUT A QUESTION

response = rag_chain.invoke({"input": "Real-time Unauthorized Access Detection"})
print(response)

{'input': 'Real-time Unauthorized Access Detection', 'context': [Document(metadata={'page': 0, 'source': 'Digital meter reading.pdf'}, page_content='This\nreal-time\nnotification\nnot\nonly\nbolsters\nsystem\nsecurity\nbut\nalso\nfacilitates\nswift\nresponse\nmeasures\nagainst\npotential\ntampering\nor\nunauthorized\nactivities,\nensuring\ndata\nintegrity\nand\nreliability\nat\nall\ntimes.\nMoreover ,\nwe\nprovide\nan\noptional\nyet\ninvaluable\nfeature\nincorporating\nthe\nESP32\nCamera\nmodule.\nUpon\ndetecting\nthe\nopening\nof\nthe\nmeter\nbox,\nthis\nmodule\ncaptures\neither\na\npicture\nor\na\nvideo,\ncapturing\nvisual\nevidence\nof\nthe\nactivity .\nSubsequently ,\nthe\ncaptured\nmedia\nis\nautomatically\nforwarded\nto\nthe\ndesignated\nrecipient,\nsuch\nas\nthe\nmeter\nowner\nor\nelectric\ncompany .\nThis\nfeature\nadds\nan\nextra\nlayer\nof\nsecurity\nand\naccountability\nto\nmetering\nprocesses,\nfurther\nsolidifying\nour\ncommitment\nto\nimproved\nreliability ,\nheightened\n
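
Printing the whole response is noisy because it includes every retrieved chunk. A small helper can pull out just the answer and the pages it drew on. This is a sketch under assumptions: it treats the response as a dict with `input` and `context` keys (as in the output above) plus an `answer` key, and it uses a hypothetical `FakeDocument` stand-in and toy data so that it runs without an API key.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in exposing the two attributes the helper reads."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_response(response: dict) -> tuple[str, list[tuple]]:
    """Return the answer text and the distinct (source, page) pairs of the context chunks."""
    answer = response.get("answer", "")
    pairs = [
        (doc.metadata.get("source"), doc.metadata.get("page"))
        for doc in response["context"]
    ]
    sources = list(dict.fromkeys(pairs))  # drop duplicates, keep retrieval order
    return answer, sources


# Toy response with the assumed shape; all values are made up for illustration
toy_response = {
    "input": "toy question",
    "context": [
        FakeDocument("text one", {"source": "example.pdf", "page": 0}),
        FakeDocument("text two", {"source": "example.pdf", "page": 3}),
        FakeDocument("text three", {"source": "example.pdf", "page": 0}),
    ],
    "answer": "toy answer",
}

answer, sources = summarize_response(toy_response)
print(answer)
print(sources)
```

Checking the `source` values is also a quick way to confirm that the vector store holds the PDF you intended to query.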