Step 1: Load the Document from PDF

In [3]:
from langchain_community.document_loaders import PyPDFLoader

# Load the PDF; PyPDFLoader returns one Document per page
loader = PyPDFLoader("./german-story.pdf")
data = loader.load()
print(data)

[Document(metadata={'producer': 'calibre (5.44.0) [http://calibre-ebook.com]', 'creator': 'calibre (5.44.0) [http://calibre-ebook.com]', 'creationdate': '2022-08-27T01:45:35+00:00', 'author': 'Richards, Olly & Rawlings, Alex', 'moddate': '2022-08-27T01:45:37+00:00', 'title': 'Short Stories in German for Beginners', 'source': './german-story.pdf', 'total_pages': 211, 'page': 0, 'page_label': '1'}, page_content=''), Document(metadata={'producer': 'calibre (5.44.0) [http://calibre-ebook.com]', 'creator': 'calibre (5.44.0) [http://calibre-ebook.com]', 'creationdate': '2022-08-27T01:45:35+00:00', 'author': 'Richards, Olly & Rawlings, Alex', 'moddate': '2022-08-27T01:45:37+00:00', 'title': 'Short Stories in German for Beginners', 'source': './german-story.pdf', 'total_pages': 211, 'page': 1, 'page_label': '2'}, page_content=''), Document(metadata={'producer': 'calibre (5.44.0) [http://calibre-ebook.com]', 'creator': 'calibre (5.44.0) [http://calibre-ebook.com]', 'creationdate': '2022-08-27T0

Step 2: Split the Document into Chunks

In [4]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=400)
docs = text_splitter.split_documents(data)

print("Total number of chunks:", len(docs))
for chunk in docs:
    print(chunk.page_content)

Total number of chunks: 358
Contents 
About the Author
Introduction
How to Read Effectively
The Six-Step Reading Process
Die verrückte Currywurst
Das Wesen
Der Ritter
Die Uhr
Die Truhe
Unbekannte Länder
Laura, die Unsichtbare
Die Kapsel
Lösungsschlüssel
German-English Glossary
Acknowledgements
About the Author
Olly Richards, author of the Teach Yourself Foreign Language Graded
Readers series, speaks eight languages and is the man behind the popular
language learning blog: I Will Teach You a Language.
Olly started learning his first foreign language at age 19 when he bought
a one-way ticket to Paris. With no exposure to languages growing up, and
no special talent to speak of, Olly had to figure out how to learn a foreign
language from scratch.
Fifteen years later, Olly holds a master’s degree in TESOL from Aston
University and Cambridge CELTA and Delta. He has now studied several
other languages and become an expert in language learning techniques. He
also collaborates with organization
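
The chunk_size and chunk_overlap settings used in this step can be illustrated with a much simpler splitter. The sketch below is not how RecursiveCharacterTextSplitter works internally (that class first splits on separators such as paragraph and line breaks, then merges the pieces). It is a hypothetical fixed-window helper, split_with_overlap, that only shows how consecutive chunks end up sharing characters.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows that share chunk_overlap characters.

    Hypothetical helper for illustration only; not part of LangChain.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
```

In this simplified model, chunk_size=1000 with chunk_overlap=400 advances 600 characters per chunk, so about 40% of each chunk repeats text from the previous one.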

Step 3: Generate Embeddings with Gemini AI

In [6]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from dotenv import load_dotenv
#from google.colab import userdata

# Load GOOGLE_API_KEY from a local .env file
load_dotenv()

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

vector = embeddings.embed_query("hello world")
print(len(vector))
print(vector[0])

768
0.04909781739115715


Step 4: Create a Vector Store for Document Retrieval

In [7]:
from langchain_chroma import Chroma

vectorstoredb = Chroma.from_documents(documents=docs, embedding=embeddings)
retriever = vectorstoredb.as_retriever(search_type="similarity", search_kwargs={"k": 5})
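
A similarity retriever with k=5 returns the five stored chunks whose embeddings are closest to the query embedding. The sketch below shows that idea on toy 3-dimensional vectors using cosine similarity and a brute-force scan. The helper names (cosine_similarity, top_k) and the toy vectors are assumptions made for illustration; Chroma's actual index and distance metric may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int) -> list[int]:
    """Return the indices of the k vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return [idx for _, idx in scored[:k]]


# Toy stand-ins for chunk embeddings (real ones have 768 dimensions here)
toy_docs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query = [1.0, 0.0, 0.0]

best = top_k(toy_query, toy_docs, k=2)
print(best)
```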

Step 5: Retrieve Documents Based on a Query


In [8]:
retrieved_docs = retriever.invoke("who is Olly Richards?")
print(len(retrieved_docs))
print(retrieved_docs[0].page_content)

5
also provided helpful feedback and inspired me to continue.
More recently, to Sarah, the Publishing Director for the Teach Yourself
series, for her vision for this collaboration and unwavering positivity in
bringing the project to fruition.
To Rebecca, almost certainly the best editor in the world, for bringing a
staggering level of expertise and good humour to the project, and to Karyn
and Melissa, for their work in coordinating publication behind the scenes.
My thanks to James, Dave and Sarah for helping I Will Teach You A
Language continue to grow, even when my attention has been elsewhere.
To my parents, for an education that equipped me for such an endeavour.
Lastly, to JJ and EJ. This is for you.
Olly Richards


Step 6: Build a Question-Answering System

In [9]:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", temperature=0.3)

Step 7: Create the RAG Chain


In [12]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

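# System prompt. Adjacent string literals are joined with no separator,
# so each sentence carries its own trailing period and space.
system_prompt = (
    "You are a PDF reader. Provide clear and crisp answers based on the question asked. "
    "If you don't have the information, say \"I don't know\" or \"answer not available\". "
    "Use a maximum of 3 sentences."
    "\n\n"
    "{context}"
)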

# Set up the prompt for the QA chain
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

# Create the RAG chain
chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, chain)
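
Conceptually, the "stuff" step places all retrieved chunks into the {context} slot of the system prompt, and the user's question becomes the human message. The sketch below mimics that flow in plain Python with no LangChain objects. The helpers stuff_context and build_messages, the toy template, and the "\n\n" separator are assumptions for illustration, not the library's implementation.

```python
def stuff_context(chunks: list[str], separator: str = "\n\n") -> str:
    """Join all retrieved chunks into one context string."""
    return separator.join(chunks)


def build_messages(system_template: str, question: str, chunks: list[str]) -> list[tuple[str, str]]:
    """Fill the {context} slot and pair the result with the user's question."""
    context = stuff_context(chunks)
    return [
        ("system", system_template.format(context=context)),
        ("human", question),
    ]


# Toy template and chunks standing in for the real prompt and retriever output
toy_template = "Answer using only the context below.\n\n{context}"
toy_chunks = [
    "Olly Richards speaks eight languages.",
    "He holds a master's degree in TESOL.",
]

messages = build_messages(toy_template, "who is Olly Richards?", toy_chunks)
for role, content in messages:
    print(f"[{role}] {content}")
```

Because every retrieved chunk is pasted into a single prompt, the values of k and chunk_size together determine how much text the model receives per question.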

Step 8: Ask a Question


In [15]:
query = "who is Olly Richards?"
response = rag_chain.invoke({"input":query})
print(response["answer"])

Olly Richards is the author of the Teach Yourself Foreign Language Graded Readers series and the man behind the popular language learning blog, I Will Teach You a Language. He speaks eight languages and is an expert in language learning techniques. He started learning his first foreign language at age 19 and holds a master’s degree in TESOL.


In [16]:
query = "Summary of chapter 2?"
response = rag_chain.invoke({"input":query})
print(response["answer"])

Arthur travels to Sylt to meet Sabine Link, who owns a company and is wealthy. He shows her a photo, and she recalls a necklace with a number, just like David. Arthur learns the second number from her.


In [None]:
# Ref: https://python.plainenglish.io/building-a-rag-application-with-gemini-ai-step-by-step-guide-24636dd21f5b