# RAG Implementation for All Textbooks, Subject-Wise

In [2]:
%pip install langchain langchain-community pypdf

Collecting pypdf
  Downloading pypdf-5.4.0-py3-none-any.whl.metadata (7.3 kB)
Downloading pypdf-5.4.0-py3-none-any.whl (302 kB)
Installing collected packages: pypdf
Successfully installed pypdf-5.4.0
Note: you may need to restart the kernel to use updated packages.


In [2]:
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader

In [3]:
# Load every PDF in the Grade 6 science folder; PyPDFLoader yields one Document per page.
loader = DirectoryLoader(
    path="data/sixth/science",
    glob="*.pdf",
    loader_cls=PyPDFLoader
)
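
The loader above covers a single subject folder. Since the goal is a subject-wise setup, the folders could be discovered rather than hard-coded. The following is a minimal sketch that assumes a `data/<grade>/<subject>/*.pdf` layout, inferred from the one path used above; `find_subject_dirs` is a hypothetical helper, not part of LangChain.

```python
from pathlib import Path


def find_subject_dirs(root: str) -> dict[str, Path]:
    """Map "<grade>/<subject>" to each folder under root that contains PDFs.

    Assumes a hypothetical layout: root/<grade>/<subject>/*.pdf
    """
    subjects: dict[str, Path] = {}
    for candidate in sorted(Path(root).glob("*/*")):
        # Keep only directories that hold at least one PDF file.
        if candidate.is_dir() and any(candidate.glob("*.pdf")):
            key = f"{candidate.parent.name}/{candidate.name}"
            subjects[key] = candidate
    return subjects
```

Each returned path could then be passed as `path=str(folder)` to its own `DirectoryLoader`, giving one document set per subject.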

In [4]:
documents = loader.load()

In [5]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [6]:
# Chunks of at most 1000 characters, with 10 characters shared between neighbouring chunks.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=10)

In [7]:
chunks = splitter.split_documents(documents)
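
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-width illustration in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally: the real splitter prefers to break on separators such as paragraph breaks, newlines, and spaces, so its chunks vary in length. `sliding_chunks` is a hypothetical name used only for this sketch.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-width windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


print(sliding_chunks("abcdefghijklmnopqrstuvwxy", chunk_size=10, chunk_overlap=2))
# ['abcdefghij', 'ijklmnopqr', 'qrstuvwxy']
```

With the settings used in this notebook, an overlap of 10 characters carries only a word or two across each boundary, so a larger overlap may be worth trying if answers tend to be cut off at chunk edges.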

In [12]:
print(f"Total chunks: {len(chunks)}")

Total chunks: 566


In [8]:
for chunk in chunks[1:10]:
    print(chunk)


page_content='something new and exciting to discover. Have you ever 
looked up at the night sky 
and wondered why the stars 
shine? Or watched a flower 
bloom and wondered how 
it knows when to open? 
These are just a few of 
the many mysteries that 
science helps us unravel. 
The most wonderful thing 
about science is that it 
is everywhere. From the 
depths of the ocean to the 
What is 
Science? ?
A mountainous region
Chapter 1.indd   1 10/4/2024   3:18:10 PM
Reprint 2025-26' metadata={'producer': 'Adobe PDF Library 10.0.1', 'creator': 'Adobe InDesign CS6 (Windows)', 'creationdate': '2024-10-04T15:18:03+05:30', 'moddate': '2025-04-01T14:54:59+05:30', 'trapped': '/False', 'source': 'data/sixth/science/fecu101.pdf', 'total_pages': 8, 'page': 0, 'page_label': '1'}
page_content='Curiosity | Textbook of Science | Grade 6
2
Science is like a giant and unending jigsaw puzzle. Every 
new discovery we make adds another piece to that puzzle. 
And you know the best thing about this puzzle? Ther

In [11]:
%pip install -qU langchain-google-vertexai chromadb

Note: you may need to restart the kernel to use updated packages.


In [13]:
from langchain_google_vertexai import VertexAIEmbeddings

embedding_model = "text-embedding-005"
embeddings = VertexAIEmbeddings(model_name=embedding_model)

In [15]:
from langchain_community.vectorstores import Chroma

# Embed every chunk and persist the vectors to ./chroma_db.
chroma_store = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)

print("Stored in Chroma.")


Stored in Chroma.


In [16]:
retriever = chroma_store.as_retriever(search_kwargs={"k": 3})
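
Conceptually, a retriever with `k=3` embeds the query, scores it against the stored chunk vectors, and returns the three best matches. The sketch below shows that idea with brute-force cosine similarity in plain Python. It is an illustration only: the vector store uses its own index and distance metric, and `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Return the indices of the k vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy 2-D "embeddings" standing in for real chunk vectors.
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
print(top_k([1.0, 0.1], toy_vectors, k=3))  # [0, 2, 1]
```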

In [17]:
# Deprecated call, kept for reference: recent langchain-core releases emit a
# deprecation warning here and recommend retriever.invoke(), used in the next cell.
results = retriever.get_relevant_documents("How do we Measure?")


In [18]:
results = retriever.invoke("How do we Measure?")

In [19]:
for result in results:
    print(result.page_content)

Curiosity | Textbook of Science | Grade 6
80
5.1 How do we Measure?
Hardeep says, “I have seen my grandmother measuring 
cloth by the length of her arm.”
“Have you ever seen how a farmer measures length to 
divide his field into beds? He walks and counts the number 
of his strides,” says Padma.
“Oh, not just the length of the strides—sometimes they 
also use the length of their feet to measure,” adds Anish.
Deepa says excitedly, “Measuring length using body parts 
must be so much fun! Let us also measure something using 
a body part.” 
“What should we measure? Okay, let us 
measure the length of the table in our classroom,” 
says Tasneem.
Padma adds, “And which body part should 
we use to measure it?”
Deepa says, “Let us use our handspan. I will 
show you how to use it. I have seen my mother 
using it. She calls it balisht.”
Hardeep adds, “Okay. Let us also note down 
our measurements.” 
Fig. 5.1: Use of handspan  
for measuring
Deepa
Padma
Tasneem
Anish Hardeep
Curiosity | Textbook of
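
The retrieved passages would normally be handed to a language model together with the question. The notebook does not show that step, so the following is only a sketch of one common way to assemble the prompt; `build_prompt` and the instruction wording are assumptions, not taken from the notebook.

```python
def build_prompt(question: str, passages: list[str]) -> str:
    """Combine retrieved passages and the user question into a single prompt."""
    # Number each passage so the model (and the reader) can tell them apart.
    context = "\n\n".join(
        f"[{i}] {passage.strip()}" for i, passage in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

With the retriever output above, this could be called as `build_prompt("How do we Measure?", [r.page_content for r in results])`, and the returned string sent to whichever chat or completion model is used.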

In [20]:
%pip install streamlit

Collecting streamlit
  Downloading streamlit-1.44.1-py3-none-any.whl.metadata (8.9 kB)
Collecting altair<6,>=4.0 (from streamlit)
  Downloading altair-5.5.0-py3-none-any.whl.metadata (11 kB)
Collecting blinker<2,>=1.0.0 (from streamlit)
  Downloading blinker-1.9.0-py3-none-any.whl.metadata (1.6 kB)
Collecting pandas<3,>=1.4.0 (from streamlit)
  Downloading pandas-2.2.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (89 kB)
Collecting pillow<12,>=7.1.0 (from streamlit)
  Downloading pillow-11.2.1-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (8.9 kB)
Collecting pyarrow>=7.0 (from streamlit)
  Downloading pyarrow-19.0.1-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting toml<2,>=0.10.1 (from streamlit)
  Downloading toml-0.10.2-py2.py3-none-any.whl.metadata (7.1 kB)
Collecting watchdog<7,>=2.1.5 (from streamlit)
  Downloading 