<a href="https://colab.research.google.com/github/amityadav108/Project-RAG-PDF-Chatbot/blob/main/Project_RAG_PDF_Chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## PROJECT - RAG PDF Chatbot Using Uploaded PDF

This notebook walks through the following steps:

* Read the PDF
* Split the text into small chunks
* Create embeddings
* Build a vector index
* Ask questions and retrieve the most relevant passages
* Add a summary feature

### STEP - 1 Install and Import Required Libraries

In [None]:
!pip install pypdf sentence-transformers faiss-cpu langchain langchain_community

In [None]:
import numpy as np
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer
import faiss

### STEP - 2 Upload PDF File

In [None]:
# This code:
# Opens your PDF
# Reads all pages
# Stores everything inside a single text variable

pdf_path = "AI Foundations of Computational Agents_Pdf.pdf"
reader = PdfReader(pdf_path)

text = ""
for page in reader.pages:
    page_text = page.extract_text()
    if page_text:
        text += page_text + "\n"

print("PDF loaded Characters:", len(text))

PDF loaded Characters: 1987773
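
Text extracted from a PDF often contains typographic ligatures (for example the single character "ﬁ" in "Artiﬁcial") and words hyphenated across line breaks (such as "compa-" followed by "nies"). The sketch below shows one possible cleanup pass that could run on `text` before chunking. The helper name `clean_extracted_text` is hypothetical and not part of the notebook, and the de-hyphenation rule is a rough heuristic that can also merge genuine hyphenated compounds that happen to break at a line end.

```python
import re
import unicodedata


def clean_extracted_text(raw: str) -> str:
    """Fold ligatures and rejoin words hyphenated across line breaks."""
    # NFKC normalization replaces compatibility characters such as the
    # "fi" ligature (U+FB01) with their plain-letter equivalents.
    normalized = unicodedata.normalize("NFKC", raw)
    # Join a word split as "compa-\nnies" back into "companies".
    return re.sub(r"(\w)-\n(\w)", r"\1\2", normalized)


sample = "Arti\ufb01cial intelligence studies agents such as compa-\nnies."
print(clean_extracted_text(sample))
```

NFKC also rewrites other compatibility characters (superscripts, full-width letters), so it is worth spot-checking the result on a few pages before adopting it.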


### STEP - 3 Split Text Into Small Chunks

In [None]:
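# Embedding models work on limited-length inputs,
# so we split the PDF text into small overlapping pieces.

def make_chunks(text, chunk_size=500, overlap=50):
    """Split text into chunk_size-character pieces that share `overlap` characters."""
    # Without this guard, overlap >= chunk_size would never advance through the text.
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # distance between consecutive chunk starts
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]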

chunks = make_chunks(text)
print("Total chunks:", len(chunks))

Total chunks: 4418
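
The chunk count follows directly from the window arithmetic: each chunk starts `chunk_size - overlap` characters after the previous one, so the number of chunks is the text length divided by that stride, rounded up. A small sketch of the calculation (the function name `expected_chunk_count` is an illustrative assumption, not part of the notebook):

```python
def expected_chunk_count(text_length: int, chunk_size: int = 500, overlap: int = 50) -> int:
    """Number of chunks a sliding window produces over text_length characters."""
    stride = chunk_size - overlap  # each new chunk starts this many characters later
    if stride <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    # Integer ceiling division: one chunk per stride, plus one for any remainder.
    return (text_length + stride - 1) // stride


print(expected_chunk_count(1987773))  # 4418
```

With the defaults the stride is 450 characters, and 1,987,773 / 450 rounds up to 4418, which matches the count printed above.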


### STEP - 4 Convert Chunks Into Embeddings

In [None]:
# Convert each chunk into a numeric vector (an embedding) so that
# chunks can be compared with a question by vector distance.

model = SentenceTransformer("all-MiniLM-L6-v2")

chunk_embeddings = model.encode(chunks)
chunk_embeddings = np.array(chunk_embeddings).astype("float32")

print("Embeddings shape:", chunk_embeddings.shape)

Embeddings shape: (4418, 384)


### STEP - 5 Store Embeddings in FAISS Vector Database

In [None]:
# FAISS helps us quickly find the most relevant text pieces.

dimension = chunk_embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(chunk_embeddings)

print("FAISS Index Created ")

FAISS Index Created 
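
`IndexFlatL2` performs an exact, brute-force search: it compares the query with every stored vector and reports squared Euclidean distances. The plain-Python sketch below illustrates the same idea on toy 2-dimensional vectors. It is only a conceptual stand-in (the name `flat_l2_search` and the toy data are assumptions), and it is far slower than FAISS on real embeddings.

```python
def flat_l2_search(vectors, query, k=3):
    """Return (squared_distance, index) pairs for the k vectors closest to query."""
    scored = []
    for i, vec in enumerate(vectors):
        # Squared Euclidean distance between the stored vector and the query.
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, i))
    scored.sort()  # smallest distance first
    return scored[:k]


toy_vectors = [[0, 0], [1, 0], [0, 2], [3, 3]]
print(flat_l2_search(toy_vectors, [2, 0], k=2))  # [(1, 1), (4, 0)]
```

Because every vector is compared, the results are exact, and search time grows linearly with the number of chunks.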


### STEP - 6 Build the Question Answering Function

In [None]:
# This function:
# Converts your question to an embedding
# Searches FAISS for the most similar chunk embeddings
# Returns the 3 closest text chunks

def ask_pdf(question):
    q_embed = model.encode([question]).astype("float32")
    # search returns one row of distances and one row of chunk ids per query
    distances, result_ids = index.search(q_embed, 3)

    answer_text = ""
    for idx in result_ids[0]:
        answer_text += chunks[idx] + "\n\n"

    return answer_text
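
`ask_pdf` discards the distances that `index.search` returns, although they show how close each retrieved chunk is to the question. The sketch below pairs each hit with its distance and a short preview. The helper `format_hits` and the demo values are hypothetical. It also skips ids of -1, which FAISS uses to pad results when fewer than the requested number of vectors are available.

```python
def format_hits(chunks, result_ids, distances, max_chars=80):
    """Pair each retrieved chunk with its distance, skipping padded slots."""
    hits = []
    for idx, dist in zip(result_ids, distances):
        if idx < 0:  # padded slot: no vector was found for this position
            continue
        # Collapse newlines and repeated spaces so the preview fits on one line.
        preview = " ".join(chunks[idx].split())[:max_chars]
        hits.append((float(dist), preview))
    return hits


# Stand-in values shaped like one row of search output (not real results).
demo_chunks = ["AI is the study\nof agents.", "Agents act in an environment."]
print(format_hits(demo_chunks, [1, 0, -1], [0.4, 0.9, 99.0]))
```

Inside `ask_pdf`, the equivalent call would be `format_hits(chunks, result_ids[0], distances[0])`.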

### STEP - 7 Ask Questions from the PDF

In [None]:
print(ask_pdf("What is AI"))

tiﬁcial Intelligence?
Artiﬁcial intelligence,o r AI, is the ﬁeld that studies the synthesis and analysis of
computational agents that act intelligently. Consider each part of this deﬁnition.
An agent is something that acts in an environment; it does something.
Agents include worms, dogs, thermostats, airplanes, robots, humans, compa-
nies, and countries.
An agent is judged solely by how it acts. Agents that have the same effect
in the world are equally good.
3
4 1. Artiﬁcial Intelligence and Age

e on AI. In particular, Russell and Norvig
[2020] give a more encyclopedic overview of AI. They provide an excellent
complementary source for many of the topics covered in this book and also an
outstanding review of the scientiﬁc literature, which we do not try to duplicate.
The Association for the Advancement of Artiﬁcial Intelligence (AAAI) pro-
vides introductory material and news at theirAI T opicswebsite (https://aitopics.
org/). AI Magazine, published by AAAI, often has excellent overvie

In [None]:
ask_pdf("Explain the history of AI.")

' Networks \n& Deep Learning\n11: Causality\n18: Social Impact of AI\nFigure 1: Overview of chapters and dependencies\nPart I\nAgents in the World\nWhat are Agents and How Can They be Built?\nChapter 1\nArtiﬁcial Intelligence and Agents\nThe history of AI is a history of fantasies, possibilities, demonstrations,\nand promise. Ever since Homer wrote of mechanical “tripods” waiting on\nthe gods at dinner, imagined mechanical assistants have been a part of our\nculture. However, only in the last half century hav\n\nd that has lost contact with what\nAI should be about.”\nGary Marcus, NYU, author ofRebooting AI\n“Artiﬁcial Intelligence: Foundations of Computational Agentsskillfully delivers a compre-\nhensive exploration of AI ideas, demonstrating exceptional organization and clarity of\npresentation. Navigating the broad arc of important concepts and methods in AI, the\nbook covers essential technical topics, historical context, and the growing importance\nof the societal inﬂuences of AI,

### STEP - 8 Add Summary Feature

In [None]:
def summarize_text(text):
    # Naive extractive summary: keep the first five period-delimited sentences
    sentences = text.split(".")
    return ". ".join(sentences[:5]) + "."

print(summarize_text(text))

Artiﬁcial Intelligence
Foundations of Computational Agents
Third Edition
A comprehensive textbook for undergraduate and graduate AI courses, explaining
modern artiﬁcial intelligence and its social impact, and integrating theory and prac-
tice.  This extensively revised new edition now includes chapters on deep learning,
including generative AI, the social impacts of AI, and causality. 
Students and instructors will beneﬁt from these features:
• The novel agent design space, which provides a coherent framework for teaching
and learning, making both easier. 
 Every concept or algorithm is illustrated with a motivating concrete example. 
 Each chapter now has a social impact section, enabling students to understand the
impact of the various techniques as they learn.
