<a href="https://colab.research.google.com/github/Saba-Gul/Chat-With-PDF-Retrieval-Augmented-Generation-RAG-Implementation/blob/main/RAG_ChatwithPDF.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Chat With PDF:**
This notebook loads a PDF document, splits it into smaller chunks, embeds the chunks, and stores the embeddings in a vector store. It then runs a similarity search to retrieve the chunks most relevant to a query, formats the retrieved context and the question into a prompt, and generates a response with an OpenAI chat model. Chunking keeps each retrieved passage small enough to fit in the prompt, so the model can ground its answer in the content of the PDF.

![image.png](https://i0.wp.com/www.phdata.io/wp-content/uploads/2023/11/image1-3.png)

Credit: https://www.phdata.io/blog/what-is-retrieval-augmented-generation-rag/

In [1]:
!pip install openai langchain chromadb tiktoken

Collecting openai
  Downloading openai-1.40.0-py3-none-any.whl.metadata (22 kB)
Collecting langchain
  Downloading langchain-0.2.12-py3-none-any.whl.metadata (7.1 kB)
Collecting chromadb
  Downloading chromadb-0.5.5-py3-none-any.whl.metadata (6.8 kB)
Collecting tiktoken
  Downloading tiktoken-0.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.27.0-py3-none-any.whl.metadata (7.2 kB)
Collecting jiter<1,>=0.4.0 (from openai)
  Downloading jiter-0.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.6 kB)
Collecting langchain-core<0.3.0,>=0.2.27 (from langchain)
  Downloading langchain_core-0.2.28-py3-none-any.whl.metadata (6.2 kB)
Collecting langchain-text-splitters<0.3.0,>=0.2.0 (from langchain)
  Downloading langchain_text_splitters-0.2.2-py3-none-any.whl.metadata (2.1 kB)
Collecting langsmith<0.2.0,>=0.1.17 (from langchain)
  Downloading langsmith-0.1.98-py3-none-any.wh

In [2]:
!pip install -U langchain-community

Collecting langchain-community
  Downloading langchain_community-0.2.11-py3-none-any.whl.metadata (2.7 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain-community)
  Downloading marshmallow-3.21.3-py3-none-any.whl.metadata (7.1 kB)
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.7,>=0.5.7->langchain-community)
  Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)
Collecting mypy-extensions>=0.3.0 (from typing-inspect<1,>=0.4.0->dataclasses-json<0.7,>=0.5.7->langchain-community)
  Downloading mypy_extensions-1.0.0-py3-none-any.whl.metadata (1.1 kB)
Downloading langchain_community-0.2.11-py3-none-any.whl (2.3 MB)
Downloading dataclasses_json-0.6.7-py3-none-any.whl (

In [13]:
!pip install pypdf

Collecting pypdf
  Downloading pypdf-4.3.1-py3-none-any.whl.metadata (7.4 kB)
Downloading pypdf-4.3.1-py3-none-any.whl (295 kB)
Installing collected packages: pypdf
Successfully installed pypdf-4.3.1


In [3]:
import langchain
import openai
import chromadb # vector store for storing dense embeddings
import tiktoken # tokenization for text encoding

In [6]:
from google.colab import userdata
openai.api_key = userdata.get('OPENAI_API_KEY')

# **Load and ingest data into pipeline**

In [14]:
# Loaders now live in langchain-community (installed above); the old path is deprecated
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader('/content/attentionisallyouneed.pdf')
pages = loader.load()

In [15]:
pages[1].page_content

'1 Introduction\nRecurrent neural networks, long short-term memory [ 13] and gated recurrent [ 7] neural networks\nin particular, have been firmly established as state of the art approaches in sequence modeling and\ntransduction problems such as language modeling and machine translation [ 35,2,5]. Numerous\nefforts have since continued to push the boundaries of recurrent language models and encoder-decoder\narchitectures [38, 24, 15].\nRecurrent models typically factor computation along the symbol positions of the input and output\nsequences. Aligning the positions to steps in computation time, they generate a sequence of hidden\nstates ht, as a function of the previous hidden state ht−1and the input for position t. This inherently\nsequential nature precludes parallelization within training examples, which becomes critical at longer\nsequence lengths, as memory constraints limit batching across examples. Recent work has achieved\nsignificant improvements in computational efficiency th

In [16]:
len(pages)

15

# **Preprocessing**

In [17]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
docs = text_splitter.split_documents(pages)

In [18]:
len(docs)

49
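To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of overlapping fixed-size chunking in plain Python. It is a simplification for illustration only: `chunk_text` is a hypothetical helper, and `RecursiveCharacterTextSplitter` additionally tries to split on separators such as paragraph and line breaks, so its chunk boundaries will differ.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap (illustrative only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk repeats the last three characters of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk, provided it is no longer than the overlap.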

tiktoken is a fast BPE (byte pair encoding) tokenizer built for OpenAI's models. Here it is used to count the tokens in each chunk before embedding.

In [19]:
import tiktoken

# Look up the tokenizer that matches the embedding model
encoding = tiktoken.encoding_for_model("text-embedding-3-small")

# Count the tokens in each chunk
doc_tokens = [len(encoding.encode(doc.page_content)) for doc in docs]

In [20]:
doc_tokens

[254,
 206,
 167,
 64,
 184,
 181,
 191,
 180,
 156,
 211,
 202,
 215,
 209,
 167,
 254,
 200,
 213,
 116,
 230,
 227,
 194,
 175,
 198,
 207,
 208,
 189,
 380,
 245,
 217,
 96,
 439,
 218,
 209,
 60,
 341,
 205,
 229,
 119,
 313,
 276,
 279,
 190,
 278,
 296,
 305,
 184,
 245,
 295,
 282]

In [21]:
total_tokens = sum(doc_tokens)
print(f"Total tokens: {total_tokens}")

Total tokens: 10699


In [22]:
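# Embedding prices are quoted per 1M tokens, so scale the token count before multiplying.
# Assumed rate: $0.02 per 1M tokens for text-embedding-3-small; check OpenAI's current pricing.
price_per_million_tokens = 0.02
cost = total_tokens / 1_000_000 * price_per_million_tokens
print(f"Cost: ${cost:.6f}")

Cost: $0.000214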
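Before embedding, it can help to look at the token counts as a whole rather than only their sum. The sketch below is one way to do that; `summarize_tokens` is a hypothetical helper, and the counts and the price passed to it are made-up illustration values, not the notebook's data or a real rate.

```python
def summarize_tokens(token_counts: list[int], usd_per_million_tokens: float) -> dict:
    """Summarize per-chunk token counts and estimate the embedding cost."""
    total = sum(token_counts)
    return {
        "chunks": len(token_counts),
        "total_tokens": total,
        "max_tokens": max(token_counts, default=0),
        "mean_tokens": total / len(token_counts) if token_counts else 0.0,
        # Prices are quoted per 1M tokens, so scale the count first
        "estimated_cost_usd": total / 1_000_000 * usd_per_million_tokens,
    }


# Hypothetical inputs for illustration only
example_counts = [250, 200, 150]
summary = summarize_tokens(example_counts, usd_per_million_tokens=0.10)
print(summary)
```

The largest chunk is worth checking against the embedding model's input limit, and the total drives the cost estimate.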


# **Vector Embeddings**

In [30]:
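# Import from langchain-community; the langchain.* paths for these classes are deprecated
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OpenAIEmbeddings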


In [31]:
embedding_function=OpenAIEmbeddings(model="text-embedding-3-small",api_key=openai.api_key)

The next cell initializes a Chroma vector store: it embeds each chunk with the embedding function defined above and persists the embeddings in the `my-embeddings` directory for future use.

In [32]:
db=Chroma.from_documents(documents=docs,embedding=embedding_function, persist_directory='my-embeddings')

# **Similarity Search**

In [33]:
results=db.similarity_search_with_relevance_scores('What are self attention??',k=5)

for (doc,score) in results:
  print('score',score)
  print(doc)
  print('-------------------')

score 0.33666051995451773
page_content='described in section 3.2.
Self-attention, sometimes called intra-attention is an attention mechanism relating different positions
of a single sequence in order to compute a representation of the sequence. Self-attention has been
used successfully in a variety of tasks including reading comprehension, abstractive summarization,
textual entailment and learning task-independent sentence representations [4, 27, 28, 22].
End-to-end memory networks are based on a recurrent attention mechanism instead of sequence-
aligned recurrence and have been shown to perform well on simple-language question answering and
language modeling tasks [34].
To the best of our knowledge, however, the Transformer is the first transduction model relying
entirely on self-attention to compute representations of its input and output without using sequence-
aligned RNNs or convolution. In the following sections, we will describe the Transformer, motivate' metadata={'page': 1, 's
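The search above ranks chunks by how close their embeddings are to the query embedding. As a rough illustration of that idea, the sketch below ranks toy three-dimensional vectors by cosine similarity. The vectors, names, and helpers are all hypothetical, and the score a vector store reports depends on its configured distance metric, so it need not equal the cosine similarity computed here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict, k: int) -> list[tuple]:
    """Return the k (name, score) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:k]]


# Toy "embeddings"; real ones have hundreds or thousands of dimensions
toy_store = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.7, 0.7, 0.0],
    "chunk_c": [0.0, 0.0, 1.0],
}
ranked = top_k([1.0, 0.1, 0.0], toy_store, k=2)
for name, score in ranked:
    print(name, round(score, 3))
```

Here `chunk_a` points in almost the same direction as the query and ranks first, while `chunk_c` is orthogonal to it and falls outside the top two.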

# **RAG Enriched Prompt**

In [34]:
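from langchain.prompts import PromptTemplate
from langchain.chains.llm import LLMChain
# Chat model wrappers live in langchain-community; the langchain.chat_models path is deprecated
from langchain_community.chat_models import ChatOpenAI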

In [35]:
question='Please give me an introduction to transformer architecture'

context_docs=db.similarity_search(question,k=5)

# Create a prompt with two variables, 'context' and 'question'
prompt=PromptTemplate(
    template="""Use the following pieces of context to answer the question at the end. If
    you don't know the answer, just say that you don't know; do not try to make up an answer.

    <context>
    {context}
    </context>

    Question: {question}
    Helpful Answer:""",
    input_variables=["context","question"]
)

llm=ChatOpenAI(model='gpt-4o-mini',temperature=0.9,api_key=openai.api_key)
qa_chain=LLMChain(llm=llm,prompt=prompt)

# Join the retrieved chunks with newlines and run the chain
result=qa_chain.invoke(
    {'question':question,
     'context':"\n".join([doc.page_content for doc in context_docs])}
)

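The chain fills the `{context}` and `{question}` slots of the template before the text is sent to the model. That step can be sketched with plain string formatting, which makes it easy to inspect exactly what the model receives. `build_prompt` and the toy passages below are hypothetical, and the template wording is only an approximation of the one used in the notebook.

```python
PROMPT_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end.\n"
    "If you don't know the answer, say that you don't know.\n\n"
    "<context>\n{context}\n</context>\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def build_prompt(chunks: list[str], question: str) -> str:
    """Join retrieved chunks with newlines and fill the template slots."""
    context = "\n".join(chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


# Toy passages standing in for the retrieved document chunks
toy_chunks = ["First retrieved passage.", "Second retrieved passage."]
prompt_text = build_prompt(toy_chunks, "What do the passages say?")
print(prompt_text)
```

Printing the assembled prompt is a quick way to confirm that each passage starts on its own line inside the `<context>` block.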
