### Chat with PDF

Chat with PDF is a Retrieval-Augmented Generation (RAG) workflow that lets a user converse with a document with the help of a Large Language Model (LLM). The RAG approach is simple in principle:
- The user uploads a document, and the document is loaded.
- The loaded document is split into chunks.
- The chunks are converted into embeddings and stored in a vector database.
- A retriever and a prompt template are set up so that the chunks most relevant to a question can be fetched from the vector database and passed to the LLM.
- The user asks a question.
- The LLM returns a response based on the retrieved chunks.

The steps above describe the working principle of a RAG process. An app can then be built to tie the pieces together and make the RAG system available to users.
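
The flow above can be sketched end to end in plain Python before any library is involved. This is only a toy illustration: sentence splitting stands in for the text splitter, word sets stand in for embeddings, and word overlap stands in for vector similarity. All function names and the sample text are hypothetical and are not part of LangChain or Chroma.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def split_into_chunks(text):
    """Stand-in for a text splitter: one chunk per sentence."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def build_index(chunks):
    """Stand-in for embeddings plus a vector store: pair each chunk with its word set."""
    return [(chunk, tokenize(chunk)) for chunk in chunks]

def retrieve(index, question, k=1):
    """Return the k chunks that share the most words with the question."""
    q = tokenize(question)
    ranked = sorted(index, key=lambda item: len(q & item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# Made-up sample text, used only to exercise the functions above.
toy_text = (
    "Chroma stores embedding vectors. "
    "The splitter cuts a document into chunks. "
    "The language model writes the final answer."
)
toy_index = build_index(split_into_chunks(toy_text))
top_chunks = retrieve(toy_index, "Where are the embedding vectors stored?", k=1)
print(top_chunks)
```

In the real pipeline, the retrieved chunks are then handed to the LLM together with the question; the remaining steps in this notebook replace each stand-in with a proper component.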

#### Installing Dependencies

In [1]:
 !pip install langchain chromadb pypdf pytest --quiet

In [2]:
!pip install --upgrade --quiet langchain-community gpt4all

In [4]:
!pip install langchain_chroma --quiet

#### Step 1: Load the document

In [3]:
# Import the PDF document loader from the LangChain community package
from langchain_community.document_loaders import PyPDFLoader

# Store the file path of the PDF document in a variable called filepath
filepath = r"/content/The JC Team Volume 2 – Bible Stories.pdf"

# Create a function that loads the document and returns the loaded pages
def load_document(filepath):
    loader = PyPDFLoader(filepath)  # PyPDFLoader reads the PDF using pypdf
    return loader.load()  # returns a list of Document objects, one per page

#### Step 2: Split the document into smaller chunks

In [5]:
# Import the RecursiveCharacterTextSplitter, which divides the document into smaller chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Create a function that splits the loaded document into smaller chunks
def split_document(document):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size = 1000,
        chunk_overlap = 150
    )  # chunks of at most 1000 characters, with up to 150 characters shared between neighbours
    return text_splitter.split_documents(document)  # returns a list of the chunked documents
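
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-width sliding window in plain Python. It is a sketch only: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line breaks, so its real chunks will not line up with these. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
# → ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each piece starts with the last three characters of the previous one. The overlap serves the same purpose in the notebook: a sentence that straddles a chunk boundary still appears whole in at least one chunk.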

#### Testing Step 1 and 2

In [6]:
document = load_document(filepath=filepath) # load the document
chunk = split_document(document) # splits the document into smaller chunks



In [None]:
# Viewing the first item in our chunk list
chunk[0]

Document(metadata={'source': '/content/The JC Team Volume 2 – Bible Stories.pdf', 'page': 1}, page_content='1      The JC Team Bible Stories told like never before Volume 2')

#### Step 3: Create Embeddings using a Local Model (GPT4All or Ollama)

In [8]:
# Import the GPT4AllEmbeddings class from langchain_community
from langchain_community.embeddings import GPT4AllEmbeddings

# Create the embedding model.
# To pin a specific model, uncomment the GPT4AllEmbeddings(...) call below.
# model_name = "nomic-embed-text-v1.5.f16.gguf"  # alternative model name
model_name = "all-MiniLM-L6-v2.gguf2.f16.gguf"
gpt4all_kwargs = {'allow_download': True}
# embeddings = GPT4AllEmbeddings(
#     model_name=model_name,
#     gpt4all_kwargs=gpt4all_kwargs
# )
embeddings = GPT4AllEmbeddings()  # no arguments: falls back to the library's default model

#### Step 4: Create a vector database

In [9]:
test = embeddings.embed_query("This is a text")

In [10]:
test[:5]

[-0.01996530033648014,
 0.10024803876876831,
 -0.03600193187594414,
 -0.016954611986875534,
 -0.00673701474443078]

In [11]:
# Using the Chroma vector database
from langchain_chroma import Chroma

db = Chroma.from_documents(documents = chunk, embedding=embeddings)

In [12]:
#@title Testing our Database

question = "When did Nicodemus meet Jesus?"
docs = db.similarity_search(question)
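
A similarity search compares the question's embedding with every stored chunk embedding and returns the closest ones. The sketch below shows that idea using cosine similarity on hand-made three-dimensional vectors. It is an illustration under assumptions: the metric Chroma actually applies depends on how the collection is configured, real embeddings have hundreds of dimensions, and the names `cosine_similarity`, `rank_by_similarity`, and `toy_vectors` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vector, stored_vectors):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [
        (name, cosine_similarity(query_vector, vec))
        for name, vec in stored_vectors.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hand-made vectors standing in for chunk embeddings.
toy_vectors = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.6, 0.8, 0.0],
    "chunk_c": [0.0, 0.0, 1.0],
}
ranking = rank_by_similarity([1.0, 0.0, 0.0], toy_vectors)
for name, score in ranking:
    print(name, round(score, 3))
```

Here `chunk_a` points in the same direction as the query and ranks first, `chunk_b` is partially aligned, and `chunk_c` is orthogonal to the query and ranks last.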

In [13]:
#@title Creating a retriever

retriever = db.as_retriever()

In [14]:
#@title Creating the LLM

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")

model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-t5-xl"
)

In [21]:
#@title Creating a pipeline
from transformers import pipeline
from langchain_community.llms import HuggingFacePipeline
import torch

pipe = pipeline(
    "text2text-generation",
    model = model,
    tokenizer = tokenizer,
    max_length = 512,
    # temperature = 0,
    # top_p = 0.95,
    repetition_penalty = 1.15
)

local_llm = HuggingFacePipeline(
    pipeline=pipe,
    # device="auto",
    )

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


In [22]:
#@title Retrieval QA

# Import RetrievalQA from langchain.chains
from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(
    llm = local_llm,
    chain_type = "stuff",
    retriever = retriever,
    return_source_documents=True
)
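
The `"stuff"` chain type places all retrieved chunks into a single prompt alongside the question. The sketch below shows the general shape of that step in plain Python. The template wording and the function name `build_stuff_prompt` are assumptions made for illustration and are not LangChain's actual prompt.

```python
def build_stuff_prompt(question, chunks):
    """Join the retrieved chunks into one context block and append the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Placeholder strings stand in for chunks returned by the retriever.
example_prompt = build_stuff_prompt(
    "What is the document about?",
    ["first retrieved chunk", "second retrieved chunk"],
)
print(example_prompt)
```

Because every chunk goes into one prompt, this strategy only works while the combined text fits within the model's input limit, which is one reason the chunk size chosen in Step 2 matters.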

In [23]:
#@title Test running the RetrievalQA

ans = qa.invoke({"query": "How can one be born again?"})

In [19]:
ans["result"]

'To make an effort to constantly tune yourself to the promptings of the Spirit within, sense the push and take that step in faith.'

In [24]:
qa.invoke({"query": "When did Nicodemus meet Jesus?"})["result"]

'One night'
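
Because the chain was created with `return_source_documents=True`, the returned dictionary also carries a `"source_documents"` list next to `"result"`, which makes it possible to check which chunks an answer was based on. The helper below is a hypothetical sketch for printing them. `FakeDocument` is a stand-in with the same `page_content` and `metadata` attributes as the `Document` shown earlier, so the example runs without the model, and the data in it is placeholder text rather than real output.

```python
from dataclasses import dataclass, field

@dataclass
class FakeDocument:
    """Stand-in for a LangChain Document, used only to make this sketch runnable."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def summarize_sources(result, max_chars=60):
    """Return one 'page N: snippet' line per source document in a QA result."""
    lines = []
    for doc in result.get("source_documents", []):
        page = doc.metadata.get("page", "?")
        snippet = doc.page_content[:max_chars].replace("\n", " ")
        lines.append(f"page {page}: {snippet}")
    return lines

# Placeholder result shaped like the dictionary returned by the chain.
fake_result = {
    "result": "placeholder answer",
    "source_documents": [FakeDocument("placeholder chunk text", {"page": 3})],
}
source_lines = summarize_sources(fake_result)
print(source_lines)
# → ['page 3: placeholder chunk text']
```

In the notebook, the same helper could be called as `summarize_sources(ans)` after running the chain.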