# Custom Knowledge ChatGPT with LangChain - Chat with PDFs

- Installs, Imports and API Keys
- Loading PDFs and chunking with LangChain
- Embedding text and storing embeddings
- Creating a retrieval function
- Creating a chatbot with chat memory (OPTIONAL)

In [1]:
from IPython.display import Image

# Show the screenshot stored next to this notebook
Image(url="Screenshot 2024-09-20 at 2.41.38 PM.png")

### Installs, Imports and API Keys

In [None]:
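# langchain 0.0.150 predates the openai 1.x client, so keep openai below 1.0.
# tiktoken is needed by OpenAIEmbeddings; ipywidgets is used by the optional chatbot UI.
!pip install -q langchain==0.0.150 pypdf transformers "openai<1" tiktoken faiss-cpu ipywidgets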

In [2]:
import os
import pandas as pd
import matplotlib.pyplot as plt
from transformers import GPT2TokenizerFast
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI
from langchain.chains import ConversationalRetrievalChain

In [4]:
# Requires an OpenAI account with billing enabled (API usage is paid).
# Replace the placeholder below with your own key, and never commit a real key.

os.environ["OPENAI_API_KEY"] = "{YOURAPIKEY}"
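
Hardcoding the key in a cell makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the key was exported in the shell before Jupyter was started; `resolve_api_key` is a hypothetical helper, not part of LangChain or the OpenAI client:

```python
import os


def resolve_api_key(env=None, var_name="OPENAI_API_KEY"):
    """Return the API key from `env`, rejecting missing or placeholder values."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    # "{YOURAPIKEY}"-style placeholders start with a brace; treat them as unset.
    if not key or key.startswith("{"):
        raise RuntimeError(f"{var_name} is not set to a real key")
    return key
```

Calling `resolve_api_key()` at the top of the notebook fails early with a clear message instead of surfacing later as an authentication error.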

### Loading PDFs and Text Extraction

In [3]:
# The PDF must be in the notebook's working directory; change the path below to match your file.

# Simple method: split by pages (load_and_split also applies LangChain's default text splitter to long pages)
loader = PyPDFLoader("./SBP-Act.pdf")
pages = loader.load_and_split()
print(pages[15])

chunks = pages

page_content='- 11 - \n    meetings without the right to vote.  \n \n(4) The Governor shall be the Chairperson of the Board. In the Governor’s  \nabsence, the Board shall be chaired by the Deputy Governor in charge of the \nBoard meeting agenda items:  \n \n       Provided that when the Deputy Governor chairs the Board meeting in \nabsenc e of the Governor, the Deputy Governor shall have the right of casting \nvote.  \n  \n(5) The non -executive Directors shall be eminent professionals each of whom is \nwell-known for his integrity, exper tise, and experience in  the fields of \neconomics, financial services, banking, law, information technology, risk \nmanagement or accountancy to perform the oversight. They shall have an \nadvanced degree from a recognized university or hold professional \naccreditation, and relevant ex perience in any such fields for not less than ten \nyears.  \n \n389A. Powers of the Board.  —(1) The Board, with the exception of the powers  \nentrusted to the M on
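
The notebook imports `RecursiveCharacterTextSplitter` but only splits by page. As a rough illustration of what fixed-size chunking with overlap does, here is a simplified, character-based sketch. It is a stand-in for the idea, not LangChain's splitter; `chunk_text` and its parameter values are assumptions made for this example:

```python
def chunk_text(text, chunk_size=512, overlap=24):
    """Split `text` into windows of `chunk_size` characters that share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text,
        # so the tail is not emitted again as a redundant chunk.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of embedding some text twice.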

### Embed text and store embeddings

In [5]:
# Get embedding model
embeddings = OpenAIEmbeddings()

# Create vector database
db = FAISS.from_documents(chunks, embeddings)

Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors..

RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.
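
The output above shows the same call being retried even though the message says the account quota is exhausted, which waiting cannot fix. A minimal sketch of a retry wrapper that backs off on transient failures but fails fast on quota errors; `is_quota_error` and `call_with_retry` are hypothetical helpers, and matching on the message text is an assumption based on the error shown above, not how LangChain's own retry logic works:

```python
import time


def is_quota_error(message):
    """Heuristic: quota exhaustion is not worth retrying."""
    return "exceeded your current quota" in message.lower()


def call_with_retry(fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `fn`, retrying with exponential backoff unless the error is a quota error."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # Give up immediately on quota errors or when attempts are used up.
            if is_quota_error(str(exc)) or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

In the notebook this could wrap the call that builds the index, for example `call_with_retry(lambda: FAISS.from_documents(chunks, embeddings))`.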

### Set up retrieval function

In [None]:
# Check that similarity search is working

query = "What do you know about Dismissal of the Governor and Deputy Governor?"
docs = db.similarity_search(query)
docs[0]

In [None]:
# Number of similar documents returned (similarity_search defaults to k=4)

len(docs)
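
Conceptually, the similarity search embeds the query and returns the stored chunks whose vectors lie closest to it. The sketch below shows that idea with plain Python and made-up 2-D vectors. It assumes Euclidean (L2) distance, which, as far as I know, is what the flat FAISS index created by LangChain uses by default; `top_k` is a hypothetical name:

```python
import math


def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the `k` vectors closest to `query_vec` by L2 distance."""
    scored = sorted(
        (math.dist(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)
    )
    return [idx for _, idx in scored[:k]]
```

Real embeddings have on the order of a thousand dimensions and FAISS searches them far more efficiently, but the ranking principle is the same.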

In [None]:
# Create a QA chain that answers the query from the retrieved documents (the knowledge base)

chain = load_qa_chain(OpenAI(temperature=0, model='gpt-3.5-turbo-instruct'), chain_type="stuff")

query = "What do you know about Dismissal of the Governor and Deputy Governor?"
docs = db.similarity_search(query)

chain.run(input_documents=docs, question=query)
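
The `"stuff"` chain type places all retrieved documents into a single prompt together with the question. A rough sketch of that idea follows; the prompt wording and `build_stuff_prompt` are invented for illustration and are not LangChain's actual template:

```python
def build_stuff_prompt(question, documents, separator="\n\n"):
    """Concatenate all documents into one context block followed by the question."""
    context = separator.join(doc.strip() for doc in documents)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Because everything goes into one prompt, this approach only works while the retrieved chunks plus the question fit in the model's context window.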

### Create chatbot with chat memory (OPTIONAL)

In [None]:
from IPython.display import display
import ipywidgets as widgets

# Create a conversational chain that uses our vector DB as the retriever; it also accepts chat history
qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0.1, model='gpt-3.5-turbo-instruct'), db.as_retriever())
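
The chain receives the chat history as a list of `(question, answer)` tuples and, as I understand it, uses them to rewrite a follow-up into a standalone question before retrieval. Since that list grows with every turn, it can help to cap it. A minimal sketch, where `trim_history`, `format_history` and the `Human:`/`Assistant:` labels are assumptions made for this example rather than LangChain internals:

```python
def trim_history(chat_history, max_turns=5):
    """Keep only the most recent `max_turns` (question, answer) pairs."""
    if max_turns <= 0:
        return []
    return chat_history[-max_turns:]


def format_history(chat_history):
    """Render (question, answer) pairs as plain text, oldest first."""
    lines = []
    for question, answer in chat_history:
        lines.append(f"Human: {question}")
        lines.append(f"Assistant: {answer}")
    return "\n".join(lines)
```

Passing `trim_history(chat_history)` instead of the full list keeps the prompt size bounded in long sessions.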

In [None]:
chat_history = []

def on_submit(_):
    query = input_box.value
    input_box.value = ""
    
    if query.lower() == 'exit':
        print("Thank you for using the SBP chatbot!")
        input_box.close()  # remove the input box so no further questions can be submitted
        return
    
    result = qa({"question": query, "chat_history": chat_history})
    chat_history.append((query, result['answer']))
    
    display(widgets.HTML(f'<b>User:</b> {query}'))
    display(widgets.HTML(f'<b><font color="blue">Chatbot:</font></b> {result["answer"]}'))

print("Welcome to the SBP chatbot! Type 'exit' to stop.")

input_box = widgets.Text(placeholder='Please enter your question:')
input_box.on_submit(on_submit)

display(input_box)
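
The callback interpolates the user's text and the model's answer directly into HTML, so any `<` or `&` in either one is interpreted as markup. A small sketch of escaping before rendering; `render_message` is a hypothetical helper whose output could be passed to `widgets.HTML`:

```python
import html


def render_message(speaker, text, color=None):
    """Build an HTML snippet with the speaker in bold and the text safely escaped."""
    label = html.escape(speaker)
    if color:
        label = f'<font color="{html.escape(color)}">{label}</font>'
    return f"<b>{label}:</b> {html.escape(text)}"
```

For example, `display(widgets.HTML(render_message("Chatbot", result["answer"], color="blue")))` would show the answer as literal text even when it contains angle brackets.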