# **Custom Knowledge ChatGPT with LangChain - Chat with PDFs**

**By Liam Ottley:**  [YouTube](https://youtube.com/@LiamOttley)

0.   Installs, Imports and API Keys
1.   Loading PDFs and chunking with LangChain
2.   Embedding text and storing embeddings
3.   Creating retrieval function
4.   Creating chatbot with chat memory (OPTIONAL) 

# 0. Installs, Imports and API Keys

Anaconda PowerShell Prompt:

```shell
cd <path>
conda activate envpy39  # run 'conda deactivate' later to leave the environment
jupyter notebook
```

```python
# RUN THIS CELL FIRST!
!pip install -q langchain==0.0.150
!pip install -q pypdf
!pip install -q pandas
!pip install -q matplotlib
!pip install -q tiktoken
!pip install -q textract
!pip install -q transformers
!pip install -q openai
!pip install -q faiss-cpu
!pip install -q python-dotenv
```

```python
import os
import pandas as pd
import matplotlib.pyplot as plt
from transformers import GPT2TokenizerFast
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI
from langchain.chains import ConversationalRetrievalChain
```

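```python
# Load environment variables from a .env file (e.g. a line OPENAI_API_KEY=<your key>)
# so the secret key never has to be written into the notebook itself
from dotenv import load_dotenv
load_dotenv()
```

```python
# load_dotenv() has already copied the key into os.environ; stop early if it is missing
if not os.getenv("OPENAI_API_KEY"):
    raise ValueError("OPENAI_API_KEY is not set. Add it to your .env file.")
```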

# 1. Loading PDFs and chunking with LangChain

```python
# Place your PDF in the notebook's folder (here ./reports/resume.pdf) or change the path below

# Simple method: split by pages
loader = PyPDFLoader("./reports/resume.pdf")
pages = loader.load_and_split()
print(pages[0])  # inspect the first page: its text plus metadata such as source file and page number

chunks = pages  # use the page-level documents as the chunks to embed
```
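The outline promises chunking, and `RecursiveCharacterTextSplitter` is imported, but the cell above only splits by page. The core idea behind character-based chunking can be sketched in plain Python: cut the text into fixed-size windows that overlap slightly, so text near a boundary appears in both neighbouring chunks. This is a simplified sketch, not LangChain's implementation: `chunk_text` and its default sizes are hypothetical, and `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraph breaks, newlines and spaces before falling back to raw characters.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Hypothetical helper: split text into windows of at most chunk_size characters,
    where consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window starts after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end of the text
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

In LangChain itself the equivalent step would be along the lines of `RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(pages)`, assigned to `chunks`; those sizes are example values, not tuned for this PDF.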


# 2. Embed text and store embeddings

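```python
# Get embedding model (calls the OpenAI API, so OPENAI_API_KEY must be set)
embeddings = OpenAIEmbeddings()

# Create vector database: embed every chunk and index the vectors with FAISS
db = FAISS.from_documents(chunks, embeddings)
```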
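`FAISS.from_documents` embeds every chunk and indexes the resulting vectors so that `similarity_search` can return the chunks closest to a query's vector. The toy version below shows that flow without an API key. It is an illustration under loud assumptions: `toy_embed` is a hypothetical stand-in that counts words instead of calling OpenAI, and `ToyVectorStore` ranks chunks by cosine similarity with a brute-force scan. Real embeddings capture meaning rather than shared words, and FAISS uses its own index structures and distance metric.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: embeds each text once, then ranks texts by similarity to a query."""

    def __init__(self, texts):
        self.entries = [(text, toy_embed(text)) for text in texts]

    def similarity_search(self, query, k=4):
        query_vec = toy_embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(query_vec, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore([
    "Kristy worked as a data analyst for three years.",
    "The office has a coffee machine.",
    "She led a team of five engineers.",
])
print(store.similarity_search("What experience does Kristy have as an analyst?", k=1))
# ['Kristy worked as a data analyst for three years.']
```

The default `k=4` mirrors the default number of results returned by LangChain's `similarity_search`, which is why `docs` in the next cell can hold several chunks.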

# 3. Setup retrieval function

```python
# Check that similarity search is working: return the chunks most similar to the query
query = "Please summarise Kristy's experience."
docs = db.similarity_search(query)
docs[0]  # the most similar chunk
```

```python
# Create a QA chain that answers the query from the knowledge base:
# similarity search finds the relevant chunks, and the "stuff" chain passes them all to the LLM in one prompt
chain = load_qa_chain(OpenAI(temperature=0), chain_type="stuff")

query = "Please summarise Kristy's experience."
docs = db.similarity_search(query)

chain.run(input_documents=docs, question=query)
```
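With `chain_type="stuff"`, the QA chain "stuffs" every retrieved document into a single prompt and makes one LLM call. A rough sketch of that prompt assembly follows; `build_stuff_prompt` and the instruction wording are hypothetical, and LangChain's built-in template differs in detail.

```python
def build_stuff_prompt(docs, question, separator="\n\n"):
    """Hypothetical sketch of the "stuff" strategy: join all retrieved chunks into one prompt."""
    context = separator.join(docs)  # every retrieved chunk goes into the same prompt
    return (
        "Use the following pieces of context to answer the question at the end. "
        "If you don't know the answer, say that you don't know.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_stuff_prompt(
    ["Kristy worked as a data analyst for three years.", "She led a team of five engineers."],
    "Please summarise Kristy's experience.",
)
print(prompt)
```

Because everything goes into one prompt, "stuff" only works while the retrieved chunks fit in the model's context window; `load_qa_chain` also accepts `chain_type` values such as `"map_reduce"`, `"refine"` and `"map_rerank"`, which spread the documents over several LLM calls.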

# 4. Create chatbot with chat memory (OPTIONAL) 

```python
!pip install -q ipywidgets
# If the widgets below do not render, the installed ipywidgets version may not be compatible
# with your Jupyter Notebook version. Enabling the widgets extension may help:
# !jupyter nbextension enable --py widgetsnbextension
```

```python
# Import widgets
from IPython.display import display
import ipywidgets as widgets

# Create a conversational chain that uses our FAISS vector store as its retriever.
# temperature controls how random the LLM's answers are (0 = most deterministic).
# ConversationalRetrievalChain takes the chat history into account for follow-up questions.
qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0.1), db.as_retriever())

chat_history = []  # (question, answer) tuples passed to the chain on every turn

input_box = widgets.Text(placeholder='Please enter your question:')
button = widgets.Button(description="Submit")  # submits the current question
output = widgets.Output()  # output from the callback is rendered here, below the cell

def display_chatbot_output(result):
    '''Display the chatbot's answer.'''
    display(widgets.HTML(f'<b><font color="blue">Chatbot:</font></b> {result["answer"]}'))

def on_submit(_):
    '''
    Called when the user clicks the Submit button. It:
    - reads the user query from the input box
    - clears the input box
    - runs the query (plus chat history) through the chatbot
    - displays the query and the chatbot's answer
    '''
    query = input_box.value.strip()  # get the user query from the input box
    input_box.value = ""  # clear the input box
    if not query:
        return  # ignore empty submissions

    with output:
        # Stop the chatbot if the user types 'exit'
        if query.lower() == 'exit':
            print("Thank you for using Kristy's Career chatbot!")
            input_box.disabled = True
            button.disabled = True
            return

        result = qa({"question": query, "chat_history": chat_history})  # run the query through the chatbot
        chat_history.append((query, result['answer']))  # remember this turn for follow-up questions

        display(widgets.HTML(f'<b>User:</b> {query}'))  # display the user query
        display_chatbot_output(result)  # display the chatbot output

print("Welcome to Kristy's Career chatbot. Type 'exit' to stop.")

button.on_click(on_submit)
display(input_box, button, output)
```
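`ConversationalRetrievalChain` uses `chat_history` in two steps: when there is earlier history, it first asks the LLM to rewrite the follow-up (for example "What did she do before that?") into a standalone question, then runs retrieval and answering on that standalone question. The sketch below only shows how the `(question, answer)` tuples could be rendered into such a rewriting prompt; `format_chat_history`, `build_condense_prompt` and the prompt wording are hypothetical, not LangChain's exact internals.

```python
def format_chat_history(chat_history):
    """Hypothetical helper: render (question, answer) tuples as alternating Human/Assistant lines."""
    lines = []
    for question, answer in chat_history:
        lines.append(f"Human: {question}")
        lines.append(f"Assistant: {answer}")
    return "\n".join(lines)


def build_condense_prompt(chat_history, follow_up):
    """Hypothetical prompt asking an LLM to turn a follow-up into a standalone question."""
    return (
        "Given the following conversation and a follow up question, "
        "rephrase the follow up question to be a standalone question.\n\n"
        f"Chat History:\n{format_chat_history(chat_history)}\n"
        f"Follow Up Input: {follow_up}\n"
        "Standalone question:"
    )


history = [("What is Kristy's current role?", "(answer from the chatbot)")]
print(build_condense_prompt(history, "What did she do before that?"))
```

Since `chat_history` grows with every turn, this rewriting prompt also grows, which is worth keeping in mind for long sessions.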