# **Build a Customer Support Question Answering Chatbot**
**Introduction**

As we witness accelerated technological progress, large language models like GPT-4 and ChatGPT have emerged as significant breakthroughs in the tech landscape. These state-of-the-art models demonstrate exceptional prowess in content generation. However, they are not without their share of challenges, such as biases and hallucinations. Despite these limitations, LLMs have the potential to bring about a transformative impact on chatbot development.

Traditional, primarily intent-based chatbots have been designed to respond to specific user intents. These intents comprise a collection of sample questions and corresponding responses. For instance, a "Restaurant Recommendations" intent might include sample questions like "Can you suggest a good Italian restaurant nearby?" or "Where can I find the best sushi in town?" with responses such as "You can try the Italian restaurant 'La Trattoria' nearby" or "The top-rated sushi place in town is 'Sushi Palace.'"

When users interact with the chatbot, their queries are matched to the most similar intent, generating the associated response. However, as LLMs continue to evolve, chatbot development is shifting toward more sophisticated and dynamic solutions capable of handling a broader range of user inquiries with greater precision and nuance.
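
To make the intent-matching idea concrete, here is a minimal sketch of a traditional intent-based matcher. It is only an illustration: the `opening_hours` intent, the word-overlap (Jaccard) score, and the function names are assumptions introduced for this example, and real systems typically use trained classifiers or embeddings instead.

```python
# Toy intent-based matcher: each intent has sample questions and one canned response.
# ASSUMPTION: intent names, the "opening_hours" intent, and the Jaccard scoring
# are illustrative choices, not part of the original project.

def tokenize(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric words."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return set(cleaned.split())


def jaccard(a: set, b: set) -> float:
    """Word-overlap similarity between two token sets (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


INTENTS = {
    "restaurant_recommendations": {
        "samples": [
            "Can you suggest a good Italian restaurant nearby?",
            "Where can I find the best sushi in town?",
        ],
        "response": "You can try the Italian restaurant 'La Trattoria' nearby.",
    },
    "opening_hours": {  # hypothetical second intent
        "samples": ["What time do you open?", "Are you open on Sunday?"],
        "response": "We are open every day from 9am to 6pm.",
    },
}


def match_intent(query: str) -> tuple:
    """Return the name of the best-matching intent and its score."""
    query_tokens = tokenize(query)
    best_name, best_score = "", 0.0
    for name, intent in INTENTS.items():
        for sample in intent["samples"]:
            score = jaccard(query_tokens, tokenize(sample))
            if score > best_score:
                best_name, best_score = name, score
    return best_name, best_score


name, score = match_intent("Could you suggest an Italian restaurant?")
print(name, "->", INTENTS[name]["response"])
```

The limitation is visible in the design: any question that shares no words with the stored samples gets no match at all, which is the gap that LLM-based retrieval aims to close.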

**Having a Knowledge Base**
LLMs can significantly enhance chatbot functionality by associating broader intents with documents from a Knowledge Base (KB) instead of specific questions and answers. This approach streamlines intent management and generates more tailored responses to user inquiries.

GPT-3 has a maximum prompt size of around 4,000 tokens, which is substantial but insufficient for incorporating an entire knowledge base in a single prompt.

Future LLMs may not have this limitation while retaining the text generation capabilities. However, for now, we need to design a solution around it.
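
A rough way to see why a whole knowledge base does not fit is to estimate token counts from character counts. The sketch below assumes the common rule of thumb of about 4 characters per token for English text and a hypothetical 500-token reserve for the answer; both numbers are assumptions, and an actual tokenizer (such as tiktoken) would give exact counts.

```python
# Rough check of whether a text fits into the prompt budget.
# ASSUMPTION: ~4 characters per token is only a rule of thumb for English text,
# and the 500-token reserve for the answer is an arbitrary illustrative value.
CHARS_PER_TOKEN = 4        # heuristic, not an exact tokenizer
MAX_PROMPT_TOKENS = 4000   # approximate limit quoted in the text


def estimate_tokens(text: str) -> int:
    """Estimate the token count from the character count (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_prompt(text: str, reserved_for_answer: int = 500) -> bool:
    """Return True if the estimated tokens leave room for the answer."""
    return estimate_tokens(text) <= MAX_PROMPT_TOKENS - reserved_for_answer


short_article = "word " * 1_000      # 5,000 characters
knowledge_base = "word " * 100_000   # 500,000 characters
print(fits_in_prompt(short_article), fits_in_prompt(knowledge_base))
```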

**Workflow**
This project aims to build a chatbot that leverages GPT3 to search for answers within documents. The workflow for the experiment is explained in the following diagram.

First, we scrape content from online articles, split it into small chunks, compute the embeddings of those chunks, and store them in Deep Lake. Then, we use the user's query to retrieve the most relevant chunks from Deep Lake and insert them into a prompt, which the LLM uses to generate the final answer.

It is important to note that there is always a risk of generating hallucinations or false information when using LLMs. Although this might not be acceptable for many customer support use cases, the chatbot can still help operators draft answers that they double-check before sending them to the user.

In the next steps, we'll explore how to manage conversations with GPT-3 and provide examples to demonstrate the effectiveness of this workflow:

# **First, set up the OPENAI_API_KEY and ACTIVELOOP_TOKEN environment variables with your API keys and tokens.**

Because we are going to use LangChain's SeleniumURLLoader class, which depends on the unstructured and selenium Python libraries, let's install both using pip. It is recommended to install the latest version of the library. Nonetheless, please be aware that the code has been tested specifically on version 0.7.7.

In [None]:
pip install unstructured selenium


In [None]:
pip install langchain==0.0.208 deeplake openai==0.27.8 tiktoken


In [None]:
pip install python-dotenv

Collecting python-dotenv
  Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.0.1


In [None]:
from dotenv import load_dotenv

load_dotenv('/content/APIKeys.env')

True

In [None]:
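import os
from dotenv import load_dotenv, find_dotenv

# load variables from the nearest .env file into the environment
_ = load_dotenv(find_dotenv())

# read the keys; each lookup raises KeyError if the variable is not set
OPENAI_API_KEY = os.environ['OPENAI_API_KEY']
ACTIVELOOP_TOKEN_new = os.environ['ACTIVELOOP_TOKEN_new']
HUGGINGFACEHUB_API_TOKEN = os.environ['HUGGINGFACEHUB_API_TOKEN']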


In [None]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake
from langchain.text_splitter import CharacterTextSplitter
from langchain import OpenAI
from langchain.document_loaders import SeleniumURLLoader
from langchain import PromptTemplate

These libraries provide functionality for handling OpenAI embeddings, managing vector storage, splitting text, and interacting with the OpenAI API. They also enable the creation of a context-aware question-answering system, incorporating retrieval and text generation.

The database for our chatbot will consist of articles regarding technical issues.



In [None]:
# we'll use information from the following articles
urls = ['https://beebom.com/what-is-nft-explained/',
        'https://beebom.com/how-delete-spotify-account/',
        'https://beebom.com/how-download-gif-twitter/',
        'https://beebom.com/how-use-chatgpt-linux-terminal/',
        'https://beebom.com/how-delete-spotify-account/',
        'https://beebom.com/how-save-instagram-story-with-music/',
        'https://beebom.com/how-install-pip-windows/',
        'https://beebom.com/how-check-disk-usage-linux/']
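
Note that the list above contains the Spotify article twice, so its chunks are embedded twice. If that is not intended, a small order-preserving deduplication step could be applied before loading. This is a sketch on a shortened example list (`urls_example` is a name introduced here); removing the duplicate would change the chunk counts shown in the outputs later on.

```python
# Drop repeated URLs while keeping their original order.
# ASSUMPTION: urls_example is a shortened copy of the list, used for illustration.
urls_example = [
    "https://beebom.com/what-is-nft-explained/",
    "https://beebom.com/how-delete-spotify-account/",
    "https://beebom.com/how-download-gif-twitter/",
    "https://beebom.com/how-delete-spotify-account/",  # repeated entry
]

# dict keys are unique and keep insertion order
unique_urls = list(dict.fromkeys(urls_example))
print(len(urls_example), "->", len(unique_urls))
```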

# **1: Split the documents into chunks and compute their embeddings**
We load the documents from the provided URLs and split them into chunks using the CharacterTextSplitter with a chunk size of 1000 and no overlap:

In [None]:
# use the selenium scraper to load the documents
loader = SeleniumURLLoader(urls=urls)
docs_not_splitted = loader.load()

# we split the documents into smaller chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(docs_not_splitted)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
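
To build intuition for what the splitting step does, the following pure-Python sketch splits text on a separator and greedily merges the pieces into chunks of at most `chunk_size` characters, with no overlap. It only mirrors the idea: `split_text` is a hypothetical helper, not LangChain's implementation, and details such as the separator and the handling of oversized pieces are assumptions.

```python
# Simplified character-based splitter: split on a separator, then greedily
# merge pieces into chunks of at most `chunk_size` characters (no overlap).
# ASSUMPTION: illustrative only; this is not LangChain's CharacterTextSplitter.

def split_text(text: str, chunk_size: int = 1000, separator: str = "\n\n") -> list:
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate          # the piece still fits in the current chunk
        else:
            if current:
                chunks.append(current)   # close the current chunk
            current = piece              # a single oversized piece is kept whole
    if current:
        chunks.append(current)
    return chunks


# six paragraphs of 313 characters each
sample = "\n\n".join(f"Paragraph {i}: " + "x" * 300 for i in range(6))
chunks = split_text(sample, chunk_size=1000)
print([len(c) for c in chunks])
```

With no overlap, a sentence that straddles a chunk boundary ends up in only one chunk, which keeps the dataset small but can separate related context.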


Next, we compute the embeddings using OpenAIEmbeddings and store them in a Deep Lake vector store in the cloud. In an ideal production scenario, we could upload a whole website or course lesson to a Deep Lake dataset, allowing for search among thousands or even millions of documents. Because we are using a serverless Deep Lake dataset in the cloud, applications running in different locations can easily access the same centralized dataset without having to deploy a vector store on a custom machine.

Let's now modify the following code by adding your Activeloop organization ID. It is worth noting that the org id is your username by default.

In [None]:
# Before executing the following code, make sure to have
# your OpenAI key saved in the “OPENAI_API_KEY” environment variable.
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

# create Deep Lake dataset
# TODO: use your organization id here. (by default, org id is your username)
my_activeloop_org_id = "mubisain"
my_activeloop_dataset_name = "langchain_course_customer_support"
dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"
db = DeepLake(dataset_path=dataset_path, embedding_function=embeddings, token=ACTIVELOOP_TOKEN_new)

# add documents to our Deep Lake dataset
db.add_documents(docs)

Your Deep Lake dataset has been successfully created!


Creating 149 embeddings in 1 batches of size 149:: 100%|██████████| 1/1 [00:04<00:00,  4.32s/it]

Dataset(path='hub://mubisain/langchain_course_customer_support', tensors=['text', 'metadata', 'embedding', 'id'])

  tensor      htype       shape      dtype  compression
  -------    -------     -------    -------  ------- 
   text       text      (149, 1)      str     None   
 metadata     json      (149, 1)      str     None   
 embedding  embedding  (149, 1536)  float32   None   
    id        text      (149, 1)      str     None   






To retrieve the most similar chunks to a given query, we can use the similarity_search method of the Deep Lake vector store:

In [None]:
# let's see the top relevant documents to a specific query
query = "how to check disk usage in linux?"
docs = db.similarity_search(query)
print(docs[0].page_content)

Check Disk Usage Using Gnome Disk Tool

Check Disk Usage Using Disk Usage Analyzer Tool

Cleanup Disk using Disk Usage Analyzer

Check Disk Space Using the df Command

In Linux, there are many commands to check disk usage, the most common being the df command. The df stands for “Disk Filesystem” in the command, which is a handy way to check the current disk usage and the available disk space in Linux. The syntax for the df command in Linux is as follows:

df <options> <file_system>

The options to use with the df command are:

Options Description -a Show information about all file systems including pseudo, duplicate and inaccessible file systems -h Display the sizes in human-readable format i.e in powers of 1024 -t Display the disk usage of only the file system of a particular type -x Display the disk usage excluding a particular file type

Display Disk Usage in Human Readable Format
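
Conceptually, a similarity search compares the embedding of the query with the embedding of every stored chunk and returns the closest ones. The sketch below shows this with cosine similarity on made-up 3-dimensional vectors. Cosine similarity is used here as one common choice; the distance metric the vector store actually applies may differ, and `top_k` and `toy_store` are hypothetical names.

```python
import math

# ASSUMPTION: cosine similarity and the tiny vectors below are illustrative;
# real text-embedding-ada-002 vectors have 1536 dimensions.

def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the k (text, score) pairs most similar to the query vector."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


toy_store = [
    ("Check disk space using the df command", [0.9, 0.1, 0.0]),
    ("How to delete a Spotify account", [0.0, 0.2, 0.9]),
    ("Disk Usage Analyzer tool", [0.8, 0.3, 0.1]),
]
query_vec = [1.0, 0.0, 0.0]  # stands in for the embedded user query

for text, score in top_k(query_vec, toy_store):
    print(f"{score:.3f}  {text}")
```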


# **2: Craft a prompt for GPT-3 using the suggested strategies**
We will create a prompt template that incorporates role-prompting, relevant Knowledge Base information, and the user's question:

In [None]:
# let's write a prompt for a customer support chatbot that
# answers questions using information extracted from our db
template = """You are an exceptional customer support chatbot that gently answer questions.

You know the following context information.

{chunks_formatted}

Answer to the following question from a customer. Use only information from the previous context information. Do not invent stuff.

Question: {query}

Answer:"""

prompt = PromptTemplate(
    input_variables=["chunks_formatted", "query"],
    template=template,
)

**The template sets the chatbot's persona as an exceptional customer support chatbot.** It takes two input variables: chunks_formatted, which consists of the pre-formatted chunks from articles, and query, representing the customer's question. The objective is to generate an accurate answer using only the provided chunks without creating any false or invented information.
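
To see what the filled-in prompt looks like, the two variables can be substituted by hand. In this sketch, Python's built-in `str.format` stands in for the template's own formatting, the template is shortened, and the chunks are made-up placeholders rather than real retrieved text; `toy_template` is a name introduced here to avoid clashing with the notebook's variables.

```python
# Plain-Python illustration of filling the two template variables.
# ASSUMPTION: str.format stands in for PromptTemplate.format; the chunks
# are placeholders, not real retrieved text.
toy_template = """You know the following context information.

{chunks_formatted}

Question: {query}

Answer:"""

retrieved_chunks = ["First placeholder chunk.", "Second placeholder chunk."]

# separate chunks with a blank line so the model can tell them apart
chunks_formatted = "\n\n".join(retrieved_chunks)

prompt_text = toy_template.format(
    chunks_formatted=chunks_formatted,
    query="How to check disk usage in linux?",
)
print(prompt_text)
```

Ending the prompt with "Answer:" nudges a completion model to continue directly with the answer instead of restating the question.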



# **3: Utilize the GPT3 model with a temperature of 0 for text generation**
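To generate a response, we first retrieve the top-k (e.g., top-3) chunks most similar to the user query, format the prompt, and send the formatted prompt to the model (gpt-3.5-turbo-instruct in the code below) with a temperature of 0. The code below calls similarity_search without a k argument, so it returns the method's default number of chunks; pass k explicitly, for example db.similarity_search(query, k=3), to control how many are retrieved.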




# **The full pipeline**

In [None]:
# the full pipeline

# user question
query = "How to check disk usage in linux?"

# retrieve relevant chunks
docs = db.similarity_search(query)
retrieved_chunks = [doc.page_content for doc in docs]

# format the prompt
chunks_formatted = "\n\n".join(retrieved_chunks)
prompt_formatted = prompt.format(chunks_formatted=chunks_formatted, query=query)

# generate answer
llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0)
answer = llm(prompt_formatted)
answer

' There are several ways to check disk usage in Linux. One of the most common methods is by using the df command. You can also use GUI tools like the Gnome Disk Tool or the Disk Usage Analyzer Tool to easily monitor disk usage. These tools provide a visual representation of the disk occupancy and allow you to delete unnecessary files or folders to free up space.'

# **Issues with Generating Answers using GPT-3**
In the previous example, the chatbot generally performs well. However, there are certain situations where it could fail.

Suppose we ask, "Is the Linux distribution free?" and provide GPT-3 with a document about kernel features as context. It might generate an answer like "Yes, the Linux distribution is free to download and use," even if such information is not present in the context document. Producing false information is highly undesirable for customer service chatbots!

**GPT-3 is less likely to generate false information when the answer to the user's question is contained within the context.** Since user questions are often brief and ambiguous, we cannot always rely on the semantic search step to retrieve the correct document. Thus, there is always a risk of generating false information.
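
One possible mitigation, which is not part of the workflow above, is to skip the LLM call when none of the retrieved chunks is similar enough to the question. The sketch below assumes the vector store can return a similarity score alongside each chunk (higher meaning more similar); the 0.8 threshold, the fallback message, and the `generate` callable are all hypothetical and would need tuning for a real system.

```python
# Relevance guard: answer only when at least one chunk is similar enough.
# ASSUMPTION: scores are similarities in [0, 1]; the threshold, the fallback
# text, and the generate() callable are illustrative placeholders.
FALLBACK = "I'm sorry, I could not find this in the knowledge base."


def select_context(scored_chunks, min_score=0.8, k=3):
    """Keep at most k chunks whose similarity score reaches min_score."""
    relevant = [(text, s) for text, s in scored_chunks if s >= min_score]
    relevant.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in relevant[:k]]


def answer_or_fallback(scored_chunks, generate):
    """Call generate(context) only if some chunk passed the threshold."""
    context = select_context(scored_chunks)
    if not context:
        return FALLBACK  # no LLM call, so nothing can be hallucinated
    return generate("\n\n".join(context))


def fake_llm(context: str) -> str:
    """Stand-in for the real model call; just reports the context size."""
    return f"(answer based on {len(context)} characters of context)"


print(answer_or_fallback([("Kernel features overview", 0.41)], fake_llm))
print(answer_or_fallback([("Check disk space with df", 0.93)], fake_llm))
```

A guard like this trades coverage for safety: some answerable questions will receive the fallback, which may be preferable in customer support.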

# **Conclusion**
GPT-3 is highly effective in creating conversational chatbots capable of answering specific questions based on the contextual information provided in the prompt. **However, it can be challenging to ensure that the model generates answers solely based on the context, as it has a tendency to hallucinate (i.e., generate new, potentially false information)**. The severity of generating false information varies depending on the use case.

To conclude, we implemented a context-aware question-answering system using LangChain, following the provided code and strategies. The process involved splitting documents into chunks, computing their embeddings, implementing a retriever to find similar chunks, crafting a prompt for GPT-3, and using the GPT3 model for text generation. This approach demonstrates the potential of leveraging GPT-3 to create powerful and contextually accurate chatbots while also highlighting the need to be cautious about the possibility of generating false information.

→ ConversationBufferMemory, which is the most straightforward, then

→ ConversationBufferWindowMemory, which maintains a memory window that keeps a limited number of past interactions based on the specified window size.

→ And the most complex variant, ConversationSummaryMemory, which holds a summary of previous conversations.
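
The windowed variant described above can be pictured with a small pure-Python sketch. `WindowMemory` is a hypothetical class written for illustration, not LangChain's implementation; it keeps only the last `k` exchanges and renders them as text for the next prompt.

```python
from collections import deque

# ASSUMPTION: toy illustration of a memory window; the class name, the
# "Human:"/"AI:" labels, and the example dialogue are made up.

class WindowMemory:
    """Conversation memory that keeps only the last `k` exchanges."""

    def __init__(self, k: int = 2):
        # a deque with maxlen drops the oldest exchange automatically
        self.turns = deque(maxlen=k)

    def save(self, user: str, ai: str) -> None:
        """Store one user message together with the reply."""
        self.turns.append((user, ai))

    def as_prompt(self) -> str:
        """Render the remembered exchanges as text for the next prompt."""
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.turns)


memory = WindowMemory(k=2)
memory.save("Hi", "Hello!")
memory.save("How do I check disk usage?", "Use the df command.")
memory.save("And in human-readable form?", "Add the -h option.")
print(memory.as_prompt())
```

A fixed window keeps the prompt size bounded, at the cost of forgetting anything said more than `k` exchanges ago.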