# Python LLM Q&A Bot with Long-Term Memory

If you've been working with LLMs like ChatGPT for a while, you'll know that ChatGPT's knowledge cut-off is in 2021. This means _it doesn't know about events or information beyond that point._

For example, if you ask it about major events in the year 2023, it's not going to be able to answer.

So for anything beyond 2021, you will have to teach it yourself. One way is to supply some context along with your question. But that only works when the context is short, and it's mostly useful for tasks like summarization because, logically speaking, if you already have the answer, why ask the bot?

And for longer contexts, like books or private documents, pasting everything into the prompt is unrealistic.

So instead, what if we store our large text data in some sort of database and let the bot query the data and answer our question based on that?

Yes, it is possible to do that. We're going to utilize what we call a **vector database**, a type of database that stores data in a high-dimensional vector form. That data can be many things: text, sound, image, etc.

A vector database differs from a traditional database in that, with traditional ones like SQL databases, your queries have to match exactly in order to get the correct data. A vector database supports something called "similarity search", where results are ranked by how similar your query is to the information stored in the database.

Because of that, it works well for things like recommendation systems, question answering, and text search.

Enough about that. Let's get to building stuff.

## What We're Going to Build

This mini-project is essentially a GPT-based chatbot that can answer questions about things outside GPT's knowledge, such as events beyond the year 2021, private documents that aren't publicly available, or product-specific information.

For example, you can feed it your company's internal rules or code of conduct so your employees can ask about them easily without having to read through the entire document just to get an answer.

### Tools and Frameworks

These are the tools and frameworks we're going to be using:
- **Language:** Python
- [**ChromaDB**](https://www.trychroma.com/) - An open-source vector database you can host on your machine.
    - You can also use other vector databases like [Pinecone](https://www.pinecone.io/), [Milvus](https://milvus.io/), etc. But for simplicity's sake, I will use Chroma's transient storage to store the data.
- [**LangChain**](https://python.langchain.com/docs/get_started/introduction.html) - A framework that makes it much easier to work with LLMs and vector databases. It basically acts as a bridge between language models and vector databases. It also provides many utilities that help with prompt generation and query processing.
- [**OpenAI**](https://openai.com/) - This provides the LLM that converts the data queried from the database into proper sentences.

**Python Packages**

These are the Python packages we're going to be using:
- `openai`
- `python-dotenv`
- `langchain`
- `tiktoken`
- `wikipedia`
- `chromadb`

### AN IMPORTANT NOTE

LangChain is still early in its development and is constantly being updated with more features. The code you see in this project will very likely break in the future. If you're stuck, check out the official documentation.

In [4]:
# Install required packages
%pip install -q openai langchain python-dotenv tiktoken chromadb wikipedia

Note: you may need to restart the kernel to use updated packages.


## Let's get started

Below are all the packages we will need. You might be confused as to which does what. Don't worry. **I will also be importing them in the cells that require them** so you know when to use them.

In [2]:
import tiktoken
import os

from dotenv import load_dotenv, find_dotenv
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain, RetrievalQA
from langchain.document_loaders import WikipediaLoader

### Prepare The Environment

We're going to be utilizing an environment variable named `OPENAI_API_KEY`. This is entirely optional and you can pass the key manually in the functions that need it. I'll be using it to save some time.

An example `.env` file:
```
OPENAI_API_KEY=your_openai_api_key
```

In [5]:
import os
from dotenv import load_dotenv, find_dotenv

# Load the environment variables from .env file
# This will set the API key for OpenAI so we don't have to manually enter it later
load_dotenv(find_dotenv(), override=True)

True

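`load_dotenv()` loads the environment variables from a specified file. In this case, it loads the `.env` file located by `find_dotenv()`, which walks up the directory tree until it finds one. Passing `override=True` lets values from the file replace variables that are already set in the environment.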
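
Before making any API calls, it can help to confirm that the key was actually loaded. The snippet below is an optional sketch of such a check; `require_env` is a hypothetical helper name, not something provided by `python-dotenv`.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or raise if it is missing or empty.

    Hypothetical helper for illustration; not part of python-dotenv.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set. Add it to your .env file or export it.")
    return value
```

You could call `require_env("OPENAI_API_KEY")` right after `load_dotenv(...)` to fail early with a readable message instead of hitting an authentication error later.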

### Prepare And Store Text Data As Vectors

Before we can ask a bot to answer our questions, we need to have the information that the bot will be using first.

#### Load Text Data

Let's get some text content. LangChain supports tons of ways to load text ranging from `.docx` or `.pdf` to scraping websites like Wikipedia or Hacker News.

Check out LangChain's [document loaders](https://python.langchain.com/docs/integrations/document_loaders/) for a complete list of loaders it supports.

In this project, we will load Wikipedia pages matching the query "Apple Inc.". This will be the content that we will store in a vector database.

In [4]:
from langchain.document_loaders import WikipediaLoader

# Load Wikipedia pages matching the query 'Apple Inc.' (at most 30 documents)
loader = WikipediaLoader(query='Apple Inc.', lang='en', load_max_docs=30)
data = loader.load()

`WikipediaLoader()` creates a loader instance. Its `load()` method fetches the Wikipedia documents, cleans them, and returns the text data as a list of documents.

In [5]:
print(len(data))

10


#### Split Text Into Chunks

To store the text data as vectors, we first need to split it into chunks.

In [6]:
# Split data into chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter
    
text_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=0)
chunks = text_splitter.split_documents(data)

`RecursiveCharacterTextSplitter` splits text into smaller chunks. The `chunk_size` parameter is the maximum length of each chunk. By default, length is measured in characters (using `len()`), but you can change that by passing a different function to the `length_function` parameter. In this project, I'm going to use the default `len` function. `chunk_overlap` is how much of one chunk's text should overlap with the next.
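
To make `chunk_size` and `chunk_overlap` more concrete, here is a simplified sketch of fixed-size chunking with overlap. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter first tries to break on separators such as paragraphs and line breaks); `split_with_overlap` is a hypothetical helper for illustration only.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters.

    Hypothetical illustration; LangChain's splitter is separator-aware and more involved.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


sample = "abcdefghij"
print(split_with_overlap(sample, chunk_size=4, chunk_overlap=0))
# ['abcd', 'efgh', 'ij']
print(split_with_overlap(sample, chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

With an overlap of 0, as used in this project, no text is duplicated across chunks. A non-zero overlap repeats the tail of one chunk at the head of the next, which can help when a sentence would otherwise be cut in half at a chunk boundary.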

#### Vector Embedding and Storing Data

Vector embedding is a way to turn data into vectors (numerical representations of pieces of information) that can be stored in vector databases. Vectors are used to compare pieces of data. The closer their _directions_ are to each other, the more similar the pieces are.

To create embeddings, we have to rely on an embedding model. In this project, we will use the embedding model "Ada v2" by OpenAI.
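
One common way to measure how close two vectors' directions are is cosine similarity. The sketch below ranks a few items against a query using it. The 3-dimensional vectors are made up by hand for illustration; they are not real embeddings, and a vector database may use a different distance measure by default.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 means orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hand-made toy vectors (assumptions for illustration, not real embeddings)
documents = {
    "apple the company": [0.9, 0.1, 0.0],
    "apple the fruit": [0.1, 0.9, 0.1],
    "banana": [0.0, 0.8, 0.3],
}
query = [0.8, 0.2, 0.0]

# A toy "similarity search": sort the stored items by similarity to the query
ranked = sorted(
    documents,
    key=lambda name: cosine_similarity(query, documents[name]),
    reverse=True,
)
print(ranked)
# ['apple the company', 'apple the fruit', 'banana']
```

The query vector points in nearly the same direction as "apple the company", so that item ranks first even though nothing was matched exactly.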

Vector embedding costs money so let's calculate the cost. Check out OpenAI's [pricing](https://openai.com/pricing) page for more information.

In [7]:
# Calculate the number of tokens used for the documents
import tiktoken

# The embedding model used in this project: 'text-embedding-ada-002'
enc = tiktoken.encoding_for_model('text-embedding-ada-002')
token_count = sum(len(enc.encode(chunk.page_content)) for chunk in chunks)

# Embedding cost per 1000 tokens as of the time of this writing
embedding_cost = 0.0001
total_cost = (token_count / 1000) * embedding_cost

print(f'Token count: {token_count}')
print(f'Total cost (USD): {total_cost:.6f}')

Token count: 8472
Total cost (USD): 0.000847


Let's continue on with text embedding. As mentioned earlier, we will be using OpenAI's text embedding model.

LangChain supports many vector databases such as Chroma, Pinecone, Milvus, and Weaviate. For simplicity's sake, we will store the text data along with its vector embeddings in a transient Chroma vector database.

In [8]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Create an OpenAI Embedding model instance
# using the API key from the environment variable OPENAI_API_KEY
embeddings = OpenAIEmbeddings()
# or do this if you want to manually enter your API key
# embeddings = OpenAIEmbeddings(openai_api_key="my-api-key")

# Store the chunks and their vector embeddings with Chroma
# and return a vectorstore object
# This is a transient database, meaning it's not persistent
# and will be gone when the session is closed.
# To make the data persistent, pass in `persist_directory="./chroma_db"`
vectorstore = Chroma.from_documents(chunks, embeddings)

`OpenAIEmbeddings` creates an embedding model instance. `Chroma.from_documents` uses that model to embed the text chunks as vectors, stores them in a database, and returns a `vectorstore` object that can be used to query the data.

### Set Up A Q&A Bot

Next, let's set up a GPT bot that we will use to ask questions about the data we just stored. We're going to use GPT-3.5 Turbo as our AI assistant to answer our questions. You can also use GPT-4 if your OpenAI account is eligible.

In [9]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain, RetrievalQA

# Create an OpenAI LLM instance
# You can also try the GPT-4 model if your account is eligible
llm = ChatOpenAI(model='gpt-3.5-turbo', temperature=0.7)

# Turn the Chroma vectorstore into a retriever to be used in the chain
# You can increase the 'k' value to make it return more results for better context
retriever = vectorstore.as_retriever(search_type='similarity', search_kwargs={'k': 10})

# Create a Q&A chain instance
# Use RetrievalQA instead if you don't want the ability to have a chat history
qa_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever)

`ChatOpenAI` creates an OpenAI LLM instance. This is the chat model that will basically look at the queried text data and answer our questions.

`ConversationalRetrievalChain.from_llm` is a LangChain function that creates a conversational chain instance. The chain handles fetching data from the vector store, creating a prompt for the LLM, and returning the answer. It also takes the chat history into account, which gives the bot a "memory" of previous questions that it can use as context when answering follow-ups.
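
Conceptually, the "create a prompt" step amounts to placing the retrieved chunks into the prompt as context ahead of the question. The sketch below shows that idea only; `build_prompt`, the prompt wording, and the example chunks are hypothetical and are not LangChain's actual template.

```python
def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Combine retrieved text chunks and a question into a single prompt string.

    Hypothetical illustration of the idea; not the template LangChain uses.
    """
    context = "\n\n".join(chunk.strip() for chunk in retrieved_chunks)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Made-up chunks standing in for what a retriever might return
example_chunks = [
    "Apple Inc. was founded on April 1, 1976.",
    "Apple is headquartered in Cupertino, California.",
]
print(build_prompt("When was Apple founded?", example_chunks))
```

Because the LLM only sees what ends up in this prompt, the `k` value on the retriever directly controls how much context the model gets to work with.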

### Ask Questions

Now, we're ready to ask our bot questions.

In [15]:
# Create an empty list to store chat history
chat_history = []

# Ask the chat bot something
query = "When was Apple founded?"

# Get the result
result = qa_chain({"question": query,
                   "chat_history": chat_history
                  })

# Append the chat history
# The format is (question, answer)
chat_history.append((query, result['answer']))

result['answer']

'Apple was founded on April 1, 1976.'

In this project, we'll be storing chat history inside a simple list. The format of each entry in the chat history is `(question, answer)`.

We ask a question by calling the chain (in this case, `qa_chain`) and passing in a question along with the chat history list. The bot will return a result object. We then record the question and its answer in the `chat_history` list.

As you can see, the bot is able to answer our question. Now, what if we ask it to multiply the year number by 2?

In [16]:
# Ask the chat bot something again
query = "Multiply the year by 2"

# Get the result
result = qa_chain({"question": query,
                   "chat_history": chat_history
                  })

# Append the chat history
chat_history.append((query, result['answer']))

result['answer']

'The year Apple was founded is 1976. When you multiply 1976 by 2, the result is 3952.'

As you can see, even if we kept the question vague by not specifying what year we were referring to, the bot successfully deduced that we meant the year Apple was founded because of the previous question.

Here is the chat history:

In [17]:
chat_history

[('When was Apple founded?', 'Apple was founded on April 1, 1976.'),
 ('Multiply the year by 2',
  'The year Apple was founded is 1976. When you multiply 1976 by 2, the result is 3952.')]
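
The ask-then-record pattern used in the cells above could be wrapped in a small helper so the history is never forgotten. This is a sketch under assumptions: `ask` and `fake_chain` are hypothetical names, and `fake_chain` merely stands in for `qa_chain` so the snippet runs without an API key.

```python
from typing import Callable


def ask(chain: Callable[[dict], dict], question: str,
        chat_history: list[tuple[str, str]]) -> str:
    """Call the chain with the question and history, record the turn, and return the answer."""
    result = chain({"question": question, "chat_history": chat_history})
    answer = result["answer"]
    chat_history.append((question, answer))  # same (question, answer) format as above
    return answer


def fake_chain(inputs: dict) -> dict:
    """Stand-in for qa_chain: returns a canned answer that shows which turn it is."""
    turn = len(inputs["chat_history"]) + 1
    return {"answer": f"(answer {turn} to: {inputs['question']})"}


history: list[tuple[str, str]] = []
ask(fake_chain, "When was Apple founded?", history)
ask(fake_chain, "Multiply the year by 2", history)
print(history)
```

With the real chain from this project, the call would look like `ask(qa_chain, "When was Apple founded?", chat_history)`.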

And that's it! LangChain and vector databases are very powerful together. We can use them for many things, like a chatbot that answers customers' questions about your products or services, or a bot that answers questions about rules and legal documents. The possibilities are endless.