# Data storage

So far we have implemented our own storage for chat history, along with the ability to summarize conversations. A more robust storage solution would be better, ideally one that also lets us search over our previous conversations.

There are many different options for storing data:
- Redis
- Postgres
- DynamoDB
- Pinecone

We will use ChromaDB. Everybody has an opinion about the various vector stores, and many of those opinions are valid. We chose ChromaDB because it is easy to use and quick to get up and running.

In this section, we will set up a database and use it to store and query our chat history.

In [125]:
import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

# All the usual imports
from rich.pretty import pprint
import dotenv
import os
dotenv.load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

## Create the database

First we create a client to connect to our database.

We will use an OpenAI embedding model, `text-embedding-3-small`, to embed our chat history entries.

We wrap the collection in a class so we can add some extra functionality, such as clearing the database and keeping a counter that tracks the number of entries.

In [129]:
class ChatDB:
    def __init__(self, name: str, model_name: str = "text-embedding-3-small"):
        self.name = name
        self.model_name = model_name
        # PersistentClient stores the database on disk in the given directory.
        self.client = chromadb.PersistentClient(path="./")
        self.embedding_function = OpenAIEmbeddingFunction(api_key=OPENAI_API_KEY, model_name=model_name)
        self._create_collection()


    def _create_collection(self):
        """Create the collection, or reopen it if it already exists on disk."""
        # get_or_create_collection avoids an error when this cell is re-run.
        self.chat_db = self.client.get_or_create_collection(
            name=self.name,
            embedding_function=self.embedding_function,
            metadata={"hnsw:space": "cosine"},  # rank results by cosine distance
        )
        # Continue numbering after any entries persisted by a previous run.
        self.id_counter = self.chat_db.count()


    def add_conversation_to_db(self, user_message: str, ai_message: str):
        """Add a conversation between user and AI to the database.

        Args:
            user_message (str): User input message.
            ai_message (str): Response from the AI.
        """
        self.chat_db.add(
            documents=[f"User: {user_message}\nAI: {ai_message}"],
            metadatas=[{"user_message": user_message, "ai_message": ai_message}],
            ids=[str(self.id_counter)]
        )
        self.id_counter += 1


    def get_all_entries(self) -> dict:
        """Grab all of the entries in the database.

        Returns:
            dict: All entries in the database.
        """
        return self.chat_db.get()


    def clear_db(self, reinitialize: bool = True):
        """Delete the collection and, optionally, recreate it empty.

        Args:
            reinitialize (bool, optional): Whether to recreate an empty
                collection after deleting it. Defaults to True.
        """
        self.client.delete_collection(self.name)
        if reinitialize:
            self._create_collection()


    def query_db(self, query_text: str, n_results: int = 2) -> dict:
        """Given some query text, return the n_results most similar entries in the database.

        Args:
            query_text (str): The text to query the database with.
            n_results (int): The number of results to return.

        Returns:
            dict: The most similar entries in the database.
        """
        return self.chat_db.query(query_texts=[query_text], n_results=n_results)

Now we can initialize our database and add some entries.

In [130]:
chat_db = ChatDB("chat_db", "text-embedding-3-small")

In [131]:
chat_db.add_conversation_to_db(
    "Hello, my name is Alice, how are you?",
    "Nice to meet you Alice, I am Bob. I am fine, thank you for asking. How can I help you today?",
)
chat_db.add_conversation_to_db(
    "I am looking for a restaurant in the area.",
    "Great! What type of cuisine are you in the mood for?",
)

chat_db.add_conversation_to_db(
    "I am looking for some Italian food.",
    "There are many good Italian restaurants in the area. What is your budget?",
)

In [132]:
entries = chat_db.get_all_entries()
for entry in entries["documents"]:
    print(entry)
    print("-"*100)

User: Hello, my name is Alice, how are you?
AI: Nice to meet you Alice, I am Bob. I am fine, thank you for asking. How can I help you today?
----------------------------------------------------------------------------------------------------
User: I am looking for a restaurant in the area.
AI: Great! What type of cuisine are you in the mood for?
----------------------------------------------------------------------------------------------------
User: I am looking for some Italian food.
AI: There are many good Italian restaurants in the area. What is your budget?
----------------------------------------------------------------------------------------------------
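
Each entry also carries the separate user and AI messages in its metadata. As a minimal sketch, the helper below turns a `get()`-style result into a list of turns. The name `entries_to_turns` is hypothetical, and `sample_entries` is a hand-built stand-in that assumes the result dict has parallel `ids`, `documents`, and `metadatas` lists, so the example runs without a database or an API key.

```python
def entries_to_turns(entries: dict) -> list[dict]:
    """Pair each entry id with the user and AI messages stored in its metadata."""
    return [
        {"id": entry_id, "user": meta["user_message"], "ai": meta["ai_message"]}
        for entry_id, meta in zip(entries["ids"], entries["metadatas"])
    ]


# Hand-built stand-in (an assumption) for what chat_db.get_all_entries() returns.
sample_entries = {
    "ids": ["0", "1"],
    "documents": [
        "User: I am looking for a restaurant in the area.\nAI: Great! What type of cuisine are you in the mood for?",
        "User: I am looking for some Italian food.\nAI: There are many good Italian restaurants in the area. What is your budget?",
    ],
    "metadatas": [
        {
            "user_message": "I am looking for a restaurant in the area.",
            "ai_message": "Great! What type of cuisine are you in the mood for?",
        },
        {
            "user_message": "I am looking for some Italian food.",
            "ai_message": "There are many good Italian restaurants in the area. What is your budget?",
        },
    ],
}

turns = entries_to_turns(sample_entries)
for turn in turns:
    print(turn["id"], "|", turn["user"])
```

With a live database, the same helper could be called as `entries_to_turns(chat_db.get_all_entries())`.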


## Querying the database

Now we can try querying the database.

In [133]:
results = chat_db.query_db("Food", n_results=3)
pprint(results, expand_all=True)

Notice that the results include the cosine distance for each entry. The closer the distance is to 0, the more similar the entry is to the query.
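
To make the distance concrete, here is a minimal sketch of cosine distance in plain Python, assuming it is defined as one minus the cosine similarity of two embedding vectors. The function name `cosine_distance` and the toy two-dimensional vectors are illustrative only; real embeddings have many more dimensions, and ChromaDB computes the distance internally.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy vectors (assumptions for illustration, not real embeddings).
same_direction = cosine_distance([1.0, 2.0], [2.0, 4.0])  # parallel vectors
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])      # perpendicular vectors
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])       # opposite vectors
print(same_direction, orthogonal, opposite)
```

The distance ranges from about 0 for vectors pointing the same way to 2 for vectors pointing in opposite directions, which is why smaller values indicate a closer match.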

In [135]:

for i, entry in enumerate(results["documents"][0]):
    print(entry)
    print(f"score: {results['distances'][0][i]}")
    print("-"*100)

User: I am looking for a restaurant in the area.
AI: Great! What type of cuisine are you in the mood for?
score: 0.7267490239862444
----------------------------------------------------------------------------------------------------
User: I am looking for some Italian food.
AI: There are many good Italian restaurants in the area. What is your budget?
score: 0.757357007227763
----------------------------------------------------------------------------------------------------
User: Hello, my name is Alice, how are you?
AI: Nice to meet you Alice, I am Bob. I am fine, thank you for asking. How can I help you today?
score: 0.8727205850443006
----------------------------------------------------------------------------------------------------
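
Since every result comes with a distance, one option is to drop weak matches before using them. The sketch below assumes the nested result shape used in the loop above (`results["documents"][0]` and `results["distances"][0]`). The helper name `filter_by_distance`, the hand-built `sample_results` with rounded distances, and the `0.8` cutoff are all illustrative assumptions, not recommended values.

```python
def filter_by_distance(results: dict, max_distance: float) -> list[tuple[str, float]]:
    """Keep (document, distance) pairs whose distance is at most max_distance."""
    documents = results["documents"][0]  # results for the first (only) query text
    distances = results["distances"][0]
    return [(doc, dist) for doc, dist in zip(documents, distances) if dist <= max_distance]


# Hand-built stand-in for the query result above, with distances rounded.
sample_results = {
    "documents": [[
        "User: I am looking for a restaurant in the area.\nAI: Great! What type of cuisine are you in the mood for?",
        "User: I am looking for some Italian food.\nAI: There are many good Italian restaurants in the area. What is your budget?",
        "User: Hello, my name is Alice, how are you?\nAI: Nice to meet you Alice, I am Bob. I am fine, thank you for asking. How can I help you today?",
    ]],
    "distances": [[0.727, 0.757, 0.873]],
}

relevant = filter_by_distance(sample_results, max_distance=0.8)
for doc, dist in relevant:
    # Print the distance and only the user line of each kept entry.
    print(f"{dist:.3f}  {doc.splitlines()[0]}")
```

With this cutoff the greeting is dropped and the two restaurant-related entries are kept. A suitable threshold depends on the embedding model and the data, so it is worth checking against real queries.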


Now we can clear the entries.

In [136]:
chat_db.clear_db()
entries = chat_db.get_all_entries()
for entry in entries["documents"]:
    print(entry)
    print("-"*100)

As expected, the database is now empty, so nothing is printed.

## Integration with a chat model

Coming soon...