Vectorstore-backed Memory

The idea is to store the user's personality preferences in a data structure that's easy to look up. With LLMs, both the stored data and the queries are unstructured text, so a vector database is a natural fit. It's not the only option: you can always convert unstructured queries into structured ones and query the data in SQL or other ways.
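
To make the structured alternative concrete, here's a minimal sketch using Python's built-in sqlite3. The table name, columns, and sample rows are made up for illustration and aren't part of this workshop's pipeline.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE preferences (user_id TEXT, kind TEXT, item TEXT)")
conn.executemany(
    "INSERT INTO preferences VALUES (?, ?, ?)",
    [
        ("u1", "likes", "sushi"),
        ("u1", "dislikes", "cilantro"),
        ("u1", "likes", "hiking"),
    ],
)


def get_preferences(user_id, kind):
    """Return the stored items of one kind ("likes" or "dislikes") for a user."""
    rows = conn.execute(
        "SELECT item FROM preferences WHERE user_id = ? AND kind = ? ORDER BY item",
        (user_id, kind),
    ).fetchall()
    return [row[0] for row in rows]


likes = get_preferences("u1", "likes")
print(likes)
```

This only works for exact lookups (user and kind). Finding preferences that are merely similar to a free-form question is where the vector search below comes in.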

We'll set up Pinecone to store the user's personality bank, and only query the relevant parts when the assistant looks at the user's query.

First, we set up Pinecone as our vector database.

Step one is to install Pinecone.
https://github.com/trancethehuman/ai-workshop-code/blob/main/Long_term_memory_%26_personalized_LLM_responses.ipynb

In [None]:
pip install pinecone-client --quiet

In [None]:
import getpass
PINECONE_API_KEY = getpass.getpass('Enter your Pinecone API key: ')

Set up the Pinecone SDK client. We're using Pinecone's new serverless architecture (free lol).

In [None]:
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=PINECONE_API_KEY)

Create an index. An index in a vector database is a way to organize embeddings so they can be searched efficiently. Popular indexing algorithms include HNSW and IVF. I use HNSW with Supabase (which is just a Postgres table with the pgvector extension underneath).
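
To get a feel for what an index buys you, here's a toy sketch of the IVF idea: group vectors around a few centroids, then search only the group closest to the query instead of scanning everything. The 2-D vectors and the centroids are made up (a real IVF index learns its centroids from the data), and this isn't necessarily how Pinecone organizes things internally.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical 2-D "embeddings"; real ones have hundreds of dimensions.
vectors = {
    "pizza": [0.9, 0.1],
    "pasta": [0.8, 0.3],
    "hiking": [0.1, 0.9],
    "camping": [0.2, 0.8],
}

# Made-up centroids standing in for the clusters an IVF index would learn.
centroids = {"food": [1.0, 0.0], "outdoors": [0.0, 1.0]}


def nearest_centroid(vec):
    return max(centroids, key=lambda name: cosine(vec, centroids[name]))


# Build the "inverted lists": each centroid keeps the keys of its closest vectors.
buckets = {name: [] for name in centroids}
for key, vec in vectors.items():
    buckets[nearest_centroid(vec)].append(key)


def ivf_search(query, top_k=1):
    """Rank only the vectors in the query's nearest bucket."""
    candidates = buckets[nearest_centroid(query)]
    ranked = sorted(candidates, key=lambda k: cosine(query, vectors[k]), reverse=True)
    return ranked[:top_k]


print(buckets)
print(ivf_search([0.9, 0.15]))
```

The trade-off is that a match sitting in a neighboring bucket can be missed, which is why this family of indexes is called approximate.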

In [None]:
index_name = "user-preferences"

EMBEDDINGS_DIMENSIONS = 512

# Just checking to see if this index already exists
existing_indexes = pc.list_indexes().names()

# If the index already exists, delete it first, then create a new one (we're starting from scratch)
if index_name in existing_indexes:
  pc.delete_index(index_name)
  print("Deleted index.")

pc.create_index(
    name=index_name,
    dimension=EMBEDDINGS_DIMENSIONS,
    metric="cosine",
    spec=ServerlessSpec(
        cloud='aws',
        region='us-east-1'
    )
)
print("Created index.")

index = pc.Index(index_name)

A few things about this setup:

- "cosine" stands for cosine similarity, which scores two vectors by the angle between them, in however many dimensions the embeddings have (not just 3D). It's just one way to do vector search.
- dimension: how many numbers does the embedding model produce for each piece of text? Heavily fine-tuned models trained on high-quality data can get away with lower numbers (for example, OpenAI's recent text-embedding-3-large can throw away half its dimensions and still perform as well as weaker models that need more dimensions to represent the same concept). I went with 512 because that's a good number for the chosen embeddings model. Lower dimensions also save database storage and can potentially save lots of money.
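
Both bullets can be checked with a few lines of plain Python. This is a rough sketch: the vectors are tiny made-up examples, truncate-then-renormalize is my understanding of what shortening an embedding amounts to, and the storage math assumes 4-byte floats with no index overhead.

```python
import math


def cosine_similarity(a, b):
    """1.0 means same direction, 0.0 means unrelated (orthogonal)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def truncate_and_normalize(vec, dims):
    """Keep the first `dims` numbers and rescale the result to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]


BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as 32-bit floats


def raw_vector_bytes(num_vectors, dims):
    """Raw storage for the vector values only (no metadata, no index overhead)."""
    return num_vectors * dims * BYTES_PER_FLOAT32


same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
shortened = truncate_and_normalize([3.0, 4.0, 12.0], 2)

print(same_direction, orthogonal, shortened)
print(raw_vector_bytes(1_000_000, 512), raw_vector_bytes(1_000_000, 1536))
```

Under those assumptions, a million vectors at 512 dimensions take about a third of the raw space they would at 1536, the default size of text-embedding-3-small.
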
We also need to set up an embedding model so we can easily convert text to numbers for vector search.

In [None]:
pip install langchain-openai --quiet

In [None]:

from langchain_openai import OpenAIEmbeddings

embedding_client = OpenAIEmbeddings(api_key=OPENAI_API_KEY,
    model="text-embedding-3-small", dimensions=EMBEDDINGS_DIMENSIONS)

Now we update the chat pipeline to search the user's existing preferences in the vector index for things similar to the query. This helps the LLM make a better decision when recommending things.

We need to:
- Embed the latest user query. This lets us use the embedding to find similar data in the user's preferences.
- Find the existing preferences that are closest to the query.
- Let the LLM know that these are the preferences that are relevant to the latest query.
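
The three steps above can be sketched without any API calls. Everything here is a stand-in: toy_embed is a hypothetical keyword counter instead of a real embedding model, and a plain list plays the role of the vector index.

```python
import math

# Hypothetical vocabulary for the toy embedder; a real model needs no such list.
VOCAB = ["food", "eat", "sushi", "hike", "trail"]


def toy_embed(text):
    """Turn text into a vector by counting vocabulary words (illustration only)."""
    words = [w.strip(".,?!") for w in text.lower().split()]
    return [float(words.count(term)) for term in VOCAB]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # all-zero vectors match nothing


def find_relevant(query, preferences, top_k=1):
    # Step 1: embed the latest user query.
    query_vec = toy_embed(query)
    # Step 2: rank the stored preferences by similarity to the query.
    ranked = sorted(preferences, key=lambda p: cosine(query_vec, toy_embed(p)), reverse=True)
    return ranked[:top_k]


preferences = ["sushi and other japanese food", "a long hike on a mountain trail"]
relevant = find_relevant("Where should I eat sushi tonight?", preferences)

# Step 3: tell the LLM which preferences matter for this query.
context_message = {"role": "assistant", "content": f"The user really likes: {relevant}."}
print(context_message["content"])
```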

Here's a handy function to just print the streaming content to our terminal.

In [None]:
system_message = {"role": "system", "content": "You are a helpful personal assistant. Your main goal is to take into account what you know about the user and answer them in pirate speak."}
messages_history = [system_message]


def print_ai_answer(user_input: str):
    for chunk in get_ai_answer(user_input, messages_history):
        print(chunk, end="")

In [None]:
def extract_personality(user_input: str):
  # The JSON example sits in a single-quoted string so its double quotes survive as-is.
  entity_extraction_system_message = {
      "role": "system",
      "content": (
          "Looking at the user's message, you must extract their likes and dislikes "
          "and put them as strings into lists, and respond only in JSON in this format: "
          '{"likes": [], "dislikes": []}'
      ),
  }

  messages = [entity_extraction_system_message]
  messages.append({"role": "user", "content": user_input})

  response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        stream=False,
        response_format={"type": "json_object"}
    )

  return response.choices[0].message.content
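
The function above hands back a raw JSON string, and the model won't always stick to the requested shape. Here's a minimal sketch of a defensive parse. parse_personality is a hypothetical helper, not something the notebook defines, and the fallback rules are just one reasonable choice.

```python
import json


def parse_personality(raw):
    """Parse the model's reply into {"likes": [...], "dislikes": [...]}, never raising."""
    empty = {"likes": [], "dislikes": []}
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return empty  # not JSON at all (or None)
    if not isinstance(data, dict):
        return empty

    result = {}
    for key in ("likes", "dislikes"):
        value = data.get(key, [])
        if isinstance(value, str):
            value = [value]  # a single item returned as a bare string
        result[key] = [str(item) for item in value] if isinstance(value, list) else []
    return result


print(parse_personality('{"likes": ["sushi"], "dislikes": []}'))
print(parse_personality("sorry, I can't help with that"))
```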

In [None]:
import uuid
import json
from pprint import pprint

# Declare this here just to wipe the slate clean again.
extracted_personality = {"likes": [], "dislikes": []}

def print_ai_answer(user_input: str):
  # Clear the current messages history for fair testing :)
  messages_history = []
  print("Messages history cleared.")

  # Extract relevant personality info and update personality dictionary
  new_personalities = json.loads(extract_personality(user_input))
  print("Done extracting personality.")
  extracted_personality["likes"].extend(new_personalities["likes"])
  extracted_personality["dislikes"].extend(new_personalities["dislikes"])

  # Embed each new personality item, give it metadata so we can filter later, and collect it for upserting.
  # We loop over new_personalities only, so older items aren't embedded and stored again under fresh IDs.
  embeddings = []
  for dislike in new_personalities["dislikes"]:
    text_to_embed = f"The user dislikes {dislike}"
    current_embeddings = embedding_client.embed_query(text_to_embed)

    dislike_with_metadata = {
        "id": str(uuid.uuid4()), "values": current_embeddings, "metadata": {"type": "dislikes", "content": dislike}
    }
    embeddings.append(dislike_with_metadata)

  for like in new_personalities["likes"]:
    text_to_embed = f"The user likes {like}"
    current_embeddings = embedding_client.embed_query(text_to_embed)

    like_with_metadata = {
        "id": str(uuid.uuid4()), "values": current_embeddings, "metadata": {"type": "likes", "content": like}
    }
    embeddings.append(like_with_metadata)

  # Push all of our embeddings (likes and dislikes) to the vector database (Pinecone).
  # Skip the call when nothing was extracted, since there is nothing to upsert.
  if embeddings:
    index.upsert(vectors=embeddings)

  # Embed the user's question so we can compare to our embedded personality items
  user_query_embedded = embedding_client.embed_query(user_input)

  # Search for relevant personalities. Here, to make my life easier, we just look up things the person likes
  likes_filter={
        "type": {"$eq": "likes"}
    }

  found_likes = index.query(
      vector=user_query_embedded,
      filter=likes_filter,
      top_k=1, # we only want one piece of personality trait from our bank
      include_values=True,
      include_metadata=True
  )


  def get_content_out_of_pinecone_query_result():
    content_list = []
    # Loop through each match in the query result
    for match in found_likes.get('matches', []):
        # Get the 'metadata' dictionary from the match
        metadata = match.get('metadata', {})
        # Extract the 'content' from the 'metadata'
        if 'content' in metadata:
            content_list.append(metadata['content'])

    return content_list

  found_likes_formatted = get_content_out_of_pinecone_query_result()

  print("Found the following relevant things that the user liked in the past:")
  pprint(found_likes_formatted)

  # Insert our user's likes into the messages_history list
  messages_history.append({"role": "assistant", "content": f"The user really likes: {found_likes_formatted}. This is what you know about the user. Use it in your answer."})

  # Generate a final answer
  for chunk in get_ai_answer(user_input, messages_history):
      print(chunk, end="")
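
One thing to watch with the cell above: every vector gets a random uuid4 ID, so the same preference mentioned twice ends up stored twice. A possible tweak is to derive the ID from the content, since upserting an existing ID overwrites it instead of adding a new vector. Here's a sketch of that idea. preference_id and the namespace string are hypothetical, and a plain dict stands in for the index.

```python
import uuid

# Hypothetical namespace for this sketch; any fixed UUID works.
PREFERENCE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "user-preferences.example")


def preference_id(kind, content):
    """Same kind + same (normalized) content always gives the same ID."""
    normalized = content.strip().lower()
    return str(uuid.uuid5(PREFERENCE_NAMESPACE, f"{kind}:{normalized}"))


# A dict stands in for the vector index: writing to an existing ID overwrites it.
fake_index = {}
for kind, content in [("likes", "Sushi"), ("likes", "sushi "), ("dislikes", "sushi")]:
    fake_index[preference_id(kind, content)] = {"type": kind, "content": content.strip().lower()}

stored_count = len(fake_index)
print(stored_count)
```

The two spellings of the liked item collapse into one entry, while the dislike keeps its own ID because the kind is part of the key.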