# Embeddings and Vector Databases with Pinecone

**Embeddings** are numerical vector representations of text. They place text in a vector space where semantically similar texts end up close together, so the distance (or angle) between two vectors reflects how similar their meanings are.
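To make the notion of similarity between vectors concrete, the sketch below computes cosine similarity, one common measure of how close two embeddings are. The three-dimensional vectors are made-up toy values for illustration only; real embeddings such as those from `text-embedding-ada-002` have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (hypothetical values, not real model output)
museums = [0.9, 0.1, 0.2]
galleries = [0.8, 0.2, 0.25]
potatoes = [0.1, 0.9, 0.3]

print(cosine_similarity(museums, galleries))  # higher: vectors point in a similar direction
print(cosine_similarity(museums, potatoes))   # lower: vectors point in different directions
```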

**Vector databases** store vectors together with their associated metadata and are used to store and retrieve embeddings efficiently. In Pinecone, records are organized into indexes (roughly similar to tables in a relational database), and each index can be further partitioned into namespaces.
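As a mental model only, an index can be pictured as a mapping from namespace to records, where each record pairs an ID with its vector values and metadata. The toy class below is a hypothetical, in-memory illustration of that layout; it is not how Pinecone is implemented, and none of its names belong to the Pinecone API. The method name `upsert` mirrors the Pinecone call used later: writing a record whose ID already exists in a namespace overwrites it.

```python
from collections import defaultdict

class ToyIndex:
    """Hypothetical in-memory stand-in for an index split into namespaces."""

    def __init__(self):
        # namespace -> {record_id -> (values, metadata)}
        self._namespaces = defaultdict(dict)

    def upsert(self, records, namespace=""):
        # Insert new records, or overwrite existing ones that share an ID
        for rec in records:
            self._namespaces[namespace][rec["id"]] = (rec["values"], rec.get("metadata", {}))
        return len(records)

    def count(self, namespace=""):
        # .get avoids creating empty namespaces as a side effect
        return len(self._namespaces.get(namespace, {}))

idx = ToyIndex()
idx.upsert([{"id": "a", "values": [0.1, 0.2]}], namespace="user-1234")
idx.upsert([{"id": "a", "values": [0.3, 0.4]}], namespace="user-1234")  # overwrites "a"
idx.upsert([{"id": "a", "values": [0.5, 0.6]}], namespace="user-5678")  # separate namespace
print(idx.count("user-1234"), idx.count("user-5678"))  # 1 1
```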

[Pinecone](https://www.pinecone.io/) is a hosted vector database service for storing and retrieving embeddings. As a managed service, it lets you scale your vector database as needed without running the infrastructure yourself.

This notebook walks through the two core operations for working with Pinecone:

1. **Store an embedding** in a vector database:
    - Embed the text you want to store.
    - Build a record (document) containing the embedding and its metadata.
    - Upsert the record into the vector database.
2. **Query a vector database** for the embeddings most similar to a given query:
    - Embed the query.
    - Query the vector database with the embedded query.
    - Retrieve the most similar embeddings along with their metadata.
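The two flows above can be sketched end to end without any external service. This hypothetical example swaps in a crude letter-count "embedding" (`toy_embed`) purely so it runs offline, and answers queries with an exact brute-force top-k search over a Python list; a real setup would call an embedding model and let Pinecone perform the similarity search.

```python
import heapq
import math
import string

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: unit-length letter counts."""
    counts = [text.lower().count(ch) for ch in string.ascii_lowercase]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# 1. Store: embed each text and keep the vector alongside its metadata
store = []
for i, text in enumerate(["I like museums.", "I like cars.", "Stock prices fell today."]):
    store.append({"id": str(i), "values": toy_embed(text), "metadata": {"payload": text}})

# 2. Query: embed the query and return the top_k most similar records
def query(text, top_k=2):
    q = toy_embed(text)
    # Vectors are unit length, so the dot product equals cosine similarity
    scored = ((dot(q, rec["values"]), rec["metadata"]["payload"]) for rec in store)
    return heapq.nlargest(top_k, scored)

for score, payload in query("museum visits"):
    print(f"{score:.3f}  {payload}")
```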

### Install Pinecone and OpenAI

In [None]:
!pip install pinecone openai python-dotenv

### Import Libraries

In [1]:
import os
import uuid
from datetime import datetime, timezone
from pinecone import Pinecone
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

True

### Define Environment Variables

In [2]:
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")        # Pinecone API key
PINECONE_INDEX_NAME = os.getenv("PINECONE_INDEX_NAME")  # Name of the vector database index
PINECONE_NAMESPACE = os.getenv("PINECONE_NAMESPACE")    # Namespace within that index on Pinecone.io
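`os.getenv` returns `None` for a missing variable, so a typo in `.env` only surfaces later as a confusing client error. A small fail-fast check, sketched below as a hypothetical helper (`require_env` is not part of any library), makes the problem obvious up front. `PINECONE_NAMESPACE` is left out of the usage example because an empty namespace is valid and selects Pinecone's default namespace.

```python
import os

def require_env(*names):
    """Return the values of the named environment variables; fail fast if any are unset or empty."""
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return tuple(os.environ[name] for name in names)

# Example usage (assumes the variables are defined in your environment or .env file):
# PINECONE_API_KEY, PINECONE_INDEX_NAME = require_env("PINECONE_API_KEY", "PINECONE_INDEX_NAME")
```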

### Initialize Pinecone and OpenAI

In [3]:
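# Initialize the Pinecone client
pc = Pinecone(api_key=PINECONE_API_KEY)
# Connect to the vector database index (the index must already exist in your Pinecone project)
index = pc.Index(PINECONE_INDEX_NAME)
# Initialize the OpenAI client for embeddings (reads OPENAI_API_KEY from the environment)
client = OpenAI()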

## 1. Store an embedding
-----------

In [4]:
string_to_store = "I like museums."

### Embed the string

In [5]:
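# Generate an OpenAI embedding for a single string
def get_embeddings(string_to_embed):
    """Return the embedding vector (a list of floats) for `string_to_embed`."""
    response = client.embeddings.create(
        input=string_to_embed,
        model="text-embedding-ada-002",
    )
    # A single input string yields a single embedding in response.data
    return response.data[0].embedding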
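The OpenAI embeddings endpoint also accepts a list of strings as `input`, returning one item per string in `response.data`, each with an `index` pointing back to its input position. A batched variant could look like the sketch below. Passing the client in as a parameter is an assumption made here so the function can be exercised with a stand-in object; it sorts by `index` defensively rather than relying on response order.

```python
def get_embeddings_batch(client, strings, model="text-embedding-ada-002"):
    """Embed several strings in one API call; return the vectors in input order."""
    response = client.embeddings.create(input=strings, model=model)
    # Each item in response.data has .index (input position) and .embedding (list of floats)
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]

# Usage with the OpenAI client created earlier:
# vectors = get_embeddings_batch(client, ["I like museums.", "I like cars."])
```

Batching reduces the number of round trips when many texts need embedding at once.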

In [6]:
vector = get_embeddings(string_to_store)

In [7]:
print(f"Vector representation of {string_to_store}: \n", vector)

Vector representation of I like museums.: 
 [-0.02799811214208603, -0.009131494909524918, 0.0014231839450076222, -0.017543772235512733, -0.02284800074994564, 0.007545363157987595, -0.013382583856582642, -0.009715858846902847, 0.020215151831507683, -0.002706698374822736, -0.0010113996686413884, 0.010171791538596153, 0.027047717943787575, -0.016323670744895935, -0.01905926503241062, -0.010467184707522392, 0.027330268174409866, 0.0021496256813406944, 0.011359784752130508, -0.03257028013467789, 0.004623541608452797, 0.003307116450741887, 0.00547119090333581, -0.040815599262714386, -0.002648903988301754, -0.007500411942601204, 0.012534935027360916, -0.013523858971893787, -0.014564155600965023, -0.005602833349257708, 0.04497678577899933, -0.0007906577084213495, -0.005458347499370575, -0.03411146253347397, 0.0062514133751392365, -0.01288170088082552, -0.008303110487759113, -0.0011518718674778938, -0.011436844244599342, 0.004424472339451313, -0.008495757356286049, 0.00919571053236723, 0.001648
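The printed list is truncated, so its length is not visible here. A quick way to inspect an embedding is to check its dimension and Euclidean (L2) norm, as in the hypothetical helper below. For reference, OpenAI documents `text-embedding-ada-002` as producing 1536-dimensional vectors normalized to length 1, so you would expect a dimension of 1536 and a norm very close to 1.0; verify those figures against the current OpenAI documentation.

```python
import math

def describe_vector(vec):
    """Return (dimension, L2 norm) for a list of floats."""
    return len(vec), math.sqrt(sum(x * x for x in vec))

# Toy input for illustration; in the notebook, call describe_vector(vector)
dim, norm = describe_vector([0.6, 0.8])
print(dim, round(norm, 6))  # 2 1.0
```

Because unit-length vectors make cosine similarity equal to the dot product, a norm near 1.0 is a useful sanity check.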

### Define the vector metadata to store in the vector database

In [8]:
user_id = "1234"
path = "user/{user_id}/recall/{event_id}"
current_time = datetime.now(tz=timezone.utc)
path = path.format(
    user_id=user_id,
    event_id=str(uuid.uuid4()),
)
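The `path` string encodes the user and event in a fixed `user/{user_id}/recall/{event_id}` layout. If you later need those parts back (for example, when reading metadata from query results), a matching parser keeps both directions in sync. Both helpers below are hypothetical illustrations built around that template; Pinecone attaches no special meaning to the `path` metadata field.

```python
import uuid

PATH_TEMPLATE = "user/{user_id}/recall/{event_id}"

def build_recall_path(user_id, event_id=None):
    """Format the path, generating a random event ID when none is given."""
    return PATH_TEMPLATE.format(user_id=user_id, event_id=event_id or str(uuid.uuid4()))

def parse_recall_path(path):
    """Split a path built by build_recall_path back into (user_id, event_id)."""
    # Unpacking raises ValueError if the path does not have exactly four segments
    prefix, user_id, kind, event_id = path.split("/")
    if prefix != "user" or kind != "recall":
        raise ValueError(f"Unexpected path format: {path!r}")
    return user_id, event_id

p = build_recall_path("1234", "evt-1")
print(p)                     # user/1234/recall/evt-1
print(parse_recall_path(p))  # ('1234', 'evt-1')
```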

### Build the vector document to be stored

In [9]:
# Build document dictionary
documents = [
    {
        "id": str(uuid.uuid4()),
        "values": vector,
        "metadata": {
            "payload": string_to_store,
            "path": path,
            "timestamp": str(current_time),
            "type": "recall",  # Document type, i.e., a recall memory
            "user_id": user_id,
        },
    }
]
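Pinecone restricts metadata values to a few types: strings, numbers, booleans, and lists of strings. That is also why `current_time` is stored with `str(...)` rather than as a `datetime` object. A lightweight pre-upsert check, sketched below as a hypothetical helper (not part of the Pinecone SDK), can catch type mistakes before the request is sent; it checks only the rules just listed, not size limits.

```python
def validate_record(record):
    """Raise ValueError if a record dict does not match the basic expected shape."""
    if not isinstance(record.get("id"), str) or not record["id"]:
        raise ValueError("'id' must be a non-empty string")
    values = record.get("values")
    if not values or not all(isinstance(v, (int, float)) for v in values):
        raise ValueError("'values' must be a non-empty list of numbers")
    for key, value in record.get("metadata", {}).items():
        is_str_list = isinstance(value, list) and all(isinstance(v, str) for v in value)
        # bool is a subclass of int, so True/False are accepted by the int check
        if not (isinstance(value, (str, int, float)) or is_str_list):
            raise ValueError(f"metadata field {key!r} has unsupported type {type(value).__name__}")

# Usage with the notebook's documents:
# for doc in documents:
#     validate_record(doc)
```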

### Store the vector document in the vector database

In [10]:
index.upsert(
    vectors=documents,
    namespace=PINECONE_NAMESPACE
)

{'upserted_count': 1}
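Upserting one record per call is fine for a demo, but Pinecone's documentation recommends upserting larger datasets in batches and documents per-request limits (check the current values there). The sketch below only shows the batching pattern: `batched` is a hypothetical helper, and the batch size of 100 is an arbitrary illustrative choice.

```python
from itertools import islice

def batched(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch

# Usage with the notebook's index (not executed here):
# for batch in batched(documents, 100):
#     index.upsert(vectors=batch, namespace=PINECONE_NAMESPACE)

print([len(b) for b in batched(range(250), 100)])  # [100, 100, 50]
```

On Python 3.12 and later, `itertools.batched` provides the same pattern, yielding tuples instead of lists.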

## 2. Query a vector database
-----------

In [11]:
query_string = "What do I like?"
user_id = "1234"
top_k = 10 # This is the number of most similar embeddings to return

### Embed the query

In [12]:
vector = get_embeddings(query_string)

### Query the vector database for the top_k most similar embeddings, with metadata filters

In [13]:
response = index.query(
    vector=vector,
    filter={
        "user_id": {"$eq": user_id},
        "type": {"$eq": "recall"},
    },
    namespace=PINECONE_NAMESPACE,
    include_metadata=True,
    top_k=top_k,
)
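The `filter` argument uses MongoDB-style operators, and a match must satisfy every field condition: here, `user_id` equal to `"1234"` and `type` equal to `"recall"`. To make those semantics concrete, the hypothetical function below evaluates a small subset of the filter language (`$eq`, `$ne`, `$in`, in the explicit operator form used in this notebook) locally against a metadata dict. It is only an illustration; Pinecone applies filters server-side and supports more operators than shown.

```python
SUPPORTED_OPS = ("$eq", "$ne", "$in")

def matches_filter(metadata, filt):
    """Return True if metadata satisfies every field condition in filt (implicit AND)."""
    for field, condition in filt.items():
        value = metadata.get(field)
        for op, target in condition.items():
            if op not in SUPPORTED_OPS:
                raise ValueError(f"operator {op!r} not supported in this sketch")
            if op == "$eq" and value != target:
                return False
            if op == "$ne" and value == target:
                return False
            if op == "$in" and value not in target:
                return False
    return True

flt = {"user_id": {"$eq": "1234"}, "type": {"$eq": "recall"}}
print(matches_filter({"user_id": "1234", "type": "recall"}, flt))  # True
print(matches_filter({"user_id": "5678", "type": "recall"}, flt))  # False
```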

In [14]:
response

{'matches': [{'id': '7ade8ba3-e703-4b4a-8925-b323329ef8b3',
              'metadata': {'path': 'user/1234/recall/5dc29c57-5fde-4aec-9a5a-6eeade74edbc',
                           'payload': 'I like cars.',
                           'timestamp': '2025-02-18 08:19:28.966392+00:00',
                           'type': 'recall',
                           'user_id': '1234'},
              'score': 0.844723523,
              'values': []},
             {'id': 'b9dfbd06-bbc8-4dae-82e6-69d58e40bcab',
              'metadata': {'path': 'user/1234/recall/07a8dec0-53ff-41dc-bd52-b710c856d0a7',
                           'payload': 'I like museums.',
                           'timestamp': '2025-04-24 12:32:27.464081+00:00',
                           'type': 'recall',
                           'user_id': '1234'},
              'score': 0.826801479,
              'values': []},
             {'id': 'user/1234/recall/a0f8e2ee-ab25-4409-a6d7-f54c75042b38',
              'metadata': {'path': 'user/1

### Build the memories list

In [15]:
# Collect the stored text (payload) from each match
memories = []
if matches := response.get("matches"):
    memories = [m["metadata"]["payload"] for m in matches]
memories

['I like cars.',
 'I like museums.',
 "I don't have any information about your preferences yet. If you share some of your interests, hobbies, or favorite topics, I can save that information and help provide more personalized responses in the future!",
 "I don't have any information about your preferences yet. If you share some of your interests, hobbies, or favorite topics, I can save that information and help provide more personalized responses in the future!",
 'User likes pineapple.',
 'User likes avocados.',
 'User likes popcorn.',
 'User likes potatoes.',
 'You like potatoes! If there are any specific dishes or ways you enjoy potatoes, feel free to share, and I can remember that for future interactions.',
 'User loves mangoes.']
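The list above includes a verbatim duplicate. A small post-processing step, sketched below, removes duplicate payloads while keeping the highest-ranked occurrence, and can also drop weak matches below a minimum `score`. The helper name and the 0.8 threshold are hypothetical, and a sensible cutoff depends on your embedding model and data. The threshold test assumes a similarity metric such as cosine, where a higher score means more similar; since matches arrive ranked from most to least similar, keeping the first occurrence keeps the best one.

```python
def clean_memories(matches, min_score=0.8):
    """Return unique payloads from matches scoring at least min_score, in rank order."""
    seen = set()
    memories = []
    for m in matches:
        # Assumes a similarity metric (e.g. cosine) where a higher score means more similar
        if m["score"] < min_score:
            continue
        payload = m["metadata"]["payload"]
        if payload not in seen:
            seen.add(payload)
            memories.append(payload)
    return memories

# Hypothetical matches shaped like the query response above (illustrative values)
sample = [
    {"score": 0.84, "metadata": {"payload": "I like cars."}},
    {"score": 0.83, "metadata": {"payload": "I like museums."}},
    {"score": 0.81, "metadata": {"payload": "I like museums."}},
    {"score": 0.62, "metadata": {"payload": "User likes popcorn."}},
]
print(clean_memories(sample))  # ['I like cars.', 'I like museums.']
```

With the notebook's response, this could be called as `clean_memories(response.get("matches") or [])`.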