# Embeddings and Vector Databases with Pinecone

**Embeddings** are numeric vector representations of text. Texts with similar meaning map to nearby vectors, so the distance between two vectors serves as a measure of the semantic similarity between the texts.
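
To make "distance as similarity" concrete, here is a minimal sketch using cosine similarity, one common choice of measure. The three-dimensional vectors are hand-made for illustration and are not output from any embedding model; real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy "embeddings" (illustrative values only)
cars = [0.9, 0.1, 0.0]
trucks = [0.8, 0.2, 0.1]
poetry = [0.0, 0.2, 0.9]

sim_cars_trucks = cosine_similarity(cars, trucks)
sim_cars_poetry = cosine_similarity(cars, poetry)
print(sim_cars_trucks > sim_cars_poetry)
```

The related pair (`cars`, `trucks`) scores higher than the unrelated pair (`cars`, `poetry`), which is the property a vector database relies on when ranking results.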

**Vector Databases** store vectors together with their associated metadata, and are used to store and retrieve embeddings. In Pinecone, vectors live in an index, which is roughly comparable to a table in a relational database. An index can be further partitioned into namespaces, and each upsert or query targets one namespace.

[Pinecone](https://www.pinecone.io/) is a hosted vector database service for storing and retrieving embeddings, which lets you scale your vector database as needed.

There are two main ways to use Pinecone:

1. **Store an embedding** - Store an embedding in a vector database.
    - Embed the text you want to store.
    - Create a document with the embedding and metadata.
    - Store the document in a vector database.
2. **Query a vector database** - Query a vector database for the most similar embeddings to a given query.
    - Embed the query.
    - Query the vector database with the embedded query.
    - Retrieve the most similar embeddings to the query.
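
The two workflows above can be sketched without any external service. `ToyVectorStore` below is a hypothetical in-memory stand-in written for this illustration; it is not Pinecone's API, and it scores every record with a brute-force cosine similarity, which a real vector database would avoid.

```python
import math

class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector index (not Pinecone's API)."""

    def __init__(self):
        self._records = {}  # id -> (values, metadata)

    def upsert(self, record_id, values, metadata):
        # Insert a new record, or overwrite an existing one with the same id
        self._records[record_id] = (values, metadata)

    def query(self, vector, top_k):
        # Score every stored vector against the query, highest similarity first
        scored = [
            (self._cosine(vector, values), record_id, metadata)
            for record_id, (values, metadata) in self._records.items()
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

# 1. Store: toy two-dimensional vectors stand in for real embeddings
store = ToyVectorStore()
store.upsert("a", [1.0, 0.0], {"payload": "I like cars."})
store.upsert("b", [0.0, 1.0], {"payload": "I like tea."})

# 2. Query: ask for the single closest record to a query vector
results = store.query([0.9, 0.1], top_k=1)
print(results[0][2]["payload"])
```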

### Install Pinecone and OpenAI

In [None]:
!pip install pinecone openai python-dotenv

### Import Libraries

In [None]:
import os
import uuid
from datetime import datetime, timezone
from pinecone import Pinecone
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

### Define Environment variables

In [5]:
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")  # Pinecone API key
PINECONE_INDEX_NAME = os.getenv("PINECONE_INDEX_NAME")  # Name of the vector database index
PINECONE_NAMESPACE = os.getenv("PINECONE_NAMESPACE")  # Namespace in your index on Pinecone.io
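
`os.getenv` returns `None` when a variable is not set, so a missing key only surfaces later as a less obvious error. One possible guard is a small helper that fails early. `require_env` is a hypothetical name introduced here, not part of any library.

```python
import os

def require_env(name):
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

It could be used as `PINECONE_API_KEY = require_env("PINECONE_API_KEY")`. Whether the namespace should also be required depends on your setup, since an index can be used without a named namespace.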

### Initialize Pinecone and OpenAI

In [6]:
# Initialize Pinecone for vector database
pc = Pinecone(api_key=PINECONE_API_KEY)
# Initialize the vector database index
index = pc.Index(PINECONE_INDEX_NAME)
# Initialize OpenAI for embeddings (reads OPENAI_API_KEY from the environment)
client = OpenAI()

## 1. Store an embedding
-----------

In [7]:
string_to_store = "I like cars."

### Embed the string

In [8]:
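# OpenAI embeddings
def get_embeddings(string_to_embed):
    """Return the embedding vector (a list of floats) for a single string."""
    response = client.embeddings.create(
        input=string_to_embed,
        model="text-embedding-ada-002"
    )
    # One input string yields one item in response.data
    return response.data[0].embedding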
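
When there are several strings to store, the embeddings endpoint also accepts a list of inputs in one request. The sketch below is one way to wrap that. `get_embeddings_batch` is a hypothetical helper, and it takes the client as a parameter so it can be tried with a stub instead of a live API call.

```python
def get_embeddings_batch(client, strings_to_embed, model="text-embedding-ada-002"):
    """Embed several strings in one request and return vectors in input order."""
    response = client.embeddings.create(input=strings_to_embed, model=model)
    # Each returned item carries the index of the input it belongs to
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]
```

With the `client` created earlier, a call might look like `vectors = get_embeddings_batch(client, ["I like cars.", "I like tea."])`.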

In [9]:
vector = get_embeddings(string_to_store)

In [None]:
print(f"Vector representation of {string_to_store}: \n", vector)

### Define the vector metadata to store in the vector database

In [11]:
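user_id = "1234"
# Timestamp of when this memory is stored (UTC)
current_time = datetime.now(tz=timezone.utc)
# Build a unique path for this memory from a template
path = "user/{user_id}/recall/{event_id}"
path = path.format(
    user_id=user_id,
    event_id=str(uuid.uuid4()),
)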

### Build the vector document to be stored

In [12]:
# Build document dictionary
documents = [
    {
        "id": str(uuid.uuid4()),
        "values": vector,
        "metadata": {
            "payload": string_to_store,
            "path": path,
            "timestamp": str(current_time),
            "type": "recall", # Define the type of document, i.e., recall memory
            "user_id": user_id,
        },
    }
]
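
Note that the timestamp is stored as a string rather than as a `datetime` object. Pinecone metadata values are limited to strings, numbers, booleans, and lists of strings, so other types need converting first. The check below is a minimal sketch of that rule. `find_unsupported_metadata` is a hypothetical helper, not a Pinecone function.

```python
from datetime import datetime, timezone

def find_unsupported_metadata(metadata):
    """Return keys whose values are not str, int, float, bool, or a list of str."""
    bad = []
    for key, value in metadata.items():
        if isinstance(value, (str, int, float, bool)):
            continue
        if isinstance(value, list) and all(isinstance(v, str) for v in value):
            continue
        bad.append(key)
    return bad

# Illustrative metadata with a raw datetime left in by mistake
sample_metadata = {
    "payload": "I like cars.",
    "timestamp": datetime.now(tz=timezone.utc),
    "type": "recall",
}
unsupported = find_unsupported_metadata(sample_metadata)
print(unsupported)
```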


### Store the vector document in the vector database

In [None]:
index.upsert(
    vectors=documents,
    namespace=PINECONE_NAMESPACE
)
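
With many documents, sending them in smaller batches keeps each request to a manageable size. The helpers below are a sketch of that idea. `chunked` and `upsert_in_batches` are hypothetical names, and the default `batch_size` of 100 is an assumption rather than a documented limit, so check the current Pinecone limits for your plan.

```python
def chunked(items, batch_size):
    """Yield successive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def upsert_in_batches(index, documents, namespace, batch_size=100):
    """Upsert documents batch by batch and return the number of batches sent."""
    batches = 0
    for batch in chunked(documents, batch_size):
        # Same call as above, applied to one slice at a time
        index.upsert(vectors=batch, namespace=namespace)
        batches += 1
    return batches
```

A call might look like `upsert_in_batches(index, documents, PINECONE_NAMESPACE)`.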

## 2. Query a vector database
-----------

In [30]:
query_string = "What do I like?"
user_id = "1234"
top_k = 10 # This is the number of most similar embeddings to return

### Embed the query

In [31]:
vector = get_embeddings(query_string)

### Query the vector database for the top_k most similar embeddings, with metadata filters

In [32]:
response = index.query(
    vector=vector,
    filter={
        "user_id": {"$eq": user_id},
        "type": {"$eq": "recall"},
    },
    namespace=PINECONE_NAMESPACE,
    include_metadata=True,
    top_k=top_k,
)

In [None]:
response

### Build the memories list

In [None]:
memories = []
# Collect the stored text payload from each match, if any were returned
if matches := response.get("matches"):
    memories = [m["metadata"]["payload"] for m in matches]
memories
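
A query always returns up to `top_k` matches, even when some of them are only weakly related to the query. Each match also carries a similarity `score`, so one option is to drop low-scoring matches. `filter_memories` and the `0.8` threshold below are assumptions for illustration, since a sensible cutoff depends on the index's similarity metric and on your data. The sample matches are made up.

```python
def filter_memories(matches, min_score):
    """Keep payloads whose similarity score is at least min_score."""
    return [
        m["metadata"]["payload"]
        for m in matches
        if m["score"] >= min_score
    ]

# Illustrative matches shaped like the entries read above (values are made up)
sample_matches = [
    {"id": "a", "score": 0.91, "metadata": {"payload": "I like cars."}},
    {"id": "b", "score": 0.42, "metadata": {"payload": "The meeting is at noon."}},
]
relevant = filter_memories(sample_matches, min_score=0.8)
print(relevant)
```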