# Embeddings Generator for OMDB Movies Dataset

This notebook uses Ollama to generate embeddings for the movies dataset and for a few sample phrases used in similarity searches.

**0. Prerequisites**

* Install Python and pip.
* Start a Postgres container with pgvector and load the movies dataset by following the instructions in [chapter8.md](../chapter8.md).
* Start an Ollama container and download the required embedding and large language models by following the instructions in [ai_samples/README.md](README.md).

**1. Install Required Modules**

In [None]:
pip install -q psycopg2-binary==2.9.9 langchain-ollama==0.2.3

**2. Initialize Ollama Embedding Model**

In [44]:
from langchain_ollama import OllamaEmbeddings

embedding_model = OllamaEmbeddings(model="mxbai-embed-large:335m")
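
Before writing vectors to the database, it can help to confirm that the model returns vectors of the size the table expects. The sketch below is a hypothetical helper, not part of the original notebook. It assumes that `mxbai-embed-large` produces 1024-dimensional vectors and that the `movie_embedding` column was declared with the same size; adjust `EXPECTED_DIMENSIONS` if your schema differs.

```python
from typing import Sequence

# Assumption: mxbai-embed-large returns 1024-dimensional vectors and the
# pgvector column was created with a matching size.
EXPECTED_DIMENSIONS = 1024


def check_embedding(embedding: Sequence[float],
                    expected: int = EXPECTED_DIMENSIONS) -> None:
    """Raise ValueError if the vector length differs from the expected size."""
    if len(embedding) != expected:
        raise ValueError(
            f"Expected {expected} dimensions, got {len(embedding)}"
        )


# Possible usage once Ollama is running (not executed here):
# check_embedding(embedding_model.embed_query("test phrase"))
```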

**3. Connect to Postgres**

In [47]:
import psycopg2

db_params = {
    "host": "localhost",
    "port": 5432,
    "dbname": "postgres",
    "user": "postgres",
    "password": "password"
}

conn = psycopg2.connect(**db_params)
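
The connection settings above are hardcoded for a local container. As a minimal sketch, and assuming you prefer not to keep the password in the notebook, the same dictionary could be built from the standard libpq environment variables (`PGHOST`, `PGPORT`, `PGDATABASE`, `PGUSER`, `PGPASSWORD`), falling back to the values used above. The function name `db_params_from_env` is hypothetical.

```python
import os
from typing import Mapping


def db_params_from_env(env: Mapping[str, str] = os.environ) -> dict:
    """Build psycopg2 connection parameters from libpq-style variables.

    Falls back to the local defaults used in this notebook.
    """
    return {
        "host": env.get("PGHOST", "localhost"),
        "port": int(env.get("PGPORT", "5432")),
        "dbname": env.get("PGDATABASE", "postgres"),
        "user": env.get("PGUSER", "postgres"),
        "password": env.get("PGPASSWORD", "password"),
    }


# Possible usage (not executed here):
# conn = psycopg2.connect(**db_params_from_env())
```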

**4. Generate Embeddings for Movies**

In [None]:
cursor = conn.cursor()

# Reset the movie_embedding column to NULL
cursor.execute("UPDATE omdb.movies SET movie_embedding = NULL")
conn.commit()

# Fetch all movies from the database
cursor.execute("SELECT id, name, description FROM omdb.movies")
movies = cursor.fetchall()

counter = 0

# Iterate over each movie and generate the embedding
for movie_id, name, description in movies:
    # Treat a NULL description as empty so the text "None" is not embedded
    combined_text = f"{name} {description or ''}".strip()
    embedding = embedding_model.embed_query(combined_text)

    # Update the database with the generated embedding
    cursor.execute(
        "UPDATE omdb.movies SET movie_embedding = %s WHERE id = %s",
        (embedding, movie_id)
    )

    counter += 1
    if counter % 100 == 0:
        print(f"Processed {counter} movies")
        conn.commit()    

print(f"Finished processing {counter} movies in total")

# Commit the remaining updates and close the cursor
# (the connection stays open for the next step)
conn.commit()
cursor.close()
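
The loop above sends one embedding request per movie. A possible variation, sketched here under assumptions, groups rows into batches and embeds each batch with one `embed_documents` call, which is part of the LangChain embeddings interface. The helpers `batched` and `embed_batch` are hypothetical names, and whether batching is actually faster depends on your Ollama setup.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def batched(rows: Iterable[Tuple], size: int) -> Iterator[List[Tuple]]:
    """Yield consecutive lists of at most `size` rows."""
    iterator = iter(rows)
    while batch := list(islice(iterator, size)):
        yield batch


def embed_batch(embedder, batch: List[Tuple]) -> List[Tuple]:
    """Return (embedding, movie_id) pairs for rows of (id, name, description)."""
    texts = [
        f"{name} {description or ''}".strip()
        for _, name, description in batch
    ]
    vectors = embedder.embed_documents(texts)
    return [
        (vector, movie_id)
        for vector, (movie_id, _, _) in zip(vectors, batch)
    ]


# Possible usage with the objects defined earlier (not executed here):
# for batch in batched(movies, 100):
#     cursor.executemany(
#         "UPDATE omdb.movies SET movie_embedding = %s WHERE id = %s",
#         embed_batch(embedding_model, batch),
#     )
#     conn.commit()
```

The pairs are ordered as `(embedding, movie_id)` so they match the placeholder order in the `UPDATE` statement.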

**5. Generate Embeddings for Phrases**

In [49]:
cursor = conn.cursor()

cursor.execute("TRUNCATE TABLE omdb.phrases_dictionary")
conn.commit()

phrases = [
    'May the force be with you',
    'A movie about a Jedi who fights against the dark side of the force',
    'A pirate captain who sails the seven seas in search of treasure',
    'A clown fish who gets lost in the ocean and tries to find his way home'
]

for phrase in phrases:
    embedding = embedding_model.embed_query(phrase)
    cursor.execute(
        "INSERT INTO omdb.phrases_dictionary (phrase, phrase_embedding) VALUES (%s, %s)",
        (phrase, embedding)
    )

conn.commit()
cursor.close()
conn.close()
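
Once both tables hold embeddings, they can be used for similarity search. The following is a minimal sketch, not the query used elsewhere in the book: it formats a Python list in pgvector's text form (`[x1,x2,...]`) and ranks movies with pgvector's cosine distance operator `<=>`. The helper `to_vector_literal` and the constant `SIMILAR_MOVIES_SQL` are hypothetical names.

```python
from typing import Sequence


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format a sequence of numbers as a pgvector text literal."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


# Smaller cosine distance means the movie is closer to the phrase.
SIMILAR_MOVIES_SQL = """
SELECT name, movie_embedding <=> %s::vector AS distance
FROM omdb.movies
ORDER BY distance
LIMIT %s
"""

# Possible usage with an open connection and the model from step 2
# (not executed here):
# phrase_vector = embedding_model.embed_query("May the force be with you")
# cursor.execute(SIMILAR_MOVIES_SQL, (to_vector_literal(phrase_vector), 5))
# for name, distance in cursor.fetchall():
#     print(name, distance)
```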
