<h1> Personalized RAG using Movie Lens Datatset </h1>

In [1]:
import requests
import pandas as pd
import re


def get_movie_description(movie_name, api_key):
    movie_name = movie_name.strip().replace(' ', '+')
    url = f"http://www.omdbapi.com/?t={movie_name}&apikey={api_key}"
    print(f"Requesting: {url}")  # Debug URL being called
    response = requests.get(url)
    data = response.json()

    if data.get("Response") == "True":
        return 1, data["Plot"]
    else:
        return 0, f"Movie not found. Error: {data.get('Error', 'Unknown error.')}"

In [2]:

# Load your CSV
df = pd.read_csv('items.csv')

# Function to clean movie names
def clean_name(name):
    # Remove the year in parentheses
    name = re.sub(r'\s*\(\d{4}\)', '', name)
    # Move ", The" to front if present
    if ', The' in name:
        name = 'The ' + name.replace(', The', '')
        if '(' in name:
            name = name.split('(')[0].strip()
        return name

    if ', A' in name:
        name = 'A ' + name.replace(', A', '')
        return name

    if ', An' in name:
        name = 'An ' + name.replace(', An', '')
        return name
    
    if '(' in name:
        name = name.split('(')[0].strip()
        return name
    
    if ', ' in name:
        name = name.split(',')
        name = name[1].strip() + ' ' + name[0].strip()
        return name
    

    return name

# Apply cleaning function
df['title'] = df['title'].apply(clean_name)

# Save to new CSV
df.to_csv('cleaned_movies.csv', index=False, header=True)



In [3]:
df = pd.read_csv('cleaned_movies.csv')

movie_names = []

# Iterate through the DataFrame and populate the dictionary
for index, row in df.iterrows():
    
    movie_name = row['title']
    
    if movie_name not in movie_names:
        movie_names.append(movie_name)
    

In [5]:
len(movie_names)

1336

In [6]:
movie_desc = {}

fail_count = 0

for movie_name in movie_names:
    
    # Get the movie description using the function
    success, description = get_movie_description(movie_name, "d72d6b39")
    
    if success == 0:
        success, description = get_movie_description(movie_name, "e800227a")
    
    if success == 0:
        success, description = get_movie_description(movie_name, "b4621301")
    
    if success == 0:
        print(f"Failed to retrieve description for {movie_name}.")
        fail_count += 1
        continue
        
    movie_desc[movie_name] = description

Requesting: http://www.omdbapi.com/?t=Kolya&apikey=d72d6b39
Requesting: http://www.omdbapi.com/?t=L.A.+Confidential&apikey=d72d6b39
Requesting: http://www.omdbapi.com/?t=Heavyweights&apikey=d72d6b39
Requesting: http://www.omdbapi.com/?t=Legends+of+the+Fall&apikey=d72d6b39
Requesting: http://www.omdbapi.com/?t=Jackie+Brown&apikey=d72d6b39
Requesting: http://www.omdbapi.com/?t=Dr.+Strangelove+or:+How+I+Learned+to+Stop+Worrying+and+Love+the+Bomb&apikey=d72d6b39
Requesting: http://www.omdbapi.com/?t=The+Hunt+for+Red+October&apikey=d72d6b39
Requesting: http://www.omdbapi.com/?t=The+Jungle+Book&apikey=d72d6b39
Requesting: http://www.omdbapi.com/?t=Grease&apikey=d72d6b39
Requesting: http://www.omdbapi.com/?t=The+Remains+of+the+Day&apikey=d72d6b39
Requesting: http://www.omdbapi.com/?t=Men+in+Black&apikey=d72d6b39
Requesting: http://www.omdbapi.com/?t=Romy+and+Michele's+High+School+Reunion&apikey=d72d6b39
Requesting: http://www.omdbapi.com/?t=Star+Trek:+First+Contact&apikey=d72d6b39
Requesting:

In [7]:
fail_count

42

In [8]:
movie_desc

{'Kolya': 'A confirmed bachelor is in for the surprise of his life when a get-rich-quick scheme backfires and leaves him with a pint-sized new roommate.',
 'L.A. Confidential': 'As corruption grows in 1950s Los Angeles, three policemen - one strait-laced, one brutal, and one sleazy - investigate a series of murders with their own brand of justice.',
 'Heavyweights': 'Plump kids are lured into joining a posh fat camp with the promise of quick weight loss and good times, only to find that it is a woodland hellhole run by a psycho ex-fitness instructor.',
 'Legends of the Fall': 'In the early 1900s, three brothers and their father living in the remote wilderness of Montana are affected by betrayal, history, love, nature, and war.',
 'Jackie Brown': 'A flight attendant with a criminal past gets nabbed by the ATF for smuggling. Under pressure to become an informant against the illegal arms dealer she works for, she must find a way to secure her future without getting killed.',
 'Dr. Strange

In [9]:
import pickle
# Save dictionary to file
with open('movie_desc.pkl', 'wb') as f:
    pickle.dump(movie_desc, f)

In [10]:
# Load it back
with open('movie_desc.pkl', 'rb') as f:
    loaded_dict = pickle.load(f)

print(loaded_dict)

{'Kolya': 'A confirmed bachelor is in for the surprise of his life when a get-rich-quick scheme backfires and leaves him with a pint-sized new roommate.', 'L.A. Confidential': 'As corruption grows in 1950s Los Angeles, three policemen - one strait-laced, one brutal, and one sleazy - investigate a series of murders with their own brand of justice.', 'Heavyweights': 'Plump kids are lured into joining a posh fat camp with the promise of quick weight loss and good times, only to find that it is a woodland hellhole run by a psycho ex-fitness instructor.', 'Legends of the Fall': 'In the early 1900s, three brothers and their father living in the remote wilderness of Montana are affected by betrayal, history, love, nature, and war.', 'Jackie Brown': 'A flight attendant with a criminal past gets nabbed by the ATF for smuggling. Under pressure to become an informant against the illegal arms dealer she works for, she must find a way to secure her future without getting killed.', 'Dr. Strangelove 

In [12]:
movies_df = pd.read_csv('cleaned_movies.csv')

# Loop through the rows in items_with_desc_df
for index, row in movies_df.iterrows():
    
    movie_name = row['title']
    
    if movie_name in movie_desc.keys():
        # Update the description in the DataFrame
        movies_df.at[index, 'description'] = movie_desc[movie_name]
            
# Save the updated DataFrame back to the CSV
movies_df.to_csv('movies_with_descriptions.csv', index=False)

In [17]:
movies_df = pd.read_csv('movies_with_descriptions.csv')
print(movies_df.columns)

Index(['item_id', 'title', 'genre', 'rating', 'description'], dtype='object')


In [18]:
movies_df = movies_df.dropna(subset=['description'])  # Drop rows with empty descriptions
movies_df = movies_df[movies_df['description'] != 'N/A']

movies_df.to_csv('movies_with_descriptions_fin.csv', index=False)

In [21]:
# Load your CSV
movies_df = pd.read_csv('movies_with_descriptions_fin.csv')

# Group by id, title, rating, description — combine genres separated by commas
merged_df = movies_df.groupby(
    ['item_id', 'title', 'rating', 'description'], as_index=False
).agg({'genre': lambda x: ','.join(sorted(set(x)))})

# Save the cleaned CSV
merged_df.to_csv('movies.csv', index=False)


In [22]:
def describe_movie(movie_name, genre, description, rating):
    template = f"{movie_name} is a {genre} movie, {description}. Its rating is {rating}."
    return template

In [23]:
movies_df = pd.read_csv('movies.csv')

movie_template_df = pd.DataFrame(columns=['movies'])


movie_template_df['movies'] = movies_df.apply(
    lambda row: describe_movie(row['title'], row['genre'], row['description'], row['rating']), axis=1
)

In [24]:
movie_template_df.to_csv('movie_template.csv', index=False)
movie_template_df.head()

Unnamed: 0,movies
0,"Toy Story is a Animation,Childrens,Comedy movi..."
1,"GoldenEye is a Action,Adventure,Thriller movie..."
2,"Four Rooms is a Thriller movie, Four interlock..."
3,"Get Shorty is a Action,Comedy,Drama movie, A m..."
4,"Copycat is a Crime,Drama,Thriller movie, A cri..."


In [28]:
from sentence_transformers import SentenceTransformer
import faiss

# Save the model locally
model_save_path = r'MiniLM-L6-v2'
# model.save(model_save_path)

# Load the model from the saved path
loaded_model = SentenceTransformer(model_save_path)

In [31]:
user_movie_history = movie_template_df['movies'].to_list()

encoded_user_history = loaded_model.encode(user_movie_history)

# Create a FAISS index
dimension = encoded_user_history.shape[1]
index = faiss.IndexFlatL2(dimension)

# Add the encoded descriptions to the index
index.add(encoded_user_history)

# Store the mapping of row indices
row_mapping = {i: idx for i, idx in enumerate(df.index)}

print("FAISS index created and descriptions added.")

FAISS index created and descriptions added.


In [32]:
print(f"FAISS database size: {index.ntotal}")

FAISS database size: 1299


In [33]:
import pickle

# Save the FAISS index
faiss.write_index(index, 'faiss_index.bin')

# Save the row mapping
with open('row_mapping.pkl', 'wb') as f:
    pickle.dump(row_mapping, f)

print("FAISS index and row mapping saved.")

FAISS index and row mapping saved.


In [47]:
def search_movie_description(search_query):   
    # Define the search query
    search_query = search_query.lower()
    # Encode the search query using the loaded model
    encoded_query = loaded_model.encode([search_query])

    # Perform the search
    k = 3 # Number of nearest neighbors to retrieve
    distances, indices = index.search(encoded_query, k)

    # Retrieve the corresponding descriptions from the dataframe
    results = [movie_template_df.iloc[row_mapping[idx]]['movies'] for idx in indices[0]]

    return results

In [64]:
prompt_template = """
A user watched the following movie:

Movie Watched:
{recently_watched_movie}

Here are some movies from user's watch history which are similar to the following movie:

User history Movies:
{candidate_movies}

Task:
Based on the movie the user watched, recommend the most suitable movie(s) for the user to watch next. Make sure to prioritize the user's history and recommend movies similar to the ones they've watched for a more personalized experience.
Explain why you chose it/them, considering genre similarity, thematic relevance, and ratings.

Your Response Format:

Recommended Movie(s):
- {{recommended_movie_title}} — Reason: {{reason_for_recommendation}}
"""

In [65]:
from agno.agent import Agent, RunResponse
from agno.embedder.ollama import OllamaEmbedder
from agno.models.ollama import Ollama

agent = Agent(
    model=Ollama(id="llama3.2"),
    description="You are a movie recommendation assistant.",
    instructions="Recommend movies based on the user's preferences. Provide detailed explanations for your recommendations.",
)

In [66]:
search_query = "The Conjuring is a supernatural horror movie, based on the real-life paranormal investigations of Ed and Lorraine Warren. It follows a family who moves into a farmhouse in Rhode Island and starts experiencing terrifying, unexplained events. As the haunting intensifies, the Warrens step in to confront the evil presence lurking within the house."

results = search_movie_description(search_query)

print("Search results:")
for result in results:
    print(result)

Search results:
The Crucible is a Drama movie, A Salem resident attempts to frame her ex-lover's wife for being a witch in the middle of the 1692 witchcraft trials.. Its rating is 3.3376623376623376.
Amityville II: The Possession is a Horror movie, A dysfunctional family moves into a new house, which proves to be satanic, resulting in the demonic possession of their teenage son.. Its rating is 1.6428571428571428.
Curdled is a Crime movie, After getting interested in murder as a kid in Colombia, Gabriela now has a scrapbook on murders including clippings on "The Blue Blood Killer". While cleaning his latest murder scene in Miami, she comes across a clue missed by th.... Its rating is 2.75.


In [67]:
# Example usage
run_response: RunResponse = agent.run(prompt_template.format(
    recently_watched_movie=search_query,
    candidate_movies="\n".join(results)
))

print(run_response.content)

Recommended Movie(s):

- The Amityville Horror - Reason: This movie is a classic horror film that shares similarities with "The Conjuring" in terms of its supernatural theme, eerie atmosphere, and terrifying events. Like the Warrens' investigation in "The Conjuring," the protagonists in "The Amityville Horror" are also trying to uncover the source of malevolent forces that are haunting their home.
- The Exorcist - Reason: Another iconic horror film, "The Exorcist" explores similar themes of demonic possession and supernatural terror. While it may differ in tone from "The Conjuring," its intense atmosphere and frightening events make it a fitting recommendation for users who enjoy the genre.
- Hereditary - Reason: This psychological horror film shares some similarities with "The Conjuring" in terms of its dark, atmospheric setting and family drama that gradually descends into supernatural terror. The movie's exploration of grief, trauma, and family secrets also resonates with fans of Ed

In [68]:
run_response: RunResponse = agent.run("suggest a movie similar to" + search_query)
print(run_response.content)

Based on your interest in The Conjuring, I'd like to recommend a movie that shares similar themes and elements of horror, the supernatural, and paranormal investigation.

**Recommendation:** The Amityville Horror (2005)

**Why it's similar:**

1. **Paranormal Investigation**: Like The Conjuring, The Amityville Horror revolves around a family who moves into a supposedly haunted house with a dark history.
2. **Supernatural Elements**: The movie features eerie and unexplained events, including disembodied voices, moving shadows, and unsettling apparitions, which are all hallmarks of a classic horror film.
3. **Atmosphere and Tension**: The Amityville Horror expertly crafts an atmosphere of dread and tension, building up to a terrifying climax that will keep you on the edge of your seat.
4. **Real-Life Inspiration**: Like The Conjuring, The Amityville Horror is based on a real-life event (the Lutz family's alleged haunting of 112 Ocean Avenue in Amityville, Long Island). This adds an air o

In [69]:
search_query = "Fast & Furious” is an action movie, packed with high-speed chases and daring heists, about cars"
results = search_movie_description(search_query)
print("Search results:")   
for result in results:
    print(result)

Search results:
Speed is a Action,Romance,Thriller movie, A young police officer must prevent a bomb exploding aboard a city bus by keeping its speed above 50 mph.. Its rating is 3.6403508771929824.
Full Speed is a Drama movie, A family together with their grandpa go on a vacation, when their new car won't stop and it nearly escapes crashing into a hundred cars.. Its rating is 3.125.
Reckless is a Comedy movie, In sultry Charleston, where summer is long and secrets simmer behind every door, sex and crime walk hand in hand as two adversaries, a gorgeous Yankee litigator and a southern City Attorney, struggle to hide their intense attracti.... Its rating is 2.75.


In [72]:
run_response: RunResponse = agent.run(prompt_template.format(
    recently_watched_movie=search_query,
    candidate_movies="\n".join(results)
)) 

print(run_response.content)


Recommended Movie(s):

- The Bourne Identity — Reason: This movie is an action-thriller that features high-speed chases, car heists, and a thrilling plot. Although it doesn't have the same romance aspect as "Speed", its emphasis on fast-paced action sequences makes it a suitable recommendation for someone who enjoyed "Fast & Furious". The Bourne Identity also shares similar themes of cat-and-mouse games between adversaries.

- The Fast and the Furious: Tokyo Drift — Reason: This movie is an action-packed installment in the Fast & Furious franchise, which aligns perfectly with the user's interest. With its emphasis on high-speed racing, car chases, and street racing, it should appeal to fans of "Fast & Furious". While the ratings might not be as high as other movies, this film shares similar themes and action-packed sequences that make it a suitable recommendation.

- 2 Fast 2 Furious — Reason: Another film in the Fast & Furious franchise, this movie continues the series' tradition of f

In [73]:
run_response: RunResponse = agent.run("suggest a movie similar to" + search_query)
print(run_response.content)

Based on your love for Fast & Furious, I'd like to recommend the following movies that share similar themes of high-octane action, cars, and thrilling sequences:

**1. Gone in Sixty Seconds (2000)**

This movie is often considered a spiritual successor to the Fast & Furious franchise. Starring Nicolas Cage as Randall "Memphis" Raines, a thief who's forced to steal 50 cars within 48 hours to save his brother from jail. The film features stunning car chases, including a memorable sequence where Memphis and his crew drive through an abandoned warehouse.

**2. The Fast and the Furious: Tokyo Drift (2006)**

Although not directly connected to the main franchise, Tokyo Drift is set in the same universe and shares similar themes of street racing and high-speed chases. The movie follows Sean Boswell (Lucas Black), a teenager who gets involved with a group of drifters in Tokyo.

**3. Need for Speed (2014)**

Based on the popular video game, this movie stars Aaron Paul as Tobey Marshall, a stree