# Movie Recommendation

The aim of this project is to build a system that recommends movies for a genre the user submits. The user can then select one of those movies and get further recommendations similar to it.

## 1. Import Libraries and Load Dataset

In [42]:
# Import required libraries
import pandas as pd
import os
import ast
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Load the dataset
movies_path = os.path.join("data", "movies_metadata.csv")
df = pd.read_csv(movies_path, low_memory=False)

# Check basic information
df[['title', 'overview', 'genres', 'popularity']].head()


| | title | overview | genres | popularity |
|---|---|---|---|---|
| 0 | Avatar | In the 22nd century, a paraplegic Marine is di... | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | 150.437577 |
| 1 | Pirates of the Caribbean: At World's End | Captain Barbossa, long believed to be dead, ha... | [{"id": 12, "name": "Adventure"}, {"id": 14, "... | 139.082615 |
| 2 | Spectre | A cryptic message from Bond’s past sends him o... | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | 107.376788 |
| 3 | The Dark Knight Rises | Following the death of District Attorney Harve... | [{"id": 28, "name": "Action"}, {"id": 80, "nam... | 112.31295 |
| 4 | John Carter | John Carter is a war-weary, former military ca... | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | 43.926995 |


## 2. Cleaning up the data

In [43]:
# Drop missing overviews
df = df.dropna(subset=['overview'])

# Reset index
df = df.reset_index(drop=True)

## 3. Parse the genres Column

In [44]:
# Safely convert stringified lists of dicts to real lists of dicts
def parse_genres(genre_str):
    try:
        genres = ast.literal_eval(genre_str)
        return [genre['name'] for genre in genres]
    except (ValueError, SyntaxError):
        return []

# Apply parsing to the column
df['genre_list'] = df['genres'].apply(parse_genres)

# Take a look
df[['title', 'genre_list']].head()


| | title | genre_list |
|---|---|---|
| 0 | Avatar | [Action, Adventure, Fantasy, Science Fiction] |
| 1 | Pirates of the Caribbean: At World's End | [Adventure, Fantasy, Action] |
| 2 | Spectre | [Action, Adventure, Crime] |
| 3 | The Dark Knight Rises | [Action, Crime, Drama, Thriller] |
| 4 | John Carter | [Action, Adventure, Science Fiction] |
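
To make the fallback behaviour of the parser concrete, here is a small standalone sketch of the same approach run on hand-written inputs. The sample strings are invented for illustration and are not rows from the dataset. Catching `TypeError` and `KeyError` as well is an assumption added here, covering values that parse but are not a list of dicts with a `name` key.

```python
import ast

def parse_genres(genre_str):
    """Return genre names from a stringified list of dicts, or [] on bad input."""
    try:
        genres = ast.literal_eval(genre_str)
        return [genre["name"] for genre in genres]
    except (ValueError, SyntaxError, TypeError, KeyError):
        # TypeError/KeyError are extra guards assumed for this sketch
        return []

# Invented sample inputs
sample = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'

parsed_ok = parse_genres(sample)                       # well-formed string
parsed_empty = parse_genres("[]")                      # movie with no genres
parsed_broken = parse_genres("[{'id': 28, 'name': ")   # truncated string
parsed_nan = parse_genres(float("nan"))                # missing value
```

Only the well-formed string yields genre names; the other three inputs fall back to an empty list, so a later genre lookup simply does not match those rows.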


## 4. View All Unique Genres

In [45]:
# Flatten list of genres across all movies
from itertools import chain

all_genres = set(chain.from_iterable(df['genre_list']))
print(sorted(all_genres))


['Action', 'Adventure', 'Animation', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Family', 'Fantasy', 'Foreign', 'History', 'Horror', 'Music', 'Mystery', 'Romance', 'Science Fiction', 'TV Movie', 'Thriller', 'War', 'Western']


## 5. Create a Genre Filter Function

A simple function that returns the movies in a given genre, sorted by average rating (`vote_average`), highest first.

In [46]:
def get_top_movies_by_genre(genre, limit=10):
    matches = df[df['genre_list'].apply(lambda x: genre in x)]
    matches = matches.sort_values(by='vote_average', ascending=False)
    return matches[['title', 'overview', 'vote_average']].head(limit)

## 6. Testing

In [47]:
get_top_movies_by_genre("Comedy")

| | title | overview | vote_average |
|---|---|---|---|
| 4044 | Dancer, Texas Pop. 81 | Four guys, best friends, have grown up togethe... | 10.0 |
| 3518 | Stiff Upper Lips | Stiff Upper Lips is a broad parody of British ... | 10.0 |
| 4659 | Little Big Top | An aging out of work clown returns to his smal... | 10.0 |
| 4245 | Me You and Five Bucks | A womanizing yet lovable loser, Charlie, a wai... | 10.0 |
| 2969 | There Goes My Baby | A group of high school seniors meets in the su... | 8.5 |
| 809 | Forrest Gump | A man with a low IQ has accomplished great thi... | 8.2 |
| 3905 | The Apartment | Bud Baxter is a minor clerk in a huge New York... | 8.1 |
| 3040 | Love Jones | Darius Lovehall is a young black poet in Chica... | 8.1 |
| 4236 | Modern Times | The Tramp struggles to live in modern industri... | 8.1 |
| 4171 | Dr. Strangelove or: How I Learned to Stop Worr... | Insane General Jack D. Ripper initiates a nucl... | 8.0 |
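
Sorting on `vote_average` alone can rank titles with very few votes first, which may explain the 10.0 entries above. The sketch below shows one possible guard: require a minimum number of votes before a movie is eligible. It assumes the dataset has a `vote_count` column (not shown in this notebook), and the `min_votes` parameter, the function name, and the toy DataFrame are all hypothetical.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "genre_list": [["Comedy"], ["Comedy", "Drama"], ["Drama"], ["Comedy"]],
    "vote_average": [10.0, 8.2, 9.0, 7.5],
    "vote_count": [1, 5000, 300, 800],  # assumed column
})

def get_top_movies_by_genre_min_votes(frame, genre, limit=10, min_votes=100):
    """Top-rated movies in a genre, ignoring titles with too few votes."""
    in_genre = frame["genre_list"].apply(lambda genres: genre in genres)
    enough_votes = frame["vote_count"] >= min_votes
    matches = frame[in_genre & enough_votes]
    matches = matches.sort_values(by="vote_average", ascending=False)
    return matches[["title", "vote_average", "vote_count"]].head(limit)

top_comedies = get_top_movies_by_genre_min_votes(toy, "Comedy")
```

In the toy data, movie "A" has a perfect score from a single vote and is filtered out, leaving "B" and "D". A suitable threshold depends on the dataset and would need tuning.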


## Incorporating a More Like This Feature

## 1. TF-IDF Vectorization
Use `TfidfVectorizer` to convert each movie overview into a TF-IDF feature vector, then compute the pairwise similarity between all overviews.

In [48]:
# Replace NaNs with empty strings
df['overview'] = df['overview'].fillna('')

# TF-IDF Vectorizer
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(df['overview'])

# Cosine similarity matrix
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
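
`linear_kernel` computes plain dot products. These match cosine similarity here because `TfidfVectorizer` L2-normalizes each row by default (`norm='l2'`). The sketch below checks this on three made-up sentences; the sentences and variable names are illustrative only.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Invented mini-corpus standing in for movie overviews
docs = [
    "a giant monster attacks the city",
    "a monster rises from the sea and attacks",
    "two friends open a bakery",
]

tfidf = TfidfVectorizer(stop_words="english")
matrix = tfidf.fit_transform(docs)

dot_products = linear_kernel(matrix, matrix)   # what the notebook computes
cosines = cosine_similarity(matrix, matrix)    # explicit cosine similarity

# The two agree because every TF-IDF row already has unit length
same = np.allclose(dot_products, cosines)
```

The first two sentences share terms and score above zero against each other, while the third shares none with the first and scores zero.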

## 2. Map Movie Titles to Index
Build a mapping so we can look up the row index of a movie by its title:

In [49]:
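# Map titles to their row index, keeping the first row for any repeated title.
# (Series.drop_duplicates() would compare the index values, which are already
# unique, and so would leave repeated titles in place.)
indices = pd.Series(df.index, index=df['title'])
indices = indices[~indices.index.duplicated(keep='first')]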
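
A detail worth checking in a title-to-index mapping is how repeated titles are handled. `Series.drop_duplicates()` removes repeated values, not repeated labels, so a lookup for a title that appears twice returns a Series rather than a single index. A small sketch with made-up titles, comparing it with deduplicating on the labels:

```python
import pandas as pd

# Invented titles; "Heat" appears twice
titles = pd.Series(["Heat", "Alien", "Heat"], name="title")

# Deduplicating on values: the values are row indices, so nothing is removed
by_value = pd.Series(titles.index, index=titles).drop_duplicates()

# Deduplicating on labels: keep the first row for each title
by_label = pd.Series(titles.index, index=titles)
by_label = by_label[~by_label.index.duplicated(keep="first")]

heat_by_value = by_value.get("Heat")   # Series containing both matching rows
heat_by_label = by_label.get("Heat")   # a single row index
```

Keeping the first occurrence is an arbitrary choice. A fuller solution might disambiguate by release year, if such a column is available.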

## 3. Defining the Function

This function will:
- Look up a movie by title
- Use cosine similarity to find similar movies
- Return the top N recommendations

In [50]:
def get_more_like_this(title, top_n=10):
    # Get index of the movie
    idx = indices.get(title)

    if idx is None:
        return f"Movie titled '{title}' not found in the dataset."

    # Get similarity scores for this movie
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Sort by similarity score (highest first), skip the movie itself
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)[1:top_n+1]

    # Get the indices of the top matches
    movie_indices = [i[0] for i in sim_scores]

    return df[['title', 'overview']].iloc[movie_indices]
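
The slice `[1:top_n+1]` assumes the queried movie comes first in its own sorted similarity row. That may not hold when another overview is identical to it, since ties can be ordered either way. Below is a sketch of a variant that removes the query row by index instead, using a small made-up similarity matrix; `top_similar` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def top_similar(sim_matrix, idx, top_n=10):
    """Indices of the top_n most similar rows to idx, never including idx."""
    scores = sim_matrix[idx]
    # Sort by descending score; a stable sort keeps ties in index order
    order = np.argsort(-scores, kind="stable")
    # Drop the query row explicitly instead of assuming it is ranked first
    order = order[order != idx]
    return order[:top_n].tolist()

# Invented similarity matrix: rows 0 and 2 are identical "overviews"
sim = np.array([
    [1.0, 0.2, 1.0, 0.5],
    [0.2, 1.0, 0.2, 0.1],
    [1.0, 0.2, 1.0, 0.5],
    [0.5, 0.1, 0.5, 1.0],
])

neighbours_of_2 = top_similar(sim, idx=2, top_n=2)
```

For row 2, a positional slice would skip row 0 (its exact duplicate) and return row 2 itself as a recommendation; filtering by index avoids that.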


## 4. Testing Function

In [51]:
get_more_like_this("Shin Godzilla")

| | title | overview |
|---|---|---|
| 3579 | Doug's 1st Movie | Doug and his pal Skeeter set's out to find the... |
| 294 | Epic | A teenager finds herself transported to a deep... |
| 1287 | A Monster in Paris | Paris,1910. Emile, a shy movie projectionist, ... |
| 3034 | Reno 911!: Miami | A rag-tag team of Reno cops are called in to s... |
| 3825 | Chain of Command | After finding his brother murdered after retur... |
| 165 | Hulk | Bruce Banner, a genetics researcher with a tra... |
| 1451 | Zoom | Jack Shepard is an out-of-shape auto shop owne... |
| 3304 | The Blood of Heroes | Set in a futuristic world where the only sport... |
| 2294 | Spirited Away | A ten year old girl who wanders away from her ... |
| 4238 | My Name Is Bruce | B Movie Legend Bruce Campbell is mistaken for ... |
