# Caroline Barte
# DSC 630 Week 8

## References

For this assignment, I used two articles: 'How To Build Your First Recommender System Using Python & MovieLens Dataset' from Analytics India Magazine and 'The cosine similarity and its use in recommendation systems' from Medium. Both are referenced below.

Analytics India Magazine (2024). 'How To Build Your First Recommender System Using Python & MovieLens Dataset' https://analyticsindiamag.com/ai-mysteries/how-to-build-your-first-recommender-system-using-python-movielens-dataset/

Naomy Duarte Gomes (2023). 'The cosine similarity and its use in recommendation systems' https://naomy-gomes.medium.com/the-cosine-similarity-and-its-use-in-recommendation-systems-cb2ebd811ce1

## Imports And Dataset

In [65]:
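import numpy as np
import pandas as pd
from fuzzywuzzy import process
# cosine_similarity is used later to build the movie-to-movie similarity matrix
from sklearn.metrics.pairwise import cosine_similarity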

In [60]:
# Importing the datasets
df = pd.read_csv('ratings.csv')
movies = pd.read_csv('movies.csv')

In [61]:
df.head(10)

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931
5,1,70,3.0,964982400
6,1,101,5.0,964980868
7,1,110,4.0,964982176
8,1,151,5.0,964984041
9,1,157,5.0,964984100


In [62]:
movies.head(10)

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy
5,6,Heat (1995),Action|Crime|Thriller
6,7,Sabrina (1995),Comedy|Romance
7,8,Tom and Huck (1995),Adventure|Children
8,9,Sudden Death (1995),Action
9,10,GoldenEye (1995),Action|Adventure|Thriller


## Code for Recommender System

In [63]:
# Merge the ratings with the movie metadata so each rating row carries its title and genres
df = df.merge(movies,on='movieId', how='left')

In [64]:
df.head(10)

Unnamed: 0,userId,movieId,rating,timestamp,title,genres
0,1,1,4.0,964982703,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,1,3,4.0,964981247,Grumpier Old Men (1995),Comedy|Romance
2,1,6,4.0,964982224,Heat (1995),Action|Crime|Thriller
3,1,47,5.0,964983815,Seven (a.k.a. Se7en) (1995),Mystery|Thriller
4,1,50,5.0,964982931,"Usual Suspects, The (1995)",Crime|Mystery|Thriller
5,1,70,3.0,964982400,From Dusk Till Dawn (1996),Action|Comedy|Horror|Thriller
6,1,101,5.0,964980868,Bottle Rocket (1996),Adventure|Comedy|Crime|Romance
7,1,110,4.0,964982176,Braveheart (1995),Action|Drama|War
8,1,151,5.0,964984041,Rob Roy (1995),Action|Drama|Romance|War
9,1,157,5.0,964984100,Canadian Bacon (1995),Comedy|War


In [67]:
# Average the ratings for each userId and title so every (user, title) pair appears only once
df = df.groupby(['userId', 'title']).rating.mean().reset_index()

In [68]:
# Create a matrix for the user and title using the ratings
user_movie_matrix = df.pivot(index='userId', columns='title', values='rating')

In [69]:
# Replace NaN with 0. A NaN marks a movie the user has not rated, and cosine_similarity does not accept NaN input.
user_movie_matrix = user_movie_matrix.fillna(0)
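
To illustrate the three steps above on a small scale, here is a minimal sketch using made-up ratings (the `ratings` DataFrame and its values are hypothetical, not taken from MovieLens). It shows why the ratings are averaged before pivoting: `pivot` raises an error when the same user and title appear more than once.

```python
import pandas as pd

# Hypothetical ratings; user 2 has two rows for the same title
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "title": ["Heat (1995)", "Toy Story (1995)", "Heat (1995)",
              "Heat (1995)", "Toy Story (1995)"],
    "rating": [4.0, 5.0, 3.0, 5.0, 2.0],
})

# Collapse duplicate (userId, title) pairs into a single averaged rating
averaged = ratings.groupby(["userId", "title"]).rating.mean().reset_index()

# One row per user, one column per title; unrated movies become 0
matrix = averaged.pivot(index="userId", columns="title", values="rating").fillna(0)
print(matrix)
```

In this toy matrix, user 2's two ratings for Heat (3.0 and 5.0) become a single 4.0, and the movies a user never rated show up as 0.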

In [70]:
# Compute movie-to-movie cosine similarity. The matrix is transposed so each row is one movie's vector of user ratings.
movie_similarity = cosine_similarity(user_movie_matrix.T)
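
For reference, the cosine similarity between two rating vectors is their dot product divided by the product of their norms. The sketch below computes it by hand with NumPy on hypothetical rating vectors. The `cosine_sim` helper and the three example movies are illustrative assumptions, not part of the notebook; scikit-learn's `cosine_similarity` performs the same calculation for every pair of rows at once.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity of two 1-D vectors; returns 0.0 if either is all zeros."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Hypothetical ratings from three users (0 = not rated)
movie_a = [5, 4, 0]
movie_b = [4, 5, 0]
movie_c = [0, 0, 5]

print(round(cosine_sim(movie_a, movie_b), 3))  # 0.976: rated similarly by the same users
print(cosine_sim(movie_a, movie_c))            # 0.0: no users in common
```

Movies rated by the same users in a similar way score close to 1, while movies with no raters in common score 0.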

In [71]:
# Convert cosine similarity matrix to a DataFrame
movie_similarity_df = pd.DataFrame(movie_similarity, index=user_movie_matrix.columns, columns=user_movie_matrix.columns)

In [79]:
# Two helper functions: one fuzzy-matches the input to a title in the dataset, the other returns similar movies
def find_closest_title(title, movie_titles):
    # extractOne returns a (match, score) tuple, or None if no candidate qualifies
    closest_match = process.extractOne(title, movie_titles)
    return closest_match[0] if closest_match else None

def get_movie_recommendations(user_title, similarity_df=movie_similarity_df):
    closest_title = find_closest_title(user_title, similarity_df.columns)
    if closest_title is None:
        # Return the same (recommendations, matched_title) shape as the success path
        return [], None

    # Drop the movie itself, then keep the 10 most similar movies
    sim_scores = similarity_df[closest_title].drop(closest_title).sort_values(ascending=False).head(10)
    return sim_scores.index.tolist(), closest_title

In [80]:
# Prompt the user for a title and print the recommendations
user_title = input("Welcome! Please enter a movie: ")
recommendations, matched_title = get_movie_recommendations(user_title)
if matched_title is None:
    print("No matching movie found in the dataset.")
else:
    print(f"Here are your recommended movies similar to '{matched_title}':", recommendations)

Welcome! Please enter a movie:  ghostbusters


Here are your recommended movies similar to 'Ghostbusters (2016)': ['Bad Moms (2016)', 'How to Be Single (2016)', 'Masterminds (2016)', 'A Million Ways to Die in the West (2014)', 'Friends with Kids (2011)', 'The Jungle Book (2016)', 'The Boss (2016)', 'Movie 43 (2013)', 'Safety Not Guaranteed (2012)', 'The Secret Life of Pets (2016)']
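
The lookup above depends on fuzzy matching to map a loosely typed title such as 'ghostbusters' onto a title in the dataset. As a rough sketch of the same idea without the fuzzywuzzy dependency, the example below uses `difflib` from the Python standard library instead. This is a swapped-in technique, not what the notebook runs: the `closest_title` helper, the `cutoff` value, and the sample titles are assumptions for illustration, and its similarity scores will differ from fuzzywuzzy's.

```python
import difflib

def closest_title(query, titles, cutoff=0.5):
    """Return the title most similar to query, or None if none clears the cutoff."""
    # Compare in lowercase so 'ghostbusters' can match 'Ghostbusters (2016)'
    lowered = {t.lower(): t for t in titles}
    matches = difflib.get_close_matches(query.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None

# Hypothetical sample of titles in the MovieLens "Title (Year)" format
sample_titles = ["Ghostbusters (2016)", "Toy Story (1995)", "Heat (1995)"]

print(closest_title("ghostbusters", sample_titles))  # Ghostbusters (2016)
print(closest_title("qqqq", sample_titles))          # None
```

Setting a cutoff means an unrelated input returns `None` rather than an arbitrary best guess, which is what makes a "no matching movie" message reachable.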
