David Culhane
<br>
<br>
**Movie Recommender System**
<br>
<br>
This project builds a movie recommender system from the MovieLens dataset. The data consists of four CSV files. The movies.csv file has movie IDs, titles, and genres. The links CSV has movie IDs, IMDB IDs, and TMDB IDs. The tags CSV has user IDs, movie IDs, user tags, and timestamps for when each tag was applied. The ratings CSV has user IDs, movie IDs, ratings, and timestamps for when each rating was given. The timestamps are integer numbers of seconds since January 1st, 1970.
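
As a small illustration of the timestamp format, the sketch below converts two timestamp values that appear in the ratings data into readable dates. The `sample` dataframe and the `rated_at` column name are hypothetical, and the timestamps are assumed to be in UTC.

```python
import pandas as pd

# Hypothetical two-row sample using timestamp values from the ratings data
sample = pd.DataFrame({'timestamp': [964982703, 1493846352]})

# unit='s' tells pandas the integers count seconds since 1970-01-01
sample['rated_at'] = pd.to_datetime(sample['timestamp'], unit='s')
print(sample)
```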
<br>
<br>
The recommender system accepts the name of a movie supplied by the user and returns 10 recommendations for movies to watch. It follows the system detailed at https://analyticsindiamag.com/ai-mysteries/how-to-build-your-first-recommender-system-using-python-movielens-dataset/, part of this week's readings. So that user input doesn't have to exactly match the titles in the data, the Jaccard similarity is calculated between the user's input and each title in the data. The title with the highest similarity is then used to acquire recommendations.

In [1]:
import pandas as pd

In [3]:
# Loading the CSVs
movies = pd.read_csv('movies.csv')
ratings = pd.read_csv('ratings.csv')
tags = pd.read_csv('tags.csv')
links = pd.read_csv('links.csv')

**Building the Dataset**
<br>
<br>
The movie recommender uses the ratings provided by users to select the movies. Ideally, suggestions are based on ratings from common users: if a user asks for recommendations based on movie A, the recommendations should be movies that were rated highly by users who also rated movie A highly. So we merge the movies and ratings dataframes.

In [6]:
# Merging the ratings and movies dataframes
data = ratings.merge(movies, on='movieId', how='left')
data

,userId,movieId,rating,timestamp,title,genres
0,1,1,4.0,964982703,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,1,3,4.0,964981247,Grumpier Old Men (1995),Comedy|Romance
2,1,6,4.0,964982224,Heat (1995),Action|Crime|Thriller
3,1,47,5.0,964983815,Seven (a.k.a. Se7en) (1995),Mystery|Thriller
4,1,50,5.0,964982931,"Usual Suspects, The (1995)",Crime|Mystery|Thriller
...,...,...,...,...,...,...
100831,610,166534,4.0,1493848402,Split (2017),Drama|Horror|Thriller
100832,610,168248,5.0,1493850091,John Wick: Chapter Two (2017),Action|Crime|Thriller
100833,610,168250,5.0,1494273047,Get Out (2017),Horror
100834,610,168252,5.0,1493846352,Logan (2017),Action|Sci-Fi


With the merged dataframe created, we also want the average rating of each movie and the number of times it was rated. This information is used later to set a minimum number of ratings for recommended movies.

In [9]:
# Creating the dataframe for average and total number of ratings for each movie
avg_rating = pd.DataFrame(data.groupby('title')['rating'].mean())
avg_rating['total ratings'] = data.groupby('title')['rating'].count()

In [67]:
display(avg_rating)

title,rating,total ratings
'71 (2014),4.000000,1
'Hellboy': The Seeds of Creation (2004),4.000000,1
'Round Midnight (1986),3.500000,2
'Salem's Lot (2004),5.000000,1
'Til There Was You (1997),4.000000,2
...,...,...
eXistenZ (1999),3.863636,22
xXx (2002),2.770833,24
xXx: State of the Union (2005),2.000000,5
¡Three Amigos! (1986),3.134615,26


Lastly, we want a dataframe that organizes the users and their ratings for each movie. If a user has not rated a particular movie, that entry is left blank (NaN).

In [11]:
# Creating the dataframe with each user, their review scores, and all movies.
user_reviews = data.pivot_table(index='userId', columns='title', values='rating')
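
To make the shape of this table concrete, here is a minimal sketch on a made-up ratings frame (`toy` and `toy_reviews` are hypothetical names, and the values are not from MovieLens). User 2 has not rated movie B, so that cell comes out as NaN. Note that `pivot_table` averages by default if the same user rates the same title more than once.

```python
import pandas as pd

# Hypothetical toy ratings: user 2 has not rated 'B'
toy = pd.DataFrame({
    'userId': [1, 1, 2],
    'title': ['A', 'B', 'A'],
    'rating': [4.0, 5.0, 3.0],
})

# One row per user, one column per title, ratings as the cell values
toy_reviews = toy.pivot_table(index='userId', columns='title', values='rating')
print(toy_reviews)
```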

**User-Interactive Functions**
<br>
<br>
For this process to take user input, all users should have to do is type the name of a movie. We don't know what title the user will input, and the list of movie titles is large, so we need to find which movie title is most similar to the user's input. We calculate the Jaccard similarity between the user's input and each title in the data, then sort the scores and use the title with the highest similarity to find the recommendations.


In [71]:
def jts(user_input, title):  # Jaccard Title Similarity Function
    # Creating the sets by tokenizing the user input and a title
    # (the parameter is not named `input`, which would shadow the built-in)
    set1 = set(user_input.lower().split())
    set2 = set(title.lower().split())
    # Calculating the Jaccard Similarity
    intersection = len(set1 & set2)
    union = len(set1 | set2)
    if union == 0:  # Both strings are empty, so there is nothing to compare
        return 0.0
    return intersection / union
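
To see how this score ranks candidate titles, the standalone sketch below scores a short, made-up list of titles against one query. Here `jaccard` is a compact stand-in for `jts` so the example runs on its own, and `titles` and `query` are hypothetical. Because each title carries its year as a separate token, even a correct two-word query scores 2/3 rather than 1.

```python
def jaccard(a, b):
    """Jaccard similarity of two strings, compared as sets of lowercase words."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

# Hypothetical shortlist of titles in the "Title (Year)" format
titles = ['Toy Story (1995)', 'Toy Story 2 (1999)', 'Heat (1995)']
query = 'toy story'

# Score every title, then keep the one with the highest similarity
scores = {t: jaccard(query, t) for t in titles}
best = max(scores, key=scores.get)
print(best, scores[best])
```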

In [69]:
def rec_getter(title):
    # Finding the title most similar to the user's input
    similarities = []  # Initializing list to hold (score, title) tuples
    for movie_title in movies['title']:
        similarities.append((jts(title, movie_title), movie_title))
    similarities.sort(reverse=True)
    # Selecting the most similar movie title from the list of tuples
    selection = similarities[0][1]
    print('Acquiring movie recommendations similar to', selection)
    # Getting the recommendations: correlating every movie's ratings
    # with the ratings of the selected movie
    correlations = user_reviews.corrwith(user_reviews[selection])
    # Making the recommendations dataframe
    recs = pd.DataFrame(correlations, columns=['Correlation'])
    recs.dropna(inplace=True)  # Dropping NA values
    # Joining the total number of ratings for each title
    recs = recs.join(avg_rating['total ratings'])
    # Setting review count floor, sorting, and re-indexing
    recs = recs[recs['total ratings'] > 100].sort_values(
        'Correlation', ascending=False).reset_index()
    # Merging the movies data to present genres and movieIDs
    recs = recs.merge(movies, on='title', how='left')
    # Row 0 is expected to be the selected movie itself (correlation 1.0),
    # provided it passes the ratings floor, so it is skipped
    return recs[1:11]
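
As a rough illustration of the correlation step inside `rec_getter`, the sketch below runs `corrwith` on a small, made-up rating matrix (`toy_reviews` and its values are hypothetical, not taken from MovieLens). Movie B is rated much like movie A and movie C is rated the opposite way, so B should rank first.

```python
import pandas as pd

# Hypothetical user-by-title rating matrix (rows are users, columns are movies)
toy_reviews = pd.DataFrame({
    'A': [5.0, 4.0, 2.0, 1.0],
    'B': [4.0, 5.0, 1.0, 2.0],  # rated much like A
    'C': [1.0, 2.0, 5.0, 4.0],  # rated opposite to A
})

# Pearson correlation of every column with the seed movie's column
correlations = toy_reviews.corrwith(toy_reviews['A'])

# Drop the seed movie by label, then rank the rest from most to least similar
ranked = correlations.drop('A').sort_values(ascending=False)
print(ranked)
```

Unlike `rec_getter`, this sketch removes the seed movie by name rather than by position. That is a choice made for the example, and it does not depend on the seed movie sorting first.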

With the recommending function created, we now make the user-interactive portion. Users will need to be presented with a valid list of options since this method is clunky at best. A main method is used, with a sentinel value to loop the recommendation process.

In [73]:
def main():
    loop_state = ''
    print('Welcome to the Movie Recommender!')

    while loop_state != 'no':
        try:  # Try Block to test user inputs and then get recommendations
            title = input(
                'What movie would you like recommendations based off of? \n\
                Please match the title exactly from the list above.')
            rec_movies = rec_getter(title)
            display(rec_movies)
            loop_state = input(
                "Would you like to look up another movie? Type 'no' to stop.")
            loop_state = loop_state.strip().lower()
        except Exception:  # A bare except would also swallow KeyboardInterrupt
            print('Invalid Input')



In [75]:
if __name__ == '__main__':  # Executing Main Method 
    main()

Welcome to the Movie Recommender!


What movie would you like recommendations based off of? 
            Please match the title exactly from the list above. stargate


Acquiring movie recommendations similar to Stargate (1994)


,title,Correlation,total ratings,movieId,genres
1,X-Men (2000),0.45595,133,3793,Action|Adventure|Sci-Fi
2,There's Something About Mary (1998),0.444863,105,1923,Comedy|Romance
3,"Lord of the Rings: The Two Towers, The (2002)",0.394109,188,5952,Adventure|Fantasy
4,Up (2009),0.383291,105,68954,Adventure|Animation|Children|Drama
5,"Shining, The (1980)",0.365455,109,1258,Horror
6,Twister (1996),0.352594,123,736,Action|Adventure|Romance|Thriller
7,Waterworld (1995),0.348821,115,208,Action|Adventure|Sci-Fi
8,Mrs. Doubtfire (1993),0.34605,144,500,Comedy|Drama
9,Indiana Jones and the Temple of Doom (1984),0.342278,108,2115,Action|Adventure|Fantasy
10,Independence Day (a.k.a. ID4) (1996),0.331652,202,780,Action|Adventure|Sci-Fi|Thriller


Would you like to look up another movie? Type 'no' to stop. yes
What movie would you like recommendations based off of? 
            Please match the title exactly from the list above. toy story


Acquiring movie recommendations similar to Toy Story (1995)


,title,Correlation,total ratings,movieId,genres
1,"Incredibles, The (2004)",0.643301,125,8961,Action|Adventure|Animation|Children|Comedy
2,Finding Nemo (2003),0.618701,141,6377,Adventure|Animation|Children|Comedy
3,Aladdin (1992),0.611892,183,588,Adventure|Animation|Children|Comedy|Musical
4,"Monsters, Inc. (2001)",0.490231,132,4886,Adventure|Animation|Children|Comedy|Fantasy
5,Mrs. Doubtfire (1993),0.446261,144,500,Comedy|Drama
6,"Amelie (Fabuleux destin d'Amélie Poulain, Le) ...",0.438237,120,4973,Comedy|Romance
7,American Pie (1999),0.420117,103,2706,Comedy|Romance
8,Die Hard: With a Vengeance (1995),0.410939,144,165,Action|Crime|Thriller
9,E.T. the Extra-Terrestrial (1982),0.409216,122,1097,Children|Drama|Sci-Fi
10,Home Alone (1990),0.408444,116,586,Children|Comedy


Would you like to look up another movie? Type 'no' to stop. iron man
What movie would you like recommendations based off of? 
            Please match the title exactly from the list above. iron man


Acquiring movie recommendations similar to Iron Man (2008)


,title,Correlation,total ratings,movieId,genres
1,Spider-Man (2002),0.49585,122,5349,Action|Adventure|Sci-Fi|Thriller
2,Batman Begins (2005),0.490347,116,33794,Action|Crime|IMAX
3,Up (2009),0.484231,105,68954,Adventure|Animation|Children|Drama
4,"Monsters, Inc. (2001)",0.456854,132,4886,Adventure|Animation|Children|Comedy|Fantasy
5,"Fugitive, The (1993)",0.451236,190,457,Thriller
6,"Lion King, The (1994)",0.439399,172,364,Adventure|Animation|Children|Drama|Musical|IMAX
7,WALL·E (2008),0.43856,104,60069,Adventure|Animation|Children|Romance|Sci-Fi
8,"Matrix, The (1999)",0.433419,278,2571,Action|Sci-Fi|Thriller
9,Men in Black (a.k.a. MIB) (1997),0.423156,165,1580,Action|Comedy|Sci-Fi
10,"Dark Knight, The (2008)",0.41747,149,58559,Action|Crime|Drama|IMAX


Would you like to look up another movie? Type 'no' to stop. no
