# Movie Recommender System
This project was created by Anuja Alice Thomas.

The objective of this project is to create a movie recommendation system based on the 'movies' and 'ratings' datasets.

This project covers three movie recommendation techniques:
1. Popularity Based
2. Content Based
3. Collaborative

First we perform some initial pre-processing and exploratory data analysis for our datasets.

In [1]:
#import required packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.metrics.pairwise import cosine_similarity

import ipywidgets as widgets
from IPython.display import display

In [2]:
#loading datasets
movies = pd.read_csv('movies.csv')
rating = pd.read_csv('ratings.csv')

In [3]:
#movies
movies

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy
...,...,...,...
10324,146684,Cosmic Scrat-tastrophe (2015),Animation|Children|Comedy
10325,146878,Le Grand Restaurant (1966),Comedy
10326,148238,A Very Murray Christmas (2015),Comedy
10327,148626,The Big Short (2015),Drama


In [4]:
#ratings
rating

Unnamed: 0,userId,movieId,rating,timestamp
0,1,16,4.0,1217897793
1,1,24,1.5,1217895807
2,1,32,4.0,1217896246
3,1,47,4.0,1217896556
4,1,50,4.0,1217896523
...,...,...,...,...
105334,668,142488,4.0,1451535844
105335,668,142507,3.5,1451535889
105336,668,143385,4.0,1446388585
105337,668,144976,2.5,1448656898


In [5]:
#average rating per movie
ratings = pd.DataFrame(rating.groupby('movieId')['rating'].mean()).reset_index()
ratings.shape

(10325, 2)

In [6]:
movies.drop_duplicates(inplace = True)

In [7]:
print(movies.isnull().sum())
print(ratings.isnull().sum())

movieId    0
title      0
genres     0
dtype: int64
movieId    0
rating     0
dtype: int64


In [8]:
#left merge keeps every movie, including those without any rating
df = movies.merge(ratings, how='left', on='movieId')
df.isnull().sum()

movieId    0
title      0
genres     0
rating     4
dtype: int64

In [9]:
no_ratings = df[df['rating'].isna()]
no_ratings['title'].values

array(["Intolerance: Love's Struggle Throughout the Ages (1916)",
       'Early Summer (Bakushû) (1951)', 'Bratz: The Movie (2007)',
       'Johnny Express (2014)'], dtype=object)

In [10]:
#average rating per title, to check whether the unrated movies have ratings under another movieId
avg_ratings = pd.DataFrame(df.groupby('title')['rating'].mean()).reset_index()

print(avg_ratings[avg_ratings['title'].isin(no_ratings['title'].values)])

                                                  title  rating
1367                            Bratz: The Movie (2007)     NaN
2789                      Early Summer (Bakushû) (1951)     NaN
4679  Intolerance: Love's Struggle Throughout the Ag...     NaN
4874                              Johnny Express (2014)     NaN


There are no other ratings available for these 4 movies, so their average rating cannot be filled in and they remain null values.
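
As a small illustration of why these nulls appear, the sketch below runs a left merge on toy data. The frames `toy_movies` and `toy_ratings` and their values are made up for this example; only the `merge` call mirrors the notebook.

```python
import pandas as pd

# Toy stand-ins for the movies table and the per-movie average ratings (made-up values)
toy_movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Movie A", "Movie B", "Movie C"],
})
toy_ratings = pd.DataFrame({
    "movieId": [1, 2],
    "rating": [4.0, 3.5],
})

# A left merge keeps every movie; movies with no matching rating get NaN.
# indicator=True adds a '_merge' column recording where each row came from.
merged = toy_movies.merge(toy_ratings, how="left", on="movieId", indicator=True)

# Rows marked 'left_only' exist in toy_movies but have no rating
unrated_titles = merged.loc[merged["_merge"] == "left_only", "title"].tolist()
print(unrated_titles)
```

The `indicator` column is optional; it simply makes it easy to see which movies had no match in the ratings table.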

In [11]:
df

Unnamed: 0,movieId,title,genres,rating
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,3.907328
1,2,Jumanji (1995),Adventure|Children|Fantasy,3.353261
2,3,Grumpier Old Men (1995),Comedy|Romance,3.189655
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance,2.818182
4,5,Father of the Bride Part II (1995),Comedy,3.250000
...,...,...,...,...
10324,146684,Cosmic Scrat-tastrophe (2015),Animation|Children|Comedy,4.000000
10325,146878,Le Grand Restaurant (1966),Comedy,2.500000
10326,148238,A Very Murray Christmas (2015),Comedy,3.000000
10327,148626,The Big Short (2015),Drama,4.333333


## 1. Popularity Based Recommender System

This recommender system takes three inputs: a genre (g), a minimum average rating (t), and the number of recommendations (N).

It returns the top N movies of genre g whose average rating is at least t, sorted by rating in descending order.
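
The idea can be sketched on toy data as follows. The frame `toy` and its values are invented for illustration; the approach assumes genres are stored as a pipe-separated string, as in the 'movies' dataset.

```python
import pandas as pd

# Made-up movies with pipe-separated genres and an average rating
toy = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "genres": ["Adventure|Comedy", "Comedy", "Adventure", "Adventure|Drama"],
    "rating": [4.5, 4.8, 3.0, 3.9],
})

# One indicator column per genre: 1 if the movie has that genre, else 0
genre_flags = toy["genres"].str.get_dummies("|")
toy_flags = pd.concat([toy, genre_flags], axis=1)

g, t, N = "Adventure", 3.4, 2  # genre, minimum rating, number of results

# Keep movies of genre g rated at least t, best rated first, and take the top N
top = (
    toy_flags[(toy_flags[g] == 1) & (toy_flags["rating"] >= t)]
    .sort_values("rating", ascending=False)
    .head(N)
)
top_titles = top["title"].tolist()
print(top_titles)
```

Here "Movie C" is dropped because its rating is below the threshold, and "Movie B" is dropped because it is not an Adventure movie.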

In [12]:
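#one-hot encode the pipe-separated genres: one 0/1 column per genre
genres_df = df['genres'].str.get_dummies('|')
#movie details plus genre indicator columns, used by the popularity based recommender
df1 = pd.concat([df[['movieId','title','genres','rating']], genres_df], axis = 1)
#genre indicator matrix indexed by movieId, used by the content based recommender
movie_genre_df = pd.concat([df['movieId'], genres_df], axis=1).set_index('movieId')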

In [13]:
g = input("Enter the genre: ")
t = float(input("Enter minimum rating: "))
N = int(input("Enter number of recommendations: "))
  
def popular_movies(g,t,N):
  #keep movies of genre g whose average rating is at least t
  data_filter1 = df1.loc[(df1[g] == 1) & (df1['rating'] >= t)]
  #sort_values returns a new DataFrame, so the filtered slice is not modified in place
  data_filter1 = data_filter1.sort_values('rating', ascending=False)
  top_movies = data_filter1[['title', 'genres', 'rating']].iloc[0:N,].reset_index()
  top_movies.index = top_movies.index + 1
  return top_movies

popular_movies(g,t,N)

Enter the genre: Adventure
Enter minimum rating: 3.4
Enter number of recommendations: 11


Unnamed: 0,index,title,genres,rating
1,10285,Everything's Gonna Be Great (1998),Adventure|Children|Comedy|Drama,5.0
2,4111,Iron Will (1994),Adventure,5.0
3,8988,Mothra (Mosura) (1961),Adventure|Fantasy|Sci-Fi,5.0
4,9674,Ivan Vasilievich: Back to the Future (Ivan Vas...,Adventure|Comedy,5.0
5,6376,Interstella 5555: The 5tory of the 5ecret 5tar...,Adventure|Animation|Fantasy|Musical|Sci-Fi,5.0
6,6459,"Chase, The (1994)",Action|Adventure|Comedy|Crime|Romance|Thriller,5.0
7,4337,"Friend Is a Treasure, A (Chi Trova Un Amico, T...",Action|Adventure|Comedy,5.0
8,3896,Topkapi (1964),Adventure|Comedy|Thriller,5.0
9,7227,"Plague Dogs, The (1982)",Adventure|Animation|Drama,5.0
10,5022,Interstate 60 (2002),Adventure|Comedy|Drama|Fantasy|Mystery|Sci-Fi|...,5.0



<!-- df['genres'].unique()
 genres = pd.DataFrame(df['genres'])
 array = []
 for i in range(0,len(genres['genres'])):
   array.append(genres['genres'][i].split('|'))
 array

 l = []
 for lists in array:
   for genre in lists:
     l.append(genre)
 genre_types = sorted(list(set(l)))
 genre_types

 for i in genre_types:
   df[i] = None

 for index, row in df[genre_types].iterrows():
   for gtype in genre_types:
     if gtype in (array[index]):
       df.at[index, gtype] = 1
 df

 df[genre_types] = df[genre_types].fillna(value=0)

 df -->

## 2. Content-Based Recommender System
This recommendation system takes a movie title and returns the N most similar movies. Similarity is measured as the cosine similarity between the movies' genre indicator vectors.
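
For reference, the cosine similarity of two vectors is their dot product divided by the product of their lengths. The sketch below computes it with NumPy on made-up genre vectors; the helper `cosine` and the three vectors are illustrative only and are not part of the notebook.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up genre indicator vectors over the columns [Comedy, Drama, Romance]
comedy = [1, 0, 0]
comedy_romance = [1, 0, 1]
drama = [0, 1, 0]

sim_same = cosine(comedy, comedy)              # identical genres
sim_partial = cosine(comedy, comedy_romance)   # one shared genre out of two
sim_none = cosine(comedy, drama)               # no shared genre
print(sim_same, sim_partial, sim_none)
```

Movies with identical genres score 1, movies with no genre in common score 0, and partial overlaps fall in between.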

In [14]:
movie_title = input("Enter movie to search for: ")
N = int(input("Enter the number of recommendations: "))

Enter movie to search for: Father of the Bride Part II (1995)
Enter the number of recommendations: 3


In [15]:
def content_recommender(movie_title, N):
    # Get index of movie title
    idx = df[df['title'] == movie_title].index[0]

    #finding cosine similarity between movies
    cosine_sim = cosine_similarity(movie_genre_df)
    
    similar_movies_indices = cosine_sim[idx].argsort()[::-1][1:N+1]
    similar_movies = df.loc[similar_movies_indices, 'title']
    similar_movies = pd.DataFrame(similar_movies).reset_index(drop = True)
    similar_movies.index+=1
    return similar_movies
    
content_recommender(movie_title, N)

Unnamed: 0,title
1,"Associate, The (1996)"
2,Our Idiot Brother (2011)
3,Baby Boom (1987)
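
Because many movies share exactly the same genres, several candidates can tie with the query at a similarity of 1.0, and skipping the first sorted entry does not guarantee that the skipped entry is the query itself. The following is a minimal sketch of one way to exclude the query explicitly. The function `top_n_similar` and the toy data are hypothetical, and the similarity is computed with NumPy rather than `cosine_similarity` to keep the example self-contained.

```python
import numpy as np

def top_n_similar(genre_matrix, titles, query_pos, n):
    """Return the n titles most similar to the row at query_pos, never the query itself."""
    X = np.asarray(genre_matrix, dtype=float)
    norms = np.linalg.norm(X, axis=1)
    norms[norms == 0] = 1.0  # avoid division by zero for rows with no genres
    # Cosine similarity between the query row and every row
    sims = (X @ X[query_pos]) / (norms * norms[query_pos])
    sims[query_pos] = -np.inf  # exclude the query explicitly instead of by position
    # Stable sort keeps tied movies in their original order
    order = np.argsort(-sims, kind="stable")[:n]
    return [titles[i] for i in order]

# Made-up data: three movies with identical genres and one different movie
toy_titles = ["Comedy One", "Comedy Two", "Comedy Three", "Drama One"]
toy_genres = [[1, 0], [1, 0], [1, 0], [0, 1]]  # columns: [Comedy, Drama]

similar = top_n_similar(toy_genres, toy_titles, query_pos=1, n=2)
print(similar)
```

This variant also computes only one row of similarities per query, which avoids building the full movie-by-movie matrix on every call.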


## 3. Collaborative Recommender System

This system recommends the top N movies for a target user "u", based on the ratings of the "k" users most similar to "u".
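
A minimal sketch of user-based filtering on toy data is shown below, assuming Pearson correlation as the similarity measure. The ratings in `toy_ratings` are invented. Note that `DataFrame.corr` correlates columns, so the user-item matrix is transposed to put users in the columns before calling it.

```python
import pandas as pd

# Made-up ratings: user 2 rates like user 1, user 3 rates the opposite way
toy_ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "movieId": [10, 20, 30, 10, 20, 30, 40, 10, 20, 30, 50],
    "rating":  [5.0, 3.0, 1.0, 4.5, 3.0, 1.5, 5.0, 1.0, 3.0, 5.0, 4.0],
})

# Rows are users, columns are movies; unrated entries are NaN
user_item = toy_ratings.pivot(index="userId", columns="movieId", values="rating")

# Transpose so that users are the columns being correlated
user_sim = user_item.T.corr(method="pearson")

target_user, k, N = 1, 1, 1

# The k users most correlated with the target user
neighbours = (
    user_sim[target_user].drop(target_user).dropna()
    .sort_values(ascending=False).head(k).index
)

# Mean rating per movie among the neighbours, restricted to movies the target has not rated
scores = user_item.loc[neighbours].mean()
scores = scores[user_item.loc[target_user].isna()].dropna()
recommended = scores.sort_values(ascending=False).head(N).index.tolist()
print(recommended)
```

With these toy values, user 2 is the nearest neighbour of user 1, and the only movie user 2 has rated that user 1 has not is movie 40.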

In [16]:
target_user = 1
N = 5
k = 100

In [17]:
def collaborative(target_user, N, k):
  #rows are users, columns are movies; unrated entries are NaN
  user_item_matrix = rating.pivot(index='userId', columns='movieId', values='rating')

  #identifying similar users
  #DataFrame.corr correlates columns, so transpose to compare users rather than movies
  user_similarities = user_item_matrix.T.corr(method='pearson')
  similar_users = user_similarities[target_user].drop(target_user).dropna().sort_values(ascending=False).head(k).index

  #mean rating of each movie among the k most similar users (0 if none of them rated it)
  mean_ratings = user_item_matrix.loc[similar_users].mean().fillna(0)

  #exclude movies the target user has already rated
  already_rated = user_item_matrix.loc[target_user, :].notnull()
  mean_ratings[already_rated] = 0

  #printing top N movies based on collaborative filtering
  top_N_movies = mean_ratings.sort_values(ascending=False).index[:N]

  print("Top {} movie recommendations for user {} based on {} similar users:".format(N, target_user, k))
  for i, movie_id in enumerate(top_N_movies, start=1):
    print("{}. {}".format(i, movie_id))

collaborative(target_user, N, k)
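
The collaborative recommender prints raw `movieId` values. One way to turn such IDs into titles is a lookup Series indexed by `movieId`, sketched here on made-up data. The names `toy_movies`, `title_by_id`, and `recommended_ids` are illustrative.

```python
import pandas as pd

# Made-up movies table with the same two columns as the 'movies' dataset
toy_movies = pd.DataFrame({
    "movieId": [10, 20, 30],
    "title": ["Movie A", "Movie B", "Movie C"],
})

# Series mapping movieId -> title
title_by_id = toy_movies.set_index("movieId")["title"]

# reindex keeps the order of the recommended IDs (unknown IDs would become NaN)
recommended_ids = [30, 10]
recommended_titles = title_by_id.reindex(recommended_ids).tolist()
print(recommended_titles)
```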


## IPyWidgets







In [18]:
# style = {'description_width': 'initial'}
# g = widgets.Text(description = "Genre: ", style = style)
# t = widgets.FloatSlider(description = "Rating: ", min = 0, max =5, step = 0.05, style = style)
# N = widgets.IntText(description = "No. of Recommendation: ", style = style)
# button = widgets.Button(description = "OK", style = {'button_color':'lightgreen'})

# def clicked_button(b):
#     response1 = [g.value, t.value, N.value]
#     print(response1)

# button.on_click(clicked_button)

# popularmovie_interface = widgets.VBox([g,t,N, button])
# popularmovie_interface