## Loading the data

#### Importing Libraries

In [1]:
import numpy as np
import pandas as pd

import warnings
warnings.simplefilter(action='ignore')  # hide all warnings, e.g. RuntimeWarnings from correlations computed over too few ratings

In [2]:
data = pd.read_csv('ratings.csv')
data.head(10)

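| | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 1 | 4.0 | 964982703 |
| 1 | 1 | 3 | 4.0 | 964981247 |
| 2 | 1 | 6 | 4.0 | 964982224 |
| 3 | 1 | 47 | 5.0 | 964983815 |
| 4 | 1 | 50 | 5.0 | 964982931 |
| 5 | 1 | 70 | 3.0 | 964982400 |
| 6 | 1 | 101 | 5.0 | 964980868 |
| 7 | 1 | 110 | 4.0 | 964982176 |
| 8 | 1 | 151 | 5.0 | 964984041 |
| 9 | 1 | 157 | 5.0 | 964984100 |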


In [3]:
movie_titles_genre = pd.read_csv("movies.csv")
movie_titles_genre.head(10)

| | movieId | title | genres |
|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy |
| 1 | 2 | Jumanji (1995) | Adventure\|Children\|Fantasy |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama\|Romance |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy |
| 5 | 6 | Heat (1995) | Action\|Crime\|Thriller |
| 6 | 7 | Sabrina (1995) | Comedy\|Romance |
| 7 | 8 | Tom and Huck (1995) | Adventure\|Children |
| 8 | 9 | Sudden Death (1995) | Action |
| 9 | 10 | GoldenEye (1995) | Action\|Adventure\|Thriller |


In [4]:
data = data.merge(movie_titles_genre,on='movieId', how='left')
data.head(10)

| | userId | movieId | rating | timestamp | title | genres |
|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 4.0 | 964982703 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy |
| 1 | 1 | 3 | 4.0 | 964981247 | Grumpier Old Men (1995) | Comedy\|Romance |
| 2 | 1 | 6 | 4.0 | 964982224 | Heat (1995) | Action\|Crime\|Thriller |
| 3 | 1 | 47 | 5.0 | 964983815 | Seven (a.k.a. Se7en) (1995) | Mystery\|Thriller |
| 4 | 1 | 50 | 5.0 | 964982931 | Usual Suspects, The (1995) | Crime\|Mystery\|Thriller |
| 5 | 1 | 70 | 3.0 | 964982400 | From Dusk Till Dawn (1996) | Action\|Comedy\|Horror\|Thriller |
| 6 | 1 | 101 | 5.0 | 964980868 | Bottle Rocket (1996) | Adventure\|Comedy\|Crime\|Romance |
| 7 | 1 | 110 | 4.0 | 964982176 | Braveheart (1995) | Action\|Drama\|War |
| 8 | 1 | 151 | 5.0 | 964984041 | Rob Roy (1995) | Action\|Drama\|Romance\|War |
| 9 | 1 | 157 | 5.0 | 964984100 | Canadian Bacon (1995) | Comedy\|War |
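
Because `how='left'` keeps every row of the ratings table, a rating whose `movieId` is missing from movies.csv would silently get a NaN title. The sketch below runs on hypothetical toy frames, not the real files. It shows how pandas' `indicator=True` and `validate` options of `merge` can catch this. `indicator=True` flags unmatched rows, and `validate="many_to_one"` fails if a `movieId` repeats on the movies side.

```python
import pandas as pd

# Hypothetical toy frames shaped like ratings.csv and movies.csv
ratings = pd.DataFrame({
    "userId": [1, 1, 2],
    "movieId": [1, 3, 99],  # movieId 99 is deliberately missing from `movies`
    "rating": [4.0, 4.0, 2.5],
})
movies = pd.DataFrame({
    "movieId": [1, 3],
    "title": ["Toy Story (1995)", "Grumpier Old Men (1995)"],
    "genres": ["Adventure|Animation|Children|Comedy|Fantasy", "Comedy|Romance"],
})

# how="left" keeps every rating row.
# validate="many_to_one" raises MergeError if a movieId repeats in `movies`.
# indicator=True adds a `_merge` column ("both" or "left_only" here).
merged = ratings.merge(movies, on="movieId", how="left",
                       validate="many_to_one", indicator=True)

# Ratings whose movieId found no match end up with a NaN title
unmatched = merged[merged["_merge"] == "left_only"]
print(len(merged), len(unmatched))  # 3 1
```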


## Feature Engineering

### Average Rating

In [5]:
Average_ratings = pd.DataFrame(data.groupby('title')['rating'].mean())
Average_ratings.head(10)

| title | rating |
|---|---|
| '71 (2014) | 4.0 |
| 'Hellboy': The Seeds of Creation (2004) | 4.0 |
| 'Round Midnight (1986) | 3.5 |
| 'Salem's Lot (2004) | 5.0 |
| 'Til There Was You (1997) | 4.0 |
| 'Tis the Season for Love (2015) | 1.5 |
| 'burbs, The (1989) | 3.176471 |
| 'night Mother (1986) | 3.0 |
| (500) Days of Summer (2009) | 3.666667 |
| *batteries not included (1987) | 3.285714 |


### Total Number Of Ratings

##### A movie's average rating is only as reliable as the number of ratings behind it. For example, '71 (2014) averages 4.0 from a single rating. Therefore, we will also count the total number of ratings cast for each movie.

In [6]:
Average_ratings['Total Ratings'] = data.groupby('title')['rating'].count()  # number of ratings per title
Average_ratings.head(10)

| title | rating | Total Ratings |
|---|---|---|
| '71 (2014) | 4.0 | 1 |
| 'Hellboy': The Seeds of Creation (2004) | 4.0 | 1 |
| 'Round Midnight (1986) | 3.5 | 2 |
| 'Salem's Lot (2004) | 5.0 | 1 |
| 'Til There Was You (1997) | 4.0 | 2 |
| 'Tis the Season for Love (2015) | 1.5 | 1 |
| 'burbs, The (1989) | 3.176471 | 17 |
| 'night Mother (1986) | 3.0 | 1 |
| (500) Days of Summer (2009) | 3.666667 | 42 |
| *batteries not included (1987) | 3.285714 | 7 |
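
The average and the count can also be computed in a single `groupby` with `agg`. The sketch below uses hypothetical toy data. Title "A" has a perfect 5.0 average from one rating, while "B" averages 4.0 over four ratings, which is why the count matters when reading the average. The threshold of 3 ratings is arbitrary and only for illustration.

```python
import pandas as pd

# Hypothetical toy ratings: "A" has a single rating, "B" has four
toy = pd.DataFrame({
    "title": ["A", "B", "B", "B", "B"],
    "rating": [5.0, 4.0, 3.5, 4.5, 4.0],
})

# Mean and count per title in one groupby pass
stats = toy.groupby("title")["rating"].agg(["mean", "count"])
stats.columns = ["rating", "Total Ratings"]
print(stats)

# Keep only titles with enough ratings (threshold of 3 chosen arbitrarily)
reliable = stats[stats["Total Ratings"] >= 3]
print(list(reliable.index))  # ['B']
```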


## Building The Recommender

### Calculating The Correlation

##### We will create a user-movie matrix. Each row is a userId, each column is a movie title, and each value is that user's rating of that movie. Movies a user has not rated are NaN, which is why most of the matrix below is empty.

In [7]:
movie_user = data.pivot_table(index='userId',columns='title',values='rating')
movie_user.head(10)

| userId | '71 (2014) | 'Hellboy': The Seeds of Creation (2004) | 'Round Midnight (1986) | 'Salem's Lot (2004) | 'Til There Was You (1997) | 'Tis the Season for Love (2015) | 'burbs, The (1989) | 'night Mother (1986) | (500) Days of Summer (2009) | *batteries not included (1987) | ... | Zulu (2013) | [REC] (2007) | [REC]² (2009) | [REC]³ 3 Génesis (2012) | anohana: The Flower We Saw That Day - The Movie (2013) | eXistenZ (1999) | xXx (2002) | xXx: State of the Union (2005) | ¡Three Amigos! (1986) | À nous la liberté (Freedom for Us) (1931) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | | | | | | | | | | | ... | | | | | | | | | 4.0 | |
| 2 | | | | | | | | | | | ... | | | | | | | | | | |
| 3 | | | | | | | | | | | ... | | | | | | | | | | |
| 4 | | | | | | | | | | | ... | | | | | | | | | | |
| 5 | | | | | | | | | | | ... | | | | | | | | | | |
| 6 | | | | | | | | | | | ... | | | | | | | | | | |
| 7 | | | | | | | | | | | ... | | | | | | | | | | |
| 8 | | | | | | | | | | | ... | | | | | | | | | | |
| 9 | | | | | | | | | | | ... | | | | | | | 1.0 | | | |
| 10 | | | | | | | | | | | ... | | | | | | | | | | |
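
The pivot step is easier to follow on a small example. The sketch below uses hypothetical toy ratings in the same long format as `data`. It shows that a (user, title) pair with no rating becomes NaN, and it measures how sparse the resulting matrix is. `pivot_table` averages duplicate (userId, title) pairs by default (`aggfunc="mean"`). That matters only if two movieIds ever share the same title.

```python
import pandas as pd

# Hypothetical toy ratings in long format (one row per user-movie rating)
toy = pd.DataFrame({
    "userId": [1, 1, 2, 3, 3],
    "title": ["A", "B", "A", "B", "C"],
    "rating": [4.0, 3.0, 5.0, 2.0, 4.5],
})

# Rows = users, columns = titles; pairs with no rating become NaN.
# The default aggfunc="mean" would average duplicate (user, title) pairs.
matrix = toy.pivot_table(index="userId", columns="title", values="rating")
print(matrix)

# Fraction of empty cells, i.e. how sparse the user-movie matrix is
sparsity = matrix.isna().to_numpy().mean()
print(round(sparsity, 3))  # 0.444
```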


##### Now we need to pick a movie to test our recommender system on. We will use Seven (a.k.a. Se7en) (1995).

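###### The `corrwith` method computes the pairwise correlation between the rows or columns of a DataFrame and a Series, or the rows or columns of another DataFrame. Here it computes the Pearson correlation (the default method) between each movie's column and the Seven column, using only the users who rated both movies. Movies that share too few raters with Seven get NaN, as in the output below.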

In [8]:
correlations = movie_user.corrwith(movie_user['Seven (a.k.a. Se7en) (1995)'])
correlations.head()

title
'71 (2014)                                NaN
'Hellboy': The Seeds of Creation (2004)   NaN
'Round Midnight (1986)                    NaN
'Salem's Lot (2004)                       NaN
'Til There Was You (1997)                 NaN
dtype: float64
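
A minimal sketch on a hypothetical toy user x movie matrix shows what `corrwith` returns. A movie rated identically by the same users correlates at about 1 with the target, and a mirror-image movie correlates at -1. A movie with only one rater in common with the target gets NaN. The manual check recomputes one value with `np.corrcoef` on the co-rating users only. numpy may print a RuntimeWarning for the single-overlap column, which is the kind of warning the notebook suppresses at the top.

```python
import numpy as np
import pandas as pd

# Hypothetical user x movie matrix (index = userId, NaN = not rated)
matrix = pd.DataFrame(
    {
        "Target": [5.0, 4.0, 1.0, 2.0],
        "Similar": [4.5, 4.0, 1.5, np.nan],
        "Opposite": [1.0, 2.0, 5.0, 4.0],   # exactly 6 - Target
        "Sparse": [np.nan, np.nan, 3.0, np.nan],  # one co-rating user only
    },
    index=[1, 2, 3, 4],
)

# Pearson correlation of every column with the target column
corr = matrix.corrwith(matrix["Target"])
print(corr)

# corrwith only uses users who rated both movies: recompute "Similar" by hand
both = matrix[["Target", "Similar"]].dropna()
manual = np.corrcoef(both["Target"], both["Similar"])[0, 1]
print(np.isclose(corr["Similar"], manual))  # True
```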

##### Next, we drop the movies whose correlation is NaN and join each movie's total number of ratings onto the correlation table.

In [9]:
recommendation = pd.DataFrame(correlations,columns=['Correlation'])
recommendation.dropna(inplace=True)
recommendation = recommendation.join(Average_ratings['Total Ratings'])
recommendation.head()

| title | Correlation | Total Ratings |
|---|---|---|
| 'burbs, The (1989) | 0.175075 | 17 |
| (500) Days of Summer (2009) | 0.088741 | 42 |
| 10 Cloverfield Lane (2016) | -0.107096 | 14 |
| 10 Things I Hate About You (1999) | -0.215209 | 54 |
| 10,000 BC (2008) | 0.403631 | 17 |


### Testing The Recommendation System

##### Let's keep only the movies with more than 100 ratings and sort them by their correlation with Seven, from highest to lowest.

In [10]:
recc = recommendation[recommendation['Total Ratings']>100].sort_values('Correlation',ascending=False).reset_index()

##### Let's also merge in the movies dataset (movieId and genres) so we can check whether the recommendations make sense.

In [11]:
recc = recc.merge(movie_titles_genre,on='title', how='left')
recc.head(10)

| | title | Correlation | Total Ratings | movieId | genres |
|---|---|---|---|---|---|
| 0 | Seven (a.k.a. Se7en) (1995) | 1.0 | 203 | 47 | Mystery\|Thriller |
| 1 | Good Will Hunting (1997) | 0.514347 | 141 | 1704 | Drama\|Romance |
| 2 | Fight Club (1999) | 0.510702 | 218 | 2959 | Action\|Crime\|Drama\|Thriller |
| 3 | Reservoir Dogs (1992) | 0.494351 | 131 | 1089 | Crime\|Mystery\|Thriller |
| 4 | Saving Private Ryan (1998) | 0.437833 | 188 | 2028 | Action\|Drama\|War |
| 5 | Eternal Sunshine of the Spotless Mind (2004) | 0.434593 | 131 | 7361 | Drama\|Romance\|Sci-Fi |
| 6 | Memento (2000) | 0.42405 | 159 | 4226 | Mystery\|Thriller |
| 7 | Lord of the Rings: The Return of the King, The... | 0.42057 | 185 | 7153 | Action\|Adventure\|Drama\|Fantasy |
| 8 | Truman Show, The (1998) | 0.41861 | 125 | 1682 | Comedy\|Drama\|Sci-Fi |
| 9 | Godfather, The (1972) | 0.403199 | 192 | 858 | Crime\|Drama |


##### The top recommendations look reasonable. The movie with the highest (perfect) correlation to Seven is Seven itself, which is expected because every movie correlates perfectly with itself. Good Will Hunting (1997), Fight Club (1999) and Reservoir Dogs (1992) have the next-highest correlations with Seven, so they are the top recommendations for someone who liked Seven.
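
The cells above can be wrapped into one reusable function. The sketch below assumes a user x title matrix like `movie_user` and a per-title rating count like `Average_ratings['Total Ratings']`. The function name `recommend` and its parameters are hypothetical and not part of the original notebook. Unlike the cells above, it also drops the query movie itself, since its correlation of 1.0 carries no information. The toy usage shows why the rating-count threshold matters. "Rare" correlates perfectly with "Seed", but only because both were rated by the same two users, so it is filtered out.

```python
import numpy as np
import pandas as pd


def recommend(movie_user: pd.DataFrame, total_ratings: pd.Series, title: str,
              min_ratings: int = 100, top_n: int = 10) -> pd.DataFrame:
    """Rank titles by Pearson correlation with `title` (hypothetical helper).

    movie_user: user x title rating matrix (NaN = not rated)
    total_ratings: number of ratings per title, indexed by title
    Keeps titles with MORE than `min_ratings` ratings (matching the `> 100`
    filter above) and drops the query title itself.
    """
    correlations = movie_user.corrwith(movie_user[title])
    rec = correlations.dropna().to_frame("Correlation")
    rec = rec.join(total_ratings.rename("Total Ratings"))
    rec = rec.drop(index=title, errors="ignore")
    rec = rec[rec["Total Ratings"] > min_ratings]
    return rec.sort_values("Correlation", ascending=False).head(top_n)


# Usage on a hypothetical toy matrix (users 1-5)
matrix = pd.DataFrame(
    {
        "Seed": [5.0, 4.0, 1.0, 2.0, 3.0],
        "Close": [4.5, 4.0, 1.0, 2.5, 3.0],
        "Far": [1.0, 2.0, 5.0, 4.0, 3.0],            # exactly 6 - Seed
        "Rare": [5.0, np.nan, np.nan, np.nan, 3.0],  # only 2 ratings
    },
    index=[1, 2, 3, 4, 5],
)
counts = matrix.notna().sum()  # ratings per title
result = recommend(matrix, counts, "Seed", min_ratings=2)
print(result)
```

With the notebook's own objects, the equivalent call would be `recommend(movie_user, Average_ratings['Total Ratings'], 'Seven (a.k.a. Se7en) (1995)')`.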