# Movie Recommendation System with Collaborative Filtering
In this project I use **collaborative filtering** to build a movie recommendation system from the ratings that users gave to a set of movies.

Import the required libraries

In [1]:
import pandas as pd

In [3]:
# We set the index column to the first column (index 0), which holds the user names
ratings = pd.read_csv('ratings.csv', index_col = 0)
ratings.head()

Unnamed: 0,Avengers,The_Incredibles,The_Lion_King,Dumbo,Frozen,Ponyo
user 1,4.0,5.0,3.0,,2.0,1.0
user 2,5.0,3.0,3.0,2.0,2.0,
user 3,1.0,,,4.0,5.0,4.0
user 4,,2.0,1.0,4.0,,3.0
user 5,1.0,,2.0,3.0,3.0,4.0


There are NaN values, meaning that the user did not rate that particular movie, so we replace each NaN with 0.

In [5]:
ratings = ratings.fillna(0)
ratings.head()

Unnamed: 0,Avengers,The_Incredibles,The_Lion_King,Dumbo,Frozen,Ponyo
user 1,4.0,5.0,3.0,0.0,2.0,1.0
user 2,5.0,3.0,3.0,2.0,2.0,0.0
user 3,1.0,0.0,0.0,4.0,5.0,4.0
user 4,0.0,2.0,1.0,4.0,0.0,3.0
user 5,1.0,0.0,2.0,3.0,3.0,4.0


We need to standardize the ratings because a missing rating does not mean the movie deserves a score of 0. The function below centers each movie's ratings on their mean and scales them by their range (max minus min), so a filled-in 0 becomes a below-average score instead of an absolute zero.

In [8]:
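def standardize(row):
    # Center on the mean and scale by the range (max - min)
    new_row = (row-row.mean())/(row.max()-row.min())
    return new_row

# apply() works column by column by default, so each movie is standardized
# across users even though the parameter is called `row`
ratings_std = ratings.apply(standardize)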
ratings_std

Unnamed: 0,Avengers,The_Incredibles,The_Lion_King,Dumbo,Frozen,Ponyo
user 1,0.36,0.6,0.4,-0.65,-0.08,-0.35
user 2,0.56,0.2,0.4,-0.15,-0.08,-0.6
user 3,-0.24,-0.4,-0.6,0.35,0.52,0.4
user 4,-0.44,0.0,-0.266667,0.35,-0.48,0.15
user 5,-0.24,-0.4,0.066667,0.1,0.12,0.4
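To make the standardization concrete, here is a small worked example for a single column. It is only a sketch: it rebuilds the Avengers column by hand from the filled-in table above instead of reading `ratings.csv`, and the names `avengers`, `mean` and `spread` are introduced just for this illustration.

```python
import pandas as pd

# Avengers ratings for users 1-5 after fillna(0), copied from the table above
avengers = pd.Series(
    [4.0, 5.0, 1.0, 0.0, 1.0],
    index=["user 1", "user 2", "user 3", "user 4", "user 5"],
)

mean = avengers.mean()                    # (4 + 5 + 1 + 0 + 1) / 5 = 2.2
spread = avengers.max() - avengers.min()  # 5 - 0 = 5

# Same formula as standardize(): center on the mean, divide by the range
avengers_std = (avengers - mean) / spread
print(avengers_std.round(2).tolist())     # [0.36, 0.56, -0.24, -0.44, -0.24]
```

These values match the Avengers column of `ratings_std`, which confirms that the standardization is done per movie.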


In [9]:
from sklearn.metrics.pairwise import cosine_similarity


The function imported above computes the cosine similarity between every pair of rows in its input.
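As a rough illustration of what the cosine method measures, the sketch below computes the similarity of two movies by hand with NumPy. The helper name `cosine` is hypothetical and is not part of scikit-learn; the two vectors are the standardized Avengers and The_Incredibles columns copied from the table above.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Standardized ratings from users 1-5 for two movies
avengers = [0.36, 0.56, -0.24, -0.44, -0.24]
incredibles = [0.6, 0.2, -0.4, 0.0, -0.4]

print(round(cosine(avengers, incredibles), 4))  # 0.7067
```

A value close to 1 means the two movies were rated in the same direction by the same users, a value close to -1 means they were rated in opposite directions, and a value near 0 means there is no clear relationship.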

In [28]:
ratings_std = ratings_std.T
ratings_std

Unnamed: 0,user 1,user 2,user 3,user 4,user 5
Avengers,0.36,0.56,-0.24,-0.44,-0.24
The_Incredibles,0.6,0.2,-0.4,0.0,-0.4
The_Lion_King,0.4,0.4,-0.6,-0.266667,0.066667
Dumbo,-0.65,-0.15,0.35,0.35,0.1
Frozen,-0.08,-0.08,0.52,-0.48,0.12
Ponyo,-0.35,-0.6,0.4,0.15,0.4


If you want to use **user-based collaborative filtering**, there is no need to transpose the data. In this case we are using item-based collaborative filtering, so each movie has to be a row.
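The difference between the two variants can be sketched as follows. This is an illustration under assumptions: `ratings_by_user` is a hand-built copy of the standardized ratings with users as rows, and `pairwise_cosine` is a hypothetical NumPy stand-in for `cosine_similarity`, written here so that the example runs on its own.

```python
import numpy as np
import pandas as pd

# Standardized ratings: rows are users, columns are movies
ratings_by_user = pd.DataFrame(
    {
        "Avengers":        [0.36, 0.56, -0.24, -0.44, -0.24],
        "The_Incredibles": [0.6, 0.2, -0.4, 0.0, -0.4],
        "The_Lion_King":   [0.4, 0.4, -0.6, -0.266667, 0.066667],
        "Dumbo":           [-0.65, -0.15, 0.35, 0.35, 0.1],
        "Frozen":          [-0.08, -0.08, 0.52, -0.48, 0.12],
        "Ponyo":           [-0.35, -0.6, 0.4, 0.15, 0.4],
    },
    index=["user 1", "user 2", "user 3", "user 4", "user 5"],
)

def pairwise_cosine(frame):
    """Cosine similarity between every pair of rows of `frame`."""
    values = frame.to_numpy(dtype=float)
    unit = values / np.linalg.norm(values, axis=1, keepdims=True)
    return pd.DataFrame(unit @ unit.T, index=frame.index, columns=frame.index)

user_similarity = pairwise_cosine(ratings_by_user)     # user-based: users x users
movie_similarity = pairwise_cosine(ratings_by_user.T)  # item-based: movies x movies

print(user_similarity.shape)   # (5, 5)
print(movie_similarity.shape)  # (6, 6)
```

Without the transpose the result compares users with users; with the transpose it compares movies with movies, which is what the rest of this notebook relies on.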

In [11]:
movie_similarity = cosine_similarity(ratings_std)
movie_similarity

array([[ 1.        ,  0.70668875,  0.81368151, -0.79941088, -0.02539184,
        -0.91410609],
       [ 0.70668875,  1.        ,  0.72310153, -0.84515425, -0.5189993 ,
        -0.84337386],
       [ 0.81368151,  0.72310153,  1.        , -0.84794611, -0.3799803 ,
        -0.80218063],
       [-0.79941088, -0.84515425, -0.84794611,  1.        ,  0.14803913,
         0.72374686],
       [-0.02539184, -0.5189993 , -0.3799803 ,  0.14803913,  1.        ,
         0.39393939],
       [-0.91410609, -0.84337386, -0.80218063,  0.72374686,  0.39393939,
         1.        ]])

We convert the above NumPy array to a DataFrame, labeled with the movie names, for easier reading.

In [12]:
movie_similarity_df = pd.DataFrame(movie_similarity, index = ratings.columns, columns = ratings.columns)
movie_similarity_df

Unnamed: 0,Avengers,The_Incredibles,The_Lion_King,Dumbo,Frozen,Ponyo
Avengers,1.0,0.706689,0.813682,-0.799411,-0.025392,-0.914106
The_Incredibles,0.706689,1.0,0.723102,-0.845154,-0.518999,-0.843374
The_Lion_King,0.813682,0.723102,1.0,-0.847946,-0.37998,-0.802181
Dumbo,-0.799411,-0.845154,-0.847946,1.0,0.148039,0.723747
Frozen,-0.025392,-0.518999,-0.37998,0.148039,1.0,0.393939
Ponyo,-0.914106,-0.843374,-0.802181,0.723747,0.393939,1.0


Using Avengers as an example, the table shows that it is similar to The Incredibles, with a similarity of 0.706689, and to The Lion King, with a similarity of 0.813682. The similarity comes from the rating patterns rather than from genre labels: users who rated Avengers highly also tended to rate these two movies highly.

You can also see how dissimilar it is to Dumbo (-0.799411) and Ponyo (-0.914106), while its similarity to Frozen (-0.025392) is negative but close to zero.

In [20]:
def get_similarity(movie_name, rating):
    similar_score = movie_similarity_df[movie_name]*(rating-2.5)
    
    similar_score = similar_score.sort_values(ascending = False)
    
    return similar_score

The function above takes two arguments: movie_name and the rating the user gave that movie. It multiplies the movie's column of similarity scores by (rating-2.5) and returns every movie with its weighted score, sorted from highest to lowest.

The (rating-2.5) term sets our threshold. Since ratings go up to 5, we use 2.5 as the threshold. A rating below 2.5 indicates that the user did not like the movie and produces a negative weight; a rating above 2.5 indicates that the user liked the movie and produces a positive weight.
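A quick worked example shows how the threshold flips the sign of a score. It is only a sketch: `similarity` is the Avengers / The_Lion_King value copied from the similarity table, and `example_ratings` and `scores` are names made up for this illustration.

```python
# Similarity between Avengers and The_Lion_King, taken from the similarity table
similarity = 0.813682

# One rating below the threshold, one exactly on it, one above it
example_ratings = [1, 2.5, 5]

# Same weighting as get_similarity(): similarity * (rating - 2.5)
scores = {rating: round(similarity * (rating - 2.5), 2) for rating in example_ratings}
print(scores)  # {1: -1.22, 2.5: 0.0, 5: 2.03}
```

A low rating for Avengers pushes The Lion King down, a rating exactly at the threshold says nothing either way, and a high rating pushes it up.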

In [22]:
get_similarity('Avengers',5)

Avengers           2.500000
The_Lion_King      2.034204
The_Incredibles    1.766722
Frozen            -0.063480
Dumbo             -1.998527
Ponyo             -2.285265
Name: Avengers, dtype: float64

Apart from Avengers itself, the highest-scoring movie is The Lion King, so it is the next movie to recommend to this user.

In [24]:
get_similarity('Frozen', 1)

The_Incredibles    0.778499
The_Lion_King      0.569970
Avengers           0.038088
Dumbo             -0.222059
Ponyo             -0.590909
Frozen            -1.500000
Name: Frozen, dtype: float64

Since the user clearly did not like **'Frozen'**, its score is the lowest, and the movies most similar to it, Ponyo and Dumbo, also get negative scores. The scores also suggest that the user would probably like movies such as The Incredibles.

In [25]:
action_lover = [('The_Incredibles', 5), ('Dumbo',1), ('Frozen',1)]

In [26]:
# DataFrame.append was removed in pandas 2.0, so we build the frame from a list of Series
similar_scores = pd.DataFrame([get_similarity(movie, rating) for movie, rating in action_lover])
similar_scores = similar_scores.reset_index(drop = True)

similar_scores.head()

Unnamed: 0,The_Incredibles,The_Lion_King,Avengers,Frozen,Ponyo,Dumbo
0,2.5,1.807754,1.766722,-1.297498,-2.108435,-2.112886
1,1.267731,1.271919,1.199116,-0.222059,-1.08562,-1.5
2,0.778499,0.56997,0.038088,-1.5,-0.590909,-0.222059


In [27]:
similar_scores.sum().sort_values(ascending = False)

The_Incredibles    4.546230
The_Lion_King      3.649643
Avengers           3.003926
Frozen            -3.019557
Ponyo             -3.784964
Dumbo             -3.834944
dtype: float64

Above, we took one user's ratings for 3 movies and scored every movie based on those ratings. The user gave **'The Incredibles'** a rating of 5 and gave **'Frozen'** and **'Dumbo'** a rating of 1, so the scores for those two movies are low and, as you can see from the list, they sit near the bottom.

The movies recommended to the user will be The Lion King and Avengers, since their similarity to The Incredibles is high.
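The whole procedure could be wrapped in a single helper, sketched below. This is one possible design rather than part of the original notebook: the function name `recommend` and its `threshold` parameter are hypothetical, the similarity matrix is copied by hand from `movie_similarity_df`, and dropping the movies the user has already rated is an extra assumption.

```python
import pandas as pd

movies = ["Avengers", "The_Incredibles", "The_Lion_King", "Dumbo", "Frozen", "Ponyo"]

# Item-item similarity matrix, copied from movie_similarity_df
movie_similarity_df = pd.DataFrame(
    [
        [1.0, 0.706689, 0.813682, -0.799411, -0.025392, -0.914106],
        [0.706689, 1.0, 0.723102, -0.845154, -0.518999, -0.843374],
        [0.813682, 0.723102, 1.0, -0.847946, -0.37998, -0.802181],
        [-0.799411, -0.845154, -0.847946, 1.0, 0.148039, 0.723747],
        [-0.025392, -0.518999, -0.37998, 0.148039, 1.0, 0.393939],
        [-0.914106, -0.843374, -0.802181, 0.723747, 0.393939, 1.0],
    ],
    index=movies,
    columns=movies,
)

def recommend(user_ratings, similarity_df, threshold=2.5):
    """Sum the weighted similarity scores and drop movies the user already rated."""
    total = pd.Series(0.0, index=similarity_df.index)
    for movie, rating in user_ratings:
        # Same weighting as get_similarity(): similarity * (rating - threshold)
        total += similarity_df[movie] * (rating - threshold)
    rated = [movie for movie, _ in user_ratings]
    return total.drop(labels=rated).sort_values(ascending=False)

action_lover = [("The_Incredibles", 5), ("Dumbo", 1), ("Frozen", 1)]
recommendations = recommend(action_lover, movie_similarity_df)
print(recommendations)
```

For `action_lover` this ranks The_Lion_King first and Avengers second, with Ponyo last, which agrees with the summed scores above once the three rated movies are removed.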