
# Collaborative Filtering on Movie Preference

Here I use the data Professor Ashwin collected from our classmates last quarter, which contains the ratings students gave to 47 movies.

Important reference: https://www.kaggle.com/ajmichelutti/collaborative-filtering-on-anime-data. Many thanks to the author for providing clear, straightforward code for collaborative filtering and prediction.

## Data Ingestion

In [0]:
import pandas as pd
import numpy as np
import scipy as sp
import scipy.sparse  # load the submodule explicitly so sp.sparse.csr_matrix is available
from sklearn.metrics.pairwise import cosine_similarity
import operator
%matplotlib inline

In [0]:
from google.colab import files
uploaded = files.upload()

Saving hw7.csv to hw7.csv


In [0]:
movie = pd.read_csv('hw7.csv')

## Create Pivot Table

In [0]:
piv = movie.pivot_table(index=['User'], columns=['Movie Name'], values='Rating')

In [0]:
print(piv.shape)
piv.head()

(42, 47)


| User | 12 Years a Slave | A Prophet | A Separation | Amour | Argo | Arrival | Avatar | Beasts of the Southern Wild | Birdman | Black Swan | ... | The Social Network | The Tree of Life | The White Ribbon | The Wolf of Wall Street | Three Billboards Outside Ebbing, Missouri | Toni Erdmann | Toy Story 3 | True Grit | Up in the Air | Zero Dark Thirty |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Abhinav | NaN | NaN | NaN | NaN | 5.0 | 3.0 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | 4.0 | NaN | NaN | NaN | NaN | NaN | NaN |
| Albina | NaN | NaN | NaN | NaN | 5.0 | NaN | 5.0 | NaN | NaN | 4.0 | ... | 5.0 | 4.0 | NaN | 5.0 | NaN | NaN | 4.0 | NaN | 5.0 | NaN |
| April | NaN | NaN | NaN | NaN | 5.0 | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| Ashish | NaN | NaN | NaN | NaN | NaN | NaN | 4.0 | NaN | NaN | NaN | ... | 4.0 | NaN | NaN | 3.0 | NaN | NaN | 5.0 | NaN | 3.0 | NaN |
| Ashwin | NaN | NaN | NaN | NaN | NaN | NaN | 4.0 | NaN | NaN | NaN | ... | 4.0 | NaN | NaN | 5.0 | NaN | NaN | 3.0 | NaN | NaN | 4.0 |
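
To make the reshaping step concrete, here is a minimal sketch on made-up data. It assumes the long format implied by the `pivot_table` call above (one row per rating, with `User`, `Movie Name`, and `Rating` columns); the names and ratings in `toy` are hypothetical and not taken from hw7.csv.

```python
import pandas as pd

# Hypothetical toy ratings in long format: one row per (user, movie) rating
toy = pd.DataFrame({
    "User": ["Ann", "Ann", "Bob", "Bob", "Cal"],
    "Movie Name": ["Argo", "Avatar", "Argo", "Arrival", "Avatar"],
    "Rating": [5, 4, 3, 5, 2],
})

# One row per user, one column per movie; unrated combinations become NaN
toy_piv = toy.pivot_table(index="User", columns="Movie Name", values="Rating")
print(toy_piv)
```

Because `pivot_table` aggregates with the mean by default, a user who rated the same movie twice would end up with the average of those ratings in a single cell.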


## Calculate Similarity



In [0]:
# Per user: subtract the mean rating, then divide by the rating range (max - min)
piv_norm = piv.apply(lambda x: (x-np.mean(x))/(np.max(x)-np.min(x)), axis=1)

In [0]:
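# Treat unrated movies as 0, which is the user's own average after centering
piv_norm.fillna(0, inplace=True)
# Transpose so that rows are movies and columns are users
piv_norm = piv_norm.T
# Drop users whose column is all zeros (no usable rating signal)
piv_norm = piv_norm.loc[:, (piv_norm != 0).any(axis=0)]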
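
A small worked example may help show what the normalization and zero-fill do to a single user's row. This is only a sketch: the two users below are hypothetical, and `normalize` is a named stand-in for the lambda used above.

```python
import numpy as np
import pandas as pd

def normalize(x):
    # Same formula as the lambda above: center on the mean, divide by the range
    return (x - x.mean()) / (x.max() - x.min())

# Hypothetical user who rated three of four movies
row = pd.Series({"Argo": 5.0, "Arrival": 3.0, "Avatar": np.nan, "Birdman": 4.0})
filled = normalize(row).fillna(0)
print(filled)

# Hypothetical user who gave every movie the same rating: 0/0 yields NaN
constant = normalize(pd.Series({"Argo": 5.0, "Avatar": 5.0}))
print(constant.isna().all())
```

In this toy row the mean is 4 and the range is 2, so Argo becomes 0.5 and Arrival becomes -0.5. Birdman (rated exactly at the mean) and Avatar (unrated) both end up as 0, so after the fill the two cases can no longer be told apart. A user with identical ratings everywhere becomes all zeros, which is why such users are filtered out afterwards.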

In [0]:
piv_sparse = sp.sparse.csr_matrix(piv_norm.values)

In [0]:
# Rows of piv_sparse are movies, so this compares movies; the transpose compares users
item_similarity = cosine_similarity(piv_sparse)
user_similarity = cosine_similarity(piv_sparse.T)

In [0]:
item_sim_df = pd.DataFrame(item_similarity, index = piv_norm.index, columns = piv_norm.index)
user_sim_df = pd.DataFrame(user_similarity, index = piv_norm.columns, columns = piv_norm.columns)
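
For intuition about the similarity matrices, here is a NumPy-only sketch of pairwise cosine similarity (the normalized dot product between rows). The helper `cosine_sim` and the matrix `X` are hypothetical illustrations, not part of the notebook, and treating all-zero rows as having similarity 0 is a choice made for this sketch.

```python
import numpy as np

def cosine_sim(matrix):
    """Pairwise cosine similarity between the rows of `matrix`."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # sketch-level choice: all-zero rows get similarity 0
    unit = matrix / norms
    return unit @ unit.T

# Hypothetical normalized ratings: rows are movies, columns are users
X = np.array([
    [0.5, -0.5, 0.0],
    [0.5, -0.5, 0.0],
    [-0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
])

item_sim = cosine_sim(X)    # movie-to-movie, shape (4, 4)
user_sim = cosine_sim(X.T)  # user-to-user, shape (3, 3)
print(np.round(item_sim, 2))
```

Movies 0 and 1 were rated identically, so their similarity is 1; movie 2 was rated in the opposite direction, giving -1; movie 3 shares no raters with movie 0, giving 0.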

## Define Functions

In [0]:
# Print the 10 movies most similar to the input movie
def top_movies(movie_name):
    print('Similar shows to {} include:\n'.format(movie_name))
    # Skip position 0, which is normally the movie itself (similarity 1.0)
    ranked = item_sim_df.sort_values(by=movie_name, ascending=False).index[1:11]
    for count, item in enumerate(ranked, start=1):
        print('No. {}: {}'.format(count, item))
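
The ranking above skips the first sorted entry on the assumption that it is the queried movie itself. A possible alternative, sketched here with a hypothetical helper `top_similar` and a made-up similarity table, is to drop the queried label explicitly, which stays correct even if another item ties at similarity 1.0.

```python
import pandas as pd

def top_similar(sim_df, name, n=10):
    """Return the n entries most similar to `name`, excluding `name` itself."""
    scores = sim_df[name].drop(labels=name)
    return scores.nlargest(n)

# Hypothetical similarity table for three movies
labels = ["Argo", "Arrival", "Avatar"]
toy_sim = pd.DataFrame(
    [[1.0, 0.8, -0.2],
     [0.8, 1.0, 0.1],
     [-0.2, 0.1, 1.0]],
    index=labels, columns=labels,
)

result = top_similar(toy_sim, "Argo", n=2)
print(result)
```

The same helper would work for users by passing a user-by-user similarity table instead.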

In [0]:
# Print the 10 users most similar to the input user
def top_users(user):
    if user not in piv_norm.columns:
        return 'No data available on user {}'.format(user)

    print('Most Similar Users:\n')
    # Sort once; position 0 is normally the user themselves, so skip it
    ranked = user_sim_df.sort_values(by=user, ascending=False)[user].iloc[1:11]
    for other, sim in ranked.items():
        print('User #{0}, Similarity value: {1:.2f}'.format(other, sim))

In [0]:
# Predict a user's rating for a movie as the similarity-weighted average
# of the ratings given by the other users who rated that movie
def predicted_rating(movie_name, user):
    # Similarity of every other user to `user`; position 0 is normally the user themselves
    sims = user_sim_df.sort_values(by=user, ascending=False)[user].iloc[1:]
    rating_list = []
    weight_list = []
    for other, similarity in sims.items():
        rating = piv.loc[other, movie_name]
        if np.isnan(rating):
            continue  # this user has not rated the movie
        rating_list.append(rating * similarity)
        weight_list.append(similarity)
    if sum(weight_list) == 0:
        return np.nan  # no usable neighbours, so no prediction
    return sum(rating_list) / sum(weight_list)
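
The prediction is a similarity-weighted average: each neighbour's rating is multiplied by that neighbour's similarity to the target user, and the sum is divided by the total similarity of the neighbours who rated the movie. The sketch below works through the arithmetic on made-up numbers; `weighted_prediction` and both arrays are hypothetical and only illustrate the formula.

```python
import numpy as np

def weighted_prediction(ratings, sims):
    """Similarity-weighted average of the non-missing ratings."""
    rated = ~np.isnan(ratings)
    weights = sims[rated]
    total = weights.sum()
    if total == 0:
        return float("nan")  # no usable neighbours
    return float((ratings[rated] * weights).sum() / total)

# Hypothetical neighbours: the second one has not rated the movie
neighbor_ratings = np.array([5.0, np.nan, 3.0, 4.0])
neighbor_sims = np.array([0.5, 0.4, 0.3, 0.2])

prediction = weighted_prediction(neighbor_ratings, neighbor_sims)
print(round(prediction, 2))
```

Here the numerator is 5(0.5) + 3(0.3) + 4(0.2) = 4.2 and the weights of the three raters sum to 1.0, so the prediction is about 4.2. Note that negative similarities, which cosine similarity on centered ratings can produce, would pull the weighted sum in the opposite direction and can push the result outside the rating scale.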

## Function: Find the most similar movies

In [0]:
movie_name = input('Please input a movie name to find similar ones: ')

Please input a movie name to find similar ones: Inception


In [0]:
top_movies(movie_name)

Similar shows to Inception include:

No. 1: Spotlight
No. 2: The Artist
No. 3: Precious
No. 4: Manchester by the Sea
No. 5: Zero Dark Thirty
No. 6: Toy Story 3
No. 7: The Secret in Their Eyes
No. 8: Blue is the Warmest Colour
No. 9: Toni Erdmann
No. 10: Amour


## Function: Find the most similar users

In [0]:
user_name = input('Please input a user name to find similar users: ')

Please input a user name to find similar users: Raylia


In [0]:
top_users(user_name)

Most Similar Users:

User #April, Similarity value: 0.50
User #Sheel, Similarity value: 0.36
User #Brenda, Similarity value: 0.33
User #Chunjin, Similarity value: 0.29
User #Renee, Similarity value: 0.24
User #Hazel, Similarity value: 0.16
User #Malavika, Similarity value: 0.14
User #Sidney, Similarity value: 0.12
User #Chloe, Similarity value: 0.07
User #Xinyu, Similarity value: 0.03


## Function: Predict the rating for a specific movie and user

In [0]:
predict_movie = input('Please input the name of the movie whose rating you want to predict: ')

Please input the name of the movie whose rating you want to predict: Amour


In [0]:
predict_user = input('Please input the name of the user whose rating you want to predict: ')

Please input the name of the user whose rating you want to predict: Raylia


In [0]:
predicted_rating(predict_movie, predict_user)

4.0