### Libraries

In [1]:
import numpy as np
import pandas as pd
import matrix_factorization_utilities

Ratings Data Set

In [2]:
ratings = pd.read_csv('../Data/Ratings.csv')
ratings.head(3)

   user_id  movie_id  rating
0        1        28       4
1        1        26       4
2        1         9       4


Movies Data Set

In [3]:
movies = pd.read_csv('../Data/Movies.csv', index_col='movie_id')
movies.head(3)

                         title                 genre
movie_id
1                The Sheriff 1  crime drama, western
2         The Big City Judge 1           legal drama
3                The Sheriff 2  crime drama, western


Converting User Reviews into a Matrix

In [4]:
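# Rows = users, columns = movies, cells = rating (NaN where a user has not rated a movie).
# aggfunc='max' keeps the highest rating if a user rated the same movie more than once.
matrix = pd.pivot_table(ratings,
                        index='user_id',
                        columns='movie_id',
                        values='rating',
                        aggfunc='max')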
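To see what the pivot produces, here is a toy version built from a few made-up ratings (not taken from Ratings.csv). Each user becomes a row, each movie a column, and every user/movie pair without a rating becomes NaN. The duplicate (user 1, movie 10) row shows why an `aggfunc` is needed at all: it decides which of the two ratings ends up in the cell.

```python
import numpy as np
import pandas as pd

# Made-up ratings in the same long format as Ratings.csv
ratings = pd.DataFrame({
    'user_id':  [1, 1, 1, 2, 2, 3],
    'movie_id': [10, 10, 20, 10, 30, 20],
    'rating':   [4, 2, 5, 3, 2, 1],   # user 1 rated movie 10 twice
})

# One row per user, one column per movie; missing pairs become NaN
matrix = pd.pivot_table(ratings, index='user_id', columns='movie_id',
                        values='rating', aggfunc='max')
print(matrix)
```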

Apply Matrix Factorization to Find Latent Features (hidden features inferred from the observed ratings)

In [5]:
U, M = matrix_factorization_utilities.low_rank_matrix_factorization(matrix.values,
                                                                    num_features=15,
                                                                    regularization_amount=0.1)

         Current function value: 32.504368
         Iterations: 3000
         Function evaluations: 4491
         Gradient evaluations: 4491
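The internals of `matrix_factorization_utilities.low_rank_matrix_factorization` are not shown in this notebook, so the following is only a minimal sketch of the same idea. It assumes a squared-error cost on the observed ratings plus an L2 penalty, minimized with plain gradient descent; `factorize` and its `learning_rate`/`num_steps`/`seed` parameters are hypothetical and are not that module's API.

```python
import numpy as np

def factorize(ratings, num_features=15, regularization_amount=0.1,
              learning_rate=0.01, num_steps=5000, seed=0):
    """Regularized low-rank factorization by plain gradient descent (sketch).

    ratings: users x movies float array, NaN where there is no rating.
    Returns U (users x features) and M (features x movies), so that U @ M
    approximates the observed ratings.
    """
    rng = np.random.default_rng(seed)
    num_users, num_movies = ratings.shape
    # Small random starting values for the latent features
    U = rng.normal(scale=0.1, size=(num_users, num_features))
    M = rng.normal(scale=0.1, size=(num_features, num_movies))

    observed = ~np.isnan(ratings)
    R = np.where(observed, ratings, 0.0)  # placeholder zeros, masked out below

    for _ in range(num_steps):
        # Prediction error, counted only where a real rating exists
        error = np.where(observed, U @ M - R, 0.0)
        # Gradients of 0.5*sum(error**2) + 0.5*reg*(sum(U**2) + sum(M**2))
        grad_U = error @ M.T + regularization_amount * U
        grad_M = U.T @ error + regularization_amount * M
        U -= learning_rate * grad_U
        M -= learning_rate * grad_M
    return U, M

# Example: 3 users x 4 movies with made-up ratings, NaN = not rated
toy = np.array([[5, 4, np.nan, 1],
                [4, np.nan, 1, 1],
                [1, 1, 5, np.nan]], dtype=float)
U, M = factorize(toy, num_features=2)
print(np.round(U @ M, 1))  # a predicted rating for every cell, including the NaN ones
```

Larger matrices may need a smaller `learning_rate` to keep gradient descent from diverging.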


Find all Predicted Ratings by Multiplying U and M Matrices

In [6]:
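predicted_ratings = np.matmul(U, M)  # users x movies

print('Enter a User ID to get Recommendation between 1 and 100 :')
user_id = int(input())
print(f'\nMovies Previously Reviewed by User ID : {user_id}\n')

# Ratings this user already gave, joined with movie titles and genres
reviewed_movies = ratings[ratings['user_id'] == user_id]
reviewed_movies = reviewed_movies.join(movies, on='movie_id')

print(f'Reviewed Movies : \n{reviewed_movies[["title", "genre", "rating"]]}')

# Look up the user's row by ID and label each prediction with its movie ID,
# instead of assuming IDs match row/column positions
user_ratings = pd.Series(predicted_ratings[matrix.index.get_loc(user_id)],
                         index=matrix.columns.get_level_values('movie_id'))

# assign() returns a copy, so `movies` never gains a 'rating' column
# and the join above still works if this cell is re-run
recommend = movies.assign(rating=user_ratings)

already_reviewed = reviewed_movies['movie_id']
recommend = recommend[~recommend.index.isin(already_reviewed)]
recommend = recommend.sort_values(by='rating', ascending=False)

print(f'\nRecommendations : \n{recommend[["title", "genre", "rating"]].head(5)}')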

Enter a User ID to get Recommendation between 1 and 100 :
7

Movies Previously Reviewed by User ID : 7

Reviewed Movies : 
                    title                          genre  rating
40          The Sheriff 3           crime drama, western       5
41  The Serious Detective                detective drama       4
42  Just a Regular Family                        reality       2
43             Horrorfest                         horror       5
44          The Sheriff 1           crime drama, western       5
45        Master Criminal  thriller, horror, crime drama       5
46      Attack on Earth 2                 sci-fi, action       5
47       Trapped in Space                sci-fi, mystery       5

Recommendations : 
                         title                      genre    rating
movie_id                                                           
29          Post-Apocalyptia 1  sci-fi, thriller, mystery  5.167552
14              The Spy Family                  spy drama  4.913145
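Two details of the cell above, sketched with made-up matrices: each predicted rating is the dot product of one user's row of `U` with one movie's column of `M`, and recommending means ranking that user's row while skipping movies already rated. `top_n_unseen` is a hypothetical helper that works on column positions; in the notebook you would map the positions back to movie IDs through the pivot table's columns.

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(size=(4, 3))   # 4 users x 3 latent features (made-up values)
M = rng.normal(size=(3, 6))   # 3 latent features x 6 movies

predicted = U @ M             # same result as np.matmul(U, M)

# Entry (i, j) is the dot product of user i's features and movie j's features
assert np.isclose(predicted[2, 5], U[2] @ M[:, 5])

def top_n_unseen(scores, already_rated, n=5):
    """Column positions of the n highest scores, skipping already-rated positions."""
    order = np.argsort(scores)[::-1]  # highest score first
    return [j for j in map(int, order) if j not in already_rated][:n]

print(top_n_unseen(predicted[0], already_rated={1, 4}, n=3))
```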


Modeling Users and Movies

Movie Ratings > Matrix Factorization > U (User Attributes) | M (Movie Attributes)
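In matrix form, with R the users × movies ratings matrix from the pivot table and k the number of latent features (`num_features=15` in the factorization cell), the model is the following; the factors are fitted only to the cells of R that hold real ratings.

```latex
R \approx U M, \qquad
U \in \mathbb{R}^{n_{\text{users}} \times k}, \quad
M \in \mathbb{R}^{k \times n_{\text{movies}}}, \qquad
\hat{r}_{ij} = \sum_{f=1}^{k} U_{if} \, M_{fj}
```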

Overfitting (the model fits the training ratings too closely and generalizes poorly, e.g. it fails to separate a horror comedy from a serious horror movie)

How to Prevent Overfitting?

1. Regularization

- Controls how much weight the model can assign to any single attribute.
- The higher the regularization amount, the less weight the model can put on any single attribute.
- Suggested settings: 0.1 for a small data set; 1.0, 10.0, etc. for a large data set.

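2. Root Mean Squared Error (RMSE)
- Measures the difference between users' real ratings and the predicted ratings. It does not prevent overfitting by itself, but comparing RMSE on training data and on held-out data reveals it.
- The lower the RMSE, the more accurate the model.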
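Both ideas can be written down concretely. This sketch assumes a common form of the objective (half the squared error over the observed ratings plus an L2 penalty scaled by the regularization amount); the exact cost inside `matrix_factorization_utilities` may differ, and both function names are hypothetical.

```python
import numpy as np

def masked_rmse(real, predicted):
    """RMSE over the observed (non-NaN) entries of `real` only."""
    observed = ~np.isnan(real)
    diff = real[observed] - predicted[observed]
    return float(np.sqrt(np.mean(diff ** 2)))

def regularized_cost(real, U, M, regularization_amount):
    """Half the squared error on observed ratings plus an L2 penalty on U and M.

    The penalty grows with the size of the feature weights, so a larger
    regularization_amount keeps every weight, including any single
    attribute's, smaller.
    """
    observed = ~np.isnan(real)
    error = (U @ M)[observed] - real[observed]
    penalty = regularization_amount * (np.sum(U ** 2) + np.sum(M ** 2))
    return float(0.5 * np.sum(error ** 2) + 0.5 * penalty)

real = np.array([[5.0, np.nan],
                 [3.0, 1.0]])
predicted = np.array([[4.0, 2.0],
                      [3.0, 2.0]])
print(masked_rmse(real, predicted))  # the NaN cell is ignored
```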

How to Measure Accuracy?

- Split the review data into a training set and a test set.
- Run matrix factorization on the training set only.
- Check accuracy (RMSE) on the test set.
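A minimal sketch of the split step, assuming the ratings are already a users × movies NumPy array with NaN for missing entries (as in `matrix.values`). Rather than dropping whole users, it hides a random fraction of individual ratings; `split_observed` is a hypothetical helper.

```python
import numpy as np

def split_observed(matrix, test_fraction=0.2, seed=0):
    """Move a random fraction of the observed ratings into a test matrix.

    Each held-out rating becomes NaN in `train` and is kept in `test`;
    every other cell of `test` is NaN.
    """
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(~np.isnan(matrix))  # positions of real ratings
    n_test = int(round(test_fraction * len(rows)))
    picked = rng.choice(len(rows), size=n_test, replace=False)
    r, c = rows[picked], cols[picked]

    train = matrix.copy()
    test = np.full(matrix.shape, np.nan)
    test[r, c] = matrix[r, c]   # keep held-out ratings for scoring
    train[r, c] = np.nan        # and hide them from training
    return train, test
```

Factorize only `train` (for example, pass it to `low_rank_matrix_factorization` in place of `matrix.values`), then compute RMSE over the non-NaN cells of `test`.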

In [7]:
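# In-sample RMSE: computed on the same ratings the model was trained on, so it shows
# how closely the model fits those ratings, not how well it predicts unseen ones.
rmse = matrix_factorization_utilities.RMSE(matrix, predicted_ratings)
print(f'RMSE : {rmse}')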

RMSE : 0.02053592227169507


Save User Rating Features, Movie Features and Predicted Ratings 

In [8]:
import pickle

# `with` closes (and flushes) each file once it has been written
for obj, path in [(U, '../Data/User_Rating_Features.dat'),
                  (M, '../Data/Movie_Features.dat'),
                  (predicted_ratings, '../Data/Predicted_Ratings.dat')]:
    with open(path, mode='wb') as f:
        pickle.dump(obj, f)
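A later notebook can read the saved arrays back with `pickle.load`. The sketch below does a round trip in a temporary folder with a made-up matrix; in practice you would call the hypothetical helper as `load_pickle('../Data/Predicted_Ratings.dat')`.

```python
import os
import pickle
import tempfile

import numpy as np

def load_pickle(path):
    """Read back one object written with pickle.dump to a file opened in 'wb' mode."""
    with open(path, 'rb') as f:
        return pickle.load(f)

# Round trip with a made-up matrix in a temporary folder
predicted_ratings = np.arange(6, dtype=float).reshape(2, 3)
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'Predicted_Ratings.dat')
    with open(path, 'wb') as f:
        pickle.dump(predicted_ratings, f)
    restored = load_pickle(path)

print(np.array_equal(restored, predicted_ratings))  # True
```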