# Recommender system based on collaborative filtering
+ Based on a subset of the full dataset, which contains 26 million ratings from 270,000 users across 45,000 movies. Ratings are on a scale of 1-5 and were obtained from the official GroupLens website.
+ [ratings_small.csv](https://www.kaggle.com/rounakbanik/the-movies-dataset/downloads/ratings_small.csv/7): the subset used here, with 100,000 ratings from 700 users on 9,000 movies.

# Import libraries

In [18]:
import pandas as pd
import numpy as np
from surprise import Reader, Dataset, SVD
from surprise.model_selection import cross_validate

`is_main_module()` returns `True` only when this notebook is run directly. It guards the expensive steps below so that they are skipped when the notebook is loaded from hybrid_recommender.ipynb.

In [19]:
def is_main_module():
    return __name__ == '__main__' and '__file__' not in globals()
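
To see how the check separates a directly executed notebook from other contexts, the same condition can be evaluated against hand-built namespaces. This is a minimal sketch: `check_namespace` and the three dictionaries are hypothetical stand-ins for `globals()`. It assumes that whatever mechanism loads this notebook from hybrid_recommender.ipynb either sets `__name__` to something other than `'__main__'` or defines `__file__`.

```python
def check_namespace(namespace):
    """Apply the same test as is_main_module() to an explicit namespace dict."""
    return namespace.get('__name__') == '__main__' and '__file__' not in namespace


# Hypothetical namespaces, for illustration only.
# A notebook kernel typically runs as '__main__' and does not define __file__.
notebook_run_directly = {'__name__': '__main__'}
# A plain script also runs as '__main__', but Python defines __file__ for it.
script_run_directly = {'__name__': '__main__', '__file__': 'collaborative_filtering.py'}
# An imported module gets its own module name instead of '__main__'.
imported_module = {'__name__': 'collaborative_filtering'}

for label, namespace in [
    ('notebook run directly', notebook_run_directly),
    ('script run directly', script_run_directly),
    ('imported module', imported_module),
]:
    print(f'{label}: {check_namespace(namespace)}')
```

Only the first case passes the check, which is why the guarded cells run when the notebook is opened on its own.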

# Load dataset

In [20]:
ratings = pd.read_csv('datasets/ratings_small.csv')

### Peek at data

In [21]:
ratings.head()

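|   | userId | movieId | rating | timestamp |
|---|--------|---------|--------|------------|
| 0 | 1 | 31 | 2.5 | 1260759144 |
| 1 | 1 | 1029 | 3.0 | 1260759179 |
| 2 | 1 | 1061 | 3.0 | 1260759182 |
| 3 | 1 | 1129 | 2.0 | 1260759185 |
| 4 | 1 | 1172 | 4.0 | 1260759205 |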


# Read data for surprise library functions

In [22]:
reader = Reader()

In [23]:
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)

# Check the performance of the model using cross validation

In [24]:
collaborative_filtering = SVD()
if is_main_module():
    cross_validate(collaborative_filtering, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.8977  0.8959  0.9014  0.8954  0.8962  0.8973  0.0022  
MAE (testset)     0.6915  0.6889  0.6965  0.6891  0.6863  0.6904  0.0034  
Fit time          7.39    7.37    7.50    7.32    7.25    7.37    0.08    
Test time         0.32    0.30    0.35    0.25    0.25    0.29    0.04    


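+ The mean RMSE across the 5 folds is 0.8973 (std 0.0022) and the mean MAE is 0.6904, so predictions are off by about 0.7 of a rating point on average, which is good on a 1-5 scale.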
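
For reference, the two error measures and the summary columns of the table can be reproduced with the standard library. This is a minimal sketch: `rmse` and `mae` are hypothetical helpers written for illustration, not surprise's own functions, and the fold values are copied from the RMSE row of the table above.

```python
import math
import statistics


def rmse(actual, predicted):
    """Root mean squared error: squaring penalises large misses more heavily."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(statistics.fmean(squared_errors))


def mae(actual, predicted):
    """Mean absolute error: the average size of a miss, in rating points."""
    return statistics.fmean([abs(a - p) for a, p in zip(actual, predicted)])


# Per-fold RMSE values copied from the cross-validation table.
fold_rmse = [0.8977, 0.8959, 0.9014, 0.8954, 0.8962]

mean_rmse = statistics.fmean(fold_rmse)
std_rmse = statistics.pstdev(fold_rmse)  # population standard deviation

print(f'mean={mean_rmse:.4f} std={std_rmse:.4f}')  # mean=0.8973 std=0.0022
```

The population standard deviation (`pstdev`) reproduces the 0.0022 in the table's Std column.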

# Train the final model

In [25]:
trainset = data.build_full_trainset()
collaborative_filtering.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x25381c23eb8>
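
For intuition about what the fitted model stores, surprise's documentation gives the SVD prediction as the global mean plus a user bias, an item bias, and the dot product of the user and item factor vectors. The sketch below evaluates that formula by hand. `predict_rating` is a hypothetical helper, and every number is made up for illustration rather than read from the trained model. The clipping step assumes the default `Reader()` rating scale of (1, 5).

```python
def predict_rating(global_mean, user_bias, item_bias, user_factors, item_factors,
                   rating_scale=(1, 5)):
    """Estimate a rating as mean + biases + factor dot product, clipped to the scale."""
    interaction = sum(p * q for p, q in zip(user_factors, item_factors))
    estimate = global_mean + user_bias + item_bias + interaction
    low, high = rating_scale
    return min(high, max(low, estimate))


# Illustrative values only; a real model learns these during fit().
example = predict_rating(
    global_mean=3.5,
    user_bias=-0.4,            # this user rates lower than average
    item_bias=-0.3,            # this movie is rated lower than average
    user_factors=[0.1, -0.2],  # two latent factors for brevity
    item_factors=[0.5, 0.3],
)
print(round(example, 2))
```

Training adjusts the biases and factor vectors so that these estimates match the observed ratings as closely as possible.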

# Predict some ratings

In [26]:
ratings[ratings['userId'] == 1].head()

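|   | userId | movieId | rating | timestamp |
|---|--------|---------|--------|------------|
| 0 | 1 | 31 | 2.5 | 1260759144 |
| 1 | 1 | 1029 | 3.0 | 1260759179 |
| 2 | 1 | 1061 | 3.0 | 1260759182 |
| 3 | 1 | 1129 | 2.0 | 1260759185 |
| 4 | 1 | 1172 | 4.0 | 1260759205 |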


In [27]:
collaborative_filtering.predict(uid=1, iid=31, r_ui=2.5, verbose=True)
collaborative_filtering.predict(uid=1, iid=31, verbose=True)

user: 1          item: 31         r_ui = 2.50   est = 2.33   {'was_impossible': False}
user: 1          item: 31         r_ui = None   est = 2.33   {'was_impossible': False}


Prediction(uid=1, iid=31, r_ui=None, est=2.334781817236088, details={'was_impossible': False})

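+ The estimate (2.33) is close to the actual rating (2.5). This rating was part of the training data, because the final model was fit on the full dataset, so the match is not a held-out test.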
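
To quantify how close the estimate is, the fields of the returned object can be compared directly. This is a minimal sketch: the `Prediction` namedtuple is a stand-in that mirrors the fields printed in the output, and its values are copied from that output, with `r_ui` set to the known rating of 2.5.

```python
from collections import namedtuple

# Stand-in mirroring the fields shown in the printed Prediction above.
Prediction = namedtuple('Prediction', ['uid', 'iid', 'r_ui', 'est', 'details'])

prediction = Prediction(
    uid=1,
    iid=31,
    r_ui=2.5,                # actual rating of user 1 for movie 31
    est=2.334781817236088,   # estimate copied from the output above
    details={'was_impossible': False},
)

absolute_error = abs(prediction.r_ui - prediction.est)
print(f'absolute error: {absolute_error:.3f}')  # absolute error: 0.165
```

An error of about 0.17 is well below the cross-validated MAE of roughly 0.69, which is expected for a rating the model has already seen.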