# Film Recommendation with Collaborative Filtering

In this notebook I implement collaborative filtering to generate movie recommendations, using the TMDB dataset and the Surprise library for Python.

The basic approach is as follows:
* Read in "ratings_small.csv" to a pandas dataframe
* Create a Surprise Reader object, and load the movie ratings into a Surprise Dataset object
* Create a Singular Value Decomposition (SVD) model, and train it on the ratings
* Generate some new predictions to see how the model performs

## 1. Imports

In [2]:
import pandas as pd
from surprise import Reader, Dataset, SVD
from surprise.model_selection import cross_validate

## 2. Load Data

In [4]:
ratings_df = pd.read_csv("/Users/daniellefevre/Desktop/kaggle/movie/archive/ratings_small.csv")
ratings_df.head()

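|   | userId | movieId | rating | timestamp |
|---|--------|---------|--------|------------|
| 0 | 1 | 31 | 2.5 | 1260759144 |
| 1 | 1 | 1029 | 3.0 | 1260759179 |
| 2 | 1 | 1061 | 3.0 | 1260759182 |
| 3 | 1 | 1129 | 2.0 | 1260759185 |
| 4 | 1 | 1172 | 4.0 | 1260759205 |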


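Surprise's `Dataset.load_from_df` expects a dataframe with exactly three columns, in the order **user, item, rating**, so we load only these columns from the ratings file into our working dataset. The `Reader` object tells Surprise which rating scale the data uses.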

In [8]:
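# Assumption: ratings in this file use the 0.5 to 5.0 half-star scale; a bare Reader() assumes 1 to 5
reader = Reader(rating_scale=(0.5, 5.0))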
data = Dataset.load_from_df(ratings_df[["userId", "movieId", "rating"]], reader)


## 3. Initialize model

In [10]:
svd = SVD()
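
To give a feel for what an SVD-style recommender learns, here is a minimal pure-Python sketch of biased matrix factorization trained with stochastic gradient descent. It is an illustration under simplifying assumptions, not Surprise's actual implementation: the function name `train_mf`, the toy ratings, and the hyperparameter values are all made up for this example.

```python
import random

def train_mf(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit est(u, i) = mu + b_u[u] + b_i[i] + dot(p[u], q[i]) with SGD.

    `ratings` is a list of (user, item, rating) triples.
    Returns a predict(user, item) function.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})

    # Biases start at zero; latent factors start as small random values.
    b_u = {u: 0.0 for u in users}
    b_i = {i: 0.0 for i in items}
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    for _ in range(n_epochs):
        for u, i, r in ratings:
            dot = sum(a * b for a, b in zip(p[u], q[i]))
            err = r - (mu + b_u[u] + b_i[i] + dot)
            # Gradient step on the regularized squared error.
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)

    def predict(u, i):
        # Unknown users or items fall back to the parts that are known.
        est = mu + b_u.get(u, 0.0) + b_i.get(i, 0.0)
        if u in p and i in q:
            est += sum(a * b for a, b in zip(p[u], q[i]))
        return est

    return predict

# Hypothetical toy data: (user, item, rating)
toy_ratings = [
    (1, 10, 5.0), (1, 20, 3.0), (1, 30, 4.0),
    (2, 10, 4.0), (2, 20, 2.0),
    (3, 10, 1.0), (3, 30, 4.5),
]
predict = train_mf(toy_ratings)
print(predict(2, 30))  # estimate for a (user, item) pair not in the toy data
```

The estimate combines a global average, a per-user and per-item offset, and an interaction term between learned user and item vectors. That is the general shape of model that `SVD()` fits on the real ratings.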

In [11]:
cross_validate(svd, data, cv = 5, measures = ['RMSE', 'MAE'])

{'test_rmse': array([0.89679968, 0.89728831, 0.8960211 , 0.90338517, 0.8871855 ]),
 'test_mae': array([0.69296818, 0.69099658, 0.69064104, 0.69212115, 0.68317525]),
 'fit_time': (4.375380039215088,
  4.341005086898804,
  4.3374011516571045,
  4.341750621795654,
  4.3929078578948975),
 'test_time': (0.20064997673034668,
  0.1992480754852295,
  0.16738200187683105,
  0.17335820198059082,
  0.12194991111755371)}
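
Each array in the output above has five entries because `cv = 5` splits the ratings into five folds, and each fold takes one turn as the held-out test set. The following is a simplified sketch of that splitting idea in plain Python. The helper name `k_fold_indices` is hypothetical, and this is not Surprise's own code.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # shuffle once before splitting
    folds = [indices[f::k] for f in range(k)]     # k roughly equal-sized folds
    for f in range(k):
        test_idx = folds[f]
        train_idx = [idx for g in range(k) if g != f for idx in folds[g]]
        yield train_idx, test_idx

# With 10 samples and 5 folds, each test fold holds 2 samples.
splits = list(k_fold_indices(10, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Every rating lands in exactly one test fold, so the five scores together cover the whole dataset without ever testing on data the model was trained on.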

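The test RMSE is about 0.90 and the test MAE about 0.69. Both are measured in rating points (stars), not percent. A typical error of under one star seems satisfactory for the purpose of movie recommendations.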
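
For reference, here is a small sketch of how the two error measures are computed. The rating lists are made-up values for illustration, not taken from the dataset.

```python
import math

def rmse(true_ratings, predicted_ratings):
    """Root mean squared error: penalizes large misses more heavily."""
    squared = [(t - p) ** 2 for t, p in zip(true_ratings, predicted_ratings)]
    return math.sqrt(sum(squared) / len(squared))

def mae(true_ratings, predicted_ratings):
    """Mean absolute error: the average size of a miss, in rating points."""
    absolute = [abs(t - p) for t, p in zip(true_ratings, predicted_ratings)]
    return sum(absolute) / len(absolute)

# Hypothetical true and predicted ratings
actual = [5.0, 3.0, 4.0, 2.0]
predicted = [4.5, 3.5, 3.0, 2.0]

print(rmse(actual, predicted))
print(mae(actual, predicted))
```

Because squaring weights large errors more, RMSE is never smaller than MAE on the same predictions, which matches the gap between the two metrics in the cross-validation output.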

## 4. Build training set and make predictions

In [14]:
trainset = data.build_full_trainset()
svd.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7fd3b1418b80>

Pick a user and check their ratings:

In [17]:
ratings_df[ratings_df['userId'] == 10]

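|     | userId | movieId | rating | timestamp |
|-----|--------|---------|--------|-----------|
| 744 | 10 | 50 | 5.0 | 942766420 |
| 745 | 10 | 152 | 4.0 | 942766793 |
| 746 | 10 | 318 | 4.0 | 942766515 |
| 747 | 10 | 344 | 3.0 | 942766603 |
| 748 | 10 | 345 | 4.0 | 942766603 |
| 749 | 10 | 592 | 3.0 | 942767328 |
| 750 | 10 | 735 | 4.0 | 942766974 |
| 751 | 10 | 1036 | 3.0 | 942767258 |
| 752 | 10 | 1089 | 3.0 | 942766420 |
| 753 | 10 | 1101 | 2.0 | 942767328 |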


In [20]:
svd.predict(10, 50, 5.0)

Prediction(uid=10, iid=50, r_ui=5.0, est=4.483867845206662, details={'was_impossible': False})

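The model predicts a rating of 4.48 for a movie that this user actually rated 5.0. Not too bad, although this rating was part of the full training set, so it is a sanity check rather than a held-out test.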

In [21]:
svd.predict(10, 51, 5.0)

Prediction(uid=10, iid=51, r_ui=5.0, est=3.591027324134858, details={'was_impossible': False})

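For movie 51, which the user has not rated, the model predicts a rating of 3.59. The third argument (`r_ui = 5.0`) is only a placeholder for the true rating here; it does not affect the estimate.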
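
Single predictions like these can be turned into a recommendation list by scoring every movie the user has not rated and keeping the highest estimates. Below is a minimal sketch of that idea. The helper `top_n_for_user` and the stand-in predictor `fake_predict` are hypothetical names for illustration, so the block runs without the trained model.

```python
def top_n_for_user(predict, user_id, all_item_ids, rated_item_ids, n=10):
    """Return the n unrated items with the highest predicted rating.

    `predict` is any function (user_id, item_id) -> estimated rating.
    """
    rated = set(rated_item_ids)
    candidates = [i for i in all_item_ids if i not in rated]
    scored = [(i, predict(user_id, i)) for i in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # best estimate first
    return scored[:n]

# Stand-in predictor with made-up scores, used instead of the real model.
fake_scores = {50: 4.5, 51: 3.6, 52: 4.1, 53: 2.0}

def fake_predict(user_id, item_id):
    return fake_scores[item_id]

# User 10 has already rated movie 50, so it is excluded from the candidates.
top = top_n_for_user(fake_predict, 10, fake_scores.keys(), [50], n=2)
print(top)
```

With the fitted model from this notebook, `predict` could be something like `lambda u, i: svd.predict(u, i).est`, with the candidate movies taken from `ratings_df["movieId"].unique()`.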

## Conclusion and Summary

We used pandas and Surprise to implement collaborative filtering via Singular Value Decomposition (SVD). With Surprise, a movie recommendation model that reaches a cross-validated RMSE of about 0.90 takes only a few lines of code.