
In [1]:
import numpy as np 
import pandas as pd
from surprise import Dataset, Reader
from surprise.model_selection import train_test_split, GridSearchCV
from surprise import SVD, KNNBasic, NMF, SlopeOne, CoClustering, BaselineOnly
from surprise import accuracy

**Collaborative filtering recommends items to a target user based on the preferences of similar users, or on items similar to those the user already likes.**
There are two main types:

> User-based: Recommends items liked by users with similar preferences.

> Item-based: Recommends items similar to those the user has liked or interacted with.


*Both methods rely on the collective behavior of users to make personalized recommendations*
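
To make the user-based idea concrete, here is a minimal sketch in plain Python. The toy ratings, the cosine similarity over co-rated movies, and the helper names `cosine_sim` and `predict_user_based` are all assumptions made for illustration; this is not how Surprise implements its KNN algorithms internally.

```python
from math import sqrt

# Toy ratings, invented for illustration: user -> {movie: rating}
ratings = {
    "ann": {"A": 5.0, "B": 4.0, "C": 1.0},
    "bob": {"A": 4.5, "B": 4.0, "D": 5.0},
    "cat": {"A": 1.0, "C": 5.0, "D": 2.0},
}

def cosine_sim(r1, r2):
    """Cosine similarity over the movies both users rated (0.0 if none)."""
    common = set(r1) & set(r2)
    if not common:
        return 0.0
    dot = sum(r1[m] * r2[m] for m in common)
    norm1 = sqrt(sum(r1[m] ** 2 for m in common))
    norm2 = sqrt(sum(r2[m] ** 2 for m in common))
    return dot / (norm1 * norm2)

def predict_user_based(user, movie):
    """Similarity-weighted mean of the other users' ratings for `movie`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or movie not in other_ratings:
            continue
        sim = cosine_sim(ratings[user], other_ratings)
        num += sim * other_ratings[movie]
        den += sim
    return num / den if den else None

# ann has not rated movie D; bob (similar taste) gave it 5.0, cat gave it 2.0
pred_ann_d = predict_user_based("ann", "D")
print(round(pred_ann_d, 2))
```

Because ann's ratings resemble bob's far more than cat's, the estimate lands closer to bob's rating. The item-based variant is the same idea with the roles swapped: similarities are computed between movies instead of between users.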

> This notebook only shows how a collaborative filtering recommender system (CF RS) is built in its simplest form.

In [2]:
df = pd.read_csv('/kaggle/input/the-movies-dataset/ratings_small.csv')
df.head(3)

,userId,movieId,rating,timestamp
0,1,31,2.5,1260759144
1,1,1029,3.0,1260759179
2,1,1061,3.0,1260759182


In [3]:
df = df.drop(columns = 'timestamp')
df.head(3)

,userId,movieId,rating
0,1,31,2.5
1,1,1029,3.0
2,1,1061,3.0


In [4]:
df.describe().T

,count,mean,std,min,25%,50%,75%,max
userId,100004.0,347.01131,195.163838,1.0,182.0,367.0,520.0,671.0
movieId,100004.0,12548.664363,26369.198969,1.0,1028.0,2406.5,5418.0,163949.0
rating,100004.0,3.543608,1.058064,0.5,3.0,4.0,4.0,5.0


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100004 entries, 0 to 100003
Data columns (total 3 columns):
 #   Column   Non-Null Count   Dtype  
---  ------   --------------   -----  
 0   userId   100004 non-null  int64  
 1   movieId  100004 non-null  int64  
 2   rating   100004 non-null  float64
dtypes: float64(1), int64(2)
memory usage: 2.3 MB


In [6]:
df.isna().sum()

userId     0
movieId    0
rating     0
dtype: int64

**Loading the data with the Surprise library**

In [7]:
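# Ratings in this dataset run from 0.5 to 5.0 (see df.describe() above)
reader = Reader(rating_scale=(0.5, 5))
# load_from_df expects the columns in the order: user, item, rating
data = Dataset.load_from_df(df, reader)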
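
The rating scale matters because Surprise's `predict` clips estimates to it by default (`clip=True`). The sketch below only imitates that clamping step in plain Python; `clip_to_scale` is a hypothetical helper written for this illustration, not a Surprise function.

```python
def clip_to_scale(est, rating_scale):
    """Clamp a raw estimate into the (low, high) rating scale."""
    low, high = rating_scale
    return max(low, min(high, est))

scale = (0.5, 5.0)  # min and max rating reported by df.describe()
clipped = [clip_to_scale(x, scale) for x in (5.7, 3.2, 0.1)]
print(clipped)  # [5.0, 3.2, 0.5]
```

A scale whose lower bound is below the smallest real rating lets the model return estimates that no user could actually give.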

**Splitting the data into training and test sets for evaluation**

In [8]:
train_set, test_set = train_test_split(data, test_size=0.2)
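
Conceptually, the split shuffles the rating rows and holds out 20% of them. Here is a rough plain-Python sketch of that idea; `simple_split` and the toy rows are made up for illustration and are not Surprise's implementation.

```python
import random

def simple_split(rows, test_size=0.2, seed=0):
    """Shuffle a copy of `rows` and hold out the last `test_size` fraction."""
    rng = random.Random(seed)  # seeded so the split is repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]

# Toy (userId, movieId, rating) rows, invented for illustration: 20 rows
rows = [(u, m, 3.0) for u in range(1, 6) for m in range(1, 5)]
train_rows, test_rows = simple_split(rows)
print(len(train_rows), len(test_rows))  # 16 4
```

The notebook's split is not seeded, so the scores further down will vary slightly from run to run. Surprise's `train_test_split` accepts a `random_state` argument if a repeatable split is needed.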

**I am going to run these algorithms on the data to see which works best**

In [9]:
algorithms = [SVD(), KNNBasic(), NMF(), SlopeOne(), CoClustering(), BaselineOnly()]

**Making a loop to test the algorithms**

In [10]:
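rmse_vals = []
mae_vals = []

for algo in algorithms:
    # Train on the training split, then predict the held-out ratings
    algo.fit(train_set)
    preds = algo.test(test_set)

    # accuracy.rmse / accuracy.mae print the score and also return it
    rmse = accuracy.rmse(preds)
    mae = accuracy.mae(preds)

    rmse_vals.append(rmse)
    mae_vals.append(mae)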

RMSE: 0.8920
MAE:  0.6885
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.9634
MAE:  0.7429
RMSE: 0.9397
MAE:  0.7248
RMSE: 0.9220
MAE:  0.7100
RMSE: 0.9528
MAE:  0.7425
Estimating biases using als...
RMSE: 0.8898
MAE:  0.6903
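
For reference, the two metrics printed above can be written out by hand. This is a small sketch on invented (true, estimated) pairs, not on the notebook's predictions; the names `rmse_of` and `mae_of` are hypothetical.

```python
from math import sqrt

def rmse_of(pairs):
    """Root mean squared error: penalises large misses more heavily."""
    return sqrt(sum((true - est) ** 2 for true, est in pairs) / len(pairs))

def mae_of(pairs):
    """Mean absolute error: the average size of a miss, in rating points."""
    return sum(abs(true - est) for true, est in pairs) / len(pairs)

# Invented (true rating, estimated rating) pairs
pairs = [(4.0, 3.5), (2.0, 3.0), (5.0, 5.0), (3.0, 1.0)]
rmse_toy = rmse_of(pairs)
mae_toy = mae_of(pairs)
print(f"RMSE: {rmse_toy:.4f}")
print(f"MAE:  {mae_toy:.4f}")
```

RMSE is never smaller than MAE on the same predictions, which matches every pair of scores in the output above.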


In [11]:
combined_acc = [(rmse + mae) / 2 for rmse, mae in zip(rmse_vals, mae_vals)]

In [12]:
best_algo_index = combined_acc.index(min(combined_acc))
best_algo_name = algorithms[best_algo_index].__class__.__name__


In [13]:
print(f"Best Algorithm: {best_algo_name}")
print(f"RMSE: {rmse_vals[best_algo_index]}")
print(f"MAE: {mae_vals[best_algo_index]}")
print(f"Combined Score: {combined_acc[best_algo_index]}")

Best Algorithm: BaselineOnly
RMSE: 0.889782568138115
MAE: 0.6903489074110488
Combined Score: 0.7900657377745819
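
To see how close the race was, the same combined score can be computed for every algorithm. The sketch below copies the rounded RMSE and MAE values printed by the loop, in the order of the `algorithms` list; the variable names are my own.

```python
# Scores copied from the printed output, in the order of the `algorithms` list
names = ["SVD", "KNNBasic", "NMF", "SlopeOne", "CoClustering", "BaselineOnly"]
rmse_run = [0.8920, 0.9634, 0.9397, 0.9220, 0.9528, 0.8898]
mae_run = [0.6885, 0.7429, 0.7248, 0.7100, 0.7425, 0.6903]

# Combined score = plain average of RMSE and MAE (lower is better)
combined = {n: (r + m) / 2 for n, r, m in zip(names, rmse_run, mae_run)}
ranking = sorted(combined, key=combined.get)
for name in ranking:
    print(f"{name:<13} {combined[name]:.4f}")
```

BaselineOnly and SVD are separated by only about 0.0002 on this score, and SVD actually has the lower MAE. Since the train/test split is not seeded, the winner could plausibly change on another run.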


In [14]:
bsl = BaselineOnly()
data_train = data.build_full_trainset()
bsl.fit(data_train)

Estimating biases using als...


<surprise.prediction_algorithms.baseline_only.BaselineOnly at 0x7b05524ce110>

In [15]:
bsl.predict(3, 872)

Prediction(uid=3, iid=872, r_ui=None, est=3.494574995057245, details={'was_impossible': False})

**This simple model predicts a rating of 3.49 for movie id 872 by user id 3.**
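
A baseline estimate has the form global mean + user bias + item bias. The sketch below fits those biases with a few rounds of regularised alternating least squares on invented data. The function names are hypothetical, and the defaults (`n_epochs=10`, `reg_u=15`, `reg_i=10`) are what I recall Surprise using for its ALS baseline, so treat them as assumptions rather than a copy of the library's code.

```python
from collections import defaultdict

# Toy (user, item, rating) triples, invented for illustration
triples = [
    (1, "A", 4.0), (1, "B", 3.0),
    (2, "A", 5.0), (2, "C", 2.0),
    (3, "B", 2.5), (3, "C", 1.5),
]

def fit_baseline(triples, n_epochs=10, reg_u=15, reg_i=10):
    """Return (global mean, user biases, item biases) fitted by ALS."""
    mu = sum(r for _, _, r in triples) / len(triples)
    by_user, by_item = defaultdict(list), defaultdict(list)
    for u, i, r in triples:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    b_u = {u: 0.0 for u in by_user}
    b_i = {i: 0.0 for i in by_item}
    for _ in range(n_epochs):
        # Update item biases with user biases held fixed, then the reverse
        for i, rated in by_item.items():
            b_i[i] = sum(r - mu - b_u[u] for u, r in rated) / (reg_i + len(rated))
        for u, rated in by_user.items():
            b_u[u] = sum(r - mu - b_i[i] for i, r in rated) / (reg_u + len(rated))
    return mu, b_u, b_i

def predict_baseline(model, user, item):
    """mu + b_u + b_i; an unknown user or item contributes a bias of 0."""
    mu, b_u, b_i = model
    return mu + b_u.get(user, 0.0) + b_i.get(item, 0.0)

model = fit_baseline(triples)
print(round(predict_baseline(model, 1, "C"), 2))  # user 1 has not rated item C
print(predict_baseline(model, 99, "Z"))           # unseen user and item
```

No similarity between users or movies is involved here: the prediction only says that some users rate higher than average and some movies are rated higher than average.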