# Memory-Based Recommendation System using Python Surprise
*   Average RMSE using user-based Collaborative Filtering: 1.02
*   Average RMSE using item-based Collaborative Filtering: 1.03

**Installing the surprise package**

In [4]:
pip install scikit-surprise

Collecting scikit-surprise
  Downloading https://files.pythonhosted.org/packages/97/37/5d334adaf5ddd65da99fc65f6507e0e4599d092ba048f4302fe8775619e8/scikit-surprise-1.1.1.tar.gz (11.8MB)
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (setup.py) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.1-cp37-cp37m-linux_x86_64.whl size=1617573 sha256=23b091b2aa63c7cde46865e3875894e87791783285827b5de88a3186124074d1
  Stored in directory: /root/.cache/pip/wheels/78/9c/3d/41b419c9d2aff5b6e2b4c0fc8d25c538202834058f9ed110d0
Successfully built scikit-surprise
Installing collected packages: scikit-surprise
Successfully installed scikit-surprise-1.1.1


**Importing the required packages**

In [5]:
import statistics

from surprise import Dataset, KNNBasic, accuracy
from surprise.model_selection import KFold, train_test_split

**Downloading the MovieLens 100K dataset, loading it, and splitting it into training and test sets in an 80:20 ratio**

In [6]:
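# Loading the movielens-100k dataset (downloaded on first use)
data = Dataset.load_builtin('ml-100k')
# Hold-out split (80:20); the cross-validation loop further down rebinds trainset and testset
trainset, testset = train_test_split(data, test_size=0.20)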

Dataset ml-100k could not be found. Do you want to download it? [Y/n] Y
Trying to download dataset from http://files.grouplens.org/datasets/movielens/ml-100k.zip...
Done! Dataset ml-100k has been saved to /root/.surprise_data/ml-100k


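# Defining the model for training
In this section, the similarity options are first defined in the variables sim_options_1 (user-based CF) and sim_options_2 (item-based CF). Both use cosine similarity; the user_based flag selects whether similarities are computed between users or between items. algo_1 and algo_2 are the corresponding KNNBasic models that will be trained.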

In [8]:
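# Cosine similarity between users (user-based CF)
sim_options_1 = {'name': 'cosine', 'user_based': True}
# Cosine similarity between items (item-based CF)
sim_options_2 = {'name': 'cosine', 'user_based': False}

algo_1 = KNNBasic(sim_options=sim_options_1)
algo_2 = KNNBasic(sim_options=sim_options_2)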
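
To make the 'cosine' option more concrete, here is a minimal pure-Python sketch of cosine similarity computed only over the items two users have both rated, which is how we understand Surprise to define it. The function name cosine_sim, the dict layout, and the toy ratings are assumptions made for illustration; they are not part of the notebook or of the Surprise API.

```python
import math


def cosine_sim(ratings_a, ratings_b):
    """Cosine similarity restricted to the keys present in both dicts.

    ratings_a and ratings_b are hypothetical {item_id: rating} dicts.
    """
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0  # no co-rated items: treat the similarity as zero
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = math.sqrt(sum(ratings_a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings_b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


# Toy ratings, invented for illustration only
alice = {"m1": 5, "m2": 3, "m3": 4}
bob = {"m1": 4, "m2": 2, "m4": 5}

# Only m1 and m2 are co-rated, so only those two ratings enter the formula
sim_ab = cosine_sim(alice, bob)
print(round(sim_ab, 3))
```

For the item-based case the same function applies if each dict maps user ids to the ratings a single item received, which mirrors what switching user_based to False does conceptually.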

# Training and prediction using the above-defined models
In this section, both KNNBasic models are trained and tested. KFold() splits the data into 5 folds for cross-validation, and the RMSE of each fold is computed for both the user-based and the item-based model. Each RMSE score is appended to the corresponding list, acc_1 or acc_2.
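
As a rough illustration of what a basic KNN model does at prediction time, the sketch below estimates a rating as the similarity-weighted average of the ratings given by the k most similar neighbours. It is a simplified stand-in, not Surprise's implementation: knn_predict, its input format, and the toy numbers are hypothetical, and returning None when no usable neighbour exists is a simplification chosen here.

```python
import heapq


def knn_predict(neighbors, k=40, min_k=1):
    """Similarity-weighted average over the k most similar neighbours.

    neighbors is a hypothetical list of (similarity, rating) pairs, one per
    neighbour that has rated the target item.
    """
    top_k = heapq.nlargest(k, neighbors, key=lambda pair: pair[0])
    weighted_sum = 0.0
    sim_sum = 0.0
    used = 0
    for sim, rating in top_k:
        if sim > 0:  # ignore neighbours with non-positive similarity
            weighted_sum += sim * rating
            sim_sum += sim
            used += 1
    if used < min_k or sim_sum == 0:
        return None  # not enough neighbours to form an estimate
    return weighted_sum / sim_sum


# Toy (similarity, rating) pairs, invented for illustration only
toy_neighbors = [(0.9, 4), (0.6, 5), (0.3, 2), (-0.1, 1)]
estimate = knn_predict(toy_neighbors, k=3)
print(round(estimate, 2))
```

With k=3 the neighbour with negative similarity is never considered, so the estimate is (0.9*4 + 0.6*5 + 0.3*2) / (0.9 + 0.6 + 0.3).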

In [9]:
acc_1 = []
acc_2 = []

# Defining a cross-validation iterator
kf = KFold(n_splits=5)

for trainset, testset in kf.split(data):
  # train and test algorithm
  algo_1.fit(trainset)
  algo_2.fit(trainset)
  predictions_1 = algo_1.test(testset)
  predictions_2 = algo_2.test(testset)
  # Computing and appending Root Mean Squared Error
  acc_1.append(accuracy.rmse(predictions_1, verbose=False))
  acc_2.append(accuracy.rmse(predictions_2, verbose=False))

Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing the cosine similarity matrix...
Done computing similarity matrix.
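
The RMSE reported for each fold is the square root of the mean squared difference between true and estimated ratings. A minimal sketch of that computation follows; the Pred namedtuple is a hypothetical stand-in that keeps only a true rating (r_ui) and an estimate (est), and the toy values are invented.

```python
import math
from collections import namedtuple

# Hypothetical stand-in for a prediction record: true rating and estimate
Pred = namedtuple("Pred", ["r_ui", "est"])


def rmse(predictions):
    """Root mean squared error over a non-empty list of Pred records."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    squared_errors = [(p.r_ui - p.est) ** 2 for p in predictions]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy predictions, invented for illustration only
toy_predictions = [Pred(4.0, 3.5), Pred(2.0, 3.0), Pred(5.0, 4.5)]
toy_rmse = rmse(toy_predictions)
print(round(toy_rmse, 4))
```

Because the errors are squared before averaging, one prediction that is off by a full star contributes four times as much as one that is off by half a star.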


**Printing the RMSE scores of all five cross-validation folds for both models**

In [10]:
print("For User based Collaborative Filtering")
for i in range(5):
  print("RMSE of fold",i+1, ":", round(acc_1[i],2))

print("\n")
print("For Item based Collaborative Filtering")
for i in range(5):
  print("RMSE of fold",i+1, ":", round(acc_2[i],2))

For User based Collaborative Filtering
RMSE of fold 1 : 1.02
RMSE of fold 2 : 1.02
RMSE of fold 3 : 1.02
RMSE of fold 4 : 1.02
RMSE of fold 5 : 1.01


For Item based Collaborative Filtering
RMSE of fold 1 : 1.03
RMSE of fold 2 : 1.03
RMSE of fold 3 : 1.03
RMSE of fold 4 : 1.03
RMSE of fold 5 : 1.02


**Printing the average RMSE score for both techniques**

In [11]:
print("Average of RMSE using User based CF:", round(statistics.mean(acc_1),2))
print("Average of RMSE using Item based CF:", round(statistics.mean(acc_2),2))

Average of RMSE using User based CF: 1.02
Average of RMSE using Item based CF: 1.03
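
Alongside the mean, the spread across folds can help judge whether the 0.01 gap between the two techniques is meaningful. The sketch below reuses the per-fold values printed earlier in this notebook; since those values were already rounded to two decimals, the resulting standard deviations are only approximate. The variable names are chosen here for illustration.

```python
import statistics

# Per-fold RMSE values copied from the printed output (already rounded to 2 decimals)
user_based = [1.02, 1.02, 1.02, 1.02, 1.01]
item_based = [1.03, 1.03, 1.03, 1.03, 1.02]

summary = {}
for label, scores in (("User based CF", user_based), ("Item based CF", item_based)):
    mean = statistics.mean(scores)
    spread = statistics.stdev(scores)  # sample standard deviation across folds
    summary[label] = (mean, spread)
    print(f"{label}: mean RMSE = {mean:.2f}, std = {spread:.3f}")
```

Running the same calculation on the unrounded acc_1 and acc_2 lists would give a more faithful estimate of the fold-to-fold variation.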
