
**Installing scikit-surprise**

Surprise is a Python scikit for building and analyzing recommender systems that deal with explicit rating data.



In [None]:
!pip install scikit-surprise

Collecting scikit-surprise
  Downloading https://files.pythonhosted.org/packages/97/37/5d334adaf5ddd65da99fc65f6507e0e4599d092ba048f4302fe8775619e8/scikit-surprise-1.1.1.tar.gz (11.8MB)
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (setup.py) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.1-cp36-cp36m-linux_x86_64.whl size=1670890 sha256=3c4e3ec1212128a5c4f53b10dec61596fbde7ea0c13577c8e89bbefadbf5ea16
  Stored in directory: /root/.cache/pip/wheels/78/9c/3d/41b419c9d2aff5b6e2b4c0fc8d25c538202834058f9ed110d0
Successfully built scikit-surprise
Installing collected packages: scikit-surprise
Successfully installed scikit-surprise-1.1.1


In [None]:
from surprise import SVD
from surprise import Dataset, Reader
from surprise import accuracy
from surprise.model_selection import train_test_split
import pandas as pd

Read the 100k and 1M rating data from CSV into pandas DataFrames

In [None]:
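# Dataset.load_from_df expects exactly three columns in this order:
# user id, item id, rating. The timestamp column is therefore dropped.
df_100k = pd.read_csv("/content/u_data.csv")
df_100k = df_100k.drop(columns='timestamp')
df_1M = pd.read_csv("/content/1m_user_data.csv")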

Create train and test sets (70/30 split) for the 100k and 1M data


In [None]:
reader = Reader(rating_scale=(1, 5))
data_100k = Dataset.load_from_df(df_100k, reader)
data_1M = Dataset.load_from_df(df_1M, reader)
# random_state is left unset, so the shuffle (and the resulting MSE) differs between runs.
trainset_100k, testset_100k = train_test_split(data_100k, test_size=0.3)
trainset_1M, testset_1M = train_test_split(data_1M, test_size=0.3)
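
Because the split is made with `random_state=None`, each run draws a different 70/30 partition, so the reported errors are not exactly repeatable. The sketch below illustrates, without Surprise, how fixing a seed makes a shuffle split repeatable. The helper `shuffle_split` and the toy `ratings` list are hypothetical and only mimic the idea; in Surprise itself, passing an integer as `random_state` to `train_test_split` serves the same purpose.

```python
import random

def shuffle_split(ratings, test_size=0.3, seed=0):
    """Seeded shuffle split; a hypothetical stand-in, not Surprise's code."""
    rng = random.Random(seed)          # private generator, so the seed fully fixes the order
    shuffled = list(ratings)           # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = int(round(test_size * len(shuffled)))
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)

# Toy (user, item, rating) triples: 10 users x 10 items = 100 ratings.
ratings = [(u, i, 1 + (u + i) % 5) for u in range(10) for i in range(10)]
train, test = shuffle_split(ratings, test_size=0.3, seed=42)
print(len(train), len(test))
```

Calling `shuffle_split` again with the same seed returns the same partition, which is what makes a later error comparison meaningful.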

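Define the average per-item MSE function: the MSE is computed separately for each item over its test predictions and then averaged across items, so every item carries equal weight regardless of how many ratings it has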

In [None]:
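def avg_per_item_mse(predictions):
    """Mean over items of each item's MSE, so every item counts equally."""
    # Surprise Prediction tuples become the columns uid, iid, r_ui, est, details.
    pdf = pd.DataFrame(predictions)
    pdf['squared_error'] = (pdf['r_ui'] - pdf['est']) ** 2
    # MSE per item first, then the unweighted mean across items.
    item_mse = pdf.groupby('iid')['squared_error'].mean()
    return item_mse.mean()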
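
To see how this metric differs from the ordinary MSE over all predictions, here is a small worked example in plain Python. The toy predictions and the helper `toy_avg_per_item_mse` are made up for illustration; the helper follows the same two-step averaging as `avg_per_item_mse` but does not depend on pandas or Surprise.

```python
from collections import defaultdict

# Hypothetical (item id, true rating, estimated rating) triples.
toy_predictions = [
    ("A", 4.0, 3.0),  # squared error 1
    ("A", 5.0, 4.0),  # squared error 1
    ("A", 3.0, 2.0),  # squared error 1
    ("B", 1.0, 4.0),  # squared error 9
]

def toy_avg_per_item_mse(preds):
    """Average the squared errors within each item, then across items."""
    errors = defaultdict(list)
    for iid, r_ui, est in preds:
        errors[iid].append((r_ui - est) ** 2)
    item_mse = {iid: sum(e) / len(e) for iid, e in errors.items()}
    return sum(item_mse.values()) / len(item_mse)

overall_mse = sum((r - e) ** 2 for _, r, e in toy_predictions) / len(toy_predictions)
per_item = toy_avg_per_item_mse(toy_predictions)
print(overall_mse)  # (1 + 1 + 1 + 9) / 4 = 3.0
print(per_item)     # (1.0 + 9.0) / 2 = 5.0
```

Item B has a single, badly predicted rating, yet it weighs as much as item A's three ratings, so the per-item average is higher than the overall MSE.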

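For Probabilistic Matrix Factorization (PMF), use Surprise's SVD algorithm without bias terms (`biased=False`); according to the Surprise documentation, SVD without baselines is equivalent to PMF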

[Matrix factorization algorithm SVD](https://surprise.readthedocs.io/en/stable/matrix_factorization.html)

In [None]:
algo = SVD(biased=False)
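
With the bias terms switched off, the model predicts a rating as the dot product of a user factor vector and an item factor vector, and learns both by stochastic gradient descent on the regularized squared error. The following is a minimal, dependency-free sketch of that idea, not Surprise's actual implementation. The function names, the toy ratings, and the hyperparameter values (`n_factors`, `n_epochs`, `lr`, `reg`) are assumptions chosen for illustration.

```python
import random

def fit_unbiased_mf(ratings, n_factors=2, n_epochs=200, lr=0.05, reg=0.02, seed=0):
    """SGD for r_ui ~ p_u . q_i with L2 regularization (illustrative sketch)."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})   # sorted for a deterministic init order
    items = sorted({i for _, i, _ in ratings})
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - sum(pf * qf for pf, qf in zip(p[u], q[i]))
            for f in range(n_factors):
                puf, qif = p[u][f], q[i][f]      # keep old values for a simultaneous update
                p[u][f] += lr * (err * qif - reg * puf)
                q[i][f] += lr * (err * puf - reg * qif)
    return p, q

def mse(ratings, p, q):
    """Mean squared error of the dot-product predictions on the given ratings."""
    sq = [(r - sum(a * b for a, b in zip(p[u], q[i]))) ** 2 for u, i, r in ratings]
    return sum(sq) / len(sq)

# Hypothetical toy ratings on a 1-5 scale.
toy = [("u1", "i1", 5), ("u1", "i2", 3), ("u2", "i1", 4),
       ("u2", "i3", 1), ("u3", "i2", 2), ("u3", "i3", 5)]

mse_before = mse(toy, *fit_unbiased_mf(toy, n_epochs=0))   # random initial factors only
mse_after = mse(toy, *fit_unbiased_mf(toy))                # after SGD training
print(mse_before, mse_after)
```

The training error on the toy data drops after fitting; it says nothing about generalization, which is why the notebook evaluates on a held-out test set.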

Train and test the SVD algorithm on the 100k data

In [None]:
algo.fit(trainset_100k)
predictions_100k = algo.test(testset_100k)
MSE_100k = avg_per_item_mse(predictions_100k)
print("Average per item MSE of 100k data:", MSE_100k)

Average per item MSE of 100k data: 1.324084273578656


Train and test the SVD algorithm on the 1M data

In [None]:
# fit() re-initializes the factors, so reusing algo trains a fresh model on the 1M data.
algo.fit(trainset_1M)
predictions_1M = algo.test(testset_1M)
MSE_1M = avg_per_item_mse(predictions_1M)
print("Average per item MSE of 1M data:", MSE_1M)

Average per item MSE of 1M data: 1.0734020349047144
