### Evaluating Various Algorithms

In [1]:
from surprise import NMF, SVD, SVDpp
from surprise import Dataset, Reader
from surprise.model_selection import cross_validate

In [2]:
DATASET_PATH = '../../../datasets'
!ls $DATASET_PATH

books		    books_ratings.csv  refined_metadata.pkl
books_lexile.tar    lexile.json        refined_ratings.csv
books_meta.json.gz  lexile.pkl
books_meta.pkl	    merged_df.pkl


In [3]:
# Each line of the ratings file is 'user,item,rating,timestamp'.
reader = Reader(line_format='user item rating timestamp', sep=',')
data = Dataset.load_from_file(DATASET_PATH + '/refined_ratings.csv', reader=reader)
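
The `Reader` above does not pass `rating_scale`, so Surprise falls back to its default scale of `(1, 5)`. A minimal sketch of a sanity check on the rating column, assuming the `user,item,rating,timestamp` layout used above; the helper name `rating_range` and the sample rows are hypothetical and not taken from `refined_ratings.csv`:

```python
import csv
import io

def rating_range(lines, sep=","):
    """Return (min, max) of the third column of 'user,item,rating,timestamp' rows."""
    ratings = [float(row[2]) for row in csv.reader(lines, delimiter=sep) if row]
    return min(ratings), max(ratings)

# Hypothetical sample rows for illustration only.
sample = io.StringIO(
    "u1,b1,5.0,1355616000\n"
    "u2,b1,3.0,1355702400\n"
    "u1,b2,1.0,1355788800\n"
)

low, high = rating_range(sample)
DEFAULT_SCALE = (1, 5)  # Reader's default rating_scale
fits_default = DEFAULT_SCALE[0] <= low and high <= DEFAULT_SCALE[1]
print(low, high, fits_default)
```

To run the same check on the real file, an open file handle can be passed in place of `sample`. If the ratings fall outside the default range, `rating_scale` would need to be set explicitly on the `Reader`.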

### SVD

In [4]:
%%timeit

# We'll use the famous SVD algorithm.
algo = SVD()

# Run 5-fold cross-validation and print results
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
MAE (testset)     0.6081  0.6225  0.6150  0.6141  0.6174  0.6154  0.0047  
RMSE (testset)    0.7935  0.8098  0.7990  0.8120  0.8091  0.8047  0.0072  
Fit time          1.93    2.58    2.50    2.12    1.61    2.14    0.36    
Test time         0.33    0.12    0.08    0.06    0.07    0.13    0.10    
Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
MAE (testset)     0.6089  0.6223  0.6129  0.6129  0.6230  0.6160  0.0056  
RMSE (testset)    0.7960  0.8146  0.8003  0.8015  0.8174  0.8060  0.0084  
Fit time          2.12    2.32    2.64    2.21    1.70    2.20    0.31    
Test time         0.23    0.12    0.12    0.09    0.05    0.12    0.06    

### SVD++

In [6]:
%%timeit

algo_svdpp = SVDpp()
# Run 5-fold cross-validation and print results
cross_validate(algo_svdpp, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm SVDpp on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
MAE (testset)     0.5992  0.6077  0.6182  0.5996  0.6041  0.6058  0.0069  
RMSE (testset)    0.7853  0.8018  0.8186  0.7905  0.7978  0.7988  0.0114  
Fit time          24.08   23.74   25.26   24.76   22.22   24.01   1.04    
Test time         0.62    0.50    0.54    0.35    0.50    0.50    0.09    
Evaluating RMSE, MAE of algorithm SVDpp on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
MAE (testset)     0.6013  0.6142  0.6047  0.6029  0.6037  0.6054  0.0046  
RMSE (testset)    0.7928  0.8134  0.7989  0.7947  0.7981  0.7996  0.0073  
Fit time          33.42   31.85   33.11   32.80   15.73   29.38   6.85    
Test time         0.63    0.53    0.35    0.32    0.30    0.43    0.13    

### Non-negative Matrix Factorization

In [8]:
%%timeit

algo_nmf = NMF()
# Run 5-fold cross-validation and print results
cross_validate(algo_nmf, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm NMF on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
MAE (testset)     0.7558  0.7540  0.7537  0.7476  0.7451  0.7512  0.0041  
RMSE (testset)    0.9666  0.9684  0.9649  0.9656  0.9608  0.9653  0.0025  
Fit time          5.38    6.36    5.77    5.16    4.17    5.37    0.72    
Test time         0.14    0.12    0.13    0.08    0.08    0.11    0.02    
Evaluating RMSE, MAE of algorithm NMF on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
MAE (testset)     0.7483  0.7519  0.7394  0.7441  0.7530  0.7473  0.0050  
RMSE (testset)    0.9611  0.9684  0.9533  0.9575  0.9636  0.9608  0.0052  
Fit time          3.70    4.66    4.32    4.06    3.24    4.00    0.49    
Test time         0.20    0.13    0.07    0.07    0.06    0.10    0.05    

### Probabilistic Matrix Factorization

In [9]:
%%timeit

# PMF: per the Surprise docs, SVD with biased=False (no baseline terms)
# is equivalent to Probabilistic Matrix Factorization.
algo_pmf = SVD(biased=False)

# Run 5-fold cross-validation and print results
cross_validate(algo_pmf, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
MAE (testset)     2.1200  2.1006  2.0880  2.0813  2.1092  2.0998  0.0140  
RMSE (testset)    2.4647  2.4452  2.4327  2.4308  2.4578  2.4462  0.0134  
Fit time          2.76    3.56    3.43    2.98    2.40    3.03    0.43    
Test time         0.14    0.17    0.11    0.10    0.08    0.12    0.03    
Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
MAE (testset)     2.1116  2.0622  2.0637  2.0672  2.0971  2.0803  0.0202  
RMSE (testset)    2.4585  2.4171  2.4157  2.4201  2.4411  2.4305  0.0168  
Fit time          2.81    3.50    3.63    2.92    2.35    3.04    0.47    
Test time         0.16    0.14    0.12    0.10    0.08    0.12    0.03    
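
The much larger error here is consistent with what `biased=False` removes. Per the Surprise documentation, biased SVD predicts a rating as the global mean plus a user bias, an item bias, and the dot product of the user and item factors, while the unbiased variant uses the dot product alone. The factors then have to reproduce the overall rating level by themselves. A toy sketch of the two prediction rules, with hypothetical values that are not taken from this dataset (the function names are illustrative, not Surprise APIs):

```python
def dot(p_u, q_i):
    """Dot product of a user factor vector and an item factor vector."""
    return sum(p * q for p, q in zip(p_u, q_i))

def predict_biased(mu, b_u, b_i, p_u, q_i):
    """Biased rule: global mean + user bias + item bias + factor interaction."""
    return mu + b_u + b_i + dot(p_u, q_i)

def predict_unbiased(p_u, q_i):
    """Unbiased (PMF-style) rule: factor interaction only."""
    return dot(p_u, q_i)

# Hypothetical values for illustration only.
mu = 4.0               # global mean rating
b_u, b_i = 0.2, -0.1   # user and item biases
p_u = [0.1, -0.05, 0.02]   # small factors, as right after initialization
q_i = [0.08, 0.03, -0.1]

biased = predict_biased(mu, b_u, b_i, p_u, q_i)
unbiased = predict_unbiased(p_u, q_i)
print(f"biased:   {biased:.4f}")
print(f"unbiased: {unbiased:.4f}")
```

With small factors the unbiased prediction stays near zero while the biased one already sits near the mean rating. Whether this fully explains the RMSE above was not verified here; tuning `n_epochs`, the learning rate, or the factor initialization might narrow the gap.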

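### __Conclusion__: **SVD++** > **SVD** > **NMF** > **PMF**

Ranked by mean test RMSE over 5 folds (lower is better): SVD++ (about 0.80), SVD (about 0.81), NMF (about 0.96), PMF (about 2.4). SVD++ takes roughly ten times longer to fit than SVD (about 24 to 29 s versus about 2 s per fold).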
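
The ordering can be reproduced from the printed tables. The sketch below copies the mean RMSE and its standard deviation from the first complete run of each algorithm, sorts them, and compares the gap between SVD++ and SVD with the fold-to-fold spread. The dictionary layout and variable names are illustrative:

```python
# Mean test RMSE and its standard deviation across folds, copied from the
# first complete cross_validate table printed for each algorithm.
reported = {
    "SVD++": {"rmse": 0.7988, "std": 0.0114},
    "SVD":   {"rmse": 0.8047, "std": 0.0072},
    "NMF":   {"rmse": 0.9653, "std": 0.0025},
    "PMF":   {"rmse": 2.4462, "std": 0.0134},
}

# Lower RMSE is better, so sort ascending.
ranking = sorted(reported, key=lambda name: reported[name]["rmse"])
print(" > ".join(ranking))

# How large is the SVD++ advantage relative to its own spread across folds?
gap = reported["SVD"]["rmse"] - reported["SVD++"]["rmse"]
within_spread = gap < reported["SVD++"]["std"]
print(f"SVD++ vs SVD RMSE gap: {gap:.4f} (smaller than SVD++ std: {within_spread})")
```

With these numbers the gap between SVD++ and SVD is smaller than the standard deviation across the SVD++ folds, so that part of the ranking is suggestive rather than decisive. The gaps to NMF and PMF are far larger than any of the reported spreads.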