### Building a Recommender System with Surprise

This try-it focuses on exploring additional algorithms with the `Surprise` library to generate recommendations. Your goal is to identify the best-performing algorithm by minimizing the mean squared error under cross-validation. You will also select a dataset from the [GroupLens](https://grouplens.org/datasets/movielens/) example datasets.

To begin, head over to GroupLens and examine the available datasets. Choose one that is easy to put into the form `Surprise` expects, with user, item, and rating information. Then compare the performance of at least the `KNNBasic`, `SVD`, `NMF`, `SlopeOne`, and `CoClustering` algorithms to build your recommendations. For more information on the algorithms, see the [prediction algorithms package documentation](https://surprise.readthedocs.io/en/stable/prediction_algorithms_package.html).

Share the results of your investigation with your peers, including your cross-validation results and a basic description of your dataset.
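
The cross-validation runs in this notebook report RMSE and MAE. As a minimal sketch of what those numbers measure and how they relate to the mean squared error, the snippet below computes all three by hand. The rating values are made up for illustration and do not come from MovieLens.

```python
import math

# Hypothetical true ratings and predictions, for illustration only
actual = [5.0, 3.5, 4.0, 2.0]
predicted = [4.5, 3.0, 4.0, 3.0]

errors = [p - a for p, a in zip(predicted, actual)]

# Mean squared error, its square root (RMSE), and mean absolute error (MAE)
mse = sum(e ** 2 for e in errors) / len(errors)
rmse = math.sqrt(mse)
mae = sum(abs(e) for e in errors) / len(errors)

print(f"MSE={mse:.4f}  RMSE={rmse:.4f}  MAE={mae:.4f}")
```

Because the square root is monotonic, the algorithm with the lowest MSE on a given test set also has the lowest RMSE on it.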



In [89]:
from surprise import Dataset, Reader, SVD, NMF, KNNBasic, KNNWithMeans, KNNWithZScore, SlopeOne, CoClustering, NormalPredictor
from surprise.model_selection import cross_validate

import pandas as pd
import numpy as np

# Plotting libraries
import matplotlib.pyplot as plt
import seaborn as sns

In [5]:
df = pd.read_csv('ml-25m/ratings.csv')

https://grouplens.org/datasets/movielens/25m/

MovieLens 25M movie ratings. Stable benchmark dataset. 25 million ratings and one million tag applications applied to 62,000 movies by 162,000 users. Includes tag genome data with 15 million relevance scores across 1,129 tags. Released 12/2019

In [6]:
df.head(10)

index,userId,movieId,rating,timestamp
0,1,296,5.0,1147880044
1,1,306,3.5,1147868817
2,1,307,5.0,1147868828
3,1,665,5.0,1147878820
4,1,899,3.5,1147868510
5,1,1088,4.0,1147868495
6,1,1175,3.5,1147868826
7,1,1217,3.5,1147878326
8,1,1237,5.0,1147868839
9,1,1250,4.0,1147868414


In [8]:
df[df.movieId == 296]

index,userId,movieId,rating,timestamp
0,1,296,5.0,1147880044
264,3,296,5.0,1439474476
912,4,296,4.0,1573938898
1184,5,296,4.0,830786155
1290,7,296,4.0,835444730
...,...,...,...,...
24998549,162533,296,5.0,1280834123
24998905,162534,296,4.5,1526666265
24999445,162536,296,5.0,1572257729
24999632,162538,296,2.5,1438781382


In [12]:
df.tail(10)

index,userId,movieId,rating,timestamp
25000085,162541,8983,4.5,1240953211
25000086,162541,31658,4.5,1240953287
25000087,162541,33794,4.0,1240951792
25000088,162541,41566,4.0,1240952749
25000089,162541,45517,4.5,1240953353
25000090,162541,50872,4.5,1240953372
25000091,162541,55768,2.5,1240951998
25000092,162541,56176,2.0,1240950697
25000093,162541,58559,4.0,1240953434
25000094,162541,63876,5.0,1240952515


In [42]:
df.rating.min(), df.rating.max()

(0.5, 5.0)
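
The observed range matters because Surprise's `Reader` takes a `rating_scale` argument (default `(1, 5)`), and predictions are clipped to that scale by default. The sketch below imitates that clipping in plain Python. `clip_to_scale` and the raw estimate are hypothetical, introduced only for illustration, and are not part of the library.

```python
def clip_to_scale(estimate, rating_scale=(1, 5)):
    """Clamp a raw estimate into [low, high], as a stand-in for prediction clipping."""
    low, high = rating_scale
    return min(high, max(low, estimate))


raw_estimate = 0.7  # hypothetical raw model output for a disliked movie

print(clip_to_scale(raw_estimate))              # using the default scale
print(clip_to_scale(raw_estimate, (0.5, 5.0)))  # using a scale that matches this dataset
```

With a scale of `(0.5, 5.0)`, an estimate below 1 is no longer pushed up to 1, which is closer to how half-star ratings behave in this dataset.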

In [56]:
train = df[(df.movieId < 10001) & (df.userId < 10001)][['userId', 'movieId', 'rating']]

In [141]:
train

index,userId,movieId,rating
0,1,296,5.0
1,1,306,3.5
2,1,307,5.0
3,1,665,5.0
4,1,899,3.5
...,...,...,...
1496582,10000,7147,3.0
1496583,10000,7153,4.5
1496584,10000,7154,2.0
1496585,10000,7361,5.0
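
Filtering on `movieId < 10001` and `userId < 10001` restricts the ID values, which is not necessarily the same as keeping 10,000 movies, since IDs need not be contiguous. A small sketch, on made-up `(userId, movieId, rating)` tuples with smaller hypothetical cut-offs, of counting what a subset actually contains and how dense it is:

```python
# Toy (userId, movieId, rating) rows, invented for illustration
rows = [
    (1, 296, 5.0), (1, 306, 3.5), (2, 296, 4.0),
    (3, 665, 5.0), (3, 296, 2.5), (7, 12000, 4.0),
]

# Same kind of ID-based filter as in the notebook, with a smaller user cut-off
subset = [(u, m, r) for (u, m, r) in rows if u < 5 and m < 10001]

n_users = len({u for u, _, _ in subset})
n_movies = len({m for _, m, _ in subset})

# Share of the user-by-movie grid that actually holds a rating
density = len(subset) / (n_users * n_movies)

print(n_users, n_movies, round(density, 3))
```

On the real DataFrame, the equivalent counts would come from `train.userId.nunique()` and `train.movieId.nunique()`.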


In [144]:
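# load_from_df reads columns in the order user, item, rating; line_format only applies to files.
# The default rating_scale is (1, 5); Reader(rating_scale=(0.5, 5.0)) would match the 0.5-5.0 range seen above.
reader = Reader(line_format='item user rating')
rating_data = Dataset.load_from_df(train, reader)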

knn_basic = KNNBasic(k=100, verbose=False)
knn_basic_results = cross_validate(knn_basic, rating_data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm KNNBasic on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.8967  0.8982  0.8973  0.8968  0.8969  0.8972  0.0006  
MAE (testset)     0.6857  0.6850  0.6845  0.6847  0.6842  0.6848  0.0005  
Fit time          15.16   15.17   15.31   15.17   15.27   15.22   0.06    
Test time         74.99   79.68   81.14   79.57   82.06   79.49   2.43    
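
`cv=5` asks `cross_validate` for five folds: each rating lands in the test set exactly once and in the training set the other four times. A rough stdlib-only sketch of that splitting idea follows. `k_fold_indices` is a hypothetical helper written for this illustration and is not how Surprise implements its splits internally.

```python
import random


def k_fold_indices(n_items, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs covering range(n_items) in n_splits folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    for fold in range(n_splits):
        test_idx = indices[fold::n_splits]  # every n_splits-th shuffled index
        test_set = set(test_idx)
        train_idx = [i for i in indices if i not in test_set]
        yield train_idx, test_idx


folds = list(k_fold_indices(n_items=10, n_splits=5))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

The per-fold RMSE and MAE columns in the tables are the scores on each held-out fold, and the Mean and Std columns summarize them.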


In [61]:
# using Singular Value Decomposition (Matrix Factorisation) to build the recommender system
svd = SVD(verbose=False, n_epochs=100)
svd_results = cross_validate(svd, rating_data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.8591  0.8597  0.8628  0.8597  0.8586  0.8600  0.0015  
MAE (testset)     0.6528  0.6546  0.6562  0.6537  0.6536  0.6542  0.0011  
Fit time          25.96   22.28   24.20   21.36   25.99   23.96   1.89    
Test time         0.85    0.85    0.64    0.87    0.86    0.81    0.09    


In [62]:
# SlopeOne results
slope_one = SlopeOne()
slope_one_results = cross_validate(slope_one, rating_data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm SlopeOne on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.8704  0.8640  0.8676  0.8683  0.8666  0.8674  0.0021  
MAE (testset)     0.6657  0.6615  0.6637  0.6647  0.6629  0.6637  0.0014  
Fit time          7.12    7.17    7.22    7.23    7.25    7.20    0.05    
Test time         15.76   15.95   19.09   15.82   16.11   16.55   1.28    


In [63]:
# CoClustering
co_clustering = CoClustering(n_epochs=100, verbose=False, random_state=0)
co_clustering_results = cross_validate(co_clustering, rating_data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm CoClustering on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.8870  0.8891  0.8977  0.8880  0.8885  0.8901  0.0039  
MAE (testset)     0.6894  0.6912  0.6968  0.6897  0.6908  0.6916  0.0027  
Fit time          26.66   31.45   29.35   26.57   29.68   28.74   1.88    
Test time         0.96    0.65    0.64    0.65    0.94    0.77    0.15    


In [91]:
# NMF
nmf = NMF(n_epochs=100, verbose=False, random_state=0)
nmf_results = cross_validate(nmf, rating_data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm NMF on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.8460  0.8497  0.8508  0.8508  0.8487  0.8492  0.0018  
MAE (testset)     0.6507  0.6534  0.6542  0.6548  0.6535  0.6533  0.0014  
Fit time          13.79   11.00   14.01   17.70   11.54   13.61   2.37    
Test time         0.95    0.97    1.53    0.98    0.94    1.07    0.23    


In [64]:
#KNNWithMeans
knn_means = KNNWithMeans(k=100, verbose=False)
knn_means_results = cross_validate(knn_means, rating_data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm KNNWithMeans on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.8788  0.8774  0.8755  0.8771  0.8776  0.8773  0.0011  
MAE (testset)     0.6772  0.6756  0.6746  0.6757  0.6774  0.6761  0.0010  
Fit time          14.60   14.96   14.87   15.15   14.79   14.87   0.18    
Test time         70.43   72.47   73.78   82.09   69.23   73.60   4.53    


In [65]:
#KNNWithZScore
knn_zscore = KNNWithZScore(k=100, verbose=False)
knn_zscore_results = cross_validate(knn_zscore, rating_data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm KNNWithZScore on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.8793  0.8740  0.8742  0.8753  0.8767  0.8759  0.0020  
MAE (testset)     0.6740  0.6709  0.6713  0.6721  0.6723  0.6721  0.0011  
Fit time          14.98   15.23   14.95   15.01   15.00   15.04   0.10    
Test time         81.27   81.92   75.73   73.15   80.08   78.43   3.41    


In [142]:
train_5k_users = df[(df.movieId < 10001) & (df.userId < 5001)][['userId', 'movieId', 'rating']]
train_5k_users

index,userId,movieId,rating
0,1,296,5.0
1,1,306,3.5
2,1,307,5.0
3,1,665,5.0
4,1,899,3.5
...,...,...,...
733150,5000,1073,2.0
733151,5000,1210,3.0
733152,5000,1356,3.0
733153,5000,1393,4.0


In [143]:
# load_from_df reads columns in the order user, item, rating; line_format only applies to files.
reader_5k = Reader(line_format='item user rating')
rating_data_5k = Dataset.load_from_df(train_5k_users, reader_5k)

knn_basic_5k = KNNBasic(k=100, verbose=False)
knn_basic_results_5k = cross_validate(knn_basic_5k, rating_data_5k, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm KNNBasic on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9035  0.9037  0.9043  0.9026  0.9019  0.9032  0.0008  
MAE (testset)     0.6920  0.6915  0.6922  0.6913  0.6911  0.6916  0.0004  
Fit time          3.21    3.32    3.21    3.21    3.20    3.23    0.05    
Test time         20.76   19.88   17.32   20.34   18.74   19.41   1.24    


In [68]:
# using Singular Value Decomposition (Matrix Factorisation) to build the recommender system
svd_5k = SVD(verbose=False, n_epochs=100)
svd_results_5k = cross_validate(svd_5k, rating_data_5k, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.8749  0.8711  0.8715  0.8711  0.8716  0.8720  0.0014  
MAE (testset)     0.6679  0.6646  0.6647  0.6658  0.6651  0.6656  0.0012  
Fit time          12.90   11.47   12.97   12.71   12.85   12.58   0.56    
Test time         0.63    0.21    0.63    0.66    0.64    0.55    0.17    


In [69]:
# SlopeOne results
slope_one_5k = SlopeOne()
slope_one_results_5k = cross_validate(slope_one_5k, rating_data_5k, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm SlopeOne on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.8693  0.8707  0.8688  0.8657  0.8684  0.8686  0.0016  
MAE (testset)     0.6642  0.6660  0.6645  0.6639  0.6650  0.6647  0.0007  
Fit time          3.36    3.47    3.51    3.51    3.49    3.47    0.06    
Test time         8.16    7.95    7.25    8.31    8.45    8.03    0.42    


In [70]:
# CoClustering
co_clustering_5k = CoClustering(n_epochs=100, verbose=False, random_state=0)
co_clustering_results_5k = cross_validate(co_clustering_5k, rating_data_5k, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm CoClustering on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.8907  0.8922  0.8964  0.8909  0.8897  0.8920  0.0023  
MAE (testset)     0.6914  0.6929  0.6958  0.6921  0.6901  0.6925  0.0019  
Fit time          16.95   16.11   13.34   12.50   12.96   14.37   1.80    
Test time         0.66    0.65    0.17    0.64    0.67    0.56    0.19    


In [92]:
# NMF
nmf_5k = NMF(n_epochs=100, verbose=False, random_state=0)
nmf_results_5k = cross_validate(nmf_5k, rating_data_5k, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm NMF on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.8605  0.8606  0.8566  0.8585  0.8582  0.8589  0.0015  
MAE (testset)     0.6615  0.6617  0.6593  0.6600  0.6609  0.6607  0.0009  
Fit time          6.48    8.35    8.58    6.49    5.89    7.15    1.09    
Test time         0.77    0.76    0.78    0.17    0.78    0.65    0.24    


In [71]:
#KNNWithMeans
knn_means_5k = KNNWithMeans(k=100, verbose=False)
knn_means_results_5k = cross_validate(knn_means_5k, rating_data_5k, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm KNNWithMeans on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.8768  0.8743  0.8767  0.8737  0.8769  0.8757  0.0014  
MAE (testset)     0.6738  0.6738  0.6749  0.6720  0.6746  0.6738  0.0010  
Fit time          3.36    3.36    3.16    3.18    3.18    3.25    0.09    
Test time         21.31   19.27   18.24   20.70   21.09   20.12   1.18    


In [72]:
#KNNWithZScore
knn_zscore_5k = KNNWithZScore(k=100, verbose=False)
knn_zscore_results_5k = cross_validate(knn_zscore_5k, rating_data_5k, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm KNNWithZScore on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.8774  0.8705  0.8742  0.8729  0.8755  0.8741  0.0023  
MAE (testset)     0.6708  0.6668  0.6699  0.6693  0.6698  0.6693  0.0013  
Fit time          3.38    3.42    3.27    3.32    3.34    3.35    0.05    
Test time         19.57   21.01   18.95   21.92   18.55   20.00   1.27    


In [151]:
print(f"{knn_zscore_results_5k['test_rmse'].mean():.4f}, {knn_zscore_results_5k['test_rmse'].std():.4f}")
print(f"{knn_zscore_results_5k['test_mae'].mean():.4f}, {knn_zscore_results_5k['test_mae'].std():.4f}")

0.8741, 0.0023
0.6693, 0.0013
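
The `Std` reported here is the population standard deviation (dividing by the number of folds), which is what NumPy's `std()` computes by default. A quick check with the standard library, using the rounded per-fold RMSE values from the `KNNWithZScore` table for the 5k-user subset:

```python
import statistics

# Per-fold RMSE values copied from the KNNWithZScore (5k users) table
fold_rmse = [0.8774, 0.8705, 0.8742, 0.8729, 0.8755]

mean_rmse = statistics.fmean(fold_rmse)
pop_std = statistics.pstdev(fold_rmse)    # divides by n, like numpy's default std()
sample_std = statistics.stdev(fold_rmse)  # divides by n - 1

print(f"{mean_rmse:.4f}, {pop_std:.4f}, {sample_std:.4f}")
```

With only five folds the two definitions differ noticeably, so it helps to state which one is being reported when sharing results.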


In [152]:
# One row per algorithm, in this order:
# SVD, NMF, CoClustering, SlopeOne, KNNBasic, KNNWithMeans, KNNWithZScore
# Columns: RMSE mean, RMSE std, MAE mean, MAE std, mean fit time, mean test time
results = np.array([
    [r['test_rmse'].mean(), r['test_rmse'].std(),
     r['test_mae'].mean(), r['test_mae'].std(),
     np.mean(r['fit_time']), np.mean(r['test_time'])]
    for r in (svd_results, nmf_results, co_clustering_results, slope_one_results,
              knn_basic_results, knn_means_results, knn_zscore_results)
])

In [153]:
# One row per algorithm, in this order:
# SVD, NMF, CoClustering, SlopeOne, KNNBasic, KNNWithMeans, KNNWithZScore
# Columns: RMSE mean, RMSE std, MAE mean, MAE std, mean fit time, mean test time
results_5k = np.array([
    [r['test_rmse'].mean(), r['test_rmse'].std(),
     r['test_mae'].mean(), r['test_mae'].std(),
     np.mean(r['fit_time']), np.mean(r['test_time'])]
    for r in (svd_results_5k, nmf_results_5k, co_clustering_results_5k, slope_one_results_5k,
              knn_basic_results_5k, knn_means_results_5k, knn_zscore_results_5k)
])

In [154]:
results

array([[8.59965933e-01, 1.46201713e-03, 6.54186774e-01, 1.14135193e-03,
        2.39556204e+01, 8.14153004e-01],
       [8.49209255e-01, 1.78133597e-03, 6.53304545e-01, 1.39715574e-03,
        1.36082708e+01, 1.07179866e+00],
       [8.90068806e-01, 3.89594259e-03, 6.91603525e-01, 2.69259007e-03,
        2.87418577e+01, 7.68877172e-01],
       [8.67366013e-01, 2.11466575e-03, 6.63711419e-01, 1.43362746e-03,
        7.19937601e+00, 1.65450042e+01],
       [8.97160452e-01, 5.67163585e-04, 6.84796366e-01, 5.14811011e-04,
        1.52164154e+01, 7.94880784e+01],
       [8.77267760e-01, 1.07193007e-03, 6.76099580e-01, 1.04707813e-03,
        1.48725300e+01, 7.35991481e+01],
       [8.75908802e-01, 1.96441276e-03, 6.72122757e-01, 1.08290979e-03,
        1.50351339e+01, 7.84297518e+01]])

In [155]:
results_5k

array([[8.72020714e-01, 1.43478105e-03, 6.65631172e-01, 1.21703343e-03,
        1.25797895e+01, 5.53082418e-01],
       [8.58880272e-01, 1.49162936e-03, 6.60700551e-01, 9.27207404e-04,
        7.15436349e+00, 6.53157854e-01],
       [8.91998457e-01, 2.34656092e-03, 6.92454441e-01, 1.89215918e-03,
        1.43723734e+01, 5.58905506e-01],
       [8.68562769e-01, 1.61448688e-03, 6.64698316e-01, 7.26476408e-04,
        3.46887178e+00, 8.02561841e+00],
       [9.03186429e-01, 8.46410311e-04, 6.91628814e-01, 4.23781923e-04,
        3.22872882e+00, 1.94064451e+01],
       [8.75670461e-01, 1.38272252e-03, 6.73817861e-01, 9.91941902e-04,
        3.24807606e+00, 2.01234848e+01],
       [8.74081454e-01, 2.32864830e-03, 6.69307486e-01, 1.32480102e-03,
        3.34646239e+00, 2.00009933e+01]])

In [156]:
result_df = pd.DataFrame(results, 
                         index=['SVD', 'NMF', 'Co Clustering', 'SlopeOne', 'KNN_Basic', 'KNN_Means', 'KNN_ZScore'],
                         columns=['RMSE Mean','RMSE Std', 'MAE Mean','MAE Std', 'Fit Time Mean','Test Time Mean'])

print ('\n\t\tThe results from evaluating\033[1m 10k Movies for 10k users\033[0;0m\n')

result_df


		The results from evaluating 10k Movies for 10k users



Algorithm,RMSE Mean,RMSE Std,MAE Mean,MAE Std,Fit Time Mean,Test Time Mean
SVD,0.859966,0.001462,0.654187,0.001141,23.95562,0.814153
NMF,0.849209,0.001781,0.653305,0.001397,13.608271,1.071799
Co Clustering,0.890069,0.003896,0.691604,0.002693,28.741858,0.768877
SlopeOne,0.867366,0.002115,0.663711,0.001434,7.199376,16.545004
KNN_Basic,0.89716,0.000567,0.684796,0.000515,15.216415,79.488078
KNN_Means,0.877268,0.001072,0.6761,0.001047,14.87253,73.599148
KNN_ZScore,0.875909,0.001964,0.672123,0.001083,15.035134,78.429752


In [157]:
result_5k_df = pd.DataFrame(results_5k, 
                         index=['SVD', 'NMF', 'Co Clustering', 'SlopeOne', 'KNN_Basic', 'KNN_Means', 'KNN_ZScore'],
                         columns=['RMSE Mean','RMSE Std', 'MAE Mean','MAE Std', 'Fit Time Mean','Test Time Mean'])

print ('\n\t\tThe results from evaluating\033[1m 10k Movies for only 5k users\033[0;0m\n')

result_5k_df


		The results from evaluating 10k Movies for only 5k users



Algorithm,RMSE Mean,RMSE Std,MAE Mean,MAE Std,Fit Time Mean,Test Time Mean
SVD,0.872021,0.001435,0.665631,0.001217,12.57979,0.553082
NMF,0.85888,0.001492,0.660701,0.000927,7.154363,0.653158
Co Clustering,0.891998,0.002347,0.692454,0.001892,14.372373,0.558906
SlopeOne,0.868563,0.001614,0.664698,0.000726,3.468872,8.025618
KNN_Basic,0.903186,0.000846,0.691629,0.000424,3.228729,19.406445
KNN_Means,0.87567,0.001383,0.673818,0.000992,3.248076,20.123485
KNN_ZScore,0.874081,0.002329,0.669307,0.001325,3.346462,20.000993
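
To turn the two tables into a comparison, one option is to rank the algorithms by mean RMSE and look at how each one changes between the 5k-user and 10k-user subsets. The sketch below does this with plain dictionaries. The numbers are the `RMSE Mean` values from the tables, rounded to four decimals; the variable names are introduced here for illustration.

```python
# RMSE means rounded from the two result tables
rmse_10k = {"SVD": 0.8600, "NMF": 0.8492, "Co Clustering": 0.8901, "SlopeOne": 0.8674,
            "KNN_Basic": 0.8972, "KNN_Means": 0.8773, "KNN_ZScore": 0.8759}
rmse_5k = {"SVD": 0.8720, "NMF": 0.8589, "Co Clustering": 0.8920, "SlopeOne": 0.8686,
           "KNN_Basic": 0.9032, "KNN_Means": 0.8757, "KNN_ZScore": 0.8741}

# Algorithm names ordered from lowest to highest RMSE on the 10k-user subset
ranked = sorted(rmse_10k, key=rmse_10k.get)
best_5k = min(rmse_5k, key=rmse_5k.get)

print("Lowest RMSE (10k users):", ranked[0])
print("Lowest RMSE (5k users): ", best_5k)

# Change in RMSE when going from 5k to 10k users (negative means improvement)
for name in ranked:
    print(f"{name:14s} {rmse_10k[name] - rmse_5k[name]:+.4f}")
```

On these numbers, NMF has the lowest mean RMSE in both subsets. RMSE alone does not capture the large differences in test time between the KNN variants and the factorization models, so the timing columns are worth reporting alongside it.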
