### Building a Recommender System with Surprise

This try-it explores additional algorithms in the `Surprise` library for generating recommendations. Your goal is to identify the best-performing algorithm by minimizing the mean squared error (reported below as RMSE, alongside MAE) using cross-validation. You will also select a dataset from the [grouplens](https://grouplens.org/datasets/movielens/) example datasets.
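To make the selection criterion concrete, here is a minimal sketch of how a cross-validated error is formed: compute the error on each held-out fold, then average across folds. The ratings are made-up numbers, and the helper names `rmse` and `mae` are illustrative rather than part of `Surprise`.

```python
import math

def rmse(pairs):
    """Root mean squared error over (true_rating, predicted_rating) pairs."""
    return math.sqrt(sum((t - p) ** 2 for t, p in pairs) / len(pairs))

def mae(pairs):
    """Mean absolute error over (true_rating, predicted_rating) pairs."""
    return sum(abs(t - p) for t, p in pairs) / len(pairs)

# Hypothetical (true, predicted) ratings for three held-out folds
folds = [
    [(4.0, 3.5), (2.0, 2.5)],
    [(5.0, 4.0), (3.0, 3.0)],
    [(1.0, 2.0), (4.0, 4.0)],
]

# Score each fold separately, then average to get the cross-validated error
fold_rmse = [rmse(fold) for fold in folds]
fold_mae = [mae(fold) for fold in folds]
mean_rmse = sum(fold_rmse) / len(fold_rmse)
mean_mae = sum(fold_mae) / len(fold_mae)

print(f"RMSE per fold: {fold_rmse}, mean: {mean_rmse:.4f}")
print(f"MAE per fold: {fold_mae}, mean: {mean_mae:.4f}")
```

Because RMSE squares each error before averaging, it penalizes large misses more heavily than MAE, so the two measures can rank algorithms differently.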

To begin, head over to grouplens and examine the different datasets available. Choose one that is easy to shape into the user, item, and rating format that `Surprise` expects. Then, compare the performance of at least the `KNNBasic`, `SVD`, `NMF`, `SlopeOne`, and `CoClustering` algorithms to build your recommendations. For more information on the algorithms, see the documentation for the algorithm package [here](https://surprise.readthedocs.io/en/stable/prediction_algorithms_package.html).

Share the results of your investigation with your peers, including your cross-validation results and a basic description of your dataset.
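As a rough sketch of getting a ratings file into user, item, and rating form and producing a basic description of it, the snippet below parses a small in-memory sample. The tab-separated layout (user id, item id, rating, timestamp) is an assumption, so check the README of the dataset you pick. The rows are made up, and `read_ratings` and `describe` are hypothetical helper names.

```python
import csv
import io

# Made-up sample in an assumed tab-separated layout:
# user id, item id, rating, timestamp
raw = (
    "1\t10\t4\t881250949\n"
    "1\t20\t3\t881250950\n"
    "2\t10\t5\t881250951\n"
)

def read_ratings(handle):
    """Return (user, item, rating) triples, dropping the timestamp column."""
    rows = csv.reader(handle, delimiter="\t")
    return [(user, item, float(rating)) for user, item, rating, _timestamp in rows]

def describe(ratings):
    """Basic dataset description: counts, rating range, and density."""
    users = {u for u, _, _ in ratings}
    items = {i for _, i, _ in ratings}
    values = [r for _, _, r in ratings]
    return {
        "n_users": len(users),
        "n_items": len(items),
        "n_ratings": len(ratings),
        "rating_min": min(values),
        "rating_max": max(values),
        # Share of all possible user-item pairs that actually have a rating
        "density": len(ratings) / (len(users) * len(items)),
    }

ratings = read_ratings(io.StringIO(raw))
summary = describe(ratings)
print(summary)
```

With a real file, pass `open(path, newline="")` in place of the `io.StringIO` sample. The same three columns, in the same order, are what `Dataset.load_from_df` expects from a pandas DataFrame, together with a `Reader` that states the rating scale.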



In [1]:
import numpy as np
import pandas as pd

from plotly.figure_factory import create_table
from surprise import Dataset, accuracy, SVD, NMF, KNNBasic, SlopeOne, CoClustering
from surprise.model_selection import cross_validate, train_test_split, KFold
from surprise.model_selection import GridSearchCV

In [2]:
data = Dataset.load_builtin("ml-100k")

In [3]:
# Hyperparameter grids, keyed by algorithm class name
param_grid = {
    "SVD": {"n_factors": [50, 100],
            "n_epochs": [20, 30],
            "lr_all": [0.002, 0.005],
            "reg_all": [0.4, 0.6]},
    # Note: lr_bu and lr_bi only affect NMF when biased=True (the default is False)
    "NMF": {"n_factors": [15, 30],
            "n_epochs": [50, 100],
            "lr_bu": [0.002, 0.005, 0.01],
            "lr_bi": [0.002, 0.005, 0.01]},
    "KNNBasic": {"k": [30, 40, 50],
                 "min_k": [1, 2, 3]},
    "SlopeOne": {},  # no hyperparameters to tune
    "CoClustering": {"n_cltr_u": [3, 5, 7],
                     "n_epochs": [3, 5, 7]},
}
models = [SVD, NMF, KNNBasic, SlopeOne, CoClustering]
report_data = {'Model': [], 'best_parameters': [], 'RMSE': [], 'MAE': [], 'fit_time': []}

In [None]:
# Grid-search each model and record its best cross-validated scores
for model in models:
    algo_name = model.__name__
    grid = GridSearchCV(model, param_grid[algo_name], measures=["rmse", "mae"], cv=3)
    grid.fit(data)
    # Index of the parameter combination with the lowest mean RMSE
    best_idx = grid.best_index["rmse"]
    report_data['Model'].append(algo_name)
    report_data['best_parameters'].append({'rmse': grid.best_params["rmse"], 'mae': grid.best_params["mae"]})
    # Report the best score per measure, not the average over every combination tried
    report_data['RMSE'].append(grid.best_score["rmse"])
    report_data['MAE'].append(grid.best_score["mae"])
    # Mean fit time of the best-RMSE combination
    report_data['fit_time'].append(grid.cv_results['mean_fit_time'][best_idx])

In [None]:
# creating score dataframe
df_scores = pd.DataFrame.from_dict(report_data)
df_scores.set_index('Model', inplace=True)
df_scores.head()

In [None]:
# Results table, sorted so the lowest (best) RMSE comes first
create_table(df_scores.sort_values(by='RMSE'), index=True, index_title='Model')

In [None]:
trainset, testset = train_test_split(data, test_size=0.25)

In [None]:
# We'll use the famous SVD algorithm.
algo = SVD()

In [None]:
# Train the algorithm on the trainset, and predict ratings for the testset
algo.fit(trainset)
predictions = algo.test(testset)

In [None]:
# Then compute RMSE
accuracy.rmse(predictions)

In [None]:
# Run 5-fold cross-validation and print results
cross_validate(algo, data, measures=["RMSE", "MAE"], cv=5, verbose=True)

In [None]:
# Use a separate name so the per-model param_grid defined earlier is not overwritten
svd_param_grid = {"n_epochs": [5, 10], "lr_all": [0.002, 0.005], "reg_all": [0.4, 0.6]}
gs = GridSearchCV(SVD, svd_param_grid, measures=["rmse", "mae"], cv=3)

gs.fit(data)

# best RMSE score
print(gs.best_score["rmse"])

# combination of parameters that gave the best RMSE score
print(gs.best_params["rmse"])

In [None]:
# define a cross-validation iterator
kf = KFold(n_splits=3)

algo = SVD()

for trainset, testset in kf.split(data):

    # train and test algorithm.
    algo.fit(trainset)
    predictions = algo.test(testset)

    # Compute and print Root Mean Squared Error
    accuracy.rmse(predictions, verbose=True)