### Import Libraries

1. **Imports pandas and numpy**: These libraries are essential for data manipulation and numerical computations.
   - `pandas` (`pd`) is used for data structures like DataFrames.
   - `numpy` (`np`) is used for numerical operations and handling arrays.
   
2. **Imports surprise library components**: These are used for building and evaluating recommendation systems.
   - `Dataset` and `Reader`: Used to load and preprocess the dataset.
   - `SVD`, `NMF`, `KNNBasic`: These are different algorithms for collaborative filtering.
   - `rmse`, `mse`: Functions to compute Root Mean Squared Error and Mean Squared Error, used as accuracy metrics.
   - `GridSearchCV`, `train_test_split`: Tools for hyperparameter tuning and splitting the dataset into training and testing sets.

3. **Imports List from typing**: This is used for type hinting, indicating that a variable should be a list.

In [1]:
import pandas as pd
import numpy as np
from surprise import Dataset, Reader
from surprise import SVD, NMF, KNNBasic
from surprise.accuracy import rmse, mse 
from surprise.model_selection import GridSearchCV, train_test_split
from typing import List

### Load Dataset (Review Dataset)

In [2]:
filePathReviewsF: str = "reviews_filtered.csv"

### Prepare The Data

1. **Imports from surprise.dataset**:
   - `DatasetAutoFolds` and `Dataset` are imported to work with custom datasets for recommendation systems.

2. **Imports Trainset**:
   - `Trainset` is imported for handling training datasets in surprise.

3. **Defines string variables**:
   - `authorId` and `recipeId` are string variables holding column names "AuthorId" and "RecipeId" respectively.

4. **Loads and filters the CSV file**:
   - `df_food_reviews` is a DataFrame created by reading the CSV file specified by `filePathReviewsF`. Only the columns "RecipeId", "AuthorId", and "Rating" are loaded.
   - The columns are then reordered to "AuthorId", "RecipeId", "Rating", because `Dataset.load_from_df` expects the user ids, item ids, and ratings in that order.

5. **Creates a Reader object**:
   - `reader` is a `Reader` object from the surprise library, initialized with a rating scale from 1 to 5.

6. **Loads the DataFrame into a Dataset**:
   - `rd` is a `DatasetAutoFolds` object created by loading the filtered DataFrame `df_food_reviews` using the `reader`.

In [3]:
from surprise.dataset import DatasetAutoFolds, Dataset
from surprise import Trainset
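# Column names for the user and item identifiers
authorId: str = "AuthorId"
recipeId: str = "RecipeId"

# Load only the three columns needed for collaborative filtering
df_food_reviews = pd.read_csv(filePathReviewsF, usecols=["RecipeId", "AuthorId", "Rating"])
# Reorder to (user, item, rating), the column order Dataset.load_from_df expects
df_food_reviews = df_food_reviews[["AuthorId", "RecipeId", "Rating"]]

reader = Reader(rating_scale=(1, 5))
rd: DatasetAutoFolds = Dataset.load_from_df(df_food_reviews, reader)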
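
The preparation step boils down to turning each review into a (user, item, rating) triple. The following is a dependency-free sketch of that idea, not the notebook's own code: the sample rows are made up, `load_triples` is a hypothetical helper, and the check against the 1 to 5 scale is an extra sanity check added for illustration (the cell above does not perform it).

```python
import csv
import io

# Hypothetical sample standing in for reviews_filtered.csv (values are made up).
SAMPLE_CSV = """RecipeId,AuthorId,Rating,Review
101,7,5,Great
102,7,3,Okay
101,9,0,No rating given
"""

RATING_MIN, RATING_MAX = 1, 5  # mirrors Reader(rating_scale=(1, 5))


def load_triples(text):
    """Return (user, item, rating) triples and, separately, the off-scale rows."""
    triples, off_scale = [], []
    for row in csv.DictReader(io.StringIO(text)):
        # Keep only the three needed columns, in user, item, rating order.
        triple = (row["AuthorId"], row["RecipeId"], float(row["Rating"]))
        if RATING_MIN <= triple[2] <= RATING_MAX:
            triples.append(triple)
        else:
            off_scale.append(triple)
    return triples, off_scale


triples, off_scale = load_triples(SAMPLE_CSV)
print("usable triples:", triples)
print("outside the declared scale:", off_scale)
```

Running a check like this on the real file before building the dataset would show whether any ratings fall outside the scale passed to `Reader`.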

1. **Defines parameter grids for hyperparameter tuning**:
   - Parameter tuning is done with grid search: each grid lists the candidate values to try for one algorithm.

2. **SVD parameter grid**:
   - `lr_all`: Learning rates for the SVD algorithm.
   - `reg_all`: Regularization terms for the SVD algorithm.
   - `n_epochs`: Number of epochs (iterations) for training.
   - `n_factors`: Number of factors (latent features) for the SVD algorithm.

3. **NMF parameter grid**:
   - `n_factors`: Number of factors for the NMF algorithm.
   - `reg_pu`, `reg_qi`: Regularization terms for user and item factors.
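   - `reg_bu`, `reg_bi`: Regularization terms for user and item biases.
   - `lr_bu`, `lr_bi`: Learning rates for user and item biases.
   - Note: Surprise documents the four bias parameters as relevant only for the biased version of NMF (`biased=True`). The grid below leaves `biased` at its default of `False`, so these four settings are not expected to affect the fitted models.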

4. **KNN parameter grid**:
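   - `k`: Maximum number of neighbors to take into account when aggregating ratings.
   - `min_k`: Minimum number of neighbors required for aggregation; with fewer neighbors, the prediction falls back to the global mean rating.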
   - `sim_options`: Options for similarity computation, including the similarity measure (`name`) and whether the similarity is user-based or item-based (`user_based`).

In [4]:
param_grid_svd = {
    "lr_all": [0.005, 0.01, 0.02],
    "reg_all": [0.02, 0.1, 0.4],
    "n_epochs": [20, 50, 80],
    "n_factors": [20, 50, 100]
}

param_grid_nmf = {
    "n_factors": [50, 100, 200],
    "reg_pu": [0.02, 0.1, 0.5],
    "reg_qi": [0.02, 0.1, 0.5],
    "reg_bu": [0.005, 0.02, 0.1],
    "reg_bi": [0.005, 0.02, 0.1],
    "lr_bu": [0.002, 0.005, 0.01],
    "lr_bi": [0.002, 0.005, 0.01]
}

param_grid_knn = {
    "k": [10, 20, 40],
    "min_k": [2, 3],
    "sim_options": {
        "name": ["msd", "pearson"],
        "user_based": [False, True]
    }
}
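
Before launching the searches, it can help to estimate how much work each grid implies. The sketch below counts the combinations in the three grids above and multiplies by the number of folds. `count_combinations` is a hypothetical helper, and it assumes that the nested `sim_options` dictionary is expanded into every combination of its values, which is how the Surprise documentation describes it.

```python
def count_combinations(grid):
    """Count parameter combinations; nested dicts such as sim_options are expanded too."""
    total = 1
    for values in grid.values():
        if isinstance(values, dict):
            total *= count_combinations(values)
        else:
            total *= len(values)
    return total


# The same grids as in the cell above, repeated so this block runs on its own.
grids = {
    "SVD": {
        "lr_all": [0.005, 0.01, 0.02],
        "reg_all": [0.02, 0.1, 0.4],
        "n_epochs": [20, 50, 80],
        "n_factors": [20, 50, 100],
    },
    "NMF": {
        "n_factors": [50, 100, 200],
        "reg_pu": [0.02, 0.1, 0.5],
        "reg_qi": [0.02, 0.1, 0.5],
        "reg_bu": [0.005, 0.02, 0.1],
        "reg_bi": [0.005, 0.02, 0.1],
        "lr_bu": [0.002, 0.005, 0.01],
        "lr_bi": [0.002, 0.005, 0.01],
    },
    "KNN": {
        "k": [10, 20, 40],
        "min_k": [2, 3],
        "sim_options": {"name": ["msd", "pearson"], "user_based": [False, True]},
    },
}

CV_FOLDS = 5  # matches cv=5 in the grid searches

fits = {}
for name, grid in grids.items():
    n_combinations = count_combinations(grid)
    fits[name] = n_combinations * CV_FOLDS
    print(f"{name}: {n_combinations} combinations, {fits[name]} cross-validation fits")
# SVD: 81 combinations, 405 cross-validation fits
# NMF: 2187 combinations, 10935 cross-validation fits
# KNN: 24 combinations, 120 cross-validation fits
```

The NMF grid is by far the largest, so trimming parameters that have no effect on the model is the cheapest way to shorten that search.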

### Run Hyperparameter Search for Each Algorithm

1. **Performs a grid search for each algorithm**:
   - A `GridSearchCV` object is created for each algorithm with its parameter grid, and `rmse` (Root Mean Squared Error) and `mae` (Mean Absolute Error) are used to evaluate model performance.
   - `cv=5` specifies 5-fold cross-validation.

2. **Fits the model**:
   - Calling `fit` on the dataset `rd` runs the grid search.

3. **Prints best scores and parameters**:
   - Print the best (lowest) RMSE score, averaged over the folds, found during the grid search.
   - Print the hyperparameters that gave the best RMSE.
   - Print the best (lowest) MAE score, averaged over the folds, found during the grid search.
   - Print the hyperparameters that gave the best MAE.
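
To make the evaluation concrete, here is a minimal sketch of 5-fold cross-validation scored with RMSE and MAE. It is an illustration under stated assumptions, not Surprise's implementation: the ratings are made up, the "model" is a global-mean baseline, and `rmse_of`, `mae_of`, and `k_fold_indices` are hypothetical helpers (named to avoid shadowing the `rmse` import above).

```python
import math
import random


def rmse_of(errors):
    """Root Mean Squared Error of a list of prediction errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae_of(errors):
    """Mean Absolute Error of a list of prediction errors."""
    return sum(abs(e) for e in errors) / len(errors)


def k_fold_indices(n_items, n_folds, seed=0):
    """Shuffle the indices 0..n_items-1 and deal them into n_folds disjoint test folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


# Hypothetical ratings on a 1 to 5 scale (not taken from the review dataset).
ratings = [5, 4, 5, 3, 1, 5, 4, 2, 5, 5]

folds = k_fold_indices(len(ratings), n_folds=5)
fold_rmse, fold_mae = [], []
for test_idx in folds:
    held_out = set(test_idx)
    train = [r for i, r in enumerate(ratings) if i not in held_out]
    baseline = sum(train) / len(train)  # predict the training mean for every test rating
    errors = [ratings[i] - baseline for i in test_idx]
    fold_rmse.append(rmse_of(errors))
    fold_mae.append(mae_of(errors))

# A grid search compares parameter settings on these fold-averaged scores.
mean_rmse = sum(fold_rmse) / len(fold_rmse)
mean_mae = sum(fold_mae) / len(fold_mae)
print(f"mean RMSE over folds: {mean_rmse:.3f}")
print(f"mean MAE over folds:  {mean_mae:.3f}")
```

Because RMSE squares each error before averaging, it penalizes large misses more heavily than MAE, which is one reason the two measures can prefer different hyperparameters.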

#### SVD

In [11]:
# Perform grid search for SVD
gs_svd = GridSearchCV(SVD, param_grid=param_grid_svd, measures=["rmse", "mae"], cv=5)
gs_svd.fit(rd)

# Get best score and parameters for SVD
print("Best RMSE score for SVD:", gs_svd.best_score["rmse"])
print("Best RMSE parameters for SVD:", gs_svd.best_params["rmse"])
print("Best MAE score for SVD:", gs_svd.best_score["mae"])
print("Best MAE parameters for SVD:", gs_svd.best_params["mae"])

Best RMSE score for SVD: 1.4789196245536333
Best RMSE parameters for SVD: {'lr_all': 0.005, 'reg_all': 0.02, 'n_epochs': 50, 'n_factors': 20}
Best MAE score for SVD: 1.0431158266950842
Best MAE parameters for SVD: {'lr_all': 0.01, 'reg_all': 0.02, 'n_epochs': 80, 'n_factors': 20}


#### KNN

In [12]:

# Perform grid search for KNN
gs_knn = GridSearchCV(KNNBasic, param_grid=param_grid_knn, measures=["rmse", "mae"], cv=5)
gs_knn.fit(rd)

# Get best score and parameters for KNN
print("Best RMSE score for KNN:", gs_knn.best_score["rmse"])
print("Best RMSE parameters for KNN:", gs_knn.best_params["rmse"])
print("Best MAE score for KNN:", gs_knn.best_score["mae"])
print("Best MAE parameters for KNN:", gs_knn.best_params["mae"])

Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the pearson similarity matrix...
Done computing similarity matrix.
Computing the pearson similarity matrix

#### NMF

In [5]:
# Perform grid search for NMF
gs_nmf = GridSearchCV(NMF, param_grid=param_grid_nmf, measures=["rmse", "mae"], cv=5)
gs_nmf.fit(rd)

# Get best score and parameters for NMF
print("Best RMSE score for NMF:", gs_nmf.best_score["rmse"])
print("Best RMSE parameters for NMF:", gs_nmf.best_params["rmse"])
print("Best MAE score for NMF:", gs_nmf.best_score["mae"])
print("Best MAE parameters for NMF:", gs_nmf.best_params["mae"])

Best RMSE score for NMF: 1.6088902722586753
Best RMSE parameters for NMF: {'n_factors': 200, 'reg_pu': 0.1, 'reg_qi': 0.1, 'reg_bu': 0.1, 'reg_bi': 0.1, 'lr_bu': 0.002, 'lr_bi': 0.005}
Best MAE score for NMF: 1.087558210420603
Best MAE parameters for NMF: {'n_factors': 50, 'reg_pu': 0.02, 'reg_qi': 0.02, 'reg_bu': 0.005, 'reg_bi': 0.005, 'lr_bu': 0.002, 'lr_bi': 0.005}
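
The reported scores can be compared side by side. The sketch below copies the best SVD and NMF scores from the outputs above into a plain dictionary and picks the algorithm with the lowest value for each measure. KNN is left out because its output is cut off before any scores are printed. With the live objects, the same values would come from `gs_svd.best_score` and `gs_nmf.best_score`.

```python
# Best scores copied from the cell outputs above (lower is better for both measures).
reported = {
    "SVD": {"rmse": 1.4789196245536333, "mae": 1.0431158266950842},
    "NMF": {"rmse": 1.6088902722586753, "mae": 1.087558210420603},
}

best_by_measure = {}
for measure in ("rmse", "mae"):
    # Pick the algorithm whose best score is lowest for this measure.
    best_by_measure[measure] = min(reported, key=lambda name: reported[name][measure])
    print(f"lowest {measure}: {best_by_measure[measure]}")
```

On these numbers SVD comes out ahead of NMF on both measures, although a final choice would also need the missing KNN results.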
