# Evaluation <a id='top'></a>

In this last notebook, we evaluate our newly trained recommendation systems for new users.

The structure of this notebook is as follows:

[0. Import Libraries](#libraries) <br>
[1. Create Necessary Functions](#functions) <br>
&emsp; [1.1. Evaluation Function](#evaluate) <br>
&emsp; [1.2. Plot Evaluation Function](#plot_evaluate) <br>
&emsp; [1.3. Evaluate and Visualize Function](#evaluate_visualize) <br>
[2. Evaluate Recommendation Systems](#apply_functions) <br>

# 0. Import Libraries <a id='libraries'></a>
[to the top](#top)

Import the necessary libraries.

In [None]:
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, precision_score, recall_score, f1_score
from model import UserBasedCF, ItemBasedCF, MatrixFactorizationCF
import matplotlib.pyplot as plt
import time
from datetime import timedelta
from helper_functions import load_config, load_data, format_timedelta

# 1. Create Necessary Functions <a id='functions'></a>
[to the top](#top)

In this section, we define the functions used to evaluate the models.

## 1.1. Evaluation Function <a id='evaluate'></a>
[to the top](#top)

This function evaluates a recommendation model by comparing its predictions against the actual ratings in the test dataset. It computes Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Precision, Recall, and F1-score. MAE and RMSE measure how far the predicted ratings are from the actual ones. Precision, Recall, and F1-score treat the task as binary classification: a rating counts as relevant only if it is strictly greater than 3, so a rating of exactly 3 is not relevant. When one of these three scores is undefined (for example, Precision when no prediction exceeds 3), it is reported as 0 because of `zero_division=0`. Together, the metrics indicate how accurately the model predicts user preferences and can guide further refinement.

In [None]:
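# Function to evaluate the model
def evaluate_model(model, test_data):
    """Predict a rating for every test row and return a dict of metric scores.

    test_data must expose user_id, game_id and rating columns;
    model must provide predict(user_id, game_id).
    """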
    y_true = []
    y_pred = []

    for row in test_data.itertuples():
        user_id = row.user_id
        game_id = row.game_id
        actual_rating = row.rating

        prediction = model.predict(user_id, game_id)
        y_true.append(actual_rating)
        y_pred.append(prediction)

    y_true = np.array(y_true)
    y_pred = np.array(y_pred)

    # Mean Absolute Error
    mae = mean_absolute_error(y_true, y_pred)

    # Root Mean Squared Error
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))

    # Precision, Recall, F1-score (assuming threshold for relevant rating is > 3)
    y_true_binary = (y_true > 3).astype(int)
    y_pred_binary = (y_pred > 3).astype(int)

    precision = precision_score(y_true_binary, y_pred_binary, zero_division=0)
    recall = recall_score(y_true_binary, y_pred_binary, zero_division=0)
    f1 = f1_score(y_true_binary, y_pred_binary, zero_division=0)

    return {'MAE': mae, 'RMSE': rmse, 'Precision': precision, 'Recall': recall, 'F1-score': f1}
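
To make the metric definitions concrete, the sketch below recomputes them by hand on five toy ratings, using plain Python instead of scikit-learn. The ratings and the 1 to 5 scale are illustrative assumptions, not values from the dataset; only the strict `> 3` rule and the zero-denominator fallback mirror `evaluate_model`.

```python
import math

# Toy ratings on an assumed 1-5 scale (illustrative values only).
y_true = [5, 4, 2, 3, 1]
y_pred = [4.5, 2.5, 3.5, 3.0, 1.0]

n = len(y_true)
errors = [p - t for t, p in zip(y_true, y_pred)]

# MAE: average absolute error. RMSE: square root of the average squared error.
mae = sum(abs(e) for e in errors) / n
rmse = math.sqrt(sum(e * e for e in errors) / n)

# Binarize with the same strict rule as evaluate_model: relevant means > 3.
true_rel = [t > 3 for t in y_true]
pred_rel = [p > 3 for p in y_pred]

tp = sum(t and p for t, p in zip(true_rel, pred_rel))        # relevant, predicted relevant
fp = sum((not t) and p for t, p in zip(true_rel, pred_rel))  # not relevant, predicted relevant
fn = sum(t and (not p) for t, p in zip(true_rel, pred_rel))  # relevant, predicted not relevant

# Fall back to 0.0 on a zero denominator, like zero_division=0.
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0

print({'MAE': mae, 'RMSE': rmse, 'Precision': precision, 'Recall': recall, 'F1-score': f1})
```

Note that the fourth item has an actual and predicted rating of exactly 3, so it counts as not relevant on both sides.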

## 1.2. Plot Evaluation Function <a id='plot_evaluate'></a>
[to the top](#top)

This function plots the evaluation results of a single recommendation model as a bar chart. It takes a dictionary that maps metric names to scores, along with the name of the model being evaluated. Each metric becomes one bar, the metric names label the x-axis ticks, the y-axis is labeled 'Score', and the model name appears in the title. All metrics share one axis, so keep in mind that MAE and RMSE are on the rating scale while Precision, Recall, and F1-score lie between 0 and 1.

In [None]:
def plot_evaluation_results(results, model_name):
    """Draw one bar per metric in results (a dict of metric name -> score)."""
    labels, values = zip(*results.items())
    x = np.arange(len(labels))
    plt.bar(x, values, color='b')
    plt.xticks(x, labels)
    plt.ylabel('Score')
    plt.title(f'{model_name} Evaluation')
    plt.show()
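
Since `plot_evaluation_results` draws one model per chart, comparing models means reading across separate figures. One possible alternative, sketched here and not part of the original notebook, is a grouped bar chart with one bar per model for each metric. The function name `plot_model_comparison`, the model labels, and all scores are hypothetical placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np


def plot_model_comparison(results_by_model, metrics):
    """Draw one group of bars per metric, with one bar per model in each group."""
    x = np.arange(len(metrics))
    width = 0.8 / len(results_by_model)  # keep each group within 80% of its slot
    fig, ax = plt.subplots()
    for i, (name, results) in enumerate(results_by_model.items()):
        values = [results[m] for m in metrics]
        ax.bar(x + i * width, values, width, label=name)
    # Center the tick under each group of bars.
    ax.set_xticks(x + width * (len(results_by_model) - 1) / 2)
    ax.set_xticklabels(metrics)
    ax.set_ylabel('Score')
    ax.legend()
    return fig, ax


# Placeholder scores, made up for illustration only.
example = {
    'Model A': {'Precision': 0.50, 'Recall': 0.40, 'F1-score': 0.44},
    'Model B': {'Precision': 0.60, 'Recall': 0.30, 'F1-score': 0.40},
}
metric_names = ['Precision', 'Recall', 'F1-score']
fig, ax = plot_model_comparison(example, metric_names)
```

The sketch plots only the three metrics bounded by 0 and 1; MAE and RMSE would fit better on a separate chart because they live on the rating scale.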

## 1.3. Evaluate and Visualize Function <a id='evaluate_visualize'></a>
[to the top](#top)

This function runs the whole evaluation and visualization process for the recommendation systems. It loads the test data, then handles the three models (User-Based CF, Item-Based CF, and Matrix Factorization CF) one at a time: it loads the pre-trained model, evaluates it with the evaluate_model function, and deletes it to free memory before loading the next one. The time taken by each step is printed along the way. Finally, the function prints the evaluation results and plots one bar chart per model so that their performance can be compared across the metrics.

In [None]:
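def get_evaluate_and_show_results(reviews_dir, game_details_path, selection_file, selection_name=""):
    """Load the test data, evaluate the three saved models in turn, then print and plot the results."""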
    start_time = time.time()
    
    print(f"Starting evaluation for selection: {selection_name}")
    _, test_data = load_data(reviews_dir, game_details_path, selection_file)
    data_loading_time = time.time()
    print(f"Data loaded in {format_timedelta(timedelta(seconds=data_loading_time - start_time))}")
    print("-" * 40)
    
    # Evaluate User-Based CF model
    print(f"Loading User-Based CF model for selection: {selection_name}")
    user_based_model = UserBasedCF.load(f'data/model/{selection_name}_user_based_cf_model.pkl')
    user_based_loading_time = time.time()
    print(f"User-Based CF model loaded in {format_timedelta(timedelta(seconds=user_based_loading_time - data_loading_time))}")
    
    print(f"Evaluating User-Based CF model for selection: {selection_name}")
    user_based_results = evaluate_model(user_based_model, test_data)
    user_based_evaluation_time = time.time()
    print(f"User-Based CF model evaluated in {format_timedelta(timedelta(seconds=user_based_evaluation_time - user_based_loading_time))}")
    print("-" * 40)
    
    # Free memory
    del user_based_model
    
    # Evaluate Item-Based CF model
    print(f"Loading Item-Based CF model for selection: {selection_name}")
    item_based_model = ItemBasedCF.load(f'data/model/{selection_name}_item_based_cf_model.pkl')
    item_based_loading_time = time.time()
    print(f"Item-Based CF model loaded in {format_timedelta(timedelta(seconds=item_based_loading_time - user_based_evaluation_time))}")
    
    print(f"Evaluating Item-Based CF model for selection: {selection_name}")
    item_based_results = evaluate_model(item_based_model, test_data)
    item_based_evaluation_time = time.time()
    print(f"Item-Based CF model evaluated in {format_timedelta(timedelta(seconds=item_based_evaluation_time - item_based_loading_time))}")
    print("-" * 40)
    
    # Free memory
    del item_based_model
    
    # Evaluate Matrix Factorization CF model
    print(f"Loading Matrix Factorization CF model for selection: {selection_name}")
    matrix_factorization_model = MatrixFactorizationCF.load(f'data/model/{selection_name}_matrix_factorization_cf_model.pkl')
    matrix_factorization_loading_time = time.time()
    print(f"Matrix Factorization CF model loaded in {format_timedelta(timedelta(seconds=matrix_factorization_loading_time - item_based_evaluation_time))}")
    
    print(f"Evaluating Matrix Factorization CF model for selection: {selection_name}")
    matrix_factorization_results = evaluate_model(matrix_factorization_model, test_data)
    matrix_factorization_evaluation_time = time.time()
    print(f"Matrix Factorization CF model evaluated in {format_timedelta(timedelta(seconds=matrix_factorization_evaluation_time - matrix_factorization_loading_time))}")
    print("-" * 40)
    
    # Free memory
    del matrix_factorization_model
    
    print(f"Results for selection: {selection_name}")
    print("User-Based CF Evaluation Results:", user_based_results)
    print("Item-Based CF Evaluation Results:", item_based_results)
    print("Matrix Factorization CF Evaluation Results:", matrix_factorization_results)
    print("=" * 40)
    
    # Plot results
    plot_evaluation_results(user_based_results, f'User-Based CF {selection_name}')
    plot_evaluation_results(item_based_results, f'Item-Based CF {selection_name}')
    plot_evaluation_results(matrix_factorization_results, f'Matrix Factorization CF {selection_name}')
    
    plotting_time = time.time()
    print(f"Results plotted in {format_timedelta(timedelta(seconds=plotting_time - matrix_factorization_evaluation_time))}")
    
    total_time = plotting_time - start_time
    print(f"Total evaluation time for selection {selection_name}: {format_timedelta(timedelta(seconds=total_time))}")
    print("=" * 40)
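
The load, evaluate, time, and delete steps above repeat for each model. As a minimal sketch of how that pattern could be expressed as a loop, the code below uses a hypothetical `StubModel` in place of the real model classes, and hypothetical helpers `timed`, `evaluate_all`, and `mean_abs_error`; none of these names exist in the project. It only assumes, as the function above does, that a model class offers `load(path)` and that a model offers `predict(user_id, game_id)`.

```python
import time


class StubModel:
    """Hypothetical stand-in for UserBasedCF, ItemBasedCF or MatrixFactorizationCF."""

    @classmethod
    def load(cls, path):
        return cls()  # the real classes would read the pickle at `path`

    def predict(self, user_id, game_id):
        return 3.0  # constant prediction, for illustration only


def timed(label, func, *args):
    """Call func(*args), print how long it took, and return its result."""
    start = time.perf_counter()
    result = func(*args)
    print(f"{label} in {time.perf_counter() - start:.3f}s")
    return result


def evaluate_all(model_specs, evaluate, test_data):
    """Load and evaluate each model in turn, keeping only one in memory at a time."""
    results = {}
    for name, (model_cls, path) in model_specs.items():
        model = timed(f"{name} model loaded", model_cls.load, path)
        results[name] = timed(f"{name} model evaluated", evaluate, model, test_data)
        del model  # drop the reference before the next model is loaded
    return results


def mean_abs_error(model, test_data):
    """Tiny stand-in for evaluate_model that works on (user, game, rating) tuples."""
    errors = [abs(model.predict(u, g) - r) for u, g, r in test_data]
    return {'MAE': sum(errors) / len(errors)}


test_rows = [(1, 10, 4.0), (2, 11, 2.0)]  # made-up rows
specs = {
    'User-Based CF': (StubModel, 'data/model/demo_user_based_cf_model.pkl'),
    'Item-Based CF': (StubModel, 'data/model/demo_item_based_cf_model.pkl'),
}
results = evaluate_all(specs, mean_abs_error, test_rows)
print(results)
```

Adding a fourth model would then only require one more entry in `specs`, at the cost of slightly less explicit log messages than the hand-written version.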


# 2. Evaluate Recommendation Systems <a id='apply_functions'></a>
[to the top](#top)

Finally, we apply these functions to every selection defined in the configuration, evaluating the recommendation systems with the different metrics and plotting the results.


In [None]:
config = load_config()
for params in config.values():
    get_evaluate_and_show_results(
        params['reviews_dir'],
        params['game_details_path'],
        params['selection_file'],
        params['selection_name']
    )
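
The loop above reads four keys from every configuration entry, so a missing key would only surface as a `KeyError` partway through a run. A small check before starting can catch that early. The sketch below assumes that `load_config()` returns a dictionary of dictionaries, which is what the loop implies; the helper `find_missing_keys` and the example paths are hypothetical.

```python
REQUIRED_KEYS = ('reviews_dir', 'game_details_path', 'selection_file', 'selection_name')


def find_missing_keys(config):
    """Return {variation: [missing keys]} for entries that lack a required key."""
    missing = {}
    for variation, params in config.items():
        absent = [key for key in REQUIRED_KEYS if key not in params]
        if absent:
            missing[variation] = absent
    return missing


# Hypothetical configuration; the paths are placeholders, not the project's real files.
example_config = {
    'small': {
        'reviews_dir': 'data/reviews',
        'game_details_path': 'data/games.csv',
        'selection_file': 'data/small.csv',
        'selection_name': 'small',
    },
    'broken': {'reviews_dir': 'data/reviews'},
}

problems = find_missing_keys(example_config)
print(problems)
```

An empty result means every entry has the keys the evaluation loop needs.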