## FEVR Framework

![Alt text](image.png)

# 1- Evaluation Objectives

**1- Prediction Accuracy:**

* Objective: Measure how accurately the recommender system predicts user ratings.
* Metrics:
    * RMSE (Root Mean Squared Error): the square root of the mean squared difference between predicted and actual ratings. It penalizes large errors more heavily than MAE.
    * MAE (Mean Absolute Error): the mean absolute difference between predicted and actual ratings.
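
To make the two metrics concrete, here is a minimal pure-Python sketch of how they are computed. The function names and the sample ratings are made up for illustration; the notebook itself relies on `accuracy.rmse` and `accuracy.mae` from Surprise.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def mae(predicted, actual):
    """Mean absolute error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    absolute_errors = [abs(p - a) for p, a in zip(predicted, actual)]
    return sum(absolute_errors) / len(absolute_errors)

# Made-up ratings, for illustration only (not taken from the Netflix dataset)
predicted = [3.5, 4.0, 2.0, 5.0]
actual = [4.0, 4.0, 1.0, 3.0]

print(f"RMSE: {rmse(predicted, actual):.4f}")
print(f"MAE:  {mae(predicted, actual):.4f}")
```

Because the errors are squared before averaging, the single error of 2.0 in this sample pulls RMSE above MAE.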

**2- User Satisfaction:**

* Objective: Assess user satisfaction with the recommendations.
* Metrics:
    * User feedback: gathered through surveys or user studies to determine user satisfaction levels. (Future Improvement)

# 2- Evaluation Principles

**1- Hypotheses:**


* Hypothesis 1: "The SVD-based model provides higher prediction accuracy than the Item-based Pearson Correlation method."
* Hypothesis 2: "Recommendations generated by the Item-based Pearson Correlation method will have higher diversity than those generated by the SVD-based model."
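
Hypothesis 2 needs a diversity measure before it can be tested. One common option is intra-list diversity, the average pairwise dissimilarity of the items in a recommendation list. The sketch below assumes each movie has a set of genre tags, which is a hypothetical feature and may not exist in the Netflix dataset. The choice of Jaccard distance is also an assumption.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets; 0.0 if both are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def intra_list_diversity(recommended_ids, item_features):
    """Average pairwise Jaccard distance over one recommendation list."""
    pairs = list(combinations(recommended_ids, 2))
    if not pairs:
        return 0.0
    total = sum(
        jaccard_distance(item_features[i], item_features[j]) for i, j in pairs
    )
    return total / len(pairs)

# Hypothetical genre tags, for illustration only
item_features = {
    1: {"drama"},
    2: {"drama", "romance"},
    3: {"comedy"},
    4: {"drama"},
}

narrow_list = [1, 2, 4]  # mostly drama
broad_list = [1, 2, 3]   # includes a comedy

print(intra_list_diversity(narrow_list, item_features))
print(intra_list_diversity(broad_list, item_features))
```

A higher value means the recommended items overlap less in their features. Comparing the mean of this score across users for both models would be one way to check Hypothesis 2.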

**2- Control Variables:**

* We will keep the experimental setup consistent to isolate the effect of the recommendation algorithms.
* We will use the same dataset splits, preprocessing steps, and evaluation metrics across all models.
* Maintain the same random seed (random_state=10) for reproducibility.
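
The effect of a fixed seed can be illustrated with a small standard-library sketch. `split_indices` is a hypothetical helper, not part of Surprise; it only shows that the same seed reproduces the same train/test partition.

```python
import random

def split_indices(n_items, test_fraction, seed):
    """Shuffle indices with a seeded RNG and split them into train and test."""
    rng = random.Random(seed)  # local RNG, so global state is untouched
    indices = list(range(n_items))
    rng.shuffle(indices)
    n_test = int(n_items * test_fraction)
    return indices[n_test:], indices[:n_test]

# Two runs with the same seed produce the same partition
train_a, test_a = split_indices(10, 0.2, seed=10)
train_b, test_b = split_indices(10, 0.2, seed=10)

print(test_a == test_b)
```

Note that seeding the split does not seed the model itself: an algorithm with random initialization needs its own seed for results to be fully repeatable.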

**3- Generalization Power:**

* We will evaluate the models on different subsets of the data to check that the results generalize.
* Split the data into multiple training and testing sets using cross-validation.
* Test the models on different user groups or time periods.
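
As a sketch of the user-group idea, the error can be broken down per group instead of being reported as one global number. The group labels and sample predictions below are hypothetical; in practice the groups might be derived from how many ratings each user has.

```python
import math
from collections import defaultdict

def rmse_by_group(predictions, group_of_user):
    """Compute RMSE per user group.

    predictions: iterable of (user_id, actual_rating, predicted_rating)
    group_of_user: dict mapping user_id to a group label
    """
    squared_errors = defaultdict(list)
    for user_id, actual, predicted in predictions:
        squared_errors[group_of_user[user_id]].append((predicted - actual) ** 2)
    return {
        group: math.sqrt(sum(errors) / len(errors))
        for group, errors in squared_errors.items()
    }

# Made-up predictions and group labels, for illustration only
predictions = [
    ("u1", 4.0, 3.5),
    ("u1", 2.0, 2.5),
    ("u2", 5.0, 3.0),
    ("u2", 1.0, 2.0),
]
group_of_user = {"u1": "heavy", "u2": "light"}

group_rmse = rmse_by_group(predictions, group_of_user)
print(group_rmse)
```

A large gap between groups would suggest that an overall average hides poor accuracy for some users.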

**4- Reliability:**

* We will ensure that the evaluation is consistent and reproducible.
* Run multiple experiments and average the results.
* Document the process.
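
When averaging over runs, it helps to report the spread as well as the mean. The sketch below uses the five per-fold SVD RMSE values printed in the Implementation output of this notebook. Using the sample standard deviation as the spread measure is a choice made here, not something the notebook does.

```python
from statistics import mean, stdev

# Per-fold RMSE values for the SVD model, copied from the notebook output
svd_rmse_per_fold = [0.8903, 0.8907, 0.8899, 0.8895, 0.8894]

svd_rmse_mean = mean(svd_rmse_per_fold)
svd_rmse_std = stdev(svd_rmse_per_fold)  # sample standard deviation

print(f"SVD RMSE: {svd_rmse_mean:.4f} +/- {svd_rmse_std:.4f}")
```

A small standard deviation relative to the gap between two models makes it less likely that the difference is an artifact of one particular split.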

# 3- Experiment Types


**1- Offline Evaluation:**

* We will use historical data to evaluate the predictive accuracy of our models. This involves splitting the data into training and testing sets, training the models, and measuring their performance on the test set.

**2- User Study (Future Improvement):**

* Plan to conduct a user study in the future to gather qualitative feedback on the user experience and satisfaction with the recommendations.

**3- Online Evaluation (Future Improvement):**

* Plan to deploy the model in a real-world setting and measure user interactions and satisfaction in a live environment.

# 4- Evaluation Aspects

**1- Types of Data:**

* Ideally, an evaluation uses both explicit data (user ratings) and implicit data (such as viewing or click behavior).
* In our case, we are using only explicit ratings from the Netflix dataset.

**2- Data Collection:**

* We will use the existing Netflix dataset for offline evaluation.

**3- Data Quality and Biases:**

* Mitigate potential biases in the data by using cross-validation, so that results do not depend on a single split, and by checking that the sample is representative.
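
One simple representativeness check is to look at how the rating values are distributed, since a heavily skewed distribution can flatter a model that mostly predicts the common values. This is a minimal sketch on made-up ratings. `rating_distribution` is a hypothetical helper and is not part of the notebook.

```python
from collections import Counter

def rating_distribution(ratings):
    """Return the share of each rating value, ordered by rating."""
    counts = Counter(ratings)
    total = sum(counts.values())
    return {value: counts[value] / total for value in sorted(counts)}

# Made-up ratings, for illustration only
sample_ratings = [5, 4, 4, 3, 5, 4, 1, 4]

distribution = rating_distribution(sample_ratings)
for value, share in distribution.items():
    print(f"rating {value}: {share:.1%}")
```

The same function could be applied to the `Rating` column of the loaded data, and to each train and test split, to confirm that the splits have similar distributions.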



**4- Evaluation Metrics:**

* RMSE
* MAE

# Implementation

In [1]:
import pandas as pd
from surprise import Reader, Dataset, SVD, accuracy
from surprise.model_selection import train_test_split

# Load Rating Data
ratings = pd.read_csv('C:\\Users\\Musae\\Documents\\GitHub-REPOs\\FEVR_Framework\\data\\Netflix_Dataset_Rating.csv')
movies = pd.read_csv('C:\\Users\\Musae\\Documents\\GitHub-REPOs\\FEVR_Framework\\data\\Netflix_Dataset_Movie.csv')

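# Preprocess Data: the Reader declares the 1-5 rating scale, and Surprise expects
# the DataFrame columns in (user, item, rating) order
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(ratings[['User_ID', 'Movie_ID', 'Rating']], reader)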


# Split Data into Training and Testing Sets


In [14]:
trainset, testset = train_test_split(data, test_size=0.2, random_state=10)


# Hypotheses

In [15]:
''' 
Hypothesis 1: 
The SVD-based model will provide higher prediction accuracy than the Item-based Pearson Correlation method.

Hypothesis 2: 
Recommendations generated by the Item-based Pearson Correlation method will have higher diversity than those generated by the SVD-based model.
'''


In [6]:
from surprise.model_selection import KFold
from surprise import KNNBasic, SVD, accuracy

# Number of SGD epochs for the SVD model (Surprise's default is 20)
n_epochs_svd = 5
# Use KFold cross-validation to evaluate the models on different subsets of the data
kf = KFold(n_splits=5, random_state=10, shuffle=True)

# Initialize lists to store evaluation results for SVD model
rmse_results_svd = []
mae_results_svd = []

# Initialize lists to store evaluation results for Pearson Correlation model
rmse_results_pearson = []
mae_results_pearson = []

# SVD model, trained for n_epochs_svd epochs on each fold
for trainset, testset in kf.split(data):
    svd_model = SVD(n_epochs=n_epochs_svd, verbose=True)
    svd_model.fit(trainset)
    predictions = svd_model.test(testset)
    rmse = accuracy.rmse(predictions, verbose=True)
    mae = accuracy.mae(predictions, verbose=True)
    rmse_results_svd.append(rmse)
    mae_results_svd.append(mae)

# Item-based Pearson Correlation Model
# (the seeded KFold yields the same folds as above, so both models see identical splits)
sim_options = {
    'name': 'pearson',
    'user_based': False  # Compute similarities between items
}

for trainset, testset in kf.split(data):
    pearson_model = KNNBasic(sim_options=sim_options, verbose=True)
    pearson_model.fit(trainset)
    predictions = pearson_model.test(testset)
    rmse = accuracy.rmse(predictions, verbose=True)
    mae = accuracy.mae(predictions, verbose=True)
    rmse_results_pearson.append(rmse)
    mae_results_pearson.append(mae)

# Calculate average RMSE and MAE for SVD
average_rmse_svd = sum(rmse_results_svd) / len(rmse_results_svd)
average_mae_svd = sum(mae_results_svd) / len(mae_results_svd)

# Calculate average RMSE and MAE for Pearson Correlation
average_rmse_pearson = sum(rmse_results_pearson) / len(rmse_results_pearson)
average_mae_pearson = sum(mae_results_pearson) / len(mae_results_pearson)

print(f'Average RMSE for SVD: {average_rmse_svd}')
print(f'Average MAE for SVD: {average_mae_svd}')
print(f'Average RMSE for Pearson Correlation: {average_rmse_pearson}')
print(f'Average MAE for Pearson Correlation: {average_mae_pearson}')


Processing epoch 0
Processing epoch 1
Processing epoch 2
Processing epoch 3
Processing epoch 4
RMSE: 0.8903
MAE:  0.6977
Processing epoch 0
Processing epoch 1
Processing epoch 2
Processing epoch 3
Processing epoch 4
RMSE: 0.8907
MAE:  0.6973
Processing epoch 0
Processing epoch 1
Processing epoch 2
Processing epoch 3
Processing epoch 4
RMSE: 0.8899
MAE:  0.6971
Processing epoch 0
Processing epoch 1
Processing epoch 2
Processing epoch 3
Processing epoch 4
RMSE: 0.8895
MAE:  0.6965
Processing epoch 0
Processing epoch 1
Processing epoch 2
Processing epoch 3
Processing epoch 4
RMSE: 0.8894
MAE:  0.6966
Computing the pearson similarity matrix...
Done computing similarity matrix.
RMSE: 0.9170
MAE:  0.7261
Computing the pearson similarity matrix...
Done computing similarity matrix.
RMSE: 0.9173
MAE:  0.7260
Computing the pearson similarity matrix...
Done computing similarity matrix.
RMSE: 0.9171
MAE:  0.7262
Computing the pearson similarity matrix...
Done computing similarity matrix.
RMSE: 0.9

## Conclusion

In this notebook, we implemented and evaluated two different recommendation algorithms—SVD (Singular Value Decomposition) and Item-based Pearson Correlation—using the Netflix movie ratings dataset. The evaluation process followed the FEVR framework to ensure a comprehensive assessment.

**SVD Model:**

* Training progress is printed for each epoch; after 5 epochs, the RMSE and MAE are reported on the test set of each fold.
* The average RMSE for SVD across 5 splits is approximately 0.89.
* The average MAE for SVD across 5 splits is approximately 0.70.

**Item-based Pearson Correlation Model:**

* The similarity matrix computation is logged for each fold, followed by the RMSE and MAE on the test set.
* The average RMSE for Pearson Correlation across 5 splits is approximately 0.92.
* The average MAE for Pearson Correlation across 5 splits is approximately 0.73.

**Comparison:**

* The SVD model performs slightly better than the Item-based Pearson Correlation model on both metrics, with a lower average RMSE (about 0.89 vs. 0.92) and MAE (about 0.70 vs. 0.73). This is consistent with Hypothesis 1.
* Lower RMSE and MAE values indicate better predictive accuracy, so based on these results, the SVD model is the preferable choice for this dataset.
* Hypothesis 2 (diversity) is not addressed by these results, since only RMSE and MAE were measured.