# Assignment 1
## JuPyter Notebook - Verschuur L. 1811053, Kolenbrander M. 1653415
This JuPyter Notebook contains the three different implementations for estimating movie ratings. The implementations are as described in **assignment 1** of the course **Advances in Datamining 2021/2022**.

In this assignment we are tasked with implementing the three different approaches and then testing their accuracy. To measure the accuracy, we employ the *Root Mean Squared Error (RMSE)* and *Mean Absolute Error (MAE)* metrics.

In order to increase the reliability of our results, we use the *5 fold cross validation* technique. For every fold we run, we capture the RMSE and MAE values and add them to an average. The results we present are therefor the average RMSE and MAE values over the 5 folds.

#### The methods
The three different methods we used are as follows:
- `The Naive approaches`
    - `Predicted rating is the global average`
    - `Predicted rating is the user average`
    - `Predicted rating is the item average`
    - `Predicted rating is a linear regression result`
    - `Predicted rating is a linear regression result with a gamma value`
- `UV Matrix Decomposition`
- `Matrix Factorisation`

In this notebook, we have separated the relevant methods/functions according to their approach, so: all UV matrix decomposition methods are together, all matrix factorisation methods are together and all naive approach methods are together.

Most python cells, and most method sections, are introduced with a markdown section (such as this one) to quickly introduce what the python cell(s) are doing and on what methodology they are based.

#### Experimentation
In order to run the experiments, all python cells up to the last two (bottom two under the *experimentation* section) need to first be run. This is done in order to initialize all the utility methods and optimization methods. ***They need to be run from top to bottom.***

The *final two cells* under the *experimentation* section are used for data ingesting from the MovieLens 1M dataset and then automatically running the 5 fold cross validation experimentation over *all* methods.

Please note that running the full 5 fold cross validation run can be quite ***slow*** (especially UV matrix decomposition)! The individual approaches can be "turned on" and "turned off" by setting the following flags to either `True` or `False` respectively:

`
run_naive = True
run_UV_decomp = True
run_matrix_fac = True
`

#### Running the experiments
In order to run the experiments and replicate our results, simply load in this Jupyter Notebook file (no additional libraries required) and run all cells. Commonly Notebook editors have a button *"Run All"* under a tab *"cells"*, this will load all the required cells and automatically start the experiment.

## Utility functions

### File loading
The following two functions are used to read a file and convert the CSV contents *(:: delimited)* to a list of python dictionaries.

```format_add``` is an utility function used to construct a dictionary based on the required format.  
```build_file``` reads an input file and uses `format_add` to create the dictionaries.

In [1]:
def format_add(format_list, input_data, target_data):
    """
    Constructs a dictionary using input_data and 
    the format_list as the dictionary format and 
    appends the dictionary to the target data.
    """
    
    appender_dict = {}
    
    for count, item in enumerate(format_list):
        try:
            appender_dict.update({item: input_data[count]})
        except IndexError:
            appender_dict.update({item: None})
            
    target_data.append(appender_dict)

def build_file(format_list, file_path, delimiter):
    """
    Reads a file and converts the delimiter separated data 
    in the file, based on the format provided by format_list, 
    into a dictionary list.
    """
    
    new_list = []
    
    with open(file_path) as file:
        print(f"Unpacking: '{file_path}'")
    
        for line in file:
            stripped_line = line.strip().split(delimiter)
            format_add(format_list, stripped_line, new_list)
        
    print("Unpacked succesfully")
    
    return new_list

### Data to numpy
The following functions are used to convert the input data into both a 2D numpy array and to calculate the means of the columns and rows of said array.

In [2]:
import numpy as np

def construct_M_array(row_size, column_size, data, make_nan=True):
    """
    Convert a list of dictionary values in the form of "data" into
    a 2D numpy array with row and column size being: "row_size", "column_size".
    """
    M_array = np.zeros((row_size, column_size))

    for item in data:
        itv = list(item.values())
        M_array[int(itv[0]) - 1, int(itv[1]) - 1] = float(itv[2])

    if make_nan:
        M_array = np.where(M_array == 0., np.NaN, M_array)
        
    return M_array

def get_row_col_mean(data):
    """
    Return two arrays of the row and column means of a given 2D numpy array.
    """
    return np.nanmean(data, axis=1), np.nanmean(data, axis=0)

### Performance metrics
The following two functions are used to *(efficiëntly)* calculate both:
 - The **Root Mean Square Error** (`RSME_two_arrays`)
 - The **Mean Absolute Error** (`MAE_two_arrays`)  
 
Between two (numpy) arrays.

In [3]:
import numpy as np

def RSME_two_arrays(arr_1, arr_2):
    """
    Calculate the Root Mean Square Error between two equally sized numpy arrays.
    arr_1 is the prediction array.
    arr_2 is the test array.
    """
    return np.sqrt(np.nanmean(((arr_1 - arr_2) ** 2)))

def MAE_two_arrays(arr_1, arr_2):
    """
    Calculate the Mean absolute Error between two equally sized numpy arrays.
    arr_1 is the prediction array.
    arr_2 is the test array.
    """
    return np.nanmean(np.absolute(arr_1 - arr_2))

In [4]:
def test_matrixes(ratingsMatrix, testing_array, verbose=True):
    """
    Calculate the root mean squared error and mean absolute error between two arrays.
    """
    
    RSME = RSME_two_arrays(ratingsMatrix, testing_array)
    MAE = MAE_two_arrays(ratingsMatrix, testing_array)
    
    if verbose:
        print(RSME_two_arrays(ratingsMatrix, testing_array))
        print(MAE_two_arrays(ratingsMatrix, testing_array))
    
    return RSME, MAE

## Naive approaches
Run linear regression on the user and item datasets to find the coefficients of both and the intercepts of both (alpha, beta and gamma respectively).

In [5]:
import numpy as np
from sklearn.linear_model import LinearRegression

def perform_linear_regression(user_ids_size, item_ids_size, R_user, R_item):
    """
    Perform linear regression on average user rating and average item rating values.
    
    Returns the alpha, beta and gamma values, which are, 
    the user coefficient, the item coefficient and the average intercept respectively.
    """
    
    LR_1 = LinearRegression()
    LR_2 = LinearRegression()

    # Regress over average user rating values
    user_ids_array = np.arange(1, user_ids_size, 1)

    LR_1.fit(user_ids_array.reshape(-1,1), R_user)
    alpha = LR_1.coef_

    # Regress over average item rating values
    item_ids_array = np.arange(1, item_ids_size, 1)

    LR_2.fit(item_ids_array.reshape(-1,1), R_item)
    beta = LR_2.coef_

    # Set the intercept gamma value as the average of both linear regressions intercepts
    gamma = (LR_1.intercept_ + LR_2.intercept_) / 2
    
    return alpha, beta, gamma

Using a calculated alpha, beta and gamma value, check the findings against the testset and calculate the RMSE and MAE of each of the 5 naive approaches.

In [6]:
def test_naive_approaches(alpha, beta, gamma, Rating_test, verbose=True):
    """
    Calculate the root mean squared error and mean absolute error of the
    5 different naive approaches which are:
        - global average rating prediction
        - item average rating prediction
        - user average rating prediction
        - user item linear regression without gamma rating prediction
        - user item linear regression with gamma rating prediction
    """
    
    #Root mean squared error
    RMSE_global = 0
    RMSE_item = 0
    RMSE_user = 0
    RMSE_user_item = 0
    RMSE_user_item_gamma = 0

    #Mean absolute error
    MAE_global = 0
    MAE_item = 0
    MAE_user = 0
    MAE_user_item = 0
    MAE_user_item_gamma = 0

    # For every rating value in the original dataset, calculate the error values
    for rating in Rating_test:
        _global = int(rating['rating']) - int(R_global)
        RMSE_global += (_global)**2
        MAE_global += abs(_global) 
        
        # Try since we are not sure the item is trained in the trainingsset
        try:
            _item = int(rating['rating']) - R_item[int(rating['mid']) - 1]  
        except:
            _item = int(int(rating['rating']) - int(R_global))

        RMSE_item += (_item)**2
        MAE_item += abs(_item)
        
        # Try since we are not sure the user is trained in the trainingsset
        try:
            _user = int(rating['rating']) - R_user[int(rating['uid'])  - 1]
        except:
            _user = int(rating['rating']) - int(R_global)

        RMSE_user += (_user)**2
        MAE_user += abs(_user)
        
        # Try since we are not sure the item and user are trained in the trainingsset
        try:
            _user_item = int(rating['rating']) - int((alpha*R_user[int(rating['uid'])  - 1] + beta*R_item[int(rating['mid'])  - 1]))
        except:
            _user_item = int(rating['rating']) - int(R_global)

        RMSE_user_item += (_user_item)**2
        MAE_user_item += abs(_user_item)
        
        # Try since we are not sure the item and user are trained in the trainingsset
        try:
            _user_item_gamma = int(rating['rating']) - int((alpha*R_user[int(rating['uid'])  - 1] + beta*R_item[int(rating['mid'])  - 1]) + gamma)
        except:
            _user_item_gamma += int(rating['rating']) - int(R_global)

        RMSE_user_item_gamma += (_user_item_gamma)**2
        MAE_user_item_gamma += abs(_user_item_gamma)


    rating_test_len = len(Rating_test)
    
    # Make sure we generate the *MEAN* of the computed values
    RMSE_global = np.sqrt(RMSE_global/rating_test_len)
    RMSE_item = np.sqrt(RMSE_item/rating_test_len)
    RMSE_user = np.sqrt(RMSE_user/rating_test_len)
    RMSE_user_item = np.sqrt(RMSE_user_item/rating_test_len)
    RMSE_user_item_gamma = np.sqrt(RMSE_user_item_gamma/rating_test_len)

    MAE_global = MAE_global/rating_test_len
    MAE_item = MAE_item/rating_test_len
    MAE_user = MAE_user/rating_test_len
    MAE_user_item = MAE_user_item/rating_test_len
    MAE_user_item_gamma = MAE_user_item_gamma/rating_test_len
    
    if verbose:
        print(f"RMSE global average: {RMSE_global}")
        print(f"RMSE item average: {RMSE_item}")
        print(f"RMSE user average: {RMSE_user}")
        print(f"RMSE Linear Regression User Item (A,B): {RMSE_user_item}")
        print(f"RMSE Linear Regression User Item (A,B,Y): {RMSE_user_item_gamma}\n")

        print(f"MAE global average: {MAE_global}")
        print(f"MAE item average: {MAE_item}")
        print(f"MAE user average: {MAE_user}")
        print(f"MAE Linear Regression User Item (A,B): {MAE_user_item}")
        print(f"MAE Linear Regression User Item (A,B,Y): {MAE_user_item_gamma}")
    
    return RMSE_global, RMSE_item, RMSE_user, RMSE_user_item, RMSE_user_item_gamma, MAE_global, MAE_item, MAE_user, MAE_user_item, MAE_user_item_gamma

## Matrix Factorization section
Implementation based on the algorithms described in the [gravity-Tikk.pdf paper](https://www.cs.uic.edu/~liub/KDD-cup-2007/proceedings/gravity-Tikk.pdf) and [Netflix Update: Try This at Home](https://sifter.org/~simon/journal/20061211.html) *(note that the primary algorithm is from the gravity-Tikk paper)*

In [7]:
#Predict rating with the help of the to train values for movie and value
def predictRating(movie, user):
    return (movieValue[movie] + userValue[user])/2

Training method using the error rate, a learning rate and a K. The learning rate and K are random small numbers proposed by the [gravity-Tikk.pdf paper](https://www.cs.uic.edu/~liub/KDD-cup-2007/proceedings/gravity-Tikk.pdf).  The training method alters the userValue and movieValue elements to predict new and closer ratings.

In [8]:
#Train data using the error rate and the standard predictionRating method
def train(movieValue, userValue, user, movie, rating, lrate=0.001, K=0.02):
    err = rating - predictRating(movie, user)
    
    uv = userValue[user];    
    userValue[user] += lrate * (2 * err * movieValue[movie] - K * userValue[user])
    movieValue[movie] += lrate * (2 * err * uv - K * movieValue[movie])
    return err

Functions used for training and predictions

In [9]:
def run_matrix_factorization(user_ids_size, item_ids_size, user_data, movie_data, Rating_train):
    #Create new 2D array with 0.01 (random small value) as base value
    ratingsMatrix = np.full((user_ids_size, item_ids_size), 0.01)

    #Initialize user and movie values setting their base value as the global average (random approximation value)
    userValue = np.full((user_ids_size, ), R_global)
    movieValue = np.full((item_ids_size, ), R_global)

    #Traing on training set (keep on training the matrix untill it converges in a Gradient Descent)
    converge = False
    while converge == False:
        avg_err = 0
        for rating in Rating_train: 
            avg_err += train(movieValue, userValue, int(rating['uid']) - 1, int(rating['mid']) - 1, int(rating['rating']))
        avg_err = avg_err / len(Rating_train)
        print(avg_err)
        #0.001 is a random small value set to determine the matrix to keep changing a lot and theirfore to converge or not to converge 
        if abs(avg_err) < 0.001:
            converge = True

    #Fill remaining values based on the trainingsset
    for user in user_data:
        for movie in movie_data:
            ratingsMatrix[int(user['uid']) - 1, int(movie['mid']) - 1] = (userValue[int(user['uid']) - 1] + movieValue[int(movie['mid']) - 1])/2

    return ratingsMatrix

## UV Matrix Decomposition
Implementation of the algorithm as described in [Section 9.4 of the MMDS textbook](http://infolab.stanford.edu/~ullman/mmds/ch9.pdf)

In [10]:
import numpy as np
import warnings

def replace_nans(arr):
    """
    Replace all NaNs occuring in the input array "arr"
    
    NaNs are replaced by the column average of a NaN cell. 
    If the column is empty (only NaN values), set the column average to the global average.
    
    Returns a modified copy of "arr" with all nans replaced
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)
        
        copy_arr = np.copy(arr)

        col_mean = np.nanmean(copy_arr, axis=0)
        mean = np.nanmean(copy_arr)

        col_mean = np.nan_to_num(col_mean, nan=mean)

        inds = np.where(np.isnan(copy_arr))

        copy_arr[inds] = np.take(col_mean, inds[1])

        return copy_arr

Optimizing U and V cells from the $UV=M$ matrix.

The **U optimization** is based on:

$u_{rs}=\frac{\Sigma_{j}v_{sj}(m_{rj}-\Sigma_{k\neq s}u_{rk}v_{kj})}{\Sigma_{j}v_{sj}^2}$

Where $r,s$ are *row* and *column* values.

Where $\Sigma_{j}$ is shorthand for the sum of $j$ such that $m_{rj}$ is nonblank.

The **V optimization** is based on:

$v_{rs}=\frac{\Sigma_{i}u_{ir}(m_{is}-\Sigma_{k\neq r}u_{ik}v_{ks})}{\Sigma_{i}u_{ir}^2}$

Where $r,s$ are *row* and *column* values.

Where $\Sigma_{i}$ is shorthand for the sum of $i$ such that $m_{is}$ is nonblank.

In [11]:
def optimize_U_cell(r, s, dim, U_matrix, V_matrix, M_matrix):
    """
    Optimize a single cell with row "r" and column "s" of the U_matrix.
    
    Optimization based on Section 9.4.4 of the MMDS textbook
    
    Returns the optimized value of cell U[r,s].
    """
    
    # Upper part of the optimization sum
    x_above = 0
    
    for j in range(M_matrix.shape[1]):
        M_val = M_matrix[r,j]
        
        if np.isnan(M_val):
            continue
            
        
        U_el_V_el_sum = 0
        
        for k in range(dim):
            if k == s:
                continue
            U_el_V_el_sum += U_matrix[r,k] * V_matrix[k,j]
            
        x_above += V_matrix[s,j] * (M_val - U_el_V_el_sum)
        
    # Lower part of the optimization sum
    x_under = 0
    
    for j in range(M_matrix.shape[1]):
        if np.isnan(M_matrix[r,j]):
            continue
            
        x_under += V_matrix[s, j] ** 2
    
    return x_above / x_under if x_under != 0. else x_above

def optimize_V_cell(r, s, dim, U_matrix, V_matrix, M_matrix):
    """
    Optimize a single cell with row "r" and column "s" of the V_matrix.
    
    Optimization based on Section 9.4.4 of the MMDS textbook
    
    Returns the optimized value of cell V[r,s].
    """
    
    # Upper part of the optimization sum
    y_above = 0
    
    for i in range(M_matrix.shape[0]):
        M_val = M_matrix[i,s]
        
        if np.isnan(M_val):
            continue
            
        
        U_el_V_el_sum = 0
        
        for k in range(dim):
            if k == r:
                continue
            U_el_V_el_sum += U_matrix[i,k] * V_matrix[k,s]
            
        y_above += U_matrix[i,r] * (M_val - U_el_V_el_sum)
        
    # Lower part of the optimization sum
    y_under = 0
    
    for i in range(M_matrix.shape[1]):
        if np.isnan(M_matrix[i,s]):
            continue
            
        y_under += U_matrix[i, r] ** 2
    
    return y_above / y_under if y_under != 0. else y_above

Running the optimization in a round robin fashion. This method alternates between matrix U_d and V_d when optimizing a cell. The method of traversal is given by a following example:

$
U=\begin{bmatrix}
0 & 2 \\
4 & 6 \\
8 & 10
\end{bmatrix}
$

$
V=\begin{bmatrix}
1 & 3 & 5 & 7\\
9 & 11 & 12 & 13
\end{bmatrix}
$
        
Note that the figure in each of the cells of the two matrices indicates the traversing order. If one of the matrices is bigger than the other, it will finish the larger matrix in a sequential fashion

In [12]:
def run_opti_round_robin(U_d_matrix, V_d_matrix, M_matrix, dimensions):
    """
    Run UV decomposition optimization in a "round robin fashion". Based on U_d_matrix and V_d_matrix.
    
    The round robin approach works as:
    U = [[0, 2],
        [4, 6],
        [8, 10]]
        
    V = [[1, 3, 5, 7],
        [9, 11, 12, 13]]
        
    The optimizer alternates between the cells in U and V. 
    Note that if one array is larger, the final cells in the array are no longer done
    in a round robin fashion. (See example above, cells 11, 12 and 13 in V are done in a sequence).
    """
    
    r_U, c_U = 0, 0
    r_V, c_V = 0, 0
    
    run = True
    
    while run:
        run = False
        
        # Run U cell optimizer
        if r_U < U_d_matrix.shape[0] and c_U < U_d_matrix.shape[1]:
            U_d_matrix[r_U, c_U] = optimize_U_cell(r_U, c_U, dimensions, U_d_matrix, V_d_matrix, M_matrix)
            run = True
            
            c_U += 1

            if c_U >= U_d_matrix.shape[1]:
                r_U += 1
                c_U = 0
            
        # Run V cell optimizer
        if r_V < V_d_matrix.shape[0] and c_V < V_d_matrix.shape[1]:
            V_d_matrix[r_V, c_V] = optimize_V_cell(r_V, c_V, dimensions, U_d_matrix, V_d_matrix, M_matrix)
            run = True
            
            c_V += 1

            if c_U >= V_d_matrix.shape[1]:
                r_V += 1
                c_V = 0
        

In [13]:
def optimize_UV_array(U_d_matrix, V_d_matrix, M_array, dimensions, reference_RSME, max_rounds=5, improvement_threshold=1.0e-10):
    """
    Run round robin fashion UV cell optimization until the optimization converges.
    
    The convergence is based on a theshold improvement value. If the error reduces less than the improvement_threshold,
    the run is cut short. If the threshold value is set to "False", this metric is ignored and the run continues
    until max_rounds is reached.
    """
    
    difference = None

    for i in range(max_rounds):
        print(f"opti-cycle {i}")
        run_opti_round_robin(U_d_matrix, V_d_matrix, M_array, dimensions)

        result = RSME_two_arrays(M_array, np.matmul(U_d_matrix, V_d_matrix))
        
        improvement = reference_RSME - result if difference is None else difference - result
        print(f"RSME result round {i}: {result}, this constitutes an improvement of: {reference_RSME - result} and a difference of {improvement}")
        
        if improvement_threshold is not False and improvement < improvement_threshold:
            print("Improvement threshold reached, optimization stopping")
            return
        
        difference = result

In [14]:
import numpy as np

def run_UV_matrix_decomposition(training_array, testing_array, dimensions=2, max_rounds=5):
    """
    Perform UV matrix decomposition and return the prediction matrix.
    """
    
    # Replace all nan values with column averages
    users_movies_array_no_nans = replace_nans(training_array)

    # Normalize user movies array.
    users_movies_array_norm = np.linalg.norm(users_movies_array_no_nans)
    users_movies_array_pp = users_movies_array_no_nans / users_movies_array_norm
    
    # Ititialize users and movies d matrix with the mean of 
    # the normalized users_movies_array divided by the dimensions.
    mean_users_movies_array = np.nanmean(users_movies_array_pp)
    start_value = np.sqrt(mean_users_movies_array/dimensions)
    users_d_matrix = np.full((users_movies_array_pp.shape[0], dimensions), start_value)
    movies_d_matrix = np.full((dimensions, users_movies_array_pp.shape[1]), start_value)
    
    # Find the baseline
    base_line = RSME_two_arrays(testing_array, np.matmul(users_d_matrix, movies_d_matrix) * users_movies_array_norm)

    optimize_UV_array(users_d_matrix, movies_d_matrix, users_movies_array_pp, dimensions, base_line, max_rounds=max_rounds)

    return np.matmul(users_d_matrix, movies_d_matrix) * users_movies_array_norm


## Experimentation

### Data ingesting
Below, the three input files are ingested and processed to be usable in python.

```user_data``` is a list of dictionaries where each entry in the list corresponds one to one with the lines in the input file. The relevant keys can be found in the provided list parameter in ```build_file()```. ```movie_data``` and ```rating_data``` follow the same principles.

Fetching ```user_data[0]``` returns an entire line. Please be aware: ```[id]``` and UIDs might not map 1 to 1! 

In [15]:
file_path_users = "./ml-1m/users.dat"
file_path_movies = "./ml-1m/movies.dat"
file_path_ratings = "./ml-1m/ratings.dat"

user_data = build_file(
    ["uid", "gender", "age", "occupation", "zip-code"],
    file_path_users,
    "::"
)

movie_data = build_file(
    ["mid", "title", "genre"],
    file_path_movies,
    "::"
)

rating_data = build_file(
    ["uid", "mid", "rating", "timestamp"],
    file_path_ratings,
    "::"
)

Unpacking: './ml-1m/users.dat'
Unpacked succesfully
Unpacking: './ml-1m/movies.dat'
Unpacked succesfully
Unpacking: './ml-1m/ratings.dat'
Unpacked succesfully


### Run experiments
Rating data is split between a testing and training set.

These datasets are converted into a 2D array where users are the rows, movies the columns, and the cells are the ratings.

This is done over 5 folds to perform 5 cross fold validation. For every fold all experiments are run. At the end of all folds, the average RMSE and MAE are displayed.

In [None]:
import numpy as np
from sklearn.model_selection import train_test_split
import random

random_seed = 10

num_folds = 5

min_rand_val = 0

max_rand_val = 1000

random.seed(random_seed)

run_naive = True
run_UV_decomp = True
run_matrix_fac = True

### Run experiments ###

#Root mean squared error
N_RMSE_global = 0
N_RMSE_item = 0
N_RMSE_user = 0
N_RMSE_user_item = 0
N_RMSE_user_item_gamma = 0
F_RMSE = 0
UV_RMSE = 0

#Mean absolute error
N_MAE_global = 0
N_MAE_item = 0
N_MAE_user = 0
N_MAE_user_item = 0
N_MAE_user_item_gamma = 0
F_MAE = 0
UV_MAE = 0

for i in range(num_folds):
    print(f" -#- Fold {i} -#- ")
    
    ### Data setup ###
    Rating_train, Rating_test = train_test_split(rating_data, test_size=0.2, random_state=random.randint(min_rand_val, max_rand_val))

    training_array = construct_M_array(int(user_data[-1]['uid']), int(movie_data[-1]['mid']), Rating_train)

    testing_array = construct_M_array(int(user_data[-1]['uid']), int(movie_data[-1]['mid']), Rating_test)

    #the global average rating 
    R_global = np.nanmean(training_array)

    #the average rating per user & item 
    R_user, R_item = get_row_col_mean(training_array)

    R_user = np.nan_to_num(R_user, nan=np.nanmean(R_user))

    R_item = np.nan_to_num(R_item, nan=np.nanmean(R_item))
    
    if run_naive:
        print(" --- Running Naive approaches --- ")
        alpha, beta, gamma = perform_linear_regression(int(user_data[-1]['uid']) + 1, int(movie_data[-1]['mid']) + 1, R_user, R_item)

        RMSE_global, RMSE_item, RMSE_user, RMSE_user_item, RMSE_user_item_gamma, MAE_global, MAE_item, MAE_user, MAE_user_item, MAE_user_item_gamma = test_naive_approaches(alpha, beta, gamma, Rating_test, verbose=False)

        N_RMSE_global += RMSE_global
        N_RMSE_item += RMSE_item
        N_RMSE_user += RMSE_user
        N_RMSE_user_item += RMSE_user_item
        N_RMSE_user_item_gamma += RMSE_user_item_gamma

        N_MAE_global += MAE_global
        N_MAE_item += MAE_item
        N_MAE_user +=MAE_user
        N_MAE_user_item += MAE_user_item
        N_MAE_user_item_gamma += MAE_user_item_gamma
    
    if run_UV_decomp:
        print(" --- Running UV decomposition approach --- ")
        UV_predict_array = run_UV_matrix_decomposition(training_array, testing_array)
        RSME, MAE = test_matrixes(UV_predict_array, testing_array, verbose=False)

        UV_RMSE += RSME
        UV_MAE += MAE
    
    if run_matrix_fac:
        print(" --- Running matrix factorization approach --- ")
        MF_predict_array = run_matrix_factorization(int(user_data[-1]['uid']), int(movie_data[-1]['mid']), user_data, movie_data, Rating_train)
        RSME, MAE = test_matrixes(MF_predict_array, testing_array, verbose=False)

        F_RMSE += RSME
        F_MAE += MAE
    
print(" #-#-# Results: #-#-# ")
print(f"The following results are the averages of the {num_folds} folds\n")

print(f"RMSE global average: {N_RMSE_global / num_folds}")
print(f"RMSE item average: {N_RMSE_item / num_folds}")
print(f"RMSE user average: {N_RMSE_user / num_folds}")
print(f"RMSE Linear Regression User Item (A,B): {N_RMSE_user_item / num_folds}")
print(f"RMSE Linear Regression User Item (A,B,Y): {N_RMSE_user_item_gamma / num_folds}\n")
print(f"RMSE UV decomposition: {UV_RMSE / num_folds}\n")
print(f"RMSE matrix factorization: {F_RMSE / num_folds}\n")


print(f"MAE global average: {N_MAE_global / num_folds}")
print(f"MAE item average: {N_MAE_item / num_folds}")
print(f"MAE user average: {N_MAE_user / num_folds}")
print(f"MAE Linear Regression User Item (A,B): {N_MAE_user_item / num_folds}")
print(f"MAE Linear Regression User Item (A,B,Y): {N_MAE_user_item_gamma / num_folds}")
print(f"MAE UV decomposition: {UV_MAE / num_folds}\n")
print(f"MAE matrix factorization: {F_MAE / num_folds}\n")

 -#- Fold 0 -#- 


  return np.nanmean(data, axis=1), np.nanmean(data, axis=0)


 --- Running Naive approaches --- 
 --- Running UV decomposition approach --- 
opti-cycle 0
RSME result round 0: 2.2308603648578905e-05, this constitutes an improvement of: 1.1614223862436044 and a difference of 1.1614223862436044
opti-cycle 1
RSME result round 1: 2.212815969051611e-05, this constitutes an improvement of: 1.1614225666875624 and a difference of 1.8044395806279522e-07
opti-cycle 2
RSME result round 2: 2.158788466271562e-05, this constitutes an improvement of: 1.1614231069625902 and a difference of 5.402750278004892e-07
opti-cycle 3
RSME result round 3: 2.1474996131615455e-05, this constitutes an improvement of: 1.1614232198511214 and a difference of 1.1288853110016526e-07
opti-cycle 4
RSME result round 4: 2.1373985899166813e-05, this constitutes an improvement of: 1.1614233208613538 and a difference of 1.010102324486414e-07
 --- Running matrix factorization approach --- 
-0.03881741483244614
-0.028821624869946164
-0.01862318071751677
-0.012449001443680758
-0.008251491015

  return np.nanmean(data, axis=1), np.nanmean(data, axis=0)


 --- Running Naive approaches --- 
 --- Running UV decomposition approach --- 
opti-cycle 0
RSME result round 0: 2.232740054534455e-05, this constitutes an improvement of: 1.1614548158198654 and a difference of 1.1614548158198654
opti-cycle 1
RSME result round 1: 2.2130894657405837e-05, this constitutes an improvement of: 1.1614550123257534 and a difference of 1.965058879387137e-07
opti-cycle 2
RSME result round 2: 2.1586389417722765e-05, this constitutes an improvement of: 1.161455556830993 and a difference of 5.445052396830716e-07
opti-cycle 3
RSME result round 3: 2.1474588052182823e-05, this constitutes an improvement of: 1.1614556686323585 and a difference of 1.1180136553994212e-07
opti-cycle 4
