GROUP: 23

This assignment was completed solely by the following group member:

Pablo Martínez

## Part three. Recommendation based on collaborative filtering.

In this third part we will use the Surprise library. Its documentation is available at http://surpriselib.com/

To install it: conda install -c conda-forge scikit-surprise, or pip install numpy followed by pip install scikit-surprise

Surprise (Simple Python RecommendatIon System Engine) ships with rating-prediction algorithms: baseline algorithms, neighborhood methods, matrix factorization-based methods (SVD, PMF, SVD++, NMF), and others. It also provides the most common similarity measures over vectors (cosine, MSD, Pearson, ...). One of its most useful features is its tooling for evaluating, analyzing, and comparing the performance of different algorithms. In this part of the assignment we will try several cross-validation procedures, measuring the error between the real (known) value and the recommender's prediction. The abbreviations correspond to the following measures:

MAE: Mean Absolute Error, the mean of the absolute differences between predicted and real ratings.
MSE: Mean Squared Error, defined as the expected value of the square of the difference between the estimator and the parameter.
RMSE: Root Mean Squared Error, the square root of the MSE.

We will run several recommenders and evaluate them so that we can comment on the results.

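As a small worked example of the three error measures, the sketch below computes them by hand on four made-up ratings. The rating values and variable names are assumptions chosen for illustration; they are not taken from MovieLens or from Surprise.

```python
import math

# Hypothetical ratings, chosen only to illustrate the formulas.
true_ratings = [4.0, 3.0, 5.0, 2.0]
predicted_ratings = [3.5, 3.0, 4.0, 3.0]

errors = [pred - true for pred, true in zip(predicted_ratings, true_ratings)]

mae = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
mse = sum(e * e for e in errors) / len(errors)    # mean squared error
rmse = math.sqrt(mse)                             # square root of the MSE

print(f"MAE={mae}  MSE={mse}  RMSE={rmse}")
```

Here MAE is 0.625, MSE is 0.5625 and RMSE is 0.75. RMSE weighs the two one-star misses more heavily than MAE does, which is why the two measures can rank algorithms differently.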

In [1]:
from collections import defaultdict
import numpy as np

from surprise import KNNBasic
from surprise import KNNWithMeans
from surprise import KNNWithZScore
from surprise import KNNBaseline

from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import train_test_split

In [2]:
## "Getting started" example from the Surprise documentation
## http://surpriselib.com/

from surprise import SVD
from surprise import Dataset
from surprise.model_selection import cross_validate

# Load the movielens-100k dataset (download it if needed).
data = Dataset.load_builtin('ml-100k')

# Use the famous SVD algorithm.
algo = SVD()

# Run 5-fold cross-validation and print results.
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)



Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9376  0.9367  0.9405  0.9278  0.9372  0.9360  0.0043  
MAE (testset)     0.7412  0.7354  0.7433  0.7312  0.7372  0.7377  0.0043  
Fit time          5.79    5.49    5.60    5.40    5.40    5.54    0.15    
Test time         0.38    0.17    0.21    0.22    0.22    0.24    0.07    


{'test_rmse': array([0.93759826, 0.93670262, 0.94052325, 0.92776556, 0.93718323]),
 'test_mae': array([0.74119207, 0.73536046, 0.74329677, 0.73119917, 0.7372093 ]),
 'fit_time': (5.792865037918091,
  5.490248680114746,
  5.600041389465332,
  5.403141498565674,
  5.39815092086792),
 'test_time': (0.3793976306915283,
  0.17240023612976074,
  0.2086181640625,
  0.22337055206298828,
  0.22017431259155273)}

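To make the cv=5 argument concrete, here is a minimal sketch of how a 5-fold split partitions a data set so that every rating is used for testing exactly once. It is an illustration in plain Python, not Surprise's own splitting code; the function name k_fold_indices and the seed are assumptions.

```python
import random

def k_fold_indices(n_items, n_folds, seed=0):
    """Yield (train_idx, test_idx) pairs; each index is tested exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Fold i takes every n_folds-th shuffled index, starting at position i.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx

# Ten hypothetical ratings split into five folds of two ratings each.
splits = list(k_fold_indices(n_items=10, n_folds=5))
for train_idx, test_idx in splits:
    print("test:", sorted(test_idx), "train size:", len(train_idx))
```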
In [3]:
# Evaluation code adapted from the Surprise FAQ:
# https://surprise.readthedocs.io/en/stable/FAQ.html#how-to-compute-precision-k-and-recall-k
def measures_at_k(predictions, k, th_recom, th_relev):
    '''Return precision and recall at k metrics for each user.'''

    # First map the predictions to each user.
    user_est_true = defaultdict(list)
    for uid, _, true_r, est, _ in predictions:
        user_est_true[uid].append((est, true_r))

    precisions = dict()
    recalls = dict()
    
    for uid, user_ratings in user_est_true.items():
        
        # Sort user ratings by estimated value
        user_ratings.sort(key=lambda x: x[0], reverse=True)

        # Number of relevant items
        n_rel = sum((true_r >= th_relev) for (_, true_r) in user_ratings)

        # Number of recommended items in top k
        n_rec_k = sum((est >= th_recom) for (est, _) in user_ratings[:k])

        # Number of relevant and recommended items in top k
        n_rel_and_rec_k = sum(((true_r >= th_relev) and (est >= th_recom))
                              for (est, true_r) in user_ratings[:k])

        # Precision@K: Proportion of recommended items that are relevant
        precisions[uid] = n_rel_and_rec_k / n_rec_k if n_rec_k != 0 else 0

        # Recall@K: Proportion of relevant items that are recommended
        recalls[uid] = n_rel_and_rec_k / n_rel if n_rel != 0 else 0
       
        
    return precisions, recalls

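The two thresholds have a large effect on what these metrics measure. The sketch below applies the same counting logic to a single hypothetical user and compares th_relev=4 with th_relev=1 (the value passed in get_results). The helper name precision_recall_at_k and the rating pairs are assumptions made for illustration.

```python
def precision_recall_at_k(user_ratings, k, th_recom, th_relev):
    """Precision@k and recall@k for one user's (estimated, true) rating pairs."""
    ranked = sorted(user_ratings, key=lambda pair: pair[0], reverse=True)
    n_rel = sum(true_r >= th_relev for _, true_r in ranked)
    n_rec_k = sum(est >= th_recom for est, _ in ranked[:k])
    n_rel_and_rec_k = sum(true_r >= th_relev and est >= th_recom
                          for est, true_r in ranked[:k])
    precision = n_rel_and_rec_k / n_rec_k if n_rec_k else 0
    recall = n_rel_and_rec_k / n_rel if n_rel else 0
    return precision, recall

# Hypothetical (estimated, true) ratings for a single user.
ratings = [(4.6, 5), (4.3, 2), (4.1, 4), (3.2, 5), (2.5, 1)]

for k in (1, 2, 3):
    strict = precision_recall_at_k(ratings, k, th_recom=4, th_relev=4)
    loose = precision_recall_at_k(ratings, k, th_recom=4, th_relev=1)
    print(k, "th_relev=4:", strict, "th_relev=1:", loose)
```

With th_relev=1 every rated item counts as relevant, because ratings range from 1 to 5. Every recommended item is then a hit, so this user's precision stays at 1 for every k and only recall changes. This is consistent with the precision column staying constant across k in the results files.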

In [4]:
def f1(precision, recall):
    """
        Compute the F1 score (harmonic mean) from precision and recall.
    """
    denominador = precision + recall
    
    if denominador == 0:
        return 0
    else:
        return (2 * precision * recall) / denominador

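As a quick check of the formula, the sketch below recomputes F1 from the precision and recall reported for knn_basic at k=10 in this notebook and compares it with the arithmetic mean. The harmonic mean comes out at about 0.296 and the arithmetic mean at about 0.565, which shows how strongly F1 is pulled toward the lower of the two values. The function is restated here only so the block runs on its own.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    denominator = precision + recall
    return 0 if denominator == 0 else (2 * precision * recall) / denominator

# Precision and recall reported for knn_basic at k=10 in this notebook.
precision, recall = 0.9554612937433722, 0.17484458530530517

harmonic = f1(precision, recall)
arithmetic = (precision + recall) / 2

print(f"F1 = {harmonic:.4f}, arithmetic mean = {arithmetic:.4f}")
```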
In [5]:
def get_results(recommendations, k, knn):
    """
        Compute the averaged metrics and append them to the results file.
    """
    # th_recom=4: an item counts as recommended when its estimated rating
    # is at least 4. th_relev=1: every rated item counts as relevant,
    # since ratings range from 1 to 5.
    precisions, recalls = measures_at_k(recommendations, k, th_recom=4, th_relev=1)

    # Measures can then be averaged over all users
    precision_result = sum(prec for prec in precisions.values()) / len(precisions)
    recall_result = sum(rec for rec in recalls.values()) / len(recalls)
    # Harmonic mean (F1)
    f1_result = f1(precision_result, recall_result)
    # The metric results are appended to this file. Append mode creates the
    # file if needed, but its directory must already exist.
    # Alternative location: "C:/hlocal/results_user_cf.csv"
    with open("C:/Users/Tine/Documents/Python Scripts/results_user_cf.csv", 'a') as f:
        f.write(str(k) + ',' + knn + "," + str(precision_result) + ',' + str(recall_result) + ',' + str(f1_result) + '\n')
    
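A possible variation, shown only as a sketch: take the output path as a parameter and let the csv module format the row, so the function does not depend on one machine's directory layout. The helper name append_result, the temporary directory and the metric values below are all hypothetical.

```python
import csv
import os
import tempfile

def append_result(path, k, name, precision, recall, f1_score):
    """Append one row of metrics to a CSV file, creating the file if needed."""
    with open(path, "a", newline="") as handle:
        csv.writer(handle).writerow([k, name, precision, recall, f1_score])

# Hypothetical location and values, used only for illustration.
results_path = os.path.join(tempfile.mkdtemp(), "results_user_cf.csv")
append_result(results_path, 1, "knn_basic", 0.9, 0.04, 0.08)
append_result(results_path, 2, "knn_basic", 0.9, 0.07, 0.13)

with open(results_path, newline="") as handle:
    rows = list(csv.reader(handle))
print(rows)
```

Reading the file back with csv.reader returns every field as a string, so numeric columns need an explicit float() conversion before they are averaged or compared.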

In [6]:
# The MovieLens 100K data was already loaded above
# data = Dataset.load_builtin('ml-100k')

In [7]:
# create two data sets, the training set and the evaluation set
# each one will contain half of the data
training_set, evaluation_set = train_test_split(data, test_size=.5)

In [8]:
# Now choose the recommendation algorithm to use;
# in this case, the KNN algorithm, to find similarities between users
recommendation_algorithm = KNNBasic(k=100, sim_options={'name': 'pearson_baseline', 'user_based': True})

# fit the algorithm on the training_set
recommendation_algorithm.fit(training_set)

# run the algorithm on the evaluation set to obtain the predicted ratings
recommendations = recommendation_algorithm.test(evaluation_set)


K = 10
for k in range(K):
    get_results(recommendations, k+1, "knn_basic")

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.

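To illustrate what a user-based KNN predictor computes, here is a minimal sketch of the similarity-weighted average behind KNNBasic and of the mean-centered variant behind KNNWithMeans, following the formulas in the Surprise documentation. It is a simplification: it ignores details such as the minimum number of neighbors and the filtering of non-positive similarities, and the function names and neighbor values are assumptions.

```python
def knn_basic_predict(neighbors):
    """Similarity-weighted average of the neighbors' ratings for one item.

    neighbors: list of (similarity, rating) pairs for the nearest users.
    """
    total_sim = sum(sim for sim, _ in neighbors)
    return sum(sim * rating for sim, rating in neighbors) / total_sim

def knn_with_means_predict(user_mean, neighbors):
    """Same idea, applied to ratings centered on each neighbor's mean.

    neighbors: list of (similarity, rating, neighbor_mean) triples.
    """
    total_sim = sum(sim for sim, _, _ in neighbors)
    offset = sum(sim * (rating - mean) for sim, rating, mean in neighbors)
    return user_mean + offset / total_sim

# Hypothetical neighbors of one user for one movie.
basic = knn_basic_predict([(0.9, 5), (0.6, 4), (0.5, 3)])
centered = knn_with_means_predict(3.0, [(0.9, 5, 4.5), (0.6, 4, 4.0), (0.5, 3, 3.5)])
print(round(basic, 3), round(centered, 3))
```

In this example the plain average predicts about 4.2, while the mean-centered version predicts about 3.1 because the target user rates lower on average than the neighbors do.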

In [9]:
##########################################################
# Run the same test with the remaining KNN variants
recommendation_algorithm = KNNWithMeans(k=100, sim_options={'name': 'pearson_baseline', 'user_based': True})

# fit the algorithm on the training_set
recommendation_algorithm.fit(training_set)

# run the algorithm on the evaluation set to obtain the predicted ratings
recommendations = recommendation_algorithm.test(evaluation_set)

K = 10
for k in range(K):
    get_results(recommendations, k+1, "knn_withmeans")

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


In [10]:
##########################################################
# Do the same with the remaining KNN variants
recommendation_algorithm = KNNWithZScore(k=100, sim_options={'name': 'pearson_baseline', 'user_based': True})

# fit the algorithm on the training_set
recommendation_algorithm.fit(training_set)

# run the algorithm on the evaluation set to obtain the predicted ratings
recommendations = recommendation_algorithm.test(evaluation_set)

K = 10
for k in range(K):
    get_results(recommendations, k+1, "knn_withzscore")

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


In [11]:
##########################################################
# Do the same with the remaining KNN variants
recommendation_algorithm = KNNBaseline(k=100, sim_options={'name': 'pearson_baseline', 'user_based': True})

# fit the algorithm on the training_set
recommendation_algorithm.fit(training_set)

# run the algorithm on the evaluation set to obtain the predicted ratings
recommendations = recommendation_algorithm.test(evaluation_set)

K = 10
for k in range(K):
    get_results(recommendations, k+1, "knn_baseline")

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


## Exercise: run, understand, and write reasoned comments on the evaluation of different recommenders.

Try other rating-prediction algorithms based on nearest-neighbor estimation and evaluate their behavior. Comment on the results.
The documentation is available at https://surprise.readthedocs.io/en/stable/knn_inspired.html#

Collaborative filtering algorithms used: KNN (Basic, WithMeans, WithZScore, Baseline), SVD

Similarity measures used: pearson_baseline, cosine

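To show how the two families of similarity differ, the sketch below computes cosine similarity and a plain Pearson correlation for two hypothetical users who agree on the ordering of four movies but rate on different parts of the scale. Note that this is ordinary Pearson, centered on each user's mean; Surprise's pearson_baseline centers on baseline estimates and applies shrinkage, which is not reproduced here. The function names and ratings are assumptions.

```python
import math

def cosine_sim(a, b):
    """Cosine of the angle between two rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def pearson_sim(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine_sim([x - mean_a for x in a], [y - mean_b for y in b])

# Hypothetical ratings of two users on the same four movies.
user_a = [5, 4, 5, 4]
user_b = [2, 1, 2, 1]

cos = cosine_sim(user_a, user_b)
pear = pearson_sim(user_a, user_b)
print(f"cosine = {cos:.4f}, pearson = {pear:.4f}")
```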
In [12]:
data = Dataset.load_builtin('ml-100k')

algo = KNNBasic(k=100, sim_options={'name': 'pearson_baseline', 'user_based': True})

# Run 5-fold cross-validation and print results.
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Evaluating RMSE, MAE of algorithm KNNBasic on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9974  1.0046  1.0010  0.9964  1.0050  1.0009  0.0036  
MAE (testset)     0.7897  0.7961  0.7907  0.7894  0.7962  0.7924  0.0031  
Fit time          2.46    2.32    2.36    2.38    2.72    2.45    0.14    
Test time         5.21    5.18    5.25    7.08    5.92    5.73    0.73    

{'test_rmse': array([0.99739056, 1.00456676, 1.00103434, 0.99641232, 1.00504276]),
 'test_mae': array([0.78972001, 0.79611746, 0.79070023, 0.78944225, 0.79624456]),
 'fit_time': (2.4552764892578125,
  2.322972536087036,
  2.3613946437835693,
  2.384783983230591,
  2.7175440788269043),
 'test_time': (5.21230673789978,
  5.183574676513672,
  5.253926515579224,
  7.079390287399292,
  5.922556638717651)}

In [36]:
recommendation_algorithm = KNNBasic(k=100, sim_options={'name': 'pearson_baseline', 'user_based': True})
recommendation_algorithm.fit(training_set)
recommendations = recommendation_algorithm.test(evaluation_set)


K = 10
for k in range(K):
    get_results(recommendations, k+1, "knn_basic")

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


In [None]:
1,knn_basic,0.9554612937433722,0.03683050154825231,0.07092695681952789
2,knn_basic,0.9554612937433722,0.06634819519670446,0.12408014039082521
3,knn_basic,0.9554612937433722,0.09064349297854951,0.16557872628055398
4,knn_basic,0.9554612937433722,0.10976073063172775,0.1969019177070252
5,knn_basic,0.9554612937433722,0.1249723960289564,0.22103399463079929
6,knn_basic,0.9554612937433722,0.1380991757027904,0.24131892248941134
7,knn_basic,0.9554612937433722,0.1494700369632874,0.258500833353103
8,knn_basic,0.9554612937433722,0.1592824977791682,0.27304616999193676
9,knn_basic,0.9554612937433722,0.16737535789565453,0.28485118607822907
10,knn_basic,0.9554612937433722,0.17484458530530517,0.29559650493976747

In [13]:
data = Dataset.load_builtin('ml-100k')

algo = KNNWithMeans(k=100, sim_options={'name': 'pearson_baseline', 'user_based': True})

# Run 5-fold cross-validation and print results.
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Evaluating RMSE, MAE of algorithm KNNWithMeans on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9410  0.9380  0.9366  0.9300  0.9406  0.9373  0.0040  
MAE (testset)     0.7345  0.7306  0.7294  0.7288  0.7324  0.7311  0.0021  
Fit time          2.60    2.54    2.58    2.77    2.57    2.61    0.08    
Test time         6.10    6.40    6.09    6.12    5.95    6.13    0.15    

{'test_rmse': array([0.94101555, 0.93799499, 0.93660295, 0.9300082 , 0.94064396]),
 'test_mae': array([0.73449435, 0.73056517, 0.72937414, 0.72881871, 0.73242012]),
 'fit_time': (2.60072922706604,
  2.536982297897339,
  2.5775511264801025,
  2.7667617797851562,
  2.565481424331665),
 'test_time': (6.10114598274231,
  6.397415399551392,
  6.087649583816528,
  6.122856140136719,
  5.95072078704834)}

In [38]:
recommendation_algorithm = KNNWithMeans(k=100, sim_options={'name': 'pearson_baseline', 'user_based': True})
recommendation_algorithm.fit(training_set)
recommendations = recommendation_algorithm.test(evaluation_set)

K = 10
for k in range(K):
    get_results(recommendations, k+1, "knn_withmeans")

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


In [None]:
1,knn_withmeans,0.8377518557794273,0.031802320709128785,0.06127842039642204
2,knn_withmeans,0.8377518557794273,0.05873492328920355,0.10977360097984366
3,knn_withmeans,0.8377518557794273,0.08309527425485001,0.15119386908641352
4,knn_withmeans,0.8377518557794273,0.1051542128017031,0.1868545337717737
5,knn_withmeans,0.8377518557794273,0.12543844157058034,0.21820462166397073
6,knn_withmeans,0.8377518557794273,0.14310429111236533,0.2444515147899852
7,knn_withmeans,0.8377518557794273,0.15873462884356213,0.2668982107478648
8,knn_withmeans,0.8377518557794273,0.17332159903135488,0.28722045968924964
9,knn_withmeans,0.8377518557794273,0.18660448703651422,0.30522240899463493
10,knn_withmeans,0.8377518557794273,0.19794957041102942,0.3202324835499166

In [14]:
data = Dataset.load_builtin('ml-100k')

algo = KNNWithZScore(k=100, sim_options={'name': 'pearson_baseline', 'user_based': True})

# Run 5-fold cross-validation and print results.
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Evaluating RMSE, MAE of algorithm KNNWithZScore on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9343  0.9403  0.9358  0.9346  0.9380  0.9366  0.0023  
MAE (testset)     0.7276  0.7330  0.7274  0.7261  0.7287  0.7286  0.0024  
Fit time          2.76    2.78    2.87    2.74    2.55    2.74    0.10    
Test time         6.63    6.39    6.61    6.57    6.43    6.52    0.10    

{'test_rmse': array([0.93432485, 0.94032708, 0.93583485, 0.93458352, 0.93800941]),
 'test_mae': array([0.72756312, 0.73302805, 0.72741225, 0.72608579, 0.72871842]),
 'fit_time': (2.7602970600128174,
  2.7808141708374023,
  2.8690478801727295,
  2.7438509464263916,
  2.5529518127441406),
 'test_time': (6.628171920776367,
  6.391536235809326,
  6.605456352233887,
  6.570687294006348,
  6.42845892906189)}

In [39]:
recommendation_algorithm = KNNWithZScore(k=100, sim_options={'name': 'pearson_baseline', 'user_based': True})
recommendation_algorithm.fit(training_set)
recommendations = recommendation_algorithm.test(evaluation_set)

K = 10
for k in range(K):
    get_results(recommendations, k+1, "knn_withzscore")

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


In [None]:
1,knn_withzscore,0.8282078472958643,0.031165516237714467,0.060070572834535506
2,knn_withzscore,0.8282078472958643,0.0585295607603518,0.10933257372498337
3,knn_withzscore,0.8282078472958643,0.08266082090737455,0.15031879551735844
4,knn_withzscore,0.8282078472958643,0.10442775142427144,0.18546983049699492
5,knn_withzscore,0.8282078472958643,0.12445411581647581,0.2163912895414276
6,knn_withzscore,0.8282078472958643,0.14232481332491123,0.24290687380930623
7,knn_withzscore,0.8282078472958643,0.1583304793256178,0.26583973861951093
8,knn_withzscore,0.8282078472958643,0.1729395935945105,0.2861315380195129
9,knn_withzscore,0.8282078472958643,0.1859697143193834,0.3037369048342545
10,knn_withzscore,0.8282078472958643,0.1975035230843203,0.318947361627442

In [15]:
data = Dataset.load_builtin('ml-100k')

algo = KNNBaseline(k=100, sim_options={'name': 'pearson_baseline', 'user_based': True})

# Run 5-fold cross-validation and print results.
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Evaluating RMSE, MAE of algorithm KNNBaseline on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9134  0.9283  0.9141  0.9183  0.9306  0.9210  0.0072  
MAE (testset)     0.7154  0.7287  0.7160  0.7201  0.7287  0.7218  0.0059  
Fit time          2.59    2.82    2.98    2.68    2.42    2.70    0.19    
Test time         7.40    7.58    7.20    7.26    6.45    7.18    0.39    

{'test_rmse': array([0.91340454, 0.92831992, 0.91412969, 0.91830874, 0.93059261]),
 'test_mae': array([0.71544524, 0.72872479, 0.71597759, 0.72006529, 0.72870533]),
 'fit_time': (2.5948503017425537,
  2.819277048110962,
  2.981828451156616,
  2.680899143218994,
  2.4216787815093994),
 'test_time': (7.399085998535156,
  7.5823681354522705,
  7.201022624969482,
  7.2570013999938965,
  6.445598363876343)}

In [40]:
recommendation_algorithm = KNNBaseline(k=100, sim_options={'name': 'pearson_baseline', 'user_based': True})
recommendation_algorithm.fit(training_set)
recommendations = recommendation_algorithm.test(evaluation_set)

K = 10
for k in range(K):
    get_results(recommendations, k+1, "knn_baseline")

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.


In [None]:
1,knn_baseline,0.9204665959703076,0.03596465691528636,0.06922455790978523
2,knn_baseline,0.9204665959703076,0.066402314481288,0.12386875648392193
3,knn_baseline,0.9204665959703076,0.09259877088962959,0.168269646204569
4,knn_baseline,0.9204665959703076,0.1148783332291254,0.20426365234612023
5,knn_baseline,0.9204665959703076,0.13331660009924678,0.23290080452485945
6,knn_baseline,0.9204665959703076,0.1485012935046663,0.2557429114106759
7,knn_baseline,0.9204665959703076,0.16142317220388175,0.2746761124841565
8,knn_baseline,0.9204665959703076,0.17294923767357778,0.2911865571701712
9,knn_baseline,0.9204665959703076,0.18328385039071837,0.3056971119181652
10,knn_baseline,0.9204665959703076,0.19269992083716841,0.31868339102659743

In [16]:
data = Dataset.load_builtin('ml-100k')

algo = KNNBaseline(k=100, sim_options={'name': 'cosine', 'user_based': True})

# Run 5-fold cross-validation and print results.
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Estimating biases using als...
Computing the cosine similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the cosine similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the cosine similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the cosine similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the cosine similarity matrix...
Done computing similarity matrix.
Evaluating RMSE, MAE of algorithm KNNBaseline on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9352  0.9387  0.9341  0.9376  0.9367  0.9364  0.0016  
MAE (testset)     0.7364  0.7423  0.7399  0.7377  0.7370  0.7387  0.0022  
Fit time          2.42    2.21    2.24    2.23    2.25    2.27    0.08    
Test time         7.40    7.21    7.37    7.33    7.24    7.31    0.07    


{'test_rmse': array([0.93521374, 0.9386884 , 0.93408547, 0.93757794, 0.93668218]),
 'test_mae': array([0.73640392, 0.74232647, 0.73994538, 0.73773628, 0.73700178]),
 'fit_time': (2.421116590499878,
  2.2131898403167725,
  2.24385404586792,
  2.2269680500030518,
  2.2486202716827393),
 'test_time': (7.397406816482544,
  7.2078938484191895,
  7.3738062381744385,
  7.332503080368042,
  7.238425970077515)}

In [52]:
recommendation_algorithm = KNNBaseline(k=100, sim_options={'name': 'cosine', 'user_based': True})
recommendation_algorithm.fit(training_set)
recommendations = recommendation_algorithm.test(evaluation_set)

K = 10
for k in range(K):
    get_results(recommendations, k+1, "knn_cosine")

Estimating biases using als...
Computing the cosine similarity matrix...
Done computing similarity matrix.


In [None]:
1,knn_cosine,0.88016967126193,0.03390149745530835,0.06528828583971324
2,knn_cosine,0.88016967126193,0.06196093726656509,0.11577192650639628
3,knn_cosine,0.88016967126193,0.08583462405910573,0.15641552156015406
4,knn_cosine,0.88016967126193,0.10458669390172426,0.18695799132926552
5,knn_cosine,0.88016967126193,0.12054022828465319,0.21204117826997154
6,knn_cosine,0.88016967126193,0.13362219184362004,0.23202040763677487
7,knn_cosine,0.88016967126193,0.14530467860880128,0.24943143867040823
8,knn_cosine,0.88016967126193,0.15573348461862765,0.264642286652242
9,knn_cosine,0.88016967126193,0.1645867132682507,0.2773167706011046
10,knn_cosine,0.88016967126193,0.17281397072597135,0.28890404321227753

In [17]:
data = Dataset.load_builtin('ml-100k')

algo = SVD()

# Run 5-fold cross-validation and print results.
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9377  0.9321  0.9324  0.9363  0.9346  0.9346  0.0022  
MAE (testset)     0.7412  0.7343  0.7350  0.7367  0.7365  0.7367  0.0024  
Fit time          5.65    5.53    5.67    5.61    5.51    5.59    0.06    
Test time         0.25    0.17    0.25    0.18    0.18    0.21    0.03    


{'test_rmse': array([0.93770645, 0.93208688, 0.93242449, 0.93631593, 0.93463473]),
 'test_mae': array([0.74122125, 0.7342647 , 0.73499089, 0.7367379 , 0.7365045 ]),
 'fit_time': (5.649105787277222,
  5.526436805725098,
  5.672861576080322,
  5.607264757156372,
  5.511228084564209),
 'test_time': (0.2494187355041504,
  0.1740856170654297,
  0.24844002723693848,
  0.18211841583251953,
  0.18266797065734863)}

In [58]:
recommendation_algorithm = SVD()
recommendation_algorithm.fit(training_set)
recommendations = recommendation_algorithm.test(evaluation_set)

K = 10
for k in range(K):
    get_results(recommendations, k+1, "SVD")

In [None]:
1,SVD,0.8589607635206787,0.03280891914425131,0.06320370558958135
2,SVD,0.8589607635206787,0.05912689996673682,0.11063799059693
3,SVD,0.8589607635206787,0.08137522355650367,0.14866627485995196
4,SVD,0.8589607635206787,0.10039752739748306,0.1797817095141711
5,SVD,0.8589607635206787,0.1169841421103758,0.20592307505712248
6,SVD,0.8589607635206787,0.13119040354335543,0.22761657601904117
7,SVD,0.8589607635206787,0.1431369890868279,0.24538336128217708
8,SVD,0.8589607635206787,0.15366536821030932,0.26069349361740224
9,SVD,0.8589607635206787,0.16299589721540633,0.2739980778089759
10,SVD,0.8589607635206787,0.17139921813100983,0.28577430392173125

##### Comparison

| Algorithm      | Similarity measure | Precision | Recall (k=1 - k=10) | F1 (k=1 - k=10) |
|:---------------|:------------------:|----------:|:-------------------:|:---------------:|
| knn_basic      | pearson_baseline   |     0.955 |    0.037 - 0.175    |  0.071 - 0.296  |
| knn_withmeans  | pearson_baseline   |     0.838 |    0.032 - 0.198    |  0.061 - 0.320  |
| knn_withzscore | pearson_baseline   |     0.828 |    0.031 - 0.198    |  0.060 - 0.319  |
| knn_baseline   | pearson_baseline   |     0.920 |    0.036 - 0.193    |  0.069 - 0.319  |
| knn_baseline   | cosine             |     0.880 |    0.034 - 0.173    |  0.065 - 0.289  |
| SVD            | n/a                |     0.859 |    0.033 - 0.171    |  0.063 - 0.286  |

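The ranges in a comparison table like this one can be derived directly from the per-k rows written to the results file. The sketch below does so for a few rows copied from the results listed in this notebook (k=1 and k=10 for knn_basic and SVD); reading from an in-memory string instead of the real file, and the variable names, are assumptions made to keep the example self-contained.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the results listed in this notebook (k = 1 and k = 10).
raw = """1,knn_basic,0.9554612937433722,0.03683050154825231,0.07092695681952789
10,knn_basic,0.9554612937433722,0.17484458530530517,0.29559650493976747
1,SVD,0.8589607635206787,0.03280891914425131,0.06320370558958135
10,SVD,0.8589607635206787,0.17139921813100983,0.28577430392173125
"""

# Group the (precision, recall, f1) triples by algorithm name.
by_algo = defaultdict(list)
for k, name, precision, recall, f1_score in csv.reader(io.StringIO(raw)):
    by_algo[name].append((float(precision), float(recall), float(f1_score)))

# Precision is constant per algorithm here; recall and F1 become ranges.
summary = {}
for name, rows in by_algo.items():
    recalls = [r for _, r, _ in rows]
    f1s = [f for _, _, f in rows]
    summary[name] = (rows[0][0], min(recalls), max(recalls), min(f1s), max(f1s))

for name, (p, r_lo, r_hi, f_lo, f_hi) in summary.items():
    print(f"{name:10s} {p:.3f}  {r_lo:.3f} - {r_hi:.3f}  {f_lo:.3f} - {f_hi:.3f}")
```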
The best recommender is the one with the highest precision and the highest recall. Looking at the table, we can rank the algorithms by their effectiveness:

1.- knn_basic (pearson_baseline)

2.- knn_baseline (pearson_baseline)

3.- knn_baseline (cosine)

Since the recall ranges are very similar, we gave particular weight to the precision value.