# Comparison of rating prediction methods

This notebook compares the MSE and MAE of several methods for predicting users' ratings of TV series.  
Data: 4000 users and 3000 series.  
Each user has at least 4 ratings.  
Methods compared:
- kSVD
- NMF
    - without bias
    - with bias
- content-based
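
`MSE_err` and `MAE_err` are imported from `utils` and are not shown in this notebook. Assuming they implement the usual definitions (MSE is the mean of the squared differences between predictions and true ratings, MAE the mean of the absolute differences), a minimal NumPy sketch looks like this; `mse` and `mae` are hypothetical names:

```python
import numpy as np

def mse(pred, truth):
    """Mean squared error: average of the squared differences."""
    pred, truth = np.asarray(pred, dtype=float), np.asarray(truth, dtype=float)
    return float(np.mean((pred - truth) ** 2))

def mae(pred, truth):
    """Mean absolute error: average of the absolute differences."""
    pred, truth = np.asarray(pred, dtype=float), np.asarray(truth, dtype=float)
    return float(np.mean(np.abs(pred - truth)))

# Three hypothetical predicted / true ratings
pred = [7.0, 5.5, 9.0]
truth = [8.0, 5.5, 6.0]
print(mse(pred, truth))  # (1 + 0 + 9) / 3
print(mae(pred, truth))  # (1 + 0 + 3) / 3
```

Because errors are squared, the MSE weighs a few large errors much more heavily than the MAE does.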


In [1]:
import pickle
import numpy as np

from utils.predictions_content import *
from utils.predictions_notes import *
from utils.collaborative import *

In [2]:
# Data paths; the commented-out lines are alternative local paths
path_ratings = "/Vrac/PLDAC_addic7ed/ratings/ratings_imdb/users"
#path_ratings = "/Users/constancescherer/Desktop/ratings/ratings_imdb/users"
path_d_user = "/Vrac/PLDAC_addic7ed/pickles/d_user.p"
#path_d_user = "/Users/constancescherer/Desktop/pickles/d_user.p"
# alternative to the pickle: rebuild d_user from the rating files
#d_user = get_d_user(path_ratings)
with open(path_d_user, 'rb') as pickle_file:
    d_user = pickle.load(pickle_file)

# Rating data, train/test split and kSVD factors
liste_series = get_liste_series(d_user)
data = get_data(d_user)
all_data, num_user, num_item = get_all_data(data)
train, train_mat, test = get_train_test(num_user, num_item, all_data, test_size=20)
mean, u_means, i_means, U_ksvd, I_ksvd = get_Uksvd_Iksvd(train, train_mat, num_user, num_item)
# index dictionaries for users and series, and the full sparse rating matrix
d_username_id, d_itemname_id, Full = create_sparse_mat(data)

# Series metadata and lookup dictionaries (ids, names, filenames)
#path_series = "/Users/constancescherer/Desktop/addic7ed_final"
path_series = '/Vrac/PLDAC_addic7ed/addic7ed_final'
d_info, d_name = getDicts(path_series)
d_ind = reverse_dict(d_name)
d_titre_filename = get_d_titre_filename("titles/title-filename.txt")
d_filename_titre = reverse_dict(d_titre_filename)
d_id_username = reverse_dict(d_username_id)
d_id_serie = reverse_dict(d_itemname_id)

path_sim = "/Vrac/PLDAC_addic7ed/pickles/sim.p"
#path_sim = "/Users/constancescherer/Desktop/pickles/sim.p"
# cosine similarity matrix
with open(path_sim, 'rb') as pickle_file:
    similarities = pickle.load(pickle_file)
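
The pickle `sim.p` is loaded as a precomputed cosine similarity matrix; how it was built is not shown here. A minimal NumPy sketch, assuming each series is represented by one row of a non-negative feature matrix (for example word counts from its subtitles, which is an assumption), of how such a matrix can be computed; `cosine_similarity_matrix` is a hypothetical name:

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Pairwise cosine similarities between the rows of X (one row per series)."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for all-zero rows
    Xn = X / norms           # unit-length rows
    return Xn @ Xn.T         # dot products of unit vectors = cosines

# Three hypothetical series described by 4 features each
X = np.array([[1, 0, 1, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1]])
S = cosine_similarity_matrix(X)
print(np.round(S, 3))
```

Identical feature rows get a similarity of 1, rows with no feature in common get 0.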

In [3]:
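def error_ksvd(train_mat, test):
    """Print training and test MSE/MAE of kSVD (uses the global U_ksvd, I_ksvd, u_means, i_means, mean)."""
    ## Getting the truth values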
    truth_tr = np.array([rating for (uid,iid),rating in train_mat.items()])
    truth_te = np.array([rating for uid,iid,rating in test])

    prediction_tr = np.array([pred_func_ksvd(u,i, U_ksvd, I_ksvd, u_means, i_means, mean) for (u,i),rating in train_mat.items()])
    prediction_te = np.array([pred_func_ksvd(u,i, U_ksvd, I_ksvd, u_means, i_means, mean) for u,i,rating in test])


    print("Training Error:")
    print("MSE:",  MSE_err(prediction_tr,truth_tr))
    print("MAE:",  MAE_err(prediction_tr,truth_tr))

    print("Test Error:")
    print("MSE:",  MSE_err(prediction_te,truth_te))
    print("MAE:",  MAE_err(prediction_te,truth_te))
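
`pred_func_ksvd` and `get_Uksvd_Iksvd` are defined in `utils` and not shown. Their arguments (global mean, user and item means, user and item factors) suggest a prediction made of a bias baseline plus a low-rank term. A minimal sketch under that assumption: biases are mean offsets from the global mean, and a truncated SVD of the bias-corrected ratings gives the factors. `fit_ksvd` and `predict_ksvd` are hypothetical names, and the exact bias formula used in the project may differ:

```python
import numpy as np

def fit_ksvd(R, mask, k):
    """Fit a rank-k SVD on bias-corrected ratings.

    R    : (n_users, n_items) array of ratings (any value where unobserved)
    mask : boolean array of the same shape, True where a rating is observed
    """
    mu = R[mask].mean()                    # global mean rating
    dev = np.where(mask, R - mu, 0.0)      # deviations, 0 where unobserved
    n_u = np.maximum(mask.sum(axis=1), 1)  # ratings per user (at least 1)
    n_i = np.maximum(mask.sum(axis=0), 1)  # ratings per item (at least 1)
    b_u = dev.sum(axis=1) / n_u            # user bias (0 if no rating)
    b_i = dev.sum(axis=0) / n_i            # item bias (0 if no rating)
    # residuals after removing the biases, 0 for unobserved entries
    resid = np.where(mask, R - mu - b_u[:, None] - b_i[None, :], 0.0)
    U, s, Vt = np.linalg.svd(resid, full_matrices=False)
    U_k = U[:, :k] * s[:k]                 # user factors scaled by singular values
    I_k = Vt[:k].T                         # item factors
    return mu, b_u, b_i, U_k, I_k

def predict_ksvd(u, i, mu, b_u, b_i, U_k, I_k):
    """Baseline (mean + biases) plus the low-rank correction."""
    return mu + b_u[u] + b_i[i] + U_k[u] @ I_k[i]

R = np.array([[8., 6., 0.],
              [7., 0., 5.],
              [0., 4., 9.]])
mask = R > 0
params = fit_ksvd(R, mask, k=2)
print(predict_ksvd(0, 2, *params))  # prediction for an unobserved pair
```

The smaller `k` is, the less the low-rank term can memorize the training ratings; with full rank it reproduces them exactly.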

In [4]:
def error_NMF(train_mat, test, num_user, num_item):
    """Print training and test MSE/MAE of NMF without bias (100 components)."""
    ## Getting the truth values
    truth_tr = np.array([rating for (uid,iid),rating in train_mat.items()])
    truth_te = np.array([rating for uid,iid,rating in test])
    
    prediction_tr, prediction_te = predictions_NMF(train_mat,test, 100, num_user, num_item)
    print("Training Error:")
    print("MSE:",  MSE_err(prediction_tr,truth_tr))
    print("MAE:",  MAE_err(prediction_tr,truth_tr))

    print("Test Error:")
    print("MSE:",  MSE_err(prediction_te,truth_te))
    print("MAE:",  MAE_err(prediction_te,truth_te))
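
`predictions_NMF` is defined in `utils` and is called with 100 as its third argument, presumably the number of components. One common way to apply NMF to a rating matrix with missing entries, not necessarily the one used in `utils`, is to fit the factorization on the observed ratings only. A minimal sketch with multiplicative updates and a 0/1 mask; `masked_nmf` and `masked_mse` are hypothetical names:

```python
import numpy as np

def masked_nmf(R, mask, k, n_iter=500, eps=1e-9, seed=0):
    """Fit R ~ W @ H (W, H >= 0) on the observed entries only.

    Multiplicative updates for the weighted loss ||M * (R - W @ H)||^2,
    where M is 1 on observed ratings and 0 elsewhere.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    W = rng.random((n_users, k))
    H = rng.random((k, n_items))
    M = mask.astype(float)
    Rm = M * R                                     # unobserved entries set to 0
    for _ in range(n_iter):
        H *= (W.T @ Rm) / (W.T @ (M * (W @ H)) + eps)
        W *= (Rm @ H.T) / ((M * (W @ H)) @ H.T + eps)
    return W, H

def masked_mse(R, mask, W, H):
    """MSE of W @ H on the observed entries only."""
    return float(np.mean((R - W @ H)[mask] ** 2))

R = np.array([[8., 6., 0.],
              [7., 0., 5.],
              [0., 4., 9.]])
mask = R > 0
W0, H0 = masked_nmf(R, mask, k=2, n_iter=0)       # random starting point
W, H = masked_nmf(R, mask, k=2)
print(masked_mse(R, mask, W0, H0), masked_mse(R, mask, W, H))
print(np.round(W @ H, 2))                         # also fills the unobserved pairs
```

With many components relative to the number of ratings per user, such a model can reproduce the training ratings almost exactly; the near-zero training error and much higher test error of the NMF without bias fit that picture.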

In [5]:
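def error_NMF_biais(train_mat, test, nb_comp, num_user, num_item):
    """Print training and test MSE/MAE of NMF with bias (nb_comp components)."""
    ## Getting the truth values
    truth_tr = np.array([rating for (uid,iid),rating in train_mat.items()])
    truth_te = np.array([rating for uid,iid,rating in test])
    
    # pass nb_comp instead of a hard-coded 100
    prediction_tr, prediction_te = predictions_NMF_biais(train_mat, test, nb_comp, num_user, num_item)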
    print("Training Error:")
    print("MSE:",  MSE_err(prediction_tr,truth_tr))
    print("MAE:",  MAE_err(prediction_tr,truth_tr))

    print("Test Error:")
    print("MSE:",  MSE_err(prediction_te,truth_te))
    print("MAE:",  MAE_err(prediction_te,truth_te))
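
`predictions_NMF_biais` is not shown. One plausible reading of "NMF with bias" (an assumption) is that a baseline made of the global mean plus user and item biases, computed from the training ratings, is combined with the NMF part. The sketch below only covers that baseline, on a dict `{(user, item): rating}` shaped like `train_mat`; `fit_biases` and `predict_baseline` are hypothetical names. Users or items without training ratings fall back to a bias of 0 instead of taking the mean of an empty list:

```python
from collections import defaultdict

def fit_biases(train):
    """Global mean and per-user / per-item mean offsets from {(u, i): rating}."""
    mu = sum(train.values()) / len(train)
    dev_u, dev_i = defaultdict(list), defaultdict(list)
    for (u, i), r in train.items():
        dev_u[u].append(r - mu)
        dev_i[i].append(r - mu)
    b_u = {u: sum(d) / len(d) for u, d in dev_u.items()}
    b_i = {i: sum(d) / len(d) for i, d in dev_i.items()}
    return mu, b_u, b_i

def predict_baseline(u, i, mu, b_u, b_i):
    # unseen users or items get a bias of 0
    return mu + b_u.get(u, 0.0) + b_i.get(i, 0.0)

train = {(0, 0): 8, (0, 1): 6, (1, 0): 7, (1, 2): 5}
mu, b_u, b_i = fit_biases(train)
print(predict_baseline(1, 1, mu, b_u, b_i))   # 6.5 - 0.5 - 0.5
print(predict_baseline(1, 99, mu, b_u, b_i))  # item 99 has no training rating
```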

In [6]:
def error_content(train_mat, test, d_name, d_user, d_ind, d_titre_filename, d_filename_titre, d_id_username, d_id_serie, similarities):
    """Print training and test MSE/MAE of the content-based predictions."""
    ## Getting the truth values
    truth_tr = np.array([rating for (uid,iid),rating in train_mat.items()])
    truth_te = np.array([rating for uid,iid,rating in test])

    prediction_tr = np.array([pred_content(u, i, d_name, d_user, d_ind, d_titre_filename, d_filename_titre, d_id_username, d_id_serie, similarities) for (u,i),rating in train_mat.items()])
    prediction_te = np.array([pred_content(u, i, d_name, d_user, d_ind, d_titre_filename, d_filename_titre, d_id_username, d_id_serie, similarities) for u,i,rating in test])


    print("Training Error:")
    print("MSE:",  MSE_err(prediction_tr,truth_tr))
    print("MAE:",  MAE_err(prediction_tr,truth_tr))

    print("Test Error:")
    print("MSE:",  MSE_err(prediction_te,truth_te))
    print("MAE:",  MAE_err(prediction_te,truth_te))
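
`pred_content` is defined in `utils` and its exact rule is not shown. A common content-based rule, given here as an assumption, predicts a user's rating of a series as the similarity-weighted average of the ratings that user gave to other series. A minimal sketch; `predict_content` is a hypothetical name and `sim` plays the role of `similarities`:

```python
import numpy as np

def predict_content(user_ratings, item, sim):
    """Similarity-weighted average of the user's ratings on other items.

    user_ratings : dict {item_index: rating} for one user
    item         : index of the item to predict
    sim          : (n_items, n_items) cosine similarity matrix
    """
    others = [j for j in user_ratings if j != item]  # never use the target itself
    if not others:
        return None  # nothing to compare with
    w = np.array([sim[item, j] for j in others])
    r = np.array([user_ratings[j] for j in others], dtype=float)
    if w.sum() <= 0:
        return float(r.mean())  # fall back to the user's mean rating
    return float(w @ r / w.sum())

sim = np.array([[1.0, 0.8, 0.2],
                [0.8, 1.0, 0.5],
                [0.2, 0.5, 1.0]])
print(predict_content({0: 8, 2: 4}, 1, sim))        # (0.8*8 + 0.5*4) / 1.3
print(predict_content({0: 8, 1: 6, 2: 4}, 1, sim))  # same: item 1 is excluded
```

Excluding the target series keeps the model from reading back a known rating, so a training pair is predicted the same way as a test pair.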

## kSVD

In [7]:
error_ksvd(train_mat, test)

Training Error:
MSE: 1.4717542445892757
MAE: 0.7410327440155239
Test Error:
MSE: 5.641115231844463
MAE: 1.716289295576649


## NMF

### Without bias

In [8]:
error_NMF(train_mat, test, num_user, num_item)

Training Error:
MSE: 0.0029776867
MAE: 0.0029776867
Test Error:
MSE: 8.944893460690668
MAE: 2.3056576047024246


### With bias

In [9]:
error_NMF_biais(train_mat,test, 100, num_user, num_item)



Training Error:
MSE: 2.0061487296492517
MAE: 0.6923314900034804
Test Error:
MSE: 6.0911094783247615
MAE: 1.6869948567229978


## Content

In [10]:
error_content(train_mat, test, d_name, d_user, d_ind, d_titre_filename, d_filename_titre, d_id_username, d_id_serie, similarities)

Training Error:
MSE: 3.667719384181738
MAE: 1.3562782783556986
Test Error:
MSE: 3.714425667401421
MAE: 1.3764388929708549


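kSVD and the content-based method give the best test results: the content-based method has the lowest test error (MSE 3.71, MAE 1.38), ahead of kSVD (MSE 5.64, MAE 1.72). NMF without bias does noticeably worse on the test set (MSE 8.94, MAE 2.31) while its training error is almost zero, a sign of strong overfitting; unlike kSVD, it does not take user and item biases into account. Taking the biases into account clearly improves NMF (test MSE 6.09, MAE 1.69), bringing it close to kSVD.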
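
To make the comparison concrete, the test errors reported above can be ranked and the effect of the biases on NMF quantified with a few lines of Python (the numbers are copied from the cell outputs):

```python
# Test errors copied from the outputs above
test_errors = {
    "kSVD": {"MSE": 5.641115231844463, "MAE": 1.716289295576649},
    "NMF without bias": {"MSE": 8.944893460690668, "MAE": 2.3056576047024246},
    "NMF with bias": {"MSE": 6.0911094783247615, "MAE": 1.6869948567229978},
    "Content": {"MSE": 3.714425667401421, "MAE": 1.3764388929708549},
}

# Rank the methods by test MSE (lower is better)
ranking = sorted(test_errors, key=lambda name: test_errors[name]["MSE"])
for name in ranking:
    err = test_errors[name]
    print(f"{name:<18} MSE={err['MSE']:.2f}  MAE={err['MAE']:.2f}")

# Relative reduction in test MSE brought by the biases in NMF
gain = 1 - test_errors["NMF with bias"]["MSE"] / test_errors["NMF without bias"]["MSE"]
print(f"{gain:.1%}")  # about 31.9%
```

Ranked by MAE instead, NMF with bias (1.69) slightly edges out kSVD (1.72) even though its MSE is higher, which suggests it makes fewer moderate errors but more large ones.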