# About this notebook


Research related to the development of this method:
>Renaud Gaujoux, Cathal Seoighe (2010). **A flexible R package for nonnegative matrix factorization.**  
>Xihui Lin, Paul C. Boutros (2020). [**Optimization and expansion of non-negative matrix factorization.**](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6945623/)

The implemented method comes from these sources:
>[**Repository**](https://github.com/linxihui/NNLM) *R package*  
>[**Implementation examples**](https://rdrr.io/cran/NNLM/f/inst/doc/Fast-And-Versatile-NMF.pdf)  *Explanatory document*



**Notes on the notebook**
- Add rank selection
- Modularize the components


## Setup

In [1]:
# Data loading (reads MATLAB .mat files)
library(R.matlab)

# NNLM algorithm (non-negative matrix factorization)
library(NNLM)

R.matlab v3.6.2 (2018-09-26) successfully loaded. See ?R.matlab for help.


Attaching package: 'R.matlab'


The following objects are masked from 'package:base':

    getOption, isOpen




In [2]:
setwd("C:/Users/amass/OneDrive/02_Education/02_Maitrise/Cours/PROJET_MAITRISE/application")

# Data

## Loading the data

In [3]:
# Data tensor
tensor <- readMat("data/Guangzhou-data-set/tensor.mat")
tensor <- tensor$tensor # Extract the array
dim(tensor)

# Random matrix (used for the non-random missing scenario)
random_matrix <- readMat("data/Guangzhou-data-set/random_matrix.mat")
random_matrix <- random_matrix$random.matrix
dim(random_matrix)

# Random tensor (used for the random missing scenario)
random_tensor <- readMat("data/Guangzhou-data-set/random_tensor.mat")
random_tensor <- random_tensor$random.tensor
dim(random_tensor)

## Data format

In [4]:
# Whether to use the truncated or the full data set
petit_donnees = TRUE
# Whether to work with a matrix or a tensor
matrice = TRUE

capteurs = 50                # number of sensors kept
jours = 10                   # number of days kept
sequences = dim(tensor)[3]   # the time dimension of the tensor is kept in full

# Truncated or full data
if (petit_donnees) {
    # Keep only the first `capteurs` sensors and `jours` days
    tensor = tensor[1:capteurs, 1:jours, 1:sequences, drop = FALSE]
    random_tensor = random_tensor[1:capteurs, 1:jours, 1:sequences, drop = FALSE]
}

# Matrix or tensor
if (matrice) {
    # Unfold the tensor into a (sensors) x (days * time steps) matrix
    mat_dense <- array(tensor, c(dim(tensor)[1], dim(tensor)[2] * dim(tensor)[3]))
}

dim(tensor)
dim(mat_dense)
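
As a small illustration of the reshaping step above, the sketch below unfolds a toy 2 x 3 x 4 array in the same way. The names `toy`, `unfolded`, `n_i`, `n_j` and `n_t` are hypothetical and the sizes are arbitrary. Because R stores arrays in column-major order, element (i, j, t) of the tensor should land in column j + (t - 1) * J of the unfolded matrix.

```r
# Toy tensor: 2 sensors x 3 days x 4 time slots (hypothetical sizes)
n_i <- 2; n_j <- 3; n_t <- 4
toy <- array(seq_len(n_i * n_j * n_t), c(n_i, n_j, n_t))

# Same reshaping as for mat_dense: mode 1 stays as rows, modes 2 and 3 are merged
unfolded <- array(toy, c(n_i, n_j * n_t))

# Column-major storage: element (i, j, tt) ends up in column j + (tt - 1) * n_j
i <- 2; j <- 3; tt <- 4
col <- j + (tt - 1) * n_j
unfolded_ok <- unfolded[i, col] == toy[i, j, tt]

# Folding back with the original dimensions recovers the tensor
refolded <- array(unfolded, dim(toy))

print(dim(unfolded))
print(unfolded_ok)
```

The same index rule applies when mapping an imputed column of the matrix back to a (day, time slot) pair.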

## Missing-data scenarios


- Random missing scenario
- Non-random missing scenario
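
The two scenarios differ only in how the binary mask is drawn. The sketch below illustrates this on toy sizes, assuming the random files hold uniform draws in [0, 1); here `runif` stands in for them and every name is hypothetical. Rounding `u + 0.5 - rate` gives 0 roughly when `u < rate`, so about a share `rate` of the entries is masked. In the random scenario each entry gets its own draw; in the non-random scenario one draw per (sensor, day) pair masks or keeps the whole day.

```r
set.seed(1)                       # reproducible toy draws
rate <- 0.2                       # target share of masked entries
dims <- c(5, 4, 6)                # sensors x days x time slots (hypothetical)

# Random scenario: one uniform draw per entry
u_tensor <- array(runif(prod(dims)), dims)
mask_random <- round(u_tensor + 0.5 - rate)

# Non-random scenario: one draw per (sensor, day), recycled over all time slots
u_matrix <- matrix(runif(dims[1] * dims[2]), dims[1], dims[2])
mask_day <- array(round(u_matrix + 0.5 - rate), dims)

# Every (sensor, day) fibre of mask_day is either all 0 or all 1
fibre_sums <- apply(mask_day, c(1, 2), sum)
whole_days <- all(fibre_sums %in% c(0, dims[3]))

print(mean(mask_random == 0))     # approaches rate as the array grows
print(whole_days)
```

Recycling the matrix inside `array()` is a vectorized alternative to looping over sensors and days.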

In [5]:
tx_manquant = 0.2
manquants_aleatoires = TRUE

# Random missing values
if (manquants_aleatoires) {
    print("Random missing values")
    mat_binaire <- round(random_tensor + 0.5 - tx_manquant)
    mat_binaire <- array(mat_binaire, c(dim(random_tensor)[1], dim(random_tensor)[2] * dim(random_tensor)[3]))
    dim(mat_binaire)

    # Build the matrix with missing entries (element-wise product; masked entries become 0)
    mat_manq <- mat_dense * mat_binaire
    head(mat_manq)
}

# Non-random missing values
if (!manquants_aleatoires) {
    print("Non-random missing values")
    tens_binaire <- array(0, dim(tensor))
    for (i1 in 1:dim(tensor)[1]) {
        for (i2 in 1:dim(tensor)[2]) {
            # One draw per (sensor, day): the whole day is either kept or masked
            tens_binaire[i1, i2, ] <- round(random_matrix[i1, i2] + 0.5 - tx_manquant)
        }
    }
    mat_binaire <- array(tens_binaire, c(dim(tens_binaire)[1], dim(tens_binaire)[2] * dim(tens_binaire)[3]))
    dim(mat_binaire)

    # Build the matrix with missing entries (element-wise product; masked entries become 0)
    mat_manq <- mat_dense * mat_binaire
    head(mat_manq)
}

[1] "Random missing values"


| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 40.893 | 41.227 | 42.68 | 0.0 | 42.804 | 40.02 | 0.0 | 35.357 | 41.097 | 41.821 | ... | 0.0 | 40.87 | 0.0 | 42.295 | 39.082 | 0.0 | 39.008 | 0.0 | 39.974 | 0.0 |
| 50.319 | 0.0 | 47.984 | 50.66 | 51.622 | 50.77 | 50.542 | 50.463 | 52.192 | 0.0 | ... | 51.534 | 0.0 | 51.192 | 50.546 | 50.227 | 50.693 | 51.067 | 0.0 | 51.232 | 0.0 |
| 0.0 | 57.001 | 49.905 | 52.292 | 55.792 | 52.594 | 50.13 | 0.0 | 0.0 | 55.341 | ... | 53.983 | 50.571 | 52.489 | 55.382 | 54.422 | 0.0 | 52.618 | 0.0 | 53.334 | 52.322 |
| 37.305 | 0.0 | 37.195 | 0.0 | 0.0 | 33.248 | 31.593 | 38.236 | 34.7 | 0.0 | ... | 36.883 | 0.0 | 37.362 | 34.821 | 0.0 | 32.957 | 0.0 | 34.647 | 34.758 | 37.017 |
| 38.388 | 39.534 | 35.762 | 36.393 | 36.625 | 37.164 | 34.079 | 0.0 | 37.764 | 0.0 | ... | 37.508 | 35.285 | 37.175 | 37.707 | 36.642 | 34.773 | 37.572 | 36.967 | 37.361 | 36.306 |
| 0.0 | 44.112 | 43.173 | 0.0 | 48.262 | 47.25 | 46.281 | 48.18 | 0.0 | 49.987 | ... | 51.562 | 45.329 | 50.494 | 48.465 | 46.354 | 45.446 | 49.026 | 45.5 | 47.928 | 47.412 |


In [6]:
# Linear indices of the masked entries (mask == 0)
index <- which(mat_binaire == 0)

# Modeling

## Model specification

In [7]:

# Parameter initialization
rang = 2

# NMF algorithm
mat_manq.nmf <- NNLM::nnmf(mat_manq, k = rang)

# Imputed matrix (reconstruction W %*% H)
mat_manq.hat.nmf <- with(mat_manq.nmf, W %*% H)
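
One point worth checking: in `mat_manq` the hidden entries are stored as 0, so a factorization fitted on that matrix treats those zeros as observed values. The NNLM documentation describes `nnmf` as accepting `NA` for missing entries. The sketch below, on toy data with hypothetical names, marks the hidden cells as `NA` instead and calls `NNLM::nnmf` only if the package is installed. Whether this changes the results reported in this notebook has not been verified here.

```r
set.seed(42)
# Toy non-negative data and a 0/1 mask (hypothetical stand-ins for mat_dense / mat_binaire)
toy_dense <- matrix(runif(30 * 20, min = 20, max = 60), 30, 20)
toy_mask  <- matrix(rbinom(30 * 20, size = 1, prob = 0.8), 30, 20)

# Hidden cells become NA instead of 0
toy_na <- toy_dense
toy_na[toy_mask == 0] <- NA

n_hidden <- sum(is.na(toy_na))
print(n_hidden)

# Assumption: NNLM is installed and nnmf() leaves NA cells out of the fit
if (requireNamespace("NNLM", quietly = TRUE)) {
    fit <- NNLM::nnmf(toy_na, k = 2, verbose = 0)
    toy_hat <- fit$W %*% fit$H        # dense reconstruction, hidden cells included
    print(NNLM::mse.mkl(obs = toy_dense[toy_mask == 0],
                        pred = toy_hat[toy_mask == 0]))
}
```

With this encoding the reconstruction `W %*% H` still fills every cell, so the held-out comparison on the masked indices works as before.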

# Results

In [8]:
# Summary of the fit
mat_manq.nmf 

Non-negative matrix factorization:
   Algorithm: Sequential coordinate-wise descent
        Loss: Mean squared error
         MSE: 249.4869
         MKL: 6.879058
      Target: 124.7435
   Rel. tol.: 6.9e-05
Total epochs: 687
# Interation: 15
Running time:
   user  system elapsed 
   0.08    0.02    0.28 

In [9]:
# MSE and mean KL divergence on the masked entries
sapply(
    X = list(
        NMF = mat_manq.hat.nmf[index]
        ),
    FUN = mse.mkl,
    obs = mat_dense[index])

|     | NMF        |
|-----|------------|
| MSE | 138.385749 |
| MKL | 2.326434   |
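
The comparison above only involves entries whose mask is 0, that is, values hidden from the model. As a minimal base-R sketch of the MSE part of that comparison (toy matrices; `held_out_mse` is a hypothetical helper, not an NNLM function):

```r
# Mean squared error restricted to the hidden entries
held_out_mse <- function(truth, estimate, mask) {
    hidden <- which(mask == 0)          # linear indices of masked cells
    mean((truth[hidden] - estimate[hidden])^2)
}

truth    <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
estimate <- matrix(c(1, 2, 5, 4, 5, 3), nrow = 2)
mask     <- matrix(c(1, 1, 0, 1, 1, 0), nrow = 2)

toy_mse <- held_out_mse(truth, estimate, mask)
print(toy_mse)   # hidden cells: (3 - 5)^2 and (6 - 3)^2, mean = 6.5
```

Errors on the observed cells do not enter this score, which is why it can differ from the MSE printed in the fit summary.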


In [10]:
head(mat_manq.hat.nmf)

| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 32.04581 | 35.20214 | 30.45213 | 26.67824 | 29.54631 | 36.20275 | 38.22167 | 30.19457 | 32.07221 | 36.44698 | ... | 36.1615 | 27.56195 | 33.74162 | 37.48141 | 30.33022 | 32.6094 | 33.01625 | 28.06604 | 32.42314 | 31.9751 |
| 36.15038 | 38.82901 | 34.36282 | 32.58586 | 38.27178 | 40.26525 | 42.54177 | 34.05282 | 35.26277 | 40.92327 | ... | 41.90246 | 31.2363 | 44.30701 | 42.04152 | 35.60114 | 35.57243 | 34.96103 | 31.14427 | 36.05864 | 34.09001 |
| 40.57178 | 45.1713 | 38.54709 | 32.07215 | 34.02669 | 46.22779 | 48.78452 | 38.23432 | 41.23287 | 46.27528 | ... | 45.02357 | 34.79638 | 38.44699 | 47.61823 | 37.45142 | 42.11573 | 43.36315 | 35.88663 | 41.4035 | 41.83737 |
| 26.54313 | 29.5755 | 25.21825 | 20.9168 | 22.13094 | 30.25858 | 31.93129 | 25.01414 | 26.99981 | 30.27957 | ... | 29.42638 | 22.76091 | 24.98848 | 31.15944 | 24.46517 | 27.58521 | 28.42951 | 23.4916 | 27.1009 | 27.42331 |
| 28.45054 | 31.21383 | 27.03611 | 23.79508 | 26.44952 | 32.11576 | 33.90812 | 26.80659 | 28.43349 | 32.34947 | ... | 32.15344 | 24.47609 | 30.23164 | 33.26569 | 26.9886 | 28.89733 | 29.2113 | 24.89447 | 28.7627 | 28.30036 |
| 35.56736 | 39.13432 | 33.79781 | 29.42987 | 32.43593 | 40.22267 | 42.46352 | 33.51336 | 35.663 | 40.46608 | ... | 40.05514 | 30.58035 | 36.99808 | 41.6177 | 33.56303 | 36.28065 | 36.8096 | 31.18761 | 36.02357 | 35.6321 |
