
# Recurrent Neural Networks with Top-k Gains for Session-based Recommendations


Avec l'avènement des plateformes de streaming et de commerce en ligne les systèmes de recommandation ont pris une place incontournable dans notre vie de chaque jour. Les systèmes de recommandation sont aujourd'hui incontournables lors de nos visites quotidiennes sur le web. les systèmes de recommandation sont des algorithmes dont le but est de faire des suggestions pertinentes à l'usager. par exemple lorsque vous allez sur l'application netflix, vous avez des suggestions de films qui pourraient vous intéresser. ces algorithmes sont technologies qui rendent les entreprises extrêmement concurrentielles. Il existe deux grandes familles de méthodes pour construire un système de recommandation:
- les méthodes par filtrage collaboratif;
- les méthodes basées sur le contenu.
Les méthodes collaboratives pour les systèmes de recommandation sont des méthodes qui se basent uniquement sur les interactions passées enregistrées entre les utilisateurs et les éléments afin de produire de nouvelles recommandations. Ces interactions sont stockées dans la "matrice des interactions entre utilisateurs et articles". L'idée principale qui régit les méthodes collaboratives est que ces interactions passées entre utilisateurs et articles sont suffisantes pour détecter des utilisateurs similaires et/ou des articles similaires et faire des prédictions basées sur ces proximités estimées. Contrairement aux méthodes de collaboration qui reposent uniquement sur les interactions entre l'utilisateur et l'article, les approches basées sur le contenu utilisent des informations supplémentaires sur les utilisateurs et/ou les articles. L'idée derrière ces méthodes est de construire un modèle, basé sur les "caractéristiques" disponibles, qui expliquent les interactions observées entre l'utilisateur et les articles. par exemple si nous voulons faire des recommandations sur un site de vente en ligne nous pouvons ajouter des variables comme le sexe du visiteur, son age, sa profession... 

Cependant ces deux familles ont une principale limite: elles sont incapable de faire une recommandation lorsqu'il s'agit d'un nouvel utilisateur. Un utilisateur qui n'a pas d'historique. Ceci est le but de l'article étudié pour de projet, construire un système de recommandation qui soit capable de faire des recommandation lorsque l'utilisateur n'a pas d'historique sur le site. une solution triviale à ce problème est de faire le item-to-item approche. Avec cette approche on recommande à l'utilisateur des articles qui sont similaires. Dans cet article, l'auteur n'utilisera pas cette approche mais plutôt des réseaux de neuronnes, ici, des réseaux de neuronnes récurrents. Ces derniers sont réputés pour leur excellente habilité à modéliser des données séquentielles. Avec ces réseau l'auteur va modéliser toute la session de l'utilisateur afin de pouvoir faire des prédictions.

## présentation des travaux précédents
Pleines de solutions avaient été proposés pour ce problème notemment celle avec la matrice des articles similaires, et des méthodes avec Reccurent Neural Networks(RNN), les LSTMs et les GRU. Dans cette section nous nous focaliseront sur l'article de référence de l'auteur. Nous présenteront la méthode de résoltion dans cet article afin de présenter plus les améliorations de cette dernière dans l'article que nous étudions. L'article sur lequel l'auteur a bati son raisonnement est **SESSION-BASED RECOMMENDATIONS WITH RECURRENT NEURAL NETWORKS**, disponible ici [ici](https://arxiv.org/abs/1511.06939). 

Dans cet article la version de RNN utilisée est celle de General Recurent Unit (GRU), cette version permet de résoudre le problème du gradient qui disparait avec les RNN. le modèle utilisé ici est constitué d'une couche d'embedding, des couches de GRU, des feedforwards layers et la sortie du réseau contient les différents scores des articles prédisant ainsi le prochain article sur lequel l'utilisateur cliquera. L'entrée du réseau est l'état actuel de la session. Ci-dessous une représentation du réseau.
<img src="images/architecture.png"/>

Pour l'entrainement du réseau les auteurs considèrent des mini-batches de sessions parrallèles. En effet les RNN sont toujours entrainé sur des batches de données, et la taille des données d'entrées doit être fixe. ceci ne peut être obtenu avec ce type de données car les sessions n'ont pas les mêmes durée, de plus vu qu'on veut modéliser la session entière ça ne fait pas sens de couper une session pour en faire un batch. pour résoudre ce problème les auteurs utlisent à la fois plusieurs sessions d'utilisateurs et forment des batchs avec avec les éléments des différentes sessions (pour cela ils supposent l'indépendance entre les sessions). ci dessous une illustration de la formation des batches.
<img src="images/batchs.png"/>
Ensuite pour chaque session il faudra prédire les prochaines sélections de l'utilisateur. Le problème ici est qu'un site peut contenir des milliers d'articles et que ceux qui intéressent vraiment sont ceux qui pourront intéresser l'utilisateur. Si on considèrent tous les articles cela conduirait à un vecteur sparse car les articles jugés (par le réseau) non intéressant pas l'utilisateur auront des probabilités très faibles. Ainsi pour résoudre ce problème avec la sparsité du résultat les auteurs adoptent cette méthode d'échantillonnage des articles. Ils considèrent tous les autres articles du mini batchs comme des exemples négatifs.

Pour faire la backpropagation avec leur réseau, les auteurs considèrent deux fonctions de perte:
- BPR: Bayesian Personalized Ranking. c'est une méthode de factorisation matricielle qui utilise la perte de classement par paire. Il compare le score d'un positif et d'un négatif échantillonné point. Ici on compare le score de l'élément positif avec plusieurs éléments échantillonnés et utilisons leur moyenne comme la perte. Ainsi on compare le score de l'élément positif avec celui des négatifs. la formule de cette perte est donnée ci-dessous:
    $$ L_s^{BPR} = \frac{-1}{N_s} \sum_{j=1}^{N_s}{log(\sigma(\hat r_{s,i} - \hat r_{s,j}))} $$
où {N_s} est la taille de l'échantillon, {\hat r_{s,k}} est le score de l'article k, i est le prochain item (celui qu'on cherche à prédire) et j les échantillons négatifs.

- TOP1: cette perte a été conçue par les auteurs pour cette tâche, elle régularise l'approximation du rang relatif de l'article concerné. Cette perte est donnée par la formule suivante:
$$ L_s^{TOP1} = \frac{-1}{N_s} \sum_{j=1}^{N_s}{\sigma(\hat r_{s,j} - \hat r_{s,i}) + \sigma(\hat r_{s,j}^2)} $$

In [8]:
##clonage du github où se trouve l'implémentation de l'article
!git clone https://github.com/hidasib/GRU4Rec

Cloning into 'GRU4Rec'...


In [1]:
## installation des requirements
!pip install theano 
conda install -c conda-forge pygpu



### importation des librairies utiles

In [None]:
import numpy as np
import datetime as dt
import pandas as pd
import torch
import torch.nn as nn
from torch.autograd import Variable
import torch.nn.functional as F
import os
import lib
import time
import torch
from tqdm import tqdm
import json


## Nétoyage des données
Dans cette section il est question de nétoyer les données et de les séparer en train, validation, test. Pour cela Nous lisons le fichier contenant toutes les données. Ce fichier contient trois colonnes:
- l'identifiant de la session;
- le temps;
- les identifiants des items sur lesquels l'utilisateur a cliqué pendant la session.

De prime abord nous retenons les sessions de longueur au moins égale à deux ensuite les items qui ont été sélectionnés au moins 5 fois puis une selection des sessions au moins égales à 2. Pour séparer le dataset en train et en test il faut éviter de couper une session (cf article) et donc la stratégie utilisée est de mettre dans le fichier train les sessions qui ont été effectuées jusqu'à un moment donné (tmax-86400) et le reste dans le dataset test. La même stratégie est appliquée sur le dataset train pour le séparer en train et en validation.

In [35]:
### liens pour les données
##le premier permet de récupérer les données à nétoyer et le second lien va contenir les données nétoyées
PATH_TO_ORIGINAL_DATA = 'C:/Users/mbial/OneDrive/Bureau/2020_2021/2020-2021/ENSAE/projet/datasets/RSC15/'
PATH_TO_PROCESSED_DATA = 'C:/Users/mbial/OneDrive/Bureau/2020_2021/2020-2021/ENSAE/projet/datasets/RSC15/'

In [5]:
##reading of the entire dataset
data = pd.read_csv('yoochoose-clicks.dat', sep=',', header=None, usecols=[0,1,2], dtype={0:np.int32, 1:str, 2:np.int64})

In [7]:
data.columns = ['SessionId', 'TimeStr', 'ItemId']

In [8]:
data['Time'] = data.TimeStr.apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%dT%H:%M:%S.%fZ').timestamp()) #This is not UTC. It does not really matter.
del(data['TimeStr']) ##delete the column TimeStr

In [14]:
data.shape ##Check the shape of data

(33003944, 3)

In [11]:
data.head()

Unnamed: 0,SessionId,ItemId,Time
0,1,214536502,1396861000.0
1,1,214536500,1396861000.0
2,1,214536506,1396861000.0
3,1,214577561,1396861000.0
4,2,214662742,1396872000.0


In [13]:
session_lengths = data.groupby('SessionId').size() #size of each session
print(session_lengths)

SessionId
1           4
2           6
3           3
4           2
6           2
           ..
11562156    2
11562157    2
11562158    3
11562159    1
11562161    1
Length: 9249729, dtype: int64


In [15]:
### Only keep session with length >1
data = data[np.in1d(data.SessionId, session_lengths[session_lengths>1].index)]
data.shape #check the new shape of data

(31744233, 3)

In [16]:
item_supports = data.groupby('ItemId').size() #number of time each item was selected
item_supports

ItemId
214507224     32
214507226     13
214507228      1
214507239      6
214507256      1
              ..
1178835219     1
1178835247     1
1178835585     1
1178835641     1
1178837797    12
Length: 52069, dtype: int64

In [17]:
### Only keep items selected at least 5 times
data = data[np.in1d(data.ItemId, item_supports[item_supports>=5].index)]

In [18]:
data.shape #check new data shape

(31713448, 3)

In [20]:
#check session length
session_lengths = data.groupby('SessionId').size()
np.min(session_lengths)

1

In [21]:
### Only keep session with length >1
data = data[np.in1d(data.SessionId, session_lengths[session_lengths>=2].index)]

In [22]:
tmax = data.Time.max() #check the max time of sessions
tmax

1412038799.43

In [23]:
session_max_times = data.groupby('SessionId').Time.max()
session_max_times

SessionId
1           1.396861e+09
2           1.396872e+09
3           1.396438e+09
4           1.396866e+09
6           1.396797e+09
                ...     
11562152    1.411718e+09
11562153    1.411571e+09
11562156    1.411700e+09
11562157    1.411641e+09
11562158    1.411701e+09
Name: Time, Length: 7981581, dtype: float64

In [26]:
session_train = session_max_times[session_max_times < tmax-86400].index ##selected index for train data
session_test = session_max_times[session_max_times >= tmax-86400].index ##selected index for test data

In [33]:
train = data[np.in1d(data.SessionId, session_train)] ##train data
test = data[np.in1d(data.SessionId, session_test)]
test = test[np.in1d(test.ItemId, train.ItemId)] ##test data

In [28]:
test

Unnamed: 0,SessionId,ItemId,Time
32186847,11265009,214586805,1.411997e+09
32186848,11265009,214509260,1.411997e+09
32186868,11265017,214857547,1.412011e+09
32186869,11265017,214857268,1.412011e+09
32186870,11265017,214857260,1.412011e+09
...,...,...,...
33003915,11299816,214859859,1.412014e+09
33003916,11299816,214859859,1.412014e+09
33003917,11299816,214859859,1.412014e+09
33003918,11299816,214746399,1.412015e+09


In [29]:
train

Unnamed: 0,SessionId,ItemId,Time
0,1,214536502,1.396861e+09
1,1,214536500,1.396861e+09
2,1,214536506,1.396861e+09
3,1,214577561,1.396861e+09
4,2,214662742,1.396872e+09
...,...,...,...
33003939,11299809,214819412,1.411630e+09
33003940,11299809,214830939,1.411631e+09
33003941,11299811,214854855,1.411578e+09
33003942,11299811,214854838,1.411578e+09


In [34]:
tslength = test.groupby('SessionId').size()
print(np.min(tslength))
test = test[np.in1d(test.SessionId, tslength[tslength>=2].index)]

2


In [36]:
##save train data and test data
print('Full train set\n\tEvents: {}\n\tSessions: {}\n\tItems: {}'.format(len(train), train.SessionId.nunique(), train.ItemId.nunique()))
train.to_csv(PATH_TO_PROCESSED_DATA + 'rsc15_train_full.txt', sep='\t', index=False)
print('Test set\n\tEvents: {}\n\tSessions: {}\n\tItems: {}'.format(len(test), test.SessionId.nunique(), test.ItemId.nunique()))
test.to_csv(PATH_TO_PROCESSED_DATA + 'rsc15_test.txt', sep='\t', index=False)


Full train set
	Events: 31637239
	Sessions: 7966257
	Items: 37483
Test set
	Events: 71222
	Sessions: 15324
	Items: 6751


In [37]:
##same strategy for the selection of validation data
##split the previous train data on train data and validation data
##save dataframe
tmax = train.Time.max()
session_max_times = train.groupby('SessionId').Time.max()
session_train = session_max_times[session_max_times < tmax-86400].index
session_valid = session_max_times[session_max_times >= tmax-86400].index
train_tr = train[np.in1d(train.SessionId, session_train)]
valid = train[np.in1d(train.SessionId, session_valid)]
valid = valid[np.in1d(valid.ItemId, train_tr.ItemId)]
tslength = valid.groupby('SessionId').size()
valid = valid[np.in1d(valid.SessionId, tslength[tslength>=2].index)]
print('Train set\n\tEvents: {}\n\tSessions: {}\n\tItems: {}'.format(len(train_tr), train_tr.SessionId.nunique(), train_tr.ItemId.nunique()))
train_tr.to_csv(PATH_TO_PROCESSED_DATA + 'rsc15_train_tr.txt', sep='\t', index=False)
print('Validation set\n\tEvents: {}\n\tSessions: {}\n\tItems: {}'.format(len(valid), valid.SessionId.nunique(), valid.ItemId.nunique()))
valid.to_csv(PATH_TO_PROCESSED_DATA + 'rsc15_train_valid.txt', sep='\t', index=False)

Train set
	Events: 31579006
	Sessions: 7953885
	Items: 37483
Validation set
	Events: 58233
	Sessions: 12372
	Items: 6359


### Expérimentations

#### prise en main des données

In [124]:
class Dataset(object):
    """
    Read data,
    create indices for items ID
    created sorted or no indices for SessionID
    
    arguments:
    ----------
    path: The link to find data,
    session_key: The name of session ID on dataset
    item_key: The name of Item ID on dataset
    time_key: The name of Time on dataset
    n_sample : Number of sample should be used on dataset if positive
    itemmap : list of item indices (none per default)
    time_sort: wheter the session ID index should be sorted by time or not
    """
    def __init__(self, path, sep='\t', session_key='SessionId', item_key='ItemId', time_key='Time', n_sample=-1, itemmap=None, itemstamp=None, time_sort=False):
        
        self.df = pd.read_csv(path, sep=sep, dtype={session_key: int, item_key: int, time_key: float}) #rezd csv
        self.session_key = session_key
        self.item_key = item_key
        self.time_key = time_key
        self.time_sort = time_sort
        if n_sample > 0:
            self.df = self.df[:n_sample]

        
        self.add_item_indices(itemmap=itemmap) # Add colummn item index to data
        """
        Sort the df by time, and then by session ID. That is, df is sorted by session ID and
        clicks within a session are next to each other, where the clicks within a session are time-ordered.
        """
        self.df.sort_values([session_key, time_key], inplace=True)
        self.click_offsets = self.get_click_offset()
        self.session_idx_arr = self.order_session_idx()

    def add_item_indices(self, itemmap=None):
        """
        Add item index column named "item_idx" to the df
        Args:
            itemmap (pd.DataFrame): mapping between the item Ids and indices
        """
        if itemmap is None:
            item_ids = self.df[self.item_key].unique()  # numpy ND_array with all item_key
            item2idx = pd.Series(data=np.arange(len(item_ids)),
                                 index=item_ids) # Numpy array with index of each item_key
            # Build itemmap is a DataFrame that have 2 columns (self.item_key, 'item_idx)
            itemmap = pd.DataFrame({self.item_key: item_ids,
                                   'item_idx': item2idx[item_ids].values})
        self.itemmap = itemmap
        self.df = pd.merge(self.df, self.itemmap, on=self.item_key, how='inner')

    def get_click_offset(self):
        """return a cumulative sum of sessions' size
        
        """
        offsets = np.zeros(self.df[self.session_key].nunique() + 1, dtype=np.int32)
        offsets[1:] = self.df.groupby(self.session_key).size().cumsum()
        return offsets

    def order_session_idx(self):
        """Return sorted indices by time of session key if mentionned (self.time_sort=True)
        else return indices of session key
        """
        if self.time_sort:
            sessions_start_time = self.df.groupby(self.session_key)[self.time_key].min().values ##minimum time of each session
            session_idx_arr = np.argsort(sessions_start_time) #return indice of session_key that would sort sessions_start_time
        else:
            session_idx_arr = np.arange(self.df[self.session_key].nunique())
        return session_idx_arr

    @property
    def items(self):
        """number of unique items on session
        """
        return self.itemmap[self.item_key].unique()

In [149]:
class DataLoader():
    def __init__(self, dataset, batch_size=50):
        """
        A class for creating session-parallel mini-batches.

        Args:
             dataset (SessionDataset): the session dataset to generate the batches from
             batch_size (int): size of the batch
        """
        self.dataset = dataset
        self.batch_size = batch_size

    def __iter__(self):
        """ Returns the iterator for producing session-parallel training mini-batches.

        Yields:
            input (B,): torch.FloatTensor. Item indices that will be encoded as one-hot vectors later.
            target (B,): a Variable that stores the target item indices
            masks: Numpy array indicating the positions of the sessions to be terminated
        """
        # initializations
        df = self.dataset.df ##dataframe whith sessions, items and times
        click_offsets = self.dataset.click_offsets #cumulative number of sessions clicks
        session_idx_arr = self.dataset.session_idx_arr ##indices array of each session

        iters = np.arange(self.batch_size) #iterations
        maxiter = iters.max() ##maximum number of iterations
        start = click_offsets[session_idx_arr[iters]] #The begin of the session in same batch on the dataset
        end = click_offsets[session_idx_arr[iters] + 1] #The begin of the session out of the batch
        mask = []  # indicator for the sessions to be terminated
        finished = False

        while not finished:
            minlen = (end - start).min()
            # Item indices(for embedding) for clicks where the first sessions start
            idx_target = df.item_idx.values[start]

            for i in range(minlen - 1):
                # Build inputs & targets
                idx_input = idx_target
                idx_target = df.item_idx.values[start + i + 1]
                inputs = torch.LongTensor(idx_input)
                target = torch.LongTensor(idx_target)
                yield inputs, target, mask

            # click indices where a particular session meets second-to-last element
            start = start + (minlen - 1)
            # see if how many sessions should terminate
            mask = np.arange(len(iters))[(end - start) <= 1]
            for idx in mask:
                maxiter += 1
                if maxiter >= len(click_offsets) - 1:
                    finished = True
                    break
                # update the next starting/ending point
                iters[idx] = maxiter
                start[idx] = click_offsets[session_idx_arr[maxiter]]
                end[idx] = click_offsets[session_idx_arr[maxiter] + 1]

#### Fonctions de pertes

In [213]:

class LossFunction(nn.Module):
    def __init__(self, loss_type='TOP1', use_cuda=False):
        """ An abstract loss function that can supports custom loss functions compatible with PyTorch."""
        super(LossFunction, self).__init__()
        self.loss_type = loss_type
        self.use_cuda = use_cuda
        if loss_type == 'CrossEntropy':
            self._loss_fn = SampledCrossEntropyLoss(use_cuda)
        elif loss_type == 'TOP1':
            self._loss_fn = TOP1Loss()
        elif loss_type == 'BPR':
            self._loss_fn = BPRLoss()
        elif loss_type == 'TOP1-max':
            self._loss_fn = TOP1_max()
        elif loss_type == 'BPR-max':
            self._loss_fn = BPR_max()
        else:
            raise NotImplementedError

    def forward(self, logit):
        return self._loss_fn(logit)


class SampledCrossEntropyLoss(nn.Module):
    """ CrossEntropyLoss with n_classes = batch_size = the number of samples in the session-parallel mini-batch """
    def __init__(self, use_cuda):
        """
        Args:
             use_cuda (bool): whether to use cuda or not
        """
        super(SampledCrossEntropyLoss, self).__init__()
        self.xe_loss = nn.CrossEntropyLoss()
        self.use_cuda = use_cuda

    def forward(self, logit):
        batch_size = logit.size(1)
        target = Variable(torch.arange(batch_size).long())
        if self.use_cuda:
            target = target.cuda()

        return self.xe_loss(logit, target)


class BPR(nn.Module):
    def __init__(self):
        super(BPRLoss, self).__init__()

    def forward(self, logit):
        """
        Args:
            logit (BxB): Variable that stores the logits for the items in the mini-batch
                         The first dimension corresponds to the batches, and the second
                         dimension corresponds to sampled number of items to evaluate
        """
        # differences between the item scores
        diff = logit.diag().view(-1, 1).expand_as(logit) - logit
        # final loss
        loss = -torch.mean(F.logsigmoid(diff))
        return loss


class BPR_max(nn.Module):
    def __init__(self):
        super(BPR_max, self).__init__()
    def forward(self, logit):
        logit_softmax = F.softmax(logit, dim=1)
        diff = logit.diag().view(-1, 1).expand_as(logit) - logit
        loss = -torch.log(torch.mean(logit_softmax * torch.sigmoid(diff)))
        return loss


class TOP1(nn.Module):
    def __init__(self):
        super(TOP1Loss, self).__init__()
    def forward(self, logit):
        """
        Args:
            logit (BxB): Variable that stores the logits for the items in the mini-batch
                         The first dimension corresponds to the batches, and the second
                         dimension corresponds to sampled number of items to evaluate
        """
        diff = -(logit.diag().view(-1, 1).expand_as(logit) - logit)
        loss = torch.sigmoid(diff).mean() + torch.sigmoid(logit ** 2).mean()
        return loss


class TOP1_max(nn.Module):
    def __init__(self):
        super(TOP1_max, self).__init__()

    def forward(self, logit):
        logit_softmax = F.softmax(logit, dim=1)
        diff = -(logit.diag().view(-1, 1).expand_as(logit) - logit)
        loss = torch.mean(logit_softmax * (torch.sigmoid(diff) + torch.sigmoid(logit ** 2)))
        return loss


#### L'optimiseur

In [158]:
import torch.optim as optim


class Optimizer:
    def __init__(self, params, optimizer_type='Adagrad', lr=.05,
                 momentum=0, weight_decay=0, eps=1e-6):
        '''
        An abstract optimizer class for handling various kinds of optimizers.
        You can specify the optimizer type and related parameters as you want.
        Usage is exactly the same as an instance of torch.optim

        Args:
            params: torch.nn.Parameter. The NN parameters to optimize
            optimizer_type: type of the optimizer to use
            lr: learning rate
            momentum: momentum, if needed
            weight_decay: weight decay, if needed. Equivalent to L2 regulariztion.
            eps: eps parameter, if needed.
        '''
        if optimizer_type == 'RMSProp':
            self.optimizer = optim.RMSprop(params, lr=lr, eps=eps, weight_decay=weight_decay, momentum=momentum)
        elif optimizer_type == 'Adagrad':
            self.optimizer = optim.Adagrad(params, lr=lr, weight_decay=weight_decay)
        elif optimizer_type == 'Adadelta':
            self.optimizer = optim.Adadelta(params, lr=lr, eps=eps, weight_decay=weight_decay)
        elif optimizer_type == 'Adam':
            self.optimizer = optim.Adam(params, lr=lr, eps=eps, weight_decay=weight_decay)
        elif optimizer_type == 'SparseAdam':
            self.optimizer = optim.SparseAdam(params, lr=lr, eps=eps)
        elif optimizer_type == 'SGD':
            self.optimizer = optim.SGD(params, lr=lr, momentum=momentum, weight_decay=weight_decay)
        else:
            raise NotImplementedError

    def zero_grad(self):
        self.optimizer.zero_grad()

    def step(self):
        self.optimizer.step()

### Le modèle

In [160]:
from torch import nn
import torch

class GRU4REC(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1, final_act='tanh',
                 dropout_hidden=.5, dropout_input=0, batch_size=50, embedding_dim=-1, use_cuda=False):
        super(GRU4REC, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.num_layers = num_layers
        self.dropout_hidden = dropout_hidden
        self.dropout_input = dropout_input
        self.embedding_dim = embedding_dim
        self.batch_size = batch_size
        self.use_cuda = use_cuda
        self.device = torch.device('cuda' if use_cuda else 'cpu')
        self.onehot_buffer = self.init_emb()
        self.h2o = nn.Linear(hidden_size, output_size)
        self.create_final_activation(final_act)
        if self.embedding_dim != -1:
            self.look_up = nn.Embedding(input_size, self.embedding_dim)
            self.gru = nn.GRU(self.embedding_dim, self.hidden_size, self.num_layers, dropout=self.dropout_hidden)
        else:
            self.gru = nn.GRU(self.input_size, self.hidden_size, self.num_layers, dropout=self.dropout_hidden)
        self = self.to(self.device)

    def create_final_activation(self, final_act):
        """Set the final activation function of the network
        """
        if final_act == 'tanh':
            self.final_activation = nn.Tanh()
        elif final_act == 'relu':
            self.final_activation = nn.ReLU()
        elif final_act == 'softmax':
            self.final_activation = nn.Softmax()
        elif final_act == 'softmax_logit':
            self.final_activation = nn.LogSoftmax()
        elif final_act.startswith('elu-'):
            self.final_activation = nn.ELU(alpha=float(final_act.split('-')[1]))
        elif final_act.startswith('leaky-'):
            self.final_activation = nn.LeakyReLU(negative_slope=float(final_act.split('-')[1]))

    def forward(self, input, hidden):
        '''
        Args:
            input (B,): a batch of item indices from a session-parallel mini-batch.
            target (B,): torch.LongTensor of next item indices from a session-parallel mini-batch.

        Returns:
            logit (B,C): Variable that stores the logits for the next items in the session-parallel mini-batch
            hidden: GRU hidden state
        '''

        if self.embedding_dim == -1:
            embedded = self.onehot_encode(input)
            if self.training and self.dropout_input > 0: embedded = self.embedding_dropout(embedded)
            embedded = embedded.unsqueeze(0)
        else:
            embedded = input.unsqueeze(0)
            embedded = self.look_up(embedded)

        output, hidden = self.gru(embedded, hidden) #(num_layer, B, H)
        output = output.view(-1, output.size(-1))  #(B,H)
        logit = self.final_activation(self.h2o(output))

        return logit, hidden

    def init_emb(self):
        '''
        Initialize the one_hot embedding buffer, which will be used for producing the one-hot embeddings efficiently
        '''
        onehot_buffer = torch.FloatTensor(self.batch_size, self.output_size)
        onehot_buffer = onehot_buffer.to(self.device)
        return onehot_buffer

    def onehot_encode(self, input):
        """
        Returns a one-hot vector corresponding to the input
        Args:
            input (B,): torch.LongTensor of item indices
            buffer (B,output_size): buffer that stores the one-hot vector
        Returns:
            one_hot (B,C): torch.FloatTensor of one-hot vectors
        """
        self.onehot_buffer.zero_()
        index = input.view(-1, 1)
        one_hot = self.onehot_buffer.scatter_(1, index, 1)
        return one_hot

    def embedding_dropout(self, input):
        p_drop = torch.Tensor(input.size(0), 1).fill_(1 - self.dropout_input)
        mask = torch.bernoulli(p_drop).expand_as(input) / (1 - self.dropout_input)
        mask = mask.to(self.device)
        input = input * mask
        return input

    def init_hidden(self):
        '''
        Initialize the hidden state of the GRU
        '''
        try:
            h0 = torch.zeros(self.num_layers, self.batch_size, self.hidden_size).to(self.device)
        except:
            self.device = 'cpu'
            h0 = torch.zeros(self.num_layers, self.batch_size, self.hidden_size).to(self.device)
        return h0

#### Calcul des métriques de performance

In [210]:
def get_recall(indices, targets): #recall --> wether next item in session is within top K=20 recommended items or not
    """
    Calculates the recall score for the given predictions and targets
    Args:
        indices (Bxk): torch.LongTensor. top-k indices predicted by the model.
        targets (B): torch.LongTensor. actual target indices.
    Returns:
        recall (float): the recall score
    """
    targets = targets.view(-1, 1).expand_as(indices)
    hits = (targets == indices).nonzero()
    if len(hits) == 0:
        return 0
    n_hits = (targets == indices).nonzero()[:, :-1].size(0)
    recall = float(n_hits) / targets.size(0)
    return recall


def get_mrr(indices, targets): #Mean Receiprocal Rank --> Average of rank of next item in the session.
    """
    Calculates the MRR score for the given predictions and targets
    Args:
        indices (Bxk): torch.LongTensor. top-k indices predicted by the model.
        targets (B): torch.LongTensor. actual target indices.
    Returns:
        mrr (float): the mrr score
    """
    tmp = targets.view(-1, 1)
    targets = tmp.expand_as(indices)
    hits = (targets == indices).nonzero()
    ranks = hits[:, -1] + 1
    ranks = ranks.float()
    rranks = torch.reciprocal(ranks)
    mrr = torch.sum(rranks).data / targets.size(0)
    return mrr.item()


def evaluate(indices, targets, k=20):
    """
    Evaluates the model using Recall@K, MRR@K scores.

    Args:
        logits (B,C): torch.LongTensor. The predicted logit for the next items.
        targets (B): torch.LongTensor. actual target indices.

    Returns:
        recall (float): the recall score
        mrr (float): the mrr score
    """
    _, indices = torch.topk(indices, k, -1)
    recall = get_recall(indices, targets)
    mrr = get_mrr(indices, targets)
    return recall, mrr

#### fonction d'entrainement et d'évaluation

In [217]:
class Evaluation(object):
    def __init__(self, model, loss_func, use_cuda, k=20):
        self.model = model
        self.loss_func = loss_func
        self.topk = k
        self.device = torch.device('cuda' if use_cuda else 'cpu')

    def eval(self, eval_data, batch_size):
        self.model.eval()
        losses = []
        recalls = []
        mrrs = []
        dataloader = DataLoader(eval_data, batch_size)
        with torch.no_grad():
            hidden = self.model.init_hidden()
            for ii, (inputs, target, mask) in tqdm(enumerate(dataloader), total=len(dataloader.dataset.df) // dataloader.batch_size, miniters = 1000):
            #for input, target, mask in dataloader:
                inputs = inputs.to(self.device)
                target = target.to(self.device)
                logit, hidden = self.model(inputs, hidden)
                logit_sampled = logit[:, target.view(-1)]
                loss = self.loss_func(logit_sampled)
                recall, mrr = evaluate(logit, target, k=self.topk)

                # torch.Tensor.item() to get a Python number from a tensor containing a single value
                losses.append(loss.item())
                recalls.append(recall)
                mrrs.append(mrr)
        mean_losses = np.mean(losses)
        mean_recall = np.mean(recalls)
        mean_mrr = np.mean(mrrs)

        return mean_losses, mean_recall,mean_mrr

In [218]:
class Trainer(object):
    def __init__(self, model, train_data, eval_data, optim, use_cuda, loss_func, batch_size,k_eval):
        self.model = model
        self.train_data = train_data
        self.eval_data = eval_data
        self.optim = optim
        self.loss_func = loss_func
        self.evaluation = Evaluation(self.model, self.loss_func, use_cuda, k = k_eval)
        self.device = torch.device('cuda' if use_cuda else 'cpu')
        self.batch_size = batch_size
        #self.args = args

    def train(self, start_epoch, end_epoch, start_time=None):
        results=[]
        if start_time is None:
            self.start_time = time.time()
        else:
            self.start_time = start_time

        for epoch in range(start_epoch, end_epoch + 1):
            st = time.time()
            print('Start Epoch #', epoch)
            train_loss = self.train_epoch(epoch)
            loss, recall, mrr = self.evaluation.eval(self.eval_data, self.batch_size)
            #loss, recall,mrr= self.evaluation.eval(self.eval_data, self.batch_size)
            


            print("Epoch: {}, train loss: {:.4f}, loss: {:.4f}, recall: {:.4f}, mrr: {:.4f}, time: {}".format(epoch, train_loss, loss, recall, mrr, time.time() - st))
            results.append([train_loss,loss,recall,mrr])
        return results
            #checkpoint = {
                #'model': self.model,
                #'args': self.args,
                #'epoch': epoch,
                #'optim': self.optim,
                #'loss': loss,
                #'recall': recall,
                #'mrr': mrr
            #}
            #model_name = os.path.join(self.args.checkpoint_dir, "model_{0:05d}.pt".format(epoch))
            #torch.save(checkpoint, model_name)
            #print("Save model as %s" % model_name)


    def train_epoch(self, epoch):
        self.model.train()
        losses = []

        def reset_hidden(hidden, mask):
            """Helper function that resets hidden state when some sessions terminate"""
            if len(mask) != 0:
                hidden[:, mask, :] = 0
            return hidden

        hidden = self.model.init_hidden()
        dataloader = DataLoader(self.train_data, self.batch_size)
        #for ii,(data,label) in tqdm(enumerate(train_dataloader),total=len(train_data)):
        for ii, (inputs, target, mask) in tqdm(enumerate(dataloader), total=len(dataloader.dataset.df) // dataloader.batch_size, miniters = 1000):
            inputs = inputs.to(self.device)
            target = target.to(self.device)
            self.optim.zero_grad()
            hidden = reset_hidden(hidden, mask).detach()
            logit, hidden = self.model(inputs, hidden)
            # output sampling
            logit_sampled = logit[:, target.view(-1)]
            loss = self.loss_func(logit_sampled)
            losses.append(loss.item())
            loss.backward()
            self.optim.step()

        mean_losses = np.mean(losses)
        return mean_losses

#### Entrainements

In [171]:
##reading of training data and evaluation data
train_data = Dataset('rsc15_train_tr.txt')
valid_data = Dataset('rsc15_train_valid.txt', itemmap=train_data.itemmap)

In [194]:
#initialisation of hyper parameters
input_size = len(train_data.items)
hidden_size = 100
num_layers = 3
output_size = input_size
batch_size = 32
dropout_input = 0
dropout_hidden = 0.5
embedding_dim = -1
final_act = 'tanh'
loss_type = 'TOP1-max'
optimizer_type = 'Adagrad'
lr = 0.01
weight_decay = 0
momentum = 0
eps = 1e-6
n_epochs = 2
time_sort = False
cuda = torch.cuda.is_available()
is_eval = False
k_eval=20

In [195]:
#initialisation of loss function
loss_function = LossFunction(loss_type=loss_type, use_cuda=cuda) 

First training with TOP1-max loss function on 2 epochs

In [198]:
if not is_eval: #training
        #Initialize the model
        model = GRU4REC(input_size, hidden_size, output_size, final_act=final_act,
                            num_layers=num_layers, use_cuda=cuda, batch_size=batch_size,
                            dropout_input=dropout_input, dropout_hidden=dropout_hidden, embedding_dim=embedding_dim).to('cuda')
        #weights initialization
        #init_model(model)
        #optimizer
        optimizer = Optimizer(model.parameters(), optimizer_type=optimizer_type, lr=lr,
                                  weight_decay=weight_decay, momentum=momentum, eps=eps)
        #trainer class
        trainer = Trainer(model, train_data=train_data, eval_data=valid_data, optim=optimizer,
                              use_cuda=cuda, loss_func=loss_function, batch_size=batch_size,k_eval=k_eval)
        print('#### START TRAINING....')
        result1 = trainer.train(0, n_epochs - 1)

  0%|                                                                                       | 0/986843 [00:00<?, ?it/s]

#### START TRAINING....
Start Epoch # 0


 75%|█████████████████████████████████████████████████████                  | 738282/986843 [1:38:55<33:18, 124.38it/s]
 79%|████████████████████████████████████████████████████████████▌                | 1430/1819 [00:03<00:00, 443.79it/s]
  0%|                                                                                       | 0/986843 [00:00<?, ?it/s]

Epoch: 0, train loss: 0.0281, loss: 0.0288, recall: 0.3393, mrr: 1.0000, time: 5939.15017080307
Start Epoch # 1


 75%|█████████████████████████████████████████████████████                  | 738282/986843 [1:39:32<33:30, 123.61it/s]
 79%|████████████████████████████████████████████████████████████▌                | 1430/1819 [00:03<00:00, 441.12it/s]

Epoch: 1, train loss: 0.0277, loss: 0.0287, recall: 0.3962, mrr: 1.0000, time: 5976.10270524025





Nous faisons les première expériences sur toutes les fonctions de perte (BPR, BPR-max, TOP1, TOP1-max, Cross entropy

In [219]:
loss_type = 'BPR-max'

In [None]:
n_epochs = 3
if not is_eval: #training
        #Initialize the model
        model = GRU4REC(input_size, hidden_size, output_size, final_act=final_act,
                            num_layers=num_layers, use_cuda=cuda, batch_size=batch_size,
                            dropout_input=dropout_input, dropout_hidden=dropout_hidden, embedding_dim=embedding_dim).to('cuda')
        #weights initialization
        #init_model(model)
        #optimizer
        optimizer = Optimizer(model.parameters(), optimizer_type=optimizer_type, lr=lr,
                                  weight_decay=weight_decay, momentum=momentum, eps=eps)
        #trainer class
        trainer = Trainer(model, train_data=train_data, eval_data=valid_data, optim=optimizer,
                              use_cuda=cuda, loss_func=loss_function, batch_size=batch_size,k_eval=k_eval)
        print('#### START TRAINING....')
        result1 = trainer.train(0, n_epochs - 1)

  0%|                                                                                       | 0/986843 [00:00<?, ?it/s]

#### START TRAINING....
Start Epoch # 0


 75%|█████████████████████████████████████████████████████▊                  | 738282/986843 [2:19:21<46:55, 88.29it/s]
 79%|████████████████████████████████████████████████████████████▌                | 1430/1819 [00:04<00:01, 311.79it/s]
  0%|                                                                                       | 0/986843 [00:00<?, ?it/s]

Epoch: 0, train loss: 0.0281, loss: 0.0289, recall: 0.3100, mrr: 0.0813, time: 8366.553636550903
Start Epoch # 1


 51%|███████████████████████████████████                                  | 502167/986843 [1:15:04<1:10:24, 114.72it/s]

In [None]:

with open("result1.txt", "w") as fp:
    json.dump(result1, fp)

In [None]:
loss_type = 'BPR'

In [None]:
n_epochs = 3
if not is_eval: #training
        #Initialize the model
        model2 = GRU4REC(input_size, hidden_size, output_size, final_act=final_act,
                            num_layers=num_layers, use_cuda=cuda, batch_size=batch_size,
                            dropout_input=dropout_input, dropout_hidden=dropout_hidden, embedding_dim=embedding_dim).to('cuda')
        #weights initialization
        #init_model(model)
        #optimizer
        optimizer = Optimizer(model2.parameters(), optimizer_type=optimizer_type, lr=lr,
                                  weight_decay=weight_decay, momentum=momentum, eps=eps)
        #trainer class
        trainer = Trainer(model2, train_data=train_data, eval_data=valid_data, optim=optimizer,
                              use_cuda=cuda, loss_func=loss_function, batch_size=batch_size,k_eval=k_eval)
        print('#### START TRAINING....')
        result2 = trainer.train(0, n_epochs - 1)

In [None]:

with open("result2.txt", "w") as fp:
    json.dump(result2, fp)

In [None]:
loss_type = 'TOP1-max'

In [None]:
n_epochs = 3
if not is_eval: #training
        #Initialize the model
        model3 = GRU4REC(input_size, hidden_size, output_size, final_act=final_act,
                            num_layers=num_layers, use_cuda=cuda, batch_size=batch_size,
                            dropout_input=dropout_input, dropout_hidden=dropout_hidden, embedding_dim=embedding_dim).to('cuda')
        #weights initialization
        #init_model(model)
        #optimizer
        optimizer = Optimizer(model3.parameters(), optimizer_type=optimizer_type, lr=lr,
                                  weight_decay=weight_decay, momentum=momentum, eps=eps)
        #trainer class
        trainer = Trainer(model3, train_data=train_data, eval_data=valid_data, optim=optimizer,
                              use_cuda=cuda, loss_func=loss_function, batch_size=batch_size,k_eval=k_eval)
        print('#### START TRAINING....')
        result3 = trainer.train(0, n_epochs - 1)

In [None]:
with open("result3.txt", "w") as fp:
    json.dump(result3, fp)

In [None]:
loss_type = 'TOP1'

In [None]:
n_epochs = 3
if not is_eval: #training
        #Initialize the model
        model4 = GRU4REC(input_size, hidden_size, output_size, final_act=final_act,
                            num_layers=num_layers, use_cuda=cuda, batch_size=batch_size,
                            dropout_input=dropout_input, dropout_hidden=dropout_hidden, embedding_dim=embedding_dim).to('cuda')
        #weights initialization
        #init_model(model)
        #optimizer
        optimizer = Optimizer(model4.parameters(), optimizer_type=optimizer_type, lr=lr,
                                  weight_decay=weight_decay, momentum=momentum, eps=eps)
        #trainer class
        trainer = Trainer(model4, train_data=train_data, eval_data=valid_data, optim=optimizer,
                              use_cuda=cuda, loss_func=loss_function, batch_size=batch_size,k_eval=k_eval)
        print('#### START TRAINING....')
        result4 = trainer.train(0, n_epochs - 1)

In [None]:
with open("result4.txt", "w") as fp:
    json.dump(result4, fp)

In [None]:
loss_type = 'SampledCrossEntropyLoss'

In [None]:
n_epochs = 3
if not is_eval: #training
        #Initialize the model
        model5 = GRU4REC(input_size, hidden_size, output_size, final_act=final_act,
                            num_layers=num_layers, use_cuda=cuda, batch_size=batch_size,
                            dropout_input=dropout_input, dropout_hidden=dropout_hidden, embedding_dim=embedding_dim).to('cuda')
        #weights initialization
        #init_model(model)
        #optimizer
        optimizer = Optimizer(model5.parameters(), optimizer_type=optimizer_type, lr=lr,
                                  weight_decay=weight_decay, momentum=momentum, eps=eps)
        #trainer class
        trainer = Trainer(model5, train_data=train_data, eval_data=valid_data, optim=optimizer,
                              use_cuda=cuda, loss_func=loss_function, batch_size=batch_size,k_eval=k_eval)
        print('#### START TRAINING....')
        result5 = trainer.train(0, n_epochs - 1)

In [None]:
with open("result5.txt", "w") as fp:
    json.dump(result5, fp)

In [199]:
result1

[[0.0280873653445166, 0.0287911825585407, 0.3392701048951049, 1],
 [0.027731535416971355, 0.02869272900654094, 0.39615384615384613, 1]]

In [None]:
def __init__(self, model, loss_func, use_cuda, k=20):

In [212]:
Evaluation(model, loss_function, cuda,  k_eval).eval(valid_data, batch_size)

 79%|████████████████████████████████████████████████████████████▌                | 1430/1819 [00:03<00:00, 454.21it/s]


(0.02869272900654094, 0.39615384615384613, 0.11811459891862802)

In [None]:
.detach().numpy()

In [None]:
if not is_eval: #training
        #Initialize the model
        model = GRU4REC(input_size, hidden_size, output_size, final_act=final_act,
                            num_layers=num_layers, use_cuda=cuda, batch_size=batch_size,
                            dropout_input=dropout_input, dropout_hidden=dropout_hidden, embedding_dim=embedding_dim)
        #weights initialization
        #init_model(model)
        #optimizer
        optimizer = Optimizer(model.parameters(), optimizer_type=optimizer_type, lr=lr,
                                  weight_decay=weight_decay, momentum=momentum, eps=eps)
        #trainer class
        trainer = Trainer(model, train_data=train_data, eval_data=valid_data, optim=optimizer,
                              use_cuda=cuda, loss_func=loss_function, batch_size=batch_size)
        print('#### START TRAINING....')
        trainer.train(0, n_epochs - 1)
    else: #testi
        
        ng
        if args.load_model is not None:
            print("Loading pre-trained model from {}".format(args.load_model))
            try:
                checkpoint = torch.load(args.load_model)
            except:
                checkpoint = torch.load(args.load_model, map_location=lambda storage, loc: storage)
            model = checkpoint["model"]
            model.gru.flatten_parameters()
            evaluation = lib.Evaluation(model, loss_function, use_cuda=args.cuda, k = args.k_eval)
            loss, recall, mrr = evaluation.eval(valid_data, batch_size)
            print("Final result: recall = {:.2f}, mrr = {:.2f}".format(recall, mrr))
        else:
            print("No Pretrained Model was found!")

In [6]:
cd GRU4Rec

C:\Users\mbial\GRU4Rec


In [4]:
!python GRU4Rec/run.py -h

usage: run.py [-h] [-ps PARAM_STRING] [-pf PARAM_PATH] [-l] [-s MODEL_PATH]
              [-t TEST_PATH [TEST_PATH ...]] [-m AT [AT ...]] [-e EVAL_TYPE]
              [-ss SS] [--sample_store_on_cpu]
              [--test_against_items N_TEST_ITEMS]
              PATH

Train or load a GRU4Rec model & measure recall and MRR on the specified test
set(s).

positional arguments:
  PATH                  Path to the training data (TAB separated file (.tsv or
                        .txt) or pickled pandas.DataFrame object (.pickle))
                        (if the --load_model parameter is NOT provided) or to
                        the serialized model (if the --load_model parameter is
                        provided).

optional arguments:
  -h, --help            show this help message and exit
  -ps PARAM_STRING, --parameter_string PARAM_STRING
                        Training parameters provided as a single parameter
                        string. The format of the string is `param_name

stty: 'standard input': Inappropriate ioctl for device
stty: 'standard input': Inappropriate ioctl for device
stty: 'standard input': Inappropriate ioctl for device
stty: 'standard input': Inappropriate ioctl for device
stty: 'standard input': Inappropriate ioctl for device
stty: 'standard input': Inappropriate ioctl for device
stty: 'standard input': Inappropriate ioctl for device
stty: 'standard input': Inappropriate ioctl for device
stty: 'standard input': Inappropriate ioctl for device
stty: 'standard input': Inappropriate ioctl for device
stty: 'standard input': Inappropriate ioctl for device
stty: 'standard input': Inappropriate ioctl for device
stty: 'standard input': Inappropriate ioctl for device


In [10]:
train_data = 'C:/Users/mbial/OneDrive/Bureau/2020_2021/2020-2021/ENSAE/projet/datasets/RSC15'

In [11]:
!python GRU4Rec/run.py rsc15_train_tr.txt -t rsc15_train_valid.txt -m 1 5 10 20 -ps loss=bpr-max,final_act=elu-0.5,hidden_act=tanh,layers=100,adapt=adagrad,n_epochs=10,batch_size=32,dropout_p_embed=0.0,dropout_p_hidden=0.0,learning_rate=0.2,momentum=0.3,n_sample=2048,sample_alpha=0.0,bpreg=1.0,constrained_embedding=False

SET   loss                    TO   bpr-max   (type: <class 'str'>)
SET   final_act               TO   elu-0.5   (type: <class 'str'>)
SET   hidden_act              TO   tanh      (type: <class 'str'>)
SET   layers                  TO   [100]     (type: <class 'list'>)
SET   adapt                   TO   adagrad   (type: <class 'str'>)
SET   n_epochs                TO   10        (type: <class 'int'>)
SET   batch_size              TO   32        (type: <class 'int'>)
SET   dropout_p_embed         TO   0.0       (type: <class 'float'>)
SET   dropout_p_hidden        TO   0.0       (type: <class 'float'>)
SET   learning_rate           TO   0.2       (type: <class 'float'>)
SET   momentum                TO   0.3       (type: <class 'float'>)
SET   n_sample                TO   2048      (type: <class 'int'>)
SET   sample_alpha            TO   0.0       (type: <class 'float'>)
SET   bpreg                   TO   1.0       (type: <class 'float'>)
SET   constrained_embedding   TO   False     (typ

stty: 'standard input': Inappropriate ioctl for device
stty: 'standard input': Inappropriate ioctl for device
stty: 'standard input': Inappropriate ioctl for device
stty: 'standard input': Inappropriate ioctl for device
stty: 'standard input': Inappropriate ioctl for device
stty: 'standard input': Inappropriate ioctl for device
stty: 'standard input': Inappropriate ioctl for device
stty: 'standard input': Inappropriate ioctl for device
stty: 'standard input': Inappropriate ioctl for device
stty: 'standard input': Inappropriate ioctl for device
stty: 'standard input': Inappropriate ioctl for device
stty: 'standard input': Inappropriate ioctl for device
ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
  File "c:\users\mbial\anaconda3\lib\site-packages\theano\gpuarray\__init__.py", line 227, in <module>
    use(config.device)
  File "c:\users\mbial\anaconda3\lib\site-packages\theano\gpuarray\__init__.py", line 214, in use
    init_de