<a id="plan"></a>

<div style="font-variant: small-caps; 
      font-weight: normal; 
      font-size: 30px; 
      text-align: center; 
      padding: 15px; 
      margin: 10px;">
  Deep Learning for NLP
  </div> 
  
<div style="font-variant: small-caps; 
      font-weight: normal; 
      font-size: 30px; 
      text-align: center; 
      padding: 15px; 
      margin: 10px;">
  Dashboard
  </div> 

  <div style="font-variant: small-caps; 
      font-weight: normal; 
      font-size: 20px; 
      text-align: center; 
      padding: 15px;">
  Jean-baptiste Aujogue
  </div> 

# [Models](#models)

### Part I

1. [Word Embedding](#embedding)

|  | CBOW | Skipgram|
|------|------|------|
| Word-level Embedding | [1.1](#CBOW) | [1.2](#Skipgram) |

2. [Sentence Classification](#sentence_classification)
    
3. [Language Model](#languageModel)

4. Sequence Labelling


### Part II

5. Auto-Encoder

6. Translator

7. Text classifier




### Part III

8. Abstractive summarization

9. Question answering

10. [Chatbot](#chatbot)


|  | Memoryless | Rule-based memory | Agnostic memory |
|------|------|------|------|
| **Selective** | [10.1.1](#ChatbotsSelectifsSansMemoire) | [10.2.1](#ChatbotsSelectifsAvecMemoireRegles) | [10.3.1](#ChatbotsSelectifsAvecMemoireAgnostique) |
| **Generative** | [10.1.2](#ChatbotsGeneratifsSansMemoire) | [10.2.2](#ChatbotsGeneratifsAvecMemoireRegles) | [10.3.2](#ChatbotsGeneratifsAvecMemoireAgnostique) |

***
# [Modules](#modules)

2.1 [Word encoders](#encodeursDeMots)
- 2.1.1 Recurrent encoder

2.2 [Simple attention modules](#attentionSimple)
- 2.2.1 Additive attention
- 2.2.2 Multi-head additive attention
- 2.2.3 Multi-hop additive attention

2.3 [Hierarchical attention modules](#attentionHierarchique)

2.4 [Decoders](#decodeurs)
- 2.4.1 Selective decoder
- 2.4.2 Generative decoder
- 2.4.3 Generative decoder with attention
- 2.4.4 Generative decoder with language model

# [Miscellaneous](#misc)


M.1 [Noise filter](#FiltreAntiBruit)

# [Utils](#utils)


4.1 [Language](#lang)

4.2 [Attention weights visualization](#attn_viz)

[Bottom of page](#basDePage)

Create the main directory that will contain the library, move into it, and generate a README.txt file with a brief presentation of the library:

In [31]:
%mkdir chatNLP

A subdirectory or file chatNLP already exists.


In [32]:
%cd chatNLP

C:\Users\Jb\Desktop\NLP\chatNLP\chatNLP


In [33]:
%%writefile README.txt


Inspiration for the design of the library:
    
https://github.com/pytorch/fairseq
https://github.com/allenai/allennlp
https://www.dabeaz.com/modulepackage/ModulePackage.pdf

Overwriting README.txt


Turn the current directory into a Python package:

In [34]:
%%writefile __init__.py

#import libNLP.modules
#import libNLP.models

Overwriting __init__.py
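
The `%mkdir` magic above reports an error when the directory already exists, as the cell output shows. As a minimal sketch, the same package skeleton can be created idempotently with the standard library. The helper name `make_package` and its `subpackages` argument are hypothetical, introduced only for this illustration, and the example runs in a temporary directory rather than in the notebook's working directory.

```python
import tempfile
from pathlib import Path

def make_package(root, name, subpackages=()):
    """Create a package directory with an __init__.py, plus optional subpackages."""
    pkg = Path(root) / name
    for directory in [pkg] + [pkg / sub for sub in subpackages]:
        directory.mkdir(parents=True, exist_ok=True)   # no error if it already exists
        (directory / "__init__.py").touch()            # marks the directory as a package
    return pkg

with tempfile.TemporaryDirectory() as tmp:
    pkg = make_package(tmp, "chatNLP", subpackages=("models",))
    make_package(tmp, "chatNLP", subpackages=("models",))  # a second call is harmless
    created = sorted(p.relative_to(tmp).as_posix() for p in pkg.rglob("__init__.py"))
    print(created)
```

Calling the helper twice leaves the directory tree unchanged, which makes the setup cells safe to re-run.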


<a id="models"></a>

***
# Models
***

[Back to table of contents](#plan)

Create the subdirectory _libNLP.models_ containing all the Deep Learning models developed in this library:

In [35]:
%mkdir models

A subdirectory or file models already exists.


In [36]:
%%writefile models/__init__.py


from .Word_Embedding import Word2Vec, Word2VecConnector, Word2VecShell
from .Sentence_Classifier import SentenceClassifier


__all__ = [
    'Word2Vec',
    'Word2VecConnector',
    'Word2VecShell',
    
    'SentenceClassifier',
    
    'LanguageModel',
    
    'Chatbot',
    'CreateBot',
    'BotTrainer']

Overwriting models/__init__.py


<a id="embedding"></a>

# 1 Embedding

[Back to table of contents](#plan)

<a id="CBOW"></a>

### 1.1 Continuous Bag Of Words (CBOW)

[Back to table of contents](#plan)

This embedding method is introduced in \cite{mikolov2013distributed, mikolov2013efficient}. For a given vocabulary, it builds a lookup table $T$ containing one vector per word. Its specificity is that the vectors are learned so that each word can be predicted from its context. The table $T$ is built through a neural network that models the probability of a word $w_t$ given its context $c = w_{t-N}, \, ... \, , w_{t-1}$, $w_{t+1}, \, ... \, , w_{t+N}$. The table $T$, embedded in the model, is optimized by training the model so that the observed word $w_t$ maximizes the likelihood under the distribution $P(. \, | \, c)$ output by the model.

The neural network is described as follows:

![cbow](figs/CBOW.png)

A context $c = w_{t-N}, \, ... \, , w_{t-1}$, $w_{t+1}, \, ... \, , w_{t+N}$ is embedded through a table $T$ into a set of dense vectors (typically of dimension between 50 and 300) $T(w_{t-N}), \, ... \, , T(w_{t-1})$, $T(w_{t+1}), \, ... \, , T(w_{t+N})$. Each vector then goes through an affine transformation, and the resulting vectors are summed into a single vector

\begin{align*}
v_c = \sum _{\substack{i = -N \\ i \neq 0}}^{N} \left( M_i T(w_{t+i}) + b_i \right)
\end{align*}

The vector $v_c$ typically has the same dimension as the word embeddings. A second table $T'$ provides another embedding of the vocabulary: the word $w_{t}$ is mapped to a vector $T'(w_{t})$ by this table, and is proposed at position $t$ with probability

\begin{align*}
P(w_{t} \, | \, c\,) = \frac{\exp\left( T'(w_{t}) \cdot v_c \right) }{\displaystyle \sum _{w \in \mathcal{V}} \exp\left(   T'(w) \cdot v_c 
\right) }
\end{align*}

Here $\cdot$ denotes the dot product between vectors. Optimizing this model adjusts the table $T$ so that word vectors carry enough information to recover a word from its context.
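
The computation of $P(w_t \, | \, c)$ can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the implementation used in this library: the vocabulary, the dimensions and the helper names (`rand_vec`, `dot`, `cbow_probs`) are hypothetical, and all parameters are random and untrained.

```python
import math
import random

random.seed(0)                                   # toy, untrained parameters
vocab = ["the", "cat", "sat", "on", "mat"]       # hypothetical vocabulary V
dim, N = 4, 1                                    # embedding dimension and half-window size
offsets = [i for i in range(-N, N + 1) if i != 0]

def rand_vec():
    return [random.uniform(-1.0, 1.0) for _ in range(dim)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

T       = {w: rand_vec() for w in vocab}         # input table T
T_prime = {w: rand_vec() for w in vocab}         # output table T'
M = {i: [rand_vec() for _ in range(dim)] for i in offsets}   # one matrix M_i per position
b = {i: rand_vec() for i in offsets}                         # one bias b_i per position

def cbow_probs(context):
    """context maps each offset i to the word w_{t+i}; returns P(. | c) as a dict."""
    v_c = [0.0] * dim
    for i, word in context.items():
        affine = [dot(row, T[word]) + bias for row, bias in zip(M[i], b[i])]
        v_c = [acc + a for acc, a in zip(v_c, affine)]
    scores = {w: dot(T_prime[w], v_c) for w in vocab}
    top = max(scores.values())                   # shift scores for numerical stability
    exps = {w: math.exp(s - top) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

probs = cbow_probs({-1: "the", 1: "sat"})        # context around an unknown middle word
print(max(probs, key=probs.get), round(sum(probs.values()), 6))
```

Training would adjust $T$, $T'$, $M_i$ and $b_i$ so that the probability of the word actually observed at position $t$ increases.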

<a id="Skipgram"></a>

### 1.2 Skip-Gram

[Back to table of contents](#plan)

This embedding method is introduced in \cite{mikolov2013distributed, mikolov2013efficient} as the mirror image of Continuous Bag Of Words. Here again, it builds a lookup table $T$ containing one vector per word of a vocabulary. Its specificity is that the vectors are learned not to predict a central word $w$ from a context $c$, as in CBOW, but to predict the context $c$ from the central word $w$. The table $T$ is built through a neural network that models the probability of a context $c = w_{t-N}, \, ... \, , w_{t-1}$, $w_{t+1}, \, ... \, , w_{t+N}$ given a central word $w_t$. The table $T$, embedded in the model, is optimized by training the model so that the observed context $c$ maximizes the likelihood under the distribution $P( . \, | \, w_t)$ output by the model.


One implementation of this model is the following:


![skipgram](figs/skipgram.png)


A current word $w_t$ is embedded through a table $T$ into a dense vector $T(w_t)$ (typically of dimension between 50 and 300). This vector is then transformed into a set of $2N$ vectors

\begin{align*}
\sigma (M_{i} T(w_t) + b_{i}) \qquad \qquad i =-N,\, ...\, , -1, 1, \, ...\, , N
\end{align*}

where $N$ denotes the chosen window size, each vector typically has the same dimension as the word embeddings, and $\sigma$ is a nonlinear function (typically the _Rectified Linear Unit_ $\sigma (x) = \max (0, x)$). A second table $T'$ provides another embedding of the vocabulary, so that each word $w_{t+i}$, mapped to a vector $T'(w_{t+i})$ by this table, is proposed at position $t+i$ with probability

\begin{align*}
P( w_{t+i} \, | \, w_t) = \frac{\exp\left(  T'(w_{t+i}) \cdot \sigma \left( M_i T(w_t) + b_{i}\right) \right) }{\displaystyle \sum _{w \in \mathcal{V}} \exp\left(   T'(w) \cdot \sigma \left( M_i T(w_t) + b_i\right) \right) }
\end{align*}

The probability that a set of words $c = w_{t-N}, \, ... \, , w_{t-1}$, $w_{t+1}, \, ... \, , w_{t+N}$ is the context of a word $w_t$ is then modeled by the product

\begin{align*}
 P( c\, | \, w_t) = \prod _{\substack{i = -N \\ i \neq 0}}^{N} P( w_{t+i}\, | \, w_t)
\end{align*}

This model of the context probability is naive in the sense that context words are assumed to be conditionally independent of one another once the central word is known. This approximation nevertheless makes the optimization much cheaper to compute.
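
To see why the factorization makes the computation cheaper, take the negative logarithm of the product. The training loss for one (word, context) pair then splits into a sum with one term per context position, each being an ordinary softmax cross-entropy that can be evaluated independently of the others. With the same notation, the sum running over the $2N$ context positions $i \neq 0$:

\begin{align*}
- \log P( c\, | \, w_t) = - \sum _{\substack{i = -N \\ i \neq 0}}^{N} \log P( w_{t+i}\, | \, w_t)
\end{align*}

This summed form matches what a negative log-likelihood criterion with summed reduction computes when the model outputs one log-probability distribution per context position.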



Optimizing this model adjusts the table $T$ so that word vectors carry enough information to recover the whole context from this single word. Skip-Gram embeddings typically perform better than CBOW ones, because the table $T$ is subject to more constraints during optimization, and because the vector of a word is learned so as to predict the actual usage of the word, given here by its context.

In [37]:
%%writefile models/Word_Embedding.py

import math
import time
import unicodedata
import re
import random
import copy

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker #, FuncFormatter
#%matplotlib inline

import numpy as np
from sklearn.preprocessing import normalize

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import optim
from torch.autograd import Variable

from chatNLP.utils import Lang


#-------------------------------------------------------------------#
#                       Word Embedding model                        #
#-------------------------------------------------------------------#


class Word2Vec(nn.Module) :
    def __init__(self, lang, T = 100):
        super(Word2Vec, self).__init__()
        self.lang = lang
        if type(T) == int :
            self.embedding = nn.Embedding(lang.n_words, T)  
        else :
            self.embedding = nn.Embedding(T.shape[0], T.shape[1])
            self.embedding.weight = nn.Parameter(torch.FloatTensor(T))
            
        self.output_dim = self.lookupTable().shape[1]
        self.sims = None
        
    def lookupTable(self) :
        return self.embedding.weight.cpu().detach().numpy()
        
    def computeSimilarities(self) :
        T = normalize(self.lookupTable(), norm = 'l2', axis = 1)
        self.sims = np.matmul(T, T.transpose())
        return

    def most_similar(self, word, bound = 10) :
        if word not in self.lang.word2index : return
        if self.sims is None : self.computeSimilarities()
        index = self.lang.word2index[word]
        coefs = self.sims[index]
        indices = coefs.argsort()[-bound -1 :-1]
        output = [(self.lang.index2word[i], coefs[i]) for i in reversed(indices)]
        return output
    
    def wv(self, word) :
        return self.lookupTable()[self.lang.getIndex(word)]
    
    def addWord(self, word, vector = None) :
        self.lang.addWord(word)
        T = self.lookupTable()
        v = np.random.rand(1, T.shape[1]) if vector is None else vector
        updated_T = np.concatenate((T, v), axis = 0)
        self.embedding = nn.Embedding(updated_T.shape[0], updated_T.shape[1])
        self.embedding.weight = nn.Parameter(torch.FloatTensor(updated_T))
        return
    
    def freeze(self) :
        for param in self.embedding.parameters() : param.requires_grad = False
        return self
    
    def unfreeze(self) :
        for param in self.embedding.parameters() : param.requires_grad = True
        return self
    
    def forward(self, words, device = None) :
        '''Transforms a list of n words into a torch.FloatTensor of size (1, n, emb_dim)'''
        indices  = [self.lang.getIndex(w) for w in words]
        indices  = [[i for i in indices if i is not None]]
        variable = Variable(torch.LongTensor(indices)) # size = (1, n)
        if device is not None : variable = variable.to(device)
        tensor   = self.embedding(variable)            # size = (1, n, emb_dim)
        return tensor



class Word2VecConnector(nn.Module) :
    '''A Pytorch module wrapping a FastText word2vec model'''
    def __init__(self, word2vec) :
        super(Word2VecConnector, self).__init__()
        self.word2vec = word2vec
        self.twin = Word2Vec(lang = Lang([list(word2vec.wv.index2word)], base_tokens = []), T = word2vec.wv.vectors)
        self.twin.addWord('PADDING_WORD')
        self.twin.addWord('UNK')
        self.twin = self.twin.freeze()
        
        self.lang       = self.twin.lang
        self.embedding  = self.twin.embedding
        self.output_dim = self.twin.output_dim
        
    def lookupTable(self) :
        return self.word2vec.wv.vectors
        
    def forward(self, words, device = None) :
        '''Transforms a sequence of n words into a Torch FloatTensor of size (1, n, emb_dim)'''
        try :
            embeddings = Variable(torch.Tensor(self.word2vec[words])).unsqueeze(0)
            if device is not None : embeddings = embeddings.to(device)
        except :
            embeddings = self.twin(words, device)
        return embeddings


#-------------------------------------------------------------------#
#                         training shell                            #
#-------------------------------------------------------------------#



class Word2VecShell(nn.Module):
    '''Word2Vec model :
        - sg = 0 yields CBOW training procedure
        - sg = 1 yields Skip-Gram training procedure
    '''
    def __init__(self, word2vec, device, sg = 0, context_size = 5, hidden_dim = 150, 
                 criterion = nn.NLLLoss(reduction = 'sum'), optimizer = optim.SGD):
        super(Word2VecShell, self).__init__()
        self.device = device
        
        # core of Word2Vec
        self.word2vec = word2vec
        
        # training layers
        self.input_n_words  = (2 * context_size if sg == 0 else 1)
        self.output_n_words = (1 if sg == 0 else 2 * context_size)
        self.linear_1  = nn.Linear(self.input_n_words * word2vec.embedding.weight.size(1), self.output_n_words * hidden_dim)
        self.linear_2  = nn.Linear(hidden_dim, word2vec.lang.n_words)
        
        # training tools
        self.sg = sg
        self.criterion = criterion
        self.optimizer = optimizer
        
        # load to device
        self.to(device)
        
    def forward(self, batch):
        '''Transforms a batch of Ngrams of size (batch_size, input_n_words)
           Into log probabilities of size (batch_size, lang.n_words, output_n_words)
           '''
        batch = batch.to(self.device)                 # size = (batch_size, self.input_n_words)
        embed = self.word2vec.embedding(batch)        # size = (batch_size, self.input_n_words, embedding_dim)
        embed = embed.view((batch.size(0), -1))       # size = (batch_size, self.input_n_words * embedding_dim)
        out = self.linear_1(embed)                    # size = (batch_size, self.output_n_words * hidden_dim) 
        out = out.view((batch.size(0),self.output_n_words, -1))
        out = F.relu(out)                             # size = (batch_size, self.output_n_words, hidden_dim)                                         
        out = self.linear_2(out)                      # size = (batch_size, self.output_n_words, lang.n_words)
        out = torch.transpose(out, 1, 2)              # size = (batch_size, lang.n_words, self.output_n_words)
        log_probs = F.log_softmax(out, dim = 1)       # size = (batch_size, lang.n_words, self.output_n_words)
        return log_probs
    
    def generatePackedNgrams(self, corpus, context_size = 5, batch_size = 32, seed = 42) :
        # generate Ngrams
        data = []
        for text in corpus :
            text = [w for w in text if w in self.word2vec.lang.word2index]
            text = ['SOS' for i in range(context_size)] + text + ['EOS' for i in range(context_size)]
            for i in range(context_size, len(text) - context_size):
                context = text[i-context_size : i] + text[i+1 : i+context_size+1]
                word = text[i]
                data.append([word, context])
        # pack Ngrams into mini_batches
        random.seed(seed)
        random.shuffle(data)
        packed_data = []
        for i in range(0, len(data), batch_size):
            pack0 = [el[0] for el in data[i:i + batch_size]]
            pack0 = [[self.word2vec.lang.getIndex(w)] for w in pack0]
            pack0 = Variable(torch.LongTensor(pack0)) # size = (batch_size, 1)
            pack1 = [el[1] for el in data[i:i + batch_size]]
            pack1 = [[self.word2vec.lang.getIndex(w) for w in context] for context in pack1]
            pack1 = Variable(torch.LongTensor(pack1)) # size = (batch_size, 2*context_size)   
            if   self.sg == 1 : packed_data.append([pack0, pack1])
            elif self.sg == 0 : packed_data.append([pack1, pack0])
            else :
                print('A problem occurred : sg must be 0 (CBOW) or 1 (Skip-Gram)')
        return packed_data
    
    def train(self, ngrams, iters = None, epochs = None, lr = 0.025, random_state = 42,
              print_every = 10, compute_accuracy = False):
        """Performs training over a given dataset and along a specified amount of loops.
        Note : this method shadows nn.Module.train(mode), which nn.Module.eval() relies on."""
        def asMinutes(s):
            m = math.floor(s / 60)
            s -= m * 60
            return '%dm %ds' % (m, s)

        def timeSince(since, percent):
            now = time.time()
            s = now - since
            rs = s/percent - s
            return '%s (- %s)' % (asMinutes(s), asMinutes(rs))

        def computeAccuracy(log_probs, targets) :
            accuracy = 0
            for i in range(targets.size(0)) :
                for j in range(targets.size(1)) :
                    topv, topi = log_probs[i, :, j].data.topk(1)   # topi has size (1)
                    if topi.item() == targets[i, j].item() : accuracy += 1
            return (accuracy * 100) / (targets.size(0) * targets.size(1))

        def printScores(start, iter, iters, tot_loss, tot_loss_words, print_every, compute_accuracy) :
            avg_loss = tot_loss / print_every
            avg_loss_words = tot_loss_words / print_every
            if compute_accuracy : print(timeSince(start, iter / iters) + ' ({} {}%) loss : {:.3f}  accuracy : {:.1f} %'.format(iter, int(iter / iters * 100), avg_loss, avg_loss_words))
            else                : print(timeSince(start, iter / iters) + ' ({} {}%) loss : {:.3f}                     '.format(iter, int(iter / iters * 100), avg_loss))
            return 0, 0

        def trainLoop(couple, optimizer, compute_accuracy = False):
            """Performs a training loop, with forward pass and backward pass for gradient optimisation."""
            optimizer.zero_grad()
            self.zero_grad()
            log_probs = self(couple[0])           # size = (batch_size, lang.n_words, output_n_words)
            targets   = couple[1].to(self.device) # size = (batch_size, output_n_words)
            loss      = self.criterion(log_probs, targets)
            loss.backward()
            optimizer.step() 
            accuracy = computeAccuracy(log_probs, targets) if compute_accuracy else 0
            return float(loss.item() / (targets.size(0) * targets.size(1))), accuracy
        
        # --- main ---
        random.seed(random_state)     # random.choice below draws from Python's own generator
        np.random.seed(random_state)
        start = time.time()
        optimizer = self.optimizer([param for param in self.parameters() if param.requires_grad == True], lr = lr)
        tot_loss = 0  
        tot_loss_words = 0
        if epochs is None :
            for iter in range(1, iters + 1):
                couple = random.choice(ngrams)
                loss, loss_words = trainLoop(couple, optimizer, compute_accuracy)
                tot_loss += loss
                tot_loss_words += loss_words      
                if iter % print_every == 0 : 
                    tot_loss, tot_loss_words = printScores(start, iter, iters, tot_loss, tot_loss_words, print_every, compute_accuracy)
        else :
            iter = 0
            iters = len(ngrams) * epochs
            for epoch in range(1, epochs + 1):
                print('epoch ' + str(epoch))
                np.random.shuffle(ngrams)
                for couple in ngrams :
                    loss, loss_words = trainLoop(couple, optimizer, compute_accuracy)
                    tot_loss += loss
                    tot_loss_words += loss_words 
                    iter += 1
                    if iter % print_every == 0 : 
                        tot_loss, tot_loss_words = printScores(start, iter, iters, tot_loss, tot_loss_words, print_every, compute_accuracy)
        return

Overwriting models/Word_Embedding.py
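
The way `generatePackedNgrams` builds its training examples can be isolated in a small standalone sketch, without PyTorch or the `Lang` class. The function names `context_windows` and `as_training_pairs` are hypothetical and used only for this illustration. The `SOS` and `EOS` padding tokens and the meaning of `sg` follow the code above.

```python
def context_windows(tokens, context_size=2, pad_left="SOS", pad_right="EOS"):
    """Return (center_word, context_words) pairs, padding the sentence boundaries."""
    padded = [pad_left] * context_size + list(tokens) + [pad_right] * context_size
    pairs = []
    for i in range(context_size, len(padded) - context_size):
        context = padded[i - context_size:i] + padded[i + 1:i + context_size + 1]
        pairs.append((padded[i], context))
    return pairs

def as_training_pairs(pairs, sg=0):
    """Orient each pair as (input, target).

    sg = 0 (CBOW) predicts the center word from the context,
    sg = 1 (Skip-Gram) predicts the context from the center word.
    """
    if sg not in (0, 1):
        raise ValueError("sg must be 0 (CBOW) or 1 (Skip-Gram)")
    return [([w], c) if sg == 1 else (c, [w]) for w, c in pairs]

pairs = context_windows(["the", "cat", "sat"], context_size=1)
for center, context in pairs:
    print(center, context)
```

Every word of the sentence yields exactly one example with $2N$ context words, which is why the padding tokens are needed at both ends.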


<a id="sentence_classification"></a>

# 2 Sentence Classification

[Back to table of contents](#plan)

In [38]:
%%writefile models/Sentence_Classifier.py

import math
import time
import unicodedata
import re
import random
import copy
import itertools

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker #, FuncFormatter
#%matplotlib inline

import numpy as np

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import optim
from torch.autograd import Variable

from chatNLP.modules import RecurrentEncoder, MultiHeadSelfAttention
from chatNLP.utils   import heatmap, annotate_heatmap


#-------------------------------------------------------------------#
#                       Sentence Classifier                         #
#-------------------------------------------------------------------#


class SentenceClassifier(nn.Module) :
    def __init__(self, device, tokenizer, word2vec, 
                 hidden_dim = 100, 
                 n_layers = 1, 
                 n_attn_heads = 1, 
                 n_class = 2, 
                 dropout = 0, 
                 class_weights = None, 
                 optimizer = optim.SGD
                 ):
        super(SentenceClassifier, self).__init__()
        
        # embedding
        self.bin_mode  = (n_class == 'binary')
        self.tokenizer = tokenizer
        self.word2vec  = word2vec
        self.context   = RecurrentEncoder(self.word2vec.output_dim, hidden_dim, n_layers, dropout, bidirectional = True)
        self.attention = MultiHeadSelfAttention(self.context.output_dim, n_head = n_attn_heads, dropout = dropout)
        self.out       = nn.Linear(self.attention.output_dim, (1 if self.bin_mode else n_class))
        self.act       = torch.sigmoid if self.bin_mode else F.softmax
        
        # optimizer
        if self.bin_mode : self.criterion = nn.BCEWithLogitsLoss(reduction = 'sum')
        else             : self.criterion = nn.NLLLoss(reduction = 'sum', weight = class_weights)
        self.optimizer = optimizer
        
        # load to device
        self.device = device
        self.to(device)
        
    def nbParametres(self) :
        return sum([p.data.nelement() for p in self.parameters() if p.requires_grad == True])
    
    def showAttention(self, words, attn) :
        for i in range(attn.size(1)) :
            fig, ax  = plt.subplots()
            im       = heatmap(np.array(attn[:, i, :].data.cpu().numpy()),  [' '], words, ax=ax, cmap="YlGn", cbarlabel="attention weight")
            texts    = annotate_heatmap(im, valfmt="{x:.2f}")
            fig.tight_layout()
            plt.show()
        return
        
    def forward(self, sentence, show_attention = False) :
        '''classifies a sentence as string'''
        words         = self.tokenizer(sentence)
        embeddings    = self.word2vec(words, self.device)
        hiddens, _    = self.context(embeddings) 
        attended, atn = self.attention(hiddens)
        if self.bin_mode : prediction = self.act(self.out(attended).view(-1)).data.topk(1)[0].item()
        else             : prediction = self.act(self.out(attended.squeeze(1)), dim = 1).data.topk(1)[1].item()
        if show_attention : self.showAttention(words, atn)
        return prediction
    
    def generatePackedSentences(self, sentences, batch_size = 32) :
        sentences.sort(key = lambda s: len(self.tokenizer(s[0])), reverse = True)
        packed_data = []
        for i in range(0, len(sentences), batch_size) :
            pack  = [(self.tokenizer(s[0]), s[1]) for s in sentences[i:i + batch_size]]
            pack  = [([self.word2vec.lang.getIndex(w) for w in words], label) for words, label in pack]
            pack  = [([w for w in words if w is not None], label) for words, label in pack]
            pack.sort(key = lambda p : len(p[0]), reverse = True)         # sort sentences and labels together to keep them aligned
            pack0 = [p[0] for p in pack]
            lengths = torch.tensor([len(p) for p in pack0])               # size = (batch_size) 
            pack0 = list(itertools.zip_longest(*pack0, fillvalue = self.word2vec.lang.getIndex('PADDING_WORD')))
            pack0 = Variable(torch.LongTensor(pack0).transpose(0, 1))     # size = (batch_size, max_length)
            pack1 = [[p[1]] for p in pack]
            if self.bin_mode : pack1 = Variable(torch.FloatTensor(pack1)) # size = (batch_size) 
            else             : pack1 = Variable(torch.LongTensor(pack1))  # size = (batch_size) 
            packed_data.append([[pack0, lengths], pack1])
        return packed_data
    
    def compute_accuracy(self, sentences) :
        batches = self.generatePackedSentences(sentences, batch_size = 32)
        score = 0
        for batch, target in batches :
            embeddings  = self.word2vec.embedding(batch[0].to(self.device))
            hiddens, _  = self.context(embeddings, lengths = batch[1].to(self.device))
            attended, _ = self.attention(hiddens)
            if self.bin_mode : 
                vects  = self.out(attended).view(-1)
                target = target.to(self.device).view(-1)
                score += sum(torch.abs(target - self.act(vects)) < 0.5).item()
            else : 
                log_probs = F.log_softmax(self.out(attended.squeeze(1)), dim = 1)
                target    = target.to(self.device).view(-1)
                score    += sum([target[i].item() == log_probs[i].data.topk(1)[1].item() for i in range(target.size(0))])
        return score * 100 / len(sentences)
    
    def fit(self, batches, iters = None, epochs = None, lr = 0.025, random_state = 42,
              print_every = 10, compute_accuracy = True):
        """Performs training over a given dataset and along a specified amount of loops"""
        def asMinutes(s):
            m = math.floor(s / 60)
            s -= m * 60
            return '%dm %ds' % (m, s)

        def timeSince(since, percent):
            now = time.time()
            s = now - since
            rs = s/percent - s
            return '%s (- %s)' % (asMinutes(s), asMinutes(rs))
        
        def computeLogProbs(batch) :
            embeddings  = self.word2vec.embedding(batch[0].to(self.device))
            hiddens, _  = self.context(embeddings, lengths = batch[1].to(self.device))
            attended, _, penal = self.attention(hiddens, penal = True)
            if self.bin_mode : return self.out(attended).view(-1), penal
            else             : return F.log_softmax(self.out(attended.squeeze(1)), dim = 1), penal

        def computeAccuracy(log_probs, targets) :
            if self.bin_mode : return sum(torch.abs(targets - self.act(log_probs)) < 0.5).item() * 100 / targets.size(0)
            else             : return sum([targets[i].item() == log_probs[i].data.topk(1)[1].item() for i in range(targets.size(0))]) * 100 / targets.size(0)
            
        def printScores(start, iter, iters, tot_loss, tot_loss_words, print_every, compute_accuracy) :
            avg_loss = tot_loss / print_every
            avg_loss_words = tot_loss_words / print_every
            if compute_accuracy : print(timeSince(start, iter / iters) + ' ({} {}%) loss : {:.3f}  accuracy : {:.1f} %'.format(iter, int(iter / iters * 100), avg_loss, avg_loss_words))
            else                : print(timeSince(start, iter / iters) + ' ({} {}%) loss : {:.3f}                     '.format(iter, int(iter / iters * 100), avg_loss))
            return 0, 0

        def trainLoop(batch, optimizer, compute_accuracy = True):
            """Performs a training loop, with forward pass, backward pass and weight update."""
            optimizer.zero_grad()
            self.zero_grad()
            log_probs, penal = computeLogProbs(batch[0])
            targets = batch[1].to(self.device).view(-1)
            loss    = self.criterion(log_probs, targets)
            if penal is not None and penal.item() > 10 : loss = loss + penal
            loss.backward()
            optimizer.step() 
            accuracy = computeAccuracy(log_probs, targets) if compute_accuracy else 0
            return float(loss.item() / targets.size(0)), accuracy
        
        # --- main ---
        self.train()
        random.seed(random_state)     # random.choice below draws from Python's own generator
        np.random.seed(random_state)
        start = time.time()
        optimizer = self.optimizer([param for param in self.parameters() if param.requires_grad == True], lr = lr)
        tot_loss = 0  
        tot_acc  = 0
        if epochs is None :
            for iter in range(1, iters + 1):
                batch = random.choice(batches)
                loss, acc = trainLoop(batch, optimizer, compute_accuracy)
                tot_loss += loss
                tot_acc += acc      
                if iter % print_every == 0 : 
                    tot_loss, tot_acc = printScores(start, iter, iters, tot_loss, tot_acc, print_every, compute_accuracy)
        else :
            iter = 0
            iters = len(batches) * epochs
            for epoch in range(1, epochs + 1):
                print('epoch ' + str(epoch))
                np.random.shuffle(batches)
                for batch in batches :
                    loss, acc = trainLoop(batch, optimizer, compute_accuracy)
                    tot_loss += loss
                    tot_acc += acc 
                    iter += 1
                    if iter % print_every == 0 : 
                        tot_loss, tot_acc = printScores(start, iter, iters, tot_loss, tot_acc, print_every, compute_accuracy)
        return

Overwriting models/Sentence_Classifier.py
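
Batching variable-length sentences, as `generatePackedSentences` does, needs two things: sentences sorted by decreasing length and padded to a rectangle, and labels that follow their sentences through the sort. The sketch below shows this in plain Python with hypothetical word indices. The function name `pad_batch` and its arguments are assumptions made for the illustration.

```python
import itertools

def pad_batch(indexed_sentences, labels, pad_index):
    """Sort (sentence, label) pairs by decreasing length, then right-pad to a rectangle."""
    pairs = sorted(zip(indexed_sentences, labels), key=lambda p: len(p[0]), reverse=True)
    sentences = [s for s, _ in pairs]
    sorted_labels = [y for _, y in pairs]
    lengths = [len(s) for s in sentences]
    # zip_longest pads column-wise; transposing back gives one padded row per sentence
    columns = itertools.zip_longest(*sentences, fillvalue=pad_index)
    padded = [list(row) for row in zip(*columns)]
    return padded, lengths, sorted_labels

padded, lengths, labels = pad_batch([[4, 7], [5, 2, 9, 1], [3]], [0, 1, 0], pad_index=0)
print(padded, lengths, labels)
```

Sorting the pairs rather than the sentences alone is the important design choice: sorting only the inputs would silently attach labels to the wrong sentences.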


<a id="languageModel"></a>

# 3 Language Model

[Back to table of contents](#plan)

In [39]:
%%writefile models/Language_Model.py

import math
import time
import unicodedata
import re
import random
import copy
import itertools

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker #, FuncFormatter
#%matplotlib inline

import numpy as np

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import optim
from torch.autograd import Variable

from chatNLP.modules import RecurrentEncoder


#-------------------------------------------------------------------#
#                        Language model                             #
#-------------------------------------------------------------------#


class LanguageModel(nn.Module) :
    def __init__(self, device, tokenizer, word2vec, 
                 hidden_dim = 100, 
                 n_layers = 1, 
                 dropout = 0, 
                 class_weights = None, 
                 optimizer = optim.SGD
                 ):
        super(LanguageModel, self).__init__()
        
        # embedding
        self.tokenizer = tokenizer
        self.word2vec  = word2vec
        self.context   = RecurrentEncoder(self.word2vec.output_dim, hidden_dim, n_layers, dropout, bidirectional = False)
        self.out       = nn.Linear(self.context.output_dim, self.word2vec.lang.n_words)
        self.act       = F.softmax
        
        # optimizer
        self.criterion = nn.NLLLoss(reduction = 'sum', weight = class_weights)
        self.optimizer = optimizer
        
        # load to device
        self.device = device
        self.to(device)
        
    def nbParametres(self) :
        return sum([p.data.nelement() for p in self.parameters() if p.requires_grad == True])
    
    def forward(self, sentence = '.', hidden = None, limit = 10, color_code = '\033[94m'):
        words  = self.tokenizer(sentence)
        result = words + [color_code]
        count, stop = 0, False   # keep the 'hidden' argument so generation can start from a given state
        while not stop :
            # compute probs
            embeddings = self.word2vec(words, self.device)
            _, hidden  = self.context(embeddings, lengths = None, hidden = hidden) # WARNING : dim = (n_layers, batch_size, hidden_dim)
            probs      = self.act(self.out(hidden[-1, :, :]), dim = 1).view(-1)
            # get predicted word
            topv, topi = probs.data.topk(1)
            words = [self.word2vec.lang.index2word[topi.item()]]
            result += words
            # stopping criterion
            count += 1
            if count == limit or words == [limit] or count == 50 : stop = True
        print(' '.join(result + ['\033[0m']))
        return
    
    def generatePackedSentences(self, sentences, batch_size = 32, depth_range = (2, 10)) :
        sentences = [s[i: i+j] \
                     for s in sentences \
                     for i in range(len(s)-depth_range[0]) \
                     for j in range(depth_range[0], min(depth_range[1], len(s)-i)+1) \
                    ]
        sentences.sort(key = lambda s: len(s), reverse = True)
        packed_data = []
        for i in range(0, len(sentences), batch_size) :
            pack0 = sentences[i:i + batch_size]
            pack0 = [[self.word2vec.lang.getIndex(w) for w in s] for s in pack0]
            pack0 = [[w for w in words if w is not None] for words in pack0]
            pack0.sort(key = len, reverse = True)
            pack1 = Variable(torch.LongTensor([s[-1] for s in pack0]))
            pack0 = [s[:-1] for s in pack0]
            lengths = torch.tensor([len(p) for p in pack0]) # size = (batch_size) 
            pack0 = list(itertools.zip_longest(*pack0, fillvalue = self.word2vec.lang.getIndex('PADDING_WORD')))
            pack0 = Variable(torch.LongTensor(pack0).transpose(0, 1))   # size = (batch_size, max_length) 
            packed_data.append([[pack0, lengths], pack1])
        return packed_data
    
    def fit(self, batches, iters = None, epochs = None, lr = 0.025, random_state = 42,
              print_every = 10, compute_accuracy = True):
        """Performs training over a given dataset and along a specified amount of loops"""
        def asMinutes(s):
            m = math.floor(s / 60)
            s -= m * 60
            return '%dm %ds' % (m, s)

        def timeSince(since, percent):
            now = time.time()
            s = now - since
            rs = s/percent - s
            return '%s (- %s)' % (asMinutes(s), asMinutes(rs))
        
        def computeLogProbs(batch) :
            embeddings = self.word2vec.embedding(batch[0].to(self.device))
            _, hidden  = self.context(embeddings, lengths = batch[1].to(self.device)) # WARNING : dim = (n_layers, batch_size, hidden_dim)
            log_probs  = F.log_softmax(self.out(hidden[-1, :, :]), dim = 1)   # dim = (batch_size, lang_size)
            return log_probs

        def computeAccuracy(log_probs, targets) :
            return sum([targets[i].item() == log_probs[i].data.topk(1)[1].item() for i in range(targets.size(0))]) * 100 / targets.size(0)

        def printScores(start, iter, iters, tot_loss, tot_acc, print_every, compute_accuracy) :
            # average the accumulated loss and accuracy over the last print_every iterations
            avg_loss = tot_loss / print_every
            avg_acc  = tot_acc / print_every
            if compute_accuracy : print(timeSince(start, iter / iters) + ' ({} {}%) loss : {:.3f}  accuracy : {:.1f} %'.format(iter, int(iter / iters * 100), avg_loss, avg_acc))
            else                : print(timeSince(start, iter / iters) + ' ({} {}%) loss : {:.3f}                     '.format(iter, int(iter / iters * 100), avg_loss))
            return 0, 0

        def trainLoop(batch, optimizer, compute_accuracy = True):
            """Performs a training loop, with forward pass, backward pass and weight update."""
            optimizer.zero_grad()
            self.zero_grad()
            log_probs = computeLogProbs(batch[0])
            targets   = batch[1].to(self.device).view(-1)
            loss      = self.criterion(log_probs, targets)
            loss.backward()
            optimizer.step() 
            accuracy = computeAccuracy(log_probs, targets) if compute_accuracy else 0
            return float(loss.item() / targets.size(0)), accuracy
        
        # --- main ---
        self.train()
        np.random.seed(random_state)
        random.seed(random_state)   # random.choice below draws from Python's own generator
        start = time.time()
        optimizer = self.optimizer([param for param in self.parameters() if param.requires_grad == True], lr = lr)
        tot_loss = 0  
        tot_acc  = 0
        if epochs is None :
            for iter in range(1, iters + 1):
                batch = random.choice(batches)
                loss, acc = trainLoop(batch, optimizer, compute_accuracy)
                tot_loss += loss
                tot_acc += acc      
                if iter % print_every == 0 : 
                    tot_loss, tot_acc = printScores(start, iter, iters, tot_loss, tot_acc, print_every, compute_accuracy)
        else :
            iter = 0
            iters = len(batches) * epochs
            for epoch in range(1, epochs + 1):
                print('epoch ' + str(epoch))
                np.random.shuffle(batches)
                for batch in batches :
                    loss, acc = trainLoop(batch, optimizer, compute_accuracy)
                    tot_loss += loss
                    tot_acc += acc 
                    iter += 1
                    if iter % print_every == 0 : 
                        tot_loss, tot_acc = printScores(start, iter, iters, tot_loss, tot_acc, print_every, compute_accuracy)
        return

Overwriting models/Language_Model.py
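
The batching done by `generatePackedSentences` can be hard to picture from the nested comprehension alone. The following is a minimal sketch of the same idea on plain Python lists, without PyTorch: it enumerates the sliding windows of a sentence, uses the last token of each window as the prediction target, and right-pads the remaining contexts. The helper names `sliding_windows` and `pad_batch`, and the padding index `0`, are assumptions made for this illustration and are not part of the module above.

```python
import itertools

def sliding_windows(sentence, depth_range=(2, 10)):
    """Contiguous sub-sequences with a length inside depth_range,
    mirroring the nested comprehension of generatePackedSentences."""
    lo, hi = depth_range
    return [sentence[i:i + j]
            for i in range(len(sentence) - lo)
            for j in range(lo, min(hi, len(sentence) - i) + 1)]

def pad_batch(windows, pad=0):
    """Split each window into (context, target) and right-pad the contexts.
    The padding index 0 is a hypothetical choice for this sketch."""
    windows = sorted(windows, key=len, reverse=True)   # longest first
    targets = [w[-1] for w in windows]                 # last token is the target
    contexts = [w[:-1] for w in windows]               # preceding tokens are the input
    lengths = [len(c) for c in contexts]
    # zip_longest pads column-wise, the outer zip transposes back to one row per context
    columns = itertools.zip_longest(*contexts, fillvalue=pad)
    padded = [list(row) for row in zip(*columns)]
    return padded, lengths, targets

windows = sliding_windows([5, 6, 7, 8], depth_range=(2, 3))
padded, lengths, targets = pad_batch(windows)
print(windows)
print(padded, lengths, targets)
```

Sorting by decreasing length before padding matches what the method above does, since packed recurrent inputs in PyTorch are commonly built from length-sorted batches.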


<a id="chatbot"></a>

# 10 Chatbots

[Back to table of contents](#plan)

<a id="ChatbotsGeneratifsAvecMemoireAgnostique"></a>

### 10.3.2 Generative chatbots with agnostic memory

[Back to table of contents](#plan)

![Generative chatbot with agnostic memory](figs/Chatbot.png)
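
As a rough illustration of the memory-then-attention flow, here is a minimal sketch in plain Python. It stores the word vectors of each turn in an indexed memory, then weights every stored vector against a query vector. It uses plain dot-product attention as a simplification, not the hierarchical additive attention of the actual model, and the names `ToyMemory`, `softmax` and `attend` are hypothetical helpers introduced only for this example.

```python
import math

class ToyMemory:
    """Hypothetical stand-in for the memory bookkeeping of the chatbot."""
    def __init__(self):
        self.slots = {}    # turn index -> list of word vectors
        self.length = 0

    def update(self, word_vectors):
        # each new turn is stored under the next free index
        self.slots[self.length] = word_vectors
        self.length += 1

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(memory, query):
    """Dot-product attention over every word vector stored in memory
    (a simplification of the additive attention used by the model)."""
    vectors = [v for turn in sorted(memory.slots) for v in memory.slots[turn]]
    scores = [sum(a * b for a, b in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    context = [sum(w * v[d] for w, v in zip(weights, vectors))
               for d in range(len(query))]
    return context, weights

memory = ToyMemory()
memory.update([[1.0, 0.0], [0.0, 1.0]])   # word vectors of a user turn
memory.update([[1.0, 1.0]])               # word vectors of a bot turn
context, weights = attend(memory, query=[1.0, 0.0])
print(weights)
```

The weights sum to one across the whole dialogue history, so a word from an early turn can dominate the context vector if it matches the query best.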

In [40]:
%%writefile models/Chatbot.py

import math
import time
import unicodedata
import re
import random

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker #, FuncFormatter
#%matplotlib inline

import numpy as np

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import optim
from torch.autograd import Variable

from .Language_Model import LanguageModel


from chatNLP.modules import (RecurrentWordsEncoder, 
                            
                            AdditiveAttention,
                            MultiHeadAttention,
                            MultiHopedAttention,
                            RecurrentHierarchicalAttention, 
                            
                            WordsDecoder,
                            AttnWordsDecoder,
                            LMWordsDecoder)



#-------------------------------------------------------------------#
#                         Chatbot model                             #
#-------------------------------------------------------------------#


class Chatbot(nn.Module):
    """Conversational agent with bi-GRU Encoder, taking as parameters at training time :
    
            -a complete dialogue of the form (with each content as string)
    
                    [['question 1', 'answer 1'],
                     ['question 2', 'answer 2'],
                             ..........
                     ['current question', 'current answer']]
     
            -the current answer for teacher forcing, or None
    
    and at test time :
    
            -the current question as string
    
    Returns :
     
            -word indices of the generated answer, according to output language of the model
            -attention weights of first attention layer, or None if no attention
            -attention weights of second attention layer, or None if no attention
    """
    def __init__(self, device, lang, encoder, attention, decoder, autoencoder = None):
        super(Chatbot, self).__init__()
        
        # relevant quantities
        self.lang = lang 
        self.device = device
        self.n_level = attention.n_level if attention is not None else 1
        self.memory_dim = encoder.output_dim
        self.memory_length = 0
        # modules        
        self.encoder = encoder
        self.attention = attention
        self.decoder = decoder
        self.autoencoder = autoencoder
        
        
        
    # ---------------------- Technical methods -----------------------------
    def loadSubModule(self, encoder = None, attention = None, decoder = None) :
        if encoder is not None   : self.encoder = encoder
        if attention is not None : self.attention = attention
        if decoder is not None   : self.decoder = decoder
        return
    
    def freezeSubModule(self, encoder = False, attention = False, decoder = False) :
        for param in self.encoder.parameters()  : param.requires_grad = not encoder
        if self.attention is not None :
            for param in self.attention.parameters(): param.requires_grad = not attention
        for param in self.decoder.parameters()  : param.requires_grad = not decoder
        return
    
    def nbParametres(self) :
        count = 0
        for p in self.parameters():
            if p.requires_grad == True : count += p.data.nelement()
        return count
    
    def flatten(self, description):
        '''Reduce the nesting depth of the description by one level'''
        flatten = []
        for line in description :
            if type(line) == list : flatten += line  
            else                  : flatten.append(line)
        return [[int(word) for word in sentence.data.view(-1)] for sentence in flatten]

    
    
    # ------------------------ Text processing methods ---------------------------------
    def variableFromSentence(self, sentence):
        def normalizeString(sentence) :
            '''Remove rare symbols from a string'''
            def unicodeToAscii(s):
                """Turn a Unicode string to plain ASCII, thanks to http://stackoverflow.com/a/518232/2809427"""
                return ''.join(c for c in unicodedata.normalize('NFD', s) if unicodedata.category(c) != 'Mn')
            sentence = unicodeToAscii(sentence.lower().strip())
            sentence = re.sub(r"[^a-zA-Z0-9?&\%\-\_]+", r" ", sentence) 
            return sentence
        sentence = normalizeString(sentence).split(' ') # a raw string transformed into a list of clean words
        indexes = []
        #unknowns = 0
        for word in sentence:
            if word not in self.lang.word2index.keys() :
                if 'UNK' in self.lang.word2index.keys() : indexes.append(self.lang.word2index['UNK'])
            else :
                indexes.append(self.lang.word2index[word])
        #indexes.append(self.lang.word2index['EOS']) 
        indexes.append(1) # EOS_token
        result = Variable(torch.LongTensor([[i] for i in indexes]))
        return result
    
    
    
    # ------------------------ Visualisation methods ---------------------------------
    def flattenDialogue(self, dialogue):
        flatten = []
        for paire in dialogue : flatten += paire
        return [[int(word) for word in sentence.data.view(-1)] for sentence in flatten]
    
    def flattenWeights(self, weights) :
        '''Reduce the nesting depth of the attention weights by one level'''
        flatten = []
        for weight_layer in weights : flatten.append(torch.cat(tuple(weight_layer.values()), dim = 2))
        return flatten
    
    def formatWeights(self, dialogue, attn1_weights, attn2_weights) :
        if self.n_level == 2 : attn1_weights = self.flattenWeights(attn1_weights)
        hops = self.attention.hops
        l, L = len(dialogue), max([len(line) for line in dialogue])
        Table = np.zeros((l, 1, L))
        Liste = np.zeros((l, 1)) if attn2_weights is not None else None
        count = 0
        count_line = 0
        for i, line in enumerate(dialogue) :
            present = False
            for j, word in enumerate(line) :
                if word in self.lang.index2word.keys():
                    present = True
                    Table[i, 0, j] = sum([attn1_weights[k][0, 0, count].data for k in range(hops)])
                    count += 1
            if present and Liste is not None :
                Liste[i] = sum([attn2_weights[k][count_line].data for k in range(hops)])
                count_line += 1
        return Table, Liste
    
    def showWeights(self, dialogue, attn1_weights, attn2_weights, maxi):
        table, liste = self.formatWeights(dialogue[:-2], attn1_weights, attn2_weights)
        l = table.shape[0]
        L = table.shape[2]
        fig = plt.figure(figsize = (l, L))
        for i, line in enumerate(dialogue[:-2]):
            ligne = [self.lang.index2word[int(word)] for word in line]
            ax = fig.add_subplot(l, 1, i+1)
            vals = table[i]
            text = [' '] + ligne + [' ' for k in range(L-len(ligne))] if L>len(ligne) else [' '] + ligne
            if liste is not None :
                vals = np.concatenate((np.zeros((1, 1)) , vals), axis = 1)  
                vals = np.concatenate((np.reshape(liste[i], (1, 1)) , vals), axis = 1)
                turn = 'User' if i % 2 == 0 else 'Bot'
                text = [turn] + [' '] + text
            cax = ax.matshow(vals, vmin=0, vmax=maxi, cmap='YlOrBr')
            ax.set_xticklabels(text, ha='left')
            ax.set_yticklabels(' ')
            ax.tick_params(axis=u'both', which=u'both',length=0, labelrotation = 30, labelright  = True)
            ax.grid(False, which="minor", color="w", linestyle='-', linewidth=1)
            ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
            plt.subplots_adjust(hspace=0, wspace = 0.1)
        plt.show()
    
    def showAttention(self, dialogue, n_col = 1, maxi = None):
        answer, decoder_outputs, attn1_w, attn2_w, _ = self.answerTrain(dialogue)
        dialogue = self.flattenDialogue(dialogue)
        if len(dialogue) > 1 : self.showWeights(dialogue, attn1_w, attn2_w, maxi)
        print('User : ', ' '.join([self.lang.index2word[int(word)] for word in dialogue[-2][:-1]]))
        print('target : ', ' '.join([self.lang.index2word[int(word)] for word in dialogue[-1][:-1]]))
        print('predic : ', ' '.join([self.lang.index2word[int(word)] for word in answer]))
        return
    
    
    
    # ------------------- Process methods ------------------------
    def initMemory(self):
        """Initialize memory slots"""
        self.memory = {}
        self.memory_queries = {}
        self.query_hidden = self.encoder.initHidden()
        self.memory_length = 0
        
    def updateMemory(self, last_words, query_hidden):
        """Update memory with a list of word vectors 'last_words' and the last query vector 'query_hidden'"""
        self.memory[self.memory_length] = last_words
        self.memory_queries[self.memory_length] = query_hidden
        self.query_hidden = query_hidden
        self.memory_length += 1
        
    def readSentence(self, utterance):
        """Perform reading of an utterance, returning created word vectors
           and last hidden states of the encoder bi-GRU
        """
        utterance = utterance.to(self.device)
        last_words, query_hidden = self.encoder(utterance, self.query_hidden)
        return last_words, query_hidden
        
    def readDialogue(self, dialogue):
        """Loop of readSentence over a whole dialogue
        """
        for i in range(len(dialogue)) :
            if type(dialogue[i]) == list :
                for utterance in dialogue[i]:
                    last_words, query_hidden = self.readSentence(utterance)
                    self.updateMemory(last_words, query_hidden)
            else :
                utterance = dialogue[i]
                last_words, query_hidden = self.readSentence(utterance)
                self.updateMemory(last_words, query_hidden)
   
    def tracking(self, query_vector = None):
        """Computes an attention vector over the entries of the agent's memory,
        given the query vector 'query_vector', and concatenates it to that vector."""
        decision_vector, attn1_weights, attn2_weights = self.attention(words_memory = self.memory, 
                                                                       query = query_vector)
        return decision_vector, attn1_weights, attn2_weights

    def generateAnswer(self,last_words, query_vector, decision_vector, target_answer = None) :
        """Generates an answer from a hidden state that initializes the decoder,
        using a target answer for a 'teacher forcing-like' mode when one is provided."""
        answer, decoder_outputs = self.decoder(last_words, query_vector, decision_vector, target_answer)
        return answer, decoder_outputs
    
    def generateQuery(self,last_words, query_vector, decision_vector, target_answer = None) :
        """Generates a query with the optional autoencoder from a hidden state that initializes it,
        using a target answer for a 'teacher forcing-like' mode when one is provided.
        Returns (None, None) when the model has no autoencoder."""
        if self.autoencoder is not None : 
            query, autoencoder_outputs = self.autoencoder(last_words, query_vector, decision_vector, target_answer)
            return query, autoencoder_outputs
        else :
            return None, None
        
        
        
    # ------------ 1st working mode : training mode ------------
    def answerTrain(self, input, target_answer = None):
        """Parameters are a complete dialogue, containing the current query,
           
           - either of the form :

                    [['query 1', 'answer 1'],
                     ['query 2', 'answer 2'],
                             ..........
                     ['current query', 'current answer']]
                     
           - or :
           
                    ['query 1',
                     'query 2',
                       ....
                     'current query'] 

           The model learns to generate the current answer. 
           Teacher forcing can be enabled by passing the ground-truth answer through the 'target_answer' option.
        """
        # 1) initiates memory instance
        self.initMemory()
        
        # 2) reads historical part of dialogue (if applicable),
        # word vectors and last hidden states of encoder bi-GRU are stored in memory
        dialogue = input[:-1]
        self.readDialogue(dialogue)
        
        # 3) reads current utterance,
        # returns word vectors of query and query vector
        query = input[-1]
        query = query[0] if type(query) == list else query
        last_words, query_hidden = self.readSentence(query)
        query_hidden = query_hidden.view(1, 1, -1)
        
        # 4) performs tracking
        # returns decision vector
        if self.attention is not None :
            decision_vector, attn1_weights, attn2_weights = self.tracking(query_hidden)
        else :
            decision_vector = query_hidden
            attn1_weights = None
            attn2_weights = None
            
        # 5) response generation
        # returns list of indices
        answer, decoder_outputs = self.generateAnswer(last_words, query_hidden, decision_vector, target_answer)
        pred_query, autoencoder_outputs = self.generateQuery(last_words, query_hidden, decision_vector, target_answer)    
        # 6) returns answer
        return answer, decoder_outputs, attn1_weights, attn2_weights, autoencoder_outputs

        
        
    # ------------ 2nd working mode : test mode ------------
    def forward(self, input):
        """Parameters are a single current query as string, and the model attempts to generate the current answer.
        """
        
        # 1) initiates memory and hidden states of encoder bi-GRU if conversation starts
        if self.memory_length == 0 : self.initMemory()
            
        # 2) reads current utterance,
        # returns word vectors of query and query vector
        sentence = self.variableFromSentence(input)
        if sentence is None :
            return "Excusez-moi je n'ai pas compris", None, None
        else :
            last_words, query_hidden = self.readSentence(sentence)
            q_hidden = query_hidden.view(1, 1, -1)

            # 3) performs tracking
            # returns decision vector
            if self.attention is not None :
                decision_vector, attn1_weights, attn2_weights = self.tracking(q_hidden)
            else :
                decision_vector = q_hidden
                attn1_weights = None
                attn2_weights = None

            # 4) response generation
            # returns list of indices
            answer, decoder_outputs = self.generateAnswer(last_words, q_hidden, decision_vector)
            
            # 5) updates memory with current query and answer
            self.updateMemory(last_words, query_hidden)
            answer_var = Variable(torch.LongTensor([[i] for i in answer]))
            last_words, query_hidden = self.readSentence(answer_var)
            self.updateMemory(last_words, query_hidden)

            # 6) returns answer
            answer = ' '.join([self.lang.index2word[int(word)] for word in answer])
            return answer, attn1_weights, attn2_weights
    
    

    
#-------------------------------------------------------------------#
#                         Instantiator                              #
#-------------------------------------------------------------------#
    
    
    
def CreateBot(lang,                     ###
              embedding,                  # --- Encoder options
              hidden_dim,                 #
              n_layers,                 ###

              sentence_hidden_dim,      ###
              hops,                       #
              share,                      # --- Hierarchical encoder options
              transf,                     #
              dropout,                  ###
              
              attn_decoder_n_layers,    ###
              language_model_n_layers,    #
              tf_ratio,                   # --- decoder options
              bound,                      #
              autoencoding,             ###
              
              device
             ):
    '''Create an agent with specified dimensions and specificities'''
    # 1) ----- encoding -----
    EOS_token = lang.word2index['EOS'] if 'EOS' in lang.word2index.keys() else 1
    if type(embedding) == torch.nn.modules.sparse.Embedding : 
        for param in embedding.parameters() : param.requires_grad = False
    elif type(embedding) == int : 
        embedding = nn.Embedding(lang.n_words, embedding) 
    else : 
        embedding = nn.Embedding.from_pretrained(torch.FloatTensor(embedding), freeze=True)

    encoder = RecurrentWordsEncoder(device, 
                                    embedding, 
                                    hidden_dim, 
                                    n_layers, 
                                    dropout) # embedding, hidden_dim, n_layers = 1, dropout = 0
    # 2) ----- attention -----
    word_hidden_dim = encoder.output_dim
    attention = RecurrentHierarchicalAttention(device,
                                               word_hidden_dim,
                                               sentence_hidden_dim, 
                                               query_dim = word_hidden_dim,
                                               n_heads = 1,
                                               n_layers = n_layers,
                                               hops = hops,
                                               share = share,
                                               transf = transf,
                                               dropout = dropout)
    # 3) ----- decoding -----
    tracking_dim = attention.output_dim
    autoencoder = None
    if language_model_n_layers > 0 :
        language_model = LanguageModel(device, 
                                       lang,
                                       embedding = embedding, 
                                       hidden_dim = hidden_dim,
                                       n_layers = language_model_n_layers, 
                                       dropout = dropout)
        decoder = LMWordsDecoder(device,
                                 language_model,                                   
                                 word_hidden_dim,
                                 tracking_dim,
                                 dropout = dropout,
                                 tf_ratio = tf_ratio,
                                 bound = bound)
        if autoencoding :
            autoencoder = LMWordsDecoder(device,
                                         language_model,                                   
                                         word_hidden_dim,
                                         tracking_dim,
                                         dropout = dropout,
                                         tf_ratio = tf_ratio,
                                         bound = bound)
        
    elif attn_decoder_n_layers >= 0 :
        decoder = AttnWordsDecoder(device,
                                   embedding,
                                   word_hidden_dim,
                                   tracking_dim,
                                   dropout = dropout,
                                   n_layers = attn_decoder_n_layers,
                                   tf_ratio = tf_ratio,
                                   bound = bound)
        if autoencoding :
            autoencoder = AttnWordsDecoder(device,
                                           embedding,
                                           word_hidden_dim,
                                           tracking_dim,
                                           dropout = dropout,
                                           n_layers = attn_decoder_n_layers,
                                           tf_ratio = tf_ratio,
                                           bound = bound)
    else :
        decoder = WordsDecoder(device,
                               embedding,                                   
                               word_hidden_dim,
                               tracking_dim,
                               dropout = dropout,
                               tf_ratio = tf_ratio,
                               EOS_token = EOS_token,
                               bound = bound)       
        if autoencoding :
            autoencoder = WordsDecoder(device,
                                       embedding,                                   
                                       word_hidden_dim,
                                       tracking_dim,
                                       dropout = dropout,
                                       tf_ratio = tf_ratio,
                                       EOS_token = EOS_token,
                                       bound = bound) 
    # 4) ----- model -----
    chatbot = Chatbot(device, lang, encoder, attention, decoder, autoencoder)
    chatbot = chatbot.to(device)
    return chatbot



#-------------------------------------------------------------------#
#                             Trainer                               #
#-------------------------------------------------------------------#


class BotTrainer(object):
    def __init__(self, 
                 device,
                 criterion = nn.NLLLoss(), 
                 optimizer = optim.SGD, 
                 clipping = 10, 
                 print_every=100):
        
        # relevant quantities
        self.device = device
        self.criterion = criterion.to(device)
        self.optimizer = optimizer
        self.clip = clipping
        self.print_every = print_every# timer
        
        
    def asMinutes(self, s):
        m = math.floor(s / 60)
        s -= m * 60
        return '%dm %ds' % (m, s)


    def timeSince(self, since, percent):
        now = time.time()
        s = now - since
        es = s / (percent)
        rs = es - s
        return '%s (- %s)' % (self.asMinutes(s), self.asMinutes(rs))
        
        
    def distance(self, agent_outputs, target_answer) :
        """ Compute cumulated error between predicted output and ground answer."""
        loss = 0
        loss_diff_mots = 0
        agent_outputs_length = len(agent_outputs)
        target_length = len(target_answer)
        Max = max(agent_outputs_length, target_length)
        Min = min(agent_outputs_length, target_length)   
        for i in range(Min):
            agent_output = agent_outputs[i]
            target_word = target_answer[i]
            #print(agent_output.size(), target_answer.size())
            factor = (1 + Max - Min) if i == Min -1 else 1
            loss += factor * self.criterion(agent_output, target_word)
            topv, topi = agent_output.data.topk(1)
            ni = topi[0][0]
            if ni != target_word.data[0]:
                loss_diff_mots += 1
        loss_diff_mots += Max - Min
        return loss, loss_diff_mots
        
        
    def trainLoop(self, agent, dialogue, target_answer, optimizer):
        """Performs a training loop, with forward pass and backward pass for gradient optimisation."""
        optimizer.zero_grad()
        query = dialogue[-1][0].to(self.device)
        target_answer = target_answer.to(self.device)
        answer, agent_outputs, attn1_w, attn2_w, _ = agent.answerTrain(dialogue, target_answer) 
        loss, loss_diff_mots = self.distance(agent_outputs, target_answer)
        loss.backward()
        _ = torch.nn.utils.clip_grad_norm_(agent.parameters(), self.clip)
        optimizer.step()
        return loss.item() / len(target_answer), loss_diff_mots
        
        
    def train(self, agent, dialogues, n_iters = 10000, learning_rate=0.01, dic = None, per_dialogue = False, return_errors = False):
        """Performs training over a given dataset and along a specified amount of loops."""
        if type(dialogues[0][0]) == list :
            debut = 0
            double = True
        else :
            debut = 1
            double = False
        start = time.time()
        optimizer = self.optimizer([param for param in agent.parameters() if param.requires_grad == True], lr=learning_rate)
        print_loss_total = 0  
        print_loss_diff_mots_total = 0
        if return_errors : errors = {}
        iter = 1
        while iter < n_iters :
            if dic is not None :
                j = int(random.choice(list(dic.keys())))
                training_dialogue = dialogues[j]
                i = random.choice(dic[j])
                partie_dialogue = training_dialogue[:i+1-debut]
                target_answer = training_dialogue[i][1] if double else training_dialogue[i]
                loss, loss_diff_mots = self.trainLoop(agent, partie_dialogue, target_answer, optimizer)
                if return_errors and loss_diff_mots > 0 :
                    if j not in list(errors.keys()) : errors[j] = []
                    if i not in errors[j] : errors[j].append(i)
                # number of errors on answer i
                print_loss_total += loss
                print_loss_diff_mots_total += loss_diff_mots 
                iter += 1
                if iter % self.print_every == 0:
                    print_loss_avg = print_loss_total / self.print_every
                    print_loss_diff_mots_avg = print_loss_diff_mots_total / self.print_every
                    print_loss_total = 0
                    print_loss_diff_mots_total = 0
                    print('%s (%d %d%%) %.4f %.2f' % (self.timeSince(start, iter / n_iters), iter, iter / n_iters * 100, 
                                                                  print_loss_avg, print_loss_diff_mots_avg))
            elif per_dialogue :
                j = int(random.choice(range(len(dialogues))))
                training_dialogue = dialogues[j]
                indices = list(range(debut, len(training_dialogue)))
                random.shuffle(indices)
                for i in indices :
                    partie_dialogue = training_dialogue[:i+1]
                    target_answer = training_dialogue[i][1] if double else training_dialogue[i]
                    loss, loss_diff_mots = self.trainLoop(agent, partie_dialogue, target_answer, optimizer)
                    if return_errors and loss_diff_mots > 0 :
                        if j not in list(errors.keys()) : errors[j] = []
                        if i not in errors[j] : errors[j].append(i)
                    # number of errors on answer i
                    print_loss_total += loss
                    print_loss_diff_mots_total += loss_diff_mots
                    iter += 1
                    if iter % self.print_every == 0:
                        print_loss_avg = print_loss_total / self.print_every
                        print_loss_diff_mots_avg = print_loss_diff_mots_total / self.print_every
                        print_loss_total = 0
                        print_loss_diff_mots_total = 0
                        print('%s (%d %d%%) %.4f %.2f' % (self.timeSince(start, iter / n_iters), iter, iter / n_iters * 100, 
                                                                      print_loss_avg, print_loss_diff_mots_avg))
            else :
                j = int(random.choice(range(len(dialogues))))
                training_dialogue = dialogues[j]
                i = random.choice(range(debut, len(training_dialogue)))
                partie_dialogue = training_dialogue[:i+1]
                target_answer = training_dialogue[i][1] if double else training_dialogue[i]
                loss, loss_diff_mots = self.trainLoop(agent, partie_dialogue, target_answer, optimizer)
                if return_errors and loss_diff_mots > 0 :
                    if j not in list(errors.keys()) : errors[j] = []
                    if i not in errors[j] : errors[j].append(i)
                # number of errors on answer i
                print_loss_total += loss
                print_loss_diff_mots_total += loss_diff_mots
                iter += 1
                if iter % self.print_every == 0:
                    print_loss_avg = print_loss_total / self.print_every
                    print_loss_diff_mots_avg = print_loss_diff_mots_total / self.print_every
                    print_loss_total = 0
                    print_loss_diff_mots_total = 0
                    print('%s (%d %d%%) %.4f %.2f' % (self.timeSince(start, iter / n_iters), iter, iter / n_iters * 100, 
                                                                  print_loss_avg, print_loss_diff_mots_avg))


        if return_errors : return errors
                
                
    def ErrorCount(self, agent, dialogues):
        bound = 10
        ERRORS = [0 for i in range(bound +1)]
        repartitionError = {}
        for i in range(bound +1) :
            repartitionError[i] = []
        liste = []
        for k, input_dialogue in enumerate(dialogues):
            for l in range(len(input_dialogue)):
                if len(input_dialogue[l][1])>0 :
                    dialogue = input_dialogue[:l+1]
                    #target_answer = variableFromSentence(agent.output_lang, input_dialogue[l][1])
                    target_answer = input_dialogue[l][1]
                    target_answer = target_answer.to(self.device)
                    answer, agent_outputs, attn1_w, attn2_w, _ = agent.answerTrain(dialogue)
                    loss, loss_diff_mots = self.distance(agent_outputs, target_answer)
                    if loss_diff_mots > bound :
                        ERRORS = ERRORS + [0 for i in range(loss_diff_mots - bound)]
                        for i in range(bound +1, loss_diff_mots +1) :
                            repartitionError[i] = []
                        bound  = loss_diff_mots
                    ERRORS[loss_diff_mots] += 1
                    if loss_diff_mots > 0 :
                        liste.append([k, l, loss_diff_mots])
        for triple in liste:
            repartitionError[triple[2]].append(triple[:2])
        print("The repartition of errors :", ERRORS)
        return repartitionError


    def DialoguesWithErrors(self, agent, dialogues) :
        '''Returns a dictionary mapping the index of each dialogue to the indices
           of the lines in that dialogue where a mistake was made.
        '''
        start = time.time()
        Sortie = {}
        L = len(dialogues)
        for i, dialogue in enumerate(dialogues) :
            errs = []
            for j in range(len(dialogue)) :
                target_answer = dialogue[j][1]
                target_answer = target_answer.to(self.device)
                answer, agent_outputs, attn1_w, attn2_w, _ = agent.answerTrain(dialogue[:j+1], target_answer)
                loss, loss_diff_mots = self.distance(agent_outputs, target_answer)
                if loss_diff_mots > 0 :
                    errs.append(j)
            if errs != []:
                Sortie[i] = errs
            if (i+1) % self.print_every == 0:
                print('%s (%d %d%%)' % (self.timeSince(start, (i+1) / L),
                                             (i+1), (i+1) / L * 100))
        return Sortie

Overwriting models/Chatbot.py
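
The default branch of `train` above (neither `dic` nor `per_dialogue`) draws one dialogue and one answer position at each iteration, then trains on the dialogue prefix ending at that position. A minimal sketch of that sampling step, assuming dialogues are plain Python lists; the helper name `sample_training_pair` and the toy data are hypothetical and not part of the original class:

```python
import random

def sample_training_pair(dialogues, rng=random):
    """Pick a random dialogue j and turn i, return (j, i, prefix, target)."""
    # Dialogues made of [question, answer] pairs can be cut at any turn (start = 0).
    # Flat dialogues need at least one earlier utterance as context (start = 1).
    paired = isinstance(dialogues[0][0], list)
    start = 0 if paired else 1
    j = rng.randrange(len(dialogues))
    dialogue = dialogues[j]
    i = rng.randrange(start, len(dialogue))
    prefix = dialogue[:i + 1]
    target = dialogue[i][1] if paired else dialogue[i]
    return j, i, prefix, target

# Toy data: two dialogues made of [question, answer] pairs
dialogues = [
    [["hello", "hi"], ["how are you", "fine"]],
    [["bye", "see you"]],
]
j, i, prefix, target = sample_training_pair(dialogues, random.Random(0))
print(j, i, target)
```

The prefix always ends with the turn whose answer is the training target, which is what `trainLoop` receives as `partie_dialogue` and `target_answer`.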


<a id="modules"></a>

# 2 Modules

[Back to table of contents](#plan)

In [41]:
%mkdir modules

A subdirectory or file modules already exists.


In [42]:
%%writefile modules/__init__.py

from .Encoder_Words_Recurrent import RecurrentEncoder, RecurrentWordsEncoder

from .Attention_Additive import SelfAttention, AdditiveAttention
from .Attention_MultiHead import MultiHeadSelfAttention, MultiHeadAttention
from .Attention_MultiHoped import MultiHopedAttention
from .Attention_Hierarchical_Recurrent import RecurrentHierarchicalAttention

from .Decoder_Classes import ClassDecoder
from .Decoder_Words import WordsDecoder
from .Decoder_Words_Attn import AttnWordsDecoder
from .Decoder_Words_LM import LMWordsDecoder



__all__ = [
    'RecurrentEncoder',
    'RecurrentWordsEncoder',
    
    'SelfAttention',
    'AdditiveAttention',
    'MultiHeadSelfAttention',
    'MultiHeadAttention',
    'MultiHopedAttention',
    'RecurrentHierarchicalAttention',
    
    'ClassDecoder',
    'WordsDecoder',
    'AttnWordsDecoder',
    'LMWordsDecoder']

Overwriting modules/__init__.py


<a id="encodeursDeMots"></a>

## 2.1 Word encoders


[Back to table of contents](#plan)


### 2.1.1 Recurrent encoder

[Back to table of contents](#plan)

The **RecurrentWordsEncoder** module encodes a sequence of words $w_1, ..., w_T$ into a sequence of vectors $h_1, ..., h_T$ by applying an embedding followed by a bidirectional GRU layer. Its operation is illustrated in the following figure:


![WordEncoder](figs/WordEncoder.png)
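
In equations, a standard bidirectional GRU encoder of this kind can be written as follows. The notation is introduced here for illustration only: $E$ is the embedding table, $d$ the hidden dimension of each direction, and the arrows denote the forward and backward GRUs.

$$
\begin{aligned}
e_t &= E[w_t], \\
\overrightarrow{h}_t &= \mathrm{GRU}_{\rightarrow}(e_t, \overrightarrow{h}_{t-1}), \\
\overleftarrow{h}_t &= \mathrm{GRU}_{\leftarrow}(e_t, \overleftarrow{h}_{t+1}), \\
h_t &= [\overrightarrow{h}_t ; \overleftarrow{h}_t] \in \mathbb{R}^{2d}.
\end{aligned}
$$

Each output vector $h_t$ therefore sees both the left and the right context of the word $w_t$.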

In [43]:
%%writefile modules/Encoder_Words_Recurrent.py

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


class RecurrentEncoder(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, n_layers = 1, dropout = 0, bidirectional = False): 
        super(RecurrentEncoder, self).__init__()
        
        # relevant quantities
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.output_dim = hidden_dim * (2 if bidirectional else 1)

        # layers
        self.dropout = nn.Dropout(p = dropout)
        self.bigru = nn.GRU(embedding_dim, 
                            hidden_dim, 
                            n_layers,
                            dropout = (0 if n_layers == 1 else dropout), 
                            bidirectional = bidirectional,
                            batch_first = True)

    def forward(self, embeddings, lengths = None, hidden = None) :
        '''Transforms a batch of size (batch_size, input_length, embedding_dim) into 
        
              - outputs of size (batch_size, input_length, output_dim), with output_dim = n_directions * hidden_dim
              - hidden  of size (n_directions * n_layers, batch_size, hidden_dim)
        '''
        embeddings = self.dropout(embeddings)
        if lengths is not None : embeddings = torch.nn.utils.rnn.pack_padded_sequence(embeddings, lengths, batch_first = True)
        outputs, hidden = self.bigru(embeddings, hidden) # dim = (batch_size, input_length, output_dim)
        if lengths is not None : outputs, _ = torch.nn.utils.rnn.pad_packed_sequence(outputs, batch_first = True)
        outputs = self.dropout(outputs)                  # dim = (batch_size, input_length, output_dim)
        hidden  = self.dropout(hidden)                   # dim = (n_directions * n_layers, batch_size, hidden_dim)
        return outputs, hidden

    
# -- OLD --
class RecurrentWordsEncoder(nn.Module):
    def __init__(self, 
                 device, 
                 embedding, 
                 hidden_dim, 
                 n_layers = 1, 
                 dropout = 0
                ): 
        super(RecurrentWordsEncoder, self).__init__()
        
        # relevant quantities
        self.device = device
        self.hidden_dim = hidden_dim           # dimension of hidden state of GRUs 
        self.dropout_p = dropout
        self.n_layers = n_layers               # number of stacked GRU layers
        self.output_dim = hidden_dim * 2       # dimension of outputed rep. of words and utterance
        
        # parameters
        self.embedding = embedding
        for p in embedding.parameters() :
            embedding_dim = p.data.size(1)
        self.dropout = nn.Dropout(p = dropout)
        self.bigru = nn.GRU(embedding_dim, 
                            hidden_dim, 
                            n_layers,
                            dropout=(0 if n_layers == 1 else dropout), 
                            bidirectional=True)

        
    def initHidden(self): 
        return Variable(torch.zeros(2 * self.n_layers, 1, self.hidden_dim)).to(self.device)

    def forward(self, utterance, hidden = None):
        if hidden is None : hidden = self.initHidden()
        embeddings = self.embedding(utterance)                          # dim = (input_length, 1, embedding_dim)
        embeddings = self.dropout(embeddings)                           # dim = (input_length, 1, embedding_dim)
        outputs, hidden = self.bigru(embeddings, hidden)
        outputs = self.dropout(outputs)
        hidden = self.dropout(hidden)
        return outputs, hidden                                          # dim = (input_length, 1, hidden_dim * 2)

Overwriting modules/Encoder_Words_Recurrent.py
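
The shapes produced by the packing logic in `RecurrentEncoder.forward` can be checked with a short standalone sketch. It assumes PyTorch is installed and uses a bare `nn.GRU` rather than the class above; the dimensions and the random input are arbitrary illustration values.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
batch_size, max_len, embedding_dim, hidden_dim = 3, 5, 8, 4

gru = nn.GRU(embedding_dim, hidden_dim, num_layers=1,
             bidirectional=True, batch_first=True)

embeddings = torch.randn(batch_size, max_len, embedding_dim)
# True lengths of each padded sequence; by default they must be sorted
# in decreasing order (otherwise pass enforce_sorted=False).
lengths = torch.tensor([5, 3, 2])

packed = pack_padded_sequence(embeddings, lengths, batch_first=True)
packed_outputs, hidden = gru(packed)
outputs, out_lengths = pad_packed_sequence(packed_outputs, batch_first=True)

print(outputs.shape)  # (batch_size, max_len, 2 * hidden_dim)
print(hidden.shape)   # (2 * num_layers, batch_size, hidden_dim)
```

Note that `batch_first = True` only affects the layout of `outputs`: the hidden state keeps the layer and direction axis first. Positions beyond each sequence's length are filled with zeros after unpacking.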


<a id="attentionSimple"></a>

## 2.2 Simple attention modules

[Back to table of contents](#plan)



### 2.2.1 Additive attention

[Back to table of contents](#plan)

![AttentionAdditive](figs/Attention_Additive.png)
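
For a query $q$ and targets $t_1, ..., t_N$, the `AdditiveAttention` module defined in the next cell can be summarised by the equations below. The symbols are introduced here for illustration: $W, b$ stand for `attn_layer` and $v$ for `attn_v`. With `n_layers = 1` (concat method):

$$
u_i = v^\top \tanh\left(W [q ; t_i] + b\right), \qquad
\alpha_i = \frac{\exp(u_i)}{\sum_{j=1}^{N} \exp(u_j)}, \qquad
c = \sum_{i=1}^{N} \alpha_i \, t_i .
$$

With `n_layers = 0` (dot method) the score reduces to $u_i = t_i^\top q$, which requires `query_dim` and `targets_dim` to be equal. With `n_layers = 2`, a second $\tanh$ layer is applied before the projection by $v$.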

In [44]:
%%writefile modules/Attention_Additive.py

import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttention(nn.Module):
    def __init__(self, embedding_dim, dropout = 0): 
        super(SelfAttention, self).__init__()
        
        # relevant quantities
        self.embedding_dim = embedding_dim
        self.output_dim = embedding_dim
        
        # parameters
        self.dropout = nn.Dropout(p = dropout)
        self.attn_layer = nn.Linear(embedding_dim, embedding_dim)
        self.attn_v = nn.Linear(embedding_dim, 1, bias = False)
        self.act = F.softmax
        
    def forward(self, embeddings):
        weights = self.attn_layer(embeddings).tanh()       # size (minibatch_size, input_length, embedding_dim)
        weights = self.act(self.attn_v(weights), dim = 1)  # size (minibatch_size, input_length, 1)
        weights = torch.transpose(weights, 1, 2)           # size (minibatch_size, 1, input_length)
        attn_applied = torch.bmm(weights, embeddings)      # size (minibatch_size, 1, embedding_dim)
        attn_applied = self.dropout(attn_applied)
        return attn_applied, weights


class AdditiveAttention(nn.Module):
    def __init__(self, 
                 query_dim, 
                 targets_dim, 
                 n_layers = 1
                ): 
        super(AdditiveAttention, self).__init__()
        # relevant quantities
        self.n_level = 1
        self.query_dim = query_dim
        self.targets_dim = targets_dim
        self.output_dim = targets_dim
        self.n_layers = n_layers
        # parameters
        self.attn_layer = nn.Linear(query_dim + targets_dim, targets_dim) if n_layers >= 1 else None
        self.attn_layer2 = nn.Linear(targets_dim, targets_dim) if n_layers >= 2 else None
        self.attn_v = nn.Linear(targets_dim, 1, bias = False) if n_layers >= 1 else None
        self.act = F.softmax
        
    def forward(self, query = None, targets = None):
        '''takes as parameters : 
                a query tensor conditioning the attention,      size = (1, minibatch_size, query_dim)
                a tensor containing attention targets           size = (targets_length, minibatch_size, targets_dim)
           returns : 
                the resulting tensor of the attention process,  size = (1, minibatch_size, targets_dim)
                the attention weights,                          size = (minibatch_size, 1, targets_length)
        '''
        if targets is not None :
            # concat method 
            if self.n_layers >= 1 :
                poids = torch.cat((query.expand(targets.size(0), -1, -1), targets), 2) if query is not None else targets
                poids = self.attn_layer(poids).tanh()                 # size (targets_length, minibatch_size, targets_dim)
                if self.n_layers >= 2 :
                    poids = self.attn_layer2(poids).tanh()            # size (targets_length, minibatch_size, targets_dim)
                attn_weights = self.attn_v(poids)                     # size (targets_length, minibatch_size, 1)
                attn_weights = torch.transpose(attn_weights, 0,1)     # size (minibatch_size, targets_length, 1)
                targets = torch.transpose(targets, 0,1)               # size (minibatch_size, targets_length, targets_dim)
            # dot method
            else :
                targets = torch.transpose(targets, 0,1)               # size (minibatch_size, targets_length, targets_dim)
                query = torch.transpose(query, 0, 1)                  # size (minibatch_size, 1, query_dim)
                query = torch.transpose(query, 1, 2)                  # size (minibatch_size, query_dim, 1)
                attn_weights = torch.bmm(targets, query)              # size (minibatch_size, targets_length, 1)
                
            attn_weights = self.act(attn_weights, dim = 1)        # size (minibatch_size, targets_length, 1)
            attn_weights = torch.transpose(attn_weights, 1,2)     # size (minibatch_size, 1, targets_length)
            attn_applied = torch.bmm(attn_weights, targets)       # size (minibatch_size, 1, targets_dim)
            attn_applied = torch.transpose(attn_applied, 0,1)     # size (1, minibatch_size, targets_dim)

        else :
            attn_applied = query
            attn_weights = None
        return attn_applied, attn_weights

Overwriting modules/Attention_Additive.py
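
To make the concat scoring of `AdditiveAttention` concrete, here is a dependency-free numeric sketch for a single example and a single layer. The function names and the toy weights `W`, `b`, `v` are hypothetical; in the module they are learned parameters.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def additive_attention(query, targets, W, b, v):
    """Score each target with v . tanh(W [query ; target] + b), then average."""
    scores = []
    for target in targets:
        x = query + target  # list concatenation, i.e. [query ; target]
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
                  for row, bi in zip(W, b)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(targets[0])
    context = [sum(a * t[k] for a, t in zip(weights, targets)) for k in range(dim)]
    return context, weights

# Toy values: query_dim = 1, targets_dim = 2
query = [1.0]
targets = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[0.5, 1.0, 0.0], [0.5, 0.0, 1.0]]  # shape (targets_dim, query_dim + targets_dim)
b = [0.0, 0.0]
v = [1.0, 1.0]

context, weights = additive_attention(query, targets, W, b, v)
print(weights)  # the third target gets the largest weight
print(context)
```

The returned context vector has the dimension of the targets, which is why the module sets `output_dim = targets_dim`.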



<a id="attentionAdditiveMultitete"></a>


### 2.2.2 Multi-head additive attention

[Back to table of contents](#plan)

In [45]:
%%writefile modules/Attention_MultiHead.py

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

from . import SelfAttention, AdditiveAttention


class MultiHeadSelfAttention(nn.Module):
    def __init__(self, embedding_dim, n_head = 1, dropout = 0): 
        super(MultiHeadSelfAttention, self).__init__()
        
        # relevant quantities
        self.embedding_dim = embedding_dim
        self.output_dim = n_head * embedding_dim
        self.n_head = n_head
        
        # parameters
        self.attn_list = nn.ModuleList([SelfAttention(embedding_dim, dropout) for i in range(n_head)])
        
    def compute_penalty(self, weights) :
        weights_t = torch.transpose(weights, 1, 2)
        def_pos = [torch.mm(weights[i], weights_t[i]) for i in range(weights.size(0))] # size (minibatch_size, n_heads, n_heads)
        ide = torch.eye(self.n_head, device = weights.device)   # identity on the same device as the weights
        penal = torch.sum(torch.cat([torch.norm(mmt - ide).view(1) for mmt in def_pos]))
        return penal
    
    def forward(self, embeddings, penal = False):
        outputs = [attn(embeddings) for attn in self.attn_list]
        applied = torch.cat([out[0] for out in outputs], dim = 2) # size (minibatch_size, 1, n_heads * embedding_dim)
        weights = torch.cat([out[1] for out in outputs], dim = 1) # size (minibatch_size, n_heads, input_length)
        if penal and self.n_head > 1 :
            penal = self.compute_penalty(weights)
            return applied, weights, penal
        elif penal : 
            return applied, weights, None
        else :
            return applied, weights


        
class MultiHeadAttention(nn.Module):
    '''Module performing additive attention over a sequence of vectors stored in
       a memory block, conditioned by some vector. At instantiation it takes as input :
       
                - query_dim : the dimension of the conditionning vector
                - targets_dim : the dimension of vectors stored in memory
                
      Other ideas on Multi head attention on 
      https://github.com/jadore801120/attention-is-all-you-need-pytorch/blob/master/transformer/SubLayers.py
      https://github.com/tlatkowski/multihead-siamese-nets/blob/master/layers/attention.py
    '''
    def __init__(self, 
                 device, 
                 n_heads, 
                 query_dim, 
                 targets_dim, 
                 n_layers = 2
                ): 
        super(MultiHeadAttention, self).__init__()
        
        # relevant quantities
        self.device = device
        self.n_level = 1
        self.n_heads = n_heads
        self.n_layers = n_layers
        
        # parameters
        self.attn_modules_list = nn.ModuleList([AdditiveAttention(query_dim, targets_dim, n_layers) for i in range(n_heads)])

    def forward(self, query = None, targets = None):
        '''takes as parameters : 
                a query tensor conditioning the attention,      size = (1, n_heads, query_dim)
                a tensor containing attention targets           size = (targets_length, n_heads, targets_dim)
           returns : 
                the resulting tensor of the attention process,  size = (1, n_heads, targets_dim)
                the attention weights,                          size = (n_heads, 1, targets_length)
        '''
        targets_length = targets.size(0)
        targets_dim    = targets.size(2)
        attn_applied   = Variable(torch.zeros(1, self.n_heads, targets_dim)).to(self.device)
        attn_weights   = torch.zeros(self.n_heads, 1, targets_length).to(self.device)
        for i, attn in enumerate(self.attn_modules_list) :
            # each head attends over its own slice of the query and of the targets
            que = query[:, i, :].unsqueeze(1) if query is not None else None   # size (1, 1, query_dim)
            tar = targets[:, i, :].unsqueeze(1)                                 # size (targets_length, 1, targets_dim)
            attn_appl, attn_wghts = attn(que, tar)
            attn_applied[:, i, :] = attn_appl.squeeze(1)
            attn_weights[i, :, :] = attn_wghts.squeeze(0)
        return attn_applied, attn_weights

Overwriting modules/Attention_MultiHead.py
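
The `compute_penalty` method of `MultiHeadSelfAttention` measures how redundant the heads are: for each example it takes the matrix $A$ of attention weights (one row per head) and computes the Frobenius norm of $A A^\top - I$. A dependency-free sketch for a single example, with a hypothetical function name and toy weights:

```python
import math

def head_diversity_penalty(weights):
    """Frobenius norm of (A A^T - I) for one example; A has one row per head."""
    n_heads = len(weights)
    total = 0.0
    for i in range(n_heads):
        for j in range(n_heads):
            gram = sum(a * b for a, b in zip(weights[i], weights[j]))
            target = 1.0 if i == j else 0.0
            total += (gram - target) ** 2
    return math.sqrt(total)

# Two heads focused on different positions: no penalty
disjoint = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
# Two heads with identical, spread-out weights: positive penalty
redundant = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0]]

print(head_diversity_penalty(disjoint))   # 0.0
print(head_diversity_penalty(redundant))  # 1.0
```

The penalty vanishes only when the heads attend to disjoint positions and each head concentrates on a single position, so adding it to the loss pushes the heads toward distinct, sharp attention patterns.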


<a id="attentionAdditiveMultihoped"></a>


### 2.2.3 Multi-hop additive attention

[Back to table of contents](#plan)

In [46]:
%%writefile modules/Attention_MultiHoped.py

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

from . import AdditiveAttention


class MultiHopedAttention(nn.Module):
    '''Module performing additive attention over a sequence of vectors stored in
       a memory block, conditioned by some vector. At instantiation it takes as input :
       
                - query_dim : the dimension of the conditionning vector
                - targets_dim : the dimension of vectors stored in memory
    '''
    def __init__(self, 
                 device,
                 targets_dim,
                 base_query_dim = 0,
                 hops = 1,
                 share = True,
                 transf = False,
                 dropout = 0
                ):
        super(MultiHopedAttention, self).__init__()

        # dimensions
        self.targets_dim = targets_dim
        self.output_dim = targets_dim
        self.hops_query_dim = self.output_dim if hops > 1 else 0
        self.query_dim = base_query_dim + self.hops_query_dim
        
        # structural coefficients
        self.device = device
        self.n_level = 1
        self.hops = hops
        self.share = share
        self.transf = transf
        self.dropout_p = dropout
        if dropout > 0 : self.dropout = nn.Dropout(p = dropout)
        
        # parameters
        self.attn = AdditiveAttention(self.query_dim, self.targets_dim) 
        self.transf = nn.GRU(self.targets_dim, self.targets_dim) if transf else None
        
        
    def initQuery(self): 
        if self.hops_query_dim > 0 :
            return Variable(torch.zeros(1, 1, self.hops_query_dim)).to(self.device)
        return None
    
    
    def update(self, hops_query, decision_vector):
        if self.transf is not None : _ , update = self.transf(decision_vector, hops_query)
        else                       :     update = hops_query + decision_vector
        return update
    
    
    def forward(self, words_memory, base_query = None):
        attn_weights_list = []
        hops_query = self.initQuery() if (self.hops > 1 and self.share) else None
        
        for hop in range(self.hops) :
            if base_query is not None and hops_query is not None : query = torch.cat((base_query, hops_query), 2) # size (1, self.n_heads, self.query_dim)
            elif base_query is not None                          : query = base_query
            elif hops_query is not None                          : query = hops_query
            else                                                 : query = None
            
            decision_vector, attn_weights = self.attn(query, words_memory)
            attn_weights_list.append(attn_weights)
            hops_query = self.update(hops_query, decision_vector) if (self.hops > 1 and hops_query is not None) else decision_vector
  
        # output decision vector
        return hops_query, attn_weights_list

Overwriting modules/Attention_MultiHoped.py
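
The hop loop of `MultiHopedAttention` can be illustrated without PyTorch. The sketch below assumes no base query and the additive update (`transf = False`), and it swaps the learned additive scoring for a plain dot product to stay short; all names and values are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, memory):
    """Dot-product attention: a simplification of the learned additive scoring."""
    weights = softmax([sum(q * m for q, m in zip(query, vec)) for vec in memory])
    dim = len(memory[0])
    read = [sum(w * vec[k] for w, vec in zip(weights, memory)) for k in range(dim)]
    return read, weights

def multi_hop(memory, hops):
    """Start from a zero query and add what each hop reads from memory."""
    query = [0.0] * len(memory[0])
    history = []
    for _ in range(hops):
        read, weights = attend(query, memory)
        history.append(weights)
        query = [q + r for q, r in zip(query, read)]  # additive update
    return query, history

memory = [[2.0, 0.0], [0.0, 1.0]]
query, history = multi_hop(memory, hops=3)
print(history[0])   # uniform weights on the first hop (zero query)
print(history[-1])  # sharper weights once the query has accumulated information
```

Each hop conditions the next one on what was read so far, so the attention distribution can sharpen from one pass to the next.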


<a id="attentionHierarchique"></a>

## 2.3 Hierarchical attention

[Back to table of contents](#plan)

![HierarchicalAttention](figs/Hierarchical_Attention.png)

In [47]:
%%writefile modules/Attention_Hierarchical_Recurrent.py

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

from . import AdditiveAttention, MultiHeadAttention


class RecurrentHierarchicalAttention(nn.Module):
    '''This attention module is :
    
    - hierarchical, with a bi-GRU between the two attention levels
    - multi-head at each attention level
    - multi-hop overall : several passes can be made to accumulate information
    '''

    def __init__(self, 
                 device,
                 word_hidden_dim, 
                 sentence_hidden_dim,
                 query_dim = 0, 
                 n_heads = 1,
                 n_layers = 1,
                 hops = 1,
                 share = True,
                 transf = False,
                 dropout = 0
                ):
        super(RecurrentHierarchicalAttention, self).__init__()
        
        # dimensions
        self.query_dim = query_dim
        self.word_hidden_dim = word_hidden_dim
        self.sentence_input_dim = self.word_hidden_dim
        self.sentence_hidden_dim = sentence_hidden_dim
        self.context_vector_dim = sentence_hidden_dim * 2
        self.output_dim = self.query_dim if (transf or hops > 0) else self.context_vector_dim   # self.hops is not set yet here
        
        # structural coefficients
        self.device = device
        self.n_level = 2
        self.n_heads = n_heads
        self.n_layers = n_layers
        self.hops = hops
        self.share = share
        self.dropout_p = dropout
        self.dropout = nn.Dropout(p = dropout)
        
        # first attention module
        attn1_list = []
        if share :
            attn1 = MultiHeadAttention(device, n_heads, self.query_dim, self.word_hidden_dim) if n_heads > 1 else \
                    AdditiveAttention(self.query_dim, self.word_hidden_dim) 
            for hop in range(hops) : attn1_list.append(attn1)
            self.attn1 = nn.ModuleList(attn1_list)
        else :
            for hop in range(hops):
                attn1 = MultiHeadAttention(device, n_heads, self.query_dim, self.word_hidden_dim) if n_heads > 1 else \
                        AdditiveAttention(self.query_dim, self.word_hidden_dim) 
                attn1_list.append(attn1)
            self.attn1 = nn.ModuleList(attn1_list)
        
        # intermediate encoder module
        self.bigru = nn.GRU(self.sentence_input_dim, 
                            self.sentence_hidden_dim, 
                            n_layers,
                            dropout=(0 if n_layers == 1 else dropout), 
                            bidirectional=True)
        
        # second attention module
        attn2_list = []
        if share :
            attn2 = MultiHeadAttention(device, n_heads, self.query_dim, self.context_vector_dim) if n_heads > 1 else \
                    AdditiveAttention(self.query_dim, self.context_vector_dim) 
            for hop in range(hops) : attn2_list.append(attn2)
            self.attn2 = nn.ModuleList(attn2_list)
        else :
            for hop in range(hops):
                attn2 = MultiHeadAttention(device, n_heads, self.query_dim, self.context_vector_dim) if n_heads > 1 else \
                        AdditiveAttention(self.query_dim, self.context_vector_dim) 
                attn2_list.append(attn2)
            self.attn2 = nn.ModuleList(attn2_list)
        
        # accumulation step
        self.transf = nn.Linear(self.context_vector_dim, self.output_dim, bias = False) \
                      if (transf or self.hops > 0) else None


    def initQuery(self): 
        if self.query_dim > 0 :
            return Variable(torch.zeros(1, self.n_heads, self.query_dim)).to(self.device)
        return None
        
                
    def initHidden(self): 
        return Variable(torch.zeros(2 * self.n_layers, self.n_heads, self.sentence_hidden_dim)).to(self.device)
        
        
    def singlePass(self, words_memory, query, attn1, attn2): 
        L = len(words_memory)
        attn1_weights = {}
        bigru_inputs = Variable(torch.zeros(L, self.n_heads, self.sentence_input_dim)).to(self.device)
        # first attention layer
        for i in range(L) :
            targets = words_memory[i]                              # size (N_i, 1, 2*word_hidden_dim)
            targets = targets.repeat(1, self.n_heads, 1)           # size (N_i, n_heads, 2*word_hidden_dim)
            attn1_output, attn1_wghts = attn1(query, targets)
            attn1_weights[i] = attn1_wghts
            bigru_inputs[i] = attn1_output.squeeze(0)              # size (n_heads, 2*word_hidden_dim)
        # intermediate biGRU
        bigru_hidden = self.initHidden()
        attn2_inputs, bigru_hidden = self.bigru(bigru_inputs, bigru_hidden)  # size (L, n_heads, 2*word_hidden_dim)
        # second attention layer
        attn2_inputs = self.dropout(attn2_inputs)
        decision_vector, attn2_weights = attn2(query = query, targets = attn2_inputs)
        attn2_weights = attn2_weights.view(-1)
        # output decision vector
        return decision_vector, attn1_weights, attn2_weights
    
    
    def update(self, query, decision_vector):
        update = query + self.transf(decision_vector) if self.transf is not None else query + decision_vector
        return update
        
        
    def forward(self, words_memory, query = None):
        '''takes as parameters : 
                a tensor containing words_memory vectors        dim = (words_memory_length, word_hidden_dim)
                a tensor containing past queries                dim = (words_memory_length, query_dim)
           returns : 
                the resulting decision vector                   dim = (1, 1, query_dim)
                the weights of first attention layer (dict)     
                the weights of second attention layer (dict)
        '''
        attn1_weights_list = []
        attn2_weights_list = []
        if len(words_memory) > 0 :
            if query is not None : query = query.repeat(1, self.n_heads, 1)
            elif self.hops > 1   : query = self.initQuery()
            for hop in range(self.hops) :
                decision_vector, attn1_weights, attn2_weights = self.singlePass(words_memory, 
                                                                                query, 
                                                                                self.attn1[hop], 
                                                                                self.attn2[hop])
                attn1_weights_list.append(attn1_weights)
                attn2_weights_list.append(attn2_weights)
                query = self.update(query, decision_vector)  # size (L, self.n_heads, self.output_dim)

        # output decision vector
        return query, attn1_weights_list, attn2_weights_list

Overwriting modules/Attention_Hierarchical_Recurrent.py
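
The two-level structure of `singlePass` can be sketched in plain Python: a first attention summarises each sentence from its word vectors, and a second attention summarises the dialogue from the sentence vectors. This sketch makes two simplifying assumptions that differ from the module: it uses dot-product scoring instead of learned additive scoring, and it omits the intermediate bi-GRU. All names and values are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, vectors):
    """Dot-product attention returning (weighted average, weights)."""
    weights = softmax([sum(q * x for q, x in zip(query, vec)) for vec in vectors])
    dim = len(vectors[0])
    read = [sum(w * vec[k] for w, vec in zip(weights, vectors)) for k in range(dim)]
    return read, weights

def hierarchical_attention(query, sentences):
    """sentences: list of sentences, each a list of word vectors."""
    sentence_vectors, word_weights = [], []
    for words in sentences:
        vec, w = attend(query, words)  # first level: over the words of one sentence
        sentence_vectors.append(vec)
        word_weights.append(w)
    # The module runs a bi-GRU over sentence_vectors at this point (omitted here).
    context, sentence_weights = attend(query, sentence_vectors)  # second level
    return context, word_weights, sentence_weights

query = [1.0, 0.0]
sentences = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]],
]
context, word_weights, sentence_weights = hierarchical_attention(query, sentences)
print(word_weights)
print(sentence_weights)  # the first sentence matches the query best
```

As in the module, the first level yields one set of weights per sentence and the second level yields a single set of weights over sentences.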


<a id="decodeurs"></a>

## 2.4 Decoding modules


[Back to table of contents](#plan)

<a id="decodeursSelectifs"></a>


### 2.4.1 Selective decoder

[Back to table of contents](#plan)

In [48]:
%%writefile modules/Decoder_Classes.py

import torch
import torch.nn as nn
import torch.nn.functional as F


class ClassDecoder(nn.Module):
    
    def __init__(self, text_dim, n_classes) :
        super(ClassDecoder, self).__init__() 
        self.version = 'class'
        self.n_classes = n_classes
        self.classes_decoder = nn.Linear(text_dim, n_classes)

    def forward(self, text_vector, train_mode = False):
        classes_vector = self.classes_decoder(text_vector).view(-1)
        if train_mode :
            return classes_vector
        else :
            classes = F.softmax(classes_vector, dim = 0)   # classes_vector is 1-D after view(-1)
            topv, topi = classes.data.topk(1)
            result = topi[0].cpu().numpy()                 # index of the most probable class
            return result   

Overwriting modules/Decoder_Classes.py
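
At inference time, `ClassDecoder` turns the linear layer's scores into probabilities and returns the index of the most probable class. A dependency-free sketch of that step, with a hypothetical helper name and toy scores:

```python
import math

def predict_class(logits):
    """Return (index of the most probable class, its softmax probability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]

best, prob = predict_class([0.1, 2.0, -1.0])
print(best)  # 1
print(prob)
```

Since softmax preserves the ordering of the scores, the predicted class is the same as the argmax of the raw scores; the probabilities only matter when a confidence value is needed. In `train_mode` the module returns the raw scores instead, presumably so that a loss can be applied to them directly.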


<a id="decodeurGeneratif"></a>


### 2.4.2 Generative decoder

[Back to table of contents](#plan)

![Decoder](figs/Decoder.png)

In [49]:
%%writefile modules/Decoder_Words.py

import random
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


class WordsDecoder(nn.Module):
    '''Transforms a vector into a sequence of words'''
    def __init__(self, 
                 device, 
                 embedding, 
                 hidden_dim, 
                 tracking_dim, 
                 dropout = 0.1,
                 tf_ratio = 1,
                 EOS_token = 1,
                 bound = 25
                ):
        super(WordsDecoder, self).__init__()
        # relevant quantities
        self.device = device
        self.hidden_dim = hidden_dim
        self.tracking_dim = tracking_dim
        self.tf_ratio = tf_ratio
        self.EOS_token = EOS_token
        self.bound = bound
        # modules
        self.embedding = embedding
        for p in embedding.parameters() :
            lang_size     = p.data.size(0)
            embedding_dim = p.data.size(1)
        self.gru = nn.GRU(embedding_dim + tracking_dim, tracking_dim)
        self.out = nn.Linear(tracking_dim, lang_size)
        self.dropout = nn.Dropout(dropout)
        
        
    def generateWord(self, query_vector, hidden, current_word_index):
        # update hidden state
        embedded = self.embedding(current_word_index).view(1, 1, -1)
        if query_vector is not None : embedded = torch.cat((query_vector, embedded), dim = 2)
        embedded = self.dropout(embedded)
        _, hidden = self.gru(embedded, hidden)
        # generate next word
        vector = self.out(hidden).squeeze(0)
        log_proba = F.log_softmax(vector, dim = 1)
        return log_proba, hidden
    
    
    def forward(self, query_words, query_vector, decision_vector, target_answer = None) :
        log_probas = []
        answer = []
        di = 0
        ta = target_answer if random.random() < self.tf_ratio else None
        current_word_index = Variable(torch.LongTensor([[0]])).to(self.device) # SOS_token
        hidden = self.dropout(decision_vector)
        for di in range(self.bound) :
            log_proba, hidden = self.generateWord(decision_vector, hidden, current_word_index) 
            topv, topi = log_proba.data.topk(1)
            log_probas.append(log_proba)
            ni = topi[0][0] # index of current generated word
            if ni == self.EOS_token : # EOS_token
                break
            elif ta is not None : # Teacher forcing
                answer.append(ni)
                if di < ta.size(0) : current_word_index = ta[di].to(self.device)
                else               : break
            else :
                answer.append(ni)
                current_word_index = Variable(torch.LongTensor([[ni]])).to(self.device)
        return answer, log_probas

Overwriting modules/Decoder_Words.py
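
The decoding loop above mixes two regimes: teacher forcing, where the ground-truth word is fed back at each step, and free running, where the decoder consumes its own greedy prediction. The sketch below isolates that control flow in a stripped-down module. `TinyGreedyDecoder`, its sizes, and the zero initial hidden state are hypothetical and serve only as illustration. The token conventions (`SOS = 0`, `EOS = 1`) follow the code above.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

SOS_TOKEN, EOS_TOKEN = 0, 1  # same conventions as the decoders above


class TinyGreedyDecoder(nn.Module):
    """Hypothetical, stripped-down decoder used only for illustration."""

    def __init__(self, vocab_size, embedding_dim, hidden_dim, bound=25):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.gru = nn.GRU(embedding_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)
        self.bound = bound

    def forward(self, hidden, target=None, tf_ratio=1.0):
        # Decide once per sequence whether ground-truth words are fed back in.
        use_teacher = target is not None and random.random() < tf_ratio
        word = torch.tensor([[SOS_TOKEN]])
        answer, log_probas = [], []
        for step in range(self.bound):
            embedded = self.embedding(word).view(1, 1, -1)
            _, hidden = self.gru(embedded, hidden)
            log_proba = F.log_softmax(self.out(hidden).squeeze(0), dim=1)
            log_probas.append(log_proba)
            predicted = log_proba.argmax(dim=1).item()
            if predicted == EOS_TOKEN:
                break
            answer.append(predicted)
            if use_teacher:
                if step >= target.size(0):
                    break
                word = target[step].view(1, 1)       # feed the ground-truth word
            else:
                word = torch.tensor([[predicted]])   # feed the model's own guess
        return answer, log_probas


torch.manual_seed(0)
decoder = TinyGreedyDecoder(vocab_size=10, embedding_dim=4, hidden_dim=6, bound=5)
answer, log_probas = decoder(torch.zeros(1, 1, 6))
```

The list `log_probas` is what a loss such as `nn.NLLLoss` would consume during training, while `answer` holds the greedy word indices emitted before the end-of-sequence token or the length bound is reached.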


#### Smooth version

In [50]:
%%writefile modules/Decoder_Words_Smooth.py

import random
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


class SmoothWordsDecoder(nn.Module):
    '''Transforms a vector into a sequence of words'''
    def __init__(self, 
                 device,
                 embedding, 
                 hidden_dim, 
                 tracking_dim, 
                 dropout = 0.1,
                 tf_ratio = 1,
                 bound = 25
                ):
        super(SmoothWordsDecoder, self).__init__()
        # relevant quantities
        self.device = device
        self.hidden_dim = hidden_dim
        self.tracking_dim = tracking_dim
        self.tf_ratio = tf_ratio
        self.bound = bound
        for p in embedding.parameters() :
            lang_size     = p.data.size(0)
            embedding_dim = p.data.size(1)
        self.lang_size = lang_size
        # modules
        # linear embedding acting on one-hot word vectors of size lang_size
        self.embedding = nn.Linear(lang_size, embedding_dim, bias = False)
        # TODO : put embedding weights into the self.embedding layer
        self.gru = nn.GRU(embedding_dim + tracking_dim, tracking_dim)
        self.out = nn.Linear(tracking_dim, lang_size)
        self.dropout = nn.Dropout(dropout)
        
        
    def embedWord(self, word_index):
        # one-hot encode the word index, then apply the linear embedding
        one_hot = torch.zeros(1, 1, self.lang_size, device = self.device)
        one_hot[0, 0, int(word_index)] = 1
        return self.embedding(one_hot)
        
        
    def generateWord(self, query_vector, hidden, embedded):
        # update hidden state
        if query_vector is not None : embedded = torch.cat((query_vector, embedded), dim = 2)
        embedded = self.dropout(embedded)
        _, hidden = self.gru(embedded, hidden)
        # generate next word
        vector = self.out(hidden).squeeze(0)
        log_proba = F.log_softmax(vector, dim = 1)
        return log_proba, hidden
    
    
    def forward(self, query_words, query_vector, decision_vector, target_answer = None) :
        log_probas = []
        answer = []
        ta = target_answer if random.random() < self.tf_ratio else None
        current_embedded_word = self.embedWord(0) # SOS_token
        hidden = self.dropout(decision_vector)
        for di in range(self.bound) :
            log_proba, hidden = self.generateWord(decision_vector, hidden, current_embedded_word) 
            topv, topi = log_proba.data.topk(1)
            log_probas.append(log_proba)
            ni = topi[0][0] # index of current generated word
            if ni == 1 : # EOS_token
                break
            elif ta is not None : # Teacher forcing
                answer.append(ni)
                if di < ta.size(0) :
                    current_embedded_word = self.embedWord(ta[di])
                else :
                    break
            else :
                answer.append(ni)
                current_embedded_word = self.embedWord(ni)
        return answer, log_probas

Overwriting modules/Decoder_Words_Smooth.py


<a id="decodeurGeneratifAttention"></a>


### 2.4.3 Generative decoder with attention

[Back to table of contents](#plan)

In [51]:
%%writefile modules/Decoder_Words_Attn.py

import random
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

from . import AdditiveAttention


class AttnWordsDecoder(nn.Module):
    '''Transforms a vector into a sequence of words'''
    def __init__(self, 
                 device, 
                 embedding, 
                 hidden_dim, 
                 tracking_dim,
                 n_layers = 0, 
                 dropout = 0.1,
                 tf_ratio = 1,
                 bound = 25
                ):
        super(AttnWordsDecoder, self).__init__()
        # relevant quantities
        self.device = device
        self.hidden_dim = hidden_dim
        self.tracking_dim = tracking_dim
        self.n_layers = n_layers
        self.tf_ratio = tf_ratio
        self.bound = bound
        # modules
        self.embedding = embedding
        for p in embedding.parameters() :
            lang_size     = p.data.size()[0]
            embedding_dim = p.data.size()[1]
        self.gru = nn.GRU(embedding_dim + tracking_dim, hidden_dim)
        self.attn = AdditiveAttention(hidden_dim, hidden_dim, n_layers) 
        self.concat = nn.Linear(2 * hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, lang_size)
        self.dropout = nn.Dropout(dropout)
        
        
    def generateWord(self, query_words, query_vector, hidden, current_word_index):
        # update hidden state
        embedded = self.embedding(current_word_index).view(1, 1, -1)
        if query_vector is not None : embedded = torch.cat((query_vector, embedded), dim = 2)
        embedded = self.dropout(embedded)
        _, hidden = self.gru(embedded, hidden)
        # generate next word
        attn, attn_weights = self.attn(hidden, query_words)
        vector = self.concat(torch.cat((hidden, attn), dim = 2)).tanh()
        vector = self.out(vector).squeeze(0)
        log_proba = F.log_softmax(vector, dim = 1)
        return log_proba, hidden
    
    
    def forward(self, query_words, query_vector, decision_vector, target_answer = None) :
        log_probas = []
        answer = []
        di = 0
        ta = target_answer if random.random() < self.tf_ratio else None
        current_word_index = Variable(torch.LongTensor([[0]])).to(self.device) # SOS_token
        hidden = self.dropout(decision_vector)
        for di in range(self.bound) :
            log_proba, hidden = self.generateWord(query_words, query_vector, hidden, current_word_index)
            topv, topi = log_proba.data.topk(1)
            log_probas.append(log_proba)
            ni = topi[0][0] # index of current generated word
            if ni == 1 : # EOS_token
                break
            elif ta is not None : # Teacher forcing
                answer.append(ni)
                if di < ta.size(0) :
                    current_word_index = ta[di].to(self.device)
                else :
                    break
            else :
                answer.append(ni)
                current_word_index = Variable(torch.LongTensor([[ni]])).to(self.device)
        return answer, log_probas

Overwriting modules/Decoder_Words_Attn.py


<a id="decodeurGeneratifML"></a>


### 2.4.4 Generative decoder with language model

[Back to table of contents](#plan)

In [52]:
%%writefile modules/Decoder_Words_LM.py

import random
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable



class LMWordsDecoder(nn.Module):
    '''Transforms a vector into a sequence of words'''
    def __init__(self, 
                 device, 
                 language_model, 
                 hidden_dim, 
                 tracking_dim, 
                 dropout = 0.1,
                 tf_ratio = 1,
                 bound = 25
                ):
        super(LMWordsDecoder, self).__init__()
        # relevant quantities
        self.device = device
        self.hidden_dim = hidden_dim
        self.tracking_dim = tracking_dim
        self.tf_ratio = tf_ratio
        self.lm_ratio = 0.25
        self.bound = bound
        self.pos = 0
        self.max_pos = 3
        # modules
        self.language_model = language_model.to(self.device)
        for param in self.language_model.parameters() : param.requires_grad = False 
        self.embedding = self.language_model.embedding
        for p in self.embedding.parameters() :
            lang_size     = p.data.size(0)
            embedding_dim = p.data.size(1)
        self.gru = nn.GRU(embedding_dim + hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, lang_size)
        self.dropout = nn.Dropout(dropout)
        
        
    def generateWord(self, query_vector, hidden, hidden_lm, current_word_index, current_word_index_lm):
        # update hidden state
        embedded = self.embedding(current_word_index).view(1, 1, -1)
        if query_vector is not None : embedded = torch.cat((query_vector, embedded), dim = 2)
        embedded = self.dropout(embedded)
        _, hidden = self.gru(embedded, hidden)
        # generate next word
        vector = self.out(hidden).squeeze(0)
        log_proba = F.log_softmax(vector, dim = 1)
        # Language Model contribution
        log_proba_lm, hidden_lm = self.language_model.generateWord(current_word_index_lm, hidden_lm)
        return log_proba + (self.pos/self.max_pos) * self.lm_ratio * log_proba_lm, hidden, hidden_lm
    
    
    def forward(self, query_words, query_vector, decision_vector, target_answer = None) :
        log_probas = []
        answer = []
        di = 0
        ta = target_answer if random.random() < self.tf_ratio else None
        current_word_index    = Variable(torch.LongTensor([[0]])).to(self.device) # SOS_token
        current_word_index_lm = Variable(torch.LongTensor([[0]])).to(self.device) # SOS_token
        hidden    = self.dropout(decision_vector)
        hidden_lm = None
        for di in range(self.bound) :
            self.pos = min(di, self.max_pos)
            log_proba, hidden, hidden_lm = self.generateWord(query_vector, 
                                                             hidden, 
                                                             hidden_lm, 
                                                             current_word_index, 
                                                             current_word_index_lm)
            topv, topi = log_proba.data.topk(1)
            log_probas.append(log_proba)
            ni = topi[0][0] # index of current generated word
            if ni == 1 : # EOS_token
                break
            elif ta is not None : # Teacher forcing
                answer.append(ni)
                if di < ta.size(0) :
                    current_word_index    = ta[di].view(-1, 1).to(self.device)
                    current_word_index_lm = target_answer[di].view(-1, 1).to(self.device) if di < target_answer.size(0) else \
                                            target_answer[-1].view(-1, 1).to(self.device)
                else :
                    break
            else :
                answer.append(ni)
                current_word_index    = Variable(torch.LongTensor([[ni]])).to(self.device)
                if target_answer is not None and di < target_answer.size(0): 
                    current_word_index_lm = target_answer[di].view(-1, 1).to(self.device)
                else :
                    current_word_index_lm = Variable(torch.LongTensor([[ni]])).to(self.device)
        return answer, log_probas

Overwriting modules/Decoder_Words_LM.py
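
In `LMWordsDecoder`, the language model's log-probabilities are added to the decoder's with a weight that ramps up linearly over the first `max_pos` steps and then stays at `lm_ratio`. The pure-Python sketch below reproduces that arithmetic with the constants from the class (`max_pos = 3`, `lm_ratio = 0.25`). The helper names `lm_weight` and `combine` are hypothetical.

```python
def lm_weight(step, max_pos=3, lm_ratio=0.25):
    """Weight applied to the language-model log-probabilities at a decoding step."""
    pos = min(step, max_pos)
    return (pos / max_pos) * lm_ratio


def combine(log_proba, log_proba_lm, step):
    """Element-wise fusion of decoder and language-model scores."""
    w = lm_weight(step)
    return [lp + w * lm for lp, lm in zip(log_proba, log_proba_lm)]


# Weight at decoding steps 0..4: zero at the first step, capped at lm_ratio.
weights = [lm_weight(step) for step in range(5)]
```

The first generated word is therefore driven by the decoder alone, and the language model reaches its full weight from the fourth word onward. The fused scores are not renormalised, so they are no longer log-probabilities in the strict sense. This does not affect the greedy argmax but does matter if they are passed to a loss that expects normalised values.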


<a id="misc"></a>

# 3 Miscellaneous

[Back to table of contents](#plan)

In [53]:
%mkdir misc

A subdirectory or file misc already exists.


In [54]:
%%writefile misc/__init__.py


__all__ = [
    'NoiseFilter',
    'NoiseFilterWrapper',
    'NoiseFilterTrainer']

Overwriting misc/__init__.py


<a id="FiltreAntiBruit"></a>

## M.1 Noise filter

[Back to table of contents](#plan)

In [55]:
%%writefile misc/Noise_Filter.py

import math
import time
import unicodedata
import re
import random

import torch
import torch.nn as nn
from torch import optim
import torch.nn.functional as F
from torch.autograd import Variable


class NoiseFilter(nn.Module):

    def __init__(self, chatbot, pretrained = True, layers = [50], dropout = 0.15):
        super(NoiseFilter, self).__init__()
        
        # modules        
        self.device = chatbot.device
        self.chatbot = chatbot
        if pretrained : 
            for param in chatbot.parameters() : param.requires_grad = False
        self.decoder = nn.ModuleList([nn.Linear(chatbot.encoder.output_dim, layers[0])] + 
                                     [nn.Linear(layers[i], layers[i+1]) for i in range(len(layers)-1) if len(layers) > 1] +
                                     [nn.Linear(layers[-1], 2)])
        self.dropout = nn.Dropout(dropout)
        
        
    # ---------------------- Technical methods -----------------------------
    def nbParametres(self) :
        count = 0
        for p in self.parameters():
            if p.requires_grad == True : count += p.data.nelement()
        return count
        
        
    # ------------ 2nd working mode : test mode ------------
    def forward(self, input):

        sentence = self.chatbot.variableFromSentence(input)
        if sentence is None :
            return 0, None # same arity as the regular return below
        else :
            sentence = sentence.to(self.device)
            last_words, hidden = self.chatbot.encoder(sentence)
            hidden = self.dropout(hidden.view(1,1,-1))
            for layer in self.decoder : hidden = self.dropout(F.relu(layer(hidden)))
                
            log_probas  = F.log_softmax(hidden.view(1, -1), dim = 1)
            topv, topi = log_probas.data.topk(1)
            predict = topi[0][0]
            return predict, log_probas
        
        

        
class NoiseFilterWrapper(nn.Module) :
    def __init__(self, noise_filter, chatbot) :
        
        super(NoiseFilterWrapper, self).__init__()
        self.noise_filter = noise_filter
        self.chatbot = chatbot
        self.basic_answer = "Je n'ai pas compris, merci de reformuler la question"
        
    def forward(self, sentence) :
        #print(self.noise_filter(sentence)[1].data)
        if self.noise_filter(sentence)[0] == 1 : return self.chatbot(sentence)
        else                                   : return self.basic_answer, None, None
        
        

class NoiseFilterTrainer(object):
    def __init__(self, 
                 device,
                 criterion = nn.NLLLoss(), 
                 optimizer = optim.SGD, 
                 clipping = 10, 
                 print_every=100):
        
        # relevant quantities
        self.device = device
        self.criterion = criterion.to(device)
        self.optimizer = optimizer
        self.clip = clipping
        self.print_every = print_every# timer
        
        
    def asMinutes(self, s):
        m = math.floor(s / 60)
        s -= m * 60
        return '%dm %ds' % (m, s)


    def timeSince(self, since, percent):
        now = time.time()
        s = now - since
        es = s / (percent)
        rs = es - s
        return '%s (- %s)' % (self.asMinutes(s), self.asMinutes(rs))
        
        
    def distance(self, probas, target_var) :
        """ Compute cumulated error between predicted output and ground answer."""
        loss = self.criterion(probas, target_var)
        topv, topi = probas.data.topk(1)
        ni = topi[0][0] # index of the predicted class
        loss_diff = int(ni != target_var.item())
        return loss, loss_diff
        
        
    def trainLoop(self, agent, sentence, target, optimizer):
        """Performs a training loop, with forward pass and backward pass for gradient optimisation."""
        optimizer.zero_grad()
        target_var = Variable(torch.LongTensor([target])).to(self.device)
        answer, log_probas = agent(sentence) 
        loss = self.criterion(log_probas, target_var)
        loss_diff_mots = int(answer != target_var.item())
        
        loss.backward()
        _ = torch.nn.utils.clip_grad_norm_(agent.parameters(), self.clip)
        optimizer.step()
        return loss.item(), loss_diff_mots
        
        
    def train(self, agent, sentences, n_iters = 10000, learning_rate=0.01):
        """Performs training over a given dataset and along a specified amount of loops."""
        start = time.time()
        optimizer = self.optimizer([param for param in agent.parameters() if param.requires_grad == True], lr=learning_rate)
        print_loss_total = 0  
        print_loss_diff_mots_total = 0
        for iter in range(1, n_iters + 1):
            training_sentence = random.choice(sentences)
            sentence = training_sentence[0]
            target   = training_sentence[1]

            loss, loss_diff_mots = self.trainLoop(agent, sentence, target, optimizer)
            # accumulate the errors made on the current answer
            print_loss_total += loss
            print_loss_diff_mots_total += loss_diff_mots       
            if iter % self.print_every == 0:
                print_loss_avg = print_loss_total / self.print_every
                print_loss_diff_mots_avg = print_loss_diff_mots_total / self.print_every
                print_loss_total = 0
                print_loss_diff_mots_total = 0
                print('%s (%d %d%%) %.4f %.2f' % (self.timeSince(start, iter / n_iters),
                                             iter, iter / n_iters * 100, print_loss_avg, print_loss_diff_mots_avg))

Overwriting misc/Noise_Filter.py
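
The classification head of `NoiseFilter` is built from the `layers` argument: the encoder output is mapped to `layers[0]`, each hidden size is chained to the next, and the last one is mapped to 2 outputs (noise or not). The sketch below only computes the resulting `(in_features, out_features)` pairs. The helper `head_shapes` and the encoder dimension of 100 are hypothetical.

```python
def head_shapes(input_dim, layers, n_outputs=2):
    """Return the (in_features, out_features) pair of each linear layer in the head."""
    dims = [input_dim] + list(layers) + [n_outputs]
    return list(zip(dims[:-1], dims[1:]))


default_head = head_shapes(100, [50])      # the default layers = [50]
deeper_head = head_shapes(100, [50, 20])   # one extra hidden layer
```

With the default `layers = [50]` the head has two linear layers, and each additional entry in `layers` adds exactly one more.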


<a id="utils"></a>

# 4 Utils

[Back to table of contents](#plan)

In [56]:
%mkdir utils

A subdirectory or file utils already exists.


In [57]:
%%writefile utils/__init__.py

from .Lang import Lang
from .Attention_Weight_Visualization import heatmap, annotate_heatmap

__all__ = [
    'Lang',
    'heatmap',
    'annotate_heatmap']

Overwriting utils/__init__.py


<a id="lang"></a>

## 4.1 Language

[Back to table of contents](#plan)

In [58]:
%%writefile utils/Lang.py


import nltk # needed by addSentence to tokenize raw strings


class Lang:
    def __init__(self, corpus = None, base_tokens = ['UNK'], min_count = None):
        self.base_tokens = base_tokens
        self.initData(base_tokens)
        if    corpus is not None : self.addCorpus(corpus)
        if min_count is not None : self.removeRareWords(min_count)

        
    def initData(self, base_tokens) :
        self.word2index = {word : i for i, word in enumerate(base_tokens)}
        self.index2word = {i : word for i, word in enumerate(base_tokens)}
        self.word2count = {word : 0 for word in base_tokens}
        self.n_words = len(base_tokens)
        return
    
    def getIndex(self, word) :
        if    word in self.word2index : return self.word2index[word]
        elif 'UNK' in self.word2index : return self.word2index['UNK']
        return
        
    def addWord(self, word):
        '''Add a word to the language'''
        if word not in self.word2index:
            if word.strip() != '' :
                self.word2index[word] = self.n_words
                self.word2count[word] = 1
                self.index2word[self.n_words] = word
                self.n_words += 1
        else:
            self.word2count[word] += 1
        return 
            
    def addSentence(self, sentence):
        '''Add to the language all words of a sentence'''
        words = sentence if type(sentence) == list else nltk.word_tokenize(sentence)
        for word in words : self.addWord(word)          
        return
            
    def addCorpus(self, corpus):
        '''Add to the language all words contained into a corpus'''
        for text in corpus : self.addSentence(text)
        return 
                
    def removeRareWords(self, min_count):
        '''remove words appearing lesser than a min_count threshold'''
        kept_word2count = {word: count for word, count in self.word2count.items() if count >= min_count}
        self.initData(self.base_tokens)
        for word, count in kept_word2count.items(): 
            self.addWord(word)
            self.word2count[word] = kept_word2count[word]
        return

Overwriting utils/Lang.py
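
The behaviour of `Lang` with a `min_count` threshold can be illustrated without the class itself. The sketch below is a standalone approximation built on `collections.Counter`. The helpers `build_vocab` and `get_index` and the toy corpus are hypothetical, and sentences are assumed to be pre-tokenized lists, so `nltk` is not needed.

```python
from collections import Counter


def build_vocab(corpus, base_tokens=('UNK',), min_count=1):
    """Map words to indices, keeping base tokens first and dropping rare words."""
    counts = Counter(word for sentence in corpus for word in sentence)
    word2index = {word: i for i, word in enumerate(base_tokens)}
    for word, count in counts.items():
        if count >= min_count and word not in word2index:
            word2index[word] = len(word2index)
    return word2index


def get_index(word2index, word):
    """Fall back to the 'UNK' index for out-of-vocabulary words."""
    return word2index.get(word, word2index.get('UNK'))


corpus = [['the', 'cat', 'sleeps'], ['the', 'dog', 'sleeps'], ['the', 'cat', 'eats']]
vocab = build_vocab(corpus, min_count=2)
```

Here `dog` and `eats` each occur once and are dropped, so looking them up returns the index of `UNK`, as `Lang.getIndex` does when `'UNK'` is among the base tokens.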


<a id="attn_viz"></a>

## 4.2 Attention weights visualization

[Back to table of contents](#plan)

In [59]:
%%writefile utils/Attention_Weight_Visualization.py

import numpy as np
import matplotlib.ticker
import matplotlib.pyplot as plt


# single attention head over a sequence of words
def heatmap(data, row_labels, col_labels, ax = None, cbar_kw = {}, cbarlabel = "", **kwargs):
    if not ax: ax = plt.gca()
    # Plot the heatmap
    im = ax.imshow(data, **kwargs)
    # We want to show all ticks...
    ax.set_xticks(np.arange(data.shape[1]))
    ax.set_yticks(np.arange(data.shape[0]))
    # ... and label them with the respective list entries.
    ax.set_xticklabels(col_labels)
    ax.set_yticklabels(row_labels)
    # Let the horizontal axes labeling appear on top.
    ax.tick_params(top=True, bottom=False, labeltop=True, labelbottom=False)
    # Rotate the tick labels and set their alignment.
    plt.setp(ax.get_xticklabels(), rotation=-30, ha="right",
             rotation_mode="anchor")
    # Turn spines off and create white grid.
    for edge, spine in ax.spines.items():
        spine.set_visible(False)

    ax.set_xticks(np.arange(data.shape[1]+1)-.5, minor=True)
    ax.set_yticks(np.arange(data.shape[0]+1)-.5, minor=True)
    ax.grid(which="minor", color="w", linestyle='-', linewidth=3)
    ax.tick_params(which="minor", bottom=False, left=False)
    return im

def annotate_heatmap(im, data = None, valfmt = "{x:.2f}", textcolors = ["black", "white"], threshold = None, **textkw):
    if not isinstance(data, (list, np.ndarray)):
        data = im.get_array()
    # Normalize the threshold to the images color range.
    if threshold is not None:
        threshold = im.norm(threshold)
    else:
        threshold = im.norm(data.max())/2.
    # Set default alignment to center, but allow it to be
    # overwritten by textkw.
    kw = dict(horizontalalignment="center",
              verticalalignment="center")
    kw.update(textkw)
    # Get the formatter in case a string is supplied
    if isinstance(valfmt, str):
        valfmt = matplotlib.ticker.StrMethodFormatter(valfmt)
    # Loop over the data and create a `Text` for each "pixel".
    # Change the text's color depending on the data.
    texts = []
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            kw.update(color=textcolors[int(im.norm(data[i, j]) > threshold)])
            text = im.axes.text(j, i, valfmt(data[i, j], None), **kw)
            texts.append(text)
    return texts

Overwriting utils/Attention_Weight_Visualization.py


Return to the dashboard's working directory:

In [60]:
%cd ..

C:\Users\Jb\Desktop\NLP\chatNLP


[Back to table of contents](#plan)

<a id="basDePage"></a>