<a id="plan"></a>

# Dashboard chatNLP

## Table of contents

1. [Models](#modeles)

|  | Memoryless | With rule-based memory | With agnostic memory |
|------|------|------|------|
| **Selective** | [1.1.1](#ChatbotsSelectifsSansMemoire) | [1.2.1](#ChatbotsSelectifsAvecMemoireRegles) | [1.3.1](#ChatbotsSelectifsAvecMemoireAgnostique) |
| **Generative** | [1.1.2](#ChatbotsGeneratifsSansMemoire) | [1.2.2](#ChatbotsGeneratifsAvecMemoireRegles) | [1.3.2](#ChatbotsGeneratifsAvecMemoireAgnostique) |

    

2. [Modules](#modules)

    2.1 [Text encoders](#encodeursDeTexte)
        2.1.1 Word encoders
        
    2.2 [Simple attention modules](#attentionSimple)
        2.2.1 Additive attention
        2.2.2 Multi-head additive attention
        2.2.3 Multi-hop additive attention
        
    2.3 [Hierarchical attention modules](#attentionHierarchique)
    
    2.4 [Decoders](#decodeurs)
        2.4.1 Selective decoder
        2.4.2 Generative decoder
        2.4.3 Generative decoder with attention
        2.4.4 Generative pointer decoder


[Bottom of page](#basDePage)

Create the main directory that will contain the library, move into it, and generate a README.txt file with a brief presentation of the library:

In [181]:
%mkdir chatNLP

A subdirectory or file chatNLP already exists.


In [182]:
%cd chatNLP

C:\Users\Jb\Desktop\Scripts\notebooks\chatNLP


In [183]:
%%writefile README.txt


Inspiration for building the library:
    
https://github.com/pytorch/fairseq
https://github.com/allenai/allennlp
https://www.dabeaz.com/modulepackage/ModulePackage.pdf

Overwriting README.txt


Turn the current directory into a Python package:

In [184]:
%%writefile __init__.py

#import chatNLP.modules
#import chatNLP.models

Overwriting __init__.py


<a id="modeles"></a>

# 1 Models

[Back to table of contents](#plan)

Create the _chatNLP.models_ subpackage, which contains all the deep learning models developed in this library:

In [185]:
%mkdir models

A subdirectory or file models already exists.


In [186]:
%%writefile models/__init__.py


from .Chatbot import (Chatbot,
                      CreateBot,
                      BotTrainer)


__all__ = [
    'Chatbot',
    'CreateBot',
    'BotTrainer']

Overwriting models/__init__.py


<a id="ChatbotsGeneratifsAvecMemoireAgnostique"></a>

## 1.3.2 Generative chatbots with agnostic memory

[Back to table of contents](#plan)
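
Before an utterance reaches the encoder, the `Chatbot` class defined in the next cell normalizes the raw string and converts it to word indices (method `variableFromSentence`). The following minimal sketch reproduces that preprocessing with the standard library only. The tiny `word2index` vocabulary and the helper name `sentence_to_indices` are hypothetical and serve only as an illustration; the real model reads its vocabulary from `lang.word2index`.

```python
import re
import unicodedata

def unicode_to_ascii(s):
    """Strip diacritics: decompose characters (NFD) and drop combining marks."""
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')

def normalize_string(sentence):
    """Lowercase, strip accents, and replace runs of rare symbols by one space."""
    sentence = unicode_to_ascii(sentence.lower().strip())
    return re.sub(r"[^a-zA-Z0-9?&%\-_]+", " ", sentence)

# Hypothetical vocabulary, for illustration only
word2index = {'UNK': 0, 'EOS': 1, 'ou': 2, 'est': 3, 'le': 4, 'cafe': 5}

def sentence_to_indices(sentence, word2index):
    """Map each clean word to its index, fall back to 'UNK', then append 'EOS'."""
    words = normalize_string(sentence).split()
    indices = [word2index.get(w, word2index['UNK']) for w in words]
    indices.append(word2index['EOS'])
    return indices

print(normalize_string("Où est le Café ?"))
print(sentence_to_indices("Où est le café, svp ?", word2index))
```

In this sketch, out-of-vocabulary words such as `svp` and the isolated `?` are both mapped to the `UNK` index.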

In [187]:
%%writefile models/Chatbot.py

import math
import time
import unicodedata
import re
import random

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker #, FuncFormatter
#%matplotlib inline

import numpy as np

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import optim
from torch.autograd import Variable
from chatNLP.modules import (RecurrentWordsEncoder, 
                            
                            AdditiveAttention,
                            MultiHeadAttention,
                            MultiHopedAttention,
                            RecurrentHierarchicalAttention, 
                            
                            WordsDecoder,
                            AttnWordsDecoder)





class Chatbot(nn.Module):
    """Conversational agent with a bi-GRU encoder, taking as parameters at training time :
    
            -a complete dialogue of the form (with each content as string)
    
                    [['question 1', 'answer 1'],
                     ['question 2', 'answer 2'],
                             ..........
                     ['current question', 'current answer']]
     
            -the current answer for teacher forcing, or None
    
    and at test time :
    
            -the current question as string
    
    Returns :
     
            -word indices of the generated answer, according to output language of the model
            -attention weights of first attention layer, or None if no attention
            -attention weights of second attention layer, or None if no attention
    """
    def __init__(self, device, lang, encoder, attention, decoder):
        super(Chatbot, self).__init__()
        
        # relevant quantities
        self.lang = lang 
        self.device = device
        self.n_level = attention.n_level if attention is not None else 1
        self.memory_dim = encoder.output_dim
        self.memory_length = 0
        # modules        
        self.encoder = encoder
        self.attention = attention
        self.decoder = decoder
        
        
        
    # ---------------------- Technical methods -----------------------------
    def loadSubModule(self, encoder = None, attention = None, decoder = None) :
        if encoder is not None :
            self.encoder = encoder
        if attention is not None :
            self.attention = attention
        if decoder is not None :
            self.decoder = decoder
        return
    
    def freezeSubModule(self, encoder = False, attention = False, decoder = False) :
        for param in self.encoder.parameters():
            param.requires_grad = not encoder
        if self.attention is not None :
            for param in self.attention.parameters():
                param.requires_grad = not attention
        for param in self.decoder.parameters():
            param.requires_grad = not decoder
        return
    
    def nbParametres(self) :
        count = 0
        for p in self.parameters():
            if p.requires_grad == True :
                count += p.data.nelement()
        return count
    
    
    def flatten(self, description) :
        '''Reduce the nesting depth of the description by one level'''
        flatten = []
        for line in description :
            flatten += line
        return flatten

    
    
    # ------------------------ Text processing methods ---------------------------------
    def variableFromSentence(self, sentence):
        def normalizeString(sentence) :
            '''Remove rare symbols from a string'''
            def unicodeToAscii(s):
                """Turn a Unicode string to plain ASCII, thanks to http://stackoverflow.com/a/518232/2809427"""
                return ''.join(c for c in unicodedata.normalize('NFD', s) if unicodedata.category(c) != 'Mn')
            sentence = unicodeToAscii(sentence.lower().strip())
            sentence = re.sub(r"[^a-zA-Z0-9?&\%\-\_]+", r" ", sentence) 
            return sentence
        sentence = normalizeString(sentence).split() # a raw string transformed into a list of clean, non-empty words
        indexes = []
        for word in sentence:
            if word in self.lang.word2index :
                indexes.append(self.lang.word2index[word])
            elif 'UNK' in self.lang.word2index :
                indexes.append(self.lang.word2index['UNK'])
            # a word unknown to a vocabulary without 'UNK' token is skipped
        indexes.append(self.lang.word2index['EOS'])
        result = Variable(torch.LongTensor([[i] for i in indexes]))
        return result
    
    
    
    # ------------------------ Visualisation methods ---------------------------------
    def flattenDialogue(self, dialogue):
        flatten = []
        for paire in dialogue :
            flatten += paire
        return [[int(word) for word in sentence.data.view(-1)] for sentence in flatten]
    
    def flattenWeights(self, weights) :
        '''Reduce the nesting depth of the attention weights by one level'''
        flatten = []
        for weight_layer in weights :
            flatten.append(torch.cat(tuple(weight_layer.values()), dim = 2))
        return flatten
    
    def formatWeights(self, dialogue, attn1_weights, attn2_weights) :
        if self.n_level == 2 :
            attn1_weights = self.flattenWeights(attn1_weights)
        hops = self.attention.hops
        l, L = len(dialogue), max([len(line) for line in dialogue])
        Table = np.zeros((l, 1, L))
        Liste = np.zeros((l, 1)) if attn2_weights is not None else None
        count = 0
        count_line = 0
        for i, line in enumerate(dialogue) :
            present = False
            for j, word in enumerate(line) :
                if word in self.lang.index2word.keys():
                    present = True
                    Table[i, 0, j] = sum([attn1_weights[k][0, 0, count].data for k in range(hops)])
                    count += 1
            if present and Liste is not None :
                Liste[i] = sum([attn2_weights[k][count_line].data for k in range(hops)])
                count_line += 1
        return Table, Liste
    
    def showWeights(self, dialogue, attn1_weights, attn2_weights, maxi):
        table, liste = self.formatWeights(dialogue[:-2], attn1_weights, attn2_weights)
        l = table.shape[0]
        L = table.shape[2]
        fig = plt.figure(figsize = (l, L))
        for i, line in enumerate(dialogue[:-2]):
            ligne = [self.lang.index2word[int(word)] for word in line]
            ax = fig.add_subplot(l, 1, i+1)
            vals = table[i]
            text = [' '] + ligne + [' ' for k in range(L-len(ligne))] if L>len(ligne) else [' '] + ligne
            if liste is not None :
                vals = np.concatenate((np.zeros((1, 1)) , vals), axis = 1)  
                vals = np.concatenate((np.reshape(liste[i], (1, 1)) , vals), axis = 1)
                turn = 'User' if i % 2 == 0 else 'Bot'
                text = [turn] + [' '] + text
            cax = ax.matshow(vals, vmin=0, vmax=maxi, cmap='YlOrBr')
            ax.set_xticklabels(text, ha='left')
            ax.set_yticklabels(' ')
            ax.tick_params(axis=u'both', which=u'both',length=0, labelrotation = 30, labelright  = True)
            ax.grid(False, which="minor", color="w", linestyle='-', linewidth=1)
            ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
            plt.subplots_adjust(hspace=0, wspace = 0.1)
        plt.show()
    
    def showAttention(self, dialogue, n_col = 1, maxi = None):
        answer, decoder_outputs, attn1_weights, attn2_weights = self.answerTrain(dialogue)
        dialogue = self.flattenDialogue(dialogue)
        if len(dialogue) > 1 :
            self.showWeights(dialogue, attn1_weights, attn2_weights, maxi)
        print('User : ', ' '.join([self.lang.index2word[int(word)] for word in dialogue[-2][:-1]]))
        print('target : ', ' '.join([self.lang.index2word[int(word)] for word in dialogue[-1][:-1]]))
        print('predic : ', ' '.join([self.lang.index2word[int(word)] for word in answer]))
        return
    
    
    
    # ------------------- Process methods ------------------------
    def initMemory(self):
        """Initialize memory slots"""
        self.memory = {}
        self.memory_queries = {}
        self.query_hidden = self.encoder.initHidden()
        self.memory_length = 0
        
    def updateMemory(self, last_words, query_hidden):
        """Update memory with a list of word vectors 'last_words' and the last query vector 'query_hidden'"""
        self.memory[self.memory_length] = last_words
        self.memory_queries[self.memory_length] = query_hidden
        self.query_hidden = query_hidden
        self.memory_length += 1
        
    def readSentence(self, utterance):
        """Perform reading of an utterance, returning created word vectors
           and last hidden states of the encoder bi-GRU
        """
        utterance = utterance.to(self.device)
        last_words, query_hidden = self.encoder(utterance, self.query_hidden)
        return last_words, query_hidden
        
    def readDialogue(self, dialogue):
        """Loop of readSentence over a whole dialogue
        """
        for i in range(len(dialogue)) :
            for j in range(2):
                utterance = dialogue[i][j]
                last_words, query_hidden = self.readSentence(utterance)
                self.updateMemory(last_words, query_hidden)
   
    def tracking(self, query_vector):
        """Compute an attention vector over the elements of the agent's memory,
        given a query vector 'query_vector', and attach it to that vector"""
        decision_vector, attn1_weights, attn2_weights = self.attention(words_memory = self.memory, 
                                                                       base_query = query_vector)
        return decision_vector, attn1_weights, attn2_weights

    def generateAnswer(self,last_words, query_vector, decision_vector, target_answer = None) :
        """Generate an answer from a hidden state used to initialize the decoder,
        using a target answer for a 'teacher forcing-like' mode when one is provided"""
        answer, decoder_outputs = self.decoder(last_words, query_vector, decision_vector, target_answer)
        return answer, decoder_outputs
        
        
        
    # ------------ 1st working mode : training mode ------------
    def answerTrain(self, input, target_answer = None):
        """Parameters are a complete dialogue, containing the current query and answer, and of the form

                    [['query 1', 'answer 1'],
                     ['query 2', 'answer 2'],
                             ..........
                     ['current query', 'current answer']]

           The model learns to generate the current answer. 
           Teacher forcing can be enabled by passing the ground-truth answer through the 'target_answer' option. 
           Attention weights over words and past utterances are returned along with the answer."""
        # 1) initiates memory instance
        self.initMemory()
        
        # 2) reads historical part of dialogue (if applicable),
        # word vectors and last hidden states of encoder bi-GRU are stored in memory
        dialogue = input[:-1]
        self.readDialogue(dialogue)
        
        # 3) reads current utterance,
        # returns word vectors of query and query vector
        query = input[-1][0]
        last_words, query_hidden = self.readSentence(query)
        q_hidden = query_hidden.view(1,1,-1)
        
        # 4) performs tracking
        # returns decision vector
        if self.attention is not None :
            decision_vector, attn1_weights, attn2_weights = self.tracking(q_hidden)
        else :
            decision_vector = q_hidden
            attn1_weights = None
            attn2_weights = None
            
        # 5) response generation
        # returns list of indices
        answer, decoder_outputs = self.generateAnswer(last_words, q_hidden, decision_vector, target_answer)
            
        # 6) returns answer
        return answer, decoder_outputs, attn1_weights, attn2_weights

        
        
    # ------------ 2nd working mode : test mode ------------
    def forward(self, input):
        """Parameter is a single current query as string, and the model generates the current answer. 
           Attention weights over words and past utterances are returned along with the answer."""
        
        # 1) initiates memory and hidden states of encoder bi-GRU if conversation starts
        if self.memory_length == 0 :
            self.initMemory()
            
        # 2) reads current utterance,
        # returns word vectors of query and query vector
        sentence = self.variableFromSentence(input)
        if sentence is None :
            return "Excusez-moi je n'ai pas compris", None, None
        else :
            last_words, query_hidden = self.readSentence(sentence)
            q_hidden = query_hidden.view(1,1,-1)

            # 3) performs tracking
            # returns decision vector
            if self.attention is not None :
                decision_vector, attn1_weights, attn2_weights = self.tracking(q_hidden)
            else :
                decision_vector = q_hidden
                attn1_weights = None
                attn2_weights = None

            # 4) response generation
            # returns list of indices
            answer, decoder_outputs = self.generateAnswer(last_words, q_hidden, decision_vector)
            
            # 5) updates memory with current query and answer
            self.updateMemory(last_words, query_hidden)
            answer_var = Variable(torch.LongTensor([[i] for i in answer]))
            last_words, query_hidden = self.readSentence(answer_var)
            self.updateMemory(last_words, query_hidden)

            # 6) returns answer
            answer = ' '.join([self.lang.index2word[int(word)] for word in answer])
            return answer, attn1_weights, attn2_weights
    
    
    
    
    
    
def CreateBot(lang,                     ###
              embedding_dim,              # --- Encoder options
              hidden_dim,                 #
              n_layers,                 ###

              sentence_hidden_dim,      ###
              hops,                       #
              share,                      # --- Hierarchical encoder options
              transf,                     #
              dropout,                  ###
              
              attn_decoder_n_layers,    ### --- decoder options
              
              device
             ):
    '''Create an agent with specified dimensions and specificities'''
    # 1) ----- encoding -----
    embedding = nn.Embedding(lang.n_words, embedding_dim)
    encoder = RecurrentWordsEncoder(device, embedding, hidden_dim, n_layers, dropout) # embedding, hidden_dim, n_layers = 1, dropout = 0
    # 2) ----- attention -----
    word_hidden_dim = encoder.output_dim
    attention = RecurrentHierarchicalAttention(device,
                                               word_hidden_dim,
                                               sentence_hidden_dim, 
                                               base_query_dim = word_hidden_dim,
                                               n_heads = 1,
                                               hops = hops,
                                               share = share,
                                               transf = transf,
                                               dropout = dropout)
    # 3) ----- decoding -----
    tracking_dim = attention.output_dim
    if attn_decoder_n_layers >= 0 :
        decoder = AttnWordsDecoder(device,
                                   embedding,
                                   decoder_hidden_dim,  # FIXME: undefined in CreateBot; must be defined or passed as an argument
                                   dropout = dropout,
                                   n_layers = attn_decoder_n_layers)
    else :
        decoder = WordsDecoder(device,
                               embedding,                                   
                               word_hidden_dim,
                               tracking_dim,
                               dropout = dropout)        
    # 4) ----- model -----
    chatbot = Chatbot(device, lang, encoder, attention, decoder)
    chatbot = chatbot.to(device)
    return chatbot





class BotTrainer(object):
    def __init__(self, 
                 device,
                 criterion= nn.NLLLoss(), 
                 optimizer = optim.SGD, 
                 clipping = 50,
                 teacher_forcing_ratio = 0.5,  
                 print_every=100):
        
        # relevant quantities
        self.device = device
        self.criterion = criterion.to(device)
        self.optimizer = optimizer
        self.clip = clipping
        self.teacher_forcing_ratio = teacher_forcing_ratio
        self.print_every = print_every# timer
        
        
    def asMinutes(self, s):
        m = math.floor(s / 60)
        s -= m * 60
        return '%dm %ds' % (m, s)


    def timeSince(self, since, percent):
        now = time.time()
        s = now - since
        es = s / (percent)
        rs = es - s
        return '%s (- %s)' % (self.asMinutes(s), self.asMinutes(rs))
        
        
    def distance(self, agent_outputs, target_answer) :
        """ Compute cumulated error between predicted output and ground answer."""
        loss = 0
        loss_diff_mots = 0
        agent_outputs_length = len(agent_outputs)
        target_length = len(target_answer)
        Max = max(agent_outputs_length, target_length)
        Min = min(agent_outputs_length, target_length)   
        for i in range(Min):
            agent_output = agent_outputs[i]
            target_word = target_answer[i]
            loss += self.criterion(agent_output, target_word)
            topv, topi = agent_output.data.topk(1)
            ni = topi[0][0]
            if ni != target_word.data[0]:
                loss_diff_mots += 1
        if agent_outputs_length != target_length :
            loss_diff_mots += Max - Min
        return loss, loss_diff_mots
        
        
    def trainLoop(self, agent, dialogue, target_answer, optimizer, learning_rate):
        """Performs a training loop, with forward pass and backward pass for gradient optimisation."""
        optimizer.zero_grad()
        target_length = len(target_answer)
        target_answer = target_answer.to(self.device)
        tf = target_answer if random.random() < self.teacher_forcing_ratio else None
        answer, agent_outputs, attn1_attention_weights, attn2_attention_weights =  agent.answerTrain(dialogue, tf) 
        loss, loss_diff_mots = self.distance(agent_outputs, target_answer)        
        loss.backward()
        _ = torch.nn.utils.clip_grad_norm_(agent.parameters(), self.clip)
        optimizer.step()
        return loss.item() / target_length , loss_diff_mots
        
        
    def train(self, agent, dialogues, n_iters = 10000, learning_rate=0.01, dic = None):
        """Performs training over a given dataset and along a specified amount of loops."""
        start = time.time()
        optimizer = self.optimizer([param for param in agent.parameters() if param.requires_grad == True], lr=learning_rate)
        print_loss_total = 0  
        print_loss_diff_mots_total = 0
        for iter in range(1, n_iters + 1):
            if dic is not None :
                j = int(random.choice(list(dic.keys())))
                training_dialogue = dialogues[j]
                i = random.choice(dic[j])
                partie_dialogue = training_dialogue[:i+1]
            else :
                training_dialogue = random.choice(dialogues)
                i = random.choice(range(len(training_dialogue)))
                partie_dialogue = training_dialogue[:i+1]
            #target_answer = variableFromSentence(agent.output_lang, training_dialogue[i][1])
            target_answer = training_dialogue[i][1]
            loss, loss_diff_mots = self.trainLoop(agent, partie_dialogue, target_answer, optimizer, learning_rate)
            # number of errors on answer i
            print_loss_total += loss
            print_loss_diff_mots_total += loss_diff_mots       
            if iter % self.print_every == 0:
                print_loss_avg = print_loss_total / self.print_every
                print_loss_diff_mots_avg = print_loss_diff_mots_total / self.print_every
                print_loss_total = 0
                print_loss_diff_mots_total = 0
                print('%s (%d %d%%) %.4f %.2f' % (self.timeSince(start, iter / n_iters),
                                             iter, iter / n_iters * 100, print_loss_avg, print_loss_diff_mots_avg))
                
                
    def ErrorCount(self, agent, dialogues):
        bound = 10
        ERRORS = [0 for i in range(bound +1)]
        repartitionError = {}
        for i in range(bound +1) :
            repartitionError[i] = []
        liste = []
        for k, input_dialogue in enumerate(dialogues):
            for l in range(len(input_dialogue)):
                if len(input_dialogue[l][1])>0 :
                    dialogue = input_dialogue[:l+1]
                    #target_answer = variableFromSentence(agent.output_lang, input_dialogue[l][1])
                    target_answer = input_dialogue[l][1]
                    target_answer = target_answer.to(self.device)
                    answer, agent_outputs, attn1_attention_weights, attn2_attention_weights = agent.answerTrain(dialogue)
                    loss, loss_diff_mots = self.distance(agent_outputs, target_answer)
                    if loss_diff_mots > bound :
                        ERRORS = ERRORS + [0 for i in range(loss_diff_mots - bound)]
                        for i in range(bound +1, loss_diff_mots +1) :
                            repartitionError[i] = []
                        bound  = loss_diff_mots
                    ERRORS[loss_diff_mots] += 1
                    if loss_diff_mots > 0 :
                        liste.append([k, l, loss_diff_mots])
        for triple in liste:
            repartitionError[triple[2]].append(triple[:2])
        print("The repartition of errors :", ERRORS)
        return repartitionError


    def DialoguesWithErrors(self, agent, dialogues) :
        '''Returns a dictionary, with indices of dialogues and index of line in dialogue
           where a mistake was made.
        '''
        start = time.time()
        Sortie = {}
        L = len(dialogues)
        for i, dialogue in enumerate(dialogues) :
            errs = []
            for j in range(len(dialogue)) :
                target_answer = dialogue[j][1]
                target_answer = target_answer.to(self.device)
                answer, agent_outputs, attn1_attention_weights, attn2_attention_weights = agent.answerTrain(dialogue[:j+1],
                                                                                                            target_answer)
                loss, loss_diff_mots = self.distance(agent_outputs, target_answer)
                if loss_diff_mots > 0 :
                    errs.append(j)
            if errs != []:
                Sortie[i] = errs
            if (i+1) % self.print_every == 0:
                print('%s (%d %d%%)' % (self.timeSince(start, (i+1) / L),
                                             (i+1), (i+1) / L * 100))
        return Sortie

Overwriting models/Chatbot.py
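
Besides the accumulated loss, the `distance` method of `BotTrainer` reports a word-level error count: the number of positions where the predicted word differs from the target, plus the difference in length between the two sequences. Here is a minimal sketch of that count on plain lists of word indices. The helper name `count_word_errors` is hypothetical and not part of the library.

```python
def count_word_errors(predicted, target):
    """Count mismatched positions over the common length,
    then add the difference in length between the two sequences."""
    common = min(len(predicted), len(target))
    mismatches = sum(1 for i in range(common) if predicted[i] != target[i])
    return mismatches + abs(len(predicted) - len(target))

# Identical sequences, one substituted word, one missing word
print(count_word_errors([4, 7, 9, 1], [4, 7, 9, 1]))
print(count_word_errors([4, 8, 9, 1], [4, 7, 9, 1]))
print(count_word_errors([4, 7, 1], [4, 7, 9, 1]))
```

A count of 0 means the target answer was reproduced exactly. `DialoguesWithErrors` flags every dialogue line whose count is strictly positive.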


<a id="modules"></a>

# 2 Modules

[Back to table of contents](#plan)

In [188]:
%mkdir modules

A subdirectory or file modules already exists.


In [189]:
%%writefile modules/__init__.py

from .Encoder_Words_Recurrent import RecurrentWordsEncoder

from .Attention_Additive import AdditiveAttention
from .Attention_MultiHead import MultiHeadAttention
from .Attention_MultiHoped import MultiHopedAttention
from .Attention_Hierarchical_Recurrent import RecurrentHierarchicalAttention

from .Decoder_Classes import ClassDecoder
from .Decoder_Words import WordsDecoder
from .Decoder_Words_Attn import AttnWordsDecoder


__all__ = [
    'RecurrentWordsEncoder',
    
    'AdditiveAttention',
    'MultiHeadAttention',
    'MultiHopedAttention',
    'RecurrentHierarchicalAttention',
    
    'ClassDecoder',
    'WordsDecoder',
    'AttnWordsDecoder']

Overwriting modules/__init__.py


<a id="encodeursDeTexte"></a>

## 2.1 Text encoders


[Back to table of contents](#plan)

<a id="encodeursDeMots"></a>

### 2.1.1 Word encoder

[Back to table of contents](#plan)

The **RecurrentWordsEncoder** module encodes a sequence of words $w_1, ..., w_T$ into a sequence of vectors $h_1, ..., h_T$ by applying an embedding followed by a bidirectional GRU layer. Its operation is illustrated in the following figure:


![WordEncoder](figs/WordEncoder.png)
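
To make the dimensions concrete, here is a toy sketch of bidirectional encoding in pure Python. It replaces the GRU cell with a hypothetical, much simpler recurrence (`toy_step`), so it only illustrates how the forward and backward passes are concatenated at each position. That concatenation is the reason each output vector has dimension `2 * hidden_dim`.

```python
import math

def toy_step(x, h):
    """Hypothetical stand-in for a GRU cell: h' = tanh(x + 0.5 * h), elementwise."""
    return [math.tanh(xi + 0.5 * hi) for xi, hi in zip(x, h)]

def run_direction(inputs, hidden_dim):
    """Run the recurrence over the inputs and return the hidden state at each step."""
    h = [0.0] * hidden_dim
    states = []
    for x in inputs:
        h = toy_step(x, h)
        states.append(h)
    return states

def bidirectional_encode(inputs, hidden_dim):
    """Concatenate, for each position, the forward state and the backward state."""
    forward = run_direction(inputs, hidden_dim)
    backward = run_direction(inputs[::-1], hidden_dim)[::-1]
    return [f + b for f, b in zip(forward, backward)]

# Three "word embeddings" of dimension 2 (embedding_dim == hidden_dim for simplicity)
embeddings = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.0]]
outputs = bidirectional_encode(embeddings, hidden_dim=2)
print(len(outputs), len(outputs[0]))
```

The first half of each output vector summarizes the words read so far from the left, and the second half summarizes the words read from the right.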

In [190]:
%%writefile modules/Encoder_Words_Recurrent.py

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


class RecurrentWordsEncoder(nn.Module):
    def __init__(self, device, embedding, hidden_dim, n_layers = 1, dropout = 0): 
        super(RecurrentWordsEncoder, self).__init__()
        # relevant quantities
        self.device = device
        self.hidden_dim = hidden_dim           # dimension of hidden state of GRUs 
        self.dropout_p = dropout
        self.n_layers = n_layers               # number of stacked GRU layers
        self.output_dim = hidden_dim * 2       # dimension of output rep. of words and utterance
        # parameters
        self.embedding = embedding
        for p in embedding.parameters() :
            embedding_dim = p.data.size()[1]
        self.dropout = nn.Dropout(p = dropout)
        self.bigru = nn.GRU(embedding_dim, 
                            hidden_dim, 
                            n_layers,
                            dropout=(0 if n_layers == 1 else dropout), 
                            bidirectional=True)

        
    def initHidden(self): 
        return Variable(torch.zeros(2 * self.n_layers, 1, self.hidden_dim)).to(self.device)

    def forward(self, utterance, hidden = None):
        embeddings = self.embedding(utterance)                          # dim = (input_length, 1, embedding_dim)
        embeddings = self.dropout(embeddings)                           # dim = (input_length, 1, embedding_dim)
        outputs, hidden = self.bigru(embeddings, hidden)
        outputs = self.dropout(outputs)
        return outputs, hidden                                          # dim = (input_length, 1, hidden_dim * 2)

Overwriting modules/Encoder_Words_Recurrent.py


<a id="attentionSimple"></a>

## 2.2 Simple attention modules

[Back to table of contents](#plan)



### 2.2.1 Additive attention module

[Back to table of contents](#plan)

![AttentionAdditive](figs/Attention_Additive.png)

In [191]:
%%writefile modules/Attention_Additive.py

import torch
import torch.nn as nn
import torch.nn.functional as F


class AdditiveAttention(nn.Module):

    def __init__(self, query_dim, targets_dim, n_layers = 2): 
        super(AdditiveAttention, self).__init__()
        # relevant quantities
        self.n_level = 1
        self.query_dim = query_dim
        self.targets_dim = targets_dim
        self.output_dim = targets_dim
        self.n_layers = n_layers
        # parameters
        self.attn_layer = nn.Linear(query_dim + targets_dim, targets_dim) if n_layers >= 1 else None
        self.attn_layer2 = nn.Linear(targets_dim, targets_dim) if n_layers >= 2 else None
        self.attn_v = nn.Linear(targets_dim, 1, bias = False) if n_layers >= 1 else None
        self.act = F.softmax

        
    def forward(self, query = None, targets = None):
        '''takes as parameters : 
                a query tensor conditioning the attention,      size = (1, minibatch_size, query_dim)
                a tensor containing attention targets           size = (targets_length, minibatch_size, targets_dim)
           returns : 
                the resulting tensor of the attention process,  size = (1, minibatch_size, targets_dim)
                the attention weights,                          size = (minibatch_size, 1, targets_length)
        '''
        if targets is not None :
            # concat method 
            if self.n_layers >= 1 :
                poids = torch.cat((query.expand(targets.size(0), -1, -1), targets), 2) if query is not None else targets
                poids = self.attn_layer(poids).tanh()                 # size (targets_length, minibatch_size, targets_dim)
                if self.n_layers >= 2 :
                    poids = self.attn_layer2(poids).tanh()            # size (targets_length, minibatch_size, targets_dim)
                attn_weights = self.attn_v(poids)                     # size (targets_length, minibatch_size, 1)
                attn_weights = torch.transpose(attn_weights, 0,1)     # size (minibatch_size, targets_length, 1)
                targets = torch.transpose(targets, 0,1)               # size (minibatch_size, targets_length, targets_dim)
            # dot method
            else :
                targets = torch.transpose(targets, 0,1)               # size (minibatch_size, targets_length, targets_dim)
                query = torch.transpose(query, 0, 1)                  # size (minibatch_size, 1, query_dim)
                query = torch.transpose(query, 1, 2)                  # size (minibatch_size, query_dim, 1)
                attn_weights = torch.bmm(targets, query)              # size (minibatch_size, targets_length, 1)
                
            attn_weights = self.act(attn_weights, dim = 1)        # size (minibatch_size, targets_length, 1)
            attn_weights = torch.transpose(attn_weights, 1,2)     # size (minibatch_size, 1, targets_length)
            attn_applied = torch.bmm(attn_weights, targets)       # size (minibatch_size, 1, targets_dim)
            attn_applied = torch.transpose(attn_applied, 0,1)     # size (1, minibatch_size, targets_dim)

        else :
            attn_applied = query
            attn_weights = None
        return attn_applied, attn_weights

Overwriting modules/Attention_Additive.py
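
The dot branch of the module above reduces to three steps: one score per target, a softmax over the targets, and a weighted sum of the targets. The sketch below reproduces that arithmetic on plain Python lists, for a single example and without any learned layer. The names `softmax` and `dot_attention` are illustrative and are not part of the library.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot_attention(query, targets):
    """Score each target by its dot product with the query, then average the targets."""
    scores = [sum(q * t for q, t in zip(query, target)) for target in targets]
    weights = softmax(scores)
    dim = len(targets[0])
    applied = [sum(w * target[d] for w, target in zip(weights, targets))
               for d in range(dim)]
    return applied, weights

query = [1.0, 0.0]
targets = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
applied, weights = dot_attention(query, targets)
```

The first and third targets have the same dot product with the query, so they receive equal weights, and the second target receives a smaller one.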



<a id="attentionAdditiveMultitete"></a>


### 2.2.2 Multi-head additive attention module

[Back to the table of contents](#plan)

In [192]:
%%writefile modules/Attention_MultiHead.py

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from . import AdditiveAttention


class MultiHeadAttention(nn.Module):
    '''Module running n_heads independent additive attention modules, one per
       slice of the second dimension of the query and targets tensors.
       At instantiation it takes as input :
       
                - device : the device on which the output tensors are allocated
                - n_heads : the number of attention heads
                - query_dim : the dimension of the conditioning vector
                - targets_dim : the dimension of vectors stored in memory
                - n_layers : the number of layers of each additive attention module
                
      Other ideas on multi-head attention at 
      https://github.com/jadore801120/attention-is-all-you-need-pytorch/blob/master/transformer/SubLayers.py
      https://github.com/tlatkowski/multihead-siamese-nets/blob/master/layers/attention.py
    '''
    def __init__(self, device, n_heads, query_dim, targets_dim, n_layers = 2): 
        super(MultiHeadAttention, self).__init__()
        # relevant quantities
        self.device = device
        self.n_level = 1
        self.n_heads = n_heads
        self.n_layers = n_layers
        # parameters
        self.attn_modules_list = nn.ModuleList([AdditiveAttention(query_dim, targets_dim, n_layers) for i in range(n_heads)])

        
    def forward(self, query = None, targets = None):
        '''takes as parameters : 
                a query tensor conditioning the attention,      size = (1, n_heads, query_dim)
                a tensor containing attention targets           size = (targets_length, n_heads, targets_dim)
           returns : 
                the resulting tensor of the attention process,  size = (1, n_heads, targets_dim)
                the attention weights,                          size = (n_heads, 1, targets_length)
        '''
        targets_length = targets.size(0)
        targets_dim    = targets.size(2)
        attn_applied   = torch.zeros(1, self.n_heads, targets_dim, device = self.device)
        attn_weights   = torch.zeros(self.n_heads, 1, targets_length, device = self.device)
        for i, attn in enumerate(self.attn_modules_list) :
            # each head receives its own slice, kept 3-dimensional with a minibatch of size 1
            que = query[:, i, :].unsqueeze(1) if query is not None else None   # size (1, 1, query_dim)
            tar = targets[:, i, :].unsqueeze(1)                                # size (targets_length, 1, targets_dim)
            attn_appl, attn_wghts = attn(que, tar)
            attn_applied[:, i, :] = attn_appl.squeeze(1)                       # size (1, targets_dim)
            attn_weights[i, :, :] = attn_wghts.squeeze(0)                      # size (1, targets_length)
        return attn_applied, attn_weights

Overwriting modules/Attention_MultiHead.py
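
To illustrate why several heads are useful, here is a minimal sketch in plain Python where each head owns a different scoring vector and therefore focuses on a different target. It is a simplification under stated assumptions: the score is `v . tanh(query + target)` with no learned linear layer, and all heads share the same query and targets, whereas the module above gives each head its own slice. The names `additive_head`, `multi_head` and `head_vectors` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def additive_head(v, query, targets):
    """One head: score_j = v . tanh(query + target_j), softmax, weighted sum."""
    scores = [sum(vd * math.tanh(q + t) for vd, q, t in zip(v, query, target))
              for target in targets]
    weights = softmax(scores)
    dim = len(targets[0])
    applied = [sum(w * target[d] for w, target in zip(weights, targets))
               for d in range(dim)]
    return applied, weights

def multi_head(head_vectors, query, targets):
    """Run every head independently and stack the results head by head."""
    outputs = [additive_head(v, query, targets) for v in head_vectors]
    return [a for a, _ in outputs], [w for _, w in outputs]

head_vectors = [[1.0, 0.0], [0.0, 1.0]]   # one scoring vector per head
query = [0.0, 0.0]
targets = [[2.0, 0.0], [0.0, 2.0]]
applied, weights = multi_head(head_vectors, query, targets)
```

With these values the first head puts most of its weight on the first target and the second head on the second target.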


<a id="attentionAdditiveMultihoped"></a>


### 2.2.3 Multi-hop additive attention module

[Back to the table of contents](#plan)

In [193]:
%%writefile modules/Attention_MultiHoped.py

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from . import AdditiveAttention


class MultiHopedAttention(nn.Module):
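    '''Module performing additive attention over a sequence of vectors stored in
       a memory block, repeated over several hops : the result of each hop updates
       the query used by the next one. At instantiation it takes as input :
       
                - device : the device on which the initial hop query is allocated
                - targets_dim : the dimension of vectors stored in memory
                - base_query_dim : the dimension of the optional conditioning vector
                - hops : the number of attention passes
                - share : whether the hop query is initialised to zeros before the first hop
                          (a single attention module is used in all cases)
                - transf : if True, the hop query is updated with a GRU instead of a sum
                - dropout : the dropout probability
    '''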
    def __init__(self, 
                 device,
                 targets_dim,
                 base_query_dim = 0,
                 hops = 1,
                 share = True,
                 transf = False,
                 dropout = 0
                ):
        super(MultiHopedAttention, self).__init__()
        
        # dimensions
        self.targets_dim = targets_dim
        self.output_dim = targets_dim
        self.hops_query_dim = self.output_dim if hops > 1 else 0
        self.query_dim = base_query_dim + self.hops_query_dim
        
        # structural coefficients
        self.device = device
        self.n_level = 1
        self.hops = hops
        self.share = share
        self.transf = transf
        self.dropout_p = dropout
        if dropout > 0 :
            self.dropout = nn.Dropout(p = dropout)
        
        # parameters
        self.attn = AdditiveAttention(self.query_dim, self.targets_dim) 
        self.transf = nn.GRU(self.targets_dim, self.targets_dim) if transf else None
        
        
    def initQuery(self): 
        if self.hops_query_dim > 0 :
            return Variable(torch.zeros(1, 1, self.hops_query_dim)).to(self.device)
        return None
    
    
    def update(self, hops_query, decision_vector):
        if self.transf is not None :
            _ , update = self.transf(decision_vector, hops_query)
        else :
            update = hops_query + decision_vector
        return update
    
    
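    def forward(self, words_memory, base_query = None):
        '''takes as parameters : 
                a tensor containing attention targets           size = (targets_length, 1, targets_dim)
                an optional conditioning query tensor           size = (1, 1, base_query_dim)
           returns : 
                the resulting decision vector                   size = (1, 1, targets_dim)
                the list of attention weights, one entry per hop
           (the zero hop query built by initQuery assumes a minibatch of size 1)
        '''
        attn_weights_list = []
        # note : a single attention module, built for self.query_dim, serves every hop ;
        # with hops > 1 and share = False the first hop therefore receives a query of
        # dimension base_query_dim only
        if self.hops > 1 and self.share :
            hops_query = self.initQuery()
        else :
            hops_query = None
            
        for hop in range(self.hops) :
            if base_query is not None and hops_query is not None :
                query = torch.cat((base_query, hops_query), 2) # size (1, 1, self.query_dim)
            elif base_query is not None :
                query = base_query
            elif hops_query is not None :
                query = hops_query
            else :
                query = None
            
            decision_vector, attn_weights = self.attn(query, words_memory)
            attn_weights_list.append(attn_weights)
            if self.hops > 1 :
                hops_query = self.update(hops_query, decision_vector) if hops_query is not None else decision_vector   # size (1, 1, output_dim)
            else :
                hops_query = decision_vector
        
        # output decision vector
        return hops_query, attn_weights_list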

Overwriting modules/Attention_MultiHoped.py
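
The multi-hop loop can be sketched on plain Python lists. This is an illustration under assumptions, not the module itself: it uses dot-product scoring instead of the additive layers, no base query, and the sum update that the module applies when `transf` is False. The names `attend` and `multi_hop` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, targets):
    """Dot-product attention: returns the weighted sum of targets and the weights."""
    scores = [sum(q * t for q, t in zip(query, target)) for target in targets]
    weights = softmax(scores)
    dim = len(targets[0])
    applied = [sum(w * target[d] for w, target in zip(weights, targets))
               for d in range(dim)]
    return applied, weights

def multi_hop(targets, hops):
    """Repeat attention, adding each decision vector to the running hop query."""
    hop_query = [0.0] * len(targets[0])      # zero initial query
    weights_per_hop = []
    for _ in range(hops):
        decision, weights = attend(hop_query, targets)
        weights_per_hop.append(weights)
        hop_query = [h + d for h, d in zip(hop_query, decision)]  # additive update
    return hop_query, weights_per_hop

targets = [[1.0, 0.0], [0.0, 3.0]]
final_query, weights_per_hop = multi_hop(targets, hops=3)
```

The first hop starts from a zero query and weights both targets equally. Each later hop reuses the accumulated result as its query, so the weights concentrate on the second target.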


<a id="attentionHierarchique"></a>

## 2.3 Hierarchical attention

[Back to the table of contents](#plan)

![HierarchicalAttention](figs/Hierarchical_Attention.png)

In [194]:
%%writefile modules/Attention_Hierarchical_Recurrent.py

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from . import AdditiveAttention, MultiHeadAttention


class RecurrentHierarchicalAttention(nn.Module):
    '''This attention module is :
    
    - hierarchical, with a bi-GRU between the two attention levels
    - multi-head on each attention level
    - globally multi-hop : several passes can be performed to accumulate information
    '''

    def __init__(self, 
                 device,
                 word_hidden_dim, 
                 sentence_hidden_dim,
                 base_query_dim = 0, 
                 n_heads = 1,
                 n_layers = 1,
                 hops = 1,
                 share = True,
                 transf = False,
                 dropout = 0
                ):
        super(RecurrentHierarchicalAttention, self).__init__()
        
        # dimensions
        self.word_hidden_dim = word_hidden_dim
        self.sentence_input_dim = self.word_hidden_dim
        self.sentence_hidden_dim = sentence_hidden_dim
        self.context_vector_dim = sentence_hidden_dim * 2
        self.output_dim = sentence_hidden_dim * 2
        self.base_query_dim = base_query_dim
        self.hops_query_dim = self.output_dim if hops > 1 else 0
        self.query_dim = self.base_query_dim + self.hops_query_dim
        
        # structural coefficients
        self.device = device
        self.n_level = 2
        self.n_heads = n_heads
        self.n_layers = n_layers
        self.hops = hops
        self.share = share
        self.dropout_p = dropout
        self.dropout = nn.Dropout(p = dropout)
        
        # first attention module
        # (MultiHeadAttention is the class imported above ; it takes the device as first argument)
        attn1_list = []
        if share :
            attn1 = MultiHeadAttention(device, n_heads, self.query_dim, self.word_hidden_dim) if n_heads > 1 else \
                    AdditiveAttention(self.query_dim, self.word_hidden_dim) 
            for hop in range(hops):
                attn1_list.append(attn1)
            self.attn1 = nn.ModuleList(attn1_list)
        else :
            for hop in range(hops):
                qd = self.query_dim if hop > 0 else self.base_query_dim
                attn1 = MultiHeadAttention(device, n_heads, qd, self.word_hidden_dim) if n_heads > 1 else \
                        AdditiveAttention(qd, self.word_hidden_dim) 
                attn1_list.append(attn1)
            self.attn1 = nn.ModuleList(attn1_list)
        
        # intermediate encoder module
        self.bigru = nn.GRU(self.sentence_input_dim, 
                            self.sentence_hidden_dim, 
                            n_layers,
                            dropout=(0 if n_layers == 1 else dropout), 
                            bidirectional=True)
        
        # second attention module
        attn2_list = []
        if share :
            attn2 = MultiHeadAttention(device, n_heads, self.query_dim, self.context_vector_dim) if n_heads > 1 else \
                    AdditiveAttention(self.query_dim, self.context_vector_dim) 
            for hop in range(hops):
                attn2_list.append(attn2)
            self.attn2 = nn.ModuleList(attn2_list)
        else :
            for hop in range(hops):
                qd = self.query_dim if hop > 0 else self.base_query_dim
                attn2 = MultiHeadAttention(device, n_heads, qd, self.context_vector_dim) if n_heads > 1 else \
                        AdditiveAttention(qd, self.context_vector_dim) 
                attn2_list.append(attn2)
            self.attn2 = nn.ModuleList(attn2_list)
        
        # accumulation step
        self.transf = nn.GRU(self.context_vector_dim, self.context_vector_dim) if transf else None


    def initQuery(self): 
        if self.hops_query_dim > 0 :
            return Variable(torch.zeros(1, self.n_heads, self.hops_query_dim)).to(self.device)
        return None
        
                
    def initHidden(self): 
        return Variable(torch.zeros(2 * self.n_layers, self.n_heads, self.sentence_hidden_dim)).to(self.device)
        
        
    def singlePass(self, words_memory, query, attn1, attn2): 
        L = len(words_memory)
        attn1_weights = {}
        bigru_inputs = torch.zeros(L, self.n_heads, self.sentence_input_dim, device = self.device)
        # first attention layer : one vector per sentence
        for i in range(L) :
            targets = words_memory[i]                              # size (N_i, 1, word_hidden_dim)
            targets = targets.repeat(1, self.n_heads, 1)           # size (N_i, n_heads, word_hidden_dim)
            attn1_output, attn1_wghts = attn1(query, targets)
            attn1_output = self.dropout(attn1_output)
            attn1_weights[i] = attn1_wghts
            bigru_inputs[i] = attn1_output.squeeze(0)              # size (n_heads, word_hidden_dim)
        # intermediate biGRU
        bigru_hidden = self.initHidden()
        attn2_inputs, bigru_hidden = self.bigru(bigru_inputs, bigru_hidden)  # size (L, n_heads, 2*sentence_hidden_dim)
        # second attention layer : one vector for the whole memory
        attn2_inputs = self.dropout(attn2_inputs)
        decision_vector, attn2_weights = attn2(query = query, targets = attn2_inputs)
        attn2_weights = attn2_weights.view(-1)
        decision_vector = self.dropout(decision_vector)
        # output decision vector
        return decision_vector, attn1_weights, attn2_weights
    
    
    
    def update(self, hops_query, decision_vector):
        if self.transf is not None :
            _ , update = self.transf(decision_vector, hops_query)
        else :
            update = hops_query + decision_vector
        return update
        
        
    def forward(self, words_memory, base_query = None):
        '''takes as parameters : 
                a list of word-level memory tensors, one per sentence,   each of size (N_i, 1, word_hidden_dim)
                an optional conditioning query tensor                    size = (1, 1, base_query_dim)
           returns : 
                the resulting decision vector                            size = (1, n_heads, output_dim)
                the weights of the first attention layer  (list with one dict per hop)
                the weights of the second attention layer (list with one tensor per hop)
                (if words_memory is empty, base_query is returned and both lists are empty)
        '''
        attn1_weights_list = []
        attn2_weights_list = []
        if len(words_memory) > 0 :
            if base_query is not None :
                base_query = base_query.repeat(1, self.n_heads, 1)
            if self.hops > 1 and self.share :
                hops_query = self.initQuery()
            else :
                hops_query = None

            for hop in range(self.hops) :
                if base_query is not None and hops_query is not None :
                    query = torch.cat((base_query, hops_query), 2) # size (1, self.n_heads, self.query_dim)
                elif base_query is not None :
                    query = base_query
                elif hops_query is not None :
                    query = hops_query
                else :
                    query = None
                decision_vector, attn1_weights, attn2_weights = self.singlePass(words_memory, 
                                                                                query, 
                                                                                self.attn1[hop], 
                                                                                self.attn2[hop])
                attn1_weights_list.append(attn1_weights)
                attn2_weights_list.append(attn2_weights)
                if self.hops > 1 and hops_query is not None :
                    hops_query = self.update(hops_query, decision_vector)  # size (1, self.n_heads, self.output_dim)
                else :
                    hops_query = decision_vector
        else :
            hops_query = base_query
        # output decision vector
        return hops_query, attn1_weights_list, attn2_weights_list

Overwriting modules/Attention_Hierarchical_Recurrent.py
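
The two attention levels can be sketched on plain Python lists: a first attention turns the words of each sentence into one sentence vector, and a second attention turns the sentence vectors into one decision vector. The sketch deliberately leaves out the bi-GRU, the heads, the hops and the dropout of the module above, and uses dot-product scoring. The names `attend` and `hierarchical_attention` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, targets):
    """Dot-product attention: returns the weighted sum of targets and the weights."""
    scores = [sum(q * t for q, t in zip(query, target)) for target in targets]
    weights = softmax(scores)
    dim = len(targets[0])
    applied = [sum(w * target[d] for w, target in zip(weights, targets))
               for d in range(dim)]
    return applied, weights

def hierarchical_attention(query, sentences):
    """Word-level attention inside each sentence, then attention over sentences."""
    sentence_vectors, word_weights = [], []
    for words in sentences:                  # sentences may have different lengths
        vector, weights = attend(query, words)
        sentence_vectors.append(vector)
        word_weights.append(weights)
    decision, sentence_weights = attend(query, sentence_vectors)
    return decision, word_weights, sentence_weights

query = [1.0, 0.0]
sentences = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 2.0], [0.0, 3.0]],
]
decision, word_weights, sentence_weights = hierarchical_attention(query, sentences)
```

Only the first sentence contains a word aligned with the query, so it receives the larger sentence-level weight.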


<a id="decodeurs"></a>

## 2.4 Decoding modules


[Back to the table of contents](#plan)

<a id="decodeursSelectifs"></a>


### 2.4.1 Selective decoder

[Back to the table of contents](#plan)

In [195]:
%%writefile modules/Decoder_Classes.py

import torch
import torch.nn as nn
import torch.nn.functional as F


class ClassDecoder(nn.Module):
    
    def __init__(self, text_dim, n_classes) :
        super(ClassDecoder, self).__init__() 
        self.version = 'class'
        self.n_classes = n_classes
        self.classes_decoder = nn.Linear(text_dim, n_classes)

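    def forward(self, text_vector, train_mode = False):
        # raw class scores (logits), flattened to size (n_classes)
        classes_vector = self.classes_decoder(text_vector).view(-1)
        if train_mode :
            return classes_vector
        else :
            classes = F.softmax(classes_vector, dim = 0)
            topv, topi = classes.data.topk(1)
            # topi has size (1) : return the index of the most probable class as a Python int
            result = topi[0].item()
            return result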

Overwriting modules/Decoder_Classes.py
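
The two modes of the selective decoder can be sketched without PyTorch: in training mode the raw scores are returned so that a loss can be applied to them, and otherwise the index of the most probable class is returned. The function `decode_class` below is a hypothetical stand-in operating on a plain list of scores.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def decode_class(logits, train_mode=False):
    """Return the raw scores when training, else the index of the best class."""
    if train_mode:
        return logits
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)

logits = [0.1, 2.0, -1.0]
predicted = decode_class(logits)
raw = decode_class(logits, train_mode=True)
```

Softmax preserves the ordering of the scores, so the predicted index is the same as the index of the largest raw score.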


<a id="decodeurGeneratif"></a>


### 2.4.2 Generative decoder

[Back to the table of contents](#plan)

![Decoder](figs/Decoder.png)

In [196]:
%%writefile modules/Decoder_Words.py

import random
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


class WordsDecoder(nn.Module):
    '''Transforms a vector into a sequence of words'''
    def __init__(self, device, embedding, hidden_dim, tracking_dim, dropout = 0.1):
        super(WordsDecoder, self).__init__()
        # relevant quantities
        self.device = device
        self.hidden_dim = hidden_dim
        self.tracking_dim = tracking_dim
        # modules
        self.embedding = embedding
        for p in embedding.parameters() :
            lang_size     = p.data.size(0)
            embedding_dim = p.data.size(1)
        self.reduce = nn.Linear(tracking_dim, hidden_dim)
        self.gru = nn.GRU(embedding_dim + hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, lang_size)
        self.dropout = nn.Dropout(dropout)
        
        
    def generateWord(self, final_vector, hidden, current_word_index):
        # update hidden state
        current_word = self.embedding(current_word_index).view(1,1,-1)
        embedded = torch.cat((final_vector, current_word), dim = 2)
        #embedded = self.dropout(embedded)
        _, hidden = self.gru(embedded, hidden)
        # generate next word
        vector = self.out(hidden).squeeze(0)
        log_proba = F.log_softmax(vector, dim = 1)
        return log_proba, hidden
    
    
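    def forward(self, last_words, query_vector, decision_vector, target_answer = None) :
        '''takes as parameters : 
                last_words, unused by this decoder (same signature as AttnWordsDecoder)
                a query vector                                  size = (1, 1, hidden_dim)
                a decision vector                               size = (1, 1, tracking_dim)
                an optional target answer used for teacher forcing
           returns : 
                the list of generated word indices
                the list of log-probabilities, one tensor of size (1, lang_size) per step
        '''
        bound = 25   # maximum number of generated words
        log_probas = []
        answer = []
        final_vector = self.reduce(decision_vector).tanh() + query_vector
        final_vector = self.dropout(final_vector)
        current_word_index = torch.tensor([[0]], dtype = torch.long, device = self.device) # SOS_token
        hidden = final_vector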
        for di in range(bound) :
            log_proba, hidden = self.generateWord(final_vector, hidden, current_word_index)
            topv, topi = log_proba.data.topk(1)
            log_probas.append(log_proba)
            ni = topi[0][0] # index of current generated word
            if ni == 1 : # EOS_token
                break
            elif target_answer is not None : # Teacher forcing
                answer.append(ni)
                if di < target_answer.size(0) :
                    current_word_index = target_answer[di].to(self.device)
                else :
                    break
            else :
                answer.append(ni)
                current_word_index = Variable(torch.LongTensor([[ni]])).to(self.device)
        return answer, log_probas

Overwriting modules/Decoder_Words.py
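
The generation loop of the decoder above follows a common greedy pattern: start from the start-of-sequence token, predict one word per step, stop at the end-of-sequence token or at the length bound, and feed either the predicted word or, with teacher forcing, the target word as next input. The sketch below isolates that control flow. The model is replaced by a hypothetical `toy_step` function, and `greedy_decode` is an illustrative name, not part of the library.

```python
SOS, EOS = 0, 1   # token indices, as in the decoder above

def greedy_decode(step, max_len=25, target=None):
    """Generate token indices until EOS, max_len, or the end of the target."""
    answer = []
    current, state = SOS, None
    for di in range(max_len):
        predicted, state = step(current, state)
        if predicted == EOS:
            break
        answer.append(predicted)
        if target is not None:           # teacher forcing: feed the target token
            if di >= len(target):
                break
            current = target[di]
        else:                            # free running: feed the prediction back
            current = predicted
    return answer

def toy_step(current, state):
    """Hypothetical model: predicts current + 2, and EOS once that exceeds 6."""
    nxt = current + 2
    return (EOS if nxt > 6 else nxt), state

free_run = greedy_decode(toy_step)
forced = greedy_decode(toy_step, target=[5, 1])
bounded = greedy_decode(toy_step, max_len=2)
```

In the forced run the second input is the target token 5 rather than the predicted token 2, which changes the next prediction and ends the sequence earlier.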


<a id="decodeurGeneratifAttention"></a>


### 2.4.3 Generative decoder with attention

[Back to the table of contents](#plan)

In [197]:
%%writefile modules/Decoder_Words_Attn.py

import random
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

from . import AdditiveAttention


class AttnWordsDecoder(nn.Module):
    '''Transforms a vector into a sequence of words'''
    def __init__(self, device, embedding, hidden_dim, n_layers = 0, dropout = 0.1):
        super(AttnWordsDecoder, self).__init__()
        # relevant quantities
        self.device = device
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers
        # modules
        self.embedding = embedding
        for p in embedding.parameters() :
            lang_size     = p.data.size()[0]
            embedding_dim = p.data.size()[1]
        self.gru = nn.GRU(embedding_dim + hidden_dim, hidden_dim)
        self.attn = AdditiveAttention(hidden_dim, hidden_dim, n_layers) 
        self.concat = nn.Linear(2 * hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, lang_size)
        self.dropout = nn.Dropout(dropout)
        
        
    def generateWord(self, last_words, decision_vector, hidden, current_word_index):
        # update hidden state
        current_word = self.embedding(current_word_index).view(1,1,-1)
        #current_word = self.dropout(current_word)
        embedded = torch.cat((current_word, decision_vector), dim = 2)
        embedded = self.dropout(embedded)
        _, hidden = self.gru(embedded, hidden)
        # generate next word
        attn, attn_weights = self.attn(hidden, last_words)
        vector = self.concat(torch.cat((hidden, attn), dim = 2)).tanh()
        vector = self.out(vector).squeeze(0)
        log_proba = F.log_softmax(vector, dim = 1)
        return log_proba, hidden
    
    
    def forward(self, last_words, query_vector, decision_vector, target_answer = None) :
        bound = 25
        log_probas = []
        answer = []
        di = 0
        decision_vector = self.dropout(decision_vector)
        current_word_index = Variable(torch.LongTensor([[0]])).to(self.device) # SOS_token
        last_words = self.dropout(last_words)
        hidden = self.dropout(query_vector)
        for di in range(bound) :
            log_proba, hidden = self.generateWord(last_words, decision_vector, hidden, current_word_index)
            topv, topi = log_proba.data.topk(1)
            log_probas.append(log_proba)
            ni = topi[0][0] # index of current generated word
            if ni == 1 : # EOS_token
                break
            elif target_answer is not None : # Teacher forcing
                answer.append(ni)
                if di < target_answer.size(0) :
                    current_word_index = target_answer[di].to(self.device)
                else :
                    break
            else :
                answer.append(ni)
                current_word_index = Variable(torch.LongTensor([[ni]])).to(self.device)
        return answer, log_probas

Overwriting modules/Decoder_Words_Attn.py


Return to the dashboard's working directory:

In [198]:
%cd ..

C:\Users\Jb\Desktop\Scripts\notebooks


[Back to the table of contents](#plan)

<a id="basDePage"></a>