<a href="https://colab.research.google.com/github/marued/low-resource-machine-translation-team07/blob/master/notebooks/NMT_networks_seq2seq_nmt.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [0]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# TensorFlow Addons Networks : Sequence-to-Sequence NMT with Attention Mechanism

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://www.tensorflow.org/addons/tutorials/networks_seq2seq_nmt"><img src="https://www.tensorflow.org/images/tf_logo_32px.png" />View on TensorFlow.org</a>
  </td>
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/addons/blob/master/docs/tutorials/networks_seq2seq_nmt.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/addons/blob/master/docs/tutorials/networks_seq2seq_nmt.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
      <td>
    <a href="https://storage.googleapis.com/tensorflow_docs/addons/docs/tutorials/networks_seq2seq_nmt.ipynb"><img src="https://www.tensorflow.org/images/download_logo_32px.png" />Download notebook</a>
  </td>
</table>

## Overview
This notebook gives a brief introduction to the ***Sequence to Sequence Model Architecture***.
In this notebook we broadly cover four essential topics necessary for Neural Machine Translation:


* **Data cleaning**
* **Data preparation**
* **Neural Translation Model with Attention**
* **Final Translation**

The basic idea behind such a model, though, is simply the encoder-decoder architecture. These networks are used for a variety of tasks such as text summarization, machine translation, and image captioning. This tutorial provides a hands-on understanding of the concept, explaining the technical jargon wherever necessary. We focus on the task of Neural Machine Translation (NMT), which was the very first testbed for seq2seq models.



In [2]:
!nvidia-smi

Mon Apr 13 06:09:27 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   38C    P0    25W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No ru

## Setup

In [0]:
from __future__ import absolute_import, division, print_function, unicode_literals

In [0]:
try:
  %tensorflow_version 2.x
except:
  pass

In [5]:
!pip install sentencepiece



In [6]:
# Standard library
import csv
import itertools
import re
import string
from pickle import dump, load
from unicodedata import normalize

# Third-party
import numpy as np
from numpy import array, argmax
import sentencepiece as spm
import tensorflow as tf
import tensorflow_addons as tfa
from nltk.translate.bleu_score import corpus_bleu
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import Dense, Embedding, LSTM
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import to_categorical

# Standalone Keras
from keras.models import load_model
from keras.utils.vis_utils import plot_model

Using TensorFlow backend.


## Data Cleaning

The description in the original tutorial refers to a German-English dataset of 152,820 phrase pairs, one pair per line with a tab separating the languages. The cells below instead load English (`lang1`) and French (`lang2`) files from the project directory. Although the data is organized, it needs cleaning before we can work on it, which removes irregularities that could otherwise disrupt training.

In [7]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [0]:
import os 
os.chdir("/content/drive/My Drive/2020Winter/IFT6759/project2/low-resource-machine-translation-team07/undreamt-tf/data/fr_cased_punc")

In [9]:
os.listdir()

['restult_dimension_300',
 'sub_test.lang2.atok',
 'sub_test.lang2',
 'fr_bpe.vocab',
 'test_fr_pred.txt.atok',
 'en_bpe.vocab',
 'en_bpe.model',
 'fr_bpe.model',
 'sub_test.lang1',
 'test_fr_pred.txt',
 'sub_test.lang1.atok',
 'unaligned.cased.en',
 'sub_train.lang2',
 'unaligned.uncased.en',
 'train.lang1',
 'en_corpus.txt',
 'unaligned.cased.fr',
 'sub_train.lang1',
 'sub_train.lang2.atok',
 'unaligned.fr',
 'sub_train.lang1.atok',
 'train.lang2',
 'fr_corpus.txt.atok',
 'en_corpus.txt.atok',
 'unaligned.en',
 'fr_corpus.txt',
 'unaligned.uncased.fr',
 'multi-bleu-detok.perl',
 'multi-bleu.perl',
 'seq2seq_test_fr_pred.txt',
 'sentencepiece_seq2seq_test_fr_pred.txt',
 'nmt_networks_seq2seq_nmt.py']

## Tokenization

In [10]:
sp_en = spm.SentencePieceProcessor()
sp_en.load('en_bpe.model')

sp_fr = spm.SentencePieceProcessor()
sp_fr.load('fr_bpe.model')

True

In [0]:
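def load_ids(sp, filename, start_tok=True, end_tok=True):
    """Encode each line of `filename` into SentencePiece ids, optionally adding <s> and </s>."""
    # Special ids: <unk>=0, <s>=1, </s>=2
    ids = []
    with open(filename, 'r', encoding='utf-8') as f_in:
        for line in f_in:
            line_ids = sp.encode_as_ids(line)
            if start_tok:
                line_ids.insert(0, 1)
            if end_tok:
                line_ids.append(2)
            ids.append(line_ids)
    return ids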
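
To make the effect of `start_tok` and `end_tok` concrete, here is a minimal sketch of the wrapping step on its own. `toy_encode` and its tiny vocabulary are hypothetical stand-ins for `sp.encode_as_ids` (they are not part of the notebook); only the ids 1 and 2 for `<s>` and `</s>` come from the cell above.

```python
def toy_encode(text):
    # Hypothetical stand-in for sp.encode_as_ids: maps each word to a made-up id.
    vocab = {"so": 144, "too": 632, "does": 560}
    return [vocab.get(word, 0) for word in text.split()]  # 0 plays the role of <unk>


def wrap_ids(ids, start_tok=True, end_tok=True, bos_id=1, eos_id=2):
    # Same wrapping as load_ids: prepend <s>=1 and append </s>=2 when requested.
    wrapped = list(ids)
    if start_tok:
        wrapped.insert(0, bos_id)
    if end_tok:
        wrapped.append(eos_id)
    return wrapped


wrapped = wrap_ids(toy_encode("so too does"))
print(wrapped)
```

The decoder relies on these markers later: `<s>` is the first decoder input and `</s>` tells the inference loop where a translation ends.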

In [12]:
data_en = load_ids(sp_en, "sub_train.lang1")
print(data_en[0])
print(sp_en.decode_ids(data_en[0]))

data_fr = load_ids(sp_fr, "sub_train.lang2")
print(data_fr[0])
print(sp_fr.decode_ids(data_fr[0]))

[1, 144, 632, 560, 7, 1366, 52, 7024, 571, 3079, 3080, 47, 2753, 2]
so too does the idea that accommodating religious differences is dangerous
[1, 84, 1438, 1405, 12, 285, 3483, 49, 3821, 7297, 1354, 498, 7482, 17, 2]
L ’ idée de concilier les différences religieuses semble donc dangereuse .


In [0]:
# #<unk>=0, <s>=1, </s>=2, <sep>=3, <cls>=4
# print(sp_fr.piece_to_id("<s>"))
# for id in range(4):
#   print(sp_fr.id_to_piece(id), sp_fr.is_control(id))

In [0]:
data_en = tf.keras.preprocessing.sequence.pad_sequences(data_en,padding='post')
data_fr = tf.keras.preprocessing.sequence.pad_sequences(data_fr,padding='post')
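
As a rough illustration of what `padding='post'` does, the sketch below right-pads plain Python lists to the length of the longest one. `pad_post` is a hypothetical helper written for this example, not the Keras implementation.

```python
def pad_post(sequences, pad_id=0):
    # Right-pad every sequence with pad_id up to the length of the longest one.
    longest = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_id] * (longest - len(seq)) for seq in sequences]


padded = pad_post([[1, 144, 2], [1, 2]])
print(padded)
```

Note that the pad value 0 is the same id that the `load_ids` comment assigns to `<unk>`, so any step that treats 0 as padding will also treat unknown tokens as padding.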

In [15]:
for i in range(5): print(data_en[:i]) 

[]
[[   1  144  632  560    7 1366   52 7024  571 3079 3080   47 2753    2
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0]]
[[   1  144  632  560    7 1366   52 7024  571 3079 3080   47 2753    2
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0   

In [0]:
def max_len(tensor):
    #print( np.argmax([len(t) for t in tensor]))
    return max( len(t) for t in tensor)

## Model Parameters

In [0]:
# X_train,  X_test, Y_train, Y_test = train_test_split(data_en,data_fr,test_size=0.2)
X_train, Y_train = data_en, data_fr
BATCH_SIZE = 128
BUFFER_SIZE = len(X_train)
steps_per_epoch = BUFFER_SIZE//BATCH_SIZE
embedding_dims = 256
rnn_units = 1024
dense_units = 1024
Dtype = tf.float32   #used to initialize DecoderCell Zero state

## Dataset Preparation

In [18]:
Tx = max_len(data_en)
Ty = max_len(data_fr)  

input_vocab_size = sp_en.get_piece_size()
output_vocab_size = sp_fr.get_piece_size()

print(input_vocab_size, output_vocab_size)
dataset = tf.data.Dataset.from_tensor_slices((X_train, Y_train)).shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
example_X, example_Y = next(iter(dataset))
print(example_X.shape) 
print(example_Y.shape) 

10000 10000
(128, 111)
(128, 159)
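
Both printed shapes start with `BATCH_SIZE` because `drop_remainder=True` keeps only complete batches, which is also why `steps_per_epoch` uses integer division. The arithmetic can be sketched as follows; the dataset size of 1000 is made up for illustration and is not the size of the training set.

```python
def count_full_batches(num_examples, batch_size):
    # With drop_remainder=True only complete batches are kept.
    full_batches = num_examples // batch_size
    dropped_examples = num_examples - full_batches * batch_size
    return full_batches, dropped_examples


# 1000 is a made-up dataset size used only for illustration.
full, dropped = count_full_batches(1000, 128)
print(full, dropped)
```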


## Defining NMT Model

In [0]:
# https://github.com/KonevDmitry/code_embeddings/blob/master/model/model.py

#ENCODER
class EncoderNetwork(tf.keras.Model):
    def __init__(self,input_vocab_size,embedding_dims, rnn_units, bidirectional=True):
        super().__init__()
        self.encoder_embedding = tf.keras.layers.Embedding(input_dim=input_vocab_size,
                                                           output_dim=embedding_dims)
        self.encoder_rnnlayer = tf.keras.layers.LSTM(rnn_units,return_sequences=True, 
                                                     return_state=True )


# class Encoder(tf.keras.Model):
#     def __init__(self, input_vocab_size, embedding_dims, rnn_units, batch_sz):
#         super(Encoder, self).__init__()
#         self.batch_sz = batch_sz
#         self.rnn_units = rnn_units
#         self.encoder_embedding = tf.keras.layers.Embedding(input_vocab_size, embedding_dims)
#         self.lstm = tf.keras.layers.LSTM(self.rnn_units,
#                                        return_sequences=True,
#                                        return_state=True,
#                                        recurrent_initializer='glorot_uniform')
#         self.encoder_rnnlayer = tf.keras.layers.Bidirectional(self.lstm)

#     def call(self, x, initial_state):
#         x = self.encoder_embedding(x)
#         output, stateF, stateB = self.encoder_rnnlayer(x, initial_state = initial_state)
#         return output, [stateF, stateB]

#     def initialize_hidden_state(self):
#         init_state = [tf.zeros((self.batch_sz, self.rnn_units)) for i in range(2)]
#         return init_state
    
class Encoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz=32):
        super(Encoder, self).__init__()
        self.batch_sz = batch_sz
        self.enc_units = enc_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        # Each direction gets half the units so the concatenated output has enc_units dimensions.
        lstm = tf.keras.layers.LSTM(self.enc_units // 2,
                                    return_sequences=True,
                                    return_state=True,
                                    recurrent_initializer='glorot_uniform')
        self.bidi = tf.keras.layers.Bidirectional(lstm, merge_mode='concat')

    def call(self, x, hidden):
        # `hidden` is accepted but not passed to the LSTM, which starts from its default zero state.
        x = self.embedding(x)
        # The bidirectional LSTM returns [sequence_output, fwd_h, fwd_c, bwd_h, bwd_c].
        output = self.bidi(x)

        whole_sequence_output = output[0]
        final_memory_state = tf.concat([output[1], output[3]], axis=1)
        final_carry_state = tf.concat([output[2], output[4]], axis=1)

        return whole_sequence_output, final_memory_state, final_carry_state

    def initialize_hidden_state(self):
        init_state = [tf.zeros((self.batch_sz, self.enc_units //2 )) for i in range(2)]
        
        return init_state

#DECODER
class DecoderNetwork(tf.keras.Model):
    def __init__(self,output_vocab_size, embedding_dims, rnn_units):
        super().__init__()
        self.decoder_embedding = tf.keras.layers.Embedding(input_dim=output_vocab_size,
                                                           output_dim=embedding_dims) 
        self.dense_layer = tf.keras.layers.Dense(output_vocab_size)
        self.decoder_rnncell = tf.keras.layers.LSTMCell(rnn_units)
        # Sampler
        self.sampler = tfa.seq2seq.sampler.TrainingSampler()
        # Create attention mechanism with memory = None
        self.attention_mechanism = self.build_attention_mechanism(dense_units,None,BATCH_SIZE*[Tx])
        self.rnn_cell =  self.build_rnn_cell(BATCH_SIZE)
        self.decoder = tfa.seq2seq.BasicDecoder(self.rnn_cell, sampler= self.sampler,
                                                output_layer=self.dense_layer)

    def build_attention_mechanism(self, units,memory, memory_sequence_length):
        return tfa.seq2seq.LuongAttention(units, memory = memory, 
                                          memory_sequence_length=memory_sequence_length)
        #return tfa.seq2seq.BahdanauAttention(units, memory = memory, memory_sequence_length=memory_sequence_length)

    # wrap decodernn cell  
    def build_rnn_cell(self, batch_size ):
        rnn_cell = tfa.seq2seq.AttentionWrapper(self.decoder_rnncell, self.attention_mechanism,
                                                attention_layer_size=dense_units)
        return rnn_cell
    
    def build_decoder_initial_state(self, batch_size, encoder_state,Dtype):
        decoder_initial_state = self.rnn_cell.get_initial_state(batch_size = batch_size, 
                                                                dtype = Dtype)
        decoder_initial_state = decoder_initial_state.clone(cell_state=encoder_state) 
        return decoder_initial_state

# encoderNetwork = EncoderNetwork(input_vocab_size,embedding_dims, rnn_units)
encoderNetwork = Encoder(input_vocab_size,embedding_dims, rnn_units, BATCH_SIZE)
decoderNetwork = DecoderNetwork(output_vocab_size,embedding_dims, rnn_units)
optimizer = tf.keras.optimizers.Adam()


## Initializing Training functions

In [0]:
def loss_function(y_pred, y):
   
    # shape of y: [batch_size, Ty - 1]
    # shape of y_pred: [batch_size, Ty - 1, output_vocab_size]
    sparsecategoricalcrossentropy = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True,
                                                                                  reduction='none')
    loss = sparsecategoricalcrossentropy(y_true=y, y_pred=y_pred)
    mask = tf.logical_not(tf.math.equal(y,0))   #output 0 for y=0 else output 1
    mask = tf.cast(mask, dtype=loss.dtype)
    loss = mask* loss
    loss = tf.reduce_mean(loss)
    return loss


def train_step(input_batch, output_batch,encoder_initial_cell_state):
    #initialize loss = 0
    loss = 0
    with tf.GradientTape() as tape:
        # encoder_emb_inp = encoderNetwork.encoder_embedding(input_batch)

        # a, a_tx, c_tx = encoderNetwork.encoder_rnnlayer(encoder_emb_inp, 
        #                                                 initial_state =encoder_initial_cell_state)
        a, a_tx, c_tx = encoderNetwork(input_batch, encoder_initial_cell_state)

        #[last step activations,last memory_state] of encoder passed as input to decoder Network
        
         
        # Prepare correct Decoder input & output sequence data
        decoder_input = output_batch[:,:-1] # ignore <end>
        #compare logits with timestepped +1 version of decoder_input
        decoder_output = output_batch[:,1:] #ignore <start>


        # Decoder Embeddings
        decoder_emb_inp = decoderNetwork.decoder_embedding(decoder_input)

        #Setting up decoder memory from encoder output and Zero State for AttentionWrapperState
        decoderNetwork.attention_mechanism.setup_memory(a)
        decoder_initial_state = decoderNetwork.build_decoder_initial_state(BATCH_SIZE,
                                                                           encoder_state=[a_tx, c_tx],
                                                                           Dtype=tf.float32)
        
        #BasicDecoderOutput        
        outputs, _, _ = decoderNetwork.decoder(decoder_emb_inp,initial_state=decoder_initial_state,
                                               sequence_length=BATCH_SIZE*[Ty-1])

        logits = outputs.rnn_output
        #Calculate loss

        loss = loss_function(logits, decoder_output)

    #Returns the list of all layer variables / weights.
    variables = encoderNetwork.trainable_variables + decoderNetwork.trainable_variables  
    # differentiate loss wrt variables
    gradients = tape.gradient(loss, variables)

    #grads_and_vars – List of(gradient, variable) pairs.
    grads_and_vars = zip(gradients,variables)
    optimizer.apply_gradients(grads_and_vars)
    return loss
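
One detail of `loss_function` is worth spelling out: it zeroes the loss at padded positions but then averages over every position, padded or not, so heavily padded batches report a smaller value. The sketch below contrasts that with averaging over real tokens only. Both helpers and the numbers are hypothetical; the second variant is an alternative to compare against, not what the notebook trains with.

```python
def masked_mean_over_all(losses, targets, pad_id=0):
    # Mirrors loss_function: zero out padded positions, then average over every position.
    masked = [l if t != pad_id else 0.0 for l, t in zip(losses, targets)]
    return sum(masked) / len(masked)


def masked_mean_over_tokens(losses, targets, pad_id=0):
    # Alternative: average only over the non-padded positions.
    kept = [l for l, t in zip(losses, targets) if t != pad_id]
    return sum(kept) / len(kept) if kept else 0.0


token_losses = [2.0, 4.0, 9.0, 9.0]   # made-up per-token losses
token_targets = [84, 2, 0, 0]         # last two positions are padding
print(masked_mean_over_all(token_losses, token_targets))     # 1.5
print(masked_mean_over_tokens(token_losses, token_targets))  # 3.0
```

Either choice gives usable gradients; the difference matters mainly when comparing loss values across batches or datasets with different amounts of padding.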

In [0]:
#RNN LSTM hidden and memory state initializer
def initialize_initial_state():
    # [num_of layers, batch, hidden]
    return [tf.zeros((BATCH_SIZE, rnn_units)), tf.zeros((BATCH_SIZE, rnn_units))]
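
The slicing inside `train_step` (`output_batch[:,:-1]` and `output_batch[:,1:]`) implements teacher forcing: at each step the decoder sees the true previous token and is trained to predict the next one. A minimal sketch on a single unpadded sequence, with a hypothetical helper name and made-up ids:

```python
def shift_for_teacher_forcing(target_ids):
    # Decoder input drops the last position; decoder target drops the first (<s>).
    decoder_input = target_ids[:-1]
    decoder_target = target_ids[1:]
    return decoder_input, decoder_target


dec_in, dec_out = shift_for_teacher_forcing([1, 84, 1438, 2])
for seen, expected in zip(dec_in, dec_out):
    print(seen, "->", expected)
```

In a padded batch, the last column removed by `[:,:-1]` is `</s>` only for the longest sequence; for shorter ones it is a padding id, and the loss mask is what keeps those positions from contributing.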

## Training

Load checkpoint

In [27]:
# https://github.com/dhirensk/ai/blob/master/practice/english_to_french_seq2seq_tf_2_0_withattention.py

checkpointdir = os.path.join('/content/drive/My Drive/2020Winter/IFT6759/project2/low-resource-machine-translation-team07/',"test")
chkpoint_prefix = os.path.join(checkpointdir, "chkpoint")
if not os.path.exists(checkpointdir):
    os.mkdir(checkpointdir)

checkpoint = tf.train.Checkpoint(optimizer = optimizer, encoderNetwork = encoderNetwork, 
                                 decoderNetwork = decoderNetwork)

try:
    status = checkpoint.restore(tf.train.latest_checkpoint(checkpointdir))
    print("Checkpoint found at {}".format(tf.train.latest_checkpoint(checkpointdir)))
except:
    print("No checkpoint found at {}".format(checkpointdir))

Checkpoint found at None
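
The output above reads "Checkpoint found at None" because `tf.train.latest_checkpoint` returns `None` when the directory holds no checkpoint, and `checkpoint.restore(None)` does not raise, so the `except` branch is never reached. One way to report the situation accurately is to test for `None` explicitly. The helper below is a hypothetical sketch of that check, written without TensorFlow so it can be run on its own; the paths are made up.

```python
def describe_restore(latest_checkpoint, checkpoint_dir):
    # latest_checkpoint is the value returned by tf.train.latest_checkpoint(checkpoint_dir):
    # a path prefix string, or None when the directory holds no checkpoint.
    if latest_checkpoint is None:
        return "No checkpoint found at {}".format(checkpoint_dir)
    return "Checkpoint found at {}".format(latest_checkpoint)


print(describe_restore(None, "test"))
print(describe_restore("test/chkpoint-3", "test"))
```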


Train steps

In [28]:
epochs = 30
for i in range(1, epochs+1):

    # encoder_initial_cell_state = initialize_initial_state()
    encoder_initial_cell_state = encoderNetwork.initialize_hidden_state()
    total_loss = 0.0

    for ( batch , (input_batch, output_batch)) in enumerate(dataset.take(steps_per_epoch)):
        batch_loss = train_step(input_batch, output_batch, encoder_initial_cell_state)
        total_loss += batch_loss
        if (batch+1)%5 == 0:
            print("total loss: {} epoch {} batch {} ".format(batch_loss.numpy(), i, batch+1))
    checkpoint.save(file_prefix = chkpoint_prefix)

total loss: 1.6595466136932373 epoch 1 batch 5 
total loss: 1.3233318328857422 epoch 1 batch 10 
total loss: 1.2915977239608765 epoch 1 batch 15 
total loss: 1.1823575496673584 epoch 1 batch 20 
total loss: 1.2220191955566406 epoch 1 batch 25 
total loss: 1.141091227531433 epoch 1 batch 30 
total loss: 1.259871482849121 epoch 1 batch 35 
total loss: 1.1029536724090576 epoch 1 batch 40 
total loss: 1.1422545909881592 epoch 1 batch 45 
total loss: 1.185848593711853 epoch 1 batch 50 
total loss: 1.0235154628753662 epoch 1 batch 55 
total loss: 1.2659356594085693 epoch 1 batch 60 
total loss: 1.1213037967681885 epoch 1 batch 65 
total loss: 1.1102582216262817 epoch 1 batch 70 
total loss: 1.0557441711425781 epoch 1 batch 75 
total loss: 1.1541165113449097 epoch 2 batch 5 
total loss: 1.0210813283920288 epoch 2 batch 10 
total loss: 1.0645779371261597 epoch 2 batch 15 
total loss: 1.1152064800262451 epoch 2 batch 20 
total loss: 1.1407300233840942 epoch 2 batch 25 
total loss: 1.06231343746

## Evaluation

In [0]:
def predict(input_sequences):
    # Translate tokenized input sentences into French. Each sentence is passed through the
    # encoder, then decoded step by step with a GreedyEmbeddingSampler; the trained decoder
    # embedding matrix is used to embed each sampled token for the next step.
    # input_raw='▁i ▁agree ▁that ▁we ▁need ▁an ▁ambitious ▁social ▁agenda ▁which ▁will ▁include ▁combating ▁poverty ▁and ▁social ▁exclusion'
    input_sequences = tf.keras.preprocessing.sequence.pad_sequences(input_sequences,
                                                                    maxlen=Tx, padding='post')
    inp = tf.convert_to_tensor(input_sequences)
    # print(inp.shape)
    inference_batch_size = input_sequences.shape[0]
    encoder_initial_cell_state = [tf.zeros((inference_batch_size, rnn_units)),
                                tf.zeros((inference_batch_size, rnn_units))]
    # encoder_emb_inp = encoderNetwork.encoder_embedding(inp)
    # a, a_tx, c_tx = encoderNetwork.encoder_rnnlayer(encoder_emb_inp,
    #                                                 initial_state =encoder_initial_cell_state)
    a, a_tx, c_tx = encoderNetwork(inp,encoder_initial_cell_state)
    # print('a_tx :',a_tx.shape)
    # print('c_tx :', c_tx.shape)

    start_tokens = tf.fill([inference_batch_size], 1)

    end_token = 2

    greedy_sampler = tfa.seq2seq.GreedyEmbeddingSampler()

    decoder_input = tf.expand_dims([1]* inference_batch_size,1)
    decoder_emb_inp = decoderNetwork.decoder_embedding(decoder_input)

    decoder_instance = tfa.seq2seq.BasicDecoder(cell = decoderNetwork.rnn_cell, sampler = greedy_sampler,
                                                output_layer=decoderNetwork.dense_layer)
    decoderNetwork.attention_mechanism.setup_memory(a)
    #pass [ last step activations , encoder memory_state ] as input to decoder for LSTM
    # print("decoder_initial_state = [a_tx, c_tx] :",np.array([a_tx, c_tx]).shape)
    decoder_initial_state = decoderNetwork.build_decoder_initial_state(inference_batch_size,
                                                                    encoder_state=[a_tx, c_tx],
                                                                    Dtype=tf.float32)
    # print("\nCompared to simple encoder-decoder without attention, the decoder_initial_state \
    #  is an AttentionWrapperState object containing s_prev tensors and context and alignment vector \n ")
    # print("decoder initial state shape :",np.array(decoder_initial_state).shape)
    # print("decoder_initial_state tensor \n", decoder_initial_state)

    # Since we do not know the target sequence lengths in advance, we use maximum_iterations to limit the translation lengths.
    # One heuristic is to decode up to two times the source sentence lengths.
    maximum_iterations = tf.round(tf.reduce_max(Tx) * 2)

    #initialize inference decoder
    decoder_embedding_matrix = decoderNetwork.decoder_embedding.variables[0] 
    (first_finished, first_inputs,first_state) = decoder_instance.initialize(decoder_embedding_matrix,
                                start_tokens = start_tokens,
                                end_token=end_token,
                                initial_state = decoder_initial_state)
    #print( first_finished.shape)
    # print("\nfirst_inputs returns the same decoder_input i.e. embedding of  <start> :",first_inputs.shape)
    # print("start_index_emb_avg ", tf.reduce_sum(tf.reduce_mean(first_inputs, axis=0))) # mean along the batch

    inputs = first_inputs
    state = first_state  
    predictions = np.empty((inference_batch_size,0), dtype = np.int32)                                                                             
    for j in range(maximum_iterations):
        outputs, next_state, next_inputs, finished = decoder_instance.step(j,inputs,state)
        inputs = next_inputs
        state = next_state
        outputs = np.expand_dims(outputs.sample_id,axis = -1)
        predictions = np.append(predictions, outputs, axis = -1)
    return predictions
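
Stripped of the TensorFlow machinery, greedy decoding is a short loop: feed the previous token, take the highest-scoring next token, and repeat until the end token or a step limit. The sketch below shows that loop with a hypothetical `step_fn` standing in for one decoder step; unlike the real decoder it carries no recurrent state, and the transition table is made up.

```python
def greedy_decode(step_fn, start_id=1, end_id=2, max_steps=10):
    # step_fn is a hypothetical stand-in for one decoder step:
    # it takes the previous token id and returns a list of scores over the vocabulary.
    tokens = []
    previous = start_id
    for _ in range(max_steps):
        scores = step_fn(previous)
        previous = max(range(len(scores)), key=scores.__getitem__)  # argmax
        if previous == end_id:
            break
        tokens.append(previous)
    return tokens


def toy_step(previous):
    # Made-up transition table: 1 -> 3 -> 4 -> 2 (end).
    next_token = {1: 3, 3: 4, 4: 2}[previous]
    return [1.0 if i == next_token else 0.0 for i in range(5)]


decoded = greedy_decode(toy_step)
print(decoded)
```

`predict` differs in one respect: it always runs all `maximum_iterations` steps for the whole batch and leaves trimming at the end token to the caller.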

## Final Translation

In [0]:
# test_data_en=load_data("sub_test.lang1.atok", end_tok=False)
test_data_en=load_ids(sp_en, "sub_test.lang1")
predictions = predict(test_data_en)


In [31]:
line_ = list(itertools.takewhile( lambda index: index !=2, predictions[10].tolist()))
print(sp_en.decode_ids(test_data_en[10]))
print(sp_fr.decode_ids(line_))

i did n't sleep well
J ' ai fait sans faire .


In [0]:
#prediction based on our sentence earlier
with open("sentencepiece_seq2seq_test_fr_pred.txt", 'w') as f_out:
    for i in range(len(predictions)):
        line = predictions[i,:].tolist()
        seq = list(itertools.takewhile( lambda index: index !=2, line))
        f_out.writelines([sp_fr.decode_ids(seq), '\n'])

## BLEU Score

In [33]:
!perl multi-bleu.perl sub_test.lang2 < sentencepiece_seq2seq_test_fr_pred.txt

BLEU = 9.19, 36.5/12.5/5.7/2.8 (BP=1.000, ratio=1.024, hyp_len=25377, ref_len=24780)
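
In the `multi-bleu.perl` summary, the four slash-separated numbers are the 1- to 4-gram precisions in percent, `BP` is the brevity penalty, and `ratio` is `hyp_len / ref_len`. BLEU is the brevity penalty times the geometric mean of the four precisions. The sketch below recomputes the score from the rounded figures printed above, so it only approximately reproduces the reported value; `bleu_from_summary` is a hypothetical helper.

```python
import math


def bleu_from_summary(precisions_pct, hyp_len, ref_len):
    # Brevity penalty: 1 when the hypothesis is at least as long as the reference.
    if hyp_len >= ref_len:
        bp = 1.0
    else:
        bp = math.exp(1.0 - ref_len / hyp_len)
    # Geometric mean of the n-gram precisions (given here in percent).
    log_mean = sum(math.log(p) for p in precisions_pct) / len(precisions_pct)
    return bp * math.exp(log_mean), bp


score, bp = bleu_from_summary([36.5, 12.5, 5.7, 2.8], hyp_len=25377, ref_len=24780)
print(round(score, 2), bp, round(25377 / 24780, 3))
```

Because the hypotheses are slightly longer than the references here, the brevity penalty does not reduce the score; the low 3-gram and 4-gram precisions are what hold it down.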


### The translation quality can be improved by implementing:
* Beam Search or Lexicon Search
* Bi-directional encoder-decoder model 

In [0]:
def predict_beam(input_sequences):
    """Inference using beam search with beam_width = 3."""

    beam_width = 3
    input_sequences = tf.keras.preprocessing.sequence.pad_sequences(input_sequences,
                                                                    maxlen=Tx, padding='post')
    inp = tf.convert_to_tensor(input_sequences)
    #print(inp.shape)
    inference_batch_size = input_sequences.shape[0]
    encoder_initial_cell_state = [tf.zeros((inference_batch_size, rnn_units)),
                                tf.zeros((inference_batch_size, rnn_units))]
    a, a_tx, c_tx = encoderNetwork(inp,encoder_initial_cell_state)
    #pass [ last step activations , encoder memory_state ] as input to decoder for LSTM
    s_prev = [a_tx, c_tx]

    start_tokens = tf.fill([inference_batch_size],1)
    end_token = 2

    decoder_input = tf.expand_dims([1]* inference_batch_size,1)
    decoder_emb_inp = decoderNetwork.decoder_embedding(decoder_input)


    #From official documentation
    #NOTE If you are using the BeamSearchDecoder with a cell wrapped in AttentionWrapper, then you must ensure that:

    #The encoder output has been tiled to beam_width via tfa.seq2seq.tile_batch (NOT tf.tile).
    #The batch_size argument passed to the get_initial_state method of this wrapper is equal to true_batch_size * beam_width.
    #The initial state created with get_initial_state above contains a cell_state value containing properly tiled final state from the encoder.
    encoder_memory = tfa.seq2seq.tile_batch(a, beam_width)
    decoderNetwork.attention_mechanism.setup_memory(encoder_memory)
    print("tiled encoder memory [batch_size * beam_width, Tx, rnn_units] :", encoder_memory.shape)
    # set decoder_initial_state, an AttentionWrapperState sized for batch_size * beam_width
    decoder_initial_state = decoderNetwork.rnn_cell.get_initial_state(batch_size = inference_batch_size* beam_width,dtype = Dtype)
    encoder_state = tfa.seq2seq.tile_batch(s_prev, multiplier=beam_width)
    decoder_initial_state = decoder_initial_state.clone(cell_state=encoder_state) 

    decoder_instance = tfa.seq2seq.BeamSearchDecoder(decoderNetwork.rnn_cell,beam_width=beam_width,
                                                    output_layer=decoderNetwork.dense_layer)


    # Since we do not know the target sequence lengths in advance, we use maximum_iterations to limit the translation lengths.
    # One heuristic is to decode up to two times the source sentence lengths.
    maximum_iterations = tf.round(tf.reduce_max(Tx) * 2)

    #initialize inference decoder
    decoder_embedding_matrix = decoderNetwork.decoder_embedding.variables[0] 
    (first_finished, first_inputs,first_state) = decoder_instance.initialize(decoder_embedding_matrix,
                                start_tokens = start_tokens,
                                end_token=end_token,
                                initial_state = decoder_initial_state)
    #print( first_finished.shape)
    print("\nfirst_inputs returns the same decoder_input i.e. embedding of  <start> :",first_inputs.shape)

    inputs = first_inputs
    state = first_state  
    predictions = np.empty((inference_batch_size, beam_width,0), dtype = np.int32)
    beam_scores =  np.empty((inference_batch_size, beam_width,0), dtype = np.float32)                                                                            
    for j in range(maximum_iterations):
        beam_search_outputs, next_state, next_inputs, finished = decoder_instance.step(j,inputs,state)
        inputs = next_inputs
        state = next_state
        outputs = np.expand_dims(beam_search_outputs.predicted_ids,axis = -1)
        scores = np.expand_dims(beam_search_outputs.scores,axis = -1)
        predictions = np.append(predictions, outputs, axis = -1)
        beam_scores = np.append(beam_scores, scores, axis = -1)
    print(predictions.shape) 
    print(beam_scores.shape)
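
The comments in `predict_beam` insist on `tfa.seq2seq.tile_batch` rather than `tf.tile` because the two order the copies differently: `tile_batch` repeats each batch entry `beam_width` times in place, so all beams of one sentence sit next to each other, whereas tiling the whole batch interleaves sentences. The plain-Python sketch below only illustrates the two orderings; both helpers are hypothetical.

```python
def tile_batch_like(rows, multiplier):
    # Repeat each batch entry `multiplier` times in place: [a, b] -> [a, a, a, b, b, b].
    return [row for row in rows for _ in range(multiplier)]


def plain_tile(rows, multiplier):
    # Whole-batch repetition, the ordering tf.tile would give along axis 0.
    return list(rows) * multiplier


batch = ["sent_a", "sent_b"]
print(tile_batch_like(batch, 3))
print(plain_tile(batch, 3))
```

The beam search decoder assumes the first ordering when it reshapes between `[batch_size * beam_width, ...]` and `[batch_size, beam_width, ...]`, so the encoder memory and the initial state must both be tiled that way.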