<a href="https://colab.research.google.com/github/AslanDevbrat/Seq2Seq/blob/dev/seq2seq.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##### Copyright 2020 The TensorFlow Authors.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# TensorFlow Addons Networks : Sequence-to-Sequence NMT with Attention Mechanism

## Overview
This notebook gives a brief introduction to the ***Sequence to Sequence Model Architecture***.
In this notebook you broadly cover four essential topics necessary for Neural Machine Translation:


* **Data cleaning**
* **Data preparation**
* **Neural Translation Model with Attention**
* **Final Translation with ```tfa.seq2seq.BasicDecoder``` and ```tfa.seq2seq.BeamSearchDecoder```** 

The basic idea behind such a model, though, is the encoder-decoder architecture. These networks are used for a variety of tasks such as text summarization, machine translation, and image captioning. This tutorial provides a hands-on understanding of the concept, explaining the technical jargon wherever necessary. You focus on the task of Neural Machine Translation (NMT), which was the very first testbed for seq2seq models.


## Setup

In [1]:
!pip install tensorflow-addons==0.11.2

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [2]:
import tensorflow as tf
import tensorflow_addons as tfa
from IPython.display import HTML as html_print
from IPython.display import display
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
import unicodedata
import re
import numpy as np
import os
import io
import time
from tensorflow.keras.layers import Embedding, SimpleRNNCell, GRUCell, Dense, LSTMCell

 The versions of TensorFlow you are currently using is 2.8.2 and is not supported. 
Some things might work, some things might not.
If you were to encounter a bug, do not file an issue.
If you want to make sure you're using a tested and supported configuration, either change the TensorFlow version or the TensorFlow Addons's version. 
You can find the compatibility matrix in TensorFlow Addon's readme:
https://github.com/tensorflow/addons


## Data Cleaning and Data Preparation 

You'll use a language dataset provided by http://www.manythings.org/anki/. This dataset contains language translation pairs in the format:

---
      May I borrow this book?    ¿Puedo tomar prestado este libro?
---


There are a variety of languages available, but you'll use the English-Spanish dataset. After downloading the dataset, here are the steps you'll take to prepare the data:


1. Add a start and end token to each sentence.
2. Clean the sentences by removing special characters.
3. Create a Vocabulary with word index (mapping from word → id) and reverse word index (mapping from id → word).
4. Pad each sentence to a maximum length. (Why? You need to fix the maximum length for the inputs to recurrent encoders.)
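
The steps above can be sketched in plain Python, without TensorFlow. This is a minimal illustration, not the notebook's implementation: the helper names (`add_tokens`, `build_vocab`, `encode_and_pad`) are hypothetical, the "cleaning" is reduced to trimming and lowercasing, and ids are assigned alphabetically (the Keras `Tokenizer` orders by frequency instead). It uses `'\t'` and `'\n'` as start and end markers and reserves id 0 for padding.

```python
# Minimal sketch of the four preparation steps; helper names are hypothetical.

def add_tokens(word):
    # Steps 1 and 2: trim whitespace, lowercase, then wrap the text
    # with a start marker ('\t') and an end marker ('\n')
    return '\t' + word.strip().lower() + '\n'

def build_vocab(words):
    # Step 3: map each character to an id (and back); id 0 is reserved for padding
    chars = sorted({ch for w in words for ch in w})
    char_to_id = {ch: i + 1 for i, ch in enumerate(chars)}
    id_to_char = {i: ch for ch, i in char_to_id.items()}
    return char_to_id, id_to_char

def encode_and_pad(words, char_to_id):
    # Step 4: convert characters to ids and right-pad with 0 to the longest sequence
    seqs = [[char_to_id[ch] for ch in w] for w in words]
    max_len = max(len(s) for s in seqs)
    return [s + [0] * (max_len - len(s)) for s in seqs]

words = [add_tokens(w) for w in ["ab", "abc "]]
char_to_id, id_to_char = build_vocab(words)
padded = encode_and_pad(words, char_to_id)
print(padded)
```

Every row of `padded` has the same length, which is what lets the sequences be stacked into a single tensor.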

In [9]:
def download_nmt():
    path_to_zip = tf.keras.utils.get_file(
    'dakshina_dataset_v1.0.tar', origin='https://storage.googleapis.com/gresearch/dakshina/dakshina_dataset_v1.0.tar',
    extract=True, untar = True)

    path_to_file = os.path.dirname(path_to_zip)+"/spa-eng/spa.txt"
    return path_to_file
download_nmt()

Downloading data from https://storage.googleapis.com/gresearch/dakshina/dakshina_dataset_v1.0.tar


'/root/.keras/datasets/spa-eng/spa.txt'

### Define a NMTDataset class with necessary functions to follow Step 1 to Step 4. 
The ```call()``` method will return:
1. ```train_dataset```, ```val_dataset``` and ```test_dataset``` : ```tf.data.Dataset``` objects
2. ```inp_lang_tokenizer``` and ```targ_lang_tokenizer``` : ```tf.keras.preprocessing.text.Tokenizer``` objects fitted on the training data

In [3]:
!wget  https://storage.googleapis.com/gresearch/dakshina/dakshina_dataset_v1.0.tar
!tar -xf 'dakshina_dataset_v1.0.tar'
train_file_path = "/content/dakshina_dataset_v1.0/hi/lexicons/hi.translit.sampled.train.tsv"
val_file_path= "/content/dakshina_dataset_v1.0/hi/lexicons/hi.translit.sampled.dev.tsv"
test_file_path  = "/content/dakshina_dataset_v1.0/hi/lexicons/hi.translit.sampled.test.tsv"

--2022-06-24 19:26:03--  https://storage.googleapis.com/gresearch/dakshina/dakshina_dataset_v1.0.tar
Resolving storage.googleapis.com (storage.googleapis.com)... 142.250.99.128, 173.194.203.128, 108.177.98.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.250.99.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2008340480 (1.9G) [application/x-tar]
Saving to: ‘dakshina_dataset_v1.0.tar.1’


2022-06-24 19:26:19 (126 MB/s) - ‘dakshina_dataset_v1.0.tar.1’ saved [2008340480/2008340480]



In [4]:
class NMTDataset:
    def __init__(self, problem_type='en-spa'):
        self.problem_type = problem_type
        self.inp_lang_tokenizer = None
        self.targ_lang_tokenizer = None
        self.num_of_train = 0
        self.num_of_test = 0
        self.num_of_val = 0
    

    def unicode_to_ascii(self, s):
        return ''.join(c for c in unicodedata.normalize('NFD', s) if unicodedata.category(c) != 'Mn')

    ## Step 1 and Step 2 
    def preprocess_sentence(self, w):
        # w = self.unicode_to_ascii(w.lower().strip())

        # # creating a space between a word and the punctuation following it
        # # eg: "he is a boy." => "he is a boy ."
        # # Reference:- https://stackoverflow.com/questions/3645931/python-padding-punctuation-with-white-spaces-keeping-punctuation
        # w = re.sub(r"([?.!,¿])", r" \1 ", w)
        # w = re.sub(r'[" "]+', " ", w)

        # # replacing everything with space except (a-z, A-Z, ".", "?", "!", ",")
        # w = re.sub(r"[^a-zA-Z?.!,¿]+", " ", w)

        # w = w.strip()

        # adding a start and an end token to the sentence
        # so that the model know when to start and stop predicting.
        w = '\t' + w + '\n'
        return w
    
    def create_dataset(self, path, num_examples, data_name):
        # path : path to a tab-separated lexicon file (one example per line)
        # num_examples : limit the number of examples for faster training (set num_examples = None to use the full data)
        lines = io.open(path, encoding='UTF-8').read().split('\n')
        #print(lines)
        # record the size of each split (the trailing newline leaves one empty entry, hence the -1)
        if data_name == "train":
          self.num_of_train = len(lines) -1
        elif data_name == "val":
          self.num_of_val = len(lines) -1
        else:
          self.num_of_test = len(lines) -1
        # skip empty lines so that every row has the same number of columns
        word_pairs = [[self.preprocess_sentence(w) for w in l.split('\t')]  for l in lines[:num_examples] if l]
        #print(word_pairs)

        
        return zip(*word_pairs)

    # Step 3 and Step 4
    def tokenize(self, lang):
        # lang = list of sentences in a language
        
        # print(len(lang), "example sentence: {}".format(lang[0]))
        lang_tokenizer = tf.keras.preprocessing.text.Tokenizer(filters='', char_level = True)
        lang_tokenizer.fit_on_texts(lang)

        ## tf.keras.preprocessing.text.Tokenizer.texts_to_sequences converts string (w1, w2, w3, ......, wn) 
        ## to a list of correspoding integer ids of words (id_w1, id_w2, id_w3, ...., id_wn)
        tensor = lang_tokenizer.texts_to_sequences(lang, ) 

        ## tf.keras.preprocessing.sequence.pad_sequences takes argument a list of integer id sequences 
        ## and pads the sequences to match the longest sequences in the given input
        tensor = tf.keras.preprocessing.sequence.pad_sequences(tensor, padding='post')

        return tensor, lang_tokenizer

    def load_dataset(self, path, num_examples=None, data_name = None):
        # creating cleaned input, output pairs
        targ_lang, inp_lang ,_= self.create_dataset(path, num_examples, data_name)
        #print(targ_lang, inp_lang)
        input_tensor, inp_lang_tokenizer = self.tokenize(inp_lang)
        #print(input_tensor, inp_lang_tokenizer.word_index)
        target_tensor, targ_lang_tokenizer = self.tokenize(targ_lang)
        #print(target_tensor, targ_lang_tokenizer.word_index)
        return input_tensor, target_tensor, inp_lang_tokenizer, targ_lang_tokenizer

    def call(self, num_examples, BUFFER_SIZE, BATCH_SIZE):
        file_path = train_file_path
        input_tensor_train, target_tensor_train, self.inp_lang_tokenizer, self.targ_lang_tokenizer = self.load_dataset(train_file_path, num_examples, "train" )
        train_dataset = tf.data.Dataset.from_tensor_slices((input_tensor_train, target_tensor_train))
        train_dataset = train_dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
        print("val")
        file_path = val_file_path
        input_tensor_val, target_tensor_val, inp_lang_tokenizer, targ_lang_tokenizer = self.load_dataset(val_file_path, num_examples, "val")
        val_dataset = tf.data.Dataset.from_tensor_slices((input_tensor_val, target_tensor_val))
        val_dataset = val_dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
        print("test")
        file_path = test_file_path
        input_tensor_test, target_tensor_test, inp_lang_tokenizer, targ_lang_tokenizer = self.load_dataset(test_file_path, num_examples, "test")
        test_dataset = tf.data.Dataset.from_tensor_slices((input_tensor_test, target_tensor_test))
        test_dataset = test_dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
        # val_dataset = tf.data.Dataset.from_tensor_slices((input_tensor_val, target_tensor_val))
        # val_dataset = val_dataset.batch(BATCH_SIZE, drop_remainder=True)

        return train_dataset, val_dataset, test_dataset, self.inp_lang_tokenizer, self.targ_lang_tokenizer

In [5]:
BUFFER_SIZE = 32000
BATCH_SIZE = 64
# Let's limit the #training examples for faster training
num_examples = 500

dataset_creator = NMTDataset('en-hi')
train_dataset, val_dataset, test_dataset, inp_lang, targ_lang = dataset_creator.call(num_examples, BUFFER_SIZE, BATCH_SIZE)

val
test


In [6]:
example_input_batch, example_target_batch = next(iter(train_dataset))
example_input_batch.shape, example_target_batch.shape

(TensorShape([64, 20]), TensorShape([64, 17]))

### Some important parameters

In [7]:
vocab_inp_size = len(inp_lang.word_index)+1
vocab_tar_size = len(targ_lang.word_index)+1
max_length_input = example_input_batch.shape[1]
max_length_output = example_target_batch.shape[1]

embedding_dim = 256
units = 1024
steps_per_epoch = dataset_creator.num_of_train//BATCH_SIZE


In [8]:
print("max_length_input, max_length_output, vocab_inp_size, vocab_tar_size")
max_length_input, max_length_output, vocab_inp_size, vocab_tar_size

max_length_input, max_length_output, vocab_inp_size, vocab_tar_size


(20, 17, 26, 48)

In [9]:
##### 

class Encoder(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz, num_of_layers, enc_unit_type, dropout, recurrent_dropout):
    super(Encoder, self).__init__()
    self.batch_sz = batch_sz
    self.enc_units = enc_units
    self.num_of_layers = num_of_layers
    self.enc_unit_type = enc_unit_type
    self.dropout = dropout
    self.recurrent_dropout = recurrent_dropout
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)

    ##-------- LSTM layer in Encoder ------- ##
    self.encoder_layer = self.get_encoder_layer(self.enc_units,
                                                self.num_of_layers, self.enc_unit_type)
    
  def get_encoder_layer(self, enc_units, num_of_layers, enc_unit_type):
    return tf.keras.layers.RNN(tf.keras.layers.StackedRNNCells( [self.get_cell(enc_unit_type, 
                                                                                 enc_units) for i in range(num_of_layers)],),
                                  return_sequences=True, return_state=True, name = "Encoder")
  def get_cell(self, cell_type = "lstm", num_of_cell = 1, name = None):
      #print(cell_type)
      if cell_type == "lstm":
        return LSTMCell(num_of_cell, dropout = self.dropout, recurrent_dropout = self.recurrent_dropout, )
      elif cell_type == "rnn":
        return SimpleRNNCell(num_of_cell, dropout = self.dropout, recurrent_dropout = self.recurrent_dropout)
      elif cell_type =="gru":
        return GRUCell(num_of_cell, dropout = self.dropout, recurrent_dropout = self.recurrent_dropout)
      else:
        raise ValueError(f"Invalid cell type: {cell_type}")

  def call(self, x, hidden):
    x = self.embedding(x)
    output = self.encoder_layer(x, initial_state = hidden)
    return output[0], output[1:]

  def initialize_hidden_state(self):
    if self.enc_unit_type == 'rnn' or self.enc_unit_type == "gru":
        return [tf.zeros((self.batch_sz, self.enc_units))]*self.num_of_layers
    else:
        return [[tf.zeros((self.batch_sz, self.enc_units)),tf.zeros((self.batch_sz, self.enc_units))]]*self.num_of_layers

In [10]:
## Test Encoder Stack

encoder = Encoder(vocab_inp_size, embedding_dim, units, BATCH_SIZE, 1, "lstm", 0.2,0.2)


# sample input
sample_hidden = encoder.initialize_hidden_state()
sample_output, sample_state= encoder(example_input_batch, sample_hidden)
print ('Encoder output shape: (batch size, sequence length, units) {}'.format(sample_output.shape))
print(len(sample_state))
# print ('Encoder h vector shape: (batch size, units) {}'.format(sample_state[0].shape))
# print ('Encoder c vector shape: (batch size, units) {}'.format(sample_state[1].shape))

Encoder output shape: (batch size, sequence length, units) (64, 20, 1024)
1


In [11]:
class Decoder(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz, num_of_layers, dec_unit_type, dropout, recurrent_dropout, attention_type='luong',):
    super(Decoder, self).__init__()
    self.batch_sz = batch_sz
    self.dec_units = dec_units
    self.attention_type = attention_type
    self.num_of_layers = num_of_layers
    self.dec_unit_type = dec_unit_type
    self.dropout = dropout
    self.recurrent_dropout = recurrent_dropout
    # Embedding Layer
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    
    #Final Dense layer on which softmax will be applied
    self.fc = tf.keras.layers.Dense(vocab_size)

    # Define the fundamental cell for decoder recurrent structure
    self.decoder_rnn_cell =  self.get_stacked_rnn_cell()
   


    # Sampler
    self.sampler = tfa.seq2seq.sampler.TrainingSampler()

    # Create attention mechanism with memory = None
    self.attention_mechanism = self.build_attention_mechanism(self.dec_units, 
                                                              None, self.batch_sz*[max_length_input], self.attention_type)

    # Wrap attention mechanism with the fundamental rnn cell of decoder
    self.rnn_cell = self.build_rnn_cell(batch_sz)

    # Define the decoder with respect to fundamental rnn cell
    self.decoder = tfa.seq2seq.BasicDecoder(self.rnn_cell, sampler=self.sampler, output_layer=self.fc)

    
  def build_rnn_cell(self, batch_sz):
    rnn_cell = tfa.seq2seq.AttentionWrapper(self.decoder_rnn_cell, 
                                  self.attention_mechanism, attention_layer_size=self.dec_units)
    return rnn_cell

  def build_attention_mechanism(self, dec_units, memory, memory_sequence_length, attention_type='luong'):
    # ------------- #
    # typ: Which sort of attention (Bahdanau, Luong)
    # dec_units: final dimension of attention outputs 
    # memory: encoder hidden states of shape (batch_size, max_length_input, enc_units)
    # memory_sequence_length: 1d array of shape (batch_size) with every element set to max_length_input (for masking purpose)

    if(attention_type=='bahdanau'):
      return tfa.seq2seq.BahdanauAttention(units=dec_units, memory=memory, memory_sequence_length=memory_sequence_length)
    else:
      return tfa.seq2seq.LuongAttention(units=dec_units, memory=memory, memory_sequence_length=memory_sequence_length)

  def build_initial_state(self, batch_sz, encoder_state, Dtype):
    decoder_initial_state = self.rnn_cell.get_initial_state(batch_size=batch_sz, dtype=Dtype)
    decoder_initial_state = decoder_initial_state.clone(cell_state=encoder_state)
    return decoder_initial_state


  def call(self, inputs, initial_state):
    x = self.embedding(inputs)
    outputs, _, _ = self.decoder(x, initial_state=initial_state, sequence_length=self.batch_sz*[max_length_output-1])
    return outputs
  def get_cell(self, cell_type = "lstm", num_of_cell = 1, name = None):
      #print(cell_type)
      if cell_type == "lstm":
        return LSTMCell(num_of_cell, dropout = self.dropout, recurrent_dropout = self.recurrent_dropout, )
      elif cell_type == "rnn":
        return SimpleRNNCell(num_of_cell, dropout = self.dropout, recurrent_dropout = self.recurrent_dropout)
      elif cell_type =="gru":
        return GRUCell(num_of_cell, dropout = self.dropout, recurrent_dropout = self.recurrent_dropout)
      else:
        raise ValueError(f"Invalid cell type: {cell_type}")

  def get_stacked_rnn_cell(self,):
    return tf.keras.layers.StackedRNNCells( [self.get_cell(self.dec_unit_type, self.dec_units,) for i in range(self.num_of_layers)])


In [12]:
# Test decoder stack

decoder = Decoder(vocab_tar_size, embedding_dim, units, BATCH_SIZE,1, "lstm", 0.2,0.2,  'luong')
sample_x = tf.random.uniform((BATCH_SIZE, max_length_output))
decoder.attention_mechanism.setup_memory(sample_output)
initial_state = decoder.build_initial_state(BATCH_SIZE,tuple(sample_state), tf.float32)


sample_decoder_outputs = decoder(sample_x, initial_state)

print("Decoder Outputs Shape: ", sample_decoder_outputs.rnn_output.shape)


Decoder Outputs Shape:  (64, 16, 48)
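
To make the attention step inside the decoder more concrete, here is a minimal plain-Python sketch of dot-product (Luong-style) attention for a single decoder step. It is a simplification under stated assumptions: the function names are hypothetical, there is no learned projection of the memory, and no masking by `memory_sequence_length`, both of which `tfa.seq2seq.LuongAttention` handles for you.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of floats
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot_attention(query, memory):
    # query  : decoder state for the current step (list of floats)
    # memory : encoder outputs, one vector per input time step
    # score each encoder position by its dot product with the query
    scores = [sum(q * k for q, k in zip(query, keys)) for keys in memory]
    weights = softmax(scores)
    # context vector = attention-weighted sum of the encoder outputs
    dim = len(memory[0])
    context = [sum(w * mem[d] for w, mem in zip(weights, memory)) for d in range(dim)]
    return weights, context

memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three encoder time steps
query = [2.0, 0.0]
weights, context = dot_attention(query, memory)
print(weights, context)
```

Positions whose encoder output aligns with the query receive larger weights, so they contribute more to the context vector.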


## Define the optimizer and the loss function

In [13]:
optimizer = tf.keras.optimizers.Adam()


def loss_function(real, pred):
  # real shape = (BATCH_SIZE, max_length_output)
  # pred shape = (BATCH_SIZE, max_length_output, tar_vocab_size )
  cross_entropy = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction='none')
  loss = cross_entropy(y_true=real, y_pred=pred)
  mask = tf.logical_not(tf.math.equal(real,0))   #output 0 for y=0 else output 1
  mask = tf.cast(mask, dtype=loss.dtype)  
  loss = mask* loss
  loss = tf.reduce_mean(loss)
  return loss  
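
The masking logic can be checked by hand with a small plain-Python sketch. The function names below are hypothetical. It computes the same masked mean as `loss_function` (the average over every position, padded ones included) and, for comparison, an alternative that averages over real tokens only; the second variant is an illustration, not what the notebook uses.

```python
import math

def token_cross_entropy(logits, target):
    # cross-entropy from logits: log-sum-exp of the logits minus the target logit
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]

def masked_losses(real, pred, pad_id=0):
    # padded positions (target == pad_id) contribute zero loss
    per_token = [0.0 if t == pad_id else token_cross_entropy(l, t)
                 for t, l in zip(real, pred)]
    n_real = sum(1 for t in real if t != pad_id)
    mean_all = sum(per_token) / len(real)        # mean over all positions, as tf.reduce_mean does
    mean_real = sum(per_token) / max(n_real, 1)  # alternative: mean over non-padded positions
    return mean_all, mean_real

real = [2, 1, 0, 0]                 # two real tokens followed by two padding positions
pred = [[0.0, 0.0, 0.0]] * 4        # uniform logits over a 3-symbol vocabulary
mean_all, mean_real = masked_losses(real, pred)
print(mean_all, mean_real)
```

With uniform logits each real token costs `log(3)`, so the all-position mean is half the real-token mean here. The reported loss therefore shrinks as the share of padding grows.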

## Checkpoints (Object-based saving)

In [14]:
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
checkpoint = tf.train.Checkpoint(optimizer=optimizer,
                                 encoder=encoder,
                                 decoder=decoder)

## One train_step operations

In [15]:
@tf.function
def train_step(inp, targ, enc_hidden):
  loss = 0

  with tf.GradientTape() as tape:
    enc_output, enc_state= encoder(inp, enc_hidden)


    dec_input = targ[ : , :-1 ] # drop the last position (the '\n' end token for the longest sequence)
    real = targ[ : , 1: ]         # drop the '\t' start token

    # Set the AttentionMechanism object with encoder_outputs
    decoder.attention_mechanism.setup_memory(enc_output)

    # Create AttentionWrapperState as initial_state for decoder
    decoder_initial_state = decoder.build_initial_state(BATCH_SIZE, tuple(enc_state) ,tf.float32)
    pred = decoder(dec_input, decoder_initial_state)
    logits = pred.rnn_output
    loss = loss_function(real, logits)
    metric.update_state(real, logits)

  variables = encoder.trainable_variables + decoder.trainable_variables
  gradients = tape.gradient(loss, variables)
  optimizer.apply_gradients(zip(gradients, variables))

  return loss, metric.result().numpy()
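
The shift between `dec_input` and `real` is the core of teacher forcing: at every step the decoder is fed the true previous token and asked to predict the next one. A minimal sketch on a single padded sequence (the function name is hypothetical; the ids 1 and 3 for `'\t'` and `'\n'` follow the target vocabulary printed at the end of this notebook, and 0 is padding):

```python
def teacher_forcing_pair(target_ids):
    # decoder input: everything except the last position
    dec_input = target_ids[:-1]
    # expected output: everything except the first position (the start token)
    real = target_ids[1:]
    return dec_input, real

START, END, PAD = 1, 3, 0
target = [START, 7, 4, 9, END, PAD]   # one padded target sequence
dec_input, real = teacher_forcing_pair(target)

for step, (fed, expected) in enumerate(zip(dec_input, real)):
    print(f"step {step}: decoder is fed {fed}, should predict {expected}")
```

For a padded sequence the dropped last position is a padding id rather than the end token, so the end token is still fed to the decoder; the loss mask is what keeps the resulting padded target from contributing.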

## Train the model

In [19]:
EPOCHS = 10
metric = tf.keras.metrics.SparseCategoricalAccuracy()
tf.config.run_functions_eagerly(True)
for epoch in range(EPOCHS):
  start = time.time()
  
  enc_hidden = encoder.initialize_hidden_state()
  total_loss = 0
  total_accuracy = 0
  # print(enc_hidden[0].shape, enc_hidden[1].shape)
  metric.reset_state()
  for (batch, (inp, targ)) in enumerate(train_dataset.take(steps_per_epoch)):
    batch_loss, batch_acc= train_step(inp, targ, enc_hidden)
    total_loss += batch_loss
    total_accuracy+=batch_acc

    if batch % 100 == 0:
      print('Epoch {} Batch {} Loss {:.4f} Acc {:.4f}'.format(epoch + 1,
                                                   batch,
                                                   batch_loss.numpy(), batch_acc*100 ))
  # saving (checkpoint) the model every 2 epochs
  if (epoch + 1) % 2 == 0:
    checkpoint.save(file_prefix = checkpoint_prefix)

  print('Epoch {} Loss {:.4f} Acc {:.4f}'.format(epoch + 1,
                                      total_loss / steps_per_epoch,
                                      (total_accuracy/ steps_per_epoch)*100
                                      ))
  print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

Epoch 1 Batch 0 Loss 1.1310 Acc 20.8008
Epoch 1 Loss 0.0110 Acc 0.2051
Time taken for 1 epoch 60.44183397293091 sec

Epoch 2 Batch 0 Loss 1.0517 Acc 20.8984


KeyboardInterrupt: ignored

## Use tf-addons BasicDecoder for decoding


In [45]:
def evaluate_sentence(sentence):
  print("from evaluate",sentence)
  sentence = dataset_creator.preprocess_sentence(sentence)

  inputs = [inp_lang.word_index[i] for i in sentence]
  inputs = [inputs for _ in range(64)]
  inputs = tf.keras.preprocessing.sequence.pad_sequences(inputs,
                                                          maxlen=max_length_input,
                                                          padding='post')
  inputs = tf.convert_to_tensor(inputs)
  #print(inputs)
  inference_batch_size = 64
  result = ''

  enc_start_state =  [[tf.zeros((inference_batch_size, units)),tf.zeros((inference_batch_size, units))]]*1
  enc_out, enc_state  = encoder(inputs, enc_start_state)

  # dec_h = enc_h
  # dec_c = enc_c

  start_tokens = tf.fill([inference_batch_size], targ_lang.word_index['\t'])
  end_token = targ_lang.word_index['\n']

  greedy_sampler = tfa.seq2seq.GreedyEmbeddingSampler(decoder.embedding)

  # Instantiate BasicDecoder object
  decoder_instance = tfa.seq2seq.BasicDecoder(cell=decoder.rnn_cell, sampler=greedy_sampler, output_layer=decoder.fc, maximum_iterations=25)
  # Setup Memory in decoder stack
  decoder.attention_mechanism.setup_memory(enc_out)

  # set decoder_initial_state
  decoder_initial_state = decoder.build_initial_state(inference_batch_size,tuple(enc_state), tf.float32)


  ### Since the BasicDecoder wraps around the Decoder's rnn cell only, you have to ensure that the input to each BasicDecoder
  ### decoding step is the output of the embedding layer. tfa.seq2seq.GreedyEmbeddingSampler() takes care of this:
  ### it is constructed above with decoder.embedding as its embedding function, so each sampled id is embedded before the next step.

  decoder_embedding_matrix = decoder.embedding.variables
  
  outputs, _, _ = decoder_instance(None, start_tokens = start_tokens, end_token= end_token, initial_state=decoder_initial_state)
  return outputs.sample_id.numpy(), outputs

def translate(sentence):
  result,_= evaluate_sentence(sentence)
  print("-"*80)
  print(result[1])
  result = "".join("".join(targ_lang.sequences_to_texts(result[:1])).split(" "))
  print('Input: %s' % (sentence))
  print('Predicted translation: {}'.format(result))
translate('aaditya')

from evaluate aaditya
--------------------------------------------------------------------------------
[ 2  5 10  6  4  4  3]
Input: aaditya
Predicted translation: अंगराा
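
Greedy decoding itself is a short loop: start from the start token, pick the highest-scoring next token, feed it back in, and stop at the end token or after a fixed number of steps. The sketch below shows only that loop. The scoring table is hypothetical and depends only on the previous token, whereas the real decoder also carries its recurrent state and attention context from step to step.

```python
def greedy_decode(step_fn, start_token, end_token, max_steps=25):
    # step_fn(prev_token) returns a list of scores, one per vocabulary id
    tokens = []
    prev = start_token
    for _ in range(max_steps):
        scores = step_fn(prev)
        # greedy choice: the id with the highest score
        prev = max(range(len(scores)), key=scores.__getitem__)
        if prev == end_token:
            break
        tokens.append(prev)
    return tokens

# hypothetical scores: key = previous token, list index = candidate next token
TABLE = {
    1: [0.0, 0.0, 0.9, 0.1, 0.0],   # after the start token (1), token 2 wins
    2: [0.0, 0.0, 0.1, 0.2, 0.7],   # after 2, token 4 wins
    4: [0.0, 0.1, 0.0, 0.8, 0.1],   # after 4, the end token (3) wins
}
decoded = greedy_decode(lambda prev: TABLE[prev], start_token=1, end_token=3)
print(decoded)
```

The `max_steps` cap plays the same role as `maximum_iterations=25` in the `BasicDecoder` call: it bounds the output when the end token is never produced.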



## Restore the latest checkpoint and test

In [57]:
# restoring the latest checkpoint in checkpoint_dir
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))

<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7f4f49e5e950>

InvalidArgumentError: ignored

In [None]:
translate(u'esta es mi vida.')

[[ 20   9  22 190   4   3]]
Input: esta es mi vida.
Predicted translation: ['this is my life . <end>']


In [None]:
translate(u'¿todavia estan en casa?')

[[25  7 90  8  3]]
Input: ¿todavia estan en casa?
Predicted translation: ['are you home ? <end>']


In [None]:
# wrong translation
translate(u'trata de averiguarlo.')

[[126  16 892  11  75   4   3]]
Input: trata de averiguarlo.
Predicted translation: ['try to figure it out . <end>']


## Use tf-addons BeamSearchDecoder 


In [144]:
def beam_evaluate_sentence(sentence, beam_width=3):
  sentence = dataset_creator.preprocess_sentence(sentence)

  inputs = [inp_lang.word_index[i] for i in sentence]
  inputs = [inputs for _ in range(64)]
  inputs = tf.keras.preprocessing.sequence.pad_sequences(inputs,
                                                          maxlen=max_length_input,
                                                          padding='post')
  inputs = tf.convert_to_tensor(inputs)
  print(inputs)
  inference_batch_size = inputs.shape[0]
  result = ''

  enc_start_state = [[tf.zeros((inference_batch_size, units)),tf.zeros((inference_batch_size, units))]]*1
  enc_out, enc_state = encoder(inputs, enc_start_state)

  start_tokens = tf.fill([inference_batch_size], targ_lang.word_index['\t'])
  end_token = targ_lang.word_index['\n']

  # From official documentation
  # NOTE If you are using the BeamSearchDecoder with a cell wrapped in AttentionWrapper, then you must ensure that:
  # The encoder output has been tiled to beam_width via tfa.seq2seq.tile_batch (NOT tf.tile).
  # The batch_size argument passed to the get_initial_state method of this wrapper is equal to true_batch_size * beam_width.
  # The initial state created with get_initial_state above contains a cell_state value containing properly tiled final state from the encoder.

  enc_out = tfa.seq2seq.tile_batch(enc_out, multiplier=beam_width)
  decoder.attention_mechanism.setup_memory(enc_out)
  print("beam_with * [batch_size, max_length_input, rnn_units] :  3 * [1, 16, 1024]] :", enc_out.shape)

  # set decoder_inital_state which is an AttentionWrapperState considering beam_width
  hidden_state = tfa.seq2seq.tile_batch(tuple(enc_state), multiplier=beam_width)
  decoder_initial_state = decoder.rnn_cell.get_initial_state(batch_size=beam_width*inference_batch_size, dtype=tf.float32)
  decoder_initial_state = decoder_initial_state.clone(cell_state=hidden_state)

  # Instantiate BeamSearchDecoder
  decoder_instance = tfa.seq2seq.BeamSearchDecoder(decoder.rnn_cell,beam_width=beam_width, output_layer=decoder.fc, embedding_fn = decoder.embedding)
  decoder_embedding_matrix = decoder.embedding.variables[:]

  # The BeamSearchDecoder object's call() function takes care of everything.
  outputs, final_state, sequence_lengths = decoder_instance(None, start_tokens=start_tokens, end_token=end_token, initial_state=decoder_initial_state)
  # outputs is a tfa.seq2seq.FinalBeamSearchDecoderOutput object.
  # The final beam predictions are stored in outputs.predicted_ids
  # outputs.beam_search_decoder_output is a tfa.seq2seq.BeamSearchDecoderOutput object which keeps track of beam_scores and parent_ids while performing a beam decoding step
  # final_state is a tfa.seq2seq.BeamSearchDecoderState object.
  # sequence_lengths has shape [inference_batch_size, beam_width] and gives the length of each generated beam

  
  # outputs.predicted_ids.shape = (inference_batch_size, time_step_outputs, beam_width)
  # outputs.beam_search_decoder_output.scores.shape = (inference_batch_size, time_step_outputs, beam_width)
  # Convert the shape of outputs and beam_scores to (inference_batch_size, beam_width, time_step_outputs)
  final_outputs = tf.transpose(outputs.predicted_ids, perm=(0,2,1))
  beam_scores = tf.transpose(outputs.beam_search_decoder_output.scores, perm=(0,2,1))
  
  return final_outputs.numpy(), beam_scores.numpy()
def beam_translate(sentence):
  result, beam_scores = beam_evaluate_sentence(sentence)
  print(result.shape, beam_scores.shape)
  for beam, score in zip(result, beam_scores):
    print(beam.shape, score.shape)
    output = targ_lang.sequences_to_texts(beam)
    output = [a[:a.index('\n')] for a in output]
    beam_score = [a.sum() for a in score]
    print('Input: %s' % (sentence))
    for i in range(len(output)):
      print('{} Predicted translation: {}  {}'.format(i+1, output[i], beam_score[i]))
beam_translate("aande")

tf.Tensor(
[[2 1 1 ... 0 0 0]
 [2 1 1 ... 0 0 0]
 [2 1 1 ... 0 0 0]
 ...
 [2 1 1 ... 0 0 0]
 [2 1 1 ... 0 0 0]
 [2 1 1 ... 0 0 0]], shape=(64, 20), dtype=int32)
beam_with * [batch_size, max_length_input, rnn_units] :  3 * [1, 16, 1024]] : (192, 20, 1024)


InvalidArgumentError: ignored
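
Two ideas from this cell can be illustrated without TensorFlow: the entry order produced by `tile_batch` (each batch entry repeated consecutively, unlike a plain tile of the whole batch), and the beam search loop itself. The sketch below is a simplified, assumption-laden version: the names and probability table are hypothetical, each step depends only on the previous token, and there is no length penalty or batching as in `tfa.seq2seq.BeamSearchDecoder`.

```python
import math

def tile_batch(batch, multiplier):
    # repeat every entry consecutively: [a, b] -> [a, a, b, b]
    return [item for item in batch for _ in range(multiplier)]

def beam_search(step_fn, start_token, end_token, beam_width=3, max_steps=10):
    # each hypothesis is (cumulative log-probability, tokens, finished flag)
    beams = [(0.0, [start_token], False)]
    for _ in range(max_steps):
        candidates = []
        for log_prob, tokens, finished in beams:
            if finished:
                candidates.append((log_prob, tokens, True))
                continue
            for tok, p in enumerate(step_fn(tokens[-1])):
                if p > 0.0:
                    candidates.append((log_prob + math.log(p), tokens + [tok], tok == end_token))
        # keep only the beam_width best hypotheses
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(finished for _, _, finished in beams):
            break
    return beams

# hypothetical next-token probabilities: key = previous token, index = next token
PROBS = {
    1: [0.0, 0.0, 0.6, 0.0, 0.4],
    2: [0.0, 0.0, 0.0, 0.6, 0.4],
    4: [0.0, 0.0, 0.0, 1.0, 0.0],
}
beams = beam_search(lambda prev: PROBS[prev], start_token=1, end_token=3, beam_width=2)
best_log_prob, best_tokens, _ = beams[0]
print(best_tokens, math.exp(best_log_prob))
```

In this toy table a greedy decoder would start with token 2 (probability 0.6) and end with a total probability of 0.36, while the beam keeps the lower-scoring first step alive and finds the sequence through token 4 with probability 0.4.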

In [141]:
decoder.fc.get_config()

{'activation': 'linear',
 'activity_regularizer': None,
 'bias_constraint': None,
 'bias_initializer': {'class_name': 'Zeros', 'config': {}},
 'bias_regularizer': None,
 'dtype': 'float32',
 'kernel_constraint': None,
 'kernel_initializer': {'class_name': 'GlorotUniform',
  'config': {'seed': None}},
 'kernel_regularizer': None,
 'name': 'dense_6',
 'trainable': True,
 'units': 48,
 'use_bias': True}

In [157]:
for (_, (inp, targ) )  in enumerate(train_dataset.take(64)):

  enc_start_state = [[tf.zeros((64, units)),tf.zeros((64, units))]]*1

  enc_output, enc_state= encoder(inp , enc_start_state)


  dec_input = targ[ : , :-1 ] # Ignore <end> token
  real = targ[ : , 1: ]         # ignore <start> token

      # Set the AttentionMechanism object with encoder_outputs
  decoder.attention_mechanism.setup_memory(enc_output)

  # Create AttentionWrapperState as initial_state for decoder
  decoder_initial_state = decoder.build_initial_state(64, tuple(enc_state) ,tf.float32)
  pred = decoder(dec_input, decoder_initial_state)


In [59]:
# get html element
def cstr(s, color='black'):
	if s == ' ':
    
		return "<text style=color:#000;padding-left:10px;background-color:{}> </text>".format(color, s)
	else:

		return "<text style=color:#000;background-color:{}>{} </text>".format(color, s)
	
# print html
def print_color(t):
	display(html_print(''.join([cstr(ti, color=ci) for ti,ci in t])))

# get appropriate color for value
def get_clr(value):
	colors = ['#85c2e1', '#89c4e2', '#95cae5', '#99cce6', '#a1d0e8',
		'#b2d9ec', '#baddee', '#c2e1f0', '#eff7fb', '#f9e8e8',
		'#f9e8e8', '#f9d4d4', '#f9bdbd', '#f8a8a8', '#f68f8f',
		'#f47676', '#f45f5f', '#f34343', '#f33b3b', '#f42e2e']
	value = int((value * 18) )
	#print("color value",value)
	return colors[value]

# sigmoid function
def sigmoid(x):
	z = 1/(1 + np.exp(-x)) 
	return z


In [62]:
def visualize(output_values, result_list, cell_no, predicted_char):
    """Print each item of result_list shaded by output_values[i][cell_no]."""
    print(f"Importance of {predicted_char}")
    text_colours = []
    for i in range(len(result_list)):
        text = (result_list[i], get_clr(output_values[i][cell_no]))
        text_colours.append(text)
    print_color(text_colours)

In [63]:
visualize([[0.1,0.9,0.9]],['a'],2,'q')

Importance of q


In [71]:
from sklearn.preprocessing import MinMaxScaler

def translate(sentence):
  print(sentence)
  result, output = evaluate_sentence(sentence)
  print("-"*80)
  print(result[0])
  word_list = "".join(targ_lang.sequences_to_texts(result[:1])).split(" ")
  print('Input: %s' % (sentence))
  print('Predicted translation: {}'.format(word_list))
  print("word_list", word_list)
  print("result ", result[0])
  # For every decoding step, squash the scores of the predicted characters into (0, 1)
  output_values = []
  for time_step in output.rnn_output[0]:
    step = []
    for char_index in list(result)[0]:
      step.append(sigmoid(time_step[char_index]))
    output_values.append(step)
  # Transpose so rows are predicted characters and columns are decoding steps,
  # then rescale each column to [0, 1]
  output_values = np.array(output_values).transpose()
  output_values = MinMaxScaler().fit_transform(output_values)
  # Show one coloured line per predicted character (the trailing '\n' is skipped)
  for i, char in enumerate(word_list[:-1]):
    visualize(output_values[:i+1], word_list[:i+1], i, char)
  return output.rnn_output

tx = translate('bbjkal')

bbjkal
from evaluate bbjkal
--------------------------------------------------------------------------------
[ 2 11  7 26  4  3]
Input: bbjkal
Predicted translation: ['अ', 'ज', '्', 'ब', 'ा', '\n']
word_list ['अ', 'ज', '्', 'ब', 'ा', '\n']
result  [ 2 11  7 26  4  3]
Importance of अ


Importance of ज


Importance of ्


Importance of ब


Importance of ा
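
`translate` rescales its score matrix with `MinMaxScaler`, which by default maps each column independently onto [0, 1] using (x - min) / (max - min). Here is a minimal pure-Python sketch of that column-wise rule. The function name `min_max_columns` and the toy numbers are made up for illustration.

```python
def min_max_columns(rows):
    """Rescale every column of a 2-D list to [0, 1] independently."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    # A constant column has zero range; dividing by 1 maps it to all zeros.
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(x - lo) / span for x, lo, span in zip(row, lows, spans)]
            for row in rows]

scores = [[1, 10],
          [2, 30],
          [3, 20]]
scaled = min_max_columns(scores)
for row in scaled:
    print(row)
```

Because the scaling is per column, a value is only comparable with the other values in its own column, which is worth keeping in mind when reading the colours that `visualize` produces.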


In [118]:
targ_lang.word_index

{'\t': 1,
 '\n': 3,
 'ं': 5,
 'ः': 46,
 'अ': 2,
 'ई': 39,
 'उ': 45,
 'क': 9,
 'ख': 37,
 'ग': 10,
 'घ': 44,
 'च': 31,
 'छ': 34,
 'ज': 11,
 'ञ': 38,
 'ट': 23,
 'ठ': 40,
 'ड': 30,
 'ण': 35,
 'त': 8,
 'द': 18,
 'ध': 28,
 'न': 14,
 'प': 32,
 'ब': 26,
 'भ': 43,
 'म': 21,
 'य': 13,
 'र': 6,
 'ल': 17,
 'व': 19,
 'श': 22,
 'ष': 24,
 'स': 27,
 'ह': 36,
 '़': 33,
 'ा': 4,
 'ि': 15,
 'ी': 12,
 'ु': 29,
 'ू': 25,
 'ृ': 41,
 'े': 16,
 'ै': 42,
 'ॉ': 47,
 'ो': 20,
 '्': 7}
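
Inverting this index gives the mapping needed to turn predicted ids back into characters. The sketch below copies only the entries required to decode the ids `[2, 11, 7, 26, 4, 3]` printed by `translate('bbjkal')` above. Treating `'\t'` and `'\n'` as start and end markers is an assumption based on how they appear in that output.

```python
# Subset of targ_lang.word_index, copied from the output above.
word_index = {'\t': 1, 'अ': 2, '\n': 3, 'ा': 4, '्': 7, 'ज': 11, 'ब': 26}

# Invert the mapping: id -> character.
index_word = {idx: ch for ch, idx in word_index.items()}

predicted = [2, 11, 7, 26, 4, 3]
chars = [index_word[i] for i in predicted]

# Drop the assumed start/end markers to get the transliterated word.
word = ''.join(ch for ch in chars if ch not in ('\t', '\n'))
print(chars)
print(word)
```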

In [None]:
from sklearn.preprocessing import MinMaxScaler

def get_connectivity(word):
  print("Input word : ", word)
  inputs = [inp_lang.word_index[i] for i in word]
  inputs = [inputs for _ in range(64)]
  inputs = tf.keras.preprocessing.sequence.pad_sequences(inputs,
                                                          maxlen=max_length_input,
                                                          padding='post')
  inputs = tf.convert_to_tensor(inputs)
  print(inputs)

  # Zero initial state for the encoder (batch size 64)
  enc_start_state = [[tf.zeros((64, units)), tf.zeros((64, units))]]

  # Encode the padded input word built above
  enc_output, enc_state = encoder(inputs, enc_start_state)

  # `targ` is a global here, presumably the last batch from the loop over train_dataset
  dec_input = targ[:, :-1]  # decoder input: drop the last position
  real = targ[:, 1:]        # expected output: drop the <start> token

  # Set the AttentionMechanism object with encoder_outputs
  decoder.attention_mechanism.setup_memory(enc_output)

  # Create AttentionWrapperState as initial_state for decoder
  decoder_initial_state = decoder.build_initial_state(64, tuple(enc_state), tf.float32)
  pred = decoder(dec_input, decoder_initial_state)
  
  output = s2s.call(enc_inp, dec_input)
  temp_list = []
  #for i in range(len(index_list)):
  input_char_list = list(word)
  first_prediction = output[0].rnn_output[0]
  # Most likely character id at every time step of the first example
  pred_char_index = np.argmax(output[0].rnn_output[0], axis=1)
  scaler = MinMaxScaler()
  for i, pred_char in enumerate(index_list):
    # Collect, for each time step, the scores of the predicted characters
    output_values = []
    for time_step in first_prediction:
      prob = []
      for index in pred_char_index:
        prob.append(time_step[index].numpy())
      output_values.append(prob)
    # Rescale each column to [0, 1]
    output_values = scaler.fit_transform(output_values)
    out_char_list = list(idx_to_word(pred_char_index))

    temp_list.append(idx_to_word(pred_char_index))

    visualize(output_values, input_char_list[:i], i, out_char_list[i])
  pred_word = "".join(out_char_list)
  print(f"\nTransliteration of {word[:-1]} is {pred_word[:i]}")

get_connectivity("ande")

In [None]:
beam_translate(u'¿todavia estan en casa?')

beam_with * [batch_size, max_length_input, rnn_units] :  3 * [1, 16, 1024]] : (3, 16, 1024)
(1, 3, 7) (1, 3, 7)
(3, 7) (3, 7)
Input: ¿todavia estan en casa?
1 Predicted translation: are you still home ?   -4.036754131317139
2 Predicted translation: are you still at home ?   -15.306867599487305
3 Predicted translation: are you still go home ?   -20.533388137817383
