**Importing Dependencies**

In [1]:
import tensorflow as tf

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from sklearn.model_selection import train_test_split

import unicodedata
import re
import numpy as np
import os
import io
import time

**Download the data from tensorflow**

In [2]:
!wget http://storage.googleapis.com/download.tensorflow.org/data/spa-eng.zip

--2022-06-27 04:35:09--  http://storage.googleapis.com/download.tensorflow.org/data/spa-eng.zip
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.120.128, 142.251.161.128, 74.125.126.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.120.128|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2638744 (2.5M) [application/zip]
Saving to: ‘spa-eng.zip’


2022-06-27 04:35:09 (62.4 MB/s) - ‘spa-eng.zip’ saved [2638744/2638744]



**The "ls" command lets us see the content of our current working directory.**


In [3]:
ls

[0m[01;34msample_data[0m/  spa-eng.zip


In [4]:
!unzip spa-eng.zip

Archive:  spa-eng.zip
   creating: spa-eng/
  inflating: spa-eng/_about.txt      
  inflating: spa-eng/spa.txt         


In [5]:
ls

[0m[01;34msample_data[0m/  [01;34mspa-eng[0m/  spa-eng.zip


In [6]:
cd spa-eng

/content/spa-eng


In [7]:
ls

_about.txt  spa.txt


**Each line of spa.txt contains an English sentence and its Spanish translation, separated by a tab.**

In [8]:
!cat spa.txt

Streaming output truncated to the last 5000 lines.
Will you please help me?	¿Me ayudarías, por favor?
Will you turn on the TV?	¿Puedes prender el televisor?
Winners don't use drugs.	Los ganadores no consumen drogas.
Wires carry electricity.	Los cables transmiten electricidad.
Won't you have some tea?	¿Qué tal un té?
Workers lost their jobs.	Los trabajadores perdieron sus empleos.
Working alone is no fun.	Trabajar solo no es divertido.
Would you draw me a map?	¿Me trazarías un mapa?
Would you like a banana?	¿Te gustaría un plátano?
Would you like a banana?	¿Te gustaría un banano?
Would you like a banana?	¿Te gustaría una banana?
Would you like any more?	¿Quieres más?
Would you like any more?	¿Quieres un poco más?
Would you like to dance?	¿Quieres bailar?
Would you like to dance?	¿Quiere bailar?
Would you like to dance?	¿Queréis bailar?
Would you like to order?	¿Desea ordenar?
Would you stop babbling?	¿Dejarás de chismorrear?
Would you wait a second?	¿Podrías esperar un seg

[... output truncated by the notebook server: IOPub data rate exceeded ...]



e sí, cada familia infeliz lo es a su propio modo.
American politics are interesting to watch, especially during a presidential election.	La política americana es interesante de ver, sobre todo durante elecciones presidenciales.
Democracy is the worst form of government, except all the others that have been tried.	La democracia es la peor forma de gobierno, con excepción de todas las otras que se ha probado.
He said that he could smell something burning and that the telephones weren't working.	Él dijo que pudo oler algo quemándose y que los teléfonos no funcionaban.
I don't need to sound like a native speaker, I just want to be able to speak fluently.	No necesito sonar como un hablante nativo, sólo quiero ser capaz de hablar con fluidez.
I don't understand and I'm not used to not understanding. Please explain it once more.	No entiendo, y no estoy acostumbrado a no entender. Por favor explíquelo una vez más.
I heard that Tom had been smuggling drugs into America for years before he got 

In [9]:
cd ..

/content


**User defined function to preprocess the data.**

**It takes a sentence and lowercases it with Python's built-in `.lower()` string method.**

**It then calls `.strip()`, which removes leading and trailing whitespace (whitespace is what `.strip()` removes by default).**

In [10]:
# Lowercases and strips a sentence, then wraps it in <start> / <end> tokens
def preprocess_sentence(w):
  w = w.lower().strip()

  # adding a start and an end token to the sentence
  # so that the model knows when to start and stop predicting.
  w = '<start> ' + w + ' <end>'
  return w

**When a sentence is passed through the function above, it returns a lowercased, stripped sentence wrapped in `<start>` and `<end>` tokens.**

In [11]:
en_sentence = "I go home right after work.	"
sp_sentence = "Voy a casa inmediatamente después del trabajo."
print(preprocess_sentence(en_sentence))
print(preprocess_sentence(sp_sentence))

<start> i go home right after work. <end>
<start> voy a casa inmediatamente después del trabajo. <end>
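
**The output above still contains the accent in "después", so `preprocess_sentence` does not strip accents. If accent removal is wanted, one possible sketch uses the already-imported `unicodedata` module. The helper name `unicode_to_ascii` is an assumption for illustration and is not part of this notebook's code.**

```python
import unicodedata

def unicode_to_ascii(s):
    # Decompose accented characters (NFD), then drop the combining marks
    # (Unicode category 'Mn'), which leaves the base letters behind.
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')

print(unicode_to_ascii("voy a casa inmediatamente después del trabajo."))
```

**Note that this also turns "ñ" into "n", which may or may not be desirable for Spanish text.**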


**The `create_dataset` function builds the dataset. It takes the file path and the number of examples to use (`num_examples`) as input arguments.**

**Steps involved:**

**Read the file and split it into lines**

**Clean the sentences with `preprocess_sentence` (as defined above, it does not remove accents)**

**Create sentence pairs**

**The sentence pairs are created with a list comprehension that pre-processes both tab-separated sentences on each of the first `num_examples` lines.**

**The function returns `zip(*word_pairs)`, an iterator of tuples in which the first items of all pairs are grouped together (the English sentences) and the second items of all pairs are grouped together (the Spanish sentences).**

In [15]:
# 1. Read the tab-separated file
# 2. Clean the sentences with preprocess_sentence (accents are kept)
# 3. Return sentence pairs in the format: (ENGLISH, SPANISH)
def create_dataset(path, num_examples):
  # the context manager closes the file once it has been read
  with io.open(path, encoding='UTF-8') as f:
    lines = f.read().strip().split('\n')

  word_pairs = [[preprocess_sentence(w) for w in l.split('\t')] for l in lines[:num_examples]]

  return zip(*word_pairs)

In [16]:
path_to_file = 'spa-eng/spa.txt'
en, sp = create_dataset(path_to_file, 3000)
print(en[-1])
print(sp[-1])

<start> i'm very hot. <end>
<start> estoy muy cachondo. <end>
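
**To see why `en, sp = create_dataset(...)` yields two tuples, here is a minimal sketch of the `zip(*...)` transposition on two hand-written pairs. The toy sentences are assumptions, not lines read from spa.txt.**

```python
# Each inner list is one [ENGLISH, SPANISH] pair, as built inside create_dataset.
toy_pairs = [
    ["<start> go. <end>", "<start> ve. <end>"],
    ["<start> hi. <end>", "<start> hola. <end>"],
]

# zip(*toy_pairs) groups the first items together and the second items together.
en_toy, sp_toy = zip(*toy_pairs)

print(en_toy)
print(sp_toy)
```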


**The `tokenize` function tokenizes the data. It takes the sentences of one language as its input argument.**

**It uses `tf.keras.preprocessing.text.Tokenizer(filters='')`, so no characters are filtered out and punctuation stays attached to the words.**

**`lang_tokenizer.texts_to_sequences(lang)` transforms each text into a sequence of integers, and `pad_sequences` then ensures that all sequences have the same length. Because `padding='post'` is passed here, zeros are appended to the end of each sequence until it is as long as the longest sequence (the default, `'pre'`, would pad at the beginning instead).**

In [19]:
def tokenize(lang):
  lang_tokenizer = tf.keras.preprocessing.text.Tokenizer(filters='')
  lang_tokenizer.fit_on_texts(lang)

  tensor = lang_tokenizer.texts_to_sequences(lang)

  tensor = tf.keras.preprocessing.sequence.pad_sequences(tensor, padding='post')

  return tensor, lang_tokenizer
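
**The following is a simplified pure-Python stand-in for what `tokenize` produces, meant only to illustrate the idea of integer encoding followed by post-padding. It is not Keras's implementation; `toy_tokenize` and the two sentences are assumptions for illustration.**

```python
from collections import Counter

def toy_tokenize(sentences):
    # Count words, splitting on spaces only (no characters are filtered out).
    counts = Counter(word for s in sentences for word in s.split(' '))
    # More frequent words get smaller indices; index 0 is reserved for padding.
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    word_index = {word: i + 1 for i, (word, _) in enumerate(ordered)}
    sequences = [[word_index[w] for w in s.split(' ')] for s in sentences]
    # Post-padding: append zeros until every sequence has the maximum length.
    max_len = max(len(seq) for seq in sequences)
    padded = [seq + [0] * (max_len - len(seq)) for seq in sequences]
    return padded, word_index

toy_sentences = ["<start> él miente. <end>", "<start> voy a casa. <end>"]
toy_tensor, toy_index = toy_tokenize(toy_sentences)
print(toy_tensor)
```

**Because `<start>` and `<end>` occur in every sentence, they receive the smallest indices in this sketch.**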

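**The `load_dataset` function takes the path and the number of examples as input and returns:**

**input_tensor**

**target_tensor**

**inp_lang_tokenizer**

**targ_lang_tokenizer**

**by tokenizing both the input language and the target language. Because `create_dataset` returns (English, Spanish), the unpacking in this function makes Spanish the input language and English the target language.**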

In [20]:
def load_dataset(path, num_examples=None):
  # creating cleaned input, output pairs
  targ_lang, inp_lang = create_dataset(path, num_examples)

  input_tensor, inp_lang_tokenizer = tokenize(inp_lang)
  target_tensor, targ_lang_tokenizer = tokenize(targ_lang)

  return input_tensor, target_tensor, inp_lang_tokenizer, targ_lang_tokenizer

**Calculating Maximum length of input tensor and target tensor**

In [21]:
num_examples = 3000
input_tensor, target_tensor, inp_lang, targ_lang = load_dataset(path_to_file, num_examples)

# Calculate max_length of the target and input tensors
max_length_targ, max_length_inp = target_tensor.shape[1], input_tensor.shape[1]

**Maximum length of target tensor is 6, whereas maximum length of input tensor is 9.**

In [23]:
target_tensor.shape[1], input_tensor.shape[1]

(6, 9)

**Splitting the data into training and validation sets with an 80:20 ratio.**

**Printing the sizes of the resulting splits.**

In [24]:
# Creating training and validation sets using an 80-20 split
input_tensor_train, input_tensor_val, target_tensor_train, target_tensor_val = train_test_split(input_tensor, target_tensor, test_size=0.2)

print(len(input_tensor_train), len(target_tensor_train), len(input_tensor_val), len(target_tensor_val))

2400 2400 600 600


**The `convert` function takes a language tokenizer and a tensor (one encoded sentence) and prints the index-to-word mapping for every non-padding token. It is used below for both the input and the target language.**

In [25]:
def convert(lang, tensor):
  for t in tensor:
    if t!=0:
      print ("%d ----> %s" % (t, lang.index_word[t]))

In [26]:
print ("Input Language; index to word mapping")
convert(inp_lang, input_tensor_train[1])
print ()
print ("Target Language; index to word mapping")
convert(targ_lang, target_tensor_train[1])

Input Language; index to word mapping
1 ----> <start>
13 ----> él
317 ----> miente.
2 ----> <end>

Target Language; index to word mapping
1 ----> <start>
41 ----> he's
183 ----> lying.
2 ----> <end>


**Initialising buffer size, batch size, steps per epoch, embedding dimension, number of GRU units, input vocabulary size (vocab_inp_size) and target vocabulary size (vocab_tar_size).**

**Creating the dataset with `tf.data.Dataset.from_tensor_slices()`, which slices the given tensors along their first dimension so that each dataset element is one (input, target) pair. The dataset is then shuffled and batched.**

In [27]:
BUFFER_SIZE = len(input_tensor_train)
BATCH_SIZE = 64
steps_per_epoch = len(input_tensor_train)//BATCH_SIZE
embedding_dim = 256
units = 1024
vocab_inp_size = len(inp_lang.word_index)+1  
vocab_tar_size = len(targ_lang.word_index)+1

dataset = tf.data.Dataset.from_tensor_slices((input_tensor_train, target_tensor_train)).shuffle(BUFFER_SIZE)
dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)

**Calculating steps per epoch as the integer quotient of the number of training examples and the batch size.**

In [31]:
len(input_tensor_train)
BATCH_SIZE = 64
steps_per_epoch = len(input_tensor_train)//BATCH_SIZE
steps_per_epoch

37
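
**As a quick check of the arithmetic, using the 2400 training examples and the batch size of 64 reported in this notebook (the variable names below are illustrative):**

```python
num_train_examples = 2400   # size of the training split printed earlier
batch_size = 64

# Integer division gives the number of full batches per epoch.
full_batches = num_train_examples // batch_size
# With drop_remainder=True, the examples that do not fill a batch are dropped.
leftover_examples = num_train_examples % batch_size

print(full_batches, leftover_examples)
```

**So 32 examples are left out of each epoch; because the dataset is shuffled, which examples are dropped can differ between epochs.**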

**The example input batch and target batch have shapes (64, 9) and (64, 6).**

**Since all sequences are padded to the maximum input and target lengths (9 and 6), every batch has these sequence lengths, together with the batch size of 64 set earlier.**

In [32]:
example_input_batch, example_target_batch = next(iter(dataset))
example_input_batch.shape, example_target_batch.shape

(TensorShape([64, 9]), TensorShape([64, 6]))

**The encoder-decoder model is a way of using recurrent neural networks for sequence-to-sequence prediction problems.**

**It was initially developed for machine translation problems, although it has proven successful at related sequence-to-sequence prediction problems such as text summarization and question answering.**

**The approach involves two recurrent neural networks, one to encode the input sequence, called the "Encoder", and a second to decode the encoded input sequence into the target sequence called the "Decoder".**

**The type of RNN we use in our case is the GRU (Gated Recurrent Unit).**

*** 

**We define three methods in our Encoder class:**

**The first is the constructor, where we store the batch size and the number of encoding units and build the embedding and GRU layers from the vocabulary size (vocab_size), embedding dimension and encoding units.**

**The second is the `call` method, which creates embeddings for the given input, passes them through the GRU and returns the output and state.**

**The last, `initialize_hidden_state`, creates a zeros matrix of shape (batch size, encoding units).**





In [33]:
class Encoder(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):
    super(Encoder, self).__init__()
    self.batch_sz = batch_sz
    self.enc_units = enc_units
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    self.gru = tf.keras.layers.GRU(self.enc_units, return_sequences=True, return_state=True)

  def call(self, x, hidden):
    x = self.embedding(x)
    output, state = self.gru(x, initial_state = hidden)
    return output, state

  def initialize_hidden_state(self):
    return tf.zeros((self.batch_sz, self.enc_units))

**Creating an object "encoder" from the class "Encoder", which takes the arguments defined in the constructor.**

**Creating sample output & hidden state and printing their shape.**



In [35]:
encoder = Encoder(vocab_inp_size, embedding_dim, units, BATCH_SIZE)

# sample input
sample_hidden = encoder.initialize_hidden_state()
sample_output, sample_hidden = encoder(example_input_batch, sample_hidden)
print ('Encoder output shape: (batch size, sequence length, units) {}'.format(sample_output.shape))
print ('Encoder Hidden state shape: (batch size, units) {}'.format(sample_hidden.shape))

Encoder output shape: (batch size, sequence length, units) (64, 9, 1024)
Encoder Hidden state shape: (batch size, units) (64, 1024)


**Creating a class "BahdanauAttention".**

**Bahdanau Attention is also known as Additive attention as it performs a linear combination of encoder states and the decoder states.**

****

**We define the constructor to initialize the layers; in this case we create the dense layers used by the Bahdanau attention mechanism.**


**Then we define a function to calculate the alignment vector. First we calculate the alignment scores between the previous decoder hidden state and each of the encoder's hidden states. The alignment scores for all encoder hidden states are combined into a single vector and then passed through a softmax. The resulting alignment vector has the same length as the source sequence, and each of its values is the score (or probability) of the corresponding word within the source sequence. Alignment vectors put weights on the encoder's output. With those weights, the decoder decides what to focus on at each time step.**

**The encoder hidden states are multiplied by their respective alignment scores (the attention weights) and summed to form the context vector. The context vector is used to compute the final output of the decoder.**
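
**Written as equations, the computation described above can be sketched as follows. The symbols are assumptions chosen for illustration: $h_{t-1}$ is the previous decoder hidden state (the query), $\bar{h}_s$ is the encoder output at source position $s$ (the values), and $W_1$, $W_2$ and $v$ stand for the three dense layers, with their bias terms omitted.**

```latex
\begin{aligned}
e_{t,s} &= v^{\top} \tanh\left(W_1 h_{t-1} + W_2 \bar{h}_s\right) \\
\alpha_{t,s} &= \frac{\exp(e_{t,s})}{\sum_{s'} \exp(e_{t,s'})} \\
c_t &= \sum_{s} \alpha_{t,s}\, \bar{h}_s
\end{aligned}
```

**Here $e_{t,s}$ is the alignment score, $\alpha_{t,s}$ the attention weight after the softmax over source positions, and $c_t$ the context vector.**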



In [36]:
class BahdanauAttention(tf.keras.layers.Layer):
  def __init__(self, units):
    super(BahdanauAttention, self).__init__()
    self.W1 = tf.keras.layers.Dense(units)
    self.W2 = tf.keras.layers.Dense(units)
    self.V = tf.keras.layers.Dense(1)

  def call(self, query, values):
    # we are doing this to broadcast addition along the time axis to calculate the score
    query_with_time_axis = tf.expand_dims(query, 1)

    # score shape == (batch_size, max_length, 1)
    # we get 1 at the last axis because we are applying score to self.V
    # the shape of the tensor before applying self.V is (batch_size, max_length, units)
    score = self.V(tf.nn.tanh(self.W1(query_with_time_axis) + self.W2(values)))
    #score = np.matmul(self.query_with_time_axis, self.W1(values))
    # attention_weights shape == (batch_size, max_length, 1)
    attention_weights = tf.nn.softmax(score, axis=1)

    # context_vector shape after sum == (batch_size, hidden_size)
    context_vector = attention_weights * values
    context_vector = tf.reduce_sum(context_vector, axis=1)

    return context_vector, attention_weights

**Creating an object `attention_layer` from the class `BahdanauAttention` with 10 units in its dense layers, computing `attention_result` and `attention_weights` by calling the layer on `sample_hidden` and `sample_output`, and printing the shapes of both results.**

In [37]:
attention_layer = BahdanauAttention(10)
attention_result, attention_weights = attention_layer(sample_hidden, sample_output)

print("Attention result shape: (batch size, units) {}".format(attention_result.shape))
print("Attention weights shape: (batch_size, sequence_length, 1) {}".format(attention_weights.shape))

Attention result shape: (batch size, units) (64, 1024)
Attention weights shape: (batch_size, sequence_length, 1) (64, 9, 1)
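
**The softmax and weighted-sum steps of the attention layer can be traced by hand on a tiny example. This is a pure-Python sketch with made-up scores and encoder states (3 source positions, hidden size 2); it leaves out the dense layers that produce the scores in `BahdanauAttention`.**

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context_vector(scores, encoder_states):
    # One weight per source position; the weights sum to 1.
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Weighted sum of the encoder states over the time axis.
    context = [sum(w * state[d] for w, state in zip(weights, encoder_states))
               for d in range(dim)]
    return context, weights

# Toy values (assumptions): equal scores give equal attention to all positions.
toy_scores = [0.0, 0.0, 0.0]
toy_states = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
toy_context, toy_weights = context_vector(toy_scores, toy_states)
print(toy_weights)
print(toy_context)
```

**With equal scores the context vector is simply the average of the encoder states; unequal scores would pull it toward the higher-scoring positions.**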


**Creating a class "Decoder".**

**Decoder Architecture:**

**The context vector obtained in the previous step is concatenated with the embedding of the previous decoder output and fed into the decoder RNN cell to produce a new hidden state. The output for the time step is obtained by passing the new hidden state through a linear layer, which acts as a classifier and gives the scores (logits) of the next predicted word. This process repeats for each time step of the decoder until an `<end>` token is produced or the output exceeds the specified maximum length.**

In [38]:
class Decoder(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz):
    super(Decoder, self).__init__()
    self.batch_sz = batch_sz
    self.dec_units = dec_units
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    self.gru = tf.keras.layers.GRU(self.dec_units, return_sequences=True, return_state=True)
    self.fc = tf.keras.layers.Dense(vocab_size)

    # used for attention
    self.attention = BahdanauAttention(self.dec_units)

  def call(self, x, hidden, enc_output):
    # enc_output shape == (batch_size, max_length, hidden_size)
    context_vector, attention_weights = self.attention(hidden, enc_output)

    # x shape after passing through embedding == (batch_size, 1, embedding_dim)
    x = self.embedding(x)

    # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
    x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)

    # passing the concatenated vector to the GRU
    output, state = self.gru(x)

    # output shape == (batch_size * 1, hidden_size)
    output = tf.reshape(output, (-1, output.shape[2]))

    # output shape == (batch_size, vocab)
    x = self.fc(output)

    return x, state, attention_weights

**Creating an object `decoder` from the class `Decoder`, which takes the target vocabulary size, embedding dimension, number of units and batch size as inputs, and calculating a sample decoder output by calling it with a random input batch, `sample_hidden` and `sample_output`.**

In [39]:
decoder = Decoder(vocab_tar_size, embedding_dim, units, BATCH_SIZE)

sample_decoder_output, _, _ = decoder(tf.random.uniform((BATCH_SIZE, 1)), sample_hidden, sample_output)

print ('Decoder output shape: (batch_size, vocab size) {}'.format(sample_decoder_output.shape))

Decoder output shape: (batch_size, vocab size) (64, 1232)
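
**The stopping rule from the decoder description (stop at the end token or at the maximum length) can be sketched as a greedy decoding loop. Everything here is hypothetical: `greedy_decode`, `step_fn` and the toy transition table stand in for a trained decoder and are not part of this notebook.**

```python
def greedy_decode(step_fn, start_id, end_id, max_length):
    # step_fn maps the previous token id to the next predicted token id.
    tokens = []
    current = start_id
    for _ in range(max_length):
        current = step_fn(current)
        if current == end_id:
            break  # the end token stops decoding and is not emitted
        tokens.append(current)
    return tokens

# Toy stand-in for a decoder: a fixed lookup from one token id to the next.
toy_transitions = {1: 41, 41: 183, 183: 2}
decoded = greedy_decode(lambda tok: toy_transitions[tok],
                        start_id=1, end_id=2, max_length=6)
print(decoded)
```

**A real inference loop would also carry the decoder hidden state and the encoder output from step to step, which this sketch leaves out.**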


**Define an optimizer and a loss function that masks padding positions and returns the mean loss.**

In [40]:
optimizer = tf.keras.optimizers.Adam()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction='none')

def loss_function(real, pred):
  mask = tf.math.logical_not(tf.math.equal(real, 0))
  loss_ = loss_object(real, pred)

  mask = tf.cast(mask, dtype=loss_.dtype)
  loss_ *= mask

  return tf.reduce_mean(loss_)
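
**To make the masking concrete, here is a pure-Python sketch of the same arithmetic on one time step of a toy batch. The target ids and per-token losses are made-up values, and `masked_mean_loss` is an illustrative name.**

```python
def masked_mean_loss(targets, per_token_loss):
    # The mask is 1.0 for real tokens and 0.0 for padding (index 0).
    mask = [0.0 if t == 0 else 1.0 for t in targets]
    masked = [loss * m for loss, m in zip(per_token_loss, mask)]
    # Like tf.reduce_mean, this divides by the number of positions,
    # padded positions included.
    return sum(masked) / len(masked)

toy_targets = [5, 7, 0, 0]          # two real tokens, two padding positions
toy_losses = [0.5, 1.0, 2.0, 3.0]   # unmasked per-example losses
toy_loss = masked_mean_loss(toy_targets, toy_losses)
print(toy_loss)
```

**The padded positions contribute zero loss but still count in the denominator, so the mean is smaller than an average over real tokens only would be.**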

**Training the Dataset using Encoder-Decoder Model.**

***

**Now, for training, we will implement the following:**

**Pass the input and the initial hidden state through the encoder, which returns the encoder output sequence and the encoder hidden state. The encoder hidden state, encoder output and decoder input are passed to the decoder. At the first time step, the decoder takes `<start>` as its input.**


**The decoder returns its hidden state and the predictions for the next word. We use teacher forcing for training, where we pass the actual target word to the decoder as the next input at each time step. Then we compute the gradients of the loss by backpropagation and apply them with the optimizer.**

In [41]:
@tf.function
def train_step(inp, targ, enc_hidden):
  loss = 0

  with tf.GradientTape() as tape:
    enc_output, enc_hidden = encoder(inp, enc_hidden)

    dec_hidden = enc_hidden

    dec_input = tf.expand_dims([targ_lang.word_index['<start>']] * BATCH_SIZE, 1)

    # Teacher forcing - feeding the target as the next input
    for t in range(1, targ.shape[1]):
      # passing enc_output to the decoder
      predictions, dec_hidden, _ = decoder(dec_input, dec_hidden, enc_output)

      loss += loss_function(targ[:, t], predictions)

      # using teacher forcing
      dec_input = tf.expand_dims(targ[:, t], 1)

  batch_loss = (loss / int(targ.shape[1]))

  variables = encoder.trainable_variables + decoder.trainable_variables

  gradients = tape.gradient(loss, variables)

  optimizer.apply_gradients(zip(gradients, variables))

  return batch_loss
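
**The teacher-forcing loop in `train_step` pairs each decoder input with the token it should predict. The sketch below lists those pairs for a single toy target sequence; the ids reuse the mapping printed earlier (1 for `<start>`, 41 for "he's", 183 for "lying.", 2 for `<end>`), followed by padding, and `teacher_forcing_pairs` is an illustrative name.**

```python
def teacher_forcing_pairs(target_ids):
    # Returns (decoder input, expected prediction) for each time step t >= 1.
    pairs = []
    dec_input = target_ids[0]            # the <start> token
    for t in range(1, len(target_ids)):
        pairs.append((dec_input, target_ids[t]))
        # Teacher forcing: the true token, not the prediction, is fed next.
        dec_input = target_ids[t]
    return pairs

toy_target = [1, 41, 183, 2, 0, 0]
toy_steps = teacher_forcing_pairs(toy_target)
print(toy_steps)
```

**The last two pairs have a padding target of 0, so the mask in `loss_function` zeroes out their loss terms.**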

**Training with Multiple Epochs.**

***

**Printing the loss and the training time for each epoch.**

In [42]:
EPOCHS = 2

for epoch in range(EPOCHS):
  start = time.time()

  enc_hidden = encoder.initialize_hidden_state()
  total_loss = 0

  for (batch, (inp, targ)) in enumerate(dataset.take(steps_per_epoch)):
    batch_loss = train_step(inp, targ, enc_hidden)
    total_loss += batch_loss

    # with 37 steps per epoch, this only prints the loss of batch 0
    if batch % 100 == 0:
      print('Epoch {} Batch {} Loss {:.4f}'.format(epoch + 1, batch, batch_loss.numpy()))

  print('Epoch {} Loss {:.4f}'.format(epoch + 1, total_loss / steps_per_epoch))
  print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

Epoch 1 Batch 0 Loss 4.2442
Epoch 1 Loss 3.0889
Time taken for 1 epoch 65.16304183006287 sec

Epoch 2 Batch 0 Loss 2.4954
Epoch 2 Loss 2.3497
Time taken for 1 epoch 55.8551230430603 sec

