# **Introduction**

The goal of this notebook is to translate an **English** sentence into its **Italian** counterpart. I will implement this task using **Attention**.

We could also have used a simple sequence-to-sequence **encoder-decoder** model, but such a model suffers from the infamous **bottleneck problem**, i.e., all the information in the input has to be encoded into one fixed-length vector.

The attention model solves this by allowing the network to **refer back to the input sequence**, instead of forcing it to encode all information into one fixed-length vector.


# **Model Architecture**

In this notebook, I will be using the **Additive Attention** model introduced by Dzmitry Bahdanau and co-authors. The model aimed to improve the sequence-to-sequence model in machine translation by **aligning each decoder step with the relevant parts of the input sentence.**
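For reference, additive attention can be written as below. The symbols are assumptions chosen here to mirror the code later in this notebook: $s_{t-1}$ is the previous decoder hidden state, $h_i$ is the encoder output at input position $i$, and $W_1$, $W_2$ and $v$ are learned weights (bias terms are omitted for brevity).

$$
e_{t,i} = v^\top \tanh\left(W_1 s_{t-1} + W_2 h_i\right), \qquad
\alpha_{t,i} = \frac{\exp(e_{t,i})}{\sum_{j} \exp(e_{t,j})}, \qquad
c_t = \sum_{i} \alpha_{t,i}\, h_i
$$

The context vector $c_t$ is a weighted average of all encoder outputs, so the decoder is no longer limited to a single fixed-length summary of the input.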

---



In [0]:
!ln -sf /opt/bin/nvidia-smi /usr/bin/nvidia-smi
import subprocess
print(subprocess.getoutput('nvidia-smi'))

Sun May 31 06:17:54 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   41C    P0    28W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No ru

Nice! I got a **P100 GPU with 16GB memory**. Thanks Google!

In [0]:
#Importing required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import re
import time
import io
import os
import pickle
import warnings
import operator
import gzip
import nltk
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
from nltk.stem import WordNetLemmatizer,PorterStemmer
from tqdm import tqdm
import plotly.graph_objects as go
%matplotlib inline

warnings.filterwarnings('ignore')



In [0]:
nltk.download('punkt')
nltk.download('wordnet')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Unzipping corpora/wordnet.zip.


True

In [0]:
#The dataset
!wget http://www.manythings.org/anki/ita-eng.zip

--2020-05-30 06:25:21--  http://www.manythings.org/anki/ita-eng.zip
Resolving www.manythings.org (www.manythings.org)... 172.67.173.198, 104.24.108.196, 104.24.109.196, ...
Connecting to www.manythings.org (www.manythings.org)|172.67.173.198|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7345811 (7.0M) [application/zip]
Saving to: ‘ita-eng.zip’


2020-05-30 06:25:22 (10.7 MB/s) - ‘ita-eng.zip’ saved [7345811/7345811]



In [0]:
!unzip ita-eng*.zip

Archive:  ita-eng.zip
  inflating: ita.txt                 
  inflating: _about.txt              


In [0]:
!ls
!pwd

_about.txt  ita-eng.zip  ita.txt  sample_data
/content


In [0]:
language = pd.read_table('ita.txt',names=['English','Italian','Attribution'])
language.head()
#data_path = 'ita.txt'

Unnamed: 0,English,Italian,Attribution
0,Hi.,Ciao!,CC-BY 2.0 (France) Attribution: tatoeba.org #5...
1,Run!,Corri!,CC-BY 2.0 (France) Attribution: tatoeba.org #9...
2,Run!,Corra!,CC-BY 2.0 (France) Attribution: tatoeba.org #9...
3,Run!,Correte!,CC-BY 2.0 (France) Attribution: tatoeba.org #9...
4,Who?,Chi?,CC-BY 2.0 (France) Attribution: tatoeba.org #2...


In [0]:
language.drop('Attribution',axis=1,inplace=True)
language.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 336614 entries, 0 to 336613
Data columns (total 2 columns):
 #   Column   Non-Null Count   Dtype 
---  ------   --------------   ----- 
 0   English  336614 non-null  object
 1   Italian  336614 non-null  object
dtypes: object(2)
memory usage: 5.1+ MB


In [0]:
print('The number of training examples are : {}'.format(language.shape[0]))

The number of training examples are : 336614


The number of training examples is **336614**. Training on the complete dataset will take a long time. To train faster, we can **limit the size of the dataset to 70,000 sentences**. We should keep in mind that **translation quality degrades with less data.**

In [0]:
value_count = language['English'].map(lambda x : len(x.split())).value_counts()
fig = go.FigureWidget(data=[go.Bar(x=value_count.index, y=value_count,
                                 marker={'color': value_count,
                                               'colorscale':'algae'})]) 

fig.update_layout(title_text="Value Counts of Number of words",xaxis_title = 'Number of words', yaxis_title = 'Value Count',font=dict(
        family="Courier New, monospace",
        size=18,
        color="#7f7f7f"
    ),title_font_size=30)

fig.show()

It is clear from the above **bar-plot** that the distribution of the data is **right-skewed.** 
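Because the distribution is right-skewed, a length cutoff removes only a small tail of very long sentences. In this notebook, keeping sentences with fewer than 18 words retains 336,200 of 336,614 rows, roughly 99.9%. The sketch below shows one way to check such a cutoff before applying it; the helper names and the toy sample are illustrative assumptions, not part of the notebook's pipeline.

```python
from collections import Counter

def length_histogram(sentences):
    """Count how many sentences have each whitespace-token length."""
    return Counter(len(s.split()) for s in sentences)

def coverage_below(sentences, max_words):
    """Fraction of sentences with fewer than `max_words` tokens."""
    kept = sum(1 for s in sentences if len(s.split()) < max_words)
    return kept / len(sentences)

# Toy sample taken from sentences shown earlier in the notebook
sample = ["Hi.", "Run!", "I need to call Tom.", "He has a good head on his shoulders."]
hist = length_histogram(sample)
share_kept = coverage_below(sample, max_words=6)
print(hist, share_kept)
```

On the real data the same check would be run on `language['English']` with the candidate cutoff.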

In [0]:
language = language[language['English'].map(lambda x: len(x.split()))<18]
language.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 336200 entries, 0 to 336425
Data columns (total 2 columns):
 #   Column   Non-Null Count   Dtype 
---  ------   --------------   ----- 
 0   English  336200 non-null  object
 1   Italian  336200 non-null  object
dtypes: object(2)
memory usage: 7.7+ MB


In [0]:
#Shuffling and decreasing the dataset
language = shuffle(language)
language = language[:70000]
language.head()

Unnamed: 0,English,Italian
166870,Tom is a freelance writer.,Tom è uno scrittore freelance.
149708,Stand back from the rope.,Stai lontana dalla corda.
281191,He has a good head on his shoulders.,Ha una buona testa sulle spalle.
58956,I need to call Tom.,Io devo chiamare Tom.
78693,Tom started working.,Tom iniziò a lavorare.


In [0]:
#Saving the dataset
with open('language_training_data.pickle','wb') as handle:
  pickle.dump(language,handle,protocol = pickle.HIGHEST_PROTOCOL)

In [0]:
#Loading the dataset
with open('language_training_data.pickle','rb') as handle:
  language_nmt = pickle.load(handle)

In [0]:
language_nmt.head()

Unnamed: 0,English,Italian
166870,Tom is a freelance writer.,Tom è uno scrittore freelance.
149708,Stand back from the rope.,Stai lontana dalla corda.
281191,He has a good head on his shoulders.,Ha una buona testa sulle spalle.
58956,I need to call Tom.,Io devo chiamare Tom.
78693,Tom started working.,Tom iniziò a lavorare.


In [0]:
language_nmt.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 70000 entries, 166870 to 179364
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   English  70000 non-null  object
 1   Italian  70000 non-null  object
dtypes: object(2)
memory usage: 1.6+ MB


# **Data Cleaning**

In [0]:
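def preprocess(text):
  #Lowercase and trim surrounding whitespace
  text = text.lower()
  text = text.strip()
  #Drop commas and selected punctuation, then turn full stops into spaces
  text = re.sub(r',','',text)
  text = re.sub(r'[!?$@#*]','',text)
  text = re.sub(r'\.',' ',text)
  #Wrap the sentence in the start and end markers used by the decoder
  text = '<start> ' + text + ' <end>'
  return text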

def clean_eng_words(text):
  text = re.sub("aren't", "are not",text)
  text = re.sub("can't","cannot",text)
  text = re.sub("don't","do not",text)
  text = re.sub("couldn't","could not",text)
  text = re.sub("doesn't","does not",text)
  text = re.sub("hadn't","had not",text)
  text = re.sub("where's","where is",text)
  text = re.sub("wouldn't","would not",text)
  text = re.sub("he'll","he will",text)
  text = re.sub("what've","what have",text)
  text = re.sub("who'd","who would",text)
  text = re.sub("haven't","have not",text)
  text = re.sub("who'll","who will",text)
  text = re.sub("i'll","i will",text)
  text = re.sub("i've","i have",text)
  text = re.sub("i'm","i am",text)
  text = re.sub("we're","we are",text)
  text = re.sub("he's","he is",text)
  text = re.sub("i'd","i would",text)
  text = re.sub("you'd","you would",text)
  text = re.sub("you'll","you will",text)
  text = re.sub("you're","you are",text)
  text = re.sub("you've","you have",text)
  text = re.sub("wasn't","was not",text)
  text = re.sub("that's","that is",text)
  text = re.sub("isn't","is not",text)
  text = re.sub("didn't","did not",text)
  text = re.sub("they've","they have",text)
  text = re.sub("they're","they are",text)
  text = re.sub("they'll","they will",text)
  text = re.sub("what's","what is",text)
  text = re.sub("what're","what are",text)
  text = re.sub("what'll","what will",text)
  text = re.sub("there's","there is",text)
  text = re.sub("it's","it is",text)
  text = re.sub("it'll","it will",text)
  text = re.sub("could've","could have",text)
  text = re.sub("shouldn't","should not",text)
  text = re.sub("should've","should have",text)
  text = re.sub("shan't","shall not",text)
  text = re.sub("won't","will not",text)
  text = re.sub("we'd","we would",text)
  text = re.sub("let's","let us",text)
  text = re.sub("that'll","that will",text)
  text = re.sub("weren't","were not",text)
  return text

language_nmt['English'] = language_nmt['English'].apply(lambda x: preprocess(x))
language_nmt['Italian'] = language_nmt['Italian'].apply(lambda x: preprocess(x))
language_nmt['English'] = language_nmt['English'].apply(lambda x: clean_eng_words(x))
language_nmt.head(10)
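The long chain of `re.sub` calls can also be expressed as a lookup table with a single compiled pattern, which makes duplicates easy to spot. The sketch below is an alternative under stated assumptions: `CONTRACTIONS` is a small hypothetical subset of the table, and `preprocess_sketch` is a re-statement of the cleaning step included only so the example is self-contained.

```python
import re

# Hypothetical subset of the contraction table; extend as needed.
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "don't": "do not",
    "i'm": "i am",
    "it's": "it is",
}

# One compiled alternation; longer keys come first so they are tried first.
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(CONTRACTIONS, key=len, reverse=True))
)

def expand_contractions(text):
    """Replace every known contraction in a single pass."""
    return _PATTERN.sub(lambda m: CONTRACTIONS[m.group(0)], text)

def preprocess_sketch(text):
    """Re-statement of the cleaning step, for demonstration only."""
    text = text.lower().strip()
    text = re.sub(r",", "", text)
    text = re.sub(r"[!?$@#*]", "", text)
    text = re.sub(r"\.", " ", text)
    return "<start> " + text + " <end>"

cleaned = expand_contractions(preprocess_sketch("I don't know, Tom!"))
print(cleaned)
```

Note that the input must be lowercased before expansion, since the table keys are lowercase.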

In [0]:
#Getting the embedding matrix
ps = PorterStemmer() 
lemmatizer = WordNetLemmatizer()

def get_embedding(filename,word_index,vocab_len,dim):
  embedding_index={}
  if filename.split('.')[-1] == 'vec':
    f = open(filename,encoding='utf-8')
    for line in f:
      values = line.split()
      word = values[0]
      coeff = np.array(values[1:],dtype='float32')
      embedding_index[word] = coeff
    f.close()
  elif filename.split('.')[-1] == 'gz':
    f = gzip.open(filename)
    for line in f:
      values = line.split()
      word = values[0].decode(encoding='utf-8')
      coeff = np.asarray(values[1:],dtype='float32')
      embedding_index[word] = coeff
    f.close()
  embedding_matrix = np.zeros((vocab_len+1,dim))
  for word,index in tqdm(word_index.items()):
    if index>vocab_len:
      continue
    embedding_vector = embedding_index.get(word)
    if word == '<oov>':
      embedding_vector = np.random.randn(dim,)*0.02
    #Fall back to the lemma, then to the stem, when the exact word has no vector
    if embedding_vector is None:
      embedding_vector = embedding_index.get(lemmatizer.lemmatize(word))
    if embedding_vector is None:
      embedding_vector = embedding_index.get(ps.stem(word))
    if embedding_vector is not None:
      embedding_matrix[index] = embedding_vector
    
  return embedding_matrix,embedding_index
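The lookup in `get_embedding` is meant to try the exact word first and then progressively more normalized forms (lemma, then stem). A minimal sketch of that fallback idea, independent of NLTK, might look like this; `lookup_with_fallbacks` and `strip_plural` are hypothetical helpers, and `strip_plural` is only a crude stand-in for a real lemmatizer.

```python
import numpy as np

def lookup_with_fallbacks(word, embedding_index, normalizers):
    """Return the first vector found for `word` or any normalized form of it."""
    vector = embedding_index.get(word)
    if vector is not None:
        return vector
    for normalize in normalizers:
        vector = embedding_index.get(normalize(word))
        if vector is not None:
            return vector
    return None  # caller keeps the zero row for this word

def strip_plural(word):
    # Crude stand-in for a lemmatizer: drop one trailing "s".
    return word[:-1] if word.endswith("s") else word

toy_index = {"dog": np.ones(3, dtype="float32")}
found = lookup_with_fallbacks("dogs", toy_index, [strip_plural])
missing = lookup_with_fallbacks("cats", toy_index, [strip_plural])
print(found, missing)
```

Words for which every lookup fails keep an all-zero row in the embedding matrix, so they start training with no pre-trained signal.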

In [0]:
#Tokenizing and padding the input and output corpus
def tokenize(corpus):
  lang_tokenizer = tf.keras.preprocessing.text.Tokenizer(filters='',oov_token='<oov>')
  lang_tokenizer.fit_on_texts(corpus)
  tensor = lang_tokenizer.texts_to_sequences(corpus)
  tensor = tf.keras.preprocessing.sequence.pad_sequences(tensor,padding='post')

  return tensor,lang_tokenizer

input_tensor , input_tokenizer = tokenize(np.asarray(language_nmt['English']))
output_tensor , output_tokenizer = tokenize(np.asarray(language_nmt['Italian']))
vocab_inp_size = len(input_tokenizer.word_index)+1
vocab_tar_size = len(output_tokenizer.word_index)+1
max_length_targ, max_length_inp = output_tensor.shape[1], input_tensor.shape[1]
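To make the tokenizing and padding step concrete, here is a simplified pure-Python re-implementation of the idea: the OOV token gets index 1, the remaining words are numbered by descending frequency, and shorter sequences are right-padded with 0. This is a sketch under those assumptions, not the Keras implementation, and the function names are hypothetical.

```python
from collections import Counter

def build_word_index(corpus, oov_token="<oov>"):
    """Assign index 1 to the OOV token, then 2, 3, ... by descending frequency."""
    counts = Counter(word for sentence in corpus for word in sentence.split())
    # sorted() is stable, so words with equal counts keep first-seen order
    ordered = [w for w, _ in sorted(counts.items(), key=lambda kv: -kv[1])]
    return {w: i for i, w in enumerate([oov_token] + ordered, start=1)}

def encode_and_pad(corpus, word_index, oov_token="<oov>"):
    """Map words to ids and right-pad every sequence with 0 to the longest length."""
    oov_id = word_index[oov_token]
    seqs = [[word_index.get(w, oov_id) for w in s.split()] for s in corpus]
    max_len = max(len(s) for s in seqs)
    return [s + [0] * (max_len - len(s)) for s in seqs]

corpus = ["<start> hi <end>", "<start> i am happy <end>"]
word_index = build_word_index(corpus)
padded = encode_and_pad(corpus, word_index)
print(word_index)
print(padded)
```

Index 0 is never assigned to a word, which is what later allows the loss function to treat 0 as padding.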

In [0]:
print('The maximum length of the English corpus is : {}'.format(max_length_inp))
print('The maximum length of the Italian corpus is : {}'.format(max_length_targ))

The maximum length of the English corpus is : 21
The maximum length of the Italian corpus is : 23


In [0]:
# Importing the pre-trained embeddings for English. I will be using the 300-dimensional fastText embeddings
!wget https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M.vec.zip

--2020-05-31 06:20:15--  https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M.vec.zip
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.75.142, 104.22.74.142, 2606:4700:10::6816:4b8e, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.75.142|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 681808098 (650M) [application/zip]
Saving to: ‘wiki-news-300d-1M.vec.zip’

wiki-news-300d-1M.v  10%[=>                  ]  65.72M  18.2MB/s    in 3.6s    

2020-05-31 06:20:19 (18.2 MB/s) - Read error at byte 68911547/681808098 (Connection reset by peer). Retrying.

--2020-05-31 06:20:20--  (try: 2)  https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M.vec.zip
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.75.142|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 681808098 (650M), 612896551 (585M) remaining [application/zip]
Saving to: ‘w

In [0]:
!unzip wiki-news-300d-1M.vec*.zip

Archive:  wiki-news-300d-1M.vec.zip
  inflating: wiki-news-300d-1M.vec   


In [0]:
!ls
!pwd

language_training_data.pickle  wiki-news-300d-1M.vec
sample_data		       wiki-news-300d-1M.vec.zip
/content


In [0]:
embedding_matrix_eng,embedding_index_eng = get_embedding('wiki-news-300d-1M.vec',input_tokenizer.word_index,vocab_inp_size,300)

100%|██████████| 9127/9127 [00:01<00:00, 4678.86it/s]


In [0]:
#Importing the pre-trained embeddings for italian
!wget https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.it.300.vec.gz

--2020-05-31 06:22:37--  https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.it.300.vec.gz
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.75.142, 104.22.74.142, 2606:4700:10::6816:4b8e, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.75.142|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1272825284 (1.2G) [binary/octet-stream]
Saving to: ‘cc.it.300.vec.gz’


2020-05-31 06:23:33 (22.0 MB/s) - ‘cc.it.300.vec.gz’ saved [1272825284/1272825284]



In [0]:
embedding_matrix_ita,embedding_index_ita = get_embedding('cc.it.300.vec.gz',output_tokenizer.word_index,vocab_tar_size,300)

100%|██████████| 16370/16370 [00:00<00:00, 302136.21it/s]


In [0]:
#Checking the coverage of english and italian words by the embeddings
def check_coverage(vocab, embeddings_index):

  known_words = {}
  unknown_words = {}
  nb_known_words = 0
  nb_unknown_words = 0
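  for word in vocab.keys():
    #vocab[word] is used as the word's weight: a tokenizer's word_index weights
    #by index rank, while its word_counts would weight by frequency
    try:
        known_words[word] = embeddings_index[word]
        nb_known_words += vocab[word]
    except KeyError:
        unknown_words[word] = vocab[word]
        nb_unknown_words += vocab[word]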
  print('Found embeddings for {:.3%} of vocab'.format(len(known_words) / len(vocab)))
  print('Found embeddings for  {:.3%} of all text'.format(nb_known_words / (nb_known_words + nb_unknown_words)))
  unknown_words = sorted(unknown_words.items(), key=operator.itemgetter(1))[::-1]

  return unknown_words

In [0]:
eng_oov = check_coverage(input_tokenizer.word_index,embedding_index_eng)

Found embeddings for 96.494% of vocab
Found embeddings for  95.550% of all text


In [0]:
eng_oov[:15]

[("when's", 9103),
 ('"king', 9092),
 ("miller's", 9090),
 ('10:00', 9027),
 ('"tom', 9014),
 ("lidya's", 9011),
 ("water's", 9007),
 ('naka-meguro', 9004),
 ("plan's", 8996),
 ('30-passenger', 8986),
 ("kunio's", 8937),
 ("else's", 8925),
 ('guzmán', 8893),
 ('"neither', 8863),
 ("horse's", 8827)]

Nice! We found embeddings for about **96%** of the English vocabulary.

In [0]:
oov_ita = check_coverage(output_tokenizer.word_index,embedding_index_ita)

Found embeddings for 92.883% of vocab
Found embeddings for  91.428% of all text


In the case of the target language, we found embeddings for about **93%** of the vocabulary, which is pretty good.
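One caveat about the "of all text" figure: `check_coverage` weights each word by the value stored in the dictionary it receives. The calls in this notebook pass `word_index`, whose values are index ranks rather than occurrence counts. A frequency-weighted version would take a dictionary of counts instead (the Keras tokenizer exposes one as `word_counts`). The sketch below illustrates that variant on toy data; the function name and the toy dictionaries are assumptions.

```python
def coverage(word_counts, embedding_index):
    """Return (vocab_coverage, text_coverage, unknown words sorted by count)."""
    known_types = 0
    known_tokens = 0
    total_tokens = 0
    unknown = {}
    for word, count in word_counts.items():
        total_tokens += count
        if word in embedding_index:
            known_types += 1
            known_tokens += count
        else:
            unknown[word] = count
    vocab_cov = known_types / len(word_counts)
    text_cov = known_tokens / total_tokens
    ranked_unknown = sorted(unknown.items(), key=lambda kv: kv[1], reverse=True)
    return vocab_cov, text_cov, ranked_unknown

# Toy data: counts per word and a tiny embedding index
toy_counts = {"tom": 6, "is": 3, "naka-meguro": 1}
toy_embeddings = {"tom": [0.1], "is": [0.2]}
vocab_cov, text_cov, unknown = coverage(toy_counts, toy_embeddings)
print(vocab_cov, text_cov, unknown)
```

With counts as weights, rare out-of-vocabulary words reduce text coverage far less than vocabulary coverage, which is usually the more informative picture.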


In [0]:
#Creating training and validation splits
input_tensor_train,input_tensor_val,output_tensor_train,output_tensor_val = train_test_split(input_tensor,output_tensor,test_size=0.1)

print('Length of Training set : {}'.format(len(input_tensor_train)))
print('Length of Validation set : {}'.format(len(input_tensor_val)))

Length of Training set : 63000
Length of Validation set : 7000


We will use the **tf.data.Dataset.from_tensor_slices()** method to build a dataset object from the in-memory arrays. It slices them along the first axis, so each element is one (input, output) sentence pair, and the resulting dataset can then be shuffled and batched efficiently.

In [0]:
BATCH_SIZE = 512
units = 1024
embedding_dim = 300
BUFFER_SIZE = len(input_tensor_train)
steps_per_epoch = len(input_tensor_train)//BATCH_SIZE

dataset = tf.data.Dataset.from_tensor_slices((input_tensor_train,output_tensor_train)).shuffle(BUFFER_SIZE)
dataset = dataset.batch(BATCH_SIZE,drop_remainder = True)
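With 63,000 training pairs and a batch size of 512, `drop_remainder=True` yields 123 full batches per epoch and leaves 24 examples out of each epoch (a different 24 each time, because of the shuffle). The NumPy sketch below mimics shuffle-then-batch on toy arrays to show that inputs and targets stay aligned; `shuffled_batches` is a hypothetical helper, not the `tf.data` implementation.

```python
import numpy as np

def shuffled_batches(inputs, targets, batch_size, rng):
    """Yield aligned (inputs, targets) batches, dropping the final partial batch."""
    order = rng.permutation(len(inputs))
    full = len(inputs) // batch_size
    for b in range(full):
        idx = order[b * batch_size:(b + 1) * batch_size]
        yield inputs[idx], targets[idx]

# Arithmetic for this notebook's settings
n_train, batch_size = 63000, 512
steps_per_epoch = n_train // batch_size            # 123 full batches
dropped = n_train - steps_per_epoch * batch_size   # 24 examples skipped per epoch

# Toy demonstration: y is always 2 * x, so alignment is easy to verify
rng = np.random.default_rng(0)
x = np.arange(10).reshape(10, 1)
y = x * 2
batches = list(shuffled_batches(x, y, batch_size=4, rng=rng))
print(steps_per_epoch, dropped, len(batches))
```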

# **Implementing the Encoder and Decoder model with Attention**

In [0]:
class Encoder(tf.keras.Model):

  def __init__(self,batches,vocab_size,embedding_dim,enc_units,embedding_matrix):
    super(Encoder,self).__init__()
    self.batches = batches
    self.enc_units = enc_units
    self.embedding = tf.keras.layers.Embedding(vocab_size+1,embedding_dim,weights=[embedding_matrix],trainable=True)
    self.gru = tf.keras.layers.GRU(self.enc_units,return_sequences=True,return_state=True,recurrent_initializer='glorot_uniform')
    self.dropout = tf.keras.layers.Dropout(0.5)

  def call(self,x,hidden):
    x = self.embedding(x)
    output,state = self.gru(x,initial_state=hidden)
    output = self.dropout(output)
    return output,state

  def initialize_hidden_states(self):
    return tf.zeros((self.batches,self.enc_units))
    #init_state = [np.zeros((self.batches, self.enc_units)) for i in range(2)]
    #return init_state

class GlobalBahdanauAttention(tf.keras.layers.Layer):

  def __init__(self,units):
    super(GlobalBahdanauAttention,self).__init__()
    self.W1 = tf.keras.layers.Dense(units)
    self.W2 = tf.keras.layers.Dense(units)
    self.V = tf.keras.layers.Dense(1)

  def call(self,hidden_state,encoder_output):
    #We need to expand the dimension of hidden_state from (batch_size,hidden_size) to (batch_size,1,hidden_size)
    hidden_state_with_time_axis = tf.expand_dims(hidden_state,axis=1)
    score = self.V(tf.nn.tanh(self.W1(hidden_state_with_time_axis) + self.W2(encoder_output)))
    #Softmax is applied on the last axis by default, but here we want to apply it on axis 1 (the time axis),
    #since the shape of score is (batch_size, max_length, 1).
    attention_weights = tf.nn.softmax(score,axis=1)
    context_vector = attention_weights * encoder_output
    context_vector = tf.reduce_sum(context_vector,axis=1)
    return context_vector 


class Decoder(tf.keras.Model):

  def __init__(self,batches,vocab_size,embedding_dim,dec_units,embedding_matrix):
    super(Decoder,self).__init__()
    self.batches = batches
    self.dec_units = dec_units
    self.embedding = tf.keras.layers.Embedding(vocab_size+1,embedding_dim,weights=[embedding_matrix],trainable=True)
    self.gru = tf.keras.layers.GRU(self.dec_units,return_sequences=True,return_state=True,recurrent_initializer='glorot_uniform')
    self.dense = tf.keras.layers.Dense(vocab_size)
    self.attention = GlobalBahdanauAttention(self.dec_units)

  def call(self,x,hidden,enc_output):
    context_vector  = self.attention(hidden,enc_output)
    x = self.embedding(x)
    x = tf.concat([tf.expand_dims(context_vector,axis=1),x],axis=-1)
    output,state = self.gru(x)
    output = tf.reshape(output, (-1, output.shape[2]))
    x = self.dense(output)
    return x,state
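To see the tensor shapes involved in the attention layer, here is a NumPy sketch of the same computation with random weights. It is an illustration under assumptions: the weight matrices stand in for the `Dense` layers, bias terms are omitted, and the sizes `B`, `T`, `H`, `U` are arbitrary.

```python
import numpy as np

def additive_attention(hidden, enc_out, W1, W2, v):
    """hidden: (B, H), enc_out: (B, T, H), W1/W2: (H, U), v: (U, 1)."""
    hidden_t = hidden[:, None, :]                          # (B, 1, H)
    score = np.tanh(hidden_t @ W1 + enc_out @ W2) @ v      # (B, T, 1)
    score = score - score.max(axis=1, keepdims=True)       # numerical stability
    exp_score = np.exp(score)
    weights = exp_score / exp_score.sum(axis=1, keepdims=True)  # softmax over T
    context = (weights * enc_out).sum(axis=1)              # (B, H)
    return context, weights

rng = np.random.default_rng(0)
B, T, H, U = 2, 5, 4, 3
context, weights = additive_attention(
    rng.normal(size=(B, H)),
    rng.normal(size=(B, T, H)),
    rng.normal(size=(H, U)),
    rng.normal(size=(H, U)),
    rng.normal(size=(U, 1)),
)
print(context.shape, weights.shape)
```

The weights sum to 1 along the time axis for every example, so the context vector is a convex combination of the encoder outputs.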

In [0]:
encoder = Encoder(BATCH_SIZE,vocab_inp_size,embedding_dim,units,embedding_matrix_eng)
decoder = Decoder(BATCH_SIZE,vocab_tar_size,embedding_dim,units,embedding_matrix_ita)

In [0]:
#Defining the optimizer and loss functions
optimizer = tf.keras.optimizers.Adam()
#optimizer = tf.keras.optimizers.RMSprop()
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True,reduction='none')

def loss_fn(real,pred):
  mask = tf.math.logical_not(tf.math.equal(real,0))
  loss_ = loss(real,pred)
  mask = tf.cast(mask,dtype = loss_.dtype)
  loss_*= mask
  return tf.reduce_mean(loss_)
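The mask zeroes out the loss at padded positions (token id 0), but the mean is still taken over all positions, padded or not. The NumPy sketch below reproduces that behaviour on a tiny example so the effect is visible; `masked_sparse_ce` is a hypothetical re-implementation, not the TensorFlow loss itself.

```python
import numpy as np

def masked_sparse_ce(real, logits):
    """real: (B,) int ids with 0 = padding; logits: (B, V). Mean over all B positions."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_example = -log_probs[np.arange(len(real)), real]
    mask = (real != 0).astype(per_example.dtype)
    return (per_example * mask).mean()

real = np.array([2, 0])        # the second position is padding
logits = np.zeros((2, 4))      # uniform prediction over 4 classes
loss_value = masked_sparse_ce(real, logits)
print(loss_value)
```

Here each unmasked position contributes log(4), and the result is log(4) / 2 because the padded position still counts in the denominator. Dividing by `mask.sum()` instead would give a per-token average that does not depend on how much padding a batch contains.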

In [0]:
checkpoint_dir = 'checkpoints_training'
checkpoint_prefix = os.path.join(checkpoint_dir, "training_nmt")
checkpoint = tf.train.Checkpoint(optimizer=optimizer,
                                 encoder=encoder,
                                 decoder=decoder)

# **Training**

In [0]:
@tf.function
def training(inputs,target,enc_hidden):
  loss=0
  #Creating a custom training with gradient tape
  with tf.GradientTape() as tape:

    encoder_output,encoder_hidden = encoder(inputs,enc_hidden)
    decoder_hidden = encoder_hidden
    dec_input = tf.expand_dims([output_tokenizer.word_index['<start>']] * BATCH_SIZE, 1)
    for t in range(1,target.shape[1]):
      pred,decoder_hidden = decoder(dec_input,decoder_hidden,encoder_output)
      loss += loss_fn(target[:, t], pred)
      #Using teacher forcing during training
      dec_input = tf.expand_dims(target[:, t], 1)
  batch_loss = (loss / int(target.shape[1]))
  trainable_var = encoder.trainable_variables + decoder.trainable_variables
  gradients = tape.gradient(loss,trainable_var)
  optimizer.apply_gradients(zip(gradients,trainable_var))
  return batch_loss
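Teacher forcing means that at step `t` the decoder is fed the ground-truth token from step `t-1` rather than its own previous prediction. The sketch below lists the (decoder input, expected output) pairs this produces for one target sequence; the token ids are hypothetical.

```python
def teacher_forcing_pairs(target_ids):
    """Return (decoder_input, expected_output) pairs for one target sequence."""
    return [(target_ids[t - 1], target_ids[t]) for t in range(1, len(target_ids))]

# Hypothetical ids: 1 = <start>, 2 = <end>, 0 = padding
target = [1, 7, 9, 2, 0]
pairs = teacher_forcing_pairs(target)
print(pairs)
```

The final pair has a padding token as its expected output, which is why the loss function masks positions where the target id is 0.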

In [0]:
EPOCHS = 30

for epochs in range(EPOCHS):
  start = time.time()
  enc_hidden = encoder.initialize_hidden_states()
  total_loss = 0

  for (batch,(inp,targ)) in enumerate(dataset.take(steps_per_epoch)):
    batch_loss = training(inp,targ,enc_hidden)
    total_loss+= batch_loss
    if batch % 100 == 0:
      print('Epoch {} Batch {} Loss {:.4f}'.format(epochs + 1,
                                                   batch,
                                                   batch_loss.numpy()))
  if (epochs + 1) % 2 == 0:
    checkpoint.save(file_prefix = checkpoint_prefix)
  print('Epoch {} Loss {:.4f}'.format(epochs + 1,
                                      total_loss / steps_per_epoch))
  print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

Epoch 1 Batch 0 Loss 2.6615
Epoch 1 Batch 100 Loss 1.5855
Epoch 1 Loss 1.6797
Time taken for 1 epoch 102.98138952255249 sec

Epoch 2 Batch 0 Loss 1.4861
Epoch 2 Batch 100 Loss 1.1891
Epoch 2 Loss 1.3061
Time taken for 1 epoch 77.84406971931458 sec

Epoch 3 Batch 0 Loss 1.1668
Epoch 3 Batch 100 Loss 0.9754
Epoch 3 Loss 1.0461
Time taken for 1 epoch 76.56199908256531 sec

Epoch 4 Batch 0 Loss 0.9290
Epoch 4 Batch 100 Loss 0.7826
Epoch 4 Loss 0.8411
Time taken for 1 epoch 77.97594285011292 sec

Epoch 5 Batch 0 Loss 0.6980
Epoch 5 Batch 100 Loss 0.6026
Epoch 5 Loss 0.6493
Time taken for 1 epoch 76.4459331035614 sec

Epoch 6 Batch 0 Loss 0.5515
Epoch 6 Batch 100 Loss 0.4706
Epoch 6 Loss 0.4896
Time taken for 1 epoch 77.75262546539307 sec

Epoch 7 Batch 0 Loss 0.3960
Epoch 7 Batch 100 Loss 0.3685
Epoch 7 Loss 0.3731
Time taken for 1 epoch 76.44320464134216 sec

Epoch 8 Batch 0 Loss 0.2780
Epoch 8 Batch 100 Loss 0.2981
Epoch 8 Loss 0.2874
Time taken for 1 epoch 77.97632813453674 sec

Epoch 9 

# **Inference**

In [0]:
def evaluate(sentence):

  sentence = preprocess(sentence)
  sentence = clean_eng_words(sentence)
  #inputs = [input_tokenizer.word_index[i] for i in sentence.split(' ')]
  inputs = input_tokenizer.texts_to_sequences([sentence])
  inputs = tf.keras.preprocessing.sequence.pad_sequences(inputs, maxlen = max_length_inp,padding='post')                                                                                                             
  inputs = tf.convert_to_tensor(inputs)
  result = ''

  hidden = [tf.zeros((1, units))]
  enc_out, enc_hidden = encoder(inputs, hidden)

  dec_hidden = enc_hidden
  dec_input = tf.expand_dims([output_tokenizer.word_index['<start>']], 0)

  for t in range(max_length_targ):

    predictions, dec_hidden = decoder(dec_input,dec_hidden,enc_out)                                                                                          
    predicted_id = tf.argmax(predictions[0]).numpy()
    result += output_tokenizer.index_word[predicted_id] + ' '
    if output_tokenizer.index_word[predicted_id] == '<end>':
      return result, sentence
    # the predicted ID is fed back into the model
    dec_input = tf.expand_dims([predicted_id], 0)

  return result, sentence
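At inference time there is no ground truth to feed back, so `evaluate` uses greedy decoding: take the arg-max token at each step, feed it back in, and stop at `<end>` or after a maximum number of steps. A stripped-down sketch of that loop, with a toy step function standing in for the decoder (both names are hypothetical):

```python
import numpy as np

def greedy_decode(step_fn, start_id, end_id, max_len):
    """Repeatedly pick the arg-max token until `end_id` or `max_len` steps."""
    token, state, output = start_id, None, []
    for _ in range(max_len):
        logits, state = step_fn(token, state)
        token = int(np.argmax(logits))
        output.append(token)
        if token == end_id:
            break
    return output

def toy_step(token, state):
    # Hypothetical stand-in for the decoder: always predicts token + 1.
    logits = np.zeros(6)
    logits[min(token + 1, 5)] = 1.0
    return logits, state

decoded = greedy_decode(toy_step, start_id=1, end_id=4, max_len=10)
truncated = greedy_decode(toy_step, start_id=1, end_id=4, max_len=2)
print(decoded, truncated)
```

Greedy decoding is simple and fast, but it commits to the single most likely token at every step; beam search is a common alternative when translation quality matters more than speed.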

In [0]:
!pip install googletrans

Collecting googletrans
  Downloading https://files.pythonhosted.org/packages/fd/f0/a22d41d3846d1f46a4f20086141e0428ccc9c6d644aacbfd30990cf46886/googletrans-2.4.0.tar.gz
Building wheels for collected packages: googletrans
  Building wheel for googletrans (setup.py) ... done
  Created wheel for googletrans: filename=googletrans-2.4.0-cp36-none-any.whl size=15777 sha256=50fc794dbd190bbc7e5062a805a5f0df82a4126d592d57c885e02e7af647de34
  Stored in directory: /root/.cache/pip/wheels/50/d6/e7/a8efd5f2427d5eb258070048718fa56ee5ac57fd6f53505f95
Successfully built googletrans
Installing collected packages: googletrans
Successfully installed googletrans-2.4.0


In [0]:
from googletrans import Translator
translator = Translator()

def translate(sentence):
  result, sentence= evaluate(sentence)

  print('Input                           : %s' % (sentence[7:-5]))
  print('\n')
  print('Predicted translation           : {}'.format(result[:-6]))
  print('\n')
  print('Predicted translation in English: {}'.format(translator.translate(result[:-6]).text))

In [0]:
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))

<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7fe130307a58>

**Let's test the model on samples it has never seen before.**

In [0]:
translate('i am very happy today')

Input                           :  i am very happy today 


Predicted translation           : io sono molto felice oggi 


Predicted translation in English: I am very happy today


In [0]:
translate("i don't know what i will do with my life")

Input                           :  i do not know what i will do with my life 


Predicted translation           : io non so cosa voglio con la mia vita 


Predicted translation in English: I do not know what I want with my life


In [0]:
translate('i am not going to do anything today')

Input                           :  i am not going to do anything today 


Predicted translation           : io non faccio niente oggi 


Predicted translation in English: I do not do anything today


In [0]:
translate('the moon is very big today')

Input                           :  the moon is very big today 


Predicted translation           : la luna è molto grande oggi 


Predicted translation in English: the moon is very large today


In [0]:
translate('i hate you')

Input                           :  i hate you 


Predicted translation           : ti odio 


Predicted translation in English: I hate you


In [0]:
translate('i really want a dog')

Input                           :  i really want a dog 


Predicted translation           : io voglio davvero un cane 


Predicted translation in English: I really want a dog


In [0]:
translate('i am not feeling well because i ate a lot of food yesterday')

Input                           :  i am not feeling well because i ate a lot of food yesterday 


Predicted translation           : non mi sto sentendo bene perché ho mangiato così cibo ieri 


Predicted translation in English: I'm not feeling well because I ate so food yesterday


In [0]:
translate('she likes to sing but not dance')

Input                           :  she likes to sing but not dance 


Predicted translation           : a lei piace cantare però non danza 


Predicted translation in English: she likes to sing but not dance


### Wow! Even though the model was trained on just **70,000 samples for only 30 epochs, it gives very good results on samples it has not seen!** This shows the power of Attention! The Attention mechanism has revolutionised the way we create NLP models and is currently a standard fixture in most state-of-the-art NLP models. This is because it enables the model to “remember” all the words in the input and focus on specific words when formulating a response.


---





### Even though the model gave good results on samples it had never seen before, some translations were not perfect. The difficulty of training an effective model can be attributed largely to the limited training data and training time. Because of the complexity of the languages involved and the large number of vocabulary and grammatical permutations, an effective model requires far more data and training time before near-perfect results can be expected on evaluation data.

