# Translations 2.0 - Neural Machine Translation

According to Google's paper [*Attention is all you need*](https://arxiv.org/abs/1706.03762), attention layers are all you need for a Deep Learning model to capture the complexity of a sentence. We will try to implement an attention-based model for our translator.

## Project description

For this project, we can reuse the preprocessing from the previous exercise, except that it will be simplified.

### Importing the data

You will get the same `.txt` file, in which each line contains a sentence and its translation separated by a tab (`\t`). Import and read this data with `pandas` or `numpy`.

Your data is available at this link: https://go.aws/38ECHUB
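
As a minimal sketch of the expected file layout, the snippet below parses a few tab-separated pairs from an in-memory string using only the standard library. The sample rows mirror the first lines of the dataset; `parse_pairs` is a hypothetical helper introduced for illustration, not part of the project.

```python
import csv
import io

# Sample rows: an English sentence, a tab, then its French translation.
SAMPLE = "Go.\tVa !\nHi.\tSalut !\nRun!\tCours !\n"

def parse_pairs(text):
    """Return a list of (english, french) tuples from tab-separated text."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(row[0], row[1]) for row in reader if len(row) >= 2]

pairs = parse_pairs(SAMPLE)
for english, french in pairs:
    print(english, "->", french)
```

With the real file, `pd.read_csv(..., delimiter="\t", header=None)` gives the same two columns as a DataFrame.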

### Preprocessing 

The whole point of your preprocessing is to express your (French) input sentence as a sequence of indices.

For example:

* je suis malade ---> `[123, 21, 34, 0, 0, 0, 0]`

This gives a *shape* of `(batch_size, max_len_of_a_french_sentence)`.

Each index is a number that you assign to a word token.

The zeros come from [*padded_sequences*](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/sequence/pad_sequences): padding gives all word sequences the same length, which your algorithm requires.
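
To make the index-and-padding step concrete, here is a small sketch in plain Python. The vocabulary and the `pad_post` helper are hypothetical and only mimic what `pad_sequences(..., padding="post")` does; index 0 is assumed to be reserved for padding.

```python
def pad_post(sequences, pad_value=0):
    """Right-pad every sequence with pad_value up to the longest length."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_value] * (max_len - len(seq)) for seq in sequences]

# Hypothetical vocabulary; index 0 is reserved for padding.
vocab = {"je": 1, "suis": 2, "malade": 3, "il": 4, "est": 5, "très": 6, "fatigué": 7}

sentences = [["je", "suis", "malade"], ["il", "est", "très", "très", "fatigué"]]
indices = [[vocab[tok] for tok in sent] for sent in sentences]
padded = pad_post(indices)
print(padded)
```

Every row of `padded` now has the length of the longest sentence, so the rows can be stacked into a single array of shape `(batch_size, max_len)`.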

This time, you do not need to *one-hot encode* your target variable. You can simply create a vector similar to the one built for your input sentence.

For example:

* I am sick ---> `[43, 2, 42, 0, 0, 0]`

CAUTION: you do need one extra step in your preprocessing. For each sentence, add a `<start>` and an `<end>` token to mark the beginning and the end of the sentence. You can do this with `spaCy`.

To help you with this task, you can use:

* `Pandas` or `Numpy` to read the text file
* `spaCy` for tokenization
* `Tensorflow` for the [padded_sequence](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/sequence/pad_sequences) 

### Modeling

For the modeling part, you will need to set up attention layers. You will have to:

* Create an `Encoder` class that inherits from `tf.keras.Model`
* Create a Bahdanau attention layer as a class that inherits from `tf.keras.layers.Layer`
* Finally, create a `Decoder` class that inherits from `tf.keras.Model`


You will also have to write your own loss function and your own training loop.


### Tips

Do not use the whole dataset for your first experiments; take only 5000 or even 3000 sentences. This lets you iterate faster and avoids bugs that are simply due to a lack of computing power.

You can also draw inspiration from TensorFlow's [Neural Machine Translation with Attention](https://www.tensorflow.org/tutorials/text/nmt_with_attention) tutorial.

Good luck!



In [None]:
!pip install --upgrade tensorflow 

Requirement already up-to-date: tensorflow in /usr/local/lib/python3.6/dist-packages (2.2.0)


In [None]:
# Import the required libraries
import pandas as pd
import numpy as np 
import tensorflow_datasets as tfds
import tensorflow as tf 
tf.__version__

'2.2.0'

## Importing the data

In [None]:
# Load the tab-separated text file at `url` into a DataFrame
def load_doc(url):
  df = pd.read_csv(url, delimiter="\t", header=None)
  return df

In [None]:
# Load the text file
doc = load_doc("https://go.aws/38ECHUB")
doc.head()

Unnamed: 0,0,1
0,Go.,Va !
1,Hi.,Salut !
2,Run!,Cours !
3,Run!,Courez !
4,Wow!,Ça alors !


In [None]:
# Keep a sample of only 5000 sentences to avoid slowdowns
doc = doc.sample(5000)

In [None]:
# Add a <start> and <end> token 
def begin_end_sentence(sentence):
  sentence = "<start> "+ sentence + " <end>"
  return sentence

In [None]:
# Add <start> and <end> token
doc.iloc[:, 0] = doc.iloc[:, 0].apply(lambda x: begin_end_sentence(x))
#doc.iloc[:, 1] = doc.iloc[:, 1].apply(lambda x: begin_end_sentence(x))

In [None]:
# Download the spaCy French and English models
!python -m spacy download fr_core_news_md
!python -m spacy download en_core_web_md

Collecting fr_core_news_md==2.2.5
  Downloading https://github.com/explosion/spacy-models/releases/download/fr_core_news_md-2.2.5/fr_core_news_md-2.2.5.tar.gz (88.6MB)
     |████████████████████████████████| 88.6MB 1.2MB/s 
Building wheels for collected packages: fr-core-news-md
  Building wheel for fr-core-news-md (setup.py) ... done
  Created wheel for fr-core-news-md: filename=fr_core_news_md-2.2.5-cp36-none-any.whl size=90338488 sha256=55a44fcb29cc9cc78532545a7cb4a10b92cdb0d22ad741d99bc96883dd15c21e
  Stored in directory: /tmp/pip-ephem-wheel-cache-jbfs0xst/wheels/c6/18/b6/f628642acc7872a53cf81269dd1c394d96da69564ccfac5425
Successfully built fr-core-news-md
Installing collected packages: fr-core-news-md
Successfully installed fr-core-news-md-2.2.5
✔ Download and installation successful
You can now load the model via spacy.load('fr_core_news_md')
Collecting en_core_web_md==2.2.5
  Downloading https://github.com/explosion/spacy-models/releases/

In [None]:
# Import each language model
import fr_core_news_md
import en_core_web_md
nlp_fr = fr_core_news_md.load()
nlp_en = en_core_web_md.load()

In [None]:
# Add <start> & <end> special case
from spacy.symbols import ORTH

start_case = [{ORTH:"<start>"}]
end_case = [{ORTH: "<end>"}]

#nlp_fr.tokenizer.add_special_case("<start>", start_case)
#nlp_fr.tokenizer.add_special_case("<end>", end_case)

nlp_en.tokenizer.add_special_case("<start>", start_case)
nlp_en.tokenizer.add_special_case("<end>", end_case)

In [None]:
# Join all French and English sentences into one corpus per language
fr_corpus = " ".join(doc.iloc[:, 1].to_list())
en_corpus = " ".join(doc.iloc[:, 0].to_list())

In [None]:
%%time
# Run both corpora through spaCy (the %%time cell magic must be the first line of the cell)
import time
nlp_fr.max_length = len(fr_corpus)
nlp_en.max_length = len(en_corpus)

fr_doc = nlp_fr(fr_corpus)
en_doc = nlp_en(en_corpus)

CPU times: user 13.2 s, sys: 522 ms, total: 13.7 s
Wall time: 13.8 s


In [None]:
%%time
# Tokenize each sentence with spaCy
doc["fr_tokens"] = doc.iloc[:, 1].apply(lambda x: nlp_fr.tokenizer(x))
doc["en_tokens"] = doc.iloc[:, 0].apply(lambda x: nlp_en.tokenizer(x))

CPU times: user 362 ms, sys: 14 ms, total: 376 ms
Wall time: 375 ms


In [None]:
doc.tail()

Unnamed: 0,0,1,fr_tokens,en_tokens
146434,"<start> He works as a teacher, but actually he...","Il travaille comme enseignant, mais en fait c'...","(Il, travaille, comme, enseignant, ,, mais, en...","(<start>, He, works, as, a, teacher, ,, but, a..."
40205,<start> Let's take a breather. <end>,Prenons un moment de repos.,"(Prenons, un, moment, de, repos, .)","(<start>, Let, 's, take, a, breather, ., <end>)"
131048,<start> I didn't even know Tom had a girlfrien...,Je ne savais même pas que Tom avait une petite...,"(Je, ne, savais, même, pas, que, Tom, avait, u...","(<start>, I, did, n't, even, know, Tom, had, a..."
131559,<start> I wanted to talk to you about somethin...,Je voulais m'entretenir avec vous de quelque c...,"(Je, voulais, m', entretenir, avec, vous, de, ...","(<start>, I, wanted, to, talk, to, you, about,..."
43753,<start> He is a lovable person. <end>,C'est une personne adorable.,"(C', est, une, personne, adorable, .)","(<start>, He, is, a, lovable, person, ., <end>)"


In [None]:
# Build a set() holding every unique token in our text corpus
en_tokens = [token.text for token in en_doc]
en_vocabulary_set= set(en_tokens)
en_vocab_size = len(en_vocabulary_set)
print(en_vocab_size)

3523


In [None]:
# Same for French
fr_tokens = [token.text for token in fr_doc]
fr_vocabulary_set= set(fr_tokens)
fr_vocab_size = len(fr_vocabulary_set)
print(fr_vocab_size)

4870


In [None]:
en_tokens[:10]

['<start>',
 'I',
 "'ve",
 'already',
 'written',
 'my',
 'part',
 'of',
 'the',
 'report']

In [None]:
[word for word in en_vocabulary_set][:10]

['winter',
 'related',
 'good',
 'raised',
 'waited',
 'singing',
 'fit',
 'earth',
 'holding',
 'leads']

In [None]:
# Assign an id to each token
all_en_tokens = {}
for i,en_token in enumerate(en_vocabulary_set):
  all_en_tokens[en_token] = i+1 # Start at i+1 so that 0 stays reserved for the padded sequences

all_fr_tokens = {}
for i, fr_token in enumerate(fr_vocabulary_set):
  all_fr_tokens[fr_token] = i+1
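
One caveat with the mapping above: iterating over a Python `set` of strings does not give a stable order from one interpreter session to the next, because string hashing is randomized by default. The ids can therefore change between runs, which matters when a checkpoint is restored in a new session. A possible workaround, sketched here with a hypothetical `build_vocab` helper, is to sort the tokens before numbering them.

```python
def build_vocab(tokens):
    """Map each unique token to a stable id starting at 1 (0 is kept for padding)."""
    return {token: i + 1 for i, token in enumerate(sorted(set(tokens)))}

tokens = ["<start>", "I", "am", "sick", "<end>", "<start>", "I", "am", "fine", "<end>"]
vocab = build_vocab(tokens)
print(vocab)
```

Because the order comes from `sorted`, the same corpus always yields the same ids.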

In [None]:
# Functions that turn each sequence of tokens into a vector of indices
def en_tokens_to_index(tokens):
  indices = []
  for token in tokens:
    indices.append(all_en_tokens[token.text])
  
  return indices

def fr_tokens_to_index(tokens):
  indices = []
  for token in tokens:
    indices.append(all_fr_tokens[token.text])
  
  return indices

In [None]:
# Convert the tokens to indices
doc["fr_indices"] = doc["fr_tokens"].apply(lambda x: fr_tokens_to_index(x))
doc["en_indices"] = doc["en_tokens"].apply(lambda x: en_tokens_to_index(x))

In [None]:
doc.tail()

Unnamed: 0,0,1,fr_tokens,en_tokens,fr_indices,en_indices
146434,"<start> He works as a teacher, but actually he...","Il travaille comme enseignant, mais en fait c'...","(Il, travaille, comme, enseignant, ,, mais, en...","(<start>, He, works, as, a, teacher, ,, but, a...","[3616, 3952, 217, 3882, 1552, 1867, 2515, 1713...","[1238, 2259, 585, 3239, 3081, 2810, 1129, 2992..."
40205,<start> Let's take a breather. <end>,Prenons un moment de repos.,"(Prenons, un, moment, de, repos, .)","(<start>, Let, 's, take, a, breather, ., <end>)","[2341, 777, 4670, 3155, 3571, 871]","[1238, 3010, 1501, 1803, 3081, 194, 603, 1259]"
131048,<start> I didn't even know Tom had a girlfrien...,Je ne savais même pas que Tom avait une petite...,"(Je, ne, savais, même, pas, que, Tom, avait, u...","(<start>, I, did, n't, even, know, Tom, had, a...","[4390, 458, 2702, 3158, 1392, 1254, 2826, 3453...","[1238, 1031, 3100, 2339, 1657, 2517, 2087, 144..."
131559,<start> I wanted to talk to you about somethin...,Je voulais m'entretenir avec vous de quelque c...,"(Je, voulais, m', entretenir, avec, vous, de, ...","(<start>, I, wanted, to, talk, to, you, about,...","[4390, 510, 3063, 1273, 2576, 4231, 3155, 2090...","[1238, 1031, 3391, 422, 1937, 422, 881, 1372, ..."
43753,<start> He is a lovable person. <end>,C'est une personne adorable.,"(C', est, une, personne, adorable, .)","(<start>, He, is, a, lovable, person, ., <end>)","[1126, 2313, 4577, 886, 4334, 871]","[1238, 2259, 1015, 3081, 3337, 2895, 603, 1259]"


In [None]:
# Function returning the length of the longest sentence
def max_len(lines):
  return max(len(line) for line in lines)

In [None]:
# Apply the function to the French and English index sequences
fr_max_len = max_len(doc['fr_indices'].to_list())
en_max_len = max_len(doc['en_indices'].to_list())

In [None]:
%%time
# Use Keras to pad the index sequences to the same length
padded_fr_indices = tf.keras.preprocessing.sequence.pad_sequences(doc["fr_indices"], maxlen=fr_max_len, padding="post")
padded_en_indices = tf.keras.preprocessing.sequence.pad_sequences(doc["en_indices"], maxlen=en_max_len, padding="post")

CPU times: user 53.7 ms, sys: 75 µs, total: 53.8 ms
Wall time: 52.6 ms


In [None]:
padded_en_indices

array([[1238, 1031,  175, ...,    0,    0,    0],
       [1238, 1031, 2959, ...,    0,    0,    0],
       [1238, 1031, 1125, ...,    0,    0,    0],
       ...,
       [1238, 1031, 3100, ...,    0,    0,    0],
       [1238, 1031, 3391, ...,    0,    0,    0],
       [1238, 2259, 1015, ...,    0,    0,    0]], dtype=int32)

In [None]:
# Variables that we will reuse for our models
BATCH_SIZE = 64
TAKE_SIZE = int(0.7*len(doc)/BATCH_SIZE)
BUFFER_SIZE = TAKE_SIZE * BATCH_SIZE
steps_per_epoch = TAKE_SIZE
embedding_dim = 256
units = 1024
vocab_inp_size = fr_vocab_size
vocab_tar_size = en_vocab_size

In [None]:
# Create the full TensorFlow dataset
tf_ds = tf.data.Dataset.from_tensor_slices((padded_fr_indices, padded_en_indices))

In [None]:
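# Shuffle & Batch
# reshuffle_each_iteration=False keeps the shuffled order fixed across epochs, so the
# take/skip split that follows always yields the same train and test batches
tf_ds = tf_ds.shuffle(BUFFER_SIZE, reshuffle_each_iteration=False).batch(BATCH_SIZE, drop_remainder=True)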

In [None]:
# Train Test Split
train_data = tf_ds.take(TAKE_SIZE)
test_data = tf_ds.skip(TAKE_SIZE)
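
To make the split sizes concrete, here is the arithmetic behind `TAKE_SIZE` for the 5000-sentence sample and a batch size of 64, as a small sketch in plain Python (no TensorFlow needed).

```python
n_sentences = 5000
batch_size = 64

# drop_remainder=True keeps only full batches
total_batches = n_sentences // batch_size
# same formula as TAKE_SIZE: about 70% of the data, counted in batches
train_batches = int(0.7 * n_sentences / batch_size)
test_batches = total_batches - train_batches

print(total_batches, train_batches, test_batches)
print(train_batches * batch_size, test_batches * batch_size)
```

Under these assumptions, `take` yields 54 training batches (3456 sentences) and `skip` leaves 24 test batches (1536 sentences); the last 8 sentences are dropped with the incomplete batch.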

In [None]:
input_text, output_text = next(iter(train_data))
print(input_text.numpy().shape)
print(output_text.numpy().shape)

(64, 28)
(64, 28)


In [None]:
# Encoder 
class Encoder(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):
    super(Encoder, self).__init__()
    self.batch_sz = batch_sz
    self.enc_units = enc_units
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    self.gru = tf.keras.layers.GRU(self.enc_units,
                                   return_sequences=True,
                                   return_state=True,
                                   recurrent_initializer='glorot_uniform')

  def call(self, x, hidden):
    x = self.embedding(x)
    output, state = self.gru(x, initial_state = hidden)
    return output, state

  def initialize_hidden_state(self):
    return tf.zeros((self.batch_sz, self.enc_units))

In [None]:
encoder = Encoder(vocab_inp_size +1, embedding_dim, units, BATCH_SIZE)

# Sample output
sample_hidden = encoder.initialize_hidden_state()
sample_output, sample_hidden = encoder(input_text, sample_hidden)
print ('Encoder output shape: (batch size, sequence length, units) {}'.format(sample_output.shape))
print ('Encoder Hidden state shape: (batch size, units) {}'.format(sample_hidden.shape))

Encoder output shape: (batch size, sequence length, units) (64, 28, 1024)
Encoder Hidden state shape: (batch size, units) (64, 1024)


In [None]:
class BahdanauAttention(tf.keras.layers.Layer):
  def __init__(self, units):
    super(BahdanauAttention, self).__init__()
    self.W1 = tf.keras.layers.Dense(units)
    self.W2 = tf.keras.layers.Dense(units)
    self.V = tf.keras.layers.Dense(1)

  def call(self, query, values):
    # hidden shape == (batch_size, hidden size)
    # hidden_with_time_axis shape == (batch_size, 1, hidden size)
    # This is done to compute our attention score
    hidden_with_time_axis = tf.expand_dims(query, 1)

    # score shape == (batch_size, max_length, 1)
    # The last axis is 1 because we apply self.V to the score
    # Before self.V is applied, the tensor shape is (batch_size, max_length, units)
    score = self.V(tf.nn.tanh(
        self.W1(values) + self.W2(hidden_with_time_axis)))

    # attention_weights shape == (batch_size, max_length, 1)
    attention_weights = tf.nn.softmax(score, axis=1)

    # context_vector shape after sum == (batch_size, hidden_size)
    context_vector = attention_weights * values
    context_vector = tf.reduce_sum(context_vector, axis=1)

    return context_vector, attention_weights

In [None]:
attention_layer = BahdanauAttention(10)
attention_result, attention_weights = attention_layer(sample_hidden, sample_output)

print("Attention result shape: (batch size, units) {}".format(attention_result.shape))
print("Attention weights shape: (batch_size, sequence_length, 1) {}".format(attention_weights.shape))

Attention result shape: (batch size, units) (64, 1024)
Attention weights shape: (batch_size, sequence_length, 1) (64, 28, 1)
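
To follow the shapes without TensorFlow, the additive (Bahdanau) score `V(tanh(W1(values) + W2(query)))` can be reproduced in NumPy. This is a sketch under assumptions: random matrices stand in for the trained `W1`, `W2` and `V` Dense layers, biases are left out, and the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, hidden, att_units = 2, 5, 8, 4

values = rng.normal(size=(batch, seq_len, hidden))  # encoder outputs
query = rng.normal(size=(batch, hidden))            # decoder hidden state

# Stand-ins for the W1, W2 and V Dense layers (biases omitted for brevity)
W1 = rng.normal(size=(hidden, att_units))
W2 = rng.normal(size=(hidden, att_units))
V = rng.normal(size=(att_units, 1))

# score shape: (batch, seq_len, 1); the query is broadcast over the time axis
score = np.tanh(values @ W1 + (query @ W2)[:, None, :]) @ V

# softmax over the time axis (axis=1), computed in a numerically stable way
exp = np.exp(score - score.max(axis=1, keepdims=True))
weights = exp / exp.sum(axis=1, keepdims=True)

# weighted sum of the encoder outputs: (batch, hidden)
context = (weights * values).sum(axis=1)

print(context.shape, weights.shape)
```

The weights sum to 1 along the sequence axis, so the context vector is a weighted average of the encoder outputs for each example.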


In [None]:
class Decoder(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz):
    super(Decoder, self).__init__()
    self.batch_sz = batch_sz
    self.dec_units = dec_units
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    self.gru = tf.keras.layers.GRU(self.dec_units,
                                   return_sequences=True,
                                   return_state=True,
                                   recurrent_initializer='glorot_uniform')
    self.fc = tf.keras.layers.Dense(vocab_size)

    # Used for attention
    self.attention = BahdanauAttention(self.dec_units)

  def call(self, x, hidden, enc_output):
    # enc_output shape == (batch_size, max_length, hidden_size)
    context_vector, attention_weights = self.attention(hidden, enc_output)

    # x shape after embedding == (batch_size, 1, embedding_dim)
    x = self.embedding(x)

    # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
    x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)

    # Pass the concatenated vector to the GRU layer
    output, state = self.gru(x)

    # output shape == (batch_size * 1, hidden_size)
    output = tf.reshape(output, (-1, output.shape[2]))

    # output shape == (batch_size, vocab)
    x = self.fc(output)

    return x, state, attention_weights

In [None]:
decoder = Decoder(vocab_tar_size + 1, embedding_dim, units, BATCH_SIZE)

sample_decoder_output, _, _ = decoder(tf.random.uniform((BATCH_SIZE, 1)),
                                      sample_hidden, sample_output)

print ('Decoder output shape: (batch_size, vocab size) {}'.format(sample_decoder_output.shape))

Decoder output shape: (batch_size, vocab size) (64, 3524)


# Loss

In [None]:
optimizer = tf.keras.optimizers.Adam()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction='none')

def loss_function(real, pred):
  mask = tf.math.logical_not(tf.math.equal(real, 0))
  loss_ = loss_object(real, pred)

  mask = tf.cast(mask, dtype=loss_.dtype)
  loss_ *= mask

  return tf.reduce_mean(loss_)
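
The masking idea in `loss_function` can be checked by hand on a tiny example. The following is a NumPy sketch rather than the TensorFlow implementation; `masked_sparse_crossentropy` is a hypothetical name, and id 0 is assumed to be the padding id.

```python
import numpy as np

def masked_sparse_crossentropy(real, logits):
    """Mean cross-entropy over the batch, with padded targets (id 0) contributing 0."""
    # log-softmax computed in a numerically stable way
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # negative log-probability of the true class for each example
    per_example = -log_probs[np.arange(len(real)), real]
    mask = (real != 0).astype(per_example.dtype)
    return (per_example * mask).mean()

real = np.array([2, 0, 1])   # the second target is padding
logits = np.zeros((3, 4))    # uniform prediction over 4 classes
loss = masked_sparse_crossentropy(real, logits)
print(loss)
```

As in `loss_function`, the mean is taken over all positions, padded ones included, so padding lowers the average instead of being excluded from it.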

In [None]:
import os
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
checkpoint = tf.train.Checkpoint(optimizer=optimizer,
                                 encoder=encoder,
                                 decoder=decoder)

# Training 

In [None]:
@tf.function
def train_step(inp, targ, enc_hidden):
  loss = 0

  with tf.GradientTape() as tape:
    enc_output, enc_hidden = encoder(inp, enc_hidden)

    dec_hidden = enc_hidden

    dec_input = tf.expand_dims([all_en_tokens["<start>"]] * BATCH_SIZE, 1)

    # Teacher forcing - feeding the target as the next input
    for t in range(1, targ.shape[1]):
      # passing enc_output to the decoder
      predictions, dec_hidden, _ = decoder(dec_input, dec_hidden, enc_output)

      loss += loss_function(targ[:, t], predictions)

      # using teacher forcing
      dec_input = tf.expand_dims(targ[:, t], 1)

  batch_loss = (loss / int(targ.shape[1]))

  variables = encoder.trainable_variables + decoder.trainable_variables

  gradients = tape.gradient(loss, variables)

  optimizer.apply_gradients(zip(gradients, variables))

  return batch_loss
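
Teacher forcing, as used in `train_step`, means that the decoder receives the true previous token at each step instead of its own prediction. The alignment between decoder inputs and expected outputs can be sketched in plain Python; `teacher_forcing_pairs` is a hypothetical helper used only for illustration.

```python
def teacher_forcing_pairs(target):
    """Return (decoder_input, expected_output) for each decoding step."""
    # target[0] is <start>: it is fed first and never predicted
    return [(target[t - 1], target[t]) for t in range(1, len(target))]

target = ["<start>", "I", "am", "sick", "<end>"]
pairs = teacher_forcing_pairs(target)
for dec_input, expected in pairs:
    print(f"input={dec_input!r:10} expected={expected!r}")
```

This matches the loop over `range(1, targ.shape[1])`: step `t` is scored against `targ[:, t]`, and that same column becomes the next decoder input.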

In [None]:
EPOCHS = 30
steps_per_epoch = TAKE_SIZE

for epoch in range(EPOCHS):
  start = time.time()

  enc_hidden = encoder.initialize_hidden_state()
  total_loss = 0

  for (batch, (inp, targ)) in enumerate(train_data.take(steps_per_epoch)):
    batch_loss = train_step(inp, targ, enc_hidden)
    total_loss += batch_loss

    if batch % 10 == 0:
      print('Epoch {} Batch {} Loss {:.4f}'.format(epoch + 1,
                                                   batch,
                                                   batch_loss.numpy()))
  
  # saving (checkpoint) the model every 2 epochs
  if (epoch + 1) % 2 == 0:
    checkpoint.save(file_prefix = checkpoint_prefix)

  print('Epoch {} Loss {:.4f}'.format(epoch + 1,
                                      total_loss / steps_per_epoch))
  print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

Epoch 1 Batch 0 Loss 2.4566
Epoch 1 Batch 10 Loss 1.8106
Epoch 1 Batch 20 Loss 1.8215
Epoch 1 Batch 30 Loss 1.7302
Epoch 1 Batch 40 Loss 1.6291
Epoch 1 Batch 50 Loss 1.5803
Epoch 1 Loss 1.7505
Time taken for 1 epoch 48.712414026260376 sec

Epoch 2 Batch 0 Loss 1.5499
Epoch 2 Batch 10 Loss 1.4397
Epoch 2 Batch 20 Loss 1.3814
Epoch 2 Batch 30 Loss 1.3921
Epoch 2 Batch 40 Loss 1.5381
Epoch 2 Batch 50 Loss 1.3987
Epoch 2 Loss 1.4542
Time taken for 1 epoch 8.688588619232178 sec

Epoch 3 Batch 0 Loss 1.3288
Epoch 3 Batch 10 Loss 1.3104
Epoch 3 Batch 20 Loss 1.2824
Epoch 3 Batch 30 Loss 1.2887
Epoch 3 Batch 40 Loss 1.2676
Epoch 3 Batch 50 Loss 1.2518
Epoch 3 Loss 1.3216
Time taken for 1 epoch 8.300629138946533 sec

Epoch 4 Batch 0 Loss 1.2007
Epoch 4 Batch 10 Loss 1.1316
Epoch 4 Batch 20 Loss 1.2264
Epoch 4 Batch 30 Loss 1.2310
Epoch 4 Batch 40 Loss 1.1657
Epoch 4 Batch 50 Loss 1.2415
Epoch 4 Loss 1.2058
Time taken for 1 epoch 9.120444536209106 sec

Epoch 5 Batch 0 Loss 1.2286
Epoch 5 Batch 1

In [None]:
# restoring the latest checkpoint in checkpoint_dir
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))

<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7f0c5598c780>

In [None]:
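checkpoint.restore("/content/training_checkpoints/ckpt-3")
# checkpoint.encoder and checkpoint.decoder are the same objects as encoder and decoder:
# restore() overwrites their weights in place rather than creating separate copies
encoder_old=checkpoint.encoder
decoder_old=checkpoint.decoder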

In [None]:
encoder_old

<__main__.Encoder at 0x7f0cb58081d0>

In [None]:
for example, label in test_data.take(10):

  hidden = [tf.zeros((1, units))]
  input_t = example[0]
  output_label = label[0]
  enc_out, enc_hidden = encoder(tf.expand_dims(input_t, axis=0), hidden)

  dec_hidden = enc_hidden
  dec_input = tf.expand_dims([all_en_tokens["<start>"]], 0)

  result = ""

  for t in range(padded_fr_indices.shape[-1]):
    predictions, dec_hidden, attention_weights = decoder(dec_input,
                                                          dec_hidden,
                                                          enc_out)

    predicted_id = tf.argmax(predictions[0]).numpy()
    corresponding_word = [word for word, id in all_en_tokens.items() if id==predicted_id]
    result += corresponding_word[0] + " "

    if corresponding_word[0] == '<end>':
      break

    # the predicted ID is fed back into the model
    dec_input = tf.expand_dims([predicted_id], 0)

  input_sentence = ""
  for token_id in input_t:
    if token_id==0:
      break
    
    corresponding_word = [word for word, id in all_fr_tokens.items() if id==token_id]
    input_sentence += corresponding_word[0] + " "
    if corresponding_word[0] == "<end>":
      break

  true_translation = ""
  for token_id in output_label:
    if token_id==0:
      break
    corresponding_word = [word for word, id in all_en_tokens.items() if id==token_id]
    true_translation += corresponding_word[0] + " "
    if corresponding_word[0] == "<end>":
      break 


print("French sentence: {}".format(input_sentence))
print("True translation: {}".format(true_translation))
print("Model translation: {}".format(result))

French sentence: Qu' est -ce qui se passe , ici ? 
True translation: <start> What 's the deal here ? <end> 
Model translation: What 's your way ? <end> 
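
The decoding loop above looks up each word by scanning the whole vocabulary with a list comprehension. One possible alternative, sketched here with a hypothetical toy vocabulary and a hypothetical `ids_to_text` helper, is to build the reverse mapping once and stop at padding or at `<end>`.

```python
def ids_to_text(ids, index_to_token, pad_id=0, end_token="<end>"):
    """Turn a sequence of ids into a string, stopping at padding or end_token."""
    words = []
    for token_id in ids:
        if token_id == pad_id:
            break
        word = index_to_token[token_id]
        words.append(word)
        if word == end_token:
            break
    return " ".join(words)

# Hypothetical toy vocabulary standing in for all_en_tokens
token_to_index = {"<start>": 1, "I": 2, "am": 3, "sick": 4, "<end>": 5}
# Reverse mapping built once, so each lookup is a single dict access
index_to_token = {index: token for token, index in token_to_index.items()}

sentence = ids_to_text([1, 2, 3, 4, 5, 0, 0], index_to_token)
print(sentence)
```

With the notebook's data, the ids would first need converting to plain integers (for example with `.numpy()`) before being used as dictionary keys.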



In [None]:
for example, label in test_data.take(10):

  hidden = [tf.zeros((1, units))]
  input_t = example[0]
  output_label = label[0]
  enc_out, enc_hidden = encoder_old(tf.expand_dims(input_t, axis=0), hidden)

  dec_hidden = enc_hidden
  dec_input = tf.expand_dims([all_en_tokens["<start>"]], 0)

  result = ""

  for t in range(padded_fr_indices.shape[-1]):
    predictions, dec_hidden, attention_weights = decoder_old(dec_input,
                                                          dec_hidden,
                                                          enc_out)

    predicted_id = tf.argmax(predictions[0]).numpy()
    corresponding_word = [word for word, id in all_en_tokens.items() if id==predicted_id]
    result += corresponding_word[0] + " "

    if corresponding_word[0] == '<end>':
      break

    # the predicted ID is fed back into the model
    dec_input = tf.expand_dims([predicted_id], 0)

  input_sentence = ""
  for token_id in input_t:
    # stop at the first padding id (0), which has no entry in the vocabulary
    if token_id==0:
      break

    corresponding_word = [word for word, id in all_fr_tokens.items() if id==token_id]
    input_sentence += corresponding_word[0] + " "
    if corresponding_word[0] == "<end>":
      break

  true_translation = ""
  for token_id in output_label:
    if token_id==0:
      break
    corresponding_word = [word for word, id in all_en_tokens.items() if id==token_id]
    true_translation += corresponding_word[0] + " "
    if corresponding_word[0] == "<end>":
      break 


print("French sentence: {}".format(input_sentence))
print("True translation: {}".format(true_translation))
print("Model translation: {}".format(result))

French sentence: <start> Je veux t' allouer suffisamment de temps pour faire ça . <end> 
True translation: <start> I want to give you enough time to do that . <end> 
Model translation: I want you want to do . <end> 
