# Capstone Project
## Neural translation model
### Instructions

In this notebook, you will create a neural network that translates from English to German. You will use concepts from throughout this course, including building more flexible model architectures, freezing layers, data processing pipeline and sequence modelling.

This project is peer-assessed. Within this notebook you will find instructions in each section for how to complete the project. Pay close attention to the instructions as the peer review will be carried out according to a grading rubric that checks key parts of the project instructions. Feel free to add extra cells into the notebook as required.

### How to submit

When you have completed the Capstone project notebook, you will submit a pdf of the notebook for peer review. First ensure that the notebook has been fully executed from beginning to end, and all of the cell outputs are visible. This is important, as the grading rubric depends on the reviewer being able to view the outputs of your notebook. Save the notebook as a pdf (File -> Download as -> PDF via LaTeX). You should then submit this pdf for review.

### Let's get started!

We'll start by running some imports, and loading the dataset. For this project you are free to make further imports throughout the notebook as you wish. 

In [1]:
import tensorflow as tf
import tensorflow_hub as hub
import unicodedata
import re
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import Layer


![Flags overview image](data/germany_uk_flags.png)

For the capstone project, you will use a language dataset from http://www.manythings.org/anki/ to build a neural translation model. This dataset consists of over 200,000 pairs of sentences in English and German. In order to make the training quicker, we will restrict to our dataset to 20,000 pairs. Feel free to change this if you wish - the size of the dataset used is not part of the grading rubric.

Your goal is to develop a neural translation model from English to German, making use of a pre-trained English word embedding module.

In [2]:
# Run this cell to load the dataset

NUM_EXAMPLES = 20000
data_examples = []
with open('data/deu.txt', 'r', encoding='utf8') as f:
    for line in f.readlines():
        if len(data_examples) < NUM_EXAMPLES:
            data_examples.append(line)
        else:
            break

In [3]:
# These functions preprocess English and German sentences

def unicode_to_ascii(s):
    return ''.join(c for c in unicodedata.normalize('NFD', s) if unicodedata.category(c) != 'Mn')

def preprocess_sentence(sentence):
    sentence = sentence.lower().strip()
    sentence = re.sub(r"ü", 'ue', sentence)
    sentence = re.sub(r"ä", 'ae', sentence)
    sentence = re.sub(r"ö", 'oe', sentence)
    sentence = re.sub(r'ß', 'ss', sentence)
    
    sentence = unicode_to_ascii(sentence)
    sentence = re.sub(r"([?.!,])", r" \1 ", sentence)
    sentence = re.sub(r"[^a-z?.!,']+", " ", sentence)
    sentence = re.sub(r'[" "]+', " ", sentence)
    
    return sentence.strip()

#### The custom translation model
The following is a schematic of the custom translation model architecture you will develop in this project.

![Model Schematic](data/neural_translation_model.png)

Key:
![Model key](data/neural_translation_model_key.png)

The custom model consists of an encoder RNN and a decoder RNN. The encoder takes words of an English sentence as input, and uses a pre-trained word embedding to embed the words into a 128-dimensional space. To indicate the end of the input sentence, a special end token (in the same 128-dimensional space) is passed in as an input. This token is a TensorFlow Variable that is learned in the training phase (unlike the pre-trained word embedding, which is frozen).

The decoder RNN takes the internal state of the encoder network as its initial state. A start token is passed in as the first input, which is embedded using a learned German word embedding. The decoder RNN then makes a prediction for the next German word, which during inference is then passed in as the following input, and this process is repeated until the special `<end>` token is emitted from the decoder.

## 1. Text preprocessing
* Create separate lists of English and German sentences, and preprocess them using the `preprocess_sentence` function provided for you above.
* Add a special `"<start>"` and `"<end>"` token to the beginning and end of every German sentence.
* Use the Tokenizer class from the `tf.keras.preprocessing.text` module to tokenize the German sentences, ensuring that no character filters are applied. _Hint: use the Tokenizer's "filter" keyword argument._
* Print out at least 5 randomly chosen examples of (preprocessed) English and German sentence pairs. For the German sentence, print out the text (with start and end tokens) as well as the tokenized sequence.
* Pad the end of the tokenized German sequences with zeros, and batch the complete set of sequences into one numpy array.

In [4]:
tokenizer = tf.keras.preprocessing.text.Tokenizer(filters='.!?')


In [5]:

eng_sent = []
ger_sent = []
for line in data_examples:
    tmp_sentence = re.split(r'[.!?]',preprocess_sentence(line))
    eng_sent.append(tmp_sentence[0])
    ger_sent.append('<start>' + tmp_sentence[1] + '<end>')
for t in [50, 73, 100]:
    t = int(t)
    print('English sentence: ', eng_sent[t],'\n')
    print('German Sentence: ', ger_sent[t],'\n')


English sentence:  no way  

German Sentence:  <start> unmoeglich <end> 

English sentence:  beat it  

German Sentence:  <start> hau ab <end> 

English sentence:  get out  

German Sentence:  <start> geh raus <end> 



In [6]:

tokenizer = tf.keras.preprocessing.text.Tokenizer(filters='')
tokenizer.fit_on_texts(ger_sent)
ger_sent = tokenizer.texts_to_sequences(ger_sent)
ger_padd_sent = tf.keras.preprocessing.sequence.pad_sequences(ger_sent,padding='post')

## 2. Prepare the data with tf.data.Dataset objects

#### Load the embedding layer
As part of the dataset preproceessing for this project, you will use a pre-trained English word embedding module from TensorFlow Hub. The URL for the module is https://tfhub.dev/google/tf2-preview/nnlm-en-dim128-with-normalization/1. This module has also been made available as a complete saved model in the folder `'./models/tf2-preview_nnlm-en-dim128_1'`. 

This embedding takes a batch of text tokens in a 1-D tensor of strings as input. It then embeds the separate tokens into a 128-dimensional space. 

The code to load and test the embedding layer is provided for you below.

**NB:** this model can also be used as a sentence embedding module. The module will process each token by removing punctuation and splitting on spaces. It then averages the word embeddings over a sentence to give a single embedding vector. However, we will use it only as a word embedding module, and will pass each word in the input sentence as a separate token.

In [7]:
# Load embedding module from Tensorflow Hub

embedding_layer = hub.KerasLayer("https://tfhub.dev/google/tf2-preview/nnlm-en-dim128/1", 
                                 output_shape=[128], input_shape=[], dtype=tf.string)



In [8]:
# Test the layer

embedding_layer(tf.constant(["these", "aren't", "the", "droids", "you're", "looking", "for"])).shape

TensorShape([7, 128])

You should now prepare the training and validation Datasets.

* Create a random training and validation set split of the data, reserving e.g. 20% of the data for validation (NB: each English dataset example is a single sentence string, and each German dataset example is a sequence of padded integer tokens).
* Load the training and validation sets into a tf.data.Dataset object, passing in a tuple of English and German data for both training and validation sets.
* Create a function to map over the datasets that splits each English sentence at spaces. Apply this function to both Dataset objects using the map method. _Hint: look at the tf.strings.split function._
* Create a function to map over the datasets that embeds each sequence of English words using' the loaded embedding layer/model. Apply this function to both Dataset objects using the map method.
* Create a function to filter out dataset examples where the English sentence is more than 13 (embedded) tokens in length. Apply this function to both Dataset objects using the filter method.
* Create a function to map over the datasets that pads each English sequence of embeddings with some distinct padding value before the sequence, so that each sequence is length 13. Apply this function to both Dataset objects using the map method. _Hint: look at the tf.pad function. You can extract a Tensor shape using tf.shape; you might also find the tf.math.maximum function useful._
* Batch both training and validation Datasets with a batch size of 16.
* Print the `element_spec` property for the training and validation Datasets. 
* Using the Dataset `.take(1)` method, print the shape of the English data example from the training Dataset.
* Using the Dataset `.take(1)` method, print the German data example Tensor from the validation Dataset.

In [9]:

eng_train, eng_test,ger_train, ger_test = train_test_split( eng_sent, ger_padd_sent, test_size=0.2)
dataset_val = tf.data.Dataset.from_tensor_slices((eng_test, ger_test))
dataset_train = tf.data.Dataset.from_tensor_slices((eng_train, ger_train))

In [10]:

def split_sentenence_space(dataset):
    def map_data(eng_sent, ger_sent):
        tmp_sent_list = tf.strings.split(eng_sent, sep=' ')
        return tmp_sent_list, ger_sent
    dataset = dataset.map(map_data)
    return dataset
dataset_train = split_sentenence_space(dataset_train)
dataset_val = split_sentenence_space(dataset_val)

In [11]:

def embed_sentenence(dataset):
    def map_data(eng_word, ger_sent):
        tmp_word_list = embedding_layer(eng_word)
        return tmp_word_list, ger_sent
    dataset = dataset.map(map_data)
    return dataset
embeded_dataset_train = embed_sentenence(dataset_train)
embeded_dataset_val = embed_sentenence(dataset_val)

In [12]:

def filter_long_length(dataset):
    def filter_fn(token, ger_sent):
        return tf.shape(token)[0]<= 13
    filtered_data = dataset.filter(filter_fn)
    return filtered_data
filtered_long_train_dataset = filter_long_length(embeded_dataset_train)
filtered_long_val_dataset = filter_long_length(embeded_dataset_val)


In [13]:

def filter_dataset(dataset):
    def pad_to_length_13(eng_emb, ger_sent):
        padding_value = -1  # You can set any distinct padding value
        padding_needed = tf.math.maximum(0, 13 - tf.shape(eng_emb)[0])
        padded_eng_emb = tf.pad(eng_emb, paddings=[[0, padding_needed], [0, 0]], constant_values=padding_value)
        return padded_eng_emb, ger_sent
    padded_dataset = dataset.map(pad_to_length_13)
    return padded_dataset 
filtered_train_dataset = filter_dataset(filtered_long_train_dataset)
filtered_val_dataset = filter_dataset(filtered_long_val_dataset)

In [14]:
batched_train_dataset = filtered_train_dataset.batch(16)
batched_val_dataset = filtered_val_dataset.batch(16)

In [15]:

# Print the shape of the English data example from the training dataset
sample_data_train = next(iter(batched_train_dataset.take(1)))
print("Shape of English Data Example (Training):", sample_data_train[0].shape)
sample_data_val = next(iter(batched_val_dataset.take(1)))
print("Shape of German Data Example (Validation):", sample_data_val[1].shape)

Shape of English Data Example (Training): (16, 13, 128)
Shape of German Data Example (Validation): (16, 13)


## 3. Create the custom layer
You will now create a custom layer to add the learned end token embedding to the encoder model:

![Encoder schematic](data/neural_translation_model_encoder.png)

You should now build the custom layer.
* Using layer subclassing, create a custom layer that takes a batch of English data examples from one of the Datasets, and adds a learned embedded ‘end’ token to the end of each sequence. 
* This layer should create a TensorFlow Variable (that will be learned during training) that is 128-dimensional (the size of the embedding space). _Hint: you may find it helpful in the call method to use the tf.tile function to replicate the end token embedding across every element in the batch._
* Using the Dataset `.take(1)` method, extract a batch of English data examples from the training Dataset and print the shape. Test the custom layer by calling the layer on the English data batch Tensor and print the resulting Tensor shape (the layer should increase the sequence length by one).

In [77]:
class Embedding_Layer_class(Layer):
    def __init__(self, **kwargs):
        super(Embedding_Layer_class, self).__init__(**kwargs)

        # Load embedding module from Tensorflow Hub
        self.embedding_layer = hub.KerasLayer("https://tfhub.dev/google/tf2-preview/nnlm-en-dim128/1",
                                          output_shape=[128], input_shape=[], dtype=tf.string)

        # Initialize end token embedding as a trainable variable
        self.end_token_embedding = tf.Variable(initial_value=tf.random.uniform([1, 128]), trainable=True)

    def call(self, inputs):
        batch_size = tf.shape(inputs)[0]  # Get batch size from inputs
        tiled_end_token_embedding = tf.tile(
            self.end_token_embedding,  # Use the trainable variable
            [batch_size, 1]  # Tile across the sequence length
        )
        output = tf.concat([inputs, tf.expand_dims(tiled_end_token_embedding, axis=1)], axis=1)
        return output

#class Embedding_Layer_class(Layer):
#    def __init__(self, **kwargs):
#        super(Embedding_Layer_class, self).__init__(**kwargs)
#
#    def build(self, input_shape):
#        # Load embedding module from Tensorflow Hub
#        self.embedding_layer = hub.KerasLayer("https://tfhub.dev/google/tf2-preview/nnlm-en-dim128/1", 
#                                              output_shape=[128], input_shape=[], dtype=tf.string)
#
#    def call(self, inputs):
#        # Load the 'end' token embedding
#        end_token_embedding = self.embedding_layer(tf.constant(['end']))
#        # Get the batch size
#        inputs = tf.convert_to_tensor(inputs)
#        batch_size = tf.shape(inputs)[0]
#        # Tile the 'end' token embedding across the sequence length and batch size
#        tiled_end_token_embedding = tf.tile(
#            end_token_embedding,  # Add batch dimension
#            [batch_size, 1]  # Tile across the sequence length
#        )
#        output = tf.concat([inputs, tf.expand_dims(tiled_end_token_embedding,axis=1)], axis=1)
#        return output
#

In [78]:

sample_data_batch_eng = next(iter(batched_train_dataset))[0]
embedded_layer = Embedding_Layer_class()
output_embedded_layer = embedded_layer(sample_data_batch_eng)

In [79]:
output_embedded_layer.shape

TensorShape([16, 14, 128])

## 4. Build the encoder network
The encoder network follows the schematic diagram above. You should now build the RNN encoder model.
* Using the functional API, build the encoder network according to the following spec:
    * The model will take a batch of sequences of embedded English words as input, as given by the Dataset objects.
    * The next layer in the encoder will be the custom layer you created previously, to add a learned end token embedding to the end of the English sequence.
    * This is followed by a Masking layer, with the `mask_value` set to the distinct padding value you used when you padded the English sequences with the Dataset preprocessing above.
    * The final layer is an LSTM layer with 512 units, which also returns the hidden and cell states.
    * The encoder is a multi-output model. There should be two output Tensors of this model: the hidden state and cell states of the LSTM layer. The output of the LSTM layer is unused.
* Using the Dataset `.take(1)` method, extract a batch of English data examples from the training Dataset and test the encoder model by calling it on the English data Tensor, and print the shape of the resulting Tensor outputs.
* Print the model summary for the encoder network.

In [81]:

class EncoderModel(tf.keras.Model):
    def __init__(self, embedding_dim=128):
        super().__init__()
        self.input_layer = tf.keras.layers.Input(shape=(None, 128))
        self.embedding = Embedding_Layer_class()
        self.masking_layer = tf.keras.layers.Masking(mask_value=-1)  # Replace with your padding value
        self.lstm_layer = tf.keras.layers.LSTM(512, return_sequences=True, return_state=True)

    def call(self, inputs):
        embedded_inputs = self.embedding(inputs)
        masked_inputs = self.masking_layer(embedded_inputs)
        _, hidden_state, cell_state = self.lstm_layer(masked_inputs)
        return hidden_state, cell_state


In [82]:

embedding_layer_class = Embedding_Layer_class()
sample_data_eng = next(iter(batched_train_dataset))[0]

encoder_class = EncoderModel()


In [83]:
o_1, o_2 = encoder_class(sample_data_batch_eng)
o_1.shape, o_2.shape

(TensorShape([16, 512]), TensorShape([16, 512]))

In [84]:
encoder_class.summary()

Model: "encoder_model_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding__layer_class_8 (E  multiple                 124642816 
 mbedding_Layer_class)                                           
                                                                 
 masking_4 (Masking)         multiple                  0         
                                                                 
 lstm_7 (LSTM)               multiple                  1312768   
                                                                 
Total params: 125,955,584
Trainable params: 1,312,896
Non-trainable params: 124,642,688
_________________________________________________________________


## 5. Build the decoder network
The decoder network follows the schematic diagram below. 

![Decoder schematic](data/neural_translation_model_decoder.png)

You should now build the RNN decoder model.
* Using Model subclassing, build the decoder network according to the following spec:
    * The initializer should create the following layers:
        * An Embedding layer with vocabulary size set to the number of unique German tokens, embedding dimension 128, and set to mask zero values in the input.
        * An LSTM layer with 512 units, that returns its hidden and cell states, and also returns sequences.
        * A Dense layer with number of units equal to the number of unique German tokens, and no activation function.
    * The call method should include the usual `inputs` argument, as well as the additional keyword arguments `hidden_state` and `cell_state`. The default value for these keyword arguments should be `None`.
    * The call method should pass the inputs through the Embedding layer, and then through the LSTM layer. If the `hidden_state` and `cell_state` arguments are provided, these should be used for the initial state of the LSTM layer. _Hint: use the_ `initial_state` _keyword argument when calling the LSTM layer on its input._
    * The call method should pass the LSTM output sequence through the Dense layer, and return the resulting Tensor, along with the hidden and cell states of the LSTM layer.
* Using the Dataset `.take(1)` method, extract a batch of English and German data examples from the training Dataset. Test the decoder model by first calling the encoder model on the English data Tensor to get the hidden and cell states, and then call the decoder model on the German data Tensor and hidden and cell states, and print the shape of the resulting decoder Tensor outputs.
* Print the model summary for the decoder network.

In [85]:

vocab_size_ger = len(tokenizer.word_index) + 1
class DecoderNetwork(tf.keras.Model):
    def __init__(self, vocab_size=vocab_size_ger, embedding_dim=128):
        super(DecoderNetwork, self).__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim, mask_zero=True)
        self.lstm = tf.keras.layers.LSTM(512, return_sequences=True, return_state=True)
        self.dense = tf.keras.layers.Dense(vocab_size)

    def call(self, inputs, hidden_state=None, cell_state=None):
        embedded_inputs = self.embedding(inputs)
        lstm_outputs, hidden_state, cell_state = self.lstm(embedded_inputs, initial_state=(hidden_state, cell_state))
        predictions = self.dense(lstm_outputs)
        return predictions, hidden_state, cell_state




In [86]:

data=next(iter(batched_train_dataset.take(1)))
english_data = data[0]
german_data = data[1]
data=next(iter(batched_train_dataset.take(1)))
english_data = data[0]
german_data = data[1]
encoder_hidden_state, encoder_cell_state = EncoderModel()(english_data)

In [87]:

decoder = DecoderNetwork()
# Call the decoder model on the German data Tensor
decoder_outputs, decoder_hidden_state, decoder_cell_state = decoder(german_data, hidden_state=encoder_hidden_state, cell_state=encoder_cell_state)
print(decoder_outputs.shape)

(16, 13, 5746)


## 6. Make a custom training loop
You should now write a custom training loop to train your custom neural translation model.
* Define a function that takes a Tensor batch of German data (as extracted from the training Dataset), and returns a tuple containing German inputs and outputs for the decoder model (refer to schematic diagram above).
* Define a function that computes the forward and backward pass for your translation model. This function should take an English input, German input and German output as arguments, and should do the following:
    * Pass the English input into the encoder, to get the hidden and cell states of the encoder LSTM.
    * These hidden and cell states are then passed into the decoder, along with the German inputs, which returns a sequence of outputs (the hidden and cell state outputs of the decoder LSTM are unused in this function).
    * The loss should then be computed between the decoder outputs and the German output function argument.
    * The function returns the loss and gradients with respect to the encoder and decoder’s trainable variables.
    * Decorate the function with @tf.function
* Define and run a custom training loop for a number of epochs (for you to choose) that does the following:
    * Iterates through the training dataset, and creates decoder inputs and outputs from the German sequences.
    * Updates the parameters of the translation model using the gradients of the function above and an optimizer object.
    * Every epoch, compute the validation loss on a number of batches from the validation and save the epoch training and validation losses.
* Plot the learning curves for loss vs epoch for both training and validation sets.

_Hint: This model is computationally demanding to train. The quality of the model or length of training is not a factor in the grading rubric. However, to obtain a better model we recommend using the GPU accelerator hardware on Colab._

In [88]:

def german_outs(german_data):
    german_input = german_data[:,:-1]
    german_output = german_data[:,1:]
    return german_input, german_output

In [89]:

optimizer = tf.keras.optimizers.RMSprop()
decoder_model = DecoderNetwork()
encoder_model = EncoderModel()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

In [106]:

#@tf.function
def forward_backward_pass(english_input, german_input, german_output, loss_obj, encoder_obj, decoder_obj):
    with tf.GradientTape() as tape:
        hidden_state, cell_state = encoder_obj(english_input)

        decoder_outputs, _, _ = decoder_obj(german_input, hidden_state=hidden_state, cell_state=cell_state)

    # Cast decoder_outputs to the data type of german_output

    #german_output = tf.cast(german_output, dtype=decoder_outputs.dtype)
    #all_variables = tf.nest.flatten(encoder_obj.trainable_variables + decoder_obj.trainable_variables)
        print('trainable variables:', tf.nest.flatten(encoder_obj.trainable_variables + decoder_obj.trainable_variables))


    # Compute gradients manually using tf.gradient
        loss_value = loss_obj(german_output, decoder_outputs)
        mean_loss = tf.reduce_mean(loss_value)
        #print('mean loss:', mean_loss)
        gradient = [tape.gradient(mean_loss, var) for var in encoder_obj.trainable_variables + decoder_obj.trainable_variables]

#        gradient = tape.gradient(mean_loss, tf.nest.flatten(encoder_obj.trainable_variables + decoder_obj.trainable_variables))

        for grad, var in zip(gradient, tf.nest.flatten(encoder_obj.trainable_variables + decoder_obj.trainable_variables)):
            print(f'Gradient for {var.name}: {grad}')
    return mean_loss, gradient

In [109]:

def training_loop(dataset, val_dataset, epochs, grad_fn, loss_obj, encoder_obj, decoder_obj, optim_obj):
    train_losses = []    
    val_losses = []

    for epoch in range(epochs):
        epoch_loss_avg = tf.keras.metrics.Mean()
        print('epoch:', epoch)
        for eng_data, german_data in dataset:
            germ_input, germ_output = german_outs(german_data=german_data)

            mean_loss, gradient = grad_fn(eng_data, germ_input, germ_output, loss_obj, encoder_obj, decoder_obj)
            #gradient = [tf.convert_to_tensor(g) if g is not None else None for g in gradient]
            optim_obj.apply_gradients(zip(gradient, tf.nest.flatten(encoder_obj.trainable_variables + decoder_obj.trainable_variables)))
            
            epoch_loss_avg(mean_loss)

        train_losses.append(epoch_loss_avg.result())

        # Validation
        val_loss_avg = tf.keras.metrics.Mean()
        for val_eng_data, val_german_data in val_dataset:
            val_germ_input, val_germ_output = german_outs(val_german_data)

            val_mean_loss, _ = grad_fn(val_eng_data, val_germ_input, val_germ_output, loss_obj, encoder_obj, decoder_obj)

            val_loss_avg(val_mean_loss)

        val_losses.append(val_loss_avg.result())

        print("Epoch {}: Training Loss: {:.4f}, Validation Loss: {:.4f}".format(epoch, epoch_loss_avg.result(), val_loss_avg.result()))

    return train_losses, val_losses


In [110]:

loss_train, loss_val = training_loop(batched_train_dataset, batched_val_dataset, epochs=1,
                                      loss_obj=loss_fn, grad_fn=forward_backward_pass, encoder_obj=encoder_model,decoder_obj=decoder_model, optim_obj=optimizer)

epoch: 0
trainable variables: [<tf.Variable 'Variable:0' shape=(1, 128) dtype=float32, numpy=
array([[0.41994023, 0.13174438, 0.30766904, 0.3508879 , 0.12855184,
        0.10586417, 0.8486258 , 0.9067781 , 0.93516874, 0.10087538,
        0.8060719 , 0.979203  , 0.5533979 , 0.32789028, 0.05007434,
        0.773085  , 0.8148911 , 0.0137924 , 0.96738994, 0.7330539 ,
        0.9520906 , 0.03202343, 0.9939234 , 0.02844405, 0.9036627 ,
        0.44223547, 0.5123595 , 0.5743748 , 0.52296424, 0.7613176 ,
        0.5420103 , 0.35760176, 0.9791    , 0.24486768, 0.9679347 ,
        0.703635  , 0.12343478, 0.64792335, 0.24930716, 0.621333  ,
        0.24185729, 0.93071055, 0.37672257, 0.4831221 , 0.00672877,
        0.0384115 , 0.6450989 , 0.57856834, 0.4816606 , 0.57952726,
        0.7033075 , 0.13766134, 0.86197543, 0.9280298 , 0.09775603,
        0.77230203, 0.01409829, 0.62153363, 0.44402707, 0.700397  ,
        0.7227044 , 0.92150736, 0.08288491, 0.7559185 , 0.8141806 ,
        0.23712623, 0.

ValueError: `labels.shape` must equal `logits.shape` except for the last dimension. Received: labels.shape=(16, 12) and logits.shape=(16, 13, 5746)

In [None]:
import matplotlib.pyplot as plt

plt.plot(loss_train, label='Training')
plt.plot(loss_val, label='Validation')
plt.xlabel('epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

## 7. Use the model to translate
Now it's time to put your model into practice! You should run your translation for five randomly sampled English sentences from the dataset. For each sentence, the process is as follows:
* Preprocess and embed the English sentence according to the model requirements.
* Pass the embedded sentence through the encoder to get the encoder hidden and cell states.
* Starting with the special  `"<start>"` token, use this token and the final encoder hidden and cell states to get the one-step prediction from the decoder, as well as the decoder’s updated hidden and cell states.
* Create a loop to get the next step prediction and updated hidden and cell states from the decoder, using the most recent hidden and cell states. Terminate the loop when the `"<end>"` token is emitted, or when the sentence has reached a maximum length.
* Decode the output token sequence into German text and print the English text and the model's German translation.

In [None]:
# Function to preprocess and embed the English sentence
def preprocess_and_embed(sentence):
    sentence = preprocess_sentence(sentence)
    sentence = tf.strings.split(sentence, sep=' ')
    embedded_sentence = embedding_layer(sentence)
    return tf.expand_dims(embedded_sentence, axis=0)


In [None]:

# Function to translate an English sentence to German using the trained model
def translate_sentence(encoder, decoder, sentence):
    # Preprocess and embed the English sentence
    embedded_sentence = preprocess_and_embed(sentence)
    
    # Pass the embedded sentence through the encoder to get hidden and cell states
    encoder_hidden, encoder_cell = encoder(embedded_sentence)
    
    # Initialize the decoder input with the '<start>' token
    decoder_input = tf.constant(tokenizer.texts_to_sequences(['<start>']), dtype=tf.float32)
    
    # Initialize an empty list to store the decoded words
    decoded_words = []
    
    # Create a loop to generate the translation
    for _ in range(13):
        # Pass the decoder input and encoder states through the decoder
        predictions, decoder_hidden, decoder_cell = decoder(decoder_input, hidden_state=encoder_hidden, cell_state=encoder_cell)
        
        # Get the predicted word index
        predicted_word_index = tf.argmax(predictions, axis=-1).numpy()[0, -1]
        
        # If the predicted word is '<end>', terminate the loop
        if predicted_word_index == tokenizer.word_index['<end>']:
            break
        
        # Convert the predicted word index to its corresponding word
        predicted_word = tokenizer.index_word[predicted_word_index]
        
        # Append the predicted word to the list of decoded words
        decoded_words.append(predicted_word)
        
        # Update the decoder input for the next iteration
        decoder_input = tf.constant(tokenizer.texts_to_sequences([predicted_word]), dtype=tf.float32)
    
    # Join the decoded words to form the German translation
    german_translation = ' '.join(decoded_words)
    
    return german_translation


In [None]:
# Choose five random indices from the validation dataset
random_indices = np.random.choice(len(eng_test), size=5, replace=False)

In [None]:

# Translate and print the English sentence along with the model's German translation
for index in random_indices:
    english_sentence = eng_test[index]
    german_translation = translate_sentence(encoder_model, decoder_model, english_sentence)
    
    print("English Sentence:", english_sentence)
    print("German Translation:", german_translation)
    print()
