# Artificial Intelligence Nanodegree
## Machine Translation Project
In this notebook, sections that end with **'(IMPLEMENTATION)'** in the header indicate that the following blocks of code will require additional functionality which you must provide. Please be sure to read the instructions carefully!

## Introduction
In this notebook, you will build a deep neural network that functions as part of an end-to-end machine translation pipeline. Your completed pipeline will accept English text as input and return the French translation.

- **Preprocess** - You'll convert text to sequence of integers.
- **Models** Create models which accepts a sequence of integers as input and returns a probability distribution over possible translations. After learning about the basic types of neural networks that are often used for machine translation, you will engage in your own investigations, to design your own model!
- **Prediction** Run the model on English text.

## Dataset
We begin by investigating the dataset that will be used to train and evaluate your pipeline.  The most common datasets used for machine translation are from [WMT](http://www.statmt.org/).  However, that will take a long time to train a neural network on.  We'll be using a dataset we created for this project that contains a small vocabulary.  You'll be able to train your model in a reasonable time with this dataset.
### Load Data
The data is located in `data/small_vocab_en` and `data/small_vocab_fr`. The `small_vocab_en` file contains English sentences with their French translations in the `small_vocab_fr` file. Load the English and French data from these files from running the cell below.

In [1]:
import helper


# Load English data
english_sentences = helper.load_data('data/small_vocab_en')
# Load French data
french_sentences = helper.load_data('data/small_vocab_fr')

print('Dataset Loaded')

Dataset Loaded


### Files
Each line in `small_vocab_en` contains an English sentence with the respective translation in each line of `small_vocab_fr`.  View the first two lines from each file.

In [2]:
for sample_i in range(2):
    print('small_vocab_en Line {}:  {}'.format(sample_i + 1, english_sentences[sample_i]))
    print('small_vocab_fr Line {}:  {}'.format(sample_i + 1, french_sentences[sample_i]))

small_vocab_en Line 1:  new jersey is sometimes quiet during autumn , and it is snowy in april .
small_vocab_fr Line 1:  new jersey est parfois calme pendant l' automne , et il est neigeux en avril .
small_vocab_en Line 2:  the united states is usually chilly during july , and it is usually freezing in november .
small_vocab_fr Line 2:  les états-unis est généralement froid en juillet , et il gèle habituellement en novembre .


From looking at the sentences, you can see they have been preprocessed already.  The puncuations have been delimited using spaces. All the text have been converted to lowercase.  This should save you some time, but the text requires more preprocessing.
### Vocabulary
The complexity of the problem is determined by the complexity of the vocabulary.  A more complex vocabulary is a more complex problem.  Let's look at the complexity of the dataset we'll be working with.

In [3]:
import collections


english_words_counter = collections.Counter([word for sentence in english_sentences for word in sentence.split()])
french_words_counter = collections.Counter([word for sentence in french_sentences for word in sentence.split()])

print('{} English words.'.format(len([word for sentence in english_sentences for word in sentence.split()])))
print('{} unique English words.'.format(len(english_words_counter)))
print('10 Most common words in the English dataset:')
print('"' + '" "'.join(list(zip(*english_words_counter.most_common(10)))[0]) + '"')
print()
print('{} French words.'.format(len([word for sentence in french_sentences for word in sentence.split()])))
print('{} unique French words.'.format(len(french_words_counter)))
print('10 Most common words in the French dataset:')
print('"' + '" "'.join(list(zip(*french_words_counter.most_common(10)))[0]) + '"')

1823250 English words.
227 unique English words.
10 Most common words in the English dataset:
"is" "," "." "in" "it" "during" "the" "but" "and" "sometimes"

1961295 French words.
355 unique French words.
10 Most common words in the French dataset:
"est" "." "," "en" "il" "les" "mais" "et" "la" "parfois"


For comparison, _Alice's Adventures in Wonderland_ contains 2,766 unique words of a total of 15,500 words.
## Preprocess
For this project, you won't use text data as input to your model. Instead, you'll convert the text into sequences of integers using the following preprocess methods:
1. Tokenize the words into ids
2. Add padding to make all the sequences the same length.

Time to start preprocessing the data...
### Tokenize (IMPLEMENTATION)
For a neural network to predict on text data, it first has to be turned into data it can understand. Text data like "dog" is a sequence of ASCII character encodings.  Since a neural network is a series of multiplication and addition operations, the input data needs to be number(s).

We can turn each character into a number or each word into a number.  These are called character and word ids, respectively.  Character ids are used for character level models that generate text predictions for each character.  A word level model uses word ids that generate text predictions for each word.  Word level models tend to learn better, since they are lower in complexity, so we'll use those.

Turn each sentence into a sequence of words ids using Keras's [`Tokenizer`](https://keras.io/preprocessing/text/#tokenizer) function. Use this function to tokenize `english_sentences` and `french_sentences` in the cell below.

Running the cell will run `tokenize` on sample data and show output for debugging.

In [11]:
import project_tests as tests
from keras.preprocessing.text import Tokenizer



def tokenize(x):
    """
    Tokenize x
    :param x: List of sentences/strings to be tokenized
    :return: Tuple of (tokenized x data, tokenizer used to tokenize x)
    """
    
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(x)

        
    return tokenizer.texts_to_sequences(x), tokenizer

tests.test_tokenize(tokenize)

# Tokenize Example output
text_sentences = [
    'The quick brown fox jumps over the lazy dog .',
    'By Jove , my quick study of lexicography won a prize .',
    'This is a short sentence .']
text_tokenized, text_tokenizer = tokenize(text_sentences)
print(text_tokenizer.word_index)
print()
for sample_i, (sent, token_sent) in enumerate(zip(text_sentences, text_tokenized)):
    print('Sequence {} in x'.format(sample_i + 1))
    print('  Input:  {}'.format(sent))
    print('  Output: {}'.format(token_sent))

{'the': 1, 'quick': 2, 'a': 3, 'brown': 4, 'fox': 5, 'jumps': 6, 'over': 7, 'lazy': 8, 'dog': 9, 'by': 10, 'jove': 11, 'my': 12, 'study': 13, 'of': 14, 'lexicography': 15, 'won': 16, 'prize': 17, 'this': 18, 'is': 19, 'short': 20, 'sentence': 21}

Sequence 1 in x
  Input:  The quick brown fox jumps over the lazy dog .
  Output: [1, 2, 4, 5, 6, 7, 1, 8, 9]
Sequence 2 in x
  Input:  By Jove , my quick study of lexicography won a prize .
  Output: [10, 11, 12, 2, 13, 14, 15, 16, 3, 17]
Sequence 3 in x
  Input:  This is a short sentence .
  Output: [18, 19, 3, 20, 21]


### Padding (IMPLEMENTATION)
When batching the sequence of word ids together, each sequence needs to be the same length.  Since sentences are dynamic in length, we can add padding to the end of the sequences to make them the same length.

Make sure all the English sequences have the same length and all the French sequences have the same length by adding padding to the **end** of each sequence using Keras's [`pad_sequences`](https://keras.io/preprocessing/sequence/#pad_sequences) function.

In [13]:
import numpy as np
from keras.preprocessing.sequence import pad_sequences


def pad(x, length=None):
    """
    Pad x
    :param x: List of sequences.
    :param length: Length to pad the sequence to.  If None, use length of longest sequence in x.
    :return: Padded numpy array of sequences
    """
    # TODO: Implement
    return pad_sequences(x, maxlen=length,padding='post')
tests.test_pad(pad)

# Pad Tokenized output
test_pad = pad(text_tokenized)
for sample_i, (token_sent, pad_sent) in enumerate(zip(text_tokenized, test_pad)):
    print('Sequence {} in x'.format(sample_i + 1))
    print('  Input:  {}'.format(np.array(token_sent)))
    print('  Output: {}'.format(pad_sent))

Sequence 1 in x
  Input:  [1 2 4 5 6 7 1 8 9]
  Output: [1 2 4 5 6 7 1 8 9 0]
Sequence 2 in x
  Input:  [10 11 12  2 13 14 15 16  3 17]
  Output: [10 11 12  2 13 14 15 16  3 17]
Sequence 3 in x
  Input:  [18 19  3 20 21]
  Output: [18 19  3 20 21  0  0  0  0  0]


### Preprocess Pipeline
Your focus for this project is to build neural network architecture, so we won't ask you to create a preprocess pipeline.  Instead, we've provided you with the implementation of the `preprocess` function.

In [14]:
def preprocess(x, y):
    """
    Preprocess x and y
    :param x: Feature List of sentences
    :param y: Label List of sentences
    :return: Tuple of (Preprocessed x, Preprocessed y, x tokenizer, y tokenizer)
    """
    preprocess_x, x_tk = tokenize(x)
    preprocess_y, y_tk = tokenize(y)

    preprocess_x = pad(preprocess_x)
    preprocess_y = pad(preprocess_y)

    # Keras's sparse_categorical_crossentropy function requires the labels to be in 3 dimensions
    preprocess_y = preprocess_y.reshape(*preprocess_y.shape, 1)

    return preprocess_x, preprocess_y, x_tk, y_tk

preproc_english_sentences, preproc_french_sentences, english_tokenizer, french_tokenizer =\
    preprocess(english_sentences, french_sentences)

print('Data Preprocessed')

Data Preprocessed


## Models
In this section, you will experiment with various neural network architectures.
You will begin by training four relatively simple architectures.
- Model 1 is a simple RNN
- Model 2 is a RNN with Embedding
- Model 3 is a Bidirectional RNN
- Model 4 is an optional Encoder-Decoder RNN

After experimenting with the four simple architectures, you will construct a deeper architecture that is designed to outperform all four models.
### Ids Back to Text
The neural network will be translating the input to words ids, which isn't the final form we want.  We want the French translation.  The function `logits_to_text` will bridge the gab between the logits from the neural network to the French translation.  You'll be using this function to better understand the output of the neural network.

In [15]:
def logits_to_text(logits, tokenizer):
    """
    Turn logits from a neural network into text using the tokenizer
    :param logits: Logits from a neural network
    :param tokenizer: Keras Tokenizer fit on the labels
    :return: String that represents the text of the logits
    """
    index_to_words = {id: word for word, id in tokenizer.word_index.items()}
    index_to_words[0] = '<PAD>'

    return ' '.join([index_to_words[prediction] for prediction in np.argmax(logits, 1)])

print('`logits_to_text` function loaded.')

`logits_to_text` function loaded.


### Model 1: RNN (IMPLEMENTATION)
![RNN](images/rnn.png)
A basic RNN model is a good baseline for sequence data.  In this model, you'll build a RNN that translates English to French.

In [19]:
from keras.layers import GRU, Input, Dense, TimeDistributed
from keras.models import Model
from keras.layers import Activation
from keras.optimizers import Adam
from keras.losses import sparse_categorical_crossentropy


def simple_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
    """
    Build and train a basic RNN on x and y
    :param input_shape: Tuple of input shape
    :param output_sequence_length: Length of output sequence
    :param english_vocab_size: Number of unique English words in the dataset
    :param french_vocab_size: Number of unique French words in the dataset
    :return: Keras model built, but not trained
    """
    # TODO: Build the layers
    
    input_ = Input(input_shape[1:])

    x = GRU(100,return_sequences=True)(input_)
    x = Dense(french_vocab_size, activation="softmax")(x)

    model = Model(input_, x)
    
    model.compile(loss=sparse_categorical_crossentropy,
                  optimizer=Adam(0.001),
                  metrics=['accuracy'])
    return model
tests.test_simple_model(simple_model)


# Reshaping the input to work with a basic RNN
tmp_x = pad(preproc_english_sentences, preproc_french_sentences.shape[1])
tmp_x = tmp_x.reshape((-1, preproc_french_sentences.shape[-2], 1))

# Train the neural network
simple_rnn_model = simple_model(
    tmp_x.shape,
    preproc_french_sentences.shape[1],
    len(english_tokenizer.word_index),
    len(french_tokenizer.word_index))
simple_rnn_model.fit(tmp_x, preproc_french_sentences, batch_size=1024, epochs=10, validation_split=0.2)

# Print prediction(s)
print(logits_to_text(simple_rnn_model.predict(tmp_x[:1])[0], french_tokenizer))

Train on 110288 samples, validate on 27573 samples
Epoch 1/10

  1024/110288 [..............................] - ETA: 10:17 - loss: 5.8756 - acc: 3.2552e-04
  2048/110288 [..............................] - ETA: 5:09 - loss: 5.8424 - acc: 0.0910     
  4096/110288 [>.............................] - ETA: 2:33 - loss: 5.7852 - acc: 0.2310
  6144/110288 [>.............................] - ETA: 1:41 - loss: 5.7281 - acc: 0.2913
  8192/110288 [=>............................] - ETA: 1:15 - loss: 5.6673 - acc: 0.3225
 10240/110288 [=>............................] - ETA: 1:00 - loss: 5.6028 - acc: 0.3427
 12288/110288 [==>...........................] - ETA: 49s - loss: 5.5306 - acc: 0.3555 
 14336/110288 [==>...........................] - ETA: 42s - loss: 5.4506 - acc: 0.3640
 16384/110288 [===>..........................] - ETA: 36s - loss: 5.3581 - acc: 0.3704
 18432/110288 [====>.........................] - ETA: 32s - loss: 5.2520 - acc: 0.3751
 20480/110288 [====>.........................] - E

Epoch 3/10

  1024/110288 [..............................] - ETA: 3s - loss: 2.0302 - acc: 0.5240
  3072/110288 [..............................] - ETA: 3s - loss: 2.0355 - acc: 0.5240
  5120/110288 [>.............................] - ETA: 3s - loss: 2.0289 - acc: 0.5254
  7168/110288 [>.............................] - ETA: 3s - loss: 2.0214 - acc: 0.5281
  9216/110288 [=>............................] - ETA: 3s - loss: 2.0148 - acc: 0.5298
 11264/110288 [==>...........................] - ETA: 3s - loss: 2.0118 - acc: 0.5305
 13312/110288 [==>...........................] - ETA: 3s - loss: 2.0096 - acc: 0.5315
 15360/110288 [===>..........................] - ETA: 3s - loss: 2.0078 - acc: 0.5320
 17408/110288 [===>..........................] - ETA: 3s - loss: 2.0036 - acc: 0.5333
 19456/110288 [====>.........................] - ETA: 3s - loss: 2.0005 - acc: 0.5341
 21504/110288 [====>.........................] - ETA: 3s - loss: 1.9983 - acc: 0.5346
 23552/110288 [=====>.....................

Epoch 5/10

  1024/110288 [..............................] - ETA: 3s - loss: 1.6072 - acc: 0.5928
  3072/110288 [..............................] - ETA: 3s - loss: 1.5976 - acc: 0.5962
  5120/110288 [>.............................] - ETA: 3s - loss: 1.5928 - acc: 0.5965
  7168/110288 [>.............................] - ETA: 3s - loss: 1.5952 - acc: 0.5951
  9216/110288 [=>............................] - ETA: 3s - loss: 1.5929 - acc: 0.5957
 11264/110288 [==>...........................] - ETA: 3s - loss: 1.5920 - acc: 0.5956
 13312/110288 [==>...........................] - ETA: 3s - loss: 1.5908 - acc: 0.5955
 15360/110288 [===>..........................] - ETA: 3s - loss: 1.5891 - acc: 0.5953
 17408/110288 [===>..........................] - ETA: 3s - loss: 1.5885 - acc: 0.5950
 19456/110288 [====>.........................] - ETA: 3s - loss: 1.5879 - acc: 0.5947
 21504/110288 [====>.........................] - ETA: 3s - loss: 1.5869 - acc: 0.5945
 23552/110288 [=====>.....................

 13312/110288 [==>...........................] - ETA: 3s - loss: 1.4963 - acc: 0.6077
 15360/110288 [===>..........................] - ETA: 3s - loss: 1.4952 - acc: 0.6079
 17408/110288 [===>..........................] - ETA: 3s - loss: 1.4945 - acc: 0.6079
 19456/110288 [====>.........................] - ETA: 3s - loss: 1.4946 - acc: 0.6076
 21504/110288 [====>.........................] - ETA: 3s - loss: 1.4943 - acc: 0.6077
 23552/110288 [=====>........................] - ETA: 2s - loss: 1.4929 - acc: 0.6078
 25600/110288 [=====>........................] - ETA: 2s - loss: 1.4913 - acc: 0.6081
Epoch 7/10

  1024/110288 [..............................] - ETA: 4s - loss: 1.4417 - acc: 0.6157
  3072/110288 [..............................] - ETA: 3s - loss: 1.4467 - acc: 0.6139
  5120/110288 [>.............................] - ETA: 3s - loss: 1.4410 - acc: 0.6147
  7168/110288 [>.............................] - ETA: 3s - loss: 1.4369 - acc: 0.6157
  9216/110288 [=>.........................

Epoch 8/10

  1024/110288 [..............................] - ETA: 3s - loss: 1.3607 - acc: 0.6301
  3072/110288 [..............................] - ETA: 3s - loss: 1.3675 - acc: 0.6276
  5120/110288 [>.............................] - ETA: 3s - loss: 1.3717 - acc: 0.6254
  7168/110288 [>.............................] - ETA: 3s - loss: 1.3735 - acc: 0.6247
  9216/110288 [=>............................] - ETA: 3s - loss: 1.3721 - acc: 0.6255
 11264/110288 [==>...........................] - ETA: 3s - loss: 1.3760 - acc: 0.6246
 13312/110288 [==>...........................] - ETA: 3s - loss: 1.3765 - acc: 0.6246
 15360/110288 [===>..........................] - ETA: 3s - loss: 1.3729 - acc: 0.6255
 17408/110288 [===>..........................] - ETA: 3s - loss: 1.3717 - acc: 0.6258
 19456/110288 [====>.........................] - ETA: 3s - loss: 1.3693 - acc: 0.6264
 21504/110288 [====>.........................] - ETA: 3s - loss: 1.3700 - acc: 0.6263
 23552/110288 [=====>.....................

Epoch 10/10

  1024/110288 [..............................] - ETA: 4s - loss: 1.2784 - acc: 0.6428
  3072/110288 [..............................] - ETA: 3s - loss: 1.2757 - acc: 0.6441
  5120/110288 [>.............................] - ETA: 3s - loss: 1.2761 - acc: 0.6449
  7168/110288 [>.............................] - ETA: 3s - loss: 1.2728 - acc: 0.6459
  9216/110288 [=>............................] - ETA: 3s - loss: 1.2703 - acc: 0.6462
 11264/110288 [==>...........................] - ETA: 3s - loss: 1.2716 - acc: 0.6457
 13312/110288 [==>...........................] - ETA: 3s - loss: 1.2719 - acc: 0.6453
 15360/110288 [===>..........................] - ETA: 3s - loss: 1.2741 - acc: 0.6450
 17408/110288 [===>..........................] - ETA: 3s - loss: 1.2731 - acc: 0.6452
 19456/110288 [====>.........................] - ETA: 3s - loss: 1.2730 - acc: 0.6448
 21504/110288 [====>.........................] - ETA: 3s - loss: 1.2712 - acc: 0.6449
 23552/110288 [=====>....................

### Model 2: Embedding (IMPLEMENTATION)
![RNN](images/embedding.png)
You've turned the words into ids, but there's a better representation of a word.  This is called word embeddings.  An embedding is a vector representation of the word that is close to similar words in n-dimensional space, where the n represents the size of the embedding vectors.

In this model, you'll create a RNN model using embedding.

In [22]:
from keras.layers.embeddings import Embedding


def embed_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
    """
    Build and train a RNN model using word embedding on x and y
    :param input_shape: Tuple of input shape
    :param output_sequence_length: Length of output sequence
    :param english_vocab_size: Number of unique English words in the dataset
    :param french_vocab_size: Number of unique French words in the dataset
    :return: Keras model built, but not trained
    """
    # TODO: Implement
    input_ = Input(input_shape[1:])
    x = Embedding(english_vocab_size, 20)(input_)
    x = GRU(100,return_sequences=True)(x)
    x = Dense(french_vocab_size, activation="softmax")(x)

    model = Model(input_, x)
    
    model.compile(loss=sparse_categorical_crossentropy,
                  optimizer=Adam(0.001),
                  metrics=['accuracy'])
    return model
tests.test_embed_model(embed_model)


# TODO: Reshape the input
tmp_x = pad(preproc_english_sentences, preproc_french_sentences.shape[1])
tmp_x = tmp_x.reshape((-1, preproc_french_sentences.shape[-2]))


# TODO: Train the neural network
embed_rnn_model = embed_model(
    tmp_x.shape,
    preproc_french_sentences.shape[1],
    len(english_tokenizer.word_index)+1,
    len(french_tokenizer.word_index)+1)
embed_rnn_model.fit(tmp_x, preproc_french_sentences, batch_size=1024, epochs=10, validation_split=0.2)

# TODO: Print prediction(s)
print(logits_to_text(embed_rnn_model.predict(tmp_x[:1])[0], french_tokenizer))

Train on 110288 samples, validate on 27573 samples
Epoch 1/10

  1024/110288 [..............................] - ETA: 51s - loss: 5.8439 - acc: 8.8356e-04
  2048/110288 [..............................] - ETA: 28s - loss: 5.8412 - acc: 0.0021    
  4096/110288 [>.............................] - ETA: 16s - loss: 5.8351 - acc: 0.2028
  6144/110288 [>.............................] - ETA: 11s - loss: 5.8281 - acc: 0.2725
  8192/110288 [=>............................] - ETA: 9s - loss: 5.8201 - acc: 0.3071 
 10240/110288 [=>............................] - ETA: 8s - loss: 5.8108 - acc: 0.3271
 12288/110288 [==>...........................] - ETA: 7s - loss: 5.7994 - acc: 0.3411
 14336/110288 [==>...........................] - ETA: 6s - loss: 5.7855 - acc: 0.3505
 16384/110288 [===>..........................] - ETA: 6s - loss: 5.7677 - acc: 0.3574
 18432/110288 [====>.........................] - ETA: 5s - loss: 5.7434 - acc: 0.3630
 20480/110288 [====>.........................] - ETA: 5s - loss:

Epoch 3/10

  1024/110288 [..............................] - ETA: 4s - loss: 2.5390 - acc: 0.4608
  3072/110288 [..............................] - ETA: 4s - loss: 2.5283 - acc: 0.4639
  5120/110288 [>.............................] - ETA: 3s - loss: 2.5220 - acc: 0.4642
  7168/110288 [>.............................] - ETA: 3s - loss: 2.5230 - acc: 0.4637
  9216/110288 [=>............................] - ETA: 3s - loss: 2.5152 - acc: 0.4643
 11264/110288 [==>...........................] - ETA: 3s - loss: 2.5144 - acc: 0.4640
 13312/110288 [==>...........................] - ETA: 3s - loss: 2.5115 - acc: 0.4641
 15360/110288 [===>..........................] - ETA: 3s - loss: 2.5091 - acc: 0.4640
 17408/110288 [===>..........................] - ETA: 3s - loss: 2.5069 - acc: 0.4640
 19456/110288 [====>.........................] - ETA: 3s - loss: 2.5056 - acc: 0.4636
 21504/110288 [====>.........................] - ETA: 3s - loss: 2.5017 - acc: 0.4637
 23552/110288 [=====>.....................

Epoch 5/10

  1024/110288 [..............................] - ETA: 4s - loss: 1.6998 - acc: 0.5910
  3072/110288 [..............................] - ETA: 3s - loss: 1.7090 - acc: 0.5905
  5120/110288 [>.............................] - ETA: 3s - loss: 1.7018 - acc: 0.5945
  7168/110288 [>.............................] - ETA: 3s - loss: 1.6986 - acc: 0.5956
  9216/110288 [=>............................] - ETA: 3s - loss: 1.6948 - acc: 0.5973
 11264/110288 [==>...........................] - ETA: 3s - loss: 1.6888 - acc: 0.5986
 13312/110288 [==>...........................] - ETA: 3s - loss: 1.6861 - acc: 0.5997
 15360/110288 [===>..........................] - ETA: 3s - loss: 1.6861 - acc: 0.5994
 17408/110288 [===>..........................] - ETA: 3s - loss: 1.6827 - acc: 0.5998
 19456/110288 [====>.........................] - ETA: 3s - loss: 1.6796 - acc: 0.6006
 21504/110288 [====>.........................] - ETA: 3s - loss: 1.6750 - acc: 0.6008
 23552/110288 [=====>.....................

 13312/110288 [==>...........................] - ETA: 3s - loss: 1.4330 - acc: 0.6400
 15360/110288 [===>..........................] - ETA: 3s - loss: 1.4315 - acc: 0.6403
 17408/110288 [===>..........................] - ETA: 3s - loss: 1.4303 - acc: 0.6407
 19456/110288 [====>.........................] - ETA: 3s - loss: 1.4276 - acc: 0.6414
 21504/110288 [====>.........................] - ETA: 3s - loss: 1.4251 - acc: 0.6421
 23552/110288 [=====>........................] - ETA: 3s - loss: 1.4234 - acc: 0.6424
 25600/110288 [=====>........................] - ETA: 3s - loss: 1.4219 - acc: 0.6425
Epoch 7/10

  1024/110288 [..............................] - ETA: 4s - loss: 1.2884 - acc: 0.6745
  3072/110288 [..............................] - ETA: 4s - loss: 1.2817 - acc: 0.6740
  5120/110288 [>.............................] - ETA: 3s - loss: 1.2802 - acc: 0.6741
  7168/110288 [>.............................] - ETA: 3s - loss: 1.2736 - acc: 0.6757
  9216/110288 [=>.........................

Epoch 8/10

  1024/110288 [..............................] - ETA: 4s - loss: 1.1654 - acc: 0.7157
  3072/110288 [..............................] - ETA: 4s - loss: 1.1442 - acc: 0.7193
  5120/110288 [>.............................] - ETA: 4s - loss: 1.1306 - acc: 0.7214
  7168/110288 [>.............................] - ETA: 3s - loss: 1.1322 - acc: 0.7218
  9216/110288 [=>............................] - ETA: 3s - loss: 1.1274 - acc: 0.7223
 11264/110288 [==>...........................] - ETA: 3s - loss: 1.1253 - acc: 0.7229
 13312/110288 [==>...........................] - ETA: 3s - loss: 1.1242 - acc: 0.7230
 15360/110288 [===>..........................] - ETA: 3s - loss: 1.1219 - acc: 0.7235
 17408/110288 [===>..........................] - ETA: 3s - loss: 1.1207 - acc: 0.7237
 19456/110288 [====>.........................] - ETA: 3s - loss: 1.1191 - acc: 0.7240
 21504/110288 [====>.........................] - ETA: 3s - loss: 1.1161 - acc: 0.7249
 23552/110288 [=====>.....................

Epoch 10/10

  1024/110288 [..............................] - ETA: 4s - loss: 0.8730 - acc: 0.7753
  3072/110288 [..............................] - ETA: 4s - loss: 0.8693 - acc: 0.7754
  5120/110288 [>.............................] - ETA: 3s - loss: 0.8680 - acc: 0.7750
  7168/110288 [>.............................] - ETA: 3s - loss: 0.8706 - acc: 0.7744
  9216/110288 [=>............................] - ETA: 3s - loss: 0.8702 - acc: 0.7752
 11264/110288 [==>...........................] - ETA: 3s - loss: 0.8726 - acc: 0.7750
 13312/110288 [==>...........................] - ETA: 3s - loss: 0.8741 - acc: 0.7748
 15360/110288 [===>..........................] - ETA: 3s - loss: 0.8728 - acc: 0.7754
 17408/110288 [===>..........................] - ETA: 3s - loss: 0.8717 - acc: 0.7758
 19456/110288 [====>.........................] - ETA: 3s - loss: 0.8715 - acc: 0.7759
 21504/110288 [====>.........................] - ETA: 3s - loss: 0.8713 - acc: 0.7759
 23552/110288 [=====>....................

### Model 3: Bidirectional RNNs (IMPLEMENTATION)
![RNN](images/bidirectional.png)
One restriction of a RNN is that it can't see the future input, only the past.  This is where bidirectional recurrent neural networks come in.  They are able to see the future data.

In [26]:
from keras.layers import Bidirectional, LSTM


def bd_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
    """
    Build and train a bidirectional RNN model on x and y
    :param input_shape: Tuple of input shape
    :param output_sequence_length: Length of output sequence
    :param english_vocab_size: Number of unique English words in the dataset
    :param french_vocab_size: Number of unique French words in the dataset
    :return: Keras model built, but not trained
    """
    
    input_ = Input(input_shape[1:])
    x = Bidirectional(LSTM(100, return_sequences=True), input_shape=input_shape[1:])(input_)
    x = Dense(french_vocab_size, activation="softmax")(x)

    model = Model(input_, x)
    
    model.compile(loss=sparse_categorical_crossentropy,
                  optimizer=Adam(0.001),
                  metrics=['accuracy'])
    return model
    
    
    return model
tests.test_bd_model(bd_model)


# TODO: Train and Print prediction(s)
# Reshaping the input to work with a basic RNN
tmp_x = pad(preproc_english_sentences, preproc_french_sentences.shape[1])
tmp_x = tmp_x.reshape((-1, preproc_french_sentences.shape[-2]))

# Train the neural network
bidirectional_rnn_model = embed_model(
    tmp_x.shape,
    preproc_french_sentences.shape[1],
    len(english_tokenizer.word_index)+1,
    len(french_tokenizer.word_index)+1)
bidirectional_rnn_model.fit(tmp_x, preproc_french_sentences, batch_size=1024, epochs=10, validation_split=0.2)

# Print prediction(s)
print(logits_to_text(bidirectional_rnn_model.predict(tmp_x[:1])[0], french_tokenizer))

Train on 110288 samples, validate on 27573 samples
Epoch 1/10

  1024/110288 [..............................] - ETA: 54s - loss: 5.8463 - acc: 0.0022
  3072/110288 [..............................] - ETA: 20s - loss: 5.8407 - acc: 0.0086
  5120/110288 [>.............................] - ETA: 13s - loss: 5.8342 - acc: 0.1696
  7168/110288 [>.............................] - ETA: 10s - loss: 5.8271 - acc: 0.2388
  9216/110288 [=>............................] - ETA: 9s - loss: 5.8191 - acc: 0.2758 
 11264/110288 [==>...........................] - ETA: 8s - loss: 5.8097 - acc: 0.2996
 13312/110288 [==>...........................] - ETA: 7s - loss: 5.7985 - acc: 0.3160
 15360/110288 [===>..........................] - ETA: 6s - loss: 5.7845 - acc: 0.3285
 17408/110288 [===>..........................] - ETA: 6s - loss: 5.7667 - acc: 0.3374
 19456/110288 [====>.........................] - ETA: 5s - loss: 5.7424 - acc: 0.3450
 21504/110288 [====>.........................] - ETA: 5s - loss: 5.7078 

Epoch 3/10

  1024/110288 [..............................] - ETA: 4s - loss: 2.5191 - acc: 0.4702
  3072/110288 [..............................] - ETA: 4s - loss: 2.5327 - acc: 0.4655
  5120/110288 [>.............................] - ETA: 3s - loss: 2.5294 - acc: 0.4645
  7168/110288 [>.............................] - ETA: 3s - loss: 2.5260 - acc: 0.4642
  9216/110288 [=>............................] - ETA: 3s - loss: 2.5264 - acc: 0.4635
 11264/110288 [==>...........................] - ETA: 3s - loss: 2.5181 - acc: 0.4645
 13312/110288 [==>...........................] - ETA: 3s - loss: 2.5146 - acc: 0.4647
 15360/110288 [===>..........................] - ETA: 3s - loss: 2.5106 - acc: 0.4648
 17408/110288 [===>..........................] - ETA: 3s - loss: 2.5063 - acc: 0.4652
 19456/110288 [====>.........................] - ETA: 3s - loss: 2.5042 - acc: 0.4653
 21504/110288 [====>.........................] - ETA: 3s - loss: 2.5002 - acc: 0.4656
 23552/110288 [=====>.....................

Epoch 5/10

  1024/110288 [..............................] - ETA: 4s - loss: 1.6715 - acc: 0.5956
  3072/110288 [..............................] - ETA: 3s - loss: 1.6498 - acc: 0.6029
  5120/110288 [>.............................] - ETA: 3s - loss: 1.6485 - acc: 0.6025
  7168/110288 [>.............................] - ETA: 3s - loss: 1.6397 - acc: 0.6042
  9216/110288 [=>............................] - ETA: 3s - loss: 1.6341 - acc: 0.6048
 11264/110288 [==>...........................] - ETA: 3s - loss: 1.6350 - acc: 0.6051
 13312/110288 [==>...........................] - ETA: 3s - loss: 1.6311 - acc: 0.6059
 15360/110288 [===>..........................] - ETA: 3s - loss: 1.6294 - acc: 0.6066
 17408/110288 [===>..........................] - ETA: 3s - loss: 1.6286 - acc: 0.6066
 19456/110288 [====>.........................] - ETA: 3s - loss: 1.6256 - acc: 0.6069
 21504/110288 [====>.........................] - ETA: 3s - loss: 1.6244 - acc: 0.6070
 23552/110288 [=====>.....................

 13312/110288 [==>...........................] - ETA: 3s - loss: 1.4116 - acc: 0.6525
 15360/110288 [===>..........................] - ETA: 3s - loss: 1.4103 - acc: 0.6528
 17408/110288 [===>..........................] - ETA: 3s - loss: 1.4068 - acc: 0.6531
 19456/110288 [====>.........................] - ETA: 3s - loss: 1.4067 - acc: 0.6527
 21504/110288 [====>.........................] - ETA: 3s - loss: 1.4046 - acc: 0.6528
 23552/110288 [=====>........................] - ETA: 3s - loss: 1.4026 - acc: 0.6529
 25600/110288 [=====>........................] - ETA: 3s - loss: 1.4021 - acc: 0.6528
Epoch 7/10

  1024/110288 [..............................] - ETA: 4s - loss: 1.2705 - acc: 0.6819
  3072/110288 [..............................] - ETA: 4s - loss: 1.2609 - acc: 0.6829
  5120/110288 [>.............................] - ETA: 3s - loss: 1.2634 - acc: 0.6820
  7168/110288 [>.............................] - ETA: 3s - loss: 1.2602 - acc: 0.6818
  9216/110288 [=>.........................

Epoch 8/10

  1024/110288 [..............................] - ETA: 4s - loss: 1.1327 - acc: 0.7227
  3072/110288 [..............................] - ETA: 4s - loss: 1.1167 - acc: 0.7273
  5120/110288 [>.............................] - ETA: 3s - loss: 1.1224 - acc: 0.7244
  7168/110288 [>.............................] - ETA: 3s - loss: 1.1224 - acc: 0.7235
  9216/110288 [=>............................] - ETA: 3s - loss: 1.1191 - acc: 0.7244
 11264/110288 [==>...........................] - ETA: 3s - loss: 1.1195 - acc: 0.7249
 13312/110288 [==>...........................] - ETA: 3s - loss: 1.1165 - acc: 0.7259
 15360/110288 [===>..........................] - ETA: 3s - loss: 1.1140 - acc: 0.7268
 17408/110288 [===>..........................] - ETA: 3s - loss: 1.1126 - acc: 0.7270
 19456/110288 [====>.........................] - ETA: 3s - loss: 1.1117 - acc: 0.7273
 21504/110288 [====>.........................] - ETA: 3s - loss: 1.1101 - acc: 0.7276
 23552/110288 [=====>.....................

Epoch 10/10

  1024/110288 [..............................] - ETA: 4s - loss: 0.8546 - acc: 0.7808
  3072/110288 [..............................] - ETA: 4s - loss: 0.8760 - acc: 0.7743
  5120/110288 [>.............................] - ETA: 3s - loss: 0.8755 - acc: 0.7751
  7168/110288 [>.............................] - ETA: 3s - loss: 0.8765 - acc: 0.7759
  9216/110288 [=>............................] - ETA: 3s - loss: 0.8749 - acc: 0.7760
 11264/110288 [==>...........................] - ETA: 3s - loss: 0.8724 - acc: 0.7766
 13312/110288 [==>...........................] - ETA: 3s - loss: 0.8741 - acc: 0.7766
 15360/110288 [===>..........................] - ETA: 3s - loss: 0.8715 - acc: 0.7770
 17408/110288 [===>..........................] - ETA: 3s - loss: 0.8664 - acc: 0.7781
 19456/110288 [====>.........................] - ETA: 3s - loss: 0.8658 - acc: 0.7783
 21504/110288 [====>.........................] - ETA: 3s - loss: 0.8661 - acc: 0.7784
 23552/110288 [=====>....................

### Model 4: Encoder-Decoder (OPTIONAL)
Time to look at encoder-decoder models.  This model is made up of an encoder and decoder. The encoder creates a matrix representation of the sentence.  The decoder takes this matrix as input and predicts the translation as output.

Create an encoder-decoder model in the cell below.

In [51]:
from keras.layers import RepeatVector


def encdec_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
    """
    Build and train an encoder-decoder model on x and y
    :param input_shape: Tuple of input shape
    :param output_sequence_length: Length of output sequence
    :param english_vocab_size: Number of unique English words in the dataset
    :param french_vocab_size: Number of unique French words in the dataset
    :return: Keras model built, but not trained
    """
    input_ = Input(input_shape[1:])
    encode = LSTM(english_vocab_size)(input_)
    middle = RepeatVector(output_sequence_length)(encode)
    decode = LSTM(french_vocab_size, return_sequences=True)(middle)

    model = Model(input_, decode)
    
    model.compile(loss=sparse_categorical_crossentropy,
                  optimizer=Adam(0.001),
                  metrics=['accuracy'])
    return model
tests.test_encdec_model(encdec_model)


# OPTIONAL: Train and Print prediction(s)
tmp_x = pad(preproc_english_sentences, preproc_french_sentences.shape[1])
tmp_x = tmp_x.reshape((-1, preproc_french_sentences.shape[-2]))

# Train the neural network
bidirectional_rnn_model = embed_model(
    tmp_x.shape,
    preproc_french_sentences.shape[1],
    len(english_tokenizer.word_index)+1,
    len(french_tokenizer.word_index)+1)
bidirectional_rnn_model.fit(tmp_x, preproc_french_sentences, batch_size=1024, epochs=10, validation_split=0.2)

# Print prediction(s)
print(logits_to_text(bidirectional_rnn_model.predict(tmp_x[:1])[0], french_tokenizer))

Train on 110288 samples, validate on 27573 samples
Epoch 1/10

  1024/110288 [..............................] - ETA: 1:46 - loss: 5.8438 - acc: 3.7202e-04
  3072/110288 [..............................] - ETA: 37s - loss: 5.8384 - acc: 0.1382     
  5120/110288 [>.............................] - ETA: 23s - loss: 5.8323 - acc: 0.2441
  7168/110288 [>.............................] - ETA: 17s - loss: 5.8253 - acc: 0.2915
  9216/110288 [=>............................] - ETA: 14s - loss: 5.8173 - acc: 0.3176
 11264/110288 [==>...........................] - ETA: 12s - loss: 5.8080 - acc: 0.3340
 13312/110288 [==>...........................] - ETA: 10s - loss: 5.7967 - acc: 0.3454
 15360/110288 [===>..........................] - ETA: 9s - loss: 5.7825 - acc: 0.3540 
 17408/110288 [===>..........................] - ETA: 8s - loss: 5.7641 - acc: 0.3608
 19456/110288 [====>.........................] - ETA: 7s - loss: 5.7396 - acc: 0.3653
 21504/110288 [====>.........................] - ETA: 7s - 

Epoch 3/10

  1024/110288 [..............................] - ETA: 4s - loss: 2.5674 - acc: 0.4710
  3072/110288 [..............................] - ETA: 4s - loss: 2.5637 - acc: 0.4725
  5120/110288 [>.............................] - ETA: 3s - loss: 2.5570 - acc: 0.4737
  7168/110288 [>.............................] - ETA: 3s - loss: 2.5508 - acc: 0.4739
  9216/110288 [=>............................] - ETA: 3s - loss: 2.5455 - acc: 0.4740
 11264/110288 [==>...........................] - ETA: 3s - loss: 2.5427 - acc: 0.4742
 13312/110288 [==>...........................] - ETA: 3s - loss: 2.5336 - acc: 0.4754
 15360/110288 [===>..........................] - ETA: 3s - loss: 2.5293 - acc: 0.4761
 17408/110288 [===>..........................] - ETA: 3s - loss: 2.5263 - acc: 0.4762
 19456/110288 [====>.........................] - ETA: 3s - loss: 2.5218 - acc: 0.4766
 21504/110288 [====>.........................] - ETA: 3s - loss: 2.5186 - acc: 0.4766
 23552/110288 [=====>.....................

Epoch 5/10

  1024/110288 [..............................] - ETA: 4s - loss: 1.6753 - acc: 0.5821
  3072/110288 [..............................] - ETA: 3s - loss: 1.6915 - acc: 0.5812
  5120/110288 [>.............................] - ETA: 3s - loss: 1.6951 - acc: 0.5808
  7168/110288 [>.............................] - ETA: 3s - loss: 1.6960 - acc: 0.5804
  9216/110288 [=>............................] - ETA: 3s - loss: 1.6921 - acc: 0.5813
 11264/110288 [==>...........................] - ETA: 3s - loss: 1.6873 - acc: 0.5825
 13312/110288 [==>...........................] - ETA: 3s - loss: 1.6818 - acc: 0.5841
 15360/110288 [===>..........................] - ETA: 3s - loss: 1.6805 - acc: 0.5848
 17408/110288 [===>..........................] - ETA: 3s - loss: 1.6762 - acc: 0.5859
 19456/110288 [====>.........................] - ETA: 3s - loss: 1.6746 - acc: 0.5864
 21504/110288 [====>.........................] - ETA: 3s - loss: 1.6724 - acc: 0.5866
 23552/110288 [=====>.....................

 13312/110288 [==>...........................] - ETA: 3s - loss: 1.4611 - acc: 0.6379
 15360/110288 [===>..........................] - ETA: 3s - loss: 1.4589 - acc: 0.6381
 17408/110288 [===>..........................] - ETA: 3s - loss: 1.4579 - acc: 0.6381
 19456/110288 [====>.........................] - ETA: 3s - loss: 1.4577 - acc: 0.6381
 21504/110288 [====>.........................] - ETA: 3s - loss: 1.4550 - acc: 0.6384
 23552/110288 [=====>........................] - ETA: 3s - loss: 1.4532 - acc: 0.6390
 25600/110288 [=====>........................] - ETA: 3s - loss: 1.4507 - acc: 0.6398
Epoch 7/10

  1024/110288 [..............................] - ETA: 4s - loss: 1.2992 - acc: 0.6733
  3072/110288 [..............................] - ETA: 4s - loss: 1.2803 - acc: 0.6783
  5120/110288 [>.............................] - ETA: 3s - loss: 1.2778 - acc: 0.6789
  7168/110288 [>.............................] - ETA: 3s - loss: 1.2802 - acc: 0.6775
  9216/110288 [=>.........................

Epoch 8/10

  1024/110288 [..............................] - ETA: 4s - loss: 1.1377 - acc: 0.7166
  3072/110288 [..............................] - ETA: 4s - loss: 1.1430 - acc: 0.7128
  5120/110288 [>.............................] - ETA: 4s - loss: 1.1411 - acc: 0.7144
  7168/110288 [>.............................] - ETA: 3s - loss: 1.1371 - acc: 0.7154
  9216/110288 [=>............................] - ETA: 3s - loss: 1.1311 - acc: 0.7167
 11264/110288 [==>...........................] - ETA: 3s - loss: 1.1276 - acc: 0.7178
 13312/110288 [==>...........................] - ETA: 3s - loss: 1.1271 - acc: 0.7177
 15360/110288 [===>..........................] - ETA: 3s - loss: 1.1272 - acc: 0.7175
 17408/110288 [===>..........................] - ETA: 3s - loss: 1.1261 - acc: 0.7175
 19456/110288 [====>.........................] - ETA: 3s - loss: 1.1242 - acc: 0.7178
 21504/110288 [====>.........................] - ETA: 3s - loss: 1.1227 - acc: 0.7183
 23552/110288 [=====>.....................

Epoch 10/10

  1024/110288 [..............................] - ETA: 4s - loss: 0.8807 - acc: 0.7723
  3072/110288 [..............................] - ETA: 4s - loss: 0.8753 - acc: 0.7723
  5120/110288 [>.............................] - ETA: 4s - loss: 0.8865 - acc: 0.7700
  7168/110288 [>.............................] - ETA: 3s - loss: 0.8825 - acc: 0.7703
  9216/110288 [=>............................] - ETA: 3s - loss: 0.8807 - acc: 0.7709
 11264/110288 [==>...........................] - ETA: 3s - loss: 0.8839 - acc: 0.7706
 13312/110288 [==>...........................] - ETA: 3s - loss: 0.8876 - acc: 0.7690
 15360/110288 [===>..........................] - ETA: 3s - loss: 0.8885 - acc: 0.7689
 17408/110288 [===>..........................] - ETA: 3s - loss: 0.8911 - acc: 0.7689
 19456/110288 [====>.........................] - ETA: 3s - loss: 0.8905 - acc: 0.7691
 21504/110288 [====>.........................] - ETA: 3s - loss: 0.8864 - acc: 0.7701
 23552/110288 [=====>....................

### Model 5: Custom (IMPLEMENTATION)
Use everything you learned from the previous models to create a model that incorporates embedding and a bidirectional rnn into one model.

In [177]:
def model_final(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
    """
    Build and train a model that incorporates embedding, encoder-decoder, and bidirectional RNN on x and y
    :param input_shape: Tuple of input shape
    :param output_sequence_length: Length of output sequence
    :param english_vocab_size: Number of unique English words in the dataset
    :param french_vocab_size: Number of unique French words in the dataset
    :return: Keras model built, but not trained
    """
    input_ = Input(input_shape[1:])
    embeding = Embedding(english_vocab_size, 30, input_length=input_shape[1])(input_)
    
    encoder = Bidirectional(LSTM(150, dropout=0.))(embeding)
    
    middle = RepeatVector(output_sequence_length)(encoder)
    
    decoder = Bidirectional(LSTM(150, dropout=0., return_sequences=True))(middle)
    
    preds = TimeDistributed(Dense(french_vocab_size, activation="softmax"))(decoder)
    
    model = Model(input_, preds)
    
    model.compile(loss=sparse_categorical_crossentropy,
                  optimizer=Adam(0.01),
                  metrics=['accuracy'])
    model.summary()
    return model

tests.test_model_final(model_final)


print('Final Model Loaded')

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_164 (InputLayer)       (None, 15)                0         
_________________________________________________________________
embedding_141 (Embedding)    (None, 15, 30)            5970      
_________________________________________________________________
bidirectional_87 (Bidirectio (None, 300)               217200    
_________________________________________________________________
repeat_vector_100 (RepeatVec (None, 21, 300)           0         
_________________________________________________________________
bidirectional_88 (Bidirectio (None, 21, 300)           541200    
_________________________________________________________________
time_distributed_114 (TimeDi (None, 21, 344)           103544    
Total params: 867,914
Trainable params: 867,914
Non-trainable params: 0
_________________________________________________________________
Fina

## Prediction (IMPLEMENTATION)

In [178]:
import numpy as np
from keras.preprocessing.sequence import pad_sequences


def final_predictions(x, y, x_tk, y_tk):
    """
    Gets predictions using the final model
    :param x: Preprocessed English data
    :param y: Preprocessed French data
    :param x_tk: English tokenizer
    :param y_tk: French tokenizer
    """
    # TODO: Train neural network using model_final
    x = pad(x, y.shape[1])
    x = x.reshape((-1, preproc_french_sentences.shape[-2]))
    model = model_final(
        tmp_x.shape,
        preproc_french_sentences.shape[1],
        len(english_tokenizer.word_index)+1,
        len(french_tokenizer.word_index)+1)
    
    model.fit(x, y, batch_size=1024, epochs=10, validation_split=0.2, verbose=2)

    
    ## DON'T EDIT ANYTHING BELOW THIS LINE
    y_id_to_word = {value: key for key, value in y_tk.word_index.items()}
    y_id_to_word[0] = '<PAD>'

    sentence = 'he saw a old yellow truck'
    sentence = [x_tk.word_index[word] for word in sentence.split()]
    sentence = pad_sequences([sentence], maxlen=x.shape[-1], padding='post')
    sentences = np.array([sentence[0], x[0]])
    predictions = model.predict(sentences, len(sentences))

    print('Sample 1:')
    print(' '.join([y_id_to_word[np.argmax(x)] for x in predictions[0]]))
    print('Il a vu un vieux camion jaune')
    print('Sample 2:')
    print(' '.join([y_id_to_word[np.argmax(x)] for x in predictions[1]]))
    print(' '.join([y_id_to_word[np.argmax(x)] for x in y[0]]))


final_predictions(preproc_english_sentences, preproc_french_sentences, english_tokenizer, french_tokenizer)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_165 (InputLayer)       (None, 21)                0         
_________________________________________________________________
embedding_142 (Embedding)    (None, 21, 30)            6000      
_________________________________________________________________
bidirectional_89 (Bidirectio (None, 300)               217200    
_________________________________________________________________
repeat_vector_101 (RepeatVec (None, 21, 300)           0         
_________________________________________________________________
bidirectional_90 (Bidirectio (None, 21, 300)           541200    
_________________________________________________________________
time_distributed_115 (TimeDi (None, 21, 345)           103845    
Total params: 868,245
Trainable params: 868,245
Non-trainable params: 0
_________________________________________________________________
Trai

## Submission
When you are ready to submit your project, do the following steps:
1. Ensure you pass all points on the [rubric](https://review.udacity.com/#!/rubrics/1004/view).
2. Submit the following in a zip file.
  - `helper.py`
  - `machine_translation.ipynb`
  - `machine_translation.html`
    - You can export the notebook by navigating to **File -> Download as -> HTML (.html)**.