# Language Translation
In this project, I’ll train a sequence-to-sequence model on a dataset of English and French sentences so that it can translate new sentences from English to French.
## Get the Data
The data will be small subsets of both languages.

In [1]:
%autosave 20

Autosaving every 20 seconds


In [2]:
import helper
import problem_unittests as tests

source_path = 'data/small_vocab_en'
target_path = 'data/small_vocab_fr'
source_text = helper.load_data(source_path)
target_text = helper.load_data(target_path)

## Explore the Data
Change the `view_sentence_range` tuple to view different parts of the data.

In [3]:
view_sentence_range = (0, 10)

import numpy as np

print('Dataset Stats')
print('Roughly the number of unique words: {}'.format(len({word: None for word in source_text.split()})))

sentences = source_text.split('\n')
word_counts = [len(sentence.split()) for sentence in sentences]
print('Number of sentences: {}'.format(len(sentences)))
print('Average number of words in a sentence: {}'.format(np.average(word_counts)))

print()
print('English sentences {} to {}:'.format(*view_sentence_range))
print('\n'.join(source_text.split('\n')[view_sentence_range[0]:view_sentence_range[1]]))
print()
print('French sentences {} to {}:'.format(*view_sentence_range))
print('\n'.join(target_text.split('\n')[view_sentence_range[0]:view_sentence_range[1]]))

Dataset Stats
Roughly the number of unique words: 227
Number of sentences: 137861
Average number of words in a sentence: 13.225277634719028

English sentences 0 to 10:
new jersey is sometimes quiet during autumn , and it is snowy in april .
the united states is usually chilly during july , and it is usually freezing in november .
california is usually quiet during march , and it is usually hot in june .
the united states is sometimes mild during june , and it is cold in september .
your least liked fruit is the grape , but my least liked is the apple .
his favorite fruit is the orange , but my favorite is the grape .
paris is relaxing during december , but it is usually chilly in july .
new jersey is busy during spring , and it is never hot in march .
our least liked fruit is the lemon , but my least liked is the grape .
the united states is sometimes busy during january , and it is sometimes warm in november .

French sentences 0 to 10:
new jersey est parfois calme pendant l' automne 

## Implement Preprocessing Function
### Text to Word Ids
In the function `text_to_ids()`, `source_text` and `target_text` are processed from words to ids.  Additionally, the `<EOS>` word id is added at the end of `target_text`.  This will help the neural network predict when the sentence should end.

You can get the `<EOS>` word id by doing:
```python
target_vocab_to_int['<EOS>']
```
You can get other word ids using `source_vocab_to_int` and `target_vocab_to_int`.
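
As a quick illustration of the conversion described above, here is a minimal sketch on a toy vocabulary. The dictionaries, the special-token ids, and the `sentence_to_ids` helper are made up for this example; the real vocabularies come from `helper.py`.

```python
# Toy vocabularies for illustration only (assumed ids, not the project's real ones).
toy_source_vocab = {'<PAD>': 0, '<EOS>': 1, '<UNK>': 2, '<GO>': 3, 'hello': 4, 'world': 5}
toy_target_vocab = {'<PAD>': 0, '<EOS>': 1, '<UNK>': 2, '<GO>': 3, 'bonjour': 4, 'monde': 5}


def sentence_to_ids(sentence, vocab_to_int, append_eos=False):
    """Map each whitespace-separated word to its id, optionally appending <EOS>."""
    ids = [vocab_to_int[word] for word in sentence.split()]
    if append_eos:
        ids.append(vocab_to_int['<EOS>'])
    return ids


# Source sentences are converted as-is; target sentences get <EOS> appended.
source_ids = sentence_to_ids('hello world', toy_source_vocab)
target_ids = sentence_to_ids('bonjour monde', toy_target_vocab, append_eos=True)
print(source_ids)  # [4, 5]
print(target_ids)  # [4, 5, 1]
```

Only the target side carries `<EOS>`, because only the decoder has to learn when to stop emitting words.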

In [4]:
def text_to_ids(source_text, target_text, source_vocab_to_int, target_vocab_to_int):
    """
    Convert source and target text to proper word ids
    :param source_text: String that contains all the source text.
    :param target_text: String that contains all the target text.
    :param source_vocab_to_int: Dictionary to go from the source words to an id
    :param target_vocab_to_int: Dictionary to go from the target words to an id
    :return: A tuple of lists (source_id_text, target_id_text)
    """
    source_id_text = []
    target_id_text = []
    
    for sentence in source_text.split("\n"):
        source_addition = []
        for word in sentence.split():
            source_addition.append(source_vocab_to_int[word])
        source_id_text.append(source_addition)
    
    for sentence in target_text.split("\n"):
        target_addition = []
        for word in sentence.split():
            target_addition.append(target_vocab_to_int[word])
        target_addition.append(target_vocab_to_int['<EOS>'])
        target_id_text.append(target_addition)

    return source_id_text, target_id_text

tests.test_text_to_ids(text_to_ids)

Tests Passed


### Preprocess all the data and save it
Running the code cell below will preprocess all the data and save it to file.

In [5]:
helper.preprocess_and_save_data(source_path, target_path, text_to_ids)

# Check Point
The preprocessed data has been saved to disk.

In [6]:
import numpy as np
import helper
import problem_unittests as tests

(source_int_text, target_int_text), (source_vocab_to_int, target_vocab_to_int), _ = helper.load_preprocess()

### Check the Version of TensorFlow and Access to GPU
Check to make sure we have the correct version of TensorFlow and access to a GPU.

In [7]:
!pip install tensorflow==1.1.0



In [8]:
from distutils.version import LooseVersion
import warnings
import tensorflow as tf
from tensorflow.python.layers.core import Dense

# Check TensorFlow Version
assert LooseVersion(tf.__version__) >= LooseVersion('1.1'), 'Please use TensorFlow version 1.1 or newer'
print('TensorFlow Version: {}'.format(tf.__version__))

# Check for a GPU
if not tf.test.gpu_device_name():
    warnings.warn('No GPU found. Please use a GPU to train your neural network.')
else:
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))

TensorFlow Version: 1.1.0




## Build the Neural Network
Build the components necessary for a Sequence-to-Sequence model by implementing the following functions:
- `model_inputs`
- `process_decoder_input`
- `encoding_layer`
- `decoding_layer_train`
- `decoding_layer_infer`
- `decoding_layer`
- `seq2seq_model`

### Input
Implement the `model_inputs()` function to create TF Placeholders for the Neural Network. It should create the following placeholders:

- Input text placeholder named "input" using the TF Placeholder name parameter with rank 2.
- Targets placeholder with rank 2.
- Learning rate placeholder with rank 0.
- Keep probability placeholder named "keep_prob" using the TF Placeholder name parameter with rank 0.
- Target sequence length placeholder named "target_sequence_length" with rank 1.
- Max target sequence length tensor named "max_target_len" getting its value from applying `tf.reduce_max` on the target_sequence_length placeholder. Rank 0.
- Source sequence length placeholder named "source_sequence_length" with rank 1.

Return the placeholders in the following tuple: (input, targets, learning rate, keep probability, target sequence length, max target sequence length, source sequence length)
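
To make the ranks above concrete, here is a small plain-Python sketch of the kind of values each placeholder would receive for a toy batch of two sentences. The `rank` helper and all `toy_*` values are invented for illustration; nested lists stand in for tensors.

```python
def rank(value):
    """Return the nesting depth of a non-empty list-of-lists value (0 for a scalar)."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        value = value[0]
    return depth


# Hypothetical values for a batch of two sentences (ids are made up).
toy_input = [[4, 5, 6], [7, 8, 0]]             # rank 2: [batch, time]
toy_targets = [[9, 10, 1, 0], [11, 1, 0, 0]]   # rank 2: [batch, time]
toy_learning_rate = 0.001                      # rank 0: scalar
toy_keep_prob = 0.5                            # rank 0: scalar
toy_target_lengths = [3, 2]                    # rank 1: one length per sentence
toy_source_lengths = [3, 2]                    # rank 1: one length per sentence

# Plays the role of tf.reduce_max over the target lengths: a rank-0 value.
toy_max_target_len = max(toy_target_lengths)

for name, value in [('input', toy_input), ('targets', toy_targets),
                    ('learning_rate', toy_learning_rate), ('keep_prob', toy_keep_prob),
                    ('target_sequence_length', toy_target_lengths),
                    ('max_target_len', toy_max_target_len),
                    ('source_sequence_length', toy_source_lengths)]:
    print('{:<24} rank {}'.format(name, rank(value)))
```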

In [9]:
def model_inputs():
    """
    Create TF Placeholders for input, targets, learning rate, and lengths of source and target sequences.
    :return: Tuple (input, targets, learning rate, keep probability, target sequence length,
    max target sequence length, source sequence length)
    """
    # Create tf.placeholders
    inputs = tf.placeholder(tf.int32, [None, None], name="input")
    targets = tf.placeholder(tf.int32, [None, None], name="target")
    learning_rate = tf.placeholder(tf.float32, None, name="learning_rate")
    keep_prob = tf.placeholder(tf.float32, None, name="keep_prob")
    target_seq = tf.placeholder(tf.int32, [None], name="target_sequence_length")
    source_seq = tf.placeholder(tf.int32, [None], name="source_sequence_length")
    # Derive the max length from the lengths placeholder instead of creating
    # a separate placeholder that is immediately overwritten
    max_target_seq = tf.reduce_max(target_seq, name="max_target_len")
    
    return inputs, targets, learning_rate, keep_prob, target_seq, max_target_seq, source_seq

tests.test_model_inputs(model_inputs)

Tests Passed


### Process Decoder Input
Implement `process_decoder_input` by removing the last word id from each sequence in the `target_data` batch and concatenating the GO ID to the beginning of each sequence.
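
The same shift can be sketched in plain Python on a toy batch. The ids below (`GO_ID`, `EOS_ID`, `PAD_ID`) and the `shift_for_decoder` helper are assumptions made for this example; lists stand in for the tensors that the TensorFlow version slices and concatenates.

```python
GO_ID = 3    # hypothetical id for '<GO>'
EOS_ID = 1   # hypothetical id for '<EOS>'
PAD_ID = 0   # hypothetical id for '<PAD>'

# A toy padded target batch: every row has the same length.
target_batch = [
    [10, 11, 12, EOS_ID],
    [13, 14, EOS_ID, PAD_ID],
]


def shift_for_decoder(batch, go_id):
    """Drop the last id of every row and prepend the GO id."""
    return [[go_id] + row[:-1] for row in batch]


decoder_input = shift_for_decoder(target_batch, GO_ID)
print(decoder_input)  # [[3, 10, 11, 12], [3, 13, 14, 1]]
```

Each row keeps its length, so the decoder input at step `t` lines up with the target it should predict at step `t`.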

In [10]:
def process_decoder_input(target_data, target_vocab_to_int, batch_size):
    """
    Preprocess target data for decoding
    :param target_data: Target Placeholder
    :param target_vocab_to_int: Dictionary to go from the target words to an id
    :param batch_size: Batch Size
    :return: Preprocessed target data
tf.strided_slice(
    input_,
    begin,
    end,
    strides=None,
    begin_mask=0,
    end_mask=0,
    ellipsis_mask=0,
    new_axis_mask=0,
    shrink_axis_mask=0,
    var=None,
    name=None
 )
tf.fill(
    dims,
    value,
    name=None
 )
tf.concat(
    values,
    axis,
    name='concat'
 )
    """
    removed_last = tf.strided_slice(target_data, [0,0], [batch_size, -1], [1,1])
    add_go = tf.fill([batch_size, 1], target_vocab_to_int['<GO>'])
    
    processed_target_data = tf.concat([add_go, removed_last], 1)
    
    return processed_target_data

tests.test_process_encoding_input(process_decoder_input)

Tests Passed


### Encoding
Implement `encoding_layer()` to create an Encoder RNN layer:
 * Embed the encoder input using [`tf.contrib.layers.embed_sequence`](https://www.tensorflow.org/api_docs/python/tf/contrib/layers/embed_sequence)
 * Construct a [stacked](https://github.com/tensorflow/tensorflow/blob/6947f65a374ebf29e74bb71e36fd82760056d82c/tensorflow/docs_src/tutorials/recurrent.md#stacking-multiple-lstms) [`tf.contrib.rnn.LSTMCell`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/LSTMCell) wrapped in a [`tf.contrib.rnn.DropoutWrapper`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/DropoutWrapper)
 * Pass the cell and embedded input to [`tf.nn.dynamic_rnn()`](https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn)
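
Conceptually, the embedding step is a table lookup: each word id selects one row of a trainable matrix. The following plain-Python sketch shows only that lookup, with a randomly filled table; the sizes, the `embed` helper, and the ids are made up for illustration and no training is involved.

```python
import random

random.seed(0)
vocab_size, embed_dim = 6, 3  # toy sizes, chosen only for this example

# One row of embed_dim floats per vocabulary entry.
embedding_table = [[random.uniform(-1, 1) for _ in range(embed_dim)]
                   for _ in range(vocab_size)]


def embed(batch_ids, table):
    """Replace every id in a [batch, time] list with its row from the table."""
    return [[table[token_id] for token_id in row] for row in batch_ids]


batch_ids = [[4, 5, 0], [2, 4, 4]]
embedded = embed(batch_ids, embedding_table)

# The result has shape [batch, time, embed_dim].
print(len(embedded), len(embedded[0]), len(embedded[0][0]))  # 2 3 3
```

The RNN then consumes these `[batch, time, embed_dim]` vectors rather than the raw integer ids.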

In [11]:
from importlib import reload  # the `imp` module is deprecated
reload(tests)

def encoding_layer(rnn_inputs, rnn_size, num_layers, keep_prob, 
                   source_sequence_length, source_vocab_size, 
                   encoding_embedding_size):
    """
    Create encoding layer
    :param rnn_inputs: Inputs for the RNN
    :param rnn_size: RNN Size
    :param num_layers: Number of layers
    :param keep_prob: Dropout keep probability
    :param source_sequence_length: a list of the lengths of each sequence in the batch
    :param source_vocab_size: vocabulary size of source data
    :param encoding_embedding_size: embedding size of source data
    :return: tuple (RNN output, RNN state)

embed_sequence(
    ids,
    vocab_size=None,
    embed_dim=None,
    unique=False,
    initializer=None,
    regularizer=None,
    trainable=True,
    scope=None,
    reuse=None
 )
    """
    
    embedded_encoder = tf.contrib.layers.embed_sequence(rnn_inputs, source_vocab_size, encoding_embedding_size)

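    def make_cell():
        lstm = tf.contrib.rnn.LSTMCell(
            rnn_size, initializer=tf.random_uniform_initializer(-0.1, 0.1))
        # Wrap each layer in dropout so that keep_prob is actually used
        return tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)

    cell = tf.contrib.rnn.MultiRNNCell([make_cell() for _ in range(num_layers)])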
    rnn_output, rnn_state = tf.nn.dynamic_rnn(cell, embedded_encoder, 
                                              sequence_length=source_sequence_length, dtype=tf.float32)
    return rnn_output, rnn_state

tests.test_encoding_layer(encoding_layer)

Tests Passed


### Decoding - Training
Create a training decoding layer:
* Create a [`tf.contrib.seq2seq.TrainingHelper`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/TrainingHelper)
* Create a [`tf.contrib.seq2seq.BasicDecoder`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/BasicDecoder)
* Obtain the decoder outputs from [`tf.contrib.seq2seq.dynamic_decode`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/dynamic_decode)

In [12]:

def decoding_layer_train(encoder_state, dec_cell, dec_embed_input, 
                         target_sequence_length, max_summary_length, 
                         output_layer, keep_prob):
    """
    Create a decoding layer for training
    :param encoder_state: Encoder State
    :param dec_cell: Decoder RNN Cell
    :param dec_embed_input: Decoder embedded input
    :param target_sequence_length: The lengths of each sequence in the target batch
    :param max_summary_length: The length of the longest sequence in the batch
    :param output_layer: Function to apply the output layer
    :param keep_prob: Dropout keep probability
    :return: BasicDecoderOutput containing training logits and sample_id

tf.contrib.seq2seq.TrainingHelper
__init__(
    inputs,
    sequence_length,
    time_major=False,
    name=None
 )
tf.contrib.seq2seq.BasicDecoder
__init__(
    cell,
    helper,
    initial_state,
    output_layer=None
 )
dynamic_decode(
    decoder,
    output_time_major=False,
    impute_finished=False,
    maximum_iterations=None,
    parallel_iterations=32,
    swap_memory=False,
    scope=None
 )
    """
    
    train_helper = tf.contrib.seq2seq.TrainingHelper(dec_embed_input, target_sequence_length)
    train_decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell, train_helper, encoder_state, output_layer)

    decoder_output, _ = tf.contrib.seq2seq.dynamic_decode(train_decoder, impute_finished=True, 
                                                          maximum_iterations = max_summary_length)
    logits = decoder_output.rnn_output
    sample_id = decoder_output.sample_id
    logits = tf.nn.dropout(logits, keep_prob)
    
    return tf.contrib.seq2seq.BasicDecoderOutput(logits, sample_id)

tests.test_decoding_layer_train(decoding_layer_train)

Tests Passed


### Decoding - Inference
Create inference decoder:
* Create a [`tf.contrib.seq2seq.GreedyEmbeddingHelper`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/GreedyEmbeddingHelper)
* Create a [`tf.contrib.seq2seq.BasicDecoder`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/BasicDecoder)
* Obtain the decoder outputs from [`tf.contrib.seq2seq.dynamic_decode`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/dynamic_decode)
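
The idea behind greedy decoding can be sketched without TensorFlow: start from the GO id, repeatedly pick the highest-scoring next id, and stop at EOS or at the length limit. Everything here (`TOY_SCORES`, `toy_step`, `greedy_decode`, the ids) is a hypothetical stand-in; a real decoder also carries the RNN state from step to step, which this toy ignores.

```python
GO_ID, EOS_ID = 3, 1  # hypothetical ids for '<GO>' and '<EOS>'

# Stand-in for the decoder cell plus output layer: scores over a 6-word
# vocabulary, keyed by the previously emitted id.
TOY_SCORES = {
    3: [0.0, 0.1, 0.0, 0.0, 0.9, 0.0],  # after <GO>, id 4 scores highest
    4: [0.0, 0.2, 0.0, 0.0, 0.0, 0.8],  # after 4, id 5 scores highest
    5: [0.0, 0.7, 0.0, 0.0, 0.2, 0.1],  # after 5, <EOS> scores highest
}


def toy_step(previous_id):
    """Return made-up scores for the next id given the previous one."""
    return TOY_SCORES[previous_id]


def greedy_decode(step_fn, go_id, eos_id, max_length):
    """Feed back the argmax id at every step until EOS or max_length."""
    output_ids = []
    previous_id = go_id
    for _ in range(max_length):
        scores = step_fn(previous_id)
        previous_id = max(range(len(scores)), key=scores.__getitem__)
        output_ids.append(previous_id)
        if previous_id == eos_id:
            break
    return output_ids


decoded = greedy_decode(toy_step, GO_ID, EOS_ID, max_length=10)
print(decoded)  # [4, 5, 1]
```

Unlike training, no ground-truth target is fed in: each step sees only the model's own previous prediction.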

In [13]:
def decoding_layer_infer(encoder_state, dec_cell, dec_embeddings, start_of_sequence_id,
                         end_of_sequence_id, max_target_sequence_length,
                         vocab_size, output_layer, batch_size, keep_prob):
    """
    Create a decoding layer for inference
    :param encoder_state: Encoder state
    :param dec_cell: Decoder RNN Cell
    :param dec_embeddings: Decoder embeddings
    :param start_of_sequence_id: GO ID
    :param end_of_sequence_id: EOS Id
    :param max_target_sequence_length: Maximum length of target sequences
    :param vocab_size: Size of decoder/target vocabulary
    :param output_layer: Function to apply the output layer
    :param batch_size: Batch size
    :param keep_prob: Dropout keep probability
    :return: BasicDecoderOutput containing inference logits and sample_id

tf.contrib.seq2seq.GreedyEmbeddingHelper    
    __init__(
    embedding,
    start_tokens,
    end_token
 )
 
tile(
    input,
    multiples,
    name=None
 )
    """
    # Create start tokens for GreedyEmbeddingHelper function
    start_tokens = tf.tile(
        tf.constant([start_of_sequence_id], dtype=tf.int32), [batch_size], 
        name='Start-Tokens')

    helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(dec_embeddings, start_tokens, end_of_sequence_id)

    decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell, helper, encoder_state, output_layer)
        
    decoder_output, _ = tf.contrib.seq2seq.dynamic_decode(decoder, impute_finished=True,
                                                          maximum_iterations = max_target_sequence_length)
    logits = decoder_output.rnn_output
    sample_id = decoder_output.sample_id
    logits = tf.nn.dropout(logits, keep_prob)
    
    return tf.contrib.seq2seq.BasicDecoderOutput(logits, sample_id)

tests.test_decoding_layer_infer(decoding_layer_infer)

Tests Passed


### Build the Decoding Layer
Implement `decoding_layer()` to create a Decoder RNN layer.

* Embed the target sequences
* Construct the decoder LSTM cell (just like you constructed the encoder cell above)
* Create an output layer to map the outputs of the decoder to the elements of our vocabulary
* Use the `decoding_layer_train(encoder_state, dec_cell, dec_embed_input, target_sequence_length, max_target_sequence_length, output_layer, keep_prob)` function to get the training logits.
* Use the `decoding_layer_infer(encoder_state, dec_cell, dec_embeddings, start_of_sequence_id, end_of_sequence_id, max_target_sequence_length, vocab_size, output_layer, batch_size, keep_prob)` function to get the inference logits.

Note: Use [tf.variable_scope](https://www.tensorflow.org/api_docs/python/tf/variable_scope) to share variables between training and inference.

In [14]:
def decoding_layer(dec_input, encoder_state,
                   target_sequence_length, max_target_sequence_length,
                   rnn_size,
                   num_layers, target_vocab_to_int, target_vocab_size,
                   batch_size, keep_prob, decoding_embedding_size):
    """
    Create decoding layer
    :param dec_input: Decoder input
    :param encoder_state: Encoder state
    :param target_sequence_length: The lengths of each sequence in the target batch
    :param max_target_sequence_length: Maximum length of target sequences
    :param rnn_size: RNN Size
    :param num_layers: Number of layers
    :param target_vocab_to_int: Dictionary to go from the target words to an id
    :param target_vocab_size: Size of target vocabulary
    :param batch_size: The size of the batch
    :param keep_prob: Dropout keep probability
    :param decoding_embedding_size: Decoding embedding size
    :return: Tuple of (Training BasicDecoderOutput, Inference BasicDecoderOutput)
    """

    embeddings = tf.Variable(tf.random_uniform([target_vocab_size, decoding_embedding_size]))
    embed_input = tf.nn.embedding_lookup(embeddings, dec_input)

    def create_cell(rnn_size):
        lstm = tf.contrib.rnn.LSTMCell(rnn_size, initializer=tf.random_uniform_initializer(-0.1, 0.1))
        dropout = tf.contrib.rnn.DropoutWrapper(lstm, keep_prob)
        return dropout
    
    cell = tf.contrib.rnn.MultiRNNCell([create_cell(rnn_size) for _ in range(num_layers)])

    output_layer = Dense(target_vocab_size, kernel_initializer = tf.truncated_normal_initializer(mean = 0.0, stddev=0.1))
    
    with tf.variable_scope("decode"):
        train_logits = decoding_layer_train(encoder_state, cell, embed_input, target_sequence_length, 
                                            max_target_sequence_length, output_layer, keep_prob) 
    
    with tf.variable_scope("decode", reuse=True):
        infer_logits = decoding_layer_infer(encoder_state, cell, embeddings, target_vocab_to_int['<GO>'], 
                                            target_vocab_to_int['<EOS>'], max_target_sequence_length, 
                                            target_vocab_size, output_layer, batch_size, keep_prob)
    return train_logits, infer_logits

tests.test_decoding_layer(decoding_layer)

Tests Passed


### Build the Neural Network
Apply the functions you implemented above to:

- Encode the input using your `encoding_layer(rnn_inputs, rnn_size, num_layers, keep_prob,  source_sequence_length, source_vocab_size, encoding_embedding_size)`.
- Process target data using your `process_decoder_input(target_data, target_vocab_to_int, batch_size)` function.
- Decode the encoded input using your `decoding_layer(dec_input, enc_state, target_sequence_length, max_target_sentence_length, rnn_size, num_layers, target_vocab_to_int, target_vocab_size, batch_size, keep_prob, dec_embedding_size)` function.

In [15]:
def seq2seq_model(input_data, target_data, keep_prob, batch_size,
                  source_sequence_length, target_sequence_length,
                  max_target_sentence_length,
                  source_vocab_size, target_vocab_size,
                  enc_embedding_size, dec_embedding_size,
                  rnn_size, num_layers, target_vocab_to_int):
    """
    Build the Sequence-to-Sequence part of the neural network
    :param input_data: Input placeholder
    :param target_data: Target placeholder
    :param keep_prob: Dropout keep probability placeholder
    :param batch_size: Batch Size
    :param source_sequence_length: Sequence Lengths of source sequences in the batch
    :param target_sequence_length: Sequence Lengths of target sequences in the batch
    :param max_target_sentence_length: Length of the longest target sequence in the batch
    :param source_vocab_size: Source vocabulary size
    :param target_vocab_size: Target vocabulary size
    :param enc_embedding_size: Encoder embedding size
    :param dec_embedding_size: Decoder embedding size
    :param rnn_size: RNN Size
    :param num_layers: Number of layers
    :param target_vocab_to_int: Dictionary to go from the target words to an id
    :return: Tuple of (Training BasicDecoderOutput, Inference BasicDecoderOutput)
    """

    _, enc_state = encoding_layer(input_data, rnn_size, num_layers, keep_prob, source_sequence_length,
                               source_vocab_size, enc_embedding_size)
    
    
    decoder_input = process_decoder_input(target_data, target_vocab_to_int, batch_size)
    
    trn_decoder_output, inf_decoder_output = decoding_layer(decoder_input, enc_state, target_sequence_length, 
                                                                  max_target_sentence_length, rnn_size, num_layers,
                                                                  target_vocab_to_int, target_vocab_size, batch_size,
                                                                  keep_prob, dec_embedding_size)
    
    return trn_decoder_output, inf_decoder_output

tests.test_seq2seq_model(seq2seq_model)

Tests Passed


## Neural Network Training
### Hyperparameters
Tune the following parameters:

- Set `epochs` to the number of epochs.
- Set `batch_size` to the batch size.
- Set `rnn_size` to the size of the RNNs.
- Set `num_layers` to the number of layers.
- Set `encoding_embedding_size` to the size of the embedding for the encoder.
- Set `decoding_embedding_size` to the size of the embedding for the decoder.
- Set `learning_rate` to the learning rate.
- Set `keep_probability` to the Dropout keep probability.
- Set `display_step` to the number of steps between debug output statements.

In [20]:
# Number of Epochs
epochs = 5
# Batch Size
batch_size = 512
# RNN Size
rnn_size = 512
# Number of Layers
num_layers = 4
# Embedding Size
encoding_embedding_size = 128
decoding_embedding_size = 128
# Learning Rate
learning_rate = .001
# Dropout Keep Probability
keep_probability = .77
display_step = 2
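
For a sense of scale, here is the arithmetic these settings imply. It is a sketch that assumes the sentence count printed in the dataset stats, the first `batch_size` sentences being held out for validation as the training cell does, and incomplete final batches being dropped as `get_batches` does with floor division.

```python
num_sentences = 137861   # from the "Dataset Stats" output
batch_size = 512
epochs = 5

validation_size = batch_size                     # first batch_size sentences are held out
train_size = num_sentences - validation_size     # 137349
batches_per_epoch = train_size // batch_size     # full batches only
leftover = train_size % batch_size               # sentences dropped each epoch
total_steps = epochs * batches_per_epoch

# The training log prints len(source_int_text) // batch_size as the denominator,
# which counts the validation sentences too.
logged_denominator = num_sentences // batch_size

print(batches_per_epoch, leftover, total_steps, logged_denominator)  # 268 133 1340 269
```

So the `/269` in the log is one more than the number of training batches actually run per epoch.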

### Build the Graph
Build the graph using the neural network you implemented.

In [21]:
save_path = 'checkpoints/dev'
(source_int_text, target_int_text), (source_vocab_to_int, target_vocab_to_int), _ = helper.load_preprocess()
max_target_sentence_length = max([len(sentence) for sentence in source_int_text])

train_graph = tf.Graph()
with train_graph.as_default():
    input_data, targets, lr, keep_prob, target_sequence_length, max_target_sequence_length, source_sequence_length = model_inputs()

    #sequence_length = tf.placeholder_with_default(max_target_sentence_length, None, name='sequence_length')
    input_shape = tf.shape(input_data)

    train_logits, inference_logits = seq2seq_model(tf.reverse(input_data, [-1]),
                                                   targets,
                                                   keep_prob,
                                                   batch_size,
                                                   source_sequence_length,
                                                   target_sequence_length,
                                                   max_target_sequence_length,
                                                   len(source_vocab_to_int),
                                                   len(target_vocab_to_int),
                                                   encoding_embedding_size,
                                                   decoding_embedding_size,
                                                   rnn_size,
                                                   num_layers,
                                                   target_vocab_to_int)


    training_logits = tf.identity(train_logits.rnn_output, name='logits')
    inference_logits = tf.identity(inference_logits.sample_id, name='predictions')

    masks = tf.sequence_mask(target_sequence_length, max_target_sequence_length, dtype=tf.float32, name='masks')

    with tf.name_scope("optimization"):
        # Loss function
        cost = tf.contrib.seq2seq.sequence_loss(
            training_logits,
            targets,
            masks)

        # Optimizer
        optimizer = tf.train.AdamOptimizer(lr)

        # Gradient Clipping
        gradients = optimizer.compute_gradients(cost)
        capped_gradients = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gradients if grad is not None]
        train_op = optimizer.apply_gradients(capped_gradients)
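
The two less obvious pieces of the graph above, the sequence mask and the gradient clipping, can be illustrated in plain Python. This is a simplified sketch under my own assumptions, not the TensorFlow implementation: `sequence_mask`, `masked_mean`, and `clip_by_value` are local stand-ins, and the per-token losses are invented numbers.

```python
def sequence_mask(lengths, max_len):
    """1.0 for real time steps, 0.0 for steps past each sequence's length."""
    return [[1.0 if t < length else 0.0 for t in range(max_len)]
            for length in lengths]


def masked_mean(per_token_loss, mask):
    """Average the loss over unmasked positions only."""
    total = sum(l * m for loss_row, mask_row in zip(per_token_loss, mask)
                for l, m in zip(loss_row, mask_row))
    weight = sum(m for mask_row in mask for m in mask_row)
    return total / weight


def clip_by_value(values, low, high):
    """Clamp every value into [low, high], element by element."""
    return [min(max(v, low), high) for v in values]


mask = sequence_mask([3, 1], max_len=3)
print(mask)  # [[1.0, 1.0, 1.0], [1.0, 0.0, 0.0]]

# Made-up per-token losses; the 9.0 entries sit on masked positions.
per_token_loss = [[0.5, 1.0, 1.5], [2.0, 9.0, 9.0]]
loss = masked_mean(per_token_loss, mask)
print(loss)  # 1.25

clipped = clip_by_value([-3.0, 0.25, 1.7], -1.0, 1.0)
print(clipped)  # [-1.0, 0.25, 1.0]
```

Masked positions contribute nothing to the loss, and clipping bounds each gradient component so that a single large value cannot dominate an update.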


Batch and pad the source and target sequences

In [22]:
def pad_sentence_batch(sentence_batch, pad_int):
    """Pad sentences with <PAD> so that each sentence of a batch has the same length"""
    max_sentence = max([len(sentence) for sentence in sentence_batch])
    return [sentence + [pad_int] * (max_sentence - len(sentence)) for sentence in sentence_batch]


def get_batches(sources, targets, batch_size, source_pad_int, target_pad_int):
    """Batch targets, sources, and the lengths of their sentences together"""
    for batch_i in range(0, len(sources)//batch_size):
        start_i = batch_i * batch_size

        # Slice the right amount for the batch
        sources_batch = sources[start_i:start_i + batch_size]
        targets_batch = targets[start_i:start_i + batch_size]

        # Pad
        pad_sources_batch = np.array(pad_sentence_batch(sources_batch, source_pad_int))
        pad_targets_batch = np.array(pad_sentence_batch(targets_batch, target_pad_int))

        # Need the lengths for the _lengths parameters
        pad_targets_lengths = []
        for target in pad_targets_batch:
            pad_targets_lengths.append(len(target))

        pad_source_lengths = []
        for source in pad_sources_batch:
            pad_source_lengths.append(len(source))

        yield pad_sources_batch, pad_targets_batch, pad_source_lengths, pad_targets_lengths
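
Here is a small standalone sketch of the padding step on a toy batch. `pad_batch` mirrors the logic of `pad_sentence_batch` but is redefined so the example runs on its own; the ids and `PAD_ID` are made up.

```python
def pad_batch(sentence_batch, pad_int):
    """Right-pad every sentence to the length of the longest one in the batch."""
    max_len = max(len(sentence) for sentence in sentence_batch)
    return [sentence + [pad_int] * (max_len - len(sentence))
            for sentence in sentence_batch]


PAD_ID = 0  # hypothetical id for '<PAD>'
toy_batch = [[4, 5, 6, 1], [7, 1], [8, 9, 1]]

padded = pad_batch(toy_batch, PAD_ID)
print(padded)  # [[4, 5, 6, 1], [7, 1, 0, 0], [8, 9, 1, 0]]

true_lengths = [len(sentence) for sentence in toy_batch]
padded_lengths = [len(sentence) for sentence in padded]
print(true_lengths)    # [4, 2, 3]
print(padded_lengths)  # [4, 4, 4]
```

Note that `get_batches` measures lengths on the padded arrays, so every length it yields equals the batch maximum (the `padded_lengths` case here) rather than the true sentence lengths.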


### Train
Train the neural network on the preprocessed data. If you have a hard time getting a good loss, check the forums to see if anyone is having the same problem.

In [23]:
def get_accuracy(target, logits):
    """
    Calculate accuracy
    """
    max_seq = max(target.shape[1], logits.shape[1])
    if max_seq - target.shape[1]:
        target = np.pad(
            target,
            [(0,0),(0,max_seq - target.shape[1])],
            'constant')
    if max_seq - logits.shape[1]:
        logits = np.pad(
            logits,
            [(0,0),(0,max_seq - logits.shape[1])],
            'constant')

    return np.mean(np.equal(target, logits))

# Split data to training and validation sets
train_source = source_int_text[batch_size:]
train_target = target_int_text[batch_size:]
valid_source = source_int_text[:batch_size]
valid_target = target_int_text[:batch_size]
(valid_sources_batch, valid_targets_batch,
 valid_sources_lengths, valid_targets_lengths) = next(get_batches(
    valid_source, valid_target, batch_size,
    source_vocab_to_int['<PAD>'],
    target_vocab_to_int['<PAD>']))
with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())

    for epoch_i in range(epochs):
        for batch_i, (source_batch, target_batch, sources_lengths, targets_lengths) in enumerate(
                get_batches(train_source, train_target, batch_size,
                            source_vocab_to_int['<PAD>'],
                            target_vocab_to_int['<PAD>'])):

            _, loss = sess.run(
                [train_op, cost],
                {input_data: source_batch,
                 targets: target_batch,
                 lr: learning_rate,
                 target_sequence_length: targets_lengths,
                 source_sequence_length: sources_lengths,
                 keep_prob: keep_probability})


            if batch_i % display_step == 0 and batch_i > 0:


                batch_train_logits = sess.run(
                    inference_logits,
                    {input_data: source_batch,
                     source_sequence_length: sources_lengths,
                     target_sequence_length: targets_lengths,
                     keep_prob: 1.0})


                batch_valid_logits = sess.run(
                    inference_logits,
                    {input_data: valid_sources_batch,
                     source_sequence_length: valid_sources_lengths,
                     target_sequence_length: valid_targets_lengths,
                     keep_prob: 1.0})

                train_acc = get_accuracy(target_batch, batch_train_logits)

                valid_acc = get_accuracy(valid_targets_batch, batch_valid_logits)

                print('Epoch {:>3} Batch {:>4}/{} - Train Accuracy: {:>6.4f}, Validation Accuracy: {:>6.4f}, Loss: {:>6.4f}'
                      .format(epoch_i, batch_i, len(source_int_text) // batch_size, train_acc, valid_acc, loss))

    # Save Model
    saver = tf.train.Saver()
    saver.save(sess, save_path)
    print('Model Trained and Saved')

Epoch   0 Batch    2/269 - Train Accuracy: 0.0989, Validation Accuracy: 0.0909, Loss: 5.6460
Epoch   0 Batch    4/269 - Train Accuracy: 0.2793, Validation Accuracy: 0.3502, Loss: 4.3136
Epoch   0 Batch    6/269 - Train Accuracy: 0.3378, Validation Accuracy: 0.3664, Loss: 3.9786
Epoch   0 Batch    8/269 - Train Accuracy: 0.3515, Validation Accuracy: 0.4121, Loss: 3.9034
Epoch   0 Batch   10/269 - Train Accuracy: 0.3547, Validation Accuracy: 0.4168, Loss: 3.7250
Epoch   0 Batch   12/269 - Train Accuracy: 0.3819, Validation Accuracy: 0.4347, Loss: 3.6611
Epoch   0 Batch   14/269 - Train Accuracy: 0.4083, Validation Accuracy: 0.4417, Loss: 3.5223
Epoch   0 Batch   16/269 - Train Accuracy: 0.4086, Validation Accuracy: 0.4353, Loss: 3.4353
Epoch   0 Batch   18/269 - Train Accuracy: 0.4022, Validation Accuracy: 0.4576, Loss: 3.4321
Epoch   0 Batch   20/269 - Train Accuracy: 0.4204, Validation Accuracy: 0.4665, Loss: 3.3772
Epoch   0 Batch   22/269 - Train Accuracy: 0.4568, Validation Accuracy

Epoch   0 Batch  180/269 - Train Accuracy: 0.5637, Validation Accuracy: 0.5709, Loss: 2.1489
Epoch   0 Batch  182/269 - Train Accuracy: 0.5605, Validation Accuracy: 0.5743, Loss: 2.1555
Epoch   0 Batch  184/269 - Train Accuracy: 0.5312, Validation Accuracy: 0.5698, Loss: 2.1824
Epoch   0 Batch  186/269 - Train Accuracy: 0.5344, Validation Accuracy: 0.5749, Loss: 2.1990
Epoch   0 Batch  188/269 - Train Accuracy: 0.5709, Validation Accuracy: 0.5765, Loss: 2.0794
Epoch   0 Batch  190/269 - Train Accuracy: 0.5591, Validation Accuracy: 0.5702, Loss: 2.0736
Epoch   0 Batch  192/269 - Train Accuracy: 0.5611, Validation Accuracy: 0.5627, Loss: 2.0794
Epoch   0 Batch  194/269 - Train Accuracy: 0.5626, Validation Accuracy: 0.5649, Loss: 2.0791
Epoch   0 Batch  196/269 - Train Accuracy: 0.5537, Validation Accuracy: 0.5789, Loss: 2.1272
Epoch   0 Batch  198/269 - Train Accuracy: 0.4923, Validation Accuracy: 0.5342, Loss: 2.2986
Epoch   0 Batch  200/269 - Train Accuracy: 0.5213, Validation Accuracy

Epoch   1 Batch   92/269 - Train Accuracy: 0.6044, Validation Accuracy: 0.6182, Loss: 1.7832
Epoch   1 Batch   94/269 - Train Accuracy: 0.6252, Validation Accuracy: 0.6214, Loss: 1.7750
Epoch   1 Batch   96/269 - Train Accuracy: 0.6297, Validation Accuracy: 0.6214, Loss: 1.7737
Epoch   1 Batch   98/269 - Train Accuracy: 0.6287, Validation Accuracy: 0.6325, Loss: 1.7477
Epoch   1 Batch  100/269 - Train Accuracy: 0.6418, Validation Accuracy: 0.6218, Loss: 1.7800
Epoch   1 Batch  102/269 - Train Accuracy: 0.6137, Validation Accuracy: 0.6165, Loss: 1.7848
Epoch   1 Batch  104/269 - Train Accuracy: 0.6065, Validation Accuracy: 0.6132, Loss: 1.7352
Epoch   1 Batch  106/269 - Train Accuracy: 0.6238, Validation Accuracy: 0.6222, Loss: 1.7574
Epoch   1 Batch  108/269 - Train Accuracy: 0.6297, Validation Accuracy: 0.6269, Loss: 1.7526
Epoch   1 Batch  110/269 - Train Accuracy: 0.6154, Validation Accuracy: 0.6345, Loss: 1.7256
Epoch   1 Batch  112/269 - Train Accuracy: 0.6298, Validation Accuracy

Epoch   2 Batch    4/269 - Train Accuracy: 0.6521, Validation Accuracy: 0.6710, Loss: 1.5646
Epoch   2 Batch    6/269 - Train Accuracy: 0.6666, Validation Accuracy: 0.6648, Loss: 1.5504
Epoch   2 Batch    8/269 - Train Accuracy: 0.6540, Validation Accuracy: 0.6797, Loss: 1.5878
Epoch   2 Batch   10/269 - Train Accuracy: 0.6612, Validation Accuracy: 0.6784, Loss: 1.5535
Epoch   2 Batch   12/269 - Train Accuracy: 0.6492, Validation Accuracy: 0.6827, Loss: 1.5771
Epoch   2 Batch   14/269 - Train Accuracy: 0.6568, Validation Accuracy: 0.6773, Loss: 1.5338
Epoch   2 Batch   16/269 - Train Accuracy: 0.6860, Validation Accuracy: 0.6782, Loss: 1.5378
Epoch   2 Batch   18/269 - Train Accuracy: 0.6437, Validation Accuracy: 0.6508, Loss: 1.5677
Epoch   2 Batch   20/269 - Train Accuracy: 0.6618, Validation Accuracy: 0.6822, Loss: 1.5401
Epoch   2 Batch   22/269 - Train Accuracy: 0.6760, Validation Accuracy: 0.6760, Loss: 1.5661
Epoch   2 Batch   24/269 - Train Accuracy: 0.6498, Validation Accuracy

Epoch   2 Batch  182/269 - Train Accuracy: 0.7780, Validation Accuracy: 0.7581, Loss: 1.4151
Epoch   2 Batch  184/269 - Train Accuracy: 0.7695, Validation Accuracy: 0.7662, Loss: 1.4111
Epoch   2 Batch  186/269 - Train Accuracy: 0.7582, Validation Accuracy: 0.7687, Loss: 1.3952
Epoch   2 Batch  188/269 - Train Accuracy: 0.7744, Validation Accuracy: 0.7499, Loss: 1.3916
Epoch   2 Batch  190/269 - Train Accuracy: 0.7872, Validation Accuracy: 0.7675, Loss: 1.3586
Epoch   2 Batch  192/269 - Train Accuracy: 0.7873, Validation Accuracy: 0.7745, Loss: 1.3863
Epoch   2 Batch  194/269 - Train Accuracy: 0.7737, Validation Accuracy: 0.7671, Loss: 1.3765
Epoch   2 Batch  196/269 - Train Accuracy: 0.7671, Validation Accuracy: 0.7701, Loss: 1.3711
Epoch   2 Batch  198/269 - Train Accuracy: 0.7616, Validation Accuracy: 0.7763, Loss: 1.3844
Epoch   2 Batch  200/269 - Train Accuracy: 0.7952, Validation Accuracy: 0.7733, Loss: 1.3913
Epoch   2 Batch  202/269 - Train Accuracy: 0.7751, Validation Accuracy

Epoch   3 Batch   94/269 - Train Accuracy: 0.8788, Validation Accuracy: 0.8669, Loss: 1.2250
Epoch   3 Batch   96/269 - Train Accuracy: 0.8523, Validation Accuracy: 0.8705, Loss: 1.2381
Epoch   3 Batch   98/269 - Train Accuracy: 0.8686, Validation Accuracy: 0.8803, Loss: 1.2121
Epoch   3 Batch  100/269 - Train Accuracy: 0.8802, Validation Accuracy: 0.8704, Loss: 1.2135
Epoch   3 Batch  102/269 - Train Accuracy: 0.8796, Validation Accuracy: 0.8733, Loss: 1.1850
Epoch   3 Batch  104/269 - Train Accuracy: 0.8704, Validation Accuracy: 0.8657, Loss: 1.2219
Epoch   3 Batch  106/269 - Train Accuracy: 0.8539, Validation Accuracy: 0.8686, Loss: 1.2136
Epoch   3 Batch  108/269 - Train Accuracy: 0.8704, Validation Accuracy: 0.8635, Loss: 1.2119
Epoch   3 Batch  110/269 - Train Accuracy: 0.8712, Validation Accuracy: 0.8759, Loss: 1.2244
Epoch   3 Batch  112/269 - Train Accuracy: 0.8678, Validation Accuracy: 0.8627, Loss: 1.2039
Epoch   3 Batch  114/269 - Train Accuracy: 0.8736, Validation Accuracy

Epoch   4 Batch    6/269 - Train Accuracy: 0.9313, Validation Accuracy: 0.9157, Loss: 1.1103
Epoch   4 Batch    8/269 - Train Accuracy: 0.9219, Validation Accuracy: 0.9158, Loss: 1.1656
Epoch   4 Batch   10/269 - Train Accuracy: 0.9176, Validation Accuracy: 0.9215, Loss: 1.1239
Epoch   4 Batch   12/269 - Train Accuracy: 0.9148, Validation Accuracy: 0.9190, Loss: 1.1575
Epoch   4 Batch   14/269 - Train Accuracy: 0.9083, Validation Accuracy: 0.9229, Loss: 1.1081
Epoch   4 Batch   16/269 - Train Accuracy: 0.9142, Validation Accuracy: 0.9181, Loss: 1.0929
Epoch   4 Batch   18/269 - Train Accuracy: 0.9201, Validation Accuracy: 0.9175, Loss: 1.1290
Epoch   4 Batch   20/269 - Train Accuracy: 0.9212, Validation Accuracy: 0.9150, Loss: 1.0994
Epoch   4 Batch   22/269 - Train Accuracy: 0.9208, Validation Accuracy: 0.9238, Loss: 1.1349
Epoch   4 Batch   24/269 - Train Accuracy: 0.9149, Validation Accuracy: 0.9190, Loss: 1.1363
Epoch   4 Batch   26/269 - Train Accuracy: 0.9275, Validation Accuracy

Epoch   4 Batch  184/269 - Train Accuracy: 0.9385, Validation Accuracy: 0.9418, Loss: 1.0936
Epoch   4 Batch  186/269 - Train Accuracy: 0.9309, Validation Accuracy: 0.9367, Loss: 1.1010
Epoch   4 Batch  188/269 - Train Accuracy: 0.9360, Validation Accuracy: 0.9407, Loss: 1.0798
Epoch   4 Batch  190/269 - Train Accuracy: 0.9374, Validation Accuracy: 0.9425, Loss: 1.1005
Epoch   4 Batch  192/269 - Train Accuracy: 0.9430, Validation Accuracy: 0.9317, Loss: 1.0942
Epoch   4 Batch  194/269 - Train Accuracy: 0.9385, Validation Accuracy: 0.9388, Loss: 1.1215
Epoch   4 Batch  196/269 - Train Accuracy: 0.9434, Validation Accuracy: 0.9387, Loss: 1.0788
Epoch   4 Batch  198/269 - Train Accuracy: 0.9193, Validation Accuracy: 0.9359, Loss: 1.1133
Epoch   4 Batch  200/269 - Train Accuracy: 0.9399, Validation Accuracy: 0.9344, Loss: 1.1310
Epoch   4 Batch  202/269 - Train Accuracy: 0.9343, Validation Accuracy: 0.9303, Loss: 1.0656
Epoch   4 Batch  204/269 - Train Accuracy: 0.9212, Validation Accuracy

### Save Parameters
Save the `save_path` parameter for inference. The cell below does not save `batch_size`, so it must still be defined when the translation cell runs.

In [24]:
# Save parameters for checkpoint
helper.save_params(save_path)
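
`helper.py` is not shown in this notebook, so the exact behavior of `helper.save_params()` and `helper.load_params()` is not visible here. As a rough sketch of how such a pair could work, assuming they simply pickle the value to disk, the round trip might look like this (`save_params_sketch`, `load_params_sketch`, and `params.p` are hypothetical names for illustration only):

```python
import os
import pickle
import tempfile

def save_params_sketch(params, path):
    """Hypothetical stand-in for a save helper: pickle `params` to `path`."""
    with open(path, 'wb') as f:
        pickle.dump(params, f)

def load_params_sketch(path):
    """Hypothetical stand-in for a load helper: read back the pickled value."""
    with open(path, 'rb') as f:
        return pickle.load(f)

# Write to a temporary directory so the sketch does not touch project files.
params_file = os.path.join(tempfile.mkdtemp(), 'params.p')
save_params_sketch('checkpoints/dev', params_file)

restored = load_params_sketch(params_file)
print(restored)
```

The point of the round trip is that the checkpoint cell can run in a fresh kernel and still find the trained model.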

# Checkpoint

In [25]:
import tensorflow as tf
import numpy as np
import helper
import problem_unittests as tests

_, (source_vocab_to_int, target_vocab_to_int), (source_int_to_vocab, target_int_to_vocab) = helper.load_preprocess()
load_path = helper.load_params()

## Sentence to Sequence
To feed a sentence into the model for translation, you first need to preprocess it.  Implement the function `sentence_to_seq()` to preprocess new sentences.

- Convert the sentence to lowercase
- Convert words into ids using `vocab_to_int`
 - Convert words that are not in the vocabulary to the `<UNK>` word id.

In [26]:
def sentence_to_seq(sentence, vocab_to_int):
    """
    Convert a sentence to a sequence of ids
    :param sentence: String
    :param vocab_to_int: Dictionary to go from the words to an id
    :return: List of word ids
    """

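    # Lowercase the sentence, then map each word to its id;
    # words missing from the vocabulary fall back to the <UNK> id.
    sentence = sentence.lower()
    return [vocab_to_int.get(word, vocab_to_int['<UNK>']) for word in sentence.split()]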

tests.test_sentence_to_seq(sentence_to_seq)

Tests Passed
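
To make the `<UNK>` fallback concrete, here is a minimal sketch that runs the same lookup against a tiny made-up vocabulary. `toy_vocab` and its ids are assumptions for illustration; the real ids come from `helper.load_preprocess()`. The function is repeated so the snippet runs on its own.

```python
# Toy vocabulary, invented for this example only.
toy_vocab = {'<PAD>': 0, '<EOS>': 1, '<UNK>': 2, '<GO>': 3,
             'he': 4, 'saw': 5, 'a': 6, 'truck': 7, '.': 8}

def sentence_to_seq(sentence, vocab_to_int):
    # Same logic as the cell above: lowercase, split, look up, fall back to <UNK>.
    unk_id = vocab_to_int['<UNK>']
    return [vocab_to_int.get(word, unk_id) for word in sentence.lower().split()]

ids = sentence_to_seq('He saw a purple truck .', toy_vocab)
print(ids)  # 'purple' is not in toy_vocab, so its position holds the <UNK> id
```

Because the sentence is lowercased first, `He` and `he` map to the same id.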


## Translate
This will translate `translate_sentence` from English to French.

In [27]:
translate_sentence = 'he saw a old yellow truck .'

translate_sentence = sentence_to_seq(translate_sentence, source_vocab_to_int)

loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load saved model
    loader = tf.train.import_meta_graph(load_path + '.meta')
    loader.restore(sess, load_path)

    input_data = loaded_graph.get_tensor_by_name('input:0')
    logits = loaded_graph.get_tensor_by_name('predictions:0')
    target_sequence_length = loaded_graph.get_tensor_by_name('target_sequence_length:0')
    source_sequence_length = loaded_graph.get_tensor_by_name('source_sequence_length:0')
    keep_prob = loaded_graph.get_tensor_by_name('keep_prob:0')

    translate_logits = sess.run(logits, {input_data: [translate_sentence]*batch_size,
                                         target_sequence_length: [len(translate_sentence)*2]*batch_size,
                                         source_sequence_length: [len(translate_sentence)]*batch_size,
                                         keep_prob: 1.0})[0]

print('Input')
print('  Word Ids:      {}'.format([i for i in translate_sentence]))
print('  English Words: {}'.format([source_int_to_vocab[i] for i in translate_sentence]))

print('\nPrediction')
print('  Word Ids:      {}'.format([i for i in translate_logits]))
print('  French Words: {}'.format(" ".join([target_int_to_vocab[i] for i in translate_logits])))


INFO:tensorflow:Restoring parameters from checkpoints/dev
Input
  Word Ids:      [60, 82, 166, 162, 38, 94, 211]
  English Words: ['he', 'saw', 'a', 'old', 'yellow', 'truck', '.']

Prediction
  Word Ids:      [156, 285, 190, 275, 275, 201, 201, 175, 284, 275, 259, 76, 1]
  French Words: il est le l' l' de de d' automne l' citrons . <EOS>
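
The prediction above ends with id `1`, which the output maps to `<EOS>`. When displaying a translation it can be convenient to drop the `<EOS>` marker and anything after it. A minimal sketch, assuming the end-of-sequence id is `1` as in the output above (`trim_at_eos` is a hypothetical helper, not part of the project files):

```python
def trim_at_eos(word_ids, eos_id):
    """Return the ids before the first eos_id (all ids if eos_id is absent)."""
    trimmed = []
    for word_id in word_ids:
        if word_id == eos_id:
            break
        trimmed.append(word_id)
    return trimmed

# Word ids copied from the prediction printed above.
prediction = [156, 285, 190, 275, 275, 201, 201, 175, 284, 275, 259, 76, 1]
print(trim_at_eos(prediction, eos_id=1))
```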


## Imperfect Translation
You might notice that some sentences translate better than others.  The dataset used here has a vocabulary of only 227 English words, a small fraction of the thousands you use every day, so you will only see good results for sentences built from those words.  For this project, you don't need a perfect translation. However, if you want to create a better translation model, you'll need better data.
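
One way to check in advance whether a sentence stays inside the vocabulary is to list the words that would be mapped to `<UNK>`. This is a minimal sketch with a made-up vocabulary; `out_of_vocab_words` and `toy_vocab` are hypothetical names, and in the notebook you would pass `source_vocab_to_int` instead.

```python
def out_of_vocab_words(sentence, vocab_to_int):
    """List the words in `sentence` that are missing from `vocab_to_int`."""
    return [word for word in sentence.lower().split() if word not in vocab_to_int]

# Toy vocabulary, invented for this example only.
toy_vocab = {'<PAD>': 0, '<EOS>': 1, '<UNK>': 2, '<GO>': 3,
             'he': 4, 'saw': 5, 'a': 6, 'truck': 7, '.': 8}

missing = out_of_vocab_words('He saw a purple truck .', toy_vocab)
print(missing)
```

An empty list means every word has its own id, so the model at least sees the input you intended.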

You can train on the [WMT10 French-English corpus](http://www.statmt.org/wmt10/training-giga-fren.tar).  This dataset has a larger vocabulary and covers a richer range of topics.  However, training on it will take days, so make sure you have a GPU and that the neural network performs well on the dataset we provided first.  Just make sure you play with the WMT10 corpus after you've submitted this project.

## Submitting This Project
When submitting this project, make sure to run all the cells before saving the notebook. Save the notebook file as "dlnd_language_translation.ipynb" and also save it as an HTML file under "File" -> "Download as". Include the "helper.py" and "problem_unittests.py" files in your submission.