# Machine Language Translation
In this project, you’re going to take a peek into the realm of neural network machine translation. You’ll train a sequence-to-sequence model on a dataset of English and French sentences so that it can translate new sentences from English to French.
## Get the Data
The dataset is already in this repository as files, so there are no extra steps to download it. All I need to do is load the files. There are 2 files, and they contain a bunch of sentences in English and French respectively. They are just plain text data.

In [1]:
import os
import pickle
import copy
import numpy as np

def load_data(path):
    input_file = os.path.join(path)
    with open(input_file, 'r', encoding='utf-8') as f:
        data = f.read()

    return data

In [2]:
import problem_unittests as tests

source_path = 'data/small_vocab_en'
target_path = 'data/small_vocab_fr'
source_text = load_data(source_path)
target_text = load_data(target_path)

## Explore the Data

The two datasets store a bunch of sentences in different languages, and that is something we don't have to explore for now. You probably already knew what the data looks like when you decided to download it. **However**, it is worthwhile to explore how complex the datasets are. The complexity can suggest how we should approach the problem to get the right result within the restrictions we have.

`note: ` The two files contain exactly the same number of lines. The i-th line in each file has the same meaning, expressed in a different language.
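A quick way to check this alignment is to pair the lines and compare the counts. The sketch below is only an illustration: `toy_en` and `toy_fr` are made-up in-memory strings standing in for the two files, not the real dataset.

```python
# Toy stand-ins for the two files (hypothetical data, not the real dataset)
toy_en = "new jersey is quiet .\nparis is hot ."
toy_fr = "new jersey est calme .\nparis est chaud ."

en_lines = toy_en.split('\n')
fr_lines = toy_fr.split('\n')

# both texts must contain the same number of lines
assert len(en_lines) == len(fr_lines)

# the i-th English line is paired with the i-th French line
pairs = list(zip(en_lines, fr_lines))
for en_sent, fr_sent in pairs:
    print('EN: {} | FR: {}'.format(en_sent, fr_sent))
```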

In [3]:
from collections import Counter

print('Dataset Brief Stats')
print('* number of unique words in English sample sentences: {}\
        [this is roughly measured/without any preprocessing]'.format(len(Counter(source_text.split()))))
print()

english_sentences = source_text.split('\n')
print('* English sentences')
print('\t- number of sentences: {}'.format(len(english_sentences)))
print('\t- avg. number of words in a sentence: {}'.format(np.average([len(sentence.split()) for sentence in english_sentences])))

french_sentences = target_text.split('\n')
print('* French sentences')
print('\t- number of sentences: {} [data integrity check / should have the same number]'.format(len(french_sentences)))
print('\t- avg. number of words in a sentence: {}'.format(np.average([len(sentence.split()) for sentence in french_sentences])))
print()

sample_sentence_range = (0, 5)
side_by_side_sentences = list(zip(english_sentences, french_sentences))[sample_sentence_range[0]:sample_sentence_range[1]]
print('* Sample sentences range from {} to {}'.format(sample_sentence_range[0], sample_sentence_range[1]))

for index, sentence in enumerate(side_by_side_sentences):
    en_sent, fr_sent = sentence
    print('[{}-th] sentence'.format(index+1))
    print('\tEN: {}'.format(en_sent))
    print('\tFR: {}'.format(fr_sent))
    print()

Dataset Brief Stats
* number of unique words in English sample sentences: 227        [this is roughly measured/without any preprocessing]

* English sentences
	- number of sentences: 137861
	- avg. number of words in a sentence: 13.225277634719028
* French sentences
	- number of sentences: 137861 [data integrity check / should have the same number]
	- avg. number of words in a sentence: 14.226612312401622

* Sample sentences range from 0 to 5
[1-th] sentence
	EN: new jersey is sometimes quiet during autumn , and it is snowy in april .
	FR: new jersey est parfois calme pendant l' automne , et il est neigeux en avril .

[2-th] sentence
	EN: the united states is usually chilly during july , and it is usually freezing in november .
	FR: les états-unis est généralement froid en juillet , et il gèle habituellement en novembre .

[3-th] sentence
	EN: california is usually quiet during march , and it is usually hot in june .
	FR: california est généralement calme en mars , et il est généraleme

## Preprocessing

Here is a brief overview of the steps in this section

- **create lookup tables** 
  - create two mapping tables 
      - (key, value) == (unique word string, its unique index)     - `(1)`
      - (key, value) == (its unique index, unique word string)     - `(2)`
      - `(1)` is used in the next step, and `(2)` is used later in the prediction step
      
      
- **text to word ids**
  - convert each string word in the list of sentences to the index
  - `(1)` is used for converting process
  
  
- **save the pre-processed data**
  - create two `(1)` mapping tables for English and French
  - using the mapping tables, replace strings in the original source and target dataset with indices

### Create Lookup Tables

As mentioned briefly, I am going to implement a function to create lookup tables. Since every model is represented mathematically, the input and the output (prediction) should also be represented as numbers. That is why this step is necessary for an NLP problem: human-readable text is not machine readable. This function takes the text of the sentences and returns two mapping tables (dictionary data type). Along with the words, the special tokens `<PAD>`, `<EOS>`, `<UNK>`, and `<GO>` are added to the mapping tables.

- (key, value) == (unique word string, its unique index)     - `(1)`
- (key, value) == (its unique index, unique word string)     - `(2)`

`(1)` will be used in the next step, `text to word ids`, to find the index that matches each word. `(2)` is not used in the pre-processing step, but it will be used later. After a prediction is made, the words in the output sentence are represented by their indices. The predicted output is machine readable but not human readable, which is why we need `(2)` to convert each index back into a human-readable word string.

<br/>
<img src='./lookup.png' alt='Drawing' width='70%'>

#### References
- [Why special tokens?](https://datascience.stackexchange.com/questions/26947/why-do-we-need-to-add-start-s-end-s-symbols-when-using-recurrent-neural-n)
- [Python `enumerate`](https://docs.python.org/3/library/functions.html#enumerate)

In [4]:
CODES = {'<PAD>': 0, '<EOS>': 1, '<UNK>': 2, '<GO>': 3 }

def create_lookup_tables(text):
    # make a list of unique words
    vocab = set(text.split())

    # (1)
    # starts with the special tokens
    vocab_to_int = copy.copy(CODES)

    # the index (v_i) starts from 4 (the 2nd arg in enumerate() specifies the starting index)
    # since vocab_to_int already contains the special tokens
    for v_i, v in enumerate(vocab, len(CODES)):
        vocab_to_int[v] = v_i

    # (2)
    int_to_vocab = {v_i: v for v, v_i in vocab_to_int.items()}

    return vocab_to_int, int_to_vocab
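To make the two tables concrete, here is a small self-contained sketch of the round trip from words to ids and back. It mirrors the function above but is not the notebook's code: `build_lookup_tables`, `SPECIAL_CODES`, and the toy sentence are illustrative names, and the `sorted()` call is an assumption added only so that the ids are reproducible from run to run.

```python
import copy

SPECIAL_CODES = {'<PAD>': 0, '<EOS>': 1, '<UNK>': 2, '<GO>': 3}

def build_lookup_tables(text):
    # sorted() is an assumption added for this sketch: it makes the ids deterministic
    vocab = sorted(set(text.split()))

    # (1) word -> id, starting after the special tokens
    vocab_to_int = copy.copy(SPECIAL_CODES)
    for word_id, word in enumerate(vocab, len(SPECIAL_CODES)):
        vocab_to_int[word] = word_id

    # (2) id -> word
    int_to_vocab = {word_id: word for word, word_id in vocab_to_int.items()}
    return vocab_to_int, int_to_vocab

toy_vocab_to_int, toy_int_to_vocab = build_lookup_tables("paris is hot . paris is quiet .")

# (1) converts words to ids
ids = [toy_vocab_to_int[word] for word in "paris is hot .".split()]
# (2) converts the ids back to the original words
words = [toy_int_to_vocab[word_id] for word_id in ids]
print(ids, words)
```

The second table is simply the inverse of the first, which is why a prediction made of ids can always be turned back into text.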

### Text to Word Ids

Two `(1)` lookup tables are passed to the `text_to_ids` function as arguments. They are used to convert the English (source) and French (target) text respectively. This part is mostly plain programming, so there is not much to explain. I will just go over a few things to remember before jumping in.

- the original (raw) source and target data each contain a list of sentences
  - each dataset is represented as a single string

- the number of sentences is the same for English and French

- each sentence is visited in turn, and every word is converted into its corresponding index
  - the indices of one sentence are stored in a list
  - this makes the resulting list a 2-D array (row: sentence, column: word index)

- for every target sentence, the special token `<EOS>` should be inserted at the end
  - this token suggests when to stop creating a sequence
  
<br/>
<img src='./conversion.png' alt='Drawing' width='70%'>
<br/>

In [5]:
def text_to_ids(source_text, target_text, source_vocab_to_int, target_vocab_to_int):
    """
        1st, 2nd args: raw string text to be converted
        3rd, 4th args: lookup tables for 1st and 2nd args respectively
    
        return: A tuple of lists (source_id_text, target_id_text) converted
    """
    
    # empty list of converted sentences
    source_text_id = []
    target_text_id = []
    
    # make a list of sentences (extraction)
    source_sentences = source_text.split("\n")
    target_sentences = target_text.split("\n")
    
    max_source_sentence_length = max([len(sentence.split(" ")) for sentence in source_sentences])
    max_target_sentence_length = max([len(sentence.split(" ")) for sentence in target_sentences])
    
    # iterate through each sentence (# of sentences in source & target is the same)
    for i in range(len(source_sentences)):
        # extract sentences one by one
        source_sentence = source_sentences[i]
        target_sentence = target_sentences[i]
        
        # make a list of tokens/words (extraction) from the chosen sentence,
        # skipping the empty strings that extra spaces produce
        source_tokens = [token for token in source_sentence.split(" ") if token != ""]
        target_tokens = [token for token in target_sentence.split(" ") if token != ""]
        
        # rows pre-filled with <PAD>; the target row gets one extra slot for <EOS>
        source_token_id = [source_vocab_to_int['<PAD>']] * max_source_sentence_length
        target_token_id = [target_vocab_to_int['<PAD>']] * (max_target_sentence_length + 1)
        
        for index, token in enumerate(source_tokens):
            source_token_id[index] = source_vocab_to_int[token]
        
        for index, token in enumerate(target_tokens):
            target_token_id[index] = target_vocab_to_int[token]
                
        # put <EOS> right after the last word of the chosen target sentence (before the padding)
        # this token suggests when to stop creating a sequence
        target_token_id[len(target_tokens)] = target_vocab_to_int['<EOS>']
            
        # add each converted sentence to the final list
        source_text_id.append(source_token_id)
        target_text_id.append(target_token_id)
    
    return source_text_id, target_text_id
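The row layout can be seen on a toy example. This is a minimal sketch, assuming that `<EOS>` is meant to follow the last word directly and that `<PAD>` fills the rest of the row; `sentence_to_row`, `toy_target_vocab`, and the word ids are hypothetical and chosen only for illustration.

```python
PAD_ID, EOS_ID = 0, 1
# hypothetical target vocabulary for the toy sentences below
toy_target_vocab = {'<PAD>': PAD_ID, '<EOS>': EOS_ID,
                    'paris': 4, 'est': 5, 'chaud': 6, '.': 7}

def sentence_to_row(sentence, vocab_to_int, max_words):
    # words -> ids
    row = [vocab_to_int[token] for token in sentence.split()]
    # <EOS> comes right after the last word
    row.append(vocab_to_int['<EOS>'])
    # pad every row to the same width: max_words plus one slot for <EOS>
    row += [vocab_to_int['<PAD>']] * (max_words + 1 - len(row))
    return row

toy_sentences = ["paris est chaud .", "paris ."]
max_words = max(len(sentence.split()) for sentence in toy_sentences)
rows = [sentence_to_row(sentence, toy_target_vocab, max_words) for sentence in toy_sentences]
print(rows)  # [[4, 5, 6, 7, 1], [4, 7, 1, 0, 0]]
```

Every row has the same length, so the list of rows can be treated as a 2-D array (row: sentence, column: word index).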

### Preprocess and Save Data

`create_lookup_tables` and `text_to_ids` are general-purpose functions, so they can be used for other languages too. In this project the two languages are English (source) and French (target), so their texts have to be fed into `create_lookup_tables` and `text_to_ids` to generate the pre-processed dataset. Here are the steps to do it.

- Load data(text) from the original file for English and French
- Make them lower case letters
- Create lookup tables for both English and French
- Convert the original data into the list of sentences whose words are represented in index
- Finally, save the preprocessed data to the external file (checkpoint)

In [6]:
def preprocess_and_save_data(source_path, target_path, text_to_ids):
    # Preprocess
    
    # load original data (English, French)
    source_text = load_data(source_path)
    target_text = load_data(target_path)

    # to the lower case
    source_text = source_text.lower()
    target_text = target_text.lower()

    # create lookup tables for English and French data
    source_vocab_to_int, source_int_to_vocab = create_lookup_tables(source_text)
    target_vocab_to_int, target_int_to_vocab = create_lookup_tables(target_text)

    # create list of sentences whose words are represented in index
    source_text, target_text = text_to_ids(source_text, target_text, source_vocab_to_int, target_vocab_to_int)

    # Save data for later use
    # the `with` block closes the file once the dump is finished
    with open('preprocess.p', 'wb') as out_file:
        pickle.dump((
            (source_text, target_text),
            (source_vocab_to_int, target_vocab_to_int),
            (source_int_to_vocab, target_int_to_vocab)), out_file)


In [7]:
preprocess_and_save_data(source_path, target_path, text_to_ids)

## Checkpoint

This project uses a small set of sentences. In general, however, NLP requires a huge amount of raw text, and preprocessing it takes quite a long time, so repeating that work should be avoided whenever possible. In practice, saving the preprocessed data to an external file speeds up your work and lets you focus more on building the model.

In [8]:
import numpy as np

(source_int_text, target_int_text), (source_vocab_to_int, target_vocab_to_int), _ = pickle.load(open('preprocess.p', mode='rb'))
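The save-then-restore idea can be tried in isolation. The sketch below is an assumption-laden miniature, not the notebook's checkpoint: `toy_checkpoint` imitates the shape of the saved tuple with made-up values, and the file is written to a temporary directory rather than to `preprocess.p`.

```python
import os
import pickle
import tempfile

# hypothetical miniature of the preprocessed tuple:
# (id texts), (word -> id tables), (id -> word tables)
toy_checkpoint = (
    ([[4, 5]], [[4, 5, 1]]),
    ({'<PAD>': 0}, {'<PAD>': 0}),
    ({0: '<PAD>'}, {0: '<PAD>'}),
)

checkpoint_path = os.path.join(tempfile.mkdtemp(), 'toy_preprocess.p')

# save once after preprocessing
with open(checkpoint_path, 'wb') as out_file:
    pickle.dump(toy_checkpoint, out_file)

# later sessions can start here and skip the preprocessing
with open(checkpoint_path, 'rb') as in_file:
    restored = pickle.load(in_file)

(toy_source_ids, toy_target_ids), _, _ = restored
print(restored == toy_checkpoint)
```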

### Check the Environment (TensorFlow and GPU)

Since a recurrent neural network is a fairly heavy model to train, it is recommended to train it in a GPU environment.

In [9]:
from distutils.version import LooseVersion
import warnings
import tensorflow as tf

# Check TensorFlow Version
print('TensorFlow Version: {}'.format(tf.__version__))

# Check for a GPU
if not tf.test.gpu_device_name():
    warnings.warn('No GPU found. Please use a GPU to train your neural network.')
else:
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))

TensorFlow Version: 1.7.0




## Build the Neural Network

In this notebook, I am going to build a special kind of model called 'sequence to sequence' (seq2seq for short). The entire model can be separated into 2 sub-models. The first is called the __[E]__ Encoder, and the second is called the __[D]__ Decoder. __[E]__ takes raw input text just like any other RNN architecture does and, at the end, outputs a neural representation. This is typical, but pay attention to what this output really is: the output of __[E]__ becomes the input data for __[D]__.

That is why we call __[E]__ the Encoder and __[D]__ the Decoder. __[E]__ produces an output encoded in a neural representation whose meaning is not directly readable to us; it is somewhat encrypted. __[D]__ is able to look inside __[E]__'s output, and it creates a totally different output (the French translation in this case).

In order to build such a model, there are 4 steps to do overall.
- __(1)__ make Input to Encoder networks
- __(2)__ build Encoder networks and retrieve the output
- __(3)__ make Input to Decoder networks (may need to add special characters. explained in more detail later)
- __(4)__ build Decoder networks and retrieve the output

These steps are conceptually right, but each one should be broken down further for the actual implementation. The functions that I am going to implement are listed below.



- `model_inputs`
- `process_decoding_input`
- `encoding_layer`
- `decoding_layer_train`
- `decoding_layer_infer`
- `decoding_layer`
- `seq2seq_model`

(2018/04/18)

### Input
Implement the `model_inputs()` function to create TF Placeholders for the Neural Network. It should create the following placeholders:

- Input text placeholder named "input" using the TF Placeholder name parameter with rank 2.
- Targets placeholder with rank 2.
- Learning rate placeholder with rank 0.
- Keep probability placeholder named "keep_prob" using the TF Placeholder name parameter with rank 0.

Return the placeholders in the following tuple: (Input, Targets, Learning Rate, Keep Probability)

In [10]:
def model_inputs(source_seq_length, target_seq_length):
    # None: batch size (left unspecified)
    # source_seq_length: number of word ids in each padded source sentence
    input = tf.placeholder(tf.int32, [None, source_seq_length], name="input")
    
    # None: batch size (left unspecified)
    # target_seq_length: number of word ids in each padded target sentence (not necessarily the same as the source)
    targets = tf.placeholder(tf.int32, [None, target_seq_length], name="targets")
    
    # simply floating value for learning_rate (set this value, so that it could be changed over time)
    learning_rate = tf.placeholder(tf.float32)
    
    # keep_prob
    keep_prob = tf.placeholder(tf.float32, name="keep_prob")
    
    return input, targets, learning_rate, keep_prob

### Process Decoding Input

<img src="./training_phase.png" style="width:400px;"/>

The figure above is borrowed from Thang Luong's thesis ['Neural Machine Translation'](https://github.com/lmthang/thesis/blob/master/thesis.pdf)

Alternately, [tf.contrib.seq2seq.TrainingHelper](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/TrainingHelper) can be used.
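During training the decoder input is the target batch shifted one step to the right: the last id of every row is dropped and `<GO>` is placed in front. A plain-Python sketch of that transformation on a toy batch follows; `shift_for_decoder` and the toy ids are illustrative names and values, and lists stand in for tensors.

```python
GO_ID = 3   # id of '<GO>' in CODES
EOS_ID = 1  # id of '<EOS>' in CODES

def shift_for_decoder(target_batch, go_id):
    # drop the last id of every row, then put <GO> in front
    return [[go_id] + row[:-1] for row in target_batch]

# toy target batch: word ids followed by <EOS> (1) and <PAD> (0)
toy_targets = [[10, 11, 12, EOS_ID],
               [20, 21, EOS_ID, 0]]

decoder_inputs = shift_for_decoder(toy_targets, GO_ID)
print(decoder_inputs)  # [[3, 10, 11, 12], [3, 20, 21, 1]]
```

The row length is unchanged, so at step t the decoder receives the word it should have produced at step t-1.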

In [11]:
def process_decoding_input(target_data, target_vocab_to_int, batch_size):
    """
    Preprocess target data for decoding
    :param target_data: Target Placeholder
    :param target_vocab_to_int: Dictionary to go from the target words to an id
    :param batch_size: Batch Size
    :return: Preprocessed target data
    """
    # get '<GO>' id
    go_id = target_vocab_to_int['<GO>']
    
    # TF strided_slice (https://www.tensorflow.org/api_docs/python/tf/strided_slice)
    # -- extracts a strided slice of a tensor (generalized python array indexing).
    #   -- can be thought as splitting into multiple tensors with the striding window size from begin to end
    # -- arguments: TF Tensor, Begin, End, Strides
    after_slice = tf.strided_slice(target_data, [0, 0], [batch_size, -1], [1, 1])
    
    # TF concat (https://www.tensorflow.org/api_docs/python/tf/concat)
    # -- Concatenates tensors along one dimension.
    # -- arguments: a list of TF Tensor (tf.fill and after_slice in this case), axis=1
    
    # TF fill (https://www.tensorflow.org/api_docs/python/tf/fill)
    # -- Creates a tensor filled with a scalar value.
    # -- arguments: TF Tensor (must be int32/int64), value to fill
    after_concat = tf.concat( [tf.fill([batch_size, 1], go_id), after_slice], 1)
    
    return after_concat

### Encoding
Implement `encoding_layer()` to create an Encoder RNN layer using [`tf.nn.dynamic_rnn()`](https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn).

In [12]:
def encoding_layer(rnn_inputs, rnn_size, num_layers, keep_prob):
    """
    Create encoding layer
    :param rnn_inputs: Inputs for the RNN
    :param rnn_size: RNN Size
    :param num_layers: Number of layers
    :param keep_prob: Dropout keep probability
    :return: RNN state
    """
    # TensorFlow Architecture (https://www.tensorflow.org/extend/architecture#overview)
    # TensorFlow Fused Ops (https://www.tensorflow.org/performance/performance_guide#common_fused_ops)
    # what is kernel in TensorFlow? (https://www.tensorflow.org/extend/adding_an_op#implement-the-kernel-for-the-op)
    # RNN cell category (https://www.tensorflow.org/api_guides/python/contrib.rnn#Core_RNN_Cell_wrappers_RNNCells_that_wrap_other_RNNCells_)
    
    # create one LSTM cell
    # -- LSTM in blog (http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
    # -- BasicLSTMCell (https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/BasicLSTMCell)
    #     - It does not allow cell clipping, a projection layer, and does not use peep-hole connections: it is the basic baseline. (threshold)
    # -- LSTMCell (https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/LSTMCell)
    # -- LSTMBlockCell (https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/LSTMBlockCell)
    # -- LSTMBlockFusedCell (https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/LSTMBlockFusedCell)
    # -- CudnnLSTM (https://www.tensorflow.org/api_docs/python/tf/contrib/cudnn_rnn/CudnnLSTM)
    # 
    # TensorFlow RNN Performance (official doc)
    # -- https://www.tensorflow.org/performance/performance_guide#rnn_performance
    cell = tf.contrib.rnn.BasicLSTMCell(rnn_size)
    
    # make an array of LSTM cells in length of num_layers
    # -- https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/MultiRNNCell
    enc_cell = tf.contrib.rnn.MultiRNNCell([cell] * num_layers)
#     print('enc_cell: {}'.format(enc_cell))
    
    # add dropout
    # -- https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/DropoutWrapper
    enc_cell = tf.contrib.rnn.DropoutWrapper(enc_cell, keep_prob)
    
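    # dynamic_rnn (https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn)
    # -- Performs fully dynamic unrolling of inputs.
    print('rnn_inputs: {}'.format(rnn_inputs))

    # note: the single `cell` is passed here, not `enc_cell`, so the stacked layers and
    # the dropout wrapper built above are not applied; the returned state therefore
    # matches the single-cell decoder in `decoding_layer`
    _, enc_state = tf.nn.dynamic_rnn(cell, rnn_inputs, dtype=tf.float32)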
    return enc_state

### Decoding - Training
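Create training logits using [`tf.contrib.seq2seq.simple_decoder_fn_train()`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/simple_decoder_fn_train) and [`tf.contrib.seq2seq.dynamic_rnn_decoder()`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/dynamic_rnn_decoder), then apply `output_fn` to the [`tf.contrib.seq2seq.dynamic_rnn_decoder()`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/dynamic_rnn_decoder) outputs. Both functions belong to the early TensorFlow 1.0 `tf.contrib.seq2seq` API; `decoding_layer` below has its call to `decoding_layer_train` commented out and builds the training decoder with `TrainingHelper` and `BasicDecoder` instead.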

In [13]:
def decoding_layer_train(encoder_state, dec_cell, dec_embed_input, sequence_length, decoding_scope,
                         output_fn, keep_prob):
    """
    Create a decoding layer for training
    :param encoder_state: Encoder State
    :param dec_cell: Decoder RNN Cell
    :param dec_embed_input: Decoder embedded input
    :param sequence_length: Sequence Length
    :param decoding_scope: TensorFlow Variable Scope for decoding
    :param output_fn: Function to apply the output layer
    :param keep_prob: Dropout keep probability
    :return: Train Logits
    """
    dec_cell = tf.contrib.rnn.DropoutWrapper(dec_cell, keep_prob)
    train_decoder_fn = tf.contrib.seq2seq.simple_decoder_fn_train(encoder_state)
    train_pred, _, _ = tf.contrib.seq2seq.dynamic_rnn_decoder(dec_cell, 
                                                              train_decoder_fn, 
                                                              dec_embed_input,
                                                              sequence_length,
                                                              scope=decoding_scope)
    logits = output_fn(train_pred)
    return logits

### Decoding - Inference
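Create inference logits using [`tf.contrib.seq2seq.simple_decoder_fn_inference()`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/simple_decoder_fn_inference) and [`tf.contrib.seq2seq.dynamic_rnn_decoder()`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/dynamic_rnn_decoder). As with the training version, `decoding_layer` below has its call to `decoding_layer_infer` commented out and uses `GreedyEmbeddingHelper` with `BasicDecoder` instead.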

In [14]:
def decoding_layer_infer(encoder_state, dec_cell, dec_embeddings, start_of_sequence_id, end_of_sequence_id,
                         maximum_length, vocab_size, decoding_scope, output_fn, keep_prob):
    """
    Create a decoding layer for inference
    :param encoder_state: Encoder state
    :param dec_cell: Decoder RNN Cell
    :param dec_embeddings: Decoder embeddings
    :param start_of_sequence_id: GO ID
    :param end_of_sequence_id: EOS Id
    :param maximum_length: The maximum allowed time steps to decode
    :param vocab_size: Size of vocabulary
    :param decoding_scope: TensorFlow Variable Scope for decoding
    :param output_fn: Function to apply the output layer
    :param keep_prob: Dropout keep probability
    :return: Inference Logits
    """
    infer_decoder_fn = tf.contrib.seq2seq.simple_decoder_fn_inference(output_fn, 
                                                                     encoder_state,
                                                                     dec_embeddings,
                                                                     start_of_sequence_id, end_of_sequence_id,
                                                                     maximum_length, vocab_size)
    logits, _, _ = tf.contrib.seq2seq.dynamic_rnn_decoder(dec_cell,
                                                         infer_decoder_fn,
                                                         scope=decoding_scope)
    return logits
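At inference time there is no target sentence to feed in, so the decoder starts from `<GO>`, feeds its own previous prediction back in, and stops at `<EOS>` or after a maximum number of steps. The loop below is a simplified sketch of that idea in plain Python, not the TensorFlow implementation: `next_token_scores` is a hypothetical callable standing in for the decoder cell plus output layer, and the hidden state a real RNN carries between steps is left out.

```python
def greedy_decode(next_token_scores, go_id, eos_id, max_steps):
    """next_token_scores is a hypothetical callable: previous id -> list of scores, one per id."""
    output_ids = []
    previous_id = go_id
    for _ in range(max_steps):
        scores = next_token_scores(previous_id)
        # greedy choice: take the id with the highest score
        previous_id = max(range(len(scores)), key=lambda i: scores[i])
        if previous_id == eos_id:
            break
        output_ids.append(previous_id)
    return output_ids

# toy "model": a fixed transition table <GO>(3) -> 4 -> 5 -> <EOS>(1)
toy_table = {3: 4, 4: 5, 5: 1}

def toy_scores(previous_id):
    scores = [0.0] * 6
    scores[toy_table[previous_id]] = 1.0
    return scores

print(greedy_decode(toy_scores, go_id=3, eos_id=1, max_steps=10))  # [4, 5]
```

The `max_steps` limit matters because nothing guarantees that a trained model will ever emit `<EOS>`.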

### Build the Decoding Layer
Implement `decoding_layer()` to create a Decoder RNN layer.

- Create RNN cell for decoding using `rnn_size` and `num_layers`.
- Create the output function using [`lambda`](https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions) to transform its input, logits, to class logits.
- Use your `decoding_layer_train(encoder_state, dec_cell, dec_embed_input, sequence_length, decoding_scope, output_fn, keep_prob)` function to get the training logits.
- Use your `decoding_layer_infer(encoder_state, dec_cell, dec_embeddings, start_of_sequence_id, end_of_sequence_id, maximum_length, vocab_size, decoding_scope, output_fn, keep_prob)` function to get the inference logits.

Note: You'll need to use [tf.variable_scope](https://www.tensorflow.org/api_docs/python/tf/variable_scope) to share variables between training and inference.

In [15]:
from tensorflow.contrib.seq2seq import TrainingHelper, GreedyEmbeddingHelper, BasicDecoder, dynamic_decode
# from tensorflow.layers import Dense

def decoding_layer(dec_embed_input, dec_embeddings, encoder_state, vocab_size, sequence_length, rnn_size,
                   num_layers, target_vocab_to_int, keep_prob, batch_size):
    """
    Create decoding layer
    :param dec_embed_input: Decoder embedded input
    :param dec_embeddings: Decoder embeddings
    :param encoder_state: The encoded state
    :param vocab_size: Size of vocabulary
    :param sequence_length: Sequence Length
    :param rnn_size: RNN Size
    :param num_layers: Number of layers
    :param target_vocab_to_int: Dictionary to go from the target words to an id
    :param keep_prob: Dropout keep probability
    :param batch_size: Batch Size
    :return: Tuple of (Training Logits, Inference Logits)
    """
    cell = tf.contrib.rnn.BasicLSTMCell(rnn_size)
#     dec_cell = tf.contrib.rnn.MultiRNNCell([cell] * num_layers)
    
    train_helper = TrainingHelper(dec_embed_input, sequence_length)

    projection_layer = tf.layers.Dense(len(target_vocab_to_int))

    train_decoder = BasicDecoder(cell, 
                                 train_helper, 
                                 encoder_state,
                                output_layer=projection_layer)

    dec_train_logits, _, _ = dynamic_decode(train_decoder)
    
    # inference
    infer_helper = GreedyEmbeddingHelper(dec_embeddings, tf.fill([batch_size], target_vocab_to_int['<GO>']), target_vocab_to_int['<EOS>'])

    infer_decoder = BasicDecoder(cell, 
                                 infer_helper, 
                                 encoder_state,
                                output_layer=projection_layer)
    
    dec_infer_logits, _, _ = dynamic_decode(infer_decoder)
    
#     with tf.variable_scope('decoding') as decoding_scope:
#         output_fn = lambda x: tf.contrib.layers.fully_connected(x, 
#                                                                 vocab_size, 
#                                                                 activation_fn=None,
#                                                                 scope=decoding_scope)
    
#     with tf.variable_scope('decoding') as decoding_scope:
#         dec_train_logits = decoding_layer_train(encoder_state,
#                                                 dec_cell,
#                                                 dec_embed_input, 
#                                                 sequence_length, 
#                                                 decoding_scope, 
#                                                 output_fn, 
#                                                 keep_prob)

#     decoding_scope.reuse_variables() # with tf.variable_scope('decoding', reuse=True) as decoding_scope:
#     dec_infer_logits = decoding_layer_infer(encoder_state, 
#                                             dec_cell, 
#                                             dec_embeddings, 
#                                             target_vocab_to_int['<GO>'],
#                                             target_vocab_to_int['<EOS>'], 
#                                             sequence_length-1, # <EOS> not needed for inference
#                                             vocab_size, 
#                                             decoding_scope, 
#                                             output_fn, 
#                                             keep_prob)
    
    return dec_train_logits, dec_infer_logits



### Build the Neural Network
Apply the functions you implemented above to:

- Apply embedding to the input data for the encoder.
- Encode the input using your `encoding_layer(rnn_inputs, rnn_size, num_layers, keep_prob)`.
- Process target data using your `process_decoding_input(target_data, target_vocab_to_int, batch_size)` function.
- Apply embedding to the target data for the decoder.
- Decode the encoded input using your `decoding_layer(dec_embed_input, dec_embeddings, encoder_state, vocab_size, sequence_length, rnn_size, num_layers, target_vocab_to_int, keep_prob, batch_size)`.
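Applying an embedding means replacing every word id with the corresponding row of an embedding matrix. A tiny plain-Python sketch of that lookup is shown here; `toy_embeddings` holds made-up values and `embed` is a hypothetical helper, whereas in the model the matrix is a trainable variable.

```python
# toy embedding matrix: 4 ids, embedding size 2 (values are made up)
toy_embeddings = [
    [0.0, 0.0],   # id 0
    [0.1, 0.2],   # id 1
    [0.3, 0.4],   # id 2
    [0.5, 0.6],   # id 3
]

def embed(batch_of_ids, embeddings):
    # replace each id by its row of the embedding matrix
    return [[embeddings[word_id] for word_id in sentence] for sentence in batch_of_ids]

# a batch of 2 sentences with 2 ids each
embedded = embed([[3, 1], [2, 0]], toy_embeddings)
print(embedded)
```

The result has one more dimension than the input: (batch, sequence length) becomes (batch, sequence length, embedding size).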

In [16]:
def seq2seq_model(input_data, target_data, keep_prob, batch_size, sequence_length, source_vocab_size, target_vocab_size,
                  enc_embedding_size, dec_embedding_size, rnn_size, num_layers, target_vocab_to_int):
    """
    Build the Sequence-to-Sequence part of the neural network
    :param input_data: Input placeholder
    :param target_data: Target placeholder
    :param keep_prob: Dropout keep probability placeholder
    :param batch_size: Batch Size
    :param sequence_length: Sequence Length
    :param source_vocab_size: Source vocabulary size
    :param target_vocab_size: Target vocabulary size
    :param enc_embedding_size: Encoder embedding size
    :param dec_embedding_size: Decoder embedding size
    :param rnn_size: RNN Size
    :param num_layers: Number of layers
    :param target_vocab_to_int: Dictionary to go from the target words to an id
    :return: Tuple of (Training Logits, Inference Logits)
    """
    enc_embed_input = tf.contrib.layers.embed_sequence(input_data, source_vocab_size, enc_embedding_size)
    print(enc_embed_input)
    enc_state = encoding_layer(enc_embed_input, rnn_size, num_layers, keep_prob)
    
    dec_input = process_decoding_input(target_data, target_vocab_to_int, batch_size)
    dec_embeddings = tf.Variable(tf.random_uniform([target_vocab_size, dec_embedding_size]))
    dec_embed_input = tf.nn.embedding_lookup(dec_embeddings, dec_input)
    
    train_logits, infer_logits = decoding_layer(dec_embed_input, 
                                                dec_embeddings, 
                                                enc_state,
                                                target_vocab_size, 
                                                sequence_length, 
                                                rnn_size, 
                                                num_layers, 
                                                target_vocab_to_int, 
                                                keep_prob,
                                                batch_size)
    return train_logits, infer_logits

## Neural Network Training
### Hyperparameters
Tune the following parameters:

- Set `epochs` to the number of epochs.
- Set `batch_size` to the batch size.
- Set `rnn_size` to the size of the RNNs.
- Set `num_layers` to the number of layers.
- Set `encoding_embedding_size` to the size of the embedding for the encoder.
- Set `decoding_embedding_size` to the size of the embedding for the decoder.
- Set `learning_rate` to the learning rate.
- Set `keep_probability` to the Dropout keep probability

In [17]:
epochs = 3
batch_size = 256
rnn_size = 512
num_layers = 2
encoding_embedding_size = 200 
decoding_embedding_size = 200 
learning_rate = 0.001
keep_probability = 0.5

### Build the Graph
Build the graph using the neural network you implemented.

In [18]:
save_path = 'checkpoints/dev'
(source_int_text, target_int_text), (source_vocab_to_int, target_vocab_to_int), _ = pickle.load(open('preprocess.p', mode='rb'))
max_source_sentence_length = max([len(sentence) for sentence in source_int_text])

source_seq_length = len(source_int_text[0])
target_seq_length = len(target_int_text[0])

train_graph = tf.Graph()
with train_graph.as_default():
    input_data, targets, lr, keep_prob = model_inputs(source_seq_length, target_seq_length)
    target_length = tf.placeholder(tf.int32, (None,), name='target_length')
    input_shape = tf.shape(input_data)
    
    train_logits, inference_logits = seq2seq_model(tf.reverse(input_data, [-1]), 
                                                   targets, 
                                                   keep_prob, 
                                                   batch_size, 
                                                   target_length, 
                                                   len(source_vocab_to_int), 
                                                   len(target_vocab_to_int),
                                                   encoding_embedding_size, 
                                                   decoding_embedding_size, 
                                                   rnn_size, num_layers, 
                                                   target_vocab_to_int)

#     tf.identity(inference_logits, 'logits')
#     with tf.name_scope("optimization"):
        # Loss function
#     crossent = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=targets, logits=train_logits.rnn_output)
#     train_loss = (tf.reduce_sum(crossent * tf.ones([input_shape[0], target_seq_length])) / batch_size)
    
    
    cost = tf.contrib.seq2seq.sequence_loss(
        train_logits.rnn_output,
        targets,
        tf.ones([input_shape[0], target_seq_length]))

    # Optimizer
    optimizer = tf.train.AdamOptimizer(lr)

    # Gradient clipping: clip each gradient to [-1, 1], then apply the clipped values
    gradients = optimizer.compute_gradients(cost)
    capped_gradients = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gradients if grad is not None]
    train_op = optimizer.apply_gradients(capped_gradients)

Tensor("EmbedSequence/embedding_lookup:0", shape=(?, 17, 200), dtype=float32)
rnn_inputs: Tensor("EmbedSequence/embedding_lookup:0", shape=(?, 17, 200), dtype=float32)
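
The gradient-clipping step in the graph above limits every gradient to the range [-1, 1] and skips variables that have no gradient. A rough pure-Python sketch of that list comprehension follows; the `clip_by_value` helper, the gradient values, and the variable names are all made up for illustration.

```python
def clip_by_value(value, low, high):
    """Clamp a scalar into [low, high] (hypothetical stand-in for tf.clip_by_value)."""
    return max(low, min(high, value))

# Hypothetical (gradient, variable name) pairs; None marks a variable with no gradient.
gradients = [(2.5, 'w1'), (-0.3, 'w2'), (None, 'unused'), (-7.0, 'b1')]

# Same shape as the notebook's comprehension: drop None gradients, clip the rest.
capped_gradients = [(clip_by_value(grad, -1.0, 1.0), var)
                    for grad, var in gradients if grad is not None]
print(capped_gradients)  # [(1.0, 'w1'), (-0.3, 'w2'), (-1.0, 'b1')]
```

The clipped pairs only influence training when they are the ones handed to the optimizer's update step.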


### Train
Train the neural network on the preprocessed data. If you have a hard time getting a good loss, check the forums to see if anyone is having the same problem.

In [19]:
def pad_sentence_batch(sentence_batch):
    """
    Pad sentence with <PAD> id
    """
    max_sentence = max([len(sentence) for sentence in sentence_batch])
    return [sentence + [CODES['<PAD>']] * (max_sentence - len(sentence))
            for sentence in sentence_batch]
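
To see what the padding does, here is a small self-contained run of the same function. It assumes the `<PAD>` id is 0; the real value comes from the notebook's `CODES` mapping, and the sample batch is invented.

```python
# Assumption: <PAD> maps to id 0. In the notebook, CODES is defined elsewhere.
CODES = {'<PAD>': 0}

def pad_sentence_batch(sentence_batch):
    """Right-pad every sentence to the length of the longest one in the batch."""
    max_sentence = max(len(sentence) for sentence in sentence_batch)
    return [sentence + [CODES['<PAD>']] * (max_sentence - len(sentence))
            for sentence in sentence_batch]

batch = [[5, 6, 7], [8], [9, 10]]   # hypothetical word-id sequences
padded = pad_sentence_batch(batch)
print(padded)  # [[5, 6, 7], [8, 0, 0], [9, 10, 0]]
```

Padding is computed per batch, so different batches can end up with different sequence lengths.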

In [20]:
def batch_data(source, target, batch_size):
    """
    Batch source and target together
    """
    for batch_i in range(0, len(source)//batch_size):
        start_i = batch_i * batch_size
        source_batch = source[start_i:start_i + batch_size]
        target_batch = target[start_i:start_i + batch_size]
        yield (np.asarray(source_batch), np.asarray(target_batch))
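
One consequence of `len(source)//batch_size` is that a trailing partial batch is never yielded. The sketch below mirrors the generator with plain lists instead of NumPy arrays (an illustrative simplification) on made-up data.

```python
def batch_data(source, target, batch_size):
    """Yield aligned (source, target) slices; leftover examples are dropped."""
    for batch_i in range(len(source) // batch_size):
        start_i = batch_i * batch_size
        yield (source[start_i:start_i + batch_size],
               target[start_i:start_i + batch_size])

source = list(range(10))             # 10 hypothetical examples
target = [x * 10 for x in source]
batches = list(batch_data(source, target, batch_size=4))
print(len(batches))   # 2
print(batches[-1])    # ([4, 5, 6, 7], [40, 50, 60, 70])
```

Examples 8 and 9 never reach the model here, so with a large `batch_size` it can be worth checking how many examples are skipped per epoch.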

In [None]:
import time

def get_accuracy(target, logits):
    """
    Calculate accuracy
    """
    max_seq = max(target.shape[1], logits.shape[1])
    if max_seq - target.shape[1]:
        target = np.pad(
            target,
            [(0,0),(0,max_seq - target.shape[1])],
            'constant')
    if max_seq - logits.shape[1]:
        logits = np.pad(
            logits,
            [(0,0),(0,max_seq - logits.shape[1]), (0,0)],
            'constant')

    return np.mean(np.equal(target, np.argmax(logits, 2)))

train_source = source_int_text[batch_size:]
train_target = target_int_text[batch_size:]

valid_source = source_int_text[:batch_size]
valid_target = target_int_text[:batch_size]

with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())

    for epoch_i in range(epochs):
        for batch_i, (source_batch, target_batch) in enumerate(batch_data(train_source, train_target, batch_size)):
            start_time = time.time()
            
#             _, loss = sess.run(
#                 [train_op, cost],
#                 {input_data: source_batch,
#                  targets: target_batch,
#                  lr: learning_rate,
#                  target_length: [target_seq_length] * batch_size,
#                  keep_prob: keep_probability})
            
            batch_train_logits = sess.run(
                inference_logits,
                {input_data: source_batch, keep_prob: 1.0})
            
#             batch_valid_logits = sess.run(
#                 inference_logits,
#                 {input_data: valid_source, keep_prob: 1.0})
                
#             train_acc = get_accuracy(target_batch, batch_train_logits)
#             valid_acc = get_accuracy(np.array(valid_target), batch_valid_logits)
#             end_time = time.time()
#             print('Epoch {:>3} Batch {:>4}/{} - Train Accuracy: {:>6.3f}, Validation Accuracy: {:>6.3f}, Loss: {:>6.3f}'
#                   .format(epoch_i, batch_i, len(source_int_text) // batch_size, train_acc, valid_acc, loss))

    # Save Model
    saver = tf.train.Saver()
    saver.save(sess, save_path)
    print('Model Trained and Saved')
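
`get_accuracy` pads the shorter of the two inputs with zeros and then compares ids position by position, so padded positions count toward the score. A simplified pure-Python sketch of the same idea follows. It works on already-decoded id sequences rather than logits, and the `token_accuracy` name and sample data are hypothetical.

```python
def token_accuracy(target, predicted, pad_id=0):
    """Fraction of positions where target and predicted ids agree after zero-padding."""
    max_seq = max(max(len(s) for s in target), max(len(s) for s in predicted))
    matches = 0
    total = 0
    for t, p in zip(target, predicted):
        # Pad both sequences to the common length before comparing.
        t = t + [pad_id] * (max_seq - len(t))
        p = p + [pad_id] * (max_seq - len(p))
        matches += sum(1 for a, b in zip(t, p) if a == b)
        total += max_seq
    return matches / total

target = [[4, 5, 6, 0], [7, 8, 0, 0]]
predicted = [[4, 5, 9], [7, 8, 0]]
accuracy = token_accuracy(target, predicted)
print(accuracy)  # 0.875
```

Because matching pad positions are counted as correct, this number can look better than the translation quality on real words alone.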

### Save Parameters
Save the `save_path` parameter for inference.

In [None]:
# Save parameters for checkpoint
with open('params.p', 'wb') as out_file:
    pickle.dump(save_path, out_file)

# Checkpoint

In [60]:
import pickle
import tensorflow as tf
import numpy as np
import problem_unittests as tests

_, (source_vocab_to_int, target_vocab_to_int), (source_int_to_vocab, target_int_to_vocab) = pickle.load(open('preprocess.p', mode='rb'))
load_path = pickle.load(open('params.p', mode='rb'))

## Sentence to Sequence
To feed a sentence into the model for translation, you first need to preprocess it.  Implement the function `sentence_to_seq()` to preprocess new sentences.

- Convert the sentence to lowercase
- Convert words into ids using `vocab_to_int`
 - Convert words that are not in the vocabulary to the `<UNK>` word id.

In [61]:
def sentence_to_seq(sentence, vocab_to_int):
    """
    Convert a sentence to a sequence of ids
    :param sentence: String
    :param vocab_to_int: Dictionary to go from the words to an id
    :return: List of word ids
    """
    sentence = sentence.lower()
    
    output = []
    
    for token in sentence.split():
        if (token in vocab_to_int):
            output.append(vocab_to_int[token])
        else:
            output.append(vocab_to_int['<UNK>'])
    
    return output

tests.test_sentence_to_seq(sentence_to_seq)

Tests Passed
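
As a quick check of the three steps listed above, the sketch below runs a compact version of the function on a toy vocabulary. The vocabulary and its ids are invented for illustration and are not the ids stored in `preprocess.p`.

```python
def sentence_to_seq(sentence, vocab_to_int):
    """Lowercase, split on whitespace, and map unknown words to the <UNK> id."""
    unk_id = vocab_to_int['<UNK>']
    return [vocab_to_int.get(token, unk_id) for token in sentence.lower().split()]

# Hypothetical toy vocabulary; 'a' is deliberately left out.
toy_vocab = {'<UNK>': 2, 'he': 10, 'saw': 11, 'truck': 12, '.': 13}
ids = sentence_to_seq('He saw a truck .', toy_vocab)
print(ids)  # [10, 11, 2, 12, 13]
```

The capitalised `He` is found because of the lowercasing step, while `a` falls back to the `<UNK>` id.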


## Translate
This will translate `translate_sentence` from English to French.

In [63]:
translate_sentence = 'he saw a old yellow truck .'

translate_sentence = sentence_to_seq(translate_sentence, source_vocab_to_int)

loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load saved model
    loader = tf.train.import_meta_graph(load_path + '.meta')
    loader.restore(sess, load_path)

    input_data = loaded_graph.get_tensor_by_name('input:0')
    logits = loaded_graph.get_tensor_by_name('logits:0')
    keep_prob = loaded_graph.get_tensor_by_name('keep_prob:0')

    translate_logits = sess.run(logits, {input_data: [translate_sentence], keep_prob: 1.0})[0]

print('Input')
print('  Word Ids:      {}'.format([i for i in translate_sentence]))
print('  English Words: {}'.format([source_int_to_vocab[i] for i in translate_sentence]))

print('\nPrediction')
print('  Word Ids:      {}'.format([i for i in np.argmax(translate_logits, 1)]))
print('  French Words: {}'.format([target_int_to_vocab[i] for i in np.argmax(translate_logits, 1)]))

Input
  Word Ids:      [136, 157, 103, 34, 63, 156, 188]
  English Words: ['he', 'saw', 'a', 'old', 'yellow', 'truck', '.']

Prediction
  Word Ids:      [129, 220, 346, 305, 338, 102, 165, 1]
  French Words: ['il', 'a', 'vu', 'au', 'automobile', 'jaune', '.', '<EOS>']
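
The predicted ids end with `<EOS>`, so a readable sentence needs one more step: map ids back to words and stop at the end marker. A minimal sketch, reusing the ids and words printed above; the `<PAD>` entry with id 0 and the `ids_to_sentence` helper are assumptions added for illustration.

```python
def ids_to_sentence(ids, int_to_vocab, eos='<EOS>', pad='<PAD>'):
    """Join predicted words, stopping at <EOS> and skipping <PAD>."""
    words = []
    for word_id in ids:
        word = int_to_vocab[word_id]
        if word == eos:
            break
        if word != pad:
            words.append(word)
    return ' '.join(words)

# Ids and words copied from the prediction output above; id 0 for <PAD> is assumed.
int_to_vocab = {0: '<PAD>', 1: '<EOS>', 129: 'il', 220: 'a', 346: 'vu',
                305: 'au', 338: 'automobile', 102: 'jaune', 165: '.'}
prediction_ids = [129, 220, 346, 305, 338, 102, 165, 1]
sentence = ids_to_sentence(prediction_ids, int_to_vocab)
print(sentence)  # il a vu au automobile jaune .
```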


## Imperfect Translation
You might notice that some sentences translate better than others. The dataset you're using has a vocabulary of only 227 English words, out of the thousands you use every day, so you'll only see good results on sentences built from these words. For this project, you don't need a perfect translation. However, if you want to create a better translation model, you'll need better data.

You can train on the [WMT10 French-English corpus](http://www.statmt.org/wmt10/training-giga-fren.tar). This dataset has a larger vocabulary and covers a richer range of topics. However, training on it will take days, so make sure you have a GPU and that the neural network performs well on the dataset we provided. Just make sure you play with the WMT10 corpus after you've submitted this project.
## Submitting This Project
When submitting this project, make sure to run all the cells before saving the notebook. Save the notebook file as "dlnd_language_translation.ipynb" and also save it as an HTML file under "File" -> "Download as". Include the "helper.py" and "problem_unittests.py" files in your submission.