# Language translation
# Seq2Seq
Tutorial on how to converting from english to french using a 'small' database, using seq2seq model APIs in tensorflow.

In [1]:
# modules to load
import os
import numpy as np
import tensorflow as tf
from distutils.version import LooseVersion
# Check TensorFlow Version
assert LooseVersion(tf.__version__) >= LooseVersion('1.0'), 'Please use TensorFlow version 1.0 or newer'
print('TensorFlow Version: {}'.format(tf.__version__))

TensorFlow Version: 1.0.0


In [2]:
# Load the data
source_path = 'data/small_vocab_en'
target_path = 'data/small_vocab_fr'
def load_data(path):
    """
    Load Dataset from File
    """
    input_file = os.path.join(path)
    with open(input_file, 'r', encoding='utf-8') as f:
        data = f.read()

    return data
source_text = load_data(source_path)
target_text = load_data(target_path)

# Convert all text to lower case
source_text = source_text.lower()
target_text = target_text.lower()

## Explore the data

Lets view the source and target examples

In [3]:
view_sentence_range = (0, 5)
print('English sentences {} to {}:'.format(*view_sentence_range))
for i, source in enumerate(source_text.split('\n')[view_sentence_range[0]:view_sentence_range[1]]):
    print([i],source)

print()
print('French sentences {} to {}:'.format(*view_sentence_range))
for i, source in enumerate(source_text.split('\n')[view_sentence_range[0]:view_sentence_range[1]]):
    print([i],source)

English sentences 0 to 5:
[0] new jersey is sometimes quiet during autumn , and it is snowy in april .
[1] the united states is usually chilly during july , and it is usually freezing in november .
[2] california is usually quiet during march , and it is usually hot in june .
[3] the united states is sometimes mild during june , and it is cold in september .
[4] your least liked fruit is the grape , but my least liked is the apple .

French sentences 0 to 5:
[0] new jersey is sometimes quiet during autumn , and it is snowy in april .
[1] the united states is usually chilly during july , and it is usually freezing in november .
[2] california is usually quiet during march , and it is usually hot in june .
[3] the united states is sometimes mild during june , and it is cold in september .
[4] your least liked fruit is the grape , but my least liked is the apple .


## Implement preprocessing of the data

The data has to be converted to IDs which are the list of integers.

In [4]:
import copy
# Special characters that we need
CODES = {'<PAD>': 0, '<EOS>': 1, '<UNK>': 2, '<GO>': 3 }
def create_lookup_tables(text):
    """
    Create lookup tables for vocabulary
    """
    vocab = set(text.split())
    vocab_to_int = copy.copy(CODES)

    for v_i, v in enumerate(vocab, len(CODES)):
        vocab_to_int[v] = v_i

    int_to_vocab = {v_i: v for v, v_i in vocab_to_int.items()}

    return vocab_to_int, int_to_vocab
source_vocab_to_int, source_int_to_vocab = create_lookup_tables(source_text)
target_vocab_to_int, target_int_to_vocab = create_lookup_tables(target_text)
print('Dictionary vocab-int pairs (first 10): \n', list(source_vocab_to_int.items())[:10])

Dictionary vocab-int pairs (first 10): 
 [('mangoes.', 4), ('lemon', 5), ('animals', 7), ('wonderful', 8), ('horse', 119), ('warm', 9), ('april', 10), ('chinese', 79), ('winter', 206), ('july', 11)]


In [5]:
# Convert inputs and targets to int. Add the <EOS> word id at the end of each sentence from target_text. This will help the neural network predict when the sentence should end

def text_to_ids(source_text, target_text, source_vocab_to_int, target_vocab_to_int):
    """
    Convert source and target text to proper word ids
    :param source_text: String that contains all the source text.
    :param target_text: String that contains all the target text.
    :param source_vocab_to_int: Dictionary to go from the source words to an id
    :param target_vocab_to_int: Dictionary to go from the target words to an id
    :return: A tuple of lists (source_id_text, target_id_text)
    """
    
    source_text_ids = [[source_vocab_to_int[word] for word in (line).split()] for line in source_text.split('\n')]
    target_text_ids = [[target_vocab_to_int[word] for word in (line + ' <EOS>').split()] for line in target_text.split('\n')]


  
    return source_text_ids, target_text_ids
source_text_ids, target_text_ids = text_to_ids(source_text, target_text, source_vocab_to_int, target_vocab_to_int)

print('An French example: \n')
print(target_text_ids[0])
target_text.split('\n')[0]

An French example: 

[93, 267, 79, 52, 165, 131, 207, 85, 330, 174, 323, 79, 200, 55, 121, 12, 1]


"new jersey est parfois calme pendant l' automne , et il est neigeux en avril ."

In [6]:
import pickle
# Lets save the data
pickle.dump(((source_text_ids, target_text_ids),
        (source_vocab_to_int, target_vocab_to_int),
        (source_int_to_vocab, target_int_to_vocab)), open('preprocess.p', 'wb'))

# Check Point
This is your first checkpoint. If you ever decide to come back to this notebook or have to restart the notebook, you can start from here. The preprocessed data has been saved to disk.

In [7]:
#Load Data
(source_int_text, target_int_text), (source_vocab_to_int, target_vocab_to_int), _ = pickle.load(open('preprocess.p', mode='rb'))

## Build the Neural Network
The following components are necessary to build a Sequence-to-Sequence model:   
- **model_inputs** : A function to recreate placeholders for the neural network
- **process_decoding_input**: Remove the last word id in target data and concat the 'GO' ID to the begining of each batch
- **encoding_layer** : Create Encoder RNN layer. The encoder hidden state is given to the decoder and have it process its output.
- **decoding_layer_train** : We need to declare a decoder for the training phase, and a decoder for the inference/prediction phase. These two decoders will share their parameters (so that all the weights and biases that are set during the training phase can be used when we deploy the model). The training decoder **does not** feed the output of each time step to the next. Rather, the inputs to the decoder time steps are the target sequence from the training dataset (the orange letters).
    <img src="assets/sequence-to-sequence-training-decoder.png"/>
    
    
- **decoding_layer_infer** : The inference decoder feeds the output of each time step as an input to the next.
    <img src="assets/sequence-to-sequence-inference-decoder.png"/>


- **decoding_layer** : Creates a decoder RNN. 
- **seq2seq_model** : Puts together all the above.

### Input
The `model_inputs()` function creates TF Placeholders for the Neural Network. 

In [8]:
def model_inputs():
    """
     Create TF Placeholders for input, targets, and learning rate.
    :return: Tuple (input, targets, learning rate, keep probability)
    """

    Input = tf.placeholder(tf.int32,[None, None], name='input')
    Target = tf.placeholder(tf.int32, [None, None], name='targets')
    lr = tf.placeholder(tf.float32, name='lr')
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')
    return Input, Target, lr, keep_prob

### Process Decoding Input
Implement `process_decoding_input` using TensorFlow to remove the last word id from each batch in `target_data` and concat the GO ID to the begining of each batch. (Check the diagram above)

In [9]:
def process_decoding_input(target_data, target_vocab_to_int, batch_size):
    """
    Preprocess target data for dencoding
    :param target_data: Target Placehoder
    :param target_vocab_to_int: Dictionary to go from the target words to an id
    :param batch_size: Batch Size
    :return: Preprocessed target data
    """
    ending = tf.strided_slice(target_data, [0, 0], [batch_size, -1], [1, 1])
    out = tf.concat([tf.fill([batch_size, 1], target_vocab_to_int['<GO>']), ending], 1)

    return out

#Checking the implementation
target_vocab_to_int = {'<GO>': 3}
batch_size = 2
seq_length = 10
target_data = tf.placeholder(tf.int32, [batch_size, seq_length])
dec_input = process_decoding_input(target_data, target_vocab_to_int, batch_size)

with tf.Session() as sess:
    demonstration_outputs = np.reshape(range(batch_size * seq_length), (batch_size, seq_length))
    print('GO ID :', target_vocab_to_int.items())
    print('Input: \n', demonstration_outputs[:2])
    print("\n")
    print('Output: \n', sess.run(dec_input, {target_data: demonstration_outputs})[:2])

GO ID : dict_items([('<GO>', 3)])
Input: 
 [[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]]


Output: 
 [[ 3  0  1  2  3  4  5  6  7  8]
 [ 3 10 11 12 13 14 15 16 17 18]]


### Encoding
`encoding_layer()` creates a Encoder RNN layer using [`tf.nn.dynamic_rnn()`](https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn). Added the dropout layer on top each cell as it solves overfitting issues. The basic cell used is the LSTM cell. We can construct a stack layer of MultiRNNCell using the 'num_layer' parameter. The encoder_state is return to be passed to the decoder layer.

In [10]:
def encoding_layer(rnn_inputs, rnn_size, num_layers, keep_prob):
    """
    Create encoding layer
    :param rnn_inputs: Inputs for the RNN
    :param rnn_size: RNN Size
    :param num_layers: Number of layers
    :param keep_prob: Dropout keep probability
    :return: RNN state
    """

    cell = basic_cell = tf.contrib.rnn.DropoutWrapper(
        tf.contrib.rnn.BasicLSTMCell(rnn_size),
        output_keep_prob=keep_prob)
    if num_layers > 1:
        cell = tf.contrib.rnn.MultiRNNCell([basic_cell]*num_layers)
    
    _, enc_state = tf.nn.dynamic_rnn(cell, rnn_inputs, dtype=tf.float32)
    return enc_state


### Decoding - Training
Decoder training logits is created using [`tf.contrib.seq2seq.simple_decoder_fn_train()`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/simple_decoder_fn_train) and [`tf.contrib.seq2seq.dynamic_rnn_decoder()`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/dynamic_rnn_decoder).  Apply the `output_fn` to the [`tf.contrib.seq2seq.dynamic_rnn_decoder()`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/dynamic_rnn_decoder) outputs.

In [11]:
def decoding_layer_train(encoder_state, dec_cell, dec_embed_input, sequence_length, decoding_scope,
                         output_fn, keep_prob):
    """
    Create a decoding layer for training
    :param encoder_state: Encoder State
    :param dec_cell: Decoder RNN Cell
    :param dec_embed_input: Decoder embedded input
    :param sequence_length: Sequence Length
    :param decoding_scope: TenorFlow Variable Scope for decoding
    :param output_fn: Function to apply the output layer
    :param keep_prob: Dropout keep probability
    :return: Train Logits
    """

    train_decoder_fn = tf.contrib.seq2seq.simple_decoder_fn_train(encoder_state)
    train_pred, _, _ = tf.contrib.seq2seq.dynamic_rnn_decoder(
        dec_cell, train_decoder_fn, dec_embed_input, sequence_length, scope=decoding_scope)
    
    # Apply output function
    train_logits =  output_fn(train_pred)
    
    return train_logits


### Decoding - Inference
Create inference logits using [`tf.contrib.seq2seq.simple_decoder_fn_inference()`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/simple_decoder_fn_inference) and [`tf.contrib.seq2seq.dynamic_rnn_decoder()`](https://www.tensorflow.org/api_docs/python/tf/contrib/seq2seq/dynamic_rnn_decoder). 

In [12]:
def decoding_layer_infer(encoder_state, dec_cell, dec_embeddings, start_of_sequence_id, end_of_sequence_id,
                         maximum_length, vocab_size, decoding_scope, output_fn, keep_prob):
    """
    Create a decoding layer for inference
    :param encoder_state: Encoder state
    :param dec_cell: Decoder RNN Cell
    :param dec_embeddings: Decoder embeddings
    :param start_of_sequence_id: GO ID
    :param end_of_sequence_id: EOS Id
    :param maximum_length: The maximum allowed time steps to decode
    :param vocab_size: Size of vocabulary
    :param decoding_scope: TensorFlow Variable Scope for decoding
    :param output_fn: Function to apply the output layer
    :param keep_prob: Dropout keep probability
    :return: Inference Logits
    """

    # Inference Decoder
    infer_decoder_fn = tf.contrib.seq2seq.simple_decoder_fn_inference(output_fn, encoder_state, dec_embeddings, 
                                                                      start_of_sequence_id, 
                                                                      end_of_sequence_id, 
                                                                      maximum_length, vocab_size)
    inference_logits, _, _ = tf.contrib.seq2seq.dynamic_rnn_decoder(dec_cell, infer_decoder_fn, scope=decoding_scope)
    
    return inference_logits

### Build the Decoding Layer
`decoding_layer()` combines the above to create a Decoder RNN layer.
Consits of the following steps:
- Create RNN cell for decoding using `rnn_size` and `num_layers`.
- Create the output fuction using [`lambda`](https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions) to transform it's input, logits, to class logits.
- Use the function `decoding_layer_train(encoder_state, dec_cell, dec_embed_input, sequence_length, decoding_scope, output_fn, keep_prob)` function to get the training logits.
- Use the function `decoding_layer_infer(encoder_state, dec_cell, dec_embeddings, start_of_sequence_id, end_of_sequence_id, maximum_length, vocab_size, decoding_scope, output_fn, keep_prob)` function to get the inference logits.

We need to use [tf.variable_scope](https://www.tensorflow.org/api_docs/python/tf/variable_scope) to share variables between training and inference.

In [13]:
def decoding_layer(dec_embed_input, dec_embeddings, encoder_state, vocab_size, sequence_length, rnn_size,
                   num_layers, target_vocab_to_int, keep_prob):
    """
    Create decoding layer
    :param dec_embed_input: Decoder embedded input
    :param dec_embeddings: Decoder embeddings
    :param encoder_state: The encoded state
    :param vocab_size: Size of vocabulary
    :param sequence_length: Sequence Length
    :param rnn_size: RNN Size
    :param num_layers: Number of layers
    :param target_vocab_to_int: Dictionary to go from the target words to an id
    :param keep_prob: Dropout keep probability
    :return: Tuple of (Training Logits, Inference Logits)
    """

    start_of_sequence_id = target_vocab_to_int['<GO>']
    end_of_sequence_id = target_vocab_to_int['<EOS>']
    
    # Decoder RNNs
    dec_cell = basic_cell = tf.contrib.rnn.DropoutWrapper(
        tf.contrib.rnn.BasicLSTMCell(rnn_size),
        output_keep_prob=keep_prob)
    if num_layers > 1:
        dec_cell = tf.contrib.rnn.MultiRNNCell([basic_cell]*num_layers)
    
    with tf.variable_scope("decoding") as decoding_scope:
        output_fn = lambda x: tf.contrib.layers.fully_connected(x, vocab_size, None, scope=decoding_scope)
        train_logits = decoding_layer_train(encoder_state, dec_cell, dec_embed_input, sequence_length, decoding_scope, output_fn, keep_prob)
    
    with tf.variable_scope("decoding", reuse=True) as decoding_scope:
        inference_logits = decoding_layer_infer(encoder_state, dec_cell, dec_embeddings, start_of_sequence_id, end_of_sequence_id, sequence_length, vocab_size, decoding_scope, output_fn, keep_prob)

    return train_logits, inference_logits

### Build the Seq2Seq Network
Apply the functions implemented above to:

- Apply embedding to the input data for the encoder.
- Encode the input using your `encoding_layer(rnn_inputs, rnn_size, num_layers, keep_prob)`.
- Process target data using your `process_decoding_input(target_data, target_vocab_to_int, batch_size)` function.
- Apply embedding to the target data for the decoder.
- Decode the encoded input using your `decoding_layer(dec_embed_input, dec_embeddings, encoder_state, vocab_size, sequence_length, rnn_size, num_layers, target_vocab_to_int, keep_prob)`.

In [14]:
def seq2seq_model(input_data, target_data, keep_prob, batch_size, sequence_length, source_vocab_size, target_vocab_size,
                  enc_embedding_size, dec_embedding_size, rnn_size, num_layers, target_vocab_to_int):
    """
    Build the Sequence-to-Sequence part of the neural network
    :param input_data: Input placeholder
    :param target_data: Target placeholder
    :param keep_prob: Dropout keep probability placeholder
    :param batch_size: Batch Size
    :param sequence_length: Sequence Length
    :param source_vocab_size: Source vocabulary size
    :param target_vocab_size: Target vocabulary size
    :param enc_embedding_size: Decoder embedding size
    :param dec_embedding_size: Encoder embedding size
    :param rnn_size: RNN Size
    :param num_layers: Number of layers
    :param target_vocab_to_int: Dictionary to go from the target words to an id
    :return: Tuple of (Training Logits, Inference Logits)
    """
        
    #embedding input data
    enc_embed_input = tf.contrib.layers.embed_sequence(input_data, source_vocab_size, enc_embedding_size)
    encoder_state = encoding_layer(enc_embed_input, rnn_size, num_layers, keep_prob)
    
    target_data = process_decoding_input(target_data, target_vocab_to_int, batch_size)
    
    #embedding target data
    dec_embeddings = tf.Variable(tf.random_uniform([target_vocab_size, dec_embedding_size]))
    dec_embed_input = tf.nn.embedding_lookup(dec_embeddings, target_data)

    train_logits, inference_logits = decoding_layer(dec_embed_input, dec_embeddings, encoder_state, target_vocab_size, sequence_length, rnn_size, num_layers, target_vocab_to_int, keep_prob)


    return train_logits, inference_logits

## Neural Network Training

In [15]:
# Check for a GPU
import warnings
if not tf.test.gpu_device_name():
    warnings.warn('No GPU found. Please use a GPU to train your neural network.')
else:
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))

Default GPU Device: /gpu:0


In [16]:
#Setting up hyperparameters

# Number of Epochs
epochs = 10
# Batch Size
batch_size = 128
# RNN Size
rnn_size = 256
# Number of Layers
num_layers = 2
# Embedding Size
encoding_embedding_size = 256
decoding_embedding_size = 256
# Learning Rate
learning_rate = 3e-4
# Dropout Keep Probability
keep_probability = 0.5

### Build the Graph
Build the graph using the neural network

In [17]:
max_source_sentence_length = max([len(sentence) for sentence in source_int_text])
(source_int_text, target_int_text), (source_vocab_to_int, target_vocab_to_int), _ = pickle.load(open('preprocess.p', mode='rb'))

train_graph = tf.Graph()

with train_graph.as_default():
    input_data, targets, lr, keep_prob = model_inputs()
    sequence_length = tf.placeholder_with_default(max_source_sentence_length, None, name='sequence_length')
    input_shape = tf.shape(input_data)
    
    train_logits, inference_logits = seq2seq_model(
        tf.reverse(input_data, [-1]), targets, keep_prob, batch_size, sequence_length, len(source_vocab_to_int), len(target_vocab_to_int),
        encoding_embedding_size, decoding_embedding_size, rnn_size, num_layers, target_vocab_to_int)

    tf.identity(inference_logits, 'logits')
    with tf.name_scope("optimization"):
        # Loss function
        cost = tf.contrib.seq2seq.sequence_loss(
            train_logits,
            targets,
            tf.ones([input_shape[0], sequence_length]))

        # Optimizer
        optimizer = tf.train.AdamOptimizer(lr)

        # Gradient Clipping
        gradients = optimizer.compute_gradients(cost)
        capped_gradients = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gradients if grad is not None]
        train_op = optimizer.apply_gradients(capped_gradients)
        tf.summary.scalar('loss', cost, collections=['train'])
      
    s_train = tf.summary.merge_all('train')
    

### Train helper functions
Helper functions used for training

In [18]:
# PAD input batch
def pad_sentence_batch(sentence_batch):
    """
    Pad sentence with <PAD> id
    """
    max_sentence = max([len(sentence) for sentence in sentence_batch])
    return [sentence + [CODES['<PAD>']] * (max_sentence - len(sentence))
            for sentence in sentence_batch]

def batch_data(source, target, batch_size):
    """
    Batch source and target together
    """
    for batch_i in range(0, len(source)//batch_size):
        start_i = batch_i * batch_size
        source_batch = source[start_i:start_i + batch_size]
        target_batch = target[start_i:start_i + batch_size]
        yield np.array(pad_sentence_batch(source_batch)), np.array(pad_sentence_batch(target_batch))
        
# Accuracy calculation
def get_accuracy(target, logits):
    """
    Calculate accuracy
    """
    max_seq = max(target.shape[1], logits.shape[1])
    if max_seq - target.shape[1]:
        target = np.pad(
            target,
            [(0,0),(0,max_seq - target.shape[1])],
            'constant')
    if max_seq - logits.shape[1]:
        logits = np.pad(
            logits,
            [(0,0),(0,max_seq - logits.shape[1]), (0,0)],
            'constant')

    return np.mean(np.equal(target, np.argmax(logits, 2)))

#Split train validation data and PAD the train data
train_source = source_int_text[batch_size:]
train_target = target_int_text[batch_size:]

valid_source = pad_sentence_batch(source_int_text[:batch_size])
valid_target = pad_sentence_batch(target_int_text[:batch_size])


### Train
Train the neural network on the preprocessed data.

In [19]:
#Use tf.Supervisor to run a managed session
import time
log_dir = 'log/'
sv = tf.train.Supervisor(graph=train_graph, logdir=log_dir, save_model_secs=30)
count = 0
with sv.managed_session() as sess:
    for epoch_i in range(epochs):
        for batch_i, (source_batch, target_batch) in enumerate(
                batch_data(train_source, train_target, batch_size)):
            start_time = time.time()
            
            _, loss, s_t = sess.run(
                [train_op, cost,s_train],
                {input_data: source_batch,
                 targets: target_batch,
                 lr: learning_rate,
                 sequence_length: target_batch.shape[1],
                 keep_prob: keep_probability})
            count += 1
            sv.summary_computed(sess, s_t, global_step=count)
            
            if (batch_i % 100) == 0:
                batch_train_logits = sess.run(
                    inference_logits, 
                    {input_data: source_batch, keep_prob: 1.0})
                batch_valid_logits= sess.run(
                    inference_logits, 
                    {input_data: valid_source, keep_prob: 1.0})
                
                train_acc = get_accuracy(target_batch, batch_train_logits)
                valid_acc = get_accuracy(np.array(valid_target), batch_valid_logits)
                end_time = time.time()
            
                print('Epoch {:>3} Batch {:>4}/{} - Train Accuracy: {:>6.3f}, Validation Accuracy: {:>6.3f}, Loss: {:>6.3f}'
                      .format(epoch_i, batch_i, len(source_int_text) // batch_size, train_acc, valid_acc, loss))
    sv.saver.save(sess, sv.save_path)
    print('Model Trained and Saved')

Epoch   0 Batch    0/1077 - Train Accuracy:  0.294, Validation Accuracy:  0.305, Loss:  5.935
Epoch   0 Batch  100/1077 - Train Accuracy:  0.415, Validation Accuracy:  0.473, Loss:  2.710
Epoch   0 Batch  200/1077 - Train Accuracy:  0.441, Validation Accuracy:  0.502, Loss:  2.159
Epoch   0 Batch  300/1077 - Train Accuracy:  0.410, Validation Accuracy:  0.482, Loss:  1.870
Epoch   0 Batch  400/1077 - Train Accuracy:  0.423, Validation Accuracy:  0.453, Loss:  1.555
Epoch   0 Batch  500/1077 - Train Accuracy:  0.457, Validation Accuracy:  0.483, Loss:  1.383
Epoch   0 Batch  600/1077 - Train Accuracy:  0.523, Validation Accuracy:  0.537, Loss:  1.192
Epoch   0 Batch  700/1077 - Train Accuracy:  0.511, Validation Accuracy:  0.558, Loss:  1.089
Epoch   0 Batch  800/1077 - Train Accuracy:  0.477, Validation Accuracy:  0.565, Loss:  1.035
Epoch   0 Batch  900/1077 - Train Accuracy:  0.551, Validation Accuracy:  0.562, Loss:  0.973
Epoch   0 Batch 1000/1077 - Train Accuracy:  0.600, Validati

**Note**:The above model will be saved every '30' seconds as mentioned in the argument 'save_model_secs'. Try to interuppt the above cell and then start again. The training will start from the last point saved and the model will continue training.

### Tensorboard graph 
View the tensorboard graph using the command: tensorboard --logdir=log. Navigating to http://127.0.1.1:6006 one can check the summaries that have been collected (in our case, inference/loss), as well as the graph of the network, embedding vectors that are trained.

## Test the model
## Checkpoint
Can start from this checkpoint, the models and data are loaded from disk.

In [20]:
import tensorflow as tf
import numpy as np

_, (source_vocab_to_int, target_vocab_to_int), (source_int_to_vocab, target_int_to_vocab) = pickle.load(open('preprocess.p', mode='rb'))
load_path = 'log/model.ckpt'

## Sentence to Sequence
To feed a sentence into the model for translation, we need to preprocess it.  Te function `sentence_to_seq()` implements:
- Convert the sentence to lowercase
- Convert words into ids using `vocab_to_int`
- Convert words not in the vocabulary, to the `<UNK>` word id.

In [21]:
def sentence_to_seq(sentence, vocab_to_int):
    """
    Convert a sentence to a sequence of ids
    :param sentence: String
    :param vocab_to_int: Dictionary to go from the words to an id
    :return: List of word ids
    """
    out = [vocab_to_int.get(word.lower(), vocab_to_int['<UNK>']) for word in sentence.split()]

    return out

## Translate
This will translate `translate_sentence` from English to French.

In [22]:
translate_sentence = 'he saw a old yellow truck .'

translate_sentence = sentence_to_seq(translate_sentence, source_vocab_to_int)

loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load saved model
    loader = tf.train.import_meta_graph(load_path + '.meta')
    loader.restore(sess, load_path)

    input_data = loaded_graph.get_tensor_by_name('input:0')
    logits = loaded_graph.get_tensor_by_name('logits:0')
    keep_prob = loaded_graph.get_tensor_by_name('keep_prob:0')

    translate_logits = sess.run(logits, {input_data: [translate_sentence], keep_prob: 1.0})[0]

    print('Input')
    print('  Word Ids:      {}'.format([i for i in translate_sentence]))
    print('  English Words: {}'.format([source_int_to_vocab[i] for i in translate_sentence]))

    print('\nPrediction')
    print('  Word Ids:      {}'.format([i for i in np.argmax(translate_logits, 1)]))
    print('  French Words: {}'.format([target_int_to_vocab[i] for i in np.argmax(translate_logits, 1)]))

Input
  Word Ids:      [132, 42, 163, 142, 97, 94, 28]
  English Words: ['he', 'saw', 'a', 'old', 'yellow', 'truck', '.']

Prediction
  Word Ids:      [323, 195, 142, 199, 67, 234, 12, 1]
  French Words: ['il', 'pas', 'un', 'vieux', 'camion', 'jaune', '.', '<EOS>']
