# Sequence to Sequence learning with Keras Functional API

What is the difference between the Keras Sequential API and the Functional API?

Sequential API:
The simplest API: you first call `model = Sequential()` and then keep adding layers, e.g. `model.add(Dense(...))`.

Functional API:
A more advanced API that lets you build custom models with arbitrary inputs and outputs. Defining a model takes a bit more care, because more of the wiring is left to the user. A model is defined with `model = Model(inputs=[...], outputs=[...])`.


In [None]:
https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html
https://github.com/keras-team/keras/blob/master/examples/lstm_seq2seq.py

In [1]:
from __future__ import print_function

from keras.models import Model
from keras.layers import Input, LSTM, Dense
import numpy as np

batch_size = 64  # Batch size for training.
epochs = 100  # Number of epochs to train for.
latent_dim = 256  # Latent dimensionality of the encoding space.
num_samples = 10000  # Number of samples to train on.
# Path to the data txt file on disk.
data_path = 'fra-eng/fra.txt'

# Vectorize the data.
input_texts = []
target_texts = []
input_characters = set()
target_characters = set()
with open(data_path, 'r', encoding='utf-8') as f:
    lines = f.read().split('\n')
for line in lines[: min(num_samples, len(lines) - 1)]:  # never read more samples than the file contains
    input_text, target_text = line.split('\t')  # each line holds the input and the target, separated by a tab
    # We use "tab" as the "start sequence" character
    # for the targets, and "\n" as "end sequence" character.
    target_text = '\t' + target_text + '\n'
    input_texts.append(input_text)
    target_texts.append(target_text)
    for char in input_text:
        if char not in input_characters:
            input_characters.add(char)
    for char in target_text:
        if char not in target_characters:
            target_characters.add(char)

Using TensorFlow backend.


### Dealing with Text data
This section explains how text data is prepared in NLP. Our focus here is character-based machine translation, which means that we treat individual characters as the features of our model. The next cell shows how the text data is handled, but first a brief explanation of the variables. `input_texts` is a list of `sentences` to be translated. `target_texts` is a list of the corresponding correctly translated `sentences`. We call these sentences sequences, so "I_love_Dogs." is a sequence of 12 characters (blanks and periods are included). `input_characters` is the set of unique characters that appear in `input_texts`; one would expect it to contain the letters A-Z plus symbols such as the exclamation mark. `max_encoder_seq_length` is the length of the longest sentence in `input_texts`. Why is this important? Because the encoder input array is sized to the longest sequence, so that every sequence in `input_texts` fits into it.
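To make these variables concrete, here is a minimal sketch on three sentence pairs. The `pairs` list is made-up toy data standing in for the dataset file, and the variable names simply mirror the ones used in this notebook; it is an illustration, not a replacement for the real preprocessing.

```python
# Toy sentence pairs standing in for the tab-separated dataset (hypothetical data).
pairs = [("Go.", "Va !"), ("Hi.", "Salut !"), ("Run!", "Cours !")]

input_texts = [src for src, _ in pairs]
# Wrap each target in the start ("\t") and end ("\n") markers, as the notebook does.
target_texts = ["\t" + tgt + "\n" for _, tgt in pairs]

# The unique characters on each side form the two vocabularies.
input_characters = sorted(set("".join(input_texts)))
target_characters = sorted(set("".join(target_texts)))

# The longest sequence on each side fixes the time dimension of the arrays.
max_encoder_seq_length = max(len(txt) for txt in input_texts)
max_decoder_seq_length = max(len(txt) for txt in target_texts)

print(input_characters)
print(max_encoder_seq_length, max_decoder_seq_length)
```

Note that the start and end markers count towards the decoder sequence length and are part of the target vocabulary.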

In [2]:
print(input_texts[:10])
print(target_texts[:10])
print(input_characters)
print(target_characters)

input_characters = sorted(list(input_characters))
target_characters = sorted(list(target_characters))
num_encoder_tokens = len(input_characters)
num_decoder_tokens = len(target_characters)
max_encoder_seq_length = max([len(txt) for txt in input_texts])
max_decoder_seq_length = max([len(txt) for txt in target_texts])

print('Number of samples:', len(input_texts))
print('Number of unique input tokens:', num_encoder_tokens)
print('Number of unique output tokens:', num_decoder_tokens)
print('Max sequence length for inputs:', max_encoder_seq_length)
print('Max sequence length for outputs:', max_decoder_seq_length)

['Go.', 'Hi.', 'Hi.', 'Run!', 'Run!', 'Who?', 'Wow!', 'Fire!', 'Help!', 'Jump.']
['\tVa !\n', '\tSalut !\n', '\tSalut.\n', '\tCours\u202f!\n', '\tCourez\u202f!\n', '\tQui ?\n', '\tÇa alors\u202f!\n', '\tAu feu !\n', "\tÀ l'aide\u202f!\n", '\tSaute.\n']
{'f', '7', 'a', '8', '-', '5', 'L', 'w', 't', 'P', 'n', 'C', 'j', "'", '&', '9', '6', 'g', 'E', '.', '3', 'N', 'T', 'u', 'l', 'd', 'c', 'm', 'H', 'o', 'J', 's', 'p', '1', 'b', 'U', '2', ' ', 'Q', 'S', 'x', 'i', '?', 'K', 'Y', 'q', '$', '0', 'R', 'I', 'r', 'B', 'z', 'M', 'D', 'e', 'W', 'G', 'k', 'v', ':', 'F', 'A', 'h', '%', 'y', 'O', ',', '!', 'V'}
{'a', '-', 'P', '&', 'ù', 'N', 'u', 'H', 'É', '\u2009', '1', 'S', 'i', 'é', '?', 'K', '0', 'ê', 'D', 'À', ',', 't', 'è', '.', 'J', 'U', '\n', ' ', '$', '\u202f', 'Y', 'q', 'I', 'œ', 'k', ':', 'A', '\xa0', 'h', 'f', 'î', '»', '«', '5', 'L', 'n', 'ô', 'j', '’', 'E', 'Ê', '(', 'o', 'Ç', 's', 'b', 'â', '\t', 'R', 'r', 'v', 'G', 'ë', '%', 'O', '!', 'V', '8', 'à', 'C', "'", 'ç', 'ï', '9', 'g', '3', 

### One hot encoding
In this section we apply one-hot encoding to the text data. How is it done? Here is an example:
Let's assume that one of the sentences is 'adc_faad'. It will then be translated into an array of 0s and 1s.

  a b c d e f g ... 
a 1 0 0 0 0 0 0 ...
d 0 0 0 1 0 0 0 ...
c 0 0 1 0 0 0 0 ...
_ 0 0 0 0 0 0 0 ... (the 1 sits in the column for "_", which is not shown)
f 0 0 0 0 0 1 0 ...
a 1 0 0 0 0 0 0 ...
a 1 0 0 0 0 0 0 ...
d 0 0 0 1 0 0 0 ...

We first have to build `input_token_index` and `target_token_index`, which map each character to an index, so that we can find the right position in the array and set it to 1, turning each row into a one-hot vector. `encoder_input_data` and `decoder_input_data` look like the array shown above.
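The 'adc_faad' example can be sketched in plain Python as follows. The toy alphabet, the helper name `one_hot_encode` and the padding length of 10 are assumptions made for illustration; the notebook itself fills NumPy arrays in the next cell.

```python
# Hypothetical toy alphabet; the notebook builds its own from the dataset.
alphabet = sorted(set("abcdefg_"))
token_index = {char: i for i, char in enumerate(alphabet)}

def one_hot_encode(text, token_index, max_len):
    """Return max_len rows of 0.0/1.0 values; rows past the text stay all zero."""
    grid = [[0.0] * len(token_index) for _ in range(max_len)]
    for t, char in enumerate(text):
        grid[t][token_index[char]] = 1.0
    return grid

encoded = one_hot_encode("adc_faad", token_index, max_len=10)
for char, row in zip("adc_faad", encoded):
    print(char, row)
```

Every character row contains exactly one 1, and the rows beyond the end of the sentence remain all zeros, which is how shorter sequences are padded up to the maximum length.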

In [3]:
input_token_index = dict(
    [(char, i) for i, char in enumerate(input_characters)])
target_token_index = dict(
    [(char, i) for i, char in enumerate(target_characters)])

encoder_input_data = np.zeros(
    (len(input_texts), max_encoder_seq_length, num_encoder_tokens),
    dtype='float32')
decoder_input_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens),
    dtype='float32')
decoder_target_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens),
    dtype='float32')
# doing one hot encoding
for i, (input_text, target_text) in enumerate(zip(input_texts, target_texts)):
    for t, char in enumerate(input_text):
        encoder_input_data[i, t, input_token_index[char]] = 1.
    for t, char in enumerate(target_text):
        # decoder_target_data is ahead of decoder_input_data by one timestep
        decoder_input_data[i, t, target_token_index[char]] = 1.
        if t > 0:
            # decoder_target_data will be ahead by one timestep
            # and will not include the start character.
            decoder_target_data[i, t - 1, target_token_index[char]] = 1.
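The one-timestep offset between `decoder_input_data` and `decoder_target_data` is easier to see on the characters themselves. The sketch below uses a single example sentence and hypothetical list names. One simplification: in the notebook the input array also contains the final "\n", and the target row for that last step stays all zeros.

```python
target_text = "\t" + "Va !" + "\n"  # start marker + sentence + end marker

# At each step the decoder is given one character and must predict the next one.
decoder_input_chars = list(target_text[:-1])   # everything except the final "\n"
decoder_target_chars = list(target_text[1:])   # everything except the leading "\t"

for step, (given, expected) in enumerate(zip(decoder_input_chars, decoder_target_chars)):
    print(step, repr(given), "->", repr(expected))
```

The first pair maps the start marker to the first real character, and the last pair maps the final character to the end marker.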

In [4]:
print(input_token_index)

{' ': 0, '!': 1, '$': 2, '%': 3, '&': 4, "'": 5, ',': 6, '-': 7, '.': 8, '0': 9, '1': 10, '2': 11, '3': 12, '5': 13, '6': 14, '7': 15, '8': 16, '9': 17, ':': 18, '?': 19, 'A': 20, 'B': 21, 'C': 22, 'D': 23, 'E': 24, 'F': 25, 'G': 26, 'H': 27, 'I': 28, 'J': 29, 'K': 30, 'L': 31, 'M': 32, 'N': 33, 'O': 34, 'P': 35, 'Q': 36, 'R': 37, 'S': 38, 'T': 39, 'U': 40, 'V': 41, 'W': 42, 'Y': 43, 'a': 44, 'b': 45, 'c': 46, 'd': 47, 'e': 48, 'f': 49, 'g': 50, 'h': 51, 'i': 52, 'j': 53, 'k': 54, 'l': 55, 'm': 56, 'n': 57, 'o': 58, 'p': 59, 'q': 60, 'r': 61, 's': 62, 't': 63, 'u': 64, 'v': 65, 'w': 66, 'x': 67, 'y': 68, 'z': 69}


In [5]:
print(len(encoder_input_data))

10000


In [6]:
print(len(input_text))
print(len(input_texts[-1]))
print(input_text)

16
16
Do you trust me?


### The Return Sequences argument
`return_sequences` controls whether the LSTM outputs the hidden state h for `each` input time step or only for the last one.

In the encoder phase, we use the default return_sequences=False, because we don't need the hidden state output at every input time step. The layer uses these states internally, but we only pass the final states on to the decoder. In the decoder phase, however, we need an output at every time step, because each one is fed to the Dense softmax layer to predict the next target character. Thus we set return_sequences=True in the decoder.

We set up our decoder to return full output sequences, and to return internal states as well. We don't use the returned states in the training model, but we will use them in inference.

### The Return State argument
Keras provides the `return_state` argument on the LSTM layer, which gives access to the `final` hidden state output (state_h) and the final cell state (state_c).
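The effect of the two flags can be illustrated with a toy recurrence in plain Python. This is not an LSTM and not the Keras implementation: `toy_rnn` and its weights `w` and `u` are hypothetical, and it carries a single scalar state, whereas an LSTM carries both state_h and state_c. It only mimics what the flags return.

```python
import math

def toy_rnn(inputs, h0=0.0, w=0.5, u=0.8, return_sequences=False, return_state=False):
    """Scalar recurrence h_t = tanh(w * x_t + u * h_{t-1}); a stand-in, not an LSTM."""
    h = h0
    all_h = []
    for x in inputs:
        h = math.tanh(w * x + u * h)
        all_h.append(h)
    # return_sequences: output for every time step instead of only the last one.
    output = all_h if return_sequences else h
    # return_state: additionally hand back the final state.
    return (output, h) if return_state else output

xs = [1.0, 0.0, -1.0]
last_only = toy_rnn(xs)
every_step, final_state = toy_rnn(xs, return_sequences=True, return_state=True)
print(last_only)
print(every_step, final_state)
```

The last element of the full sequence equals the final state, just as the last output of a Keras LSTM equals state_h. The `h0` parameter plays the role of `initial_state`, which is how the encoder states are handed to the decoder.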

### Looking at the outputs of encoder and decoder


`encoder_outputs, state_h, state_c = encoder(encoder_inputs)`

`decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)`

In [7]:
# Define an input sequence and process it.
# Input shape is (timesteps, features); None allows any sequence length,
# and the batch dimension is implicit.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
# The LSTM layer only defines the recurrent cell (here with latent_dim units).
# The number of time steps it runs over is not fixed here; it follows from
# the length of the input sequence (the None dimension of decoder_inputs).
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
                                     initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

# Run training
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
          batch_size=batch_size,
          epochs=epochs,
          validation_split=0.2)
# Save model
model.save('s2s.h5')

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Use tf.cast instead.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
Train on 8000 samples, validate on 2000 samples
Epoch 1/100


InternalError: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version

In [None]:
# Next: inference mode (sampling).
# Here's the drill:
# 1) encode input and retrieve initial decoder state
# 2) run one step of decoder with this initial state
# and a "start of sequence" token as target.
# Output will be the next target token
# 3) Repeat with the current target token and current states

# Define sampling models
encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
# Note that in "[decoder_inputs] + decoder_states_inputs", decoder_inputs
# carries only a single time step at inference.
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)

# Reverse-lookup token index to decode sequences back to
# something readable.
reverse_input_char_index = dict(
    (i, char) for char, i in input_token_index.items())
reverse_target_char_index = dict(
    (i, char) for char, i in target_token_index.items())

In [None]:
def decode_sequence(input_seq):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)

    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0, target_token_index['\t']] = 1.

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict(
            [target_seq] + states_value)

        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char

        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '\n' or
           len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True

        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.

        # Update states
        states_value = [h, c]

    return decoded_sentence

In [None]:
for seq_index in range(100):
    # Take one sequence (part of the training set)
    # for trying out decoding.
    input_seq = encoder_input_data[seq_index: seq_index + 1]
    decoded_sentence = decode_sequence(input_seq)
    print('-')
    print('Input sentence:', input_texts[seq_index])
    print('Decoded sentence:', decoded_sentence)