# Sequence to Sequence Learning with Keras
Author: Hayson Cheung [hayson.cheung@mail.utoronto.ca]

In this notebook, we follow Ilya Sutskever, Oriol Vinyals, and Quoc V. Le, "Sequence to Sequence Learning with Neural Networks" (NIPS 2014), and implement a simple sequence-to-sequence model using LSTMs in Keras. The model maps a sequence of German words to a sequence of English words: it is trained on pairs of German sentences and their English translations, so that it can translate German sentences into English.

In [78]:
# sample.ipynb

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers import Embedding


In [62]:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

Num GPUs Available:  0


In [7]:
# Parameters

# Latent dimension is the number of hidden units |h(t)| in the LSTM cell
LATENT_DIM = 256

In [54]:
import load_data

load_data.main("tldr.tmx")

from load_data import INPUT_VOCAB_SIZE, OUTPUT_VOCAB_SIZE, MAX_INPUT_LENGTH, MAX_OUTPUT_LENGTH, input_tokenizer, \
    output_tokenizer

print(f"Input vocab size: {INPUT_VOCAB_SIZE}")
print(f"Output vocab size: {OUTPUT_VOCAB_SIZE}")
print(f"Max input length: {MAX_INPUT_LENGTH}")
print(f"Max output length: {MAX_OUTPUT_LENGTH}")

Input vocab size: 3750
Output vocab size: 2950
Max input length: 46
Max output length: 65
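
The `load_data` module is not shown in this notebook. As a rough sketch of what such preprocessing usually involves, the example below builds a word index and pads sequences to a common length. The helper names `build_word_index` and `encode_and_pad` are hypothetical, and the indexing is simplified: the Keras `Tokenizer` orders indices by word frequency, while this sketch uses order of first appearance. Left-padding mirrors the default of `pad_sequences`.

```python
# Hypothetical stand-in for the preprocessing done inside load_data (not shown in the notebook).

def build_word_index(sentences):
    """Map each distinct lowercase word to an integer id starting at 1 (0 is reserved for padding)."""
    word_index = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in word_index:
                word_index[word] = len(word_index) + 1
    return word_index


def encode_and_pad(sentence, word_index, max_length):
    """Convert a sentence to ids, drop unknown words, and left-pad with zeros to max_length."""
    ids = [word_index[w] for w in sentence.lower().split() if w in word_index]
    ids = ids[-max_length:]  # keep the last max_length tokens if the sentence is too long
    return [0] * (max_length - len(ids)) + ids


sentences = ["ich bin müde", "ich bin ein student"]
word_index = build_word_index(sentences)
vocab_size = len(word_index) + 1  # +1 for the padding id 0
max_length = max(len(s.split()) for s in sentences)
padded = [encode_and_pad(s, word_index, max_length) for s in sentences]

print(vocab_size)  # 6
print(padded)      # [[0, 1, 2, 3], [1, 2, 4, 5]]
```

Under these assumptions, a vocabulary size such as 3750 is the number of distinct words plus one, and a maximum length such as 46 is the token count of the longest sentence.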


## ENCODER and DECODER

The model uses two LSTMs. The encoder LSTM reads the input sequence and returns its final hidden and cell states. The decoder LSTM takes the target sequence as input, uses the encoder states as its initial state, and returns the output sequence. The encoder and decoder are defined separately and then combined to form the final model.
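
To make "encoder states" concrete, here is a minimal NumPy sketch of a single LSTM step and of an encoder loop that returns the final hidden state `h` and cell state `c`. It is an illustration, not the Keras implementation: the weights are random, and the function names `lstm_step` and `encode` are made up for this example. The gate order (input, forget, candidate, output) follows the convention Keras uses.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (input_dim, 4*units), U: (units, 4*units), b: (4*units,)."""
    units = h_prev.shape[-1]
    z = x_t @ W + h_prev @ U + b
    i = sigmoid(z[:, :units])                 # input gate
    f = sigmoid(z[:, units:2 * units])        # forget gate
    g = np.tanh(z[:, 2 * units:3 * units])    # candidate cell state
    o = sigmoid(z[:, 3 * units:])             # output gate
    c_t = f * c_prev + i * g                  # new cell state
    h_t = o * np.tanh(c_t)                    # new hidden state
    return h_t, c_t


def encode(inputs, W, U, b, units):
    """Run the LSTM over all time steps and return only the final states [h, c]."""
    batch, timesteps, _ = inputs.shape
    h = np.zeros((batch, units))
    c = np.zeros((batch, units))
    for t in range(timesteps):
        h, c = lstm_step(inputs[:, t, :], h, c, W, U, b)
    return h, c


rng = np.random.default_rng(0)
batch, timesteps, input_dim, units = 2, 5, 4, 8
W = rng.normal(scale=0.1, size=(input_dim, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)
inputs = rng.normal(size=(batch, timesteps, input_dim))

h_final, c_final = encode(inputs, W, U, b, units)
print(h_final.shape, c_final.shape)  # (2, 8) (2, 8)
```

Whatever the input length, the encoder hands the decoder two fixed-size vectors per sentence, each of size `LATENT_DIM` in this notebook.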

In [80]:
# Define Encoder
encoder_input = Input(shape=(MAX_INPUT_LENGTH,))

encoder_embedding = Embedding(INPUT_VOCAB_SIZE, LATENT_DIM)(encoder_input)
encoder_lstm = LSTM(LATENT_DIM, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_embedding)
encoder_states = [state_h, state_c]

# Define Decoder
decoder_input = Input(shape=(MAX_OUTPUT_LENGTH,))
decoder_embedding = Embedding(OUTPUT_VOCAB_SIZE, LATENT_DIM)(decoder_input)
decoder_lstm = LSTM(LATENT_DIM, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_embedding, initial_state=encoder_states)

# The dense layer maps each decoder output to OUTPUT_VOCAB_SIZE scores, and the softmax
# activation turns those scores into a probability distribution over the output vocabulary
decoder_dense = Dense(OUTPUT_VOCAB_SIZE, activation='softmax')
decoder_output = decoder_dense(decoder_outputs)
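
As a small illustration of what the softmax layer computes, and of the sparse categorical cross-entropy loss used when the model is compiled, here is a NumPy sketch on a toy vocabulary of three words. The numbers are made up, and the functions are simplified stand-ins for the Keras versions.

```python
import numpy as np


def softmax(logits):
    """Turn raw scores into probabilities along the last axis (numerically stable)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def sparse_categorical_crossentropy(target_ids, probs):
    """Mean negative log-probability assigned to the correct integer token ids."""
    picked = np.take_along_axis(probs, target_ids[..., None], axis=-1)[..., 0]
    return float(-np.log(picked).mean())


# Shape (batch=1, time=2, vocab=3): one score per vocabulary word at each time step
logits = np.array([[[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]]])
targets = np.array([[0, 2]])  # integer ids, not one-hot vectors

probs = softmax(logits)
loss = sparse_categorical_crossentropy(targets, probs)

print(probs.sum(axis=-1))  # every time step sums to 1
print(loss)
```

The "sparse" variant takes integer token ids as targets, which avoids building one-hot vectors of size `OUTPUT_VOCAB_SIZE` for every position.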

In [None]:
# Define the model
model = Model([encoder_input, decoder_input], decoder_output)

# Compile the model
model.compile(optimizer=Adam(), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()  # summary() prints the table itself and returns None, so print() is not needed


## Training the Model
This is where we train the model. Given the encoder input and the decoder input, the model learns to predict the decoder target, using the dataset of paired German and English sentences.

This takes a while to run. We can save the model and load it later.

### Explanation of the data set
- `encoder_input_train`: Training data for the encoder (German sentences).
- `decoder_input_train`: Training data for the decoder (English sentences with `<start>` token).
- `decoder_target_train`: Target data for the decoder (English sentences).
- `encoder_input_val`: Validation data for the encoder (German sentences).
- `decoder_input_val`: Validation data for the decoder (English sentences with `<start>` token).
- `decoder_target_val`: Target data for the decoder (English sentences).
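
The decoder input and the decoder target are the same sentence shifted by one position, a setup usually called teacher forcing. The sketch below shows that relationship on token ids. The helper `make_decoder_pair` is hypothetical, and both the right-padding and the specific ids are assumptions, since the actual preparation lives in `load_data`.

```python
def make_decoder_pair(token_ids, start_id, end_id, max_length, pad_id=0):
    """Build (decoder_input, decoder_target) for one target sentence.

    decoder_input  = [start, w1, w2, ..., wn]
    decoder_target = [w1, w2, ..., wn, end]
    Both are right-padded with pad_id to max_length.
    """
    full = [start_id] + list(token_ids) + [end_id]
    decoder_input = full[:-1]
    decoder_target = full[1:]
    if len(decoder_input) > max_length:
        raise ValueError("sentence is longer than max_length")
    padding = [pad_id] * (max_length - len(decoder_input))
    return decoder_input + padding, decoder_target + padding


# Toy ids: 1 = start token, 2 = end token, 5/6/7 = three English words
decoder_input, decoder_target = make_decoder_pair([5, 6, 7], start_id=1, end_id=2, max_length=6)
print(decoder_input)   # [1, 5, 6, 7, 0, 0]
print(decoder_target)  # [5, 6, 7, 2, 0, 0]
```

At each position the decoder sees the correct previous word and is trained to predict the next one.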



In [None]:
# Data Set Preparation
from load_data import encoder_input_train, decoder_input_train, decoder_target_train, encoder_input_val, decoder_input_val, decoder_target_val
with tf.device('/GPU:0'):
  model.fit(
      [encoder_input_train, decoder_input_train],  # Inputs for encoder and decoder
      decoder_target_train,  # Target data for decoder
      batch_size=16,  # Adjust as needed
      epochs=30,  # Adjust as needed
      validation_data=([encoder_input_val, decoder_input_val], decoder_target_val),
      verbose=1
  )

Epoch 1/30
281/281 ━━━━━━━━━━━━━━━━━━━━ 8s 20ms/step - accuracy: 0.8721 - loss: 1.5921 - val_accuracy: 0.9268 - val_loss: 0.4965
Epoch 2/30
281/281 ━━━━━━━━━━━━━━━━━━━━ 11s 22ms/step - accuracy: 0.9323 - loss: 0.4434 - val_accuracy: 0.9351 - val_loss: 0.4370
Epoch 3/30
281/281 ━━━━━━━━━━━━━━━━━━━━ 10s 19ms/step - accuracy: 0.9385 - loss: 0.4010 - val_accuracy: 0.9396 - val_loss: 0.4094
Epoch 4/30
281/281 ━━━━━━━━━━━━━━━━━━━━ 10s 18ms/step - accuracy: 0.9414 - loss: 0.3718 - val_accuracy: 0.9407 - val_loss: 0.3958
Epoch 5/30
281/281 ━━━━━━━━━━━━━━━━━━━━ 5s 18ms/step - accuracy: 0.9417 - loss: 0.3584 - val_accuracy: 0.9420 - val_loss: 0.3872
Epoch 6/30
281/281 ━━━━━━━━━━━━━━━━━━━━ 10s 18ms/step - accuracy: 0.9441 - loss: 0.3388 - val_accuracy: 0.9424 - val_loss: 0.3817
Epoch 7/30
281

As you can see, training this model takes a long time, and it would take even longer on the TED dataset, so let's use something simpler.
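
A note on reading the training log: accuracy is already around 0.87 in the first epoch. One plausible explanation, stated here as an assumption because `load_data` is not shown, is that the targets are zero-padded to `MAX_OUTPUT_LENGTH` and the padded positions count toward the `accuracy` metric. The NumPy sketch below compares accuracy with and without padded positions on a made-up example; `token_accuracy` is a hypothetical helper, not a Keras function.

```python
import numpy as np


def token_accuracy(y_true, y_pred_ids, pad_id=None):
    """Fraction of positions predicted correctly, optionally ignoring padded positions."""
    correct = (y_true == y_pred_ids)
    if pad_id is None:
        return float(correct.mean())
    mask = y_true != pad_id
    return float(correct[mask].mean())


# One target sentence of 3 real tokens padded to length 10 (0 = padding)
y_true = np.array([[5, 6, 2, 0, 0, 0, 0, 0, 0, 0]])
# A prediction that gets only the first real token right but predicts padding correctly
y_pred = np.array([[5, 9, 9, 0, 0, 0, 0, 0, 0, 0]])

unmasked = token_accuracy(y_true, y_pred)
masked = token_accuracy(y_true, y_pred, pad_id=0)
print(unmasked)  # 0.8
print(masked)    # one of three real tokens is correct
```

If this is what happens here, the reported accuracy overstates translation quality, and `val_loss` or a masked metric is the better signal.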

In [81]:
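# Save to the working directory, which is also where the model is loaded from below.
# Saving to "/content/seq2seq_model.h5" failed in this environment with:
# OSError: [Errno 30] Read-only file system: '/content'
model.save("seq2seq_model.h5")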
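
One way to avoid this kind of error is to check that the target directory is writable before saving. The sketch below uses only the standard library; `writable_path` is a hypothetical helper, and the fallback to the system temporary directory is an assumption about what is acceptable for this notebook.

```python
import os
import tempfile


def writable_path(filename, preferred_dir="."):
    """Return filename inside preferred_dir if it is writable, else inside the temp directory."""
    if os.path.isdir(preferred_dir) and os.access(preferred_dir, os.W_OK):
        return os.path.join(preferred_dir, filename)
    return os.path.join(tempfile.gettempdir(), filename)


save_path = writable_path("seq2seq_model.h5")
print(save_path)
# model.save(save_path)  # uncomment in the notebook, where `model` is defined
```

The same path then has to be passed to `load_model` later.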

In [103]:
# Load the model
from tensorflow.keras.models import load_model
from load_data import INPUT_VOCAB_SIZE, OUTPUT_VOCAB_SIZE, MAX_INPUT_LENGTH, MAX_OUTPUT_LENGTH, input_tokenizer, output_tokenizer
model = load_model("seq2seq_model.h5")

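# Set up the inference encoder and decoder from the trained layers of the loaded model.
# Layers are identified by their configuration, which assumes INPUT_VOCAB_SIZE != OUTPUT_VOCAB_SIZE.
embedding_layers = [layer for layer in model.layers if isinstance(layer, Embedding)]
lstm_layers = [layer for layer in model.layers if isinstance(layer, LSTM)]
encoder_embedding_layer = next(l for l in embedding_layers if l.input_dim == INPUT_VOCAB_SIZE)
decoder_embedding_layer = next(l for l in embedding_layers if l.input_dim == OUTPUT_VOCAB_SIZE)
encoder_lstm = next(l for l in lstm_layers if not l.return_sequences)
decoder_lstm = next(l for l in lstm_layers if l.return_sequences)
decoder_dense = next(layer for layer in model.layers if isinstance(layer, Dense))

# Encoder model: source sequence -> final LSTM states [h, c]
encoder_inference_input = Input(shape=(MAX_INPUT_LENGTH,))
_, state_h, state_c = encoder_lstm(encoder_embedding_layer(encoder_inference_input))
encoder_model = Model(encoder_inference_input, [state_h, state_c])

# Decoder model: one token plus the previous states -> next-token probabilities plus new states
decoder_inference_input = Input(shape=(1,))
decoder_state_input_h = Input(shape=(LATENT_DIM,))
decoder_state_input_c = Input(shape=(LATENT_DIM,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

# Reuse the trained decoder embedding; a new Embedding layer here would have random weights
decoder_embedded = decoder_embedding_layer(decoder_inference_input)
decoder_outputs, state_h, state_c = decoder_lstm(decoder_embedded, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)

decoder_model = Model(
    [decoder_inference_input] + decoder_states_inputs,  # input: [token, h, c]
    [decoder_outputs] + decoder_states  # output: [output, h, c]
)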

# map indexes back into real words
idx2word_input = {v:k for k, v in input_tokenizer.word_index.items()}
idx2word_target = {v:k for k, v in output_tokenizer.word_index.items()}
import numpy as np

def decode_sequence(input_seq):
    # Step 1: Get encoder states
    states_value = encoder_model.predict(input_seq)

    # Step 2: Generate empty target sequence of length 1
    target_seq = np.zeros((1, 1))
    target_seq[0, 0] = output_tokenizer.word_index['start']

    # Step 3: Loop to generate the translated sequence
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)

        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_word = idx2word_target.get(sampled_token_index, '<UNK>')

        # Exit condition: either find the stop token or hit max length
        if sampled_word == 'end' or len(decoded_sentence.split()) >= MAX_OUTPUT_LENGTH:
            stop_condition = True
        else:
            # Append the sampled word to the decoded sentence (the stop token is not included)
            decoded_sentence += ' ' + sampled_word

        # Update the target sequence (of length 1)
        target_seq = np.zeros((1, 1))
        target_seq[0, 0] = sampled_token_index

        # Update states
        states_value = [h, c]

    return decoded_sentence.strip()

def translate(input_text):
    # Tokenize the input sequence
    input_seq = input_tokenizer.texts_to_sequences([input_text])
    input_seq = tf.keras.preprocessing.sequence.pad_sequences(input_seq, maxlen=MAX_INPUT_LENGTH)

    # Get the translated sentence
    translated_sentence = decode_sequence(input_seq)
    return translated_sentence
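
The loop in `decode_sequence` is greedy decoding: at every step it keeps the single most probable token and feeds it back in. The sketch below isolates that control flow in plain Python so it can be run without a trained model. `greedy_decode` and `toy_step` are hypothetical names, and `toy_step` is a scripted stand-in for `decoder_model.predict`.

```python
def greedy_decode(step_fn, initial_state, start_id, end_id, max_length):
    """Repeatedly pick the most probable next token until end_id or max_length is reached."""
    token, state, output = start_id, initial_state, []
    for _ in range(max_length):
        probs, state = step_fn(token, state)
        token = max(range(len(probs)), key=probs.__getitem__)  # argmax over the vocabulary
        if token == end_id:
            break
        output.append(token)
    return output


def toy_step(prev_token, state):
    """Scripted stand-in for the decoder: emits tokens 3, 4, then the end token 2."""
    script = [3, 4, 2]
    probs = [0.0] * 5  # toy vocabulary of 5 ids
    probs[script[state]] = 1.0
    return probs, state + 1


print(greedy_decode(toy_step, initial_state=0, start_id=1, end_id=2, max_length=10))  # [3, 4]
print(greedy_decode(toy_step, initial_state=0, start_id=1, end_id=2, max_length=1))   # [3]
```

Greedy decoding is simple but can miss sequences whose first word is not the most probable one. Sutskever et al. use beam search to reduce that problem.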




In [104]:
# Test the model
print(translate("Ich bin ein Student."))  # I am a student.
print(translate("Ich bin traurig."))  # I am sad.
print(translate("Ich bin mude."))  # I am tired.
        



[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 72ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 70ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 16ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 16ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 18ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 16ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 17ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 17ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 16ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 18ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 18ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 16ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 15ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 15