# Sequence to Sequence Learning with Keras
Author: Hayson Cheung [hayson.cheung@mail.utoronto.ca]

In this notebook, we follow Ilya Sutskever, Oriol Vinyals, and Quoc V. Le, "Sequence to Sequence Learning with Neural Networks" (NIPS 2014), and implement a simple sequence-to-sequence model with LSTM layers in Keras. The model is trained on a parallel corpus of German sentences and their English translations (`de-en.tmx`). The encoder reads a sequence of German words and the decoder produces the corresponding sequence of English words, so the trained model translates German sentences into English.

In [1]:
# sample.ipynb

from tensorflow.keras.layers import Dense, Embedding, Input, LSTM
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

In [2]:
# Parameters

# Latent dimension: number of hidden units |h(t)| in each LSTM cell
# (also used as the embedding size below)
LATENT_DIM = 256

In [3]:
import load_data

load_data.main("de-en.tmx")

from load_data import INPUT_VOCAB_SIZE, OUTPUT_VOCAB_SIZE, MAX_INPUT_LENGTH, MAX_OUTPUT_LENGTH
print(f"Input vocab size: {INPUT_VOCAB_SIZE}")
print(f"Output vocab size: {OUTPUT_VOCAB_SIZE}")
print(f"Max input length: {MAX_INPUT_LENGTH}")
print(f"Max output length: {MAX_OUTPUT_LENGTH}")

Input vocab size: 98575
Output vocab size: 45853
Max input length: 663
Max output length: 631


## Encoder and Decoder

The model uses two LSTMs. The encoder LSTM reads the input sequence and returns only its final hidden and cell states. The decoder LSTM takes the decoder input sequence, starts from the encoder states, and returns one output vector per time step. The encoder and decoder are defined separately and then combined into a single training model.

In [4]:
# Define Encoder
encoder_input = Input(shape=(MAX_INPUT_LENGTH,))

encoder_embedding = Embedding(INPUT_VOCAB_SIZE, LATENT_DIM)(encoder_input)
encoder_lstm = LSTM(LATENT_DIM, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_embedding)
encoder_states = [state_h, state_c]

# Define Decoder
decoder_input = Input(shape=(MAX_OUTPUT_LENGTH,))
decoder_embedding = Embedding(OUTPUT_VOCAB_SIZE, LATENT_DIM)(decoder_input)
decoder_lstm = LSTM(LATENT_DIM, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_embedding, initial_state=encoder_states)

# The Dense layer is applied at every time step and projects each decoder output
# onto the output vocabulary; softmax turns the scores into a probability distribution
decoder_dense = Dense(OUTPUT_VOCAB_SIZE, activation='softmax')
decoder_output = decoder_dense(decoder_outputs)
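To get a feel for the size of what this decoder produces, the sketch below computes the shape and float32 memory footprint of the softmax output for one batch. It assumes the sizes printed earlier (`MAX_OUTPUT_LENGTH = 631`, `OUTPUT_VOCAB_SIZE = 45853`), 4 bytes per value, and the batch size of 64 used in the training cell; `softmax_output_bytes` is a hypothetical helper, not part of Keras.

```python
def softmax_output_bytes(batch_size, seq_len, vocab_size, bytes_per_value=4):
    """Return (shape, size in bytes) of a dense softmax output tensor."""
    shape = (batch_size, seq_len, vocab_size)
    n_values = batch_size * seq_len * vocab_size
    return shape, n_values * bytes_per_value


# Assumed values: batch size from the training cell, sizes from the load_data output
shape, n_bytes = softmax_output_bytes(batch_size=64, seq_len=631, vocab_size=45853)
print(shape)
print(f"{n_bytes / 1024**3:.1f} GiB per batch")
```

Under these assumptions a single batch of output probabilities occupies several gigabytes, which is one plausible reason each training step is slow. Padding every sentence to the maximum length contributes to this.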

In [5]:
# Define the model
model = Model([encoder_input, decoder_input], decoder_output)

# Compile the model
model.compile(optimizer=Adam(), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()  # prints the layer table itself and returns None, so print() is not needed
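A rough parameter count can be worked out by hand from the layer sizes. The sketch below uses the standard parameter formulas for Embedding, LSTM (with bias), and Dense layers together with the vocabulary sizes printed earlier. The helper names are hypothetical, and the result is an estimate that `model.summary()` should confirm, not an output copied from the notebook.

```python
def embedding_params(vocab_size, dim):
    # One vector of length `dim` per vocabulary entry
    return vocab_size * dim


def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and a bias
    return 4 * (input_dim * units + units * units + units)


def dense_params(input_dim, units):
    # Weight matrix plus one bias per output unit
    return input_dim * units + units


def seq2seq_params(in_vocab, out_vocab, latent_dim):
    return {
        "encoder_embedding": embedding_params(in_vocab, latent_dim),
        "decoder_embedding": embedding_params(out_vocab, latent_dim),
        "encoder_lstm": lstm_params(latent_dim, latent_dim),
        "decoder_lstm": lstm_params(latent_dim, latent_dim),
        "decoder_dense": dense_params(latent_dim, out_vocab),
    }


counts = seq2seq_params(in_vocab=98575, out_vocab=45853, latent_dim=256)
for name, n in counts.items():
    print(f"{name:>18}: {n:,}")
print(f"{'total':>18}: {sum(counts.values()):,}")
```

By this estimate the two embedding tables hold roughly three quarters of the parameters and the output projection most of the rest, so the vocabulary sizes dominate the model size.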


## Training the Model
Here we train the model. Given the encoder input and the decoder input, the model learns to predict the decoder target at every time step, using the paired German and English sentences loaded above.

Training takes a long time, so it is worth saving the trained model and loading it later instead of retraining.
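A minimal sketch of saving and reloading, shown on a tiny stand-in model so that it runs in seconds. The toy sizes, the `build_toy_seq2seq` helper, and the file name `seq2seq_toy.keras` are assumptions for illustration, and the sketch assumes a TensorFlow version recent enough to support the `.keras` file format. For the real model, the same `save` and `load_model` calls would be applied to `model` after `fit`.

```python
import os
import tempfile

import numpy as np
from tensorflow.keras.layers import Dense, Embedding, Input, LSTM
from tensorflow.keras.models import Model, load_model


def build_toy_seq2seq(in_vocab=50, out_vocab=40, in_len=5, out_len=6, latent_dim=8):
    """Same layer layout as the notebook's model, with toy sizes (hypothetical helper)."""
    enc_in = Input(shape=(in_len,))
    enc_emb = Embedding(in_vocab, latent_dim)(enc_in)
    _, state_h, state_c = LSTM(latent_dim, return_state=True)(enc_emb)

    dec_in = Input(shape=(out_len,))
    dec_emb = Embedding(out_vocab, latent_dim)(dec_in)
    dec_seq, _, _ = LSTM(latent_dim, return_sequences=True, return_state=True)(
        dec_emb, initial_state=[state_h, state_c]
    )
    probs = Dense(out_vocab, activation="softmax")(dec_seq)
    return Model([enc_in, dec_in], probs)


toy_model = build_toy_seq2seq()

# Random token ids standing in for real sentences
rng = np.random.default_rng(0)
enc_x = rng.integers(0, 50, size=(2, 5))
dec_x = rng.integers(0, 40, size=(2, 6))
before = toy_model.predict([enc_x, dec_x], verbose=0)

# Save to a temporary directory, then load the model back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "seq2seq_toy.keras")
    toy_model.save(path)
    restored = load_model(path)

after = restored.predict([enc_x, dec_x], verbose=0)
print(before.shape)                # (batch, out_len, out_vocab)
print(np.allclose(before, after))  # the reloaded model should reproduce the predictions
```

Saving once after a long training run means later sessions can start from `load_model(...)` rather than repeating `fit`.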

### Explanation of the data set

- `encoder_input_train`: training data for the encoder (German sentences).
- `decoder_input_train`: training data for the decoder (English sentences with a `<start>` token).
- `decoder_target_train`: target data for the decoder (English sentences).
- `encoder_input_val`: validation data for the encoder (German sentences).
- `decoder_input_val`: validation data for the decoder (English sentences with a `<start>` token).
- `decoder_target_val`: validation target data for the decoder (English sentences).
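The relationship between the decoder input and the decoder target can be illustrated on a single tokenized sentence. This is a sketch of the usual teacher-forcing setup, not the code in `load_data`: the `<end>` token, the example tokens, and the `make_decoder_pair` helper are assumptions, since the description above only states that the decoder input carries a `<start>` token.

```python
def make_decoder_pair(tokens, start_token="<start>", end_token="<end>"):
    """Build (decoder_input, decoder_target) for one target sentence.

    The input is the sentence shifted right by one position, so at each
    step the decoder sees the previous true token and predicts the next one.
    """
    decoder_input = [start_token] + list(tokens)
    decoder_target = list(tokens) + [end_token]  # "<end>" is an assumed convention
    return decoder_input, decoder_target


dec_in, dec_target = make_decoder_pair(["the", "cat", "sleeps"])
for seen, predicted in zip(dec_in, dec_target):
    print(f"{seen:>8} -> {predicted}")
```

Both sequences have the same length, which is what lets the model compare its per-step predictions with the target position by position.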



In [None]:
# Load the prepared training and validation splits
from load_data import (
    encoder_input_train, decoder_input_train, decoder_target_train,
    encoder_input_val, decoder_input_val, decoder_target_val,
)

model.fit(
    [encoder_input_train, decoder_input_train],  # Inputs for encoder and decoder
    decoder_target_train,  # Target data for decoder
    batch_size=64,  # Adjust as needed
    epochs=30,  # Adjust as needed
    validation_data=([encoder_input_val, decoder_input_val], decoder_target_val),  # Validation data
    verbose=1
)

Epoch 1/30
   1/3570 ━━━━━━━━━━━━━━━━━━━━ 297:22:35 300s/step - accuracy: 0.0000e+00 - loss: 10.7346

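As you can see, training at this rate is impractical. One epoch has 3570 steps at roughly 300 s per step: 3570 × 300 s = 1,071,000 s ≈ 297.5 hours ≈ 12.4 days, which matches the progress bar's estimate of 297:22:35. Over 30 epochs that comes to about 372 days, roughly a whole year. The first step often includes one-time setup cost, so later steps may be faster, but the model as defined is still very expensive to train. Once trained, the model can be saved and loaded later instead of being retrained.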
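The numbers in the first progress line can be checked with a few lines of arithmetic. The sketch below assumes the step time stays at the 300 s reported for the first step (later steps may well be faster). It also compares the reported initial loss with ln(`OUTPUT_VOCAB_SIZE`), the cross-entropy of a uniform guess over the output vocabulary. `estimate_training_time` is a hypothetical helper.

```python
import math


def estimate_training_time(steps_per_epoch, seconds_per_step, epochs):
    """Return (hours per epoch, days for all epochs) at a constant step time."""
    seconds_per_epoch = steps_per_epoch * seconds_per_step
    hours_per_epoch = seconds_per_epoch / 3600
    total_days = seconds_per_epoch * epochs / 86400
    return hours_per_epoch, total_days


# Values read from the progress line and the fit() call above
hours_per_epoch, total_days = estimate_training_time(3570, 300, 30)
print(f"{hours_per_epoch:.1f} hours per epoch")
print(f"{total_days:.0f} days for 30 epochs")

# Cross-entropy of a uniform guess over the output vocabulary
uniform_loss = math.log(45853)
print(f"uniform-guess loss: {uniform_loss:.4f} (reported: 10.7346)")
```

The reported initial loss is close to the uniform-guess value, which suggests the untrained model starts out spreading probability almost evenly over the vocabulary.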