# Sequence-to-Sequence Model Training with LSTM

## Project Overview: Sequence-to-Sequence Model for Language Translation

In this project, I build a **Sequence-to-Sequence (Seq2Seq) neural network** from Long Short-Term Memory (LSTM) units to perform language translation. Seq2Seq is a Recurrent Neural Network (RNN) architecture, and I’m using it to tackle a common natural language processing task: translating text from one language to another. The same approach is also used for tasks such as text summarisation and question answering.

## Project Objectives
The primary objective of this project is to train a neural network that can translate a given sequence of text in one language (the input) into another language (the output). To achieve this, I’m implementing an **encoder-decoder architecture**. This architecture encodes an input sequence into a fixed-size context vector, which the decoder then uses to generate an output sequence in a different language.

## Model Architecture
The model I’m using consists of two core components:

1. **Encoder**: This part processes the input sequence. It uses an LSTM layer to read the entire input and summarise it into a context vector. I use the final hidden and cell states from the encoder as the initial states for the decoder.
2. **Decoder**: The decoder takes the context vector from the encoder and generates the output sequence. It’s also an LSTM network, but it outputs a probability distribution over the tokens of the target language at each step.

## Key Features and Steps
- **Data Preprocessing**: I prepare the training data by tokenising and encoding the input and output sequences.
- **Hyperparameter Definition**: I define key parameters such as the number of hidden units (latent dimensions), batch size, and number of training epochs, which control the model’s capacity and how training proceeds.
- **Model Compilation and Training**: I utilise the `rmsprop` optimiser and a `categorical_crossentropy` loss function, which are well-suited for multi-class prediction tasks.
- **Model Evaluation**: I include a validation split during training to monitor the model’s accuracy and performance on unseen data.

## Applications
The Seq2Seq model I’m building can be applied in various real-world scenarios, such as:
- **Language Translation**: Translating text between different languages.
- **Text Summarisation**: Condensing long documents into shorter summaries.
- **Chatbots and Conversational Agents**: Generating human-like responses for automated chat systems.

By the end of this project, my goal is to have a trained Seq2Seq model capable of translating sentences from a source language to a target language using the encoder-decoder architecture I designed.

## 1. Importing Required Libraries and Variables
First, I import the necessary data and modules from the `preprocessing` module. This includes information about the number of tokens in the encoder and decoder, as well as the data for training inputs and targets.

In [4]:
from preprocessing import num_encoder_tokens, num_decoder_tokens, encoder_input_data, decoder_input_data, decoder_target_data

['!', '$', ',', '.', '00', '17', '18', '19', '2', '30', '300', '50', '8', ':', '?', 'A', 'Abandon', 'Act', 'After', 'Aim', 'All', 'Am', 'Answer', 'Anybody', 'Anyone', 'Anything', 'Are', 'Arm', 'Arrive', 'Ask', 'Attack', 'Awesome', 'BMW', 'Back', 'Be', 'Bear', 'Beat', 'Beef', "Beer's", 'Begin', 'Behave', 'Beware', 'Birds', 'Bless', 'Blood', 'Boil', 'Boston', 'Bottoms', 'Boys', 'Break'] Agáchense
1752


I then import the Keras library from TensorFlow, and specifically add the Input, LSTM, and Dense layers along with the Model class.

In [9]:
from tensorflow import keras
# Layers and the Model class used to build the encoder-decoder network
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

## 2. Handling Potential Mac Errors
If I run this code on a Mac, I might hit an error caused by a duplicated OpenMP library. To avoid this, I set the `KMP_DUPLICATE_LIB_OK` environment variable to `'True'`.

In [11]:
import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'

## 3. Setting Hyperparameters
I define the dimensionality of the LSTM's internal state (latent space) as 256. This means the hidden state of my LSTM will have 256 dimensions. I also set the batch size to 50 and the number of epochs for training to 100.

In [14]:
latent_dim = 256
batch_size = 50
epochs = 100

## 4. Encoder Setup
For the encoder, I define an input layer that expects a sequence of unspecified length (None), with each element of the sequence having a dimension equal to the number of encoder tokens (num_encoder_tokens).

In [17]:
encoder_inputs = Input(shape=(None, num_encoder_tokens))
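To make the input shape concrete, here is a minimal sketch of how one sentence could become a `(timesteps, num_tokens)` array. It assumes the `preprocessing` module one-hot encodes each token, which the input shape suggests but which is not shown here. The toy vocabulary and the `one_hot_sequence` helper are hypothetical.

```python
# Hypothetical toy vocabulary; the real one comes from the preprocessing module.
vocab = ["<PAD>", "Be", "Go", "careful", "home", "."]
token_index = {token: i for i, token in enumerate(vocab)}

def one_hot_sequence(tokens, token_index):
    """Return one row per token, each a one-hot vector over the vocabulary."""
    num_tokens = len(token_index)
    rows = []
    for token in tokens:
        row = [0.0] * num_tokens
        row[token_index[token]] = 1.0  # mark the position of this token
        rows.append(row)
    return rows

encoded = one_hot_sequence(["Go", "home", "."], token_index)
print(len(encoded), len(encoded[0]))  # timesteps, num_tokens -> 3 6
```

The first dimension varies with sentence length, which is why the `Input` layer leaves it as `None`, while the second is fixed by the vocabulary size.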

I then define an LSTM layer for the encoder with the specified latent dimension. I also specify that I want the layer to return both the hidden state and the cell state.

In [20]:
encoder_lstm = LSTM(latent_dim, return_state=True)
encoder_outputs, state_hidden, state_cell = encoder_lstm(encoder_inputs)
encoder_states = [state_hidden, state_cell]
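To show what the hidden and cell states are, the sketch below steps a tiny LSTM cell over a sequence in plain Python and keeps only the final pair of states, as the encoder does. It follows the standard LSTM equations rather than Keras' actual implementation. The weights, sizes, and helper names (`affine`, `lstm_step`, `encode`, `make_gate`) are all hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, U, b, x, h):
    """Compute W @ x + U @ h + b for plain Python lists."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(u * hi for u, hi in zip(U_row, h))
        + bias
        for W_row, U_row, bias in zip(W, U, b)
    ]

def lstm_step(x, h, c, params):
    """One LSTM timestep: returns the new hidden state and cell state."""
    i = [sigmoid(v) for v in affine(*params["input"], x, h)]        # input gate
    f = [sigmoid(v) for v in affine(*params["forget"], x, h)]       # forget gate
    o = [sigmoid(v) for v in affine(*params["output"], x, h)]       # output gate
    g = [math.tanh(v) for v in affine(*params["candidate"], x, h)]  # candidate values
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new

def encode(sequence, params, latent_dim):
    """Run the cell over the whole sequence and keep only the final states."""
    h = [0.0] * latent_dim
    c = [0.0] * latent_dim
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h, c

def make_gate(weight, input_dim, latent_dim):
    """Build (W, U, b) for one gate with every weight set to the same value."""
    W = [[weight] * input_dim for _ in range(latent_dim)]
    U = [[weight] * latent_dim for _ in range(latent_dim)]
    b = [0.0] * latent_dim
    return W, U, b

input_dim, latent_dim = 3, 2  # tiny sizes for illustration only
gate_names = ("input", "forget", "output", "candidate")
params = {name: make_gate(0.1, input_dim, latent_dim) for name in gate_names}

sequence = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # two one-hot timesteps
state_hidden, state_cell = encode(sequence, params, latent_dim)
print(state_hidden, state_cell)
```

Each state has length `latent_dim` regardless of the sequence length, so the encoder hands the decoder a fixed-size summary.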

## 5. Decoder Setup
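For the decoder, I define a similar input layer for the decoder tokens. I then set up another LSTM layer that takes the encoder's final hidden and cell states as its initial states. Setting `return_sequences=True` makes it return an output at every timestep, so that each target token can be predicted.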

In [23]:
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, decoder_state_hidden, decoder_state_cell = decoder_lstm(decoder_inputs, initial_state=encoder_states)
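This walkthrough does not show how `decoder_input_data` and `decoder_target_data` relate. A common convention, assumed here, is teacher forcing: the target is the decoder input shifted one step ahead, so at each timestep the decoder sees the correct previous token and must predict the next one. The sentence and the `<START>`/`<END>` markers below are hypothetical.

```python
# Hypothetical target sentence wrapped in assumed start/end markers.
target_tokens = ["<START>", "Ve", "a", "casa", ".", "<END>"]

# What the decoder sees at each step (everything except the last token).
decoder_input_tokens = target_tokens[:-1]
# What it should predict at each step (the same sequence shifted by one).
decoder_target_tokens = target_tokens[1:]

for seen, expected in zip(decoder_input_tokens, decoder_target_tokens):
    print(f"input: {seen:8s} -> target: {expected}")
```

The two sequences have the same length, so each decoder timestep has exactly one target to predict.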

Next, I define a Dense layer with a number of units equal to the number of decoder tokens (num_decoder_tokens) and use a softmax activation function to output a probability distribution for each timestep.

In [26]:
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
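As a rough illustration of what the softmax activation does at a single timestep, the sketch below turns raw scores into a probability distribution and picks the most likely token index. The scores are made up, and `softmax` here is a plain-Python stand-in rather than the Keras implementation.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]  # hypothetical scores for a three-token vocabulary
probs = softmax(scores)
predicted_index = probs.index(max(probs))  # index of the most likely token
print(probs, predicted_index)
```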

## 6. Building the Model
I construct the training model by linking the encoder and decoder input layers to the output of the decoder. This will be the main model I use for training.

In [29]:
training_model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

I print a summary of the model architecture to inspect its structure.

In [32]:
print("Model summary:\n")
training_model.summary()
print("\n\n")
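As a sanity check on the summary, the trainable parameter counts can be worked out by hand. The sketch below assumes the Keras defaults (one bias vector per layer) and uses hypothetical vocabulary sizes, since the real values of `num_encoder_tokens` and `num_decoder_tokens` come from the preprocessing step. The helper names are also hypothetical.

```python
def lstm_param_count(input_dim, units):
    """Four gates, each with an input kernel, a recurrent kernel, and a bias."""
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(input_dim, units):
    """One weight per input/output pair plus one bias per output unit."""
    return input_dim * units + units

latent_dim = 256
num_encoder_tokens = 1000  # hypothetical vocabulary size
num_decoder_tokens = 1500  # hypothetical vocabulary size

encoder_params = lstm_param_count(num_encoder_tokens, latent_dim)
decoder_params = lstm_param_count(num_decoder_tokens, latent_dim)
dense_params = dense_param_count(latent_dim, num_decoder_tokens)

print("encoder LSTM:", encoder_params)
print("decoder LSTM:", decoder_params)
print("dense output:", dense_params)
print("total:", encoder_params + decoder_params + dense_params)
```

With one-hot inputs, the vocabulary size multiplies directly into the LSTM kernels, so large vocabularies account for most of the parameters.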

## 7. Compiling the Model
Before training, I compile the model with the `rmsprop` optimiser and the `categorical_crossentropy` loss function, which suits multi-class classification. I also track accuracy as a metric.

In [35]:
training_model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
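To give a feel for the loss, here is a minimal sketch of categorical cross-entropy for a single timestep with a one-hot target. Keras additionally averages this over timesteps and samples. The probabilities are made up, and the `eps` clipping value is an assumption for illustration.

```python
import math

def categorical_crossentropy(one_hot_target, predicted_probs, eps=1e-7):
    """Negative log-probability assigned to the correct class."""
    return -sum(
        t * math.log(max(p, eps))  # clip to avoid log(0)
        for t, p in zip(one_hot_target, predicted_probs)
    )

target = [0.0, 1.0, 0.0]        # the correct token is at index 1
confident = [0.05, 0.90, 0.05]  # most probability on the correct token
unsure = [0.40, 0.30, 0.30]     # probability spread across tokens

print(categorical_crossentropy(target, confident))
print(categorical_crossentropy(target, unsure))
```

The loss shrinks towards zero as the probability assigned to the correct token approaches 1, so the confident prediction scores lower than the unsure one.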

## 8. Training the Model
I use the fit method to train the model, feeding it the encoder and decoder input data along with the target data. I also set a validation split of 20% to monitor the model's performance on unseen data during training.

In [38]:
training_model.fit([encoder_input_data, decoder_input_data], decoder_target_data, batch_size = batch_size, epochs = epochs, validation_split = 0.2)

Epoch 1/100
96/96 ━━━━━━━━━━━━━━━━━━━━ 19s 189ms/step - accuracy: 0.0607 - loss: 1.9045 - val_accuracy: 0.0728 - val_loss: 1.5400
Epoch 2/100
96/96 ━━━━━━━━━━━━━━━━━━━━ 18s 190ms/step - accuracy: 0.0840 - loss: 1.2910 - val_accuracy: 0.0818 - val_loss: 1.4902
Epoch 3/100
96/96 ━━━━━━━━━━━━━━━━━━━━ 19s 193ms/step - accuracy: 0.0866 - loss: 1.2453 - val_accuracy: 0.0831 - val_loss: 1.5165
Epoch 4/100
96/96 ━━━━━━━━━━━━━━━━━━━━ 18s 186ms/step - accuracy: 0.0916 - loss: 1.2113 - val_accuracy: 0.0906 - val_loss: 1.4757
Epoch 5/100
96/96 ━━━━━━━━━━━━━━━━━━━━ 19s 194ms/step - accuracy: 0.0948 - loss: 1.1769 - val_accuracy: 0.0906 - val_loss: 1.4635
Epoch 6/100
96/96 ━━━━━━━━━━━━━━━━━━━━ 18s 191ms/step - accuracy: 0.0974 - loss: 1.1640 - val_accuracy: 0.0908 - val_loss: 1.4558
Epoch 7/100
...

<keras.src.callbacks.history.History at 0x140be8940>
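The `96/96` in the log is the number of batches per epoch, which follows from the batch size and the validation split. The sketch below works through that arithmetic. The dataset size of 6000 is a hypothetical value chosen to match the log, since the real size is not shown here. Keras takes the validation samples from the end of the data, before any shuffling.

```python
import math

def split_index(num_samples, validation_split):
    """Index separating training samples from validation samples."""
    return math.floor(num_samples * (1.0 - validation_split))

num_samples = 6000  # hypothetical dataset size, not taken from the source
batch_size = 50
validation_split = 0.2

split_at = split_index(num_samples, validation_split)
indices = list(range(num_samples))
train_indices = indices[:split_at]  # first 80% used for weight updates
val_indices = indices[split_at:]    # last 20% held out for validation

num_train = len(train_indices)
steps_per_epoch = math.ceil(num_train / batch_size)
print(num_train, len(val_indices), steps_per_epoch)
```

Any training set of 4751 to 4800 samples gives 96 steps at a batch size of 50, so the log fixes the dataset size only approximately.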

## 9. Saving the Model
After training is complete, I save the model to a file named 'training_model.h5'.

In [None]:
training_model.save('training_model.h5')

This completes the training process for a basic sequence-to-sequence model using an encoder-decoder architecture with LSTM layers.