# Sentiment Analysis on IMDB Dataset using RNN, LSTM, and GRU Models

This project implements and compares three types of Recurrent Neural Networks (RNN): **SimpleRNN**, **LSTM (Long Short-Term Memory)**, and **GRU (Gated Recurrent Unit)** for sentiment analysis on the **IMDB movie review dataset**. 

## Steps:
1. **Data Loading**: The IMDB dataset is loaded with `tf.keras.datasets.imdb.load_data()`, keeping only the 10,000 most frequent words (`num_words=VOCAB_SIZE`); rarer words are replaced by an out-of-vocabulary index.
2. **Data Preprocessing**: Each review is padded or truncated to exactly 500 tokens with `pad_sequences`, so every input has the same length.
3. **Model Building**: A function is created to build models using different RNN cells (`SimpleRNN`, `LSTM`, `GRU`). The models use an embedding layer, followed by the RNN cell, and a dense layer with a sigmoid activation for binary sentiment classification.
4. **Training**: Each model is trained for up to 10 epochs using the Adam optimizer, binary cross-entropy loss, and accuracy as the evaluation metric. Early stopping on the validation loss (patience of 2 epochs) is applied to limit overfitting.
5. **Evaluation**: The models' performance is evaluated on the test set, and the loss and accuracy are printed.
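
The preprocessing in step 2 both pads and truncates. The sketch below mimics what `pad_sequences` does with its default settings (zeros added at the front, surplus tokens dropped from the front). `pad_pre` is a hypothetical helper written for illustration, not the Keras implementation.

```python
def pad_pre(seq, maxlen, value=0):
    """Mimic pad_sequences defaults: padding='pre', truncating='pre'."""
    if len(seq) >= maxlen:
        # Too long (or exact): keep only the last `maxlen` tokens.
        return list(seq[-maxlen:])
    # Too short: left-pad with `value` until the length is `maxlen`.
    return [value] * (maxlen - len(seq)) + list(seq)

short_review = [11, 12, 13]
long_review = [1, 2, 3, 4, 5, 6, 7]

print(pad_pre(short_review, 5))  # [0, 0, 11, 12, 13]
print(pad_pre(long_review, 5))   # [3, 4, 5, 6, 7]
```

With `MAXLEN = 500`, reviews longer than 500 tokens lose their beginning, while shorter ones start with a run of zeros.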

## Objective:
The goal of this project is to compare how well three RNN architectures (SimpleRNN, LSTM, GRU) predict the sentiment (positive or negative) of movie reviews. Each model is trained on preprocessed reviews and then tested on unseen reviews to measure how well it generalizes.

## Expected Outcome:
- Compare the test accuracy of SimpleRNN, LSTM, and GRU models.
- Identify which RNN architecture performs best for sentiment analysis on the IMDB dataset.


In [1]:
import tensorflow as tf
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence

# Constants
VOCAB_SIZE = 10000
MAXLEN = 500
BATCH_SIZE = 64

print("Loading data...")
(input_train, y_train), (input_test, y_test) = imdb.load_data(num_words=VOCAB_SIZE)
print(len(input_train), 'train sequences')
print(len(input_test), 'test sequences')

print('Pad sequences (samples x time)')
input_train = sequence.pad_sequences(input_train, maxlen=MAXLEN)
input_test = sequence.pad_sequences(input_test, maxlen=MAXLEN)
print('input_train shape:', input_train.shape)
print('input_test shape:', input_test.shape)

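def build_model(rnn_cell):
    """Build and compile an Embedding -> recurrent layer -> Dense(sigmoid) classifier.

    rnn_cell is a Keras recurrent layer class such as SimpleRNN, LSTM, or GRU.
    """
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(VOCAB_SIZE, 32),      # token id -> 32-dim vector
        rnn_cell(32),                                   # 32 units; returns only the final output
        tf.keras.layers.Dense(1, activation='sigmoid')  # probability of a positive review
    ])

    model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model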

rnn_model = build_model(tf.keras.layers.SimpleRNN)

lstm_model = build_model(tf.keras.layers.LSTM)

gru_model = build_model(tf.keras.layers.GRU)

callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=2)

history = rnn_model.fit(input_train, y_train,
                    epochs=10,
                    batch_size=BATCH_SIZE,
                    validation_split=0.2,
                    callbacks=[callback])

results = rnn_model.evaluate(input_test, y_test)
print(f'Test Loss: {results[0]} - Test Accuracy: {results[1]}')


Loading data...
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
[1m17464789/17464789[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 0us/step
25000 train sequences
25000 test sequences
Pad sequences (samples x time)
input_train shape: (25000, 500)
input_test shape: (25000, 500)
Epoch 1/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 24s 71ms/step - accuracy: 0.6013 - loss: 0.6451 - val_accuracy: 0.8100 - val_loss: 0.4371
Epoch 2/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 24s 75ms/step - accuracy: 0.7408 - loss: 0.5441 - val_accuracy: 0.8094 - val_loss: 0.4366
Epoch 3/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 24s 76ms/step - accuracy: 0.8871 - loss: 0.2878 - val_accuracy: 0.8074 - val_loss: 0.4364
Epoch 4/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 27s 85ms/step - accuracy: 0.9406 - loss: 0.1702 - val_accuracy: 0.8204 - val_loss: 0.4602
Epoch 5/10

## LSTM Model Training and Evaluation

In this section, the **LSTM (Long Short-Term Memory)** model is trained and evaluated for sentiment analysis on the **IMDB dataset**. The LSTM model is one of the most widely used RNN architectures, known for its ability to learn long-term dependencies in sequential data.

### Training:
- **Epochs**: The model is trained for up to 10 epochs (full passes over the training data); early stopping can end training sooner.
- **Batch Size**: A batch size of 64 is used, meaning that the model updates its weights after each batch of 64 samples.
- **Validation Split**: 20% of the training data (5,000 of the 25,000 reviews) is held out for validation, so performance on data the model is not trained on can be monitored after every epoch.
- **Early Stopping**: Early stopping is applied to limit overfitting. If the validation loss does not improve for 2 consecutive epochs, training stops.
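
As a rough illustration of the early-stopping rule, the sketch below replays the validation losses from the LSTM training log in this section through a patience counter. `stopping_epoch` is a hypothetical helper, a simplified model of `EarlyStopping(monitor='val_loss', patience=2)`, not the Keras code.

```python
def stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training stops, or None if it never does."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Validation losses copied from the LSTM training log.
lstm_val_losses = [0.3558, 0.3172, 0.3055, 0.3379, 0.3490]
print(stopping_epoch(lstm_val_losses))  # 5
```

Because the callback is created without `restore_best_weights=True`, the evaluated model keeps the weights from the last epoch it ran, not from the epoch with the lowest validation loss.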

### Evaluation:
After training the model, it is evaluated on the **test set** to assess its performance in terms of loss and accuracy. The evaluation metrics are:
- **Loss**: The binary cross-entropy loss, which measures how well the model predicts the sentiment (positive or negative).
- **Accuracy**: The fraction of test reviews classified correctly, reported as a value between 0 and 1.
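
To make the two metrics concrete, here is a minimal pure-Python sketch that computes them for a handful of made-up predictions. The labels and probabilities are invented for illustration, and the helper functions are hypothetical simplifications (Keras additionally clips probabilities away from 0 and 1 before taking logarithms).

```python
import math

def binary_cross_entropy(y_true, y_prob):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

# Made-up labels and sigmoid outputs, for illustration only.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.6]

print(round(binary_cross_entropy(labels, probs), 4))  # 0.5403
print(accuracy(labels, probs))                        # 0.5
```

The loss penalizes confident wrong answers more heavily than hesitant ones, which is why loss and accuracy can move in different directions between epochs.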

### Output:
The results of the evaluation (loss and accuracy) are printed out, allowing you to compare the performance of the LSTM model with other RNN models such as SimpleRNN and GRU.

```python
print(f'Test Loss: {results_LSTM[0]} - Test Accuracy: {results_LSTM[1]}')
```


In [2]:
history_LSTM = lstm_model.fit(input_train, y_train,
                    epochs=10,
                    batch_size=BATCH_SIZE,
                    validation_split=0.2,
                    callbacks=[callback])

results_LSTM = lstm_model.evaluate(input_test, y_test)
print(f'Test Loss: {results_LSTM[0]} - Test Accuracy: {results_LSTM[1]}')

Epoch 1/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 36s 109ms/step - accuracy: 0.6747 - loss: 0.5941 - val_accuracy: 0.8462 - val_loss: 0.3558
Epoch 2/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 36s 115ms/step - accuracy: 0.8864 - loss: 0.2870 - val_accuracy: 0.8726 - val_loss: 0.3172
Epoch 3/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 39s 123ms/step - accuracy: 0.9241 - loss: 0.2112 - val_accuracy: 0.8790 - val_loss: 0.3055
Epoch 4/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 36s 114ms/step - accuracy: 0.9449 - loss: 0.1550 - val_accuracy: 0.8704 - val_loss: 0.3379
Epoch 5/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 39s 123ms/step - accuracy: 0.9607 - loss: 0.1179 - val_accuracy: 0.8704 - val_loss: 0.3490
782/782 ━━━━━━━━━━━━━━━━━━━━ 20s 26ms/step - accuracy: 0.8645 - loss: 0.3683
Test Loss: 0.36148661375045776 - Test Accuracy: 0.86555999517440

## GRU Model Training and Evaluation

In this section, the **GRU (Gated Recurrent Unit)** model is trained and evaluated for sentiment analysis on the **IMDB dataset**. The GRU is a gated recurrent neural network (RNN) layer designed to mitigate the vanishing gradient problem of simple RNNs. It uses fewer gates and parameters than an LSTM of the same size, which often makes it cheaper to compute.

### Training:
- **Epochs**: The model is trained for up to 10 epochs (full passes over the training data); early stopping can end training sooner.
- **Batch Size**: A batch size of 64 is used, meaning that the model updates its weights after each batch of 64 samples.
- **Validation Split**: 20% of the training data (5,000 of the 25,000 reviews) is held out for validation, so performance on data the model is not trained on can be monitored after every epoch.
- **Early Stopping**: Early stopping is applied to limit overfitting. If the validation loss does not improve for 2 consecutive epochs, training stops early.
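
The step counts shown in the progress bars (`313/313` during training, `782/782` during evaluation) follow from these settings. The quick check below assumes Keras's default evaluation batch size of 32, since the `evaluate` calls in this notebook do not set `batch_size`.

```python
import math

TRAIN_SIZE = 25000        # reviews in the IMDB training split
TEST_SIZE = 25000         # reviews in the IMDB test split
VALIDATION_SPLIT = 0.2
BATCH_SIZE = 64
EVAL_BATCH_SIZE = 32      # assumption: Keras default for evaluate()

n_val = int(TRAIN_SIZE * VALIDATION_SPLIT)   # reviews held out for validation
n_fit = TRAIN_SIZE - n_val                   # reviews actually used for weight updates
train_steps = math.ceil(n_fit / BATCH_SIZE)
eval_steps = math.ceil(TEST_SIZE / EVAL_BATCH_SIZE)

print(n_fit, n_val)   # 20000 5000
print(train_steps)    # 313
print(eval_steps)     # 782
```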

### Evaluation:
After training, the model is evaluated on the **test set** to measure its performance in terms of loss and accuracy. The evaluation metrics are:
- **Loss**: The binary cross-entropy loss, which measures how well the model predicts sentiment (positive or negative).
- **Accuracy**: The fraction of test reviews classified correctly, reported as a value between 0 and 1.

### Output:
The results of the evaluation (loss and accuracy) are printed, allowing you to compare the performance of the GRU model with other RNN models such as SimpleRNN and LSTM.

```python
print(f'Test Loss: {results_GRU[0]} - Test Accuracy: {results_GRU[1]}')
```


In [3]:
history_GRU = gru_model.fit(input_train, y_train,
                    epochs=10,
                    batch_size=BATCH_SIZE,
                    validation_split=0.2,
                    callbacks=[callback])

results_GRU = gru_model.evaluate(input_test, y_test)
print(f'Test Loss: {results_GRU[0]} - Test Accuracy: {results_GRU[1]}')

Epoch 1/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 42s 128ms/step - accuracy: 0.6628 - loss: 0.5784 - val_accuracy: 0.8442 - val_loss: 0.3581
Epoch 2/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 46s 148ms/step - accuracy: 0.8946 - loss: 0.2744 - val_accuracy: 0.8684 - val_loss: 0.3231
782/782 ━━━━━━━━━━━━━━━━━━━━ 23s 29ms/step - accuracy: 0.8569 - loss: 0.3382
Test Loss: 0.33401742577552795 - Test Accuracy: 0.8593599796295166
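
The GRU run above stopped after epoch 2 even though its validation loss was still falling (0.3581 to 0.3231), which a fresh `patience=2` counter would not do. One possible explanation, stated here as an assumption rather than a confirmed cause, is that the single `callback` object shared by all `fit` calls kept the best validation loss from the earlier LSTM run (0.3055). Whether the real callback resets this value between runs depends on the Keras version, which the notebook does not state. The sketch below uses a simplified stand-in (`PatienceStopper` is hypothetical, not Keras code) to show the effect of carried-over state.

```python
class PatienceStopper:
    """Simplified stand-in for early stopping; `best` is NOT reset between runs."""

    def __init__(self, patience=2):
        self.patience = patience
        self.best = float("inf")

    def run(self, val_losses):
        """Return the 1-based epoch at which this run stops, or None."""
        wait = 0
        for epoch, loss in enumerate(val_losses, start=1):
            if loss < self.best:
                self.best, wait = loss, 0
            else:
                wait += 1
                if wait >= self.patience:
                    return epoch
        return None

gru_val_losses = [0.3581, 0.3231]  # copied from the GRU log

shared = PatienceStopper()
shared.best = 0.3055  # best val_loss reached in the earlier LSTM run
print(shared.run(gru_val_losses))             # 2

print(PatienceStopper().run(gru_val_losses))  # None
```

If this is the cause, passing a new `tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=2)` to each `fit` call would give every model an independent counter and make the comparison fairer.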


## LSTM Model with Attention

In this section, we extend the basic **LSTM (Long Short-Term Memory)** model by incorporating an **Attention Mechanism** to improve the model's ability to focus on important parts of the input sequence for sentiment analysis on the **IMDB dataset**.

### Model Architecture:
1. **Input Layer**: The input layer accepts sequences of integer-encoded words, each sequence having a maximum length of 500 words (`MAXLEN`).
2. **Embedding Layer**: The embedding layer is used to convert the integer-encoded words into dense vectors of fixed size (32 dimensions). The vocabulary size is set to 10,000 (`VOCAB_SIZE`).
3. **LSTM Layer**: An **LSTM** layer is used to process the input sequence and capture long-term dependencies in the data. The number of units in the LSTM is set to 32 (`rnn_units`), and it outputs the full sequence (due to `return_sequences=True`).
4. **Attention Mechanism**: The attention layer is applied to the output of the LSTM layer, which serves as both query and value (self-attention). For each position, it scores every position in the sequence with a dot product, normalizes the scores with a softmax, and returns the resulting weighted sum of the LSTM outputs. This lets the model give more weight to the parts of the review that matter most for the prediction.
5. **Flatten Layer**: The attention output, of shape (500, 32) per review, is flattened into a single 16,000-element vector.
6. **Dense Layer**: The final dense layer with a sigmoid activation function produces a binary output (sentiment: positive or negative).
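
To make the attention step in item 4 concrete, here is a minimal pure-Python sketch of dot-product self-attention on a toy sequence: scores between positions, a softmax over each row, then a weighted sum of the value vectors. It assumes the layer's default settings (unscaled dot-product scores, no mask); `self_attention` is a hypothetical helper, not the Keras implementation.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(seq):
    """Dot-product self-attention where query = key = value = seq.

    seq is a list of T vectors (lists of floats) of equal length.
    Returns (outputs, weights): T output vectors and the T x T weight matrix.
    """
    dim = len(seq[0])
    outputs, weights = [], []
    for q in seq:
        # Score the current position against every position in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) for k in seq]
        w = softmax(scores)
        # Weighted sum of the value vectors.
        out = [sum(w[t] * seq[t][d] for t in range(len(seq))) for d in range(dim)]
        weights.append(w)
        outputs.append(out)
    return outputs, weights

# Toy "LSTM output": 3 time steps, 2 features each.
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(toy)

print(len(outputs), len(outputs[0]))        # 3 2
print([round(sum(w), 6) for w in weights])  # [1.0, 1.0, 1.0]
```

The output has the same shape as the input, one vector per time step, which is why the model flattens it before the final dense layer.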

### Training Configuration:
- **Optimizer**: Adam, with the default settings selected by `optimizer='adam'`.
- **Loss Function**: Binary cross-entropy loss, appropriate for binary classification problems like sentiment analysis.
- **Metrics**: Accuracy is used as the evaluation metric to measure the percentage of correct predictions.

### Model Summary:
The model summary is printed to show the architecture of the model, including the number of parameters at each layer and the overall model.
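
The parameter counts that `summary()` reports can be estimated by hand. The arithmetic below applies the standard formulas for Embedding, LSTM, and Dense layers to the sizes used in this notebook; it is a back-of-the-envelope check, not output captured from the notebook.

```python
VOCAB_SIZE = 10000
MAXLEN = 500
EMBED_DIM = 32   # build_model2 reuses rnn_units as the embedding size
UNITS = 32

# Embedding: one EMBED_DIM-sized vector per vocabulary entry.
embedding_params = VOCAB_SIZE * EMBED_DIM

# LSTM: 4 gates, each with input weights, recurrent weights, and a bias.
lstm_params = 4 * (UNITS * (EMBED_DIM + UNITS) + UNITS)

# Attention with default settings has no trainable weights.
attention_params = 0

# Dense(1) on the flattened (MAXLEN * UNITS) vector, plus one bias.
dense_params = MAXLEN * UNITS + 1

total = embedding_params + lstm_params + attention_params + dense_params
print(embedding_params, lstm_params, dense_params, total)
# 320000 8320 16001 344321
```

Most of the parameters sit in the embedding table; the flattened attention output makes the final dense layer the second-largest component.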

### Next Steps:
This model can be trained on the IMDB dataset (or other sequence data) by fitting it to the training data and evaluating it on the test data. The attention mechanism may improve performance, particularly on long sequences where some parts of the input matter more than others.

```python
lstm_model_with_attention.summary()
```


In [4]:
import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, Flatten
from tensorflow.keras.models import Model

VOCAB_SIZE = 10000
MAXLEN = 500

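def build_model2(rnn_cell, rnn_units=32):
    """Build an Embedding -> recurrent layer -> self-attention -> Dense classifier."""
    sequence_input = Input(shape=(MAXLEN,), dtype='int32')
    # Note: the embedding dimension reuses rnn_units (32 by default).
    embedded_sequences = Embedding(VOCAB_SIZE, rnn_units)(sequence_input)
    # return_sequences=True keeps the output at every time step: (batch, MAXLEN, rnn_units).
    rnn_output = rnn_cell(rnn_units, return_sequences=True)(embedded_sequences)

    # Self-attention: the recurrent output is used as both query and value.
    query_value_attention_seq = tf.keras.layers.Attention()([rnn_output, rnn_output])

    # Flatten (MAXLEN, rnn_units) into one vector per review for the classifier.
    attention_flatten = Flatten()(query_value_attention_seq)

    dense_output = Dense(1, activation='sigmoid')(attention_flatten)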

    model = Model(inputs=sequence_input, outputs=dense_output)

    model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])

    return model

# LSTM Model with Attention
lstm_model_with_attention = build_model2(tf.keras.layers.LSTM)
lstm_model_with_attention.summary()


## Training and Evaluation of LSTM Model with Attention

In this section, the **LSTM model with Attention** is trained on the **IMDB dataset** to perform sentiment analysis and evaluate its performance on a test set.

### Steps:
1. **Model Training**:
   The model is trained with the `fit` method for up to 10 epochs with a batch size of 64. A validation split of 20% sets aside 20% of the training data for validation during training. The **EarlyStopping callback** stops training if the validation loss does not improve for 2 consecutive epochs, which limits overfitting.

   ```python
   history_ = lstm_model_with_attention.fit(input_train, y_train,
                       epochs=10,
                       batch_size=BATCH_SIZE,
                       validation_split=0.2,
                       callbacks=[callback])
   ```


In [5]:
history_ = lstm_model_with_attention.fit(input_train, y_train,
                    epochs=10,
                    batch_size=BATCH_SIZE,
                    validation_split=0.2,
                    callbacks=[callback])

results_ = lstm_model_with_attention.evaluate(input_test, y_test)
print(f'Test Loss: {results_[0]} - Test Accuracy: {results_[1]}')

Epoch 1/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 63s 195ms/step - accuracy: 0.6152 - loss: 0.6075 - val_accuracy: 0.8664 - val_loss: 0.3283
Epoch 2/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 60s 190ms/step - accuracy: 0.9004 - loss: 0.2542 - val_accuracy: 0.8912 - val_loss: 0.2830
Epoch 3/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 60s 193ms/step - accuracy: 0.9355 - loss: 0.1692 - val_accuracy: 0.8896 - val_loss: 0.3176
Epoch 4/10
313/313 ━━━━━━━━━━━━━━━━━━━━ 62s 197ms/step - accuracy: 0.9534 - loss: 0.1196 - val_accuracy: 0.8890 - val_loss: 0.3553
782/782 ━━━━━━━━━━━━━━━━━━━━ 28s 35ms/step - accuracy: 0.8812 - loss: 0.3617
Test Loss: 0.36264434456825256 - Test Accuracy: 0.8790000081062317
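
The test metrics printed in the logs above can be gathered for a side-by-side comparison. SimpleRNN is left out because its evaluation output is cut off in this document. The values below are rounded copies of the printed results.

```python
# (test_loss, test_accuracy) copied from the evaluation logs, rounded to 4 places.
results = {
    "LSTM": (0.3615, 0.8656),
    "GRU": (0.3340, 0.8594),
    "LSTM + Attention": (0.3626, 0.8790),
}

# Print the models from highest to lowest test accuracy.
for name, (loss, acc) in sorted(results.items(), key=lambda kv: kv[1][1], reverse=True):
    print(f"{name:<18} loss={loss:.4f}  accuracy={acc:.4f}")

best = max(results, key=lambda name: results[name][1])
print("Highest test accuracy:", best)  # LSTM + Attention
```

These numbers come from a single run per model, and the models stopped after different numbers of epochs, so small differences should be read with caution.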
