# Day 17 - LSTM and GRU

## Overview

In this notebook, we’ll build and compare LSTM and GRU architectures on the IMDB sentiment analysis dataset.
Both are designed to handle long-term dependencies better than simple RNNs.
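
To make the long-term dependency claim concrete, here is a toy scalar sketch rather than a real LSTM or GRU. It compares how much gradient survives 50 time steps in a plain `tanh` recurrence versus an additive, gated cell state. The function names, the weight `0.9`, and the constant forget gate `0.99` are illustrative assumptions; in a trained model the gates are learned and depend on the input.

```python
import math

def rnn_gradient_factor(w, steps, h0=0.5):
    """d h_T / d h_0 for the scalar recurrence h_t = tanh(w * h_{t-1})."""
    h, grad = h0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h)
        grad *= w * (1.0 - h * h)  # chain rule: derivative of tanh(w * h_prev)
    return grad

def gated_gradient_factor(forget_gate, steps):
    """d c_T / d c_0 for c_t = f * c_{t-1} + (input term), with f held constant."""
    return forget_gate ** steps

rnn_grad = rnn_gradient_factor(w=0.9, steps=50)
gated_grad = gated_gradient_factor(forget_gate=0.99, steps=50)
print(f"simple RNN gradient factor: {rnn_grad:.6f}")
print(f"gated cell gradient factor: {gated_grad:.6f}")
```

The plain recurrence multiplies the gradient by a factor below 1 at every step, so it shrinks geometrically. A forget gate close to 1 lets the signal pass almost unchanged.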

## Import Libraries

In [1]:
import tensorflow as tf
from tensorflow.keras import layers, models, datasets, preprocessing, callbacks #type: ignore
import matplotlib.pyplot as plt

tf.random.set_seed(42)

## Load and Preprocess Data

In [2]:
vocab_size = 10000
max_len = 200

(x_train, y_train), (x_test, y_test) = datasets.imdb.load_data(num_words=vocab_size)
x_train = preprocessing.sequence.pad_sequences(x_train, maxlen=max_len)
x_test = preprocessing.sequence.pad_sequences(x_test, maxlen=max_len)

print(f"Training samples: {len(x_train)}, Test samples: {len(x_test)}")

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
17464789/17464789 ━━━━━━━━━━━━━━━━━━━━ 2s 0us/step
Training samples: 25000, Test samples: 25000
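
With its default settings (`padding='pre'`, `truncating='pre'`, `value=0`), `pad_sequences` left-pads short reviews with zeros and keeps only the last `max_len` tokens of long ones. The pure-Python sketch below mimics that behavior for a single sequence. `pad_pre` is a hypothetical helper written for illustration, not part of Keras, and it assumes `maxlen > 0`.

```python
def pad_pre(seq, maxlen, value=0):
    """Mimic pad_sequences defaults for one sequence (illustrative helper)."""
    seq = list(seq)[-maxlen:]                    # truncating='pre': drop tokens from the front
    return [value] * (maxlen - len(seq)) + seq   # padding='pre': zeros go on the left

short_review = [11, 22, 33]
long_review = [1, 2, 3, 4, 5, 6]

print(pad_pre(short_review, 5))  # [0, 0, 11, 22, 33]
print(pad_pre(long_review, 4))   # [3, 4, 5, 6]
```

Pre-padding puts the real tokens at the end of the sequence, next to the final hidden state that the models below feed into the classifier.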


## Build LSTM and GRU Models

In [4]:
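def build_lstm():
    # Embedding -> single LSTM layer -> sigmoid output for binary sentiment.
    model = models.Sequential([
        layers.Input(shape=(max_len,)),  # replaces the deprecated input_length argument
        layers.Embedding(vocab_size, 128),
        layers.LSTM(128, return_sequences=False),  # keep only the final hidden state
        layers.Dense(1, activation='sigmoid')
    ])
    return model

def build_gru():
    # Same architecture with the LSTM swapped for a GRU of equal width.
    model = models.Sequential([
        layers.Input(shape=(max_len,)),  # replaces the deprecated input_length argument
        layers.Embedding(vocab_size, 128),
        layers.GRU(128, return_sequences=False),  # keep only the final hidden state
        layers.Dense(1, activation='sigmoid')
    ])
    return model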

lstm_model = build_lstm()
gru_model = build_gru()

lstm_model.summary()
gru_model.summary()
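
The parameter counts that `summary()` reports can be reproduced by hand. The sketch below uses the standard formulas for each layer type; the helper names are hypothetical. The GRU formula assumes the Keras default `reset_after=True`, which keeps two bias vectors per gate block.

```python
def embedding_params(vocab_size, dim):
    # One dim-sized vector per vocabulary entry.
    return vocab_size * dim

def lstm_params(input_dim, units):
    # 4 blocks (input, forget, cell, output), each with an input kernel,
    # a recurrent kernel, and one bias vector.
    return 4 * (units * (input_dim + units) + units)

def gru_params(input_dim, units):
    # 3 blocks (update, reset, candidate); assumes reset_after=True,
    # i.e. separate input and recurrent biases.
    return 3 * (units * (input_dim + units) + 2 * units)

def dense_params(input_dim, units):
    return input_dim * units + units

print("Embedding:", embedding_params(10000, 128))  # 1280000
print("LSTM:     ", lstm_params(128, 128))          # 131584
print("GRU:      ", gru_params(128, 128))           # 99072
print("Dense:    ", dense_params(128, 1))           # 129
```

At equal width the GRU layer has roughly a quarter fewer recurrent parameters than the LSTM. In both models the embedding table accounts for most of the parameters.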



## Compile and Train

In [5]:
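def train_model(model, name):
    print(f"\nTraining {name} model...")
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    # Stop once val_accuracy has not improved for 3 epochs and roll back to the best weights.
    cb = [callbacks.EarlyStopping(monitor='val_accuracy', patience=3, restore_best_weights=True)]
    # Note: the test set doubles as validation data here, so early stopping is tuned on it
    # and the reported test accuracy is optimistic. For an unbiased estimate, hold out part
    # of the training set instead (e.g. validation_split=0.2).
    history = model.fit(x_train, y_train,
                        validation_data=(x_test, y_test),
                        epochs=10, batch_size=128, callbacks=cb)
    loss, acc = model.evaluate(x_test, y_test)
    print(f"{name} Test Accuracy: {acc*100:.2f}%\n")
    return history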

hist_lstm = train_model(lstm_model, 'LSTM')
hist_gru = train_model(gru_model, 'GRU')



Training LSTM model...
Epoch 1/10
196/196 ━━━━━━━━━━━━━━━━━━━━ 124s 615ms/step - accuracy: 0.7824 - loss: 0.4418 - val_accuracy: 0.8396 - val_loss: 0.3635
Epoch 2/10
196/196 ━━━━━━━━━━━━━━━━━━━━ 120s 614ms/step - accuracy: 0.8956 - loss: 0.2615 - val_accuracy: 0.8678 - val_loss: 0.3318
Epoch 3/10
196/196 ━━━━━━━━━━━━━━━━━━━━ 127s 651ms/step - accuracy: 0.9149 - loss: 0.2199 - val_accuracy: 0.8573 - val_loss: 0.4055
Epoch 4/10
196/196 ━━━━━━━━━━━━━━━━━━━━ 121s 617ms/step - accuracy: 0.9319 - loss: 0.1764 - val_accuracy: 0.8629 - val_loss: 0.3645
Epoch 5/10
196/196 ━━━━━━━━━━━━━━━━━━━━ 113s 577ms/step - accuracy: 0.9246 - loss: 0.1994 - val_accuracy: 0.8370 - val_loss: 0.4559
782/782 ━━━━━━━━━━━━━━━━━━━━ 53s 67ms/step - accuracy: 0.8678 - loss: 0.3318
LSTM Test Accuracy: 86.78%


Training GRU model...
[remaining output truncated]
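
The LSTM log stops after epoch 5 of 10, and the final evaluation matches the epoch 2 validation numbers. Both follow from `EarlyStopping(patience=3, restore_best_weights=True)`. The sketch below replays that logic on the logged `val_accuracy` values. `simulate_early_stopping` is a hypothetical helper, a simplified model of the callback (no `min_delta`, no baseline), not Keras code.

```python
def simulate_early_stopping(val_scores, patience):
    """Return (best_epoch, stopped_epoch), both 1-based, for a higher-is-better metric."""
    best, best_epoch, wait = float("-inf"), 0, 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best:
            best, best_epoch, wait = score, epoch, 0  # improvement resets the counter
        else:
            wait += 1
            if wait >= patience:                      # patience exhausted: stop training
                return best_epoch, epoch
    return best_epoch, len(val_scores)

# val_accuracy values copied from the LSTM training log above.
lstm_val_acc = [0.8396, 0.8678, 0.8573, 0.8629, 0.8370]
best_epoch, stopped_epoch = simulate_early_stopping(lstm_val_acc, patience=3)
print(f"best epoch: {best_epoch}, stopped after epoch: {stopped_epoch}")
```

Epochs 3 to 5 never beat the 0.8678 reached at epoch 2, so training halts after the third miss. The restored epoch 2 weights produce the 86.78% reported by `model.evaluate`.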

## Compare Model Performance

In [None]:
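plt.figure(figsize=(10,4))
# One point per completed epoch; the curves can differ in length because of early stopping.
plt.plot(hist_lstm.history['val_accuracy'], label='LSTM Val Acc')
plt.plot(hist_gru.history['val_accuracy'], label='GRU Val Acc')
plt.title('Validation Accuracy Comparison')
plt.xlabel('Epoch')
plt.ylabel('Validation accuracy')
plt.legend()
plt.show()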

## Observations

- LSTM reached 86.78% test accuracy in the run logged above. Validation accuracy peaked at epoch 2 while training accuracy kept climbing past 93%, a sign of overfitting.
- GRU reached ~87–88% accuracy and trained faster, since it has fewer parameters than an LSTM of the same width.
- Both architectures outperform the simple RNN (~84%) by a few percentage points.
- Next step → move to Day 18: NLP & Word Embeddings to explore how embeddings power text models.