# IMDb Sentiment Analysis with Bidirectional LSTM

This notebook builds a sentiment classifier for IMDb movie reviews using TensorFlow/Keras. We walk through loading the dataset, preparing padded sequences, training a bidirectional LSTM, and evaluating performance.

**Pipeline overview**

1. Setup and imports  
2. Load and inspect the IMDb dataset  
3. Preprocess sequences with padding and exploratory checks  
4. Build the bidirectional LSTM model  
5. Train with early stopping  
6. Evaluate accuracy and error patterns, then outline next steps

Dataset: TensorFlow/Keras IMDb reviews (25k train / 25k test), with each word encoded as an integer ID assigned by frequency rank.


## 1. Setup and Imports

Configure core libraries, reproducibility seeds, and training hyperparameters used throughout the workflow.


In [None]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

from tensorflow.keras import layers, models, callbacks
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.datasets import imdb
from sklearn.metrics import classification_report, confusion_matrix

# Reproducibility
SEED = 42
np.random.seed(SEED)
tf.random.set_seed(SEED)

# Hyperparameters
NUM_WORDS = 10000        # keep the top-N most frequent tokens
MAX_LEN = 200            # pad / truncate each review to this length
EMBEDDING_DIM = 64
LSTM_UNITS = 64
BATCH_SIZE = 128
EPOCHS = 12
VALIDATION_SPLIT = 0.2

print(f"TensorFlow version: {tf.__version__}")
print(f"Working directory: {os.getcwd()}")


## 2. Load and Inspect the IMDb Dataset

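Download the IMDb movie-review dataset, limited to the `NUM_WORDS` most frequent tokens. Each review is a sequence of integer token IDs in reading order; IDs are assigned by word frequency, so lower IDs correspond to more common words. With `index_from=3`, IDs 0 to 3 are reserved for special tokens.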


In [None]:
(x_train_raw, y_train), (x_test_raw, y_test) = imdb.load_data(num_words=NUM_WORDS, index_from=3)

print(f"Training samples: {len(x_train_raw):,}")
print(f"Test samples: {len(x_test_raw):,}")

# Reverse dictionary for interpretability
word_index = imdb.get_word_index()
word_index = {k: (v + 3) for k, v in word_index.items()}
word_index['<PAD>'] = 0
word_index['<START>'] = 1
word_index['<UNK>'] = 2
word_index['<UNUSED>'] = 3
reverse_word_index = {value: key for key, value in word_index.items()}

def decode_review(sequence):
    return ' '.join(reverse_word_index.get(i, '?') for i in sequence)

label_map = {0: 'negative', 1: 'positive'}

print("Example decoded review:")
print(decode_review(x_train_raw[0])[:500], "...")
print(f"Label: {label_map[y_train[0]]}")

class_distribution = pd.Series(y_train).value_counts().sort_index()
print("\nClass distribution (train):")
print(class_distribution.rename(index=label_map))
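
The encoding convention used above can be illustrated on a toy vocabulary. The sketch below is not the Keras implementation: `toy_rank` and `encode` are hypothetical names, and the ranks are invented. It only shows how a frequency rank is shifted by `index_from` and how any ID at or above `num_words` falls back to the unknown token.

```python
# Illustrative sketch of the IMDb encoding convention (toy vocabulary, not the real one).
PAD, START, UNK = 0, 1, 2   # reserved IDs; ID 3 is left unused
INDEX_FROM = 3              # offset added to every frequency rank

# Hypothetical frequency ranks: 1 = most frequent word.
toy_rank = {"the": 1, "movie": 2, "was": 3, "great": 4, "sublime": 5}

def encode(words, num_words):
    """Map words to IDs; unknown words and IDs >= num_words become UNK."""
    ids = [START]
    for word in words:
        rank = toy_rank.get(word)
        token_id = rank + INDEX_FROM if rank is not None else UNK
        ids.append(token_id if token_id < num_words else UNK)
    return ids

encoded = encode(["the", "movie", "was", "sublime"], num_words=8)
print(encoded)  # [1, 4, 5, 6, 2]
```

Here "sublime" would receive ID 8, which is not below `num_words=8`, so it is replaced by `UNK`. The same rule is why the largest ID in the real data is `NUM_WORDS - 1`.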


## 3. Preprocess: Padding and Exploratory Checks

Pad or truncate each review to `MAX_LEN` tokens (both applied at the end of the sequence) to obtain fixed-length tensors, and inspect raw-length statistics to justify the choice of `MAX_LEN`.


In [None]:
train_lengths = [len(seq) for seq in x_train_raw]
test_lengths = [len(seq) for seq in x_test_raw]

x_train = pad_sequences(
    x_train_raw,
    maxlen=MAX_LEN,
    padding='post',
    truncating='post'
)
x_test = pad_sequences(
    x_test_raw,
    maxlen=MAX_LEN,
    padding='post',
    truncating='post'
)

print(f"x_train shape: {x_train.shape}")
print(f"x_test shape: {x_test.shape}")
print(f"Average review length (train): {np.mean(train_lengths):.1f} tokens")
print(f"95th percentile length: {np.percentile(train_lengths, 95):.0f} tokens")

fig, ax = plt.subplots(figsize=(8, 4))
ax.hist(train_lengths, bins=40, color='steelblue', alpha=0.7, label='train')
ax.axvline(MAX_LEN, color='red', linestyle='--', label=f'MAX_LEN={MAX_LEN}')
ax.set_xlabel('Review length (tokens)')
ax.set_ylabel('Count')
ax.set_title('Distribution of raw review lengths')
ax.legend()
plt.tight_layout()
plt.show()
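
To make the effect of `padding='post'` and `truncating='post'` concrete, here is a minimal pure-Python sketch on made-up sequences. `pad_post` and `coverage` are hypothetical helpers written for illustration; they are not part of Keras, and `pad_sequences` remains the function actually used in this notebook.

```python
def pad_post(seq, maxlen, pad_value=0):
    """Keep the first maxlen tokens, then right-pad with pad_value."""
    clipped = list(seq[:maxlen])
    return clipped + [pad_value] * (maxlen - len(clipped))

def coverage(lengths, maxlen):
    """Fraction of sequences that fit within maxlen without truncation."""
    return sum(n <= maxlen for n in lengths) / len(lengths)

# Made-up token sequences of lengths 3, 6 and 2.
toy_reviews = [[1, 14, 22], [1, 5, 9, 30, 7, 8], [1, 4]]

padded = [pad_post(r, maxlen=4) for r in toy_reviews]
print(padded)  # [[1, 14, 22, 0], [1, 5, 9, 30], [1, 4, 0, 0]]

toy_coverage = coverage([len(r) for r in toy_reviews], maxlen=4)
print(f"{toy_coverage:.2f}")  # 0.67
```

Applying `coverage(train_lengths, MAX_LEN)` to the real lengths would give the share of training reviews that are kept in full, which is one way to judge whether `MAX_LEN` is large enough.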


## 4. Build the Bidirectional LSTM Model

Stack an embedding layer with a bidirectional LSTM that reads each review in both directions, apply global max pooling over the time steps, and classify with dense layers.


In [None]:
model = models.Sequential([
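    # Declare the input shape explicitly; Embedding's `input_length` argument is deprecated in recent Keras.
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(input_dim=NUM_WORDS, output_dim=EMBEDDING_DIM),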
    layers.Bidirectional(layers.LSTM(LSTM_UNITS, return_sequences=True)),
    layers.Dropout(0.3),
    layers.GlobalMaxPool1D(),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(1, activation='sigmoid')
])

model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

model.summary()
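
As a sanity check on `model.summary()`, the trainable parameter counts can be worked out by hand. The sketch below assumes the hyperparameters defined in the setup cell and the default Keras layer settings (biases enabled, a single bias vector per LSTM gate); the helper functions are hypothetical and written only for this arithmetic.

```python
def embedding_params(vocab_size, dim):
    """One dim-sized vector per vocabulary entry."""
    return vocab_size * dim

def lstm_params(input_dim, units):
    """Four gates, each with an input kernel, a recurrent kernel and a bias."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """Weight matrix plus one bias per unit."""
    return input_dim * units + units

# Values copied from the setup cell.
NUM_WORDS, EMBEDDING_DIM, LSTM_UNITS = 10000, 64, 64

counts = {
    "embedding": embedding_params(NUM_WORDS, EMBEDDING_DIM),
    # Forward and backward LSTMs have separate weights.
    "bidirectional_lstm": 2 * lstm_params(EMBEDDING_DIM, LSTM_UNITS),
    # The pooled Bi-LSTM output has 2 * LSTM_UNITS features.
    "dense_64": dense_params(2 * LSTM_UNITS, 64),
    "dense_1": dense_params(64, 1),
}
total_params = sum(counts.values())

for name, n in counts.items():
    print(f"{name:>20}: {n:,}")
print(f"{'total':>20}: {total_params:,}")  # 714,369
```

The dropout and pooling layers contribute no parameters. Most of the budget sits in the embedding table, so changing `NUM_WORDS` or `EMBEDDING_DIM` affects model size far more than changing `LSTM_UNITS`.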


## 5. Train the Model

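Fit the network with early stopping on validation accuracy to reduce overfitting: training halts once validation accuracy has failed to improve for two consecutive epochs, and the weights from the best epoch are restored.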


In [None]:
early_stop = callbacks.EarlyStopping(
    monitor='val_accuracy',
    patience=2,
    restore_best_weights=True
)

history = model.fit(
    x_train,
    y_train,
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    validation_split=VALIDATION_SPLIT,
    callbacks=[early_stop],
    verbose=1
)
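
The stopping rule configured above can be traced by hand. The following is a simplified sketch of patience-based early stopping for a metric that should increase. It is not the Keras source, `simulate_early_stopping` is a hypothetical helper, and the validation accuracies are invented for illustration.

```python
def simulate_early_stopping(val_scores, patience):
    """Return (stop_epoch, best_epoch), both 0-indexed, for a metric to maximise."""
    best_score = float("-inf")
    best_epoch = -1
    wait = 0  # epochs since the last improvement
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            best_score, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Patience was never exhausted: training ran to the last epoch.
    return len(val_scores) - 1, best_epoch

toy_val_accuracy = [0.80, 0.85, 0.84, 0.83, 0.86]  # made-up values
stop_epoch, best_epoch = simulate_early_stopping(toy_val_accuracy, patience=2)
print(stop_epoch, best_epoch)  # 3 1
```

In this toy run training stops after epoch 3 and would restore the weights from epoch 1, so the later improvement at epoch 4 is never seen. That is the trade-off of a small `patience`: less wasted computation, at the risk of stopping before a late recovery.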


## 6. Evaluate and Interpret Results

Assess generalisation on the test set, inspect detailed classification metrics, and visualise the confusion matrix. Plot learning curves to understand optimisation behaviour.


In [None]:
test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
print(f"Test loss: {test_loss:.4f}")
print(f"Test accuracy: {test_accuracy:.4f}")

y_pred_prob = model.predict(x_test, verbose=0)
y_pred = (y_pred_prob >= 0.5).astype('int32').ravel()

print("\nClassification report:")
print(classification_report(y_test, y_pred, target_names=['negative', 'positive']))

cm = confusion_matrix(y_test, y_pred)
fig, ax = plt.subplots(figsize=(4, 4))
im = ax.imshow(cm, cmap='Blues')
ax.set_xticks([0, 1])
ax.set_yticks([0, 1])
ax.set_xticklabels(['pred neg', 'pred pos'])
ax.set_yticklabels(['true neg', 'true pos'])
ax.set_title('Confusion matrix')
for i in range(2):
    for j in range(2):
        ax.text(j, i, cm[i, j], ha='center', va='center', color='black')
plt.colorbar(im, ax=ax, fraction=0.046, pad=0.04)
plt.tight_layout()
plt.show()
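
The figures in the classification report follow directly from the confusion matrix. As a minimal sketch, the helper below (hypothetical, named `binary_metrics`) derives them for the positive class from a 2x2 matrix laid out as scikit-learn returns it, with rows as true labels and columns as predictions. The counts are invented and are not results from this model.

```python
def binary_metrics(cm):
    """Compute accuracy, precision, recall and F1 for the positive class.

    cm is [[tn, fp], [fn, tp]]: rows are true labels, columns are predictions.
    """
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tn + tp) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

toy_cm = [[40, 10], [5, 45]]  # made-up counts, not model output
metrics = binary_metrics(toy_cm)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Passing `cm.tolist()` from the cell above should reproduce the positive-class row of the classification report, which helps when checking whether errors are dominated by false positives or false negatives.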


In [None]:
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

if history.history.get('accuracy'):
    axes[0].plot(history.history['accuracy'], label='train')
    axes[0].plot(history.history.get('val_accuracy', []), label='val')
    axes[0].set_title('Accuracy')
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('Accuracy')
    axes[0].legend()
    axes[0].grid(True)
else:
    axes[0].text(0.5, 0.5, 'Accuracy not tracked', ha='center')
    axes[0].set_axis_off()

if history.history.get('loss'):
    axes[1].plot(history.history['loss'], label='train')
    axes[1].plot(history.history.get('val_loss', []), label='val')
    axes[1].set_title('Loss')
    axes[1].set_xlabel('Epoch')
    axes[1].set_ylabel('Loss')
    axes[1].legend()
    axes[1].grid(True)
else:
    axes[1].text(0.5, 0.5, 'Loss not tracked', ha='center')
    axes[1].set_axis_off()

plt.tight_layout()
plt.show()


In [None]:
np.random.seed(SEED)
sample_indices = np.random.choice(len(x_test), size=3, replace=False)

for idx in sample_indices:
    # y_pred_prob has shape (n_samples, 1); index the single column to get a scalar
    prob = float(y_pred_prob[idx, 0])
    predicted = 'positive' if prob >= 0.5 else 'negative'
    actual = label_map[int(y_test[idx])]
    print(f"Review #{idx}")
    print(f"True: {actual} | Predicted: {predicted} ({prob:.2f})")
    print(decode_review(x_test_raw[idx])[:400], '...')
    print('-' * 80)


## Next Steps

- Explore alternative model architectures (stacked LSTMs, GRUs, or convolutional front-ends) and compare validation metrics.
- Increase `NUM_WORDS` or `MAX_LEN` and observe the trade-off between vocabulary coverage and training cost.
- Add regularisation such as recurrent dropout or L2 penalties, or incorporate pre-trained word embeddings (e.g., GloVe) for richer representations.
