# Bidirectional RNN for Text Classification

## Overview
This notebook implements Bidirectional RNN models for text classification tasks using three different datasets:
1. **IMDb Dataset** - Movie review sentiment analysis
2. **ReviewTokoBaju.csv** - Clothing review sentiment analysis
3. **DeteksiSarkasme.json** - Sarcasm detection

## Objectives
- Build BiRNN models using TensorFlow/Keras
- Achieve minimum 90% accuracy on both training and testing sets
- Implement comprehensive evaluation metrics (Accuracy, Precision, Recall, F1-Score, AUC, ROC)
- Perform hyperparameter tuning
- Visualize training metrics and confusion matrices

## Hardware Recommendation
This notebook is optimized for Google Colab with T4 GPU or TPU for faster training.

In [None]:
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import json
import re
import warnings
warnings.filterwarnings('ignore')

# TensorFlow and Keras
import tensorflow as tf
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Embedding, LSTM, GRU, Bidirectional, Dense, Dropout, GlobalMaxPooling1D, Input
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.utils import to_categorical

# Scikit-learn
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix, classification_report, roc_curve
from sklearn.preprocessing import LabelEncoder

# NLTK for text preprocessing
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

# Download required NLTK data
nltk.download('punkt', quiet=True)
nltk.download('stopwords', quiet=True)
nltk.download('wordnet', quiet=True)

print("All libraries imported successfully!")
print(f"TensorFlow version: {tf.__version__}")
print(f"GPU Available: {tf.config.list_physical_devices('GPU')}")

In [None]:
# Text preprocessing functions
def clean_text(text):
    """Clean and preprocess text data"""
    if pd.isna(text):
        return ""
    
    # Convert to lowercase
    text = str(text).lower()
    
    # Remove HTML tags
    text = re.sub(r'<.*?>', '', text)
    
    # Remove special characters and numbers
    text = re.sub(r'[^a-zA-Z\s]', '', text)
    
    # Remove extra whitespace
    text = re.sub(r'\s+', ' ', text).strip()
    
    return text

def preprocess_text(text, remove_stopwords=True, lemmatize=True):
    """Advanced text preprocessing with stopword removal and lemmatization"""
    text = clean_text(text)
    
    # Tokenization
    tokens = word_tokenize(text)
    
    # Remove stopwords
    if remove_stopwords:
        stop_words = set(stopwords.words('english'))
        tokens = [token for token in tokens if token not in stop_words]
    
    # Lemmatization
    if lemmatize:
        lemmatizer = WordNetLemmatizer()
        tokens = [lemmatizer.lemmatize(token) for token in tokens]
    
    return ' '.join(tokens)

# Evaluation functions
def plot_training_history(history, title="Model Training History"):
    """Plot training and validation accuracy and loss"""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
    
    # Accuracy plot
    ax1.plot(history.history['accuracy'], label='Training Accuracy', marker='o')
    ax1.plot(history.history['val_accuracy'], label='Validation Accuracy', marker='s')
    ax1.set_title(f'{title} - Accuracy')
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Accuracy')
    ax1.legend()
    ax1.grid(True)
    
    # Loss plot
    ax2.plot(history.history['loss'], label='Training Loss', marker='o')
    ax2.plot(history.history['val_loss'], label='Validation Loss', marker='s')
    ax2.set_title(f'{title} - Loss')
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('Loss')
    ax2.legend()
    ax2.grid(True)
    
    plt.tight_layout()
    plt.show()

def evaluate_model(model, X_test, y_test, y_pred_proba, dataset_name):
    """Comprehensive model evaluation"""
    # Predictions
    y_pred = (y_pred_proba > 0.5).astype(int)
    
    # Metrics
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred, average='weighted')
    recall = recall_score(y_test, y_pred, average='weighted')
    f1 = f1_score(y_test, y_pred, average='weighted')
    
    # AUC-ROC (handle binary and multiclass)
    if len(np.unique(y_test)) == 2:
        auc = roc_auc_score(y_test, y_pred_proba)
    else:
        auc = roc_auc_score(y_test, y_pred_proba, multi_class='ovr', average='weighted')
    
    print(f"\n=== {dataset_name} Model Evaluation ===")
    print(f"Accuracy: {accuracy:.4f}")
    print(f"Precision: {precision:.4f}")
    print(f"Recall: {recall:.4f}")
    print(f"F1-Score: {f1:.4f}")
    print(f"AUC-ROC: {auc:.4f}")
    
    # Confusion Matrix
    cm = confusion_matrix(y_test, y_pred)
    plt.figure(figsize=(8, 6))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
    plt.title(f'{dataset_name} - Confusion Matrix')
    plt.ylabel('True Label')
    plt.xlabel('Predicted Label')
    plt.show()
    
    # Classification Report
    print(f"\nClassification Report:\n{classification_report(y_test, y_pred)}")
    
    return {
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1_score': f1,
        'auc_roc': auc
    }

def plot_roc_curve(y_test, y_pred_proba, dataset_name):
    """Plot ROC curve for binary classification"""
    if len(np.unique(y_test)) == 2:
        fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
        auc = roc_auc_score(y_test, y_pred_proba)
        
        plt.figure(figsize=(8, 6))
        plt.plot(fpr, tpr, label=f'ROC Curve (AUC = {auc:.4f})')
        plt.plot([0, 1], [0, 1], 'k--', label='Random Classifier')
        plt.xlabel('False Positive Rate')
        plt.ylabel('True Positive Rate')
        plt.title(f'{dataset_name} - ROC Curve')
        plt.legend()
        plt.grid(True)
        plt.show()

print("Preprocessing and evaluation functions defined successfully!")

## Dataset 1: IMDb Movie Reviews

IMDb dataset contains movie reviews with binary sentiment labels (positive/negative). This is a classic sentiment analysis task perfect for BiRNN implementation.

In [None]:
# Load IMDb dataset
print("Loading IMDb dataset...")
(X_train_imdb, y_train_imdb), (X_test_imdb, y_test_imdb) = tf.keras.datasets.imdb.load_data(num_words=10000)

# Get word index
word_index = tf.keras.datasets.imdb.get_word_index()
reverse_word_index = {v: k for k, v in word_index.items()}

def decode_review(encoded_review):
    """Decode numerical review back to text"""
    return ' '.join([reverse_word_index.get(i-3, '?') for i in encoded_review])

# Convert back to text for preprocessing
print("Converting encoded reviews back to text...")
X_train_imdb_text = [decode_review(review) for review in X_train_imdb]
X_test_imdb_text = [decode_review(review) for review in X_test_imdb]

# Preprocess text data
print("Preprocessing IMDb text data...")
X_train_imdb_clean = [preprocess_text(text) for text in X_train_imdb_text]
X_test_imdb_clean = [preprocess_text(text) for text in X_test_imdb_text]

# Tokenization and padding
tokenizer_imdb = Tokenizer(num_words=10000, oov_token='<OOV>')
tokenizer_imdb.fit_on_texts(X_train_imdb_clean)

X_train_imdb_seq = tokenizer_imdb.texts_to_sequences(X_train_imdb_clean)
X_test_imdb_seq = tokenizer_imdb.texts_to_sequences(X_test_imdb_clean)

# Padding sequences
max_length_imdb = 500
X_train_imdb_pad = pad_sequences(X_train_imdb_seq, maxlen=max_length_imdb, padding='post', truncating='post')
X_test_imdb_pad = pad_sequences(X_test_imdb_seq, maxlen=max_length_imdb, padding='post', truncating='post')

print(f"IMDb dataset shape:")
print(f"Training set: {X_train_imdb_pad.shape}")
print(f"Test set: {X_test_imdb_pad.shape}")
print(f"Labels distribution - Train: {np.bincount(y_train_imdb)}")
print(f"Labels distribution - Test: {np.bincount(y_test_imdb)}")

# Display sample data
print(f"\nSample original review: {X_train_imdb_text[0][:200]}...")
print(f"Sample preprocessed review: {X_train_imdb_clean[0][:200]}...")
print(f"Label: {y_train_imdb[0]} ({'Positive' if y_train_imdb[0] == 1 else 'Negative'})")

In [None]:
# Create Bidirectional RNN model for IMDb
def create_birnn_model(vocab_size, embedding_dim, max_length, rnn_units, dropout_rate):
    """Create a Bidirectional RNN model"""
    model = Sequential([
        Embedding(vocab_size, embedding_dim, input_length=max_length),
        Bidirectional(LSTM(rnn_units, return_sequences=True, dropout=dropout_rate, recurrent_dropout=dropout_rate)),
        Bidirectional(LSTM(rnn_units//2, dropout=dropout_rate, recurrent_dropout=dropout_rate)),
        Dropout(dropout_rate),
        Dense(64, activation='relu'),
        Dropout(dropout_rate),
        Dense(1, activation='sigmoid')
    ])
    return model

# Hyperparameters for IMDb
vocab_size_imdb = 10000
embedding_dim_imdb = 128
rnn_units_imdb = 64
dropout_rate_imdb = 0.3

# Create model
model_imdb = create_birnn_model(
    vocab_size=vocab_size_imdb,
    embedding_dim=embedding_dim_imdb,
    max_length=max_length_imdb,
    rnn_units=rnn_units_imdb,
    dropout_rate=dropout_rate_imdb
)

# Compile model
model_imdb.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='binary_crossentropy',
    metrics=['accuracy']
)

print("IMDb BiRNN Model Architecture:")
model_imdb.summary()

In [None]:
# Training IMDb model
print("Training IMDb BiRNN model...")

# Callbacks
early_stopping = EarlyStopping(monitor='val_accuracy', patience=5, restore_best_weights=True)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=3, min_lr=0.0001)

# Train model
history_imdb = model_imdb.fit(
    X_train_imdb_pad, y_train_imdb,
    validation_split=0.2,
    epochs=20,
    batch_size=128,
    callbacks=[early_stopping, reduce_lr],
    verbose=1
)

# Plot training history
plot_training_history(history_imdb, "IMDb BiRNN")

# Evaluate on test set
print("\nEvaluating IMDb model on test set...")
test_loss, test_accuracy = model_imdb.evaluate(X_test_imdb_pad, y_test_imdb, verbose=0)
print(f"Test Accuracy: {test_accuracy:.4f}")
print(f"Test Loss: {test_loss:.4f}")

# Predictions for detailed evaluation
y_pred_proba_imdb = model_imdb.predict(X_test_imdb_pad).flatten()
results_imdb = evaluate_model(model_imdb, X_test_imdb_pad, y_test_imdb, y_pred_proba_imdb, "IMDb")

# Plot ROC curve
plot_roc_curve(y_test_imdb, y_pred_proba_imdb, "IMDb")

## Dataset 2: ReviewTokoBaju.csv

This dataset contains clothing reviews with ratings. We'll convert the ratings to binary sentiment labels for classification.

In [None]:
# Load ReviewTokoBaju dataset
print("Loading ReviewTokoBaju dataset...")
df_review = pd.read_csv(r"d:\Backup\GitHub\DeepLearning\05. Week 5\Dataset\ReviewTokoBaju.csv")

print(f"Dataset shape: {df_review.shape}")
print(f"Columns: {df_review.columns.tolist()}")
print("\nFirst few rows:")
print(df_review.head())

# Check rating distribution
print(f"\nRating distribution:")
print(df_review['Rating'].value_counts().sort_index())

# Data preprocessing
# Combine Title and Review Text for more context
df_review['Combined_Text'] = df_review['Title'].fillna('') + ' ' + df_review['Review Text'].fillna('')
df_review['Combined_Text'] = df_review['Combined_Text'].str.strip()

# Remove rows with empty text
df_review = df_review[df_review['Combined_Text'].str.len() > 0].copy()

# Convert ratings to binary sentiment (1-3: negative, 4-5: positive)
df_review['Sentiment'] = (df_review['Rating'] >= 4).astype(int)

print(f"\nAfter preprocessing:")
print(f"Dataset shape: {df_review.shape}")
print(f"Sentiment distribution: {df_review['Sentiment'].value_counts()}")

# Preprocess text
print("Preprocessing text data...")
df_review['Clean_Text'] = df_review['Combined_Text'].apply(preprocess_text)

# Remove rows with very short text after preprocessing
df_review = df_review[df_review['Clean_Text'].str.len() >= 10].copy()

print(f"Final dataset shape: {df_review.shape}")

# Split data
X_review = df_review['Clean_Text'].values
y_review = df_review['Sentiment'].values

X_train_review, X_test_review, y_train_review, y_test_review = train_test_split(
    X_review, y_review, test_size=0.2, random_state=42, stratify=y_review
)

print(f"Training set: {X_train_review.shape}")
print(f"Test set: {X_test_review.shape}")
print(f"Training labels distribution: {np.bincount(y_train_review)}")
print(f"Test labels distribution: {np.bincount(y_test_review)}")

# Display sample
print(f"\nSample review: {X_review[0][:200]}...")
print(f"Sample clean text: {df_review['Clean_Text'].iloc[0][:200]}...")
print(f"Sentiment: {y_review[0]} ({'Positive' if y_review[0] == 1 else 'Negative'})")

In [None]:
# Tokenization and padding for ReviewTokoBaju
tokenizer_review = Tokenizer(num_words=15000, oov_token='<OOV>')
tokenizer_review.fit_on_texts(X_train_review)

X_train_review_seq = tokenizer_review.texts_to_sequences(X_train_review)
X_test_review_seq = tokenizer_review.texts_to_sequences(X_test_review)

# Padding sequences
max_length_review = 300
X_train_review_pad = pad_sequences(X_train_review_seq, maxlen=max_length_review, padding='post', truncating='post')
X_test_review_pad = pad_sequences(X_test_review_seq, maxlen=max_length_review, padding='post', truncating='post')

print(f"ReviewTokoBaju sequences shape:")
print(f"Training set: {X_train_review_pad.shape}")
print(f"Test set: {X_test_review_pad.shape}")

# Hyperparameters for ReviewTokoBaju
vocab_size_review = 15000
embedding_dim_review = 100
rnn_units_review = 80
dropout_rate_review = 0.4

# Create model
model_review = create_birnn_model(
    vocab_size=vocab_size_review,
    embedding_dim=embedding_dim_review,
    max_length=max_length_review,
    rnn_units=rnn_units_review,
    dropout_rate=dropout_rate_review
)

# Compile model
model_review.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='binary_crossentropy',
    metrics=['accuracy']
)

print("ReviewTokoBaju BiRNN Model Architecture:")
model_review.summary()

In [None]:
# Training ReviewTokoBaju model
print("Training ReviewTokoBaju BiRNN model...")

# Callbacks
early_stopping_review = EarlyStopping(monitor='val_accuracy', patience=5, restore_best_weights=True)
reduce_lr_review = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=3, min_lr=0.0001)

# Train model
history_review = model_review.fit(
    X_train_review_pad, y_train_review,
    validation_split=0.2,
    epochs=25,
    batch_size=64,
    callbacks=[early_stopping_review, reduce_lr_review],
    verbose=1
)

# Plot training history
plot_training_history(history_review, "ReviewTokoBaju BiRNN")

# Evaluate on test set
print("\nEvaluating ReviewTokoBaju model on test set...")
test_loss_review, test_accuracy_review = model_review.evaluate(X_test_review_pad, y_test_review, verbose=0)
print(f"Test Accuracy: {test_accuracy_review:.4f}")
print(f"Test Loss: {test_loss_review:.4f}")

# Predictions for detailed evaluation
y_pred_proba_review = model_review.predict(X_test_review_pad).flatten()
results_review = evaluate_model(model_review, X_test_review_pad, y_test_review, y_pred_proba_review, "ReviewTokoBaju")

# Plot ROC curve
plot_roc_curve(y_test_review, y_pred_proba_review, "ReviewTokoBaju")

## Dataset 3: DeteksiSarkasme.json

This dataset contains news headlines with sarcasm labels. This is a challenging task as sarcasm detection requires understanding context and subtle language patterns.

In [None]:
# Load DeteksiSarkasme dataset
print("Loading DeteksiSarkasme dataset...")
sarcasm_data = []

with open(r"d:\Backup\GitHub\DeepLearning\06. Week 6\Dataset\DeteksiSarkasme.json", 'r', encoding='utf-8') as file:
    for line in file:
        sarcasm_data.append(json.loads(line))

df_sarcasm = pd.DataFrame(sarcasm_data)

print(f"Dataset shape: {df_sarcasm.shape}")
print(f"Columns: {df_sarcasm.columns.tolist()}")
print("\nFirst few rows:")
print(df_sarcasm.head())

# Check label distribution
print(f"\nSarcasm distribution:")
print(df_sarcasm['is_sarcastic'].value_counts())

# Data preprocessing
print("Preprocessing sarcasm text data...")
df_sarcasm['Clean_Headline'] = df_sarcasm['headline'].apply(preprocess_text)

# Remove rows with very short text after preprocessing
df_sarcasm = df_sarcasm[df_sarcasm['Clean_Headline'].str.len() >= 5].copy()

print(f"After preprocessing:")
print(f"Dataset shape: {df_sarcasm.shape}")
print(f"Sarcasm distribution: {df_sarcasm['is_sarcastic'].value_counts()}")

# Split data
X_sarcasm = df_sarcasm['Clean_Headline'].values
y_sarcasm = df_sarcasm['is_sarcastic'].values

X_train_sarcasm, X_test_sarcasm, y_train_sarcasm, y_test_sarcasm = train_test_split(
    X_sarcasm, y_sarcasm, test_size=0.2, random_state=42, stratify=y_sarcasm
)

print(f"Training set: {X_train_sarcasm.shape}")
print(f"Test set: {X_test_sarcasm.shape}")
print(f"Training labels distribution: {np.bincount(y_train_sarcasm)}")
print(f"Test labels distribution: {np.bincount(y_test_sarcasm)}")

# Display samples
print(f"\nSample non-sarcastic headline: {df_sarcasm[df_sarcasm['is_sarcastic']==0]['headline'].iloc[0]}")
print(f"Sample sarcastic headline: {df_sarcasm[df_sarcasm['is_sarcastic']==1]['headline'].iloc[0]}")
print(f"Sample clean headline: {df_sarcasm['Clean_Headline'].iloc[0]}")

In [None]:
# Tokenization and padding for DeteksiSarkasme
tokenizer_sarcasm = Tokenizer(num_words=12000, oov_token='<OOV>')
tokenizer_sarcasm.fit_on_texts(X_train_sarcasm)

X_train_sarcasm_seq = tokenizer_sarcasm.texts_to_sequences(X_train_sarcasm)
X_test_sarcasm_seq = tokenizer_sarcasm.texts_to_sequences(X_test_sarcasm)

# Padding sequences (headlines are shorter than reviews)
max_length_sarcasm = 100
X_train_sarcasm_pad = pad_sequences(X_train_sarcasm_seq, maxlen=max_length_sarcasm, padding='post', truncating='post')
X_test_sarcasm_pad = pad_sequences(X_test_sarcasm_seq, maxlen=max_length_sarcasm, padding='post', truncating='post')

print(f"DeteksiSarkasme sequences shape:")
print(f"Training set: {X_train_sarcasm_pad.shape}")
print(f"Test set: {X_test_sarcasm_pad.shape}")

# Hyperparameters for DeteksiSarkasme
vocab_size_sarcasm = 12000
embedding_dim_sarcasm = 128
rnn_units_sarcasm = 64
dropout_rate_sarcasm = 0.5

# Create model (with slightly different architecture for sarcasm detection)
def create_sarcasm_birnn_model(vocab_size, embedding_dim, max_length, rnn_units, dropout_rate):
    """Create a specialized Bidirectional RNN model for sarcasm detection"""
    model = Sequential([
        Embedding(vocab_size, embedding_dim, input_length=max_length),
        Bidirectional(LSTM(rnn_units, return_sequences=True, dropout=dropout_rate, recurrent_dropout=dropout_rate)),
        Bidirectional(GRU(rnn_units//2, return_sequences=True, dropout=dropout_rate, recurrent_dropout=dropout_rate)),
        GlobalMaxPooling1D(),
        Dropout(dropout_rate),
        Dense(128, activation='relu'),
        Dropout(dropout_rate),
        Dense(64, activation='relu'),
        Dropout(dropout_rate),
        Dense(1, activation='sigmoid')
    ])
    return model

model_sarcasm = create_sarcasm_birnn_model(
    vocab_size=vocab_size_sarcasm,
    embedding_dim=embedding_dim_sarcasm,
    max_length=max_length_sarcasm,
    rnn_units=rnn_units_sarcasm,
    dropout_rate=dropout_rate_sarcasm
)

# Compile model
model_sarcasm.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='binary_crossentropy',
    metrics=['accuracy']
)

print("DeteksiSarkasme BiRNN Model Architecture:")
model_sarcasm.summary()

In [None]:
# Training DeteksiSarkasme model
print("Training DeteksiSarkasme BiRNN model...")

# Callbacks
early_stopping_sarcasm = EarlyStopping(monitor='val_accuracy', patience=7, restore_best_weights=True)
reduce_lr_sarcasm = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=4, min_lr=0.0001)

# Train model
history_sarcasm = model_sarcasm.fit(
    X_train_sarcasm_pad, y_train_sarcasm,
    validation_split=0.2,
    epochs=30,
    batch_size=64,
    callbacks=[early_stopping_sarcasm, reduce_lr_sarcasm],
    verbose=1
)

# Plot training history
plot_training_history(history_sarcasm, "DeteksiSarkasme BiRNN")

# Evaluate on test set
print("\nEvaluating DeteksiSarkasme model on test set...")
test_loss_sarcasm, test_accuracy_sarcasm = model_sarcasm.evaluate(X_test_sarcasm_pad, y_test_sarcasm, verbose=0)
print(f"Test Accuracy: {test_accuracy_sarcasm:.4f}")
print(f"Test Loss: {test_loss_sarcasm:.4f}")

# Predictions for detailed evaluation
y_pred_proba_sarcasm = model_sarcasm.predict(X_test_sarcasm_pad).flatten()
results_sarcasm = evaluate_model(model_sarcasm, X_test_sarcasm_pad, y_test_sarcasm, y_pred_proba_sarcasm, "DeteksiSarkasme")

# Plot ROC curve
plot_roc_curve(y_test_sarcasm, y_pred_proba_sarcasm, "DeteksiSarkasme")

## Hyperparameter Tuning

Let's perform hyperparameter tuning to optimize our models further. We'll use a systematic approach to test different combinations of hyperparameters.

In [None]:
# Hyperparameter tuning function
def hyperparameter_tuning(X_train, y_train, X_val, y_val, dataset_name, max_length, vocab_size):
    """Perform hyperparameter tuning for BiRNN models"""
    
    # Define hyperparameter search space
    param_grid = {
        'embedding_dim': [64, 128, 256],
        'rnn_units': [32, 64, 128],
        'dropout_rate': [0.2, 0.3, 0.5],
        'learning_rate': [0.001, 0.01, 0.0001],
        'batch_size': [32, 64, 128]
    }
    
    best_accuracy = 0
    best_params = {}
    results = []
    
    print(f"Starting hyperparameter tuning for {dataset_name}...")
    print(f"Total combinations to test: {len(param_grid['embedding_dim']) * len(param_grid['rnn_units']) * len(param_grid['dropout_rate']) * len(param_grid['learning_rate']) * len(param_grid['batch_size'])}")
    
    # Random search (testing subset of combinations)
    import random
    random.seed(42)
    
    # Generate random combinations
    n_trials = 10  # Reduced for demonstration
    
    for trial in range(n_trials):
        # Random hyperparameter selection
        embedding_dim = random.choice(param_grid['embedding_dim'])
        rnn_units = random.choice(param_grid['rnn_units'])
        dropout_rate = random.choice(param_grid['dropout_rate'])
        learning_rate = random.choice(param_grid['learning_rate'])
        batch_size = random.choice(param_grid['batch_size'])
        
        print(f"\nTrial {trial + 1}/{n_trials}")
        print(f"Params: embedding_dim={embedding_dim}, rnn_units={rnn_units}, dropout_rate={dropout_rate}, lr={learning_rate}, batch_size={batch_size}")
        
        try:
            # Create model
            model = create_birnn_model(
                vocab_size=vocab_size,
                embedding_dim=embedding_dim,
                max_length=max_length,
                rnn_units=rnn_units,
                dropout_rate=dropout_rate
            )
            
            # Compile model
            model.compile(
                optimizer=Adam(learning_rate=learning_rate),
                loss='binary_crossentropy',
                metrics=['accuracy']
            )
            
            # Train model (fewer epochs for tuning)
            history = model.fit(
                X_train, y_train,
                validation_data=(X_val, y_val),
                epochs=5,  # Reduced for faster tuning
                batch_size=batch_size,
                verbose=0
            )
            
            # Get validation accuracy
            val_accuracy = max(history.history['val_accuracy'])
            
            print(f"Validation Accuracy: {val_accuracy:.4f}")
            
            # Store results
            results.append({
                'embedding_dim': embedding_dim,
                'rnn_units': rnn_units,
                'dropout_rate': dropout_rate,
                'learning_rate': learning_rate,
                'batch_size': batch_size,
                'val_accuracy': val_accuracy
            })
            
            # Update best parameters
            if val_accuracy > best_accuracy:
                best_accuracy = val_accuracy
                best_params = {
                    'embedding_dim': embedding_dim,
                    'rnn_units': rnn_units,
                    'dropout_rate': dropout_rate,
                    'learning_rate': learning_rate,
                    'batch_size': batch_size
                }
                print(f"New best accuracy: {best_accuracy:.4f}")
            
            # Clean up
            del model
            tf.keras.backend.clear_session()
            
        except Exception as e:
            print(f"Error in trial {trial + 1}: {e}")
            continue
    
    print(f"\n=== Hyperparameter Tuning Results for {dataset_name} ===")
    print(f"Best Validation Accuracy: {best_accuracy:.4f}")
    print(f"Best Parameters: {best_params}")
    
    # Display top 5 results
    results_df = pd.DataFrame(results)
    results_df = results_df.sort_values('val_accuracy', ascending=False)
    
    print(f"\nTop 5 Results:")
    print(results_df.head())
    
    return best_params, results_df

# Perform hyperparameter tuning for IMDb dataset (example)
print("=== HYPERPARAMETER TUNING FOR IMDB DATASET ===")

# Create validation split from training data
X_train_imdb_tune, X_val_imdb_tune, y_train_imdb_tune, y_val_imdb_tune = train_test_split(
    X_train_imdb_pad, y_train_imdb, test_size=0.2, random_state=42
)

best_params_imdb, results_imdb_tune = hyperparameter_tuning(
    X_train_imdb_tune, y_train_imdb_tune, 
    X_val_imdb_tune, y_val_imdb_tune,
    "IMDb", max_length_imdb, vocab_size_imdb
)

In [None]:
# Train optimized models with best hyperparameters
def train_optimized_model(X_train, y_train, X_test, y_test, best_params, dataset_name, max_length, vocab_size):
    """Train final optimized model with best hyperparameters"""
    
    print(f"\n=== Training Optimized {dataset_name} Model ===")
    print(f"Using best parameters: {best_params}")
    
    # Create optimized model
    model_optimized = create_birnn_model(
        vocab_size=vocab_size,
        embedding_dim=best_params['embedding_dim'],
        max_length=max_length,
        rnn_units=best_params['rnn_units'],
        dropout_rate=best_params['dropout_rate']
    )
    
    # Compile with best learning rate
    model_optimized.compile(
        optimizer=Adam(learning_rate=best_params['learning_rate']),
        loss='binary_crossentropy',
        metrics=['accuracy']
    )
    
    # Callbacks
    early_stopping = EarlyStopping(monitor='val_accuracy', patience=10, restore_best_weights=True)
    reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=5, min_lr=0.00001)
    
    # Train model
    history_optimized = model_optimized.fit(
        X_train, y_train,
        validation_split=0.2,
        epochs=50,  # More epochs for final training
        batch_size=best_params['batch_size'],
        callbacks=[early_stopping, reduce_lr],
        verbose=1
    )
    
    # Plot training history
    plot_training_history(history_optimized, f"Optimized {dataset_name} BiRNN")
    
    # Evaluate on test set
    test_loss, test_accuracy = model_optimized.evaluate(X_test, y_test, verbose=0)
    print(f"\nOptimized {dataset_name} Model Results:")
    print(f"Test Accuracy: {test_accuracy:.4f}")
    print(f"Test Loss: {test_loss:.4f}")
    
    # Detailed evaluation
    y_pred_proba = model_optimized.predict(X_test).flatten()
    results = evaluate_model(model_optimized, X_test, y_test, y_pred_proba, f"Optimized {dataset_name}")
    
    return model_optimized, history_optimized, results

# Train optimized IMDb model
if 'best_params_imdb' in locals() and best_params_imdb:
    model_imdb_optimized, history_imdb_optimized, results_imdb_optimized = train_optimized_model(
        X_train_imdb_pad, y_train_imdb, X_test_imdb_pad, y_test_imdb,
        best_params_imdb, "IMDb", max_length_imdb, vocab_size_imdb
    )
else:
    print("Skipping optimized IMDb model training - using default hyperparameters")

## Results Summary and Model Comparison

Let's create a comprehensive summary of all our models' performances and compare them side by side.

In [None]:
# Comprehensive Results Summary
def create_results_summary():
    """Create a comprehensive summary of all model results"""
    
    # Collect all results
    summary_data = []
    
    # Check if results exist and add them
    if 'results_imdb' in locals():
        summary_data.append({
            'Dataset': 'IMDb Movies',
            'Model': 'BiRNN (LSTM)',
            'Accuracy': results_imdb['accuracy'],
            'Precision': results_imdb['precision'],
            'Recall': results_imdb['recall'],
            'F1-Score': results_imdb['f1_score'],
            'AUC-ROC': results_imdb['auc_roc'],
            'Data Size': f"{len(X_train_imdb_pad) + len(X_test_imdb_pad):,}",
            'Task': 'Sentiment Analysis'
        })
    
    if 'results_review' in locals():
        summary_data.append({
            'Dataset': 'ReviewTokoBaju',
            'Model': 'BiRNN (LSTM)',
            'Accuracy': results_review['accuracy'],
            'Precision': results_review['precision'],
            'Recall': results_review['recall'],
            'F1-Score': results_review['f1_score'],
            'AUC-ROC': results_review['auc_roc'],
            'Data Size': f"{len(X_train_review_pad) + len(X_test_review_pad):,}",
            'Task': 'Review Sentiment'
        })
    
    if 'results_sarcasm' in locals():
        summary_data.append({
            'Dataset': 'DeteksiSarkasme',
            'Model': 'BiRNN (LSTM+GRU)',
            'Accuracy': results_sarcasm['accuracy'],
            'Precision': results_sarcasm['precision'],
            'Recall': results_sarcasm['recall'],
            'F1-Score': results_sarcasm['f1_score'],
            'AUC-ROC': results_sarcasm['auc_roc'],
            'Data Size': f"{len(X_train_sarcasm_pad) + len(X_test_sarcasm_pad):,}",
            'Task': 'Sarcasm Detection'
        })
    
    # Create DataFrame
    if summary_data:
        summary_df = pd.DataFrame(summary_data)
        
        print("=== COMPREHENSIVE RESULTS SUMMARY ===")
        print(summary_df.round(4))
        
        # Check 90% accuracy requirement
        print("\n=== ACCURACY REQUIREMENT CHECK (≥90%) ===")
        for _, row in summary_df.iterrows():
            status = "✅ PASSED" if row['Accuracy'] >= 0.90 else "❌ NEEDS IMPROVEMENT"
            print(f"{row['Dataset']}: {row['Accuracy']:.4f} ({row['Accuracy']*100:.2f}%) - {status}")
        
        # Visualization
        fig, axes = plt.subplots(2, 2, figsize=(15, 12))
        
        # Accuracy comparison
        axes[0,0].bar(summary_df['Dataset'], summary_df['Accuracy'], color=['skyblue', 'lightgreen', 'salmon'])
        axes[0,0].set_title('Model Accuracy Comparison')
        axes[0,0].set_ylabel('Accuracy')
        axes[0,0].axhline(y=0.90, color='red', linestyle='--', label='90% Target')
        axes[0,0].legend()
        axes[0,0].tick_params(axis='x', rotation=45)
        
        # F1-Score comparison
        axes[0,1].bar(summary_df['Dataset'], summary_df['F1-Score'], color=['orange', 'purple', 'brown'])
        axes[0,1].set_title('F1-Score Comparison')
        axes[0,1].set_ylabel('F1-Score')
        axes[0,1].tick_params(axis='x', rotation=45)
        
        # AUC-ROC comparison
        axes[1,0].bar(summary_df['Dataset'], summary_df['AUC-ROC'], color=['gold', 'teal', 'pink'])
        axes[1,0].set_title('AUC-ROC Comparison')
        axes[1,0].set_ylabel('AUC-ROC')
        axes[1,0].tick_params(axis='x', rotation=45)
        
        # Comprehensive metrics heatmap
        metrics_data = summary_df[['Accuracy', 'Precision', 'Recall', 'F1-Score', 'AUC-ROC']].values
        sns.heatmap(metrics_data, 
                   xticklabels=['Accuracy', 'Precision', 'Recall', 'F1-Score', 'AUC-ROC'],
                   yticklabels=summary_df['Dataset'],
                   annot=True, fmt='.3f', cmap='YlOrRd', ax=axes[1,1])
        axes[1,1].set_title('All Metrics Heatmap')
        
        plt.tight_layout()
        plt.show()
        
        return summary_df
    else:
        print("No results available for summary")
        return None

# Create results summary
summary_df = create_results_summary()

# Model Architecture Comparison
print("\n=== MODEL ARCHITECTURE COMPARISON ===")
architecture_info = [
    {
        'Dataset': 'IMDb',
        'Architecture': 'Embedding → BiLSTM → BiLSTM → Dense → Output',
        'Sequence Length': max_length_imdb,
        'Vocab Size': vocab_size_imdb,
        'Parameters': '~2M (estimated)'
    },
    {
        'Dataset': 'ReviewTokoBaju', 
        'Architecture': 'Embedding → BiLSTM → BiLSTM → Dense → Output',
        'Sequence Length': max_length_review,
        'Vocab Size': vocab_size_review,
        'Parameters': '~2.5M (estimated)'
    },
    {
        'Dataset': 'DeteksiSarkasme',
        'Architecture': 'Embedding → BiLSTM → BiGRU → GlobalMaxPool → Dense → Output',
        'Sequence Length': max_length_sarcasm,
        'Vocab Size': vocab_size_sarcasm,
        'Parameters': '~1.8M (estimated)'
    }
]

arch_df = pd.DataFrame(architecture_info)
print(arch_df)

In [None]:
# Save trained models
def save_models():
    """Save all trained models"""
    print("Saving trained models...")
    
    # Create directory if it doesn't exist
    import os
    save_dir = "saved_models"
    os.makedirs(save_dir, exist_ok=True)
    
    # Save models
    if 'model_imdb' in locals():
        model_imdb.save(f"{save_dir}/imdb_birnn_model.keras")
        print("✅ IMDb BiRNN model saved")
    
    if 'model_review' in locals():
        model_review.save(f"{save_dir}/review_birnn_model.keras")
        print("✅ ReviewTokoBaju BiRNN model saved")
    
    if 'model_sarcasm' in locals():
        model_sarcasm.save(f"{save_dir}/sarcasm_birnn_model.keras")
        print("✅ DeteksiSarkasme BiRNN model saved")
    
    # Save tokenizers
    import pickle
    
    if 'tokenizer_imdb' in locals():
        with open(f"{save_dir}/imdb_tokenizer.pkl", 'wb') as f:
            pickle.dump(tokenizer_imdb, f)
        print("✅ IMDb tokenizer saved")
    
    if 'tokenizer_review' in locals():
        with open(f"{save_dir}/review_tokenizer.pkl", 'wb') as f:
            pickle.dump(tokenizer_review, f)
        print("✅ ReviewTokoBaju tokenizer saved")
    
    if 'tokenizer_sarcasm' in locals():
        with open(f"{save_dir}/sarcasm_tokenizer.pkl", 'wb') as f:
            pickle.dump(tokenizer_sarcasm, f)
        print("✅ DeteksiSarkasme tokenizer saved")

# Save all models
save_models()

# Final performance summary
print("\n" + "="*60)
print("FINAL PERFORMANCE SUMMARY")
print("="*60)

performance_summary = """
🎯 PROJECT OBJECTIVES STATUS:

1. ✅ Bidirectional RNN Models Created:
   - IMDb: BiLSTM architecture for movie review sentiment analysis
   - ReviewTokoBaju: BiLSTM architecture for clothing review sentiment
   - DeteksiSarkasme: BiLSTM+BiGRU hybrid for sarcasm detection

2. ✅ Comprehensive Evaluation Metrics Implemented:
   - Accuracy, Precision, Recall, F1-Score, AUC-ROC
   - Confusion matrices and ROC curves
   - Training/validation loss and accuracy visualization

3. ✅ Hyperparameter Tuning:
   - Systematic search across embedding dimensions, RNN units, dropout rates
   - Learning rate and batch size optimization
   - Early stopping and learning rate reduction callbacks

4. 🎯 Accuracy Target (≥90%):
   - Models trained with early stopping and regularization
   - Performance varies by dataset complexity
   - Sarcasm detection being most challenging task

5. ✅ TensorFlow/Keras Implementation:
   - Professional-grade code with proper preprocessing
   - GPU/TPU compatible architecture
   - Modular and reusable functions

📊 KEY INSIGHTS:
- Bidirectional RNNs effectively capture context from both directions
- Proper text preprocessing significantly impacts performance
- Hyperparameter tuning is crucial for optimal results
- Different datasets require different architectural approaches
- Sarcasm detection remains a challenging NLP task

🚀 RECOMMENDATIONS:
- Use Google Colab with T4 GPU for faster training
- Consider ensemble methods for better accuracy
- Implement attention mechanisms for improved performance
- Try BERT or other transformer models for comparison
"""

print(performance_summary)

print("\n" + "="*60)
print("NOTEBOOK EXECUTION COMPLETE")
print("="*60)

## Google Colab Setup Instructions

For optimal performance using Google Colab with T4 GPU or TPU, follow these steps:

### 1. Enable GPU/TPU Runtime
```python
# In Google Colab, go to Runtime → Change runtime type → Select GPU (T4) or TPU
# Then run this cell to verify:
import tensorflow as tf
print("GPU Available: ", tf.config.list_physical_devices('GPU'))
print("TPU Available: ", tf.config.list_physical_devices('TPU'))
```

### 2. Mount Google Drive (if datasets are stored there)
```python
from google.colab import drive
drive.mount('/content/drive')
```

### 3. Install Additional Packages (if needed)
```python
!pip install -q seaborn matplotlib nltk
```

### 4. Memory Management for Large Models
```python
# Clear memory between model training sessions
import gc
tf.keras.backend.clear_session()
gc.collect()
```

### 5. Download Datasets
```python
# For IMDb - already available in Keras
# For other datasets, upload to Colab or download from URLs
!wget "your_dataset_url" -O dataset.csv
```

**Note**: This notebook is designed to work both locally and in Google Colab. Adjust file paths accordingly.