# Part 2.4: RNN Applications - Sentiment Analysis

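An end-to-end application of RNNs to text sentiment analysis, covering preprocessing, training, and evaluation. The reviews are synthetic: they are generated from templates to mimic IMDB-style data. The pipeline is realistic, but the dataset is not.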

## Objective
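- Implement an end-to-end sentiment analysis pipeline
- Compare RNN architectures (vanilla RNN, GRU, LSTM, and bidirectional variants) on review-style text
- Build a text preprocessing and tokenization pipeline
- Evaluate with standard metrics and compare against a TF-IDF baseline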

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset
from torch.nn.utils.rnn import pad_sequence

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Text processing
import re
import string
from collections import Counter, defaultdict

# Metrics
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, confusion_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
torch.manual_seed(42)
np.random.seed(42)

print(f"Using device: {device}")
print("RNN Sentiment Analysis Application")
print("=" * 40)

In [None]:
# Create synthetic movie review dataset (simulating IMDB-style data)
def create_movie_reviews_dataset(num_samples=5000):
    """Create synthetic movie review dataset"""
    
    positive_words = [
        'excellent', 'amazing', 'fantastic', 'wonderful', 'brilliant', 'outstanding',
        'superb', 'great', 'good', 'beautiful', 'perfect', 'love', 'awesome',
        'incredible', 'impressive', 'remarkable', 'entertaining', 'enjoyable',
        'spectacular', 'magnificent', 'marvelous', 'delightful', 'charming'
    ]
    
    negative_words = [
        'terrible', 'awful', 'horrible', 'bad', 'worst', 'disappointing',
        'boring', 'dull', 'stupid', 'waste', 'pathetic', 'annoying',
        'ridiculous', 'absurd', 'pointless', 'tedious', 'uninteresting',
        'frustrating', 'confusing', 'poorly', 'badly', 'mediocre', 'weak'
    ]
    
    neutral_words = [
        'movie', 'film', 'actor', 'actress', 'director', 'plot', 'story',
        'character', 'scene', 'dialogue', 'cinematography', 'soundtrack',
        'performance', 'script', 'drama', 'comedy', 'action', 'thriller',
        'romance', 'adventure', 'watch', 'see', 'think', 'feel', 'time'
    ]
    
    review_templates = {
        'positive': [
            "This movie was {} and {} with {} performances.",
            "I {} this film! The {} was {} and the acting was {}.",
            "What an {} movie! {} cinematography and {} storyline.",
            "The {} was {} and the {} made it even more {}.",
            "Absolutely {} film with {} direction and {} characters."
        ],
        'negative': [
            "This movie was {} and {} with {} acting.",
            "I found this film {} and {}. The plot was {} and {}.",
            "What a {} movie! {} direction and {} performances.",
            "The {} was {} and the {} made it even more {}.",
            "Absolutely {} film with {} writing and {} execution."
        ]
    }
    
    reviews = []
    labels = []
    
    for _ in range(num_samples):
        # Choose sentiment
        is_positive = np.random.choice([True, False])
        label = 1 if is_positive else 0
        
        # Choose template
        sentiment_type = 'positive' if is_positive else 'negative'
        template = np.random.choice(review_templates[sentiment_type])
        
        # Fill template
        if is_positive:
            words_to_use = positive_words + neutral_words
        else:
            words_to_use = negative_words + neutral_words
        
        # Count placeholders
        num_placeholders = template.count('{}')
        chosen_words = np.random.choice(words_to_use, num_placeholders, replace=True)
        
        review = template.format(*chosen_words)
        
        # Add some noise and variation
        if np.random.random() < 0.3:  # Add extra sentences
            extra_word = np.random.choice(words_to_use)
            extra_sentence = f" The {extra_word} was really something."
            review += extra_sentence
        
        reviews.append(review)
        labels.append(label)
    
    return reviews, labels

# Create dataset
reviews, labels = create_movie_reviews_dataset(5000)

print(f"Created {len(reviews)} movie reviews")
print(f"Positive reviews: {sum(labels)} ({sum(labels)/len(labels)*100:.1f}%)")
print(f"Negative reviews: {len(labels)-sum(labels)} ({(len(labels)-sum(labels))/len(labels)*100:.1f}%)")

print("\nSample reviews:")
for i in range(3):
    sentiment = "Positive" if labels[i] == 1 else "Negative"
    print(f"{sentiment}: {reviews[i]}")
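Placeholders are filled from either `positive_words + neutral_words` or `negative_words + neutral_words`. The template "The {} was {} and the {} made it even more {}." is also shared by both classes. As a result, some generated reviews carry no sentiment word at all, and if such a review also uses the shared template, it has no label signal. Counting reviews without any sentiment word gives a rough upper bound on how many of these ambiguous cases exist.

The sketch below is a minimal check. `has_sentiment_word` and `fraction_without_sentiment` are hypothetical helpers, and the sample reviews are hand-written rather than generated. To run it on `reviews`, you would pass the union of the two sentiment word lists, which are currently local to `create_movie_reviews_dataset`.

```python
def has_sentiment_word(review, sentiment_words):
    """Return True if any token of the review (punctuation stripped) is a sentiment word."""
    # Strip trailing punctuation so that "boring." matches "boring"
    tokens = [tok.strip('.,!?').lower() for tok in review.split()]
    return any(tok in sentiment_words for tok in tokens)


def fraction_without_sentiment(reviews, sentiment_words):
    """Fraction of reviews that contain none of the given sentiment words."""
    if not reviews:
        return 0.0
    missing = sum(not has_sentiment_word(r, sentiment_words) for r in reviews)
    return missing / len(reviews)


# Hand-written toy reviews (hypothetical, not output of the generator)
sentiment = {'great', 'boring', 'excellent', 'awful'}
sample = [
    "The plot was great and the actor made it even more time.",
    "The movie was film and the scene made it even more story.",
    "What a boring movie! Dull direction and awful performances.",
]
print(fraction_without_sentiment(sample, sentiment))
```

Only the second sample has no sentiment word. It uses the template shared by both classes, so no model could recover its label.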

In [None]:
# Text preprocessing and tokenization
class TextPreprocessor:
    def __init__(self, vocab_size=10000, max_length=100):
        self.vocab_size = vocab_size
        self.max_length = max_length
        self.word_to_idx = {'<PAD>': 0, '<UNK>': 1}
        self.idx_to_word = {0: '<PAD>', 1: '<UNK>'}
        self.vocab_built = False
        
    def clean_text(self, text):
        """Clean and normalize text"""
        # Convert to lowercase
        text = text.lower()
        
        # Remove punctuation
        text = re.sub(f'[{string.punctuation}]', '', text)
        
        # Remove extra whitespace
        text = re.sub(r'\s+', ' ', text).strip()
        
        return text
    
    def build_vocab(self, texts):
        """Build vocabulary from training texts"""
        word_counts = Counter()
        
        for text in texts:
            cleaned_text = self.clean_text(text)
            words = cleaned_text.split()
            word_counts.update(words)
        
        # Get most common words
        most_common = word_counts.most_common(self.vocab_size - 2)  # -2 for PAD and UNK
        
        # Build word mappings
        for idx, (word, count) in enumerate(most_common, start=2):
            self.word_to_idx[word] = idx
            self.idx_to_word[idx] = word
        
        self.vocab_built = True
        print(f"Vocabulary built with {len(self.word_to_idx)} words")
        print(f"Most common words: {[word for word, _ in most_common[:10]]}")
    
    def text_to_sequence(self, text):
        """Convert text to sequence of indices"""
        if not self.vocab_built:
            raise ValueError("Vocabulary not built. Call build_vocab first.")
        
        cleaned_text = self.clean_text(text)
        words = cleaned_text.split()
        
        # Convert to indices
        sequence = []
        for word in words:
            idx = self.word_to_idx.get(word, self.word_to_idx['<UNK>'])
            sequence.append(idx)
        
        # Truncate or pad
        if len(sequence) > self.max_length:
            sequence = sequence[:self.max_length]
        else:
            sequence.extend([self.word_to_idx['<PAD>']] * (self.max_length - len(sequence)))
        
        return torch.LongTensor(sequence)
    
    def sequences_to_texts(self, sequences):
        """Convert sequences back to texts"""
        texts = []
        for seq in sequences:
            words = []
            for idx in seq:
                word = self.idx_to_word.get(idx.item(), '<UNK>')
                if word != '<PAD>':
                    words.append(word)
            texts.append(' '.join(words))
        return texts

# Split data
train_texts, test_texts, train_labels, test_labels = train_test_split(
    reviews, labels, test_size=0.2, random_state=42, stratify=labels)

# Further split training into train/validation
train_texts, val_texts, train_labels, val_labels = train_test_split(
    train_texts, train_labels, test_size=0.2, random_state=42, stratify=train_labels)

print(f"Training samples: {len(train_texts)}")
print(f"Validation samples: {len(val_texts)}")
print(f"Test samples: {len(test_texts)}")

# Initialize preprocessor and build vocabulary
preprocessor = TextPreprocessor(vocab_size=5000, max_length=50)
preprocessor.build_vocab(train_texts)

# Convert texts to sequences
print("\nConverting texts to sequences...")
train_sequences = torch.stack([preprocessor.text_to_sequence(text) for text in train_texts])
val_sequences = torch.stack([preprocessor.text_to_sequence(text) for text in val_texts])
test_sequences = torch.stack([preprocessor.text_to_sequence(text) for text in test_texts])

train_labels_tensor = torch.LongTensor(train_labels)
val_labels_tensor = torch.LongTensor(val_labels)
test_labels_tensor = torch.LongTensor(test_labels)

print("Text preprocessing completed!")
print(f"Sequence shape: {train_sequences.shape}")
print(f"Vocabulary size: {len(preprocessor.word_to_idx)}")
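The clean → count → index → pad/truncate steps of `TextPreprocessor` can also be sketched without PyTorch, which makes them easy to check on a toy corpus. This is a simplified stand-in, not the class above: it returns plain lists instead of tensors, and the function names are hypothetical. It assumes the same conventions as the notebook: `<PAD>` = 0, `<UNK>` = 1, and post-padding.

```python
import re
import string
from collections import Counter

PAD, UNK = 0, 1
# re.escape makes characters such as ']' and '\\' literal inside the character class
PUNCT_RE = re.compile(f"[{re.escape(string.punctuation)}]")


def clean(text):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    text = PUNCT_RE.sub("", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def build_vocab(texts, vocab_size):
    """Map the (vocab_size - 2) most frequent words to indices 2, 3, ..."""
    counts = Counter(word for t in texts for word in clean(t).split())
    vocab = {"<PAD>": PAD, "<UNK>": UNK}
    for idx, (word, _) in enumerate(counts.most_common(vocab_size - 2), start=2):
        vocab[word] = idx
    return vocab


def encode(text, vocab, max_length):
    """Convert text to a fixed-length list of indices: truncate, then post-pad."""
    seq = [vocab.get(word, UNK) for word in clean(text).split()][:max_length]
    return seq + [PAD] * (max_length - len(seq))


vocab = build_vocab(["Great movie!", "Great plot, great cast."], vocab_size=10)
print(encode("A great movie!!", vocab, max_length=6))
```

"great" is the most frequent word, so it receives index 2. The unseen word "a" maps to `<UNK>`.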

In [None]:
# Dataset class for sentiment analysis
class SentimentDataset(Dataset):
    def __init__(self, sequences, labels):
        self.sequences = sequences
        self.labels = labels
        
    def __len__(self):
        return len(self.sequences)
    
    def __getitem__(self, idx):
        return self.sequences[idx], self.labels[idx]

# Create data loaders
batch_size = 32

train_dataset = SentimentDataset(train_sequences, train_labels_tensor)
val_dataset = SentimentDataset(val_sequences, val_labels_tensor)
test_dataset = SentimentDataset(test_sequences, test_labels_tensor)

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

print("Data loaders created!")

In [None]:
# RNN Models for Sentiment Analysis
class SentimentRNN(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, output_dim, rnn_type='LSTM', 
                 num_layers=1, bidirectional=False, dropout=0.3):
        super(SentimentRNN, self).__init__()
        
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.dropout = nn.Dropout(dropout)
        
        # RNN layer
        if rnn_type == 'LSTM':
            self.rnn = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers, 
                              bidirectional=bidirectional, dropout=dropout if num_layers > 1 else 0,
                              batch_first=True)
        elif rnn_type == 'GRU':
            self.rnn = nn.GRU(embed_dim, hidden_dim, num_layers=num_layers,
                             bidirectional=bidirectional, dropout=dropout if num_layers > 1 else 0,
                             batch_first=True)
        elif rnn_type == 'RNN':
            self.rnn = nn.RNN(embed_dim, hidden_dim, num_layers=num_layers,
                             bidirectional=bidirectional, dropout=dropout if num_layers > 1 else 0,
                             batch_first=True)
        
        # Output layer
        rnn_output_dim = hidden_dim * 2 if bidirectional else hidden_dim
        self.fc = nn.Linear(rnn_output_dim, output_dim)
        
        self.rnn_type = rnn_type
        self.bidirectional = bidirectional
        
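    def forward(self, x):
        # x: (batch, seq_len) token indices, post-padded with <PAD> = 0
        # True length of each review; the clamp guards against an all-padding row
        lengths = (x != 0).sum(dim=1).clamp(min=1)

        # Embedding
        embedded = self.embedding(x)
        embedded = self.dropout(embedded)

        # Pack the batch so the RNN stops at each review's last real token
        # instead of running on through the trailing <PAD> positions
        packed = nn.utils.rnn.pack_padded_sequence(
            embedded, lengths.cpu(), batch_first=True, enforce_sorted=False)
        _, hidden = self.rnn(packed)
        if self.rnn_type == 'LSTM':
            hidden = hidden[0]  # LSTM returns (h_n, c_n); keep h_n

        # hidden: (num_layers * num_directions, batch, hidden_dim)
        if self.bidirectional:
            # Top layer: forward state after the last real token and
            # backward state after reading back to the first token
            last_output = torch.cat([hidden[-2], hidden[-1]], dim=1)
        else:
            last_output = hidden[-1]

        # Classification
        output = self.fc(self.dropout(last_output))
        return output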

# Model configurations
model_configs = {
    'LSTM': {'rnn_type': 'LSTM', 'bidirectional': False},
    'BiLSTM': {'rnn_type': 'LSTM', 'bidirectional': True},
    'GRU': {'rnn_type': 'GRU', 'bidirectional': False},
    'BiGRU': {'rnn_type': 'GRU', 'bidirectional': True},
    'RNN': {'rnn_type': 'RNN', 'bidirectional': False},
}

print("RNN models defined!")
print(f"Model configurations: {list(model_configs.keys())}")

In [None]:
# Training function
def train_sentiment_model(model, train_loader, val_loader, num_epochs=20, lr=0.001):
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    
    history = {'train_loss': [], 'train_acc': [], 'val_loss': [], 'val_acc': []}
    
    for epoch in range(num_epochs):
        # Training
        model.train()
        train_loss = 0
        train_correct = 0
        total_train = 0
        
        for sequences, labels in train_loader:
            sequences, labels = sequences.to(device), labels.to(device)
            
            optimizer.zero_grad()
            outputs = model(sequences)
            loss = criterion(outputs, labels)
            loss.backward()
            
            # Gradient clipping
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            
            optimizer.step()
            
            train_loss += loss.item()
            train_correct += (outputs.argmax(1) == labels).sum().item()
            total_train += labels.size(0)
        
        # Validation
        model.eval()
        val_loss = 0
        val_correct = 0
        total_val = 0
        
        with torch.no_grad():
            for sequences, labels in val_loader:
                sequences, labels = sequences.to(device), labels.to(device)
                outputs = model(sequences)
                loss = criterion(outputs, labels)
                
                val_loss += loss.item()
                val_correct += (outputs.argmax(1) == labels).sum().item()
                total_val += labels.size(0)
        
        # Calculate metrics
        train_loss /= len(train_loader)
        val_loss /= len(val_loader)
        train_acc = train_correct / total_train
        val_acc = val_correct / total_val
        
        history['train_loss'].append(train_loss)
        history['train_acc'].append(train_acc)
        history['val_loss'].append(val_loss)
        history['val_acc'].append(val_acc)
        
        if (epoch + 1) % 5 == 0:
            print(f'Epoch {epoch+1}: Train Acc: {train_acc:.3f}, Val Acc: {val_acc:.3f}')
    
    return history

# Train all models
embed_dim = 100
hidden_dim = 128
output_dim = 2  # Binary classification
vocab_size = len(preprocessor.word_to_idx)

models = {}
histories = {}

print("\nStarting model training...")
print("=" * 40)

for model_name, config in model_configs.items():
    print(f"\n🔄 Training {model_name}...")
    
    model = SentimentRNN(
        vocab_size=vocab_size,
        embed_dim=embed_dim,
        hidden_dim=hidden_dim,
        output_dim=output_dim,
        **config
    ).to(device)
    
    history = train_sentiment_model(model, train_loader, val_loader, num_epochs=25)
    
    models[model_name] = model
    histories[model_name] = history
    
    final_val_acc = history['val_acc'][-1]
    print(f"✅ {model_name} completed - Final Val Accuracy: {final_val_acc:.3f}")

print("\n🎉 All models trained!")

In [None]:
# Baseline: TF-IDF + Logistic Regression
print("\nTraining baseline model (TF-IDF + Logistic Regression)...")

# TF-IDF feature extraction
tfidf = TfidfVectorizer(max_features=5000, stop_words='english', ngram_range=(1, 2))
X_train_tfidf = tfidf.fit_transform(train_texts)
X_val_tfidf = tfidf.transform(val_texts)
X_test_tfidf = tfidf.transform(test_texts)

# Logistic Regression
lr_baseline = LogisticRegression(random_state=42, max_iter=1000)
lr_baseline.fit(X_train_tfidf, train_labels)

# Baseline predictions
val_pred_baseline = lr_baseline.predict(X_val_tfidf)
baseline_val_acc = accuracy_score(val_labels, val_pred_baseline)

print(f"Baseline validation accuracy: {baseline_val_acc:.3f}")
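To see what the baseline's features look like, here is a minimal stdlib sketch of unigram TF-IDF. It follows scikit-learn's documented `TfidfVectorizer` defaults: raw term counts, smoothed IDF idf(t) = ln((1 + n) / (1 + df(t))) + 1, and L2-normalised rows. It simplifies the baseline cell in three ways: no stop-word removal, no bigrams, and whitespace tokenization instead of scikit-learn's regex token pattern. The `tfidf` function name is hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Unigram TF-IDF with smoothed IDF and L2-normalised rows (sparse dicts)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}

    rows = []
    for tokens in tokenized:
        weights = {t: c * idf[t] for t, c in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        rows.append({t: w / norm for t, w in weights.items()})
    return rows


rows = tfidf(["great film", "awful film"])
print(rows[0])
```

"film" appears in both documents and gets the lowest weight. The class-specific word "great" dominates the row. This is why a linear model on these features can separate the synthetic classes so easily.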

In [None]:
# Comprehensive evaluation on test set
def evaluate_model(model, test_loader, model_name):
    model.eval()
    predictions = []
    true_labels = []
    
    with torch.no_grad():
        for sequences, labels in test_loader:
            sequences = sequences.to(device)
            outputs = model(sequences)
            predicted = outputs.argmax(1).cpu().numpy()
            
            predictions.extend(predicted)
            true_labels.extend(labels.numpy())
    
    # Calculate metrics
    accuracy = accuracy_score(true_labels, predictions)
    precision, recall, f1, _ = precision_recall_fscore_support(true_labels, predictions, average='binary')
    
    return {
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1': f1,
        'predictions': predictions,
        'true_labels': true_labels
    }

# Evaluate all models
test_results = {}

print("\nEvaluating models on test set...")
print("=" * 40)

for model_name, model in models.items():
    results = evaluate_model(model, test_loader, model_name)
    test_results[model_name] = results
    print(f"{model_name}: Acc={results['accuracy']:.3f}, F1={results['f1']:.3f}")

# Evaluate baseline
test_pred_baseline = lr_baseline.predict(X_test_tfidf)
baseline_accuracy = accuracy_score(test_labels, test_pred_baseline)
baseline_precision, baseline_recall, baseline_f1, _ = precision_recall_fscore_support(
    test_labels, test_pred_baseline, average='binary')

test_results['Baseline (TF-IDF+LR)'] = {
    'accuracy': baseline_accuracy,
    'precision': baseline_precision,
    'recall': baseline_recall,
    'f1': baseline_f1,
    'predictions': test_pred_baseline,
    'true_labels': test_labels
}

print(f"Baseline: Acc={baseline_accuracy:.3f}, F1={baseline_f1:.3f}")
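The four reported metrics all derive from the binary confusion counts. The plotting cell's `confusion_matrix` lays these out as `[[tn, fp], [fn, tp]]`, with rows for actual labels and columns for predicted ones. The sketch below recomputes accuracy, precision, recall, and F1 from those counts. `binary_metrics` is a hypothetical helper. Like scikit-learn's default behaviour, it treats 0/0 as 0.0, although scikit-learn also emits a warning in that case.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion counts plus accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("need at least one prediction")
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Undefined ratios (0/0) are reported as 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```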

In [None]:
# Comprehensive visualization
fig, axes = plt.subplots(2, 3, figsize=(20, 12))
fig.suptitle('RNN Sentiment Analysis Results', fontsize=16, fontweight='bold')

colors = plt.cm.Set2(np.linspace(0, 1, len(model_configs)))
model_colors = dict(zip(model_configs.keys(), colors))

# 1. Training curves - Validation Accuracy
ax = axes[0, 0]
for model_name, history in histories.items():
    ax.plot(history['val_acc'], color=model_colors[model_name], 
           linewidth=2, label=model_name)

ax.axhline(y=baseline_val_acc, color='red', linestyle='--', 
          linewidth=2, label='Baseline (TF-IDF)')
ax.set_xlabel('Epoch')
ax.set_ylabel('Validation Accuracy')
ax.set_title('Learning Curves')
ax.legend()
ax.grid(True, alpha=0.3)

# 2. Test Performance Comparison
ax = axes[0, 1]
model_names = list(test_results.keys())
accuracies = [test_results[name]['accuracy'] for name in model_names]
f1_scores = [test_results[name]['f1'] for name in model_names]

x = np.arange(len(model_names))
width = 0.35

bars1 = ax.bar(x - width/2, accuracies, width, label='Accuracy', alpha=0.8)
bars2 = ax.bar(x + width/2, f1_scores, width, label='F1-Score', alpha=0.8)

ax.set_xlabel('Models')
ax.set_ylabel('Score')
ax.set_title('Test Performance Comparison')
ax.set_xticks(x)
ax.set_xticklabels(model_names, rotation=45, ha='right')
ax.legend()
ax.grid(True, alpha=0.3)

# Add value labels
for bars in [bars1, bars2]:
    for bar in bars:
        height = bar.get_height()
        ax.text(bar.get_x() + bar.get_width()/2., height + 0.005,
               f'{height:.3f}', ha='center', va='bottom', fontsize=8)

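# 3. Confusion Matrix for Best RNN Model
# Select among the trained RNNs only (the baseline has no entry in `models`, which
# analyze_predictions needs later) and use final validation accuracy, so that the
# test set is not also used for model selection
best_model_name = max(models.keys(), key=lambda name: histories[name]['val_acc'][-1])
best_predictions = test_results[best_model_name]['predictions']
best_true_labels = test_results[best_model_name]['true_labels']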

ax = axes[0, 2]
cm = confusion_matrix(best_true_labels, best_predictions)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=ax)
ax.set_xlabel('Predicted')
ax.set_ylabel('Actual')
ax.set_title(f'Confusion Matrix - {best_model_name}')

# 4. Training Loss Curves
ax = axes[1, 0]
for model_name, history in histories.items():
    ax.plot(history['train_loss'], color=model_colors[model_name], 
           linewidth=2, label=f'{model_name} (Train)', alpha=0.7)
    ax.plot(history['val_loss'], color=model_colors[model_name], 
           linewidth=2, linestyle='--', label=f'{model_name} (Val)')

ax.set_xlabel('Epoch')
ax.set_ylabel('Loss')
ax.set_title('Training and Validation Loss')
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
ax.grid(True, alpha=0.3)

# 5. Model Complexity Analysis
ax = axes[1, 1]
param_counts = []
for model_name in model_configs.keys():
    model = models[model_name]
    num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    param_counts.append(num_params)

rnn_accuracies = [test_results[name]['accuracy'] for name in model_configs.keys()]

scatter = ax.scatter(param_counts, rnn_accuracies, c=colors, s=150, alpha=0.7)
for i, model_name in enumerate(model_configs.keys()):
    ax.annotate(model_name, (param_counts[i], rnn_accuracies[i]), 
               xytext=(5, 5), textcoords='offset points', fontweight='bold')

ax.set_xlabel('Number of Parameters')
ax.set_ylabel('Test Accuracy')
ax.set_title('Model Complexity vs Performance')
ax.grid(True, alpha=0.3)

# 6. Performance Summary Table
ax = axes[1, 2]
ax.axis('tight')
ax.axis('off')

table_data = []
for model_name in model_names:
    result = test_results[model_name]
    table_data.append([
        model_name,
        f"{result['accuracy']:.3f}",
        f"{result['precision']:.3f}",
        f"{result['recall']:.3f}",
        f"{result['f1']:.3f}"
    ])

table = ax.table(cellText=table_data,
                colLabels=['Model', 'Accuracy', 'Precision', 'Recall', 'F1'],
                cellLoc='center',
                loc='center')
table.auto_set_font_size(False)
table.set_fontsize(10)
table.scale(1.2, 2)
ax.set_title('Detailed Performance Metrics', fontweight='bold')

plt.tight_layout()
plt.show()

In [None]:
# Prediction examples and error analysis
def analyze_predictions(model, preprocessor, test_texts, test_labels, model_name, num_examples=10):
    model.eval()
    
    print(f"\n🔍 PREDICTION ANALYSIS - {model_name}")
    print("=" * 60)
    
    correct_examples = []
    incorrect_examples = []
    
    for i, (text, true_label) in enumerate(zip(test_texts, test_labels)):
        sequence = preprocessor.text_to_sequence(text).unsqueeze(0).to(device)
        
        with torch.no_grad():
            output = model(sequence)
            predicted_label = output.argmax(1).item()
            confidence = torch.softmax(output, dim=1).max().item()
        
        example = {
            'text': text,
            'true_label': true_label,
            'predicted_label': predicted_label,
            'confidence': confidence
        }
        
        if predicted_label == true_label:
            correct_examples.append(example)
        else:
            incorrect_examples.append(example)
    
    # Show high-confidence correct predictions
    print("\n✅ HIGH-CONFIDENCE CORRECT PREDICTIONS:")
    correct_examples_sorted = sorted(correct_examples, key=lambda x: x['confidence'], reverse=True)
    for i, ex in enumerate(correct_examples_sorted[:3]):
        sentiment = "Positive" if ex['predicted_label'] == 1 else "Negative"
        print(f"{i+1}. [{sentiment}] (Confidence: {ex['confidence']:.3f})")
        print(f"   Text: {ex['text']}")
        print()
    
    # Show high-confidence incorrect predictions
    print("❌ HIGH-CONFIDENCE INCORRECT PREDICTIONS:")
    incorrect_examples_sorted = sorted(incorrect_examples, key=lambda x: x['confidence'], reverse=True)
    for i, ex in enumerate(incorrect_examples_sorted[:3]):
        true_sentiment = "Positive" if ex['true_label'] == 1 else "Negative"
        pred_sentiment = "Positive" if ex['predicted_label'] == 1 else "Negative"
        print(f"{i+1}. Predicted: [{pred_sentiment}], True: [{true_sentiment}] (Confidence: {ex['confidence']:.3f})")
        print(f"   Text: {ex['text']}")
        print()

# Analyze predictions for the best model
best_model = models[best_model_name]
analyze_predictions(best_model, preprocessor, test_texts, test_labels, best_model_name)

# Also show some examples with the baseline
print("\n🔍 BASELINE PREDICTIONS (TF-IDF + Logistic Regression)")
print("=" * 60)

baseline_correct = []
baseline_incorrect = []

for i, (text, true_label, pred_label) in enumerate(zip(test_texts, test_labels, test_pred_baseline)):
    if pred_label == true_label:
        baseline_correct.append((text, true_label, pred_label))
    else:
        baseline_incorrect.append((text, true_label, pred_label))

print("\n✅ CORRECT BASELINE PREDICTIONS:")
for i, (text, true_label, pred_label) in enumerate(baseline_correct[:3]):
    sentiment = "Positive" if pred_label == 1 else "Negative"
    print(f"{i+1}. [{sentiment}] - {text}")

print("\n❌ INCORRECT BASELINE PREDICTIONS:")
for i, (text, true_label, pred_label) in enumerate(baseline_incorrect[:3]):
    true_sentiment = "Positive" if true_label == 1 else "Negative"
    pred_sentiment = "Positive" if pred_label == 1 else "Negative"
    print(f"{i+1}. Predicted: [{pred_sentiment}], True: [{true_sentiment}]")
    print(f"   Text: {text}")
    print()
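`analyze_predictions` reports the largest softmax probability as the model's confidence. The stdlib sketch below reproduces that computation for a single logit vector. The helper names are hypothetical. With two classes, this confidence depends only on the gap between the two logits: it equals 1 / (1 + exp(-|z1 - z0|)). A model that produces large logit gaps will therefore report confidence near 1.0 whether or not its prediction is correct.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the largest logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_confidence(logits):
    """Return (argmax class, its softmax probability), mirroring analyze_predictions."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]


label, conf = predict_with_confidence([0.5, 2.5])
print(label, round(conf, 4))
```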

## RNN Sentiment Analysis - Comprehensive Analysis

### Key Findings

#### 1. Model Performance Comparison
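- **Bidirectional models** are expected to match or outperform their unidirectional counterparts, since each prediction can draw on context from both directions
- **LSTM/GRU** typically beat the vanilla RNN, whose hidden state struggles to retain information across many time steps
- **BiLSTM** is often the strongest RNN variant, but the ranking can change between runs; check the test-set table rather than assuming it
- On this synthetic dataset, sentiment is signalled mainly by the presence of class-specific words, so all models (including the baseline) may score close to one another; small differences should not be over-interpreted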

#### 2. Architecture Advantages

**Bidirectional RNNs**:
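- Each time step sees both the preceding and the following words, which can help when a sentiment word is modified later in the sentence
- On real reviews this can help with negation and context-dependent sentiment (the synthetic reviews here contain no negation, so this notebook does not test it)
- Roughly twice the recurrent parameters and computation of a unidirectional layer with the same hidden size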
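A bidirectional layer runs one RNN left to right and a second one right to left, then aligns the two per position. The forward state at position t summarises the prefix up to t, and the backward state at t summarises the suffix from t onward. A sentence-level summary typically concatenates the forward state at the last real token with the backward state at the first token, because each of those has read the whole sequence.

The toy scalar cell below (hypothetical `scan` with made-up weights `w`, `u`, `b`) stands in for a real GRU or LSTM and only illustrates this alignment.

```python
import math


def scan(inputs, w=1.0, u=0.5, b=0.0):
    """Toy scalar RNN h_t = tanh(w*x_t + u*h_{t-1} + b); returns every hidden state."""
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(w * x + u * h + b)
        states.append(h)
    return states


def bidirectional_states(inputs):
    """Per-position (forward, backward) pairs, as a bidirectional RNN produces them."""
    fwd = scan(inputs)
    bwd = scan(inputs[::-1])[::-1]  # run on the reversed input, then re-align positions
    return list(zip(fwd, bwd))


pairs = bidirectional_states([1.0, -2.0, 0.5])
# Sentence summary: forward state at the end + backward state at the start
summary = (pairs[-1][0], pairs[0][1])
print(summary)
```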

**LSTM vs GRU**:
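- LSTM: separate cell state plus input, forget, and output gates; often preferred for long-range dependencies
- GRU: reset and update gates with no separate cell state, giving about 25% fewer recurrent parameters than an LSTM of the same hidden size; often competitive on shorter texts
- Both typically outperform the vanilla RNN, which is the most affected by vanishing gradients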
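The efficiency difference can be made concrete by counting parameters. An LSTM layer has four input-to-hidden/hidden-to-hidden weight blocks, a GRU has three, and a vanilla RNN has one. At equal hidden size, a GRU layer therefore has exactly 3/4 of an LSTM layer's recurrent parameters.

The sketch below assumes PyTorch's parameterization, which `nn.RNN`, `nn.GRU`, and `nn.LSTM` use: separate `bias_ih` and `bias_hh` vectors. It also assumes the single-layer `SentimentRNN` layout. The function names are hypothetical. Its totals can be cross-checked against `sum(p.numel() for p in model.parameters())` from the complexity panel.

```python
GATES = {"RNN": 1, "GRU": 3, "LSTM": 4}  # weight blocks per recurrent cell type


def rnn_param_count(cell, input_dim, hidden_dim, num_layers=1, bidirectional=False):
    """Trainable parameters of a PyTorch-style recurrent layer stack (both bias vectors)."""
    g = GATES[cell]
    dirs = 2 if bidirectional else 1
    total = 0
    for layer in range(num_layers):
        # Layers after the first receive the concatenated outputs of all directions
        in_dim = input_dim if layer == 0 else hidden_dim * dirs
        per_direction = g * hidden_dim * (in_dim + hidden_dim) + 2 * g * hidden_dim
        total += dirs * per_direction
    return total


def sentiment_model_params(cell, vocab_size, embed_dim, hidden_dim, output_dim,
                           bidirectional=False):
    """Embedding + one recurrent layer + linear head, as in SentimentRNN."""
    dirs = 2 if bidirectional else 1
    embedding = vocab_size * embed_dim
    head = hidden_dim * dirs * output_dim + output_dim
    return embedding + rnn_param_count(cell, embed_dim, hidden_dim, 1, bidirectional) + head


# Recurrent-layer sizes for the notebook's embed_dim=100, hidden_dim=128
for cell in ("RNN", "GRU", "LSTM"):
    print(cell, rnn_param_count(cell, input_dim=100, hidden_dim=128))
```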

#### 3. Comparison with Traditional ML
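- **RNNs**: process tokens in order, so they can in principle model word order and context beyond fixed n-grams
- **TF-IDF baseline**: uses only unigram and bigram counts, yet it is a strong reference here because each synthetic class is largely identified by its own vocabulary
- **RNN benefit**: sequential processing can capture how sentiment develops across a review; whether that pays off should be judged against the baseline's score on the same test set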

#### 4. Implementation Insights

**Text Preprocessing**:
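- Vocabulary size: a trade-off between coverage (fewer `<UNK>` tokens) and embedding size/efficiency
- Sequence length: shorter limits are cheaper, longer limits keep more context. Here `max_length=50` exceeds the longest synthetic review (under 20 tokens), so nothing is truncated, but most positions are padding
- Padding: post-padding with `<PAD>` (index 0) gives every sequence the same length for batching; `padding_idx=0` keeps the padding embedding fixed at zero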
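Zero padding embeddings do not make padding harmless for an RNN. The bias and recurrent terms keep updating the hidden state at every `<PAD>` step. With post-padding, the state at the final position (`rnn_output[:, -1]`) can therefore drift away from the state at the last real token, and for a vanilla RNN it converges toward an input-independent fixed point. Packing the batch (`torch.nn.utils.rnn.pack_padded_sequence`) or indexing each sequence at its true length avoids this.

The toy scalar RNN below has made-up weights `w`, `u`, `b` and hypothetical helper names. It illustrates the effect on two "reviews" with opposite inputs.

```python
import math


def run_rnn(inputs, w=1.0, u=0.5, b=-0.3, h0=0.0):
    """Toy scalar Elman RNN h_t = tanh(w*x_t + u*h_{t-1} + b); returns every hidden state."""
    h, states = h0, []
    for x in inputs:
        h = math.tanh(w * x + u * h + b)
        states.append(h)
    return states


def final_states(tokens, max_length=10):
    """Return (state at the last real token, state at the last padded position)."""
    padded = tokens + [0.0] * (max_length - len(tokens))  # <PAD> embeds to 0
    states = run_rnn(padded)
    return states[len(tokens) - 1], states[-1]


true_a, naive_a = final_states([2.0, 1.5, 1.0])     # toy "positive" review
true_b, naive_b = final_states([-2.0, -1.5, -1.0])  # toy "negative" review
print(f"at last real token:  {true_a:+.3f} vs {true_b:+.3f}")
print(f"at last padded step: {naive_a:+.3f} vs {naive_b:+.3f}")
```

The two states are far apart right after the last real token but nearly indistinguishable after seven padding steps. Gated cells can partly learn to carry state through padding, but they still spend computation on it.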

**Training Considerations**:
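- Gradient clipping: bounds the total gradient norm (here `max_norm=1.0`) to guard against exploding gradients, a common failure mode when backpropagating through time
- Dropout: applied here to the embeddings and to the final hidden state before the classifier; PyTorch's recurrent `dropout` argument only acts between stacked layers, so it is disabled for `num_layers=1`
- Learning rate: this notebook uses Adam with `lr=0.001`; if training is unstable, lowering the learning rate is a common first adjustment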
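Clipping by global norm rescales all gradients by one common factor, so the update direction is preserved and only its length is capped. The stdlib sketch below follows the documented rule of `torch.nn.utils.clip_grad_norm_` with the default L2 norm: if the total norm exceeds `max_norm`, every gradient is multiplied by `max_norm / (total_norm + eps)`. The function returns the norm measured before clipping. Gradients are represented here as plain lists of floats, and `clip_by_global_norm` is a hypothetical name.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale gradient vectors so their combined L2 norm is at most max_norm.

    Returns (clipped gradients, total norm before clipping).
    """
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    scale = max_norm / (total_norm + eps)
    if scale < 1.0:  # only shrink, never enlarge
        grads = [[g * scale for g in vec] for vec in grads]
    return grads, total_norm


clipped, norm = clip_by_global_norm([[3.0, 0.0], [0.0, 4.0]], max_norm=1.0)
print(norm, clipped)
```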

#### 5. Error Analysis
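- **Common errors on real reviews**: sarcasm, negation, and context-dependent sentiment. The synthetic reviews contain none of these; misclassifications here are more likely to come from reviews whose placeholders were filled only with neutral words in a template shared by both classes
- **Model confidence**: high confidence does not guarantee correctness; inspect the high-confidence errors printed by the analysis cell
- **Baseline comparison**: RNNs are expected to help most on complex linguistic patterns, which this dataset largely lacks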

### Practical Applications

1. **Social Media Monitoring**: Real-time sentiment analysis
2. **Product Reviews**: Customer feedback analysis
3. **Financial News**: Market sentiment tracking
4. **Content Moderation**: Detecting negative content

### Best Practices for RNN Sentiment Analysis

1. **Consider BiLSTM** when accuracy matters and computational resources allow
2. **Use GRU** for faster training with often competitive performance
3. **Implement a proper text preprocessing** pipeline
4. **Apply gradient clipping** to prevent exploding gradients
5. **Use early stopping** based on validation performance (the training loop in this notebook runs a fixed 25 epochs and keeps the final weights)
6. **Compare with simpler baselines** to check whether deep learning actually helps
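Early stopping can be sketched as a small patience counter. It tracks the best validation score, counts epochs without improvement, and signals a stop once that count reaches `patience`. The class below is a hypothetical helper, not part of the notebook, and it assumes a metric where higher is better, such as validation accuracy.

```python
class EarlyStopping:
    """Signal a stop when the validation score has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # minimum increase that counts as an improvement
        self.best_score = None
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, score, epoch):
        """Record this epoch's score; return True if training should stop."""
        if self.best_score is None or score > self.best_score + self.min_delta:
            self.best_score, self.best_epoch = score, epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2)
val_accs = [0.70, 0.80, 0.85, 0.84, 0.85, 0.83, 0.90]
for epoch, acc in enumerate(val_accs):
    if stopper.step(acc, epoch):
        break
print(epoch, stopper.best_epoch, stopper.best_score)
```

Training stops at epoch 4, after two epochs without improving on 0.85, so the later 0.90 is never seen. That is the cost of a small `patience`.

One way to use this inside `train_sentiment_model` is to call `stopper.step(val_acc, epoch)` after each validation pass. Whenever `stopper.best_epoch == epoch`, save `copy.deepcopy(model.state_dict())`, then reload that state with `model.load_state_dict(...)` after the loop.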