# Comparative Study of Transformer-Based Models and Bi-LSTM for Bangla Sentiment Analysis Using Hybrid Optimizers

## Dataset Details

### Dataset Characteristics
- **Total Size**: 35,000 labeled text samples
- **Source**: Social media platforms, news portals, and product reviews
- **Language**: Bangla (Bengali)
- **Distribution**: Balanced across three sentiment categories

### Class Distribution
| Sentiment Class | Number of Samples |
|-----------------|-------------------|
| Positive | 11,700 |
| Negative | 11,600 |
| Neutral | 11,700 |
| **Total** | **35,000** |

### Data Split
- **Training**: 80% (28,000 samples)
- **Validation**: 10% (3,500 samples)
- **Testing**: 10% (3,500 samples)

### Preprocessing Steps
- Tokenization (Bangla-specific)
- Lowercasing
- Stopword removal
- Punctuation and special character removal
- Text normalization

## Best Performing Model

### Model Architecture
**Bangla-RoBERTa with Hybrid Optimizer (AdamW + Lookahead)**

### Model Configuration
| Parameter | Value |
|-----------|-------|
| Model | Bangla-RoBERTa |
| Max Sequence Length | 128 tokens |
| Optimizer | Hybrid (AdamW + Lookahead) |
| Learning Rate | 2e-5 |
| Batch Size | 32 |
| Epochs | 4 |
| Dropout Rate | 0.5 |

### Performance Results

#### Model Comparison
| Model | Optimizer | Accuracy (%) | F1-Score | Convergence Speed (Epochs) |
|-------|-----------|--------------|----------|---------------------------|
| Bi-LSTM | Adam | 84.7 | 0.825 | 12 |
| Bi-LSTM | Hybrid | 86.3 | 0.841 | 9 |
| BERT | Adam | 90.5 | 0.894 | 8 |
| BERT | Hybrid | 91.8 | 0.909 | 6 |
| RoBERTa | Adam | 92.3 | 0.917 | 7 |
| **RoBERTa** | **Hybrid** | **93.6** | **0.931** | **5** |

### Key Findings
- Transformer models (BERT/RoBERTa) significantly outperform Bi-LSTM
- Hybrid optimizers improve both accuracy and convergence speed
- RoBERTa with hybrid optimizer achieves the best performance: **93.6% accuracy**
- Fastest convergence: Only 5 epochs needed with hybrid optimizer

### Confusion Matrix (Best Model)
| Actual/Predicted | Positive | Negative |
|------------------|----------|----------|
| Positive | 890 (TP) | 110 (FN) |
| Negative | 90 (FP) | 910 (TN) |

### Resource Efficiency
- **Bi-LSTM**: Lower memory usage, faster training time
- **RoBERTa**: Higher memory requirements but superior performance
- Trade-off between performance and computational resources

# Cell 1: Import necessary libraries


In [None]:
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from transformers import (
    AutoTokenizer, 
    AutoModelForSequenceClassification,
    AdamW,
    get_linear_schedule_with_warmup
)
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, classification_report, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt
from tqdm import tqdm
import warnings
warnings.filterwarnings('ignore')

# Check if GPU is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Cell 2: Implement Lookahead Optimizer


In [None]:
class Lookahead(object):
    """Lookahead optimizer implementation"""
    def __init__(self, base_optimizer, alpha=0.5, k=5):
        self.optimizer = base_optimizer
        self.alpha = alpha
        self.k = k
        self.step_count = 0
        self.slow_weights = [[p.clone().detach() for p in group['params']] 
                            for group in self.optimizer.param_groups]
        
    def step(self, closure=None):
        loss = self.optimizer.step(closure)
        self.step_count += 1
        
        if self.step_count % self.k == 0:
            for group, slow_weights in zip(self.optimizer.param_groups, self.slow_weights):
                for p, q in zip(group['params'], slow_weights):
                    q.data.add_(self.alpha, p.data - q.data)
                    p.data.copy_(q.data)
        return loss
    
    def state_dict(self):
        return self.optimizer.state_dict()
    
    def load_state_dict(self, state_dict):
        self.optimizer.load_state_dict(state_dict)
        
    def zero_grad(self):
        self.optimizer.zero_grad()
        
    @property
    def param_groups(self):
        return self.optimizer.param_groups

# Cell 3: Load and preprocess dataset


In [None]:
# Load your dataset
df = pd.read_csv('/kaggle/input/final-dataset/final-dataset.csv')

# Display dataset info
print(f"Dataset shape: {df.shape}")
print(f"\nSentiment distribution:")
print(df['Polarity'].value_counts())

# Preprocessing function
def preprocess_bangla_text(text):
    """Preprocess Bangla text according to the paper"""
    if pd.isna(text):
        return ""
    
    text = str(text)
    # Convert to lowercase
    text = text.lower()
    
    # Remove extra spaces
    text = ' '.join(text.split())
    
    return text

# Apply preprocessing
df['processed_text'] = df['Text'].apply(preprocess_bangla_text)

# Remove empty texts
df = df[df['processed_text'].str.len() > 0]

# Encode labels
label_mapping = {'positive': 0, 'negative': 1, 'neutral': 2}
df['label'] = df['Polarity'].map(label_mapping)

print(f"\nProcessed dataset shape: {df.shape}")

# Cell 4: Create PyTorch Dataset


In [None]:
class BanglaSentimentDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_length=128):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length
        
    def __len__(self):
        return len(self.texts)
    
    def __getitem__(self, idx):
        text = str(self.texts[idx])
        label = self.labels[idx]
        
        encoding = self.tokenizer(
            text,
            truncation=True,
            padding='max_length',
            max_length=self.max_length,
            return_tensors='pt'
        )
        
        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'label': torch.tensor(label, dtype=torch.long)
        }

# Cell 5: Split data and create data loaders


In [None]:
# Split data
X_train, X_temp, y_train, y_temp = train_test_split(
    df['processed_text'].values, 
    df['label'].values, 
    test_size=0.2, 
    random_state=42,
    stratify=df['label']
)

X_val, X_test, y_val, y_test = train_test_split(
    X_temp, 
    y_temp, 
    test_size=0.5, 
    random_state=42,
    stratify=y_temp
)

print(f"Training set: {len(X_train)}")
print(f"Validation set: {len(X_val)}")
print(f"Test set: {len(X_test)}")

# Load tokenizer
# Note: Replace with actual Bangla-RoBERTa model name
model_name = "csebuetnlp/banglishbert"  # Using BanglishBERT as proxy
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create datasets
train_dataset = BanglaSentimentDataset(X_train, y_train, tokenizer)
val_dataset = BanglaSentimentDataset(X_val, y_val, tokenizer)
test_dataset = BanglaSentimentDataset(X_test, y_test, tokenizer)

# Create data loaders
batch_size = 32
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size)
test_loader = DataLoader(test_dataset, batch_size=batch_size)

# Cell 6: Load and configure the model


In [None]:
# Load pre-trained model
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=3,  # 3 sentiment classes
    output_attentions=False,
    output_hidden_states=False
)

model.to(device)

# Configure optimizer parameters
learning_rate = 2e-5
num_epochs = 4
warmup_steps = len(train_loader) * 2  # Warmup for 2 epochs

# Create AdamW optimizer
optimizer_base = AdamW(
    model.parameters(),
    lr=learning_rate,
    eps=1e-8,
    weight_decay=0.01  # AdamW includes weight decay
)

# Wrap with Lookahead
optimizer = Lookahead(optimizer_base, alpha=0.5, k=5)

# Create learning rate scheduler
total_steps = len(train_loader) * num_epochs
scheduler = get_linear_schedule_with_warmup(
    optimizer_base,  # Use base optimizer for scheduler
    num_warmup_steps=warmup_steps,
    num_training_steps=total_steps
)

print(f"Model loaded: {model_name}")
print(f"Total training steps: {total_steps}")

# Cell 7: Define training and evaluation functions


In [None]:
def train_epoch(model, data_loader, optimizer, scheduler, device):
    model.train()
    total_loss = 0
    predictions = []
    true_labels = []
    
    for batch in tqdm(data_loader, desc="Training"):
        # Move batch to device
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['label'].to(device)
        
        # Zero gradients
        optimizer.zero_grad()
        
        # Forward pass
        outputs = model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            labels=labels
        )
        
        loss = outputs.loss
        total_loss += loss.item()
        
        # Backward pass
        loss.backward()
        
        # Clip gradients
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        
        # Update weights
        optimizer.step()
        scheduler.step()
        
        # Get predictions
        _, preds = torch.max(outputs.logits, dim=1)
        predictions.extend(preds.cpu().numpy())
        true_labels.extend(labels.cpu().numpy())
    
    avg_loss = total_loss / len(data_loader)
    accuracy = accuracy_score(true_labels, predictions)
    f1 = f1_score(true_labels, predictions, average='weighted')
    
    return avg_loss, accuracy, f1

def evaluate(model, data_loader, device):
    model.eval()
    total_loss = 0
    predictions = []
    true_labels = []
    
    with torch.no_grad():
        for batch in tqdm(data_loader, desc="Evaluating"):
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['label'].to(device)
            
            outputs = model(
                input_ids=input_ids,
                attention_mask=attention_mask,
                labels=labels
            )
            
            loss = outputs.loss
            total_loss += loss.item()
            
            _, preds = torch.max(outputs.logits, dim=1)
            predictions.extend(preds.cpu().numpy())
            true_labels.extend(labels.cpu().numpy())
    
    avg_loss = total_loss / len(data_loader)
    accuracy = accuracy_score(true_labels, predictions)
    f1 = f1_score(true_labels, predictions, average='weighted')
    
    return avg_loss, accuracy, f1, predictions, true_labels

# Cell 8: Train the model


In [None]:
# Training loop
train_losses = []
val_losses = []
train_accuracies = []
val_accuracies = []
train_f1_scores = []
val_f1_scores = []

best_val_f1 = 0
best_model_state = None

for epoch in range(num_epochs):
    print(f"\nEpoch {epoch + 1}/{num_epochs}")
    print("-" * 30)
    
    # Train
    train_loss, train_acc, train_f1 = train_epoch(
        model, train_loader, optimizer, scheduler, device
    )
    
    # Validate
    val_loss, val_acc, val_f1, _, _ = evaluate(model, val_loader, device)
    
    # Store metrics
    train_losses.append(train_loss)
    val_losses.append(val_loss)
    train_accuracies.append(train_acc)
    val_accuracies.append(val_acc)
    train_f1_scores.append(train_f1)
    val_f1_scores.append(val_f1)
    
    print(f"Train Loss: {train_loss:.4f} | Train Acc: {train_acc:.4f} | Train F1: {train_f1:.4f}")
    print(f"Val Loss: {val_loss:.4f} | Val Acc: {val_acc:.4f} | Val F1: {val_f1:.4f}")
    
    # Save best model
    if val_f1 > best_val_f1:
        best_val_f1 = val_f1
        best_model_state = model.state_dict().copy()
        print("New best model saved!")

# Load best model
model.load_state_dict(best_model_state)

# Cell 9: Plot training history


In [None]:
# Plot training history
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Loss plot
axes[0].plot(train_losses, label='Train Loss')
axes[0].plot(val_losses, label='Val Loss')
axes[0].set_title('Model Loss')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Loss')
axes[0].legend()

# Accuracy plot
axes[1].plot(train_accuracies, label='Train Accuracy')
axes[1].plot(val_accuracies, label='Val Accuracy')
axes[1].set_title('Model Accuracy')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Accuracy')
axes[1].legend()

# F1-Score plot
axes[2].plot(train_f1_scores, label='Train F1')
axes[2].plot(val_f1_scores, label='Val F1')
axes[2].set_title('Model F1-Score')
axes[2].set_xlabel('Epoch')
axes[2].set_ylabel('F1-Score')
axes[2].legend()

plt.tight_layout()
plt.show()

# Cell 10: Evaluate on test set


In [None]:
# Evaluate on test set
test_loss, test_acc, test_f1, test_preds, test_labels = evaluate(
    model, test_loader, device
)

print(f"\nTest Results:")
print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_acc:.4f}")
print(f"Test F1-Score: {test_f1:.4f}")

# Classification report
label_names = ['Positive', 'Negative', 'Neutral']
print("\nClassification Report:")
print(classification_report(test_labels, test_preds, target_names=label_names))

# Cell 11: Confusion Matrix


In [None]:
# Create confusion matrix
cm = confusion_matrix(test_labels, test_preds)

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=label_names, yticklabels=label_names)
plt.title('Confusion Matrix - RoBERTa with Hybrid Optimizer')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()

# Calculate per-class metrics
for i, label in enumerate(label_names):
    precision = cm[i, i] / cm[:, i].sum() if cm[:, i].sum() > 0 else 0
    recall = cm[i, i] / cm[i, :].sum() if cm[i, :].sum() > 0 else 0
    f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
    print(f"{label}: Precision={precision:.3f}, Recall={recall:.3f}, F1={f1:.3f}")

# Cell 12: Save the model


In [None]:
# Save model and tokenizer
output_dir = "./bangla_roberta_sentiment_hybrid"
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

# Save optimizer state
torch.save({
    'epoch': num_epochs,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'best_val_f1': best_val_f1,
}, f"{output_dir}/checkpoint.pt")

print(f"Model saved to {output_dir}")

# Cell 13: Inference function


In [None]:
def predict_sentiment(text, model, tokenizer, device):
    """Predict sentiment for new text"""
    model.eval()
    
    # Preprocess text
    text = preprocess_bangla_text(text)
    
    # Tokenize
    encoding = tokenizer(
        text,
        truncation=True,
        padding='max_length',
        max_length=128,
        return_tensors='pt'
    )
    
    # Move to device
    input_ids = encoding['input_ids'].to(device)
    attention_mask = encoding['attention_mask'].to(device)
    
    # Predict
    with torch.no_grad():
        outputs = model(input_ids=input_ids, attention_mask=attention_mask)
        logits = outputs.logits
        probs = torch.softmax(logits, dim=1)
        _, predicted = torch.max(logits, dim=1)
    
    # Get label
    label_names = ['Positive', 'Negative', 'Neutral']
    predicted_label = label_names[predicted.item()]
    confidence = probs[0][predicted.item()].item()
    
    return {
        'text': text,
        'sentiment': predicted_label,
        'confidence': confidence,
        'probabilities': {
            label_names[i]: probs[0][i].item() 
            for i in range(len(label_names))
        }
    }

# Test the function
test_texts = [
    "এই পণ্যটি অসাধারণ",
    "খুবই খারাপ সার্ভিস",
    "মোটামুটি ভালো"
]

for text in test_texts:
    result = predict_sentiment(text, model, tokenizer, device)
    print(f"\nText: {result['text']}")
    print(f"Sentiment: {result['sentiment']} (Confidence: {result['confidence']:.3f})")
    print(f"All probabilities: {result['probabilities']}")