# Twitter Sentiment Analysis - Ensemble Methods

This optional notebook demonstrates how to combine multiple models using ensemble techniques:

1. Majority Voting
2. Weighted Voting (Blending)
3. Model Stacking

We'll use the best models from the previous notebooks to create more robust predictions.

## 1. Setup and Imports

In [0]:
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import pickle
import os
import warnings
warnings.filterwarnings('ignore')

# Machine learning libraries
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, VotingClassifier, StackingClassifier
from sklearn.metrics import accuracy_score, f1_score, classification_report, confusion_matrix
import scipy.sparse as sp

# Deep learning libraries
import torch
import torch.nn as nn

# Visualization settings
plt.style.use('ggplot')
sns.set(style='whitegrid')
%matplotlib inline

## 2. Load Models and Data

In [0]:
# For Google Colab, uncomment these lines to mount Google Drive
# from google.colab import drive
# drive.mount('/content/drive')
# features_dir = '/content/drive/MyDrive/path/to/features'
# models_dir = '/content/drive/MyDrive/path/to/models'
# results_dir = '/content/drive/MyDrive/path/to/results'

# For local development
features_dir = '../data/features'
models_dir = '../models'
results_dir = '../results'
os.makedirs(results_dir, exist_ok=True)

# Load model comparison results
model_results = pd.read_csv(os.path.join(results_dir, 'model_comparison_results.csv'))

# Get the top 5 models, ranked by F1 score (sorted explicitly in case the CSV is unordered)
top_models = model_results.sort_values('F1 Score', ascending=False).head(5)
print("Top 5 performing models:")
top_models

In [0]:
# Load labels
y = np.load(os.path.join(features_dir, 'labels.npy'))

# Load features
X_bow = sp.load_npz(os.path.join(features_dir, 'bow_features.npz'))
X_tfidf = sp.load_npz(os.path.join(features_dir, 'tfidf_features.npz'))
X_word2vec = np.load(os.path.join(features_dir, 'word2vec_features.npy'))
X_glove = np.load(os.path.join(features_dir, 'glove_features.npy'))
X_bert = np.load(os.path.join(features_dir, 'bert_features.npy'))

print("Features loaded successfully.")

# Create train/val/test splits
def split_data(X, y, test_size=0.15, val_size=0.15, random_state=42):
    # First split: training + validation vs test
    X_train_val, X_test, y_train_val, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )
    
    # Second split: training vs validation
    # Adjust validation size to be a percentage of the training + validation set
    val_ratio = val_size / (1 - test_size)
    X_train, X_val, y_train, y_val = train_test_split(
        X_train_val, y_train_val, test_size=val_ratio, random_state=random_state, stratify=y_train_val
    )
    
    return X_train, X_val, X_test, y_train, y_val, y_test

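# Get test sets. This assumes the earlier notebooks used the same split arguments;
# otherwise these test rows may overlap with the data the base models were trained on.
_, _, bow_test, _, _, y_test = split_data(X_bow, y, random_state=42)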
_, _, tfidf_test, _, _, _ = split_data(X_tfidf, y, random_state=42)
_, _, w2v_test, _, _, _ = split_data(X_word2vec, y, random_state=42)
_, _, glove_test, _, _, _ = split_data(X_glove, y, random_state=42)
_, _, bert_test, _, _, _ = split_data(X_bert, y, random_state=42)

print(f"Test set: {y_test.shape[0]} samples")
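
As a sanity check on the split arithmetic, here is a minimal sketch that applies the same two-stage logic to synthetic data. The helper name `two_stage_split`, the array sizes, and the random labels are all made up for illustration. It shows that the 0.15 / 0.85 adjustment yields roughly a 70/15/15 split. It also shows that, with the same labels and seed, two different feature matrices are split on the same rows. The cell above relies on this when it reuses `y_test` for every feature set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def two_stage_split(X, y, test_size=0.15, val_size=0.15, random_state=42):
    """Same two-stage logic as split_data, repeated so this sketch runs on its own."""
    X_tv, X_te, y_tv, y_te = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )
    val_ratio = val_size / (1 - test_size)  # 0.15 / 0.85, about 0.176
    X_tr, X_va, y_tr, y_va = train_test_split(
        X_tv, y_tv, test_size=val_ratio, random_state=random_state, stratify=y_tv
    )
    return X_tr, X_va, X_te, y_tr, y_va, y_te

# Synthetic stand-ins: row IDs as one "feature matrix", a scaled copy as another
n = 1000
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 3, size=n)
X_ids = np.arange(n).reshape(-1, 1)
X_scaled = X_ids * 10

ids_train, ids_val, ids_test, _, _, _ = two_stage_split(X_ids, y_demo)
_, _, scaled_test, _, _, _ = two_stage_split(X_scaled, y_demo)

split_sizes = (len(ids_train), len(ids_val), len(ids_test))
same_rows = np.array_equal(ids_test * 10, scaled_test)
print("train/val/test sizes:", split_sizes)
print("same test rows for both feature sets:", same_rows)
```

If the feature matrices were split with different seeds or different label arrays, the rows would no longer line up and the base model predictions could not be combined sample by sample.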

In [0]:
# Function to load a saved sklearn model
def load_sklearn_model(model_path):
    with open(model_path, 'rb') as f:
        model = pickle.load(f)
    return model

# Load label encoder
with open(os.path.join(models_dir, 'label_encoder.pkl'), 'rb') as f:
    label_encoder = pickle.load(f)

# Display label encoding
print("Label Encoding:")
for i, label in enumerate(label_encoder.classes_):
    print(f"{i} -> {label}")

## 3. Get Model Predictions for Ensemble

In [0]:
# Store base model predictions
base_predictions = {}
base_probabilities = {}

# Load models and get predictions
for _, row in top_models.iterrows():
    model_name = row['Model']
    feature_name = row['Feature']
    model_key = f"{model_name}_{feature_name}"
    
    print(f"Loading {model_name} with {feature_name}...")
    
    # Skip BiLSTM for simplicity
    if model_name == 'BiLSTM':
        print("  Skipping BiLSTM for this ensemble example...")
        continue

    # Get appropriate test features
    if feature_name == 'BoW':
        X_test_features = bow_test
    elif feature_name == 'TF-IDF':
        X_test_features = tfidf_test
    elif feature_name == 'Word2Vec':
        X_test_features = w2v_test
    elif feature_name == 'GloVe':
        X_test_features = glove_test
    elif feature_name == 'BERT':
        X_test_features = bert_test
    else:
        # Avoid silently reusing features left over from a previous iteration
        print(f"  Unknown feature type '{feature_name}', skipping...")
        continue
        
    # Load sklearn model
    model_path = os.path.join(models_dir, f'{model_name.lower()}_{feature_name.lower()}.pkl')
    try:
        model = load_sklearn_model(model_path)
        
        # Get predictions
        y_pred = model.predict(X_test_features)
        base_predictions[model_key] = y_pred
        
        # Get probabilities if available
        if hasattr(model, 'predict_proba'):
            y_proba = model.predict_proba(X_test_features)
            base_probabilities[model_key] = y_proba
        
        # Check accuracy
        accuracy = accuracy_score(y_test, y_pred)
        f1 = f1_score(y_test, y_pred, average='weighted')
        print(f"  Accuracy: {accuracy:.4f}, F1 Score: {f1:.4f}")
    except FileNotFoundError:
        print(f"  Model file not found: {model_path}")
        continue
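
Before probabilities from different models are combined, it is worth confirming that their `predict_proba` columns refer to the same classes in the same order. In scikit-learn the column order follows each fitted model's `classes_` attribute. The sketch below runs that check on two toy models; the toy data and the model choice are assumptions for illustration and are not the models saved by the earlier notebooks.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Tiny, made-up dataset with three classes
X_toy = np.array([[0.0], [0.2], [1.0], [1.2], [2.0], [2.2]])
y_toy = np.array([0, 0, 1, 1, 2, 2])

toy_models = {
    "logreg": LogisticRegression(max_iter=1000).fit(X_toy, y_toy),
    "naive_bayes": GaussianNB().fit(X_toy, y_toy),
}

# Every model should expose the same classes in the same order
reference_classes = next(iter(toy_models.values())).classes_
classes_aligned = all(
    np.array_equal(m.classes_, reference_classes) for m in toy_models.values()
)

# Each probability matrix should be (n_samples, n_classes)
proba_shapes = {name: m.predict_proba(X_toy).shape for name, m in toy_models.items()}
print("classes aligned:", classes_aligned)
print("probability shapes:", proba_shapes)
```

The same two checks could be applied to the loaded models before they are added to `base_probabilities`.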

## 4. Ensemble Methods

### 4.1 Simple Majority Voting

In [0]:
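# Implement majority voting
def majority_vote(predictions_dict):
    """Return the most common label per sample. Ties go to the smallest label."""
    if not predictions_dict:
        raise ValueError("No base model predictions available for voting.")
    # Stack all predictions into shape (n_samples, n_models)
    all_preds = np.column_stack([predictions_dict[model_key] for model_key in predictions_dict])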
    
    # Get majority vote
    majority_preds = np.zeros(all_preds.shape[0], dtype=int)
    for i in range(all_preds.shape[0]):
        unique, counts = np.unique(all_preds[i], return_counts=True)
        majority_preds[i] = unique[np.argmax(counts)]
    
    return majority_preds

# Apply majority voting
majority_preds = majority_vote(base_predictions)

# Evaluate
accuracy = accuracy_score(y_test, majority_preds)
f1 = f1_score(y_test, majority_preds, average='weighted')
print(f"Majority Voting Ensemble - Accuracy: {accuracy:.4f}, F1 Score: {f1:.4f}")
print("\nClassification Report:")
print(classification_report(y_test, majority_preds, target_names=label_encoder.classes_))

# Plot confusion matrix
plt.figure(figsize=(8, 6))
conf_matrix = confusion_matrix(y_test, majority_preds)
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues',
            xticklabels=label_encoder.classes_,
            yticklabels=label_encoder.classes_)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix - Majority Voting Ensemble')
plt.tight_layout()
plt.savefig(os.path.join(results_dir, 'confusion_matrix_majority_voting.png'))
plt.show()
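
The voting rule can be checked by hand on a tiny example. The sketch below is a vectorized variant of the same idea, not the function used above. The name `majority_vote_rows` and the demo array are made up for illustration, and labels are assumed to be integers in `[0, n_classes)`. It also makes the tie behavior visible: when every model disagrees, the smallest label wins.

```python
import numpy as np

def majority_vote_rows(all_preds, n_classes):
    """all_preds: (n_samples, n_models) array of integer labels in [0, n_classes)."""
    # Count votes per class for each row, giving shape (n_samples, n_classes)
    counts = np.apply_along_axis(np.bincount, 1, all_preds, minlength=n_classes)
    # argmax returns the first maximum, so ties resolve to the smallest label
    return counts.argmax(axis=1)

demo_preds = np.array([
    [0, 0, 1],  # clear majority for class 0
    [2, 1, 2],  # clear majority for class 2
    [0, 1, 2],  # three-way tie
])
demo_votes = majority_vote_rows(demo_preds, n_classes=3)
print(demo_votes)  # [0 2 0]
```

With an even number of base models, or with more classes than models, ties become more likely, so the tie rule can have a visible effect on the results.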

### 4.2 Weighted Voting (Blending)

In [0]:
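# Weighted voting: average the predicted class probabilities, weighted per model
def weighted_vote(probabilities_dict, weights_dict):
    """Return the index of the highest weighted-average probability per sample.

    Assumes every probability matrix uses the same class order.
    """
    if not probabilities_dict:
        raise ValueError("No base model probabilities available for weighted voting.")
    # Initialize weighted probabilities with shape (n_samples, n_classes)
    weighted_proba = np.zeros_like(next(iter(probabilities_dict.values())), dtype=float)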
    
    # Get weighted sum of probabilities
    total_weight = 0
    for model_key, weight in weights_dict.items():
        if model_key in probabilities_dict:
            weighted_proba += probabilities_dict[model_key] * weight
            total_weight += weight
    
    # Normalize by total weight
    if total_weight > 0:
        weighted_proba /= total_weight
    
    # Get class with highest probability
    return np.argmax(weighted_proba, axis=1)

# Define weights based on validation performance
# For simplicity, we'll use the F1 scores from model_results
weights = {}
for _, row in top_models.iterrows():
    model_name = row['Model']
    feature_name = row['Feature']
    model_key = f"{model_name}_{feature_name}"
    
    # Skip BiLSTM for this example
    if model_name == 'BiLSTM':
        continue
        
    # Use F1 score as weight
    weights[model_key] = row['F1 Score']

print("Model weights based on validation F1 scores:")
for model_key, weight in weights.items():
    print(f"{model_key}: {weight:.4f}")

# Apply weighted voting
weighted_preds = weighted_vote(base_probabilities, weights)

# Evaluate
accuracy = accuracy_score(y_test, weighted_preds)
f1 = f1_score(y_test, weighted_preds, average='weighted')
print(f"\nWeighted Voting Ensemble - Accuracy: {accuracy:.4f}, F1 Score: {f1:.4f}")
print("\nClassification Report:")
print(classification_report(y_test, weighted_preds, target_names=label_encoder.classes_))

# Plot confusion matrix
plt.figure(figsize=(8, 6))
conf_matrix = confusion_matrix(y_test, weighted_preds)
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues',
            xticklabels=label_encoder.classes_,
            yticklabels=label_encoder.classes_)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix - Weighted Voting Ensemble')
plt.tight_layout()
plt.savefig(os.path.join(results_dir, 'confusion_matrix_weighted_voting.png'))
plt.show()
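
A small worked example may help show what the weighted average does. The probabilities and weights below are invented for illustration. For the first sample, model A alone would predict class 0, but the weighted blend picks class 1 because model B is confident in it.

```python
import numpy as np

# Made-up probabilities for 2 samples and 3 classes
proba_a = np.array([[0.6, 0.3, 0.1],
                    [0.2, 0.5, 0.3]])
proba_b = np.array([[0.1, 0.7, 0.2],
                    [0.3, 0.3, 0.4]])
demo_weights = np.array([0.9, 0.6])  # e.g. each model's F1 score

# Stack to shape (n_models, n_samples, n_classes), then average over models
stacked = np.stack([proba_a, proba_b])
blended = np.average(stacked, axis=0, weights=demo_weights)
demo_labels = blended.argmax(axis=1)

print(blended)      # rows: [0.4, 0.46, 0.14] and [0.24, 0.42, 0.34]
print(demo_labels)  # [1 1]
```

Because the weights are normalized by their sum, only their ratios matter. F1 scores that are all close to each other therefore produce a result very similar to an unweighted average.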

### 4.3 Model Stacking (Simplified)

In [0]:
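# For a complete stacking implementation, you would train the base models on a training set,
# generate out-of-fold (or validation-set) predictions, and train a meta-classifier on those.
# Here we use a simplified approach for demonstration: the meta-classifier is trained on 70%
# of the test set and evaluated on the remaining 30%, so its scores cover fewer samples than
# the other ensembles and are not directly comparable with them.

# Create a dataframe with base model predictions
base_preds_df = pd.DataFrame()
for model_key, preds in base_predictions.items():
    base_preds_df[model_key] = preds

# One-hot encode the predicted labels so the meta-classifier does not treat class IDs as ordered numbers
base_preds_df = pd.get_dummies(base_preds_df, columns=list(base_preds_df.columns), dtype=float)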

# Split into train/test for meta-classifier
X_meta_train, X_meta_test, y_meta_train, y_meta_test = train_test_split(
    base_preds_df, y_test, test_size=0.3, random_state=42, stratify=y_test
)

# Train meta-classifier
meta_clf = LogisticRegression(max_iter=1000)
meta_clf.fit(X_meta_train, y_meta_train)

# Make predictions
stacking_preds = meta_clf.predict(X_meta_test)

# Evaluate
accuracy = accuracy_score(y_meta_test, stacking_preds)
f1 = f1_score(y_meta_test, stacking_preds, average='weighted')
print(f"Stacking Ensemble - Accuracy: {accuracy:.4f}, F1 Score: {f1:.4f}")
print("\nClassification Report:")
print(classification_report(y_meta_test, stacking_preds, target_names=label_encoder.classes_))

# Plot confusion matrix
plt.figure(figsize=(8, 6))
conf_matrix = confusion_matrix(y_meta_test, stacking_preds)
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues',
            xticklabels=label_encoder.classes_,
            yticklabels=label_encoder.classes_)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix - Stacking Ensemble')
plt.tight_layout()
plt.savefig(os.path.join(results_dir, 'confusion_matrix_stacking.png'))
plt.show()
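
For comparison, here is a minimal sketch of the fuller stacking procedure described in the comments at the top of this cell, using scikit-learn's `StackingClassifier`. It trains the meta-classifier on out-of-fold base model probabilities, so the test set is used only for evaluation. The synthetic dataset and the two base models are assumptions for illustration; they stand in for the pickled models and text features used in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic 3-class problem standing in for the sentiment features
X_syn, y_syn = make_classification(
    n_samples=600, n_features=20, n_informative=8, n_classes=3, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.25, stratify=y_syn, random_state=0
)

stack = StackingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",  # meta-features are class probabilities
    cv=5,                          # out-of-fold predictions for the meta-classifier
)
stack.fit(X_tr, y_tr)

stack_preds = stack.predict(X_te)
stack_f1 = f1_score(y_te, stack_preds, average="weighted")
print(f"Stacking (out-of-fold) weighted F1: {stack_f1:.4f}")
```

This approach refits the base models, so it needs the training features rather than only the saved predictions. The simplified version above avoids that cost but uses part of the test set for training.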

## 5. Compare Ensemble Results

In [0]:
# Create a comparison table
# First, extract base model performance
base_results = []
for model_key, preds in base_predictions.items():
    accuracy = accuracy_score(y_test, preds)
    f1 = f1_score(y_test, preds, average='weighted')
    base_results.append({
        'Model': model_key,
        'Type': 'Base',
        'Accuracy': accuracy,
        'F1 Score': f1
    })

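# Then add ensemble results.
# Note: stacking is scored on the held-out 30% of the test set (y_meta_test),
# while the other rows use the full test set, so the comparison is only approximate.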
ensemble_results = [
    {
        'Model': 'Majority Voting',
        'Type': 'Ensemble',
        'Accuracy': accuracy_score(y_test, majority_preds),
        'F1 Score': f1_score(y_test, majority_preds, average='weighted')
    },
    {
        'Model': 'Weighted Voting',
        'Type': 'Ensemble',
        'Accuracy': accuracy_score(y_test, weighted_preds),
        'F1 Score': f1_score(y_test, weighted_preds, average='weighted')
    },
    {
        'Model': 'Stacking',
        'Type': 'Ensemble',
        'Accuracy': accuracy_score(y_meta_test, stacking_preds),
        'F1 Score': f1_score(y_meta_test, stacking_preds, average='weighted')
    }
]

# Combine results
all_results = pd.DataFrame(base_results + ensemble_results)

# Sort by F1 score
all_results = all_results.sort_values('F1 Score', ascending=False).reset_index(drop=True)

# Display results
print("Model Performance Comparison:")
all_results

In [0]:
# Visualize results
plt.figure(figsize=(12, 6))
sns.barplot(x='Model', y='F1 Score', hue='Type', data=all_results)
plt.title('Model Performance Comparison', fontsize=15)
plt.xlabel('Model', fontsize=12)
plt.ylabel('F1 Score', fontsize=12)
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.savefig(os.path.join(results_dir, 'ensemble_comparison.png'))
plt.show()

## 6. Save Best Ensemble Model

In [0]:
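# Identify best ensemble method (all_results is sorted by F1 score, best first)
best_ensemble = all_results[all_results['Type'] == 'Ensemble'].iloc[0]['Model']
print(f"Best ensemble method: {best_ensemble}")

# Majority voting has no fitted parameters, so there is nothing to save for it
if best_ensemble == 'Majority Voting':
    print("Majority voting has no parameters to save; reuse the base models directly.")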

# For weighted voting, save the weights
if best_ensemble == 'Weighted Voting':
    with open(os.path.join(models_dir, 'weighted_ensemble_weights.pkl'), 'wb') as f:
        pickle.dump(weights, f)
    print(f"Weighted ensemble weights saved to {os.path.join(models_dir, 'weighted_ensemble_weights.pkl')}")

# For stacking, save the meta-classifier
if best_ensemble == 'Stacking':
    with open(os.path.join(models_dir, 'stacking_meta_classifier.pkl'), 'wb') as f:
        pickle.dump(meta_clf, f)
    print(f"Stacking meta-classifier saved to {os.path.join(models_dir, 'stacking_meta_classifier.pkl')}")
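
The saved weights are only useful together with the base models' probabilities. As a rough sketch of how they might be reloaded at inference time, the example below writes a weights file to a temporary directory and applies it. The function name `predict_with_saved_weights`, the keys `model_a` and `model_b`, and all numbers are hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np

def predict_with_saved_weights(weights_path, probabilities_dict):
    """Blend probabilities using pickled {model_key: weight} and return class indices."""
    with open(weights_path, "rb") as f:
        saved_weights = pickle.load(f)
    # Use only models that have both a saved weight and fresh probabilities
    usable = [key for key in saved_weights if key in probabilities_dict]
    if not usable:
        raise ValueError("No saved model key matches the supplied probabilities.")
    stacked = np.stack([probabilities_dict[key] for key in usable])
    weights = np.array([saved_weights[key] for key in usable], dtype=float)
    return np.average(stacked, axis=0, weights=weights).argmax(axis=1)

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "weighted_ensemble_weights.pkl")
    with open(demo_path, "wb") as f:
        pickle.dump({"model_a": 0.8, "model_b": 0.7}, f)

    demo_out = predict_with_saved_weights(demo_path, {
        "model_a": np.array([[0.7, 0.3], [0.4, 0.6]]),
        "model_b": np.array([[0.45, 0.55], [0.1, 0.9]]),
    })

print(demo_out)  # [0 1]
```

Only load pickle files from sources you trust, since unpickling can execute arbitrary code.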

## 7. Summary and Conclusions

In this notebook, we explored three ensemble methods for combining multiple sentiment analysis models:

1. **Majority Voting**: A simple approach that takes the most common prediction across base models.
2. **Weighted Voting (Blending)**: Averages each model's predicted class probabilities, weighted by its F1 score, and picks the class with the highest combined probability.
3. **Stacking**: A meta-learning approach that trains a higher-level model on the predictions of base models.

Key findings:

- The best single model was [FILL IN BASED ON RESULTS].
- The best ensemble method was [FILL IN BASED ON RESULTS].
- Ensemble methods [DID/DID NOT] improve over the best individual model by [PERCENTAGE].

The benefits of ensemble methods include:
- Reduced variance and increased robustness
- Better generalization across different types of data
- Potential for higher performance by combining complementary models
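
To make the variance-reduction point concrete, consider an idealized case (an assumption for illustration, not something measured in this notebook): $M$ base predictors $f_1, \dots, f_M$, each with variance $\sigma^2$ and pairwise correlation $\rho$. The variance of their average is then

$$
\operatorname{Var}\left(\frac{1}{M}\sum_{i=1}^{M} f_i\right) = \rho\,\sigma^2 + \frac{1-\rho}{M}\,\sigma^2
$$

When the models are uncorrelated ($\rho = 0$), the variance shrinks as $1/M$. When they are highly correlated ($\rho \to 1$), averaging gives almost no reduction. This is why combining models built on different features or algorithms tends to help more than combining near-duplicates.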

For production deployment, we recommend using [FILL IN BASED ON RESULTS] due to its balance of performance and complexity.