# LSTM Model for Twitter Sentiment Analysis
## Comparative Study of Word Embedding Techniques

**Student:** Daniel Kudum  
**Team:** Sentiment Analysis Group  
**Model:** Long Short-Term Memory (LSTM) Network  
**Date:** February 8, 2026

## 1. Project Overview

This notebook documents my implementation of LSTM networks for classifying Twitter sentiment using three different word embedding approaches. The goal is to compare how different embedding techniques affect LSTM performance on sentiment analysis tasks.

**Embedding Methods Tested:**
1. **Random Embeddings** (Randomly initialized and trained from scratch with the model)
2. **Word2Vec Skip-gram** (Pre-trained using Skip-gram architecture)
3. **Word2Vec CBOW** (Pre-trained using Continuous Bag of Words)
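
To make the difference between the three setups concrete, the sketch below shows one common way to turn a set of word vectors into the weight matrix of an embedding layer. It is a minimal illustration, not the code from this project: `build_embedding_matrix`, `word_index`, and the toy `pretrained` dictionary are hypothetical names and data. In practice the vectors would come from a trained Word2Vec model (in gensim, `sg=1` selects Skip-gram and `sg=0` selects CBOW), while the random-embedding variant has no pre-trained vectors and starts from random weights.

```python
import numpy as np

def build_embedding_matrix(word_index, vectors, dim, seed=42):
    """Return a (vocab_size + 1, dim) matrix; row 0 is reserved for padding.

    word_index: dict mapping word -> integer id starting at 1 (assumed layout).
    vectors:    dict mapping word -> 1-D array of length dim, or None to
                request a purely random initialisation.
    """
    rng = np.random.default_rng(seed)
    matrix = np.zeros((len(word_index) + 1, dim), dtype=np.float32)
    for word, idx in word_index.items():
        if vectors is not None and word in vectors:
            matrix[idx] = vectors[word]  # copy the pre-trained vector
        else:
            # No pre-trained vector available: fall back to a small random init.
            matrix[idx] = rng.uniform(-0.05, 0.05, size=dim)
    return matrix

# Toy data, invented for illustration only.
word_index = {"good": 1, "bad": 2, "movie": 3}
pretrained = {"good": np.array([0.9, 0.1]), "bad": np.array([-0.8, 0.2])}

w2v_matrix = build_embedding_matrix(word_index, pretrained, dim=2)
random_matrix = build_embedding_matrix(word_index, None, dim=2)
print(w2v_matrix.shape)  # (4, 2)
```

A matrix like this could then serve as the initial weights of the embedding layer, either kept fixed or updated during training.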

## 2. Dataset Overview

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set style for better visualizations
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Load and explore the dataset
print(" LOADING TWITTER SENTIMENT DATASET")
print("=" * 50)

df = pd.read_csv('../data/raw/Twitter_Data.csv')

# Data cleaning
initial_count = len(df)
df = df.dropna(subset=['clean_text', 'category'])
cleaned_count = len(df)
df['clean_text'] = df['clean_text'].astype(str)

# Map sentiment labels to 0-indexed classes: -1 (negative) -> 0, 0 (neutral) -> 1, 1 (positive) -> 2
df['sentiment'] = df['category'].map({-1: 0, 0: 1, 1: 2})
sentiment_labels = {0: 'Negative', 1: 'Neutral', 2: 'Positive'}

print(f" Dataset Statistics:")
print(f"    Initial samples: {initial_count:,}")
print(f"    After cleaning: {cleaned_count:,}")
print(f"    Removed samples: {initial_count - cleaned_count:,}")
print(f"    Feature columns: {df.columns.tolist()}")
print()

# Sentiment distribution
sentiment_dist = df['sentiment'].value_counts().sort_index()
print(" Sentiment Distribution:")
for idx, count in sentiment_dist.items():
    label = sentiment_labels[idx]
    percentage = (count / cleaned_count) * 100
    print(f"    {label}: {count:,} tweets ({percentage:.1f}%)")

# Text length analysis
df['text_length'] = df['clean_text'].apply(len)
df['word_count'] = df['clean_text'].apply(lambda x: len(x.split()))

print()
print(" Text Length Statistics:")
print(f"    Avg characters per tweet: {df['text_length'].mean():.0f}")
print(f"    Avg words per tweet: {df['word_count'].mean():.1f}")
print(f"    Max tweet length: {df['text_length'].max()} characters")

# Visualization of sentiment distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Pie chart
axes[0].pie(sentiment_dist.values, labels=[sentiment_labels[i] for i in sentiment_dist.index], 
           autopct='%1.1f%%', startangle=90, colors=['#FF6B6B', '#4ECDC4', '#45B7D1'])
axes[0].set_title('Sentiment Distribution', fontsize=14, fontweight='bold')

# Bar chart
bars = axes[1].bar([sentiment_labels[i] for i in sentiment_dist.index], sentiment_dist.values, 
                  color=['#FF6B6B', '#4ECDC4', '#45B7D1'])
axes[1].set_title('Tweet Count by Sentiment', fontsize=14, fontweight='bold')
axes[1].set_ylabel('Number of Tweets')
axes[1].set_xlabel('Sentiment')

# Add value labels on bars
for bar in bars:
    height = bar.get_height()
    axes[1].text(bar.get_x() + bar.get_width()/2., height + 500,
                f'{int(height):,}', ha='center', va='bottom', fontweight='bold')

plt.tight_layout()
plt.show()

## 3. LSTM Implementation Summary

For each embedding method, I implemented the following LSTM architecture:

**Model Architecture:**
- **Embedding Layer:** Depends on the embedding method
- **Spatial Dropout:** 20%, to reduce overfitting
- **LSTM Layer:** 128 units with 20% dropout
- **Dense Layer:** 64 units with ReLU activation
- **Dropout:** 50% regularization
- **Output Layer:** 3 units (Negative, Neutral, Positive) with softmax activation
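
As a rough sanity check on this architecture, the trainable parameter count of each layer can be worked out by hand. The sketch below does that arithmetic. The vocabulary size (20,000) and embedding dimension (100) are assumptions chosen for illustration, since this notebook does not state them, and the LSTM formula assumes the standard formulation with four gates and bias terms.

```python
def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # One weight per input-output pair plus one bias per output unit.
    return input_dim * units + units

# Assumed values, not taken from the notebook.
vocab_size = 20_000
embedding_dim = 100

counts = {
    "embedding": vocab_size * embedding_dim,
    "lstm_128": lstm_params(embedding_dim, 128),
    "dense_64": dense_params(128, 64),
    "output_3": dense_params(64, 3),
}
# Dropout layers have no weights, so they do not appear here.
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name:>10}: {n:>9,}")
print(f"{'total':>10}: {total:>9,}")
```

Under these assumptions the embedding layer holds most of the weights, so whether it is trained or kept fixed largely determines how many parameters are updated at each step.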

**Training Configuration:**
- **Optimizer:** Adam
- **Loss Function:** Sparse Categorical Crossentropy
- **Batch Size:** 64
- **Validation Split:** 10%
- **Early Stopping:** Patience of 3 epochs
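
The early-stopping rule can be illustrated without any deep learning library. The sketch below is a simplified version of the idea (stop once the validation loss has failed to improve for `patience` consecutive epochs). It is not Keras's `EarlyStopping` callback itself, and the loss values are made up for the example.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops at the first epoch where the validation loss has not
    improved on the best value seen so far for `patience` epochs in a row.
    If that never happens, stop_epoch is the last epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# Hypothetical validation losses, one per epoch.
losses = [0.60, 0.45, 0.40, 0.41, 0.42, 0.43, 0.39]
stop, best = early_stopping_epoch(losses, patience=3)
print(stop, best)  # 6 3
```

In this toy run the best loss occurs at epoch 3 and training halts at epoch 6, so the later improvement at epoch 7 is never seen. That is the trade-off a small patience value makes.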

## 4. Experimental Results

In [None]:
# Comparative Results Analysis
print(" EXPERIMENTAL RESULTS: LSTM WITH DIFFERENT EMBEDDINGS")
print("=" * 70)

results_data = {
    'Embedding Method': ['Random Embeddings', 'Word2Vec Skip-gram', 'Word2Vec CBOW'],
    'Accuracy': [0.9690, 0.9195, 0.9158],
    'Loss': [0.1158, 0.2558, 0.2641],
    'Training Time (min)': [26.9, 25.9, 24.7],
    'Negative F1': [0.94, 0.86, 0.84],
    'Neutral F1': [0.98, 0.94, 0.94],
    'Positive F1': [0.97, 0.93, 0.93]
}

results_df = pd.DataFrame(results_data)

# Display formatted table
print("\n" + results_df.to_string(index=False))
print("\n" + "=" * 70)

# Calculate improvement percentages
random_acc = results_df.loc[0, 'Accuracy']
skipgram_acc = results_df.loc[1, 'Accuracy']
cbow_acc = results_df.loc[2, 'Accuracy']

print("\n Performance Comparison:")
print(f"    Random embeddings outperformed Skip-gram by: {(random_acc-skipgram_acc)*100:.1f} percentage points")
print(f"    Random embeddings outperformed CBOW by: {(random_acc-cbow_acc)*100:.1f} percentage points")
print(f"    Skip-gram vs CBOW difference: {(skipgram_acc-cbow_acc)*100:.2f} percentage points (Skip-gram slightly better)")

# Visualization
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle('LSTM Performance Across Different Embedding Methods', fontsize=16, fontweight='bold', y=1.02)

# 1. Accuracy Comparison
bars1 = axes[0, 0].bar(results_df['Embedding Method'], results_df['Accuracy'], 
                      color=['#2E86AB', '#A23B72', '#F18F01'])
axes[0, 0].set_title('Model Accuracy', fontsize=12, fontweight='bold')
axes[0, 0].set_ylabel('Accuracy')
axes[0, 0].set_ylim(0.85, 1.0)
axes[0, 0].tick_params(axis='x', rotation=15)
axes[0, 0].grid(axis='y', alpha=0.3)

# Add value labels
for bar in bars1:
    height = bar.get_height()
    axes[0, 0].text(bar.get_x() + bar.get_width()/2., height + 0.005,
                   f'{height:.3f}', ha='center', va='bottom', fontweight='bold')

# 2. Training Time Comparison
bars2 = axes[0, 1].bar(results_df['Embedding Method'], results_df['Training Time (min)'], 
                      color=['#2E86AB', '#A23B72', '#F18F01'])
axes[0, 1].set_title('Training Time', fontsize=12, fontweight='bold')
axes[0, 1].set_ylabel('Minutes')
axes[0, 1].tick_params(axis='x', rotation=15)
axes[0, 1].grid(axis='y', alpha=0.3)

# 3. F1 Scores by Sentiment Class
x = np.arange(len(results_df['Embedding Method']))
width = 0.25

axes[1, 0].bar(x - width, results_df['Negative F1'], width, label='Negative', color='#FF6B6B')
axes[1, 0].bar(x, results_df['Neutral F1'], width, label='Neutral', color='#4ECDC4')
axes[1, 0].bar(x + width, results_df['Positive F1'], width, label='Positive', color='#45B7D1')

axes[1, 0].set_title('F1 Scores by Sentiment Class', fontsize=12, fontweight='bold')
axes[1, 0].set_ylabel('F1 Score')
axes[1, 0].set_xticks(x)
axes[1, 0].set_xticklabels(results_df['Embedding Method'], rotation=15)
axes[1, 0].legend()
axes[1, 0].grid(axis='y', alpha=0.3)

# 4. Loss Comparison
bars4 = axes[1, 1].bar(results_df['Embedding Method'], results_df['Loss'], 
                      color=['#2E86AB', '#A23B72', '#F18F01'])
axes[1, 1].set_title('Model Loss', fontsize=12, fontweight='bold')
axes[1, 1].set_ylabel('Loss')
axes[1, 1].tick_params(axis='x', rotation=15)
axes[1, 1].grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()

## 5. Key Findings & Analysis

### **Performance Insights:**
1. **Random Embeddings (96.9% accuracy)** performed best, most likely because they were trained end-to-end with the LSTM model and could therefore learn task-specific features for sentiment classification.

2. **Word2Vec Embeddings (91.6-91.9% accuracy)** performed respectably but trailed the random embeddings by about 5 percentage points. This suggests that general-purpose semantic embeddings, when kept fixed during training, are less well suited to a specialized task like sentiment analysis.

3. **Skip-gram vs CBOW:** Skip-gram slightly outperformed CBOW (91.95% vs 91.58%). The 0.37-point gap is small, but its direction is consistent with literature suggesting that Skip-gram performs better for rare words and complex patterns.

### **Class-wise Performance:**
- **Neutral sentiment** was easiest to classify across all methods (F1: 0.94-0.98)
- **Negative sentiment** was most challenging, especially for Word2Vec embeddings
- This pattern suggests that negative expressions in tweets might be more diverse or subtle
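
The per-class F1 scores can be condensed into a single macro-averaged F1 per model, which weights the three classes equally. The sketch below is one way to do this. The values are copied from the results table in Section 4, and `macro_f1` is a name introduced here for illustration.

```python
# Per-class F1 scores (Negative, Neutral, Positive) from the results table.
f1_scores = {
    "Random Embeddings": [0.94, 0.98, 0.97],
    "Word2Vec Skip-gram": [0.86, 0.94, 0.93],
    "Word2Vec CBOW": [0.84, 0.94, 0.93],
}

def macro_f1(per_class_f1):
    # Unweighted mean: every class counts equally, whatever its size.
    return sum(per_class_f1) / len(per_class_f1)

macro = {name: macro_f1(scores) for name, scores in f1_scores.items()}
for name, value in macro.items():
    print(f"{name}: macro F1 = {value:.3f}")
```

By this measure the ranking of the three models is unchanged, with the weaker negative-class scores accounting for most of the gap between the Word2Vec variants and the random embeddings.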

### **Training Efficiency:**
- All three models trained in a similar amount of time (24.7 to 26.9 minutes)
- Random embeddings took the longest (26.9 minutes), likely because the embedding weights were learned from scratch
- Word2Vec embeddings had faster convergence initially but plateaued at lower accuracy

## 6. Technical Implementation Details

The complete implementation is available in these modular files:

### **Core Implementation Files:**
1. **lstm_step1.py** - Environment setup and library imports
2. **lstm_step2.py** - Scikit-learn model components
3. **lstm_step3.py** - TensorFlow/Keras deep learning setup
4. **lstm_step4.py** - Dataset loading and validation
5. **lstm_step5.py** - Data preparation and label mapping
6. **lstm_step6.py** - Text tokenization and sequence padding
7. **lstm_step7.py** - Train-test split and data partitioning
8. **lstm_step8.py** - LSTM model architecture definition
9. **lstm_step9.py** - Random embeddings implementation and training
10. **lstm_step10.py** - Word2Vec Skip-gram implementation
11. **lstm_step11.py** - Word2Vec CBOW implementation

### **Analysis Files:**
- **lstm_final_report.py** - Complete results analysis and comparison
- **lstm_experiment_summary.txt** - Concise experiment documentation

### **Reproducibility:**
All experiments were conducted with fixed random seeds (42) to ensure reproducibility. The code includes comprehensive logging, progress tracking, and automatic saving of model checkpoints.
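
A minimal sketch of what fixing the seeds might look like is shown below. The helper name `set_global_seed` is hypothetical, and the exact calls used in the project scripts are not shown in this notebook. The NumPy and TensorFlow calls are only made if those libraries are installed.

```python
import os
import random

def set_global_seed(seed=42):
    """Seed the random number generators that are available."""
    # Affects hash randomisation only in Python processes started afterwards.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)
    except ImportError:
        pass

set_global_seed(42)
first = random.random()
set_global_seed(42)
second = random.random()
print(first == second)  # True
```

Seeding alone may not make GPU training bit-for-bit reproducible, so small run-to-run differences in the reported metrics are still possible.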

## 7. Conclusion & Recommendations

### **Conclusions:**
1. **Task-specific embeddings outperformed general-purpose embeddings** for sentiment analysis on this dataset
2. **LSTM networks are highly effective** for Twitter sentiment classification (>90% accuracy with all methods)
3. **Negative sentiment detection remains challenging** and requires special attention

### **Recommendations for Future Work:**
1. **Fine-tune Word2Vec embeddings** instead of keeping them fixed during training
2. **Experiment with hybrid approaches** combining random and pre-trained embeddings
3. **Investigate attention mechanisms** to handle nuanced negative expressions
4. **Try domain-specific embeddings** trained on Twitter corpora specifically

### **Team Integration:**
These results provide a strong baseline for comparison with other team members' models (SVM, RNN, GRU). The modular implementation allows easy adaptation for additional experiments.

In [None]:
# Final summary cell
print(" LSTM EXPERIMENT SUMMARY")
print("=" * 50)
print("Student: Daniel Kudum")
print("Models Implemented: 3 LSTM variants")
print("Embeddings Tested: Random, Word2Vec Skip-gram, Word2Vec CBOW")
# Look up the best row instead of hard-coding index 0
best = results_df.loc[results_df['Accuracy'].idxmax()]
print(f"Best Performance: {best['Accuracy']*100:.1f}% ({best['Embedding Method']})")
print(f"Total Training Time: {results_df['Training Time (min)'].sum():.1f} minutes")
print("Implementation Files: 12 Python scripts")
print("=" * 50)
print("\n All implementation files are available in the notebooks/ directory.")