# LSTM Model Experiments
## Twitter Sentiment Analysis with Multiple Embeddings

**Student:** [Your Name]  
**Model:** LSTM  
**Task:** Implement an LSTM classifier with three different embedding methods

## 1. Introduction
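This notebook applies a Long Short-Term Memory (LSTM) network to three-class Twitter sentiment classification (negative, neutral, positive). It compares three word embedding techniques: randomly initialized embeddings trained together with the model, Word2Vec Skip-gram, and Word2Vec CBOW.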
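For readers new to LSTMs, the forward pass of an embedding + LSTM + softmax classifier can be sketched in NumPy. This is an illustrative sketch only. It assumes the standard LSTM gate equations, a single layer, and classification from the final hidden state. The actual code in the `lstm_step*.py` files may use a deep learning framework and different architecture details. All names, shapes and sizes below (`lstm_classify`, `h_dim`, the random weights) are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_classify(token_ids, emb, W, U, b, W_out, b_out):
    """Embed a token-id sequence, run a single-layer LSTM over it, and return
    softmax probabilities for the 3 classes from the final hidden state.
    (Hypothetical helper for illustration, not the notebook's own code.)"""
    h_dim = U.shape[1]
    h = np.zeros(h_dim)              # hidden state h_t
    c = np.zeros(h_dim)              # cell state c_t
    for t in token_ids:
        x = emb[t]                   # embedding lookup: row t of (vocab_size, d)
        z = W @ x + U @ h + b        # pre-activations of all 4 gates, shape (4*h_dim,)
        i = sigmoid(z[:h_dim])                 # input gate
        f = sigmoid(z[h_dim:2 * h_dim])        # forget gate
        g = np.tanh(z[2 * h_dim:3 * h_dim])    # candidate cell values
        o = sigmoid(z[3 * h_dim:])             # output gate
        c = f * c + i * g            # keep part of the old memory, add new content
        h = o * np.tanh(c)           # expose a gated view of the cell state
    logits = W_out @ h + b_out       # (3,) scores: negative, neutral, positive
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()


# Hypothetical sizes and random weights, just to exercise the code.
rng = np.random.default_rng(0)
vocab_size, d, h_dim = 50, 8, 16
emb = rng.normal(scale=0.1, size=(vocab_size, d))
W = rng.normal(scale=0.1, size=(4 * h_dim, d))
U = rng.normal(scale=0.1, size=(4 * h_dim, h_dim))
b = np.zeros(4 * h_dim)
W_out = rng.normal(scale=0.1, size=(3, h_dim))
b_out = np.zeros(3)

probs = lstm_classify([3, 17, 42, 5], emb, W, U, b, W_out, b_out)
print(probs.round(3), probs.sum())
```

In a setup like this, the three embedding methods differ only in how `emb` is filled. It either starts with random values that backpropagation then updates, or it is initialized from vectors produced by a Skip-gram or CBOW Word2Vec model.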

## 2. Dataset Loading

In [None]:
import pandas as pd
import numpy as np

# Load the dataset and drop rows with a missing tweet or label
df = pd.read_csv('../data/raw/Twitter_Data.csv')
df = df.dropna(subset=['clean_text', 'category'])
df['clean_text'] = df['clean_text'].astype(str)
# Map the original labels (-1, 0, 1) to class indices 0 = negative, 1 = neutral, 2 = positive
df['sentiment'] = df['category'].map({-1: 0, 0: 1, 1: 2})

print(f'Dataset shape: {df.shape}')
print('Sentiment distribution:')
print(df['sentiment'].value_counts().sort_index())
print('\n0 = Negative, 1 = Neutral, 2 = Positive')
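Before an LSTM can consume the tweets, each `clean_text` string has to become a fixed-length sequence of integer ids. The following is a minimal sketch assuming plain whitespace tokenization. The helper names `build_vocab` and `encode`, the reserved ids `PAD`/`UNK`, and the `max_len`/`min_freq` values are hypothetical and not taken from the original implementation files.

```python
from collections import Counter

import numpy as np

PAD, UNK = 0, 1  # hypothetical reserved ids for padding and out-of-vocabulary tokens


def build_vocab(texts, min_freq=1):
    """Map each whitespace token seen at least `min_freq` times to an integer id,
    most frequent first; ids 0 and 1 are reserved for <pad> and <unk>."""
    counts = Counter(tok for text in texts for tok in text.split())
    vocab = {'<pad>': PAD, '<unk>': UNK}
    for tok, n in counts.most_common():
        if n >= min_freq:
            vocab[tok] = len(vocab)
    return vocab


def encode(texts, vocab, max_len=50):
    """Turn texts into a (n_texts, max_len) int array: keep the first `max_len`
    tokens and pad on the left so real tokens sit next to the final time step."""
    out = np.full((len(texts), max_len), PAD, dtype=np.int64)
    for row, text in enumerate(texts):
        ids = [vocab.get(tok, UNK) for tok in text.split()][:max_len]
        if ids:
            out[row, max_len - len(ids):] = ids
    return out


texts = ['modi wins big', 'what a bad day', 'ok']
vocab = build_vocab(texts)
X = encode(texts, vocab, max_len=4)
print(X)
```

The same calls would apply to the full column, e.g. `encode(df['clean_text'], build_vocab(df['clean_text']))`. Left padding is a design choice for models that classify from the last hidden state: the final steps then see real tokens rather than padding.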

## 3. Results Summary

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Results from experiments
results = pd.DataFrame({
    'Embedding Method': ['Random Embeddings', 'Word2Vec Skip-gram', 'Word2Vec CBOW'],
    'Accuracy': [0.9690, 0.9195, 0.9158],
    'Loss': [0.1158, 0.2558, 0.2641],
    'Training Time (min)': [26.9, 25.9, 24.7],
    'Best Class': ['Neutral (F1=0.98)', 'Neutral (F1=0.94)', 'Neutral (F1=0.94)']
})

print('=== LSTM WITH DIFFERENT EMBEDDINGS ===')
print('=' * 60)
print(results.to_string(index=False))
print('\n' + '=' * 60)

In [None]:
# Visualization
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.bar(results['Embedding Method'], results['Accuracy'], color=['blue', 'green', 'orange'])
plt.title('Accuracy Comparison')
plt.ylabel('Accuracy')
plt.ylim(0.9, 1.0)  # truncated y-axis: makes gaps of a few points visible, but also exaggerates them
plt.xticks(rotation=15)
plt.grid(axis='y', alpha=0.3)

plt.subplot(1, 2, 2)
plt.bar(results['Embedding Method'], results['Training Time (min)'], color=['blue', 'green', 'orange'])
plt.title('Training Time Comparison')
plt.ylabel('Time (minutes)')
plt.xticks(rotation=15)
plt.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()
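To put the differences in the results table on a common scale, the accuracy gap to the best model can be expressed in percentage points and training time relative to the fastest run. This is a small sketch that reuses the figures from the table. The two derived column names are illustrative additions.

```python
import pandas as pd

# Accuracy and training-time figures copied from the results table.
results = pd.DataFrame({
    'Embedding Method': ['Random Embeddings', 'Word2Vec Skip-gram', 'Word2Vec CBOW'],
    'Accuracy': [0.9690, 0.9195, 0.9158],
    'Training Time (min)': [26.9, 25.9, 24.7],
})

# Gap to the best model, in percentage points (pp).
best = results['Accuracy'].max()
results['Gap to best (pp)'] = ((best - results['Accuracy']) * 100).round(2)

# Training time as a multiple of the fastest run.
fastest = results['Training Time (min)'].min()
results['Time vs fastest'] = (results['Training Time (min)'] / fastest).round(2)

print(results.to_string(index=False))
```

Skip-gram trails the random embeddings by 4.95 pp and CBOW by 5.32 pp, so the two Word2Vec variants are only 0.37 pp apart. The slowest run took about 9% longer than the fastest.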

## 4. Key Findings

1. **Randomly initialized embeddings performed best** (96.9% accuracy). The most likely reason is that they were learned end-to-end with the LSTM on the sentiment objective, rather than pre-trained on a separate word-prediction task.
2. **Word2Vec embeddings** performed well but about 5 percentage points lower (Skip-gram 91.95%, CBOW 91.58%), with Skip-gram slightly ahead of CBOW.
3. **All models had the most difficulty with the negative class.**
4. **Training times were similar** across all methods (24.7 to 26.9 minutes).
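Finding 2 contrasts the two Word2Vec variants. They differ in how training examples are built from a context window. Skip-gram predicts each context word from the center word. CBOW predicts the center word from its combined context. The plain-Python sketch below only generates the two kinds of examples for illustration; the helper names are hypothetical. If the experiments used gensim, the choice would correspond to the `sg` parameter of `gensim.models.Word2Vec` (`sg=1` for Skip-gram, otherwise CBOW), with `window` setting the context size.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: one (center, context) pair per neighbour within `window`;
    the model learns to predict the context word from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """CBOW: one (context words, center) example per position; the model
    combines (averages or sums) the context vectors to predict the center word."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples


tokens = 'not a good day'.split()
print(skipgram_pairs(tokens, window=1))
print(cbow_examples(tokens, window=1))
```

Skip-gram yields one example per (word, neighbour) pair, while CBOW yields one per position. Each word therefore receives more individual updates under Skip-gram, which is commonly cited as the reason it handles rarer words better at a higher training cost.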

## 5. Implementation Files

The detailed step-by-step implementation is available in these files:
- `lstm_step1.py` to `lstm_step11.py`: complete LSTM implementation
- `lstm_final_report.py`: results analysis
- `lstm_experiment_summary.txt`: experiment summary