# Comparing ELECTRA, ALBERT, DistilBERT, and TinyBERT for YouTube Comment Sentiment Analysis

---

## 1. DistilBERT: Distillation-Based Compression

**How:**  
DistilBERT uses *knowledge distillation* during pretraining. A smaller student model (6 Transformer layers instead of BERT-base's 12) is trained to mimic a larger BERT teacher.

**Why Interesting:**  
- Has about 40% fewer parameters and runs inference about 60% faster than BERT-base, while retaining about 97% of BERT's language-understanding performance (as reported on the GLUE benchmark).  
- Illustrates simple but effective model compression via distillation.
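The core of distillation is a loss that pulls the student's temperature-softened output distribution toward the teacher's. The sketch below is a minimal pure-Python version of that soft-target term only. It assumes a temperature `T` and the common `T²` scaling. DistilBERT's actual objective also includes a masked-language-modelling loss and a cosine loss on hidden states, and neither is shown here. The function names are hypothetical.

```python
import math

# Hypothetical helpers illustrating the soft-target distillation loss only.
def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_target_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return (temperature ** 2) * kl

teacher = [2.0, 0.5, -1.0]        # toy teacher logits for 3 classes
good_student = [1.9, 0.6, -1.1]   # close to the teacher
bad_student = [-1.0, 0.5, 2.0]    # ranks the classes the other way round

print(soft_target_loss(good_student, teacher))
print(soft_target_loss(bad_student, teacher))
```

The student that mimics the teacher's ranking gets a much smaller loss. A higher temperature flattens both distributions, so the student also learns from the teacher's relative scores for the wrong classes.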

---

## 2. ALBERT: Parameter Sharing & Factorization

**How:**  
ALBERT reduces model size by *sharing parameters across layers* and *factorizing embeddings*.

**Why Interesting:**  
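- Dramatically reduces the parameter count (roughly 12M for ALBERT-base vs. about 110M for BERT-base), at a modest cost in accuracy at the base size.  
- Replaces BERT's next-sentence prediction with *sentence-order prediction*, a harder pretraining task that focuses on inter-sentence coherence.  
- Shows that *architectural changes* (not just distillation) can yield compact yet strong models. Parameter sharing saves memory rather than compute, though. Every layer still runs at inference, so ALBERT-base is not faster than a BERT-base of the same depth.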
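Back-of-the-envelope arithmetic shows where ALBERT's savings come from. The sketch assumes a 30,000-token vocabulary (`V`), hidden size `H = 768`, embedding size `E = 128`, and 12 layers, matching the ALBERT-base configuration. It compares a `V×H` embedding matrix with the factorized `V×E + E×H` pair. Per-layer Transformer weights are estimated with the rough `12·H²` rule: `4·H²` for the attention projections plus `8·H²` for a `4H` feed-forward block, ignoring biases and LayerNorm. The totals are therefore approximations, not official counts, and the helper names are hypothetical.

```python
# Hypothetical helpers: approximate weight counts, ignoring biases and LayerNorm.
def embedding_params(vocab_size, hidden_size, embedding_size=None):
    """Embedding weights: V*H unfactorized, or V*E + E*H when factorized."""
    if embedding_size is None:
        return vocab_size * hidden_size
    return vocab_size * embedding_size + embedding_size * hidden_size

def encoder_params(hidden_size, num_layers, share_layers=False):
    """Approximate encoder weights using ~12*H^2 per layer."""
    per_layer = 12 * hidden_size ** 2
    # With cross-layer sharing, one set of layer weights is reused by every layer.
    return per_layer if share_layers else per_layer * num_layers

V, H, E, L = 30_000, 768, 128, 12

bert_like = embedding_params(V, H) + encoder_params(H, L)
albert_like = embedding_params(V, H, E) + encoder_params(H, L, share_layers=True)

print(f"BERT-style (approx.):   {bert_like:,}")
print(f"ALBERT-style (approx.): {albert_like:,}")
```

The totals come out at about 108M and 11M. That is close to the commonly cited 110M and 12M; the gap comes from the omitted biases, LayerNorm, position embeddings, and pooler.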

---

## 3. ELECTRA: Efficient Pretraining via Discriminators

**How:**  
ELECTRA replaces masked token prediction with a *replaced token detection* task, where a generator replaces some tokens and a discriminator predicts which tokens were replaced.

**Why Interesting:**  
- Makes pretraining more sample-efficient and faster to converge.  
- Smaller ELECTRA models often outperform comparable-sized BERTs despite using fewer compute resources during training.  
- Highlights innovation in *pretraining objectives* rather than model size or architecture alone.
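The replaced-token-detection target can be illustrated without any model. Given the original sequence and the generator's corrupted version, each position is labelled 1 if its token was replaced and 0 otherwise. In ELECTRA, a generator sample that happens to equal the original token is labelled as original, and the sketch mirrors that. The function name and token lists are hypothetical examples.

```python
# Hypothetical helper illustrating ELECTRA's discriminator targets.
def rtd_labels(original_tokens, corrupted_tokens):
    """Per-token targets for replaced token detection: 1 = replaced, 0 = original."""
    if len(original_tokens) != len(corrupted_tokens):
        raise ValueError("sequences must be aligned token by token")
    # A generator sample identical to the original token counts as "original".
    return [int(o != c) for o, c in zip(original_tokens, corrupted_tokens)]

original = ["the", "chef", "cooked", "the", "meal"]
corrupted = ["the", "chef", "ate", "the", "meal"]  # generator replaced "cooked"

print(rtd_labels(original, corrupted))  # [0, 0, 1, 0, 0]
```

Every position yields a training signal for the discriminator, not just the small fraction of masked positions used in BERT-style pretraining. This is why the objective is more sample-efficient.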

---

## 4. TinyBERT: Distillation with Layer-Wise Compression

**How:**  
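TinyBERT applies *Transformer distillation*. The student learns to match the teacher's embedding-layer outputs, attention matrices, hidden states, and prediction-layer logits. Training runs in two stages: *general distillation* from a pretrained BERT on a large general corpus, followed by *task-specific distillation* on the downstream task. This notebook fine-tunes the general-distillation checkpoint (`TinyBERT_General_4L_312D`) directly and does not perform the task-specific distillation stage.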

**Why Interesting:**  
- Produces a very compact model with fewer layers (typically 4 or 6), substantially reducing size and latency.  
- Maintains strong performance close to larger BERT models on various NLP tasks including sentiment analysis.  
- Designed specifically for deployment on resource-constrained devices, balancing speed and accuracy.
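Layer-wise distillation needs a rule for which teacher layer each student layer imitates. The sketch below assumes the uniform strategy: student layer `m` learns from teacher layer `g(m) = m · N / M`, where `N` and `M` are the teacher and student depths, and layer 0 is the embedding layer. The function name is hypothetical.

```python
# Hypothetical helper: uniform student-to-teacher layer mapping.
def uniform_layer_map(student_layers, teacher_layers):
    """Map student layer m to teacher layer g(m) = m * N / M (layer 0 = embeddings)."""
    if teacher_layers % student_layers != 0:
        raise ValueError("uniform mapping assumes N is a multiple of M")
    step = teacher_layers // student_layers
    return {m: m * step for m in range(0, student_layers + 1)}

print(uniform_layer_map(4, 12))  # {0: 0, 1: 3, 2: 6, 3: 9, 4: 12}
```

The student's hidden size (312 in `TinyBERT_General_4L_312D`) differs from BERT-base's 768. TinyBERT therefore learns a projection matrix to compare hidden states, and the mapping above only decides which layer pairs are compared.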

---

## Why Compare These?

- They represent **complementary approaches** to making transformers faster and lighter:  
  - Distillation (DistilBERT, TinyBERT)  
  - Parameter efficiency (ALBERT)  
  - Pretraining objective redesign (ELECTRA)  
- Comparing accuracy, speed, size, and resource consumption on the same tasks reveals trade-offs important for real applications — especially on resource-restricted devices.  
- Helps practitioners **choose the best fit** for their particular constraints (e.g., mobile deployment vs. cloud inference).  
- Informs **future model design** by highlighting which efficiency techniques work best in which contexts.

# Import Libraries

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from datasets import Dataset, DatasetDict
from transformers import AutoTokenizer, TrainingArguments, Trainer
from transformers import AutoModelForSequenceClassification as SeqModClf
import torch
import numpy 
from sklearn.metrics import accuracy_score, f1_score
from sklearn.utils import resample

In [None]:
import transformers
print(transformers.__version__)

# Load Dataset

In [None]:
dataset = pd.read_csv("YoutubeCommentsDataSet.csv")

dataset.head()

In [None]:
dataset.info()

The dataset has 18,408 rows and 2 columns: "Comment" (with 18,364 non-null text entries) and "Sentiment" (fully populated with sentiment labels). Both columns contain text data.

# Preprocessing

## Handle Null values

In [None]:
dataset.isnull().sum()

In [None]:
# dropna() returns a new DataFrame, so assign the result back
dataset = dataset.dropna()

## Keep only English comments

In [None]:
from langdetect import detect, DetectorFactory
from langdetect.lang_detect_exception import LangDetectException

DetectorFactory.seed = 0

def is_english(text):
    try:
        return detect(str(text)) == "en"
    except LangDetectException:
        return False

dataset['is_english'] = dataset['Comment'].apply(is_english)
dataset = dataset[dataset['is_english']]
dataset = dataset[['Comment', 'Sentiment']]
dataset = dataset.reset_index(drop=True)

## Downsample the positive class and upsample the negative class

In [None]:
dataset['Sentiment'].value_counts()

In [None]:
df_positive = dataset[dataset['Sentiment'] == 'positive']
df_neutral = dataset[dataset['Sentiment'] == 'neutral']
df_negative = dataset[dataset['Sentiment'] == 'negative']

# Target number of rows for each resampled class
TARGET_SIZE = 3319

# Downsample the majority (positive) class without replacement
df_positive_downsampled = resample(df_positive,
                                   replace=False,
                                   n_samples=TARGET_SIZE,
                                   random_state=42)

# Upsample the minority (negative) class with replacement (duplicates rows)
df_negative_upsampled = resample(df_negative,
                                 replace=True,
                                 n_samples=TARGET_SIZE,
                                 random_state=42)

# Combine with the untouched neutral class and shuffle
dataset = pd.concat([df_positive_downsampled, df_neutral, df_negative_upsampled])
dataset = dataset.sample(frac=1, random_state=42).reset_index(drop=True)

print(dataset['Sentiment'].value_counts())
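Resampling the whole dataset before splitting lets duplicated copies of an upsampled negative comment land in both the training and the test set. That can make test scores optimistic. The minimal pandas-only sketch below shows a leakage-free order: split first (stratified, by sampling the same fraction of each class), then resample only the training split. The toy data and the helpers `stratified_split` and `balance_classes` are hypothetical. In the notebook, the same idea would mean moving the resampling step after `train_test_split`.

```python
import pandas as pd

# Hypothetical helpers: split first, then balance only the training data.
def stratified_split(df, label_col, test_frac, seed=42):
    """Sample test_frac of each class for the test set; the rest is training data."""
    test = df.groupby(label_col).sample(frac=test_frac, random_state=seed)
    train = df.drop(index=test.index)
    return train, test

def balance_classes(df, label_col, target_size, seed=42):
    """Down- or upsample every class to target_size rows (replacement only if needed)."""
    parts = []
    for _, group in df.groupby(label_col):
        parts.append(group.sample(n=target_size,
                                  replace=len(group) < target_size,
                                  random_state=seed))
    return pd.concat(parts).sample(frac=1, random_state=seed)  # shuffle

# Toy imbalanced data: 60 positive, 30 neutral, 10 negative comments
toy = pd.DataFrame({
    "Comment": [f"comment {i}" for i in range(100)],
    "Sentiment": ["positive"] * 60 + ["neutral"] * 30 + ["negative"] * 10,
})

train_df, test_df = stratified_split(toy, "Sentiment", test_frac=0.3)
train_balanced = balance_classes(train_df, "Sentiment", target_size=21)

print(train_balanced["Sentiment"].value_counts())
print(set(train_balanced.index) & set(test_df.index))  # set(): no shared rows
```

The test set keeps the original class proportions, so it reflects the real distribution. Duplicates created by upsampling stay inside the training data.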

## Label Encoding
- Negative = 0
- Neutral = 1
- Positive = 2

All four models share this single, fixed mapping from sentiment categories to integers. They are therefore trained and evaluated on identical labels, and their scores can be compared fairly.

In [None]:
# Encode sentiment strings as integer ids (applied before the train/test split)
def encode_labels(dataset):
    dataset['Sentiment'] = dataset['Sentiment'].map({'negative': 0, 'neutral': 1, 'positive': 2})
    return dataset

encoded_dataset = encode_labels(dataset)

encoded_dataset.head()
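One way to keep the encoding in a single place is to define the mapping once and derive its inverse. You can then encode with `Series.map` and fail loudly on any unexpected label, instead of silently carrying it through. The names `LABEL2ID`, `ID2LABEL`, and `encode_labels_strict` are hypothetical. The same two dictionaries could also be passed as the `id2label`/`label2id` arguments when loading a model with `from_pretrained`. Pipeline outputs would then read `negative`/`neutral`/`positive` instead of `LABEL_0`/`LABEL_1`/`LABEL_2`.

```python
import pandas as pd

# Hypothetical single source of truth for the label encoding.
LABEL2ID = {"negative": 0, "neutral": 1, "positive": 2}
ID2LABEL = {idx: name for name, idx in LABEL2ID.items()}

def encode_labels_strict(df, column="Sentiment"):
    """Return a copy of df with string labels mapped to integer ids."""
    encoded = df[column].map(LABEL2ID)  # unknown labels become NaN
    unknown = df.loc[encoded.isna(), column].unique()
    if len(unknown) > 0:
        raise ValueError(f"unexpected labels: {list(unknown)}")
    out = df.copy()
    out[column] = encoded.astype(int)
    return out

toy = pd.DataFrame({"Comment": ["love it", "meh", "awful"],
                    "Sentiment": ["positive", "neutral", "negative"]})
print(encode_labels_strict(toy)["Sentiment"].tolist())  # [2, 1, 0]
```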

## Renaming column

In [None]:
encoded_dataset = encoded_dataset.rename(columns={'Comment': 'text', 'Sentiment': 'label'})

In [None]:
encoded_dataset.head()

## Train-test split
Hold out 30% of the data for testing and use the remaining 70% for training. The split is stratified on the label, so both sets keep the same class proportions. This leaves enough data for the model to learn from, while reserving a sizable, representative sample to evaluate it on unseen examples.

In [None]:
train, test = train_test_split(encoded_dataset, test_size = 0.3, 
                               random_state = 42, 
                               stratify = encoded_dataset['label'])

train.to_csv("train.csv", index=True)
test.to_csv("test.csv", index=True)

## Convert pandas DataFrames to Hugging Face datasets
Hugging Face `Dataset` objects integrate directly with the `transformers` tooling: `Dataset.map` tokenizes the text in batches, and `Trainer` consumes the tokenized splits as-is. All four models share the same `DatasetDict`, so data handling is identical and the comparison reflects the models rather than differences in preprocessing.

In [None]:
train_hf = Dataset.from_pandas(train.reset_index(drop= True))
test_hf = Dataset.from_pandas(test.reset_index(drop= True))

In [None]:
print(train_hf[:1])

In [None]:
print(test_hf[:1])

In [None]:
datasets = DatasetDict({ 'train': train_hf, 'test' : test_hf})

# Modelling

## DistilBERT model

### Load

In [None]:
# The DistilBERT tokenizer is loaded in the Tokenize cell below
model = SeqModClf.from_pretrained('distilbert-base-uncased', num_labels = 3)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
print(f'Model moved to {device}')

### Tokenize

In [None]:
tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased')

def tokenizer_function(dataset):
    comments = [str(comment) for comment in dataset['text']]
    return tokenizer(comments, padding = 'max_length',truncation = True)

tokenized_datasets = datasets.map(tokenizer_function, batched= True)

print(tokenized_datasets['train'][0])

### Train

In [None]:
training_args = TrainingArguments(
    output_dir = './distilbert',
    num_train_epochs = 3,
    per_device_train_batch_size = 8,
    per_device_eval_batch_size = 16,
    warmup_steps = 500,
    weight_decay = 0.01,
    eval_strategy = 'epoch',
    save_strategy = 'epoch',
    load_best_model_at_end = True,
    metric_for_best_model = 'eval_f1_score',
    greater_is_better = True,
    report_to = 'none',
)

def compute_metrics(pred):
    logits, labels = pred
    predictions = numpy.argmax(logits, axis = -1)
    Accuracy_score = accuracy_score(labels,predictions)
    F1_score = f1_score(labels, predictions, average = 'weighted')
    return {'accuracy_score':Accuracy_score,'f1_score':F1_score}

trainer_distilbert = Trainer(
    model = model,
    args = training_args,
    train_dataset = tokenized_datasets['train'],
    eval_dataset = tokenized_datasets['test'],
    tokenizer = tokenizer,
    compute_metrics = compute_metrics
)

trainer_distilbert.train()

## ELECTRA model

### Load

In [None]:
from transformers import ElectraTokenizer, ElectraForSequenceClassification

tokenizer_electra = ElectraTokenizer.from_pretrained('google/electra-base-discriminator')

model_electra = ElectraForSequenceClassification.from_pretrained('google/electra-base-discriminator', num_labels=3)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_electra.to(device)
print(f'Model moved to {device}')

### Tokenize

In [None]:
def electra_tokenizer_function(dataset):
    comments = [str(comment) for comment in dataset['text']]
    return tokenizer_electra(comments, padding='max_length', truncation=True)

# Tokenize raw datasets for ELECTRA
tokenized_datasets_electra = datasets.map(electra_tokenizer_function, batched=True)

# Remove the original 'text' column
tokenized_datasets_electra = tokenized_datasets_electra.remove_columns(['text'])

# Set the format to PyTorch tensors
tokenized_datasets_electra.set_format('torch')

# Check the first tokenized example in the train split
print(tokenized_datasets_electra['train'][0])

### Train

In [None]:
training_args = TrainingArguments(
    output_dir='./electra',
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    warmup_steps=500,
    weight_decay=0.01,
    eval_strategy='epoch',
    save_strategy='epoch',
    load_best_model_at_end=True,
    metric_for_best_model='eval_f1_score',
    greater_is_better=True,
    report_to='none'
)

def compute_metrics(pred):
    logits, labels = pred
    predictions = numpy.argmax(logits, axis=-1)
    Accuracy_score = accuracy_score(labels, predictions)
    F1_score = f1_score(labels, predictions, average='weighted')
    return {'accuracy_score': Accuracy_score, 'f1_score': F1_score}

trainer_electra = Trainer(
    model=model_electra,
    args=training_args,
    train_dataset=tokenized_datasets_electra['train'],
    eval_dataset=tokenized_datasets_electra['test'],
    tokenizer=tokenizer_electra,
    compute_metrics=compute_metrics
)

trainer_electra.train()

## ALBERT model

### Load

In [None]:
from transformers import AlbertTokenizer, AlbertForSequenceClassification

tokenizer_albert = AlbertTokenizer.from_pretrained('albert-base-v2')

model_albert = AlbertForSequenceClassification.from_pretrained('albert-base-v2', num_labels=3)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_albert.to(device)
print(f'Model moved to {device}')

### Tokenize

In [None]:
from transformers import AutoTokenizer

# Load ALBERT tokenizer
tokenizer_albert = AutoTokenizer.from_pretrained('albert-base-v2')

def albert_tokenizer_function(dataset):
    comments = [str(comment) for comment in dataset['text']]
    return tokenizer_albert(comments, padding='max_length', truncation=True)

# Tokenize raw datasets for ALBERT
tokenized_datasets_albert = datasets.map(albert_tokenizer_function, batched=True)

# Remove the original 'text' column
tokenized_datasets_albert = tokenized_datasets_albert.remove_columns(['text'])

# Set the format to PyTorch tensors
tokenized_datasets_albert.set_format('torch')

# Check the first tokenized example in the train split
print(tokenized_datasets_albert['train'][0])

### Train

In [None]:
training_args = TrainingArguments(
    output_dir='./albert',
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    warmup_steps=500,
    weight_decay=0.01,
    eval_strategy='epoch',
    save_strategy='epoch',
    load_best_model_at_end=True,
    metric_for_best_model='eval_f1_score',
    greater_is_better=True,
    report_to='none'
)

def compute_metrics(pred):
    logits, labels = pred
    predictions = numpy.argmax(logits, axis=-1)
    Accuracy_score = accuracy_score(labels, predictions)
    F1_score = f1_score(labels, predictions, average='weighted')
    return {'accuracy_score': Accuracy_score, 'f1_score': F1_score}

trainer_albert = Trainer(
    model=model_albert,
    args=training_args,
    train_dataset=tokenized_datasets_albert['train'],
    eval_dataset=tokenized_datasets_albert['test'],
    tokenizer=tokenizer_albert,
    compute_metrics=compute_metrics
)

trainer_albert.train()

## TinyBERT model

### Load

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer_tinybert = AutoTokenizer.from_pretrained('huawei-noah/TinyBERT_General_4L_312D')

model_tinybert = AutoModelForSequenceClassification.from_pretrained('huawei-noah/TinyBERT_General_4L_312D', num_labels=3)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_tinybert.to(device)
print(f'Model moved to {device}')

### Tokenize

In [None]:
from transformers import AutoTokenizer

# Load TinyBERT tokenizer - use the appropriate TinyBERT checkpoint
tokenizer_tinybert = AutoTokenizer.from_pretrained('huawei-noah/TinyBERT_General_4L_312D')

def tinybert_tokenizer_function(dataset):
    comments = [str(comment) for comment in dataset['text']]
    return tokenizer_tinybert(comments, padding='max_length', truncation=True, max_length=512)

# Tokenize raw datasets for TinyBERT
tokenized_datasets_tinybert = datasets.map(tinybert_tokenizer_function, batched=True)

# Remove the original 'text' column
tokenized_datasets_tinybert = tokenized_datasets_tinybert.remove_columns(['text'])

# Set the format to PyTorch tensors
tokenized_datasets_tinybert.set_format('torch')

# Check the first tokenized example in the train split
print(tokenized_datasets_tinybert['train'][0])

### Train

In [None]:
# Training arguments (adjust output_dir as needed)
training_args_tinybert = TrainingArguments(
    output_dir='./tinybert',
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    warmup_steps=500,
    weight_decay=0.01,
    eval_strategy='epoch',
    save_strategy='epoch',
    load_best_model_at_end=True,
    metric_for_best_model='eval_f1_score',
    greater_is_better=True,
    report_to='none'
)

# Metrics function (same as ALBERT)
def compute_metrics(pred):
    logits, labels = pred
    predictions = numpy.argmax(logits, axis=-1)
    Accuracy_score = accuracy_score(labels, predictions)
    F1_score = f1_score(labels, predictions, average='weighted')
    return {'accuracy_score': Accuracy_score, 'f1_score': F1_score}

# Trainer for TinyBERT on the TinyBERT-tokenized datasets
trainer_tinybert = Trainer(
    model=model_tinybert,
    args=training_args_tinybert,
    train_dataset=tokenized_datasets_tinybert['train'],
    eval_dataset=tokenized_datasets_tinybert['test'],
    tokenizer=tokenizer_tinybert,
    compute_metrics=compute_metrics
)

trainer_tinybert.train()

# Evaluation

## Metrics

In [None]:
distilbert_metrics = trainer_distilbert.evaluate()
print("DistilBERT eval metrics:", distilbert_metrics)
electra_metrics = trainer_electra.evaluate()
print("ELECTRA eval metrics:", electra_metrics)
albert_metrics = trainer_albert.evaluate()
print("ALBERT eval metrics:", albert_metrics)
tinybert_metrics = trainer_tinybert.evaluate()
print("TinyBERT eval metrics:", tinybert_metrics)

In [None]:
data = {
    "Model": ["DistilBERT", "ELECTRA", "ALBERT", "TinyBERT"],
    "Accuracy": [
        distilbert_metrics.get("eval_accuracy_score"),
        electra_metrics.get("eval_accuracy_score"),
        albert_metrics.get("eval_accuracy_score"),
        tinybert_metrics.get("eval_accuracy_score"),
    ],
    "F1 Score": [
        distilbert_metrics.get("eval_f1_score"),
        electra_metrics.get("eval_f1_score"),
        albert_metrics.get("eval_f1_score"),
        tinybert_metrics.get("eval_f1_score"),
    ],
}

df_metrics = pd.DataFrame(data)
print(df_metrics)

## Plot chart

In [None]:
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# One bar color per model (DistilBERT, ELECTRA, ALBERT, TinyBERT)
colors = ["skyblue", "orange", "green", "purple"]

axes[0].bar(df_metrics["Model"], df_metrics["Accuracy"], color=colors)
axes[0].set_title("Model Accuracy")
axes[0].set_ylim(0, 1)
axes[0].set_ylabel("Accuracy")

axes[1].bar(df_metrics["Model"], df_metrics["F1 Score"], color=colors)
axes[1].set_title("Model F1 Score (Weighted)")
axes[1].set_ylim(0, 1)
axes[1].set_ylabel("F1 Score")

plt.show()

## Best model based on F1-score
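F1 combines precision and recall. Compared with accuracy, it penalizes a model more heavily when it scores well on frequent classes while mishandling rarer ones, which matters most on imbalanced data. The notebook uses weighted F1 (per-class F1 averaged by each class's support). This is the same metric that `metric_for_best_model` used to select each model's best checkpoint during training.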
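To make the metric concrete, here is a small pure-Python sketch of what `f1_score(..., average='weighted')` computes. It derives per-class F1 from precision and recall, then averages the per-class scores weighted by each class's number of true examples (support). The toy labels are invented for illustration, and the helper names are hypothetical.

```python
from collections import Counter

# Hypothetical helpers mirroring weighted F1 (zero-division cases count as 0).
def per_class_f1(y_true, y_pred, label):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def weighted_f1(y_true, y_pred):
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    return sum(per_class_f1(y_true, y_pred, c) * n / total for c, n in support.items())

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy labels: 0 = negative, 1 = neutral, 2 = positive
y_true = [2, 2, 2, 2, 2, 2, 1, 1, 0, 0]
y_pred = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]  # a model that always predicts "positive"

print(accuracy(y_true, y_pred))
print(weighted_f1(y_true, y_pred))
```

This degenerate model gets 0.6 accuracy but a weighted F1 of about 0.45. Its zero F1 on the neutral and negative classes pulls the score down.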

In [None]:
# Find the best model name based on F1 Score
best_model_name = df_metrics.loc[df_metrics['F1 Score'].idxmax(), 'Model']

# Map model name to your actual model object
model_map = {
    'DistilBERT': trainer_distilbert.model,
    'ELECTRA': trainer_electra.model,
    'ALBERT': trainer_albert.model,
    'TinyBERT': trainer_tinybert.model  
}

# Save the best model to variable 'model'
model = model_map[best_model_name]

# Get row with best model metrics
best_metrics = df_metrics[df_metrics['Model'] == best_model_name].iloc[0]

print(f"Conclusion: The best model is **{best_model_name}** "
      f"with an F1 Score of {best_metrics['F1 Score']:.4f} "
      f"and an Accuracy of {best_metrics['Accuracy']:.4f}.")

print(f"Saved the best model ({best_model_name}) to variable 'model'.")

# Save model

In [None]:
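# Save the tokenizer that matches the winning model (not always DistilBERT's `tokenizer`)
best_tokenizer = {'DistilBERT': tokenizer, 'ELECTRA': tokenizer_electra,
                  'ALBERT': tokenizer_albert, 'TinyBERT': tokenizer_tinybert}[best_model_name]
model.save_pretrained('./bestModel/youtube-sentiment-model')
best_tokenizer.save_pretrained('./bestModel/youtube-sentiment-model')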

# Inference

In [None]:
from transformers import pipeline

sentiment_classifier = pipeline(
    'sentiment-analysis',
    model='./bestModel/youtube-sentiment-model',
    tokenizer='./bestModel/youtube-sentiment-model',
    device=0 if torch.cuda.is_available() else -1
)

label_map = {
    0: 'negative',
    1: 'neutral',
    2: 'positive'
}

In [None]:
new_comments = [
    "This tutorial is fantastic and extremely helpful!",
    "I'm a bit confused by some parts of this explanation.",
    "The content is decent, not particularly good or bad."
]

predictions = sentiment_classifier(new_comments)

for comment, prediction in zip(new_comments, predictions):
    predicted_label_str = prediction['label'] 
    predicted_label_int = int(predicted_label_str.split('_')[-1])
    sentiment = label_map[predicted_label_int]
    confidence = prediction['score']
    
    print(f'Comment: "{comment}"')
    print(f'Predicted Sentiment: {sentiment} (Confidence: {confidence:.4f})\n')
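Both inference cells recover the class id by splitting pipeline labels such as `LABEL_2`. These generic names appear because the models were loaded without an `id2label` mapping. A small helper, the hypothetical `decode_label` below, would keep that parsing in one place. It also accepts already-readable names in case a saved config defines them later.

```python
# Hypothetical helper for turning pipeline labels into sentiment names.
LABEL_NAMES = {0: "negative", 1: "neutral", 2: "positive"}

def decode_label(raw_label):
    """Turn a pipeline label such as 'LABEL_2' (or 'positive') into a sentiment name."""
    name = raw_label.lower()
    if name in LABEL_NAMES.values():
        return name  # the config already provided a readable id2label name
    prefix, _, idx = raw_label.rpartition("_")
    if prefix.upper() != "LABEL" or not idx.isdigit() or int(idx) not in LABEL_NAMES:
        raise ValueError(f"unrecognised label: {raw_label!r}")
    return LABEL_NAMES[int(idx)]

# Example dict in the shape a text-classification pipeline returns
prediction = {"label": "LABEL_2", "score": 0.97}
print(decode_label(prediction["label"]))  # positive
```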

In [None]:
def predict_sentiment():
    print("Type a comment to analyze its sentiment (type 'exit' to quit):")
    while True:
        user_input = input("Your comment: ")
        if user_input.lower() == 'exit':
            print("Exiting sentiment prediction.")
            break

        prediction = sentiment_classifier(user_input)[0]
        label_id = int(prediction['label'].split('_')[-1])
        confidence = prediction['score']

        sentiment = {
            0: "Negative",
            1: "Neutral",
            2: "Positive"
        }.get(label_id, "Unknown")

        print(f"\nInput: \"{user_input}\"")
        print(f"Predicted Sentiment: {sentiment} (Confidence: {confidence:.4f})\n")

predict_sentiment()