<a href="https://colab.research.google.com/github/aainabatool/FineTuning/blob/main/FineTuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Fine-Tuning DistilBERT (Uncased) on the Emotion Dataset**

In [None]:
!pip install datasets transformers evaluate accelerate


In [None]:
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import TrainingArguments, Trainer
import evaluate
import numpy as np


**Load Dataset**


In [None]:
dataset = load_dataset("emotion")
dataset


**Tokenization**

In [None]:
checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

encoded_dataset = dataset.map(tokenize, batched=True)


In [None]:
encoded_dataset = encoded_dataset.rename_column("label", "labels")  # Trainer expects the target column to be named "labels"
encoded_dataset.set_format("torch", columns=["input_ids", "attention_mask", "labels"])  # return PyTorch tensors for the columns the model consumes

**Load Model**

We add a classification head because DistilBERT on its own only produces contextual embeddings, one vector per token. The head maps those embeddings to one score per class (6 emotions here); without it, the model cannot classify.

The input sentence is first tokenized. DistilBERT processes the tokens and outputs an embedding for each one; the embedding at the special [CLS] position serves as a summary of the whole sentence. The classification head applies linear layers to that vector and outputs one logit per emotion class, and a softmax over the logits gives the class probabilities. That is how the model predicts whether the text expresses anger, joy, sadness, and so on.
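The step from a sentence embedding to class scores can be sketched with plain NumPy. This is an illustration only, not the model's real weights: `cls_embedding`, `W`, and `b` are random stand-ins, and the head is simplified to a single linear layer (the Hugging Face DistilBERT head also has an extra linear layer, a ReLU, and dropout before the final projection).

```python
import numpy as np

rng = np.random.default_rng(0)

hidden_size = 768   # hidden size of distilbert-base-uncased
num_classes = 6     # emotions in the dataset

# Random stand-in for the encoder output at the [CLS] position.
cls_embedding = rng.normal(size=(hidden_size,))

# Hypothetical head parameters; the real ones are learned during fine-tuning.
W = rng.normal(scale=0.02, size=(num_classes, hidden_size))
b = np.zeros(num_classes)

# Linear projection: one raw score (logit) per emotion class.
head_logits = W @ cls_embedding + b

# Softmax turns the logits into probabilities that sum to 1.
exp_scores = np.exp(head_logits - head_logits.max())
head_probs = exp_scores / exp_scores.sum()

print("logits shape:", head_logits.shape)
print("probabilities:", head_probs.round(3))
print("predicted class id:", int(np.argmax(head_probs)))
```

With random weights the predicted class is meaningless; fine-tuning is what makes the projection useful.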

In [None]:
num_labels = dataset["train"].features["label"].num_classes

model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=num_labels
)


Model outputs logits → converted to probabilities → used for classification.

**Metrics**

logits = raw model outputs before softmax

argmax → picks the class with highest score.

Accuracy → measures overall correctness.

F1-macro → computes F1 for each class separately, then averages the classes with equal weight (important for imbalanced datasets).
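To see why macro F1 matters on imbalanced data, here is a small hand-rolled sketch on made-up labels (not the emotion dataset). The helper `macro_f1` and the `toy_*` names are illustrative only; a class that is never predicted is scored as 0 here, which is one common convention.

```python
import numpy as np

def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); use 0 when the class never appears at all.
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))

# Toy data: 8 examples of class 0, 2 of class 1.
toy_true = np.array([0] * 8 + [1] * 2)
# A lazy classifier that always predicts the majority class.
toy_pred = np.zeros(10, dtype=int)

toy_accuracy = float(np.mean(toy_true == toy_pred))
toy_macro_f1 = macro_f1(toy_true, toy_pred, num_classes=2)

print(f"accuracy: {toy_accuracy:.3f}")
print(f"macro F1: {toy_macro_f1:.3f}")
```

Accuracy looks respectable because the majority class dominates, while macro F1 drops sharply because the minority class is never predicted correctly.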

In [None]:
accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    acc = accuracy.compute(predictions=predictions, references=labels)
    f1_macro = f1.compute(predictions=predictions, references=labels, average="macro")
    return {"accuracy": acc["accuracy"], "f1_macro": f1_macro["f1"]}


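Logits are not the predictions themselves. They are the raw, unnormalized scores output by the model. Softmax converts them into probabilities, and argmax then picks the predicted class. Because softmax preserves the ordering of the scores, taking argmax directly on the logits, as compute_metrics does, selects the same class.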
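A quick check of that relationship on a made-up batch of logits (the values and the `toy_*` names are arbitrary, chosen only for illustration):

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Two examples, three classes each.
toy_logits = np.array([[2.0, -1.0, 0.5],
                       [0.1, 0.3, -2.0]])
toy_probs = softmax(toy_logits)

print("probabilities:\n", toy_probs.round(3))
print("argmax of logits:       ", np.argmax(toy_logits, axis=-1))
print("argmax of probabilities:", np.argmax(toy_probs, axis=-1))
```

Both argmax calls return the same class ids, so softmax is only needed when the probabilities themselves are of interest.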

**Training Arguments**

In [None]:
training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,        # regularization that helps reduce overfitting
    logging_dir="./logs",
    logging_steps=50,
    load_best_model_at_end=True,
)

**Trainer**

In [None]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)


**Train**


In [None]:
trainer.train()


**Evaluation**

In [None]:
results_finetuned = trainer.evaluate(encoded_dataset["test"])
print(results_finetuned)


# **Compare with Base (Pretrained) Model**

In [None]:
from transformers import pipeline

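# Baseline without fine-tuning: the classification head is randomly initialized,
# so this is not true zero-shot and the predicted label is essentially arbitrary.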
clf = pipeline("text-classification", model=checkpoint, tokenizer=tokenizer)
sample = dataset["test"][0]["text"]
print("Sample text:", sample)
print("Base Model Prediction:", clf(sample))


In [None]:
clf_finetuned = pipeline("text-classification", model=model, tokenizer=tokenizer)
print("Fine-Tuned Prediction:", clf_finetuned(sample))


**Confusion Matrix**

In [None]:
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

preds = trainer.predict(encoded_dataset["test"])
y_true = preds.label_ids
y_pred = np.argmax(preds.predictions, axis=1)

ConfusionMatrixDisplay.from_predictions(y_true, y_pred, display_labels=dataset["train"].features["label"].names)
plt.show()


Rows = True labels (actual emotion in dataset).

Columns = Predicted labels (what model guessed).

Diagonal values = correct predictions (good).

Off-diagonal values = misclassifications (errors).
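The same reading can be reproduced by hand on a tiny made-up example. The helper `confusion_matrix` below and the `toy_*` arrays are illustrative only and are not the notebook's test-set results.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count matrix where row = true label and column = predicted label."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

toy_true = np.array([0, 0, 1, 1, 2, 2, 2])
toy_pred = np.array([0, 1, 1, 1, 2, 0, 2])

cm = confusion_matrix(toy_true, toy_pred, num_classes=3)
num_correct = int(np.trace(cm))            # sum of the diagonal
num_errors = int(cm.sum()) - num_correct   # everything off the diagonal
per_class_recall = np.diag(cm) / cm.sum(axis=1)

print(cm)
print("correct:", num_correct, "errors:", num_errors)
print("per-class recall:", per_class_recall.round(3))
```

Dividing each diagonal entry by its row sum gives per-class recall, which shows which emotions the model confuses most often.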

In [None]:
from transformers import pipeline

# Load the best fine-tuned checkpoint + tokenizer.
# Trainer saves checkpoints as checkpoint-<step>, so use the path it tracked as best.
emotion_clf = pipeline(
    "text-classification",
    model=trainer.state.best_model_checkpoint,   # set because load_best_model_at_end=True
    tokenizer="distilbert-base-uncased",
    return_all_scores=True   # so we can see probabilities for all classes
)

# --- Test cases ---
texts = [
    "I am so happy to see you again!",        # joy
    "I feel really lonely and sad today.",    # sadness
    "I love spending time with my family.",   # love
    "This situation makes me so angry!",      # anger
    "I was scared walking home at night.",    # fear
    "Wow, I didn’t expect that surprise!",    # surprise
]

# Run predictions
for t in texts:
    result = emotion_clf(t)
    print(f"\nInput: {t}")
    for r in result[0]:
        print(f"{r['label']}: {r['score']:.3f}")