# Fine-Tuning a Transformer Model for Sentence Classification

In this notebook, we’ll fine-tune a pre-trained Transformer model for a sentence classification task. We’ll use the Hugging Face Transformers library along with the Datasets library to handle data loading and preprocessing.

Table of Contents

- [1. Setup and Installation](#1-setup-and-installation)
- [2. Load Dataset](#2-load-dataset)
- [3. Prepare a Small Subset of the Dataset](#3-prepare-a-small-subset-of-the-dataset)
- [4. Preprocess and Tokenize the Dataset](#4-preprocess-and-tokenize-the-dataset)
- [5. Evaluate the Pre-trained Model (Baseline)](#5-evaluate-the-pre-trained-model-baseline)
- [6. Fine-Tune the Model](#6-fine-tune-the-model)
- [7. Evaluate the Fine-Tuned Model](#7-evaluate-the-fine-tuned-model)
- [8. Analysis](#8-analysis)
- [9. Challenge: Improving the Model](#9-challenge-improving-the-model)
- [10. Conclusion](#10-conclusion)

**What is BERT?**

BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model released by Google. It can be fine-tuned for a variety of NLP tasks such as classification, entity recognition, and question answering. By adding a small, randomly initialized layer on top of BERT and training the combined model on your specific task, you reuse the language representations BERT learned during pre-training instead of learning them from scratch.
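
To make the idea of a layer on top of BERT concrete, here is a minimal sketch in plain Python. It assumes a hypothetical 4-dimensional sentence embedding (DistilBERT and BERT-base actually use 768 dimensions) and made-up weights; it illustrates the arithmetic of a linear classification head, not the library’s implementation.

```python
import math

def linear_head(features, weights, bias):
    """Map a sentence embedding to one logit per class: logits = W @ h + b."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    """Turn logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 4-dimensional embedding standing in for the encoder output.
embedding = [0.5, -1.0, 0.25, 2.0]

# Hypothetical head parameters: 2 classes x 4 features.
weights = [[0.1, 0.0, -0.2, 0.3],
           [-0.1, 0.2, 0.0, -0.3]]
bias = [0.0, 0.0]

logits = linear_head(embedding, weights, bias)
probs = softmax(logits)
print(logits, probs)
```

Fine-tuning adjusts both the head parameters and the encoder weights so that the logit for the correct class becomes the largest.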

**Why Fine-Tune BERT Instead of Building Your Own Model?**

* Easy Training: BERT’s pre-trained weights contain extensive language knowledge, so fine-tuning requires significantly less time (often just 2-4 epochs) than training a model from scratch, which can take hundreds of GPU hours.

* Less Data: Fine-tuning BERT can be done with much smaller datasets, making it feasible even when large amounts of labeled data are not available.

* High Performance: Simply adding a fully connected layer on top of BERT and fine-tuning it achieved state-of-the-art results on many tasks when BERT was released, without the need for complex, task-specific architectures.

**A Shift in NLP**

This approach mirrors a shift previously seen in computer vision, where pre-trained models are commonly fine-tuned for specific tasks, saving time and resources. The emergence of models like BERT represents a similar transformation in NLP, making powerful language models more accessible and efficient to use.

***Let’s get started!***

## 1. Setup and Installation


In [None]:
# Install necessary libraries
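# accelerate is required by Trainer; scikit-learn is used for the reports and the accuracy metric
!pip install --quiet --upgrade transformers datasets evaluate accelerate scikit-learn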

In [None]:
# Import libraries
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset, DatasetDict
from evaluate import load as load_metric
import numpy as np
import random
import pandas as pd
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt
%matplotlib inline


In [None]:
# Set random seed

seed = 42
torch.manual_seed(seed)
np.random.seed(seed)
random.seed(seed)
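
Why seeding matters can be shown with the standard library alone. The sketch below uses a hypothetical helper, `draw`, to show that two generators created with the same seed produce the same sequence. It is an illustration of the principle, not a guarantee of bit-for-bit reproducibility on GPUs.

```python
import random

def draw(seed, n=5):
    """Return n pseudo-random integers from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.randint(0, 99) for _ in range(n)]

same_a = draw(42)
same_b = draw(42)

# Identical seeds give identical sequences.
print(same_a == same_b)
```

The Transformers library also provides a `set_seed` helper that seeds Python, NumPy, and PyTorch in one call.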

## 2. Load Dataset

We’ll use the SST-2 dataset from the GLUE benchmark, which is a binary sentiment classification task.

In [None]:
# Load the SST-2 dataset
dataset = load_dataset('glue', 'sst2')

# Explore the dataset structure
print(dataset)

In [None]:
# View a sample from the training set
print("Training sample:")
print(dataset['train'][0])

# View a sample from the validation set
print("\nValidation sample:")
print(dataset['validation'][0])
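
Each SST-2 example is a dictionary with a `sentence` string, an integer `label` (0 for negative, 1 for positive), and an `idx`. The sketch below uses made-up records in that shape (the sentences are hypothetical, not taken from the dataset) to show one way to check the class balance.

```python
from collections import Counter

# Hypothetical records that mimic the structure of SST-2 examples.
toy_records = [
    {"sentence": "a charming and funny film", "label": 1, "idx": 0},
    {"sentence": "dull and far too long", "label": 0, "idx": 1},
    {"sentence": "a moving, well-acted drama", "label": 1, "idx": 2},
    {"sentence": "the plot never comes together", "label": 0, "idx": 3},
]

label_names = {0: "negative", 1: "positive"}

# Count how many examples fall into each class.
label_counts = Counter(label_names[record["label"]] for record in toy_records)
print(label_counts)
```

The same count can be run on the real data by passing `dataset['train']['label']` to `Counter`.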

## 3. Prepare a Small Subset of the Dataset


In [None]:
# Select a small subset for training and evaluation
small_train_dataset = dataset['train'].shuffle(seed=seed).select(range(500))  # 500 samples for training
small_eval_dataset = dataset['validation'].shuffle(seed=seed).select(range(100))  # 100 samples for evaluation

# Create a DatasetDict
small_dataset = DatasetDict({
    'train': small_train_dataset,
    'validation': small_eval_dataset
})
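
The shuffle-then-select pattern above can be sketched with the standard library. The helper `shuffled_subset` and the sizes used here are hypothetical. The Datasets library uses its own random generator, so the indices it picks will differ, but the principle is the same: a fixed seed gives the same subset on every run.

```python
import random

def shuffled_subset(n_total, n_keep, seed):
    """Shuffle the indices 0..n_total-1 with a fixed seed and keep the first n_keep."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    return indices[:n_keep]

subset_a = shuffled_subset(1000, 5, seed=42)
subset_b = shuffled_subset(1000, 5, seed=42)

# The same seed always selects the same examples.
print(subset_a == subset_b)
```

Shuffling before selecting matters because the first rows of a dataset are not necessarily a representative sample.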


## 4. Preprocess and Tokenize the Dataset


### Choose a Pre-trained Model and Tokenizer

Let’s load a pre-trained model. Several BERT variants are available; we’ll use `distilbert-base-uncased`, a distilled version of BERT that is smaller and faster while retaining much of its performance. “Uncased” means the text is lowercased before tokenization, so the model does not distinguish “Film” from “film”.



<img src='http://jalammar.github.io/images/bert-classifier.png' width=700px>

source: [The Illustrated BERT](http://jalammar.github.io/illustrated-bert/)

We’ll use the distilbert-base-uncased model for its balance between performance and computational efficiency.
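
As a rough illustration of what “uncased” implies, the sketch below lowercases text and strips accent marks using only the standard library. The function name `uncased_normalize` is hypothetical, and this approximates only the normalization step; the real tokenizer also splits text into WordPiece tokens.

```python
import unicodedata

def uncased_normalize(text):
    """Lowercase the text and remove combining accent marks."""
    lowered = text.lower()
    # NFD splits an accented character into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFD", lowered)
    # Category "Mn" (nonspacing mark) covers the combining accents.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

normalized = uncased_normalize("A Touching, Héartfelt FILM")
print(normalized)
```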

In [None]:
model_name = 'distilbert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

In [None]:
def tokenize_function(examples):
    return tokenizer(examples['sentence'], padding='max_length', truncation=True, max_length=128)

tokenized_datasets = small_dataset.map(tokenize_function, batched=True)

In [None]:
tokenized_datasets.set_format('torch', columns=['input_ids', 'attention_mask', 'label'])
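
To see what `padding='max_length'` and `truncation=True` do to each example, here is a simplified sketch. The helper `pad_or_truncate` and the token IDs are illustrative, and the sketch ignores details the real tokenizer handles, such as keeping the closing special token when truncating.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Cut a sequence to max_length, or pad it with pad_id up to max_length."""
    ids = list(token_ids[:max_length])
    n_pad = max_length - len(ids)
    return {
        "input_ids": ids + [pad_id] * n_pad,
        # 1 marks real tokens, 0 marks padding the model should ignore.
        "attention_mask": [1] * len(ids) + [0] * n_pad,
    }

short_example = pad_or_truncate([101, 2023, 3185, 102], max_length=6)
long_example = pad_or_truncate(list(range(1, 11)), max_length=6)

print(short_example)
print(long_example)
```

Padding every example to a fixed length is simple but wasteful when most sentences are short; padding each batch to its longest member is a common alternative.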

## 5. Evaluate the Pre-trained Model (Baseline)

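Before fine-tuning, let’s evaluate the model to establish a baseline. The DistilBERT encoder is pre-trained, but the classification head added by `AutoModelForSequenceClassification` is randomly initialized, so expect accuracy close to chance (roughly 50% on this binary task).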


In [None]:
# Load the pre-trained model
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

In [None]:
metric = load_metric('accuracy')

In [None]:
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    accuracy = metric.compute(predictions=predictions, references=labels)
    return accuracy
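
The logic inside `compute_metrics` is small enough to trace by hand. The sketch below reproduces it in plain Python on made-up logits: take the index of the largest logit in each row as the prediction, then compute the fraction of predictions that match the labels.

```python
def argmax(row):
    """Return the index of the largest value in a list."""
    return max(range(len(row)), key=lambda i: row[i])

def accuracy(logits, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Made-up logits for four examples and two classes.
toy_logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 0.5], [1.2, 0.8]]
toy_labels = [0, 1, 0, 0]

toy_accuracy = accuracy(toy_logits, toy_labels)
print(toy_accuracy)  # 3 of 4 predictions match, so 0.75
```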

In [None]:
training_args = TrainingArguments(
    output_dir='./results',
    per_device_eval_batch_size=64,
    do_train=False,
    do_eval=True,
    eval_strategy='no',
    logging_steps=10
)

In [None]:
trainer = Trainer(
    model=model,
    args=training_args,
    eval_dataset=tokenized_datasets['validation'],
    compute_metrics=compute_metrics
)

In [None]:
eval_dataset = tokenized_datasets['validation']

In [None]:
# Evaluate the pre-trained model
eval_results = trainer.evaluate()
print(f"Baseline accuracy of the pre-trained model: {eval_results['eval_accuracy']:.4f}")

In [None]:
# classification_report was already imported in the setup cell

# Generate predictions
predictions = trainer.predict(tokenized_datasets['validation'])
preds = np.argmax(predictions.predictions, axis=-1)
labels = predictions.label_ids

# Print the classification report
print("\nClassification Report:")
print(classification_report(labels, preds, zero_division=0))
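
When reading the baseline numbers, it helps to know what chance level looks like. The sketch below simulates a coin-flip predictor on perfectly balanced binary labels. It is a stand-in for an untrained classification head, not a model of it; an untrained head may instead predict mostly one class, which also lands near 50% on balanced data.

```python
import random

def random_guess_accuracy(n_examples, seed):
    """Accuracy of uniformly random guesses on balanced binary labels."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_examples)]  # alternating 0/1, perfectly balanced
    guesses = [rng.randint(0, 1) for _ in range(n_examples)]
    correct = sum(g == y for g, y in zip(guesses, labels))
    return correct / n_examples

chance_accuracy = random_guess_accuracy(10_000, seed=42)
print(f"{chance_accuracy:.3f}")
```

A baseline far above this level would suggest that something other than the untrained head is contributing, which is worth investigating.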

## 6. Fine-Tune the Model

Now we’ll fine-tune the pre-trained model on our small training dataset.



In [None]:
# Update training arguments for fine-tuning
training_args = TrainingArguments(
    output_dir='./results',
    eval_strategy='epoch',   # Evaluate at the end of every epoch
    save_strategy='epoch',   # Save on the same schedule, as load_best_model_at_end requires
    num_train_epochs=1,      # A single epoch keeps training quick
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    logging_steps=10,
    save_total_limit=1,
    load_best_model_at_end=True,
    metric_for_best_model='eval_accuracy',
    greater_is_better=True
)
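
These settings determine how many optimizer steps the run takes. The arithmetic below is a sketch that assumes a single device and no gradient accumulation; the helper name `training_steps` is hypothetical.

```python
import math

def training_steps(n_examples, batch_size, epochs):
    """Number of optimizer steps, assuming one device and no gradient accumulation."""
    steps_per_epoch = math.ceil(n_examples / batch_size)
    return steps_per_epoch * epochs

# 500 training examples, batch size 16, 1 epoch (the values used above).
total_steps = training_steps(n_examples=500, batch_size=16, epochs=1)

# With logging_steps=10, the training loss is logged at these steps.
logged_at = list(range(10, total_steps + 1, 10))

print(total_steps, logged_at)
```

With so few steps, the logged training loss gives only a coarse picture of convergence.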

In [None]:
# Re-initialize the trainer with training dataset
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['validation'],
    processing_class=tokenizer,  # this argument was named `tokenizer` before transformers v4.46
    compute_metrics=compute_metrics
)

In [None]:
# Fine-tune the model
trainer.train()

## 7. Evaluate the Fine-Tuned Model


In [None]:
# Evaluate the fine-tuned model
eval_results = trainer.evaluate()
print(f"Accuracy of the fine-tuned model: {eval_results['eval_accuracy']:.4f}")

## 8. Analysis

In [None]:
# Make predictions
predictions = trainer.predict(tokenized_datasets['validation'])
preds = np.argmax(predictions.predictions, axis=-1)
labels = predictions.label_ids

In [None]:
# Print classification report
print(classification_report(labels, preds, zero_division=0))

In [None]:
# Plot confusion matrix
cm = confusion_matrix(labels, preds)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=['Negative', 'Positive'])
disp.plot(cmap=plt.cm.Blues)
plt.title('Confusion Matrix')
plt.show()
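
The numbers in the classification report can be derived directly from the confusion matrix. The sketch below uses made-up labels and predictions and follows the scikit-learn layout, where rows are true classes and columns are predicted classes. The helper names are hypothetical.

```python
def confusion_counts(labels, preds, n_classes=2):
    """Build a matrix where entry [true][predicted] counts the examples."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for y, p in zip(labels, preds):
        matrix[y][p] += 1
    return matrix

def precision_recall(matrix, cls):
    """Precision and recall for one class, read off the confusion matrix."""
    true_positives = matrix[cls][cls]
    predicted_as_cls = sum(row[cls] for row in matrix)  # column sum
    actually_cls = sum(matrix[cls])                     # row sum
    precision = true_positives / predicted_as_cls if predicted_as_cls else 0.0
    recall = true_positives / actually_cls if actually_cls else 0.0
    return precision, recall

# Made-up labels and predictions (0 = negative, 1 = positive).
toy_labels = [0, 0, 0, 1, 1, 1, 1, 0]
toy_preds = [0, 1, 0, 1, 1, 0, 1, 0]

toy_matrix = confusion_counts(toy_labels, toy_preds)
positive_precision, positive_recall = precision_recall(toy_matrix, cls=1)
print(toy_matrix, positive_precision, positive_recall)
```

Off-diagonal entries show which kind of mistake the model makes more often: negatives predicted as positive, or the reverse.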

## 9. Challenge: Improving the Model

#### **TODO**: Think of Ways to Improve the Results

**Your Task:**
* **Objective**: Explore and implement strategies to improve the model’s performance as much as possible.


##### Consider the Following Approaches:
* **Hyperparameter Tuning**: Experiment with different learning rates, batch sizes, and number of epochs.
* **Data Augmentation**: Use techniques to expand or enhance the training dataset.
* **Using a Larger Dataset**: Increase the number of samples in the training set.
* **Try Different Pre-trained Models**: Use models like bert-base-uncased or roberta-base.
* **Adjust the Tokenization Parameters**: Modify max_length, padding, and truncation settings.
* **Layer Freezing/Unfreezing**: Experiment with freezing certain layers of the model during training.
* **Regularization Techniques**: Apply techniques like dropout or weight decay.

##### Implement and Document:
* **Code Changes**: Apply the changes in the code cells.
* **Observations**: Note the impact of each change on the model’s performance.
* **Analysis**: Discuss why certain changes improved or did not improve the results.
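
For the hyperparameter tuning approach, one simple way to organize experiments is a small grid of configurations. The sketch below is a starting point under stated assumptions: the candidate values are illustrative, not recommendations from this notebook, and the helper `build_grid` is hypothetical. The dictionary keys match `TrainingArguments` parameter names.

```python
from itertools import product

# Illustrative candidate values; adjust them to your compute budget.
learning_rates = [2e-5, 3e-5, 5e-5]
batch_sizes = [16, 32]
epoch_counts = [2, 3, 4]

def build_grid(learning_rates, batch_sizes, epoch_counts):
    """Return one config dict per combination of the candidate values."""
    return [
        {
            "learning_rate": lr,
            "per_device_train_batch_size": bs,
            "num_train_epochs": ep,
        }
        for lr, bs, ep in product(learning_rates, batch_sizes, epoch_counts)
    ]

grid = build_grid(learning_rates, batch_sizes, epoch_counts)
print(len(grid), grid[0])
```

Each dictionary could be unpacked into `TrainingArguments(output_dir=..., **config)`, with a freshly loaded model for every run so that results stay comparable.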

## 10. Conclusion

In this notebook, we:
* Used a small subset of the SST-2 dataset to reduce training time.
* Evaluated the pre-trained DistilBERT model, with its untrained classification head, to establish a baseline.
* Fine-tuned the model quickly (1 epoch).
* Compared accuracy before and after fine-tuning.
* Visualized the results using a confusion matrix and classification report.

Key Takeaways:
* Efficient Training: By reducing the dataset size and epochs, we can fine-tune models quickly.
* Baseline Comparison: Evaluating before and after fine-tuning highlights the impact of training.
* Simplified Preprocessing: Utilizing the transformers library simplifies data preprocessing.