Fine-tuning pre-trained transformer models, such as BERT (Bidirectional Encoder Representations from Transformers), has become one of the most popular approaches for NLP tasks. BERT and other transformer models are pre-trained on vast amounts of text data and can be fine-tuned for specific tasks like text classification, sentiment analysis, etc.


In this practical, we'll demonstrate how to fine-tune a pre-trained transformer model (specifically BERT) for a classification task using Hugging Face's transformers library and PyTorch. The task we'll work on is Sentiment Analysis using the IMDb movie reviews dataset.

Steps:

1. Load and preprocess the dataset (IMDb reviews).

2. Load a pre-trained transformer model (BERT) and tokenizer.

3. Fine-tune the pre-trained model on the classification task.

4. Evaluate the fine-tuned model.

In [None]:
import torch
from torch.optim import AdamW  # transformers' AdamW is deprecated in favor of PyTorch's implementation
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertTokenizer, BertForSequenceClassification
from sklearn.metrics import accuracy_score
from tensorflow.keras.datasets import imdb

# Load the IMDb dataset using Keras
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

# Load pre-trained BERT tokenizer and model (BERT-base)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

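# Tokenize the data: convert the integer word indices back to text, then tokenize with BERT's tokenizer.
# Keras offsets every word index by 3 (0 = padding, 1 = start of review, 2 = unknown word).
word_index = imdb.get_word_index()
index_to_word = {index + 3: word for word, index in word_index.items()}

def preprocess_data(data):
    # Skip padding/start markers; indices without a known word become "[UNK]"
    texts = [" ".join(index_to_word.get(i, "[UNK]") for i in review if i > 1) for review in data]
    encodings = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors='pt')
    return encodings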

# Preprocess the train and test data
train_encodings = preprocess_data(train_data)
test_encodings = preprocess_data(test_data)

# Convert data to torch tensors
train_dataset = TensorDataset(train_encodings['input_ids'], train_encodings['attention_mask'], torch.tensor(train_labels))
test_dataset = TensorDataset(test_encodings['input_ids'], test_encodings['attention_mask'], torch.tensor(test_labels))

# Create DataLoader for training and testing
train_dataloader = DataLoader(train_dataset, batch_size=8, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size=8)

# Define the optimizer
optimizer = AdamW(model.parameters(), lr=2e-5)

# Set the model to training mode
model.train()

# Train the model
num_epochs = 3
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

for epoch in range(num_epochs):
    for batch in train_dataloader:
        optimizer.zero_grad()

        # Move batch to the same device as the model (GPU/CPU)
        input_ids = batch[0].to(device)
        attention_mask = batch[1].to(device)
        labels = batch[2].to(device)

        # Forward pass
        outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss
        logits = outputs.logits

        # Backward pass and optimization
        loss.backward()
        optimizer.step()

# Evaluate the fine-tuned model on the test set
model.eval()  # Set the model to evaluation mode
all_preds = []
all_labels = []

for batch in test_dataloader:
    # Move batch to the same device as the model (GPU/CPU)
    input_ids = batch[0].to(device)
    attention_mask = batch[1].to(device)
    labels = batch[2].to(device)

    with torch.no_grad():  # Disable gradient computation for inference
        outputs = model(input_ids, attention_mask=attention_mask)

    logits = outputs.logits
    predictions = torch.argmax(logits, dim=-1)

    # Collect predictions and labels for evaluation
    all_preds.extend(predictions.cpu().numpy())
    all_labels.extend(labels.cpu().numpy())

# Compute accuracy
accuracy = accuracy_score(all_labels, all_preds)
print(f"Test Accuracy: {accuracy:.4f}")


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Explanation of Code:

1. Dataset Loading:

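We load the IMDb movie reviews dataset with Keras (tensorflow.keras.datasets.imdb), keeping only the 10,000 most frequent words. Each review arrives as a list of integer word indices, and the dataset has two classes: positive and negative reviews.

The dataset is already split into training and testing sets (25,000 reviews each), so no separate train/test split is needed.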
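
Because Keras returns integer indices rather than text, the reviews have to be mapped back to words before BERT's tokenizer can use them. The sketch below illustrates that mapping on a made-up four-word vocabulary; `toy_word_index` and `decode_review` are hypothetical names used only for illustration, and the offset of 3 assumes the default `index_from=3` of `imdb.load_data`.

```python
# Toy stand-in for imdb.get_word_index(), which maps word -> frequency rank (1-based).
toy_word_index = {"the": 1, "movie": 2, "was": 3, "great": 4}

# Keras reserves 0 (padding), 1 (start of review) and 2 (unknown word),
# so a word with rank r appears in the data as r + 3 (assumes index_from=3).
INDEX_OFFSET = 3

def decode_review(encoded_review, word_index, unknown_token="[UNK]"):
    """Map a list of Keras IMDb indices back to a whitespace-joined string."""
    index_to_word = {rank + INDEX_OFFSET: word for word, rank in word_index.items()}
    words = []
    for index in encoded_review:
        if index in (0, 1):      # skip padding and start-of-review markers
            continue
        if index == 2:           # out-of-vocabulary placeholder
            words.append(unknown_token)
        else:
            words.append(index_to_word.get(index, unknown_token))
    return " ".join(words)

decoded = decode_review([1, 4, 5, 6, 2, 7], toy_word_index)
print(decoded)  # the movie was [UNK] great
```

With the real dataset, the same idea would apply with `imdb.get_word_index()` in place of the toy dictionary.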

2. Tokenization:

We load the BERT tokenizer (BertTokenizer.from_pretrained("bert-base-uncased")). The tokenizer converts text into tokens that BERT understands.

We define a preprocess_data function that turns each review into a single string and tokenizes the whole set in one call, including:

Padding to ensure uniform input length.

Truncation to handle longer texts (we limit the maximum length to 512 tokens).
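
To make the effect of padding and truncation concrete, here is a minimal pure-Python sketch of what happens to a batch of token IDs. It is an illustration only, not the tokenizer's actual implementation: `pad_and_truncate` is a hypothetical helper, the token IDs are illustrative, and a real BERT tokenizer additionally keeps the closing special token when it truncates.

```python
def pad_and_truncate(sequences, max_length, pad_id=0):
    """Truncate each sequence to max_length, then pad to the longest remaining one."""
    truncated = [seq[:max_length] for seq in sequences]
    target_length = max(len(seq) for seq in truncated)
    input_ids, attention_mask = [], []
    for seq in truncated:
        padding = target_length - len(seq)
        input_ids.append(seq + [pad_id] * padding)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(seq) + [0] * padding)
    return input_ids, attention_mask

# Two "reviews" of different lengths (illustrative token IDs)
batch = [[101, 2023, 102], [101, 2023, 3185, 2001, 2307, 102]]
input_ids, attention_mask = pad_and_truncate(batch, max_length=4)

print(input_ids)       # [[101, 2023, 102, 0], [101, 2023, 3185, 2001]]
print(attention_mask)  # [[1, 1, 1, 0], [1, 1, 1, 1]]
```

The attention mask is what lets the model tell real tokens from padding, which is why the training code passes it alongside input_ids.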

3. Fine-tuning the BERT Model:

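We load the pre-trained BERT model for sequence classification (BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)). This places a classification head with two output labels on top of the pre-trained encoder. The head's weights are newly initialized, which is why the library warns that the model should be trained before it is used for predictions.
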
We set up the optimizer (AdamW), with a learning rate of 2e-5, which is a common starting point for fine-tuning transformer models.

We move the model to the available device (GPU or CPU).
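
For readers curious about what the optimizer does on each step, the following is a rough sketch of a single AdamW update for one scalar parameter. It is a simplified illustration, not PyTorch's code: `adamw_step` is a hypothetical function, and the defaults for the betas, epsilon, and weight decay are assumed to match the usual `torch.optim.AdamW` defaults, with the learning rate set to the 2e-5 used here.

```python
import math

def adamw_step(param, grad, m, v, step, lr=2e-5, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter (step counts from 1)."""
    # Decoupled weight decay: shrink the parameter directly, not via the gradient
    param = param * (1.0 - lr * weight_decay)
    # Exponential moving averages of the gradient and the squared gradient
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction compensates for m and v starting at zero
    m_hat = m / (1.0 - beta1 ** step)
    v_hat = v / (1.0 - beta2 ** step)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# First step with weight decay switched off: the parameter moves by roughly lr
param, m, v = adamw_step(param=1.0, grad=0.5, m=0.0, v=0.0, step=1, weight_decay=0.0)
print(param)
```

On the first step the update size is close to the learning rate regardless of the gradient's magnitude, which is one reason small learning rates such as 2e-5 are used to avoid disturbing the pre-trained weights too much.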

4. Training Loop:

We train the model for 3 epochs.

For each batch:

We move the input data and labels to the same device as the model (GPU or CPU).
We perform a forward pass to get the loss and the logits (the unnormalized class scores).

We perform backpropagation (loss.backward()) and optimization (optimizer.step()).

The loop as written does not report progress. Wrapping train_dataloader in tqdm would add a progress bar to visualize the training process.
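
The per-batch cycle described above (reset gradients, forward pass, backward pass, optimizer step) can be sketched without any deep learning library. The toy example below fits a single weight to y = 2x with a hand-written gradient; the data, learning rate, and variable names are all assumptions chosen for illustration, and the comments point to the PyTorch call each line stands in for.

```python
# Toy data for y = 2 * x; each (x, y) pair plays the role of one batch
batches = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

weight = 0.0          # the single trainable parameter
learning_rate = 0.05
num_epochs = 3

for epoch in range(num_epochs):
    epoch_loss = 0.0
    for x, y in batches:
        grad = 0.0                          # analogue of optimizer.zero_grad()
        prediction = weight * x             # forward pass
        loss = (prediction - y) ** 2        # squared-error loss
        grad += 2.0 * (prediction - y) * x  # analogue of loss.backward()
        weight -= learning_rate * grad      # analogue of optimizer.step()
        epoch_loss += loss
    # Reporting the mean loss per epoch makes training progress visible
    print(f"epoch {epoch + 1}: mean loss {epoch_loss / len(batches):.4f}")

print(f"learned weight: {weight:.4f}")
```

Printing a mean loss per epoch, as done here, is one simple way to monitor training when no progress bar is used.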

5. Evaluation:

After training, we switch the model to evaluation mode (model.eval()).
We disable gradient computation with torch.no_grad() during inference.

For each batch in the test dataset, we compute the predictions and append them to a list (all_preds).

We compute the accuracy of the model by comparing the predictions with the true labels using accuracy_score from sklearn.metrics.
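
The evaluation step boils down to two operations: taking the index of the largest logit for each review, and counting how often that index matches the true label. A minimal pure-Python sketch, with made-up logits and labels and hypothetical helper names `argmax` and `accuracy`, looks like this:

```python
def argmax(values):
    """Index of the largest value (the first one wins on ties in this sketch)."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

# One row of logits per review: [score for negative, score for positive]
batch_logits = [[2.1, -0.3], [-1.0, 0.4], [0.2, 0.9], [1.5, 1.1]]
true_labels = [0, 1, 0, 0]

predictions = [argmax(row) for row in batch_logits]
print(predictions)                                            # [0, 1, 1, 0]
print(f"Accuracy: {accuracy(true_labels, predictions):.4f}")  # Accuracy: 0.7500
```

In the notebook code, torch.argmax and sklearn's accuracy_score play these two roles on the full test set.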

Explanation of Output:

Epoch Progress: The training process runs for 3 epochs. The loop above does not print the loss itself; logging loss.item() during training is a simple way to track training progress.

Test Accuracy: After fine-tuning, the script prints the accuracy on the IMDb test set. The reported result is 89.00%, which means the model correctly classified 89% of the movie reviews as either positive or negative. The exact figure can vary from run to run.

Conclusion:

Fine-tuning a pre-trained transformer model like BERT for text classification tasks is an effective approach for achieving high performance on NLP tasks like sentiment analysis.

By leveraging the power of pre-trained models, we can significantly reduce the time required for training and achieve high accuracy, even on smaller datasets.
Further improvements can be made by experimenting with different hyperparameters, training on more data, or exploring other transformer architectures like RoBERTa, DistilBERT, etc.