# Fine-Tune a Transformer for Sentiment Analysis

I chose DistilBERT and the IMDB movie reviews dataset, and used the Trainer API from the Transformers library to handle the fine-tuning loop.

---



## Why DistilBERT?

DistilBERT is a lightweight version of BERT (Bidirectional Encoder Representations from Transformers). It retains about 97% of BERT’s language-understanding performance while being about 40% smaller and 60% faster.

Faster training and inference: Because it’s smaller, DistilBERT is quicker to fine-tune and requires less computational power. This is ideal for smaller datasets or limited resources.

Good baseline performance: Although it is distilled (trained as a smaller "student" model to reproduce the outputs of a larger "teacher" model), it still performs well on a wide range of NLP tasks.
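
To make "distilled" concrete, the sketch below shows the soft-target term that knowledge distillation typically minimizes: the student is trained to match the teacher's temperature-softened output distribution. This is a simplified pure-Python illustration, not DistilBERT's actual training code (which combines this term with other losses); the function names, logits, and temperature are made up for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives a flatter distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_target_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence KL(teacher || student) between temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits for one example with two classes
teacher = [2.5, -1.0]
close_student = [2.0, -0.8]
far_student = [-1.0, 2.0]

print(soft_target_loss(teacher, close_student))  # small: student agrees with teacher
print(soft_target_loss(teacher, far_student))    # larger: student disagrees
```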


## Why the Trainer API?



Reduces boilerplate code: Instead of manually coding training loops, data loaders, and logging, you get these features built in, which cuts setup time and lines of code.

Flexibility: It allows customizing the training parameters, evaluation metrics, logging frequency, and checkpointing, which makes it suitable for many kinds of tasks.

Optimized for Transformers: Trainer is designed to work with Transformers models and handles tasks like gradient clipping and distributed training, which are often required for larger models.
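
As one example of what Trainer takes care of, gradient clipping rescales gradients whose overall norm exceeds a threshold (the `max_grad_norm` argument of `TrainingArguments`, which defaults to 1.0). The pure-Python sketch below illustrates the idea on a flat list of numbers; it is a simplified stand-in, and `clip_by_global_norm` is a hypothetical helper, not a Transformers function.

```python
import math

def clip_by_global_norm(grads, max_norm=1.0):
    """Scale all gradients by the same factor so their L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm

grads = [3.0, 4.0]  # L2 norm is 5.0
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
print(norm_before)  # 5.0
print(clipped)      # same direction, norm scaled down to about 1.0
```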




In [2]:
pip install transformers datasets



In [3]:
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset


In [1]:
import torch

# Check if GPU is available
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("GPU is available")
else:
    device = torch.device("cpu")
    print("Using CPU")


GPU is available


In [5]:
model_name = "distilbert-base-uncased"
tokenizer = DistilBertTokenizer.from_pretrained(model_name)
model = DistilBertForSequenceClassification.from_pretrained(model_name, num_labels=2).to(device)


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [6]:
dataset = load_dataset("imdb")


The dataset loads with three splits: train (25,000 examples), test (25,000 examples), and unsupervised (50,000 examples).

In [7]:
def preprocess(data):
    return tokenizer(data["text"], padding="max_length", truncation=True)

tokenized_dataset = dataset.map(preprocess, batched=True)
train_dataset = tokenized_dataset["train"]
test_dataset = tokenized_dataset["test"]


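
The `preprocess` function above pads every review to the tokenizer's maximum length and truncates longer ones, so all examples end up the same length. The sketch below mimics that behavior on plain lists of token IDs. It is a simplified illustration: `pad_or_truncate`, the pad ID of 0, the token IDs, and the small `max_length` are assumptions for the example, and the real tokenizer also takes care of special tokens when truncating.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    ids = token_ids[:max_length]            # truncate anything past max_length
    mask = [1] * len(ids)                   # 1 marks real tokens
    n_pad = max_length - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad  # 0 marks padding

short_ids, short_mask = pad_or_truncate([101, 2023, 102], max_length=5)
long_ids, long_mask = pad_or_truncate([101, 2023, 3185, 2001, 6429, 999, 102], max_length=5)

print(short_ids, short_mask)  # [101, 2023, 102, 0, 0] [1, 1, 1, 0, 0]
print(long_ids, long_mask)    # [101, 2023, 3185, 2001, 6429] [1, 1, 1, 1, 1]
```

Padding to a fixed maximum is simple but spends computation on padding for short reviews; padding each batch only to its longest member (for example with `DataCollatorWithPadding`) is a common alternative.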

## Fine-tuning

In [18]:
training_args = TrainingArguments(
    output_dir="./results",                 # Directory to save model checkpoints
    eval_strategy="epoch",                  # Evaluate at the end of each epoch
    save_strategy="epoch",                  # Save a checkpoint at the end of each epoch
    load_best_model_at_end=True,            # Reload the best checkpoint when training finishes
    metric_for_best_model="eval_accuracy",  # Metric used to pick the best checkpoint
    greater_is_better=True,                 # Higher accuracy is better
    num_train_epochs=3,                     # Number of training epochs
    per_device_train_batch_size=8,          # Batch size per device for training
    per_device_eval_batch_size=8,           # Batch size per device for evaluation
    logging_dir="./logs",                   # Directory for training logs
    report_to="none",                       # Disable external loggers such as Weights & Biases
)
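
`metric_for_best_model="eval_accuracy"` only works if the Trainer is given a `compute_metrics` function that returns an `accuracy` entry (the Trainer adds the `eval_` prefix when reporting). A minimal sketch of such a function, assuming NumPy is available, might look like this; it is one plausible definition, not necessarily the one used for the recorded run.

```python
import numpy as np

def compute_metrics(eval_pred):
    """Compute accuracy from the (logits, labels) pair that Trainer passes in."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)  # class with the highest logit
    accuracy = float((predictions == labels).mean())
    return {"accuracy": accuracy}

# Quick check on made-up logits for four examples
fake_logits = np.array([[0.1, 2.0], [1.5, -0.3], [0.2, 0.9], [3.0, 0.1]])
fake_labels = np.array([1, 0, 0, 1])
print(compute_metrics((fake_logits, fake_labels)))  # {'accuracy': 0.5}
```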


In [9]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics,  # must be defined before this cell runs; its definition is not shown here
)
trainer.train()


| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | 0.2692        | 0.320887        |
| 2     | 0.1595        | 0.28316         |
| 3     | 0.0766        | 0.363954        |


TrainOutput(global_step=9375, training_loss=0.18559849650065105, metrics={'train_runtime': 5367.6584, 'train_samples_per_second': 13.973, 'train_steps_per_second': 1.747, 'total_flos': 9935054899200000.0, 'train_loss': 0.18559849650065105, 'epoch': 3.0})
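
The `global_step=9375` in the output follows directly from the dataset size, batch size, and epoch count, assuming a single device and no gradient accumulation. A quick sanity check:

```python
import math

train_examples = 25_000   # size of the IMDB train split
batch_size = 8            # per_device_train_batch_size
epochs = 3                # num_train_epochs

steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs
# With save_strategy="epoch", a checkpoint is written at the end of each epoch
checkpoint_steps = [steps_per_epoch * e for e in range(1, epochs + 1)]

print(steps_per_epoch)   # 3125
print(total_steps)       # 9375
print(checkpoint_steps)  # [3125, 6250, 9375]
```

The last value matches the `checkpoint-9375` directory loaded in the demo.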

## Evaluation

In [None]:
eval_results = trainer.evaluate()


In [15]:
print(f"Test loss: {eval_results['eval_loss']:.2f}")

Test loss: 0.36


In [14]:
print(eval_results.keys())

dict_keys(['eval_loss', 'eval_runtime', 'eval_samples_per_second', 'eval_steps_per_second', 'epoch'])


In [27]:
print(eval_results)

{'eval_loss': 0.36395397782325745, 'eval_runtime': 428.4656, 'eval_samples_per_second': 58.348, 'eval_steps_per_second': 7.293, 'epoch': 3.0}
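
Because the evaluation loss is the mean cross-entropy over the test set, `exp(-loss)` gives the geometric mean of the probability the model assigned to the correct label. This is only a rough way to read the number, not a substitute for accuracy, which this run did not record:

```python
import math

eval_loss = 0.36395397782325745  # from eval_results above
geometric_mean_prob = math.exp(-eval_loss)
print(f"{geometric_mean_prob:.3f}")  # 0.695
```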


## Demo

In [25]:
import torch
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification

# Load the tokenizer from the base model and the fine-tuned weights from the last checkpoint
# (`device` comes from the GPU check earlier in the notebook)
model_name = "distilbert-base-uncased"
tokenizer = DistilBertTokenizer.from_pretrained(model_name)
model = DistilBertForSequenceClassification.from_pretrained("/content/results/checkpoint-9375").to(device)  # Path to the fine-tuned checkpoint

# Function to predict label and score for a given sentence
def predict(sentence):
    # Preprocess the input sentence
    inputs = tokenizer(sentence, return_tensors="pt", padding=True, truncation=True, max_length=512)

    # Move inputs to the GPU if available
    inputs = {key: value.to(device) for key, value in inputs.items()}

    # Set the model to evaluation mode and disable gradient calculation
    model.eval()
    with torch.no_grad():
        outputs = model(**inputs)

    # Get logits and apply softmax to get probabilities
    logits = outputs.logits
    probabilities = torch.nn.functional.softmax(logits, dim=-1)

    # Get predicted label and score
    predicted_label = torch.argmax(probabilities, dim=-1).item()
    score = probabilities[0][predicted_label].item()

    return predicted_label, score

# Example usage
input_sentences = ["This movie was fantastic! I really enjoyed it.", " I kind of didn't like it", "I feel mixed about this"]
for input_sentence in input_sentences:
    predicted_label, score = predict(input_sentence)
    print(f"Input: {input_sentence}, Predicted Label: {predicted_label}, Score: {score}")



Input: This movie was fantastic! I really enjoyed it., Predicted Label: 1, Score: 0.9995841383934021
Input:  I kind of didn't like it, Predicted Label: 0, Score: 0.999371349811554
Input: I feel mixed about this, Predicted Label: 1, Score: 0.9913030862808228
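
The predicted labels are integer class IDs; in the Hugging Face `imdb` dataset, 0 is a negative review and 1 is a positive one. The sketch below reproduces the last steps of `predict` in plain Python and maps the ID to a name. The `LABEL_NAMES` mapping and `label_and_score` helper are illustrative additions, and the logits are made up.

```python
import math

LABEL_NAMES = {0: "negative", 1: "positive"}  # IMDB label convention

def label_and_score(logits):
    """Softmax over the logits, then return the top class name and its probability."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABEL_NAMES[best], probs[best]

name, score = label_and_score([-3.0, 4.0])  # hypothetical logits
print(name, round(score, 4))  # positive 0.9991
```

Since the model has only two classes, a sentence like "I feel mixed about this" is still assigned to one of them, and a high softmax score does not by itself indicate that the prediction is reliable.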
