# Fine-Tune a Transformer for Sentiment Analysis

I chose DistilBert and the IMDB movie reviews dataset. I used the Trainer API from the Transformers library to facilitate the fine-tuning process.

---



# Why DistilBERT?

DistilBERT is a lightweight version of BERT (Bidirectional Encoder Representations from Transformers). It retains 97% of BERT’s performance but is 60% smaller and 60% faster.

Faster training and inference: Because it’s smaller, DistilBERT is quicker to fine-tune and requires less computational power. This is ideal for smaller datasets or limited resources.

Good baseline performance: Despite being distilled (a process that compresses the model), it still performs well on a wide range of NLP tasks.


# Why the Trainer API?



Reduces Boilerplate Code: Instead of manually coding training loops, data loaders, and logging, you get these features built-in, reducing the setup time and lines of code.

Flexibility: It allows customizing the training parameters, evaluation metrics, logging frequency, and checkpointing—making it suitable for many kinds of tasks.

Optimized for Transformers: Specifically designed to work efficiently with Transformers, Trainer handles tasks like gradient clipping and distributed training which are often required for larger models.




In [5]:
pip uninstall sympy -y


Found existing installation: sympy 1.13.2
Uninstalling sympy-1.13.2:
  Successfully uninstalled sympy-1.13.2
Note: you may need to restart the kernel to use updated packages.


In [10]:
! pip install sympy==1.13.1 



Collecting sympy==1.13.1
  Using cached sympy-1.13.1-py3-none-any.whl.metadata (12 kB)
Using cached sympy-1.13.1-py3-none-any.whl (6.2 MB)
Installing collected packages: sympy
Successfully installed sympy-1.13.1


In [7]:
pip install transformers datasets


Note: you may need to restart the kernel to use updated packages.


In [12]:
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification, Trainer, TrainingArguments, EarlyStoppingCallback
from datasets import load_dataset


In [2]:
import torch

# Check if GPU is available
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("GPU is available")
else:
    device = torch.device("cpu")
    print("Using CPU")


GPU is available


In [3]:
model_name = "distilbert-base-uncased"
tokenizer = DistilBertTokenizer.from_pretrained(model_name)
model = DistilBertForSequenceClassification.from_pretrained(model_name, num_labels=2).to(device)


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [4]:
dataset = load_dataset("imdb")


In [6]:
def preprocess(data):
    return tokenizer(data["text"], padding="max_length", truncation=True)

tokenized_dataset = dataset.map(preprocess, batched=True)
train_dataset = tokenized_dataset["train"]
test_dataset = tokenized_dataset["test"]


Map: 100%|██████████| 25000/25000 [01:04<00:00, 385.08 examples/s]
Map: 100%|██████████| 25000/25000 [01:03<00:00, 395.29 examples/s]
Map: 100%|██████████| 50000/50000 [02:09<00:00, 386.11 examples/s]


## Fine-tuning

In [10]:
training_args = TrainingArguments(
    output_dir="./results",                     # Directory to save model checkpoints
    eval_strategy="epoch",                 # Evaluate at the end of each epoch
    save_strategy="epoch",                       # Save the model at the end of each epoch
    load_best_model_at_end=True,                # Load the best model at the end of training
    metric_for_best_model="eval_loss",      # Specify the metric to monitor
    greater_is_better=True,                      # Whether a higher metric is better
    num_train_epochs=5,                          # Number of training epochs
    per_device_train_batch_size=16,              # Batch size for training
    per_device_eval_batch_size=16,               # Batch size for evaluation
    logging_dir="./logs",         
    report_to="none",
   
)


In [18]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    callbacks = [EarlyStoppingCallback(early_stopping_patience=2)]
    
     
)



Detected kernel version 4.18.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


In [19]:
trainer.train()



Epoch,Training Loss,Validation Loss
1,0.035,0.522299
2,0.0314,0.39684
3,0.0168,0.479441




TrainOutput(global_step=2346, training_loss=0.02637208067039496, metrics={'train_runtime': 958.1282, 'train_samples_per_second': 130.463, 'train_steps_per_second': 4.081, 'total_flos': 9935054899200000.0, 'train_loss': 0.02637208067039496, 'epoch': 3.0})

## Evaluation

In [20]:
eval_results = trainer.evaluate()
#print(f"Test Accuracy: {eval_results['eval_accuracy']:.2f}")




In [21]:
print(f"Test loss: {eval_results['eval_loss']:.2f}")
print(eval_results)

Test loss: 0.52
{'eval_loss': 0.5222985744476318, 'eval_runtime': 82.4202, 'eval_samples_per_second': 303.324, 'eval_steps_per_second': 9.488, 'epoch': 3.0}


# Demo

In [25]:
import torch
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification

# Load the fine-tuned model and tokenizer
model_name = "distilbert-base-uncased"
tokenizer = DistilBertTokenizer.from_pretrained(model_name)
model = DistilBertForSequenceClassification.from_pretrained("/work/yqu720/results/checkpoint-3910").to(device)  # Path to your fine-tuned model

# Function to predict label and score for a given sentence
def predict(sentence):
    # Preprocess the input sentence
    inputs = tokenizer(sentence, return_tensors="pt", padding=True, truncation=True, max_length=512)

    # Move inputs to the GPU if available
    inputs = {key: value.to(device) for key, value in inputs.items()}

    # Set the model to evaluation mode and disable gradient calculation
    model.eval()
    with torch.no_grad():
        outputs = model(**inputs)

    # Get logits and apply softmax to get probabilities
    logits = outputs.logits
    probabilities = torch.nn.functional.softmax(logits, dim=-1)

    # Get predicted label and score
    predicted_label = torch.argmax(probabilities, dim=-1).item()
    score = probabilities[0][predicted_label].item()

    return predicted_label, score




In [27]:
# Example usage
import random

# Randomly sample 10 examples from the test dataset
sample_indices = random.sample(range(len(test_dataset)), 10)
sample_sentences = [test_dataset[i]["text"] for i in sample_indices]


for sentence in sample_sentences:
    predict_label, score = predict(sentence)
    print(f"sentence: {sentence}  \n predicted label: {predict_label}  score: {score}")


# input_sentence = "This movie was fantastic! I really enjoyed it."
# predicted_label, score = predict(input_sentence)

# print(f"Predicted Label: {predicted_label}, Score: {score:.4f}")

sentence: This film is probably the best new French film I've seen in this century so far. There have been some great ones including Noe's Irreversible, Green's Le Pont des Arts and Hadzihalilovic's Innocencebut none of them come close to Les Amants Reguliers' timeless glory.<br /><br />The movie is a description of the events of May 68 and what followed in the wake of it and furthermore it is and update of, and a homage to, the Nouvelle Vague-movies of those days. Concerning the depiction of the riots in Paris the movie is meticulously accurate (I'm only 19 and I wasn't there myself but you know what I mean)and the almost real-time and very long riot scenes set the stage perfectly for the aftermath of the events in the streets of Paris. The riots are not glorified or beautifully photographed like the ones in Bertolucci's The Dreamers (to which the movie is comparable in many ways) instead they are filmed in grimy black and white shots courtesy of the excellent William Lubtchansky. The