# Fine-Tuning a Transformer for Sentiment Analysis
Goal: Train a model to classify IMDb movie reviews as "positive" or "negative".


**Cell 1: Install Necessary Libraries**

In [None]:
# In a Colab cell
!pip install --upgrade transformers datasets accelerate


**Cell 2: Imports and GPU Check**

In [None]:
import torch
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer
)

# Check if a GPU is available and set the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Using device: cuda


**Cell 3: Load and Prepare the Dataset**

In [None]:
# Load the dataset
dataset = load_dataset("imdb")

# The dataset has 'train' and 'test' splits of 25,000 reviews each. We sample a subset for a quicker example:
# 10,000 reviews for training and 2,000 for testing.
train_dataset = dataset["train"].shuffle(seed=42).select(range(10000))
test_dataset = dataset["test"].shuffle(seed=42).select(range(2000))

print("Sample training data:")
print(train_dataset[0])

Sample training data:
{'text': 'There is no relation at all between Fortier and Profiler but the fact that both are police series about violent crimes. Profiler looks crispy, Fortier looks classic. Profiler plots are quite simple. Fortier\'s plot are far more complicated... Fortier looks more like Prime Suspect, if we have to spot similarities... The main character is weak and weirdo, but have "clairvoyance". People like to compare, to judge, to evaluate. How about just enjoying? Funny thing too, people writing Fortier looks American but, on the other hand, arguing they prefer American series (!!!). Maybe it\'s the language, or the spirit, but I think this series is more English than American. By the way, the actors are really good and funny. The acting is not superficial at all...', 'label': 1}
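
The seeded shuffle followed by `select(range(n))` in Cell 3 draws a reproducible random subset. A minimal pure-Python sketch of that idea follows. The helper `shuffled_subset` and the toy `reviews` list are hypothetical and only illustrate the concept; this is not how the `datasets` library implements it internally.

```python
import random

def shuffled_subset(items, n, seed):
    """Return the first n items of a seeded shuffle.

    Conceptual stand-in for dataset.shuffle(seed=seed).select(range(n)).
    """
    rng = random.Random(seed)          # private RNG so the global state is untouched
    indices = list(range(len(items)))
    rng.shuffle(indices)               # deterministic for a given seed
    return [items[i] for i in indices[:n]]

# Toy stand-in for the review texts
reviews = [f"review_{i}" for i in range(100)]

first = shuffled_subset(reviews, 10, seed=42)
second = shuffled_subset(reviews, 10, seed=42)

print(first == second)    # True: the same seed yields the same subset
print(len(set(first)))    # 10: no review is selected twice
```

Fixing the seed means that rerunning the notebook trains and evaluates on the same reviews, which keeps results comparable between runs.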


**Cell 4: Preprocessing with a Tokenizer**

In [None]:
# Load the tokenizer associated with the pre-trained model
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)

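# Create a function to tokenize the text data.
# padding="max_length" pads every review to the model's maximum length, and
# truncation=True cuts longer reviews down to that length.
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)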

# Apply the tokenizer to our datasets
tokenized_train_dataset = train_dataset.map(tokenize_function, batched=True)
tokenized_test_dataset = test_dataset.map(tokenize_function, batched=True)

# The tokenizer adds new columns: 'input_ids', 'attention_mask'
print("\nSample tokenized data:")
print(tokenized_train_dataset[0])

Sample tokenized data:
{'text': 'There is no relation at all between Fortier and Profiler but the fact that both are police series about violent crimes. Profiler looks crispy, Fortier looks classic. Profiler plots are quite simple. Fortier\'s plot are far more complicated... Fortier looks more like Prime Suspect, if we have to spot similarities... The main character is weak and weirdo, but have "clairvoyance". People like to compare, to judge, to evaluate. How about just enjoying? Funny thing too, people writing Fortier looks American but, on the other hand, arguing they prefer American series (!!!). Maybe it\'s the language, or the spirit, but I think this series is more English than American. By the way, the actors are really good and funny. The acting is not superficial at all...', 'label': 1, 'input_ids': [101, 2045, 2003, 2053, 7189, 2012, 2035, 2090, 3481, 3771, 1998, 6337, 2099, 2021, 1996, 2755, 2008, 2119, 2024, 2610, 2186, 2055, 6355, 6997, 1012, 6337, 2099, 3504, 15594, 210
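
To make the effect of `padding="max_length"` and `truncation=True` concrete, here is a small sketch that works on plain lists of token IDs. The helper `pad_or_truncate` is hypothetical, and it simplifies one detail: the real tokenizer keeps the closing special token when it truncates, whereas this sketch simply cuts the list. The pad ID of 0 is assumed from `padding_idx=0` in the model printout.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Force a list of token IDs to exactly max_length entries.

    Returns the padded/truncated IDs plus an attention mask that is
    1 for real tokens and 0 for padding.
    """
    ids = token_ids[:max_length]               # truncation: drop anything past max_length
    n_pad = max_length - len(ids)              # padding: how many slots are left to fill
    return {
        "input_ids": ids + [pad_id] * n_pad,
        "attention_mask": [1] * len(ids) + [0] * n_pad,
    }

short_review = [101, 2045, 2003, 102]          # shorter than max_length
long_review = list(range(1, 13))               # 12 tokens, longer than max_length

short_encoded = pad_or_truncate(short_review, max_length=8)
long_encoded = pad_or_truncate(long_review, max_length=8)

print(short_encoded["input_ids"])       # [101, 2045, 2003, 102, 0, 0, 0, 0]
print(short_encoded["attention_mask"])  # [1, 1, 1, 1, 0, 0, 0, 0]
print(long_encoded["input_ids"])        # [1, 2, 3, 4, 5, 6, 7, 8]
```

The attention mask is what lets the model ignore the padded positions, so padding changes the tensor shape without changing what the model attends to.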

**Cell 5: Load the Pre-trained Model**

In [None]:
# Load the pre-trained model with a classification head
# num_labels=2 tells the model we have two output classes (positive/negative)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Move the model to the selected device (the GPU if one is available)
model.to(device)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): DistilBertSdpaAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
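
The warning above lists four newly initialized tensors: the weights and biases of `pre_classifier` and `classifier`. The rough count below shows how small that new head is. The printout is cut off before it reaches these layers, so their shapes are an assumption: `pre_classifier` as a 768-to-768 linear layer and `classifier` as a 768-to-`num_labels` linear layer.

```python
def linear_params(in_features, out_features):
    """Parameter count of a fully connected layer: weight matrix plus bias vector."""
    return in_features * out_features + out_features

hidden = 768      # hidden size shown in the model printout
num_labels = 2    # positive / negative

# Assumed shapes of the two newly initialized layers
pre_classifier = linear_params(hidden, hidden)
classifier = linear_params(hidden, num_labels)

print(pre_classifier)               # 590592
print(classifier)                   # 1538
print(pre_classifier + classifier)  # 592130
```

Under these assumptions, fewer than 600,000 parameters start from random values, and the rest of the network starts from the pre-trained checkpoint.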


**Cell 6: Define Training Arguments**

In [None]:
!pip install evaluate



In [None]:
training_args = TrainingArguments(
    output_dir="./results",          # Directory to save the model and logs
    num_train_epochs=3,              # A good starting point for fine-tuning
    per_device_train_batch_size=16,  # How many samples to process at once during training
    per_device_eval_batch_size=16,   # How many samples to process at once during evaluation
    warmup_steps=500,                # Number of steps to warm up the learning rate
    weight_decay=0.01,               # Regularization to prevent overfitting
    logging_dir="./logs",            # Directory for storing logs
    logging_steps=100,               # How often to log the training loss
    save_total_limit=1               # Keep only the most recent checkpoint on disk
)
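
To see what `warmup_steps=500` means for this run, the sketch below estimates the total number of optimizer steps and the learning rate at a few points. It assumes the 10,000-review training subset, a single device, no gradient accumulation, and the `TrainingArguments` defaults of a `5e-5` learning rate with a linear schedule. The function `linear_schedule_lr` is a hypothetical re-implementation for illustration, not the library's own code.

```python
import math

def linear_schedule_lr(step, base_lr, warmup_steps, total_steps):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

train_examples = 10_000   # assumed size of the training subset
batch_size = 16           # per_device_train_batch_size
epochs = 3                # num_train_epochs
base_lr = 5e-5            # assumed default learning rate
warmup_steps = 500

steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 625 1875

for step in (0, 250, 500, 1000, total_steps):
    lr = linear_schedule_lr(step, base_lr, warmup_steps, total_steps)
    print(f"step {step:4d}: lr = {lr:.2e}")
```

Under these assumptions the warmup covers roughly the first quarter of training, so the learning rate peaks at step 500 and decays for the remaining 1,375 steps.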

In [None]:
import evaluate

# Load the accuracy metric.
# evaluate.load replaces the deprecated datasets.load_metric.
metric = evaluate.load("accuracy")


# Function to compute metrics
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = logits.argmax(axis=-1)
    return metric.compute(predictions=predictions, references=labels)
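
The `compute_metrics` function above takes the argmax of each row of logits and compares it with the true label. A dependency-free sketch of the same calculation is shown here. The helper names and the toy logits are hypothetical; the returned dictionary mirrors the `{"accuracy": ...}` shape of the metric's result.

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)

def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class matches the label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}

# Toy batch: one [negative_score, positive_score] pair per review
toy_logits = [
    [2.0, -1.0],   # predicts 0 (negative)
    [0.3, 0.9],    # predicts 1 (positive)
    [-0.5, 0.1],   # predicts 1 (positive)
    [1.2, 1.1],    # predicts 0 (negative)
]
toy_labels = [0, 1, 0, 0]

result = accuracy_from_logits(toy_logits, toy_labels)
print(result)  # {'accuracy': 0.75}
```

Only the third review is misclassified, so three of four predictions are correct.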


**Cell 7: Create the Trainer and Train!**

In [None]:
# Create the Trainer object
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_test_dataset,
    compute_metrics=compute_metrics,  # Report accuracy whenever the model is evaluated
)

# Start the fine-tuning process!
print("Starting training...")
trainer.train()
print("Training finished!")

Starting training...


| Step | Training Loss |
|------|---------------|
| 100 | 0.049 |
| 200 | 0.0447 |
| 300 | 0.0433 |
| 400 | 0.0591 |
| 500 | 0.0971 |
| 600 | 0.0977 |
| 700 | 0.0941 |
| 800 | 0.0773 |
| 900 | 0.0579 |
| 1000 | 0.0456 |


Training finished!


**Cell 8: Evaluate the Fine-Tuned Model**

In [None]:
print("Evaluating the model on the test set...")
evaluation_results = trainer.evaluate()

print("\n--- Evaluation Results ---")
print(f"Accuracy: {evaluation_results['eval_accuracy']:.4f}")
print(f"Loss: {evaluation_results['eval_loss']:.4f}")

Evaluating the model on the test set...



--- Evaluation Results ---
Accuracy: 0.9235
Loss: 0.4828


**Cell 9: Use the Model for a New Prediction**

In [None]:
from torch.nn.functional import softmax

# Put the model in evaluation mode so that dropout is disabled during inference
model.eval()

# Let's test with two different reviews
reviews = [
    "This movie was absolutely fantastic! The acting was brilliant and the plot was engaging.",
    "It was a complete waste of time. The plot was predictable and the characters were boring."
]

# The labels are 0 for 'negative' and 1 for 'positive'
labels = ["Negative", "Positive"]

for review in reviews:
    # Tokenize the new text
    inputs = tokenizer(review, return_tensors="pt", padding=True, truncation=True)

    # Move tensors to the same device as the model
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Get predictions from the model
    with torch.no_grad(): # Disable gradient calculation for inference
        outputs = model(**inputs)
        logits = outputs.logits

        # Apply softmax to convert logits to probabilities
        probabilities = softmax(logits, dim=1)

        # Get the most likely class
        prediction_index = torch.argmax(probabilities, dim=1).item()

    print("\n--------------------")
    print(f"Review: '{review}'")
    print(f"Prediction: {labels[prediction_index]}")
    print(f"Confidence (Probabilities): Negative={probabilities[0][0]:.4f}, Positive={probabilities[0][1]:.4f}")


--------------------
Review: 'This movie was absolutely fantastic! The acting was brilliant and the plot was engaging.'
Prediction: Positive
Confidence (Probabilities): Negative=0.0003, Positive=0.9997

--------------------
Review: 'It was a complete waste of time. The plot was predictable and the characters were boring.'
Prediction: Negative
Confidence (Probabilities): Negative=0.9999, Positive=0.0001
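
The confidence values above come from applying softmax to the model's two logits. The pure-Python sketch below shows that conversion. The logits `[-4.0, 4.0]` are hypothetical values chosen for illustration, not the model's actual outputs.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)                              # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical [negative, positive] logits for a clearly positive review
probs = softmax([-4.0, 4.0])

print(f"Negative={probs[0]:.4f}, Positive={probs[1]:.4f}")
# Negative=0.0003, Positive=0.9997
```

Because softmax depends only on the difference between the logits, a gap of 8 between the two scores is already enough to yield a confidence above 99.9%.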
