# Fine-tune BERT on SST-2 (GLUE)
This notebook fine-tunes **BERT (bert-base-uncased)** on the **SST-2 dataset** (binary sentiment classification) using the Hugging Face Transformers and Datasets libraries and the Trainer API.

## 1. Setup GPU and Install Dependencies

In [1]:
# Check GPU
!nvidia-smi

# Install required libraries
!pip install -q transformers datasets evaluate accelerate scikit-learn

Fri Aug 29 18:57:02 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   68C    P8             12W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

## 2. Import Libraries

In [2]:
import numpy as np
import torch
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    DataCollatorWithPadding,
    TrainingArguments,
    Trainer,
    set_seed,
)
import evaluate

## 3. Load Dataset (GLUE - SST-2)

In [3]:
dataset = load_dataset("glue", "sst2")
print(dataset)
print(dataset["train"][0])


DatasetDict({
    train: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 67349
    })
    validation: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 872
    })
    test: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 1821
    })
})
{'sentence': 'hide new secretions from the parental units ', 'label': 0, 'idx': 0}


## 4. Load Tokenizer & Model

In [4]:
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## 5. Preprocess (Tokenization)

In [5]:
max_length = 128

def tokenize_fn(batch):
    return tokenizer(batch["sentence"], truncation=True, max_length=max_length)

tokenized_dataset = dataset.map(tokenize_fn, batched=True, remove_columns=["sentence"])
tokenized_dataset

DatasetDict({
    train: Dataset({
        features: ['label', 'idx', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 67349
    })
    validation: Dataset({
        features: ['label', 'idx', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 872
    })
    test: Dataset({
        features: ['label', 'idx', 'input_ids', 'token_type_ids', 'attention_mask'],
        num_rows: 1821
    })
})

In [6]:
print(set(tokenized_dataset["train"]["label"]))
print(set(tokenized_dataset["validation"]["label"]))


{0, 1}
{0, 1}


## 6. Data Collator (Dynamic Padding)

In [7]:
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
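
To make the idea of dynamic padding concrete, here is a minimal pure-Python sketch. It is not the `DataCollatorWithPadding` implementation; the helper `pad_batch` and the toy token ids are hypothetical and only illustrate that each batch is padded to its own longest sequence rather than to a global `max_length`.

```python
def pad_batch(sequences, pad_id=0):
    """Pad every sequence to the longest one in this batch (not to a global maximum)."""
    longest = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (longest - len(seq)) for seq in sequences]
    # 1 marks a real token, 0 marks padding the model should ignore.
    attention_mask = [[1] * len(seq) + [0] * (longest - len(seq)) for seq in sequences]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Toy token ids (hypothetical); 101 and 102 are the [CLS] and [SEP] ids in bert-base-uncased.
batch = pad_batch([[101, 2023, 102], [101, 2023, 3185, 2001, 102]])
print(batch["input_ids"])
print(batch["attention_mask"])
```

Because the tokenization step above does not pad, short SST-2 sentences end up in short batches, which likely saves compute compared with padding everything to 128 tokens.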

## 7. Metrics (Accuracy + F1)

In [8]:
accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy.compute(predictions=preds, references=labels)["accuracy"],
        "f1": f1.compute(predictions=preds, references=labels)["f1"],
    }
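
As a sanity check on what `compute_metrics` returns, the same quantities can be worked out by hand. The sketch below uses made-up logits and labels (not SST-2 data) and hypothetical helper names; it assumes F1 is computed for the positive class (label 1), which is the usual binary setting.

```python
def argmax_rows(logits):
    """Index of the largest logit in each row, i.e. the predicted class."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def accuracy_and_f1(preds, labels, positive=1):
    """Accuracy over all examples and F1 for the `positive` class."""
    pairs = list(zip(preds, labels))
    tp = sum(p == positive and y == positive for p, y in pairs)
    fp = sum(p == positive and y != positive for p, y in pairs)
    fn = sum(p != positive and y == positive for p, y in pairs)
    acc = sum(p == y for p, y in pairs) / len(pairs)
    denom = 2 * tp + fp + fn
    f1_score = 2 * tp / denom if denom else 0.0
    return acc, f1_score


# Made-up logits for four examples: columns are [negative, positive].
toy_logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]]
toy_labels = [0, 1, 1, 1]

toy_preds = argmax_rows(toy_logits)
toy_acc, toy_f1 = accuracy_and_f1(toy_preds, toy_labels)
print(toy_preds, toy_acc, toy_f1)  # [0, 1, 1, 0] 0.75 0.8
```

Here one positive example is missed (a false negative), so accuracy is 3/4 while F1 is 2·2 / (2·2 + 0 + 1) = 0.8.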

## 8. TrainingArguments

In [9]:
training_args = TrainingArguments(
    output_dir="./bert-sst2",
    eval_strategy="epoch",  # named `evaluation_strategy` in older transformers releases
    save_strategy="epoch",
    logging_strategy="steps",
    logging_steps=50,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    num_train_epochs=3,
    learning_rate=2e-5,
    weight_decay=0.01,
    warmup_ratio=0.06,
    fp16=False,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    greater_is_better=True,
    save_total_limit=2,
    report_to=["none"],
    seed=42,
)
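
The arithmetic implied by these arguments can be checked by hand. The sketch below assumes a single GPU and no gradient accumulation (neither is configured in this notebook), and it assumes the default linear schedule with warmup; `linear_schedule_lr` is a hypothetical helper that only approximates the shape of that schedule.

```python
import math

train_examples = 67349   # size of the SST-2 train split printed earlier
batch_size = 32          # per_device_train_batch_size
epochs = 3               # num_train_epochs
warmup_ratio = 0.06
peak_lr = 2e-5           # learning_rate

steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)


def linear_schedule_lr(step):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


print(steps_per_epoch, total_steps, warmup_steps)  # 2105 6315 379
```

The total of 6315 steps agrees with the `global_step` reported by the training run later in the notebook.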

## 9. Trainer Setup

In [10]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    tokenizer=tokenizer,  # newer transformers releases deprecate this in favor of `processing_class`
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)


In [11]:
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy,F1
1,0.1648,0.221996,0.927752,0.928974
2,0.107,0.263438,0.918578,0.922234
3,0.0682,0.302473,0.924312,0.926667


TrainOutput(global_step=6315, training_loss=0.1405047998481201, metrics={'train_runtime': 1562.7522, 'train_samples_per_second': 129.289, 'train_steps_per_second': 4.041, 'total_flos': 4176656240000220.0, 'train_loss': 0.1405047998481201, 'epoch': 3.0})
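
With `load_best_model_at_end=True` and `metric_for_best_model="accuracy"`, the model kept after training should be the one from the epoch with the highest validation accuracy. The sketch below replays that selection on the numbers from the log above; it assumes checkpoints are written once per epoch (as `save_strategy="epoch"` requests) into folders named `checkpoint-<global_step>`.

```python
# Per-epoch validation results copied from the training log above.
epoch_log = [
    {"epoch": 1, "eval_loss": 0.221996, "accuracy": 0.927752, "f1": 0.928974},
    {"epoch": 2, "eval_loss": 0.263438, "accuracy": 0.918578, "f1": 0.922234},
    {"epoch": 3, "eval_loss": 0.302473, "accuracy": 0.924312, "f1": 0.926667},
]
steps_per_epoch = 6315 // 3  # global_step divided by the number of epochs

# greater_is_better=True, so the best epoch maximizes the chosen metric.
best = max(epoch_log, key=lambda row: row["accuracy"])
best_checkpoint = f"checkpoint-{best['epoch'] * steps_per_epoch}"
print(best["epoch"], best_checkpoint)  # 1 checkpoint-2105
```

Epoch 1 wins on accuracy. Validation loss rises in every later epoch while training loss keeps falling, which suggests the model starts to overfit after the first epoch.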

## 10. Evaluate

In [12]:
val_results = trainer.evaluate(tokenized_dataset["validation"])
print("Validation Results:", val_results)

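# The GLUE SST-2 test split ships with placeholder labels (-1), so loss and
# metrics cannot be computed on it. Check the labels explicitly instead of
# using a bare `except`, which would also hide unrelated errors.
if set(tokenized_dataset["test"]["label"]) <= {0, 1}:
    test_results = trainer.evaluate(tokenized_dataset["test"])
    print("Test Results:", test_results)
else:
    print("Test set may not have labels.")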

Validation Results: {'eval_loss': 0.22199611365795135, 'eval_accuracy': 0.9277522935779816, 'eval_f1': 0.9289740698985344, 'eval_runtime': 2.692, 'eval_samples_per_second': 323.923, 'eval_steps_per_second': 5.201, 'epoch': 3.0}
Test set may not have labels.


## 11. Quick Inference Demo

In [18]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_path = "/content/bert-sst2/checkpoint-6315"  # final-step checkpoint (end of epoch 3); adjust to your saved checkpoint folder
model = AutoModelForSequenceClassification.from_pretrained(model_path, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(model_path)


In [29]:
import pandas as pd
test_cases = [
    {"text": "I love you", "label": "Positive"},
    {"text": "I hate you", "label": "Negative"},
    {"text": "I hate the selfishness in you", "label": "Negative"},
    {"text": "I hate anyone hurts you", "label": "Positive"},
    {"text": "I hate anyone hurting you", "label": "Positive"},
    {"text": "I hate anyone hurting you, you are my partner", "label": "Positive"},
    {"text": "I hate anyone hurting you, you are my love", "label": "Positive"},
    {"text": "I like rude people", "label": "Negative"},
    {"text": "I don't like rude people", "label": "Negative"},
    {"text": "I hate polite people", "label": "Negative"},
    {"text": "I don't hate polite people", "label": "Positive"},
]

test_cases_df = pd.DataFrame(test_cases)
test_cases_df

index,text,label
0,I love you,Positive
1,I hate you,Negative
2,I hate the selfishness in you,Negative
3,I hate anyone hurts you,Positive
4,I hate anyone hurting you,Positive
5,"I hate anyone hurting you, you are my partner",Positive
6,"I hate anyone hurting you, you are my love",Positive
7,I like rude people,Negative
8,I don't like rude people,Negative
9,I hate polite people,Negative


In [31]:
import torch

# Load model & tokenizer
# Force device to CPU to avoid CUDA error
device = torch.device("cpu")

# Load the model directly to CPU
model = AutoModelForSequenceClassification.from_pretrained(model_path, num_labels=2).to(device)

# Verify model device
print(f"Model is on device: {next(model.parameters()).device}")


# Tokenize
inputs = tokenizer(
    test_cases_df['text'].tolist(), # Convert pandas Series to a list
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=128
)

# Move tensors to the same device as the model (CPU)
inputs = {k: v.to(device) for k, v in inputs.items()}

# Inference
with torch.no_grad():
    outputs = model(**inputs).logits
predictions = torch.argmax(outputs, dim=-1).tolist()

# Map to labels
label_map = {0: "Negative", 1: "Positive"}
pred_labels = [label_map[p] for p in predictions]

# Add new column
test_cases_df["pred"] = pred_labels


Model is on device: cpu
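
The cell above keeps only the argmax of the logits. When a confidence score is also useful, the logits can be turned into probabilities with a softmax. The following is a minimal pure-Python sketch with made-up logits (not actual model outputs); `softmax` and `scored` are hypothetical names introduced for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


label_map = {0: "Negative", 1: "Positive"}

# Made-up logits for two sentences: columns are [negative, positive].
example_logits = [[-2.1, 2.4], [0.3, 0.1]]

scored = []
for row in example_logits:
    probs = softmax(row)
    pred = max(range(len(probs)), key=probs.__getitem__)
    scored.append((label_map[pred], probs[pred]))

for label, confidence in scored:
    print(f"{label}: {confidence:.3f}")
```

A prediction whose top probability is close to 0.5, like the second row here, is one the classifier is nearly undecided about.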


In [32]:
test_cases_df.head(10)

index,text,label,pred
0,I love you,Positive,Positive
1,I hate you,Negative,Negative
2,I hate the selfishness in you,Negative,Negative
3,I hate anyone hurts you,Positive,Positive
4,I hate anyone hurting you,Positive,Negative
5,"I hate anyone hurting you, you are my partner",Positive,Positive
6,"I hate anyone hurting you, you are my love",Positive,Positive
7,I like rude people,Negative,Negative
8,I don't like rude people,Negative,Negative
9,I hate polite people,Negative,Negative
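
To summarize the table above, the expected and predicted labels can be compared directly. This sketch copies the ten displayed rows into a plain list (so it runs without the model) and reports which sentences were misclassified.

```python
# (text, expected label, predicted label) copied from the table above.
rows = [
    ("I love you", "Positive", "Positive"),
    ("I hate you", "Negative", "Negative"),
    ("I hate the selfishness in you", "Negative", "Negative"),
    ("I hate anyone hurts you", "Positive", "Positive"),
    ("I hate anyone hurting you", "Positive", "Negative"),
    ("I hate anyone hurting you, you are my partner", "Positive", "Positive"),
    ("I hate anyone hurting you, you are my love", "Positive", "Positive"),
    ("I like rude people", "Negative", "Negative"),
    ("I don't like rude people", "Negative", "Negative"),
    ("I hate polite people", "Negative", "Negative"),
]

matches = sum(label == pred for _, label, pred in rows)
mismatches = [text for text, label, pred in rows if label != pred]
demo_accuracy = matches / len(rows)

print(f"{matches}/{len(rows)} correct ({demo_accuracy:.0%})")
print("Misclassified:", mismatches)
```

On these ten rows the model agrees with the hand-assigned labels nine times; the one miss is "I hate anyone hurting you", where the surface word "hate" appears to outweigh the protective intent. Ten hand-written sentences are far too few to draw conclusions about robustness.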
