# 🔬 Fine-Tuning BERT on SST-2 for Sentiment Analysis

## 🧪 Objective

To fine-tune the pre-trained BERT model on the SST-2 dataset (Stanford Sentiment Treebank) for binary sentiment classification using Hugging Face Transformers and PyTorch.

In [None]:
# ⚙️ Optional: set up a separate virtual environment for running standalone scripts

# Step 1: Install virtualenv if not already available
!pip3 install virtualenv

# Step 2: Create a new virtual environment named 'venv'
!virtualenv venv

# Step 3: Install the required packages with the venv's own pip.
# Note: Colab cannot activate a venv for the notebook kernel, so packages
# installed here are NOT importable from this notebook's cells; they are only
# available to scripts run with the venv's Python binary.
!./venv/bin/pip3 install torch transformers datasets evaluate scikit-learn

# ✅ Run Python scripts using the venv's Python:
# !./venv/bin/python your_script.py



In [None]:
# Uncomment and run these lines if the venv setup above fails, if you don't want to
# use a virtual environment, or if you are running on Colab. They install the
# packages into the notebook kernel's environment, which the imports below need.

#!python3 -m pip install --upgrade pip
#!pip install transformers datasets evaluate scikit-learn --quiet


In [None]:
import torch
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset
import numpy as np
import evaluate
from sklearn.metrics import classification_report


In [None]:
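# Load the SST-2 dataset from GLUE (train / validation / test splits;
# the GLUE test split has hidden labels, stored as -1)
dataset = load_dataset("glue", "sst2")  # pass download_mode="force_redownload" only to refresh a corrupted cache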
dataset["train"][0]

Generated splits: train (67,349 examples), validation (872), test (1,821)

{'sentence': 'hide new secretions from the parental units ',
 'label': 0,
 'idx': 0}

In [None]:
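# Load the WordPiece tokenizer that matches the pre-trained checkpoint
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def tokenize_function(examples):
    # With batched=True, examples["sentence"] is a list of sentences.
    # Truncate anything longer than 128 tokens and pad every example to exactly 128.
    return tokenizer(examples["sentence"], truncation=True, padding="max_length", max_length=128)

tokenized_datasets = dataset.map(tokenize_function, batched=True)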

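To make `truncation=True, padding="max_length", max_length=128` concrete, here is a simplified, hypothetical sketch in plain Python of the shape of one tokenized example. It assumes the sentence has already been split into WordPiece ids, uses the `[CLS]`=101, `[SEP]`=102 and `[PAD]`=0 ids of the `bert-base-uncased` vocabulary, and ignores details such as `token_type_ids`; `pad_and_truncate` is an illustrative helper, not a Transformers function.

```python
# Simplified sketch (not the Transformers implementation) of the fixed-length
# encoding produced by truncation=True, padding="max_length".
# Assumed ids from the bert-base-uncased vocabulary: [CLS]=101, [SEP]=102, [PAD]=0.
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


def pad_and_truncate(token_ids, max_length):
    """Hypothetical helper: return (input_ids, attention_mask), each max_length long."""
    # Keep room for [CLS] and [SEP], then cut off any excess content tokens.
    body = list(token_ids[: max_length - 2])
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)  # 1 = real token
    # Right-pad with [PAD]; padded positions get mask 0 so attention ignores them.
    pad = max_length - len(input_ids)
    return input_ids + [PAD_ID] * pad, attention_mask + [0] * pad


# Arbitrary ids standing in for a tokenized sentence.
ids, mask = pad_and_truncate([11, 12, 13], max_length=6)
print(ids)   # [101, 11, 12, 13, 102, 0]
print(mask)  # [1, 1, 1, 1, 1, 0]
```

Padding every example to 128 tokens keeps tensor shapes fixed but spends compute on `[PAD]` positions; padding each batch only to its longest sentence (for example with Transformers' `DataCollatorWithPadding`) is a common alternative.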

In [None]:
# Return PyTorch tensors and expose only the columns the model consumes
tokenized_datasets.set_format("torch", columns=["input_ids", "attention_mask", "label"])

In [None]:
# Load pre-trained BERT with a new, randomly initialized 2-class classification head
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
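# Configuration for a full fine-tuning run: evaluate and save a checkpoint every epoch,
# then reload the checkpoint with the best validation accuracy at the end.
# The quick demo cell further down defines its own training_args, which replace these.
training_args = TrainingArguments(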
    output_dir="./results_sst2",
    save_strategy="epoch",
    eval_strategy="epoch",
    logging_dir="./logs_sst2",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy"
)
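With `load_best_model_at_end=True` and `metric_for_best_model="accuracy"`, the Trainer keeps the checkpoint whose evaluation accuracy is best. The sketch below is a simplified model of that selection rule in plain Python, assuming a strictly better score is needed to replace the current best (so ties keep the earlier epoch); `best_epoch` is a hypothetical helper, not part of the Trainer API. Both `save_strategy` and `eval_strategy` are set to `"epoch"` here, and the Trainer expects them to match when this option is enabled.

```python
def best_epoch(eval_history, metric="accuracy", greater_is_better=True):
    """Hypothetical helper: pick the (epoch, score) a best-checkpoint rule would keep.

    eval_history is a list of (epoch, metrics_dict) pairs in training order.
    A later epoch replaces the current best only if it is strictly better,
    so ties keep the earlier checkpoint (an assumption of this sketch).
    """
    best = None
    for epoch, metrics in eval_history:
        score = metrics[metric]
        if best is None:
            best = (epoch, score)
        elif (score > best[1]) if greater_is_better else (score < best[1]):
            best = (epoch, score)
    return best


# Per-epoch validation accuracies from the demo run's log.
history = [(1, {"accuracy": 0.52}), (2, {"accuracy": 0.52}), (3, {"accuracy": 0.51})]
print(best_epoch(history))  # (1, 0.52)
```

Under this rule, the accuracies logged by the demo run (0.52, 0.52, 0.51) would keep the epoch-1 checkpoint.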

In [None]:
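# Accuracy metric from the Hugging Face evaluate library
accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # eval_pred holds the model's raw logits and the gold labels for the eval set
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)  # highest-scoring class per example
    return accuracy.compute(predictions=predictions, references=labels)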

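`compute_metrics` receives the raw logits and the gold labels, takes the arg-max class per row, and returns a dict keyed by `"accuracy"`; the Trainer logs it as `eval_accuracy`, which is what `metric_for_best_model="accuracy"` refers to. The same computation in dependency-free Python, as a sketch with made-up logits (`compute_accuracy` is an illustrative name, not a library function):

```python
def argmax(row):
    # Index of the largest value; ties go to the first index, as with np.argmax.
    return max(range(len(row)), key=lambda i: row[i])


def compute_accuracy(logits, labels):
    """Plain-Python stand-in for compute_metrics: arg-max predictions vs. labels."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


# Made-up logits for three examples (columns: class 0, class 1).
logits = [[0.1, 0.9], [2.0, -1.0], [0.3, 0.2]]
labels = [1, 0, 1]
print(compute_accuracy(logits, labels))  # {'accuracy': 0.6666666666666666}
```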

In [16]:
from transformers import TrainingArguments, Trainer

# Quick demo configuration (replaces the training_args defined above)
training_args = TrainingArguments(
    output_dir="./results",                 # where to save model checkpoints
    eval_strategy="epoch",                  # evaluate at the end of every epoch
    logging_dir="./logs",                   # where to save logs
    report_to="none",                       # disables W&B and other loggers
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    logging_steps=10,                       # log the training loss every 10 steps
)

trainer = Trainer(
    model=model,
    args=training_args,
    # Smoke test only: 50 shuffled training examples and the first 100 validation examples
    train_dataset=tokenized_datasets["train"].shuffle(seed=42).select(range(50)),
    eval_dataset=tokenized_datasets["validation"].select(range(100)),
    compute_metrics=compute_metrics,
)

trainer.train()


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 0.821345 | 0.52 |
| 2 | 0.670000 | 0.726181 | 0.52 |
| 3 | 0.462600 | 0.720317 | 0.51 |


TrainOutput(global_step=21, training_loss=0.5543216835884821, metrics={'train_runtime': 406.6265, 'train_samples_per_second': 0.369, 'train_steps_per_second': 0.052, 'total_flos': 9866664576000.0, 'train_loss': 0.5543216835884821, 'epoch': 3.0})
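The `global_step=21` in the output follows from the subset size: 50 examples at a batch size of 8 give 7 batches per epoch (the last one partial), times 3 epochs. A minimal sketch of that arithmetic, assuming one device, no gradient accumulation, and no dropped last batch; `total_training_steps` is an illustrative helper.

```python
import math


def total_training_steps(num_examples, batch_size, epochs):
    """Hypothetical helper: optimizer steps for single-device training.

    Assumes no gradient accumulation and that the final partial batch is kept,
    so each epoch has ceil(num_examples / batch_size) steps.
    """
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


print(total_training_steps(50, 8, 3))     # 21, the demo subset
print(total_training_steps(67349, 8, 3))  # 25257, the full SST-2 training split
```

The same count explains the "No log" entry for epoch 1: with `logging_steps=10`, the first training loss is logged at step 10, which falls in epoch 2 (steps 8 to 14). The full split needs about 1,200 times as many steps as the demo, which would be impractical at the logged rate of roughly 0.05 steps per second.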

In [17]:
# Evaluate and generate classification report
preds_output = trainer.predict(tokenized_datasets["validation"].select(range(100)))
y_pred = np.argmax(preds_output.predictions, axis=1)
y_true = preds_output.label_ids

print(classification_report(y_true, y_pred))

              precision    recall  f1-score   support

           0       0.33      0.02      0.04        48
           1       0.52      0.96      0.67        52

    accuracy                           0.51       100
   macro avg       0.42      0.49      0.36       100
weighted avg       0.43      0.51      0.37       100
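The per-class rows show what the 0.51 accuracy hides. The sketch below recomputes precision, recall, and F1 in plain Python and adds a majority-class baseline. The confusion counts are an assumption: they are the only counts consistent with the rounded report (48 negatives of which 1 was predicted correctly, 52 positives of which 50 were), not values logged by the notebook. `per_class_report` and `majority_baseline` are illustrative helpers, not scikit-learn functions.

```python
def per_class_report(y_true, y_pred, labels=(0, 1)):
    """Hypothetical helper: precision, recall, F1, and support for each class."""
    report = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted = sum(1 for p in y_pred if p == c)
        support = sum(1 for t in y_true if t == c)
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1, "support": support}
    return report


def majority_baseline(y_true):
    """Accuracy of always predicting the most frequent class."""
    counts = {}
    for t in y_true:
        counts[t] = counts.get(t, 0) + 1
    return max(counts.values()) / len(y_true)


# Assumed confusion counts, inferred from the rounded report (not logged):
# 48 negatives -> 1 predicted 0, 47 predicted 1; 52 positives -> 2 predicted 0, 50 predicted 1.
y_true = [0] * 48 + [1] * 52
y_pred = [0] * 1 + [1] * 47 + [0] * 2 + [1] * 50

for c, row in per_class_report(y_true, y_pred).items():
    print(c, {k: round(v, 2) for k, v in row.items()})
print("majority baseline:", majority_baseline(y_true))  # 0.52
```

The majority-class baseline on this 100-example subset is 0.52, so the model's 0.51 is no better than always answering "positive", which, under the inferred counts, it does for 97 of the 100 examples.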



In [18]:
model.save_pretrained("./finetuned-bert-sst2")
tokenizer.save_pretrained("./finetuned-bert-sst2")

('./finetuned-bert-sst2/tokenizer_config.json',
 './finetuned-bert-sst2/special_tokens_map.json',
 './finetuned-bert-sst2/vocab.txt',
 './finetuned-bert-sst2/added_tokens.json')

In [None]:
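def predict_sentiment(text):
    # Tokenize one sentence and move the tensors to the model's device (CPU or GPU)
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    model.eval()  # disable dropout for inference
    with torch.no_grad():  # no gradients needed at inference time
        logits = model(**inputs).logits
    probs = torch.nn.functional.softmax(logits, dim=-1)
    # SST-2 labels: 0 = negative, 1 = positive
    return "positive" if torch.argmax(probs, dim=-1).item() == 1 else "negative"

# Example inference
predict_sentiment("An absolutely wonderful film.")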

'positive'
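`predict_sentiment` applies a softmax before the arg-max. That does not change the predicted class (softmax preserves the order of the logits), but it does yield a probability that can serve as a confidence score. A plain-Python sketch of that idea, with the largest logit subtracted for numerical stability; `label_with_confidence` is a hypothetical helper, and its `("negative", "positive")` order assumes the SST-2 convention 0 = negative, 1 = positive used above.

```python
import math


def softmax(logits):
    # Subtract the largest logit first so math.exp cannot overflow.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits, id2label=("negative", "positive")):
    """Hypothetical helper: predicted label plus its softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return id2label[best], probs[best]


label, confidence = label_with_confidence([-1.0, 2.0])  # made-up logits
print(label, round(confidence, 3))  # positive 0.953
```

Reporting the confidence alongside the label makes weak predictions easier to spot, which matters for a model trained on only 50 examples.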

## ✅ Conclusion

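- Fine-tuned `bert-base-uncased` on the SST-2 sentiment classification dataset, using a 50-example subset of the training split as a quick end-to-end run.
- Demonstrated training, evaluation on the first 100 validation examples, and single-sentence inference.
- With only 50 training examples the model stays at chance level: validation accuracy is 0.51 to 0.52, and it predicts the positive class almost every time (recall 0.96 for positive vs 0.02 for negative). Meaningful results require training on the full 67,349-example training split, for example with the full-run `TrainingArguments` defined earlier.
- Saved the model and tokenizer to `./finetuned-bert-sst2` for future use.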