<a href="https://colab.research.google.com/github/KainatAzhar/Text-Classification/blob/main/models/bert_sentiment_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine-Tuning BERT for Sentiment Analysis

This notebook demonstrates how to fine-tune a pre-trained BERT model (`bert-base-uncased`) for binary sentiment classification on the IMDb movie review dataset.

## 1. Setup and Library Imports

In [1]:
import torch
from datasets import load_dataset
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
import numpy as np

## 2. Load and Prepare the Dataset

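We use the Hugging Face `datasets` library to load the IMDb dataset, which provides 25,000 labeled training reviews and 25,000 labeled test reviews. To keep the demonstration fast, we train on a random subset of 2,000 reviews and evaluate on a random subset of 500.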

In [2]:
# Load the IMDb dataset
dataset = load_dataset('imdb')

# For demonstration purposes, let's use a smaller subset of the data
train_dataset = dataset['train'].shuffle(seed=42).select(range(2000)) # 2000 examples for training
test_dataset = dataset['test'].shuffle(seed=42).select(range(500))   # 500 examples for testing

print("Training data sample:", train_dataset[0])

Training data sample: {'text': 'There is no relation at all between Fortier and Profiler but the fact that both are police series about violent crimes. Profiler looks crispy, Fortier looks classic. Profiler plots are quite simple. Fortier\'s plot are far more complicated... Fortier looks more like Prime Suspect, if we have to spot similarities... The main character is weak and weirdo, but have "clairvoyance". People like to compare, to judge, to evaluate. How about just enjoying? Funny thing too, people writing Fortier looks American but, on the other hand, arguing they prefer American series (!!!). Maybe it\'s the language, or the spirit, but I think this series is more English than American. By the way, the actors are really good and funny. The acting is not superficial at all...', 'label': 1}
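
Because the subset is drawn at random, it can help to confirm that both classes are reasonably balanced before training. The sketch below assumes each example is a dict with a `label` key, as in the sample printed above. The names `label_counts` and `toy_subset` are hypothetical and only serve the illustration.

```python
from collections import Counter


def label_counts(examples):
    """Count how many examples carry each label value."""
    return Counter(example["label"] for example in examples)


# Toy stand-in for the shuffled subset. In the notebook, the call would be
# label_counts(train_dataset), since iterating a Dataset yields dicts.
toy_subset = [
    {"text": "great film", "label": 1},
    {"text": "dull plot", "label": 0},
    {"text": "loved it", "label": 1},
]

counts = label_counts(toy_subset)
print(counts)
```

A heavily skewed count would be a reason to reshuffle with a different seed or to sample each class separately.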


## 3. Tokenization

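BERT operates on token IDs rather than raw text, so we tokenize each review with the tokenizer that matches the `bert-base-uncased` checkpoint. With `padding='max_length'` and `truncation=True`, every review is padded or truncated to the model's maximum length of 512 tokens.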

In [3]:
# Load the tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Create a tokenization function
def tokenize_function(examples):
    return tokenizer(examples['text'], padding='max_length', truncation=True)

# Apply the tokenization to our datasets
tokenized_train_dataset = train_dataset.map(tokenize_function, batched=True)
tokenized_test_dataset = test_dataset.map(tokenize_function, batched=True)
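
To make the effect of `padding='max_length'` and `truncation=True` concrete, here is a minimal pure-Python sketch of the same idea. It is not the tokenizer's actual implementation. The special-token IDs are assumed to match the `bert-base-uncased` vocabulary, the content IDs are arbitrary placeholders, and `pad_or_truncate` is a hypothetical helper.

```python
# Assumed special-token IDs for bert-base-uncased: [CLS], [SEP], [PAD].
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


def pad_or_truncate(content_ids, max_length):
    """Wrap content in [CLS]/[SEP], truncate to max_length, then right-pad."""
    kept = content_ids[: max_length - 2]  # leave room for [CLS] and [SEP]
    input_ids = [CLS_ID] + kept + [SEP_ID]
    attention_mask = [1] * len(input_ids)  # 1 marks real tokens
    padding = max_length - len(input_ids)
    return input_ids + [PAD_ID] * padding, attention_mask + [0] * padding


# A short sequence is padded; a long one is cut from the right.
short_ids, short_mask = pad_or_truncate([2023, 3185], max_length=6)
long_ids, long_mask = pad_or_truncate(list(range(1000, 1010)), max_length=6)

print(short_ids, short_mask)
print(long_ids, long_mask)
```

The attention mask is what lets the model ignore padded positions, so padding every review to a fixed length costs compute but does not change the predictions.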


## 4. Model Fine-Tuning

Now we load the pre-trained BERT model with a new two-class classification head and use the `Trainer` API to fine-tune it on the sentiment task.

In [4]:
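# Disable Weights & Biases logging. The WANDB_DISABLED variable is deprecated;
# newer transformers releases prefer report_to="none" in TrainingArguments.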
%env WANDB_DISABLED=true

env: WANDB_DISABLED=true


In [5]:
# Load the pre-trained model
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Define a function to compute metrics for evaluation
def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='binary')
    acc = accuracy_score(labels, preds)
    return {
        'accuracy': acc,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }

# Define the training arguments
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    warmup_steps=500,        # learning-rate warmup over the first 500 optimizer steps
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
    eval_strategy="epoch"    # evaluate after every epoch (older transformers releases call this evaluation_strategy)
)

# Create the Trainer instance
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_test_dataset,
    compute_metrics=compute_metrics
)

# Fine-tune the model
trainer.train()

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


| Epoch | Training Loss | Validation Loss | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|---|---|
| 1 | 0.3061 | 0.319361 | 0.884 | 0.888462 | 0.843066 | 0.939024 |
| 2 | 0.3702 | 0.317945 | 0.894 | 0.891616 | 0.897119 | 0.886179 |
| 3 | 0.0072 | 0.465702 | 0.886 | 0.888454 | 0.856604 | 0.922764 |


TrainOutput(global_step=750, training_loss=0.33718284143010774, metrics={'train_runtime': 648.8316, 'train_samples_per_second': 9.247, 'train_steps_per_second': 1.156, 'total_flos': 1578666332160000.0, 'train_loss': 0.33718284143010774, 'epoch': 3.0})
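
The `global_step=750` in the output follows from the configuration: 2,000 examples at a batch size of 8 give 250 steps per epoch, and 3 epochs give 750 steps. With `warmup_steps=500`, the warmup therefore spans two thirds of the run. The sketch below reproduces that arithmetic and approximates a linear warmup and decay schedule. It assumes a single device, no gradient accumulation, a linear scheduler, and a base learning rate of 5e-5, which are the `TrainingArguments` defaults as far as I know. Both function names are hypothetical.

```python
import math


def total_training_steps(num_examples, batch_size, epochs):
    """Optimizer steps on one device with no gradient accumulation."""
    return math.ceil(num_examples / batch_size) * epochs


def linear_warmup_lr(step, base_lr, warmup_steps, total_steps):
    """Linear ramp from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total = total_training_steps(num_examples=2000, batch_size=8, epochs=3)
print("total steps:", total)

# Assumed base learning rate of 5e-5, sampled at the end of each epoch.
for step in (250, 500, 750):
    print(step, linear_warmup_lr(step, 5e-5, 500, total))
```

Under these assumptions the learning rate peaks only at the end of epoch 2, so a shorter warmup may be worth trying on a dataset this small.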

In [6]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## 5. Evaluation

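Let's evaluate the fine-tuned model on the 500-review test subset. Because the same subset was used for the per-epoch evaluation during training, the numbers match the epoch 3 row of the training log. Validation loss rose from 0.318 to 0.466 between epochs 2 and 3 while accuracy dipped from 0.894 to 0.886, which suggests the model began to overfit the 2,000-review training subset.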

In [7]:
print("Final evaluation on the test set:")
trainer.evaluate()

Final evaluation on the test set:


{'eval_loss': 0.4657016098499298,
 'eval_accuracy': 0.886,
 'eval_f1': 0.8884540117416829,
 'eval_precision': 0.8566037735849057,
 'eval_recall': 0.9227642276422764,
 'eval_runtime': 15.7914,
 'eval_samples_per_second': 31.663,
 'eval_steps_per_second': 3.99,
 'epoch': 3.0}
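
To see how the four metrics relate, the sketch below recomputes them from confusion-matrix counts. The notebook does not print these counts. The values used here (227 true positives, 38 false positives, 19 false negatives, 216 true negatives) are inferred from the reported precision, recall, and accuracy on 500 examples, so treat them as a reconstruction. `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "accuracy": accuracy,
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }


# Counts inferred from the reported metrics, not taken from notebook output.
metrics = binary_metrics(tp=227, fp=38, fn=19, tn=216)
print(metrics)
```

With these counts the model mislabels negative reviews as positive (38) about twice as often as the reverse (19), which is why recall is higher than precision.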

## 6. Save the Model and Tokenizer

We save the fine-tuned model and its tokenizer to the same directory so both can be reloaded together for inference.

In [8]:
model_save_path = "../models/fine-tuned-bert"
trainer.save_model(model_save_path)
tokenizer.save_pretrained(model_save_path)
print(f"Model saved to {model_save_path}")

Model saved to ../models/fine-tuned-bert


## 7. Inference on New Text

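Finally, let's load the saved model in a `pipeline` and predict the sentiment of two new reviews. Because the model configuration defines no label names, the pipeline reports `LABEL_0` (negative) and `LABEL_1` (positive), following IMDb's label encoding.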

In [9]:
from transformers import pipeline

# Load the fine-tuned model using a pipeline
sentiment_analyzer = pipeline("sentiment-analysis", model=model_save_path, tokenizer=model_save_path)

# Test with a positive review
positive_review = "This movie was absolutely fantastic! The acting was superb and the plot was gripping."
result_pos = sentiment_analyzer(positive_review)
print(f"Review: '{positive_review}'")
print(f"Prediction: {result_pos}")

# Test with a negative review
negative_review = "I was really disappointed with this film. It was boring and the ending was predictable."
result_neg = sentiment_analyzer(negative_review)
print(f"\nReview: '{negative_review}'")
print(f"Prediction: {result_neg}")

Device set to use cuda:0


Review: 'This movie was absolutely fantastic! The acting was superb and the plot was gripping.'
Prediction: [{'label': 'LABEL_1', 'score': 0.9983087778091431}]

Review: 'I was really disappointed with this film. It was boring and the ending was predictable.'
Prediction: [{'label': 'LABEL_0', 'score': 0.9972695708274841}]
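
The generic `LABEL_0` and `LABEL_1` names can be translated after the fact. The sketch below post-processes predictions shaped like the output above. It assumes IMDb's encoding (0 is negative, 1 is positive), and `LABEL_NAMES` and `readable` are hypothetical names.

```python
# Assumed mapping, based on IMDb's label encoding (0 = negative, 1 = positive).
LABEL_NAMES = {"LABEL_0": "negative", "LABEL_1": "positive"}


def readable(predictions):
    """Replace generic pipeline labels with human-readable sentiment names."""
    return [
        {"label": LABEL_NAMES[p["label"]], "score": p["score"]}
        for p in predictions
    ]


# Predictions copied from the notebook output above.
result_pos = [{"label": "LABEL_1", "score": 0.9983087778091431}]
result_neg = [{"label": "LABEL_0", "score": 0.9972695708274841}]

named_pos = readable(result_pos)
named_neg = readable(result_neg)
print(named_pos)
print(named_neg)
```

An alternative is to store the names in the model configuration before saving, so the pipeline reports them directly.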
