<a href="https://colab.research.google.com/github/brucefjn/ML-Sentiment-Analysis-/blob/main/sentiment_prediction_system.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Sentiment Analysis with Transformer Fine-Tuning

This project builds a sentiment classification system for movie reviews.

The project began with a classical machine-learning baseline
(TF-IDF features with Logistic Regression).

A unigram TF-IDF representation treats each review as an unordered bag of words,
so it cannot tell "good, not bad" from "bad, not good". To model word order and
context, the classifier was upgraded to a fine-tuned Transformer
(DistilBERT).

Final model: Fine-tuned DistilBERT for binary sentiment classification.


# 1. Dataset

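The dataset is a CSV file of labeled movie reviews. Besides an unnamed index column (`Unnamed: 0`, dropped later), it contains:

- `Review`: review text (it appears already lowercased and stripped of punctuation, with leftover `br` tokens from removed HTML line breaks)
- `Emotion`: sentiment label (`positive` / `negative`)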

We perform supervised learning using these labeled examples.


In [4]:
!wget -O movie_reviews 'https://drive.google.com/uc?export=view&id=1kWs6yOYpdjVr-liLPs4PKIs1qSpIohzS'

--2026-02-14 00:23:01--  https://drive.google.com/uc?export=view&id=1kWs6yOYpdjVr-liLPs4PKIs1qSpIohzS
Resolving drive.google.com (drive.google.com)... 74.125.130.102, 74.125.130.100, 74.125.130.113, ...
Connecting to drive.google.com (drive.google.com)|74.125.130.102|:443... connected.
HTTP request sent, awaiting response... 303 See Other
Location: https://drive.usercontent.google.com/download?id=1kWs6yOYpdjVr-liLPs4PKIs1qSpIohzS&export=view [following]
--2026-02-14 00:23:01--  https://drive.usercontent.google.com/download?id=1kWs6yOYpdjVr-liLPs4PKIs1qSpIohzS&export=view
Resolving drive.usercontent.google.com (drive.usercontent.google.com)... 172.217.194.132, 2404:6800:4003:c04::84
Connecting to drive.usercontent.google.com (drive.usercontent.google.com)|172.217.194.132|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 31994890 (31M) [application/octet-stream]
Saving to: ‘movie_reviews’


2026-02-14 00:23:05 (199 MB/s) - ‘movie_reviews’ saved [31994890/31994890]

In [5]:
# Load the labeled reviews

import numpy as np
import pandas as pd
data = pd.read_csv('movie_reviews', delimiter=",")
data.head(20)


| Unnamed: 0 | Review | Emotion |
|---|---|---|
| 0 | this could have been a good episode but i simp... | negative |
| 1 | the film is severely awful and is demeaning to... | negative |
| 2 | the first 30min of the flick was choppy and ha... | negative |
| 3 | went to watch this movie expecting a nothing r... | negative |
| 4 | im not sure what dragged me into the cinema to... | negative |
| 5 | i had to write a review of this film after rea... | negative |
| 6 | having not read the novel i cant tell how fait... | negative |
| 7 | hiya folksbr br well this movie sucks really t... | negative |
| 8 | the screenwriter poorly attempted to recreate ... | negative |
| 9 | on the way back from imc6 san jose california ... | negative |


# 2. Dataset Inspection

We check:

- Total dataset size
- Class distribution (label balance)

A roughly balanced label distribution means accuracy is a meaningful headline metric and no class reweighting is needed.
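A minimal sketch of these checks, assuming the `Emotion` column shown above. `inspect_labels` is a hypothetical helper, and the toy frame only reproduces the class counts the notebook prints later. The majority-class baseline (about 50% here) is the accuracy a classifier must beat to have learned anything.

```python
import pandas as pd

def inspect_labels(df: pd.DataFrame, label_col: str = "Emotion") -> dict:
    """Summarize dataset size, class counts, and the majority-class baseline."""
    counts = df[label_col].value_counts()  # excludes missing labels
    proportions = counts / counts.sum()
    return {
        "total_rows": int(counts.sum()),
        "counts": counts.to_dict(),
        "proportions": proportions.round(4).to_dict(),
        # Accuracy obtained by always predicting the most frequent class
        "majority_baseline_acc": float(proportions.max()),
    }

# Toy frame standing in for `data`; counts mirror the notebook's output
toy = pd.DataFrame({"Emotion": ["negative"] * 12496 + ["positive"] * 12492})
summary = inspect_labels(toy)
print(summary)
```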


In [6]:
x = data['Review']
y = data['Emotion']

In [7]:
!pip -q install transformers datasets accelerate torch scikit-learn


In [8]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

from datasets import Dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer

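# 3. Label Encoding

Rows with missing values are dropped, and sentiment labels are mapped to integer IDs
(sorted alphabetically, so `negative` → 0 and `positive` → 1), the target format
expected by the cross-entropy loss.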


In [9]:
data = data[["Review", "Emotion"]].dropna()

# Map labels to ids
labels = sorted(data["Emotion"].unique())
label2id = {l: i for i, l in enumerate(labels)}
id2label = {i: l for l, i in label2id.items()}

data["label"] = data["Emotion"].map(label2id)

print("Labels:", labels)
print(data["Emotion"].value_counts())
print("Total rows:", len(data))



Labels: ['negative', 'positive']
Emotion
negative    12496
positive    12492
Name: count, dtype: int64
Total rows: 24988


# 4. Train-Test Split

The dataset is split into:

- 80% training data
- 20% testing data

Stratified sampling preserves class proportions.
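The notebook later passes `test_ds` as `eval_dataset` with `load_best_model_at_end=True`, so the test split also drives checkpoint selection. A minimal sketch of a stratified three-way split that keeps a separate validation set; the 70/10/20 proportions and the `three_way_split` helper are assumptions, not the notebook's setup.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def three_way_split(df, label_col="label", val_size=0.1, test_size=0.2, seed=20):
    """Stratified train/validation/test split (hypothetical helper)."""
    # First carve off the held-out test set
    train_val, test = train_test_split(
        df, test_size=test_size, random_state=seed, stratify=df[label_col]
    )
    # Then split the remainder; rescale so val_size is a fraction of the full data
    val_frac = val_size / (1 - test_size)
    train, val = train_test_split(
        train_val, test_size=val_frac, random_state=seed,
        stratify=train_val[label_col]
    )
    return train, val, test

# Toy balanced data standing in for `data`
toy = pd.DataFrame({"Review": [f"review {i}" for i in range(1000)],
                    "label": [0, 1] * 500})
train, val, test = three_way_split(toy)
print(len(train), len(val), len(test))
```

With such a split, `eval_dataset` would be built from `val`, and the test set would be scored once after training with `trainer.evaluate(eval_dataset=...)`.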


In [10]:
train_df, test_df = train_test_split(
    data,
    test_size=0.2,
    random_state=20,
    stratify=data["label"]
)

print("Train:", len(train_df), "Test:", len(test_df))



Train: 19990 Test: 4998


In [11]:
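# preserve_index=False keeps the shuffled pandas index out of the dataset columns
train_ds = Dataset.from_pandas(train_df[["Review", "label"]], preserve_index=False)
test_ds  = Dataset.from_pandas(test_df[["Review", "label"]], preserve_index=False)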

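# 5. Tokenization

Raw text is converted into WordPiece token IDs using the pretrained
`distilbert-base-uncased` tokenizer, which lowercases the input and splits
rare words into subword pieces. Reviews are truncated to 256 tokens.

These IDs are the input to the Transformer encoder, which then produces a
context-dependent embedding for every token.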
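How truncation and padding to a fixed length work can be sketched in plain Python. This is a simplified illustration operating on already-tokenized ids, not the tokenizer's actual implementation; the special-token ids are those of the `bert-base-uncased` vocabulary that DistilBERT reuses, and the content ids `2000..2009` are arbitrary placeholders.

```python
from typing import List, Tuple

# Special-token ids in the (distil)bert-base-uncased vocabulary
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0

def encode_ids(token_ids: List[int], max_length: int = 256,
               pad: bool = True) -> Tuple[List[int], List[int]]:
    """Simplified single-sequence truncation + right padding."""
    # Reserve two positions for [CLS] and [SEP]
    body = token_ids[: max_length - 2]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    if pad:
        # Pad on the right; padded positions get attention_mask = 0
        n_pad = max_length - len(input_ids)
        input_ids += [PAD_ID] * n_pad
        attention_mask += [0] * n_pad
    return input_ids, attention_mask

ids, mask = encode_ids(list(range(2000, 2010)), max_length=16)
print(ids)
print(mask)
```

With `max_length=256`, only the first 254 content tokens of a review survive, so a verdict stated at the end of a long review can be cut off.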


In [12]:
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)

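def tokenize(batch):
    # Truncate to 256 WordPiece tokens and pad every review to that length.
    # Because of this static padding, the DataCollatorWithPadding used later
    # has nothing left to pad; omitting padding= here would enable dynamic padding.
    return tokenizer(
        batch["Review"],
        truncation=True,
        padding="max_length",
        max_length=256
    )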

train_ds = train_ds.map(tokenize, batched=True)
test_ds  = test_ds.map(tokenize, batched=True)

cols = ["input_ids", "attention_mask", "label"]
train_ds.set_format(type="torch", columns=cols)
test_ds.set_format(type="torch", columns=cols)



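# 6. Model Architecture

We fine-tune DistilBERT, a 6-layer distilled version of BERT.

Architecture:

- Pretrained DistilBERT encoder (6 Transformer layers, hidden size 768)
- Classification head on the first ([CLS]) token: a `pre_classifier` linear layer (768 → 768) with ReLU and dropout, followed by a `classifier` linear layer (768 → 2)
- The model outputs raw logits; softmax is applied inside the cross-entropy loss during training and explicitly at prediction time

Training objective: minimize cross-entropy loss between the logits and the integer labels.
The encoder starts from pretrained weights, while the head is newly initialized (the `MISSING`
entries in the load report); the `UNEXPECTED` `vocab_*` weights belong to the masked-language-model
head and are discarded.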
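A simplified PyTorch sketch of the head and loss, assuming the layer layout suggested by the `pre_classifier` and `classifier` keys in the load report (768-dimensional hidden states; the dropout rate here is illustrative). Random tensors stand in for the encoder output. It also checks that `F.cross_entropy` equals the mean negative log-softmax probability of the true class.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassificationHead(nn.Module):
    """Simplified head mirroring the pre_classifier/classifier layers."""
    def __init__(self, hidden: int = 768, num_labels: int = 2, dropout: float = 0.2):
        super().__init__()
        self.pre_classifier = nn.Linear(hidden, hidden)
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        cls = hidden_states[:, 0]                 # [CLS] token vector
        x = F.relu(self.pre_classifier(cls))
        return self.classifier(self.dropout(x))   # raw logits, no softmax

torch.manual_seed(0)
head = ClassificationHead().eval()               # eval() disables dropout
hidden_states = torch.randn(4, 10, 768)          # (batch, seq_len, hidden)
labels = torch.tensor([0, 1, 1, 0])

logits = head(hidden_states)
loss = F.cross_entropy(logits, labels)           # log-softmax + NLL in one call
manual = -F.log_softmax(logits, dim=-1)[torch.arange(4), labels].mean()
print(logits.shape, loss.item(), manual.item())
```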


In [13]:
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=len(labels),
    id2label=id2label,
    label2id=label2id
)

def compute_metrics(eval_pred):
    logits, y_true = eval_pred
    y_pred = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1_macro": f1_score(y_true, y_pred, average="macro")
    }



DistilBertForSequenceClassification LOAD REPORT from: distilbert-base-uncased
Key                     | Status     | 
------------------------+------------+-
vocab_layer_norm.bias   | UNEXPECTED | 
vocab_transform.weight  | UNEXPECTED | 
vocab_projector.bias    | UNEXPECTED | 
vocab_layer_norm.weight | UNEXPECTED | 
vocab_transform.bias    | UNEXPECTED | 
pre_classifier.bias     | MISSING    | 
pre_classifier.weight   | MISSING    | 
classifier.bias         | MISSING    | 
classifier.weight       | MISSING    | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.
- MISSING	:those params were newly initialized because missing from the checkpoint. Consider training on your downstream task.


In [15]:
import transformers, inspect
from transformers import TrainingArguments

print("transformers version:", transformers.__version__)
print("transformers file:", transformers.__file__)
print("TrainingArguments init params contains evaluation_strategy?",
      "evaluation_strategy" in str(inspect.signature(TrainingArguments.__init__)))
print("TrainingArguments init params contains eval_strategy?",
      "eval_strategy" in str(inspect.signature(TrainingArguments.__init__)))


transformers version: 5.0.0
transformers file: /usr/local/lib/python3.12/dist-packages/transformers/__init__.py
TrainingArguments init params contains evaluation_strategy? False
TrainingArguments init params contains eval_strategy? True


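# 7. Training Configuration

Hyperparameters:

- Learning rate: 2e-5
- Batch size: 16 (per device, for training and evaluation)
- Epochs: 2
- Weight decay: 0.01
- Optimizer: AdamW (the `Trainer` default)
- Evaluation and checkpointing: once per epoch
- Model selection: the checkpoint with the highest macro F1 is reloaded at the end (`load_best_model_at_end=True`)

Training runs on a GPU when one is available; `Trainer` places the model on it automatically.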
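These settings determine the number of optimizer steps and the learning-rate trajectory. A back-of-the-envelope sketch, assuming a single GPU, no gradient accumulation, and the `Trainer` defaults of a linear decay schedule with zero warmup steps; `linear_lr` is a hypothetical re-implementation, not the library function.

```python
import math

n_train, batch_size, epochs, base_lr = 19990, 16, 2, 2e-5

# Single device, no gradient accumulation; the last partial batch is kept
steps_per_epoch = math.ceil(n_train / batch_size)
total_steps = steps_per_epoch * epochs

def linear_lr(step: int, warmup: int = 0) -> float:
    """Linear warmup, then linear decay to zero at total_steps."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup))

print(steps_per_epoch, total_steps)  # 1250 2500
print(linear_lr(0), linear_lr(1250), linear_lr(2500))
```

Under these assumptions the learning rate halves by the end of epoch 1 and reaches zero at the final step.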


In [16]:
args = TrainingArguments(
    output_dir="sentiment_bert",
    eval_strategy="epoch",     # evaluate once per epoch (named evaluation_strategy in older versions)
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    weight_decay=0.01,
    load_best_model_at_end=True,
    metric_for_best_model="f1_macro",
    report_to="none"
)


In [17]:
from transformers import DataCollatorWithPadding, Trainer

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=test_ds,
    data_collator=data_collator,
    compute_metrics=compute_metrics
)

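# 8. Model Training & Evaluation

The pretrained Transformer is fine-tuned on the training split and
evaluated on the 4,998-review test split after each epoch.

Final performance (epoch 2):

- Accuracy: 91.4%
- Macro F1: 91.4%

Validation loss rises from epoch 1 to epoch 2 (0.247 → 0.268) while accuracy still
improves slightly, an early sign of overfitting. Because the same split is used both to
pick the best checkpoint (`load_best_model_at_end=True`) and to report these numbers,
they may be slightly optimistic; a separate validation split would give an unbiased
test estimate.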
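To put the headline accuracy in perspective, a sketch of a normal-approximation (Wald) 95% confidence interval, using the 4,568 correct predictions out of 4,998 implied by `eval_accuracy`. The interval method is a common choice, not something the notebook computes.

```python
import math

n_test = 4998
n_correct = 4568                      # eval_accuracy * n_test, from trainer.evaluate()
acc = n_correct / n_test

# Normal-approximation (Wald) 95% interval for a proportion
z = 1.96
half_width = z * math.sqrt(acc * (1 - acc) / n_test)
low, high = acc - half_width, acc + half_width
print(f"accuracy = {acc:.4f}, 95% CI ≈ [{low:.4f}, {high:.4f}]")
```

The interval spans roughly ±0.8 percentage points. It reflects sampling noise only, not the optimism introduced by selecting the checkpoint on the same split.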


In [18]:
trainer.train()
trainer.evaluate()


| Epoch | Training Loss | Validation Loss | Accuracy | F1 Macro |
|---|---|---|---|---|
| 1 | 0.284008 | 0.247274 | 0.908363 | 0.908354 |
| 2 | 0.17289 | 0.268384 | 0.913966 | 0.913965 |



There were missing keys in the checkpoint model loaded: ['distilbert.embeddings.LayerNorm.weight', 'distilbert.embeddings.LayerNorm.bias'].
There were unexpected keys in the checkpoint model loaded: ['distilbert.embeddings.LayerNorm.beta', 'distilbert.embeddings.LayerNorm.gamma'].


{'eval_loss': 0.26838356256484985,
 'eval_accuracy': 0.9139655862344938,
 'eval_f1_macro': 0.9139652418199846,
 'eval_runtime': 40.8912,
 'eval_samples_per_second': 122.227,
 'eval_steps_per_second': 7.654,
 'epoch': 2.0}

# 9. Example Predictions

In [33]:
import torch
import torch.nn.functional as F

def predict_with_proba(texts):
    if isinstance(texts, str):
        texts = [texts]

    inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=256)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}

    model.eval()
    with torch.no_grad():
        logits = model(**inputs).logits
        probs = F.softmax(logits, dim=-1).cpu().numpy()

    preds = probs.argmax(axis=1)

    for i, text in enumerate(texts):
        pred_label = id2label[int(preds[i])]
        prob_dict = {id2label[j]: float(probs[i][j]) for j in range(len(labels))}
        print("\nTEXT:", text)
        print("PRED:", pred_label)
        print("PROBS:", prob_dict)

predict_with_proba([
    "This movie is not as good.",
    "Absolutely fantastic — I loved it."
])



TEXT: This movie is not as good.
PRED: negative
PROBS: {'negative': 0.9914933443069458, 'positive': 0.008506668731570244}

TEXT: Absolutely fantastic — I loved it.
PRED: positive
PROBS: {'negative': 0.00499365059658885, 'positive': 0.9950063228607178}


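# 10. Interactive Prediction Demo

Users can enter their own movie review and see the predicted sentiment
together with the model's confidence, i.e. the softmax probability of the
predicted class. These probabilities are not necessarily calibrated: the
sarcastic input below is labeled positive with 0.91 confidence.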


In [34]:
import torch
import torch.nn.functional as F

def predict_texts(texts):
    if isinstance(texts, str):
        texts = [texts]

    inputs = tokenizer(
        texts,
        return_tensors="pt",
        truncation=True,
        padding=True,
        max_length=256
    )

    inputs = {k: v.to(model.device) for k, v in inputs.items()}

    model.eval()
    with torch.no_grad():
        logits = model(**inputs).logits
        probs = F.softmax(logits, dim=-1).cpu().numpy()

    for text, p in zip(texts, probs):
        pred_id = int(p.argmax())
        pred_label = id2label[pred_id]

        print("\n" + "="*60)
        print("TEXT:", text)
        print("PREDICTION:", pred_label)
        print("CONFIDENCE:", round(float(p[pred_id]), 4))
        print("ALL PROBS:")
        for i in range(len(p)):
            print(f"  {id2label[i]}: {round(float(p[i]), 4)}")
        print("="*60)


# -----------------------------
# Feel free to edit this line ↓↓↓
# -----------------------------

user_input = input("Enter a movie review sentence: ")

predict_texts(user_input)


Enter a movie review sentence: i love it? no i don't.

TEXT: i love it? no i don't.
PREDICTION: positive
CONFIDENCE: 0.9094
ALL PROBS:
  negative: 0.0906
  positive: 0.9094


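# 11. Error Analysis

We probe the model with hand-written sentences to understand its limitations.
The probes below target:

- Pragmatic reversal ("I love it. Not.")
- Sarcasm ("Great movie... said no one ever.")
- Contrastive structures ("This was amazing... until it wasn't.")
- Comparative sentiment ("This movie is not as good as Titanic; ...")

The first three are misclassified as positive with high confidence (0.86 to 0.99), and the
comparative sentence is labeled negative with 0.98 confidence even though its second clause is
positive. The contrastive "I thought it was good; it was actually terrible." is handled correctly.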
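The probes are hand-written. To find errors on real data, the test-set logits (e.g. `trainer.predict(test_ds).predictions`, passed through a softmax) can be ranked by how confidently wrong they are. A minimal sketch with toy arrays; `top_misclassified` is a hypothetical helper and the probabilities are made up.

```python
import numpy as np

def top_misclassified(texts, y_true, probs, id2label, k=5):
    """Return the k most confidently wrong predictions (hypothetical helper)."""
    y_true = np.asarray(y_true)
    probs = np.asarray(probs)
    y_pred = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    wrong = np.flatnonzero(y_pred != y_true)
    # Most confident mistakes first: these are the most informative errors
    wrong = wrong[np.argsort(-conf[wrong])][:k]
    return [
        {"text": texts[i], "true": id2label[int(y_true[i])],
         "pred": id2label[int(y_pred[i])], "confidence": float(conf[i])}
        for i in wrong
    ]

id2label = {0: "negative", 1: "positive"}
texts = ["great", "awful", "I love it. Not.", "fine I guess"]
y_true = [1, 0, 0, 1]
probs = [[0.1, 0.9], [0.8, 0.2], [0.14, 0.86], [0.6, 0.4]]
for row in top_misclassified(texts, y_true, probs, id2label):
    print(row)
```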


In [40]:
predict_texts([
    "I love it.",
    "I love it. Not.",
    "Great movie... said no one ever.",
    "This was amazing... until it wasn't.",
    "This movie is not as good as Titanic; it's already good enough.",
    "I thought it was good; it was actually terrible."
])


TEXT: I love it.
PREDICTION: positive
CONFIDENCE: 0.9927
ALL PROBS:
  negative: 0.0073
  positive: 0.9927

TEXT: I love it. Not.
PREDICTION: positive
CONFIDENCE: 0.8564
ALL PROBS:
  negative: 0.1436
  positive: 0.8564

TEXT: Great movie... said no one ever.
PREDICTION: positive
CONFIDENCE: 0.9921
ALL PROBS:
  negative: 0.0079
  positive: 0.9921

TEXT: This was amazing... until it wasn't.
PREDICTION: positive
CONFIDENCE: 0.9229
ALL PROBS:
  negative: 0.0771
  positive: 0.9229

TEXT: This movie is not as good as Titanic; it's already good enough
PREDICTION: negative
CONFIDENCE: 0.9789
ALL PROBS:
  negative: 0.9789
  positive: 0.0211

TEXT: I thought it was good; it was actually terrible.
PREDICTION: negative
CONFIDENCE: 0.8819
ALL PROBS:
  negative: 0.8819
  positive: 0.1181
