<a href="https://colab.research.google.com/github/brucefjn/ML-Sentiment-Analysis-/blob/main/sentiment_prediction_system_(4).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Sentiment Analysis with Transformer Fine-Tuning

This project builds a sentiment classification system for movie reviews.

The project began with a classical machine learning baseline
(TF-IDF + Logistic Regression).

To capture word order and context, which a bag-of-words baseline largely
ignores, the model was upgraded to a Transformer-based architecture
(DistilBERT).

Final model: Fine-tuned DistilBERT for binary sentiment classification.


# 1. Dataset

The dataset consists of labeled movie reviews:

- `Review`: raw text input
- `Emotion`: sentiment label (positive / negative)

We perform supervised learning using these labeled examples.


In [None]:
!wget -O movie_reviews 'https://drive.google.com/uc?export=view&id=1kWs6yOYpdjVr-liLPs4PKIs1qSpIohzS'


In [None]:
# Load the labeled reviews

import numpy as np
import pandas as pd
data = pd.read_csv('movie_reviews')  # comma is the default delimiter
data.head(10)


| Unnamed: 0 | Review | Emotion |
|---|---|---|
| 0 | this could have been a good episode but i simp... | negative |
| 1 | the film is severely awful and is demeaning to... | negative |
| 2 | the first 30min of the flick was choppy and ha... | negative |
| 3 | went to watch this movie expecting a nothing r... | negative |
| 4 | im not sure what dragged me into the cinema to... | negative |
| 5 | i had to write a review of this film after rea... | negative |
| 6 | having not read the novel i cant tell how fait... | negative |
| 7 | hiya folksbr br well this movie sucks really t... | negative |
| 8 | the screenwriter poorly attempted to recreate ... | negative |
| 9 | on the way back from imc6 san jose california ... | negative |


# 2. Dataset Inspection

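We inspect:

- Total dataset size
- Class distribution (label balance)

A near-even class split means accuracy is a meaningful metric and no
class reweighting is needed. The counts themselves are printed by the
label-encoding cell in Section 3.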
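As a minimal sketch of the balance check, the snippet below computes each class's share of the data from raw labels. The helper name `class_shares` is hypothetical and not part of the notebook; the counts are the ones printed by the label-encoding cell later on.

```python
from collections import Counter

def class_shares(labels):
    """Return {label: fraction of rows} for an iterable of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Counts reported later in this notebook: 12496 negative, 12492 positive.
labels_example = ["negative"] * 12496 + ["positive"] * 12492
shares = class_shares(labels_example)
print({label: round(share, 4) for label, share in shares.items()})
```

Both shares land within a fraction of a percent of 0.5, which is why plain accuracy and macro F1 end up so close for this dataset.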


In [None]:
x = data['Review']
y = data['Emotion']

In [None]:
!pip -q install transformers datasets accelerate torch scikit-learn


In [None]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

from datasets import Dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer

# 3. Label Encoding

Sentiment labels are mapped to integer IDs (`negative` → 0, `positive` → 1)
because the cross-entropy loss expects class indices as targets.
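To illustrate why integer IDs are needed, here is a small, dependency-free sketch of cross-entropy for a single example. The function `cross_entropy` and the logit values are made up for illustration; the actual training loss is computed inside the model by PyTorch.

```python
import math

def cross_entropy(logits, label_id):
    """Cross-entropy for one example: -log(softmax(logits)[label_id])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label_id]

label2id_example = {"negative": 0, "positive": 1}
logits_example = [2.0, -1.0]  # made-up scores that favour class 0

loss_correct = cross_entropy(logits_example, label2id_example["negative"])
loss_wrong = cross_entropy(logits_example, label2id_example["positive"])
print(loss_correct, loss_wrong)
```

The label ID is used directly as an index into the logits, so the loss is small when the target class has the largest score and large otherwise.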


In [None]:
data = data[["Review", "Emotion"]].dropna().copy()  # copy so the new column is added to an independent frame

# Map labels to ids
labels = sorted(data["Emotion"].unique())
label2id = {l: i for i, l in enumerate(labels)}
id2label = {i: l for l, i in label2id.items()}

data["label"] = data["Emotion"].map(label2id)

print("Labels:", labels)
print(data["Emotion"].value_counts())
print("Total rows:", len(data))



Labels: ['negative', 'positive']
Emotion
negative    12496
positive    12492
Name: count, dtype: int64
Total rows: 24988


# 4. Train-Test Split

The dataset is split into:

- 80% training data
- 20% testing data

Stratified sampling preserves class proportions.
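The idea behind stratification can be sketched in plain Python: split each class separately, then merge the pieces. This is a simplified illustration under stated assumptions (the helper `stratified_split` is hypothetical, and scikit-learn's `train_test_split` handles rounding and shuffling differently), not a reimplementation of the library call used below.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_fraction=0.2, seed=20):
    """Return (train_indices, test_indices) with per-class proportions kept."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)  # same fraction from every class
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)

toy_labels = ["negative"] * 50 + ["positive"] * 50
train_idx, test_idx = stratified_split(toy_labels)
test_counts = Counter(toy_labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), dict(test_counts))
```

Because every class contributes the same fraction to the test set, the test split mirrors the class balance of the full dataset.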


In [None]:
train_df, test_df = train_test_split(
    data,
    test_size=0.2,
    random_state=20,
    stratify=data["label"]
)

print("Train:", len(train_df), "Test:", len(test_df))



Train: 19990 Test: 4998


In [None]:
train_ds = Dataset.from_pandas(train_df[["Review", "label"]])
test_ds  = Dataset.from_pandas(test_df[["Review", "label"]])

# 5. Tokenization

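Raw text is converted into token IDs using the pretrained
DistilBERT tokenizer.

Each review is truncated or padded to 256 tokens so that every example has
the same length. The model then computes contextual embeddings from these
IDs, so the representation of a word depends on the words around it.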
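The fixed-length behaviour requested by `truncation=True`, `padding="max_length"` can be sketched as follows. This is a toy illustration only: the helper `pad_or_truncate`, the token IDs, and `pad_id=0` are assumptions, and the real tokenizer additionally performs subword splitting and inserts special tokens.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Clip or right-pad a list of token IDs to exactly max_length."""
    clipped = token_ids[:max_length]
    n_pad = max_length - len(clipped)
    input_ids = clipped + [pad_id] * n_pad
    # 1 marks real tokens, 0 marks padding the model should ignore.
    attention_mask = [1] * len(clipped) + [0] * n_pad
    return input_ids, attention_mask

short_ids, short_mask = pad_or_truncate([11, 12, 13], max_length=5)
long_ids, long_mask = pad_or_truncate([11, 12, 13, 14, 15, 16, 17], max_length=5)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

The attention mask is what lets the model distinguish genuine tokens from padding, which is why the notebook keeps both `input_ids` and `attention_mask` as model inputs.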


In [None]:
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)

def tokenize(batch):
    return tokenizer(
        batch["Review"],
        truncation=True,
        padding="max_length",
        max_length=256
    )

train_ds = train_ds.map(tokenize, batched=True)
test_ds  = test_ds.map(tokenize, batched=True)

cols = ["input_ids", "attention_mask", "label"]
train_ds.set_format(type="torch", columns=cols)
test_ds.set_format(type="torch", columns=cols)



# 6. Model Architecture

We fine-tune DistilBERT, a Transformer-based neural network.

Architecture:

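- Pretrained DistilBERT encoder
- Classification head (linear layers, newly initialized)
- Softmax over the output logits (applied inside the loss during training and explicitly at inference)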

Training objective: minimize cross-entropy loss.
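For a sense of scale, the sketch below counts the parameters of the newly initialized head. It assumes the head consists of a `pre_classifier` linear layer (768 → 768) followed by a `classifier` linear layer (768 → number of labels), which matches the layer names shown as MISSING in the load report below; the helper `linear_params` is hypothetical.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

hidden_dim = 768  # hidden size of DistilBERT-base (assumed here)
num_labels = 2    # negative / positive

pre_classifier_params = linear_params(hidden_dim, hidden_dim)
classifier_params = linear_params(hidden_dim, num_labels)
head_params = pre_classifier_params + classifier_params
print(pre_classifier_params, classifier_params, head_params)
```

Under these assumptions the head is tiny compared with the pretrained encoder, so fine-tuning mostly adapts weights that already encode general language knowledge.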


In [None]:
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=len(labels),
    id2label=id2label,
    label2id=label2id
)

def compute_metrics(eval_pred):
    logits, y_true = eval_pred
    y_pred = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1_macro": f1_score(y_true, y_pred, average="macro")
    }





DistilBertForSequenceClassification LOAD REPORT from: distilbert-base-uncased
Key                     | Status     | 
------------------------+------------+-
vocab_layer_norm.bias   | UNEXPECTED | 
vocab_transform.weight  | UNEXPECTED | 
vocab_projector.bias    | UNEXPECTED | 
vocab_layer_norm.weight | UNEXPECTED | 
vocab_transform.bias    | UNEXPECTED | 
pre_classifier.bias     | MISSING    | 
pre_classifier.weight   | MISSING    | 
classifier.bias         | MISSING    | 
classifier.weight       | MISSING    | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.
- MISSING	:those params were newly initialized because missing from the checkpoint. Consider training on your downstream task.


In [None]:
import transformers, inspect
from transformers import TrainingArguments

print("transformers version:", transformers.__version__)
print("transformers file:", transformers.__file__)
print("TrainingArguments init params contains evaluation_strategy?",
      "evaluation_strategy" in str(inspect.signature(TrainingArguments.__init__)))
print("TrainingArguments init params contains eval_strategy?",
      "eval_strategy" in str(inspect.signature(TrainingArguments.__init__)))


transformers version: 5.0.0
transformers file: /usr/local/lib/python3.12/dist-packages/transformers/__init__.py
TrainingArguments init params contains evaluation_strategy? False
TrainingArguments init params contains eval_strategy? True


# 7. Training Configuration

Hyperparameters:

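- Learning rate: 2e-5
- Batch size: 16 (per device, for both training and evaluation)
- Epochs: 2
- Weight decay: 0.01
- Optimizer: AdamW (the `Trainer` default)
- Best-checkpoint metric: F1 Macro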

GPU acceleration is used for efficient fine-tuning.
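These settings determine how many optimizer steps the run takes. The sketch below works that out for the 19,990 training examples reported earlier, assuming a single device and no gradient accumulation. The linear decay schedule without warmup is also an assumption about the `Trainer` defaults, and both helper functions are hypothetical.

```python
import math

def steps_per_epoch(n_examples, batch_size):
    """Number of batches needed to see every example once."""
    return math.ceil(n_examples / batch_size)

def linear_decay_lr(step, total_steps, base_lr=2e-5):
    """Learning rate that falls linearly from base_lr to 0 (no warmup)."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)

n_train = 19990  # training set size printed by the split cell
per_epoch = steps_per_epoch(n_train, batch_size=16)
total_steps = per_epoch * 2  # two epochs
halfway_lr = linear_decay_lr(total_steps // 2, total_steps)
print(per_epoch, total_steps, halfway_lr)
```

Under these assumptions the run takes 1,250 steps per epoch and 2,500 in total, with the learning rate at roughly half its initial value by the end of the first epoch.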


In [None]:
args = TrainingArguments(
    output_dir="sentiment_bert",
    eval_strategy="epoch",     # named evaluation_strategy in older transformers releases
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    weight_decay=0.01,
    load_best_model_at_end=True,
    metric_for_best_model="f1_macro",
    report_to="none"
)


In [None]:
from transformers import DataCollatorWithPadding, Trainer

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=test_ds,
    data_collator=data_collator,
    compute_metrics=compute_metrics
)

# 8. Model Training & Evaluation

The pretrained transformer is fine-tuned
on the labeled movie review dataset.

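Final performance on the test split after two epochs:

- Accuracy ≈ 91.4%
- F1 Macro ≈ 91.4%

Validation loss rises slightly in the second epoch (0.2596 → 0.2645) while
training loss keeps falling, an early sign of overfitting. The same split is
also used to select the best checkpoint, so these figures are a slightly
optimistic estimate of generalization.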
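To make the two metrics concrete, here is a dependency-free sketch of accuracy and macro F1. The notebook itself relies on scikit-learn's `accuracy_score` and `f1_score`; the functions and toy labels below are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Imbalanced toy case: always predicting class 0 looks fine on accuracy
# but is penalised by macro F1.
y_true_toy = [0, 0, 0, 0, 1]
y_pred_toy = [0, 0, 0, 0, 0]
toy_accuracy = accuracy(y_true_toy, y_pred_toy)
toy_macro_f1 = macro_f1(y_true_toy, y_pred_toy)
print(toy_accuracy, toy_macro_f1)
```

On a balanced dataset such as this one the two numbers tend to be close, which is consistent with the near-identical accuracy and macro F1 reported in the training log.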


In [None]:
trainer.train()
trainer.evaluate()


| Epoch | Training Loss | Validation Loss | Accuracy | F1 Macro |
|---|---|---|---|---|
| 1 | 0.28609 | 0.259557 | 0.906763 | 0.906739 |
| 2 | 0.174237 | 0.264505 | 0.913766 | 0.913764 |



There were missing keys in the checkpoint model loaded: ['distilbert.embeddings.LayerNorm.weight', 'distilbert.embeddings.LayerNorm.bias'].
There were unexpected keys in the checkpoint model loaded: ['distilbert.embeddings.LayerNorm.beta', 'distilbert.embeddings.LayerNorm.gamma'].


{'eval_loss': 0.2645047903060913,
 'eval_accuracy': 0.913765506202481,
 'eval_f1_macro': 0.9137639837814617,
 'eval_runtime': 41.0904,
 'eval_samples_per_second': 121.634,
 'eval_steps_per_second': 7.617,
 'epoch': 2.0}

# 9. Example Prediction

In [None]:
import torch
import torch.nn.functional as F

def predict_with_proba(texts):
    if isinstance(texts, str):
        texts = [texts]

    inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=256)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}

    model.eval()
    with torch.no_grad():
        logits = model(**inputs).logits
        probs = F.softmax(logits, dim=-1).cpu().numpy()

    preds = probs.argmax(axis=1)

    for i, text in enumerate(texts):
        pred_label = id2label[int(preds[i])]
        prob_dict = {id2label[j]: float(probs[i][j]) for j in range(len(labels))}
        print("\nTEXT:", text)
        print("PRED:", pred_label)
        print("PROBS:", prob_dict)

predict_with_proba([
    "This movie is not as good.",
    "Absolutely fantastic — I loved it."
])



TEXT: This movie is not as good.
PRED: negative
PROBS: {'negative': 0.9902251362800598, 'positive': 0.009774923324584961}

TEXT: Absolutely fantastic — I loved it.
PRED: positive
PROBS: {'negative': 0.008070720359683037, 'positive': 0.991929292678833}


# 10. Interactive Prediction Demo

Users can input custom movie reviews
to observe real-time sentiment predictions
and confidence scores.


In [None]:
import torch
import torch.nn.functional as F

def predict_texts(texts):
    if isinstance(texts, str):
        texts = [texts]

    inputs = tokenizer(
        texts,
        return_tensors="pt",
        truncation=True,
        padding=True,
        max_length=256
    )

    inputs = {k: v.to(model.device) for k, v in inputs.items()}

    model.eval()
    with torch.no_grad():
        logits = model(**inputs).logits
        probs = F.softmax(logits, dim=-1).cpu().numpy()

    for text, p in zip(texts, probs):
        pred_id = int(p.argmax())
        pred_label = id2label[pred_id]

        print("\n" + "="*60)
        print("TEXT:", text)
        print("PREDICTION:", pred_label)
        print("CONFIDENCE:", round(float(p[pred_id]), 4))
        print("ALL PROBS:")
        for i in range(len(p)):
            print(f"  {id2label[i]}: {round(float(p[i]), 4)}")
        print("="*60)


# -----------------------------
# Feel free to edit this line ↓↓↓
# -----------------------------

user_input = input("Enter a movie review sentence: ")

predict_texts(user_input)


Enter a movie review sentence: This is a relatively decent one.

TEXT: This is a relatively decent one.
PREDICTION: positive
CONFIDENCE: 0.7926
ALL PROBS:
  negative: 0.2074
  positive: 0.7926


# 11. Error Analysis

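We probe the model with hand-written sentences to expose its
limitations. In the output below it labels "I love it. Not.",
"Great movie... said no one ever.", and "This was amazing... until it
wasn't." as positive.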

Observed challenges:

- Comparative sentiment
- Sarcasm
- Pragmatic reversal
- Contrastive structures
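To extend this kind of analysis beyond a handful of probes, one could collect every example where the prediction disagrees with the intended label. The helper `find_misclassified` below is a hypothetical sketch. The predicted labels are taken from the output that follows, while the intended labels are this example's own assumption about what a human reader would assign.

```python
def find_misclassified(texts, true_labels, pred_labels):
    """Return one record per example whose prediction differs from its label."""
    return [
        {"text": text, "true": true, "pred": pred}
        for text, true, pred in zip(texts, true_labels, pred_labels)
        if true != pred
    ]

probe_texts = ["I love it.", "I love it. Not."]
intended = ["positive", "negative"]   # assumed human judgement
predicted = ["positive", "positive"]  # model predictions from the output below

errors = find_misclassified(probe_texts, intended, predicted)
for err in errors:
    print(err)
```

Applied to the full test split, the same filter would give a list of errors that can be grouped by the challenge types listed above.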


In [None]:
predict_texts([
    "I love it.",
    "I love it. Not.",
    "Great movie... said no one ever.",
    "This was amazing... until it wasn't.",
    "This movie is not as good as Titanic; it's already good enough.",
    "I thought it was good; it was actually terrible."
])


TEXT: I love it.
PREDICTION: positive
CONFIDENCE: 0.9877
ALL PROBS:
  negative: 0.0123
  positive: 0.9877

TEXT: I love it. Not.
PREDICTION: positive
CONFIDENCE: 0.823
ALL PROBS:
  negative: 0.177
  positive: 0.823

TEXT: Great movie... said no one ever.
PREDICTION: positive
CONFIDENCE: 0.9909
ALL PROBS:
  negative: 0.0091
  positive: 0.9909

TEXT: This was amazing... until it wasn't.
PREDICTION: positive
CONFIDENCE: 0.8426
ALL PROBS:
  negative: 0.1574
  positive: 0.8426

TEXT: This movie is not as good as Titanic; it's already good enough.
PREDICTION: negative
CONFIDENCE: 0.9868
ALL PROBS:
  negative: 0.9868
  positive: 0.0132

TEXT: I thought it was good; it was actually terrible.
PREDICTION: negative
CONFIDENCE: 0.9113
ALL PROBS:
  negative: 0.9113
  positive: 0.0887
