# Sentiment Analysis — DistilBERT on IMDB
**TL;DR:** Classify IMDB movie reviews with a CPU-first DistilBERT pipeline and prep for LoRA fine-tuning.

**Models & Datasets:** [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) (Apache-2.0), [IMDB](https://huggingface.co/datasets/imdb) (CC BY-NC 4.0)
**Run Profiles:** 🖥️ CPU | 🍎 Metal (Apple Silicon) | 🧪 Colab/T4 | ⚡ CUDA GPU
**Env (minimal):** python>=3.10, torch, transformers, datasets, evaluate, accelerate, pandas, scikit-learn (required by the `evaluate` ROC-AUC and F1 metrics) (optional: peft, bitsandbytes, timm, diffusers)
**Colab:** [Open in Colab](https://colab.research.google.com/github/SSusantAchary/Hands-On-Huggingface-AI-Models/blob/main/notebooks/nlp/sentiment-distilbert-imdb_cpu-first.ipynb)

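**Switches (edit in one place):**
- `device` = {"cpu","mps","cuda"} (env `HF_DEVICE`, default `cpu`)
- `precision` = {"fp32","fp16","bf16","int8","4bit"} (env `HF_PRECISION`, default `fp32`; currently recorded in the benchmark row only, apply to the model only if supported)
- `context_len` / `image_res` / `batch_size` (batch size via env `HF_BATCH`, default `4`)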
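The Setup cell reads these switches with plain `os.environ.get` and no validation, so a typo such as `HF_DEVICE=gpu` silently lands on CPU. A minimal sketch of failing fast instead, assuming the same `HF_DEVICE`, `HF_PRECISION`, and `HF_BATCH` variables; `Switches` and `read_switches` are hypothetical names, not part of the repo's templates.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

VALID_DEVICES = {"cpu", "mps", "cuda"}
VALID_PRECISIONS = {"fp32", "fp16", "bf16", "int8", "4bit"}


@dataclass(frozen=True)
class Switches:
    device: str
    precision: str
    batch_size: int


def read_switches(env: Mapping = os.environ) -> Switches:
    """Read the notebook switches from an environment mapping, rejecting unknown values."""
    device = env.get("HF_DEVICE", "cpu").lower()
    precision = env.get("HF_PRECISION", "fp32").lower()
    batch_size = int(env.get("HF_BATCH", "4"))
    if device not in VALID_DEVICES:
        raise ValueError(f"HF_DEVICE must be one of {sorted(VALID_DEVICES)}, got {device!r}")
    if precision not in VALID_PRECISIONS:
        raise ValueError(f"HF_PRECISION must be one of {sorted(VALID_PRECISIONS)}, got {precision!r}")
    if batch_size < 1:
        raise ValueError(f"HF_BATCH must be a positive integer, got {batch_size}")
    return Switches(device=device, precision=precision, batch_size=batch_size)


# Passing a plain dict makes the function easy to try without touching the real environment.
print(read_switches({"HF_DEVICE": "mps", "HF_BATCH": "8"}))
```

Accepting any mapping (rather than reading `os.environ` directly inside the function) keeps the check testable and lets a CI job pass explicit settings.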

**Footprint & Speed (fill after run):**
- Peak RAM: TODO
- Peak VRAM: TODO (if GPU)
- TTFB: TODO, Throughput: TODO, Load time: TODO
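For the Peak RAM field, the process high-water mark can also be read from the standard library. A minimal sketch, assuming a POSIX system (the `resource` module does not exist on Windows); `peak_ram_mb` is a hypothetical helper name. Note that `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS.

```python
import sys
from typing import Optional


def peak_ram_mb() -> Optional[float]:
    """Return this process's peak resident set size in MB (1024 * 1024 bytes), or None if unsupported."""
    try:
        import resource  # POSIX only
    except ImportError:
        return None
    maxrss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # macOS reports bytes; Linux reports kilobytes.
    bytes_used = maxrss if sys.platform == "darwin" else maxrss * 1024
    return bytes_used / (1024 * 1024)


value = peak_ram_mb()
print("Peak RAM:", "n/a" if value is None else f"{value:.1f} MB")
```

Because this is a lifetime peak for the whole process, it is most meaningful when read in a freshly restarted kernel right after the run.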

**Gotchas:** Metal backend falls back to CPU if MPS unavailable ([Fixes & Tips](../fixes-and-tips/metal-backend-fallback.md))
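A silent fallback to CPU can make benchmark rows look like Metal or CUDA numbers. A minimal sketch of making the fallback visible; availability is passed in as arguments (in the notebook these would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`), and `pick_device` is a hypothetical name.

```python
import warnings


def pick_device(preference: str, cuda_available: bool, mps_available: bool) -> str:
    """Return the requested backend when available, otherwise warn and fall back to CPU."""
    if preference == "cuda" and cuda_available:
        return "cuda:0"
    if preference == "mps" and mps_available:
        return "mps"
    if preference != "cpu":
        warnings.warn(f"Requested device {preference!r} is unavailable; falling back to CPU.")
    return "cpu"


# Example: asking for Metal on a machine without MPS support prints "cpu" and emits a warning.
print(pick_device("mps", cuda_available=False, mps_available=False))
```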



## Setup
Configure device toggles, load a small IMDB slice, and prepare utility helpers.


```python
import json
import os
import subprocess
import time
from pathlib import Path

import pandas as pd
import torch
from datasets import load_dataset
from evaluate import load as load_metric
from transformers import pipeline

# Repo-internal helpers: run the notebook from the repository root so this import resolves.
from notebooks._templates.measure import append_benchmark_row, measure_memory_speed

# Switches: override via environment variables before starting the kernel.
DEVICE_PREFERENCE = os.environ.get("HF_DEVICE", "cpu")
PRECISION = os.environ.get("HF_PRECISION", "fp32")
BATCH_SIZE = int(os.environ.get("HF_BATCH", "4"))

def resolve_device(preference: str = "cpu") -> str:
    """Return the requested accelerator if it is available, otherwise CPU."""
    if preference == "cuda" and torch.cuda.is_available():
        return "cuda:0"
    if preference == "mps" and torch.backends.mps.is_available():
        return "mps"
    return "cpu"

DEVICE = resolve_device(DEVICE_PREFERENCE)
if DEVICE == "cpu" and DEVICE_PREFERENCE != "cpu":
    print(f"Requested device {DEVICE_PREFERENCE!r} is unavailable; falling back to CPU.")
print(f"Using device={DEVICE} (precision={PRECISION})")

DATASET_ID = "imdb"
MODEL_ID = "distilbert-base-uncased-finetuned-sst-2-english"
OUTPUT_DIR = Path("outputs") / "sentiment-distilbert-imdb"
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

NUM_SAMPLES = 16
# The IMDB test split is ordered by label (all negatives first), so a head slice such as
# "test[:16]" contains a single class and makes ROC-AUC undefined. Shuffle the full split first.
dataset = load_dataset(DATASET_ID, split="test")
sample = dataset.shuffle(seed=42).select(range(NUM_SAMPLES))
texts = list(sample["text"])
labels = list(sample["label"])

label_map = {0: "NEGATIVE", 1: "POSITIVE"}
print(f"Loaded {len(texts)} samples for smoke run.")
```
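The IMDB test split on the Hub stores all negative reviews before the positive ones, so any head slice such as `test[:16]` is single-class, and ROC-AUC cannot be computed on one class. Shuffling fixes this in practice; a balanced sample guarantees both classes for any seed. A minimal, library-free sketch over a list of labels; `balanced_indices` is a hypothetical helper whose output could be passed to `Dataset.select`.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def balanced_indices(labels: Sequence[int], per_class: int, seed: int = 42) -> List[int]:
    """Pick `per_class` random row indices for every label value, reproducibly."""
    by_label: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    rng = random.Random(seed)
    chosen: List[int] = []
    for label in sorted(by_label):
        pool = by_label[label]
        if len(pool) < per_class:
            raise ValueError(f"label {label} has only {len(pool)} rows")
        chosen.extend(rng.sample(pool, per_class))
    rng.shuffle(chosen)  # interleave classes so each batch is mixed
    return chosen


# Simulate a label-sorted split: 12,500 negatives followed by 12,500 positives.
sorted_labels = [0] * 12_500 + [1] * 12_500
print(set(sorted_labels[:16]))  # {0}: a head slice sees only one class
picked = balanced_indices(sorted_labels, per_class=8)
print(sum(sorted_labels[i] for i in picked), "positives out of", len(picked))  # 8 positives out of 16
```

In the notebook this would look like `dataset.select(balanced_indices(dataset["label"], per_class=8))`, trading a little extra indexing for a guarantee that both metrics are defined.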


## Inference & Evaluation


```python
torch.manual_seed(42)

load_start = time.perf_counter()
classifier = pipeline(
    "text-classification",
    model=MODEL_ID,
    device=DEVICE,
    top_k=None,  # return scores for every label (replaces the deprecated return_all_scores=True)
    padding=True,
    truncation=True,
    max_length=256,  # matches the "256-tokens" value recorded in the benchmark row
    batch_size=BATCH_SIZE,
)
load_time = time.perf_counter() - load_start

all_scores = classifier(texts)
predictions = []
for text, label, scores in zip(texts, labels, all_scores):
    # Map each label to its probability, e.g. {"NEGATIVE": 0.98, "POSITIVE": 0.02}.
    probs = {item["label"]: item["score"] for item in scores}
    top_label = max(probs, key=probs.get)
    predictions.append(
        {
            "text": text[:120].replace("\n", " "),
            "true_label": label_map[label],
            "pred_label": top_label,
            "pred_score": round(probs[top_label], 4),
            "neg_prob": round(probs["NEGATIVE"], 4),
            "pos_prob": round(probs["POSITIVE"], 4),
        }
    )

df = pd.DataFrame(predictions)
display(df)

# The evaluate ROC-AUC and F1 metrics require scikit-learn.
roc_auc = load_metric("roc_auc")
f1_metric = load_metric("f1")
preds = [0 if row["pred_label"] == "NEGATIVE" else 1 for row in predictions]
roc_score = roc_auc.compute(
    references=labels,
    prediction_scores=[row["pos_prob"] for row in predictions],  # positive-class probability
)["roc_auc"]
f1_score = f1_metric.compute(predictions=preds, references=labels)["f1"]

print(f"ROC-AUC: {roc_score:.3f} | F1: {f1_score:.3f}")

predictions_path = OUTPUT_DIR / "predictions.csv"
df.to_csv(predictions_path, index=False)
print(f"Saved predictions to {predictions_path}")
```
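On a 16-review smoke run it helps to sanity-check both metrics by hand. ROC-AUC equals the probability that a randomly chosen positive review gets a higher `pos_prob` than a randomly chosen negative one (ties count half), which is why it needs at least one example of each class; F1 is the harmonic mean of precision and recall for the positive class. A minimal pure-Python sketch; `roc_auc_pairwise` and `binary_f1` are hypothetical helpers, not part of `evaluate`.

```python
from typing import Sequence


def roc_auc_pairwise(labels: Sequence[int], pos_scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(pos_scores, labels) if y == 1]
    neg = [s for s, y in zip(pos_scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def binary_f1(labels: Sequence[int], preds: Sequence[int]) -> float:
    """F1 for the positive class (label 1); returns 0.0 when there are no true positives."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (separate names so the notebook's `labels` and `preds` are not overwritten).
toy_labels = [0, 0, 1, 1]
toy_pos_prob = [0.1, 0.4, 0.35, 0.8]
toy_preds = [int(p >= 0.5) for p in toy_pos_prob]  # [0, 0, 0, 1]
print(roc_auc_pairwise(toy_labels, toy_pos_prob))  # 0.75
print(binary_f1(toy_labels, toy_preds))  # ≈ 0.667
```

The pairwise loop is O(positives × negatives), which is fine for a smoke sample but not for the full 25,000-review split.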


## Measurement


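```python
def run_inference(recorder):
    # Classify the whole sample once; max_length=256 matches the "256-tokens" value recorded below.
    outputs = classifier(texts, batch_size=BATCH_SIZE, truncation=True, padding=True, max_length=256)
    if outputs:
        # The pipeline returns every result in one call, so this mark lands only after the whole
        # sample is classified: the recorded TTFB is close to the end-to-end latency of the call.
        recorder.mark_first_token()
    recorder.add_items(len(outputs))

metrics = measure_memory_speed(run_inference)

def fmt(value, digits=4):
    """Format a metric for the benchmark CSV; missing or infinite values become an empty string."""
    if value in (None, "", float("inf")):
        return ""
    return f"{value:.{digits}f}"

# Record the current commit so each benchmark row can be traced to the code that produced it.
# Falls back to "" when git is missing or this is not a repository.
try:
    repo_commit = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()
except Exception:  # noqa: BLE001
    repo_commit = ""

append_benchmark_row(
    task="sentiment-imdb",
    model_id=MODEL_ID,
    dataset=DATASET_ID,
    sequence_or_image_res="256-tokens",
    batch=str(BATCH_SIZE),
    peak_ram_mb=fmt(metrics.get("peak_ram_mb"), 2),
    peak_vram_mb=fmt(metrics.get("peak_vram_mb"), 2),
    load_time_s=fmt(load_time, 2),
    ttfb_s=fmt(metrics.get("ttfb_s"), 3),
    tokens_per_s_or_images_per_s=fmt(metrics.get("throughput_per_s"), 3),
    precision=PRECISION,
    notebook_path="notebooks/nlp/sentiment-distilbert-imdb_cpu-first.ipynb",
    repo_commit=repo_commit,
)

with open(OUTPUT_DIR / "metrics.json", "w", encoding="utf-8") as fp:
    json.dump(metrics, fp, indent=2)
metrics
```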
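The measurement cell depends on the repo's `measure_memory_speed` helper. To make the recorder protocol concrete, here is a hypothetical stand-in that exposes the same two methods the cell calls (`mark_first_token`, `add_items`) and derives TTFB and throughput with `time.perf_counter`. It is an assumption about how such a helper could work, not the repo's implementation, and it does not track memory; the result keys mirror the ones the cell reads (`ttfb_s`, `throughput_per_s`).

```python
import time
from typing import Callable, Dict, Optional


class TimingRecorder:
    """Collects the first-output timestamp and an item count during one run."""

    def __init__(self) -> None:
        self.start = time.perf_counter()
        self.first_token_at: Optional[float] = None
        self.items = 0

    def mark_first_token(self) -> None:
        if self.first_token_at is None:  # keep only the earliest mark
            self.first_token_at = time.perf_counter()

    def add_items(self, count: int) -> None:
        self.items += count


def measure_speed(run: Callable[[TimingRecorder], None]) -> Dict[str, Optional[float]]:
    """Run `run(recorder)` once and report TTFB, wall time, and items per second."""
    recorder = TimingRecorder()
    run(recorder)
    total_s = time.perf_counter() - recorder.start
    ttfb_s = None if recorder.first_token_at is None else recorder.first_token_at - recorder.start
    throughput = recorder.items / total_s if total_s > 0 else None
    return {"ttfb_s": ttfb_s, "total_s": total_s, "throughput_per_s": throughput}


def fake_inference(recorder: TimingRecorder) -> None:
    time.sleep(0.05)  # stand-in for classifier(texts)
    recorder.mark_first_token()
    recorder.add_items(16)


print(measure_speed(fake_inference))
```

With a single blocking `classifier(texts)` call, TTFB and total time nearly coincide; a per-batch loop that marks after the first batch would be needed for the two numbers to differ meaningfully.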


## Results Summary
- Observations: TODO
- Metrics captured: see `benchmarks/matrix.csv`

## Next Steps
- TODOs: fill in after benchmarking

## Repro
- Seed: 42 (dataset shuffle in Setup; `torch.manual_seed` in Inference & Evaluation)
- Libraries: captured via `detect_env()`
- Notebook path: `notebooks/nlp/sentiment-distilbert-imdb_cpu-first.ipynb`
- Latest commit: populated automatically when appending benchmarks (if git is available)
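`detect_env()` is not defined in this notebook, so here is a minimal stand-in for capturing library versions using only the standard library. `capture_env` is a hypothetical name; it reads installed versions with `importlib.metadata` and reports missing packages as `None`.

```python
import json
import platform
import sys
from importlib import metadata
from typing import Dict, Iterable, Optional

DEFAULT_PACKAGES = ("torch", "transformers", "datasets", "evaluate", "accelerate", "pandas")


def capture_env(packages: Iterable[str] = DEFAULT_PACKAGES) -> Dict[str, Optional[str]]:
    """Return Python/platform info plus installed versions for `packages`."""
    env: Dict[str, Optional[str]] = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
    }
    for name in packages:
        try:
            env[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            env[name] = None  # not installed in this environment
    return env


print(json.dumps(capture_env(), indent=2))
```

Writing this dict next to `metrics.json` in the output directory would keep the exact versions alongside each run's numbers.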
