### Description:
    This script loads a pre-trained sentiment analysis model (DistilBERT fine-tuned on SST-2)
    and evaluates it using PyTorch and the Hugging Face Transformers and Datasets libraries.
    It runs the model on the SST-2 validation set and prints the overall accuracy along with
    a few example predictions. No active learning techniques are applied.

### How It Works:
    1. Imports necessary libraries for model loading, dataset handling, and evaluation.
    2. Loads the pre-trained DistilBERT model and its tokenizer using Hugging Face's Auto classes.
    3. Sets up the computation device (GPU if available, otherwise CPU) and puts the model in evaluation mode.
    4. Loads the SST-2 validation dataset from the GLUE benchmark using the Hugging Face Datasets library.
    5. Defines a collate function to tokenize the text and batch the data appropriately.
    6. Creates a DataLoader to handle batch processing of the dataset.
    7. Iterates over the DataLoader, performs a forward pass through the model for each batch,
       collects predictions, and computes the accuracy.
    8. Prints the overall accuracy and a few example predictions with their ground truth labels.

In [14]:
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from datasets import load_dataset
from torch.utils.data import DataLoader
from tqdm.auto import tqdm
import numpy as np

In [5]:
# Pre-trained model: DistilBERT fine-tuned on SST-2
model_name = "distilbert-base-uncased-finetuned-sst-2-english"

In [6]:
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


In [7]:
# Use the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
 

In [8]:
# Switch to evaluation mode, which disables dropout
model.eval()


In [9]:
# Load the SST-2 dataset (using the validation split from GLUE)
dataset = load_dataset("glue", "sst2", split="validation")

Generating train split: 100%|██████████| 67349/67349 [00:00<00:00, 311430.24 examples/s]
Generating validation split: 100%|██████████| 872/872 [00:00<00:00, 62203.36 examples/s]
Generating test split: 100%|██████████| 1821/1821 [00:00<00:00, 169019.62 examples/s]


In [10]:
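def collate_fn(batch):
    """Tokenize a list of SST-2 examples and attach their labels as a tensor."""
    texts = [item["sentence"] for item in batch]
    labels = [item["label"] for item in batch]
    # padding=True pads each batch to its longest sentence; truncation=True caps
    # sequences at the model's maximum input length.
    tokenized_inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    tokenized_inputs["labels"] = torch.tensor(labels)
    return tokenized_inputs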
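
To make the effect of `padding=True` concrete, here is a minimal plain-Python sketch of padding a batch to its longest sequence and building the matching attention mask. The token ids and the `pad_batch` helper are hypothetical and only illustrate the idea; the real tokenizer handles this internally and returns tensors. The pad id of 0 is chosen to match the `padding_idx=0` shown in the model printout.

```python
def pad_batch(sequences, pad_id=0):
    """Pad token-id lists to the longest one and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        # Real tokens keep mask value 1; padding positions get 0.
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask

# Hypothetical token ids for two sentences of different lengths.
batch = [[101, 2009, 1005, 102], [101, 102]]
input_ids, attention_mask = pad_batch(batch)
print(input_ids)
print(attention_mask)
```

The attention mask is what lets the model ignore the padded positions during the forward pass.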


In [11]:
# shuffle defaults to False, so predictions stay aligned with dataset order
dataloader = DataLoader(dataset, batch_size=32, collate_fn=collate_fn)
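
As a quick sanity check on the batching, the number of batches follows from the dataset size and the batch size. The sketch below assumes the 872 validation examples reported in the dataset loading output; the variable names are illustrative.

```python
import math

num_examples = 872  # size of the SST-2 validation split (from the loading output)
batch_size = 32

# The last batch is kept even when it is incomplete, so round up.
num_batches = math.ceil(num_examples / batch_size)
last_batch_size = num_examples - (num_batches - 1) * batch_size

print(num_batches)      # 28
print(last_batch_size)  # 8
```

The final batch is smaller because `DataLoader` keeps incomplete batches by default (`drop_last=False`).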

In [12]:
all_preds = []
all_labels = []

# no_grad() skips gradient tracking, which is not needed for evaluation
with torch.no_grad():
    for batch in tqdm(dataloader, desc="Evaluating"):
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        labels = batch["labels"].to(device)
        outputs = model(input_ids, attention_mask=attention_mask)
        logits = outputs.logits
        # The predicted class is the index of the largest logit
        preds = torch.argmax(logits, dim=-1)
        all_preds.extend(preds.cpu().numpy())
        all_labels.extend(labels.cpu().numpy())

# Accuracy is the fraction of predictions that match the labels
accuracy = np.mean(np.array(all_preds) == np.array(all_labels))
print(f"Accuracy of the pre-trained model on SST-2 validation set: {accuracy * 100:.2f}%")

Evaluating: 100%|██████████| 28/28 [00:21<00:00,  1.29it/s]

Accuracy of the pre-trained model on SST-2 validation set: 91.06%
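
The prediction and accuracy steps can be illustrated on toy values. This is a plain-Python sketch with made-up logits, not output from the model; it mirrors the argmax-then-compare logic of the evaluation loop. The final line assumes that 794 of the 872 validation examples were classified correctly, which is the count that rounds to the reported 91.06%.

```python
# Made-up logits for four examples: [negative_score, positive_score].
toy_logits = [[2.1, -1.3], [-0.4, 0.9], [0.2, 0.1], [-1.5, 3.0]]
toy_labels = [0, 1, 1, 1]

# argmax over the class dimension picks the higher-scoring class.
toy_preds = [max(range(len(row)), key=row.__getitem__) for row in toy_logits]

# Accuracy is the fraction of predictions that match the labels.
toy_accuracy = sum(p == y for p, y in zip(toy_preds, toy_labels)) / len(toy_labels)

print(toy_preds)                     # [0, 1, 0, 1]
print(f"{toy_accuracy * 100:.2f}%")  # 75.00%

# 794 correct out of 872 rounds to the accuracy printed by the notebook.
print(f"{794 / 872 * 100:.2f}%")     # 91.06%
```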





In [13]:
for i in range(5):
    sentence = dataset[i]["sentence"]
    true_label = "positive" if dataset[i]["label"] == 1 else "negative"
    pred_label = "positive" if all_preds[i] == 1 else "negative"
    print(f"Sentence: {sentence}")
    print(f"Predicted: {pred_label} | Ground Truth: {true_label}\n")

Sentence: it 's a charming and often affecting journey . 
Predicted: positive | Ground Truth: positive

Sentence: unflinchingly bleak and desperate 
Predicted: negative | Ground Truth: negative

Sentence: allows us to hope that nolan is poised to embark a major career as a commercial yet inventive filmmaker . 
Predicted: positive | Ground Truth: positive

Sentence: the acting , costumes , music , cinematography and sound are all astounding given the production 's austere locales . 
Predicted: positive | Ground Truth: positive

Sentence: it 's slow -- very , very slow . 
Predicted: negative | Ground Truth: negative
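
Beyond the first five examples, it can be more informative to look at the ones the model got wrong. The following is a minimal sketch that uses short hypothetical prediction and label lists in place of `all_preds` and `all_labels`; the `misclassified_indices` helper is illustrative and not part of the original notebook. It relies on the predictions being in dataset order, which holds here because the DataLoader does not shuffle.

```python
def misclassified_indices(preds, labels):
    """Return the positions where the prediction differs from the label."""
    return [i for i, (p, y) in enumerate(zip(preds, labels)) if p != y]

# Hypothetical stand-ins for all_preds and all_labels.
example_preds = [1, 0, 1, 1, 0, 0]
example_labels = [1, 0, 0, 1, 0, 1]

wrong = misclassified_indices(example_preds, example_labels)
print(wrong)  # [2, 5]
```

With the real lists, looking up `dataset[i]["sentence"]` for each index returned by `misclassified_indices(all_preds, all_labels)` would show the misclassified sentences.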

