# Question 5: Sentiment Analysis with T5

This notebook implements sentiment analysis using a T5-Small model fine-tuned on SST-2.

## Part (b): Model Selection

We are using the following model from Hugging Face:

**Model:** `lightsout19/t5-sst2`
- Model Page: https://huggingface.co/lightsout19/t5-sst2

### Install and Import Dependencies

In [32]:
!pip install -U transformers torch datasets



In [34]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
from datasets import load_dataset
from tqdm.auto import tqdm

### Load Model and Tokenizer

In [35]:
# Load the T5-SST2 model and tokenizer
model_name = "lightsout19/t5-sst2"
print(f"Loading model: {model_name}")

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

print("Model loaded successfully!")
print(f"Number of labels: {model.config.num_labels}")

Loading model: lightsout19/t5-sst2
Model loaded successfully!
Number of labels: 2


### Define Prediction Function

In [36]:
def predict_sentiment(text):
    """
    Predict sentiment for a given text.
    
    Args:
        text (str): Input text to analyze
        
    Returns:
        dict: Dictionary containing predicted label and confidence scores
    """
    # Tokenize input
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    
    # Get model prediction
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        
    # Get probabilities
    probabilities = torch.softmax(logits, dim=-1)[0]
    predicted_class = torch.argmax(probabilities).item()
    
    # Map class indices to names using the SST-2 convention (0 = negative, 1 = positive)
    label_map = {0: "negative", 1: "positive"}
    
    return {
        "text": text,
        "predicted_label": label_map[predicted_class],
        "confidence": probabilities[predicted_class].item(),
        "negative_score": probabilities[0].item(),
        "positive_score": probabilities[1].item()
    }
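The post-processing in `predict_sentiment` can be checked in isolation. The sketch below applies the same steps to a pair of raw logits: softmax, argmax, then the 0 = negative / 1 = positive mapping. It uses only the standard library. The helper name `logits_to_prediction` and the example logits are hypothetical and for illustration only; they are not outputs of `lightsout19/t5-sst2`.

```python
import math

# Same index-to-name convention as predict_sentiment (SST-2: 0 = negative, 1 = positive)
LABEL_MAP = {0: "negative", 1: "positive"}

def logits_to_prediction(logits):
    """Hypothetical helper: turn two raw logits into the score fields used above."""
    # Subtract the max logit before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the largest probability (same result as argmax over the logits)
    predicted = max(range(len(probs)), key=lambda k: probs[k])
    return {
        "predicted_label": LABEL_MAP[predicted],
        "confidence": probs[predicted],
        "negative_score": probs[0],
        "positive_score": probs[1],
    }

# Illustrative logits only
print(logits_to_prediction([-1.5, 2.0]))
```

The two probabilities always sum to 1, so `confidence` is simply the larger of the two scores. A two-class head also has no way to output "neutral", which is worth keeping in mind when reading the predictions for sentences 3 and 4 below.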

## Part (c): Predict Sentiment for 4 Sentences

Now we'll predict sentiment for the four required sentences.

### Sentence 1: "This movie is awesome"

In [37]:
sentence1 = "This movie is awesome"
result1 = predict_sentiment(sentence1)

print("=" * 60)
print(f"Input: {result1['text']}")
print(f"Predicted Sentiment: {result1['predicted_label'].upper()}")
print(f"Confidence: {result1['confidence']:.4f}")
print(f"\nDetailed Scores:")
print(f"  Negative: {result1['negative_score']:.4f}")
print(f"  Positive: {result1['positive_score']:.4f}")
print("=" * 60)

Input: This movie is awesome
Predicted Sentiment: POSITIVE
Confidence: 0.9992

Detailed Scores:
  Negative: 0.0008
  Positive: 0.9992


### Sentence 2: "I didn't like the movie so much"

In [38]:
sentence2 = "I didn't like the movie so much"
result2 = predict_sentiment(sentence2)

print("=" * 60)
print(f"Input: {result2['text']}")
print(f"Predicted Sentiment: {result2['predicted_label'].upper()}")
print(f"Confidence: {result2['confidence']:.4f}")
print(f"\nDetailed Scores:")
print(f"  Negative: {result2['negative_score']:.4f}")
print(f"  Positive: {result2['positive_score']:.4f}")
print("=" * 60)

Input: I didn't like the movie so much
Predicted Sentiment: NEGATIVE
Confidence: 0.9897

Detailed Scores:
  Negative: 0.9897
  Positive: 0.0103


### Sentence 3: "I'm not sure what I think about this movie."

In [39]:
sentence3 = "I'm not sure what I think about this movie."
result3 = predict_sentiment(sentence3)

print("=" * 60)
print(f"Input: {result3['text']}")
print(f"Predicted Sentiment: {result3['predicted_label'].upper()}")
print(f"Confidence: {result3['confidence']:.4f}")
print(f"\nDetailed Scores:")
print(f"  Negative: {result3['negative_score']:.4f}")
print(f"  Positive: {result3['positive_score']:.4f}")
print("=" * 60)

Input: I'm not sure what I think about this movie.
Predicted Sentiment: NEGATIVE
Confidence: 0.9766

Detailed Scores:
  Negative: 0.9766
  Positive: 0.0234


### Sentence 4: "Did you like the movie?"

In [40]:
sentence4 = "Did you like the movie?"
result4 = predict_sentiment(sentence4)

print("=" * 60)
print(f"Input: {result4['text']}")
print(f"Predicted Sentiment: {result4['predicted_label'].upper()}")
print(f"Confidence: {result4['confidence']:.4f}")
print(f"\nDetailed Scores:")
print(f"  Negative: {result4['negative_score']:.4f}")
print(f"  Positive: {result4['positive_score']:.4f}")
print("=" * 60)

Input: Did you like the movie?
Predicted Sentiment: POSITIVE
Confidence: 0.9577

Detailed Scores:
  Negative: 0.0423
  Positive: 0.9577
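The four cells above repeat the same printing code. One possible refactor, sketched here with a hypothetical helper `format_result`, renders any `predict_sentiment()` result dict in the same layout. The example dict reuses the scores reported above for sentence 1, so the snippet runs without loading the model.

```python
def format_result(result, width=60):
    """Hypothetical helper: render one predict_sentiment() result like the cells above."""
    lines = [
        "=" * width,
        f"Input: {result['text']}",
        f"Predicted Sentiment: {result['predicted_label'].upper()}",
        f"Confidence: {result['confidence']:.4f}",
        "",  # blank line, matching the "\nDetailed Scores:" print above
        "Detailed Scores:",
        f"  Negative: {result['negative_score']:.4f}",
        f"  Positive: {result['positive_score']:.4f}",
        "=" * width,
    ]
    return "\n".join(lines)

# Scores copied from the sentence 1 output above
example = {
    "text": "This movie is awesome",
    "predicted_label": "positive",
    "confidence": 0.9992,
    "negative_score": 0.0008,
    "positive_score": 0.9992,
}
print(format_result(example))
```

With the model loaded, the four cells reduce to `for s in [sentence1, sentence2, sentence3, sentence4]: print(format_result(predict_sentiment(s)))`.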


### Summary of All Predictions

In [41]:
# Create summary table
print("\n" + "=" * 80)
print("SUMMARY OF SENTIMENT PREDICTIONS")
print("=" * 80)
print(f"{'Sentence':<50} {'Prediction':<12} {'Confidence':>10}")
print("-" * 80)

for result in [result1, result2, result3, result4]:
    text = result['text'][:47] + "..." if len(result['text']) > 50 else result['text']
    print(f"{text:<50} {result['predicted_label']:<12} {result['confidence']:>10.4f}")

print("=" * 80)


SUMMARY OF SENTIMENT PREDICTIONS
Sentence                                           Prediction   Confidence
--------------------------------------------------------------------------------
This movie is awesome                              positive         0.9992
I didn't like the movie so much                    negative         0.9897
I'm not sure what I think about this movie.        negative         0.9766
Did you like the movie?                            positive         0.9577


## Part (d): Evaluate Model on SST-2 Dataset

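Now we'll load the full SST-2 dataset and evaluate the model's accuracy on the validation set. The test split's labels are not public (they are stored as `-1`), so the labeled 872-example validation split is used for scoring.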

### Load SST-2 Dataset

In [None]:
# Load SST-2 dataset from Stanford NLP
print("Loading SST-2 dataset...")
dataset = load_dataset("stanfordnlp/sst2")

print("\nDataset structure:")
print(dataset)

# Get validation set
validation_set = dataset['validation']
print(f"\nValidation set size: {len(validation_set)} examples")

Loading SST-2 dataset...

Dataset structure:
DatasetDict({
    train: Dataset({
        features: ['idx', 'sentence', 'label'],
        num_rows: 67349
    })
    validation: Dataset({
        features: ['idx', 'sentence', 'label'],
        num_rows: 872
    })
    test: Dataset({
        features: ['idx', 'sentence', 'label'],
        num_rows: 1821
    })
})

Validation set size: 872 examples


### Define Evaluation Function

In [45]:
def evaluate_on_sst2(model, tokenizer, dataset, batch_size=32):
    """
    Evaluate the model on the SST-2 dataset.
    
    Args:
        model: The classification model
        tokenizer: The tokenizer
        dataset: The dataset split to evaluate on
        batch_size: Batch size for processing
        
    Returns:
        dict: Dictionary containing accuracy and detailed results
    """
    model.eval()
    correct = 0
    total = 0
    
    print(f"Evaluating on {len(dataset)} examples...")
    
    # Process in batches for efficiency
    for i in tqdm(range(0, len(dataset), batch_size)):
        batch = dataset[i:min(i + batch_size, len(dataset))]
        texts = batch['sentence']
        labels = batch['label']
        
        # Tokenize batch
        inputs = tokenizer(texts, return_tensors="pt", truncation=True, 
                          max_length=512, padding=True)
        
        # Get predictions
        with torch.no_grad():
            outputs = model(**inputs)
            logits = outputs.logits
            predictions = torch.argmax(logits, dim=-1)
        
        # Count correct predictions
        for pred, true_label in zip(predictions.tolist(), labels):
            if pred == true_label:
                correct += 1
            total += 1
    
    accuracy = correct / total
    
    return {
        "accuracy": accuracy,
        "correct": correct,
        "total": total
    }
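`evaluate_on_sst2` walks the split in fixed-size slices, and `padding=True` pads each slice only to its longest sentence. The standalone sketch below reproduces the slicing; the name `batch_bounds` is hypothetical. It shows why 872 validation examples at `batch_size=32` take ⌈872 / 32⌉ = 28 batches, the last of which holds only 8 examples.

```python
def batch_bounds(n_examples, batch_size):
    """Hypothetical helper: the (start, end) slice bounds the evaluation loop visits."""
    # Same stepping as `range(0, len(dataset), batch_size)` in evaluate_on_sst2;
    # min() clips the final, possibly shorter, batch to the dataset length.
    return [(i, min(i + batch_size, n_examples)) for i in range(0, n_examples, batch_size)]

bounds = batch_bounds(872, 32)
print(len(bounds))   # 28 batches
print(bounds[-1])    # (864, 872): the last batch holds 8 examples
```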

### Run Evaluation

In [None]:
# Evaluate on validation set
results = evaluate_on_sst2(model, tokenizer, validation_set, batch_size=32)

print("\n" + "=" * 70)
print("EVALUATION RESULTS ON SST-2 VALIDATION SET")
print("=" * 70)
print(f"Total Examples:    {results['total']}")
print(f"Correct:           {results['correct']}")
print(f"Incorrect:         {results['total'] - results['correct']}")
print(f"\n✓ Accuracy:        {results['accuracy']:.4f} ({results['accuracy']*100:.2f}%)")
print("=" * 70)

# Report for written solution
print(f"Accuracy: {results['accuracy']*100:.2f}%")

Evaluating on 872 examples...




EVALUATION RESULTS ON SST-2 VALIDATION SET
Total Examples:    872
Correct:           786
Incorrect:         86

✓ Accuracy:        0.9014 (90.14%)
Accuracy: 90.14%


### Analysis: Check Dataset Balance

In [49]:
# Check class distribution in validation set
labels = validation_set['label']
negative_count = sum(1 for label in labels if label == 0)
positive_count = sum(1 for label in labels if label == 1)

print("\n" + "=" * 70)
print("CLASS DISTRIBUTION IN SST-2 VALIDATION SET")
print("=" * 70)
print(f"Negative examples: {negative_count} ({negative_count/len(labels)*100:.1f}%)")
print(f"Positive examples: {positive_count} ({positive_count/len(labels)*100:.1f}%)")
print(f"\nTotal examples:    {len(labels)}")
print(f"Balance ratio:     {min(negative_count, positive_count)/max(negative_count, positive_count):.3f}")
print("=" * 70)


CLASS DISTRIBUTION IN SST-2 VALIDATION SET
Negative examples: 428 (49.1%)
Positive examples: 444 (50.9%)

Total examples:    872
Balance ratio:     0.964
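The balance check matters because it sets the trivial baseline. On a nearly balanced split, always predicting the majority class scores only about 51%, so the 90.14% accuracy reported above is well clear of chance. The sketch below reproduces the counts with `collections.Counter` and adds that baseline. The helper name `class_balance` is hypothetical, and the label list is rebuilt from the counts printed above rather than loaded from the dataset.

```python
from collections import Counter

def class_balance(labels):
    """Hypothetical helper: summarize 0/1 SST-2 labels and the majority-class baseline."""
    counts = Counter(labels)
    total = len(labels)
    majority_label, majority_count = counts.most_common(1)[0]
    return {
        "negative": counts[0],
        "positive": counts[1],
        "balance_ratio": min(counts[0], counts[1]) / max(counts[0], counts[1]),
        # Accuracy obtained by always predicting the most frequent class
        "majority_baseline": majority_count / total,
        "majority_label": majority_label,
    }

# Rebuilt from the reported counts: 428 negative, 444 positive
labels = [0] * 428 + [1] * 444
stats = class_balance(labels)
print(f"Balance ratio:     {stats['balance_ratio']:.3f}")      # 0.964
print(f"Majority baseline: {stats['majority_baseline']:.4f}")  # 0.5092
```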
