# Session 4: Sentiment Showdown

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/buildLittleWorlds/level-2-course-material/blob/main/session-04/notebook.ipynb)

In [None]:
!pip install -q transformers torch
from transformers import pipeline
print("Setup complete!")

## What We Built Tonight

We put three sentiment models head-to-head on the same text and asked: which one is "right"?

The answer: it depends on **what you're using it for**. That's model evaluation.

## Load Three Sentiment Models

The first run takes a minute or so because each cell downloads a model. Run the cells one at a time and wait for each checkmark before moving on.

In [None]:
# Model 1: Trained on movie reviews (POSITIVE / NEGATIVE only)
model_movie = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")
print("Movie review model loaded!")

In [None]:
# Model 2: Pretrained on about 124 million tweets, then fine-tuned for sentiment (positive / neutral / negative)
model_twitter = pipeline("sentiment-analysis", model="cardiffnlp/twitter-roberta-base-sentiment-latest")
print("Twitter model loaded!")

In [None]:
# Model 3: Trained on product reviews (1 star to 5 stars)
model_product = pipeline("sentiment-analysis", model="nlptown/bert-base-multilingual-uncased-sentiment")
print("Product review model loaded!")

## Run the Showdown

All three models read the same text. Do they agree?

In [None]:
text = "The service was slow but the food was amazing."

r1 = model_movie(text)[0]
r2 = model_twitter(text)[0]
r3 = model_product(text)[0]

print(f"Movie Review Model:   {r1['label']} ({r1['score']:.0%})")
print(f"Twitter Model:        {r2['label']} ({r2['score']:.0%})")
print(f"Product Review Model: {r3['label']} ({r3['score']:.0%})")
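
The three models use different label sets (`POSITIVE`/`NEGATIVE`, `positive`/`neutral`/`negative`, and `1 star` to `5 stars`), so "do they agree?" needs a shared scale first. Here is a minimal sketch of one way to do that. The helper names `to_common_label` and `models_agree` are hypothetical, and the star cutoffs are an assumption you may want to change.

```python
# Hypothetical helpers (not part of the original notebook).
def to_common_label(label):
    """Map any of the three models' labels to negative / neutral / positive."""
    text = label.strip().lower()
    if text in ("positive", "negative", "neutral"):
        return text
    if text.endswith("star") or text.endswith("stars"):
        stars = int(text.split()[0])
        # Assumption: 1-2 stars = negative, 3 = neutral, 4-5 = positive.
        if stars <= 2:
            return "negative"
        if stars == 3:
            return "neutral"
        return "positive"
    raise ValueError(f"Unrecognized label: {label!r}")


def models_agree(labels):
    """Return True when every label maps to the same common sentiment."""
    return len({to_common_label(label) for label in labels}) == 1
```

In the notebook you could call `models_agree([r1['label'], r2['label'], r3['label']])` after running the cell above.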

## Experiments

### Experiment 1: Try 5 inputs and keep score

**Try this:** Run each input and fill in the table below.

| Input | Movie Model | Twitter Model | Product Model | Do they agree? |
|-------|------------|---------------|---------------|----------------|
| (your text) | | | | |
| (your text) | | | | |
| (your text) | | | | |
| (your text) | | | | |
| (your text) | | | | |

In [None]:
# Try this: change the text and run again
text = "lol this is SO bad it's actually good"

r1 = model_movie(text)[0]
r2 = model_twitter(text)[0]
r3 = model_product(text)[0]

print(f"Movie Review Model:   {r1['label']} ({r1['score']:.0%})")
print(f"Twitter Model:        {r2['label']} ({r2['score']:.0%})")
print(f"Product Review Model: {r3['label']} ({r3['score']:.0%})")
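
Filling in the table by hand works, but a loop can print the rows for you. The sketch below assumes each model is a callable that behaves like the pipelines above (takes a string, returns a list whose first item has `label` and `score`). `showdown_rows` and `fake_model` are hypothetical names, and the stand-in model exists only so the example runs without a download. The "Do they agree?" column is left for you to judge.

```python
def showdown_rows(texts, models):
    """Build one Markdown table row per input text."""
    rows = []
    for text in texts:
        cells = [text]
        for model in models:
            result = model(text)[0]
            cells.append(f"{result['label']} ({result['score']:.0%})")
        rows.append("| " + " | ".join(cells) + " |")
    return rows


# Stand-in model with a made-up answer, for illustration only.
# In the notebook, pass [model_movie, model_twitter, model_product] instead.
def fake_model(text):
    return [{"label": "POSITIVE", "score": 0.9}]


for row in showdown_rows(["great food"], [fake_model, fake_model, fake_model]):
    print(row)
```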

### Experiment 2: Find maximum disagreement

**Try this:** Can you find a sentence where all three models give completely different answers?

In [None]:
# Try this: find the most disagreement you can
text = "I can't believe how terrible this is. Just kidding, it's great!"

r1 = model_movie(text)[0]
r2 = model_twitter(text)[0]
r3 = model_product(text)[0]

print(f"Movie Review Model:   {r1['label']} ({r1['score']:.0%})")
print(f"Twitter Model:        {r2['label']} ({r2['score']:.0%})")
print(f"Product Review Model: {r3['label']} ({r3['score']:.0%})")

### Experiment 3: Try sarcasm

**Try this:** Sarcasm is hard for sentiment models because the literal words often point the opposite way from the intended meaning. Which model handles it best?

In [None]:
# Try this: does any model "get" sarcasm?
text = "10/10 would not recommend"

r1 = model_movie(text)[0]
r2 = model_twitter(text)[0]
r3 = model_product(text)[0]

print(f"Movie Review Model:   {r1['label']} ({r1['score']:.0%})")
print(f"Twitter Model:        {r2['label']} ({r2['score']:.0%})")
print(f"Product Review Model: {r3['label']} ({r3['score']:.0%})")

### Experiment 4: Neutral text

**Try this:** The movie review model has no "neutral" label. What does it do with neutral text?

In [None]:
# Try this: what happens with genuinely neutral text?
text = "The movie was fine. Nothing special but not bad either."

r1 = model_movie(text)[0]
r2 = model_twitter(text)[0]
r3 = model_product(text)[0]

print(f"Movie Review Model:   {r1['label']} ({r1['score']:.0%})")
print(f"Twitter Model:        {r2['label']} ({r2['score']:.0%})")
print(f"Product Review Model: {r3['label']} ({r3['score']:.0%})")
print()
print("Notice: the movie model HAS to pick POSITIVE or NEGATIVE -- it has no neutral!")
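
A single label hides how close the call was. The text-classification pipeline accepts `top_k=None` to return a score for every label instead of only the winner. Assuming you have such a list, this sketch measures the gap between the top two labels. `top_margin` is a hypothetical helper, and the example scores are made up rather than real model output.

```python
def top_margin(all_scores):
    """Return (best_label, margin), where margin is the gap between the top two scores."""
    ranked = sorted(all_scores, key=lambda item: item["score"], reverse=True)
    best = ranked[0]
    runner_up_score = ranked[1]["score"] if len(ranked) > 1 else 0.0
    return best["label"], best["score"] - runner_up_score


# Made-up scores for illustration only (not real model output).
example = [
    {"label": "NEGATIVE", "score": 0.45},
    {"label": "POSITIVE", "score": 0.55},
]
label, margin = top_margin(example)
print(f"{label} wins by {margin:.0%}")

# In the notebook (assumption: your transformers version supports top_k=None):
# all_scores = model_movie(text, top_k=None)
# If the result is nested one level deeper, pass all_scores[0] instead.
```

A small margin on neutral text suggests the model is being forced to choose rather than being genuinely confident.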

## Challenge

If you had to pick **one** of these three models for a real project, which would you pick and why? Base your answer on the kind of text your users would actually type: tweets, product reviews, or something else.

---

**GitHub skill:** Upload this notebook to your `my-ai-portfolio` repo (same as last time -- reinforce the habit!):
1. Go to your repo on github.com
2. Click **Add file** -> **Upload files**
3. Drag your `.ipynb` file and click **Commit changes**

## Vocabulary

| Term | Meaning |
|------|--------|
| **Model evaluation** | The process of measuring how well a model performs on a specific task |
| **Sentiment analysis** | Detecting positive, negative, or neutral feeling in text |
| **Confidence score** | How sure the model is about its answer (shown as a percentage); a high score does not guarantee the answer is correct |
| **Domain** | The type of text a model was trained on (tweets, reviews, news, etc.) |
| **False positive** | Model says YES when the answer is actually NO |
| **False negative** | Model says NO when the answer is actually YES |
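
To make the last two terms concrete, here is a small sketch that counts false positives and false negatives when "positive" is treated as the YES answer. `count_errors` is a hypothetical helper, and the labels are made up for illustration.

```python
def count_errors(predictions, truths, positive="positive"):
    """Count false positives and false negatives for one target label."""
    false_positives = 0
    false_negatives = 0
    for predicted, actual in zip(predictions, truths):
        if predicted == positive and actual != positive:
            false_positives += 1  # model said YES, answer was NO
        elif predicted != positive and actual == positive:
            false_negatives += 1  # model said NO, answer was YES
    return false_positives, false_negatives


# Made-up labels for illustration only.
predicted = ["positive", "positive", "negative", "negative"]
actual = ["positive", "negative", "positive", "negative"]

fp, fn = count_errors(predicted, actual)
print(f"False positives: {fp}, false negatives: {fn}")
```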