# Session 4: Introduction to Supervised Learning
## How Models Learn From Labeled Examples

**Session Length:** 2 hours

**Today's Mission:** Understand what "training" really means by exploring supervised learning -- the process of teaching a model by showing it labeled examples.

### Session Outline
| Time | Activity |
|------|----------|
| 0:00-0:05 | Review: What did you break? What did you fix? |
| 0:05-0:30 | Part 1: What Does "Trained on Labeled Data" Mean? |
| 0:30-1:05 | Part 2: Three Types of Supervised Tasks |
| 1:05-1:40 | Part 3: How Would You Label This Data? |
| 1:40-2:00 | On Your Own: Explore track notebooks + model cards |

### Key Vocabulary

| Term | Definition |
|------|-----------|
| **Supervised Learning** | Teaching a model by showing it labeled examples |
| **Training Data** | The examples a model learned from |
| **Labels** | The correct answers attached to training examples |
| **Pre-trained Model** | A model someone else already trained, ready to use |
| **Model Card** | Documentation describing how a model was built and what it's good at |

---

## Review (0:00-0:05)

Last session, we broke models on purpose and learned how to fix inputs through data cleaning. We saw that models can be confused by noise, sarcasm, domain mismatch, and adversarial inputs.

But here is the deeper question: **why do models get confused at all?**

The answer comes down to **training**. Every model you have used was trained on specific data. When the input matches what it trained on, it works well. When the input is different from the training data, it struggles.

Today we unpack what "training" actually means.

---

## Part 1: What Does "Trained on Labeled Data" Mean? (0:05-0:30)

Every model you have used in this course was **trained** by someone before you used it. Training means: someone collected thousands (or millions) of examples, attached correct answers to each one, and fed them into a learning algorithm.

The model studied those examples and learned to find patterns. Then, when you give it a new input it has never seen before, it uses those patterns to make a prediction.

This process -- learning from examples with correct answers -- is called **supervised learning**. The "supervision" is the labels: the correct answers that guide the model's learning.

### Real Models, Real Data

Here is what some of the models you have used actually trained on:

| Model | Training Data | How Many Examples |
|-------|--------------|-------------------|
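| Sentiment analysis | Short excerpts from movie reviews labeled POSITIVE or NEGATIVE | ~67,000 excerpts |
| Emotion detection | Short texts (tweets, Reddit comments, TV dialogue, and more) labeled with 7 emotions | ~20,000 texts |
| Image classification | Photos labeled by category (pre-trained on 21,000+ categories, fine-tuned on 1,000) | ~14 million images (pre-training) + ~1 million (fine-tuning) |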
| Text generation (GPT-2) | Web pages, books, articles | ~8 million web pages |

Every one of these models exists because someone collected data, labeled it, and trained a model on it. No data, no model.

In [None]:
# Install and import
!pip install transformers==4.47.1 -q

from transformers import pipeline
print("Ready!")

**After running the install cell above**, go to **Runtime > Restart session** (labeled **Restart runtime** in older versions of Colab), then continue from the cell below. Restarting makes Python load the newly installed version of `transformers` instead of the one already in memory. This is standard practice in Colab.

### Seeing the Connection: Model to Data

Let's load an emotion classifier and trace its predictions back to its training data.

> **INSTRUCTOR NOTE:** "Open huggingface.co/j-hartmann/emotion-english-distilroberta-base in a browser tab. Show the model card. Scroll to the training data section and click through to the dataset links. Show students: this model exists because people labeled thousands of texts (tweets, Reddit comments, TV dialogue, and more) with emotions; the card reports a balanced training set of nearly 20,000 examples. The model card tells you how a model was built."

In [None]:
from transformers import pipeline

# Load the emotion classifier
emotion_classifier = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
    top_k=None  # Get all emotion scores
)

# Run it on a clear example
text = "I can't believe I won the contest! This is the best day ever!"
emotions = emotion_classifier(text)[0]

print(f"Text: {text}")
print()
print("Emotions detected:")
for e in sorted(emotions, key=lambda x: x['score'], reverse=True):
    bar = "*" * int(e['score'] * 20)
    print(f"  {e['label']:12} {bar} ({e['score']:.1%})")

The model predicted "joy" with high confidence. But **where did it learn that "I can't believe I won!" is joy?**

From nearly 20,000 labeled texts drawn from six datasets (tweets, Reddit comments, TV dialogue, and more). Each text came with an emotion label: "this one is joy," "this one is anger," "this one is sadness." The model studied those examples and learned patterns such as:
- Exclamation marks + winning/achieving words --> probably joy
- Insults + frustration words --> probably anger
- Loss + missing words --> probably sadness

Nobody wrote these rules for it. It discovered the **patterns** in the labeled data.
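Here is a minimal sketch of "patterns, not rules" in plain Python. The six texts and their labels are made up, and counting words is a huge simplification of what a neural network like DistilRoBERTa actually learns. Still, nobody tells the program which word signals which emotion; it works that out from the labels. `strongest_label` is a hypothetical helper written for this example.

```python
from collections import Counter

# Made-up labeled examples: each text has a correct answer (its label) attached
labeled_examples = [
    ("I won the contest! Best day ever!", "joy"),
    ("We won the game! So happy!", "joy"),
    ("This is so unfair. I am furious!", "anger"),
    ("Stop lying to me. I am furious!", "anger"),
    ("I miss my old friends so much.", "sadness"),
    ("We lost and I miss home.", "sadness"),
]

def words(text):
    """Lowercase the text and strip punctuation from each word."""
    return [w.strip("!.,?") for w in text.lower().split()]

# "Training": count how often each word appears under each label
counts = {}
for text, label in labeled_examples:
    counts.setdefault(label, Counter()).update(words(text))

def strongest_label(word):
    """Return the label a word appeared with most often (None if never seen)."""
    best_label, best_count = None, 0
    for label, label_counts in counts.items():
        if label_counts[word] > best_count:
            best_label, best_count = label, label_counts[word]
    return best_label

for word in ["won", "furious", "miss", "pizza"]:
    print(f"{word:8} -> {strongest_label(word)}")
```

Try `strongest_label("i")`: "i" appears under all three labels, yet the function still returns a single emotion. Patterns learned from a small or lopsided dataset can look confident while meaning very little.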

### The Supervised Learning Recipe

Every supervised learning project follows the same recipe:

```
1. COLLECT data          (gather examples)
2. LABEL the data        (attach correct answers)
3. TRAIN the model       (let it study the examples)
4. TEST the model        (check its predictions on new data)
5. DEPLOY the model      (let people use it)
```

You have been using models at Step 5. Today we look at Steps 1-4.
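Here is the whole recipe in miniature, as a plain-Python sketch. The messages, the SPAM / NOT SPAM labels, and the word-overlap "model" are all made up for illustration; real projects use thousands of examples and a real learning algorithm. The important part is the shape: training data and test data are kept separate, so Step 4 measures how the model does on examples it never studied.

```python
from collections import Counter

# 1. COLLECT + 2. LABEL: made-up messages with correct answers attached
train_data = [
    ("You won a free prize, click now", "SPAM"),
    ("Claim your free money today", "SPAM"),
    ("Click here to win cash now", "SPAM"),
    ("Are we still meeting for lunch today", "NOT SPAM"),
    ("Can you send me the homework notes", "NOT SPAM"),
    ("See you at practice after school", "NOT SPAM"),
]
# Held-out examples the model never sees during training
test_data = [
    ("Win free cash now", "SPAM"),
    ("Can we meet after school", "NOT SPAM"),
]

def words(text):
    return text.lower().split()

# 3. TRAIN: count how often each word appears under each label
counts = {label: Counter() for _, label in train_data}
for text, label in train_data:
    counts[label].update(words(text))

def predict(text):
    """Pick the label whose training words overlap most with this text."""
    scores = {label: sum(c[w] for w in words(text)) for label, c in counts.items()}
    return max(scores, key=scores.get)

# 4. TEST: compare predictions with the true labels on unseen data
correct = sum(predict(text) == label for text, label in test_data)
accuracy = correct / len(test_data)
print(f"Test accuracy: {accuracy:.0%}")

# 5. DEPLOY: anyone can now call predict() on new messages
print(predict("free prize click now"))
```

With only two test messages, a perfect score tells you very little. That is why real test sets are large and drawn from the same kind of data the model will meet after deployment.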

### What Are Labels?

Labels are the "correct answers" attached to training examples. The type of label determines the type of task.

| Task | Input | Label | Example |
|------|-------|-------|---------|
| Sentiment analysis | "Great movie!" | POSITIVE | Positive review |
| Emotion detection | "I'm furious" | anger | Angry tweet |
| Image classification | Photo of a dog | "golden retriever" | ImageNet label |
| Spam detection | "You won $1M!" | SPAM | Email label |

Without labels, a supervised model has nothing to learn from. The labels are the "supervision" in supervised learning.

---

## Part 2: Three Types of Supervised Tasks (0:30-1:05)

Now let's see supervised learning in action across three different types of tasks. Each uses the same principle -- learning from labeled examples -- but with different kinds of labels.

### Task 1: Emotion Classification (Multi-Class Text Classification)

This model learned from nearly 20,000 short texts (tweets, Reddit comments, TV dialogue, and more), each labeled with one of 7 emotions.

> **INSTRUCTOR NOTE:** "For each model, ask students to suggest inputs. Type their suggestions into the student_text variable."

> **READ THE MODEL CARD**
>
> The emotion model we just loaded is `j-hartmann/emotion-english-distilroberta-base`. Before testing it further, check its documentation:
>
> Go to [huggingface.co/j-hartmann/emotion-english-distilroberta-base](https://huggingface.co/j-hartmann/emotion-english-distilroberta-base)
>
> - What 7 emotions can it detect?
> - What dataset was it trained on? How many examples?
> - Is it designed for tweets, formal writing, or both?
> - What limitations does the card mention?
>
> Understanding a model's training data helps you predict where it will succeed and where it will struggle.

In [None]:
# Emotion classification: trained on ~20,000 labeled texts (tweets, Reddit comments, TV dialogue, and more)
student_text = "REPLACE WITH STUDENT SUGGESTION"

emotions = emotion_classifier(student_text)[0]

print(f"Text: {student_text}")
print()
print("Emotion scores:")
for e in sorted(emotions, key=lambda x: x['score'], reverse=True):
    bar = "*" * int(e['score'] * 30)
    print(f"  {e['label']:12} {bar} ({e['score']:.1%})")

> **ASK AI ABOUT THIS**
>
> Copy the emotion classifier code into Claude or ChatGPT and ask:
> *"This model was trained on about 20,000 short texts such as tweets, Reddit comments, and TV dialogue. What does that mean for how it handles formal writing vs. social media text?"*
>
> This is how real researchers think about model limitations -- by connecting performance to training data.

### Quick Runtime Restart

Before loading the image model, restart your runtime to free up memory.

**Go to: Runtime > Restart session** (labeled **Restart runtime** in older versions of Colab)

Then re-run the install cell and continue from the cell below.

This clears the text models from memory so you have room for image models. Professional data scientists do this all the time.

---

### Task 2: Image Classification

This model was pre-trained on ~14 million images from the full ImageNet dataset (ImageNet-21k), then fine-tuned on more than 1 million images, each labeled with one of 1,000 categories (ImageNet-1k).

> **INSTRUCTOR NOTE:** "Show the model card for google/vit-base-patch16-224 on Hugging Face. Point out: pre-trained on 14 million images, then fine-tuned on more than 1 million images in 1,000 categories. Someone had to label all of those."

> **FIND A MODEL: Image Classification**
>
> The default image classifier (`google/vit-base-patch16-224`) was pre-trained on about 14 million ImageNet photos, then fine-tuned to recognize 1,000 categories. But there are many other image classifiers on the Hub, trained on different datasets for different purposes.
>
> 1. Go to [huggingface.co/models?pipeline_tag=image-classification&sort=downloads](https://huggingface.co/models?pipeline_tag=image-classification&sort=downloads)
> 2. Browse the top results. You'll see models for general objects, food, animals, medical images, and more.
> 3. Pick one model and **read its model card**: What images was it trained on? How many categories? Any known limitations?
> 4. Copy the model ID -- you'll use it in the swap slot below.

In [None]:
from transformers import pipeline
from PIL import Image
import requests
from io import BytesIO

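# Helper function to load images from URLs
def load_image_from_url(url):
    # Wikimedia asks scripts to send a descriptive User-Agent; this one is a placeholder
    headers = {"User-Agent": "ai-course-notebook/1.0 (educational use)"}
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()  # stop with a clear error if the download failed
    return Image.open(BytesIO(response.content)).convert("RGB")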

# ── SWAP SLOT: Image Classification Model ──
# Default: google/vit-base-patch16-224 (trained on ImageNet, 1000 categories)
# To use a model you found on the Hub, paste its ID below:

my_image_model = "PASTE YOUR MODEL ID HERE"
# Example: "google/vit-base-patch16-224"  (ImageNet, general objects)
# Example: "nateraw/food"  (food classification)
# Example: "dima806/bird_species_image_classification"  (bird species)

# If no model ID was pasted, fall back to the standard ImageNet model
if "PASTE" in my_image_model:
    my_image_model = "google/vit-base-patch16-224"

image_classifier = pipeline("image-classification", model=my_image_model)

print(f"Image classifier loaded: {my_image_model}")

In [None]:
# Test with a sample image
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/2/26/YellowLabradorLooking_new.jpg/1200px-YellowLabradorLooking_new.jpg"

image = load_image_from_url(image_url)
display(image.resize((300, 300)))

results = image_classifier(image)
print("\nTop predictions:")
for r in results[:5]:
    bar = "*" * int(r['score'] * 30)
    print(f"  {r['label']:30} {bar} ({r['score']:.1%})")

The model predicted "Labrador retriever" with high confidence. Where did it learn this?

From the ImageNet dataset. Researchers collected over 14 million images and organized them into more than 20,000 categories; the standard 1,000-category subset used for fine-tuning has more than 1 million training images. Human workers checked and labeled the images. The model studied these labeled examples and learned to recognize visual patterns: floppy ears + golden fur + a specific body shape = Labrador retriever.
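Where do the percentages come from? For single-label classifiers like this one, the pipeline turns the model's raw output scores (often called logits) into probabilities with a softmax, so the scores across all categories add up to 100%. The sketch below shows the idea with four made-up raw scores standing in for the real 1,000; the numbers are illustrative, not this model's actual outputs.

```python
import math

def softmax(raw_scores):
    """Turn raw model scores (logits) into probabilities that sum to 1."""
    biggest = max(raw_scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - biggest) for s in raw_scores]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up raw scores for 4 categories, for illustration only
labels = ["Labrador retriever", "golden retriever", "tennis ball", "pizza"]
raw_scores = [6.0, 4.5, 1.0, -2.0]

probs = softmax(raw_scores)
for label, p in sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True):
    print(f"  {label:20} ({p:.1%})")
print(f"Total: {sum(probs):.0%}")
```

Because the probabilities must share 100%, a high score for one category always means lower scores for the others. A "confident" prediction is one where a single category takes most of that share.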

### Try More Images

In [None]:
# Test with different images
test_images = {
    "cat": "https://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Cat03.jpg/1200px-Cat03.jpg",
    "pizza": "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a3/Eq_it-na_pizza-margherita_sep2005_sml.jpg/1200px-Eq_it-na_pizza-margherita_sep2005_sml.jpg",
}

for name, url in test_images.items():
    print(f"\n=== {name.upper()} ===")
    try:
        image = load_image_from_url(url)
        results = image_classifier(image)
        for r in results[:3]:
            print(f"  {r['label']:30} ({r['score']:.1%})")
    except Exception as e:
        print(f"  Error loading image: {e}")

### Task 2B: Object Detection (Book Enhancement)

Image classification answers: **"What is in this image?"**

Object detection answers: **"What is in this image, and where is each object?"**

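This is a major upgrade: instead of one label for the whole image, we get a label **and a bounding box** for each object. It also means the training labels were richer: in COCO, annotators marked where every object is, not just what the image shows.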
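The difference shows up in the shape of the results. The image-classification pipeline returns a list of `{"label", "score"}` dictionaries for the whole image, while the object-detection pipeline returns one dictionary per object with an extra `"box"` entry (the same `xmin`/`ymin`/`xmax`/`ymax` keys the drawing code in this notebook uses). The values below are made up for illustration, and `box_area` and the 0.9 cutoff are hypothetical choices for this sketch.

```python
# Made-up outputs for illustration; the real pipelines return lists shaped like these
classification_output = [
    {"label": "tabby, tabby cat", "score": 0.91},
    {"label": "remote control", "score": 0.04},
]

detection_output = [
    {"label": "cat", "score": 0.99, "box": {"xmin": 10, "ymin": 50, "xmax": 310, "ymax": 470}},
    {"label": "cat", "score": 0.98, "box": {"xmin": 340, "ymin": 20, "xmax": 640, "ymax": 370}},
    {"label": "remote", "score": 0.95, "box": {"xmin": 40, "ymin": 70, "xmax": 175, "ymax": 120}},
    {"label": "couch", "score": 0.42, "box": {"xmin": 0, "ymin": 0, "xmax": 640, "ymax": 480}},
]

def box_area(box):
    """Width times height of a bounding box, in pixels."""
    return (box["xmax"] - box["xmin"]) * (box["ymax"] - box["ymin"])

# Classification: one ranked list of labels for the whole image
print("Classification:", classification_output[0]["label"])

# Detection: one entry per object; keep only the confident ones
confident = [d for d in detection_output if d["score"] >= 0.9]
for d in confident:
    print(f"{d['label']:8} score={d['score']:.2f} area={box_area(d['box'])} px")
```

Notice that detection can report the same label twice (two cats), which a single-label classifier has no way to express.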

> **FIND A MODEL: Object Detection**
>
> Object detection models identify AND locate objects in images. The default (`facebook/detr-resnet-50`) was trained on COCO -- 80 everyday object categories.
>
> 1. Go to [huggingface.co/models?pipeline_tag=object-detection&sort=downloads](https://huggingface.co/models?pipeline_tag=object-detection&sort=downloads)
> 2. Browse the models. Some specialize in faces, vehicles, or specific domains.
> 3. **Read the model card** for one: What objects can it detect? How many categories? What dataset was it trained on?

In [None]:
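from transformers import pipeline
from PIL import Image, ImageDraw
import requests
from io import BytesIO

# Object detection model from the book (DETR, trained on COCO's 80 categories)
object_detector = pipeline("object-detection", model="facebook/detr-resnet-50")

def load_image_for_detection(url):
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # stop with a clear error if the download failed
    return Image.open(BytesIO(response.content)).convert("RGB")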

print("Object detector loaded!")

In [None]:
# Sample image with multiple objects
object_image_url = "https://images.cocodataset.org/val2017/000000039769.jpg"
object_image = load_image_for_detection(object_image_url)

display(object_image.resize((420, 320)))

detections = object_detector(object_image)
print("Detected objects:")
for obj in detections[:10]:
    print(f"- {obj['label']} ({obj['score']:.1%})")

In [None]:
# Draw bounding boxes on the image
annotated = object_image.copy()
draw = ImageDraw.Draw(annotated)

for obj in detections[:10]:
    box = obj['box']
    coords = [box['xmin'], box['ymin'], box['xmax'], box['ymax']]
    label_text = f"{obj['label']} {obj['score']:.2f}"
    draw.rectangle(coords, outline="red", width=3)
    draw.text((coords[0], max(0, coords[1] - 12)), label_text, fill="red")

display(annotated.resize((420, 320)))

> **INSTRUCTOR NOTE:** Ask students to suggest a new image URL with multiple objects (street, classroom, sports scene). Run detection and inspect mistakes.

In [None]:
# Student object-detection test
student_object_image_url = "REPLACE WITH STUDENT SUGGESTION"

if "REPLACE" not in student_object_image_url:
    student_image = load_image_for_detection(student_object_image_url)
    student_detections = object_detector(student_image)
    display(student_image.resize((420, 320)))
    print("Top detections:")
    for obj in student_detections[:8]:
        print(f"- {obj['label']} ({obj['score']:.1%})")

### Quick Runtime Restart

Before loading the text generation model, restart your runtime to free up memory.

**Go to: Runtime > Restart session** (labeled **Restart runtime** in older versions of Colab)

Then re-run the install cell and continue from the cell below.

---

### Task 3: Text Generation

Text generation models learned to **predict the next word** from billions of examples. The "label" is the word that actually came next in the training text.

> **INSTRUCTOR NOTE:** "Show the model card for distilgpt2 on Hugging Face. Point out: this model was trained on a large corpus of text to predict the next word. Every word it generates is based on patterns it learned from real text."

In [None]:
from transformers import pipeline

# Load text generation -- trained by predicting the next word
generator = pipeline("text-generation", model="distilgpt2")

# Generate a continuation
prompt = "The most important thing about learning AI is"

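results = generator(
    prompt,
    max_new_tokens=50,        # how many new tokens to add after the prompt
    num_return_sequences=3,   # generate three different continuations
    do_sample=True,           # sample instead of always picking the most likely word
    temperature=0.8           # below 1.0 = a bit more focused and predictable
)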

print(f"Prompt: {prompt}")
print()
for i, r in enumerate(results, 1):
    print(f"Continuation {i}:")
    print(f"  {r['generated_text']}")
    print()

This model's "labels" were different from the others. Instead of categories, the label was always: **what word actually came next?**

The model read billions of words of text, and at each position it tried to guess the next word. When it guessed wrong, it adjusted its internal numbers slightly. After seeing enough examples, it got good at predicting what comes next.
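Turning raw text into training examples takes no human labeling at all: each position in a sentence becomes one (context, next word) pair. This sketch splits on spaces for simplicity; real models like distilgpt2 split text into tokens, which are often pieces of words rather than whole words. `next_word_examples` is a hypothetical helper written for this example.

```python
def next_word_examples(sentence):
    """Turn one sentence into (context, next word) training pairs.

    The "label" for each pair is simply the word that actually came next.
    """
    words = sentence.split()
    return [(" ".join(words[:i]), words[i]) for i in range(1, len(words))]

for context, label in next_word_examples("the cat sat on the mat"):
    print(f"{context!r:24} -> label: {label!r}")
```

A six-word sentence already yields five labeled examples, which is how a pile of ordinary web pages becomes billions of training examples.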

That is why it can generate text that sounds plausible -- it learned the statistical patterns of how words follow each other.
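To see how "statistical patterns of how words follow each other" can produce text, here is a deliberately tiny stand-in: a bigram model that only counts which word followed which in a made-up three-sentence corpus, then samples from those counts. This is not how distilgpt2 works internally (it is a neural network that looks at many previous tokens at once), but the training signal is the same: the word that actually came next.

```python
import random
from collections import Counter, defaultdict

# A tiny made-up training corpus ("." is treated as a word)
corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat chased the dog ."
)

# "Training": for every word, count which words followed it
follow_counts = defaultdict(Counter)
tokens = corpus.split()
for current, following in zip(tokens, tokens[1:]):
    follow_counts[current][following] += 1

def generate(start, length=6, seed=0):
    """Sample a continuation one word at a time from the learned counts."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(length):
        options = follow_counts[words[-1]]
        if not options:  # no known next word: stop early
            break
        choices, weights = zip(*options.items())
        words.append(rng.choices(choices, weights=weights)[0])
    return " ".join(words)

print(follow_counts["the"].most_common(3))
print(generate("the"))
```

Change `seed` to get different continuations, much like running the generator cell again with `do_sample=True`.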

### Comparing All Three Tasks

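All three models learn from labeled examples. The difference is the type of label. (For text generation, the labels come for free from the text itself, so this style of training is often called *self-supervised* learning.)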

| Task | Input | Label Type | Training Size |
|------|-------|-----------|---------------|
| Emotion detection | Text | One of 7 emotion categories | ~20,000 texts |
| Image classification | Image | One of 1,000 object categories | ~14 million images (pre-training) + ~1 million (fine-tuning) |
| Text generation | Text | The next word in the sequence | Billions of words |

The principle is the same: **show the model labeled examples, and it learns patterns.**

> **INSTRUCTOR NOTE:** "After all three demos, ask students: Which model was most accurate? Most surprising? Which would be hardest to train -- meaning, which would need the most labeled data?"

---

## Part 3: How Would You Label This Data? (1:05-1:40)

Now let's flip the perspective. Instead of using models, let's think about **creating training data.** This exercise will show you why labeling is harder than it sounds.

### The Labeling Challenge

Below are real examples that need labels. For each one, decide what the correct label should be. Write your answer, then discuss with the class.

> **INSTRUCTOR NOTE:** "Read each example aloud. Ask students to call out their labels. Write them on screen. The point is that students will DISAGREE -- and that disagreement is a real problem in ML."

### Round 1: Emotion Labeling

For each tweet, choose one emotion: **joy, anger, sadness, fear, surprise, disgust, neutral**

In [None]:
# The labeling challenge -- what emotion is each tweet?
tweets_to_label = [
    "I can't even right now",
    "Just found out school is cancelled tomorrow!!!",
    "My dog ate my homework. No really, he actually did.",
    "Why do people even bother trying anymore",
    "lol this is fine everything is fine",
]

print("EMOTION LABELING CHALLENGE")
print("=" * 60)
print("Choose one: joy, anger, sadness, fear, surprise, disgust, neutral")
print()
for i, tweet in enumerate(tweets_to_label, 1):
    print(f'  {i}. "{tweet}"')
    print(f"     Your label: _______________")
    print()

### Round 2: Sentiment Labeling

For each review, choose: **positive or negative**

In [None]:
reviews_to_label = [
    "It was fine, I guess.",
    "Not the worst thing I've ever seen.",
    "I expected more, but it was okay.",
    "My mom loved it, but I thought it was boring.",
    "3 out of 5 stars.",
]

print("SENTIMENT LABELING CHALLENGE")
print("=" * 60)
print("Choose one: positive or negative")
print()
for i, review in enumerate(reviews_to_label, 1):
    print(f'  {i}. "{review}"')
    print(f"     Your label: _______________")
    print()

### Round 3: Image Category Labeling

Imagine you are labeling photos. What category would you assign?

In [None]:
image_descriptions = [
    "A photo of latte art in a coffee cup",
    "A person jogging through a forest",
    "A cat sitting on a laptop keyboard",
    "A sunset over a city skyline",
    "A child painting at an easel",
]

possible_categories = ["food", "sports", "animals", "nature", "art", "technology", "people"]

print("IMAGE CATEGORY LABELING CHALLENGE")
print("=" * 60)
print(f"Choose from: {', '.join(possible_categories)}")
print()
for i, desc in enumerate(image_descriptions, 1):
    print(f"  {i}. {desc}")
    print(f"     Your label: _______________")
    print()

### Discussion: Why Labeling Is Hard

**Think about what just happened:**

1. **Did everyone in the class agree on every label?** Probably not. "I can't even right now" -- is that anger? Sadness? Frustration is not even one of the options.

2. **Some examples fit multiple categories.** A photo of latte art is food AND art. A person jogging in a forest is sports AND nature. Forcing one label onto multi-category data loses information.

3. **Context matters.** "It was fine, I guess" could be positive (it was acceptable) or negative (disappointed). Without knowing who said it and why, you cannot be sure.

4. **Scale matters.** You just labeled 15 examples. Imagine doing this about 20,000 times for the emotion model. Now imagine doing it 14 million times for the image model. Errors are inevitable at that scale.
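A rough back-of-the-envelope calculation shows why scale matters. The 10 seconds per label and the 2,000-hour work-year below are assumptions, not measured figures, and `labeling_hours` is a hypothetical helper.

```python
SECONDS_PER_LABEL = 10       # assumption: about 10 seconds per quick labeling decision
HOURS_PER_WORK_YEAR = 2_000  # assumption: ~40 hours/week for 50 weeks

def labeling_hours(num_examples, seconds_per_label=SECONDS_PER_LABEL):
    """Total hours for one person to label every example once."""
    return num_examples * seconds_per_label / 3600

for name, n in [("Today's challenge", 15),
                ("Ten thousand examples", 10_000),
                ("ImageNet scale", 14_000_000)]:
    hours = labeling_hours(n)
    years = hours / HOURS_PER_WORK_YEAR
    print(f"{name:22} {n:>12,} examples  {hours:>12,.2f} hours  ({years:.2f} work-years)")
```

Under these assumptions, labeling 14 million images once would take one person nearly 20 years of full-time work, which is one reason large projects like ImageNet split the work across many crowd workers.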

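**The key insight:** When humans disagree about labels, that disagreement ends up in the training data. If 60% of labelers call a tweet "anger" and 40% call it "sadness," what the model learns depends on how those votes are recorded. With a simple majority vote, the model only ever sees "anger," and the ambiguity is lost. If the split is kept (for example, 60% anger / 40% sadness as the target), the model can learn that this kind of tweet is ambiguous, which is actually the correct answer.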
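The two ways of recording disagreement can be sketched in a few lines of Python. The five votes are made up; "hard label" and "soft label" are the usual names for the two options, and real projects choose between them (or combine them) depending on the task.

```python
from collections import Counter

# Made-up votes from 5 labelers for one ambiguous tweet
votes = ["anger", "anger", "anger", "sadness", "sadness"]
counts = Counter(votes)

# Option A: majority vote -> one "hard" label; the disagreement disappears
hard_label = counts.most_common(1)[0][0]

# Option B: keep the split as a "soft" label (fraction of votes per emotion)
soft_label = {emotion: n / len(votes) for emotion, n in counts.items()}

# How much did labelers agree? Share of votes that match the majority label
agreement = counts[hard_label] / len(votes)

print("Hard label:", hard_label)
print("Soft label:", soft_label)
print(f"Agreement with majority: {agreement:.0%}")
```

A low agreement score is a useful warning sign: it flags examples (or whole label categories) that humans themselves cannot label consistently.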

> **ASK AI ABOUT THIS**
>
> Ask Claude or ChatGPT:
> *"If humans disagree about whether a tweet is sad or angry, what happens when an AI model trains on those labels?"*
>
> The answer connects directly to what we discovered in Session 3 about model confidence scores.

---

## On Your Own (1:40-2:00)

### Exercise 1: Explore Model Cards

Go to [huggingface.co/models](https://huggingface.co/models) and find a model that interests you. Read its model card and answer:

1. **What was it trained on?** (What dataset? How many examples?)
2. **What task does it solve?** (Classification? Generation? Something else?)
3. **What are its limitations?** (Every good model card has a limitations section)
4. **Who built it?** (A company? A university? An individual?)

**Write your findings here:**

Model name:

Trained on:

Task:

Limitations:

Built by:

### Exercise 2: Design Your Own Training Dataset

Think of a task you would like a model to solve. Then design the training data:

1. **What is the task?** (Example: "Classify student questions as homework-help, conceptual, or off-topic")
2. **What would the input look like?** (Example: A student's question text)
3. **What would the labels be?** (Example: homework-help, conceptual, off-topic)
4. **How many examples would you need?** (Hundreds? Thousands? Millions?)
5. **What would be hard to label?** (What edge cases would cause disagreement?)

**Your design:**

Task:

Input:

Labels:

How many examples:

Hard to label:
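If you want to go one step further, you can write a handful of your examples down as (input, label) pairs and check them with a few lines of Python. This sketch uses the sample task from step 1; the examples and the `check_dataset` helper are made up for illustration. It counts examples per label (to spot a label with too few examples) and flags any label that is not in your allowed list, such as a typo.

```python
from collections import Counter

ALLOWED_LABELS = {"homework-help", "conceptual", "off-topic"}

# A few made-up examples for the sample task
my_dataset = [
    ("Can you check my answer to question 4?", "homework-help"),
    ("Why does dividing by zero break math?", "conceptual"),
    ("What's everyone doing this weekend?", "off-topic"),
    ("How do I start problem 2 on the worksheet?", "homework-help"),
    ("Is a model card like a nutrition label?", "conceptual"),
    ("Why is my answer to number 3 wrong?", "homework-help?"),  # typo in the label
]

def check_dataset(dataset, allowed):
    """Count examples per label and collect any label not in the allowed set."""
    counts = Counter(label for _, label in dataset)
    bad = [(text, label) for text, label in dataset if label not in allowed]
    return counts, bad

counts, bad = check_dataset(my_dataset, ALLOWED_LABELS)
print("Examples per label:", dict(counts))
print("Labels to fix:", bad)
```

The flagged example is also a good edge case: "Why is my answer to number 3 wrong?" could reasonably be homework-help or conceptual.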

### Exercise 3: Track Preview

Want to go deeper? Here is what each specialization track covers in upcoming sessions:

| Track | Focus | You Will Build |
|-------|-------|----------------|
| **Text & Language** | Emotion detection, classification, summarization | Content analyzer, writing feedback tool |
| **Images & Vision** | Image classification, captioning, detection | Photo organizer, visual search tool |
| **Creative AI** | Text generation, style control, multi-modal | Story generator, creative writing assistant |

All three tracks build on supervised learning. The models in each track were trained on different data with different labels, but the principle is identical: **learn from labeled examples.**

---

## Checklist: Before You Leave

- [ ] Understand that every model was trained on labeled data
- [ ] Can explain the supervised learning recipe (collect, label, train, test, deploy)
- [ ] Ran the emotion classifier and traced its predictions to training data
- [ ] Read the emotion model's model card on Hugging Face
- [ ] Ran the image classifier (default or one you found on the Hub)
- [ ] Browsed image-classification models on the Hub
- [ ] Ran the text generator and understand it learned to predict the next word
- [ ] Tried the labeling challenge and discovered that humans disagree
- [ ] Can explain why labeling disagreement affects model performance
- [ ] Read at least one model card on Hugging Face

**Save your work:** File > Save a copy in Drive

---

## Looking Ahead

Today you learned that supervised learning is about learning from labeled examples. Every model you use was trained this way: someone collected data, labeled it, and trained a model to find patterns.

But we skipped a critical question: **how does the training actually work?** When we say a model "studies" examples and "adjusts" itself, what is really happening mathematically?

Next session, we open the black box further. You will learn about **training parameters** -- the knobs you can turn that control how a model learns. You will train your own model and see how changing these settings affects the result.

### Key Takeaways

1. **Supervised learning** means teaching a model by showing it labeled examples
2. Every model you have used was trained on anywhere from thousands of labeled examples to billions of words of text
3. The **type of label** determines the type of task (emotions, categories, next word)
4. **Labeling is harder than it sounds** -- humans disagree, and that affects models
5. **Model cards** document how a model was built, trained, and where it struggles

---

*Youth Horizons AI Researcher Program - Level 2*