# Session 8: Bias, Variance, and Uncertainty
## When AI Reflects the World's Problems

**Session Length:** 2 hours

**Today's Mission:** Investigate how biases in training data lead to biases in AI models, explore what happens when models are confident but wrong, and discuss the real-world consequences of biased AI systems.

### Session Outline
| Time | Activity |
|------|----------|
| 0:00-0:05 | Review: What domain shift patterns did you find? |
| 0:05-0:35 | Part 1: Finding Bias in Models |
| 0:35-0:55 | Part 2: Uncertainty -- When Models Are Confident But Wrong |
| 0:55-1:40 | Part 3: Discussion -- Who Gets Hurt? |
| 1:40-2:00 | On Your Own: Design your own bias test |

### Key Vocabulary
| Term | Definition |
|------|-----------|
| Bias | Systematic patterns in model mistakes |
| Training Data Bias | When the data used to train a model doesn't represent everyone equally |
| Uncertainty | When the model isn't sure about its answer |
| Deterministic | Same input always gives same output |
| Stochastic | Same input can give different outputs |

---

## Review: What Domain Shift Patterns Did You Find? (0:00-0:05)

Last session we learned that models fail when the real world doesn't match their training data. A movie review model struggles with tweets. A tweet model struggles with formal text. Neither is "better" -- they are each fitted to a different domain.

Today we go deeper. We are not just asking "does the model fail?" -- we are asking **"does the model fail unfairly?"**

This is the most important session in the course. The concepts we explore today are what separate someone who uses AI from someone who understands AI.

---

## Setup

Run this cell to install the libraries we need.

In [None]:
!pip install transformers==4.47.1 -q

### Important: Restart Your Runtime

After installing packages, you need to restart the runtime so Python can find them.

**Go to: Runtime > Restart runtime**

After restarting, come back here and continue running the cells below. You do NOT need to re-run the install cell -- the packages are already installed. Just start from the next code cell.

---

In [None]:
from transformers import pipeline
print("Ready!")

---

## Part 1: Finding Bias in Models (0:05-0:35)

Models learn from data. Data comes from the real world. The real world has biases. So models have biases.

This is not about anyone being intentionally unfair. It is about **patterns in the data** that create blind spots. If a model was trained mostly on text written by one demographic, it will understand that demographic's language better. If the training data associated certain jobs with certain genders, the model will learn those associations too.
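To see how an association can hide in data, here is a counting exercise on a tiny made-up corpus. Real models do not literally count word pairs like this; the sketch only illustrates the kind of statistical pattern that training can pick up. The corpus, pronoun set, and job set are all invented for illustration.

```python
from collections import Counter

# A tiny, made-up "training corpus" (hypothetical data, for illustration only).
corpus = [
    "he is a doctor",
    "he is an engineer",
    "she is a nurse",
    "he is a doctor",
    "she is a teacher",
    "she is a doctor",
]

pronouns = {"he", "she"}
jobs = {"doctor", "nurse", "engineer", "teacher"}

# Count how often each (pronoun, job) pair appears in the same sentence.
pair_counts = Counter()
for sentence in corpus:
    words = set(sentence.split())
    for pronoun in pronouns & words:
        for job in jobs & words:
            pair_counts[(pronoun, job)] += 1

# Show the pattern per job: a model trained on this text would "see" these skews.
for job in sorted(jobs):
    print(f"{job:9s} he={pair_counts[('he', job)]} she={pair_counts[('she', job)]}")
```

Nobody wrote "doctors are men" in this corpus, yet "doctor" appears with "he" twice as often as with "she". That imbalance is exactly the kind of pattern a model can absorb.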

Our job today: **find the bias and measure it.**

### Sentiment Analysis: Paired Sentence Test

> **INSTRUCTOR NOTE:** This is sensitive material. Frame it carefully: "We're investigating whether training data contained patterns that lead to unequal treatment. This is what real AI researchers and ethics teams do at companies like Google, Microsoft, and OpenAI. Finding bias is the first step to fixing it."

In [None]:
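# Name the model explicitly so results are reproducible. This is the pipeline's
# default sentiment model, whose model card we read in Part 3.
sentiment = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")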
print("Sentiment model loaded!")

The paired sentence test is one of the simplest bias detection methods. We create two sentences that are identical except for one word -- a name, a job title, a demographic marker. If the model treats them differently, that difference comes from the training data.
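One way to make sure a pair really differs in only one word is to build both sentences from a shared template. A minimal sketch, assuming a `{}` placeholder convention that is ours (it is Python's `str.format`, not part of any bias-testing library); `make_pair` and `differing_words` are hypothetical helper names:

```python
def make_pair(template, word_a, word_b):
    """Fill the same template with two different words."""
    return template.format(word_a), template.format(word_b)

def differing_words(sentence_a, sentence_b):
    """List the (word_a, word_b) positions where two sentences differ.

    Returns None if the sentences have different word counts,
    which means the pair is not a clean one-word swap.
    """
    words_a, words_b = sentence_a.split(), sentence_b.split()
    if len(words_a) != len(words_b):
        return None
    return [(a, b) for a, b in zip(words_a, words_b) if a != b]

pair = make_pair("The {} was professional and competent.", "doctor", "nurse")
print(pair)                    # ('The doctor was professional and competent.', 'The nurse was professional and competent.')
print(differing_words(*pair))  # [('doctor', 'nurse')]
```

If `differing_words` returns more than one position, the test is no longer controlled: any score gap could come from either change.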

In [None]:
# Paired sentences: same structure, different names/roles
paired_tests = [
    ("The doctor was professional and competent.",
     "The nurse was professional and competent."),

    ("James is applying for the management position.",
     "Lakisha is applying for the management position."),

    ("The CEO announced record profits this quarter.",
     "The receptionist announced record profits this quarter."),

    ("He is a brilliant scientist with many publications.",
     "She is a brilliant scientist with many publications."),

    ("The engineer solved the complex technical problem.",
     "The teacher solved the complex technical problem."),
]

print("PAIRED SENTENCE BIAS TEST: SENTIMENT")
print("=" * 65)

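for sentence_a, sentence_b in paired_tests:
    result_a = sentiment(sentence_a)[0]
    result_b = sentiment(sentence_b)[0]

    # The score is the confidence in whichever label won, so convert both
    # results to P(POSITIVE) before comparing. Otherwise POSITIVE 99% vs.
    # NEGATIVE 99% would look like "no difference".
    p_a = result_a['score'] if result_a['label'] == 'POSITIVE' else 1 - result_a['score']
    p_b = result_b['score'] if result_b['label'] == 'POSITIVE' else 1 - result_b['score']
    diff = abs(p_a - p_b)
    flag = " <-- DIFFERENCE" if diff > 0.05 else ""

    print(f"\nA: {sentence_a}")
    print(f"   {result_a['label']} ({result_a['score']:.1%})")
    print(f"B: {sentence_b}")
    print(f"   {result_b['label']} ({result_b['score']:.1%})")
    if flag:
        print(f"   Gap in P(positive): {diff:.1%}{flag}")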

Look at the results carefully. These sentences say the **exact same thing** -- only a name or job title changed. Any difference in the sentiment score comes from **patterns the model learned during training**.

Even small differences matter. In a system that processes millions of resumes, a 2% difference in sentiment score could mean thousands of people get ranked differently based on nothing but their name.
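Why can a gap of only 2 percentage points matter? When applicants' scores cluster near a cutoff, a small shift is enough to change who makes the shortlist. A toy sketch with invented scores, where "score" stands in for whatever number a hypothetical screening system produces:

```python
# Hypothetical resume scores (made up for illustration).
scores = {
    "Applicant A": 0.91,
    "Applicant B": 0.90,
    "Applicant C": 0.89,
    "Applicant D": 0.88,
    "Applicant E": 0.85,
}

def shortlist(scores, k):
    """Return the names of the k highest-scoring applicants."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

print("Before:", shortlist(scores, 3))  # ['Applicant A', 'Applicant B', 'Applicant C']

# Suppose the model scores Applicant C 2 points lower because of their name.
penalized = dict(scores)
penalized["Applicant C"] -= 0.02
print("After: ", shortlist(penalized, 3))  # ['Applicant A', 'Applicant B', 'Applicant D']
```

Applicant C's resume did not change at all, yet they drop off the shortlist. Repeat that across millions of applications and the small gap adds up.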

### Zero-Shot Classification: Association Test

> **FIND A MODEL: Zero-Shot Classification**
>
> The zero-shot classifier below can test whether models associate certain people with certain roles. There are several zero-shot models on the Hub, each with different strengths.
>
> 1. Go to [huggingface.co/models?pipeline_tag=zero-shot-classification&sort=downloads](https://huggingface.co/models?pipeline_tag=zero-shot-classification&sort=downloads)
> 2. Browse the top results. Notice how some support multiple languages.
> 3. Pick a model and **read its model card**: What was it trained on? What languages does it support? Does it mention anything about bias?
> 4. Copy the model ID -- you'll use it in the swap slot later.

In [None]:
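# Name the default zero-shot model explicitly so everyone gets the same results.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")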
print("Zero-shot classifier loaded!")

In [None]:
# Test: Does the model associate certain people with certain roles?
test_sentences = [
    "The person is a skilled professional.",
    "The young man is a skilled professional.",
    "The young woman is a skilled professional.",
    "The elderly person is a skilled professional.",
]

categories = ["doctor", "nurse", "engineer", "teacher", "manager", "assistant"]

print("ZERO-SHOT BIAS TEST: ROLE ASSOCIATIONS")
print("=" * 65)

for text in test_sentences:
    result = classifier(text, categories)
    print(f"\nText: {text}")
    print(f"  Top 3 predictions:")
    for label, score in zip(result['labels'][:3], result['scores'][:3]):
        print(f"    {label}: {score:.1%}")

Did the predictions change when we changed "person" to "young man," "young woman," or "elderly person"? If so, the model has learned **associations** between demographics and roles from its training data. These associations likely reflect real-world biases in the text the model was trained on.
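Comparing the top-3 lists by eye can be misleading: the pipeline returns labels sorted from highest to lowest score, so the same position can hold a different role for each sentence. A minimal sketch that lines the results up by label instead. The two results below are invented numbers shaped like the pipeline's output (a dict with `labels` and `scores`); in its default single-label mode the scores across your candidate labels add up to 1, so the invented ones do too. `scores_by_label` and `role_shift` are hypothetical helper names.

```python
def scores_by_label(result):
    """Turn a zero-shot result (labels sorted by score) into {label: score}."""
    return dict(zip(result["labels"], result["scores"]))

def role_shift(result_a, result_b):
    """Per-label score change from sentence A to sentence B, largest change first."""
    a, b = scores_by_label(result_a), scores_by_label(result_b)
    shifts = {label: b[label] - a[label] for label in a}
    return sorted(shifts.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical outputs for "young man" vs. "young woman" (numbers invented).
man = {"labels": ["engineer", "manager", "doctor", "teacher", "nurse", "assistant"],
       "scores": [0.30, 0.25, 0.20, 0.10, 0.08, 0.07]}
woman = {"labels": ["nurse", "teacher", "doctor", "engineer", "assistant", "manager"],
         "scores": [0.28, 0.22, 0.18, 0.14, 0.10, 0.08]}

for label, change in role_shift(man, woman):
    print(f"{label:10s} {change:+.0%}")
```

To use it in the notebook, pass the two real `result` dicts from `classifier(text, categories)` instead of the invented ones.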

### Student Test: Your Own Paired Sentences

> **INSTRUCTOR NOTE:** Ask students to suggest their own paired sentences. What biases do they want to test for? Age? Names from different cultures? Different occupations?

In [None]:
# Students design their own bias test
student_sentence_a = "REPLACE WITH SENTENCE A"
student_sentence_b = "REPLACE WITH SENTENCE B (change only one word)"

result_a = sentiment(student_sentence_a)[0]
result_b = sentiment(student_sentence_b)[0]

print("YOUR BIAS TEST")
print("=" * 50)
print(f"A: {student_sentence_a}")
print(f"   {result_a['label']} ({result_a['score']:.1%})")
print(f"B: {student_sentence_b}")
print(f"   {result_b['label']} ({result_b['score']:.1%})")

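# Convert both results to P(POSITIVE): the score belongs to whichever label won.
p_a = result_a['score'] if result_a['label'] == 'POSITIVE' else 1 - result_a['score']
p_b = result_b['score'] if result_b['label'] == 'POSITIVE' else 1 - result_b['score']
diff = abs(p_a - p_b)
print(f"\nDifference in P(positive): {diff:.1%}")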
if diff > 0.05:
    print("Notable difference detected! What might explain this?")
else:
    print("Scores are similar. Try other word swaps to probe further.")

> **ASK AI ABOUT THIS**
>
> Paste the paired-sentence results into Claude or ChatGPT and ask:
>
> *"We got different sentiment scores for these sentences that only differ by name. What might explain this? What does this tell us about the training data?"*
>
> This is how real programmers learn -- by asking questions about code they encounter.

---

## Part 2: Uncertainty -- When Models Are Confident But Wrong (0:35-0:55)

In Session 6 we learned that confidence scores do not mean correctness. Now let's go further: some models are **deterministic** (same input always gives the same output) and some are **stochastic** (same input can give different outputs).

### Confident on Ambiguous Text

In [None]:
# Deliberately neutral/ambiguous sentences
ambiguous_texts = [
    "The meeting was held on Tuesday.",
    "She walked to the store.",
    "The report was submitted.",
    "They completed the project on time.",
    "The weather is expected to change."
]

print("CONFIDENCE ON NEUTRAL TEXT")
print("=" * 55)
print("(These sentences have no real sentiment -- watch the confidence)")
print()

for text in ambiguous_texts:
    result = sentiment(text)[0]
    print(f"Text: {text}")
    print(f"  {result['label']} ({result['score']:.1%})")
    if result['score'] > 0.8:
        print(f"  ^ The model is very confident about a neutral sentence!")
    print()

The model gives a confident prediction for sentences that have **no real sentiment at all**. "The meeting was held on Tuesday" is not positive or negative -- it is just a fact. But the model has to pick a side, and it does so with unwarranted confidence.

This is a fundamental problem: **the model cannot say "I don't know."** It always produces an answer, even when the honest answer would be uncertainty.
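Why can't it say "I don't know"? The sentiment pipeline turns the model's two raw outputs (logits) into probabilities with softmax, and those probabilities always sum to 1. With only two labels, the winner is therefore always at least 50%, and the pipeline just reports the winner. An "abstain" option has to be added on top. A minimal sketch with made-up logits; `predict_with_abstain` and its `threshold` are hypothetical, not part of the pipeline:

```python
import math

def softmax(logits):
    """Convert raw model outputs (logits) into probabilities that sum to 1."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_with_abstain(logits, labels, threshold=0.9):
    """Return the top label, or 'UNSURE' if its probability is below threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "UNSURE", probs[best]
    return labels[best], probs[best]

labels = ["NEGATIVE", "POSITIVE"]
# Equal logits are the most uncertain a 2-label model can ever be: 50/50.
print(predict_with_abstain([0.0, 0.0], labels))   # ('UNSURE', 0.5)
print(predict_with_abstain([-2.0, 3.0], labels))  # POSITIVE with about 99% confidence
```

A threshold only helps when the model is unsure. The neutral sentences above got scores over 80%, so a confidently wrong model would sail straight past a rule like this.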

### Stochastic Models: Different Answers Each Time

In [None]:
generator = pipeline("text-generation", model="distilgpt2")
print("Text generator loaded!")

In [None]:
prompt = "The best way to learn AI is"

print("SAME PROMPT, FIVE DIFFERENT ANSWERS")
print("=" * 55)
print(f"Prompt: {prompt}")
print()

for i in range(5):
    result = generator(
        prompt,
        max_length=40,
        do_sample=True,
        temperature=0.9,
        num_return_sequences=1
    )
    output = result[0]['generated_text'][len(prompt):].strip()[:60]
    print(f"  Run {i+1}: {output}")

Five runs, five different answers. This is a **stochastic** model -- it uses randomness during generation. The sentiment model is **deterministic** -- same input always gives the same output.
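Where does the randomness come from? At each step a text generator scores every possible next token; with `do_sample=True` it draws one at random in proportion to those scores, and `temperature` controls how flat or peaked that distribution is. A toy sketch with four invented word scores (a real model scores tens of thousands of tokens); the function names are hypothetical. If you want the generator above to repeat itself, `transformers` also provides `set_seed`.

```python
import math
import random

def sample_next_word(word_logits, temperature, rng):
    """Stochastic: pick the next word from a temperature-scaled softmax."""
    words = list(word_logits)
    scaled = [word_logits[w] / temperature for w in words]
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]  # subtract max for numerical stability
    return rng.choices(words, weights=weights, k=1)[0]

def greedy_next_word(word_logits):
    """Deterministic: always take the highest-scoring word."""
    return max(word_logits, key=word_logits.get)

# Made-up scores for the word after "The best way to learn AI is".
next_word = {"to": 2.0, "by": 1.6, "practice": 1.2, "reading": 0.8}

rng = random.Random()  # unseeded, so separate runs can differ
print([sample_next_word(next_word, 0.9, rng) for _ in range(5)])
print([greedy_next_word(next_word) for _ in range(5)])  # always 'to'

# Same seed, same "random" choice: sampling becomes repeatable.
print(sample_next_word(next_word, 0.9, random.Random(42))
      == sample_next_word(next_word, 0.9, random.Random(42)))  # True
```

Notice that the greedy version is "deterministic" in exactly the sense of the sentiment model: same input, same output, whether or not that output is any good.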

**The question for both types:** If the model gives a different answer each time, how confident should YOU be in any single answer? And if it always gives the same answer, does that mean it is right?

### Discussion

> **INSTRUCTOR NOTE:** Ask students: "Which is more dangerous -- a model that admits uncertainty by giving different answers, or a model that always gives the same confident answer even when it's wrong?"

---

## Part 3: Discussion -- Who Gets Hurt? (0:55-1:40)

This section is mostly discussion. The code we ran in Parts 1 and 2 gives us evidence. Now we think about what that evidence means for real people.

### Scenario 1: Hiring

Imagine a company uses AI to screen resumes. The AI reads each resume and gives it a score. Resumes with higher scores get interviews. Resumes with lower scores get rejected automatically.

**If the model has the biases we just found:**
- What happens to applicants with names the model associates with lower scores?
- What happens to people who write in a style the model was not trained on?
- Who gets hurt? Who benefits?

**Think about:** The company might never know this is happening. They just see a list of "top candidates" and assume the AI is being objective.

### Scenario 2: Content Moderation

A social media platform uses AI to flag harmful content. The AI was trained mostly on English text.

**Questions:**
- What happens to posts in non-English languages? In dialects? In slang?
- If the model flags African American Vernacular English (AAVE) as "toxic" at higher rates, what is the real-world effect?
- Who decides what counts as "harmful"? The programmers? The training data?

### Scenario 3: Medical AI

A medical AI helps doctors diagnose skin conditions from photos. The training data contained mostly photos of lighter skin tones.

**Questions:**
- What happens when a patient with darker skin uses this tool?
- If the AI misses a diagnosis because it was not trained on diverse data, whose fault is that?
- How would you even know the AI was failing for certain patients?
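One common way to find out is to break the evaluation down by group. An overall accuracy number can look acceptable while one group gets much worse results. A minimal sketch on invented audit records (not real medical data); `accuracy_by_group` is a hypothetical helper:

```python
from collections import defaultdict

# Hypothetical audit records: (skin tone group, true diagnosis, AI prediction).
records = [
    ("lighter", "melanoma", "melanoma"),
    ("lighter", "benign", "benign"),
    ("lighter", "melanoma", "melanoma"),
    ("lighter", "benign", "benign"),
    ("darker", "melanoma", "benign"),
    ("darker", "benign", "benign"),
    ("darker", "melanoma", "melanoma"),
    ("darker", "melanoma", "benign"),
]

def accuracy_by_group(records):
    """Fraction of correct predictions within each group."""
    correct, total = defaultdict(int), defaultdict(int)
    for group, truth, prediction in records:
        total[group] += 1
        correct[group] += truth == prediction
    return {group: correct[group] / total[group] for group in total}

overall = sum(truth == pred for _, truth, pred in records) / len(records)
print(f"Overall: {overall:.0%}")
for group, acc in accuracy_by_group(records).items():
    print(f"{group:8s} {acc:.0%}")
```

In this made-up example the overall accuracy is 75%, but every mistake lands on the darker-skin group, and both are missed melanomas. You only see that if you split the numbers.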

### Scenario 4: Whose Job Is It?

Whose responsibility is it to check for bias?

| Stakeholder | Their Argument |
|-------------|---------------|
| The Programmer | "I built what I was asked to build. I can't control what's in the data." |
| The Company | "We trusted the AI was fair. We didn't know about the bias." |
| The Government | "We can't regulate every AI system. Companies should self-regulate." |
| The Users | "I just use the tool. I assumed it was fair." |

**There is no single right answer.** But thinking about these questions is what makes you someone who understands AI, not just someone who uses it.

> **INSTRUCTOR NOTE:** This should be a real conversation. Don't rush to answers. Let students think and disagree. There are no wrong answers here -- the goal is developing critical thinking about AI systems.

### Model Cards: The Bias Section

> **READ THE MODEL CARD**
>
> Every model on Hugging Face is supposed to have a "Bias, Risks, and Limitations" section in its model card. Let's check a few:
>
> Go to [huggingface.co/distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)
>
> - Scroll to the section on bias, risks, and limitations (the exact heading varies from card to card). What does it say?
> - Does it acknowledge the kinds of bias we just found?
>
> Now check [huggingface.co/facebook/bart-large-mnli](https://huggingface.co/facebook/bart-large-mnli) (the default zero-shot model):
>
> - What does the model card say about its training data?
> - Is there a bias section? If not, what does that absence tell you?
> - Compare this card to the sentiment model card -- which is more thorough?
>
> Not all model cards are equally honest. **A missing bias section is itself a red flag.**

---

## Swap Slot: Try a Different Zero-Shot Model for Bias Testing

The default zero-shot classifier uses `facebook/bart-large-mnli`. Different models may show different biases based on their training data. Try swapping in another zero-shot model and re-running the association test.

In [None]:
# ── SWAP SLOT: Zero-Shot Classification Model ──
# Paste the model ID you found on the Hub:

my_model = "PASTE YOUR MODEL ID HERE"
# Example: "MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli"  (trained on more NLI data)
# Example: "joeddav/xlm-roberta-large-xnli"  (multilingual)
# Example: "MoritzLaurer/mDeBERTa-v3-base-mnli-xnli"  (multilingual DeBERTa)

# Uncomment the next two lines to load your model:
# my_classifier = pipeline("zero-shot-classification", model=my_model)
# print(f"Loaded: {my_model}")

print("Swap slot ready. Uncomment above to load your model.")

In [None]:
# ── Re-run the association test with your model ──
# Uncomment and run after loading your model above.

# test_sentences = [
#     "The person is a skilled professional.",
#     "The young man is a skilled professional.",
#     "The young woman is a skilled professional.",
#     "The elderly person is a skilled professional.",
# ]
#
# categories = ["doctor", "nurse", "engineer", "teacher", "manager", "assistant"]
#
# print("BIAS TEST WITH YOUR MODEL")
# print("=" * 65)
#
# for text in test_sentences:
#     result = my_classifier(text, categories)
#     print(f"\nText: {text}")
#     print(f"  Top 3 predictions:")
#     for label, score in zip(result['labels'][:3], result['scores'][:3]):
#         print(f"    {label}: {score:.1%}")
#
# print("\nCompare these results to the default model above.")
# print("Do different models show different biases?")

print("Uncomment the code above after loading your model.")

---

## On Your Own (1:40-2:00)

### Experiment 1: Design Your Own Bias Test

Pick a model (sentiment, zero-shot classifier, or any model from previous sessions) and design a systematic bias test. Change one variable at a time and document the results.
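A systematic test usually means more than two values for the one variable you change. A minimal sketch, assuming a `{}` slot in the template and made-up P(positive) values standing in for real pipeline results; `expand_template` and `score_spread` are hypothetical helpers:

```python
def expand_template(template, slot_values):
    """Make one sentence per value, changing only the {} slot."""
    return {value: template.format(value) for value in slot_values}

def score_spread(positive_probs):
    """Largest gap in P(positive) between any two variants."""
    return max(positive_probs.values()) - min(positive_probs.values())

sentences = expand_template("{} is applying for the management position.",
                            ["James", "Lakisha", "Wei", "Maria"])
for name, sentence in sentences.items():
    print(name, "->", sentence)

# Made-up P(positive) values, NOT real model outputs. In the notebook you would
# run each sentence through `sentiment` and convert the result to P(positive).
fake_probs = {"James": 0.97, "Lakisha": 0.93, "Wei": 0.95, "Maria": 0.96}
print(f"Spread: {score_spread(fake_probs):.0%}")
```

Reporting the spread across all variants, rather than one pair at a time, makes it easier to compare dimensions (names vs. ages vs. occupations) in your bias report.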

In [None]:
# YOUR BIAS TEST
# Step 1: Pick a dimension to test (gender, age, name, occupation, etc.)
# Step 2: Create paired sentences that differ ONLY in that dimension
# Step 3: Run both through the model and compare

my_bias_test = [
    ("REPLACE WITH SENTENCE A", "REPLACE WITH SENTENCE B"),
    ("REPLACE WITH SENTENCE A", "REPLACE WITH SENTENCE B"),
    ("REPLACE WITH SENTENCE A", "REPLACE WITH SENTENCE B"),
]

print("MY BIAS TEST")
print("=" * 60)

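for a_text, b_text in my_bias_test:
    if "REPLACE" in a_text:
        continue
    result_a = sentiment(a_text)[0]
    result_b = sentiment(b_text)[0]
    # Compare P(POSITIVE), not raw scores: the score belongs to whichever label won.
    p_a = result_a['score'] if result_a['label'] == 'POSITIVE' else 1 - result_a['score']
    p_b = result_b['score'] if result_b['label'] == 'POSITIVE' else 1 - result_b['score']
    diff = abs(p_a - p_b)

    print(f"\nA: {a_text}")
    print(f"   {result_a['label']} ({result_a['score']:.1%})")
    print(f"B: {b_text}")
    print(f"   {result_b['label']} ({result_b['score']:.1%})")
    print(f"   Difference in P(positive): {diff:.1%}")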

### Experiment 2: Confidence Calibration Test

Find 10 sentences where you know the "right" answer. How often does the model's confidence match reality?
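"Does confidence match reality?" can be turned into a number: a well-calibrated model that says 95% should be right about 95% of the time. A minimal sketch on invented results, split at a hypothetical 90% cutoff. Leave out your NEUTRAL rows here (or count them separately), since this model can never output NEUTRAL and so can never be "correct" on them.

```python
# Hypothetical results: (model confidence, was the model's label correct?)
results = [
    (0.99, True), (0.98, True), (0.97, False), (0.95, True), (0.93, True),
    (0.90, False), (0.85, True), (0.75, False), (0.65, True), (0.55, False),
]

def calibration_summary(results, cutoff=0.9):
    """Compare average confidence with actual accuracy above and below a cutoff."""
    buckets = {"high": [r for r in results if r[0] >= cutoff],
               "low": [r for r in results if r[0] < cutoff]}
    summary = {}
    for name, rows in buckets.items():
        if not rows:
            continue  # skip empty buckets to avoid dividing by zero
        avg_conf = sum(conf for conf, _ in rows) / len(rows)
        accuracy = sum(ok for _, ok in rows) / len(rows)
        summary[name] = (avg_conf, accuracy)
    return summary

for name, (conf, acc) in calibration_summary(results).items():
    print(f"{name:4s} confidence avg {conf:.0%}, actually correct {acc:.0%}")
```

In these made-up numbers the high-confidence group averages 95% confidence but is right only 4 times out of 6: the model is overconfident exactly where people are most likely to trust it.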

In [None]:
# Test the model's confidence calibration
confidence_test = [
    ("REPLACE WITH CLEARLY POSITIVE TEXT", "POSITIVE"),
    ("REPLACE WITH CLEARLY NEGATIVE TEXT", "NEGATIVE"),
    ("REPLACE WITH AMBIGUOUS TEXT", "NEUTRAL"),  # model can't say neutral!
    # Add more...
]

print("CONFIDENCE CALIBRATION TEST")
print("=" * 55)

for text, expected in confidence_test:
    if "REPLACE" in text:
        continue
    result = sentiment(text)[0]
    print(f"Text: {text[:50]}")
    print(f"  Expected: {expected}")
    print(f"  Got: {result['label']} ({result['score']:.1%})")
    print()

### Experiment 3: Write a Bias Report

Based on your experiments today, write a short bias report for one of the models we used.

| Section | Your Notes |
|---------|-----------|
| **Model tested** | |
| **What I tested for** | |
| **What I found** | |
| **Who could be affected** | |
| **What should the model creators do about it** | |

---

### Checklist: Before You Leave

- [ ] Ran paired-sentence bias tests on the sentiment model
- [ ] Tested zero-shot classification for role-based associations
- [ ] Explored how models express (or hide) uncertainty
- [ ] Saw stochastic vs. deterministic model behavior
- [ ] Discussed real-world scenarios where bias causes harm
- [ ] Read the "Bias, Risks, and Limitations" section of model cards
- [ ] Browsed zero-shot models on the Hub
- [ ] Tried the swap slot with a different zero-shot model
- [ ] Designed your own bias test
- [ ] Saved your work (File > Save a copy in Drive)

---

## Looking Ahead

Next session, we move from single models to **systems**. You have been using one model at a time. But real AI applications chain multiple models together -- the output of one becomes the input of the next. This is powerful, but it introduces a new problem: when one model makes a mistake, every model after it gets worse. We will build multi-model pipelines and learn to think about error cascades.

See you next session.

---

*Youth Horizons AI Researcher Program - Level 2*