# Lecture 2 (Slides 2-5): Supervised Learning Basics
## CSC5991 - Introduction to LLMs

---

**Welcome!** This notebook is designed for students who have **never studied machine learning before**. We will walk through every concept step by step, with plain-English explanations, real code examples, and visualizations.

### What You Will Learn in This Notebook

| Slide | Topic | What You Will Understand |
|-------|-------|-------------------------|
| Slide 2 | What is Supervised Learning? | The core definition, labeled data, inputs/outputs, and deep learning basics |
| Slide 3 | Examples of Supervised Learning | 10 real-world applications across many industries |
| Slide 4 | Supervised Learning Workflow | The step-by-step pipeline from data to predictions |
| Slide 5 | Types of Supervised Learning | Regression vs. Classification with hands-on code |

### Prerequisites

- Basic Python knowledge (variables, loops, functions)
- No machine learning experience required!

### Libraries We Will Use

| Library | Purpose |
|---------|--------|
| `numpy` | Numerical computations (arrays, math) |
| `matplotlib` | Creating charts and visualizations |
| `scikit-learn` | Machine learning algorithms and tools |
| `pandas` | Data manipulation (tables) |

---

In [None]:
# =============================================================================
# STEP 0: Import all the libraries we need for this notebook
# =============================================================================

# numpy: the fundamental library for numerical computing in Python.
# We use it to create arrays (lists of numbers) and do math on them.
import numpy as np

# matplotlib.pyplot: the most popular plotting library in Python.
# We use it to create charts, graphs, and visualizations.
import matplotlib.pyplot as plt

# pandas: a library for working with tabular data (like spreadsheets).
# We use it to organize data into neat tables.
import pandas as pd

# scikit-learn (sklearn): the most widely used machine learning library.
# We will import specific parts of it as we need them throughout the notebook.

# This Jupyter "magic" command makes plots appear inside the notebook,
# directly below the cell that creates them.
%matplotlib inline

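# Set a consistent style for all our plots so they look clean.
# The style was renamed in matplotlib 3.6, so fall back to the old name if needed.
try:
    plt.style.use('seaborn-v0_8-whitegrid')
except OSError:
    plt.style.use('seaborn-whitegrid')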

# Set a random seed so that every time you run this notebook,
# you get the exact same results. This makes learning easier
# because the numbers won't change between runs.
np.random.seed(42)

print("All libraries imported successfully!")
print(f"NumPy version:  {np.__version__}")
print(f"Pandas version: {pd.__version__}")
print("\nYou are ready to begin learning about Supervised Learning!")

---

# SLIDE 2: What is Supervised Learning?

---

## 2.1 The Big Picture: What is Machine Learning?

Before we define supervised learning, let us first understand what **machine learning (ML)** is.

### Traditional Programming vs. Machine Learning

In **traditional programming**, a human programmer writes explicit rules:

```
Traditional Programming:
    INPUT (Data) + RULES (written by human) --> OUTPUT (Answers)
    
    Example: "If temperature > 100°F, print 'hot'"
```

In **machine learning**, the computer figures out the rules by looking at examples:

```
Machine Learning:
    INPUT (Data) + OUTPUT (Answers) --> RULES (learned by computer)
    
    Example: Given thousands of temperatures labeled 'hot' or 'cold',
             the computer learns the boundary on its own.
```

**Key insight**: In ML, we do NOT tell the computer the rules. Instead, we give it many examples and let it figure out the patterns by itself.
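
To make the contrast concrete, here is a minimal sketch. The function names `is_hot_rule` and `learn_threshold`, the sample temperatures, and the "midpoint" learning rule are all illustrative assumptions rather than part of the lecture; real learning algorithms are more sophisticated, but the idea is the same.

```python
import numpy as np

# Traditional programming: a human writes the rule by hand.
def is_hot_rule(temperature_f):
    return temperature_f > 100

# Machine learning (toy version): the rule is derived from labeled examples.
# Hypothetical labeled data: temperatures (inputs) and 'hot'/'cold' answers (labels).
temperatures = np.array([60, 75, 88, 95, 104, 110, 120])
labels = np.array(['cold', 'cold', 'cold', 'cold', 'hot', 'hot', 'hot'])

def learn_threshold(temps, labs):
    # Place the boundary halfway between the warmest 'cold' example
    # and the coolest 'hot' example.
    warmest_cold = temps[labs == 'cold'].max()
    coolest_hot = temps[labs == 'hot'].min()
    return (warmest_cold + coolest_hot) / 2

threshold = learn_threshold(temperatures, labels)
print(f"Learned boundary: {threshold} °F")
print(f"Is 102 °F hot? {102 > threshold}")
```

Nobody typed the boundary into the second version: it came from the examples, so different labeled data would produce a different rule.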

---

## 2.2 Definition of Supervised Learning

> **Supervised Learning** is a machine learning paradigm where the model learns from **labeled data**.

Let us break this definition down word by word:

| Term | Meaning |
|------|--------|
| **Supervised** | "Supervised" means there is a teacher (the labels). Just like a student learning with an answer key, the model has the correct answers during training. |
| **Learning** | The model improves its predictions by looking at more and more examples. |
| **Labeled data** | Each piece of data comes with a correct answer (a "label"). For example, a photo of a cat comes with the label "cat". |

### Real-World Analogy: Learning with Flashcards

Imagine you are studying for a vocabulary test using flashcards:

- **Front of the card** (the input): A word in Spanish, e.g., "gato"
- **Back of the card** (the label): The English translation, e.g., "cat"

You study many flashcards (training data). Each card has both the question AND the answer. After studying enough cards, someone shows you a NEW Spanish word you have never seen, and you try to guess the English translation.

That is EXACTLY what supervised learning does:
1. **Training phase**: Study many examples where both the input and correct answer are known.
2. **Prediction phase**: Given a new input (never seen before), predict the correct answer.

---

## 2.3 The Mathematical Notation: Data Samples

In supervised learning, our data is organized as **pairs**:

$$\text{Data samples: } (x_i, y_i) \text{ for } i = 1, 2, \ldots, n$$

Let us decode this notation:

| Symbol | Name | Meaning | Example |
|--------|------|---------|--------|
| $x_i$ | **Input** (also called "feature") | The information we give to the model | A photo, a number, a sentence |
| $y_i$ | **Output** (also called "label" or "target") | The correct answer for that input | "cat", 72.5, "positive" |
| $i$ | **Index** | Which example we are looking at | 1st example, 2nd example, etc. |
| $n$ | **Total count** | How many examples we have | 1000 training examples |
| $(x_i, y_i)$ | **Data pair** | One input matched with its correct output | (photo_of_cat, "cat") |

### Why Pairs?

Each input $x_i$ is **paired** with its correct label $y_i$. This pairing is what makes it "supervised" -- the model can check its guesses against the true answers.

---

In [None]:
# =============================================================================
# EXAMPLE 2.3: Let's create labeled data in Python!
# =============================================================================
# Scenario: We have data about houses.
# Input (x): the size of the house in square feet.
# Output (y): the price of the house in thousands of dollars.
#
# Each (x_i, y_i) pair is one house with its known price.
# =============================================================================

# x_i values: house sizes in square feet
# We use a numpy array because it is efficient for numerical data.
x_inputs = np.array([600, 800, 1000, 1200, 1500, 1800, 2000, 2500, 3000, 3500])

# y_i values: house prices in thousands of dollars
# Each price corresponds to the house size at the same position.
# For example, x_inputs[0]=600 sqft has y_labels[0]=$120k
y_labels = np.array([120, 160, 200, 240, 310, 370, 420, 530, 620, 740])

# Let's see how many data samples we have.
# In our mathematical notation, this is 'n'.
n = len(x_inputs)
print(f"Number of data samples (n): {n}")
print()

# Let's display each (x_i, y_i) pair so you can see the structure.
print("Our labeled data samples (x_i, y_i):")
print("=" * 50)
# Loop through each example using enumerate, which gives us the index i
# along with each (x_i, y_i) pair.
for i, (x_i, y_i) in enumerate(zip(x_inputs, y_labels)):
    # i starts at 0 in Python, but we add 1 to match the math notation (1-indexed).
    print(f"  Sample {i+1:2d}:  x_{i+1} = {x_i:5d} sqft  -->  y_{i+1} = ${y_i:4d}k")

print()
print("Notice: Every input x_i is PAIRED with a label y_i.")
print("This is what makes it LABELED data!")

In [None]:
# =============================================================================
# EXAMPLE 2.3b: Let's also display this data as a nice table using pandas.
# =============================================================================

# Create a pandas DataFrame, which is like a spreadsheet / table.
# Each column has a name, and each row is one data sample.
data_table = pd.DataFrame({
    'Sample (i)': range(1, n + 1),           # Sample index: 1, 2, ..., n
    'Input x_i (sqft)': x_inputs,             # The input feature
    'Label y_i (price $k)': y_labels           # The correct output label
})

# Display the table. Here we print it as plain text; ending a cell with just
# `data_table` would instead render it as an HTML table in Jupyter.
print("Labeled Dataset as a Table:")
print()
# .to_string(index=False) prints the table without the default pandas row index
print(data_table.to_string(index=False))

---

## 2.4 The Goal of Machine Learning

> **Goal**: Given an **unseen** $x$ (an input the model has never seen before), predict the label $y$.

This is the fundamental purpose of supervised learning. Let us break it down:

1. **Training Phase**: The model studies the labeled data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
2. **Learning**: The model finds patterns in the data (e.g., "bigger houses tend to cost more")
3. **Prediction Phase**: Someone gives the model a brand new $x_{\text{new}}$ it has never seen
4. **Output**: The model predicts $\hat{y}$ (read as "y-hat"), its best guess for the label

### Important Notation

| Symbol | Meaning |
|--------|---------|
| $y$ | The **true** (correct) label |
| $\hat{y}$ | The **predicted** label (the model's guess). The hat ^ means "estimate". |

A good model makes predictions $\hat{y}$ that are very close to the true labels $y$.
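
One simple way to put a number on "close" is the average absolute gap between $y$ and $\hat{y}$. The sketch below uses made-up values: the arrays `y_true` and `y_hat` are hypothetical, and the mean absolute error is only one of several measures you could choose.

```python
import numpy as np

# Hypothetical true labels y and model predictions y-hat (prices in $k).
y_true = np.array([200.0, 310.0, 420.0])
y_hat = np.array([195.0, 320.0, 415.0])

# Error for each sample: how far each prediction is from the truth.
errors = y_hat - y_true

# Mean absolute error: the average size of the miss, ignoring direction.
mae = np.mean(np.abs(errors))

print("Per-sample errors:", errors)
print(f"Mean absolute error: {mae:.2f} ($k)")
```

A smaller value means the predictions sit closer to the true labels on average.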

### Real-World Analogy: Studying for an Exam

- **Training data** = Practice problems with answer keys
- **Learning** = Studying those practice problems to understand the patterns
- **Unseen x** = The actual exam questions (you have never seen these exact questions)
- **Prediction** = Your answers on the exam
- **Good model** = A well-prepared student who gets most exam answers right

---

In [None]:
# =============================================================================
# EXAMPLE 2.4: Demonstrating the Goal of ML -- Predict unseen data
# =============================================================================
# We will use our house price data to show the concept.
# The model learns from the 10 houses we showed it,
# then predicts the price of a house it has NEVER seen.
# =============================================================================

# Import LinearRegression from scikit-learn.
# This is one of the simplest ML models. Don't worry about the details yet;
# we will explain models in more depth later. For now, just think of it as
# a "learning machine" that finds patterns in data.
from sklearn.linear_model import LinearRegression

# STEP 1: Prepare the training data.
# sklearn expects the input X to be a 2D array (a table with rows and columns),
# even if we only have one feature. reshape(-1, 1) converts our 1D array
# into a 2D array with one column.
# Before reshape: [600, 800, 1000, ...] -- shape is (10,)
# After  reshape: [[600], [800], [1000], ...] -- shape is (10, 1)
X_train = x_inputs.reshape(-1, 1)
y_train = y_labels

print("STEP 1: Training data prepared")
print(f"  X_train shape: {X_train.shape}  (10 samples, 1 feature each)")
print(f"  y_train shape: {y_train.shape}  (10 labels)")
print()

# STEP 2: Create the model and train it.
# .fit() is the function that tells the model to LEARN from the data.
# This is the "training phase" or "learning phase".
model = LinearRegression()       # Create a new (untrained) model
model.fit(X_train, y_train)      # Train the model on our labeled data

print("STEP 2: Model has been trained (it studied our 10 labeled examples)")
print()

# STEP 3: Predict on UNSEEN data.
# Let's ask: what would a 1,750 sqft house cost?
# The model has never seen a 1,750 sqft house in the training data!
x_new = np.array([[1750]])  # New, unseen input (must be 2D for sklearn)

# .predict() asks the model to make a prediction.
y_predicted = model.predict(x_new)

print("STEP 3: Prediction on UNSEEN data")
print(f"  New input (x_new):    {x_new[0][0]} sqft")
print(f"  Predicted price (y^): ${y_predicted[0]:.1f}k")
print()
print("The model predicted a price for a house size it was NEVER trained on!")
print("This is the GOAL of machine learning: generalize to new, unseen data.")

In [None]:
# =============================================================================
# VISUALIZATION 2.4: Plot the training data and the prediction
# =============================================================================

# Create a figure (the blank canvas) and axes (the plotting area).
# figsize=(10, 6) means the plot will be 10 inches wide and 6 inches tall.
fig, ax = plt.subplots(figsize=(10, 6))

# Plot the training data as blue circles.
# scatter() draws circle markers by default; 's' is the marker area
# (in points squared), and 'zorder' controls which elements are drawn on top.
ax.scatter(x_inputs, y_labels, color='blue', s=100, zorder=5,
           label='Training data (labeled)', edgecolors='black')

# Plot the prediction as a red star.
ax.scatter(x_new[0][0], y_predicted[0], color='red', s=300, zorder=6,
           marker='*', label=f'Prediction for {x_new[0][0]} sqft = ${y_predicted[0]:.0f}k',
           edgecolors='black')

# Draw the line that the model learned (the pattern it found).
# We create a range of x values and predict y for each one.
x_line = np.linspace(400, 3800, 100).reshape(-1, 1)  # 100 points from 400 to 3800
y_line = model.predict(x_line)                          # Model's prediction for each
ax.plot(x_line, y_line, color='green', linewidth=2, linestyle='--',
        label='Learned pattern (model)', alpha=0.7)

# Add an arrow pointing to the prediction to highlight it.
ax.annotate('NEW unseen input!',
            xy=(x_new[0][0], y_predicted[0]),         # Point the arrow AT
            xytext=(x_new[0][0] + 400, y_predicted[0] + 80),  # Text position
            fontsize=12, color='red', fontweight='bold',
            arrowprops=dict(arrowstyle='->', color='red', lw=2))

# Label the axes so the reader knows what each axis represents.
ax.set_xlabel('House Size (sqft)', fontsize=14)
ax.set_ylabel('House Price ($k)', fontsize=14)
ax.set_title('The Goal of ML: Predict Labels for Unseen Inputs', fontsize=16)

# Add a legend to explain the colors/symbols.
ax.legend(fontsize=11, loc='upper left')

# Make the plot look clean.
plt.tight_layout()

# Display the plot.
plt.show()

print("\nBlue dots = training data the model LEARNED from.")
print("Green dashed line = the pattern the model discovered.")
print("Red star = prediction for new, UNSEEN data. That's the goal!")

---

## 2.5 Deep Learning: Neural Networks for Supervised Learning

The lecture slide mentions:

> **Deep Learning**: Input --> Neural Networks (input layer, hidden layers, output layer) --> Output

### What is Deep Learning?

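Deep learning is a **subset** (a special case) of machine learning that uses **neural networks** -- computational systems loosely inspired by the human brain. Neural networks can be trained in several ways; in this notebook we focus on training them with labeled data, i.e., supervised learning.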

```
Artificial Intelligence (AI)
  └── Machine Learning (ML)
        └── Deep Learning (DL)  <-- uses neural networks
```

### What is a Neural Network?

A neural network is made up of **layers** of interconnected "neurons" (mathematical units):

```
Input Layer          Hidden Layer(s)         Output Layer
  (x)                (processing)              (y-hat)
                                              
  [x1] ----\        /--[ h1 ]--\             
             \------/            \--------->  [y-hat]
  [x2] ----/--------\--[ h2 ]--/
                       
```

| Layer | Role | Analogy |
|-------|------|---------|
| **Input Layer** | Receives the raw data (features) | Your eyes seeing a photo |
| **Hidden Layer(s)** | Processes and transforms the data, extracting patterns | Your brain analyzing what it sees |
| **Output Layer** | Produces the final prediction | Your mouth saying "that's a cat" |

### Why is it called "Deep"?

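- A network with **many hidden layers** is called a "deep" neural network.
- More layers generally let the network learn more complex patterns, at the cost of needing more data and computation to train.
- Example: A simple network might have 2-3 layers. A deep network can have dozens of layers or more; the largest GPT-3 model, a predecessor of the models behind ChatGPT, stacks 96 layers.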

### Real-World Analogy: A Factory Assembly Line

Think of a neural network as a factory:

- **Input layer** = Raw materials arriving at the factory (raw data)
- **Hidden layers** = Workers at different stations who each do a specific job (processing, transforming, refining the data)
- **Output layer** = The finished product that comes off the assembly line (the prediction)

Each station (layer) takes the output from the previous station, does some processing, and passes the result to the next. The more stations (layers) you have, the more complex the product (prediction) you can make.
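
The flow from input layer to output layer can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the layer sizes (3 inputs, 4 hidden neurons, 1 output), the ReLU activation, and the variable names are choices made here, and the weights are random placeholders rather than learned values.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the numbers are repeatable

# One input example with 3 features (x1, x2, x3).
x = np.array([0.5, -1.2, 3.0])

# Weights and biases are the numbers a network learns during training.
# Here they are random placeholders, so the output is not meaningful.
W_hidden = rng.normal(size=(4, 3))   # 4 hidden neurons, each sees 3 inputs
b_hidden = np.zeros(4)
W_output = rng.normal(size=(1, 4))   # 1 output neuron, sees 4 hidden values
b_output = np.zeros(1)

def relu(z):
    # A common activation function: keep positives, replace negatives with 0.
    return np.maximum(z, 0)

# Hidden layer: weighted sum of the inputs, then the activation.
h = relu(W_hidden @ x + b_hidden)

# Output layer: weighted sum of the hidden values gives the prediction y-hat.
y_hat = W_output @ h + b_output

print("Hidden values h:", h)
print("Prediction y-hat:", y_hat)
```

Training a network means adjusting `W_hidden`, `b_hidden`, `W_output`, and `b_output` until `y_hat` lands close to the true labels.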

---

In [None]:
# =============================================================================
# VISUALIZATION 2.5: Draw a simple neural network diagram
# =============================================================================
# We will draw a simple neural network with:
#   - 3 input neurons (input layer)
#   - 4 hidden neurons (hidden layer)
#   - 1 output neuron (output layer)
# =============================================================================

fig, ax = plt.subplots(figsize=(12, 7))

# Define the x-position of each layer (left to right).
layer_x = [0.15, 0.5, 0.85]  # Input layer at 0.15, hidden at 0.5, output at 0.85

# Define the y-positions of neurons in each layer.
# More neurons = more y-positions.
input_neurons_y  = [0.25, 0.5, 0.75]           # 3 input neurons
hidden_neurons_y = [0.15, 0.38, 0.62, 0.85]    # 4 hidden neurons
output_neurons_y = [0.5]                         # 1 output neuron

# Choose colors for each layer.
input_color  = '#4CAF50'  # Green
hidden_color = '#2196F3'  # Blue
output_color = '#FF5722'  # Red-orange

# --- Draw connections (lines) between neurons ---
# Draw lines from every input neuron to every hidden neuron.
for iy in input_neurons_y:
    for hy in hidden_neurons_y:
        # Each line represents a "weight" -- a number the model learns.
        ax.plot([layer_x[0], layer_x[1]], [iy, hy],
                color='gray', alpha=0.3, linewidth=1)

# Draw lines from every hidden neuron to every output neuron.
for hy in hidden_neurons_y:
    for oy in output_neurons_y:
        ax.plot([layer_x[1], layer_x[2]], [hy, oy],
                color='gray', alpha=0.3, linewidth=1)

# --- Draw neurons (circles) ---
neuron_size = 800  # Size of each circle

# Input layer neurons
for i, iy in enumerate(input_neurons_y):
    ax.scatter(layer_x[0], iy, s=neuron_size, color=input_color,
               edgecolors='black', linewidth=2, zorder=5)
    ax.text(layer_x[0], iy, f'x{i+1}', ha='center', va='center',
            fontsize=14, fontweight='bold', color='white')

# Hidden layer neurons
for i, hy in enumerate(hidden_neurons_y):
    ax.scatter(layer_x[1], hy, s=neuron_size, color=hidden_color,
               edgecolors='black', linewidth=2, zorder=5)
    ax.text(layer_x[1], hy, f'h{i+1}', ha='center', va='center',
            fontsize=14, fontweight='bold', color='white')

# Output layer neuron
for i, oy in enumerate(output_neurons_y):
    ax.scatter(layer_x[2], oy, s=neuron_size, color=output_color,
               edgecolors='black', linewidth=2, zorder=5)
    ax.text(layer_x[2], oy, r'$\hat{y}$', ha='center', va='center',
            fontsize=16, fontweight='bold', color='white')

# --- Add layer labels ---
ax.text(layer_x[0], 0.0, 'Input\nLayer', ha='center', va='center',
        fontsize=14, fontweight='bold', color=input_color)
ax.text(layer_x[1], 0.0, 'Hidden\nLayer', ha='center', va='center',
        fontsize=14, fontweight='bold', color=hidden_color)
ax.text(layer_x[2], 0.0, 'Output\nLayer', ha='center', va='center',
        fontsize=14, fontweight='bold', color=output_color)

# --- Add arrows showing data flow ---
ax.annotate('', xy=(0.08, 0.5), xytext=(-0.02, 0.5),
            arrowprops=dict(arrowstyle='->', lw=3, color='black'))
ax.text(-0.04, 0.5, 'Input\n(x)', ha='center', va='center',
        fontsize=13, fontweight='bold')

ax.annotate('', xy=(1.0, 0.5), xytext=(0.92, 0.5),
            arrowprops=dict(arrowstyle='->', lw=3, color='black'))
ax.text(1.04, 0.5, 'Output\n(prediction)', ha='center', va='center',
        fontsize=13, fontweight='bold')

# Remove axes since this is a diagram, not a data plot.
ax.set_xlim(-0.12, 1.12)
ax.set_ylim(-0.1, 1.0)
ax.axis('off')
ax.set_title('Deep Learning: Neural Network Architecture\n(Input Layer --> Hidden Layer --> Output Layer)',
             fontsize=16, fontweight='bold')

plt.tight_layout()
plt.show()

print("The diagram above shows a simple neural network.")
print("- Green nodes (x1, x2, x3): INPUT layer -- receives raw data")
print("- Blue nodes (h1-h4): HIDDEN layer -- processes and transforms data")
print("- Red node (y-hat): OUTPUT layer -- produces the final prediction")
print("- Gray lines: CONNECTIONS (weights) that the model learns during training")

---

### Summary of Slide 2: What is Supervised Learning?

| Concept | Key Takeaway |
|---------|--------------|
| Supervised Learning | A type of ML where the model learns from data that has known correct answers (labels). |
| Data format | Pairs of (input, label): $(x_i, y_i)$ for $i = 1, \ldots, n$ |
| Input ($x_i$) | The information fed into the model (features) |
| Output ($y_i$) | The correct answer (label/target) the model tries to learn |
| Goal of ML | Given a NEW, unseen input $x$, predict its label $y$ accurately |
| Deep Learning | Uses neural networks with input, hidden, and output layers |

---
---

# SLIDE 3: Examples of Supervised Learning

---

Supervised learning is used in a huge variety of real-world applications. The lecture slide lists the following examples. Let us go through each one in detail to understand what the **input** is, what the **label** is, and why it matters.

---

## 3.1 Comprehensive Table of Examples

| # | Application | Input ($x$) | Label ($y$) | Industry | Why It Matters |
|---|-------------|-------------|-------------|----------|----------------|
| 1 | **Image Classification** | A photograph (pixels) | Category ("cat", "dog", "car") | Tech, Healthcare | Self-driving cars recognizing stop signs; doctors detecting tumors in X-rays |
| 2 | **Document Categorization** | A text document | Category ("sports", "politics", "science") | Media, Legal | Automatically sorting news articles; organizing legal documents |
| 3 | **Speech Recognition** | Audio waveform | Text transcript | Tech, Accessibility | Siri, Alexa, Google Assistant converting your voice to text |
| 4 | **Protein Classification** | Protein structure/sequence | Protein family or function | Biotech, Pharma | Drug discovery; understanding diseases at the molecular level |
| 5 | **Spam Detection** | An email (text, metadata) | "spam" or "not spam" | Email, Security | Filtering junk emails so your inbox stays clean |
| 6 | **Branch Prediction** | CPU instruction history | Next branch direction (taken/not taken) | Computer Architecture | Making your computer's CPU run faster |
| 7 | **Fraud Detection** | Credit card transaction | "fraudulent" or "legitimate" | Finance, Banking | Protecting your bank account from unauthorized charges |
| 8 | **Natural Language Processing (NLP)** | Text (sentence, document) | Sentiment, translation, entity | Tech, Business | Chatbots, language translation, sentiment analysis of reviews |
| 9 | **Playing Games** | Game state (board, score) | Best next move or outcome | AI Research, Entertainment | AlphaGo, which learned in part from labeled expert moves, beating world champions at the game of Go |
| 10 | **Computational Advertising** | User profile + webpage context | "click" or "no click" | Marketing, Tech | Showing you ads that are relevant to your interests |

---

## 3.2 Detailed Walkthrough of Selected Examples

Let us dive deeper into a few of these examples to really solidify the concept.

---

### Example A: Image Classification

**Problem**: Given a photograph, identify what object is in it.

- **Input ($x$)**: An image, which to a computer is a grid of numbers (pixel values). A 28x28 grayscale image is 784 numbers.
- **Label ($y$)**: The name of the object, e.g., "cat", "dog", "airplane".
- **Training**: Show the model thousands of labeled images ("this image is a cat", "this one is a dog").
- **Prediction**: Give the model a new photo it has never seen, and it tells you what it thinks it is.

**Why it's supervised**: Each training image comes WITH its correct label.
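
To see why a 28x28 image is "784 numbers", the sketch below flattens a pixel grid into a single input vector. The image here is random noise standing in for a real photo, and the variable names are illustrative.

```python
import numpy as np

# A hypothetical 28x28 grayscale image: each entry is a pixel brightness
# from 0 (black) to 255 (white). Random values stand in for a real photo.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28))

# Flatten the 2D grid into one long row of numbers (the input x).
x = image.reshape(-1)

print("Image shape:", image.shape)
print("Input vector length:", x.shape[0])
```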

### Example B: Spam Detection

**Problem**: Is this email spam or not?

- **Input ($x$)**: Features of the email -- word frequencies, sender address, presence of links, etc.
- **Label ($y$)**: "spam" or "not spam" (also called "ham").
- **Training**: Show the model thousands of emails that humans have already labeled as spam or not spam.
- **Prediction**: When a new email arrives, the model classifies it instantly.

**Why it's supervised**: Humans provided the correct labels for the training emails.
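
A model cannot read raw text directly, so the email first has to be turned into numbers. The helper below is a hypothetical sketch of that step: `extract_features` and its counting rules are simplifying assumptions, not a production-quality feature extractor.

```python
def extract_features(email_text):
    # Hypothetical helper: turn raw email text into four numbers.
    words = email_text.split()
    # Lowercase each word and trim surrounding punctuation for matching.
    lowered = [w.lower().strip('.,!?') for w in words]
    return [
        email_text.count('!'),                              # exclamation marks
        sum(w.lower().startswith('http') for w in words),   # links
        lowered.count('free'),                              # times "free" appears
        len(words),                                         # word count
    ]

sample = "FREE prize!!! Click http://example.com now, it is free!"
print(extract_features(sample))  # [4, 1, 2, 8]
```

Each email becomes one row of numbers (the input $x$), which is paired with its human-provided label $y$.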

### Example C: Fraud Detection

**Problem**: Is this credit card transaction fraudulent?

- **Input ($x$)**: Transaction details -- amount, time, location, merchant type, whether card was present.
- **Label ($y$)**: "fraudulent" or "legitimate".
- **Training**: The model studies millions of past transactions where investigators already determined if fraud occurred.
- **Prediction**: When a new transaction happens, the model flags it in real-time if it looks suspicious.

**Why it's supervised**: Historical transactions were labeled by fraud investigators.

---

### Real-World Analogy: A Doctor Learning to Diagnose

Think of all these examples like a medical student learning to diagnose diseases:

1. **Training** (medical school): The student studies thousands of patient cases where the diagnosis is already known. X-ray showing a dark spot? The professor says "that's a tumor." (labeled data)
2. **Practice** (residency): The student gets better and better at recognizing patterns.
3. **Real world** (practicing doctor): A new patient walks in with a new X-ray. The doctor uses their training to predict the diagnosis.

Every single supervised learning application follows this same pattern: learn from labeled examples, then predict on new data.

---

In [None]:
# =============================================================================
# EXAMPLE 3.2: Let's build a real Spam Detection classifier!
# =============================================================================
# We will create a tiny (but real) spam detector using made-up email features.
# This demonstrates the concepts from Slide 3 with actual running code.
# =============================================================================

# --- Step 1: Create our labeled training data ---
# We simulate email features:
#   Feature 1: Number of exclamation marks in the email
#   Feature 2: Number of links in the email
#   Feature 3: Number of times "free" appears
#   Feature 4: Length of the email (number of words)
#
# Label: 1 = spam, 0 = not spam (ham)

# Each row is one email: [exclamation_marks, links, "free"_count, word_count]
email_features = np.array([
    # --- Spam emails (label = 1) ---
    [15, 8, 5, 50],    # Many exclamation marks, lots of links, says "free" a lot
    [12, 6, 4, 40],
    [20, 10, 7, 30],
    [18, 9, 6, 45],
    [10, 7, 3, 35],
    [25, 12, 8, 20],
    [14, 5, 4, 55],
    [22, 11, 9, 25],
    # --- Legitimate (ham) emails (label = 0) ---
    [1, 1, 0, 200],    # Few exclamation marks, few links, no "free", longer email
    [0, 0, 0, 150],
    [2, 2, 0, 300],
    [1, 0, 0, 180],
    [3, 1, 1, 250],
    [0, 1, 0, 120],
    [2, 0, 0, 400],
    [1, 2, 0, 350],
])

# Labels: 1 = spam, 0 = not spam
# The first 8 emails are spam, the last 8 are legitimate.
email_labels = np.array([1, 1, 1, 1, 1, 1, 1, 1,
                         0, 0, 0, 0, 0, 0, 0, 0])

# Let's see the data in a nice table.
email_df = pd.DataFrame(email_features,
                        columns=['Exclamation Marks', 'Links', '"Free" Count', 'Word Count'])
email_df['Label'] = ['SPAM' if l == 1 else 'HAM' for l in email_labels]

print("Our Labeled Email Dataset:")
print("=" * 65)
print(email_df.to_string(index=True))
print()
print(f"Total emails: {len(email_labels)}")
print(f"Spam emails:  {sum(email_labels == 1)}")
print(f"Ham emails:   {sum(email_labels == 0)}")

In [None]:
# =============================================================================
# EXAMPLE 3.2b: Train a spam classifier and make predictions
# =============================================================================

# Import a simple classifier: Decision Tree.
# A decision tree makes predictions by asking yes/no questions about the data.
# Example: "Does the email have > 5 exclamation marks? If yes -> likely spam"
from sklearn.tree import DecisionTreeClassifier

# Create and train the model.
spam_model = DecisionTreeClassifier(random_state=42)  # Create the model
spam_model.fit(email_features, email_labels)           # Train on labeled data

print("Spam detection model has been trained!")
print()

# --- Now let's predict on NEW, UNSEEN emails ---
# Email A: Looks like spam (many exclamation marks, links, says "free")
new_email_A = np.array([[16, 7, 5, 35]])
# Email B: Looks legitimate (few exclamation marks, no "free", long email)
new_email_B = np.array([[1, 1, 0, 280]])
# Email C: Ambiguous case
new_email_C = np.array([[5, 3, 1, 100]])

# Make predictions. predict() returns 0 or 1.
pred_A = spam_model.predict(new_email_A)[0]
pred_B = spam_model.predict(new_email_B)[0]
pred_C = spam_model.predict(new_email_C)[0]

# Display predictions.
print("Predictions on UNSEEN emails:")
print("=" * 65)
print(f"  Email A (16 !, 7 links, 5 'free', 35 words): {'SPAM' if pred_A else 'HAM'}")
print(f"  Email B ( 1 !, 1 link,  0 'free', 280 words): {'SPAM' if pred_B else 'HAM'}")
print(f"  Email C ( 5 !, 3 links, 1 'free', 100 words): {'SPAM' if pred_C else 'HAM'}")
print()
print("The model used patterns from the LABELED training emails to classify")
print("these NEW emails it had never seen before. This is supervised learning!")
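
To see which yes/no questions a decision tree actually learned, scikit-learn can print the fitted tree as text. The sketch below is self-contained and uses a subset of the toy emails above; the feature names are labels chosen here for readability. The question the tree picks may differ from the "> 5 exclamation marks" example, since several features separate this tiny dataset equally well.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# A subset of the toy emails:
# [exclamation_marks, links, "free"_count, word_count]
X = np.array([
    [15, 8, 5, 50], [12, 6, 4, 40], [20, 10, 7, 30], [18, 9, 6, 45],   # spam
    [1, 1, 0, 200], [0, 0, 0, 150], [2, 2, 0, 300], [1, 0, 0, 180],    # ham
])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])  # 1 = spam, 0 = ham

tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# export_text prints the learned yes/no questions as indented text.
feature_names = ['exclamation_marks', 'links', 'free_count', 'word_count']
rules = export_text(tree, feature_names=feature_names)
print(rules)
```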

In [None]:
# =============================================================================
# VISUALIZATION 3.2: Visualize spam vs ham emails
# =============================================================================

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# --- Plot 1: Exclamation marks vs Links ---
ax1 = axes[0]
# Separate spam and ham for coloring.
spam_mask = email_labels == 1  # Boolean mask: True for spam rows
ham_mask  = email_labels == 0  # Boolean mask: True for ham rows

# Plot spam emails as red X markers.
ax1.scatter(email_features[spam_mask, 0], email_features[spam_mask, 1],
            color='red', marker='x', s=150, linewidth=3, label='Spam', zorder=5)
# Plot ham emails as blue circle markers.
ax1.scatter(email_features[ham_mask, 0], email_features[ham_mask, 1],
            color='blue', marker='o', s=100, label='Ham (not spam)', zorder=5,
            edgecolors='black')

ax1.set_xlabel('Number of Exclamation Marks', fontsize=12)
ax1.set_ylabel('Number of Links', fontsize=12)
ax1.set_title('Spam vs Ham: Exclamation Marks vs Links', fontsize=13)
ax1.legend(fontsize=11)

# --- Plot 2: "Free" count vs Word count ---
ax2 = axes[1]
ax2.scatter(email_features[spam_mask, 2], email_features[spam_mask, 3],
            color='red', marker='x', s=150, linewidth=3, label='Spam', zorder=5)
ax2.scatter(email_features[ham_mask, 2], email_features[ham_mask, 3],
            color='blue', marker='o', s=100, label='Ham (not spam)', zorder=5,
            edgecolors='black')

ax2.set_xlabel('"Free" Word Count', fontsize=12)
ax2.set_ylabel('Email Length (words)', fontsize=12)
ax2.set_title('Spam vs Ham: "Free" Count vs Email Length', fontsize=13)
ax2.legend(fontsize=11)

plt.tight_layout()
plt.show()

print("\nNotice the clear separation between spam (red X) and ham (blue dots):")
print("  - Spam emails tend to have MORE exclamation marks, links, and 'free' mentions")
print("  - Legitimate emails tend to be LONGER and have fewer spammy features")
print("  - The ML model learns these patterns automatically from labeled data!")

---

### Summary of Slide 3: Examples of Supervised Learning

| Key Insight | Explanation |
|-------------|-------------|
| Supervised learning is everywhere | From email (spam detection) to healthcare (protein classification) to finance (fraud detection) |
| The pattern is always the same | Collect labeled data, train a model, predict on new data |
| Different inputs, same idea | Whether the input is an image, text, audio, or numbers, the supervised learning framework is identical |
| Labels are the key | What makes it "supervised" is that we have correct answers to learn from |

---
---

# SLIDE 4: Supervised Learning Workflow

---

Now that we understand WHAT supervised learning is and WHERE it is used, let us understand HOW it works step by step.

## 4.1 The Workflow Diagram

The lecture slide shows this workflow:

```
                 TRAINING PHASE
    ┌──────────────────────────────────────┐
    │                                      │
    │  Labeled Data ──┐                    │
    │                 ├──> Machine ──> ML Model
    │  Labels ────────┘   (Learning         │
    │                      Algorithm)       │
    └──────────────────────────────────────┘
                                            
                 PREDICTION PHASE            
    ┌──────────────────────────────────────┐
    │                                      │
    │  Test Data ──> ML Model ──> Predictions
    │                (trained)              │
    └──────────────────────────────────────┘
```

## 4.2 Each Step Explained in Detail

### Step 1: Labeled Data + Labels

- **What**: We gather a dataset where every example has a known correct answer.
- **Example**: 10,000 emails, each labeled as "spam" or "not spam" by a human.
- **Why this step matters**: Without labeled data, the model has nothing to learn from. This is often the most time-consuming and expensive step!

### Step 2: Machine (Learning Algorithm)

- **What**: A mathematical algorithm that examines the labeled data and discovers patterns.
- **Example**: The algorithm notices that emails with many exclamation marks and the word "free" tend to be spam.
- **Why this step matters**: This is where the actual "learning" happens. Different algorithms find patterns in different ways.

### Step 3: ML Model (The Trained Model)

- **What**: The result of training. It is a mathematical function that can make predictions.
- **Example**: A function that takes email features as input and outputs "spam" or "not spam".
- **Why this step matters**: The model encodes everything the algorithm learned. It can be saved and reused.
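
The "saved and reused" idea can be sketched with Python's built-in `pickle` module. This is a minimal illustration, not the lecture's method: the toy email data, the choice of a KNN classifier, and every variable name below are assumptions made up for the example.

```python
import pickle

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy labeled data (made up for this sketch): [exclamation marks, links] per email.
toy_emails = np.array([[8, 5], [7, 6], [9, 4], [0, 1], [1, 0], [0, 0]])
toy_labels = np.array([1, 1, 1, 0, 0, 0])  # 1 = spam, 0 = not spam

# Train a small model on the toy data.
toy_model = KNeighborsClassifier(n_neighbors=3)
toy_model.fit(toy_emails, toy_labels)

# Serialize the trained model to bytes (these bytes could be written to a file).
saved_bytes = pickle.dumps(toy_model)

# Later, or in another program: restore the model and predict without retraining.
restored_model = pickle.loads(saved_bytes)
new_email = np.array([[6, 5]])
print("Original model predicts:", toy_model.predict(new_email))
print("Restored model predicts:", restored_model.predict(new_email))
```

The restored model gives the same answer as the original because everything it learned travels with it. Only unpickle data from sources you trust, since loading a pickle can execute arbitrary code.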

### Step 4: Test Data (New, Unseen Data)

- **What**: New examples that the model has NEVER seen during training.
- **Example**: New emails arriving in your inbox right now.
- **Why this step matters**: This is the real test -- can the model generalize to data it was not trained on?

### Step 5: Predictions

- **What**: The model's output for the test data.
- **Example**: "This new email is spam" or "This new email is not spam".
- **Why this step matters**: The whole point of the process! We evaluate how accurate these predictions are.

---

### Real-World Analogy: A Cooking School

| ML Workflow Step | Cooking Analogy |
|-----------------|----------------|
| Labeled Data + Labels | Recipe books with photos (input = ingredients, label = dish name) |
| Machine (Learning Algorithm) | The cooking school curriculum (the process of learning) |
| ML Model | The trained chef (has internalized the patterns) |
| Test Data | New ingredients the chef has never used before |
| Predictions | The dish the chef creates from new ingredients |

---

In [None]:
# =============================================================================
# VISUALIZATION 4.1: Draw the Supervised Learning Workflow
# =============================================================================
# We will create a diagram that matches the slide's workflow.
# =============================================================================

fig, ax = plt.subplots(figsize=(14, 8))

# FancyBboxPatch (boxes with rounded corners) lives in matplotlib.patches.
import matplotlib.patches as mpatches

# Helper function to draw a box with text.
def draw_box(ax, x, y, width, height, text, color, fontsize=12, text_color='white'):
    """Draws a rounded rectangle with centered text."""
    # pad is measured in data units. The axes here span roughly 0 to 1,
    # so a small pad (0.01) keeps neighboring boxes from overlapping and
    # lines the box edges up with the arrow endpoints used below.
    box = mpatches.FancyBboxPatch(
        (x - width/2, y - height/2), width, height,
        boxstyle="round,pad=0.01", facecolor=color,
        edgecolor='black', linewidth=2
    )
    ax.add_patch(box)
    ax.text(x, y, text, ha='center', va='center',
            fontsize=fontsize, fontweight='bold', color=text_color)

# --- TRAINING PHASE (top half) ---
ax.text(0.5, 0.95, 'TRAINING PHASE', ha='center', va='center',
        fontsize=18, fontweight='bold', color='darkblue',
        bbox=dict(boxstyle='round', facecolor='lightyellow', edgecolor='darkblue', linewidth=2))

draw_box(ax, 0.12, 0.78, 0.18, 0.1, 'Labeled\nData', '#4CAF50')
draw_box(ax, 0.12, 0.62, 0.18, 0.1, 'Labels', '#4CAF50')
draw_box(ax, 0.42, 0.70, 0.2, 0.12, 'Learning\nAlgorithm', '#2196F3')
draw_box(ax, 0.75, 0.70, 0.18, 0.12, 'ML\nModel', '#FF9800')

# Arrows for training phase.
ax.annotate('', xy=(0.31, 0.74), xytext=(0.22, 0.78),
            arrowprops=dict(arrowstyle='->', lw=2.5, color='black'))
ax.annotate('', xy=(0.31, 0.66), xytext=(0.22, 0.62),
            arrowprops=dict(arrowstyle='->', lw=2.5, color='black'))
ax.annotate('', xy=(0.65, 0.70), xytext=(0.53, 0.70),
            arrowprops=dict(arrowstyle='->', lw=2.5, color='black'))

# --- PREDICTION PHASE (bottom half) ---
ax.text(0.5, 0.42, 'PREDICTION PHASE', ha='center', va='center',
        fontsize=18, fontweight='bold', color='darkred',
        bbox=dict(boxstyle='round', facecolor='lightyellow', edgecolor='darkred', linewidth=2))

draw_box(ax, 0.15, 0.25, 0.18, 0.1, 'Test\nData', '#9C27B0')
draw_box(ax, 0.48, 0.25, 0.22, 0.12, 'Trained\nML Model', '#FF9800')
draw_box(ax, 0.82, 0.25, 0.2, 0.1, 'Predictions', '#F44336')

# Arrows for prediction phase.
ax.annotate('', xy=(0.36, 0.25), xytext=(0.25, 0.25),
            arrowprops=dict(arrowstyle='->', lw=2.5, color='black'))
ax.annotate('', xy=(0.71, 0.25), xytext=(0.60, 0.25),
            arrowprops=dict(arrowstyle='->', lw=2.5, color='black'))

# Arrow connecting training to prediction (the model is reused).
ax.annotate('', xy=(0.53, 0.32), xytext=(0.75, 0.63),
            arrowprops=dict(arrowstyle='->', lw=2, color='gray',
                           linestyle='dashed'))
ax.text(0.72, 0.48, 'Model is\nreused', ha='center', va='center',
        fontsize=10, color='gray', fontstyle='italic')

# Clean up.
ax.set_xlim(0, 1)
ax.set_ylim(0.1, 1.05)
ax.axis('off')
ax.set_title('Supervised Learning Workflow (Slide 4)', fontsize=20, fontweight='bold', pad=10)

plt.tight_layout()
plt.show()

In [None]:
# =============================================================================
# EXAMPLE 4.2: Full supervised learning workflow in code
# =============================================================================
# We will demonstrate the COMPLETE workflow end-to-end using
# the famous Iris dataset (classifying flowers based on measurements).
# =============================================================================

# Import the dataset and tools we need.
from sklearn.datasets import load_iris                  # A famous labeled dataset
from sklearn.model_selection import train_test_split     # Splits data into train/test
from sklearn.neighbors import KNeighborsClassifier       # A simple ML algorithm
from sklearn.metrics import accuracy_score               # Measures prediction quality

# ---- STEP 1: Get labeled data ----
# The Iris dataset has measurements of 150 iris flowers.
# Each flower is labeled as one of 3 species: setosa, versicolor, virginica.
iris = load_iris()

# X = inputs (features): sepal length, sepal width, petal length, petal width
X = iris.data
# y = labels (targets): 0 = setosa, 1 = versicolor, 2 = virginica
y = iris.target

print("==== STEP 1: Labeled Data ====")
print(f"Total number of samples: {len(X)}")
print(f"Number of features per sample: {X.shape[1]}")
print(f"Feature names: {iris.feature_names}")
print(f"Class names: {list(iris.target_names)}")
print(f"\nFirst 5 samples (inputs):")
for i in range(5):
    print(f"  x_{i+1} = {X[i]}  -->  y_{i+1} = {iris.target_names[y[i]]}")
print()

In [None]:
# ---- STEP 2: Split data into training set and test set ----
# WHY split? We need to keep some data the model has NEVER seen,
# so we can honestly evaluate how well it generalizes.
#
# train_test_split randomly divides the data:
#   - 80% for training (the model learns from these)
#   - 20% for testing (the model has never seen these)
#
# test_size=0.2 means 20% goes to the test set.
# random_state=42 ensures reproducible results.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("==== STEP 2: Split into Training and Test Sets ====")
print(f"Training samples: {len(X_train)} (used to LEARN)")
print(f"Test samples:     {len(X_test)} (used to EVALUATE, model never sees these)")
print()
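
To see what "randomly divides the data" means, here is a rough sketch of the idea behind an 80/20 split using only NumPy. It mimics the concept rather than the exact shuffling scikit-learn performs, and the names (`n_total`, `train_idx`, `test_idx`) are hypothetical.

```python
import numpy as np

# Illustrative stand-in (an assumption): 150 row indices, the same size as the Iris dataset.
n_total = 150
sample_indices = np.arange(n_total)

# Shuffle the row indices with a seeded generator so the result is reproducible.
rng = np.random.default_rng(42)
shuffled = rng.permutation(sample_indices)

# Hold out the first 20% of the shuffled rows for testing; the rest is for training.
n_test = int(0.2 * n_total)
test_idx = shuffled[:n_test]
train_idx = shuffled[n_test:]

print(f"Training rows: {len(train_idx)}, test rows: {len(test_idx)}")
print(f"Rows appearing in both sets: {len(np.intersect1d(train_idx, test_idx))}")
```

The key property is that no row lands in both sets, so the test rows really are unseen during training.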

In [None]:
# ---- STEP 3: Choose a learning algorithm and train the model ----
# We use K-Nearest Neighbors (KNN), one of the simplest ML algorithms.
# KNN works by finding the K closest training examples to a new input,
# and predicting the most common label among those neighbors.
#
# Think of it like this: if you don't know what a flower is,
# look at the 5 most similar flowers you've seen before, and go with
# whatever species they mostly are.

# Create the model with k=5 (look at 5 nearest neighbors).
knn_model = KNeighborsClassifier(n_neighbors=5)

# Train the model. .fit() is the standard method name for training in sklearn.
# This step is where the "Machine" (learning algorithm) processes the
# "Labeled Data + Labels" from the workflow diagram.
# Note: for KNN, "training" mostly means storing the training examples;
# the real work (finding the nearest neighbors) happens at prediction time.
knn_model.fit(X_train, y_train)

print("==== STEP 3: Learning Algorithm --> ML Model ====")
print("Algorithm: K-Nearest Neighbors (k=5)")
print(f"The model has been trained on the {len(X_train)} training examples.")
print("It is now ready to make predictions on new data.")
print()
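
The nearest-neighbor idea described in the comments above can be written out by hand in a few lines. This is a simplified sketch for intuition, not how scikit-learn implements KNN internally; the function `knn_predict` and the toy data are hypothetical.

```python
from collections import Counter

import numpy as np


def knn_predict(train_inputs, train_labels, new_input, k=5):
    """Predict a label by majority vote among the k closest training examples."""
    # Straight-line (Euclidean) distance from the new input to every training example.
    distances = np.linalg.norm(train_inputs - new_input, axis=1)
    # Row positions of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Count the labels of those neighbors and return the most common one.
    votes = Counter(train_labels[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy data (made up): two well-separated groups of points.
toy_X = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
                  [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]])
toy_y = np.array([0, 0, 0, 1, 1, 1])

toy_prediction = knn_predict(toy_X, toy_y, np.array([1.1, 1.0]), k=3)
print("Predicted class for the new point:", toy_prediction)
```

The new point sits inside the first group, so all three of its nearest neighbors vote for class 0.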

In [None]:
# ---- STEP 4: Feed test data to the trained model ----
# The test data (X_test) represents new, unseen flowers.
# The model predicts the species for each one.

# .predict() makes predictions for all test samples at once.
y_predicted = knn_model.predict(X_test)

print("==== STEP 4: Test Data --> Trained ML Model --> Predictions ====")
print()
print(f"{'Sample':<8} {'True Label':<15} {'Predicted Label':<18} {'Correct?':<10}")
print("=" * 55)
# Show predictions for the first 15 test samples.
for i in range(min(15, len(y_test))):
    true_name = iris.target_names[y_test[i]]       # The actual species
    pred_name = iris.target_names[y_predicted[i]]   # The model's guess
    correct = "Yes" if y_test[i] == y_predicted[i] else "No"
    print(f"  {i+1:<6} {true_name:<15} {pred_name:<18} {correct:<10}")
print("  ...")
print()

In [None]:
# ---- STEP 5: Evaluate the predictions ----
# We compare the model's predictions (y_predicted) to the actual labels (y_test).
# accuracy_score calculates what percentage of predictions are correct.

accuracy = accuracy_score(y_test, y_predicted)

print("==== STEP 5: Evaluate Predictions ====")
print(f"Number of correct predictions: {sum(y_test == y_predicted)} out of {len(y_test)}")
print(f"Accuracy: {accuracy * 100:.1f}%")
print()
if accuracy >= 0.9:
    print("Excellent! The model correctly classified most unseen flowers.")
    print("This shows the model GENERALIZED well from the training data.")
else:
    print("The model could use improvement, but the workflow concept is the same.")

print()
print("=" * 60)
print("COMPLETE WORKFLOW RECAP:")
print("  1. Labeled Data: 150 iris flowers with known species")
print("  2. Split: 120 for training, 30 for testing")
print("  3. Machine: K-Nearest Neighbors algorithm learns patterns")
print("  4. ML Model: The trained KNN model")
print("  5. Test Data --> Model --> Predictions (evaluated for accuracy)")
print("=" * 60)
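
Accuracy is simple enough to compute by hand: it is the fraction of predictions that match the true labels. The sketch below uses made-up labels (`example_true`, `example_pred`) and checks the result against `accuracy_score`.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true labels and model predictions for 10 test samples.
example_true = np.array([0, 1, 2, 1, 0, 2, 2, 1, 0, 1])
example_pred = np.array([0, 1, 2, 2, 0, 2, 1, 1, 0, 1])

# Element-wise comparison gives True where the prediction is correct.
matches = example_true == example_pred

# The mean of a True/False array is the fraction of True values.
manual_accuracy = matches.mean()

print(f"Correct predictions: {matches.sum()} of {len(matches)}")
print(f"Accuracy by hand:        {manual_accuracy:.2f}")
print(f"Accuracy from sklearn:   {accuracy_score(example_true, example_pred):.2f}")
```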

In [None]:
# =============================================================================
# VISUALIZATION 4.2: Visualize the Iris dataset and predictions
# =============================================================================

fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# --- Plot 1: Training data (what the model learned from) ---
ax1 = axes[0]
# We plot using the first two features: sepal length (column 0) and sepal width (column 1)
colors = ['green', 'blue', 'orange']
for class_idx in range(3):
    # Select rows where the label equals class_idx
    mask = y_train == class_idx
    ax1.scatter(X_train[mask, 0], X_train[mask, 1],
               color=colors[class_idx], label=iris.target_names[class_idx],
               s=80, edgecolors='black', alpha=0.7)

ax1.set_xlabel(iris.feature_names[0], fontsize=12)
ax1.set_ylabel(iris.feature_names[1], fontsize=12)
ax1.set_title('Training Data (Model Learned From These)', fontsize=14)
ax1.legend(fontsize=11)

# --- Plot 2: Test data predictions ---
ax2 = axes[1]
for class_idx in range(3):
    # Color by PREDICTED labels to show model's predictions.
    mask = y_predicted == class_idx
    ax2.scatter(X_test[mask, 0], X_test[mask, 1],
               color=colors[class_idx], label=iris.target_names[class_idx],
               s=80, edgecolors='black', alpha=0.7)

# Mark incorrect predictions with red X markers.
incorrect_mask = y_test != y_predicted
if incorrect_mask.any():
    ax2.scatter(X_test[incorrect_mask, 0], X_test[incorrect_mask, 1],
               color='red', marker='x', s=200, linewidth=3,
               label='WRONG prediction', zorder=10)

ax2.set_xlabel(iris.feature_names[0], fontsize=12)
ax2.set_ylabel(iris.feature_names[1], fontsize=12)
ax2.set_title('Test Data Predictions (Model Never Saw These)', fontsize=14)
ax2.legend(fontsize=11)

plt.tight_layout()
plt.show()

print("Left: The training data the model learned from (labeled examples).")
print("Right: Predictions on new, unseen test data.")
print("  Colors show predicted species. Red X marks any wrong predictions.")

---

### Summary of Slide 4: Supervised Learning Workflow

| Step | What Happens | Code Equivalent |
|------|-------------|----------------|
| 1. Collect Labeled Data | Gather examples with known correct answers | `X, y = load_iris(return_X_y=True)` |
| 2. Split Data | Separate into training and test sets | `train_test_split(X, y)` |
| 3. Train (Learn) | Algorithm studies training data, finds patterns | `model.fit(X_train, y_train)` |
| 4. Predict | Feed new data to the trained model | `y_pred = model.predict(X_test)` |
| 5. Evaluate | Compare predictions to true answers | `accuracy_score(y_test, y_pred)` |

**Key takeaway**: Every supervised learning project follows this same 5-step workflow, regardless of the application.
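
As a hedged illustration of that takeaway, the sketch below runs the same five steps with a different algorithm (a decision tree instead of KNN). The algorithm choice and all variable names are assumptions for this example; only the shape of the workflow is the point.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 1. Collect labeled data.
flowers_X, flowers_y = load_iris(return_X_y=True)

# 2. Split into training and test sets.
tr_X, te_X, tr_y, te_y = train_test_split(
    flowers_X, flowers_y, test_size=0.2, random_state=42
)

# 3. Train: a different algorithm, but the same .fit() call.
tree_model = DecisionTreeClassifier(random_state=42)
tree_model.fit(tr_X, tr_y)

# 4. Predict on data the model has not seen.
tree_pred = tree_model.predict(te_X)

# 5. Evaluate against the true labels.
tree_accuracy = accuracy_score(te_y, tree_pred)
print(f"Decision tree accuracy on {len(te_y)} test flowers: {tree_accuracy * 100:.1f}%")
```

Only the line that creates the model changed; every other step is identical to the KNN example.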

---
---

# SLIDE 5: Types of Supervised Learning

---

There are two main types of supervised learning, distinguished by the kind of output we are trying to predict.

## 5.1 The Two Types at a Glance

| | **Regression** | **Classification** |
|--|----------------|--------------------|
| **What we predict** | A **continuous** number | A **discrete** category/label |
| **Output type** | Any real number (e.g., 72.5, 84.0, -3.14) | A category from a fixed set (e.g., "hot", "cold") |
| **Example question** | "What will the temperature be tomorrow?" | "Will it be hot or cold tomorrow?" |
| **Example answer** | **84 degrees Fahrenheit** | **Hot** |
| **Think of it as** | Predicting "how much" or "how many" | Predicting "which one" or "what type" |
| **Output is on a** | Number line (infinite possibilities) | List of options (finite possibilities) |

---

## 5.2 Regression: Predicting Continuous Values

> **Regression** predicts a **continuous numerical value**. The output can be any number on a continuous scale.

### What does "continuous" mean?

A continuous value can take **any** value within a range, including decimals. There are infinitely many possible outputs.

- Temperature: 72.0, 72.1, 72.15, 72.153, ... (infinite possibilities)
- House price: \$150,000, \$150,001, \$150,000.50, ...
- A person's height: 5.5 feet, 5.51 feet, 5.512 feet, ...

### Examples of Regression Problems

| Problem | Input | Output (continuous) |
|---------|-------|--------------------|
| House price prediction | Size, location, bedrooms | Price in dollars |
| Temperature forecasting | Past temperatures, humidity | Temperature in °F |
| Stock price prediction | Historical prices, news | Price per share |
| Age estimation | Photo of a face | Age in years |
| Sales forecasting | Past sales, season, ads | Revenue in dollars |

### Lecture Slide Example

From the slide: "What will the temperature be tomorrow?" Answer: **84°F**

This is regression because 84 is a specific number on a continuous scale. It could have been 83.5 or 84.2 -- any value is possible.

### Real-World Analogy: A Thermometer

Regression is like reading a thermometer -- the mercury can stop at ANY point along the scale. It is not limited to specific fixed marks.

---

In [None]:
# =============================================================================
# EXAMPLE 5.2: Regression -- Predicting Temperature
# =============================================================================
# Let's build a regression model that predicts tomorrow's temperature
# based on today's temperature and humidity.
# =============================================================================

# --- Step 1: Create labeled data ---
# Each sample: [today's temperature (°F), humidity (%)]
# Label: tomorrow's temperature (°F) -- a CONTINUOUS value

# We generate synthetic (made-up but realistic) data.
np.random.seed(42)  # For reproducibility

n_samples = 100  # 100 days of historical weather data

# Today's temperature: random values between 30°F and 100°F
todays_temp = np.random.uniform(30, 100, n_samples)

# Humidity: random values between 20% and 90%
humidity = np.random.uniform(20, 90, n_samples)

# Tomorrow's temperature (the label):
# In reality this is complex, but we simulate it as:
# tomorrow = 0.85 * today + 0.05 * humidity + some randomness
tomorrows_temp = 0.85 * todays_temp + 0.05 * humidity + np.random.normal(0, 5, n_samples)

# Combine features into a 2D array (each row = [today_temp, humidity]).
X_weather = np.column_stack([todays_temp, humidity])
y_weather = tomorrows_temp

print("Weather Regression Dataset:")
print(f"  Number of samples: {n_samples}")
print(f"  Features: today's temperature, humidity")
print(f"  Label: tomorrow's temperature (continuous!)")
print()
print("First 10 samples:")
print(f"{'Today Temp (°F)':<18} {'Humidity (%)':<15} {'Tomorrow Temp (°F)':<20}")
print("=" * 55)
for i in range(10):
    print(f"  {todays_temp[i]:>8.1f}          {humidity[i]:>8.1f}        {tomorrows_temp[i]:>8.1f}")

print()
print("Notice: Tomorrow's temperature is a CONTINUOUS number (not a category).")
print("It can be 72.3, 84.7, 56.1 -- any value. This is a REGRESSION problem.")

In [None]:
# =============================================================================
# EXAMPLE 5.2b: Train a regression model and visualize
# =============================================================================

from sklearn.linear_model import LinearRegression

# Split data: 80% training, 20% testing.
X_w_train, X_w_test, y_w_train, y_w_test = train_test_split(
    X_weather, y_weather, test_size=0.2, random_state=42
)

# Create and train a linear regression model.
reg_model = LinearRegression()
reg_model.fit(X_w_train, y_w_train)

# Make predictions on the test set.
y_w_pred = reg_model.predict(X_w_test)

# Show some predictions vs actual values.
print("Regression Predictions vs Actual Values:")
print(f"{'Actual Temp (°F)':<20} {'Predicted Temp (°F)':<22} {'Difference':<12}")
print("=" * 55)
for i in range(10):
    diff = y_w_pred[i] - y_w_test[i]
    print(f"  {y_w_test[i]:>10.1f}          {y_w_pred[i]:>10.1f}           {diff:>+6.1f}")

# Calculate the mean squared error: the average of the SQUARED differences
# between predicted and actual values (so its units are °F squared).
from sklearn.metrics import mean_squared_error, r2_score

mse = mean_squared_error(y_w_test, y_w_pred)
r2 = r2_score(y_w_test, y_w_pred)

print()
print(f"Mean Squared Error: {mse:.2f} (lower is better)")
print(f"R-squared Score:    {r2:.3f} (closer to 1.0 is better)")
print()
print("The model predicts a CONTINUOUS number (e.g., 84.3°F), not a category.")
print("This is what makes it REGRESSION.")
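
The two regression metrics above can be reproduced by hand, which makes their meaning clearer. This sketch uses five made-up temperature pairs (`actual`, `predicted`) and also shows the root mean squared error (RMSE), which is back in °F and is often easier to interpret.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical actual and predicted temperatures (°F) for five days.
actual = np.array([70.0, 82.0, 55.0, 91.0, 64.0])
predicted = np.array([72.0, 80.0, 58.0, 88.0, 65.0])

# MSE: average of the squared errors.
errors = actual - predicted
manual_mse = np.mean(errors ** 2)

# RMSE: square root of MSE, expressed in the original units (°F).
manual_rmse = np.sqrt(manual_mse)

# R-squared: 1 minus (model's squared error / squared error of always guessing the mean).
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((actual - actual.mean()) ** 2)
manual_r2 = 1 - ss_res / ss_tot

print(f"MSE by hand:  {manual_mse:.2f}   sklearn: {mean_squared_error(actual, predicted):.2f}")
print(f"RMSE by hand: {manual_rmse:.2f} °F")
print(f"R^2 by hand:  {manual_r2:.3f}  sklearn: {r2_score(actual, predicted):.3f}")
```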

In [None]:
# =============================================================================
# VISUALIZATION 5.2: Regression results
# =============================================================================

fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# --- Plot 1: Today's temp vs Tomorrow's temp (with regression line) ---
ax1 = axes[0]
ax1.scatter(todays_temp, tomorrows_temp, color='steelblue', alpha=0.5, s=40,
            edgecolors='navy', label='Data points')

# Draw the trend line.
x_line_temp = np.linspace(25, 105, 100).reshape(-1, 1)
# For the line, we use average humidity.
x_line_with_humidity = np.column_stack([x_line_temp, np.full(100, humidity.mean())])
y_line_temp = reg_model.predict(x_line_with_humidity)
ax1.plot(x_line_temp, y_line_temp, color='red', linewidth=2.5, label='Regression line')

ax1.set_xlabel("Today's Temperature (°F)", fontsize=13)
ax1.set_ylabel("Tomorrow's Temperature (°F)", fontsize=13)
ax1.set_title('REGRESSION: Predicting Continuous Values', fontsize=14, fontweight='bold')
ax1.legend(fontsize=11)

# --- Plot 2: Predicted vs Actual (perfect predictions would lie on the diagonal) ---
ax2 = axes[1]
ax2.scatter(y_w_test, y_w_pred, color='steelblue', s=60, edgecolors='navy', alpha=0.7)

# Draw the "perfect prediction" line (where predicted = actual).
min_val = min(y_w_test.min(), y_w_pred.min()) - 5
max_val = max(y_w_test.max(), y_w_pred.max()) + 5
ax2.plot([min_val, max_val], [min_val, max_val], 'r--', linewidth=2, label='Perfect prediction')

ax2.set_xlabel('Actual Temperature (°F)', fontsize=13)
ax2.set_ylabel('Predicted Temperature (°F)', fontsize=13)
ax2.set_title('Predicted vs Actual (closer to red line = better)', fontsize=14)
ax2.legend(fontsize=11)

plt.tight_layout()
plt.show()

print("Left: How today's temperature relates to tomorrow's (the regression pattern).")
print("Right: Model's predictions vs reality. Points on the red line = perfect.")
print("\nKey point: The output is a CONTINUOUS NUMBER, not a category.")
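
A linear regression model is a weighted sum of its inputs plus a constant, and scikit-learn exposes those learned numbers as `coef_` and `intercept_`. The sketch below regenerates similar synthetic weather data with its own seed (an assumption, so the numbers will not match the cells above exactly) and reproduces one prediction by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data using the same recipe as this notebook, with a separate seed.
rng = np.random.default_rng(0)
today = rng.uniform(30, 100, 200)
humid = rng.uniform(20, 90, 200)
tomorrow = 0.85 * today + 0.05 * humid + rng.normal(0, 5, 200)

features = np.column_stack([today, humid])
line_model = LinearRegression().fit(features, tomorrow)

# The learned weights (one per feature) and the constant term.
w_temp, w_humid = line_model.coef_
bias = line_model.intercept_
print(f"Learned formula: tomorrow = {w_temp:.3f} * today + {w_humid:.3f} * humidity + {bias:.3f}")

# Reproduce a prediction by hand for today = 80°F, humidity = 55%.
query = np.array([80.0, 55.0])
by_hand = w_temp * query[0] + w_humid * query[1] + bias
by_model = line_model.predict(query.reshape(1, -1))[0]
print(f"By hand: {by_hand:.2f}°F   model.predict: {by_model:.2f}°F")
```

The learned weight on today's temperature should land near the 0.85 used to simulate the data, which shows the model recovering the underlying pattern from noisy examples.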

---

## 5.3 Classification: Predicting Discrete Labels

> **Classification** predicts a **discrete category** (also called a "class" or "label"). The output is one choice from a fixed set of options.

### What does "discrete" mean?

A discrete value comes from a **fixed, countable set** of options. There is no "in between".

- Weather category: "Hot", "Cold", "Mild" (exactly 3 options, nothing in between)
- Email type: "Spam" or "Not Spam" (exactly 2 options)
- Animal type: "Cat", "Dog", "Bird" (exactly 3 options)

### Subtypes of Classification

| Subtype | # of Categories | Example |
|---------|-----------------|---------|
| **Binary Classification** | Exactly 2 | Spam vs. Not Spam; Fraud vs. Legitimate |
| **Multi-class Classification** | 3 or more | Cat vs. Dog vs. Bird; Setosa vs. Versicolor vs. Virginica |

### Examples of Classification Problems

| Problem | Input | Output (discrete) |
|---------|-------|--------------------|
| Spam detection | Email text | "spam" or "not spam" |
| Disease diagnosis | Patient symptoms | "positive" or "negative" |
| Handwriting recognition | Image of a digit | 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9 |
| Sentiment analysis | Product review text | "positive", "neutral", or "negative" |
| Weather type | Temperature, humidity | "hot" or "cold" |

### Lecture Slide Example

From the slide: "Will it be hot or cold tomorrow?" Answer: **Hot**

This is classification because the answer is one of two categories ("hot" or "cold"), not a specific number. A model may internally estimate something like "a 73% chance of hot", but the final prediction is still exactly one category: hot or cold.

### Real-World Analogy: Sorting Mail into Boxes

Classification is like a mail room worker sorting letters into labeled boxes: "Bills", "Personal", "Junk". Each letter goes into exactly ONE box. There is a fixed number of boxes (categories), and the worker must pick one.

---

In [None]:
# =============================================================================
# EXAMPLE 5.3: Classification -- Predicting Hot or Cold
# =============================================================================
# Using the SAME weather data, but now we predict a CATEGORY
# ("hot" or "cold") instead of a specific temperature number.
# This perfectly matches the lecture slide example!
# =============================================================================

# --- Step 1: Convert continuous labels to discrete categories ---
# We define: if tomorrow's temp >= 70°F, it's "hot" (1); otherwise "cold" (0).
# This is how we turn a regression problem into a classification problem.

THRESHOLD = 70  # degrees Fahrenheit

# Create classification labels from the continuous temperature values.
# np.where(condition, value_if_true, value_if_false)
y_class = np.where(tomorrows_temp >= THRESHOLD, 1, 0)
# 1 = "hot", 0 = "cold"

class_names = ['Cold (< 70°F)', 'Hot (>= 70°F)']

print("Classification Dataset (same data, different labels):")
print(f"  Threshold: {THRESHOLD}°F")
print(f"  'Hot' samples:  {sum(y_class == 1)}")
print(f"  'Cold' samples: {sum(y_class == 0)}")
print()
print("Compare REGRESSION vs CLASSIFICATION labels:")
print(f"{'Tomorrow Temp (°F)':<22} {'Regression Label':<20} {'Classification Label'}")
print("=" * 65)
for i in range(8):
    temp = tomorrows_temp[i]
    reg_label = f"{temp:.1f}°F"
    cls_label = "HOT" if y_class[i] == 1 else "COLD"
    print(f"  {temp:>10.1f}              {reg_label:<20} {cls_label}")

print()
print("See the difference?")
print("  Regression:     the label is a specific number like 84.3°F")
print("  Classification: the label is a category like 'HOT' or 'COLD'")

In [None]:
# =============================================================================
# EXAMPLE 5.3b: Train a classification model
# =============================================================================

from sklearn.linear_model import LogisticRegression

# Split data (same features, but using classification labels now).
X_c_train, X_c_test, y_c_train, y_c_test = train_test_split(
    X_weather, y_class, test_size=0.2, random_state=42
)

# Create and train a logistic regression model.
# Despite the name, Logistic Regression is a CLASSIFICATION algorithm!
# It predicts probabilities and then assigns a category.
clf_model = LogisticRegression(random_state=42)
clf_model.fit(X_c_train, y_c_train)

# Make predictions.
y_c_pred = clf_model.predict(X_c_test)

# Show predictions.
print("Classification Predictions:")
print(f"  {'Actual':<20} {'Predicted':<20} {'Correct?'}")
print("=" * 52)
for i in range(min(15, len(y_c_test))):
    actual = class_names[y_c_test[i]]
    predicted = class_names[y_c_pred[i]]
    correct = "Yes" if y_c_test[i] == y_c_pred[i] else "No"
    print(f"  {actual:<20} {predicted:<20} {correct}")
print("  ...")

# Accuracy.
clf_accuracy = accuracy_score(y_c_test, y_c_pred)
print(f"\nClassification Accuracy: {clf_accuracy * 100:.1f}%")
print()
print("The model outputs a CATEGORY ('Hot' or 'Cold'), NOT a number.")
print("This is what makes it CLASSIFICATION.")
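
The comment above notes that logistic regression predicts probabilities and then assigns a category. That two-stage behavior can be inspected with `predict_proba`. This sketch rebuilds similar synthetic data with its own seed; the query points and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic weather data (same recipe as this notebook, separate seed).
rng = np.random.default_rng(0)
today = rng.uniform(30, 100, 200)
humid = rng.uniform(20, 90, 200)
tomorrow = 0.85 * today + 0.05 * humid + rng.normal(0, 5, 200)
is_hot = (tomorrow >= 70).astype(int)  # 1 = hot, 0 = cold

features = np.column_stack([today, humid])
proba_model = LogisticRegression(max_iter=1000).fit(features, is_hot)

# Three hypothetical days: clearly hot, borderline, clearly cold.
queries = np.array([[95.0, 60.0], [82.0, 50.0], [40.0, 30.0]])

# Stage 1: a probability for each class (columns: P(cold), P(hot)).
probabilities = proba_model.predict_proba(queries)

# Stage 2: the category with the highest probability becomes the prediction.
labels = proba_model.predict(queries)
labels_from_proba = proba_model.classes_[np.argmax(probabilities, axis=1)]

for (temp, hum), (p_cold, p_hot), label in zip(queries, probabilities, labels):
    name = "HOT" if label == 1 else "COLD"
    print(f"today={temp:.0f}°F, humidity={hum:.0f}%  P(hot)={p_hot:.2f}  -->  {name}")
```

The probabilities are continuous, but the reported answer is always one of the two categories.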

In [None]:
# =============================================================================
# VISUALIZATION 5.3: Classification decision boundary
# =============================================================================
# This visualization shows how the classification model divides the space
# into "Hot" and "Cold" regions.
# =============================================================================

fig, ax = plt.subplots(figsize=(10, 7))

# Create a mesh grid to visualize the decision boundary.
# We create a grid of points covering the entire feature space,
# then predict the class for each point to see where the boundary is.
x_min, x_max = X_weather[:, 0].min() - 5, X_weather[:, 0].max() + 5
y_min, y_max = X_weather[:, 1].min() - 5, X_weather[:, 1].max() + 5

# np.meshgrid creates a grid of all (x, y) combinations.
xx, yy = np.meshgrid(
    np.linspace(x_min, x_max, 200),  # 200 points along x-axis
    np.linspace(y_min, y_max, 200)   # 200 points along y-axis
)

# Predict the class for every point in the grid.
# np.c_ concatenates columns side by side.
# .ravel() flattens a 2D array to 1D.
grid_points = np.c_[xx.ravel(), yy.ravel()]
Z = clf_model.predict(grid_points).reshape(xx.shape)

# Draw the decision regions as colored backgrounds.
# contourf fills the areas with colors.
ax.contourf(xx, yy, Z, alpha=0.3, levels=[-0.5, 0.5, 1.5],
            colors=['#3498db', '#e74c3c'])  # Blue for cold, red for hot

# Plot the actual data points on top.
cold_mask = y_class == 0
hot_mask  = y_class == 1

ax.scatter(X_weather[cold_mask, 0], X_weather[cold_mask, 1],
           color='blue', s=60, edgecolors='black', alpha=0.7, label='Cold (actual)')
ax.scatter(X_weather[hot_mask, 0], X_weather[hot_mask, 1],
           color='red', s=60, edgecolors='black', alpha=0.7, label='Hot (actual)')

ax.set_xlabel("Today's Temperature (°F)", fontsize=13)
ax.set_ylabel('Humidity (%)', fontsize=13)
ax.set_title('CLASSIFICATION: Hot vs Cold Decision Boundary', fontsize=15, fontweight='bold')
ax.legend(fontsize=12, loc='upper left')

plt.tight_layout()
plt.show()

print("The colored background shows the model's decision regions:")
print("  Blue region: model predicts 'COLD'")
print("  Red region:  model predicts 'HOT'")
print("  The boundary between them is the 'decision boundary'.")
print("  New data points are classified based on which region they fall in.")
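
For logistic regression the decision boundary is a straight line: the set of points where the weighted sum of the features plus the constant equals zero. The sketch below, built on separately seeded synthetic data with hypothetical names, solves for the boundary temperature at one fixed humidity and checks the predictions on either side.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic weather data (same recipe as this notebook, separate seed).
rng = np.random.default_rng(0)
today = rng.uniform(30, 100, 200)
humid = rng.uniform(20, 90, 200)
tomorrow = 0.85 * today + 0.05 * humid + rng.normal(0, 5, 200)
is_hot = (tomorrow >= 70).astype(int)  # 1 = hot, 0 = cold

boundary_model = LogisticRegression(max_iter=1000)
boundary_model.fit(np.column_stack([today, humid]), is_hot)

# Learned weights and constant term of the linear score.
w_temp, w_humid = boundary_model.coef_[0]
bias = boundary_model.intercept_[0]

# On the boundary: w_temp * temp + w_humid * humidity + bias = 0.
# Solve for temp at a fixed humidity of 55%.
fixed_humidity = 55.0
boundary_temp = -(bias + w_humid * fixed_humidity) / w_temp
print(f"At {fixed_humidity:.0f}% humidity the boundary is near {boundary_temp:.1f}°F today.")

# One degree to either side of the boundary flips the predicted class.
just_below = boundary_model.predict([[boundary_temp - 1, fixed_humidity]])[0]
just_above = boundary_model.predict([[boundary_temp + 1, fixed_humidity]])[0]
print(f"One degree cooler: class {just_below}; one degree warmer: class {just_above}")
```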

---

## 5.4 Regression vs Classification: Side-by-Side Comparison

This is one of the most important distinctions in supervised learning. Let us make it crystal clear.

| Aspect | Regression | Classification |
|--------|-----------|----------------|
| **Output type** | Continuous number | Discrete category |
| **Example output** | 84.3°F | "Hot" |
| **Possible outputs** | Infinite (any real number) | Finite (from a fixed set) |
| **Question format** | "How much?" / "How many?" | "Which one?" / "What type?" |
| **Slide example** | Temperature = 84°F | Hot or Cold? = Hot |
| **Error measurement** | Distance (how far off?) | Correct or incorrect |
| **Common algorithms** | Linear Regression, Ridge, Lasso | Logistic Regression, SVM, Decision Trees |
| **Visualization** | Best-fit line through points | Decision boundary separating regions |

### The Same Data, Two Different Questions

Notice that we used the **exact same weather data** for both regression and classification! The difference is in the **question we ask**:

- **Regression question**: "What will the exact temperature be tomorrow?" --> 84°F
- **Classification question**: "Will it be hot or cold tomorrow?" --> Hot

The type of question determines whether you use regression or classification.

---

In [None]:
# =============================================================================
# VISUALIZATION 5.4: Side-by-side comparison of Regression vs Classification
# =============================================================================
# This is the KEY visual that ties Slide 5 together.
# =============================================================================

fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# ---- LEFT: REGRESSION ----
ax1 = axes[0]

# Plot the data.
ax1.scatter(todays_temp, tomorrows_temp, color='steelblue', alpha=0.5,
            s=40, edgecolors='navy')

# Plot the regression line.
x_plot = np.linspace(25, 105, 100).reshape(-1, 1)
x_plot_full = np.column_stack([x_plot, np.full(100, humidity.mean())])
y_plot = reg_model.predict(x_plot_full)
ax1.plot(x_plot, y_plot, color='red', linewidth=2.5, label='Regression line')

# Highlight a prediction (computed once and reused for the marker and label).
example_pred = reg_model.predict([[80, 55]])[0]
ax1.scatter([80], [example_pred], color='gold', s=300,
            marker='*', zorder=10, edgecolors='black', linewidth=2)
ax1.annotate(f'Prediction: {example_pred:.1f}°F',
             xy=(80, example_pred),
             xytext=(45, example_pred + 12),
             fontsize=12, fontweight='bold', color='darkred',
             arrowprops=dict(arrowstyle='->', color='darkred', lw=2))

ax1.set_xlabel("Today's Temperature (°F)", fontsize=13)
ax1.set_ylabel("Tomorrow's Temperature (°F)", fontsize=13)
ax1.set_title('REGRESSION\n"What will the temperature be?"\nAnswer: A specific NUMBER (e.g., 84°F)',
              fontsize=13, fontweight='bold', color='darkblue')
ax1.legend(fontsize=11)

# ---- RIGHT: CLASSIFICATION ----
ax2 = axes[1]

# Plot hot and cold points.
cold_m = y_class == 0
hot_m = y_class == 1
ax2.scatter(todays_temp[cold_m], humidity[cold_m],
            color='blue', s=60, edgecolors='black', alpha=0.7, label='Cold')
ax2.scatter(todays_temp[hot_m], humidity[hot_m],
            color='red', s=60, edgecolors='black', alpha=0.7, label='Hot')

# Highlight a prediction.
new_point_class = clf_model.predict([[80, 55]])[0]
label_text = "HOT" if new_point_class == 1 else "COLD"
ax2.scatter([80], [55], color='gold', s=300, marker='*', zorder=10,
            edgecolors='black', linewidth=2)
ax2.annotate(f'Prediction: {label_text}',
             xy=(80, 55), xytext=(45, 75),
             fontsize=12, fontweight='bold', color='darkred',
             arrowprops=dict(arrowstyle='->', color='darkred', lw=2))

ax2.set_xlabel("Today's Temperature (°F)", fontsize=13)
ax2.set_ylabel('Humidity (%)', fontsize=13)
ax2.set_title('CLASSIFICATION\n"Will it be hot or cold?"\nAnswer: A CATEGORY (Hot or Cold)',
              fontsize=13, fontweight='bold', color='darkred')
ax2.legend(fontsize=12)

plt.tight_layout()
plt.show()

print("=" * 70)
print("LEFT (Regression):     Output is a specific NUMBER  (e.g., 84.0°F)")
print("RIGHT (Classification): Output is a CATEGORY/LABEL  (e.g., 'Hot')")
print("=" * 70)
print("\nSame data. Same features. DIFFERENT type of prediction.")
print("The type of question you ask determines which approach to use!")

In [None]:
# =============================================================================
# INTERACTIVE EXERCISE: Test Your Understanding
# =============================================================================
# For each problem below, decide if it is REGRESSION or CLASSIFICATION.
# Then run the cell to see the answers!
# =============================================================================

problems = [
    ("Predict the price of a used car", "Regression",
     "Price is a continuous number ($15,230.50, $22,000, etc.)"),
    ("Predict if a tumor is benign or malignant", "Classification",
     "Only 2 categories: benign or malignant"),
    ("Predict how many inches of rain will fall", "Regression",
     "Rainfall is a continuous number (2.5 inches, 0.3 inches, etc.)"),
    ("Predict which genre a movie belongs to", "Classification",
     "Fixed categories: action, comedy, drama, horror, etc."),
    ("Predict a student's final exam score", "Regression",
     "Score is a continuous number (92.5, 78.3, etc.)"),
    ("Predict if a customer will buy a product", "Classification",
     "Only 2 categories: buy or not buy"),
    ("Predict the number of visitors to a website tomorrow", "Regression",
     "Visitor count is a number (counts are whole numbers, but are usually treated as continuous)"),
    ("Predict which digit (0-9) is in a handwritten image", "Classification",
     "Exactly 10 fixed categories: 0, 1, 2, ..., 9"),
]

print("QUIZ: Regression or Classification?")
print("=" * 75)
print()

for i, (problem, answer, explanation) in enumerate(problems, 1):
    print(f"  {i}. {problem}")
    print(f"     --> ANSWER: {answer}")
    print(f"     --> WHY:    {explanation}")
    print()

---

### Summary of Slide 5: Types of Supervised Learning

| Type | Predicts | Example from Lecture | Key Question |
|------|----------|---------------------|-------------|
| **Regression** | Continuous values (numbers) | Temperature = 84°F | "How much?" |
| **Classification** | Discrete categories (labels) | Hot or Cold = Hot | "Which one?" |

**Remember**: The type of output (number vs. category) determines whether it is regression or classification. The same input data can be used for either -- it depends on the question you ask!

---
---

# Final Summary: Lecture 2, Slides 2-5

---

## Everything We Learned, All in One Place

### Slide 2: What is Supervised Learning?

- **Supervised learning** = learning from **labeled data** (data with known correct answers)
- Data is organized as pairs: $(x_i, y_i)$ where $x_i$ is the input and $y_i$ is the label
- The **goal** is to predict the correct label $y$ for a brand-new, unseen input $x$
- **Deep learning** uses neural networks (input layer --> hidden layers --> output layer) to make predictions

### Slide 3: Examples of Supervised Learning

Supervised learning powers many real-world applications:
- Image Classification, Document Categorization, Speech Recognition
- Protein Classification, Spam Detection, Branch Prediction
- Fraud Detection, NLP, Playing Games, Computational Advertising
- In every case: learn from labeled examples, then predict on new data

### Slide 4: Supervised Learning Workflow

Every supervised learning project follows the same pipeline:
1. **Training Data + Labels** --> 2. **Learning Algorithm** --> 3. **Trained ML Model** --> 4. **Test Data** --> 5. **Predictions**
- In code: `model.fit(X_train, y_train)` then `model.predict(X_test)`
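
The pipeline can be sketched end to end in plain Python. `TinyLineModel` below is a hypothetical toy class, not part of scikit-learn. It fits a least-squares line to a single feature and only mirrors the `fit` / `predict` interface. The data is made up (it follows y = 2x + 1), the inputs are a flat list rather than the 2-D array scikit-learn expects, and the last two pairs are held out as test data instead of using a random split.

```python
class TinyLineModel:
    """Hypothetical one-feature least-squares line with a fit/predict interface."""

    def fit(self, X, y):
        n = len(X)
        mean_x = sum(X) / n
        mean_y = sum(y) / n
        # Closed-form least-squares slope and intercept for one feature.
        sxx = sum((x - mean_x) ** 2 for x in X)
        sxy = sum((x - mean_x) * (t - mean_y) for x, t in zip(X, y))
        self.slope = sxy / sxx
        self.intercept = mean_y - self.slope * mean_x
        return self

    def predict(self, X):
        return [self.slope * x + self.intercept for x in X]


# Step 1: labeled data, i.e. inputs paired with known answers (made-up values).
X_all = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_all = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]

# Hold out the last two pairs; the model never sees them during training.
X_train, y_train = X_all[:4], y_all[:4]
X_test, y_test = X_all[4:], y_all[4:]

# Steps 2-3: the learning algorithm produces a trained model.
model = TinyLineModel().fit(X_train, y_train)

# Steps 4-5: test data goes in, predictions come out.
predictions = model.predict(X_test)

for x, y_true, y_hat in zip(X_test, y_test, predictions):
    print(f"x={x:.1f}  true y={y_true:.1f}  predicted y={y_hat:.1f}")
```

Comparing `predictions` with the held-out `y_test` is what tells us whether the model generalizes to inputs it was not trained on.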

### Slide 5: Types of Supervised Learning

| Type | Output | Slide Example |
|------|--------|---------------|
| **Regression** | Continuous number | Temperature = 84°F |
| **Classification** | Discrete category | Hot or Cold = Hot |

---

## Key Vocabulary

| Term | Definition |
|------|------------|
| Supervised Learning | ML where the model learns from labeled data |
| Labeled Data | Data samples where the correct answer (label) is known |
| Input ($x$) / Features | The information fed into the model |
| Output ($y$) / Label / Target | The correct answer the model tries to predict |
| Training | The process of the model learning from labeled data |
| Prediction / Inference | Using the trained model on new, unseen data |
| Regression | Predicting a continuous numerical value |
| Classification | Predicting a discrete category from a fixed set |
| Neural Network | A model built from layers of connected units: an input layer, one or more hidden layers, and an output layer |
| Deep Learning | ML using neural networks with many layers |
| $\hat{y}$ (y-hat) | The model's predicted value (estimate of the true $y$) |

---

## What is Next?

Now that you understand the fundamentals of supervised learning, the next lectures will dive deeper into:
- Specific algorithms (how does a neural network actually learn?)
- Loss functions (how do we measure prediction errors?)
- Training techniques (gradient descent, backpropagation)
- And eventually: how Large Language Models (LLMs) use these foundations!

**Congratulations on completing your first steps into machine learning!**