# Lab 05: Feature Spaces & Separability

**ING3513 - Introduction to Artificial Intelligence and Machine Learning**

In Lab 04, we learned that bad data can defeat even a good model. Now we'll explore another fundamental concept: **how do machine learning algorithms "see" your data?**

**What you'll learn:**

- Feature spaces: the geometric world where ML algorithms operate
- Linear separability: when classes can be divided by a line or plane
- Why some features help classification and others don't
- How a simple perceptron learns by adjusting weights
- The limits of linear models (and why we need more powerful ones)

**The scenario:** A lumber mill wants to automate sorting wooden blocks into Pine (P) or Birch (B) using sensors that measure physical properties.


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Plotly for interactive visualizations
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# ipywidgets for interactive sliders
import ipywidgets as widgets
from IPython.display import display

# sklearn for the non-linear classifier demo
from sklearn.svm import SVC

# Configure plotting style
sns.set_style("whitegrid")
plt.rcParams["figure.figsize"] = (10, 6)

# Set random seed for reproducibility
np.random.seed(42)

print("Libraries loaded successfully!")

## 1. The Wood Classification Problem

A lumber mill receives mixed batches of wooden blocks: some Pine (P), some Birch (B). Currently, workers sort them by hand, but management wants to automate this using machine learning.

**What is wood grain?** When you look at a piece of wood, you see patterns of lines. These are the _grain_, formed by the tree's annual growth rings. Different wood types have distinctive grain patterns.

**The sensors measure three properties:**

| Feature              | Symbol | What it measures                                   | Think of it as...                                                  |
| -------------------- | ------ | -------------------------------------------------- | ------------------------------------------------------------------ |
| **Brightness**       | `b`    | Average lightness of the wood surface              | How light or dark the wood looks overall (0=black, 10=white)       |
| **Grain Prominence** | `gp`   | Contrast between light and dark bands in the grain | How much the grain pattern "pops": subtle (0) vs bold stripes (1)  |
| **Grain Frequency**  | `f`    | How closely spaced the grain lines are             | Tight/fine grain (high) vs wide/coarse grain (low)                 |

**The question:** Which features should we use to build our classifier?


### 1.1 Generating the Wood Block Dataset

We'll create synthetic data that mimics real measurements. The key insight is:

- **Grain Prominence (gp)** → Different for Pine vs Birch (useful!)
- **Brightness (b)** → Similar for both wood types (useless alone)
- **Grain Frequency (f)** → Similar for both wood types (useless alone)


In [None]:
# Generate synthetic wood block data
np.random.seed(42)
n_samples = 8  # Per class

# PINE: Lower grain prominence
pine_b = np.random.normal(5.5, 1.2, n_samples)  # Brightness: centered around 5.5
pine_gp = np.random.normal(
    0.2, 0.08, n_samples
)  # Grain prominence: centered around 0.2 (low)
pine_f = np.random.normal(0.5, 0.15, n_samples)  # Grain frequency: centered around 0.5

# BIRCH: Higher grain prominence (same brightness and frequency as Pine!)
birch_b = np.random.normal(5.5, 1.2, n_samples)  # Brightness: SAME as pine!
birch_gp = np.random.normal(
    0.6, 0.08, n_samples
)  # Grain prominence: centered around 0.6 (high)
birch_f = np.random.normal(0.5, 0.15, n_samples)  # Grain frequency: same as pine!

# Clip values to valid ranges
pine_b = np.clip(pine_b, 0, 10)
pine_gp = np.clip(pine_gp, 0, 1)
pine_f = np.clip(pine_f, 0, 1)
birch_b = np.clip(birch_b, 0, 10)
birch_gp = np.clip(birch_gp, 0, 1)
birch_f = np.clip(birch_f, 0, 1)

# Create DataFrame
pine_df = pd.DataFrame(
    {
        "brightness": pine_b,
        "grain_prominence": pine_gp,
        "grain_frequency": pine_f,
        "wood_type": "Pine",
    }
)

birch_df = pd.DataFrame(
    {
        "brightness": birch_b,
        "grain_prominence": birch_gp,
        "grain_frequency": birch_f,
        "wood_type": "Birch",
    }
)

wood_data = pd.concat([pine_df, birch_df], ignore_index=True)

print("Wood Block Dataset")
print("=" * 50)
print(f"Total samples: {len(wood_data)} ({n_samples} Pine, {n_samples} Birch)")
print("\nFirst few samples:")
wood_data.head(10)

In [None]:
# Summary statistics by wood type
print("Summary Statistics by Wood Type")
print("=" * 50)
print(
    wood_data.groupby("wood_type")[
        ["brightness", "grain_prominence", "grain_frequency"]
    ]
    .agg(["mean", "std"])
    .round(3)
)

## 2. Watching Clusters Form in Feature Space

**What is a feature space?**

When we measure properties of objects, each object becomes a _point_ in a multi-dimensional space. For 2 features, this is a 2D plane. For 3 features, it's a 3D volume.

**The key insight:** Classification is about finding boundaries in this space that separate different classes.

Let's watch our feature space "fill up" as we collect training data, one sample at a time.


In [None]:
# Create animated scatter plot showing data points appearing one by one
# Using Brightness (b) vs Grain Prominence (gp): a feature pair that separates the classes

# Interleave Pine and Birch samples so both appear from the start
pine_samples = wood_data[wood_data["wood_type"] == "Pine"].reset_index(drop=True)
birch_samples = wood_data[wood_data["wood_type"] == "Birch"].reset_index(drop=True)
wood_data_interleaved = pd.concat(
    [
        pine_samples.iloc[[i // 2]] if i % 2 == 0 else birch_samples.iloc[[i // 2]]
        for i in range(len(wood_data))
    ]
).reset_index(drop=True)

# Prepare data for animation - start from frame 2 so both categories exist
frames_data = []
for i in range(2, len(wood_data_interleaved) + 1):  # Start at 2, not 1
    subset = wood_data_interleaved.iloc[:i].copy()
    subset["frame"] = i
    frames_data.append(subset)

animation_df = pd.concat(frames_data, ignore_index=True)

# Create animated scatter plot
fig = px.scatter(
    animation_df,
    x="grain_prominence",
    y="brightness",
    color="wood_type",
    animation_frame="frame",
    range_x=[-0.1, 1.1],
    range_y=[0, 10],
    title="Training the Wood Classifier: Watch Clusters Form",
    labels={
        "grain_prominence": "Grain Prominence (gp)",
        "brightness": "Brightness (b)",
        "wood_type": "Wood Type",
    },
    color_discrete_map={"Pine": "#2E86AB", "Birch": "#A23B72"},
)

fig.update_traces(marker=dict(size=15, line=dict(width=2, color="black")))
fig.update_layout(
    width=700,
    height=550,
    font=dict(size=14),
    legend=dict(yanchor="top", y=0.99, xanchor="right", x=0.99),
)

# Slow down the animation
fig.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 600
fig.layout.updatemenus[0].buttons[0].args[1]["transition"]["duration"] = 200

fig.show()

### 2.1 Can You Predict the Unknown Sample?

After training, an unknown wood block arrives. The sensors measure:

- **Brightness (b) = 7.5**
- **Grain Prominence (gp) = 0.35**

Based on where this point falls in the feature space, is it **Pine** or **Birch**?
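One way to make "where this point falls" concrete is a nearest-centroid rule: compare the unknown sample's distance to the center of each cluster. The sketch below is only an illustration. It assumes the cluster centers equal the means used to generate the data (gp 0.2 and 0.6, brightness 5.5), not the actual sample means, and the names `centroids`, `unknown`, and `closest` are introduced here for the example.

```python
import numpy as np

# Assumed cluster centers as (grain prominence, brightness), taken from the
# generating means in Section 1.1 rather than computed from the samples.
centroids = {
    "Pine": np.array([0.2, 5.5]),
    "Birch": np.array([0.6, 5.5]),
}

# The unknown block X: gp = 0.35, brightness = 7.5
unknown = np.array([0.35, 7.5])

# Euclidean distance from X to each assumed center
distances = {
    label: float(np.linalg.norm(unknown - center))
    for label, center in centroids.items()
}

# The label whose center is nearest
closest = min(distances, key=distances.get)
print(distances)
print("Nearest centroid:", closest)
```

Note that brightness spans 0 to 10 while grain prominence spans 0 to 1, so raw Euclidean distance weights brightness more heavily. Here both assumed centers share the same brightness, so only grain prominence decides the outcome.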


In [None]:
# Show the final training data with the unknown sample X
fig = go.Figure()

# Add Pine samples
pine_data = wood_data[wood_data["wood_type"] == "Pine"]
fig.add_trace(
    go.Scatter(
        x=pine_data["grain_prominence"],
        y=pine_data["brightness"],
        mode="markers",
        marker=dict(size=15, color="#2E86AB", line=dict(width=2, color="black")),
        name="Pine",
        showlegend=True,
    )
)

# Add Birch samples
birch_data = wood_data[wood_data["wood_type"] == "Birch"]
fig.add_trace(
    go.Scatter(
        x=birch_data["grain_prominence"],
        y=birch_data["brightness"],
        mode="markers",
        marker=dict(size=15, color="#A23B72", line=dict(width=2, color="black")),
        name="Birch",
        showlegend=True,
    )
)

# Add the unknown sample X
fig.add_trace(
    go.Scatter(
        x=[0.35],
        y=[7.5],
        mode="markers",
        marker=dict(size=20, color="gold", line=dict(width=3, color="red"), symbol="x"),
        name="Unknown (X)",
        showlegend=True,
    )
)

fig.update_layout(
    title="Using the Wood Classifier: What is X?",
    xaxis_title="Grain Prominence (gp)",
    yaxis_title="Brightness (b)",
    xaxis=dict(range=[-0.1, 1.1]),
    yaxis=dict(range=[0, 10]),
    width=700,
    height=550,
    font=dict(size=14),
)

fig.show()

print(
    "\n🤔 QUESTION: Based on its position in the feature space, is X more likely Pine or Birch?"
)
print("   (Think about which cluster X is closer to...)")

### 2.2 The Decision Boundary

If the two classes can be separated by a straight line, we say they are **linearly separable**.

In this case, we can draw a line such that:

- All **Pine** samples are on one side
- All **Birch** samples are on the other side

This line is called the **decision boundary**.
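For a single feature, linear separability reduces to a simple check: every value of one class must lie below every value of the other. The sketch below uses hypothetical grain-prominence readings (not the lab's dataset) and a helper, `separable_by_threshold`, that is introduced here purely for illustration.

```python
import numpy as np

# Hypothetical grain-prominence readings (illustrative values only)
pine_gp = np.array([0.15, 0.25, 0.20, 0.30])
birch_gp = np.array([0.55, 0.65, 0.60, 0.70])


def separable_by_threshold(low_class, high_class):
    """Return (is_separable, threshold) for one feature.

    A vertical boundary exists when the largest low-class value
    is still smaller than the smallest high-class value.
    """
    gap_start = low_class.max()
    gap_end = high_class.min()
    if gap_start < gap_end:
        # Place the boundary in the middle of the empty gap
        return True, (gap_start + gap_end) / 2
    return False, None


is_sep, threshold = separable_by_threshold(pine_gp, birch_gp)
print("Separable:", is_sep, "| boundary at gp =", threshold)
```

If the two ranges overlapped, the function would report that no single threshold on this feature separates the classes.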


In [None]:
# Show the decision boundary (a line that separates the classes)
fig = go.Figure()

# Decision boundary line: vertical line at gp = 0.4
# Since only grain prominence distinguishes the classes, we use a vertical boundary
gp_boundary = 0.4
b_line = np.linspace(0, 10, 100)
gp_line = np.full_like(b_line, gp_boundary)

# Add decision boundary
fig.add_trace(
    go.Scatter(
        x=gp_line,
        y=b_line,
        mode="lines",
        line=dict(color="red", width=3),
        name="Decision Boundary: f(a) = 0",
    )
)

# Add Pine samples
pine_data = wood_data[wood_data["wood_type"] == "Pine"]
fig.add_trace(
    go.Scatter(
        x=pine_data["grain_prominence"],
        y=pine_data["brightness"],
        mode="markers",
        marker=dict(size=15, color="#2E86AB", line=dict(width=2, color="black")),
        name="Pine",
    )
)

# Add Birch samples
birch_data = wood_data[wood_data["wood_type"] == "Birch"]
fig.add_trace(
    go.Scatter(
        x=birch_data["grain_prominence"],
        y=birch_data["brightness"],
        mode="markers",
        marker=dict(size=15, color="#A23B72", line=dict(width=2, color="black")),
        name="Birch",
    )
)

# Add annotation for the equation
fig.add_annotation(
    x=0.75,
    y=8,
    text="f(a) = w<sup>T</sup>a + β",
    showarrow=False,
    font=dict(size=16, color="red"),
    bgcolor="white",
    bordercolor="red",
    borderwidth=2,
)

fig.update_layout(
    title="A Two-Class Wood Classifier (Pine and Birch)",
    xaxis_title="Grain Prominence (gp)",
    yaxis_title="Brightness (b)",
    xaxis=dict(range=[-0.1, 1.1]),
    yaxis=dict(range=[0, 10]),
    width=750,
    height=550,
    font=dict(size=14),
)

fig.show()

print(
    "✅ This data is LINEARLY SEPARABLE: a straight line can perfectly divide Pine from Birch!"
)

## 3. The Perceptron: A Simple Learning Machine

How does a machine learning algorithm find this decision boundary? Let's look at one of the simplest models: the **perceptron**.

### The Perceptron Model

The perceptron computes a weighted sum of the input features plus a bias:

$$f(\mathbf{w}, \beta) = \mathbf{w}^T \mathbf{a} + \beta = w_1 \cdot a_1 + w_2 \cdot a_2 + \beta$$

Where:

- $\mathbf{a} = [a_1, a_2]^T$ = input features (grain prominence, brightness)
- $\mathbf{w} = [w_1, w_2]^T$ = weights (learned parameters)
- $\beta$ = bias/intercept (learned parameter)

**Decision rule:**

- If $f(\mathbf{w}, \beta) > 0$ → predict **Pine**
- If $f(\mathbf{w}, \beta) < 0$ → predict **Birch**

The boundary where $f = 0$ is a straight line!
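The model and decision rule above can be sketched in a few lines of NumPy. The weights and sample values below are hypothetical, chosen so that the boundary sits at gp = 0.4. The source leaves the case f = 0 undefined; this sketch arbitrarily assigns it to Birch.

```python
import numpy as np


def perceptron_output(a, w, beta):
    """Weighted sum of the features plus the bias: f = w^T a + beta."""
    return float(np.dot(w, a) + beta)


def classify(a, w, beta):
    """Apply the decision rule: f > 0 is Pine, otherwise Birch.

    Assumption: f == 0 (a point exactly on the boundary) is labeled Birch.
    """
    return "Pine" if perceptron_output(a, w, beta) > 0 else "Birch"


# Hypothetical parameters: only grain prominence matters (w2 = 0),
# and f = -10 * gp + 4 changes sign at gp = 0.4.
w = np.array([-10.0, 0.0])
beta = 4.0

# Feature vectors are (grain prominence, brightness)
pine_like = np.array([0.2, 5.5])
birch_like = np.array([0.6, 5.5])

print(perceptron_output(pine_like, w, beta), classify(pine_like, w, beta))
print(perceptron_output(birch_like, w, beta), classify(birch_like, w, beta))
```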


### 3.1 Be the Learning Algorithm!

Now it's your turn. Adjust the weights ($w_1$, $w_2$) and bias ($\beta$) to find a line that separates Pine from Birch.

**Your goal:** Find values of $w_1$, $w_2$, and $\beta$ such that:

- All Pine samples are on one side of the line (positive side)
- All Birch samples are on the other side (negative side)

**Hints:**

- $w_1$ and $w_2$ control the **slope** (direction) of the line
- $\beta$ controls the **offset** (shifts the line up/down)
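To see where these hints come from, solve the boundary equation for brightness ($a_2$), assuming $w_2 \neq 0$:

$$w_1 a_1 + w_2 a_2 + \beta = 0 \quad\Longrightarrow\quad a_2 = -\frac{w_1}{w_2}\, a_1 - \frac{\beta}{w_2}$$

So the slope of the line is $-w_1 / w_2$ and its intercept on the brightness axis is $-\beta / w_2$. When $w_2 = 0$, the boundary is instead the vertical line $a_1 = -\beta / w_1$ (assuming $w_1 \neq 0$).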


In [None]:
# Interactive perceptron tuning with ipywidgets


def plot_perceptron(w1, w2, beta):
    """Plot the decision boundary for given perceptron parameters."""
    fig = go.Figure()

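    # Calculate decision boundary line
    # w1 * gp + w2 * b + beta = 0
    # b = (-w1 * gp - beta) / w2  (if w2 != 0)
    gp_range = np.linspace(-0.1, 1.1, 100)

    if abs(w2) > 0.01:
        b_line = (-w1 * gp_range - beta) / w2
        fig.add_trace(
            go.Scatter(
                x=gp_range,
                y=b_line,
                mode="lines",
                line=dict(color="red", width=3),
                name=f"Boundary: {w1:.1f}·gp + {w2:.1f}·b + {beta:.1f} = 0",
            )
        )
    elif abs(w1) > 0.01:
        # w2 is (almost) zero: the boundary is the vertical line gp = -beta / w1
        gp_vertical = -beta / w1
        fig.add_trace(
            go.Scatter(
                x=[gp_vertical, gp_vertical],
                y=[0, 10],
                mode="lines",
                line=dict(color="red", width=3),
                name=f"Boundary: gp = {gp_vertical:.2f}",
            )
        )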

    # Color points by prediction
    pine_data = wood_data[wood_data["wood_type"] == "Pine"]
    birch_data = wood_data[wood_data["wood_type"] == "Birch"]

    # Calculate predictions
    pine_pred = w1 * pine_data["grain_prominence"] + w2 * pine_data["brightness"] + beta
    birch_pred = (
        w1 * birch_data["grain_prominence"] + w2 * birch_data["brightness"] + beta
    )

    # Check if correctly classified (Pine should be positive, Birch should be negative)
    pine_correct = (pine_pred > 0).sum()
    birch_correct = (birch_pred < 0).sum()
    total_correct = pine_correct + birch_correct

    # Add Pine samples
    fig.add_trace(
        go.Scatter(
            x=pine_data["grain_prominence"],
            y=pine_data["brightness"],
            mode="markers",
            marker=dict(size=15, color="#2E86AB", line=dict(width=2, color="black")),
            name="Pine",
        )
    )

    # Add Birch samples
    fig.add_trace(
        go.Scatter(
            x=birch_data["grain_prominence"],
            y=birch_data["brightness"],
            mode="markers",
            marker=dict(size=15, color="#A23B72", line=dict(width=2, color="black")),
            name="Birch",
        )
    )

    # Title with accuracy
    accuracy = total_correct / len(wood_data) * 100
    title_text = f"Testing the Wood Classifier: Accuracy {total_correct}/{len(wood_data)} ({accuracy:.0f}%)"
    if total_correct == len(wood_data):
        title_text += " ✅ Perfect!"

    fig.update_layout(
        title=title_text,
        xaxis_title="Grain Prominence (gp) = a₁",
        yaxis_title="Brightness (b) = a₂",
        xaxis=dict(range=[-0.1, 1.1]),
        yaxis=dict(range=[0, 10]),
        width=750,
        height=500,
        font=dict(size=14),
    )

    fig.show()

    # Print equation
    print(f"\nPerceptron equation: f(w, β) = {w1:.1f}·a₁ + {w2:.1f}·a₂ + {beta:.1f}")
    print(
        f"                   = {w1:.1f}·(grain_prominence) + {w2:.1f}·(brightness) + {beta:.1f}"
    )
    print("\nDecision: If f > 0 → Pine, If f < 0 → Birch")
    print(
        f"\nCorrect: Pine {pine_correct}/{len(pine_data)}, Birch {birch_correct}/{len(birch_data)}"
    )


# Create interactive widgets
w1_slider = widgets.FloatSlider(
    value=-8.0, min=-20, max=10, step=0.5, description="w₁ (gp):"
)
w2_slider = widgets.FloatSlider(
    value=1.0, min=-10, max=10, step=0.5, description="w₂ (b):"
)
beta_slider = widgets.FloatSlider(
    value=0.0, min=-20, max=20, step=0.5, description="β (bias):"
)

# Create interactive output
out = widgets.interactive_output(
    plot_perceptron, {"w1": w1_slider, "w2": w2_slider, "beta": beta_slider}
)

# Display
print(
    "🎮 INTERACTIVE: Adjust the sliders to find a decision boundary that separates Pine from Birch!"
)
print("=" * 80)
display(widgets.VBox([widgets.HBox([w1_slider, w2_slider, beta_slider]), out]))

### 3.2 What Did You Just Do?

By adjusting $w_1$, $w_2$, and $\beta$ until the line separated the classes, you did **exactly what a machine learning algorithm does**, but manually!

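Training algorithms such as the perceptron learning rule and **gradient descent** (the algorithm that trains neural networks) do this automatically:

1. Start with random weights
2. Measure the error of the current boundary (for example, which samples are misclassified)
3. Adjust weights slightly in the direction that reduces errors
4. Repeat until the error stops improving (for separable data, until the boundary separates the classes)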

**Key insight:** "Learning" in ML is just finding the right parameters for a mathematical function.
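The loop described above can be sketched with the classic perceptron learning rule, which nudges the weights toward each misclassified sample. This is a minimal illustration under stated assumptions, not the lab's own training code. The dataset is hypothetical, brightness is rescaled to 0-1 so both features have similar ranges, labels are encoded as +1 (Pine) and -1 (Birch), and the weights start at zero rather than at random values for reproducibility.

```python
import numpy as np

# Hypothetical training set: (grain prominence, brightness rescaled to 0-1)
X = np.array([
    [0.15, 0.5], [0.25, 0.7], [0.20, 0.4], [0.30, 0.6],  # Pine
    [0.55, 0.5], [0.65, 0.6], [0.60, 0.4], [0.70, 0.7],  # Birch
])
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])  # +1 = Pine, -1 = Birch


def train_perceptron(X, y, lr=0.1, max_epochs=1000):
    """Perceptron learning rule: update only on misclassified samples."""
    w = np.zeros(X.shape[1])  # assumption: zero start instead of random
    beta = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for a, target in zip(X, y):
            # A sample is wrong if f has the wrong sign (or is exactly zero)
            if target * (np.dot(w, a) + beta) <= 0:
                w += lr * target * a      # rotate/shift the boundary
                beta += lr * target       # toward the misclassified sample
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: boundary found
            break
    return w, beta


w, beta = train_perceptron(X, y)
predictions = np.where(X @ w + beta > 0, 1, -1)
print("weights:", w, "bias:", beta)
print("all correct:", bool(np.all(predictions == y)))
```

Because this toy dataset is linearly separable, the rule is guaranteed to stop after a finite number of updates. On non-separable data it would keep cycling until `max_epochs` is reached.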


## 4. The 3D Feature Space

So far we've used 2 features: brightness and grain prominence. What happens when we add a third feature, grain frequency?

In 3D, the decision boundary becomes a **plane** instead of a line.

$$f(\mathbf{w}, \beta) = w_1 \cdot gp + w_2 \cdot b + w_3 \cdot f + \beta = 0$$


In [None]:
# 3D scatter plot with all three features
fig = go.Figure()

# Add Pine samples
pine_data = wood_data[wood_data["wood_type"] == "Pine"]
fig.add_trace(
    go.Scatter3d(
        x=pine_data["grain_prominence"],
        y=pine_data["brightness"],
        z=pine_data["grain_frequency"],
        mode="markers",
        marker=dict(size=8, color="#2E86AB", line=dict(width=1, color="black")),
        name="Pine",
    )
)

# Add Birch samples
birch_data = wood_data[wood_data["wood_type"] == "Birch"]
fig.add_trace(
    go.Scatter3d(
        x=birch_data["grain_prominence"],
        y=birch_data["brightness"],
        z=birch_data["grain_frequency"],
        mode="markers",
        marker=dict(size=8, color="#A23B72", line=dict(width=1, color="black")),
        name="Birch",
    )
)

# Add a separating plane
# Vertical plane at gp = 0.4 (perpendicular to gp axis)
# This plane extends across all brightness and frequency values
b_plane = np.linspace(0, 10, 10)
f_plane = np.linspace(0, 1, 10)
b_mesh, f_mesh = np.meshgrid(b_plane, f_plane)
# gp is constant at 0.4
gp_mesh = np.full_like(b_mesh, 0.4)

fig.add_trace(
    go.Surface(
        x=gp_mesh,
        y=b_mesh,
        z=f_mesh,
        colorscale=[[0, "rgba(255,0,0,0.3)"], [1, "rgba(255,0,0,0.3)"]],
        showscale=False,
        name="Decision Plane",
    )
)

fig.update_layout(
    title="3D Feature Space: Brightness, Grain Prominence, Grain Frequency",
    scene=dict(
        xaxis_title="Grain Prominence (gp)",
        yaxis_title="Brightness (b)",
        zaxis_title="Grain Frequency (f)",
        xaxis=dict(range=[0, 1]),
        yaxis=dict(range=[0, 10]),
        zaxis=dict(range=[0, 1]),
    ),
    width=800,
    height=600,
    font=dict(size=12),
)

fig.show()

print("🔄 INTERACTIVE: Rotate the 3D plot to explore the feature space!")
print("\nNotice the separating PLANE at gp ≈ 0.4:")
print("   • The plane is perpendicular to the gp axis (only gp matters)")
print("   • It extends across ALL brightness and frequency values")
print("   • This shows: in 3D, the linear decision boundary is a PLANE!")

### 4.1 Why We Need a Plane in 3D

Look at the 3D plot above. The key observations:

1. **A plane separates the classes**: in 3D, we need a 2D surface (plane) as our decision boundary
2. **The plane is perpendicular to the gp axis**, because only grain prominence distinguishes the classes
3. **Brightness and frequency don't help**: they just add extra dimensions without improving separation

**This is why feature selection matters:** Adding irrelevant features increases complexity without improving performance!
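A rough way to quantify which features help is to compare the gap between class means with the spread inside each class. The score below is an illustrative heuristic introduced for this sketch (similar in spirit to a Fisher score), and the readings are hypothetical values rather than the lab's dataset.

```python
import numpy as np

# Hypothetical readings per feature: (Pine values, Birch values)
features = {
    "grain_prominence": (
        np.array([0.15, 0.25, 0.20, 0.30]),
        np.array([0.55, 0.65, 0.60, 0.70]),
    ),
    "brightness": (
        np.array([5.0, 7.0, 4.0, 6.0]),
        np.array([5.0, 6.0, 4.0, 7.0]),
    ),
    "grain_frequency": (
        np.array([0.4, 0.6, 0.5, 0.5]),
        np.array([0.5, 0.4, 0.6, 0.5]),
    ),
}


def separation_score(pine, birch):
    """Gap between class means in units of the average within-class spread."""
    spread = (pine.std() + birch.std()) / 2
    return abs(pine.mean() - birch.mean()) / spread


scores = {name: separation_score(p, b) for name, (p, b) in features.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name:<18} score = {score:.2f}")
print("Most useful feature:", best)
```

A score near zero means the two classes have almost the same average for that feature, so it cannot help a linear boundary on its own.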


## 5. When Linear Separation Fails

What if we tried to classify using **brightness and grain frequency** (ignoring grain prominence)?

Since NEITHER brightness nor frequency distinguishes the classes, the two clusters overlap, and (barring a lucky draw of points) this feature space is **NOT linearly separable**!


In [None]:
# Show the b vs f feature space - NOT linearly separable!
fig = go.Figure()

# Add Pine samples
pine_data = wood_data[wood_data["wood_type"] == "Pine"]
fig.add_trace(
    go.Scatter(
        x=pine_data["grain_frequency"],
        y=pine_data["brightness"],
        mode="markers",
        marker=dict(size=15, color="#2E86AB", line=dict(width=2, color="black")),
        name="Pine",
    )
)

# Add Birch samples
birch_data = wood_data[wood_data["wood_type"] == "Birch"]
fig.add_trace(
    go.Scatter(
        x=birch_data["grain_frequency"],
        y=birch_data["brightness"],
        mode="markers",
        marker=dict(size=15, color="#A23B72", line=dict(width=2, color="black")),
        name="Birch",
    )
)

fig.update_layout(
    title="Feature Space: Brightness vs Grain Frequency (NOT Linearly Separable!)",
    xaxis_title="Grain Frequency (f)",
    yaxis_title="Brightness (b)",
    xaxis=dict(range=[0, 1]),
    yaxis=dict(range=[0, 10]),
    width=700,
    height=550,
    font=dict(size=14),
)

fig.show()

print("❌ This data is NOT linearly separable!")
print("   Pine and Birch have the SAME brightness and frequency distributions.")
print("   No matter how you draw a straight line, you can't separate them!")

### 5.1 Try It Yourself: Can You Find a Line?

Use the sliders below to try to find a line that separates Pine from Birch using brightness and grain frequency.

**Spoiler:** You won't be able to achieve 100% accuracy with a straight line!


In [None]:
# Interactive perceptron for the non-separable case (b vs f)


def plot_perceptron_b_f(w1, w2, beta):
    """Plot the decision boundary for b vs f feature space."""
    fig = go.Figure()

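    # Calculate decision boundary line
    # w1 * f + w2 * b + beta = 0
    # b = (-w1 * f - beta) / w2
    f_range = np.linspace(0, 1, 100)

    if abs(w2) > 0.01:
        b_line = (-w1 * f_range - beta) / w2
        fig.add_trace(
            go.Scatter(
                x=f_range,
                y=b_line,
                mode="lines",
                line=dict(color="red", width=3),
                name="Boundary",
            )
        )
    elif abs(w1) > 0.01:
        # w2 is (almost) zero: the boundary is the vertical line f = -beta / w1
        f_vertical = -beta / w1
        fig.add_trace(
            go.Scatter(
                x=[f_vertical, f_vertical],
                y=[0, 10],
                mode="lines",
                line=dict(color="red", width=3),
                name="Boundary",
            )
        )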

    # Get data
    pine_data = wood_data[wood_data["wood_type"] == "Pine"]
    birch_data = wood_data[wood_data["wood_type"] == "Birch"]

    # Calculate predictions
    pine_pred = w1 * pine_data["grain_frequency"] + w2 * pine_data["brightness"] + beta
    birch_pred = (
        w1 * birch_data["grain_frequency"] + w2 * birch_data["brightness"] + beta
    )

    pine_correct = (pine_pred > 0).sum()
    birch_correct = (birch_pred < 0).sum()
    total_correct = pine_correct + birch_correct

    # Add samples
    fig.add_trace(
        go.Scatter(
            x=pine_data["grain_frequency"],
            y=pine_data["brightness"],
            mode="markers",
            marker=dict(size=15, color="#2E86AB", line=dict(width=2, color="black")),
            name="Pine",
        )
    )

    fig.add_trace(
        go.Scatter(
            x=birch_data["grain_frequency"],
            y=birch_data["brightness"],
            mode="markers",
            marker=dict(size=15, color="#A23B72", line=dict(width=2, color="black")),
            name="Birch",
        )
    )

    accuracy = total_correct / len(wood_data) * 100
    title_text = f"Brightness vs Grain Frequency: Accuracy {total_correct}/{len(wood_data)} ({accuracy:.0f}%)"

    fig.update_layout(
        title=title_text,
        xaxis_title="Grain Frequency (f)",
        yaxis_title="Brightness (b)",
        xaxis=dict(range=[0, 1]),
        yaxis=dict(range=[0, 10]),
        width=700,
        height=500,
        font=dict(size=14),
    )

    display(fig)

    if total_correct == len(wood_data):
        print("✅ Perfect separation! (Lucky arrangement of points!)")
    else:
        print(f"❌ Current setting: {total_correct}/{len(wood_data)} correct")
        print("   Overlapping classes like these usually cannot be split by a straight line.")


# Create widgets
w1_slider2 = widgets.FloatSlider(
    value=1.0, min=-10, max=10, step=0.5, description="w₁ (f):"
)
w2_slider2 = widgets.FloatSlider(
    value=1.0, min=-10, max=10, step=0.5, description="w₂ (b):"
)
beta_slider2 = widgets.FloatSlider(
    value=0.0, min=-10, max=10, step=0.5, description="β (bias):"
)

out2 = widgets.interactive_output(
    plot_perceptron_b_f, {"w1": w1_slider2, "w2": w2_slider2, "beta": beta_slider2}
)

print(
    "🎮 TRY IT: Can you find a line that separates Pine from Birch? (Hint: You can't!)"
)
print("=" * 80)
display(widgets.VBox([widgets.HBox([w1_slider2, w2_slider2, beta_slider2]), out2]))

### 5.2 The Limits of Linear Models

You've just experienced what researchers discovered in the 1960s: **simple perceptrons can't solve all problems.**

In 1969, Marvin Minsky and Seymour Papert published "Perceptrons," proving mathematically that single-layer perceptrons cannot solve problems where the classes aren't linearly separable.

This contributed to the first "AI Winter", a period where funding and interest in neural networks collapsed.

**The solution?** More powerful models that can learn non-linear decision boundaries.
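The limitation described above is usually illustrated with the XOR pattern, where opposite corners of a square share a label. XOR is not part of this lab's dataset. It is brought in here only as the standard textbook example, and the grid search below is an illustration rather than a proof.

```python
import itertools

import numpy as np

# XOR pattern: opposite corners share a class (+1 or -1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, 1, 1, -1])

# Try every combination of w1, w2, beta on a coarse grid (21^3 lines)
grid = np.linspace(-5, 5, 21)
best_correct = 0
for w1, w2, beta in itertools.product(grid, repeat=3):
    # Same decision rule as the perceptron: f > 0 is the positive class
    predictions = np.where(w1 * X[:, 0] + w2 * X[:, 1] + beta > 0, 1, -1)
    best_correct = max(best_correct, int(np.sum(predictions == y)))

print(f"Best linear boundary on the grid: {best_correct}/4 correct")
```

No straight line classifies all four points, because the two positive corners and the two negative corners would require the same quantity, $w_1 + w_2 + 2\beta$, to be both positive and non-positive.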


## 6. Non-Linear Decision Boundaries

What if we could use a **curved** boundary instead of a straight line?

Let's create a dataset that's separable with a curve but not with a line, then show how a more powerful model (Support Vector Machine with a non-linear kernel) can handle it.


In [None]:
# Create a dataset that requires a non-linear boundary
# Pine forms a cluster in the center, Birch forms a ring around it
np.random.seed(42)
n_samples_nl = 12

# Pine: cluster in center
pine_gp_nl = np.random.normal(0.5, 0.1, n_samples_nl)
pine_f_nl = np.random.normal(0.5, 0.1, n_samples_nl)

# Birch: ring around the outside
angles = np.random.uniform(0, 2 * np.pi, n_samples_nl)
radii = np.random.uniform(0.3, 0.4, n_samples_nl)
birch_gp_nl = 0.5 + radii * np.cos(angles)
birch_f_nl = 0.5 + radii * np.sin(angles)

# Clip to valid range
pine_gp_nl = np.clip(pine_gp_nl, 0.05, 0.95)
pine_f_nl = np.clip(pine_f_nl, 0.05, 0.95)
birch_gp_nl = np.clip(birch_gp_nl, 0.05, 0.95)
birch_f_nl = np.clip(birch_f_nl, 0.05, 0.95)

# Create dataset
nonlinear_data = pd.DataFrame(
    {
        "gp": np.concatenate([pine_gp_nl, birch_gp_nl]),
        "f": np.concatenate([pine_f_nl, birch_f_nl]),
        "wood_type": ["Pine"] * n_samples_nl + ["Birch"] * n_samples_nl,
    }
)

# Visualize
fig = go.Figure()

pine_nl = nonlinear_data[nonlinear_data["wood_type"] == "Pine"]
birch_nl = nonlinear_data[nonlinear_data["wood_type"] == "Birch"]

fig.add_trace(
    go.Scatter(
        x=pine_nl["gp"],
        y=pine_nl["f"],
        mode="markers",
        marker=dict(size=15, color="#2E86AB", line=dict(width=2, color="black")),
        name="Pine",
    )
)

fig.add_trace(
    go.Scatter(
        x=birch_nl["gp"],
        y=birch_nl["f"],
        mode="markers",
        marker=dict(size=15, color="#A23B72", line=dict(width=2, color="black")),
        name="Birch",
    )
)

fig.update_layout(
    title="Non-Linear Problem: Pine in Center, Birch Around the Edge",
    xaxis_title="Feature 1",
    yaxis_title="Feature 2",
    xaxis=dict(range=[0, 1]),
    yaxis=dict(range=[0, 1]),
    width=600,
    height=600,
    font=dict(size=14),
)

fig.show()

print("🤔 Can a straight line separate these classes?")
print("   No! But a CIRCLE could...")

In [None]:
# Show that a non-linear (RBF kernel) SVM can separate this data

# Prepare data
X_nl = nonlinear_data[["gp", "f"]].values
y_nl = (nonlinear_data["wood_type"] == "Pine").astype(int)

# Fit a non-linear SVM
svm_rbf = SVC(kernel="rbf", gamma=50, C=1.0)
svm_rbf.fit(X_nl, y_nl)

# Create a mesh to plot decision boundary
xx, yy = np.meshgrid(np.linspace(0, 1, 200), np.linspace(0, 1, 200))
Z = svm_rbf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Plot with decision boundary
fig = go.Figure()

# Add decision boundary as contour
fig.add_trace(
    go.Contour(
        x=np.linspace(0, 1, 200),
        y=np.linspace(0, 1, 200),
        z=Z,
        contours=dict(start=0, end=0, size=1, coloring="none", showlines=True),
        line=dict(color="red", width=3),
        showscale=False,
        name="Non-linear Boundary",
    )
)

# Add shaded regions
fig.add_trace(
    go.Contour(
        x=np.linspace(0, 1, 200),
        y=np.linspace(0, 1, 200),
        z=Z,
        contours=dict(start=-10, end=0, coloring="fill"),
        colorscale=[[0, "rgba(162, 59, 114, 0.2)"], [1, "rgba(162, 59, 114, 0.2)"]],
        showscale=False,
        name="Birch Region",
    )
)

fig.add_trace(
    go.Contour(
        x=np.linspace(0, 1, 200),
        y=np.linspace(0, 1, 200),
        z=Z,
        contours=dict(start=0, end=10, coloring="fill"),
        colorscale=[[0, "rgba(46, 134, 171, 0.2)"], [1, "rgba(46, 134, 171, 0.2)"]],
        showscale=False,
        name="Pine Region",
    )
)

# Add data points
fig.add_trace(
    go.Scatter(
        x=pine_nl["gp"],
        y=pine_nl["f"],
        mode="markers",
        marker=dict(size=15, color="#2E86AB", line=dict(width=2, color="black")),
        name="Pine",
    )
)

fig.add_trace(
    go.Scatter(
        x=birch_nl["gp"],
        y=birch_nl["f"],
        mode="markers",
        marker=dict(size=15, color="#A23B72", line=dict(width=2, color="black")),
        name="Birch",
    )
)

fig.update_layout(
    title="Non-Linear Boundary: SVM with RBF Kernel Can Separate This!",
    xaxis_title="Feature 1",
    yaxis_title="Feature 2",
    xaxis=dict(range=[0, 1]),
    yaxis=dict(range=[0, 1]),
    width=650,
    height=600,
    font=dict(size=14),
)

fig.show()

# Calculate accuracy
y_pred = svm_rbf.predict(X_nl)
accuracy = (y_pred == y_nl).mean() * 100
print(f"✅ SVM with RBF kernel training accuracy: {accuracy:.0f}%")
print("\nThe curved (non-linear) boundary can separate the classes!")
print("This is the power of kernel methods: they can learn complex patterns.")

### 6.1 The Kernel Trick (Preview)

How does the SVM find this curved boundary? The key idea is the **kernel trick**:

1. **Transform** the data into a higher-dimensional space where it IS linearly separable (the kernel does this implicitly, without computing the new coordinates)
2. **Find** a linear boundary in that high-dimensional space
3. **Project** back to the original space, where the boundary appears curved!

This is a powerful technique you'll learn more about in future courses. The key insight:

> **Non-linear problems can often be solved by transforming them into linear problems in a higher-dimensional space.**
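The idea can be made concrete with an explicit feature map. The sketch below adds a third feature, the squared distance from the center, to a hypothetical center-and-ring dataset. This is only an illustration of the "lift to a higher dimension" idea. An RBF kernel never computes such coordinates explicitly, and the helper `lift` and the chosen radii are assumptions made for this example.

```python
import numpy as np

# Hypothetical data: Pine on a small circle, Birch on a larger ring,
# both centered at (0.5, 0.5) in the (gp, f) plane.
angles = np.linspace(0, 2 * np.pi, 8, endpoint=False)
circle = np.column_stack([np.cos(angles), np.sin(angles)])
pine = 0.5 + 0.10 * circle
birch = 0.5 + 0.35 * circle


def lift(points, center=0.5):
    """Append r^2 (squared distance from the center) as a third feature."""
    r2 = np.sum((points - center) ** 2, axis=1)
    return np.column_stack([points, r2])


pine_3d = lift(pine)
birch_3d = lift(birch)

# In the lifted space a flat plane r^2 = threshold separates the classes.
threshold = 0.05
print("largest Pine r^2:  ", pine_3d[:, 2].max())
print("smallest Birch r^2:", birch_3d[:, 2].min())
print("boundary radius in the original plane:", np.sqrt(threshold))
```

The plane $r^2 = 0.05$ is linear in the three lifted coordinates, yet in the original two-dimensional plot it appears as a circle around the Pine cluster.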


## 7. Feature Space Comparison: Which Features Work?

Let's compare all possible 2D feature combinations to see which ones are useful for classification.


In [None]:
# Pairwise scatter plot matrix
fig = make_subplots(
    rows=1,
    cols=3,
    subplot_titles=[
        "Grain Prominence vs Brightness<br>(Linearly Separable ✅)",
        "Brightness vs Grain Frequency<br>(NOT Separable ❌)",
        "Grain Prominence vs Grain Frequency<br>(Linearly Separable ✅)",
    ],
)

# Plot 1: gp vs b (good!)
pine_data = wood_data[wood_data["wood_type"] == "Pine"]
birch_data = wood_data[wood_data["wood_type"] == "Birch"]

fig.add_trace(
    go.Scatter(
        x=pine_data["grain_prominence"],
        y=pine_data["brightness"],
        mode="markers",
        marker=dict(size=12, color="#2E86AB", line=dict(width=1, color="black")),
        name="Pine",
        legendgroup="pine",
        showlegend=True,
    ),
    row=1,
    col=1,
)

fig.add_trace(
    go.Scatter(
        x=birch_data["grain_prominence"],
        y=birch_data["brightness"],
        mode="markers",
        marker=dict(size=12, color="#A23B72", line=dict(width=1, color="black")),
        name="Birch",
        legendgroup="birch",
        showlegend=True,
    ),
    row=1,
    col=1,
)

# Plot 2: b vs f (bad - NOT separable)
fig.add_trace(
    go.Scatter(
        x=pine_data["grain_frequency"],
        y=pine_data["brightness"],
        mode="markers",
        marker=dict(size=12, color="#2E86AB", line=dict(width=1, color="black")),
        name="Pine",
        legendgroup="pine",
        showlegend=False,
    ),
    row=1,
    col=2,
)

fig.add_trace(
    go.Scatter(
        x=birch_data["grain_frequency"],
        y=birch_data["brightness"],
        mode="markers",
        marker=dict(size=12, color="#A23B72", line=dict(width=1, color="black")),
        name="Birch",
        legendgroup="birch",
        showlegend=False,
    ),
    row=1,
    col=2,
)

# Plot 3: gp vs f (good - linearly separable)
fig.add_trace(
    go.Scatter(
        x=pine_data["grain_prominence"],
        y=pine_data["grain_frequency"],
        mode="markers",
        marker=dict(size=12, color="#2E86AB", line=dict(width=1, color="black")),
        name="Pine",
        legendgroup="pine",
        showlegend=False,
    ),
    row=1,
    col=3,
)

fig.add_trace(
    go.Scatter(
        x=birch_data["grain_prominence"],
        y=birch_data["grain_frequency"],
        mode="markers",
        marker=dict(size=12, color="#A23B72", line=dict(width=1, color="black")),
        name="Birch",
        legendgroup="birch",
        showlegend=False,
    ),
    row=1,
    col=3,
)

# Update axes
fig.update_xaxes(title_text="Grain Prominence", row=1, col=1)
fig.update_yaxes(title_text="Brightness", row=1, col=1)
fig.update_xaxes(title_text="Grain Frequency", row=1, col=2)
fig.update_yaxes(title_text="Brightness", row=1, col=2)
fig.update_xaxes(title_text="Grain Prominence", row=1, col=3)
fig.update_yaxes(title_text="Grain Frequency", row=1, col=3)

fig.update_layout(
    title="Feature Space Comparison: Which Features Separate the Classes?",
    width=1100,
    height=400,
    font=dict(size=12),
)

fig.show()

In [None]:
# Summary table
print("Feature Combination Summary")
print("=" * 70)
print(f"{'Feature Pair':<35} {'Linearly Separable?':<20} {'Why?':<25}")
print("-" * 70)
print(
    f"{'Grain Prominence + Brightness':<35} {'YES ✅':<20} {'gp separates classes':<25}"
)
print(
    f"{'Grain Prominence + Frequency':<35} {'YES ✅':<20} {'gp separates classes':<25}"
)
print(f"{'Brightness + Frequency':<35} {'NO ❌':<20} {'Neither feature helps!':<25}")
print("-" * 70)
print("\n💡 Key Insight: Only Grain Prominence distinguishes Pine from Birch!")
print("   Brightness and Frequency are useless: they're the same for both classes.")
print("   But in 3D, we still need a PLANE to separate the classes.")

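The summary above can also be checked numerically rather than by eye. The sketch below uses a hypothetical stand-in for `wood_data` (the value ranges are invented, and which class has the higher grain prominence is an assumption) and asks, for each feature on its own, whether a single threshold puts every Pine sample on one side and every Birch sample on the other. If one feature passes this test, any feature pair that includes it is linearly separable. Failing the test only rules out a single-feature threshold; it does not by itself prove that a pair of features is non-separable.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50

# Hypothetical stand-in for wood_data: (pine_values, birch_values) per feature.
# Only grain prominence is drawn from different ranges for the two classes.
features = {
    "grain_prominence": (rng.uniform(0.1, 0.4, n), rng.uniform(0.6, 0.9, n)),
    "brightness": (rng.uniform(0.2, 0.8, n), rng.uniform(0.2, 0.8, n)),
    "grain_frequency": (rng.uniform(0.2, 0.8, n), rng.uniform(0.2, 0.8, n)),
}


def separable_by_threshold(pine_values, birch_values):
    """Return True if one class lies entirely below the other on this feature."""
    return bool(
        pine_values.max() < birch_values.min()
        or birch_values.max() < pine_values.min()
    )


single_feature_result = {
    name: separable_by_threshold(pine, birch)
    for name, (pine, birch) in features.items()
}
for name, ok in single_feature_result.items():
    print(f"{name:<18} separable by a single threshold: {ok}")
```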
## 8. Key Takeaways

### What We Learned

1. **Feature spaces are the geometric world where ML algorithms operate**
   - Each data point becomes a location in multi-dimensional space
   - Classification = finding boundaries (lines, planes, curves) that separate classes
   - You literally watched clusters form as training data accumulated!

2. **Feature selection is EVERYTHING**
   - Grain prominence separates Pine from Birch → linearly separable ✅
   - Brightness and frequency don't help → NOT linearly separable ❌
   - The right feature makes a simple model work; wrong features doom even sophisticated models

3. **Decision boundaries scale with dimensions**
   - In 2D: we need a **LINE** to separate classes
   - In 3D: we need a **PLANE** to separate classes
   - In n-dimensions: we need a **HYPERPLANE**
   - But if the key feature is only grain prominence, even the 3D plane is perpendicular to that axis!

4. **You just became a perceptron (manually!)**
   - You adjusted sliders for $w_1$, $w_2$, and $\beta$ to find a decision boundary
   - That's what a learning algorithm (the perceptron update rule, or gradient descent on a loss) does automatically
   - $f(\mathbf{w}, \beta) = \mathbf{w}^T \mathbf{a} + \beta$: weights control direction, bias controls position
   - "Learning" = tuning these parameters to minimize misclassification

5. **Linear models hit a wall with non-separable data**
   - You tried (and failed) to separate brightness vs frequency with a straight line
   - Minsky & Papert showed in 1969 that single-layer perceptrons cannot solve problems like this (XOR is the classic example), which contributed to the first AI Winter
   - Solution: non-linear models like SVMs with RBF kernels can learn curved boundaries

6. **Irrelevant features are worse than useless**
   - They increase dimensionality (computational cost)
   - They add noise without signal
   - They make data sparse in high dimensions (curse of dimensionality)
   - Always prefer fewer, meaningful features over many noisy ones

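Takeaways 4 and 5 can be seen side by side in a few lines of code. The sketch below implements the classic perceptron update rule, which may differ from whatever training procedure you use elsewhere in the course. The function name `train_perceptron`, the ±1 label convention, and the four toy points are assumptions made for illustration. The same four points are labelled once in a linearly separable way (AND) and once as XOR.

```python
import numpy as np


def train_perceptron(A, labels, epochs=100, lr=1.0):
    """Classic perceptron rule; labels must be -1 or +1.

    Returns (w, beta, mistakes_in_last_epoch).
    """
    w = np.zeros(A.shape[1])
    beta = 0.0
    errors = 0
    for _ in range(epochs):
        errors = 0
        for a, t in zip(A, labels):
            # Misclassified when the sign of w.a + beta disagrees with the label.
            if t * (w @ a + beta) <= 0:
                w += lr * t * a   # rotate/scale the boundary toward the point
                beta += lr * t    # shift the boundary
                errors += 1
        if errors == 0:  # a full pass with no mistakes: boundary found
            break
    return w, beta, errors


A = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])

# AND-style labels are linearly separable, so training stops with no mistakes.
t_and = np.array([-1, -1, -1, 1])
w, beta, errors_and = train_perceptron(A, t_and)
print("AND mistakes in last epoch:", errors_and, "| w =", w, "| beta =", beta)

# XOR labels on the same points: no line works, so mistakes never reach zero.
t_xor = np.array([-1, 1, 1, -1])
_, _, errors_xor = train_perceptron(A, t_xor)
print("XOR mistakes in last epoch:", errors_xor)
```

Each update nudges `w` and `beta` by one misclassified point, which is the automated version of dragging the sliders. On XOR the loop simply runs out of epochs, because no setting of the three parameters classifies all four points correctly.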

## 9. Discussion Questions

1. **Feature Engineering in Practice:** You saw that grain prominence was the "golden feature" that separated Pine from Birch. If you were a data scientist at the lumber mill, what other wood properties might you measure? (Density? Weight? Color channels? Knot patterns? Surface roughness?)

2. **The Perceptron's Historical Tragedy:** In 1969, Minsky & Papert's book "Perceptrons" proved that single-layer perceptrons couldn't solve non-linearly separable problems (like XOR). This contributed to a steep decline in neural network research that lasted until the 1980s. But today, deep neural networks dominate AI. What breakthrough made the difference? (Hint: think about stacking multiple layers...)

3. **When Linear Isn't Enough:** In Section 6, you saw how an SVM with an RBF kernel could create a circular decision boundary. Can you think of real-world classification problems where you'd NEVER expect a straight-line boundary to work? (Medical diagnosis? Image recognition? Fraud detection?)

4. **The Curse of Dimensionality:** Brightness and frequency were useless features. But what if you measured 100 useless features? Would "more data" help? (Consider: in 100-D space, the distances between points become nearly equal, so "nearest" neighbors are barely nearer than any other point!)

5. **Hands-On Insight:** When you adjusted the sliders to tune the perceptron, did you develop an intuition for what $w_1$, $w_2$, and $\beta$ actually DO? Could you explain to a non-technical person how a perceptron "learns"?

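If you want to explore Question 4 empirically, the following sketch may help. It draws uniformly random points (an assumption; real features are rarely uniform or independent) and compares the distance from a query point to its nearest and farthest neighbor as the number of dimensions grows. The helper name `distance_contrast` and the chosen dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def distance_contrast(n_points, n_dims):
    """Ratio of nearest to farthest distance from a random query point."""
    points = rng.uniform(size=(n_points, n_dims))
    query = rng.uniform(size=n_dims)
    distances = np.linalg.norm(points - query, axis=1)
    return distances.min() / distances.max()


contrast = {d: distance_contrast(500, d) for d in (2, 10, 100, 1000)}
for d, ratio in contrast.items():
    print(f"{d:>5} dimensions: nearest/farthest distance ratio = {ratio:.2f}")
```

A ratio close to 0 means the nearest neighbor is much closer than the farthest one; a ratio close to 1 means every point is about equally far away, which is when distance-based methods start to lose their meaning.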

## The Bottom Line

> **"The question isn't 'which algorithm should I use?'; it's 'which features should I measure?'"**

You just proved this yourself:

- With grain prominence → a simple perceptron works perfectly
- Without it → even manual tuning can't find a solution

> **A perfect algorithm with the wrong features will fail. A simple algorithm with the right features will succeed.**

When you rotated that 3D plot, you were seeing the world through an algorithm's eyes. Every ML model, from perceptrons to GPT, fundamentally operates by finding boundaries in high-dimensional feature spaces. Master this intuition, and you've mastered the core of machine learning.

This is Lab 04's lesson ("bad data beats good models") from a geometric perspective. Feature spaces make it visual: if your classes overlap completely, no amount of algorithmic sophistication can separate them.

> **Machine learning algorithms don't "see" raw data; they see points in feature space. Understanding this geometry is understanding how ML works.**
