# Neural Network Fundamentals

## Part 4: The Perceptron - First Prediction

### The Brain's Decision Committee - Chapter 4

---

**Previously:** In Parts 1-3, our committee member learned:
- How to read images as numbers (matrices)
- How to weigh evidence and apply personal thresholds (weights & bias)
- How to cast meaningful votes (activation functions)

**Today's Mission:** Our committee member is now **fully equipped**. It's time for their first real attempt at classifying lines! We'll build a complete Perceptron, Frank Rosenblatt's 1958 model and one of the earliest trainable neural networks, and watch it make predictions.

*Spoiler: It won't go well at first. And that's exactly the point.*

---

### What You'll Learn in Part 4

By the end of this notebook, you will:

1. **Understand the Perceptron** - One of the first trainable neural networks (Rosenblatt, 1958)
2. **Generate a dataset** - Create V/H line examples on-the-fly
3. **Implement the forward pass** - Input → Weighted Sum → Activation → Output
4. **Build a Perceptron class** - Clean, reusable code
5. **Make predictions** - Watch the untrained network guess
6. **Understand why it fails** - Random weights = random guesses

---

### Prerequisites

Make sure you've completed:
- **Part 0 & 1:** Welcome & Matrices (`neural_network_fundamentals.ipynb`)
- **Part 2:** The First Committee Member (`part_2_single_neuron.ipynb`)
- **Part 3:** Activation Functions (`part_3_activation_functions.ipynb`)


---

## Setup: Import Dependencies

Let's import our tools and recreate the building blocks from previous notebooks.


In [None]:
# =============================================================================
# PART 4: THE PERCEPTRON - SETUP
# =============================================================================

import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display, clear_output

# Try to import ipywidgets for interactive features
try:
    import ipywidgets as widgets
    WIDGETS_AVAILABLE = True
except ImportError:
    WIDGETS_AVAILABLE = False
    print("Note: ipywidgets not installed. Interactive features will be limited.")

# Set up matplotlib style
style_options = ['seaborn-v0_8-whitegrid', 'seaborn-whitegrid', 'ggplot', 'default']
for style in style_options:
    try:
        plt.style.use(style)
        break
    except OSError:
        continue

plt.rcParams['figure.figsize'] = [10, 6]
plt.rcParams['font.size'] = 12
np.random.seed(42)  # For reproducible random numbers

# =============================================================================
# RECREATE OUR CANONICAL LINE IMAGES (from Parts 1-3)
# =============================================================================

# Vertical line: bright pixels in the middle column
vertical_line = np.array([
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0]
])

# Horizontal line: bright pixels in the middle row
horizontal_line = np.array([
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0]
])

# Flattened versions (9 pixels as a 1D array)
vertical_flat = vertical_line.flatten()
horizontal_flat = horizontal_line.flatten()

print("Setup complete!")
print("="*60)
print("\nOur canonical images (as 3x3 matrices):")
print(f"\nVertical Line:            Horizontal Line:")
print(f"  {vertical_line[0]}                 {horizontal_line[0]}")
print(f"  {vertical_line[1]}                 {horizontal_line[1]}")
print(f"  {vertical_line[2]}                 {horizontal_line[2]}")
print(f"\nAs flattened vectors (9 pixels):")
print(f"  Vertical:   {vertical_flat}")
print(f"  Horizontal: {horizontal_flat}")


---

## 4.1 What is a Perceptron?

The **Perceptron**, introduced by Frank Rosenblatt in 1958, is one of the earliest trainable neural networks. It's also about as simple as a neural network gets: just a single neuron!

### Why Start with the Perceptron?

Before diving in, let's understand why the Perceptron matters:

| Question | Answer |
|----------|--------|
| **What problem does it solve?** | Binary classification (yes/no, cat/dog, vertical/horizontal) |
| **Why is it fundamental?** | ALL neural networks are built from Perceptron-like units |
| **Why learn it first?** | Simple enough to understand completely, complex enough to be useful |

**The Key Insight:** Once you understand ONE neuron, you understand the building block of ALL deep learning. Everything else is just more neurons connected together!

### Historical Significance

The Perceptron was revolutionary: it was among the first machines that could **learn** to classify patterns from examples instead of being explicitly programmed. Expectations ran high. A 1958 New York Times report said the Navy, which funded the work, expected it would eventually "be able to walk, talk, see, write, reproduce itself and be conscious of its existence."

(Spoiler: We're still working on most of those.)

### Why This Architecture?

The Perceptron's design is inspired by biological neurons:

| Biological Neuron | Perceptron Equivalent | Purpose |
|-------------------|----------------------|---------|
| Dendrites (receive signals) | Inputs (x) | Receive information |
| Synapses (connection strength) | Weights (w) | Determine importance |
| Cell body (integrates) | Weighted sum (Σ) | Combine all inputs |
| Axon hillock (threshold) | Bias (b) | Decision threshold |
| Axon (fires/doesn't fire) | Activation (f) | Output a decision |

This isn't just an analogy - it's the actual inspiration! Rosenblatt was trying to model how real neurons make decisions.

### The Architecture

A Perceptron is exactly what we built in Parts 2-3:

```
    INPUTS (x)           WEIGHTS (w)              SUM               ACTIVATION          OUTPUT
    ┌─────┐              ┌─────┐                                    
    │ x₁  │──────────────│ w₁  │─────┐                              
    └─────┘              └─────┘     │                              
    ┌─────┐              ┌─────┐     │         ┌─────┐              ┌─────┐
    │ x₂  │──────────────│ w₂  │─────┼────────▶│  Σ  │──────────────│ f() │────────▶  ŷ
    └─────┘              └─────┘     │         │+bias│              └─────┘
    ┌─────┐              ┌─────┐     │         └─────┘              
    │ x₃  │──────────────│ w₃  │─────┘                              
    └─────┘              └─────┘                                    
```

### The Complete Formula (Everything Together!)

$$\hat{y} = f\left(\sum_{i=1}^{n} w_i \cdot x_i + b\right) = f(\mathbf{w} \cdot \mathbf{x} + b)$$

Where:
- **x** = input vector (our flattened 9-pixel image)
- **w** = weight vector (9 weights, one per pixel)
- **b** = bias (the personal threshold)
- **Σ** = weighted sum (dot product + bias)
- **f()** = activation function (sigmoid for us; Rosenblatt's original used a hard step function)
- **ŷ** = predicted output (probability it's a vertical line)
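
As a quick check of the formula, here is a minimal sketch with three inputs instead of nine. The values of `x_demo`, `w_demo`, and `b_demo` are made up for easy arithmetic and are not taken from the notebook.

```python
import numpy as np

# Hypothetical values chosen only to make the arithmetic easy to follow
x_demo = np.array([1.0, 0.0, 1.0])     # input vector
w_demo = np.array([0.5, -0.3, 0.2])    # one weight per input
b_demo = -0.1                          # bias

# Weighted sum plus bias: 0.5*1 + (-0.3)*0 + 0.2*1 - 0.1
z_demo = np.dot(w_demo, x_demo) + b_demo
# Sigmoid activation turns the score into a value between 0 and 1
y_hat_demo = 1 / (1 + np.exp(-z_demo))

print(f"z = {z_demo:.2f}")             # z = 0.60
print(f"y_hat = {y_hat_demo:.4f}")     # y_hat = 0.6457
```

The second input contributes nothing because its pixel value is 0, which is why only the weights under lit pixels affect the score.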

### Committee Analogy: The First Working Committee Member

*"Our committee member is now fully trained in procedure. They can:*
1. *Read the evidence (input)*
2. *Weigh each piece by importance (weights)*
3. *Apply their personal threshold (bias)*
4. *Cast a meaningful vote (activation)*

*Now it's time for their first REAL case!"*


---

## 4.2 Generating Our Dataset

To test our Perceptron, we need examples to classify. Instead of loading a dataset from a file, we'll **generate one on-the-fly**, which gives us full control over its size, class balance, and noise.

### First, What IS a Dataset?

A **dataset** is a collection of examples used to train or test a machine learning model. Each example has:
- **Features (X)**: The input data (for us, 9 pixel values)
- **Label (y)**: The correct answer (for us, 0 or 1)

This is called **supervised learning** because we "supervise" the model by giving it the right answers to learn from.

| Term | Meaning | Our Example |
|------|---------|-------------|
| **Sample** | One example (input + label) | One 3x3 image + whether it's vertical |
| **Feature** | One piece of input data | One pixel value |
| **Label** | The correct answer | 0 (horizontal) or 1 (vertical) |
| **Dataset** | Collection of samples | 100 images with their labels |
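
To make the vocabulary in the table concrete, the sketch below builds a tiny two-sample dataset by hand. The names `X_demo` and `y_demo` are illustrative only; the real dataset comes from the generator function defined later in this section.

```python
import numpy as np

# Two hand-written samples, each a 3x3 image flattened row by row
X_demo = np.array([
    [0, 1, 0, 0, 1, 0, 0, 1, 0],   # vertical line in the middle column
    [0, 0, 0, 1, 1, 1, 0, 0, 0],   # horizontal line in the middle row
])
y_demo = np.array([1, 0])          # labels: 1 = vertical, 0 = horizontal

print(X_demo.shape)                # (2, 9): 2 samples, 9 features each
print(y_demo.shape)                # (2,): one label per sample
print(X_demo[0].reshape(3, 3))     # the first sample viewed as an image again
```

Each row of `X_demo` is one sample, each column is one feature (pixel), and `y_demo[i]` is the label for row `i`.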

### Why Do We Need Datasets?

Machine learning models learn by example, not by rules:

| Traditional Programming | Machine Learning |
|------------------------|------------------|
| Human writes rules | Human provides examples |
| "If middle column is bright, it's vertical" | Model sees 50 vertical + 50 horizontal lines |
| Rules are explicit | Model discovers patterns itself |
| Hard to handle edge cases | Learns from variety in data |

**The magic:** Instead of us figuring out the rules, the model discovers them from data!

### Our Classification Task

| Image Type | Label (y) | Meaning |
|------------|-----------|---------|
| Vertical Line | 1 | "This is a vertical line" |
| Horizontal Line | 0 | "This is a horizontal line" |

### Dataset Requirements

For a proper machine learning experiment, we need:
1. **Multiple examples** - Not just 2 images, but many variations
2. **Balanced classes** - Equal numbers of vertical and horizontal
3. **Some variety** - Lines in different positions
4. **Optional noise** - To make the problem harder (later)

### The Dataset Generator Function

We'll create a function that generates any number of V/H line examples:


In [None]:
# =============================================================================
# DATASET GENERATOR: Create V/H Line Examples On-The-Fly
# =============================================================================

def generate_line_dataset(n_samples=100, noise_level=0.0, seed=None):
    """
    Generate a dataset of vertical and horizontal line images.
    
    Parameters:
    -----------
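    n_samples : int
        Total number of samples, split between V and H
        (an odd value yields one extra horizontal line)
    noise_level : float (0.0 to 0.5)
        Standard deviation of the Gaussian noise added to each pixel
        (0.0 = clean, 0.3 = noisy); results are clipped to [0, 1]
    seed : int or None
        Random seed for reproducibility (resets NumPy's global random state)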
    
    Returns:
    --------
    X : numpy array of shape (n_samples, 9)
        Flattened 3x3 images
    y : numpy array of shape (n_samples,)
        Labels: 1 for vertical, 0 for horizontal
    """
    
    if seed is not None:
        np.random.seed(seed)
    
    X = []  # Will hold all images (as flattened arrays)
    y = []  # Will hold all labels
    
    # Generate n_samples/2 vertical lines and n_samples/2 horizontal lines
    for i in range(n_samples):
        
        if i < n_samples // 2:
            # ----- VERTICAL LINE (label = 1) -----
            # Pick a random column (0, 1, or 2) for variety
            col = np.random.randint(0, 3)
            
            # Create blank 3x3 image
            image = np.zeros((3, 3))
            
            # Fill the chosen column with 1s
            image[:, col] = 1
            
            # Add noise if requested
            if noise_level > 0:
                image = image + np.random.randn(3, 3) * noise_level
                image = np.clip(image, 0, 1)  # Keep values in [0, 1]
            
            X.append(image.flatten())
            y.append(1)  # Label: Vertical
            
        else:
            # ----- HORIZONTAL LINE (label = 0) -----
            # Pick a random row (0, 1, or 2) for variety
            row = np.random.randint(0, 3)
            
            # Create blank 3x3 image
            image = np.zeros((3, 3))
            
            # Fill the chosen row with 1s
            image[row, :] = 1
            
            # Add noise if requested
            if noise_level > 0:
                image = image + np.random.randn(3, 3) * noise_level
                image = np.clip(image, 0, 1)
            
            X.append(image.flatten())
            y.append(0)  # Label: Horizontal
    
    # Convert to numpy arrays
    X = np.array(X)
    y = np.array(y)
    
    # Shuffle the dataset (so V and H are mixed, not grouped)
    shuffle_idx = np.random.permutation(n_samples)
    X = X[shuffle_idx]
    y = y[shuffle_idx]
    
    return X, y

print("Dataset generator function created!")
print("="*60)


In [None]:
# =============================================================================
# GENERATE AND VISUALIZE OUR DATASET
# =============================================================================

# Generate 20 clean examples for visualization
X_small, y_small = generate_line_dataset(n_samples=20, noise_level=0.0, seed=42)

print("DATASET GENERATED!")
print("="*60)
print(f"\nDataset shape: X = {X_small.shape}, y = {y_small.shape}")
print(f"  - {X_small.shape[0]} total samples")
print(f"  - Each sample has {X_small.shape[1]} features (3x3 = 9 pixels)")
print(f"\nLabel distribution:")
print(f"  - Vertical lines (y=1): {np.sum(y_small == 1)} samples")
print(f"  - Horizontal lines (y=0): {np.sum(y_small == 0)} samples")

# Show first few samples
print("\n" + "="*60)
print("FIRST 6 SAMPLES:")
print("="*60)

for i in range(6):
    image = X_small[i].reshape(3, 3)
    label = y_small[i]
    label_name = "VERTICAL" if label == 1 else "HORIZONTAL"
    print(f"\nSample {i}: Label = {label} ({label_name})")
    print(f"  {image[0]}")
    print(f"  {image[1]}")
    print(f"  {image[2]}")


In [None]:
# =============================================================================
# VISUALIZE SAMPLE IMAGES FROM OUR DATASET
# =============================================================================

# Show a grid of 10 sample images
fig, axes = plt.subplots(2, 5, figsize=(12, 5))

for i, ax in enumerate(axes.flat):
    image = X_small[i].reshape(3, 3)
    label = y_small[i]
    label_name = "VERTICAL" if label == 1 else "HORIZONTAL"
    
    ax.imshow(image, cmap='Blues', vmin=0, vmax=1)
    ax.set_title(f"{label_name}\n(y={label})", fontsize=10)
    ax.axis('off')
    
    # Add grid lines
    for j in range(4):
        ax.axhline(j - 0.5, color='gray', linewidth=0.5)
        ax.axvline(j - 0.5, color='gray', linewidth=0.5)

plt.suptitle('Sample Images from Our Generated Dataset', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("\nNotice: The lines can appear in different positions (left/center/right columns,")
print("top/center/bottom rows). This variety makes our dataset more realistic!")


---

## 4.3 The Forward Pass: Step-by-Step

The **forward pass** is how a neural network makes a prediction. Information flows **forward** from input to output.

### What is the Forward Pass?

The term "forward pass" comes from the direction information flows:

```
INPUT → WEIGHTS × INPUT → ADD BIAS → ACTIVATION → OUTPUT
  x    →    w · x       →   + b    →    f(z)    →   ŷ
```

| Term | Meaning |
|------|---------|
| **Forward** | Information flows left-to-right, input-to-output |
| **Pass** | One complete journey through the network |
| **Inference** | Another name for making predictions (vs. training) |

**Why "Forward"?** Later in Part 5, we'll see the **backward pass** where error flows in the opposite direction. Together, they form the complete learning process!

### Forward Pass vs Training

It's important to understand when each happens:

| Forward Pass (Inference) | Training |
|--------------------------|----------|
| Make a prediction | Learn from mistakes |
| Uses current weights | Updates the weights |
| Fast (one direction) | Slower (forward + backward) |
| Runs on its own once training is done | Runs while the model is being built |
| "What do I think this is?" | "How can I do better?" |

Right now, we're just doing the forward pass - making predictions. Training comes in Part 5!

### The Four Steps of a Forward Pass

| Step | Operation | Formula | Purpose |
|------|-----------|---------|---------|
| 1 | Receive Input | x | Get the flattened image (9 values) |
| 2 | Weighted Sum | z = w · x | Compute dot product with weights |
| 3 | Add Bias | z = z + b | Add the personal threshold |
| 4 | Apply Activation | ŷ = f(z) | Convert score to meaningful output |
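
The four steps in the table also work on many images at once. The sketch below is one possible vectorized version, not the notebook's own implementation: `forward_batch` is a hypothetical helper, and the weights come from a local random generator so the notebook's global seed is left untouched.

```python
import numpy as np

def forward_batch(X, w, b):
    """Forward pass for a batch: X has shape (n_samples, n_features)."""
    z = X @ w + b                    # steps 2 and 3 for every row at once
    return 1 / (1 + np.exp(-z))      # step 4: sigmoid, applied element-wise

rng = np.random.default_rng(0)       # local generator; the seed value is arbitrary
w_batch = rng.standard_normal(9) * 0.1   # illustrative untrained weights
b_batch = 0.0

X_batch = np.array([
    [0, 1, 0, 0, 1, 0, 0, 1, 0],     # vertical line
    [0, 0, 0, 1, 1, 1, 0, 0, 0],     # horizontal line
], dtype=float)

probs = forward_batch(X_batch, w_batch, b_batch)
print(probs.shape)                   # (2,): one probability per image
```

The matrix product `X @ w` computes the same dot product as the single-image case, once per row.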

Let's trace through this step-by-step with actual numbers.

### Committee Analogy

*"The forward pass is the committee member reading a case file:*
1. *They receive the evidence (input)*
2. *They multiply each piece by their priority (weights)*
3. *They add their personal standard (bias)*
4. *They cast their vote (activation)"*

Let's see this in code with EVERY step shown:


In [None]:
# =============================================================================
# THE FORWARD PASS: Step-by-Step Walkthrough
# =============================================================================

# Define the sigmoid activation function (from Part 3)
def sigmoid(z):
    """Sigmoid activation: squashes any value to range (0, 1)."""
    return 1 / (1 + np.exp(-z))

# Let's use our canonical vertical line as the input
x = vertical_flat.copy()

# Create some random weights (as if the Perceptron is untrained)
np.random.seed(123)  # For reproducibility
w = np.random.randn(9) * 0.5  # 9 random weights
b = np.random.randn() * 0.1    # 1 random bias

print("="*70)
print("FORWARD PASS: Step-by-Step with Real Numbers")
print("="*70)

# ----- STEP 1: Receive Input -----
print("\n┌─────────────────────────────────────────────────────────────────────┐")
print("│ STEP 1: Receive Input                                               │")
print("└─────────────────────────────────────────────────────────────────────┘")
print(f"\nInput image (as 3x3 grid):")
print(f"  {x.reshape(3,3)[0]}")
print(f"  {x.reshape(3,3)[1]}")
print(f"  {x.reshape(3,3)[2]}")
print(f"\nFlattened input vector x:")
print(f"  x = {x}")

# ----- STEP 2: Weighted Sum (Dot Product) -----
print("\n┌─────────────────────────────────────────────────────────────────────┐")
print("│ STEP 2: Weighted Sum (Dot Product)                                  │")
print("└─────────────────────────────────────────────────────────────────────┘")
print(f"\nWeights vector w:")
print(f"  w = [{', '.join([f'{wi:.3f}' for wi in w])}]")

# Show element-wise multiplication
print(f"\nElement-wise products (x[i] × w[i]):")
products = x * w
print(f"  = [{', '.join([f'{p:.3f}' for p in products])}]")

# Sum the products
dot_product = np.sum(products)
print(f"\nSum of products (the dot product):")
print(f"  w · x = {dot_product:.4f}")

# ----- STEP 3: Add Bias -----
print("\n┌─────────────────────────────────────────────────────────────────────┐")
print("│ STEP 3: Add Bias                                                    │")
print("└─────────────────────────────────────────────────────────────────────┘")
print(f"\nBias value:")
print(f"  b = {b:.4f}")
print(f"\nPre-activation value z:")
z = dot_product + b
print(f"  z = (w · x) + b")
print(f"  z = {dot_product:.4f} + {b:.4f}")
print(f"  z = {z:.4f}")

# ----- STEP 4: Apply Activation -----
print("\n┌─────────────────────────────────────────────────────────────────────┐")
print("│ STEP 4: Apply Activation (Sigmoid)                                  │")
print("└─────────────────────────────────────────────────────────────────────┘")
print(f"\nApplying sigmoid to z = {z:.4f}:")
y_hat = sigmoid(z)
print(f"  ŷ = sigmoid(z) = 1 / (1 + e^(-z))")
print(f"  ŷ = 1 / (1 + e^(-{z:.4f}))")
print(f"  ŷ = {y_hat:.4f}")

# ----- FINAL RESULT -----
print("\n" + "="*70)
print("FORWARD PASS COMPLETE!")
print("="*70)
print(f"\nFinal output: ŷ = {y_hat:.4f}")
print(f"\nInterpretation: The Perceptron is {y_hat*100:.1f}% confident this is a VERTICAL line.")
print(f"\nPrediction: {'VERTICAL (y=1)' if y_hat >= 0.5 else 'HORIZONTAL (y=0)'}")
print(f"Actual label: VERTICAL (y=1)")
print(f"{'✓ Correct!' if y_hat >= 0.5 else '✗ Wrong!'}")


---

## 4.4 Building the Perceptron Class

Now let's package everything into a clean, reusable **Perceptron class**. This is how real neural networks are implemented - as modular, reusable code.

### Why Use a Class?

In programming, a **class** is a blueprint for creating objects. For neural networks, classes help us:

| Benefit | Explanation |
|---------|-------------|
| **Organization** | Keep weights, bias, and methods together |
| **Reusability** | Create multiple Perceptrons easily |
| **State** | Remember weights between method calls |
| **Readability** | `perceptron.predict(x)` is clearer than raw math |

### What Our Perceptron Needs

| Component | What It Does |
|-----------|--------------|
| `__init__()` | Initialize weights and bias (randomly) |
| `forward()` | Compute the forward pass (returns probability) |
| `predict()` | Make a binary decision (0 or 1) |

### Why Random Initialization?

Before training, we need some starting values for weights. Why random?

| Alternative | Problem |
|-------------|---------|
| All zeros | Initial output is exactly 0.5 for every input; in a layer, every neuron would start out identical |
| All ones | Large weighted sums push the sigmoid toward saturation |
| Same value everywhere | Neurons in a layer would all update identically |
| **Random small values** | ✓ Breaks symmetry, allows diverse learning |

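**Key Insight:** The SPECIFIC random values don't matter much, because training will adjust them. But they should be:
- **Small** (here, drawn from a normal distribution with standard deviation 0.1) to avoid saturating the sigmoid
- **Different** from each other to allow diverse learning

Symmetry breaking matters most once a layer has several neurons; a single neuron could also learn from an all-zero start, but random initialization is the habit to build now.

The scale `* 0.1` keeps initial outputs near 0.5 (the middle of the sigmoid), where its slope is steepest and learning is fastest.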
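
To see the effect of the scale, the sketch below compares initial outputs for many freshly initialized neurons at scale 0.1 against a deliberately oversized scale of 5.0. The larger scale, the helper names, and the 0.05/0.95 cutoffs for "saturated" are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # local generator; the seed value is arbitrary
x_line = np.array([0, 1, 0, 0, 1, 0, 0, 1, 0], dtype=float)  # middle-column vertical line

def initial_outputs(scale, n_trials=1000):
    """Sigmoid outputs of n_trials freshly initialized neurons (bias = 0)."""
    W = rng.standard_normal((n_trials, 9)) * scale   # one weight vector per row
    z = W @ x_line
    return 1 / (1 + np.exp(-z))

def saturated_fraction(p):
    """Share of outputs sitting in the flat tails of the sigmoid."""
    return np.mean((p < 0.05) | (p > 0.95))

small = initial_outputs(0.1)
large = initial_outputs(5.0)     # deliberately oversized scale, for contrast only

print(f"scale 0.1: outputs from {small.min():.2f} to {small.max():.2f}, "
      f"saturated share {saturated_fraction(small):.0%}")
print(f"scale 5.0: outputs from {large.min():.2f} to {large.max():.2f}, "
      f"saturated share {saturated_fraction(large):.0%}")
```

With the small scale, every output stays close to 0.5; with the large scale, most outputs start in the flat tails, where the sigmoid barely responds to weight changes.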

### The Core Math (Keep It Simple!)

All the math fits in just three lines:

**Forward pass:** `z = np.dot(weights, x) + bias`

**Activation:** `output = 1 / (1 + np.exp(-z))`

**Prediction:** `1 if output >= 0.5 else 0`

### Understanding the Threshold (0.5)

The sigmoid outputs a **probability** between 0 and 1. To make a **decision**, we need a threshold:

| Output | Decision Rule | Prediction |
|--------|---------------|------------|
| Below 0.5 | "Probably NOT vertical" | 0 (Horizontal) |
| 0.5 or above | "Probably IS vertical" | 1 (Vertical) |

**Why 0.5?** It's the natural midpoint: if the model is at least 50% confident it's vertical, we call it vertical.

Note: In some applications, you might use a different threshold (e.g., 0.7 for "high confidence only"). But 0.5 is the standard starting point.
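
The effect of moving the threshold can be sketched in a few lines. The helper `decide` and the probabilities below are hypothetical and only illustrate the decision rule.

```python
import numpy as np

def decide(probabilities, threshold=0.5):
    """Turn probabilities into 0/1 decisions; values at the threshold count as 1."""
    return (np.asarray(probabilities) >= threshold).astype(int)

probs_demo = np.array([0.20, 0.49, 0.50, 0.65, 0.90])   # made-up sigmoid outputs

print(decide(probs_demo))                  # [0 0 1 1 1]
print(decide(probs_demo, threshold=0.7))   # [0 0 0 0 1]
```

Raising the threshold to 0.7 turns the 0.50 and 0.65 cases into "horizontal" calls, so only the most confident output is still labeled vertical.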


In [None]:
# =============================================================================
# THE PERCEPTRON CLASS: Clean, Reusable Implementation
# =============================================================================

class Perceptron:
    """
    A single-layer Perceptron for binary classification.
    
    This is the simplest possible neural network - just one neuron!
    
    Attributes:
        n_inputs (int): Number of input features (9 for our 3x3 images)
        weights (array): One weight per input feature
        bias (float): The threshold/offset term
    """
    
    def __init__(self, n_inputs):
        """
        Initialize the Perceptron with random weights and bias.
        
        Parameters:
            n_inputs: Number of input features (pixels in our image)
        """
        # Random weights, small values centered around 0
        self.weights = np.random.randn(n_inputs) * 0.1
        
        # Bias starts at 0
        self.bias = 0.0
        
        # Store for reference
        self.n_inputs = n_inputs
        
        # Storage for debugging/visualization
        self.last_z = None    # Pre-activation value
        self.last_output = None  # Final output
    
    def forward(self, x):
        """
        Compute the forward pass - make a prediction.
        
        Parameters:
            x: Input array (can be 2D image or 1D flattened)
        
        Returns:
            float: Probability between 0 and 1
        """
        # Ensure x is a 1D array
        x = np.array(x).flatten()
        
        # STEPS 2 & 3: Weighted sum + bias
        # Formula: z = w · x + b
        self.last_z = np.dot(self.weights, x) + self.bias
        
        # STEP 4: Apply sigmoid activation
        # Formula: output = 1 / (1 + e^(-z))
        self.last_output = 1 / (1 + np.exp(-self.last_z))
        
        return self.last_output
    
    def predict(self, x):
        """
        Make a binary prediction (0 or 1).
        
        Parameters:
            x: Input array
        
        Returns:
            int: 0 (horizontal) or 1 (vertical)
        """
        probability = self.forward(x)
        return 1 if probability >= 0.5 else 0
    
    def __repr__(self):
        return f"Perceptron(inputs={self.n_inputs})"


# Create our Perceptron!
print("="*60)
print("PERCEPTRON CLASS CREATED!")
print("="*60)

# Instantiate a Perceptron for 9 inputs (3x3 = 9 pixels)
perceptron = Perceptron(n_inputs=9)

print(f"\nOur Perceptron: {perceptron}")
print(f"\nInitial weights (random, untrained):")
print(f"  Shape: {perceptron.weights.shape}")
print(f"  Values: [{', '.join([f'{w:.3f}' for w in perceptron.weights])}]")
print(f"\nInitial bias: {perceptron.bias}")
print("\nThe Perceptron is ready, but completely UNTRAINED!")
print("Its weights are random - it doesn't know what a vertical line looks like.")


---

## 4.5 Initial Predictions: The Confused Perceptron

Now the moment of truth! Let's see how our untrained Perceptron performs.

### What is Accuracy?

**Accuracy** is the simplest way to measure how well a model performs:

$$\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}} \times 100\%$$

For example:
- 80 correct out of 100 = 80% accuracy
- 50 correct out of 100 = 50% accuracy
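
In NumPy, the accuracy formula reduces to the mean of an element-wise comparison. The sketch below uses a hypothetical helper `accuracy_percent` and made-up labels and predictions.

```python
import numpy as np

def accuracy_percent(y_true, y_pred):
    """Percentage of predictions that match the labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return np.mean(y_true == y_pred) * 100

y_true_demo = np.array([1, 0, 1, 1, 0])
y_pred_demo = np.array([1, 1, 1, 0, 0])   # made-up predictions: 3 of 5 match

print(f"{accuracy_percent(y_true_demo, y_pred_demo):.1f}%")   # 60.0%
```

The comparison `y_true == y_pred` yields an array of True/False values, and their mean is the fraction that are True.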

### The Baseline: What's "Random Guessing"?

For any classification task, there's a **baseline accuracy** - what you'd get by guessing randomly:

| Task Type | Classes | Random Baseline |
|-----------|---------|-----------------|
| Binary (yes/no) | 2 | 50% |
| 3-way choice | 3 | 33% |
| 10-way choice | 10 | 10% |

**Our task is binary** (vertical vs horizontal), so random guessing gives 50%.

**Why this matters:** If your model gets 50% on a balanced binary task, it has learned NOTHING. It's no better than flipping a coin!

### What We Expect

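Since the weights are random, the Perceptron has no idea what it's doing. It's like asking someone who's never seen a line before to classify them.

**Expected accuracy:** Around 50% on average (random guessing for binary classification). A single run can land well above or below that: an untrained Perceptron is deterministic, and our clean dataset contains only six distinct images, so its score depends on which of those six it happens to get right.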
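
To see how far a single untrained run can stray from 50%, the sketch below scores many randomly initialized neurons on the six distinct clean images (three vertical, three horizontal). It rests on a few assumptions: a local random generator, a bias of 0, and equal weighting of the six images, whereas the generated dataset contains them in random proportions. The names `X_six`, `y_six`, and `n_trials` are introduced here for illustration.

```python
import numpy as np

# The six distinct clean images: a vertical line in each column (label 1)
# and a horizontal line in each row (label 0)
patterns, labels = [], []
for k in range(3):
    vertical = np.zeros((3, 3))
    vertical[:, k] = 1
    horizontal = np.zeros((3, 3))
    horizontal[k, :] = 1
    patterns += [vertical.flatten(), horizontal.flatten()]
    labels += [1, 0]
X_six = np.array(patterns)
y_six = np.array(labels)

rng = np.random.default_rng(0)    # local generator; the seed value is arbitrary
n_trials = 2000
accuracies = np.empty(n_trials)
for t in range(n_trials):
    w_rand = rng.standard_normal(9) * 0.1        # same scale as Perceptron.__init__
    # With bias = 0, sigmoid(z) >= 0.5 exactly when z >= 0
    preds = (X_six @ w_rand >= 0).astype(int)
    accuracies[t] = np.mean(preds == y_six) * 100

print(f"mean accuracy: {accuracies.mean():.1f}%")
print(f"lowest: {accuracies.min():.0f}%, highest: {accuracies.max():.0f}%")
```

The average sits near 50%, but individual neurons spread out around it, which is why one notebook run may print a number noticeably different from 50%.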

### Committee Analogy

*"Our committee member has been trained in procedure, but has never seen an actual case. They're about to make judgments based on completely arbitrary priorities. The results won't be pretty..."*


In [None]:
# =============================================================================
# TESTING THE UNTRAINED PERCEPTRON ON OUR CANONICAL EXAMPLES
# =============================================================================

print("="*70)
print("TESTING UNTRAINED PERCEPTRON")
print("="*70)

# Test on our canonical vertical line
print("\n┌─────────────────────────────────────────────────────────────────────┐")
print("│ Test 1: VERTICAL LINE                                               │")
print("└─────────────────────────────────────────────────────────────────────┘")
print(f"\nImage (3x3):")
print(f"  {vertical_line[0]}")
print(f"  {vertical_line[1]}")
print(f"  {vertical_line[2]}")

prob_vertical = perceptron.forward(vertical_flat)
pred_vertical = perceptron.predict(vertical_flat)
actual_vertical = 1

print(f"\nForward pass calculation:")
print(f"  z = w · x + b = {perceptron.last_z:.4f}")
print(f"  output = sigmoid(z) = {prob_vertical:.4f}")
print(f"\nPrediction: {pred_vertical} ({'VERTICAL' if pred_vertical == 1 else 'HORIZONTAL'})")
print(f"Actual:     {actual_vertical} (VERTICAL)")
print(f"Result:     {'CORRECT!' if pred_vertical == actual_vertical else 'WRONG!'}")

# Test on our canonical horizontal line
print("\n┌─────────────────────────────────────────────────────────────────────┐")
print("│ Test 2: HORIZONTAL LINE                                             │")
print("└─────────────────────────────────────────────────────────────────────┘")
print(f"\nImage (3x3):")
print(f"  {horizontal_line[0]}")
print(f"  {horizontal_line[1]}")
print(f"  {horizontal_line[2]}")

prob_horizontal = perceptron.forward(horizontal_flat)
pred_horizontal = perceptron.predict(horizontal_flat)
actual_horizontal = 0

print(f"\nForward pass calculation:")
print(f"  z = w · x + b = {perceptron.last_z:.4f}")
print(f"  output = sigmoid(z) = {prob_horizontal:.4f}")
print(f"\nPrediction: {pred_horizontal} ({'VERTICAL' if pred_horizontal == 1 else 'HORIZONTAL'})")
print(f"Actual:     {actual_horizontal} (HORIZONTAL)")
print(f"Result:     {'CORRECT!' if pred_horizontal == actual_horizontal else 'WRONG!'}")


In [None]:
# =============================================================================
# TESTING ON THE FULL DATASET: Calculate Accuracy
# =============================================================================

# Generate a larger dataset for proper testing
X_test, y_test = generate_line_dataset(n_samples=100, noise_level=0.0, seed=99)

print("="*70)
print("FULL DATASET EVALUATION")
print("="*70)
print(f"\nDataset: {len(y_test)} samples ({sum(y_test)} vertical, {len(y_test) - sum(y_test)} horizontal)")

# Make predictions on all samples
predictions = []
correct = 0

for i in range(len(X_test)):
    pred = perceptron.predict(X_test[i])
    predictions.append(pred)
    if pred == y_test[i]:
        correct += 1

accuracy = correct / len(y_test) * 100

# Display results table (first 10 samples)
print("\n" + "-"*70)
print("FIRST 10 PREDICTIONS:")
print("-"*70)
print(f"{'Sample':<8} {'Actual':<12} {'Predicted':<12} {'Result':<10}")
print("-"*70)

for i in range(10):
    actual_name = "VERTICAL" if y_test[i] == 1 else "HORIZONTAL"
    pred_name = "VERTICAL" if predictions[i] == 1 else "HORIZONTAL"
    result = "Correct" if predictions[i] == y_test[i] else "WRONG"
    symbol = "+" if predictions[i] == y_test[i] else "X"
    print(f"  {i:<6} {actual_name:<12} {pred_name:<12} {symbol} {result}")

# Summary
print("\n" + "="*70)
print("ACCURACY SUMMARY")
print("="*70)
print(f"\n  Total samples:  {len(y_test)}")
print(f"  Correct:        {correct}")
print(f"  Wrong:          {len(y_test) - correct}")
print(f"\n  ACCURACY: {accuracy:.1f}%")
print(f"\n  Expected (random guessing): ~50%")
print(f"  Difference from random: {abs(accuracy - 50):.1f} percentage points")

if accuracy > 55:
    print("\n  Hmm, slightly better than random - got lucky with the random weights!")
elif accuracy < 45:
    print("\n  Worse than random! The weights are actually hurting performance.")
else:
    print("\n  As expected: basically random guessing. The Perceptron is CONFUSED!")


---

## 4.6 Why It's Wrong: Understanding the Problem

Whatever exact number your run printed, an untrained Perceptron is no better than coin-flipping on average. Why?

### Understanding What Weights Actually DO

The weights are the Perceptron's **knowledge**. Each weight answers the question:

> "How important is this input for making the decision?"

| Weight Value | Meaning |
|--------------|---------|
| **Large positive** (+1.0) | "This input STRONGLY suggests class 1" |
| **Small positive** (+0.1) | "This input slightly suggests class 1" |
| **Near zero** (0.0) | "This input doesn't matter" |
| **Small negative** (-0.1) | "This input slightly suggests class 0" |
| **Large negative** (-1.0) | "This input STRONGLY suggests class 0" |

### What We WANT the Perceptron to Learn

For detecting our canonical (middle-column) vertical line, the ideal weights would encode this knowledge:

```
    "Pixels in the middle column = IMPORTANT for vertical detection"
    "Pixels outside the middle column = NOT important (or negative) for vertical detection"
```

In weight terms:
- Middle column pixels → HIGH positive weights (vertical lines have these lit up)
- Other pixels → LOW or NEGATIVE weights (don't indicate verticality)

### The Problem: Random Weights = No Knowledge

Our current weights are random - they encode NO knowledge about vertical lines:
- Some weights are positive when they should be negative
- Some weights are large when they should be small
- There's no pattern that matches "vertical line detection"

### Feature Detection: What the Perceptron is Trying to Become

A **feature detector** is a model that responds strongly to specific patterns. Our goal:

| Input Pattern | Ideal Perceptron Response |
|---------------|---------------------------|
| Vertical line (any column) | High output (close to 1.0) |
| Horizontal line (any row) | Low output (close to 0.0) |

**Right now:** The Perceptron is NOT a feature detector - it's just random noise.

**After training:** It WILL become a vertical line feature detector!
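
The sketch below shows what "high output" and "low output" look like numerically for a hand-set detector, and how scaling the weights pushes the responses closer to 1.0 and 0.0. The template, the test images, and the bias rule are illustrative assumptions, not the weights that training will produce.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative middle-column template and test images (flattened 3x3 grids)
template = np.tile([-0.5, 1.0, -0.5], 3)
vertical = np.tile([0.0, 1.0, 0.0], 3)                           # middle column lit
horizontal = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0], dtype=float)  # middle row lit

responses = {}
for scale in (1.0, 3.0):
    weights = template * scale
    bias = -1.5 * scale   # threshold halfway between the two weighted sums
    out_vertical = sigmoid(np.dot(weights, vertical) + bias)
    out_horizontal = sigmoid(np.dot(weights, horizontal) + bias)
    responses[scale] = (out_vertical, out_horizontal)
    print(f"scale {scale}: vertical -> {out_vertical:.3f}, horizontal -> {out_horizontal:.3f}")
```

Larger weights with a matching bias make the detector more decisive: the vertical response moves toward 1.0 and the horizontal response toward 0.0.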

### Seeing the Problem: Random Weights vs Ideal Weights

Let's visualize what our random weights actually look like:


In [None]:
# =============================================================================
# VISUALIZING THE PROBLEM: Random Weights vs Ideal Weights
# =============================================================================

# Hand-designed weights for a middle-column vertical detector
ideal_weights = np.array([
    [-1,  2, -1],   # Top row: reward the middle pixel, penalize the sides
    [-1,  2, -1],   # Middle row: same pattern
    [-1,  2, -1]    # Bottom row: same pattern
]).flatten() * 0.5  # Scaled to -0.5 (sides) and +1.0 (middle)

# Our actual (random) weights
actual_weights = perceptron.weights

# Visualize
fig, axes = plt.subplots(1, 3, figsize=(14, 4))

# Plot 1: Random weights (what we have)
ax1 = axes[0]
weights_grid = actual_weights.reshape(3, 3)
im1 = ax1.imshow(weights_grid, cmap='RdBu', vmin=-0.5, vmax=0.5)
ax1.set_title('Our Random Weights\n(Untrained)', fontsize=12, fontweight='bold')
for i in range(3):
    for j in range(3):
        ax1.text(j, i, f'{weights_grid[i,j]:.2f}', ha='center', va='center', fontsize=10)
plt.colorbar(im1, ax=ax1, label='Weight value')

# Plot 2: Ideal weights (what we need)
ax2 = axes[1]
ideal_grid = ideal_weights.reshape(3, 3)
# Ideal weights reach +1.0, so widen the color range to avoid clipping
im2 = ax2.imshow(ideal_grid, cmap='RdBu', vmin=-1.0, vmax=1.0)
ax2.set_title('Ideal Weights\n(What we need)', fontsize=12, fontweight='bold')
for i in range(3):
    for j in range(3):
        ax2.text(j, i, f'{ideal_grid[i,j]:.2f}', ha='center', va='center', fontsize=10)
plt.colorbar(im2, ax=ax2, label='Weight value')

# Plot 3: A vertical line (what we're trying to detect)
ax3 = axes[2]
im3 = ax3.imshow(vertical_line, cmap='Blues', vmin=0, vmax=1)
ax3.set_title('Vertical Line\n(What we detect)', fontsize=12, fontweight='bold')
for i in range(3):
    for j in range(3):
        ax3.text(j, i, f'{vertical_line[i,j]}', ha='center', va='center', fontsize=10)
plt.colorbar(im3, ax=ax3, label='Pixel value')

plt.tight_layout()
plt.show()

# Show the key insight
print("\nKEY INSIGHT: Why Random Weights Fail")
print("="*60)
print("""
IDEAL weights for vertical detection should have:
  - HIGH values in the middle column (where vertical lines are)
  - LOW or NEGATIVE values elsewhere

Our RANDOM weights have no pattern - they're just noise!

The Perceptron doesn't KNOW what vertical lines look like yet.
It needs to LEARN the right weights through TRAINING.
""")


In [None]:
# =============================================================================
# WHAT IF WE HAD IDEAL WEIGHTS? (Sneak Preview)
# =============================================================================

print("="*70)
print("WHAT IF WE HAD THE RIGHT WEIGHTS? (A Preview)")
print("="*70)

# Create a new Perceptron and give it ideal weights
ideal_perceptron = Perceptron(n_inputs=9)
ideal_perceptron.weights = ideal_weights.copy()
ideal_perceptron.bias = -1.5  # Halfway between a clean middle-column line (sum 3.0) and a horizontal line (sum 0.0)

print("\nIdeal weights (as 3x3 grid):")
print(f"  {ideal_perceptron.weights.reshape(3,3)[0]}")
print(f"  {ideal_perceptron.weights.reshape(3,3)[1]}")
print(f"  {ideal_perceptron.weights.reshape(3,3)[2]}")
print(f"\nBias: {ideal_perceptron.bias}")

# Test on the same dataset
correct_ideal = 0
for i in range(len(X_test)):
    if ideal_perceptron.predict(X_test[i]) == y_test[i]:
        correct_ideal += 1

accuracy_ideal = correct_ideal / len(y_test) * 100

print("\n" + "-"*70)
print("COMPARISON:")
print("-"*70)
print(f"\n  Random weights accuracy:  {accuracy:.1f}%")
print(f"  Ideal weights accuracy:   {accuracy_ideal:.1f}%")
print(f"\n  Change: {accuracy_ideal - accuracy:+.1f} percentage points")

print("\n" + "="*70)
print("THE BIG QUESTION:")
print("="*70)
print("""
How do we get from RANDOM weights to IDEAL weights?

We don't want to hand-design them (that defeats the purpose!).
We want the Perceptron to LEARN them automatically.

This is what TRAINING does - and it's the topic of Part 5!
""")


---

## Part 4 Summary: What We've Learned

### Key Concepts Mastered

| Concept | What It Is | Why It Matters |
|---------|------------|----------------|
| **Perceptron** | Single-neuron neural network | Simplest possible NN, building block for larger networks |
| **Dataset Generation** | Creating training examples | We can test our models without external data |
| **Forward Pass** | Input → Output computation | This is how predictions are made |
| **Random Initialization** | Starting with random weights | The beginning state before learning |

### The Complete Perceptron Formula

$$\hat{y} = \sigma(w \cdot x + b) = \frac{1}{1 + e^{-(w \cdot x + b)}}$$

Or in code:
```python
z = np.dot(weights, x) + bias    # Weighted sum
output = 1 / (1 + np.exp(-z))    # Sigmoid activation
prediction = 1 if output >= 0.5 else 0
```
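
For a worked example with concrete numbers, the sketch below wraps the same three lines in a function and runs it on a tiny 3-input case. The `forward` helper and the input values are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
import numpy as np

def forward(weights, x, bias):
    """Return (z, output, prediction) for a single sigmoid neuron."""
    z = np.dot(weights, x) + bias
    output = 1.0 / (1.0 + np.exp(-z))
    prediction = 1 if output >= 0.5 else 0
    return z, output, prediction

# Hypothetical 3-input example
weights = np.array([0.5, -0.25, 1.0])
x = np.array([1.0, 0.0, 1.0])
bias = -1.0

# z = 0.5*1 + (-0.25)*0 + 1.0*1 + (-1.0) = 0.5
z, output, prediction = forward(weights, x, bias)
print(f"z = {z:.2f}, output = {output:.4f}, prediction = {prediction}")
```

Because the weighted sum is positive, the sigmoid output lands above 0.5 and the prediction is class 1.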

### Committee Analogy Progress

| Part | What Happened |
|------|---------------|
| Part 1 | Committee learned to read evidence (matrices) |
| Part 2 | First member learned to weigh evidence (weights/bias) |
| Part 3 | Member learned to cast meaningful votes (activation) |
| **Part 4** | **Member attempted their first case - and FAILED!** |
| Part 5 | (Next) Member learns from their mistakes |

### Key Insight

**Random weights = Random guessing**

An untrained Perceptron has no knowledge. Its weights are just noise. To become useful, it must **learn** the right weights by seeing examples and adjusting based on its mistakes.
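
The claim can be checked with a small simulation: score many untrained perceptrons on a balanced set of lines and average their accuracy. The sketch below is self-contained and rests on assumptions: the `make_line` helper, the six clean images, the zero bias, and the uniform weight range are all hypothetical stand-ins for this notebook's dataset and `Perceptron` class.

```python
import numpy as np

rng = np.random.default_rng(42)

def make_line(kind, index):
    """Return a flattened 3x3 image with one vertical or horizontal line."""
    image = np.zeros((3, 3))
    if kind == "vertical":
        image[:, index] = 1.0
    else:
        image[index, :] = 1.0
    return image.flatten()

# Hypothetical balanced dataset: 3 vertical lines (label 1), 3 horizontal (label 0)
X = np.array([make_line("vertical", i) for i in range(3)] +
             [make_line("horizontal", i) for i in range(3)])
y = np.array([1, 1, 1, 0, 0, 0])

# Evaluate many untrained perceptrons (random weights, zero bias)
n_perceptrons = 5000
accuracies = np.empty(n_perceptrons)
for k in range(n_perceptrons):
    weights = rng.uniform(-0.5, 0.5, size=9)
    z = X @ weights
    predictions = (1.0 / (1.0 + np.exp(-z)) >= 0.5).astype(int)
    accuracies[k] = np.mean(predictions == y)

mean_accuracy = accuracies.mean()
print(f"mean accuracy over {n_perceptrons} random perceptrons: {mean_accuracy:.1%}")
```

Individual perceptrons land above or below 50% by luck, but the average stays close to coin-flip accuracy.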

---

## Knowledge Check


In [None]:
# =============================================================================
# KNOWLEDGE CHECK - Part 4
# =============================================================================

print("KNOWLEDGE CHECK - Part 4: The Perceptron")
print("="*60)
print("\nAnswer these questions to test your understanding:\n")

questions = [
    {
        "q": "1. What are the steps of a forward pass (in order)?",
        "options": [
            "A) Activation -> Weighted Sum -> Output",
            "B) Weighted Sum -> Add Bias -> Activation -> Output",
            "C) Input -> Output -> Activation",
            "D) Bias -> Weights -> Sigmoid"
        ],
        "answer": "B",
        "explanation": "The forward pass is: (1) compute weighted sum of inputs, (2) add bias, (3) apply activation function, (4) get output."
    },
    {
        "q": "2. Why does an untrained Perceptron get ~50% accuracy?",
        "options": [
            "A) Because sigmoid always outputs 0.5",
            "B) Because the dataset is unbalanced",
            "C) Because random weights give random predictions",
            "D) Because the bias is always 0"
        ],
        "answer": "C",
        "explanation": "Random weights have no meaningful pattern, so the Perceptron essentially guesses randomly. On a balanced binary dataset, random guessing gives ~50% accuracy."
    },
    {
        "q": "3. What does the forward pass output for binary classification?",
        "options": [
            "A) Always 0 or 1 exactly",
            "B) A probability between 0 and 1",
            "C) Any real number",
            "D) The raw weighted sum"
        ],
        "answer": "B",
        "explanation": "The sigmoid activation squashes the output to a probability between 0 and 1. We then threshold at 0.5 to get a binary prediction."
    },
    {
        "q": "4. For a vertical line detector, where should the weights be highest?",
        "options": [
            "A) In the corners",
            "B) In the middle column",
            "C) In the middle row",
            "D) Equally everywhere"
        ],
        "answer": "B",
        "explanation": "A vertical line lights up every pixel in one column. High weights in the middle column give a high weighted sum when the line's pixels fall on them."
    },
    {
        "q": "5. Who invented the Perceptron?",
        "options": [
            "A) Geoffrey Hinton",
            "B) Frank Rosenblatt",
            "C) Yann LeCun",
            "D) Alan Turing"
        ],
        "answer": "B",
        "explanation": "Frank Rosenblatt introduced the Perceptron in 1958 while at the Cornell Aeronautical Laboratory. It was one of the first neural networks that could learn from examples!"
    }
]

for q in questions:
    print(q["q"])
    for opt in q["options"]:
        print(f"   {opt}")
    print()

print("\n" + "="*60)
print("Scroll down for answers...")
print("="*60)


In [None]:
# =============================================================================
# ANSWERS - Knowledge Check Part 4
# =============================================================================

print("ANSWERS - Part 4 Knowledge Check")
print("="*60)

for i, q in enumerate(questions, 1):
    print(f"\n{i}. Answer: {q['answer']}")
    print(f"   Explanation: {q['explanation']}")

print("\n" + "="*60)
print("How did you do?")
print("  5/5: Perceptron Expert!")
print("  4/5: Great understanding!")
print("  3/5: Review the sections you missed")
print("  <3:  Re-read Part 4 before continuing")
print("="*60)


---

## What's Next?

You've completed Part 4! Our Perceptron is built but confused - it makes random guesses because its weights are random.

### Coming Up in Part 5: Training - Learning from Mistakes

In Part 5, we'll cover:
- **Loss Functions** - Measuring "how wrong" a prediction is
- **Gradient Descent** - Finding better weights
- **Backpropagation** - How errors flow backward
- **The Training Loop** - Iteratively improving weights
- **Watch It Learn** - See accuracy improve from 50% to 90%+!

---

**Continue to Part 5:** `part_5_training.ipynb`

---

*"The Perceptron is ready. The data is ready. Now it's time to LEARN."*

**The Brain's Decision Committee** - Learning to See, One Step at a Time
