# 🔲 The Convolution Operation - Heart of CNNs

Welcome to the most important concept in CNNs! 🎉

In the previous notebook, we learned **WHY** CNNs work (local connectivity, parameter sharing, translation invariance). Now we'll learn **HOW** they work by understanding the convolution operation!

## 🎯 What You'll Learn

By the end of this notebook, you'll understand:
- **What is a filter/kernel** and how it detects patterns
- **How convolution works** with step-by-step animations
- **Implementing conv2d from scratch** in NumPy
- **Different filter types**: edge detection, blur, sharpen
- **Stride and padding** and how they affect output size
- **Feature maps** and why we use multiple filters
- **Visualizing** what different filters detect

**Prerequisites:** Notebook 01 (What are CNNs)

---

## 🎬 The Movie Analogy

Think of convolution like watching a movie through a small window:
- **Filter**: The window (defines what you can see)
- **Sliding**: Moving the window across the screen
- **Output**: Your description of what you saw at each position

Different window sizes and patterns reveal different aspects of the movie! 🎥

Let's dive in! 🚀

In [None]:
# Import our tools
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from matplotlib.patches import Rectangle
from IPython.display import HTML
import matplotlib.patches as mpatches

# Set random seed for reproducibility
np.random.seed(42)

# Configure matplotlib for better plots
plt.rcParams['figure.figsize'] = (14, 6)
plt.rcParams['font.size'] = 10

print("✅ Libraries imported successfully!")
print(f"📦 NumPy version: {np.__version__}")

---
## 🔍 What is a Filter (Kernel)?

### 🎯 The Core Idea

A **filter** (also called **kernel**) is a small matrix of numbers that slides across an image to detect specific patterns!

**Simple Example:**
```python
vertical_edge_filter = [  # named to avoid shadowing Python's built-in filter()
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1]
]
```

This 3×3 filter detects **vertical edges**!

### 🤔 Why These Numbers?

- **Left column (1, 1, 1)**: Look for bright pixels on the left
- **Middle column (0, 0, 0)**: Don't care about middle
- **Right column (-1, -1, -1)**: Look for dark pixels on the right

**Result**: Responds strongly where there's a bright→dark transition (a vertical edge!) 🎯
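
To make the effect of the three columns concrete, here is a minimal sketch that applies the filter to three hand-made 3×3 patches. The patch names and the `response` helper are illustrative assumptions, not part of the notebook's later code.

```python
import numpy as np

vertical_edge = np.array([[1, 0, -1],
                          [1, 0, -1],
                          [1, 0, -1]])

# Three hypothetical 3x3 patches (1 = bright, 0 = dark)
bright_to_dark = np.array([[1, 1, 0]] * 3)   # bright on the left, dark on the right
dark_to_bright = np.array([[0, 1, 1]] * 3)   # dark on the left, bright on the right
uniform = np.ones((3, 3), dtype=int)         # no edge at all

def response(patch, kernel):
    """Multiply element-wise and sum: the filter's response to one patch."""
    return int(np.sum(patch * kernel))

responses = {
    "bright_to_dark": response(bright_to_dark, vertical_edge),
    "dark_to_bright": response(dark_to_bright, vertical_edge),
    "uniform": response(uniform, vertical_edge),
}
print(responses)
```

The bright→dark patch gives a positive response, the reversed patch gives the same magnitude with a negative sign, and the uniform patch cancels to zero because the left and right columns contribute equal and opposite amounts.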

### 📐 Filter Properties

- **Size**: Usually 3×3, 5×5, or 7×7 (odd sizes are preferred so the filter has a well-defined center pixel)
- **Values**: Can be any numbers (learned during training!)
- **Purpose**: Each filter learns to detect a specific pattern

Let's visualize some common filters!

In [None]:
# Define common filters
filters = {
    'Vertical Edge': np.array([
        [1, 0, -1],
        [1, 0, -1],
        [1, 0, -1]
    ]),
    'Horizontal Edge': np.array([
        [1, 1, 1],
        [0, 0, 0],
        [-1, -1, -1]
    ]),
    'Diagonal Edge': np.array([
        [2, 1, 0],
        [1, 0, -1],
        [0, -1, -2]
    ]),
    'Sharpen': np.array([
        [0, -1, 0],
        [-1, 5, -1],
        [0, -1, 0]
    ]),
    'Blur (Box)': np.ones((3, 3)) / 9,
    'Identity': np.array([
        [0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]
    ])
}

# Visualize all filters
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

for idx, (name, filt) in enumerate(filters.items()):
    ax = axes[idx]
    
    # Display filter as heatmap
    im = ax.imshow(filt, cmap='RdBu', vmin=-2, vmax=2, interpolation='nearest')
    ax.set_title(f'{name}\n({filt.shape[0]}×{filt.shape[1]})',
                fontsize=13, fontweight='bold')
    
    # Add grid
    for i in range(4):
        ax.axhline(i - 0.5, color='black', linewidth=2)
        ax.axvline(i - 0.5, color='black', linewidth=2)
    
    # Add values as text
    for i in range(filt.shape[0]):
        for j in range(filt.shape[1]):
            text_color = 'white' if abs(filt[i, j]) > 0.5 else 'black'
            ax.text(j, i, f'{filt[i, j]:.2f}', 
                   ha='center', va='center',
                   color=text_color, fontweight='bold', fontsize=11)
    
    ax.set_xticks([])
    ax.set_yticks([])
    
    # Add colorbar
    plt.colorbar(im, ax=ax, fraction=0.046, pad=0.04)

plt.suptitle('Common Filters and What They Detect', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

print("\n🎯 What Each Filter Detects:")
print("   • Vertical Edge: Bright→Dark transitions (left to right)")
print("   • Horizontal Edge: Bright→Dark transitions (top to bottom)")
print("   • Diagonal Edge: Diagonal boundaries")
print("   • Sharpen: Enhances edges and details")
print("   • Blur: Smooths by averaging neighbors")
print("   • Identity: Returns the original (no change)")
print("\n💡 In CNNs, filters are LEARNED, not hand-designed!")

---
## 🔄 How Does Convolution Work?

### 📝 The Algorithm (Step-by-Step)

**Convolution** is simply:
1. **Place** the filter on the top-left of the image
2. **Multiply** each filter value with the corresponding image pixel
3. **Sum** all those products to get ONE output number
4. **Slide** the filter one position to the right; at the end of a row, move down one row and start again from the left
5. **Repeat** steps 2-4 until the filter has visited every position in the image

### 🧮 The Math

For a 3×3 filter at position (i, j):

```
Output[i,j] = Σ Σ Image[i+m, j+n] × Filter[m, n]
              m n
```

**In plain English**: Multiply corresponding values and add them up!
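
As a rough illustration, the double sum can be written almost literally with NumPy. This sketch assumes NumPy 1.20 or newer (for `sliding_window_view`); the small `image` and `kernel` arrays are chosen here only for demonstration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

image = np.array([[1, 1, 1, 0, 0]] * 5)      # bright left, dark right
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

# windows[i, j] is the 3x3 patch whose top-left corner sits at (i, j)
windows = sliding_window_view(image, kernel.shape)   # shape (3, 3, 3, 3)

# Output[i, j] = sum over m, n of Image[i+m, j+n] * Filter[m, n]
output = np.einsum("ijmn,mn->ij", windows, kernel)
print(output)
```

The indices `m` and `n` in the `einsum` string play the same role as in the formula. Strictly speaking, this sum without flipping the filter is cross-correlation; deep learning libraries conventionally call it convolution.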

### 🎬 Let's See It in Action!

I'll show you a simple 5×5 image convolved with a 3×3 filter:

In [None]:
# Create a simple example
simple_image = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0]
])

vertical_edge_filter = np.array([
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1]
])

# Manually compute one position (top-left 3x3)
print("🔍 Computing Convolution at Position (0, 0)")
print("="*70)
print("\nImage patch (3×3):")
patch = simple_image[0:3, 0:3]
print(patch)

print("\nFilter (3×3):")
print(vertical_edge_filter)

print("\n📊 Element-wise Multiplication:")
print("="*70)
elementwise_product = patch * vertical_edge_filter
print(elementwise_product)

print("\n➕ Sum of all elements:")
output_value = np.sum(elementwise_product)
print(f"   {output_value}")

print("\n🎯 This is the output value at position (0, 0)!")
print("\nNow we slide the filter and repeat...")

# Show the calculation in detail
print("\n📝 Detailed Calculation:")
print("="*70)
total = 0
for i in range(3):
    for j in range(3):
        product = patch[i, j] * vertical_edge_filter[i, j]
        total += product
        print(f"   [{i},{j}]: {patch[i,j]} × {vertical_edge_filter[i,j]:2} = {product:3}")

print(f"\n   Sum = {total}")
print("\n💡 Notice: The sum is 0 here because this patch is uniformly bright. The filter only responds where the window straddles the bright→dark edge between columns 2 and 3.")

### 🎨 Visualizing the Sliding Window

Let's create a visualization showing how the filter slides across the image!

In [None]:
# Create visualization of convolution process
fig, axes = plt.subplots(2, 3, figsize=(16, 10))

# We'll show 6 different filter positions
positions = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]

for idx, (pos_i, pos_j) in enumerate(positions):
    ax = axes[idx // 3, idx % 3]
    
    # Display the image
    ax.imshow(simple_image, cmap='gray', vmin=0, vmax=1, interpolation='nearest')
    
    # Highlight the current receptive field
    rect = Rectangle((pos_j - 0.5, pos_i - 0.5), 3, 3,
                     linewidth=4, edgecolor='red', facecolor='none')
    ax.add_patch(rect)
    
    # Extract the patch and compute output
    patch = simple_image[pos_i:pos_i+3, pos_j:pos_j+3]
    output_val = np.sum(patch * vertical_edge_filter)
    
    ax.set_title(f'Position ({pos_i}, {pos_j})\nOutput = {output_val:.0f}',
                fontsize=12, fontweight='bold')
    
    # Add grid
    for i in range(6):
        ax.axhline(i - 0.5, color='cyan', linewidth=1)
        ax.axvline(i - 0.5, color='cyan', linewidth=1)
    
    # Show pixel values
    for i in range(5):
        for j in range(5):
            ax.text(j, i, f'{simple_image[i, j]:.0f}',
                   ha='center', va='center',
                   color='yellow', fontweight='bold', fontsize=10)
    
    ax.set_xticks([])
    ax.set_yticks([])

plt.suptitle('Convolution: Sliding Window Process\n(Filter slides → computes output at each position)',
            fontsize=15, fontweight='bold')
plt.tight_layout()
plt.show()

print("\n🎯 Key Observations:")
print("   • Filter slides left-to-right, top-to-bottom")
print("   • Each position produces ONE output value")
print("   • Output is strongest (3) at positions (0,1) and (0,2), and likewise (1,1) and (1,2): there the window straddles the edge!")
print("   • This 5×5 image with 3×3 filter → 3×3 output (5-3+1=3)")

---
## 💻 Implementing Conv2D from Scratch

Let's implement convolution in pure NumPy to truly understand it!

### 🎯 Function Signature

```python
def conv2d(image, kernel, stride=1, padding=0):
    """
    Perform 2D convolution.
    
    Parameters:
    -----------
    image : np.ndarray, shape (H, W)
        Input image
    kernel : np.ndarray, shape (K, K)
        Convolution filter
    stride : int
        Step size for sliding window
    padding : int
        Border padding size
    
    Returns:
    --------
    output : np.ndarray
        Convolution output (feature map)
    """
```

Let's implement this step by step!

In [None]:
def conv2d(image, kernel, stride=1, padding=0, verbose=False):
    """
    Perform 2D convolution - the heart of CNNs!
    
    This is a simple but correct implementation using nested loops.
    Real frameworks (PyTorch, TensorFlow) use highly optimized algorithms.
    """
    # Get dimensions
    image_height, image_width = image.shape
    kernel_height, kernel_width = kernel.shape
    
    if verbose:
        print(f"📊 Input: {image.shape}")
        print(f"📊 Kernel: {kernel.shape}")
        print(f"📊 Stride: {stride}, Padding: {padding}")
    
    # Step 1: Add padding if needed
    if padding > 0:
        image = np.pad(image, 
                      pad_width=padding,
                      mode='constant',
                      constant_values=0)
        if verbose:
            print(f"📊 After padding: {image.shape}")
    
    # Update dimensions after padding
    padded_height, padded_width = image.shape
    
    # Step 2: Calculate output dimensions
    # Formula: floor((W + 2P - K) / S) + 1; the padded sizes below already include 2P
    output_height = (padded_height - kernel_height) // stride + 1
    output_width = (padded_width - kernel_width) // stride + 1
    
    if verbose:
        print(f"📊 Output will be: ({output_height}, {output_width})")
    
    # Step 3: Initialize output
    output = np.zeros((output_height, output_width))
    
    # Step 4: Perform convolution
    for i in range(output_height):
        for j in range(output_width):
            # Calculate the position in the padded image
            h_start = i * stride
            h_end = h_start + kernel_height
            w_start = j * stride
            w_end = w_start + kernel_width
            
            # Extract the receptive field (the patch we're looking at)
            receptive_field = image[h_start:h_end, w_start:w_end]
            
            # Perform element-wise multiplication and sum
            # This is the core of convolution!
            output[i, j] = np.sum(receptive_field * kernel)
    
    if verbose:
        print("✅ Convolution complete!")
    
    return output

# Test our implementation!
print("🧪 Testing our conv2d implementation...")
print("="*70)

test_output = conv2d(simple_image, vertical_edge_filter, stride=1, padding=0, verbose=True)

print("\n📊 Output Feature Map:")
print(test_output)

print("\n✅ Success! Our implementation works!")
print("\n💡 Notice: Highest values (3.0) are at positions where the vertical edge is!")

### 🧪 Testing Different Filters

Let's test our convolution implementation with different filters!

In [None]:
# Create a more interesting test image
test_image = np.array([
    [0, 0, 1, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 1, 1, 1, 0, 0]
])

# Test with multiple filters
test_filters = {
    'Vertical Edge': np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]]),
    'Horizontal Edge': np.array([[1, 1, 1], [0, 0, 0], [-1, -1, -1]]),
    'Blur': np.ones((3, 3)) / 9,
    'Sharpen': np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])
}

# Apply each filter
fig, axes = plt.subplots(2, 3, figsize=(16, 10))
axes = axes.flatten()

# Show original image
axes[0].imshow(test_image, cmap='gray', interpolation='nearest')
axes[0].set_title('Original Image\n(7×7)', fontsize=13, fontweight='bold')
axes[0].grid(True, color='cyan', linewidth=1)
axes[0].set_xticks(range(7))
axes[0].set_yticks(range(7))

# Apply each filter
for idx, (name, filt) in enumerate(test_filters.items()):
    ax = axes[idx + 1]
    
    # Apply convolution
    output = conv2d(test_image, filt, stride=1, padding=0)
    
    # Display output
    im = ax.imshow(output, cmap='RdBu', interpolation='nearest')
    ax.set_title(f'{name}\nOutput: {output.shape}', fontsize=13, fontweight='bold')
    ax.grid(True, color='gray', linewidth=0.5)
    plt.colorbar(im, ax=ax, fraction=0.046, pad=0.04)
    
    # Show values
    for i in range(output.shape[0]):
        for j in range(output.shape[1]):
            ax.text(j, i, f'{output[i,j]:.1f}',
                   ha='center', va='center',
                   color='black', fontsize=8)

# Hide last subplot
axes[5].axis('off')

plt.suptitle('Different Filters Detect Different Features!', fontsize=15, fontweight='bold')
plt.tight_layout()
plt.show()

print("\n🎯 What We Can See:")
print("   • Vertical Edge: Responds to left/right boundaries of the circle")
print("   • Horizontal Edge: Responds to top/bottom boundaries")
print("   • Blur: Smooths the image")
print("   • Sharpen: Enhances edges and details")
print("\n💡 Each filter extracts DIFFERENT information from the same image!")

---
## 🏃 Stride: Controlling the Step Size

### 🎯 What is Stride?

**Stride** is how many pixels we move the filter at each step!

- **Stride = 1**: Move one pixel at a time (default)
- **Stride = 2**: Skip every other pixel
- **Stride = 3**: Skip two pixels, etc.

### 🤔 Why Use Stride > 1?

- ✅ **Reduces output size** (downsampling)
- ✅ **Faster computation** (fewer positions to compute)
- ✅ **Alternative to pooling** (can replace pooling layers)

### 📐 How Stride Affects Output Size

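```
Output Size = ⌊(Input - Kernel) / Stride⌋ + 1
```

Here ⌊ ⌋ means the division is rounded down, since a window that does not fit entirely inside the image is skipped.

**Example**: 7×7 input, 3×3 kernel
- Stride 1: ⌊(7-3)/1⌋ + 1 = 5 → 5×5 output
- Stride 2: ⌊(7-3)/2⌋ + 1 = 3 → 3×3 output
- Stride 3: ⌊(7-3)/3⌋ + 1 = 2 → 2×2 output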
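
One way to see where these numbers come from is to list the positions at which the window actually starts along one axis. The helper name `window_starts` below is a hypothetical one introduced just for this sketch.

```python
def window_starts(input_size, kernel_size, stride):
    """Start index of every window that fits entirely inside the input (one axis)."""
    last_start = input_size - kernel_size
    return list(range(0, last_start + 1, stride))

starts_by_stride = {s: window_starts(7, 3, s) for s in (1, 2, 3)}
for stride, starts in starts_by_stride.items():
    print(f"stride={stride}: starts={starts}, output size={len(starts)}")
```

With stride 3 the window starts at 0 and 3; a third start at 6 would run past the edge of the 7-pixel axis, which is why the output size is rounded down to 2.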

Let's visualize this!

In [None]:
# Create a simple 7x7 image
stride_test_image = np.arange(49).reshape(7, 7)

# Simple averaging filter
avg_filter = np.ones((3, 3)) / 9

# Test different strides
strides = [1, 2, 3]

fig, axes = plt.subplots(1, 4, figsize=(18, 5))

# Show original
axes[0].imshow(stride_test_image, cmap='viridis', interpolation='nearest')
axes[0].set_title('Original Image\n(7×7)', fontsize=13, fontweight='bold')
axes[0].grid(True, color='white', linewidth=1.5)

# Add grid
for i in range(8):
    axes[0].axhline(i - 0.5, color='white', linewidth=1.5)
    axes[0].axvline(i - 0.5, color='white', linewidth=1.5)

# Test each stride
for idx, stride in enumerate(strides):
    ax = axes[idx + 1]
    
    # Apply convolution with this stride
    output = conv2d(stride_test_image, avg_filter, stride=stride, padding=0)
    
    # Display
    im = ax.imshow(output, cmap='viridis', interpolation='nearest')
    ax.set_title(f'Stride = {stride}\nOutput: {output.shape}',
                fontsize=13, fontweight='bold')
    
    # Add grid
    for i in range(output.shape[0] + 1):
        ax.axhline(i - 0.5, color='white', linewidth=1.5)
        ax.axvline(i - 0.5, color='white', linewidth=1.5)
    
    # Show values
    for i in range(output.shape[0]):
        for j in range(output.shape[1]):
            ax.text(j, i, f'{output[i,j]:.1f}',
                   ha='center', va='center',
                   color='white', fontweight='bold', fontsize=10)
    
    plt.colorbar(im, ax=ax, fraction=0.046, pad=0.04)

plt.suptitle('Effect of Stride on Output Size', fontsize=15, fontweight='bold')
plt.tight_layout()
plt.show()

print("\n📊 Output Size Summary:")
print("="*50)
print(f"{'Stride':<10} {'Formula':<25} {'Output Size':<15}")
print("="*50)
for stride in strides:
    size = (7 - 3) // stride + 1
    formula = f"(7-3)//{stride}+1 = {size}"
    print(f"{stride:<10} {formula:<25} {size}×{size}")
print("="*50)

print("\n🎯 Key Insight:")
print("   • Larger stride → Smaller output")
print("   • Stride=2 is common in modern CNNs (alternative to pooling)")
print("   • Trade-off: Less computation but also less spatial information")

---
## 🛡️ Padding: Preserving Spatial Dimensions

### 🎯 The Problem

Without padding:
- 7×7 image + 3×3 filter → 5×5 output (shrinks!)
- Each convolution layer makes the image smaller
- After many layers: 224×224 → 222×222 → 220×220 → ... → Too small!
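
The shrinkage is easy to tabulate. The following is a minimal sketch assuming stride 1, no padding, and a 3×3 filter at every layer; `size_after_layers` is a hypothetical helper, not a function used elsewhere in this notebook.

```python
def size_after_layers(input_size, kernel_size, num_layers):
    """Side length after stacking unpadded, stride-1 convolutions."""
    size = input_size
    for _ in range(num_layers):
        size = size - kernel_size + 1   # each layer removes (kernel_size - 1) pixels
    return size

for n in (1, 2, 10, 50, 100):
    side = size_after_layers(224, 3, n)
    print(f"after {n:>3} layers: {side}×{side}")
```

Each 3×3 layer removes two pixels per dimension, so a 224×224 input is down to 24×24 after 100 layers under these assumptions.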

### 🛡️ The Solution: Padding

**Padding** adds border pixels around the image!

```
Original:        With Padding=1:
┌─────┐          ┌───────┐
│  7×7│          │0 0 0 0│
│     │    →     │0  7×7 │
└─────┘          │0 0 0 0│
                 └───────┘
                    9×9
```

### 📐 Two Types of Padding

1. **Valid Padding (no padding)**
   - Output size = Input - Kernel + 1
   - Image shrinks with each layer

2. **Same Padding**
   - Output size = Input size (when stride=1)
   - Padding = (Kernel - 1) / 2
   - Most common in modern CNNs!

### 🧮 Calculating Required Padding

For "same" padding with stride=1:
```python
padding = (kernel_size - 1) // 2
```

Examples:
- 3×3 kernel → padding = 1
- 5×5 kernel → padding = 2
- 7×7 kernel → padding = 3
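
These values can be checked directly by padding an array with `np.pad` and counting how many stride-1 windows fit. This is a small sketch; the 7×7 test array and the `same_padding` helper name are assumptions made for illustration.

```python
import numpy as np

def same_padding(kernel_size):
    """Padding per side that preserves size at stride 1 (odd kernel sizes)."""
    return (kernel_size - 1) // 2

image = np.zeros((7, 7))
output_sizes = {}
for k in (3, 5, 7):
    p = same_padding(k)
    padded = np.pad(image, pad_width=p, mode="constant", constant_values=0)
    output_sizes[k] = padded.shape[0] - k + 1   # number of stride-1 window positions
    print(f"kernel {k}×{k}: padding={p}, padded={padded.shape}, output side={output_sizes[k]}")
```

For an even kernel size the same formula falls one pixel short (a 4×4 kernel gets padding 1 and yields a 6×6 output from a 7×7 input), which is one reason odd kernel sizes are the usual choice.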

Let's see padding in action!

In [None]:
# Create a simple image
padding_test_image = np.ones((5, 5))
padding_test_image[1:4, 1:4] = 0  # Create a dark square in the middle

# Test different padding values
padding_values = [0, 1, 2]

fig, axes = plt.subplots(2, 3, figsize=(16, 10))

for idx, pad in enumerate(padding_values):
    # Show padded image (padding is drawn as gray 0.5 so it is visible; conv2d itself pads with zeros)
    ax_img = axes[0, idx]
    
    if pad > 0:
        padded_img = np.pad(padding_test_image, pad_width=pad, 
                           mode='constant', constant_values=0.5)
    else:
        padded_img = padding_test_image
    
    ax_img.imshow(padded_img, cmap='gray', interpolation='nearest', vmin=0, vmax=1)
    ax_img.set_title(f'Padding = {pad}\nSize: {padded_img.shape}',
                    fontsize=12, fontweight='bold')
    
    # Add grid
    for i in range(padded_img.shape[0] + 1):
        ax_img.axhline(i - 0.5, color='red', linewidth=1.5)
        ax_img.axvline(i - 0.5, color='red', linewidth=1.5)
    
    # Highlight the padding region
    if pad > 0:
        rect = Rectangle((pad - 0.5, pad - 0.5), 5, 5,
                        linewidth=3, edgecolor='yellow', facecolor='none')
        ax_img.add_patch(rect)
        ax_img.text(padded_img.shape[1]/2, -1, 
                   'Yellow box = original image',
                   ha='center', fontsize=10, color='yellow', fontweight='bold')
    
    # Apply convolution with this padding
    ax_output = axes[1, idx]
    output = conv2d(padding_test_image, avg_filter, stride=1, padding=pad)
    
    im = ax_output.imshow(output, cmap='viridis', interpolation='nearest')
    ax_output.set_title(f'After Convolution\nOutput: {output.shape}',
                       fontsize=12, fontweight='bold')
    
    # Add grid
    for i in range(output.shape[0] + 1):
        ax_output.axhline(i - 0.5, color='white', linewidth=1)
        ax_output.axvline(i - 0.5, color='white', linewidth=1)
    
    plt.colorbar(im, ax=ax_output, fraction=0.046, pad=0.04)

plt.suptitle('Effect of Padding on Output Size\n(3×3 filter, stride=1)',
            fontsize=15, fontweight='bold')
plt.tight_layout()
plt.show()

print("\n📊 Output Size Summary:")
print("="*60)
print(f"{'Padding':<15} {'Input Size':<15} {'Output Size':<15} {'Change':<15}")
print("="*60)
for pad in padding_values:
    output = conv2d(padding_test_image, avg_filter, stride=1, padding=pad)
    change = "Shrinks" if pad == 0 else ("Same" if pad == 1 else "Grows")
    print(f"{pad:<15} {'5×5':<15} {f'{output.shape[0]}×{output.shape[1]}':<15} {change:<15}")
print("="*60)

print("\n🎯 Key Points:")
print("   • Padding=0: Output shrinks (valid padding)")
print("   • Padding=1: Output stays same size with 3×3 kernel (same padding)")
print("   • Padding=2: Output grows (usually not desired)")
print("\n💡 Most CNNs use 'same' padding to maintain spatial dimensions!")

### 🧮 The Complete Output Size Formula

Combining everything we've learned:

```
Output Size = ⌊(Input + 2×Padding - Kernel) / Stride⌋ + 1
```

Where ⌊ ⌋ means floor division (round down).

Let's create a calculator!

In [None]:
def calculate_output_size(input_size, kernel_size, stride, padding):
    """
    Calculate output size after convolution.
    
    Formula: ⌊(Input + 2×Padding - Kernel) / Stride⌋ + 1
    """
    output_size = (input_size + 2 * padding - kernel_size) // stride + 1
    return output_size

# Test various configurations
print("🧮 Convolution Output Size Calculator")
print("="*80)
print(f"{'Input':<10} {'Kernel':<10} {'Stride':<10} {'Padding':<10} {'Output':<10} {'Description':<25}")
print("="*80)

test_configs = [
    (32, 3, 1, 0, "No padding (shrinks)"),
    (32, 3, 1, 1, "Same padding (preserves size)"),
    (32, 3, 2, 1, "Stride=2 (downsampling)"),
    (224, 7, 2, 3, "ImageNet first layer"),
    (28, 5, 1, 0, "MNIST with 5×5 filter"),
    (64, 3, 1, 1, "Typical CNN layer"),
    (56, 3, 2, 1, "Downsampling layer"),
]

for input_size, kernel, stride, padding, description in test_configs:
    output = calculate_output_size(input_size, kernel, stride, padding)
    print(f"{input_size:<10} {kernel:<10} {stride:<10} {padding:<10} {output:<10} {description:<25}")

print("="*80)

print("\n💡 Design Tips:")
print("   • Use padding=1 with 3×3 kernels to maintain size")
print("   • Use stride=2 for downsampling (alternative to pooling)")
print("   • First layer often uses larger kernel (7×7) and stride=2")
print("   • Later layers typically use 3×3 kernels")

---
## 🗺️ Feature Maps: Multiple Filters

### 🎯 The Big Idea

In real CNNs, we don't use just ONE filter - we use MANY!

**Why?**
- One filter = detects one pattern (e.g., vertical edges)
- Multiple filters = detect multiple patterns!
  - Filter 1: Vertical edges
  - Filter 2: Horizontal edges
  - Filter 3: Diagonal edges
  - Filter 4: Corners
  - Filter 5: Textures
  - ... and so on!

### 📊 Feature Map Dimensions

```
Input: (H, W, C_in)
  • H = height
  • W = width
  • C_in = input channels

Filters: N filters of size (K, K, C_in)
  • N = number of filters
  • K = kernel size
  • C_in = must match input channels

Output: (H_out, W_out, N)
  • H_out, W_out = calculated using formula
  • N = number of feature maps (one per filter)
```
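
The shape bookkeeping above can be sketched with plain loops. This is an illustrative, unoptimized example assuming stride 1 and no padding; the sizes and random values are arbitrary choices made here, and the filters are stored as an array of shape `(N, K, K, C_in)`.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C_in, K, N = 8, 8, 3, 3, 4

image = rng.random((H, W, C_in))                 # input: (H, W, C_in)
filters = rng.standard_normal((N, K, K, C_in))   # N filters, each (K, K, C_in)

H_out, W_out = H - K + 1, W - K + 1              # stride 1, no padding
output = np.zeros((H_out, W_out, N))

for f in range(N):                               # one feature map per filter
    for i in range(H_out):
        for j in range(W_out):
            patch = image[i:i + K, j:j + K, :]   # (K, K, C_in) receptive field
            output[i, j, f] = np.sum(patch * filters[f])

print(output.shape)
```

Each filter collapses all input channels into a single number per position, so the output depth equals the number of filters rather than the number of input channels.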

### 🎨 Visualizing Multiple Feature Maps

Let's create a real example with multiple filters!

In [None]:
# Create a richer test image (10√ó10 with various features)
rich_image = np.zeros((10, 10))

# Add vertical edge
rich_image[:, 3] = 1

# Add horizontal edge
rich_image[6, :] = 1

# Add a bright square
rich_image[1:3, 7:9] = 1

# Define multiple filters
filters_dict = {
    'Vertical\nEdge': np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]]),
    'Horizontal\nEdge': np.array([[1, 1, 1], [0, 0, 0], [-1, -1, -1]]),
    'Diagonal\nEdge': np.array([[2, 1, 0], [1, 0, -1], [0, -1, -2]]),
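    # Laplacian-style kernel: responds to any sharp intensity change (edges, corners, isolated points)
    'Outline\n(Laplacian)': np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]),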
}

# Apply each filter
fig, axes = plt.subplots(2, 3, figsize=(16, 10))
axes = axes.flatten()

# Show original
axes[0].imshow(rich_image, cmap='gray', interpolation='nearest')
axes[0].set_title('Original Image\n(10×10)', fontsize=13, fontweight='bold')
axes[0].grid(True, color='cyan', linewidth=0.5)

# Apply each filter and show feature map
for idx, (name, filt) in enumerate(filters_dict.items()):
    ax = axes[idx + 1]
    
    # Apply convolution
    feature_map = conv2d(rich_image, filt, stride=1, padding=0)
    
    # Display feature map
    im = ax.imshow(feature_map, cmap='RdBu', interpolation='nearest')
    ax.set_title(f'{name}\nFeature Map: {feature_map.shape}',
                fontsize=12, fontweight='bold')
    ax.grid(True, color='gray', linewidth=0.5)
    plt.colorbar(im, ax=ax, fraction=0.046, pad=0.04)

# Use the last panel for a summary note
axes[5].axis('off')
axes[5].text(0.5, 0.5,
            '🎯 Each filter detects\n'
            'different patterns!\n\n'
            '• Vertical filter → strong\n'
            '  response at vertical edges\n\n'
            '• Horizontal filter → strong\n'
            '  response at horizontal edges\n\n'
            '• Each feature map shows\n'
            '  WHERE that pattern exists\n\n'
            '🧠 CNN learns these filters\n'
            'automatically during training!',
            ha='center', va='center', fontsize=11,
            bbox=dict(boxstyle='round,pad=1', facecolor='lightyellow',
                     edgecolor='black', linewidth=2))

plt.suptitle('Multiple Filters → Multiple Feature Maps\n'
            '(Each filter specializes in detecting different patterns)',
            fontsize=15, fontweight='bold')
plt.tight_layout()
plt.show()

print("\n🎯 Understanding Feature Maps:")
print("   • Original image: 10×10×1 (one channel)")
print("   • We applied 4 different 3×3 filters")
print("   • Result: 4 feature maps, each 8×8")
print("   • Output shape: 8×8×4 (height × width × channels)")
print("\n💡 Key Insight:")
print("   Each filter acts as a 'feature detector' for a specific pattern!")
print("   More filters → more features can be detected!")

### 🧠 What Do Real CNN Filters Learn?

In a trained CNN:

**First Layer Filters**:
- Simple patterns: edges, colors, simple textures
- Look like edge detectors, color blobs

**Middle Layer Filters**:
- More complex: corners, curves, patterns
- Combinations of first layer features

**Deep Layer Filters**:
- Very complex: object parts (wheels, faces, eyes)
- Semantic features

**Final Layers**:
- Whole objects and scenes

This creates a **hierarchical feature representation**! 🎯

In [None]:
# Visualize hierarchical feature learning
fig, ax = plt.subplots(figsize=(16, 10))
ax.set_xlim(0, 12)
ax.set_ylim(0, 10)
ax.axis('off')

# Define layers
layers = [
    {'name': 'Input\nImage', 'x': 1, 'y': 3, 'height': 4, 'color': '#FFE4E1'},
    {'name': 'Layer 1\nEdges\nColors', 'x': 3, 'y': 2.5, 'height': 5, 'color': '#87CEEB'},
    {'name': 'Layer 2\nTextures\nPatterns', 'x': 5.5, 'y': 2, 'height': 6, 'color': '#90EE90'},
    {'name': 'Layer 3\nShapes\nParts', 'x': 8, 'y': 2.5, 'height': 5, 'color': '#FFB6C1'},
    {'name': 'Layer 4\nObjects', 'x': 10.5, 'y': 3, 'height': 4, 'color': '#DDA0DD'}
]

# Draw layers
for i, layer in enumerate(layers):
    # Draw box (mpatches is imported at the top of the notebook)
    box = mpatches.FancyBboxPatch((layer['x'], layer['y']), 1.2, layer['height'],
                                  boxstyle="round,pad=0.1",
                                  facecolor=layer['color'],
                                  edgecolor='black', linewidth=3)
    ax.add_patch(box)
    
    # Add text
    ax.text(layer['x'] + 0.6, layer['y'] + layer['height']/2,
           layer['name'], ha='center', va='center',
           fontsize=11, fontweight='bold')
    
    # Add arrows
    if i < len(layers) - 1:
        next_layer = layers[i + 1]
        arrow = mpatches.FancyArrowPatch(
            (layer['x'] + 1.2, layer['y'] + layer['height']/2),
            (next_layer['x'], next_layer['y'] + next_layer['height']/2),
            arrowstyle='->', mutation_scale=30,
            linewidth=3, color='black', alpha=0.6
        )
        ax.add_patch(arrow)

# Add title and description
ax.text(6, 9, 'Hierarchical Feature Learning in CNNs',
       ha='center', fontsize=16, fontweight='bold')

ax.text(6, 0.5,
       'Each layer builds on the previous layer, detecting increasingly complex patterns!',
       ha='center', fontsize=12, style='italic',
       bbox=dict(boxstyle='round,pad=0.8', facecolor='lightyellow',
                edgecolor='black', linewidth=2))

plt.tight_layout()
plt.show()

print("\n🧠 How CNNs Build Understanding:")
print("\n   Layer 1 (Early):")
print("     • Learns: Simple edges, colors, basic textures")
print("     • Example: Horizontal line, blue blob, rough texture")
print("\n   Layer 2 (Middle):")
print("     • Learns: Combinations of edges → shapes")
print("     • Example: Corner, circle, grid pattern")
print("\n   Layer 3 (Middle-Deep):")
print("     • Learns: Object parts")
print("     • Example: Eye, wheel, door, window")
print("\n   Layer 4 (Deep):")
print("     • Learns: Complete objects")
print("     • Example: Cat face, car, house")
print("\n💡 This is why deep networks work so well!")
print("   They automatically learn the right features at each level!")

---
## 🌈 Convolution with RGB Images

### 🎯 The Challenge

So far we've worked with grayscale images (2D). But most real images are **RGB** (3D)!

```
Grayscale: (H, W)     - 2D
RGB:       (H, W, 3)  - 3D (Red, Green, Blue channels)
```

### 🔍 How Does Convolution Work with RGB?

**Key Insight**: The filter must have the SAME depth as the input!

```
Input: (H, W, 3)     - RGB image
Filter: (K, K, 3)    - 3 channels to match!
Output: (H', W', 1)  - Single feature map per filter
```

**The Process**:
1. Filter has 3 layers (one for R, one for G, one for B)
2. Convolve each channel separately
3. **Sum** the results from all 3 channels
4. Result: One output value

### 🎨 Mathematical Formula

```python
# For a K×K×C kernel at output position (i, j):
output[i, j] = sum(
    np.sum(image[i:i+k, j:j+k, c] * kernel[:, :, c])
    for c in range(num_channels)
)
```
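
Before the full implementation, it may help to confirm at a single position that "convolve each channel, then sum" gives the same number as one multiply-and-sum over the whole 3D patch. The arrays and the chosen position below are arbitrary assumptions for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((5, 5, 3))            # small hypothetical RGB image
kernel = rng.standard_normal((3, 3, 3))  # one filter with 3 channels

i, j, k = 1, 2, 3                        # output position (i, j), kernel size k

# Steps 2-3: handle each channel separately, then add the three results
per_channel = [np.sum(image[i:i + k, j:j + k, c] * kernel[:, :, c]) for c in range(3)]
summed = sum(per_channel)

# Equivalent shortcut: multiply the whole 3D patch by the 3D kernel and sum everything
all_at_once = np.sum(image[i:i + k, j:j + k, :] * kernel)

print(np.isclose(summed, all_at_once))
```

Because addition is associative, the two routes agree up to floating-point rounding, which is why an implementation can use a single `np.sum` over the 3D product.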

Let's implement this!

In [None]:
def conv2d_rgb(image, kernel, stride=1, padding=0):
    """
    Perform 2D convolution on RGB images.
    
    Parameters:
    -----------
    image : np.ndarray, shape (H, W, 3)
        RGB input image
    kernel : np.ndarray, shape (K, K, 3)
        3D convolution filter (must have 3 channels)
    stride : int
        Step size
    padding : int
        Border padding
    
    Returns:
    --------
    output : np.ndarray, shape (H', W')
        Feature map (2D output)
    """
    # Get dimensions
    height, width, channels = image.shape
    kernel_height, kernel_width, kernel_channels = kernel.shape
    
    assert channels == kernel_channels, "Image and kernel must have same number of channels!"
    
    # Add padding if needed (pad all channels)
    if padding > 0:
        image = np.pad(image,
                      pad_width=((padding, padding), (padding, padding), (0, 0)),
                      mode='constant',
                      constant_values=0)
    
    # Update dimensions
    padded_height, padded_width, _ = image.shape
    
    # Calculate output size
    output_height = (padded_height - kernel_height) // stride + 1
    output_width = (padded_width - kernel_width) // stride + 1
    
    # Initialize output
    output = np.zeros((output_height, output_width))
    
    # Perform convolution
    for i in range(output_height):
        for j in range(output_width):
            h_start = i * stride
            h_end = h_start + kernel_height
            w_start = j * stride
            w_end = w_start + kernel_width
            
            # Extract 3D receptive field
            receptive_field = image[h_start:h_end, w_start:w_end, :]
            
            # Convolve: multiply element-wise and sum EVERYTHING
            # This sums across height, width, AND channels!
            output[i, j] = np.sum(receptive_field * kernel)
    
    return output

# Test with a simple RGB image
rgb_image = np.random.rand(8, 8, 3)  # Random RGB image

# Create an RGB filter (3 channels)
rgb_filter = np.random.randn(3, 3, 3) * 0.1

print("🧪 Testing RGB Convolution...")
print("="*60)
print(f"Input image shape: {rgb_image.shape}")
print(f"Filter shape: {rgb_filter.shape}")

output = conv2d_rgb(rgb_image, rgb_filter, stride=1, padding=0)

print(f"Output shape: {output.shape}")
print("\n✅ Success! RGB convolution works!")

print("\n🔍 What Happened:")
print("   • 8×8×3 image (RGB)")
print("   • 3×3×3 filter (matches 3 channels)")
print("   • Output: 6×6 feature map (single channel)")
print("\n💡 One filter → One feature map (regardless of input channels!)")

### 🎨 Visualizing RGB Convolution

In [None]:
# Create a clearer example
# Make an RGB image with distinct channel patterns
demo_rgb = np.zeros((6, 6, 3))
demo_rgb[:, :3, 0] = 1.0  # Left half is RED
demo_rgb[:, 3:, 1] = 1.0  # Right half is GREEN

# Create a filter that detects vertical edges in the red channel only (G and B weights stay zero)
red_detector = np.zeros((3, 3, 3))
red_detector[:, :, 0] = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]])  # Vertical edge on R channel

# Apply convolution
red_response = conv2d_rgb(demo_rgb, red_detector, stride=1, padding=0)

# Visualize
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

# Show RGB image
axes[0, 0].imshow(demo_rgb)
axes[0, 0].set_title('RGB Image\n(Red left, Green right)', fontsize=12, fontweight='bold')
axes[0, 0].grid(True, color='white', linewidth=1)

# Show individual channels (R and G on the top row, B at bottom-left)
channel_axes = [axes[0, 1], axes[0, 2], axes[1, 0]]
channel_info = [('R', 0, 'Reds'), ('G', 1, 'Greens'), ('B', 2, 'Blues')]
for ax, (channel_name, channel_idx, color) in zip(channel_axes, channel_info):
    ax.imshow(demo_rgb[:, :, channel_idx], cmap=color, vmin=0, vmax=1)
    ax.set_title(f'{channel_name} Channel', fontsize=12, fontweight='bold')
    ax.grid(True, color='gray', linewidth=0.5)

# Show filter
axes[1, 1].imshow(red_detector[:, :, 0], cmap='RdBu', vmin=-1, vmax=1)
axes[1, 1].set_title('Filter\n(R channel only)', fontsize=12, fontweight='bold')
axes[1, 1].grid(True, color='black', linewidth=1)

# Show output
im = axes[1, 2].imshow(red_response, cmap='hot', interpolation='nearest')
axes[1, 2].set_title(f'Output Feature Map\n{red_response.shape}', 
                    fontsize=12, fontweight='bold')
axes[1, 2].grid(True, color='gray', linewidth=0.5)
plt.colorbar(im, ax=axes[1, 2])

plt.suptitle('RGB Convolution: How Filters See Color Channels',
            fontsize=15, fontweight='bold')
plt.tight_layout()
plt.show()

print("\n🎯 Understanding RGB Convolution:")
print("   • Filter has 3 layers (one per color channel)")
print("   • Each layer convolves with its corresponding channel")
print("   • Results are SUMMED to produce one output value")
print("   • Output is strongest where the filter pattern matches the image")
print("\n💡 This is how CNNs process color images!")

---
## 🎯 Summary: The Convolution Operation

Congratulations! You now understand the heart of CNNs! 🎉

### ✅ What We Learned

1. **Filters/Kernels**:
   - Small matrices that detect patterns
   - Different filters detect different features
   - Usually 3×3, 5×5, or 7×7

2. **The Convolution Operation**:
   - Slide filter across image
   - Multiply element-wise and sum
   - Produces feature map showing where pattern exists

3. **Stride**:
   - Controls step size (usually 1 or 2)
   - Larger stride → smaller output
   - Used for downsampling

4. **Padding**:
   - Adds border pixels
   - Preserves spatial dimensions
   - "Same" padding keeps size constant

5. **Feature Maps**:
   - Output of convolution
   - Multiple filters → multiple feature maps
   - Each detects different pattern

6. **RGB Convolution**:
   - Filter depth must match input depth
   - Results summed across all channels
   - One filter → one feature map

### 🧮 Key Formulas

**Output Size**:
```
Output = ⌊(Input + 2×Padding - Kernel) / Stride⌋ + 1
```
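
As a quick sanity check, the output-size formula can be wrapped in a small helper. The function name `conv_output_size` is hypothetical (it is not part of the notebook's earlier code); integer division plays the role of the floor brackets.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output size along one spatial dimension, using floor division."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# 8-pixel input, 3-pixel kernel, no padding: the 6x6 map from the RGB test
print(conv_output_size(8, 3))             # 6
# Stride 2 skips every other position
print(conv_output_size(8, 3, stride=2))   # 3
# Padding of 1 with a 3-pixel kernel keeps the size unchanged
print(conv_output_size(8, 3, padding=1))  # 8
```

Apply the helper once for height and once for width when the input is not square.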

**Same Padding** (stride=1, odd kernel size):
```
Padding = (Kernel - 1) / 2
```
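
A minimal sketch of the same-padding rule, assuming stride 1 and an odd kernel size. The helper name `same_padding` is made up for this example. It pads a blank 8×8 image with `np.pad` and checks that the stride-1 output size stays at 8.

```python
import numpy as np

def same_padding(kernel_size):
    """Padding that preserves size at stride 1 (odd kernel sizes only)."""
    if kernel_size % 2 == 0:
        raise ValueError("symmetric 'same' padding needs an odd kernel size")
    return (kernel_size - 1) // 2

image = np.zeros((8, 8))
sizes = []
for k in (3, 5, 7):
    p = same_padding(k)
    padded = np.pad(image, p, mode="constant")  # zero border of width p
    out = padded.shape[0] - k + 1               # stride-1 output size
    sizes.append(out)
    print(f"kernel {k}x{k}: padding={p}, output={out}x{out}")
```

Even kernel sizes would need uneven padding on the two sides, which is one reason odd sizes like 3, 5, and 7 are the usual choice.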

**Convolution Operation**:
```
Output[i,j] = Σ Σ Image[i+m, j+n] × Filter[m,n]
```
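
The double sum can be written out literally and compared with the "multiply a patch and sum" shortcut. This is only a sketch with random values. Note that the formula as written does not flip the filter, so strictly speaking it is cross-correlation, which is what deep learning libraries compute under the name convolution.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((5, 5))
kernel = rng.standard_normal((3, 3))

def output_at(image, kernel, i, j):
    """Literal double sum over m and n of image[i+m, j+n] * kernel[m, n]."""
    total = 0.0
    for m in range(kernel.shape[0]):
        for n in range(kernel.shape[1]):
            total += image[i + m, j + n] * kernel[m, n]
    return total

i, j = 1, 2
loop_value = output_at(image, kernel, i, j)

# Same computation as one element-wise product over the 3x3 patch
patch_value = np.sum(image[i:i + 3, j:j + 3] * kernel)

print(np.isclose(loop_value, patch_value))  # True
```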

### 💡 Key Insights

- **Convolution = Pattern Matching**: The filter looks for its pattern everywhere in the image
- **Local Operation**: Each output depends only on a local patch
- **Parameter Sharing**: Same filter weights used everywhere
- **Hierarchical**: Early layers detect simple features, deep layers detect complex objects
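
The "local operation" and "parameter sharing" points can be checked with a small experiment. This is a sketch under simple assumptions (stride 1, no padding, an all-ones 3×3 filter), and `conv2d_valid` is a stripped-down helper written just for this example. Changing one pixel should only affect the outputs whose window covers that pixel.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Stride-1, no-padding sliding-window sum of element-wise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernel = np.ones((3, 3))

before = conv2d_valid(image, kernel)

image_changed = image.copy()
image_changed[4, 4] += 1.0  # perturb a single pixel
after = conv2d_valid(image_changed, kernel)

# Positions in the 6x6 output that differ after the change
changed = np.argwhere(~np.isclose(before, after))
print(len(changed), "of", before.size, "outputs changed")  # 9 of 36 outputs changed
print("shared weights:", kernel.size)                     # shared weights: 9
```

Only the 3×3 block of outputs around the perturbed pixel changes, and all 36 outputs are produced by the same 9 weights.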

### 🎓 What's Next?

Now that you understand convolution, you're ready for:

**Next Notebook: Pooling Layers**
- Downsampling feature maps
- Max pooling vs average pooling
- Why pooling helps CNNs

**Then**: Building a complete CNN!

Let's continue the journey! 🚀

---
## 🎮 Practice Exercises

Test your understanding with these exercises:

### Exercise 1: Design Your Own Filter
Create a 3×3 filter that detects diagonal edges (top-left to bottom-right).
Test it on an image with diagonal patterns.

### Exercise 2: Calculate Output Sizes
Given:
- Input: 64×64×3
- 32 filters, each 5×5
- Stride = 2
- Padding = 2

What is the output shape?

### Exercise 3: Implement Multi-Channel Convolution
Modify our `conv2d_rgb` function to:
- Accept multiple filters
- Return a 3D output (H × W × num_filters)

### Exercise 4: Visualize Filter Learning
Create a simple image and several random filters.
Visualize which filters respond most strongly to different parts of the image.

### Exercise 5: Compare Stride vs Pooling
- Apply conv with stride=2
- Apply conv with stride=1 + max pooling
- Compare the outputs - what's different?

**Try these exercises in the exercises.ipynb notebook!**

---

*Great work! You now understand the core operation that powers all CNNs!* 💪

*Ready to learn about pooling? Let's go!* → **[Next: Notebook 03 - Pooling Layers](03_pooling_layers.ipynb)**