# Convolutional Neural Networks (CNNs): A Complete Tutorial

## From Theory to Practice - Understanding Image Classification

**Author:** Based on StatQuest with Josh Starmer  
**Date:** November 2025

---

## Table of Contents
1. [Introduction](#introduction)
2. [Why CNNs? Problems with Regular Neural Networks](#why-cnns)
3. [Core Concepts of CNNs](#core-concepts)
4. [Step-by-Step: How CNNs Work](#how-cnns-work)
5. [Building CNNs from Scratch](#from-scratch)
6. [Real-World Example: MNIST Digit Classification](#mnist-example)
7. [Building CNNs with Keras/TensorFlow](#with-keras)
8. [Visualizing CNN Features](#visualization)
9. [Summary and Best Practices](#summary)

## 1. Introduction <a id='introduction'></a>

Imagine you're playing tic-tac-toe and your computer needs to recognize whether you drew an **X** or an **O**. How does it do this?

The answer: **Convolutional Neural Networks (CNNs)**!

CNNs are specialized neural networks designed for processing grid-like data, especially images. They've revolutionized:
- Image classification
- Object detection
- Face recognition
- Medical image analysis
- Self-driving cars
- And much more!

Let's dive in and understand how they work! 🚀

In [None]:
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.patches import Rectangle
from matplotlib.colors import ListedColormap
import warnings
warnings.filterwarnings('ignore')

# For deep learning
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import mnist

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Set style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print("Libraries imported successfully!")
print(f"TensorFlow version: {tf.__version__}")

## 2. Why CNNs? Problems with Regular Neural Networks <a id='why-cnns'></a>

Let's start by understanding why we need CNNs. Regular (fully connected) neural networks have three major problems when dealing with images:

### Problem 1: Too Many Parameters
- A small 6×6 image = 36 input nodes
- Each hidden layer node needs 36 weights
- A 100×100 image = 10,000 weights per node!
- **Doesn't scale well** ❌

### Problem 2: Not Shift-Invariant
- If you shift an image by 1 pixel, the network might fail
- **Poor generalization** ❌

### Problem 3: Ignores Spatial Relationships
- Pixels near each other are usually correlated
- Regular networks treat each pixel independently
- **Misses important patterns** ❌

### CNNs Solve These Problems! ✅
1. **Reduce parameters** through weight sharing
2. **Tolerate small shifts** through convolution
3. **Exploit correlations** by looking at pixel neighborhoods
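
To make the parameter argument concrete, here is a minimal sketch that compares the weights needed by one fully connected node with the weights in one shared 3×3 filter. The helper names `dense_weights_per_node` and `conv_filter_weights` are made up for this illustration and are not part of any library.

```python
def dense_weights_per_node(height, width):
    """Weights one fully connected hidden node needs: one per input pixel."""
    return height * width


def conv_filter_weights(kernel_size):
    """Weights in one shared square filter; independent of the image size."""
    return kernel_size * kernel_size


for side in (6, 100):
    dense = dense_weights_per_node(side, side)
    conv = conv_filter_weights(3)
    print(f"{side}x{side} image: {dense} weights per dense node, "
          f"{conv} weights per 3x3 filter")
```

The dense count grows with the image area, while the filter count stays at 9 however large the image is. That is the weight-sharing advantage listed above.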

In [None]:
# Let's create a simple example: Letter O and Letter X
def create_letter_o():
    """Create a 6x6 image of letter O"""
    letter_o = np.array([
        [0, 1, 1, 1, 1, 0],
        [1, 0, 0, 0, 0, 1],
        [1, 0, 0, 0, 0, 1],
        [1, 0, 0, 0, 0, 1],
        [1, 0, 0, 0, 0, 1],
        [0, 1, 1, 1, 1, 0]
    ])
    return letter_o

def create_letter_x():
    """Create a 6x6 image of letter X"""
    letter_x = np.array([
        [1, 0, 0, 0, 0, 1],
        [0, 1, 0, 0, 1, 0],
        [0, 0, 1, 1, 0, 0],
        [0, 0, 1, 1, 0, 0],
        [0, 1, 0, 0, 1, 0],
        [1, 0, 0, 0, 0, 1]
    ])
    return letter_x

# Create and visualize
letter_o = create_letter_o()
letter_x = create_letter_x()

fig, axes = plt.subplots(1, 2, figsize=(10, 5))

axes[0].imshow(letter_o, cmap='gray_r', interpolation='nearest')
axes[0].set_title('Letter O (6×6 pixels)', fontsize=14, fontweight='bold')
axes[0].grid(True, which='both', color='blue', linewidth=0.5)
axes[0].set_xticks(np.arange(-0.5, 6, 1))
axes[0].set_yticks(np.arange(-0.5, 6, 1))
axes[0].tick_params(labelbottom=False, labelleft=False)

axes[1].imshow(letter_x, cmap='gray_r', interpolation='nearest')
axes[1].set_title('Letter X (6×6 pixels)', fontsize=14, fontweight='bold')
axes[1].grid(True, which='both', color='blue', linewidth=0.5)
axes[1].set_xticks(np.arange(-0.5, 6, 1))
axes[1].set_yticks(np.arange(-0.5, 6, 1))
axes[1].tick_params(labelbottom=False, labelleft=False)

plt.tight_layout()
plt.show()

print("\nLetter O array:")
print(letter_o)
print("\nLetter X array:")
print(letter_x)

## 3. Core Concepts of CNNs <a id='core-concepts'></a>

CNNs have three main building blocks:

### 3.1 Convolution (Filtering)
- Apply a small **filter** (usually 3×3) to the image
- The filter slides across the image
- At each position, compute the **dot product**
- Creates a **feature map**

### 3.2 Activation Function (ReLU)
- Apply ReLU: `max(0, x)`
- Converts negative values to 0
- Keeps positive values unchanged
- Introduces non-linearity

### 3.3 Pooling
- **Max Pooling**: Select maximum value in each region
- **Average Pooling**: Calculate average value
- Reduces dimensionality
- Makes network more robust to small shifts
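
As a rough sketch of how these building blocks change the shape of the data, the snippet below applies the usual no-padding size formula `(input - window) // stride + 1` to the 6×6 example image. The helper name `output_size` is an assumption for illustration. ReLU is omitted because it leaves the shape unchanged.

```python
def output_size(input_size, window, stride=1):
    """Side length after sliding a square window over the input with no padding."""
    return (input_size - window) // stride + 1


conv_side = output_size(6, 3, stride=1)          # 3x3 filter over a 6x6 image
pool_side = output_size(conv_side, 2, stride=2)  # 2x2 max pooling with stride 2
flat_len = pool_side * pool_side                 # values handed to the dense layers

print(conv_side, pool_side, flat_len)  # prints: 4 2 4
```

So the 6×6 image becomes a 4×4 feature map, then a 2×2 pooled map, and finally 4 inputs for the fully connected part.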

Let's visualize each step!

## 4. Step-by-Step: How CNNs Work <a id='how-cnns-work'></a>

Let's walk through the complete process of classifying the letter O!

### Step 1: Convolution - Applying a Filter
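
Before running the full convolution, it may help to see one filter position worked by hand. This sketch assumes the 3×3 filter of ones with a zero center and the bias of -2 used in this tutorial, and applies them to the top-left corner of the letter O. The variable names are illustrative only.

```python
import numpy as np

# Top-left 3x3 corner of the 6x6 letter O used in this tutorial
region = np.array([
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
])

# Filter of ones with a zero center, plus the tutorial's bias of -2
kernel = np.array([
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
])
bias = -2

dot_product = int(np.sum(region * kernel))  # element-wise multiply, then sum
result = dot_product + bias
print(dot_product, result)  # prints: 4 2
```

The value 2 is what lands in the top-left cell of the feature map. Sliding the filter one pixel at a time fills in the remaining cells the same way.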

In [None]:
def convolve2d(image, kernel, bias=0, stride=1):
    """
    Perform 2D convolution on an image with a kernel.
    
    Parameters:
    -----------
    image : numpy array
        Input image
    kernel : numpy array
        Convolution filter/kernel
    bias : float
        Bias term to add
    stride : int
        Step size for sliding the kernel
    
    Returns:
    --------
    feature_map : numpy array
        Output feature map
    """
    # Get dimensions
    image_height, image_width = image.shape
    kernel_height, kernel_width = kernel.shape
    
    # Calculate output dimensions
    output_height = (image_height - kernel_height) // stride + 1
    output_width = (image_width - kernel_width) // stride + 1
    
    # Initialize feature map
    feature_map = np.zeros((output_height, output_width))
    
    # Perform convolution
    for i in range(output_height):
        for j in range(output_width):
            # Extract the region
            region = image[i*stride:i*stride+kernel_height, 
                          j*stride:j*stride+kernel_width]
            
            # Compute dot product (element-wise multiplication and sum)
            feature_map[i, j] = np.sum(region * kernel) + bias
    
    return feature_map

# Create a simple 3×3 filter that sums the eight neighbors of each pixel (center weight is 0)
filter_kernel = np.array([
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1]
])

bias = -2

# Apply convolution to letter O
feature_map_o = convolve2d(letter_o, filter_kernel, bias=bias)

print("Filter (3×3):")
print(filter_kernel)
print(f"\nBias: {bias}")
print(f"\nFeature Map shape: {feature_map_o.shape}")
print("\nFeature Map:")
print(feature_map_o)

In [None]:
# Visualize the convolution process
def visualize_convolution_step(image, kernel, position=(0, 0), bias=0):
    """
    Visualize a single step of convolution
    """
    fig, axes = plt.subplots(1, 4, figsize=(16, 4))
    
    # Original image
    axes[0].imshow(image, cmap='gray_r', interpolation='nearest')
    axes[0].set_title('Input Image', fontsize=12, fontweight='bold')
    axes[0].grid(True, which='both', color='blue', linewidth=0.5, alpha=0.3)
    
    # Highlight the region
    i, j = position
    rect = Rectangle((j-0.5, i-0.5), 3, 3, linewidth=3, 
                     edgecolor='red', facecolor='none')
    axes[0].add_patch(rect)
    
    # Show filter
    axes[1].imshow(kernel, cmap='RdBu_r', interpolation='nearest', vmin=-1, vmax=1)
    axes[1].set_title('Filter (Kernel)', fontsize=12, fontweight='bold')
    for (x, y), value in np.ndenumerate(kernel):
        axes[1].text(y, x, f'{value}', ha='center', va='center', 
                    fontsize=14, fontweight='bold')
    axes[1].grid(True, which='both', color='gray', linewidth=0.5)
    
    # Show region being processed
    region = image[i:i+3, j:j+3]
    axes[2].imshow(region, cmap='gray_r', interpolation='nearest')
    axes[2].set_title('Image Region', fontsize=12, fontweight='bold')
    for (x, y), value in np.ndenumerate(region):
        axes[2].text(y, x, f'{int(value)}', ha='center', va='center', 
                    fontsize=14, fontweight='bold', 
                    color='white' if value > 0.5 else 'black')
    axes[2].grid(True, which='both', color='blue', linewidth=0.5)
    
    # Calculate and show result
    dot_product = np.sum(region * kernel)
    result = dot_product + bias
    
    axes[3].text(0.5, 0.6, 'Calculation:', ha='center', va='center', 
                fontsize=14, fontweight='bold', transform=axes[3].transAxes)
    axes[3].text(0.5, 0.45, f'Dot Product = {dot_product:.0f}', ha='center', va='center',
                fontsize=12, transform=axes[3].transAxes)
    axes[3].text(0.5, 0.35, f'+ Bias ({bias}) = {result:.0f}', ha='center', va='center',
                fontsize=12, transform=axes[3].transAxes)
    axes[3].text(0.5, 0.2, f'Result: {result:.0f}', ha='center', va='center',
                fontsize=16, fontweight='bold', 
                bbox=dict(boxstyle='round', facecolor='lightgreen', alpha=0.8),
                transform=axes[3].transAxes)
    axes[3].axis('off')
    
    plt.tight_layout()
    return result

# Visualize convolution at position (0, 0)
print("Convolution Step at Position (0, 0):")
visualize_convolution_step(letter_o, filter_kernel, position=(0, 0), bias=bias)
plt.show()

# Visualize convolution at position (1, 1)
print("\nConvolution Step at Position (1, 1):")
visualize_convolution_step(letter_o, filter_kernel, position=(1, 1), bias=bias)
plt.show()

In [None]:
# Visualize the complete feature map
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Input
axes[0].imshow(letter_o, cmap='gray_r', interpolation='nearest')
axes[0].set_title('Input: Letter O\n(6×6)', fontsize=12, fontweight='bold')
axes[0].grid(True, which='both', color='blue', linewidth=0.5)

# Filter
axes[1].imshow(filter_kernel, cmap='RdBu_r', interpolation='nearest')
axes[1].set_title('Filter\n(3×3)', fontsize=12, fontweight='bold')
for (i, j), val in np.ndenumerate(filter_kernel):
    axes[1].text(j, i, f'{val}', ha='center', va='center', 
                fontsize=12, fontweight='bold')
axes[1].grid(True, which='both', color='gray', linewidth=0.5)

# Feature Map
im = axes[2].imshow(feature_map_o, cmap='RdYlGn', interpolation='nearest')
axes[2].set_title('Feature Map\n(4×4)', fontsize=12, fontweight='bold')
for (i, j), val in np.ndenumerate(feature_map_o):
    axes[2].text(j, i, f'{val:.0f}', ha='center', va='center', 
                fontsize=10, fontweight='bold')
axes[2].grid(True, which='both', color='gray', linewidth=0.5)
plt.colorbar(im, ax=axes[2])

plt.tight_layout()
plt.show()

print("\n✅ Convolution complete! Feature map created.")

### Step 2: Activation Function (ReLU)

In [None]:
def relu(x):
    """
    ReLU (Rectified Linear Unit) activation function
    Returns max(0, x) element-wise
    """
    return np.maximum(0, x)

# Apply ReLU to feature map
activated_map = relu(feature_map_o)

# Visualize ReLU activation
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Before ReLU
im1 = axes[0].imshow(feature_map_o, cmap='RdYlGn', interpolation='nearest')
axes[0].set_title('Before ReLU\n(Feature Map)', fontsize=12, fontweight='bold')
for (i, j), val in np.ndenumerate(feature_map_o):
    color = 'white' if val < 0 else 'black'
    axes[0].text(j, i, f'{val:.0f}', ha='center', va='center', 
                fontsize=10, fontweight='bold', color=color)
axes[0].grid(True, which='both', color='gray', linewidth=0.5)
plt.colorbar(im1, ax=axes[0])

# ReLU function visualization
x = np.linspace(-5, 5, 100)
y = relu(x)
axes[1].plot(x, y, linewidth=3, color='red')
axes[1].axhline(y=0, color='black', linestyle='--', alpha=0.3)
axes[1].axvline(x=0, color='black', linestyle='--', alpha=0.3)
axes[1].set_xlabel('Input', fontsize=11)
axes[1].set_ylabel('Output', fontsize=11)
axes[1].set_title('ReLU Function\nmax(0, x)', fontsize=12, fontweight='bold')
axes[1].grid(True, alpha=0.3)
axes[1].text(2, 3, 'Positive values\nstay the same', fontsize=10, 
            bbox=dict(boxstyle='round', facecolor='lightgreen', alpha=0.8))
axes[1].text(-3, 0.5, 'Negative values\nbecome 0', fontsize=10,
            bbox=dict(boxstyle='round', facecolor='lightcoral', alpha=0.8))

# After ReLU
im2 = axes[2].imshow(activated_map, cmap='YlGn', interpolation='nearest')
axes[2].set_title('After ReLU\n(Activated Map)', fontsize=12, fontweight='bold')
for (i, j), val in np.ndenumerate(activated_map):
    axes[2].text(j, i, f'{val:.0f}', ha='center', va='center', 
                fontsize=10, fontweight='bold')
axes[2].grid(True, which='both', color='gray', linewidth=0.5)
plt.colorbar(im2, ax=axes[2])

plt.tight_layout()
plt.show()

print("Feature Map (before ReLU):")
print(feature_map_o)
print("\nActivated Map (after ReLU):")
print(activated_map)
print("\n✅ ReLU activation applied! Negative values set to 0.")

### Step 3: Pooling (Max Pooling)
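
The claim that pooling makes the network more robust to small shifts can be illustrated with a short sketch. It uses a hypothetical helper, `max_pool_2x2`, that implements 2×2 max pooling with stride 2 through a reshape (a different implementation from the loop-based one in this tutorial), and it assumes even side lengths.

```python
import numpy as np


def max_pool_2x2(a):
    """2x2 max pooling with stride 2 via reshape; assumes even side lengths."""
    h, w = a.shape
    # Axes become (row block, row within block, column block, column within block)
    return a.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


a = np.zeros((4, 4))
a[0, 0] = 5  # strong response in the top-left corner

b = np.zeros((4, 4))
b[1, 1] = 5  # the same response shifted one pixel down and right

print(np.array_equal(max_pool_2x2(a), max_pool_2x2(b)))  # prints: True
```

The two pooled maps match only because the shifted value stays inside the same 2×2 window. A shift that crosses a window boundary would change the pooled output, so the robustness is partial.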

In [None]:
def max_pool2d(feature_map, pool_size=2, stride=2):
    """
    Perform 2D max pooling
    
    Parameters:
    -----------
    feature_map : numpy array
        Input feature map
    pool_size : int
        Size of pooling window
    stride : int
        Step size for sliding the pooling window
    
    Returns:
    --------
    pooled : numpy array
        Pooled output
    """
    height, width = feature_map.shape
    
    # Calculate output dimensions
    out_height = (height - pool_size) // stride + 1
    out_width = (width - pool_size) // stride + 1
    
    pooled = np.zeros((out_height, out_width))
    
    for i in range(out_height):
        for j in range(out_width):
            # Extract region
            region = feature_map[i*stride:i*stride+pool_size,
                                j*stride:j*stride+pool_size]
            # Take maximum
            pooled[i, j] = np.max(region)
    
    return pooled

# Apply max pooling
pooled_map = max_pool2d(activated_map, pool_size=2, stride=2)

# Visualize max pooling
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Before pooling
axes[0].imshow(activated_map, cmap='YlGn', interpolation='nearest')
axes[0].set_title('Before Pooling\n(4×4 Activated Map)', fontsize=12, fontweight='bold')
for (i, j), val in np.ndenumerate(activated_map):
    axes[0].text(j, i, f'{val:.0f}', ha='center', va='center', 
                fontsize=10, fontweight='bold')

# Draw pooling regions
for i in range(2):
    for j in range(2):
        rect = Rectangle((j*2-0.5, i*2-0.5), 2, 2, 
                        linewidth=3, edgecolor='red', facecolor='none')
        axes[0].add_patch(rect)
axes[0].grid(True, which='both', color='gray', linewidth=0.5)

# Pooling operation visualization
axes[1].text(0.5, 0.7, 'Max Pooling Operation', ha='center', va='center',
            fontsize=14, fontweight='bold', transform=axes[1].transAxes)
axes[1].text(0.5, 0.55, 'Pool Size: 2×2', ha='center', va='center',
            fontsize=12, transform=axes[1].transAxes)
axes[1].text(0.5, 0.45, 'Stride: 2', ha='center', va='center',
            fontsize=12, transform=axes[1].transAxes)
axes[1].text(0.5, 0.3, 'Selects maximum value\nfrom each 2×2 region', 
            ha='center', va='center', fontsize=11,
            bbox=dict(boxstyle='round', facecolor='lightyellow', alpha=0.8),
            transform=axes[1].transAxes)
axes[1].axis('off')

# After pooling
im = axes[2].imshow(pooled_map, cmap='YlGn', interpolation='nearest')
axes[2].set_title('After Pooling\n(2×2 Pooled Map)', fontsize=12, fontweight='bold')
for (i, j), val in np.ndenumerate(pooled_map):
    axes[2].text(j, i, f'{val:.0f}', ha='center', va='center', 
                fontsize=12, fontweight='bold')
axes[2].grid(True, which='both', color='gray', linewidth=0.5)
plt.colorbar(im, ax=axes[2])

plt.tight_layout()
plt.show()

print("Activated Map (4×4):")
print(activated_map)
print("\nPooled Map (2×2):")
print(pooled_map)
print("\n✅ Max pooling complete! Dimensionality reduced.")

### Step 4: Flatten and Feed to Neural Network

In [None]:
# Flatten the pooled map
flattened_input = pooled_map.flatten()

print("Pooled Map (2D):")
print(pooled_map)
print(f"\nFlattened Input (1D): {flattened_input}")
print(f"Number of inputs to neural network: {len(flattened_input)}")

# Visualize the flattening process
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# 2D pooled map
axes[0].imshow(pooled_map, cmap='YlGn', interpolation='nearest')
axes[0].set_title('Pooled Map (2×2)', fontsize=12, fontweight='bold')
for (i, j), val in np.ndenumerate(pooled_map):
    axes[0].text(j, i, f'{val:.0f}', ha='center', va='center', 
                fontsize=12, fontweight='bold')
axes[0].grid(True, which='both', color='gray', linewidth=0.5)

# 1D flattened
axes[1].barh(range(len(flattened_input)), flattened_input, color='green', alpha=0.7)
axes[1].set_yticks(range(len(flattened_input)))
axes[1].set_yticklabels([f'Input {i+1}' for i in range(len(flattened_input))])
axes[1].set_xlabel('Value', fontsize=11)
axes[1].set_title('Flattened Input Vector', fontsize=12, fontweight='bold')
axes[1].grid(True, alpha=0.3, axis='x')
for i, v in enumerate(flattened_input):
    axes[1].text(v + 0.1, i, f'{v:.0f}', va='center', fontweight='bold')

plt.tight_layout()
plt.show()

print("\n✅ Flattening complete! Ready for neural network.")

### Complete CNN Pipeline Visualization

In [None]:
def visualize_cnn_pipeline(image, filter_kernel, bias=-2, title="CNN Pipeline"):
    """
    Visualize the complete CNN pipeline
    """
    # Step 1: Convolution
    feature_map = convolve2d(image, filter_kernel, bias=bias)
    
    # Step 2: ReLU
    activated = relu(feature_map)
    
    # Step 3: Max Pooling
    pooled = max_pool2d(activated, pool_size=2, stride=2)
    
    # Step 4: Flatten
    flattened = pooled.flatten()
    
    # Create visualization
    fig = plt.figure(figsize=(18, 4))
    gs = fig.add_gridspec(1, 6, width_ratios=[1, 1, 1, 1, 1, 1.2])
    
    # Input
    ax1 = fig.add_subplot(gs[0])
    ax1.imshow(image, cmap='gray_r', interpolation='nearest')
    ax1.set_title('1. Input\n(6×6)', fontsize=10, fontweight='bold')
    ax1.axis('off')
    
    # Filter
    ax2 = fig.add_subplot(gs[1])
    ax2.imshow(filter_kernel, cmap='RdBu_r', interpolation='nearest')
    ax2.set_title('2. Filter\n(3×3)', fontsize=10, fontweight='bold')
    ax2.axis('off')
    
    # Feature Map
    ax3 = fig.add_subplot(gs[2])
    ax3.imshow(feature_map, cmap='RdYlGn', interpolation='nearest')
    ax3.set_title('3. Convolution\n(4×4)', fontsize=10, fontweight='bold')
    ax3.axis('off')
    
    # Activated
    ax4 = fig.add_subplot(gs[3])
    ax4.imshow(activated, cmap='YlGn', interpolation='nearest')
    ax4.set_title('4. ReLU\n(4×4)', fontsize=10, fontweight='bold')
    ax4.axis('off')
    
    # Pooled
    ax5 = fig.add_subplot(gs[4])
    ax5.imshow(pooled, cmap='YlGn', interpolation='nearest')
    ax5.set_title('5. Max Pool\n(2×2)', fontsize=10, fontweight='bold')
    for (i, j), val in np.ndenumerate(pooled):
        ax5.text(j, i, f'{val:.0f}', ha='center', va='center', 
                fontsize=10, fontweight='bold')
    ax5.axis('off')
    
    # Flattened
    ax6 = fig.add_subplot(gs[5])
    ax6.barh(range(len(flattened)), flattened, color='green', alpha=0.7)
    ax6.set_yticks(range(len(flattened)))
    ax6.set_yticklabels([f'{i+1}' for i in range(len(flattened))], fontsize=8)
    ax6.set_title('6. Flatten\n(4 inputs)', fontsize=10, fontweight='bold')
    ax6.set_xlabel('Value', fontsize=9)
    for i, v in enumerate(flattened):
        ax6.text(v + 0.1, i, f'{v:.0f}', va='center', fontsize=8, fontweight='bold')
    
    plt.suptitle(title, fontsize=14, fontweight='bold', y=1.05)
    plt.tight_layout()
    
    return flattened

# Visualize for Letter O
print("Complete CNN Pipeline for Letter O:")
output_o = visualize_cnn_pipeline(letter_o, filter_kernel, bias=-2, 
                                  title="CNN Pipeline: Letter O → Neural Network Input")
plt.show()

# Visualize for Letter X
print("\nComplete CNN Pipeline for Letter X:")
output_x = visualize_cnn_pipeline(letter_x, filter_kernel, bias=-2,
                                  title="CNN Pipeline: Letter X → Neural Network Input")
plt.show()

print("\n✅ Complete CNN pipeline visualized!")

## 5. Building CNNs from Scratch <a id='from-scratch'></a>

Now let's build a complete CNN from scratch to classify O's and X's!
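
The forward pass in this section ends with a softmax that subtracts the largest score before exponentiating. The sketch below, using a hypothetical helper named `stable_softmax`, illustrates why: the shift leaves the probabilities unchanged but keeps `np.exp` from overflowing on large scores.

```python
import numpy as np


def stable_softmax(scores):
    """Softmax that subtracts the maximum score first to avoid overflow."""
    shifted = scores - np.max(scores)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)


large = np.array([1000.0, 1001.0])  # np.exp(1000.0) alone would overflow to inf
small = np.array([0.0, 1.0])        # same score differences, smaller magnitude

p_large = stable_softmax(large)
p_small = stable_softmax(small)
print(np.allclose(p_large, p_small))  # prints: True
```

Only the differences between scores matter to softmax, so subtracting the maximum is a free numerical safeguard.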

In [None]:
class SimpleCNN:
    """
    A simple CNN from scratch for binary classification (O vs X)
    """
    
    def __init__(self, input_shape=(6, 6), filter_size=3, num_hidden=4):
        """
        Initialize CNN with random weights
        """
        # CNN parameters
        self.filter = np.random.randn(filter_size, filter_size) * 0.1
        self.bias_conv = np.random.randn() * 0.1
        
        # Calculate sizes after conv and pooling
        conv_size = input_shape[0] - filter_size + 1
        pool_size = conv_size // 2
        self.flattened_size = pool_size * pool_size
        
        # Neural network parameters
        self.w_hidden = np.random.randn(self.flattened_size, num_hidden) * 0.1
        self.b_hidden = np.random.randn(num_hidden) * 0.1
        
        self.w_output = np.random.randn(num_hidden, 2) * 0.1  # 2 outputs: O and X
        self.b_output = np.random.randn(2) * 0.1
        
    def forward(self, image):
        """
        Forward pass through the network
        """
        # 1. Convolution
        self.feature_map = convolve2d(image, self.filter, bias=self.bias_conv)
        
        # 2. ReLU activation
        self.activated = relu(self.feature_map)
        
        # 3. Max pooling
        self.pooled = max_pool2d(self.activated, pool_size=2, stride=2)
        
        # 4. Flatten
        self.flattened = self.pooled.flatten()
        
        # 5. Hidden layer
        self.hidden_input = np.dot(self.flattened, self.w_hidden) + self.b_hidden
        self.hidden_output = relu(self.hidden_input)
        
        # 6. Output layer
        self.output_input = np.dot(self.hidden_output, self.w_output) + self.b_output
        
        # 7. Softmax for probabilities
        exp_scores = np.exp(self.output_input - np.max(self.output_input))
        self.output = exp_scores / np.sum(exp_scores)
        
        return self.output
    
    def predict(self, image):
        """
        Predict class for an image
        """
        output = self.forward(image)
        return np.argmax(output), output

# Create and test the CNN
cnn = SimpleCNN()

print("CNN Architecture:")
print(f"  Input: 6×6 image")
print(f"  Filter: 3×3 ({cnn.filter.size} parameters)")
print(f"  After convolution: 4×4 feature map")
print(f"  After ReLU: 4×4 activated map")
print(f"  After pooling: 2×2 pooled map")
print(f"  Flattened: {cnn.flattened_size} inputs")
print(f"  Hidden layer: 4 neurons")
print(f"  Output: 2 classes (O, X)")

# Test predictions (before training)
print("\n--- Predictions Before Training ---")
pred_o, prob_o = cnn.predict(letter_o)
print(f"Letter O: Predicted class = {pred_o} ({'O' if pred_o == 0 else 'X'})")
print(f"  Probabilities: O={prob_o[0]:.3f}, X={prob_o[1]:.3f}")

pred_x, prob_x = cnn.predict(letter_x)
print(f"Letter X: Predicted class = {pred_x} ({'O' if pred_x == 0 else 'X'})")
print(f"  Probabilities: O={prob_x[0]:.3f}, X={prob_x[1]:.3f}")

print("\n✅ CNN created and tested (untrained).")

### Training the CNN from Scratch

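Now let's implement backpropagation to train our CNN! To keep the code short, this simplified version updates only the hidden and output layer weights; the convolution filter and its bias keep their random initial values.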
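
The training step in this section relies on a standard result: for softmax followed by cross-entropy loss, the gradient with respect to the pre-softmax scores is the predicted probabilities minus the one-hot target. The sketch below checks that result against a finite-difference estimate. The scores in `z` are hypothetical, and the helper names are illustrative.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()


def cross_entropy(z, target):
    """Loss for one example, as a function of the pre-softmax scores z."""
    return -np.log(softmax(z)[target])


z = np.array([0.3, -1.2])  # hypothetical output scores for classes O and X
target = 0                 # true class: O

# Analytic gradient: probabilities minus the one-hot target
analytic = softmax(z).copy()
analytic[target] -= 1

# Numerical gradient via central differences
eps = 1e-6
numeric = np.zeros_like(z)
for k in range(len(z)):
    step = np.zeros_like(z)
    step[k] = eps
    numeric[k] = (cross_entropy(z + step, target)
                  - cross_entropy(z - step, target)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))  # prints: True
```

A check like this is a cheap way to catch sign or indexing mistakes before trusting a hand-written backward pass.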

In [None]:
class TrainableCNN(SimpleCNN):
    """
    Trainable CNN with backpropagation
    """
    
    def __init__(self, input_shape=(6, 6), filter_size=3, num_hidden=4, learning_rate=0.01):
        super().__init__(input_shape, filter_size, num_hidden)
        self.lr = learning_rate
        self.loss_history = []
        
    def train_step(self, image, target):
        """
        One training step with backpropagation
        """
        # Forward pass
        output = self.forward(image)
        
        # Calculate loss (cross-entropy)
        loss = -np.log(output[target] + 1e-8)
        
        # Backward pass (simplified - only updating output and hidden weights)
        # Gradient of loss w.r.t output
        d_output = output.copy()
        d_output[target] -= 1
        
        # Gradients for output layer
        d_w_output = np.outer(self.hidden_output, d_output)
        d_b_output = d_output
        
        # Gradients for hidden layer
        d_hidden = np.dot(d_output, self.w_output.T)
        d_hidden[self.hidden_input <= 0] = 0  # ReLU gradient
        
        d_w_hidden = np.outer(self.flattened, d_hidden)
        d_b_hidden = d_hidden
        
        # Update weights
        self.w_output -= self.lr * d_w_output
        self.b_output -= self.lr * d_b_output
        self.w_hidden -= self.lr * d_w_hidden
        self.b_hidden -= self.lr * d_b_hidden
        
        return loss
    
    def train(self, images, targets, epochs=1000, verbose=True):
        """
        Train the CNN
        """
        for epoch in range(epochs):
            total_loss = 0
            for image, target in zip(images, targets):
                loss = self.train_step(image, target)
                total_loss += loss
            
            avg_loss = total_loss / len(images)
            self.loss_history.append(avg_loss)
            
            if verbose and (epoch + 1) % 100 == 0:
                print(f"Epoch {epoch+1}/{epochs}, Loss: {avg_loss:.4f}")

# Create training data (with some variations)
def create_shifted_o(shift_right=0, shift_down=0):
    """Create a shifted version of O"""
    img = np.zeros((6, 6))
    base_o = create_letter_o()
    # Simple shift: output pixel (i, j) comes from source pixel
    # (i - shift_down, j - shift_right); pixels shifted in from outside stay 0
    for i in range(6):
        for j in range(6):
            src_i, src_j = i - shift_down, j - shift_right
            if 0 <= src_i < 6 and 0 <= src_j < 6:
                img[i, j] = base_o[src_i, src_j]
    return img

def create_shifted_x(shift_right=0, shift_down=0):
    """Create a shifted version of X"""
    img = np.zeros((6, 6))
    base_x = create_letter_x()
    # Output pixel (i, j) comes from source pixel (i - shift_down, j - shift_right)
    for i in range(6):
        for j in range(6):
            src_i, src_j = i - shift_down, j - shift_right
            if 0 <= src_i < 6 and 0 <= src_j < 6:
                img[i, j] = base_x[src_i, src_j]
    return img

# Create training set
train_images = [
    create_letter_o(),
    create_shifted_o(shift_right=1),
    create_letter_x(),
    create_shifted_x(shift_right=1),
]

train_targets = [0, 0, 1, 1]  # 0 for O, 1 for X

# Train the CNN
print("Training CNN from scratch...\n")
cnn_trained = TrainableCNN(learning_rate=0.1)
cnn_trained.train(train_images, train_targets, epochs=500, verbose=True)

print("\n✅ Training complete!")

In [None]:
# Visualize training progress
plt.figure(figsize=(10, 5))
plt.plot(cnn_trained.loss_history, linewidth=2)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Loss', fontsize=12)
plt.title('Training Loss Over Time', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Test the trained CNN
print("\n--- Predictions After Training ---")
test_images = [
    (create_letter_o(), "Letter O"),
    (create_letter_x(), "Letter X"),
    (create_shifted_x(shift_right=1), "Letter X (shifted)"),
]

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

for idx, (image, label) in enumerate(test_images):
    pred, prob = cnn_trained.predict(image)
    
    axes[idx].imshow(image, cmap='gray_r', interpolation='nearest')
    axes[idx].set_title(f"{label}\nPredicted: {'O' if pred == 0 else 'X'}\nProb: O={prob[0]:.3f}, X={prob[1]:.3f}",
                       fontsize=11, fontweight='bold')
    axes[idx].grid(True, which='both', color='blue', linewidth=0.5, alpha=0.3)
    axes[idx].tick_params(labelbottom=False, labelleft=False)
    
    print(f"{label}: Predicted class = {pred} ({'O' if pred == 0 else 'X'})")
    print(f"  Probabilities: O={prob[0]:.3f}, X={prob[1]:.3f}\n")

plt.tight_layout()
plt.show()

print("✅ CNN successfully trained and tested from scratch!")

## 6. Real-World Example: MNIST Digit Classification <a id='mnist-example'></a>

Now let's apply CNNs to a real-world problem: recognizing handwritten digits!

In [None]:
# Load MNIST dataset
print("Loading MNIST dataset...")
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalize pixel values to [0, 1]
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Reshape for CNN (add channel dimension)
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)

print(f"Training set: {x_train.shape}")
print(f"Test set: {x_test.shape}")
print(f"Number of classes: {len(np.unique(y_train))}")

# Visualize some samples
fig, axes = plt.subplots(2, 5, figsize=(15, 6))
for i, ax in enumerate(axes.flat):
    ax.imshow(x_train[i].squeeze(), cmap='gray')
    ax.set_title(f'Label: {y_train[i]}', fontsize=12, fontweight='bold')
    ax.axis('off')

plt.suptitle('Sample MNIST Digits', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("\n✅ MNIST dataset loaded and visualized!")

## 7. Building CNNs with Keras/TensorFlow <a id='with-keras'></a>

Now let's build a proper CNN using Keras!
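
The architecture defined in this section (three 3×3 convolutions with 32, 64, and 64 filters, two 2×2 max-pooling layers, a 64-unit dense layer, and a 10-way output) can be sanity-checked by hand before any training. The sketch below is plain Python arithmetic. It assumes no padding, stride 1 for the convolutions, and a pooling stride equal to the pool size. The helper names are illustrative.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per output channel for a square-kernel conv layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(n_in, n_out):
    """Weights plus one bias per output unit for a dense layer."""
    return n_in * n_out + n_out


def valid_out(size, window, stride=1):
    """Side length after sliding a window with no padding."""
    return (size - window) // stride + 1


side = 28
side = valid_out(side, 3)      # conv1 -> 26
side = valid_out(side, 2, 2)   # pool1 -> 13
side = valid_out(side, 3)      # conv2 -> 11
side = valid_out(side, 2, 2)   # pool2 -> 5
side = valid_out(side, 3)      # conv3 -> 3
flat = side * side * 64        # 3 * 3 * 64 values after flattening

total = (conv_params(3, 1, 32)
         + conv_params(3, 32, 64)
         + conv_params(3, 64, 64)
         + dense_params(flat, 64)
         + dense_params(64, 10))

print(flat, total)  # prints: 576 93322
```

These numbers can be compared with the output of `model.summary()`; a mismatch usually points to a different padding or stride setting than assumed here.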

In [None]:
def create_cnn_model():
    """
    Create a CNN model for MNIST classification
    """
    model = keras.Sequential([
        # First convolutional block
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1), name='conv1'),
        layers.MaxPooling2D((2, 2), name='pool1'),
        
        # Second convolutional block
        layers.Conv2D(64, (3, 3), activation='relu', name='conv2'),
        layers.MaxPooling2D((2, 2), name='pool2'),
        
        # Third convolutional block
        layers.Conv2D(64, (3, 3), activation='relu', name='conv3'),
        
        # Flatten and dense layers
        layers.Flatten(name='flatten'),
        layers.Dense(64, activation='relu', name='dense1'),
        layers.Dropout(0.5, name='dropout'),
        layers.Dense(10, activation='softmax', name='output')
    ])
    
    return model

# Create the model
model = create_cnn_model()

# Display model architecture
print("CNN Model Architecture:")
print("="*70)
model.summary()
print("="*70)

# Compile the model
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

print("\n✅ CNN model created and compiled!")

In [None]:
# Train the model
print("Training CNN on MNIST...\n")

history = model.fit(
    x_train, y_train,
    epochs=5,
    batch_size=128,
    validation_split=0.1,
    verbose=1
)

print("\n✅ Training complete!")

In [None]:
# Visualize training history
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

# Loss
axes[0].plot(history.history['loss'], label='Training Loss', linewidth=2)
axes[0].plot(history.history['val_loss'], label='Validation Loss', linewidth=2)
axes[0].set_xlabel('Epoch', fontsize=12)
axes[0].set_ylabel('Loss', fontsize=12)
axes[0].set_title('Model Loss', fontsize=14, fontweight='bold')
axes[0].legend(fontsize=11)
axes[0].grid(True, alpha=0.3)

# Accuracy
axes[1].plot(history.history['accuracy'], label='Training Accuracy', linewidth=2)
axes[1].plot(history.history['val_accuracy'], label='Validation Accuracy', linewidth=2)
axes[1].set_xlabel('Epoch', fontsize=12)
axes[1].set_ylabel('Accuracy', fontsize=12)
axes[1].set_title('Model Accuracy', fontsize=14, fontweight='bold')
axes[1].legend(fontsize=11)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Evaluate on test set
test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
print(f"\nTest Accuracy: {test_accuracy*100:.2f}%")
print(f"Test Loss: {test_loss:.4f}")

In [None]:
# Make predictions on test samples
predictions = model.predict(x_test[:20], verbose=0)
predicted_classes = np.argmax(predictions, axis=1)

# Visualize predictions
fig, axes = plt.subplots(4, 5, figsize=(15, 12))
for i, ax in enumerate(axes.flat):
    ax.imshow(x_test[i].squeeze(), cmap='gray')
    
    true_label = y_test[i]
    pred_label = predicted_classes[i]
    confidence = predictions[i][pred_label] * 100
    
    color = 'green' if true_label == pred_label else 'red'
    ax.set_title(f'True: {true_label}, Pred: {pred_label}\nConf: {confidence:.1f}%',
                fontsize=10, fontweight='bold', color=color)
    ax.axis('off')

plt.suptitle('CNN Predictions on Test Set', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("✅ Predictions visualized!")
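
The "confidence" shown in each title is simply the largest value in that image's softmax output, and the predicted class is its index (the argmax). As a rough illustration of what the output layer does, here is a pure-Python sketch with made-up scores for three classes; the function and variable names are chosen for this example only.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three classes
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)

# argmax: index of the largest probability
pred_label = max(range(len(probs)), key=probs.__getitem__)
confidence = probs[pred_label] * 100

print(f"predicted class: {pred_label}, confidence: {confidence:.1f}%")
```

Keep in mind that a softmax probability is the model's own score, so a high value does not guarantee the prediction is correct.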

## 8. Visualizing CNN Features <a id='visualization'></a>

Let's peek inside the CNN to see what it's learning!

In [None]:
# Create a model that outputs intermediate layer activations
layer_outputs = [layer.output for layer in model.layers[:5]]  # conv1 through conv3
activation_model = keras.Model(inputs=model.input, outputs=layer_outputs)

# Get activations for a sample image
sample_image = x_test[0:1]  # First test image
activations = activation_model.predict(sample_image, verbose=0)

# Visualize the original image
plt.figure(figsize=(6, 6))
plt.imshow(sample_image.squeeze(), cmap='gray')
plt.title(f'Input Image (Label: {y_test[0]})', fontsize=14, fontweight='bold')
plt.axis('off')
plt.show()

print(f"Visualizing features for digit: {y_test[0]}\n")

In [None]:
# Visualize the feature maps (activations) produced by each convolutional layer
layer_names = ['conv1', 'pool1', 'conv2', 'pool2', 'conv3']

for layer_name, activation in zip(layer_names, activations):
    if 'conv' in layer_name:
        n_features = min(16, activation.shape[-1])  # Show up to 16 feature maps
        size = activation.shape[1]
        
        fig, axes = plt.subplots(4, 4, figsize=(12, 12))
        
        for i, ax in enumerate(axes.flat):
            if i < n_features:
                ax.imshow(activation[0, :, :, i], cmap='viridis')
                ax.set_title(f'Feature map {i+1}', fontsize=10, fontweight='bold')
            ax.axis('off')
        
        plt.suptitle(f'{layer_name.upper()} Feature Maps ({size}×{size})',
                    fontsize=14, fontweight='bold')
        plt.tight_layout()
        plt.show()
        
print("\n✅ Feature maps visualized!")
print("\nObservations:")
print("- Early layers detect edges and simple patterns")
print("- Deeper layers detect more complex features")
print("- Each filter specializes in different aspects of the image")

In [None]:
# Visualize learned filters from the first convolutional layer
first_layer = model.layers[0]
filters, biases = first_layer.get_weights()

# Normalize filters for visualization
f_min, f_max = filters.min(), filters.max()
filters_normalized = (filters - f_min) / (f_max - f_min)

# Plot the first 16 filters
fig, axes = plt.subplots(4, 4, figsize=(10, 10))

for i, ax in enumerate(axes.flat):
    if i < 16:
        # Get the filter
        filt = filters_normalized[:, :, 0, i]
        ax.imshow(filt, cmap='gray')
        ax.set_title(f'Filter {i+1}', fontsize=10, fontweight='bold')
    ax.axis('off')

plt.suptitle('Learned Filters in First Convolutional Layer (3×3)',
            fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("\n✅ Learned filters visualized!")
print("These filters are automatically learned during training to detect useful features.")
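
Learned 3×3 filters can be hard to interpret as pictures, so it can help to see what a hand-written one does. The sketch below slides a vertical-edge kernel over a tiny toy image in pure Python, using the same sliding-window sum of products that a `Conv2D` layer computes (no padding, stride 1, no kernel flip). The kernel, the image, and the helper `correlate2d_valid` are illustrative assumptions, not filters taken from the trained model.

```python
def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out

# 5x5 toy image: dark on the left (0), bright on the right (1)
image = [[0, 0, 0, 1, 1] for _ in range(5)]

# Hand-written vertical-edge detector (hypothetical, not a learned filter)
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = correlate2d_valid(image, kernel)
for row in feature_map:
    print(row)
```

The output is zero where the window sees a flat region and positive where it straddles the dark-to-bright edge, which is the kind of response the first-layer feature maps tend to show.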

## 9. Summary and Best Practices <a id='summary'></a>

### Key Takeaways 🎯

**Why CNNs?**
1. **Reduce parameters** through weight sharing (same filter applied everywhere)
2. **Handle shifts** through convolution and pooling
3. **Exploit spatial relationships** by looking at local neighborhoods
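
To make the weight-sharing point concrete, here is a small sketch that counts the parameters (weights plus biases) of a dense layer and of convolutional layers with the filter counts used in this tutorial. The two helper functions are hypothetical names introduced for this example.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def conv2d_params(kernel_size, in_channels, n_filters):
    """Weights plus biases of a 2D convolution with square kernels."""
    return kernel_size * kernel_size * in_channels * n_filters + n_filters

# Fully connected: 28x28 grayscale image into 128 hidden nodes
print("dense 784 -> 128:  ", dense_params(28 * 28, 128))

# Convolutional layers with the filter counts used in this tutorial
print("conv 3x3, 1 -> 32: ", conv2d_params(3, 1, 32))
print("conv 3x3, 32 -> 64:", conv2d_params(3, 32, 64))
print("conv 3x3, 64 -> 64:", conv2d_params(3, 64, 64))
```

Notice that the convolutional counts do not depend on the image size at all: the same small filter is reused at every position, whereas the dense layer needs a separate weight for every pixel.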

**CNN Architecture**
1. **Convolution**: Apply filters to detect features
2. **Activation (ReLU)**: Introduce non-linearity
3. **Pooling**: Reduce dimensionality and improve robustness
4. **Flatten**: Convert to 1D for dense layers
5. **Dense layers**: Final classification
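
One way to see how these stages fit together is to trace the image size through the stack. The sketch below assumes a 28×28 input, 3×3 convolutions without padding (the Keras default, `'valid'`), and 2×2 max pooling; the pooling size is an assumption, so adjust it if your model differs. `conv_out` and `pool_out` are hypothetical helpers.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, pool):
    """Output width/height of non-overlapping pooling (stride equals pool size)."""
    return size // pool

# Assumed stack: 3x3 'valid' convolutions and 2x2 max pooling on a 28x28 input
size = 28
trace = [("input", size)]
for name, step in [("conv1", lambda s: conv_out(s, 3)),
                   ("pool1", lambda s: pool_out(s, 2)),
                   ("conv2", lambda s: conv_out(s, 3)),
                   ("pool2", lambda s: pool_out(s, 2)),
                   ("conv3", lambda s: conv_out(s, 3))]:
    size = step(size)
    trace.append((name, size))

for name, s in trace:
    print(f"{name:>6}: {s}x{s}")

flattened = size * size * 64  # 64 channels after the last convolution
print("flatten:", flattened, "values")
```

Under these assumptions the spatial size shrinks from 28 to 3 while the channel count grows, and the flatten step hands 576 values to the dense layers.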

**Best Practices**
- Start with small filters (3×3 is common)
- Use multiple convolutional layers to learn hierarchical features
- Apply pooling to reduce computational cost
- Use data augmentation for better generalization
- Monitor both training and validation metrics
- Use dropout to prevent overfitting
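
As an example of what "monitoring both metrics" can look like in practice, the sketch below scans a Keras-style history dictionary for the epoch with the lowest validation loss and flags when it has stopped improving. The numbers are made up and are not results from this tutorial, and `best_epoch` is a hypothetical helper.

```python
def best_epoch(history, patience=2):
    """Return (best_epoch_index, should_stop) from a Keras-style history dict.

    best_epoch_index: 0-based epoch with the lowest validation loss
    should_stop: True if validation loss has not improved for `patience` epochs
    """
    val_loss = history["val_loss"]
    best = min(range(len(val_loss)), key=val_loss.__getitem__)
    epochs_since_best = len(val_loss) - 1 - best
    return best, epochs_since_best >= patience

# Made-up numbers shaped like `history.history`; not results from this tutorial
fake_history = {
    "loss":     [0.60, 0.30, 0.20, 0.12, 0.08],
    "val_loss": [0.40, 0.25, 0.22, 0.24, 0.27],
}

best, stop = best_epoch(fake_history, patience=2)
print(f"lowest validation loss at epoch {best + 1}; stop training: {stop}")
```

Training loss that keeps falling while validation loss rises is the classic sign of overfitting. In a real project, the built-in `keras.callbacks.EarlyStopping` callback can apply this kind of rule automatically during `fit`.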

**Common Applications**
- Image classification
- Object detection
- Face recognition
- Medical image analysis
- Self-driving cars
- Video analysis

In [None]:
# Final comparison: Regular NN vs CNN
print("="*70)
print("COMPARISON: Regular Neural Network vs CNN")
print("="*70)

print("\nRegular Neural Network for 28×28 images:")
print("  - Input: 784 nodes (28×28 flattened)")
print("  - Hidden layer with 128 nodes: 784 × 128 = 100,352 weights")
print("  - Total: ~100,000+ parameters")
print("  - Problems: Too many parameters, no spatial awareness, not shift-invariant")

print("\nConvolutional Neural Network:")
print("  - Conv1: 32 filters (3×3×1) = 32 × 9 = 288 weights")
print("  - Conv2: 64 filters (3×3×32) = 64 × 288 = 18,432 weights")
print("  - Conv3: 64 filters (3×3×64) = 64 × 576 = 36,864 weights")
print(f"  - Total: {model.count_params():,} parameters")
print("  - Benefits: Fewer parameters, spatial awareness, robust to small shifts!")

print("\n" + "="*70)
print("✅ CNNs are much more efficient and effective for images!")
print("="*70)

## Conclusion

Congratulations! 🎉 You've completed a comprehensive tutorial on Convolutional Neural Networks!

You've learned:
- ✅ Why CNNs are better than regular neural networks for images
- ✅ The three core operations: Convolution, Activation, and Pooling
- ✅ How to build CNNs from scratch with NumPy
- ✅ How to build CNNs with Keras/TensorFlow
- ✅ How to visualize what CNNs learn
- ✅ How to apply CNNs to real-world problems (MNIST)

**Next Steps:**
- Experiment with different architectures
- Try transfer learning with pre-trained models (VGG, ResNet, etc.)
- Apply CNNs to your own image datasets
- Explore advanced topics: batch normalization, residual connections, attention mechanisms

**BAM!** 💥 You're now ready to build powerful image classification systems!