# Notebook 2: 2D Convolution on Images

**Course:** 21CSE558T - Deep Neural Network Architectures  
**Module 4:** CNNs - Practical Session  
**Date:** Saturday, November 1, 2025  
**Duration:** 30 minutes  
**Objective:** Understand 2D convolution with visual demonstrations on real images

---

## From 1D to 2D

**1D Convolution:**  
```
Signal: [1, 2, 3, 4, 5]
Kernel: [a, b, c]
```

**2D Convolution:**  
```
Image:  [[1, 2, 3],      Kernel:  [[a, b],
         [4, 5, 6],                [c, d]]
         [7, 8, 9]]
```

**Same idea:** Slide a window, multiply & sum!
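
To make "slide a window, multiply & sum" concrete, here is a minimal sketch that works through the 3×3 image above by hand. The kernel values `1, 2, 3, 4` are illustrative stand-ins for `a, b, c, d` (an assumption, not values used elsewhere in this notebook).

```python
import numpy as np

# The 3x3 image from the diagram above
image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

# Illustrative values for a, b, c, d (assumed for this example only)
kernel = np.array([[1, 2],
                   [3, 4]])

# A 2x2 window fits in 2x2 positions on a 3x3 image
out = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        window = image[i:i+2, j:j+2]          # current 2x2 patch
        out[i, j] = np.sum(window * kernel)   # multiply element-wise, then sum

print(out)
# [[37. 47.]
#  [67. 77.]]
```

For the top-left position: 1·1 + 2·2 + 4·3 + 5·4 = 37.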

In [None]:
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from tensorflow.keras.datasets import fashion_mnist
import cv2
from scipy import signal
import warnings
warnings.filterwarnings('ignore')

# Set style
sns.set_style("white")
plt.rcParams['figure.figsize'] = (14, 6)

print("✅ Libraries loaded successfully!")

---

## Part 1: Load Fashion-MNIST Dataset

**Fashion-MNIST:**
- 70,000 grayscale images (60,000 training + 10,000 test)
- 28×28 pixels
- 10 classes: T-shirt, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, Ankle boot

In [None]:
# Load Fashion-MNIST
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

# Class names
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

print(f"Training samples: {x_train.shape[0]}")
print(f"Test samples: {x_test.shape[0]}")
print(f"Image shape: {x_train.shape[1:]}")

# Visualize samples
fig, axes = plt.subplots(2, 5, figsize=(15, 6))
for i, ax in enumerate(axes.flat):
    ax.imshow(x_train[i], cmap='gray')
    ax.set_title(f'{class_names[y_train[i]]}', fontsize=11)
    ax.axis('off')
plt.suptitle('Fashion-MNIST Sample Images', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

---

## Part 2: 2D Convolution from Scratch

In [None]:
def conv2d_simple(image, kernel, stride=1, padding=0):
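    """
    2D convolution from scratch (sliding window).

    The kernel is applied without flipping, i.e. cross-correlation,
    which is the convention CNN layers use.

    Args:
        image: Input 2D array (H x W)
        kernel: 2D filter (Kh x Kw)
        stride: Step size in pixels (same for both axes)
        padding: Number of zero pixels added on every side

    Returns:
        output: 2D array of shape (out_h, out_w)
    """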
    # Add padding
    if padding > 0:
        image = np.pad(image, ((padding, padding), (padding, padding)), mode='constant')
    
    # Get dimensions
    H, W = image.shape
    Kh, Kw = kernel.shape
    
    # Calculate output size
    out_h = (H - Kh) // stride + 1
    out_w = (W - Kw) // stride + 1
    output = np.zeros((out_h, out_w))
    
    # Perform convolution
    for i in range(out_h):
        for j in range(out_w):
            # Extract region
            h_start = i * stride
            w_start = j * stride
            region = image[h_start:h_start+Kh, w_start:w_start+Kw]
            
            # Element-wise multiplication and sum
            output[i, j] = np.sum(region * kernel)
    
    return output

print("✅ 2D Convolution function defined!")
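
The double loop above is easy to read but slow. As a hedged sketch (stride 1 and no padding only), the same sliding-window sum can be vectorized with NumPy's `sliding_window_view`. The helper names `correlate2d_valid` and `convolve2d_valid` are hypothetical, introduced only for this example. The second helper shows what changes when the kernel is flipped first, which is the textbook definition of convolution.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def correlate2d_valid(image, kernel):
    """Slide the kernel as-is (no flip), stride 1, no padding."""
    windows = sliding_window_view(image, kernel.shape)  # (out_h, out_w, Kh, Kw)
    return np.einsum('ijkl,kl->ij', windows, kernel)    # multiply & sum per window

def convolve2d_valid(image, kernel):
    """Textbook convolution: flip the kernel on both axes, then slide."""
    return correlate2d_valid(image, kernel[::-1, ::-1])

# Toy 5x5 ramp image: values grow by 1 from left to right
image = np.arange(25, dtype=float).reshape(5, 5)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

corr = correlate2d_valid(image, sobel_x)
conv = convolve2d_valid(image, sobel_x)

print(corr[0])  # [8. 8. 8.]
print(conv[0])  # [-8. -8. -8.]
```

For an antisymmetric kernel such as Sobel X the flip only changes the sign; for symmetric kernels (blur, identity) both versions give identical results.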

---

## Part 3: Classic Image Kernels

Let's apply famous kernels to see what they detect!

In [None]:
# Select one image and report its class
sample_img = x_train[7].astype('float32')
print(f"Selected class: {class_names[y_train[7]]}")

# Define classic kernels
kernels = {
    'Identity': np.array([[0, 0, 0],
                          [0, 1, 0],
                          [0, 0, 0]]),
    
    'Blur': np.ones((3, 3)) / 9,
    
    'Sharpen': np.array([[0, -1, 0],
                         [-1, 5, -1],
                         [0, -1, 0]]),
    
    'Edge (Horizontal)': np.array([[-1, -1, -1],
                                   [0, 0, 0],
                                   [1, 1, 1]]),
    
    'Edge (Vertical)': np.array([[-1, 0, 1],
                                 [-1, 0, 1],
                                 [-1, 0, 1]]),
    
    'Sobel X': np.array([[-1, 0, 1],
                         [-2, 0, 2],
                         [-1, 0, 1]]),
    
    'Sobel Y': np.array([[-1, -2, -1],
                         [0, 0, 0],
                         [1, 2, 1]]),
    
    'Emboss': np.array([[-2, -1, 0],
                        [-1, 1, 1],
                        [0, 1, 2]])
}

# Apply all kernels
fig, axes = plt.subplots(3, 3, figsize=(15, 15))

# Original image
axes[0, 0].imshow(sample_img, cmap='gray')
axes[0, 0].set_title('Original Image', fontsize=14, fontweight='bold')
axes[0, 0].axis('off')

# Apply each kernel
for idx, (name, kernel) in enumerate(kernels.items(), 1):
    row = idx // 3
    col = idx % 3
    
    # Convolve
    output = conv2d_simple(sample_img, kernel)
    
    # Display
    axes[row, col].imshow(output, cmap='gray')
    axes[row, col].set_title(f'{name}', fontsize=12, fontweight='bold')
    axes[row, col].axis('off')

plt.suptitle('Effect of Different 2D Kernels on Fashion-MNIST Image', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

print("\n🔍 Observations:")
print("✅ Identity: Pixel values unchanged; only the 1-pixel border is cropped (no padding)")
print("✅ Blur: Image softened (averaging neighbors)")
print("✅ Sharpen: Enhanced edges and details")
print("✅ Edge kernels: Detect edges in specific directions")
print("✅ Sobel: Strong edge detection (weighted)")
print("✅ Emboss: Creates 3D-like effect")
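
The directional behaviour of the edge kernels is easier to verify on a toy image than on a real photo. The sketch below uses a hypothetical 6×6 image with a single vertical edge and a compact helper (`correlate_valid`, an assumed name that mirrors `conv2d_simple` with stride 1 and no padding).

```python
import numpy as np

def correlate_valid(image, kernel):
    """Minimal stride-1, no-padding sliding window (same idea as conv2d_simple)."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# Toy image: dark left half, bright right half (one vertical edge)
toy = np.zeros((6, 6))
toy[:, 3:] = 1.0

sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
sobel_y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])

resp_x = correlate_valid(toy, sobel_x)
resp_y = correlate_valid(toy, sobel_y)

print(resp_x[0])              # [0. 4. 4. 0.]  strong response around the edge
print(np.abs(resp_y).max())   # 0.0            no response: nothing changes top to bottom
```

Sobel X responds only where brightness changes from left to right, while Sobel Y stays at zero because every row of the toy image is identical.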

---

## Part 4: Visualize Multiple Feature Maps

**In CNNs:** Each convolutional layer has **multiple kernels** (filters).  
**Result:** Multiple feature maps showing different patterns!
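
As a rough sketch of what "multiple kernels, multiple feature maps" means in array terms, the example below stacks three hand-written kernels into one filter bank and applies them in a single step. The random image is a stand-in for a Fashion-MNIST sample, and the choice of three filters is an assumption for illustration; real layers typically learn many more.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical filter bank: three 3x3 kernels stacked along a new first axis
filters = np.stack([
    np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]),    # vertical edges
    np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]]),    # horizontal edges
    np.ones((3, 3)) / 9,                               # blur
]).astype(float)                                       # shape (3, 3, 3)

rng = np.random.default_rng(0)
image = rng.random((28, 28))   # random stand-in for one 28x28 image

# Every 3x3 window of the image: shape (26, 26, 3, 3)
windows = sliding_window_view(image, (3, 3))

# Multiply & sum each window with each filter: one feature map per filter
feature_maps = np.einsum('ijkl,fkl->ijf', windows, filters)

print(feature_maps.shape)   # (26, 26, 3)
```

The last axis indexes the filter, which matches the "channels last" layout that Keras uses by default for convolution outputs.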

In [None]:
# Select a dress image
dress_img = x_train[3].astype('float32')

# Multiple kernels (simulating first conv layer)
feature_kernels = {
    'Vertical Edges': np.array([[-1, 0, 1],
                                [-2, 0, 2],
                                [-1, 0, 1]]),
    
    'Horizontal Edges': np.array([[-1, -2, -1],
                                  [0, 0, 0],
                                  [1, 2, 1]]),
    
    'Diagonal \\': np.array([[0, 1, 2],
                             [-1, 0, 1],
                             [-2, -1, 0]]),
    
    'Diagonal /': np.array([[2, 1, 0],
                            [1, 0, -1],
                            [0, -1, -2]]),
    
    'Corner': np.array([[1, -2, 1],
                        [-2, 4, -2],
                        [1, -2, 1]]),
    
    'High-pass': np.array([[-1, -1, -1],
                           [-1, 8, -1],
                           [-1, -1, -1]])
}

# Apply all kernels
fig, axes = plt.subplots(2, 4, figsize=(16, 8))

# Original
axes[0, 0].imshow(dress_img, cmap='gray')
axes[0, 0].set_title('Original Image\n(Dress)', fontsize=13, fontweight='bold')
axes[0, 0].axis('off')

# Info panel
axes[0, 1].axis('off')
axes[0, 1].text(0.5, 0.5, 'Feature Maps →\n\n6 different filters\ndetect 6 patterns',
               ha='center', va='center', fontsize=14, fontweight='bold',
               bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.7))

# The 2x4 grid has 8 cells: original + info panel + 6 feature maps.
# Feature maps fill the last two cells of the top row and the whole bottom row.
positions = [(0, 2), (0, 3), (1, 0), (1, 1), (1, 2), (1, 3)]

# Apply kernels
for idx, (name, kernel) in enumerate(feature_kernels.items()):
    row, col = positions[idx]
    
    output = conv2d_simple(dress_img, kernel)
    axes[row, col].imshow(output, cmap='RdBu_r')
    axes[row, col].set_title(f'Filter {idx+1}: {name}', fontsize=11, fontweight='bold')
    axes[row, col].axis('off')

plt.suptitle('Multiple Feature Maps (Like CNN First Layer)', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

print("\n🧠 CNN Insight:")
print("In a CNN, the first convolutional layer might have 32 or 64 filters.")
print("Each filter learns to detect different patterns automatically!")
print("We don't design these filters - the network learns them during training.")

---

## Part 5: Effect of Kernel Size

In [None]:
# Same image
img = x_train[0].astype('float32')

# Different kernel sizes (all blur/average)
kernel_3x3 = np.ones((3, 3)) / 9
kernel_5x5 = np.ones((5, 5)) / 25
kernel_7x7 = np.ones((7, 7)) / 49
kernel_11x11 = np.ones((11, 11)) / 121

# Apply
output_3 = conv2d_simple(img, kernel_3x3)
output_5 = conv2d_simple(img, kernel_5x5)
output_7 = conv2d_simple(img, kernel_7x7)
output_11 = conv2d_simple(img, kernel_11x11)

# Visualize
fig, axes = plt.subplots(1, 5, figsize=(18, 4))

axes[0].imshow(img, cmap='gray')
axes[0].set_title('Original\n28×28', fontsize=12, fontweight='bold')
axes[0].axis('off')

axes[1].imshow(output_3, cmap='gray')
axes[1].set_title(f'3×3 Kernel\nOutput: {output_3.shape}', fontsize=12, fontweight='bold')
axes[1].axis('off')

axes[2].imshow(output_5, cmap='gray')
axes[2].set_title(f'5×5 Kernel\nOutput: {output_5.shape}', fontsize=12, fontweight='bold')
axes[2].axis('off')

axes[3].imshow(output_7, cmap='gray')
axes[3].set_title(f'7×7 Kernel\nOutput: {output_7.shape}', fontsize=12, fontweight='bold')
axes[3].axis('off')

axes[4].imshow(output_11, cmap='gray')
axes[4].set_title(f'11×11 Kernel\nOutput: {output_11.shape}', fontsize=12, fontweight='bold')
axes[4].axis('off')

plt.suptitle('Effect of Kernel Size on Output', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

print("\n📐 Output Size Formula (2D, stride=1, no padding):")
print("Output_H = (Input_H - Kernel_H) + 1")
print("Output_W = (Input_W - Kernel_W) + 1")
print(f"\n3×3: ({img.shape[0]}, {img.shape[1]}) → {output_3.shape}")
print(f"5×5: ({img.shape[0]}, {img.shape[1]}) → {output_5.shape}")
print(f"7×7: ({img.shape[0]}, {img.shape[1]}) → {output_7.shape}")
print(f"11×11: ({img.shape[0]}, {img.shape[1]}) → {output_11.shape}")
print("\n💡 Larger kernel = More blur + Smaller output")
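
One way to quantify "more blur" is to blur a single bright pixel and see how far it spreads. The sketch below assumes a toy 9×9 image and a hypothetical helper `box_blur_valid` (stride 1, no padding); it is an illustration, not part of the notebook's pipeline.

```python
import numpy as np

def box_blur_valid(image, k):
    """Average every k x k window (stride 1, no padding)."""
    out_h, out_w = image.shape[0] - k + 1, image.shape[1] - k + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = image[i:i+k, j:j+k].mean()
    return out

# Toy 9x9 image with a single bright pixel in the centre
impulse = np.zeros((9, 9))
impulse[4, 4] = 1.0

for k in (3, 5):
    blurred = box_blur_valid(impulse, k)
    touched = int(np.count_nonzero(blurred))   # how many outputs the pixel influenced
    print(f"{k}x{k}: output {blurred.shape}, pixels touched = {touched}, peak = {blurred.max():.3f}")

# 3x3: output (7, 7), pixels touched = 9, peak = 0.111
# 5x5: output (5, 5), pixels touched = 25, peak = 0.040
```

A k×k averaging kernel spreads one pixel over k² outputs and scales it by 1/k², so a larger kernel smears detail more widely while the output shrinks.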

---

## Part 6: Stride and Padding Visualization

In [None]:
# Small test image
test_img = x_train[2][:14, :14].astype('float32')  # Crop to 14x14

# Simple edge kernel
edge_kernel = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]])

# Different stride values
output_s1 = conv2d_simple(test_img, edge_kernel, stride=1, padding=0)
output_s2 = conv2d_simple(test_img, edge_kernel, stride=2, padding=0)

# Different padding values
output_p0 = conv2d_simple(test_img, edge_kernel, stride=1, padding=0)
output_p1 = conv2d_simple(test_img, edge_kernel, stride=1, padding=1)
output_p2 = conv2d_simple(test_img, edge_kernel, stride=1, padding=2)

# Visualize stride
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

axes[0].imshow(test_img, cmap='gray')
axes[0].set_title(f'Original\n{test_img.shape}', fontsize=13, fontweight='bold')
axes[0].axis('off')

axes[1].imshow(output_s1, cmap='RdBu_r')
axes[1].set_title(f'Stride=1\nOutput: {output_s1.shape}', fontsize=13, fontweight='bold')
axes[1].axis('off')

axes[2].imshow(output_s2, cmap='RdBu_r')
axes[2].set_title(f'Stride=2\nOutput: {output_s2.shape}', fontsize=13, fontweight='bold')
axes[2].axis('off')

plt.suptitle('Effect of Stride (Downsampling)', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

# Visualize padding
fig, axes = plt.subplots(1, 4, figsize=(16, 4))

axes[0].imshow(test_img, cmap='gray')
axes[0].set_title(f'Original\n{test_img.shape}', fontsize=12, fontweight='bold')
axes[0].axis('off')

axes[1].imshow(output_p0, cmap='RdBu_r')
axes[1].set_title(f'Padding=0\n{output_p0.shape}', fontsize=12, fontweight='bold')
axes[1].axis('off')

axes[2].imshow(output_p1, cmap='RdBu_r')
axes[2].set_title(f'Padding=1\n{output_p1.shape}', fontsize=12, fontweight='bold')
axes[2].axis('off')

axes[3].imshow(output_p2, cmap='RdBu_r')
axes[3].set_title(f'Padding=2\n{output_p2.shape}', fontsize=12, fontweight='bold')
axes[3].axis('off')

plt.suptitle('Effect of Padding (Size Control)', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

print("\n📊 Complete Formula:")
print("Output = floor((Input + 2*Padding - Kernel) / Stride) + 1")
print(f"\nStride=1, Pad=0: ({test_img.shape[0]} + 0 - 3) // 1 + 1 = {output_p0.shape[0]}")
print(f"Stride=1, Pad=1: ({test_img.shape[0]} + 2 - 3) // 1 + 1 = {output_p1.shape[0]} ← Same as input!")
print(f"Stride=2, Pad=0: ({test_img.shape[0]} + 0 - 3) // 2 + 1 = {output_s2.shape[0]}  (11 / 2 is rounded down to 5)")
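
The size formula can be wrapped in a small helper to check shapes before running a convolution. This is a sketch under the same assumptions as `conv2d_simple` (square stride, symmetric zero padding, integer division that drops any partial window); `conv_output_size` and `same_padding` are hypothetical names introduced here.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Output length along one axis; floor division drops any partial window."""
    return (n + 2 * padding - k) // stride + 1

def same_padding(k):
    """Padding that keeps the size unchanged at stride 1 (odd kernel sizes only)."""
    return (k - 1) // 2

print(conv_output_size(14, 3))                            # 12
print(conv_output_size(14, 3, padding=1))                 # 14
print(conv_output_size(14, 3, stride=2))                  # 6  (11 // 2 = 5, then + 1)
print(conv_output_size(28, 11))                           # 18
print(conv_output_size(14, 5, padding=same_padding(5)))   # 14
```

Apply the helper once per axis: a 14×14 input with a 3×3 kernel and stride 2 gives a 6×6 output.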

---

## Part 7: Interactive Experiment 🧪

In [None]:
# YOUR TURN! Create your own kernel

# 1. Select an image index (0-9 are the samples shown in Part 1; any index up to len(x_train) - 1 works)
my_image_idx = 5  # ← Change this!
my_image = x_train[my_image_idx].astype('float32')

# 2. Create your custom 3x3 kernel
my_kernel = np.array([[-1, -1, -1],
                      [ 0,  0,  0],
                      [ 1,  1,  1]])  # ← Change these values!

# 3. Set parameters
my_stride = 1   # ← Change this!
my_padding = 0  # ← Change this!

# Apply convolution
my_output = conv2d_simple(my_image, my_kernel, stride=my_stride, padding=my_padding)

# Visualize
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Original
axes[0].imshow(my_image, cmap='gray')
axes[0].set_title(f'Your Image\n{class_names[y_train[my_image_idx]]}\n{my_image.shape}',
                 fontsize=13, fontweight='bold')
axes[0].axis('off')

# Kernel visualization
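k_lim = max(float(np.abs(my_kernel).max()), 1.0)  # symmetric colour scale that adapts to your values
axes[1].imshow(my_kernel, cmap='RdBu_r', vmin=-k_lim, vmax=k_lim)
axes[1].set_title('Your Kernel', fontsize=13, fontweight='bold')
for i in range(my_kernel.shape[0]):
    for j in range(my_kernel.shape[1]):
        axes[1].text(j, i, f'{my_kernel[i,j]:.1f}',
                    ha='center', va='center', fontsize=14, fontweight='bold')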
axes[1].set_xticks([])
axes[1].set_yticks([])

# Output
axes[2].imshow(my_output, cmap='RdBu_r')
axes[2].set_title(f'Your Output\n{my_output.shape}', fontsize=13, fontweight='bold')
axes[2].axis('off')

plt.suptitle(f'Your Experiment (Stride={my_stride}, Padding={my_padding})', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

print(f"\n📊 Your Results:")
print(f"Image: {class_names[y_train[my_image_idx]]}")
print(f"Input shape: {my_image.shape}")
print(f"Kernel size: {my_kernel.shape}")
print(f"Stride: {my_stride}")
print(f"Padding: {my_padding}")
print(f"Output shape: {my_output.shape}")
print(f"\nFormula: ({my_image.shape[0]} + 2*{my_padding} - {my_kernel.shape[0]}) // {my_stride} + 1 = {my_output.shape[0]}")

---

## Summary: Key Takeaways 🎯

### What You Learned:

1. **✅ 2D convolution = sliding 2D kernel** over an image
2. **✅ Different kernels detect different patterns:**
   - Blur kernels → smooth images
   - Edge kernels → find boundaries
   - Sharpen kernels → enhance details

3. **✅ Multiple kernels = Multiple feature maps** (like CNN first layer)

4. **✅ Kernel size affects:**
   - Receptive field (how much input affects one output)
   - Output size (bigger kernel = smaller output)
   - Computation (bigger = slower)

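5. **✅ Stride controls downsampling** (stride 2 roughly halves each spatial dimension)

6. **✅ Padding controls output size** (with stride 1 and an odd kernel size K, padding (K-1)/2 keeps the output the same size as the input)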

7. **✅ Output size formula (2D):**
   ```
   Output_H = floor((Input_H + 2*Padding - Kernel_H) / Stride) + 1
   Output_W = floor((Input_W + 2*Padding - Kernel_W) / Stride) + 1
   ```

### CNN Connection:

- In CNNs, we **don't design kernels** manually
- The network **learns optimal kernels** during training
- First layers learn simple patterns (edges)
- Deeper layers learn complex patterns (shapes, objects)

---

## Practice Exercises 📝

**Before Notebook 3:**

1. Create a kernel that detects horizontal lines. Test on different Fashion-MNIST items.

2. Apply 3 different kernels to the same image. Which one best highlights the object shape?

3. Calculate: Input = 28×28, Kernel = 5×5, Stride = 2, Padding = 2. What's the output size?

4. Challenge: Can you design a kernel that detects corners (intersection of edges)?

---

## Next: Notebook 3 - Building Your First CNN! 🧠

**You're ready!** Now we'll build a complete CNN using TensorFlow/Keras on Fashion-MNIST.

---

*⏱️ Time spent: ~30 minutes*  
*💪 Difficulty: Beginner-Intermediate*  
*🎓 Mastery: 2D convolution & visualization*