# Notebook 4: Convolution Parameters (Stride, Padding, Kernel Size)

**Week 10 - Module 4: CNN Basics**
**DO3 (October 27, 2025) - Saturday**
**Duration:** 20 minutes

## Learning Objectives

1. ✅ **Understand** stride and its effect on output size
2. ✅ **Apply** padding to maintain dimensions
3. ✅ **Calculate** output dimensions using the formula
4. ✅ **Compare** different kernel sizes and their trade-offs

---

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import convolve2d

plt.rcParams['figure.figsize'] = (14, 6)
print("✅ Setup complete!")

## 1. Parameter #1: Stride

**Stride** = How many pixels to move when sliding the kernel

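- **Stride = 1**: Slide 1 pixel at a time (most common; densest output)
- **Stride = 2**: Slide 2 pixels at a time (fewer positions to compute; output is about half the size per dimension)
- **Stride = 3**: Slide 3 pixels at a time (output is about a third of the size per dimension)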

### Output Size Formula (Stride, No Padding):

$$
\text{Output Size} = \left\lfloor \frac{W - F}{S} \right\rfloor + 1
$$

Where:
- $W$ = input width
- $F$ = filter size
- $S$ = stride
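
As a quick worked check of the formula, take $W = 10$ and $F = 3$ (sizes chosen to match the 1D demo in the next cell):

$$
\begin{aligned}
S = 1 &: \left\lfloor \frac{10 - 3}{1} \right\rfloor + 1 = 7 + 1 = 8 \\
S = 2 &: \left\lfloor \frac{10 - 3}{2} \right\rfloor + 1 = 3 + 1 = 4 \\
S = 3 &: \left\lfloor \frac{10 - 3}{3} \right\rfloor + 1 = 2 + 1 = 3
\end{aligned}
$$

For $S = 2$ and $S = 3$ the last window covers indices 6 to 8, so the final input element (index 9) is never visited. That is the effect of the floor.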

---

In [None]:
# Demonstrate different strides
input_1d = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
kernel_1d = np.array([1, 1, 1]) / 3

def convolve_with_stride(signal, kernel, stride):
    """Slide the kernel over the signal with the given stride (no padding, no kernel flip)."""
    n = len(signal)
    k = len(kernel)
    output_size = (n - k) // stride + 1
    output = np.zeros(output_size)
    for i in range(output_size):
        pos = i * stride
        output[i] = np.sum(signal[pos:pos+k] * kernel)
    return output

# Test different strides
for stride in [1, 2, 3]:
    output = convolve_with_stride(input_1d, kernel_1d, stride)
    print(f"Stride {stride}: Output size = {len(output)}, values = {output}")

## 2. Parameter #2: Padding

**Padding** = Adding border pixels (usually zeros) around the input

**Why padding?**
- Maintain spatial dimensions
- Preserve border information
- Control output size

**Types:**
- **Valid**: No padding (output shrinks by $F - 1$ per dimension at stride 1)
- **Same**: Zero padding chosen so that output size = input size (at stride 1)
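
A minimal sketch of what the two padding types do to shapes, assuming NumPy's `np.pad` for the zero border. The helper `correlate2d_valid` and the averaging kernel are hypothetical choices made for illustration; they are not part of this notebook and do not describe how any particular framework implements padding.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide the kernel over the image with stride 1 and no padding (hypothetical helper)."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(1, 10).reshape(3, 3)   # 3x3 input with values 1..9
kernel = np.ones((3, 3)) / 9             # 3x3 averaging filter (illustrative choice)

# "Valid": no padding, so a 3x3 filter fits a 3x3 input in exactly one position
valid_out = correlate2d_valid(image, kernel)

# "Same": add (F - 1) / 2 = 1 row/column of zeros on every side, then slide as before
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
same_out = correlate2d_valid(padded, kernel)

print("valid output shape:", valid_out.shape)   # (1, 1)
print("padded input shape:", padded.shape)      # (5, 5)
print("same output shape: ", same_out.shape)    # (3, 3)
```

The border values of `same_out` are averages that include padded zeros, which is why outputs near the edge of a zero-padded image tend to be smaller in magnitude.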

---

In [None]:
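# Compare valid and same padding by output shape
img_small = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
kernel_3x3 = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]])  # vertical-edge kernel

# Note: convolve2d performs true convolution (it flips the kernel), whereas
# CNN layers typically compute cross-correlation. Output shapes are the same.

# Valid mode (no padding): 3x3 input with 3x3 kernel -> 1x1 output
valid = convolve2d(img_small, kernel_3x3, mode='valid')
# Same mode (zero-filled border by default): output matches the 3x3 input
same = convolve2d(img_small, kernel_3x3, mode='same')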

print(f"Input shape: {img_small.shape}")
print(f"Valid mode output: {valid.shape}")
print(f"Same mode output: {same.shape}")

## 3. Output Dimension Formula (Complete)

### The Master Formula:

$$
\text{Output} = \left\lfloor \frac{W - F + 2P}{S} \right\rfloor + 1
$$

Where:
- $W$ = input width/height
- $F$ = filter size
- $P$ = padding
- $S$ = stride

### Example Calculations:

**Case 1:** Input=28×28, Filter=5×5, Stride=1, Padding=0
$$
\text{Output} = \frac{28 - 5 + 0}{1} + 1 = 24
$$

**Case 2:** Input=32×32, Filter=3×3, Stride=2, Padding=1
$$
\text{Output} = \left\lfloor \frac{32 - 3 + 2}{2} \right\rfloor + 1 = \lfloor 15.5 \rfloor + 1 = 16
$$

---

In [None]:
def calculate_output_size(input_size, filter_size, stride, padding):
    """Return floor((W - F + 2P) / S) + 1 for one spatial dimension."""
    # Integer division (//) implements the floor in the formula
    return ((input_size - filter_size + 2*padding) // stride) + 1

# Test cases
cases = [
    (28, 5, 1, 0),  # MNIST with 5x5 filter
    (32, 3, 1, 1),  # CIFAR with 3x3 filter, padding
    (224, 7, 2, 3), # ImageNet first layer
]

print("Input | Filter | Stride | Padding | Output")
print("-" * 50)
for w, f, s, p in cases:
    out = calculate_output_size(w, f, s, p)
    print(f"{w:5} | {f:6} | {s:6} | {p:7} | {out:6}")

## 4. Parameter #3: Kernel Size

**Common kernel sizes:**
- **3×3**: Most common (VGG, ResNet)
- **5×5**: Wider receptive field
- **1×1**: Channel mixing (no spatial context)
- **7×7**: First layer of deep networks

### Trade-offs:

| Kernel Size | Pros | Cons |
|-------------|------|------|
| **1×1** | Fast, fewer params | No spatial context |
| **3×3** | Balanced, stackable | Small receptive field per layer |
| **5×5** | Larger receptive field | More parameters |
| **7×7** | Very wide context | Expensive |
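
The "stackable" entry for 3×3 can be made concrete with a little arithmetic. With stride 1 and no dilation, each additional F×F layer grows the receptive field by F − 1, so two stacked 3×3 layers see a 5×5 region while using fewer weights than a single 5×5 layer. The sketch below assumes stride 1, no dilation, and a single input/output channel pair; both helper names are hypothetical.

```python
def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of num_layers stacked convolutions (stride 1, no dilation)."""
    return 1 + num_layers * (kernel_size - 1)

def weights_per_channel_pair(kernel_size, num_layers=1):
    """Weights for one input/output channel pair across all layers, ignoring biases."""
    return num_layers * kernel_size * kernel_size

# (label, kernel size, number of stacked layers)
options = [
    ("one 5x5", 5, 1),
    ("two 3x3", 3, 2),
    ("one 7x7", 7, 1),
    ("three 3x3", 3, 3),
]

for name, f, n in options:
    rf = stacked_receptive_field(f, n)
    w = weights_per_channel_pair(f, n)
    print(f"{name:>9}: receptive field {rf}x{rf}, weights {w}")
```

Two 3×3 layers cover the same 5×5 region with 18 weights instead of 25, and three cover 7×7 with 27 instead of 49. The stacked version also inserts extra nonlinearities between layers.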

---

## Summary

### 🎯 Key Formulas

1. **Output Size:**
   $$\text{Output} = \left\lfloor \frac{W - F + 2P}{S} \right\rfloor + 1$$

2. **Common Configurations:**
   - **Same padding**: $P = (F-1)/2$ (for stride=1 and odd $F$)
   - **Valid padding**: $P = 0$

3. **Parameter Count:**
   - For a convolutional layer with $C_{out}$ filters, each of size $F \times F \times C_{in}$
   - Params = $F^2 \times C_{in} \times C_{out} + C_{out}$ (with bias)
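
As a sanity check on the formulas above, the following sketch computes same padding and parameter counts. The helper names are hypothetical, and the layer shapes (32-pixel input, 3 input channels, 64 output channels) are illustrative assumptions rather than values taken from this notebook.

```python
def output_size(w, f, s=1, p=0):
    """floor((W - F + 2P) / S) + 1 for one spatial dimension."""
    return (w - f + 2 * p) // s + 1

def same_padding(f):
    """Padding that preserves the input size at stride 1 (requires odd f)."""
    if f % 2 == 0:
        raise ValueError("symmetric same padding needs an odd filter size")
    return (f - 1) // 2

def conv_param_count(f, c_in, c_out, bias=True):
    """F^2 * C_in * C_out weights, plus one bias per output channel if bias=True."""
    return f * f * c_in * c_out + (c_out if bias else 0)

# Same padding keeps a 32-pixel dimension at 32 for every odd filter size
for f in (1, 3, 5, 7):
    p = same_padding(f)
    print(f"F={f}: P={p}, output size = {output_size(32, f, s=1, p=p)}")

# Parameter counts for a 3-channel input and 64 filters (assumed shapes)
print("3x3, with bias:", conv_param_count(3, 3, 64))              # 1792
print("7x7, no bias:  ", conv_param_count(7, 3, 64, bias=False))  # 9408
```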

### 🔮 Next

**Notebook 5:** Hierarchical Feature Learning (edges → shapes → objects)

---

*Week 10 - Deep Neural Network Architectures (21CSE558T)*
*SRM University - M.Tech Program*