# Notebook 7: Complete CNN Architecture

**Week 10 - Module 4: CNN Basics**
**DO3 (October 27, 2025) - Saturday**
**Duration:** 25-30 minutes

## Learning Objectives

1. ✅ **Design** complete CNN architectures
2. ✅ **Trace** feature map dimensions through the network
3. ✅ **Calculate** parameter counts
4. ✅ **Build** a simple CNN in Keras (preview for Tutorial T10)

---

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow import keras

print(f"TensorFlow version: {tf.__version__}")
print("✅ Setup complete!")

## 1. CNN Architecture Components

A complete CNN has:

1. **Input Layer**: Image (H × W × C)
2. **Convolutional Blocks**: [Conv → ReLU → Pool] × N
3. **Flattening**: Convert to 1D vector
4. **Fully Connected Layers**: Classification
5. **Output Layer**: Softmax for probabilities

### Standard Pattern:

```
Input (28×28×1)
    ↓
[Conv 3×3, 32 filters] → ReLU → MaxPool 2×2
    ↓ (14×14×32)
[Conv 3×3, 64 filters] → ReLU → MaxPool 2×2
    ↓ (7×7×64)
Flatten → 3136
    ↓
Dense(128) → ReLU
    ↓
Dense(10) → Softmax
    ↓
Output (10 classes)
```

---

## 2. Dimension Tracing Example

Let's trace dimensions through a simple CNN for MNIST (28×28 grayscale images).

**Architecture:**

| Layer | Operation | Output Shape | Parameters |
|-------|-----------|--------------|------------|
| Input | - | (28, 28, 1) | 0 |
| Conv1 | 3×3, 32 filters, stride=1, padding=same | (28, 28, 32) | 320 |
| MaxPool1 | 2×2, stride=2 | (14, 14, 32) | 0 |
| Conv2 | 3×3, 64 filters, stride=1, padding=same | (14, 14, 64) | 18,496 |
| MaxPool2 | 2×2, stride=2 | (7, 7, 64) | 0 |
| Flatten | - | (3136,) | 0 |
| Dense1 | 128 units | (128,) | 401,536 |
| Dense2 | 10 units | (10,) | 1,290 |

**Total Parameters:** 421,642
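
The output shapes in the table can be reproduced with a few lines of plain Python. This is a minimal sketch, not part of Keras: the helper names `output_size` and `trace_shapes` and the `(name, kernel, stride, padding, filters)` tuple layout are assumptions made for illustration. It applies the usual rule for one spatial dimension: `same` padding gives ceil(n / stride), and `valid` padding gives floor((n - k) / stride) + 1. The pooling layers are assumed to use `valid` padding, which is the Keras default.

```python
import math

def output_size(n, kernel, stride, padding):
    """Output length along one spatial dimension for a conv or pool layer."""
    if padding == "same":
        return math.ceil(n / stride)
    if padding == "valid":
        return (n - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding!r}")

def trace_shapes(input_shape, layers):
    """Return [(layer_name, (H, W, C)), ...] for a stack of conv/pool layers."""
    h, w, c = input_shape
    shapes = []
    for name, kernel, stride, padding, filters in layers:
        h = output_size(h, kernel, stride, padding)
        w = output_size(w, kernel, stride, padding)
        if filters is not None:  # pooling layers keep the channel count
            c = filters
        shapes.append((name, (h, w, c)))
    return shapes

# (name, kernel, stride, padding, filters); filters=None marks a pooling layer
mnist_layers = [
    ("Conv1", 3, 1, "same", 32),
    ("MaxPool1", 2, 2, "valid", None),
    ("Conv2", 3, 1, "same", 64),
    ("MaxPool2", 2, 2, "valid", None),
]

shapes = trace_shapes((28, 28, 1), mnist_layers)
for name, shape in shapes:
    print(f"{name:<9} {shape}")

h, w, c = shapes[-1][1]
flat_units = h * w * c
print(f"Flatten   ({flat_units},)")
```

Changing the input shape or a layer's stride in `mnist_layers` is a quick way to see how the Flatten size, and therefore the Dense1 parameter count, reacts.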

---

In [None]:
# Build the exact architecture above
model = keras.Sequential([
    # Explicit input layer (preferred over passing input_shape to the first layer)
    keras.Input(shape=(28, 28, 1)),

    # Conv Block 1
    keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
    keras.layers.MaxPooling2D((2, 2)),

    # Conv Block 2
    keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    keras.layers.MaxPooling2D((2, 2)),

    # Fully Connected Layers
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

model.summary()

## 3. Parameter Calculation

### Conv Layer Parameters:

$$
\text{Params} = (F_h \times F_w \times C_{in} + 1) \times C_{out}
$$

Where:
- $F_h, F_w$ = filter height, width
- $C_{in}$ = input channels
- $C_{out}$ = output channels (number of filters)
- $+1$ = bias term

**Example (Conv1):**
- Filter: 3×3
- Input channels: 1
- Output channels: 32
- Params = $(3 \times 3 \times 1 + 1) \times 32 = 320$
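
The same formula applied to Conv2, whose input is the 32 feature maps produced by the first block, gives the 18,496 listed in the table:

$$
\text{Params}_{\text{Conv2}} = (3 \times 3 \times 32 + 1) \times 64 = 289 \times 64 = 18{,}496
$$

Of these, $3 \times 3 \times 32 \times 64 = 18{,}432$ are filter weights and 64 are biases (one per filter).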

### Dense Layer Parameters:

$$
\text{Params} = (\text{input\_units} + 1) \times \text{output\_units}
$$

**Example (Dense1):**
- Input: 3136 (7×7×64 flattened)
- Output: 128
- Params = $(3136 + 1) \times 128 = 401,536$

---

In [None]:
def count_conv_params(filter_size, in_channels, out_channels):
    """Parameters in a conv layer with square filters: weights plus one bias per filter."""
    return (filter_size * filter_size * in_channels + 1) * out_channels

def count_dense_params(in_units, out_units):
    """Parameters in a dense layer: one weight per input plus one bias, per output unit."""
    return (in_units + 1) * out_units

# Verify our calculations
conv1_params = count_conv_params(3, 1, 32)
conv2_params = count_conv_params(3, 32, 64)
dense1_params = count_dense_params(7*7*64, 128)
dense2_params = count_dense_params(128, 10)

print("Parameter Verification:")
print(f"Conv1: {conv1_params:,}")
print(f"Conv2: {conv2_params:,}")
print(f"Dense1: {dense1_params:,}")
print(f"Dense2: {dense2_params:,}")
print(f"Total: {conv1_params + conv2_params + dense1_params + dense2_params:,}")

## 4. Famous CNN Architecture: LeNet-5

**LeNet-5** (LeCun et al., 1998) - One of the first successful CNNs, designed for handwritten digit recognition

```
Input (32×32×1)
    ↓
Conv 5×5, 6 filters, no padding → Tanh → AvgPool 2×2
    ↓ (14×14×6)
Conv 5×5, 16 filters, no padding → Tanh → AvgPool 2×2
    ↓ (5×5×16)
Flatten → 400
    ↓
Dense(120) → Tanh
    ↓
Dense(84) → Tanh
    ↓
Dense(10) → Softmax
```

**Key Innovations:**
- Hierarchical feature learning
- Weight sharing (convolution)
- Pooling (subsampling) for robustness to small translations
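
The LeNet-5 diagram above can be traced with the same dimension and parameter formulas used for the MNIST model. The sketch below is a simplified reading of that diagram, not the original 1998 network: it assumes no padding, stride 1 for convolutions, 2×2 pooling with stride 2, and full connectivity between layers. The original paper differs in details (for example, partial connectivity in the second conv layer), so its exact parameter count is not the same. The helper names are illustrative.

```python
def conv_params(kernel, in_channels, out_channels):
    """(F*F*C_in + 1) * C_out, assuming square filters and one bias per filter."""
    return (kernel * kernel * in_channels + 1) * out_channels

def dense_params(in_units, out_units):
    """(inputs + 1) * outputs."""
    return (in_units + 1) * out_units

def valid_size(n, kernel, stride=1):
    """Output length with no padding."""
    return (n - kernel) // stride + 1

size, channels = 32, 1          # 32x32 grayscale input
lenet_total = 0
lenet_shapes = []

# Two blocks of: Conv 5x5 (no padding) -> Tanh -> AvgPool 2x2
for filters in (6, 16):
    lenet_total += conv_params(5, channels, filters)
    size = valid_size(size, 5)        # convolution shrinks each side by 4
    size = valid_size(size, 2, 2)     # pooling halves each side
    channels = filters
    lenet_shapes.append((size, size, channels))

lenet_flat = size * size * channels

# Dense(120) -> Dense(84) -> Dense(10)
units = lenet_flat
for out_units in (120, 84, 10):
    lenet_total += dense_params(units, out_units)
    units = out_units

print("Shapes after each conv block:", lenet_shapes)
print("Flatten size:", lenet_flat)
print(f"Total parameters (simplified LeNet-5): {lenet_total:,}")
```

Under these assumptions the network has far fewer parameters than the MNIST model above, mainly because only 400 values reach the first Dense layer instead of 3,136.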

---

## 5. CNN vs MLP: Parameter Comparison

**Same task:** Classify 28×28 MNIST images (10 classes)

### MLP Approach:
- Flatten input: 784 units
- Hidden layer: 128 units
- Output: 10 units
- **Total params:** $(784 + 1) \times 128 + (128 + 1) \times 10 = 101,770$
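
Broken down by layer, using the dense-layer formula from Section 3:

$$
\begin{aligned}
\text{Hidden layer:} \quad & (784 + 1) \times 128 = 100{,}480 \\
\text{Output layer:} \quad & (128 + 1) \times 10 = 1{,}290 \\
\text{Total:} \quad & 100{,}480 + 1{,}290 = 101{,}770
\end{aligned}
$$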

### CNN Approach (our model):
- Convolutional layers + Dense layers
- **Total params:** 421,642

**Wait, CNN has MORE parameters?**

Yes, but:
1. CNN learns **hierarchical features**
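2. CNN typically achieves **better accuracy** (roughly 98%+ vs about 95%; exact figures depend on the training setup)
3. Convolution is **translation equivariant**, and pooling adds tolerance to small shifts (approximate translation invariance)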
4. With **regularization**, CNN generalizes better

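**Modern CNNs** (MobileNet, EfficientNet) are designed for parameter efficiency: they use depthwise separable convolutions and global average pooling instead of a large Flatten → Dense stage, which is where about 95% of our model's parameters sit.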
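
The translation point in the list above can be checked numerically. The sketch below uses a naive NumPy cross-correlation (the operation a Keras `Conv2D` layer computes, here without padding, bias, or activation). The function name `cross_correlate_valid` and the toy 10×10 image are assumptions made for illustration. Shifting the input shifts the feature map by the same amount (equivariance), and taking a global maximum over the feature map removes the dependence on position (invariance).

```python
import numpy as np

def cross_correlate_valid(image, kernel):
    """Naive 2D cross-correlation with no padding and stride 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3))

# A small random pattern on an otherwise empty 10x10 image
image = np.zeros((10, 10))
image[2:5, 2:5] = rng.normal(size=(3, 3))

# Move the pattern 2 rows down and 3 columns right (it stays inside the image)
shifted = np.roll(image, shift=(2, 3), axis=(0, 1))

out = cross_correlate_valid(image, kernel)
out_shifted = cross_correlate_valid(shifted, kernel)

# Equivariance: the feature map moves by the same (2, 3) offset
equivariant = np.allclose(out_shifted[2:, 3:], out[:6, :5])

# Invariance after global max pooling: the strongest response is unchanged
invariant = np.isclose(out.max(), out_shifted.max())

print("Feature map shifts with the input:", equivariant)
print("Global max response unchanged:", invariant)
```

A 2×2 max pool gives only partial invariance: it absorbs shifts of about a pixel, which is why the claim is usually stated as tolerance to small translations.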

---

## Summary

### 🎯 Key Architecture Principles

1. **Pattern**: [Conv → Activation → Pool] × N → Flatten → Dense
2. **Channels increase**: 1 → 32 → 64 (filter count doubles from one conv block to the next)
3. **Spatial size decreases**: 28 → 14 → 7 (pooling)
4. **Parameters**: Mostly in Dense layers (for small CNNs)
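
To put a number on the last point, the per-layer counts from Section 2 can be turned into percentages. This is a small sketch using the values from the parameter table; the dictionary layout is only for illustration.

```python
# Per-layer parameter counts taken from the table in Section 2
layer_params = {
    "Conv1": 320,
    "Conv2": 18_496,
    "Dense1": 401_536,
    "Dense2": 1_290,
}

total = sum(layer_params.values())
dense_total = sum(
    count for name, count in layer_params.items() if name.startswith("Dense")
)
dense_share = dense_total / total

for name, count in layer_params.items():
    print(f"{name:<7} {count:>8,}  {count / total:6.1%}")
print(f"Dense layers hold {dense_share:.1%} of {total:,} parameters")
```

Almost all of that share comes from Dense1, because every one of the 3,136 flattened values connects to each of its 128 units.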

### 🔮 Next

**Notebook 8:** 3D Convolution (video, medical imaging)

---

*Week 10 - Deep Neural Network Architectures (21CSE558T)*
*SRM University - M.Tech Program*