# Notebook 5: Deep CNN with Regularization

**Course:** 21CSE558T - Deep Neural Network Architectures  
**Module 4:** CNNs - Practical Session  
**Date:** Monday, November 3, 2025  
**Duration:** 30 minutes  
**Objective:** Master regularization techniques to prevent overfitting and improve generalization

---

## The Overfitting Problem

**Scenario:** You build a deep CNN. Training accuracy reaches 99%, but test accuracy is only 85%.

**Problem:** **Overfitting** - the model memorizes the training data instead of learning patterns that generalize!

**Solutions:** Regularization techniques

---

## Regularization Techniques We'll Learn:

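1. **Dropout** - Randomly drop neurons during training
2. **Batch Normalization** - Normalize layer inputs over each mini-batch
3. **Global Average Pooling** - Replace Flatten with one average per channel to shrink the classifier
4. **Early Stopping** - Stop when validation loss stops improving
5. **L2 Regularization** - Penalize large weights (introduced here; applying it is a practice exercise)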

**Goal:** Train deeper, more powerful networks without overfitting!

In [None]:
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (
    Conv2D, MaxPooling2D, GlobalAveragePooling2D,
    Flatten, Dense, Dropout, BatchNormalization
)
from tensorflow.keras.regularizers import l2
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.utils import to_categorical
import warnings
warnings.filterwarnings('ignore')

print(f"✅ TensorFlow version: {tf.__version__}")

# Set seed
tf.random.set_seed(42)
np.random.seed(42)

---

## Part 1: Load and Prepare Data

In [None]:
# Load Fashion-MNIST
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

# Preprocessing
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0
y_train_cat = to_categorical(y_train, 10)
y_test_cat = to_categorical(y_test, 10)

print(f"✅ Data loaded")
print(f"Training: {x_train.shape[0]:,} samples")
print(f"Test: {x_test.shape[0]:,} samples")

---

## Part 2: Baseline CNN (Prone to Overfitting)

In [None]:
# Build a deep CNN WITHOUT regularization
baseline_model = Sequential([
    Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(28, 28, 1)),
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((2, 2)),
    
    Conv2D(128, (3, 3), activation='relu', padding='same'),
    Conv2D(128, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((2, 2)),
    
    Conv2D(256, (3, 3), activation='relu', padding='same'),
    Conv2D(256, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((2, 2)),
    
    Flatten(),
    Dense(512, activation='relu'),
    Dense(256, activation='relu'),
    Dense(10, activation='softmax')
], name='Baseline_No_Regularization')

baseline_model.compile(optimizer='adam',
                       loss='categorical_crossentropy',
                       metrics=['accuracy'])

baseline_model.summary()

print(f"\n📊 Total parameters: {baseline_model.count_params():,}")
print("\n⚠️ This deep model is prone to overfitting!")

In [None]:
# Train baseline model
print("🚀 Training baseline model (no regularization)...\n")

baseline_history = baseline_model.fit(
    x_train, y_train_cat,
    batch_size=128,
    epochs=15,
    validation_split=0.1,
    verbose=1
)

# Evaluate
baseline_test_loss, baseline_test_acc = baseline_model.evaluate(x_test, y_test_cat, verbose=0)
print(f"\n📊 Baseline Test Accuracy: {baseline_test_acc:.2%}")

In [None]:
# Visualize overfitting
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

# Accuracy
axes[0].plot(baseline_history.history['accuracy'], 'b-o', label='Training', linewidth=2)
axes[0].plot(baseline_history.history['val_accuracy'], 'r-o', label='Validation', linewidth=2)
axes[0].set_xlabel('Epoch', fontsize=12)
axes[0].set_ylabel('Accuracy', fontsize=12)
axes[0].set_title('Baseline Model: Accuracy', fontsize=14, fontweight='bold')
axes[0].legend(fontsize=11)
axes[0].grid(True, alpha=0.3)

# Loss
axes[1].plot(baseline_history.history['loss'], 'b-o', label='Training', linewidth=2)
axes[1].plot(baseline_history.history['val_loss'], 'r-o', label='Validation', linewidth=2)
axes[1].set_xlabel('Epoch', fontsize=12)
axes[1].set_ylabel('Loss', fontsize=12)
axes[1].set_title('Baseline Model: Loss', fontsize=14, fontweight='bold')
axes[1].legend(fontsize=11)
axes[1].grid(True, alpha=0.3)

plt.suptitle('⚠️ Notice the Gap: Training vs Validation (Overfitting!)', 
             fontsize=16, fontweight='bold', color='red')
plt.tight_layout()
plt.show()

# Calculate overfitting gap
final_train_acc = baseline_history.history['accuracy'][-1]
final_val_acc = baseline_history.history['val_accuracy'][-1]
gap = final_train_acc - final_val_acc

print(f"\n⚠️ Overfitting Analysis:")
print(f"Training Accuracy:   {final_train_acc:.2%}")
print(f"Validation Accuracy: {final_val_acc:.2%}")
print(f"Gap (overfitting):   {gap:.2%}")
if gap > 0.05:
    print("\n🔴 OVERFITTING DETECTED! Let's fix this with regularization.")

---

## Part 3: Technique 1 - Dropout

**Dropout:** Randomly "drop" (set to 0) a fraction of neurons during training.

**How it works:**
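- During training: randomly zero a fraction of activations (typically 20-50%) on every forward pass
- Forces the network to learn redundant representations
- Prevents co-adaptation of neurons
- During inference: all neurons are used with no extra scaling, because Keras already scales the surviving activations by 1/(1 - rate) during training (inverted dropout)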

**Typical values:** 0.2-0.5 (20%-50% dropout rate)
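
The mechanics described above can be sketched in plain NumPy. This is an illustrative sketch of inverted dropout, not the Keras implementation: the helper name `inverted_dropout`, the array shape, and the seed are assumptions chosen for the demo.

```python
import numpy as np

def inverted_dropout(activations, rate, training, rng):
    """Apply inverted dropout to an array of activations (illustrative helper)."""
    if not training or rate == 0.0:
        # Inference: every unit is kept and nothing is rescaled.
        return activations
    keep_prob = 1.0 - rate
    # Bernoulli mask: True where a unit survives this forward pass.
    mask = rng.random(activations.shape) < keep_prob
    # Scale survivors by 1/keep_prob so the expected activation is unchanged.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones((1000, 100), dtype=np.float32)

train_out = inverted_dropout(x, rate=0.25, training=True, rng=rng)
infer_out = inverted_dropout(x, rate=0.25, training=False, rng=rng)

dropped_fraction = float(np.mean(train_out == 0.0))
print(f"Fraction dropped during training: {dropped_fraction:.3f}")
print(f"Mean activation during training:  {train_out.mean():.3f}")
print(f"Mean activation at inference:     {infer_out.mean():.3f}")
```

The dropped fraction should land close to the requested rate, and the mean activation should stay close to 1.0 in both modes, which is why no rescaling is needed at inference.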

In [None]:
# Build CNN with Dropout
dropout_model = Sequential([
    Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(28, 28, 1)),
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((2, 2)),
    Dropout(0.25),  # Drop 25% of activations
    
    Conv2D(128, (3, 3), activation='relu', padding='same'),
    Conv2D(128, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((2, 2)),
    Dropout(0.25),
    
    Conv2D(256, (3, 3), activation='relu', padding='same'),
    Conv2D(256, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((2, 2)),
    Dropout(0.25),
    
    Flatten(),
    Dense(512, activation='relu'),
    Dropout(0.5),  # Higher dropout in dense layers
    Dense(256, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax')
], name='CNN_with_Dropout')

dropout_model.compile(optimizer='adam',
                      loss='categorical_crossentropy',
                      metrics=['accuracy'])

print("✅ Model with Dropout built")
print(f"Parameters: {dropout_model.count_params():,}")

In [None]:
# Train with dropout
print("🚀 Training with Dropout...\n")

dropout_history = dropout_model.fit(
    x_train, y_train_cat,
    batch_size=128,
    epochs=15,
    validation_split=0.1,
    verbose=1
)

dropout_test_loss, dropout_test_acc = dropout_model.evaluate(x_test, y_test_cat, verbose=0)
print(f"\n📊 Dropout Model Test Accuracy: {dropout_test_acc:.2%}")

---

## Part 4: Technique 2 - Batch Normalization

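**Batch Normalization:** Normalize each feature (each channel, for conv layers) using the mean and variance of the current mini-batch, then apply a learned scale and shift.

**Benefits:**
- Faster training (can use higher learning rates)
- Reduces sensitivity to initialization
- Acts as mild regularization (batch statistics add slight noise)
- Allows deeper networks

**Where to place:** After Conv2D and before the activation (the pattern used in this notebook); placing it after the activation is also common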
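
To make the normalization step concrete, here is a minimal NumPy sketch of the training-mode forward pass on a `(batch, features)` array. The function name `batch_norm_train` and the synthetic input are assumptions for illustration; the sketch leaves out the moving averages that Keras tracks for use at inference, and for conv feature maps the statistics would be taken per channel over the batch and spatial axes.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Normalize each feature with the statistics of the current mini-batch."""
    mean = x.mean(axis=0)                      # per-feature batch mean
    var = x.var(axis=0)                        # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)    # zero mean, roughly unit variance
    return gamma * x_hat + beta                # learned scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(128, 4))  # badly scaled inputs
gamma = np.ones(4)   # initial scale
beta = np.zeros(4)   # initial shift

y = batch_norm_train(x, gamma, beta)
print("Input mean per feature: ", x.mean(axis=0).round(2))
print("Output mean per feature:", y.mean(axis=0).round(2))
print("Output std per feature: ", y.std(axis=0).round(2))
```

With `gamma` at 1 and `beta` at 0, the output of each feature has mean near 0 and standard deviation near 1 regardless of how the input was scaled.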

In [None]:
# Build CNN with Batch Normalization
batchnorm_model = Sequential([
    Conv2D(64, (3, 3), padding='same', input_shape=(28, 28, 1)),
    BatchNormalization(),
    tf.keras.layers.Activation('relu'),
    Conv2D(64, (3, 3), padding='same'),
    BatchNormalization(),
    tf.keras.layers.Activation('relu'),
    MaxPooling2D((2, 2)),
    
    Conv2D(128, (3, 3), padding='same'),
    BatchNormalization(),
    tf.keras.layers.Activation('relu'),
    Conv2D(128, (3, 3), padding='same'),
    BatchNormalization(),
    tf.keras.layers.Activation('relu'),
    MaxPooling2D((2, 2)),
    
    Conv2D(256, (3, 3), padding='same'),
    BatchNormalization(),
    tf.keras.layers.Activation('relu'),
    Conv2D(256, (3, 3), padding='same'),
    BatchNormalization(),
    tf.keras.layers.Activation('relu'),
    MaxPooling2D((2, 2)),
    
    Flatten(),
    Dense(512),
    BatchNormalization(),
    tf.keras.layers.Activation('relu'),
    Dense(256),
    BatchNormalization(),
    tf.keras.layers.Activation('relu'),
    Dense(10, activation='softmax')
], name='CNN_with_BatchNorm')

batchnorm_model.compile(optimizer='adam',
                        loss='categorical_crossentropy',
                        metrics=['accuracy'])

print("✅ Model with Batch Normalization built")
print(f"Parameters: {batchnorm_model.count_params():,}")

In [None]:
# Train with batch normalization
print("🚀 Training with Batch Normalization...\n")

batchnorm_history = batchnorm_model.fit(
    x_train, y_train_cat,
    batch_size=128,
    epochs=15,
    validation_split=0.1,
    verbose=1
)

batchnorm_test_loss, batchnorm_test_acc = batchnorm_model.evaluate(x_test, y_test_cat, verbose=0)
print(f"\n📊 BatchNorm Model Test Accuracy: {batchnorm_test_acc:.2%}")

---

## Part 5: Combining Everything - Best Practices

**Modern CNN Best Practices:**
1. Batch Normalization after Conv2D
2. Dropout after pooling and dense layers
3. Global Average Pooling instead of Flatten
4. Early Stopping callback
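
A quick parameter count shows why item 3 matters. The sketch below assumes the feature map entering the classifier is 3x3x256 (a 28x28 input after three 2x2 max-pools with the default padding) and compares the same `Dense(256)` layer placed after each option; `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Parameters of a Dense layer: one weight per input-unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units

# Feature map entering the classifier: 28 -> 14 -> 7 -> 3 after three 2x2 max-pools
height, width, channels = 3, 3, 256

flatten_features = height * width * channels  # Flatten keeps every spatial position
gap_features = channels                       # GlobalAveragePooling2D keeps one mean per channel

# Same Dense(256) layer after each option, so only the pooling choice differs
after_flatten = dense_params(flatten_features, 256)
after_gap = dense_params(gap_features, 256)

print(f"Features after Flatten: {flatten_features}")
print(f"Features after GAP:     {gap_features}")
print(f"Dense(256) after Flatten: {after_flatten:,} parameters")
print(f"Dense(256) after GAP:     {after_gap:,} parameters")
print(f"Reduction factor: {after_flatten / after_gap:.1f}x")
```

The saving comes entirely from the first dense layer seeing 256 inputs instead of 2,304; the convolutional layers are unaffected.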

In [None]:
# Build OPTIMIZED CNN with all regularization techniques
optimized_model = Sequential([
    # Block 1
    Conv2D(64, (3, 3), padding='same', input_shape=(28, 28, 1)),
    BatchNormalization(),
    tf.keras.layers.Activation('relu'),
    Conv2D(64, (3, 3), padding='same'),
    BatchNormalization(),
    tf.keras.layers.Activation('relu'),
    MaxPooling2D((2, 2)),
    Dropout(0.25),
    
    # Block 2
    Conv2D(128, (3, 3), padding='same'),
    BatchNormalization(),
    tf.keras.layers.Activation('relu'),
    Conv2D(128, (3, 3), padding='same'),
    BatchNormalization(),
    tf.keras.layers.Activation('relu'),
    MaxPooling2D((2, 2)),
    Dropout(0.25),
    
    # Block 3
    Conv2D(256, (3, 3), padding='same'),
    BatchNormalization(),
    tf.keras.layers.Activation('relu'),
    Conv2D(256, (3, 3), padding='same'),
    BatchNormalization(),
    tf.keras.layers.Activation('relu'),
    MaxPooling2D((2, 2)),
    Dropout(0.25),
    
    # Classifier
    GlobalAveragePooling2D(),  # Instead of Flatten!
    Dense(256),
    BatchNormalization(),
    tf.keras.layers.Activation('relu'),
    Dropout(0.5),
    Dense(10, activation='softmax')
], name='Optimized_CNN')

optimized_model.compile(optimizer='adam',
                        loss='categorical_crossentropy',
                        metrics=['accuracy'])

optimized_model.summary()

print(f"\n📊 Optimized Model Parameters: {optimized_model.count_params():,}")
print("\n💡 Note: GlobalAveragePooling reduces parameters significantly!")

In [None]:
# Train optimized model with Early Stopping
print("🚀 Training OPTIMIZED model (all techniques)...\n")

# Early stopping callback
early_stop = EarlyStopping(
    monitor='val_loss',
    patience=5,
    restore_best_weights=True,
    verbose=1
)

optimized_history = optimized_model.fit(
    x_train, y_train_cat,
    batch_size=128,
    epochs=30,  # Can train longer with early stopping
    validation_split=0.1,
    callbacks=[early_stop],
    verbose=1
)

optimized_test_loss, optimized_test_acc = optimized_model.evaluate(x_test, y_test_cat, verbose=0)
print(f"\n📊 Optimized Model Test Accuracy: {optimized_test_acc:.2%}")

---

## Part 6: Compare All Models

In [None]:
# Comparison plot
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

models = [
    ('Baseline (No Reg)', baseline_history, 'blue'),
    ('With Dropout', dropout_history, 'green'),
    ('With BatchNorm', batchnorm_history, 'orange'),
    ('Optimized (All)', optimized_history, 'red')
]

# Training accuracy
for name, history, color in models:
    axes[0, 0].plot(history.history['accuracy'], label=name, linewidth=2, color=color)
axes[0, 0].set_xlabel('Epoch', fontsize=11)
axes[0, 0].set_ylabel('Training Accuracy', fontsize=11)
axes[0, 0].set_title('Training Accuracy Comparison', fontsize=13, fontweight='bold')
axes[0, 0].legend(fontsize=10)
axes[0, 0].grid(True, alpha=0.3)

# Validation accuracy
for name, history, color in models:
    axes[0, 1].plot(history.history['val_accuracy'], label=name, linewidth=2, color=color)
axes[0, 1].set_xlabel('Epoch', fontsize=11)
axes[0, 1].set_ylabel('Validation Accuracy', fontsize=11)
axes[0, 1].set_title('Validation Accuracy Comparison', fontsize=13, fontweight='bold')
axes[0, 1].legend(fontsize=10)
axes[0, 1].grid(True, alpha=0.3)

# Training loss
for name, history, color in models:
    axes[1, 0].plot(history.history['loss'], label=name, linewidth=2, color=color)
axes[1, 0].set_xlabel('Epoch', fontsize=11)
axes[1, 0].set_ylabel('Training Loss', fontsize=11)
axes[1, 0].set_title('Training Loss Comparison', fontsize=13, fontweight='bold')
axes[1, 0].legend(fontsize=10)
axes[1, 0].grid(True, alpha=0.3)

# Validation loss
for name, history, color in models:
    axes[1, 1].plot(history.history['val_loss'], label=name, linewidth=2, color=color)
axes[1, 1].set_xlabel('Epoch', fontsize=11)
axes[1, 1].set_ylabel('Validation Loss', fontsize=11)
axes[1, 1].set_title('Validation Loss Comparison', fontsize=13, fontweight='bold')
axes[1, 1].legend(fontsize=10)
axes[1, 1].grid(True, alpha=0.3)

plt.suptitle('Regularization Techniques Comparison', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

In [None]:
# Final comparison table
import pandas as pd

results_data = {
    'Model': ['Baseline', 'Dropout', 'BatchNorm', 'Optimized'],
    'Test Accuracy': [
        f"{baseline_test_acc:.2%}",
        f"{dropout_test_acc:.2%}",
        f"{batchnorm_test_acc:.2%}",
        f"{optimized_test_acc:.2%}"
    ],
    'Parameters': [
        f"{baseline_model.count_params():,}",
        f"{dropout_model.count_params():,}",
        f"{batchnorm_model.count_params():,}",
        f"{optimized_model.count_params():,}"
    ],
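    # Gap at the final epoch. Training accuracy is recorded with Dropout active, and
    # EarlyStopping may restore weights from an earlier epoch, so read this as a rough guide.
    'Overfitting Gap': [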
        f"{baseline_history.history['accuracy'][-1] - baseline_history.history['val_accuracy'][-1]:.2%}",
        f"{dropout_history.history['accuracy'][-1] - dropout_history.history['val_accuracy'][-1]:.2%}",
        f"{batchnorm_history.history['accuracy'][-1] - batchnorm_history.history['val_accuracy'][-1]:.2%}",
        f"{optimized_history.history['accuracy'][-1] - optimized_history.history['val_accuracy'][-1]:.2%}"
    ]
}

df_results = pd.DataFrame(results_data)

print("\n" + "="*80)
print("REGULARIZATION TECHNIQUES - FINAL COMPARISON")
print("="*80)
print(df_results.to_string(index=False))
print("="*80)

print("\n🎯 What to check in the table above:")
print("✅ Regularization should shrink the overfitting gap relative to the baseline")
print("✅ The optimized model is expected to generalize best; confirm with its test accuracy")
print("✅ GlobalAveragePooling reduces the parameter count of the classifier head")
print("✅ Combining techniques often works better than any single one")

---

## Summary: Key Takeaways 🎯

### Regularization Techniques:

1. **✅ Dropout (0.25-0.5)**
   - Simple and effective
   - Place after pooling and dense layers
   - Higher rates (0.5) for dense layers

2. **✅ Batch Normalization**
   - Faster training
   - Better gradient flow
   - Acts as mild regularization
   - Place after Conv2D, before activation

3. **✅ Global Average Pooling**
   - Replaces Flatten + large Dense layers
   - Drastically reduces parameters
   - Better generalization

4. **✅ Early Stopping**
   - Monitor validation loss
   - Stop when no improvement
   - Restore best weights
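
The early stopping rule can be sketched in a few lines of plain Python. This is a simplified illustration of the patience logic, assuming no minimum improvement threshold; the function name `early_stopping_epoch` and the validation losses are made up for the demo and are not results from this notebook.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-indexed, for a list of validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: remember this epoch and reset the counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                # No improvement for `patience` epochs in a row: stop here.
                return epoch, best_epoch
    # Ran out of epochs without triggering the rule.
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses, one per epoch
val_losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.34, 0.38, 0.39, 0.40, 0.41, 0.42, 0.43]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=5)

print(f"Training stops after epoch index {stop_epoch}")
print(f"Best weights come from epoch index {best_epoch}")
```

Restoring the best weights means the final model corresponds to `best_epoch`, not to the epoch at which training stopped.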

### Best Practice Architecture Pattern:

```
Conv2D → BatchNorm → ReLU → Conv2D → BatchNorm → ReLU → MaxPool → Dropout(0.25)
   ↓
Repeat 2-3 times with increasing filters
   ↓
GlobalAveragePooling2D
   ↓
Dense → BatchNorm → ReLU → Dropout(0.5) → Dense(10, softmax)
```

### When to Use What:

| Technique | When to Use |
|-----------|-------------|
| Dropout | Most models that overfit (use lower rates on small models) |
| BatchNorm | Deep networks, or when training is slow or unstable |
| GlobalAvgPool | When spatial info not critical |
| Early Stopping | Whenever a validation set is available, especially when training is long |
| L2 Regularization | When model is still overfitting |
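
L2 regularization is the one technique in this table that the notebook does not train with, so here is a minimal NumPy sketch of the idea: the penalty is a strength `lam` times the sum of squared weights, added to the data loss. The helper `l2_penalty`, the value of `lam`, the toy kernels, and the data loss are all hypothetical values chosen for illustration.

```python
import numpy as np

def l2_penalty(weights, lam):
    """L2 penalty: lam times the sum of squared weights."""
    return lam * float(np.sum(np.square(weights)))

lam = 1e-4        # hypothetical regularization strength
data_loss = 0.30  # hypothetical cross-entropy loss, identical for both cases

small_kernel = np.full((3, 3), 0.1)  # small weights
large_kernel = np.full((3, 3), 1.0)  # large weights

small_total = data_loss + l2_penalty(small_kernel, lam)
large_total = data_loss + l2_penalty(large_kernel, lam)

print(f"Penalty for small weights: {l2_penalty(small_kernel, lam):.6f}")
print(f"Penalty for large weights: {l2_penalty(large_kernel, lam):.6f}")
print(f"Total loss (small weights): {small_total:.6f}")
print(f"Total loss (large weights): {large_total:.6f}")
```

For the same data loss, the larger weights give a higher total loss, so the optimizer is nudged toward smaller weights. In Keras the penalty is attached per layer through the `kernel_regularizer` argument, for example `kernel_regularizer=l2(1e-4)` with the `l2` already imported at the top of this notebook.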

---

## Practice Exercises 📝

1. **Experiment:** Remove dropout from the optimized model. How do test accuracy and the overfitting gap change?

2. **Experiment:** Try different dropout rates (0.1, 0.3, 0.7). What's optimal?

3. **Challenge:** Add L2 regularization to the conv layers using `kernel_regularizer=l2(...)` (`l2` is already imported). Does it help?

4. **Analysis:** Why does GlobalAveragePooling reduce parameters so much?

---

## Next: Notebook 6 - Data Augmentation 🔄

**Coming up:** Generate more training data synthetically to further improve accuracy!

---

*⏱️ Time spent: ~30 minutes*  
*💪 Difficulty: Intermediate-Advanced*  
*🎓 Mastery: Regularization techniques*