# CNN Overfitting Solution - Quiz 4 Question 9

This notebook trains a CNN with the following modifications and checks whether they resolve the overfitting seen in the original model:
1. Replace the current convolutional layer (16 3x3 filters) with a convolutional layer that has 32 3x3 filters, using valid padding, followed by ReLU activation
2. Replace the fully-connected layer (100 units with ReLU activation) with a fully-connected layer that has 150 units (and ReLU activation)
3. Modify the training to last 20 epochs, rather than 10

*Training may take a while; do not alter other settings.*
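
The first two modifications enlarge the network, so it can help to see by how much before training. The sketch below counts trainable parameters by hand for the architecture used in this notebook (one 3x3 convolution with valid padding, 2x2 max pooling, flatten, one hidden dense layer, 10-way output) on 32x32x3 inputs. The helper names are illustrative and are not part of the notebook's own code.

```python
def conv2d_params(in_channels, filters, kernel=3):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(in_units, out_units):
    """Weight matrix plus one bias per output unit."""
    return in_units * out_units + out_units


def cnn_param_count(filters, dense_units, size=32, channels=3,
                    kernel=3, num_classes=10):
    """Trainable parameters for conv -> 2x2 max pool -> flatten -> dense -> output."""
    conv_out = size - kernel + 1          # valid padding: 32 -> 30
    pooled = conv_out // 2                # 2x2 max pooling: 30 -> 15
    flat = pooled * pooled * filters      # units entering the dense layer
    return (conv2d_params(channels, filters, kernel)
            + dense_params(flat, dense_units)
            + dense_params(dense_units, num_classes))


original_params = cnn_param_count(filters=16, dense_units=100)
modified_params = cnn_param_count(filters=32, dense_units=150)

print(f"Original model: {original_params:,} parameters")
print(f"Modified model: {modified_params:,} parameters")
print(f"Ratio: {modified_params / original_params:.2f}x")
```

These totals should match the "Total params" line that `model.summary()` prints for each model. Nearly all of the increase comes from the dense layer, whose input doubles (3,600 to 7,200 units) while its width grows from 100 to 150.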

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical
import random

# Set random seeds for reproducibility
random.seed(42)
np.random.seed(42)
tf.random.set_seed(42)

print(f'TensorFlow version: {tf.__version__}')
print(f'Keras version: {keras.__version__}')

In [None]:
# Load and prepare the data (using CIFAR-10 as an example)
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Normalize pixel values to [0, 1]
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# One-hot encode the integer class labels
num_classes = 10
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)

print(f'Training data shape: {x_train.shape}')
print(f'Training labels shape: {y_train.shape}')
print(f'Test data shape: {x_test.shape}')
print(f'Test labels shape: {y_test.shape}')

In [None]:
# Original CNN model (with overfitting issues)
def create_original_model():
    model = keras.Sequential([
        layers.Conv2D(16, (3, 3), activation='relu', input_shape=(32, 32, 3)),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(100, activation='relu'),
        layers.Dense(num_classes, activation='softmax')
    ])
    return model

# Display original model architecture
original_model = create_original_model()
print("Original Model Architecture:")
original_model.summary()

In [None]:
# Modified CNN model to address overfitting
def create_modified_model():
    model = keras.Sequential([
        # Modified: 32 3x3 filters with valid padding instead of 16 filters
        layers.Conv2D(32, (3, 3), padding='valid', activation='relu', input_shape=(32, 32, 3)),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        # Modified: 150 units instead of 100 units
        layers.Dense(150, activation='relu'),
        layers.Dense(num_classes, activation='softmax')
    ])
    return model

# Create and display modified model
model = create_modified_model()
print("Modified Model Architecture:")
model.summary()

In [None]:
# Compile the model
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

print("Model compiled successfully!")

In [None]:
# Train the model for 20 epochs (instead of 10)
print("Starting training for 20 epochs...")

history = model.fit(
    x_train, y_train,
    batch_size=32,
    epochs=20,  # Modified: 20 epochs instead of 10
    validation_data=(x_test, y_test),  # Note: the test set doubles as the validation set here
    verbose=1
)

print("Training completed!")

In [None]:
# Plot training history to visualize overfitting behavior
plt.figure(figsize=(12, 4))

# Plot training & validation accuracy
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training Accuracy', marker='.')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy', marker='.')
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)

# Plot training & validation loss
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training Loss', marker='.')
plt.plot(history.history['val_loss'], label='Validation Loss', marker='.')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()

In [None]:
# Evaluate the model
test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
print(f"\nFinal Test Accuracy: {test_accuracy:.4f}")
print(f"Final Test Loss: {test_loss:.4f}")

# Check for overfitting by comparing final training vs validation metrics
final_train_acc = history.history['accuracy'][-1]
final_val_acc = history.history['val_accuracy'][-1]
final_train_loss = history.history['loss'][-1]
final_val_loss = history.history['val_loss'][-1]

print(f"\nFinal Training Accuracy: {final_train_acc:.4f}")
print(f"Final Validation Accuracy: {final_val_acc:.4f}")
print(f"Accuracy Gap: {final_train_acc - final_val_acc:.4f}")

print(f"\nFinal Training Loss: {final_train_loss:.4f}")
print(f"Final Validation Loss: {final_val_loss:.4f}")
print(f"Loss Gap: {final_val_loss - final_train_loss:.4f}")

In [None]:
# Analysis of overfitting behavior
print("\n=== OVERFITTING ANALYSIS ===")
print("\nModifications made to address overfitting:")
print("1. ✓ Replaced convolutional layer (16 3x3 filters) with 32 3x3 filters using valid padding")
print("2. ✓ Replaced fully-connected layer (100 units) with 150 units (both with ReLU activation)")
print("3. ✓ Modified training to last 20 epochs instead of 10")

# Determine if overfitting has been resolved
acc_gap = final_train_acc - final_val_acc
loss_gap = final_val_loss - final_train_loss

print("\nOverfitting Assessment:")
if acc_gap < 0.05 and loss_gap < 0.1:
    print("✓ RESOLVED: The gap between training and validation metrics is small.")
    overfitting_resolved = True
elif acc_gap < 0.1 and loss_gap < 0.2:
    print("⚠ PARTIALLY RESOLVED: Some overfitting remains but it's reduced.")
    overfitting_resolved = False
else:
    print("✗ NOT RESOLVED: Significant overfitting still present.")
    overfitting_resolved = False

print(f"\nAnswer to Quiz Question: {overfitting_resolved}")

## Summary

This notebook implements the three modifications specified in the quiz question:

1. **Convolutional Layer**: Replaced 16 3x3 filters with 32 3x3 filters using valid padding, followed by ReLU activation
2. **Fully-Connected Layer**: Replaced 100-unit layer with 150-unit layer (both with ReLU activation)
3. **Training Duration**: Extended from 10 to 20 epochs

The final training and validation accuracy and loss are then compared; the size of the gaps between them determines whether the modifications resolve the overfitting.
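
Final-epoch gaps are only one view of overfitting. As a complementary check, the sketch below finds the epoch where validation loss was lowest and reports how many epochs training continued past it. It assumes a dict shaped like Keras's `history.history`; the function name `summarize_history` and the numbers in `toy_history` are made up for illustration and are not results from this notebook.

```python
def summarize_history(history):
    """Summarize a Keras-style history dict (lists keyed by metric name)."""
    val_loss = history['val_loss']
    # Index of the epoch with the lowest validation loss
    best_idx = min(range(len(val_loss)), key=val_loss.__getitem__)
    return {
        'best_epoch': best_idx + 1,  # 1-based, matching Keras's progress output
        'epochs_past_best': len(val_loss) - 1 - best_idx,
        'final_acc_gap': history['accuracy'][-1] - history['val_accuracy'][-1],
        'final_loss_gap': val_loss[-1] - history['loss'][-1],
    }


# Illustrative values only: training loss keeps falling while
# validation loss turns upward after the third epoch.
toy_history = {
    'loss':         [1.5, 1.1, 0.8, 0.5, 0.3],
    'val_loss':     [1.4, 1.2, 1.1, 1.3, 1.6],
    'accuracy':     [0.45, 0.60, 0.72, 0.83, 0.91],
    'val_accuracy': [0.48, 0.57, 0.61, 0.60, 0.58],
}

summary = summarize_history(toy_history)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In the notebook itself, this could be called as `summarize_history(history.history)` after training. A best epoch well before the last one, together with large final gaps, suggests that the extra epochs mostly fit the training set rather than improving generalization.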