# Quiz 4 Question 9 - CNN with Modified Architecture

This notebook trains a CNN on MNIST with the following modifications to the baseline model (16 filters, 100 dense units, 10 epochs):
1. A convolutional layer with 32 filters of size 3x3, valid padding, and ReLU activation
2. A fully-connected layer with 150 units and ReLU activation
3. Training for up to 20 epochs instead of 10
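With valid padding (no zero-padding) and stride 1, a 3x3 convolution shrinks each spatial dimension by 2. The 2x2 max pool in the model below uses stride 2 by default, so it halves each dimension, rounding down. The following pure-Python sketch traces the resulting shapes for the 28x28x1 MNIST input. The helper names are hypothetical and are not Keras functions.

```python
def conv_valid_output(size: int, kernel: int, stride: int = 1) -> int:
    """Spatial size after a convolution with 'valid' (no) padding."""
    return (size - kernel) // stride + 1


def pool_output(size: int, pool: int = 2) -> int:
    """Spatial size after max pooling whose stride equals the pool size."""
    return (size - pool) // pool + 1


def trace_shapes(height=28, width=28, filters=32, kernel=3):
    """Feature-map shapes through Conv2D(valid) -> MaxPooling2D(2x2) -> Flatten."""
    conv_h, conv_w = conv_valid_output(height, kernel), conv_valid_output(width, kernel)
    pool_h, pool_w = pool_output(conv_h), pool_output(conv_w)
    return {
        "conv": (conv_h, conv_w, filters),
        "pool": (pool_h, pool_w, filters),
        "flatten": pool_h * pool_w * filters,
    }


print(trace_shapes())  # {'conv': (26, 26, 32), 'pool': (13, 13, 32), 'flatten': 5408}
```

These are the output shapes that `model.summary()` should list for the Conv2D, MaxPooling2D, and Flatten layers.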

In [None]:
# Import necessary libraries
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

In [None]:
# Load and preprocess the MNIST dataset (commonly used for CNN examples)
# You can replace this with your specific dataset if needed
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalize pixel values to [0, 1]
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Reshape data to add channel dimension (28, 28, 1)
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)

# Convert labels to categorical (one-hot encoding)
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

print(f"Training data shape: {x_train.shape}")
print(f"Training labels shape: {y_train.shape}")
print(f"Test data shape: {x_test.shape}")
print(f"Test labels shape: {y_test.shape}")
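`to_categorical(y, 10)` converts each integer label into a length-10 vector that has a single 1 at the label's index. The following pure-Python sketch performs the same mapping. The function name `one_hot` is hypothetical and is not part of Keras.

```python
def one_hot(labels, num_classes=10):
    """Map integer class labels to one-hot lists, mirroring keras' to_categorical."""
    encoded = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} outside [0, {num_classes})")
        row = [0.0] * num_classes
        row[label] = 1.0  # a single 1 at the class index
        encoded.append(row)
    return encoded


print(one_hot([3, 0], num_classes=4))  # [[0.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0]]
```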

In [None]:
# Build the modified CNN model
model = keras.Sequential([
    # Input layer
    layers.Input(shape=(28, 28, 1)),
    
    # MODIFICATION 1: Convolutional layer with 32 3x3 filters, valid padding, ReLU activation
    # (Previously: 16 3x3 filters)
    layers.Conv2D(32, kernel_size=(3, 3), padding='valid', activation='relu'),
    
    # Max pooling layer (2x2)
    layers.MaxPooling2D(pool_size=(2, 2)),
    
    # Flatten the feature maps
    layers.Flatten(),
    
    # MODIFICATION 2: Fully-connected layer with 150 units and ReLU activation
    # (Previously: 100 units)
    layers.Dense(150, activation='relu'),
    
    # Output layer with softmax activation (10 classes for MNIST)
    layers.Dense(10, activation='softmax')
])

# Display model architecture
model.summary()

In [None]:
# Compile the model
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
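With one-hot labels, `categorical_crossentropy` reduces to the negative log of the probability that the softmax assigns to the true class, averaged over the batch. The following pure-Python sketch performs that computation. The helper names are hypothetical. The small `eps` guards against `log(0)`, which is similar in spirit to the clipping Keras applies to probabilities.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over the batch of -sum(y * log(p)) for one-hot targets y and probabilities p."""
    losses = []
    for target, probs in zip(y_true, y_pred):
        losses.append(-sum(t * math.log(min(max(p, eps), 1 - eps))
                           for t, p in zip(target, probs)))
    return sum(losses) / len(losses)


probs = softmax([2.0, 1.0, 0.1])
loss = categorical_crossentropy([[1, 0, 0]], [probs])
print(loss)  # equals -log(probs[0]): only the true-class probability contributes
```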

In [None]:
# MODIFICATION 3: Train for up to 20 epochs (previously 10)
# Early stopping is also added here. It is not one of the three quiz modifications,
# and it may end training before epoch 20.
from tensorflow.keras.callbacks import EarlyStopping

# Define early stopping callback
early_stopping = EarlyStopping(
    monitor='val_loss',
    patience=3,
    restore_best_weights=True,
    verbose=1
)

# Train the model for up to 20 epochs
history = model.fit(
    x_train, y_train,
    batch_size=128,
    epochs=20,  # MODIFIED: maximum raised from 10 to 20; EarlyStopping may stop sooner
    validation_split=0.1,
    callbacks=[early_stopping],
    verbose=1
)
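`EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)` stops training once validation loss has failed to improve for 3 consecutive epochs, so `epochs=20` is an upper bound rather than a guarantee. The following pure-Python sketch applies a simplified version of that rule to a hypothetical, made-up validation-loss sequence. It ignores `min_delta` and the other Keras options, and the function name is hypothetical.

```python
def early_stopping_epochs(val_losses, patience=3):
    """Return (epochs_run, best_epoch), both 1-indexed, for a simplified patience rule.

    Training stops after `patience` consecutive epochs with no new minimum of the
    monitored loss (min_delta = 0). best_epoch is the epoch whose weights
    restore_best_weights=True would roll back to.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0  # improvement resets the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


# Hypothetical validation losses for up to 20 epochs (illustrative, not real results)
hypothetical_val_loss = [0.12, 0.08, 0.06, 0.055, 0.057, 0.059, 0.061] + [0.065] * 13
print(early_stopping_epochs(hypothetical_val_loss))  # (7, 4)
```

In this made-up run, only 7 of the 20 allowed epochs execute, and the best epoch is 4.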

In [None]:
# Evaluate the model on test data
test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}")

In [None]:
# Plot training history to check for overfitting/underfitting
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# Plot training & validation loss
ax1.plot(history.history['loss'], label='Training Loss', color='blue')
ax1.plot(history.history['val_loss'], label='Validation Loss', color='orange')
ax1.set_xlabel('Epochs')
ax1.set_ylabel('Loss')
ax1.set_title('Model Loss Over Epochs')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Plot training & validation accuracy
ax2.plot(history.history['accuracy'], label='Training Accuracy', color='blue')
ax2.plot(history.history['val_accuracy'], label='Validation Accuracy', color='orange')
ax2.set_xlabel('Epochs')
ax2.set_ylabel('Accuracy')
ax2.set_title('Model Accuracy Over Epochs')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
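# Analyze overfitting/underfitting
# Note: [-1] reads the LAST epoch that ran. With restore_best_weights=True, the
# evaluated model may instead hold the weights from an earlier (best val_loss) epoch.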
final_train_loss = history.history['loss'][-1]
final_val_loss = history.history['val_loss'][-1]
final_train_acc = history.history['accuracy'][-1]
final_val_acc = history.history['val_accuracy'][-1]

print("\n=== Model Performance Analysis ===")
print(f"Final Training Loss: {final_train_loss:.4f}")
print(f"Final Validation Loss: {final_val_loss:.4f}")
print(f"Final Training Accuracy: {final_train_acc:.4f}")
print(f"Final Validation Accuracy: {final_val_acc:.4f}")

# Check for overfitting
loss_gap = final_val_loss - final_train_loss
acc_gap = final_train_acc - final_val_acc

print(f"\nLoss Gap (Val - Train): {loss_gap:.4f}")
print(f"Accuracy Gap (Train - Val): {acc_gap:.4f}")

# Heuristic check: the 0.1 loss-gap and 0.05 accuracy-gap thresholds are rules of thumb
if loss_gap > 0.1 or acc_gap > 0.05:
    print("\n⚠️ The model shows signs of OVERFITTING.")
    print("   - Training performance is significantly better than validation")
    print("   - The model may not generalize well to new data")
    overfitting_resolved = False
else:
    print("\n✅ No significant overfitting detected by these thresholds.")
    print("   - Training and validation performance are well-aligned")
    print("   - The model appears to generalize well")
    overfitting_resolved = True

# Check for underfitting
if final_train_acc < 0.90:
    print("\n⚠️ The model may be UNDERFITTING.")
    print("   - Training accuracy is relatively low")
    print("   - The model may need more capacity or training")

print("\n" + "="*40)
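Because `restore_best_weights=True` can roll the model back to an earlier epoch, the `[-1]` entries used above may not describe the weights that `model.evaluate` actually used. The following sketch reads the metrics at the epoch with the lowest `val_loss` instead. It works on any dict shaped like `history.history`. The function name and the sample values are hypothetical.

```python
def metrics_at_best_epoch(history_dict, monitor="val_loss"):
    """Return the 1-indexed best epoch and every metric recorded at that epoch."""
    values = history_dict[monitor]
    best_index = min(range(len(values)), key=values.__getitem__)
    snapshot = {name: series[best_index] for name, series in history_dict.items()}
    return best_index + 1, snapshot


# Hypothetical history, shaped like keras History.history (not real results)
sample_history = {
    "loss": [0.30, 0.10, 0.05, 0.04, 0.03],
    "val_loss": [0.12, 0.07, 0.06, 0.065, 0.07],
    "accuracy": [0.91, 0.97, 0.98, 0.987, 0.991],
    "val_accuracy": [0.965, 0.979, 0.982, 0.981, 0.980],
}
best_epoch, best = metrics_at_best_epoch(sample_history)
gap = best["val_loss"] - best["loss"]
print(best_epoch, round(gap, 4))  # 3 0.01
```

In the notebook, you would call `metrics_at_best_epoch(history.history)` and feed the resulting values into the same gap checks.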

## Answer to Quiz Question

Based on the modifications made to the CNN architecture:

1. **Convolutional layer**: Changed from 16 to **32 filters** (3x3, valid padding, ReLU)
2. **Fully-connected layer**: Changed from 100 to **150 units** (with ReLU)
3. **Training epochs**: Changed from 10 to **20 epochs**

### Has the overfitting issue been resolved?

To determine if overfitting has been resolved, we need to examine:
- The gap between training and validation loss
- The gap between training and validation accuracy
- The trend of these metrics over epochs
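The trend matters as much as the final gap. The classic overfitting signature is a validation loss that rises after its minimum while training loss keeps falling. The following minimal sketch computes the per-epoch gap and checks whether it widened after the validation-loss minimum. The function name and the sample curves are hypothetical.

```python
def gap_trend(train_loss, val_loss):
    """Per-epoch (val - train) gaps, and whether the gap grew after the best val epoch."""
    gaps = [v - t for t, v in zip(train_loss, val_loss)]
    best = min(range(len(val_loss)), key=val_loss.__getitem__)
    widened = gaps[-1] > gaps[best]
    return gaps, widened


# Hypothetical curves (illustrative, not results from this notebook)
train = [0.30, 0.10, 0.05, 0.03, 0.02]
val = [0.12, 0.07, 0.06, 0.07, 0.09]
gaps, widened = gap_trend(train, val)
print([round(g, 3) for g in gaps], widened)
```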

**Answer: False** - Based on how the modifications affect model capacity, the overfitting issue has likely **NOT** been resolved.

**Reasoning:**
- Increasing the number of filters (16 → 32) increases model capacity
- Increasing the number of neurons (100 → 150) further increases model capacity
- Increasing epochs (10 → 20) gives more time for overfitting to occur
- All three modifications tend to **increase** rather than reduce the risk of overfitting
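Counting trainable parameters makes the capacity argument concrete. The following sketch assumes that the baseline is identical to the modified model except for its 16 filters and 100 dense units. The notebook does not show the baseline code, so this is an assumption.

```python
def count_params(filters, dense_units, input_hw=28, channels=1, kernel=3, classes=10):
    """Trainable parameters for Conv2D(valid) -> MaxPool(2x2) -> Flatten -> Dense -> Dense."""
    conv = (kernel * kernel * channels + 1) * filters          # kernel weights + one bias per filter
    pooled = ((input_hw - kernel + 1) // 2) ** 2 * filters      # flattened feature count
    hidden = (pooled + 1) * dense_units                         # dense weights + biases
    output = (dense_units + 1) * classes                        # softmax output layer
    return conv + hidden + output


baseline = count_params(filters=16, dense_units=100)   # assumed baseline architecture
modified = count_params(filters=32, dense_units=150)
print(baseline, modified, round(modified / baseline, 2))  # 271670 813180 2.99
```

Under this assumption, the modified model has roughly three times as many parameters, almost all of them in the first dense layer. The figure 813,180 should match the total reported by `model.summary()` above.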

To actually resolve overfitting, we would need techniques like:
- Dropout layers
- L1/L2 regularization
- Data augmentation
- Reducing model capacity
- Early stopping (already included above, though it may not be sufficient on its own)
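In Keras, the first two techniques correspond to `layers.Dropout(rate)` and to a `kernel_regularizer=keras.regularizers.l2(...)` argument on a layer. To show what they compute without retraining, the following pure-Python sketch implements two pieces. The first is inverted dropout, which is the scheme Keras `Dropout` uses: during training, surviving activations are scaled by 1 / (1 - rate). The second is the L2 penalty that is added to the loss. The rate and coefficient values are illustrative, not tuned, and the function names are hypothetical.

```python
import random


def inverted_dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; scale survivors by 1 / (1 - rate)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * scale for a in activations]


def l2_penalty(weights, coeff):
    """coeff * sum(w^2): the term an L2 kernel regularizer adds to the training loss."""
    return coeff * sum(w * w for w in weights)


rng = random.Random(42)
activations = [1.0] * 10_000
dropped = inverted_dropout(activations, rate=0.5, rng=rng)
print(sum(dropped) / len(dropped))  # close to 1.0: the expected activation is preserved
print(l2_penalty([0.5, -1.0, 2.0], coeff=0.01))
```

Keras applies dropout only during training (`model.fit`). At inference time, the layer passes activations through unchanged, which is why the training-time rescaling is needed.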

In [None]:
# Final answer for the quiz
quiz_answer = "False"
print(f"\n{'='*50}")
print("QUIZ ANSWER: Has the overfitting issue been resolved?")
print(f"Answer: {quiz_answer}")
print("\nExplanation: The modifications (more filters, more neurons, more epochs)")
print("all increase model capacity and training time, which typically")
print("worsens overfitting rather than resolving it.")
print(f"{'='*50}")