# Tutorial T10: Building a CNN for Fashion-MNIST Classification

**Week 10, Day 4 - October 29, 2025**  
**Deep Neural Network Architectures (21CSE558T)**

---

## Learning Objectives

By the end of this tutorial, you will be able to:

1. 📥 Load and preprocess the Fashion-MNIST dataset
2. 🏗️ Build a Convolutional Neural Network (CNN) using the Keras Sequential API
3. 🎯 Train the CNN and visualize training history
4. 📊 Evaluate model performance on test data
5. 🔍 Visualize learned filters and feature maps
6. ⚖️ Compare CNN performance with a traditional MLP

---

## Fashion-MNIST Dataset

- **60,000** training images + **10,000** test images
- **28×28** grayscale images
- **10 classes**: T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, Ankle boot
- Created as a more challenging replacement for MNIST digits

---

## CNN Architecture Overview

```
Input (28×28×1)
    ↓
Conv2D (32 filters, 3×3) → ReLU → MaxPool (2×2)
    ↓
Conv2D (64 filters, 3×3) → ReLU → MaxPool (2×2)
    ↓
Flatten → Dense (64) → ReLU
    ↓
Output (10 classes) → Softmax
```

**Expected Accuracy**: ~90% on test set

---

Let's get started! 🚀

In [None]:
# Import required libraries
import os
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Create output directory
os.makedirs('output', exist_ok=True)

# Print versions
print(f"TensorFlow version: {tf.__version__}")
print(f"NumPy version: {np.__version__}")
print("\n✓ Setup complete!")

---

## Part 1: Load Fashion-MNIST Dataset

Keras provides built-in access to Fashion-MNIST. The dataset will automatically download on first use (~30MB).

In [None]:
# Load Fashion-MNIST dataset
(X_train, y_train), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()

# Define class names
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Print dataset information
print("Dataset Information:")
print("=" * 50)
print(f"Training data shape: {X_train.shape}")
print(f"Training labels shape: {y_train.shape}")
print(f"Test data shape: {X_test.shape}")
print(f"Test labels shape: {y_test.shape}")
print(f"Number of classes: {len(class_names)}")
print(f"Pixel value range: [{X_train.min()}, {X_train.max()}]")
print("=" * 50)

### Visualize Sample Images

Let's look at the first 10 training images to understand what we're working with. (These are not guaranteed to cover every class.)

In [None]:
# Visualize first 10 training images
plt.figure(figsize=(12, 3))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(X_train[i], cmap='gray')
    plt.title(class_names[y_train[i]], fontsize=9)
    plt.axis('off')
plt.suptitle('Fashion-MNIST Sample Images', fontsize=12, y=1.02)
plt.tight_layout()
plt.show()

---

## Part 2: Data Preprocessing

Before training, we need to:

1. **Normalize** pixel values from [0, 255] to [0, 1]
2. **Reshape** data to add channel dimension for CNN input
3. **One-hot encode** labels for multi-class classification

### Why normalize?
- Helps gradient descent converge faster
- Prevents numerical instability
- Standard practice in deep learning

In [None]:
# Normalize pixel values to [0, 1] range
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0

print(f"After normalization: [{X_train.min():.2f}, {X_train.max():.2f}]")

# Reshape for CNN input (add channel dimension)
# CNN expects: (samples, height, width, channels)
X_train = X_train.reshape(-1, 28, 28, 1)
X_test = X_test.reshape(-1, 28, 28, 1)

print(f"\nReshaped X_train: {X_train.shape}")
print(f"Reshaped X_test: {X_test.shape}")

# Convert labels to one-hot encoding
y_train_categorical = keras.utils.to_categorical(y_train, 10)
y_test_categorical = keras.utils.to_categorical(y_test, 10)

print(f"\nOne-hot encoded y_train: {y_train_categorical.shape}")
print(f"Example: Label {y_train[0]} → {y_train_categorical[0]}")
print("\n✓ Preprocessing complete!")

---

## Part 3: Build CNN Architecture

### Layer-by-Layer Breakdown:

1. **Conv2D(32, 3×3)**: First convolutional layer with 32 filters
   - Learns 32 different 3×3 patterns (edges, textures)
   - Output: 26×26×32

2. **MaxPooling2D(2×2)**: Downsampling layer
   - Reduces spatial dimensions by half
   - Output: 13×13×32

3. **Conv2D(64, 3×3)**: Second convolutional layer with 64 filters
   - Learns more complex patterns from first layer features
   - Output: 11×11×64

4. **MaxPooling2D(2×2)**: Second downsampling
   - Output: 5×5×64

5. **Flatten**: Convert 2D feature maps to 1D vector (5×5×64 = 1,600)

6. **Dense(64)**: Fully connected layer for high-level reasoning

7. **Dense(10, softmax)**: Output layer for 10 classes

### Total Parameters: ~122,000

In [None]:
# Create Sequential model
model = keras.Sequential([
    # Explicit input: 28×28 grayscale images (one channel)
    keras.Input(shape=(28, 28, 1)),

    # First convolutional block
    layers.Conv2D(32, (3, 3), activation='relu', name='conv1'),
    layers.MaxPooling2D((2, 2), name='pool1'),
    
    # Second convolutional block
    layers.Conv2D(64, (3, 3), activation='relu', name='conv2'),
    layers.MaxPooling2D((2, 2), name='pool2'),
    
    # Fully connected layers
    layers.Flatten(name='flatten'),
    layers.Dense(64, activation='relu', name='dense1'),
    layers.Dense(10, activation='softmax', name='output')
], name='Fashion_MNIST_CNN')

# Display model architecture
model.summary()

### 📐 Output Dimension Calculations

**Formula for convolution output size:**
```
output_size = floor((input_size - kernel_size + 2 × padding) / stride) + 1
```

**Formula for pooling output size:**
```
output_size = floor(input_size / pool_size)    (stride = pool_size, padding='valid')
```
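
The two formulas can be turned into small helper functions. The sketch below is only an illustration: the names `conv_output_size` and `pool_output_size` are hypothetical helpers, not Keras functions. It assumes `padding='valid'` and that the pooling stride defaults to the pool size, as in the model above.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Output size of a convolution along one spatial dimension."""
    return (input_size - kernel_size + 2 * padding) // stride + 1


def pool_output_size(input_size, pool_size, stride=None):
    """Output size of an unpadded pooling layer along one spatial dimension."""
    if stride is None:
        stride = pool_size  # assumption: stride equals the pool size
    return (input_size - pool_size) // stride + 1


# Trace the spatial size through conv1 -> pool1 -> conv2 -> pool2
size = 28
trace = [size]
for kind, k in [("conv", 3), ("pool", 2), ("conv", 3), ("pool", 2)]:
    if kind == "conv":
        size = conv_output_size(size, k)
    else:
        size = pool_output_size(size, k)
    trace.append(size)

print(trace)  # [28, 26, 13, 11, 5]
```

Integer division (`//`) implements the floor, which is why the 11×11 feature map pools down to 5×5 rather than 5.5×5.5.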

In [None]:
print("=" * 60)
print("OUTPUT DIMENSION CALCULATIONS:")
print("=" * 60)
print("Input: 28×28×1\n")

print("Conv1 (32 filters, 3×3, stride=1, padding='valid'):")
print("  Output = (28 - 3 + 0) / 1 + 1 = 26×26×32")
print("  Parameters: (3×3×1 + 1) × 32 = 320\n")

print("Pool1 (2×2, stride=2):")
print("  Output = 26 / 2 = 13×13×32")
print("  Parameters: 0 (no trainable params)\n")

print("Conv2 (64 filters, 3×3, stride=1, padding='valid'):")
print("  Output = (13 - 3 + 0) / 1 + 1 = 11×11×64")
print("  Parameters: (3×3×32 + 1) × 64 = 18,496\n")

print("Pool2 (2×2, stride=2):")
print("  Output = 11 / 2 = 5×5×64 (floor)")
print("  Parameters: 0\n")

print("Flatten: 5×5×64 = 1,600 features")
print("Dense1: (1,600 + 1) × 64 = 102,464 parameters")
print("Output: (64 + 1) × 10 = 650 parameters")
print("\nTotal: ~122,000 parameters")
print("=" * 60)
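
The parameter arithmetic printed above can also be checked programmatically. This is a minimal sketch with hypothetical helper names (`conv2d_params`, `dense_params`); it assumes every layer has a bias term, which is the Keras default.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_features, units):
    """Weights plus one bias per unit."""
    return (in_features + 1) * units


counts = {
    "conv1": conv2d_params(3, 3, 1, 32),
    "conv2": conv2d_params(3, 3, 32, 64),
    "dense1": dense_params(5 * 5 * 64, 64),
    "output": dense_params(64, 10),
}
total = sum(counts.values())

print(counts)
print(total)  # 121930
```

The exact total of 121,930 is where the "~122,000" figure comes from; it should agree with the total reported by `model.summary()`.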

---

## Part 4: Compile Model

Configure the model for training:

- **Optimizer**: Adam (adaptive learning rate)
- **Loss**: Categorical crossentropy (for multi-class classification)
- **Metrics**: Accuracy
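
To make the loss concrete, here is a small NumPy sketch of categorical crossentropy on toy values. It uses 3 classes instead of 10 for brevity, the probabilities are made up for illustration, and the function is a hand-written stand-in rather than the Keras implementation. The clipping is there only to avoid `log(0)`.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample loss: -sum over classes of y_true * log(y_pred)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # guard against log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)


# Two toy samples with one-hot labels (classes 2 and 0)
y_true = np.array([[0.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0]])
# Made-up softmax outputs; each row sums to 1
y_pred = np.array([[0.1, 0.2, 0.7],
                   [0.5, 0.3, 0.2]])

per_sample = categorical_crossentropy(y_true, y_pred)
print(per_sample)         # approximately [0.357, 0.693]
print(per_sample.mean())  # batch loss is the mean over samples
```

Because the labels are one-hot, only the predicted probability of the true class contributes: the loss for each sample is `-log(p_true)`, so a confident correct prediction gives a loss near zero.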

In [None]:
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

print("✓ Model compiled successfully!")
print("  Optimizer: Adam")
print("  Loss: Categorical Crossentropy")
print("  Metrics: Accuracy")

---

## Part 5: Train Model

Train for 10 epochs with:
- 20% validation split (Keras holds out the last 20% of the training arrays, here 12,000 images, before shuffling)
- Batch size of 128

**Expected training time**:
- CPU: 2-3 minutes
- GPU: 30-45 seconds

🚀 **Click Run and watch the training progress!**

In [None]:
# Train the model
history = model.fit(
    X_train, y_train_categorical,
    epochs=10,
    batch_size=128,
    validation_split=0.2,
    verbose=1
)

print("\n✓ Training completed!")

---

## Part 6: Visualize Training History

Plot accuracy and loss curves to understand:
- How well the model learned
- Whether there's overfitting (training accuracy keeps rising while validation accuracy stalls or falls)
- When training converged
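
The same questions can be answered numerically. The sketch below assumes a dictionary with the same keys as `history.history`; the helper `summarize_history` is hypothetical and the numbers are made up purely for illustration, not results from this model.

```python
def summarize_history(hist):
    """Return the epoch with the lowest validation loss and the final accuracy gap."""
    val_loss = hist["val_loss"]
    best_index = min(range(len(val_loss)), key=val_loss.__getitem__)
    gap = hist["accuracy"][-1] - hist["val_accuracy"][-1]
    return {"best_epoch": best_index + 1, "final_gap": gap}


# Made-up curves: validation loss bottoms out at epoch 3, then rises
toy_history = {
    "accuracy":     [0.80, 0.87, 0.90, 0.93, 0.95],
    "val_accuracy": [0.84, 0.88, 0.90, 0.90, 0.89],
    "loss":         [0.55, 0.36, 0.28, 0.21, 0.16],
    "val_loss":     [0.44, 0.33, 0.29, 0.30, 0.33],
}

summary = summarize_history(toy_history)
print(summary)
```

A validation loss that starts rising while training loss keeps falling, together with a growing accuracy gap, is the usual sign of overfitting. With the real run you would pass `history.history` instead of `toy_history`.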

In [None]:
# Plot training history
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Accuracy plot
axes[0].plot(history.history['accuracy'], label='Training', 
             marker='o', linewidth=2, markersize=6)
axes[0].plot(history.history['val_accuracy'], label='Validation', 
             marker='s', linewidth=2, markersize=6)
axes[0].set_title('Model Accuracy', fontsize=12, fontweight='bold')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Accuracy')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Loss plot
axes[1].plot(history.history['loss'], label='Training', 
             marker='o', linewidth=2, markersize=6)
axes[1].plot(history.history['val_loss'], label='Validation', 
             marker='s', linewidth=2, markersize=6)
axes[1].set_title('Model Loss', fontsize=12, fontweight='bold')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Loss')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

---

## Part 7: Evaluate on Test Set

Test the model on unseen data to measure true generalization performance.

In [None]:
# Evaluate on test data
test_loss, test_accuracy = model.evaluate(X_test, y_test_categorical, verbose=0)

print("=" * 60)
print("FINAL TEST RESULTS:")
print("=" * 60)
print(f"Test Loss:     {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f} ({test_accuracy*100:.2f}%)")
print("=" * 60)

### Visualize Predictions

See what the model predicts for individual test images.
- **Green** titles = correct predictions
- **Red** titles = incorrect predictions

In [None]:
# Make predictions
predictions = model.predict(X_test[:10], verbose=0)
predicted_classes = np.argmax(predictions, axis=1)
true_classes = y_test[:10]

# Visualize
plt.figure(figsize=(15, 6))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(X_test[i].reshape(28, 28), cmap='gray')
    
    pred = predicted_classes[i]
    true = true_classes[i]
    conf = predictions[i][pred] * 100
    
    color = 'green' if pred == true else 'red'
    plt.title(f'Pred: {class_names[pred]} ({conf:.1f}%)\n'
              f'True: {class_names[true]}', 
              color=color, fontsize=9)
    plt.axis('off')

plt.suptitle('Model Predictions (Green=Correct, Red=Wrong)', 
             fontsize=13, fontweight='bold')
plt.tight_layout()
plt.show()

---

## Part 8: Visualize Learned Filters

Let's peek inside the CNN to see what filters it learned in the first convolutional layer.

These 3×3 filters are **learned automatically** from data (not hand-crafted like Sobel or Gabor filters)!
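
For contrast, here is what a hand-crafted filter does. The sketch applies a Sobel kernel to a toy 6×6 image using a plain NumPy sliding window. The helper `correlate2d_valid` is a hypothetical name, and it computes cross-correlation without flipping the kernel, which is the operation deep learning libraries call "convolution".

```python
import numpy as np


def correlate2d_valid(image, kernel):
    """Slide the kernel over the image with no padding and stride 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# Toy 6×6 image: dark left half, bright right half (one vertical edge)
image = np.zeros((6, 6), dtype=np.float32)
image[:, 3:] = 1.0

# Sobel kernel that responds to horizontal intensity changes
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float32)

response = correlate2d_valid(image, sobel_x)
print(response.shape)  # (4, 4)
print(response)        # every row is [0, 4, 4, 0]
```

The response is non-zero only where the window straddles the edge. A trained CNN arrives at filters with similar roles through gradient descent instead of having them specified by hand.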

In [None]:
# Extract first conv layer weights (look the layer up by name rather than by position)
filters, biases = model.get_layer('conv1').get_weights()
print(f"Filter shape: {filters.shape}")
print("(height, width, input_channels, output_filters)\n")

# Normalize for visualization
f_min, f_max = filters.min(), filters.max()
filters_norm = (filters - f_min) / (f_max - f_min)

# Visualize 16 filters
fig, axes = plt.subplots(4, 4, figsize=(10, 10))
axes = axes.flatten()

for i in range(16):
    filter_img = filters_norm[:, :, 0, i]
    axes[i].imshow(filter_img, cmap='viridis')
    axes[i].set_title(f'Filter {i+1}', fontsize=9)
    axes[i].axis('off')

plt.suptitle('Learned 3×3 Filters in First Conv Layer', 
             fontsize=13, fontweight='bold')
plt.tight_layout()
plt.show()

print("These filters detect edges, textures, and basic patterns!")

---

## Part 9: Visualize Feature Maps

See what happens to an image as it passes through the CNN layers.

**Hierarchical Feature Learning**:
- **Conv1**: Simple features (edges, textures)
- **Conv2**: Complex features (shapes, patterns)

This demonstrates how CNNs automatically learn a hierarchy of features!

In [None]:
# Create feature extraction model
# Note: Sequential models require an explicit Input layer for feature extraction
# We'll rebuild the connections using the trained model's layers

# Create a new input tensor
inputs = keras.Input(shape=(28, 28, 1))

# Apply each layer sequentially and collect outputs
x = inputs
layer_outputs_dict = {}
for layer in model.layers:
    x = layer(x)
    layer_outputs_dict[layer.name] = x

# Select the layers we want to visualize
layer_names = ['conv1', 'pool1', 'conv2', 'pool2']
layer_outputs = [layer_outputs_dict[name] for name in layer_names]

# Create the feature extraction model
activation_model = keras.Model(inputs=inputs, outputs=layer_outputs)

# Get activations for a sample image
sample_idx = 0
sample_image = X_test[sample_idx:sample_idx+1]
sample_label = class_names[y_test[sample_idx]]

print(f"Analyzing: {sample_label}\n")
activations = activation_model.predict(sample_image, verbose=0)

# Visualize feature maps
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

# Original image
axes[0].imshow(sample_image[0, :, :, 0], cmap='gray')
axes[0].set_title(f'Original\n({sample_label})', fontweight='bold')
axes[0].axis('off')

# Feature maps for each layer
positions = [(0, 1), (0, 2), (1, 1), (1, 2)]
for idx, (layer_name, activation) in enumerate(zip(layer_names, activations)):
    row, col = positions[idx]
    n_features = activation.shape[-1]
    size_h, size_w = activation.shape[1], activation.shape[2]
    
    # Create grid
    n_cols = 8
    n_rows = min(4, n_features // n_cols)
    display_grid = np.zeros((size_h * n_rows, size_w * n_cols))
    
    for row_idx in range(n_rows):
        for col_idx in range(n_cols):
            channel_idx = row_idx * n_cols + col_idx
            if channel_idx < n_features:
                # Copy so the in-place standardization below does not modify `activation`
                channel_image = activation[0, :, :, channel_idx].copy()
                channel_image -= channel_image.mean()
                if channel_image.std() > 0:
                    channel_image /= channel_image.std()
                channel_image = np.clip(channel_image, 0, 1)
                display_grid[row_idx * size_h:(row_idx + 1) * size_h,
                             col_idx * size_w:(col_idx + 1) * size_w] = channel_image
    
    axes[row * 3 + col].imshow(display_grid, cmap='viridis')
    axes[row * 3 + col].set_title(f'{layer_name}\n{activation.shape[1:]}', 
                                   fontweight='bold')
    axes[row * 3 + col].axis('off')

axes[3].axis('off')  # Hide unused subplot

plt.suptitle('Feature Maps: Hierarchical Learning in Action', 
             fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("\n✓ Feature maps generated successfully!")

---

## 🎓 Tutorial Complete!

### What We Accomplished:

✅ Loaded Fashion-MNIST (60,000 train + 10,000 test images)  
✅ Built CNN with 2 convolutional blocks (~122,000 parameters)  
✅ Trained model for 10 epochs (~2-3 minutes)  
✅ Achieved **~90% accuracy** on test set  
✅ Visualized training history, predictions, filters, and feature maps  

---

### 📚 Key Takeaways:

1. **CNNs excel at image tasks** due to:
   - Local connectivity (learn spatial patterns)
   - Weight sharing (parameter efficiency)
   - Translation equivariance (same features anywhere)

2. **Hierarchical feature learning**:
   - Early layers: edges, textures
   - Deeper layers: shapes, objects

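3. **Pooling layers** add:
   - Invariance to small translations (a feature that shifts within a pooling window gives the same output)
   - Dimension reduction

4. **Parameter efficiency compared with dense layers**:
   - CNN: ~122,000 parameters in total, of which the two conv layers use under 19,000
   - A dense layer producing the same 26×26×32 output as conv1 would need about 17 million weights (784 × 21,632), versus 320 parameters for conv1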
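
Takeaway 3 can be illustrated with a few lines of NumPy. This is a sketch only: `max_pool_2x2` is a hypothetical helper for a single 2D feature map with a 2×2 window and stride 2, not the Keras layer.

```python
import numpy as np


def max_pool_2x2(x):
    """2×2 max pooling with stride 2; odd trailing rows/columns are dropped."""
    h2, w2 = x.shape[0] // 2, x.shape[1] // 2
    x = x[:h2 * 2, :w2 * 2]
    return x.reshape(h2, 2, w2, 2).max(axis=(1, 3))


# The same single-pixel "feature" at three positions in a 4×4 map
a = np.zeros((4, 4)); a[0, 0] = 1.0
b = np.zeros((4, 4)); b[1, 1] = 1.0  # shifted by one pixel, same pooling window
c = np.zeros((4, 4)); c[2, 2] = 1.0  # shifted into a different pooling window

print(np.array_equal(max_pool_2x2(a), max_pool_2x2(b)))  # True
print(np.array_equal(max_pool_2x2(a), max_pool_2x2(c)))  # False
print(max_pool_2x2(np.zeros((11, 11))).shape)            # (5, 5)
```

The invariance is local: a shift that stays inside one pooling window leaves the output unchanged, while a larger shift moves the activation to a different output cell. The last line shows the dimension reduction, including the floor from 11 to 5 used in this model.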

---

### 🏠 Homework (Due: Before Week 11)

**Task 1**: Manual convolution calculation (6×6 image, 3×3 kernel)  
**Task 2**: Design CNN for MNIST with justification  
**Task 3**: Modify this notebook - add layer, experiment with kernel sizes  

---

### 📖 Next Week (Week 11):

- Famous CNN architectures (LeNet, AlexNet, VGG, ResNet)
- Advanced techniques (Dropout, Batch Normalization)
- Designing deeper networks

---

### 🎯 Unit Test 2 Prep (October 31):

- Review convolution calculations
- Practice output dimension formulas
- Understand parameter counting
- Be ready to design CNN architectures

---

**Great work! 🚀 You've built your first CNN!**

---

## 🚀 Extension Exercises (Optional)

For students who finish early or want extra practice:

### Exercise 1: Add Dropout
Add `Dropout(0.25)` after each pooling layer and `Dropout(0.5)` after the dense layer. Compare accuracy with and without dropout.

### Exercise 2: Experiment with Architecture
Try:
- Different number of filters (16, 32, 64, 128)
- Different kernel sizes (3×3, 5×5, 7×7)
- Adding a third convolutional block

### Exercise 3: Build MLP Baseline
Create an MLP (Flatten → Dense → Dense → Output) with a similar number of parameters. Compare its training time and accuracy with the CNN's.

### Exercise 4: Confusion Matrix
Generate predictions for the entire test set and create a confusion matrix using `sklearn.metrics.confusion_matrix`. Which classes are confused most often?
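
As a starting point, the sketch below shows what a confusion matrix contains, using NumPy only and made-up toy labels for 3 classes. `confusion_matrix_np` is a hypothetical stand-in that follows the same convention as `sklearn.metrics.confusion_matrix`: rows are true classes, columns are predicted classes.

```python
import numpy as np


def confusion_matrix_np(y_true, y_pred, num_classes):
    """cm[i, j] counts samples of true class i predicted as class j."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)  # accumulate one count per sample
    return cm


# Toy labels for illustration only (not model output)
y_true_toy = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred_toy = np.array([0, 2, 1, 1, 2, 0, 2])

cm = confusion_matrix_np(y_true_toy, y_pred_toy, 3)
print(cm)

# Total mistakes are the off-diagonal entries
errors = cm.sum() - np.trace(cm)
print(errors)  # 2
```

For the exercise itself you would pass `y_test` and the `argmax` of the model's predictions over the full test set, then look for the largest off-diagonal entries.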

### Exercise 5: Data Augmentation
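Use `ImageDataGenerator` to apply random rotations, shifts, and flips during training. Does it improve accuracy? (`ImageDataGenerator` is deprecated in recent TensorFlow releases; Keras preprocessing layers such as `RandomRotation`, `RandomTranslation`, and `RandomFlip` are the suggested replacement.)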

In [None]:
# Extension Exercise Code Template
# Uncomment and modify as needed

# Example: Add dropout
# model_with_dropout = keras.Sequential([
#     layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
#     layers.MaxPooling2D((2, 2)),
#     layers.Dropout(0.25),  # Add dropout
#     
#     layers.Conv2D(64, (3, 3), activation='relu'),
#     layers.MaxPooling2D((2, 2)),
#     layers.Dropout(0.25),  # Add dropout
#     
#     layers.Flatten(),
#     layers.Dense(64, activation='relu'),
#     layers.Dropout(0.5),  # Add dropout
#     layers.Dense(10, activation='softmax')
# ])

# TODO: Your extension exercise code here
pass