# Exploring Convolutional Layers: Fashion-MNIST

## Description
This notebook explores the design and behavior of convolutional neural networks (CNNs) through systematic experimentation on the Fashion-MNIST dataset.

We analyze how architectural choices, in particular kernel size, affect accuracy and training behavior.

## Objectives
1. **Dataset Exploration (EDA)**: Understand the structure, distribution, and characteristics of Fashion-MNIST
2. **Baseline Model**: Implement a fully connected network as a performance reference
3. **CNN Architecture Design**: Build a convolutional network with justified design decisions
4. **Controlled Experiments**: Systematically vary one architectural aspect (kernel size) and measure its impact
5. **Interpretation**: Explain why convolutions introduce useful inductive bias for image data

## Context
In this course, neural networks are not treated as black boxes but as **architectural components** whose design choices affect performance, scalability, and interpretability.
We focus on **convolutional layers** as a concrete example of how **inductive bias** is introduced into learning systems.

**Inductive bias** is the set of assumptions a model makes about the data. For images, convolutions assume that:
- Nearby pixels are related
- The same pattern can appear anywhere in the image
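
To make these two assumptions concrete, here is a minimal NumPy sketch (not part of the course code). It slides one small kernel over an image and shows that the same pattern produces the same peak response wherever it is placed. The helper `cross_correlate2d`, the 2×2 pattern, and the 8×8 image size are all hypothetical choices made for illustration.

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Valid-mode 2D cross-correlation (the operation deep learning libraries call 'convolution')."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # Locality: each output looks only at a small patch.
            # Weight sharing: the same kernel weights are reused at every position.
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Hypothetical 2x2 diagonal pattern and a kernel that matches it.
pattern = np.array([[1.0, 0.0],
                    [0.0, 1.0]])
kernel = pattern.copy()

# Place the same pattern at two different locations in otherwise empty images.
image_a = np.zeros((8, 8))
image_a[1:3, 1:3] = pattern
image_b = np.zeros((8, 8))
image_b[4:6, 5:7] = pattern

response_a = cross_correlate2d(image_a, kernel)
response_b = cross_correlate2d(image_b, kernel)

# Location of the strongest response in each image.
peak_a = tuple(int(i) for i in np.unravel_index(np.argmax(response_a), response_a.shape))
peak_b = tuple(int(i) for i in np.unravel_index(np.argmax(response_b), response_b.shape))

print("Peak location in image_a:", peak_a)
print("Peak location in image_b:", peak_b)
print("Same peak value:", response_a.max() == response_b.max())
```

The peak moves with the pattern while its value stays the same, and the kernel needs only 4 weights regardless of image size. A fully connected unit looking at a 28×28 image would need 784 weights and would have to learn the pattern separately for each position.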

## Dataset Selection: Fashion-MNIST

### Why Fashion-MNIST?

We use Fashion-MNIST, which ships with `tf.keras.datasets`. It is a collection of grayscale images of clothing items.

**Why is this dataset appropriate for convolutional layers?**

1. **Image-based data**: Each sample is a 28×28 grayscale image, which is the type of data convolutions are designed for.
2. **Multiple classes**: It has 10 different classes (T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Bag, Sneaker, Ankle boot).
3. **Fits in memory**: The full dataset (60,000 training + 10,000 test images) is small enough to load into memory on a typical laptop.
4. **Spatial patterns matter**: Clothing items have shapes and textures that convolutional layers can detect using local filters (edges, curves, patterns).
5. **More challenging than MNIST digits**: Fashion items have more visual complexity than handwritten digits, so it is a better test for our CNN experiments.
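
As a rough check of the "fits in memory" point, the arithmetic below estimates the raw array size for the image counts and dimensions stated above. It is a back-of-the-envelope sketch that ignores labels and any framework overhead; the variable names are illustrative.

```python
# Image counts and dimensions as stated above.
num_images = 60_000 + 10_000
height, width, channels = 28, 28, 1

values = num_images * height * width * channels

# Bytes per value for the storage types that typically appear in this workflow.
bytes_uint8 = values * 1    # raw pixels as loaded
bytes_float32 = values * 4  # common dtype for training
bytes_float64 = values * 8  # NumPy's result when a uint8 array is divided by 255.0

for name, size in [("uint8", bytes_uint8),
                   ("float32", bytes_float32),
                   ("float64", bytes_float64)]:
    print(f"{name:8s}: {size / 1024**2:7.1f} MiB")
```

Even in the largest representation the images stay well under half a gibibyte, so no streaming or batching from disk is needed.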

### Class Labels
| Label | Class Name |
|-------|------------|
| 0 | T-shirt/top |
| 1 | Trouser |
| 2 | Pullover |
| 3 | Dress |
| 4 | Coat |
| 5 | Sandal |
| 6 | Shirt |
| 7 | Bag |
| 8 | Sneaker |
| 9 | Ankle boot |

In [None]:
# Install required libraries
%pip install numpy pandas matplotlib tensorflow

Note: you may need to restart the kernel to use updated packages.

ERROR: Could not find a version that satisfies the requirement tensorflow (from versions: none)
ERROR: No matching distribution found for tensorflow


In [None]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

ModuleNotFoundError: No module named 'tensorflow'

---

# 1. Dataset Exploration (EDA)

The goal of this section is to understand the structure of Fashion-MNIST before building any model.

We will look at:
- Dataset size (how many images for training and testing)
- Image dimensions and channels
- Class distribution (are all classes balanced?)
- Visual examples of each class
- What preprocessing is needed

### 1.1 Load the Dataset
We load Fashion-MNIST directly from TensorFlow.

The data comes already split into:
- **Training set**: used to train the model
- **Test set**: used to evaluate the model

In [None]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Bag', 'Sneaker', 'Ankle boot']

print("Data loaded successfully!")

### 1.2 Dataset Size and Image Dimensions

Let's check how many images we have and what they look like in terms of shape.

- **Shape** tells us the dimensions: `(number_of_images, height, width)`
- Since these are grayscale images, there is only **1 channel** (no color). Color images (like CIFAR-10) have 3 channels: Red, Green, Blue.

In [None]:
print("DATASET SIZE")
print(f"Training images: {x_train.shape[0]}")
print(f"Test images:     {x_test.shape[0]}")
print(f"Total images:    {x_train.shape[0] + x_test.shape[0]}")
print()

print("IMAGE DIMENSIONS")
print(f"Training data shape: {x_train.shape}")
print(f"Test data shape:     {x_test.shape}")
print(f"Single image shape:  {x_train[0].shape}")
print(f"Image height:  {x_train.shape[1]} pixels")
print(f"Image width:   {x_train.shape[2]} pixels")
print(f"Channels:      1 (grayscale)")
print()

print("PIXEL VALUES")
print(f"Data type:   {x_train.dtype}")
print(f"Min value:   {x_train.min()}")
print(f"Max value:   {x_train.max()}")
print()

print("LABELS")
print(f"Training labels shape: {y_train.shape}")
print(f"Number of classes:     {len(np.unique(y_train))}")
print(f"Class labels:          {np.unique(y_train)}")

### 1.3 Class Distribution

It is important to check whether all classes have a similar number of samples. If one class has many more images than another, the model may become **biased** toward the larger class.

A **balanced dataset** means each class has roughly the same number of samples.
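
One simple way to quantify "roughly the same" is the ratio between the largest and smallest class counts. The sketch below uses a hypothetical helper, `imbalance_ratio`, on toy label arrays; it is an illustration, not a standard metric from any library.

```python
import numpy as np

def imbalance_ratio(labels, num_classes):
    """Return per-class counts and the max/min count ratio (1.0 means perfectly balanced)."""
    counts = np.bincount(labels, minlength=num_classes)
    if counts.min() == 0:
        # At least one class has no samples at all.
        return counts, float("inf")
    return counts, float(counts.max() / counts.min())

# Toy label arrays (hypothetical, not Fashion-MNIST).
balanced_labels = np.repeat(np.arange(10), 5)   # 5 samples for each of 10 classes
skewed_labels = np.array([0] * 8 + [1] * 2)     # 8 samples vs 2 samples

_, balanced_ratio = imbalance_ratio(balanced_labels, num_classes=10)
_, skewed_ratio = imbalance_ratio(skewed_labels, num_classes=2)

print("Balanced ratio:", balanced_ratio)
print("Skewed ratio:  ", skewed_ratio)
```

The same function can be applied to `y_train` with `num_classes=10` once the data is loaded.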

In [None]:
unique_classes, train_counts = np.unique(y_train, return_counts=True)

distribution_df = pd.DataFrame({
    'Label': unique_classes,
    'Class Name': [class_names[i] for i in unique_classes],
    'Train Samples': train_counts
})

_, test_counts = np.unique(y_test, return_counts=True)
distribution_df['Test Samples'] = test_counts

print("CLASS DISTRIBUTION")
print(distribution_df.to_string(index=False))
print()
print(f"Training set - Min samples: {train_counts.min()}, Max samples: {train_counts.max()}")
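# Only claim balance when every class really has the same count
if train_counts.min() == train_counts.max():
    print(f"The dataset is balanced: each class has exactly {train_counts[0]} training samples.")
else:
    print("The dataset is not perfectly balanced: class counts differ.")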

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Training set distribution
axes[0].bar(range(10), train_counts, color='steelblue')
axes[0].set_xticks(range(10))
axes[0].set_xticklabels(class_names, rotation=45, ha='right')
axes[0].set_title('Training Set - Samples per Class')
axes[0].set_ylabel('Number of Samples')
axes[0].set_xlabel('Class')

# Test set distribution
axes[1].bar(range(10), test_counts, color='coral')
axes[1].set_xticks(range(10))
axes[1].set_xticklabels(class_names, rotation=45, ha='right')
axes[1].set_title('Test Set - Samples per Class')
axes[1].set_ylabel('Number of Samples')
axes[1].set_xlabel('Class')

plt.tight_layout()
plt.show()

### 1.4 Visual Examples of Each Class

We will show 3 random examples for each of the 10 classes so we can see what the model needs to learn to classify.
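
Because the examples are drawn at random, the figure changes on every run. If a reproducible figure is preferred, one option is a seeded NumPy generator. The sketch below assumes a seed of 42 and a pool of 6,000 candidate images; both values are illustrative.

```python
import numpy as np

seed = 42               # hypothetical seed, any fixed integer works
num_candidates = 6000   # hypothetical number of images available for one class
num_examples = 3

# A seeded generator returns the same indices on every run.
rng = np.random.default_rng(seed)
picked = rng.choice(num_candidates, size=num_examples, replace=False)

# A second generator with the same seed reproduces the selection.
rng_again = np.random.default_rng(seed)
picked_again = rng_again.choice(num_candidates, size=num_examples, replace=False)

print("Selection is reproducible:", np.array_equal(picked, picked_again))
```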

In [None]:
num_examples = 3
fig, axes = plt.subplots(10, num_examples, figsize=(4, 13))

for class_idx in range(10):
    class_images = x_train[y_train == class_idx]
    
    random_indices = np.random.choice(len(class_images), num_examples, replace=False) # Random examples

    for i, idx in enumerate(random_indices):
        axes[class_idx, i].imshow(class_images[idx], cmap='gray')
        axes[class_idx, i].axis('off')
        
        if i == 0:
            axes[class_idx, i].set_title(class_names[class_idx], fontsize=8, loc='left')

plt.suptitle(f'{num_examples} Random Samples per Class', fontsize=12, y=1.01)
plt.tight_layout()
plt.show()

### 1.5 A Closer Look at a Single Image

Now we look at one image in detail to understand what the data looks like at the pixel level.

In [None]:
sample_image = x_train[0]
sample_label = y_train[0]

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

axes[0].imshow(sample_image, cmap='gray')
axes[0].set_title(f'Class: {class_names[sample_label]} (label={sample_label})')
axes[0].axis('off')

im = axes[1].imshow(sample_image, cmap='hot')
axes[1].set_title('Pixel Values (heatmap)')
plt.colorbar(im, ax=axes[1], fraction=0.046)

plt.tight_layout()
plt.show()

print(f"Image shape: {sample_image.shape}")
print(f"Pixel value range: [{sample_image.min()}, {sample_image.max()}]")
print(f"Mean pixel value: {sample_image.mean():.2f}")
print()

### 1.6 Preprocessing

Before we can feed the data into a neural network, we need to do some preprocessing:

#### Normalization
The pixel values are currently integers from 0 to 255. Gradient-based training tends to be more stable when inputs lie in a small, consistent range, typically **0 to 1**. To get there, we divide by 255:

```
x_train = x_train / 255.0
```

This maps:
- 0 → 0.0
- 128 → ~0.50
- 255 → 1.0
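
The mapping can be checked on a tiny array. The sketch below also wraps the division in a hypothetical helper, `scale_to_unit_range`, that refuses to run on data that is no longer `uint8`. This guards against a common notebook pitfall: re-running a normalization cell divides the already scaled data by 255 a second time. Casting to `float32` is an assumption made here to halve memory compared with NumPy's default `float64` result.

```python
import numpy as np

def scale_to_unit_range(images):
    """Scale raw uint8 pixels to [0, 1] as float32; reject data that looks already scaled."""
    if images.dtype != np.uint8:
        raise ValueError("expected raw uint8 pixels; the data may already be normalized")
    return images.astype(np.float32) / 255.0

pixels = np.array([0, 128, 255], dtype=np.uint8)
scaled = scale_to_unit_range(pixels)

print("Scaled values:", scaled)
print("Scaled dtype: ", scaled.dtype)

# Calling the helper on the scaled data raises instead of silently rescaling.
try:
    scale_to_unit_range(scaled)
    rescale_rejected = False
except ValueError:
    rescale_rejected = True
print("Second call rejected:", rescale_rejected)
```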

#### Reshape for CNN
Convolutional layers in TensorFlow expect the input to have a channel dimension. Since our images are grayscale (1 channel), we need to reshape from `(28, 28)` to `(28, 28, 1)`.
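
Adding the channel axis does not change any pixel values, only the array's shape. The sketch below shows three equivalent NumPy ways to do it on a dummy batch of five blank images (the batch size is arbitrary).

```python
import numpy as np

# Dummy batch: 5 grayscale images of 28x28 pixels, no channel axis yet.
batch = np.zeros((5, 28, 28), dtype=np.float32)

with_reshape = batch.reshape(-1, 28, 28, 1)     # -1 lets NumPy infer the batch size
with_expand = np.expand_dims(batch, axis=-1)    # append a new last axis
with_newaxis = batch[..., np.newaxis]           # indexing shorthand for the same thing

print("Original shape:   ", batch.shape)
print("reshape:          ", with_reshape.shape)
print("expand_dims:      ", with_expand.shape)
print("newaxis indexing: ", with_newaxis.shape)
```

`np.expand_dims` and `np.newaxis` avoid hard-coding the image size, which can help if the same code is later reused on images with different dimensions.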

#### One-Hot Encoding of Labels
The labels are integers from 0 to 9. For training with `categorical_crossentropy` loss, we convert them to one-hot vectors:
- Label `3` becomes `[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]`
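
For intuition, one-hot encoding can be reproduced in plain NumPy by indexing into an identity matrix. This is an illustrative sketch with made-up labels, not a replacement for the Keras utility.

```python
import numpy as np

num_classes = 10
labels = np.array([3, 0, 9])  # hypothetical example labels

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype=np.float32)[labels]

print("One-hot shape:", one_hot.shape)
print("Label 3 becomes:", one_hot[0])

# argmax along the class axis recovers the original integer labels.
recovered = one_hot.argmax(axis=1)
print("Recovered labels:", recovered)
```

As an alternative, Keras also provides the `sparse_categorical_crossentropy` loss, which accepts integer labels directly and makes the one-hot step unnecessary.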

In [None]:
x_train = x_train / 255.0
x_test = x_test / 255.0

print("After normalization:")
print(f"  x_train range: [{x_train.min()}, {x_train.max()}]")
print(f"  x_test range:  [{x_test.min()}, {x_test.max()}]")
print()

x_train_cnn = x_train.reshape(-1, 28, 28, 1)
x_test_cnn = x_test.reshape(-1, 28, 28, 1)

print("After reshape for CNN:")
print(f"  x_train_cnn shape: {x_train_cnn.shape}")
print(f"  x_test_cnn shape:  {x_test_cnn.shape}")
print()

from tensorflow.keras.utils import to_categorical

y_train_oh = to_categorical(y_train, 10)
y_test_oh = to_categorical(y_test, 10)

print("After one-hot encoding:")
print(f"  y_train_oh shape: {y_train_oh.shape}")
print(f"  Example - label {y_train[0]} becomes: {y_train_oh[0]}")
print()
print("Preprocessing complete! Data is ready for model training.")