# Fashion MNIST Clothes Classification Neural Network

## Project Overview

This notebook implements a **fully connected neural network** to classify clothing items from the Fashion MNIST dataset.

### What is Fashion MNIST?

Fashion MNIST is a dataset of clothing images created by Zalando as a modern replacement for the classic MNIST handwritten digits dataset. It contains:

- **60,000 training images**
- **10,000 test images**
- **28×28 pixel grayscale images** (single channel)
- **10 clothing categories**: T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, Ankle boot

### Our Approach

We'll build a **Dense (Fully Connected) Neural Network** with the following architecture:

```
Input (784) → Dense(512, ReLU) → Dropout(0.2) → Dense(256, ReLU) → Dropout(0.2) → Dense(10, Softmax)
```

**Key Features:**
- Multi-layer perceptron (MLP) architecture
- ReLU activation for hidden layers
- Dropout regularization to prevent overfitting
- Softmax output for multi-class classification
- Adam optimizer with Categorical Crossentropy loss

**Target Performance:** 85-90% accuracy on test set

### Notebook Structure

1. Environment setup and GPU configuration
2. Data loading and exploration
3. Data preprocessing (normalization, flattening, one-hot encoding)
4. Neural network architecture design
5. Model compilation and training
6. Performance evaluation and visualization
7. Prediction on custom images

Let's get started! 🚀

## Section 2: Imports & GPU Configuration

We begin by importing all necessary libraries and configuring the GPU for accelerated training.

**Key Libraries:**
- **TensorFlow/Keras**: Deep learning framework for building and training neural networks
- **NumPy**: Numerical computing for array operations
- **Matplotlib/Seaborn**: Visualization for graphs and confusion matrix
- **scikit-learn**: Metrics computation (confusion matrix)
- **Pillow**: Image loading and processing

**GPU Acceleration:**
- TensorFlow automatically detects and uses CUDA-compatible NVIDIA GPUs
- GPUs can train large networks many times faster than a CPU; for a small MLP like this one the speedup is more modest
- If no GPU is available, training will fall back to CPU (slower but functional)

In [None]:
"""
=============================================================================
SECTION 2: IMPORTS AND GPU CONFIGURATION
=============================================================================
Import all required libraries and verify GPU availability.
TensorFlow will automatically use GPU if available (CUDA-compatible NVIDIA GPU required).
"""

# --- Standard Libraries ---
import numpy as np                          # Numerical computing and array operations
import random                               # Random number generation for seed setting
import os                                   # Operating system interface

# --- TensorFlow & Keras ---
import tensorflow as tf                     # Deep learning framework
from tensorflow.keras.models import Sequential    # Sequential model API
from tensorflow.keras.layers import Dense, Dropout # Neural network layers
from tensorflow.keras.datasets import fashion_mnist # Fashion MNIST dataset
from tensorflow.keras.utils import to_categorical   # One-hot encoding utility

# --- Visualization ---
import matplotlib.pyplot as plt             # Plotting and visualization
import seaborn as sns                       # Statistical visualization (for confusion matrix)

# --- Metrics ---
from sklearn.metrics import confusion_matrix # Confusion matrix computation

# --- Image Processing ---
from PIL import Image                       # Image loading and processing

# --- Reproducibility ---
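# Setting random seeds makes runs repeatable, which is important for debugging
# and comparing experiments. On GPU, some cuDNN ops are nondeterministic, so
# results can still differ slightly between runs unless op determinism is enabled
# (e.g. tf.config.experimental.enable_op_determinism() in recent TF versions).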
SEED = 42
random.seed(SEED)                           # Python random seed
np.random.seed(SEED)                        # NumPy random seed
tf.random.set_seed(SEED)                    # TensorFlow random seed

# --- GPU Configuration ---
# Check if a GPU is available for accelerated training
# TensorFlow automatically uses GPU when available
print("="*60)
print("GPU CONFIGURATION")
print("="*60)
print(f"TensorFlow Version: {tf.__version__}")

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    print(f"\n✓ GPU Available: {len(gpus)} device(s) detected")
    for gpu in gpus:
        print(f"  → {gpu.name}")
    # Allow memory growth to prevent TensorFlow from allocating all GPU memory at once
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    print("\n  Memory growth enabled for efficient GPU usage")
else:
    print("\n⚠ WARNING: No GPU detected!")
    print("  Training will run on CPU (significantly slower)")
    print("  For GPU support, ensure CUDA toolkit and cuDNN are installed")

print("="*60)

## Section 3: Define Class Names

Fashion MNIST contains **10 mutually exclusive clothing categories**. Each image belongs to exactly one class.

The labels are encoded as integers (0-9), which we map to human-readable names:

| Label | Class Name | Description |
|-------|------------|-------------|
| 0 | T-shirt/top | Short-sleeved upper body garment |
| 1 | Trouser | Long lower body garment |
| 2 | Pullover | Long-sleeved upper body garment (no buttons) |
| 3 | Dress | One-piece garment |
| 4 | Coat | Heavy outer garment |
| 5 | Sandal | Open-toed footwear |
| 6 | Shirt | Button-up upper body garment |
| 7 | Sneaker | Casual closed-toe footwear |
| 8 | Bag | Handbag/purse |
| 9 | Ankle boot | Short boot footwear |

These class names will be used for visualization and interpretation throughout the notebook.
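The list index doubles as the label, so converting an integer label to a name is a plain lookup. The sketch below also adds the reverse direction (name to label). `CLASS_NAMES`, `NAME_TO_LABEL` and `label_to_name` are hypothetical names chosen for this illustration; the notebook itself only defines `class_names`.

```python
# Hypothetical lookup helpers (not used elsewhere in this notebook)
CLASS_NAMES = [
    'T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
    'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot',
]

# Reverse mapping: class name -> integer label
NAME_TO_LABEL = {name: label for label, name in enumerate(CLASS_NAMES)}

def label_to_name(label: int) -> str:
    """Return the human-readable class name for an integer label 0-9."""
    if not 0 <= label < len(CLASS_NAMES):
        raise ValueError(f"label must be in [0, {len(CLASS_NAMES) - 1}], got {label}")
    return CLASS_NAMES[label]

print(label_to_name(9))          # Ankle boot
print(NAME_TO_LABEL['Sneaker'])  # 7
```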

In [None]:
"""
=============================================================================
SECTION 3: CLASS LABEL DEFINITIONS
=============================================================================
Fashion MNIST contains 10 categories of clothing items.
Each image belongs to exactly one of these classes (mutually exclusive).
"""

# Class names mapping - index corresponds to label number (0-9)
# These are the 10 clothing categories in the Fashion MNIST dataset
class_names = [
    'T-shirt/top',  # 0 - Short-sleeved upper body garment
    'Trouser',       # 1 - Long lower body garment
    'Pullover',      # 2 - Long-sleeved upper body garment (no buttons)
    'Dress',         # 3 - One-piece garment
    'Coat',          # 4 - Heavy outer garment
    'Sandal',        # 5 - Open-toed footwear
    'Shirt',         # 6 - Button-up upper body garment
    'Sneaker',       # 7 - Casual closed-toe footwear
    'Bag',           # 8 - Handbag/purse
    'Ankle boot'     # 9 - Short boot footwear
]

print(f"Number of classes: {len(class_names)}")
print("\nClass mapping:")
for i, name in enumerate(class_names):
    print(f"  {i}: {name}")

## Section 4: Load Dataset

We load the Fashion MNIST dataset directly from Keras, which provides convenient access to many popular datasets.

**Dataset Split:**
- **Training set**: 60,000 images (used to train the neural network)
- **Test set**: 10,000 images (held out for final evaluation on unseen data)

**Image Format:**
- Each image is **28×28 pixels**
- **Grayscale** (single channel, not RGB)
- Pixel values range from **0 (black) to 255 (white)**
- Data type: `uint8` (unsigned 8-bit integer)

**Labels:**
- Integer values from **0 to 9**
- Correspond to the 10 clothing classes defined above

The dataset is automatically downloaded the first time you run this cell (may take a few seconds).
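A quick sanity check after loading is to confirm the dtype, the pixel range and how many images each class has. The helpers below (`class_counts` and `describe_images`, both hypothetical) are a minimal NumPy sketch; here they run on small synthetic arrays so the example works offline.

```python
import numpy as np

def class_counts(labels, num_classes=10):
    """Count samples per integer class label (hypothetical helper)."""
    # minlength keeps one entry per class even if a class never appears
    return np.bincount(np.asarray(labels), minlength=num_classes)

def describe_images(images):
    """Return (shape, dtype, min pixel, max pixel) of an image array (hypothetical helper)."""
    images = np.asarray(images)
    return images.shape, images.dtype, int(images.min()), int(images.max())

# Synthetic stand-ins for y_train_raw / X_train_raw so the sketch runs offline
rng = np.random.default_rng(0)
fake_labels = np.repeat(np.arange(10), 6)                     # 6 samples per class
fake_images = rng.integers(0, 256, size=(60, 28, 28)).astype(np.uint8)

print(class_counts(fake_labels))     # [6 6 6 6 6 6 6 6 6 6]
print(describe_images(fake_images))
```

On the real arrays, `class_counts(y_train_raw)` should report 6,000 images for every class (Fashion MNIST is balanced), and `describe_images(X_train_raw)` should show `(60000, 28, 28)`, `uint8` and a 0-255 range.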

In [None]:
"""
=============================================================================
SECTION 4: DATA LOADING
=============================================================================
Load the Fashion MNIST dataset directly from Keras.
The dataset is automatically split into training and test sets:
  - Training set: 60,000 images (used to train the neural network)
  - Test set: 10,000 images (used to evaluate performance on unseen data)
Each image is 28x28 pixels in grayscale (single channel, values 0-255).
"""

# Load Fashion MNIST dataset
# Returns two tuples: (training data, training labels), (test data, test labels)
(X_train_raw, y_train_raw), (X_test_raw, y_test_raw) = fashion_mnist.load_data()

# Display dataset information
print("="*60)
print("DATASET LOADED SUCCESSFULLY")
print("="*60)
print(f"\nTraining set:")
print(f"  Images shape: {X_train_raw.shape}")      # (60000, 28, 28)
print(f"  Labels shape: {y_train_raw.shape}")       # (60000,)
print(f"  Pixel value range: [{X_train_raw.min()}, {X_train_raw.max()}]")

print(f"\nTest set:")
print(f"  Images shape: {X_test_raw.shape}")        # (10000, 28, 28)
print(f"  Labels shape: {y_test_raw.shape}")        # (10000,)
print(f"  Pixel value range: [{X_test_raw.min()}, {X_test_raw.max()}]")

print(f"\nImage dimensions: {X_train_raw.shape[1]}x{X_train_raw.shape[2]} pixels")
print(f"Data type: {X_train_raw.dtype}")
print("="*60)

## Section 5: Preview Images Function

Before training our neural network, it's important to **visually inspect the data** to understand what we're working with.

This function displays images at **indices 6, 7, 8, and 9** from the training set in a **2×2 grid**. Visual inspection helps us:

1. **Verify data loaded correctly** - ensure images are not corrupted
2. **Understand the challenge** - see how diverse the clothing items are
3. **Check image quality** - 28×28 is quite low resolution!
4. **Inspect label correctness** - confirm labels match the images

**Why indices 6-9?**
- Arbitrary choice to show variety
- You can modify the `indices` parameter to view any images you want
- Example: `preview_dataset_images(X_train_raw, y_train_raw, indices=[0, 100, 500, 1000])`

The images are displayed in grayscale with their index number and class label shown in the title.
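To see examples of several different classes rather than consecutive indices, you can look up the first occurrence of each label. `first_index_per_class` below is a hypothetical helper, sketched with NumPy on a short made-up label array.

```python
import numpy as np

def first_index_per_class(labels, num_classes=10):
    """Hypothetical helper: index of the first sample of each class 0..num_classes-1.

    Classes that never appear are skipped.
    """
    labels = np.asarray(labels)
    indices = []
    for c in range(num_classes):
        hits = np.flatnonzero(labels == c)   # positions where the label equals c
        if hits.size:
            indices.append(int(hits[0]))
    return indices

demo_labels = np.array([9, 0, 0, 3, 2, 7, 2, 5, 1, 4, 6, 8])
print(first_index_per_class(demo_labels))   # [1, 8, 4, 3, 9, 7, 10, 5, 11, 0]
```

Because `preview_dataset_images` draws a 2×2 grid, it takes four indices, e.g. `preview_dataset_images(X_train_raw, y_train_raw, indices=first_index_per_class(y_train_raw)[:4])`.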

In [None]:
"""
=============================================================================
SECTION 5: DATA VISUALIZATION - PREVIEW FUNCTION
=============================================================================
Visualize sample images from the dataset to understand what the data looks like.
This function displays images at indices 6, 7, 8, 9 in a 2x2 grid.
Visual inspection helps verify data loaded correctly and understand the challenge.
"""

def preview_dataset_images(images, labels, indices=(6, 7, 8, 9)):
    """
    Display a 2x2 grid of images from the dataset with their class labels.
    
    This function helps visualize the raw data before preprocessing,
    allowing us to verify the data loaded correctly and understand
    the visual characteristics of each clothing category.
    
    Args:
        images (np.array): Image data array of shape (N, 28, 28)
        labels (np.array): Label array of shape (N,) with values 0-9
        indices (sequence): Exactly 4 image indices to display (default: (6, 7, 8, 9))
    """
    if len(indices) != 4:
        raise ValueError(f"Expected exactly 4 indices for the 2x2 grid, got {len(indices)}")
    fig, axes = plt.subplots(2, 2, figsize=(8, 8))
    # Build the title from the indices actually shown (not hard-coded to 6-9)
    fig.suptitle(f'Fashion MNIST - Sample Images (Indices {", ".join(map(str, indices))})',
                 fontsize=16, fontweight='bold')
    
    # Flatten the 2x2 grid of axes for easy iteration
    axes = axes.flatten()
    
    for i, idx in enumerate(indices):
        # Display the image in grayscale
        axes[i].imshow(images[idx], cmap='gray')
        # Set title with index and class name
        axes[i].set_title(
            f'Index: {idx} | Label: {labels[idx]} ({class_names[labels[idx]]})',
            fontsize=11
        )
        # Remove axis ticks for cleaner appearance
        axes[i].axis('off')
    
    plt.tight_layout()
    plt.show()

# Call the preview function to display images 6, 7, 8, 9
print("Previewing images at indices 6, 7, 8, 9:")
preview_dataset_images(X_train_raw, y_train_raw, indices=[6, 7, 8, 9])

## Section 6: Data Preprocessing

Raw data needs to be transformed into a format suitable for neural network training. We perform **three essential preprocessing steps**:

### Step 1: Normalization (Scale Pixel Values)
**What:** Convert pixel values from `[0, 255]` to `[0, 1]` by dividing by 255.0

**Why:**
- Neural networks train **faster and more stably** with small input values
- Raw values up to 255 produce large pre-activations and gradients, which can make training **unstable** at typical learning rates
- Normalized inputs help **gradient descent converge** more reliably
- All inputs lie in the **same small range** `[0, 1]`
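Normalization is a single element-wise division. The snippet below applies it to a made-up 2×2 `uint8` patch so the effect is easy to check by hand (e.g. 51 / 255 = 0.2). Casting to `float32` first matches the float32 dtype Keras uses by default.

```python
import numpy as np

# A tiny made-up 2x2 uint8 "image" standing in for one Fashion MNIST sample
raw = np.array([[0, 51], [128, 255]], dtype=np.uint8)

# Cast to float32 (Keras' default float type), then rescale [0, 255] -> [0, 1]
scaled = raw.astype(np.float32) / 255.0

print(scaled.dtype)                 # float32
print(scaled.min(), scaled.max())   # 0.0 1.0
```

Using the same constant (255.0) for the training and test sets guarantees both end up on an identical scale.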

### Step 2: Flattening (2D → 1D)
**What:** Reshape each 28×28 2D image into a 784-element 1D vector

**Why:**
- Our Dense (fully-connected) layers expect each sample as a **flat feature vector**
- Given a 2D grid, a Keras Dense layer would only act along the last axis (each pixel row separately), so we flatten first (CNNs, by contrast, work on 2D grids directly)
- Flattening preserves all pixel values: 28 × 28 = 784
- Shape changes: `(60000, 28, 28)` → `(60000, 784)`
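Flattening is lossless, as this small NumPy sketch on synthetic arrays shows. `reshape` lays the 28 rows end to end (row-major order), so pixel `(r, c)` lands at position `r * 28 + c`, and reshaping back recovers the original image.

```python
import numpy as np

# Three synthetic 28x28 "images" in which every pixel value is unique
images = np.arange(3 * 28 * 28, dtype=np.float32).reshape(3, 28, 28)

# -1 lets NumPy infer the number of samples (3 here)
flat = images.reshape(-1, 28 * 28)
print(flat.shape)   # (3, 784)

# Row-major (C-order) flattening: pixel (row r, col c) ends up at r * 28 + c
r, c = 5, 17
assert flat[1, r * 28 + c] == images[1, r, c]

# Lossless: reshaping back recovers the original images exactly
restored = flat.reshape(-1, 28, 28)
assert np.array_equal(restored, images)
```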

### Step 3: One-Hot Encoding (Labels)
**What:** Convert integer labels (0-9) to binary vectors of length 10

**Example:**
- Label `3` (Dress) → `[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]`
- Label `7` (Sneaker) → `[0, 0, 0, 0, 0, 0, 0, 1, 0, 0]`

**Why:**
- **Categorical Crossentropy loss** expects this format
- Matches the **10-neuron softmax output** layer
- Treats all classes as **equally different** (no ordinal relationship)
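At its core, one-hot encoding picks rows out of an identity matrix. The NumPy `one_hot` function below is a hypothetical stand-in that should give the same 0/1 layout as `to_categorical`. `argmax` reverses the encoding, which is also how predicted probabilities are later turned back into labels.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Hypothetical NumPy one-hot encoder: row i of the identity matrix encodes class i."""
    labels = np.asarray(labels, dtype=np.int64)
    return np.eye(num_classes, dtype=np.float32)[labels]

encoded = one_hot([3, 7])        # Dress, Sneaker
print(encoded.astype(int))
print(encoded.argmax(axis=1))    # [3 7], decoded back to integer labels
```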

After preprocessing, our data is ready for training!

In [None]:
"""
=============================================================================
SECTION 6: DATA PREPROCESSING
=============================================================================
Three preprocessing steps prepare the raw data for the neural network:

Step 1 - NORMALIZATION: Scale pixel values from [0, 255] to [0, 1]
  → Neural networks train faster and more stably with small input values
  → Formula: normalized_pixel = original_pixel / 255.0

Step 2 - FLATTENING: Reshape 2D images (28x28) to 1D vectors (784)
  → Dense (fully-connected) layers expect one flat feature vector per sample
  → Each image becomes a single row of 784 features (28 × 28 = 784)

Step 3 - ONE-HOT ENCODING: Convert integer labels to binary vectors
  → Example: label 3 (Dress) → [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
  → Required by Categorical Crossentropy loss function
  → Matches the 10-neuron softmax output layer
"""

# --- Step 1: NORMALIZATION ---
# Convert pixel values from integers [0, 255] to floats [0.0, 1.0]
# This scaling helps gradient descent converge faster and prevents
# large pixel values from dominating the learning process
X_train = X_train_raw.astype('float32') / 255.0
X_test = X_test_raw.astype('float32') / 255.0
print("Step 1 - Normalization complete")
print(f"  Pixel value range: [{X_train.min()}, {X_train.max()}]")

# --- Step 2: FLATTENING ---
# Reshape each 28x28 image into a 1D vector of 784 values
# Dense layers expect 1D input: (samples, features) not (samples, height, width)
X_train = X_train.reshape(-1, 784)   # -1 means "infer number of samples"
X_test = X_test.reshape(-1, 784)
print(f"\nStep 2 - Flattening complete")
print(f"  X_train shape: {X_train.shape}")   # (60000, 784)
print(f"  X_test shape:  {X_test.shape}")     # (10000, 784)

# --- Step 3: ONE-HOT ENCODING ---
# Convert integer labels to one-hot encoded vectors
# Example: label 3 → [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
# This is required because our output layer has 10 neurons with softmax,
# and Categorical Crossentropy loss expects this format
y_train = to_categorical(y_train_raw, num_classes=10)
y_test = to_categorical(y_test_raw, num_classes=10)
print(f"\nStep 3 - One-Hot Encoding complete")
print(f"  y_train shape: {y_train.shape}")   # (60000, 10)
print(f"  y_test shape:  {y_test.shape}")     # (10000, 10)
print(f"  Example - label {y_train_raw[0]} encoded as: {y_train[0]}")

# --- Summary ---
print("\n" + "="*60)
print("PREPROCESSING SUMMARY")
print("="*60)
print(f"Training data:  {X_train.shape[0]} samples, {X_train.shape[1]} features each")
print(f"Test data:      {X_test.shape[0]} samples, {X_test.shape[1]} features each")
print(f"Label encoding: {y_train.shape[1]} classes (one-hot)")
print("="*60)

## Section 7: Neural Network Architecture - Design & Explanation

### Architecture Overview
We use a **Fully Connected (Dense) Neural Network** with the following structure:

```
Input (784) → Dense(512, ReLU) → Dropout(0.2) → Dense(256, ReLU) → Dropout(0.2) → Dense(10, Softmax)
```

### Why Fully Connected (Dense) Layers?
1. **Simplicity**: Dense networks are straightforward to understand and implement, making them ideal for learning fundamental neural network concepts
2. **Fashion MNIST characteristics**: The images are small (28×28), grayscale, and centered - dense layers can effectively learn patterns from these simple images
3. **Sufficient performance**: Dense networks typically reach about 85-90% test accuracy on Fashion MNIST, meeting our target
4. **Educational value**: Understanding dense networks is the foundation before moving to more complex architectures like CNNs

### Why NOT Convolutional Neural Networks (CNNs)?
While CNNs would typically achieve higher accuracy (~92-95%) by exploiting the spatial relationships between neighboring pixels, our goal is to demonstrate core neural network concepts. Dense layers treat the flattened image as a plain vector of 784 values and ignore which pixels are neighbors, which is still sufficient for this small, centered dataset.

### Layer-by-Layer Explanation

| Layer | Output Shape | Parameters | Purpose |
|-------|-------------|------------|----------|
| Input | (784,) | 0 | Flattened 28×28 image |
| Dense 1 | (512,) | 401,920 | Learn low-level patterns (edges, textures) |
| Dropout 1 | (512,) | 0 | Prevent overfitting (drop 20% of neurons) |
| Dense 2 | (256,) | 131,328 | Combine patterns into higher-level features |
| Dropout 2 | (256,) | 0 | Additional overfitting prevention |
| Output | (10,) | 2,570 | Probability for each clothing class |
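The parameter counts in the table follow from `inputs × units` weights plus one bias per unit. Here is a plain-Python check of that arithmetic; the helper name `dense_params` is illustrative.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Parameters of a Dense layer: weight matrix plus one bias per unit."""
    return n_inputs * n_units + n_units

# (inputs, units) for each Dense layer; Dropout layers add no parameters
layers = [(784, 512), (512, 256), (256, 10)]
counts = [dense_params(i, u) for i, u in layers]
print(counts)       # [401920, 131328, 2570]
print(sum(counts))  # 535818
```

The total, 535,818, is what `model.summary()` should report for this architecture.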

### Activation Functions
- **ReLU (Rectified Linear Unit)** for hidden layers: f(x) = max(0, x)
  - Mitigates the vanishing gradient problem (gradient is 1 for positive inputs, unlike sigmoid/tanh, which saturate)
  - Computationally efficient
  - Introduces non-linearity so the network can learn complex patterns
  
- **Softmax** for output layer: converts raw scores to probabilities that sum to 1.0
  - Each of the 10 outputs represents the probability of that clothing class
  - The class with the highest probability is the prediction
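Both activations are short formulas. The NumPy versions below are a sketch for intuition, not Keras' implementation. The softmax subtracts the row maximum before exponentiating, which leaves the result unchanged but avoids overflow for large scores; the input scores are made up.

```python
import numpy as np

def relu(x):
    """ReLU: element-wise max(0, x)."""
    return np.maximum(0.0, x)

def softmax(logits):
    """Softmax over the last axis, shifted by the max for numerical stability."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max(axis=-1, keepdims=True)  # same result, no overflow
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)

print(relu(np.array([-2.0, 0.0, 3.5])))   # negatives become 0
probs = softmax([2.0, 1.0, 0.1])          # made-up raw scores for 3 classes
print(probs, probs.sum())                 # probabilities summing to 1
print(int(np.argmax(probs)))              # 0 -> the predicted class index
```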

### Dropout Regularization
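- Randomly zeroes each hidden unit's output with probability 0.2 at every training step (Keras scales the surviving outputs by 1/0.8 so the expected activation stays the same)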
- Forces the network to not rely on any single neuron
- Improves generalization to unseen data (reduces overfitting)
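The sketch below shows the "inverted dropout" scheme in NumPy, which matches how Keras documents its `Dropout` layer. During training each unit is zeroed with probability `rate` and the survivors are scaled by `1 / (1 - rate)`; at inference the input passes through unchanged. The function name and `training` flag are illustrative.

```python
import numpy as np

def dropout(x, rate=0.2, training=True, rng=None):
    """Inverted dropout: zero units with probability `rate` while training and
    scale the survivors by 1 / (1 - rate); return the input unchanged otherwise."""
    if not training or rate == 0.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    keep_mask = rng.random(x.shape) >= rate   # True with probability 1 - rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)
activations = np.ones((1000, 512))            # stand-in for hidden-layer outputs
out = dropout(activations, rate=0.2, rng=rng)
print((out == 0).mean())   # fraction of zeroed units, close to 0.2
print(out.mean())          # close to 1.0: scaling keeps the expected activation
print(np.array_equal(dropout(activations, training=False), activations))  # True
```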

### Neuron Count Rationale (512 → 256 → 10)
- **Decreasing width** creates a "funnel" that compresses information
- **512 neurons** in the first layer capture many low-level features from 784 inputs
- **256 neurons** in the second layer combine these into fewer, more meaningful features
- **10 neurons** in the output produce one score per clothing class

In [None]:
"""
=============================================================================
SECTION 7: NEURAL NETWORK ARCHITECTURE
=============================================================================
Build a Fully Connected (Dense) neural network for clothing classification.

Architecture: Input(784) → Dense(512,ReLU) → Dropout(0.2) → Dense(256,ReLU) → Dropout(0.2) → Dense(10,Softmax)

Why this architecture?
- Dense layers are effective for small, centered images like Fashion MNIST
- ReLU activation mitigates vanishing gradients and is computationally efficient
- Softmax output produces probability distribution over 10 classes
- Dropout reduces overfitting by randomly zeroing 20% of activations during training
- Decreasing layer sizes (512→256→10) create hierarchical feature learning
"""

# Build the Sequential model
# Sequential means layers are stacked one after another in order
model = Sequential([
    
    # --- Hidden Layer 1: 512 neurons with ReLU activation ---
    # Input: 784 features (flattened 28x28 image)
    # 512 neurons learn to detect low-level patterns like edges and textures
    # ReLU activation: f(x) = max(0, x) - introduces non-linearity
    # Parameters: 784 * 512 + 512 (bias) = 401,920
    Dense(512, activation='relu', input_shape=(784,), name='hidden_layer_1'),
    
    # --- Dropout Layer 1: 20% dropout rate ---
    # Randomly sets 20% of neuron outputs to zero during training
    # This prevents the network from becoming too dependent on specific neurons
    # Improves generalization to unseen data (reduces overfitting)
    # Note: Dropout is only active during training, not during prediction
    Dropout(0.2, name='dropout_1'),
    
    # --- Hidden Layer 2: 256 neurons with ReLU activation ---
    # Combines the 512 features from layer 1 into 256 higher-level features
    # Learns more abstract patterns by combining low-level features
    # Parameters: 512 * 256 + 256 (bias) = 131,328
    Dense(256, activation='relu', name='hidden_layer_2'),
    
    # --- Dropout Layer 2: 20% dropout rate ---
    # Additional regularization to prevent overfitting
    Dropout(0.2, name='dropout_2'),
    
    # --- Output Layer: 10 neurons with Softmax activation ---
    # One neuron per clothing class (10 total)
    # Softmax converts raw scores into probabilities that sum to 1.0
    # The class with the highest probability is the predicted class
    # Parameters: 256 * 10 + 10 (bias) = 2,570
    Dense(10, activation='softmax', name='output_layer')
    
], name='fashion_mnist_classifier')

# Display model architecture summary
# Shows layer names, output shapes, and parameter counts
print("="*60)
print("MODEL ARCHITECTURE SUMMARY")
print("="*60)
model.summary()

# Count all parameters (every parameter in this model is trainable)
total_params = model.count_params()
print(f"\nTotal parameters (all trainable): {total_params:,}")
print("="*60)

## Section 8: Loss Function & Optimizer Configuration

### Loss Function: Categorical Crossentropy

**What it is:** Measures how different the predicted probability distribution is from the true label. It penalizes the model more when it is confidently wrong.

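**Formula:** $\text{Loss} = -\sum_{c=1}^{10} y_c \log(\hat{y}_c)$, where $y_c$ is 1 for the true class and 0 otherwise, and $\hat{y}_c$ is the predicted probability for class $c$.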

**Why chosen:**
1. **Multi-class classification**: Fashion MNIST has 10 mutually exclusive classes - Categorical Crossentropy is the standard loss for this type of problem
2. **One-hot encoded labels**: Our labels are in one-hot format `[0,0,0,1,0,...]` which matches this loss function's expected input
3. **Softmax pairing**: Combined with softmax, the gradient with respect to the output layer's raw scores simplifies to `y_pred - y_true`, giving clean, well-scaled gradients for backpropagation
4. **Penalizes confidence errors**: A predicted probability of 0.01 for the correct class costs -ln(0.01) ≈ 4.61, while 0.4 costs only -ln(0.4) ≈ 0.92

**Alternative considered:** Sparse Categorical Crossentropy - works with integer labels (0-9) instead of one-hot encoded. Functionally identical but we chose Categorical since we already one-hot encoded our labels.
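To make the formula and the 0.01-versus-0.4 comparison concrete, here is a NumPy sketch of both loss variants on one made-up prediction where the true class is 3. The clipping constant `eps` is an assumption added to avoid `log(0)`. The sparse version indexes the true-class probability directly and yields the same value.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred)), with one-hot y_true."""
    y_pred = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1.0 - eps)
    return float(np.mean(-np.sum(np.asarray(y_true) * np.log(y_pred), axis=-1)))

def sparse_categorical_crossentropy(labels, y_pred, eps=1e-7):
    """Same loss with integer labels: take -log of the true-class probability."""
    y_pred = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1.0 - eps)
    labels = np.asarray(labels)
    return float(np.mean(-np.log(y_pred[np.arange(len(labels)), labels])))

y_true = np.eye(10)[[3]]                # one-hot label for class 3 (Dress)

bad = np.full((1, 10), 0.99 / 9)        # confidently wrong: p(correct) = 0.01
bad[0, 3] = 0.01
meh = np.full((1, 10), 0.60 / 9)        # hesitant: p(correct) = 0.40
meh[0, 3] = 0.40

print(round(categorical_crossentropy(y_true, bad), 3))  # 4.605
print(round(categorical_crossentropy(y_true, meh), 3))  # 0.916
```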

### Optimizer: Adam (Adaptive Moment Estimation)

**What it is:** An advanced optimization algorithm that combines the benefits of two other optimizers - AdaGrad (adapts learning rate per parameter) and RMSProp (uses moving average of squared gradients).

**Why chosen:**
1. **Adaptive learning rates**: Scales the step for each parameter individually - parameters with consistently large gradients get smaller effective steps, while parameters with small or infrequent gradients get relatively larger ones
2. **Fast convergence**: Typically requires fewer epochs than basic SGD (Stochastic Gradient Descent)
3. **Minimal tuning**: The default learning rate of 0.001 often works well without adjustment
4. **Momentum**: Uses exponential moving averages of gradients, which smooths the optimization path and helps it move through flat regions and noisy gradients
5. **Industry standard**: One of the most widely used optimizers in modern deep learning

**Alternatives considered:**
- **SGD**: Simpler but requires careful learning rate tuning and more epochs
- **RMSProp**: Good, but Adam adds momentum and bias correction on top of RMSProp-style scaling and often converges faster
- **AdaGrad**: Learning rate can decay too aggressively over time
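The Adam update itself fits in a few lines. Below is a textbook-form NumPy sketch, run on a toy quadratic whose two directions have very different curvature, which is where per-parameter step sizes help. The defaults mirror Keras' `Adam` defaults (`learning_rate=0.001`, `beta_1=0.9`, `beta_2=0.999`, `epsilon=1e-7`), but Keras' internal implementation may differ in minor numerical details; the demo uses a larger `lr=0.05` to converge quickly. `adam_minimize` and the toy function are hypothetical.

```python
import numpy as np

def adam_minimize(grad_fn, theta0, lr=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-7, steps=1000):
    """Textbook Adam update rule applied to a parameter vector."""
    theta = np.array(theta0, dtype=np.float64)
    m = np.zeros_like(theta)  # 1st moment: moving average of gradients (momentum)
    v = np.zeros_like(theta)  # 2nd moment: moving average of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)   # bias correction: m and v start at zero
        v_hat = v / (1 - beta2**t)
        # Per-parameter step: dividing by sqrt(v_hat) shrinks steps for
        # parameters whose recent gradients have been large
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

def grad_quadratic(p):
    """Gradient of f(x, y) = (x - 3)^2 + 100 * (y + 1)^2 (very different curvatures)."""
    return np.array([2.0 * (p[0] - 3.0), 200.0 * (p[1] + 1.0)])

result = adam_minimize(grad_quadratic, [0.0, 0.0], lr=0.05, steps=2000)
print(result)  # close to [3, -1]
```

Note that the very first step moves each coordinate by about `lr` regardless of how steep that direction is, which illustrates the per-parameter normalization.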

In [None]:
"""
=============================================================================
SECTION 8: LOSS FUNCTION AND OPTIMIZER CONFIGURATION
=============================================================================

LOSS FUNCTION: Categorical Crossentropy
───────────────────────────────────────
- Measures difference between predicted probabilities and true one-hot labels
- Formula: Loss = -Σ(y_true × log(y_pred))
- Standard choice for multi-class classification with one-hot encoded labels
- Penalizes confident wrong predictions heavily
- Pairs mathematically with softmax activation for clean gradients

OPTIMIZER: Adam (Adaptive Moment Estimation)
────────────────────────────────────────────
- Combines benefits of RMSprop (adaptive learning rates) and Momentum
- Automatically adjusts learning rate per parameter
- Default learning rate: 0.001 (often works well without tuning)
- Fast convergence - typically needs fewer epochs than SGD
- One of the most widely used optimizers in deep learning

METRIC: Accuracy
────────────────
- Percentage of correctly classified images
- Simple, intuitive measure of model performance
"""

# Compile the model - this configures the training process
# Three key components:
#   1. optimizer - HOW to update weights (Adam)
#   2. loss - WHAT to minimize (Categorical Crossentropy)
#   3. metrics - WHAT to monitor (accuracy)
model.compile(
    optimizer='adam',                    # Adam optimizer with default lr=0.001
    loss='categorical_crossentropy',    # Loss for multi-class one-hot labels
    metrics=['accuracy']                # Track accuracy during training
)

print("Model compiled successfully!")
print(f"  Optimizer: Adam (learning_rate=0.001)")
print(f"  Loss function: Categorical Crossentropy")
print(f"  Metrics: Accuracy")

## Section 9: Model Training

Now we train our neural network on the preprocessed Fashion MNIST data.

### Training Parameters Explained

**batch_size=128**
- Process **128 images at a time** before updating weights
- Larger batches = faster training but may reduce generalization
- Smaller batches = slower but potentially better generalization
- 128 is a good balance between speed and learning quality
- With 48,000 training images (60,000 minus the 20% validation split): 48,000 / 128 = **375 batches per epoch**

**epochs=20**
- Complete **20 full passes** through the entire training dataset
- Each epoch, the model trains on all **48,000 training images** (the 12,000 validation images are never used to update weights)
- More epochs = more learning, but also risk of overfitting
- 20 epochs is typically sufficient for convergence on Fashion MNIST

**validation_split=0.2**
- Reserve **20% of training data** (12,000 images) for validation
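- Validation data is **NOT used for training** - it monitors how well the model generalizes. Keras takes the **last** 20% of the arrays (before any shuffling), so the same 12,000 images are held out every epoch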
- This gives us three datasets:
  - **Training**: 48,000 images (80% of train set) - used to update weights
  - **Validation**: 12,000 images (20% of train set) - monitor overfitting
  - **Test**: 10,000 images (held out completely) - final evaluation
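The split and batch arithmetic can be reproduced in a few lines. The sketch assumes Keras' documented behavior for `validation_split`: the validation set is the **last** fraction of the arrays, taken before shuffling. `keras_style_validation_split` is a hypothetical helper that imitates this slicing; it is not a Keras function.

```python
import math
import numpy as np

def keras_style_validation_split(x, y, validation_split=0.2):
    """Hypothetical helper imitating fit(validation_split=...): the LAST
    `validation_split` fraction of the rows is held out, before any shuffling."""
    split_at = int(len(x) * (1.0 - validation_split))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

n_total, batch_size = 60_000, 128
rows = np.arange(n_total)                  # stand-in for the rows of X_train / y_train
(x_tr, _), (x_val, _) = keras_style_validation_split(rows, rows)

print(len(x_tr), len(x_val))               # 48000 12000
print(x_val[0])                            # 48000: validation starts at the tail
print(math.ceil(len(x_tr) / batch_size))   # 375 weight updates per epoch
```

Because the held-out rows are simply the tail of the arrays, this split is only representative when the data is not ordered by class; for class-sorted data you would shuffle before calling `fit`.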

### What Happens During Training?

For each epoch (Keras reshuffles the training data every epoch by default):
1. **Forward pass**: Feed a batch of 128 training images through the network, get predictions
2. **Calculate loss**: Compare predictions to true labels using Categorical Crossentropy
3. **Backward pass**: Calculate gradients (how much each weight contributed to the error)
4. **Update weights**: Adam optimizer adjusts weights to reduce loss
5. **Repeat steps 1-4** for every batch until all 48,000 training images have been used
6. **Validation**: At the end of the epoch, evaluate on the validation set (no weight updates)
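To make the per-batch loop concrete, here is a NumPy sketch of the same six steps. It deliberately simplifies the real setup: a single Dense + softmax layer instead of the full network, plain SGD instead of Adam, and synthetic 3-class data instead of Fashion MNIST. `train_softmax_regression` and all data in it are hypothetical, so treat it as an illustration of the loop structure, not of what `model.fit` does internally.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

def crossentropy(y_onehot, probs):
    """Mean categorical crossentropy; clipping avoids log(0)."""
    return float(-np.mean(np.sum(y_onehot * np.log(np.clip(probs, 1e-7, 1.0)), axis=1)))

def train_softmax_regression(x_tr, y_tr, x_val, y_val, epochs=5, batch_size=128,
                             lr=0.01, seed=0):
    """Hypothetical mini-batch SGD loop for one Dense + softmax layer."""
    rng = np.random.default_rng(seed)
    W = np.zeros((x_tr.shape[1], y_tr.shape[1]))
    b = np.zeros(y_tr.shape[1])
    history = {"loss": [], "val_loss": []}
    for _ in range(epochs):
        order = rng.permutation(len(x_tr))                # reshuffle every epoch
        for start in range(0, len(x_tr), batch_size):     # 5. loop over all batches
            idx = order[start:start + batch_size]
            xb, yb = x_tr[idx], y_tr[idx]
            probs = softmax(xb @ W + b)                   # 1. forward pass
            # 2.-3. for softmax + crossentropy, d(loss)/d(logits) = (probs - y) / batch
            grad_logits = (probs - yb) / len(idx)
            W -= lr * (xb.T @ grad_logits)                # 4. update weights (plain SGD)
            b -= lr * grad_logits.sum(axis=0)
        # 6. end of epoch: record losses; the validation data never updates W or b
        history["loss"].append(crossentropy(y_tr, softmax(x_tr @ W + b)))
        history["val_loss"].append(crossentropy(y_val, softmax(x_val @ W + b)))
    return W, b, history

# Synthetic 3-class data with 20 features, standing in for flattened images
rng = np.random.default_rng(1)
centers = rng.normal(scale=3.0, size=(3, 20))
labels = rng.integers(0, 3, size=1200)
X = centers[labels] + rng.normal(size=(1200, 20))
Y = np.eye(3)[labels]
W, b, sgd_history = train_softmax_regression(X[:960], Y[:960], X[960:], Y[960:])
print([round(v, 4) for v in sgd_history["loss"]])
```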

You'll see a progress bar showing training and validation loss/accuracy for each epoch.

In [None]:
"""
=============================================================================
SECTION 9: MODEL TRAINING
=============================================================================
Train the neural network on the preprocessed Fashion MNIST training data.

Training Parameters:
────────────────────
- batch_size=128: Process 128 images at a time before updating weights.
  Larger batches = faster training but may reduce generalization.
  128 is a good balance between speed and learning quality.

- epochs=20: Complete 20 full passes through the training portion of the data.
  Each epoch, the model trains on 48,000 images (the other 12,000 are validation).
  20 epochs is typically sufficient for convergence on Fashion MNIST.

- validation_split=0.2: Reserve 20% of training data (12,000 images) for
  validation. This data is NOT used for training - it monitors how well
  the model generalizes to unseen data during training.
  Training: 48,000 images | Validation: 12,000 images | Test: 10,000 images
"""

print("="*60)
print("STARTING MODEL TRAINING")
print("="*60)
print(f"  Training samples:    {int(X_train.shape[0] * 0.8):,} (80% of train set)")
print(f"  Validation samples:  {int(X_train.shape[0] * 0.2):,} (20% of train set)")
print(f"  Test samples:        {X_test.shape[0]:,} (held out for final evaluation)")
print(f"  Batch size:          128")
print(f"  Epochs:              20")
print("="*60)

# Train the model
# model.fit() returns a History object containing loss and metric values per epoch
history = model.fit(
    X_train,                # Training images (60000, 784)
    y_train,                # Training labels (60000, 10) - one-hot encoded
    batch_size=128,         # Number of samples per gradient update
    epochs=20,              # Number of complete passes through training data
    validation_split=0.2,   # Fraction of training data to use for validation
    verbose=1               # Show progress bar for each epoch
)

print("\n" + "="*60)
print("TRAINING COMPLETED!")
print("="*60)

## Section 10: Learning Curves - Loss & Accuracy Graphs

Visualizing training history helps us understand **how well the model learned** and **whether it's overfitting**.

### How to Interpret These Graphs

#### Loss Graph (Left)
- **Both lines should decrease** over epochs → model is learning
- **Training loss continues decreasing while validation loss increases** → **OVERFITTING** (model memorizes training data)
- **Both lines plateau** → model has **CONVERGED** (learned what it can)
- **Loss still decreasing at epoch 20** → more epochs might improve performance
- **Green vertical line** marks the epoch with minimum validation loss (best generalization)

#### Accuracy Graph (Right)
- **Both lines should increase** over epochs
- **Gap between training and validation accuracy** indicates overfitting
- **Small gap (under about 5 percentage points)** is normal and acceptable as a rough rule of thumb
- **Large gap (over about 10 percentage points)** suggests overfitting - model performs much better on training data than unseen data

### What to Look For
- **Ideal scenario**: Both training and validation metrics improve together and plateau
- **Underfitting**: High loss, low accuracy on both training and validation (model hasn't learned enough)
- **Overfitting**: Training metrics great, validation metrics poor (model memorized instead of learned)
- **Good fit**: Small gap between training and validation, both metrics good

### Our Target
We aim for **85-90% test accuracy** with minimal overfitting.

In [None]:
"""
=============================================================================
SECTION 10: TRAINING VISUALIZATION - LEARNING CURVES
=============================================================================
Plot the training and validation loss/accuracy over epochs.

How to interpret these graphs:
──────────────────────────────
LOSS GRAPH (left):
  - Both lines should decrease over epochs (model is learning)
  - If validation loss starts increasing while training loss continues
    decreasing → OVERFITTING (model memorizes training data)
  - If both lines plateau → model has CONVERGED (learned what it can)
  - If loss is still decreasing â†’ more epochs might help

ACCURACY GRAPH (right):
  - Both lines should increase over epochs
  - Gap between training and validation accuracy indicates overfitting
  - A small gap (<5%) is normal and acceptable
"""

fig, axes = plt.subplots(1, 2, figsize=(14, 5))
fig.suptitle('Model Training History', fontsize=16, fontweight='bold')

# Keras logs epochs as 1..N, so use 1-based epoch numbers on the x-axis too
epochs_range = np.arange(1, len(history.history['loss']) + 1)

# --- Plot 1: Loss over Epochs ---
axes[0].plot(epochs_range, history.history['loss'], label='Training Loss',
             color='blue', linewidth=2)
axes[0].plot(epochs_range, history.history['val_loss'], label='Validation Loss',
             color='red', linewidth=2, linestyle='--')
axes[0].set_title('Loss Over Epochs', fontsize=14)
axes[0].set_xlabel('Epoch', fontsize=12)
axes[0].set_ylabel('Loss (Categorical Crossentropy)', fontsize=12)
axes[0].legend(fontsize=11)
axes[0].grid(True, alpha=0.3)
# Mark the epoch with minimum validation loss
min_val_loss_epoch = int(np.argmin(history.history['val_loss']))  # 0-based index
best_epoch = min_val_loss_epoch + 1                                # 1-based epoch number
min_val_loss = history.history['val_loss'][min_val_loss_epoch]
axes[0].axvline(x=best_epoch, color='green', linestyle=':', alpha=0.7)
axes[0].annotate(f'Min val loss: {min_val_loss:.4f}\n(epoch {best_epoch})',
                xy=(best_epoch, min_val_loss),
                xytext=(best_epoch + 2, min_val_loss + 0.05),
                arrowprops=dict(arrowstyle='->', color='green'),
                fontsize=10, color='green')

# --- Plot 2: Accuracy over Epochs ---
axes[1].plot(epochs_range, history.history['accuracy'], label='Training Accuracy',
             color='blue', linewidth=2)
axes[1].plot(epochs_range, history.history['val_accuracy'], label='Validation Accuracy',
             color='red', linewidth=2, linestyle='--')
axes[1].set_title('Accuracy Over Epochs', fontsize=14)
axes[1].set_xlabel('Epoch', fontsize=12)
axes[1].set_ylabel('Accuracy', fontsize=12)
axes[1].legend(fontsize=11)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Print final metrics
print(f"\nFinal Training Loss:      {history.history['loss'][-1]:.4f}")
print(f"Final Validation Loss:    {history.history['val_loss'][-1]:.4f}")
print(f"Final Training Accuracy:  {history.history['accuracy'][-1]:.4f} ({history.history['accuracy'][-1]*100:.2f}%)")
print(f"Final Validation Accuracy:{history.history['val_accuracy'][-1]:.4f} ({history.history['val_accuracy'][-1]*100:.2f}%)")

## Section 11: Model Evaluation on Test Set

The **true test** of our model is how well it performs on the **test set** - 10,000 images that were completely held out during training.

### Why the Test Set Matters

1. **Completely unseen data**: The model has NEVER seen these images during training or validation
2. **Unbiased evaluation**: Gives us an honest estimate of real-world performance
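3. **Guards against optimistic bias**: Choices made while watching validation scores (such as how long to train) can make validation accuracy slightly optimistic; the test set played no part in those choices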
4. **Simulates production**: Mimics how the model will perform on new images in deployment

### What We're Evaluating

- **Test Loss**: Categorical crossentropy between the model's predicted probabilities and the true labels (lower is better)
- **Test Accuracy**: Percentage of correctly classified images (our main metric)
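Both numbers can be reproduced by hand from the model's probability outputs and the one-hot labels. This is a minimal, framework-free sketch of the textbook categorical crossentropy and argmax accuracy on three toy samples. Keras's built-in loss may differ in small numerical details (for example, how it clips probabilities away from 0), so treat it as an illustration rather than a replacement for `model.evaluate()`.

```python
import math


def categorical_crossentropy(y_true_onehot, y_pred_probs, eps=1e-7):
    """Mean categorical crossentropy over a batch (textbook formula)."""
    total = 0.0
    for true_row, prob_row in zip(y_true_onehot, y_pred_probs):
        # Clip to avoid log(0); with one-hot labels only the true-class term is non-zero
        total -= sum(t * math.log(min(max(p, eps), 1 - eps))
                     for t, p in zip(true_row, prob_row))
    return total / len(y_true_onehot)


def argmax(values):
    """Index of the largest value (what np.argmax does on a 1-D array)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(y_true_onehot, y_pred_probs):
    """Fraction of samples whose highest-probability class matches the label."""
    hits = sum(argmax(t) == argmax(p) for t, p in zip(y_true_onehot, y_pred_probs))
    return hits / len(y_true_onehot)


# Three toy samples with 3 classes (Fashion MNIST uses 10)
y_true = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
y_prob = [[0.1, 0.8, 0.1],   # correct and confident
          [0.6, 0.3, 0.1],   # correct, less confident
          [0.5, 0.3, 0.2]]   # wrong: the true class only gets 0.2
print(f"loss     = {categorical_crossentropy(y_true, y_prob):.4f}")
print(f"accuracy = {accuracy(y_true, y_prob):.4f}")
```

The wrong, low-probability third sample contributes most of the loss (about 1.61 of the roughly 2.34 total), which is why loss can keep changing even when accuracy does not.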

### Our Target

- **Goal**: 85%+ accuracy on test set
- **Excellent**: 90%+ accuracy
- **For comparison**: convolutional networks (CNNs) typically reach about 92-95% accuracy

### Next Steps After Evaluation

We'll also generate predictions on the entire test set to create a **confusion matrix**, which shows exactly which clothing types the model confuses most often.

In [None]:
"""
=============================================================================
SECTION 10: MODEL EVALUATION ON TEST SET
=============================================================================
Evaluate the trained model on the test set (10,000 images never seen during training).
This gives us an unbiased estimate of how well our model will perform on new data.

The test set was kept completely separate during training - the model has
never learned from these images, making this a fair evaluation.
"""

# Evaluate on test set
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)

print("="*60)
print("TEST SET EVALUATION RESULTS")
print("="*60)
print(f"  Test Loss:     {test_loss:.4f}")
print(f"  Test Accuracy: {test_accuracy:.4f} ({test_accuracy*100:.2f}%)")
print("="*60)

if test_accuracy >= 0.85:
    print("\n✓ Target accuracy (≥85%) ACHIEVED!")
else:
    print("\n⚠ Target accuracy (≥85%) not met. Consider tuning hyperparameters.")

# Generate predictions for confusion matrix
# model.predict() returns probability arrays for each sample
y_pred_probs = model.predict(X_test, verbose=0)

# Convert probability arrays to class indices using argmax
# argmax returns the index of the highest probability = predicted class
y_pred_classes = np.argmax(y_pred_probs, axis=1)

# Convert one-hot encoded test labels back to class indices for comparison
y_true_classes = np.argmax(y_test, axis=1)

print(f"\nPredictions generated for {len(y_pred_classes)} test images")

## Section 12: Confusion Matrix (10×10)

The **confusion matrix** is one of the most insightful visualizations for classification models. It shows exactly which classes the model confuses.

### How to Read the Confusion Matrix

- **ROWS** represent the **TRUE (actual) labels** - what the clothing item really is
- **COLUMNS** represent the **PREDICTED labels** - what the model thinks it is
- **DIAGONAL cells** (top-left to bottom-right) = **CORRECT predictions** ✓
- **OFF-DIAGONAL cells** = **MISCLASSIFICATIONS** ✗

### Example Interpretation

- Cell `[row=Shirt, col=T-shirt]` = "Shirts that were incorrectly predicted as T-shirts"
- Cell `[row=Sneaker, col=Sneaker]` = "Sneakers correctly identified as Sneakers"
- High diagonal values = good performance
- High off-diagonal values = common confusion patterns
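The reading rules above can be checked on a tiny, hand-made example. This sketch builds the matrix with plain lists (hypothetical helper names; the notebook itself uses `sklearn.metrics.confusion_matrix`, which follows the same rows-true, columns-predicted convention) and computes per-class accuracy as diagonal divided by row total, which is the per-class recall. The labels and predictions are invented for illustration.

```python
def confusion_matrix_lists(y_true, y_pred, num_classes):
    """Rows = true class, columns = predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_accuracy(cm):
    """Diagonal / row total: share of each true class predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]


# Toy example with 3 of the 10 Fashion MNIST classes (invented labels)
names = ["T-shirt/top", "Shirt", "Sneaker"]
y_true = [0, 0, 0, 1, 1, 1, 1, 2, 2]
y_pred = [0, 0, 1, 1, 0, 0, 1, 2, 2]

cm = confusion_matrix_lists(y_true, y_pred, 3)
for name, row in zip(names, cm):
    print(f"{name:12s} {row}")
print("Shirts predicted as T-shirts:", cm[1][0])   # row=Shirt, col=T-shirt
print("Per-class accuracy:", [round(a, 2) for a in per_class_accuracy(cm)])
```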

### What to Look For

1. **Strong diagonal** - indicates overall good performance
2. **Weak spots** - classes with lower diagonal values need improvement
3. **Confusion patterns** - which classes are commonly mistaken for each other?
   - Example: Shirts vs T-shirts (similar appearance)
   - Example: Pullover vs Coat (similar long-sleeved tops)
   - Example: Sneaker vs Ankle boot (both footwear)
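The confusion-matrix cell in this section ranks *directed* errors (Shirt predicted as T-shirt is counted separately from T-shirt predicted as Shirt). To ask which two classes get mixed up with each other most, one option is to add the two off-diagonal cells of each pair. `most_confused_pairs` is a hypothetical helper, and the counts below are made up, not results from this model.

```python
from itertools import combinations


def most_confused_pairs(cm, class_names, top=3):
    """Rank unordered class pairs by cm[i][j] + cm[j][i] (errors in both directions)."""
    pairs = []
    for i, j in combinations(range(len(cm)), 2):
        total = cm[i][j] + cm[j][i]
        if total:
            pairs.append((total, class_names[i], class_names[j]))
    # Stable sort: ties keep their original (i, j) order
    pairs.sort(key=lambda x: x[0], reverse=True)
    return pairs[:top]


# Invented 4x4 matrix (rows = true, columns = predicted)
names = ["T-shirt/top", "Pullover", "Coat", "Shirt"]
cm = [[90, 1, 0, 9],
      [1, 85, 10, 4],
      [0, 7, 88, 5],
      [14, 3, 2, 81]]
for total, a, b in most_confused_pairs(cm, names):
    print(f"{a} <-> {b}: {total} errors")
```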

### Perfect Model

A perfect model would have:
- **All values on the diagonal** (all predictions correct)
- **Zeros everywhere else** (no misclassifications)

In practice, some confusion is inevitable due to the complexity and similarity of certain clothing types.

In [None]:
"""
=============================================================================
SECTION 11: CONFUSION MATRIX (10×10)
=============================================================================
The confusion matrix shows how well the model classifies each clothing type.

How to read it:
───────────────
- ROWS represent the TRUE (actual) labels
- COLUMNS represent the PREDICTED labels
- DIAGONAL cells (top-left to bottom-right) = CORRECT predictions
- OFF-DIAGONAL cells = MISCLASSIFICATIONS
- Example: If cell [row=Shirt, col=T-shirt] has a high value,
  it means the model often mistakes Shirts for T-shirts

A perfect model would have values ONLY on the diagonal (zero errors).
"""

# Compute the 10x10 confusion matrix
cm = confusion_matrix(y_true_classes, y_pred_classes)

# Create a large, detailed heatmap visualization
plt.figure(figsize=(12, 10))

# Use seaborn heatmap for professional visualization
sns.heatmap(
    cm,
    annot=True,          # Show numbers in each cell
    fmt='d',             # Integer format (not scientific notation)
    cmap='Blues',         # Blue color gradient
    xticklabels=class_names,  # Column labels = class names
    yticklabels=class_names,  # Row labels = class names
    linewidths=0.5,      # Grid line width
    linecolor='gray',    # Grid line color
    square=True          # Make cells square
)

plt.title('Confusion Matrix - Fashion MNIST Classification\n(10×10: True vs Predicted Labels)', 
          fontsize=16, fontweight='bold', pad=20)
plt.xlabel('Predicted Label', fontsize=13, labelpad=10)
plt.ylabel('True Label', fontsize=13, labelpad=10)
plt.xticks(rotation=45, ha='right', fontsize=10)
plt.yticks(rotation=0, fontsize=10)
plt.tight_layout()
plt.show()

# Print per-class accuracy from the confusion matrix
print("\n" + "="*60)
print("PER-CLASS ACCURACY (from Confusion Matrix)")
print("="*60)
for i, name in enumerate(class_names):
    # Accuracy for class i = correct predictions / total samples of class i
    class_accuracy = cm[i, i] / cm[i].sum()
    correct = cm[i, i]
    total = cm[i].sum()
    bar = '█' * int(class_accuracy * 30) + '░' * (30 - int(class_accuracy * 30))
    print(f"  {name:15s}: {bar} {class_accuracy:.2%}  ({correct}/{total})")
print("="*60)

# Identify most confused pairs
print("\nMost Common Misclassifications:")
# Zero out diagonal to find off-diagonal maximums
cm_no_diag = cm.copy()
np.fill_diagonal(cm_no_diag, 0)
for _ in range(5):
    idx = np.unravel_index(cm_no_diag.argmax(), cm_no_diag.shape)
    true_class = class_names[idx[0]]
    pred_class = class_names[idx[1]]
    count = cm_no_diag[idx]
    print(f"  {true_class} → predicted as {pred_class}: {count} times")
    cm_no_diag[idx] = 0

## Section 13: Predict Unknown Image (User-Loaded External Images)

Now let's use our trained model to classify **your own clothing images**!

### How This Function Works

The `predict_clothing_image()` function takes any external image file and:

1. **Loads the image** from the file path
2. **Converts to grayscale** (if it's a color image)
3. **Resizes to 28×28 pixels** (Fashion MNIST format)
4. **Auto-inverts colors if needed**: Fashion MNIST has light clothing on dark backgrounds, but real photos usually show the opposite. The function detects a light background (mean pixel value above 127) and inverts automatically.
5. **Normalizes pixel values** to [0, 1] (divide by 255)
6. **Flattens to 784-element vector** (same as training data)
7. **Feeds through the neural network**
8. **Returns the predicted class and confidence**
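Steps 4 to 6 are plain array arithmetic, so they can be shown without PIL or NumPy. This sketch assumes the image has already been converted to grayscale and resized (steps 2 and 3) and represents it as a 28×28 list of lists; `preprocess_28x28` is a hypothetical name, and the notebook's real function does the same work with NumPy.

```python
def preprocess_28x28(pixels):
    """Steps 4-6 above, on a grid that is already grayscale and 28x28.

    `pixels` stands in for np.array(img_resized): 28 rows of 28 ints in
    [0, 255]. Returns a batch of one 784-long vector, i.e. shape (1, 784).
    """
    if len(pixels) != 28 or any(len(row) != 28 for row in pixels):
        raise ValueError("expected a 28x28 grid")
    flat = [v for row in pixels for v in row]   # row-major, like reshape(784)
    mean_pixel = sum(flat) / len(flat)
    if mean_pixel > 127:                         # light background -> invert
        flat = [255 - v for v in flat]
    # Normalizing before or after flattening gives the same numbers
    normalized = [v / 255.0 for v in flat]       # [0, 255] -> [0.0, 1.0]
    return [normalized]                          # batch dimension of 1


# Synthetic "photo": white background with a dark 10x10 garment in the middle
photo = [[255] * 28 for _ in range(28)]
for r in range(9, 19):
    for c in range(9, 19):
        photo[r][c] = 20

batch = preprocess_28x28(photo)
print(len(batch), len(batch[0]))   # 1 784
print(batch[0][0])                 # corner (background) pixel after inversion: 0.0
print(batch[0][14 * 28 + 14])      # center (garment) pixel: 235 / 255
```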

### Color Inversion: Why It Matters

Fashion MNIST images look like this: **black background, light-colored clothing silhouette**. Real-world photos are usually the opposite: **light/white background, dark clothing**. Without inverting, the model receives input that looks very different from its training data, which typically leads to poor predictions. The function handles this automatically.
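The mean-pixel rule is a heuristic: because it averages garment and background together, a dark item that fills most of the frame can pull the mean below 127 even though the background is white. The arithmetic below makes that concrete on synthetic images. `needs_inversion_border` is a hypothetical alternative, not used by the notebook, that judges the background from the outermost ring of pixels only.

```python
def centered_square(side, garment=30, background=255, size=28):
    """Synthetic image: a dark square garment centered on a light background."""
    start = (size - side) // 2
    return [[garment if start <= r < start + side and start <= c < start + side
             else background for c in range(size)] for r in range(size)]


def needs_inversion_mean(img, threshold=127):
    """The notebook's rule: invert when the overall mean pixel is above 127."""
    flat = [v for row in img for v in row]
    return sum(flat) / len(flat) > threshold


def needs_inversion_border(img, threshold=127):
    """Hypothetical variant: judge the background from the outermost ring only."""
    n = len(img)
    ring = (img[0] + img[-1]
            + [img[r][0] for r in range(1, n - 1)]
            + [img[r][-1] for r in range(1, n - 1)])
    return sum(ring) / len(ring) > threshold


for side in (18, 24):
    img = centered_square(side)
    mean = sum(v for row in img for v in row) / (28 * 28)
    print(f"garment {side}x{side}: mean={mean:.1f}, "
          f"mean rule inverts={needs_inversion_mean(img)}, "
          f"border rule inverts={needs_inversion_border(img)}")
```

With an 18×18 garment the mean is about 162 and both rules invert; with a 24×24 garment the mean drops to about 90, so the mean rule skips inversion while the border rule still detects the white background.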

### Important Notes

**Image Format Requirements:**
- Supported formats: PNG, JPG, JPEG, BMP, WEBP
- Any size image works (auto-resized to 28×28)
- Color or grayscale images both work (auto-converted)

### How to Use

**Option 1: File Upload Widget** (Google Colab only):
Run the upload cell below, click "Choose Files", select your clothing image, and get an instant prediction.

**Option 2: Direct Path**:
Call `predict_clothing_image('path/to/image.jpg', model, class_names)` with the full path to your clothing image file.

### Visualization Output

The function displays:
- **Left plot**: Original image as loaded
- **Center plot**: The preprocessed 28×28 grayscale image (what the model sees)
- **Right plot**: Confidence bar chart for all 10 classes
- **Top 3 predictions** with confidence percentages
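The top-3 list is simply the three largest entries of the softmax output. The notebook uses `np.argsort(prediction[0])[::-1][:3]`; this dependency-free sketch (hypothetical `top_k` helper, made-up probabilities) shows the same ranking with `sorted()`.

```python
def top_k(probs, class_names, k=3):
    """Return the k (name, probability) pairs with the highest probability.

    Same ordering as np.argsort(probs)[::-1][:k] when there are no ties.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(class_names[i], probs[i]) for i in order[:k]]


class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]
# Made-up softmax output for one image (sums to 1.0)
probs = [0.05, 0.00, 0.10, 0.01, 0.20, 0.00, 0.60, 0.00, 0.04, 0.00]

for rank, (name, p) in enumerate(top_k(probs, class_names), 1):
    print(f"{rank}. {name:15s}: {p:.2%}")
```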

In [None]:
"""
=============================================================================
SECTION 12: UNKNOWN IMAGE PREDICTION (User-Loaded External Images)
=============================================================================
This function loads an external clothing image provided by the user and
predicts its class using our trained neural network.

The image is converted to the same format as the training data:
  1. Convert to grayscale (if color)
  2. Resize to 28x28 pixels
  3. Auto-invert colors if needed (Fashion MNIST = light-on-dark,
     real photos = dark-on-light). Detected by mean pixel value.
  4. Normalize pixel values to [0, 1]
  5. Flatten to 784-element vector
  6. Reshape to (1, 784) for model input (batch of 1)
"""

def predict_clothing_image(image_path, model, class_names):
    """
    Load an external clothing image and predict its class using the trained model.
    
    Preprocesses the image to match Fashion MNIST format: grayscale, 28x28 pixels,
    light-on-dark colors, normalized, and flattened. Automatically inverts colors
    for real-world photos that have dark clothing on light backgrounds.
    
    Args:
        image_path (str): Path to the clothing image file (PNG, JPG, JPEG, BMP, WEBP)
        model: Trained Keras model
        class_names (list): List of 10 class name strings
    
    Returns:
        tuple: (predicted_class_name, confidence_score) or (None, 0.0) on error
    """
    try:
        # Step 1: Load the image file
        img = Image.open(image_path)
        print(f"Image loaded: {image_path}")
        print(f"  Original size: {img.size}, Mode: {img.mode}")
        
        # Keep a copy of the original for display
        img_original = img.copy()
        
        # Step 2: Convert to grayscale (single channel)
        # Fashion MNIST images are grayscale, so we need to match that format
        img_gray = img.convert('L')
        
        # Step 3: Resize to 28x28 pixels (Fashion MNIST image dimensions)
        img_resized = img_gray.resize((28, 28))
        
        # Step 4: Convert to numpy array for numerical processing
        img_array = np.array(img_resized)
        
        # Step 5: Auto-detect and invert colors if needed
        # Fashion MNIST has BLACK background (0) with LIGHT clothing (~255)
        # Real-world photos typically have LIGHT/WHITE background with DARK clothing
        # If mean pixel value > 127, the background is light → invert colors
        mean_pixel = img_array.mean()
        if mean_pixel > 127:
            img_array = 255 - img_array
            print(f"  Color inverted (mean pixel {mean_pixel:.0f} > 127 → light background detected)")
        else:
            print(f"  No inversion needed (mean pixel {mean_pixel:.0f} ≤ 127 → dark background)")
        
        # Step 6: Normalize pixel values from [0, 255] to [0, 1]
        img_normalized = img_array.astype('float32') / 255.0
        
        # Step 7: Flatten from 2D (28, 28) to 1D (784,)
        # Then reshape to (1, 784) - batch dimension required by model
        img_flat = img_normalized.reshape(1, 784)
        
        # Step 8: Make prediction using the trained model
        prediction = model.predict(img_flat, verbose=0)
        
        # Step 9: Extract predicted class and confidence
        predicted_class_idx = np.argmax(prediction[0])
        confidence = prediction[0][predicted_class_idx]
        predicted_name = class_names[predicted_class_idx]
        
        # --- Display results ---
        fig, axes = plt.subplots(1, 3, figsize=(15, 4))
        
        # Left: Show original image as loaded
        if img_original.mode == 'L':
            axes[0].imshow(np.array(img_original), cmap='gray')
        else:
            # Convert palette/alpha/other modes to RGB so imshow can display them
            axes[0].imshow(np.array(img_original.convert('RGB')))
        axes[0].set_title('Original Image', fontsize=12)
        axes[0].axis('off')
        
        # Center: Show preprocessed 28x28 image (what the model sees)
        axes[1].imshow(img_array, cmap='gray')
        axes[1].set_title('Preprocessed 28×28\n(Model Input)', fontsize=12)
        axes[1].axis('off')
        
        # Right: Show prediction confidence for all classes
        colors = ['green' if i == predicted_class_idx else 'steelblue' 
                  for i in range(10)]
        axes[2].barh(class_names, prediction[0], color=colors)
        axes[2].set_title('Prediction Confidence', fontsize=12)
        axes[2].set_xlim([0, 1])
        axes[2].set_xlabel('Probability')
        
        fig.suptitle(f'Prediction: {predicted_name} ({confidence:.1%} confidence)', 
                     fontsize=14, fontweight='bold')
        plt.tight_layout()
        plt.show()
        
        print(f"\n{'='*40}")
        print(f"  Predicted Class: {predicted_name}")
        print(f"  Confidence:      {confidence:.2%}")
        print(f"{'='*40}")
        
        # Show top 3 predictions
        top3_idx = np.argsort(prediction[0])[::-1][:3]
        print("\nTop 3 predictions:")
        for rank, idx in enumerate(top3_idx, 1):
            print(f"  {rank}. {class_names[idx]:15s}: {prediction[0][idx]:.2%}")
        
        return predicted_name, float(confidence)
        
    except FileNotFoundError:
        print(f"ERROR: Image file not found at '{image_path}'")
        print("  Please check the file path and try again.")
        return None, 0.0
    except Exception as e:
        print(f"ERROR: Could not process image - {str(e)}")
        return None, 0.0


# === QUICK TEST: Predict on a sample from the test set ===
print("Quick test: predicting on a sample from the test set...\n")
sample_idx = 0
sample_img = Image.fromarray(X_test_raw[sample_idx])
sample_path = 'sample_test_image.png'
sample_img.save(sample_path)
print(f"Saved test image (true label: {class_names[y_test_raw[sample_idx]]}) to '{sample_path}'")

predicted_class, confidence = predict_clothing_image(sample_path, model, class_names)
print(f"\nTrue label: {class_names[y_test_raw[sample_idx]]}")
print(f"Predicted:  {predicted_class}")
print(f"Correct:    {'YES' if predicted_class == class_names[y_test_raw[sample_idx]] else 'NO'}")

### Try It Yourself: Upload Your Own Clothing Image

Run the cell below to upload and classify your own clothing images. The cell uses `google.colab`, so it only works in Google Colab.

**How it works:**
1. Click the **"Choose Files"** button that appears
2. Select one or more clothing image files from your computer
3. The model will automatically predict each uploaded image

**Tips for best results:**
- Use images with a **clean, single-color background**
- The image should contain **one clothing item**
- Any size and format (PNG, JPG, JPEG, BMP, WEBP) works; images are auto-resized
- Colors are **auto-inverted** when a light background is detected, to match the Fashion MNIST format

**Run the cell again** to upload more images.

In [None]:
# SECTION 12b: UPLOAD & PREDICT using Google Colab File Upload
from google.colab import files
import io

print("=" * 60)
print("UPLOAD YOUR CLOTHING IMAGES")
print("=" * 60)
print("Click Choose Files below to select images from your computer.")
print()

uploaded = files.upload()

if uploaded:
    print(str(len(uploaded)) + " file(s) uploaded. Running predictions...")
    print()
    for filename, file_data in uploaded.items():
        print("=" * 60)
        print("Processing: " + filename)
        print("=" * 60)
        try:
            img = Image.open(io.BytesIO(file_data))
            print("  Original size: " + str(img.size) + ", Mode: " + img.mode)
            img_original = img.copy()
            img_gray = img.convert('L')
            img_resized = img_gray.resize((28, 28))
            img_array = np.array(img_resized)
            mean_pixel = img_array.mean()
            if mean_pixel > 127:
                img_array = 255 - img_array
                print("  Color inverted (light background detected)")
            else:
                print("  No inversion needed (dark background)")
            img_normalized = img_array.astype('float32') / 255.0
            img_flat = img_normalized.reshape(1, 784)
            prediction = model.predict(img_flat, verbose=0)
            predicted_class_idx = np.argmax(prediction[0])
            confidence = prediction[0][predicted_class_idx]
            predicted_name = class_names[predicted_class_idx]
            fig, axes = plt.subplots(1, 3, figsize=(15, 4))
            if img_original.mode == 'L':
                axes[0].imshow(np.array(img_original), cmap='gray')
            else:
                # Convert palette/alpha/other modes to RGB so imshow can display them
                axes[0].imshow(np.array(img_original.convert('RGB')))
            axes[0].set_title('Original Image', fontsize=12)
            axes[0].axis('off')
            axes[1].imshow(img_array, cmap='gray')
            axes[1].set_title('Preprocessed 28x28', fontsize=12)
            axes[1].axis('off')
            bar_colors = ['green' if i == predicted_class_idx else 'steelblue' for i in range(10)]
            axes[2].barh(class_names, prediction[0], color=bar_colors)
            axes[2].set_title('Prediction Confidence', fontsize=12)
            axes[2].set_xlim([0, 1])
            axes[2].set_xlabel('Probability')
            pct = str(round(confidence * 100, 1))
            fig.suptitle('Prediction: ' + predicted_name + ' (' + pct + '%)', fontsize=14, fontweight='bold')
            plt.tight_layout()
            plt.show()
            top3_idx = np.argsort(prediction[0])[::-1][:3]
            print("  Predicted: " + predicted_name + " (" + pct + "%)")
            print("  Top 3:")
            for rank, idx in enumerate(top3_idx, 1):
                p = str(round(prediction[0][idx] * 100, 2))
                print("    " + str(rank) + ". " + class_names[idx] + ": " + p + "%")
            print()
        except Exception as e:
            print("  ERROR: Could not process " + filename + " - " + str(e))
            print()
else:
    print("No files uploaded.")

print("Done! Run this cell again to upload more images.")


## Section 14: Save Trained Model (Optional)

After spending time training our model, we want to **save it for future use** without needing to retrain.

### What Gets Saved

The `.keras` file format (the native Keras format in recent TensorFlow 2.x releases) saves the **complete model**:

1. **Architecture** - layer structure, neuron counts, activations
2. **Weights** - all learned parameters from training
3. **Optimizer state** - Adam's configuration and its internal moment estimates, so training can resume where it stopped
4. **Compilation settings** - loss function, metrics

### Benefits of Saving the Model

- **Skip retraining**: Load the model instantly instead of training for 20 epochs
- **Deployment**: Use the model in production applications
- **Sharing**: Share the trained model with others
- **Versioning**: Save different versions as you experiment
- **Continued training**: Load and continue training with more data

### How to Load the Model Later

```python
import tensorflow as tf

# Load the complete model
loaded_model = tf.keras.models.load_model('fashion_mnist_model.keras')

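# Use it for predictions (predict_clothing_image and class_names must be
# defined in the session first, e.g. by re-running those notebook cells)
predict_clothing_image('my_image.jpg', loaded_model, class_names)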
```

### Alternative Save Formats

- **HDF5 format** (legacy): `model.save('model.h5')`
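- **Weights only**: `model.save_weights('model.weights.h5')` - saves only the weights, not the architecture (Keras 3 requires the `.weights.h5` suffix)
- **SavedModel format** (for TensorFlow Serving in production): `model.export('saved_model/')` in Keras 3 / TensorFlow 2.16+; older TensorFlow 2.x versions use `model.save('saved_model/')`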

We recommend the `.keras` format as it's the modern standard and most convenient.

In [None]:
"""
=============================================================================
SECTION 13: SAVE TRAINED MODEL (Optional)
=============================================================================
Save the trained model to disk so it can be loaded later for predictions
without needing to retrain. This is useful for deployment or continued work.
"""

# Save the complete model (architecture + weights + optimizer state)
model.save('fashion_mnist_model.keras')
print("Model saved to 'fashion_mnist_model.keras'")
print("\nTo load the model later:")
print("  loaded_model = tf.keras.models.load_model('fashion_mnist_model.keras')")
print("  predict_clothing_image('image.jpg', loaded_model, class_names)")