# Why MLPs Fail for Image Data

**Deep Learning - University of Vermont**

---

## Learning Objectives

By the end of this tutorial, you will be able to:

1. Explain why Multi-Layer Perceptrons (MLPs) are not suitable for image classification tasks
2. Calculate the number of parameters in an MLP given different input sizes
3. Demonstrate the scalability problem with concrete numerical examples
4. Understand the motivation for Convolutional Neural Networks (CNNs)

## Introduction

Multi-Layer Perceptrons have proven effective for many machine learning tasks, particularly those involving tabular or structured data. A natural question arises: can we apply the same approach to image classification?

In this tutorial, we examine this question using the Fashion MNIST dataset as our case study. We will demonstrate that while MLPs can technically be applied to image data, fundamental scalability issues make them impractical for real-world computer vision applications.

This analysis motivates the transition to Convolutional Neural Networks, which we will cover in subsequent lectures.

## Representing Images as Input to Neural Networks

When using an MLP for image classification, the image must be flattened so that each pixel value in each channel becomes a separate input feature. The total number of input features is given by:

$$\text{Input Size} = \text{Width} \times \text{Height} \times \text{Channels}$$

For the Fashion MNIST dataset, images are grayscale (1 channel) with dimensions 28×28:

$$28 \times 28 \times 1 = 784 \text{ input features}$$

For a standard RGB image of size 64×64:

$$64 \times 64 \times 3 = 12{,}288 \text{ input features}$$

In [None]:
# Import required libraries
import numpy as np
import matplotlib.pyplot as plt

def calculate_input_size(width, height, channels):
    """
    Calculate the total number of input features for an image.
    
    Parameters:
    -----------
    width : int
        Width of the image in pixels
    height : int
        Height of the image in pixels
    channels : int
        Number of color channels (1 for grayscale, 3 for RGB)
    
    Returns:
    --------
    int
        Total number of input features
    """
    return width * height * channels

# Calculate input size for Fashion MNIST (grayscale, 28x28)
fashion_mnist_input = calculate_input_size(28, 28, 1)
print(f"Fashion MNIST (28×28 grayscale): {fashion_mnist_input:,} input features")

# Calculate input size for a small RGB image (64x64)
small_rgb_input = calculate_input_size(64, 64, 3)
print(f"Small RGB image (64×64):         {small_rgb_input:,} input features")

# Calculate input size for a high-resolution image (2000x2000)
large_rgb_input = calculate_input_size(2000, 2000, 3)
print(f"High-resolution image (2000×2000): {large_rgb_input:,} input features")
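
To make the flattening step concrete, the sketch below reshapes a dummy batch with the Fashion MNIST image shape into the 2D matrix an MLP expects. The array contents are random placeholders rather than real images, and the names `batch` and `flat` are chosen here purely for illustration.

```python
import numpy as np

# Placeholder batch: 32 random "images" with the Fashion MNIST shape
# (height, width, channels) = (28, 28, 1). These are not real images.
rng = np.random.default_rng(seed=0)
batch = rng.random((32, 28, 28, 1))

# Flatten every image into one row; -1 lets NumPy infer 28 * 28 * 1 = 784
flat = batch.reshape(batch.shape[0], -1)

print(f"Image batch shape: {batch.shape}")
print(f"Flattened shape:   {flat.shape}")
```

Each row of `flat` is one image, and each column is one input feature of the network.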

## Parameter Count in Multi-Layer Perceptrons

Consider an MLP with two hidden layers. The total number of parameters (weights only, excluding biases) is calculated as:

$$\text{Parameters} = (\text{Input} \times H_1) + (H_1 \times H_2) + (H_2 \times \text{Output})$$

Where:
- $H_1$ is the number of neurons in the first hidden layer
- $H_2$ is the number of neurons in the second hidden layer

The following function computes this parameter count:

In [None]:
def count_mlp_parameters(input_size, hidden1, hidden2, output_size):
    """
    Calculate the total number of weight parameters in a two-hidden-layer MLP.
    
    This calculation excludes bias terms to match the lecture slide examples.
    
    Parameters:
    -----------
    input_size : int
        Number of input features (flattened image size)
    hidden1 : int
        Number of neurons in the first hidden layer
    hidden2 : int
        Number of neurons in the second hidden layer
    output_size : int
        Number of output classes
    
    Returns:
    --------
    tuple
        (total_parameters, layer1_params, layer2_params, layer3_params)
    """
    # Weights from input layer to first hidden layer
    layer1_params = input_size * hidden1
    
    # Weights from first hidden layer to second hidden layer
    layer2_params = hidden1 * hidden2
    
    # Weights from second hidden layer to output layer
    layer3_params = hidden2 * output_size
    
    # Total parameter count
    total = layer1_params + layer2_params + layer3_params
    
    return total, layer1_params, layer2_params, layer3_params
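
The function above counts weights only. As a rough sketch of how much the bias terms add, the hypothetical helper `count_dense_parameters` below (not part of the lecture material) handles an arbitrary list of layer sizes and can optionally include one bias per neuron. The layer sizes used here are just an example.

```python
def count_dense_parameters(layer_sizes, include_bias=True):
    """Count parameters of a fully connected network given its layer sizes."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += fan_in * fan_out      # weight matrix between two layers
        if include_bias:
            total += fan_out           # one bias per neuron in the next layer
    return total

# Example layout: 784 inputs, two hidden layers of 1,000 neurons, 10 outputs
sizes = [784, 1000, 1000, 10]
weights_only = count_dense_parameters(sizes, include_bias=False)
with_bias = count_dense_parameters(sizes, include_bias=True)

print(f"Weights only:  {weights_only:,}")
print(f"With biases:   {with_bias:,}")
print(f"Bias share:    {(with_bias - weights_only) / with_bias:.4%}")
```

The biases add only one value per neuron, so leaving them out barely changes the totals discussed in this tutorial.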

## Case Study: Fashion MNIST Classification

Let us apply this analysis to the Fashion MNIST dataset, which contains 28×28 grayscale images of clothing items across 10 categories.

### Network Architecture

We will use the following architecture:
- **Input**: 28 × 28 × 1 = 784 pixels
- **Hidden Layer 1**: 1,000 neurons
- **Hidden Layer 2**: 1,000 neurons
- **Output**: 10 classes

In [None]:
# Define the network architecture for Fashion MNIST
INPUT_SIZE_FASHION = 28 * 28 * 1  # 784 pixels (grayscale)
HIDDEN_LAYER_1 = 1000             # First hidden layer size
HIDDEN_LAYER_2 = 1000             # Second hidden layer size
OUTPUT_SIZE = 10                  # 10 clothing categories

# Calculate the total number of parameters
total_params, l1_params, l2_params, l3_params = count_mlp_parameters(
    INPUT_SIZE_FASHION, HIDDEN_LAYER_1, HIDDEN_LAYER_2, OUTPUT_SIZE
)

# Display the results
print("Fashion MNIST Classification Network")
print("=" * 50)
print(f"\nInput size: {INPUT_SIZE_FASHION:,} features")
print(f"\nParameter breakdown:")
print(f"  Input → Hidden1:   {INPUT_SIZE_FASHION:,} × {HIDDEN_LAYER_1:,} = {l1_params:,}")
print(f"  Hidden1 → Hidden2: {HIDDEN_LAYER_1:,} × {HIDDEN_LAYER_2:,} = {l2_params:,}")
print(f"  Hidden2 → Output:  {HIDDEN_LAYER_2:,} × {OUTPUT_SIZE:,} = {l3_params:,}")
print(f"\nTotal parameters: {total_params:,}")
print(f"Approximately {total_params/1_000_000:.2f} million parameters")
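
As a sanity check on the count, the sketch below allocates the three weight matrices with NumPy and runs a forward pass on random placeholder inputs. The ReLU activation and the random initialization are assumptions made for illustration; the tutorial itself does not specify them.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Randomly initialized weight matrices (weights only, no biases)
W1 = rng.standard_normal((784, 1000)) * 0.01   # input -> hidden 1
W2 = rng.standard_normal((1000, 1000)) * 0.01  # hidden 1 -> hidden 2
W3 = rng.standard_normal((1000, 10)) * 0.01    # hidden 2 -> output

def forward(x):
    """Forward pass: two ReLU hidden layers, then linear class scores."""
    h1 = np.maximum(0.0, x @ W1)
    h2 = np.maximum(0.0, h1 @ W2)
    return h2 @ W3

x = rng.random((4, 784))   # placeholder batch of 4 flattened images
scores = forward(x)

num_weights = W1.size + W2.size + W3.size
print(f"Output shape:  {scores.shape}")
print(f"Total weights: {num_weights:,}")
```

The number of array elements matches the formula because every entry of each matrix is one trainable weight.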

## The Scalability Problem

The Fashion MNIST example demonstrates that even small images require a substantial number of parameters. However, real-world applications often involve much larger images.

Consider the example from the lecture slides: classifying images using an MLP with the same hidden layer configuration (1,000 neurons each).

### Comparison: Small vs. Large Images

In [None]:
# Define image sizes to compare (from lecture slides)
# Format: (width, height, channels, description)
IMAGE_CONFIGURATIONS = [
    (28, 28, 1, "Fashion MNIST"),    # Grayscale, 28x28
    (64, 64, 3, "Small RGB"),        # RGB, 64x64 (slides example)
    (2000, 2000, 3, "High Resolution")  # RGB, 2000x2000 (slides example)
]

print("Parameter Count Comparison")
print("=" * 70)
print(f"{'Image Type':<20} {'Dimensions':<15} {'Input Size':<15} {'Parameters'}")
print("-" * 70)

for width, height, channels, description in IMAGE_CONFIGURATIONS:
    # Calculate input size for this image configuration
    input_size = calculate_input_size(width, height, channels)
    
    # Calculate total parameters with two hidden layers of 1000 neurons
    total_params, _, _, _ = count_mlp_parameters(
        input_size, HIDDEN_LAYER_1, HIDDEN_LAYER_2, OUTPUT_SIZE
    )
    
    # Format the dimensions string
    dim_str = f"{width}×{height}×{channels}"
    
    # Format parameter count with appropriate units
    if total_params >= 1_000_000_000:
        params_str = f"{total_params/1_000_000_000:.2f} Billion"
    elif total_params >= 1_000_000:
        params_str = f"{total_params/1_000_000:.2f} Million"
    else:
        params_str = f"{total_params:,}"
    
    print(f"{description:<20} {dim_str:<15} {input_size:>12,}   {params_str}")

print("=" * 70)

## Detailed Analysis: 64×64 vs 2000×2000 Images

The lecture slides present a specific comparison that illustrates the scalability problem:

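- **64×64 RGB image**: Quoted as approximately 13.38 million parameters (the weights-only formula gives 13.29 million)
- **2000×2000 RGB image**: Quoted as approximately 12.01 billion parameters (the weights-only formula gives 12.00 billion)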

Let us verify these calculations:

In [None]:
# Calculation for 64×64 RGB image (from lecture slides)
INPUT_64 = 64 * 64 * 3  # 12,288 input features (4,096 pixels × 3 channels)

# Total parameters: Input×H1 + H1×H2 + H2×Output
# = 12,288 × 1,000 + 1,000 × 1,000 + 1,000 × 2
# = 12,288,000 + 1,000,000 + 2,000
# = 13,290,000 (about 13.29M; the slides quote approximately 13.38M)

params_64_total, params_64_l1, params_64_l2, params_64_l3 = count_mlp_parameters(
    INPUT_64, 1000, 1000, 2  # Binary classification as in slides
)

print("64×64 RGB Image Classification")
print("=" * 50)
print(f"Input size: {INPUT_64:,}")
print(f"\nParameter calculation:")
print(f"  Layer 1: {INPUT_64:,} × 1,000 = {params_64_l1:,}")
print(f"  Layer 2: 1,000 × 1,000 = {params_64_l2:,}")
print(f"  Layer 3: 1,000 × 2 = {params_64_l3:,}")
print(f"\n  Total: {params_64_total:,}")
print(f"  ≈ {params_64_total/1_000_000:.2f} Million parameters")

In [None]:
# Calculation for 2000×2000 RGB image (from lecture slides)
INPUT_2000 = 2000 * 2000 * 3  # 12,000,000 input features (4,000,000 pixels × 3 channels)

# Total parameters: Input×H1 + H1×H2 + H2×Output
# = 12,000,000 × 1,000 + 1,000 × 1,000 + 1,000 × 2
# = 12,000,000,000 + 1,000,000 + 2,000
# = 12,001,002,000 (about 12.00G; the slides quote approximately 12.01G)

params_2000_total, params_2000_l1, params_2000_l2, params_2000_l3 = count_mlp_parameters(
    INPUT_2000, 1000, 1000, 2  # Binary classification as in slides
)

print("2000×2000 RGB Image Classification")
print("=" * 50)
print(f"Input size: {INPUT_2000:,}")
print(f"\nParameter calculation:")
print(f"  Layer 1: {INPUT_2000:,} × 1,000 = {params_2000_l1:,}")
print(f"  Layer 2: 1,000 × 1,000 = {params_2000_l2:,}")
print(f"  Layer 3: 1,000 × 2 = {params_2000_l3:,}")
print(f"\n  Total: {params_2000_total:,}")
print(f"  ≈ {params_2000_total/1_000_000_000:.2f} Billion parameters")

# Calculate the increase factor
increase_factor = params_2000_total / params_64_total
print(f"\nThe parameter count increased by a factor of {increase_factor:.0f}x")
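
The increase factor is smaller than the ratio of the two input sizes, because the hidden-to-hidden and hidden-to-output weights do not depend on the image. A short worked check, using the totals computed in the two cells above:

$$\frac{12{,}000{,}000}{12{,}288} \approx 976.6 \quad \text{(ratio of input sizes)}$$

$$\frac{12{,}001{,}002{,}000}{13{,}290{,}000} \approx 903 \quad \text{(ratio of total weights)}$$

The fixed $1{,}002{,}000$ weights in the later layers make up about 7.5% of the 64×64 network but are negligible in the 2000×2000 network, which is why the two ratios differ.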

## Visualization of the Scalability Problem

The following visualization demonstrates how parameter count grows as image resolution increases:

In [None]:
# Define a range of image sizes for comparison
IMAGE_SIZES = [
    (28, 28, 1, "Fashion MNIST"),
    (32, 32, 3, "CIFAR-10"),
    (64, 64, 3, "Small"),
    (224, 224, 3, "ImageNet"),
    (512, 512, 3, "Medium"),
    (1024, 1024, 3, "Large"),
    (2000, 2000, 3, "HD Photo"),
]

# Calculate parameters for each image size
results = []
for w, h, c, name in IMAGE_SIZES:
    input_size = w * h * c
    total_params, _, _, _ = count_mlp_parameters(input_size, 1000, 1000, 10)
    results.append({
        'name': name,
        'dimensions': f"{w}×{h}",
        'input_size': input_size,
        'parameters': total_params
    })

# Create the visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Extract data for plotting
labels = [r['dimensions'] for r in results]
params = [r['parameters'] for r in results]

# Define colors based on parameter count thresholds
colors = []
for p in params:
    if p < 100_000_000:      # Less than 100M: acceptable
        colors.append('#2ecc71')
    elif p < 1_000_000_000:  # Less than 1B: concerning
        colors.append('#f39c12')
    else:                    # 1B or more: impractical
        colors.append('#e74c3c')

# Bar chart: Parameter count by image size
axes[0].bar(labels, [p/1e9 for p in params], color=colors, edgecolor='white', linewidth=2)
axes[0].set_ylabel('Parameters (Billions)', fontsize=12, fontweight='bold')
axes[0].set_xlabel('Image Dimensions', fontsize=12, fontweight='bold')
axes[0].set_title('MLP Parameter Count vs Image Size', fontsize=14, fontweight='bold')
axes[0].tick_params(axis='x', rotation=45)

# Add threshold lines
axes[0].axhline(y=1, color='red', linestyle='--', alpha=0.7, label='1 Billion threshold')
axes[0].axhline(y=0.1, color='orange', linestyle='--', alpha=0.7, label='100 Million threshold')
axes[0].legend(loc='upper left', fontsize=9)

# Log-log plot: parameter count grows roughly linearly with the number of input features
input_sizes = [r['input_size'] for r in results]
axes[1].loglog(input_sizes, params, 'o-', color='#3498db',
               linewidth=3, markersize=10, markerfacecolor='white', markeredgewidth=2)
axes[1].set_xlabel('Number of Input Features (log scale)', fontsize=12, fontweight='bold')
axes[1].set_ylabel('Parameters (log scale)', fontsize=12, fontweight='bold')
axes[1].set_title('Linear Growth of Parameters with Input Size', fontsize=14, fontweight='bold')
axes[1].grid(True, alpha=0.3)

# Annotate key data points
for r in results:
    if r['name'] in ['Fashion MNIST', 'ImageNet', 'HD Photo']:
        axes[1].annotate(
            r['name'], 
            (r['input_size'], r['parameters']), 
            textcoords="offset points", 
            xytext=(10, 10), 
            fontsize=9, 
            fontweight='bold'
        )

plt.tight_layout()
plt.show()

## Practical Implications

The large parameter counts associated with MLPs for image data lead to several critical problems:

### 1. Memory Requirements

Each parameter requires memory for storage. During training, additional memory is needed for gradients and optimizer states.

In [None]:
def calculate_memory_requirements(num_params, bytes_per_param=4):
    """
    Calculate the memory required for model parameters.
    
    Parameters:
    -----------
    num_params : int
        Total number of parameters in the model
    bytes_per_param : int
        Bytes per parameter (4 for 32-bit float, 2 for 16-bit float)
    
    Returns:
    --------
    tuple
        (weight_memory_gb, training_memory_gb)
    
    Notes:
    ------
    Training memory is estimated as 3x weight memory to account for:
    - Parameter storage
    - Gradient storage
    - Optimizer states (e.g., momentum)
    """
    # Calculate memory for weights only
    weight_memory_gb = (num_params * bytes_per_param) / (1024 ** 3)
    
    # Estimate total training memory (approximately 3x for SGD with momentum)
    training_memory_gb = weight_memory_gb * 3
    
    return weight_memory_gb, training_memory_gb

# Display memory requirements for each image size
print("Memory Requirements Analysis")
print("=" * 75)
print(f"{'Image Type':<15} {'Dimensions':<12} {'Weights (GB)':<15} {'Training (GB)':<15} {'Status'}")
print("-" * 75)

for r in results:
    weight_mem, train_mem = calculate_memory_requirements(r['parameters'])
    
    # Determine status based on memory requirements
    if train_mem < 8:
        status = "Feasible"
    elif train_mem < 24:
        status = "Requires High-End GPU"
    else:
        status = "Impractical"
    
    print(f"{r['name']:<15} {r['dimensions']:<12} {weight_mem:>12.2f}   {train_mem:>12.2f}   {status}")

print("=" * 75)
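
The 3x factor above corresponds to SGD with momentum. As a rough sketch of how the optimizer choice changes the estimate, the hypothetical helper below counts how many parameter-sized tensors each optimizer keeps: plain SGD stores parameters and gradients, momentum adds one buffer, and Adam adds two moment estimates. Activation memory is ignored, so these figures are lower bounds.

```python
def estimate_training_memory_gb(num_params, bytes_per_param=4, optimizer="sgd_momentum"):
    """Estimate training memory (GiB) from the number of parameter-sized tensors kept."""
    # Parameter-sized copies held during training (activations are not counted)
    copies = {
        "sgd": 2,           # parameters + gradients
        "sgd_momentum": 3,  # parameters + gradients + momentum buffer
        "adam": 4,          # parameters + gradients + first and second moments
    }
    return num_params * bytes_per_param * copies[optimizer] / (1024 ** 3)

# Weights-only count for the 2000×2000 RGB example (binary classification)
hd_weights = 12_000_000 * 1000 + 1000 * 1000 + 1000 * 2

for name in ("sgd", "sgd_momentum", "adam"):
    mem = estimate_training_memory_gb(hd_weights, optimizer=name)
    print(f"{name:<14} {mem:>8.1f} GiB")
```

Whichever optimizer is used, the estimate for the high-resolution MLP lands far above the 24 GB threshold used in the table above.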

### 2. Overfitting Risk

When the number of parameters greatly exceeds the number of training examples, models tend to memorize the training data rather than learn generalizable patterns.

In [None]:
# Common dataset sizes for reference
DATASET_SIZES = {
    "Fashion MNIST": 60000,
    "CIFAR-10": 50000,
    "ImageNet": 1200000,
    "Small Custom Dataset": 10000,
}

# Calculate parameters-to-examples ratio for high-resolution images
hd_params = count_mlp_parameters(2000 * 2000 * 3, 1000, 1000, 2)[0]

print("Overfitting Risk Analysis")
print("(For 2000×2000 RGB images with MLP)")
print("=" * 65)
print(f"{'Dataset':<25} {'Examples':<15} {'Params/Example':<15} {'Risk Level'}")
print("-" * 65)

for dataset_name, num_examples in DATASET_SIZES.items():
    # Calculate the ratio of parameters to training examples
    ratio = hd_params / num_examples
    
    # Determine risk level
    # Rough heuristic thresholds for illustration (not a strict rule)
    if ratio < 10:
        risk = "Low"
    elif ratio < 100:
        risk = "Moderate"
    else:
        risk = "Severe"
    
    print(f"{dataset_name:<25} {num_examples:<15,} {ratio:>12,.0f}   {risk}")

print("=" * 65)
print("\nNote: As a rough heuristic, generalization becomes harder as the number of")
print("parameters per training example grows well beyond the 10-100 range.")
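
The table above pairs every dataset with a hypothetical 2000×2000 input. For comparison, the sketch below repeats the ratio at resolutions closer to how these datasets are typically used. The 224×224 input size and the 1,000 output classes for ImageNet are common conventions assumed here, and the training-set sizes are the approximate figures from the cell above.

```python
# (name, width, height, channels, classes, training examples)
NATIVE_CONFIGS = [
    ("Fashion MNIST", 28, 28, 1, 10, 60_000),
    ("CIFAR-10", 32, 32, 3, 10, 50_000),
    ("ImageNet (224×224)", 224, 224, 3, 1000, 1_200_000),  # assumed input size
]

def mlp_weights(input_size, hidden1, hidden2, output_size):
    """Weights-only parameter count of a two-hidden-layer MLP."""
    return input_size * hidden1 + hidden1 * hidden2 + hidden2 * output_size

ratios = {}
for name, w, h, c, classes, n_examples in NATIVE_CONFIGS:
    params = mlp_weights(w * h * c, 1000, 1000, classes)
    ratios[name] = params / n_examples
    print(f"{name:<20} {params:>13,} weights / {n_examples:>9,} examples "
          f"= {ratios[name]:>6.1f} per example")
```

Even at these modest resolutions the MLP has tens to hundreds of weights per training example, so the imbalance does not depend on the extreme 2000×2000 case.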

## Root Cause Analysis

The fundamental issue with MLPs for image data is that they treat images as flat, unstructured vectors. This approach has several inherent limitations:

1. **Loss of Spatial Structure**: An MLP does not recognize that adjacent pixels are related
2. **No Weight Sharing**: Features learned at one location cannot be applied elsewhere
3. **Full Connectivity**: Every input connects to every neuron, resulting in $O(\text{Input} \times \text{Hidden})$ parameters

The following visualization illustrates how an MLP processes an image:

In [None]:
# Create visualization comparing 2D image structure vs MLP input
fig, axes = plt.subplots(1, 3, figsize=(14, 4))

# Create a simple 8x8 image with a recognizable pattern
sample_image = np.zeros((8, 8))
sample_image[2:6, 2:6] = 1.0  # Outer square
sample_image[3:5, 3:5] = 0.5  # Inner square

# Panel 1: Original 2D image structure
axes[0].imshow(sample_image, cmap='Blues')
axes[0].set_title('Original Image\n(2D Spatial Structure)', fontsize=12, fontweight='bold')
axes[0].axis('off')

# Panel 2: Flattened input for MLP
flattened = sample_image.flatten().reshape(1, -1)
axes[1].imshow(flattened, cmap='Blues', aspect='auto')
axes[1].set_title('MLP Input\n(Flattened 1D Vector)', fontsize=12, fontweight='bold')
axes[1].set_ylabel('Single row', fontsize=10)
axes[1].set_xlabel('64 features', fontsize=10)
axes[1].set_yticks([])

# Panel 3: Visualization of full connectivity
axes[2].set_xlim(0, 10)
axes[2].set_ylim(0, 10)
axes[2].set_title('Fully Connected Architecture\n(All inputs connect to all neurons)', fontsize=12, fontweight='bold')
axes[2].axis('off')

# Draw input layer nodes
num_input_shown = 8
for i in range(num_input_shown):
    axes[2].plot(2, 9 - i, 'o', markersize=10, color='#3498db')
    axes[2].text(1, 9 - i, f'x{i}', ha='right', va='center', fontsize=8)

# Draw hidden layer nodes
num_hidden_shown = 4
for j in range(num_hidden_shown):
    axes[2].plot(8, 7.5 - j * 1.5, 'o', markersize=10, color='#2ecc71')
    axes[2].text(9, 7.5 - j * 1.5, f'h{j}', ha='left', va='center', fontsize=8)

# Draw connections (fully connected)
for i in range(num_input_shown):
    for j in range(num_hidden_shown):
        axes[2].plot([2, 8], [9 - i, 7.5 - j * 1.5], '-', color='#e74c3c', alpha=0.2, linewidth=0.5)

# Add annotation explaining the scaling problem
axes[2].text(5, 0.5, 
             f'{num_input_shown} inputs × {num_hidden_shown} hidden = {num_input_shown * num_hidden_shown} connections',
             ha='center', fontsize=9, style='italic')

plt.tight_layout()
plt.show()
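
One way to see that a fully connected layer ignores spatial structure is to shuffle the pixel order. In the sketch below, the pixels are permuted and the rows of the weight matrix are reordered in the same way. The layer output is unchanged, so the layer has no built-in notion of which pixels are neighbors. The random image and weights are placeholders for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

image = rng.random((8, 8))         # placeholder 8×8 image
x = image.flatten()                # 64 input features

W = rng.standard_normal((64, 4))   # first-layer weights: 64 inputs -> 4 hidden units
perm = rng.permutation(64)         # one fixed random shuffle of pixel positions

h_original = x @ W                 # activations for the original pixel order
h_shuffled = x[perm] @ W[perm, :]  # shuffled pixels with matching weight rows

print("Same hidden pre-activations:", np.allclose(h_original, h_shuffled))
```

A convolutional layer does not have this property, because its small kernels are applied to neighboring pixels and therefore depend on the pixel arrangement.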

## Preview: Convolutional Neural Networks

The limitations of MLPs motivated the development of Convolutional Neural Networks (CNNs), which address the scalability problem through:

1. **Local Connectivity**: Neurons connect only to small local regions (e.g., 3×3 or 5×5)
2. **Parameter Sharing**: The same filter weights are applied across the entire image
3. **Translation Invariance**: Patterns learned at one location generalize to other locations (strictly, convolution is translation-equivariant, and pooling adds approximate invariance)
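
The three properties above can be sketched with a naive NumPy implementation. The example applies a single 3×3 kernel (9 shared weights) to an image and to a copy shifted by one pixel: the response map shifts by the same amount, a property often called translation equivariance. The helper `correlate2d_valid` and the toy images are written for illustration only and are not an efficient implementation.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Naive 'valid' 2D cross-correlation, the sliding-window operation in CNN layers."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output value only sees a local kh×kw window (local connectivity)
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(seed=0)
kernel = rng.standard_normal((3, 3))   # the same 9 weights are reused at every position

image = np.zeros((8, 8))
image[2:4, 2:4] = 1.0                  # small bright patch

shifted = np.zeros((8, 8))
shifted[3:5, 3:5] = 1.0                # same patch moved one pixel down and right

out = correlate2d_valid(image, kernel)
out_shifted = correlate2d_valid(shifted, kernel)

# The response to the shifted image equals the original response, shifted by one pixel
responses_match = np.allclose(out_shifted[1:, 1:], out[:-1, :-1])
print("Kernel weights:", kernel.size)
print("Response shifts with the input:", responses_match)
```

The kernel size stays at 9 weights no matter how large the image is, which is the source of the parameter savings discussed next.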

### Parameter Comparison: MLP vs CNN

In [None]:
def count_conv_layer_params(in_channels, out_channels, kernel_size):
    """
    Calculate the number of parameters in a convolutional layer.
    
    Parameters:
    -----------
    in_channels : int
        Number of input channels
    out_channels : int
        Number of output channels (filters)
    kernel_size : int
        Size of the convolutional kernel (assumes square kernel)
    
    Returns:
    --------
    int
        Number of weights, excluding biases (kernel_size² × in_channels × out_channels)
    """
    return (kernel_size ** 2) * in_channels * out_channels

# Example CNN architecture for 2000×2000 RGB images
# Conv1: 3 input channels → 32 output channels, 3×3 kernel
# Conv2: 32 input channels → 64 output channels, 3×3 kernel
# Conv3: 64 input channels → 128 output channels, 3×3 kernel

conv1_params = count_conv_layer_params(3, 32, 3)   # 864 parameters
conv2_params = count_conv_layer_params(32, 64, 3)  # 18,432 parameters
conv3_params = count_conv_layer_params(64, 128, 3) # 73,728 parameters

cnn_conv_total = conv1_params + conv2_params + conv3_params

# MLP parameters for the same image size
mlp_params_2000 = count_mlp_parameters(2000 * 2000 * 3, 1000, 1000, 2)[0]

print("Parameter Comparison: MLP vs CNN")
print("(For 2000×2000 RGB image classification)")
print("=" * 55)
print(f"\nMLP (2 hidden layers, 1000 neurons each):")
print(f"  Total: {mlp_params_2000:,} parameters")
print(f"  ≈ {mlp_params_2000/1e9:.2f} billion parameters")
print(f"\nCNN (3 convolutional layers, before pooling/FC):")
print(f"  Conv1 (3→32, 3×3):  {conv1_params:,}")
print(f"  Conv2 (32→64, 3×3): {conv2_params:,}")
print(f"  Conv3 (64→128, 3×3): {conv3_params:,}")
print(f"  Total: {cnn_conv_total:,} parameters")
print(f"  ≈ {cnn_conv_total/1e3:.0f} thousand parameters")
print(f"\nReduction factor: {mlp_params_2000/cnn_conv_total:,.0f}×")
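
The comparison above counts only the convolutional weights. A full classifier also needs a head that maps the final feature maps to class scores, and the choice of head matters. The sketch below compares two options under stated assumptions: "same" padding and no pooling (so the feature map stays 2000×2000), followed by either a flattened dense layer or global average pooling. This configuration is hypothetical and does not come from the lecture slides.

```python
IMAGE_SIDE = 2000
CONV_OUT_CHANNELS = 128
NUM_CLASSES = 2

# Weights of the three 3×3 convolutional layers (3→32→64→128), biases excluded
CONV_WEIGHTS = 3 * 3 * 3 * 32 + 3 * 3 * 32 * 64 + 3 * 3 * 64 * 128

# Option A: flatten the full-resolution feature maps and attach a dense layer
# (assumes "same" padding and no pooling, so the map is still 2000×2000)
flatten_head = IMAGE_SIDE * IMAGE_SIDE * CONV_OUT_CHANNELS * NUM_CLASSES

# Option B: global average pooling reduces each channel to one number,
# leaving 128 features for the dense layer
gap_head = CONV_OUT_CHANNELS * NUM_CLASSES

print(f"Convolutional weights:        {CONV_WEIGHTS:,}")
print(f"Total with flatten + dense:   {CONV_WEIGHTS + flatten_head:,}")
print(f"Total with global avg pool:   {CONV_WEIGHTS + gap_head:,}")
```

Flattening full-resolution feature maps would reintroduce a very large dense layer, which is one reason practical CNNs reduce spatial resolution (through pooling or strided convolutions) before any dense layer.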

## Summary

### Key Findings

This tutorial demonstrated that Multi-Layer Perceptrons scale poorly to realistic image classification tasks due to:

1. **Explosive parameter growth**: Parameter count scales linearly with the number of input features, which itself grows with the square of the image side length
2. **Impractical memory requirements**: High-resolution images require billions of parameters
3. **Severe overfitting risk**: Too many parameters relative to available training data
4. **Loss of spatial information**: Flattening destroys the 2D structure of images

### Numerical Summary (From Lecture Slides)

For an MLP with two hidden layers (1,000 neurons each) and 2 output classes, counting weights only:

| Image Size | Input Features | Parameters |
|------------|----------------|------------|
| 64×64×3 | 12,288 | ~13.29 Million (slides quote ~13.38 Million) |
| 2000×2000×3 | 12,000,000 | ~12.00 Billion (slides quote ~12.01 Billion) |

### Parameter Formula

$$\text{Parameters} = (W \times H \times C) \times H_1 + H_1 \times H_2 + H_2 \times \text{Output}$$

The first term dominates and grows **quadratically** with the image side length (equivalently, linearly with the number of pixels).
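
A quick numerical check of this scaling, as a sketch: for square RGB images, fitting a straight line to the log of the parameter count against the log of the side length should give a slope close to 2. The side lengths and the 10-class output below are arbitrary choices for illustration.

```python
import numpy as np

H1, H2, OUTPUT, CHANNELS = 1000, 1000, 10, 3

# Square image side lengths, each double the previous one (int64 avoids overflow)
sides = np.array([64, 128, 256, 512, 1024, 2048], dtype=np.int64)

first_layer = sides ** 2 * CHANNELS * H1          # (W × H × C) × H1
total = first_layer + H1 * H2 + H2 * OUTPUT       # full weights-only formula

# Slope of log(parameters) versus log(side length)
slope_first, _ = np.polyfit(np.log(sides), np.log(first_layer), 1)
slope_total, _ = np.polyfit(np.log(sides), np.log(total), 1)

print(f"Doubling the side multiplies first-layer weights by {first_layer[1] // first_layer[0]}")
print(f"Log-log slope, first layer only: {slope_first:.3f}")
print(f"Log-log slope, all layers:       {slope_total:.3f}")
```

The slope for the full network is slightly below 2 because the later layers contribute a constant number of weights that matters only for small images.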

---

**Next Topic**: Convolutional Neural Networks solve these problems through local connectivity, parameter sharing, and translation invariance.

In [None]:
# Final summary visualization
fig, ax = plt.subplots(figsize=(12, 6))

# Image dimensions to compare
dimensions = [28, 64, 128, 224, 512, 1024, 2000]

# Calculate MLP parameters for each dimension (assuming 3 channels for consistency)
mlp_params_list = [
    count_mlp_parameters(d * d * 3, 1000, 1000, 10)[0] 
    for d in dimensions
]

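# Illustrative placeholder for a CNN parameter count (not derived from a real architecture).
# Convolutional weights themselves do not depend on image size; the slow growth here
# stands in for a small classification head.
cnn_params_list = [100000 + d * 1000 for d in dimensions]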

# Plot both curves on log scale
ax.semilogy(dimensions, mlp_params_list, 'o-', color='#e74c3c', linewidth=3, 
            markersize=10, label='MLP (Fully Connected)', 
            markerfacecolor='white', markeredgewidth=2)
ax.semilogy(dimensions, cnn_params_list, 's-', color='#2ecc71', linewidth=3,
            markersize=10, label='CNN (Convolutional)',
            markerfacecolor='white', markeredgewidth=2)

# Add reference lines
ax.axhline(y=1e9, color='red', linestyle='--', alpha=0.5, label='1 Billion')
ax.axhline(y=1e6, color='orange', linestyle='--', alpha=0.5, label='1 Million')

# Labels and formatting
ax.set_xlabel('Image Dimension (pixels)', fontsize=14, fontweight='bold')
ax.set_ylabel('Number of Parameters (log scale)', fontsize=14, fontweight='bold')
ax.set_title('The Scalability Problem: Why MLPs Fail for Images', fontsize=16, fontweight='bold')
ax.legend(fontsize=11, loc='lower right')
ax.grid(True, alpha=0.3)

# Add annotations
ax.annotate('Practical for MLPs', xy=(64, mlp_params_list[1]), xytext=(100, 5e7),
            arrowprops=dict(arrowstyle='->', color='gray'),
            fontsize=10, ha='center')

ax.annotate('Impractical', xy=(1024, mlp_params_list[5]), xytext=(800, 5e10),
            arrowprops=dict(arrowstyle='->', color='#e74c3c'),
            fontsize=10, ha='center', color='#e74c3c', fontweight='bold')

plt.tight_layout()
plt.show()

print("\nConclusion: Convolutional Neural Networks are essential for practical image classification.")