## 1. Understanding CNNs: The Basics

**What are Convolutional Neural Networks?**

CNNs are a specialized type of neural network designed for processing data with a grid-like structure, such as images. While humans can easily recognize patterns in images, computers see images as arrays of numbers. CNNs help bridge this gap.

**Why use CNNs for images?**

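Regular (fully connected) neural networks don't work well for images because:

* Images have a lot of pixels (e.g., a 28×28 image has 784 pixels), and a fully connected layer needs a separate weight for every pixel in every neuron
* Spatial relationships between pixels matter, but flattening an image into a vector discards them
* The same pattern can appear in different locations, and a fully connected layer has to learn it separately for each location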

**CNNs solve these problems using three key ideas:**

1. Local connectivity: Looking at small regions of the image at a time
2. Parameter sharing: Using the same filters across the entire image
3. Pooling: Reducing the size of the representations
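To make the benefit of parameter sharing concrete, here is a rough count of learnable parameters for one fully connected layer versus one convolutional layer on a 28×28 grayscale image. The layer width of 128 units is an assumption chosen only for illustration.

```python
# Hypothetical comparison for a 28x28 grayscale image (1 channel)
pixels = 28 * 28                      # 784 input values

# Fully connected layer: every unit has one weight per pixel, plus a bias
dense_units = 128                     # assumed layer width, for illustration only
dense_param_count = pixels * dense_units + dense_units

# Convolutional layer: each 3x3 filter is reused at every image position
num_filters = 32
conv_param_count = 3 * 3 * 1 * num_filters + num_filters

print(dense_param_count)   # 100480
print(conv_param_count)    # 320
```

The convolutional layer stays this small no matter how large the image is, because the filter weights do not depend on the image size.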

### The Building Blocks of a CNN

#### 1. Convolutional Layer

This layer applies filters to the input image to extract features:

![](https://cdn.analyticsvidhya.com/wp-content/uploads/2017/06/28010254/conv1.png)

How it works:

1. A filter (or **kernel**) slides across the image
2. At each position, it performs element-wise multiplication and summation
3. This creates a feature map highlighting patterns like edges, textures, etc.

Here's an animation showing how the filter moves:

![](https://cdn.analyticsvidhya.com/wp-content/uploads/2017/06/28011851/conv.gif)
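The sliding-window operation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not how deep learning libraries implement it internally; the function name `conv2d_valid`, the toy image, and the hand-picked edge filter are all assumptions made for this example. Like most deep learning libraries, it does not flip the kernel (strictly speaking, this is cross-correlation).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            patch = image[row:row + kh, col:col + kw]
            # Element-wise multiplication followed by summation
            feature_map[row, col] = np.sum(patch * kernel)
    return feature_map

# Toy 5x5 image: dark on the left (0), bright on the right (1)
image = np.array([[0, 0, 0, 1, 1]] * 5, dtype=float)

# Hand-picked filter that responds to dark-to-bright vertical edges
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

feature_map = conv2d_valid(image, kernel)
print(feature_map)   # every row is [0. 3. 3.]
```

The output is zero where the window covers a flat region and positive where it overlaps the edge, which is what "highlighting a pattern" means in practice. In a real CNN the filter values are learned during training rather than chosen by hand.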

#### Key Concepts in Convolution: Stride, Padding, Filters

* Stride: How many pixels the filter moves each time

  * Stride of 1: Move one pixel at a time
  * Stride of 2: Move two pixels at a time (reduces output size)

  ![](https://cdn.analyticsvidhya.com/wp-content/uploads/2017/06/28090227/stride1.gif)

* Padding: Adding pixels (usually zeros) around the image border

  * "Valid" padding: No padding added, output is smaller than input
  * "Same" padding: Padding added to keep the output the same size as input (when the stride is 1)

  ![](https://cdn.analyticsvidhya.com/wp-content/uploads/2017/06/28094927/padding.gif)

* Filters: Each filter extracts different features (edges, colors, textures)

  * Multiple filters create multiple feature maps
  * These maps are stacked to form the output volume

![](https://cdn.analyticsvidhya.com/wp-content/uploads/2017/06/28113904/activation-map.png)
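Stride and padding together determine the size of each feature map. Along one dimension, the standard relationship is `output = floor((input + 2 * padding - kernel) / stride) + 1`. The helper below is a small sketch of that formula; the name `conv_output_size` is made up for this example, and `padding` means the number of pixels added on each side.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one dimension of a convolution."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# 28-pixel input with a 3x3 filter
print(conv_output_size(28, 3))              # 26  ("valid": no padding)
print(conv_output_size(28, 3, padding=1))   # 28  ("same" at stride 1)
print(conv_output_size(28, 3, stride=2))    # 13  (stride 2 roughly halves the size)
```

The number of filters does not change the width or height; it sets the depth of the output volume, one feature map per filter.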

#### 2. Activation Function (ReLU)

After convolution, an activation function is applied to introduce non-linearity. The most common is ReLU (Rectified Linear Unit), which simply replaces negative values with zero.

![](https://www.researchgate.net/publication/319235847/figure/fig3/AS:537056121634820@1505055565670/ReLU-activation-function.png)
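In formula form, ReLU is `f(x) = max(0, x)`. A minimal NumPy sketch, using a small made-up feature map:

```python
import numpy as np

def relu(x):
    """Replace negative values with zero, leave positive values unchanged."""
    return np.maximum(0, x)

# Made-up feature map values, for illustration only
feature_map = np.array([[-2.0, 0.5],
                        [3.0, -0.1]])

activated = relu(feature_map)
print(activated)   # negatives become 0, 0.5 and 3.0 are unchanged
```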


#### 3. Pooling Layer

Pooling reduces the spatial dimensions (width and height) of the feature maps:

![](https://cdn.analyticsvidhya.com/wp-content/uploads/2017/06/28022816/maxpool.png)

**Types of pooling:**

* Max Pooling: Takes the maximum value from each region
* Average Pooling: Takes the average value from each region

**Benefits of pooling:**

* Reduces computation in the network
* Makes the representation less sensitive to small shifts in the input
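Both pooling types can be sketched with NumPy by splitting the feature map into non-overlapping windows. This is an illustrative sketch, assuming a single 2D feature map and a stride equal to the window size; the function name `pool2d` is made up for this example.

```python
import numpy as np

def pool2d(feature_map, pool_size=2, mode="max"):
    """Non-overlapping pooling over a 2D feature map."""
    h, w = feature_map.shape
    out_h, out_w = h // pool_size, w // pool_size
    # Drop rows/columns that do not fill a complete window
    trimmed = feature_map[:out_h * pool_size, :out_w * pool_size]
    # Reshape so that axes 1 and 3 index the positions inside each window
    blocks = trimmed.reshape(out_h, pool_size, out_w, pool_size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "average":
        return blocks.mean(axis=(1, 3))
    raise ValueError("mode must be 'max' or 'average'")

feature_map = np.array([
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 5, 6],
], dtype=float)

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")
print(max_pooled)   # [[6. 4.] [7. 9.]]
print(avg_pooled)   # [[3.75 2.25] [4.   5.25]]
```

A 4×4 map becomes 2×2 either way: the width and height are halved, so later layers see a quarter as many values.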

#### 4. Fully Connected (Dense) Layers

After several convolution and pooling layers, the network uses fully connected layers to:

* Flatten the 2D feature maps into a 1D vector
* Combine features to make predictions
* Output the final classification probabilities
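These three steps can be sketched with NumPy. The sizes here (a 2×2×3 pooled output and 4 classes) and the random weights are assumptions chosen to keep the example small; in a trained network the weights are learned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy pooled output: 2x2 spatial positions, 3 feature maps (hypothetical sizes)
feature_maps = rng.random((2, 2, 3))

# 1. Flatten into a 1D vector
flat = feature_maps.reshape(-1)            # 12 values

# 2. Combine features: weighted sum per class (random weights, for illustration)
weights = rng.normal(size=(flat.size, 4))  # 4 hypothetical classes
bias = np.zeros(4)
logits = flat @ weights + bias

# 3. Softmax turns the scores into probabilities that sum to 1
def softmax(z):
    exp = np.exp(z - z.max())   # subtract the max for numerical stability
    return exp / exp.sum()

probs = softmax(logits)
print(flat.shape, probs.shape, probs.sum())
```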

### The Complete CNN Architecture

A typical CNN has this structure:

1. Input Layer (image)
2. Convolution Layer + ReLU
3. Pooling Layer
4. (Repeat steps 2-3 several times)
5. Flatten Layer
6. Fully Connected (Dense) Layer + ReLU
7. Output Layer (with Softmax for classification)

## 2. The Fashion MNIST Dataset

Before we build our model, let's understand our data.

**What is Fashion MNIST?**

Fashion MNIST is a dataset of Zalando's fashion article images:

* 60,000 training images
* 10,000 test images
* 28×28 grayscale images
* 10 clothing categories

It was created as a more challenging drop-in replacement for the original MNIST digits dataset.

**The 10 Classes:**

Each image belongs to one of these classes:

* 0 T-shirt/top
* 1 Trouser
* 2 Pullover
* 3 Dress
* 4 Coat
* 5 Sandal
* 6 Shirt
* 7 Sneaker
* 8 Bag
* 9 Ankle boot


## 3. Building Our CNN: Step by Step

Now let's implement a CNN to classify Fashion MNIST images. We'll break it down into 7 clear steps.



### Step 1: Import Libraries

```python
# Basic data manipulation and visualization
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Machine learning tools
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Deep learning tools
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout
from tensorflow.keras.optimizers import Adam

# Loading the dataset
from tensorflow.keras.datasets import fashion_mnist
```

### Step 2: Load and Explore the Data

```python
# Load the Fashion MNIST dataset
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

# Print the shapes to verify
print(f"Training data shape: {X_train.shape}")
print(f"Training labels shape: {y_train.shape}")
print(f"Test data shape: {X_test.shape}")
print(f"Test labels shape: {y_test.shape}")
```

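`load_data()` returns NumPy arrays rather than a table of columns:

* `X_train` has shape (60000, 28, 28) and `X_test` has shape (10000, 28, 28); each value is a pixel intensity (0-255)
* `y_train` and `y_test` contain the class labels (0-9), one per image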

Let's visualize some example images:

```python
# Define class names for better understanding
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Display some sample images with their labels
plt.figure(figsize=(10, 10))
for i in range(25):  # Show 5x5 grid of images
    plt.subplot(5, 5, i + 1)
    # X_train already contains the image data in shape (samples, 28, 28)
    img = X_train[i]  # No need to reshape as it's already in the right format
    plt.imshow(img, cmap='gray')
    label_idx = y_train[i]  # Get the label directly from y_train
    plt.title(class_names[int(label_idx)])
    plt.axis('off')
plt.tight_layout()
plt.show()
```

### Step 3: Prepare the Data

```python
# Split the training data into training and validation sets (80% train, 20% validation)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.2, random_state=42)

# Print the shapes to verify
print(f"Training data shape: {X_train.shape}")
print(f"Training labels shape: {y_train.shape}")
print(f"Validation data shape: {X_val.shape}")
print(f"Validation labels shape: {y_val.shape}")
print(f"Test data shape: {X_test.shape}")
print(f"Test labels shape: {y_test.shape}")
```
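The split leaves the images as 28×28 arrays of integers from 0 to 255. A common extra preparation step, which is not part of the pipeline above, is to scale the pixels to the range 0-1 and add an explicit channel axis so the shape matches the `(28, 28, 1)` input used in Step 4. The sketch below shows the idea on a small random batch; `prepare_images` is a hypothetical helper name, and the random data stands in for real images.

```python
import numpy as np

def prepare_images(images):
    """Scale uint8 pixels to floats in [0, 1] and append a channel axis."""
    scaled = images.astype("float32") / 255.0
    return scaled[..., np.newaxis]

# Random stand-in for a batch of four 28x28 grayscale images
rng = np.random.default_rng(0)
fake_batch = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

prepared = prepare_images(fake_batch)
print(prepared.shape)   # (4, 28, 28, 1)
print(prepared.dtype)   # float32
```

If you adopt this step, apply the same function to the training, validation, and test images so that all three are on the same scale.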

### Step 4: Build the CNN Model


```python
# Create a Sequential model (layers are stacked sequentially)
model = Sequential([
    # First Convolutional Layer
    # 32 filters of size 3x3, ReLU activation
    Conv2D(
        filters=32,          # Number of filters
        kernel_size=3,       # Filter size
        activation='relu',   # Activation function
        input_shape=(28, 28, 1)  # Input image dimensions (height, width, channels)
    ),

    # First Pooling Layer
    # Reduces spatial dimensions by half
    MaxPooling2D(pool_size=2),  # 2x2 pooling window

    # Dropout Layer (helps reduce overfitting by randomly zeroing 20% of the activations during training)
    Dropout(0.2),

    # Flatten Layer (convert 2D feature maps to 1D feature vector)
    Flatten(),

    # Fully Connected Layer with 32 neurons
    Dense(32, activation='relu'),

    # Output Layer with 10 neurons (one for each class)
    # Softmax ensures outputs sum to 1 (probability distribution)
    Dense(10, activation='softmax')
])

# Display model summary
model.summary()
```
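The parameter counts reported by `model.summary()` can be reproduced by hand, which is a useful check that you understand each layer. The sketch below works through the arithmetic for the model above; the helper names `conv2d_params` and `dense_params` are made up for this example.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights per filter (k * k * in_channels) times filters, plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters

def dense_params(inputs, units):
    """One weight per input per unit, plus one bias per unit."""
    return inputs * units + units

# Shapes: 28x28 input, 3x3 conv without padding, then 2x2 max pooling
conv_out = 28 - 3 + 1                    # 26
pool_out = conv_out // 2                 # 13
flat_size = pool_out * pool_out * 32     # 5408 values after Flatten

conv_p = conv2d_params(3, 1, 32)         # 320
hidden_p = dense_params(flat_size, 32)   # 173088
output_p = dense_params(32, 10)          # 330
total = conv_p + hidden_p + output_p

print(total)   # 173738
```

Pooling, dropout, and flatten layers have no learnable parameters. Almost all of the weights sit in the first dense layer, because it connects to every value of the flattened feature maps.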

### Step 5: Compile the Model


```python
model.compile(
    # Loss function measures how well the model is performing
    loss='sparse_categorical_crossentropy',  # Appropriate for integer labels

    # Optimizer updates model weights based on loss
    optimizer=Adam(learning_rate=0.001),     # Adam is an adaptive optimizer

    # Metrics to monitor during training
    metrics=['accuracy']                     # Percentage of correctly classified images
)
```
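To see what the loss measures, here is a small NumPy sketch of sparse categorical cross-entropy: for each sample, it is the negative log of the probability the model assigned to the true class. The function below is an illustration written for this tutorial, not the Keras implementation, and the probabilities are made up.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, y_prob):
    """Per-sample loss: -log(probability assigned to the true class)."""
    y_true = np.asarray(y_true)
    picked = y_prob[np.arange(len(y_true)), y_true]
    return -np.log(picked)

# Made-up predictions for two samples and three classes
y_prob = np.array([
    [0.7, 0.2, 0.1],   # confident and correct (true class 0)
    [0.1, 0.1, 0.8],   # confident but wrong (true class 1)
])
y_true = [0, 1]

losses = sparse_categorical_crossentropy(y_true, y_prob)
print(losses)          # approximately [0.357 2.303]
print(losses.mean())   # average over the batch
```

A confident wrong prediction is penalized far more heavily than a confident correct one is rewarded. "Sparse" only refers to the label format: integer class indices instead of one-hot vectors.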

### Step 6: Train the Model


```python
# Define batch size and number of epochs
batch_size = 128  # Number of samples processed before updating weights
epochs = 2        # Number of complete passes through the training dataset

# Train the model
history = model.fit(
    X_train,                  # Training data
    y_train,                  # Training labels
    batch_size=batch_size,
    epochs=epochs,
    verbose=1,                # Progress display mode
    validation_data=(X_val, y_val)  # Data to evaluate model after each epoch
)
```
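Batch size and epochs together determine how many weight updates the model performs. A quick calculation, assuming the 80/20 split from Step 3 (48,000 training images):

```python
import math

train_samples = 60_000 - 60_000 // 5   # 48000 images left after the 20% validation split
batch_size = 128
epochs = 2

# One weight update per batch; the last batch may be smaller than batch_size
steps_per_epoch = math.ceil(train_samples / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)   # 375
print(total_updates)     # 750
```

The 375 should match the step counter in the progress bar that `model.fit` prints for each epoch.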

### Step 7: Evaluate the Model


```python
# Evaluate on test set
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f'Test accuracy: {test_accuracy:.4f}')
print(f'Test loss: {test_loss:.4f}')

# Visualize training history
plt.figure(figsize=(12, 4))

# Plot training & validation accuracy
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Training')
plt.plot(history.history['val_accuracy'], label='Validation')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.title('Training and Validation Accuracy')
plt.legend()

# Plot training & validation loss
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Training')
plt.plot(history.history['val_loss'], label='Validation')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training and Validation Loss')
plt.legend()

plt.tight_layout()
plt.show()
```

```python
# Get predictions for test data
y_pred = model.predict(X_test)
y_pred_classes = np.argmax(y_pred, axis=1)

# Create a classification report
print("Classification Report:")
print(classification_report(
    y_test,
    y_pred_classes,
    target_names=class_names
))

# Visualize some predictions
plt.figure(figsize=(12, 12))
for i in range(25):  # 5x5 grid
    plt.subplot(5, 5, i + 1)
    plt.imshow(X_test[i].reshape(28, 28), cmap='gray')

    # Green title for correct predictions, red for incorrect
    actual = int(y_test[i])
    predicted = y_pred_classes[i]

    if actual == predicted:
        color = 'green'
    else:
        color = 'red'

    plt.title(f"A: {class_names[actual]}\nP: {class_names[predicted]}",
              color=color)
    plt.axis('off')

plt.tight_layout()
plt.show()
```
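The classification report gives per-class scores, but it does not show which classes get confused with each other. A confusion matrix does. The sketch below builds one with NumPy on a tiny made-up example with three classes; scikit-learn also provides `sklearn.metrics.confusion_matrix`, which you could call on `y_test` and `y_pred_classes` instead.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are actual classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    # Add 1 to cell (actual, predicted) for every sample
    np.add.at(matrix, (y_true, y_pred), 1)
    return matrix

# Made-up labels and predictions, for illustration only
y_true_demo = np.array([0, 0, 1, 2, 2, 2])
y_pred_demo = np.array([0, 1, 1, 2, 0, 2])

cm = confusion_matrix(y_true_demo, y_pred_demo, num_classes=3)
print(cm)
# [[1 1 0]
#  [0 1 0]
#  [1 0 2]]

# Recall per class: correct predictions divided by actual samples of that class
recall = cm.diagonal() / cm.sum(axis=1)
print(recall)
```

Off-diagonal cells are the mistakes. On Fashion MNIST, they let you check whether visually similar classes such as Shirt and T-shirt/top are being mixed up.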


## 4. Improving the Model

Here are some ways to potentially improve our CNN:

1. Add more convolutional layers to extract more complex features
2. Increase the number of filters in each convolutional layer
3. Use data augmentation to artificially expand the training dataset
4. Try different optimizers or learning rates
5. Experiment with batch normalization to stabilize training

Example of a deeper CNN:

```python
improved_model = Sequential([
    # First convolutional block
    Conv2D(32, 3, activation='relu', padding='same', input_shape=(28, 28, 1)),
    Conv2D(32, 3, activation='relu', padding='same'),
    MaxPooling2D(2),
    Dropout(0.25),

    # Second convolutional block
    Conv2D(64, 3, activation='relu', padding='same'),
    Conv2D(64, 3, activation='relu', padding='same'),
    MaxPooling2D(2),
    Dropout(0.25),

    # Fully connected layers
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax')
])
```
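Data augmentation, item 3 in the list above, can be as simple as randomly mirroring images. The sketch below is one minimal way to do it with NumPy. The function name `random_horizontal_flip` is made up for this example, and whether flipping actually helps on Fashion MNIST is something to verify by experiment.

```python
import numpy as np

def random_horizontal_flip(images, rng, probability=0.5):
    """Return a copy of `images` (N, height, width, ...) with some images mirrored left-right."""
    flipped = images.copy()
    flip_mask = rng.random(len(images)) < probability
    # Reverse the width axis (axis 2) for the selected images only
    flipped[flip_mask] = flipped[flip_mask][:, :, ::-1]
    return flipped

rng = np.random.default_rng(42)

# Two tiny 2x3 "images" so the effect is easy to see
images = np.arange(12).reshape(2, 2, 3)

always = random_horizontal_flip(images, rng, probability=1.0)
never = random_horizontal_flip(images, rng, probability=0.0)
print(images[0])   # [[0 1 2] [3 4 5]]
print(always[0])   # [[2 1 0] [5 4 3]]
```

Labels stay the same because a mirrored sneaker is still a sneaker. Apply augmentation to the training images only, never to the validation or test sets.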

## 5. Conclusion

To summarize what we've learned:

1. CNNs use specialized layers (convolution, pooling) to effectively process image data
2. The convolution operation extracts features using filters that slide across the image
3. Pooling reduces dimensions while preserving important information
4. The training process involves forward passes to make predictions and backward passes to update weights
5. Model evaluation requires separate validation and test sets