# 🖼️ Notebook 02: Convolutional Neural Networks (CNNs)

**Week 3-4: Deep Learning & NLP Foundations**  
**Gen AI Masters Program**

---

## 📋 Objectives

In the previous notebook, we built our first fully connected neural networks. Now, we will dive into **Convolutional Neural Networks (CNNs)**, a specialized and highly effective type of neural network designed for processing grid-like data, most notably images. Understanding CNNs is crucial as they form the basis of computer vision and are even finding applications in other domains like NLP.

By the end of this notebook, you will be able to:
1.  **Understand the Convolution Operation**: Intuitively and mathematically grasp how filters (or kernels) slide over an image to create **feature maps**, which are representations of learned patterns.
2.  **Implement Pooling Layers**: Learn how **max pooling** and **average pooling** are used to downsample feature maps, reducing dimensionality and creating a degree of spatial invariance.
3.  **Design a CNN Architecture**: Understand how to assemble convolutional, pooling, and fully connected layers into a cohesive and effective CNN architecture.
4.  **Build a CNN with PyTorch**: Construct and train a CNN from scratch for a binary image classification task using PyTorch's `nn.Module`.
5.  **Apply to a Practical Problem**: Frame the training process around a practical use case: building a visual quality inspection model for a **Manufacturing Copilot**.
6.  **Grasp the Concept of Transfer Learning**: Briefly understand the powerful principle of leveraging pre-trained models to dramatically boost performance and reduce training time.

**Estimated Time:** 3-4 hours

---

## 📚 What are CNNs?

Convolutional Neural Networks (CNNs or ConvNets) are a class of deep neural networks that have become the dominant architecture for computer vision tasks. Their design is inspired by the organization of the animal visual cortex, and they have achieved state-of-the-art results on a wide array of vision problems.

### Core Applications:
-   📸 **Image Classification**: Assigning a single label to an entire image (e.g., "cat", "dog", "car").
-   🎯 **Object Detection**: Identifying and drawing bounding boxes around multiple objects within an image.
-   🎨 **Image Segmentation**: Classifying each pixel in an image to create a pixel-level mask for every object.
-   🏥 **Medical Imaging**: Analyzing scans like X-rays, CTs, and MRIs for tumor detection and diagnosis.
-   🏭 **Manufacturing Quality Inspection**: Automatically detecting defects, scratches, or anomalies in products on an assembly line (our focus for this notebook).
-   🤖 **Autonomous Vehicles**: Powering the perception systems that allow self-driving cars to "see" and understand the world around them.

### The Key Insight: Spatial Hierarchies and Parameter Sharing
Unlike fully connected networks that treat an image as a flat vector of pixels, CNNs preserve and leverage the spatial relationships between pixels. They use shared weights in the form of **filters (or kernels)** to automatically and hierarchically learn features.
-   A **first layer** might learn to detect simple features like edges and color gradients.
-   A **second layer** might combine these edges to detect more complex shapes like circles, squares, or textures.
-   **Deeper layers** might combine these shapes to detect object parts like eyes, noses, or car wheels.

This hierarchical learning and parameter sharing make CNNs incredibly efficient and effective for visual data. Let's dive in and build one! 🚀
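To make the parameter-sharing argument concrete, here is a rough parameter count comparing one 3x3 convolutional layer with a fully connected layer on the same input. The 128x128 RGB input and the 16 output filters/units are illustrative assumptions (they happen to match the first layer of the model we build later); the helper functions are our own.

```python
def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    """Parameters in a Conv2d layer: one k x k x in_channels filter per output channel."""
    weights = out_channels * in_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


def linear_params(in_features, out_features, bias=True):
    """Parameters in a fully connected layer: every input connects to every output."""
    weights = in_features * out_features
    return weights + (out_features if bias else 0)


# Assumed input: a 128x128 RGB image, flattened to 3 * 128 * 128 values for the FC layer.
channels, height, width = 3, 128, 128

conv_count = conv2d_params(in_channels=channels, out_channels=16, kernel_size=3)
fc_count = linear_params(in_features=channels * height * width, out_features=16)

print(f"Conv2d(3 -> 16, 3x3): {conv_count:,} parameters")   # 448
print(f"Linear(49152 -> 16):  {fc_count:,} parameters")     # 786,448
print(f"Ratio (FC / conv):    {fc_count / conv_count:,.0f}x")
```

Note that the convolution's count does not depend on the image size at all: the same small filter is reused at every spatial position.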

## 🚀 Agenda

Our journey through CNNs will be structured as follows:

1.  **Setting the Stage**: We'll import the necessary libraries (`torch`, `torchvision`, `numpy`, `matplotlib`, `seaborn`) and set up our environment.
2.  **The Convolution Operation**: We'll start with the fundamental building block of any CNN. We will create a simple grayscale image and manually apply a convolution filter to understand how it extracts features like edges.
3.  **Pooling Layers**: We'll explore how pooling (specifically max pooling) works to downsample feature maps, making the network more efficient and robust.
4.  **Building a Complete CNN Architecture**: We'll define a full CNN model in PyTorch, combining convolutional, pooling, and fully connected layers.
5.  **The Use Case: Manufacturing Quality Inspection**: We'll introduce a practical problem—classifying images of cast iron parts as either "defective" or "ok". This provides a real-world context for our model.
6.  **Data Loading and Preprocessing**: We'll load the dataset, apply necessary transformations (like resizing and converting to tensors), and set up `DataLoaders` for the training and test sets (the test set also serves as our validation set during training).
7.  **Training the CNN**: We'll write the complete training loop to train our model on the casting dataset, monitoring loss and accuracy.
8.  **Evaluation and Visualization**: We'll evaluate the model's performance on the test set and visualize its predictions to see where it succeeds and fails.
9.  **A Glimpse into Transfer Learning**: We'll conclude with a brief discussion on transfer learning, a powerful technique for leveraging pre-trained models to achieve better results with less data and time.

In [None]:
# --- 1. Set up the Environment ---

# Import core libraries
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
import torchvision.transforms as T
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from PIL import Image
import os
import zipfile
from tqdm.notebook import tqdm

# --- Configuration ---

# Set a seed for reproducibility
# This ensures that any operation involving randomness (e.g., weight initialization, data shuffling)
# will produce the same results every time the code is run.
SEED = 42
torch.manual_seed(SEED)
np.random.seed(SEED)

# Set the default device
# This checks if a CUDA-enabled GPU is available and sets it as the device.
# If not, it falls back to the CPU. Training on a GPU is significantly faster.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"✅ Using device: {device.upper()}")

# --- Plotting Style ---

# Set a consistent and visually appealing style for all our plots
plt.style.use("seaborn-v0_8-whitegrid")
# Set a default figure size for better readability
plt.rcParams["figure.figsize"] = (10, 6)
# Use a color palette that is friendly to colorblind individuals
sns.set_palette("colorblind")

## 🧠 Part 1: The Convolution Operation - A Visual Intuition

The **convolution** is the core operation of a CNN. It's how the network "looks" at an image and identifies features.

Imagine sliding a small magnifying glass over a large picture. This magnifying glass is our **filter** (or **kernel**). The filter is a small matrix of weights. As it slides over the image, it performs an element-wise multiplication with the patch of the image it's currently on, and then sums up the results into a single pixel in the output image, called a **feature map** or **activation map**.

This process is repeated across the entire image. The key idea is that a specific filter is designed to "activate" (produce a high value) when it detects a specific feature it's looking for, such as a vertical edge, a horizontal edge, a specific color, or a certain texture.

Let's visualize this with a simple example. We'll create a basic grayscale image and apply a well-known edge detection filter to see this in action.
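Before handing the work to PyTorch, here is the sliding-window computation spelled out in plain NumPy. One detail worth knowing: what deep learning libraries call "convolution" (including `nn.functional.conv2d`) is technically cross-correlation, meaning the kernel is not flipped before sliding. The function name `correlate2d_naive` is our own; it is a teaching sketch for a single channel with stride 1, not an efficient implementation.

```python
import numpy as np


def correlate2d_naive(image, kernel, padding=0):
    """Slide `kernel` over `image` (stride 1), multiply element-wise, and sum.

    Mirrors what `torch.nn.functional.conv2d` computes for one input channel,
    one output channel, and no bias: no kernel flip, optional zero padding.
    """
    if padding > 0:
        image = np.pad(image, padding, mode="constant", constant_values=0)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]   # the window currently under the filter
            out[i, j] = np.sum(patch * kernel)  # element-wise multiply, then sum
    return out


# A tiny 4x4 image with a dark-to-bright vertical edge between columns 1 and 2.
toy_image = np.array([[0, 0, 1, 1]] * 4, dtype=float)
toy_kernel = np.array([[-1, 0, 1],
                       [-2, 0, 2],
                       [-1, 0, 1]], dtype=float)

print(correlate2d_naive(toy_image, toy_kernel))
```

Every output window straddles the edge, so each value is 4 (the sum of the kernel's right column). True mathematical convolution would flip the kernel first (`np.flip(toy_kernel)`), which for this particular kernel simply negates every output.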

In [None]:
# --- 2. Visualizing the Convolution Operation ---

# a) Create a simple grayscale image
# We'll create a 10x10 numpy array representing a grayscale image.
# The image will have a sharp vertical edge in the middle:
# - The left half will be black (pixel value 0).
# - The right half will be white (pixel value 255).
image = np.zeros((10, 10))
image[:, 5:] = 255

# b) Define a vertical edge detection filter (a Sobel-style kernel)
# PyTorch's conv2d does NOT flip the kernel (it computes cross-correlation), so we put
# the negative weights on the left and the positive weights on the right. When the filter
# straddles a dark-to-bright vertical edge, the bright pixels under the positive column
# dominate and the response is large and positive; on flat regions the two sides cancel to 0.
vertical_edge_filter = np.array([
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1]
])

# c) Apply the convolution
# We need to convert our numpy arrays to PyTorch tensors to use PyTorch's convolution functions.
# The dimensions need to be reshaped to match what PyTorch expects:
# [Batch Size, Channels, Height, Width]
# - Batch Size = 1: We are processing one image.
# - Channels = 1: It's a grayscale image (a color image would have 3 channels: R, G, B).
# - Height = 10, Width = 10: The dimensions of our image.
image_tensor = torch.from_numpy(image).float().unsqueeze(0).unsqueeze(0)
filter_tensor = torch.from_numpy(vertical_edge_filter).float().unsqueeze(0).unsqueeze(0)

# Use PyTorch's functional API for convolution.
# `nn.functional.conv2d` takes the image, the filter, and optional parameters.
# - `padding=1`: We add a 1-pixel border of zeros around the image. This lets the 3x3 filter
#   be centered on the border pixels of the original image, so the output stays 10x10.
feature_map_tensor = nn.functional.conv2d(image_tensor, filter_tensor, padding=1)

# Convert the output tensor back to a numpy array for visualization.
# `.squeeze()` removes the batch and channel dimensions.
feature_map = feature_map_tensor.squeeze().detach().numpy()


# d) Visualize the results
fig, axes = plt.subplots(1, 3, figsize=(18, 6))

# Plot the original image
sns.heatmap(image, ax=axes[0], cmap='gray', cbar=False, annot=True, fmt=".0f", linewidths=.5, linecolor='red')
axes[0].set_title("Original Image (10x10)")
axes[0].set_aspect('equal')

# Plot the filter
sns.heatmap(vertical_edge_filter, ax=axes[1], cmap='coolwarm', cbar=False, annot=True, fmt=".0f", linewidths=.5)
axes[1].set_title("Vertical Edge Filter (3x3)")
axes[1].set_aspect('equal')

# Plot the resulting feature map
sns.heatmap(feature_map, ax=axes[2], cmap='gray', cbar=False, annot=True, fmt=".0f", linewidths=.5, linecolor='blue')
axes[2].set_title("Feature Map (10x10)")
axes[2].set_aspect('equal')

plt.suptitle("Visualizing the 2D Convolution Operation", fontsize=16)
plt.tight_layout(rect=[0, 0, 1, 0.96])
plt.show()

print("\n💡 Analysis:")
print("Notice how the feature map has high positive values exactly where the vertical edge was.")
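print("The filter successfully 'detected' the feature it was designed to find.")
print("The rightmost column also responds, with the opposite sign: zero padding (`padding=1`) creates an artificial edge at the image border.")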
print("In a real CNN, the network *learns* the values of these filters on its own during training!")

## 🧠 Part 2: Pooling Layers - Downsampling for Efficiency

After a convolution layer produces a feature map, it's common to use a **pooling layer** to downsample it. Pooling reduces the spatial dimensions (height and width) of the feature map, which has several key benefits:

1.  **Reduces Computational Load**: Smaller feature maps mean fewer computations in subsequent layers and, once the maps are flattened, far fewer parameters in the fully connected layers that follow, making the network faster and more memory-efficient.
2.  **Increases Receptive Field**: Each pixel in a pooled feature map corresponds to a larger area in the original image, allowing subsequent layers to learn more global patterns.
3.  **Provides Spatial Invariance**: By summarizing a neighborhood of features into a single value (e.g., the maximum), pooling makes the network slightly more robust to small translations or distortions of the feature in the input image.
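The receptive-field benefit in point 2 can be quantified with a standard recurrence: a layer with kernel size `k` and stride `s` grows the receptive field by `(k - 1) * jump`, where `jump` is the distance in input pixels between adjacent units, and then multiplies `jump` by `s`. The layer stack below (3x3 conv, then 2x2 max pool, repeated twice) is an assumption chosen to match the model we build later; `receptive_field` is our own helper.

```python
def receptive_field(layers):
    """Track receptive field size and jump through (name, kernel, stride) layers.

    Per spatial dimension:
        rf_out   = rf_in + (kernel - 1) * jump_in
        jump_out = jump_in * stride
    """
    rf, jump = 1, 1  # a single input pixel "sees" only itself
    trace = []
    for name, kernel, stride in layers:
        rf = rf + (kernel - 1) * jump
        jump = jump * stride
        trace.append((name, rf, jump))
    return trace


# Assumed stack: conv 3x3 (stride 1) -> max pool 2x2 (stride 2), repeated twice.
stack = [
    ("conv1 3x3", 3, 1),
    ("pool1 2x2", 2, 2),
    ("conv2 3x3", 3, 1),
    ("pool2 2x2", 2, 2),
]

for name, rf, jump in receptive_field(stack):
    print(f"{name:<10} receptive field = {rf}x{rf} pixels, jump = {jump}")
```

Each unit after the second pooling layer sees a 10x10 patch of the input; two 3x3 convolutions without any pooling would only reach 5x5.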

The most common type of pooling is **Max Pooling**. It works by sliding a window over the feature map and, for each window, taking only the maximum value.
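With a 2x2 window and stride 2 the windows do not overlap, so max pooling can be written in a few lines of NumPy by reshaping each 2x2 block onto its own pair of axes. The helper name `pool2x2` is our own and it assumes even height and width; average pooling, mentioned in the objectives, differs only in taking the mean instead of the max.

```python
import numpy as np


def pool2x2(feature_map, mode="max"):
    """Non-overlapping 2x2 pooling (stride 2) on a 2D array with even height and width."""
    h, w = feature_map.shape
    if h % 2 or w % 2:
        raise ValueError("This sketch assumes even height and width.")
    # Reshape to (h/2, 2, w/2, 2): axes 1 and 3 index positions inside each 2x2 block.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "avg":
        return blocks.mean(axis=(1, 3))
    raise ValueError(f"Unknown mode: {mode!r}")


fmap = np.array([
    [1, 3, 0, 0],
    [2, 4, 0, 8],
    [0, 0, 5, 6],
    [0, 1, 7, 2],
], dtype=float)

print("Max pooled:\n", pool2x2(fmap, "max"))      # [[4, 8], [1, 7]]
print("Average pooled:\n", pool2x2(fmap, "avg"))  # [[2.5, 2.0], [0.25, 5.0]]
```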

Let's visualize how max pooling works on the feature map we just created.

In [None]:
# --- 3. Visualizing the Max Pooling Operation ---

# a) Define the Max Pooling layer
# We'll use a 2x2 window and a stride of 2.
# - `kernel_size=2`: The pooling window will be 2x2.
# - `stride=2`: The window will move 2 pixels at a time (no overlap).
# This setup will effectively halve the height and width of the feature map.
max_pool_layer = nn.MaxPool2d(kernel_size=2, stride=2)

# b) Apply Max Pooling to our feature map
# The input tensor needs to be in the [Batch, Channel, Height, Width] format.
# We already have `feature_map_tensor` from the previous step.
pooled_map_tensor = max_pool_layer(feature_map_tensor)

# Convert the output back to a numpy array for visualization
pooled_map = pooled_map_tensor.squeeze().detach().numpy()


# c) Visualize the results
fig, axes = plt.subplots(1, 2, figsize=(12, 6))

# Plot the feature map before pooling
sns.heatmap(feature_map, ax=axes[0], cmap='gray', cbar=False, annot=True, fmt=".0f", linewidths=.5, linecolor='blue')
axes[0].set_title(f"Feature Map Before Pooling ({feature_map.shape[0]}x{feature_map.shape[1]})")
axes[0].set_aspect('equal')

# Highlight the 2x2 pooling regions
for i in range(0, feature_map.shape[0], 2):
    for j in range(0, feature_map.shape[1], 2):
        axes[0].add_patch(plt.Rectangle((j, i), 2, 2, fill=False, edgecolor='red', lw=2))


# Plot the feature map after pooling
sns.heatmap(pooled_map, ax=axes[1], cmap='gray', cbar=False, annot=True, fmt=".0f", linewidths=.5, linecolor='green')
axes[1].set_title(f"Feature Map After 2x2 Max Pooling ({pooled_map.shape[0]}x{pooled_map.shape[1]})")
axes[1].set_aspect('equal')


plt.suptitle("Visualizing the Max Pooling Operation", fontsize=16)
plt.tight_layout(rect=[0, 0, 1, 0.96])
plt.show()

print("\n💡 Analysis:")
print(f"The original {feature_map.shape} feature map has been downsampled to {pooled_map.shape}.")
print("For each 2x2 red square in the original map, only the maximum value is kept.")
print("Notice that the strong signal from the detected edge is preserved, while the less important zero-valued regions are condensed.")
print("This makes the representation more compact and efficient.")

## 🧠 Part 3: Building a Complete CNN Architecture

Now that we understand the two main components—**convolutional layers** and **pooling layers**—we can assemble them into a complete CNN.

A typical CNN architecture for image classification follows a standard pattern:

1.  **Input Layer**: An image with dimensions `[Channels, Height, Width]`. For example, a 224x224 color image would be `[3, 224, 224]`.

2.  **Convolutional Blocks**: A sequence of one or more blocks, each containing:
    *   A **Convolutional Layer (`nn.Conv2d`)**: Applies a set of learnable filters to the input, creating a stack of feature maps. In typical designs, each successive convolutional layer increases the number of channels (depth) so it can represent more features.
    *   An **Activation Function (`nn.ReLU`)**: Introduces non-linearity, allowing the network to learn complex patterns.
    *   A **Pooling Layer (`nn.MaxPool2d`)**: Downsamples the feature maps, reducing their height and width.

3.  **Flattening**: After the last convolutional block, the 3D feature maps `[Channels, Height, Width]` are "flattened" into a single long 1D vector. This prepares the data for the final classification stage.

4.  **Fully Connected (FC) Layers**: A series of one or more `nn.Linear` layers, just like in the neural networks we built previously. These layers perform the final classification based on the high-level features extracted by the convolutional blocks.

5.  **Output Layer**: The final `nn.Linear` layer that produces the output scores (logits) for each class. For binary classification with a sigmoid-based loss (as we will use), a single output neuron is enough; for a 10-class problem, the layer will have 10.
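The shapes annotated in the model below all follow from one formula for the output size of a convolution or pooling layer along each spatial dimension (with dilation 1): `out = floor((in + 2 * padding - kernel) / stride) + 1`. This small helper (our own, not a PyTorch function) traces the 128x128 RGB input assumed by the model through each block and derives the flattened length that the first `nn.Linear` layer must expect.

```python
def output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a Conv2d/MaxPool2d layer along one dimension (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


spatial_size = 128  # assumed input height and width
num_channels = 3    # RGB input

layer_specs = [
    # (description, kernel, stride, padding, output channels or None to keep the same)
    ("Conv2d 3->16", 3, 1, 1, 16),
    ("MaxPool2d", 2, 2, 0, None),
    ("Conv2d 16->32", 3, 1, 1, 32),
    ("MaxPool2d", 2, 2, 0, None),
]

for name, kernel, stride, padding, out_channels in layer_specs:
    spatial_size = output_size(spatial_size, kernel, stride, padding)
    num_channels = out_channels or num_channels
    print(f"{name:<14} -> [{num_channels}, {spatial_size}, {spatial_size}]")

flattened = num_channels * spatial_size * spatial_size
print(f"Flattened length: {flattened}")  # in_features of the first nn.Linear
```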

Let's define a simple CNN model in PyTorch that follows this structure.

In [None]:
# --- 4. Defining the CNN Model in PyTorch ---

class SimpleCNN(nn.Module):
    """
    A simple Convolutional Neural Network for image classification.
    """
    def __init__(self, num_classes=1):
        """
        Initializes the layers of the CNN.
        
        Args:
            num_classes (int): The number of output classes. For binary classification, this is 1.
        """
        super(SimpleCNN, self).__init__()
        
        # --- Convolutional Feature Extractor ---
        # This part of the network is responsible for learning and extracting features from the image.
        self.features = nn.Sequential(
            # --- Block 1 ---
            # Input: [Batch, 3, 128, 128] (assuming 3-channel color images of size 128x128)
            nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1),
            # -> Output: [Batch, 16, 128, 128]
            # `in_channels=3`: For RGB images.
            # `out_channels=16`: We are learning 16 different features.
            # `kernel_size=3`: A 3x3 filter.
            # `padding=1`: Preserves the height and width (128x128).
            
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # -> Output: [Batch, 16, 64, 64] (Height and width are halved)

            # --- Block 2 ---
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, stride=1, padding=1),
            # -> Output: [Batch, 32, 64, 64]
            # `in_channels=16`: Must match the `out_channels` of the previous Conv layer.
            # `out_channels=32`: We learn 32 more complex features.
            
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # -> Output: [Batch, 32, 32, 32]
        )
        
        # --- Fully Connected Classifier ---
        # This part of the network takes the extracted features and performs classification.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            # -> Output: [Batch, 32 * 32 * 32] = [Batch, 32768]
            # Flattens the 3D feature map into a 1D vector.
            
            nn.Linear(in_features=32 * 32 * 32, out_features=128),
            # `in_features` must match the size of the flattened vector.
            # `out_features=128`: A hidden layer with 128 neurons.
            
            nn.ReLU(),
            
            nn.Linear(in_features=128, out_features=num_classes)
            # The final output layer. For binary classification, `num_classes` is 1.
        )

    def forward(self, x):
        """
        Defines the forward pass of the network.
        
        Args:
            x (torch.Tensor): The input batch of images.
        
        Returns:
            torch.Tensor: The output logits from the classifier.
        """
        x = self.features(x)
        x = self.classifier(x)
        return x

# Let's create a dummy input tensor to trace the shape changes through the network.
dummy_input = torch.randn(1, 3, 128, 128) # (Batch, Channels, Height, Width)
model_test = SimpleCNN()

# Pass the dummy input through the feature extractor
features_output = model_test.features(dummy_input)
# Pass the features through the classifier
classifier_output = model_test.classifier(features_output)

print("--- Tracing Tensor Shapes ---")
print(f"Dummy Input Shape:      {dummy_input.shape}")
print(f"After Feature Extractor: {features_output.shape}")
print(f"After Classifier:       {classifier_output.shape}")
print("\n✅ Model architecture seems correct.")

## 🏭 Part 4: Use Case - Manufacturing Quality Inspection

Now, let's apply our CNN to a practical problem. We'll build a **Visual Quality Inspection system** for a manufacturing process.

**The Scenario**: Imagine a factory producing cast iron parts. Some parts come off the assembly line with defects (e.g., cracks, blowholes), while others are perfectly fine. Manually inspecting every single part is slow, expensive, and prone to human error.

**The Goal**: We will train our CNN to automatically classify images of these parts as either **"defective"** or **"ok"**. This is a classic binary image classification task.

**The Dataset**: We will use the "Casting Product Image Data for Quality Inspection" dataset. It contains thousands of images of cast iron products, neatly organized into `train` and `test` folders, and further subdivided into `def_front` (defective) and `ok_front` (ok) classes.

First, let's download and extract the dataset.

In [None]:
# --- 5. Data Loading and Preprocessing ---

# a) Download and Unzip the Dataset
# The dataset is hosted as a zip file. We'll use Python's `urllib` to download it
# and `zipfile` to extract its contents.

import urllib.request

# URL of the dataset
url = "https://www.dropbox.com/s/kfx3y83zyp5c6w5/casting_data.zip?dl=1"
zip_path = "casting_data.zip"
extract_path = "casting_data"

# --- Download the file with a progress bar ---
# This provides a better user experience for large downloads.
class TqdmUpTo(tqdm):
    """Provides `update_to(block_num, block_size, total_size)`."""
    def update_to(self, b=1, bsize=1, tsize=None):
        if tsize is not None:
            self.total = tsize
        self.update(b * bsize - self.n)

if not os.path.exists(extract_path):
    print(f"Downloading dataset from {url}...")
    with TqdmUpTo(unit='B', unit_scale=True, unit_divisor=1024, miniters=1,
                  desc=zip_path) as t:
        urllib.request.urlretrieve(url, filename=zip_path, reporthook=t.update_to)
    print("Download complete.")

    # --- Unzip the file ---
    print(f"Extracting {zip_path} to {extract_path}...")
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall(extract_path)
    print("Extraction complete.")
    
    # --- Clean up the zip file ---
    os.remove(zip_path)
    print(f"Removed {zip_path}.")
else:
    print(f"Dataset already exists at '{extract_path}'. Skipping download and extraction.")

# Define the paths to the training and testing directories
data_dir = os.path.join(extract_path, 'casting_data')
train_dir = os.path.join(data_dir, 'train')
test_dir = os.path.join(data_dir, 'test')

print(f"\nTrain directory: {train_dir}")
print(f"Test directory:  {test_dir}")

# Let's check the contents of the training directory
print("\nClasses in training directory:", os.listdir(train_dir))

In [None]:
# b) Define Image Transformations
# Before feeding images into our network, we must perform several preprocessing steps.
# We use `torchvision.transforms` to create a pipeline for these operations.

# `T.Compose` chains multiple transformations together.
# The transformations are applied in the order they are listed.
data_transforms = T.Compose([
    # 1. Resize the image to a fixed size (e.g., 128x128).
    # CNNs require inputs to have a consistent size.
    T.Resize((128, 128)),
    
    # 2. Convert the PIL Image to a PyTorch Tensor.
    # This also scales the pixel values from the range [0, 255] to [0.0, 1.0].
    T.ToTensor(),
    
    # 3. Normalize the tensor, channel by channel.
    # The formula is: output = (input - mean) / std, using the fixed constants given here
    # (mean = 0.5 and std = 0.5 for each of the R, G, B channels), not statistics computed from the data.
    # Pixel values in the range [0, 1] are therefore mapped to [-1, 1].
    # Centering inputs around zero helps the model train faster and more stably.
    T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])


# c) Create a Custom PyTorch Dataset
class CastingImageDataset(Dataset):
    """
    Custom Dataset for loading the casting product images.
    """
    def __init__(self, root_dir, transform=None):
        """
        Args:
            root_dir (string): Directory with all the images, structured with subdirectories for each class.
            transform (callable, optional): Optional transform to be applied on a sample.
        """
        self.root_dir = root_dir
        self.transform = transform
        self.image_paths = []
        self.labels = []
        
        # Define the mapping from class name to integer label
        self.class_to_int = {"def_front": 1, "ok_front": 0}
        
        # Walk through the directory to find all image files and their labels
        for class_name in self.class_to_int.keys():
            class_dir = os.path.join(root_dir, class_name)
            for img_name in os.listdir(class_dir):
                # Case-insensitive extension check, so files like "IMG.JPG" are also included
                if img_name.lower().endswith(('.png', '.jpg', '.jpeg')):
                    self.image_paths.append(os.path.join(class_dir, img_name))
                    self.labels.append(self.class_to_int[class_name])

    def __len__(self):
        """Returns the total number of samples in the dataset."""
        return len(self.image_paths)

    def __getitem__(self, idx):
        """
        Fetches the sample at the given index.
        
        Args:
            idx (int): The index of the sample to fetch.
            
        Returns:
            tuple: (image, label) where image is the transformed image tensor
                   and label is the integer label.
        """
        # Load the image from its path
        img_path = self.image_paths[idx]
        # `Image.open` loads the image. `.convert("RGB")` ensures it has 3 channels,
        # even if it's grayscale, which is important for our 3-channel input CNN.
        image = Image.open(img_path).convert("RGB")
        
        # Get the corresponding label
        label = self.labels[idx]
        
        # Apply transformations if they exist
        if self.transform:
            image = self.transform(image)
            
        # Return the image tensor and its label as a tensor
        return image, torch.tensor(label, dtype=torch.float32)

# d) Create Dataset and DataLoader instances
# We create one dataset for the training data and one for the testing data.
train_dataset = CastingImageDataset(root_dir=train_dir, transform=data_transforms)
test_dataset = CastingImageDataset(root_dir=test_dir, transform=data_transforms)

# `DataLoader` takes a Dataset and wraps it in an iterable for easy batching,
# shuffling, and parallel data loading.
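BATCH_SIZE = 32
# `num_workers=2` loads batches in background worker processes. If the workers fail to start
# (this can happen in notebooks on Windows/macOS), set `num_workers=0` to load data in the main process.
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=2)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False, num_workers=2)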

print(f"✅ Found {len(train_dataset)} images in the training set.")
print(f"✅ Found {len(test_dataset)} images in the test set.")
print(f"✅ Created DataLoader with batch size {BATCH_SIZE}.")

# Let's inspect one batch to see the shape
images, labels = next(iter(train_loader))
print(f"\nShape of one batch of images: {images.shape}") # [Batch Size, Channels, Height, Width]
print(f"Shape of one batch of labels: {labels.shape}")   # [Batch Size]

In [None]:
# --- 6. Training the CNN ---

# a) Initialize the Model, Loss Function, and Optimizer

# Instantiate the CNN model
# `num_classes=1` because we are doing binary classification (defective vs. ok).
model = SimpleCNN(num_classes=1).to(device)

# Define the Loss Function
# `BCEWithLogitsLoss` is the standard choice for binary classification with a single output logit.
# It combines a Sigmoid layer and the Binary Cross Entropy loss in one single class.
# This combination is more numerically stable than using a plain Sigmoid followed by a BCELoss.
criterion = nn.BCEWithLogitsLoss()

# Define the Optimizer
# `Adam` is a popular and effective optimization algorithm.
# We pass the model's parameters to it, which are the weights and biases it will update.
# `lr=0.001` is the learning rate, which controls the step size during optimization.
optimizer = optim.Adam(model.parameters(), lr=0.001)

print("--- Ready for Training ---")
print(f"Model:      {model.__class__.__name__}")
print(f"Device:     {device.upper()}")
print(f"Loss Fn:    {criterion.__class__.__name__}")
print(f"Optimizer:  {optimizer.__class__.__name__}")

# b) Define the Training and Validation Functions
def train_one_epoch(model, data_loader, criterion, optimizer, device):
    """Performs one full pass over the training data."""
    model.train()  # Set the model to training mode
    total_loss = 0.0
    correct_predictions = 0
    total_samples = 0

    # Use tqdm for a progress bar
    for images, labels in tqdm(data_loader, desc="Training"):
        # Move data to the configured device (GPU or CPU)
        images, labels = images.to(device), labels.to(device).unsqueeze(1)
        
        # 1. Forward pass: compute predicted outputs by passing inputs to the model
        outputs = model(images)
        
        # 2. Calculate the loss
        loss = criterion(outputs, labels)
        
        # 3. Backward pass: compute gradient of the loss with respect to model parameters
        optimizer.zero_grad() # Clear previous gradients
        loss.backward()
        
        # 4. Perform a single optimization step (parameter update)
        optimizer.step()
        
        # --- Update statistics ---
        total_loss += loss.item() * images.size(0)
        
        # Convert outputs to probabilities and then to predicted classes (0 or 1)
        preds = torch.sigmoid(outputs) > 0.5
        correct_predictions += (preds == labels).sum().item()
        total_samples += labels.size(0)
        
    epoch_loss = total_loss / total_samples
    epoch_acc = correct_predictions / total_samples
    return epoch_loss, epoch_acc

def validate_one_epoch(model, data_loader, criterion, device):
    """Evaluates the model on the validation data."""
    model.eval()  # Set the model to evaluation mode
    total_loss = 0.0
    correct_predictions = 0
    total_samples = 0

    with torch.no_grad():  # Disable gradient calculation for efficiency
        for images, labels in tqdm(data_loader, desc="Validating"):
            images, labels = images.to(device), labels.to(device).unsqueeze(1)
            
            outputs = model(images)
            loss = criterion(outputs, labels)
            
            total_loss += loss.item() * images.size(0)
            preds = torch.sigmoid(outputs) > 0.5
            correct_predictions += (preds == labels).sum().item()
            total_samples += labels.size(0)
            
    epoch_loss = total_loss / total_samples
    epoch_acc = correct_predictions / total_samples
    return epoch_loss, epoch_acc
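The comment above about the numerical stability of `BCEWithLogitsLoss` is easy to check. For a logit `z` and label `y` in {0, 1}, binary cross-entropy on `sigmoid(z)` can be rewritten algebraically as `max(z, 0) - z * y + log(1 + exp(-|z|))`, which never takes the log of an exact 0 or exponentiates a large positive number. The NumPy sketch below compares that form against the naive "sigmoid, then log" computation; it illustrates the idea and is not PyTorch's actual source code.

```python
import numpy as np


def bce_naive(logits, labels):
    """Sigmoid followed by binary cross-entropy: breaks down for large |logits|."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    return -(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))


def bce_with_logits_stable(logits, labels):
    """Algebraically identical loss that avoids log(0) and overflow."""
    return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))


toy_labels = np.array([1.0, 0.0, 1.0, 0.0])
moderate_logits = np.array([2.0, -1.5, -0.5, 0.3])
extreme_logits = np.array([50.0, -50.0, -50.0, 50.0])

print("moderate, naive :", bce_naive(moderate_logits, toy_labels))
print("moderate, stable:", bce_with_logits_stable(moderate_logits, toy_labels))

with np.errstate(divide="ignore", invalid="ignore"):
    print("extreme,  naive :", bce_naive(extreme_logits, toy_labels))  # contains nan and inf
print("extreme,  stable:", bce_with_logits_stable(extreme_logits, toy_labels))
```

For moderate logits the two agree; for extreme logits `sigmoid(50)` rounds to exactly 1.0 in float64, so the naive version produces `nan`/`inf` while the rewritten form stays finite.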

In [None]:
# c) Run the Training Loop

NUM_EPOCHS = 5  # Set the number of epochs for a quick training run

# Store history for plotting later
history = {
    'train_loss': [],
    'train_acc': [],
    'val_loss': [],
    'val_acc': []
}

print(f"🚀 Starting training for {NUM_EPOCHS} epochs...")

for epoch in range(NUM_EPOCHS):
    print(f"\n--- Epoch {epoch + 1}/{NUM_EPOCHS} ---")
    
    # Train for one epoch
    train_loss, train_acc = train_one_epoch(model, train_loader, criterion, optimizer, device)
    history['train_loss'].append(train_loss)
    history['train_acc'].append(train_acc)
    
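    # Validate for one epoch
    # This dataset is organized into train and test folders only, so the test set doubles as our
    # validation set here (ideally you would hold out a separate validation split).
    val_loss, val_acc = validate_one_epoch(model, test_loader, criterion, device)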
    history['val_loss'].append(val_loss)
    history['val_acc'].append(val_acc)
    
    print(f"\nEpoch {epoch + 1} Summary:")
    print(f"  Train Loss: {train_loss:.4f} | Train Acc: {train_acc:.4f}")
    print(f"  Val Loss:   {val_loss:.4f} | Val Acc:   {val_acc:.4f}")

print("\n✅ Training complete!")

## 📊 Part 5: Evaluation and Visualization

After training, it's crucial to evaluate the model's performance visually. Plotting the training and validation loss and accuracy over epochs can give us valuable insights into the training process, helping us spot issues like overfitting.
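A common overfitting signal is validation loss rising while training loss keeps falling. Besides eyeballing the plots, you can check for it programmatically on the `history` dictionary that the training loop fills. The helper below is a hypothetical convenience, not part of PyTorch; it reports the first epoch (1-based) at which that divergence appears, or `None`. The sample numbers are made up for illustration and are not results from our run.

```python
def first_divergence_epoch(history):
    """Return the first 1-based epoch where val loss rises while train loss falls, else None."""
    train, val = history["train_loss"], history["val_loss"]
    for i in range(1, min(len(train), len(val))):
        if val[i] > val[i - 1] and train[i] < train[i - 1]:
            return i + 1  # convert the 0-based list index into an epoch number
    return None


# Made-up curves for illustration only (not results from the training run above).
example_history = {
    "train_loss": [0.60, 0.40, 0.30, 0.22, 0.15],
    "val_loss":   [0.62, 0.45, 0.38, 0.41, 0.47],
}
print(first_divergence_epoch(example_history))  # 4
```

The same call works on the real `history` produced by the training loop.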

In [None]:
# --- 7. Plotting Training History ---

# Create a figure with two subplots: one for loss, one for accuracy
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 6))

# Number epochs from 1 so the x-axis matches the training log
epochs = range(1, len(history['train_loss']) + 1)

# --- Plot Loss ---
ax1.plot(epochs, history['train_loss'], label='Train Loss', color='blue', marker='o')
ax1.plot(epochs, history['val_loss'], label='Validation Loss', color='orange', marker='o')
ax1.set_title('Loss vs. Epochs', fontsize=16)
ax1.set_xlabel('Epochs', fontsize=12)
ax1.set_ylabel('Loss', fontsize=12)
ax1.legend()
ax1.grid(True)

# --- Plot Accuracy ---
ax2.plot(epochs, history['train_acc'], label='Train Accuracy', color='blue', marker='o')
ax2.plot(epochs, history['val_acc'], label='Validation Accuracy', color='orange', marker='o')
ax2.set_title('Accuracy vs. Epochs', fontsize=16)
ax2.set_xlabel('Epochs', fontsize=12)
ax2.set_ylabel('Accuracy', fontsize=12)
ax2.legend()
ax2.grid(True)

plt.suptitle('Training and Validation Metrics', fontsize=20)
plt.show()

print("\n💡 Analysis of the Plots:")
print("1. Loss Plot: We want to see both training and validation loss decreasing. If the validation loss starts to increase while training loss continues to decrease, it's a sign of overfitting.")
print("2. Accuracy Plot: We want to see both training and validation accuracy increasing. A large gap between the two may also indicate overfitting.")
print("3. Compare your own curves against these patterns: steadily falling losses and rising accuracies with a small train/validation gap suggest the model is learning well and overfitting is not yet a major issue.")
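The overfitting rule of thumb from the analysis above (validation loss rising while training loss keeps falling) can also be checked programmatically from the `history` dictionary, which helps when training for many epochs. The sketch below is one simple way to do it, not the notebook's method: `find_overfitting_onset`, `best_epoch`, and the `patience` parameter are hypothetical names, and the sample numbers are invented for illustration rather than results from this notebook.

```python
def find_overfitting_onset(history, patience=2):
    """
    Hypothetical helper: return the 1-based epoch at which validation loss
    started rising for `patience` consecutive epochs while training loss
    kept falling, or None if that pattern never appears.
    """
    train, val = history['train_loss'], history['val_loss']
    streak = 0
    for i in range(1, len(val)):
        val_up = val[i] > val[i - 1]
        train_down = train[i] < train[i - 1]
        # Count consecutive epochs showing the divergence; reset otherwise
        streak = streak + 1 if (val_up and train_down) else 0
        if streak >= patience:
            # Index of the first diverging epoch is i - patience + 1 (0-based)
            return i - patience + 2
    return None


def best_epoch(history):
    """Hypothetical helper: 1-based epoch with the lowest validation loss."""
    val = history['val_loss']
    return min(range(len(val)), key=val.__getitem__) + 1


# Invented example values, shaped like the notebook's `history` dict
example = {
    'train_loss': [0.65, 0.48, 0.39, 0.33, 0.28],
    'val_loss':   [0.60, 0.47, 0.42, 0.44, 0.47],
}
print(find_overfitting_onset(example))  # 4
print(best_epoch(example))              # 3
```

After the training loop you could call `find_overfitting_onset(history)` on the real history; if it reports an epoch, the epoch returned by `best_epoch(history)` is a natural candidate for how long to train next time.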

### Visualizing Predictions

A great way to understand a classification model's performance is to look at the images it gets right and the ones it gets wrong. This can reveal patterns in the model's mistakes and give us ideas for how to improve it.

Let's write a function to fetch a batch of images from the test set, run them through our trained model, and display the images with their true labels and the model's predictions.
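The function below turns the model's raw outputs into labels with `torch.sigmoid(outputs) > 0.5`. Because the sigmoid is monotonic and equals exactly 0.5 at zero, that rule is equivalent to checking whether the logit is positive. This plain-Python sketch (no PyTorch needed) checks the equivalence and applies the notebook's `0 = OK, 1 = Defective` mapping; `sigmoid`, `logit_to_label`, and `CLASS_MAP` are hypothetical names, and it assumes the model emits one logit per image, as the sigmoid in the code implies.

```python
import math

CLASS_MAP = {0: "OK", 1: "Defective"}  # same mapping as class_map in the notebook


def sigmoid(z: float) -> float:
    """Logistic function: maps a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit_to_label(z: float, threshold: float = 0.5) -> int:
    """Hypothetical helper: 1 (Defective) if the probability exceeds threshold, else 0 (OK)."""
    return int(sigmoid(z) > threshold)


# With the default threshold, thresholding the probability at 0.5
# gives the same answer as checking the sign of the logit.
for z in [-3.0, -0.2, 0.0, 0.2, 3.0]:
    assert logit_to_label(z) == int(z > 0)

print([CLASS_MAP[logit_to_label(z)] for z in [-1.5, 2.0]])  # ['OK', 'Defective']
```

Raising or lowering `threshold` trades missed defects against false alarms, which may matter more for an inspection system than raw accuracy.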

In [None]:
# --- 8. Visualizing Model Predictions ---

def visualize_predictions(model, data_loader, device, num_images=16):
    """
    Displays a grid of images from a data loader, with their predicted and true labels.
    Assumes a binary model that outputs one logit per image (0 = OK, 1 = Defective).
    """
    model.eval()  # Set the model to evaluation mode
    images, labels = next(iter(data_loader))
    images, labels = images.to(device), labels.to(device)

    with torch.no_grad():
        outputs = model(images)
        # Sigmoid -> probability; threshold at 0.5 and cast to integer class ids (0 or 1)
        preds = (torch.sigmoid(outputs) > 0.5).long()

    # Move data back to CPU for plotting and flatten to shape [batch_size]
    images = images.cpu()
    preds = preds.cpu().view(-1)
    labels = labels.cpu().view(-1).long()  # labels may be stored as floats for BCE loss

    # Inverse transform for visualization.
    # These values assume training normalized with mean=0.5, std=0.5,
    # i.e. x_norm = (x - 0.5) / 0.5, so this Normalize maps x_norm back to x.
    inv_normalize = T.Normalize(
        mean=[-0.5/0.5, -0.5/0.5, -0.5/0.5],
        std=[1/0.5, 1/0.5, 1/0.5]
    )

    # Class mapping for labels
    class_map = {0: "OK", 1: "Defective"}

    # Never index past the end of a batch smaller than num_images
    num_images = min(num_images, images.size(0))
    n_cols = 4
    n_rows = (num_images + n_cols - 1) // n_cols  # enough rows for all images

    plt.figure(figsize=(15, 15))
    for i in range(num_images):
        plt.subplot(n_rows, n_cols, i + 1)

        # Un-normalize, clamp to the valid [0, 1] range, and display the image
        img = inv_normalize(images[i]).clamp(0, 1)
        img = img.permute(1, 2, 0).numpy()  # [C, H, W] -> [H, W, C] for matplotlib
        plt.imshow(img)

        true_label = class_map[labels[i].item()]
        pred_label = class_map[preds[i].item()]

        # Set title color based on correctness
        title_color = 'green' if pred_label == true_label else 'red'

        plt.title(f"True: {true_label}\nPred: {pred_label}", color=title_color)
        plt.axis("off")

    plt.tight_layout()
    plt.show()

# Visualize predictions on a batch from the test set
visualize_predictions(model, test_loader, device)
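`visualize_predictions` undoes normalization with a second `T.Normalize` whose mean is `-0.5/0.5` and whose std is `1/0.5`. The algebra: if training applied `x_norm = (x - m) / s`, then normalizing again with `m' = -m/s` and `s' = 1/s` gives `(x_norm + m/s) * s = x`. The plain-Python sketch below checks this numerically, assuming (as the inverse values in the code imply) that training used mean 0.5 and std 0.5 for every channel; `normalize` and `inv_normalize` here are hypothetical scalar stand-ins for `T.Normalize`, which applies the same formula per channel.

```python
MEAN, STD = 0.5, 0.5  # assumed training normalization, inferred from the inverse values


def normalize(x: float, mean: float = MEAN, std: float = STD) -> float:
    """What T.Normalize does to each pixel value: (x - mean) / std."""
    return (x - mean) / std


def inv_normalize(y: float, mean: float = MEAN, std: float = STD) -> float:
    """Apply the same formula with mean' = -mean/std and std' = 1/std."""
    inv_mean, inv_std = -mean / std, 1.0 / std
    return (y - inv_mean) / inv_std


for x in [0.0, 0.25, 0.5, 0.75, 1.0]:
    y = normalize(x)  # pixel values in [0, 1] map to [-1, 1]
    assert -1.0 <= y <= 1.0
    assert abs(inv_normalize(y) - x) < 1e-12  # round trip recovers the original value
```

The same construction works for any mean and std, which is why the inverse in the notebook is written as `-mean/std` and `1/std` rather than hard-coded numbers.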

## 🌟 Part 6: A Glimpse into Transfer Learning

We've built and trained a CNN from scratch, and depending on your run it may already reach strong accuracy after just a few epochs. However, in many real-world scenarios, we don't have the massive datasets (or the computational budget) required to train a very deep network from zero.

This is where **Transfer Learning** comes in.

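**The Core Idea**: Instead of starting with random weights, we start with a model that has already been trained on a huge, general-purpose dataset such as ImageNet. The full ImageNet dataset has over 14 million images across more than 20,000 categories, while most pre-trained vision models (including most of those in `torchvision`) are trained on its 1,000-class ImageNet-1k subset of roughly 1.28 million training images. The features learned by this "pre-trained" model, such as edges, textures, shapes, and object parts, are often useful for other computer vision tasks.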

### The Typical Transfer Learning Workflow:

1.  **Load a Pre-trained Model**: The `torchvision.models` module provides easy access to well-known, powerful architectures like **ResNet**, **VGG**, **EfficientNet**, and **MobileNet**, along with their pre-trained weights.

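2.  **Freeze the Feature Extractor**: The convolutional layers of these models (the "backbone") are strong general-purpose feature extractors, especially the early layers, which capture generic patterns like edges and textures. We "freeze" their weights by setting `requires_grad = False`, so PyTorch computes no gradients for them and the optimizer never updates them. This reduces the cost of the backward pass and the optimizer's memory use, and it prevents the model from "forgetting" the valuable features it has already learned.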

3.  **Replace the Classifier Head**: The final fully connected layers of the pre-trained model are specific to its original task (e.g., classifying 1000 ImageNet classes). We chop off this "head" and replace it with a new, smaller classifier that is tailored to our specific problem (e.g., our binary "defective" vs. "ok" classification).

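4.  **Train the New Classifier Head**: We then train the model on our smaller, task-specific dataset. Because only the weights of the small new head are updated, training is much faster and typically needs far less data to reach good performance. Optionally, once the head has converged, some of the later backbone layers can be unfrozen and trained with a low learning rate; this extra step is what is usually called *fine-tuning*.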

### Why is this so powerful?

-   **Higher Performance**: Pre-trained models provide a much better starting point, often leading to higher final accuracy than a model trained from scratch.
-   **Less Data Required**: Because the model already understands the basic visual world, it can learn a new task with a much smaller number of examples.
-   **Faster Training**: We are only training a small part of the network, which dramatically reduces training time.

Transfer learning is the standard approach for most computer vision applications today and is a fundamental technique in the toolkit of any deep learning practitioner. While we won't train a transfer-learning model in this notebook, it's the logical next step for improving our casting quality inspection system.

---

## 🎉 Congratulations!

You have successfully built, trained, and evaluated your first Convolutional Neural Network. You've learned the theory behind convolutions and pooling, assembled a complete architecture in PyTorch, and applied it to a real-world problem. You also now understand the powerful concept of transfer learning.

You are now well-equipped to tackle a wide range of computer vision challenges!