<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="200" alt="Skills Network Logo"  />
    </a>
</p>


<h2>Lab: Image Classification with CNN</h2>


Estimated time needed: **60** minutes


## Overview


In this lab, you will train a deep neural network for image classification using <a href="https://cs231n.github.io/transfer-learning/">transfer learning</a> and experiment with different hyperparameters.


## Objectives


In this lab, you will train a state-of-the-art image classifier. In practice, very few people train an entire Convolutional Neural Network from scratch (with random initialization), because datasets of sufficient size are relatively rare and training from scratch is resource-intensive.
Instead, it is common to take a ConvNet (such as ResNet) that has already been pretrained on a very large dataset (such as ImageNet), and then fine-tune it on your own dataset. We will use the pretrained Convolutional Network as a feature extractor, training only the output layer.
In general, 100–200 images give you a good starting point, and training takes only about half an hour. Usually, the more images you add, the better your results, but training takes longer and the rate of improvement decreases.


## Table of contents


This notebook is organized into the following sections:
-   [Install and Import Libraries](#Install-and-Import-Libraries)
-   [Image Processing and Load Data for Dataset preparation](#Image-Processing-and-Load-Data-for-Dataset-preparation)
-   [Load Model and Train](#Load-Model-and-Train)
-   [Practice Exercise](#Practice-Exercise)


* * *


## Install and Import Libraries


**It may take time for installation so please be patient.**


In [ ]:
# Core libraries
!pip install numpy pandas matplotlib tqdm pillow --quiet


In [ ]:
# Widgets for progress bars and interactivity
!pip install ipywidgets --quiet
# On older versions of the classic Jupyter Notebook, you may also need to enable the widget extension
#!jupyter nbextension enable --py widgetsnbextension

In [ ]:
# PyTorch (for CPU)
!pip install torch torchvision --quiet

**Import Libraries and Define Auxiliary Functions**


Libraries for OS and Cloud


In [ ]:
import os
import uuid
import shutil
import json
import copy
from datetime import datetime
import zipfile
import io
import requests
import random

Libraries for Data Processing and Visualization


In [ ]:
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import math
from matplotlib.pyplot import imshow
from tqdm import tqdm
from ipywidgets import IntProgress
import time 

Deep Learning Libraries


In [ ]:
import torch
import torch.nn as nn
import torchvision.models as models
from torch.utils.data import Dataset, DataLoader, random_split
from torch.optim import lr_scheduler
from torchvision import transforms
from torchvision.datasets import ImageFolder

torch.manual_seed(0)  # Fix PyTorch's random seed for reproducibility

**Plot train cost and validation accuracy:**

The `plot_stuff` function is used to visualize the model's training performance by plotting training loss and validation accuracy on the same graph using two different y-axes. 

The training loss, shown in red on the left y-axis, indicates how well the model is minimizing error during training. 

The validation accuracy, shown in blue on the right y-axis, reflects how well the model is performing on unseen data. 

By plotting both metrics together, this function helps monitor the learning process, identify trends, and detect issues such as overfitting—where the model performs well on training data but poorly on validation data. 
This kind of visualization is valuable for understanding the effectiveness of your training strategy and making informed adjustments to model architecture or hyperparameters.


In [ ]:
def plot_stuff(COST, ACC):
    """
    Plots training cost (loss) and validation accuracy on the same figure using two y-axes.
    
    Parameters:
    COST (list or array): Total training loss per iteration (or epoch)
    ACC (list or array): Validation accuracy per iteration (or epoch)
    """
    
    # Create a new figure and a primary axis (ax1)
    fig, ax1 = plt.subplots()
    
    # Plot training loss on the primary y-axis (left)
    color = 'tab:red'
    ax1.plot(COST, color=color)
    ax1.set_xlabel('Iteration', color=color)            # Label for x-axis
    ax1.set_ylabel('Total Loss', color=color)           # Label for y-axis (left)
    ax1.tick_params(axis='y', labelcolor=color)         # Set y-axis tick color

    # Create a secondary y-axis (ax2) sharing the same x-axis
    ax2 = ax1.twinx()
    
    # Plot validation accuracy on the secondary y-axis (right)
    color = 'tab:blue'
    ax2.set_ylabel('Accuracy', color=color)             # Label for y-axis (right)
    ax2.plot(ACC, color=color)
    ax2.tick_params(axis='y', labelcolor=color)

    # Adjust layout to prevent y-label clipping
    fig.tight_layout()
    
    # Display the combined plot
    plt.show()


**Plot the transformed image:**

When using `transforms.Normalize` during preprocessing (e.g., for pretrained CNNs like ResNet), images are normalized with mean and standard deviation values. But this makes the image look odd if displayed directly.

This function reverses that normalization so the image appears as a proper RGB image for visualization. It is especially useful for:

-   Checking your image pipeline.
-   Displaying samples from datasets.
-   Debugging predictions or model outputs.


In [ ]:
def imshow_(inp, title=None):
    """
    Displays a tensor image after reversing normalization.
    
    Parameters:
    - inp (Tensor): Image tensor of shape [C, H, W], usually normalized.
    - title (str, optional): Title for the image display.
    """
    # Convert from [C, H, W] to [H, W, C] and to NumPy array
    inp = inp.permute(1, 2, 0).numpy()
    print("Image shape:", inp.shape)

    # Undo normalization (ImageNet mean and std)
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean

    # Clip values to [0, 1] range for display
    inp = np.clip(inp, 0, 1)

    # Display image
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # Short pause for GUI update
    plt.show()
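
As a quick sanity check of the arithmetic, the sketch below normalizes a random image with the ImageNet statistics and then reverses the operation the same way `imshow_` does. The array `img` and the helper names `normalize` and `denormalize` are illustrative only and are not part of the lab.

```python
import numpy as np

# ImageNet channel statistics, as used elsewhere in this lab
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

def normalize(img_hwc):
    """Per-channel (x - mean) / std on an [H, W, C] array with values in [0, 1]."""
    return (img_hwc - mean) / std

def denormalize(img_hwc):
    """Inverse operation: x * std + mean, clipped to the displayable range."""
    return np.clip(img_hwc * std + mean, 0, 1)

rng = np.random.default_rng(0)
img = rng.random((4, 4, 3))            # hypothetical 4x4 RGB image in [0, 1]
restored = denormalize(normalize(img))

# True when the round trip recovers the original pixel values
print(np.allclose(restored, img))
```

The broadcasting works because the channel axis is last, which is why `imshow_` permutes the tensor from [C, H, W] to [H, W, C] before applying `std` and `mean`.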


**Define our device as the first visible cuda device if we have CUDA available:**


In [ ]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("the device type is", device)


## Image Processing and Load Data for Dataset preparation


In this section, we will preprocess our dataset by resizing the images, converting them to tensors, and normalizing the image channels. These are the standard preprocessing steps for image data. In addition, we will apply data augmentation to the training dataset to improve generalization. The preprocessing steps for the test (or validation) dataset are the same, except that data augmentation is not applied, as we want to evaluate the model on unmodified images.


**Mean and standard deviation values (used for normalization; based on ImageNet)**

```python
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

# Define composed transform for training dataset
composed = transforms.Compose([
    transforms.Resize((224, 224)),              # Resize to match model input
    transforms.RandomHorizontalFlip(),          # Randomly flip images horizontally
    transforms.RandomRotation(degrees=5),       # Small random rotation
    transforms.ToTensor(),                      # Convert image to tensor [C, H, W]
    transforms.Normalize(mean, std)             # Normalize using predefined mean and std
])
```


**Load Data for Dataset preparation**


Download the data:


In [ ]:
# URL of the ZIP file
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/QkkP2jRxVvxHKMPg8lnkwQ/transfer-learning-with-cnn-15-2025-06-24-t-12-35-01-829-z.zip"

# Send a GET request to the URL
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    # Open the zip file from the downloaded content
    with zipfile.ZipFile(io.BytesIO(response.content)) as zip_ref:
        zip_ref.extractall("hotdog_nothotdog")  # Extract to a target folder
    print("Download and extraction complete.")
else:
    print("Failed to download file:", response.status_code)


Split the image data into training and validation sets:


Set Paths


In [ ]:
#sets path
source_dir = "hotdog_nothotdog/transfer-learning-with-cnn-15-2025-06-24-t-12-35-01-829-z"  # folder containing images and annotation file
annotations_file = os.path.join(source_dir, "_annotations.json")  # update name if needed

Load annotations


In [ ]:
# Load annotations
with open(annotations_file, "r") as f:
    annotations = json.load(f)

Set Parameters


In [ ]:
# Parameters
train_ratio = 0.9
output_dir = "dataset"  # Final output root directory


Prepare Label to Image Mapping


In [ ]:
# Prepare label -> image list
label_to_images = {}

for filename, entry in annotations["annotations"].items():
    label = entry[0]["label"]
    label_to_images.setdefault(label, []).append(filename)

Shuffle and Split into Train/Validation

**Splitting Dataset: 90% Training, 10% Validation**
We define `train_ratio = 0.9` and apply it to split each class. The first 90% of shuffled images are used for training, and the remaining 10% for validation.


In [ ]:
# Shuffle and split each class into training and validation sets
for label, image_list in label_to_images.items():
    random.shuffle(image_list)  # Shuffle the list of images to randomize the split
    
    # Calculate the number of training images (e.g., 90% of total)
    train_cutoff = int(len(image_list) * train_ratio)
    
    # Split the image list into training and validation sets
    train_images = image_list[:train_cutoff]
    val_images = image_list[train_cutoff:]

    # Loop over both splits: 'train' and 'val'
    for split, split_images in zip(["train", "val"], [train_images, val_images]):
        
        # Create the output directory for the current split and label
        # Example: dataset/train/hotdog or dataset/val/nothotdog
        out_path = os.path.join(output_dir, split, label)
        os.makedirs(out_path, exist_ok=True)  # Create the directory if it doesn't exist

        # Copy each image from the source directory to the appropriate split folder
        for img_name in split_images:
            src = os.path.join(source_dir, img_name)  # Full path to the source image
            dst = os.path.join(out_path, img_name)    # Destination path
            shutil.copy2(src, dst)  # Copy the image (preserves metadata)

# Print completion message once all images are copied
print("Train/Val split complete.")
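
The per-class split can be tried on toy data without touching the file system. The sketch below is a simplified variant, not the lab's code: it uses hypothetical file names and a locally seeded random generator (an assumption added here so the split is reproducible), and it returns the split instead of copying files.

```python
import random

def split_by_class(label_to_images, train_ratio=0.9, seed=0):
    """Shuffle each class independently and cut it at train_ratio."""
    rng = random.Random(seed)              # local RNG so the split is reproducible
    splits = {"train": {}, "val": {}}
    for label, images in label_to_images.items():
        shuffled = list(images)            # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        cutoff = int(len(shuffled) * train_ratio)
        splits["train"][label] = shuffled[:cutoff]
        splits["val"][label] = shuffled[cutoff:]
    return splits

# Hypothetical file names, 20 per class
toy = {
    "hotdog": [f"hotdog_{i}.jpg" for i in range(20)],
    "nothotdog": [f"nothotdog_{i}.jpg" for i in range(20)],
}
splits = split_by_class(toy)
for label in toy:
    print(label, len(splits["train"][label]), len(splits["val"][label]))
```

Splitting each class separately keeps the class proportions roughly equal in both sets. Because `int()` truncates, very small classes can end up with few or no training images, so it is worth checking the counts.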


Apply transformation


In [ ]:
# Define a series of transformations to apply to each image
transform = transforms.Compose([
    transforms.Resize((224, 224)),  # Resize all images to 224x224 pixels (standard for pretrained models)
    transforms.ToTensor(),          # Convert PIL Image to PyTorch tensor with shape [C, H, W] and values in [0, 1]
    transforms.Normalize(           # Normalize the image using ImageNet's mean and std for each channel (RGB)
        [0.485, 0.456, 0.406],      # Mean for Red, Green, Blue
        [0.229, 0.224, 0.225]       # Standard deviation for Red, Green, Blue
    )
])

# Load training dataset from folder structure and apply the defined transformations.
# This cell uses the same non-augmenting `transform` for both splits; to train with
# augmentation, run the `composed` definition shown earlier in a code cell and pass it here instead.
train_dataset = ImageFolder("dataset/train", transform=transform)

# Load validation dataset from folder structure and apply the same transformations
# (no augmentation is applied here: only resizing, tensor conversion, and normalization)
val_dataset = ImageFolder("dataset/val", transform=transform)

We can display a few images from the validation set:


In [ ]:
i = 0
for x, y in val_dataset:                     # Loop through validation dataset
    imshow_(x, f"y = {y}")               # Display the image with its label
    i += 1                               # Increment counter
    if i == 3:                           # Stop after showing 3 images
        break
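
The titles above show integer labels such as `y = 0`. `ImageFolder` assigns these integers to the class sub-folders in sorted name order and exposes the mapping as `train_dataset.classes` and `train_dataset.class_to_idx`. The sketch below mimics that rule with the standard library only; the helper name `find_classes` and the temporary folders are illustrative, and the folder names `hotdog` and `nothotdog` are assumed from the example paths in this lab.

```python
import os
import tempfile

def find_classes(root):
    """Map sub-folder names to integer labels in sorted order."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return classes, {name: idx for idx, name in enumerate(classes)}

# Build a throwaway folder layout: <root>/nothotdog and <root>/hotdog
with tempfile.TemporaryDirectory() as root:
    for name in ("nothotdog", "hotdog"):
        os.makedirs(os.path.join(root, name))
    classes, class_to_idx = find_classes(root)

print(classes)
print(class_to_idx)
```

Knowing this ordering matters later, when a list of class names is used to turn a predicted index back into a label.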

### Hyperparameters


Experiment with different hyperparameters:


<b>Epoch</b>: one epoch is one full pass over the entire training dataset. Here we set the number of epochs to 10:


In [ ]:
n_epochs=10

<b>Batch size</b> is the number of training samples used in one iteration. If the batch size equals the total number of samples in the training set, every epoch has exactly one iteration. In pure stochastic gradient descent, the batch size is one. A batch size of 32–512 data points is a common choice; for more information, see the following <a href="https://arxiv.org/abs/1609.04836?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-cvstudio-2021-01-01">link</a>.


In [ ]:
batch_size=32
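
To make the relationship between batch size and iterations concrete, here is a small sketch. The training-set size of 180 is a made-up number, and the helper `iterations_per_epoch` is hypothetical; it assumes the last, smaller batch is kept, which is the `DataLoader` default.

```python
import math

def iterations_per_epoch(n_samples, batch_size):
    """Number of batches per epoch when the last partial batch is kept."""
    return math.ceil(n_samples / batch_size)

n_samples = 180   # hypothetical training-set size
for bs in (1, 32, 180):
    print(f"batch_size={bs:>3} -> {iterations_per_epoch(n_samples, bs)} iterations per epoch")
```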

<b>Learning rate</b> is a hyperparameter that controls how large a step the optimizer takes at each weight update. It is a small positive value, often in the range between 0.0 and 1.0.


In [ ]:
lr=0.000001

<b>Momentum</b> is an optimization technique used with gradient descent to accelerate convergence (the point at which a model's training process stabilizes and stops making significant improvements), reduce oscillations in steep valleys, and help escape local minima.


In [ ]:
momentum=0.9
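
The effect of momentum can be seen on a one-dimensional toy problem. The sketch below minimizes f(w) = w² with the update rule velocity = momentum × velocity + gradient, w = w − lr × velocity, which is the form PyTorch documents for SGD with momentum when dampening and Nesterov are off. The function, the starting point, and the step count are illustrative assumptions.

```python
def sgd_momentum(grad_fn, w0, lr, momentum, n_steps):
    """Run gradient descent with a velocity term and return every visited point."""
    w, velocity = w0, 0.0
    path = [w]
    for _ in range(n_steps):
        g = grad_fn(w)
        velocity = momentum * velocity + g   # accumulate past gradients
        w = w - lr * velocity                # step along the accumulated direction
        path.append(w)
    return path

def grad(w):
    return 2.0 * w                           # gradient of f(w) = w**2

plain = sgd_momentum(grad, w0=1.0, lr=0.01, momentum=0.0, n_steps=50)
with_momentum = sgd_momentum(grad, w0=1.0, lr=0.01, momentum=0.9, n_steps=50)

print(f"distance from the minimum after 50 steps, momentum=0.0: {abs(plain[-1]):.3f}")
print(f"distance from the minimum after 50 steps, momentum=0.9: {abs(with_momentum[-1]):.3f}")
```

With the same learning rate, the run with momentum ends closer to the minimum at w = 0, although it overshoots and oscillates on the way there.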

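If you set <code>lr_scheduler=True</code>, a learning rate scheduler adjusts the learning rate after every epoch, moving it between a minimum value (<code>base_lr</code>) and a maximum value (<code>max_lr</code>). The learning rate usually decays over time. When the cyclical scheduler used in this lab is enabled, it overrides the optimizer's initial learning rate <code>lr</code> and starts from <code>base_lr</code>.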


In [ ]:
lr_scheduler=True
base_lr=0.001
max_lr=0.01

## Load Model and Train


The `train_model()` function will train the model:

Steps Performed by `train_model()`

- Initialize Tracking Variables
  
         Set up lists to store training loss and validation accuracy per epoch.
         Store initial model weights as the best model so far.  
  
- Loop Over Epochs

         For each epoch (iteration over the full training dataset):

**Training Phase:**
- Loop Over Batches in train_loader

         Move input data (x) and labels (y) to the appropriate device (CPU or GPU).
         Set model to training mode.
         Perform a forward pass through the model.
         Calculate the loss using the criterion.
         Call backward() to compute gradients.
         Call optimizer.step() to update weights.
         Reset gradients with optimizer.zero_grad().

- Calculate Average Training Loss for the Epoch

         Append mean of batch losses to the loss_list.

- Update Learning Rate (if Scheduler is Provided)

         Call scheduler.step() to adjust learning rate.

**Validation Phase:**
- Evaluate Model on Validation Set

         Set model to evaluation mode.
         Disable gradient computation using torch.no_grad().
         Predict labels and compare to ground truth.
         Accumulate total correct predictions to compute accuracy.

- Track and Save the Best Model

         If current validation accuracy is better than the previous best, save model weights.

- Print Epoch Metrics (Optional)

         Print current learning rate, training loss, and validation accuracy if print_ is True.

- Load Best Model Weights

         After all epochs, load the best-performing model weights (based on validation accuracy).

- Return Results

         Return the list of validation accuracies, training losses, and the trained model.


In [ ]:
def train_model(model, train_loader, validation_loader, criterion, optimizer, n_epochs, print_=True):
    loss_list = []        # Store average training loss per epoch
    accuracy_list = []    # Store validation accuracy per epoch

    n_test = len(validation_loader.dataset)  # Total number of validation samples
    accuracy_best = 0      # Track best validation accuracy
    best_model_wts = copy.deepcopy(model.state_dict())  # Backup best model weights

    print("The first epoch should take several minutes")

    for epoch in tqdm(range(n_epochs)):  # Loop through each epoch
        loss_sublist = []  # Store individual batch losses for this epoch

        # Training phase
        model.train()  # Set model to training mode (validation below switches it to eval mode)
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)

            z = model(x)   # Forward pass
            loss = criterion(z, y)  # Compute loss
            loss_sublist.append(loss.item())

            loss.backward()       # Backpropagation
            optimizer.step()      # Update weights
            optimizer.zero_grad() # Reset gradients

        print(f"Epoch {epoch + 1} done")

        # Adjust learning rate only if the scheduler was created (lr_scheduler=True)
        if lr_scheduler:
            scheduler.step()

        # Store average training loss for this epoch
        loss_list.append(np.mean(loss_sublist))

        # Validation phase
        correct = 0
        model.eval()  # Set model to evaluation mode
        with torch.no_grad():
            for x_test, y_test in validation_loader:
                x_test, y_test = x_test.to(device), y_test.to(device)
                z = model(x_test)
                _, yhat = torch.max(z, 1)  # Index of the highest score is the predicted class
                correct += (yhat == y_test).sum().item()

        accuracy = correct / n_test
        accuracy_list.append(accuracy)

        # Save best model
        if accuracy > accuracy_best:
            accuracy_best = accuracy
            best_model_wts = copy.deepcopy(model.state_dict())

        # Print training progress
        if print_:
            print("Learning rate:", optimizer.param_groups[0]['lr'])
            print(f"Training loss (epoch {epoch + 1}): {loss_list[-1]:.4f}")
            print(f"Validation accuracy (epoch {epoch + 1}): {accuracy:.4f}")

    # Load best model weights before returning
    model.load_state_dict(best_model_wts)
    return accuracy_list, loss_list, model
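
One detail in `train_model()` that is easy to overlook is the `copy.deepcopy` around `model.state_dict()`. The sketch below, which uses a tiny `nn.Linear` layer as a stand-in for the real model, suggests why: the tensors in a plain state dict share storage with the live parameters, so without a deep copy the saved "best" weights would keep changing as training continues. The variable names are illustrative.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(2, 1)                          # tiny stand-in for the real model

live_view = layer.state_dict()                   # tensors share storage with the layer
snapshot = copy.deepcopy(layer.state_dict())     # independent copy, as in train_model()

with torch.no_grad():
    layer.weight.add_(1.0)                       # simulate a later training update

follows_update = torch.equal(live_view["weight"], layer.weight)
snapshot_changed = torch.equal(snapshot["weight"], layer.weight)
print("plain state_dict follows the update:", follows_update)
print("deep-copied snapshot equals the updated weights:", snapshot_changed)
```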

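Load the pretrained ResNet-18 model. Passing <code>weights=models.ResNet18_Weights.DEFAULT</code> loads the ImageNet weights (older torchvision releases used <code>pretrained=True</code> for this).


In [ ]:
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)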

We will train only the last layer of the network. Setting <code>requires_grad</code> to <code>False</code> for every parameter freezes the pretrained weights, so the network acts as a fixed feature extractor.


In [ ]:
for param in model.parameters():
    param.requires_grad = False


Number of classes


In [ ]:
n_classes = len(train_dataset.classes)
print(n_classes)


Replace the output layer <code>model.fc</code> of the neural network with an <code>nn.Linear</code> object that classifies <code>n_classes</code> different classes. For the <code>in_features</code> parameter, remember that the last hidden layer has 512 neurons.


In [ ]:
model.fc = nn.Linear(512, n_classes)

Set device type


In [ ]:
model.to(device)

Cross-entropy loss, or log loss, measures the performance of a classification model. In PyTorch, <code>nn.CrossEntropyLoss</code> combines <code>LogSoftmax</code> and <code>NLLLoss</code> in a single class. It is useful when training a classification problem with C classes.


In [ ]:
criterion = nn.CrossEntropyLoss()
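
The following sketch checks that relationship on made-up numbers: it compares `nn.CrossEntropyLoss` with `nn.NLLLoss` applied to `nn.LogSoftmax` outputs, and with a manual computation of the mean negative log-probability of the correct class. The logits and targets are arbitrary illustrative values.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0, -1.0], [0.5, 1.5]])   # made-up scores: 2 samples, 2 classes
targets = torch.tensor([0, 1])                     # correct class index for each sample

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)

# Manual version: mean of -log(softmax probability of the correct class)
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), targets].mean()

same = bool(torch.allclose(ce, nll) and torch.allclose(ce, manual))
print(ce.item(), nll.item(), manual.item(), same)
```

This is also why the model's last layer outputs raw scores (logits) with no softmax: the loss applies the log-softmax itself.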

Create a training loader and validation loader object.


In [ ]:
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
validation_loader = torch.utils.data.DataLoader(dataset=val_dataset, batch_size=1)

Use the optim package to define an Optimizer that will update the weights of the model for us. 


In [ ]:
optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)

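We use <a href='https://arxiv.org/pdf/1506.01186.pdf'>Cyclical Learning Rates</a>. Because <code>train_model()</code> calls <code>scheduler.step()</code> once per epoch, each scheduler step below corresponds to one epoch.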


In [ ]:
if lr_scheduler:
    scheduler = torch.optim.lr_scheduler.CyclicLR(
        optimizer,
        base_lr=base_lr,      # Minimum learning rate (set in the Hyperparameters section)
        max_lr=max_lr,        # Maximum learning rate (set in the Hyperparameters section)
        step_size_up=5,       # Steps to increase LR from base_lr to max_lr
        mode="triangular2"    # Learning rate follows triangular cycle and halves the amplitude each cycle
    )
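
To see what this schedule looks like, the sketch below evaluates the triangular2 formula from the cited paper in plain Python. It is an approximation written for illustration, assuming equal step counts for the rising and falling halves; it is not PyTorch's implementation, and the function name `triangular2_lr` is hypothetical.

```python
import math

def triangular2_lr(step, base_lr, max_lr, step_size):
    """Cyclical learning rate with the 'triangular2' policy."""
    cycle = math.floor(1 + step / (2 * step_size))   # 1-based index of the current cycle
    x = abs(step / step_size - 2 * cycle + 1)        # 1 at the cycle edges, 0 at the peak
    scale = 1 / (2 ** (cycle - 1))                   # amplitude halves every cycle
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x) * scale

schedule = [triangular2_lr(s, base_lr=0.001, max_lr=0.01, step_size=5) for s in range(21)]
for s, value in enumerate(schedule):
    print(f"step {s:>2}: lr = {value:.4f}")
```

With one scheduler step per epoch and 10 epochs, training covers exactly one cycle: the rate climbs from 0.001 to 0.01 over five epochs and then falls back. In a second cycle the peak would be only halfway between the two bounds.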


Now we are going to train the model. For the given images, this takes about 25 minutes; the exact time depends on your dataset.


In [ ]:
# Start time tracking
start_datetime = datetime.now()
start_time = time.time()

# Train the model
accuracy_list, loss_list, model = train_model(
    model, train_loader, validation_loader, criterion, optimizer, n_epochs=n_epochs
)

# End time tracking
end_datetime = datetime.now()
elapsed_time = time.time() - start_time

# Print results
print("Training completed.")
print(f"Start Time     : {start_datetime.strftime('%Y-%m-%d %H:%M:%S')}")
print(f"End Time       : {end_datetime.strftime('%Y-%m-%d %H:%M:%S')}")
print(f"Elapsed Time   : {elapsed_time:.2f} seconds")


Save the model to model.pt


In [ ]:
# Save the model to model.pt
torch.save(model.state_dict(), 'model.pt')

Plot the training cost and validation accuracy. You can often improve results by getting more data.


In [ ]:
plot_stuff(loss_list, accuracy_list)

## Practice Exercise

### Test Our Model with an Uploaded Image


Upload your image, and see if it will be correctly classified.
<p><b>Instructions on How to Upload an Image:</b></p>
Use the upload button and upload an image from your local machine:
<center>
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-CV0101EN-SkillsNetwork/images/instruction.png" width="300"  />
</center>


The image will now be in the directory in which you are working. To read the image in a new cell, pass its file name to <code>Image.open</code>. For example, if you uploaded <code>anothercar.jpg</code> into your current working directory, use <code>Image.open("anothercar.jpg")</code>.

<center>
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-CV0101EN-SkillsNetwork/images/instruction2.png" width="300"  />
</center>


Alternatively, use the images below to test.


In [ ]:
!wget "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/eTwtaXXHQQkxDlkFWHqRNw/test.jpg"
!wget "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/xGq5q1hvl7HQLhmVGAOfzQ/test1.jpg"


Define the class names and load the model


In [ ]:
# Define class names (as per training)
class_names = ['hotdog', 'nothotdog']
# Create the same model architecture as during training
model = models.resnet18(weights=None)  # No pretrained weights needed; the trained weights are loaded below
model.fc = torch.nn.Linear(model.fc.in_features, 2)  # 2 classes: hotdog / nothotdog

# Load trained weights
model.load_state_dict(torch.load("model.pt", map_location=torch.device('cpu')))
model.eval()  # Set to evaluation mode


Define Image Transformation


In [ ]:
#Define image transformations (must match training)
transform = transforms.Compose([
    transforms.Resize((224, 224)),  # Resize image
    transforms.ToTensor(),  # Convert to tensor
    transforms.Normalize([0.485, 0.456, 0.406],  # Normalize (same as ImageNet/pretrained)
                         [0.229, 0.224, 0.225])
])


Load and Preprocess the Image

Replace `your_uploaded_file.jpg` below with the name of your image as it appears in your directory. If you are using the images downloaded above, use `test.jpg` or `test1.jpg`.


In [ ]:
image_path = "your_uploaded_file.jpg"  # Replace with your image path

# Open and convert to RGB
image = Image.open(image_path).convert("RGB")

# Apply transformations
input_tensor = transform(image).unsqueeze(0)  # Add batch dimension


Make Prediction and show result


In [ ]:
with torch.no_grad():
    outputs = model(input_tensor)
    predicted_class = torch.argmax(outputs, 1).item()
#Display result
print(f"The image was classified as: {class_names[predicted_class]}")
# Display the image with predicted label
plt.imshow(image)  # Original PIL image
plt.title(f"Predicted: {class_names[predicted_class]}")
plt.axis("off")
plt.show()
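
The raw model outputs are unnormalized scores (logits). If you also want a rough confidence value alongside the predicted label, one option is to pass the scores through a softmax. The sketch below uses made-up logits in place of the real `outputs` tensor so that it runs on its own.

```python
import torch

class_names = ["hotdog", "nothotdog"]
outputs = torch.tensor([[2.0, 0.0]])               # made-up logits for a single image

probs = torch.softmax(outputs, dim=1)[0]           # scores -> probabilities that sum to 1
predicted_class = torch.argmax(probs).item()
confidence = probs[predicted_class].item()
print(f"{class_names[predicted_class]} ({confidence:.1%})")
```

Softmax does not change which class has the highest score, so the predicted label is the same as with `torch.argmax(outputs, 1)`. Treat the probability as a relative score rather than a calibrated estimate.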

### Congratulations! You've completed the image classification lab using transfer learning. 
You successfully leveraged a pretrained deep neural network to build an effective image classifier, experimented with hyperparameters, and gained practical experience applying transfer learning—a widely used approach in modern computer vision.


## Authors


Joseph Santarcangelo

[Sathya Priya](https://www.linkedin.com/in/sathya-priya-06120a17a/) 



<!--## Change Log-->


<!--| Date (YYYY-MM-DD) | Version | Changed By | Change Description      |
| ----------------- | ------- | ---------- | ----------------------- |
| 2025-06-25        | 0.4    | Sathya Priya| Created and Converted the lab to JupyterCurrent notebook |
| 2021-05-25        | 0.3     | Yasmine    | Modifies Multiple Areas |
| 2021-05-25        | 0.3     | Kathy      | Modified Multple Areas. |
| 2021-03-08        | 0.2     | Joseph     | Modified Multiple Areas |
| 2021-02-01        | 0.1     | Joseph     | Modified Multiple Areas |-->


<h3 align="center"> &#169; IBM Corporation. All rights reserved. </h3>
