SYSEN 5888 Spring 2026

Jonathan Lloyd

Homework 2, Question 1


Goal: Build a convolutional neural network (ConvNet) that classifies images of fruits and vegetables into their respective classes

Tools: NumPy, PyTorch, Torchvision

Data: Fruits-360 on Kaggle https://www.kaggle.com/moltean/fruits 

Task: Load the dataset, scale the 100x100 images to 75x75, apply normalization and data augmentation, define the training and testing datasets (85%/15% split), batch each dataset with batch size 1000, shuffle with seed 42, and define a sequential ConvNet

In [3]:
# Check Colab Server details if running outside of Colab Online UI 
'''
import subprocess
result = subprocess.run(['nvidia-smi'], capture_output=True, text=True)
print(result.stdout if result.returncode == 0 else "No GPU detected (CPU runtime)")
'''

In [4]:
## IMPORT API KEY - UNCOMMENT APPLICABLE LINES WHEN RUNNING DIFFERENT KERNELS
# Import KAGGLE_API_KEY from .env 
# KAGGLE_API_TOKEN = os.getenv("KAGGLE_API_KEY")
# Define directly - delete key before uploading to GitHub
KAGGLE_API_TOKEN = "KGAT_4be0dffb1cb77ca36a6657e84134eaa8"
# When running in Colab - define in web using Colab Secrets 
# KAGGLE_API_TOKEN = userdata.get("KAGGLE_API_KEY")

In [5]:
# update any packages in Colab server
%pip install --upgrade numpy pandas kagglehub torch torchvision

Collecting kagglehub
  Downloading kagglehub-1.0.0-py3-none-any.whl.metadata (40 kB)
Collecting kagglesdk<1.0,>=0.1.14 (from kagglehub)
  Downloading kagglesdk-0.1.15-py3-none-any.whl.metadata (13 kB)
Downloading kagglehub-1.0.0-py3-none-any.whl (70 kB)
Downloading kagglesdk-0.1.15-py3-none-any.whl (160 kB)
Installing collected packages: kagglesdk, kagglehub
  Attempting uninstall: kagglehub
    Found existing installation: kagglehub 0.3.13
    Uninstalling kagglehub-0.3.13:
      Successfully uninstalled kagglehub-0.3.13
Successfully installed kagglehub-1.0.0 kagglesdk-0.1.15


In [7]:
# Import necessary libraries
from pathlib import Path
from dotenv import load_dotenv
import os
import kagglehub   
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import random_split, DataLoader
import torchvision
import torchvision.transforms as transforms

# Import dataset from Kaggle
# Kagglehub reads KAGGLE_API_TOKEN from environment (set in cell above)
os.environ["KAGGLE_API_TOKEN"] = KAGGLE_API_TOKEN
# On Colab: use /content to avoid cache disk limits; locally omit output_dir to use cache
path = kagglehub.dataset_download("moltean/fruits", output_dir="/content/fruits-360")

# Locate Training and Test subfolders (Fruits-360 dataset structure)
path = Path(path)
# Kaggle zip may nest Training/Test inside a subfolder (e.g. fruits-360/Training)
# Search recursively for Training directory
train_path = None
for p in path.rglob("Training"):
    if p.is_dir() and (p.parent / "Test").exists():
        train_path = p
        break
if train_path is not None:
    test_path = train_path.parent / "Test"
else:
    # Fallback: check direct children
    if (path / "Training").exists():
        train_path, test_path = path / "Training", path / "Test"
    else:
        raise FileNotFoundError(
            f"Could not find Training/Test folders. Dataset root: {path}\n"
            f"Contents: {[d.name for d in path.iterdir()] if path.exists() else 'path does not exist'}"
        )

# Load image datasets with placeholder transform (will add scaling/normalization in next steps)
print("Loading datasets")
train_dataset = torchvision.datasets.ImageFolder(str(train_path), transform=transforms.ToTensor())
test_dataset = torchvision.datasets.ImageFolder(str(test_path), transform=transforms.ToTensor())

print(f"Dataset downloaded to: {path}")
print(f"Training samples: {len(train_dataset)} | Classes: {len(train_dataset.classes)}")
print(f"Test samples: {len(test_dataset)}")


Using Colab cache for faster access to the 'fruits' dataset.
Loading datasets
Dataset downloaded to: /kaggle/input/fruits
Training samples: 131030 | Classes: 251
Test samples: 43670


In [None]:
## Image Preprocessing

# 1. Image scaling (100x100 -> 75x75):
image_size = 75

# 2. Image normalization values for RGB to [-1, 1]
normalize_means = [0.5, 0.5, 0.5]
normalize_stds = [0.5, 0.5, 0.5]

# 3. Data augmentation and normalization for training; only normalization for validation/test
train_transform = transforms.Compose([
    transforms.Resize((image_size, image_size)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    transforms.Normalize(mean=normalize_means, std=normalize_stds)
])

test_transform = transforms.Compose([
    transforms.Resize((image_size, image_size)),
    transforms.ToTensor(),
    transforms.Normalize(mean=normalize_means, std=normalize_stds)
])

# Replace raw datasets with transformed datasets
train_dataset = torchvision.datasets.ImageFolder(str(train_path), transform=train_transform)
test_dataset = torchvision.datasets.ImageFolder(str(test_path), transform=test_transform)

# 4. Define training and validation split (85%/15% from train_dataset)
total_train = len(train_dataset)
val_size = int(0.15 * total_train)
train_size = total_train - val_size

SHUFFLE_SEED = 42
torch.manual_seed(SHUFFLE_SEED)  # set random seed for reproducibility
train_subset, val_subset = random_split(train_dataset, [train_size, val_size], generator=torch.Generator().manual_seed(SHUFFLE_SEED))

# 5. Define dataloaders with batch size 1000 and consistent shuffle with seed
batch_size = 1000

# A seeded generator makes the shuffle order reproducible (worker_init_fn only runs when num_workers > 0)
train_loader = DataLoader(train_subset, batch_size=batch_size, shuffle=True, generator=torch.Generator().manual_seed(SHUFFLE_SEED))
val_loader = DataLoader(val_subset, batch_size=batch_size, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
print("Loading success")

Loading success


Architecture - MODEL 1: 

Define a Sequential model, wherein the layers are stacked sequentially and each layer has exactly one input tensor and one output tensor. Please build a ConvNet by adding the layers to the Sequential model using the configuration below. For each of the layers, initialize the kernel weights from a Glorot uniform distribution and set the random seed to 99. Additionally, initialize the bias vector as a zero vector. In this architecture, you may use different dropout values [0.1, 0.3, 0.5] and report the impact of dropout values on model performance.
![alt text](<model 1 arch.png>)
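As a hedged illustration of the initialization requirement above, the sketch below applies Glorot (Xavier) uniform weights and zero biases to a small `nn.Sequential` stack after seeding with 99. The toy layer sizes and the helper name `init_glorot_zero_bias` are assumptions made for illustration, not the assignment's architecture. Glorot uniform draws each weight from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)).

```python
import math
import torch
import torch.nn as nn

def init_glorot_zero_bias(module):
    # Hypothetical helper: Glorot-uniform weights and zero biases for conv/linear layers
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

torch.manual_seed(99)  # seed before initializing so the draw is reproducible
toy = nn.Sequential(   # toy sizes, chosen only for this sketch
    nn.Conv2d(3, 8, kernel_size=3),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 4 * 4, 10),
)
toy.apply(init_glorot_zero_bias)

# For a conv layer, fan_in = in_channels * kH * kW and fan_out = out_channels * kH * kW
conv = toy[0]
fan_in, fan_out = 3 * 3 * 3, 8 * 3 * 3
limit = math.sqrt(6.0 / (fan_in + fan_out))
print(f"Glorot limit: {limit:.4f}, max |w|: {conv.weight.abs().max().item():.4f}")
```

Using `Module.apply` visits every submodule, so the same helper would also cover layers added later, provided it is called after all layers are constructed.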


In [None]:
# Model 1: per architecture table (2 conv layers, BatchNorm, Dropout [0.1, 0.3, 0.5], Dense 256 -> 251).
# Assumes input size 75x75; Glorot uniform, zero bias, seed 99.
NUM_CLASSES = 251

class Model1(nn.Module):
    # Sequential ConvNet: Conv2D(64)->ReLU->MaxPool, Conv2D(128)->ReLU->BN->Dropout->MaxPool, Flatten->Dense(256)->ReLU->Dense(251)->Softmax.
    def __init__(self, num_classes=251, dropout=0.1, in_channels=3, input_h=75, input_w=75):
        super(Model1, self).__init__()
        # Layer 1–2: Conv2D 64, (3,3), no padding, ReLU; MaxPool2D (2,2)
        self.conv1 = nn.Conv2d(in_channels, 64, kernel_size=3, stride=1, padding=0)
        self.relu1 = nn.ReLU()
        self.pool1 = nn.MaxPool2d(2, 2)
        # Layer 3–6: Conv2D 128, (3,3), no padding, ReLU; BatchNorm; Dropout; MaxPool2D (2,2)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=0)
        self.relu2 = nn.ReLU()
        self.bn2 = nn.BatchNorm2d(128, eps=0.001, momentum=0.01)  # 0.01 in PyTorch = 0.99 in Keras (weight on running stats)
        self.drop2 = nn.Dropout2d(p=dropout)
        self.pool2 = nn.MaxPool2d(2, 2)
        # Layer 7–9: Flatten; Dense 256 ReLU; Dense 251
        self.flatten = nn.Flatten()
        with torch.no_grad():
            dummy = torch.zeros(1, in_channels, input_h, input_w)
            dummy = self.pool2(self.drop2(self.bn2(self.relu2(self.conv2(self.pool1(self.relu1(self.conv1(dummy))))))))
            flat_size = self.flatten(dummy).shape[1]
        self.fc1 = nn.Linear(flat_size, 256)
        self.relu_fc = nn.ReLU()
        self.fc2 = nn.Linear(256, num_classes)
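        self.softmax_fc = nn.Softmax(dim=1)  # dim=1: normalize over the class dimension
        # Apply Glorot-uniform weights and zero biases (seed 99); without this call the PyTorch defaults are used
        self._init_weights_biases()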

    def forward(self, x):
        x = self.pool1(self.relu1(self.conv1(x)))
        x = self.relu2(self.conv2(x))
        x = self.bn2(x)
        x = self.drop2(x)
        x = self.pool2(x)
        x = self.flatten(x)
        x = self.relu_fc(self.fc1(x))
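        x = self.fc2(x)
        # Return raw logits: nn.CrossEntropyLoss applies log-softmax internally, so
        # self.softmax_fc should only be applied at inference time to get class probabilities.
        return x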

    def _init_weights_biases(self):
        torch.manual_seed(99)
        for m in [self.conv1, self.conv2, self.fc1, self.fc2]:
            if hasattr(m, "weight") and m.weight is not None:
                nn.init.xavier_uniform_(m.weight)
            if hasattr(m, "bias") and m.bias is not None:
                nn.init.zeros_(m.bias)


Architecture - MODEL 2: 

The performance of the CNN model is notably impacted by the number of convolutional layers it employs. In the preceding design, two convolutional layers were integrated. Kindly introduce an additional convolutional layer (as depicted in the updated architecture below) and elaborate on the roles of convolutional layers.
![alt text](<model 2 arch.png>)
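To make the effect of the extra layer concrete, the following sketch traces the feature-map size through each model. It assumes 75x75 inputs, 3x3 convolutions with stride 1 and no padding, and 2x2 max pooling, matching the layer definitions used in this notebook's code. The function names are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    # Output side length of a convolution along one spatial dimension
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    # Output side length of max pooling (floor division, no padding)
    return (size - kernel) // stride + 1

def flat_features(input_size, channels):
    # Apply one conv + pool stage per entry in `channels`, then flatten
    size = input_size
    for _ in channels:
        size = pool_out(conv_out(size))
    return channels[-1] * size * size

model1_flat = flat_features(75, [64, 128])       # 75 -> 73 -> 36 -> 34 -> 17
model2_flat = flat_features(75, [64, 128, 256])  # ... 17 -> 15 -> 7
print(model1_flat, model2_flat)  # 36992 12544
```

Under these assumptions, the third conv and pool stage shrinks the flattened vector from 36,992 to 12,544 features, so the first dense layer has fewer inputs even though the channel count doubles.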

In [None]:
# Model 2: per architecture table (3 conv layers 64->128->256, BatchNorm, Dropout 0.3, Dense 512->251).
class Model2(nn.Module):
    # Sequential ConvNet per table: Conv(64)->ReLU->Pool, Conv(128)->ReLU->Pool, Conv(256)->ReLU->BN->Dropout(0.3)->Pool, Flatten->Dense(512)->ReLU->Dense(251)->Softmax.
    def __init__(self, num_classes=251, dropout=0.3, in_channels=3, input_h=75, input_w=75):
        super(Model2, self).__init__()
        # Layers 1–2: Conv2D 64, (3,3), no padding, ReLU; MaxPool2D (2,2)
        self.conv1 = nn.Conv2d(in_channels, 64, kernel_size=3, stride=1, padding=0)
        self.relu1 = nn.ReLU()
        self.pool1 = nn.MaxPool2d(2, 2)
        # Layers 3–4: Conv2D 128, (3,3), no padding, ReLU; MaxPool2D (2,2)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=0)
        self.relu2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d(2, 2)
        # Layers 5–8: Conv2D 256, (3,3), no padding, ReLU; BatchNorm; Dropout 0.3; MaxPool2D (2,2)
        self.conv3 = nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=0)
        self.relu3 = nn.ReLU()
        self.bn3 = nn.BatchNorm2d(256, eps=0.001, momentum=0.01)  # 0.01 in PyTorch = 0.99 in Keras (weight on running stats)
        self.drop3 = nn.Dropout2d(p=dropout)
        self.pool3 = nn.MaxPool2d(2, 2)
        # Layers 9–11: Flatten; Dense 512 ReLU; Dense 251
        self.flatten = nn.Flatten()
        with torch.no_grad():
            dummy = torch.zeros(1, in_channels, input_h, input_w)
            dummy = self.pool1(self.relu1(self.conv1(dummy)))
            dummy = self.pool2(self.relu2(self.conv2(dummy)))
            dummy = self.pool3(self.drop3(self.bn3(self.relu3(self.conv3(dummy)))))
            flat_size = self.flatten(dummy).shape[1]
        self.fc1 = nn.Linear(flat_size, 512)
        self.relu_fc = nn.ReLU()
        self.fc2 = nn.Linear(512, num_classes)
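        self.softmax_fc = nn.Softmax(dim=1)  # dim=1: normalize over the class dimension
        # Apply Glorot-uniform weights and zero biases (seed 99); without this call the PyTorch defaults are used
        self._init_weights_biases()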

    def forward(self, x):
        x = self.pool1(self.relu1(self.conv1(x)))
        x = self.pool2(self.relu2(self.conv2(x)))
        x = self.relu3(self.conv3(x))
        x = self.bn3(x)
        x = self.drop3(x)
        x = self.pool3(x)
        x = self.flatten(x)
        x = self.relu_fc(self.fc1(x))
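        x = self.fc2(x)
        # Return raw logits: nn.CrossEntropyLoss applies log-softmax internally, so
        # self.softmax_fc should only be applied at inference time to get class probabilities.
        return x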

    def _init_weights_biases(self):
        torch.manual_seed(99)
        for m in [self.conv1, self.conv2, self.conv3, self.fc1, self.fc2]:
            if hasattr(m, "weight") and m.weight is not None:
                nn.init.xavier_uniform_(m.weight)
            if hasattr(m, "bias") and m.bias is not None:
                nn.init.zeros_(m.bias)


Training: The model is compiled by specifying the optimizer, the loss function, and the metrics to be recorded at each step of the training process. The Adam optimizer should minimize the categorical cross entropy. The ConvNet model can be trained and evaluated with the previously created data generators. The training step size (the number of steps per epoch) can be calculated by dividing the number of images in the generator by the batch size, for the training and testing data respectively.
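The step-count calculation described above can be worked through with the sample counts printed by the loading cell (131030 training images, 43670 test images) and the 85%/15% split and batch size of 1000 used in this notebook. This is a small arithmetic sketch; the helper name `steps_per_epoch` is hypothetical.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    # Number of batches needed to cover every sample once (the last batch may be partial)
    return math.ceil(num_samples / batch_size)

total_train, test_samples, batch_size = 131030, 43670, 1000  # counts from the loading cell output
val_size = int(0.15 * total_train)
train_size = total_train - val_size

print(steps_per_epoch(train_size, batch_size))    # 112
print(steps_per_epoch(val_size, batch_size))      # 20
print(steps_per_epoch(test_samples, batch_size))  # 44
```

With the default `drop_last=False`, `len(train_loader)` returns the same batch count, so the value does not need to be passed to the training loop explicitly.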

In [None]:
# Define data storage 
results_dataframe = pd.DataFrame(columns=[
    'Model Name', 'Dropout', 'Epochs', 'Training Accuracy (%)', 
    'Validation Accuracy (%)', 'Test Accuracy (%)'
])

# Maximum epochs to run training
# Start with 50, go down to 20 if too time consuming
MAX_EPOCHS = 50

## Helper Functions

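# MODEL TRAIN
def train(model, optimizer, train_loader, val_loader, max_epochs, criterion=None):
    # Train model through max epochs; return the mean training/validation loss per epoch
    if criterion is None:
        criterion = nn.CrossEntropyLoss()  # categorical cross entropy
    device = next(model.parameters()).device  # keep batches on the model's device
    training_loss_curve = []
    validation_loss_curve = []

    for epoch in range(max_epochs):
        # Training pass: update weights and accumulate the sample-weighted loss
        model.train()
        running_loss, n_seen = 0.0, 0
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item() * labels.size(0)
            n_seen += labels.size(0)
        training_loss_curve.append(running_loss / n_seen)

        # Validation pass: no gradients, dropout off, BatchNorm uses running statistics
        model.eval()
        running_loss, n_seen = 0.0, 0
        with torch.no_grad():
            for images, labels in val_loader:
                images, labels = images.to(device), labels.to(device)
                running_loss += criterion(model(images), labels).item() * labels.size(0)
                n_seen += labels.size(0)
        validation_loss_curve.append(running_loss / n_seen)
        print(f"Epoch {epoch + 1}/{max_epochs} | train loss {training_loss_curve[-1]:.4f} | val loss {validation_loss_curve[-1]:.4f}")

    return training_loss_curve, validation_loss_curve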

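# MODEL EVAL
def evaluate(model, train_loader, val_loader, test_loader):
    # Compute 3 accuracy percentages (training, validation, test)
    model.eval()
    device = next(model.parameters()).device

    def accuracy(loader):
        # Percentage of samples whose highest-scoring class matches the label
        correct, total = 0, 0
        with torch.no_grad():
            for images, labels in loader:
                images, labels = images.to(device), labels.to(device)
                correct += (model(images).argmax(dim=1) == labels).sum().item()
                total += labels.size(0)
        return 100.0 * correct / total

    training_acc, validation_acc, test_acc = (accuracy(loader) for loader in (train_loader, val_loader, test_loader))
    return training_acc, validation_acc, test_acc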

# PLOT LOSS CURVES 
def plot_loss(curve, dropout, dataset):
    # Plot and save loss curve
    # dataset = [Training, Validation]
    output_dir = "Plot JPGs"
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    plt.figure()
    plt.plot(curve)
    plt.xlabel("Epoch")
    if dataset == 'Training':
        plt.ylabel("Training Loss")
    elif dataset == 'Validation':
        plt.ylabel("Validation Loss")

    plt.title(f"Categorical Cross Entropy Loss Curve for {dataset} Dataset, Dropout {dropout}") 
    filename = f"{output_dir}/LossCurve_{dataset}_Dataset_{dropout}_Dropout.jpg"
        
    plt.savefig(filename)
    plt.close()

In [None]:
# MAIN EXPERIMENT HELPER
def run_experiment(model_arch, train_loader, val_loader, test_loader, max_epochs=50, dropout=None):
    # Run full experiment: instantiate model, train, evaluate accuracies, plot, return results to dataframe
    if model_arch == 1:
        model = Model1(dropout=dropout)
    elif model_arch == 2:
        model = Model2()
    else:
        raise ValueError(f"ERROR: Model selected {model_arch} does not match possible options (1, 2).")
        
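    # Move the model to the GPU when one is available
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)

    # Define loss and optimizer: categorical cross entropy, Adam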
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters())

    # Train
    training_loss_curve, validation_loss_curve = train(model, optimizer, train_loader, val_loader, max_epochs)

    # Evaluate 
    training_acc, validation_acc, test_acc = evaluate(model, train_loader, val_loader, test_loader)

    # Save plots 
    plot_loss(training_loss_curve, dropout, "Training")
    plot_loss(validation_loss_curve, dropout, "Validation")

    # return results
    row = {
        'Model Name': f"Model {model_arch}", 
        'Dropout': dropout, 
        'Epochs': max_epochs, 
        'Training Accuracy (%)': training_acc,
        'Validation Accuracy (%)': validation_acc, 
        'Test Accuracy (%)': test_acc
    }

    return row 
    

In [None]:
# Train and Test Model 1a, 1b, 1c
# For loop to pass different dropouts
DROPOUT_SELECTOR = [0.1, 0.3, 0.5]
for d in DROPOUT_SELECTOR:
    print(f"RUNNING MODEL 1, DROPOUT {d}")
    model_run = run_experiment(1, train_loader, val_loader, test_loader, max_epochs=MAX_EPOCHS, dropout=d)
    results_dataframe = pd.concat([results_dataframe, pd.DataFrame([model_run])], ignore_index=True)
    print(f"COMPLETED MODEL 1, DROPOUT {d}")

# Train and Test Model 2, no dropout arg
print(f"RUNNING MODEL 2")
model_run_2 = run_experiment(2, train_loader, val_loader, test_loader, max_epochs=MAX_EPOCHS)
results_dataframe = pd.concat([results_dataframe, pd.DataFrame([model_run_2])], ignore_index=True)
print(f"COMPLETED MODEL 2")

Deliverables: Please report the training and validation accuracy after the training process is carried out for 50 epochs (you can train for 20 epochs if the training is time consuming), in addition to the achieved accuracy levels on the test dataset. Also, plot the loss curves for both training and validation datasets. Discuss the functions of dropout values and the number of convolutional layers in relation to the CNN model performance. Please make sure to submit your working code files along with the final results and the plots.

In [None]:
# Print results dataframe
print(results_dataframe)

In [None]:
# Plot all loss curves together
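
One possible way to fill in this step is sketched below. It assumes the per-epoch loss lists returned by `train` are collected in a dictionary keyed by run label. That dictionary, the function name `plot_all_loss_curves`, the PNG filenames, and the numbers in the usage lines are hypothetical placeholders, not results from this notebook.

```python
import matplotlib.pyplot as plt

def plot_all_loss_curves(curves, filename="LossCurves_All.png"):
    # curves: {run label: (training_loss_curve, validation_loss_curve)}
    fig, (ax_train, ax_val) = plt.subplots(1, 2, figsize=(12, 4), sharey=True)
    for label, (train_curve, val_curve) in curves.items():
        ax_train.plot(range(1, len(train_curve) + 1), train_curve, label=label)
        ax_val.plot(range(1, len(val_curve) + 1), val_curve, label=label)
    ax_train.set_title("Training Loss")
    ax_val.set_title("Validation Loss")
    for ax in (ax_train, ax_val):
        ax.set_xlabel("Epoch")
        ax.legend()
    ax_train.set_ylabel("Categorical Cross Entropy")
    fig.tight_layout()
    fig.savefig(filename)
    plt.close(fig)
    return filename

# Placeholder numbers only, to show the expected input structure
demo_curves = {
    "Model 1, dropout 0.1": ([2.0, 1.0, 0.5], [2.1, 1.2, 0.8]),
    "Model 2, dropout 0.3": ([1.8, 0.9, 0.4], [1.9, 1.0, 0.6]),
}
saved_to = plot_all_loss_curves(demo_curves, filename="LossCurves_Demo.png")
```

Sharing the y-axis between the two panels makes it easier to compare training and validation loss for the same run, which is where the gap caused by overfitting would show up.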

Discussion:



Bonus (+1): A skip connection in a neural network is a connection that skips one or more layers and connects to a later layer. Residual Networks (ResNets) popularized skip connections to address the vanishing gradient problem, thereby enabling the training of deeper networks. Your task for this bonus part is to integrate such a skip connection; any type of skip connection is acceptable. For instance, you could link the output of the first convolutional layer directly to the input of the last convolutional layer in your model architecture. Based on your results, analyze and discuss any improvements or other effects this change has on the model's performance.
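
A minimal sketch of one such skip connection follows. It is an illustration under stated assumptions rather than the required design: the class name `SkipConvBlock` is hypothetical, and the block uses `padding=1` (unlike the unpadded convolutions in Models 1 and 2) so that both branches keep the same height and width and can be added element-wise. A 1x1 convolution on the shortcut matches the channel counts when they differ.

```python
import torch
import torch.nn as nn

class SkipConvBlock(nn.Module):
    # Hypothetical block: conv -> ReLU -> conv, plus a shortcut from the block input
    def __init__(self, in_channels, out_channels):
        super().__init__()
        # padding=1 keeps height/width unchanged so the two branches can be added
        self.conv_a = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.conv_b = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        # 1x1 convolution matches channel counts on the shortcut when they differ
        if in_channels != out_channels:
            self.shortcut = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = self.relu(self.conv_a(x))
        out = self.conv_b(out)
        return self.relu(out + self.shortcut(x))  # the element-wise sum is the skip connection

block = SkipConvBlock(64, 128)
x = torch.randn(2, 64, 36, 36)  # e.g. the shape after the first conv + pool stage for 75x75 inputs
y = block(x)
print(y.shape)  # torch.Size([2, 128, 36, 36])
```

Linking the first convolutional stage directly to the input of the last convolution in Model 2 would also require matching spatial sizes (64x36x36 versus 128x17x17 for 75x75 inputs), for example with a strided or pooled shortcut; that choice is left open here.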