## Fruit Freshness Classification Project

### Project Overview

The goal of this project is to build a **multi-class image classification system** capable of distinguishing between **fresh and rotten fruit** across multiple categories. This problem mirrors real-world quality control and food inspection use cases, where automated visual systems are used to detect spoilage and ensure product quality.

The model is trained to classify images into **six categories**:
- Fresh apples  
- Fresh oranges  
- Fresh bananas  
- Rotten apples  
- Rotten oranges  
- Rotten bananas  

Due to the relatively limited size of the dataset, this project emphasizes the use of **transfer learning**, **data augmentation**, and **fine-tuning** to achieve strong generalization performance.

### Technical Approach

To solve this problem, I leveraged a **pretrained convolutional neural network** as a feature extractor and added a custom classification head suited for the six target classes. The training pipeline includes:

- Transfer learning using a pretrained image backbone
- Data augmentation to improve robustness and reduce overfitting
- Fine-tuning with a reduced learning rate once the classifier converges
- Categorical cross-entropy loss for multi-class classification
- Validation-based evaluation to monitor generalization performance

The model is considered successful once it reaches high validation accuracy while its validation loss stays stable from epoch to epoch, indicating that it generalizes beyond the training data.

---

## Dataset

The dataset is organized under the `data/fruits/` directory, with separate `train/` and `valid/` splits. Each split follows a standard image classification folder structure with one subdirectory per class, so labels can be inferred directly from directory names during training.

This structure makes it straightforward to load the data with a custom PyTorch `Dataset` and to experiment with different model architectures and preprocessing pipelines.
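As an illustration of how labels can come from folder names, here is a minimal standard-library sketch that scans one split directory (such as `data/fruits/train`) and pairs every `.png` file with the index of its class folder. The helper name `index_split` is hypothetical, and the class order is assumed to be passed in explicitly (as the notebook later does with `DATA_LABELS`) so that label indices stay stable.

```python
from pathlib import Path


def index_split(split_dir, class_names):
    """Return (path, label_index) pairs for one dataset split.

    Hypothetical helper: assumes a layout of split_dir/<class_name>/*.png,
    where the label index is the position of class_name in class_names.
    """
    samples = []
    for label_idx, name in enumerate(class_names):
        # sorted() keeps the ordering deterministic across filesystems
        for path in sorted(Path(split_dir, name).glob("*.png")):
            samples.append((path, label_idx))
    return samples
```

For example, `index_split("data/fruits/train", DATA_LABELS)` would label every image in `freshapples/` as 0 and every image in `rottenoranges/` as 5. Files with other extensions are ignored.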

---

## Implementation

Below, I define the training environment, data loaders, and model architecture using **PyTorch** and **Torchvision**. The training loop tracks both loss and accuracy across training and validation sets to ensure the model learns meaningful visual representations of fruit freshness.

In [1]:
import torch
import torch.nn as nn
from torch.optim import Adam
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms.v2 as transforms
import torchvision.io as tv_io

import glob

import utils

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.cuda.is_available()

True

## Dataset Classes

The dataset consists of six classes, each stored in its own folder:
- Fresh apples (`freshapples`)
- Fresh bananas (`freshbanana`)
- Fresh oranges (`freshoranges`)
- Rotten apples (`rottenapples`)
- Rotten bananas (`rottenbanana`)
- Rotten oranges (`rottenoranges`)

Because there are six target categories, the model ends in a **six-unit output layer** that produces one logit per class, and it is trained with **categorical cross-entropy loss** to handle multi-class classification.
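For a single image with logits `z_0 ... z_5` and true class `y`, categorical cross-entropy is `-log(softmax(z)[y]) = log(sum_j exp(z_j)) - z_y`. The short standard-library sketch below evaluates that formula, using the usual max-subtraction trick for numerical stability; the function name is illustrative. It also gives a handy sanity check: an untrained head that outputs equal logits for all six classes has a loss of `ln(6)`, about 1.79.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: log-sum-exp(logits) - logits[target]."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# Equal logits for all six classes -> loss equals ln(6)
print(round(cross_entropy([0.0] * 6, target=2), 4))  # 1.7918
```

`nn.CrossEntropyLoss` computes the same quantity from raw logits and integer labels and, by default, averages it over the batch.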

<img src="./images/fruits.png" style="width: 600px;">

## Transfer Learning with ImageNet

To improve performance on a limited dataset, this project leverages **transfer learning** using a convolutional neural network pretrained on **ImageNet**.

A VGG16 architecture is used as the base feature extractor. Pretrained weights provide robust low- and mid-level visual features (edges, textures, shapes), allowing the model to generalize effectively despite limited labeled data.

Since the dataset consists of RGB images, the model operates on three-channel inputs. The pretrained backbone is later extended with custom classification layers tailored to the six fruit categories.


In [2]:
from torchvision.models import vgg16
from torchvision.models import VGG16_Weights

weights = VGG16_Weights.DEFAULT
vgg_model = vgg16(weights=weights)

## Freeze Base Model

To preserve the general visual features learned from ImageNet, the pretrained VGG16 backbone is **frozen during initial training**. This prevents large gradient updates from overwriting useful low-level representations such as edges, textures, and shapes.

Freezing the base model allows training to focus on the newly added classification layers, which adapt the pretrained features to the fruit freshness task.
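One quick way to confirm that freezing worked is to count trainable versus total parameters. The sketch below uses a small stand-in network (not VGG16) so that it runs instantly; the helper name `count_parameters` is hypothetical, and the same call could be applied to `vgg_model` or to the combined model.

```python
import torch.nn as nn


def count_parameters(model):
    """Return (trainable, total) parameter counts for a module."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return trainable, total


# Stand-in model: a "backbone" layer followed by a small "head"
backbone = nn.Linear(8, 4)  # 8*4 + 4 = 36 parameters
head = nn.Linear(4, 6)      # 4*6 + 6 = 30 parameters
model = nn.Sequential(backbone, head)

backbone.requires_grad_(False)  # freeze only the backbone, as with vgg_model
print(count_parameters(model))  # (30, 66)
```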

In [3]:
# Freeze base model
vgg_model.requires_grad_(False)
next(iter(vgg_model.parameters())).requires_grad

False

## Custom Classification Head

The pretrained VGG16 model is extended with a **custom classification head** tailored to the fruit freshness dataset.

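Early layers of VGG16 capture general visual patterns, while deeper layers become increasingly specific to the original ImageNet task. To balance generalization and task adaptation, only the first block of the original classifier is retained: a `Linear(25088, 4096)` layer followed by ReLU and dropout. The remaining classifier layers, including the 1000-way ImageNet output layer, are discarded.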

Additional fully connected layers are appended to:
- Reduce feature dimensionality
- Introduce non-linearity
- Produce class logits for six fruit categories

In [4]:
vgg_model.classifier[0:3]

Sequential(
  (0): Linear(in_features=25088, out_features=4096, bias=True)
  (1): ReLU(inplace=True)
  (2): Dropout(p=0.5, inplace=False)
)
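The `in_features=25088` of the first retained layer follows from VGG16's convolutional output. For a 224×224 RGB input, each of VGG16's five max-pooling layers halves the spatial size (the 3×3 convolutions use padding 1, so they preserve it), and the last convolutional block has 512 channels. The arithmetic below is a worked check; note that torchvision's VGG also applies `AdaptiveAvgPool2d((7, 7))` as its `avgpool` layer, which fixes the spatial size at 7×7 even for other input sizes.

```python
# Worked check of VGG16's flattened feature size for a 224x224 RGB input
input_size = 224
num_pools = 5         # VGG16 has five 2x2, stride-2 max-pooling layers
final_channels = 512  # channels produced by the last convolutional block

spatial = input_size // 2 ** num_pools  # 224 / 32 = 7
flat_features = final_channels * spatial * spatial
print(spatial, flat_features)  # 7 25088
```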

In [5]:
N_CLASSES = 6

my_model = nn.Sequential(
    vgg_model.features,
    vgg_model.avgpool,
    nn.Flatten(),
    vgg_model.classifier[0:3],
    nn.Linear(4096, 500),
    nn.ReLU(),
    nn.Linear(500, N_CLASSES)
)
my_model

Sequential(
  (0): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1
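As a quick sanity check of the head's dimensions, the sketch below rebuilds a stand-in for the layers that follow `nn.Flatten()` (randomly initialized, not the pretrained VGG16 weights) and passes a dummy batch of flattened 25088-dimensional feature vectors through it. Each image should come out as six logits, one per class. The batch size of 2 is an arbitrary choice.

```python
import torch
import torch.nn as nn

N_CLASSES = 6

# Stand-in for everything after nn.Flatten() in my_model:
# the retained VGG16 classifier block plus the new layers
head = nn.Sequential(
    nn.Linear(25088, 4096),  # same shape as vgg_model.classifier[0]
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 500),
    nn.ReLU(),
    nn.Linear(500, N_CLASSES),
)

features = torch.randn(2, 25088)  # dummy flattened backbone features
logits = head(features)
print(logits.shape)  # torch.Size([2, 6])
```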

## Model Configuration and Optimization

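The model is configured for **multi-class classification** with `nn.CrossEntropyLoss`, which suits six mutually exclusive classes. It takes the raw logits from the final layer (applying log-softmax internally) together with integer class labels, so the model itself needs no softmax layer.

The Adam optimizer is used with its default learning rate of 1e-3 for its adaptive per-parameter step sizes. Before training, the model is moved to the selected compute device and wrapped with `torch.compile`, which JIT-compiles it for faster execution.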
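Passing all of `my_model.parameters()` to Adam is safe even though the backbone is frozen: frozen parameters never receive a gradient (`p.grad` stays `None`), and PyTorch's optimizers skip such parameters. The sketch below checks this on a tiny stand-in model (not VGG16). After one optimization step, the frozen layer's weights are unchanged while the trainable layer's weights move. Passing only the parameters with `requires_grad=True` to the optimizer is an equivalent, more explicit alternative.

```python
import torch
import torch.nn as nn
from torch.optim import Adam

torch.manual_seed(0)

frozen = nn.Linear(4, 4)
trainable = nn.Linear(4, 3)
model = nn.Sequential(frozen, nn.ReLU(), trainable)
frozen.requires_grad_(False)  # mimic the frozen backbone

optimizer = Adam(model.parameters())  # receives all parameters, like the notebook
frozen_before = frozen.weight.clone()
trainable_before = trainable.weight.clone()

x = torch.randn(8, 4)
y = torch.randint(0, 3, (8,))
loss = nn.CrossEntropyLoss()(model(x), y)

optimizer.zero_grad()
loss.backward()
optimizer.step()

print(frozen.weight.grad is None)                       # True
print(torch.equal(frozen.weight, frozen_before))        # True
print(torch.equal(trainable.weight, trainable_before))  # False
```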

In [6]:
loss_function = nn.CrossEntropyLoss()
optimizer = Adam(my_model.parameters())
my_model = torch.compile(my_model.to(device))

## Data Transforms and Augmentation

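To preprocess the input images, I use the transforms bundled with the VGG16 weights. They resize each image so that its shorter side is 256 pixels, center-crop it to 224×224, rescale pixel values to [0, 1], and normalize each channel with the ImageNet mean and standard deviation, matching the preprocessing used when the backbone was pretrained.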

In [7]:
pre_trans = weights.transforms()

Because the dataset is relatively small, I apply light data augmentation during training to improve generalization while preserving the semantic structure of the images: random crops covering 90 to 100% of the image area, horizontal flips, and rotations of up to ±10 degrees. The augmentations are kept subtle so they do not introduce unrealistic artifacts that could hurt classification performance.

In [8]:
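IMG_WIDTH, IMG_HEIGHT = (224, 224)

# Light augmentation applied on top of the VGG16 preprocessing during training
random_trans = transforms.Compose([
    # torchvision expects size as (height, width); keep 90-100% of the image area
    transforms.RandomResizedCrop((IMG_HEIGHT, IMG_WIDTH), scale=(0.9, 1.0)),
    transforms.RandomHorizontalFlip(p=0.5),  # mirror left-right half the time
    transforms.RandomRotation(degrees=10)    # rotate by up to +/-10 degrees
])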

## Load Dataset

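Rather than relying on a prebuilt loader such as `torchvision.datasets.ImageFolder`, I implemented a custom `Dataset` class to keep full control over preprocessing, labeling, and device placement. Each PNG image is read from disk as an RGB tensor, preprocessed with the VGG16 transforms, and assigned an integer label given by the position of its folder name in `DATA_LABELS`. All images and labels are loaded once, up front, and stored on the compute device, which speeds up training at the cost of holding the entire dataset in device memory.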

In [9]:
# Folder names, in the order that defines each class's integer label (0-5)
DATA_LABELS = ["freshapples", "freshbanana", "freshoranges", "rottenapples", "rottenbanana", "rottenoranges"]

class MyDataset(Dataset):
    def __init__(self, data_dir):
        self.imgs = []
        self.labels = []

        # Eagerly load, preprocess, and move every image to the device once
        for l_idx, label in enumerate(DATA_LABELS):
            data_paths = glob.glob(data_dir + label + '/*.png')
            for path in data_paths:
                img = tv_io.read_image(path, tv_io.ImageReadMode.RGB)  # uint8, shape (3, H, W)
                self.imgs.append(pre_trans(img).to(device))             # float, shape (3, 224, 224)
                self.labels.append(torch.tensor(l_idx).to(device))

    def __getitem__(self, idx):
        img = self.imgs[idx]
        label = self.labels[idx]
        return img, label

    def __len__(self):
        return len(self.imgs)

In [10]:
n = 32

train_path = "data/fruits/train/"
train_data = MyDataset(train_path)
train_loader = DataLoader(train_data, batch_size=n, shuffle=True)
train_N = len(train_loader.dataset)

valid_path = "data/fruits/valid/"
valid_data = MyDataset(valid_path)
valid_loader = DataLoader(valid_data, batch_size=n, shuffle=False)
valid_N = len(valid_loader.dataset)
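With `batch_size=32` and the default `drop_last=False`, each epoch yields `ceil(N / 32)` batches, and the last one may be smaller. The sketch below checks this with a synthetic `TensorDataset` of 70 samples (an arbitrary stand-in, not the fruit data). It also shows that `len(loader.dataset)`, which the notebook uses for `train_N` and `valid_N`, counts samples, while `len(loader)` counts batches.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

n = 32
data = TensorDataset(torch.randn(70, 3), torch.randint(0, 6, (70,)))
loader = DataLoader(data, batch_size=n, shuffle=True)

batch_sizes = [x.shape[0] for x, _ in loader]
print(len(loader), math.ceil(len(data) / n), batch_sizes)  # 3 3 [32, 32, 6]
```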

## Model Training & Evaluation

I trained the model with a custom training loop that separates training and validation logic for clarity and reuse. To keep the notebook focused on experimentation, the core `train` and `validate` routines live in a separate utility module, `utils.py`.

Training ran for a fixed 10 epochs. After each epoch, loss and accuracy are reported on both the training and validation sets, which makes it possible to confirm that training converges and to spot overfitting (training accuracy rising while validation accuracy stalls or drops).
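The contents of `utils.py` are not shown in this notebook. Purely as an illustration, here is one way `train` and `validate` functions with the same call signatures could be written; this is a hypothetical sketch, not the module's actual code. It sums the per-batch mean losses over an epoch and divides the number of correct predictions by the sample count `N`. If `utils.py` accumulates losses the same way (an assumption), the printed losses are epoch totals rather than per-sample averages. Unlike the notebook's calls, these versions also return their metrics.

```python
import torch
import torch.nn as nn


def train(model, loader, N, augment, optimizer, loss_function):
    """Hypothetical sketch of one training epoch (not the real utils.train)."""
    model.train()
    total_loss, correct = 0.0, 0
    for x, y in loader:
        output = model(augment(x))  # augmentation only during training
        optimizer.zero_grad()
        batch_loss = loss_function(output, y)
        batch_loss.backward()
        optimizer.step()
        total_loss += batch_loss.item()  # sum of per-batch mean losses
        correct += (output.argmax(dim=1) == y).sum().item()
    accuracy = correct / N
    print(f"Train - Loss: {total_loss:.4f} Accuracy: {accuracy:.4f}")
    return total_loss, accuracy


def validate(model, loader, N, loss_function):
    """Hypothetical sketch of one validation pass (not the real utils.validate)."""
    model.eval()  # disable dropout
    total_loss, correct = 0.0, 0
    with torch.no_grad():  # no gradients needed for evaluation
        for x, y in loader:
            output = model(x)
            total_loss += loss_function(output, y).item()
            correct += (output.argmax(dim=1) == y).sum().item()
    accuracy = correct / N
    print(f"Valid - Loss: {total_loss:.4f} Accuracy: {accuracy:.4f}")
    return total_loss, accuracy
```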


In [11]:
epochs = 10

for epoch in range(epochs):
    print('Epoch: {}'.format(epoch))
    utils.train(my_model, train_loader, train_N, random_trans, optimizer, loss_function)
    utils.validate(my_model, valid_loader, valid_N, loss_function)

Epoch: 0
Train - Loss: 13.2651 Accuracy: 0.8748
Valid - Loss: 2.8015 Accuracy: 0.9453
Epoch: 1
Train - Loss: 4.9151 Accuracy: 0.9569
Valid - Loss: 1.3459 Accuracy: 0.9666
Epoch: 2
Train - Loss: 2.3922 Accuracy: 0.9788
Valid - Loss: 2.0203 Accuracy: 0.9605
Epoch: 3
Train - Loss: 2.5682 Accuracy: 0.9755
Valid - Loss: 2.0580 Accuracy: 0.9514
Epoch: 4
Train - Loss: 2.9935 Accuracy: 0.9695
Valid - Loss: 1.0286 Accuracy: 0.9605
Epoch: 5
Train - Loss: 1.9643 Accuracy: 0.9839
Valid - Loss: 1.4339 Accuracy: 0.9635
Epoch: 6
Train - Loss: 3.2611 Accuracy: 0.9721
Valid - Loss: 1.3228 Accuracy: 0.9726
Epoch: 7
Train - Loss: 1.8676 Accuracy: 0.9839
Valid - Loss: 1.4661 Accuracy: 0.9574
Epoch: 8
Train - Loss: 2.3329 Accuracy: 0.9738
Valid - Loss: 1.0605 Accuracy: 0.9666
Epoch: 9
Train - Loss: 1.5466 Accuracy: 0.9873
Valid - Loss: 1.3923 Accuracy: 0.9605


## Fine-Tuning the Pretrained Backbone

After achieving strong validation performance with a frozen ImageNet backbone, I performed a brief fine-tuning step: I unfroze the pretrained layers and continued training for one epoch with a learning rate of 1e-4, ten times lower than Adam's default of 1e-3.

This allows the deeper convolutional features to adapt slightly to domain-specific visual cues (fresh vs. rotten fruit) while minimizing the risk of catastrophic forgetting.
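A common variation on this step, not used in this notebook, is to give the pretrained backbone an even smaller learning rate than the new head by passing parameter groups to the optimizer. The sketch below shows the mechanism on small stand-in modules; the specific rates (1e-5 for the backbone, 1e-4 for the head) are illustrative assumptions.

```python
import torch.nn as nn
from torch.optim import Adam

backbone = nn.Linear(16, 8)    # stand-in for the pretrained feature extractor
head = nn.Linear(8, 6)         # stand-in for the new classification head
backbone.requires_grad_(True)  # unfreeze before fine-tuning

optimizer = Adam([
    {"params": backbone.parameters(), "lr": 1e-5},  # gentle updates for pretrained weights
    {"params": head.parameters(), "lr": 1e-4},      # larger steps for the new layers
])

print([group["lr"] for group in optimizer.param_groups])  # [1e-05, 0.0001]
```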

In [12]:
# Unfreeze the base model
vgg_model.requires_grad_(True)
optimizer = Adam(my_model.parameters(), lr=.0001)

In [13]:
epochs = 1

for epoch in range(epochs):
    print('Epoch: {}'.format(epoch))
    utils.train(my_model, train_loader, train_N, random_trans, optimizer, loss_function)
    utils.validate(my_model, valid_loader, valid_N, loss_function)

Epoch: 0
Train - Loss: 1.2777 Accuracy: 0.9856
Valid - Loss: 1.3674 Accuracy: 0.9574


## Project Conclusion

In this project, I developed an end-to-end image classification system to distinguish between **fresh and rotten fruit across six categories** using transfer learning with a pretrained VGG16 backbone.

Starting from raw image data, I designed a custom PyTorch dataset pipeline, applied task-appropriate data augmentation, and leveraged pretrained ImageNet features to accelerate convergence and improve generalization. With the backbone frozen and only a lightweight classifier head trained, the model reached 94.5% validation accuracy after the first epoch and stayed between 95.1% and 97.3% for the remaining nine epochs. A brief fine-tuning phase (one epoch at a learning rate of 1e-4) then let the pretrained features adapt to the task while keeping validation accuracy in the same range.

The final model achieved **over 95% validation accuracy** (95.74% after fine-tuning). This project highlights the effectiveness of transfer learning for small-to-medium image datasets and reinforces best practices in model freezing, controlled fine-tuning, and GPU-efficient training, such as preloading the data onto the device and compiling the model with `torch.compile`.

Future improvements could include experimenting with alternative architectures (e.g., ResNet or EfficientNet), incorporating class-imbalance handling, or extending the model to real-world deployment scenarios such as mobile inference or edge devices.