# Introduction

This project classifies images of fruits using Convolutional Neural Networks (CNNs). The primary goal is to explore how well image classification works under **constrained conditions**, such as **limited data availability** and **resource-limited deployment environments** (e.g., embedded systems or smartphones).

Image classification is widely used in applications like automated food sorting, agricultural monitoring, quality control, and retail automation. Efficient and accurate classification systems can significantly reduce manual labor, increase throughput, and ensure consistency across vision-based workflows.

## Objectives

In this study, we aim to:

- Build and train a **lightweight CNN** capable of classifying fruit images using **grayscale input**.
- Apply **lightweight grayscale-specific augmentations** (e.g., flips, small rotations, noise) to the training data only, improving robustness without collecting additional images.
- Optimize for **small model size and fast inference**, making the model suitable for edge deployment.
- Compare the performance of our lightweight CNN with a **well-known pretrained model (MobileNetV2)** to understand trade-offs in accuracy, size, and complexity.
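
As a rough illustration of the augmentation objective, the sketch below applies a random horizontal flip and additive Gaussian noise to a grayscale image stored as a nested list of 0-255 values, and leaves evaluation images untouched. It uses only the standard library. The function names, the `train` flag, and the default noise level are assumptions made for illustration, not the project's actual pipeline, which may well rely on an image library's transforms. Small rotations are omitted here for brevity.

```python
import random


def horizontal_flip(image):
    """Mirror a 2D grayscale image (list of rows) left to right."""
    return [row[::-1] for row in image]


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise and clamp the result to the 0-255 range."""
    return [
        [min(255, max(0, round(pixel + rng.gauss(0.0, sigma)))) for pixel in row]
        for row in image
    ]


def augment(image, train, rng, flip_prob=0.5, sigma=5.0):
    """Apply augmentations only when `train` is True (hypothetical helper)."""
    if not train:
        # Evaluation data is returned as an unchanged copy.
        return [row[:] for row in image]
    if rng.random() < flip_prob:
        image = horizontal_flip(image)
    return add_gaussian_noise(image, sigma, rng)
```

Passing an explicit `random.Random(seed)` instance as `rng` keeps the augmentations reproducible between runs.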

## Key Features of This Project

- ### Data-Efficient Design  
  We work with a **small dataset** and **grayscale images**, using **minimal augmentations** to improve generalization while simulating real-world data scarcity.

- ### Deployment-Ready Modeling  
  The CNN architecture is selected and trained to support **low-power, real-time inference** on embedded hardware or mobile devices. Post-training, we apply **quantization** (float32 → int8) to reduce memory and latency.

- ### Comparative Evaluation  
  We benchmark our model against a **pretrained MobileNetV2**, comparing trade-offs between size, accuracy, and complexity in a deployment context.

- ### Transparent Experimentation  
  All preprocessing, modeling, and evaluation steps are clearly explained and reproducible, with detailed tracking of accuracy and inference performance.
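
The float32 to int8 step listed under Deployment-Ready Modeling can be illustrated with plain affine quantization. The sketch below is a standard-library worked example of the arithmetic only. It is not the quantization toolchain used in this project, and the helper names and sample weights are hypothetical.

```python
def quantization_params(values, qmin=-128, qmax=127):
    """Derive scale and zero point so [min, max] (widened to include 0) maps onto int8."""
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to integers in [qmin, qmax]."""
    return [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Recover approximate floats from the quantized integers."""
    return [(q - zero_point) * scale for q in q_values]


weights = [-0.5, 0.0, 0.25, 1.0]  # made-up sample values
scale, zero_point = quantization_params(weights)
q = quantize(weights, scale, zero_point)
restored = dequantize(q, scale, zero_point)
```

Each value is stored in one byte instead of four, at the cost of a rounding error of at most half the scale for values inside the calibrated range.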

## Why This Matters

By exploring how far a compact CNN can go under tight constraints, this project contributes to **edge-ready deep learning**: smart, efficient vision systems that work even where compute and data are scarce.


# Step 1: Data Analysis & Dataset Sample

## 1.1 Data Acquisition

The dataset used in this project is the [Fruits 360 dataset](https://www.kaggle.com/datasets/moltean/fruits), publicly available on Kaggle. It contains thousands of labeled fruit images in `.jpg` format, sorted into folders by fruit class. Each image is **100×100 pixels** and originally in **RGB format**.

We selected this dataset because:
- It includes a **wide variety of fruit classes** suitable for multi-class classification.
- Images are already **clean and labeled**, minimizing preprocessing effort.
- It is a well-known benchmark dataset for fruit classification tasks.
- The folder structure is compatible with **torchvision's `ImageFolder`** dataset class (part of the PyTorch ecosystem), making it easy to load.
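
To make the folder convention concrete, the sketch below reproduces the labeling rule that `ImageFolder`-style loaders rely on: each subdirectory is a class, and classes are indexed in sorted order. It uses only the standard library and hypothetical helper names. It illustrates the directory layout rather than replacing the actual loader.

```python
import os


def folder_class_index(root):
    """Map each class subdirectory of `root` to an integer label, in sorted order."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: index for index, name in enumerate(classes)}


def list_samples(root):
    """Return (path, label) pairs for every file inside each class folder."""
    class_to_idx = folder_class_index(root)
    samples = []
    for name, label in class_to_idx.items():
        class_dir = os.path.join(root, name)
        for filename in sorted(os.listdir(class_dir)):
            samples.append((os.path.join(class_dir, filename), label))
    return samples
```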

---

## 1.2 Dataset Reduction & Cleaning

To simulate a more realistic, constrained edge-deployment scenario, we **manually selected 8 specific fruit categories** from the dataset. This allows us to:
- Focus the classification task on a smaller, balanced subset.
- Reduce dataset size for faster training and evaluation.
- Better explore model performance in a **low-data regime**.
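
One way to build such a subset is to copy only the chosen class folders into a new directory. The sketch below assumes the per-class folder layout described in section 1.1. The function name is hypothetical, and any paths or class names passed to it are placeholders, since the eight selected categories are not listed here.

```python
import os
import shutil


def copy_selected_classes(source_dir, target_dir, selected_classes):
    """Copy the folders named in `selected_classes` from source_dir to target_dir."""
    missing = [
        name for name in selected_classes
        if not os.path.isdir(os.path.join(source_dir, name))
    ]
    if missing:
        raise FileNotFoundError(f"Class folders not found: {missing}")
    os.makedirs(target_dir, exist_ok=True)
    for name in selected_classes:
        shutil.copytree(
            os.path.join(source_dir, name),
            os.path.join(target_dir, name),
            dirs_exist_ok=True,  # requires Python 3.8 or newer
        )
    return sorted(os.listdir(target_dir))
```

The function would be called once per split (for example, once for the training folder and once for the test folder) with the same list of class names, so that both splits contain identical classes.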

We ensured that:
- The image dimensions are consistent (**100×100**).
- All images are intact and correctly labeled.
- Class balance is maintained across training and test splits.
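
A check along the following lines can back up the balance claim. The sketch takes per-class counts for each split, as dictionaries mapping class name to image count, and reports the ratio between the largest and smallest class. The helper names and the default threshold of 1.5 are hypothetical choices, not values taken from this project.

```python
def imbalance_ratio(counts):
    """Ratio of the largest to the smallest class count (1.0 means perfectly balanced)."""
    if not counts or min(counts.values()) == 0:
        raise ValueError("Every class needs at least one image")
    return max(counts.values()) / min(counts.values())


def check_splits(train_counts, test_counts, max_ratio=1.5):
    """Verify both splits cover the same classes and stay under `max_ratio`."""
    if set(train_counts) != set(test_counts):
        raise ValueError("Training and test splits contain different classes")
    return {
        "train_ratio": imbalance_ratio(train_counts),
        "test_ratio": imbalance_ratio(test_counts),
        "balanced": all(
            imbalance_ratio(c) <= max_ratio for c in (train_counts, test_counts)
        ),
    }
```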

Additionally, during preprocessing, all images are converted to **grayscale** to reduce input complexity and align with embedded system constraints.
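
Grayscale conversion collapses the three RGB channels into one luminance value per pixel, which cuts a 100×100 image from 30,000 input values to 10,000. The sketch below shows the arithmetic, assuming the common ITU-R BT.601 luma weights (the weights Pillow documents for its `L` mode conversion). Whether this project uses exactly these weights is not stated, and the function names are illustrative.

```python
def rgb_to_gray(pixel):
    """Convert one (R, G, B) pixel with 0-255 channels to a single luminance value."""
    r, g, b = pixel
    return (r * 299 + g * 587 + b * 114) // 1000


def image_to_gray(image):
    """Convert a 2D grid of (R, G, B) tuples to a 2D grid of luminance values."""
    return [[rgb_to_gray(pixel) for pixel in row] for row in image]
```

The integer division truncates, so results may differ by one from a library that rounds instead.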

---

## 1.3 Exploratory Data Analysis (EDA)

To confirm that our selected subset is suitable for fair training, we examined the **class distribution** using simple utility functions.

### Count Function:
```python
import os


def count_images_per_class(directory):
    """Return a {class_name: image_count} dict, one entry per class subfolder."""
    return {
        class_name: len(os.listdir(os.path.join(directory, class_name)))
        for class_name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, class_name))
    }
```


## Notebook Workflow
- <a href="01_eda.ipynb" target="_blank">01_eda.ipynb</a> — Exploratory Data Analysis & Dataset Sampling  
- <a href="02_preprocessing.ipynb" target="_blank">02_preprocessing.ipynb</a> — Image preprocessing, dataset loading  
- <a href="03_train_cnn.ipynb" target="_blank">03_train_cnn.ipynb</a> — Baseline CNN training  
- <a href="04_eval_mobilenetv2.ipynb" target="_blank">04_eval_mobilenetv2.ipynb</a> — MobileNetV2 evaluation  
- <a href="05_results.ipynb" target="_blank">05_results.ipynb</a> — Final metrics, comparison, discussion  
