This repository contains the coursework project for COMP64301: Cognitive Robotics and Computer Vision. The project aims to design, implement, and evaluate computer vision algorithms, providing a comprehensive comparison between modern Deep Learning methods and traditional Computer Vision techniques for image classification.
The goal of this project is to benchmark two fundamentally different paradigms in computer vision across two distinct levels of difficulty—coarse-grained and fine-grained image classification.
We systematically compare a Convolutional Neural Network (CNN) via transfer learning against a classic Bag of Visual Words (BoW) pipeline using SIFT features and SVM.
Two datasets were selected to evaluate the models on classification tasks of differing granularity:
- Caltech 101 Subset (Coarse-grained): 10 diverse classes (airplanes, motorbikes, faces_easy, watch, leopards, bonsai, car_side, ketch, chandelier, hawksbill).
- Oxford-IIIT Pet Subset (Fine-grained): 10 visually similar cat and dog breeds (Abyssinian, Bengal, Birman, Bombay, British Shorthair, Egyptian Mau, Maine Coon, Persian, Ragdoll, Russian Blue).
Data Split:
Images were strictly divided into Training (70%), Validation (15%), and Testing (15%) sets. A fixed random seed (42) was used to ensure reproducible splits and reliable baseline comparisons.
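The split above can be sketched as follows. This is a minimal illustration of a seeded 70/15/15 partition, not the repository's actual `generate_splits.py` logic; the function name and file list are hypothetical.

```python
# Hypothetical sketch of a reproducible 70/15/15 split with a fixed seed.
import random

def split_items(items, seed=42, train=0.70, val=0.15):
    rng = random.Random(seed)   # fixed seed -> identical shuffle every run
    items = sorted(items)       # deterministic starting order
    rng.shuffle(items)
    n_train = int(len(items) * train)
    n_val = int(len(items) * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_items([f"img_{i:03d}.jpg" for i in range(100)])
```

Sorting before shuffling matters: it makes the split independent of filesystem ordering, so the same seed yields the same partition on any machine.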
We utilized a ResNet18 architecture, chosen for its excellent balance between computational efficiency and accuracy.
- Transfer Learning: Initialized with ImageNet pre-trained weights.
- Feature Extractor: All convolutional backbone layers were frozen.
- Classifier: Only the final Fully Connected (FC) layer was re-trained to map outputs to our 10 target classes.
- Preprocessing & Augmentation: Images resized to 224×224, normalized using standard ImageNet parameters, and randomly flipped horizontally during training to prevent overfitting.
- Training Setup: Adam optimizer with Cross-Entropy loss. Hyperparameter search over learning rates (0.001, 0.0005), batch sizes (32, 64), and epochs (15, 30). Trained on an NVIDIA RTX 4060 GPU.
We implemented a classic Bag of Visual Words (BoW) pipeline from scratch.
- Feature Extraction: Extracted local scale-invariant features using the SIFT algorithm.
- Visual Vocabulary: Clustered training descriptors using K-Means to build a visual dictionary, exploring vocabulary sizes of $K \in \{50, 100, 200, 500\}$.
- Image Encoding: Mapped descriptors to visual words to generate normalized histogram representations for each image.
- Classification: Trained Support Vector Machines (SVM), comparing Linear and Radial Basis Function (RBF) kernels, with regularization parameter $C \in \{0.1, 1.0, 10.0\}$. Executed entirely on CPU.
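The encoding and classification steps above can be sketched with scikit-learn. This is a minimal illustration, not the repository's `train_bow.py`: synthetic 128-dimensional descriptors stand in for real SIFT output (which would come from OpenCV's `cv2.SIFT_create`), and `encode` is our own helper name.

```python
# Sketch: K-Means visual vocabulary + normalized histogram encoding + SVM.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(42)

def encode(descriptors, kmeans):
    """Map one image's descriptors to a normalized visual-word histogram."""
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / hist.sum()

K = 50                                    # smallest vocabulary size explored
train_desc = rng.normal(size=(500, 128))  # stand-in for stacked SIFT descriptors
kmeans = KMeans(n_clusters=K, n_init=10, random_state=42).fit(train_desc)

# encode a few synthetic "images" (each a bag of descriptors) and fit an SVM
X = np.array([encode(rng.normal(size=(40, 128)), kmeans) for _ in range(20)])
y = np.arange(20) % 2
svm = SVC(kernel="rbf", C=1.0).fit(X, y)
```

Normalizing each histogram makes images comparable regardless of how many keypoints SIFT detects in them, which is what lets a single SVM operate over the whole set.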
An extensive evaluation was conducted on the unseen Test set.
| Metric / Dimension | Deep Learning (ResNet18) | Traditional CV (BoW + SIFT) |
|---|---|---|
| Coarse Classification (Caltech101) | ~100% (near-perfect classification) | 85.3% (Best: K=500, RBF kernel) |
| Fine Classification (Oxford Pets) | ~92.7% | 41.7% |
| Computational Resources | Heavy; requires GPU (~166s training) | Lightweight; CPU sufficient (~30s training) |
| Interpretability | Low (Black box representations) | High (Explainable feature histograms) |
- Task Complexity: Traditional BoW pipelines perform reasonably well on coarse-grained tasks where distinct shapes and edge features (SIFT) are sufficient. However, they struggle significantly with fine-grained tasks (like pet breeds) where nuanced textures and hierarchical patterns are required.
- Feature Hierarchy: CNNs inherently learn deep, hierarchical spatial feature representations, driving their massive performance lead (92.7% vs 41.7%) on complex datasets.
- Trade-offs: While CNNs dominate in accuracy, they are opaque ("black-box") and hardware-intensive. Traditional methods remain highly interpretable, deterministic, and computationally lightweight.
.
├── code/
│ ├── generate_splits.py # Data splitting logic (70/15/15)
│ ├── generate_splits_and_folders.py # Prepares directory structures for PyTorch
│ ├── train_cnn.py # Script for training ResNet18
│ ├── train_bow.py # Script for SIFT extraction + K-Means + SVM
│ ├── test.py # Evaluation on test sets
│ └── final_plots.py # Generates loss curves and confusion matrices
├── data/
│ ├── raw/ # Original Caltech101 and Oxford-IIIT datasets
│ └── processed/ # Train/Val/Test subsets, kept strictly separate
├── docs/ # Project report & LaTeX source files
│ └── COMP64301_Assignment_v3.pdf # Final report PDF
└── results/
├── cnn_models/ # Saved PyTorch checkpoint weights (.pth)
├── figures/ # Confusion matrices and plots
├── cnn_results.csv # CNN test metrics
└── bow_results.csv # BoW test metrics