Qehbr/Fish-Classification-DS-Project

Fish Classification Data Science Project

Overview

This repository contains the code and documentation for the Fish Classification project, which classifies fish images with a deep learning approach. The project trains models to classify fish images into 9 classes and runs several experiments on top of them.

Files

All files are documented. Overview:

Classes

  • CNN_Fish - model for fish classification
  • FishDataset - dataset class to retrieve images

Model Train and Testing

  • train - train the model using K-Fold cross validation
  • utils - functions for reading the data and splitting it into train/test sets

Pretrained Models

  • model_architectures_main - main for using pretrained models as classifiers of images
  • model_architectures_train_test - train, test, and validate pretrained models
  • model_architectures_utils.py - getters for different models and classifiers

Classifiers with Feature Extractor models

  • feature_extractor_main - main for using pretrained models as feature extractors

Dataset

Link to Kaggle Dataset

Structure

The dataset comprises 9 classes of fish.

Original Images:

50 RGB images per class (except one class with 30 images), with varying image sizes.

Training and Evaluation

Initial Model

We initially trained on the original images using k-fold cross-validation (k=5). In each fold, the data was split 80/20 into a train-validation set and a test set. The best-performing fold achieved the following results:

  • Train Accuracy: 95.3%
  • Validation Accuracy: 66.7%
  • Test Accuracy: 54.7%
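
The splitting scheme described above can be sketched in plain Python. This is a minimal illustration, not the repository's `train` code: the function name `k_fold_splits`, the 20% validation share taken from the non-test data, and the fixed seed are all assumptions. The sample count of 430 follows from the dataset description (8 classes with 50 images plus 1 class with 30 images).

```python
import random


def k_fold_splits(n_samples, k=5, val_fraction=0.2, seed=0):
    """Return k (train, val, test) index triples.

    Each fold holds out 1/k of the data as the test set (20% for k=5).
    The remaining data is split again into validation and training parts;
    val_fraction is an assumed value, not taken from the project.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Partition the shuffled indices into k disjoint folds.
    folds = [indices[i::k] for i in range(k)]

    splits = []
    for i in range(k):
        test = folds[i]
        rest = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        n_val = int(len(rest) * val_fraction)
        val, train = rest[:n_val], rest[n_val:]
        splits.append((train, val, test))
    return splits


for fold, (train, val, test) in enumerate(k_fold_splits(430)):
    print(f"fold {fold}: train={len(train)} val={len(val)} test={len(test)}")
```

Every sample lands in exactly one test fold, so each image is used for testing once across the 5 folds.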

[Figures: examples of correctly classified, incorrectly classified, and uncertain images]

Improvements

To enhance model performance, we implemented the following changes:

  • Used augmented images (7 new rotated images for each image)
  • Hyperparameter tuning (learning rate, batch size, epochs)
  • Data preprocessing (image normalization)

Results for the best fold after improvements:

  • Train Accuracy: 97.9%
  • Validation Accuracy: 83.8%
  • Test Accuracy: 84.9%
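
The augmentation and normalization steps listed above can be illustrated with a small dependency-free sketch. The source only states that 7 rotated copies were created per image and that images were normalized. The evenly spaced rotation angles, the per-channel mean and standard deviation of 0.5, and the helper names `rotation_angles` and `normalize` are assumptions made for illustration.

```python
def rotation_angles(n_new=7):
    """Evenly spaced rotation angles in degrees, excluding 0 (the original).

    Even spacing is an assumption; the project may use different angles.
    """
    step = 360 / (n_new + 1)
    return [step * i for i in range(1, n_new + 1)]


def normalize(pixels, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)):
    """Scale 0-255 RGB pixels to [0, 1], then standardize each channel.

    The mean/std defaults are placeholder values, not the project's.
    """
    return [
        tuple((value / 255.0 - m) / s for value, m, s in zip(pixel, mean, std))
        for pixel in pixels
    ]


angles = rotation_angles()
# Each original image yields itself plus one rotated copy per angle.
images_per_original = 1 + len(angles)
print(angles)
print(normalize([(255, 0, 127.5)]))
```

With 7 rotated copies per image, every original contributes 8 images in total, so 430 originals would grow to 3440 images if all of them were augmented.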

[Figures: examples of correctly classified, incorrectly classified, and uncertain images]

Iterative Test-time Augmentation (ITA)

We experimented with ITA. It gave better results than the initial model but did not surpass the improved model without ITA (80.2% vs. 84.9% test accuracy):

  • Train Accuracy: 99.9%
  • Validation Accuracy: 81.2%
  • Test Accuracy: 80.2%
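
The source does not describe how ITA works internally, so the sketch below shows only the basic idea of test-time augmentation that it builds on: predict on several augmented views of one image and average the class probabilities. The iterative part is not reproduced. `tta_predict`, `predict_proba`, and `transforms` are hypothetical names, and any callable that returns a list of class probabilities can stand in for the model.

```python
def tta_predict(predict_proba, image, transforms):
    """Average class probabilities over the original image and its views.

    predict_proba: callable mapping an image to a list of probabilities.
    transforms: callables that each return an augmented copy of the image.
    Returns (predicted_class_index, averaged_probabilities).
    """
    views = [image] + [transform(image) for transform in transforms]
    probs = [predict_proba(view) for view in views]

    n_classes = len(probs[0])
    averaged = [
        sum(p[c] for p in probs) / len(probs) for c in range(n_classes)
    ]
    label = max(range(n_classes), key=averaged.__getitem__)
    return label, averaged
```

Averaging smooths out predictions that depend on the orientation of a single view. As the results above show, it does not guarantee higher accuracy.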

[Figures: examples of correctly and incorrectly classified images]

Class Addition

A new class, "Janitor Fish," was added. Results were slightly worse, but performance remained good:

  • Train Accuracy: 93.5%
  • Validation Accuracy: 80.5%
  • Test Accuracy: 79.2%

Model Comparison

We compared the performance of different deep learning models:

VGG19:

  • Validation Accuracy: 26.09%
  • Test Accuracy: 26.74%

ResNet18:

  • Validation Accuracy: 98.55%
  • Test Accuracy: 100%

DenseNet121:

  • Validation Accuracy: 98.55%
  • Test Accuracy: 100%

ResNet50:

  • Validation Accuracy: 98.55%
  • Test Accuracy: 100%

Classical ML Algorithms:

We used ResNet18 as a feature extractor model and applied Random Forest, SVM, and KNN classifiers:

Random Forest:

  • Validation Accuracy: 97.1%
  • Test Accuracy: 100%

SVM:

  • Validation Accuracy: 98.55%
  • Test Accuracy: 100%

KNN:

  • Validation Accuracy: 98.55%
  • Test Accuracy: 100%
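
To illustrate how a classical classifier works on extracted features, here is a dependency-free sketch of k-nearest-neighbours voting. It is not the project's code, which most likely relies on a library implementation. The function name `knn_predict`, the value k=3, and the toy 2-D feature vectors are assumptions. Real ResNet18 features would be much higher-dimensional vectors produced by the network with its classification layer removed.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest
    training feature vectors, using Euclidean distance."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy stand-ins for feature vectors from a pretrained extractor.
features = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels = ["class_a", "class_a", "class_a", "class_b", "class_b", "class_b"]
print(knn_predict(features, labels, (0.1, 0.1)))
```

When a pretrained extractor separates the classes well in feature space, even simple classifiers like this one can score highly, which is consistent with the results reported above.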

Fast Learning Experiment

We conducted experiments to assess how quickly each model learns the dataset. ResNet18 learned the fastest.
