# **Project Name**    -



##### **Project Type**    - Supervised Machine Learning – Image Classification using CNN and Transfer Learning
##### **Contribution**    - Individual


# **Project Summary -**

The Fish Species Classifier project leverages deep learning techniques to accurately classify images of fish into multiple species. By combining Convolutional Neural Networks (CNNs) with advanced transfer learning models such as VGG16, ResNet50, InceptionV3, MobileNet, and EfficientNetB0, the system learns to recognize complex visual patterns in fish images. The workflow includes comprehensive data preprocessing, augmentation, and model evaluation to ensure robustness and high accuracy.

To support user interaction, a Streamlit web application is developed, allowing real-time image upload and species prediction, along with confidence scores. This application bridges the gap between research and usability by delivering an intuitive interface for non-technical users. The project showcases the potential of computer vision in automating species identification for applications in marine research, fisheries management, and biodiversity monitoring. The entire solution—spanning data processing, model training, evaluation, and deployment—is implemented in Python using TensorFlow/Keras, demonstrating the effectiveness of AI in real-world image classification tasks.

# **GitHub Link -**

https://github.com/Aswani-2073

# **Problem Statement**


This project focuses on classifying fish images into multiple categories using deep learning models. The task involves training a CNN from scratch and leveraging transfer learning with pre-trained models to enhance performance. The project also includes saving models for later use and deploying a Streamlit application to predict fish categories from user-uploaded images.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. For each chart, make sure the following questions are answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 15 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





6. You may add more ml algorithms for model creation. Make sure for each and every algorithm, the following format should be answered.


*   Explain the ML model used and its performance using an Evaluation Metric Score Chart.


*   Cross-Validation & Hyperparameter Tuning

*   Have you seen any improvement? Note down the improvement with an updated Evaluation Metric Score Chart.

*   Explain what each evaluation metric indicates for the business and the business impact of the ML model used.




















# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import image_dataset_from_directory
from PIL import Image

### Dataset Loading

In [None]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("markdaniellampa/fish-dataset")

print("Path to dataset files:", path)

## ***2. Data Preprocessing***

In [None]:
import os

# List class folders (species)
class_names = sorted(os.listdir(path))
print("📦 Number of classes:", len(class_names))
print("📂 Class labels:", class_names)


In [None]:
class_counts = {}
total_images = 0

for cls in class_names:
    folder = os.path.join(path, cls)
    image_count = len(os.listdir(folder))
    class_counts[cls] = image_count
    total_images += image_count

print("\n📊 Images per class:")
for k, v in class_counts.items():
    print(f"{k}: {v} images")

print("\n🧮 Total images in dataset:", total_images)
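The counts above treat every directory entry as an image. A minimal sketch, assuming the images use common file extensions, of a stricter count that skips loose files and non-image entries. The `IMAGE_EXTENSIONS` set and the `count_images_per_class` helper are illustrative names, not part of the original notebook:

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the actual dataset
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images_per_class(root):
    """Return {class_folder_name: image_file_count} for each subfolder of root."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore loose files such as a README or metadata file
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts
```

It could be called as `class_counts = count_images_per_class(path)` in place of the loop above.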


In [None]:
empty_classes = [cls for cls in class_names if len(os.listdir(os.path.join(path, cls))) == 0]

if empty_classes:
    print("❌ Empty classes found:", empty_classes)
else:
    print("✅ No missing (empty) classes found.")


In [None]:
from PIL import Image

corrupt_count = 0
corrupt_files = []

for cls in class_names:
    cls_path = os.path.join(path, cls)
    for file in os.listdir(cls_path):
        img_path = os.path.join(cls_path, file)
        try:
            # verify() checks file integrity without decoding the full image
            with Image.open(img_path) as img:
                img.verify()
        except Exception:
            corrupt_count += 1
            corrupt_files.append(img_path)

print(f"\n🚫 Corrupted images found: {corrupt_count}")
if corrupt_count > 0:
    print("Examples:", corrupt_files[:3])


In [None]:
import pandas as pd

image_shapes = []

for cls in class_names:
    cls_path = os.path.join(path, cls)
    for file in os.listdir(cls_path):
        img_path = os.path.join(cls_path, file)
        try:
            with Image.open(img_path) as img:
                image_shapes.append(img.size)  # (width, height)
        except Exception:
            # Skip files that PIL cannot open
            continue

df_shapes = pd.DataFrame(image_shapes, columns=['Width', 'Height'])
print(df_shapes.describe())


In [None]:
import os
import numpy as np
import pandas as pd
from PIL import Image
import matplotlib.pyplot as plt
import seaborn as sns

# 🔹 Setup: Download Dataset
path = kagglehub.dataset_download("markdaniellampa/fish-dataset")
class_names = sorted(os.listdir(path))

# 🔹 Build Status Matrix: 1 = valid, 0 = corrupt
image_status_dict = {}
max_samples = 100  # Adjust if needed

for cls in class_names:
    cls_path = os.path.join(path, cls)
    status_list = []

    for i, file in enumerate(os.listdir(cls_path)):
        if i >= max_samples:
            break
        file_path = os.path.join(cls_path, file)
        try:
            with Image.open(file_path) as img:
                img.verify()
            status_list.append(1)
        except Exception:
            status_list.append(0)

    while len(status_list) < max_samples:
        status_list.append(np.nan)

    image_status_dict[cls] = status_list

df_missing = pd.DataFrame(image_status_dict)
df_missing.index.name = "Image Index"


In [None]:
plt.style.use('dark_background')  # Entire plot black

plt.figure(figsize=(12, 6))
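sns.heatmap(
    df_missing,
    cmap='binary',           # 0 (corrupt) = white, 1 (valid) = black
    vmin=0, vmax=1,          # fix the color scale even when every value is 1
    cbar=False,
    linecolor='white',
    linewidths=0.05
)

# Padding cells (NaN) are left undrawn and show the plot background
plt.title("Image Status Matrix (White = Corrupt, Black = Valid)", fontsize=14)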
plt.xlabel("Fish Classes")
plt.ylabel("Image Index")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


### What do you know about your dataset?

The dataset consists of labeled images of various fish species, organized in a folder-based structure where each folder represents one class. The images were downloaded from Kaggle using `kagglehub` and are intended for supervised image classification. Exploring the dataset showed that:

1. The dataset is already organized with one folder per class, which makes it compatible with Keras' `flow_from_directory()` method.

2. The images vary in size, so they need to be resized to a standard input shape (e.g., 224x224 pixels) before being fed into CNN or transfer learning models.

3. Because the images come from different sources and lighting conditions, data augmentation such as rotation, flipping, and zooming can improve model generalization.

4. Pixel values need to be rescaled from [0, 255] to [0, 1] for numerical stability during training.

Overall, provided the checks above report no corrupted files and similar image counts per class, the dataset is suitable for deep learning-based classification using both custom CNNs and pre-trained architectures like VGG16, ResNet50, and EfficientNetB0.
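Whether the classes are balanced can be checked directly from the per-class counts. The following is a rough sketch; the helper names and the `max_ratio=1.5` threshold are assumptions chosen for illustration, not values taken from the notebook:

```python
def imbalance_ratio(class_counts):
    """Return the largest class size divided by the smallest non-empty class size."""
    sizes = [n for n in class_counts.values() if n > 0]
    if not sizes:
        raise ValueError("no non-empty classes")
    return max(sizes) / min(sizes)


def is_roughly_balanced(class_counts, max_ratio=1.5):
    """Treat the dataset as balanced when the size ratio stays under max_ratio (assumed cutoff)."""
    return imbalance_ratio(class_counts) <= max_ratio
```

With the `class_counts` dictionary built earlier, `imbalance_ratio(class_counts)` gives a single number to report; a ratio well above 1 suggests considering class weights or resampling.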

 **Data Wrangling for Image Classification** (Resize + Normalize Images During Load)

In [None]:
import os

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from PIL import Image
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Collect paths of files that PIL cannot verify as valid images
corrupt_images = []
for cls in os.listdir(path):
    cls_path = os.path.join(path, cls)
    for file in os.listdir(cls_path):
        img_path = os.path.join(cls_path, file)
        try:
            with Image.open(img_path) as img:
                img.verify()
        except Exception:
            corrupt_images.append(img_path)

print(f"Found {len(corrupt_images)} corrupted images.")

# Count the files in each class folder
class_counts = {
    cls: len(os.listdir(os.path.join(path, cls)))
    for cls in os.listdir(path)
}

df_class_counts = pd.DataFrame.from_dict(class_counts, orient='index', columns=['ImageCount'])
print(df_class_counts)

IMAGE_SIZE = (224, 224)
BATCH_SIZE = 32

datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2
)

train_gen = datagen.flow_from_directory(
    path,
    target_size=IMAGE_SIZE,
    batch_size=BATCH_SIZE,
    class_mode='categorical',
    subset='training'
)

val_gen = datagen.flow_from_directory(
    path,
    target_size=IMAGE_SIZE,
    batch_size=BATCH_SIZE,
    class_mode='categorical',
    subset='validation'
)


x_batch, y_batch = next(train_gen)

plt.figure(figsize=(10,6))
for i in range(6):
    plt.subplot(2,3,i+1)
    plt.imshow(x_batch[i])
    plt.axis('off')
    plt.title(f"Label: {np.argmax(y_batch[i])}")
plt.tight_layout()
plt.show()


What We Did in Preprocessing & Data Wrangling:
* Loaded the fish image dataset from Kaggle using kagglehub, organized by class folders.

* Scanned all image files and counted corrupted or unreadable images.

* Rescaled pixel values to the range [0, 1] for model compatibility.

* Resized all images to a uniform shape (224x224) for consistent input.

* Split the dataset into training (80%) and validation (20%) sets with `validation_split=0.2`.

* Automatically assigned class labels based on folder names.

* Visualized a batch of sample images to confirm that the data loads correctly.

* Wrapped the data using ImageDataGenerator for normalization, batching, and efficient feeding into the model.

Note that the generator in this cell only rescales pixel values. Real-time augmentation (rotation, zoom, flip, shift, and shear) is not enabled here and would need to be added to the training generator.
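One way the augmentation step could be wired in is sketched below, using a separate generator for validation so that only training images are transformed. The specific ranges in `AUGMENTATION_KWARGS` and the `make_generators` helper are illustrative assumptions, not settings taken from this notebook. Both generators use the same `validation_split`; since Keras derives the split from file order rather than a random draw, the two subsets should not overlap.

```python
# Augmentation settings for the training generator only (values are illustrative)
AUGMENTATION_KWARGS = dict(
    rotation_range=20,
    zoom_range=0.2,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    horizontal_flip=True,
)


def make_generators(data_dir, image_size=(224, 224), batch_size=32, val_split=0.2):
    """Build an augmented training generator and a plain validation generator."""
    # Imported lazily so the settings above can be inspected without TensorFlow
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    train_datagen = ImageDataGenerator(
        rescale=1.0 / 255, validation_split=val_split, **AUGMENTATION_KWARGS
    )
    # Validation images are only rescaled, never augmented
    val_datagen = ImageDataGenerator(rescale=1.0 / 255, validation_split=val_split)

    common = dict(target_size=image_size, batch_size=batch_size, class_mode="categorical")
    train_gen = train_datagen.flow_from_directory(data_dir, subset="training", **common)
    val_gen = val_datagen.flow_from_directory(data_dir, subset="validation", **common)
    return train_gen, val_gen
```

A call such as `train_gen, val_gen = make_generators(path)` would then replace the two `flow_from_directory` calls in the cell above.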






## ***3. Model Training***

**1. Train a CNN Model from Scratch**

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential([
    Conv2D(32, (3,3), activation='relu', input_shape=(224,224,3)),
    MaxPooling2D(),
    Conv2D(64, (3,3), activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dropout(0.5),
    Dense(128, activation='relu'),
    Dense(train_gen.num_classes, activation='softmax')
])

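model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_gen, validation_data=val_gen, epochs=3)

# Save the trained CNN so it can be reloaded for prediction and deployment
model.save("fish_model.h5")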


We built a CNN model for Fish Image Classification.

Steps followed:

* Loaded and preprocessed fish images with resizing and normalization.

* Built a custom Convolutional Neural Network from scratch.

* Trained the model. The cell above uses `epochs=3`, while the reported result of 100% training and validation accuracy refers to a 10-epoch run.

* Saved the trained model for future predictions and deployment.
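To see where most of this CNN's parameters sit, the layer shapes can be worked out by hand. The sketch below assumes the Keras defaults used in the cell above: 3x3 convolutions with `'valid'` padding and stride 1, and 2x2 max pooling. The helper names are illustrative.

```python
def conv_out(size, kernel=3):
    """Output size of a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output size of max pooling with pool size and stride 2."""
    return size // pool


size = 224
size = pool_out(conv_out(size))  # Conv2D(32): 222, MaxPooling2D: 111
size = pool_out(conv_out(size))  # Conv2D(64): 109, MaxPooling2D: 54

flat_features = size * size * 64          # Flatten output length
conv1_params = 3 * 3 * 3 * 32 + 32        # weights + biases of the first conv layer
conv2_params = 3 * 3 * 32 * 64 + 64       # weights + biases of the second conv layer
dense_params = flat_features * 128 + 128  # weights + biases of Dense(128)

print(size, flat_features, dense_params)  # prints: 54 186624 23888000
```

Almost all parameters come from the `Dense(128)` layer that follows `Flatten`. Swapping `Flatten` for `GlobalAveragePooling2D`, as the transfer learning models in this notebook do, would reduce its input to 64 features.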

**2. Experiment with Five Pre-Trained Models (Transfer Learning)**

In [None]:
import tensorflow as tf
from tensorflow.keras.applications import VGG16, ResNet50, MobileNet, InceptionV3, EfficientNetB0
from tensorflow.keras.models import Model
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, Dropout
from tensorflow.keras.optimizers import Adam
import matplotlib.pyplot as plt

IMG_SHAPE = (224, 224, 3)
EPOCHS = 1

pretrained_models = {
    'VGG16': VGG16,
    'ResNet50': ResNet50,
    'MobileNet': MobileNet,
    'InceptionV3': InceptionV3,
    'EfficientNetB0': EfficientNetB0
}

results = []

for name, model_class in pretrained_models.items():
    print(f"\n🚀 Training with {name}...\n")

    try:
        base_model = model_class(weights='imagenet', include_top=False, input_shape=IMG_SHAPE)
    except Exception as e:
        print(f"❌ Skipping {name} due to shape mismatch or download issue: {e}")
        continue

    base_model.trainable = False  # Freeze base

    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(128, activation='relu')(x)
    x = Dropout(0.3)(x)
    predictions = Dense(train_gen.num_classes, activation='softmax')(x)

    model = Model(inputs=base_model.input, outputs=predictions)
    model.compile(optimizer=Adam(), loss='categorical_crossentropy', metrics=['accuracy'])

    history = model.fit(train_gen, validation_data=val_gen, epochs=EPOCHS, verbose=1)

    val_acc = history.history['val_accuracy'][-1]
    results.append((name, val_acc))

    # Save each model
    model.save(f"{name}_fish_model.h5")

    # Plot accuracy (the marker keeps single-epoch runs visible)
    plt.plot(history.history['val_accuracy'], marker='o', label=f'{name}')

plt.title('Validation Accuracy per Model')
plt.xlabel('Epoch')
plt.ylabel('Val Accuracy')
plt.legend()
plt.show()

# Print final summary
print("\n📊 Validation Accuracy Summary:")
for name, acc in results:
    print(f"{name}: {acc:.4f}")
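The loop above feeds every backbone the same `[0, 1]`-rescaled images, whereas each Keras application ships its own `preprocess_input` function (EfficientNet, for example, expects raw `[0, 255]` pixels because it rescales internally). A minimal sketch of looking up the matching function per model follows; the `PREPROCESS_MODULES` mapping and `get_preprocess_fn` helper are illustrative names, not part of the original notebook.

```python
# Maps the model names used in this notebook to their keras.applications submodule
PREPROCESS_MODULES = {
    "VGG16": "vgg16",
    "ResNet50": "resnet50",
    "MobileNet": "mobilenet",
    "InceptionV3": "inception_v3",
    "EfficientNetB0": "efficientnet",
}


def get_preprocess_fn(name):
    """Return the preprocess_input function that matches the given backbone."""
    # Imported lazily so the mapping can be used without loading TensorFlow
    import tensorflow as tf

    module = getattr(tf.keras.applications, PREPROCESS_MODULES[name])
    return module.preprocess_input
```

The returned function could be passed as `preprocessing_function=get_preprocess_fn(name)` to an `ImageDataGenerator` created without `rescale`, so that each backbone sees inputs in the range it was pre-trained on.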



**3. Fine-Tune the Pre-Trained Models**

After initial training, unfreeze some layers for fine-tuning:


In [None]:
from tensorflow.keras.applications import EfficientNetB0
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Dropout
from tensorflow.keras.optimizers import Adam

# Load the base model without the top layer
base_model = EfficientNetB0(include_top=False, input_shape=(224, 224, 3), weights='imagenet')

# Add custom layers; the output size follows the number of classes in the generator
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dropout(0.5)(x)
output = Dense(train_gen.num_classes, activation='softmax')(x)

# Create the full model
model = Model(inputs=base_model.input, outputs=output)


In [None]:
# Freeze the first 100 layers; later layers stay trainable for fine-tuning
for layer in base_model.layers[:100]:
    layer.trainable = False
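Freezing layers only takes effect once the model is compiled again, and fine-tuning is usually run with a small learning rate. The sketch below shows one possible shape for those two steps; the helper names and the `1e-5` learning rate are assumptions for illustration rather than settings from this notebook.

```python
def freeze_first_n(layers, n):
    """Freeze the first n layers, leave the rest trainable, and return the trainable count."""
    for i, layer in enumerate(layers):
        layer.trainable = i >= n
    return sum(1 for layer in layers if layer.trainable)


def compile_for_fine_tuning(model, learning_rate=1e-5):
    """Recompile after changing `trainable` so the change is picked up by training."""
    # Imported lazily so freeze_first_n can be used without TensorFlow
    from tensorflow.keras.optimizers import Adam

    model.compile(
        optimizer=Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

In this notebook the calls would look like `freeze_first_n(base_model.layers, 100)` followed by `compile_for_fine_tuning(model)` and then `model.fit(train_gen, validation_data=val_gen, ...)`.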

**4.  Save the Best Model**

Save the trained model (with highest validation accuracy):

In [None]:
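# This saves the most recently built model. To keep the model with the highest
# validation accuracy, save the corresponding model from the comparison above.
model.save("best_fish_model.h5")

# joblib/pickle is meant for scikit-learn estimators and is not typical for CNNs
# import joblib
# joblib.dump(model, "best_fish_model.pkl")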
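Selecting the model with the highest validation accuracy can be done from the `results` list of `(name, val_acc)` tuples collected during the transfer learning loop. The sketch below assumes the per-model files follow the `{name}_fish_model.h5` pattern used earlier; `pick_best` and `copy_best_model` are illustrative helper names.

```python
import shutil
from pathlib import Path


def pick_best(results):
    """Return the (name, val_acc) pair with the highest validation accuracy."""
    if not results:
        raise ValueError("results is empty")
    return max(results, key=lambda item: item[1])


def copy_best_model(results, model_dir=".", target="best_fish_model.h5"):
    """Copy the best model's saved file to `target` and return (name, val_acc)."""
    best_name, best_acc = pick_best(results)
    source = Path(model_dir) / f"{best_name}_fish_model.h5"
    shutil.copyfile(source, Path(model_dir) / target)
    return best_name, best_acc
```

Copying the already saved file avoids keeping every trained model in memory just to save the winner.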


## ***4. Model Evaluation***

 **1️⃣  Load Saved Models**

In [None]:
from tensorflow.keras.models import load_model
from sklearn.metrics import classification_report, confusion_matrix
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Use validation generator without shuffle for consistent predictions
val_gen_noshuffle = datagen.flow_from_directory(
    path,
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical',
    subset='validation',
    shuffle=False
)

y_true = val_gen_noshuffle.classes
class_labels = list(val_gen_noshuffle.class_indices.keys())


**2️⃣  Evaluate Each Model and Generate Metrics**

In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

models_to_evaluate = ["VGG16", "ResNet50", "MobileNet", "InceptionV3", "EfficientNetB0"]

for name in models_to_evaluate:
    print(f"\n📊 Evaluating {name} model...\n")
    model = load_model(f"{name}_fish_model.h5")

    y_pred_probs = model.predict(val_gen_noshuffle)
    y_pred = np.argmax(y_pred_probs, axis=1)

    acc = accuracy_score(y_true, y_pred)
    prec = precision_score(y_true, y_pred, average='macro')
    rec = recall_score(y_true, y_pred, average='macro')
    f1 = f1_score(y_true, y_pred, average='macro')

    print(f"Accuracy   : {acc:.4f}")
    print(f"Precision  : {prec:.4f}")
    print(f"Recall     : {rec:.4f}")
    print(f"F1-Score   : {f1:.4f}")
    print("\nClassification Report:")
    print(classification_report(y_true, y_pred, target_names=class_labels))

    cm = confusion_matrix(y_true, y_pred)
    plt.figure(figsize=(6,5))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=class_labels, yticklabels=class_labels)
    plt.title(f"{name} - Confusion Matrix")
    plt.xlabel("Predicted")
    plt.ylabel("Actual")
    plt.show()
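The macro-averaged scores printed above can be traced back to the confusion matrix: precision and recall are computed per class and then averaged with equal weight per class. The sketch below does this in plain Python for a small, made-up 3-class matrix (rows are actual classes, columns are predictions); the numbers are purely illustrative and are not results from this project.

```python
def macro_metrics(cm):
    """Compute accuracy and macro precision/recall/F1 from a square confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(n))  # column sum: predicted as class i
        actual = sum(cm[i])                          # row sum: truly class i
        p = tp / predicted if predicted else 0.0     # undefined cases are treated as 0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical confusion matrix used only to illustrate the arithmetic
example_cm = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
scores = macro_metrics(example_cm)
```

Because every class counts equally in a macro average, a rare species that is often misclassified lowers these scores more than it lowers plain accuracy.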


**3️⃣ Visualize Training History (Accuracy & Loss)**

In [None]:
history_dict = {}

for name, model_class in pretrained_models.items():
    print(f"\n🚀 Training with {name}...\n")

    try:
        base_model = model_class(weights='imagenet', include_top=False, input_shape=IMG_SHAPE)
    except Exception as e:
        print(f"❌ Skipping {name} due to shape mismatch or download issue: {e}")
        continue

    base_model.trainable = False

    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(128, activation='relu')(x)
    x = Dropout(0.3)(x)
    predictions = Dense(train_gen.num_classes, activation='softmax')(x)

    model = Model(inputs=base_model.input, outputs=predictions)
    model.compile(optimizer=Adam(), loss='categorical_crossentropy', metrics=['accuracy'])

    history = model.fit(train_gen, validation_data=val_gen, epochs=1, verbose=1)

    history_dict[name] = history

    model.save(f"{name}_fish_model.h5")


In [None]:
import matplotlib.pyplot as plt

# Visualize training history for each model (history_dict is filled by the training cell above)
for name, history in history_dict.items():
    print(f"\n📊 Visualizing training history for {name}...\n")

    # Extract metrics from history
    acc = history.history['accuracy']
    val_acc = history.history['val_accuracy']
    loss = history.history['loss']
    val_loss = history.history['val_loss']

    epochs = range(1, len(acc) + 1)  # With a single epoch this is just [1]

    # Plot accuracy (markers keep single-epoch runs visible)
    plt.figure(figsize=(12, 4))

    plt.subplot(1, 2, 1)
    plt.plot(epochs, acc, 'bo-', label='Training Accuracy')
    plt.plot(epochs, val_acc, 'ro-', label='Validation Accuracy')
    plt.title(f'{name} - Model Accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.legend()
    plt.grid(True)

    # Plot loss
    plt.subplot(1, 2, 2)
    plt.plot(epochs, loss, 'bo-', label='Training Loss')
    plt.plot(epochs, val_loss, 'ro-', label='Validation Loss')
    plt.title(f'{name} - Model Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()
    plt.grid(True)

    plt.tight_layout()
    plt.show()

In [None]:
from google.colab import files

# Download the best model (replace with your model filename if different)
files.download('best_fish_model.h5')

# If you have multiple, download them one by one, e.g.:
# files.download('EfficientNetB0_fish_model.h5')
# files.download('ResNet50_fish_model.h5')

In [None]:
from google.colab import files
files.download('EfficientNetB0_fish_model.h5')
files.download('InceptionV3_fish_model.h5')
files.download('MobileNet_fish_model.h5')
files.download('/content/ResNet50_fish_model.h5')

In [None]:
from google.colab import files
files.download('/content/VGG16_fish_model.h5')
files.download('/content/best_fish_model.h5')
files.download('/content/fish_model.h5')



In [None]:
# Save the current model in HDF5 format
model.save("best_fish_model.h5")



In [None]:
from google.colab import files

# Save in the native Keras format and download the file from Colab
model.save("best_fish_model.keras")
files.download("best_fish_model.keras")


## ***5. Conclusion***


This project, Multiclass Fish Image Classification, classifies fish images into multiple categories using deep learning. It draws on Python, TensorFlow/Keras, Streamlit, data preprocessing, transfer learning, model evaluation, visualization, and model deployment within the image classification domain.

The work builds and evaluates a custom CNN alongside several pre-trained architectures (VGG16, ResNet50, MobileNet, InceptionV3, and EfficientNetB0) adapted to the fish dataset. The dataset, organized as species-specific folders, is preprocessed by resizing images to 224x224 and rescaling pixel values to the [0, 1] range; transformations such as rotation, zoom, and flipping can be added for improved robustness. Models are trained, evaluated with accuracy, precision, recall, F1-score, and a confusion matrix, and their performance is visualized through accuracy and loss plots.

The best-performing model is saved in .h5 or .keras format and deployed via a Streamlit application that lets users upload fish images and receive real-time predictions with confidence scores. The project's business use cases include improved classification accuracy, a deployment-ready interactive web tool, and model comparison to select the most suitable approach. Deliverables include the trained models, the Streamlit app, Python scripts for training and deployment, a model comparison report, and a GitHub repository with well-documented code and a detailed README, all developed under clear coding standards and with proper data validation.
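The Streamlit application itself is not shown in this notebook. As a rough sketch of what its prediction page might look like, the code below loads the saved model, preprocesses an uploaded image the same way as the training generator, and lists the most likely species. `CLASS_NAMES_PATH` points to a hypothetical JSON file holding the class names in training order, and `top_k` and `main` are illustrative names.

```python
import json

MODEL_PATH = "best_fish_model.h5"      # file saved earlier in this notebook
CLASS_NAMES_PATH = "class_names.json"  # hypothetical file holding the label list
IMAGE_SIZE = (224, 224)


def top_k(probabilities, class_names, k=3):
    """Return the k most likely (class_name, probability) pairs, best first."""
    pairs = sorted(zip(class_names, probabilities), key=lambda p: p[1], reverse=True)
    return [(name, float(prob)) for name, prob in pairs[:k]]


def main():
    # Heavy dependencies are imported here so top_k stays usable on its own
    import numpy as np
    import streamlit as st
    from PIL import Image
    from tensorflow.keras.models import load_model

    st.title("Fish Species Classifier")
    model = load_model(MODEL_PATH)
    with open(CLASS_NAMES_PATH) as f:
        class_names = json.load(f)

    uploaded = st.file_uploader("Upload a fish image", type=["jpg", "jpeg", "png"])
    if uploaded is not None:
        image = Image.open(uploaded).convert("RGB").resize(IMAGE_SIZE)
        st.image(image, caption="Uploaded image")
        # Same [0, 1] rescaling as the training generator
        batch = np.asarray(image, dtype="float32")[None, ...] / 255.0
        probabilities = model.predict(batch)[0]
        for name, prob in top_k(probabilities, class_names):
            st.write(f"{name}: {prob:.2%}")
```

Saved as a script (for example `app.py`) with a call to `main()` at the bottom, this could be started with `streamlit run app.py`. The class name list would need to match the order of `train_gen.class_indices`.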