<a href="https://colab.research.google.com/github/blgayatri/DS_Projects/blob/main/Multiclass_Fish_Image_Classification_ML_Submission.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Multiclass Fish Image Classification



##### **Project Type**    - Deep Learning - Image Classification
##### **Contribution**    - Individual
##### **Team Member 1 -** Lakshmi Gayatri Balivada

# **Project Summary -**

The objective of this project was to develop a deep learning model capable of classifying images of fish into multiple species using Convolutional Neural Networks (CNNs). The dataset used contained 6,225 images belonging to 11 distinct classes. The images varied in resolution and were resized to 224×224 pixels for uniform processing. The dataset was split into training, validation, and testing sets to evaluate the model’s performance effectively.

To begin, Google Drive was mounted in Google Colab and the dataset archive was extracted. Preprocessing rescaled pixel values to the range 0 to 1, and augmentation (rotation, width and height shifts, shear, zoom, and horizontal flipping) was applied to the training images to improve generalization and reduce overfitting.

A CNN was implemented with three convolutional layers, each followed by max pooling, then a fully connected layer with dropout and a softmax output layer for multiclass classification. The model was trained for 10 epochs using the Adam optimizer and the categorical cross-entropy loss. Training progress was monitored with accuracy and loss plots, and the final performance was evaluated on the validation and test datasets.

The results demonstrated that the model achieved promising accuracy on unseen test data, showing a good ability to differentiate between fish species. A confusion matrix and classification report provided insights into class-wise performance. Although the model performed well, there is potential for further improvement using advanced architectures like ResNet50, EfficientNet, or MobileNetV2, as well as hyperparameter tuning and dataset expansion.

In conclusion, this project highlights the effectiveness of CNN-based deep learning in solving multiclass image classification problems. With additional optimization, such a system could be highly valuable in applications such as fisheries management, marine biodiversity studies, and automated ecological monitoring.

# **GitHub Link -**

# **Problem Statement**


The goal of this project is to develop a machine learning model capable of accurately classifying images of different fish species. With increasing concerns about marine biodiversity and the need for efficient monitoring of aquatic ecosystems, automated fish species identification can greatly assist researchers, conservationists, and fisheries management. This project aims to use deep learning techniques, particularly convolutional neural networks (CNNs), to build a robust classifier that can distinguish between multiple fish classes from input images. The model’s performance will be evaluated using standard metrics such as accuracy, precision, recall, and F1-score. Ultimately, the classifier will be deployed in a user-friendly interface to facilitate practical usage.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following questions are answered for each chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reasons.

5. You have to create at least 15 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





6. You may add more ML algorithms for model creation. Make sure the following points are addressed for each algorithm.


*   Explain the ML model used and its performance using an evaluation metric score chart.


*   Cross-Validation & Hyperparameter Tuning

*   Have you seen any improvement? Note down the improvement with an updated evaluation metric score chart.

*   Explain what each evaluation metric indicates for the business and the business impact of the ML model used.























# ***1. Setup & Project Initialization***

***Install Required Libraries (run in a notebook cell)***

### Install Libraries

In [None]:
!pip install -q tensorflow

### 2. Mount Google Drive & Extract Dataset

In [None]:
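from google.colab import files

# Optional: upload files from the local machine through the browser.
# Skip this cell if the dataset archive is already in Google Drive.
uploaded = files.upload()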

In [None]:
from google.colab import drive
import os, zipfile

# 1️⃣ Mount Google Drive
drive.mount('/content/drive')

# 2️⃣ Go to your folder and check the exact filename
folder_path = "/content/drive/My Drive/Data Science/Projects/Multiclass Fish Image Classification"
print(os.listdir(folder_path))  # Look for the .zip file name here

# 3️⃣ Once you see the exact name, update this line:
zip_path = os.path.join(folder_path, "Dataset.zip")
extract_path = "/content/drive/My Drive/Data Science/Projects/Multiclass Fish Image Classification/images.cv_jzk6llhf18tm3k0kyttxz"

# 4️⃣ Extract the file
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(extract_path)

print("Files extracted to:", extract_path)
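Re-running the extraction cell unpacks the whole archive again in every session. A minimal sketch of a variant that skips the work when it was already done is shown here. The helper name `extract_if_needed` and the `.extracted` marker file are assumptions made for illustration; they are not part of the original notebook or of the `zipfile` module.

```python
import zipfile
from pathlib import Path


def extract_if_needed(zip_path, dest_dir):
    """Extract zip_path into dest_dir unless a previous run already did so.

    Returns True if extraction happened, False if it was skipped.
    """
    dest = Path(dest_dir)
    marker = dest / ".extracted"  # hypothetical marker file written after success
    if marker.exists():
        return False
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path, "r") as zf:
        zf.extractall(dest)
    marker.touch()
    return True
```

With the variables from the cell above, the call would look like `extract_if_needed(zip_path, extract_path)`.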

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import os

# ✅ Correct base path
base_path = "/content/drive/My Drive/Data Science/Projects/Multiclass Fish Image Classification/images.cv_jzk6llhf18tm3k0kyttxz/data"

train_dir = os.path.join(base_path, "train")
val_dir = os.path.join(base_path, "val")
test_dir = os.path.join(base_path, "test")

# ✅ Image size and batch
img_size = (224, 224)
batch_size = 32

# ✅ Data augmentation for training
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True
)

# ✅ No augmentation for validation/test
val_test_datagen = ImageDataGenerator(rescale=1./255)

# ✅ Data generators
train_gen = train_datagen.flow_from_directory(
    train_dir,
    target_size=img_size,
    batch_size=batch_size,
    class_mode='categorical'
)

val_gen = val_test_datagen.flow_from_directory(
    val_dir,
    target_size=img_size,
    batch_size=batch_size,
    class_mode='categorical'
)

test_gen = val_test_datagen.flow_from_directory(
    test_dir,
    target_size=img_size,
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=False
)

# ✅ Class names
class_names = list(train_gen.class_indices.keys())
print("Classes:", class_names)
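To make two of the generator settings concrete, the following dependency-free sketch mimics what `rescale=1./255` and `horizontal_flip=True` do to a single image. It uses nested Python lists instead of real image arrays, and the function names and the toy image are invented for illustration; Keras applies the flip randomly, whereas this sketch always flips.

```python
def rescale(image, factor=1.0 / 255):
    """Multiply every channel value by factor, as rescale=1./255 does."""
    return [[[value * factor for value in pixel] for pixel in row] for row in image]


def horizontal_flip(image):
    """Mirror the image left to right by reversing each row of pixels."""
    return [list(reversed(row)) for row in image]


# A made-up 1x2 RGB image (height 1, width 2) with 8-bit channel values.
toy_image = [[[255, 0, 0], [0, 0, 255]]]

scaled = rescale(toy_image)        # channel values now lie between 0 and 1
flipped = horizontal_flip(scaled)  # the red and blue pixels swap places
print(scaled)
print(flipped)
```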

In [None]:
import os

# Base dataset folder
base_path = "/content/drive/My Drive/Data Science/Projects/Multiclass Fish Image Classification/images.cv_jzk6llhf18tm3k0kyttxz/data"

# Define paths for train, val, and test
train_dir = os.path.join(base_path, "train")
val_dir = os.path.join(base_path, "val")
test_dir = os.path.join(base_path, "test")

print("Train dir exists:", os.path.exists(train_dir))
print("Val dir exists:", os.path.exists(val_dir))
print("Test dir exists:", os.path.exists(test_dir))
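Beyond checking that the folders exist, it can help to count the images per class, since class imbalance is listed among the challenges in the conclusion. The sketch below assumes the one-subfolder-per-class layout that `flow_from_directory` reads and counts only files directly inside each class folder; the helper name and the set of file extensions are assumptions.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed image extensions


def count_images_per_class(split_dir):
    """Return {class_name: image_count} for one split directory."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts
```

For example, `count_images_per_class(train_dir)` would return a dictionary that can be printed or plotted as a bar chart to spot under-represented classes.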


### 3. CNN Model



In [None]:
import tensorflow as tf
from tensorflow.keras import layers, models

# Input shape & number of classes
input_shape = (224, 224, 3)
num_classes = len(class_names)

# Build the CNN model: three Conv2D + MaxPooling2D blocks, then a dense classifier head
model = models.Sequential([
    layers.Input(shape=input_shape),
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),

    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),

    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),

    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(num_classes, activation='softmax')
])

# Compile the model
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Summary
model.summary()

# Train the model
history = model.fit(
    train_gen,
    validation_data=val_gen,
    epochs=10
)
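The numbers reported by `model.summary()` can be reproduced by hand. The sketch below walks through the three convolution and pooling blocks using the Keras defaults (stride-1 convolutions with `'valid'` padding and non-overlapping 2×2 pooling) and adds up the trainable parameters. It assumes 11 output classes, as stated in the project summary; the helper names are made up for this illustration.

```python
def conv_valid(size, kernel=3):
    """Output width/height of a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1


def max_pool(size, pool=2):
    """Output width/height of a non-overlapping max-pool (floor division)."""
    return size // pool


def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


size, channels = 224, 3
total_params = 0
for filters in (32, 64, 128):
    total_params += conv_params(3, channels, filters)
    size = max_pool(conv_valid(size))  # 224 -> 111 -> 54 -> 26
    channels = filters

flat_features = size * size * channels
assumed_num_classes = 11  # assumption: the 11 classes from the project summary
total_params += flat_features * 128 + 128                      # Dense(128)
total_params += 128 * assumed_num_classes + assumed_num_classes  # softmax layer

print(size, flat_features, total_params)  # 26 86528 11170379
```

Almost all of the parameters sit in the first dense layer (86,528 × 128 weights), which is one reason the flattened feature map is a common place to look when reducing model size.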

### 4. Model Evaluation

In [None]:
import matplotlib.pyplot as plt

# Evaluate on test set
test_loss, test_acc = model.evaluate(test_gen)
print(f"Test Accuracy: {test_acc:.4f}")
print(f"Test Loss: {test_loss:.4f}")

# Plot accuracy & loss curves
plt.figure(figsize=(12, 4))

# Accuracy
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Accuracy over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

# Loss
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Loss over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.show()
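The curves can also be summarized numerically. As a rough sketch, the helper below takes a dictionary shaped like `history.history` and reports the epoch with the best validation accuracy and the final gap between training and validation accuracy, where a large gap hints at overfitting. The function name is an assumption, and the numbers in `toy_history` are made up for illustration; they are not results from this notebook.

```python
def summarize_history(history_dict):
    """Report the best validation epoch and the final train/validation gap."""
    val_acc = history_dict["val_accuracy"]
    train_acc = history_dict["accuracy"]
    best_index = max(range(len(val_acc)), key=lambda i: val_acc[i])
    return {
        "best_epoch": best_index + 1,  # 1-based, matching the Keras epoch log
        "best_val_accuracy": val_acc[best_index],
        "final_gap": train_acc[-1] - val_acc[-1],
    }


# Made-up numbers purely for illustration, not results from this notebook.
toy_history = {
    "accuracy": [0.50, 0.70, 0.85, 0.95],
    "val_accuracy": [0.48, 0.66, 0.74, 0.72],
}
print(summarize_history(toy_history))
```

In the notebook, the call would be `summarize_history(history.history)`.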

### 5. Test Set Predictions, Classification Report & Confusion Matrix

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Get predictions on test data
pred_probs = model.predict(test_gen)
pred_classes = np.argmax(pred_probs, axis=1)

# True labels
true_classes = test_gen.classes
class_labels = list(test_gen.class_indices.keys())

# Plot the first 9 images of the first test batch with predicted and true labels.
# test_gen uses shuffle=False, so image j of batch 0 corresponds to pred_classes[j].
images, batch_labels = test_gen[0]
num_to_show = min(9, len(images))

plt.figure(figsize=(12, 8))
for j in range(num_to_show):
    plt.subplot(3, 3, j + 1)
    plt.imshow(images[j])
    plt.axis('off')
    plt.title(f"Pred: {class_labels[pred_classes[j]]}\nTrue: {class_labels[batch_labels[j].argmax()]}")
plt.tight_layout()
plt.show()

In [None]:
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Generate predictions on the test set (test_gen uses shuffle=False, so the
# prediction order matches test_gen.classes)
predictions = model.predict(test_gen, verbose=1)

# Predicted and true class indices, plus class names for display
y_pred = np.argmax(predictions, axis=1)
y_true = test_gen.classes
labels = list(test_gen.class_indices.keys())

# Classification report
print("Classification Report:\n")
print(classification_report(y_true, y_pred, target_names=labels))

# Confusion matrix
cm = confusion_matrix(y_true, y_pred)

plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=labels, yticklabels=labels)
plt.xlabel("Predicted")
plt.ylabel("True")
plt.title("Confusion Matrix")
plt.show()
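The precision, recall, and F1 values in the classification report can be derived directly from the confusion matrix. The sketch below shows the arithmetic for a matrix laid out as scikit-learn returns it, with rows for true classes and columns for predicted classes. The helper name and the 3-class matrix are made up for illustration.

```python
def per_class_metrics(cm):
    """Compute precision, recall and F1 per class from a square confusion matrix.

    cm[i][j] is the number of samples with true class i predicted as class j.
    """
    n = len(cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    return metrics


# Made-up 3-class matrix for illustration only.
toy_cm = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 1, 9],
]
for row in per_class_metrics(toy_cm):
    print(row)
```

In the toy matrix, the second class has 9 correct predictions out of 12 predicted (precision 0.75) and 9 out of 10 actual samples (recall 0.9), which illustrates how a class can be recalled well while still attracting false positives from its neighbours.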

### 6. Sample Predictions & Model Saving

In [None]:
import random
from tensorflow.keras.preprocessing import image

# Get some random indices from test set
random_indices = random.sample(range(len(test_gen.filenames)), 9)

plt.figure(figsize=(12, 12))

for i, idx in enumerate(random_indices):
    img_path = test_gen.filepaths[idx]
    img = image.load_img(img_path, target_size=img_size)
    img_array = image.img_to_array(img) / 255.0  # normalize
    img_array_expanded = np.expand_dims(img_array, axis=0)

    # Predict
    pred = model.predict(img_array_expanded, verbose=0)
    pred_class = class_names[np.argmax(pred)]
    true_class = class_names[test_gen.classes[idx]]

    plt.subplot(3, 3, i + 1)
    plt.imshow(img_array)
    plt.title(f"Pred: {pred_class}\nTrue: {true_class}", color=("green" if pred_class == true_class else "red"))
    plt.axis("off")

plt.suptitle("Sample Predictions from Test Set", fontsize=16)
plt.show()

In [None]:
# Save the entire model
model.save("/content/fish_species_classifier.h5")

# (Optional) Save just weights
# model.save_weights("/content/fish_species_weights.h5")

print("Model saved successfully!")

In [None]:
from tensorflow.keras.models import load_model

loaded_model = load_model("/content/fish_species_classifier.h5")
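Once a model is reloaded, its softmax output still has to be turned into readable labels. A minimal sketch of that step follows: it ranks one probability vector and returns the top `k` classes. The function name, class names, and probabilities are hypothetical and serve only as an illustration.

```python
def top_k_predictions(probs, class_names, k=3):
    """Return the k most probable (class_name, probability) pairs, best first."""
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical class names and probabilities, for illustration only.
toy_classes = ["class_a", "class_b", "class_c", "class_d"]
toy_probs = [0.05, 0.70, 0.15, 0.10]
print(top_k_predictions(toy_probs, toy_classes, k=2))
# [('class_b', 0.7), ('class_c', 0.15)]
```

In the notebook, `probs` would be one row of `loaded_model.predict(...)` for a preprocessed image batch, and `class_names` the list saved from the training generator.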


# **Conclusion**

Summary of Work Done:

* Built a CNN-based Multi-class Fish Species Classifier using TensorFlow/Keras.

* Used data augmentation to improve generalization.

* Achieved an accuracy of <INSERT_ACCURACY> on the validation set.

* Evaluated the model with a classification report and confusion matrix.

Challenges Faced:

* Dataset imbalance for certain classes.

* Possible overfitting on training data.

Future Improvements:

* Experiment with deeper architectures like ResNet50, EfficientNet, or MobileNetV2.

* Use Transfer Learning to improve accuracy.

* Increase dataset size by adding more fish species images.

* Deploy as a web app using Streamlit or Flask.
