<a href="https://colab.research.google.com/github/Vikash-Chaubey7061/DATA-SCIENCE-PROJECT-USING-PYTHON/blob/main/Fish_Multiclass.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Project Title - Multiclass Fish Image Classification

Project By - Vikash Kumar Chaubey

Problem Statement: This project focuses on classifying fish images into multiple categories using deep learning models. The task involves training a CNN from scratch and leveraging transfer learning with pre-trained models to enhance performance. The project also includes saving models for later use and deploying a Streamlit application to predict fish categories from user-uploaded images.

Business Use Cases:
Enhanced Accuracy: Determine the best model architecture for fish image classification.
Deployment Ready: Create a user-friendly web application for real-time predictions.
Model Comparison: Evaluate and compare metrics across models to select the most suitable approach for the task.


Approach:

Data Preprocessing and Augmentation
1.Rescale images to [0, 1] range.
2.Apply data augmentation techniques like rotation, zoom, and flipping to enhance model robustness.

Model Training
Train a CNN model from scratch.
Experiment with five pre-trained models (e.g., VGG16, ResNet50, MobileNet, InceptionV3, EfficientNetB0).
Fine-tune the pre-trained models on the fish dataset.
Save the trained model (max accuracy model ) in .h5 or .pkl format for future use.

Model Evaluation
Compare metrics such as accuracy, precision, recall, F1-score, and confusion matrix across all models.
Visualize training history (accuracy and loss) for each model.

Deployment
Build a Streamlit application to:
Allow users to upload fish images.
Predict and display the fish category.
Provide model confidence scores.
Documentation and Deliverables
Provide comprehensive documentation of the approach, code, and evaluation.
Create a GitHub repository with a detailed README.
.

Dataset
The dataset consists of images of fish, categorized into folders by species. The dataset is loaded using TensorFlow's ImageDataGenerator for efficient processing.
Dataset:Data as Zip file


In [None]:
# Import necessary libraries
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.applications import VGG16, ResNet50, MobileNet, InceptionV3, EfficientNetB0
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
from sklearn.metrics import classification_report, confusion_matrix
import streamlit as st
from PIL import Image
import cv2

# Set random seed for reproducibility
tf.random.set_seed(42)
np.random.seed(42)

# Define paths
dataset_path = "fish_dataset"  # Update this path to your dataset location
train_dir = os.path.join(dataset_path, "train")
val_dir = os.path.join(dataset_path, "validation")
test_dir = os.path.join(dataset_path, "test")

# Create directories if they don't exist
os.makedirs(train_dir, exist_ok=True)
os.makedirs(val_dir, exist_ok=True)
os.makedirs(test_dir, exist_ok=True)

# Data preprocessing and augmentation
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

val_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

# Create data generators
batch_size = 32
img_size = (224, 224)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=img_size,
    batch_size=batch_size,
    class_mode='categorical'
)

val_generator = val_datagen.flow_from_directory(
    val_dir,
    target_size=img_size,
    batch_size=batch_size,
    class_mode='categorical'
)

test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=img_size,
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=False
)

# Get class names and number of classes
class_names = list(train_generator.class_indices.keys())
num_classes = len(class_names)

print(f"Number of classes: {num_classes}")
print(f"Class names: {class_names}")

Found 0 images belonging to 0 classes.
Found 0 images belonging to 0 classes.
Found 0 images belonging to 0 classes.
Number of classes: 0
Class names: []


In [None]:
!pip install streamlit

Collecting streamlit
  Downloading streamlit-1.48.0-py3-none-any.whl.metadata (9.5 kB)
Collecting watchdog<7,>=2.1.5 (from streamlit)
  Downloading watchdog-6.0.0-py3-none-manylinux2014_x86_64.whl.metadata (44 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m44.3/44.3 kB[0m [31m1.7 MB/s[0m eta [36m0:00:00[0m
Collecting pydeck<1,>=0.8.0b4 (from streamlit)
  Downloading pydeck-0.9.1-py2.py3-none-any.whl.metadata (4.1 kB)
Downloading streamlit-1.48.0-py3-none-any.whl (9.9 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m9.9/9.9 MB[0m [31m68.4 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading pydeck-0.9.1-py2.py3-none-any.whl (6.9 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m6.9/6.9 MB[0m [31m106.1 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading watchdog-6.0.0-py3-none-manylinux2014_x86_64.whl (79 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m79.1/79.1 kB[0m [31m5.7 MB/s[0m eta [36m0:00:00[0m
[?25hIns

  # Build a CNN from Scratch

In [None]:
def build_cnn_model(input_shape=(224, 224, 3), num_classes=num_classes):
    model = Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=input_shape),
        MaxPooling2D((2, 2)),

        Conv2D(64, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),

        Conv2D(128, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),

        Conv2D(256, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),

        Flatten(),
        Dense(512, activation='relu'),
        Dropout(0.5),
        Dense(num_classes, activation='softmax')
    ])

    model.compile(optimizer=Adam(learning_rate=0.001),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    return model

# Create and display the model
cnn_model = build_cnn_model()
cnn_model.summary()

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


# Train the CNN Model

In [None]:
# Define callbacks
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping

checkpoint = ModelCheckpoint(
    'best_cnn_model.h5',
    monitor='val_accuracy',
    save_best_only=True,
    mode='max',
    verbose=1
)

early_stopping = EarlyStopping(
    monitor='val_loss',
    patience=5,
    restore_best_weights=True
)

# Train the model
epochs = 30

history_cnn = cnn_model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // batch_size,
    validation_data=val_generator,
    validation_steps=val_generator.samples // batch_size,
    epochs=epochs,
    callbacks=[checkpoint, early_stopping]
)

# Save the final model
cnn_model.save('fish_classification_cnn.h5')

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [16]:
import zipfile
import os

# Define the path to the zip file and extraction directory
zip_file_path = r"/content/Dataset OF Fish Multiclassification.zip"  # Update this path if necessary
extracted_path = "/content/fish_dataset" # Changed extraction path to a Colab directory

# Unzip the dataset
try:
    with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
        zip_ref.extractall(extracted_path)
    print(f"Dataset extracted to: {extracted_path}")

    # Verify the contents of the extracted directory
    print("Contents of extracted directory:")
    print(os.listdir(extracted_path))

except FileNotFoundError:
    print(f"Error: The file {zip_file_path} was not found.")
except zipfile.BadZipFile:
    print(f"Error: The file {zip_file_path} is not a valid zip file.")
except Exception as e:
    print(f"An error occurred during extraction: {e}")

Error: The file /content/Dataset OF Fish Multiclassification.zip is not a valid zip file.


 Evaluate the CNN Model

In [3]:
def plot_training_history(history):
    plt.figure(figsize=(12, 4))

    # Plot training & validation accuracy values
    plt.subplot(1, 2, 1)
    plt.plot(history.history['accuracy'])
    plt.plot(history.history['val_accuracy'])
    plt.title('Model Accuracy')
    plt.ylabel('Accuracy')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Validation'], loc='upper left')

    # Plot training & validation loss values
    plt.subplot(1, 2, 2)
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('Model Loss')
    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Validation'], loc='upper left')

    plt.tight_layout()
    plt.show()

# Plot training history
plot_training_history(history_cnn)

# Evaluate on test set
test_loss, test_acc = cnn_model.evaluate(test_generator)
print(f"Test Accuracy: {test_acc:.4f}")

# Generate predictions
test_generator.reset()
predictions = cnn_model.predict(test_generator)
predicted_classes = np.argmax(predictions, axis=1)

# True classes
true_classes = test_generator.classes

# Classification report
print("Classification Report:")
print(classification_report(true_classes, predicted_classes, target_names=class_names))

# Confusion matrix
cm = confusion_matrix(true_classes, predicted_classes)
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=class_names, yticklabels=class_names)
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.show()

NameError: name 'history_cnn' is not defined

Transfer Learning with Pre-trained Models

In [4]:
def create_transfer_model(base_model, num_classes=num_classes):
    # Freeze the base model
    base_model.trainable = False

    # Create new model on top
    inputs = tf.keras.Input(shape=(224, 224, 3))
    x = base_model(inputs, training=False)
    x = GlobalAveragePooling2D()(x)
    x = Dense(256, activation='relu')(x)
    x = Dropout(0.5)(x)
    outputs = Dense(num_classes, activation='softmax')(x)

    model = Model(inputs, outputs)

    model.compile(optimizer=Adam(learning_rate=0.001),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    return model

# Function to fine-tune a model
def fine_tune_model(model, base_model, fine_tune_at_layer, learning_rate=0.0001):
    # Unfreeze layers
    for layer in base_model.layers[fine_tune_at_layer:]:
        layer.trainable = True

    model.compile(optimizer=Adam(learning_rate=learning_rate),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model


# List of pre-trained models to try
pretrained_models = {
    'VGG16': VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3)),
    'ResNet50': ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3)),
    'MobileNet': MobileNet(weights='imagenet', include_top=False, input_shape=(224, 224, 3)),
    'InceptionV3': InceptionV3(weights='imagenet', include_top=False, input_shape=(224, 224, 3)),
    'EfficientNetB0': EfficientNetB0(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
}

# Dictionary to store models and histories
models = {}
histories = {}

# Define callbacks
# Early stopping for initial transfer learning
early_stopping_transfer = EarlyStopping(
    monitor='val_loss',
    patience=3,
    restore_best_weights=True
)


# Train each pre-trained model
for name, base_model in pretrained_models.items():
    print(f"\nTraining {name}...")

    # Create model
    model = create_transfer_model(base_model)

    # Define checkpoint for initial transfer learning
    checkpoint_transfer = ModelCheckpoint(
        f'best_{name.lower()}_transfer_model.h5',
        monitor='val_accuracy',
        save_best_only=True,
        mode='max',
        verbose=1
    )

    # Train the model (initial transfer learning)
    history = model.fit(
        train_generator,
        steps_per_epoch=train_generator.samples // batch_size,
        validation_data=val_generator,
        validation_steps=val_generator.samples // batch_size,
        epochs=10,  # Fewer epochs for initial transfer learning
        callbacks=[checkpoint_transfer, early_stopping_transfer]
    )

    # Save model and history
    model.save(f'fish_classification_{name.lower()}.h5')
    models[name] = model
    histories[name] = history

    # Plot training history
    plot_training_history(history)

    # Evaluate on test set
    test_loss, test_acc = model.evaluate(test_generator)
    print(f"{name} Test Accuracy: {test_acc:.4f}")

    # --- Fine-tuning (Optional - uncomment and run only for the best model) ---
    # if name == 'EfficientNetB0': # Example: If EfficientNetB0 is the best model
    #     print(f"\nFine-tuning {name}...")
    #     fine_tune_at = 100 # Example layer to unfreeze
    #     fine_tuned_model = fine_tune_model(model, base_model, fine_tune_at)

    #     # Define callbacks for fine-tuning
    #     checkpoint_fine_tune = ModelCheckpoint(
    #         f'best_{name.lower()}_fine_tuned_model.h5', # This is the filename the app looks for
    #         monitor='val_accuracy',
    #         save_best_only=True,
    #         mode='max',
    #         verbose=1
    #     )
    #     early_stopping_fine_tune = EarlyStopping(
    #         monitor='val_loss',
    #         patience=5,
    #         restore_best_weights=True
    #     )

    #     # Train the fine-tuned model
    #     history_fine_tune = fine_tuned_model.fit(
    #         train_generator,
    #         steps_per_epoch=train_generator.samples // batch_size,
    #         validation_data=val_generator,
    #         validation_steps=val_generator.samples // batch_size,
    #         epochs=epochs, # Use more epochs for fine-tuning
    #         callbacks=[checkpoint_fine_tune, early_stopping_fine_tune]
    #     )

    #     # Save the fine-tuned model (This will overwrite the best checkpoint)
    #     fine_tuned_model.save(f'fish_classification_{name.lower()}_fine_tuned.h5')
    #     models[f'{name}_FineTuned'] = fine_tuned_model # Add to models dictionary
    #     histories[f'{name}_FineTuned'] = history_fine_tune # Add to histories dictionary

    #     # Plot fine-tuning history
    #     plot_training_history(history_fine_tune)

    #     # Evaluate fine-tuned model
    #     test_loss_ft, test_acc_ft = fine_tuned_model.evaluate(test_generator)
    #     print(f"{name} Fine-tuned Test Accuracy: {test_acc_ft:.4f}")

NameError: name 'num_classes' is not defined

Model Comparison and Selection

In [None]:
# Evaluate all models on test set
model_results = []

for name, model in models.items():
    test_loss, test_acc = model.evaluate(test_generator)
    model_results.append({
        'Model': name,
        'Test Accuracy': test_acc,
        'Parameters': model.count_params()
    })

# Add CNN model results
cnn_test_loss, cnn_test_acc = cnn_model.evaluate(test_generator)
model_results.append({
    'Model': 'Custom CNN',
    'Test Accuracy': cnn_test_acc,
    'Parameters': cnn_model.count_params()
})

# Create results dataframe
results_df = pd.DataFrame(model_results)
results_df = results_df.sort_values('Test Accuracy', ascending=False)
print("\nModel Comparison:")
print(results_df)

# Plot comparison
plt.figure(figsize=(10, 6))
sns.barplot(x='Test Accuracy', y='Model', data=results_df, palette='viridis')
plt.title('Model Comparison by Test Accuracy')
plt.xlabel('Test Accuracy')
plt.ylabel('Model')
plt.xlim(0, 1)
plt.show()

 Create Streamlit Web Application

In [33]:
# Create a new file named app.py

import streamlit as st
import tensorflow as tf
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image
import numpy as np
from PIL import Image
import os

# Load the best model (replace with your best model path)
model = load_model('best_fine_tuned_efficientnetb0_model.h5')

# Class names (replace with your actual class names)
class_names = ['Black Sea Sprat', 'Gilt-Head Bream', 'Hourse Mackerel',
               'Red Mullet', 'Red Sea Bream', 'Sea Bass', 'Shrimp', 'Striped Red Mullet',
               'Trout']  # Example - replace with your actual classes

def preprocess_image(img):
    img = img.resize((224, 224))
    img_array = image.img_to_array(img)
    img_array = np.expand_dims(img_array, axis=0)
    img_array = img_array / 255.0
    return img_array

def predict_image(img):
    processed_img = preprocess_image(img)
    predictions = model.predict(processed_img)
    predicted_class = class_names[np.argmax(predictions)]
    confidence = np.max(predictions) * 100
    return predicted_class, confidence, predictions

# Streamlit app
st.title("Fish Species Classification")
st.write("Upload an image of a fish, and we'll predict its species.")

uploaded_file = st.file_uploader("Choose a fish image...", type=["jpg", "jpeg", "png"])

if uploaded_file is not None:
    image = Image.open(uploaded_file)
    st.image(image, caption='Uploaded Image', use_column_width=True)
    st.write("")
    st.write("Classifying...")

    predicted_class, confidence, predictions = predict_image(image)

    st.success(f"Prediction: {predicted_class}")
    st.success(f"Confidence: {confidence:.2f}%")

    # Show prediction probabilities
    st.subheader("Prediction Probabilities:")
    probs_df = pd.DataFrame({
        'Class': class_names,
        'Probability': predictions[0]
    }).sort_values('Probability', ascending=False)

    st.dataframe(probs_df)

    # Visualize probabilities
    st.subheader("Probability Distribution:")
    fig, ax = plt.subplots()
    sns.barplot(x='Probability', y='Class', data=probs_df, palette='viridis', ax=ax)
    ax.set_title('Prediction Probabilities')
    st.pyplot(fig)

FileNotFoundError: [Errno 2] Unable to synchronously open file (unable to open file: name = 'best_fine_tuned_efficientnetb0_model.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

 Deployment Instructions

In [11]:
pip install streamlit tensorflow pillow matplotlib seaborn pandas numpy

Collecting streamlit
  Downloading streamlit-1.48.0-py3-none-any.whl.metadata (9.5 kB)
Collecting watchdog<7,>=2.1.5 (from streamlit)
  Downloading watchdog-6.0.0-py3-none-manylinux2014_x86_64.whl.metadata (44 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m44.3/44.3 kB[0m [31m4.1 MB/s[0m eta [36m0:00:00[0m
Collecting pydeck<1,>=0.8.0b4 (from streamlit)
  Downloading pydeck-0.9.1-py2.py3-none-any.whl.metadata (4.1 kB)
Downloading streamlit-1.48.0-py3-none-any.whl (9.9 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m9.9/9.9 MB[0m [31m112.6 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading pydeck-0.9.1-py2.py3-none-any.whl (6.9 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m6.9/6.9 MB[0m [31m127.2 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading watchdog-6.0.0-py3-none-manylinux2014_x86_64.whl (79 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m79.1/79.1 kB[0m [31m8.7 MB/s[0m eta [36m0:00:00[0m
[?25hIn

Run the Streamlit app:

In [10]:
get_ipython().system('streamlit run app.py')

/bin/bash: line 1: streamlit: command not found


In [35]:
# Create the app.py file with the Streamlit app code
streamlit_code =
import streamlit as st
import tensorflow as tf
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image
import numpy as np
from PIL import Image
import os
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the best model (replace with your best model path)
# Make sure this path is correct after training and saving your model
try:
    model = load_model('best_fine_tuned_efficientnetb0_model.h5')
except Exception as e:
    st.error(f"Error loading model: {e}")
    st.stop()


# Class names (replace with your actual class names)
# Get class names from the generator if possible, or define them manually
# Assuming class_names is defined in a previous cell and accessible
try:
    # This assumes train_generator is still in scope and has class_names
    # If not, you'll need to manually define class_names
    class_names = list(train_generator.class_indices.keys())
except NameError:
    # Define manually if train_generator is not available
    class_names = ['Black Sea Sprat', 'Gilt-Head Bream', 'Hourse Mackerel',
                   'Red Mullet', 'Red Sea Bream', 'Sea Bass', 'Shrimp',
                   'Striped Red Mullet', 'Trout'] # Example - replace with your actual classes
    st.warning("train_generator not found, using hardcoded class names. Ensure these match your trained model.")


def preprocess_image(img):
    img = img.resize((224, 224))
    img_array = image.img_to_array(img)
    img_array = np.expand_dims(img_array, axis=0)
    img_array = img_array / 255.0 # Ensure consistent scaling with training
    return img_array

def predict_image(img):
    processed_img = preprocess_image(img)
    predictions = model.predict(processed_img)
    predicted_class_index = np.argmax(predictions)
    predicted_class = class_names[predicted_class_index]
    confidence = np.max(predictions) * 100
    return predicted_class, confidence, predictions[0] # Return predictions[0] as a 1D array

# Streamlit app
st.title("Fish Species Classification")
st.write("Upload an image of a fish, and we'll predict its species.")

uploaded_file = st.file_uploader("Choose a fish image...", type=["jpg", "jpeg", "png"])

if uploaded_file is not None:
    try:
        image = Image.open(uploaded_file)
        st.image(image, caption='Uploaded Image', use_column_width=True)
        st.write("")
        st.write("Classifying...")

        predicted_class, confidence, predictions = predict_image(image)

        st.success(f"Prediction: {predicted_class}")
        st.success(f"Confidence: {confidence:.2f}%")

        # Show prediction probabilities
        st.subheader("Prediction Probabilities:")
        probs_df = pd.DataFrame({
            'Class': class_names,
            'Probability': predictions
        }).sort_values('Probability', ascending=False)

        st.dataframe(probs_df)

        # Visualize probabilities
        st.subheader("Probability Distribution:")
        fig, ax = plt.subplots()
        sns.barplot(x='Probability', y='Class', data=probs_df, palette='viridis', ax=ax)
        ax.set_title('Prediction Probabilities')
        st.pyplot(fig)

    except Exception as e:
        st.error(f"An error occurred during processing: {e}")



with open('app.py', 'w') as f:
    f.write(streamlit_code)

print("app.py created successfully!")

app.py created successfully!
