## Fish Image Species Classification (Transfer Learning)

Given *images of fish*, let's try to predict the **species** of fish present in a given image.

We will use a pretrained TensorFlow/Keras CNN (MobileNetV2) to make our predictions.

Data source: https://www.kaggle.com/datasets/crowww/a-large-scale-fish-dataset

### Importing Libraries

In [1]:
import numpy as np
import pandas as pd

from pathlib import Path
import os.path

from sklearn.model_selection import train_test_split

import tensorflow as tf

In [2]:
image_dir = Path('archive/Fish_Dataset/Fish_Dataset')

### Creating File DataFrame

In [3]:
# Get filepaths and labels
filepaths = list(image_dir.glob(r'**/*.png'))

In [4]:
filepaths

[PosixPath('archive/Fish_Dataset/Fish_Dataset/Gilt-Head Bream/Gilt-Head Bream/00346.png'),
 PosixPath('archive/Fish_Dataset/Fish_Dataset/Gilt-Head Bream/Gilt-Head Bream/00091.png'),
 PosixPath('archive/Fish_Dataset/Fish_Dataset/Gilt-Head Bream/Gilt-Head Bream/00751.png'),
 PosixPath('archive/Fish_Dataset/Fish_Dataset/Gilt-Head Bream/Gilt-Head Bream/00817.png'),
 PosixPath('archive/Fish_Dataset/Fish_Dataset/Gilt-Head Bream/Gilt-Head Bream/00313.png'),
 PosixPath('archive/Fish_Dataset/Fish_Dataset/Gilt-Head Bream/Gilt-Head Bream/00639.png'),
 PosixPath('archive/Fish_Dataset/Fish_Dataset/Gilt-Head Bream/Gilt-Head Bream/00224.png'),
 PosixPath('archive/Fish_Dataset/Fish_Dataset/Gilt-Head Bream/Gilt-Head Bream/00778.png'),
 PosixPath('archive/Fish_Dataset/Fish_Dataset/Gilt-Head Bream/Gilt-Head Bream/00043.png'),
 PosixPath('archive/Fish_Dataset/Fish_Dataset/Gilt-Head Bream/Gilt-Head Bream/00693.png'),
 PosixPath('archive/Fish_Dataset/Fish_Dataset/Gilt-Head Bream/Gilt-Head Bream/00556.png'),

In [5]:
first_split = os.path.split('archive/Fish_Dataset/Fish_Dataset/Gilt-Head Bream/Gilt-Head Bream/00213.png')[0]
first_split

'archive/Fish_Dataset/Fish_Dataset/Gilt-Head Bream/Gilt-Head Bream'

In [6]:
second_split = os.path.split(first_split)[1]
second_split

'Gilt-Head Bream'
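
The two `os.path.split` calls above first strip the file name and then take the last component of the remaining directory. As a minimal sketch of the same idea with `pathlib` (the helper name `label_from_path` is hypothetical and not part of the original notebook), the name of the parent folder serves as the label:

```python
from pathlib import PurePosixPath

def label_from_path(path):
    """Return the name of the folder that directly contains the file."""
    return PurePosixPath(path).parent.name

example = 'archive/Fish_Dataset/Fish_Dataset/Gilt-Head Bream/Gilt-Head Bream/00213.png'
print(label_from_path(example))  # Gilt-Head Bream
```

`PurePosixPath` is used here only so the forward-slash example parses the same way on any operating system.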

In [7]:
# The label is the name of the folder that directly contains each image
labels = [os.path.split(os.path.split(x)[0])[1] for x in filepaths]
labels

['Gilt-Head Bream',
 'Gilt-Head Bream',
 'Gilt-Head Bream',
 'Gilt-Head Bream',
 'Gilt-Head Bream',
 ...


In [8]:
filepaths = pd.Series(filepaths, name='Filepath').astype(str)
labels = pd.Series(labels, name='Label')

In [9]:
# Concatenate filepaths and labels
image_df = pd.concat([filepaths, labels], axis=1)
image_df

| | Filepath | Label |
|---|---|---|
| 0 | archive/Fish_Dataset/Fish_Dataset/Gilt-Head Br... | Gilt-Head Bream |
| 1 | archive/Fish_Dataset/Fish_Dataset/Gilt-Head Br... | Gilt-Head Bream |
| 2 | archive/Fish_Dataset/Fish_Dataset/Gilt-Head Br... | Gilt-Head Bream |
| 3 | archive/Fish_Dataset/Fish_Dataset/Gilt-Head Br... | Gilt-Head Bream |
| 4 | archive/Fish_Dataset/Fish_Dataset/Gilt-Head Br... | Gilt-Head Bream |
| ... | ... | ... |
| 18000 | archive/Fish_Dataset/Fish_Dataset/Sea Bass/Sea... | Sea Bass |
| 18001 | archive/Fish_Dataset/Fish_Dataset/Sea Bass/Sea... | Sea Bass |
| 18002 | archive/Fish_Dataset/Fish_Dataset/Sea Bass/Sea... | Sea Bass |
| 18003 | archive/Fish_Dataset/Fish_Dataset/Sea Bass/Sea... | Sea Bass |


In [10]:
# Remove rows whose label is ".ipynb_checkpoints"
image_df.drop(image_df[image_df['Label'] == '.ipynb_checkpoints'].index, axis=0, inplace=True)

In [11]:
image_df['Label'].unique()

array(['Gilt-Head Bream', 'Gilt-Head Bream GT', 'Red Sea Bream',
       'Red Sea Bream GT', 'Black Sea Sprat', 'Black Sea Sprat GT',
       'Trout GT', 'Trout', 'Striped Red Mullet GT', 'Striped Red Mullet',
       'Shrimp GT', 'Shrimp', 'Red Mullet GT', 'Red Mullet',
       'Hourse Mackerel GT', 'Hourse Mackerel', 'Sea Bass GT', 'Sea Bass'],
      dtype=object)

In [12]:
# Drop GT images (labels ending in 'GT').
# A boolean mask avoids np.NaN, which was removed in NumPy 2.0.
image_df = image_df[~image_df['Label'].str.endswith('GT')]
image_df

| | Filepath | Label |
|---|---|---|
| 0 | archive/Fish_Dataset/Fish_Dataset/Gilt-Head Br... | Gilt-Head Bream |
| 1 | archive/Fish_Dataset/Fish_Dataset/Gilt-Head Br... | Gilt-Head Bream |
| 2 | archive/Fish_Dataset/Fish_Dataset/Gilt-Head Br... | Gilt-Head Bream |
| 3 | archive/Fish_Dataset/Fish_Dataset/Gilt-Head Br... | Gilt-Head Bream |
| 4 | archive/Fish_Dataset/Fish_Dataset/Gilt-Head Br... | Gilt-Head Bream |
| ... | ... | ... |
| 18000 | archive/Fish_Dataset/Fish_Dataset/Sea Bass/Sea... | Sea Bass |
| 18001 | archive/Fish_Dataset/Fish_Dataset/Sea Bass/Sea... | Sea Bass |
| 18002 | archive/Fish_Dataset/Fish_Dataset/Sea Bass/Sea... | Sea Bass |
| 18003 | archive/Fish_Dataset/Fish_Dataset/Sea Bass/Sea... | Sea Bass |


In [13]:
image_df['Label'].unique()

array(['Gilt-Head Bream', 'Red Sea Bream', 'Black Sea Sprat', 'Trout',
       'Striped Red Mullet', 'Shrimp', 'Red Mullet', 'Hourse Mackerel',
       'Sea Bass'], dtype=object)

In [14]:
image_df['Label'].value_counts()

Label
Gilt-Head Bream       1000
Red Sea Bream         1000
Black Sea Sprat       1000
Trout                 1000
Striped Red Mullet    1000
Shrimp                1000
Red Mullet            1000
Hourse Mackerel       1000
Sea Bass              1000
Name: count, dtype: int64

In [15]:
image_df.sample(200*9)['Label'].value_counts()

Label
Red Mullet            219
Gilt-Head Bream       214
Hourse Mackerel       206
Trout                 200
Striped Red Mullet    196
Black Sea Sprat       193
Sea Bass              193
Shrimp                191
Red Sea Bream         188
Name: count, dtype: int64

In [16]:
category = "Hourse Mackerel"
image_df.query("Label==@category")

| | Filepath | Label |
|---|---|---|
| 15005 | archive/Fish_Dataset/Fish_Dataset/Hourse Macke... | Hourse Mackerel |
| 15006 | archive/Fish_Dataset/Fish_Dataset/Hourse Macke... | Hourse Mackerel |
| 15007 | archive/Fish_Dataset/Fish_Dataset/Hourse Macke... | Hourse Mackerel |
| 15008 | archive/Fish_Dataset/Fish_Dataset/Hourse Macke... | Hourse Mackerel |
| 15009 | archive/Fish_Dataset/Fish_Dataset/Hourse Macke... | Hourse Mackerel |
| ... | ... | ... |
| 16000 | archive/Fish_Dataset/Fish_Dataset/Hourse Macke... | Hourse Mackerel |
| 16001 | archive/Fish_Dataset/Fish_Dataset/Hourse Macke... | Hourse Mackerel |
| 16002 | archive/Fish_Dataset/Fish_Dataset/Hourse Macke... | Hourse Mackerel |
| 16003 | archive/Fish_Dataset/Fish_Dataset/Hourse Macke... | Hourse Mackerel |


In [17]:
# Sample 200 images from each class
samples = []

for category in image_df['Label'].unique():
    category_slice = image_df.query("Label == @category")
    samples.append(category_slice.sample(200, random_state=1))

# Sampling with frac=1.0 (without replacement) shuffles the rows
image_df = pd.concat(samples, axis=0).sample(frac=1.0, random_state=1).reset_index(drop=True)

In [18]:
image_df

| | Filepath | Label |
|---|---|---|
| 0 | archive/Fish_Dataset/Fish_Dataset/Hourse Macke... | Hourse Mackerel |
| 1 | archive/Fish_Dataset/Fish_Dataset/Black Sea Sp... | Black Sea Sprat |
| 2 | archive/Fish_Dataset/Fish_Dataset/Trout/Trout/... | Trout |
| 3 | archive/Fish_Dataset/Fish_Dataset/Red Mullet/R... | Red Mullet |
| 4 | archive/Fish_Dataset/Fish_Dataset/Striped Red ... | Striped Red Mullet |
| ... | ... | ... |
| 1795 | archive/Fish_Dataset/Fish_Dataset/Striped Red ... | Striped Red Mullet |
| 1796 | archive/Fish_Dataset/Fish_Dataset/Sea Bass/Sea... | Sea Bass |
| 1797 | archive/Fish_Dataset/Fish_Dataset/Shrimp/Shrim... | Shrimp |
| 1798 | archive/Fish_Dataset/Fish_Dataset/Red Sea Brea... | Red Sea Bream |


In [19]:
image_df['Label'].value_counts()

Label
Hourse Mackerel       200
Black Sea Sprat       200
Trout                 200
Red Mullet            200
Striped Red Mullet    200
Gilt-Head Bream       200
Sea Bass              200
Shrimp                200
Red Sea Bream         200
Name: count, dtype: int64
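
The loop in `In [17]` draws a fixed number of rows per class and then shuffles them. A possible alternative, sketched here on a small made-up DataFrame (`toy_df` and its contents are hypothetical), is `DataFrameGroupBy.sample`, available in pandas 1.1 and later. It will not necessarily pick the same rows as the loop, even with the same `random_state`.

```python
import pandas as pd

# Hypothetical toy data: 3 classes with 5 files each
toy_df = pd.DataFrame({
    'Filepath': [f'img_{i:02d}.png' for i in range(15)],
    'Label': ['Trout'] * 5 + ['Shrimp'] * 5 + ['Sea Bass'] * 5,
})

# Draw 2 rows per class, then shuffle the combined result
balanced = (
    toy_df.groupby('Label')
    .sample(n=2, random_state=1)
    .sample(frac=1.0, random_state=1)
    .reset_index(drop=True)
)
print(balanced['Label'].value_counts())
```

Every class ends up with exactly 2 rows, mirroring the 200-per-class result shown above.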

In [20]:
train_df, test_df = train_test_split(image_df, train_size=0.7, shuffle=True, random_state=1)

### Loading the Images

In [21]:
train_generator = tf.keras.preprocessing.image.ImageDataGenerator(
    preprocessing_function = tf.keras.applications.mobilenet_v2.preprocess_input,
    validation_split = 0.2
)

test_generator = tf.keras.preprocessing.image.ImageDataGenerator(
    preprocessing_function = tf.keras.applications.mobilenet_v2.preprocess_input
)
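
`mobilenet_v2.preprocess_input` rescales pixel values from the range [0, 255] to [-1, 1], which is the input range MobileNetV2 was trained on. The NumPy sketch below reproduces only that arithmetic so the effect is easy to see; `scale_to_unit_range` is a hypothetical helper written for illustration, not the Keras function itself.

```python
import numpy as np

def scale_to_unit_range(pixels):
    """Map pixel intensities in [0, 255] to [-1, 1]."""
    return np.asarray(pixels, dtype=np.float32) / 127.5 - 1.0

# Black, mid-grey, and white map to -1, 0, and 1 respectively
print(scale_to_unit_range([0, 127.5, 255]))
```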

In [39]:
train_images = train_generator.flow_from_dataframe(
    dataframe = train_df,
    x_col = 'Filepath',
    y_col = 'Label',
    target_size = (224, 224),
    color_mode = 'rgb',
    class_mode = 'categorical',
    batch_size = 32,
    shuffle = True,
    seed = 42,
    subset = 'training'
)

val_images = train_generator.flow_from_dataframe(
    dataframe = train_df,
    x_col = 'Filepath',
    y_col = 'Label',
    target_size = (224, 224),
    color_mode = 'rgb',
    class_mode = 'categorical',
    batch_size = 32,
    shuffle = True,
    seed = 42,
    subset = 'validation'
)

test_images = test_generator.flow_from_dataframe(
    dataframe = test_df,
    x_col = 'Filepath',
    y_col = 'Label',
    target_size = (224, 224),
    color_mode = 'rgb',
    class_mode = 'categorical',
    batch_size = 32,
    shuffle = False
)

Found 1008 validated image filenames belonging to 9 classes.
Found 252 validated image filenames belonging to 9 classes.
Found 540 validated image filenames belonging to 9 classes.
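
The three counts reported above follow from the earlier steps: 9 classes × 200 images give 1800 rows, `train_size=0.7` keeps 1260 of them and leaves 540 for testing, and `validation_split=0.2` then sets aside 252 of the 1260 for validation. A quick check of that arithmetic in plain Python (integer maths only; the exact rounding rules inside scikit-learn and Keras are not reproduced here):

```python
n_total = 9 * 200                 # 9 classes, 200 sampled images each
n_train_full = n_total * 7 // 10  # train_size=0.7
n_test = n_total - n_train_full
n_val = n_train_full * 2 // 10    # validation_split=0.2
n_train = n_train_full - n_val

print(n_train, n_val, n_test)  # 1008 252 540
```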


### Load Pretrained Model

In [22]:
pretrained_model = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,
    weights = 'imagenet',
    pooling = 'avg'
)

pretrained_model.trainable = False

### Training

In [40]:
pretrained_model.summary()

Model: "mobilenetv2_1.00_224"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_1 (InputLayer)           [(None, 224, 224, 3  0           []                               
                                )]                                                                
                                                                                                  
 Conv1 (Conv2D)                 (None, 112, 112, 32  864         ['input_1[0][0]']                
                                )                                                                 
                                                                                                  
 bn_Conv1 (BatchNormalization)  (None, 112, 112, 32  128         ['Conv1[0][0]']                  
                                )                                              

In [42]:
pretrained_model.input

<KerasTensor: shape=(None, 224, 224, 3) dtype=float32 (created by layer 'input_1')>

In [44]:
# Labels of one batch; the built-in next() works across Keras versions
next(train_images)[1]

array([[0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [0.
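
Each row above is a one-hot vector over the 9 classes. A minimal sketch of turning such rows back into species names follows. It assumes class indices are assigned in sorted label order, which should be checked against `train_images.class_indices` before relying on it; `class_names` and `one_hot` are hand-written stand-ins rather than values read from the generator.

```python
import numpy as np

# Assumed ordering: sorted labels (verify with train_images.class_indices)
class_names = sorted([
    'Gilt-Head Bream', 'Red Sea Bream', 'Black Sea Sprat', 'Trout',
    'Striped Red Mullet', 'Shrimp', 'Red Mullet', 'Hourse Mackerel',
    'Sea Bass',
])

# First two rows of the batch shown above
one_hot = np.array([
    [0., 0., 1., 0., 0., 0., 0., 0., 0.],
    [0., 0., 0., 0., 1., 0., 0., 0., 0.],
])

# argmax along each row gives the index of the single 1 entry
decoded = [class_names[i] for i in one_hot.argmax(axis=1)]
print(decoded)
```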

In [45]:
inputs = pretrained_model.input

x = tf.keras.layers.Dense(128, activation='relu')(pretrained_model.output)
x = tf.keras.layers.Dense(128, activation='relu')(x)

outputs = tf.keras.layers.Dense(9, activation='softmax')(x)

model = tf.keras.Model(inputs=inputs, outputs=outputs)

model.compile(
    optimizer='adam',
    loss = 'categorical_crossentropy',
    metrics=['accuracy']
)

history = model.fit(
    train_images,
    validation_data = val_images,
    epochs=100,
    callbacks = [
        tf.keras.callbacks.EarlyStopping(
            monitor='val_loss',
            patience=3,
            restore_best_weights=True
        )
    ]
)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78/100
Epoch 7
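
The `EarlyStopping` callback configured above watches `val_loss` and halts training once it has failed to improve for 3 consecutive epochs, then restores the weights from the best epoch. The plain-Python sketch below mimics only that patience rule on a made-up loss sequence; `early_stop` and the loss values are hypothetical, and the real callback has further options (such as `min_delta`) that are ignored here.

```python
def early_stop(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 0-based."""
    best_loss = float('inf')
    best_epoch = -1
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Patience never ran out: training used every epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses, one per epoch
losses = [0.9, 0.5, 0.4, 0.45, 0.41, 0.42, 0.3]
print(early_stop(losses))  # (5, 2)
```

In this toy run training stops at epoch index 5, and the weights from epoch index 2 would be the ones restored; the final value 0.3 is never reached.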

### Results

In [47]:
results = model.evaluate(test_images)
print("Test Loss: {:.5f}".format(results[0]))
print("Test Accuracy: {:.2f}%".format(results[1] * 100))

Test Loss: 0.01319
Test Accuracy: 99.44%
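
With 540 test images, an accuracy of 99.44% corresponds to 537 correct predictions and 3 errors. The short check below recovers those counts from the figures printed above; it is a sketch that assumes the reported accuracy is the plain fraction of correctly classified test images.

```python
n_test = 540        # test set size reported by flow_from_dataframe
accuracy = 0.9944   # test accuracy printed above

n_correct = round(accuracy * n_test)
n_wrong = n_test - n_correct
print(n_correct, n_wrong)                  # 537 3
print(f"{n_correct / n_test * 100:.2f}%")  # 99.44%
```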
