# Introduction

This notebook uses transfer learning to classify spectrograms of dolphin whistles. Much of the code used here was adapted from https://keras.io/guides/transfer_learning/.

Credit for parts of the code dealing with the `ImageDataGenerator` goes to Josh Wheeler and Gemma Ruseva (https://github.com/JoshWheeler08/DolphinAcoustics-Classifier/blob/main/vip_dolphin.ipynb).

The code seeks to make a model which can distinguish among Common, Melon Head, and Bottlenose dolphin species.

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.metrics import classification_report
from tensorflow.keras import layers

# NOTE: Although not ideal, transfer learning models should be saved in an h5 format to avoid issues with tensorflow loading.
#       This (to date) seems to be the best workaround to an issue in tensorflow.

In [None]:
# Mount Google Drive so data can be accessed
from google.colab import drive
drive.mount('/content/drive') # '/content' is the current working directory

Mounted at /content/drive


# Loading the Base Model and Adding Extra Layers

Here we load the Xception base model for transfer learning. This is a fairly complex CNN-based model; more can be read about it here: https://arxiv.org/abs/1610.02357.

The base model is "frozen" so that its pre-trained weights are not updated during the initial training.

We then add new layers on top of it (global average pooling, dropout, and a dense output layer) for training on the spectrogram data.
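To make the shapes concrete, the following NumPy sketch mimics what the added layers do to the base model's output. The feature shape `(7, 13, 2048)` is taken from the model summary printed further down; the random arrays and the names `features`, `head_kernel`, and `head_bias` are placeholders for illustration only, not the Keras implementation. Dropout is left out because it is inactive at inference time.

```python
import numpy as np

rng = np.random.default_rng(0)

batch_size = 4
# Stand-in for the Xception output: (batch, height, width, channels).
features = rng.normal(size=(batch_size, 7, 13, 2048))

# Global average pooling: average over the two spatial axes.
pooled = features.mean(axis=(1, 2))  # shape (4, 2048)

# Dense output layer with one unit per class and no activation (raw logits).
number_of_outputs = 3
head_kernel = rng.normal(scale=0.01, size=(2048, number_of_outputs))
head_bias = np.zeros(number_of_outputs)
logits = pooled @ head_kernel + head_bias  # shape (4, 3)

# One weight per (feature, class) pair plus one bias per class.
head_parameter_count = head_kernel.size + head_bias.size

print(pooled.shape, logits.shape, head_parameter_count)
```

The dense layer holds 2048 × 3 + 3 = 6147 parameters. While the base model is frozen, these are the only weights that training can change.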

A list of several alternatives to the Xception base model can be found here: https://keras.io/api/applications/#available-models. 
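The model in the next cell rescales its inputs with `(inputs * scale) + offset`. As a quick check of that arithmetic, this NumPy sketch applies the same formula to a few pixel values. The function name `rescale` is an illustrative stand-in for the Keras layer, not part of the notebook.

```python
import numpy as np

def rescale(inputs, scale=1 / 127.5, offset=-1.0):
    """Apply the same affine map as the rescaling layer: (inputs * scale) + offset."""
    return inputs * scale + offset

# Raw 8-bit pixel values span [0, 255] and map onto [-1, 1].
raw_pixels = np.array([0.0, 127.5, 255.0])
print(rescale(raw_pixels))

# Pixel values already divided by 255 span [0, 1] and map onto a much narrower range.
unit_pixels = raw_pixels / 255.0
print(rescale(unit_pixels))
```

The mapping reaches the full [-1, 1] range only when the inputs lie in [0, 255]. Inputs that have already been scaled to [0, 1], for example by an `ImageDataGenerator` with `rescale=1./255`, end up between -1 and about -0.992, so it is worth checking which scale the data pipeline actually delivers.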

In [None]:
# input shape set to that used with the dolphins spectrogram data
image_shape = (202, 413, 3)

base_model = keras.applications.Xception(
    weights="imagenet",  # Load weights pre-trained on ImageNet.
    input_shape=image_shape, 
    include_top=False,
)  # Do not include the ImageNet classifier at the top.

# Freeze the base_model
base_model.trainable = False

# Create new model on top
inputs = keras.Input(shape=image_shape)
x = keras.Sequential()(inputs)

# Pre-trained Xception weights require that input be scaled
# from (0, 255) to a range of (-1., +1.), the rescaling layer
# outputs: `(inputs * scale) + offset`
scale_layer = keras.layers.Rescaling(scale=1 / 127.5, offset=-1)
x = scale_layer(x)

# The base model contains batchnorm layers. We want to keep them in inference mode
# when we unfreeze the base model for fine-tuning, so we make sure that the
# base_model is running in inference mode here.
x = base_model(x, training=False)
x = keras.layers.GlobalAveragePooling2D()(x)
x = keras.layers.Dropout(0.2)(x)  # Regularize with dropout

number_of_outputs = 3 # The number of outputs is three because this is a
                      # three-class classification problem.
outputs = keras.layers.Dense(number_of_outputs)(x)
model = keras.Model(inputs, outputs)

model.summary()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/xception/xception_weights_tf_dim_ordering_tf_kernels_notop.h5
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 202, 413, 3)]     0         
                                                                 
 sequential (Sequential)     multiple                  0         
                                                                 
 rescaling (Rescaling)       (None, 202, 413, 3)       0         
                                                                 
 xception (Functional)       (None, 7, 13, 2048)       20861480  
                                                                 
 global_average_pooling2d (G  (None, 2048)             0         
 lobalAveragePooling2D)                                          
                                                

# Loading the Training, Test and Validation Data

The spectrogram image data are now loaded.
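The cell below reserves 20% of the training directory for validation through `validation_split`. The sketch that follows illustrates how such a split divides the images, assuming (this is not confirmed by the notebook) that the fraction is applied to each class folder separately and rounded down. The per-class counts and the name `split_counts` are hypothetical; only the totals 1193 and 4777 come from the generator output shown after the cell.

```python
def split_counts(class_counts, validation_split):
    """Return (validation, training) totals for a per-class split, rounding down per class."""
    validation_total = 0
    training_total = 0
    for count in class_counts.values():
        n_validation = int(count * validation_split)
        validation_total += n_validation
        training_total += count - n_validation
    return validation_total, training_total

# Hypothetical per-class image counts, for illustration only.
example_counts = {"bottlenose": 1000, "common": 1003, "melon_head": 999}
n_validation, n_training = split_counts(example_counts, validation_split=0.20)
print(n_validation, n_training)

# The totals reported by the generators in this notebook give a similar ratio.
reported_validation, reported_training = 1193, 4777
reported_fraction = reported_validation / (reported_validation + reported_training)
print(round(reported_fraction, 4))
```

Because each class is rounded down separately, the validation share can fall slightly below the requested 20%, which is consistent with the reported 1193 out of 5970 images.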

In [None]:
# code section adapted from https://github.com/JoshWheeler08/DolphinAcoustics-Classifier/blob/main/vip_dolphin.ipynb

IMAGE_SHAPE = (202, 413)
directory_name = "/content/drive/MyDrive/Dolphin_Acoustics_VIP/normalised-dclde-clips/train-test/"

TEST_DATA_DIR = directory_name + "test"
TRAINING_DATA_DIR = directory_name + "train"

train_datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=.20
) #https://stackoverflow.com/questions/42443936/keras-split-train-test-set-when-using-imagedatagenerator

test_generator = ImageDataGenerator(
                    rescale=1./255
                ).flow_from_directory(
                      TEST_DATA_DIR,
                      shuffle=True,
                      batch_size = 50,
                      target_size=IMAGE_SHAPE
                      )

validation_generator = train_datagen.flow_from_directory(
    TRAINING_DATA_DIR,
    subset="validation",
    shuffle=True,
    target_size=IMAGE_SHAPE
)
                
train_generator = train_datagen.flow_from_directory(
    TRAINING_DATA_DIR,
    subset="training",
    shuffle=True,
    target_size=IMAGE_SHAPE
)

Found 2557 images belonging to 3 classes.
Found 1193 images belonging to 3 classes.
Found 4777 images belonging to 3 classes.


# Fitting the Model to the Spectrogram Image Data

We now compile and fit the model. Only the new classification layers are trained; the base model's weights remain unchanged because it is frozen.
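The loss in the cell below is created with `from_logits=True` because the final `Dense` layer has no activation, so the model outputs raw scores (logits) rather than probabilities. The following NumPy sketch shows the computation this implies: a softmax followed by cross-entropy against a one-hot label. It is a textbook formulation for illustration, not the Keras source, and the function name is hypothetical.

```python
import numpy as np

def categorical_crossentropy_from_logits(one_hot_labels, logits):
    """Mean softmax cross-entropy over a batch, computed from raw logits."""
    # Subtract the row-wise maximum for numerical stability; softmax is unchanged by this shift.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probabilities = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_example_loss = -(one_hot_labels * log_probabilities).sum(axis=1)
    return per_example_loss.mean()

# One example whose true class is index 0, scored with made-up logits.
labels = np.array([[1.0, 0.0, 0.0]])
logits = np.array([[2.0, 1.0, 0.1]])
loss = categorical_crossentropy_from_logits(labels, logits)
print(round(float(loss), 4))

# Equal logits mean a uniform prediction over three classes, so the loss is ln(3).
uniform_loss = categorical_crossentropy_from_logits(labels, np.zeros((1, 3)))
print(round(float(uniform_loss), 4))
```

A loss near ln(3) ≈ 1.0986 therefore corresponds to a model that has not yet learned to separate the three classes.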

In [None]:
model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.CategoricalCrossentropy(from_logits=True),
    metrics=[keras.metrics.CategoricalAccuracy()]
)

epochs = 15

# Train on the training subset; the test set is held out for the final evaluation.
model.fit(train_generator, epochs=epochs, validation_data=validation_generator)
model.summary()
model.save(directory_name + "2022_04_04_xception.h5")

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 202, 413, 3)]     0         
                                                                 
 sequential (Sequential)     multiple                  0         
                                                                 
 rescaling (Rescaling)       (None, 202, 413, 3)       0         
                                                                 
 xception (Functional)       (None, 7, 13, 2048)       20861480  
                                                                 
 global_average_pooling2d (G  (None, 2048)             0         
 lobalAveragePooling2D)                                          
                      

# Fine-Tuning the Model

Now that we have trained our own classification layers on top of the Xception base model, we can train all layers of the model by setting `base_model.trainable = True` (i.e. "unfreezing" the base model) and using a very low learning rate. The low learning rate keeps the weight updates small so that the model's useful pre-trained features are not destroyed.

Doing this would ideally provide an extra boost to the model's performance, but it can lead to overfitting if we are not careful.
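To give a feel for what a learning rate of `1e-5` means, the sketch below computes the first update of a textbook Adam optimizer for a single weight. It uses the standard Adam formulas with commonly used constants; those constants and the function name are assumptions made for illustration, and Keras's internal implementation may differ in detail. On the first step the bias-corrected update is close to the learning rate itself, whatever the size of the gradient.

```python
import numpy as np

def first_adam_update(gradient, learning_rate, beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """Size of the first Adam step for one weight, starting from zero moment estimates."""
    first_moment = (1 - beta_1) * gradient
    second_moment = (1 - beta_2) * gradient**2
    # Bias correction for step t = 1.
    first_moment_hat = first_moment / (1 - beta_1)
    second_moment_hat = second_moment / (1 - beta_2)
    return learning_rate * first_moment_hat / (np.sqrt(second_moment_hat) + epsilon)

for gradient in (0.01, 1.0, 100.0):
    default_step = first_adam_update(gradient, learning_rate=1e-3)
    fine_tune_step = first_adam_update(gradient, learning_rate=1e-5)
    print(gradient, default_step, fine_tune_step)
```

With `1e-5`, each weight moves about 100 times less per step than with `1e-3`, the default learning rate of `keras.optimizers.Adam()`. That is what keeps the pre-trained features largely intact during fine-tuning.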

In [None]:
# Unfreeze the base_model. Note that it keeps running in inference mode
# since we passed `training=False` when calling it. This means that
# the batchnorm layers will not update their batch statistics.
# This prevents the batchnorm layers from undoing all the training
# we've done so far.
base_model.trainable = True
model.summary()

model.compile(
    optimizer=keras.optimizers.Adam(1e-5),  # Low learning rate
    loss=keras.losses.CategoricalCrossentropy(from_logits=True),
    metrics=[keras.metrics.CategoricalAccuracy()]
)

epochs = 10
model.fit(train_generator, epochs=epochs, validation_data=validation_generator)

# save final fine-tuned model
model.save(directory_name + "2022_04_04_xception_fine_tuned.h5")

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 202, 413, 3)]     0         
                                                                 
 sequential (Sequential)     multiple                  0         
                                                                 
 rescaling (Rescaling)       (None, 202, 413, 3)       0         
                                                                 
 xception (Functional)       (None, 7, 13, 2048)       20861480  
                                                                 
 global_average_pooling2d (G  (None, 2048)             0         
 lobalAveragePooling2D)                                          
                                                                 
 dropout (Dropout)           (None, 2048)              0         
                                                             

# Model Evaluation

We now run a basic analysis of the model's performance on the test data. See https://github.com/dolphin-acoustics-vip/CetaceXplain for how CetaceXplain can be used for more in-depth analysis.
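The accuracy reported below is categorical accuracy: the fraction of examples whose highest-scoring output matches the one-hot label. The sketch below computes that quantity with NumPy on made-up logits, and also checks how many batches of 50 are needed to cover the 2557 test images. The function name and the example arrays are illustrative assumptions.

```python
import math
import numpy as np

def categorical_accuracy(one_hot_labels, logits):
    """Fraction of rows where the argmax of the logits equals the argmax of the label."""
    predicted_classes = logits.argmax(axis=1)
    true_classes = one_hot_labels.argmax(axis=1)
    return float((predicted_classes == true_classes).mean())

# Made-up logits for four examples and three classes.
logits = np.array([
    [2.1, 0.3, -1.0],   # predicted class 0
    [0.2, 1.5, 0.1],    # predicted class 1
    [0.9, 0.8, 0.7],    # predicted class 0
    [-0.5, 0.0, 3.2],   # predicted class 2
])
labels = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],   # true class 2, so this example is misclassified
    [0, 0, 1],
])
accuracy = categorical_accuracy(labels, logits)
print(accuracy)

# 2557 test images in batches of 50 need 52 batches; the last one is partial.
number_of_batches = math.ceil(2557 / 50)
print(number_of_batches)
```

Three of the four made-up examples are classified correctly, giving an accuracy of 0.75, and the batch count matches the `52/52` progress indicator in the evaluation output.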

In [None]:
# code section adapted from https://github.com/JoshWheeler08/DolphinAcoustics-Classifier/blob/main/vip_dolphin.ipynb
test_generator.reset()
test_loss, test_acc = model.evaluate(test_generator, verbose=2)

# SUMMARY STATISTICS
print("----- Evaluation Summary statistics -----")
print("Test accuracy = ", test_acc)
print("Test loss = ", test_loss)

52/52 - 40s - loss: 0.2685 - categorical_accuracy: 0.9054 - 40s/epoch - 771ms/step
----- Evaluation Summary statistics -----
Test accuracy =  0.905357837677002
Test loss =  0.2684841752052307
