## This notebook is scratch work to show proof of work. I ended up splitting it into separate binary and multiclass notebooks because restarting the kernel and running all cells took too long each time I wanted to test a model, and I did not want to re-run the binary models just to test the multiclass ones

## Import statements and loading the data

In [None]:
import h5py
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, AveragePooling2D, Flatten, Dense, Dropout, GlobalAveragePooling2D, BatchNormalization, SeparableConv2D, SpatialDropout2D
from tensorflow.keras.regularizers import l2
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import seaborn as sns
import random
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import BinaryCrossentropy, CategoricalCrossentropy
from sklearn.metrics import roc_curve, auc, classification_report, confusion_matrix
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [None]:
f = h5py.File('bird_spectrograms.hdf5', 'r')
list(f.keys())

In [None]:
for key in list(f.keys()):
    print(f[key].shape)

In [None]:
# set seeds to ensure reproducibility 
np.random.seed(5322)
tf.random.set_seed(5322)
random.seed(5322)

## Binary Classification (Song Sparrow and House Sparrow)
- Just because these two species have the most samples in our dataset

In [None]:
# picked the two species with the most samples for the binary classification model
# sonspa = song sparrow, houspa = house sparrow
X1 = np.array(f['sonspa'])
X2 = np.array(f['houspa'])

In [None]:
# create labels. 0 for song sparrow, 1 for house sparrow
y1 = np.zeros(X1.shape[2])
y2 = np.ones(X2.shape[2])

In [None]:
# combine into one set
X = np.concatenate((X1, X2), axis = 2)
y = np.concatenate((y1, y2))

In [None]:
# X has shape (128, 517, total_samples)
# we need to format input like (total_samples, 128, 517, channels = 1 not 3 since not rgb)
X = np.transpose(X, (2, 0, 1))
X = X[..., np.newaxis]
# X is now (total_samples, 128, 517, 1)

In [None]:
# scale by the global max before splitting into train and test sets
# note: the max is computed over all samples, so this step does not itself prevent data leakage
X = X.astype(np.float32)
X = X / np.max(X)

In [None]:
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 5322, stratify = y)

## First simple model 7 total layers
- no l2 regularization yet
- no dropout
- (2, 2) max pooling to take max val of 2x2 windows
- flatten layer

In [None]:
# our first and simplest cnn for binary classification
model = Sequential([
    # layer group 1
    Conv2D(16, (3,3), activation = 'relu', input_shape = (128, 517, 1)),
    MaxPooling2D(2, 2),
    # layer group 2
    Conv2D(32, (3,3), activation = 'relu'),
    MaxPooling2D(2, 2),
    # flatten layer
    Flatten(),
    # dense layers
    Dense(units = 64, activation = 'relu'),
    Dense(units = 1, activation = 'sigmoid')
], name = "Binary_Model_1")

# compile model
model.compile(
    optimizer = 'adam',
    loss = 'binary_crossentropy',
    metrics = ['accuracy', tf.keras.metrics.AUC(name = 'auc')]
)

# model summary
model.summary()

early_stopping = EarlyStopping(monitor = 'val_loss', patience = 3, restore_best_weights = True)

# train model
history = model.fit(
    X_train, y_train,
    epochs = 100,
    batch_size = 16,
    validation_split = 0.2,
    callbacks = [early_stopping]
)

# test the model
test_loss, test_acc, test_auc = model.evaluate(X_test, y_test)
print(f'\nTest accuracy: {test_acc}')
print(f'Test AUC: {test_auc}')
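
The `EarlyStopping` callback used above stops training once `val_loss` has gone `patience` epochs in a row without improving, and `restore_best_weights = True` rolls the model back to the best epoch. Below is a simplified pure-Python sketch of that logic. The function name `simulate_early_stopping` and the loss values are made up for illustration, and this is not the Keras source, just an approximation of its behavior.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Simplified illustration of patience-based early stopping, not the Keras source.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # improvement: remember it and reset the counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # ran out of epochs without triggering the stop
    return len(val_losses) - 1, best_epoch


# hypothetical validation losses, one per epoch
losses = [0.70, 0.65, 0.66, 0.67, 0.68, 0.60]
print(simulate_early_stopping(losses, patience=3))  # (4, 1)
print(simulate_early_stopping(losses, patience=5))  # (5, 5)
```

With a patience of 3 the run stops before the late improvement at epoch 5 is ever seen, which is one reason a slightly larger patience can be worth trying.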

In [None]:
# predictions
y_pred_probs = model.predict(X_test)

# probabilities to binary predictions (0 or 1) using a threshold of 0.5
y_pred = (y_pred_probs >= 0.5).astype(int)

print(classification_report(y_test, y_pred, zero_division = 0, target_names = ['Song Sparrow', 'House Sparrow']))
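
A quick sanity check for reading a classification report: if a model predicts the same class for every sample, that class gets 100% recall and a precision equal to its share of the test set, while the other class gets 0 for both. The sketch below uses made-up counts (30 vs 70) purely for illustration, and `precision_recall` is a hypothetical helper, not part of sklearn.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class, returning 0.0 where the ratio is undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # same idea as zero_division = 0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# hypothetical test set: 30 song sparrows (0) and 70 house sparrows (1)
y_true = [0] * 30 + [1] * 70
# a degenerate model that always predicts house sparrow
y_pred = [1] * 100

house = precision_recall(y_true, y_pred, positive=1)
song = precision_recall(y_true, y_pred, positive=0)
print(house)  # (0.7, 1.0)
print(song)   # (0.0, 0.0)
```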

In [None]:
# visualize training accuracy versus validation accuracy
plt.plot(history.history['accuracy'], label = 'Train Accuracy')
plt.plot(history.history['val_accuracy'], label = 'Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.title('Binary Model 1 Training History')
plt.show()

In terms of accuracy alone, this simple model does decently given how basic it is\
Looking deeper into the classification report, there is some class imbalance, and the model predicts no song sparrows at all\
The model did do well on the House Sparrow class, which had more samples. 100% recall and 70% precision is not bad for such a boilerplate model, though both numbers follow directly from predicting House Sparrow for every sample\
The 0.5 test AUC indicates the model was effectively guessing at random\
Another important thing is the number of parameters in the first dense layer. 7.8 million parameters is far too many for our dataset size, so the next step is to add something like global average pooling 2D to bring that down to a reasonable number\
Interestingly, accuracy did not change between epochs, but the loss changed slightly, which kept early stopping from triggering right away\
Overall, a decent simple model, but it needs more complexity and tuning since training stopped very early on
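
The 7.8 million figure can be checked by hand from the layer shapes. The sketch below traces the two conv/pool groups of Binary Model 1 (default 'valid' padding, 2x2 pooling) and compares the first dense layer's parameter count after `Flatten` with what it would be after global average pooling. The helper names are hypothetical.

```python
def conv_out(size, kernel=3):
    # 'valid' padding (the Keras default) with stride 1 trims kernel - 1 pixels
    return size - kernel + 1


def pool_out(size, pool=2):
    # 2x2 max pooling with stride 2 drops any leftover row or column
    return size // pool


h, w = 128, 517
channels = None
for filters in (16, 32):
    h, w = pool_out(conv_out(h)), pool_out(conv_out(w))
    channels = filters

dense_units = 64
flat_features = h * w * channels
flatten_params = flat_features * dense_units + dense_units  # weights + biases
gap_params = channels * dense_units + dense_units           # one value per channel

print((h, w, channels))  # (30, 127, 32)
print(flat_features)     # 121920
print(flatten_params)    # 7802944
print(gap_params)        # 2112
```

Global average pooling collapses each 30x127 feature map to a single number, so the dense layer only sees 32 inputs instead of 121,920.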

## Second model with improvements/adjustments 13 total layers
- added another layer "group" for complexity
- added 0.01 l2 regularization to discourage large weights
- added 30% dropout to help with overfitting and regularization
- also included global average pooling 2D instead of flatten to keep the number of parameters at each layer reasonable
- raised early stopping patience from 3 to 5 since the previous model stopped early on
- also increased the number of epochs to allow the model to possibly improve accuracy
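
To make the l2 bullet concrete: `l2(0.01)` adds `0.01 * sum(w ** 2)` over a layer's kernel weights to the loss, so large weights cost much more than small ones. A minimal sketch with made-up weight values (`l2_penalty` is a hypothetical helper, not a Keras function):

```python
def l2_penalty(weights, factor):
    """L2 regularization term: factor times the sum of squared weights."""
    return factor * sum(w * w for w in weights)


# hypothetical kernel weights, the second set is 10x the first
small = [0.1, -0.2, 0.05]
large = [1.0, -2.0, 0.5]

penalty_small = l2_penalty(small, 0.01)
penalty_large = l2_penalty(large, 0.01)

print(penalty_small)
print(penalty_large)
# scaling every weight by 10 scales the penalty by 100
print(penalty_large / penalty_small)
```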

In [None]:
# our second cnn for binary classification
model = Sequential([
    # layer group 1
    Conv2D(16, (3, 3), activation = 'relu', input_shape = (128, 517, 1), kernel_regularizer = l2(0.01)),
    MaxPooling2D(pool_size = (2, 2)),
    Dropout(rate = 0.3),
    # layer group 2
    Conv2D(32, (3, 3), activation = 'relu', kernel_regularizer = l2(0.01)),
    MaxPooling2D(pool_size = (2, 2)),
    Dropout(rate = 0.3),
    # layer group 3
    Conv2D(64, (3, 3), activation = 'relu', kernel_regularizer = l2(0.01)),
    MaxPooling2D(pool_size = (2, 2)),
    Dropout(rate = 0.3),
    # pooling to keep # of parameters in check
    GlobalAveragePooling2D(),
    # dense layers
    Dense(units = 64, activation = 'relu', kernel_regularizer = l2(0.01)),
    Dropout(rate = 0.3),
    Dense(units = 1, activation = 'sigmoid')
], name = "Binary_Model_2")

# compile model
model.compile(
    optimizer = 'adam',
    loss = 'binary_crossentropy',
    metrics = ['accuracy', tf.keras.metrics.AUC(name = 'auc')]
)

# model summary
model.summary()

# early stopping since previous model did not improve early on
early_stopping = EarlyStopping(monitor = 'val_loss', patience = 5, restore_best_weights = True)

# train the model
history = model.fit(
    X_train, y_train,
    epochs = 100,
    batch_size = 32,
    validation_split = 0.2,
    callbacks = [early_stopping],
)

# evaluate the model
test_loss, test_acc, test_auc = model.evaluate(X_test, y_test)
print(f'\nTest accuracy: {test_acc}')
print(f'Test AUC: {test_auc}')

In [None]:
# predictions
y_pred_probs = model.predict(X_test)

# probabilities to binary predictions (0 or 1) using a threshold of 0.5
y_pred = (y_pred_probs >= 0.5).astype(int)

print(classification_report(y_test, y_pred, zero_division = 0, target_names = ['Song Sparrow', 'House Sparrow']))

In [None]:
# visualize training accuracy versus validation accuracy
plt.plot(history.history['accuracy'], label = 'Train Accuracy')
plt.plot(history.history['val_accuracy'], label = 'Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.title('Binary Model 2 Training History')
plt.show()

Somehow did worse in terms of accuracy and AUC after the adjustments. Might need to make more drastic tuning changes\
This model definitely overfit, as seen in the higher training accuracy and much lower validation accuracy\
The lack of improvement in validation accuracy also indicates the model is not learning well in the first place\
Performed slightly worse than random guessing\
At least this model made some predictions for both classes rather than just 1 of the 2\
Stopped very early on, like the previous model\
Might need to add more layers for complexity\
Change the regularization weight to try to prevent overfitting even more\
Maybe adjust the dropout rate to scale with the number of filters

## Final binary model with even more improvements/adjustments 24 total layers version 1
- added a couple more layers for more complexity
- started the number of filters at 32 instead of 16
- modified the dropout rates to scale up as the number of filters increases
- changed l2 regularization from 0.01 to 0.0001. a weaker penalty gives the model more room to fit, since the previous model was not learning well
- reduced the adam optimizer learning rate to 0.0001 to allow for better convergence
- added batch normalization to stabilize and speed up training a little bit given we added more layers
- set padding to 'same' to retain more spatial information and detail from our input as the network gets deeper. helps given the feature maps shrink with every pooling layer
- kept early stopping patience at 5 in case of continuous non-improvement like the previous model
- tried label smoothing of 0.05 so instead of comparing predictions with 0 and 1, it compares with 0.025 and 0.975
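
For reference, with `label_smoothing = 0.05` Keras's `BinaryCrossentropy` pulls each hard label toward 0.5 using `y * (1 - 0.05) + 0.5 * 0.05`. The sketch below reproduces that arithmetic in plain Python and shows that an overconfident prediction is penalized relative to one that matches the smoothed target. The helper names are hypothetical.

```python
import math


def smooth_binary_label(y, label_smoothing):
    # hard labels are squeezed toward 0.5 by half the smoothing amount
    return y * (1.0 - label_smoothing) + 0.5 * label_smoothing


def binary_crossentropy(target, prob):
    # standard binary cross-entropy for a single prediction
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


targets = [smooth_binary_label(y, 0.05) for y in (0.0, 1.0)]
print(targets)  # approximately [0.025, 0.975]

# with a smoothed target, pushing the prediction to 0.999 costs more than 0.975
loss_matched = binary_crossentropy(targets[1], 0.975)
loss_overconfident = binary_crossentropy(targets[1], 0.999)
print(loss_matched < loss_overconfident)  # True
```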

In [None]:
# final more complex cnn for binary classification with additional layers version 1
model = Sequential([
    # layer group 1
    Conv2D(32, (3, 3), activation = 'relu', input_shape = (128, 517, 1), kernel_regularizer = l2(0.0001), padding = 'same'),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    Dropout(0.2),
    # layer group 2
    Conv2D(64, (3, 3), activation = 'relu', kernel_regularizer = l2(0.0001), padding = 'same'),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    Dropout(0.2),
    # layer group 3
    Conv2D(128, (3, 3), activation = 'relu', kernel_regularizer = l2(0.0001), padding = 'same'),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    Dropout(0.3),
    # layer group 4
    Conv2D(128, (3, 3), activation = 'relu', kernel_regularizer = l2(0.0001), padding = 'same'),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    Dropout(0.3),
    # layer group 5
    Conv2D(256, (3, 3), activation = 'relu', kernel_regularizer = l2(0.0001), padding = 'same'),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    Dropout(0.4),
    # keep # of params in check
    GlobalAveragePooling2D(),
    # dense layers
    Dense(units = 128, activation = 'relu', kernel_regularizer = l2(0.0001)),
    Dropout(0.4),
    Dense(units = 1, activation = 'sigmoid')
], name = "Binary_Model_3")

# compile the model with lower learning rate
model.compile(
    optimizer = Adam(learning_rate = 0.0001),
    loss = BinaryCrossentropy(label_smoothing = 0.05),
    metrics = ['accuracy', tf.keras.metrics.AUC(name = 'auc')]
)

# model summary
model.summary()

# early stopping with some patience
early_stopping = EarlyStopping(monitor = 'val_loss', patience = 5, restore_best_weights = True)

# train the model
history = model.fit(
    X_train, y_train,
    epochs = 100,
    batch_size = 32,
    validation_split = 0.2,
    callbacks = [early_stopping]
)

# evaluate the model
test_loss, test_acc, test_auc = model.evaluate(X_test, y_test)
print(f'\nTest accuracy: {test_acc}')
print(f'Test AUC: {test_auc}')
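
The output shapes reported by `model.summary()` can be cross-checked by hand. With `padding = 'same'` the convolutions keep height and width, so only the five 2x2 pooling layers shrink the feature maps. A small sketch (`trace_shapes` is a hypothetical helper):

```python
def trace_shapes(height, width, filters_per_group):
    """Feature map shape after each conv ('same' padding) + 2x2 max pool group."""
    shapes = []
    for filters in filters_per_group:
        # the conv keeps height and width; the pool halves each, rounding down
        height, width = height // 2, width // 2
        shapes.append((height, width, filters))
    return shapes


shapes = trace_shapes(128, 517, [32, 64, 128, 128, 256])
for shape in shapes:
    print(shape)
# (64, 258, 32)
# (32, 129, 64)
# (16, 64, 128)
# (8, 32, 128)
# (4, 16, 256)

# global average pooling leaves 256 values, feeding Dense(128)
dense_params = shapes[-1][2] * 128 + 128
print(dense_params)  # 32896
```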

In [None]:
# visualize training accuracy versus validation accuracy
plt.plot(history.history['accuracy'], label = 'Train Accuracy')
plt.plot(history.history['val_accuracy'], label = 'Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.title('Binary Model 3 Training History')
plt.show()

In [None]:
# predictions
y_pred_probs = model.predict(X_test)

# probabilities to binary predictions (0 or 1) using a threshold of 0.5
y_pred = (y_pred_probs >= 0.5).astype(int)

print(classification_report(y_test, y_pred, zero_division = 0, target_names = ['Song Sparrow', 'House Sparrow']))

In [None]:
# predicted probabilities
y_scores = y_pred_probs.ravel()

# ROC for class 1 house sparrow
fpr1, tpr1, _ = roc_curve(y_test, y_scores)
auc1 = auc(fpr1, tpr1)

# ROC for class 0 song sparrow
fpr0, tpr0, _ = roc_curve(1 - y_test, 1 - y_scores)
auc0 = auc(fpr0, tpr0)

# plot both
plt.figure(figsize=(8, 6))
plt.plot(fpr1, tpr1, label = f'Class House Sparrow (AUC = {auc1:.4f})', color = 'darkorange')
plt.plot(fpr0, tpr0, label = f'Class Song Sparrow (AUC = {auc0:.4f})', color = 'blue')
plt.plot([0, 1], [0, 1], 'k--', label = 'Random Guessing')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curves for Both Classes (Binary Classification)')
plt.legend(loc = 'lower right')
plt.grid()
plt.show()
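
One thing worth knowing about the plot above: flipping both the labels and the scores gives a curve that is the mirror image of the first one, so for a single sigmoid output the two AUC values are always equal. The sketch below checks this with a pairwise definition of AUC on made-up labels and scores. `pairwise_auc` is a hypothetical helper, not the sklearn implementation.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# hypothetical labels (1 = house sparrow) and predicted probabilities
y = [0, 0, 1, 1, 1, 0]
s = [0.2, 0.6, 0.7, 0.4, 0.9, 0.3]

auc_house = pairwise_auc(y, s)
auc_song = pairwise_auc([1 - v for v in y], [1 - v for v in s])
print(auc_house, auc_song)  # both are 8/9
```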

Much more stable model with much better accuracy than the previous 2, and learning was gradual and progressive\
Decent accuracy --> 81% on test set\
Model generalizes well and stops early\
This model does a much better job at predicting both classes but could maybe do better for song sparrow\
It seems the additional layers, label smoothing, lower l2 regularization, scaled dropout, same padding, batch normalization, and smaller learning rate together improved the model significantly\
No signs of overfitting either, which is amazing

## Final binary model with even more improvements/adjustments 24 total layers version 2
- only difference is the batch size. 32 --> 16
- given we only had a couple thousand samples, 16 or 32 seemed like a reasonable choice
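
One concrete effect of the batch size change: halving the batch size roughly doubles the number of weight updates per epoch. The sketch below uses a made-up training set size (the real count is not shown here), and `updates_per_epoch` is a hypothetical helper.

```python
import math


def updates_per_epoch(n_train, batch_size):
    # the last batch may be smaller, so round up
    return math.ceil(n_train / batch_size)


n_train = 1300  # hypothetical number of training samples after the validation split
steps_32 = updates_per_epoch(n_train, 32)
steps_16 = updates_per_epoch(n_train, 16)
print(steps_32, steps_16)  # 41 82
```

Whether the extra, noisier updates actually help is an empirical question, which is why both versions are trained and compared.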

In [None]:
# final more complex cnn for binary classification with additional layers version 2
model = Sequential([
    # layer group 1
    Conv2D(32, (3, 3), activation = 'relu', input_shape = (128, 517, 1), kernel_regularizer = l2(0.0001), padding = 'same'),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    Dropout(0.2),
    # layer group 2
    Conv2D(64, (3, 3), activation = 'relu', kernel_regularizer = l2(0.0001), padding = 'same'),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    Dropout(0.2),
    # layer group 3
    Conv2D(128, (3, 3), activation = 'relu', kernel_regularizer = l2(0.0001), padding = 'same'),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    Dropout(0.3),
    # layer group 4
    Conv2D(128, (3, 3), activation = 'relu', kernel_regularizer = l2(0.0001), padding = 'same'),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    Dropout(0.3),
    # layer group 5
    Conv2D(256, (3, 3), activation = 'relu', kernel_regularizer = l2(0.0001), padding = 'same'),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    Dropout(0.4),
    # keep # of params in check
    GlobalAveragePooling2D(),
    # dense layers
    Dense(units = 128, activation = 'relu', kernel_regularizer = l2(0.0001)),
    Dropout(0.4),
    Dense(units = 1, activation = 'sigmoid')
], name = "Binary_Model_4")

# compile the model with lower learning rate
model.compile(
    optimizer = Adam(learning_rate = 0.0001),
    loss = BinaryCrossentropy(label_smoothing = 0.05),
    metrics = ['accuracy', tf.keras.metrics.AUC(name = 'auc')]
)

# model summary
model.summary()

# early stopping with some patience
early_stopping = EarlyStopping(monitor = 'val_loss', patience = 5, restore_best_weights = True)

# train the model
history = model.fit(
    X_train, y_train,
    epochs = 100,
    batch_size = 16,
    validation_split = 0.2,
    callbacks = [early_stopping]
)

# evaluate the model
test_loss, test_acc, test_auc = model.evaluate(X_test, y_test)
print(f'\nTest accuracy: {test_acc}')
print(f'Test AUC: {test_auc}')

In [None]:
# visualize training accuracy versus validation accuracy
plt.plot(history.history['accuracy'], label = 'Train Accuracy')
plt.plot(history.history['val_accuracy'], label = 'Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.title('Binary Model 4 Training History')
plt.show()

In [None]:
# predictions
y_pred_probs = model.predict(X_test)

# probabilities to binary predictions (0 or 1) using a threshold of 0.5
y_pred = (y_pred_probs >= 0.5).astype(int)

print(classification_report(y_test, y_pred, zero_division = 0, target_names = ['Song Sparrow', 'House Sparrow']))

In [None]:
# predicted probabilities
y_scores = y_pred_probs.ravel()

# ROC for class 1 house sparrow
fpr1, tpr1, _ = roc_curve(y_test, y_scores)
auc1 = auc(fpr1, tpr1)

# ROC for class 0 song sparrow
fpr0, tpr0, _ = roc_curve(1 - y_test, 1 - y_scores)
auc0 = auc(fpr0, tpr0)

# plot both
plt.figure(figsize=(8, 6))
plt.plot(fpr1, tpr1, label = f'Class House Sparrow (AUC = {auc1:.4f})', color = 'darkorange')
plt.plot(fpr0, tpr0, label = f'Class Song Sparrow (AUC = {auc0:.4f})', color = 'blue')
plt.plot([0, 1], [0, 1], 'k--', label = 'Random Guessing')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curves for Both Classes (Binary Classification)')
plt.legend(loc = 'lower right')
plt.grid()
plt.show()

A little bit better. Reducing the batch size got us a ~3% increase in accuracy and AUC\
Also did better in terms of precision and recall for each class\
No signs of overfitting either. The training and validation accuracies were similar to one another, which is good

## Multiclass Classification (All species)

In [None]:
X_list = []
y_list = []

# loop through each species' data
for species in list(f.keys()):
    spectrograms = np.array(f[species])  # (128, 517, N)
    n_samples = spectrograms.shape[2]
    labels = [species] * n_samples

    X_list.append(spectrograms)
    y_list.extend(labels)

In [None]:
# stack and reshape
X = np.concatenate(X_list, axis = 2)      # (128, 517, total_samples)
X = np.transpose(X, (2, 0, 1))           # (samples, 128, 517)
X = X[..., np.newaxis]                   # (samples, 128, 517, 1)

In [None]:
# normalize spectrograms to [0, 1]
X = X / np.max(X)

In [None]:
# labels to arrays
y = np.array(y_list)

In [None]:
# encode species to ints
le = LabelEncoder()
y_encoded = le.fit_transform(y)          # e.g., 'amecro' → 0, 'amerob' → 1, etc.
label_map = dict(zip(le.transform(le.classes_), le.classes_))

In [None]:
# one hot encode
y_onehot = to_categorical(y_encoded)     # shape: (samples, 12)
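
For intuition, the two encoding steps above can be sketched in plain Python. `LabelEncoder` assigns integers in sorted order of the class names, and `to_categorical` turns each integer into a row with a single 1. The helpers below are hypothetical stand-ins on a tiny made-up label list, not the library implementations.

```python
def encode_labels(labels):
    """Map each label to its index in the sorted list of unique labels."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in labels], classes


def one_hot(indices, num_classes):
    """One row per sample with a 1.0 in the column of its class."""
    return [[1.0 if i == k else 0.0 for k in range(num_classes)] for i in indices]


labels = ['houspa', 'amecro', 'sonspa', 'amecro']
encoded, classes = encode_labels(labels)
onehot = one_hot(encoded, len(classes))

print(classes)  # ['amecro', 'houspa', 'sonspa']
print(encoded)  # [1, 0, 2, 0]

# argmax over each row recovers the integer label, and from there the species code
decoded = [classes[row.index(max(row))] for row in onehot]
print(decoded == labels)  # True
```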

In [None]:
# split train and test. stratify to balance species
X_train, X_test, y_train, y_test = train_test_split(
    X, y_onehot, test_size = 0.2, random_state = 5322, stratify = y_encoded
)

In [None]:
# also get y_test_raw = y_encoded to decode predictions later and for plotting
_, y_test_raw = train_test_split(
    y_encoded, test_size = 0.2, random_state = 5322, stratify = y_encoded
)

In [None]:
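# light augmentation: small horizontal/vertical shifts and zoom
datagen = ImageDataGenerator(
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    fill_mode='nearest'
)

# fit() is only needed for featurewise statistics or ZCA whitening, so it is optional here
datagen.fit(X_train)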

In [None]:
num_classes = len(le.classes_)
num_classes

In [None]:
# first multiclass model
model = Sequential([
    # layer group 1
    Conv2D(32, (3, 3), activation = 'relu', input_shape = (128, 517, 1), kernel_regularizer = l2(0.0001), padding = 'same'),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    SpatialDropout2D(0.3),
    # group 2
    Conv2D(64, (3, 3), activation = 'relu', kernel_regularizer = l2(0.0001), padding = 'same'),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    SpatialDropout2D(0.3),
    # group 3
    SeparableConv2D(128, (3, 3), activation = 'relu', kernel_regularizer = l2(0.0001), padding = 'same'),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    Dropout(0.4),
    # group 4
    SeparableConv2D(128, (3, 3), activation = 'relu', kernel_regularizer = l2(0.0001), padding = 'same'),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    Dropout(0.4),
    # group 5
    SeparableConv2D(256, (3, 3), activation = 'relu', kernel_regularizer = l2(0.0001), padding = 'same'),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    Dropout(0.5),
    # global pooling
    GlobalAveragePooling2D(),
    Dense(units = 128, activation = 'relu', kernel_regularizer = l2(0.0001)),
    Dropout(0.5),
    Dense(units = num_classes, activation = 'softmax')
], name = "Multiclass_Model_1")

model.compile(
    optimizer = Adam(learning_rate = 0.0001),
    loss = CategoricalCrossentropy(),
    metrics = ['accuracy']
)

model.summary()

early_stopping = EarlyStopping(monitor = 'val_loss', patience = 5, restore_best_weights = True)

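# hold out 20% of the training data for validation so early stopping never sees the test set
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size = 0.2, random_state = 5322, stratify = np.argmax(y_train, axis = 1)
)

history = model.fit(
    datagen.flow(X_tr, y_tr, batch_size = 16),
    validation_data = (X_val, y_val),
    epochs = 100,
    callbacks = [early_stopping]
)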

# Evaluation
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f'\nTest accuracy: {test_acc}')

In [None]:
# predictions
y_pred_probs = model.predict(X_test)
y_pred = np.argmax(y_pred_probs, axis=1)
y_true = np.argmax(y_test, axis=1)

# get class labels in correct order (by integer encoding)
target_names = [label_map[i] for i in sorted(label_map.keys())]

print(classification_report(y_true, y_pred, target_names = target_names, digits = 4, zero_division = 0))

In [None]:
# confusion matrix
cm = confusion_matrix(y_true, y_pred)
plt.figure(figsize = (10, 8))
sns.heatmap(cm, annot = True, fmt = 'd', cmap = 'Blues')
plt.xlabel("Predicted")
plt.ylabel("True")
plt.title("Confusion Matrix")
plt.show()
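
To read the heatmap: rows are true classes and columns are predictions, so the diagonal entry divided by its row sum is that class's recall. A minimal pure-Python sketch on made-up labels for three classes (the helper names are hypothetical):

```python
def confusion(y_true, y_pred, num_classes):
    """Confusion matrix with true classes as rows and predictions as columns."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_recall(cm):
    # diagonal entry over the row total; 0.0 if a class has no samples
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]


# hypothetical integer labels for 3 classes
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 2, 2, 2, 2, 0]

cm = confusion(y_true, y_pred, 3)
recalls = per_class_recall(cm)
print(cm)       # [[2, 1, 0], [0, 1, 1], [1, 0, 3]]
print(recalls)  # approximately [0.667, 0.5, 0.75]
```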

Did not do as well as I thought it would.

## Multiclass Classification with adjustments
- added label smoothing of 0.05
- increased batch size to 64 from 16
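
For the multiclass case, Keras's `CategoricalCrossentropy` smooths a one-hot row as `y * (1 - label_smoothing) + label_smoothing / num_classes`. With 12 classes and a smoothing of 0.05 the true class target drops slightly below 1 and every other class gets a small nonzero target. A sketch of that arithmetic (`smooth_one_hot` is a hypothetical helper):

```python
def smooth_one_hot(one_hot_row, label_smoothing):
    """Spread label_smoothing evenly over all classes and shrink the hard label."""
    num_classes = len(one_hot_row)
    return [v * (1.0 - label_smoothing) + label_smoothing / num_classes for v in one_hot_row]


# a one-hot row for class index 3 out of 12 classes
row = [0.0] * 12
row[3] = 1.0

smoothed = smooth_one_hot(row, 0.05)
print(round(smoothed[3], 4))    # 0.9542
print(round(smoothed[0], 4))    # 0.0042
print(round(sum(smoothed), 6))  # 1.0
```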

In [None]:
# second multiclass model
model = Sequential([
    # layer group 1
    Conv2D(32, (3, 3), activation = 'relu', input_shape = (128, 517, 1), kernel_regularizer = l2(0.0001), padding = 'same'),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    Dropout(0.2),
    # group 2
    Conv2D(64, (3, 3), activation = 'relu', kernel_regularizer = l2(0.0001), padding = 'same'),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    Dropout(0.2),
    # group 3
    Conv2D(128, (3, 3), activation = 'relu', kernel_regularizer = l2(0.0001), padding = 'same'),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    Dropout(0.3),
    # group 4
    Conv2D(128, (3, 3), activation = 'relu', kernel_regularizer = l2(0.0001), padding = 'same'),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    Dropout(0.3),
    # group 5
    Conv2D(256, (3, 3), activation = 'relu', kernel_regularizer = l2(0.0001), padding = 'same'),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    Dropout(0.4),
    # global pooling
    GlobalAveragePooling2D(),
    Dense(units = 256, activation = 'relu', kernel_regularizer = l2(0.0001)),
    Dropout(0.3),
    Dense(units = 128, activation = 'relu', kernel_regularizer = l2(0.0001)),
    Dropout(0.4),
    Dense(units = num_classes, activation = 'softmax')
], name = "Multiclass_Model_2")

model.compile(
    optimizer = Adam(learning_rate = 0.0001),
    loss = CategoricalCrossentropy(label_smoothing = 0.05),
    metrics = ['accuracy']
)

model.summary()

early_stopping = EarlyStopping(monitor = 'val_loss', patience = 5, restore_best_weights = True)

history = model.fit(
    X_train, y_train,
    epochs = 100,
    batch_size = 64,
    validation_split = 0.2,
    callbacks=[early_stopping]
)

# Evaluation
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f'\nTest accuracy: {test_acc}')

In [None]:
# third multiclass model: same architecture as the second, batch size back to 16
model = Sequential([
    # layer group 1
    Conv2D(32, (3, 3), activation = 'relu', input_shape = (128, 517, 1), kernel_regularizer = l2(0.0001), padding = 'same'),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    Dropout(0.2),
    # group 2
    Conv2D(64, (3, 3), activation = 'relu', kernel_regularizer = l2(0.0001), padding = 'same'),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    Dropout(0.2),
    # group 3
    Conv2D(128, (3, 3), activation = 'relu', kernel_regularizer = l2(0.0001), padding = 'same'),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    Dropout(0.3),
    # group 4
    Conv2D(128, (3, 3), activation = 'relu', kernel_regularizer = l2(0.0001), padding = 'same'),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    Dropout(0.3),
    # group 5
    Conv2D(256, (3, 3), activation = 'relu', kernel_regularizer = l2(0.0001), padding = 'same'),
    BatchNormalization(),
    MaxPooling2D(2, 2),
    Dropout(0.4),
    # global pooling
    GlobalAveragePooling2D(),
    Dense(units = 256, activation = 'relu', kernel_regularizer = l2(0.0001)),
    Dropout(0.3),
    Dense(units = 128, activation = 'relu', kernel_regularizer = l2(0.0001)),
    Dropout(0.4),
    Dense(units = num_classes, activation = 'softmax')
], name = "Multiclass_Model_3")

model.compile(
    optimizer = Adam(learning_rate = 0.0001),
    loss = CategoricalCrossentropy(label_smoothing = 0.05),
    metrics = ['accuracy']
)

model.summary()

early_stopping = EarlyStopping(monitor = 'val_loss', patience = 5, restore_best_weights = True)

history = model.fit(
    X_train, y_train,
    epochs = 100,
    batch_size = 16,
    validation_split = 0.2,
    callbacks=[early_stopping]
)

# Evaluation
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f'\nTest accuracy: {test_acc}')