# Model Definition and Evaluation
## Table of Contents
1. [Model Selection](#model-selection)
2. [Feature Engineering](#feature-engineering)
3. [Hyperparameter Tuning](#hyperparameter-tuning)
4. [Implementation](#implementation)
5. [Evaluation Metrics](#evaluation-metrics)
6. [Comparative Analysis](#comparative-analysis)


In [1]:
#reset all variables when running again to avoid any mistakes

%reset -f

In [2]:
# Import necessary libraries
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.utils import compute_class_weight

import gc
import pandas as pd
import os
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from PIL import Image

import tensorflow as tf

from keras.src.legacy.preprocessing.image import ImageDataGenerator

from matplotlib.ticker import StrMethodFormatter


## Model Selection

We evaluate both self-defined CNNs and pretrained CNNs. Convolutional neural networks are well established for image classification. Transformer models might perform even better, but we consider it likely that the added complexity would not justify the potential gain in performance.

Transfer learning based on different model architectures is also performed. Different training methods are used and compared.


## Hyperparameters


In [None]:
# choose main hyperparameters here

#data / feature selections
balanced_flag = False

#Training data splits:
test_split = 0.2
val_split = 0.20 # remember - this fraction is applied after the test data has been split off from the initial dataset

# image parameters
target_size = (299,299) #pixel size to load img
#efficientnet V2S recommended img size 384 , https://www.kaggle.com/models/google/efficientnet-v2
#Inception V3 recommended image size 299x299

#select data augmentation
aug_flag = True
#augmentation params
horizontal_flip=False
vertical_flip=False
rotation_range=15
shear_range= 1
zoom_range = 0.07

#training
max_epochs = 100
loss_stop_patience = 7
learningRate = 0.001

loss_stop_patience_multi = [5,6,8] #patience in each step

#class weights
Use_class_weights = True

#early stopping
StoppingSelector = 'val_loss' #valid values: 'val_loss','val_f1'

#flags for models to include
model1_flag = False
model2_flag = False
model3_flag = False
feature_extract_flag = False
fine_tune_flag = False
full_fine_tune_flag = False
multi_phase_fine_tune_flag = False
reducedClassNumber_flag = False
EfficientNet_Flag = False

#set batch size according to balance selection
if balanced_flag:
    batch_size = 8 #later play around with batch size to see how it affects performance
else:
    batch_size = 32 #hopefully speeds up training
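
Because `val_split` is applied to what remains after the test split, the effective share of each subset differs from the raw values. The following is a minimal sketch of that arithmetic, using the split values set above; the helper name `effective_fractions` is hypothetical and not part of the notebook.

```python
def effective_fractions(test_split: float, val_split: float) -> dict:
    """Return the share of the full dataset that ends up in each subset."""
    remaining = 1.0 - test_split  # share left after the test set is removed
    return {
        "train": remaining * (1.0 - val_split),
        "val": remaining * val_split,
        "test": test_split,
    }


# values taken from the hyperparameter cell above
fractions = effective_fractions(test_split=0.2, val_split=0.2)
for name, share in fractions.items():
    print(f"{name}: {share:.2f}")
```

With both values at 0.2, roughly 64 % of the data is used for training, 16 % for validation and 20 % for testing.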

In [None]:
# specify the model loss function

#options: 'normal' :   tf.keras.losses.CategoricalCrossentropy()    , 'focal' : tf.keras.losses.CategoricalFocalCrossentropy()

lossSelect = 'normal'
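
As a rough illustration of what the 'focal' option changes, the sketch below compares plain cross-entropy with a focal variant for a single sample. It assumes the common focal-loss form `-alpha * (1 - p)**gamma * log(p)`, where `p` is the probability assigned to the true class. The function names are hypothetical, and `alpha = 1.0` is chosen only so that the two losses are directly comparable.

```python
import math


def cross_entropy(p_true: float) -> float:
    """Cross-entropy for one sample, given the probability of the true class."""
    return -math.log(p_true)


def focal_loss(p_true: float, alpha: float = 1.0, gamma: float = 2.0) -> float:
    """Focal variant: down-weights samples that are already classified well."""
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)


easy_sample = 0.9  # true class predicted with high confidence
hard_sample = 0.1  # true class predicted with low confidence

for p in (easy_sample, hard_sample):
    ratio = focal_loss(p) / cross_entropy(p)
    print(f"p={p}: CE={cross_entropy(p):.4f}, focal={focal_loss(p):.4f}, ratio={ratio:.2f}")
```

The modulating factor `(1 - p)**gamma` shrinks the loss of the confident prediction far more than that of the uncertain one, which is why focal loss is often tried on imbalanced data.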


In [5]:
#select optimizer

optimizer = tf.keras.optimizers.Adam(learning_rate=learningRate)

In [6]:
#make customizable string to add to dir for testing
if lossSelect == 'focal':
    custom_save_str = '_focalLoss'
else:
    custom_save_str = ''

## Feature Engineering


We test data augmentation (to varying degrees), dataset balancing and class weights.


In [7]:
# Path Definitions to relevant data + data loading

base_file_path = 'C:/Users/nikoLocal/Documents/Opencampus/Machine_Vision_challenge_data/'
image_path = os.path.join(base_file_path, 'input_train', 'input_train')

label_csv_name = 'Y_train_eVW9jym.csv'

#Loading .csv data to dataframes
train_df = pd.read_csv(os.path.join(base_file_path, label_csv_name))


In [8]:
#DataFrame Preprocessing

#add another column to the dataframe according to dictionaries to map Labels correctly to numbers
dict_numbers = {'GOOD': 0,'Boucle plate':1,'Lift-off blanc':2,'Lift-off noir':3,'Missing':4,'Short circuit MOS':5}
dict_strings = {'GOOD': '0_GOOD','Boucle plate':'1_Flat loop','Lift-off blanc':'2_White lift-off','Lift-off noir':'3_Black lift-off','Missing':'4_Missing','Short circuit MOS':'5_Short circuit MOS'}
# for Test Data ("random submission" dataframe)
dict_strings_sub = {0: '0_GOOD',1:'1_Flat loop',2:'2_White lift-off',3:'3_Black lift-off',4:'4_Missing',5:'5_Short circuit MOS',6:'6_Drift'}

#list of all labels in the data
label_list = ['0_GOOD','1_Flat loop','2_White lift-off','3_Black lift-off','4_Missing','5_Short circuit MOS']

#create new columns in DFs via .map() method
train_df['LabelNum'] = train_df['Label'].map(dict_numbers)
train_df['LabelStr'] = train_df['Label'].map(dict_strings)

#number of classes
num_classes = len(label_list)

# get counts of label with the least entries
countList = train_df['LabelStr'].value_counts()
minCounts = countList.min()

BalancedDF = pd.DataFrame()
#concat sampled dataframes for each included label
for i in range(num_classes):
    BalancedDF = pd.concat([BalancedDF,train_df[train_df['LabelStr'] == label_list[i]].sample(n=minCounts)],axis=0)

#split dataframe according to fractional test size
train_df_balanced, test_df_balanced = train_test_split(BalancedDF, test_size=test_split, random_state=42) #keep random state constant to ensure reproducibility

train_df_train, train_df_test = train_test_split(train_df, test_size=test_split, random_state=42) #keep random state constant to ensure reproducibility

In [9]:
#test if worked as intended
print('Balanced DF Label Counts:')
print(BalancedDF['LabelStr'].value_counts())

Balanced DF Label Counts:
LabelStr
0_GOOD                 71
1_Flat loop            71
2_White lift-off       71
3_Black lift-off       71
4_Missing              71
5_Short circuit MOS    71
Name: count, dtype: int64


In [10]:
#test if worked as intended
print('DF Label Counts:')
print(train_df['LabelStr'].value_counts())

DF Label Counts:
LabelStr
4_Missing              6472
0_GOOD                 1235
2_White lift-off        270
5_Short circuit MOS     126
3_Black lift-off        104
1_Flat loop              71
Name: count, dtype: int64


In [11]:
#compute class weights for use of unbalanced datasets

class_numbers = np.unique(train_df_train['LabelNum'])

class_weights_unb = compute_class_weight(class_weight='balanced' ,classes = class_numbers,y=train_df_train['LabelNum'])
class_weights_b = compute_class_weight(class_weight='balanced' ,classes = class_numbers,y=train_df_balanced['LabelNum'])
equal_weights = np.ones(num_classes)

#make dicts that can be used by keras
class_w_unb_dict = dict(zip(class_numbers, class_weights_unb))
class_w_b_dict = dict(zip(class_numbers, class_weights_b))
class_w_equal_dict = dict(zip(class_numbers, equal_weights))
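
To make the 'balanced' weighting more tangible, here is a small sketch that applies the heuristic `n_samples / (n_classes * class_count)` by hand. It uses the label counts of the full dataset as printed in the 'DF Label Counts' output of this notebook, so the numbers differ slightly from the weights computed on the training split; the variable names are illustrative only.

```python
# label counts of the full dataset, copied from the printed output
label_counts = {
    "0_GOOD": 1235,
    "1_Flat loop": 71,
    "2_White lift-off": 270,
    "3_Black lift-off": 104,
    "4_Missing": 6472,
    "5_Short circuit MOS": 126,
}

n_samples = sum(label_counts.values())
n_classes = len(label_counts)

# 'balanced' heuristic: rare classes receive proportionally larger weights
balanced_weights = {
    label: n_samples / (n_classes * count)
    for label, count in label_counts.items()
}

for label, weight in balanced_weights.items():
    print(f"{label:<20} {weight:.3f}")
```

The dominant class '4_Missing' ends up with a weight well below 1, while the rarest class '1_Flat loop' is weighted far above 1.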

In [12]:
# make a dataset with only labels: 0_GOOD, 1_Flat loop and 2_Defective
# in order to test whether flat loop can be recognized with high precision by a NN

#make a deepcopy of existing df
df_reduced_labels = train_df.copy(deep = True)

#dict_strings = {'GOOD': '0_GOOD','Boucle plate':'1_Flat loop','Lift-off blanc':'2_White lift-off','Lift-off noir':'3_Black lift-off','Missing':'4_Missing','Short circuit MOS':'5_Short circuit MOS'}

df_reduced_labels.loc[df_reduced_labels['LabelStr'] == '2_White lift-off', 'LabelStr'] = '2_Defective'
df_reduced_labels.loc[df_reduced_labels['LabelStr'] == '3_Black lift-off', 'LabelStr'] = '2_Defective'
df_reduced_labels.loc[df_reduced_labels['LabelStr'] == '4_Missing', 'LabelStr'] = '2_Defective'
df_reduced_labels.loc[df_reduced_labels['LabelStr'] == '5_Short circuit MOS', 'LabelStr'] = '2_Defective'

df_reduced_labels.loc[df_reduced_labels['LabelNum'] == 2, 'LabelNum'] = 2
df_reduced_labels.loc[df_reduced_labels['LabelNum'] == 3, 'LabelNum'] = 2
df_reduced_labels.loc[df_reduced_labels['LabelNum'] == 4, 'LabelNum'] = 2
df_reduced_labels.loc[df_reduced_labels['LabelNum'] == 5, 'LabelNum'] = 2

# split into training and test data
df_reduced_labels_train, df_reduced_labels_test = train_test_split(df_reduced_labels, test_size=test_split, random_state=42) #keep random state constant to ensure reproducibility

#class weights for this dataset
class_numbers_redLabel = np.unique(df_reduced_labels_train['LabelNum'])

class_weights_redLabel = compute_class_weight(class_weight='balanced' ,classes = class_numbers_redLabel,y=df_reduced_labels_train['LabelNum'])

#make dicts that can be used by keras
class_weights_redLabel_dict = dict(zip(class_numbers_redLabel, class_weights_redLabel))

num_classes_redLabel = 3

label_list_reduced = ['0_GOOD','1_Flat loop','2_Defective'] #spelling matches the values in the 'LabelStr' column

In [13]:
df_reduced_labels['LabelStr'].value_counts()

LabelStr
2_Defective    6972
0_GOOD         1235
1_Flat loop      71
Name: count, dtype: int64

## Hyperparameter Tuning

So far we have varied hyperparameters only by hand. Some parameters, such as the image size and the data augmentation, have been varied systematically and their effects on model performance noted.

Automated hyperparameter tuning has not been used within the scope of this project, although we are sure that there is considerable potential for improvement there.
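
For reference, an automated sweep over the settings that were varied by hand could start from a plain grid enumeration. The sketch below is not part of this project: the candidate values and the helper `make_run_name` are hypothetical, and the name format merely mirrors the folder naming used in this notebook.

```python
from itertools import product

# hypothetical candidate values for a grid sweep
image_sizes = [224, 299, 384]
aug_options = [True, False]
balanced_options = [True, False]


def make_run_name(img_size: int, aug: bool, balanced: bool) -> str:
    """Build a run name in the style 'ImgSz_<size>_<Aug|NoAug>_<balanced|unbalanced>'."""
    aug_str = "Aug" if aug else "NoAug"
    balance_str = "balanced" if balanced else "unbalanced"
    return f"ImgSz_{img_size}_{aug_str}_{balance_str}"


# every combination of the candidate values is one training run
grid = list(product(image_sizes, aug_options, balanced_options))
run_names = [make_run_name(*combo) for combo in grid]

for name in run_names:
    print(name)
```

Each entry of `grid` would correspond to one full training run, so the cost grows multiplicatively with every added hyperparameter.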

In [None]:
# make a unique string (name) to save model and evaluation to file
# incorporate most important hyperparameters
# make a subfolder for one set of hyperparameters for more tidy folder and file structure

if aug_flag:
    augmentation_str = 'Aug'
else:
    augmentation_str = 'NoAug'

if balanced_flag:
    balance_str = 'balanced'
else:
    balance_str = 'unbalanced'

#class weights
if Use_class_weights:
    Cweights_str = '_Cweights'
else:
    Cweights_str = ''

hyperparam_name = 'ImgSz_{}_{}_{}{}{}'.format(target_size[0],augmentation_str,balance_str,Cweights_str,custom_save_str)
hyperparam_dir = os.path.join(base_file_path,'model_evaluation')
hyperparam_dir = os.path.join(hyperparam_dir,hyperparam_name)
#check if folder exists - if not create it
if not os.path.isdir(hyperparam_dir):
    os.makedirs(hyperparam_dir)

In [None]:
# initialize ImageDataGenerators
# use ImageDataGen because it has method flow_from_dataframe() that works really well together with pandas dataframes
# https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator
# although deprecated the functionality can be used as discussed in feedback session

# HYPERPARAMETERS ########
class_mode = 'categorical' # how to store labels - either categorical (one-hot encoding) or as numbers
#class_mode = 'input'
labelCol = 'LabelStr'
#########################

#normalize pixel intensities
rescale = 1.0/255.0

datagen = ImageDataGenerator(
    horizontal_flip=False,
    vertical_flip=False,
    rotation_range=0.0,
    shear_range=0.0,
    rescale=rescale,
    validation_split=val_split)

datagen_augmentation = ImageDataGenerator(
    horizontal_flip=horizontal_flip,
    vertical_flip=vertical_flip,
    rotation_range=rotation_range,
    shear_range= shear_range,
    zoom_range = zoom_range,
    rescale=rescale,
    validation_split=val_split)

datagen_test = ImageDataGenerator(
    horizontal_flip=False,
    vertical_flip=False,
    rotation_range=0.0,
    shear_range=0.0,
    rescale=rescale,
    validation_split=0.0)


##########################################################

#unbalanced datasets

train_generator_unbalanced = datagen.flow_from_dataframe(
    train_df_train,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='training')

train_generator_unbalanced_val = datagen.flow_from_dataframe(
    train_df_train,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='validation')

train_generator_unbalanced_aug = datagen_augmentation.flow_from_dataframe(
    train_df_train,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='training')

train_generator_unbalanced_val_aug = datagen_augmentation.flow_from_dataframe(
    train_df_train,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='validation')

test_generator_unbalanced = datagen_test.flow_from_dataframe(
    train_df_test,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='training')

test_generator_unbalanced_metrics = datagen_test.flow_from_dataframe(
    train_df_test,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=1,
    color_mode="grayscale",
    shuffle=False,
    seed=42,
    subset='training')

train_generator = datagen.flow_from_dataframe(
    train_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='training')

train_generator_val = datagen.flow_from_dataframe(
    train_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='validation')

train_generator_aug = datagen_augmentation.flow_from_dataframe(
    train_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='training')

train_generator_aug_val = datagen_augmentation.flow_from_dataframe(
    train_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='validation')

test_generator = datagen_test.flow_from_dataframe(
    test_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='training')

test_generator_metrics = datagen_test.flow_from_dataframe(
    test_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=1,
    color_mode="grayscale",
    shuffle=False,
    seed=42,
    subset='training')

# generators for transfer learning - color mode is color here. Pretrained models expect color input

train_generator_unbalanced_color = datagen.flow_from_dataframe(
    train_df_train,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='training')

train_generator_unbalanced_val_color = datagen.flow_from_dataframe(
    train_df_train,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='validation')

test_generator_unbalanced_color = datagen_test.flow_from_dataframe(
    train_df_test,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='training')

train_generator_unbalanced_aug_color = datagen_augmentation.flow_from_dataframe(
    train_df_train,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='training')

test_generator_unbalanced_metrics_color = datagen_test.flow_from_dataframe(
    train_df_test,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=1,
    color_mode="rgb",
    shuffle=False,
    seed=42,
    subset='training')

train_generator_color = datagen.flow_from_dataframe(
    train_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='training')

train_generator_val_color = datagen.flow_from_dataframe(
    train_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='validation')

train_generator_aug_color = datagen_augmentation.flow_from_dataframe(
    train_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='training')

train_generator_aug_val_color = datagen_augmentation.flow_from_dataframe(
    train_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='validation')

test_generator_color = datagen_test.flow_from_dataframe(
    test_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='training')

test_generator_metrics_color = datagen_test.flow_from_dataframe(
    test_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=1,
    color_mode="rgb",
    shuffle=False,
    seed=42,
    subset='training')


In [None]:
#dataset for reduced class number

train_generator_redLabel = datagen.flow_from_dataframe(
    df_reduced_labels_train,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='training')

train_generator_redLabel_val = datagen.flow_from_dataframe(
    df_reduced_labels_train,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='validation')

test_generator_redLabel = datagen_test.flow_from_dataframe(
    df_reduced_labels_test,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='training')

train_generator_redLabel_aug = datagen_augmentation.flow_from_dataframe(
    df_reduced_labels_train,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='training')

test_generator_redLabel_metrics = datagen_test.flow_from_dataframe(
    df_reduced_labels_test,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=1,
    color_mode="rgb",
    shuffle=False,
    seed=42,
    subset='training')

## Implementation


Here we implement various self-defined and preexisting models to compare their performance. All models are always initialized, but not all of them are trained in every run, to save execution time.

In [None]:
# build a model to be used as baseline model
# use "simplest" CNN as baseline

model_1_CNN = tf.keras.Sequential([
    tf.keras.layers.Input((target_size[0], target_size[1], 1)),  #images are greyscale - so the total dim is (height,width,1)
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])

model_2_CNN = tf.keras.Sequential([
    tf.keras.layers.Input((target_size[0], target_size[1], 1)),  #images are greyscale - so the total dim is (height,width,1)
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])

model_3_CNN = tf.keras.Sequential([
    tf.keras.layers.Input((target_size[0], target_size[1], 1)),  #images are greyscale - so the total dim is (height,width,1)
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Conv2D(128, (3, 3), activation="relu"),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])

model_1_CNN.summary()
model_2_CNN.summary()
model_3_CNN.summary()
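
The size of the baseline model can be estimated by hand before looking at the summary. The sketch below walks through `model_1_CNN`, assuming the values used in this notebook (299x299 greyscale input, 6 classes) and the Keras defaults of 'valid' padding and stride 1 for the convolution; the helper names are illustrative.

```python
def conv_out(size: int, kernel: int = 3) -> int:
    """Output size of a convolution with 'valid' padding and stride 1."""
    return size - kernel + 1


def pool_out(size: int, pool: int = 2) -> int:
    """Output size of max pooling with equal pool size and stride."""
    return size // pool


input_side, n_filters, dense_units, n_classes = 299, 64, 64, 6

side = pool_out(conv_out(input_side))      # spatial size after conv + pooling
flat_features = side * side * n_filters    # length of the flattened vector

conv_params = (3 * 3 * 1 + 1) * n_filters          # kernel weights + bias per filter
dense_params = (flat_features + 1) * dense_units   # weights + bias per unit
output_params = (dense_units + 1) * n_classes

total_params = conv_params + dense_params + output_params
print(f"feature map: {side}x{side}x{n_filters}, flattened: {flat_features}")
print(f"total parameters: {total_params:,}")
```

Almost all parameters sit in the first dense layer after `Flatten`, which is one reason the deeper variants with more pooling stages are not necessarily larger.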


In [None]:
#loss

if balanced_flag:
    focal_alpha = class_weights_b
else:
    focal_alpha = class_weights_unb

# callback that monitors validation accuracy / loss
# https://keras.io/api/callbacks/early_stopping/


match StoppingSelector:
    case 'val_loss':
        StopCallback = tf.keras.callbacks.EarlyStopping(
            monitor='val_loss',
            min_delta=0.01,
            patience= loss_stop_patience,
            restore_best_weights=True,
            verbose = 2,
            start_from_epoch = 1
        )
    case 'val_f1':
        StopCallback = tf.keras.callbacks.EarlyStopping(
            monitor='val_f1_score',
            min_delta=0.005,
            patience= loss_stop_patience,
            restore_best_weights=True,
            verbose = 2,
            mode='max'
        )

class ResetValLossOnTrainBegin(tf.keras.callbacks.Callback):
    def on_train_begin(self, logs=None):
        #set val loss to a high value in case there is a history left
        logs["val_loss"] =  1e3
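
The interplay of `patience` and `min_delta` can be illustrated without training a model. The following is a simplified sketch of the stopping rule for a monitored loss, not the Keras implementation itself; the function `simulate_early_stopping` and the loss values are made up for illustration.

```python
def simulate_early_stopping(val_losses, patience, min_delta):
    """Return (stopped_epoch, best_epoch); stopped_epoch is None if training runs to the end."""
    best = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # improvement larger than min_delta: remember it and reset the counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# made-up validation losses: the last improvements are smaller than min_delta
history = [1.00, 0.80, 0.795, 0.792, 0.70]
stopped_epoch, best_epoch = simulate_early_stopping(history, patience=2, min_delta=0.01)
print(f"stopped at epoch {stopped_epoch}, best epoch was {best_epoch}")
```

In this toy history the run stops before the later, larger improvement is reached, which shows why a small `patience` combined with a large `min_delta` can end training prematurely.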


In [None]:
# load weights gained by training from medical MRI data into inception V3

# Load pre-trained InceptionV3 with correct input size

#path to weights
weights_subpath = 'pre_trained_models/radiology_net/InceptionV3.pth'
medical_weights_path = os.path.join(base_file_path,weights_subpath)

#the weights are stored as a PyTorch checkpoint (.pth) and still need to be converted to Keras format, so loading is commented out

#base_transfer_model_medical = tf.keras.applications.InceptionV3(
#    weights=medical_weights_path,
#    include_top=False,
#    input_shape=(target_size[0], target_size[0], 3)
#)


In [None]:
#Transfer learning model - feature extraction

# Load pre-trained InceptionV3 with correct input size
base_transfer_model = tf.keras.applications.InceptionV3(
    weights='imagenet',
    include_top=False,
    input_shape=(target_size[0], target_size[1], 3)
)

# Freeze all layers for feature extraction
base_transfer_model.trainable = False

# Simple classification head
# - GlobalAveragePooling2D reduces spatial dimensions
# - Final Dense layer maps to class probabilities
inception_feature_extraction_model = tf.keras.Sequential([
    base_transfer_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(num_classes, activation='softmax')
])

#select based on Str
if lossSelect == 'normal':
    loss_fun_feat = tf.keras.losses.CategoricalCrossentropy()
else:
    loss_fun_feat = tf.keras.losses.CategoricalFocalCrossentropy(alpha = focal_alpha,gamma = 2)

inception_feature_extraction_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=learningRate),
    loss=loss_fun_feat,
    #loss=tf.keras.losses.SparseCategoricalCrossentropy,
    #metrics=["accuracy",'precision',]
    weighted_metrics=["accuracy",'precision','recall',tf.keras.metrics.F1Score(average='weighted')]
)

inception_feature_extraction_model.summary()

In [None]:
# inception model with last 20 layers unfrozen

# Load pre-trained InceptionV3 with correct input size
base_transfer_model_2 = tf.keras.applications.InceptionV3(
    weights='imagenet',
    include_top=False,
    input_shape=(target_size[0], target_size[1], 3)
)

# Freeze all layers except last few blocks
base_transfer_model_2.trainable = True
for layer in base_transfer_model_2.layers[:-20]:
    layer.trainable = False

# Simple classification head
# - GlobalAveragePooling2D reduces spatial dimensions
# - Final Dense layer maps to class probabilities
inception_fine_tune_model = tf.keras.Sequential([
    base_transfer_model_2,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(num_classes, activation='softmax')
])

#select based on Str
if lossSelect == 'normal':
    loss_fun_fine = tf.keras.losses.CategoricalCrossentropy()
else:
    loss_fun_fine = tf.keras.losses.CategoricalFocalCrossentropy(alpha = focal_alpha,gamma = 2)

inception_fine_tune_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss=loss_fun_fine,
    #loss=tf.keras.losses.SparseCategoricalCrossentropy,
    #metrics=["accuracy",'precision',]
    weighted_metrics=["accuracy",'precision','recall',tf.keras.metrics.F1Score(average='weighted')]
)

inception_fine_tune_model.summary()
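
The slicing pattern used for partial fine-tuning can be checked on stand-in objects. The sketch below uses a hypothetical `DummyLayer` class instead of real Keras layers, and an arbitrary layer count of 50, only to show which layers `layers[:-20]` leaves trainable.

```python
from dataclasses import dataclass


@dataclass
class DummyLayer:
    """Stand-in for a Keras layer; only the attributes needed here."""
    name: str
    trainable: bool = True


# arbitrary layer count, chosen for illustration only
layers = [DummyLayer(f"layer_{i}") for i in range(50)]

n_unfrozen = 20
for layer in layers[:-n_unfrozen]:  # everything except the last 20 layers
    layer.trainable = False

trainable_names = [layer.name for layer in layers if layer.trainable]
print(f"{len(trainable_names)} trainable layers, first: {trainable_names[0]}")
```

Only the last `n_unfrozen` entries stay trainable, regardless of how many layers the base model has.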

In [23]:
# inception model with all layers unfrozen

# Load pre-trained InceptionV3 with correct input size
base_transfer_model_3 = tf.keras.applications.InceptionV3(
    weights='imagenet',
    include_top=False,
    input_shape=(target_size[0], target_size[1], 3)
)

# Unfreeze all layers for full fine-tuning
base_transfer_model_3.trainable = True
#for layer in base_transfer_model_3.layers[:-20]:
#    layer.trainable = False

# Simple classification head
# - GlobalAveragePooling2D reduces spatial dimensions
# - Final Dense layer maps to class probabilities
inception_full_fine_tune_model = tf.keras.Sequential([
    base_transfer_model_3,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2), #do I need to keep this ?
    tf.keras.layers.Dense(num_classes, activation = 'softmax')
])

#select based on Str
if lossSelect == 'normal':
    loss_fun_full_fine = tf.keras.losses.CategoricalCrossentropy()
else:
    loss_fun_full_fine = tf.keras.losses.CategoricalFocalCrossentropy(alpha = focal_alpha,gamma = 2)

inception_full_fine_tune_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss=loss_fun_full_fine,
    #loss=tf.keras.losses.SparseCategoricalCrossentropy,
    #metrics=["accuracy",'precision',]
    weighted_metrics=["accuracy",'precision','recall',tf.keras.metrics.F1Score(average='weighted')]
)

inception_full_fine_tune_model.summary()

In [24]:
# inception model where layers are successively unfrozen during training

# Load pre-trained InceptionV3 with correct input size
base_transfer_model_4 = tf.keras.applications.InceptionV3(
    weights='imagenet',
    include_top=False,
    input_shape=(target_size[0], target_size[1], 3)
)

# Freeze all layers of Inception model
base_transfer_model_4.trainable = False

# Simple classification head
# - GlobalAveragePooling2D reduces spatial dimensions
# - Final Dense layer maps to class probabilities
inception_multiPhase_fine_tune_model = tf.keras.Sequential([
    base_transfer_model_4,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(num_classes, activation = 'softmax')
])

#select based on Str
if lossSelect == 'normal':
    loss_fun_multi_fine = tf.keras.losses.CategoricalCrossentropy()
else:
    loss_fun_multi_fine = tf.keras.losses.CategoricalFocalCrossentropy(alpha = focal_alpha,gamma = 2)

inception_multiPhase_fine_tune_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), #start with "normal" learning rate
    loss=loss_fun_multi_fine,
    weighted_metrics=["accuracy",'precision','recall',tf.keras.metrics.F1Score(average='weighted')]
)

inception_multiPhase_fine_tune_model.summary()

In [25]:
# inception model where layers are successively unfrozen during training - for reduced class number

# Load pre-trained InceptionV3 with correct input size
base_transfer_model_5 = tf.keras.applications.InceptionV3(
    weights='imagenet',
    include_top=False,
    input_shape=(target_size[0], target_size[1], 3)
)

# Freeze all layers of the Inception model
base_transfer_model_5.trainable = False
#for layer in base_transfer_model_3.layers[:-20]:
#    layer.trainable = False

# Simple classification head
# - GlobalAveragePooling2D reduces spatial dimensions
# - Final Dense layer maps to class probabilities
inception_multiPhase_redLabel = tf.keras.Sequential([
    base_transfer_model_5,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2), #do I need to keep this ?
    tf.keras.layers.Dense(num_classes_redLabel, activation = 'softmax')
])

#select based on Str
if lossSelect == 'normal':
    loss_fun_multi_redLabel = tf.keras.losses.CategoricalCrossentropy()
else:
    loss_fun_multi_redLabel = tf.keras.losses.CategoricalFocalCrossentropy(alpha = class_weights_redLabel,gamma = 2) #use the weights of the 3-class dataset

inception_multiPhase_redLabel.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), #start with "normal" learning rate
    loss=loss_fun_multi_redLabel,
    #loss=tf.keras.losses.SparseCategoricalCrossentropy,
    #metrics=["accuracy",'precision',]
    weighted_metrics=["accuracy",'precision','recall',tf.keras.metrics.F1Score(average='weighted')]
)

inception_multiPhase_redLabel.summary()

In [26]:
# EfficientNet model where layers are successively unfrozen during training

# Load pre-trained EfficientNetV2S with correct input size
base_model_efficientNet = tf.keras.applications.EfficientNetV2S(
    weights='imagenet',
    include_top=False,
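    include_preprocessing=True, #note: with built-in preprocessing the model expects pixel values in [0, 255], while the generators above rescale to [0, 1]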
    input_shape=(target_size[0], target_size[1], 3)
)

# Freeze all layers of the EfficientNet model
base_model_efficientNet.trainable = False
#for layer in base_transfer_model_3.layers[:-20]:
#    layer.trainable = False

# Simple classification head
# - GlobalAveragePooling2D reduces spatial dimensions
# - Final Dense layer maps to class probabilities
EfficientNet_multiPhase_model = tf.keras.Sequential([
    base_model_efficientNet,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2), #do I need to keep this ?
    tf.keras.layers.Dense(num_classes, activation = 'softmax')
])

#select based on Str
if lossSelect == 'normal':
    loss_fun_EfficientNet = tf.keras.losses.CategoricalCrossentropy()
else:
    loss_fun_EfficientNet = tf.keras.losses.CategoricalFocalCrossentropy(alpha = focal_alpha,gamma = 2)

EfficientNet_multiPhase_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), #start with "normal" learning rate
    loss=loss_fun_EfficientNet,
    #loss=tf.keras.losses.SparseCategoricalCrossentropy,
    #metrics=["accuracy",'precision',]
    weighted_metrics=["accuracy",'precision','recall',tf.keras.metrics.F1Score(average='weighted')]
)

EfficientNet_multiPhase_model.summary()

In [27]:
# compile Models

#select based on Str
if lossSelect == 'normal':
    loss_fun_1 = tf.keras.losses.CategoricalCrossentropy()
else:
    loss_fun_1 = tf.keras.losses.CategoricalFocalCrossentropy(alpha = focal_alpha,gamma = 2)

model_1_CNN.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=learningRate),
    loss=loss_fun_1,
    #loss=tf.keras.losses.SparseCategoricalCrossentropy,
    #metrics=["accuracy",'precision',]
    weighted_metrics=["accuracy",'precision','recall',tf.keras.metrics.F1Score(average='weighted')]
)

#select based on Str
if lossSelect == 'normal':
    loss_fun_2 = tf.keras.losses.CategoricalCrossentropy()
else:
    loss_fun_2 = tf.keras.losses.CategoricalFocalCrossentropy(alpha = focal_alpha,gamma = 2)

model_2_CNN.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=learningRate),
    loss=loss_fun_2,
    #loss=tf.keras.losses.SparseCategoricalCrossentropy,
    #metrics=["accuracy",'precision',]
    weighted_metrics=["accuracy",'precision','recall',tf.keras.metrics.F1Score(average='weighted')]
)

#select based on Str
if lossSelect == 'normal':
    loss_fun_3 = tf.keras.losses.CategoricalCrossentropy()
else:
    loss_fun_3 = tf.keras.losses.CategoricalFocalCrossentropy(alpha = focal_alpha,gamma = 2)

model_3_CNN.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=learningRate),
    loss=loss_fun_3,
    #loss=tf.keras.losses.SparseCategoricalCrossentropy,
    #metrics=["accuracy",'precision',]
    #metrics=["accuracy",'precision','recall',tf.keras.metrics.F1Score(average='weighted')]
    weighted_metrics =["accuracy",'precision','recall',tf.keras.metrics.F1Score(average='weighted')]
)

# The F1Score `average` parameter must be anything other than None when using line-wise output while fitting the model
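
The compile cells above switch between plain categorical cross-entropy and `CategoricalFocalCrossentropy` with `gamma = 2`. As a rough illustration of why the focal variant is interesting for imbalanced data, the following NumPy sketch re-implements the commonly documented focal formula (it is not the Keras source). The value `alpha=0.25` and the two example predictions are assumptions for illustration; the notebook's `focal_alpha` is defined elsewhere.

```python
import numpy as np


def categorical_focal_crossentropy(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Per-sample focal loss for one-hot targets and softmax-like outputs.

    Simplified sketch: alpha * (1 - p)^gamma * (-y * log(p)), summed over classes.
    """
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    cross_entropy = -y_true * np.log(y_pred)
    modulating_factor = alpha * (1.0 - y_pred) ** gamma
    return np.sum(modulating_factor * cross_entropy, axis=-1)


# Two samples of the same true class: one easy, one hard (hypothetical values)
y_true = np.array([[1.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0]])
y_pred = np.array([[0.90, 0.05, 0.05],   # confident and correct
                   [0.30, 0.40, 0.30]])  # uncertain and wrong

focal = categorical_focal_crossentropy(y_true, y_pred)
plain = -np.sum(y_true * np.log(y_pred), axis=-1)

print("plain CE :", plain)
print("focal CE :", focal)
print("hard/easy ratio, plain:", plain[1] / plain[0])
print("hard/easy ratio, focal:", focal[1] / focal[0])
```

The hard sample dominates the focal loss far more strongly than the plain cross-entropy, which is the intended effect: well-classified samples of the majority class contribute very little to the gradient.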



In [None]:
#select model datasets based on flags

#for model1 - 3
if balanced_flag:
    CNN_model_val_gen = train_generator_val
    if aug_flag:
        CNN_model_gen = train_generator_aug
    else:
        CNN_model_gen = train_generator
    #class weights
    class_weights_training = class_weights_b
else:
    CNN_model_val_gen = train_generator_unbalanced_val
    if aug_flag:
        CNN_model_gen = train_generator_unbalanced_aug
    else:
        CNN_model_gen = train_generator_unbalanced

#for transfer learning
if balanced_flag:
    transfer_model_val_gen = train_generator_val_color
    if aug_flag:
        transfer_model_gen = train_generator_aug_color
    else:
        transfer_model_gen = train_generator_color
else:
    transfer_model_val_gen = train_generator_unbalanced_val_color
    if aug_flag:
        transfer_model_gen = train_generator_unbalanced_aug_color
    else:
        transfer_model_gen = train_generator_unbalanced_color


if Use_class_weights:
    if balanced_flag:
        class_weights_training = class_w_b_dict
    else:
        class_weights_training = class_w_unb_dict
else:
    class_weights_training = class_w_equal_dict
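
The class-weight dictionaries selected above are built earlier in the notebook. As a minimal sketch of how such dictionaries could be derived, the example below uses scikit-learn's `compute_class_weight` with the `"balanced"` heuristic. The label counts are only illustrative (they reuse the per-class supports from the test report further down), and the names `class_w_balanced` and `class_w_equal` are hypothetical.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Illustrative label counts per class index (assumption, not the training split)
counts = {0: 238, 1: 15, 2: 57, 3: 13, 4: 1316, 5: 17}
labels = np.repeat(list(counts.keys()), list(counts.values()))
classes = np.unique(labels)

# "balanced" assigns n_samples / (n_classes * count_of_class) to each class
weights = compute_class_weight(class_weight="balanced", classes=classes, y=labels)

# Keras expects a dict {class_index: weight} for the `class_weight` argument of fit()
class_w_balanced = {int(c): float(w) for c, w in zip(classes, weights)}
# Equal weights are the neutral choice when class weighting is switched off
class_w_equal = {int(c): 1.0 for c in classes}

print(class_w_balanced)
print(class_w_equal)
```

Rare classes receive weights well above 1 and the dominant class a weight below 1, so each class contributes roughly equally to the weighted loss.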


In the next sections, each model is trained only if its corresponding flag is set to True.

In [None]:
#model 1
if model1_flag:

    history_1 = model_1_CNN.fit(
    CNN_model_gen,
    validation_data = CNN_model_val_gen,
    epochs=max_epochs,
    class_weight = class_weights_training,
    callbacks=[StopCallback,ResetValLossOnTrainBegin()],
    verbose = 2 #2 is one line per epoch -
    )

In [None]:
#Model 2

if model2_flag:
    history_2 = model_2_CNN.fit(
        CNN_model_gen,
        validation_data = CNN_model_val_gen,
        epochs=max_epochs,
        class_weight = class_weights_training,
        callbacks=[StopCallback,ResetValLossOnTrainBegin()],
        verbose = 2 #2 is one line per epoch -
    )

In [31]:
#Model 3

if model3_flag:
    # model_3_CNN.fit(
    #     CNN_model_gen,
    #     validation_data = CNN_model_val_gen,
    #     epochs=3,
    #     class_weight = class_weights_training,
    #     callbacks=[StopCallback,ResetValLossOnTrainBegin()],
    #     verbose = 2 #2 is one line per epoch -
    # )

    history_3 = model_3_CNN.fit(
        CNN_model_gen,
        validation_data = CNN_model_val_gen,
        epochs=max_epochs,
        class_weight = class_weights_training,
        callbacks=[StopCallback,ResetValLossOnTrainBegin()],
        verbose = 2 #2 is one line per epoch -
    )

Epoch 1/100
166/166 - 177s - 1s/step - accuracy: 0.5454 - f1_score: 0.5461 - loss: 1.2161 - precision: 0.7770 - recall: 0.4018 - val_accuracy: 0.9645 - val_f1_score: 0.9690 - val_loss: 0.4443 - val_precision: 0.9744 - val_recall: 0.8625
Epoch 2/100
166/166 - 174s - 1s/step - accuracy: 0.8723 - f1_score: 0.8720 - loss: 0.4077 - precision: 0.8914 - recall: 0.8427 - val_accuracy: 0.9751 - val_f1_score: 0.9767 - val_loss: 0.1192 - val_precision: 0.9758 - val_recall: 0.9743
Epoch 3/100
166/166 - 171s - 1s/step - accuracy: 0.8923 - f1_score: 0.8919 - loss: 0.3425 - precision: 0.9056 - recall: 0.8716 - val_accuracy: 0.9607 - val_f1_score: 0.9702 - val_loss: 0.2540 - val_precision: 0.9845 - val_recall: 0.8640
Epoch 4/100
166/166 - 175s - 1s/step - accuracy: 0.9041 - f1_score: 0.9035 - loss: 0.2755 - precision: 0.9100 - recall: 0.8899 - val_accuracy: 0.9622 - val_f1_score: 0.9684 - val_loss: 0.1803 - val_precision: 0.9644 - val_recall: 0.9622
Epoch 5/100
166/166 - 169s - 1s/step - accuracy: 0.9

In [32]:
#transfer learning. Feature extraction

if feature_extract_flag:
    history_feat_extract = inception_feature_extraction_model.fit(
        transfer_model_gen,
        validation_data = transfer_model_val_gen,
        epochs=max_epochs,
        class_weight = class_weights_training,
        callbacks=[StopCallback,ResetValLossOnTrainBegin()],
        verbose = 2 #2 is one line per epoch -
    )

Epoch 1/100
166/166 - 248s - 1s/step - accuracy: 0.5702 - f1_score: 0.5664 - loss: 1.1982 - precision: 0.7259 - recall: 0.3830 - val_accuracy: 0.9199 - val_f1_score: 0.9373 - val_loss: 0.2503 - val_precision: 0.9643 - val_recall: 0.8761
Epoch 2/100
166/166 - 241s - 1s/step - accuracy: 0.7600 - f1_score: 0.7594 - loss: 0.6700 - precision: 0.8455 - recall: 0.6917 - val_accuracy: 0.9562 - val_f1_score: 0.9624 - val_loss: 0.1366 - val_precision: 0.9690 - val_recall: 0.9441
Epoch 3/100
166/166 - 243s - 1s/step - accuracy: 0.7792 - f1_score: 0.7776 - loss: 0.5847 - precision: 0.8360 - recall: 0.7236 - val_accuracy: 0.9381 - val_f1_score: 0.9461 - val_loss: 0.1678 - val_precision: 0.9616 - val_recall: 0.9267
Epoch 4/100
166/166 - 239s - 1s/step - accuracy: 0.8357 - f1_score: 0.8356 - loss: 0.4923 - precision: 0.8733 - recall: 0.7817 - val_accuracy: 0.9471 - val_f1_score: 0.9549 - val_loss: 0.1996 - val_precision: 0.9576 - val_recall: 0.9222
Epoch 5/100
166/166 - 243s - 1s/step - accuracy: 0.8

In [33]:
#transfer learning. fine tuning

if fine_tune_flag:
    # inception_fine_tune_model.fit(
    #     transfer_model_gen,
    #     validation_data = transfer_model_val_gen,
    #     epochs=3,
    #     class_weight = class_weights_training,
    #     callbacks=[StopCallback,ResetValLossOnTrainBegin()],
    #     verbose = 2 #2 is one line per epoch -
    # )

    history_fine_tune = inception_fine_tune_model.fit(
        transfer_model_gen,
        validation_data = transfer_model_val_gen,
        epochs=max_epochs,
        class_weight = class_weights_training,
        callbacks=[StopCallback,ResetValLossOnTrainBegin()],
        verbose = 2 #2 is one line per epoch -
    )

In [34]:
# transfer learning. fine tuning of full model
# probably make a scheduler for the learning rate.
# Also - train model successively

if full_fine_tune_flag:
    # inception_fine_tune_model.fit(
    #     transfer_model_gen,
    #     validation_data = transfer_model_val_gen,
    #     epochs=3,
    #     class_weight = class_weights_training,
    #     callbacks=[StopCallback,ResetValLossOnTrainBegin()],
    #     verbose = 2 #2 is one line per epoch -
    # )

    history_full_fine_tune = inception_full_fine_tune_model.fit(
        transfer_model_gen,
        validation_data = transfer_model_val_gen,
        epochs=max_epochs,
        class_weight = class_weights_training,
        callbacks=[StopCallback,ResetValLossOnTrainBegin()],
        verbose = 2 #2 is one line per epoch -
    )

In [35]:
# transfer learning. fine tuning of full model
# multi phase fine tuning

#delete history if it already exists to ensure expected local stopping behaviour
if 'history_multi_fine_tune' in locals():
    del history_multi_fine_tune

if multi_phase_fine_tune_flag:

    match StoppingSelector:
        case 'val_loss':
            StopCallback_multi = tf.keras.callbacks.EarlyStopping(
                monitor='val_loss',
                min_delta=0.01,
                patience= loss_stop_patience_multi[0],
                restore_best_weights=True,
                verbose = 2,
                start_from_epoch = 1
            )
        case 'val_f1':
            StopCallback_multi = tf.keras.callbacks.EarlyStopping(
                monitor='val_f1_score',
                min_delta=0.005,
                patience= loss_stop_patience_multi[0],
                restore_best_weights=True,
                verbose = 2,
                mode='max'
            )

    # train last layer first
    history_multi_fine_tune = inception_multiPhase_fine_tune_model.fit(
        transfer_model_gen,
        validation_data = transfer_model_val_gen,
        epochs=max_epochs,
        class_weight = class_weights_training,
        callbacks=[StopCallback_multi],
        verbose = 2 #2 is one line per epoch -
    )
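
To make the roles of `patience` and `min_delta` in the `'val_loss'` branch more tangible, here is a simplified pure-Python simulation of patience-based early stopping. It is a sketch of the general idea, not the Keras implementation, and it ignores `start_from_epoch`. The validation losses are the first four `val_loss` values from the Model 3 log above; `patience=2` is an assumed value, since `loss_stop_patience_multi` is defined elsewhere.

```python
def simulate_early_stopping(val_losses, patience, min_delta=0.01):
    """Return (stop_epoch, best_epoch) as 0-based indices for a 'min' monitor.

    An epoch counts as an improvement only if the loss beats the best value
    seen so far by more than min_delta. stop_epoch is None if training is
    never stopped early.
    """
    best = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


val_losses = [0.4443, 0.1192, 0.2540, 0.1803]
stop_epoch, best_epoch = simulate_early_stopping(val_losses, patience=2)
print(f"stopped after epoch index {stop_epoch}, best epoch index {best_epoch}")
```

With `restore_best_weights=True`, the weights of the best epoch would be the ones kept after stopping, which is why later noisy epochs do not degrade the saved model.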

In [36]:
if multi_phase_fine_tune_flag:
    #delete old history to avoid unexpected early stopping behaviour
    del history_multi_fine_tune

    #set last 30 layers to be trainable
    for layer in base_transfer_model_4.layers[-30:]:
        layer.trainable = True

    match StoppingSelector:
        case 'val_loss':
            StopCallback_multi = tf.keras.callbacks.EarlyStopping(
                monitor='val_loss',
                min_delta=0.01,
                patience= loss_stop_patience_multi[1],
                restore_best_weights=True,
                verbose = 2,
                start_from_epoch = 1
            )
        case 'val_f1':
            StopCallback_multi = tf.keras.callbacks.EarlyStopping(
                monitor='val_f1_score',
                min_delta=0.005,
                patience= loss_stop_patience_multi[1],
                restore_best_weights=True,
                verbose = 2,
                mode='max'
            )

    inception_multiPhase_fine_tune_model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), #same learning rate as in the first phase; it is lowered in the last phase
        loss=loss_fun_multi_fine,
        #loss=tf.keras.losses.SparseCategoricalCrossentropy,
        #metrics=["accuracy",'precision',]
        weighted_metrics=["accuracy",'precision','recall',tf.keras.metrics.F1Score(average='weighted')]
    )

    history_multi_fine_tune = inception_multiPhase_fine_tune_model.fit(
        transfer_model_gen,
        validation_data = transfer_model_val_gen,
        epochs=max_epochs,
        class_weight = class_weights_training,
        callbacks=[StopCallback_multi,ResetValLossOnTrainBegin()],
        verbose = 2 #2 is one line per epoch -
    )

In [37]:
if multi_phase_fine_tune_flag:

    match StoppingSelector:
        case 'val_loss':
            StopCallback_multi = tf.keras.callbacks.EarlyStopping(
                monitor='val_loss',
                min_delta=0.01,
                patience= loss_stop_patience_multi[2],
                restore_best_weights=True,
                verbose = 2,
                start_from_epoch = 1
            )
        case 'val_f1':
            StopCallback_multi = tf.keras.callbacks.EarlyStopping(
                monitor='val_f1_score',
                min_delta=0.005,
                patience= loss_stop_patience_multi[2],
                restore_best_weights=True,
                verbose = 2,
                mode='max'
            )

    #delete old history to avoid unexpected early stopping behaviour
    del history_multi_fine_tune

    #set last 100 layers to trainable
    for layer in base_transfer_model_4.layers[-100:]:
        layer.trainable = True

    inception_multiPhase_fine_tune_model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4), #go to lower learning rate
        loss=loss_fun_multi_fine,
        #loss=tf.keras.losses.SparseCategoricalCrossentropy,
        #metrics=["accuracy",'precision',]
        weighted_metrics=["accuracy",'precision','recall',tf.keras.metrics.F1Score(average='weighted')]
    )

    history_multi_fine_tune = inception_multiPhase_fine_tune_model.fit(
        transfer_model_gen,
        validation_data = transfer_model_val_gen,
        epochs=max_epochs,
        class_weight = class_weights_training,
        callbacks=[StopCallback_multi,ResetValLossOnTrainBegin()],
        verbose = 2 #2 is one line per epoch -
    )
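
The three cells above follow one pattern: train the head only, then unfreeze the last 30 layers, then the last 100 layers with a lower learning rate. A minimal sketch of how that schedule could be written down as data is shown below. `FineTunePhase`, `unfreeze_last` and `DummyLayer` are hypothetical names; `DummyLayer` only stands in for Keras layers so the example runs without TensorFlow, and the count of 150 layers is arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTunePhase:
    unfrozen_layers: int   # how many layers at the end of the base model are trainable
    learning_rate: float   # Adam learning rate used in this phase
    patience_index: int    # index into a patience list such as loss_stop_patience_multi


# Values taken from the three multi-phase cells above
PHASES = [
    FineTunePhase(unfrozen_layers=0, learning_rate=1e-3, patience_index=0),
    FineTunePhase(unfrozen_layers=30, learning_rate=1e-3, patience_index=1),
    FineTunePhase(unfrozen_layers=100, learning_rate=1e-4, patience_index=2),
]


def unfreeze_last(layers, n):
    """Mark the last n layers as trainable and return the trainable count.

    The n > 0 guard matters: layers[-0:] would select every layer.
    """
    if n > 0:
        for layer in layers[-n:]:
            layer.trainable = True
    return sum(1 for layer in layers if layer.trainable)


class DummyLayer:
    """Stand-in for a Keras layer: only carries a `trainable` attribute."""

    def __init__(self):
        self.trainable = False


layers = [DummyLayer() for _ in range(150)]
trainable_counts = [unfreeze_last(layers, phase.unfrozen_layers) for phase in PHASES]
print(trainable_counts)
```

In the real notebook each phase would additionally recompile the model with the phase's learning rate and call `fit` with a fresh early stopping callback, as the cells above do.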

In [38]:
#train fine tuning model with reduced number of label classes

if 'history_multi_redLabel' in locals():
    del history_multi_redLabel

if reducedClassNumber_flag:

    match StoppingSelector:
        case 'val_loss':
            StopCallback_redLabel = tf.keras.callbacks.EarlyStopping(
                monitor='val_loss',
                min_delta=0.01,
                patience= loss_stop_patience_multi[0],
                restore_best_weights=True,
                verbose = 2,
                start_from_epoch = 1
            )
        case 'val_f1':
            StopCallback_redLabel = tf.keras.callbacks.EarlyStopping(
                monitor='val_f1_score',
                min_delta=0.005,
                patience= loss_stop_patience_multi[0],
                restore_best_weights=True,
                verbose = 2,
                mode='max'
            )

    # train last layer first
    history_multi_redLabel = inception_multiPhase_redLabel.fit(
        train_generator_redLabel_aug,
        validation_data = train_generator_redLabel_val,
        epochs=max_epochs,
        class_weight = class_weights_redLabel_dict,
        callbacks=[StopCallback_redLabel],
        verbose = 2 #2 is one line per epoch -
    )

In [39]:
if reducedClassNumber_flag:

    del history_multi_redLabel

    match StoppingSelector:
        case 'val_loss':
            StopCallback_redLabel = tf.keras.callbacks.EarlyStopping(
                monitor='val_loss',
                min_delta=0.01,
                patience= loss_stop_patience_multi[1],
                restore_best_weights=True,
                verbose = 2,
                start_from_epoch = 1
            )
        case 'val_f1':
            StopCallback_redLabel = tf.keras.callbacks.EarlyStopping(
                monitor='val_f1_score',
                min_delta=0.005,
                patience= loss_stop_patience_multi[1],
                restore_best_weights=True,
                verbose = 2,
                mode='max'
            )

    #set last 30 layers to be trainable
    for layer in base_transfer_model_5.layers[-30:]:
        layer.trainable = True

    inception_multiPhase_redLabel.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), #same learning rate as in the first phase; it is lowered in the last phase
        loss=loss_fun_multi_redLabel,
        weighted_metrics=["accuracy",'precision','recall',tf.keras.metrics.F1Score(average='weighted')]
    )

    history_multi_redLabel = inception_multiPhase_redLabel.fit(
        train_generator_redLabel_aug,
        validation_data = train_generator_redLabel_val,
        epochs=max_epochs,
        class_weight = class_weights_redLabel_dict,
        callbacks=[StopCallback_redLabel],
        verbose = 2 #2 is one line per epoch -
    )

In [40]:
if reducedClassNumber_flag:

    #del history_multi_redLabel

    match StoppingSelector:
        case 'val_loss':
            StopCallback_redLabel = tf.keras.callbacks.EarlyStopping(
                monitor='val_loss',
                min_delta=0.01,
                patience= loss_stop_patience_multi[2],
                restore_best_weights=True,
                verbose = 2,
                start_from_epoch = 1
            )
        case 'val_f1':
            StopCallback_redLabel = tf.keras.callbacks.EarlyStopping(
                monitor='val_f1_score',
                min_delta=0.005,
                patience= loss_stop_patience_multi[2],
                restore_best_weights=True,
                verbose = 2,
                mode='max'
            )

    #set last 100 layers to be trainable
    for layer in base_transfer_model_5.layers[-100:]:
        layer.trainable = True

    inception_multiPhase_redLabel.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4), #go to lower learning rate
        loss=loss_fun_multi_redLabel,
        weighted_metrics=["accuracy",'precision','recall',tf.keras.metrics.F1Score(average='weighted')]
    )

    history_multi_redLabel = inception_multiPhase_redLabel.fit(
        train_generator_redLabel_aug,
        validation_data = train_generator_redLabel_val,
        epochs=max_epochs,
        class_weight = class_weights_redLabel_dict,
        callbacks=[StopCallback_redLabel],
        verbose = 2 #2 is one line per epoch -
    )

In [41]:
#multi phase fine tuning with efficientnet

#delete history if it already exists to ensure expected local stopping behaviour
if 'history_EfficientNet' in locals():
    del history_EfficientNet

if EfficientNet_Flag:

    match StoppingSelector:
        case 'val_loss':
            StopCallback_EfficientNet = tf.keras.callbacks.EarlyStopping(
                monitor='val_loss',
                min_delta=0.01,
                patience= loss_stop_patience_multi[0],
                restore_best_weights=True,
                verbose = 2,
                start_from_epoch = 1
            )
        case 'val_f1':
            StopCallback_EfficientNet = tf.keras.callbacks.EarlyStopping(
                monitor='val_f1_score',
                min_delta=0.005,
                patience= loss_stop_patience_multi[0],
                restore_best_weights=True,
                verbose = 2,
                mode='max'
            )

    # train last layer first
    history_EfficientNet = EfficientNet_multiPhase_model.fit(
        transfer_model_gen,
        validation_data = transfer_model_val_gen,
        epochs=10,
        class_weight = class_weights_training,
        callbacks=[StopCallback_EfficientNet],
        verbose = 2 #2 is one line per epoch -
    )

In [42]:
if EfficientNet_Flag:

    del history_EfficientNet

    match StoppingSelector:
        case 'val_loss':
            StopCallback_EfficientNet = tf.keras.callbacks.EarlyStopping(
                monitor='val_loss',
                min_delta=0.01,
                patience= loss_stop_patience_multi[1],
                restore_best_weights=True,
                verbose = 2,
                start_from_epoch = 1
            )
        case 'val_f1':
            StopCallback_EfficientNet = tf.keras.callbacks.EarlyStopping(
                monitor='val_f1_score',
                min_delta=0.005,
                patience= loss_stop_patience_multi[1],
                restore_best_weights=True,
                verbose = 2,
                mode='max'
            )

    #set last 30 layers to be trainable
    for layer in base_model_efficientNet.layers[-30:]:
        layer.trainable = True

    EfficientNet_multiPhase_model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), #same learning rate as in the first phase; it is lowered in the last phase
        loss=loss_fun_EfficientNet,
        weighted_metrics=["accuracy",'precision','recall',tf.keras.metrics.F1Score(average='weighted')]
    )

    history_EfficientNet = EfficientNet_multiPhase_model.fit(
        transfer_model_gen,
        validation_data = transfer_model_val_gen,
        epochs=10,
        class_weight = class_weights_training,
        callbacks=[StopCallback_EfficientNet],
        verbose = 2 #2 is one line per epoch -
    )

In [None]:
if EfficientNet_Flag:

    #del history_EfficientNet

    match StoppingSelector:
        case 'val_loss':
            StopCallback_EfficientNet = tf.keras.callbacks.EarlyStopping(
                monitor='val_loss',
                min_delta=0.01,
                patience= loss_stop_patience_multi[2],
                restore_best_weights=True,
                verbose = 2,
                start_from_epoch = 1
            )
        case 'val_f1':
            StopCallback_EfficientNet = tf.keras.callbacks.EarlyStopping(
                monitor='val_f1_score',
                min_delta=0.005,
                patience= loss_stop_patience_multi[2],
                restore_best_weights=True,
                verbose = 2,
                mode='max'
            )

    #set last 100 layers to be trainable
    for layer in base_model_efficientNet.layers[-100:]:
        layer.trainable = True

    EfficientNet_multiPhase_model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4), #go to lower learning rate
        loss=loss_fun_EfficientNet,
        weighted_metrics=["accuracy",'precision','recall',tf.keras.metrics.F1Score(average='weighted')]
    )

    history_EfficientNet = EfficientNet_multiPhase_model.fit(
        transfer_model_gen,
        validation_data = transfer_model_val_gen,
        epochs=max_epochs,
        class_weight = class_weights_training,
        callbacks=[StopCallback_EfficientNet],
        verbose = 2 #2 is one line per epoch -
    )

## Evaluation Metrics

As stated in the project's .md files, we mainly focus on the F1 score. In addition, for each model and hyperparameter set, a classification report including class-specific metrics is saved to a .csv file for immediate and later evaluation.


In [None]:
# Model 2

if model2_flag:

    #make one folder for each model to save metrics
    model_2_dir = os.path.join(hyperparam_dir,'model_2')
    if not os.path.isdir(model_2_dir):
        os.makedirs(model_2_dir)

    # test accuracy on test data
    if balanced_flag:
        test_loss, test_accuracy, test_precision, test_recall,test_f1_score = model_2_CNN.evaluate(test_generator)

        #for classification report
        true_labels = test_generator_metrics.classes
        # model.predict directly returns the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
        predicted_labels = model_2_CNN.predict(test_generator_metrics)

    else:
        test_loss, test_accuracy, test_precision, test_recall,test_f1_score = model_2_CNN.evaluate(test_generator_unbalanced)

        true_labels = test_generator_unbalanced_metrics.classes
        # model.predict directly returns the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
        predicted_labels = model_2_CNN.predict(test_generator_unbalanced_metrics)

    #convert to numerical - np.argmax directly does the job
    predicted_labels = np.argmax(predicted_labels, axis=-1)

    print(f"Model 2. Test Accuracy: {test_accuracy:.3g} | Test Loss: {test_loss:.3g} | Test Precision: {test_precision:.3g} | Test Recall: {test_recall:.3g} | Test F1 Score: {test_f1_score:.3g}:")

    print(classification_report(true_labels, predicted_labels,target_names = label_list))

    #save as dict for future use as well
    report = classification_report(true_labels, predicted_labels,target_names = label_list,output_dict=True)
    #convert to dataframe for easy use and saving to csv
    report_df = pd.DataFrame(report).transpose()

    #save to file
    metrics_baseline_savename = os.path.join(model_2_dir,'classification_report.csv')

    report_df.to_csv(metrics_baseline_savename)

    #save model as well for future use
    #save the model:
    model_2_CNN.save(os.path.join(model_2_dir,'model.keras'))
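
The evaluation cells in this section repeat the same steps for every model: take the arg-max of the predicted probabilities, build a classification report and write it to a .csv file. A hedged sketch of how these steps could be bundled into one helper follows. The function name `save_classification_report`, the toy labels and the toy probabilities are assumptions for illustration; only scikit-learn and pandas calls already used in this notebook are involved.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from sklearn.metrics import classification_report


def save_classification_report(true_labels, probabilities, label_list, out_dir):
    """Write a per-class report to <out_dir>/classification_report.csv."""
    os.makedirs(out_dir, exist_ok=True)  # replaces the isdir/makedirs pair
    predicted_labels = np.argmax(probabilities, axis=-1)
    report = classification_report(
        true_labels,
        predicted_labels,
        labels=list(range(len(label_list))),
        target_names=label_list,
        output_dict=True,
        zero_division=0,
    )
    report_df = pd.DataFrame(report).transpose()
    csv_path = os.path.join(out_dir, "classification_report.csv")
    report_df.to_csv(csv_path)
    return report_df, csv_path


# Toy data: 3 classes, 6 samples (hypothetical values)
true_labels = np.array([0, 0, 1, 1, 2, 2])
probabilities = np.array([
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.5, 0.4, 0.1],  # misclassified as class 0
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
])

demo_dir = tempfile.mkdtemp()
report_df, csv_path = save_classification_report(
    true_labels, probabilities, ["class_a", "class_b", "class_c"], demo_dir
)
print(report_df)
```

Calling such a helper once per model would keep the per-model cells down to the `evaluate` call, the helper call and the model save.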

In [None]:
# Model 3

if model3_flag:

    #make one folder for each model to save metrics
    model_3_dir = os.path.join(hyperparam_dir,'model_3')
    if not os.path.isdir(model_3_dir):
        os.makedirs(model_3_dir)

    # test accuracy on test data
    if balanced_flag:
        test_loss, test_accuracy, test_precision, test_recall,test_f1_score = model_3_CNN.evaluate(test_generator)

        #for classification report
        true_labels = test_generator_metrics.classes
        # model.predict directly returns the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
        predicted_labels = model_3_CNN.predict(test_generator_metrics)

    else:
        test_loss, test_accuracy, test_precision, test_recall,test_f1_score = model_3_CNN.evaluate(test_generator_unbalanced)

        true_labels = test_generator_unbalanced_metrics.classes
        # model.predict directly returns the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
        predicted_labels = model_3_CNN.predict(test_generator_unbalanced_metrics)

    #convert to numerical - np.argmax directly does the job
    predicted_labels = np.argmax(predicted_labels, axis=-1)

    print(f"Model 3. Test Accuracy: {test_accuracy:.3g} | Test Loss: {test_loss:.3g} | Test Precision: {test_precision:.3g} | Test Recall: {test_recall:.3g} | Test F1 Score: {test_f1_score:.3g}:")

    print(classification_report(true_labels, predicted_labels,target_names = label_list))

    #save as dict for future use as well
    report = classification_report(true_labels, predicted_labels,target_names = label_list,output_dict=True)
    #convert to dataframe for easy use and saving to csv
    report_df = pd.DataFrame(report).transpose()

    #save to file
    metrics_baseline_savename = os.path.join(model_3_dir,'classification_report.csv')

    report_df.to_csv(metrics_baseline_savename)

    #save model as well for future use
    #save the model:
    model_3_CNN.save(os.path.join(model_3_dir,'model.keras'))

In [46]:
# Model 1

if model1_flag:

    #make one folder for each model to save metrics
    model_1_dir = os.path.join(hyperparam_dir,'model_1')
    if not os.path.isdir(model_1_dir):
        os.makedirs(model_1_dir)

    # test accuracy on test data
    if balanced_flag:
        test_loss, test_accuracy, test_precision, test_recall,test_f1_score = model_1_CNN.evaluate(test_generator)

        #for classification report
        true_labels = test_generator_metrics.classes
        # model.predict directly returns the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
        predicted_labels = model_1_CNN.predict(test_generator_metrics)

    else:
        test_loss, test_accuracy, test_precision, test_recall,test_f1_score = model_1_CNN.evaluate(test_generator_unbalanced)

        true_labels = test_generator_unbalanced_metrics.classes
        # model.predict directly returns the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
        predicted_labels = model_1_CNN.predict(test_generator_unbalanced_metrics)

    #convert to numerical - np.argmax directly does the job
    predicted_labels = np.argmax(predicted_labels, axis=-1)

    print(f"Model 1. Test Accuracy: {test_accuracy:.3g} | Test Loss: {test_loss:.3g} | Test Precision: {test_precision:.3g} | Test Recall: {test_recall:.3g} | Test F1 Score: {test_f1_score:.3g}:")

    print(classification_report(true_labels, predicted_labels,target_names = label_list))

    #save as dict for future use as well
    report = classification_report(true_labels, predicted_labels,target_names = label_list,output_dict=True)
    #convert to dataframe for easy use and saving to csv
    report_df = pd.DataFrame(report).transpose()

    #save to file
    metrics_baseline_savename = os.path.join(model_1_dir,'classification_report.csv')

    report_df.to_csv(metrics_baseline_savename)

    #save model as well for future use
    #save the model:
    model_1_CNN.save(os.path.join(model_1_dir,'model.keras'))

52/52 ━━━━━━━━━━━━━━━━━━━━ 11s 213ms/step - accuracy: 0.9825 - f1_score: 0.9823 - loss: 0.1455 - precision: 0.9825 - recall: 0.9819
1656/1656 ━━━━━━━━━━━━━━━━━━━━ 32s 19ms/step
Model 1. Test Accuracy: 0.982 | Test Loss: 0.145 | Test Precision: 0.982 | Test Recall: 0.982 | Test F1 Score: 0.982:
                     precision    recall  f1-score   support

             0_GOOD       0.97      0.99      0.98       238
        1_Flat loop       0.64      0.60      0.62        15
   2_White lift-off       0.85      0.91      0.88        57
   3_Black lift-off       0.83      0.77      0.80        13
          4_Missing       1.00      0.99      1.00      1316
5_Short circuit MOS       0.86      0.71      0.77        17

           accuracy                           0.98      1656
          macro avg       0.86      0.83      0.84      1656
       weighted avg       0.98      0.98      0.98      1656
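
The gap between the `macro avg` and `weighted avg` rows in the report above follows directly from the class imbalance. The short sketch below recomputes both averages from the rounded per-class F1 scores and supports printed in the table; because the inputs are rounded to two decimals, the results can differ slightly from the report in the last digit.

```python
import numpy as np

# Per-class F1 scores and supports copied from the Model 1 report above
f1_per_class = np.array([0.98, 0.62, 0.88, 0.80, 1.00, 0.77])
support = np.array([238, 15, 57, 13, 1316, 17])

# Macro average: every class counts equally, regardless of its size
macro_f1 = f1_per_class.mean()

# Weighted average: each class counts in proportion to its support
weighted_f1 = np.average(f1_per_class, weights=support)

print(f"macro F1    : {macro_f1:.3f}")
print(f"weighted F1 : {weighted_f1:.3f}")
print(f"share of '4_Missing' in the test set: {support[4] / support.sum():.1%}")
```

Since the largest class makes up most of the test set and is classified almost perfectly, the weighted F1 stays high even when the rare defect classes perform much worse, which is why the class-specific rows are saved alongside the summary score.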



In [47]:
# Model transfer feature extraction

if feature_extract_flag:

    #make one folder for each model to save metrics
    model_feat_extract_dir = os.path.join(hyperparam_dir,'InceptionV3_feat_extract')
    if not os.path.isdir(model_feat_extract_dir):
        os.makedirs(model_feat_extract_dir)

    # test accuracy on test data
    if balanced_flag:
        test_loss, test_accuracy, test_precision, test_recall,test_f1_score = inception_feature_extraction_model.evaluate(test_generator_color)

        #for classification report
        true_labels = test_generator_metrics_color.classes
        # model.predict directly returns the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
        predicted_labels = inception_feature_extraction_model.predict(test_generator_metrics_color)

    else:
        test_loss, test_accuracy, test_precision, test_recall,test_f1_score = inception_feature_extraction_model.evaluate(test_generator_unbalanced_color)

        true_labels = test_generator_unbalanced_metrics_color.classes
        # model.predict directly returns the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
        predicted_labels = inception_feature_extraction_model.predict(test_generator_unbalanced_metrics_color)

    #convert to numerical - np.argmax directly does the job
    predicted_labels = np.argmax(predicted_labels, axis=-1)

    print(f"Feat. Extract. Model:  Test Accuracy: {test_accuracy:.3g} | Test Loss: {test_loss:.3g} | Test Precision: {test_precision:.3g} | Test Recall: {test_recall:.3g} | Test F1 Score: {test_f1_score:.3g}:")

    print(classification_report(true_labels, predicted_labels,target_names = label_list))

    #save as dict for future use as well
    report = classification_report(true_labels, predicted_labels,target_names = label_list,output_dict=True)
    #convert to dataframe for easy use and saving to csv
    report_df = pd.DataFrame(report).transpose()

    #save to file
    metrics_baseline_savename = os.path.join(model_feat_extract_dir,'classification_report.csv')

    report_df.to_csv(metrics_baseline_savename)

    #save model as well for future use
    #save the model:
    inception_feature_extraction_model.save(os.path.join(model_feat_extract_dir,'model.keras'))

52/52 ━━━━━━━━━━━━━━━━━━━━ 58s 1s/step - accuracy: 0.9577 - f1_score: 0.9637 - loss: 0.1293 - precision: 0.9734 - recall: 0.9517
1656/1656 ━━━━━━━━━━━━━━━━━━━━ 94s 56ms/step
Feat. Extract. Model:  Test Accuracy: 0.958 | Test Loss: 0.129 | Test Precision: 0.973 | Test Recall: 0.952 | Test F1 Score: 0.964:
                     precision    recall  f1-score   support

             0_GOOD       0.94      0.85      0.89       238
        1_Flat loop       0.56      0.60      0.58        15
   2_White lift-off       0.87      0.79      0.83        57
   3_Black lift-off       0.22      0.85      0.35        13
          4_Missing       1.00      0.99      1.00      1316
5_Short circuit MOS       0.73      0.65      0.69        17

           accuracy                           0.96      1656
          macro avg       0.72      0.79      0.72      1656
       weighted avg       0.97      0.96      0.96      1656



In [48]:
#for fine tuning model

if fine_tune_flag:

    #make one folder for each model to save metrics
    model_feat_extract_dir = os.path.join(hyperparam_dir,'InceptionV3_fine_tune')
    if not os.path.isdir(model_feat_extract_dir):
        os.makedirs(model_feat_extract_dir)

    # test accuracy on test data
    if balanced_flag:
        test_loss, test_accuracy, test_precision, test_recall,test_f1_score = inception_fine_tune_model.evaluate(test_generator_color)

        #for classification report
        true_labels = test_generator_metrics.classes
        # model.predict returns the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
        predicted_labels = inception_fine_tune_model.predict(test_generator_metrics_color)

    else:
        test_loss, test_accuracy, test_precision, test_recall,test_f1_score = inception_fine_tune_model.evaluate(test_generator_unbalanced_color)

        true_labels = test_generator_unbalanced_metrics.classes
        # model.predict returns the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
        predicted_labels = inception_fine_tune_model.predict(test_generator_unbalanced_metrics_color)

    #convert to numerical - np.argmax directly does the job
    predicted_labels = np.argmax(predicted_labels, axis=-1)

    print(f"Fine tune Model:  Test Accuracy: {test_accuracy:.3g} | Test Loss: {test_loss:.3g} | Test Precision: {test_precision:.3g} | Test Recall: {test_recall:.3g} | Test F1 Score: {test_f1_score:.3g}:")

    print(classification_report(true_labels, predicted_labels,target_names = label_list))

    #save as dict for future use as well
    report = classification_report(true_labels, predicted_labels,target_names = label_list,output_dict=True)
    #convert to dataframe for easy use and saving to csv
    report_df = pd.DataFrame(report).transpose()

    #save to file
    metrics_baseline_savename = os.path.join(model_feat_extract_dir,'classification_report.csv')

    report_df.to_csv(metrics_baseline_savename)

    #save model as well for future use
    #save the model:
    inception_fine_tune_model.save(os.path.join(model_feat_extract_dir,'model.keras'))

In [49]:
#for full fine tuning model

if full_fine_tune_flag:

    #make one folder for each model to save metrics
    model_full_fine_tune_dir = os.path.join(hyperparam_dir,'InceptionV3_full_fine_tune')
    if not os.path.isdir(model_full_fine_tune_dir):
        os.makedirs(model_full_fine_tune_dir)

    # test accuracy on test data
    if balanced_flag:
        test_loss, test_accuracy, test_precision, test_recall,test_f1_score = inception_full_fine_tune_model.evaluate(test_generator_color)

        #for classification report
        true_labels = test_generator_metrics.classes
        # model.predict returns the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
        predicted_labels = inception_full_fine_tune_model.predict(test_generator_metrics_color)

    else:
        test_loss, test_accuracy, test_precision, test_recall,test_f1_score = inception_full_fine_tune_model.evaluate(test_generator_unbalanced_color)

        true_labels = test_generator_unbalanced_metrics.classes
        # model.predict returns the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
        predicted_labels = inception_full_fine_tune_model.predict(test_generator_unbalanced_metrics_color)

    #convert to numerical - np.argmax directly does the job
    predicted_labels = np.argmax(predicted_labels, axis=-1)

    print(f"Full fine tune Model:  Test Accuracy: {test_accuracy:.3g} | Test Loss: {test_loss:.3g} | Test Precision: {test_precision:.3g} | Test Recall: {test_recall:.3g} | Test F1 Score: {test_f1_score:.3g}:")

    print(classification_report(true_labels, predicted_labels,target_names = label_list))

    #save as dict for future use as well
    report = classification_report(true_labels, predicted_labels,target_names = label_list,output_dict=True)
    #convert to dataframe for easy use and saving to csv
    report_df = pd.DataFrame(report).transpose()

    #save to file
    metrics_baseline_savename = os.path.join(model_full_fine_tune_dir,'classification_report.csv')

    report_df.to_csv(metrics_baseline_savename)

    #save model as well for future use
    #save the model:
    inception_full_fine_tune_model.save(os.path.join(model_full_fine_tune_dir,'model.keras'))

In [50]:
#for multi phase full fine tuning model - reduced class number
if reducedClassNumber_flag:

    #make one folder for each model to save metrics
    model_fine_tune_redLabel_dir = os.path.join(hyperparam_dir,'InceptionV3_MultiPhase_reducedLabelNum')
    if not os.path.isdir(model_fine_tune_redLabel_dir):
        os.makedirs(model_fine_tune_redLabel_dir)

    # test accuracy on test data - test accuracy ALWAYS on full test set

    test_loss, test_accuracy, test_precision, test_recall,test_f1_score = inception_multiPhase_redLabel.evaluate(test_generator_redLabel)

    #for classification report
    true_labels = test_generator_redLabel_metrics.classes
    # model.predict returns the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
    predicted_labels = inception_multiPhase_redLabel.predict(test_generator_redLabel_metrics)

    #convert to numerical - np.argmax directly does the job
    predicted_labels = np.argmax(predicted_labels, axis=-1)

    print(f"Reduced Label Number:  Test Accuracy: {test_accuracy:.3g} | Test Loss: {test_loss:.3g} | Test Precision: {test_precision:.3g} | Test Recall: {test_recall:.3g} | Test F1 Score: {test_f1_score:.3g}:")

    print(classification_report(true_labels, predicted_labels,target_names = label_list_reduced))

    #save as dict for future use as well
    report = classification_report(true_labels, predicted_labels,target_names = label_list_reduced,output_dict=True)
    #convert to dataframe for easy use and saving to csv
    report_df = pd.DataFrame(report).transpose()

    #save to file
    metrics_baseline_savename = os.path.join(model_fine_tune_redLabel_dir,'classification_report.csv')

    report_df.to_csv(metrics_baseline_savename)

    #save model as well for future use
    #save the model:
    inception_multiPhase_redLabel.save(os.path.join(model_fine_tune_redLabel_dir,'model.keras'))

In [51]:
#for multi phase full fine tuning model

if multi_phase_fine_tune_flag:

    #make one folder for each model to save metrics
    model_multi_full_fine_tune_dir = os.path.join(hyperparam_dir,'InceptionV3_MultiPhase_full_fine_tune')
    if not os.path.isdir(model_multi_full_fine_tune_dir):
        os.makedirs(model_multi_full_fine_tune_dir)

    # test accuracy on test data - test accuracy ALWAYS on full test set
    if balanced_flag:
        test_loss, test_accuracy, test_precision, test_recall,test_f1_score = inception_multiPhase_fine_tune_model.evaluate(test_generator_color)

        #for classification report
        true_labels = test_generator_metrics.classes
        # model.predict returns the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
        predicted_labels = inception_multiPhase_fine_tune_model.predict(test_generator_metrics_color)

    else:
        test_loss, test_accuracy, test_precision, test_recall,test_f1_score = inception_multiPhase_fine_tune_model.evaluate(test_generator_unbalanced_color)

        true_labels = test_generator_unbalanced_metrics.classes
        # model.predict returns the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
        predicted_labels = inception_multiPhase_fine_tune_model.predict(test_generator_unbalanced_metrics_color)

    #convert to numerical - np.argmax directly does the job
    predicted_labels = np.argmax(predicted_labels, axis=-1)

    print(f"Multi-phase fine tune Model:  Test Accuracy: {test_accuracy:.3g} | Test Loss: {test_loss:.3g} | Test Precision: {test_precision:.3g} | Test Recall: {test_recall:.3g} | Test F1 Score: {test_f1_score:.3g}:")

    print(classification_report(true_labels, predicted_labels,target_names = label_list))

    #save as dict for future use as well
    report = classification_report(true_labels, predicted_labels,target_names = label_list,output_dict=True)
    #convert to dataframe for easy use and saving to csv
    report_df = pd.DataFrame(report).transpose()

    #save to file
    metrics_baseline_savename = os.path.join(model_multi_full_fine_tune_dir,'classification_report.csv')

    report_df.to_csv(metrics_baseline_savename)

    #save model as well for future use
    #save the model:
    inception_multiPhase_fine_tune_model.save(os.path.join(model_multi_full_fine_tune_dir,'model.keras'))

In [52]:
if EfficientNet_Flag:

    #make one folder for each model to save metrics
    model_EfficientNet_fine_tune_dir = os.path.join(hyperparam_dir,'EfficientNet_MultiPhase_fine_tune')
    if not os.path.isdir(model_EfficientNet_fine_tune_dir):
        os.makedirs(model_EfficientNet_fine_tune_dir)

    # test accuracy on test data - test accuracy ALWAYS on full test set
    if balanced_flag:
        test_loss, test_accuracy, test_precision, test_recall,test_f1_score = EfficientNet_multiPhase_model.evaluate(test_generator_color)

        #for classification report
        true_labels = test_generator_metrics.classes
        # model.predict returns the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
        # predictions must come from the EfficientNet model evaluated in this cell, not from the Inception model
        predicted_labels = EfficientNet_multiPhase_model.predict(test_generator_metrics_color)

    else:
        test_loss, test_accuracy, test_precision, test_recall,test_f1_score = EfficientNet_multiPhase_model.evaluate(test_generator_unbalanced_color)

        true_labels = test_generator_unbalanced_metrics.classes
        # model.predict returns the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
        predicted_labels = EfficientNet_multiPhase_model.predict(test_generator_unbalanced_metrics_color)

    #convert to numerical - np.argmax directly does the job
    predicted_labels = np.argmax(predicted_labels, axis=-1)

    print(f"EfficientNet multi-phase Model:  Test Accuracy: {test_accuracy:.3g} | Test Loss: {test_loss:.3g} | Test Precision: {test_precision:.3g} | Test Recall: {test_recall:.3g} | Test F1 Score: {test_f1_score:.3g}:")

    print(classification_report(true_labels, predicted_labels,target_names = label_list))

    #save as dict for future use as well
    report = classification_report(true_labels, predicted_labels,target_names = label_list,output_dict=True)
    #convert to dataframe for easy use and saving to csv
    report_df = pd.DataFrame(report).transpose()

    #save to file
    metrics_baseline_savename = os.path.join(model_EfficientNet_fine_tune_dir,'classification_report.csv')

    report_df.to_csv(metrics_baseline_savename)

    #save model as well for future use
    #save the model:
    EfficientNet_multiPhase_model.save(os.path.join(model_EfficientNet_fine_tune_dir,'model.keras'))

In [53]:
#clear all models from memory to prevent bugs and odd behaviour of early stopping

#see: https://stackoverflow.com/questions/58137677/keras-model-training-memory-leak

#some models may only exist when their flag was set, so remove them by name and skip missing ones
for model_name in [
    'model_1_CNN', 'model_2_CNN', 'model_3_CNN',
    'inception_feature_extraction_model', 'base_transfer_model',
    'inception_fine_tune_model', 'base_transfer_model_2',
    'inception_full_fine_tune_model', 'base_transfer_model_3',
    'inception_multiPhase_fine_tune_model', 'base_transfer_model_4',
    'inception_multiPhase_redLabel', 'EfficientNet_multiPhase_model',
]:
    globals().pop(model_name, None)

In [54]:

gc.collect()
tf.keras.backend.clear_session(
    free_memory=True
)
tf.compat.v1.reset_default_graph()





## Comparative Analysis

A table comparing the performance of the different models and hyperparameter settings can be found in the GitHub repository (Model_Performance_overview.xls or Model_Performance_overview.csv).
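
The evaluation cells above write one `classification_report.csv` per model folder inside `hyperparam_dir`. A minimal sketch of how such reports could be gathered into one overview is shown below. It is not the script that produced Model_Performance_overview.csv; the function name `collect_reports` is hypothetical, and it assumes the CSV layout that `report_df.to_csv` produces (an unnamed first column holding the class or average name).

```python
import csv
from pathlib import Path


def collect_reports(hyperparam_dir, row_label="macro avg"):
    """Return {model_folder: {metric: value}} for one summary row of each report.

    Hypothetical helper: assumes every model folder below hyperparam_dir
    contains a classification_report.csv written by pandas.
    """
    summary = {}
    pattern = "*/classification_report.csv"
    for report_path in sorted(Path(hyperparam_dir).glob(pattern)):
        with report_path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                # pandas stores the index (class or average name) in an unnamed first column
                if row[""] == row_label:
                    summary[report_path.parent.name] = {
                        key: float(value) for key, value in row.items() if key
                    }
    return summary
```

Calling `collect_reports(hyperparam_dir)` would then give the macro-averaged precision, recall, f1-score and support per model folder. The macro average is used as the default here because it weights the rare defect classes equally with the dominant class.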

Some results stand out:

* Data augmentation seems to lower model performance across the board, even when we see overfitting in training. The likely reason is that the data itself is very regular, with little variation in the orientation of the features in the images. Therefore, only moderate data augmentation was used. However, there is very likely still a lot of room for improvement here.
* The transfer learning model performs worse than the 3 relatively simple models, especially at low image resolutions. The most likely reason is that, as of now, we only use feature extraction. For any image size that the base model was not originally trained on, this will very likely mean poor performance. At higher resolutions the transfer learning model performs better in comparison.
* Higher image resolution does not noticeably improve model performance.

Some things are still missing from the analysis / evaluation and will be added in the near future:

* Transfer learning models with fine tuning
* Different transfer learning base architectures
* Once a best model is found, we will tackle the task of identifying the drift label class
* More fine tuning of hyperparameters for a few selected models
* Class weighting instead of a balanced dataset (the balanced dataset is very small)
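
The class weighting mentioned in the last item can be sketched as follows. This is a minimal pure-Python illustration, not the notebook's implementation: `balanced_class_weights` is a hypothetical helper and the labels are toy values. It applies the "balanced" heuristic `n_samples / (n_classes * class_count)`, which is also what scikit-learn's `compute_class_weight` uses for `class_weight="balanced"`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency in the given labels."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        cls: n_samples / (n_classes * count)
        for cls, count in sorted(counts.items())
    }


# Toy example (assumed labels): 8 samples of class 0, 2 samples of class 1
toy_labels = [0] * 8 + [1] * 2
weights = balanced_class_weights(toy_labels)
print(weights)  # {0: 0.625, 1: 2.5}
```

A dictionary of this form can be passed to Keras via `model.fit(..., class_weight=weights)`, so the full unbalanced training set can be used while rare classes still contribute more to the loss. The weights should be computed from the training labels only.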

## Update on model performance after the presentation

* Transfer learning with Inception V3: multi-phase fine tuning drastically improved performance.
* The best-performing model still depends on the hyperparameters, so there is very likely a lot of room for improvement through systematic hyperparameter variation.
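
One simple way to make the hyperparameter variation systematic is to enumerate a full grid of settings and run the notebook once per configuration. The sketch below only shows the enumeration. The keys reuse flag names from the Hyperparameters cell, while the candidate values and the helper `iter_configs` are illustrative assumptions.

```python
from itertools import product

# Hypothetical search space: keys follow the Hyperparameters cell,
# candidate values are assumptions chosen for illustration only.
search_space = {
    "target_size": [(224, 224), (299, 299)],
    "aug_flag": [False, True],
    "balanced_flag": [False, True],
}


def iter_configs(space):
    """Yield one dict per combination of the candidate values."""
    keys = list(space)
    for values in product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


configs = list(iter_configs(search_space))
print(len(configs))  # 8
```

Each configuration could be written to its own `hyperparam_dir`, so the per-model classification reports remain comparable across runs. A full grid grows quickly with the number of parameters, so a random subset of `configs` may be a more practical starting point.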