# Model Definition and Evaluation
## Table of Contents
1. [Model Selection](#model-selection)
2. [Feature Engineering](#feature-engineering)
3. [Hyperparameter Tuning](#hyperparameter-tuning)
4. [Implementation](#implementation)
5. [Evaluation Metrics](#evaluation-metrics)
6. [Comparative Analysis](#comparative-analysis)


In [111]:
#reset all variables when running again to avoid any mistakes

%reset -f

In [112]:
# Import necessary libraries
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.utils import compute_class_weight

import gc
import pandas as pd
import os
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from PIL import Image

import tensorflow as tf

from keras.src.legacy.preprocessing.image import ImageDataGenerator

from matplotlib.ticker import StrMethodFormatter


## Model Selection

We evaluate a self-defined CNN and pretrained CNNs. Convolutional neural networks are well established for image classification. Transformer-based models might perform even better, but we expect that the added complexity would not justify the potential gain for this task.



## Hyperparameters


In [113]:
# choose main hyperparameters here

#data / feature selections
balanced_flag = True

#Training data splits:
test_split = 0.2
val_split = 0.20 # remember - this fraction applies to what remains after the test data has been split off

# image parameters
target_size = (299,299) #pixel size to load img

#select data augmentation
aug_flag = True
#augmentation params
horizontal_flip=False
vertical_flip=False
rotation_range=15
shear_range= 1
zoom_range = 0.07

#training
max_epochs = 100
loss_stop_patience = 7
learningRate = 0.001

#class weights
Use_class_weights = False

#flags for models to include
model1_flag = False
model2_flag = True
model3_flag = True
feature_extract_flag = False
fine_tune_flag = True

#set batch size according to balance selection
if balanced_flag:
    batch_size = 8 #later play around with batch size to see how it affects performance
else:
    batch_size = 32 #hopefully speeds up training
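
The split fractions above translate into concrete image counts. The following minimal sketch reproduces the arithmetic, assuming that `train_test_split` rounds the test share up and that the generator's `validation_split` truncates the validation share. Both assumptions are consistent with the counts printed by the generator cell further down in this notebook. The helper name `split_sizes` is hypothetical.

```python
import math

def split_sizes(n_images, test_split, val_split):
    """Return (train, validation, test) counts for a two-stage split."""
    n_test = math.ceil(n_images * test_split)   # assumed: test share is rounded up
    n_remaining = n_images - n_test
    n_val = int(n_remaining * val_split)        # assumed: validation share is truncated
    n_train = n_remaining - n_val
    return n_train, n_val, n_test

# Balanced subset: 6 classes x 71 images = 426 images
print(split_sizes(426, 0.2, 0.2))
# Full (unbalanced) dataset: 5298 + 1324 + 1656 = 8278 images
print(split_sizes(8278, 0.2, 0.2))
```

With `val_split = 0.20`, the validation set is 20% of the remaining 80%, i.e. roughly 16% of all images.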

In [114]:
# specify the model loss function

#options: 'normal' : tf.keras.losses.CategoricalCrossentropy() , 'focal' : tf.keras.losses.CategoricalFocalCrossentropy()

lossSelect = 'focal'
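
For orientation, the focal variant rescales each cross-entropy term by `(1 - p)^gamma`, so confidently correct predictions contribute less to the loss. Below is a minimal pure-Python sketch for a single sample, assuming one-hot targets and softmax probabilities. The function name `categorical_focal_loss` is hypothetical, and this is not the Keras implementation itself (no clipping, no label smoothing).

```python
import math

def categorical_focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0):
    """Focal loss for one sample: sum_c alpha_c * (1 - p_c)^gamma * (-y_c * log p_c)."""
    n = len(y_true)
    # alpha may be a scalar or one weight per class
    alphas = list(alpha) if isinstance(alpha, (list, tuple)) else [alpha] * n
    return sum(
        a * (1.0 - p) ** gamma * (-y * math.log(p))
        for a, y, p in zip(alphas, y_true, y_pred)
        if y > 0  # only the true class contributes for one-hot targets
    )

y_true = [0, 1, 0]
easy = [0.05, 0.90, 0.05]   # confident and correct
hard = [0.40, 0.30, 0.30]   # uncertain

# With gamma = 0 and alpha = 1 the expression reduces to plain cross-entropy
print(categorical_focal_loss(y_true, easy, alpha=1.0, gamma=0.0))
# With gamma = 2 the easy example is down-weighted far more than the hard one
print(categorical_focal_loss(y_true, easy, alpha=1.0, gamma=2.0))
print(categorical_focal_loss(y_true, hard, alpha=1.0, gamma=2.0))
```

Passing a list for `alpha` mirrors how the notebook supplies per-class weights as `focal_alpha`.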


In [115]:
#select optimizer

optimizer = tf.keras.optimizers.Adam(learning_rate=learningRate)

In [116]:
#make customizable string to add to dir for testing

custom_save_str = '_focalLoss'

## Feature Engineering

We test data augmentation (to varying degrees), dataset balancing and class weights.


In [117]:
# Path Definitions to relevant data + data loading

base_file_path = 'C:/Users/nikoLocal/Documents/Opencampus/Machine_Vision_challenge_data/'
image_path = os.path.join(base_file_path, 'input_train', 'input_train')

label_csv_name = 'Y_train_eVW9jym.csv'

#Loading .csv data to dataframes
train_df = pd.read_csv(os.path.join(base_file_path, label_csv_name))


In [118]:
#DataFrame Preprocessing


#add another column to the dataframe according to dictionaries to map Labels correctly to numbers
dict_numbers = {'GOOD': 0,'Boucle plate':1,'Lift-off blanc':2,'Lift-off noir':3,'Missing':4,'Short circuit MOS':5}
dict_strings = {'GOOD': '0_GOOD','Boucle plate':'1_Flat loop','Lift-off blanc':'2_White lift-off','Lift-off noir':'3_Black lift-off','Missing':'4_Missing','Short circuit MOS':'5_Short circuit MOS'}
# for Test Data ("random submission" dataframe)
dict_strings_sub = {0: '0_GOOD',1:'1_Flat loop',2:'2_White lift-off',3:'3_Black lift-off',4:'4_Missing',5:'5_Short circuit MOS',6:'6_Drift'}

#list of all labels in the data
label_list = ['0_GOOD','1_Flat loop','2_White lift-off','3_Black lift-off','4_Missing','5_Short circuit MOS']

#create new columns in DFs via .map() method
train_df['LabelNum'] = train_df['Label'].map(dict_numbers)
train_df['LabelStr'] = train_df['Label'].map(dict_strings)

#number of classes
num_classes = len(label_list)

# get counts of label with the least entries
countList = train_df['LabelStr'].value_counts()
minCounts = countList.min()

BalancedDF = pd.DataFrame()
#concat sampled dataframes for each included label
for i in range(num_classes):
    #fix random_state so the balanced subset is the same on every run
    BalancedDF = pd.concat([BalancedDF,train_df[train_df['LabelStr'] == label_list[i]].sample(n=minCounts, random_state=42)],axis=0)

#test if worked as intended
print(BalancedDF['LabelStr'].value_counts())

#split dataframe according to fractional test size
train_df_balanced, test_df_balanced = train_test_split(BalancedDF, test_size=test_split, random_state=42) #keep random state constant to ensure reproducible splits

train_df_train, train_df_test = train_test_split(train_df, test_size=test_split, random_state=42) #keep random state constant to ensure reproducible splits

LabelStr
0_GOOD                 71
1_Flat loop            71
2_White lift-off       71
3_Black lift-off       71
4_Missing              71
5_Short circuit MOS    71
Name: count, dtype: int64
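
The balancing step above downsamples every class to the size of the rarest one. The same idea can be sketched on plain Python lists; the helper `downsample_to_minority` and the toy label counts are hypothetical and only illustrate the technique, with a fixed seed so the subset is reproducible.

```python
import random
from collections import Counter, defaultdict

def downsample_to_minority(labels, seed=42):
    """Return sorted indices so that every label appears as often as the rarest one."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    n_min = min(len(idxs) for idxs in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    keep = []
    for label in sorted(by_label):
        keep.extend(rng.sample(by_label[label], n_min))
    return sorted(keep)

# Toy label list with an imbalanced distribution
labels = ["GOOD"] * 10 + ["Missing"] * 3 + ["Lift-off noir"] * 5
kept = downsample_to_minority(labels)
print(Counter(labels[i] for i in kept))
```

The price of this approach is visible in the counts above: all classes shrink to 71 images, so most of the majority-class images are unused in the balanced runs.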


In [119]:
#compute class weights for use of unbalanced datasets

class_numbers = np.unique(train_df_train['LabelNum'])

class_weights_unb = compute_class_weight(class_weight='balanced' ,classes = class_numbers,y=train_df_train['LabelNum'])
class_weights_b = compute_class_weight(class_weight='balanced' ,classes = class_numbers,y=train_df_balanced['LabelNum'])
equal_weights = np.ones(num_classes)

#make dicts that can be used by keras
class_w_unb_dict = dict(zip(class_numbers, class_weights_unb))
class_w_b_dict = dict(zip(class_numbers, class_weights_b))
class_w_equal_dict = dict(zip(class_numbers, equal_weights))
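
As a reference for what `class_weight='balanced'` computes: each class gets the weight `n_samples / (n_classes * count_of_that_class)`. The sketch below reproduces that formula in plain Python; the helper name `balanced_class_weights` and the toy labels are assumptions for illustration.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight per class: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in sorted(counts.items())}

# Toy example: class 0 is three times as frequent as class 1
print(balanced_class_weights([0] * 6 + [1] * 2))
# A perfectly balanced label list yields a weight of 1.0 for every class
print(balanced_class_weights([0, 1, 2] * 4))
```

Rare classes receive weights above 1 and frequent classes weights below 1, so each class contributes about equally to the total loss.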

In [120]:
# initialize ImageDataGenerators
# use ImageDataGen because it has method flow_from_dataframe() that works really well together with pandas dataframes
# https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator
# although deprecated the functionality can be used as discussed in feedback session

# HYPERPARAMETERS ########


class_mode = 'categorical' # how to store labels - either categorical (one-hot encoding) or as numbers
#class_mode = 'input'
labelCol = 'LabelStr'
#########################

#normalize pixel intensities
rescale = 1.0/255.0

datagen = ImageDataGenerator(
    horizontal_flip=False,
    vertical_flip=False,
    rotation_range=0.0,
    shear_range=0.0,
    rescale=rescale,
    validation_split=val_split)

datagen_augmentation = ImageDataGenerator(
    horizontal_flip=horizontal_flip,
    vertical_flip=vertical_flip,
    rotation_range=rotation_range,
    shear_range= shear_range,
    zoom_range = zoom_range,
    rescale=rescale,
    validation_split=val_split)

datagen_test = ImageDataGenerator(
    horizontal_flip=False,
    vertical_flip=False,
    rotation_range=0.0,
    shear_range=0.0,
    rescale=rescale,
    validation_split=0.0)


##########################################################

#unbalanced datasets

train_generator_unbalanced = datagen.flow_from_dataframe(
    train_df_train,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='training')

train_generator_unbalanced_val = datagen.flow_from_dataframe(
    train_df_train,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='validation')

train_generator_unbalanced_aug = datagen_augmentation.flow_from_dataframe(
    train_df_train,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='training')

train_generator_unbalanced_val_aug = datagen_augmentation.flow_from_dataframe(
    train_df_train,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='validation')

test_generator_unbalanced = datagen_test.flow_from_dataframe(
    train_df_test,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='training')

test_generator_unbalanced_metrics = datagen_test.flow_from_dataframe(
    train_df_test,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=1,
    color_mode="grayscale",
    shuffle=False,
    seed=42,
    subset='training')

train_generator = datagen.flow_from_dataframe(
    train_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='training')

train_generator_val = datagen.flow_from_dataframe(
    train_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='validation')

train_generator_aug = datagen_augmentation.flow_from_dataframe(
    train_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='training')

train_generator_aug_val = datagen_augmentation.flow_from_dataframe(
    train_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='validation')

test_generator = datagen_test.flow_from_dataframe(
    test_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='training')

test_generator_metrics = datagen_test.flow_from_dataframe(
    test_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=1,
    color_mode="grayscale",
    shuffle=False,
    seed=42,
    subset='training')

# generators for transfer learning - color mode is color here. Pretrained models expect color input

train_generator_unbalanced_color = datagen.flow_from_dataframe(
    train_df_train,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='training')

train_generator_unbalanced_val_color = datagen.flow_from_dataframe(
    train_df_train,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='validation')

# augmented color generator for the unbalanced dataset (used when balanced_flag is False and aug_flag is True)
train_generator_unbalanced_aug_color = datagen_augmentation.flow_from_dataframe(
    train_df_train,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='training')

test_generator_unbalanced_color = datagen_test.flow_from_dataframe(
    train_df_test,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='training')

test_generator_unbalanced_metrics_color = datagen_test.flow_from_dataframe(
    train_df_test,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=1,
    color_mode="rgb",
    shuffle=False,
    seed=42,
    subset='training')

train_generator_color = datagen.flow_from_dataframe(
    train_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='training')

train_generator_val_color = datagen.flow_from_dataframe(
    train_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='validation')

train_generator_aug_color = datagen_augmentation.flow_from_dataframe(
    train_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='training')

train_generator_aug_val_color = datagen_augmentation.flow_from_dataframe(
    train_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='validation')

test_generator_color = datagen_test.flow_from_dataframe(
    test_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='training')

test_generator_metrics_color = datagen_test.flow_from_dataframe(
    test_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=1,
    color_mode="rgb",
    shuffle=False,
    seed=42,
    subset='training')


Found 5298 validated image filenames belonging to 6 classes.
Found 1324 validated image filenames belonging to 6 classes.
Found 5298 validated image filenames belonging to 6 classes.
Found 1324 validated image filenames belonging to 6 classes.
Found 1656 validated image filenames belonging to 6 classes.
Found 1656 validated image filenames belonging to 6 classes.
Found 272 validated image filenames belonging to 6 classes.
Found 68 validated image filenames belonging to 6 classes.
Found 272 validated image filenames belonging to 6 classes.
Found 68 validated image filenames belonging to 6 classes.
Found 86 validated image filenames belonging to 6 classes.
Found 86 validated image filenames belonging to 6 classes.
Found 5298 validated image filenames belonging to 6 classes.
Found 1324 validated image filenames belonging to 6 classes.
Found 1656 validated image filenames belonging to 6 classes.
Found 1656 validated image filenames belonging to 6 classes.
Found 272 validated image filename

## Hyperparameter Tuning

So far we have varied hyperparameters only by hand. Some parameters, such as the image size and the data augmentation settings, have been varied systematically and their effects on model performance noted.

We plan to do more hyperparameter tuning in the future.
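
A systematic version of the manual variation could enumerate all combinations of a few settings and derive one run name per combination. The sketch below is only an illustration of such a grid: the value `224` is an assumed example, `run_name` is a hypothetical helper, and the name pattern follows the `ImgSz_..._...` scheme used for the output folders in this notebook.

```python
from itertools import product

# Candidate values per hyperparameter (224 is an assumed example value)
grid = {
    "target_size": [(224, 224), (299, 299)],
    "aug_flag": [False, True],
    "balanced_flag": [False, True],
}

def run_name(target_size, aug_flag, balanced_flag, suffix=""):
    """Build a folder-friendly name from the most important hyperparameters."""
    aug = "Aug" if aug_flag else "NoAug"
    bal = "balanced" if balanced_flag else "unbalanced"
    return "ImgSz_{}_{}_{}{}".format(target_size[0], aug, bal, suffix)

# One dict per combination of grid values
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
for cfg in configs:
    print(run_name(**cfg), cfg)
```

Each configuration would then be trained and evaluated with the cells below; the grid grows multiplicatively, so it is best kept small.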

In [121]:
# make a unique string (name) to save model and evaluation to file
# incorporate most important hyperparameters
# make a subfolder for one set of hyperparameters for more tidy folder and file structure

if aug_flag:
    augmentation_str = 'Aug'
else:
    augmentation_str = 'NoAug'

if balanced_flag:
    balance_str = 'balanced'
else:
    balance_str = 'unbalanced'

#class weights
if Use_class_weights:
    Cweights_str = '_Cweights'
else:
    Cweights_str = ''

hyperparam_name = 'ImgSz_{}_{}_{}{}{}'.format(target_size[0],augmentation_str,balance_str,Cweights_str,custom_save_str)
hyperparam_dir = os.path.join(base_file_path,'model_evaluation')
hyperparam_dir = os.path.join(hyperparam_dir,hyperparam_name)
#check if folder exists - if not create it
if not os.path.isdir(hyperparam_dir):
    os.makedirs(hyperparam_dir)

## Implementation



In [122]:
# build a model to be used as baseline model
# use "simplest" CNN as baseline

model_1_CNN = tf.keras.Sequential([
    tf.keras.layers.Input((target_size[0], target_size[1], 1)),  #images are grayscale, so the input shape is (height, width, 1)
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])

model_2_CNN = tf.keras.Sequential([
    tf.keras.layers.Input((target_size[0], target_size[1], 1)),  #images are grayscale, so the input shape is (height, width, 1)
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])

model_3_CNN = tf.keras.Sequential([
    tf.keras.layers.Input((target_size[0], target_size[1], 1)),  #images are grayscale, so the input shape is (height, width, 1)
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Conv2D(128, (3, 3), activation="relu"),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])

model_1_CNN.summary()
model_2_CNN.summary()
model_3_CNN.summary()
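
To see why the three CNNs differ so much in size, it helps to trace how many values reach the first `Dense` layer. The sketch below does this arithmetic for a square input, assuming the Keras defaults used above (3x3 convolutions with `valid` padding and stride 1, 2x2 max pooling). The helper name `flattened_features` is hypothetical.

```python
def flattened_features(input_size, conv_filters, kernel=3, pool=2):
    """Number of values entering the first Dense layer for a square input."""
    size = input_size
    for _ in conv_filters:
        size = size - kernel + 1   # 'valid' convolution with stride 1
        size = size // pool        # max pooling drops any remainder
    return size * size * conv_filters[-1]

for name, filters in [("model_1", [64]), ("model_2", [32, 64]), ("model_3", [32, 64, 128])]:
    print(name, flattened_features(299, filters))
```

Under these assumptions the shallowest model feeds by far the largest vector into its `Dense` layer, so adding convolution and pooling stages reduces the parameter count of the dense part.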


In [123]:
#loss

if balanced_flag:
    focal_alpha = class_weights_b
else:
    focal_alpha = class_weights_unb

# callback that monitors validation accuracy / loss
# https://keras.io/api/callbacks/early_stopping/

val_loss_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    min_delta=0.01,
    patience= loss_stop_patience,
    restore_best_weights=True,
    verbose = 2,
    start_from_epoch = 3
)

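class ResetValLossOnTrainBegin(tf.keras.callbacks.Callback):
    def on_train_begin(self, logs=None):
        #set val loss to a high value in case there is a history left
        #note: EarlyStopping also resets its own counters in on_train_begin
        #guard against logs being None
        if logs is not None:
            logs["val_loss"] = 1e3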
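
The interplay of `patience`, `min_delta` and `start_from_epoch` can be illustrated with a small simulation. This is a simplified sketch of the stopping rule, not the Keras implementation; the function `early_stop_epoch` and the loss values are made up for illustration.

```python
def early_stop_epoch(val_losses, patience=7, min_delta=0.01, start_from_epoch=3):
    """Return the 0-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if epoch < start_from_epoch:
            continue                 # warm-up epochs are ignored entirely
        if loss < best - min_delta:  # improvement must exceed min_delta
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Loss improves during warm-up, then plateaus at 0.2
plateau = [1.0, 0.5, 0.3, 0.2] + [0.2] * 7
print(early_stop_epoch(plateau))
# A loss that keeps improving by more than min_delta never triggers the stop
print(early_stop_epoch([1.0 - 0.05 * i for i in range(15)]))
```

Note that with `min_delta=0.01`, small improvements below that threshold count as no improvement, which matters once the validation loss drops to the order of 0.1.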


In [124]:
#Transfer learning model - feature extraction

# Load pre-trained InceptionV3 with correct input size
base_transfer_model = tf.keras.applications.InceptionV3(
    weights='imagenet',
    include_top=False,
    input_shape=(target_size[0], target_size[1], 3)
)

# Freeze all layers for feature extraction
base_transfer_model.trainable = False

# Simple classification head
# - GlobalAveragePooling2D reduces spatial dimensions
# - Final Dense layer maps to class probabilities
inception_feature_extraction_model = tf.keras.Sequential([
    base_transfer_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(num_classes, activation='softmax')
])

#select based on Str
if lossSelect == 'normal':
    loss_fun_feat = tf.keras.losses.CategoricalCrossentropy()
else:
    loss_fun_feat = tf.keras.losses.CategoricalFocalCrossentropy(alpha = focal_alpha,gamma = 2)

inception_feature_extraction_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=learningRate),
    loss=loss_fun_feat,
    #loss=tf.keras.losses.SparseCategoricalCrossentropy,
    #metrics=["accuracy",'precision',]
    weighted_metrics=["accuracy",'precision','recall',tf.keras.metrics.F1Score(average='weighted')]
)

inception_feature_extraction_model.summary()

In [125]:
# inception model with last 20 layers unfrozen

# Load pre-trained InceptionV3 with correct input size
base_transfer_model_2 = tf.keras.applications.InceptionV3(
    weights='imagenet',
    include_top=False,
    input_shape=(target_size[0], target_size[1], 3)
)

# Freeze all layers except last few blocks
base_transfer_model_2.trainable = True
for layer in base_transfer_model_2.layers[:-20]:
    layer.trainable = False

# Simple classification head
# - GlobalAveragePooling2D reduces spatial dimensions
# - Final Dense layer maps to class probabilities
inception_fine_tune_model = tf.keras.Sequential([
    base_transfer_model_2,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(num_classes, activation='softmax')
])

#select based on Str
if lossSelect == 'normal':
    loss_fun_fine = tf.keras.losses.CategoricalCrossentropy()
else:
    loss_fun_fine = tf.keras.losses.CategoricalFocalCrossentropy(alpha = focal_alpha,gamma = 2)

inception_fine_tune_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=learningRate),
    loss=loss_fun_fine,
    #loss=tf.keras.losses.SparseCategoricalCrossentropy,
    #metrics=["accuracy",'precision',]
    weighted_metrics=["accuracy",'precision','recall',tf.keras.metrics.F1Score(average='weighted')]
)

inception_fine_tune_model.summary()
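
The slice `layers[:-20]` used for fine-tuning selects every entry except the last 20, where each entry is an individual layer rather than a whole Inception block. The toy sketch below demonstrates the slicing with stand-in objects; `ToyLayer` and the depth of 50 are hypothetical and unrelated to the real InceptionV3 layer count.

```python
class ToyLayer:
    """Stand-in for a Keras layer; only the `trainable` attribute matters here."""
    def __init__(self, name):
        self.name = name
        self.trainable = True

layers = [ToyLayer(f"layer_{i}") for i in range(50)]  # 50 is an arbitrary toy depth

n_unfrozen = 20
for layer in layers[:-n_unfrozen]:   # everything except the last 20 entries
    layer.trainable = False

# Count how many layers remain trainable after freezing
print(sum(layer.trainable for layer in layers))
```

Changing `n_unfrozen` is therefore a simple knob for how much of the pretrained network is adapted to the new data.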

In [126]:
# compile Models

#select based on Str
if lossSelect == 'normal':
    loss_fun_1 = tf.keras.losses.CategoricalCrossentropy()
else:
    loss_fun_1 = tf.keras.losses.CategoricalFocalCrossentropy(alpha = focal_alpha,gamma = 2)

model_1_CNN.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=learningRate),
    loss=loss_fun_1,
    #loss=tf.keras.losses.SparseCategoricalCrossentropy,
    #metrics=["accuracy",'precision',]
    weighted_metrics=["accuracy",'precision','recall',tf.keras.metrics.F1Score(average='weighted')]
)

#select based on Str
if lossSelect == 'normal':
    loss_fun_2 = tf.keras.losses.CategoricalCrossentropy()
else:
    loss_fun_2 = tf.keras.losses.CategoricalFocalCrossentropy(alpha = focal_alpha,gamma = 2)

model_2_CNN.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=learningRate),
    loss=loss_fun_2,
    #loss=tf.keras.losses.SparseCategoricalCrossentropy,
    #metrics=["accuracy",'precision',]
    weighted_metrics=["accuracy",'precision','recall',tf.keras.metrics.F1Score(average='weighted')]
)

#select based on Str
if lossSelect == 'normal':
    loss_fun_3 = tf.keras.losses.CategoricalCrossentropy()
else:
    loss_fun_3 = tf.keras.losses.CategoricalFocalCrossentropy(alpha = focal_alpha,gamma = 2)

model_3_CNN.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=learningRate),
    loss=loss_fun_3,
    #loss=tf.keras.losses.SparseCategoricalCrossentropy,
    #metrics=["accuracy",'precision',]
    #metrics=["accuracy",'precision','recall',tf.keras.metrics.F1Score(average='weighted')]
    weighted_metrics =["accuracy",'precision','recall',tf.keras.metrics.F1Score(average='weighted')]
)

#F1 average parameter needs to be anything other than None if using line-wise output when fitting the model...



In [127]:
#select model datasets based on flags

#for model1 - 3
if balanced_flag:
    CNN_model_val_gen = train_generator_val
    if aug_flag:
        CNN_model_gen = train_generator_aug
    else:
        CNN_model_gen = train_generator
else:
    CNN_model_val_gen = train_generator_unbalanced_val
    if aug_flag:
        CNN_model_gen = train_generator_unbalanced_aug
    else:
        CNN_model_gen = train_generator_unbalanced

#for transfer learning
if balanced_flag:
    transfer_model_val_gen = train_generator_val_color
    if aug_flag:
        transfer_model_gen = train_generator_aug_color
    else:
        transfer_model_gen = train_generator_color
else:
    transfer_model_val_gen = train_generator_unbalanced_val_color
    if aug_flag:
        transfer_model_gen = train_generator_unbalanced_aug_color
    else:
        transfer_model_gen = train_generator_unbalanced_color


if Use_class_weights:
    if balanced_flag:
        class_weights_training = class_w_b_dict
    else:
        class_weights_training = class_w_unb_dict
else:
    class_weights_training = class_w_equal_dict


In [128]:

if model1_flag:

    history_1 = model_1_CNN.fit(
        CNN_model_gen,
        validation_data = CNN_model_val_gen,
        epochs=max_epochs,
        class_weight = class_weights_training,
        callbacks=[val_loss_stop,ResetValLossOnTrainBegin()],
        verbose = 2 #2 is one line per epoch
    )

In [129]:
#Model 2

if model2_flag:

    history_2 = model_2_CNN.fit(
        CNN_model_gen,
        validation_data = CNN_model_val_gen,
        epochs=max_epochs,
        class_weight = class_weights_training,
        callbacks=[val_loss_stop,ResetValLossOnTrainBegin()],
        verbose = 2 #2 is one line per epoch -
    )

Epoch 1/100
34/34 - 16s - 460ms/step - accuracy: 0.3603 - f1_score: 0.3539 - loss: 2.8996 - precision: 0.2838 - recall: 0.0772 - val_accuracy: 0.7059 - val_f1_score: 0.7087 - val_loss: 0.5105 - val_precision: 1.0000 - val_recall: 0.3676
Epoch 2/100
34/34 - 14s - 415ms/step - accuracy: 0.6949 - f1_score: 0.6947 - loss: 0.4555 - precision: 0.8447 - recall: 0.5000 - val_accuracy: 0.8529 - val_f1_score: 0.8588 - val_loss: 0.2880 - val_precision: 0.9400 - val_recall: 0.6912
Epoch 3/100
34/34 - 14s - 412ms/step - accuracy: 0.8493 - f1_score: 0.8483 - loss: 0.2132 - precision: 0.9177 - recall: 0.7794 - val_accuracy: 0.8529 - val_f1_score: 0.8414 - val_loss: 0.1623 - val_precision: 0.9091 - val_recall: 0.7353
Epoch 4/100
34/34 - 14s - 419ms/step - accuracy: 0.8750 - f1_score: 0.8757 - loss: 0.1578 - precision: 0.9364 - recall: 0.8125 - val_accuracy: 0.9265 - val_f1_score: 0.9268 - val_loss: 0.1323 - val_precision: 0.9365 - val_recall: 0.8676
Epoch 5/100
34/34 - 14s - 411ms/step - accuracy: 0.9

In [130]:
#Model 3

if model3_flag:
    history_3 = model_3_CNN.fit(
        CNN_model_gen,
        validation_data = CNN_model_val_gen,
        epochs=max_epochs,
        class_weight = class_weights_training,
        callbacks=[val_loss_stop,ResetValLossOnTrainBegin()],
        verbose = 2 #2 is one line per epoch -
    )

Epoch 1/100
34/34 - 13s - 386ms/step - accuracy: 0.2279 - f1_score: 0.2276 - loss: 1.7301 - precision: 0.0690 - recall: 0.0074 - val_accuracy: 0.5000 - val_f1_score: 0.4528 - val_loss: 0.9523 - val_precision: 0.7500 - val_recall: 0.0441
Epoch 2/100
34/34 - 11s - 329ms/step - accuracy: 0.6140 - f1_score: 0.6159 - loss: 0.6354 - precision: 0.8293 - recall: 0.3750 - val_accuracy: 0.8529 - val_f1_score: 0.8527 - val_loss: 0.2813 - val_precision: 0.9545 - val_recall: 0.6176
Epoch 3/100
34/34 - 11s - 328ms/step - accuracy: 0.7978 - f1_score: 0.7978 - loss: 0.3097 - precision: 0.8934 - recall: 0.6471 - val_accuracy: 0.9559 - val_f1_score: 0.9552 - val_loss: 0.0819 - val_precision: 1.0000 - val_recall: 0.9118
Epoch 4/100
34/34 - 11s - 328ms/step - accuracy: 0.8529 - f1_score: 0.8525 - loss: 0.2036 - precision: 0.9071 - recall: 0.7537 - val_accuracy: 0.8824 - val_f1_score: 0.8728 - val_loss: 0.1500 - val_precision: 0.8939 - val_recall: 0.8676
Epoch 5/100
34/34 - 11s - 327ms/step - accuracy: 0.9

In [131]:
#transfer learning. Feature extraction

if feature_extract_flag:
    history_feat_extract = inception_feature_extraction_model.fit(
        transfer_model_gen,
        validation_data = transfer_model_val_gen,
        epochs=max_epochs,
        class_weight = class_weights_training,
        callbacks=[val_loss_stop,ResetValLossOnTrainBegin()],
        verbose = 2 #2 is one line per epoch -
    )

In [132]:
#transfer learning. fine tuning

if fine_tune_flag:
    history_fine_tune = inception_fine_tune_model.fit(
        transfer_model_gen,
        validation_data = transfer_model_val_gen,
        epochs=max_epochs,
        class_weight = class_weights_training,
        callbacks=[val_loss_stop,ResetValLossOnTrainBegin()],
        verbose = 2 #2 is one line per epoch -
    )

Epoch 1/100
34/34 - 22s - 648ms/step - accuracy: 0.6213 - f1_score: 0.6192 - loss: 0.6172 - precision: 0.7574 - recall: 0.5625 - val_accuracy: 0.7500 - val_f1_score: 0.7375 - val_loss: 0.7386 - val_precision: 0.7538 - val_recall: 0.7206
Epoch 2/100
34/34 - 13s - 388ms/step - accuracy: 0.8603 - f1_score: 0.8605 - loss: 0.2112 - precision: 0.8939 - recall: 0.8051 - val_accuracy: 0.8088 - val_f1_score: 0.8033 - val_loss: 0.2831 - val_precision: 0.8438 - val_recall: 0.7941
Epoch 3/100
34/34 - 13s - 385ms/step - accuracy: 0.8750 - f1_score: 0.8734 - loss: 0.1970 - precision: 0.9127 - recall: 0.7684 - val_accuracy: 0.8971 - val_f1_score: 0.8924 - val_loss: 0.1582 - val_precision: 0.8955 - val_recall: 0.8824
Epoch 4/100
34/34 - 13s - 388ms/step - accuracy: 0.9044 - f1_score: 0.9050 - loss: 0.1388 - precision: 0.9520 - recall: 0.8750 - val_accuracy: 0.9118 - val_f1_score: 0.9154 - val_loss: 0.1574 - val_precision: 0.9375 - val_recall: 0.8824
Epoch 5/100
34/34 - 13s - 388ms/step - accuracy: 0.9

## Evaluation Metrics

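We evaluate each model on the held-out test split using the accuracy, precision, recall and weighted F1 score reported by Keras, together with the per-class precision, recall and F1 score from scikit-learn's `classification_report`. Per-class metrics matter here because the original dataset is imbalanced, so overall accuracy alone can hide poor performance on rare defect classes.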


In [133]:
# Model 1

if model1_flag:

    #make one folder for each model to save metrics
    model_1_dir = os.path.join(hyperparam_dir,'model_1')
    if not os.path.isdir(model_1_dir):
        os.makedirs(model_1_dir)

    # test accuracy on test data
    if balanced_flag:
        test_loss, test_accuracy, test_precision, test_recall,test_f1_score = model_1_CNN.evaluate(test_generator)

        #for classification report
        true_labels = test_generator_metrics.classes
        # model.predict directly gives you the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
        predicted_labels = model_1_CNN.predict(test_generator_metrics)

    else:
        test_loss, test_accuracy, test_precision, test_recall,test_f1_score = model_1_CNN.evaluate(test_generator_unbalanced)

        true_labels = test_generator_unbalanced_metrics.classes
        # model.predict directly gives you the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
        predicted_labels = model_1_CNN.predict(test_generator_unbalanced_metrics)

    #convert to numerical - np.argmax directly does the job
    predicted_labels = np.argmax(predicted_labels, axis=-1)

    print(f"Model 1. Test Accuracy: {test_accuracy:.3g} | Test Loss: {test_loss:.3g} | Test Precision: {test_precision:.3g} | Test Recall: {test_recall:.3g} | Test F1 Score: {test_f1_score:.3g}:")

    print(classification_report(true_labels, predicted_labels,target_names = label_list))

    #save as dict for future use as well
    report = classification_report(true_labels, predicted_labels,target_names = label_list,output_dict=True)
    #convert to dataframe for easy use and saving to csv
    report_df = pd.DataFrame(report).transpose()

    #save to file
    metrics_baseline_savename = os.path.join(model_1_dir,'classification_report.csv')

    report_df.to_csv(metrics_baseline_savename)

    #save model as well for future use
    #save the model:
    model_1_CNN.save(os.path.join(model_1_dir,'model.keras'))
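
The evaluation cells first turn softmax outputs into class indices and then compare them with the true labels. The sketch below shows both steps in plain Python on made-up data; `argmax` and `per_class_report` are hypothetical helpers that follow the standard definitions of precision, recall and F1, not the scikit-learn code itself.

```python
def argmax(values):
    """Index of the largest entry, i.e. the predicted class for one softmax row."""
    return max(range(len(values)), key=values.__getitem__)

def per_class_report(true_labels, predicted_labels, n_classes):
    """Return {class: (precision, recall, f1)} from two lists of class indices."""
    pairs = list(zip(true_labels, predicted_labels))
    report = {}
    for c in range(n_classes):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = (precision, recall, f1)
    return report

# Made-up softmax outputs for four images and two classes
probs = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.3, 0.7]]
true = [0, 0, 1, 1]
pred = [argmax(row) for row in probs]
print(pred)
print(per_class_report(true, pred, n_classes=2))
```

This only works when predictions and true labels are in the same order, which is why the `*_metrics` generators are created with `shuffle=False`.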

In [134]:
# Model 2

if model2_flag:

    #make one folder for each model to save metrics
    model_2_dir = os.path.join(hyperparam_dir,'model_2')
    if not os.path.isdir(model_2_dir):
        os.makedirs(model_2_dir)

    # test accuracy on test data
    if balanced_flag:
        test_loss, test_accuracy, test_precision, test_recall,test_f1_score = model_2_CNN.evaluate(test_generator)

        #for classification report
        true_labels = test_generator_metrics.classes
        # model.predict directly gives you the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
        predicted_labels = model_2_CNN.predict(test_generator_metrics)

    else:
        test_loss, test_accuracy, test_precision, test_recall,test_f1_score = model_2_CNN.evaluate(test_generator_unbalanced)

        true_labels = test_generator_unbalanced_metrics.classes
        # model.predict directly gives you the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
        predicted_labels = model_2_CNN.predict(test_generator_unbalanced_metrics)

    #convert to numerical - np.argmax directly does the job
    predicted_labels = np.argmax(predicted_labels, axis=-1)

    print(f"Model 2. Test Accuracy: {test_accuracy:.3g} | Test Loss: {test_loss:.3g} | Test Precision: {test_precision:.3g} | Test Recall: {test_recall:.3g} | Test F1 Score: {test_f1_score:.3g}:")

    print(classification_report(true_labels, predicted_labels,target_names = label_list))

    #save as dict for future use as well
    report = classification_report(true_labels, predicted_labels,target_names = label_list,output_dict=True)
    #convert to dataframe for easy use and saving to csv
    report_df = pd.DataFrame(report).transpose()

    #save to file
    metrics_baseline_savename = os.path.join(model_2_dir,'classification_report.csv')

    report_df.to_csv(metrics_baseline_savename)

    #save model as well for future use
    #save the model:
    model_2_CNN.save(os.path.join(model_2_dir,'model.keras'))

11/11 ━━━━━━━━━━━━━━━━━━━━ 1s 49ms/step - accuracy: 0.8605 - f1_score: 0.8629 - loss: 0.2005 - precision: 0.9000 - recall: 0.8372
86/86 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step
Model 2. Test Accuracy: 0.86 | Test Loss: 0.2 | Test Precision: 0.9 | Test Recall: 0.837 | Test F1 Score: 0.863:
                     precision    recall  f1-score   support

             0_GOOD       1.00      0.82      0.90        17
        1_Flat loop       0.88      0.78      0.82        18
   2_White lift-off       0.68      0.93      0.79        14
   3_Black lift-off       0.80      0.92      0.86        13
          4_Missing       0.90      1.00      0.95         9
5_Short circuit MOS       1.00      0.80      0.89        15

           accuracy                           0.86        86
          macro avg       0.88      0.88      0.87        86
       weighted avg       0.88      0.86      0.86        86
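
The `macro avg` and `weighted avg` rows of the report above can be reproduced by hand from the per-class rows. A minimal sketch in plain Python, using the rounded recall and support values printed for Model 2, so the results only agree with the report to about two decimals. The variable names are illustrative and not part of the notebook.

```python
# Per-class recall and support, copied from the Model 2 report above
# (the printed report rounds these values to two decimals).
recall = [0.82, 0.78, 0.93, 0.92, 1.00, 0.80]
support = [17, 18, 14, 13, 9, 15]

# Macro average: unweighted mean over the classes.
macro_recall = sum(recall) / len(recall)

# Weighted average: mean over the classes, weighted by class support.
weighted_recall = sum(r * s for r, s in zip(recall, support)) / sum(support)

print(f"macro avg recall:    {macro_recall:.3f}")
print(f"weighted avg recall: {weighted_recall:.3f}")
```

For single-label classification the support-weighted recall equals the overall accuracy, which is why both appear as 0.86 in the report.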



In [135]:
# Model 3

if model3_flag:

    #make one folder for each model to save metrics
    model_3_dir = os.path.join(hyperparam_dir,'model_3')
    if not os.path.isdir(model_3_dir):
        os.makedirs(model_3_dir)

    # test accuracy on test data
    if balanced_flag:
        test_loss, test_accuracy, test_precision, test_recall,test_f1_score = model_3_CNN.evaluate(test_generator)

        #for classification report
        true_labels = test_generator_metrics.classes
        # model.predict returns the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
        predicted_labels = model_3_CNN.predict(test_generator_metrics)

    else:
        test_loss, test_accuracy, test_precision, test_recall,test_f1_score = model_3_CNN.evaluate(test_generator_unbalanced)

        true_labels = test_generator_unbalanced_metrics.classes
        # model.predict returns the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
        predicted_labels = model_3_CNN.predict(test_generator_unbalanced_metrics)

    #convert to numerical - np.argmax directly does the job
    predicted_labels = np.argmax(predicted_labels, axis=-1)

    print(f"Model 3. Test Accuracy: {test_accuracy:.3g} | Test Loss: {test_loss:.3g} | Test Precision: {test_precision:.3g} | Test Recall: {test_recall:.3g} | Test F1 Score: {test_f1_score:.3g}:")

    print(classification_report(true_labels, predicted_labels,target_names = label_list))

    #save as dict for future use as well
    report = classification_report(true_labels, predicted_labels,target_names = label_list,output_dict=True)
    #convert to dataframe for easy use and saving to csv
    report_df = pd.DataFrame(report).transpose()

    #save to file
    metrics_baseline_savename = os.path.join(model_3_dir,'classification_report.csv')

    report_df.to_csv(metrics_baseline_savename)

    #save model as well for future use
    #save the model:
    model_3_CNN.save(os.path.join(model_3_dir,'model.keras'))

11/11 ━━━━━━━━━━━━━━━━━━━━ 1s 58ms/step - accuracy: 0.9302 - f1_score: 0.9305 - loss: 0.1732 - precision: 0.9412 - recall: 0.9302
86/86 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step
Model 3. Test Accuracy: 0.93 | Test Loss: 0.173 | Test Precision: 0.941 | Test Recall: 0.93 | Test F1 Score: 0.931:
                     precision    recall  f1-score   support

             0_GOOD       1.00      1.00      1.00        17
        1_Flat loop       0.94      0.83      0.88        18
   2_White lift-off       0.81      0.93      0.87        14
   3_Black lift-off       0.92      0.92      0.92        13
          4_Missing       0.90      1.00      0.95         9
5_Short circuit MOS       1.00      0.93      0.97        15

           accuracy                           0.93        86
          macro avg       0.93      0.94      0.93        86
       weighted avg       0.93      0.93      0.93        86
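
The evaluation cells turn the softmax output of `model.predict` into class indices with `np.argmax`. A small self-contained sketch of that step follows. The `probs` array is made up for illustration and is not model output, and `label_list` is assumed to hold the six class names in the order shown in the reports.

```python
import numpy as np

# Assumed class names, in the order printed in the classification reports.
label_list = ["0_GOOD", "1_Flat loop", "2_White lift-off",
              "3_Black lift-off", "4_Missing", "5_Short circuit MOS"]

# Hypothetical softmax output: one row per image, one column per class.
probs = np.array([
    [0.90, 0.02, 0.03, 0.02, 0.02, 0.01],
    [0.05, 0.10, 0.70, 0.05, 0.05, 0.05],
    [0.10, 0.10, 0.10, 0.10, 0.50, 0.10],
])

# argmax over the last axis picks the most probable class for each image.
predicted_labels = np.argmax(probs, axis=-1)
predicted_names = [label_list[i] for i in predicted_labels]

print(predicted_labels)
print(predicted_names)
```

The resulting integer indices are directly comparable with the `.classes` attribute of the generator, which is what `classification_report` needs.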



In [136]:
# Model transfer feature extraction

if feature_extract_flag:

    #make one folder for each model to save metrics
    model_feat_extract_dir = os.path.join(hyperparam_dir,'InceptionV3_feat_extract')
    if not os.path.isdir(model_feat_extract_dir):
        os.makedirs(model_feat_extract_dir)

    # test accuracy on test data
    if balanced_flag:
        test_loss, test_accuracy, test_precision, test_recall,test_f1_score = inception_feature_extraction_model.evaluate(test_generator_color)

        #for classification report
        true_labels = test_generator_metrics_color.classes  # same generator as used for predict()
        # model.predict returns the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
        predicted_labels = inception_feature_extraction_model.predict(test_generator_metrics_color)

    else:
        test_loss, test_accuracy, test_precision, test_recall,test_f1_score = inception_feature_extraction_model.evaluate(test_generator_unbalanced_color)

        true_labels = test_generator_unbalanced_metrics_color.classes  # same generator as used for predict()
        # model.predict returns the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
        predicted_labels = inception_feature_extraction_model.predict(test_generator_unbalanced_metrics_color)

    #convert to numerical - np.argmax directly does the job
    predicted_labels = np.argmax(predicted_labels, axis=-1)

    print(f"Feat. Extract. Model:  Test Accuracy: {test_accuracy:.3g} | Test Loss: {test_loss:.3g} | Test Precision: {test_precision:.3g} | Test Recall: {test_recall:.3g} | Test F1 Score: {test_f1_score:.3g}:")

    print(classification_report(true_labels, predicted_labels,target_names = label_list))

    #save as dict for future use as well
    report = classification_report(true_labels, predicted_labels,target_names = label_list,output_dict=True)
    #convert to dataframe for easy use and saving to csv
    report_df = pd.DataFrame(report).transpose()

    #save to file
    metrics_baseline_savename = os.path.join(model_feat_extract_dir,'classification_report.csv')

    report_df.to_csv(metrics_baseline_savename)

    #save model as well for future use
    #save the model:
    inception_feature_extraction_model.save(os.path.join(model_feat_extract_dir,'model.keras'))

In [137]:
# Fine-tuning model

if fine_tune_flag:

    #make one folder for each model to save metrics
    model_feat_extract_dir = os.path.join(hyperparam_dir,'InceptionV3_fine_tune')
    if not os.path.isdir(model_feat_extract_dir):
        os.makedirs(model_feat_extract_dir)

    # test accuracy on test data
    if balanced_flag:
        test_loss, test_accuracy, test_precision, test_recall,test_f1_score = inception_fine_tune_model.evaluate(test_generator_color)

        #for classification report
        true_labels = test_generator_metrics_color.classes  # same generator as used for predict()
        # model.predict returns the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
        predicted_labels = inception_fine_tune_model.predict(test_generator_metrics_color)

    else:
        test_loss, test_accuracy, test_precision, test_recall,test_f1_score = inception_fine_tune_model.evaluate(test_generator_unbalanced_color)

        true_labels = test_generator_unbalanced_metrics_color.classes  # same generator as used for predict()
        # model.predict returns the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
        predicted_labels = inception_fine_tune_model.predict(test_generator_unbalanced_metrics_color)

    #convert to numerical - np.argmax directly does the job
    predicted_labels = np.argmax(predicted_labels, axis=-1)

    print(f"Fine tune Model:  Test Accuracy: {test_accuracy:.3g} | Test Loss: {test_loss:.3g} | Test Precision: {test_precision:.3g} | Test Recall: {test_recall:.3g} | Test F1 Score: {test_f1_score:.3g}:")

    print(classification_report(true_labels, predicted_labels,target_names = label_list))

    #save as dict for future use as well
    report = classification_report(true_labels, predicted_labels,target_names = label_list,output_dict=True)
    #convert to dataframe for easy use and saving to csv
    report_df = pd.DataFrame(report).transpose()

    #save to file
    metrics_baseline_savename = os.path.join(model_feat_extract_dir,'classification_report.csv')

    report_df.to_csv(metrics_baseline_savename)

    #save model as well for future use
    #save the model:
    inception_fine_tune_model.save(os.path.join(model_feat_extract_dir,'model.keras'))

11/11 ━━━━━━━━━━━━━━━━━━━━ 3s 287ms/step - accuracy: 0.9070 - f1_score: 0.9065 - loss: 0.2344 - precision: 0.9059 - recall: 0.8953
86/86 ━━━━━━━━━━━━━━━━━━━━ 6s 56ms/step
Fine tune Model:  Test Accuracy: 0.907 | Test Loss: 0.234 | Test Precision: 0.906 | Test Recall: 0.895 | Test F1 Score: 0.907:
                     precision    recall  f1-score   support

             0_GOOD       1.00      0.94      0.97        17
        1_Flat loop       0.93      0.72      0.81        18
   2_White lift-off       0.74      1.00      0.85        14
   3_Black lift-off       1.00      1.00      1.00        13
          4_Missing       0.90      1.00      0.95         9
5_Short circuit MOS       0.93      0.87      0.90        15

           accuracy                           0.91        86
          macro avg       0.92      0.92      0.91        86
       weighted avg       0.92      0.91      0.91        86



In [138]:
#clear all models from memory to prevent any bugs and weird behaviour of early stopping

#see: https://stackoverflow.com/questions/58137677/keras-model-training-memory-leak

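# models whose flag was set to False may never have been created,
# so remove each name only if it exists instead of using a bare `del`
for model_name in ('model_1_CNN', 'model_2_CNN', 'model_3_CNN',
                   'inception_feature_extraction_model', 'inception_fine_tune_model'):
    globals().pop(model_name, None)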


In [139]:

gc.collect()
tf.keras.backend.clear_session(
    free_memory=True
)
tf.compat.v1.reset_default_graph()


## Comparative Analysis


A table comparing the performance of the different models and hyperparameter settings can be found in the GitHub repository (Model_Performance_overview.xls or Model_Performance_overview.csv).
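
A comparison table of this kind could be assembled from the `classification_report.csv` files that the evaluation cells write into each model folder. The following is only a sketch: the helper name `collect_reports` and the choice of summary rows are assumptions, and this is not necessarily how Model_Performance_overview.csv was produced.

```python
import os
import pandas as pd

def collect_reports(hyperparam_dir, model_names):
    """Gather summary metrics from each model's saved report (hypothetical helper)."""
    rows = []
    for name in model_names:
        path = os.path.join(hyperparam_dir, name, "classification_report.csv")
        if not os.path.isfile(path):
            continue  # this model was not evaluated in the given run
        # the first CSV column holds the row labels written by report_df.to_csv
        report = pd.read_csv(path, index_col=0)
        rows.append({
            "model": name,
            "accuracy": report.loc["accuracy", "f1-score"],
            "macro_f1": report.loc["macro avg", "f1-score"],
            "weighted_f1": report.loc["weighted avg", "f1-score"],
        })
    if not rows:
        return pd.DataFrame()
    return pd.DataFrame(rows).set_index("model")
```

A call such as `collect_reports(hyperparam_dir, ['model_2', 'model_3', 'InceptionV3_fine_tune'])` would then return one row per evaluated model, and models without a saved report are skipped.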

Some results stand out:

* Data augmentation seems to lower model performance across the board, even when we see overfitting during training. The likely reason is that the images are very regular, with little variation in the orientation of their features. Therefore, we will adjust the data augmentation in the future to exclude image flipping and similar transformations.
* The transfer learning model performs worse than the three relatively simple models, especially at low image resolutions. The most likely reason is that, as of now, we only use feature extraction. For any image size that the base model was not originally trained on, this will very likely result in poor performance. At higher resolutions, the transfer learning model performs better in comparison.
* Higher image resolution does not noticeably improve model performance.
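
The adjustment to data augmentation mentioned in the first point could look roughly like the sketch below. It only builds a dictionary of keyword arguments. The helper name `build_augmentation_kwargs` is hypothetical, the default values mirror the hyperparameter cell at the top of the notebook, and the keys are assumed to be passed on as `ImageDataGenerator(**augmentation_kwargs)`.

```python
def build_augmentation_kwargs(aug_flag, rotation_range=15, shear_range=1, zoom_range=0.07):
    """Return keyword arguments for the image data generator (hypothetical helper)."""
    if not aug_flag:
        return {}  # augmentation disabled: use the generator defaults
    return {
        "rotation_range": rotation_range,
        "shear_range": shear_range,
        "zoom_range": zoom_range,
        # flips stay switched off because the images share a fixed orientation
        "horizontal_flip": False,
        "vertical_flip": False,
    }

augmentation_kwargs = build_augmentation_kwargs(aug_flag=True)
print(augmentation_kwargs)
```

Keeping the settings in one dictionary makes it easy to store them next to the metrics of each run, so that runs with and without augmentation can be compared later.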

Some things are still missing in the analysis / evaluation and will be added in the near future:

* Transfer learning models with fine-tuning
* Different transfer learning base architectures
* Once the best model is found, we will tackle the task of identifying the drift label class
* More fine-tuning of hyperparameters for a few selected models
* Class weighting instead of a balanced dataset (the balanced dataset is very small)
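
For the last point, class weights are commonly derived from the class frequencies as `n_samples / (n_classes * class_count)`, which is the formula scikit-learn's `compute_class_weight` uses for `class_weight='balanced'`. A minimal sketch in plain Python follows. The label vector is made up and does not reflect the real dataset, and the helper name is hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in sorted(counts.items())}

# Hypothetical, deliberately imbalanced label vector (not the real data).
labels = [0] * 6 + [1] * 3 + [2] * 1
class_weight = balanced_class_weights(labels)
print(class_weight)
```

A dictionary of this form, mapping class index to weight, is what Keras accepts as the `class_weight` argument of `model.fit`, so rare classes contribute more to the loss without discarding any images.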