# Model Definition and Evaluation
## Table of Contents
1. [Model Selection](#model-selection)
2. [Feature Engineering](#feature-engineering)
3. [Hyperparameter Tuning](#hyperparameter-tuning)
4. [Implementation](#implementation)
5. [Evaluation Metrics](#evaluation-metrics)
6. [Comparative Analysis](#comparative-analysis)


In [6]:
# Import necessary libraries
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

import pandas as pd
import os
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from PIL import Image

import tensorflow as tf

from keras.src.legacy.preprocessing.image import ImageDataGenerator

from matplotlib.ticker import StrMethodFormatter


## Model Selection

We evaluate self-defined and pretrained CNNs. Convolutional neural networks are well established for image classification. Transformer models might perform even better, but we expect that the added complexity would not be justified by the potential gain in performance.



## Hyperparameters


In [7]:
# choose main hyperparameters here

#data / feature selections
balanced_flag = True

#Training data splits:
test_split = 0.2
val_split = 0.20 # remember - this is a fraction of the data that remains after the test data has been split off from the initial (balanced) dataset

# image parameters
target_size = (299,299) #pixel size to load img
batch_size = 8 #later play around with batch size to see how it affects performance

#select data augmentation
aug_flag = False

#training
max_epochs = 100
loss_stop_patience = 7
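
Because `val_split` is applied after the test split, the validation set is less than 20 % of the full dataset. The sketch below works through the arithmetic. It assumes that `train_test_split` rounds the test share up and that the generator's `validation_split` rounds the validation share down; `split_sizes` is a hypothetical helper, not part of the notebook.

```python
import math

def split_sizes(n_samples, test_split, val_split):
    """Return (train, validation, test) sample counts for a two-stage split."""
    n_test = math.ceil(n_samples * test_split)   # assumed: test share is rounded up
    n_remaining = n_samples - n_test
    n_val = int(n_remaining * val_split)         # assumed: validation share is rounded down
    n_train = n_remaining - n_val
    return n_train, n_val, n_test

# balanced dataset: 6 classes x 71 images
print(split_sizes(6 * 71, test_split=0.2, val_split=0.2))
```

With the values chosen here, the validation set ends up at roughly 16 % of the balanced dataset rather than 20 %.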

## Feature Engineering

We test data augmentation (to varying degrees), dataset balancing and class weights.
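
Class weights are the alternative to undersampling: each class is weighted inversely to its frequency, so rare defect classes contribute as much to the loss as frequent ones. Below is a minimal sketch of the common "balanced" heuristic, weight = n_samples / (n_classes * class_count). The helper name and the toy labels are illustrative assumptions, not the notebook's data.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency (hypothetical helper)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in sorted(counts.items())}

# toy example: class 0 is frequent, class 2 is rare
toy_labels = [0] * 6 + [1] * 3 + [2] * 1
weights = balanced_class_weights(toy_labels)
print(weights)
```

A dictionary of this form, keyed by the integer class index, is what the `class_weight` argument of Keras' `Model.fit` expects. Whether it can be combined with a given generator type depends on the Keras version and should be verified.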


In [8]:
# Path Definitions to relevant data + data loading

base_file_path = 'C:/Users/nikoLocal/Documents/Opencampus/Machine_Vision_challenge_data/'
image_path = os.path.join(base_file_path, 'input_train', 'input_train') # os.path.join avoids the doubled '/' of plain string concatenation

label_csv_name = 'Y_train_eVW9jym.csv'

#Loading .csv data to dataframes
train_df = pd.read_csv(os.path.join(base_file_path, label_csv_name))


In [9]:
#DataFrame Preprocessing


#add another column to the dataframe according to dictionaries to map Labels correctly to numbers
dict_numbers = {'GOOD': 0,'Boucle plate':1,'Lift-off blanc':2,'Lift-off noir':3,'Missing':4,'Short circuit MOS':5}
dict_strings = {'GOOD': '0_GOOD','Boucle plate':'1_Flat loop','Lift-off blanc':'2_White lift-off','Lift-off noir':'3_Black lift-off','Missing':'4_Missing','Short circuit MOS':'5_Short circuit MOS'}
# for Test Data ("random submission" dataframe)
dict_strings_sub = {0: '0_GOOD',1:'1_Flat loop',2:'2_White lift-off',3:'3_Black lift-off',4:'4_Missing',5:'5_Short circuit MOS',6:'6_Drift'}

#list of all labels in the data
label_list = ['0_GOOD','1_Flat loop','2_White lift-off','3_Black lift-off','4_Missing','5_Short circuit MOS']

#create new columns in DFs via .map() method
train_df['LabelNum'] = train_df['Label'].map(dict_numbers)
train_df['LabelStr'] = train_df['Label'].map(dict_strings)

#number of classes
num_classes = len(label_list)

# get counts of label with the least entries
countList = train_df['LabelStr'].value_counts()
minCounts = countList.min()

BalancedDF = pd.DataFrame()
#concat sampled dataframes for each included label (a fixed random_state keeps the undersampling reproducible)
for i in range(num_classes):
    BalancedDF = pd.concat([BalancedDF,train_df[train_df['LabelStr'] == label_list[i]].sample(n=minCounts, random_state=42)],axis=0)

#test if worked as intended
print(BalancedDF['LabelStr'].value_counts())

#split dataframe according to fractional test size
train_df_balanced, test_df_balanced = train_test_split(BalancedDF, test_size=test_split, random_state=42) #keep random state constant to ensure reproducible splits

train_df_train, train_df_test = train_test_split(train_df, test_size=test_split, random_state=42) #keep random state constant to ensure reproducible splits

LabelStr
0_GOOD                 71
1_Flat loop            71
2_White lift-off       71
3_Black lift-off       71
4_Missing              71
5_Short circuit MOS    71
Name: count, dtype: int64


In [10]:
# initialize ImageDataGenerators
# use ImageDataGen because it has method flow_from_dataframe() that works really well together with pandas dataframes
# https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator
# although deprecated the functionality can be used as discussed in feedback session

# HYPERPARAMETERS ########


class_mode = 'categorical' # how to store labels - either categorical (one-hot encoding) or as numbers
#class_mode = 'input'
labelCol = 'LabelStr'
#########################

#normalize pixel intensities
rescale = 1.0/255.0

datagen = ImageDataGenerator(
    horizontal_flip=False,
    vertical_flip=False,
    rotation_range=0.0,
    shear_range=0.0,
    rescale=rescale,
    validation_split=val_split)

datagen_augmentation = ImageDataGenerator(
    horizontal_flip=True,
    vertical_flip=True,
    rotation_range=10,
    shear_range= 5,
    zoom_range = 0.05,
    rescale=rescale,
    validation_split=val_split)

datagen_test = ImageDataGenerator(
    horizontal_flip=False,
    vertical_flip=False,
    rotation_range=0.0,
    shear_range=0.0,
    rescale=rescale,
    validation_split=0.0)


##########################################################

#unbalanced datasets

train_generator_unbalanced = datagen.flow_from_dataframe(
    train_df_train,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='training')

train_generator_unbalanced_val = datagen.flow_from_dataframe(
    train_df_train,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='validation')

train_generator_unbalanced_aug = datagen_augmentation.flow_from_dataframe(
    train_df_train,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='training')

train_generator_unbalanced_val_aug = datagen_augmentation.flow_from_dataframe(
    train_df_train,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='validation')

test_generator_unbalanced = datagen_test.flow_from_dataframe(
    train_df_test,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='training')

test_generator_unbalanced_metrics = datagen_test.flow_from_dataframe(
    train_df_test,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=1,
    color_mode="grayscale",
    shuffle=False, # keep the order fixed so that predictions line up with .classes
    seed=42,
    subset='training')

train_generator = datagen.flow_from_dataframe(
    train_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='training')

train_generator_val = datagen.flow_from_dataframe(
    train_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='validation')

train_generator_aug = datagen_augmentation.flow_from_dataframe(
    train_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='training')

train_generator_aug_val = datagen_augmentation.flow_from_dataframe(
    train_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='validation')

test_generator = datagen_test.flow_from_dataframe(
    test_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="grayscale",
    shuffle=True,
    seed=42,
    subset='training')

test_generator_metrics = datagen_test.flow_from_dataframe(
    test_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=1,
    color_mode="grayscale",
    shuffle=False,
    seed=42,
    subset='training')

# generators for transfer learning - color_mode is "rgb" here because the pretrained models expect 3-channel input

train_generator_unbalanced_color = datagen.flow_from_dataframe(
    train_df_train,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='training')

train_generator_unbalanced_val_color = datagen.flow_from_dataframe(
    train_df_train,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='validation')

test_generator_unbalanced_color = datagen_test.flow_from_dataframe(
    train_df_test,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='training')

test_generator_unbalanced_metrics_color = datagen_test.flow_from_dataframe(
    train_df_test,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=1,
    color_mode="rgb",
    shuffle=False, # keep the order fixed so that predictions line up with .classes
    seed=42,
    subset='training')

train_generator_color = datagen.flow_from_dataframe(
    train_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='training')

train_generator_val_color = datagen.flow_from_dataframe(
    train_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='validation')

train_generator_aug_color = datagen_augmentation.flow_from_dataframe(
    train_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='training')

train_generator_aug_val_color = datagen_augmentation.flow_from_dataframe(
    train_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='validation')

test_generator_color = datagen_test.flow_from_dataframe(
    test_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='training')

test_generator_metrics_color = datagen_test.flow_from_dataframe(
    test_df_balanced,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=1,
    color_mode="rgb",
    shuffle=False,
    seed=42,
    subset='training')

# augmented, unbalanced, color training generator - referenced by the transfer learning dataset selection further down
train_generator_unbalanced_aug_color = datagen_augmentation.flow_from_dataframe(
    train_df_train,
    image_path,
    x_col='filename',
    y_col=labelCol,
    target_size=target_size,
    class_mode=class_mode,
    batch_size=batch_size,
    color_mode="rgb",
    shuffle=True,
    seed=42,
    subset='training')


Found 5298 validated image filenames belonging to 6 classes.
Found 1324 validated image filenames belonging to 6 classes.
Found 5298 validated image filenames belonging to 6 classes.
Found 1324 validated image filenames belonging to 6 classes.
Found 1656 validated image filenames belonging to 6 classes.
Found 1656 validated image filenames belonging to 6 classes.
Found 272 validated image filenames belonging to 6 classes.
Found 68 validated image filenames belonging to 6 classes.
Found 272 validated image filenames belonging to 6 classes.
Found 68 validated image filenames belonging to 6 classes.
Found 86 validated image filenames belonging to 6 classes.
Found 86 validated image filenames belonging to 6 classes.
Found 5298 validated image filenames belonging to 6 classes.
Found 1324 validated image filenames belonging to 6 classes.
Found 1656 validated image filenames belonging to 6 classes.
Found 1656 validated image filenames belonging to 6 classes.
Found 272 validated image filename
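
The generator definitions above differ only in the dataframe, the `ImageDataGenerator` instance, `color_mode`, `subset`, `batch_size` and `shuffle`. One way to cut the repetition is to build the shared keyword arguments once and unpack them into `flow_from_dataframe`. This is a sketch only: `flow_kwargs` is a hypothetical helper, and its defaults copy the values used in this notebook.

```python
def flow_kwargs(color_mode, subset, batch_size=8, shuffle=True,
                target_size=(299, 299), y_col="LabelStr"):
    """Keyword arguments shared by all flow_from_dataframe calls (hypothetical helper)."""
    return dict(
        x_col="filename",
        y_col=y_col,
        target_size=target_size,
        class_mode="categorical",
        batch_size=batch_size,
        color_mode=color_mode,
        shuffle=shuffle,
        seed=42,
        subset=subset,
    )

# generators used for classification reports: one image per batch, fixed order
metrics_kwargs = flow_kwargs("grayscale", "training", batch_size=1, shuffle=False)
print(metrics_kwargs)

# intended use inside the notebook (not executed here):
# test_generator_metrics = datagen_test.flow_from_dataframe(
#     test_df_balanced, image_path, **metrics_kwargs)
```

Keeping `shuffle` as an explicit argument makes it harder to forget that the generators used for the classification report must not be shuffled.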

## Hyperparameter Tuning

So far we have only varied hyperparameters by hand. Some parameters, such as the image size and data augmentation, have been varied systematically and their effect on model performance noted.

We plan to do more hyperparameter tuning in the future.

In [11]:
# make a unique string (name) to save model and evaluation to file
# incorporate most important hyperparameters
# make a subfolder for one set of hyperparameters for more tidy folder and file structure

if aug_flag:
    augmentation_str = 'Aug'
else:
    augmentation_str = 'NoAug'

if balanced_flag:
    balance_str = 'balanced'
else:
    balance_str = 'unbalanced'

hyperparam_name = 'ImgSz_{}_{}_{}'.format(target_size[0],augmentation_str,balance_str)
hyperparam_dir = os.path.join(base_file_path,'model_evaluation')
hyperparam_dir = os.path.join(hyperparam_dir,hyperparam_name)
#check if folder exists - if not create it
if not os.path.isdir(hyperparam_dir):
    os.makedirs(hyperparam_dir)
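
Because the folder name encodes the main hyperparameters, a systematic sweep can enumerate all combinations up front. Below is a minimal sketch of such a grid. The image sizes are assumed values for illustration (only 299 appears in this notebook), and `run_name` is a hypothetical helper that mirrors the naming scheme above.

```python
from itertools import product

def run_name(img_size, aug_flag, balanced_flag):
    """Build the folder name for one hyperparameter combination."""
    augmentation_str = "Aug" if aug_flag else "NoAug"
    balance_str = "balanced" if balanced_flag else "unbalanced"
    return "ImgSz_{}_{}_{}".format(img_size, augmentation_str, balance_str)

image_sizes = [150, 299]  # assumed values for illustration
grid = list(product(image_sizes, [False, True], [False, True]))
names = [run_name(*combo) for combo in grid]

for name in names:
    print(name)
```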

## Implementation



In [29]:
# build a model to be used as baseline model
# use "simplest" CNN as baseline

model_1_CNN = tf.keras.Sequential([
    tf.keras.layers.Input((target_size[0], target_size[1], 1)),  #images are greyscale - so the input shape is (height, width, 1)
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])

model_2_CNN = tf.keras.Sequential([
    tf.keras.layers.Input((target_size[0], target_size[1], 1)),  #images are greyscale - so the input shape is (height, width, 1)
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])

model_3_CNN = tf.keras.Sequential([
    tf.keras.layers.Input((target_size[0], target_size[1], 1)),  #images are greyscale - so the input shape is (height, width, 1)
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Conv2D(128, (3, 3), activation="relu"),
    tf.keras.layers.MaxPool2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])

model_1_CNN.summary()
model_2_CNN.summary()
model_3_CNN.summary()
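
The three self-defined CNNs differ mainly in how far the feature map is reduced before `Flatten`, and the first dense layer after `Flatten` dominates the parameter count. The sketch below reproduces the arithmetic for 3x3 convolutions with Keras' default "valid" padding, each followed by 2x2 max pooling. `count_params` is a hypothetical helper written for this illustration; its numbers should be checked against the output of `model.summary()`.

```python
def count_params(input_hw, channels, conv_filters, dense_units, kernel=3, pool=2):
    """Trainable parameters of a Conv2D/MaxPool stack followed by Dense layers."""
    h, w = input_hw
    total = 0
    for filters in conv_filters:
        total += kernel * kernel * channels * filters + filters  # weights + biases
        h, w = h - kernel + 1, w - kernel + 1                    # 'valid' padding
        h, w = h // pool, w // pool                              # max pooling
        channels = filters
    features = h * w * channels                                  # size after Flatten
    for units in dense_units:
        total += features * units + units
        features = units
    return total

print("model 1:", count_params((299, 299), 1, [64], [64, 6]))
print("model 2:", count_params((299, 299), 1, [32, 64], [128, 6]))
print("model 3:", count_params((299, 299), 1, [32, 64, 128], [128, 64, 6]))
```

Under these assumptions the single-convolution baseline has by far the most weights (roughly 90 million against roughly 20 million for the three-convolution model), because it flattens a 148x148x64 feature map directly into a dense layer.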


In [30]:
#Transfer learning model - feature extraction

# Load pre-trained InceptionV3 with correct input size
base_transfer_model = tf.keras.applications.InceptionV3(
    weights='imagenet',
    include_top=False,
    input_shape=(target_size[0], target_size[1], 3)
)

# Freeze all layers for feature extraction
base_transfer_model.trainable = False

# Simple classification head
# - GlobalAveragePooling2D reduces spatial dimensions
# - Final Dense layer maps to class probabilities
inception_feature_extraction_model = tf.keras.Sequential([
    base_transfer_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(num_classes, activation='softmax')
])

inception_feature_extraction_model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.CategoricalCrossentropy(),
    #loss=tf.keras.losses.SparseCategoricalCrossentropy,
    #metrics=["accuracy",'precision',]
    metrics=["accuracy",'precision','recall',tf.keras.metrics.F1Score(average='weighted')]
)
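
One detail worth checking for the pretrained backbone: `tf.keras.applications.inception_v3.preprocess_input` scales inputs to [-1, 1], whereas the generators above rescale to [0, 1]. The mapping between the two ranges is simple and sketched here with two hypothetical helpers. Whether switching the scaling improves results on this dataset is untested.

```python
def to_inception_range(pixel):
    """Map a raw pixel value in [0, 255] to the [-1, 1] range."""
    return pixel / 127.5 - 1.0

def unit_to_inception_range(x):
    """Map a value already rescaled to [0, 1] to the [-1, 1] range."""
    return 2.0 * x - 1.0

for raw in (0, 127.5, 255):
    print(raw, to_inception_range(raw), unit_to_inception_range(raw / 255))
```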

In [31]:
# compile Models

model_1_CNN.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.CategoricalCrossentropy(),
    #loss=tf.keras.losses.SparseCategoricalCrossentropy,
    #metrics=["accuracy",'precision',]
    metrics=["accuracy",'precision','recall',tf.keras.metrics.F1Score(average='weighted')]
)

model_2_CNN.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.CategoricalCrossentropy(),
    #loss=tf.keras.losses.SparseCategoricalCrossentropy,
    #metrics=["accuracy",'precision',]
    metrics=["accuracy",'precision','recall',tf.keras.metrics.F1Score(average='weighted')]
)

model_3_CNN.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.CategoricalCrossentropy(),
    #loss=tf.keras.losses.SparseCategoricalCrossentropy,
    #metrics=["accuracy",'precision',]
    metrics=["accuracy",'precision','recall',tf.keras.metrics.F1Score(average='weighted')]
)

#F1 average parameter needs to be anything other than None when using line-wise output while fitting the model...



In [15]:
# callback that monitors the validation loss and stops training once it no longer improves
# https://keras.io/api/callbacks/early_stopping/
val_loss_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    min_delta=0.01,
    patience= loss_stop_patience,
    restore_best_weights=True,
    verbose = 2,
    start_from_epoch = 5
)
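
To make the interplay of `min_delta`, `patience` and `start_from_epoch` concrete, the following is a simplified pure-Python model of the stopping rule. It is a sketch of the idea, not Keras' actual implementation; the function name and the loss values are made up for illustration, and epochs are counted from zero.

```python
def early_stop_epoch(val_losses, patience, min_delta=0.0, start_from_epoch=0):
    """Return the zero-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if epoch < start_from_epoch:
            continue                      # warm-up epochs are ignored
        if loss < best - min_delta:       # only a sufficiently large drop counts
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

losses = [1.0, 0.5, 0.4, 0.41, 0.42, 0.405, 0.43]  # made-up validation losses
print(early_stop_epoch(losses, patience=3, min_delta=0.01))
print(early_stop_epoch(losses, patience=3, min_delta=0.01, start_from_epoch=5))
```

In the first call training stops once three consecutive epochs fail to beat the best loss by at least `min_delta`. In the second call the warm-up period delays monitoring so long that the loss sequence ends before the patience runs out.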

In [16]:
#select model datasets based on flags

#for model1 - 3
if balanced_flag:
    CNN_model_val_gen = train_generator_val
    if aug_flag:
        CNN_model_gen = train_generator_aug
    else:
        CNN_model_gen = train_generator
else:
    CNN_model_val_gen = train_generator_unbalanced_val
    if aug_flag:
        CNN_model_gen = train_generator_unbalanced_aug
    else:
        CNN_model_gen = train_generator_unbalanced

#for transfer learning
if balanced_flag:
    transfer_model_val_gen = train_generator_val_color
    if aug_flag:
        transfer_model_gen = train_generator_aug_color
    else:
        transfer_model_gen = train_generator_color
else:
    transfer_model_val_gen = train_generator_unbalanced_val_color
    if aug_flag:
        transfer_model_gen = train_generator_unbalanced_aug_color
    else:
        transfer_model_gen = train_generator_unbalanced_color
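
The nested `if`/`else` selection can also be written as a lookup keyed by the two flags, which makes a missing combination easy to spot. The sketch below uses the generator names as strings so that it runs without the data; in the notebook the dictionary values would be the generator objects themselves.

```python
# (balanced_flag, aug_flag) -> (training generator, validation generator)
CNN_GENERATORS = {
    (True, True): ("train_generator_aug", "train_generator_val"),
    (True, False): ("train_generator", "train_generator_val"),
    (False, True): ("train_generator_unbalanced_aug", "train_generator_unbalanced_val"),
    (False, False): ("train_generator_unbalanced", "train_generator_unbalanced_val"),
}

def select_generators(balanced_flag, aug_flag, table=CNN_GENERATORS):
    """Return the (training, validation) pair for the given flags."""
    return table[(balanced_flag, aug_flag)]

train_name, val_name = select_generators(balanced_flag=True, aug_flag=False)
print(train_name, val_name)
```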

In [17]:
#Model 1
history_1 = model_1_CNN.fit(
    CNN_model_gen,
    validation_data = CNN_model_val_gen,
    epochs=max_epochs,
    callbacks=[val_loss_stop],
    verbose = 2 #2 is one line per epoch
)

Epoch 1/100
34/34 - 26s - 767ms/step - accuracy: 0.3566 - f1_score: 0.3563 - loss: 10.7637 - precision: 0.3796 - recall: 0.3419 - val_accuracy: 0.7206 - val_f1_score: 0.7163 - val_loss: 1.6770 - val_precision: 0.7164 - val_recall: 0.7059
Epoch 2/100
34/34 - 24s - 698ms/step - accuracy: 0.8199 - f1_score: 0.8179 - loss: 0.6644 - precision: 0.8514 - recall: 0.7794 - val_accuracy: 0.9118 - val_f1_score: 0.9121 - val_loss: 0.4398 - val_precision: 0.9375 - val_recall: 0.8824
Epoch 3/100
34/34 - 24s - 693ms/step - accuracy: 0.9779 - f1_score: 0.9778 - loss: 0.0961 - precision: 0.9925 - recall: 0.9706 - val_accuracy: 0.8971 - val_f1_score: 0.8978 - val_loss: 0.4340 - val_precision: 0.8971 - val_recall: 0.8971
Epoch 4/100
34/34 - 23s - 689ms/step - accuracy: 0.9963 - f1_score: 0.9963 - loss: 0.0292 - precision: 0.9963 - recall: 0.9963 - val_accuracy: 0.8676 - val_f1_score: 0.8723 - val_loss: 0.5441 - val_precision: 0.8788 - val_recall: 0.8529
Epoch 5/100
34/34 - 23s - 691ms/step - accuracy: 0.

In [32]:
#Model 2


history_2 = model_2_CNN.fit(
    CNN_model_gen,
    validation_data = CNN_model_val_gen,
    epochs=max_epochs,
    callbacks=[val_loss_stop],
    verbose = 2 #2 is one line per epoch -
)

Epoch 1/100
34/34 - 16s - 460ms/step - accuracy: 0.4154 - f1_score: 0.4082 - loss: 3.9656 - precision: 0.4878 - recall: 0.2206 - val_accuracy: 0.7647 - val_f1_score: 0.7663 - val_loss: 1.0365 - val_precision: 0.8919 - val_recall: 0.4853
Epoch 2/100
34/34 - 14s - 415ms/step - accuracy: 0.8824 - f1_score: 0.8805 - loss: 0.4804 - precision: 0.9386 - recall: 0.7868 - val_accuracy: 0.8382 - val_f1_score: 0.8385 - val_loss: 0.4608 - val_precision: 0.9016 - val_recall: 0.8088
Epoch 3/100
34/34 - 15s - 433ms/step - accuracy: 0.9669 - f1_score: 0.9670 - loss: 0.1017 - precision: 0.9704 - recall: 0.9632 - val_accuracy: 0.8676 - val_f1_score: 0.8685 - val_loss: 0.3212 - val_precision: 0.9219 - val_recall: 0.8676
Epoch 4/100
34/34 - 14s - 414ms/step - accuracy: 0.9816 - f1_score: 0.9816 - loss: 0.0677 - precision: 0.9852 - recall: 0.9816 - val_accuracy: 0.9118 - val_f1_score: 0.9121 - val_loss: 0.2181 - val_precision: 0.9531 - val_recall: 0.8971
Epoch 5/100
34/34 - 14s - 413ms/step - accuracy: 0.9

In [19]:
#Model 3


history_3 = model_3_CNN.fit(
    CNN_model_gen,
    validation_data = CNN_model_val_gen,
    epochs=max_epochs,
    callbacks=[val_loss_stop],
    verbose = 2 #2 is one line per epoch -
)

Epoch 1/100
34/34 - 14s - 404ms/step - accuracy: 0.4228 - f1_score: 0.4215 - loss: 1.4972 - precision: 0.7333 - recall: 0.2022 - val_accuracy: 0.7059 - val_f1_score: 0.7216 - val_loss: 0.9627 - val_precision: 0.8158 - val_recall: 0.4559
Epoch 2/100
34/34 - 12s - 343ms/step - accuracy: 0.8419 - f1_score: 0.8412 - loss: 0.5517 - precision: 0.8821 - recall: 0.7978 - val_accuracy: 0.8235 - val_f1_score: 0.8177 - val_loss: 0.4271 - val_precision: 0.9474 - val_recall: 0.7941
Epoch 3/100
34/34 - 12s - 346ms/step - accuracy: 0.9265 - f1_score: 0.9265 - loss: 0.2659 - precision: 0.9361 - recall: 0.9154 - val_accuracy: 0.8971 - val_f1_score: 0.8977 - val_loss: 0.2722 - val_precision: 0.9231 - val_recall: 0.8824
Epoch 4/100
34/34 - 12s - 344ms/step - accuracy: 0.9559 - f1_score: 0.9559 - loss: 0.1143 - precision: 0.9559 - recall: 0.9559 - val_accuracy: 0.9412 - val_f1_score: 0.9440 - val_loss: 0.1782 - val_precision: 0.9552 - val_recall: 0.9412
Epoch 5/100
34/34 - 11s - 335ms/step - accuracy: 0.9

In [34]:
#transfer learning. Feature extraction

val_loss_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    min_delta=0.01,
    patience= 3,
    restore_best_weights=True,
    verbose = 2,
    start_from_epoch = 5
)

history_feat_extract = inception_feature_extraction_model.fit(
    transfer_model_gen,
    validation_data = transfer_model_val_gen,
    epochs=max_epochs,
    callbacks=[val_loss_stop],
    verbose = 2 #2 is one line per epoch -
)

Epoch 1/100
34/34 - 20s - 595ms/step - accuracy: 0.4449 - f1_score: 0.4411 - loss: 1.4516 - precision: 0.7558 - recall: 0.2390 - val_accuracy: 0.7647 - val_f1_score: 0.7358 - val_loss: 0.8971 - val_precision: 0.9667 - val_recall: 0.4265
Epoch 2/100
34/34 - 12s - 345ms/step - accuracy: 0.7279 - f1_score: 0.7215 - loss: 0.8267 - precision: 0.8439 - recall: 0.5368 - val_accuracy: 0.6912 - val_f1_score: 0.6947 - val_loss: 0.6921 - val_precision: 0.8800 - val_recall: 0.6471
Epoch 3/100
34/34 - 12s - 349ms/step - accuracy: 0.8015 - f1_score: 0.7990 - loss: 0.6288 - precision: 0.9038 - recall: 0.6912 - val_accuracy: 0.7941 - val_f1_score: 0.7897 - val_loss: 0.6001 - val_precision: 0.8846 - val_recall: 0.6765
Epoch 4/100
34/34 - 12s - 341ms/step - accuracy: 0.8529 - f1_score: 0.8526 - loss: 0.5185 - precision: 0.9196 - recall: 0.7574 - val_accuracy: 0.8088 - val_f1_score: 0.7936 - val_loss: 0.5222 - val_precision: 0.8571 - val_recall: 0.7059
Epoch 5/100
34/34 - 12s - 349ms/step - accuracy: 0.8

## Evaluation Metrics



In [25]:
# Model 1

#make one folder for each model to save metrics
model_1_dir = os.path.join(hyperparam_dir,'model_1')
if not os.path.isdir(model_1_dir):
    os.makedirs(model_1_dir)

# test accuracy on test data
if balanced_flag:
    test_loss, test_accuracy, test_precision, test_recall,test_f1_score = model_1_CNN.evaluate(test_generator)

    #for classification report
    true_labels = test_generator_metrics.classes
    # model.predict returns the output of the last model layer, i.e. class probabilities when using 'softmax'
    predicted_labels = model_1_CNN.predict(test_generator_metrics)

else:
    test_loss, test_accuracy, test_precision, test_recall,test_f1_score = model_1_CNN.evaluate(test_generator_unbalanced)

    true_labels = test_generator_unbalanced_metrics.classes
    # model.predict returns the output of the last model layer, i.e. class probabilities when using 'softmax'
    predicted_labels = model_1_CNN.predict(test_generator_unbalanced_metrics)

#convert to numerical - np.argmax directly does the job
predicted_labels = np.argmax(predicted_labels, axis=-1)

print(f"Model 1. Test Accuracy: {test_accuracy:.3g} | Test Loss: {test_loss:.3g} | Test Precision: {test_precision:.3g} | Test Recall: {test_recall:.3g} | Test F1 Score: {test_f1_score:.3g}:")

print(classification_report(true_labels, predicted_labels,target_names = label_list))

#save as dict for future use as well
report = classification_report(true_labels, predicted_labels,target_names = label_list,output_dict=True)
#convert to dataframe for easy use and saving to csv
report_df = pd.DataFrame(report).transpose()

#save to file
metrics_baseline_savename = os.path.join(model_1_dir,'classification_report.csv')

report_df.to_csv(metrics_baseline_savename)

#save model as well for future use
#save the model:
model_1_CNN.save(os.path.join(model_1_dir,'model.keras'))

11/11 ━━━━━━━━━━━━━━━━━━━━ 1s 70ms/step - accuracy: 0.7674 - f1_score: 0.7663 - loss: 0.6175 - precision: 0.7674 - recall: 0.7674
86/86 ━━━━━━━━━━━━━━━━━━━━ 2s 20ms/step
Model 1. Test Accuracy: 0.767 | Test Loss: 0.617 | Test Precision: 0.767 | Test Recall: 0.767 | Test F1 Score: 0.766:
                     precision    recall  f1-score   support

             0_GOOD       0.63      0.71      0.67        17
        1_Flat loop       0.79      0.61      0.69        18
   2_White lift-off       0.64      1.00      0.78        14
   3_Black lift-off       1.00      0.85      0.92        13
          4_Missing       1.00      1.00      1.00         9
5_Short circuit MOS       0.82      0.60      0.69        15

           accuracy                           0.77        86
          macro avg       0.81      0.79      0.79        86
       weighted avg       0.79      0.77      0.77        86
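
The two averages at the bottom of the report combine the per-class F1 scores differently: the macro average treats every class equally, while the weighted average weights each class by its support. Both can be recomputed from the rounded Model 1 values printed above, as this plain-Python sketch shows:

```python
f1_scores = [0.67, 0.69, 0.78, 0.92, 1.00, 0.69]  # per-class f1-score column
supports = [17, 18, 14, 13, 9, 15]                # per-class support column

macro_f1 = sum(f1_scores) / len(f1_scores)
weighted_f1 = sum(f * s for f, s in zip(f1_scores, supports)) / sum(supports)

print(f"macro avg: {macro_f1:.2f}, weighted avg: {weighted_f1:.2f}")
```

The results agree with the report up to rounding. On the balanced test set the two averages stay close; on an unbalanced test set the weighted average would be dominated by the most frequent class.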



In [33]:
# Model 2

#make one folder for each model to save metrics
model_2_dir = os.path.join(hyperparam_dir,'model_2')
if not os.path.isdir(model_2_dir):
    os.makedirs(model_2_dir)

# test accuracy on test data
if balanced_flag:
    test_loss, test_accuracy, test_precision, test_recall,test_f1_score = model_2_CNN.evaluate(test_generator)

    #for classification report
    true_labels = test_generator_metrics.classes
    # model.predict returns the output of the last model layer, i.e. class probabilities when using 'softmax'
    predicted_labels = model_2_CNN.predict(test_generator_metrics)

else:
    test_loss, test_accuracy, test_precision, test_recall,test_f1_score = model_2_CNN.evaluate(test_generator_unbalanced)

    true_labels = test_generator_unbalanced_metrics.classes
    # model.predict returns the output of the last model layer, i.e. class probabilities when using 'softmax'
    predicted_labels = model_2_CNN.predict(test_generator_unbalanced_metrics)

#convert to numerical - np.argmax directly does the job
predicted_labels = np.argmax(predicted_labels, axis=-1)

print(f"Model 2. Test Accuracy: {test_accuracy:.3g} | Test Loss: {test_loss:.3g} | Test Precision: {test_precision:.3g} | Test Recall: {test_recall:.3g} | Test F1 Score: {test_f1_score:.3g}:")

print(classification_report(true_labels, predicted_labels,target_names = label_list))

#save as dict for future use as well
report = classification_report(true_labels, predicted_labels,target_names = label_list,output_dict=True)
#convert to dataframe for easy use and saving to csv
report_df = pd.DataFrame(report).transpose()

#save to file
metrics_baseline_savename = os.path.join(model_2_dir,'classification_report.csv')

report_df.to_csv(metrics_baseline_savename)

#save model as well for future use
#save the model:
model_2_CNN.save(os.path.join(model_2_dir,'model.keras'))

11/11 ━━━━━━━━━━━━━━━━━━━━ 1s 50ms/step - accuracy: 0.8488 - f1_score: 0.8501 - loss: 0.5193 - precision: 0.8675 - recall: 0.8372
86/86 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step
Model 2. Test Accuracy: 0.849 | Test Loss: 0.519 | Test Precision: 0.867 | Test Recall: 0.837 | Test F1 Score: 0.85:
                     precision    recall  f1-score   support

             0_GOOD       0.94      0.88      0.91        17
        1_Flat loop       0.93      0.72      0.81        18
   2_White lift-off       0.68      0.93      0.79        14
   3_Black lift-off       0.85      0.85      0.85        13
          4_Missing       1.00      0.78      0.88         9
5_Short circuit MOS       0.82      0.93      0.88        15

           accuracy                           0.85        86
          macro avg       0.87      0.85      0.85        86
       weighted avg       0.87      0.85      0.85        86



In [23]:
# Model 3

#make one folder for each model to save metrics
model_3_dir = os.path.join(hyperparam_dir,'model_3')
if not os.path.isdir(model_3_dir):
    os.makedirs(model_3_dir)

# test accuracy on test data
if balanced_flag:
    test_loss, test_accuracy, test_precision, test_recall,test_f1_score = model_3_CNN.evaluate(test_generator)

    #for classification report
    true_labels = test_generator_metrics.classes
    # model.predict returns the output of the last model layer, i.e. class probabilities when using 'softmax'
    predicted_labels = model_3_CNN.predict(test_generator_metrics)

else:
    test_loss, test_accuracy, test_precision, test_recall,test_f1_score = model_3_CNN.evaluate(test_generator_unbalanced)

    true_labels = test_generator_unbalanced_metrics.classes
    # model.predict returns the output of the last model layer, i.e. class probabilities when using 'softmax'
    predicted_labels = model_3_CNN.predict(test_generator_unbalanced_metrics)

#convert to numerical - np.argmax directly does the job
predicted_labels = np.argmax(predicted_labels, axis=-1)

print(f"Model 3. Test Accuracy: {test_accuracy:.3g} | Test Loss: {test_loss:.3g} | Test Precision: {test_precision:.3g} | Test Recall: {test_recall:.3g} | Test F1 Score: {test_f1_score:.3g}:")

print(classification_report(true_labels, predicted_labels,target_names = label_list))

#save as dict for future use as well
report = classification_report(true_labels, predicted_labels,target_names = label_list,output_dict=True)
#convert to dataframe for easy use and saving to csv
report_df = pd.DataFrame(report).transpose()

#save to file
metrics_baseline_savename = os.path.join(model_3_dir,'classification_report.csv')

report_df.to_csv(metrics_baseline_savename)

#save model as well for future use
#save the model:
model_3_CNN.save(os.path.join(model_3_dir,'model.keras'))

[1m11/11[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 57ms/step - accuracy: 0.8837 - f1_score: 0.8831 - loss: 0.3625 - precision: 0.8929 - recall: 0.8721
[1m86/86[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 15ms/step
Model 3. Test Accuracy: 0.884 | Test Loss: 0.362 | Test Precision: 0.893 | Test Recall: 0.872 | Test F1 Score: 0.883:
                     precision    recall  f1-score   support

             0_GOOD       0.84      0.94      0.89        17
        1_Flat loop       1.00      0.72      0.84        18
   2_White lift-off       0.76      0.93      0.84        14
   3_Black lift-off       0.85      0.85      0.85        13
          4_Missing       1.00      1.00      1.00         9
5_Short circuit MOS       0.93      0.93      0.93        15

           accuracy                           0.88        86
          macro avg       0.90      0.90      0.89        86
       weighted avg       0.90      0.88      0.88        86



In [35]:
# Model transfer feature extraction

#make one folder for each model to save metrics
model_feat_extract_dir = os.path.join(hyperparam_dir,'InceptionV3_feat_extract')
os.makedirs(model_feat_extract_dir, exist_ok=True)

# test accuracy on test data
if balanced_flag:
    test_loss, test_accuracy, test_precision, test_recall,test_f1_score = inception_feature_extraction_model.evaluate(test_generator_color)

    #for classification report
    # take the labels from the same generator that is used for prediction below
    true_labels = test_generator_metrics_color.classes
    # model.predict directly gives you the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
    predicted_labels = inception_feature_extraction_model.predict(test_generator_metrics_color)

else:
    test_loss, test_accuracy, test_precision, test_recall,test_f1_score = inception_feature_extraction_model.evaluate(test_generator_unbalanced_color)

    # take the labels from the same generator that is used for prediction below
    true_labels = test_generator_unbalanced_metrics_color.classes
    # model.predict directly gives you the output of the last model layer, i.e. class probabilities when using e.g. 'softmax'
    predicted_labels = inception_feature_extraction_model.predict(test_generator_unbalanced_metrics_color)

#convert to numerical - np.argmax directly does the job
predicted_labels = np.argmax(predicted_labels, axis=-1)

print(f"Feat. Extract. Model:  Test Accuracy: {test_accuracy:.3g} | Test Loss: {test_loss:.3g} | Test Precision: {test_precision:.3g} | Test Recall: {test_recall:.3g} | Test F1 Score: {test_f1_score:.3g}:")

print(classification_report(true_labels, predicted_labels,target_names = label_list))

#save as dict for future use as well
report = classification_report(true_labels, predicted_labels,target_names = label_list,output_dict=True)
#convert to dataframe for easy use and saving to csv
report_df = pd.DataFrame(report).transpose()

#save to file
metrics_baseline_savename = os.path.join(model_feat_extract_dir,'classification_report.csv')

report_df.to_csv(metrics_baseline_savename)

#save model as well for future use
#save the model:
inception_feature_extraction_model.save(os.path.join(model_feat_extract_dir,'model.keras'))

[1m11/11[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 277ms/step - accuracy: 0.8256 - f1_score: 0.8266 - loss: 0.5592 - precision: 0.8272 - recall: 0.7791
[1m86/86[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m7s[0m 61ms/step
Feat. Extract. Model:  Test Accuracy: 0.826 | Test Loss: 0.559 | Test Precision: 0.827 | Test Recall: 0.779 | Test F1 Score: 0.827:
                     precision    recall  f1-score   support

             0_GOOD       0.74      1.00      0.85        17
        1_Flat loop       1.00      0.67      0.80        18
   2_White lift-off       0.63      0.86      0.73        14
   3_Black lift-off       0.91      0.77      0.83        13
          4_Missing       1.00      1.00      1.00         9
5_Short circuit MOS       0.92      0.73      0.81        15

           accuracy                           0.83        86
          macro avg       0.87      0.84      0.84        86
       weighted avg       0.86      0.83      0.83        86



## Comparative Analysis

A table comparing the performance of the different models and hyperparameter settings is available in the GitHub repository (Model_Performance_overview.xls or Model_Performance_overview.csv).
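
A comparison table of this kind could also be assembled directly from the `classification_report.csv` files that the evaluation cells save into one folder per model. The following is only a minimal sketch under that assumption, not the way the overview file was actually produced. The function name `collect_reports` and the demo folder are hypothetical; the demo values are the summary rows of the two reports printed above.

```python
import os
import tempfile

import pandas as pd


def collect_reports(hyperparam_dir, summary_rows=("macro avg", "weighted avg")):
    """Gather each model's classification_report.csv into one comparison table."""
    columns = ["model", "accuracy"] + [f"{name} f1" for name in summary_rows]
    rows = []
    for model_name in sorted(os.listdir(hyperparam_dir)):
        report_path = os.path.join(hyperparam_dir, model_name, "classification_report.csv")
        if not os.path.isfile(report_path):
            continue  # skip folders without a saved report
        # The first CSV column holds the row labels written by report_df.to_csv
        report_df = pd.read_csv(report_path, index_col=0)
        row = {"model": model_name, "accuracy": report_df.loc["accuracy", "f1-score"]}
        for name in summary_rows:
            row[f"{name} f1"] = report_df.loc[name, "f1-score"]
        rows.append(row)
    table = pd.DataFrame(rows, columns=columns).set_index("model")
    return table.sort_values("accuracy", ascending=False)


# Demo in a temporary folder, using the summary values of the two reports above
demo_values = {
    "model_3": {"accuracy": 0.88, "macro avg": 0.89, "weighted avg": 0.88},
    "InceptionV3_feat_extract": {"accuracy": 0.83, "macro avg": 0.84, "weighted avg": 0.83},
}
with tempfile.TemporaryDirectory() as demo_dir:
    for model_name, f1_by_row in demo_values.items():
        os.makedirs(os.path.join(demo_dir, model_name))
        pd.DataFrame({"f1-score": f1_by_row}).to_csv(
            os.path.join(demo_dir, model_name, "classification_report.csv")
        )
    comparison = collect_reports(demo_dir)

print(comparison)
```

In the notebook itself, calling `collect_reports(hyperparam_dir)` after all evaluation cells have run would give one row per model folder for the current hyperparameter setting.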

Some results stand out:

* Data augmentation seems to lower model performance across the board, even when we see overfitting during training. The likely reason is that the images are very regular, with little variation in the orientation of the features. Therefore, we will adjust the data augmentation in the future to exclude image flipping and similar transformations.
* The transfer learning model performs worse than the three relatively simple models, especially at low image resolutions. The most likely reason is that, as of now, we only use feature extraction, so the pretrained layers are not adapted to our data. For any image size that the base model was not originally trained on, this very likely means poor performance. At higher resolutions the transfer learning model performs better in comparison.
* Higher image resolution does not noticeably improve model performance.
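
The planned adjustment of the data augmentation could look roughly like the sketch below: small geometric jitter only, with both flips switched off. All numeric ranges and the `rescale` default are illustrative assumptions, not tuned or tested values, and `augmentation_kwargs` and `make_train_datagen` are hypothetical names. The keyword arguments themselves are regular `ImageDataGenerator` parameters.

```python
# Hypothetical augmentation settings: mild geometric jitter, no flips.
# The numeric ranges are illustrative assumptions, not tuned values.
augmentation_kwargs = {
    "rotation_range": 5,         # maximum rotation in degrees
    "width_shift_range": 0.05,   # fraction of the image width
    "height_shift_range": 0.05,  # fraction of the image height
    "zoom_range": 0.05,          # zoom in/out by up to 5 %
    "horizontal_flip": False,    # features have a fixed orientation
    "vertical_flip": False,
    "fill_mode": "nearest",
}


def make_train_datagen(aug_flag, rescale=1.0 / 255):
    """Return an ImageDataGenerator that augments only when aug_flag is set."""
    # Imported lazily so the settings above can be inspected without TensorFlow
    from keras.src.legacy.preprocessing.image import ImageDataGenerator

    kwargs = augmentation_kwargs if aug_flag else {}
    return ImageDataGenerator(rescale=rescale, **kwargs)
```

Keeping the settings in one dictionary next to the other hyperparameters would make it easy to store them alongside the metrics of each run.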

Some things are still missing in the analysis / evaluation and will be added in the near future:

* Transfer learning models with fine-tuning
* Different transfer learning base architectures
* Once a best model is found, we will tackle the task of identifying the drift label class
* More fine-tuning of hyperparameters for a few selected models
* Class weighting instead of a balanced dataset (the balanced dataset is very small)
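
For the class weighting item, one possible starting point is the common "balanced" heuristic, which weights each class by `n_samples / (n_classes * class_count)`. The sketch below is an assumption about how this could be done, not something that has been run in this notebook. `balanced_class_weights` is a hypothetical helper and the label list is a toy example.

```python
from collections import Counter


def balanced_class_weights(class_indices):
    """Map each class index to n_samples / (n_classes * class_count)."""
    counts = Counter(class_indices)
    n_samples = len(class_indices)
    n_classes = len(counts)
    return {
        cls: n_samples / (n_classes * count)
        for cls, count in sorted(counts.items())
    }


# Toy example: class 0 is common, class 2 is rare
example_labels = [0] * 6 + [1] * 3 + [2] * 1
class_weight = balanced_class_weights(example_labels)
print(class_weight)

# In the notebook, the labels would come from the unbalanced training
# generator (hypothetical name), and the dict would be passed to fit:
# class_weight = balanced_class_weights(list(train_generator_unbalanced.classes))
# model.fit(train_generator_unbalanced, class_weight=class_weight, ...)
```

Rare classes receive weights above 1 and frequent classes weights below 1, so the full unbalanced dataset can be used for training without the majority class dominating the loss.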