# Quality Control of Carbon Look Components via Surface Defect Classification with Deep Neural Networks

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/airtlab/surface-defect-classification-in-carbon-look-components-using-deep-neural-networks/blob/master/notebooks/Surface_Defect_Classification_in_Carbon_Look_Components_Using_Deep_Neural_Networks.ipynb)

This notebook contains the source code of the experiments presented in
>A. Silenzi, S. Tomassini, N. Falcionelli, P. Contardo, A. Bonci, A.F. Dragoni, P. Sernani, *Quality Control of Carbon Look Components via Surface Defect Classification with Deep Neural Networks*.

The paper is currently under review for publication in the [Sensors MDPI journal](https://www.mdpi.com/journal/sensors).

Specifically, the experiments are **accuracy tests of ten different 2D Convolutional Neural Networks (2D CNNs) pretrained on ImageNet**. The models are combined with fully connected layers to classify images of carbon-look components as defective or non-defective and to recognize different types of surface defects. The classification is performed on samples from a real case study.

The image database is publicly available in a dedicated GitHub repository:
> <https://github.com/airtlab/surface-defect-classification-in-carbon-look-components-dataset>

The tested 2D CNNs are:

- VGG16 ([https://keras.io/api/applications/vgg/#vgg16-function](https://keras.io/api/applications/vgg/#vgg16-function))
- VGG19 ([https://keras.io/api/applications/vgg/#vgg19-function](https://keras.io/api/applications/vgg/#vgg19-function))
- ResNet50V2 ([https://keras.io/api/applications/resnet/#resnet50v2-function](https://keras.io/api/applications/resnet/#resnet50v2-function))
- ResNet101V2 ([https://keras.io/api/applications/resnet/#resnet101v2-function](https://keras.io/api/applications/resnet/#resnet101v2-function))
- ResNet152V2 ([https://keras.io/api/applications/resnet/#resnet152v2-function](https://keras.io/api/applications/resnet/#resnet152v2-function))
- InceptionV3 ([https://keras.io/api/applications/inceptionv3](https://keras.io/api/applications/inceptionv3))
- MobileNetV2 ([https://keras.io/api/applications/mobilenet/#mobilenetv2-function](https://keras.io/api/applications/mobilenet/#mobilenetv2-function))
- NASNetMobile ([https://keras.io/api/applications/nasnet/#nasnetmobile-function](https://keras.io/api/applications/nasnet/#nasnetmobile-function))
- DenseNet121 ([https://keras.io/api/applications/densenet/#densenet121-function](https://keras.io/api/applications/densenet/#densenet121-function))
- Xception ([https://keras.io/api/applications/xception/](https://keras.io/api/applications/xception/))

### Note

The results presented in the paper are computed with a **GPU runtime**. Each model was evaluated in **10 randomized tests** to obtain a more general estimate of its performance. Because the dataset splits are randomized and GPU computation is non-deterministic, the results can change slightly across runs.

For more information about non-determinism on GPU with TensorFlow, check <https://github.com/NVIDIA/framework-determinism>.
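
Part of the run-to-run variation can be reduced by fixing the random seeds before each experiment. The sketch below is an assumption and not part of the procedure used for the paper: `seed_everything` is a hypothetical helper, and GPU kernels may remain non-deterministic unless `deterministic_ops=True` is passed on a TensorFlow version that provides `tf.config.experimental.enable_op_determinism`.

```python
import random


def seed_everything(seed, deterministic_ops=False):
    """Hypothetical helper: seeds the Python, NumPy, and TensorFlow generators."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed: only the Python generator is seeded
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)
        # Optional: request deterministic kernels where the installed version supports it
        if deterministic_ops and hasattr(tf.config.experimental, "enable_op_determinism"):
            tf.config.experimental.enable_op_determinism()
    except ImportError:
        pass  # TensorFlow not installed: nothing else to seed


seed_everything(42)
first = random.random()
seed_everything(42)
second = random.random()
print(first == second)  # True: the same seed reproduces the same draw
```

Seeding makes the Python-side randomness repeatable; it does not by itself guarantee identical training results on a GPU.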

## 1 Preliminary Operations
The following cells:
- install the packages used in the experiments (opencv, matplotlib, scikit-learn, scikit-image, pandas). In Google Colab these packages are already available, so there is no need to install them manually;
- print some information such as the versions of the packages in use (Keras, TensorFlow, NumPy, scikit-learn), the CPU, and the GPU of the machine hosting the notebook;
- import the libraries used for the experiments;
- **clone the image repository** into the `datarepo` directory.

In [None]:
# Installs the packages used in the experiments. There is no need to run this cell in Google Colab
!pip install opencv-python-headless
!pip install -U matplotlib
!pip install -U scikit-learn
!pip install pandas
!pip install -U scikit-image

In [None]:
# Keras, Tensorflow, Scikit-Learn versions installed

from keras import __version__
from keras import backend as K
import sklearn
import matplotlib
import cv2
import numpy
import scipy

print('Using Scipy version: {}.'.format(scipy.__version__))
print('Using Numpy version: {}.'.format(numpy.__version__))
print('Using Scikit-learn version: {}.'.format(sklearn.__version__))
print('Using OpenCV version: {}.'.format(cv2.__version__))
print('Using Matplotlib version: {}.'.format(matplotlib.__version__))
print('Using Keras version:', __version__, 'backend:', K.backend())

if K.backend() == "tensorflow":
    import tensorflow as tf
    device_name = tf.test.gpu_device_name()
    if device_name == '':
        device_name = "None"
    print('Using TensorFlow version:', tf.__version__, ', GPU:', device_name)

In [None]:
# CPU in use
!cat /proc/cpuinfo

In [None]:
# GPU in use
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)

In [None]:
# Import of used libraries

from keras.models import model_from_json, Model
import os, sys, re, codecs, csv
from PIL import Image
from os import listdir, stat
import cv2
import numpy as np
import matplotlib.pyplot as plt
import csv
import random
from random import shuffle
import shutil
import argparse

import pandas as pd
import sklearn
from sklearn.model_selection import StratifiedShuffleSplit, train_test_split
from sklearn.metrics import roc_curve, auc, accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, classification_report, RocCurveDisplay
from tensorflow.keras.callbacks import EarlyStopping

from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, Activation, Flatten, TimeDistributed, Bidirectional, LSTM, BatchNormalization
from tensorflow.keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, GlobalMaxPool2D
from tensorflow.keras.applications.xception import Xception, preprocess_input as xception_preprocess_input
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input as vgg16_preprocess_input
from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input as vgg19_preprocess_input
from tensorflow.keras.applications.resnet_v2 import ResNet50V2, preprocess_input as resnet_v2_preprocess_input
from tensorflow.keras.applications.resnet_v2 import ResNet101V2
from tensorflow.keras.applications.resnet_v2 import ResNet152V2
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input as inception_v3_preprocess_input
from tensorflow.keras.applications.inception_resnet_v2 import InceptionResNetV2, preprocess_input as inception_resnet_v2_preprocess_input
from tensorflow.keras.applications.densenet import DenseNet121, preprocess_input as densenet_preprocess_input
from tensorflow.keras.applications.nasnet import NASNetMobile, preprocess_input as nasnet_preprocess_input
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2, preprocess_input as mobilenet_v2_preprocess_input
from tensorflow.keras import optimizers
from tensorflow.keras.utils import Sequence

from io import BytesIO
import tensorflow as tf
import keras
from keras import preprocessing
from keras.preprocessing import image
from keras.preprocessing.image import load_img
from keras import models
from keras.regularizers import l2

from keras.preprocessing.sequence import pad_sequences

from keras.callbacks import ModelCheckpoint
from keras import initializers, regularizers, constraints, layers
from keras.utils.data_utils import get_file
from keras.preprocessing.image import img_to_array
from keras.utils.np_utils import to_categorical

import skimage
import scipy
from scipy import ndimage, misc
from PIL import Image

In [None]:
# Downloads the image dataset for the classification of carbon look components
!mkdir datarepo
!git clone https://github.com/airtlab/surface-defect-classification-in-carbon-look-components-dataset.git datarepo

## 2 Image Preprocessing

The following cells:
- define **utility functions for data preprocessing**. Specifically, these functions
    - reduce the image size;
    - perform data augmentation by flipping images horizontally and vertically
    - copy the images to different paths;
    - convert to gray scale;
    - perform illumination normalization;
- **run the image preprocessing**. Note that **only data augmentation** is performed (the best results were achieved without converting to grayscale and without illumination normalization). In the augmented image datasets, the original images are included as well. After these operations, four datasets are available for the experiments
    - **the original dataset for binary classification**, which contains images from two classes, i.e. negative (no defects) and positive (with defects). There are **200 images per class**, with a total of 400 images;
    - **the augmented dataset for binary classification**, which contains the images of the previous dataset flipped horizontally, vertically, and in their original shape as well. There are **600 images per class**, with a total of 1200 images;
    - **the original dataset for multi-class classification**, which contains images from three classes, i.e. negative (no defects), with recoverable defects, and with non-recoverable defects. There are **500 images per class**, with a total of 1500 images;
    - **the augmented dataset for multi-class classification**, which contains the images of the previous dataset flipped horizontally, vertically, and in their original shape as well. There are **1500 images per class**, with a total of 4500 images.
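
The two flips and the resulting dataset sizes can be illustrated on a tiny image stored as nested lists. This is only a sketch: the notebook itself uses `cv2.flip`, and the helper names below are hypothetical. It assumes OpenCV's documented flip codes, where `flipCode=0` reverses the row order and `flipCode=1` reverses each row.

```python
def flip_around_x(image):
    """Reverses the row order, like cv2.flip with flipCode=0."""
    return image[::-1]


def flip_around_y(image):
    """Reverses each row, like cv2.flip with flipCode=1."""
    return [row[::-1] for row in image]


image = [[1, 2, 3],
         [4, 5, 6]]
print(flip_around_x(image))  # [[4, 5, 6], [1, 2, 3]]
print(flip_around_y(image))  # [[3, 2, 1], [6, 5, 4]]

# (images per class, number of classes) of the original datasets
originals = {"binary": (200, 2), "multi-class": (500, 3)}
augmented = {}
for name, (per_class, n_classes) in originals.items():
    augmented_per_class = per_class * 3  # original + two flipped copies
    augmented[name] = (augmented_per_class, augmented_per_class * n_classes)
print(augmented)  # {'binary': (600, 1200), 'multi-class': (1500, 4500)}
```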

In [None]:
# Constants with the paths to the datasets
ORIGINAL_BINARY_DATASET_DIR = 'datarepo/carbon-binary'
AUGMENTED_BINARY_DATASET_DIR = 'datarepo/carbon-binary-augmented'
ORIGINAL_MULTI_DATASET_DIR = 'datarepo/carbon-multiclass'
AUGMENTED_MULTI_DATASET_DIR = 'datarepo/carbon-multiclass-augmented'

def downsize(path):

    """ Reduces the file size (in bytes) of the images by re-saving them with optimized compression

    Parameters
    ----------
    path : str
           Path to the folder with images to be downsized
    """
    for defect_type in os.listdir(path):
        read_path = path + '/' + defect_type + '/'
        for files in os.listdir(read_path):
            img_to_downsize = Image.open(read_path + files)
            img_to_downsize.save(read_path + files, optimize=True, quality=85)


def data_augmentation(dataset_path, augmented_dataset_path):

    """ Data augmentation function for horizontal and vertical flip

    Parameters
    ----------
    dataset_path : str
                   Path to the folder with images to be augmented
    augmented_dataset_path : str
                             Destination path for the synthetic samples. It will
                             contain the same folders which are in dataset_path,
                             with new data
    """

    path = dataset_path

    if not os.path.exists(augmented_dataset_path):
        os.makedirs(augmented_dataset_path)

    # Prefixes of the generated file names, per class folder
    prefixes = {'negative': 'negative_', 'positive': 'positive_',
                'non_recoverable_defects': 'nrd_', 'recoverable_defects': 'rd_'}

    for defect_type in os.listdir(path):
        # The numbering of the new files starts after the originals of each class
        if path == 'datarepo/carbon-binary':
            i = 200
        elif path == 'datarepo/carbon-multiclass':
            i = 500
        # Set the paths for every class, even when the destination folder already exists
        read_path = path + '/' + defect_type + '/'
        write_path = augmented_dataset_path + '/' + defect_type + '/'
        if not os.path.exists(write_path):
            os.makedirs(write_path)
        for files in os.listdir(read_path):
            img_to_augment = cv2.imread(read_path + files)
            # cv2.flip: flipCode=0 flips around the x-axis (upside down),
            # flipCode=1 flips around the y-axis (mirror image)
            flipped_around_x = cv2.flip(img_to_augment, 0)
            flipped_around_y = cv2.flip(img_to_augment, 1)
            root, ext = os.path.splitext(files)
            if defect_type in prefixes:
                prefix = prefixes[defect_type]
                cv2.imwrite(os.path.join(write_path, prefix + str(i + 1) + ext), flipped_around_x)
                cv2.imwrite(os.path.join(write_path, prefix + str(i + 2) + ext), flipped_around_y)
            i += 2


def copy_original_files(dataset_path, augmented_dataset_path):

    """ Copies the original files into the augmented dataset

    It copies all the files included in the folders at dataset_path, in the same folders
    in augmented_dataset_path

    Parameters
    ----------
    dataset_path : str
                   Path to the folder with images to be augmented
    augmented_dataset_path : str
                             Destination path for the copied files
    """

    for defect_type in os.listdir(dataset_path):
        for files in os.listdir(dataset_path + '/' + defect_type):
            shutil.copyfile((dataset_path + '/' + defect_type + '/' + files), (augmented_dataset_path + '/' + defect_type + '/' + files))


def convert_gray(path):

    """ Converts the images in path to grayscale

    Parameters
    ----------
    path : str
           Path to the folder with images to be converted
    """
    for defect_type in os.listdir(path):
        read_path = path + '/' + defect_type + '/'
        for files in os.listdir(read_path):
            img = cv2.imread(read_path + files)
            # cv2.imread returns BGR images, so the BGR conversion code is used
            gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
            cv2.imwrite(os.path.join(read_path, files), gray)


def illumination_normalization(path):

    """ Function for Local Illumination Normalization (CLAHE)

    Parameters
    ----------
    path : str
           Path to the folder with images to be normalized
    """
    for defect_type in os.listdir(path):
        read_path = path + '/' + defect_type + '/'
        for files in os.listdir(read_path):
            img = cv2.imread(read_path + files, 0)
            clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8,8)) # Create a CLAHE object (Arguments are optional)
            new_img = clahe.apply(img)
            cv2.imwrite(os.path.join(read_path + files), new_img)

In [None]:
# Creates the augmented dataset and copies there also the original data

#downsize(original_data_dir)

data_augmentation(ORIGINAL_BINARY_DATASET_DIR, AUGMENTED_BINARY_DATASET_DIR)
copy_original_files(ORIGINAL_BINARY_DATASET_DIR, AUGMENTED_BINARY_DATASET_DIR)
data_augmentation(ORIGINAL_MULTI_DATASET_DIR, AUGMENTED_MULTI_DATASET_DIR)
copy_original_files(ORIGINAL_MULTI_DATASET_DIR, AUGMENTED_MULTI_DATASET_DIR)

#convert_gray(original_data_dir)
#recoverable_defects = os.listdir(original_data_dir + '/recoverable_defects/')
#sample_image = load_img(os.path.join(original_data_dir + '/recoverable_defects/', recoverable_defects[111]), target_size=(224,224))
#plt.imshow(sample_image)
#plt.show()

#sample_image = cv2.imread(os.path.join(original_data_dir + '/recoverable_defects/', recoverable_defects[111]))
#horizontal_img = cv2.flip(sample_image, 0)
#plt.imshow(horizontal_img)
#plt.show()
# Convert original and augmented dataset to grayscale
#convert_gray(original_data_dir)
#convert_gray(augmented_data_dir)

# Apply normalization for both datasets
#illumination_normalization(original_data_dir)
#illumination_normalization(augmented_data_dir)

#sample_image = cv2.imread(os.path.join(original_data_dir + '/recoverable_defects/', recoverable_defects[111]))
#plt.imshow(sample_image)
#plt.show()

## 3 Binary classification experiments

The following cells:

- define utility functions **to download a pretrained model** and build an end-to-end network for binary classification based on it;
- define **ten utility functions to build the ten end-to-end deep neural networks** developed for the experiments to test the binary classification of carbon look component images into negative (no defects) and positive (with defects);
- define the **utility function to run an experiment with the binary classification**. An experiment consists of tests repeated 10 times with the **stratified shuffle split cross-validation scheme**. In each split, 80% of the data is used for training and 20% for testing. 12.5% of the training data (i.e. 10% of the entire dataset) is used for validation. In other words, in each test **70% of the data is actually used for training, 10% for validation, and 20% for testing.**
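
The 70/10/20 proportions follow from applying the 12.5% validation split to the 80% training portion. The sketch below works through the arithmetic for the two binary datasets; `split_sizes` is a hypothetical helper, and its rounding (floor for the 80% portion, ceiling for the validation portion) is an assumption about how scikit-learn resolves fractional sizes, which does not matter for these totals because they divide evenly.

```python
from fractions import Fraction
from math import ceil, floor


def split_sizes(total, train_size=Fraction(4, 5), val_size=Fraction(1, 8)):
    """Returns (train, validation, test) sample counts for one split."""
    n_train_val = floor(total * train_size)  # 80% kept for training + validation
    n_test = total - n_train_val             # remaining 20% for testing
    n_val = ceil(n_train_val * val_size)     # 12.5% of the 80% = 10% of the total
    n_train = n_train_val - n_val
    return n_train, n_val, n_test


for total in (400, 1200):
    print(total, split_sizes(total))
# 400 (280, 40, 80)
# 1200 (840, 120, 240)
```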

In [None]:
def GetPretrainedModel(ModelConstructor, input_shape=(224,224,3), print_summary=True, layers_to_finetune=0):

    """ Builds a pretrained 2D CNN with the ImageNet weights, freezing all layers except the last "layers_to_finetune"

    Parameters
    ----------
    ModelConstructor : Callable[[bool], [str], [tuple], Sequential]
                       Function that downloads the pretrained model, i.e. one of the Keras applications:
                       https://keras.io/api/applications/
                       The arguments are include_top, weights, and input_shape.
    input_shape : tuple
                  The input shape for the pretrained model.
    print_summary : bool
                    If True prints the model summary.
    layers_to_finetune : int
                 The number of final layers that should not be frozen. Freezes all the layers if
                 lower than or equal to 0, or greater than the number of layers of the network.

    Returns
    -------
    model : Sequential
          The instantiated model.
    """

    model = ModelConstructor(include_top=False, weights="imagenet", input_shape=input_shape)

    if print_summary:
        print('Pretrained model')
        model.summary()
    if 0 < layers_to_finetune <= len(model.layers):
        for layer in model.layers[:-layers_to_finetune]:
            layer.trainable = False
    else:
        for layer in model.layers:
            layer.trainable = False
    return model


def GetEndToEndModel(GetPretrainedModel, ModelConstructor, optimizer, loss, input_shape=(224,224,3),
    print_summary=True, layers_to_finetune=0, include_global_avarage=True, include_batch_norm=True,
    dense_units=[512], classes=2):

    """ Creates the end to end model composed of a pretrained deep neural network (one of the keras
    applications with the ImageNet weights) and fully connected dense layers trained from scratch.

    Parameters
    ----------
    GetPretrainedModel : Callable[[Callable[[bool], [str], [tuple], Sequential]], [tuple], [bool], [int], Sequential]
                Function that instantiates the pretrained model.
    ModelConstructor : Callable[[bool], [str], [tuple], Sequential]
                       Function that downloads the pretrained model, i.e. one of the Keras applications:
                       https://keras.io/api/applications/
                       The arguments are include_top, weights, and input_shape.
    optimizer : Optimizer
                One of the Keras optimizers https://keras.io/api/optimizers/
    loss : str
           String that identifies the loss function to be applied
    input_shape : tuple
                  The input shape for the pretrained model.
    print_summary : bool
                    If True prints the model summary.
    layers_to_finetune : int
                 The number of final layers that should not be frozen. Freezes all the layers if
                 lower than or equal to 0, or greater than the number of layers of the network.
    include_global_avarage : bool
                             If True, includes a GlobalAveragePooling2D layer after the pretrained model.
    include_batch_norm : bool
                         If True, includes a BatchNormalization layer after the pretrained model.
    dense_units : list
                  List of ints where each int represents the number of units of a dense layer to be added to the model.
    classes : int
              Number of classes in output for the model, i.e. the number of neurons of the final Softmax layer.

    Returns
    -------
    model : Sequential
            The instantiated model
    """

    model = Sequential()

    # Add the convolutional base model
    model.add(GetPretrainedModel(ModelConstructor,input_shape,print_summary,layers_to_finetune))

    if include_global_avarage:
        model.add(GlobalAveragePooling2D())
    #model.add(GlobalMaxPool2D())
    #model.add(MaxPooling2D())

    if include_batch_norm:
        model.add(BatchNormalization())
    #model.add(Dropout(0.5))

    # Add new layers (by default, kernel_initializer = 'glorot_uniform')
    model.add(Flatten())
    #model.add(Dense(1024, activation='relu'))
    #model.add(Dense(1024, activation='relu'))
    #model.add(Dense(1024, activation='relu'))
    #model.add(Dense(512, activation='relu'))
    for dense_unit in dense_units:
        model.add(Dense(dense_unit, activation='relu'))

    model.add(Dense(classes, activation='softmax'))

    if print_summary:
        print('End to end model')
        model.summary()

    model.compile(loss=loss, optimizer=optimizer, metrics=['accuracy'])

    return model
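
The freezing rule in `GetPretrainedModel` can be checked without downloading any weights. The sketch below applies the same slicing to a plain list of stand-in objects; `StubLayer` and `trainable_after_freezing` are hypothetical names introduced only for this illustration.

```python
class StubLayer:
    """Hypothetical stand-in for a Keras layer: it only carries the trainable flag."""

    def __init__(self, name):
        self.name = name
        self.trainable = True


def trainable_after_freezing(n_layers, layers_to_finetune):
    """Applies the same rule as GetPretrainedModel and returns the trainable layer names."""
    layers = [StubLayer("block%d" % n) for n in range(1, n_layers + 1)]
    if 0 < layers_to_finetune <= len(layers):
        to_freeze = layers[:-layers_to_finetune]  # everything except the last N layers
    else:
        to_freeze = layers                        # out-of-range values freeze everything
    for layer in to_freeze:
        layer.trainable = False
    return [layer.name for layer in layers if layer.trainable]


print(trainable_after_freezing(5, 2))  # ['block4', 'block5']
print(trainable_after_freezing(5, 0))  # []
print(trainable_after_freezing(5, 9))  # []
```

Note that a value equal to the number of layers leaves the whole network trainable, while any larger value freezes it entirely.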

In [None]:
def GetBestBinaryVGG16Model(print_summary=True) :
    return GetEndToEndModel(GetPretrainedModel, VGG16, optimizers.SGD(learning_rate=0.0001, momentum=0.9),
    'binary_crossentropy', (224,224,3), print_summary, 8, True, False, [512], 2)

def GetBestBinaryVGG19Model(print_summary=True) :
    return GetEndToEndModel(GetPretrainedModel, VGG19, optimizers.SGD(learning_rate=0.001, momentum=0.9),
    'binary_crossentropy', (224,224,3), print_summary, 8, True, False, [512], 2)

def GetBestBinaryResNet50V2Model(print_summary=True) :
    return GetEndToEndModel(GetPretrainedModel, ResNet50V2, optimizers.Adam(learning_rate=0.0001),
    'binary_crossentropy', (224,224,3), print_summary, 8, True, False, [512], 2)

def GetBestBinaryResNet101V2Model(print_summary=True) :
    return GetEndToEndModel(GetPretrainedModel, ResNet101V2, optimizers.Adam(learning_rate=0.001),
    'binary_crossentropy', (224,224,3), print_summary, 8, True, False, [512], 2)

def GetBestBinaryResNet152V2Model(print_summary=True) :
    return GetEndToEndModel(GetPretrainedModel, ResNet152V2, optimizers.Adam(learning_rate=0.0001),
    'binary_crossentropy', (224,224,3), print_summary, 8, True, False, [512], 2)

def GetBestBinaryInceptionV3Model(print_summary=True) :
    return GetEndToEndModel(GetPretrainedModel, InceptionV3, optimizers.SGD(learning_rate=0.001, momentum=0.9),
    'binary_crossentropy', (299,299,3), print_summary, 8, True, False, [512], 2)

def GetBestBinaryMobileNetV2Model(print_summary=True) :
    return GetEndToEndModel(GetPretrainedModel, MobileNetV2, optimizers.Adam(learning_rate=0.0001),
    'binary_crossentropy', (224,224,3), print_summary, 8, True, False, [256, 128], 2)

def GetBestBinaryNasNetMobileModel(print_summary=True) :
    return GetEndToEndModel(GetPretrainedModel, NASNetMobile, optimizers.Adam(learning_rate=0.0001),
    'binary_crossentropy', (224,224,3), print_summary, 4, True, False, [256, 128], 2)

def GetBestBinaryDenseNet121Model(print_summary=True) :
    return GetEndToEndModel(GetPretrainedModel, DenseNet121, optimizers.Adam(learning_rate=0.0001),
    'binary_crossentropy', (224,224,3), print_summary, 8, True, False, [512], 2)

def GetBestBinaryXceptionModel(print_summary=True) :
    return GetEndToEndModel(GetPretrainedModel, Xception, optimizers.Adam(learning_rate=0.0001),
    'binary_crossentropy', (299,299,3), print_summary, 4, True, False, [256, 128], 2)

In [None]:
def runExperiment(GetConvModel, Preprocess_input, input_shape, batchSize, path,
    Model_name, folderName, rState, split_number=10):

    """ Runs a classifier test on the binary dataset, using the stratified shuffled splits, with
        70% of data for training, 10% for validation, 20% for test

    Parameters
    ----------
    GetConvModel : Callable[Sequential]
                   Function returning the sequential model to be tested.
    Preprocess_input : Callable
                       Preprocessing function matching the pretrained model
                       (e.g. vgg16_preprocess_input), applied to each loaded image.
    input_shape : tuple
                  Shape of a model's input.
    batchSize : int
                Batch size to be used for training and testing
    path : str
           Path to the folder with images to be processed
    Model_name : str
                 Model name to be used in the title of AUC-ROC plot
    folderName : str
                 Name for the folder to store the AUC-ROC plot.
    rState : int, RandomState instance or None
             Controls the randomness of the training and testing indices produced.
             Pass an int for reproducible output across multiple function calls.
    split_number : int
                   Number of splits to run the Stratified Shuffle Split (default 10)

    """

    # Prepare samples tensor and label array
    negative = os.listdir(path + '/negative/')
    count_negative = len(negative)
    print("Negative images: ", count_negative)
    positive = os.listdir(path + '/positive/')
    count_positive = len(positive)
    print("Positive images: ", count_positive)
    total_samples = count_negative + count_positive
    print("Total images: ", total_samples)
    imgs_array = np.ones((total_samples, input_shape[0], input_shape[1], input_shape[2]))
    print("Tensor shape ", imgs_array.shape, "\n")
    label_array = []

    index = 0

    for defect_type in os.listdir(path):
        read_path = path + '/' + defect_type + '/'
        for filename in os.listdir(read_path):
            imgs = load_img(os.path.join(read_path, filename), target_size=(input_shape[0], input_shape[1]))
            imgs = img_to_array(imgs)
            imgs = imgs.reshape((1, imgs.shape[0], imgs.shape[1], imgs.shape[2]))
            imgs = Preprocess_input(imgs)
            imgs_array[index,:,:,:] = imgs
            if defect_type == 'negative':
                label_array.append(0)
            elif defect_type == 'positive':
                label_array.append(1)
            index += 1

    X = imgs_array
    y = label_array

    # Makes y_samples a categorical matrix
    y = to_categorical(y, num_classes = 2)
    n_classes = y.shape[1]
    nsplits = split_number
    cv = StratifiedShuffleSplit(n_splits=nsplits, train_size=0.8, random_state = rState)

    tprs = []
    aucs = []
    scores = []
    prec_negative = np.zeros(shape=(nsplits))
    prec_positive = np.zeros(shape=(nsplits))
    recall_negative = np.zeros(shape=(nsplits))
    recall_positive = np.zeros(shape=(nsplits))
    f1Scores_negative = np.zeros(shape=(nsplits))
    f1Scores_positive = np.zeros(shape=(nsplits))
    mean_fpr = np.linspace(0, 1, 100)
    plt.figure(num=1, figsize=(10,10))
    i = 1

    for train, test in cv.split(X, y):
        X_train, X_val, y_train, y_val = train_test_split(X[train][:], y[train], test_size = 0.125, random_state = rState)
        X_test, y_test = X[test], y[test]
        model = GetConvModel(i==1)
        es = EarlyStopping(monitor='val_loss', mode='min', patience=5, verbose=1, restore_best_weights=True)
        history = model.fit(x=X_train, y=y_train, validation_data=(X_val, y_val), epochs=100, batch_size=batchSize, verbose=1, callbacks=[es], shuffle=True)

        del X_train
        del X_val

        print("Computing scores...")
        evaluation = model.evaluate(X_test, y_test)
        scores.append(evaluation)
        print("Computing probs...")
        probas = model.predict(X_test, batch_size=batchSize, verbose=1)
        y_true = y_test.argmax(axis=1)
        pred = probas.argmax(axis=1)

        #del X_test

        # Compute ROC curve and area under the curve
        fpr = dict()
        tpr = dict()
        roc_auc = dict()

        for k in range(n_classes):
            fpr[k], tpr[k], _ = roc_curve(y_test[:, k], probas[:, k])
            roc_auc[k] = auc(fpr[k], tpr[k])

        # Compute micro-average ROC curve and ROC area
        fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), probas.ravel())
        tprs.append(np.interp(mean_fpr, fpr["micro"], tpr["micro"]))
        roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])
        aucs.append(roc_auc["micro"])
        plt.plot(fpr["micro"], tpr["micro"], lw=2, alpha=0.3, label='ROC split %d (AUC = %0.4f)' % (i, roc_auc["micro"]))

        y_pred = np.round(pred)
        report = classification_report(y_true, y_pred, target_names=['negative', 'positive'], output_dict=True)
        prec_negative[i - 1] = report['negative']['precision']
        prec_positive[i - 1] = report['positive']['precision']
        recall_negative[i - 1] = report['negative']['recall']
        recall_positive[i - 1] = report['positive']['recall']
        f1Scores_negative[i - 1] = report['negative']['f1-score']
        f1Scores_positive[i - 1] = report['positive']['f1-score']

        print('confusion matrix split ' + str(i))
        print(confusion_matrix(y_true, y_pred))
        print(classification_report(y_true, y_pred, target_names=['negative', 'positive']))
        print('Loss: ' + str(evaluation[0]))
        print('Accuracy: ' + str(evaluation[1]))
        print('\n')

        i += 1

        del report
        del model

    plt.plot([0, 1], [0, 1], linestyle='--', lw=2, color='r', label='Chance', alpha=.8)

    mean_tpr = np.mean(tprs, axis=0)
    mean_auc = auc(mean_fpr, mean_tpr)
    std_auc = np.std(aucs)
    plt.plot(mean_fpr, mean_tpr, color='b', label=r'Mean ROC (AUC = %0.4f $\pm$ %0.4f)' % (mean_auc, std_auc), lw=2, alpha=.8)

    std_tpr = np.std(tprs, axis=0)
    tprs_upper = np.minimum(mean_tpr + std_tpr, 1)
    tprs_lower = np.maximum(mean_tpr - std_tpr, 0)
    plt.fill_between(mean_fpr, tprs_lower, tprs_upper, color='grey', alpha=.2, label=r'$\pm$ 1 std. dev.')

    plt.xlim([-0.01, 1.01])
    plt.ylim([-0.01, 1.01])
    plt.xlabel('False Positive Rate',fontsize=18)
    plt.ylabel('True Positive Rate',fontsize=18)
    plt.title('Cross-Validation ROC of ' + Model_name + ' model', fontsize=18)
    plt.legend(loc="lower right", prop={'size': 15})

    np_scores = np.array(scores)
    losses = np_scores[:, 0:1]
    accuracies = np_scores[:, 1:2]
    print('Losses')
    print(losses)
    print('Accuracies')
    print(accuracies)
    print('Precision class negative:\n', prec_negative)
    print('Precision class positive:\n', prec_positive)
    print('Recall class negative:\n', recall_negative)
    print('Recall class positive:\n', recall_positive)
    print('F1-scores class negative:\n', f1Scores_negative)
    print('F1-scores class positive:\n', f1Scores_positive)
    print("Avg loss: {0} +/- {1}".format(np.mean(losses), np.std(losses)))
    print("Avg accuracy: {0} +/- {1}".format(np.mean(accuracies), np.std(accuracies)))
    print("\nAvg Precision class negative: {0} +/- {1}".format(np.mean(prec_negative), np.std(prec_negative)))
    print("Avg Precision class positive: {0} +/- {1}".format(np.mean(prec_positive), np.std(prec_positive)))
    print("\nAvg Recall class negative: {0} +/- {1}".format(np.mean(recall_negative), np.std(recall_negative)))
    print("Avg Recall class positive: {0} +/- {1}".format(np.mean(recall_positive), np.std(recall_positive)))
    print("\nAvg f1-score class negative: {0} +/- {1}".format(np.mean(f1Scores_negative), np.std(f1Scores_negative)))
    print("Avg f1-score class positive: {0} +/- {1}".format(np.mean(f1Scores_positive), np.std(f1Scores_positive)))

    plt.savefig(folderName + '/' + Model_name.replace('+', '') + '.pdf')
    plt.show()

    del imgs_array
    del label_array
    del X
    del y
    del prec_negative
    del prec_positive
    del recall_negative
    del recall_positive
    del f1Scores_negative
    del f1Scores_positive
    del accuracies
    del losses
    del np_scores
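
The mean ROC curve and its shaded band drawn above can be reproduced in isolation. The following is a minimal sketch, assuming synthetic labels and scores in place of real model outputs; the helper name `mean_roc_band` and the toy data are illustrative and not part of the notebook.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve


def mean_roc_band(y_true_per_split, scores_per_split, n_points=100):
    """Average per-split ROC curves on a shared false-positive-rate grid."""
    mean_fpr = np.linspace(0, 1, n_points)
    tprs = []
    for y_true, scores in zip(y_true_per_split, scores_per_split):
        fpr, tpr, _ = roc_curve(y_true, scores)
        # Resample each curve on the same grid so they can be averaged point by point
        tprs.append(np.interp(mean_fpr, fpr, tpr))
    mean_tpr = np.mean(tprs, axis=0)
    std_tpr = np.std(tprs, axis=0)
    # Clip the +/- 1 std band to the valid [0, 1] range
    upper = np.minimum(mean_tpr + std_tpr, 1)
    lower = np.maximum(mean_tpr - std_tpr, 0)
    return mean_fpr, mean_tpr, lower, upper


# Synthetic scores for three hypothetical splits (not results from the paper)
rng = np.random.default_rng(42)
y_splits = [rng.integers(0, 2, size=200) for _ in range(3)]
score_splits = [np.clip(0.35 * y + rng.normal(0.4, 0.2, size=y.size), 0, 1) for y in y_splits]

mean_fpr, mean_tpr, lower, upper = mean_roc_band(y_splits, score_splits)
mean_auc = auc(mean_fpr, mean_tpr)
print("Mean AUC on synthetic data: %0.4f" % mean_auc)
```

Interpolating every split onto the same grid is what makes the point-wise mean and standard deviation well defined, since `roc_curve` returns a different number of thresholds for each split.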

### 3.1 Experiments with the original binary dataset

The following cells run the **tests on the original binary dataset**, model by model, for each of the ten compared neural networks. Before the experiments run, a folder is created to store their AUC-ROC plots. After each experiment, the ImageNet weights of the tested neural network can be deleted if disk space is limited.
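
The folder creation and the weight cleanup can also be done from Python, which avoids errors when a cell is re-run (`mkdir` fails if the folder exists, and `rm` fails if the cache is already empty). This is a minimal sketch, assuming the default Keras cache location `~/.keras/models`; the helper names `ensure_results_dir` and `clear_weight_cache` are hypothetical and not part of the notebook. The demonstration below runs on a temporary directory, so it does not touch the real cache.

```python
import shutil
import tempfile
from pathlib import Path


def ensure_results_dir(name):
    """Create the results folder if it does not exist yet and return its path."""
    path = Path(name)
    path.mkdir(parents=True, exist_ok=True)
    return path


def clear_weight_cache(cache_dir=Path.home() / '.keras' / 'models'):
    """Delete every entry inside cache_dir and return the sorted names removed."""
    cache_dir = Path(cache_dir)
    removed = []
    if not cache_dir.is_dir():
        return removed  # nothing was downloaded yet
    for entry in cache_dir.iterdir():
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed.append(entry.name)
    return sorted(removed)


# Demonstration on a throwaway directory (assumed layout, for illustration only)
with tempfile.TemporaryDirectory() as tmp:
    ensure_results_dir(Path(tmp) / 'binary')
    ensure_results_dir(Path(tmp) / 'binary')  # a second call is a no-op
    fake_cache = Path(tmp) / 'models'
    fake_cache.mkdir()
    (fake_cache / 'fake_weights.h5').write_bytes(b'')
    print(clear_weight_cache(fake_cache))
```

In the notebook itself, `ensure_results_dir('binary')` would replace the `!mkdir` cell and `clear_weight_cache()` would replace the `!rm` cells.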

In [None]:
# Creates a folder to store the results of experiments with the binary dataset
!mkdir binary

In [None]:
# Binary classification on the original dataset with VGG16
runExperiment(GetBestBinaryVGG16Model, vgg16_preprocess_input, (224,224,3), 32, ORIGINAL_BINARY_DATASET_DIR, 'VGG16 + dense layers',
    'binary', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Binary classification on the original dataset with VGG19
runExperiment(GetBestBinaryVGG19Model, vgg19_preprocess_input, (224,224,3), 32, ORIGINAL_BINARY_DATASET_DIR,
    'VGG19 + dense layers', 'binary', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Binary classification on the original dataset with ResNet50V2
runExperiment(GetBestBinaryResNet50V2Model, resnet_v2_preprocess_input, (224,224,3), 32, ORIGINAL_BINARY_DATASET_DIR,
    'ResNet50V2 + dense layers', 'binary', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Binary classification on the original dataset with ResNet101V2
runExperiment(GetBestBinaryResNet101V2Model, resnet_v2_preprocess_input, (224,224,3), 32, ORIGINAL_BINARY_DATASET_DIR,
    'ResNet101V2 + dense layers', 'binary', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Binary classification on the original dataset with ResNet152V2
runExperiment(GetBestBinaryResNet152V2Model, resnet_v2_preprocess_input, (224,224,3), 32, ORIGINAL_BINARY_DATASET_DIR,
    'ResNet152V2 + dense layers', 'binary', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Binary classification on the original dataset with InceptionV3
runExperiment(GetBestBinaryInceptionV3Model, inception_v3_preprocess_input, (299,299,3), 32, ORIGINAL_BINARY_DATASET_DIR,
    'InceptionV3 + dense layers', 'binary', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Binary classification on the original dataset with MobileNetV2
runExperiment(GetBestBinaryMobileNetV2Model, mobilenet_v2_preprocess_input, (224,224,3), 32, ORIGINAL_BINARY_DATASET_DIR,
    'MobileNetV2 + dense layers', 'binary', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Binary classification on the original dataset with NASNetMobile
runExperiment(GetBestBinaryNasNetMobileModel, nasnet_preprocess_input, (224,224,3), 32, ORIGINAL_BINARY_DATASET_DIR,
    'NASNetMobile + dense layers', 'binary', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Binary classification on the original dataset with DenseNet121
runExperiment(GetBestBinaryDenseNet121Model, densenet_preprocess_input, (224,224,3), 32, ORIGINAL_BINARY_DATASET_DIR,
    'DenseNet121 + dense layers', 'binary', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Binary classification on the original dataset with Xception
runExperiment(GetBestBinaryXceptionModel, xception_preprocess_input, (299,299,3), 32, ORIGINAL_BINARY_DATASET_DIR,
    'Xception + dense layers', 'binary', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

### 3.2 Experiments with the augmented binary dataset

The following cells run the **tests on the augmented binary dataset**, model by model, for each of the ten compared neural networks. Before the experiments run, a folder is created to store their AUC-ROC plots. After each experiment, the ImageNet weights of the tested neural network can be deleted if disk space is limited.

In [None]:
# Creates a folder to store the results of experiments with the augmented binary dataset
!mkdir augmented-binary

In [None]:
# Binary classification on the augmented dataset with VGG16
runExperiment(GetBestBinaryVGG16Model, vgg16_preprocess_input, (224,224,3), 32, AUGMENTED_BINARY_DATASET_DIR,
    'VGG16 + dense layers', 'augmented-binary', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Binary classification on the augmented dataset with VGG19
runExperiment(GetBestBinaryVGG19Model, vgg19_preprocess_input, (224,224,3), 32, AUGMENTED_BINARY_DATASET_DIR,
    'VGG19 + dense layers', 'augmented-binary', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Binary classification on the augmented dataset with ResNet50V2
runExperiment(GetBestBinaryResNet50V2Model, resnet_v2_preprocess_input, (224,224,3), 32, AUGMENTED_BINARY_DATASET_DIR,
    'ResNet50V2 + dense layers', 'augmented-binary', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Binary classification on the augmented dataset with ResNet101V2
runExperiment(GetBestBinaryResNet101V2Model, resnet_v2_preprocess_input, (224,224,3), 32, AUGMENTED_BINARY_DATASET_DIR,
    'ResNet101V2 + dense layers', 'augmented-binary', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Binary classification on the augmented dataset with ResNet152V2
runExperiment(GetBestBinaryResNet152V2Model, resnet_v2_preprocess_input, (224,224,3), 32, AUGMENTED_BINARY_DATASET_DIR,
    'ResNet152V2 + dense layers', 'augmented-binary', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Binary classification on the augmented dataset with InceptionV3
runExperiment(GetBestBinaryInceptionV3Model, inception_v3_preprocess_input, (299,299,3), 32, AUGMENTED_BINARY_DATASET_DIR,
    'InceptionV3 + dense layers', 'augmented-binary', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Binary classification on the augmented dataset with MobileNetV2
runExperiment(GetBestBinaryMobileNetV2Model, mobilenet_v2_preprocess_input, (224,224,3), 32, AUGMENTED_BINARY_DATASET_DIR,
    'MobileNetV2 + dense layers', 'augmented-binary', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Binary classification on the augmented dataset with NASNetMobile
runExperiment(GetBestBinaryNasNetMobileModel, nasnet_preprocess_input, (224,224,3), 32, AUGMENTED_BINARY_DATASET_DIR,
    'NASNetMobile + dense layers', 'augmented-binary', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Binary classification on the augmented dataset with DenseNet121
runExperiment(GetBestBinaryDenseNet121Model, densenet_preprocess_input, (224,224,3), 32, AUGMENTED_BINARY_DATASET_DIR,
    'DenseNet121 + dense layers', 'augmented-binary', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Binary classification on the augmented dataset with Xception
runExperiment(GetBestBinaryXceptionModel, xception_preprocess_input, (299,299,3), 32, AUGMENTED_BINARY_DATASET_DIR,
    'Xception + dense layers', 'augmented-binary', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

## 4 Multi-class classification experiments

The following cells:

- define a utility class to load training and test samples from disk during the experiments, in order to avoid keeping the whole dataset in RAM;
- define **ten utility functions to build the ten end-to-end deep neural networks** developed for the experiments to test the multi-class classification of carbon look component images into negative (no defects), with recoverable defects, and with non-recoverable defects;
- define the **utility function to run an experiment with the multi-class classification**. An experiment consists of tests repeated 10 times with the **stratified shuffle split cross-validation scheme**. In each split, 80% of the data are used for training and 20% for testing. 12.5% of the training data (i.e. 10% of the entire dataset) is held out for validation. In other words, in each test **70% of the data are actually used for training, 10% for validation, and 20% for testing.**
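
The 70/10/20 proportions follow from chaining the two splits: 12.5% of the 80% training portion is 10% of the whole dataset. The following minimal sketch checks this arithmetic on synthetic indices; the function name `split_sizes` and the sample count are illustrative assumptions, and no images are involved.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit, train_test_split


def split_sizes(n_samples=1000, n_classes=3, seed=42):
    """Return the (train, validation, test) sizes produced by the two-step split."""
    X = np.arange(n_samples)              # stand-in for the list of file names
    y = np.arange(n_samples) % n_classes  # synthetic, roughly balanced labels
    # Step 1: stratified 80/20 split into training and test indices
    cv = StratifiedShuffleSplit(n_splits=1, train_size=0.8, random_state=seed)
    train_idx, test_idx = next(cv.split(X, y))
    # Step 2: hold out 12.5% of the training portion for validation
    X_train, X_val = train_test_split(X[train_idx], test_size=0.125, random_state=seed)
    return len(X_train), len(X_val), len(test_idx)


print(split_sizes())  # (700, 100, 200)
```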

In [None]:
from sklearn.preprocessing import label_binarize
from tensorflow.keras.utils import Sequence

class DataGen(Sequence) :
    """ A sequence of data for training/test/validation, loaded from disk
    batch by batch. Extends the tensorflow.keras.utils.Sequence: https://www.tensorflow.org/api_docs/python/tf/keras/utils/Sequence

    Attributes
    ----------
    Preprocess_input : Callable
                       function preprocessing a batch of images for the tested model.
    base_path : str
                path to the folder including the samples.
    filenames : list<str>
                list of sample filenames.
    labels : array-like
             list of sample labels.
    input_shape : tuple
                  shape of a model's input.
    batch_size : int
                 batch size to load samples

    """

    def __init__(self, Preprocess_input, base_path, filenames, labels, input_shape, batch_size) :
        self.Preprocess_input = Preprocess_input
        self.base_path = base_path
        self.filenames = filenames
        self.labels = labels
        self.input_shape = input_shape
        self.batch_size = batch_size


    def __len__(self) :
        # np.int was removed in NumPy 1.24; the built-in int is equivalent here
        return int(np.ceil(len(self.filenames) / float(self.batch_size)))


    def __getitem__(self, idx) :
        batch_x = self.filenames[idx * self.batch_size : (idx+1) * self.batch_size]
        batch_y = self.labels[idx * self.batch_size : (idx+1) * self.batch_size]
        index = 0
        imgs_array = np.ones((len(batch_x), self.input_shape[0], self.input_shape[1], self.input_shape[2]))

        for file_name in batch_x:
            imgs = load_img(file_name, target_size=(self.input_shape[0], self.input_shape[1]))
            imgs = img_to_array(imgs)
            imgs = imgs.reshape((1, imgs.shape[0], imgs.shape[1], imgs.shape[2]))
            imgs = self.Preprocess_input(imgs)
            imgs_array[index,:,:,:] = imgs
            index += 1

        return imgs_array, np.array(batch_y)


def GetBestThreeClassesVGG16Model(print_summary=True) :
    return GetEndToEndModel(GetPretrainedModel, VGG16, optimizers.SGD(learning_rate=0.001, momentum=0.9),
    'categorical_crossentropy', (224,224,3), print_summary, 8, True, True, [512], 3)

def GetBestThreeClassesVGG19Model(print_summary=True) :
    return GetEndToEndModel(GetPretrainedModel, VGG19, optimizers.SGD(learning_rate=0.001, momentum=0.9),
    'categorical_crossentropy', (224,224,3), print_summary, 8, True, True, [512], 3)

def GetBestThreeClassesResNet50V2Model(print_summary=True) :
    return GetEndToEndModel(GetPretrainedModel, ResNet50V2, optimizers.Adam(learning_rate=0.001),
    'categorical_crossentropy', (224,224,3), print_summary, 8, True, False, [512], 3)

def GetBestThreeClassesResNet101V2Model(print_summary=True) :
    return GetEndToEndModel(GetPretrainedModel, ResNet101V2, optimizers.Adam(learning_rate=0.001),
    'categorical_crossentropy', (224,224,3), print_summary, 8, True, False, [512], 3)

def GetBestThreeClassesResNet152V2Model(print_summary=True) :
    return GetEndToEndModel(GetPretrainedModel, ResNet152V2, optimizers.Adam(learning_rate=0.001),
    'categorical_crossentropy', (224,224,3), print_summary, 8, True, False, [512], 3)

def GetBestThreeClassesInceptionV3Model(print_summary=True) :
    return GetEndToEndModel(GetPretrainedModel, InceptionV3, optimizers.Adam(learning_rate=0.0001),
    'categorical_crossentropy', (299,299,3), print_summary, 8, True, False, [512], 3)

def GetBestThreeClassesMobileNetV2Model(print_summary=True) :
    return GetEndToEndModel(GetPretrainedModel, MobileNetV2, optimizers.SGD(learning_rate=0.0001, momentum=0.9),
    'categorical_crossentropy', (224,224,3), print_summary, 8, True, False, [512], 3)

def GetBestThreeClassesNasNetMobileModel(print_summary=True) :
    return GetEndToEndModel(GetPretrainedModel, NASNetMobile, optimizers.Adam(learning_rate=0.0001),
    'categorical_crossentropy', (224,224,3), print_summary, 8, True, False, [512], 3)

def GetBestThreeClassesDenseNet121Model(print_summary=True) :
    return GetEndToEndModel(GetPretrainedModel, DenseNet121, optimizers.Adam(learning_rate=0.0001),
    'categorical_crossentropy', (224,224,3), print_summary, 8, True, False, [512], 3)

def GetBestThreeClassesXceptionModel(print_summary=True) :
    return GetEndToEndModel(GetPretrainedModel, Xception, optimizers.Adam(learning_rate=0.0001),
    'categorical_crossentropy', (299,299,3), print_summary, 4, True, False, [512], 3)

def runThreeClassesExperiment(GetConvModel, Preprocess_input, input_shape, batchSize,
                              path, Model_name, folderName, rState, split_number=5):

    """ Runs a classifier test on the dataset composed of three classes (no
    defects, non-recoverable defects, recoverable defects), using the stratified
    shuffled splits, with 70% of data for training, 10% for validation, 20% for test.

    Parameters
    ----------
    GetConvModel : Callable[Sequential]
                   Function returning the sequential model to be tested.
    Preprocess_input : Callable
                       Function preprocessing a batch of images for the tested model.
    input_shape : tuple
                  Shape of a model's input.
    batchSize : int
                Batch size to be used for training and testing
    path : str
           Path to the folder with images to be processed
    Model_name : str
                 Model name to be used in the title of AUC-ROC plot
    folderName : str
                 Name for the folder to store the AUC-ROC plot.
    rState : int, RandomState instance or None
             Controls the randomness of the training and testing indices produced.
             Pass an int for reproducible output across multiple function calls.
    split_number : int
                   Number of splits to run the Stratified Shuffle Split (default 5)

    """

    # Prepare samples tensor and label array
    negative = os.listdir(path + '/negative/')
    count_negative = len(negative)
    print("Negative images: ", count_negative)
    non_recoverable_defects = os.listdir(path + '/non_recoverable_defects/')
    count_non_recoverable_defects = len(non_recoverable_defects)
    print("Non recoverable defects images: ", count_non_recoverable_defects)
    recoverable_defects = os.listdir(path + '/recoverable_defects/')
    count_recoverable_defects = len(recoverable_defects)
    print("Recoverable defects images: ", count_recoverable_defects)
    total_samples = count_negative + count_non_recoverable_defects + count_recoverable_defects
    print("Total images: ", total_samples)
    filenames = []
    label_array = []

    index = 0

    for defect_type in os.listdir(path):
        read_path = path + '/' + defect_type + '/'
        for filename in os.listdir(read_path):
            filenames.append(os.path.join(read_path, filename))
            if defect_type == 'negative':
                label_array.append(0)
            elif defect_type == 'non_recoverable_defects':
                label_array.append(1)
            elif defect_type == 'recoverable_defects':
                label_array.append(2)
            index += 1

    X = np.array(filenames)
    y = label_array

    y = label_binarize(y, classes=[0, 1, 2])
    n_classes = y.shape[1]
    nsplits = split_number
    #cv = StratifiedKFold(n_splits=nsplits, shuffle=True)
    cv = StratifiedShuffleSplit(n_splits=nsplits, train_size=0.8, random_state = rState)

    tprs = []
    aucs = []
    scores = []
    prec_negative = np.zeros(shape=(nsplits))
    prec_nrd = np.zeros(shape=(nsplits))
    prec_rd = np.zeros(shape=(nsplits))
    #prec = np.zeros(shape=(nsplits))
    recall_negative = np.zeros(shape=(nsplits))
    recall_nrd = np.zeros(shape=(nsplits))
    recall_rd = np.zeros(shape=(nsplits))
    #recall = np.zeros(shape=(nsplits))
    f1Scores_negative = np.zeros(shape=(nsplits))
    f1Scores_nrd = np.zeros(shape=(nsplits))
    f1Scores_rd = np.zeros(shape=(nsplits))
    #f1Scores = np.zeros(shape=(nsplits))
    mean_fpr = np.linspace(0, 1, 100)
    plt.figure(num=1, figsize=(10,10))
    i = 1

    for train, test in cv.split(X, y):
        X_train, X_val, y_train, y_val = train_test_split(X[train][:], y[train], test_size = 0.125, random_state = rState)
        X_test, y_test = X[test], y[test]
        model = GetConvModel(i==1)
        es = EarlyStopping(monitor='val_loss', mode='min', patience=5, verbose=1, restore_best_weights=True)
        #history = model.fit(x=X_train, y=y_train, validation_data=(X_val, y_val), epochs=100, batch_size=batchSize, verbose=1, callbacks=[es], shuffle=True)
        training_batch_generator = DataGen(Preprocess_input, '', X_train, y_train, input_shape, batchSize)
        validation_batch_generator = DataGen(Preprocess_input, '', X_val, y_val, input_shape, batchSize)
        test_batch_generator = DataGen(Preprocess_input, '', X[test][:], y[test], input_shape, batchSize)
        history = model.fit(x=training_batch_generator, validation_data=validation_batch_generator, epochs=100, batch_size=batchSize, verbose=1, callbacks=[es], shuffle=True)

        del X_train
        del X_val

        print("\nComputing scores...")
        evaluation = model.evaluate(x=test_batch_generator)
        scores.append(evaluation)
        print("Computing probs...")
        probas = model.predict(x=test_batch_generator, batch_size=batchSize, verbose=1)
        y_true = y_test.argmax(axis=1)
        pred = probas.argmax(axis=1)

        #del X_test

        # Compute ROC curve and area under the curve
        fpr = dict()
        tpr = dict()
        roc_auc = dict()

        for k in range(n_classes):
            fpr[k], tpr[k], _ = roc_curve(y_test[:, k], probas[:, k])
            roc_auc[k] = auc(fpr[k], tpr[k])

        # Compute micro-average ROC curve and ROC area
        fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), probas.ravel())
        tprs.append(np.interp(mean_fpr, fpr["micro"], tpr["micro"]))
        roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])
        aucs.append(roc_auc["micro"])
        plt.plot(fpr["micro"], tpr["micro"], lw=2, alpha=0.3, label='ROC split %d (AUC = %0.4f)' % (i, roc_auc["micro"]))

        y_pred = np.round(pred)
        report = classification_report(y_true, y_pred, target_names=['negative', 'non_recoverable_defects', 'recoverable_defects'], output_dict=True)
        prec_negative[i - 1] = report['negative']['precision']
        prec_nrd[i - 1] = report['non_recoverable_defects']['precision']
        prec_rd[i - 1] = report['recoverable_defects']['precision']

        recall_negative[i - 1] = report['negative']['recall']
        recall_nrd[i - 1] = report['non_recoverable_defects']['recall']
        recall_rd[i - 1] = report['recoverable_defects']['recall']

        f1Scores_negative[i - 1] = report['negative']['f1-score']
        f1Scores_nrd[i - 1] = report['non_recoverable_defects']['f1-score']
        f1Scores_rd[i - 1] = report['recoverable_defects']['f1-score']

        print('\nconfusion matrix split ' + str(i))
        print(confusion_matrix(y_true, y_pred))
        print(classification_report(y_true, y_pred, target_names=['negative', 'non_recoverable_defects', 'recoverable_defects']))
        print('Loss: ' + str(evaluation[0]))
        print('Accuracy: ' + str(evaluation[1]))
        print('\n')

        i += 1

        del report
        del model

    plt.plot([0, 1], [0, 1], linestyle='--', lw=2, color='r', label='Chance', alpha=.8)

    mean_tpr = np.mean(tprs, axis=0)
    mean_auc = auc(mean_fpr, mean_tpr)
    std_auc = np.std(aucs)
    plt.plot(mean_fpr, mean_tpr, color='b', label=r'Mean ROC (AUC = %0.4f $\pm$ %0.4f)' % (mean_auc, std_auc), lw=2, alpha=.8)

    std_tpr = np.std(tprs, axis=0)
    tprs_upper = np.minimum(mean_tpr + std_tpr, 1)
    tprs_lower = np.maximum(mean_tpr - std_tpr, 0)
    plt.fill_between(mean_fpr, tprs_lower, tprs_upper, color='grey', alpha=.2, label=r'$\pm$ 1 std. dev.')

    plt.xlim([-0.01, 1.01])
    plt.ylim([-0.01, 1.01])
    plt.xlabel('False Positive Rate',fontsize=18)
    plt.ylabel('True Positive Rate',fontsize=18)
    plt.title('Cross-Validation ROC of ' + Model_name + ' model', fontsize=18)
    plt.legend(loc="lower right", prop={'size': 15})

    np_scores = np.array(scores)
    losses = np_scores[:, 0:1]
    accuracies = np_scores[:, 1:2]
    print('Losses')
    print(losses)
    print('Accuracies')
    print(accuracies)

    print('Precision class negative:\n', prec_negative)
    print('Precision class non_recoverable_defects:\n', prec_nrd)
    print('Precision class recoverable_defects:\n', prec_rd)

    print('Recall class negative:\n', recall_negative)
    print('Recall class non_recoverable_defects:\n', recall_nrd)
    print('Recall class recoverable_defects:\n', recall_rd)

    print('F1-scores class negative:\n', f1Scores_negative)
    print('F1-scores class non_recoverable_defects:\n', f1Scores_nrd)
    print('F1-scores class recoverable_defects:\n', f1Scores_rd)

    print("Avg loss: {0} +/- {1}".format(np.mean(losses), np.std(losses)))
    print("Avg accuracy: {0} +/- {1}".format(np.mean(accuracies), np.std(accuracies)))
    print("\nAvg Precision class negative: {0} +/- {1}".format(np.mean(prec_negative), np.std(prec_negative)))
    print("Avg Precision class non_recoverable_defects: {0} +/- {1}".format(np.mean(prec_nrd), np.std(prec_nrd)))
    print("Avg Precision class recoverable_defects: {0} +/- {1}".format(np.mean(prec_rd), np.std(prec_rd)))
    print("\nAvg Recall class negative: {0} +/- {1}".format(np.mean(recall_negative), np.std(recall_negative)))
    print("Avg Recall class non_recoverable_defects: {0} +/- {1}".format(np.mean(recall_nrd), np.std(recall_nrd)))
    print("Avg Recall class recoverable_defects: {0} +/- {1}".format(np.mean(recall_rd), np.std(recall_rd)))
    print("\nAvg f1-score class negative: {0} +/- {1}".format(np.mean(f1Scores_negative), np.std(f1Scores_negative)))
    print("Avg f1-score class non_recoverable_defects: {0} +/- {1}".format(np.mean(f1Scores_nrd), np.std(f1Scores_nrd)))
    print("Avg f1-score class recoverable_defects: {0} +/- {1}".format(np.mean(f1Scores_rd), np.std(f1Scores_rd)))

    plt.savefig(folderName + '/' + Model_name.replace('+', '') + '.pdf')
    plt.show()

    del label_array
    del X
    del y
    del prec_negative
    del prec_nrd
    del prec_rd
    del recall_negative
    del recall_nrd
    del recall_rd
    del f1Scores_negative
    del f1Scores_nrd
    del f1Scores_rd
    del accuracies
    del losses
    del np_scores
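
The micro-average ROC used in `runThreeClassesExperiment` treats every (sample, class) pair as one binary decision: the one-hot labels and the predicted probabilities are flattened before computing a single curve. The following is a minimal sketch on toy inputs; the helper name `micro_average_auc` and the example arrays are illustrative assumptions, not outputs of the tested models.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve
from sklearn.preprocessing import label_binarize


def micro_average_auc(labels, probas, classes=(0, 1, 2)):
    """Compute the micro-average AUC from integer labels and class probabilities."""
    y_onehot = label_binarize(labels, classes=list(classes))
    # Flatten both arrays so each (sample, class) pair is one binary decision
    fpr, tpr, _ = roc_curve(y_onehot.ravel(), np.asarray(probas).ravel())
    return auc(fpr, tpr)


labels = [0, 1, 2, 2, 1, 0]
perfect = np.eye(3)[labels]         # probability 1 on the true class
uninformative = np.full((6, 3), 1 / 3)  # same probability for every class

print(micro_average_auc(labels, perfect))        # 1.0
print(micro_average_auc(labels, uninformative))  # 0.5
```

Because all classes are pooled, the micro-average weights each class by its number of samples, which matters when the classes are imbalanced.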

### 4.1 Experiments with the original multi-class dataset

The following cells run the **tests on the original multi-class dataset**, model by model, for each of the ten compared neural networks. Before the experiments run, a folder is created to store their AUC-ROC plots. After each experiment, the ImageNet weights of the tested neural network can be deleted if disk space is limited.

In [None]:
# Creates a folder to store the results of experiments with the multi-class dataset
!mkdir 'multi'

In [None]:
# Multi-class classification on the original dataset with VGG16
runThreeClassesExperiment(GetBestThreeClassesVGG16Model, vgg16_preprocess_input, (224,224,3), 32, ORIGINAL_MULTI_DATASET_DIR,
    'VGG16 + dense layers', 'multi', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Multi-class classification on the original dataset with VGG19
runThreeClassesExperiment(GetBestThreeClassesVGG19Model, vgg19_preprocess_input, (224,224,3), 32, ORIGINAL_MULTI_DATASET_DIR,
                          'VGG19 + dense layers', 'multi', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Multi-class classification on the original dataset with ResNet50V2
runThreeClassesExperiment(GetBestThreeClassesResNet50V2Model, resnet_v2_preprocess_input, (224,224,3), 32, ORIGINAL_MULTI_DATASET_DIR,
    'ResNet50V2 + dense layers', 'multi', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Multi-class classification on the original dataset with ResNet101V2
runThreeClassesExperiment(GetBestThreeClassesResNet101V2Model, resnet_v2_preprocess_input, (224,224,3), 32, ORIGINAL_MULTI_DATASET_DIR,
    'ResNet101V2 + dense layers', 'multi', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Multi-class classification on the original dataset with ResNet152V2
runThreeClassesExperiment(GetBestThreeClassesResNet152V2Model, resnet_v2_preprocess_input, (224,224,3), 32, ORIGINAL_MULTI_DATASET_DIR,
    'ResNet152V2 + dense layers', 'multi', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Multi-class classification on the original dataset with InceptionV3
runThreeClassesExperiment(GetBestThreeClassesInceptionV3Model, inception_v3_preprocess_input, (299,299,3), 32, ORIGINAL_MULTI_DATASET_DIR,
    'InceptionV3 + dense layers', 'multi', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Multi-class classification on the original dataset with MobileNetV2
runThreeClassesExperiment(GetBestThreeClassesMobileNetV2Model, mobilenet_v2_preprocess_input, (224,224,3), 32, ORIGINAL_MULTI_DATASET_DIR,
    'MobileNetV2 + dense layers', 'multi', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Multi-class classification on the original dataset with NASNetMobile
runThreeClassesExperiment(GetBestThreeClassesNasNetMobileModel, nasnet_preprocess_input, (224,224,3), 32, ORIGINAL_MULTI_DATASET_DIR,
    'NASNetMobile + dense layers', 'multi', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Multi-class classification on the original dataset with DenseNet121
runThreeClassesExperiment(GetBestThreeClassesDenseNet121Model, densenet_preprocess_input, (224,224,3), 32, ORIGINAL_MULTI_DATASET_DIR,
    'DenseNet121 + dense layers', 'multi', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Multi-class classification on the original dataset with Xception
runThreeClassesExperiment(GetBestThreeClassesXceptionModel, xception_preprocess_input, (299,299,3), 32, ORIGINAL_MULTI_DATASET_DIR,
    'Xception + dense layers', 'multi', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

### 4.2 Experiments with the augmented multi-class dataset

The following cells run the **tests on the augmented multi-class dataset**, model by model, for each of the ten compared neural networks. Before the experiments run, a folder is created to store their AUC-ROC plots. After each experiment, the ImageNet weights of the tested neural network can be deleted if disk space is limited.

In [None]:
# Creates a folder to store the results of experiments with the augmented multi-class dataset
!mkdir 'augmented-multi'

In [None]:
# Multi-class classification on the augmented dataset with VGG16
runThreeClassesExperiment(GetBestThreeClassesVGG16Model, vgg16_preprocess_input, (224,224,3), 32, AUGMENTED_MULTI_DATASET_DIR,
    'VGG16 + dense layers', 'augmented-multi', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Multi-class classification on the augmented dataset with VGG19
runThreeClassesExperiment(GetBestThreeClassesVGG19Model, vgg19_preprocess_input, (224,224,3), 32, AUGMENTED_MULTI_DATASET_DIR,
                          'VGG19 + dense layers', 'augmented-multi', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Multi-class classification on the augmented dataset with ResNet50V2
runThreeClassesExperiment(GetBestThreeClassesResNet50V2Model, resnet_v2_preprocess_input, (224,224,3), 32, AUGMENTED_MULTI_DATASET_DIR,
    'ResNet50V2 + dense layers', 'augmented-multi', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Multi-class classification on the augmented dataset with ResNet101V2
runThreeClassesExperiment(GetBestThreeClassesResNet101V2Model, resnet_v2_preprocess_input, (224,224,3), 32, AUGMENTED_MULTI_DATASET_DIR,
    'ResNet101V2 + dense layers', 'augmented-multi', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Multi-class classification on the augmented dataset with ResNet152V2
runThreeClassesExperiment(GetBestThreeClassesResNet152V2Model, resnet_v2_preprocess_input, (224,224,3), 32, AUGMENTED_MULTI_DATASET_DIR,
    'ResNet152V2 + dense layers', 'augmented-multi', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Multi-class classification on the augmented dataset with InceptionV3
runThreeClassesExperiment(GetBestThreeClassesInceptionV3Model, inception_v3_preprocess_input, (299,299,3), 32, AUGMENTED_MULTI_DATASET_DIR,
    'InceptionV3 + dense layers', 'augmented-multi', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Multi-class classification on the augmented dataset with MobileNetV2
runThreeClassesExperiment(GetBestThreeClassesMobileNetV2Model, mobilenet_v2_preprocess_input, (224,224,3), 32, AUGMENTED_MULTI_DATASET_DIR,
    'MobileNetV2 + dense layers', 'augmented-multi', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Multi-class classification on the augmented dataset with NASNetMobile
runThreeClassesExperiment(GetBestThreeClassesNasNetMobileModel, nasnet_preprocess_input, (224,224,3), 32, AUGMENTED_MULTI_DATASET_DIR,
    'NASNetMobile + dense layers', 'augmented-multi', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Multi-class classification on the augmented dataset with DenseNet121
runThreeClassesExperiment(GetBestThreeClassesDenseNet121Model, densenet_preprocess_input, (224,224,3), 32, AUGMENTED_MULTI_DATASET_DIR,
    'DenseNet121 + dense layers', 'augmented-multi', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/

In [None]:
# Multi-class classification on the augmented dataset with Xception
runThreeClassesExperiment(GetBestThreeClassesXceptionModel, xception_preprocess_input, (299,299,3), 32, AUGMENTED_MULTI_DATASET_DIR,
    'Xception + dense layers', 'augmented-multi', 42, 10)

In [None]:
# Run this to delete the weights of the previously tested neural networks, in
# case you have limited disk space
!rm -r ~/.keras/models/*
!ls ~/.keras/models/