# Marcer-Marchisciana
**Andrea Marcer - 10537040**

**Matteo Marchisciana - 10586574**

In this homework, we tried different models and approaches to tackle the problem. The solutions we tested are:
- Custom models
- Face detection approach
- Two steps approach
- Transfer learning
  - VGG16
  - *VGG16 with regularization*
  - Xception
  - NASNet Mobile

Here we give a brief description of each approach and why we used it. More details are given at the start of each section.

### Custom models
First, we took inspiration from the VGG16 architecture, copying its main structure but reducing the number of layers and filters to limit overfitting, since VGG16 is a very large network. However, the best validation accuracy we reached was only ~85%.

### Face detection approach
We suspected that feeding whole images directly to the CNN was not informative enough to classify the dataset correctly, since the label depends on the individual faces. Our idea was to detect all the faces and decide, for each one, whether it is wearing a mask. To do so, we used an external neural network (MTCNN) for face detection; once we had the face locations, we cropped each face and fed it to a mask/no-mask CNN. Although this CNN performed very well on the training and validation sets (~96% accuracy), MTCNN missed some faces and sometimes detected background regions as faces. The overall accuracy of the pipeline, measured on the homework training dataset, was ~87%.
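The last step of this pipeline, turning per-face predictions into an image label, can be sketched as below. This is a minimal illustration, not the code we ran: the function name `faces_to_image_label` is hypothetical, and it assumes the label convention implied by the relabeling in the Two-steps section (0 = nobody wearing a mask, 1 = everybody wearing a mask, 2 = only some people wearing a mask). Faces classified as `Background` (the class we added for detector false positives) are ignored, and the value returned when no face survives is also an assumption.

```python
from typing import List


def faces_to_image_label(face_classes: List[str], default: int = 0) -> int:
    """Map per-face classes ('Mask', 'NoMask', 'Background') to an image label.

    Assumed convention: 0 = nobody masked, 1 = everybody masked, 2 = some masked.
    `default` is a hypothetical fallback used when no real face was detected.
    """
    # Drop detector false positives that the face CNN classified as background
    faces = [c for c in face_classes if c != "Background"]
    if not faces:
        return default
    masked = sum(1 for c in faces if c == "Mask")
    if masked == 0:
        return 0  # no face wears a mask
    if masked == len(faces):
        return 1  # every face wears a mask
    return 2      # mixed: at least one face without a mask


print(faces_to_image_label(["Mask", "Mask", "Background"]))  # 1
print(faces_to_image_label(["Mask", "NoMask"]))              # 2
print(faces_to_image_label([]))                              # 0 (fallback)
```

The sketch also shows why missed detections hurt: a face the detector does not return never enters the count, so an image with one unmasked face among masked ones can end up labeled as if everyone were masked.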

### Two steps approach
For the same reason as the face detection approach, we designed a two-step approach. The first step uses a CNN to detect whether there is at least one mask in the image; if the answer is affirmative (it detects at least one mask), the second step decides whether there is at least one face without a mask. To do so, we trained two different CNNs on the homework dataset, both based on VGG16. The first CNN (mask vs. no mask) performed very well on validation (~97%) but, as expected, the second one did not match it, scoring ~91%. Overall, this approach scored ~86% on the Kaggle test set.

### Transfer learning
We tried different models for transfer learning. We started with VGG16, as seen in the lessons, then tested Google's Xception and NASNet Mobile. VGG16, however, gave the best initial results, so we focused on it and spent more time optimizing its parameters.
We also tried the VGG16 preprocessing function, but the model performed equally well or better with our own preprocessing (rescaling pixels to [0, 1]).
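The two preprocessing options differ as sketched below. To our understanding, `tf.keras.applications.vgg16.preprocess_input` follows the "caffe" convention (RGB to BGR, then subtract the ImageNet per-channel means, with no scaling); the NumPy function `vgg16_like_preprocess` is a hypothetical approximation for illustration, not the library implementation. Our own preprocessing is the `rescale=1./255` of `ImageDataGenerator`.

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used by "caffe"-style preprocessing
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def rescale_preprocess(img: np.ndarray) -> np.ndarray:
    """Our preprocessing: map RGB pixels from [0, 255] to [0, 1]."""
    return img.astype(np.float32) / 255.0


def vgg16_like_preprocess(img: np.ndarray) -> np.ndarray:
    """Approximation of VGG16 preprocessing: RGB -> BGR, then subtract ImageNet means."""
    bgr = img[..., ::-1].astype(np.float32)  # reverse the channel axis
    return bgr - IMAGENET_BGR_MEAN


pixel = np.array([[[255, 128, 0]]], dtype=np.uint8)  # a single RGB pixel
print(rescale_preprocess(pixel))
print(vgg16_like_preprocess(pixel))
```

The first option keeps inputs in [0, 1]; the second produces zero-centered values of roughly ±150, matching the statistics the ImageNet weights were trained on.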
The best result was obtained with:
- lambda_conv = 0.001
- freeze_until = 6
- learning_rate = 1e-5
- Dropout rate = 0.3
- Epochs = 23
- Two Dense layers of 512 units, Dropout, Softmax

# Overfitting
The main issue we encountered was overfitting: validation accuracy often plateaued around ~85% while training accuracy reached ~97% and the training loss was very low. For this reason, nearly every approach we followed was aimed at reducing overfitting.
To do so, we used Dropout in the top part of the network, introduced kernel and bias regularizers in the convolutional layers, and tried different fine-tuning configurations. Most of our effort went into searching for and optimizing these parameters.
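To show what the `Dropout(rate=0.3)` in the head does during training, here is a NumPy sketch of inverted dropout, the scheme Keras' `Dropout` layer follows: each unit is zeroed with probability `rate` and the kept units are scaled by `1/(1 - rate)`, so the expected activation is unchanged and the layer can be a no-op at inference. The function name is hypothetical and this is an illustration, not the Keras implementation.

```python
import numpy as np


def inverted_dropout(x: np.ndarray, rate: float, rng: np.random.Generator) -> np.ndarray:
    """Training-time dropout: zero each unit with probability `rate`,
    scale the surviving units by 1 / (1 - rate)."""
    keep = rng.random(x.shape) >= rate  # True for units that are kept
    return np.where(keep, x / (1.0 - rate), 0.0)


rng = np.random.default_rng(0)
activations = np.ones(100_000)
dropped = inverted_dropout(activations, rate=0.3, rng=rng)
print((dropped == 0).mean())  # close to 0.3: fraction of zeroed units
print(dropped.mean())         # close to 1.0: expected activation is preserved
```

Because a different random subset of units is dropped at every step, the dense layers cannot rely on any single unit, which is why dropout acts as a regularizer.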

### Data augmentation
We initially trained the models without data augmentation, but we soon found that applying it was another way to reduce overfitting. The parameters we used for the final model are:
 - rotation_range=30
 - width_shift_range=25
 - height_shift_range=25
 - zoom_range=0.2
 - horizontal_flip=True

We did not want to push the transformations too far, because they might change the label of the image (for example, shifting an image too far could push a face out of the frame).
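An integer `width_shift_range=25` means shifts measured in pixels (up to about 25 px, i.e. roughly 10% of a 256-pixel-wide image), not a fraction of the width. The NumPy sketch below applies one fixed horizontal shift and fills the uncovered columns by repeating the edge, which is what `fill_mode="nearest"` (the `ImageDataGenerator` default) does for integer shifts. The function name is hypothetical, and random sampling of the shift is deliberately left out, since Keras' exact sampling is not reproduced here.

```python
import numpy as np


def shift_width_nearest(img: np.ndarray, shift: int) -> np.ndarray:
    """Shift an HxWxC image horizontally by `shift` pixels (positive = right),
    filling uncovered columns by repeating the nearest edge column."""
    if shift == 0:
        return img.copy()
    w = img.shape[1]
    pad = abs(shift)
    # Repeat the edge columns on both sides, then take a window of the original width
    padded = np.pad(img, ((0, 0), (pad, pad), (0, 0)), mode="edge")
    start = pad - shift
    return padded[:, start:start + w, :]


row = np.arange(5).reshape(1, 5, 1)
print(shift_width_nearest(row, 2)[0, :, 0])  # [0 0 0 1 2]
```

With a face near the border, a large shift replaces it with repeated edge pixels, which is exactly the label-changing effect we wanted to avoid.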

## Attachments
To make our work easier to follow, we included some of the files used in this homework:
- *TestModel.py*: the script we used to generate the CSV from the predictions of the test set. We took these CSVs and submitted them on Kaggle to evaluate our predictions.

- *TestModelTwoSteps.py*: this is the variant of the TestModel.py script we used to evaluate our Two Steps Approach.

- *FinalModel.py*: the script that generated the best performance, both on validation set and Kaggle test.

- *VGG16Imagenet_regularized.py*: defines a single function that returns the VGG16 model with regularization and ImageNet weights. It is called by FinalModel.py.

- *TrainingSplitter.py*: splits the training set into labeled training and validation sets, so that the flow_from_directory method can be used.
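TrainingSplitter.py itself is not reproduced here; the sketch below only illustrates the idea under our own assumptions. The function name `split_into_class_folders`, the folder layout `training/<label>/` and `validation/<label>/`, and copying (rather than moving) the files are all hypothetical choices. `flow_from_directory` infers each image's class from the name of its subfolder.

```python
import os
import random
import shutil
from typing import Dict


def split_into_class_folders(labels: Dict[str, str], src_dir: str, dst_dir: str,
                             validation_split: float = 0.2, seed: int = 123) -> None:
    """Copy images into dst_dir/{training,validation}/<label>/ so that
    ImageDataGenerator.flow_from_directory can infer labels from folder names."""
    files = sorted(labels)               # deterministic order before shuffling
    random.Random(seed).shuffle(files)   # reproducible shuffle
    n_val = int(len(files) * validation_split)
    for i, name in enumerate(files):
        subset = "validation" if i < n_val else "training"
        out_dir = os.path.join(dst_dir, subset, labels[name])
        os.makedirs(out_dir, exist_ok=True)
        shutil.copy(os.path.join(src_dir, name), os.path.join(out_dir, name))
```

Labels should be passed as strings (e.g. `"0"`, `"1"`, `"2"`) since they become folder names.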

# Initialization

In [None]:
!sudo pip install mtcnn

In [None]:
import tensorflow as tf
import numpy as np
import pandas as pd
import os
import cv2
import matplotlib.pyplot as plt
import seaborn as sns

from mtcnn.mtcnn import MTCNN
from sklearn.metrics import confusion_matrix, classification_report
from keras_preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv2D, MaxPooling2D, Dropout
from keras.callbacks import ModelCheckpoint, EarlyStopping, TensorBoard
from keras.regularizers import l1, l2, l1_l2
from datetime import datetime
from PIL import Image

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
################ Settings ################
SEED = 123456789

verbose = 0

# Current Working Directory
cwd = '/content/drive/MyDrive/POLI/Artificial Neural Networks and Deep Learning/Homeworks/Homework1/'
##########################################

tf.random.set_seed(SEED)
np.random.seed(SEED)

# Custom
This model takes inspiration from the VGG16 architecture. It has 10 convolutional layers divided into 5 blocks, with a MaxPooling layer at the end of each block. The top of the network is a Flatten layer followed by one Dense layer with 512 neurons, a Dropout layer to reduce overfitting, and the softmax output layer.

## Dataset

In [None]:
################ Settings ################
# Input image shape
img_height = 256 
img_width = 256
img_channels = 3

# Directories
dataset_dir = os.path.join(cwd, 'homework1-dataset')
training_dir = os.path.join(dataset_dir, 'training')
validation_dir = os.path.join(dataset_dir, 'training')  # same folder: validation images are split off via validation_split
test_dir = os.path.join(dataset_dir, 'test')
##########################################

#### Dataframe

In [None]:
################ Settings ################
json_path = os.path.join(dataset_dir, 'train_gt.json')
##########################################

json = pd.read_json(lines=True, path_or_buf=json_path)

# Create Data Frame 
df = pd.DataFrame(json)
# First row is the label of the file
df = df.rename(index={0: "label"})
# Transpose the dataframe
df = df.T
# Convert elements to strings 
df['file'] = df.index.astype(str)
df['label'] = df['label'].astype(str)
# Shuffle the dataframe
df = df.sample(frac=1)
# Set number of classes as the number of different values in the label column
num_classes = df.groupby('label').count().T.columns.size

print(df.groupby('label').count())
print("\n\n")
print(df)

       file
label      
0      1900
1      1897
2      1817



          label       file
16201.jpg     0  16201.jpg
10306.jpg     1  10306.jpg
16294.jpg     1  16294.jpg
14489.jpg     0  14489.jpg
15982.jpg     1  15982.jpg
...         ...        ...
10714.jpg     0  10714.jpg
10787.jpg     0  10787.jpg
16311.jpg     2  16311.jpg
15736.jpg     1  15736.jpg
14511.jpg     1  14511.jpg

[5614 rows x 2 columns]
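The rename-and-transpose steps above turn a single JSON object (file name to class id) into one row per image. A more direct way to build an equivalent table (without the file-name index), assuming `train_gt.json` holds one object of that shape as the output above suggests, is sketched below on a made-up miniature with hypothetical file names. Note that in the notebook the variable `json` holds a DataFrame, not the `json` module, hence the alias.

```python
import json as jsonlib

import pandas as pd

# Hypothetical miniature of train_gt.json: one object mapping file name -> class id
raw = '{"10001.jpg": 0, "10002.jpg": 2, "10003.jpg": 1, "10004.jpg": 0}'

mapping = jsonlib.loads(raw)
df_small = pd.DataFrame({
    "file": list(mapping.keys()),
    # flow_from_dataframe with class_mode="categorical" expects string labels
    "label": [str(v) for v in mapping.values()],
})
num_classes_small = df_small["label"].nunique()
print(df_small)
print(num_classes_small)  # 3
```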


### Training Dataset

In [None]:
################ Settings ################
apply_data_augmentation = False
batch_size = 64
validation_split = 0.2
##########################################

# ImageDataGenerator
if apply_data_augmentation:
    train_data_gen = ImageDataGenerator(
        rotation_range = 10,
        width_shift_range = 10,
        height_shift_range = 10,
        zoom_range = 0.3,
        horizontal_flip=True,
        # vertical_flip=True,
        fill_mode = "nearest",
        cval=0,
        rescale=1./255,# All pixels in the range 0-1
        validation_split=validation_split) 
else:
    train_data_gen = ImageDataGenerator(
        rescale=1./255,
        validation_split=validation_split)
    
train_gen=train_data_gen.flow_from_dataframe(
  dataframe = df,
  directory = training_dir,
  x_col = "file",
  y_col = "label",
  subset = "training",
  batch_size = batch_size,
  seed = SEED,
  shuffle = True,
  class_mode = "categorical",
  target_size = (img_height, img_width)
)

train_dataset = tf.data.Dataset.from_generator(
    lambda: train_gen,
    output_types=(tf.float32, tf.float32),
    output_shapes=([None, img_height, img_width, img_channels], [None, num_classes])
)

train_dataset = train_dataset.repeat()

Found 4492 validated image filenames belonging to 3 classes.


### Validation Dataset

In [None]:
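# Separate generator without augmentation: with the same df and validation_split
# it selects the same "validation" rows, but only rescales the images
valid_data_gen = ImageDataGenerator(
    rescale=1./255,
    validation_split=validation_split)

valid_gen = valid_data_gen.flow_from_dataframe(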
    dataframe=df,
    directory = validation_dir,
    x_col="file",
    y_col="label",
    target_size = (img_height, img_width),
    class_mode = "categorical",
    seed = SEED,
    subset = "validation",
    batch_size = batch_size
)

valid_dataset = tf.data.Dataset.from_generator(
    lambda: valid_gen, 
    output_types=(tf.float32, tf.float32),
    output_shapes=([None, img_height, img_width, img_channels], [None, num_classes]))

valid_dataset = valid_dataset.repeat()

Found 1122 validated image filenames belonging to 3 classes.
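Because both generators receive the same shuffled `df` and the same `validation_split`, the two subsets do not overlap. As far as we can tell from `keras_preprocessing`, the split itself is not random: the validation subset is the first `validation_split` fraction of the rows and the training subset is the rest, so the randomness comes only from `df.sample(frac=1)`. The arithmetic below (with a hypothetical helper `split_sizes`) reproduces the counts printed above and the 71 steps per epoch in the training log, since `len(train_gen)` rounds up.

```python
import math


def split_sizes(n_rows: int, validation_split: float) -> tuple:
    """(training, validation) sizes: validation takes rows [0, int(split * n))."""
    n_val = int(n_rows * validation_split)
    return n_rows - n_val, n_val


n_train, n_val = split_sizes(5614, 0.2)
print(n_train, n_val)                                  # 4492 1122
print(math.ceil(n_train / 64), math.ceil(n_val / 64))  # 71 18
```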


## Model

In [None]:
################ Model parameters ################
activation_function = 'selu'
padding = 'same'
pool_size = (2, 2)
output_activation_function = 'softmax'
input_shape = (img_height, img_width, img_channels)
output_neurons = num_classes
#####################################################

In [None]:
model = Sequential()

model.add(Conv2D(64, (5,5), activation=activation_function, padding=padding, input_shape=input_shape))
model.add(Conv2D(64, (3,3), activation=activation_function, padding=padding))
model.add(MaxPooling2D(pool_size=pool_size))

model.add(Conv2D(128, (3,3), activation=activation_function, padding=padding))
model.add(Conv2D(128, (3,3), activation=activation_function, padding=padding))
model.add(MaxPooling2D(pool_size=pool_size))

model.add(Conv2D(256, (3,3), activation=activation_function, padding=padding))
model.add(Conv2D(256, (3,3), activation=activation_function, padding=padding))
model.add(MaxPooling2D(pool_size=pool_size))

model.add(Conv2D(512, (3,3), activation=activation_function, padding=padding))
model.add(Conv2D(512, (3,3), activation=activation_function, padding=padding))
model.add(MaxPooling2D(pool_size=pool_size))

model.add(Conv2D(512, (3,3), activation=activation_function, padding=padding))
model.add(Conv2D(512, (3,3), activation=activation_function, padding=padding))
model.add(MaxPooling2D(pool_size=pool_size))

model.add(Flatten())
model.add(Dense(512, activation=activation_function))
model.add(Dropout(rate = 0.2))
model.add(Dense(output_neurons, activation=output_activation_function))

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 256, 256, 64)      4864      
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 256, 256, 64)      36928     
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 128, 128, 64)      0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 128, 128, 128)     73856     
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 128, 128, 128)     147584    
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 64, 64, 128)       0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 64, 64, 256)       2
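The parameter counts in the summary follow from kernel size × input channels × filters plus one bias per filter. Extending the same arithmetic to the layers cut off in the output above (our own back-of-the-envelope computation, not printed output) suggests where most of the roughly 26M parameters sit: the first Dense layer, fed by an 8 × 8 × 512 feature map, has about 16.8M parameters, more than all convolutional layers together, which may partly explain the overfitting we observed.

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    # k*k*c_in weights per filter plus one bias per filter
    return k * k * c_in * c_out + c_out


def dense_params(n_in: int, n_out: int) -> int:
    return n_in * n_out + n_out


# (kernel size, input channels, filters) of the 10 conv layers of the custom model
convs = [(5, 3, 64), (3, 64, 64),
         (3, 64, 128), (3, 128, 128),
         (3, 128, 256), (3, 256, 256),
         (3, 256, 512), (3, 512, 512),
         (3, 512, 512), (3, 512, 512)]
conv_total = sum(conv_params(*c) for c in convs)

side = 256 // 2 ** 5      # five 2x2 poolings: 256 -> 8
flat = side * side * 512  # size of the Flatten output
dense_total = dense_params(flat, 512) + dense_params(512, 3)

print(conv_total)   # 9408064
print(dense_total)  # 16779267
```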

## Training

In [None]:
################ Optimization params ################ 
loss = tf.keras.losses.CategoricalCrossentropy()
learning_rate = 1e-4
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
metrics = ['accuracy']
#####################################################

model.compile(optimizer=optimizer, loss=loss, metrics=metrics)

In [None]:
################ Settings ################
# Tensorboard
tensorBoard = False

# Early stopping
earlyStopping = True
patience = 10

# Model check
checkpoints_dir = os.path.join(cwd, 'checkpoints')
modelCheckpoint = False
##########################################

callback_list = []

if (earlyStopping):
  es = EarlyStopping(
      monitor = 'val_loss',
      min_delta = 0, 
      patience = patience,
      verbose = verbose,
      mode = 'auto',
      baseline = None,
      restore_best_weights = True)
  callback_list.append(es)

if (modelCheckpoint):
  cp = ModelCheckpoint(
      checkpoints_dir,
      monitor = 'val_loss',
      verbose = verbose,
      save_best_only = False,
      save_weights_only = False,
      mode = 'auto',
      save_freq = "epoch")  # an integer would mean "save every N batches"
  callback_list.append(cp)

if (tensorBoard):
  tb = TensorBoard(
      log_dir='/content/tb_log',
      profile_batch=0,
      histogram_freq=1, # if 1 shows weights histograms
      ) 
  callback_list.append(tb)
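With `patience = 10` and `restore_best_weights = True`, training stops once `val_loss` has not improved for 10 consecutive epochs, and the weights from the best epoch are restored. The plain-Python function below (hypothetical name) simulates that rule on a list of validation losses; it is a simplified model of `EarlyStopping` with `min_delta = 0`, not its source.

```python
from typing import List, Tuple


def early_stopping_epoch(val_losses: List[float], patience: int) -> Tuple[int, int]:
    """Return (epoch at which training stops, epoch whose weights are restored),
    both 1-based, for a 'val_loss' monitor with min_delta=0 (simplified)."""
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:  # strict improvement resets the counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61]
print(early_stopping_epoch(losses, patience=3))  # (6, 3)
```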

In [None]:
################ Settings ################
epochs = 300
##########################################
 
model.fit(
    x = train_dataset,
    epochs = epochs,
    callbacks = callback_list,
    steps_per_epoch = len(train_gen), 
    validation_data = valid_dataset,
    validation_steps = len(valid_gen)
)

Epoch 1/300
 1/71 [..............................] - ETA: 0s - loss: 2.0654 - accuracy: 0.2969

# VGG
With this model we used transfer learning to reuse the convolutional base (structure and weights) of VGG16. We tried appending one or two Dense layers before the final Dense layer with softmax activation.
Using fine tuning, we tried different configurations of frozen layers, learning rates and activation functions. Once it was clear that this approach performed best, we added an extra regularization method: we wrote the model by hand in Python, then downloaded and loaded the ImageNet weights. This allowed us to add a kernel and a bias regularizer to each convolutional layer. We expected this to reduce overfitting, since the L2 penalty discourages large weights. It was then necessary to search for the optimal lambda parameter.
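For reference, this is the objective obtained when the `l2(lambda_conv)` regularizers are added to the categorical cross-entropy, to our understanding of the Keras definition (`l2(λ)` contributes λ times the sum of the squared entries, with no factor 1/2). The gradient term $2\lambda W_\ell$ is what pulls large weights toward zero:

$$
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{CE}} + \lambda \sum_{\ell} \left( \lVert W_\ell \rVert_2^2 + \lVert b_\ell \rVert_2^2 \right),
\qquad
\frac{\partial \mathcal{L}_{\text{total}}}{\partial W_\ell} = \frac{\partial \mathcal{L}_{\text{CE}}}{\partial W_\ell} + 2\lambda W_\ell
$$

Here $\ell$ runs over the regularized layers, $\lVert \cdot \rVert_2^2$ is the sum of the squared entries of a kernel or bias, and $\lambda$ is `lambda_conv` = 0.001; the two hidden Dense layers of the head use the same value.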
In this approach we used a script to split the training and validation sets into labeled folders, in order to use the flow_from_directory method of ImageDataGenerator. Since this was done locally, here we show a version that uses flow_from_dataframe instead; the script we used is attached in the zip file.

The best result was obtained with:
- lambda_conv = 0.001
- freeze_until = 6
- learning_rate = 1e-5
- Dropout rate = 0.3
- Epochs = 23
- Two Dense layers of 512 units, Dropout, Softmax
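Since the regularized VGG16 is built as a flat `Sequential` model, `model.layers[:freeze_until]` indexes its layers in the order they were added. Listing that order (layer names taken from `VGG16Imagenet_Regularized()` in the Model section; the head entries are descriptive labels, not Keras layer names) shows that `freeze_until = 6` freezes the first two convolutional blocks and leaves blocks 3 to 5 and the new head trainable.

```python
# Layer order of the Sequential model built by VGG16Imagenet_Regularized()
vgg16_layers = [
    "block1_conv1", "block1_conv2", "block1_pool",
    "block2_conv1", "block2_conv2", "block2_pool",
    "block3_conv1", "block3_conv2", "block3_conv3", "block3_pool",
    "block4_conv1", "block4_conv2", "block4_conv3", "block4_pool",
    "block5_conv1", "block5_conv2", "block5_conv3", "block5_pool",
]
# Head added on top (descriptive labels, not the auto-generated Keras names)
head = ["Flatten", "Dense(512)", "Dense(512)", "Dropout", "Dense(3)"]
layers_in_order = vgg16_layers + head

freeze_until = 6
frozen = layers_in_order[:freeze_until]     # same slice as model.layers[:freeze_until]
trainable = layers_in_order[freeze_until:]
print(frozen)
```

Pooling layers have no weights, so in practice 4 convolutional layers are frozen and 9 are fine-tuned.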

## Dataset

In [None]:
################ Settings ################
# Input image shape
img_height = 256 
img_width = 256
img_channels = 3

# Directories
dataset_dir = os.path.join(cwd, 'homework1-dataset')
training_dir = os.path.join(dataset_dir, 'training')
validation_dir = os.path.join(dataset_dir, 'training')  # same folder: validation images are split off via validation_split
test_dir = os.path.join(dataset_dir, 'test')
##########################################

#### Dataframe

In [None]:
################ Settings ################
json_path = os.path.join(dataset_dir, 'train_gt.json')
##########################################

json = pd.read_json(lines=True, path_or_buf=json_path)

# Create Data Frame 
df = pd.DataFrame(json)
# First row is the label of the file
df = df.rename(index={0: "label"})
# Transpose the dataframe
df = df.T
# Convert elements to strings 
df['file'] = df.index.astype(str)
df['label'] = df['label'].astype(str)
# Shuffle the dataframe
df = df.sample(frac=1)
# Set number of classes as the number of different values in the label column
num_classes = df.groupby('label').count().T.columns.size

print(df.groupby('label').count())
print("\n\n")
print(df)

### Training Dataset

In [None]:
################ Settings ################
apply_data_augmentation = True
batch_size = 32
validation_split = 0.2
##########################################

# ImageDataGenerator
if apply_data_augmentation:
    train_data_gen = ImageDataGenerator(
        rotation_range=20,
        width_shift_range=25,
        height_shift_range=25,
        zoom_range=0.2,
        horizontal_flip=True,
        rescale=1. / 255,
        validation_split=validation_split
        ) 
else:
    train_data_gen = ImageDataGenerator(
        rescale=1./255,
        validation_split=validation_split)
    
train_gen=train_data_gen.flow_from_dataframe(
  dataframe = df,
  directory = training_dir,
  x_col = "file",
  y_col = "label",
  subset = "training",
  batch_size = batch_size,
  seed = SEED,
  shuffle = True,
  class_mode = "categorical",
  target_size = (img_height, img_width)
)

train_dataset = tf.data.Dataset.from_generator(
    lambda: train_gen,
    output_types=(tf.float32, tf.float32),
    output_shapes=([None, img_height, img_width, img_channels], [None, num_classes])
)

train_dataset = train_dataset.repeat()

Found 4492 validated image filenames belonging to 3 classes.


### Validation Dataset

In [None]:
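# Separate generator without augmentation: with the same df and validation_split
# it selects the same "validation" rows, but only rescales the images
valid_data_gen = ImageDataGenerator(
    rescale=1./255,
    validation_split=validation_split)

valid_gen = valid_data_gen.flow_from_dataframe(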
    dataframe=df,
    directory = validation_dir,
    x_col="file",
    y_col="label",
    target_size = (img_height, img_width),
    class_mode = "categorical",
    seed = SEED,
    subset = "validation",
    batch_size = batch_size
)

valid_dataset = tf.data.Dataset.from_generator(
    lambda: valid_gen, 
    output_types=(tf.float32, tf.float32),
    output_shapes=([None, img_height, img_width, img_channels], [None, num_classes]))

valid_dataset = valid_dataset.repeat()

Found 1122 validated image filenames belonging to 3 classes.


## Model

In [None]:
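############ Structure and weight of VGG16 with regularization #################
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.regularizers import l2

lambda_conv = 0.001  # L2 penalty applied to every conv kernel and bias
# ImageNet weights of VGG16 without the classifier ("notop"), downloaded beforehand
imagenet_path = "vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5"


def VGG16Imagenet_Regularized():
    model = keras.Sequential()

    # Block 1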
    model.add(layers.Conv2D(64, (3, 3),
                            activation='relu',
                            padding='same',
                            kernel_regularizer=l2(lambda_conv),
                            bias_regularizer=l2(lambda_conv),
                            name='block1_conv1',
                            input_shape=(256, 256, 3)))
    model.add(layers.Conv2D(64, (3, 3),
                            activation='relu',
                            padding='same',
                            kernel_regularizer=l2(lambda_conv),
                            bias_regularizer=l2(lambda_conv),
                            name='block1_conv2'))
    model.add(layers.MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool'))

    # Block 2
    model.add(layers.Conv2D(128, (3, 3),
                            activation='relu',
                            padding='same',
                            kernel_regularizer=l2(lambda_conv),
                            bias_regularizer=l2(lambda_conv),
                            name='block2_conv1'))
    model.add(layers.Conv2D(128, (3, 3),
                            activation='relu',
                            padding='same',
                            kernel_regularizer=l2(lambda_conv),
                            bias_regularizer=l2(lambda_conv),
                            name='block2_conv2'))
    model.add(layers.MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool'))

    # Block 3
    model.add(layers.Conv2D(256, (3, 3),
                            activation='relu',
                            padding='same',
                            kernel_regularizer=l2(lambda_conv),
                            bias_regularizer=l2(lambda_conv),
                            name='block3_conv1'))
    model.add(layers.Conv2D(256, (3, 3),
                            activation='relu',
                            padding='same',
                            kernel_regularizer=l2(lambda_conv),
                            bias_regularizer=l2(lambda_conv),
                            name='block3_conv2'))
    model.add(layers.Conv2D(256, (3, 3),
                            activation='relu',
                            padding='same',
                            kernel_regularizer=l2(lambda_conv),
                            bias_regularizer=l2(lambda_conv),
                            name='block3_conv3'))
    model.add(layers.MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool'))

    # Block 4
    model.add(layers.Conv2D(512, (3, 3),
                            activation='relu',
                            padding='same',
                            kernel_regularizer=l2(lambda_conv),
                            bias_regularizer=l2(lambda_conv),
                            name='block4_conv1'))
    model.add(layers.Conv2D(512, (3, 3),
                            activation='relu',
                            padding='same',
                            kernel_regularizer=l2(lambda_conv),
                            bias_regularizer=l2(lambda_conv),
                            name='block4_conv2'))
    model.add(layers.Conv2D(512, (3, 3),
                            activation='relu',
                            padding='same',
                            kernel_regularizer=l2(lambda_conv),
                            bias_regularizer=l2(lambda_conv),
                            name='block4_conv3'))
    model.add(layers.MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool'))

    # Block 5
    model.add(layers.Conv2D(512, (3, 3),
                            activation='relu',
                            padding='same',
                            kernel_regularizer=l2(lambda_conv),
                            bias_regularizer=l2(lambda_conv),
                            name='block5_conv1'))
    model.add(layers.Conv2D(512, (3, 3),
                            activation='relu',
                            padding='same',
                            kernel_regularizer=l2(lambda_conv),
                            bias_regularizer=l2(lambda_conv),
                            name='block5_conv2'))
    model.add(layers.Conv2D(512, (3, 3),
                            activation='relu',
                            padding='same',
                            kernel_regularizer=l2(lambda_conv),
                            bias_regularizer=l2(lambda_conv),
                            name='block5_conv3'))
    model.add(layers.MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool'))

    model.load_weights(imagenet_path)

    return model

In [None]:
model = VGG16Imagenet_Regularized()
model.add(tf.keras.layers.Flatten())

model.add(tf.keras.layers.Dense(units=512,
                                activation="selu",
                                kernel_regularizer=l2(0.001),
                                bias_regularizer=l2(0.001)))
model.add(tf.keras.layers.Dense(units=512,
                                activation="selu",
                                kernel_regularizer=l2(0.001),
                                bias_regularizer=l2(0.001)))
model.add(tf.keras.layers.Dropout(rate=0.3))
model.add(tf.keras.layers.Dense(units=3,
                                activation="softmax"))

enable_finetuning = True

if enable_finetuning:
    freeze_until = 6  # Layer from which we want to fine-tune.
    for layer in model.layers[:freeze_until]:
        layer.trainable = False
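else:
    # Freeze only the 18 VGG16 layers (13 conv + 5 pooling) so the new head still trains
    for layer in model.layers[:18]:
        layer.trainable = False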

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=["accuracy"]
)

model.summary()

## Training

In [None]:
################ Settings ################
# Tensorboard
tensorBoard = False

# Early stopping
earlyStopping = True
patience = 20

# Model check
checkpoints_dir = os.path.join(cwd, 'checkpoints')
modelCheckpoint = False
##########################################

callback_list = []

if (earlyStopping):
  es = EarlyStopping(
      monitor = 'val_accuracy',
      min_delta = 0, 
      patience = patience,
      verbose = verbose,
      mode = 'auto',
      baseline = None,
      restore_best_weights = True)
  callback_list.append(es)

if (modelCheckpoint):
  cp = ModelCheckpoint(
      checkpoints_dir,
      monitor = 'val_accuracy',
      verbose = verbose,
      save_best_only = False,
      save_weights_only = False,
      mode = 'auto',
      save_freq = "epoch")
  callback_list.append(cp)

if (tensorBoard):
  tb = TensorBoard(
      log_dir='/content/tb_log',
      profile_batch=0,
      histogram_freq=1, # if 1 shows weights histograms
      ) 
  callback_list.append(tb)

In [None]:
################ Settings ################
epochs = 300
##########################################
 
model.fit(
    x = train_dataset,
    epochs = epochs,
    callbacks = callback_list,
    steps_per_epoch = len(train_gen), 
    validation_data = valid_dataset,
    validation_steps = len(valid_gen)
)

# Xception
We followed the same procedure as for VGG16, but with a different architecture. We tried a few optimization steps with this model, but it did not surpass the performance of VGG16, so we decided to focus our efforts on VGG16.

## Dataset

In [None]:
################ Settings ################
# Input image shape
img_height = 224 
img_width = 224
img_channels = 3

# Directories
dataset_dir = os.path.join(cwd, 'Faces')
training_dir = os.path.join(dataset_dir, 'Train')
validation_dir = os.path.join(dataset_dir, 'Validation')
test_dir = os.path.join(dataset_dir, 'Test')
##########################################

### Training Dataset

In [None]:
################ Settings ################
apply_data_augmentation = True
batch_size = 32
num_classes = 3
##########################################

# Create training ImageDataGenerator object
if apply_data_augmentation:
    train_data_gen = ImageDataGenerator(
        rotation_range=25,
        width_shift_range=20,
        height_shift_range=20,
        zoom_range=0.2,
        horizontal_flip=True,
        # vertical_flip=True,
        rescale=1. / 255)
else:
    train_data_gen = ImageDataGenerator(rescale=1./255)

train_gen = train_data_gen.flow_from_directory(
      training_dir,
      batch_size=batch_size,
      class_mode='categorical',
      shuffle=True,
      seed=SEED,
      target_size = (img_height, img_width)
)

train_dataset = tf.data.Dataset.from_generator(
      lambda: train_gen,
      output_types=(tf.float32, tf.float32),
      output_shapes=([None, img_height, img_width, img_channels], [None, num_classes])
)

train_dataset = train_dataset.repeat()

train_gen.class_indices

Found 14693 images belonging to 3 classes.


{'Background': 0, 'Mask': 1, 'NoMask': 2}

### Validation Dataset

In [None]:
valid_data_gen = ImageDataGenerator(rescale=1./255)

valid_gen = valid_data_gen.flow_from_directory(
      validation_dir,
      batch_size=batch_size, 
      class_mode='categorical',
      shuffle=False,
      seed=SEED,
      target_size = (img_height, img_width)
)

valid_dataset = tf.data.Dataset.from_generator(
      lambda: valid_gen, 
      output_types=(tf.float32, tf.float32),
      output_shapes=([None, img_height, img_width, img_channels], [None, num_classes])
)

valid_dataset = valid_dataset.repeat()

valid_gen.class_indices

Found 1960 images belonging to 3 classes.


{'Background': 0, 'Mask': 1, 'NoMask': 2}

## Model

In [None]:
xception = tf.keras.applications.Xception(
    input_shape=(224, 224, 3), 
    include_top=False, 
    weights='imagenet'
)

model = tf.keras.models.Sequential()
model.add(xception)
model.add(tf.keras.layers.Flatten())
# model.add(tf.keras.layers.Dropout(rate=0.2))
model.add(tf.keras.layers.Dense(units=512,
                                activation="relu",
                                kernel_regularizer=l2(lambda_conv),
                                bias_regularizer=l2(lambda_conv)))
model.add(tf.keras.layers.Dense(units=512,
                                activation="relu",
                                kernel_regularizer=l2(lambda_conv),
                                bias_regularizer=l2(lambda_conv)))
model.add(tf.keras.layers.Dropout(rate=0.3))
model.add(tf.keras.layers.Dense(units=3,
                                activation="softmax"))

enable_finetuning = True

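if enable_finetuning:
    freeze_until = 40  # Xception layer from which we want to fine-tune.
    # model.layers only holds [xception, Flatten, Dense, Dense, Dropout, Dense],
    # so the slice must run over the layers of the Xception sub-model itself
    for layer in xception.layers[:freeze_until]:
        layer.trainable = False
else:
    xception.trainable = False  # freeze the whole base, keep the new head trainable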

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=["accuracy"]
)

model.summary()

## Training

In [None]:
################ Settings ################
# Tensorboard
tensorBoard = True

# Early stopping
earlyStopping = True
patience = 20

# Model check
checkpoints_dir = os.path.join(cwd, 'checkpoints')
modelCheckpoint = False
##########################################

callback_list = []

if (earlyStopping):
  es = EarlyStopping(
      monitor = 'val_accuracy',
      min_delta = 0, 
      patience = patience,
      verbose = verbose,
      mode = 'auto',
      baseline = None,
      restore_best_weights = True)
  callback_list.append(es)

if (modelCheckpoint):
  cp = ModelCheckpoint(
      checkpoints_dir,
      monitor = 'val_accuracy',
      verbose = verbose,
      save_best_only = False,
      save_weights_only = False,
      mode = 'auto',
      save_freq = "epoch")
  callback_list.append(cp)

if (tensorBoard):
  tb = TensorBoard(
      log_dir='/content/tb_log',
      profile_batch=0,
      histogram_freq=1, # if 1 shows weights histograms
      ) 
  callback_list.append(tb)

In [None]:
################ Settings ################
epochs = 300
##########################################
 
model.fit(
    x = train_dataset,
    epochs = epochs,
    callbacks = callback_list,
    steps_per_epoch = len(train_gen), 
    validation_data = valid_dataset,
    validation_steps = len(valid_gen)
)

# Two-steps architecture
We expected masks to be much easier to detect than bare faces, because of their bright colors. We therefore trained two networks. The first detects whether the image contains at least one mask. The second classifies between SomeMask and AllMask. At prediction time we ran the first model. When it detected at least one mask, we used the second network to decide whether at least one person is not wearing a mask.
Both models use the same regularized architecture as the VGG16 approach.

We attach only the script used to build the two dataframes, because the rest of the script is roughly the same as in the VGG16 approach.

As mentioned in the introduction, see the file TestModelTwoSteps.py for how the prediction was performed.
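The prediction script itself (TestModelTwoSteps.py) is not reproduced here. As an illustration only, the sketch below shows one way to combine the outputs of the two binary networks into the three classes. It assumes that each network outputs the probability of its label "1", and it uses a hypothetical threshold of 0.5. The class codes (0 = no mask, 1 = all masks, 2 = some faces without a mask) follow the rule used in the face detection prediction cell of this notebook.

```python
def two_step_predict(p_some_mask: float, p_all_mask: float,
                     threshold: float = 0.5) -> int:
    """Combine the two binary networks into the 3-class label (hypothetical helper).

    p_some_mask: probability from the first network that the image contains
                 at least one mask (label "1" of dfMaskvsNomask).
    p_all_mask:  probability from the second network that every face wears
                 a mask (label "1" of dfAllmaskVSSomemask).
    """
    if p_some_mask < threshold:
        return 0  # step 1 says no mask: the second network is not needed
    if p_all_mask >= threshold:
        return 1  # step 2: everybody wears a mask
    return 2      # step 2: at least one face without a mask


print(two_step_predict(0.9, 0.2))  # 2
```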

## Dataframes

In [None]:
# Step 1 dataframe: mask vs. no mask (labels 1 and 2 merged into "1")
dfMaskvsNomask = pd.DataFrame(json)
dfMaskvsNomask = dfMaskvsNomask.rename(index={0: "label"})
dfMaskvsNomask = dfMaskvsNomask.T
dfMaskvsNomask['file'] = dfMaskvsNomask.index.astype(str)
# Assign the result instead of using inplace=True on a column (chained assignment)
dfMaskvsNomask['label'] = dfMaskvsNomask['label'].astype(str).replace({"2": "1"})
dfMaskvsNomask = dfMaskvsNomask.sample(frac=1, random_state=SEED)  # reproducible shuffle

# Step 2 dataframe: all masks vs. the rest (labels 0 and 2 merged into "0")
dfAllmaskVSSomemask = pd.DataFrame(json)
dfAllmaskVSSomemask = dfAllmaskVSSomemask.rename(index={0: "label"})
dfAllmaskVSSomemask = dfAllmaskVSSomemask.T
dfAllmaskVSSomemask['file'] = dfAllmaskVSSomemask.index.astype(str)
dfAllmaskVSSomemask['label'] = dfAllmaskVSSomemask['label'].astype(str).replace({"2": "0"})
dfAllmaskVSSomemask = dfAllmaskVSSomemask.sample(frac=1, random_state=SEED)  # reproducible shuffle
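The two remappings are easier to check on a toy input. The sketch below uses a three-file stand-in for `train_gt.json`. The file names are invented, and `pd.DataFrame([...])` mimics the one-row frame that `pd.read_json(..., lines=True)` returns for that file. It then applies the same transpose and `replace` steps as the cell above.

```python
import pandas as pd

# Toy stand-in for the one-row frame read from train_gt.json (file name -> label)
json_df = pd.DataFrame([{"a.jpg": 0, "b.jpg": 1, "c.jpg": 2}])

base = json_df.rename(index={0: "label"}).T
base["file"] = base.index.astype(str)
base["label"] = base["label"].astype(str)

# Step 1 labels: classes 1 and 2 both contain at least one mask
mask_vs_nomask = base["label"].replace({"2": "1"})
# Step 2 labels: class 1 (all masks) against everything else
allmask_vs_rest = base["label"].replace({"2": "0"})

print(mask_vs_nomask.to_dict())   # {'a.jpg': '0', 'b.jpg': '1', 'c.jpg': '1'}
print(allmask_vs_rest.to_dict())  # {'a.jpg': '0', 'b.jpg': '1', 'c.jpg': '0'}
```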


# Face detection MTCNN
The main idea is to use a face detector to locate the faces, crop them, count how many of them wear a mask, and use this count to predict the class. To train the mask-detection network, we ran the face detector on the original dataset and saved the cropped faces. We then split them into training and validation sets and cleaned them up, adding a new Background class to catch the detector's false positives. The mask-detection network performed very well. However, the face detector misses some faces (probably because of the masks themselves), so the overall result is not good enough.
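The counting rule can be written as a small pure function. The sketch below works on per-face labels whose names match `train_gen.class_indices` (`Background`, `Mask`, `NoMask`). The helper name is hypothetical. The mapping to image classes mirrors the one used in the prediction cell of this section, and crops classified as `Background` are simply ignored.

```python
from collections import Counter
from typing import Iterable


def image_class_from_faces(face_labels: Iterable[str]) -> int:
    """Turn per-face predictions into the image-level class.

    Returns 2 if there are faces both with and without a mask, 1 if every
    detected face wears a mask, and 0 otherwise (also when no face is found).
    """
    counts = Counter(face_labels)  # 'Background' crops are counted but never used
    mask, no_mask = counts["Mask"], counts["NoMask"]
    if mask > 0 and no_mask > 0:
        return 2
    if mask > 0:
        return 1
    return 0


print(image_class_from_faces(["Mask", "Background", "NoMask"]))  # 2
```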

## Create Face Dataset

In [None]:
# Load the dataframe
json = pd.read_json(lines=True, path_or_buf='/content/drive/MyDrive/POLI/Artificial Neural Networks and Deep Learning/Homeworks/Homework1/homework1-dataset/train_gt.json')

df = pd.DataFrame(json)
df = df.rename(index={0: "label"})
df = df.T
df['file'] = df.index.astype(str)
df['label'] = df['label'].astype(str)

In [None]:
offset = 10  # margin (in pixels) added around each detected face

# Build the MTCNN detector once and reuse it for every image
# (creating it inside predict_faces rebuilds the detector and reloads its weights at every call)
detector = MTCNN()

# Returns the list of cropped faces (224x224 PIL images) from the input image
def crop_faces(filename, result_list):
    print("cropping " + str(len(result_list)))
    original_img = Image.open(filename).convert('RGB')
    imgs = []
    for result in result_list:
        x1, y1, width, height = result['box']
        x2, y2 = x1 + width, y1 + height
        img = original_img.crop((x1 - offset, y1 - offset, x2 + offset, y2 + offset))
        # Image.ANTIALIAS was removed in Pillow 10; LANCZOS is the same filter
        img = img.resize((224, 224), Image.LANCZOS)
        imgs.append(img)
    return imgs

# Given the input image, finds the location of the faces and returns the
# corresponding list of cropped face images
def predict_faces(filename):
    pixels = plt.imread(filename)
    faces = detector.detect_faces(pixels)
    return crop_faces(filename, faces)

# Given a list of face images, returns the number of faces with and without a mask
def num_of_faces(faces):
    mask = 0
    no_mask = 0
    for face in faces:
        # Same 1/255 rescaling used by the ImageDataGenerators
        img_array = np.asarray(face, dtype=np.float32) / 255.0
        img_array = np.expand_dims(img_array, 0)  # create a batch from this one image
        predictions = model.predict(img_array)
        predicted_class_int = np.argmax(predictions[0])
        print(predicted_class_int)
        # Class indices: {'Background': 0, 'Mask': 1, 'NoMask': 2}
        if predicted_class_int == 1:
            mask += 1
        elif predicted_class_int == 2:
            no_mask += 1
    return (mask, no_mask)
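In `crop_faces`, the enlarged box can extend past the image border when a face is close to the edge. In that case the crop includes area outside the picture. The sketch below shows a hypothetical helper `expand_box` that converts an MTCNN `(x, y, width, height)` box into a PIL `(left, upper, right, lower)` crop box and clamps it to the image. Whether clamping helps the mask classifier is an assumption, not something tested in this notebook.

```python
from typing import Tuple


def expand_box(box: Tuple[int, int, int, int], offset: int,
               img_width: int, img_height: int) -> Tuple[int, int, int, int]:
    """Enlarge an (x, y, width, height) box by `offset` pixels on each side
    and clamp it to the image, returning a PIL crop box."""
    x1, y1, width, height = box
    left = max(0, x1 - offset)
    upper = max(0, y1 - offset)
    right = min(img_width, x1 + width + offset)
    lower = min(img_height, y1 + height + offset)
    return left, upper, right, lower


# A face touching the left and bottom borders of a 200x100 image
print(expand_box((5, 40, 50, 60), offset=10, img_width=200, img_height=100))  # (0, 30, 65, 100)
```

Inside `crop_faces` it could be used as `original_img.crop(expand_box(result['box'], offset, *original_img.size))`, because PIL's `Image.size` is `(width, height)`.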

In [None]:
# Code that generates the new dataset: every face found in the images
# labelled 0 is cropped and saved in the NoMask folder
save_dir = '/content/drive/MyDrive/POLI/Artificial Neural Networks and Deep Learning/Homeworks/Homework1/Faces/Train/NoMask'
source_dir = '/content/drive/MyDrive/POLI/Artificial Neural Networks and Deep Learning/Homeworks/Homework1/homework1-dataset/training'
os.makedirs(save_dir, exist_ok=True)
for i in os.listdir(source_dir):
  if int(df['label'][i]) == 0:
    image_path = os.path.join(source_dir, i)
    faces = predict_faces(image_path)
    for j, img in enumerate(faces):
      # <name>.jpg -> <name>_0.jpg, <name>_1.jpg, ...
      face_name = i.split('.')[0] + "_" + str(j) + ".jpg"
      save_path = os.path.join(save_dir, face_name)
      print(save_path)
      img.save(save_path)
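The notebook does not show how the cropped faces were split into the `Train` and `Validation` folders that `flow_from_directory` reads below. The sketch below shows one possible way to do it. The helper `move_to_validation`, the 10% default fraction and the seed are hypothetical choices. It moves a reproducible random subset of one class folder (e.g. `Faces/Train/NoMask`) into the matching validation folder.

```python
import os
import random
import shutil
import tempfile


def move_to_validation(train_class_dir, val_class_dir, val_fraction=0.1, seed=0):
    """Move a random fraction of the .jpg files of one class folder to its
    validation folder and return how many files were moved.

    The shuffle is seeded, so the split is reproducible for the same folder content.
    """
    files = sorted(f for f in os.listdir(train_class_dir) if f.lower().endswith(".jpg"))
    random.Random(seed).shuffle(files)
    n_val = int(len(files) * val_fraction)
    os.makedirs(val_class_dir, exist_ok=True)
    for name in files[:n_val]:
        shutil.move(os.path.join(train_class_dir, name),
                    os.path.join(val_class_dir, name))
    return n_val


# Usage on a throw-away folder tree shaped like Faces/Train/<class>
with tempfile.TemporaryDirectory() as root:
    train_dir = os.path.join(root, "Train", "NoMask")
    val_dir = os.path.join(root, "Validation", "NoMask")
    os.makedirs(train_dir)
    for k in range(10):
        open(os.path.join(train_dir, f"img{k}_0.jpg"), "w").close()
    moved = move_to_validation(train_dir, val_dir, val_fraction=0.2)
    print(moved, len(os.listdir(train_dir)), len(os.listdir(val_dir)))  # 2 8 2
```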

## Mask Dataset

In [None]:
################ Settings ################
# Input image shape
img_height = 224 
img_width = 224
img_channels = 3

# Directories
dataset_dir = os.path.join(cwd, 'Faces')
training_dir = os.path.join(dataset_dir, 'Train')
validation_dir = os.path.join(dataset_dir, 'Validation')
test_dir = os.path.join(dataset_dir, 'Test')
##########################################

### Training Dataset

In [None]:
################ Settings ################
apply_data_augmentation = True
batch_size = 64
num_classes = 3
##########################################

# Create training ImageDataGenerator object
if apply_data_augmentation:
    train_data_gen = ImageDataGenerator(
        rotation_range=10,
        width_shift_range=5,
        height_shift_range=5,
        zoom_range=0.1,
        horizontal_flip=True,
        # vertical_flip=False,
        fill_mode='nearest',
        cval=0,
        rescale=1./255)
else:
    train_data_gen = ImageDataGenerator(rescale=1./255)

train_gen = train_data_gen.flow_from_directory(
      training_dir,
      batch_size=batch_size,
      class_mode='categorical',
      shuffle=True,
      seed=SEED,
      target_size = (img_height, img_width)
)

train_dataset = tf.data.Dataset.from_generator(
      lambda: train_gen,
      output_types=(tf.float32, tf.float32),
      output_shapes=([None, img_height, img_width, img_channels], [None, num_classes])
)

train_dataset = train_dataset.repeat()

train_gen.class_indices

Found 14693 images belonging to 3 classes.


{'Background': 0, 'Mask': 1, 'NoMask': 2}

### Validation Dataset

In [None]:
valid_data_gen = ImageDataGenerator(rescale=1./255)

valid_gen = valid_data_gen.flow_from_directory(
      validation_dir,
      batch_size=batch_size, 
      class_mode='categorical',
      shuffle=False,
      seed=SEED,
      target_size = (img_height, img_width)
)

valid_dataset = tf.data.Dataset.from_generator(
      lambda: valid_gen, 
      output_types=(tf.float32, tf.float32),
      output_shapes=([None, img_height, img_width, img_channels], [None, num_classes])
)

valid_dataset = valid_dataset.repeat()

valid_gen.class_indices

Found 1960 images belonging to 3 classes.


{'Background': 0, 'Mask': 1, 'NoMask': 2}
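Both `train_dataset` and `valid_dataset` are wrapped with `.repeat()`, so they never end. For that reason `model.fit` needs `steps_per_epoch` and `validation_steps` to know where an epoch stops. The notebook passes `len(train_gen)` and `len(valid_gen)`, which give the number of batches per epoch, with a possibly partial last batch. The arithmetic below uses the image counts printed above and `batch_size = 64`:

```python
import math

batch_size = 64
n_train, n_valid = 14693, 1960  # counts reported by flow_from_directory above

# len(DirectoryIterator) == ceil(n / batch_size): the last batch may be partial
steps_per_epoch = math.ceil(n_train / batch_size)
validation_steps = math.ceil(n_valid / batch_size)
print(steps_per_epoch, validation_steps)  # 230 31
```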

### Test Dataset

In [None]:
test_data_gen = ImageDataGenerator(rescale=1./255)

test_gen = test_data_gen.flow_from_directory(
      '/content/drive/MyDrive/POLI/Artificial Neural Networks and Deep Learning/Homeworks/Homework1/homework1-dataset/train',
      batch_size=batch_size, 
      class_mode='categorical',
      shuffle=False,
      seed=SEED,
      target_size = (img_height, img_width)
)

test_dataset = tf.data.Dataset.from_generator(
      lambda: test_gen,
      output_types=(tf.float32, tf.float32),
      output_shapes=([None, img_height, img_width, img_channels], [None, num_classes])
)

test_dataset = test_dataset.repeat()

test_gen.class_indices

## Model

In [None]:
NASNet = tf.keras.applications.NASNetMobile(
    input_shape=(224, 224, 3), 
    include_top=False, 
    weights='imagenet'
)

enable_finetuning = True

if enable_finetuning:
    freeze_until = 15 # Layer from which we want to fine-tune.
    for layer in NASNet.layers[:freeze_until]:
        layer.trainable = False
else:
    NASNet.trainable = False

model = tf.keras.Sequential()
model.add(NASNet)
model.add(tf.keras.layers.Flatten())
model.add(Dropout(0.3))
model.add(tf.keras.layers.Dense(units=256, activation='selu'))
model.add(Dropout(0.3))
model.add(tf.keras.layers.Dense(units=num_classes, activation='softmax'))
model.summary()

## Training

In [None]:
################ Optimization params ################ 
loss = tf.keras.losses.CategoricalCrossentropy()
learning_rate = 1e-5
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
metrics = ['accuracy']
#####################################################

model.compile(optimizer=optimizer, loss=loss, metrics=metrics)

In [None]:
################ Settings ################
# Tensorboard
tensorBoard = True

# Early stopping
earlyStopping = True
patience = 10

# Model check
checkpoints_dir = os.path.join(cwd, 'checkpoints')
modelCheckpoint = False
##########################################

callback_list = []

if (earlyStopping):
  es = EarlyStopping(
      monitor = 'val_loss',
      min_delta = 0, 
      patience = patience,
      verbose = verbose,
      mode = 'auto',
      baseline = None,
      restore_best_weights = True)
  callback_list.append(es)

if (modelCheckpoint):
  cp = ModelCheckpoint(
      checkpoints_dir,
      monitor = 'val_loss',
      verbose = verbose,
      save_best_only = False,
      save_weights_only = False,
      mode = 'auto',
      save_freq = "epoch")  # an integer would save every N batches, not every N epochs
  callback_list.append(cp)

if (tensorBoard):
  tb = TensorBoard(
      log_dir='/content/tb_log',
      profile_batch=0,
      histogram_freq=1, # if 1 shows weights histograms
      ) 
  callback_list.append(tb)

In [None]:
################ Settings ################
epochs = 300
##########################################
 
model.fit(
    x = train_dataset,
    epochs = epochs,
    callbacks = callback_list,
    steps_per_epoch = len(train_gen), 
    validation_data = valid_dataset,
    validation_steps = len(valid_gen)
)

## Create the prediction csv file

In [None]:
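results = {}
correct = 0
incorrect = 0
# Run the whole pipeline on the original (labelled) training images,
# so the predictions can be compared with the ground truth stored in df
source_dir = '/content/drive/MyDrive/POLI/Artificial Neural Networks and Deep Learning/Homeworks/Homework1/homework1-dataset/training'
for i in os.listdir(source_dir):
  filename = os.path.join(source_dir, i)
  faces = predict_faces(filename)
  (mask, no_mask) = num_of_faces(faces)
  # 2: faces with and without a mask, 1: only masked faces, 0: no masked face
  if mask > 0 and no_mask > 0:
    predicted_class_int = 2
  elif mask > 0:
    predicted_class_int = 1
  else:
    predicted_class_int = 0
  results[i] = predicted_class_int
  # Compare with the ground-truth label
  if predicted_class_int == int(df['label'][i]):
    correct += 1
  else:
    incorrect += 1

print("Accuracy:", correct / (correct + incorrect))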

In [None]:
def create_csv(results, results_dir='./'):

    csvfname = 'results'
    csvfname += datetime.now().strftime('%b%d%H-%M-%S') + '.csv'

    with open(os.path.join(results_dir, csvfname), 'w') as f:

        f.write('Id,Category\n')

        for key, value in results.items():
            f.write(key + ',' + str(value) + '\n')

create_csv(results)
from google.colab import files
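`create_csv` builds each line by string concatenation. The sketch below shows an alternative that uses the standard-library `csv` module, which quotes a field if it ever contains a comma. It writes to any text stream, so it can be checked without touching the disk. The helper name and the sorting of the rows are arbitrary choices made to keep the output reproducible.

```python
import csv
import io


def write_predictions(results, stream):
    """Write {file name: class} predictions with an 'Id,Category' header."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["Id", "Category"])
    for file_name, category in sorted(results.items()):
        writer.writerow([file_name, category])


buffer = io.StringIO()
write_predictions({"12.jpg": 2, "10.jpg": 0}, buffer)
print(buffer.getvalue())
```

To write a real file, open it with `open(path, 'w', newline='')` and pass the file object as `stream`.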

# Utils

## Tensorboard

In [None]:
%load_ext tensorboard
%tensorboard --logdir /content/tb_log

## Load model

In [None]:
################ LOAD MODEL ################ 
load_dir = os.path.join(cwd, 'model_VeryGood')
############################################

model = tf.keras.models.load_model(load_dir, compile = True)
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
NASNet (Functional)          (None, 7, 7, 1056)        4269716
_________________________________________________________________
flatten (Flatten)            (None, 51744)             0
_________________________________________________________________
dropout (Dropout)            (None, 51744)             0
_________________________________________________________________
dense (Dense)                (None, 256)               13246720
_________________________________________________________________
dropout_1 (Dropout)          (None, 256)               0
_________________________________________________________________
dense_1 (Dense)              (None, 3)                 771
=================================================================
Total params: 17,517,207
Trainable params: 17,480,469
Non-trainable params: 36,738
_________________________________________________________________
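The arithmetic below re-derives the parameter counts from the layer shapes in the summary. Because of the `Flatten` layer, the `Dense(256)` layer alone holds about three quarters of the 17.5M parameters, roughly three times as many as the NASNetMobile backbone.

```python
# Shapes and counts taken from the summary above
h, w, c = 7, 7, 1056          # NASNetMobile output feature map for 224x224 inputs
flat = h * w * c              # Flatten -> 51744 features
dense = flat * 256 + 256      # weights + biases of Dense(256)
dense_1 = 256 * 3 + 3         # weights + biases of Dense(3)
nasnet = 4269716              # backbone parameters reported by model.summary()
total = nasnet + dense + dense_1
print(flat, dense, dense_1, total)  # 51744 13246720 771 17517207
print(round(dense / total, 2))      # 0.76
```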

## Save Model

In [None]:
################ SAVE MODEL ################ 
save_dir = os.path.join(cwd, 'model_ThereIsMask_2')
############################################

model.save(save_dir)

Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
INFO:tensorflow:Assets written to: /content/drive/MyDrive/POLI/Artificial Neural Networks and Deep Learning/Homeworks/Homework1/model_ThereIsMask_2/assets
