# Description of the procedure we followed

We adopted a progressive approach, moving from simple models (with few parameters) to more complex ones; at each step we tuned the model to get the best performance out of it.<br>
We therefore started with models similar to the one seen in the laboratory sessions. Following the same simple-to-complex guideline, we added parameters whenever the previous model was not powerful enough to handle the classification well, and we added dropout layers for regularization. We also tried adding regularization terms to the dense layers after the convolutional part, but none of the networks we tried performed better with them, so we dropped the idea.

The custom model we built was composed of several convolutional blocks (convolution filters, ReLU, max-pooling), followed by a flatten layer and a fully connected layer. The configuration with which we reached the best performance had a depth of 5 blocks, 4 filters in the first block (doubling at each block) and a batch size of 32. Since we couldn't do better than 0.78, we switched to transfer learning.<br>
We first started from the laboratory session's structure, thus using VGG16.<br>
With VGG16 we could not reach the performance we wanted, so we changed the feature extractor to MobileNet: it performs well despite its simplicity (few parameters), which allowed us to extend the training (fine-tuning) to most of the MobileNet layers as well.
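
As a rough illustration of the custom architecture described above, the sketch below computes the number of filters and the feature-map size after each block. It assumes 'same'-padded convolutions, one 2x2 max-pooling per block and a 256x256 input (the image size used later in this notebook); none of these details are stated for the custom model, and `conv_block_plan` is a hypothetical helper name.

```python
def conv_block_plan(depth=5, start_filters=4, img_size=256):
    """Return (filters, spatial_size) after each conv block.

    Assumptions (not stated in the text): 'same' padding in the
    convolutions and one 2x2 max-pooling per block, so every block
    halves the spatial size while the number of filters doubles.
    """
    plan = []
    filters, size = start_filters, img_size
    for _ in range(depth):
        size //= 2                 # effect of the 2x2 max-pooling
        plan.append((filters, size))
        filters *= 2               # filters double at each block
    return plan


plan = conv_block_plan()
for block, (filters, size) in enumerate(plan, start=1):
    print(f"block {block}: {filters} filters, {size}x{size} feature maps")

# Number of values fed to the fully connected layer after Flatten
last_filters, last_size = plan[-1]
flatten_units = last_filters * last_size * last_size
print("flatten units:", flatten_units)
```

Under these assumptions the last block outputs 64 feature maps of size 8x8, so the fully connected layer receives 4096 values.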

To tune the models we used classic trial and error (manual hyperparameter tuning), guided by how much each model overfitted or underfitted, as judged from the training and validation losses and accuracies.<br>
To deal with overfitting we used two of the techniques seen during the lectures: early stopping (monitoring validation accuracy) and dropout layers.<br>
After reaching our peak performance, we tried to improve further using bagging with 5 models whose structure was similar to our best one, but we didn't manage to increase our final accuracy.
This notebook is the one related to our best result in the competition; all the hyperparameter values can be found in the "Hyperparameters definition" section.
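
The bagging attempt mentioned above can be pictured as averaging the class probabilities predicted by the individual models and taking the most probable class. The sketch below is only an illustration: the text does not say how the 5 models were combined, so probability averaging is an assumption, and `bagged_prediction` and the toy probabilities are made up.

```python
def bagged_prediction(per_model_probs):
    """Average per-model class probabilities and return (class, mean_probs).

    per_model_probs: list with one probability vector per model,
    all referring to the same image.
    """
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    mean_probs = [
        sum(probs[c] for probs in per_model_probs) / n_models
        for c in range(n_classes)
    ]
    best_class = max(range(n_classes), key=lambda c: mean_probs[c])
    return best_class, mean_probs


# Toy softmax outputs of 3 hypothetical models for one image (3 classes)
toy_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.7, 0.2],
]
predicted_class, mean_probs = bagged_prediction(toy_probs)
print(predicted_class)
```

In this toy case the first model alone would pick class 0, while the averaged ensemble picks class 1.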



# Imports

In [None]:
from IPython.core.interactiveshell import InteractiveShell

import os

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import regularizers
from tensorflow.keras.applications.mobilenet import preprocess_input


import numpy as np

import pandas as pd

import json

from datetime import datetime

import ntpath

# Environment setup

In [None]:
InteractiveShell.ast_node_interactivity = "all" 

SEED = 1234
tf.random.set_seed(SEED)  


cwd = '/kaggle/input'

# the following is needed for running on colab

#cwd = os.getcwd()

#from google.colab import drive
#drive.mount('/content/drive')

#!unzip '/content/drive/My Drive/first_challenge_nn.zip' 

**Getting information about the GPUs**

In [None]:
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  try:
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    print(e)

    

# Hyperparameters definition

In [None]:
num_classes = 3
classes = [None]
img_h = 256
img_w = 256

apply_data_augmentation = True

FREEZE_UNTIL_TL = 3

bs = 5
lr = 2e-4
dropout_rate = 0.3
epochs_num = 100

loss = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam(learning_rate=lr)

metrics = ['accuracy']

early_stop = True

patience_steps = 10

# Data preprocessing

**Setup for the data augmentation**

In [None]:
# Create training ImageDataGenerator object
if apply_data_augmentation:
    train_data_gen = ImageDataGenerator(rotation_range=10,
                                        width_shift_range=10,
                                        height_shift_range=10,
                                        zoom_range=0.3,
                                        horizontal_flip=True,
                                        vertical_flip=True,
                                        fill_mode='constant', # fill the pixels exposed by the transforms with a constant
                                        cval=0, # constant value used for those pixels
                                        preprocessing_function=preprocess_input # apply the preprocessing expected by the pre-trained MobileNet
                                        )
else:
    # No rescale here: preprocess_input already maps pixel values to [-1, 1]
    train_data_gen = ImageDataGenerator(preprocessing_function=preprocess_input)
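
For reference, the MobileNet `preprocess_input` maps pixel values from [0, 255] to [-1, 1]. The sketch below reproduces that mapping in plain Python (the helper name `mobilenet_scale` is ours) and shows why an additional `rescale=1./255` should not be stacked on top of it: `ImageDataGenerator` applies `rescale` after the preprocessing function, which would squeeze the inputs into a tiny range.

```python
def mobilenet_scale(pixel):
    """Same mapping as the MobileNet preprocess_input: [0, 255] -> [-1, 1]."""
    return pixel / 127.5 - 1.0


for pixel in (0, 127.5, 255):
    scaled = mobilenet_scale(pixel)
    # What the network would see if a 1/255 rescale were applied afterwards
    squeezed = scaled * (1.0 / 255)
    print(f"{pixel:>5}: preprocess_input -> {scaled:+.3f}, "
          f"with extra rescale -> {squeezed:+.5f}")
```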

**Data retrieval as dataframe**

In [None]:
dataset_dir = os.path.join(cwd, 'artificial-neural-networks-and-deep-learning-2020/MaskDataset')

training_dir = os.path.join(dataset_dir, 'training')

with open(os.path.join(dataset_dir,"train_gt.json")) as f:
  dic = json.load(f)
  
dataframe = pd.DataFrame(dic.items())
dataframe.rename(columns = {0:'filename', 1:'class'}, inplace = True)
dataframe["class"] = dataframe["class"].astype(str)


**Shuffle the data and split it into training and validation sets;
the shuffle is needed to make sure that every set contains randomly selected samples from every class. We decided not to keep a local test set: we considered the competition test set sufficient, and keeping the notebook on Kaggle makes it easy to submit the .csv file.**

In [None]:
def create_validation(dataframe, SEED):  
  df_len = len(dataframe)

  dataframe = dataframe.sample(frac=1, random_state=SEED).reset_index(drop=True)

  train_end = int(df_len*0.9)
  valid_start = train_end

  train_df = dataframe[ : train_end]
  valid_df = dataframe[valid_start :]
  return [train_df, valid_df]
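
A plain shuffle followed by a 90/10 cut makes it very likely, but not certain, that every class appears in both sets. A stratified split guarantees it by splitting each class separately. The sketch below is an alternative to `create_validation`, not what the notebook uses; `stratified_split` and the toy labels are hypothetical, and it works on (filename, class) pairs rather than on a dataframe.

```python
import random
from collections import defaultdict


def stratified_split(samples, train_fraction=0.9, seed=1234):
    """Split (filename, label) pairs so each class keeps the same ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for filename, label in samples:
        by_class[label].append((filename, label))

    train, valid = [], []
    for label in sorted(by_class):
        items = by_class[label]
        rng.shuffle(items)
        cut = int(len(items) * train_fraction)
        train.extend(items[:cut])
        valid.extend(items[cut:])
    # Shuffle again so classes are mixed inside each set
    rng.shuffle(train)
    rng.shuffle(valid)
    return train, valid


# Toy data: 3 classes with 20 images each
toy_samples = [(f"img_{c}_{i}.jpg", str(c)) for c in range(3) for i in range(20)]
train_set, valid_set = stratified_split(toy_samples)
print(len(train_set), len(valid_set))
```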


**Creating the samples for each set**

In [None]:
def create_flow(data_gen, dataframe, directory, bs, img_h, img_w, num_classes, SEED=None):

  # Images are resized to target_size = (height, width) while being loaded
  gen = data_gen.flow_from_dataframe(dataframe,
                                     directory,
                                     batch_size=bs,
                                     class_mode='categorical',
                                     target_size=(img_h, img_w),
                                     shuffle=True,
                                     seed=SEED)
  # Wrap the Keras generator in a tf.data.Dataset yielding (images, one-hot labels)
  dataset = tf.data.Dataset.from_generator(lambda: gen,
                                           output_types=(tf.float32, tf.float32),
                                           output_shapes=([None, img_h, img_w, 3], [None, num_classes]))
  dataset = dataset.repeat()
  return dataset, len(gen)
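
`len(gen)` is the number of batches needed to see every sample once, i.e. the sample count divided by the batch size, rounded up. Because the dataset is repeated indefinitely, this value has to be passed to `fit` as `steps_per_epoch` so that Keras knows where an epoch ends. A small worked example, with a made-up sample count:

```python
import math


def batches_per_epoch(n_samples, batch_size):
    """Number of batches that cover the data once (last batch may be smaller)."""
    return math.ceil(n_samples / batch_size)


# Hypothetical dataset of 1003 images with the batch size used here (bs = 5)
n_batches = batches_per_epoch(1003, 5)
print(n_batches)  # 201: 200 full batches plus one batch of 3 images
```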


# Model definition

**Transfer Learning model**

In [None]:
tl = tf.keras.applications.MobileNet(include_top=False, input_shape=(img_h, img_w, 3))
freeze_until = FREEZE_UNTIL_TL # index of the first layer to fine-tune; earlier layers are frozen
    
for layer in tl.layers[:freeze_until]:
    layer.trainable = False
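
The slice `tl.layers[:freeze_until]` freezes only the layers before index `freeze_until`; everything from that index onwards stays trainable. The sketch below mimics this on stand-in objects (the layer names are placeholders, not the real MobileNet layer names) to show that, with `FREEZE_UNTIL_TL = 3`, almost the whole network is fine-tuned.

```python
from types import SimpleNamespace

# Stand-ins for Keras layers; trainable defaults to True as in Keras
layers = [SimpleNamespace(name=f"layer_{i}", trainable=True) for i in range(8)]

freeze_until = 3
for layer in layers[:freeze_until]:
    layer.trainable = False

frozen = [layer.name for layer in layers if not layer.trainable]
trainable = [layer.name for layer in layers if layer.trainable]
print("frozen:   ", frozen)
print("trainable:", trainable)
```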

**Custom model**

In [None]:
model = tf.keras.Sequential()
    
model.add(tl)
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dropout(rate=dropout_rate))
model.add(tf.keras.layers.Dense(units=num_classes, activation='softmax'))

model.compile(optimizer=optimizer, loss=loss, metrics=metrics)

In [None]:
model.summary()

# Model training

**Definition of the callbacks**

In [None]:
callbacks = []

# Early Stopping
# --------------

if early_stop:
    es_callback = tf.keras.callbacks.EarlyStopping(patience=patience_steps,
                                                   monitor="val_accuracy",
                                                   restore_best_weights=True)
    callbacks.append(es_callback)
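
The early-stopping rule configured above can be sketched without Keras: training stops once the validation accuracy has failed to improve for `patience` consecutive epochs, and the weights of the best epoch are the ones kept. This is a simplified illustration of that logic, not the Keras implementation; the function name and the accuracy values are invented.

```python
def early_stopping_epoch(val_accuracies, patience):
    """Return (stop_epoch, best_epoch), both 0-based.

    Stops at the first epoch where validation accuracy has not improved
    for `patience` consecutive epochs; best_epoch is the epoch whose
    weights would be restored.
    """
    best = float("-inf")
    best_epoch = None
    wait = 0
    for epoch, acc in enumerate(val_accuracies):
        if acc > best:
            best, best_epoch, wait = acc, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Patience never exhausted: training runs through the last epoch
    return len(val_accuracies) - 1, best_epoch


# Invented validation accuracies, one per epoch
history = [0.50, 0.60, 0.70, 0.69, 0.68, 0.67]
stop_epoch, best_epoch = early_stopping_epoch(history, patience=2)
print(stop_epoch, best_epoch)
```

With a patience of 2, training on this invented history stops at epoch 4 and keeps the weights of epoch 2.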

    

**Model fit**

In [None]:

train_df, valid_df = create_validation(dataframe, SEED)
train_dataset, len_train = create_flow(train_data_gen, train_df, training_dir, bs, img_h, img_w, num_classes, SEED)
# Note: the validation flow reuses train_data_gen, so validation images are also augmented
# when apply_data_augmentation is True
valid_dataset, len_valid = create_flow(train_data_gen, valid_df, training_dir, bs, img_h, img_w, num_classes, SEED)


In [None]:
model.fit(x=train_dataset,
        epochs=epochs_num,
        steps_per_epoch=len_train,
        validation_data=valid_dataset,
        validation_steps=len_valid,
        callbacks=callbacks
        )

# Prediction on new data

**Data preparation**

In [None]:
test_dir = os.path.join(dataset_dir, 'test')
test_data_gen = ImageDataGenerator(preprocessing_function=preprocess_input)

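# List the files directly inside the test directory
img_filenames = next(os.walk(test_dir))[2]
test_df = pd.DataFrame(img_filenames)
test_df['class'] = '0' # dummy label, required only because class_mode='categorical'
test_df.rename(columns={0:"filename"},
               inplace=True)
# shuffle=False keeps the predictions aligned with test_gen.filenames
test_gen = test_data_gen.flow_from_dataframe(test_df,
                                             test_dir,
                                             target_size=(img_h, img_w),
                                             color_mode='rgb',
                                             class_mode='categorical',
                                             classes=None,
                                             batch_size=1,
                                             shuffle=False)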
test_gen.reset()



**Predictions**

In [None]:
# predict_generator is deprecated; model.predict accepts generators directly
predictions = model.predict(test_gen, steps=len(test_gen), verbose=1)
results = {}

**Collecting the predictions as a Python dictionary**

In [None]:

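images = test_gen.filenames

# Predictions follow the order of the filenames because the generator is not shuffled
for image_path, p in zip(images, predictions):
  prediction = np.argmax(p) # index of the most probable class
  image_name = ntpath.basename(image_path)
  results[image_name] = str(prediction)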
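
`np.argmax` returns the position of the highest probability, which is the class index assigned by the generator and not necessarily the original label. Here the labels are the strings '0', '1' and '2', which the generator orders alphanumerically as far as we know, so index and label coincide. For other label sets the mapping stored in the training generator's `class_indices` would have to be inverted, as in this sketch (the dictionary below is a made-up example of such a mapping):

```python
def index_to_label(class_indices):
    """Invert a {label: index} mapping into {index: label}."""
    return {index: label for label, index in class_indices.items()}


def argmax(values):
    """Position of the largest value (plain-Python stand-in for np.argmax)."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical mapping with non-numeric labels
class_indices = {"bird": 0, "cat": 1, "dog": 2}
labels = index_to_label(class_indices)

probabilities = [0.1, 0.2, 0.7]
predicted_label = labels[argmax(probabilities)]
print(predicted_label)
```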

**Collecting the prediction as .csv**

In [None]:

def create_csv(results, results_dir='./'):

    csv_fname = 'results_'
    csv_fname += datetime.now().strftime('%b%d_%H-%M-%S') + '.csv'

    with open(os.path.join(results_dir, csv_fname), 'w') as f:

        f.write('Id,Category\n')

        for key, value in results.items():
            f.write(key + ',' + str(value) + '\n')


            
create_csv(results, '/kaggle/working/')
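
The same file can also be produced with the standard `csv` module, which takes care of quoting should a filename ever contain a comma. A minimal sketch writing to an in-memory buffer (`results_to_csv` and the sample entries are illustrative):

```python
import csv
import io


def results_to_csv(results, stream):
    """Write {image_name: predicted_class} as 'Id,Category' rows."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["Id", "Category"])
    for image_name, category in results.items():
        writer.writerow([image_name, category])


buffer = io.StringIO()
results_to_csv({"10001.jpg": "0", "10002.jpg": "2"}, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, the same function can be given a file opened with `open(path, 'w', newline='')`.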