# Model Definition and Evaluation
## Table of Contents
1. [Model Selection](#model-selection)
2. [Feature Engineering](#feature-engineering)
3. [Hyperparameter Tuning](#hyperparameter-tuning)
4. [Implementation](#implementation)
5. [Evaluation Metrics](#evaluation-metrics)
6. [Comparative Analysis](#comparative-analysis)


In [None]:
import matplotlib.pyplot as plt
import numpy as np
import os
# silence TensorFlow's C++ logging (must be set before importing tensorflow)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

import tensorflow as tf
from tensorflow import keras

# Params
SEED = 123
IMG_SIZE = (224, 224)


## Model Selection

I considered several established image classification models, because I was sure I could not build a better neural network from scratch. After reading about different architectures, I first settled on ResNet50 because of an article about monochromatic image classification (I did not save it and can no longer find it). I then switched to ResNet152V2: the V2 variant because the second version of the ResNets performs better, and the 152-layer depth because I thought I could unfreeze more of the top layers if necessary.


## Feature Engineering

Because I used an existing dataset, there was little feature engineering left to do that was not already incorporated in the dataset. I only needed to resize the images and scale the pixel values to ranges appropriate for the ResNet.

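The pixel scaling mentioned above maps 8-bit values from [0, 255] to [-1, 1]. As a quick sanity check, the arithmetic can be written out in plain Python. This is only an illustrative sketch; the helper name `rescale_pixel` is made up for this example and is not part of Keras.

```python
def rescale_pixel(value, scale=1 / 127.5, offset=-1.0):
    """Apply the affine map value * scale + offset (same form as a rescaling layer)."""
    return value * scale + offset


# 8-bit pixel values: black, mid-grey, white
for pixel in (0, 127.5, 255):
    print(pixel, "->", rescale_pixel(pixel))
```

Black ends up at the lower end of the range, white at the upper end, and mid-grey close to zero.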

In [None]:
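# import data
# BATCH_SIZE is used below but only defined in the Hyperparameter Tuning cell, so it is set here as well
BATCH_SIZE = 32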

train_data = keras.preprocessing.image_dataset_from_directory(
    '../1_DatasetCharacteristics/train/',
    validation_split=0.2,
    subset='training',
    seed=SEED,
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE
)

test_data = keras.preprocessing.image_dataset_from_directory(
    '../1_DatasetCharacteristics/train/',
    validation_split=0.2,
    subset='validation',
    seed=SEED,
    image_size=IMG_SIZE,
)

# normalization layer and scale for ResNet152V2
norm_layer = keras.layers.Rescaling(1/127.5, offset=-1)


## Hyperparameter Tuning

Most of my hyperparameter tuning concerned the batch size, the number of epochs, the learning rate, and the number of retrained layers. I also experimented with the rate of the Dropout layer.

In [None]:
BATCH_SIZE = 32 # I tried many batch sizes but always came back to 32, because none gave a significant advantage
EPOCHS = 10 # more epochs only led to overfitting
LEARNING_RATE = 0.01 # two-stage training: the fine-tuning stage with the unfrozen base model uses one order of magnitude less
RETRAIN_LAYER = 35 # fewer layers yielded worse results, and more showed diminishing returns


## Implementation

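The final model puts a small classification head (global average pooling, dropout, and a 4-class softmax layer) on top of a pretrained ResNet152V2. It is trained in two stages: first only the head with the base model frozen, then again with the last `RETRAIN_LAYER` layers of the base model unfrozen and the learning rate reduced by a factor of 10.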


In [None]:
# pixel scaling to [-1, 1] is handled by resnet_v2.preprocess_input below, so norm_layer is not needed here

# define and use basemodel ResNet 152V2
base_model = keras.applications.ResNet152V2(
    weights='imagenet',
    include_top=False,
    input_shape=(IMG_SIZE[0], IMG_SIZE[1], 3)
)

# freeze the base model so its weights are not updated in the first training stage
base_model.trainable = False

# named `inputs` to avoid shadowing the Python builtin `input`
inputs = keras.Input(shape=(IMG_SIZE[0], IMG_SIZE[1], 3))
x = inputs
# scale pixel values to [-1, 1] as expected by ResNetV2
x = keras.applications.resnet_v2.preprocess_input(x)
# training=False keeps the BatchNormalization layers in inference mode
x = base_model(x, training=False)

# add global average pooling layer
x = keras.layers.GlobalAveragePooling2D()(x)

# add dropout layer
x = keras.layers.Dropout(0.2)(x)

# define output layer (4 classes)
outputs = keras.layers.Dense(4, activation='softmax')(x)

# define model input and output
model = keras.Model(inputs, outputs)
model.summary()

# compile model
opt = keras.optimizers.Adam(learning_rate=LEARNING_RATE)
model.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# train model (stage 1: only the new classification head is trainable)
history = model.fit(train_data, epochs=EPOCHS, validation_data=test_data, verbose=1)

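# unfreeze the last RETRAIN_LAYER layers of the base model, keeping BatchNormalization frozen
# the base model itself has to be trainable, otherwise its sub-layers contribute no trainable weights
base_model.trainable = True
for layer in base_model.layers[:-RETRAIN_LAYER]:
    layer.trainable = False
for layer in base_model.layers[-RETRAIN_LAYER:]:
    layer.trainable = not isinstance(layer, keras.layers.BatchNormalization)
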
model.summary()

# compile model
opt = keras.optimizers.Adam(learning_rate=LEARNING_RATE/10)
model.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# train model (stage 2: fine-tuning); note that this overwrites the stage 1 history
history = model.fit(train_data, epochs=EPOCHS, validation_data=test_data, verbose=1)

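The fine-tuning stage above takes the last `RETRAIN_LAYER` layers of the base model and leaves the BatchNormalization layers frozen. The sketch below mimics that selection on stand-in objects, so the slicing logic can be checked without loading the network. `FakeLayer`, `FakeBatchNorm`, and `select_layers_to_unfreeze` are hypothetical names for this illustration, not Keras APIs.

```python
class FakeLayer:
    """Stand-in for a Keras layer; it only carries a name."""

    def __init__(self, name):
        self.name = name


class FakeBatchNorm(FakeLayer):
    """Stand-in for keras.layers.BatchNormalization."""


def select_layers_to_unfreeze(layers, retrain_count, skip_type):
    """Return the last `retrain_count` layers, leaving out instances of `skip_type`."""
    return [layer for layer in layers[-retrain_count:] if not isinstance(layer, skip_type)]


# toy stack of six layers: conv_0, bn_0, conv_1, bn_1, conv_2, bn_2
layers = []
for i in range(3):
    layers.append(FakeLayer(f"conv_{i}"))
    layers.append(FakeBatchNorm(f"bn_{i}"))

selected = select_layers_to_unfreeze(layers, retrain_count=4, skip_type=FakeBatchNorm)
print([layer.name for layer in selected])  # ['conv_1', 'conv_2']
```

One detail worth knowing about the negative slice: a count of 0 would select every layer, because `layers[-0:]` is the full list.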

## Evaluation Metrics

Accuracy is the only really relevant metric for this task, so it is the only one I track.

The loss is still worth watching, because the validation loss shows when the model starts to overfit.

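One way to read overfitting from the loss curves is to look for the epoch where the validation loss is lowest and to watch the gap between validation and training loss. The following is a minimal sketch under the assumption that the history is a dict of per-epoch lists, as Keras returns it. The helper names and all numbers are made up for illustration and are not results from this project.

```python
def best_epoch(val_loss):
    """Return the 1-based epoch with the lowest validation loss."""
    return min(range(len(val_loss)), key=val_loss.__getitem__) + 1


def overfitting_gap(train_loss, val_loss):
    """Per-epoch difference val_loss - train_loss; a growing gap hints at overfitting."""
    return [v - t for t, v in zip(train_loss, val_loss)]


# made-up numbers shaped like history.history, NOT results from this project
example_history = {
    "loss": [1.20, 0.80, 0.55, 0.40, 0.30],
    "val_loss": [1.10, 0.85, 0.70, 0.75, 0.90],
}

print("best epoch:", best_epoch(example_history["val_loss"]))
print("gap per epoch:", overfitting_gap(example_history["loss"], example_history["val_loss"]))
```

In this toy example the training loss keeps falling while the validation loss turns upward after the third epoch, which is the pattern to look for.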

In [None]:
# save history (a dict, so it is pickled; load it with np.load('history.npy', allow_pickle=True).item())
np.save('history.npy', history.history)
print('History saved')
# plot training and validation accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.legend(['Train', 'Validation'], loc='upper left')

# save plot
plt.savefig('accuracy.png')
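# compare predictions with actual labels
# collect both in a single pass, because the dataset is reshuffled on every iteration
predictions, actual = [], []
for images, labels in test_data:
    predictions.append(np.argmax(model.predict_on_batch(images), axis=1))
    actual.append(labels.numpy())
predictions = np.concatenate(predictions)
actual = np.concatenate(actual)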

# calculate accuracy
accuracy = np.mean(predictions == actual)
print(f'Accuracy: {accuracy}')

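The accuracy calculation above turns each row of softmax outputs into a class index and counts how often that index matches the label. A small plain-Python sketch of the same idea follows; the function names and the probability values are invented for the example.

```python
def argmax(values):
    """Index of the largest entry in a sequence."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy(probabilities, labels):
    """Fraction of rows whose most probable class equals the label."""
    predicted = [argmax(row) for row in probabilities]
    correct = sum(p == y for p, y in zip(predicted, labels))
    return correct / len(labels)


# made-up softmax outputs for three images and four classes
probabilities = [
    [0.70, 0.10, 0.10, 0.10],  # most probable class: 0
    [0.05, 0.05, 0.80, 0.10],  # most probable class: 2
    [0.25, 0.40, 0.20, 0.15],  # most probable class: 1
]
labels = [0, 2, 3]  # the third image is misclassified

print(accuracy(probabilities, labels))  # two of three correct
```
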
## Comparative Analysis

Here I compare only the accuracy of the baseline with the accuracy of the model.

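A comparison of two accuracies can be expressed as an absolute gain in percentage points and, if desired, as the share of baseline errors that the model removes. The sketch below shows one possible way to compute both. The function name `compare_accuracy` is hypothetical and the two accuracy values are placeholders, not the measured results of the baseline or the model.

```python
def compare_accuracy(baseline_acc, model_acc):
    """Return (gain in percentage points, fraction of baseline errors removed)."""
    absolute_gain = (model_acc - baseline_acc) * 100
    baseline_error = 1 - baseline_acc
    if baseline_error > 0:
        error_reduction = (baseline_error - (1 - model_acc)) / baseline_error
    else:
        error_reduction = 0.0  # a perfect baseline leaves no errors to remove
    return absolute_gain, error_reduction


# placeholder accuracies, NOT results from this project
gain, reduction = compare_accuracy(baseline_acc=0.50, model_acc=0.75)
print(f"gain: {gain:.1f} percentage points, error reduction: {reduction:.0%}")
```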