**NOTE FOR REBECCA:**

This notebook contains only the modelling section. It is meant to be merged into the final notebook and does not run on its own: I have not included library imports, preprocessing steps, or code output. It only indicates the report text and where in the notebook it goes.


## Modelling 

Since this is an image classification task, the natural starting point is a convolutional neural network (CNN), the standard model family for computer vision tasks.

### CNN trained from scratch

We use a simple CNN trained from scratch as a baseline, to see how well a basic technique performs before trying methods to improve on it.

We use a model with four convolutional layers, each followed by a max-pooling layer. A dropout layer between the flattened features and the dense layers helps curb overfitting.

**Build and compile our simple CNN model:**

In [None]:
model = Sequential(name = 'Simple_CNN')
model.add(Conv2D(64, (3, 3), activation='relu', padding = 'same', input_shape=(300, 400, 1)))
model.add(MaxPooling2D((2, 2), padding = 'same'))
model.add(Conv2D(64, (3, 3), activation='relu', padding = 'same'))
model.add(MaxPooling2D((2, 2), padding = 'same'))
model.add(Conv2D(128, (3, 3), activation='relu', padding = 'same'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu', padding = 'same'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dropout(0.05))
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.summary()

model.compile(loss=tf.keras.losses.binary_crossentropy, metrics=['accuracy'])
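
To make the effect of the four pooling layers concrete, the sketch below traces the spatial dimensions of a 300x400 input through the network. It assumes the usual output-size rules for 2x2 pooling with stride 2 (round up with `padding='same'`, round down otherwise) and that the `'same'`-padded convolutions leave height and width unchanged. `pool_out` is a helper introduced here for illustration only; `model.summary()` remains the authoritative check.

```python
import math

def pool_out(size, pool=2, padding="valid"):
    """Output length of a stride-`pool` max-pooling layer along one axis."""
    if padding == "same":
        return math.ceil(size / pool)
    return size // pool  # "valid": a leftover row or column is dropped

height, width = 300, 400  # input_shape=(300, 400, 1)
pool_paddings = ["same", "same", "valid", "valid"]  # the four MaxPooling2D layers, in order
for padding in pool_paddings:
    height = pool_out(height, padding=padding)
    width = pool_out(width, padding=padding)

flattened = height * width * 128    # the last Conv2D layer has 128 filters
dense_params = flattened * 32 + 32  # weights plus biases of Dense(32)
print((height, width), flattened, dense_params)
```

Under these assumptions the feature map entering `Flatten` is 18x25x128, so most of the model's parameters sit in the first dense layer.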

**Fit the model:**

In [None]:
history = model.fit(
        data_train_augmented,
        y_train_augmented,
        epochs=6,
        batch_size = 16, 
        validation_data=(data_val, y_val))

We can see that the model reaches ~80% validation accuracy and takes <5 minutes to train on a GPU.

**Visualizing training history:**

In [None]:
plot_training_history(history)

As the training history plot shows, the model starts to overfit after epoch 5.
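
One way to read the onset of overfitting off the numbers rather than the plot is to find the epoch with the lowest validation loss. The sketch below assumes the usual Keras convention that `history.history` is a dictionary with `'loss'` and `'val_loss'` lists. The values are made up for illustration and are not results from this notebook, and `best_epoch` is a hypothetical helper.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_index + 1

# Made-up numbers for illustration only; NOT results from this notebook.
example_history = {
    "loss":     [0.69, 0.58, 0.50, 0.44, 0.39, 0.33],
    "val_loss": [0.66, 0.57, 0.51, 0.48, 0.46, 0.49],
}

epoch = best_epoch(example_history["val_loss"])
print(f"lowest validation loss at epoch {epoch}")
```

In the notebook this would be called as `best_epoch(history.history['val_loss'])`. Training loss that keeps falling after that epoch while validation loss rises is the pattern described above.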

**Plot training and validation ROCs:**

In [None]:
# Metrics for validation data (data_val is the held-out set passed to fit)
probs_val = model.predict(data_val)
preds_val = probs_val.reshape(-1)
fpr_val, tpr_val, threshold_val = metrics.roc_curve(y_val, preds_val)
roc_auc_val = metrics.auc(fpr_val, tpr_val)

# Metrics for training data (original, non-augmented images)
probs_train = model.predict(data_train)
preds_train = probs_train.reshape(-1)
fpr_train, tpr_train, threshold_train = metrics.roc_curve(y_train, preds_train)
roc_auc_train = metrics.auc(fpr_train, tpr_train)

# Plot both ROC curves against the chance diagonal
plt.title('ROC Curve')
plt.plot(fpr_val, tpr_val, 'b', label='Validation AUC = %0.2f' % roc_auc_val)
plt.plot(fpr_train, tpr_train, 'g', label='Train AUC = %0.2f' % roc_auc_train)
plt.plot([0, 1], [0, 1], 'r--')
plt.legend(loc='lower right')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()

# Training and validation AUCs
print('Validation AUC = %0.4f' % roc_auc_val)
print('Train AUC = %0.4f' % roc_auc_train)

The AUC values indicate that the model separates the two classes well.
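
As a reminder of what the AUC measures, it can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition on toy data. `pairwise_auc` is an illustrative helper, not the scikit-learn implementation, and its quadratic cost makes it suitable only for small examples.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels and scores, not data from this notebook.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(toy_labels, toy_scores))  # 3 of the 4 pairs are ordered correctly
```

A value of 0.5 corresponds to the red diagonal in the ROC plot (random ranking), and 1.0 to a perfect ranking.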

Naturally, we want to investigate further and visualise which features our model is picking up on.

**Visualizations:**

In [None]:
# insert visualization code and analysis

### Transfer learning using a pretrained model

Even after augmentation, our dataset is relatively small. This, together with our limits on compute and time, suggests building on top of a pretrained model. A major advantage of doing so is that the pretrained model has already learnt low-level features; we reuse these and train on top of them so that they can be adapted to our task. Features such as basic lines and shapes are useful for both tasks, even though the datasets are not very similar.

[VGG](https://www.robots.ox.ac.uk/~vgg/research/very_deep/) is a standard architecture with weights pretrained on [ImageNet](http://www.image-net.org/) data. It is built into Keras (we use the VGG16 variant) and can be loaded and used easily.

**Load VGG and concatenate input images into 3 channels:**

In [None]:
img_input = Input(shape=(300, 400, 1))
model = VGG16(weights="imagenet", include_top=False, input_tensor=Concatenate()([img_input, img_input, img_input]))

VGG takes RGB images (3 channels) and our images are greyscale (1 channel). Hence, we replicate our single channel three times to create an input of the correct shape.
Additionally, we use `include_top=False` to keep only the convolutional base of the model and drop the fully connected classification layers at the top. This is required so that we can attach our own classification layers for our task.
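
The channel replication can be checked on a toy array. The sketch below uses NumPy rather than Keras and a made-up 4x5 image size instead of 300x400. It assumes, as is the Keras default, that `Concatenate()` joins its inputs along the last (channel) axis.

```python
import numpy as np

# Toy batch: 2 greyscale images of 4x5 pixels with a single channel.
grey = np.arange(2 * 4 * 5, dtype=np.float32).reshape(2, 4, 5, 1)

# NumPy analogue of Concatenate()([x, x, x]) along the channel axis.
rgb_like = np.concatenate([grey, grey, grey], axis=-1)

print(grey.shape, "->", rgb_like.shape)
```

All three output channels are identical copies of the greyscale channel, so no information is added; the only purpose is to match the input shape the pretrained weights expect.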

**Add required final layers and freeze convolutional layers:**

We add a global average pooling layer and a final classification layer that distinguishes our two classes. These, together with the VGG convolutional blocks, are combined into a new model. We freeze every convolutional block except the last, so that the higher-level features can adapt to our new dataset without retraining the low-level features.

In [None]:
# add a global spatial average pooling layer and a dense layer to classify 2 classes
x = model.output
x = GlobalAveragePooling2D()(x)
predictions = Dense(2, activation='softmax')(x)

# new model to train
new_model = Model(inputs=model.input, outputs=predictions, name = "VGG_Pretrain")

# freeze every layer before the last convolutional block; train only that block
# (the two slices now meet at index 16, so no layer is left out)
for layer in model.layers[:16]:
    layer.trainable = False
for layer in model.layers[16:]:
    layer.trainable = True

new_model.summary()
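
To see what splitting the layer list at index 16 corresponds to, the sketch below rebuilds the expected list of layer names. It assumes the model's layers are the `Input` and `Concatenate` layers from the earlier cell followed by the standard VGG16 convolutional base (blocks with 2, 2, 3, 3 and 3 convolutions, each ending in a pooling layer). The short names `input` and `concatenate` are placeholders; the real names and ordering should be checked against `new_model.summary()`.

```python
# Assumed layer order; verify against the summary printed above.
layer_names = ["input", "concatenate"]
for block, n_convs in enumerate([2, 2, 3, 3, 3], start=1):
    layer_names += [f"block{block}_conv{i}" for i in range(1, n_convs + 1)]
    layer_names.append(f"block{block}_pool")

frozen, trainable = layer_names[:16], layer_names[16:]
print("last frozen layer:", frozen[-1])
print("trainable layers: ", trainable)
```

Under this assumption, index 16 is the first convolution of the fifth block, so everything up to and including the fourth block's pooling layer is frozen.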

**Compile and train the model:**

We use a small learning rate so that fine-tuning the unfrozen block adjusts its pretrained weights gradually instead of overwriting the features it has already learnt.

In [None]:
sgd = SGD(learning_rate=5e-4)  # `lr` is the deprecated name for this argument
new_model.compile(optimizer=sgd, loss=tf.keras.losses.binary_crossentropy, metrics=['accuracy'])

history = new_model.fit(
        data_train_augmented,
        tf.keras.utils.to_categorical(y_train_augmented),
        epochs=5,
        batch_size = 16, 
        validation_data=(data_val, tf.keras.utils.to_categorical(y_val)))
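
The cell above pairs a two-unit softmax output with a binary cross-entropy loss and one-hot labels. The sketch below suggests why this behaves like the more common categorical cross-entropy here: when the two outputs sum to 1, the per-sample losses coincide. The functions are plain-Python stand-ins written for illustration, not the Keras implementations (which also clip predictions for numerical stability).

```python
import math

def to_one_hot(label, num_classes=2):
    """Plain-Python analogue of to_categorical for a single integer label."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def binary_crossentropy(y_true, y_pred):
    """Element-wise binary cross-entropy, averaged over the output units."""
    terms = [-(t * math.log(p) + (1 - t) * math.log(1 - p))
             for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)

def categorical_crossentropy(y_true, y_pred):
    """Cross-entropy between a one-hot target and a probability vector."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

y_true = to_one_hot(1)  # [0.0, 1.0]
y_pred = [0.3, 0.7]     # a softmax-style output: the two entries sum to 1
bce = binary_crossentropy(y_true, y_pred)
cce = categorical_crossentropy(y_true, y_pred)
print(bce, cce)
```

An alternative with the same effect would be a single sigmoid output with integer labels, as in the baseline CNN.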