# Simple Binary Classifier using Keras

## SIIM Github Repo
https://github.com/ImagingInformatics/machine-learning/blob/master/Education/KerasBinaryClassifier/SIIM_Keras_Binary_Classifier.ipynb

### Task 

Train a deep learning model to classify sagittal T1 MRI sequences into pre- or post-contrast.

### Requirements

1. Basic understanding of machine learning and deep learning
2. Programming in Python

### Learning objectives

At the end of this activity, you will be able to:

1. Organize your data so it can be used to train a deep learning model
2. Use the standard data handler from Keras (`ImageDataGenerator`) to feed your dataset to a model during training
3. Create a custom convolutional neural network
4. Train a model
5. Calculate metrics on the validation and test sets


### Acknowledgements

This Jupyter Notebook was based on code by Paulo Eduardo de Aguiar Kuriki (paulokuriki@gmail.com), modified by Felipe Kitamura (kitamura.felipe@gmail.com).

---
# TODO: install gdown and restart runtime

Run the cell below. Then, depending on whether you are running this in a Jupyter notebook or on Google Colab, follow these instructions:

**Jupyter** - click on the 'Kernel' tab at the top of the notebook and choose 'Restart & Clear Output'. Rerun all the cells from the start of the notebook to this point before moving on.

**Colab** - click on the 'Runtime' tab at the top of the notebook and choose 'Restart Runtime'. Rerun all the cells from the start of the notebook to this point before moving on.

These steps ensure that the installation takes effect. We will use gdown to download the data set shortly.

In [None]:
!pip install -U --no-cache-dir gdown --pre

# TODO: Setting your Team Name

In [None]:
# Enter your Team Name below
team = # TODO: choose a team name

In [None]:
print("Your Team Name is:", team)

## Dataset Format

First of all, we need to split our images into training, validation and test sets.

For this task, we have a separate test folder and a training folder. The training folder will be split into training and validation by our code.

The files are organized in the following folder structure:

#### Train/with_gad/ - contains the files of sequences with contrast

#### Train/no_gad/ - contains the files of sequences without contrast

#### Test/with_gad/ - contains the files of sequences with contrast

#### Test/no_gad/ - contains the files of sequences without contrast



The first thing we need to do is download the dataset. **This will take a few minutes. Read ahead while you wait.**

Then we unzip our dataset.

In [None]:
import os
import gdown

if not os.path.exists('Train.zip'): # check if already downloaded
    gdown.download('https://drive.google.com/uc?id=1rffWXRBaePSo7JMwJm1ygDdpgs7FZuTi', 'Train.zip', quiet=False)
    !unzip Train.zip
if not os.path.exists('Test.zip'):
    gdown.download('https://drive.google.com/uc?id=1x4LTeyPgLNndsP8w0rtn-pCV0LzOVc1H', 'Test.zip', quiet=False)
    !unzip Test.zip

## If the automatic download fails, you can manually grab the data sets from these links:

https://drive.google.com/file/d/1rffWXRBaePSo7JMwJm1ygDdpgs7FZuTi/view?usp=sharing

https://drive.google.com/file/d/1x4LTeyPgLNndsP8w0rtn-pCV0LzOVc1H/view?usp=sharing

If you manually downloaded the data sets, uncomment the two lines below to unzip them.

In [None]:
# !unzip Train.zip
# !unzip Test.zip
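
Once both archives are unpacked, a quick check that the four class folders exist can save a confusing error later on. The sketch below is not part of the original notebook: the helper name `missing_class_folders` is hypothetical, and it only assumes the folder layout listed in the Dataset Format section.

```python
import os

# Folder layout described in this notebook: <split>/<class>/
EXPECTED_SPLITS = ("Train", "Test")
EXPECTED_CLASSES = ("with_gad", "no_gad")

def missing_class_folders(root="."):
    """Return the expected class folders that are not present under root."""
    missing = []
    for split in EXPECTED_SPLITS:
        for cls in EXPECTED_CLASSES:
            path = os.path.join(root, split, cls)
            if not os.path.isdir(path):
                missing.append(path)
    return missing

# An empty list means all four folders were found.
print("Missing folders:", missing_class_folders("."))
```

If the list is not empty, the download or unzip step probably did not complete.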

## Importing the libraries we will need

In [None]:
import sys
import requests
import itertools

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

from tensorflow.keras.models import Model, load_model
from tensorflow.keras.layers import Dense, Dropout, Conv2D, MaxPooling2D, Input
from tensorflow.keras.layers import Flatten
from tensorflow.keras.optimizers import SGD, Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.activations import sigmoid
from sklearn.metrics import confusion_matrix
from sklearn.metrics import auc
from sklearn.metrics import roc_curve
from sklearn.metrics import accuracy_score

def sub_model(team, hyperparam):
    # Submit the team's hyperparameters and metric to the course leaderboard
    url = 'https://aihc5010.pythonanywhere.com/submit-model/'
    hyperparam['team'] = team
    hyperparam['ModelKey'] = 'lab2'
    x = requests.post(url, data=hyperparam)
    if x.status_code == 200:
        print(f"Model Submitted Successfully for team {team}")
    else:
        print(x.status_code)
        print(x.text)
        print("Failed to Submit")

In [None]:
# Here we define the folders where our dataset is saved.
TRAIN_DATA_DIR = './Train/'
TEST_DATA_DIR = './Test/'

## Visualise some training set examples

### with contrast

In [None]:
fig = plt.figure(figsize=(20,20))
for i,f in enumerate(os.listdir(os.path.join(TRAIN_DATA_DIR, 'with_gad'))[:9]):
    # make sure the file is a .png image
    if not f.endswith('.png'):
        continue
    img = mpimg.imread(os.path.join(TRAIN_DATA_DIR, 'with_gad', f))
    ax = fig.add_subplot(3,3,i+1)
    ax.imshow(img, cmap='gray')
plt.show()

### without contrast

In [None]:
fig = plt.figure(figsize=(20,20))
for i,f in enumerate(os.listdir(os.path.join(TRAIN_DATA_DIR, 'no_gad'))[:9]):
    # make sure the file is a .png image
    if not f.endswith('.png'):
        continue
    img = mpimg.imread(os.path.join(TRAIN_DATA_DIR, 'no_gad', f))
    ax = fig.add_subplot(3,3,i+1)
    ax.imshow(img, cmap='gray')
plt.show()

## TODO: Number of training and test set examples of each class

#### Verify the number of examples available for the training and test sets. Take as many lines of code as you need. Hint: check the visualisation cells above for an example of iterating over all files in a directory that end with a .png extension.

In [None]:
n_training_with_gad =  # TODO: calculate the number of training examples with contrast
print('Number of training examples with contrast: {}'.format(n_training_with_gad))
n_training_no_gad =  # TODO: calculate the number of training examples without contrast
print('Number of training examples without contrast: {}'.format(n_training_no_gad))
n_training = n_training_with_gad + n_training_no_gad
print('Number of training examples : {}\n'.format(n_training))

n_test_with_gad =  # TODO: calculate the number of test examples with contrast
print('Number of test examples with contrast: {}'.format(n_test_with_gad))
n_test_no_gad =  # TODO: calculate the number of test examples without contrast
print('Number of test examples without contrast: {}'.format(n_test_no_gad))
n_test = n_test_with_gad + n_test_no_gad
print('Number of test examples : {}'.format(n_test))
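
One possible way to count the images in a folder is sketched below. This is only an illustration, not the intended solution: the helper name `count_png_files` is hypothetical, and it assumes, as in the visualisation cells, that every image of interest has a `.png` extension.

```python
import os

def count_png_files(folder):
    """Count the files in `folder` whose names end with '.png'."""
    return sum(1 for name in os.listdir(folder) if name.endswith('.png'))
```

It could then be called once per class folder, for example `count_png_files(os.path.join(TRAIN_DATA_DIR, 'with_gad'))`.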

## Making some choices

From the visualisations above we can see that the images come in different sizes, mostly either 300-by-300 or 375-by-375 pixels.

We will resize the images to 128-by-128. We will see how the images are resized later.

In [None]:
# Here we define the input size of our neural network.
# Images will be resized automatically.
IMG_HEIGHT = 128
IMG_WIDTH = 128

Let's also choose a batch size for training.

In [None]:
# Here we define the batch size: the number of images processed together
# in one training step, which lets the GPU work on them in parallel.
BATCH_SIZE = 8

# TODO: Training steps per epoch exercise

#### Calculate how many steps per epoch there will be during training. Use your chosen batch size and assume that *80% of the training set* will be used for training and 20% for validation. We will verify this later.

In [None]:
steps_per_epoch =  # TODO: given your choice of batch size calculate the number of training steps per epoch to expect
print('Steps per epoch : {}'.format(steps_per_epoch))
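
The arithmetic behind this exercise can be sketched as follows. The class counts used here are made up for illustration and are not the real dataset sizes. The sketch also assumes that Keras applies the validation split separately to each class folder and rounds the validation share down; check the result against the "Found N images" lines printed when the generators are created.

```python
import math

def expected_steps_per_epoch(class_counts, batch_size, validation_split=0.2):
    """Estimate training steps per epoch from per-class image counts."""
    n_train = 0
    for n in class_counts:
        n_val = int(n * validation_split)  # validation share, rounded down (assumption)
        n_train += n - n_val               # the remainder is used for training
    # The last batch may be smaller than batch_size, so round up.
    return math.ceil(n_train / batch_size)

# Hypothetical counts: 503 images in one class, 497 in the other.
print(expected_steps_per_epoch([503, 497], batch_size=8))  # 101
```

Here 801 images remain for training, and 801 / 8 rounds up to 101 steps because the final batch is only partly filled.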

## Defining a function to plot the learning curves

In [None]:
# This function plots the learning curves, which includes
# the loss curve and the accuracy curve over the epochs.
def plot_learning_curves(history):
    # plot loss
    plt.figure(figsize=(10, 5))
    plt.title('Binary Cross Entropy Loss')
    plt.plot(history.history['loss'], color='blue', label='Train')
    plt.plot(history.history['val_loss'], color='orange', label='Validation')
    plt.legend(loc='upper right', shadow=True, fontsize='x-large')
    plt.show()
    # plot accuracy
    plt.figure(figsize=(10, 5))
    plt.title('Binary Classification Accuracy')
    plt.plot(history.history['accuracy'], color='blue', label='Train')
    plt.plot(history.history['val_accuracy'], color='orange', label='Validation')
    plt.legend(loc='lower right', shadow=True, fontsize='x-large')
    plt.show()

## Creating a custom convolutional neural network

In the cell below we have provided an example CNN.

In [None]:
# Every model needs to have an input
x1 = Input(shape=(IMG_HEIGHT,IMG_WIDTH,3))

# Below, we add paired convolutional and maxpooling layers
x = Conv2D(16, (3,3), activation='relu')(x1)
x = MaxPooling2D()(x)
x = Conv2D(16, (3,3), activation='relu')(x)
x = MaxPooling2D()(x)

# Then we flatten the last vector
flat1 = Flatten()(x)
# Insert a dropout layer with 20% probability
flat2 = Dropout(0.2)(flat1)
# Then a dense layer
class1 = Dense(32, activation='relu', kernel_initializer='he_uniform')(flat2)
# And the output layer
class1b = Dense(1, activation='linear')(class1)
# The output needs to be binary, so we apply a sigmoid function
output = sigmoid(class1b)

# Here is where the model is created based on the input and output defined above
model = Model(inputs=x1, outputs=output)

# We choose an optimizer
opt = Adam(learning_rate=5e-6)  # `lr` is a deprecated alias for `learning_rate`

# The last step is to compile the model
model.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])

# We can see the structure and number of parameters of our network
# by calling .summary()
model.summary()
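
To make sense of the numbers reported by `model.summary()`, the sketch below reproduces the shape and parameter arithmetic for the example network by hand. It is plain Python rather than Keras code, the helper names are hypothetical, and it assumes the Keras defaults used above: 'valid' padding with stride 1 for `Conv2D` and a 2-by-2 window for `MaxPooling2D`.

```python
def conv_output_size(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1

def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus one bias per filter."""
    return kernel * kernel * in_channels * out_channels + out_channels

def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out

size, channels = 128, 3  # IMG_HEIGHT / IMG_WIDTH and RGB input
total = 0
for filters in (16, 16):  # the two Conv2D + MaxPooling2D pairs
    total += conv_params(channels, filters)
    size = conv_output_size(size) // 2  # 2x2 pooling halves the size, rounding down
    channels = filters

flat = size * size * channels  # length of the flattened vector
total += dense_params(flat, 32) + dense_params(32, 1)
print(flat, total)  # 14400 463633
```

Almost all of the parameters sit in the first dense layer, which is why the size of the flattened vector matters so much for the model size.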

## Now we create a Data Generator, which is a function to read the images from the folders and use them to train and validate

In [None]:
train_datagen = ImageDataGenerator(rescale=1. / 255,
                                   validation_split=0.2)  # set validation split

train_it = train_datagen.flow_from_directory(
    TRAIN_DATA_DIR,
    target_size=(IMG_HEIGHT, IMG_WIDTH), # Here is where we use the image dimensions you chose above
    batch_size=BATCH_SIZE,  # Here is where we use the batch size you chose above
    class_mode='binary',
    color_mode='rgb',
    subset='training')  # set as training data

val_it = train_datagen.flow_from_directory(
    TRAIN_DATA_DIR,  # same directory as training data
    target_size=(IMG_HEIGHT, IMG_WIDTH),  # Here is where we use the image dimensions you chose above
    batch_size=BATCH_SIZE,  # Here is where we use the batch size you chose above
    class_mode='binary',
    color_mode='rgb',
    subset='validation')  # set as validation data
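
Two details of the generator are easy to miss: how folder names become labels, and what `rescale` does to the pixel values. The sketch below illustrates both in plain Python. It assumes that class indices are assigned in alphanumeric order of the folder names, which is the documented behaviour when no explicit `classes` argument is given.

```python
# Folder names are sorted, then numbered from 0.
class_names = sorted(['with_gad', 'no_gad'])
class_indices = {name: idx for idx, name in enumerate(class_names)}
print(class_indices)  # {'no_gad': 0, 'with_gad': 1}

def rescale(pixel, factor=1.0 / 255):
    """Map an 8-bit pixel value (0 to 255) into the range 0 to 1."""
    return pixel * factor

print(rescale(0), rescale(128), rescale(255))
```

In the notebook itself, the mapping actually used can be inspected with `train_it.class_indices`.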

## Show time. This is the moment we train our network

As the model trains, verify that the number of training steps per epoch matches your calculation above.

In [None]:
# The .fit() method is used to train our network
# You can specify here the number of epochs
history = model.fit(train_it, steps_per_epoch=len(train_it), 
                              validation_data=val_it, validation_steps=len(val_it), 
                              epochs=15, verbose=1)

plot_learning_curves(history)

## TODO: Interpreting the learning curves.

#### What problems can we identify with this model? What changes could you make to remedy these? Hints: Are we in the high-bias or high-variance regime? What about the rate of convergence?

In [None]:
# TODO: convert this cell to markdown and replace this comment with your answer.

## TODO: Interpreting the differences between the loss and accuracy curves.

#### What difference do you notice between the loss and accuracy curves? How can you explain any differences? Hint: What is the difference in what each is measuring?

In [None]:
# TODO: convert this cell to markdown and replace this comment with your answer.

## We can save our trained model in a file so we can restore it to be used later

In [None]:
# Run this cell to save your model to a file.

model.save('SimpleGadClass.h5')

## The following line allows us to read the model we trained

In [None]:
# Make sure the file name you read from matches the one you saved

model = load_model('SimpleGadClass.h5')

## Here we predict the validation set so we can use both predictions and ground truth to calculate the performance metrics

In [None]:
def get_labels_and_preditions(data_iterator, model):
    i=0
    y_true = []
    y_pred = []
    x_ = []
    
    for x, y in data_iterator:
        y_true.extend(y)
        y_pred.extend(model.predict(x))
        x_.extend(x)
        i+=1
        if i==len(data_iterator):
            break
    
    y_pred = np.asarray(y_pred)
    x_ = np.asarray(x_)
    return y_true, y_pred, x_
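
The explicit counter and `break` in the function above are needed because the Keras directory iterators keep yielding batches indefinitely. The sketch below shows the same idea on a hypothetical stand-in generator (`endless_batches` is not a Keras function), using `itertools.islice` to take exactly one pass.

```python
import itertools

def endless_batches(batches):
    """Hypothetical stand-in for a Keras iterator: cycles over the batches forever."""
    while True:
        for batch in batches:
            yield batch

batches = [[1, 2], [3, 4], [5]]

# Take exactly len(batches) items, i.e. one full pass over the data.
one_pass = list(itertools.islice(endless_batches(batches), len(batches)))
print(one_pass)  # [[1, 2], [3, 4], [5]]
```

Without a stopping condition, a plain `for` loop over such an iterator would never finish.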

In [None]:
val_it = train_datagen.flow_from_directory(
    TRAIN_DATA_DIR,  # same directory as training data
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    color_mode='rgb',
    subset='validation')  # set as validation data

In [None]:
y_true, y_pred, _ = get_labels_and_preditions(val_it, model)

## Now we plot the ROC curve for the validation set

In [None]:
fpr_keras, tpr_keras, thresholds_keras = roc_curve(y_true, y_pred)

auc_keras = auc(fpr_keras, tpr_keras)

plt.figure(figsize=(10,7))
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr_keras, tpr_keras, label='AUC (area = {:.5f})'.format(auc_keras), color='orange')
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.legend(loc='best')
plt.show()
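
As an aid to interpreting the AUC value, the sketch below computes it in a different but equivalent way: as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The function name `pairwise_auc` and the toy scores are illustrative only; the notebook itself relies on `roc_curve` and `auc` from scikit-learn.

```python
def pairwise_auc(y_true, y_score):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
print(pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Note that no decision threshold appears anywhere in this calculation: AUC depends only on how the scores are ranked.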

## TODO: Why might AUC be better than accuracy as a metric for this data set?

In [None]:
# TODO: convert this cell to markdown and replace this comment with your answer.

## Now we plot the confusion matrix for the validation set with a decision threshold of 0.5

In [None]:
thresh = 0.5
cm = confusion_matrix(y_true, y_pred > thresh)

plt.figure(figsize=(7,7))
plt.imshow(cm, cmap=plt.cm.Blues)
plt.xlabel("Predicted labels")
plt.ylabel("True labels")
for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
    plt.text(j, i, cm[i, j],
             horizontalalignment="center",
             color="white" if cm[i, j] > 120 else "black", size='x-large')
plt.xticks([], [])
plt.yticks([], [])
plt.title('Confusion matrix ')
plt.colorbar()
plt.show()
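
Because the plot above hides the tick labels, it helps to know the layout scikit-learn uses: rows are true labels and columns are predicted labels, so the four cells read `[[TN, FP], [FN, TP]]`. The sketch below rebuilds that table by hand on made-up predictions; `confusion_counts` is a hypothetical helper, not a library function.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return [[TN, FP], [FN, TP]] for binary labels and predicted probabilities."""
    cm = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_prob):
        pred = 1 if p > threshold else 0  # same rule as `y_pred > thresh`
        cm[int(t)][pred] += 1             # row = true label, column = prediction
    return cm

# Made-up example: one false positive (0.7) and one false negative (0.4).
print(confusion_counts([0, 0, 1, 1, 1], [0.2, 0.7, 0.4, 0.9, 0.6]))  # [[1, 1], [1, 2]]
```

Raising the threshold moves cases from the right-hand column to the left-hand one, trading false positives for false negatives.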

## Now we find the threshold with the best validation set accuracy

In [None]:
thr_list = []
acc_list = []
for _th in range(100):
    _th = _th / 100.
    thr_list.append(_th)
    acc_list.append(accuracy_score(y_true, y_pred > _th))

plt.figure()
plt.plot(thr_list,acc_list)
plt.plot(thr_list[acc_list.index(max(acc_list))], max(acc_list), 'r+')

plt.show()

thresh = thr_list[acc_list.index(max(acc_list))]

## TODO: Why do we select the threshold on the validation set and not the test set?

In [None]:
# TODO: convert this cell to markdown and replace this comment with your answer.

## Now we plot the confusion matrix for the validation set with the decision threshold we just calculated

In [None]:
cm = confusion_matrix(y_true, y_pred > thresh)

plt.figure(figsize=(7,7))
plt.imshow(cm, cmap=plt.cm.Blues)
plt.xlabel("Predicted labels")
plt.ylabel("True labels")
for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
    plt.text(j, i, cm[i, j],
             horizontalalignment="center",
             color="white" if cm[i, j] > 120 else "black", size='x-large')
plt.xticks([], [])
plt.yticks([], [])
plt.title('Confusion matrix ')
plt.colorbar()
plt.show()

## Here we predict the test set so we can use both predictions and ground truth to calculate the performance metrics

In [None]:
test_datagen = ImageDataGenerator(rescale=1. / 255)

test_it = test_datagen.flow_from_directory(
    TEST_DATA_DIR,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    color_mode='rgb')  # no subset: use the whole test folder

In [None]:
y_true, y_pred, x_test = get_labels_and_preditions(test_it, model)

## Now we plot the ROC curve for the test set

In [None]:
fpr_keras, tpr_keras, thresholds_keras = roc_curve(y_true, y_pred)

auc_keras = auc(fpr_keras, tpr_keras)

plt.figure(figsize=(10,7))
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr_keras, tpr_keras, label='AUC (area = {:.5f})'.format(auc_keras), color='orange')
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.legend(loc='best')
plt.show()

## Now we plot the confusion matrix for the test set

In [None]:
cm = confusion_matrix(y_true, y_pred > thresh) # apply the threshold we selected using the validation set

plt.figure(figsize=(7,7))
plt.imshow(cm, cmap=plt.cm.Blues)
plt.xlabel("Predicted labels")
plt.ylabel("True labels")
for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
    plt.text(j, i, cm[i, j],
             horizontalalignment="center",
             color="white" if cm[i, j] > 120 else "black", size='x-large')
plt.xticks([], [])
plt.yticks([], [])
plt.title('Confusion matrix ')
plt.colorbar()
plt.show()

## Finally, we plot a sample of the cases that our model predicted incorrectly so we can understand the errors and try to come up with solutions

In [None]:
limit=30
counter = 0
for i in range(len(y_true)):
    if counter >= limit:
        break
    if y_true[i] != 1. * (y_pred[i, 0] > thresh):
        print('Truth:' + str(y_true[i]))
        print('Pred:' + str(y_pred[i, 0]))
        print(i)
        plt.figure(figsize=(7,7))
        plt.imshow(x_test[i])
        plt.show()
        counter += 1

## TODO: Can you see any reasons why the model might have failed on some images? As a reminder, a true label of 0 means no contrast and a true label of 1 means with contrast.

In [None]:
# TODO: convert this cell to markdown and replace this comment with your answer.

# Now it's your turn!

## TODO: Experiment with the model architecture and training strategy to optimize performance.

* You can read the documentation for the layers we use here https://tensorflow.org/api_docs/python/tf/keras/layers. You could also think about incorporating batch-norm layers, for example.
* For the ImageDataGenerator we have added some examples of additional arguments (currently commented out) that can be used for data augmentation. You can read more here https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator
* Options for alternate optimizers are here https://www.tensorflow.org/api_docs/python/tf/keras/optimizers

## See how well your modifications compare on the leaderboard here:

### https://AIHC5010.pythonanywhere.com/leaderboard-classification

In [None]:
##########################
### BUILDING THE MODEL ###
##########################

# TODO: Experiment with the hyperparameters

hyperparam = {
    'LearningRate': 5e-4,
    'BatchSize': 32,
    'Epochs': 15,
    'ImageSize': 331,
    'Dropout': 0.2
}

# Every model needs to have an input
IMG_HEIGHT = hyperparam['ImageSize']
IMG_WIDTH = hyperparam['ImageSize']
BATCH_SIZE = hyperparam['BatchSize']

x1 = Input(shape=(IMG_HEIGHT,IMG_WIDTH,3))

# TODO: Experiment with the model architecture. Add or remove layers.

# Below, we add paired convolutional and maxpooling layers
x = Conv2D(16, (3,3), activation='relu')(x1)
x = MaxPooling2D()(x)
x = Conv2D(16, (3,3), activation='relu')(x)
x = MaxPooling2D()(x)
x = Conv2D(32, (3,3), activation='relu')(x)
x = MaxPooling2D()(x)
x = Conv2D(32, (3,3), activation='relu')(x)
x = MaxPooling2D()(x)
x = Conv2D(64, (3,3), activation='relu')(x)
x = MaxPooling2D()(x)
x = Conv2D(64, (3,3), activation='relu')(x)
x = MaxPooling2D()(x)

# Then we flatten the last vector
flat1 = Flatten()(x)
# Insert a dropout layer with the rate set in hyperparam['Dropout']
flat2 = Dropout(hyperparam['Dropout'])(flat1)
# Then a dense layer
class1 = Dense(64, activation='relu', kernel_initializer='he_uniform')(flat2)
# And the output layer
class1b = Dense(1, activation='linear')(class1)
# The output needs to be binary, so we apply a sigmoid function
output = sigmoid(class1b)

# Here is where the model is created based on the input and output defined above
model = Model(inputs=x1, outputs=output)
# We can see the structure and number of parameters of our network
# by calling .summary()
model.summary()


######################################
### DEFINING THE TRAINING STRATEGY ###
######################################

# TODO: Experiment with the training strategy. Choose data augmentation options or alter the optimizer.

# We choose an optimizer
opt = Adam(learning_rate=hyperparam['LearningRate'])  # `lr` is a deprecated alias

# The last step is to compile the model
model.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])

train_datagen = ImageDataGenerator(rescale=1. / 255,
                                   #shear_range=0.2,
                                   #zoom_range=0.2,
                                   #horizontal_flip=False,
                                   #vertical_flip=True,
                                   #rotation_range=0,
                                   #fill_mode='constant',
                                   #cval=0,
                                   #preprocessing_function=preprocess_input,
                                   validation_split=0.2)  # set validation split

train_it = train_datagen.flow_from_directory(
    TRAIN_DATA_DIR,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    color_mode='rgb',
    subset='training')  # set as training data

val_it = train_datagen.flow_from_directory(
    TRAIN_DATA_DIR,  # same directory as training data
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    color_mode='rgb',
    subset='validation')  # set as validation data

history = model.fit(train_it, steps_per_epoch=len(train_it), 
                              validation_data=val_it, validation_steps=len(val_it), 
                              epochs=hyperparam['Epochs'], verbose=1)

plot_learning_curves(history)


######################
### SAVE THE MODEL ###
######################

model.save('SimpleGadClass_vesion2.h5')

## Evaluate the model and submit to the leaderboard.

In [None]:
##############################
### LOAD THE TRAINED MODEL ###
##############################

model = load_model('SimpleGadClass_vesion2.h5')


############################################
### CALCULATE VALIDATION SET PERFORMANCE ###
############################################

print('############################################')
print('### CALCULATE VALIDATION SET PERFORMANCE ###')
print('############################################')

val_it = train_datagen.flow_from_directory(
    TRAIN_DATA_DIR,  # same directory as training data
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    color_mode='rgb',
    subset='validation')  # set as validation data

y_true, y_pred, _ = get_labels_and_preditions(val_it, model)

fpr_keras, tpr_keras, thresholds_keras = roc_curve(y_true, y_pred)

auc_keras = auc(fpr_keras, tpr_keras)

plt.figure(figsize=(10,7))
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr_keras, tpr_keras, label='AUC (area = {:.5f})'.format(auc_keras), color='orange')
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.legend(loc='best')
plt.show()

thresh = 0.5
cm = confusion_matrix(y_true, y_pred > thresh)

plt.figure(figsize=(7,7))
plt.imshow(cm, cmap=plt.cm.Blues)
plt.xlabel("Predicted labels")
plt.ylabel("True labels")
for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
    plt.text(j, i, cm[i, j],
             horizontalalignment="center",
             color="white" if cm[i, j] > 120 else "black", size='x-large')
plt.xticks([], [])
plt.yticks([], [])
plt.title('Confusion matrix threshold={:.3f}'.format(thresh))
plt.colorbar()
plt.show()

thr_list = []
acc_list = []
for _th in range(100):
    _th = _th / 100.
    thr_list.append(_th)
    acc_list.append(accuracy_score(y_true, y_pred > _th))

plt.figure()
plt.plot(thr_list,acc_list)
plt.plot(thr_list[acc_list.index(max(acc_list))], max(acc_list), 'r+')
plt.show()

thresh = thr_list[acc_list.index(max(acc_list))]

cm = confusion_matrix(y_true, y_pred > thresh)

plt.figure(figsize=(7,7))
plt.imshow(cm, cmap=plt.cm.Blues)
plt.xlabel("Predicted labels")
plt.ylabel("True labels")
for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
    plt.text(j, i, cm[i, j],
             horizontalalignment="center",
             color="white" if cm[i, j] > 120 else "black", size='x-large')
plt.xticks([], [])
plt.yticks([], [])
plt.title('Confusion matrix threshold={:.3f}'.format(thresh))
plt.colorbar()
plt.show()

######################################
### CALCULATE TEST SET PERFORMANCE ###
######################################

print('######################################')
print('### CALCULATE TEST SET PERFORMANCE ###')
print('######################################')

test_datagen = ImageDataGenerator(rescale=1. / 255)

test_it = test_datagen.flow_from_directory(
    TEST_DATA_DIR,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    color_mode='rgb')  # no subset: use the whole test folder

y_true, y_pred, x_test = get_labels_and_preditions(test_it, model)

fpr_keras, tpr_keras, thresholds_keras = roc_curve(y_true, y_pred)

auc_keras = auc(fpr_keras, tpr_keras)

plt.figure(figsize=(10,7))
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr_keras, tpr_keras, label='AUC (area = {:.5f})'.format(auc_keras), color='orange')
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.legend(loc='best')
plt.show()

cm = confusion_matrix(y_true, y_pred > thresh)

plt.figure(figsize=(7,7))
plt.imshow(cm, cmap=plt.cm.Blues)
plt.xlabel("Predicted labels")
plt.ylabel("True labels")
for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
    plt.text(j, i, cm[i, j],
             horizontalalignment="center",
             color="white" if cm[i, j] > 120 else "black", size='x-large')
plt.xticks([], [])
plt.yticks([], [])
plt.title('Confusion matrix threshold={:.3f}'.format(thresh))
plt.colorbar()
plt.show()

for i in range(len(y_true)):
    if y_true[i] != 1. * (y_pred[i, 0] > thresh):
        print('Truth:' + str(y_true[i]))
        print('Pred:' + str(y_pred[i, 0]))
        print(i)
        plt.figure(figsize=(7,7))
        plt.imshow(x_test[i])
        plt.show()

########################
### SUBMIT THE MODEL ###
########################

hyperparam['metric'] = auc_keras
sub_model(team, hyperparam)