# Lab Six - Convolutional Network Architectures
Amory Weinzierl, Fidelia Nawar, and Hayden Center

In this lab, you will select a prediction task to perform on your dataset, evaluate a deep learning architecture and tune hyper-parameters. If any part of the assignment is not clear, ask the instructor to clarify. 

This report is worth 10% of the final grade. Please upload a report (<b>one per team</b>) with all code used, visualizations, and text in a rendered Jupyter notebook. For any visualizations that cannot be embedded in the notebook, please provide screenshots of the output. The results should be reproducible using your report. Please carefully describe every assumption and every step in your report.

<b>Dataset Selection</b>

Select a dataset identically to lab two (images). That is, the dataset must be image data. In terms of generalization performance, it is helpful to have a large dataset of identically sized images. It is fine to perform binary classification or multi-class classification.

## Preparation (3 pts)

- [<b>1.5 points</b>] Choose and explain what metric(s) you will use to evaluate your algorithm’s performance. You should give a <b>detailed argument for why this (these) metric(s) are appropriate on your data. That is, why is the metric appropriate</b> for the task (e.g., in terms of the business case for the task). Please note: rarely is accuracy the best evaluation metric to use. Think deeply about an appropriate measure of performance.
- [<b>1.5 points</b>] Choose the method you will use for dividing your data into training and testing (i.e., are you using Stratified 10-fold cross validation? Shuffle splits? Why?). <b>Explain why your chosen method is appropriate or use more than one method as appropriate</b>. Convince me that your cross validation method is a realistic mirroring of how an algorithm would be used in practice. 

In [None]:
# Importing packages and reading in dataset
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow.keras as keras

print('Pandas:', pd.__version__)
print('Numpy:',  np.__version__)
print('Tensorflow:', tf.__version__)
print('Keras:',  keras.__version__)

In [None]:
%%time

#source: https://www.geeksforgeeks.org/how-to-convert-images-to-numpy-array/
from PIL import Image

#source: https://stackoverflow.com/questions/10377998/how-can-i-iterate-over-files-in-a-given-directory
from pathlib import Path

#directory name
paths = {
    "TRAIN": './Coronahack-Chest-XRay-Dataset/train/',
    "TEST":  './Coronahack-Chest-XRay-Dataset/test/'    
}
metadata = pd.read_csv('Chest_xray_Corona_Metadata.csv')

h, w = 64, 64

tf.random.set_seed(2)
np.random.seed(0) # using this to help make results reproducible

#shuffle data
# data = data.sample(frac=1)

# Define features and target
images = metadata[["X_ray_image_name", "Dataset_type"]]
X_data = []
y_data = metadata["Label"]
for idx, img in images.iterrows():
    name = img["X_ray_image_name"]
    path = img["Dataset_type"]
    # PIL's resize expects (width, height)
    img_arr = np.asarray(Image.open(paths[path] + name).convert('L').resize((w, h)))
    X_data.append(img_arr)

In [None]:
from sklearn import preprocessing

le = preprocessing.LabelEncoder()
X = np.expand_dims(np.array(X_data), axis=-1)/255 - 0.5
y = le.fit_transform(np.array(y_data))

# X = X/255 - 0.5

print(X.shape, y.shape)

In [None]:
import matplotlib.pyplot as plt

# Take the first 9 and last 9 images so that both classes appear in the gallery
display_imgs = np.concatenate((X[0:9], X[-9:]))
labels = np.concatenate((y_data[0:9], y_data[-9:]))
def plot_gallery(images, titles, n_row=3, n_col=3):
    n_per_block = n_row * n_col
    plt.figure(figsize=(3 * n_col, 6 * n_row))
    plt.subplots_adjust(bottom=0, left=.01, right=.99, top=.90, hspace=.35)
    # Normal scans tended towards the front of the dataset and pneumonia scans
    # toward the back, so the first block of images comes from the front and
    # the second block was pulled from the back for demonstration purposes.
    for i in range(2 * n_per_block):
        plt.subplot(n_row * 2, n_col, i + 1)
        # Drop the trailing channel axis so imshow receives a 2D array
        plt.imshow(np.squeeze(images[i]), cmap=plt.cm.gray)
        plt.title(titles[i], size=12)
        plt.xticks(())
        plt.yticks(())

plot_gallery(display_imgs, labels)

#### Evaluation Metric

The primary evaluation metrics we are using for our model are recall and precision. Recall measures the percentage of actual positive cases that were identified correctly, and precision measures the percentage of positive predictions that were correct.

These metrics emphasize correct positive identifications, which suits our task because we want to minimize the number of undetected pneumonia cases. Recall is the more important of the two, since maximizing it minimizes the false negative rate. A low false negative rate matters here because labeling a lung "Normal" when it in fact has pneumonia is harmful, and possibly fatal, to the patient. By the same token, healthy lungs should not be misclassified as pneumonia, because that would create unnecessary issues for a healthy patient. For these reasons we evaluate our CNN solution with recall and precision, specifically the native Keras implementations of both.
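
To make the two definitions concrete, here is a minimal pure-Python sketch that computes both metrics from hard 0/1 predictions. The function name `precision_recall` and the toy labels are illustrative assumptions and are not part of the lab code, which relies on `keras.metrics.Precision` and `keras.metrics.Recall`. The label 1 is assumed to stand for pneumonia.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for binary labels given as sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when there are no positives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy example with made-up labels: 1 = pneumonia (assumed), 0 = normal
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0]
precision, recall = precision_recall(y_true, y_pred)
print("Precision:", precision)
print("Recall:   ", recall)
```

In this toy case one pneumonia scan is missed (a false negative, which lowers recall) and one healthy scan is flagged (a false positive, which lowers precision).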

#### Dividing Data

We are using stratified 10-fold cross validation to split the data into training and test sets. We chose this method because almost 3/4 of the lungs in our dataset are labeled as having pneumonia, whereas only about 1/4 are labeled as healthy. With a purely random split or shuffle, the proportion of pneumonia cases could differ between the training and test sets, which would make the results on the test data less reliable. Stratification keeps the class proportions roughly equal in every fold, which helps us build a more effective model and judge how well it generalizes. Each CNN is fit once per fold and evaluated on the held-out fold. This is a realistic mirroring of real-world use because the model is always scored on scans it did not see during training, and because a single small test set would give a high-variance estimate. Stratified cross validation reduces this variance by averaging over k different partitions, so the performance estimate is less sensitive to the partitioning of the data. We also chose 10 folds because this value has been shown empirically to yield test error rate estimates that suffer neither from excessively high bias nor from very high variance.
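
As a quick illustration of the stratification argument, the sketch below runs scikit-learn's `StratifiedKFold` on synthetic labels with roughly the same 3:1 imbalance and records the positive fraction of each held-out fold. The arrays `X_demo` and `y_demo` are made up for this illustration and are not the lab's dataset.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic labels (an assumption for illustration): 75% positive, 25% negative
y_demo = np.array([1] * 750 + [0] * 250)
X_demo = np.zeros((len(y_demo), 1))  # feature values do not affect the split

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=1234)
fold_fractions = []
for train_idx, test_idx in skf.split(X_demo, y_demo):
    # Fraction of positive labels in this held-out fold
    fold_fractions.append(float(y_demo[test_idx].mean()))

print([round(f, 3) for f in fold_fractions])
```

Every held-out fold should show a positive fraction close to the overall 0.75, which is the property a plain random split does not guarantee.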

## Modeling (6 pts)

- [<b>1.5 points</b>]  Setup the training to use data expansion in Keras. Explain why the chosen data expansion techniques are appropriate for your dataset. 
- [<b>2 points</b>] Create a convolutional neural network to use on your data using Keras. Investigate at least two different convolutional network architectures (and investigate changing some parameters of each architecture--at minimum have two variations of each network for a total of four models trained). Use the method of train/test splitting and evaluation metric that you argued for at the beginning of the lab. Visualize the performance of the training and validation sets per iteration (use the "history" parameter of Keras).
- [<b>1.5 points</b>] Visualize the final results of the CNNs and interpret the performance. Use proper statistics as appropriate, especially for comparing models. 
- [<b>1 point</b>] Compare the performance of your convolutional network to a standard multi-layer perceptron (MLP) using the receiver operating characteristic and area under the curve. Use proper statistical comparison techniques.

We are using Keras's built-in ImageDataGenerator for our data expansion. In resizing all of our images to 64x64, many of the images were already stretched and squashed in different directions, so perturbing them further with small random horizontal and vertical shifts will hopefully reduce any hidden biases that the different original image sizes may have created. Additionally, since all of the x-rays are more or less similarly oriented, we can add a slight rotational adjustment. However, since the images are all consistently oriented horizontally (because the heart is always located to one side of the body) and vertically (all of the images have the patient's neck and shoulders at the top of the image), it would not be useful to flip the images.

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=5,
    width_shift_range=0.1,
    height_shift_range=0.1)

datagen.fit(X)

In [None]:
from tensorflow.keras.layers       import Conv2D, MaxPooling2D, GlobalAveragePooling2D
from tensorflow.keras.layers       import Dense, Dropout, Flatten, Activation, Average, BatchNormalization
from tensorflow.keras.models       import Model, Sequential
from tensorflow.keras.callbacks    import EarlyStopping
from tensorflow.keras.utils        import plot_model
from tensorflow.keras.regularizers import l2
from sklearn.model_selection import StratifiedKFold

loss = 'binary_crossentropy'
optimizer = 'rmsprop'
metrics = [keras.metrics.Precision(), keras.metrics.Recall()]
batch_size = 128
epochs = 3
verbose = 1
n_splits = 10  # 10 folds, as argued in the Dividing Data section
kf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=1234)

In [None]:
def plot_histories(histories):
    # Scale the figure height with the number of folds (one row per fold)
    plt.figure(figsize=(15, 4 * n_splits))
    for fold_no, history in enumerate(histories):
        keys = list(history.history.keys())
        
        plt.subplot(n_splits,3,3*fold_no+1)
        plt.plot(history.history[keys[0]])
        plt.title('Binary Crossentropy')
        plt.ylim(0.25, 1.25)
        plt.ylabel('Fold #'+str(fold_no))

        plt.subplot(n_splits,3,3*fold_no+2)
        plt.plot(history.history[keys[1]])
        plt.title('Precision')
        plt.ylim(0.7, 1)

        plt.subplot(n_splits,3,3*fold_no+3)
        plt.plot(history.history[keys[2]])
        plt.title('Recall')
        plt.ylim(0.7, 1)

### Model 1 - Basic Architecture

In [None]:
def basic_model(l2_lambda):
    reg = l2(l2_lambda)
    print("Basic Architecture")
    print("L2 Lambda:", l2_lambda,'\n')

    fold_no = 0
    histories = []
    eval_scores = []
    for train_index, test_index in kf.split(X, y):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]

        cnn = Sequential()

        cnn.add(Conv2D(filters=32,
                    kernel_size=(3,3),
                    kernel_regularizer=reg,
                    padding='same',
                    activation='relu'))
        cnn.add(Conv2D(filters=32,
                    kernel_size=(3,3),
                    kernel_regularizer=reg,
                    padding='same',
                    activation='relu'))
        cnn.add(MaxPooling2D(pool_size=(2, 2)))

        cnn.add(Conv2D(filters=64,
                    kernel_size=(3,3),
                    kernel_regularizer=reg,
                    padding='same',
                    activation='relu'))
        cnn.add(Conv2D(filters=64,
                    kernel_size=(3,3),
                    kernel_regularizer=reg,
                    padding='same',
                    activation='relu'))
        cnn.add(MaxPooling2D(pool_size=(2, 2)))

        cnn.add(Dropout(0.25))
        cnn.add(Flatten())
        cnn.add(Dense(128, activation='relu',
                    kernel_regularizer=reg))
        cnn.add(Dropout(0.5))
        cnn.add(Dense(1, activation='sigmoid',
                    kernel_regularizer=reg))

        cnn.compile(loss=loss,
                    optimizer=optimizer,
                    metrics=metrics)

        print('Fold',fold_no)
        print('')  
        history = cnn.fit(datagen.flow(X_train, y_train, batch_size=batch_size), 
                    steps_per_epoch=int(len(X_train)/batch_size),
                    epochs=epochs, verbose=verbose)

        print('')
        scores = cnn.evaluate(X_test, y_test, verbose=verbose)
        print('-' * 110)

        histories.append(history)
        eval_scores.append(scores)

        fold_no += 1

    eval_scores = np.array(eval_scores)
    print("Average Performance")
    print(f"Precision:  {round(np.mean(eval_scores[:,1]), 5)}")
    print(f"Recall:     {round(np.mean(eval_scores[:,2]), 5)}")
    
    return histories

#### Variation 1

In [None]:
%%time

histories = basic_model(0.0001)

In [None]:
plot_histories(histories)

#### Variation 2

In [None]:
%%time

histories = basic_model(0.00001)

In [None]:
plot_histories(histories)

#### Variation 3

In [None]:
%%time

histories = basic_model(0.000001)

In [None]:
plot_histories(histories)

### Model 2 - Network in Network Architecture

In [None]:
# Architecture based on https://www.kaggle.com/bingdiaoxiaomao/network-in-network-nin-with-keras

def nin_model(l2_lambda):
    reg = l2(l2_lambda)
    print("NiN Architecture")
    print("L2 Lambda:", l2_lambda,'\n')

    fold_no = 0
    histories = []
    eval_scores = []
    for train_index, test_index in kf.split(X, y):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]

        cnn = Sequential()

        cnn.add(Conv2D(filters=192,
                    kernel_size=5,
                    kernel_regularizer=reg,
                    padding='same',
                    activation='relu'))
        cnn.add(Conv2D(filters=160,
                    kernel_size=1,
                    kernel_regularizer=reg,
                    padding='same',
                    activation='relu'))
        cnn.add(Conv2D(filters=96,
                    kernel_size=1,
                    kernel_regularizer=reg,
                    padding='same',
                    activation='relu'))
        cnn.add(MaxPooling2D(pool_size=(3,3),strides=(2,2),padding='same'))
        
        cnn.add(Dropout(0.2))

        cnn.add(Conv2D(filters=192,
                    kernel_size=5,
                    kernel_regularizer=reg,
                    padding='same',
                    activation='relu'))
        cnn.add(Conv2D(filters=192,
                    kernel_size=1,
                    kernel_regularizer=reg,
                    padding='same',
                    activation='relu'))
        cnn.add(Conv2D(filters=192,
                    kernel_size=1,
                    kernel_regularizer=reg,
                    padding='same',
                    activation='relu'))
        cnn.add(MaxPooling2D(pool_size=(3,3),strides=(2,2),padding='same'))
        
        cnn.add(Dropout(0.2))

        cnn.add(Conv2D(filters=192,
                    kernel_size=3,
                    kernel_regularizer=reg,
                    padding='same',
                    activation='relu'))
        cnn.add(Conv2D(filters=192,
                    kernel_size=1,
                    kernel_regularizer=reg,
                    padding='same',
                    activation='relu'))
        cnn.add(Conv2D(filters=2,
                    kernel_size=1,
                    kernel_regularizer=reg,
                    padding='same',
                    activation='relu'))
        
        cnn.add(GlobalAveragePooling2D())
        cnn.add(Activation('softmax'))   

        cnn.compile(loss=loss,
                    optimizer=optimizer,
                    metrics=metrics)

        print('Fold',fold_no)
        print('')  
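        # This architecture ends in one output per class (2 outputs), so the binary
        # labels are one-hot encoded as if this were a multiclass classifier.
        # Only the training fold is used for fitting; the held-out fold is used
        # for evaluation, so no test images leak into training.
        # Note: with two output columns and no class_id, Keras Precision/Recall
        # are computed over both columns rather than the pneumonia class alone.
        y_train_oh = keras.utils.to_categorical(y_train, num_classes=2)
        y_test_oh = keras.utils.to_categorical(y_test, num_classes=2)
        history = cnn.fit(datagen.flow(X_train, y_train_oh, batch_size=batch_size), 
                    steps_per_epoch=int(len(X_train)/batch_size),
                    epochs=epochs, verbose=verbose)

        print('')
        scores = cnn.evaluate(X_test, y_test_oh, verbose=verbose)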
        print('-' * 110)
        
        histories.append(history)
        eval_scores.append(scores)

        fold_no += 1

    eval_scores = np.array(eval_scores)
    print("Average Performance")
    print(f"Precision:  {round(np.mean(eval_scores[:,1]), 5)}")
    print(f"Recall:     {round(np.mean(eval_scores[:,2]), 5)}")
    
    return histories

In [None]:
%%time

histories = nin_model(0.0001)

In [None]:
plot_histories(histories)

In [None]:
%%time

histories = nin_model(0.00001)

In [None]:
plot_histories(histories)

In [None]:
%%time

histories = nin_model(0.000001)

In [None]:
plot_histories(histories)

## Exceptional Work (1 pt)

- You have free rein to provide additional analyses. 
- One idea (<b>required for 7000 level students</b>): Use transfer learning to pre-train the weights of your initial layers of your CNN. Compare the performance when using transfer learning to training without transfer learning (i.e., compare to your best model from above) in terms of classification performance. 