# Matthew's CNN Notebook

## Overview
In this notebook I'll build a baseline CNN and then iterate on that model.

## Preparing the Data


In [24]:
# Import statements
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import keras
import seaborn as sns
from tensorflow.keras.optimizers import SGD
# Instantiating a generator object and normalizing the RGB values
traingen = keras.preprocessing.image.ImageDataGenerator(rescale=1/255)
testgen = keras.preprocessing.image.ImageDataGenerator(rescale=1/255)
valgen = keras.preprocessing.image.ImageDataGenerator(rescale=1/255)

# Creating the generator for the training data
train_data = traingen.flow_from_directory(
    # Specifying location of training data
    directory='../input/chest-xray-pneumonia/chest_xray/train',
    # Re-sizing images to 150x150
    target_size=(150, 150),
    # Class mode to binary to recognize the two directories "NORMAL" and "PNEUMONIA" as the labels
    class_mode='binary',
    batch_size=20,
    seed=42
)
# Creating the generator for the testing data
test_data = testgen.flow_from_directory(
    # Specifying location of testing data
    directory='../input/chest-xray-pneumonia/chest_xray/test',
    # Re-sizing images to 150x150
    target_size=(150, 150),
    # Class mode to binary to recognize the two directories "NORMAL" and "PNEUMONIA" as the labels
    class_mode='binary',
    batch_size=20,
    seed=42,
    shuffle=False
)

# Setting aside a validation set
val_data = valgen.flow_from_directory(
    # Specifying location of validation data
    directory='../input/chest-xray-pneumonia/chest_xray/val',
    # Re-sizing images to 150x150
    target_size=(150, 150),
    # Class mode to binary to recognize the two directories "NORMAL" and "PNEUMONIA" as the labels
    class_mode='binary',
    batch_size=20,
    seed=42
)
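
With `class_mode='binary'`, the integer label of each class comes from the subdirectory names. As far as I know, `flow_from_directory` orders them alphanumerically when no `classes` argument is given, so `NORMAL` should map to 0 and `PNEUMONIA` to 1. The sketch below only mimics that ordering with a hypothetical helper (`infer_class_indices` is not a Keras function); `train_data.class_indices` is the authoritative check.

```python
# Hypothetical helper: mimics the alphanumeric ordering that
# flow_from_directory is assumed to use for subdirectory names.
def infer_class_indices(subdirs):
    """Map each class subdirectory name to an integer label."""
    return {name: index for index, name in enumerate(sorted(subdirs))}


class_indices = infer_class_indices(["PNEUMONIA", "NORMAL"])
print(class_indices)  # {'NORMAL': 0, 'PNEUMONIA': 1}
```

This matters later on, because class weights are keyed by these integer labels.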

## Baseline CNN

In [None]:
# Create model
base_cnn = keras.Sequential()

# Add single Conv2D and MaxPool layer
base_cnn.add(keras.layers.Conv2D(32, (2, 2), activation='relu', input_shape=(150, 150, 3)))
base_cnn.add(keras.layers.MaxPool2D(2, 2))

base_cnn.add(keras.layers.Flatten())
base_cnn.add(keras.layers.Dense(1, 'sigmoid'))


#Compile model
base_cnn.compile(
    loss='binary_crossentropy',
    optimizer='sgd',
    metrics=['acc']
    
)

# Fit Model to Training
base_cnn_results = base_cnn.fit_generator(train_data,
                              steps_per_epoch=100,
                              epochs=10,
                              validation_data=test_data
)

### Conclusion
This is a good result for a first baseline model, but there are some obvious issues just from looking at these results:

- The model is overfitting
- Validation accuracy is bouncing all over the place, instead of consistently improving.

There are several things that could be done from here, so let's move on to something a little more robust.
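
One rough way to put a number on the overfitting is the per-epoch gap between training and validation accuracy. This is a minimal sketch, assuming a history dictionary shaped like `base_cnn_results.history`; the function name and the numbers are made up for illustration and are not results from this notebook.

```python
def generalization_gap(history, train_key="acc", val_key="val_acc"):
    """Per-epoch difference between a training metric and its validation twin."""
    return [round(t - v, 4) for t, v in zip(history[train_key], history[val_key])]


# Made-up numbers for illustration only.
example_history = {"acc": [0.80, 0.90, 0.95], "val_acc": [0.75, 0.78, 0.74]}
print(generalization_gap(example_history))  # [0.05, 0.12, 0.21]
```

A gap that keeps growing while validation accuracy stalls is the pattern described in the bullets above.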

## Deeper CNN

To start, I'm just going to add more layers to the network.

In [None]:
# Create model
deep_cnn = keras.Sequential()

# Adding first Conv2D and MaxPool layer, starting small and then growing larger.
deep_cnn.add(keras.layers.Conv2D(32, (2, 2), activation='relu', input_shape=(150, 150, 3)))
deep_cnn.add(keras.layers.MaxPool2D(2, 2))

# Second layer with 64 filters
deep_cnn.add(keras.layers.Conv2D(64, (2, 2), activation='relu'))
deep_cnn.add(keras.layers.MaxPool2D(2, 2))

# Third layer with 96 filters
deep_cnn.add(keras.layers.Conv2D(96, (2, 2), activation='relu'))
deep_cnn.add(keras.layers.MaxPool2D(2, 2))
# Flatten layers, and add densely connected layers for prediction
deep_cnn.add(keras.layers.Flatten())

# Dense layer with 32 nodes
deep_cnn.add(keras.layers.Dense(32, activation='relu'))

# Dense layer with 64 nodes
deep_cnn.add(keras.layers.Dense(64, activation='relu'))

# Dense layer with 96 nodes
deep_cnn.add(keras.layers.Dense(96, activation='relu'))

# Sigmoid output layer
deep_cnn.add(keras.layers.Dense(1, 'sigmoid'))


#Compile model
deep_cnn.compile(
    loss='binary_crossentropy',
    optimizer='sgd',
    # Adding additional metrics for better monitoring of training.
    metrics=['acc', 'Recall', 'Precision']
    
)

# Fit Model to Training
deep_cnn_results = deep_cnn.fit_generator(train_data,
                              steps_per_epoch=100,
                              epochs=10,
                              validation_data=test_data)

### Conclusion
I added additional metrics to this model for more insight into the training process. In terms of validation accuracy, it's definitely an improvement over the last model.

Some other notes about the model:
- The model is still overfitting
- The validation accuracy is not consistently improving
- Validation recall is very high: ~97% of the actual positive (pneumonia) cases were identified correctly. This is good, since we decided that, in the context of our business problem, false negatives are more costly than false positives.
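
To make the recall and precision trade-off concrete, here is a small worked example. The counts are hypothetical and chosen only so that recall comes out at 0.97; they are not taken from this model's confusion matrix.

```python
def recall(true_pos, false_neg):
    """Share of actual positives that the model caught."""
    return true_pos / (true_pos + false_neg)


def precision(true_pos, false_pos):
    """Share of positive predictions that were actually positive."""
    return true_pos / (true_pos + false_pos)


# Hypothetical counts for illustration only.
tp, fn, fp = 97, 3, 40
print(recall(tp, fn))     # 0.97
print(precision(tp, fp))  # lower, because false positives count against it
```

High recall with lower precision means few missed pneumonia cases at the cost of more false alarms, which fits the priority stated above.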

Let's do some tuning to address the overfitting issues.

### Deeper CNN with Dropout Layers
I'm going to add dropout layers to the model in order to combat the rampant overfitting.
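
For intuition, this is roughly what a dropout layer does to its inputs during training: each activation is zeroed with probability `rate`, and the survivors are scaled up by `1 / (1 - rate)` so the expected sum stays the same. The sketch is plain Python with a hypothetical function name, not the Keras implementation.

```python
import random


def inverted_dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; scale the rest by 1/(1-rate)."""
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(42)  # seeded so the example is repeatable
dropped = inverted_dropout([1.0] * 10, rate=0.3, rng=rng)
print(dropped)
```

At inference time the layer passes inputs through unchanged, which is why the scaling happens during training.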

In [None]:
# Create model
r_cnn = keras.Sequential()

# Adding first Conv2D and MaxPool layer, starting small and then growing larger.
r_cnn.add(keras.layers.Conv2D(32, (2, 2), activation='relu', input_shape=(150, 150, 3)))
r_cnn.add(keras.layers.MaxPool2D(2, 2))

# Second layer with 64 filters
r_cnn.add(keras.layers.Conv2D(64, (2, 2), activation='relu'))
r_cnn.add(keras.layers.MaxPool2D(2, 2))

# Third layer with 96 filters
r_cnn.add(keras.layers.Conv2D(96, (2, 2), activation='relu'))
r_cnn.add(keras.layers.MaxPool2D(2, 2))
# Flatten layers, and add densely connected layers for prediction
r_cnn.add(keras.layers.Flatten())

# Dense layer with 32 nodes with dropout layer
r_cnn.add(keras.layers.Dense(32, activation='relu'))
r_cnn.add(keras.layers.Dropout(0.3))

# Dense layer with 64 nodes with dropout layer
r_cnn.add(keras.layers.Dense(64, activation='relu'))
r_cnn.add(keras.layers.Dropout(0.3))

# Dense layer with 96 nodes with dropout layer
r_cnn.add(keras.layers.Dense(96, activation='relu'))
r_cnn.add(keras.layers.Dropout(0.3))
# Sigmoid output layer
r_cnn.add(keras.layers.Dense(1, 'sigmoid'))


#Compile model
r_cnn.compile(
    loss='binary_crossentropy',
    optimizer='sgd',
    # Adding additional metrics for better monitoring of training.
    metrics=['acc', 'Recall', 'Precision']
    
)

# Fit Model to Training
r_cnn_results = r_cnn.fit_generator(train_data,
                              steps_per_epoch=100,
                              epochs=10,
                              validation_data=test_data)

## Conclusion

In [None]:
# Visualizing results

# Creating figure with 2 subplots
fig, (ax1, ax2) = plt.subplots(1,2,figsize=(16, 8))

# Getting training history from results
history = r_cnn_results.history

# Plotting loss on first subplot
ax1.plot(history['loss'])
ax1.plot(history['val_loss'])

# Labeling (set_xlabel/set_ylabel set the axis titles)
ax1.set_xlabel('Epochs')
ax1.set_ylabel('Loss')
ax1.legend(['loss', 'val_loss'])

# Plotting accuracy on second subplot
ax2.plot(history['acc'])
ax2.plot(history['val_acc'])

# Labeling
ax2.set_xlabel('Epochs')
ax2.set_ylabel('Accuracy')
ax2.legend(['Accuracy', 'Val_acc'])

fig.suptitle('Loss and Accuracy of Model');

The dropout layers definitely reduced the overfitting that was occurring. There are still some other issues with this model, though; specifically, **the validation loss rises sharply in the last few epochs**. I'm going to add early stopping and checkpoints to the model to help with this issue.
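
The idea behind early stopping can be sketched in a few lines: track the best validation loss so far, and stop once it has failed to improve for `patience` epochs in a row. This is a simplified stand-in for the Keras callback (no `min_delta`, hypothetical function name), and the loss values are invented to mimic a late rise.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training stops once the validation loss has not improved for
    `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Invented losses: improving at first, then rising sharply.
losses = [0.60, 0.50, 0.45, 0.47, 0.52, 0.58, 0.70]
print(early_stopping_epoch(losses, patience=3))  # (5, 2)
```

Note that training stops three epochs after the best one, so the final weights are not the best weights unless they are restored or reloaded from a checkpoint.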

## CNN with Early Stopping and More Training
Since I'm implementing early stopping, I'll also add some epochs and steps to the training process.
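
As a quick sanity check on what "more steps" means: with a batch size of 20, the number of images seen per epoch is just `steps_per_epoch * batch_size`. The helper names below are hypothetical, and the dataset size of 5000 is a placeholder, not the real size of the training set.

```python
import math


def images_per_epoch(steps_per_epoch, batch_size):
    """Number of images drawn from the generator in one epoch."""
    return steps_per_epoch * batch_size


def steps_for_full_pass(num_images, batch_size):
    """Steps needed to see every image once (last batch may be partial)."""
    return math.ceil(num_images / batch_size)


print(images_per_epoch(100, 20))      # 2000
print(images_per_epoch(150, 20))      # 3000
print(steps_for_full_pass(5000, 20))  # 250, for a placeholder dataset size
```

If `steps_per_epoch` is smaller than the value from `steps_for_full_pass`, an "epoch" covers only part of the training data.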

In [None]:
# Create early stopping object
early_stopping = [
    keras.callbacks.EarlyStopping(monitor='val_loss', patience=3),
    keras.callbacks.ModelCheckpoint(filepath='model.{epoch:02d}-{val_loss:.2f}.h5')
]

# Create model
es_cnn = keras.Sequential()

# Adding first Conv2D and MaxPool layer, starting small and then growing larger.
es_cnn.add(keras.layers.Conv2D(32, (2, 2), activation='relu', input_shape=(150, 150, 3)))
es_cnn.add(keras.layers.MaxPool2D(2, 2))

# Second layer with 64 filters
es_cnn.add(keras.layers.Conv2D(64, (3, 3), activation='relu'))
es_cnn.add(keras.layers.MaxPool2D(3, 3))

# Third layer with 96 filters
es_cnn.add(keras.layers.Conv2D(96, (5, 5), activation='relu'))
es_cnn.add(keras.layers.MaxPool2D(5, 5))
# Flatten layers, and add densely connected layers for prediction
es_cnn.add(keras.layers.Flatten())

# Dense layer with 32 nodes with dropout layer
es_cnn.add(keras.layers.Dense(32, activation='relu'))
es_cnn.add(keras.layers.Dropout(0.3))

# Dense layer with 64 nodes with dropout layer
es_cnn.add(keras.layers.Dense(64, activation='relu'))
es_cnn.add(keras.layers.Dropout(0.3))

# Dense layer with 96 nodes with dropout layer
es_cnn.add(keras.layers.Dense(96, activation='relu'))
es_cnn.add(keras.layers.Dropout(0.3))
# Sigmoid output layer
es_cnn.add(keras.layers.Dense(1, 'sigmoid'))


#Compile model
es_cnn.compile(
    loss='binary_crossentropy',
    optimizer='sgd',
    # Adding additional metrics for better monitoring of training.
    metrics=['acc', 'Recall', 'Precision']
    
)

# Fit Model to Training
es_cnn_results = es_cnn.fit_generator(train_data,
                              steps_per_epoch=150,
                              epochs=25,
                              validation_data=test_data,
                              callbacks=early_stopping)

## Conclusion

Early stopping is working as intended. However, I've noticed that the first few epochs always have the same validation accuracy: 0.6250.
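
A validation accuracy that stays frozen at one value is also the pattern you would see if the model predicted the same class for every image. One way to check is to compare it with the share of the majority class in the validation labels. The sketch below uses a hypothetical label list; with the generators above, the real labels should be available as `test_data.classes` (an assumption about usage, not something run here).

```python
def majority_class_accuracy(labels):
    """Accuracy of always predicting the most common binary label."""
    positives = sum(labels)
    negatives = len(labels) - positives
    return max(positives, negatives) / len(labels)


# Hypothetical labels: 5 positive, 3 negative.
hypothetical_labels = [1, 1, 1, 1, 1, 0, 0, 0]
print(majority_class_accuracy(hypothetical_labels))  # 0.625
```

If the real majority share matches the stuck accuracy, class imbalance is a likely culprit.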

The model may be settling into a local minimum instead of the global one in these epochs. I'll try tuning the learning rate of my optimizer to see if that changes anything. I will also try introducing class weights to the model, as that may help with the problem as well.
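
One common way to choose class weights is to weight each class by how much rarer it is than the largest class. This is a sketch of that approach with a hypothetical helper, and the image counts are placeholders picked for round numbers; the real counts should come from the training directory.

```python
def ratio_class_weights(counts):
    """Weight each class by (largest class count / its own count)."""
    largest = max(counts.values())
    return {label: largest / n for label, n in counts.items()}


# Placeholder counts: label 0 is the minority class here.
hypothetical_counts = {0: 1000, 1: 2880}
print(ratio_class_weights(hypothetical_counts))  # {0: 2.88, 1: 1.0}
```

The resulting dictionary has the shape Keras expects for `class_weight`: integer class labels mapped to float weights.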

# Optimizer and Misc Tuning
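
For reference, this is my understanding of the update rule that SGD with Nesterov momentum applies to a single weight. The function name and the toy loss `w**2` are assumptions for illustration, not part of the training code.

```python
def nesterov_sgd_step(weight, velocity, grad, learning_rate=0.001, momentum=0.9):
    """One SGD update with Nesterov momentum for a single scalar weight."""
    velocity = momentum * velocity - learning_rate * grad
    weight = weight + momentum * velocity - learning_rate * grad
    return weight, velocity


# Toy problem: minimize w**2, whose gradient is 2 * w.
w, v = 1.0, 0.0
for _ in range(3):
    grad = 2.0 * w
    w, v = nesterov_sgd_step(w, v, grad)
print(w, v)
```

The velocity term accumulates past gradients, which helps the optimizer keep moving through flat regions where plain SGD with a small learning rate would crawl.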

In [4]:
# Create model
op_cnn = keras.Sequential()

# Adding first Conv2D and MaxPool layer, starting small and then growing larger.
op_cnn.add(keras.layers.Conv2D(32, (2, 2), activation='relu', input_shape=(150, 150, 3)))
op_cnn.add(keras.layers.MaxPool2D(2, 2))

# Second layer with 64 filters
op_cnn.add(keras.layers.Conv2D(64, (3, 3), activation='relu'))
op_cnn.add(keras.layers.MaxPool2D(3, 3))

# Third layer with 96 filters
op_cnn.add(keras.layers.Conv2D(96, (5, 5), activation='relu'))
op_cnn.add(keras.layers.MaxPool2D(5, 5))
# Flatten layers, and add densely connected layers for prediction
op_cnn.add(keras.layers.Flatten())

# Dense layer with 32 nodes with dropout layer
op_cnn.add(keras.layers.Dense(32, activation='relu'))
op_cnn.add(keras.layers.Dropout(0.3))

# Dense layer with 64 nodes with dropout layer
op_cnn.add(keras.layers.Dense(64, activation='relu'))
op_cnn.add(keras.layers.Dropout(0.3))

# Dense layer with 96 nodes with dropout layer
op_cnn.add(keras.layers.Dense(96, activation='relu'))
op_cnn.add(keras.layers.Dropout(0.3))
# Sigmoid output layer
op_cnn.add(keras.layers.Dense(1, 'sigmoid'))

# Create early stopping object
op_early_stopping = [
    keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True),
    keras.callbacks.ModelCheckpoint(filepath='model.{epoch:02d}-{val_loss:.2f}.h5')
]

# Create optimizer
optim = SGD(learning_rate=0.001, momentum=0.9, nesterov=True)

# Creating class weights
weights = {
    0: 2.88, # NORMAL
    1: 1.    # PNEUMONIA
}
#Compile model
op_cnn.compile(
    loss='binary_crossentropy',
    optimizer=optim,
    # Adding additional metrics for better monitoring of training.
    metrics=['acc', 'Recall', 'Precision']
    
)

# Fit Model to Training
op_cnn_results = op_cnn.fit_generator(train_data,
                              class_weight=weights,
                              steps_per_epoch=50,
                              epochs=100,
                              validation_data=test_data,
                              callbacks=op_early_stopping,)

In [5]:
op_cnn.evaluate(test_data)

## Conclusion

In [2]:
# Visualizing results

# Creating figure with 2 subplots
fig, (ax1, ax2) = plt.subplots(1,2,figsize=(16, 8))

# Getting training history from results
history = op_cnn_results.history

# Plotting loss on first subplot
ax1.plot(history['loss'])
ax1.plot(history['val_loss'])

# Labeling (set_xlabel/set_ylabel set the axis titles)
ax1.set_xlabel('Epochs')
ax1.set_ylabel('Loss')
ax1.legend(['loss', 'val_loss'])

# Plotting accuracy on second subplot
ax2.plot(history['acc'])
ax2.plot(history['val_acc'])

# Labeling
ax2.set_xlabel('Epochs')
ax2.set_ylabel('Accuracy')
ax2.legend(['Accuracy', 'Val_acc'])

fig.suptitle('Loss and Accuracy of Model');

The early stopping worked great this time, and the changes to the optimizer, as well as the added class weights, have had a positive impact on the model. Validation accuracy is now sitting at around 87%. I'm going to add more layers and padding to the conv layers next, and see if that helps.
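
Padding changes how quickly the feature maps shrink. The sketch below computes the output width of a convolution for `'valid'` and `'same'` padding, and of a max-pool whose window and stride are equal, which is how `MaxPool2D(2, 2)` is used in this notebook. The helper names are hypothetical.

```python
import math


def conv_output_size(size, kernel, stride=1, padding="valid"):
    """Spatial output size of a conv layer along one dimension."""
    if padding == "same":
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1


def pool_output_size(size, pool):
    """Output size of max-pooling with window == stride and no padding."""
    return (size - pool) // pool + 1


print(conv_output_size(150, 2, padding="valid"))  # 149
print(conv_output_size(150, 2, padding="same"))   # 150
print(pool_output_size(149, 2))                   # 74
print(pool_output_size(150, 2))                   # 75
```

Chaining these two functions through a stack of layers is a quick way to see how small the feature map is by the time it reaches `Flatten`.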

# Robust CNN

In [None]:
# Create model
rob_cnn = keras.Sequential()

# First Conv2D and MaxPool block with 32 filters, now using 'same' padding
rob_cnn.add(keras.layers.Conv2D(32, (2, 2), padding='same', activation='relu', input_shape=(150, 150, 3)))
rob_cnn.add(keras.layers.MaxPool2D(2, 2))

# Second block with 32 filters
rob_cnn.add(keras.layers.Conv2D(32, (2, 2), padding='same', activation='relu'))
rob_cnn.add(keras.layers.MaxPool2D(2, 2))

# Third block with 64 filters
rob_cnn.add(keras.layers.Conv2D(64, (3, 3), padding='same', activation='relu'))
rob_cnn.add(keras.layers.MaxPool2D(3, 3))

# Fourth block with 64 filters
rob_cnn.add(keras.layers.Conv2D(64, (3, 3), padding='same', activation='relu'))
rob_cnn.add(keras.layers.MaxPool2D(3, 3))

# Fifth block with 96 filters
rob_cnn.add(keras.layers.Conv2D(96, (5, 5), padding='same', activation='relu'))
rob_cnn.add(keras.layers.MaxPool2D(3, 3))


# Flatten layers, and add densely connected layers for prediction
rob_cnn.add(keras.layers.Flatten())

# Two dense layers with 32 nodes, each followed by a dropout layer
rob_cnn.add(keras.layers.Dense(32, activation='relu'))
rob_cnn.add(keras.layers.Dropout(0.3))

rob_cnn.add(keras.layers.Dense(32, activation='relu'))
rob_cnn.add(keras.layers.Dropout(0.3))
# Two dense layers with 64 nodes, each followed by a dropout layer
rob_cnn.add(keras.layers.Dense(64, activation='relu'))
rob_cnn.add(keras.layers.Dropout(0.3))

rob_cnn.add(keras.layers.Dense(64, activation='relu'))
rob_cnn.add(keras.layers.Dropout(0.3))

# Two dense layers with 96 nodes, each followed by a dropout layer
rob_cnn.add(keras.layers.Dense(96, activation='relu'))
rob_cnn.add(keras.layers.Dropout(0.3))

rob_cnn.add(keras.layers.Dense(96, activation='relu'))
rob_cnn.add(keras.layers.Dropout(0.3))
# Sigmoid output layer
rob_cnn.add(keras.layers.Dense(1, 'sigmoid'))

# Create early stopping object
rob_early_stopping = [
    keras.callbacks.EarlyStopping(monitor='val_loss', patience=6, restore_best_weights=True),
    keras.callbacks.ModelCheckpoint(filepath='model.{epoch:02d}-{val_loss:.2f}.h5')
]

# Create optimizer
optim = SGD(learning_rate=0.001, momentum=0.9, nesterov=True)

# Creating class weights
weights = {
    0: 2.88, # NORMAL
    1: 1.    # PNEUMONIA
}
#Compile model
rob_cnn.compile(
    loss='binary_crossentropy',
    optimizer=optim,
    # Adding additional metrics for better monitoring of training.
    metrics=['acc', 'Recall', 'Precision']
    
)

# Fit Model to Training
rob_cnn_results = rob_cnn.fit_generator(train_data,
                              class_weight=weights,
                              steps_per_epoch=75,
                              epochs=100,
                              validation_data=test_data,
                              callbacks=rob_early_stopping,)

## Conclusion
In the end, it does not look like the extra layers and padding have improved the model, so the previous model will be our final model for now.