## **Technical Summary**


Our modelling process consists entirely of a series of neural networks. As with most models, there is a whole host of hyperparameters to tune and a variety of networks to build. We started by building a simple multilayer perceptron (MLP) model with one hidden layer in order to obtain a baseline. We used ReLU as the activation function for all layers but the output layer, because it outputs zero for negative inputs and therefore keeps all the neurons in a layer from activating at once, which often yields better results. We used the sigmoid function as the activation function in the output layer, since this is a binary classification problem. This first simple MLP model overfit badly and had high loss, so in the next couple of model iterations we added another hidden layer and then some dropout layers, hoping that the second layer would help the network pick up on more patterns and that dropout would reduce overfitting. Significant overfitting and loss problems remained, so we moved on to a different type of neural network: convolutional neural networks (CNNs). CNNs apply learned filters to images, which helps the network pick up on patterns, such as edges, that may be useful in distinguishing between the classes of images. We tuned various CNN models by using different optimizers (Adam and Stochastic Gradient Descent), trying different numbers of convolution and dense layers, adding dropout layers, implementing early stopping, testing different learning rates and momentum values, and adding class weights to account for the class imbalance (there were approximately 2.88 times as many pneumonia x-rays as normal x-rays).
In the end, a CNN model with three convolution layers, three dense hidden layers, dropout layers, a Stochastic Gradient Descent optimizer with a learning rate of 0.001 and momentum of 0.9, early stopping, and class weights resulted in the best model, with training and testing accuracy of around 88%.
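
As a rough illustration of the two activation functions mentioned above, the following sketch evaluates ReLU and sigmoid on a few made-up numbers. The input values and the 0.5 decision threshold are assumptions chosen for illustration; they are not taken from our trained models.

```python
import math

def relu(x: float) -> float:
    """Rectified linear unit: passes positive inputs through, zeroes out the rest."""
    return max(0.0, x)

def sigmoid(x: float) -> float:
    """Logistic function: squashes any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Made-up pre-activation values for one hidden layer
pre_activations = [-2.0, -0.5, 0.0, 1.5]
hidden = [relu(z) for z in pre_activations]    # negative inputs become 0.0

# A single sigmoid output can be read as P(class 1), where class 1 is pneumonia
prob_pneumonia = sigmoid(0.8)
predicted_class = int(prob_pneumonia >= 0.5)   # the 0.5 threshold is an assumption
```

Because the sigmoid output lies strictly between 0 and 1, it pairs naturally with the binary cross-entropy loss used throughout this notebook.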

## **Business Understanding**
Cyclops Hospital Network (CHN) owns 4 inpatient hospitals and 27 urgent care centers. The 4 inpatient hospitals each have a pediatric emergency room. The 27 urgent care centers are equipped to perform X-ray and CT imaging and to diagnose and treat pediatric patients as well. Overall, CHN thus has 31 locations where pediatric patients who potentially have viral or bacterial pneumonia may seek diagnosis and care. Depending on their symptoms and the severity of those symptoms, many of those patients will undergo X-ray imaging of the thorax. A radiologist has up to 12 hours to review these images, and the initial assessment of the imaging is done by either an emergency room physician or an urgent care practitioner, who may be less accurate in diagnosing pneumonia via imaging. CHN therefore wishes to create a decision support tool (DST) using a neural network in order to check the assessment of the emergency room and urgent care physicians.

This DST will help to prevent doctors from missing important diagnoses and sending patients home without care. Given the wide range of timescales over which pneumonia can develop, the similarly wide range of symptom severity, and the particular difficulty many pediatric patients have in verbalizing their symptoms and health status, this DST will protect at-risk patients from being sent home without care, where their condition could worsen before a radiologist reviews their imaging. Additionally, this DST will protect CHN from malpractice suits that could result from a missed diagnosis and the subsequent lack of care.

## Directory Setup
This section demonstrates and justifies the data preparation steps we applied to our data. In our preparation we set up generators for our data, and in the generators we also rescale and resize the images before they are fed into our model.

Due to the large size of our dataset, we chose to use Kaggle as a place to store our data on the cloud. Our directory setup was structured in the following format: 
```
├── chest-xray                    <- Top level directory
│   ├── test                      <- Test set images
│   │   ├── Normal                <- Normal lung photos      
│   │   │   └── ...
│   │   └──  Pneumonia            <- Pneumonia lung photos
│   │   │   └── ...
│   ├── train                     <- Training set images
│   │   ├── Normal                <- Normal lung photos 
│   │   │   └── ...
│   │   └──  Pneumonia            <- Pneumonia lung photos
│   │   │   └── ...
│   ├── val                       <- Validation set images
│   │   ├── Normal                <- Normal lung photos 
│   │   │   └── ...
│   │   └──  Pneumonia            <- Pneumonia lung photos
│   │   │   └── ...             
```
This setup allows us to use Keras's `ImageDataGenerator` to load our data. We chose to use a generator for 3 reasons:
- Saving memory and disk space, because the dataset does not have to be downloaded
- Integrating the preprocessing into our modeling process
- Easy resizing and rescaling of images

Using generators also makes results easier to reproduce, since images fed into our model this way do not have to be preprocessed beforehand.
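
To make the idea of a generator concrete, here is a minimal plain-Python sketch of lazy batching and pixel rescaling. It is only an analogy for what the Keras generator does for us, not its implementation; the file names are hypothetical and the helper names `batches` and `rescale` are ours.

```python
from typing import Iterator, List

def batches(paths: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive slices of `paths`, so only one batch is materialized at a time."""
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]

def rescale(pixels: List[int], factor: float = 1 / 255) -> List[float]:
    """Map 8-bit pixel intensities (0-255) into the range [0, 1]."""
    return [p * factor for p in pixels]

# Hypothetical file names, for illustration only
paths = [f"img_{i}.jpeg" for i in range(45)]
sizes = [len(batch) for batch in batches(paths, batch_size=20)]  # last batch is smaller

scaled = rescale([0, 128, 255])
```

Because each batch is produced on demand, the full set of images never has to sit in memory at once.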

In [94]:
# Import statements
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import keras
import seaborn as sns
from tensorflow.keras.optimizers import SGD

In [95]:
# Instantiating a generator object and normalizing the RGB values
traingen = keras.preprocessing.image.ImageDataGenerator(rescale=1/255)
testgen = keras.preprocessing.image.ImageDataGenerator(rescale=1/255)
valgen = keras.preprocessing.image.ImageDataGenerator(rescale=1/255)

# Creating the generator for the training data
train_data = traingen.flow_from_directory(
    # Specifying location of training data
    directory='../input/chest-xray-pneumonia/chest_xray/train',
    # Re-sizing images to 150x150
    target_size=(150, 150),
    # Class mode set to binary to recognize the two directories "NORMAL" and "PNEUMONIA" as the labels
    class_mode='binary',
    batch_size=20,
    seed=42
)
# Creating the generator for the testing data
test_data = testgen.flow_from_directory(
    # Specifying location of testing data
    directory='../input/chest-xray-pneumonia/chest_xray/test',
    # Re-sizing images to 150x150
    target_size=(150, 150),
    # Class mode set to binary to recognize the two directories "NORMAL" and "PNEUMONIA" as the labels
    class_mode='binary',
    batch_size=20,
    seed=42
)

# Setting aside a validation set
val_data = valgen.flow_from_directory(
    # Specifying location of validation data
    directory='../input/chest-xray-pneumonia/chest_xray/val',
    # Re-sizing images to 150x150
    target_size=(150, 150),
    # Class mode set to binary to recognize the two directories "NORMAL" and "PNEUMONIA" as the labels
    class_mode='binary',
    batch_size=20,
    seed=42
)

## **Data Understanding**
- Because we want to train a neural network to help identify whether a subject has pneumonia based on a chest X-ray, this dataset of 5,232 chest X-rays from children will help us train the network so that it can be of use to doctors. There are 3,883 pneumonia x-rays and 1,349 normal ones, so there is a class imbalance issue. Additionally, the images vary in size, so it is necessary to standardize them before modelling.
- In the context of this data, a false positive would mean that the neural network identifies an x-ray as showing evidence of pneumonia, when it is really a normal x-ray. A false negative would mean that the neural network identifies a pneumonia image as being normal.
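
The sketch below shows how false positives and false negatives feed into precision and recall, the two metrics we track alongside accuracy. The counts are hypothetical and chosen only to make the arithmetic easy to follow; pneumonia is treated as the positive class.

```python
def recall(tp: int, fn: int) -> float:
    """Share of actual pneumonia cases the model catches; lowered by false negatives."""
    return tp / (tp + fn)

def precision(tp: int, fp: int) -> float:
    """Share of pneumonia predictions that are correct; lowered by false positives."""
    return tp / (tp + fp)

# Hypothetical confusion-matrix counts (not results from this notebook)
tp, fn, fp, tn = 90, 10, 20, 80

model_recall = recall(tp, fn)         # 90 of 100 pneumonia x-rays found
model_precision = precision(tp, fp)   # 90 of 110 pneumonia predictions correct
```

A missed pneumonia case (false negative) lowers recall, while a normal x-ray flagged as pneumonia (false positive) lowers precision.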

In [96]:
# Putting class information into a dataframe for easy visualizing
classes = pd.DataFrame(train_data.classes)
values = classes.value_counts()


In [97]:
# Giving the data better labels for visualization
class_dict = {0:'Normal', 1:'Pneumonia'}
classes[0] = classes[0].map(class_dict)


In [98]:
# Looking at the distribution of values between x-rays that show pneumonia and those that don't
diag = classes[0].value_counts()
diag

In [99]:
# Creating a bargraph to visualize the class imbalance
plt.figure(figsize=(12,8))
sns.set(font_scale=1.4)
# Passing x and y as keywords, since newer seaborn versions no longer accept them positionally
sns.barplot(x=diag.index, y=diag.values)
plt.ylabel("Number of Images")
plt.title('Distribution of Chest X-ray Images');

**Visualize Transformation**

We will visualize ten images from the first training batch to check that all transformations to the images were applied correctly.

In [100]:
# Visualize
train_batch = next(train_data)
fig, axes = plt.subplots(2, 5, figsize=(16, 8))
    
for i in range(10):
    # Load image into numpy array and re-scale
    img = np.array(train_batch[0][i] * 255, dtype='uint8')
    ax = axes[i // 5, i % 5]
    ax.imshow(img)
fig.suptitle('Training Images')
plt.tight_layout()
plt.show()

## **Multilayer Perceptron Models**

We will start by setting up a baseline multilayer perceptron model, because this is the simplest kind of neural network. We will iterate on the results, building adjusted models with the goal of obtaining an improved model each time.

### **Baseline Multilayer Perceptron Model**

In [101]:
# Setting up a baseline MLP
baseline = keras.Sequential(
    [
        keras.Input(shape=(150,150,3)), 
        keras.layers.Flatten(), 
        keras.layers.Dense(100, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])

baseline.summary()

In [102]:
# Compiling the baseline MLP
baseline.compile(optimizer='adam',
                      loss='binary_crossentropy',
                      metrics=['acc', 'Recall', 'Precision', 'TruePositives', 'TrueNegatives', 'FalsePositives', 'FalseNegatives'])

In [103]:
# Fitting the baseline MLP (fit accepts generators; fit_generator is deprecated)
baseline_results = baseline.fit(train_data,
                              steps_per_epoch=100,
                              epochs=10,
                              validation_data=test_data)


In [104]:
# Visualizing results
def visualize_results(results):
    '''Plot training and testing loss and accuracy from a Keras History object.

    Note: Keras labels the metrics computed on `validation_data` with the prefix "val". In this
    notebook the testing data, not the validation data, is passed as `validation_data` after each
    model iteration to look at the extent of overfitting. Unless otherwise specified, "val" or
    "validation" in the resulting graphs refers to the testing data.
    '''
    # Creating figure with 2 subplots
    fig, (ax1, ax2) = plt.subplots(1,2,figsize=(16, 8))
    # Getting training history from results
    history = results.history
    # Plotting on first subplot
    ax1.plot(history['loss'])
    ax1.plot(history['val_loss'])
    # Labeling
    ax1.set_xlabel('Epochs')
    ax1.set_ylabel('Loss')
    ax1.legend(['loss', 'val_loss'])
    # Plotting on second subplot
    ax2.plot(history['acc'])
    ax2.plot(history['val_acc'])
    # Labeling
    ax2.set_xlabel('Epochs')
    ax2.set_ylabel('Accuracy')
    ax2.legend(['Accuracy', 'Val_acc'])
    fig.suptitle('Loss and Accuracy of Model')

**Important Note about above function**

The term used to refer to the metrics for testing data is "val", because this is the term used by the neural network models to refer to the testing data metrics (this can be seen by looking at the data that is printed out after each epoch). The testing data, not the validation data, is what is used after each model iteration to look at the extent of overfitting. Unless otherwise specified, when the graphs that result from this function say anything about "val" or "validation," it is referring to testing data.

In [105]:
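def oes_matrix(results):
    """
    Plots a confusion matrix using the history attribute of a Keras History object.
    The counts come from the training metrics of the final epoch.
  
    Parameters:
    results (keras.callbacks.History): History returned by fitting a model compiled with the
        TruePositives, TrueNegatives, FalsePositives and FalseNegatives metrics
  
    Returns:
    None
    """
    history = results.history

    def last(prefix):
        # Keras may append a suffix to metric names (e.g. 'true_positives_1') when several
        # models are compiled in one session, so we match on the prefix
        key = next(k for k in history if k.startswith(prefix))
        return history[key][-1]

    # Rows are true labels, columns are predicted labels
    conf = np.array([[last('true_positives'), last('false_negatives')],
                     [last('false_positives'), last('true_negatives')]])
    fig, ax = plt.subplots(figsize=(10, 8))
    heat = sns.heatmap(conf.astype('int'), annot=True, fmt='g', ax=ax )
    heat.set_xticklabels(['Pneumonia', 'Normal'], fontsize=15)
    heat.set_yticklabels(['Pneumonia', 'Normal'], fontsize=15)
    plt.ylabel('True Label',fontsize=18)
    plt.xlabel('Predicted Label', fontsize=18)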

In [106]:
visualize_results(baseline_results)

**Conclusion**

This model is overfitting; training accuracy is around 95%, while testing accuracy is around 80%. Additionally, the testing loss oscillates too much. Adding another dense layer might help the model pick up on important patterns in the images.

### **Adding another layer to the baseline MLP**

In [107]:
# Setting up a two layer MLP
two_hidden = keras.Sequential(
    [
        keras.Input(shape=(150,150,3)), 
        keras.layers.Flatten(), 
        keras.layers.Dense(100, activation="relu"),
        keras.layers.Dense(50, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])

two_hidden.summary()

In [108]:
# Compiling the two hidden layer MLP
two_hidden.compile(optimizer='adam',
                      loss='binary_crossentropy',
                      metrics=['acc', 'Recall', 'Precision', 'TruePositives', 'TrueNegatives', 'FalsePositives', 'FalseNegatives'])

In [109]:
# Fitting the two hidden layer MLP
two_hidden_results = two_hidden.fit(train_data,
                              steps_per_epoch=100,
                              epochs=10,
                              validation_data=test_data)

In [110]:
visualize_results(two_hidden_results)

**Conclusion**

Once again, the model is overfit. Training accuracy is around thirty percentage points higher than testing accuracy. No regularization has been added yet, so adding some dropout layers, which act as a type of regularization, may improve accuracy and reduce overfitting.
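
For intuition about what a dropout layer does during training, here is a minimal plain-Python sketch of "inverted" dropout, in which kept activations are scaled up so the expected output is unchanged. The helper name `dropout` and the all-ones input are our own assumptions for illustration; the Keras layer handles this internally and is switched off at prediction time.

```python
import random

def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; scale the survivors by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(42)            # fixed seed so the sketch is repeatable
layer_output = [1.0] * 10_000      # made-up activations from a dense layer
dropped = dropout(layer_output, rate=0.25, rng=rng)

zero_fraction = dropped.count(0.0) / len(dropped)   # close to the dropout rate
mean_after = sum(dropped) / len(dropped)            # close to the original mean of 1.0
```

Because a different random subset of units is silenced on every batch, the network cannot rely on any single unit, which is why dropout tends to reduce overfitting.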

### **Adding Dropout layers to the two layer MLP**

In [112]:
two_hidden_dropout = keras.Sequential(
    [
        keras.Input(shape=(150,150,3)), 
        keras.layers.Flatten(), 
        keras.layers.Dense(100, activation="relu"),
        keras.layers.Dropout(0.25),
        keras.layers.Dense(50, activation="relu"),
        keras.layers.Dropout(0.25),
        keras.layers.Dense(1, activation="sigmoid"),
    ])

two_hidden_dropout.summary()

In [113]:
# Compiling the two hidden layer dropout MLP
two_hidden_dropout.compile(optimizer='adam',
                      loss='binary_crossentropy',
                      metrics=['acc', 'Recall', 'Precision', 'TruePositives', 'TrueNegatives', 'FalsePositives', 'FalseNegatives'])

In [114]:
# Fitting the two hidden layer dropout MLP
two_hidden_dropout_results = two_hidden_dropout.fit(train_data,
                              steps_per_epoch=100,
                              epochs=10,
                              validation_data=test_data)

In [115]:
visualize_results(two_hidden_dropout_results)

**Conclusion**

While this model is less overfit than the previous iteration (training data accuracy is about ten percentage points higher than testing accuracy), it is much less accurate than the previous model. Additionally, test accuracy does not improve at all over the epochs. Testing loss is oscillating too frequently and with too great a magnitude. Adding some convolution layers might help to filter the images so as to help the model focus on the most important patterns.

## **Convolutional Neural Network (CNN) Model Iterations**

The MLP models we have run so far are a good start, but in order to get a network that really captures the detail of the images and can pick up on patterns, we need to run some CNNs.
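
To show what a convolution filter picks up, the sketch below slides a hand-picked 2x2 vertical-edge filter over a tiny made-up image. In a real CNN the filter values are learned during training rather than chosen by hand, and the function name `conv2d_valid` is ours. Like Keras's `Conv2D`, it computes a cross-correlation (the kernel is not flipped) with no padding.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Made-up 4x4 image: dark (0) on the left half, bright (1) on the right half
image = [[0, 0, 1, 1] for _ in range(4)]

# Hand-picked filter that responds where brightness increases from left to right
kernel = [[-1, 1],
          [-1, 1]]

response = conv2d_valid(image, kernel)
# Each output row is [0, 2, 0]: the filter fires only at the dark-to-bright boundary
```

The output is nonzero only where the image changes from dark to bright, which is the kind of edge pattern that can help separate the two classes of x-rays.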

In [None]:
# Create model
deep_cnn = keras.Sequential()

# Adding first Conv2D and MaxPool layer, starting small and then growing larger.
deep_cnn.add(keras.layers.Conv2D(32, (2, 2), activation='relu', input_shape=(150, 150, 3)))
deep_cnn.add(keras.layers.MaxPool2D(2, 2))

# Second layer with 64 filters
deep_cnn.add(keras.layers.Conv2D(64, (2, 2), activation='relu'))
deep_cnn.add(keras.layers.MaxPool2D(2, 2))

# Third layer with 96 filters
deep_cnn.add(keras.layers.Conv2D(96, (2, 2), activation='relu'))
deep_cnn.add(keras.layers.MaxPool2D(2, 2))
# Flatten layers, and add densely connected layers for prediction
deep_cnn.add(keras.layers.Flatten())

# Dense layer with 32 nodes
deep_cnn.add(keras.layers.Dense(32, activation='relu'))

# Dense layer with 64 nodes
deep_cnn.add(keras.layers.Dense(64, activation='relu'))

# Dense layer with 96 nodes
deep_cnn.add(keras.layers.Dense(96, activation='relu'))

# Sigmoid output layer
deep_cnn.add(keras.layers.Dense(1, 'sigmoid'))


#Compile model
deep_cnn.compile(
    loss='binary_crossentropy',
    optimizer='sgd',
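    # Adding additional metrics for better monitoring of training.
    # The true/false positive/negative counts are needed by oes_matrix below.
    metrics=['acc', 'Recall', 'Precision', 'TruePositives', 'TrueNegatives', 'FalsePositives', 'FalseNegatives']
    
)

# Fit Model to Training
deep_cnn_results = deep_cnn.fit(train_data,
                              steps_per_epoch=100,
                              epochs=10,
                              validation_data=test_data)

In [None]:
# Pass the History object returned by fit, not the model itself
visualize_results(deep_cnn_results)

In [None]:
oes_matrix(deep_cnn_results)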

**Conclusion**

I added additional metrics on this model for more insight into the results of the training process. As far as performance goes, it is definitely an improvement over the last model in terms of testing accuracy.

Some other notes about the model:

- The model is still overfitting.
- The testing accuracy is not consistently improving.
- Testing recall (val_recall in the epochs) is very high: ~97% of pneumonia cases were identified correctly. This is good, since we decided that, in the context of our business problem, false negatives are more costly than false positives.
- Next, let's change the optimizer to see whether this reduces the overfitting.

### **CNN model with added layers and Adam optimizer**

In [None]:
# Set up for this CNN model is from this blog:  https://machinelearningmastery.com/how-to-develop-a-cnn-from-scratch-for-cifar-10-photo-classification/
cnn_adam = keras.Sequential()
# Input shape matches the 150x150 RGB images produced by the generators
cnn_adam.add(keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(150,150,3)))
cnn_adam.add(keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same'))
cnn_adam.add(keras.layers.MaxPooling2D((2, 2)))
cnn_adam.add(keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
cnn_adam.add(keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
cnn_adam.add(keras.layers.MaxPooling2D((2, 2)))
cnn_adam.add(keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'))
cnn_adam.add(keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'))
cnn_adam.add(keras.layers.MaxPooling2D((2, 2)))

# now, to get the proper output
cnn_adam.add(keras.layers.Flatten())
cnn_adam.add(keras.layers.Dense(128, activation='relu'))
cnn_adam.add(keras.layers.Dense(1, activation='sigmoid'))

cnn_adam.compile(loss='binary_crossentropy',
            optimizer="adam",
            metrics=['acc', 'Recall', 'Precision', 'TruePositives', 'TrueNegatives', 'FalsePositives', 'FalseNegatives'])

In [None]:
cnn_adam_results = cnn_adam.fit(train_data,
                              steps_per_epoch=100,
                              epochs=10,
                              validation_data=test_data)

In [None]:
visualize_results(cnn_adam_results)

**Conclusions**

This CNN model requires much improvement. It is overfitting to a larger degree than the previous model (the difference between training and testing accuracy is around 11 percentage points higher in this model than in the last). The double convolution layers before pooling may not be beneficial to the model, so the convolution strategy will look more like the previous model in the next iteration.

Additionally, the Keras documentation for the Adam optimizer notes that for some types of CNN models, the default value of the hyperparameter epsilon (1e-7) may not be the best; it suggests trying bigger values such as 0.1 or 1. This was attempted in a notebook called "Brooke Image Classification," but it was not beneficial to the ultimate progression of models, so it is not included here.
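
To see why epsilon can matter, the sketch below computes the size of the very first Adam update for a single parameter, following the update rule from the original Adam paper. This is a simplified illustration under that assumption; the Keras implementation may differ in small details, and the gradient value is made up.

```python
import math

def adam_first_step(grad, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """Size of the first Adam update (t = 1) for one parameter with gradient `grad`."""
    m = (1 - beta1) * grad            # first-moment estimate after one step
    v = (1 - beta2) * grad ** 2       # second-moment estimate after one step
    m_hat = m / (1 - beta1)           # bias correction for t = 1
    v_hat = v / (1 - beta2)
    return lr * m_hat / (math.sqrt(v_hat) + eps)

small_grad = 1e-4                                   # made-up small gradient
step_default = adam_first_step(small_grad)          # eps = 1e-7
step_large_eps = adam_first_step(small_grad, eps=0.1)
```

With the default epsilon the step is close to the full learning rate even for a tiny gradient, while a large epsilon shrinks the step for small gradients, making Adam behave more conservatively.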

### **CNN with Dropout Layers, Early Stopping, and More Training**

In [None]:
# Create early stopping object
early_stopping = [
    keras.callbacks.EarlyStopping(monitor='val_loss', patience=3),
    keras.callbacks.ModelCheckpoint(filepath='model.{epoch:02d}-{val_loss:.2f}.h5')
]

# Create model
es_cnn = keras.Sequential()

# Adding first Conv2D and MaxPool layer, starting small and then growing larger.
es_cnn.add(keras.layers.Conv2D(32, (2, 2), activation='relu', input_shape=(150, 150, 3)))
es_cnn.add(keras.layers.MaxPool2D(2, 2))

# Second layer with 64 filters
es_cnn.add(keras.layers.Conv2D(64, (3, 3), activation='relu'))
es_cnn.add(keras.layers.MaxPool2D(3, 3))

# Third layer with 96 filters
es_cnn.add(keras.layers.Conv2D(96, (5, 5), activation='relu'))
es_cnn.add(keras.layers.MaxPool2D(5, 5))
# Flatten layers, and add densely connected layers for prediction
es_cnn.add(keras.layers.Flatten())

# Dense layer with 32 nodes with dropout layer
es_cnn.add(keras.layers.Dense(32, activation='relu'))
es_cnn.add(keras.layers.Dropout(0.3))

# Dense layer with 64 nodes with dropout layer
es_cnn.add(keras.layers.Dense(64, activation='relu'))
es_cnn.add(keras.layers.Dropout(0.3))

# Dense layer with 96 nodes with dropout layer
es_cnn.add(keras.layers.Dense(96, activation='relu'))
es_cnn.add(keras.layers.Dropout(0.3))
# Sigmoid output layer
es_cnn.add(keras.layers.Dense(1, 'sigmoid'))

#Compile model
es_cnn.compile(
    loss='binary_crossentropy',
    optimizer='sgd',
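    # Adding additional metrics for better monitoring of training.
    # The true/false positive/negative counts are needed by oes_matrix below.
    metrics=['acc', 'Recall', 'Precision', 'TruePositives', 'TrueNegatives', 'FalsePositives', 'FalseNegatives']
    
)

# Fit Model to Training
es_cnn_results = es_cnn.fit(train_data,
                              steps_per_epoch=150,
                              epochs=25,
                              validation_data=test_data,
                              callbacks=early_stopping)



In [None]:
visualize_results(es_cnn_results)

In [None]:
# Pass the History object returned by fit
oes_matrix(es_cnn_results)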

**Conclusion**

Early stopping is working as intended. However, I've noticed that the first few epochs always have the same testing accuracy: 0.6250. A constant accuracy like this usually indicates that the model is predicting a single class for every image.

The model may be settling into a local minimum instead of the global minimum in these epochs. I'll try tuning the learning rate of my optimizer to see if that changes anything. I will also try to introduce class weights to the model, as that may help with the problem as well.
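
The patience rule behind early stopping can be sketched in a few lines of plain Python. This is a simplified illustration of the idea rather than the Keras callback itself; the function name and the loss values are hypothetical.

```python
def early_stop_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) for a list of per-epoch losses (0-indexed)."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: remember it and reset the patience counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Training ran to completion without exhausting patience
    return len(val_losses) - 1, best_epoch

# Hypothetical monitored losses; the last value is never reached
losses = [0.70, 0.55, 0.50, 0.52, 0.51, 0.53, 0.40]
stop_epoch, best_epoch = early_stop_epoch(losses, patience=3)
```

In this example training stops at epoch index 5, three epochs after the best loss at index 2. Setting `restore_best_weights=True` on the Keras `EarlyStopping` callback rolls the model back to the weights from that best epoch.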

## **Final Model**

In [None]:
# Create model
op_cnn = keras.Sequential()

# Adding first Conv2D and MaxPool layer, starting small and then growing larger.
op_cnn.add(keras.layers.Conv2D(32, (2, 2), activation='relu', input_shape=(150, 150, 3)))
op_cnn.add(keras.layers.MaxPool2D(2, 2))

# Second layer with 64 filters
op_cnn.add(keras.layers.Conv2D(64, (3, 3), activation='relu'))
op_cnn.add(keras.layers.MaxPool2D(3, 3))

# Third layer with 96 filters
op_cnn.add(keras.layers.Conv2D(96, (5, 5), activation='relu'))
op_cnn.add(keras.layers.MaxPool2D(5, 5))
# Flatten layers, and add densely connected layers for prediction
op_cnn.add(keras.layers.Flatten())

# Dense layer with 32 nodes with dropout layer
op_cnn.add(keras.layers.Dense(32, activation='relu'))
op_cnn.add(keras.layers.Dropout(0.3))

# Dense layer with 64 nodes with dropout layer
op_cnn.add(keras.layers.Dense(64, activation='relu'))
op_cnn.add(keras.layers.Dropout(0.3))

# Dense layer with 96 nodes with dropout layer
op_cnn.add(keras.layers.Dense(96, activation='relu'))
op_cnn.add(keras.layers.Dropout(0.3))
# Sigmoid output layer
op_cnn.add(keras.layers.Dense(1, 'sigmoid'))

# Create early stopping object
op_early_stopping = [
    keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True),
    keras.callbacks.ModelCheckpoint(filepath='model.{epoch:02d}-{val_loss:.2f}.h5')
]
# Create optimizer
optim = SGD(learning_rate=0.001, momentum=0.9, nesterov=True)

# Creating class weights
weights = {
    0: 2.88, # NORMAL
    1: 1.    # PNEUMONIA
}
#Compile model
op_cnn.compile(
    loss='binary_crossentropy',
    optimizer=optim,
    # Adding additional metrics for better monitoring of training.
    metrics=['acc', 'Recall', 'Precision', 'TruePositives', 'TrueNegatives', 'FalsePositives', 'FalseNegatives']
    
)

# Fit Model to Training
op_cnn_results = op_cnn.fit(train_data,
                              class_weight=weights,
                              steps_per_epoch=50,
                              epochs=100,
                              validation_data=test_data,
                              callbacks=op_early_stopping)

In [None]:
visualize_results(op_cnn_results)

In [None]:
oes_matrix(op_cnn_results)

**Conclusion**

The early stopping worked well this time, and the changes to the optimizer, together with the class weights, have had a positive impact on the model. Both training and testing accuracy are now around 88%, so the model is not overfit. Additionally, the testing loss (0.32) is lower than the training loss (0.48); note that the training loss is computed with dropout active and with the class weights applied, so the two values are not directly comparable.
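
The class weights come straight from the class counts reported in the Data Understanding section. The sketch below reproduces that arithmetic and shows one common way class weights enter a binary cross-entropy loss. The function `weighted_bce` and the example predictions are our own illustration; the exact normalization Keras uses internally may differ.

```python
import math

# Class counts from the Data Understanding section
n_pneumonia, n_normal = 3883, 1349

# Weight the minority class (0 = normal) by how much rarer it is
ratio = n_pneumonia / n_normal
class_weight = {0: round(ratio, 2), 1: 1.0}

def weighted_bce(y_true, p_pred, class_weight):
    """Batch mean of per-sample binary cross-entropy, each scaled by its class weight."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += class_weight[y] * loss
    return total / len(y_true)

# Made-up batch: one normal and one pneumonia image, both predicted at 0.5
example_loss = weighted_bce([0, 1], [0.5, 0.5], class_weight)
```

With these weights, a mistake on a normal x-ray costs about 2.88 times as much as the same mistake on a pneumonia x-ray, which discourages the model from simply predicting pneumonia for every image.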

### **Testing the Final Model with the validation data**

In [None]:
# Create model
op_cnn_val = keras.Sequential()

# Adding first Conv2D and MaxPool layer, starting small and then growing larger.
op_cnn_val.add(keras.layers.Conv2D(32, (2, 2), activation='relu', input_shape=(150, 150, 3)))
op_cnn_val.add(keras.layers.MaxPool2D(2, 2))

# Second layer with 64 filters
op_cnn_val.add(keras.layers.Conv2D(64, (3, 3), activation='relu'))
op_cnn_val.add(keras.layers.MaxPool2D(3, 3))

# Third layer with 96 filters
op_cnn_val.add(keras.layers.Conv2D(96, (5, 5), activation='relu'))
op_cnn_val.add(keras.layers.MaxPool2D(5, 5))
# Flatten layers, and add densely connected layers for prediction
op_cnn_val.add(keras.layers.Flatten())

# Dense layer with 32 nodes with dropout layer
op_cnn_val.add(keras.layers.Dense(32, activation='relu'))
op_cnn_val.add(keras.layers.Dropout(0.3))

# Dense layer with 64 nodes with dropout layer
op_cnn_val.add(keras.layers.Dense(64, activation='relu'))
op_cnn_val.add(keras.layers.Dropout(0.3))

# Dense layer with 96 nodes with dropout layer
op_cnn_val.add(keras.layers.Dense(96, activation='relu'))
op_cnn_val.add(keras.layers.Dropout(0.3))
# Sigmoid output layer
op_cnn_val.add(keras.layers.Dense(1, 'sigmoid'))

# Create early stopping object
op_early_stopping = [
    keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True),
    keras.callbacks.ModelCheckpoint(filepath='model.{epoch:02d}-{val_loss:.2f}.h5')
]
# Create optimizer
optim = SGD(learning_rate=0.001, momentum=0.9, nesterov=True)

# Creating class weights
weights = {
    0: 2.88, # NORMAL
    1: 1.    # PNEUMONIA
}


# Compile the model
op_cnn_val.compile(
    loss='binary_crossentropy',
    optimizer=optim,
    # Adding additional metrics for better monitoring of training.
    metrics=['acc', 'Recall', 'Precision', 'TruePositives', 'TrueNegatives', 'FalsePositives', 'FalseNegatives']
    
)

# Fit Model to Training
op_cnn_val_results = op_cnn_val.fit(train_data,
                              class_weight=weights,
                              steps_per_epoch=50,
                              epochs=100,
                              validation_data=val_data,
                              callbacks=op_early_stopping)

In [None]:
visualize_results(op_cnn_val_results)

**Conclusions**

Testing the final model on the validation data (data the model has not yet seen) resulted in overfitting and higher loss than when we used the testing data. This may be because the validation set is so small: it contains only 16 images.
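
A quick calculation shows how coarse a 16-image evaluation is. The sketch below uses the usual binomial standard error for an accuracy estimate; the 600-image comparison set is hypothetical and included only for contrast.

```python
import math

def accuracy_standard_error(acc: float, n: int) -> float:
    """Binomial standard error of an accuracy estimated from n independent images."""
    return math.sqrt(acc * (1 - acc) / n)

n_val = 16
step = 1 / n_val   # each image moves accuracy by 6.25 percentage points

se_small = accuracy_standard_error(0.88, n_val)   # about 0.08
se_large = accuracy_standard_error(0.88, 600)     # hypothetical larger set, about 0.013
```

With only 16 images, a single misclassified x-ray shifts accuracy by more than six percentage points, so results on this set should be read with caution.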

## **Overall Conclusions**

Throughout this process, a series of different neural network models were created, and many different hyperparameters were tuned. In the end, a CNN model with three convolution layers, three dense hidden layers, dropout layers, a Stochastic Gradient Descent optimizer with a learning rate of 0.001 and momentum of 0.9, early stopping, and class weights resulted in the best model, with training and testing accuracy of around 88%. Additionally, the model has a recall of around 92% and a precision of 89%, meaning that it does a good job of minimizing both false negatives and false positives. In practical terms, the model classifies roughly 88% of x-rays correctly. This CNN model should serve well as a decision support tool that assists ER physicians in diagnosing pneumonia.

There are also some further steps we would like to take in order to evaluate the efficacy of the CNN decision support tool, such as calculating the "case save rate" of the model: the number of cases in which the ordering physician would have interpreted the x-ray incorrectly, released the patient, and delayed care, but did not because the CNN decision support tool indicated that the physician may have been incorrect, resulting in immediate care. We would also like to estimate the monetary savings due to the decrease in care delays and lawsuits.