<h1> Chest X-Ray Image Classification of Pneumonia Diagnoses using Deep Learning Modeling CNNs </h1>

Authors: Ilan Haskel, Justin James, Roshni Janakiraman, Leif Schultz, and Brandon Sienkiewicz

---

Project Overview

In [1]:
from numpy.random import seed
seed(72)
import tensorflow as tf
tf.random.set_seed(72)

# Business Understanding

**Stakeholder:** Children's Hospital - Board of Directors

## Business Case 

Pneumonia is the primary cause of childhood hospitalization [[1]](https://pubmed.ncbi.nlm.nih.gov/25695124/). Pediatric pneumonia is lethal without proper treatment, accounting for 15% of all childhood deaths [[2]](https://www.who.int/en/news-room/fact-sheets/detail/pneumonia).  As the disease progresses, pediatric pneumonia requires a significant amount of hospital resources for treatment and poses a high cost of care for patients [[3]](https://www.sciencedirect.com/science/article/pii/S2352646719300274). Therefore, timely and accurate diagnosis of pneumonia is critical for successful treatment.

There are many complications to quickly and accurately diagnosing pneumonia. To determine the right treatment protocol, doctors need to determine whether pneumonia is *bacterial* or *viral* [[4]](https://www.nejm.org/doi/full/10.1056/NEJMoa1405870). Clinical features alone are not sufficient to accurately diagnose pneumonia [[5]](https://pneumonia.biomedcentral.com/articles/10.15172/pneu.2014.5/464#Sec4).

Chest X-ray evaluation is the current gold standard for diagnosing pneumonia [[6]](https://academic.oup.com/cid/article/31/2/347/293404). However, this method has key limitations. Chest X-ray interpretation is labor intensive and prone to human error, and there is a shortage of radiologists with sufficient training to read chest X-rays [[7]](https://www.thelancet.com/journals/landig/article/PIIS2589-7500(21)00106-0/fulltext). Even for experienced radiologists, reliability and accuracy scores range from 38% to 76% [[8]](https://www.ajronline.org/doi/10.2214/AJR.19.21521).

Artificial intelligence can make diagnosing pneumonia more accurate and more efficient. Computer-aided diagnostic systems have shown reasonable accuracy in detecting infections from X-rays. When aided by AI, radiologists are significantly more accurate at diagnosing pneumonia than when unassisted [[9]](https://www.thelancet.com/journals/landig/article/PIIS2589-7500(21)00106-0/fulltext). Computerized models can increase the overall efficiency of the diagnostic process by reducing the work burden on radiologists. Quicker detection allows doctors to start the treatment protocol sooner, which can reduce the severity and duration of illness.

## Goals of Current Project

We sought to create and optimize image classification models that could diagnose pneumonia from chest X-ray images. Our specific goals for these models were to:

1. Accurately distinguish pneumonia *positive* cases from *negative cases*
2. Given a new chest X-Ray image, accurately classify case as *bacterial pneumonia,* *viral pneumonia* or *non-pneumonia*
3. Minimize *false negative* diagnoses, given lethality of pneumonia without proper treatment
4. Increase the efficiency of the chest X-ray diagnostic process by deploying a quick, simple-to-run testing system.

---

## Data Understanding

### Data Description

### Data Exploration


Before reading in the data, we import all relevant packages. Since our data consists of images, we use `image_dataset_from_directory` from TensorFlow. To prepare the data for analysis, it was loaded as a single batch of images and labels and then converted to NumPy arrays, which are an easier data type to work with.

In [2]:
import tensorflow as tf
from matplotlib import pyplot as plt
import numpy as np
from tensorflow.keras import datasets, layers, models, regularizers
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import seaborn as sns
from tensorflow.keras.callbacks import EarlyStopping

In [3]:
data = tf.keras.preprocessing.image_dataset_from_directory(
    directory='D:/Flatiron/X-Ray_pneumonia__phase_4/data',
    batch_size=10000,
    seed=72    
)

Found 5856 files belonging to 3 classes.


In [4]:
images, labels = next(iter(data))

In [5]:
images, labels = np.array(images), np.array(labels)

### Data Quality

---

## Data Preparation

The data as given already contained a train/test split; however, the provided validation set did not appear to contain any viral pneumonia photos. It was also very small, containing only sixteen photos in total. In order to test accurately on unseen data, we combined the training, testing, and validation datasets and then re-split them to recreate each set. We chose a split of roughly 80% train, 10% test, and 10% validation, as this is a generally accepted proportion. The new split allows for more meaningful evaluation when trying to predict bacterial pneumonia, viral pneumonia, and non-pneumonia.

In [6]:
train_images, test_images, train_labels, test_labels = train_test_split(
    images,
    labels,
    random_state=42,
    test_size=585
)

In [7]:
train_images, val_images, train_labels, val_labels = train_test_split(
    train_images,
    train_labels,
    random_state=42,
    test_size=585
)

In [8]:
train_images.shape

(4686, 256, 256, 3)

In [9]:
test_images.shape

(585, 256, 256, 3)

In [10]:
val_images.shape

(585, 256, 256, 3)
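
The split sizes above can be checked with a little arithmetic. The sketch below assumes only the file count reported when the data was loaded (5,856) and the two `test_size=585` arguments; the variable names are illustrative and not part of the original notebook.

```python
# Sanity check of the train/validation/test sizes (names are illustrative).
total_images = 5856   # "Found 5856 files" reported when the data was loaded
holdout_size = 585    # test_size used in both train_test_split calls

test_count = holdout_size
val_count = holdout_size
train_count = total_images - test_count - val_count

proportions = {
    "train": train_count / total_images,
    "validation": val_count / total_images,
    "test": test_count / total_images,
}

print(train_count)
for name, share in proportions.items():
    print(f"{name}: {share:.1%}")
```

The resulting training count should agree with the first dimension of `train_images.shape`, and the shares come out close to the intended 80/10/10 proportions.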

With our new train/test split, bacterial pneumonia is the dominant class at roughly 47.0% of the training set, followed by the non-pneumonia class at roughly 27.0% and the viral pneumonia class at roughly 26.0%. The classes are not severely unbalanced and, therefore, should not need any upsampling. The next step is to normalize the image matrices. Since pixel intensities range from 0 to 255, we divide the entries by 255. This scales each entry, or pixel value, to between 0 and 1.

In [11]:
pd.DataFrame(train_labels).value_counts(normalize=True)

1    0.470124
0    0.269953
2    0.259923
dtype: float64

In [12]:
train_images, test_images, val_images = train_images/255, test_images/255, val_images/255
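
As a small illustration of the scaling step, the sketch below applies the same division to a made-up 2x2 RGB array. It assumes the raw values are 8-bit intensities stored as floats between 0 and 255; the array `raw` is hypothetical and stands in for one image.

```python
import numpy as np

# Hypothetical 2x2 RGB "image" holding raw 8-bit intensities (0 to 255).
raw = np.array([[[0, 128, 255], [64, 64, 64]],
                [[255, 255, 255], [10, 20, 30]]], dtype=np.float32)

# Same operation that is applied to the train, test and validation arrays
scaled = raw / 255

print(scaled.min(), scaled.max())
```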

Finally, since we are evaluating a multiclass classification problem, we need to transform our labels with `OneHotEncoder`. This process splits the labels into binary columns, one per class, that indicate which class a given image belongs to. Following this process, the data is ready for modelling.

In [13]:
from sklearn.preprocessing import OneHotEncoder

ohe = OneHotEncoder()

# Fit the encoder on the training labels only, then reuse the fitted
# encoder so that all three splits share the same column order
train_labels_encoded = ohe.fit_transform(train_labels.reshape(-1, 1)).toarray()

test_labels_encoded = ohe.transform(test_labels.reshape(-1, 1)).toarray()

val_labels_encoded = ohe.transform(val_labels.reshape(-1, 1)).toarray()
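
For illustration, one-hot encoding of integer class labels can also be sketched with NumPy alone. This is not the `OneHotEncoder` call used above, only a small equivalent example; the label values are made up, and the class order (0 healthy, 1 bacterial, 2 viral) is assumed from the labels used later in `evaluate`.

```python
import numpy as np

# Hypothetical integer labels (assumed order: 0 healthy, 1 bacterial, 2 viral)
labels = np.array([1, 0, 2, 1])
num_classes = 3

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes)[labels]
print(one_hot)

# argmax reverses the encoding, which is how softmax predictions are decoded
recovered = np.argmax(one_hot, axis=1)
print(recovered)
```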

---

## Data Modeling

Before starting the modelling, a function titled `evaluate` was created to visualize the results of each model. This function takes the model and the fitting results and produces the following visualizations: loss for the training and validation sets, accuracy for the training and validation sets, a confusion matrix for the multiclass problem, the accuracy for each class in the multiclass problem, a confusion matrix for the binary problem, and the accuracy for each class in the binary problem. The confusion matrices are built from predictions on `test_images`. The function is detailed below.

In [14]:
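def evaluate(model, results, final=False):
    """Plot diagnostics for a fitted model.

    model   -- fitted Keras model, used to predict on test_images
    results -- History object returned by model.fit
    final   -- if True, label the held-out curves "test" instead of "validation"

    These plots are especially useful because we are most concerned
    with the number of false negatives.
    """

    if final:
        val_label = "test"
    else:
        val_label = "validation"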
        

    #Extracts metrics from the results of the model (model fitting)
    
    train_loss = results.history['loss']
    val_loss = results.history['val_loss']
    train_accuracy = results.history['accuracy']
    val_accuracy = results.history['val_accuracy']

    #Setting up the plots
    
    fig, ((ax1, ax2), (ax3, ax4), (ax5, ax6)) = plt.subplots(3, 2, figsize=(20, 10))

    # Plotting loss
    ax1.set_title("Loss")
    sns.lineplot(x=results.epoch, y=train_loss, ax=ax1, label="train")
    sns.lineplot(x=results.epoch, y=val_loss, ax=ax1, label=val_label)
    ax1.legend()

    # Plotting accuracy
    
    ax2.set_title("Accuracy")
    sns.lineplot(x=results.epoch, y=train_accuracy, ax=ax2, label="train")
    sns.lineplot(x=results.epoch, y=val_accuracy, ax=ax2, label=val_label)
    ax2.legend()
    
    #Uses the model to make predictions on the test images and
    #creates a confusion matrix for the multiclass problem
    #(rows are true labels, columns are predicted labels)
    
    y_pred = model.predict(test_images)
    cm = confusion_matrix(test_labels, np.argmax(y_pred, axis=1))
    cm_df = pd.DataFrame(cm)
    
    #Plotting the multiclass confusion matrix

    sns.heatmap(cm, ax=ax3, annot=True, cmap='Blues', fmt='0.5g')
    
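    #Setting up the barplot showing the accuracy of each class
    #This involves creating labels and heights for the plot
    #The heights are determined from the values in the confusion
    #matrix: cm_df[i][i] is the diagonal count for class i and
    #sum(cm_df[i]) is the total of column i, i.e. every image
    #predicted as class i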

    label = ['Healthy Accuracy',
             'Bacterial Accuracy',
             'Viral Accuracy']

    height = [(cm_df[0][0]/sum(cm_df[0]))*100,
              (cm_df[1][1]/sum(cm_df[1]))*100,
              (cm_df[2][2]/sum(cm_df[2]))*100]
    
    #Plotting the class accuracy

    ax4.bar(x=label, height=height)
    plt.sca(ax4)
    xlocs, xlabs = plt.xticks()
    plt.ylim(top=100)
    plt.ylabel('Accuracy Percentage')
    plt.title('Model Accuracy')
    for i, j in enumerate(height):
        ax4.text(xlocs[i],
                 j-30,
                 ((str(round(j,1)))+'%'),
                 ha ='center',
                 bbox = dict(facecolor = 'white', alpha = .5))
        
    #Using the previous confusion matrix to create a binary
    #confusion matrix: healthy versus pneumonia, with the
    #bacterial and viral classes combined
        
    cm_simple = [[cm_df[0][0], cm_df[1][0]+cm_df[2][0]],
                 [cm_df[0][1]+cm_df[0][2], cm_df[1][1]+cm_df[1][2]+cm_df[2][1]+cm_df[2][2]]]
    cm_simple_df = pd.DataFrame(cm_simple)
    
    #Plotting the binary confusion matrix
    
    sns.heatmap(cm_simple, ax=ax5, annot=True, cmap='Blues', fmt='0.5g')
    
    #Setting up the barplot showing the accuracy of each class
    #This involves creating labels and heights for the plot
    #The heights are determined from the values in the confusion
    #matrix
    
    simple_label = ['Healthy\n Accuracy',
                    'Pneumonia\n Accuracy']
    
    simple_height = [(cm_simple_df[0][0]/sum(cm_simple_df[0]))*100,
                     (cm_simple_df[1][1]/sum(cm_simple_df[1]))*100]
    
    #Plotting the class accuracy
    
    ax6.bar(x=simple_label, height=simple_height)
    plt.sca(ax6)
    xlocs, xlabs = plt.xticks()
    plt.ylim(top=100)
    plt.ylabel('Accuracy Percentage')
    plt.title('Model Accuracy')
    for k, l in enumerate(simple_height):
        ax6.text(xlocs[k],
                 l-30,
                 ((str(round(l,1)))+'%'),
                 ha ='center',
                 bbox = dict(facecolor = 'white', alpha = .5))
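
To make the confusion-matrix arithmetic inside `evaluate` easier to follow, here is a minimal sketch on a made-up 3x3 matrix. The counts are purely illustrative and are not results from any model in this notebook; the variable names are also hypothetical. It contrasts the column-normalised share that `evaluate` plots with the row-normalised share, and shows how the bacterial and viral classes collapse into one pneumonia class.

```python
import numpy as np

# Hypothetical confusion matrix (rows = true class, columns = predicted class),
# ordered healthy, bacterial, viral. The counts are made up for illustration.
cm = np.array([[140,  12,   8],
               [  6, 240,  30],
               [  9,  50,  90]])

# Column-normalised diagonal: of the images predicted as a class, the share
# that is correct. This mirrors cm_df[i][i] / sum(cm_df[i]) in evaluate.
predicted_share_correct = np.diag(cm) / cm.sum(axis=0)

# Row-normalised diagonal: of the images truly in a class, the share found.
true_share_found = np.diag(cm) / cm.sum(axis=1)

# Collapse bacterial and viral into a single pneumonia class.
binary = np.array([[cm[0, 0],        cm[0, 1:].sum()],
                   [cm[1:, 0].sum(), cm[1:, 1:].sum()]])

# Pneumonia cases that were predicted as healthy
false_negatives = binary[1, 0]

print(predicted_share_correct.round(3))
print(true_share_found.round(3))
print(binary)
```

The two normalisations answer different questions, so it helps to state which one a reported per-class percentage refers to.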

#### Baseline Model (Iteration One)

For the baseline model, a sequential model was chosen. The parameters were chosen to keep the number of nodes manageable. The input layer is a 2D convolutional layer with a 2x2 convolutional window and 64 filters, using the ReLU activation. `same` padding was added so that the convolutions preserve the spatial dimensions, which leaves room for more layers in the future. The first hidden layer is a 2D max pooling layer, which downsamples its input. The next hidden layers are a 2D convolutional layer with 32 filters followed by a 2D max pooling layer. This pair is repeated once more, except that the convolutional layer has 16 filters. A flattening layer is then added, followed by a dense layer with 500 units. This will likely cause overfitting; however, since this is a baseline model, we can tune it in future iterations. The output layer is a dense layer with 3 units, one for each output category of the model, and a softmax activation.

In [15]:
baseline = models.Sequential()

baseline.add(layers.Conv2D(filters=64,
                           kernel_size=2,
                           padding="same",
                           activation="relu",
                           input_shape=(256,256,3)))

baseline.add(layers.MaxPooling2D(pool_size=2))

baseline.add(layers.Conv2D(filters=32,
                           kernel_size=2,
                           padding="same",
                           activation ="relu"))
baseline.add(layers.MaxPooling2D(pool_size=2))
baseline.add(layers.Conv2D(filters=16,
                           kernel_size=2,
                           padding="same",
                           activation="relu"))
baseline.add(layers.MaxPooling2D(pool_size=2))

baseline.add(layers.Flatten())

baseline.add(layers.Dense(500,activation="relu"))

baseline.add(layers.Dense(3,activation="softmax"))
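
A rough parameter count helps explain why the 500-unit dense layer is the likely source of overfitting. The sketch below assumes the standard formulas for convolutional and dense layers (weights plus one bias per filter or unit); the helper names `conv_params` and `dense_params` are illustrative and not part of Keras.

```python
# Rough parameter count for the baseline architecture (illustrative helpers).

def conv_params(kernel, in_channels, filters):
    # kernel x kernel weights per input channel per filter, plus one bias per filter
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    # one weight per input per unit, plus one bias per unit
    return inputs * units + units

side = 256
for _ in range(3):   # three MaxPooling2D(pool_size=2) layers halve the side each time
    side //= 2       # "same" padding keeps the side unchanged through each Conv2D

flattened = side * side * 16   # 16 filters in the last convolutional layer

counts = {
    "conv_1": conv_params(2, 3, 64),
    "conv_2": conv_params(2, 64, 32),
    "conv_3": conv_params(2, 32, 16),
    "dense_500": dense_params(flattened, 500),
    "output": dense_params(500, 3),
}
total = sum(counts.values())
for name, n in counts.items():
    print(f"{name}: {n:,} ({n / total:.1%})")
```

Under these assumptions nearly all trainable weights sit in the first dense layer, which is why later iterations shrink it.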

Following model creation, the model was compiled using `adam` as the optimizer and `categorical_crossentropy` as the loss function. `categorical_crossentropy` was chosen because the labels passed to the model (`train_labels_encoded`) are one-hot encoded arrays rather than integer class indices, which would call for `sparse_categorical_crossentropy`. `accuracy` was chosen for the metric as it most closely fits our evaluation metric. The confusion matrices will be the best indicator of model performance as we are looking to minimize the false negatives.

In [16]:
baseline.compile(optimizer='adam',
                 loss='categorical_crossentropy',
                 metrics=['accuracy'])

results = baseline.fit(
    train_images, 
    train_labels_encoded,
    validation_data=(val_images, val_labels_encoded),
    epochs=10,
    batch_size=128)
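
The relationship between the two cross-entropy variants can be sketched for a single prediction. The probabilities and label below are made up, and the calculation ignores the small clipping that a real implementation may apply to avoid taking the log of zero.

```python
import math

# Hypothetical softmax output for one image and its label (class 1 = bacterial)
predicted = [0.2, 0.7, 0.1]
one_hot_label = [0.0, 1.0, 0.0]
integer_label = 1

# categorical cross-entropy: -sum(y_true * log(y_pred)) over the classes
categorical = -sum(t * math.log(p) for t, p in zip(one_hot_label, predicted))

# sparse categorical cross-entropy: same quantity, indexed by the integer label
sparse = -math.log(predicted[integer_label])

print(categorical, sparse)
```

Both forms give the same value; they differ only in the label format they expect.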


The baseline model completed with an accuracy of 88.3% on the training data and 81.2% on the validation data. The gap between the two indicates overfitting, which is visible in the top two visualizations below. The model also does a decent job of predicting each type of pneumonia, with 76.2% accuracy when predicting bacterial pneumonia and 63.2% when predicting viral pneumonia. There are only 18 false negatives in this model, a false negative rate of roughly 10.6%. This is a good start for the baseline model, but the rate could be reduced.

In [None]:
evaluate(baseline, results)

#### Second Model Iteration

For the second model iteration, the layers before the flattening layer keep the same structure; in the code, the second and third convolutional layers use a 3x3 window instead of 2x2. Instead of one dense layer following flattening, two dense layers with 128 and 64 units replace it. This should help to reduce overfitting. The model was compiled and fit in the same way as the `baseline`.

In [None]:
model2 = models.Sequential()

model2.add(layers.Conv2D(filters=64,
                         kernel_size=2,
                         padding="same",
                         activation="relu",
                         input_shape=(256,256,3)))

model2.add(layers.MaxPooling2D(pool_size=2))

model2.add(layers.Conv2D(filters=32,
                         kernel_size=3,
                         padding="same",
                         activation="relu"))
model2.add(layers.MaxPooling2D(pool_size=2))
model2.add(layers.Conv2D(filters=16,
                         kernel_size=3,
                         padding="same",
                         activation="relu"))
model2.add(layers.MaxPooling2D(pool_size=2))

model2.add(layers.Flatten())

model2.add(layers.Dense(128,activation="relu"))
model2.add(layers.Dense(64,activation="relu"))
model2.add(layers.Dense(3,activation="softmax"))

In [None]:
model2.compile(optimizer='adam',
               loss='categorical_crossentropy',
               metrics=['accuracy'])

In [None]:
results = model2.fit(
    train_images, 
    train_labels_encoded,
    validation_data=(val_images, val_labels_encoded),
    epochs=10,
    batch_size=128)

While the second model does appear to reduce overfitting and to predict bacterial and viral pneumonia more accurately, it has a higher false negative rate than our baseline model: 11.9% versus 10.6%. Since we are trying to minimize false negatives, this model performs worse overall than our baseline. However, the changes to the dense layers appear to have reduced overfitting, so these layers will be kept going forward.

In [None]:
evaluate(model2, results)

#### Third Model Iteration

The third model adds a few layers and parameters. First, L2 regularization with a regularization factor of 0.05 is added to the existing hidden 2D convolutional layers. Then a second 2D convolutional layer is added between each of those layers and its max pooling layer. The new layers have the same set-up as the layers they follow, with 16 and 8 filters respectively. In the code, the input layer also uses a 3x3 window without padding, and the hidden convolutional layers use 2x2 windows. Finally, in order to further reduce overfitting, the two previous dense layers after flattening were replaced with three dense layers of 64, 32, and 16 units. The model was compiled and fit in the same way as the `baseline`.

In [None]:
model3 = models.Sequential()

model3.add(layers.Conv2D(filters=64,
                        kernel_size=3,
                        activation="relu",
                        input_shape=(256,256,3)))

model3.add(layers.MaxPooling2D(pool_size=2))

model3.add(layers.Conv2D(filters=32,
                        kernel_size=2,
                        padding="same",
                        activation ="relu",
                        kernel_regularizer=regularizers.L2(l=0.05)))
model3.add(layers.Conv2D(filters=16,
                        kernel_size=2,
                        padding="same",
                        activation ="relu"))
model3.add(layers.MaxPooling2D(pool_size=2))
model3.add(layers.Conv2D(filters=16,
                        kernel_size=2,
                        padding="same",
                        activation="relu",
                        kernel_regularizer=regularizers.L2(l=0.05)))
model3.add(layers.Conv2D(filters=8,
                        kernel_size=2,
                        padding="same",
                        activation="relu"))
model3.add(layers.MaxPooling2D(pool_size=2))

model3.add(layers.Flatten())

model3.add(layers.Dense(64,activation="relu"))
model3.add(layers.Dense(32,activation="relu"))
model3.add(layers.Dense(16,activation="relu"))
model3.add(layers.Dense(3,activation="softmax"))
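
As a brief illustration of what the L2 term contributes, the sketch below computes the penalty for one small kernel. It assumes the usual definition of an L2 penalty (factor times the sum of squared weights); the weight values and the data loss are made up.

```python
import numpy as np

# Hypothetical 2x2 kernel weights for a single filter; values are made up.
kernel = np.array([[0.5, -0.2],
                   [0.1,  0.4]])
l2_factor = 0.05   # regularization factor used in the model above
data_loss = 0.80   # hypothetical cross-entropy value

# L2 penalty: factor * sum of squared weights, added to the data loss in training
penalty = l2_factor * np.sum(kernel ** 2)
total_loss = data_loss + penalty

print(penalty, total_loss)
```

Because the penalty grows with the squared size of the weights, the optimizer is pushed toward smaller weights, which tends to reduce overfitting.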

In [None]:
model3.compile(optimizer='adam',
               loss='categorical_crossentropy',
               metrics=['accuracy'])

In [None]:
results = model3.fit(
    train_images, 
    train_labels_encoded,
    validation_data=(val_images, val_labels_encoded),
    epochs=10,
    batch_size=128)

As seen below, this model greatly reduced overfitting and had a false negative rate of 7.5%, a vast improvement over the previous models. It also has decent accuracy when predicting bacterial and viral pneumonia: 75.7% and 62.8%, respectively.

In [None]:
evaluate(model3, results)

#### Third Model (Second Iteration)

To try to improve this model, L2 regularization was added to the two remaining hidden 2D convolutional layers, and two 25% dropout layers were added, one after each of the last two max pooling layers. The model was compiled and fit in the same way as the `baseline`.

In [None]:
model3b = models.Sequential()

model3b.add(layers.Conv2D(filters=64,
                        kernel_size=3,
                        activation="relu",
                        input_shape=(256,256,3)))

model3b.add(layers.MaxPooling2D(pool_size=2))

model3b.add(layers.Conv2D(filters=32,
                        kernel_size=2,
                        padding="same",
                        activation ="relu",
                        kernel_regularizer=regularizers.L2(l=0.05)))
model3b.add(layers.Conv2D(filters=16,
                        kernel_size=2,
                        padding="same",
                        activation ="relu",
                        kernel_regularizer=regularizers.L2(l=0.05)))
model3b.add(layers.MaxPooling2D(pool_size=2))

model3b.add(layers.Dropout(0.25))

model3b.add(layers.Conv2D(filters=16,
                        kernel_size=2,
                        padding="same",
                        activation="relu",
                        kernel_regularizer=regularizers.L2(l=0.05)))
model3b.add(layers.Conv2D(filters=8,
                        kernel_size=2,
                        padding="same",
                        activation="relu",
                        kernel_regularizer=regularizers.L2(l=0.05)))
model3b.add(layers.MaxPooling2D(pool_size=2))

model3b.add(layers.Dropout(0.25))

model3b.add(layers.Flatten())

model3b.add(layers.Dense(64,activation="relu"))
model3b.add(layers.Dense(32,activation="relu"))
model3b.add(layers.Dense(16,activation="relu"))
model3b.add(layers.Dense(3,activation="softmax"))
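
The effect of a 25% dropout layer during training can be sketched with NumPy. This is a simplified stand-in for the layer's behaviour, assuming the common "inverted dropout" scheme in which surviving activations are rescaled; the input array is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(72)   # seed chosen to match the notebook; any seed works
rate = 0.25

activations = np.ones(10_000)     # hypothetical layer output

# Training-time behaviour: drop each unit with probability `rate`, then
# scale the survivors by 1 / (1 - rate) so the expected total is unchanged.
keep_mask = rng.random(activations.shape) >= rate
dropped = np.where(keep_mask, activations / (1 - rate), 0.0)

print(f"share of zeroed units: {1 - keep_mask.mean():.3f}")
print(f"mean activation after dropout: {dropped.mean():.3f}")
```

At prediction time no units are dropped, so the layer passes its input through unchanged.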

In [None]:
model3b.compile(optimizer='adam',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

In [None]:
results = model3b.fit(
    train_images, 
    train_labels_encoded,
    validation_data=(val_images, val_labels_encoded),
    epochs=10,
    batch_size=128)

Unfortunately, this model did not perform as well. It appears to be somewhat underfit and has a higher false negative rate: 10.6% versus 7.5% for the previous model.

In [None]:
evaluate(model3b, results)

#### Fourth Model

This model adds a 25% dropout layer and a dense layer with 128 units to the first iteration of the third model. In the code, the two regularized convolutional layers also use a 3x3 window instead of 2x2. The model was compiled and fit in the same way as the `baseline`.

In [None]:
model4 = models.Sequential()

model4.add(layers.Conv2D(filters=64,
                        kernel_size=3,
                        activation="relu",
                        input_shape=(256,256,3)))

model4.add(layers.MaxPooling2D(pool_size=2))

model4.add(layers.Conv2D(filters=32,
                        kernel_size=3,
                        padding="same",
                        activation ="relu",
                        kernel_regularizer=regularizers.L2(l=0.05)))
model4.add(layers.Conv2D(filters=16,
                        kernel_size=2,
                        padding="same",
                        activation ="relu"))
model4.add(layers.MaxPooling2D(pool_size=2))

model4.add(layers.Dropout(0.25))

model4.add(layers.Conv2D(filters=16,
                        kernel_size=3,
                        padding="same",
                        activation="relu",
                        kernel_regularizer=regularizers.L2(l=0.05)))
model4.add(layers.Conv2D(filters=8,
                        kernel_size=2,
                        padding="same",
                        activation="relu"))
model4.add(layers.MaxPooling2D(pool_size=2))

model4.add(layers.Flatten())

model4.add(layers.Dense(128,activation="relu"))
model4.add(layers.Dense(64,activation="relu"))
model4.add(layers.Dense(32,activation="relu"))
model4.add(layers.Dense(16,activation="relu"))
model4.add(layers.Dense(3,activation="softmax"))

In [None]:
model4.compile(optimizer='adam',
               loss='categorical_crossentropy',
               metrics=['accuracy'])

In [None]:
results = model4.fit(
    train_images, 
    train_labels_encoded,
    validation_data=(val_images, val_labels_encoded),
    epochs=10,
    batch_size=128)

This model improved the accuracy for predicting pneumonia at the cost of a higher false negative rate. With the false negative rate as our key metric, this is our worst model overall (18.1% false negative rate). It also appears to be slightly underfit.

In [None]:
evaluate(model4, results)

#### Fourth Model (Second Iteration)

To capitalize on the increased accuracy in bacterial and viral pneumonia predictions, we iterate on this model to reduce the false negative rate. Since the previous model appears to be slightly underfit, the number of epochs was raised to 50 and an early stop was defined, so that training can continue for longer and reach a lower loss. Training stops early when `val_loss` has not improved by more than `min_delta` for 5 consecutive epochs.

In [None]:
model4b = models.Sequential()

model4b.add(layers.Conv2D(filters=64,
                        kernel_size=3,
                        activation="relu",
                        input_shape=(256,256,3)))

model4b.add(layers.MaxPooling2D(pool_size=2))

model4b.add(layers.Conv2D(filters=32,
                        kernel_size=3,
                        padding="same",
                        activation ="relu",
                        kernel_regularizer=regularizers.L2(l=0.05)))
model4b.add(layers.Conv2D(filters=16,
                        kernel_size=2,
                        padding="same",
                        activation ="relu"))
model4b.add(layers.MaxPooling2D(pool_size=2))

model4b.add(layers.Dropout(0.25))

model4b.add(layers.Conv2D(filters=16,
                        kernel_size=3,
                        padding="same",
                        activation="relu",
                        kernel_regularizer=regularizers.L2(l=0.05)))
model4b.add(layers.Conv2D(filters=8,
                        kernel_size=2,
                        padding="same",
                        activation="relu"))
model4b.add(layers.MaxPooling2D(pool_size=2))

model4b.add(layers.Flatten())

model4b.add(layers.Dense(128,activation="relu"))
model4b.add(layers.Dense(64,activation="relu"))
model4b.add(layers.Dense(32,activation="relu"))
model4b.add(layers.Dense(16,activation="relu"))
model4b.add(layers.Dense(3,activation="softmax"))

In [None]:
model4b.compile(optimizer='adam',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

In [None]:
early_stop = EarlyStopping(monitor='val_loss',
                           min_delta=1e-8,
                           verbose=1,
                           mode='min',
                           patience=5)
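
The patience rule can be illustrated with a few lines of plain Python. This is a simplified sketch of the idea, not the Keras implementation; the function name `early_stopping_epoch` and the validation losses are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=1e-8):
    """Return the 0-based epoch at which training would stop."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Improvement larger than min_delta: remember it and reset the counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    # Patience never ran out, so training runs through the last epoch
    return len(val_losses) - 1


# Made-up validation losses: the best value occurs at epoch 2
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.6]
stop_epoch = early_stopping_epoch(losses)
print(stop_epoch)
```

In this made-up run, training halts before the later, lower loss is ever reached, which is the trade-off that the `patience` setting controls.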

In [None]:
results = model4b.fit(
    train_images, 
    train_labels_encoded,
    validation_data=(val_images, val_labels_encoded),
    epochs=50,
    batch_size=128,
    callbacks=[early_stop]
)

This model performed better than the first iteration of model 4; however, the false negative rate is still not ideal at 12.5%.

In [None]:
evaluate(model4b, results)

#### Fourth Model (Third Iteration)

This iteration of the model removes the dropout layer. This will be the final attempt to optimize the fourth model. It reuses the early stop from the second iteration and is fit in the same way as `model4b`.

In [None]:
model4c = models.Sequential()

model4c.add(layers.Conv2D(filters=64,
                        kernel_size=3,
                        activation="relu",
                        input_shape=(256,256,3)))

model4c.add(layers.MaxPooling2D(pool_size=2))

model4c.add(layers.Conv2D(filters=32,
                        kernel_size=3,
                        padding="same",
                        activation ="relu",
                        kernel_regularizer=regularizers.L2(l=0.05)))
model4c.add(layers.Conv2D(filters=16,
                        kernel_size=2,
                        padding="same",
                        activation ="relu"))
model4c.add(layers.MaxPooling2D(pool_size=2))

model4c.add(layers.Conv2D(filters=16,
                        kernel_size=3,
                        padding="same",
                        activation="relu",
                        kernel_regularizer=regularizers.L2(l=0.05)))
model4c.add(layers.Conv2D(filters=8,
                        kernel_size=2,
                        padding="same",
                        activation="relu"))
model4c.add(layers.MaxPooling2D(pool_size=2))

model4c.add(layers.Flatten())

model4c.add(layers.Dense(128,activation="relu"))
model4c.add(layers.Dense(64,activation="relu"))
model4c.add(layers.Dense(32,activation="relu"))
model4c.add(layers.Dense(16,activation="relu"))
model4c.add(layers.Dense(3,activation="softmax"))

In [None]:
model4c.compile(optimizer='adam',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

In [None]:
results = model4c.fit(
    train_images, 
    train_labels_encoded,
    validation_data=(val_images, val_labels_encoded),
    epochs=50,
    batch_size=128,
    callbacks=[early_stop]
)

While this model improves on the previous iterations, it appears to be slightly overfit and has a false negative rate of 11.9%, despite an accuracy of 79.8% when predicting bacterial pneumonia.

In [None]:
evaluate(model4c, results)

#### Final Model

The base third model performed the best and was therefore chosen for the final model. Initially, the model was run with early stopping; however, after evaluating those results, training for 15 epochs appeared to be optimal. This is the value with which the final model was fit.

In [None]:
final_model = models.Sequential()

final_model.add(layers.Conv2D(filters=64,
                        kernel_size=3,
                        activation="relu",
                        input_shape=(256,256,3)))

final_model.add(layers.MaxPooling2D(pool_size=2))

final_model.add(layers.Conv2D(filters=32,
                        kernel_size=2,
                        padding="same",
                        activation ="relu",
                        kernel_regularizer=regularizers.L2(l=0.05)))
final_model.add(layers.Conv2D(filters=16,
                        kernel_size=2,
                        padding="same",
                        activation ="relu"))
final_model.add(layers.MaxPooling2D(pool_size=2))
final_model.add(layers.Conv2D(filters=16,
                        kernel_size=2,
                        padding="same",
                        activation="relu",
                        kernel_regularizer=regularizers.L2(l=0.05)))
final_model.add(layers.Conv2D(filters=8,
                        kernel_size=2,
                        padding="same",
                        activation="relu"))
final_model.add(layers.MaxPooling2D(pool_size=2))

final_model.add(layers.Flatten())

final_model.add(layers.Dense(64,activation="relu"))
final_model.add(layers.Dense(32,activation="relu"))
final_model.add(layers.Dense(16,activation="relu"))
final_model.add(layers.Dense(3,activation="softmax"))

In [None]:
final_model.compile(optimizer='adam',
                    loss='categorical_crossentropy',
                    metrics=['accuracy'])

In [None]:
results = final_model.fit(
    train_images, 
    train_labels_encoded,
    validation_data=(val_images, val_labels_encoded),
    epochs=15,
    batch_size=128
)

The final iteration of the model performed very well when predicting bacterial pneumonia at an accuracy of 79.9%. It also has the lowest rate for false negatives at just 7.4%. This model does not appear to be overfit and also appears to do a good job of minimizing the loss as seen below.

In [None]:
evaluate(final_model, results, final=True)