# 1. Business Understanding

“Misdiagnosis of pneumonia will delay treatment and can result in long-term disability or death... pneumonia is misdiagnosed at an alarming rate, even among patients who are readmitted to the hospital after suffering from pneumonia in the recent past... One of the most effective ways to diagnose pneumonia is by chest x-ray. However, a chest x-ray in a person with pneumonia does not always have the characteristic ‘infiltrate’ if it is early in the course of the illness. Sometimes, the infiltrate may be in a portion of the lung that is not easily seen by standard x-ray, and other patients may have congestive heart failure or scarring in their lungs, which can mimic pneumonia.” **This analysis aims to show that colorizing X-ray images decreases the rate of misdiagnosis and its associated complications.**

Source: https://thistlelaw.com/do-you-have-a-case-for-the-misdiagnosis-of-pneumonia/



# 2. Data Understanding

# 3. Data Preparation

To do: upload the dataset, add descriptors and captions, and import the libraries.

## Address Class Imbalance

In [1]:
import warnings
warnings.filterwarnings('ignore')
import itertools
import os

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import keras
from keras import layers, regularizers
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.layers import Convolution2D, Dense, Flatten, MaxPooling2D
from keras.models import Sequential
from keras.optimizers import Adam, Adadelta, Adagrad, SGD, RMSprop
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.metrics import classification_report, confusion_matrix, mean_squared_error
from sklearn.model_selection import cross_val_predict, train_test_split

Using TensorFlow backend.



In [40]:
# Get the Directory Path of the Train, Validation, and Test Images
train_dir = 'chest_xray/chest_xray/train/'
val_dir = 'chest_xray/chest_xray/val/'
test_dir = 'chest_xray/chest_xray/test/'

In [3]:
# Configure the generator. Every transform is disabled (0 / False / None),
# so flow() only re-saves unaltered copies of the source image
datagen = ImageDataGenerator(
            rotation_range = 0,
            width_shift_range = 0,
            height_shift_range = 0,
            rescale = None,
            shear_range = 0,
            zoom_range = 0,
            horizontal_flip = False,
            fill_mode = 'nearest')

### Increase Normal Images to 5600
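Before generating copies, it helps to know how many images each class is short of the 5600 target. The sketch below uses two hypothetical helpers, `count_images` and `images_needed`, that are not part of the original notebook; it assumes every image in a class folder ends in `.jpeg`, `.jpg`, or `.png`.

```python
import os

# Assumption: only files with these extensions count as images.
IMAGE_EXTS = ('.jpeg', '.jpg', '.png')


def count_images(filenames):
    """Count the names in `filenames` that look like image files."""
    return sum(1 for name in filenames if name.lower().endswith(IMAGE_EXTS))


def images_needed(filenames, target=5600):
    """Extra images needed to bring one class up to `target` (never negative)."""
    return max(0, target - count_images(filenames))


def images_needed_in_dir(class_dir, target=5600):
    """Same as images_needed, reading the file names from a class directory."""
    return images_needed(os.listdir(class_dir), target)
```

Usage against this notebook's layout: `images_needed_in_dir('chest_xray/chest_xray/train/NORMAL/')`.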

In [4]:
# Load one hand-picked normal image from the training set to duplicate
norm_img = load_img('chest_xray/chest_xray/train/NORMAL/IM-0140-0001.jpeg')

In [5]:
# Get the size of the image as (width, height)
norm_img.size

(1156, 1237)

In [6]:
# Convert the image to an array
norm_img_array = img_to_array(norm_img)

In [7]:
norm_img_array.shape

(1237, 1156, 3)

In [9]:
# Add a leading batch axis: (height, width, channels) -> (1, height, width, channels)
norm_img_array = norm_img_array.reshape((1,) + norm_img_array.shape)
norm_img_array.shape

(1, 1237, 1156, 3)
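PIL reports `size` as (width, height), while `img_to_array` returns (height, width, channels), which is why a 1156x1237 image becomes an array of shape (1237, 1156, 3). The sketch below uses a zero array as a stand-in for `norm_img_array` and shows that `np.expand_dims`, a swapped-in alternative that the notebook does not use, adds the same leading batch axis as the `reshape` call.

```python
import numpy as np

# PIL size is (width, height); the array layout is (height, width, channels).
width, height = 1156, 1237  # values reported for norm_img above
img_array = np.zeros((height, width, 3), dtype=np.float32)  # stand-in for norm_img_array

# Two equivalent ways to add the batch axis that ImageDataGenerator.flow expects (rank-4 input).
batched_reshape = img_array.reshape((1,) + img_array.shape)
batched_expand = np.expand_dims(img_array, axis=0)

print(batched_reshape.shape)  # (1, 1237, 1156, 3)
print(batched_expand.shape)   # (1, 1237, 1156, 3)
```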

In [10]:
# Run a Test to Gauge Photo Quality
testing = 'test_dir/'

In [11]:
# Save 10 'test' copies to the testing directory to check image quality
os.makedirs(testing, exist_ok=True)  # save_to_dir must already exist
count = 0
for batch in datagen.flow(norm_img_array, batch_size=1, save_to_dir=testing, save_prefix='IM', save_format='jpeg'):
    count += 1
    if count == 10:
        break

print('10 images have been generated at', testing)


10 images have been generated at test_dir/
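The loop's counter only records iterations, so it is worth confirming how many files actually reached disk. Depending on the Keras version, saved files are named `{prefix}_{index}_{hash}.{format}` with a random numeric hash, so repeated copies of a single image can collide and overwrite one another. The hypothetical helper below counts files that start with `IM_`; it assumes the original files (for example `IM-0140-0001.jpeg`, which uses a hyphen) never match that pattern.

```python
import os


def count_generated(directory, prefix='IM', fmt='jpeg'):
    """Count files that look like saved augmentations: '<prefix>_...<.fmt>'.

    Assumption: generated names start with '<prefix>_' (underscore), while the
    original dataset files such as 'IM-0140-0001.jpeg' use a hyphen instead.
    """
    return sum(1 for name in os.listdir(directory)
               if name.startswith(prefix + '_') and name.endswith('.' + fmt))
```

Usage: `count_generated('test_dir/')` should return 10 if no file names collided.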


In [29]:
# List directory where images will be stored
norm_dir = 'chest_xray/chest_xray/train/NORMAL/'

In [30]:
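# Add 3660 images to NORMAL. norm_img_array holds a single image, so each batch
# yields exactly one copy whatever batch_size is (1 is used to make that explicit)
count = 0
for batch in datagen.flow(norm_img_array, batch_size=1, save_to_dir=norm_dir, save_prefix='IM', save_format='jpeg'):
    count += 1
    if count == 3660:
        break
print('3660 images have been generated at', norm_dir)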
# Only 4407 Total

3660 images have been generated at chest_xray/chest_xray/train/NORMAL/


In [37]:
# Add 120 more images to NORMAL; the source array holds one image, so each batch yields one copy
count = 0
for batch in datagen.flow(norm_img_array, batch_size=1, save_to_dir=norm_dir, save_prefix='IM', save_format='jpeg'):
    count += 1
    if count == 120:
        break
print('120 images have been generated at', norm_dir)

120 images have been generated at chest_xray/chest_xray/train/NORMAL/
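Each loop above stops after a fixed number of iterations, and because the source array contains a single image, every iteration saves exactly one file regardless of `batch_size`. The sketch below is a simplified, hypothetical model of that behavior; it assumes, as in Keras' iterator, that each epoch is split into consecutive batches of `batch_size`, the last one possibly short, and that every image in a batch is saved.

```python
def images_written(n_samples, batch_size, iterations):
    """Simplified model of how many images a flow(..., save_to_dir=...) loop saves.

    Assumption: mirrors Keras' batching, where each epoch is sliced into batches of
    `batch_size` and the final batch holds whatever samples remain.
    """
    total = 0
    position = 0  # index of the next sample within the current epoch
    for _ in range(iterations):
        batch = min(batch_size, n_samples - position)
        total += batch
        position += batch
        if position >= n_samples:
            position = 0  # start a new epoch
    return total
```

With `n_samples=1`, `images_written(1, 60, 3660)` equals 3660, matching the printed message: the `batch_size` argument changes nothing when the array holds one image.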


### Increase Pneumonia Images to 5600

In [16]:
# Load one hand-picked pneumonia image from the training set to duplicate
pneum_img = load_img('chest_xray/chest_xray/train/PNEUMONIA/person23_bacteria_92.jpeg')

In [17]:
# Get the size of the image as (width, height)
pneum_img.size

(1080, 712)

In [24]:
pneum_img_array = img_to_array(pneum_img)

In [25]:
pneum_img_array.shape

(712, 1080, 3)

In [26]:
# Add a leading batch axis: (height, width, channels) -> (1, height, width, channels)
pneum_img_array = pneum_img_array.reshape((1,) + pneum_img_array.shape)
pneum_img_array.shape

(1, 712, 1080, 3)

In [31]:
# List directory where images will be stored
pneum_dir = 'chest_xray/chest_xray/train/PNEUMONIA'

In [32]:
# Add 1126 images to PNEUMONIA; the source array holds one image, so each batch yields one copy
count = 0
for batch in datagen.flow(pneum_img_array, batch_size=1, save_to_dir=pneum_dir, save_prefix='IM', save_format='jpeg'):
    count += 1
    if count == 1126:
        break
print('1126 additional images have been generated at', pneum_dir)

1126 additional images have been generated at chest_xray/chest_xray/train/PNEUMONIA


In [35]:
# Add 800 more images to PNEUMONIA; again one copy per batch
count = 0
for batch in datagen.flow(pneum_img_array, batch_size=1, save_to_dir=pneum_dir, save_prefix='IM', save_format='jpeg'):
    count += 1
    if count == 800:
        break
print('800 additional images have been generated at', pneum_dir)

800 additional images have been generated at chest_xray/chest_xray/train/PNEUMONIA
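The cells above balance the classes by writing many identical copies of one image per class. A common alternative, named here plainly as random oversampling, draws extra indices with replacement from every image of the minority class, in memory, without writing files. The sketch assumes images and integer labels are already loaded as arrays (one-hot labels from a generator can be converted with `y.argmax(axis=1)`); `oversample_indices` is a hypothetical helper.

```python
import numpy as np


def oversample_indices(labels, seed=0):
    """Return indices that balance every class to the size of the largest one.

    Random oversampling: each smaller class keeps all of its samples and draws
    the shortfall with replacement from those same samples.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    picked = []
    for cls, count in zip(classes, counts):
        members = np.flatnonzero(labels == cls)  # all indices of this class
        extra = rng.choice(members, size=target - count, replace=True)
        picked.append(np.concatenate([members, extra]))
    return np.concatenate(picked)
```

Usage: `idx = oversample_indices(y)` followed by `X_bal, y_bal = X[idx], y[idx]`, where `X` and `y` are placeholder arrays.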


## Preprocessing

### Read in & Normalize Images

In [42]:
# Load every training image (11,198 after balancing) in a single batch, resized to 96x96 and scaled to [0, 1]
train_generator = ImageDataGenerator(rescale=1./255).flow_from_directory(train_dir,
                                                     target_size=(96,96), batch_size=11200, color_mode='grayscale')

Found 11198 images belonging to 2 classes.


In [43]:
val_generator = ImageDataGenerator(rescale=1./255).flow_from_directory(val_dir,
                                                   target_size=(96,96), batch_size=16, color_mode='grayscale')


Found 16 images belonging to 2 classes.


In [44]:
test_generator = ImageDataGenerator(rescale=1./255).flow_from_directory(test_dir,
                                                   target_size=(96,96), batch_size=624, color_mode='grayscale')



Found 624 images belonging to 2 classes.
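Together, `color_mode='grayscale'` and `rescale=1./255` turn each RGB pixel into one value in [0, 1]. Keras' `load_img` is understood to convert to grayscale with PIL's `convert('L')`, which applies the ITU-R 601-2 luma weights L = R * 299/1000 + G * 587/1000 + B * 114/1000 (an assumption worth checking against the installed version). The sketch below is a float approximation of that pipeline that skips the resize to 96x96; `to_grayscale_unit` is a hypothetical name.

```python
import numpy as np


def to_grayscale_unit(rgb):
    """Approximate color_mode='grayscale' plus rescale=1./255 for a uint8 RGB array.

    Uses the ITU-R 601-2 luma weights; PIL rounds to integers before rescaling,
    so its values can differ slightly from this float version.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    luma = rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114
    # Keep a trailing channel axis, matching the (96, 96, 1) arrays the generators yield.
    return (luma * (1.0 / 255))[..., np.newaxis]
```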


# 4. Modeling
Start with grayscale images; the goal is 80% accuracy.
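The 80% goal can be made concrete against the test set loaded in the preprocessing step: 624 images delivered in one batch, with one-hot labels from the `flow_from_directory` default `class_mode='categorical'`. Meeting the goal means 0.80 * 624 = 499.2, rounded up to at least 500 correct predictions. The helpers below are a hypothetical sketch, not the notebook's evaluation code.

```python
import numpy as np


def accuracy(y_true_onehot, y_pred_probs):
    """Fraction of samples whose predicted class (argmax) matches the one-hot label."""
    true_cls = np.argmax(y_true_onehot, axis=1)
    pred_cls = np.argmax(y_pred_probs, axis=1)
    return float(np.mean(true_cls == pred_cls))


def min_correct_for_goal(n_samples, goal_percent=80):
    """Smallest number of correct predictions that meets the goal (integer ceiling)."""
    return -(-goal_percent * n_samples // 100)
```

Usage, assuming a trained network held in a placeholder `model`: `x_test, y_test = next(test_generator)` followed by `accuracy(y_test, model.predict(x_test))`.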