# Pneumonia Classifier
In this notebook we train a CNN classifier that predicts pneumonia from chest X-ray images.
The dataset comes from: https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia  

Load the data:

In [1]:
from PIL import Image
import numpy as np
import operator
import os

max_size = 127

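def loadData(path,label):
    # Read every .jpeg in `path` as grayscale, resize it to max_size x max_size,
    # and scale pixel values to [0, 1]. Every image gets the same `label`.
    Xresult,Yresult,shapes_result,debug = [],[],[],[]
    for filename in os.listdir(path):
        if filename.endswith(".jpeg"):
            img = Image.open(os.path.join(path,filename)).convert("L")
            # Image.ANTIALIAS was removed in Pillow 10; Image.LANCZOS is the same filter.
            read_img = np.array(img.resize((max_size,max_size),Image.LANCZOS))/255.
            Xresult.append(read_img)
            Yresult.append(label)
            shapes_result.append(read_img.shape)
            debug.append(filename)
    return np.array(Xresult),Yresult,shapes_result,debug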

shapes,debug = [],[]
NORMAL = 0
PNEUMONIA = 1

Xnormal,Ynormal,shapes_normal,debug_normal = loadData('chest_xray/train/NORMAL',NORMAL)
Xpneumonia,Ypneumonia,shapes_pneumonia,debug_pneumonia = loadData('chest_xray/train/PNEUMONIA',PNEUMONIA)
Xtrain = np.vstack((Xnormal,Xpneumonia))
Ytrain = Ynormal + Ypneumonia
shapes = np.vstack((shapes_normal,shapes_pneumonia))
debug.extend(debug_normal)
debug.extend(debug_pneumonia)

Xnormal,Ynormal,shapes_normal,debug_normal = loadData('chest_xray/test/NORMAL',NORMAL)
Xpneumonia,Ypneumonia,shapes_pneumonia,debug_pneumonia = loadData('chest_xray/test/PNEUMONIA',PNEUMONIA)
Xtest = np.vstack((Xnormal,Xpneumonia))
Ytest = Ynormal + Ypneumonia
shapes = np.vstack((shapes,shapes_normal,shapes_pneumonia))
debug.extend(debug_normal)
debug.extend(debug_pneumonia)

Xnormal,Ynormal,shapes_normal,debug_normal = loadData('chest_xray/val/NORMAL',NORMAL)
Xpneumonia,Ypneumonia,shapes_pneumonia,debug_pneumonia = loadData('chest_xray/val/PNEUMONIA',PNEUMONIA)
Xval = np.vstack((Xnormal,Xpneumonia))
Yval = Ynormal + Ypneumonia
shapes = np.vstack((shapes,shapes_normal,shapes_pneumonia))
debug.extend(debug_normal)
debug.extend(debug_pneumonia)

print("Xtrain shape: ",Xtrain.shape)
print("Ytrain size: ",len(Ytrain))
print("Xtest shape: ",Xtest.shape)
print("Ytest size: ",len(Ytest))
print("Xval shape: ",Xval.shape)
print("Yval size: ",len(Yval))
print("input size: ",len(shapes))
print("another input size:(should be same) ",len(debug))

Xtrain shape:  (5216, 127, 127)
Ytrain size:  5216
Xtest shape:  (624, 127, 127)
Ytest size:  624
Xval shape:  (16, 127, 127)
Yval size:  16
input size:  5856
another input size:(should be same)  5856


Make a new evaluation dataset (equal representation from train, test, and val, and equal normal vs. pneumonia representation).  
Shuffle the training and testing datasets.  
Make a new testing dataset composed of the train and test folders (equal normal vs. pneumonia representation).  
Reshape the input data to be compatible with Conv2D input.

In [2]:
from sklearn.model_selection import train_test_split

def splitEqual(data,labels,size):
    Xresult = [np.zeros((len(data[0]),len(data[0])))]
    Yresult = []
    count_normal = 0
    count_pneumonia = 0
    i = 0
    while count_normal < size or count_pneumonia < size:
        if (labels[i] == NORMAL and count_normal < size) or (labels[i] == PNEUMONIA and count_pneumonia < size):
            if labels[i] == NORMAL:
                count_normal += 1
            else:
                count_pneumonia += 1
            Xresult = np.append(Xresult,[data[i]],axis = 0)
            Yresult = Yresult+[labels[i]]
            data = np.delete(data,i,axis=0)
            del labels[i]
        else:
            i += 1
    Xresult = np.delete(Xresult,0,axis=0)
    return (data,labels,Xresult,Yresult)

def countLabels(labels):
    count_normal = 0
    count_pne = 0
    for y in labels:
        if y == NORMAL:
            count_normal +=1
        else:
            count_pne += 1
    return (count_normal,count_pne)

Xtrain,Ytrain,Xval1,Yval1 = splitEqual(Xtrain,Ytrain,8)
Xtest,Ytest,Xval2,Yval2 = splitEqual(Xtest,Ytest,8)
Xval = np.append(Xval,Xval1,axis = 0)
Xval = np.append(Xval,Xval2,axis = 0)
Yval = Yval + Yval1 + Yval2

Xtrain1, Xtrain2, Ytrain1, Ytrain2 = train_test_split(Xtrain,Ytrain, test_size=0.5, shuffle=True, random_state=42)
Xtrain = np.vstack((Xtrain1,Xtrain2))
Ytrain = Ytrain1+Ytrain2

count_normal,count_pne = countLabels(Ytrain)
print("train set - normal: ",count_normal," ,pneumonia: ",count_pne)

count_normal,count_pne = countLabels(Ytest)
print("test set - normal: ",count_normal," ,pneumonia: ",count_pne)

count_normal,count_pne = countLabels(Yval)
print("val set - normal: ",count_normal," ,pneumonia: ",count_pne)

Xtrain = Xtrain.reshape(len(Xtrain),max_size,max_size,1)
Xtest = Xtest.reshape(len(Xtest),max_size,max_size,1)
Xval = Xval.reshape(len(Xval),max_size,max_size,1)
print("Xtrain shape: ",Xtrain.shape)
print("Ytrain shape: ",len(Ytrain))
print("Xtest shape: ",Xtest.shape)
print("Ytest shape: ",len(Ytest))
print("Xval shape: ",Xval.shape)
print("Yval shape: ",len(Yval))

train set - normal:  1333  ,pneumonia:  3867
test set - normal:  226  ,pneumonia:  382
val set - normal:  24  ,pneumonia:  24
Xtrain shape:  (5200, 127, 127, 1)
Ytrain shape:  5200
Xtest shape:  (608, 127, 127, 1)
Ytest shape:  608
Xval shape:  (48, 127, 127, 1)
Yval shape:  48
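
The balanced hold-out and shuffle steps above can also be sketched with index arrays, which avoids calling `np.append` and `np.delete` inside a loop. This is a minimal sketch, not the notebook's method: `split_equal_indices` is a hypothetical helper, and the toy arrays only stand in for the image data.

```python
import numpy as np

NORMAL, PNEUMONIA = 0, 1

def split_equal_indices(labels, size):
    """Hypothetical helper: hold out the first `size` samples of each class.

    Returns (held_out_idx, remaining_idx), both sorted index arrays.
    """
    labels = np.asarray(labels)
    normal_idx = np.flatnonzero(labels == NORMAL)[:size]
    pneumonia_idx = np.flatnonzero(labels == PNEUMONIA)[:size]
    held_out = np.sort(np.concatenate((normal_idx, pneumonia_idx)))
    remaining = np.setdiff1d(np.arange(len(labels)), held_out)
    return held_out, remaining

# Toy data standing in for the image arrays (assumed shapes and labels).
X = np.arange(10 * 4).reshape(10, 2, 2).astype(float)
y = np.array([0, 0, 0, 1, 1, 1, 1, 0, 1, 1])

held_out, remaining = split_equal_indices(y, size=2)
X_holdout, y_holdout = X[held_out], y[held_out]
X_rest, y_rest = X[remaining], y[remaining]

# Shuffle with one shared permutation so images and labels stay aligned.
rng = np.random.default_rng(42)
perm = rng.permutation(len(X_rest))
X_rest, y_rest = X_rest[perm], y_rest[perm]

print(held_out, X_holdout.shape, X_rest.shape)
```

Selecting by index keeps the original arrays untouched, so the same labels can be reused to build several splits.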


Build CNN model.

In [10]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.metrics import Precision,Recall
from tensorflow.keras.layers import Dense, Dropout, Conv2D, MaxPooling2D, Flatten, InputLayer

model = Sequential([
    InputLayer((max_size, max_size,1)),
    Conv2D(filters=32, kernel_size=5,strides=2, activation='relu'),
    Dropout(0.2),
    Conv2D(filters=8,kernel_size=3,strides=1,padding="same", activation='relu'),
    Dropout(0.2),
    MaxPooling2D(pool_size=2,padding="same"),
    Conv2D(filters=16,kernel_size=3,strides=1,padding="same", activation='relu'),
    Dropout(0.2),
    MaxPooling2D(pool_size=2,padding="same"),
    Conv2D(filters=32,kernel_size=3,strides=1,padding="same", activation='relu'),
    MaxPooling2D(pool_size=2,padding="same"),
    Flatten(),
    Dense(128,activation='relu'),
    Dropout(0.5),
    Dense(32,activation='relu'),
    Dense(1, activation='sigmoid')
])
model.compile(loss="binary_crossentropy",optimizer='adam',metrics=["accuracy",Precision(name='precision'),Recall(name='recall')])
model.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_12 (Conv2D)           (None, 62, 62, 32)        832       
_________________________________________________________________
dropout_12 (Dropout)         (None, 62, 62, 32)        0         
_________________________________________________________________
conv2d_13 (Conv2D)           (None, 62, 62, 8)         2312      
_________________________________________________________________
dropout_13 (Dropout)         (None, 62, 62, 8)         0         
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 31, 31, 8)         0         
_________________________________________________________________
conv2d_14 (Conv2D)           (None, 31, 31, 16)        1168      
_________________________________________________________________
dropout_14 (Dropout)         (None, 31, 31, 16)       
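
The output shapes and parameter counts in the summary can be checked by hand. The sketch below uses the standard formulas for "valid" and "same" padding; the function names are illustrative and not part of Keras.

```python
import math

def output_size(n, kernel, stride, padding):
    """Spatial output size along one dimension for a conv or pooling layer."""
    if padding == "same":
        return math.ceil(n / stride)
    # "valid": the window must fit entirely inside the input
    return (n - kernel) // stride + 1

def conv2d_params(kernel, in_channels, filters):
    """Weights of `filters` kernels of shape kernel x kernel x in_channels, plus one bias each."""
    return kernel * kernel * in_channels * filters + filters

# First Conv2D: 127x127x1 input, 32 filters, kernel 5, stride 2, default "valid" padding
size = output_size(127, kernel=5, stride=2, padding="valid")
print(size, conv2d_params(5, 1, 32))      # compare with conv2d_12

# Second Conv2D: 8 filters, kernel 3, "same" padding keeps the spatial size
print(conv2d_params(3, 32, 8))            # compare with conv2d_13

# MaxPooling2D with pool_size=2 (stride defaults to the pool size), "same" padding
pooled = output_size(size, kernel=2, stride=2, padding="same")
print(pooled, conv2d_params(3, 8, 16))    # compare with max_pooling2d_9 and conv2d_14
```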

Train the model.
Use evaluation dataset for validation_data.
Use class_weight to balance training dataset.

In [11]:
count_normal,count_pne = countLabels(Ytrain)
weight_for_0 = 2
weight_for_1 = (count_normal/count_pne)*2.0 
class_weight = {0: weight_for_0, 1: weight_for_1}
print(class_weight)

model.fit(Xtrain, np.array(Ytrain), epochs=25, validation_data=(Xval, np.array(Yval)),class_weight=class_weight)

{0: 2, 1: 0.6894233255753814}
Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


<tensorflow.python.keras.callbacks.History at 0x2be3acdddc0>
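
To see why these weights balance the training set, the arithmetic can be reproduced with the class counts printed earlier (1333 normal, 3867 pneumonia). This is a small sketch; `make_class_weight` is a hypothetical helper that mirrors the formula in the cell above.

```python
def make_class_weight(count_normal, count_pneumonia, base=2.0):
    """Weight the majority class down so both classes contribute equally to the loss."""
    return {0: base, 1: (count_normal / count_pneumonia) * base}

count_normal, count_pneumonia = 1333, 3867
weights = make_class_weight(count_normal, count_pneumonia)
print(weights)

# Total weighted contribution of each class; the two values should match.
total_normal = weights[0] * count_normal
total_pneumonia = weights[1] * count_pneumonia
print(total_normal, total_pneumonia)
```

Each pneumonia image counts for roughly a third of a normal image in the loss, which offsets the roughly 3:1 imbalance in the training folder.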

Evaluate the model with the testing dataset.

In [23]:
from sklearn.metrics import confusion_matrix

labels = [0,1]
model.evaluate(Xtest, np.array(Ytest))
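# predict_classes was removed in TensorFlow 2.6; threshold the sigmoid output at 0.5 instead.
Ypred = (model.predict(Xtest) > 0.5).astype("int32").ravel()
confusion_matrix(Ytest,Ypred,labels=labels)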



array([[155,  71],
       [ 13, 369]], dtype=int64)
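
The confusion matrix above can be turned into the usual summary metrics. The sketch below assumes scikit-learn's layout (rows are true labels, columns are predicted labels, in the order `[0, 1]`), so the four cells are TN, FP, FN, TP.

```python
import numpy as np

# Confusion matrix copied from the output above.
cm = np.array([[155,  71],
               [ 13, 369]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)      # of the predicted pneumonia cases, how many are real
recall = tp / (tp + fn)         # of the real pneumonia cases, how many are found
specificity = tn / (tn + fp)    # of the normal cases, how many are recognised as normal

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} specificity={specificity:.3f}")
```

Read this way, the model misses 13 of 382 pneumonia images but flags 71 of 226 normal images as pneumonia, so recall is high while specificity is noticeably lower.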