# Introduction

Currently, one of the most common (and accurate) ways to analyze a blood smear is manual examination. The goal of this project is to develop a neural network that classifies white blood cells (WBCs) from images, as part of an eventual effort to automate the procedure without a significant loss in accuracy. Automating this process would not only speed it up but also reduce the human labor required to conduct a test, lowering the overall cost.

The dataset for this project is a collection of ~12,500 images, each 240 x 320 pixels. Each image contains several red blood cells (RBCs) and a single, highlighted WBC. Each WBC falls into one of four categories: Eosinophil, Lymphocyte, Monocyte, or Neutrophil. The dataset can be found on Kaggle [here](https://www.kaggle.com/paultimothymooney/blood-cells). With accurate classification, the proportion of each WBC type in a sample could be calculated and checked for normalcy. Additionally, cell images could be further inspected for abnormalities.
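As a rough sketch of that last step, the proportions could be computed from a list of per-cell predicted labels. The helper name `wbc_proportions` and the sample labels below are hypothetical and only illustrate the arithmetic; they do not come from the dataset or from any library.

```python
from collections import Counter

WBC_TYPES = ['EOSINOPHIL', 'LYMPHOCYTE', 'MONOCYTE', 'NEUTROPHIL']

def wbc_proportions(predicted_labels):
    """Return the fraction of cells assigned to each WBC type."""
    total = len(predicted_labels)
    if total == 0:
        raise ValueError("no predictions given")
    counts = Counter(predicted_labels)
    # Counter returns 0 for types that never appear
    return {wbc_type: counts[wbc_type] / total for wbc_type in WBC_TYPES}

# Made-up predictions, for illustration only
example_predictions = ['NEUTROPHIL'] * 6 + ['LYMPHOCYTE'] * 3 + ['MONOCYTE']
proportions = wbc_proportions(example_predictions)
print(proportions)
```

Whether a given set of proportions is normal would still have to be judged against clinical reference ranges, which are outside the scope of this notebook.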

# Module Imports

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
from PIL import Image
import os
import matplotlib.pyplot as plt
%matplotlib inline

import warnings  
with warnings.catch_warnings():  
    warnings.filterwarnings("ignore",category=FutureWarning)
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv2D, MaxPooling2D
    from tensorflow.keras.layers import Activation, Dropout, Flatten, Dense
    from tensorflow.keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img

# Data Exploration

Let's begin by taking a look at an example of each of the four types of WBC we'll be attempting to classify.

In [None]:
wbc_types = ['EOSINOPHIL', 'NEUTROPHIL', 'LYMPHOCYTE', 'MONOCYTE']
wbc_df = pd.DataFrame(columns=['file_name', 'type'])

plt.figure(figsize=(10,8))

for i, wbc_type in enumerate(wbc_types):
    path = 'dataset2-master/images/TRAIN/' + wbc_type + '/'
    files = os.listdir(path)
    
    # Record every file name for this type alongside its label
    df = pd.DataFrame({'file_name': files, 'type': wbc_type})
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
    wbc_df = pd.concat([wbc_df, df], ignore_index=True)
    
    image = load_img(path + files[0])
    plt.subplot(2,2,i+1)
    plt.title(wbc_type)
    plt.axis('off')
    plt.imshow(image)
    
plt.tight_layout()
plt.show()

In [None]:
sns.countplot(x='type', data=wbc_df)
plt.show()

The images show that the four cell types are easy to tell apart visually. Additionally, the countplot shows that our training data is well balanced, with ~2,500 images of each WBC type.

# Model Construction

Since the inputs to be classified are images, we will use a Convolutional Neural Network (CNN).

In [None]:
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(60,80,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(4))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])
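To see what the three convolution and pooling stages do to a 60 x 80 input, the shapes can be traced by hand. This is a minimal sketch that assumes the Keras defaults used above (`'valid'` padding and stride 1 for `Conv2D`, non-overlapping 2 x 2 windows for `MaxPooling2D`); the helper names are hypothetical.

```python
def conv_out(size, kernel=3):
    # 'valid' padding with stride 1: each dimension loses kernel - 1 pixels
    return size - kernel + 1

def pool_out(size, pool=2):
    # non-overlapping pooling with 'valid' padding floors the division
    return size // pool

def trace_shapes(height=60, width=80, filters=(32, 64, 64)):
    """Return the (height, width, channels) shape after each conv + pool stage."""
    shapes = []
    for n_filters in filters:
        height, width = conv_out(height), conv_out(width)
        height, width = pool_out(height), pool_out(width)
        shapes.append((height, width, n_filters))
    return shapes

shapes = trace_shapes()
final_h, final_w, final_c = shapes[-1]
flattened_size = final_h * final_w * final_c  # length of the vector fed to Dense(128)
print(shapes, flattened_size)
```

Under these assumptions the feature map shrinks to 5 x 8 x 64 before `Flatten`, so the first dense layer receives 2,560 values per image. Calling `model.summary()` reports the shapes Keras actually computes and can be used to check the trace.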

In [None]:
batch_size = 128

# this is the augmentation configuration we will use for training
train_datagen = ImageDataGenerator(
        rescale=1./255,
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=10,  # randomly rotate images in the range (degrees, 0 to 180)
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=True,  # randomly flip images horizontally
        vertical_flip=False)  # do not flip images vertically

# this is the augmentation configuration we will use for testing:
# only rescaling
test_datagen = ImageDataGenerator(rescale=1./255)

# this is a generator that will read pictures found in
# subfolders of 'dataset2-master/images/TRAIN', and indefinitely generate
# batches of augmented image data
train_generator = train_datagen.flow_from_directory(
        'dataset2-master/images/TRAIN',  # this is the target directory
        target_size=(60, 80),  # all images will be resized to 60x80 (height x width)
        batch_size=batch_size,
        class_mode='categorical')  # since we use categorical_crossentropy loss, we need one-hot labels

# this is a similar generator, for validation data
validation_generator = test_datagen.flow_from_directory(
        'dataset2-master/images/TEST',
        target_size=(60, 80),
        batch_size=batch_size,
        class_mode='categorical')
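With `class_mode='categorical'`, each label is delivered as a one-hot vector, and `flow_from_directory` assigns the indices by sorting the subfolder names alphanumerically. The sketch below mimics that mapping in plain Python, assuming the `TRAIN` folder contains only the four WBC subfolders; the `one_hot` helper is hypothetical, and the mapping the generator really uses can be read from `train_generator.class_indices`.

```python
wbc_types = ['EOSINOPHIL', 'NEUTROPHIL', 'LYMPHOCYTE', 'MONOCYTE']

# Sorted subfolder names determine which index each class receives
class_indices = {name: index for index, name in enumerate(sorted(wbc_types))}

def one_hot(label):
    """Encode a class name as a one-hot vector with one slot per class."""
    vector = [0.0] * len(class_indices)
    vector[class_indices[label]] = 1.0
    return vector

print(class_indices)
print(one_hot('MONOCYTE'))
```

Note that this order differs from the order of `wbc_types` used for plotting, which matters when mapping the softmax output back to a class name.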

In [None]:
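# steps are counted in batches, so divide the image counts by batch_size
# (model.fit accepts generators in TensorFlow 2.1+; older versions need fit_generator)
model.fit(
        train_generator,
        steps_per_epoch=9957 // batch_size,
        epochs=30,
        validation_data=validation_generator,
        validation_steps=2487 // batch_size)
model.save_weights('first_try.h5')  # always save your weights after training or during training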
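The step counts passed to training are measured in batches, not images. The sketch below works through that arithmetic, assuming the image counts that appear in the training call (9,957 training and 2,487 test images) and the `batch_size` of 128 defined earlier; `steps_for` is a hypothetical helper.

```python
import math

def steps_for(num_images, batch_size, drop_partial=True):
    """Number of batches needed for one pass over num_images."""
    if drop_partial:
        # ignore the final, smaller batch
        return num_images // batch_size
    # include the final, smaller batch
    return math.ceil(num_images / batch_size)

batch_size = 128
train_steps = steps_for(9957, batch_size)
val_steps = steps_for(2487, batch_size)
train_steps_full = steps_for(9957, batch_size, drop_partial=False)
print(train_steps, val_steps, train_steps_full)
```

Dividing by a number smaller than the batch size makes each epoch draw more batches than one pass requires, so the training generator cycles through the data more than once per epoch.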