# Description

>    Introduction: The main objective of this notebook is to classify White Blood Cell (WBC) types. There are 5 main types: Eosinophil, Lymphocyte, Monocyte, Neutrophil and Basophil. Here we classify among only 4 of these classes, excluding Basophil because very little data is available for it. We use the "rmsprop" optimizer, and the output layer has 4 nodes with the "softmax" activation function because the model classifies into 4 classes.


> Import Libraries: The required libraries have been imported. We are using a Keras model with the TensorFlow backend. The sklearn library is imported for generating the confusion matrix.

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dense, Dropout, Flatten, BatchNormalization, Conv2D, MaxPooling2D, Input
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import load_model
from tensorflow.keras.metrics import categorical_crossentropy
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.metrics import confusion_matrix
import itertools
import os
import shutil
from tensorflow.keras import backend as K
import random
import glob
import matplotlib.pyplot as plt
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
%matplotlib inline

> Declaration: First the image size on which we will train our model is declared, and then the paths to our dataset. Each of the train, validation and test folders contains 4 sub-folders named after the classes, and the data resides in those sub-folders. There are 9957 training samples, 1887 validation samples and 600 test samples.

In [None]:
img_width, img_height = 120, 160

train_data_dir = '../input/main-dataset/main_dataset/train'
validation_data_dir = '../input/main-dataset/main_dataset/validation'
test_data_dir = '../input/main-dataset/main_dataset/test'
nb_train_samples = 9957
nb_validation_samples = 1887
epochs = 30
batch_size = 32
#regularizer = tf.keras.regularizers.l2(0.01,)

if K.image_data_format() == 'channels_first':
  input_shape = (3, img_width, img_height)
else:
  input_shape = (img_width, img_height, 3)
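
> As a quick sanity check on the declared values, the sketch below works out how many full batches per epoch they imply. It only repeats the numbers from the declaration cell; the `*_leftover` names are introduced here for illustration.

```python
# Values copied from the declaration cell above.
nb_train_samples = 9957
nb_validation_samples = 1887
batch_size = 32

# Integer division counts only the full batches.
steps_per_epoch = nb_train_samples // batch_size
validation_steps = nb_validation_samples // batch_size

# Samples that do not fit into the full batches (hypothetical helper names).
train_leftover = nb_train_samples - steps_per_epoch * batch_size
valid_leftover = nb_validation_samples - validation_steps * batch_size

print(steps_per_epoch, validation_steps)  # 311 58
print(train_leftover, valid_leftover)     # 5 31
```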

> Data Generator: We have used ImageDataGenerator to augment our dataset, so that at every epoch the model sees a different version of the same data. This makes it more likely that the model learns features that generalize instead of memorizing individual images. You can read more about ImageDataGenerator here: [ImageDataGenerator](https://keras.io/api/preprocessing/image/). The test generator only rescales, because we don't need to augment the validation set (data unseen by the model, used to check how well it will perform on real-world data) or the test set.

In [None]:
train_datagen = ImageDataGenerator(
    rescale = 1./255,
    shear_range = 0.3,
    zoom_range = 0.2,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip = True)

test_datagen = ImageDataGenerator(rescale = 1./255)
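
> To make two of the augmentation settings concrete, here is a minimal NumPy sketch of what rescaling and a horizontal flip do to a single image. It does not call Keras; the tiny `image` array is a made-up stand-in, and the random shear, zoom and shift transforms are not reproduced.

```python
import numpy as np

# A tiny stand-in "image": height 2, width 3, 3 channels, pixel values 0-255.
image = np.arange(18, dtype=np.float32).reshape(2, 3, 3) * 15.0

# rescale = 1./255 maps pixel values into the range [0, 1].
rescaled = image * (1.0 / 255)

# A horizontal flip reverses the width axis (axis 1 for height x width x channels).
flipped = rescaled[:, ::-1, :]

print(rescaled.min(), rescaled.max())
# The first column of the flipped image equals the last column of the original.
print(np.array_equal(flipped[:, 0, :], rescaled[:, -1, :]))
```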

> Batches of data: Using the data generators, the train, validation and test batches are now created, on which we will train our model, validate it and find the test accuracy respectively. The class mode is "categorical" because this is a multi-class classification. The flow_from_directory method is used here, which reads the data from the directories defined in the Declaration section. You can learn more about it from [flow_from_directory](https://keras.io/api/preprocessing/image/)

In [None]:
train_batches = train_datagen.flow_from_directory(
    train_data_dir,
    target_size = (img_width, img_height),
    batch_size = batch_size,
    #classes=['EOISINOPHIL', 'LYMPHOCYTE', 'MONOCYTE', 'NEUTROPHIL'],
    class_mode = 'categorical')

In [None]:
valid_batches = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size = (img_width, img_height),
    batch_size = batch_size,
    #color_mode = 'grayscale',
    class_mode = 'categorical')

In [None]:
test_batches = test_datagen.flow_from_directory(
    test_data_dir,
    target_size = (img_width, img_height),
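    batch_size = 600,
    shuffle = False,  # keep a fixed order so labels and predictions drawn from separate batches line up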
    #color_mode = 'grayscale',
    class_mode = 'categorical')
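
> With class_mode "categorical", each label is a one-hot vector whose index follows the alphanumeric order of the class sub-folder names. The sketch below mimics that mapping in NumPy, assuming the folder names from the commented-out `classes` argument above; `one_hot` is a hypothetical helper, not a Keras function.

```python
import numpy as np

# Folder names as used in this dataset; the sorted order defines the label index.
class_names = sorted(['NEUTROPHIL', 'EOISINOPHIL', 'MONOCYTE', 'LYMPHOCYTE'])
class_indices = {name: i for i, name in enumerate(class_names)}

def one_hot(name):
    # Vector of zeros with a single 1 at the index of the class.
    vec = np.zeros(len(class_names), dtype=np.float32)
    vec[class_indices[name]] = 1.0
    return vec

print(class_indices)
print(one_hot('MONOCYTE'))  # [0. 0. 1. 0.]
```

> On the real generators, the same mapping can be read from the `class_indices` attribute, for example `train_batches.class_indices`.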

> Plotting Image and Labels: This is just a function used to plot the images along with their labels to demonstrate how they look. Nothing to modify here. You can directly copy and paste it.

In [None]:
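def plots(ims, figsize=(12,6), rows=1, interp=False, titles=None):
    # Plot a batch of images in a grid, with an optional title above each one.
    if type(ims[0]) is np.ndarray:
        ims = np.array(ims)
        # Images rescaled to [0, 1] must stay float: casting them to uint8 turns them black.
        if ims.max() > 1.0:
            ims = ims.astype(np.uint8)
        # Move channels to the last axis if they come first.
        if (ims.shape[-1] != 3):
            ims = ims.transpose((0,2,3,1))
    f = plt.figure(figsize=figsize)
    # Enough columns to fit every image into the requested number of rows.
    cols = len(ims)//rows if len(ims) % rows == 0 else len(ims)//rows + 1
    for i in range(len(ims)):
        sp = f.add_subplot(rows, cols, i+1)
        sp.axis('Off')
        if titles is not None:
            sp.set_title(titles[i], fontsize=16)
        plt.imshow(ims[i], interpolation=None if interp else 'none')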

> Extracting labels: In this section the images and their labels are extracted. The next function takes train_batches as input and stores the images and labels in the "imgs" and "labels" variables respectively. It returns a number of samples equal to the batch_size declared for train_batches. "*This is just for demonstration and has nothing to do with the training.*"

In [None]:
imgs, labels = next(train_batches)

> Don't be frustrated if the images and labels in the output are black or overlap. The batch is large, so all of those images can't be shown in an organized way. Black images can also appear when pixel values rescaled to [0, 1] are cast to integers before plotting.

In [None]:
#Eoisinophil=8[1.0.0.0],Lymphocyte=4[0.1.0.0],monocyte=2[0.0.1.0],neutrophil=1[0.0.0.1]-->Labels
plots(imgs, rows=4, titles=labels)

# Model

> Here our model is created. We have used a Sequential model built from convolutional layers, each followed by max pooling, with batch normalization and dropout after the first one. The input shape needs to be declared explicitly only for the first layer; each later layer takes as input whatever its previous layer outputs.

In [None]:
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=input_shape, activation='relu', padding='same'))
#model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))

In [None]:
#model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
#model.add(BatchNormalization())
#model.add(Activation('relu'))
#model.add(MaxPooling2D(pool_size=(2, 2)))

In [None]:
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
#model.add(BatchNormalization())
#model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

In [None]:
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
#model.add(BatchNormalization())
#model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

In [None]:
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
#model.add(BatchNormalization())
#model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

In [None]:
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.4))
model.add(Dense(4))
model.add(Activation('softmax'))
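
> To see how the feature map shrinks through the network, the arithmetic below traces the spatial size after each pooling layer. It is a plain Python sketch, assuming the 120 x 160 input from the declaration cell, 'same' padding in every convolution and the default 2x2 pooling stride; calling `model.summary()` reports the same shapes from Keras itself.

```python
# Input size from the declaration cell: 120 x 160 with 3 channels.
height, width = 120, 160

# Filters of the four convolution blocks; 'same' padding keeps height and width.
filters = [32, 64, 64, 128]

for n_filters in filters:
    # Each 2x2 max-pooling layer halves both dimensions, rounding down.
    height, width = height // 2, width // 2
    print(n_filters, (height, width))

# Number of values handed to the first Dense layer after Flatten.
flattened = height * width * filters[-1]
# Weights plus biases of the Dense(64) layer that follows Flatten.
dense_params = flattened * 64 + 64
print(flattened, dense_params)  # 8960 573504
```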

> Compilation: We compiled the model using the rmsprop optimizer. The metric is the quantity on which performance is measured, and in this case it is accuracy.

In [None]:
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
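
> The loss used above pairs with the softmax output layer. As an illustration only, here is a NumPy sketch of softmax followed by categorical cross-entropy for one image; the `logits` values are made up, and the functions are simplified stand-ins rather than the Keras implementations.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def cross_entropy(y_true, y_prob, eps=1e-7):
    # Mean negative log-probability assigned to the true class.
    y_prob = np.clip(y_prob, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=-1)))

logits = np.array([[2.0, 1.0, 0.1, -1.0]])  # hypothetical raw outputs for one image
y_true = np.array([[1.0, 0.0, 0.0, 0.0]])   # one-hot label for class 0

probs = softmax(logits)
print(probs.sum(axis=-1))            # the four probabilities sum to 1
print(cross_entropy(y_true, probs))  # smaller when the true class gets more probability
```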

# Training:

> The model is fitted on the batch generators here. First the training batches are passed, then validation_data holds the valid_batches on which validation is performed. Then come the callbacks: EarlyStopping monitors the validation accuracy and stops the training when it has not improved for the number of epochs given by patience, and ModelCheckpoint saves the best model in the specified directory.

In [None]:
%%time
# model.fit accepts generators directly; fit_generator is deprecated in TensorFlow 2.
h = model.fit(
    train_batches,
    steps_per_epoch = nb_train_samples // batch_size,
    epochs = epochs,
    validation_data = valid_batches,
    validation_steps = nb_validation_samples // batch_size,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=5),
        tf.keras.callbacks.ModelCheckpoint(filepath = '/kaggle/working/model_{val_accuracy:.3f}.h5', save_best_only=True,
                                          save_weights_only=False, monitor='val_accuracy')
    ])
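
> The patience logic of early stopping can be sketched in a few lines of plain Python. This is a simplified illustration, not the Keras callback: it ignores options such as min_delta, and both the function name `early_stopping_epoch` and the accuracy values are made up.

```python
def early_stopping_epoch(val_accuracies, patience=5):
    # Return the 0-based index of the epoch after which training would stop,
    # or the last epoch if the patience is never exhausted.
    best = float('-inf')
    wait = 0
    for epoch, acc in enumerate(val_accuracies):
        if acc > best:
            # New best value: remember it and reset the counter.
            best = acc
            wait = 0
        else:
            # No improvement: count this epoch against the patience.
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_accuracies) - 1

# Hypothetical validation accuracies, one per epoch.
history = [0.60, 0.72, 0.80, 0.79, 0.80, 0.78, 0.79, 0.77, 0.81]
print(early_stopping_epoch(history, patience=5))  # 7
```

> In this made-up run the best value is reached at epoch 2, the next five epochs do not beat it, and training stops before the later improvement at epoch 8 is seen.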

> Don't be frustrated if the images or labels in the output are black or overlap. The batch is large, so all of those images can't be shown in an organized way.

In [None]:
test_imgs, test_labels = next(test_batches)
plots(test_imgs, rows=10, titles=test_labels)

> The one-hot test labels are converted to class indices here, because the confusion matrix in the upcoming section takes single values (like 1 or 2) as input.

In [None]:
rounded_labels = np.argmax(test_labels, axis=-1)

> Load Model: Here the best performing model is loaded.

In [None]:
test_model = load_model('./model_0.887.h5')

# Prediction:

> Prediction is done using the best performing model on the test set

In [None]:
predictions = test_model.predict(test_batches, steps=1, verbose=0)

In [None]:
predictions

> The predicted probabilities are converted to class indices here, because the confusion matrix in the upcoming section takes single values (like 1 or 2) as input.

In [None]:
rounded_prediction = np.argmax(predictions, axis=-1)
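
> A small worked example of this conversion, using made-up softmax outputs for three images rather than real predictions:

```python
import numpy as np

# Hypothetical softmax outputs for three images over the four classes.
probabilities = np.array([
    [0.70, 0.10, 0.15, 0.05],
    [0.05, 0.05, 0.10, 0.80],
    [0.20, 0.50, 0.20, 0.10],
])

# argmax along the last axis picks the most probable class per image.
predicted_classes = np.argmax(probabilities, axis=-1)
print(predicted_classes)  # [0 3 1]
```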

In [None]:
for i in rounded_prediction:
    print(i)

# Confusion matrix:

> The confusion_matrix function takes two parameters: the true labels, which were extracted previously, and the predicted labels.

In [None]:
cm = confusion_matrix(y_true=rounded_labels, y_pred=rounded_prediction)
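
> To show what the matrix contains, the sketch below counts the entries by hand for a few made-up labels, following the same convention as sklearn (rows are true classes, columns are predicted classes). `count_confusion` is a hypothetical helper written for this illustration.

```python
import numpy as np

def count_confusion(y_true, y_pred, n_classes):
    # Rows are true classes, columns are predicted classes.
    counts = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        counts[t, p] += 1
    return counts

# Made-up class indices for six images.
example_true = [0, 0, 1, 2, 3, 3]
example_pred = [0, 1, 1, 2, 3, 0]
example_cm = count_confusion(example_true, example_pred, n_classes=4)
print(example_cm)
# [[1 1 0 0]
#  [0 1 0 0]
#  [0 0 1 0]
#  [1 0 0 1]]
```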

> This function is just copied and you can use it directly without any modification

In [None]:
def plot_confusion_matrix(cm, classes,
                        normalize=False,
                        title='Confusion matrix',
                        cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)

    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j],
            horizontalalignment="center",
            color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

> It's the confusion matrix which illustrates how well your model performs on test data.

In [None]:
cm_plot_labels = ['EOISINOPHIL', 'LYMPHOCYTE', 'MONOCYTE', 'NEUTROPHIL']
plot_confusion_matrix(cm=cm, classes=cm_plot_labels, title='confusion_matrix')
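
> Overall accuracy and per-class recall and precision can be read off a confusion matrix. The sketch below uses a made-up matrix named `cm_example`, not the results of this notebook; the same expressions could be applied to `cm`.

```python
import numpy as np

# Hypothetical confusion matrix (rows: true class, columns: predicted class).
cm_example = np.array([
    [40,  2,  3,  5],
    [ 1, 45,  2,  2],
    [ 4,  1, 42,  3],
    [ 6,  2,  2, 40],
])

# Correct predictions sit on the diagonal.
accuracy = np.trace(cm_example) / cm_example.sum()
# Recall: correct predictions divided by the number of true samples per class.
recall = np.diag(cm_example) / cm_example.sum(axis=1)
# Precision: correct predictions divided by the number of predictions per class.
precision = np.diag(cm_example) / cm_example.sum(axis=0)

print(accuracy)  # 0.835
print(recall)
print(precision)
```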

# Accuracy Curve

In [None]:
accs = h.history['accuracy']
val_accs = h.history['val_accuracy']

plt.plot(range(len(accs)),accs, label = 'Training_accuracy')
plt.plot(range(len(accs)),val_accs, label = 'Validation_accuracy')
plt.legend()
plt.show()

# Loss Curve

In [None]:
losses = h.history['loss']
val_losses = h.history['val_loss']

plt.plot(range(len(losses)), losses, label = 'Training_loss')
plt.plot(range(len(losses)), val_losses, label = 'Validation_loss')
plt.legend()
plt.show()
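
> Besides plotting, the history can be searched for the best epoch. The sketch below uses a made-up dictionary in the same format as `h.history`; the values are illustrative only.

```python
# Hypothetical history in the same format as h.history.
history_example = {
    'val_accuracy': [0.61, 0.74, 0.83, 0.81, 0.86, 0.85],
    'val_loss':     [1.10, 0.80, 0.55, 0.60, 0.48, 0.52],
}

val_acc = history_example['val_accuracy']
# Index of the epoch with the highest validation accuracy (0-based).
best_epoch = max(range(len(val_acc)), key=lambda i: val_acc[i])
print(best_epoch, val_acc[best_epoch], history_example['val_loss'][best_epoch])
```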