# Red blood cell classifier: practical application of a convNet

In this example, you will build your own convNet to perform a specific task: classifying images of red blood cells (RBC) as being of either good or bad quality.

The data were kindly provided by Viviana Claveria & Manouk Abkarian.

## I - Data importation

The data are RGB images saved in PNG format on Google Drive. All images are already sorted into two classes:
- good RBC (1)
- bad RBC (0)


The first step is to load all the python packages we will use in the notebook:

In [None]:
import os
from glob import glob
import random
import sys
import warnings
import numpy as np
from tqdm import tqdm

from skimage.io import imread, imshow, imread_collection, concatenate_images
from skimage.transform import resize
from skimage.morphology import label

from tensorflow import keras
from tensorflow.keras import layers, models, optimizers

import matplotlib.pyplot as plt

Connect Google Drive to Google Colab and move to the folder containing the dataset:

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

base_dir = '/content/gdrive/My Drive/Deep_learning_formation_MRI/Doc_JB_2021/Divers_part2/RBC_database/'
os.chdir(base_dir)
%ls

Since the images do not all have the same X, Y dimensions, we define a set of parameters to homogenize the training and testing sets.

In [None]:
# Set the image size
# -----------------
IMG_WIDTH = 70
IMG_HEIGHT = 70
IMG_CHANNEL = 3

# Define the path where the data are saved
# ----------------------------------------

# The trailing '' keeps the final '/' at the end of each path
goodRBC_train_PATH = os.path.join(base_dir, 'train', 'goodRBC', '')
badRBC_train_PATH = os.path.join(base_dir, 'train', 'badRBC', '')

goodRBC_val_PATH = os.path.join(base_dir, 'validation', 'goodRBC', '')
badRBC_val_PATH = os.path.join(base_dir, 'validation', 'badRBC', '')

goodRBC_test_PATH = os.path.join(base_dir, 'test', 'goodRBC', '')
badRBC_test_PATH = os.path.join(base_dir, 'test', 'badRBC', '')

The following function **get_data** loads all the images found in a folder, resizes them to the same dimensions and returns them as a single array.

In [None]:
def get_data(path):
  """Load every image found in `path` and return them as one uint8 array."""

  # list the files in the folder and allocate the output array
  # ----------------------------------------------------------

  ids = next(os.walk(path))[2]
  X = np.zeros((len(ids), IMG_HEIGHT, IMG_WIDTH, IMG_CHANNEL), dtype=np.uint8)

  for n, id_ in tqdm(enumerate(ids), total=len(ids)):

    # read the image with skimage and resize it so that all the images
    # have the same dimensions
    # -------------------------

    img = imread(os.path.join(path, id_))
    img = resize(img, (IMG_HEIGHT, IMG_WIDTH), mode='constant', preserve_range=True)

    # grayscale images are converted to RGB by repeating the single channel;
    # any extra channel (e.g. an alpha channel) is dropped
    # ----------------------------------------------------

    if img.ndim == 2:
      img = np.stack((img,)*3, axis=-1)

    X[n] = img[..., :IMG_CHANNEL]

  return X
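
The channel handling in **get_data** can be checked on synthetic arrays, without reading any file. The sketch below is only an illustration and is not part of the original notebook: the helper name `to_rgb` is hypothetical, and dropping a possible alpha channel is an assumption about what the PNG files may contain.

```python
import numpy as np

def to_rgb(img):
    """Return an (H, W, 3) array from a grayscale, RGB or RGBA image.

    Hypothetical helper, written for illustration only.
    """
    if img.ndim == 2:
        # Grayscale: repeat the single channel three times along a new last axis
        return np.stack((img,) * 3, axis=-1)
    # RGB is returned unchanged; RGBA loses its alpha channel
    return img[..., :3]

gray = np.arange(6, dtype=np.uint8).reshape(2, 3)   # synthetic grayscale image
rgba = np.zeros((2, 3, 4), dtype=np.uint8)          # synthetic RGBA image

rgb_from_gray = to_rgb(gray)
rgb_from_rgba = to_rgb(rgba)
print(rgb_from_gray.shape, rgb_from_rgba.shape)
```

Both results have three channels, so they can be stored in an array of shape `(n, IMG_HEIGHT, IMG_WIDTH, 3)`.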

The training and testing sets are defined below, together with the corresponding ground-truth labels (1 for good RBC, 0 for bad RBC).
**Note that loading the data takes ~20-30 min.**

In [None]:
Good_RBC = get_data(goodRBC_train_PATH)
y_good = np.ones((Good_RBC.shape[0],))
Bad_RBC = get_data(badRBC_train_PATH)
y_bad = np.zeros((Bad_RBC.shape[0],))

X_train = np.concatenate((Good_RBC,Bad_RBC), axis=0)
Y_train = np.concatenate((y_good,y_bad), axis=0)


Good_RBC_test = get_data(goodRBC_test_PATH)
y_good = np.ones((Good_RBC_test.shape[0],))
Bad_RBC_test = get_data(badRBC_test_PATH)
y_bad = np.zeros((Bad_RBC_test.shape[0],))

X_test = np.concatenate((Good_RBC_test,Bad_RBC_test), axis=0)
Y_test = np.concatenate((y_good,y_bad), axis=0)


Print the composition of the datasets:

In [None]:
print('The train dataset is composed of {} images belonging to the "goodRBC" class and {} images belonging to the "badRBC" class'.format(Good_RBC.shape[0],Bad_RBC.shape[0]))
print('The test dataset is composed of {} images belonging to the "goodRBC" class and {} images belonging to the "badRBC" class'.format(Good_RBC_test.shape[0],Bad_RBC_test.shape[0]))

Finally, normalize the data by scaling the pixel values to the [0, 1] range:

In [None]:
X_train = X_train/255
X_test = X_test/255
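
Dividing by 255 maps the 8-bit pixel values from [0, 255] to [0, 1] and converts the array from `uint8` to floating point. A minimal check on a synthetic array (the array `pixels` is made up for illustration):

```python
import numpy as np

# Three synthetic 8-bit pixel values: black, mid-gray, white
pixels = np.array([[0, 128, 255]], dtype=np.uint8)

# True division promotes the uint8 array to float64
scaled = pixels / 255
print(scaled.dtype, scaled.min(), scaled.max())
```

The result is stored as `float64`. If memory becomes a concern, casting with `.astype(np.float32)` before dividing is one possible option, since it halves the size of the array.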

## II - Data visualization

Display a few examples of images belonging to the "GoodRBC" class

In [None]:
plt.rcParams['figure.figsize'] = (5,5) # Make the figures a bit bigger

for i in range(9):
    plt.subplot(3,3,i+1)
    num = random.randint(0, len(Good_RBC) - 1) # randint includes both bounds
    im = Good_RBC[num]
    plt.imshow(im)
    
plt.tight_layout()

And the same for the "BadRBC" class

In [None]:
plt.rcParams['figure.figsize'] = (5,5) # Make the figures a bit bigger

for i in range(9):
    plt.subplot(3,3,i+1)
    num = random.randint(0, len(Bad_RBC) - 1) # randint includes both bounds
    im = Bad_RBC[num]
    plt.imshow(im)
    
plt.tight_layout()

## III - Definition of the model and training

Define the model and the compilation options. 


In [None]:
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(70,70,3)))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64,(3,3), activation='relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(128,(3,3), activation='relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(128,(3,3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(optimizer = 'adam', 
            loss='binary_crossentropy',
            metrics=['accuracy'])

model.summary()
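
The shapes and parameter counts reported by `model.summary()` can be recomputed by hand. The sketch below assumes the Keras defaults used above (3x3 convolutions with 'valid' padding and stride 1, 2x2 max pooling with stride 2); the helper names are made up for illustration.

```python
def conv_out(size, kernel=3):
    """Output size of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output size of a max pooling whose stride equals the pool size."""
    return size // pool

def conv_params(c_in, c_out, kernel=3):
    """Weights plus biases of a 2D convolution layer."""
    return kernel * kernel * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases of a densely connected layer."""
    return n_in * n_out + n_out

size, channels = 70, 3
total = 0

# (number of filters, followed by a max pooling layer?)
for filters, pooled in [(32, True), (64, True), (128, True), (128, False)]:
    total += conv_params(channels, filters)
    size, channels = conv_out(size), filters
    if pooled:
        size = pool_out(size)

# Flatten, then the three dense layers
flat = size * size * channels
for n_in, n_out in [(flat, 256), (256, 128), (128, 1)]:
    total += dense_params(n_in, n_out)

print(size, flat, total)
```

Most of the parameters sit in the first dense layer (3200 x 256 weights), which is typical of small convNets ending with a `Flatten` layer.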

Train the model. Note that the test set is passed as validation data, so the validation curves shown below are computed on the test images.

In [None]:
history = model.fit(X_train, Y_train,
                    batch_size = 32,
                    epochs = 25,
                    validation_data=(X_test, Y_test),
                    shuffle = True)

Display the loss function during the training


In [None]:
history_dict = history.history

loss_values = history_dict['loss']
val_loss_values = history_dict['val_loss']

n = len(loss_values)
epochs = range(1, n+1)

plt.plot(epochs, loss_values, 'bo', label='Training loss')
plt.plot(epochs, val_loss_values, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

plt.show()
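
The quantity plotted above is the binary cross-entropy. As a rough illustration of how it behaves, the sketch below recomputes it with NumPy on made-up labels and predictions; the clipping value `eps` is an assumption meant to avoid `log(0)`, not necessarily the exact value used by Keras.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between labels (0/1) and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

# Made-up ground truth: two good RBC (1) and two bad RBC (0)
y_true = np.array([1.0, 1.0, 0.0, 0.0])

# A classifier that is mostly right, and one that always answers 0.5
confident = np.array([0.9, 0.8, 0.1, 0.2])
uncertain = np.array([0.5, 0.5, 0.5, 0.5])

loss_confident = binary_crossentropy(y_true, confident)
loss_uncertain = binary_crossentropy(y_true, uncertain)
print(loss_confident, loss_uncertain)
```

A classifier that always outputs 0.5 gets a loss of ln(2), about 0.693, which is a useful reference when reading the loss curves: a loss stuck around this value suggests that the network has not learned anything yet.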

Display the accuracy during training. Since the test set was passed as validation data, the validation accuracy is the accuracy measured on the test set.

In [None]:
acc_values = history_dict['accuracy']
val_acc_values = history_dict['val_accuracy']

n = len(acc_values)
epochs = range(1, n+1)

plt.plot(epochs, acc_values, 'bo', label='Training accuracy')
plt.plot(epochs, val_acc_values, 'b', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()

plt.show()
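
Accuracy alone does not tell which class is misclassified. One possible complement is a confusion matrix, sketched below with NumPy. The probabilities are made up for illustration; in the notebook they would presumably come from `model.predict(X_test).ravel()`. The helper name `confusion_counts` and the 0.5 threshold are assumptions.

```python
import numpy as np

def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (TP, TN, FP, FN), class 1 (good RBC) being the positive class."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tp, tn, fp, fn

# Made-up labels and sigmoid outputs for six images
y_true = np.array([1, 1, 1, 0, 0, 0])
y_prob = np.array([0.9, 0.6, 0.3, 0.2, 0.7, 0.1])

tp, tn, fp, fn = confusion_counts(y_true, y_prob)
accuracy = (tp + tn) / len(y_true)
print(tp, tn, fp, fn, accuracy)
```

Here one good RBC is rejected (false negative) and one bad RBC is accepted (false positive); depending on the application, one of these two errors may matter more than the other.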

## IV - Transfer learning

In the following section, we will see how transfer learning can be used to improve the performance of a classifier. The idea is to use a previously trained network (such as VGG16) that was already trained on a large dataset (ImageNet, more than a million images) and has therefore learned to recognize a large variety of image features.

By adding new layers at the end of the pre-trained network, we can reuse its feature-recognition ability and apply it to a completely new problem.

The first step is to load the pre-trained VGG16 network. Many other models are available in Keras (https://keras.io/api/applications/).

In [None]:
from keras.applications import vgg16
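# The input shape must match the size of the images loaded above (70x70x3)
conv_base = vgg16.VGG16(weights='imagenet', include_top=False, input_shape=(IMG_HEIGHT, IMG_WIDTH, IMG_CHANNEL))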

And then we will build our model around the VGG16 :

In [None]:
model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

conv_base.trainable = False
model.summary()
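
The size of the feature map produced by the VGG16 base, and therefore the size of the new dense layers, depends on the input size. The sketch below recomputes it for the 70x70 images defined earlier, assuming the standard VGG16 layout (convolutions with 'same' padding, five 2x2 max pooling layers, 512 channels at the output); the helper names are made up for illustration.

```python
def vgg16_feature_map(size, n_pools=5):
    """Spatial size at the output of the VGG16 convolutional base.

    Only the five max pooling layers reduce the size ('same' convolutions).
    """
    for _ in range(n_pools):
        size //= 2
    return size

def dense_params(n_in, n_out):
    """Weights plus biases of a densely connected layer."""
    return n_in * n_out + n_out

side = vgg16_feature_map(70)      # 70 -> 35 -> 17 -> 8 -> 4 -> 2
flat = side * side * 512          # size of the vector after Flatten

# Dense layers added on top of the base (Dropout layers have no parameters)
head = sum(dense_params(n_in, n_out)
           for n_in, n_out in [(flat, 512), (512, 256), (256, 128), (128, 1)])

print(side, flat, head)
```

With such small images, the base outputs only a 2x2x512 feature map, so the new classifier head stays relatively light compared with the convolutional base itself.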

We need to define which part of the network will be trained. In our case, only the last convolution block of the VGG and the densely connected part will be trained :

In [None]:
# Freeze the whole VGG16 base except its last convolution block (block5)
conv_base.trainable = True

for layer in conv_base.layers:
    layer.trainable = layer.name.startswith('block5')
            
model.compile(optimizer = 'adam', 
            loss='binary_crossentropy',
            metrics=['accuracy'])

model.summary()
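
To get a sense of what training only the last convolution block means, the number of parameters per VGG16 block can be recomputed by hand. The sketch below assumes the standard VGG16 layout (13 convolution layers with 3x3 kernels); the dictionary `vgg16_convs` is written out for illustration and is not read from Keras.

```python
def conv_params(c_in, c_out, kernel=3):
    """Weights plus biases of a 2D convolution layer."""
    return kernel * kernel * c_in * c_out + c_out

# (input channels, output channels) of each convolution layer, per block
vgg16_convs = {
    'block1': [(3, 64), (64, 64)],
    'block2': [(64, 128), (128, 128)],
    'block3': [(128, 256), (256, 256), (256, 256)],
    'block4': [(256, 512), (512, 512), (512, 512)],
    'block5': [(512, 512), (512, 512), (512, 512)],
}

per_block = {name: sum(conv_params(c_in, c_out) for c_in, c_out in convs)
             for name, convs in vgg16_convs.items()}

total = sum(per_block.values())
trainable = per_block['block5']   # only the last block is fine-tuned
frozen = total - trainable

print(total, trainable, frozen)
```

Roughly half of the parameters of the convolutional base are located in the last block, so the "Trainable params" and "Non-trainable params" lines of `model.summary()` are a quick way to check that the freezing was applied as intended.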

And finally train the new model and save it.

In [None]:
history = model.fit(X_train, Y_train,
                    batch_size = 64,
                    epochs = 25,
                    validation_data=(X_test, Y_test),
                    shuffle = True)


# Save the model
# --------------

model.save('RBC_classification_VGG16_1.h5')