<h4>Image classification with Convolutional Neural Networks</h4>

**Download the blood cell dataset**
Go to https://www.kaggle.com/datasets/paultimothymooney/blood-cells and download the dataset. The link includes a description explaining the context and content of the dataset.

Inside the downloaded folder there are two directories: dataset-master and dataset2-master. dataset2-master contains a nested folder, also called dataset2-master, which holds the images and a labels file. We will ignore the labels file for now.

To load a file that is located in a different folder, we need to give the path to that file. From the root of the dataset the relative path is 'dataset2-master/dataset2-master/labels.csv'. If we are one folder above, the relative path is 'blood-cell-dataset/dataset2-master/dataset2-master/labels.csv'. Writing the relative path as a plain string can be problematic, since the directory separator on Windows differs from the one on macOS and Linux (\ vs /). To avoid problems we will use the Python built-in library `os`, which stands for operating system. We will use two `os` functions, `os.path.join` and `os.listdir`:

In [1]:
import os

path_to_train = os.path.join('blood-cell-dataset','dataset2-master', 'dataset2-master', 'images', 'TRAIN')
print(path_to_train)

blood-cell-dataset/dataset2-master/dataset2-master/images/TRAIN


We can use `os.listdir` to list all the files inside a folder

In [2]:
ls

00_introduction_to_numpy.ipynb          04_convolutional-neural-networks.ipynb
01_ensemble_methods.ipynb               PCA.png
02_NN-from-scratch.ipynb                data/
03_MNIST-classification.ipynb


In [3]:
cell_type = os.path.join(path_to_train, 'EOSINOPHIL')
for image_path in os.listdir(cell_type):
    print(image_path)
    break # we break because we would get too many images

FileNotFoundError: [Errno 2] No such file or directory: 'blood-cell-dataset/dataset2-master/dataset2-master/images/TRAIN/EOSINOPHIL'

Inside the images directory there are three directories: TEST, TEST_SIMPLE and TRAIN. We will ignore TEST_SIMPLE for now and use the images in the TEST and TRAIN directories. Each of these contains one directory per cell type: EOSINOPHIL, LYMPHOCYTE, MONOCYTE, NEUTROPHIL. We will use the Python Imaging Library (PIL) to load the images. To load an image we use `im = Image.open(image_path)`, which we can then convert to a numpy array with `np.array(im)`.

Using the lists `splits` and `cell_types`, create four lists: one with the images of the training set (`X_train`), in which each element is an (H x W x 3) array; one with the labels of the training set (`y_train`), holding the cell type names; and the same for `X_test` and `y_test`.

In [4]:
import numpy as np
from PIL import Image

splits = ['TRAIN', 'TEST']
cell_types = ['EOSINOPHIL', 'LYMPHOCYTE', 'MONOCYTE', 'NEUTROPHIL']

path_to_images = os.path.join('blood-cell-dataset','dataset2-master', 'dataset2-master', 'images') # write this path

X_train, X_test, y_train, y_test = [], [], [], []

for split in splits:
    split_path = os.path.join(path_to_images, split)
    for cell_type in cell_types:
        cell_type_path = os.path.join(split_path, cell_type)
        for image in os.listdir(cell_type_path):
            if image.endswith('.jpeg'):
                image_path = os.path.join(cell_type_path, image)
                im = np.array(Image.open(image_path))
                if split == 'TRAIN':
                    X_train.append(im)
                    y_train.append(cell_type)
                if split == 'TEST':
                    X_test.append(im)
                    y_test.append(cell_type)

FileNotFoundError: [Errno 2] No such file or directory: 'blood-cell-dataset/dataset2-master/dataset2-master/images/TRAIN/EOSINOPHIL'

Stack all the images in a single numpy array. The shape of the array must be: (n_observations, height, width, n_channels). Use np.stack() to stack the images along the first dimension.

In [5]:
im.shape

(240, 320, 3)

In [6]:
X_train = np.stack(X_train).astype('float32')

In [7]:
# X_test = np.stack(X_test).astype('float32')

In [8]:
# your code goes here

Normalize the images such that their maximum value is 1. If you have any doubts about how to normalize them, inspect the values of your images first (max and min).

In [9]:
# your code goes here
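
One common way to do this for 8-bit images is to divide by 255, the largest value a `uint8` pixel can take. The sketch below is only an illustration on a small random array standing in for the stacked images (the name `fake_images` is made up for this example); it prints the value range before and after rescaling.

```python
import numpy as np

# Stand-in for a stacked image array: 4 random "images" of 8x8 RGB pixels.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 8, 8, 3)).astype('float32')

# Inspect the range first to decide how to normalize.
print(fake_images.min(), fake_images.max())

# 8-bit images store values in [0, 255], so dividing by 255 maps them to [0, 1].
normalized = fake_images / 255.0
print(normalized.min(), normalized.max())
```

Dividing by the array's own maximum is an alternative if the brightest pixel should map to exactly 1, but then the scale depends on the data rather than on the image format.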

After this step, your X_train and X_test dimensions should be: (n_observations, height, width, n_channels). The y_train and y_test should be a vector with (n_observations,) shape. Check the shape of your Xs and ys to make sure that you loaded the data correctly

Use matplotlib's `.imshow` method to visualize the first image in the training set.

In [10]:
import matplotlib.pyplot as plt

# your code goes here

Apply a sepia filter to the first image of the training set and visualize it. Remember to convert the result to `np.uint8` before plotting it. Use `np.dot` to multiply the image with the transpose of the sepia filter.

In [11]:
sepia_filter = np.array([[.393, .769, .189],
                         [.349, .686, .168],
                         [.272, .534, .131]])

# your code goes here
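
As a rough illustration of the matrix product involved, the sketch below applies the same filter to a random stand-in image (`fake_image` is a made-up name) with values in [0, 255]. It assumes the image has not been normalized; an image scaled to [0, 1] would need to be multiplied by 255 first.

```python
import numpy as np

sepia_filter = np.array([[.393, .769, .189],
                         [.349, .686, .168],
                         [.272, .534, .131]])

# Stand-in image with values in [0, 255]; a real image would come from X_train.
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(6, 8, 3)).astype('float32')

# Each output pixel is sepia_filter @ [R, G, B]; for a whole image this is
# the dot product of the (H, W, 3) array with the transposed filter.
sepia = np.dot(fake_image, sepia_filter.T)

# Rows of the filter sum to more than 1, so clip before casting to uint8.
sepia = np.clip(sepia, 0, 255).astype(np.uint8)
print(sepia.shape, sepia.dtype)
```

Clipping matters here: without it, values above 255 would wrap around when cast to `uint8` and show up as dark speckles.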

Select one of the channels (R, G, B) and apply an edge detection filter using scipy's `convolve2d` function. Read the documentation at https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.convolve2d.html if you have doubts about how to use it.

In [12]:
from scipy.signal import convolve2d

# your code goes here
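
To show what an edge filter does, the sketch below convolves a tiny synthetic channel containing a single vertical edge with a Laplacian-style kernel. The kernel is one common choice, not the only one, and the names `channel` and `edge_kernel` are made up for this example.

```python
import numpy as np
from scipy.signal import convolve2d

# Stand-in single channel: dark left half, bright right half (a vertical edge).
channel = np.zeros((6, 6), dtype='float32')
channel[:, 3:] = 1.0

# A Laplacian-style kernel, one common choice for edge detection.
edge_kernel = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])

# mode='same' keeps the output the same size as the input;
# boundary='symm' mirrors the image at the borders instead of zero-padding.
edges = convolve2d(channel, edge_kernel, mode='same', boundary='symm')
print(edges)
```

The output is zero wherever the channel is flat and non-zero only in the two columns on either side of the edge.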

<h4> Neural Network </h4>

Build a convolutional neural network in keras for image classification. Your task is to classify the blood cell images into the different cell types. Use the X_train, y_train, X_test and y_test that you loaded before. You are free to choose the structure, loss function and hyperparameters of your network. To monitor the accuracy as you train, pass metrics=['accuracy'] when you compile your model.
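
Since the labels loaded earlier are cell type names, they have to be turned into numbers before a network can train on them. The sketch below shows one possible encoding on a short made-up label list (`fake_labels`, `label_to_index`, `y_int` and `y_one_hot` are illustrative names, not part of the exercise).

```python
import numpy as np

cell_types = ['EOSINOPHIL', 'LYMPHOCYTE', 'MONOCYTE', 'NEUTROPHIL']

# Stand-in for y_train: a short list of cell type names.
fake_labels = ['MONOCYTE', 'EOSINOPHIL', 'NEUTROPHIL', 'MONOCYTE']

# Map each name to its position in cell_types.
label_to_index = {name: i for i, name in enumerate(cell_types)}
y_int = np.array([label_to_index[name] for name in fake_labels])

# One-hot encode by indexing into an identity matrix.
y_one_hot = np.eye(len(cell_types), dtype='float32')[y_int]

print(y_int)
print(y_one_hot.shape)
```

In keras, integer labels pair with the 'sparse_categorical_crossentropy' loss and one-hot labels with 'categorical_crossentropy'.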

In [13]:
from keras.models import Sequential
import keras.layers as layers
from keras.layers import Dense, Conv2D

# your code goes here
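model = Sequential()

# First convolutional layer; filters, activation and input_shape are
# starting values to tune, with input_shape matching im.shape above.
model.add(Conv2D(filters=16,
                 kernel_size=(5, 5),
                 activation='relu',
                 input_shape=(240, 320, 3)))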

Visualize the structure of your model with `model.summary()`:

In [14]:
# your code goes here

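Train the network and monitor the training process. Store the history object returned by `model.fit` and pass it to the following function to plot the training curves. The function reads validation metrics, so provide validation data when fitting (for example with `validation_data` or `validation_split`):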

In [15]:
def plot_loss_curves(history):
    loss = history.history['loss']
    val_loss = history.history['val_loss']

    accuracy = history.history['accuracy']
    val_accuracy = history.history['val_accuracy']

    epochs = range(len(history.history['loss']))

    plt.plot(epochs, loss, label='training_loss')
    plt.plot(epochs, val_loss, label='val_loss')
    plt.title('loss')
    plt.xlabel('Epochs')
    plt.legend()

    plt.figure()
    plt.plot(epochs, accuracy, label='training_accuracy')
    plt.plot(epochs, val_accuracy, label='val_accuracy')
    plt.title('Accuracy')
    plt.xlabel('Epochs')
    plt.legend()
    
# your code goes here

Evaluate your network. Compute the accuracy and plot some of your predictions and results. Use `print` and matplotlib's `show` function

In [16]:
# your code goes here
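
As a small illustration of how accuracy follows from a network's output, the sketch below uses made-up class probabilities in place of real predictions (`fake_probs` and `fake_true` are hypothetical stand-ins for the model output and the integer-encoded test labels).

```python
import numpy as np

# Stand-in for model.predict output: one row of class probabilities per image.
fake_probs = np.array([[0.7, 0.1, 0.1, 0.1],
                       [0.2, 0.5, 0.2, 0.1],
                       [0.1, 0.2, 0.3, 0.4],
                       [0.6, 0.2, 0.1, 0.1]])
# Stand-in for the true integer labels.
fake_true = np.array([0, 1, 2, 0])

# The predicted class is the index of the largest probability in each row.
predicted = np.argmax(fake_probs, axis=1)

# Accuracy is the fraction of predictions that match the true labels.
accuracy = np.mean(predicted == fake_true)
print(f'accuracy: {accuracy:.2f}')
```

Comparing `predicted` with the true labels element by element also shows which images were misclassified, which is useful when choosing examples to plot.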

Choose different parameters and hyperparameters and retrain your network. Monitor the results and see if you can improve your predictions by tuning the parameters

In [17]:
# your code goes here