# Assignment: Amoeba Classification

This is the amoeba classification assignment. Students should fill in the blank code in the "Build and train the model" and "Evaluate the model" parts, using the code in "Example: Clothes classification" as a guide.

Here, we use images collected in our research lab to train a custom model that classifies whether an image contains an amoeba.



## Table of contents

* Load images dataset
* Data preparation
* Build and train the model (blank in here)
* Evaluate the model (blank in here)
* Inference

# Load images dataset

The image dataset is loaded here and will be used to train our custom model. All of the images were collected in our research lab.

In [None]:
# upload zip file of the dataset from local
from google.colab import files
uploaded = files.upload()
for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(name=fn, length=len(uploaded[fn])))

!unzip dataset-amoebaClassification.zip

In [None]:
# download dataset from github

# %%shell
# git clone https://github.com/BaosenZ/amoeba-detection.git


# Data preparation

In this step, we prepare the data: we split it into training, validation and test datasets, and then normalize each of them. Here we provide one method to prepare the dataset. More ways can be found here: https://keras.io/examples/vision/image_classification_from_scratch/.

In [None]:
import os
import numpy as np
from tqdm import tqdm
from glob import glob
from PIL import Image
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow import keras

# training data preparation

# define image size, it can be modified 
img_size = 299
# training images dataset path
train_path = 'dataset-amoebaClassification/train' 
nub_train = len(glob(train_path + '/*/*.jpg'))
# Create empty arrays, then fill them with the image data and labels.
X_train = np.zeros((nub_train,img_size,img_size,3),dtype=np.uint8) 
y_train = np.zeros((nub_train,),dtype=np.uint8)

i = 0
for img_path in tqdm(glob(train_path + '/*/*.jpg')):
    # print(img_path)

    img = Image.open(img_path)
    # image resize
    img = img.resize((img_size,img_size)) 
    # images are converted to array
    arr = np.asarray(img)
    # assign array
    X_train[i, :, :, :] = arr
    
    if img_path.split('/')[-2] == 'amoeba':
        # Set amoeba class as 0
        y_train[i] = 0
    else:
        # Set no amoeba class as 1
        y_train[i] = 1
        
    i += 1
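
The labelling rule in the loop above depends on the name of the parent folder. As a minimal sketch, assuming the folder layout `<split>/<class>/<file>.jpg`, the same rule could be written as a small helper based on `pathlib`. The helper name `label_from_path` and the example file names are hypothetical and not part of the original notebook.

```python
from pathlib import Path

def label_from_path(img_path):
    """Return 0 for images stored in an 'amoeba' folder, 1 for any other folder."""
    # The parent directory name is the class name, e.g. '.../train/amoeba/001.jpg'
    class_name = Path(img_path).parent.name
    return 0 if class_name == 'amoeba' else 1

# Hypothetical file names, for illustration only
print(label_from_path('dataset-amoebaClassification/train/amoeba/001.jpg'))
print(label_from_path('dataset-amoebaClassification/train/some_other_class/002.jpg'))
```

Using `Path(...).parent.name` avoids splitting on a hard-coded `'/'`, which may matter if the notebook is run outside Colab.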

In [None]:
# validation data preparation

# define image size, it can be modified 
img_size = 299
# validation images dataset path
validation_path = 'dataset-amoebaClassification/validation' 
nub_validation = len(glob(validation_path + '/*/*.jpg'))
# Create empty arrays, then fill them with the image data and labels.
X_validation = np.zeros((nub_validation,img_size,img_size,3),dtype=np.uint8) 
y_validation = np.zeros((nub_validation,),dtype=np.uint8)

i = 0
for img_path in tqdm(glob(validation_path + '/*/*.jpg')):
    # print(img_path)

    img = Image.open(img_path)
    # image resize
    img = img.resize((img_size,img_size)) 
    # images are converted to array
    arr = np.asarray(img)
    # assign array
    X_validation[i, :, :, :] = arr
    
    if img_path.split('/')[-2] == 'amoeba':
        # Set amoeba class as 0
        y_validation[i] = 0
    else:
        # Set no amoeba class as 1
        y_validation[i] = 1
        
    i += 1

In [None]:
# test data preparation

img_size = 299
test_path = 'dataset-amoebaClassification/test'
nub_test = len(glob(test_path + '/*/*.jpg'))

X_test = np.zeros((nub_test,img_size,img_size,3),dtype=np.uint8) 
y_test = np.zeros((nub_test,),dtype=np.uint8)

i = 0
for img_path in tqdm(glob(test_path + '/*/*.jpg')):
    # print(img_path)

    img = Image.open(img_path)
    img = img.resize((img_size,img_size))
    arr = np.asarray(img)
    X_test[i, :, :, :] = arr
          
    if img_path.split('/')[-2] == 'amoeba':
        # Set amoeba class as 0
        y_test[i] = 0
    else:
        # Set no amoeba class as 1
        y_test[i] = 1
        
    i += 1

In [None]:
# Visualize the training dataset
fig, axes = plt.subplots(3, 4, figsize=(20, 20))

# Show the first 12 training images in a 3x4 grid
for i, img in enumerate(X_train[:12]):
    axes[i // 4, i % 4].imshow(img)

In [None]:
# normalize the dataset using the mean and standard deviation of the training set only
X_mean = X_train.mean(axis=0, keepdims=True)
X_std = X_train.std(axis=0, keepdims=True) + 1e-7
X_train_norm = (X_train - X_mean) / X_std
X_validation_norm = (X_validation - X_mean) / X_std
X_test_norm = (X_test - X_mean) / X_std

# The images are RGB and already have a channel axis (shape: N x 299 x 299 x 3),
# which matches the model input shape, so no extra axis is added here.
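
The normalization above computes one mean and one standard deviation per pixel and channel from the training set, and reuses them for the validation and test sets. The following is a small sketch of the same idea on random stand-in data; the array names and the tiny 4x4 image size are assumptions chosen only to keep the example fast.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for X_train and X_test: tiny 4x4 RGB "images"
train = rng.integers(0, 256, size=(8, 4, 4, 3)).astype(np.float32)
test = rng.integers(0, 256, size=(2, 4, 4, 3)).astype(np.float32)

# Statistics come from the training set only, one value per pixel and channel
mean = train.mean(axis=0, keepdims=True)
std = train.std(axis=0, keepdims=True) + 1e-7  # small constant avoids division by zero

train_norm = (train - mean) / std
test_norm = (test - mean) / std  # reuse the training statistics

print(mean.shape)
# After normalization, every pixel position has (approximately) zero mean over the training set
print(np.allclose(train_norm.mean(axis=0), 0, atol=1e-4))
```

Reusing the training statistics keeps information about the validation and test images out of the preprocessing step.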

# Build and train the model (blank in here)

A simple convolutional neural network (CNN) is used as the model.

In [None]:
# Build the model (sequential CNN model)
from functools import partial

DefaultConv2D = partial(keras.layers.Conv2D, kernel_size=3, activation='relu', padding="SAME")

model = keras.models.Sequential([                           
    DefaultConv2D(filters=64, kernel_size=7, input_shape=[299, 299, 3]),
    keras.layers.MaxPooling2D(pool_size=2),
    DefaultConv2D(filters=128),
    DefaultConv2D(filters=128),
    keras.layers.MaxPooling2D(pool_size=2),
    DefaultConv2D(filters=256),
    DefaultConv2D(filters=256),
    keras.layers.MaxPooling2D(pool_size=2),
    keras.layers.Flatten(),
    keras.layers.Dense(units=128, activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(units=64, activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(units=32, activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(units=16, activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(units=2, activation='softmax'),
])
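
To get a feel for the size of this network before calling `model.summary()`, the feature-map sizes can be worked out by hand. The sketch below assumes the Keras defaults for `MaxPooling2D` (strides equal to the pool size, `'valid'` padding) and relies on the `"SAME"` padding of the convolutions, which leaves height and width unchanged. The helper name `pooled_size` is hypothetical.

```python
def pooled_size(size, pool=2):
    """Output size of a pooling layer with stride == pool and 'valid' padding."""
    return (size - pool) // pool + 1

size = 299                     # input height and width
for _ in range(3):             # the model has three MaxPooling2D layers
    size = pooled_size(size)   # 'same' convolutions do not change the size

flat_units = size * size * 256          # 256 filters in the last convolution block
dense_params = flat_units * 128 + 128   # weights plus biases of the first Dense layer
print(size, flat_units, dense_params)   # 37 350464 44859520
```

Under these assumptions, most of the trainable parameters sit in the first `Dense` layer after `Flatten`, not in the convolutions.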

In [None]:
# Compile the model
model.compile(loss="sparse_categorical_crossentropy", optimizer="nadam", metrics=["accuracy"])
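
The `sparse_categorical_crossentropy` loss used above takes integer class labels (0 or 1 here) together with the softmax probabilities. As a simplified NumPy sketch, not the Keras implementation itself, the loss is the mean of the negative log-probability assigned to the true class. The example probabilities are made up.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, y_prob):
    """Mean of -log(probability assigned to the true class)."""
    y_prob = np.clip(y_prob, 1e-7, 1.0)               # avoid log(0)
    picked = y_prob[np.arange(len(y_true)), y_true]   # probability of the true class per sample
    return float(-np.log(picked).mean())

# Made-up example: two images, true classes 0 and 1
y_true = np.array([0, 1])
y_prob = np.array([[0.9, 0.1],
                   [0.2, 0.8]])
loss = sparse_categorical_crossentropy(y_true, y_prob)
print(round(loss, 4))
```

The loss shrinks towards 0 as the probability assigned to the true class approaches 1.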

Fill out the blank here to start training.

In [None]:
# Train the model
history = model. ( , , epochs=10, validation_data=(X_validation_norm, y_validation))

# Evaluate the model (blank in here)

In [None]:
# Visualize the model structure with model.summary(). Feel free to uncomment the line below to print the model structure

# model.summary()

The test dataset is not used for training or validation, so its images are new to the trained model. We will use this dataset to evaluate the performance of the model. The performance is acceptable if the accuracy on the test dataset is close to the accuracy on the training and validation datasets.

Fill out the blank here to finish the performance evaluation.

In [None]:
# Using test dataset to evaluate loss and accuracy for trained model
results = model.evaluate( , , batch_size=128)

By the final epoch, the accuracy for the training and validation datasets should be close in value. Comparing the two is an easy way to check for overfitting: a training accuracy well above the validation accuracy suggests the model is overfitting.

In [None]:
# plot accuracy vs epoch
plt.plot(history.history['accuracy'],'r')
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left') 
plt.ylim([0, 1.1])
plt.show()
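
Alongside the plot, the gap between training and validation accuracy can be checked numerically. The sketch below assumes a dictionary with the same keys as `history.history`. The helper name `accuracy_gap` is hypothetical, and the numbers are made up for illustration rather than taken from this assignment.

```python
def accuracy_gap(history_dict):
    """Final-epoch training accuracy minus final-epoch validation accuracy."""
    return history_dict['accuracy'][-1] - history_dict['val_accuracy'][-1]

# Made-up numbers for illustration only, not results from this assignment
example_history = {
    'accuracy': [0.70, 0.85, 0.95],
    'val_accuracy': [0.68, 0.80, 0.82],
}
gap = accuracy_gap(example_history)
print(round(gap, 2))
```

With a trained model, `accuracy_gap(history.history)` would give the corresponding value. A large positive gap hints at overfitting; how large is too large is a judgement call.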

# Inference

We will visualize the image and use our own judgement to see if there is an amoeba in the image or not, and then compare the outcome to the model's prediction. 

In [None]:
import matplotlib.pyplot as plt
from keras.preprocessing import image
import numpy as np

# Visualize one image, X_test[x]. Here we choose X_test[1]; any image in the test dataset can be used
img1 = X_test[1]
plt.imshow(img1)


We will run the model.predict() function on the image above to see whether the prediction matches our judgement.

In [None]:
# class label
class_label = ['amoeba present', 'no amoeba present']

# image process
x = np.squeeze(X_test_norm[1])
x = image.img_to_array(x)
x = np.expand_dims(x, axis=0)

# predict the image with model.predict()
y_prob = model.predict(x)
print("probability for each of the categories: ", y_prob)
y_class = y_prob.argmax(axis=-1)
print("model prediction: ", class_label[y_class[0]])
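
The same `argmax` step extends from one image to a whole batch. As a rough sketch, assuming an array of probabilities laid out like the output of `model.predict()` (one row per image, one column per class), predictions can be compared with the true labels to get an accuracy and a 2x2 confusion matrix. All numbers below are made up for illustration.

```python
import numpy as np

# Made-up probabilities in the same layout as model.predict() output
y_prob = np.array([[0.8, 0.2],
                   [0.3, 0.7],
                   [0.6, 0.4],
                   [0.1, 0.9]])
y_true = np.array([0, 1, 1, 1])   # hypothetical ground truth (0 = amoeba, 1 = no amoeba)

# Predicted class is the column with the highest probability
y_pred = y_prob.argmax(axis=-1)
accuracy = float((y_pred == y_true).mean())

# 2x2 confusion matrix: rows are true classes, columns are predicted classes
confusion = np.zeros((2, 2), dtype=int)
for t, p in zip(y_true, y_pred):
    confusion[t, p] += 1

print(y_pred, accuracy)
print(confusion)
```

The off-diagonal entries show which kind of mistake the model makes, which a single accuracy number hides.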