# Assignment: Amoeba Detection

In this assignment, students fill in the code blocks in the **"Train custom amoeba model"** and **"Evaluate amoeba model"** sections, using the code in "Example: kangaroo detection" as a guide.

Here, we use images collected in our research lab to train a custom model that detects the position and number of amoebae in each image.

## Table of contents

* Set up environment
* Load images dataset
* Data preparation and configurations
* Train custom amoeba model (blank in here)
* Evaluate amoeba model (blank in here)
* Inference

# Set up environment

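We install the Mask R-CNN repository from GitHub (https://github.com/matterport/Mask_RCNN) together with the packages it needs. The package versions are pinned because this implementation was written for TensorFlow 1.x and older Keras releases, and it does not run on newer versions without modification.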

In [None]:
!pip install tensorflow==1.15.0
!pip install keras==2.1.6
!pip install h5py==2.10.0
!pip install scikit-image==0.16.2
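After the install cell runs, you can optionally confirm that the pins actually took effect. The sketch below uses two hypothetical helpers, `installed_version` and `check_pins`, which are not part of Mask R-CNN. They compare the pinned versions above against the installed distributions. `importlib.metadata` needs Python 3.8+, so on older runtimes the helper assumes setuptools' `pkg_resources` is available instead.

```python
# Hypothetical helpers: compare pinned package versions with what is installed.
PINS = {
    "tensorflow": "1.15.0",
    "keras": "2.1.6",
    "h5py": "2.10.0",
    "scikit-image": "0.16.2",
}

def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
    except ImportError:
        # Older runtimes: fall back to setuptools' pkg_resources (assumed installed).
        import pkg_resources
        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None

def check_pins(pins, lookup=installed_version):
    """Return {package: (wanted, found)} for every package that does not match its pin."""
    mismatches = {}
    for package, wanted in pins.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches

print(check_pins(PINS) or "All pinned versions are installed.")
```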

In [None]:
%%shell
# clone Mask_RCNN repo and install packages
git clone https://github.com/matterport/Mask_RCNN
cd Mask_RCNN
python setup.py install

In [None]:
import os
import sys
import random
import math
import numpy as np
import skimage.io
import matplotlib
import matplotlib.pyplot as plt

# Root directory of the project
ROOT_DIR = os.path.abspath("./Mask_RCNN/")

# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize

# Import COCO config
sys.path.append(os.path.join(ROOT_DIR, "samples/coco/"))  # find local version
import coco

%matplotlib inline 

# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")

# Local path to trained weights file
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)

# Directory of images to run detection on
IMAGE_DIR = os.path.join(ROOT_DIR, "images")

# Load images dataset

We clone the project from our GitHub repository (https://github.com/BaosenZ/amoeba-detection.git), which includes the dataset.

In [None]:
%%shell
# download the dataset from GitHub (a cell magic such as %%shell must be the first line of the cell)
git clone https://github.com/BaosenZ/amoeba-detection.git


In [None]:
# copy the dataset from the cloned repository to /content
!cp -r '/content/amoeba-detection/dataset-section2/amoebaDataset/trainingDataset' '/content'
!cp -r '/content/amoeba-detection/dataset-section2/amoebaDataset/testDataset' '/content'
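`AmoebaDataset` (defined further below) expects every image in `<dataset>/images/` to have a matching `<dataset>/annots/<image name>.xml`. A missing file only shows up later as a crash inside `load_mask`. The sketch below checks the pairing right after copying the data. `unpaired_files` is a hypothetical helper, and it assumes that layout and naming scheme.

```python
import os

def unpaired_files(dataset_dir):
    """Return (images without an annotation, annotations without an image) as sorted stems.

    Assumes the layout AmoebaDataset reads: <dataset_dir>/images/ and
    <dataset_dir>/annots/, with each annotation named <image stem>.xml.
    """
    image_stems = {os.path.splitext(f)[0]
                   for f in os.listdir(os.path.join(dataset_dir, "images"))
                   if not f.startswith(".")}  # skip hidden entries like .ipynb_checkpoints
    annot_stems = {os.path.splitext(f)[0]
                   for f in os.listdir(os.path.join(dataset_dir, "annots"))
                   if f.endswith(".xml")}
    return sorted(image_stems - annot_stems), sorted(annot_stems - image_stems)

for name in ("trainingDataset", "testDataset"):
    if os.path.isdir(name):
        no_annot, no_image = unpaired_files(name)
        print(name, "images without annotations:", no_annot,
              "annotations without images:", no_image)
```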

In [None]:
# alternatively, upload a zip of the dataset from your local machine

# from google.colab import files
# uploaded = files.upload()
# for fn in uploaded.keys():
#     print('User uploaded file "{name}" with length {length} bytes'.format(name=fn, length=len(uploaded[fn])))

# !unzip trainingDataset.zip
# !unzip testDataset.zip

# Data preparation and configurations
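The `extract_boxes` method in the next cell reads Pascal VOC-style XML annotations: one `<bndbox>` per object plus the image `<size>`. The sketch below applies the same parsing logic to a small string so you can see exactly what `extract_boxes` returns. The file name and coordinates are invented for illustration, and `boxes_from_xml` is a hypothetical name.

```python
from xml.etree import ElementTree

# A made-up annotation in the layout that extract_boxes() expects.
SAMPLE_XML = """
<annotation>
  <filename>001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>amoeba</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>50</xmax><ymax>70</ymax></bndbox>
  </object>
  <object>
    <name>amoeba</name>
    <bndbox><xmin>100</xmin><ymin>120</ymin><xmax>130</xmax><ymax>160</ymax></bndbox>
  </object>
</annotation>
"""

def boxes_from_xml(xml_text):
    """Same logic as AmoebaDataset.extract_boxes, but reading from a string."""
    root = ElementTree.fromstring(xml_text.strip())
    boxes = []
    for box in root.findall(".//bndbox"):
        # [xmin, ymin, xmax, ymax], the same order extract_boxes uses
        boxes.append([int(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")])
    width = int(root.find(".//size/width").text)
    height = int(root.find(".//size/height").text)
    return boxes, width, height

print(boxes_from_xml(SAMPLE_XML))
```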

In [None]:
from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from mrcnn.utils import Dataset
from mrcnn.config import Config
from mrcnn.model import MaskRCNN

class AmoebaDataset(Dataset):
	def load_dataset(self, dataset_dir, is_train=True):
		# Add classes
		self.add_class("dataset", 1, "amoeba")
		images_dir = dataset_dir + '/images/'
		annotations_dir = dataset_dir + '/annots/'
		for filename in listdir(images_dir):
			image_id = filename[:-4]
			# split by image number: ids below 170 are training, 170 and above are validation
			if is_train and int(image_id) >= 170:
				continue
			if not is_train and int(image_id) < 170:
				continue
			img_path = images_dir + filename
			ann_path = annotations_dir + image_id + '.xml'
			self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

	# extract bounding boxes from an annotation file
	def extract_boxes(self, filename):
		tree = ElementTree.parse(filename)
		root = tree.getroot()
		boxes = list()
		for box in root.findall('.//bndbox'):
			xmin = int(box.find('xmin').text)
			ymin = int(box.find('ymin').text)
			xmax = int(box.find('xmax').text)
			ymax = int(box.find('ymax').text)
			coors = [xmin, ymin, xmax, ymax]
			boxes.append(coors)
		width = int(root.find('.//size/width').text)
		height = int(root.find('.//size/height').text)
		return boxes, width, height

	# load the masks for an image
	def load_mask(self, image_id):
		info = self.image_info[image_id]
		path = info['annotation']
		boxes, w, h = self.extract_boxes(path)
		masks = zeros([h, w, len(boxes)], dtype='uint8')
		class_ids = list()
		for i in range(len(boxes)):
			box = boxes[i]
			row_s, row_e = box[1], box[3]
			col_s, col_e = box[0], box[2]
			masks[row_s:row_e, col_s:col_e, i] = 1
			class_ids.append(self.class_names.index('amoeba'))
		return masks, asarray(class_ids, dtype='int32')

	def image_reference(self, image_id):
		info = self.image_info[image_id]
		return info['path']


# prepare train set
train_set = AmoebaDataset()
train_set.load_dataset('trainingDataset', is_train=True)
train_set.prepare()
print('Train: %d' % len(train_set.image_ids))
# prepare val set
val_set = AmoebaDataset()
val_set.load_dataset('trainingDataset', is_train=False)
val_set.prepare()
print('Val: %d' % len(val_set.image_ids))
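The annotations contain only bounding boxes, so `load_mask` above fills each box as a solid rectangle. Each instance "mask" is therefore the box region itself. The stdlib sketch below mirrors that indexing with nested lists on a tiny made-up image: rows come from the y coordinates, columns from the x coordinates, and the end index is exclusive. `box_masks` is a hypothetical name.

```python
def box_masks(boxes, width, height):
    """Return one height x width 0/1 grid per [xmin, ymin, xmax, ymax] box.

    Mirrors masks[row_s:row_e, col_s:col_e, i] = 1 in load_mask; min() clamps
    the end indices the way NumPy slicing does.
    """
    masks = []
    for xmin, ymin, xmax, ymax in boxes:
        grid = [[0] * width for _ in range(height)]
        for row in range(ymin, min(ymax, height)):
            for col in range(xmin, min(xmax, width)):
                grid[row][col] = 1
        masks.append(grid)
    return masks

# A made-up 6 x 8 image with one box from (x=1, y=2) to (x=4, y=5)
masks = box_masks([[1, 2, 4, 5]], width=8, height=6)
for row in masks[0]:
    print("".join(str(v) for v in row))
```

In this example, the filled area equals the box area, (4 - 1) * (5 - 2) = 9 pixels. Box-shaped masks are a coarse stand-in for the true amoeba outlines, which is worth keeping in mind when reading the predicted masks.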



Config files allow you to separate the code from the parameters of the machine learning pipeline to help produce repeatable outcomes.
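In the Mask R-CNN library, a subclass of `Config` overrides only the attributes it needs. The base class then derives other values from them, for example `BATCH_SIZE = IMAGES_PER_GPU * GPU_COUNT`. The sketch below imitates that pattern with a hypothetical `MiniConfig`, which is not the library class, and its default values are illustrative only. It also shows how many images one training epoch covers.

```python
class MiniConfig:
    """Hypothetical stand-in for mrcnn.config.Config (illustrative defaults only)."""
    NAME = None
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2
    STEPS_PER_EPOCH = 1000

    def __init__(self):
        # Derived value, computed from whatever the subclass overrides.
        self.BATCH_SIZE = self.IMAGES_PER_GPU * self.GPU_COUNT

    def images_per_epoch(self):
        # Each training step consumes one batch.
        return self.STEPS_PER_EPOCH * self.BATCH_SIZE

class MiniAmoebaConfig(MiniConfig):
    # Only the values that differ from the base class are overridden.
    NAME = "amoeba_cfg"
    STEPS_PER_EPOCH = 131

cfg_demo = MiniAmoebaConfig()
print(cfg_demo.BATCH_SIZE, cfg_demo.images_per_epoch())
```

For the real configuration, `config.display()` in the next cell prints the actual `BATCH_SIZE` and `STEPS_PER_EPOCH` in use.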

In [None]:
# prepare config
class AmoebaConfig(Config):
	# Give the configuration a recognizable name
	NAME = "amoeba_cfg"
	# Number of classes (including background)
	NUM_CLASSES = 1 + 1 # background + 1 amoeba
	# Number of training steps (batches) per epoch
	STEPS_PER_EPOCH = 131
	# IoU threshold for non-maximum suppression of the final detections
	DETECTION_NMS_THRESHOLD = 0.5

config = AmoebaConfig()
config.display()
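`DETECTION_NMS_THRESHOLD` controls non-maximum suppression (NMS). When two detections overlap with an IoU above the threshold, only the higher-scoring one is kept. The sketch below is a minimal greedy NMS in pure Python. It is not the library's TensorFlow implementation. Boxes use the `(y1, x1, y2, x2)` order that `model.detect()` returns in `rois`.

```python
def iou(a, b):
    """Intersection over union of two (y1, x1, y2, x2) boxes."""
    inter_h = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, threshold=0.5):
    """Return indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= threshold for k in keep):
            keep.append(i)
    return keep

demo_boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
demo_scores = [0.9, 0.95, 0.8]
print(greedy_nms(demo_boxes, demo_scores, threshold=0.5))
```

The first two boxes overlap with IoU 0.81, so at a threshold of 0.5 the lower-scoring one is suppressed. A higher threshold such as 0.9 would keep both, which means more duplicate boxes on the same amoeba and an inflated count.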

# Train custom amoeba model (blank in here)

In machine learning, you need to measure a model in order to improve it. TensorBoard provides the measurements and visualizations needed during the machine learning workflow, such as plots of the training and validation losses over time. More information about TensorBoard is available here (https://www.tensorflow.org/tensorboard/get_started).

In [None]:
# run tensorboard to visualize training
import keras
import os
root_logdir = os.path.join(os.curdir, "my_logs")
def get_run_logdir():
    import time
    run_id = time.strftime("run_%Y_%m_%d-%H_%M_%S")
    return os.path.join(root_logdir, run_id)

run_logdir = get_run_logdir()
print(run_logdir)
tensorboard_cb = keras.callbacks.TensorBoard(run_logdir)

The Mask R-CNN architecture is described in the paper (He, K., Gkioxari, G., Dollár, P. and Girshick, R., 2017. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2961-2969)) and in my lecture slides.

Train in two stages:
1. Only the heads. We freeze all the backbone layers and train only the randomly initialized layers (i.e., the layers that did not receive pre-trained MS COCO weights). To train only the head layers, pass `layers='heads'` to the `train()` function.

2. Fine-tune all layers. This is not necessary for this simple example, but we include it to show the process. Pass `layers='all'` to train all layers.

### Fill out the blank in 'train the model' block

In [None]:
# train the model
model = MaskRCNN(mode='training', model_dir='./', config=config)
model.load_weights('Mask_RCNN/mask_rcnn_coco.h5', by_name=True, exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])
model.train(, , learning_rate=0.002, epochs=10, layers='', custom_callbacks=[tensorboard_cb]) # blank in here
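You can plan the two-stage procedure before calling `train()`. In this Mask R-CNN implementation, the `epochs` argument is a cumulative total rather than a number of additional epochs. A second call therefore has to pass a larger value than the first, or it will not train further. The helper `two_stage_schedule` below is hypothetical and only builds the settings. The 10x lower learning rate for the all-layers stage is an assumed, commonly used choice, not a requirement.

```python
def two_stage_schedule(head_epochs, extra_all_epochs, learning_rate, finetune_factor=0.1):
    """Hypothetical helper: settings for heads-then-all training.

    The 'epochs' values are cumulative totals, matching how Mask R-CNN's
    train() interprets its epochs argument.
    """
    return [
        {"layers": "heads", "epochs": head_epochs, "learning_rate": learning_rate},
        {"layers": "all", "epochs": head_epochs + extra_all_epochs,
         "learning_rate": learning_rate * finetune_factor},
    ]

for stage in two_stage_schedule(head_epochs=10, extra_all_epochs=5, learning_rate=0.002):
    print(stage)
```

Each dict's values correspond to the `layers`, `epochs` and `learning_rate` keyword arguments of `model.train(...)`. The datasets still have to be supplied as in the cell above.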

# Evaluate amoeba model (blank in here)
 


Evaluate the Mask R-CNN model on the amoeba training, validation, and test datasets.

In [None]:
from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from numpy import expand_dims
from numpy import mean
from mrcnn.config import Config
from mrcnn.model import MaskRCNN
from mrcnn.utils import Dataset
from mrcnn.utils import compute_ap
from mrcnn.model import load_image_gt
from mrcnn.model import mold_image

class PredictionConfig(Config):
	NAME = "amoeba_cfg"
	NUM_CLASSES = 1 + 1
	GPU_COUNT = 1
	IMAGES_PER_GPU = 1

# calculate the mAP for a model on a given dataset
def evaluate_model(dataset, model, cfg):
	APs = list()
	for image_id in dataset.image_ids:
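		# load the image and its ground-truth class ids, boxes and masks
		image, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(dataset, cfg, image_id, use_mini_mask=False)
		# model.detect() resizes and normalizes (molds) the image itself,
		# so pass the raw image rather than the output of mold_image()
		yhat = model.detect([image], verbose=0)
		r = yhat[0]
		# AP for this image at the default IoU threshold of 0.5
		AP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask, r["rois"], r["class_ids"], r["scores"], r['masks'])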
		APs.append(AP)
	mAP = mean(APs)
	return mAP
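For each image, `compute_ap` from `mrcnn.utils` does the following:

* ranks the predictions by score,
* matches each prediction to an unmatched ground-truth instance with IoU of at least 0.5,
* integrates the resulting precision-recall curve.

`evaluate_model` then averages these per-image APs into mAP. The sketch below reproduces that idea in pure Python for a single class. It simplifies by using box IoU, whereas the library matches on mask overlap. It also returns 0.0 for an image with no ground truth, which is an arbitrary choice for this sketch.

```python
def box_iou(a, b):
    """IoU of two (y1, x1, y2, x2) boxes."""
    inter = max(0, min(a[2], b[2]) - max(a[0], b[0])) * max(0, min(a[3], b[3]) - max(a[1], b[1]))
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(gt_boxes, pred_boxes, pred_scores, iou_threshold=0.5):
    """Single-class AP for one image (simplified sketch of the compute_ap idea)."""
    if not gt_boxes:
        return 0.0
    order = sorted(range(len(pred_boxes)), key=lambda i: pred_scores[i], reverse=True)
    matched, hits = set(), []
    for i in order:
        # Match to the unmatched ground truth with the highest IoU >= threshold.
        best, best_iou = None, iou_threshold
        for j, gt in enumerate(gt_boxes):
            overlap = box_iou(pred_boxes[i], gt)
            if j not in matched and overlap >= best_iou:
                best, best_iou = j, overlap
        hits.append(0 if best is None else 1)
        if best is not None:
            matched.add(best)
    # Precision/recall after each ranked prediction, padded at both ends.
    precisions, recalls, tp = [0.0], [0.0], 0
    for rank, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / rank)
        recalls.append(tp / len(gt_boxes))
    precisions.append(0.0)
    recalls.append(1.0)
    # Make precision non-increasing from right to left.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum precision times the recall step wherever recall increases.
    return sum((recalls[k] - recalls[k - 1]) * precisions[k]
               for k in range(1, len(recalls)) if recalls[k] != recalls[k - 1])

print(average_precision([(0, 0, 10, 10)], [(0, 0, 10, 10)], [0.9]))
```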


In [None]:
# create config
cfg = PredictionConfig()
# define the model
model = MaskRCNN(mode='inference', model_dir='./', config=cfg)

# load model weights
# Get path to saved weights
# Either set a specific path or find last trained weights
# model_path = os.path.join(ROOT_DIR, ".h5 file name here")
# find last trained weights: 
model_path = model.find_last()
model.load_weights(model_path, by_name=True)

# set a specific path
#model.load_weights('amoeba_cfg20210206T1746/mask_rcnn_amoeba_cfg_0007.h5', by_name=True)  # change the weights path to run the code

### Fill out the blank in 'eval the model' block

In [None]:
# load the train dataset
train_set = AmoebaDataset()
train_set.load_dataset('trainingDataset', is_train=True)
train_set.prepare()
print('Train: %d' % len(train_set.image_ids))
# load the val dataset
val_set = AmoebaDataset()
val_set.load_dataset('trainingDataset', is_train=False)
val_set.prepare()
print('Val: %d' % len(val_set.image_ids))
# load the test dataset
test_set = AmoebaDataset()
test_set.load_dataset('testDataset')
test_set.prepare()
print('Test: %d' % len(test_set.image_ids))


# evaluate model on training dataset
train_mAP = evaluate_model( , model, cfg) # blank here
print("Train mAP: %.3f" % train_mAP)
# evaluate model on validation dataset
val_mAP = evaluate_model( , model, cfg) # blank here
print("Val mAP: %.3f" % val_mAP)
# evaluate model on test dataset
test_mAP = evaluate_model( , model, cfg) # blank here
print("Test mAP: %.3f" % test_mAP)

In [None]:
# visualize training in Tensorboard

# The run directory names (e.g. run_2022_06_20-22_07_58) are listed in 'my_logs'. Replace the name below with your own run.

# %load_ext tensorboard
# %tensorboard --logdir ./my_logs/run_2022_06_20-22_07_58
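`get_run_logdir()` names runs `run_%Y_%m_%d-%H_%M_%S` with zero-padded fields. Sorting the names alphabetically therefore also sorts them chronologically, so the newest run can be looked up instead of typed by hand. `latest_run` is a hypothetical helper, and it assumes all runs live directly under `./my_logs`.

```python
import os

def latest_run(root_logdir="./my_logs"):
    """Return the path of the newest run_* directory, or None if there is none."""
    if not os.path.isdir(root_logdir):
        return None
    runs = sorted(name for name in os.listdir(root_logdir)
                  if name.startswith("run_")
                  and os.path.isdir(os.path.join(root_logdir, name)))
    return os.path.join(root_logdir, runs[-1]) if runs else None

print(latest_run())
```

You can then pass the printed path to `%tensorboard --logdir`.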

# Inference


## Drop inference

The concentration of amoebae in a sample can be determined by imaging an entire drop of a particular volume of the sample and counting the number of amoebae present there. The models described above were used to analyze the images of the entire drop and count the number of bounding boxes within all images. 
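Concentration is the total count over all images of one drop divided by the drop volume. The sketch below shows that arithmetic with invented numbers: the per-image counts and the 2 µL drop volume are hypothetical, not results from this dataset. It also assumes the images tile the drop without overlapping, so that no amoeba is counted twice.

```python
def concentration_per_ml(counts_per_image, drop_volume_ul):
    """Amoebae per millilitre from per-image counts that cover one whole drop.

    Assumes the images tile the drop without overlap (no double counting).
    """
    total = sum(counts_per_image)
    return total * 1000.0 / drop_volume_ul  # 1 mL = 1000 uL

# Hypothetical example: four images covering a 2 uL drop.
print(concentration_per_ml([3, 5, 0, 2], drop_volume_ul=2.0))
```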

There are six example drops. Change the image directory path (`images_dir`) to run inference on each drop.

We detect the amoebae in each image by calling `model.detect()`.

In [None]:
from matplotlib import pyplot
from matplotlib.patches import Rectangle

# only detections at or above this confidence score are counted and drawn
SCORE_THRESHOLD = 0.96

def count1_amoeba(image, model, cfg):
  image = np.asanyarray(image)
  # model.detect() molds (resizes and normalizes) the image itself
  yhat = model.detect([image], verbose=0)[0]

  # count the detections that pass the score threshold
  count = sum(1 for confidence in yhat['scores'] if confidence >= SCORE_THRESHOLD)
  print("The number of amoebae is", count)
  return count

def save_predicted(image, model, cfg, filename):
  image = np.asanyarray(image)
  yhat = model.detect([image], verbose=0)[0]
  pyplot.imshow(image)
  ax = pyplot.gca()
  pyplot.axis("off")
  for box, confidence in zip(yhat['rois'], yhat['scores']):
    if confidence >= SCORE_THRESHOLD:
      # rois are returned as (y1, x1, y2, x2)
      y1, x1, y2, x2 = box
      width, height = x2 - x1, y2 - y1
      rect = Rectangle((x1, y1), width, height, fill=False, color='red')
      pyplot.text(x1, y1, "%.3f" % confidence)
      ax.add_patch(rect)

  pyplot.savefig(filename, bbox_inches='tight', pad_inches=0.0)
  pyplot.show()
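In the loops below, `count1_amoeba` and `save_predicted` each call `model.detect()`, so every image passes through the network twice. One possible refactor is to run detection once and derive both the count and the boxes to draw from that single result. The sketch below shows only the filtering step. It uses a hand-made result dict shaped like one entry of `model.detect()`'s output (`rois` as `(y1, x1, y2, x2)` plus `scores`), and `filter_detections` is a hypothetical name.

```python
def filter_detections(result, threshold=0.96):
    """Keep (box, score) pairs whose score is at least the threshold.

    `result` is assumed to look like one entry of model.detect(): a dict with
    'rois' (y1, x1, y2, x2 boxes) and 'scores'.
    """
    return [(tuple(box), float(score))
            for box, score in zip(result["rois"], result["scores"])
            if score >= threshold]

fake_result = {
    "rois": [(10, 10, 40, 40), (50, 60, 90, 100), (5, 5, 8, 8)],
    "scores": [0.99, 0.97, 0.50],
}
kept = filter_detections(fake_result)
print(len(kept), kept)
```

Here `len(kept)` is the amoeba count and `kept` holds the boxes to draw, so one `model.detect([image])` call per image would be enough.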


In [None]:
# create folder to save inference images
from PIL import Image
os.makedirs("drop1-inference", exist_ok=True)

# find the path of original inference images
images_dir = "amoeba-detection/dataset-level2/dropInference/drop1/drop1"
save_root = "drop1-inference/"
# run the inference and count the amoebae; t is the running total
t = 0
for img in listdir(images_dir):
  img_path = images_dir + "/" + img
  # convert to 3-channel RGB, the input format Mask R-CNN expects
  image = np.asanyarray(Image.open(img_path).convert("RGB"))
  save_file = save_root + img
  save_predicted(image, model, cfg, save_file)
  t = t + count1_amoeba(image, model, cfg)
  print(t)

In [None]:
# save the inference folder to local

# !zip -r drop1-inference.zip drop1-inference
# from google.colab import files
# files.download("drop1-inference.zip")

## Test dataset inference

In [None]:
# create folder to save inference images
from PIL import Image
os.makedirs("testDataset-pred", exist_ok=True)

# find the path of original inference images
images_dir = "amoeba-detection/dataset-level2/amoebaDataset/testDataset/images"
save_root = "testDataset-pred/"
# run the inference and count the amoebae; t is the running total
t = 0
for img in listdir(images_dir):
  img_path = images_dir + "/" + img
  # convert to 3-channel RGB, the input format Mask R-CNN expects
  image = np.asanyarray(Image.open(img_path).convert("RGB"))
  save_file = save_root + img
  save_predicted(image, model, cfg, save_file)
  t = t + count1_amoeba(image, model, cfg)
  print(t)

In [None]:
# save the inference folder to local

# !zip -r testDataset-pred.zip testDataset-pred
# from google.colab import files
# files.download("testDataset-pred.zip")