<a href="https://colab.research.google.com/github/Martin09/DeepSEM/blob/master/segmentation-NMs/3_nm_seg_inference.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 3 - Model loading and NM size/yield analysis
In this notebook we will:
1. Load an image for analysis.
2. Load our previously-trained model.
3. Use the model to label the SEM image.
4. Post-process the model output to learn about our NM characteristics.


Note: A GPU instance is not necessary for this notebook, as we will only be performing inference, which is less computationally expensive than training.

## Install detectron2
Again, we will be using Facebook's [detectron2](https://github.com/facebookresearch/detectron2) library to run the inference on our images, so let's install it.

In [None]:
# install dependencies: (use cu101 because colab has CUDA 10.1)
!pip install -U torch==1.5 torchvision==0.6 -f https://download.pytorch.org/whl/cu101/torch_stable.html 
!pip install cython pyyaml==5.1
!pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
import torch, torchvision
print(torch.__version__, torch.cuda.is_available())
!gcc --version

In [None]:
# install detectron2:
!pip install detectron2==0.1.2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/index.html

In [None]:
# Setup detectron2 logger
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger('logs')

# import some common libraries
import numpy as np
import os, cv2, random, tifffile, json, datetime, time, urllib
from glob import glob
from google.colab.patches import cv2_imshow
from PIL import Image
from pathlib import Path

# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import DatasetCatalog, MetadataCatalog

In [None]:
# Define the classes in our dataset
class_dict = {'slit': '1',
              'nanomembrane': '2',
              'parasitic': '3',
              'bottom_nucleus': '4',
              'side_nucleus': '5',
              'nanowire': '6',
              'overgrowth': '7'}

# Define some paths/constants that will be useful later
desired_mag = 50000  # Used to filter the TIFF input files

root = Path('./DeepSEM/segmentation-NMs/')
dataset_dir = root.joinpath('datasets')
output_dir = root.joinpath('output')
models_dir = root.joinpath('trained_models')

imgs_zip = dataset_dir.joinpath('Nick_NMs_allrawimgs.zip')
imgs_dir = dataset_dir.joinpath(imgs_zip.stem)
imgs_google_drive_id = '1M2_0GLScsNY53w8hU2xJdXtisESfkOqI'

test_dir = imgs_dir.joinpath('test')
train_dir = imgs_dir.joinpath('train')

dataset_root_name = 'nm_masks'
train_name = dataset_root_name + '_train'
test_name = dataset_root_name + '_test'

# model_path = models_dir.joinpath('nm_seg_it20k_loss0.028.yaml')
# weights_path = models_dir.joinpath('nm_seg_it20k_loss0.028.pth')

model_path = models_dir.joinpath('nm_seg_it19999_lossX.XXX.yaml')
weights_path = models_dir.joinpath('nm_seg_it19999_lossX.XXX.pth')

weights_google_drive_id = '1btMy-EyU2sTSSPQk8kf663DYgO-sD3kR'
# weights_google_drive_id = '1QDyirJCJlZwvuIKGfKRH0nCU1Dw_64G5'

## 3.1 - Unpack and load our images

In [None]:
# # Optional: Save everything to your own GoogleDrive
# from google.colab import drive
# drive.mount('/content/gdrive/')
# %cd "/content/gdrive/My Drive/path/to/save/location"

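# Clone just the relevant folder from the DeepSEM repo
# NOTE: `github_url` is not defined anywhere earlier in this notebook. The value
# below is an assumption inferred from the Colab badge link (GitHub SVN-bridge
# URL form). If the checkout fails, use the git clone alternative further down.
github_url = 'https://github.com/Martin09/DeepSEM/trunk/segmentation-NMs'
!rm -rf $root
!apt install subversion
!svn checkout $github_url $root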

# # Alternative: Clone whole DeepSEM repository
# !rm -rf DeepSEM  # Remove folder if it already exists
# !git clone https://github.com/Martin09/DeepSEM

For simplicity, I will use our previous training images for inference. However, these could be replaced with any similar unlabelled SEM images.

In [None]:
# Check if .zip file exists, if not, download it from Google Drive
if imgs_zip.exists():
  print('Dataset already exists. Skipping download!')
else:
  print('Dataset does not exist... Downloading!')
  !gdown --id $imgs_google_drive_id -O $imgs_zip

# Unzip raw dataset
!rm -rf $imgs_dir
!unzip -o $imgs_zip -d $imgs_dir

The input files were taken at many different magnifications, so we now filter them and keep only the images with the desired magnification (50k in this case).

In [None]:
in_files = list(imgs_dir.rglob('*.tif'))

images = []
# Loop over all TIFF files
for file in in_files:
    # Open each file using the tifffile library
    with tifffile.TiffFile(file) as tif:

        # Extract magnification data
        mag = tif.sem_metadata['ap_mag'][1]
        if isinstance(mag, str):  # Apply correction for "k", e.g. mag = "50 k"
            mag = float(mag.split(' ')[0]) * 1000
        else:
            mag = float(mag)

    # Keep only the images that have the magnification we are interested in
    if mag == desired_mag:
        images.append(file)
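
The magnification parsing used in the filter above can be pulled out into a small helper, which makes it easy to check on its own. This is only a sketch: `parse_magnification` is a hypothetical name, not part of the notebook, and unlike the loop above it multiplies by 1000 only when an explicit "k" suffix is present.

```python
def parse_magnification(raw):
    """Convert a magnification value such as "50 k" or 50000 to a float."""
    if isinstance(raw, str):
        parts = raw.split()
        value = float(parts[0])
        # A trailing "k" means thousands, e.g. "50 k" -> 50000.0
        if len(parts) > 1 and parts[1].lower() == 'k':
            value *= 1000
        return value
    return float(raw)


mag_from_string = parse_magnification("50 k")
mag_from_number = parse_magnification(50000)
print(mag_from_string, mag_from_number)
```

Both calls return the same float, so the result can be compared directly against `desired_mag`.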

Load a random image and show it.

In [None]:
im_path = random.sample(images,1)[0]
im = cv2.imread(str(im_path), cv2.IMREAD_GRAYSCALE)
print(im.shape)
cv2_imshow(im)

Do some pre-processing to get it ready to feed into our model.

In [None]:
# Model expects an RGB image, so copy the greyscale data into other 2 channels
im_RGB = np.repeat(im[:, :, np.newaxis], 3, axis=2)
print(im_RGB.shape)
cv2_imshow(im_RGB)
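
To see what the channel replication does to the array shape, here is a minimal sketch on a tiny synthetic array. The array `grey` is a made-up stand-in for a real SEM image.

```python
import numpy as np

# Synthetic (H, W) greyscale "image" standing in for a real SEM frame
grey = np.arange(12, dtype=np.uint8).reshape(3, 4)

# Add a trailing channel axis, then repeat it three times: (H, W) -> (H, W, 3)
rgb = np.repeat(grey[:, :, np.newaxis], 3, axis=2)

print(grey.shape, rgb.shape)

# Every channel holds an identical copy of the greyscale data
channels_match = all(np.array_equal(rgb[:, :, ch], grey) for ch in range(3))
print(channels_match)
```

Because all three channels are identical, reversing the channel order (RGB to BGR) later on has no visible effect.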

## 3.2 - Load our model

Now we will load a trained model and use it to label the above image. First we load a default config with `get_cfg()` and we then overwrite some of its parameters with our saved YAML configuration file. 

One important point is that `cfg.MODEL.WEIGHTS` must point to the weights file. As this file can be quite big (>300MB) and GitHub isn't designed to host big binary files, I have saved the weights for this model on my Google Drive instead. However, if you already have your weights saved locally (e.g. on your own Google Drive), you can skip this download.

In [None]:
# Check if the weights file exists, if not, download it from Google Drive
if weights_path.exists():
  print('Weights file already exists. Skipping download!')
else:
  print('Weights file does not exist... Downloading!')
  !gdown --id $weights_google_drive_id -O $weights_path

# weights_path points at a .pth file, so there is nothing to unzip here

Now we can go ahead with the rest of the configuration of the model.

In [None]:
cfg = get_cfg()
cfg.merge_from_file(str(model_path))  # Pass the path as a string rather than a Path object
cfg.MODEL.WEIGHTS = str(weights_path)
cfg.MODEL.DEVICE = 'cpu'  # CPU is enough for inference, no need for GPU

# If we have a lot of objects to detect, need to set higher # of proposals here:
cfg.MODEL.RPN.POST_NMS_TOPK_TEST = 1000
cfg.MODEL.RPN.PRE_NMS_TOPK_TEST = 1000
cfg.TEST.DETECTIONS_PER_IMAGE = 200

cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5   # Set the testing threshold for this model
cfg.MODEL.ROI_HEADS.NMS_THRESH_TEST = 0.2     # Non-max suppression threshold
cfg.MODEL.ROI_HEADS.NUM_CLASSES = len(class_dict) # Number of classes defined in class_dict (seven here)

# Setting allowed input sizes (avoid scaling)
cfg.INPUT.MIN_SIZE_TEST = 0
cfg.INPUT.MAX_SIZE_TEST = 99999


# A bit of a hacky way to be able to use the DefaultPredictor:
# Register a "fake" dataset to then set the 'thing_classes' metadata
# (there is probably a better way to do this...)
cfg.DATASETS.TEST = ('placeholder',)  # Trailing comma makes this a tuple, not a string
DatasetCatalog.clear()
DatasetCatalog.register("placeholder", lambda: [])  # Never actually loaded
MetadataCatalog.get("placeholder").set(thing_classes=list(class_dict))
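
To get a feel for the non-max suppression threshold set above: as I understand it, when two detections of the same class overlap with an intersection over union (IoU) above `NMS_THRESH_TEST`, the lower-scoring one is discarded. The sketch below computes IoU for two made-up boxes. `box_iou` is a hypothetical helper written for illustration, not a detectron2 function.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


nms_thresh = 0.2  # same value as used in the config cell
iou = box_iou((0, 0, 10, 10), (5, 0, 15, 10))  # two boxes shifted by half a width
print(f"IoU = {iou:.3f}, suppressed = {iou > nms_thresh}")
```

Two boxes offset by half their width have an IoU of 1/3, so with a threshold of 0.2 only the higher-scoring one would survive. A low threshold like this suits objects that are not expected to overlap much.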

In [None]:
predictor = DefaultPredictor(cfg)
outputs = predictor(im_RGB)
print('Number of detected objects = {}'.format(len(outputs["instances"])))

In [None]:
# Verify outputs manually
# outputs["instances"].pred_classes
# outputs["instances"].pred_boxes
# outputs["instances"].scores

In [None]:
# We can use Visualizer to draw the predictions on the image.
v = Visualizer(im_RGB[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TEST[0]), scale=1.5)
v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
cv2_imshow(v.get_image()[:, :, ::-1])

## 3.3 - Post-processing model output

However, just getting the output from the model isn't enough. Now we have to do a bit more work to post-process the output and extract things like nanomembrane yield, sizes and other interesting data!

First let's divide up the output of the neural net for further processing:

In [None]:
cl = np.array(outputs["instances"].pred_classes.cpu())  # Classes (0-indexed)
s = np.array(outputs["instances"].scores.cpu())  # Prediction scores
b = np.array([x.numpy() for x in outputs["instances"].pred_boxes])  # Bounding boxes (x1, y1, x2, y2)
c = np.array(outputs["instances"].pred_boxes.get_centers())  # Bounding box centres
m = np.array([x.numpy() for x in outputs["instances"].pred_masks])  # Segmentation masks
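
One thing the mask array makes possible is measuring the area of each detected object. The sketch below uses two small synthetic masks in place of `m`, and the pixel sizes are made-up placeholder values rather than values read from a real image.

```python
import numpy as np

# Two synthetic boolean masks standing in for `m` (shape: N x H x W)
masks = np.zeros((2, 4, 5), dtype=bool)
masks[0, 1:3, 1:4] = True   # 2 x 3 block = 6 pixels
masks[1, 0, :] = True       # one full row = 5 pixels

# Count the True pixels in each mask
areas_px = masks.reshape(len(masks), -1).sum(axis=1)
print(areas_px)

# Convert to physical area using hypothetical pixel sizes (nm per pixel)
px_x, px_y = 2.0, 2.5
areas_nm2 = areas_px * px_x * px_y
print(areas_nm2)
```

Using the mask rather than the bounding box gives the true footprint of irregularly shaped objects.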

Now we can loop over all the possible classes and display images with segmentation masks of each class individually.

In [None]:
# Invert class_dict so that a class ID string maps back to its class name
inv_class_dict = {v: k for k, v in class_dict.items()}

for class_idx in range(len(class_dict)):
  i_filt = list(np.argwhere(cl==class_idx).flatten()) # Choose only the indices with this class

  print(f"{inv_class_dict[str(class_idx+1)]}:")

  # We can use Visualizer to draw the predictions on the image.
  v = Visualizer(im_RGB[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TEST[0]), scale=1.0)
  v = v.draw_instance_predictions(outputs["instances"][i_filt].to("cpu"))
  cv2_imshow(v.get_image()[:, :, ::-1])
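
Besides looking at each class visually, it can help to count how many instances of each class were detected. This is a minimal sketch with a synthetic array `pred` standing in for `cl`. The class order is assumed to follow `class_dict`, with the model's class indices starting at 0.

```python
import numpy as np

class_names = ['slit', 'nanomembrane', 'parasitic', 'bottom_nucleus',
               'side_nucleus', 'nanowire', 'overgrowth']

# Synthetic predicted class indices (0-indexed), standing in for `cl`
pred = np.array([0, 0, 1, 1, 1, 2, 5])

# minlength ensures classes with zero detections still get an entry
counts = np.bincount(pred, minlength=len(class_names))

for name, n in zip(class_names, counts):
    print(f"{name}: {n}")
```

Counts like these are one possible starting point for a yield estimate, for example by comparing the number of nanomembranes to the number of slits.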


Now we can start on the dimensional analysis. But first, let's extract the pixel size from the raw TIF image:

In [None]:
with tifffile.TiffFile(im_path) as tif:

    # Extract magnification data
    mag = tif.sem_metadata['ap_mag'][1]
    if isinstance(mag, str):  # Apply correction for "k", e.g. mag = "50 k"
        mag = float(mag.split(' ')[0]) * 1000
    else:
        mag = float(mag)

    # Extract pixel size data
    pixel_size = float(tif.sem_metadata['ap_pixel_size'][1])  # nm
    if 'µm' in tif.sem_metadata['ap_pixel_size'][2]:  # Convert µm to nm
        pixel_size *= 1000

    # Extract tilt data
    tilt = float(tif.sem_metadata['ap_tilt_angle'][1])  # degrees
    # tilt = tif.sem_metadata['ap_stage_at_t'][1]  # might be equivalent, not sure

pixel_size_x = pixel_size  # nm
# Tilting foreshortens the image along y, so each pixel spans more of the sample
pixel_size_y = pixel_size / np.cos(np.deg2rad(tilt))  # nm
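
The tilt correction is easy to check with round numbers. The sketch below wraps the same formula in a hypothetical helper, `tilt_corrected_pixel_size`, and evaluates it for made-up values. It assumes the tilt axis runs along x, so only the y direction is foreshortened.

```python
import numpy as np

def tilt_corrected_pixel_size(pixel_size_nm, tilt_deg):
    """Pixel size along the foreshortened axis of a tilted sample, in nm."""
    return pixel_size_nm / np.cos(np.deg2rad(tilt_deg))


untilted = tilt_corrected_pixel_size(2.0, 0.0)
tilted = tilt_corrected_pixel_size(2.0, 60.0)
print(untilted, round(tilted, 3))
```

With no tilt the pixel size is unchanged, and at 60 degrees each pixel covers twice the distance along y, since cos(60°) = 0.5.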

Let's start with slit length/width.

In [None]:
i_filt = list(np.argwhere(cl+1==int(class_dict['slit'])).flatten()) # Choose only the indices with this class
b_slits = b[i_filt]

slit_widths = (b_slits[:,3] - b_slits[:,1]) * pixel_size_y
slit_lengths = (b_slits[:,2] - b_slits[:,0]) * pixel_size_x

print(f"Mean slit width: {slit_widths.mean():.0f} +/- {slit_widths.std():.0f} nm")
print(f"Mean slit length: {slit_lengths.mean():.0f} +/- {slit_lengths.std():.0f} nm")
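
As a quick sanity check of the box arithmetic, here is the same calculation on two made-up bounding boxes in (x1, y1, x2, y2) pixel coordinates. The pixel sizes are hypothetical placeholder values, not values from a real image.

```python
import numpy as np

# Two synthetic slit boxes as (x1, y1, x2, y2) in pixels
boxes = np.array([[10, 20, 110, 30],
                  [15, 50, 135, 62]], dtype=float)

px_x, px_y = 2.0, 2.5  # hypothetical pixel sizes in nm

lengths_nm = (boxes[:, 2] - boxes[:, 0]) * px_x  # extent along x
widths_nm = (boxes[:, 3] - boxes[:, 1]) * px_y   # extent along y

print(f"Mean length: {lengths_nm.mean():.0f} +/- {lengths_nm.std():.0f} nm")
print(f"Mean width: {widths_nm.mean():.1f} +/- {widths_nm.std():.1f} nm")
```

Note that NumPy's `.std()` defaults to the population standard deviation (`ddof=0`). For a small number of slits, passing `ddof=1` gives the sample standard deviation instead.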