<a href="https://colab.research.google.com/github/Martin09/DeepSEM/blob/master/segmentation-NMs/1_nm_seg_image_prep.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# NM Segmentation 1: Image Preparation
In this notebook we will:
1. Import our raw SEM images.
2. Filter these based on magnification.
3. Export desired images as PNGs.
4. Upload images for labelling to a new [Labelbox](https://labelbox.com/) project.

In [0]:
# A few useful imports
from matplotlib import pyplot as plt
import shutil, os, random, cv2, tifffile, collections
from pathlib import Path
from google.colab.patches import cv2_imshow

In [0]:
# Define some paths that will be useful later
root = Path('./DeepSEM/segmentation-NMs/')
dataset_dir = root.joinpath('datasets')
github_url = 'https://github.com/Martin09/' + str(root).replace('/','/trunk/')

raw_zip = dataset_dir.joinpath('Nick_NMs_allrawimgs.zip')
raw_dir = dataset_dir.joinpath(raw_zip.stem)
out_dir = dataset_dir.joinpath('Nick_NMs_50kmag_png')
upload_dir = out_dir  # We will upload the output imgs to LabelBox at the end

## 1.1 - Download the dataset

In [0]:
# # Optional: Save everything to your own GoogleDrive
# from google.colab import drive
# drive.mount('/content/gdrive/')
# %cd "/content/gdrive/My Drive/path/to/save/location"

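# Clone just the relevant folder from the DeepSEM repo
# Note: this relies on GitHub's Subversion bridge (the /trunk/ URL). If the
# checkout fails, use the git clone alternative below instead.
!rm -rf $root
!apt install -y subversion
!svn checkout $github_url $root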

# # Alternative: Clone whole DeepSEM repository
# !rm -rf DeepSEM  # Remove folder if it already exists
# !git clone https://github.com/Martin09/DeepSEM

Now we will download ALL the SEM images for the analysis. Since this is a big .zip file, I have to host it on my Google Drive (GitHub won't let me host it). Here we will download and unzip the raw SEM images.

In [0]:
# Check if .zip file exists, if not, download it from Google Drive
file_id = '1M2_0GLScsNY53w8hU2xJdXtisESfkOqI'
if raw_zip.exists():
  print('Dataset already exists. Skipping download!')
else:
  print('Dataset does not exist... Downloading!')
  !gdown --id $file_id -O $raw_zip

# Unzip raw dataset
!rm -rf $raw_dir
!unzip -o $raw_zip -d $raw_dir
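
If the shell commands are not available (for example, when running outside Colab), the unzip step can be sketched with Python's standard library instead. This is only a minimal sketch, not what the notebook itself runs, and `extract_zip` is a hypothetical helper name introduced here for illustration.

```python
import shutil
import zipfile
from pathlib import Path


def extract_zip(zip_path, target_dir):
    """Extract zip_path into a fresh target_dir and return the extracted files.

    Hypothetical helper: mirrors `rm -rf` + `unzip` from the cell above.
    """
    zip_path, target_dir = Path(zip_path), Path(target_dir)
    # A truncated or failed download is usually not a valid zip archive
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"{zip_path} is not a valid zip archive")
    shutil.rmtree(target_dir, ignore_errors=True)  # Start from a clean folder
    target_dir.mkdir(parents=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    # Return only files (not directories), sorted for reproducibility
    return sorted(p for p in target_dir.rglob('*') if p.is_file())
```

With the paths defined earlier, this would be called as `extract_zip(raw_zip, raw_dir)`.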

In [0]:
# Create a list of all TIFF files in the SEM image dataset
in_files = list(raw_dir.rglob('*.tif'))
print("Loaded {:.0f} SEM images.".format(len(in_files)))

In [0]:
# Sample and display a few of them to see what they look like
for img_file in random.sample(in_files, k=min(3, len(in_files))):  # Sample without repeats
    print(str(img_file) + ":")
    im = cv2.imread(str(img_file), cv2.IMREAD_GRAYSCALE)
    cv2_imshow(im)

Notice that the SEM images were taken at many different magnifications. We should filter these down to a single magnification before building our training set.

## 1.2 - Filtering images by magnification

First, let's extract the magnification, pixel size and tilt of each image from its TIFF metadata using the `tifffile` library.

In [0]:
mags = []  # Empty list to save magnifications

# Start to loop over all TIFF files
for file in in_files:
    # Open each file using the TiffFile library
    with tifffile.TiffFile(file) as tif:
        
        # Extract magnification data
        mag = tif.sem_metadata['ap_mag'][1] 
        if type(mag) is str:  # Apply correction for "k" ex: mag = "50 k"
            mag = float(mag.split(' ')[0]) * 1000
        else:
            mag = float(mag)

        # Extract pixel size data
        pixel_size = float(tif.sem_metadata['ap_pixel_size'][1])  # nm
        if 'µm' in tif.sem_metadata['ap_pixel_size'][2]: # Correction for um
            pixel_size *= 1000

        # Extract tilt data
        tilt = tif.sem_metadata['ap_tilt_angle'][1] # degrees
#         tilt = tif.sem_metadata['ap_stage_at_t'][1]  # might be equivalent, not sure

    print(f'mag = {mag:6.0f}x, \tpixel_size = {pixel_size:4.0f} nm, \ttilt = {tilt:2.0f}°')
    mags.append(mag)
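
The unit handling in the loop above can also be pulled out into two small helpers, which makes it easier to check on its own. This is a sketch under assumptions: the only string format I know of is the `"50 k"` example from the comment above, and `parse_magnification` and `pixel_size_in_nm` are hypothetical names. Unlike the loop, the first helper only multiplies by 1000 when a `k` suffix is actually present.

```python
def parse_magnification(raw):
    """Convert a magnification such as 500, 500.0 or '50 k' to a float."""
    if isinstance(raw, str):
        # Split '50 k' into the number ('50') and an optional suffix ('k')
        value, _, suffix = raw.strip().partition(' ')
        factor = 1000 if suffix.strip().lower().startswith('k') else 1
        return float(value) * factor
    return float(raw)


def pixel_size_in_nm(value, unit):
    """Convert a pixel size to nanometres (only 'nm' and 'µm' are handled)."""
    value = float(value)
    return value * 1000 if 'µm' in unit else value
```

For example, `parse_magnification('50 k')` gives `50000.0` and `pixel_size_in_nm('2.5', 'µm')` gives `2500.0`.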

Above, we have printed the raw acquisition information of each image. In this format it is a bit difficult to visualize, so let's put this into a histogram.

In [0]:
_ = plt.hist(mags,bins=20)
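
If the magnifications span several orders of magnitude (an assumption, since this depends on the dataset), linearly spaced bins lump most of the low-magnification images into one bar. A possible alternative is to use log-spaced bin edges. The sketch below computes them with the standard library only; `log_bin_edges` is a hypothetical helper name.

```python
import math


def log_bin_edges(values, n_bins=20):
    """Return n_bins + 1 logarithmically spaced bin edges covering values."""
    lo, hi = min(values), max(values)
    if lo <= 0:
        raise ValueError('log-spaced bins need strictly positive values')
    if lo == hi:  # Avoid zero-width bins when all values are identical
        lo, hi = lo / 2, hi * 2
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    step = (log_hi - log_lo) / n_bins
    return [10 ** (log_lo + i * step) for i in range(n_bins + 1)]
```

The edges can then be passed to the histogram with `plt.hist(mags, bins=log_bin_edges(mags))` followed by `plt.xscale('log')`.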

More explicitly, we can also count the number of images for each magnification.

In [0]:
dict(collections.Counter(mags))

It looks like 50k magnification would be the best for accurate segmentation, as it's the highest resolution for which we have a decent number of images. We have 20 images at this magnification, so let's select them!

In [0]:
desired_mag = 50000  # Magnification to filter the images by

filt_imgs = []  # Declare empty list to collect the filtered 50k mag images
for (img_file, img_mag) in zip(in_files, mags):
  if int(img_mag) == desired_mag:
    filt_imgs.append(img_file)

print(f'Found {len(filt_imgs):.0f} images with mag of {desired_mag}!') 
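
One thing to keep in mind: `int(img_mag)` truncates, so a stored magnification of, say, 49999.6 would not match 50000. Whether the metadata ever contains such values is an assumption on my part, but if it does, a tolerance-based comparison is a possible alternative. Here is a minimal sketch; `filter_by_magnification` and its `rel_tol` default are hypothetical.

```python
import math


def filter_by_magnification(files, mags, target, rel_tol=0.01):
    """Return the files whose magnification is within rel_tol of target."""
    return [f for f, m in zip(files, mags)
            if math.isclose(m, target, rel_tol=rel_tol)]
```

It would be used as `filt_imgs = filter_by_magnification(in_files, mags, desired_mag)`.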

Let's look at a few of the filtered images to make sure our script worked as expected.

In [0]:
# Sample and display a few of them to see what they look like
for img_file in random.sample(filt_imgs, k=min(3, len(filt_imgs))):  # Sample without repeats
    print(str(img_file) + ":")
    im = cv2.imread(str(img_file), cv2.IMREAD_GRAYSCALE)
    cv2_imshow(im)

Seems reasonable!

## 1.3 - Export as PNGs
Next we will import the filtered TIFF images and export them as PNG files.

In [0]:
# Prepare the output directory
!rm -rf $out_dir
!mkdir $out_dir

In [0]:
# Loop over the filtered TIFF files
for img_file in filt_imgs:
    img = cv2.imread(str(img_file), cv2.IMREAD_GRAYSCALE) # Import the image

    # Save as PNG file (replace spaces with underscores)
    filename = out_dir.joinpath(str(img_file.stem).replace(" ","_") + '.png')
    print(filename)

    success = cv2.imwrite(str(filename), img.astype('uint8'))  # Save image as 8-bit PNG

    if not success:
        print(f"Error, couldn't write image '{filename}'")
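
Because the TIFF files were collected recursively from sub-folders but are all written into a single output folder, two files with the same name (after replacing spaces) would silently overwrite each other. A quick check before exporting can be sketched as follows; `png_name` and `find_name_collisions` are hypothetical helper names, and I have not checked whether this dataset actually contains duplicates.

```python
from collections import Counter
from pathlib import Path


def png_name(tiff_path):
    """File name used for the exported PNG (spaces replaced by underscores)."""
    return Path(tiff_path).stem.replace(' ', '_') + '.png'


def find_name_collisions(tiff_paths):
    """Return the PNG names that more than one input file would map to."""
    counts = Counter(png_name(p) for p in tiff_paths)
    return sorted(name for name, n in counts.items() if n > 1)
```

An empty list from `find_name_collisions(filt_imgs)` means every exported PNG gets a unique name.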

Optionally, we can now download the dataset for safekeeping.

In [0]:
# # To download the dataset, if you want
# from google.colab import files
# %cd $out_dir
# !rm -f ../dataset.zip
# !zip -r ../dataset.zip *
# files.download("../dataset.zip")
# %cd /content

## 1.4 - Upload images for labeling
Now we are ready to label these images for training the neural network. There are many tools available for creating labelled datasets. In this tutorial I will be using [Labelbox](https://labelbox.com/) for this purpose. 

Note: if you don't want to label your own dataset, you can skip ahead to [Notebook 2](https://colab.research.google.com/github/Martin09/DeepSEM/blob/master/segmentation-NMs/2_nm_seg_training.ipynb), where I will provide the pre-labelled data for the next steps.

Moving on, let's install the labelbox API first:

In [0]:
!pip install labelbox

If you haven't already done so, go ahead and make a free Labelbox account. You can either upload your images manually through the website, or upload them directly using the script below.

***If you want to upload the images from this script, you need to create an API key [here](https://app.labelbox.com/account/api-keys) and paste it below:***

In [0]:
API_KEY = '[INSERT LABELBOX API KEY HERE]'
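
If you would rather not paste the key into a notebook that might get shared, one option is to read it from an environment variable. The sketch below is only an illustration: `resolve_api_key` is a hypothetical helper, and the variable name `LABELBOX_API_KEY` is an assumption you can change.

```python
import os

PLACEHOLDER = '[INSERT LABELBOX API KEY HERE]'


def resolve_api_key(pasted_key, env_var='LABELBOX_API_KEY'):
    """Use the pasted key if it was filled in, else fall back to env_var."""
    if pasted_key and pasted_key != PLACEHOLDER:
        return pasted_key
    key = os.environ.get(env_var, '').strip()
    if not key:
        raise ValueError(f'No API key found: paste one above or set {env_var}.')
    return key
```

It could be used as `API_KEY = resolve_api_key(API_KEY)` before creating the client, so a forgotten placeholder fails early with a clear message.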

Now we can make a new Labelbox project and a new dataset before uploading the PNG images.

In [0]:
# Change these names if you wish
project_name = 'NanomembraneSegmentation'
dataset_name = 'Nick_NMs_50kmag'

# Create a new project and dataset in Labelbox
from labelbox import Client
client = Client(API_KEY)
project = client.create_project(name=project_name)
dataset = client.create_dataset(name=dataset_name, projects=project)

In [0]:
# Perform a bulk upload of the PNG files
dataset_files = list(upload_dir.glob('*.png'))  # Get a list of the files to upload
dataset_files = [str(f) for f in dataset_files]  # Convert to list of strings (not Paths)
dataset.create_data_rows(dataset_files) # Upload the files
print("Done!")

After a few minutes, you should see the new project and images appear in your
Labelbox account, [here](https://app.labelbox.com/projects). You can now finish setting up your Labelbox project on the website, including setting your object classes.

For this tutorial, we will be doing segmentation. Therefore, be sure to ***only*** define segmentation objects (not bounding box or polygon objects, for example).

[Notebook 2](https://colab.research.google.com/github/Martin09/DeepSEM/blob/master/segmentation-NMs/2_nm_seg_training.ipynb) assumes you have finished your labelling and have exported a Labelbox JSON file with all of your segmentation mask labels. If you don't have your own labelled dataset, don't worry: I will provide that for you. See you there!