# ALC Automatic and Semi-Automatic Zircon Measurement Notebook
The code in this Colab Notebook will automatically detect, segment, and measure zircon crystals in reflected light alignment mosaics from the [Arizona LaserChron Center](https://sites.google.com/laserchron.org/arizonalaserchroncenter/home) saved during LA-ICP-MS dating. Zircon segmentation is accomplished via a trained Mask RCNN model, using the Detectron2 deep learning library.

This Notebook is ready-to-run and implements code available in the [colab-zirc-dims GitHub repository](https://github.com/MCSitar/colab_zirc_dims). Before using this Notebook, copy it into your own Google Drive (its location within your Drive does not matter) or open it in 'playground mode'. Make sure that you are connected to a GPU runtime when running this Notebook.

## How to run this Notebook (for new Google Colab users):

Google Colab notebooks are Jupyter notebooks that execute in cloud-hosted Python 3 environments on virtual machines equipped with high-end CPUs and GPUs. Users are thus able to run compute-intensive Python code, view outputs, etc. in a browser window from any local computer, regardless of its hardware and without any setup or installation.



#### Checking GPU runtime:

Code in this Notebook uses RCNN models to segment zircon crystals from images, so virtual runtimes executing it require access to a GPU. This should be enabled by default, but if users want to verify that their virtual machine has a GPU:

1.   Navigate to 'Runtime' --> 'Change runtime type' in the toolbar at the top of the screen.
2.   In the 'Notebook Settings' window that pops up, check that 'GPU' is selected in the 'Hardware accelerator' dropdown menu. If not, select it.
3.   Click save, then run the Notebook.
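The same check can also be made from a code cell. The snippet below is a minimal sketch, assuming that the `nvidia-smi` utility is on the PATH of GPU runtimes (as is typically the case on Colab GPU machines). `gpu_visible` is a hypothetical helper name and is not part of colab_zirc_dims.

```python
import shutil
import subprocess

def gpu_visible():
    """Return True if nvidia-smi is installed and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # no NVIDIA driver tools found on this machine
    try:
        # 'nvidia-smi -L' prints one line per detected GPU
        result = subprocess.run([exe, "-L"], capture_output=True,
                                text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout

print("GPU available:", gpu_visible())
```

Once PyTorch has been imported in the setup cells, `torch.cuda.is_available()` is another way to make this check.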





#### Running cells:

Notebooks are made up of cells containing either text or code. Cells with code in this Notebook should be run in top-bottom order unless otherwise specified. To run cells:

1.   Hover the mouse over the cell to be run, then click the button with a 'play' symbol on it. See below for an example:

In [None]:
#Try running this example cell
print('Cell run!')

#### Clearing outputs:

To make this Notebook look neater after running it and/or to cut down on file size before saving (e.g., if there are many inspection images open), users can clear all cell outputs. To do this:

1.   Navigate to 'Edit' --> 'Clear all outputs' in the toolbar at the top of the screen, then click.

#### Factory reset runtime:

Some earlier versions of this project had memory leak issues that caused RAM crashes during fully-automated grain processing. Said issues seem to be fixed, but if they reappear:

1.   Clear the current virtual machine runtime and connect to a new one (with fresh RAM) by navigating to 'Runtime' --> 'Factory reset runtime' and clicking.
2.   Re-run all necessary cells in the Notebook.
3.   (Please) open an issue on the project GitHub page or use the contact details found there to report the bug directly to the project manager.


## Possible Workflows:


#### Only using automated processing (not recommended)
1. Run cells in 'Processing Option A' after running all required cells above. This is the only step.

#### Sample-by-sample semi-automated processing 
1.   Run the GUI in 'Processing Option B' after running all required cells above.
2.   Produce and correct segmentations and measurements on a sample-by-sample basis

#### Automated processing followed by manual segmentation checking/editing
1. Run cells in 'Processing Option A' with polygon saving enabled after running all required cells above.
2.   Load polygons and use the GUI ('Processing Option B') to check and/or edit auto-generated polygons. This can be done in a single session or iteratively over multiple sessions; load polygons from your last semi-automated processing run to pick up where you left off.


## Data Formatting and Organization:

To use this Notebook, a formatted project folder containing your data files and a [mosaic info .csv file](https://colab.research.google.com/drive/1aPMjSF2uGOP4Xy2dssjhk--TFQ7_hWuu?usp=sharing) must be uploaded to your Google Drive.

Your project folder must be organized with the structure shown below. A template project folder that can be downloaded, edited, and re-uploaded is available [here](https://drive.google.com/drive/folders/1cFOoxp2ELt_W6bqY24EMpxQFmI00baDl?usp=sharing). Note that previous versions of this folder included trained model weights - these are now automatically downloaded while running this notebook and so are not a necessary component of the project folder.

```
root directory
|   mosaic_info.csv
|
└───mosaics*
│   │   mosaic_XXX.bmp
│   │   mosaic_XXX.Align
│   │   mosaic_YYY.bmp
│   │   mosaic_YYY.Align
|   |   ...
│   
└───scanlists
|   │   scanlist_XXX.scancsv
|   │   scanlist_YYY.scancsv
|   |   ...
|
└───outputs**
    |   ...

*.Align files must have the same filenames (minus file extensions) as
their respective .bmp mosaic files.
Low-contrast mosaic images will automatically have their contrast 
increased via (Scikit Image) histogram normalization during processing.

**Optional; if this folder does not exist it will be automatically created during processing
```
For basic use of this notebook, only the mosaic_info.csv file needs to be modified. A mosaic_info.csv file can be generated from your data using [this notebook](https://colab.research.google.com/drive/1aPMjSF2uGOP4Xy2dssjhk--TFQ7_hWuu?usp=sharing).
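Before running the loading cells, it can help to confirm that a project folder matches the structure above. The following is a minimal sketch of such a check using only the standard library; `check_project_folder` is a hypothetical helper (not part of colab_zirc_dims), and the demo folder at the end is a throwaway stand-in for your own `ROOT_DIR`.

```python
import tempfile
from pathlib import Path

def check_project_folder(root_dir):
    """Return a list of human-readable problems found in a project folder."""
    root = Path(root_dir)
    problems = []
    if not (root / "mosaic_info.csv").is_file():
        problems.append("missing mosaic_info.csv")
    for sub in ("mosaics", "scanlists"):
        if not (root / sub).is_dir():
            problems.append(f"missing folder: {sub}")
    mosaics = root / "mosaics"
    if mosaics.is_dir():
        # every .bmp mosaic needs an .Align file with the same base filename
        for bmp in sorted(mosaics.glob("*.bmp")):
            if not bmp.with_suffix(".Align").is_file():
                problems.append(f"no .Align file for {bmp.name}")
    return problems

# Demo on a temporary folder that lacks one .Align file
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    (demo / "mosaics").mkdir()
    (demo / "scanlists").mkdir()
    (demo / "mosaic_info.csv").write_text("Sample,Scanlist,Mosaic\n")
    (demo / "mosaics" / "mosaic_XXX.bmp").write_bytes(b"")
    demo_problems = check_project_folder(demo)
print(demo_problems)
```

In the Notebook, the function would be called as `check_project_folder(ROOT_DIR)` after the project directory has been set; an empty list means that no structural problems were found by this (deliberately simple) check.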


### mosaic_info.csv Formatting

Your mosaic_info.csv file must have the following headers (capitalization must match):

| **Sample** | **Scanlist** | **Mosaic** | **Max_zircon_size** | **X_offset** | **Y_offset** |

Enter the following data under each header:


*   **Sample**: Name of each sample (e.g., 'V26').
*   **Scanlist**: Full filename of the scanlist corresponding to each sample (e.g., 'V26 complete.scancsv').
*   **Mosaic**: Full filename of the mosaic .bmp image file corresponding to each sample (e.g., 'Mosaic160210 1844-32-916.bmp').
*   **Max_zircon_size**: Maximum expected zircon size (in µm) in each sample (e.g., '500'). During processing, subimages are clipped from larger mosaic images to cut down on processing time. This will be the size of the clipped subimages that are processed by the script.
*   **X_offset**: X correction (in µm) for any misalignment of each mosaic image relative to recorded ablation points (e.g., '-125' will shift ablation points 125 µm to the left). Set to 0 to keep recorded points as-is.
*   **Y_offset**: Y correction (in µm) for any misalignment of each mosaic image relative to recorded ablation points (e.g., '-125' will shift ablation points 125 µm upwards). Set to 0 to keep recorded points as-is.
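The header and value requirements listed above can be checked before uploading the file. Below is a minimal sketch using Python's built-in `csv` module; `check_mosaic_info` and `REQUIRED_HEADERS` are hypothetical names introduced for illustration, and the check assumes that the size and offset columns should hold plain numbers.

```python
import csv
import io

REQUIRED_HEADERS = ["Sample", "Scanlist", "Mosaic",
                    "Max_zircon_size", "X_offset", "Y_offset"]

def check_mosaic_info(csv_file):
    """Return a list of problems found in an open mosaic_info.csv file."""
    reader = csv.DictReader(csv_file)
    headers = reader.fieldnames or []
    # header names are compared exactly, so capitalization must match
    missing = [h for h in REQUIRED_HEADERS if h not in headers]
    if missing:
        return [f"missing header: {h}" for h in missing]
    problems = []
    for row_num, row in enumerate(reader, start=2):  # row 1 is the header
        for col in ("Max_zircon_size", "X_offset", "Y_offset"):
            try:
                float(row[col])
            except (TypeError, ValueError):
                problems.append(
                    f"row {row_num}: {col} is not a number ({row[col]!r})")
    return problems

# Example using the sample values from the list above
example = io.StringIO(
    "Sample,Scanlist,Mosaic,Max_zircon_size,X_offset,Y_offset\n"
    "V26,V26 complete.scancsv,Mosaic160210 1844-32-916.bmp,500,-125,0\n"
)
example_problems = check_mosaic_info(example)
print(example_problems)  # prints []
```

For a file on disk, open it with `open(path, newline='')` and pass the file object to the function.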

---

## Setup and Installation:


The two cells below (modified from the [Detectron2 Colab tutorial](https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5)) will install Detectron2 and import various packages necessary for further processing. The runtime will automatically restart after Detectron2 is installed in the first cell (this may bring up a crash warning).
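The first cell builds the Detectron2 wheel index URL from the installed PyTorch version string. The sketch below reproduces that string handling on its own so the result can be inspected; `wheel_index_url` is a hypothetical helper, and the version string `'1.10.0+cu111'` is only an example value, not necessarily the version on your runtime.

```python
def wheel_index_url(torch_version):
    """Build the Detectron2 wheel index URL for a torch version string."""
    # '1.10.0+cu111' -> '1.10'
    short_version = ".".join(torch_version.split(".")[:2])
    # '1.10.0+cu111' -> 'cu111'
    cuda_tag = torch_version.split("+")[-1]
    return ("https://dl.fbaipublicfiles.com/detectron2/wheels/"
            f"{cuda_tag}/torch{short_version}/index.html")

example_url = wheel_index_url("1.10.0+cu111")
print(example_url)
# prints https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.10/index.html
```

Note that if the version string contains no `+` suffix, `split("+")[-1]` returns the whole version string, so the resulting URL will not point to a CUDA-specific wheel index.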

In [None]:
!pip install pyyaml==5.1

import torch
TORCH_VERSION = ".".join(torch.__version__.split(".")[:2])
CUDA_VERSION = torch.__version__.split("+")[-1]
print("torch: ", TORCH_VERSION, "; cuda: ", CUDA_VERSION)
# Install detectron2 that matches the above pytorch version
# See https://detectron2.readthedocs.io/tutorials/install.html for instructions
!pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/$CUDA_VERSION/torch$TORCH_VERSION/index.html
exit(0)  # After installation, you need to "restart runtime" in Colab. This line can also restart runtime

In [None]:
# import some necessary libraries
import os, cv2, sys
import matplotlib.pyplot as plt
from google.colab.patches import cv2_imshow
import ipywidgets as widgets
%matplotlib auto

# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import catalog

#install colab_zirc_dims
!pip install colab_zirc_dims==1.0.7

#import modules from the installed colab_zirc_dims package
from colab_zirc_dims import czd_utils
from colab_zirc_dims import save_load
from colab_zirc_dims import alc_notebook_fxns
from colab_zirc_dims import zirc_dims_GUI


To mount your Google Drive, simply run the cell below and input your authentication code as requested.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Modify the form field and run the cell below to set your project directory (organized as above) as the root directory.

In [None]:
#@title Input full path to project folder here, then run this cell
ROOT_DIR = "/content/drive/My Drive/YOUR PROJECT DIRECTORY HERE" #@param {type:"string"}


## Data Loading:

Run the cell below to load datasets from your project directory and select samples for processing using dynamically-created checkboxes. You should re-run this cell if a) the dataset did not load correctly or b) you have made changes (e.g., adjusting alignment in the mosaic_info.csv file) to your project directory since first running it in this session.


In [None]:
# Loads data from mosaic_info.csv and the .Align files; collects and verifies filenames.
# You may need to modify files (e.g., mosaic_info.csv), re-upload them to Google Drive,
# and re-run this cell if the output is not satisfactory.
mos_data_dict = czd_utils.load_data_dict(ROOT_DIR)
print('Samples successfully loaded:', list(mos_data_dict.keys()))

#Selection of samples from dataset for processing
selected_samples = []
if mos_data_dict:  # only offer sample selection if at least one sample loaded
  alc_notebook_fxns.select_samples_fxn(mos_data_dict, selected_samples)

## Data Inspection (optional):
Run the cell below to inspect selected data from each dataset in your project directory by displaying a random sample of scans from each mosaic. Image scales are in microns.

In [None]:
%matplotlib inline
alc_notebook_fxns.inspect_data(mos_data_dict,
                               selected_samples,
                               n_scans_sample = 3) #modify integer here to change
                                                   # number of scans sampled from
                                                   # each sample.
%matplotlib auto

## Model Download and Initialization:

The cells below will download (1st cell) and then initialize and configure (2nd cell) a trained zircon-segmentation RCNN model. This model will be used to segment zircon crystals from the mosaic images that you previously loaded.

Model training parameters and comparisons between model results and segmentations by humans will (hopefully) be made available in a future publication. It is recommended that users pick one of the first two models and avoid the last three models (these do not perform well and are only included for manuscript data replication purposes).
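The configuration cell in this section sets a detection score threshold (`SCORE_THRESH_TEST = 0.7`) and a deliberately low non-maximum suppression threshold (`RPN.NMS_THRESH = 0.06`). To illustrate what a low NMS threshold does, here is a minimal pure-Python sketch of greedy NMS on bounding boxes. It is a generic illustration of the technique with made-up boxes and scores, not Detectron2's implementation.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, iou_thresh):
    """Return indices of boxes kept by greedy non-maximum suppression."""
    # visit boxes from highest to lowest score
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # keep a box only if it barely overlaps every box already kept
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in kept):
            kept.append(i)
    return kept

# Two heavily overlapping boxes and one isolated box (illustrative values)
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.75]
kept_strict = greedy_nms(boxes, scores, iou_thresh=0.06)
kept_loose = greedy_nms(boxes, scores, iou_thresh=0.7)
print(kept_strict)  # prints [0, 2]
print(kept_loose)   # prints [0, 1, 2]
```

With the low threshold, the lower-scoring of the two overlapping boxes is discarded, which is the behavior wanted when neighboring zircon masks should not overlap.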

In [None]:
# Modify the path/URL below if you want to use a custom model library (with
# downloadable models) with this Notebook. Models are currently hosted for download on AWS;
# see czd_model_library.json for direct download links and model lib formatting.

model_lib_loc = 'default' #this will get the current version of the czd model lib from GitHub

current_model_dict = {}
alc_notebook_fxns.select_download_model_interface(current_model_dict, model_lib_loc)

In [None]:
# Run this cell after selecting a model in the cell above, and again if model is changed at any point.
# Can be modified to load any Detectron2 instance segmentation model trained to \
# detect zircon grains.
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(current_model_dict['config_file']))#base model configuration
cfg.MODEL.RESNETS.DEPTH = current_model_dict['resnet_depth'] #set resnet depth from model library
cfg.MODEL.WEIGHTS = os.path.join(os.getcwd(), current_model_dict['name']) #loads model weights (trained to detect zircons)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # Only class here is zircon
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7   # set a custom threshold
cfg.MODEL.RPN.NMS_THRESH = 0.06 #set a custom (lower) NMS threshold; ideally minimizes overlapping zircon masks in results

zircon_metadata = catalog.Metadata(name='zircon_meta', thing_classes=['zircon'])

predictor = DefaultPredictor(cfg)

## Test Evaluation (Optional):

Run the cell below to visualize model prediction results on a random sample of scans from each of your mosaic images. Axes for scanned zircon images will be in microns.

In [None]:
%matplotlib inline
alc_notebook_fxns.test_eval(selected_samples, mos_data_dict, predictor, 
                            zircon_metadata,
                            n_scans_sample =3) #modify integer to change num. scans
                                               # randomly selected, evaluated per sample.
%matplotlib auto

## Processing Option A: Fully Automated Processing:

Run the cell below to automatically process all selected samples.

In the upper part of the cell, you will select alternate image pre-processing/segmentation methods to automatically try in case an initial attempt at segmentation fails. You do not have to select all or any of these alternate methods. Otsu thresholding segmentation (non-RCNN) is particularly sensitive to image artefacts and can produce wildly incorrect results.
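The idea of trying alternate methods only after an initial attempt fails can be sketched as a simple fallback chain. This is a hypothetical illustration and not how `full_auto_proc` is actually implemented: `segment_with_fallbacks`, the method names, and the stand-in functions are all assumptions, and the order in which alternates are tried here is arbitrary.

```python
def segment_with_fallbacks(subimage, primary, alternates):
    """Try `primary`, then each enabled alternate, until one returns a mask.

    `primary` and each alternate are callables that return a mask or None;
    `alternates` is a list of (enabled, name, callable) tuples.
    """
    mask = primary(subimage)
    if mask is not None:
        return "default", mask
    for enabled, name, method in alternates:
        if not enabled:
            continue  # the user left this checkbox unticked
        mask = method(subimage)
        if mask is not None:
            return name, mask
    return None, None  # every attempted method failed

# Stand-in methods for demonstration only
def always_fails(subimage):
    return None

def always_succeeds(subimage):
    return "mask"

used, mask = segment_with_fallbacks(
    "subimage",
    always_fails,
    [(True, "zoomed_out", always_fails),
     (False, "zoomed_in", always_succeeds),       # disabled, so skipped
     (True, "contrast_enhanced", always_succeeds)],
)
print(used)  # prints contrast_enhanced
```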

In the lower part of the cell, you will choose whether to save segmentation polygons (approximating autogenerated masks to within one micron) into .json files. These polygons can be loaded into the GUI (see 'Option B' below) for evaluation and/or editing in this or any future Colab session. Saving polygons is recommended for verification purposes and enabled by default, but will slow down automated processing somewhat.
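To give a sense of what approximating an outline to within a fixed tolerance means, the sketch below applies Douglas-Peucker simplification, one common approach, to a small outline. This is an assumption chosen for illustration; the source does not state which algorithm colab_zirc_dims uses. Coordinates are assumed to already be in µm, so a tolerance of 1.0 corresponds to one micron.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # projection parameter of p onto the segment, clamped to [0, 1]
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def simplify(points, tolerance):
    """Douglas-Peucker simplification of an open polyline."""
    if len(points) < 3:
        return list(points)
    dists = [point_segment_distance(p, points[0], points[-1])
             for p in points[1:-1]]
    worst = max(range(len(dists)), key=dists.__getitem__)
    if dists[worst] <= tolerance:
        # every interior point is within tolerance of the chord
        return [points[0], points[-1]]
    split = worst + 1  # index of the farthest point in `points`
    left = simplify(points[:split + 1], tolerance)
    right = simplify(points[split:], tolerance)
    return left[:-1] + right  # drop the duplicated split point

# Outline with one vertex that deviates by only 0.4 µm from a straight edge
outline = [(0, 0), (5, 0.4), (10, 0), (10, 10), (0, 10)]
simplified = simplify(outline, tolerance=1.0)
print(simplified)  # prints [(0, 0), (10, 0), (10, 10), (0, 10)]
```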

After selecting alternate methods and choosing whether to save autogenerated polygons, run the cell to automatically process all selected samples. Zircon dimensions will be saved to .csv files corresponding to each sample, and mask images will be saved as .png files for verification. If polygon saving is enabled, polygons will be saved into .json files for future editing or further verification.

In [None]:
#@title Select processing options, then run this cell to start fully automated processing
#@markdown #####Alternate segmentation methods:
Try_zoomed_out_subimage = True #@param {type:"boolean"}
Try_zoomed_in_subimage = True #@param {type:"boolean"}
Try_contrast_enhanced_subimage = True #@param {type:"boolean"}
Try_Otsu_thresholding  = False #@param {type:"boolean"}

#@markdown #####Save polygons for GUI viewing/editing?
Save_polygons = True #@param {type:"boolean"}


save_polys_bool = Save_polygons
alt_methods = [Try_zoomed_out_subimage, Try_zoomed_in_subimage,
               Try_contrast_enhanced_subimage,
               Try_Otsu_thresholding]

#@markdown ###### Identifying string for new output directory name (can be blank):
full_auto_str = '' #@param {type:"string"}


#Below: actually run fully-automated segmentation for selected samples. \
#If somehow not set in cells above, mpl auto needed here to prevent RAM crash
%matplotlib auto
run_dir = alc_notebook_fxns.full_auto_proc(ROOT_DIR, selected_samples, mos_data_dict,
                                           predictor, save_polys_bool, alt_methods,
                                           full_auto_str)

## Processing Option B: GUI (Semi-Automated) Processing:

The cells below set up and open a simple GUI that allows sample-by-sample creation, inspection, replacement, and export of automatic zircon segmentations. This may be most useful for mosaic images that the automatic segmentation model struggles with (e.g., with poor image quality/misalignment or partially exposed zircons).
_______________________________________________________________________

### Load saved polygons (optional):

This step is optional; the GUI can be run without any loadable polygons and will in this case simply produce automated segmentations for each sample as the user navigates through their dataset.

(Optionally) running one of the two cells below allows loading polygons and limited metadata from a run directory created during a previous automated or semi-automated processing run. Polygons from the selected run directory will be copied to a current run directory, so segmentations can be edited iteratively in different sessions (see 'Workflows' above). Zircon shapes will not be analyzed and new polygons will not be saved until the 'Analyze...' button is clicked in the GUI.

*   Run the first cell below if you have just (within this notebook session) completed an automated processing run for your sample with polygon saving enabled and want to check/edit saved zircon segmentations.
*   OR
*   Type the folder name of a semi-automated or saving-enabled automated processing run subdirectory (e.g., *../YOUR_PROJECT FOLDER/outputs/NAME_OF_RUN_DIRECTORY*) from this or a previous session into the second cell below, then run that cell to load polygons from it.
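If you do not remember the name of a previous run directory, the subdirectories of your project's outputs folder can be listed from a code cell. The following is a minimal sketch using only the standard library; `list_run_dirs` is a hypothetical helper, and the run names in the demo are made up. It does not check whether a run directory actually contains loadable polygons (that is what `save_load.check_loadable` is for).

```python
import os
import tempfile

def list_run_dirs(root_dir):
    """Return names of subdirectories in <root_dir>/outputs, sorted by name."""
    outputs = os.path.join(root_dir, "outputs")
    if not os.path.isdir(outputs):
        return []  # no runs have been saved for this project yet
    return sorted(name for name in os.listdir(outputs)
                  if os.path.isdir(os.path.join(outputs, name)))

# Demo on a temporary project folder with two made-up run directories
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "outputs", "run_B"))
    os.makedirs(os.path.join(tmp, "outputs", "run_A"))
    demo_runs = list_run_dirs(tmp)
    no_outputs = list_run_dirs(os.path.join(tmp, "outputs", "run_A"))
print(demo_runs)  # prints ['run_A', 'run_B']
```

In the Notebook, `list_run_dirs(ROOT_DIR)` would return candidate names to paste into the form field of the second cell below.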

In [None]:
#Run this cell to attempt loading from a just-run automated processing run.
try:
  loading_dir = save_load.check_loadable(run_dir, verbose=True)
except NameError:
  print('No run_dir variable found for current session; try finding and manually',
        'adding a loadable run directory in the cell below or proceed without loading.')
  loading_dir = None

OR

In [None]:
#Add the name of a loadable run directory from this or a previous session, \
# then run this cell to attempt loading.
input_loadable_run_dir = "OUTPUT RUN FOR LOADING HERE" #@param {type:"string"}
loading_dir = save_load.check_loadable(os.path.join(ROOT_DIR, 'outputs',
                                                    input_loadable_run_dir),
                                       verbose=True)

### Open and Run GUI


Running the cell below will actually open the GUI. If a valid previous run was selected, polygons will be loaded from that run. Otherwise, new automated segmentations will be produced on a sample-by-sample basis. Upon invoking the 'Analyze...' functions, zircon dimensions will be saved to .csv files corresponding to each sample, mask images will be saved as .png files for verification, and any changes to polygons will be saved to .json files corresponding to each sample.

Because Google Colab will automatically close inactive runtimes, it is recommended that users not leave the GUI running but inactive without first saving changes.


>**Instructions and tips:**
>
> Click on the canvas to start a new zircon polygon, and double click to finish the polygon.
>
>The 'restore orig. polygon' button will revert to the last saved automatically-generated (or, in the case of loaded polygons, possibly human-generated) zircon segmentation if one exists.
>
>The 'tag scan' button adds a tag to the scan that will persist into the exported image filename and zircon dimensions .csv file. What this tag means is up to the user (e.g., well-exposed zircons could be tagged).
>
>The 'Save changes to sample polygons' button will send all current polygons in the currently-open sample for .json saving. This will also be done automatically upon switching between samples and sending polygons for grain dimension analysis.
>
>The 'Analyze sample dimensions and export to Drive' button will send all current polygons in the currently-open sample for .json saving and dimensional analysis and save the results to your Google Drive project folder.
>
>The 'Analyze... all selected samples' button will measure dimensions of zircon crystals for all samples in the dataset where saved polygons (either generated in the current session or loaded from a previous one) are available. For reasons that are not yet clear, this is a slow process, so it may be best to check/edit all polygons in a dataset before starting it.

In [None]:
#run this cell to open GUI
%matplotlib auto
# Keep loading_dir if it was set in a cell above; otherwise run without loaded polygons
try:
  loading_dir = loading_dir
except NameError:
  loading_dir = None
#@markdown ###### Identifying string for new output directory name (can be blank):
semiauto_str = '' #@param {type:"string"}
zirc_dims_GUI.run_alc_GUI(mos_data_dict, selected_samples, ROOT_DIR, predictor,
                          loading_dir, semiauto_str)