# Colab Zirc Dims: Data Matching and Preparation

This ready-to-run Colab Notebook allows users to match mosaic images of zircon grains from LA-ICP-MS analyses to corresponding .scancsv scanlist files using code from the [colab-zirc-dims GitHub repository](https://github.com/MCSitar/colab_zirc_dims). It is intended to complement [this notebook](https://colab.research.google.com/drive/1ujPeumUGHi5_ZtwliOVA2W2gjPljW0XV?usp=sharing), which automates zircon grain size measurement using deep learning. This notebook should be opened in playground mode or copied into the user's Google Drive before running.

## How to run this Notebook (for new Google Colab users):

Google Colab notebooks are Jupyter notebooks that execute in cloud-hosted Python 3 environments on virtual machines equipped with high-end CPUs and GPUs. Users can thus run compute-intensive Python code and view its outputs in a browser window from any local computer, regardless of hardware and without any setup or installation.


#### Running cells:

Notebooks are made up of cells containing either text or code. Cells with code in this Notebook should be run in top-to-bottom order unless otherwise specified. To run cells:

1.   Hover the mouse over the cell to be run, then click the button with a 'play' symbol on it. See below for an example:

In [None]:
#Try running this example cell
print('Cell run!')

#### Clearing outputs:

To make this Notebook look neater after running it and/or to cut down on file size before saving (e.g., if there are many inspection images open), users can clear all cell outputs. To do this:

1.   Navigate to 'Edit' --> 'Clear all outputs' in the toolbar at the top of the screen, then click.

## Project Folder Organization:


Before running this notebook, users must have a project folder organized with the structure below. A template project folder that can be downloaded, edited, and re-uploaded is available [here](https://drive.google.com/drive/folders/1cFOoxp2ELt_W6bqY24EMpxQFmI00baDl?usp=sharing). Note that the template folder previously included a trained model, but this is no longer necessary (as of 03/21/2022) because models are now automatically downloaded before automated analysis. This notebook will procedurally generate a mosaic_info.csv file (necessary for automated analysis) and add it to the folder.

```
root directory
│
├───mosaics**
│   │   mosaic_XXX.bmp
│   │   mosaic_XXX.Align
│   │   mosaic_YYY.bmp
│   │   mosaic_YYY.Align
│   │   ...
│
└───scanlists
    │   scanlist_XXX.scancsv
    │   scanlist_YYY.scancsv
    │   ...


**This directory must contain both .bmp mosaic images and .Align files for 
each zircon sample to be analyzed. .Align files must have the same 
filenames (minus file extensions) as their respective .bmp mosaic files.
Not every .bmp/.Align file has to correspond to a scanlist (and vice versa);
.bmp/.Align files that do not correspond to any .scancsv file will be
automatically ignored when running this notebook.
Low-contrast mosaic images will automatically have their contrast 
increased via (Scikit Image) histogram normalization during processing.


```
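
Before linking the folder, it can help to confirm that it matches the layout above. The following is a minimal sketch using only the Python standard library; `check_project_folder` is a hypothetical helper written for illustration and is not part of colab_zirc_dims. It only checks that the two subdirectories exist, that every .bmp has a same-named .Align file, and that at least one .scancsv file is present.

```python
from pathlib import Path


def check_project_folder(root_dir):
    """Return a list of layout problems found in a project folder.

    Hypothetical helper (not part of colab_zirc_dims). An empty list
    means the folder matches the structure described above.
    """
    root = Path(root_dir)
    problems = []

    # Both subdirectories named in the template must exist.
    for sub in ('mosaics', 'scanlists'):
        if not (root / sub).is_dir():
            problems.append(f"missing subdirectory: {sub}")

    # Every .bmp mosaic needs an .Align file with the same base name.
    mosaic_dir = root / 'mosaics'
    if mosaic_dir.is_dir():
        for bmp in sorted(mosaic_dir.glob('*.bmp')):
            if not bmp.with_suffix('.Align').is_file():
                problems.append(f"no .Align file for mosaic: {bmp.name}")

    # At least one scanlist is needed for matching to do anything.
    scan_dir = root / 'scanlists'
    if scan_dir.is_dir() and not any(scan_dir.glob('*.scancsv')):
        problems.append("no .scancsv files in scanlists")

    return problems
```

Calling `check_project_folder(ROOT_DIR)` after the project folder has been linked would return an empty list for a correctly organized folder.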


### mosaic_info.csv Formatting/Explanation:


Your mosaic_info.csv file (i.e., the one created by this Notebook) will have the following headers (capitalization must match):

| **Sample** | **Scanlist** | **Mosaic** | **Max_zircon_size** | **X_offset** | **Y_offset** |

Data under each of the headers will be as follows:


*   **Sample**: Name of each sample (e.g., 'V26'). Sample names must be unique!
*   **Scanlist**: Full filename of the scanlist corresponding to each sample (e.g., 'V26 complete.scancsv').
*   **Mosaic**: Full filename of the mosaic .bmp image file corresponding to each sample (e.g., 'Mosaic160210 1844-32-916.bmp').
*   **Max_zircon_size**: Maximum expected zircon size (in µm) in each sample (e.g., '500'). During processing, subimages are clipped from larger mosaic images to cut down on processing time. This will be the size of the clipped subimages that are processed by the script. Models will often fail to distinguish grains when they cannot see all grain boundaries (e.g., if the sub-image is smaller than the grain), so this parameter should be adjusted such that non-standard grains are fully visible within example subimages.
*   **X_offset**: X correction (in µm) for any misalignment of each mosaic image relative to recorded ablation points (e.g., '-125' will shift ablation points 125 µm to the left). Keep at 0 to keep recorded points as-is.
*   **Y_offset**: Y correction (in µm) for any misalignment of each mosaic image relative to recorded ablation points (e.g., '-125' will shift ablation points 125 µm upwards). Keep at 0 to keep recorded points as-is.
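
To make the format concrete, here is a minimal sketch of a one-row mosaic_info.csv built from the example values above, together with a small check of the rules just listed (exact header capitalization, unique sample names, numeric size and offset values). `validate_mosaic_info` is a hypothetical helper for illustration only; it is not how colab_zirc_dims itself reads the file.

```python
import csv
import io

HEADERS = ['Sample', 'Scanlist', 'Mosaic',
           'Max_zircon_size', 'X_offset', 'Y_offset']

# One-row example using the sample values quoted in the list above.
EXAMPLE_CSV = (
    "Sample,Scanlist,Mosaic,Max_zircon_size,X_offset,Y_offset\n"
    "V26,V26 complete.scancsv,Mosaic160210 1844-32-916.bmp,500,0,0\n"
)


def validate_mosaic_info(csv_text):
    """Return a list of formatting problems (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []

    # Header names are case-sensitive, so compare them exactly.
    missing = [h for h in HEADERS if h not in fieldnames]
    if missing:
        return [f"missing headers: {missing}"]

    problems = []
    seen_samples = set()
    # Data rows start on line 2 of the file (line 1 holds the headers).
    for line_no, row in enumerate(reader, start=2):
        if row['Sample'] in seen_samples:
            problems.append(f"line {line_no}: duplicate sample {row['Sample']!r}")
        seen_samples.add(row['Sample'])
        # Size and offsets are given in µm and must parse as numbers.
        for col in HEADERS[3:]:
            try:
                float(row[col])
            except (TypeError, ValueError):
                problems.append(f"line {line_no}: {col} is not numeric")
    return problems


print(validate_mosaic_info(EXAMPLE_CSV))  # prints []
```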

---

## Imports and Google Drive mounting:

Run the cell below to import necessary packages and mount your Google Drive to this Notebook. Mounting your Google Drive will require following the given instructions (click link, sign in, copy code, paste code into box, press enter).

In [None]:
#imports necessary packages
import sys
import os
import matplotlib.pyplot as plt
import pandas as pd

#install colab_zirc_dims, import modules
!pip install colab_zirc_dims==1.0.8

from colab_zirc_dims import czd_utils, mos_proc, mos_match

# mounts user Google Drive
from google.colab import drive
drive.mount('/content/drive')

## Link project folder:

Add the path to your project folder to the cell below and then run the cell to link the Notebook to it.

In [None]:
#@title Input full path to project folder here, then run this cell
ROOT_DIR = "/content/drive/My Drive/YOUR PROJECT DIRECTORY HERE" #@param {type:"string"}


## Match mosaic files to .scancsv files:

Run the cell below to find matches by comparing the bounds of .bmp images (given in their respective .Align files) with those of .scancsv files.

Re-run this cell if you have modified the project folder (e.g., added files) and want these changes to be reflected in your table. Note that this will clear any changes that you have made to the table within this Notebook - export and download your .csv to save changes.

If you know that there is significant mismatch in your dataset (i.e., more than 1 shot falls outside of mosaic areas), you can increase the integer in line 9 of the cell below to increase the tolerance for out-of-bounds shots when matching mosaics to .scancsv scanlist files.

In [None]:
scanlist_dir = os.path.join(ROOT_DIR, 'scanlists')
mosaic_dir = os.path.join(ROOT_DIR, 'mosaics')

# Change the integer parameter in the line below to increase the number
# of 'out-of-bounds' shots allowed before a mosaic is classified as
# not being a match for a .scancsv file. Increase the parameter to an
# arbitrarily large number (i.e., > number of shots in the .scancsv)
# to view all mosaics in the project folder as potential matches.
matches_dict = mos_match.check_scan_mos_matches(scanlist_dir, mosaic_dir, 1)

mutable_export_dict, act_matches_dict = mos_match.matches_to_mos_info(matches_dict)
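
The idea behind the out-of-bounds tolerance in the cell above can be sketched in a few lines. This is an illustration only, not the implementation in `mos_match`: the function names, the `(x_min, x_max, y_min, y_max)` bounds tuple, the coordinate values, and the assumption that a mosaic still counts as a match when the number of out-of-bounds shots is less than or equal to the tolerance are all hypothetical.

```python
def count_out_of_bounds(shots, bounds):
    """Count shots whose (x, y) position falls outside the mosaic bounds."""
    x_min, x_max, y_min, y_max = bounds
    return sum(
        1 for x, y in shots
        if not (x_min <= x <= x_max and y_min <= y <= y_max)
    )


def is_potential_match(shots, bounds, max_out_of_bounds=1):
    """Treat a mosaic as a potential match if few enough shots miss it.

    Assumption: 'few enough' means count <= tolerance.
    """
    return count_out_of_bounds(shots, bounds) <= max_out_of_bounds


# Illustrative (made-up) coordinates in µm.
mosaic_bounds = (0, 1000, 0, 800)
shot_positions = [(100, 100), (950, 790), (1200, 400), (500, -50)]

print(count_out_of_bounds(shot_positions, mosaic_bounds))    # prints 2
print(is_potential_match(shot_positions, mosaic_bounds, 1))  # prints False
print(is_potential_match(shot_positions, mosaic_bounds, 5))  # prints True
```

Under these assumptions, raising the tolerance above the number of shots in a scanlist makes every mosaic a potential match, which is consistent with the behavior described in the cell's comment.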

## View matches and edit table:

Run the cells below to view and edit the data that will be exported in your mosaic_info.csv file.

#### 1. Run (hidden by default) cell below to define functions and variables necessary for table viewing/editing:

In [None]:
from google.colab import widgets
import ipywidgets
from ipywidgets import Layout
from IPython.display import display, clear_output
def display_edit_table():
    headers = list(mutable_export_dict.keys())
    num_matches = len(mutable_export_dict['Mosaic'])

    #shows a random sample of subimages for a given row in mutable_export_dict
    # param "output_tab" is a list with [*tab_widget_name*, *index_of_tab_for_plotting*]
    def random_sample_row(sample_row, output_tab = None):
        sample_params = []
        for each_header, each_data in mutable_export_dict.items():
          if each_header == 'Mosaic':
            sample_params.append(os.path.join(mosaic_dir, each_data[sample_row]))
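            # splitext removes only the file extension; str.strip('bmp') would
            # strip any trailing 'b', 'm', or 'p' characters instead
            temp_align_path = os.path.splitext(sample_params[-1])[0] + '.Align'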
            sample_params.append(temp_align_path)
          elif each_header == 'Scanlist':
            sample_params.append(os.path.join(scanlist_dir, each_data[sample_row]))
          else:
            sample_params.append(each_data[sample_row])
        if output_tab:
          output_tab[0].clear_tab(output_tab[1])
          with output_tab[0].output_to(output_tab[1]):
              mos_proc.random_subimg_sample(9, *sample_params[:5], sample_params[5:])
        else:
            mos_proc.random_subimg_sample(9, *sample_params[:5], sample_params[5:])


    #upon change in input box/dropdown vals, changes vals in mutable_export_dict
    def change_watcher(change, val_key_idx):
        if change['name'] == 'value' and (change['new'] != change['old']):
            column_name = headers[val_key_idx[0]]
            mutable_export_dict[column_name][val_key_idx[1]] = change['new']

    #displays an input widget and sets up listener (linked to mutable_export_dict)
    def display_and_watch(ipy_widget, val_key_idx):
        each_widget = ipy_widget
        each_widget.observe(lambda change: change_watcher(change, val_key_idx))
        display(each_widget)

    #displays a button widget and links it to a function with specific param(s)
    def display_linked_button(button_widget, val_for_fxn):
        each_widget = button_widget
        each_widget.on_click(lambda i: random_sample_row(val_for_fxn, [t, 1]))
        display(each_widget)

    t = widgets.TabBar(['Table', 'Image View'])

    with t.output_to(0):

      grid = widgets.Grid(num_matches + 1, 7, header_row=True, header_column=True,
                          style='width: auto')

      starting_vals = list(mutable_export_dict.values())
      for (row, col) in grid:
        #row in displayed table does not match row in export dict, so needs adjustment
        adj_row = row - 1

        #adds/links data, headers from mutable_export_dict procedurally
        if col < 6:
          each_starting_val = starting_vals[col][adj_row]
          if row == 0:
            print(headers[col])
          elif col == 0:
              display_and_watch(ipywidgets.Text(value=each_starting_val,
                                                layout=Layout(width='100%')),
                                 [col, adj_row])
          elif col == 1:
            print(list(act_matches_dict.keys())[adj_row])
          elif col == 2:
            if len(list(act_matches_dict.values())[adj_row]) == 1:
              print(each_starting_val)
            else:
              options_list = list(act_matches_dict.values())[adj_row]
              display_and_watch(ipywidgets.Dropdown(value = each_starting_val,
                                                    options = options_list,
                                                    layout=Layout(width='100%')),
                                [col, adj_row]
                                )
          else:
            display_and_watch(ipywidgets.IntText(value = each_starting_val,
                                                 layout=Layout(width='100%')), [col, adj_row])

        #adds buttons allowing samples of data (with current inputs) to be displayed
        elif row == 0:
          print('Display/refresh\nsample images')
        else:
          display_linked_button(ipywidgets.Button(tooltip='Click to display/refresh a random sample of shot images', 
                                                  style= {'button_color': 'white',
                                                          'text_color': 'black'},
                                                  description = 'Display'), adj_row)

#### 2. Run cell below to actually view/edit table:
If multiple possible matching mosaic files are available for a given .scancsv file, you can browse through them and select a best match via dropdown menu. Click the button at right to view or reload a random sample of shot images with current (entered) parameters/mosaic file. **Images will appear in the 'Image View' tab (click on this) at the top of the table**. All cells should be filled out (with placeholder values if necessary) before export.

In [None]:
display_edit_table()
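
Because all cells should be filled out before export, a quick check for empty entries can be useful. The sketch below assumes the table data is a dict mapping each header to a list of values (the way `mutable_export_dict` is indexed in the cell above); `find_empty_cells` and the example dict are hypothetical and for illustration only.

```python
def find_empty_cells(export_dict):
    """Return (column, row_index) pairs for cells that are empty.

    A cell counts as empty if it is None or a blank/whitespace string.
    """
    empty = []
    for column, values in export_dict.items():
        for row, value in enumerate(values):
            if value is None or (isinstance(value, str) and not value.strip()):
                empty.append((column, row))
    return empty


# Made-up example with one blank sample name in the second row.
example_dict = {
    'Sample': ['V26', ''],
    'Scanlist': ['V26 complete.scancsv', 'scanlist_YYY.scancsv'],
    'Mosaic': ['Mosaic160210 1844-32-916.bmp', 'mosaic_YYY.bmp'],
    'Max_zircon_size': [500, 500],
    'X_offset': [0, 0],
    'Y_offset': [0, 0],
}

print(find_empty_cells(example_dict))  # prints [('Sample', 1)]
```

In the notebook, `find_empty_cells(mutable_export_dict)` could be run after editing the table; an empty list would indicate that every cell has a value.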

## Convert and view mosaic_info data table:

Run the cell below to convert your entered data to a pandas data table and view it. Re-run this cell before export to record any changes made to the table in the cell above.

In [None]:
from google.colab import data_table
data_table.enable_dataframe_formatter()

pd_export_dict = pd.DataFrame.from_dict(mutable_export_dict)

pd_export_dict

## Export mosaic_info.csv to project folder:

Run the cell below to export the data table that you have created to your project folder as 'mosaic_info.csv'. Note that this will overwrite any current 'mosaic_info.csv' file in the folder.

In [None]:
czd_utils.save_csv(os.path.join(ROOT_DIR, 'mosaic_info.csv'), pd_export_dict)

## Measure zircons with deep learning automation (optional):
If you want to continue to automated zircon size analysis, follow [this link](https://colab.research.google.com/drive/1ujPeumUGHi5_ZtwliOVA2W2gjPljW0XV?usp=sharing).