## Core Naming and Cropping

For TMAs, all cores are extracted at once, which makes it difficult to locate the exact position of each core. Additionally, the default names assigned to the cores aren't particularly useful because they contain no information about position on the TMA.

This section helps you assign informative names to each core and then segment out the locations of specific cores to generate FOV-level statistics.

In [None]:
import os
import pathlib

import matplotlib.pyplot as plt
from pyimzml.ImzMLParser import ImzMLParser
from maldi_tools import extraction

Load the imzML data associated with your run.

TODO: only the coordinates are needed for this step; they should be saved further upstream for efficient loading.

In [None]:
imzml_dir = base_dir / "imzml"
data_name = "panc2055_imzML"
data_file = pathlib.Path(data_name) / "panc2055.imzML"
data_path = imzml_dir / data_file

imz_data = ImzMLParser(data_path, include_spectra_metadata="full")
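As the TODO above notes, only the scan coordinates are needed for this step. The following is a minimal sketch of how they could be pulled into a compact array, assuming `imz_data.coordinates` is the list of `(x, y, z)` tuples that pyimzml exposes. The helper name `coordinates_to_array` is hypothetical and not part of `maldi_tools`; the example runs on synthetic tuples rather than a real file.

```python
import numpy as np


def coordinates_to_array(coordinates):
    """Convert a sequence of (x, y, z) scan coordinates into an (N, 2) array of (x, y)."""
    coords = np.asarray(list(coordinates), dtype=np.int64)
    if coords.size == 0:
        # no coordinates: return an empty array with the expected number of columns
        return np.empty((0, 2), dtype=np.int64)
    # keep only x and y (assumes a 2D acquisition where z carries no information)
    return coords[:, :2]


# synthetic stand-in for imz_data.coordinates
example_coords = [(1, 1, 1), (2, 1, 1), (1, 2, 1)]
coord_array = coordinates_to_array(example_coords)
```

An array like this could then be written once with `np.save` and reloaded in later steps without re-parsing the full imzML file.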

It helps to first create an all-encompassing mask that marks the locations of all the cores. This makes it clear where the TMA was scanned, which is needed for the naming step. You will need to provide the path to one of your extracted glycan images.

* `glycan_img_path`: path to one glycan image, needed to properly dimension the mask
* `glycan_mask_path`: where the mask will be saved

In [None]:
glycan_img_path = "path/to/glycan_img.tiff"
glycan_mask_path = "path/to/glycan_mask.png"

# generate and save the glycan mask
extraction.generate_glycan_mask(
    imz_data=imz_data,
    glycan_img_path=glycan_img_path,
    glycan_mask_path=glycan_mask_path
)
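To illustrate what such an all-encompassing mask contains, here is a rough sketch that marks every scanned pixel in an array with the dimensions of a glycan image. It is not the implementation of `extraction.generate_glycan_mask`; the function name `sketch_scan_mask` and the assumption that coordinates are 1-based `(x, y)` pairs are illustrative only.

```python
import numpy as np


def sketch_scan_mask(coords_xy, img_shape):
    """Return a uint8 mask that is 255 where a pixel was scanned and 0 elsewhere.

    coords_xy: iterable of 1-based (x, y) scan positions (assumed convention).
    img_shape: (height, width) of the glycan image.
    """
    mask = np.zeros(img_shape, dtype=np.uint8)
    for x, y in coords_xy:
        # shift to 0-based indices; rows index y, columns index x
        mask[y - 1, x - 1] = 255
    return mask


# synthetic example: three scanned pixels on a 2 x 3 image
demo_mask = sketch_scan_mask([(1, 1), (2, 1), (3, 2)], img_shape=(2, 3))
```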

Name each core on the TMA using the [TSAI MALDI tiler](https://tsai.stanford.edu/research/maldi_tma/), providing the PNG saved at `glycan_mask_path` as input. **Ensure that this step is completed before running the following sections.**

The poslog files for your TMA run will contain each scanned coordinate in the exact order it was scanned. This, along with the tiler output, will be needed to map each coordinate to its respective core.

* `centroid_path`: TSAI MALDI tiler output, contains the name of each core mapped to its centroid
* `poslog_paths`: list of poslog files used for the TMA, containing all coordinates in order of acquisition. **Make sure the order of this list matches the order of acquisition for your run.**

In [None]:
centroid_path = "path/to/centroids.json"
poslog_paths = ["path/to/poslog1.txt", "path/to/poslog2.txt"]

# map coordinates to core names
region_core_info = extraction.map_coordinates_to_core_name(
    imz_data=imz_data,
    centroid_path=centroid_path,
    poslog_paths=poslog_paths
)
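Conceptually, this step pairs each scanned region with a named centroid from the tiler. The sketch below shows one simple way to do that, nearest-centroid assignment, on synthetic data. The input layout (a dict of core name to `(x, y)` centroid and a dict of region ID to coordinate list) and the helper `assign_regions_to_cores` are assumptions for illustration; they may not match the actual tiler or poslog formats, or the logic inside `map_coordinates_to_core_name`.

```python
import numpy as np


def assign_regions_to_cores(region_coords, core_centroids):
    """Map each region ID to the core whose centroid is closest to the region's mean position."""
    names = list(core_centroids)
    centers = np.array([core_centroids[name] for name in names], dtype=float)
    mapping = {}
    for region_id, coords in region_coords.items():
        # mean (x, y) position of all pixels scanned in this region
        region_center = np.asarray(coords, dtype=float).mean(axis=0)
        distances = np.linalg.norm(centers - region_center, axis=1)
        mapping[region_id] = names[int(np.argmin(distances))]
    return mapping


# hypothetical inputs: two named cores and two scanned regions
core_centroids = {"R1C1": (10, 10), "R1C2": (50, 10)}
region_coords = {
    0: [(48, 9), (52, 11), (50, 10)],
    1: [(9, 9), (11, 11)],
}
region_to_core = assign_regions_to_cores(region_coords, core_centroids)
```

Because regions are matched by position rather than by index, a scheme like this depends on the regions being built from the coordinates in the correct acquisition order, which is why the order of `poslog_paths` matters.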

To generate FOV-level statistics, an individual mask is saved for each core named with the TSAI tiler. These masks can then be loaded as needed by the functions that generate FOV-level statistics.

* `glycan_crop_save_dir`: the directory where these masks will be saved

In [None]:
glycan_crop_save_dir = "path/to/glycan/crops"
# create the save directory if it does not already exist
os.makedirs(glycan_crop_save_dir, exist_ok=True)

extraction.generate_glycan_crop_masks(
    glycan_mask_path=glycan_mask_path,
    region_core_info=region_core_info,
    glycan_crop_save_dir=glycan_crop_save_dir
)
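As an illustration of how a per-core mask feeds into an FOV-level statistic, the following sketch computes the mean intensity of a glycan image inside one core's mask. The helper `masked_mean_intensity` is hypothetical and uses synthetic arrays; the FOV-level functions in `maldi_tools` may compute different or additional statistics.

```python
import numpy as np


def masked_mean_intensity(glycan_img, core_mask):
    """Mean of glycan_img over pixels where core_mask is nonzero (NaN if the mask is empty)."""
    mask = np.asarray(core_mask) > 0
    if not mask.any():
        return float("nan")
    return float(np.asarray(glycan_img, dtype=float)[mask].mean())


# synthetic 2 x 2 image and a mask selecting the two diagonal pixels
glycan_img = np.array([[1.0, 2.0], [3.0, 4.0]])
core_mask = np.array([[255, 0], [0, 255]], dtype=np.uint8)
core_mean = masked_mean_intensity(glycan_img, core_mask)
```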

To test the result, run the following cell to visualize the masks for selected cores.

* `cores_to_crop`: the cores whose masks you want to visualize. If multiple cores are specified, their individual masks are combined. Set to `None` to include all cores.

In [None]:
cores_to_crop = ["R1C1", "R1C2"]

# extract a binary mask with just the cores specified
core_cropping_mask = extraction.load_glycan_crop_masks(
    glycan_crop_save_dir=glycan_crop_save_dir,
    cores_to_crop=cores_to_crop
)

# visualize the mask
_ = plt.imshow(core_cropping_mask)
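Combining the masks of several cores amounts to a pixel-wise OR. Here is a minimal sketch on synthetic arrays, assuming each saved mask is a 2D array of the same shape in which nonzero pixels mark the core. `combine_core_masks` is a hypothetical helper, not the code behind `load_glycan_crop_masks`.

```python
import numpy as np


def combine_core_masks(masks):
    """Pixel-wise OR of same-shaped masks, returned as uint8 with values 0 or 255."""
    binary_masks = [np.asarray(m) > 0 for m in masks]
    if not binary_masks:
        raise ValueError("at least one mask is required")
    combined = np.logical_or.reduce(binary_masks)
    return combined.astype(np.uint8) * 255


# two synthetic single-core masks on a 2 x 2 grid
mask_a = np.array([[255, 0], [0, 0]], dtype=np.uint8)
mask_b = np.array([[0, 0], [0, 255]], dtype=np.uint8)
combined_mask = combine_core_masks([mask_a, mask_b])
```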