# Image Compensation
## This notebook is an example: create a copy before running it or you will get merge conflicts!

Rosetta is the compensation process for the images produced by the MIBI. Compensating your images reduces the various forms of signal contamination that may show up in them.

As an example, the images below show the CD11c channel before (left) and after (right) Rosetta processing.

<table><tr>
    <td> <img src="./img/CD11c_pre_rosetta_cropped.png" style="width:100%"/> </td>
    <td> <img src="./img/CD11c_post_rosetta_cropped.png" style="width:100%"/> </td>
</tr></table>
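
Conceptually, compensation subtracts from each channel the portion of its signal that originated in other channels. The sketch below illustrates that idea for a single pixel under a simple linear model. `compensate_pixel`, the coefficient layout (`coeffs[source][target]` is the fraction of the source channel's counts assumed to leak into the target channel), and the clipping at zero are assumptions for illustration, not toffy's implementation.

```python
def compensate_pixel(raw, coeffs):
    """Hypothetical helper: subtract linear spillover from one pixel.

    raw: dict mapping channel name -> measured counts.
    coeffs: dict of dicts; coeffs[source][target] is the assumed fraction of
        the source channel's signal that leaks into the target channel.
    """
    compensated = {}
    for target, value in raw.items():
        # total contamination this channel receives from every other channel
        spill = sum(
            raw[source] * coeffs.get(source, {}).get(target, 0.0)
            for source in raw
            if source != target
        )
        # clip at zero, since counts cannot be negative
        compensated[target] = max(value - spill, 0.0)
    return compensated


# toy values for illustration only, not real rosetta coefficients
raw_pixel = {'Noodle': 100.0, 'CD11c': 30.0}
toy_coeffs = {'Noodle': {'CD11c': 0.25}}
print(compensate_pixel(raw_pixel, toy_coeffs))  # {'Noodle': 100.0, 'CD11c': 5.0}
```

Here 25 of the 30 CD11c counts are attributed to the Noodle channel and removed, while the Noodle channel itself is left unchanged because nothing is assumed to leak into it.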


In [None]:
import sys
sys.path.append('../')

import os
import shutil

import skimage.io as io
from toffy import rosetta
from toffy.panel_utils import load_panel
from ark.utils.io_utils import list_folders, list_files

## 1. Setup

Below, you will set up the necessary structure for testing rosetta on all of your runs.
- `cohort_name` is a descriptive name for the folder that will store the rosetta testing files
- `run_names` is a list of all the runs you would like to retrieve FOV images from for testing
- `panel_path` should point to a panel csv specifying the targets on your panel (see [panel format](https://github.com/angelolab/toffy#panel-format) for more information)

A new directory named after the `cohort_name` you provide below will be created within `C:\Users\Customer.ION\Documents\rosetta_testing` (set by `rosetta_testing_dir`); this folder will hold all of the files needed for, and produced in, **Section 2** of the notebook.
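
For orientation, the helper below lists the paths that the setup and testing cells create under the cohort folder. `expected_testing_paths` is a hypothetical helper written only for illustration; its path patterns mirror the `os.path.join` calls in this notebook, the FOV folder name is a placeholder, and `ntpath` is used so the Windows-style paths come out the same on any operating system.

```python
import ntpath  # applies Windows path rules on any operating system


def expected_testing_paths(testing_dir, cohort, fov, folder, multiplier, source_channel):
    """Hypothetical helper: paths built by Sections 1 and 2 of this notebook."""
    cohort_dir = ntpath.join(testing_dir, cohort)
    fov_dir = ntpath.join(cohort_dir, 'extracted_images', fov)
    return {
        # Section 1: copied FOV images, their normalized versions, and the matrix copy
        'fov_dir': fov_dir,
        'normalized_dir': ntpath.join(fov_dir, 'normalized'),
        'matrix': ntpath.join(cohort_dir, 'commercial_rosetta_matrix.csv'),
        # Section 2: one sub-folder per set of multipliers (named by folder_name)
        'mult_matrix': ntpath.join(
            cohort_dir, folder, 'commercial_rosetta_matrix_mult_{}.csv'.format(multiplier)),
        'compensated_dir': ntpath.join(
            cohort_dir, folder, 'compensated_data_{}'.format(multiplier)),
        'stitched_images_dir': ntpath.join(cohort_dir, folder, 'stitched_images'),
        'stitched_with_source_dir': ntpath.join(
            cohort_dir, folder + '-stitched_with_' + source_channel),
    }


# 'fov-1' is a placeholder FOV folder name
paths = expected_testing_paths('C:\\Users\\Customer.ION\\Documents\\rosetta_testing',
                               '20220101_new_cohort', 'fov-1', 'rosetta_test1', 1, 'Noodle')
print(paths['stitched_with_source_dir'])
# C:\Users\Customer.ION\Documents\rosetta_testing\20220101_new_cohort\rosetta_test1-stitched_with_Noodle
```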

In [None]:
# run specifications
cohort_name = '20220101_new_cohort'
run_names = ['20220101_TMA1', '20220102_TMA2']
panel_path = 'C:\\Users\\Customer.ION\\Documents\\panel_files\\my_cool_panel.csv'

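By default, rosetta uses `commercial_rosetta_matrix_v1.csv` from toffy's `files` directory. To use a different matrix, change `default_matrix_path` below. This cell also sets where the testing files will be stored (`rosetta_testing_dir`) and where the extracted images for your runs are located (`extracted_imgs_dir`), and it loads your panel.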

In [None]:
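# default rosetta matrix provided in toffy; change this path to use a different matrix
default_matrix_path = os.path.join('..', 'files', 'commercial_rosetta_matrix_v1.csv')

# folder that will hold the rosetta testing files for each cohort
rosetta_testing_dir = 'C:\\Users\\Customer.ION\\Documents\\rosetta_testing'
# folder containing the extracted images, with one sub-folder per run
extracted_imgs_dir = 'D:\\Extracted_Images'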

# Read in toffy panel file
panel = load_panel(panel_path)

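Using the run names provided above, the cell below randomly selects 5 FOVs from each run, copies them and the rosetta matrix into the cohort's testing folder, and normalizes the images so they can be compared directly with the rosetta output.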

In [None]:
# copy random fovs from each run
rosetta.copy_image_files(cohort_name, run_names, rosetta_testing_dir, 
                         extracted_imgs_dir, fovs_per_run=5)

# copy rosetta matrix
shutil.copyfile(default_matrix_path, 
                os.path.join(rosetta_testing_dir, cohort_name, 'commercial_rosetta_matrix.csv'))

# normalize images to allow direct comparison with rosetta
img_out_dir = os.path.join(rosetta_testing_dir, cohort_name, 'extracted_images')
fovs = list_folders(img_out_dir)
for fov in fovs:
    fov_dir = os.path.join(img_out_dir, fov)
    sub_dir = os.path.join(fov_dir, 'normalized')
    os.makedirs(sub_dir)
    chans = list_files(fov_dir)
    for chan in chans:
        img = io.imread(os.path.join(fov_dir, chan))
        img = (img / 100).astype('float32')
        io.imsave(os.path.join(sub_dir, chan), img, check_contrast=False)
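
The two steps in the cell above, sampling a fixed number of FOVs per run and rescaling each image, can be sketched as follows. This is a minimal illustration under stated assumptions, not the code inside `rosetta.copy_image_files`: `sample_fovs`, `normalize_image`, the optional `seed`, and the FOV names are hypothetical, and the division by 100 simply mirrors the `img / 100` line above.

```python
import random

import numpy as np


def sample_fovs(fovs_by_run, fovs_per_run=5, seed=None):
    """Hypothetical helper: pick up to `fovs_per_run` FOVs at random from each run."""
    rng = random.Random(seed)  # a fixed seed makes this sketch's selection repeatable
    selected = {}
    for run, fovs in fovs_by_run.items():
        # in this sketch, runs with fewer FOVs than requested contribute all of them
        k = min(fovs_per_run, len(fovs))
        selected[run] = sorted(rng.sample(fovs, k))
    return selected


def normalize_image(img, norm_const=100):
    """Hypothetical helper: rescale raw counts like the `img / 100` step above."""
    return (np.asarray(img) / norm_const).astype('float32')


# placeholder FOV names for illustration
runs = {'20220101_TMA1': ['fov-{}'.format(i) for i in range(1, 11)]}
print(sample_fovs(runs, fovs_per_run=5, seed=0))
print(normalize_image(np.array([[100, 250]])))
```

Fixing `seed` in this sketch makes the selection repeatable, which can help when comparing results across sessions.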

## 2. Rosetta - Remove Signal Contamination
We'll now process the images with rosetta to remove signal contamination. This produces one set of compensated images for each version of the compensation matrix being tested. The coefficients are tested as multiples of their values in the default matrix: for example, multipliers of 0.5, 1, and 2 test coefficients that are half the size of, the same size as, and twice the size of the coefficients in the default matrix, respectively.
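
A minimal sketch of how multipliers translate into test coefficients, assuming the matrix rows are indexed by the source (contaminating) channel and the columns by the target channel, and that only the source channel's row is scaled. `scale_source_row` and the toy values are hypothetical; `rosetta.create_rosetta_matrices` is the function the notebook actually calls, and its internals may differ.

```python
import copy


def scale_source_row(matrix, source, multipliers):
    """Hypothetical helper: return one scaled copy of `matrix` per multiplier.

    matrix: dict of dicts, matrix[source][target] -> coefficient (assumed layout).
    Only the row for `source` is multiplied; every other entry is unchanged.
    """
    scaled = {}
    for multiplier in multipliers:
        new_matrix = copy.deepcopy(matrix)  # leave the default matrix untouched
        new_matrix[source] = {target: coef * multiplier
                              for target, coef in matrix[source].items()}
        scaled[multiplier] = new_matrix
    return scaled


# toy default matrix for illustration only
default = {'Noodle': {'CD11c': 0.25, 'CD45': 0.5}, 'CD45': {'CD11c': 0.125}}
tests = scale_source_row(default, 'Noodle', [0.5, 1, 2])
print(tests[0.5]['Noodle'])  # {'CD11c': 0.125, 'CD45': 0.25}
print(tests[2]['Noodle'])    # {'CD11c': 0.5, 'CD45': 1.0}
```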

In [None]:
# Pick the channel that you will be optimizing the coefficient for
current_channel_name = 'Noodle'

# UPDATE THE BELOW ARGS WHEN RE-RUNNING
# set multipliers
multipliers = [0.25, 1, 4]

# pick an informative name
folder_name = 'rosetta_test1'

# If you only want to look at the output for a subset of the channels once you've picked good coefficients for the rest, 
# update this variable for fast processing. Otherwise, all channels will be compensated and saved
output_channel_names = None # e.g. output_channel_names = ['Au', 'CD45', 'PanCK']
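
Before compensation, channel names such as `current_channel_name` are translated into masses using the panel; the notebook does this with `rosetta.get_masses_from_channel_names`. The sketch below shows that kind of lookup, assuming panel rows carry `Mass` and `Target` columns. `masses_for_channels`, the toy masses, and the error handling are illustrative assumptions, not toffy's implementation.

```python
def masses_for_channels(channel_names, panel_rows):
    """Hypothetical helper: map each channel name to its mass.

    panel_rows: iterable of dicts with 'Mass' and 'Target' keys (assumed layout).
    Raises ValueError if a requested channel is missing from the panel.
    """
    mass_by_target = {row['Target']: row['Mass'] for row in panel_rows}
    missing = [name for name in channel_names if name not in mass_by_target]
    if missing:
        raise ValueError('Channels not found in panel: {}'.format(missing))
    return [mass_by_target[name] for name in channel_names]


# toy panel rows; the masses are placeholders, not real panel values
toy_panel = [{'Mass': 101, 'Target': 'Noodle'}, {'Mass': 102, 'Target': 'CD45'}]
print(masses_for_channels(['CD45', 'Noodle'], toy_panel))  # [102, 101]
```

With a pandas DataFrame such as `panel`, `panel.to_dict('records')` produces rows in this shape.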

In [None]:
# everything from here and below will run automatically
current_channel_mass = rosetta.get_masses_from_channel_names([current_channel_name], panel)

if output_channel_names is not None:
    output_masses = rosetta.get_masses_from_channel_names(output_channel_names, panel)
else:
    output_masses = None

# create sub-folder to hold images and files from this set of multipliers
folder_path = os.path.join(rosetta_testing_dir, cohort_name, folder_name)
if os.path.exists(folder_path):
    raise ValueError('The folder {} already exists, please '
                     'pick a new name for each set of parameters'.format(folder_name))
else:
    os.makedirs(folder_path)

rosetta_mat_path = os.path.join(rosetta_testing_dir, cohort_name, 'commercial_rosetta_matrix.csv')

# generate rosetta matrices for each multiplier
rosetta.create_rosetta_matrices(default_matrix=rosetta_mat_path,
                                multipliers=multipliers, masses=current_channel_mass,
                                save_dir=folder_path)

# loop over each multiplier and compensate the data
rosetta_dirs = [img_out_dir]
for multiplier in multipliers:
    # use a separate variable so rosetta_mat_path keeps pointing at the default matrix
    mult_mat_path = os.path.join(folder_path, 'commercial_rosetta_matrix_mult_{}.csv'.format(multiplier))
    rosetta_out_dir = os.path.join(folder_path, 'compensated_data_{}'.format(multiplier))
    rosetta_dirs.append(rosetta_out_dir)
    os.makedirs(rosetta_out_dir)
    # the images in 'normalized' were already divided by 100 above, hence norm_const=1
    rosetta.compensate_image_data(raw_data_dir=img_out_dir, comp_data_dir=rosetta_out_dir,
                                  comp_mat_path=mult_mat_path, raw_data_sub_folder='normalized',
                                  panel_info=panel, batch_size=1, norm_const=1,
                                  output_masses=output_masses)

In [None]:
# stitch images together to enable easy visualization of outputs
stitched_dir = os.path.join(folder_path, 'stitched_images')
os.makedirs(stitched_dir)

rosetta.create_tiled_comparison(input_dir_list=rosetta_dirs, output_dir=stitched_dir, channels=output_channel_names)

# add the source channel as first row to make evaluation easier
output_dir = os.path.join(rosetta_testing_dir, cohort_name, folder_name + '-stitched_with_' + current_channel_name)
os.makedirs(output_dir)
rosetta.add_source_channel_to_tiled_image(raw_img_dir=img_out_dir, tiled_img_dir=stitched_dir,
                                          output_dir=output_dir, source_channel=current_channel_name)

Now that the compensated data has been generated for each multiplier, the cell above stitches the images together so the different multipliers are easy to compare.
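
The stitched comparison places the same FOV from each condition side by side. A minimal NumPy sketch of that kind of layout, assuming each row holds one condition (source channel, raw, then one per multiplier), each column holds one FOV, and all images share the same shape. `tile_rows` is a hypothetical helper, not `rosetta.create_tiled_comparison`.

```python
import numpy as np


def tile_rows(rows):
    """Hypothetical helper: stitch a grid of equally sized 2-D images into one array.

    rows: list of rows; each row is a list of images (one per FOV).
    """
    # join the FOVs of each row horizontally, then stack the rows vertically
    return np.vstack([np.hstack(row) for row in rows])


# toy 2x2 images: source channel, raw, and one compensated condition for 3 FOVs
source = [np.full((2, 2), 3.0) for _ in range(3)]
raw = [np.full((2, 2), 2.0) for _ in range(3)]
compensated = [np.full((2, 2), 1.0) for _ in range(3)]
tiled = tile_rows([source, raw, compensated])
print(tiled.shape)  # (6, 6)
```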

### Evaluating the Images

There will now be a folder named `{folder_name}-stitched_with_{current_channel_name}` (for example, `rosetta_test1-stitched_with_Noodle`) in your cohort testing directory. You can look through these stitched images to see how much of the Noodle channel's signal is being removed from each of your channels; Noodle is the main source of noise in most images.
Each stitched image contains one row of images per condition (5 rows when testing three multipliers):
- row 1: the Noodle channel
- row 2: the raw extracted image
- rows 3-5: images after applying the rosetta matrix with coefficients scaled by each multiplier (e.g. `[0.25, 1, 4]` as set above)

**If the images compensated with multiplier 1 (the fourth row with the default `multipliers`) are satisfactory, there is no need to adjust the matrix. You can run the cell below and move on to Section 3 to process your entire cohort.**

In [None]:
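# use the default matrix for compensation (the copy in the cohort testing directory);
# rosetta_mat_path may have been reassigned by the multiplier loop, so build the path explicitly
final_rosetta_path = os.path.join(rosetta_testing_dir, cohort_name, 'commercial_rosetta_matrix.csv')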

### Optimize Your Compensation Matrix
If you would like to adjust the amount of noise being removed, you can re-run the code cells in this section to optimize the compensation matrix for your data; you will need to update the `multipliers` variable and provide a new `folder_name`. The stitched images can help you decide whether the multiplier needs to be higher, lower, or the same.

**For each channel, pick the multiplier that worked best. Then, open the `commercial_rosetta_matrix.csv` file in your cohort testing directory and update the corresponding coefficient to `previous_value * multiplier`.** If you're happy with the new coefficients, rename your modified matrix, put its path in the cell below, run the cell, and move on to the final section.
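
If you prefer to make this update in code rather than in a spreadsheet, the sketch below multiplies a single cell of a CSV matrix by the chosen multiplier and writes the result to a new file. It assumes the first column holds the source (row) labels and the header row holds the target (column) labels, compared as strings; `update_coefficient` and any paths you pass to it are hypothetical, so check the layout of your own matrix before relying on it.

```python
import csv


def update_coefficient(in_path, out_path, source, target, multiplier):
    """Hypothetical helper: write a copy of a CSV matrix with one cell scaled.

    Assumes row labels are in the first column and column labels in the header.
    Returns the new coefficient (previous_value * multiplier).
    """
    with open(in_path, newline='') as f:
        rows = list(csv.reader(f))

    header = rows[0]
    col = header.index(str(target))  # raises ValueError if the column label is absent
    for row in rows[1:]:
        if row[0] == str(source):
            new_value = float(row[col]) * multiplier
            row[col] = str(new_value)
            break
    else:
        raise ValueError('row {} not found in {}'.format(source, in_path))

    with open(out_path, 'w', newline='') as f:
        csv.writer(f).writerows(rows)
    return new_value
```

Writing to a new file keeps the original matrix intact, so a bad edit can be undone by rerunning with different values.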

In [None]:
# rename your rosetta matrix and put the path to final file here
final_rosetta_path = os.path.join(rosetta_testing_dir, cohort_name, 
                                  'new_rosetta_matrix.csv')

**Once you've finalized your coefficients, please let us know [here](https://github.com/angelolab/toffy/issues/55).**

## 3. Rosetta - Compensate Your Runs

Once you're satisfied that Rosetta is working appropriately, you can use it to process your runs. First, select the runs you want to process and define the relevant top-level folders. Everything this section needs, and everything it produces, is stored in the directories set up automatically in `1_set_up_toffy.ipynb`. More information on the uses and locations of toffy's directories can be found in the [README](https://github.com/angelolab/toffy#directory-structure).

In [None]:
# list of run names you would like to compensate images for;
# by default, uses the run list provided in Section 1 for testing
runs = run_names

extracted_imgs_dir = 'D:\\Extracted_Images'

# This folder will hold the post-rosetta images
rosetta_image_dir = 'D:\\Rosetta_Compensated_Images'

Then, you can compensate the data using rosetta.

In [None]:
# perform rosetta on the provided runs
for run in runs:
    print("processing run {}".format(run))
    run_extracted_dir = os.path.join(extracted_imgs_dir, run)
    run_rosetta_dir = os.path.join(rosetta_image_dir, run)
    # create the output folder for this run if it doesn't exist yet
    os.makedirs(run_rosetta_dir, exist_ok=True)
    rosetta.compensate_image_data(raw_data_dir=run_extracted_dir, comp_data_dir=run_rosetta_dir,
                                  comp_mat_path=final_rosetta_path, panel_info=panel, batch_size=1)