# Image Compensation (Rosetta Round 2)

After Rosetta Round 1, one or more channels may, in rare cases, still retain background or noise caused by residual contamination from another channel. This notebook allows you to apply an additional round of compensation by limiting the scope of Rosetta to the problematic sets of channels. To learn more about the Rosetta algorithm, please refer to the [base Rosetta notebook](https://github.com/angelolab/toffy/blob/main/templates/4a_compensate_image_data.ipynb). <b>Note that you must have run `4a_compensate_image_data.ipynb` before using this notebook</b>.

The Rosetta matrix contains rows for each of the sources of noise, and columns for each of the output channels. Each entry in the matrix represents the proportional contamination from a given noise channel to a given output channel. Unlike Rosetta Round 1, all default values in the compensation matrix of Rosetta Round 2 will be 0, since only specific user-defined channels will be compensated further. As in Round 1 with the Noodle coefficient, Round 2 allows the user to modify the coefficient for the problematic output channel.

In the following Rosetta matrix example:

<img src="./img/rosetta_matrix_entry.png"/>

we're applying a 0.25 multiplier to compensate output channel 144 from input channel 147.
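
To make the matrix entry concrete, here is a minimal sketch of what a single coefficient does. It assumes that compensation subtracts the source channel, scaled by the coefficient, from the output channel and clips negative values to zero. The pixel values and the helper `apply_coefficient` are invented for illustration and are not toffy's implementation.

```python
import numpy as np

def apply_coefficient(output_img, source_img, coefficient):
    """Subtract a scaled copy of the source channel from the output channel.

    Hypothetical helper for illustration only; negative results are clipped to 0.
    """
    compensated = output_img - coefficient * source_img
    return np.clip(compensated, 0, None)

# toy 2x2 images standing in for mass 144 (output) and mass 147 (source)
channel_144 = np.array([[10.0, 4.0], [1.0, 0.0]])
channel_147 = np.array([[8.0, 8.0], [8.0, 0.0]])

compensated_144 = apply_coefficient(channel_144, channel_147, coefficient=0.25)
print(compensated_144)
```

Pixels where the scaled source exceeds the output signal end up at zero, which is why an overly large coefficient can erase real signal.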

## This notebook consists of 3 steps:

**1. Define directories, which includes the path to the test set you generated for Round 1 Rosetta.**

**2. Test Rosetta on this subset of FOVs to find good coefficients for the Round 2 compensation matrix.**

**3. Use the finalized Round 2 matrix to process all of the data.**

In [None]:
import sys
sys.path.append('../')

import os
import shutil

import skimage.io as io
from toffy import rosetta
from toffy.panel_utils import load_panel
from toffy.image_stitching import get_max_img_size
from alpineer.io_utils import list_folders, list_files

## 1. Setup

Below, you will set up the necessary structure for testing rosetta on all of your runs.
- `cohort_name` is the name of the cohort you used in `4a_compensate_image_data.ipynb`
- `panel_path` should point to a panel csv specifying the targets on your panel (see [panel format](https://github.com/angelolab/toffy#panel-format) for more information)

In [None]:
# run specifications
cohort_name = '20220101_new_cohort'
panel_path = 'C:\\Users\\Customer.ION\\Documents\\panel_files\\my_cool_panel.csv'

By default, the `commercial_rosetta_matrix_round2.csv` from the `files` directory of toffy will be used for rosetta.

* `default_matrix_path`: the default path points to a `.csv` file with all zeros. You will need to provide your own `.csv` which contains the values you need.
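
One way to fill in your own values is with pandas. The sketch below builds a small all-zero matrix in memory rather than reading the real file, and it assumes rows are indexed by source mass and columns by output mass, as described in the introduction. The masses and the 0.25 value are placeholders, not recommendations.

```python
import pandas as pd

# toy all-zero matrix: rows are source masses, columns are output masses
masses = [144, 147, 158]
round2_matrix = pd.DataFrame(0.0, index=masses, columns=masses)

# set the contamination from source mass 147 into output mass 144
round2_matrix.loc[147, 144] = 0.25

print(round2_matrix)

# to persist the edited matrix, write it to a path of your choosing, e.g.
# round2_matrix.to_csv('my_rosetta_matrix_round2.csv')
```

You would then point `default_matrix_path` at the file you saved.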

In [None]:
# default rosetta matrix provided in toffy
default_matrix_path = os.path.join('..', 'files', 'commercial_rosetta_matrix_round2.csv')

rosetta_testing_dir = 'C:\\Users\\Customer.ION\\Documents\\rosetta_testing'

# read in toffy panel file
panel = load_panel(panel_path)

The rosetta matrix provided by the path above will be copied into the cohort's testing directory, suffixed with `_round2` to prevent ambiguity. Additionally, the script verifies that test data has been generated for this cohort.

In [None]:
# verify that testing data has been generated for this cohort
if not os.path.exists(os.path.join(rosetta_testing_dir, cohort_name)):
    raise ValueError('Cohort %s does not have testing data in %s: please double check these variables' % (cohort_name, rosetta_testing_dir))

# copy rosetta matrix
shutil.copyfile(default_matrix_path, 
                os.path.join(rosetta_testing_dir, cohort_name, 'commercial_rosetta_matrix_round2.csv'))
                     
img_out_dir = os.path.join(rosetta_testing_dir, cohort_name, 'extracted_images')

## 2. Rosetta - Remove Signal Contamination

We'll now process the images with rosetta to remove signal contamination at varying levels. **We'll be testing out coefficient multipliers in proportion to their value in the default matrix for the specified `current_channel_name`, compensated against the channel(s) in `output_channel_names`.** For example, specifying multipliers of 0.5, 1, and 2 would test coefficients that are half the size, the same size, and twice the size of the `current_channel_name` coefficients in the default matrix, respectively. **This will give us a new set of compensated images, using different values in each compensation matrix.**

* `current_channel_name`: the channel that you will be optimizing the coefficient for.
* `multipliers`: the range of values to multiply the default matrix by to get new coefficients.
* `folder_name`: the name of the folder to store the Rosetta data. This will be placed in `rosetta_testing_dir/cohort_name`.
* `output_channel_names`: the channel(s) that you will be compensating for.
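
The sketch below illustrates how a list of multipliers translates into candidate coefficients, assuming each multiplier scales the source channel's row of the matrix. The masses and the 0.25 starting value are invented, and `scale_source_row` is a hypothetical helper, not a toffy function. It also shows why the matrix needs a non-zero starting value: scaling an all-zero row always yields zeros.

```python
import pandas as pd

def scale_source_row(matrix, source_mass, multiplier):
    """Return a copy of matrix with the source_mass row multiplied by multiplier."""
    scaled = matrix.copy()
    scaled.loc[source_mass] = scaled.loc[source_mass] * multiplier
    return scaled

masses = [144, 147]
base_matrix = pd.DataFrame(0.0, index=masses, columns=masses)
base_matrix.loc[147, 144] = 0.25

multipliers = [0.5, 1, 2]
candidates = {m: scale_source_row(base_matrix, 147, m).loc[147, 144] for m in multipliers}
print(candidates)

# an all-zero matrix gives zero for every multiplier
zero_matrix = pd.DataFrame(0.0, index=masses, columns=masses)
zero_candidates = [scale_source_row(zero_matrix, 147, m).loc[147, 144] for m in multipliers]
print(zero_candidates)
```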

In [None]:
# pick the channel that you will be optimizing the coefficient for
current_channel_name = 'Noodle'

# set multipliers
multipliers = [0.5, 1, 2]

# pick an informative name
folder_name = 'rosetta_test1'

# channel(s) you will be compensating for
output_channel_names = ['Ecadherin']

Run compensation on example images. This should be much faster than `4a_compensate_image_data.ipynb` since far fewer channels are being compensated against.

In [None]:
rosetta_mat_path = os.path.join(rosetta_testing_dir, cohort_name, 'commercial_rosetta_matrix_round2.csv')

# create sub-folder to hold images and files from this set of multipliers
folder_path = os.path.join(rosetta_testing_dir, cohort_name, folder_name)
if os.path.exists(folder_path):
    raise ValueError('This folder {} already exists, please '
                     'pick a new name for each set of parameters'.format(folder_name))
else:
    os.makedirs(folder_path)

# compensate the example fov images
rosetta.generate_rosetta_test_imgs(rosetta_mat_path, img_out_dir, multipliers, folder_path, 
                                   panel, current_channel_name, output_channel_names=output_channel_names,
                                   gaus_rad=0, norm_const=1)

Now that we've generated the compensated data for the given multipliers, we'll generate stitched images to make comparing the different coefficients easier.

In [None]:
# stitch images together to enable easy visualization of outputs
stitched_dir = os.path.join(folder_path, 'stitched_images')
os.makedirs(stitched_dir)

rosetta_dirs = [img_out_dir]
for mult in multipliers:
    rosetta_dirs.append(os.path.join(folder_path, f'compensated_data_{mult}'))

img_size = get_max_img_size(img_out_dir)
rosetta.create_tiled_comparison(input_dir_list=rosetta_dirs, output_dir=stitched_dir, max_img_size=img_size, 
                                channels=output_channel_names)

# add the source channel as first row to make evaluation easier
output_dir = os.path.join(rosetta_testing_dir, cohort_name, folder_name + '-stitched_with_' + current_channel_name)
os.makedirs(output_dir)
rosetta.add_source_channel_to_tiled_image(raw_img_dir=img_out_dir, tiled_img_dir=stitched_dir,
                                          output_dir=output_dir, source_channel=current_channel_name,
                                          max_img_size=img_size, img_sub_folder='rescaled',
                                          percent_norm=None)

# remove the intermediate compensated_data_{mult} and stitched_image dirs
rosetta.clean_rosetta_test_dir(folder_path)

There will now be a folder named `{folder_name}-stitched_with_{current_channel_name}` (based on the folder name you provided above for this test) in your cohort testing directory. You can look through these stitched images to see how much of the `current_channel_name` signal (Noodle in the example above) is being removed from each output channel at each multiplier.

The output format of the images will be similar to `4a_compensate_image_data.ipynb`. Please refer to [that notebook](https://github.com/angelolab/toffy/blob/main/templates/4a_compensate_image_data.ipynb) to see an example.

Within the `folder_name` directory, you will find matrix files updated with the provided multipliers. If you're happy with one multiplier (e.g. 0.5), **find the corresponding matrix** `commercial_rosetta_matrix_mult_0.5.csv` **and rename it**. You can then run the cell below with your updated `final_matrix_name` and move on to the final section.
    
**A copy of your final rosetta matrix will be saved to `C:\\Users\\Customer.ION\\Documents\\rosetta_matrices`.**
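
Renaming can be done in a file browser or in Python. The sketch below uses a temporary directory and an empty placeholder file so that it can be run safely anywhere. In practice the folder would be `rosetta_testing_dir/cohort_name/folder_name`, and the source file name should be whichever matrix file you actually find there.

```python
import os
import tempfile

# temporary folder standing in for rosetta_testing_dir/cohort_name/folder_name
demo_folder = tempfile.mkdtemp()

# placeholder for the matrix generated with the multiplier you chose
chosen_matrix = os.path.join(demo_folder, 'commercial_rosetta_matrix_mult_0.5.csv')
open(chosen_matrix, 'w').close()

# rename it to the name you will assign to final_matrix_name
final_matrix = os.path.join(demo_folder, 'cohort_name_rosetta_matrix_round2.csv')
os.rename(chosen_matrix, final_matrix)

print(os.listdir(demo_folder))
```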

In [None]:
# rename your rosetta matrix and put the path to final file here
final_matrix_name = 'cohort_name_rosetta_matrix_round2.csv'

rosetta_path = os.path.join(rosetta_testing_dir, cohort_name, folder_name, final_matrix_name)

# copy final rosetta matrix to matrix folder
rosetta_matrix_dir = 'C:\\Users\\Customer.ION\\Documents\\rosetta_matrices'
shutil.copyfile(rosetta_path, os.path.join(rosetta_matrix_dir, final_matrix_name))

### (Optional) Optimize Your Compensation Matrix
If you would like to further adjust the amount of noise being removed, you can **re-run the code cells in Section 2** to optimize the compensation matrix for your data, trying new multiplier values until you get the images you want. When re-running the code, you will need to update the `multipliers` variable and provide a new `folder_name`. The previously generated stitched images can help you determine whether the new multipliers need to be higher or lower.

**After re-running rosetta, examine your new set of stitched images and check if you are happy with the images produced. Be sure to rename and save your chosen compensation matrix before proceeding to step 3.** 

## 3. Rosetta - Compensate Your Runs

**Once you're satisfied that Rosetta is working appropriately, you can use it to process your runs.** First select the runs you want to process, and define the relevant top-level folders.
- `runs` is a list of all the runs you would like to process. This should be the same as the runs list you used in `4a_compensate_image_data.ipynb`
- `final_matrix_name` is the name of the matrix you wish to use for compensation, stored in **`C:\\Users\\Customer.ION\\Documents\\rosetta_matrices`**

In [None]:
# replace this with the run list used in 4a_compensate_image_data.ipynb
runs = ['example_run_1', 'example_run_2']

# provide the matrix file name
final_matrix_name = 'cohort_name_rosetta_matrix_round2.csv'

# rosetta compensated directory from previous round
extracted_imgs_dir = 'D:\\Rosetta_Compensated_Images'

# if you would like to process all of the run folders in the image dir instead of just the runs tested, you can use the below line
# runs = list_folders(extracted_imgs_dir)

All inputs and outputs for this section of the notebook are stored in the automatic directories established in `1_set_up_toffy.ipynb`. More information on the uses and locations of the directories in toffy can be found in the [README](https://github.com/angelolab/toffy#directory-structure).

* `rosetta_image_dir`: rename to the directory you wish to write Rosetta Round 2 data to
* `extracted_imgs_dir`: the directory containing Rosetta compensated data generated from Round 1 (`4a_compensate_image_data.ipynb`)

In [None]:
# holds the post-rosetta round 2 images
rosetta_image_dir = 'D:\\Rosetta_Compensated_Images\\Round2'

rosetta_matrix_dir = 'C:\\Users\\Customer.ION\\Documents\\rosetta_matrices'
final_rosetta_path = os.path.join(rosetta_matrix_dir, final_matrix_name)

Now, you can compensate the data using rosetta. Depending on how many runs and FOVs you will be processing, this can take a while. Feel free to leave the notebook running overnight (do not close jupyter lab or the terminal window), and make sure you have enough storage space for the new images produced.

* `final_output_channel_names`: if you compensated multiple output channels during the previous step, make sure to specify all the ones used in this variable
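
The cell below splits the panel into channels that are re-compensated and channels that are carried over unchanged from Round 1. Here is a small sketch of that selection on a toy panel. The masses and targets are invented, and a real panel would come from `load_panel`.

```python
import pandas as pd

# toy panel with the two columns relevant to the selection
toy_panel = pd.DataFrame({
    'Mass': [113, 144, 147],
    'Target': ['CD45', 'Ecadherin', 'Noodle'],
})

final_output_channel_names = ['Ecadherin']

# targets NOT in the output list are copied over from Round 1 as-is
is_output = toy_panel['Target'].isin(final_output_channel_names)
non_output_targets = list(toy_panel[~is_output]['Target'].values)
print(non_output_targets)
```

Any channel left out of `final_output_channel_names` is treated as a non-output target, so listing every channel you tuned in Section 2 matters.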

In [None]:
# final channel(s) you will be compensating for
final_output_channel_names = ['Ecadherin']

# generate the output masses from the channels provided
output_masses = rosetta.get_masses_from_channel_names(final_output_channel_names, panel)

# generate the other target names, these will need to be copied from extracted_imgs_dir (Rosetta Round 1)
non_output_targets = list(panel[~panel['Target'].isin(final_output_channel_names)]['Target'].values)

# perform rosetta on the provided runs, then copy over the remaining channels from Rosetta Round 1
for run in runs:
    print("processing run {}".format(run))
    if not os.path.exists(os.path.join(rosetta_image_dir, run)):
        os.makedirs(os.path.join(rosetta_image_dir, run))
    rosetta.compensate_image_data(raw_data_dir=os.path.join(extracted_imgs_dir, run), 
                                  comp_data_dir=os.path.join(rosetta_image_dir, run), 
                                  comp_mat_path=final_rosetta_path, panel_info=panel,
                                  raw_data_sub_folder='rescaled', batch_size=1,
                                  gaus_rad=0, norm_const=1, output_masses=output_masses)

rosetta.copy_round_one_compensated_images(rosetta_image_dir, extracted_imgs_dir, non_output_targets)