# Image Compensation

This notebook will guide you through the Rosetta algorithm, which is used to remove background and noise from image data prior to analysis. It is based on the compensation matrix approach that has been used for correcting flow-cytometry data. The Rosetta matrix contains rows for each of the sources of noise, and columns for each of the output channels. Each entry in the matrix represents the proportional contamination from a given noise channel to a given output channel.
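As a rough illustration of how such a matrix acts on a single pixel, the sketch below subtracts each noise source's proportional contribution from the output channels. The channel names, counts, and coefficients are made up for illustration, and clipping negative results to zero is an assumption of this sketch, not a description of toffy's implementation.

```python
# Hypothetical coefficients: rows are noise sources, columns are output channels.
comp_matrix = {
    "Noodle": {"CD11c": 0.10, "CD4": 0.05},
    "Au": {"CD11c": 0.00, "CD4": 0.02},
}

# Hypothetical raw counts at a single pixel, keyed by channel name.
raw_pixel = {"Noodle": 40.0, "Au": 10.0, "CD11c": 12.0, "CD4": 1.5}


def compensate_pixel(raw, matrix):
    """Subtract each source's proportional contamination from every output channel."""
    compensated = dict(raw)
    for source, row in matrix.items():
        for output, coeff in row.items():
            compensated[output] -= raw[source] * coeff
    # Assumption: counts cannot be negative, so clip at zero.
    return {chan: max(0.0, val) for chan, val in compensated.items()}


compensated_pixel = compensate_pixel(raw_pixel, comp_matrix)
print(compensated_pixel)
```

In this toy case the CD11c count drops by 10% of the Noodle count, while the CD4 count is fully removed because the estimated contamination exceeds the raw signal.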

The majority of the entries in the matrix will never need to be modified, because they reflect fixed properties: the isotopic impurities present in the elements that are used for conjugation, the proportion of hydride and oxide contamination, and other intrinsic features of the instrument. However, some channels, in particular the "Noodle" channel, which we use to remove organic contamination, can be influenced by sample preparation and instrument configuration. Therefore, this notebook gives the user the opportunity to modify that coefficient.

The images below show the CD11c channel before (left) and after (right) Rosetta processing.

<table><tr>
    <td> <img src="./img/CD11c_pre_rosetta_cropped.png" style="width:100%"/> </td>
    <td> <img src="./img/CD11c_post_rosetta_cropped.png" style="width:100%"/> </td>
</tr></table>


## This notebook consists of 3 steps:
**1. Define directories and copy necessary files, which includes the random selection of FOVs from the provided run folders.**

**2. Test Rosetta on this subset of FOVs to find good coefficients for the compensation matrix.**

**3. Use the finalized matrix to process all of the data.**

In [1]:
import os
import shutil

from toffy import rosetta
from toffy.panel_utils import load_panel
from toffy.image_stitching import get_max_img_size
from alpineer.io_utils import list_folders

## 1. Setup

Below, you will set up the necessary structure for testing rosetta on all of your runs.
- `cohort_name` is a descriptive name for the folder that will store the rosetta testing files
- `run_names` is a list of all the runs you would like to retrieve FOV images from for testing
- `panel_path` should point to a panel csv specifying the targets on your panel (see [panel format](https://github.com/angelolab/toffy#panel-format) for more information)

A new directory named after the provided `cohort_name` will be created within the `rosetta_testing_dir` defined below (`D:\\Rosetta_processing\\rosetta_testing`). This folder will contain all the files needed for and produced in **Section 2** of the notebook.

In [None]:
# run specifications
cohort_name = '20220101_new_cohort'
run_names = ['20220101_TMA1', '20220102_TMA2']
panel_path = 'C:\\Users\\Customer.ION\\Documents\\panel_files\\my_cool_panel.csv'

# if you would like to process all of the run folders in the image dir instead of just the runs tested, you can use the below line
# run_names = list_folders(extracted_imgs_dir)

By default, the `commercial_rosetta_matrix_v1.csv` from the `files` directory of toffy will be used for rosetta. If you would like to use a different matrix, specify the path below. 

In [None]:
# default rosetta matrix provided in toffy
default_matrix_path = os.path.join('..', 'files', 'commercial_rosetta_matrix_v1.csv')

rosetta_testing_dir = 'D:\\Rosetta_processing\\rosetta_testing'
extracted_imgs_dir = 'D:\\Extracted_Images'

# read in toffy panel file
panel = load_panel(panel_path)

With the provided run names, we will randomly choose a few FOVs per run to rescale and then test rosetta on. The testing subset should be approximately **10-20 FOVs in total**; by default the number of FOVs per run is 5 (i.e. 2 runs with 5 FOVs each will produce a 10 FOV testing set). You can adjust the `fovs_per_run` variable below to create an appropriate testing subset. The rosetta matrix provided by the path above will also be copied into the new testing directory to use.
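To make the sizing of the testing subset concrete, here is a minimal sketch of per-run random sampling. It is not how `rosetta.copy_image_files` selects FOVs internally; the helper `pick_test_fovs` and the FOV names are hypothetical and only illustrate how `fovs_per_run` determines the total.

```python
import random


def pick_test_fovs(fovs_by_run, fovs_per_run=5, seed=None):
    """Randomly pick up to `fovs_per_run` FOV names from each run."""
    rng = random.Random(seed)
    selected = {}
    for run, fovs in fovs_by_run.items():
        # Never ask for more FOVs than the run actually contains.
        n = min(fovs_per_run, len(fovs))
        selected[run] = sorted(rng.sample(fovs, n))
    return selected


# Hypothetical FOV listings for the two example runs.
fovs_by_run = {
    "20220101_TMA1": [f"fov-{i}-scan-1" for i in range(1, 21)],
    "20220102_TMA2": [f"fov-{i}-scan-1" for i in range(1, 4)],
}

subset = pick_test_fovs(fovs_by_run, fovs_per_run=5, seed=0)
total = sum(len(fovs) for fovs in subset.values())
print(total, "FOVs in the testing subset")
```

If the total falls outside the suggested 10-20 range, as it does here because the second run is small, adjusting `fovs_per_run` or adding runs brings it back in line.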

In [None]:
# copy random fovs from each run
rosetta.copy_image_files(cohort_name, run_names, rosetta_testing_dir, extracted_imgs_dir, fovs_per_run=5)

# copy rosetta matrix
shutil.copyfile(default_matrix_path, 
                os.path.join(rosetta_testing_dir, cohort_name, 'commercial_rosetta_matrix.csv'))

# rescale images to allow direct comparison with rosetta
img_out_dir = os.path.join(rosetta_testing_dir, cohort_name, 'extracted_images')
rosetta.rescale_raw_imgs(img_out_dir)

## 2. Rosetta - Remove Signal Contamination
We'll now process the images with rosetta to remove signal contamination at varying levels. **By default we'll be testing out coefficient multipliers in proportion to their value in the default matrix for the Noodle channel, since it is the main source of noise in most images.** For example, specifying multipliers of 0.5, 1, and 2 would test coefficients that are half the size, the same size, and twice the size of the Noodle coefficients in the default matrix, respectively. **This will give us a new set of compensated images, using different values in each compensation matrix.**

* `current_channel_name`: the channel that you will be optimizing the coefficient for.
* `multipliers`: the range of values to multiply the default matrix by to get new coefficients.
* `folder_name`: the name of the folder to store the Rosetta data. This will be placed in `rosetta_testing_dir/cohort_name`.
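The effect of the multipliers can be sketched in a few lines. The coefficient values below are hypothetical, not taken from the default matrix; the point is only that each multiplier yields its own scaled copy of the selected channel's coefficients.

```python
# Hypothetical default Noodle coefficients for three output channels.
default_noodle_row = {"CD11c": 0.08, "CD4": 0.02, "CD45": 0.04}

multipliers = [0.5, 1, 2]

# One scaled copy of the Noodle row per multiplier.
# In this sketch, coefficients for every other source channel are left untouched.
scaled_rows = {
    mult: {chan: coeff * mult for chan, coeff in default_noodle_row.items()}
    for mult in multipliers
}

for mult, row in scaled_rows.items():
    print(mult, row)
```

A multiplier of 1 reproduces the default coefficients, which makes it a useful baseline when comparing the other settings.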

In [None]:
# pick the channel that you will be optimizing the coefficient for
current_channel_name = 'Noodle'

# set multipliers
multipliers = [0.5, 1, 2]

# pick an informative name
folder_name = 'rosetta_test1'

Compensating the example images for 3 multipliers can take 30 minutes or more.

In [None]:
rosetta_mat_path = os.path.join(rosetta_testing_dir, cohort_name, 'commercial_rosetta_matrix.csv')

# create sub-folder to hold images and files from this set of multipliers
folder_path = os.path.join(rosetta_testing_dir, cohort_name, folder_name)
if os.path.exists(folder_path):
    raise ValueError('This folder {} already exists, please ' 
                     'pick a new name for each set of parameters'.format(folder_name))
else:
    os.makedirs(folder_path)

# compensate the example fov images
rosetta.generate_rosetta_test_imgs(rosetta_mat_path, img_out_dir, multipliers, folder_path, 
                                   panel, current_channel_name, output_channel_names=None)

Now that we've generated the compensated data for the given multipliers, we'll generate stitched images to make comparing the different coefficients easier.

In [None]:
# stitch images together to enable easy visualization of outputs
stitched_dir = os.path.join(folder_path, 'stitched_images')
os.makedirs(stitched_dir)

rosetta_dirs = [img_out_dir]
for mult in multipliers:
    rosetta_dirs.append(os.path.join(folder_path, f'compensated_data_{mult}'))

img_size = get_max_img_size(img_out_dir)
scale = 0.5
rosetta.create_tiled_comparison(input_dir_list=rosetta_dirs, output_dir=stitched_dir, max_img_size=img_size, 
                                channels=None, img_size_scale=scale)

# add the source channel as first row to make evaluation easier
output_dir = os.path.join(rosetta_testing_dir, cohort_name, folder_name + '-stitched_with_' + current_channel_name)
os.makedirs(output_dir)
rosetta.add_source_channel_to_tiled_image(raw_img_dir=img_out_dir, tiled_img_dir=stitched_dir,
                                          output_dir=output_dir, source_channel=current_channel_name,
                                          max_img_size=img_size, img_size_scale=scale, img_sub_folder="rescaled")

# remove the intermediate compensated_data_{mult} and stitched_images dirs
rosetta.clean_rosetta_test_dir(folder_path)

### Evaluating the Images

There will now exist a folder named `{folder_name}-stitched_with_Noodle` (based on the folder name you provided above for this test) in your cohort testing directory. You can look through these stitched images to visualize what signal is being removed from the Noodle channel.

These files will contain 5 rows of images: 
- row 1: the Noodle signal
- row 2: the raw extracted image
- rows 3-5: images after applying the rosetta matrix with coefficients adjusted for the multipliers (i.e. [0.5, 1, 2])

<center>
    <img src="./img/CD4_stitched_with_Noodle.jpg" style="width:50%"> 
</center>

**If the images in either row 3, 4, or 5 are satisfactory, then you can save your compensation matrix to complete step 2.**

Within the `folder_name` directory, you will find matrix files updated with the provided multipliers. If you're happy with one multiplier (e.g. 0.5), **find the corresponding matrix** `commercial_rosetta_matrix_mult_0.5.csv` **and rename it**. You can then run the cell below with your updated `final_matrix_name` and move on to the final section.
    
**A copy of your final rosetta matrix will be saved to `D:\\Rosetta_processing\\rosetta_matrices`.**
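If you prefer to do the renaming step described above from Python rather than the file browser, a minimal sketch follows. It works in a temporary directory with a dummy file so it is safe to run as-is; with real data, `folder_path` would be your `rosetta_testing_dir/cohort_name/folder_name` directory, and `chosen_multiplier` is a hypothetical variable holding the multiplier you selected.

```python
import os
import tempfile

# Stand-in for rosetta_testing_dir/cohort_name/folder_name, created in a
# temporary location so this sketch does not touch real data.
folder_path = tempfile.mkdtemp()
chosen_multiplier = 0.5

# File name pattern taken from the example in the text above.
chosen_matrix = os.path.join(
    folder_path, f"commercial_rosetta_matrix_mult_{chosen_multiplier}.csv"
)
with open(chosen_matrix, "w") as f:
    f.write("placeholder matrix contents\n")  # dummy file for illustration only

final_matrix_name = "cohort_name_rosetta_matrix.csv"
final_matrix = os.path.join(folder_path, final_matrix_name)

# Rename the chosen matrix in place.
os.rename(chosen_matrix, final_matrix)
print(sorted(os.listdir(folder_path)))
```

Using `shutil.copyfile` instead of `os.rename` would keep the per-multiplier file alongside the renamed one, which can be convenient if you plan to compare matrices later.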


In [None]:
# rename your rosetta matrix and put the path to final file here
final_matrix_name = 'cohort_name_rosetta_matrix.csv'

rosetta_path = os.path.join(rosetta_testing_dir, cohort_name, folder_name, final_matrix_name)

# copy final rosetta matrix to matrix folder
rosetta_matrix_dir = 'D:\\Rosetta_processing\\rosetta_matrices'
_ = shutil.copyfile(rosetta_path, os.path.join(rosetta_matrix_dir, final_matrix_name))

**Once you've finalized your matrix, please let us know [here](https://github.com/angelolab/toffy/issues/55).** This will help us fine-tune the matrix and improve rosetta for future users.

### (Optional) Optimize Your Compensation Matrix
If you would like to further adjust the amount of noise being removed, you can **re-run the code cells in Section 2** to optimize the compensation matrix for your data, trying new multiplier values until you find one that gives you the desired images. When re-running the code, you will need to update the `multipliers` variable and provide a new `folder_name`. The previously generated stitched images can help you determine whether the new multipliers need to be higher or lower.

**After re-running rosetta, examine your new set of stitched images and check if you are happy with the images produced. Be sure to rename and save your chosen compensation matrix before proceeding to step 3.** 

## 3. Rosetta - Compensate Your Runs

**Once you're satisfied that Rosetta is working appropriately, you can use it to process your runs.** First select the runs you want to process, and define the relevant top-level folders.
- `runs` is a list of all the runs you would like to process; by default, it uses the run list provided in Step 1 for testing
- `final_matrix_name` is the name of the matrix you wish to use for compensation, stored in **`D:\\Rosetta_processing\\rosetta_matrices`**

In [None]:
# by default uses the run list provided in Step 1 for testing, or provide your own list
runs = run_names    # runs = []

# provide the matrix file name
final_matrix_name = 'cohort_name_rosetta_matrix.csv'

Everything necessary for and subsequently outputted from this section of the notebook is stored in the automatic directories established in `1_set_up_toffy.ipynb`. More information on the uses and locations of the directories in toffy can be found in the [README](https://github.com/angelolab/toffy#directory-structure).

In [None]:
# this folder will hold the post-rosetta images
rosetta_image_dir = 'D:\\Rosetta_Compensated_Images'

rosetta_matrix_dir = 'D:\\Rosetta_processing\\rosetta_matrices'
final_rosetta_path = os.path.join(rosetta_matrix_dir, final_matrix_name)

extracted_imgs_dir = 'D:\\Extracted_Images'

# if you would like to process all of the run folders in the image dir instead of just the runs tested, you can use the below line
# runs = list_folders(extracted_imgs_dir)

Now, you can compensate the data using rosetta. Depending on how many runs and FOVs you will be processing, this can take a while. Feel free to leave the notebook running overnight (do not close Jupyter Lab or the terminal window), and make sure you have enough storage space for the new images produced.
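One way to check the available storage before starting a long run is sketched below using the standard library. The helper `has_enough_space` and the size estimate are hypothetical; the actual space needed depends on your panel, image dimensions, and number of FOVs.

```python
import shutil


def has_enough_space(target_dir, required_bytes):
    """Return True if the drive holding `target_dir` has at least `required_bytes` free."""
    free_bytes = shutil.disk_usage(target_dir).free
    return free_bytes >= required_bytes


# Hypothetical estimate: 500 FOVs at roughly 200 MB of compensated images each.
estimated_bytes = 500 * 200 * 1024**2

# "." stands in for rosetta_image_dir so the sketch runs anywhere.
print(has_enough_space(".", estimated_bytes))
```

With real data, passing `rosetta_image_dir` (or any existing folder on the same drive) as `target_dir` reports on the drive that will receive the compensated images.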

In [None]:
# perform rosetta on the provided runs
for run in runs:
    print("processing run {}".format(run))
    # create the output folder for this run if it does not already exist
    os.makedirs(os.path.join(rosetta_image_dir, run), exist_ok=True)
    rosetta.compensate_image_data(raw_data_dir=os.path.join(extracted_imgs_dir, run), 
                                  comp_data_dir=os.path.join(rosetta_image_dir, run), 
                                  comp_mat_path=final_rosetta_path, panel_info=panel, batch_size=1)

<b>NOTE: If you wish to run a second round of Rosetta to further denoise specific channels, please head to the [Rosetta Round 2 notebook](https://github.com/angelolab/toffy/blob/main/templates/4a_compensate_image_data_v2.ipynb).</b>