### This notebook will guide you through the process of optimizing a compensation matrix for your data. Before starting, we recommend picking ~10 representative FOVs from your cohort that demonstrate the full range of cell types and marker expression you expect to see.

## This notebook is an example: create a copy before running it or you will get merge conflicts!

In [1]:
import sys
sys.path.append('../')

import os
import shutil

import skimage.io as io
import pandas as pd
from mibi_bin_tools import bin_files
from toffy import rosetta

from ark.utils.io_utils import list_folders, list_files

### First, make a folder to hold all of the iterations of parameter testing, then put the full path to that folder below

In [None]:
base_dir = 'path/to/base/dir'

### Next, copy over the .bin files for the ~10 FOVs you will use for testing. In addition to the .bin files, make sure to copy over the .json files with the same names. Place them all in a folder named *bin_files* inside *base_dir*.

#### For example, fov-1-scan-1.bin, fov-1-scan-1.json, fov-23-scan-1.bin, fov-23-scan-1.json, etc
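### Before extracting, it can help to confirm that every .bin file has a matching .json file. The sketch below is one way to check this using only the standard library. `find_unpaired_bins` is a hypothetical helper (not part of toffy), and it assumes files are paired purely by name. The demo runs on a temporary folder; in practice you would point it at your own folder of bin files.

```python
import os
import tempfile


def find_unpaired_bins(bin_dir):
    """Return the sorted names of .bin files in bin_dir that lack a same-named .json file."""
    names = os.listdir(bin_dir)
    json_stems = {os.path.splitext(n)[0] for n in names if n.endswith('.json')}
    bin_stems = [os.path.splitext(n)[0] for n in names if n.endswith('.bin')]
    return sorted(stem + '.bin' for stem in bin_stems if stem not in json_stems)


# demonstrate on a throwaway folder: fov-23 is deliberately missing its .json
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ['fov-1-scan-1.bin', 'fov-1-scan-1.json', 'fov-23-scan-1.bin']:
        open(os.path.join(demo_dir, name), 'w').close()
    missing = find_unpaired_bins(demo_dir)

print(missing)  # ['fov-23-scan-1.bin']
```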

In [None]:
# this folder should contain the bins and JSONs for the ~10 fovs
test_bin_dir = os.path.join(base_dir, 'bin_files')

### Next, copy the *commercial_rosetta_matrix.csv* and *example_panel_file.csv* files from the *files* directory of toffy into *base_dir*. Make sure to update the Target column of *example_panel_file.csv* with the details of your panel. For targets you aren't using, leave the rows as they are; don't delete them. Once you've updated the panel file, put its name below.

In [None]:
panel_file_name = 'example_panel_file.csv'

### We'll then use this panel file to extract the images from the bin files


In [None]:
# specify folder to hold extracted files
img_out_dir = os.path.join(base_dir, 'extracted_images')

# Read in updated panel file
panel = pd.read_csv(os.path.join(base_dir, panel_file_name))

# extract the bin files
bin_files.extract_bin_files(test_bin_dir, img_out_dir, panel=panel, intensities=['Au', 'chan_39'])

# replace count images with intensity images
rosetta.replace_with_intensity_image(run_dir=img_out_dir, channel='Au')
rosetta.replace_with_intensity_image(run_dir=img_out_dir, channel='chan_39')

# clean up dirs
rosetta.remove_sub_dirs(run_dir=img_out_dir, sub_dirs=['intensities', 'intensity_times_width'])

# normalize images to allow direct comparison with rosetta
fovs = list_folders(img_out_dir)
for fov in fovs:
    fov_dir = os.path.join(img_out_dir, fov)
    sub_dir = os.path.join(fov_dir, 'normalized')
    os.makedirs(sub_dir)
    chans = list_files(fov_dir)
    for chan in chans:
        img = io.imread(os.path.join(fov_dir, chan))
        img = img / 100
        io.imsave(os.path.join(sub_dir, chan), img, check_contrast=False)

### Now that we've generated the image data, we can test out different values for the compensation matrix. We'll be testing out coefficients in proportion to their value in the default matrix. For example, specifying multipliers of 0.5, 1, and 2 would test coefficients that are half the size, the same size, and twice the size of the coefficients in the default matrix, respectively. 
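### As a minimal illustration of how multipliers map to tested coefficients, the sketch below scales a single default coefficient. The value 0.02 and the helper `scale_coefficient` are hypothetical and only show the arithmetic; the actual matrices are generated by toffy in the cell further down.

```python
def scale_coefficient(default_value, multipliers):
    """Map each multiplier to the coefficient it would test (default_value * multiplier)."""
    return {m: default_value * m for m in multipliers}


# hypothetical default coefficient, for illustration only
default_coefficient = 0.02

candidates = scale_coefficient(default_coefficient, [0.5, 1, 2])
for multiplier, coefficient in candidates.items():
    print('multiplier {} -> coefficient {:.3f}'.format(multiplier, coefficient))
```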

### The cell below can be run multiple times to home in on the specific coefficient that works best. In general, it is best to optimize one channel's coefficient at a time. The channels that most often need to be optimized are Au and Noodle. However, you can optimize the coefficient for any channel that causes problems.

In [None]:
# Pick the channel that you will be optimizing the coefficient for
current_channel_name = 'Au'
current_channel_mass = rosetta.get_masses_from_channel_names([current_channel_name], panel)

# set multipliers
multipliers = [0.25, 1, 4]

# If you only want to look at the output for a subset of the channels once you've picked good coefficients for the rest, update this variable for faster processing.
# Otherwise, all channels will be compensated and saved
output_channel_names = None # e.g. output_channel_names = ['Au', 'CD45', 'PanCK']

# pick an informative name
folder_name = 'give_a_name_for_this_folder'

# everything from here and below will run automatically
if output_channel_names is not None:
    output_masses = rosetta.get_masses_from_channel_names(output_channel_names, panel)
else:
    output_masses = None

# create sub-folder to hold images and files from this set of multipliers
folder_path = os.path.join(base_dir, folder_name)
if os.path.exists(folder_path):
    raise ValueError('This folder {} already exists, please '
                     'pick a new name for each set of parameters'.format(folder_name))
else:
    os.makedirs(folder_path)

# generate rosetta matrices for each multiplier
rosetta.create_rosetta_matrices(default_matrix=os.path.join(base_dir, 'commercial_rosetta_matrix.csv'),
                               multipliers=multipliers, masses=current_channel_mass,
                               save_dir=folder_path)

# loop over each multiplier and compensate the data
rosetta_dirs = [img_out_dir]
for multiplier in multipliers:
    rosetta_mat_path = os.path.join(folder_path, 'commercial_rosetta_matrix_mult_{}.csv'.format(multiplier))
    rosetta_out_dir = os.path.join(folder_path, 'compensated_data_{}'.format(multiplier))
    rosetta_dirs.append(rosetta_out_dir)
    os.makedirs(rosetta_out_dir)
    rosetta.compensate_image_data(raw_data_dir=img_out_dir, comp_data_dir=rosetta_out_dir,
                                  comp_mat_path=rosetta_mat_path, raw_data_sub_folder='normalized',
                                  panel_info=panel, batch_size=1, norm_const=1,
                                  output_masses=output_masses)

### Now that we've generated the compensated data for the given multipliers, we'll generate stitched images to make comparing the different multipliers easier

In [None]:
# stitch images together to enable easy visualization of outputs
stitched_dir = os.path.join(folder_path, 'stitched_images')
os.makedirs(stitched_dir)

rosetta.create_tiled_comparison(input_dir_list=rosetta_dirs, output_dir=stitched_dir, channels=output_channel_names)

# add the source channel as first row to make evaluation easier
output_dir = os.path.join(folder_path, 'stitched_with_' + current_channel_name)
os.makedirs(output_dir)
rosetta.add_source_channel_to_tiled_image(raw_img_dir=img_out_dir, tiled_img_dir=stitched_dir,
                                             output_dir=output_dir, source_channel=current_channel_name)

### There will now be a folder named *stitched_with_* followed by the channel name (for example, *stitched_with_Au*) within the sub-folder you created. You can look through these stitched images to determine whether the multiplier needs to be higher, lower, or the same.

### For each channel, pick the multiplier that worked best. Then, open the *commercial_rosetta_matrix.csv* file that you copied over and update the corresponding coefficient to be `previous_value * multiplier`. If you're happy with the new coefficients, you can take your modified matrix and move on to the next step. If not, you can rerun the two cells above, starting from the updated coefficients, to further home in on the best value. Once you've finalized your coefficients, please let us know [here](https://github.com/angelolab/toffy/issues/55).
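### If you would rather make this update in code than by hand, the sketch below shows the arithmetic on a toy matrix using pandas. The layout (masses as row and column labels), the values, and the chosen cell are all hypothetical; check how your own copy of the matrix is organized before adapting it.

```python
import io

import pandas as pd

# toy stand-in for a compensation matrix; labels and values are hypothetical
toy_csv = """mass,113,115,197
113,0.0,0.01,0.0
115,0.0,0.0,0.0
197,0.02,0.04,0.0
"""
matrix = pd.read_csv(io.StringIO(toy_csv), index_col=0)

# the cell to update and the multiplier that looked best in the stitched images
row_mass, col_mass = 197, '113'  # column labels are read in as strings
best_multiplier = 0.5

# new coefficient = previous_value * multiplier
previous_value = matrix.loc[row_mass, col_mass]
matrix.loc[row_mass, col_mass] = previous_value * best_multiplier

print(matrix)
```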

In [2]:
# rename your rosetta matrix and put the path to the final file here
final_rosetta_path = 'I:\\20220518_TONIC_rosetta_matrix.csv'
panel = pd.read_csv('I:\\20220518_TONIC_panel_file.csv')

### Next, you'll need to extract all of your images

In [3]:
# specify the path to folder containing your runs, as well as the folder where the extracted images will get saved
bin_file_dir = 'I:\\run_files'
extracted_image_dir = 'I:\\extracted'

In [4]:
# If you only want to extract a subset of your runs, specify their names here; otherwise, leave as None
runs = None
if runs is None:
    runs = list_folders(bin_file_dir)

In [5]:
# specify path to save rosetta images
rosetta_image_dir = 'I:\\rosetta'

In [None]:
# Perform rosetta on extracted images
for run in runs:
    print("processing run {}".format(run))
    raw_img_dir = os.path.join(extracted_image_dir, run)
    out_dir = os.path.join(rosetta_image_dir, run)
    os.makedirs(out_dir, exist_ok=True)
    rosetta.compensate_image_data(raw_data_dir=raw_img_dir, comp_data_dir=out_dir, 
                                 comp_mat_path=final_rosetta_path, panel_info=panel, batch_size=1)