### This notebook will guide you through the process of optimizing a compensation matrix for your data. Before starting, pick ~10 representative FOVs from your cohort that together show the full spectrum of cell types and marker expressions you expect to see.

In [None]:
import sys
sys.path.append('../')

import os

import pandas as pd
from mibi_bin_tools import bin_files
from toffy import rosetta

### First, make a folder to hold all of the iterations of parameter testing, then put the full path to that folder below

In [4]:
base_dir = 'path_to_folder'

### Next, copy over the .bin files for the ~10 FOVs you will use for testing. In addition to the .bin files, make sure to copy over the .JSON files with the same name into this folder.

#### For example, fov-1-scan-1.bin, fov-1-scan-1.json, fov-23-scan-1.bin, fov-23-scan-1.json, etc

In [None]:
# this folder should contain the bins and JSONs for the ~10 fovs
bin_file_dir = os.path.join(base_dir, 'example_bins')
# exist_ok=True lets this cell be rerun without raising an error
os.makedirs(bin_file_dir, exist_ok=True)
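
Before moving on, it can help to confirm that every .bin file has a .json file with the same name. The following is a minimal sketch using only the standard library; `find_unpaired_bins` is a hypothetical helper written for this illustration (it is not part of toffy), and it is demonstrated on a throwaway directory rather than on real data.

```python
import os
import tempfile


def find_unpaired_bins(bin_dir):
    """Return the names of .bin files that lack a same-named .json file."""
    files = set(os.listdir(bin_dir))
    bins = sorted(f for f in files if f.endswith('.bin'))
    return [b for b in bins if os.path.splitext(b)[0] + '.json' not in files]


# demonstrate on a temporary directory with one complete pair and one orphan .bin
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ['fov-1-scan-1.bin', 'fov-1-scan-1.json', 'fov-23-scan-1.bin']:
        open(os.path.join(tmp_dir, name), 'w').close()
    missing = find_unpaired_bins(tmp_dir)

print(missing)
```

On your own data, you would call the helper on the folder that holds your .bin files and copy over any .json files it reports as missing.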

### Next, copy the *commercial_rosetta_matrix.csv* and the *example_panel_file.csv* files from the *files* directory of toffy into *base_dir*. Make sure to update the Target column of *example_panel_file.csv* with the details of your panel. For targets you aren't using, leave the rows as they are; don't delete them. Once you've updated the panel file, put the new name below.

In [None]:
panel_file_name = 'your_file_name_here.csv'
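
Since the extraction step expects the panel columns 'Mass', 'Target', 'Start', and 'Stop', a quick check of the edited panel file can catch a renamed or missing column early. This is only a sketch: `missing_panel_columns` is a hypothetical helper, and the toy CSV below (including its numeric values) is made up for illustration.

```python
import io

import pandas as pd

# column names taken from the comment in the extraction cell
REQUIRED_COLUMNS = ['Mass', 'Target', 'Start', 'Stop']


def missing_panel_columns(panel_df):
    """Return the required column names that are absent from the panel."""
    return [col for col in REQUIRED_COLUMNS if col not in panel_df.columns]


# toy panel with a deliberately lowercased 'stop' header (values are invented)
toy_csv = io.StringIO("Mass,Target,Start,stop\n197,Au,196.7,197.0\n")
toy_panel = pd.read_csv(toy_csv)

missing_cols = missing_panel_columns(toy_panel)
print(missing_cols)
```

The check is case-sensitive, so a header such as 'stop' is reported as a missing 'Stop' column.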

### We'll then use this panel file to extract the images from the bin files


In [None]:
# create a new folder to hold extracted files
img_out_dir = os.path.join(base_dir, 'example_images')
if not os.path.exists(img_out_dir):
    os.makedirs(img_out_dir)

# Column names should be capitalized: 'Mass', 'Target', 'Start', 'Stop'
panel = pd.read_csv(os.path.join(base_dir, panel_file_name))

# extract the bin files
bin_files.extract_bin_files(bin_file_dir, img_out_dir, panel=panel, intensities=['Au'])

# replace gold count image with gold intensity image
rosetta.replace_with_intensity_image(base_dir=base_dir, channel='Au', folders=['example_images'])

### Now that we've generated the image data, we can test out different values for the compensation matrix. We'll be testing out coefficients in proportion to their value in the default matrix. For example, specifying multipliers of 0.5, 1, and 2 would test coefficients that are half the size, the same size, and twice the size of the coefficients in the default matrix, respectively. 

### The cell below can be run multiple times to home in on the specific coefficient that works best.
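
To make the idea of a multiplier concrete, here is a small sketch of proportional scaling on a toy matrix. It is not toffy's implementation: the matrix layout (source channels as rows), the column names `chan_A` and `chan_B`, and every coefficient value are assumptions made up for this illustration, and `scale_rows` is a hypothetical helper.

```python
import pandas as pd

# toy matrix: rows are source channels, columns are the channels receiving signal
# (layout and coefficient values are invented for illustration)
default = pd.DataFrame(
    [[0.0, 0.02], [0.04, 0.0]],
    index=['Au', 'Noodle'],
    columns=['chan_A', 'chan_B'],
)


def scale_rows(matrix, multiplier, channels):
    """Return a copy of matrix with the rows for the given channels scaled."""
    scaled = matrix.copy()
    scaled.loc[channels] = scaled.loc[channels] * multiplier
    return scaled


# one scaled matrix per multiplier; the default matrix itself is left unchanged
scaled_by_multiplier = {
    m: scale_rows(default, m, ['Au', 'Noodle']) for m in [0.5, 1, 2]
}

for m, mat in scaled_by_multiplier.items():
    print(m)
    print(mat)
```

A multiplier of 1 reproduces the default coefficients, so it serves as the baseline when comparing the other values.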

In [None]:
# set multipliers
multipliers = [0.5, 1, 2]

# create sub-folder to hold images and files from this set of multipliers
folder_name = 'give_a_name_for_this_folder'
folder_path = os.path.join(base_dir, folder_name)
if os.path.exists(folder_path):
    raise ValueError('This folder already exists, please pick a new name for each set of parameters')
# create the folder so the matrices below have somewhere to be saved
os.makedirs(folder_path)

# generate rosetta matrices for each multiplier
rosetta.create_rosetta_matrices(default_matrix=os.path.join(base_dir, 'commercial_rosetta_matrix.csv'),
                               multipliers=multipliers, channels=['Au', 'Noodle'],
                               save_dir=folder_path)

# loop over each multiplier and compensate the data
rosetta_dirs = []
for multiplier in multipliers:
    rosetta_mat_path = os.path.join(folder_path, 'rosetta_{}.csv'.format(multiplier))
    rosetta_out_dir = os.path.join(folder_path, 'compensated_data_{}'.format(multiplier))
    rosetta_dirs.append(rosetta_out_dir)
    os.makedirs(rosetta_out_dir)
    rosetta.compensate_image_data(raw_data_dir=img_out_dir, comp_data_dir=rosetta_out_dir, 
                                 comp_mat_path=rosetta_mat_path,
                                 panel_info_path=os.path.join(base_dir, panel_file_name))

### Now that we've generated the compensated data for the given multipliers, we'll generate stitched images to make comparing the different multipliers easier. In general, we find that modifications only need to be made to the Noodle channel and the gold (Au) channel.

In [None]:
# stitch images together to enable easy visualization of outputs
stitched_dir = os.path.join(folder_path, 'stitched_images')
os.makedirs(stitched_dir)
rosetta.create_tiled_comparison(rosetta_dirs, stitched_dir)

# add the noodle channel and gold channel as first row to make evaluation easier
for channel in ['Au', 'Noodle']:
    output_dir = os.path.join(folder_path, 'stitched_with_' + channel)
    os.makedirs(output_dir)
    rosetta.add_source_channel_to_tiled_image(raw_img_dir=img_out_dir, tiled_img_dir=stitched_dir,
                                             output_dir=output_dir, source_channel=channel)

### There will now be two folders, *stitched_with_Au* and *stitched_with_Noodle*, within the sub-folder you created. You can look through the stitched images in each folder to determine whether the multiplier for that channel needs to be higher, lower, or the same.

### For each channel, pick the value that worked the best. If you're happy with your multipliers, you can take that rosetta matrix and move on to the next step. If not, you can rerun the two cells above with the updated multipliers you selected to further narrow in on the best value.
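
One simple way to choose the next set of multipliers is to space new values evenly between the two that bracket the best result so far. The sketch below assumes you judged 0.5 too weak and 1 too strong; `refine_multipliers` is a hypothetical helper, not a toffy function.

```python
def refine_multipliers(low, high, n_points=3):
    """Return n_points evenly spaced multipliers from low to high, inclusive."""
    if n_points < 2:
        raise ValueError('n_points must be at least 2')
    step = (high - low) / (n_points - 1)
    # round to avoid long floating point tails in file and folder names
    return [round(low + i * step, 4) for i in range(n_points)]


next_multipliers = refine_multipliers(0.5, 1, n_points=3)
print(next_multipliers)
```

The resulting list can be pasted into the `multipliers` variable for the next round, along with a new `folder_name`.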

In [None]:
# TODO: validate that rosetta_matrix is all floats. Change rosetta function to take panel instead of path to panel
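
As a starting point for the float validation mentioned in the TODO, here is a rough sketch using pandas dtype checks. It assumes the matrix CSV stores channel labels in its first column and coefficients everywhere else; that layout, the toy values, and the `non_float_columns` helper are all assumptions for illustration rather than toffy behavior.

```python
import io

import pandas as pd


def non_float_columns(matrix_df):
    """Return the names of columns whose dtype is not a float type."""
    return [
        col for col in matrix_df.columns
        if not pd.api.types.is_float_dtype(matrix_df[col])
    ]


# toy matrix with one bad entry ('oops'), which makes chan_B load as object dtype
toy_csv = io.StringIO(",chan_A,chan_B\nAu,0.0,0.02\nNoodle,0.04,oops\n")
toy_matrix = pd.read_csv(toy_csv, index_col=0)

bad_columns = non_float_columns(toy_matrix)
print(bad_columns)
```

Note that this check is strict: a column containing only whole numbers without a decimal point would load as an integer dtype and be reported as well.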