## This notebook is an example: create a copy before running it or you will get merge conflicts!

### The purpose of this notebook is to organize your processed image data so that it is ready to be analyzed. This entails three steps:
### 1. Renaming each image folder to its user-supplied name, rather than fov-x-scan-y
### 2. Combining directories that represent the same sample/TMA/run but were split apart by restarts or crashes
### 3. Creating a single cohort directory of images

In [7]:
# Imports
import sys
sys.path.append('../')

import os
import shutil

from ark.utils.io_utils import list_folders

from toffy import reorg

In [8]:
# Define base file paths
bin_file_dir = 'I:\\run_files'
processed_image_dir = 'I:\\normalized'
cohort_image_dir = 'I:\\'

## 1. Renaming each image folder to have user-supplied names
### The first step is to pick a name for your cohort. A folder with this name will hold all of the formatted, ready-to-analyze TIFs

In [9]:
cohort_name = 'TONIC_Cohort'
cohort_path = os.path.join(cohort_image_dir, cohort_name)
# Create the cohort folder if it does not already exist
os.makedirs(cohort_path, exist_ok=True)

### The next step is to identify all of the runs that belong to your cohort. If all of the runs in the **processed_image_dir** folder are part of your cohort, you can use the list_folders function below to list them all. Otherwise, you'll need to manually specify which runs are yours.

In [12]:
# Either list the runs that belong to your cohort here
# run_names = ['20220101_TMA1', '20220102_TMA2']

# Or get all of the runs from the processed image folder
run_names = list_folders(processed_image_dir)

# Preview the last two runs
run_names[-2:]

['2022-03-01_TONIC_TMA14_run1a', '2022-03-01_TONIC_TMA14_run1b']

### Now we'll rename all of the FOVs within each of your runs so that they have the original name you gave them on the MIBI. For example, fov-1-scan-1 might be renamed patient_1_region_1, etc. 
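
To make the renaming step concrete, here is a minimal sketch that uses only the standard library. It is not toffy's implementation: `rename_fovs_sketch` and the `name_map` dictionary are hypothetical stand-ins for the lookup that `reorg.rename_fov_dirs` builds from the run JSON file. The sketch copies each default-named folder to its user-supplied name and reports any folder that has no entry in the mapping.

```python
import os
import shutil
import tempfile


def rename_fovs_sketch(input_dir, output_dir, name_map):
    """Copy each FOV folder in input_dir to output_dir under its user-supplied name.

    name_map is a hypothetical dict such as {'fov-1-scan-1': 'patient_1_region_1'}.
    Folders without an entry are skipped and returned so they can be inspected.
    """
    os.makedirs(output_dir, exist_ok=True)
    skipped = []
    for fov in sorted(os.listdir(input_dir)):
        src = os.path.join(input_dir, fov)
        if not os.path.isdir(src):
            continue  # ignore stray files
        if fov not in name_map:
            skipped.append(fov)
            continue
        shutil.copytree(src, os.path.join(output_dir, name_map[fov]))
    return skipped


# Demo on a throwaway directory tree (no real data is touched)
with tempfile.TemporaryDirectory() as tmp:
    run_in = os.path.join(tmp, 'normalized', 'run1')
    for fov in ['fov-1-scan-1', 'fov-2-scan-1']:
        os.makedirs(os.path.join(run_in, fov))
    run_out = os.path.join(tmp, 'cohort', 'run1')

    skipped = rename_fovs_sketch(run_in, run_out, {'fov-1-scan-1': 'patient_1_region_1'})
    renamed = sorted(os.listdir(run_out))
    print(renamed, skipped)
```

Copying rather than moving leaves the processed images untouched, so the step can be repeated if a name turns out to be wrong.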

In [13]:
# Loop over the runs and rename their FOVs (this example only processes the last two runs)
for run in run_names[-2:]:
    print("Renaming FOVs in {}".format(run))
    input_dir = os.path.join(processed_image_dir, run)
    output_dir = os.path.join(cohort_path, run)
    json_path = os.path.join(bin_file_dir, run, run + '.json')
    reorg.rename_fov_dirs(json_run_path=json_path, default_run_dir=input_dir, output_run_dir=output_dir)

Renaming FOVs in 2022-03-01_TONIC_TMA14_run1a


 Displaying 10 of 20 invalid value(s) for list fovs in run file
1            fov-16-scan-1
2            fov-17-scan-1
3            fov-18-scan-1
4            fov-19-scan-1
5            fov-20-scan-1
6            fov-21-scan-1
7            fov-22-scan-1
8            fov-23-scan-1
9            fov-24-scan-1
10           fov-25-scan-1



Renaming FOVs in 2022-03-01_TONIC_TMA14_run1b


 Displaying 10 of 15 invalid value(s) for list fovs in run file
1            fov-1-scan-1
2            fov-2-scan-1
3            fov-3-scan-1
4            fov-4-scan-1
5            fov-5-scan-1
6            fov-6-scan-1
7            fov-7-scan-1
8            fov-8-scan-1
9            fov-9-scan-1
10           fov-10-scan-1



## 2. Combining runs together
### If you have multiple runs that you would like to combine, such as 20220101_TMA1_part1 and 20220102_TMA1_part2, the cells below will automate that process.
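
The following standard-library sketch illustrates what merging partial runs involves. It is an illustration, not toffy's code: `merge_runs_sketch` is a hypothetical helper, and it assumes that runs are selected by substring match and that the merged folder is named after the search string. The demo also shows why the string needs care: a short string such as `TMA1` would also match a run named `TMA14`.

```python
import os
import shutil
import tempfile


def merge_runs_sketch(cohort_dir, run_string):
    """Move the contents of every run folder whose name contains run_string
    into a single folder named run_string, then delete the emptied folders."""
    partial_runs = sorted(
        d for d in os.listdir(cohort_dir)
        if run_string in d and d != run_string
        and os.path.isdir(os.path.join(cohort_dir, d))
    )
    merged_dir = os.path.join(cohort_dir, run_string)
    os.makedirs(merged_dir, exist_ok=True)
    for run in partial_runs:
        run_dir = os.path.join(cohort_dir, run)
        for fov in os.listdir(run_dir):
            dest = os.path.join(merged_dir, fov)
            if os.path.exists(dest):
                # Refuse to overwrite a FOV that appears in two partial runs
                raise ValueError(f"{fov} exists in more than one partial run")
            shutil.move(os.path.join(run_dir, fov), dest)
        os.rmdir(run_dir)
    return partial_runs


# Demo on a throwaway cohort folder
with tempfile.TemporaryDirectory() as cohort:
    layout = {
        '20220101_TMA1_part1': ['fov_a'],
        '20220102_TMA1_part2': ['fov_b'],
        '20220103_TMA14': ['fov_c'],
    }
    for run, fovs in layout.items():
        for fov in fovs:
            os.makedirs(os.path.join(cohort, run, fov))

    # A string that is too short selects the unrelated TMA14 run as well
    substring_hits = sorted(d for d in os.listdir(cohort) if 'TMA1' in d)

    # A more specific string selects only the two partial runs
    merged_from = merge_runs_sketch(cohort, 'TMA1_part')
    remaining = sorted(os.listdir(cohort))
    merged_fovs = sorted(os.listdir(os.path.join(cohort, 'TMA1_part')))
    print(substring_hits, merged_from, remaining, merged_fovs)
```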

In [73]:
# First, pick a string that is present in all of the runs you want combined together. Check the output of this cell to make 
# sure you are only combining together the right folders
run_string = 'TONIC_TMA1'
folders = list_folders(cohort_path, run_string)
print("You selected the following subfolders: make sure all of these should be combined together {}".format(folders))

You selected the following subfolders: make sure all of these should be combined together []


In [71]:
# Once you've verified that the correct runs are being combined together, you can run this cell. 
reorg.merge_partial_runs(cohort_dir=cohort_path, run_string=run_string)

# These two cells can be re-run multiple times to combine different runs together

## 3. Creating a single cohort directory
### Once all of the FOVs within each folder have been renamed and all of the partial runs have been combined, you can get rid of the run structure and create a single cohort directory of FOVs. The function below will combine all of the FOVs within each of your distinct runs into a single directory, with the run name prepended to each FOV name. For example, if you have a structure like this:

*  20220101_run_1
    *  tonsil_1
    *  tonsil_2
*  20220102_run_2
    *  lymph_1
    *  spleen_2

### It will get merged into something that looks like this:
* image_data
    *  20220101_run_1_tonsil_1
    *  20220101_run_1_tonsil_2
    *  20220102_run_2_lymph_1
    *  20220102_run_2_spleen_2

### This is not required; if you plan on processing each run separately, such as for tiled images, you can skip this step. However, if you will be doing all of your analysis at the individual FOV level, this will simplify the downstream steps.
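
The flattening described above can be sketched with the standard library as follows. This is an assumption-laden illustration rather than toffy's implementation: `combine_runs_sketch` is a hypothetical helper, and the `image_data` output folder name is taken from the example layout above.

```python
import os
import shutil
import tempfile


def combine_runs_sketch(cohort_dir, out_name='image_data'):
    """Move every FOV folder into cohort_dir/out_name, prefixing it with its run name."""
    out_dir = os.path.join(cohort_dir, out_name)
    runs = sorted(
        d for d in os.listdir(cohort_dir)
        if d != out_name and os.path.isdir(os.path.join(cohort_dir, d))
    )
    os.makedirs(out_dir, exist_ok=True)
    for run in runs:
        run_dir = os.path.join(cohort_dir, run)
        for fov in sorted(os.listdir(run_dir)):
            shutil.move(os.path.join(run_dir, fov),
                        os.path.join(out_dir, f"{run}_{fov}"))
        os.rmdir(run_dir)  # the run folder is empty once its FOVs are moved
    return sorted(os.listdir(out_dir))


# Demo that reproduces the example layout in a throwaway directory
with tempfile.TemporaryDirectory() as cohort:
    layout = {
        '20220101_run_1': ['tonsil_1', 'tonsil_2'],
        '20220102_run_2': ['lymph_1', 'spleen_2'],
    }
    for run, fovs in layout.items():
        for fov in fovs:
            os.makedirs(os.path.join(cohort, run, fov))

    combined = combine_runs_sketch(cohort)
    top_level = os.listdir(cohort)
    print(combined)
```

Prefixing each FOV with its run name keeps names unique even when two runs contain FOVs with the same user-supplied name.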

In [74]:
reorg.combine_runs(cohort_dir=cohort_path)