# Reorganization Notebook

The purpose of this notebook is to organize your image data following processing so that it is ready to be analyzed. 

**This entails three steps:**

**1. Renaming each image folder to have the user-supplied name, rather than fov-x-scan-y.**

**2. Combining directories that represent the same sample/TMA/run, which may have been split apart by restarts or crashes.**

**3. Creating a single cohort directory of images.**

In [None]:
import sys
sys.path.append('../')

import os
import shutil

from toffy import reorg
from tmi.io_utils import list_folders

All inputs to and outputs from this notebook are stored in the automatic directories established in `1_set_up_toffy.ipynb`. More information on the uses and locations of the directories in toffy can be found in the [README](https://github.com/angelolab/toffy#directory-structure).

In [None]:
# define base file paths
bin_base_dir = 'D:\\Data'
processed_base_dir = 'D:\\Normalized_Images'
cohort_image_dir = 'D:\\Cohorts'

## 1. Renaming FOVs 
First, you will need to specify the names of relevant folders.

- `cohort_name`: new name for the folder that will hold all of the formatted, ready-to-analyze TIFs
- `run_names`: list of names of the runs in your cohort

In [None]:
cohort_name = '20220101_new_cohort'

# list the runs here that belong to your cohort
run_names = ['20220101_TMA1', '20220102_TMA2']

# or get all of the runs from the processed image folder
# run_names = list_folders(processed_base_dir)

In [None]:
cohort_path = os.path.join(cohort_image_dir, cohort_name)
# create the cohort directory if it does not already exist
os.makedirs(cohort_path, exist_ok=True)

Now we'll rename all of the FOVs within each of your runs so that they have the original name you gave them on the MIBI. For example, fov-1-scan-1 might be renamed patient_1_region_1, etc. 
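
To make the idea concrete, here is a minimal sketch of folder renaming driven by a name mapping. It is not toffy's implementation: the helper `rename_fov_folders` and the hand-written `name_map` are hypothetical, and the demo runs in a temporary directory so it does not touch your data.

```python
import os
import tempfile


def rename_fov_folders(run_dir, name_map):
    """Rename subfolders of run_dir using name_map (default name -> user-supplied name).

    Hypothetical helper for illustration only; not part of toffy.
    """
    renamed = []
    for old_name, new_name in sorted(name_map.items()):
        old_path = os.path.join(run_dir, old_name)
        new_path = os.path.join(run_dir, new_name)
        if not os.path.isdir(old_path):
            # the FOV is listed in the mapping but has no folder; skip it
            continue
        if os.path.exists(new_path):
            raise FileExistsError("Target folder already exists: {}".format(new_path))
        os.rename(old_path, new_path)
        renamed.append(new_name)
    return renamed


# demo in a throwaway directory
with tempfile.TemporaryDirectory() as run_dir:
    for fov in ("fov-1-scan-1", "fov-2-scan-1"):
        os.mkdir(os.path.join(run_dir, fov))

    # assumed mapping; in practice the names come from your run metadata
    name_map = {
        "fov-1-scan-1": "patient_1_region_1",
        "fov-2-scan-1": "patient_1_region_2",
    }
    rename_fov_folders(run_dir, name_map)
    demo_renamed = sorted(os.listdir(run_dir))

print(demo_renamed)
```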

In [None]:
# rename FOVs in each of the runs in run_names
reorg.rename_fovs_in_cohort(run_names=run_names, processed_base_dir=processed_base_dir, cohort_path=cohort_path,
                            bin_base_dir=bin_base_dir)

## 2. Combining Partial Runs
If you have multiple runs that you would like combined together, such as 20220101_TMA1_part1 and 20220102_TMA1_part2, the cells below will automate that process. If you already have one run per experiment, this section can be skipped.
- `run_string`: a string that is present in all of the runs you want combined together
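
When choosing `run_string`, keep in mind that a short string can match more folders than intended. The sketch below uses only the standard library and assumes plain substring matching; `preview_matching_folders` is a hypothetical helper, not a toffy or tmi function.

```python
import os
import tempfile


def preview_matching_folders(parent_dir, substr):
    """Return the sorted subfolder names of parent_dir that contain substr.

    Hypothetical helper for illustration; assumes simple substring matching.
    """
    return sorted(
        name
        for name in os.listdir(parent_dir)
        if substr in name and os.path.isdir(os.path.join(parent_dir, name))
    )


# demo in a throwaway directory with made-up run names
with tempfile.TemporaryDirectory() as demo_cohort:
    for run in ("20220101_TMA1_part1", "20220102_TMA1_part2",
                "20220103_TMA10", "20220104_TMA2"):
        os.mkdir(os.path.join(demo_cohort, run))

    broad_match = preview_matching_folders(demo_cohort, "TMA1")
    narrow_match = preview_matching_folders(demo_cohort, "TMA1_")

print(broad_match)   # also includes the TMA10 run
print(narrow_match)  # only the two TMA1 parts
```

In this example, `'TMA1'` also matches the `TMA10` run, while the more specific `'TMA1_'` does not.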

In [None]:
# check the output of this cell to make sure you are only combining together the right folders
run_string = 'TMA1'

folders = list_folders(cohort_path, run_string)
print("You selected the following subfolders; make sure all of them should be combined: {}".format(folders))

Once you've verified that the correct runs are being combined together, you can run the next cell. 

**Note:** The function below will raise a warning if there are FOVs listed in the JSON run file that are not present in the folder. There are valid reasons that this may have happened, and it won't impact downstream analyses. **However, this provides an opportunity to double check that these omitted FOVs were intentionally left out.**
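
If you would like to check for missing FOVs yourself, the comparison can be sketched as a set difference between the names you expect and the folders on disk. This is only an illustration: `find_missing_fovs` is a hypothetical helper, and the expected names are typed by hand here because the layout of the JSON run file is not covered in this notebook.

```python
import os
import tempfile


def find_missing_fovs(run_dir, expected_fovs):
    """Return the sorted names in expected_fovs that have no folder in run_dir.

    Hypothetical helper for illustration only; not part of toffy.
    """
    present = {
        name
        for name in os.listdir(run_dir)
        if os.path.isdir(os.path.join(run_dir, name))
    }
    return sorted(set(expected_fovs) - present)


# demo in a throwaway directory where one expected FOV was never acquired
with tempfile.TemporaryDirectory() as run_dir:
    for fov in ("patient_1_region_1", "patient_1_region_2"):
        os.mkdir(os.path.join(run_dir, fov))

    expected = ["patient_1_region_1", "patient_1_region_2", "patient_1_region_3"]
    missing_fovs = find_missing_fovs(run_dir, expected)

print(missing_fovs)
```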

In [None]:
reorg.merge_partial_runs(cohort_dir=cohort_path, run_string=run_string)

**The two cells above can be re-run multiple times to combine different runs together.**

## 3. Creating a Single Cohort
Once all of the FOVs within each folder have been renamed and all of the partial runs have been combined together, you can get rid of the run structure and create a single cohort directory of FOVs. The function below will combine all of the FOVs within each of your distinct runs into a single directory with the run name prepended. 

**For example, if you have a structure like this:**

*  20220101_run_1
    *  tonsil_1
    *  tonsil_2
*  20220102_run_2
    *  lymph_1
    *  spleen_2

**It will get merged into something that looks like this:**
* image_data
    *  20220101_run_1_tonsil_1
    *  20220101_run_1_tonsil_2
    *  20220102_run_2_lymph_1
    *  20220102_run_2_spleen_2

**This is not required; if you plan on processing each run separately, such as for tiled images, you can skip this step. However, if you will be doing all of your analysis at the individual FOV level, this will simplify the downstream steps.**
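
The restructuring shown above can be sketched with the standard library as follows. This is a simplified illustration under stated assumptions, not the code behind `reorg.combine_runs`: the helper `flatten_runs` is hypothetical, it assumes every subfolder of the cohort directory other than `image_data` is a run, and the demo operates on a temporary directory.

```python
import os
import shutil
import tempfile


def flatten_runs(cohort_dir, output_name="image_data"):
    """Move every FOV folder into cohort_dir/output_name as '<run>_<fov>'.

    Hypothetical helper for illustration only; not part of toffy.
    """
    output_dir = os.path.join(cohort_dir, output_name)
    # collect the run folders before creating the output folder
    run_names = sorted(
        name
        for name in os.listdir(cohort_dir)
        if os.path.isdir(os.path.join(cohort_dir, name)) and name != output_name
    )
    os.makedirs(output_dir, exist_ok=True)

    for run in run_names:
        run_path = os.path.join(cohort_dir, run)
        for fov in sorted(os.listdir(run_path)):
            src = os.path.join(run_path, fov)
            if not os.path.isdir(src):
                continue  # leave loose files where they are
            shutil.move(src, os.path.join(output_dir, "{}_{}".format(run, fov)))
        # remove the run folder only if nothing is left inside it
        if not os.listdir(run_path):
            os.rmdir(run_path)

    return sorted(os.listdir(output_dir))


# demo that recreates the example structure in a throwaway directory
with tempfile.TemporaryDirectory() as demo_cohort:
    layout = {
        "20220101_run_1": ["tonsil_1", "tonsil_2"],
        "20220102_run_2": ["lymph_1", "spleen_2"],
    }
    for run, fovs in layout.items():
        for fov in fovs:
            os.makedirs(os.path.join(demo_cohort, run, fov))

    combined_fovs = flatten_runs(demo_cohort)
    remaining_top_level = sorted(os.listdir(demo_cohort))

print(combined_fovs)
print(remaining_top_level)
```

Prepending the run name keeps FOV names unique even when two runs contain FOVs with the same user-supplied name.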

In [None]:
reorg.combine_runs(cohort_dir=cohort_path)