# Tissue Atlas Setup Walkthrough
This notebook walks you through setting up the DKFZ htc framework to work with data from the DKFZ Tissue Atlas dataset. The Tissue Atlas dataset is organized differently from the data used by the original htc framework, so a slightly edited version of the framework must be used. This version aims to retain the ability to use the default datasets from the original htc framework, but also allows users to access the Tissue Atlas.

This notebook will guide you through the initial setup necessary to use the framework. After you complete it, a second notebook, titled "TissueAtlasTraining", will take you through actually using the data. The notebook is designed for use by a medical researcher with little to no Python experience, and it will walk you explicitly through all the steps you need to get started and run your own training and inference on a dataset of your choice.

Reminder: htc is only executable on Ubuntu. If you are running Windows, I recommend using WSL to get set up with Linux capabilities.


## PATH Environment Variables
To use the htc framework, you must first define PATH environment variables in an appropriate file. Please consult the README in the htc repository for a more detailed treatment of these environment variables. Here, we provide a basic overview of how they work and instructions on how to use them.

PATH environment variables are variables that the htc framework uses to locate the dataset(s) that you want it to use. First, you must create an environment file. If you have cloned the htc framework as suggested in the README, simply navigate to the repository's root directory (it should be named htc). In this directory, create a new file named ".env".

For example, in a bash terminal, run:

```bash
cd ~/path/to/my/htc
nano .env
```

Replace "~/path/to/my/htc" with the actual path on your system to the cloned htc repository.

Now, you must define your PATH variables. The htc framework uses a specific naming convention, so please follow the steps carefully.

### PATH to dataset(s):
For our purposes, there are two types of PATH environment variables used by the framework. The first is the PATH_Tivita variable, which tells the framework where to look for the dataset that you wish to use. First, you must find the path on your system to the dataset that you want to use. In the current iteration of the framework, a "dataset" has a specific meaning: a dataset is a directory in the larger Tissue Atlas folder structure that itself contains a directory titled "data", which is where the imaging data is actually kept. (This specific structure is a vestige of the original framework's close coupling with the default dataset design, so while it may seem odd to the user, it is motivated on the back end.)
Importantly, the "dataset" path should not point to the "data" directory itself. In the current iteration, it must also not point to a parent or grandparent directory. (The author aims to support this soon. However, in the current structure, you should only want to point to one "dataset" at a time, because we have only binary annotations for each class.)

For example, the following path is a valid path to a dataset:

```bash
 ~/TIVITA_Cat/Cat_Pig/Cat_atlas/Cat_0002_small_bowel
```
However, the next two are not valid "dataset" paths:

```bash
 ~/TIVITA_Cat/Cat_Pig/Cat_atlas/Cat_0002_small_bowel/data
 ~/TIVITA_Cat/Cat_Pig/Cat_atlas
```
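
The layout rule above can be checked before you edit the .env file. The following is a minimal sketch, not part of the htc framework: `is_valid_dataset_dir` is a hypothetical helper that only tests the structure described here (an existing directory that directly contains a "data" subdirectory).

```python
from pathlib import Path


def is_valid_dataset_dir(path):
    """Return True if `path` looks like a "dataset" directory as described above.

    Hypothetical helper (not part of htc): it only checks that the directory
    exists and directly contains a subdirectory named "data".
    """
    path = Path(path).expanduser()  # expand a leading "~" to the home directory
    return path.is_dir() and (path / "data").is_dir()
```

Assuming the folder structure described here, this returns `True` for a dataset directory, and `False` for both the "data" directory itself and a parent directory that does not directly contain "data".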

Once you have found your dataset path, you can define a PATH environment variable by copying and pasting the following code into the .env file (replace the path with your own):
```bash
 PATH_Tivita_Cat_0002_small_bowel="~/TIVITA_Cat/Cat_Pig/Cat_atlas/Cat_0002_small_bowel:shortcut=smallbowel"
```
The shortcut here provides an extra way to access your dataset later in the framework; name it however you like. However, the variable name is NOT arbitrary. It must follow the form:
```bash
 PATH_Tivita_<your_dataset_directory_name>
```
If the variable name does not match the path it is handed, the framework will not work.
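
To avoid a mismatch between the variable name and the path, the line can be generated from the path itself. This is a minimal sketch: `make_env_line` is a hypothetical helper (not part of htc) that only formats a string following the naming convention described above.

```python
from pathlib import PurePosixPath


def make_env_line(dataset_path, shortcut):
    """Build one .env line following the PATH_Tivita_<directory_name> convention.

    Hypothetical helper (not part of htc); it only formats a string and does
    not check that the path exists.
    """
    name = PurePosixPath(dataset_path).name  # last path component, i.e. the dataset directory name
    return f'PATH_Tivita_{name}="{dataset_path}:shortcut={shortcut}"'


print(make_env_line("~/TIVITA_Cat/Cat_Pig/Cat_atlas/Cat_0002_small_bowel", "smallbowel"))
```

The printed line can be pasted into the .env file as-is.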

If you have multiple datasets that you would like to use, simply add them in the same way to your .env file. Finally, when you are done, run the following in the root of your htc directory:
```bash
source .env
```
You can also copy this source line to your .bashrc (using the full path to the .env file), so that the .env file is automatically sourced every time you open a new terminal.

### PATH to Results:

You also need to add a path to a "Results" folder. This is where the framework will send the output of training or inference tasks. The setup is much the same as for the dataset, only simpler:
```bash
 PATH_HTC_RESULTS="~/path/to/results"
```
Do not specify a shortcut for the results path. You don't need one, and the framework will not recognize it. You can also have multiple results folders; for instructions, please consult the README.md in the htc repository.
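
Before moving on, it can help to confirm which of these variables a Python process can actually see. The sketch below is an assumption-laden convenience, not an htc feature: `find_htc_variables` is a hypothetical helper that only inspects the environment mapping it is given (by default, the current process environment), so an empty result may simply mean the variables were not exported to this process.

```python
import os


def find_htc_variables(environ=None):
    """Collect the variables defined in this walkthrough from an environment mapping.

    Hypothetical helper (not part of htc). Returns a dict of all
    PATH_Tivita_* variables and the value of PATH_HTC_RESULTS (or None).
    """
    environ = os.environ if environ is None else environ
    datasets = {k: v for k, v in environ.items() if k.startswith("PATH_Tivita_")}
    results = environ.get("PATH_HTC_RESULTS")
    return datasets, results


datasets, results = find_htc_variables()
print("datasets:", sorted(datasets))
print("results:", results)
```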

## Specify Dataset Settings

The first step to load data is to create a dataset settings JSON file with some important metadata about your dataset. The framework uses this file to facilitate data loading and to provide some important functionality for working with the data.

In a directory of your choice on your system, create a .json file with a name of your choosing. The name should probably correspond to the dataset or project you are working on, but it's up to you, e.g.:
```bash
cd ~/path/to/dataset_settings/
nano myname.json
```

Copy and paste the following code block into the .json file, replacing "NAME_OF_YOUR_DATASET" with your dataset's name (e.g., "Cat_0002_small_bowel").
The following JSON is set up for binary classification between "class_1" (organ of interest) and "unlabeled" (background). You can also change the name of class_1 to match the organ you are working on.

You can add more information to the .json file if you like, such as subject or annotator mapping, but this tutorial does not yet cover those options.

In [1]:
{
    "dataset_name": "NAME_OF_YOUR_DATASET",
    "data_path_class": "htc.tivita.DataPathAtlas2>DataPathAtlas",
    "shape": [
        480,
        640,
        100
    ],
    "shape_names": [
        "height",
        "width",
        "channels"
    ],
    "label_mapping": {
        "class_1": 0,
        "unlabeled": 255
    },
    "last_valid_label_index": 0
    
}

{'dataset_name': 'NAME_OF_YOUR_DATASET',
 'data_path_class': 'htc.tivita.DataPathAtlas2>DataPathAtlas',
 'shape': [480, 640, 100],
 'shape_names': ['height', 'width', 'channels'],
 'label_mapping': {'class_1': 0, 'unlabeled': 255},
 'last_valid_label_index': 0}
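
If you prefer not to edit the file by hand, the same settings can be written from Python. This is a minimal sketch: `write_dataset_settings` is a hypothetical helper (not part of htc), and its keys and values are copied from the JSON shown above.

```python
import json
from pathlib import Path


def write_dataset_settings(target, dataset_name, class_name="class_1"):
    """Write the binary-classification settings shown above to `target`.

    Hypothetical helper (not part of htc); keys and values are copied from
    the JSON in this section.
    """
    settings_dict = {
        "dataset_name": dataset_name,
        "data_path_class": "htc.tivita.DataPathAtlas2>DataPathAtlas",
        "shape": [480, 640, 100],
        "shape_names": ["height", "width", "channels"],
        "label_mapping": {class_name: 0, "unlabeled": 255},
        "last_valid_label_index": 0,
    }
    target = Path(target).expanduser()  # expand a leading "~" to the home directory
    target.write_text(json.dumps(settings_dict, indent=4))
    return target
```

For example, `write_dataset_settings("myname.json", "Cat_0002_small_bowel", class_name="small_bowel")` writes the file to the current directory with the organ name in place of "class_1".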

## Loading Data

We will end with a brief demonstration of how the framework loads data. In the training notebook, this process will be rolled into other class constructors. However, loading data directly is also worth knowing, because the framework offers many functionalities for data analysis, such as retrieving and plotting the median reflectance across the HSI spectrum for an organ (see the original htc "General" tutorial for more info).

Start by importing necessary packages, and defining the Path object to your dataset_settings json.

In [2]:
%load_ext autoreload
%autoreload 2
from pathlib import Path

from htc import settings
from htc.tivita.DataPath import DataPath

There are three ways to build data paths to the images; this notebook shows the first. One path should always represent exactly one timestamp (image) directory.

In [3]:
# 1. Via iteration: the main tool to access images.
# Replace test_dataset11june with the shortcut you defined earlier in the PATH variable (e.g., smallbowel).

paths = list(DataPath.iterate(settings.data_dirs.test_dataset11june))
[p.timestamp for p in paths[:10]]

external directory found


['2021_04_15_09_22_02', '2021_04_28_08_49_12']
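
The timestamps above are plain strings. If you want to sort or filter images by acquisition time, they can be converted to `datetime` objects. This is a minimal sketch using only the Python standard library; the format string is an assumption inferred from the two timestamps shown in the output, not something documented by htc.

```python
from datetime import datetime

# Timestamps copied from the output above. The format string below is an
# assumption inferred from how these values look.
timestamps = ["2021_04_15_09_22_02", "2021_04_28_08_49_12"]
parsed = [datetime.strptime(t, "%Y_%m_%d_%H_%M_%S") for t in timestamps]

# Print the timestamps in chronological order.
for t in sorted(parsed):
    print(t.isoformat())
```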

Now that you are set up, you can go to the "TissueAtlasTraining.ipynb" notebook to begin training a model on your dataset.