# Tissue Atlas Model Training Walkthrough
This notebook walks you through using the Atlas-compatible htc to train your own segmentation model. If you have not done so yet, please read the Setup tutorial first; it contains important information.

Start with the necessary imports and define the path to your dataset_settings .json file. The tutorial is written for a very small dataset (2 pigs) called "HeiPorSpectral_mod". Replace the relevant directory paths and names with those of your own dataset and JSON file.

In [16]:
%load_ext autoreload
%autoreload 2

from pathlib import Path

import pandas as pd
from IPython.display import JSON
from typing import TYPE_CHECKING, Any, Callable, Union, Self
from htc import (
    Config,
    DataPath,
    DataSpecification,
    MetricAggregation,
    SpecsGeneration,
    create_class_scores_figure,
    settings,
)
from htc.models.data.SpecsGenerationAtlas import SpecsGenerationAtlas

# expanduser() resolves "~"; pathlib does not expand it automatically
dataset_settings = Path("~/dkfz/htc/tests/test_4june/externals/testdataset_Settings.json").expanduser()
data_dir = settings.data_dirs.test_dataset4june

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
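
Before generating the specs, it can help to confirm that the settings file exists and parses as JSON. The sketch below is only an illustration: `load_dataset_settings` is a hypothetical helper, not part of htc, and it makes no assumption about which keys the file contains.

```python
import json
from pathlib import Path


def load_dataset_settings(path):
    """Hypothetical helper (not part of htc): expand '~', check existence, parse JSON."""
    path = Path(path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"dataset settings file not found: {path}")
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

Calling `load_dataset_settings(dataset_settings)` early surfaces a wrong path or malformed JSON before any training-related code runs.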


Next, specify the important parameters for your training run, such as the number of folds and the train/test split. Replace the values in the following code block with values of your choice.

In [17]:

filters = []  # list of callable filter functions
annotation_name = None  # name of the annotators to be used
test_ratio = 0  # fraction of images held out as test data, i.e. not used in any training; should be a float between 0.0 and 1.0
n_folds = 2  # number of folds: the training data (not the test data) is randomly split into n_folds groups
# For each fold, the network trains a model with one group as validation data and all other groups as training data.
seed = None  # optional seed for the random fold assignment; None gives a different split on every call
# For a reproducible split, set the seed to a number of your choice, e.g. seed = 42
name = "Atlas"  # name of the JSON file created in the next code block and stored in the directory containing this notebook; keep it simple and descriptive
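
To make the roles of `test_ratio`, `n_folds`, and `seed` concrete, here is a minimal sketch of a seeded split in plain Python. It is not the logic used by `SpecsGenerationAtlas`: the function `split_into_folds` and the item names are hypothetical, and the real class may group images by subject or balance them differently.

```python
import random


def split_into_folds(items, test_ratio=0.0, n_folds=2, seed=None):
    """Illustrative only: hold out a test set, then build train/val folds."""
    if not 0.0 <= test_ratio <= 1.0:
        raise ValueError("test_ratio must be between 0.0 and 1.0")
    rng = random.Random(seed)  # seed=None gives a different shuffle on each call
    shuffled = list(items)
    rng.shuffle(shuffled)

    # Items in the test set are never used for training or validation.
    n_test = round(len(shuffled) * test_ratio)
    test, remaining = shuffled[:n_test], shuffled[n_test:]

    # Deal the remaining items into n_folds groups, round-robin.
    groups = [remaining[i::n_folds] for i in range(n_folds)]

    folds = []
    for i, val in enumerate(groups):
        # One group validates; all other groups together form the training set.
        train = [x for j, g in enumerate(groups) if j != i for x in g]
        folds.append({"fold_name": f"fold_{i + 1}", "train": train, "val": val})
    return test, folds


test, folds = split_into_folds(["img_a", "img_b", "img_c", "img_d"], n_folds=2, seed=42)
```

With a fixed seed, repeated calls return the same folds; with `seed=None`, the assignment changes from call to call.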


In [18]:
tutorial_dir = Path().absolute()

SpecsGenerationAtlas(data_dir,
                dataset_settings,
                filters = filters,
                annotation_name = annotation_name,
                test_ratio = test_ratio,
                n_folds = n_folds,
                seed = seed,
                name = name,
                ).generate_dataset(target_folder=tutorial_dir)
specs_path = tutorial_dir / name

## Lightning Class

The next step is to choose or build a Lightning class. The Lightning class (as in PyTorch Lightning) manages many aspects of training and can be customized by creating your own child class. Most notably, it lets you specify your loss function.

For this walkthrough, we use the htc "LightningImage" class, which is the htc default for training on full images (as opposed to patches, pixels, or superpixels). It computes the loss as a weighted average of Dice loss and cross-entropy loss. See the htc "networkTraining" tutorial for more information on the Lightning class.
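
As a rough illustration of what a weighted average of Dice loss and cross-entropy loss means, the following sketch computes both terms for a handful of pixels in plain Python. It is not the htc implementation, which operates on tensors: the function name `dice_ce_loss`, the soft-Dice formulation, and the default `dice_weight=0.5` are assumptions made for this example.

```python
import math


def dice_ce_loss(probs, labels, n_classes, dice_weight=0.5, eps=1e-7):
    """Illustrative weighted average of soft Dice loss and cross-entropy.

    probs:  one list of class probabilities per pixel (each sums to 1)
    labels: one integer class index per pixel
    dice_weight is an assumed value, not taken from the htc config.
    """
    # Cross-entropy: mean negative log-probability of the true class.
    ce = -sum(math.log(p[y] + eps) for p, y in zip(probs, labels)) / len(labels)

    # Soft Dice per class: 2 * overlap / (predicted mass + true mass).
    dice_scores = []
    for c in range(n_classes):
        overlap = sum(p[c] for p, y in zip(probs, labels) if y == c)
        pred_mass = sum(p[c] for p in probs)
        true_mass = sum(1 for y in labels if y == c)
        dice_scores.append((2 * overlap + eps) / (pred_mass + true_mass + eps))
    dice_loss = 1 - sum(dice_scores) / n_classes

    return dice_weight * dice_loss + (1 - dice_weight) * ce


confident = dice_ce_loss([[1.0, 0.0], [0.0, 1.0]], [0, 1], n_classes=2)
uncertain = dice_ce_loss([[0.5, 0.5], [0.5, 0.5]], [0, 1], n_classes=2)
```

A correct, confident prediction yields a loss close to zero, while the uniform prediction is penalized by both terms.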

## Config
The last step before training is to create the configuration file. This is also a JSON file that contains important metadata; the training process uses it to set hyperparameters such as batch size and transformations. We will use htc's Config ***class*** to write the config ***json***.

The following code block writes the config JSON for you. By default, it stores the file in the same directory as your dataset_settings JSON, named after `name` with the suffix `_config.json`.

The following code block is still set up for tutorial use, not production use (for example, it trains for only one epoch).

In [19]:
config = Config.from_model_name("default", "image")
config["inherits"] = "models/image/configs/default"
config["input/data_spec"] = specs_path
config["input/annotation_name"] = ["polygon#annotator1", "polygon#annotator2", "polygon#annotator3"]
config["validation/checkpoint_metric_mode"] = "class_level"



# We want to merge the annotations from all annotators into one label mask
config["input/merge_annotations"] = "union"

# Example for a two-class problem in which all unlabeled pixels are ignored (commented out in this tutorial)
# Everything which is >= settings.label_index_thresh will later be considered invalid
#config["label_mapping"] = {
#    "spleen": 0,
#    "gallbladder": 1,
#    "unlabeled": settings.label_index_thresh,
#}

# Reduce the training time
config["trainer_kwargs/max_epochs"] = 1

# Progress bars can cause problems in Jupyter notebooks so we disable them here (training does not take super long)
config["trainer_kwargs/enable_progress_bar"] = False

# Uncomment the following lines if you want to use one of the pretrained models as the basis for your training
# config["model/pretrained_model"] = {
#     "model": "image",
#     "run_folder": "2022-02-03_22-58-44_generated_default_model_comparison",
# }

config_path = dataset_settings.parent / (name + "_config.json")
config.save_config(config_path)
JSON(config_path)

<IPython.core.display.JSON object>
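
The keys used above, such as `trainer_kwargs/max_epochs`, read like paths into a nested structure. Assuming that slash-separated keys address nested entries in the saved JSON, the sketch below shows the idea with plain dictionaries. `set_by_path` is a hypothetical helper for illustration and is not the htc `Config` implementation.

```python
def set_by_path(tree, key, value, sep="/"):
    """Hypothetical helper: store value under a slash-separated key in nested dicts."""
    *parents, leaf = key.split(sep)
    node = tree
    for part in parents:
        # Create intermediate dictionaries on demand.
        node = node.setdefault(part, {})
    node[leaf] = value
    return tree


cfg = {}
set_by_path(cfg, "trainer_kwargs/max_epochs", 1)
set_by_path(cfg, "trainer_kwargs/enable_progress_bar", False)
set_by_path(cfg, "input/merge_annotations", "union")
```

Under this assumption, both `trainer_kwargs` settings end up in the same nested dictionary, which is the grouping you would expect to see when inspecting the saved file.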

## Start the Training
You are now ready to train your network. Simply run the `htc training` command and pass the model type (image model in our case) and path to the config as arguments.
> ⚠️ Starting a training session in a Jupyter notebook is usually not a good idea. Instead, it is advisable to use a [`screen`](https://linuxize.com/post/how-to-use-linux-screen/) environment so that your training runs in the background and you can return later to check the status.

> There is also a `--fold FOLD_NAME` switch if you want to train only one fold. This is useful for debugging.
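
If you launch the training from a script (for example inside a `screen` session) rather than from the notebook, the command can be assembled as an argument list. The sketch below uses only the switches mentioned in this section; `build_training_command` is a hypothetical helper, and the config file name and fold name are placeholders.

```python
import shlex


def build_training_command(config_path, model="image", fold=None):
    """Assemble the `htc training` call as an argument list (illustrative helper)."""
    cmd = ["htc", "training", "--model", model, "--config", str(config_path)]
    if fold is not None:
        cmd += ["--fold", fold]  # train a single fold, e.g. for debugging
    return cmd


cmd = build_training_command("Atlas_config.json", fold="fold_1")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which raises an error if the training exits with a non-zero status.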

In [20]:
import torch

torch.cuda.empty_cache()
# mem_get_info() returns (free, total) device memory in bytes, not the allocated amount
free_memory, total_memory = torch.cuda.mem_get_info()
print(f"Free Memory: {free_memory / (1024 ** 2):.2f} MB")
print(f"Total Memory: {total_memory / (1024 ** 2):.2f} MB")

# Bytes currently allocated by tensors in this process
print(torch.cuda.memory_allocated())

!htc training --model image --config $config_path
print("After starting training:")
assert _exit_code == 0, "Training was not successful"  # noqa: F821

Free Memory: 5143.00 MB
Total Memory: 6143.69 MB
0
_build_cache was acessed
PATH_Tivita_Cat_atlas_kidney_nativ was set to
/home/lucas/dkfz/htc/tests/test_31may/Cat_atlas_kidney_nativ but
the path does not exist
[INFO][htc] Starting training of the fold fold_1 [1/2]       run_training.py:301
_build_cache was acessed
PATH_Tivita_Cat_atlas_kidney_nativ was set to
/home/lucas/dkfz/htc/tests/test_31may/Cat_atlas_kidney_nativ but
the path does not exist
[INFO][htc] The following config will be used for training:   run_training.py:81

In [21]:
!nvidia-smi

Mon Jun 10 15:21:49 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.104      Driver Version: 528.79       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| N/A   56C    P8     3W /  80W |     68MiB /  6144MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
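
The table printed by `nvidia-smi` is meant for reading, not parsing. If you want the memory figures in Python, one option is the tool's CSV query mode. The sketch below assumes an NVIDIA driver is installed; the helper names `parse_gpu_memory` and `query_gpu_memory` are made up for this example.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' MiB pairs, one line per GPU, into a list of tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for used/total memory in MiB (requires an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_gpu_memory(out)
```

For the GPU shown above, the query would produce a line such as `68, 6144`, which `parse_gpu_memory` turns into `[(68, 6144)]`.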

In [22]:
import torch

def print_gpu_info():
    """Print CUDA and per-GPU memory information as seen by this PyTorch process."""
    if not torch.cuda.is_available():
        print("CUDA is not available")
        return

    print(f"CUDA Version: {torch.version.cuda}")
    print(f"PyTorch CUDA Support: {torch.cuda.is_available()}")
    print(f"Number of GPUs: {torch.cuda.device_count()}")

    for i in range(torch.cuda.device_count()):
        print(f"\nGPU {i}: {torch.cuda.get_device_name(i)}")
        print(f"  Total Memory: {torch.cuda.get_device_properties(i).total_memory / 1024 ** 3:.2f} GB")
        # memory_allocated() counts only tensors of this process, so "Current Memory Usage"
        # repeats "Allocated Memory" and "Free Memory" ignores memory used by other processes
        print(f"  Allocated Memory: {torch.cuda.memory_allocated(i) / 1024 ** 2:.2f} MB")
        print(f"  Cached Memory: {torch.cuda.memory_reserved(i) / 1024 ** 2:.2f} MB")
        print(f"  Current Memory Usage: {torch.cuda.memory_allocated(i) / 1024 ** 2:.2f} MB")
        print(f"  Peak Memory Usage: {torch.cuda.max_memory_allocated(i) / 1024 ** 2:.2f} MB")
        print(f"  Free Memory: {(torch.cuda.get_device_properties(i).total_memory - torch.cuda.memory_allocated(i)) / 1024 ** 2:.2f} MB")

print_gpu_info()


CUDA Version: 12.1
PyTorch CUDA Support: True
Number of GPUs: 1

GPU 0: NVIDIA GeForce GTX 1660 Ti
  Total Memory: 6.00 GB
  Allocated Memory: 0.00 MB
  Cached Memory: 0.00 MB
  Current Memory Usage: 0.00 MB
  Peak Memory Usage: 0.00 MB
  Free Memory: 6143.69 MB
