# Segmentation Training Walkthrough
This notebook walks you through using the Atlas-compatible htc framework to train your own segmentation model. A similar notebook, "TissueAtlasClassificationTraining.py", covers training a classification model based on spectral analysis. If you have not done so yet, please read the Setup tutorial first, as it contains important information.
Start with the necessary imports and define the path to your dataset_settings .json file. This tutorial uses a very small dataset (2 pigs) called "HeiPorSpectral_mod". Replace the relevant directory paths and names with those of your own dataset and JSON file.

In [1]:
%load_ext autoreload
%autoreload 2

from pathlib import Path

import pandas as pd
from IPython.display import JSON
from typing import TYPE_CHECKING, Any, Callable, Union, Self
from htc import (
    Config,
    DataPath,
    DataSpecification,
    MetricAggregation,
    SpecsGeneration,
    create_class_scores_figure,
    settings,
)
from htc.models.data.SpecsGenerationAtlas import SpecsGenerationAtlas

intermediates_dir = settings.intermediates_dirs.external
print(intermediates_dir)

/omics/groups/OE0645/internal/data/htcdata/medium_test/external/intermediates


Next, specify the important parameters for your training run, such as the number of folds and the train/test split. Replace the values in the following code block with values of your choice.

In [2]:
# TODO:
# filter by existence of a .txt file next to hypergui within the timestamp folder (its presence signals that the data is OK)
# add batch size
# add epoch length
# add batch randomization conditions

filter_txt = lambda p: p.contains_txt()
filters = [filter_txt]  # list of callable filter functions; other filters can be added here
annotation_name = 'annotator1'  # name of the annotator(s) to use
test_ratio = 0  # ratio of images held out as test data, i.e., not used in any training. Should be a float between 0.0 and 1.0
n_folds = 3  # number of folds to create in the training data. The training data (not the test data) is randomly split into n_folds groups.
# For each fold, the network trains a model with one group as validation data and all other groups as training data.
seed = None  # optional seed for the random grouping of the folding operation. Set to None for a different split on every function call;
# for a reproducible split, set the seed to a number of your choice, e.g., seed = 42
name = "testSegment"  # name of the JSON file created in the following code block, which is stored under external / 'data'. Keep it simple and descriptive.
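
To make the roles of `test_ratio`, `n_folds`, and `seed` more concrete, here is a minimal sketch of a seeded hold-out split followed by a k-fold split over subject names. It is only an illustration under stated assumptions: the helper `split_into_folds`, the round-robin grouping, and the `fold_1`-style names are hypothetical, and `SpecsGenerationAtlas` may shuffle, group, or name its folds differently.

```python
import random


def split_into_folds(subjects, n_folds, test_ratio=0.0, seed=None):
    """Hypothetical helper: hold out a test set, then build k train/val folds."""
    rng = random.Random(seed)  # seed=None gives a different shuffle on every call
    shuffled = list(subjects)
    rng.shuffle(shuffled)

    # Hold out the first round(test_ratio * N) subjects as test data
    n_test = round(len(shuffled) * test_ratio)
    test, remaining = shuffled[:n_test], shuffled[n_test:]

    # Deal the remaining subjects round-robin into n_folds groups
    groups = [remaining[i::n_folds] for i in range(n_folds)]

    folds = []
    for i, val_group in enumerate(groups):
        # Every group except the i-th one contributes to the training data
        train = [s for j, g in enumerate(groups) if j != i for s in g]
        folds.append({"fold_name": f"fold_{i + 1}", "train": train, "val": val_group})
    return folds, test


# Example subject IDs (shortened from the folder names used in this tutorial)
subjects = ["P160", "P162", "P163"]
folds, test = split_into_folds(subjects, n_folds=3, test_ratio=0, seed=42)
for fold in folds:
    print(fold["fold_name"], "train:", fold["train"], "val:", fold["val"])
```

With a fixed seed the split is reproducible across calls; with `seed=None` each call may produce a different grouping, which mirrors the behaviour described in the parameter comments above.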


In [3]:
tutorial_dir = Path().absolute()
external = settings.external_dir.external['path_dataset']  # brackets are needed to access the path, because settings.external_dir.external is a dictionary containing info about the external directory.
# settings.external_dir is an object containing all the different external directories; in our case, there should always be just one, with the shortcut "external"
print(external)
specs_path = external / 'data' / name
SpecsGenerationAtlas(intermediates_dir,
                filters = filters,
                annotation_name = annotation_name,
                test_ratio = test_ratio,
                n_folds = n_folds,
                seed = seed,
                name = name,
                ).generate_dataset(external / 'data')

/omics/groups/OE0645/internal/data/htcdata/medium_test/external
['P160_OP124_2023_06_28_Experiment1', 'P162_OP126_2023_07_06_Experiment1', 'P163_OP127_2023_07_12_Experiment1']


## Lightning Class

The next step is to choose or build our Lightning class. The Lightning class (as in PyTorch Lightning) manages many aspects of training and can be customized by creating your own child class. Most notably, the Lightning class lets you specify your loss function.

For this walkthrough, we will use the `LightningImage` class, which is htc's default class for training on full images (as opposed to patches, pixels, or superpixels). It calculates the loss as a weighted average of Dice loss and cross-entropy loss. See the htc "networkTraining" tutorial for more information on the Lightning class.
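
As a rough illustration of what a weighted average of Dice loss and cross-entropy loss looks like, the following sketch computes both terms on a handful of pixels in plain Python. It is not the htc implementation: the soft Dice formulation, the smoothing constant `eps`, the equal default weighting `dice_weight=0.5`, and all function names are assumptions made for this example, and the real class operates on PyTorch tensors.

```python
import math


def softmax(logits):
    """Convert the raw scores of one pixel into class probabilities."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true class over all pixels."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def soft_dice_loss(probs, labels, n_classes, eps=1e-6):
    """One minus the soft Dice score, averaged over classes."""
    dice_per_class = []
    for c in range(n_classes):
        intersection = sum(p[c] for p, y in zip(probs, labels) if y == c)
        denominator = sum(p[c] for p in probs) + sum(1 for y in labels if y == c)
        dice_per_class.append((2 * intersection + eps) / (denominator + eps))
    return 1 - sum(dice_per_class) / n_classes


def combined_loss(logits, labels, n_classes, dice_weight=0.5):
    """Weighted average of the Dice loss and the cross-entropy loss."""
    probs = [softmax(pixel_logits) for pixel_logits in logits]
    ce = cross_entropy(probs, labels)
    dice = soft_dice_loss(probs, labels, n_classes)
    return dice_weight * dice + (1 - dice_weight) * ce


# Two pixels, two classes: confident and correct vs. confident and wrong
logits = [[10.0, -10.0], [-10.0, 10.0]]
print("correct labels:", combined_loss(logits, [0, 1], n_classes=2))
print("swapped labels:", combined_loss(logits, [1, 0], n_classes=2))
```

The Dice term rewards overlap per class and is less sensitive to class imbalance, while the cross-entropy term penalizes each pixel individually; combining them is a common choice for segmentation.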

## Config
The last step before training is to create our configuration file. This file is also a JSON file that contains important metadata; the training process uses it to configure training hyperparameters such as batch size and transformations. We will use htc's Config ***class*** to write the config ***JSON***.

The following code block writes the config JSON for you. By default, it stores the config JSON file in the same directory as your dataset_settings JSON.

Note that the following code block is still set up for tutorial use, not production use.

In [4]:
# Assign training hyperparameters
max_epochs = 2  # this can be whatever you want
batch_size = 2  # this is the number of SUBJECTS, rather than images, in each batch. The loader is designed to sort batches by subject.
# The default batch size is 3 subjects.
shuffle = True  # tells the batch generator whether to draw different, random batches on every epoch (note: not yet written to the config below).
# True randomizes the batches; False keeps the same batches across epochs.
num_workers = "auto"  # how many data-loading "worker" subprocesses to start. The optimal number for fast loading is highly dependent on your system;
# you can experiment with low-epoch runs to see which num_workers value maximizes your training speed.
# Left to implement: specialized sampling practices, such as guaranteeing an even organ distribution across classes?
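
The idea of counting `batch_size` in subjects rather than images, combined with `shuffle`, can be sketched as follows. This is a simplified illustration, not the loader used by the Atlas-compatible htc: the function `subject_batches`, the image names, and the assumption that each name starts with a subject ID followed by an underscore are all hypothetical.

```python
import random
from collections import defaultdict


def subject_batches(image_names, batch_size, shuffle=True, seed=None):
    """Hypothetical sketch: group images by subject, then batch whole subjects."""
    by_subject = defaultdict(list)
    for name in image_names:
        subject = name.split("_")[0]  # assumes names start with a subject ID such as "P160"
        by_subject[subject].append(name)

    subjects = sorted(by_subject)
    if shuffle:
        # A new order of subjects leads to different batches
        random.Random(seed).shuffle(subjects)

    # Each batch holds all images of up to `batch_size` subjects
    return [
        [img for s in subjects[i : i + batch_size] for img in by_subject[s]]
        for i in range(0, len(subjects), batch_size)
    ]


images = ["P160_a", "P160_b", "P162_a", "P163_a"]
print(subject_batches(images, batch_size=2, shuffle=False))
print(subject_batches(images, batch_size=2, shuffle=True))
```

With `shuffle=False` the batches are identical on every call, whereas `shuffle=True` without a seed may regroup the subjects each time, which corresponds to drawing different batches in every epoch.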

In [5]:
config = Config.from_model_name("default", "image")
config["inherits"] = "models/image/configs/default"
config["input/data_spec"] = specs_path
config["input/annotation_name"] = ["polygon#annotator1"]
config["validation/checkpoint_metric_mode"] = "class_level"



# We want to merge the annotations from all annotators into one label mask
config["input/merge_annotations"] = "union"

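# We have a two-class problem (uro_conduit vs. background)
# Note: "unlabeled" is mapped to index 1 below, so unlabeled pixels end up in class 1 (background) instead of being ignored
# Everything which is >= settings.label_index_thresh will later be considered invalid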
config["label_mapping"] = {
        "last_valid_label_index": 1,
        "mapping_index_name": {
            "0": "uro_conduit",
            "1": "background",
            "254": "overlap" },
        "mapping_name_index": {
            "overlap": 254,
            "unlabeled": 1,
            "uro_conduit": 0 },
        "unknown_invalid": False,
        "zero_is_invalid": False}
# Leaving the label mapping as None uses the label IDs stored in the segmentation blosc files. To remap the labels, specify the mapping here.
# This could be useful for combining multiple labels into one label without regenerating the intermediates.
# Open question: how the background is handled/defined.

#specify batch and sampler settings:
config['dataloader_kwargs/batch_size'] = batch_size
config['dataloader_kwargs/num_workers'] = num_workers

# Reduce the training time
config["trainer_kwargs/max_epochs"] = max_epochs

# Progress bars can cause problems in Jupyter notebooks; set this to False to disable them
config["trainer_kwargs/enable_progress_bar"] = True

# Uncomment the following lines if you want to use one of the pretrained models as the basis for your training
# config["model/pretrained_model"] = {
#     "model": "image",
#     "run_folder": "2022-02-03_22-58-44_generated_default_model_comparison",
# }

config_path = external/'data'/ (name + "_config.json")
config.save_config(config_path)
JSON(config_path)

print(type(config_path))

<class 'pathlib.PosixPath'>
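
The config cell above addresses nested settings with slash-separated keys such as `trainer_kwargs/max_epochs`, and the training log shows them ending up as nested dictionaries. The following toy class sketches that idea. It is a stand-in written for this walkthrough, not htc's `Config` class: the name `NestedConfig` and its behaviour (for example, creating intermediate dictionaries on demand) are assumptions.

```python
import json


class NestedConfig:
    """Toy stand-in for a config object with slash-separated keys (not htc's Config)."""

    def __init__(self, data=None):
        self.data = data if data is not None else {}

    def __setitem__(self, key, value):
        *parents, leaf = key.split("/")
        node = self.data
        for part in parents:
            node = node.setdefault(part, {})  # create intermediate dicts on demand
        node[leaf] = value

    def __getitem__(self, key):
        node = self.data
        for part in key.split("/"):
            node = node[part]  # raises KeyError if any level is missing
        return node


cfg = NestedConfig()
cfg["dataloader_kwargs/batch_size"] = 2
cfg["trainer_kwargs/max_epochs"] = 2
print(json.dumps(cfg.data, indent=2))
```

Keeping the keys flat in the notebook while storing them nested in the JSON file makes it easy to override a single hyperparameter without rewriting the surrounding dictionary.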


## Start the Training
You are now ready to train your network. Simply run the `htc training` command and pass the model type (image model in our case) and path to the config as arguments.
> ⚠️ Starting a training session in a Jupyter notebook is usually not a good idea. Instead, it is advisable to use a [`screen`](https://linuxize.com/post/how-to-use-linux-screen/) environment so that your training runs in the background and you can return later to check the status.

> There is also a `--fold FOLD_NAME` switch if you want to train only one fold. This is useful for debugging.
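
If you prefer to launch the run from a script (for example inside a `screen` session) rather than from a notebook cell, the command can be assembled as an argument list first. The sketch below only uses the switches that appear in this walkthrough (`--model`, `--config`, `--fold`); the helper name `build_training_command`, the example config file name, and the fold name `fold_1` are illustrative assumptions.

```python
import shlex
from pathlib import Path


def build_training_command(config_path, fold=None):
    """Assemble the `htc training` call used in this walkthrough as an argument list."""
    cmd = ["htc", "training", "--model", "image", "--config", str(config_path)]
    if fold is not None:
        cmd += ["--fold", fold]  # train a single fold only, e.g. for debugging
    return cmd


cmd = build_training_command(Path("testSegment_config.json"), fold="fold_1")
print(shlex.join(cmd))

# To actually start the training from a script, one option is:
# import subprocess
# subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit code
```

Passing the command as a list avoids shell quoting problems with paths that contain spaces, and `check=True` plays the same role as the exit-code assertion in the notebook cell.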

In [6]:
import torch
torch.cuda.empty_cache()
# Retrieve GPU memory information (mem_get_info returns the free and total memory in bytes)
# free_memory, total_memory = torch.cuda.mem_get_info()
# print(f"Free Memory: {free_memory / (1024 ** 2):.2f} MB")
# print(f"Total Memory: {total_memory / (1024 ** 2):.2f} MB")

# print(torch.cuda.memory_allocated())

!htc training --model image --config $config_path
assert _exit_code == 0, "Training was not successful"  # noqa: F821

[1m[[0m[32mINFO[0m[1m][0m[1m[[0m[3mhtc[0m[1m][0m Starting training of the fold fold_1 [1m[[0m[37m1[0m/[37m3[0m[1m][0m       [2mrun_training.py:301[0m
[1m[[0m[32mINFO[0m[1m][0m[1m[[0m[3mhtc[0m[1m][0m The number of workers are set to [37m1[0m [1m([0m[37m2[0m physical cores    [2mutils.py:250[0m
are available in total[1m)[0m                                             [2m            [0m
[1m[[0m[32mINFO[0m[1m][0m[1m[[0m[3mhtc[0m[1m][0m The following config will be used for training:   [2mrun_training.py:81[0m
[1m[[0m[32mINFO[0m[1m][0m[1m[[0m[3mhtc[0m[1m][0m [1m{[0m[90m'config_name'[0m: [90m'testSegment_config'[0m,             [2mrun_training.py:82[0m
 [90m'dataloader_kwargs'[0m: [1m{[0m[90m'batch_size'[0m: [37m2[0m, [90m'num_workers'[0m: [37m1[0m[1m}[0m,    [2m                  [0m
 [90m'input'[0m: [1m{[0m[90m'annotation_name'[0m: [1m[[0m[90m'[0m[36mpolygon#annotator1[0m[90m'[0m[1

AssertionError: Training was not successful

In [None]:
!nvidia-smi

/bin/bash: nvidia-smi: command not found


In [None]:
import torch

def print_gpu_info():
    """Print CUDA availability and per-GPU memory statistics as seen by PyTorch."""
    if not torch.cuda.is_available():
        print("CUDA is not available; no GPU information to report.")
        return

    print(f"CUDA Version: {torch.version.cuda}")
    print(f"Number of GPUs: {torch.cuda.device_count()}")

    for i in range(torch.cuda.device_count()):
        total = torch.cuda.get_device_properties(i).total_memory
        allocated = torch.cuda.memory_allocated(i)
        print(f"\nGPU {i}: {torch.cuda.get_device_name(i)}")
        print(f"  Total Memory: {total / 1024 ** 3:.2f} GB")
        print(f"  Allocated Memory: {allocated / 1024 ** 2:.2f} MB")
        print(f"  Reserved (Cached) Memory: {torch.cuda.memory_reserved(i) / 1024 ** 2:.2f} MB")
        print(f"  Peak Memory Usage: {torch.cuda.max_memory_allocated(i) / 1024 ** 2:.2f} MB")
        # Total minus the memory allocated by this process; other processes are not accounted for
        print(f"  Memory Not Allocated by PyTorch: {(total - allocated) / 1024 ** 2:.2f} MB")


print_gpu_info()
