# HN Segmentation with Transfer Learning

From journal article "Transfer learning for auto-segmentation of 17 organs-at-risk in the head and neck: bridging the gap between institutional and public datasets", published in ...

To complete this experiment, you must have access to a large institutional CT head and neck auto-segmentation dataset or similar.

## Installation

Note: These first few commands should be run in a terminal, not within the notebook
as we've yet to set up a suitable Jupyter notebook kernel.

In [1]:
# Install Python v3.10.4 or similar.

In [2]:
# Create and activate 'transfer-learning' virtual environment.
$ python -m venv ~/venvs/transfer-learning
$ source ~/venvs/transfer-learning/bin/activate

In [3]:
# Create Jupyter kernel pointing to virtual environment.
$ pip install ipykernel
$ python -m ipykernel install --user --name=transfer-learning

Note: Commands from here on are run in this Jupyter notebook. You should
restart the Jupyter notebook to ensure the kernel loads, and then select
the 'transfer-learning' kernel.

In [5]:
# Install required Python packages.
! pip install -r requirements.txt

[31mERROR: Could not open requirements file: [Errno 2] No such file or directory: 'requirements.txt'[0m[31m
You should consider upgrading via the '/home/baclark/venvs/transfer-learning/bin/python -m pip install --upgrade pip' command.[0m[33m
[0m

In [6]:
# Create a data folder somewhere, and edit/export the following environment
# variables:
#   HNAS_CODE: The filepath pointing to "hn-segmentation-with-transfer-learning" code.
#   HNAS_DATA: The filepath to your new data folder.
! export HNAS_CODE="/absolute/path/to/hn-segmentation-with-transfer-learning"
! export HNAS_DATA="/absolute/path/to/data/folder"
! export HNAS_CODE="/data/projects/punim1413/hn-segmentation-with-transfer-learning/"
! export HNAS_DATA="/data/projects/punim1413/transfer-learning/"

## Preparing Public Data

If you would like to avoid training the public models yourself, you can find them on [Zenodo](/link/after/publication). You can then skip the "Preparing Public Data" and "Training Public Models" sections.

Download the public datasets from the Cancer Imaging Archive.
- HN1: Available from [Head-Neck-Radiomics-HN1](https://wiki.cancerimagingarchive.net/display/Public/Head-Neck-Radiomics-HN1).
- HNPCT: Available from [Head-Neck-PET-CT](https://wiki.cancerimagingarchive.net/display/Public/Head-Neck-PET-CT).
- HNSCC: Available from [HNSCC](https://wiki.cancerimagingarchive.net/display/Public/HNSCC).
- OPC: Available from [OPC-Radiomics](https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=33948764).

For each of these datasets, drop the downloaded data into the corresponding "data" folder
created with the following structure. Folder structure within "data" doesn't matter as dataset
indexing will occur when you first query the `DICOMDataset`.

```
<HNAS_DATA>
    /datasets
        /dicom
            /HN1
                /data
            /HNPCT
                /data
            /HNSCC
                /data
            /OPC
                /data
```

In [8]:
# We will need to provide a mapping between the organ-at-risk names in the public datasets
# and the conventions that we will use in this package. This is carried out by creation a 
# 'region-map.csv' file in the root folder of the `DICOMDataset`. These files have already
# been created and just need to by symlinked to the correct locations.

! ln -s $HNAS_CODE/hnas/dataset/dicom/files/region-maps/hn1-region-map.csv $HNAS_DATA/datasets/dicom/HN1/region-map.csv
! ln -s $HNAS_CODE/hnas/dataset/dicom/files/region-maps/hnpct-region-map.csv $HNAS_DATA/datasets/dicom/HNPCT/region-map.csv
! ln -s $HNAS_CODE/hnas/dataset/dicom/files/region-maps/hnscc-region-map.csv $HNAS_DATA/datasets/dicom/HNSCC/region-map.csv
! ln -s $HNAS_CODE/hnas/dataset/dicom/files/region-maps/opc-region-map.csv $HNAS_DATA/datasets/dicom/OPC/region-map.csv

ln: failed to create symbolic link '/data/projects/punim1413/transfer-learning/datasets/dicom/HN1/region-map.csv': File exists
ln: failed to create symbolic link '/data/projects/punim1413/transfer-learning/datasets/dicom/HNPCT/region-map.csv': File exists
ln: failed to create symbolic link '/data/projects/punim1413/transfer-learning/datasets/dicom/HNSCC/region-map.csv': File exists
ln: failed to create symbolic link '/data/projects/punim1413/transfer-learning/datasets/dicom/OPC/region-map.csv': File exists


In [9]:

# This mapping process fails if there are multiple labels that would map to the same
# name. When this would occur, we need to register the duplicate labels.

! ln -s $HNAS_CODE/hnas/dataset/dicom/files/region-maps/hnpct-region-dups.csv $HNAS_DATA/datasets/dicom/HNPCT/region-dups.csv
! ln -s $HNAS_CODE/hnas/dataset/dicom/files/region-maps/hnscc-region-dups.csv $HNAS_DATA/datasets/dicom/HNSCC/region-dups.csv

ln: failed to create symbolic link '/data/projects/punim1413/transfer-learning/datasets/dicom/HNPCT/region-dups.csv': File exists
ln: failed to create symbolic link '/data/projects/punim1413/transfer-learning/datasets/dicom/HNSCC/region-dups.csv': File exists


In [2]:
# Process DICOM data to NIFTI.
# NIFTI stores medical imaging data in a compact format and removes extraneous details 
# that are present in DICOM files.

# Option A (preferred).
# Creates jobs to process the data on a slurm cluster.
! python scripts/slurm/steps/4/create_jobs.py       # Creates 4 slurm jobs.

# Option B.
# Runs processing on local machine.
#! python scripts/python/steps/4.py       

sbatch --export=ALL,DATASET=HN1 scripts/slurm/steps/4/template.slurm
sbatch --export=ALL,DATASET=HNPCT scripts/slurm/steps/4/template.slurm
sbatch --export=ALL,DATASET=HNSCC scripts/slurm/steps/4/template.slurm
sbatch --export=ALL,DATASET=OPC scripts/slurm/steps/4/template.slurm


In [None]:
# Process NIFTI data to training data.

# Option A (preferred).
# Creates jobs to process the data on a slurm cluster.
! python scripts/slurm/steps/5/create_jobs.py       # Creates 8 slurm jobs.

# Option B.
# Runs processing on local machine.
#! python scripts/python/steps/5.py       

## Training Public Models

If you would like to avoid training the public models yourself, you can find them on [Zenodo](/link/after/publication).

In [None]:
# Train a public localiser and segmenter network per organ-at-risk.
# NOTE: Some editing of the slurm template files will be necessary to connect to your
# GPU partition. Training is configured to use [wandb](https://wandb.ai/) reporting by
# default, but this must be enabled by setting USE_LOGGER=True in the templates.

# Option A (preferred).
# Creates jobs to train the networks on a slurm cluster.
! python scripts/slurm/steps/6/create_jobs.py       # Creates 34 slurm jobs.

# Option B.
# Runs training on local machine.
#! python scripts/python/steps/6.py       

In [None]:
# Training can be resumed upon failure with the following scripts.

# Option A (preferred).
# Creates jobs to train the networks on a slurm cluster.
! python scripts/slurm/steps/6/create_resume_jobs.py       # Creates 34 slurm jobs.

# Option B.
# Runs training on local machine.
#! python scripts/python/steps/6_resume.py       

## Preparing Institutional Data

Create institutional dataset (e.g "INST") in a similar manner to the public datasets,
by dropping all data into the "data" folder.

```
<HNAS_DATA>
    /datasets
        /dicom
            /INST
                /data
```

In [None]:
# Process DICOM data to NIFTI.
# NIFTI stores medical imaging data in a compact format and removes extraneous details 
# that are present in DICOM files.

# Option A (preferred).
# Creates jobs to process the data on a slurm cluster.
! python scripts/slurm/steps/8/create_jobs.py       # Creates 1 slurm jobs.

# Option B.
# Runs processing on local machine.
#! python scripts/python/steps/8.py       

In [None]:
# Process NIFTI data to training data.

# Option A (preferred).
# Creates jobs to process the data on a slurm cluster.
! python scripts/slurm/steps/9/create_jobs.py       # Creates 1 slurm jobs.

# Option B.
# Runs processing on local machine.
#! python scripts/python/steps/9.py       

## Training Institutional Models

In [None]:
# WARNING: This script will create 595 (17 organs-at-risk x 7 dataset sizes x 5-fold cross-validation)
# slurm jobs!!! You will need to modify the "regions", "n_trains", and "test_folds" of the script to
# initiate the jobs in smaller batches to avoid queue limits.

# Train an institutional localiser and segmenter per organ-at-risk.
# NOTE: Some editing of the slurm template files will be necessary to connect to your
# GPU partition. Training is configured to use [wandb](https://wandb.ai/) reporting by
# default, but this must be enabled by setting USE_LOGGER=True in the templates.

# Option A (preferred).
# Creates jobs to train the networks on a slurm cluster.
! python scripts/slurm/steps/10/create_jobs.py       # Creates 595 slurm jobs.

# Option B.
# Runs training on local machine.
#! python scripts/python/steps/10.py       

#