# Deployment using GaNDLF
In this document, we will use GaNDLF to train and deploy a 'production' quality segmentation model using the data that we labeled in the previous step. GaNDLF provides state-of-the-art segmentation models and multi-GPU training, and it facilitates best practices for model training. Full usage instructions can be found here: https://mlcommons.github.io/GaNDLF/usage/

## Prepare data CSV for GaNDLF
First, we need to create a CSV file that tells GaNDLF where our images and segmentation (label) data are stored. There are many ways to accomplish this, but a pythonic way is provided below. We will create a single CSV, which GaNDLF will automatically split into training, validation, and testing sets. More complete instructions for creating a data CSV can be found here: https://mlcommons.github.io/GaNDLF/usage/#constructing-the-data-csv

In [3]:
from glob import glob
import os
import csv

# get lists of image files and segmentations
image_dir = './nifti_resampled'
seg_dir = './nifti_resampled/labels/final'
images = sorted(glob(os.path.join(image_dir, '*.nii.gz')))
labels = sorted(glob(os.path.join(seg_dir, '*.nii.gz')))

# make sure we are only including data that has both an image and a label file,
# pairing them by file ID so that the CSV rows cannot get misaligned
image_by_id = {os.path.basename(image).split('.')[0]: image for image in images}
label_by_id = {os.path.basename(label).split('.')[0]: label for label in labels}
subject_ids = sorted(set(image_by_id) & set(label_by_id))

# prepare data for the CSV for GaNDLF
# convert image and label paths to absolute paths
images_abs = [os.path.abspath(image_by_id[subject_id]) for subject_id in subject_ids]
labels_abs = [os.path.abspath(label_by_id[subject_id]) for subject_id in subject_ids]
# make the header line for the CSV
csv_header = ['SubjectID', 'Channel_0', 'Label']
data = list(zip(subject_ids, images_abs, labels_abs))  # zip file path data together

# write data CSV file
csv_file = 'data.csv'
with open(csv_file, 'w', newline='') as f:  # newline='' avoids blank rows on Windows
    writer = csv.writer(f)
    writer.writerow(csv_header)
    writer.writerows(data)
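
Before moving on, it can be worth confirming that the CSV written above has the expected header and that every listed file actually exists. The following is a minimal sketch of such a check, not part of GaNDLF itself; the helper name `check_data_csv` is hypothetical, and the header it expects is simply the one used in the cell above.

```python
import csv
import os


def check_data_csv(csv_path):
    """Return a list of problems found in the data CSV (empty if none)."""
    # Header assumed from the cell above; adjust if your CSV has more channels.
    expected = ['SubjectID', 'Channel_0', 'Label']
    problems = []
    with open(csv_path, newline='') as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != expected:
            problems.append(f'unexpected header: {reader.fieldnames}')
            return problems
        rows = list(reader)
    if not rows:
        problems.append('CSV contains no subjects')
    for row in rows:
        # Every image and label path should point at an existing file.
        for column in ('Channel_0', 'Label'):
            if not os.path.isfile(row[column]):
                problems.append(f"{row['SubjectID']}: missing file {row[column]}")
    return problems


# Example usage, run after data.csv has been written:
# for problem in check_data_csv('data.csv'):
#     print(problem)
```

An empty list means every subject has an image and a label on disk, which is the situation GaNDLF expects when it reads the CSV.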

## Identify GaNDLF config file and prepare output directories
GaNDLF uses a YAML text file to keep track of all the necessary model and training parameters. We have included a sample config file, `gandlf.yml`, in the `configs` directory of this code repository. This config file can be fully customized, but we will use it as-is for this demo. Note that this config file also controls the automated training/validation/testing splits. Additional config file examples can be found here: https://github.com/mlcommons/GaNDLF/tree/master/samples

In [4]:
import os

# make output directories
# 'gandlf_out' receives the training output; 'mlcube_out' receives the packaged MLCube
for out_dir in ('gandlf_out', 'mlcube_out'):
    os.makedirs(out_dir, exist_ok=True)  # no error if the directory already exists

## Train the model
Now we are ready to train the model in GaNDLF. 

<span style="color:yellow">WARNING: If you have a CUDA-compatible device and want to use GPU-accelerated training (recommended), make sure you set the CUDA_VISIBLE_DEVICES environment variable before starting training. For example: `export CUDA_VISIBLE_DEVICES=0`</span>

Training can be initiated with a single command in the terminal:

```bash
python gandlf_run -c configs/gandlf.yml -i data.csv -m gandlf_out -t True -d cuda
```

<span style="color:red">WARNING: Model training may take hours to days to complete depending on your hardware and configuration.</span>

### Explanation of arguments
  * `-h` - Show help message and exit
  * `-v` - Show program's version number and exit.
  * `-c` - Model configuration file (must be valid YAML)
  * `-i` - Data in CSV format
  * `-m` - Model directory where the output of the training will be stored, created if not present
  * `-t` - True == train, False == inference
  * `-d` - Device to run on: `cuda` for GPU (make sure the CUDA_VISIBLE_DEVICES environment variable is set) or `cpu` for CPU workloads
 
More detailed training instructions can be found here: https://mlcommons.github.io/GaNDLF/usage/#running-gandlf-traininginference
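
If you would rather start training from Python (for example, from a notebook cell) than from the terminal, the command above can be wrapped with `subprocess`. This is only a sketch under a few assumptions: the helper names `build_train_command` and `run_training` are hypothetical, the `gandlf_run` script is assumed to be reachable from the current working directory exactly as in the terminal command, and the `gpu_ids` default of `'0'` mirrors the `export` example above.

```python
import os
import subprocess


def build_train_command(config, data_csv, model_dir, device='cuda'):
    """Assemble the training command shown above as an argument list."""
    return [
        'python', 'gandlf_run',
        '-c', config,
        '-i', data_csv,
        '-m', model_dir,
        '-t', 'True',
        '-d', device,
    ]


def run_training(config, data_csv, model_dir, device='cuda', gpu_ids='0'):
    """Run training in a subprocess, setting CUDA_VISIBLE_DEVICES for GPU runs."""
    env = os.environ.copy()
    if device == 'cuda':
        # Equivalent to `export CUDA_VISIBLE_DEVICES=0` in the terminal.
        env['CUDA_VISIBLE_DEVICES'] = gpu_ids
    command = build_train_command(config, data_csv, model_dir, device)
    # check=True raises CalledProcessError if training exits with an error.
    return subprocess.run(command, env=env, check=True)


# Example usage (this starts a long-running training job):
# run_training('configs/gandlf.yml', 'data.csv', 'gandlf_out')
```

Passing the environment explicitly keeps the GPU selection local to the training process instead of changing it for the whole notebook session.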

## Package model for deployment
Once training is done, we can package the model as an MLCube for straightforward sharing and deployment. This is done with a single command in the terminal:


<span style="color:yellow">WARNING: Deployment requires a functional Docker engine. Please refer to instructions in `dicom2deployment-initial-setup.ipynb`.</span>


```bash
python gandlf_deploy -c configs/gandlf.yml -m gandlf_out --target docker --mlcube-root configs -o mlcube_out --mlcube-type model
```

### Explanation of arguments
  * `-h` - Show help message and exit
  * `-c` - Model configuration file
  * `-m` - Model directory where the output of the training was saved
  * `--target` - The target platform (--help will show all available targets)
  * `--mlcube-root` - Directory containing mlcube.yaml (provided in the `configs` directory of this code repository)
  * `--mlcube-type` - MLCube type (should be `model`)
  * `-o` - Output directory where a new mlcube.yaml file to be distributed with your image will be created

More detailed deployment instructions can be found here: https://mlcommons.github.io/GaNDLF/usage/#deployment
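
Because deployment depends on a working Docker engine, a quick check before running the command above can save a failed packaging attempt. The sketch below is an illustrative helper, not part of GaNDLF; the function name `docker_is_available` is hypothetical, and it assumes that `docker version` exits with a non-zero status when the engine cannot be reached.

```python
import shutil
import subprocess


def docker_is_available(executable='docker'):
    """Return True if the Docker CLI is on PATH and can reach a running engine."""
    if shutil.which(executable) is None:
        return False  # Docker CLI is not installed or not on PATH
    try:
        result = subprocess.run(
            [executable, 'version'],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # Assumption: a non-zero exit status means the engine is not reachable.
    return result.returncode == 0


# Example usage:
# if not docker_is_available():
#     print('Docker does not appear to be running; see dicom2deployment-initial-setup.ipynb')
```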

## Finished!
Congratulations! You have completed the deployment section.