# Getting Started with Clara Train SDK
Clara Train SDK consists of different modules as depicted below 
<br>![side_bar](screenShots/TrainBlock.png)

By the end of this notebook you will:
1. Understand the components of [Medical Model ARchive (MMAR)](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/mmar.html)
2. Know how to configure train config json to train a CNN
3. Train a CNN with single and muiltple GPUs
4. Fine tune a model
5. Export a model 
6. Perform inference on test dataset 


## Prerequisites
- Nvidia GPU with 8GB of memory (Pascal or newer) 

### Resources
It maybe helpful to watch the free GTC Digital 2020 talk covering the Clara Train SDK 
- [S22563](https://developer.nvidia.com/gtc/2020/video/S22563)
Clara train Getting started: Core concepts, Bring Your Own Components (BYOC), AI assisted annotation (AIAA), AutoML 

## DataSet 
This notebook uses a sample dataset (ie. a single image of spleen dataset) provided in the package to train a network for a few epochs. 
This single file is duplicated 32 times for the training set and 9 times for validation in order to mimic the full spleen dataset. 

#### Disclaimer  
In this Notebook we run sample training jobs for one or two epochs just to highlight the core concepts. 
A relatively small neural network is also used to ensure it runs on most GPUs.    
For realistic training a user could increase the number of epochs, use larger neural networks and tune other parameters.  

# Lets get started
It is helpful to first check that we have an NVIDIA GPU available in the docker by running the cell below

In [1]:
# following command should show all gpus available 
!nvidia-smi

Mon Sep 21 13:32:21 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.57       Driver Version: 450.57       CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  GeForce RTX 208...  Off  | 00000000:B2:00.0 Off |                  N/A |
| 31%   35C    P0    64W / 250W |      0MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

The cell below defines a helper function that will be used throughout the notebook

In [1]:
MMAR_ROOT="/workspace/clara_seg_ct_brats"
print ("setting MMAR_ROOT=",MMAR_ROOT)
%ls $MMAR_ROOT

!chmod 777 $MMAR_ROOT/commands/*
def printFile(filePath,lnSt,lnOffset):
    print ("showing ",str(lnOffset)," lines from file ",filePath, "starting at line",str(lnSt))
    lnOffset=lnSt+lnOffset
    !< $filePath head -n "$lnOffset" | tail -n +"$lnSt"

setting MMAR_ROOT= /workspace/clara_seg_ct_brats
[0m[01;34mcommands[0m/  [01;34mconfig[0m/  [01;34mdocs[0m/  [01;34meval[0m/  [01;34mmodels[0m/  [01;34mresources[0m/


## Medical Model ARchive (MMAR)
Clara Train SDK uses the [Medical Model ARchive (MMAR)](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/mmar.html). 
The MMAR defines a standard structure for organizing all artifacts produced during the model development life cycle. 
The Clara Train SDK basic idea is to get started on training deep learning models using intuitive configuration files as shown below:
<br>![side_bar](screenShots/MMAR.png)


You can download sample models for different problems from [NGC](https://ngc.nvidia.com/catalog/models?orderBy=modifiedDESC&pageNumber=0&query=clara&quickFilter=&filters=) <br> 
All MMAR follow the structure provided in this Notebook. if you navigate to the parent folder structure it should contain the following subdirectories
```
./GettingStarted 
├── commands
├── config
├── docs
├── eval
├── models
└── resources
```

* `commands` contains a number of ready-to-run scripts for:
    - training
    - training with multiple GPUS
    - validation
    - inference (testing)
    - exporting models in TensorRT Inference Server format
* `config` contains configuration files (in JSON format) for each training, 
validation, and deployment for [AI-assisted annotation](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/aiaa/index.html) 
(_Note:_ these configuration files are used in the scripts under the `commands` folder)
* `docs` contains local documentation for the model, but for a more complete view it is recommended that you visit the NGC model page
* `eval` is used as the output directory for model evaluation (by default)
* `models` is where the tensorflow checkpoint-formatted model is stored (`.index`, `.meta`, `.data-xxxxx-of-xxxxx`), and the corresponding graph definition files (`fzn.pb` for frozen models, and `trt.pb` for TRT models)
* `resources` currently contains the logger configuration in the `log.config` file

Some of the most important files you will need to understand to configure and use in Clara Train SDK are:

1. `environment.json` which has important common parameters: 
    * `DATA_ROOT` is the root folder where the data with which we would like to train, validate, or test resides in
    * `DATASET_JSON` expects the path to a JSON-formatted file 
    * `MMAR_CKPT_DIR` the path to the where the tensorflow checkpoint files reside
    * `MMAR_EVAL_OUTPUT_PATH` the path to output evaluation metrics for the neural network during training, validation, and inference
    * `PROCESSING_TASK` the type of processing task the neural net is intended to perform (currently limited to `annotation`, `segmentation`, `classification`)
    * `PRETRAIN_WEIGHTS_FILE` (_optional_) 	determines the location of the pre-trained weights file; if the file does not exist and is needed, 
    the training program will download it from a predefined URL


In [2]:
printFile(MMAR_ROOT+"/config/environment.json",0,30)

showing  30  lines from file  /workspace/clara_seg_ct_brats/config/environment.json starting at line 0
{
    "DATA_ROOT": "/workspace/data/",
    "DATASET_JSON": "/workspace/clara_seg_ct_brats/config/2018train_2019test.json",
    "MMAR_CKPT_DIR": "models",
    "MMAR_EVAL_OUTPUT_PATH": "eval",
    "PROCESSING_TASK": "segmentation"
}


2. `train.sh` and `train_finetune.sh` run the commands to train the neural network based on the `config_train.json` configuration; 
this shell script can be also used to override parameters in `config_train.json` using the `--set` argument (see `train_finetune.sh`)

_Note_: The main difference between the two is that `train_finetune.sh` specifies a `ckpt` file, 
while `train.sh` does not since it is training from scratch.

Let's take a look at `train.sh` by executing the following cell.

In [8]:
# printFile(MMAR_ROOT+"/commands/train_W_Config.sh",30,30)
printFile(MMAR_ROOT+"/commands/train_finetune.sh",0,30)


showing  30  lines from file  /workspace/clara_seg_ct_brats/commands/train_finetune.sh starting at line 0
#!/usr/bin/env bash

my_dir="$(dirname "$0")"
. $my_dir/set_env.sh

echo "MMAR_ROOT set to $MMAR_ROOT"
additional_options="$*"

# Data list containing all data
CONFIG_FILE=config/config_train.json
ENVIRONMENT_FILE=config/environment.json

python -u  -m nvmidl.apps.train \
    -m $MMAR_ROOT \
    -c $CONFIG_FILE \
    -e $ENVIRONMENT_FILE \
    --set \
    DATASET_JSON=$MMAR_ROOT/config/seg_brats18_datalist_0.json \
    epochs=1250 \
    MMAR_CKPT=$MMAR_ROOT/models/model.ckpt \
    ${additional_options}


## config.json Main Concepts 


`config_train.json` contains all the parameters necessary to define the neural network, 
how is it trained (training hyper-parameters, loss, etc.), 
pre- and post-transformation functions necessary to modify and/or augment the data before input to the neural net, etc. 
The complete documentation on the training configuration is laid out 
[here](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/appendix/configuration.html#training-configuration).
The configuration file defines all training related parameters. 
This is were a researcher would spend most of their time.

<br>![s](screenShots/MMARParts.png)<br> 

Lets take some time to examine each component of this configuration file.


1. Global configurations 

In [10]:
# confFile=MMAR_ROOT+"/config/trn_base.json"
# printFile(confFile,0,10)


2. Training config which includes:
    1. Loss functions:
    [Dice](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.losses.html?highlight=dice#module-ai4med.components.losses.dice)
    , [CrossEntropy](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.losses.html?highlight=crossentropy#ai4med.components.losses.cross_entropy.CrossEntropy)
    , [Focal](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.losses.html?highlight=focal#module-ai4med.components.losses.focal)
    , [FocalDice](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.losses.html?highlight=focaldice#ai4med.components.losses.focal_dice.FocalDice) 
    , [CrossEntropyDice](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.losses.html?highlight=crossentropydice#ai4med.components.losses.cross_entropy_dice.CrossEntropyDice) 
    , [BinaryClassificationLoss](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.losses.html?highlight=binaryclassificationloss#ai4med.components.losses.classification_loss.BinaryClassificationLoss)
    , [MulticlassClassificationLoss](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.losses.html?highlight=multiclassclassificationloss#ai4med.components.losses.classification_loss.MulticlassClassificationLoss)
    , [WeightedMulticlassClassificationLoss](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.losses.html?highlight=weightedmulticlassclassificationloss#ai4med.components.losses.classification_loss.WeightedMulticlassClassificationLoss)
    2. Optimizer
    [Momentum]()
    , [Adam]()
    , [NovaGrad]()
    3. Network architecture
    [SegAhnet]()
    , [SegResnet]()
    , [Unet]()
    , [UnetParallel]()
    , [DenseNet121]()
    , [Alexnet]()
    4. Learing rate Policy 
    [ReducePoly]()
    , [DecayOnStep]()
    , [ReduceCosine]()
    , [ReduceOnPlateau]()
    5. Image pipeline
        1. Classification 
        , [ClassificationImagePipeline]()
        , [ClassificationImagePipelineWithCache]()
        , [ClassificationKerasImagePipeline]()
        , [ClassificationKerasImagePipelineWithCache]()
        2. Segmenatation 
        , [SegmentationImagePipeline]()
        , [SegmentationImagePipelineWithCache]()
        , [SegmentationKerasImagePipeline]()
        , [SegmentationKerasImagePipelineWithCache]()    
    4. Pretransforms
        1. Loading transformations:
            [LoadNifti](https://docs.nvidia.com/clara/tlt-m[i/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=loadnifti#ai4med.components.transforms.load_nifti.LoadNifti)
            , [LoadPng](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=loadpng#ai4med.components.transforms.load_png.LoadPng)
            , [ConvertToChannelsFirst](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=converttochannelsfirst#ai4med.components.transforms.convert_to_channels_first.ConvertToChannelsFirst)
            , [LoadImageMasksFromNumpy](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=loadimagemasksfromnumpy#ai4med.components.transforms.load_image_masks_from_numpy.LoadImageMasksFromNumpy)
            , [LoadJpg](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=loadjpg#ai4med.components.transforms.load_jpg.LoadJpg)
        2. Resample Transformation
            [RepeatChannel](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=repeatchannel#ai4med.components.transforms.repeat_channel.RepeatChannel)
            , [ScaleByFactor](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=scalebyfactor#ai4med.components.transforms.scale_by_factor.ScaleByFactor)
            , [ScaleByResolution](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=scalebyresolution#ai4med.components.transforms.scale_by_resolution.ScaleByResolution)
            , [ScaleBySpacing](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=scalebyspacing#ai4med.components.transforms.scale_by_spacing.ScaleBySpacing)
            , [ScaleToShape](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=scaletoshape#ai4med.components.transforms.scale_to_shape.ScaleToShape)
            , [RestoreOriginalShape](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=restoreoriginalshape#ai4med.components.transforms.restore_original_shape.RestoreOriginalShape)
            , [LoadImageMasksFromNumpy](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=loadimagemasksfromnumpy#ai4med.components.transforms.load_image_masks_from_numpy.LoadImageMasksFromNumpy)
        3. Cropping transformations
            [CropForegroundObject](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=cropforegroundobject#ai4med.components.transforms.crop_foreground_object.CropForegroundObject)
            , [FastPosNegRatioCropROI](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=fastposnegratiocroproi#ai4med.components.transforms.fast_pos_neg_ratio_crop_roi.FastPosNegRatioCropROI)
            , [CropByPosNegRatio](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=cropbyposnegratio#ai4med.components.transforms.crop_by_pos_neg_ratio.CropByPosNegRatio)
            , [SymmetricPadderDiv](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=symmetricpadderdiv#ai4med.components.transforms.symmetric_padder_div.SymmetricPadderDiv)
            , [FastCropByPosNegRatio](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=fastcropbyposnegratio#ai4med.components.transforms.fast_crop_by_pos_neg_ratio.FastCropByPosNegRatio)
            , [CropByPosNegRatioLabelOnly](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=cropbyposnegratiolabelonly#ai4med.components.transforms.crop_by_pos_neg_ratio_label_only.CropByPosNegRatioLabelOnly)
            , [CropForegroundObject](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=cropforegroundobject#ai4med.components.transforms.crop_foreground_object.CropForegroundObject)
            , [CropSubVolumeCenter](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=cropsubvolumecenter#ai4med.components.transforms.crop_sub_volume_center.CropSubVolumeCenter)
            , [CropRandomSizeWithDisplacement](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=croprandomsizewithdisplacement#ai4med.components.transforms.crop_random_size_w_displacement.CropRandomSizeWithDisplacement)
            , [CropFixedSizeRandomCenter](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=cropfixedsizerandomcenter#ai4med.components.transforms.crop_fixed_size_random_center.CropFixedSizeRandomCenter)
        4. Deformable transformations
            [FastPosNegRatioCropROI](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=fastposnegratiocroproi#ai4med.components.transforms.fast_pos_neg_ratio_crop_roi.FastPosNegRatioCropROI)
        5. Intensity Transforms
            [ScaleIntensityRange](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=scaleintensityrange#ai4med.components.transforms.scale_intensity_range.ScaleIntensityRange)
            , [ScaleIntensityOscillation](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=scaleintensityoscillation#ai4med.components.transforms.scale_intensity_oscillation.ScaleIntensityOscillation)
            , [AddGaussianNoise](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=addgaussiannoise#ai4med.components.transforms.add_gaussian_noise.AddGaussianNoise)
            , [NormalizeNonzeroIntensities](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=normalizenonzerointensities#ai4med.components.transforms.normalize_nonzero_intensities.NormalizeNonzeroIntensities)
            , [CenterData](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=centerdata#ai4med.components.transforms.center_data.CenterData)
            , [AdjustContrast](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=adjustcontrast#ai4med.components.transforms.adjust_contrast.AdjustContrast)
            , [RandomGaussianSmooth]()
            , [RandomMRBiasField]()
        6. Augmentation Transforms
            [RandomZoom](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=randomzoom#ai4med.components.transforms.random_zoom.RandomZoom)
            , [RandomAxisFlip](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=randomaxisflip#ai4med.components.transforms.random_axis_flip.RandomAxisFlip)
            , [RandomSpatialFlip](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=randomspatialflip#ai4med.components.transforms.random_spatial_flip.RandomSpatialFlip)
            , [RandomRotate2D](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=randomrotate2d#ai4med.components.transforms.random_rotate_2d.RandomRotate2D)
            , [RandomRotate3D](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=randomrotate3d#ai4med.components.transforms.random_rotate_3d.RandomRotate3D)
        7. Special transforms 
            [AddExtremePointsChannel](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=addextremepointschannel#ai4med.components.transforms.add_extreme_points_channel.AddExtremePointsChannel)
            , [SplitAcrossChannels](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=splitacrosschannels#ai4med.components.transforms.split_across_channels.SplitAcrossChannels)
            , [SplitBasedOnLabel](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=splitbasedonlabel#ai4med.components.transforms.split_based_on_label.SplitBasedOnLabel)
            , [ThresholdValues](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=thresholdvalues#ai4med.components.transforms.apply_threshold.ThresholdValues)
            , [SplitBasedOnBratsClasses](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=splitbasedonbratsclasses#ai4med.components.transforms.split_based_on_brats_classes.SplitBasedOnBratsClasses)
            , [ConvertToMultiChannelBasedOnLabel](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=converttomultichannelbasedonlabel#ai4med.components.transforms.convert_to_multi_channel_based_on_label.ConvertToMultiChannelBasedOnLabel)
            , [KeepLargestCC](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=keeplargestcc#ai4med.components.transforms.keep_largest_connected_component.KeepLargestCC)
            , [CopyProperties](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=copyproperties#ai4med.components.transforms.copy_properties.CopyProperties)
            , [ConvertToMultiChannelBasedOnBratsClasses](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=converttomultichannelbasedonbratsclasses#ai4med.components.transforms.convert_to_multi_channel_based_on_brats_classes.ConvertToMultiChannelBasedOnBratsClasses)
            , [ArgmaxAcrossChannels](https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/apidocs/ai4med/ai4med.components.transforms.html?highlight=argmaxacrosschannels#ai4med.components.transforms.argmax_across_channels.ArgmaxAcrossChannels)       

        

In [11]:
confFile=MMAR_ROOT+"/config/trn_base.json"
printFile(confFile,9,8)
printFile(confFile,16,8)
printFile(confFile,25,20)
printFile(confFile,108,15)

showing  8  lines from file  /workspace/clara_seg_ct_brats/config/trn_base.json starting at line 9
/bin/dash: 1: cannot open /workspace/clara_seg_ct_brats/config/trn_base.json: No such file
showing  8  lines from file  /workspace/clara_seg_ct_brats/config/trn_base.json starting at line 16
/bin/dash: 1: cannot open /workspace/clara_seg_ct_brats/config/trn_base.json: No such file
showing  20  lines from file  /workspace/clara_seg_ct_brats/config/trn_base.json starting at line 25
/bin/dash: 1: cannot open /workspace/clara_seg_ct_brats/config/trn_base.json: No such file
showing  15  lines from file  /workspace/clara_seg_ct_brats/config/trn_base.json starting at line 108
/bin/dash: 1: cannot open /workspace/clara_seg_ct_brats/config/trn_base.json: No such file




3. Validation config which includes:
    1. Metric 
    2. pre-transforms. Since these transforms are usually a subset from the pre-transforms in the training section, 
    we can use the alias to point to these transforms by name as ` "ref": "LoadNifti"`. 
    In case we use 2 transforms with the same name as `ScaleByResolution` 
    we can give each an alias to refer to as `"name": "ScaleByResolution#ScaleImg"` 
    then refer to it in the validation section as `ScaleImg` 
    3. Image pipeline
    4. Inference

In [8]:
confFile=MMAR_ROOT+"/config/trn_base.json"
printFile(confFile,120,13)
printFile(confFile,135,16)
printFile(confFile,152,12)
printFile(confFile,164,10)

showing  13  lines from file  /claraDevDay/MMARs/GettingStarted//config/trn_base.json starting at line 120
      }
    }
  },
  "validate": {
    "metrics": [
      {
        "name": "ComputeAverageDice",
        "args": {
          "name": "mean_dice",
          "is_key_metric": true,
          "field": "model",
          "label_field": "label"
        }
      }
showing  16  lines from file  /claraDevDay/MMARs/GettingStarted//config/trn_base.json starting at line 135
    "pre_transforms": [
       {
         "ref": "LoadNifti"
       },
       {
         "ref": "ConvertToChannelsFirst"
       },
       {
         "ref": "ScaleImg"
       },
       {
         "ref": "ScaleLb"
       },
       {
         "ref": "ScaleIntensityRange"
       }
    ],
showing  12  lines from file  /claraDevDay/MMARs/GettingStarted//config/trn_base.json starting at line 152
    "image_pipeline": {
      "name": "SegmentationImagePipeline",
      "args": {
        "data_list_file_path": "{DATASET_JSON}",
   


## Start TensorBoard 
Before launching a training run or while the neural network is training, 
users can monitor the accuracy and other metrics using tensorboard in a side jupyter lab tab as shown below
 <br>![tb](screenShots/TensorBoard.png)<br> 


## Lets start training
Now that we have our training configuration, to start training simply run `train.sh` as below. 
Please keep in mind that we have setup a dummy dataset with one file to train a small neural network quickly (we only train for 2 epochs). 
Please see exercises on how to easily switch data and train a real segmentation network.

**_Note:_** We have renamed `train.sh` to `train_W_Config.sh` as we modified it to accept parameters with the configuration to use       

In [7]:
! $MMAR_ROOT/commands/train.sh

MMAR_ROOT set to /workspace/clara_seg_ct_brats/commands/..
2020-09-21 13:50:45.320073: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1


--------------------------------------------------------------------------
[[33331,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: b90125841dd2

Another transport will be used instead, although this may result in
lower performance.

btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
Using TensorFlow backend.
2020-09-21 13:50:47,560 - TrainConfiger - INFO - DETERMINISM IS ON
2020-09-21 13:50:47,566 - nvmidl.utils.train_conf - INFO - Automatic Mixed Precision status: Disabled
Number of samples: 285
Data Property: {'task': 'segmentation', 'num_channels': 4, 'num_label_channels': 3, 'data_format': 'channels_first', 'lab

Now let us navigate the `models` directory, which would includes out models and the tensorboard files 

In [10]:
! ls -la $MMAR_ROOT/models

total 62568
drwxrwxr-x 3 1001 1001     4096 Sep 22 01:38 .
drwxrwxr-x 8 1001 1001     4096 Sep 21 12:56 ..
drwxr-xr-x 2 root root     4096 Sep 21 13:34 .ipynb_checkpoints
-rw-rw-r-- 1 1001 1001       49 Jun 30 15:24 _gitignore
-rw-r--r-- 1 root root       77 Sep 22 01:01 checkpoint
-rw-r--r-- 1 root root  5003165 Sep 22 01:36 events.out.tfevents.1600696255.b90125841dd2
-rw-r--r-- 1 root root 56426740 Sep 22 01:01 model.ckpt.data-00000-of-00001
-rw-r--r-- 1 root root    10048 Sep 22 01:01 model.ckpt.index
-rw-r--r-- 1 root root  2599518 Sep 22 01:01 model.ckpt.meta




## Export Model

To export the model we simply run `export.sh` which will: 
- Remove back propagation information from checkpoint files
- Generate two frozen graphs in the models folder
This optimized model will be used by Triton Inference server in the Clara Deploy SDK.


In [11]:
! $MMAR_ROOT/commands/export.sh

MMAR_ROOT set to /workspace/clara_seg_ct_brats/commands/..
2020-09-22 01:40:50.984453: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1
2020-09-22 01:40:52.999138: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1


Creating a regular frozen graph from Checkpoint at '/workspace/clara_seg_ct_brats/commands/../models' ...
Loaded meta graph file '/workspace/clara_seg_ct_brats/commands/../models/model.ckpt.meta
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2020-09-22 01:40:53.091228: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-09-22 01:40:53.733796: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:b2:00.0
2020-09-22 



lets check out what was created in the folder. 
after running cell below you should see:
1. Frozen File Generated: /claraDevDay/MMARs/GettingStarted/commands/../models/trn_base/model.fzn.pb
2. TRT File Generated: /claraDevDay/MMARs/GettingStarted/commands/../models/trn_base/model.trt.pb


In [13]:
!ls -la $MMAR_ROOT/models/*.pb

-rw-r--r-- 1 root root 19622543 Sep 22 01:40 /workspace/clara_seg_ct_brats/models/model.fzn.pb
-rw-r--r-- 1 root root 19506006 Sep 22 01:40 /workspace/clara_seg_ct_brats/models/model.trt.pb



## Evaluate and Prediction 
Now that we have trained our model we would like to run evaluation to get some statistics and also do inference to see the resulting prediction.


### 1. Evaluate 
To run evaluation on your validation dataset you should run `validate.sh`. 
This will run evaluation on the validation dataset and place it in the `MMAR_EVAL_OUTPUT_PATH` as configured in the [environment.json](config/environment.json) 
file (default is eval folder). 
This evaluation would give min, max, mean of the metric as specified in the config_validation file


In [15]:
! $MMAR_ROOT/commands/validate.sh

MMAR_ROOT set to /workspace/clara_seg_ct_brats/commands/..
2020-09-22 01:42:13.365580: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1


--------------------------------------------------------------------------
[[35311,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: b90125841dd2

Another transport will be used instead, although this may result in
lower performance.

btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
Using TensorFlow backend.
2020-09-22 01:42:15,446 - nvmidl.utils.train_conf - INFO - Automatic Mixed Precision status: Disabled
Previously evaluated: 0 ; To be evaluated: 50
2020-09-22 01:42:15.759927: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2100000000 Hz
2020-09-22 01:42:15.766242: I tensorflow/compiler/

Now let us see the results in the folder by running cells below. 
You should see statistics and dice per file in the validation dataset

In [16]:
! ls -la $MMAR_ROOT/eval/

total 92
drwxrwxr-x 2 1001 1001  4096 Sep 22 01:47 .
drwxrwxr-x 8 1001 1001  4096 Sep 21 12:56 ..
-rw-rw-r-- 1 1001 1001     0 Jun 30 15:24 _gitignore
-rw-r--r-- 1 root root 21593 Sep 22 01:47 mean_dice_ET_raw_results.txt
-rw-r--r-- 1 root root   148 Sep 22 01:47 mean_dice_ET_summary_results.txt
-rw-r--r-- 1 root root 21597 Sep 22 01:47 mean_dice_TC_raw_results.txt
-rw-r--r-- 1 root root   148 Sep 22 01:47 mean_dice_TC_summary_results.txt
-rw-r--r-- 1 root root 21591 Sep 22 01:47 mean_dice_WT_raw_results.txt
-rw-r--r-- 1 root root   148 Sep 22 01:47 mean_dice_WT_summary_results.txt


In [23]:
# statistic summary
!cat $MMAR_ROOT/eval/mean_dice_ET_summary_results.txt
!cat $MMAR_ROOT/eval/mean_dice_WT_summary_results.txt
!cat $MMAR_ROOT/eval/mean_dice_TC_summary_results.txt

mean_dice_ET (statistics of 50 valid cases):
    mean  median     max     min   90percent   std
   0.874   0.891   0.960   0.520   0.811     0.075

mean_dice_WT (statistics of 50 valid cases):
    mean  median     max     min   90percent   std
   0.918   0.931   0.968   0.664   0.886     0.058

mean_dice_TC (statistics of 50 valid cases):
    mean  median     max     min   90percent   std
   0.906   0.944   0.974   0.494   0.827     0.102



In [19]:
!cat $MMAR_ROOT/eval/mean_dice_ET_raw_results.txt

/workspace/data/MICCAI_BraTS_2019_Data_Training/HGG/BraTS19_CBICA_BGX_1/BraTS19_CBICA_BGX_1_t1ce.nii.gz,/workspace/data/MICCAI_BraTS_2019_Data_Training/HGG/BraTS19_CBICA_BGX_1/BraTS19_CBICA_BGX_1_t1.nii.gz,/workspace/data/MICCAI_BraTS_2019_Data_Training/HGG/BraTS19_CBICA_BGX_1/BraTS19_CBICA_BGX_1_t2.nii.gz,/workspace/data/MICCAI_BraTS_2019_Data_Training/HGG/BraTS19_CBICA_BGX_1/BraTS19_CBICA_BGX_1_flair.nii.gz	0.886624675805854
/workspace/data/MICCAI_BraTS_2019_Data_Training/HGG/BraTS19_CBICA_AYG_1/BraTS19_CBICA_AYG_1_t1ce.nii.gz,/workspace/data/MICCAI_BraTS_2019_Data_Training/HGG/BraTS19_CBICA_AYG_1/BraTS19_CBICA_AYG_1_t1.nii.gz,/workspace/data/MICCAI_BraTS_2019_Data_Training/HGG/BraTS19_CBICA_AYG_1/BraTS19_CBICA_AYG_1_t2.nii.gz,/workspace/data/MICCAI_BraTS_2019_Data_Training/HGG/BraTS19_CBICA_AYG_1/BraTS19_CBICA_AYG_1_flair.nii.gz	0.8989393296563428
/workspace/data/MICCAI_BraTS_2019_Data_Training/HGG/BraTS19_CBICA_BCF_1/BraTS19_CBICA_BCF_1_t1ce.nii.gz,/workspace/data/MICCAI_BraTS_2019

### 2. Predict

To run inference on validation dataset or test dataset you should run `infer.sh`. 
This will run prediction on the validation dataset and place it in the `MMAR_EVAL_OUTPUT_PATH` as configured in the 
[environment.json](config/environment.json) file (default is eval folder)


In [3]:
! $MMAR_ROOT/commands/infer.sh

MMAR_ROOT set to /workspace/clara_seg_ct_brats/commands/..
2020-09-30 16:45:37.737657: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1


--------------------------------------------------------------------------
[[29558,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: 13dfd1518867

Another transport will be used instead, although this may result in
lower performance.

btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
Using TensorFlow backend.
2020-09-30 16:45:39,719 - nvmidl.utils.train_conf - INFO - Automatic Mixed Precision status: Disabled
Previously evaluated: 0 ; To be evaluated: 50
2020-09-30 16:45:40.111880: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2100000000 Hz
2020-09-30 16:45:40.121138: I tensorflow/compiler/

In [9]:
!$MMAR_ROOT/commands/infer_2020_validation.sh

MMAR_ROOT set to /workspace/clara_seg_ct_brats/commands/..
2020-09-30 17:10:03.528800: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1


--------------------------------------------------------------------------
[[30286,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: 13dfd1518867

Another transport will be used instead, although this may result in
lower performance.

btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
Using TensorFlow backend.
2020-09-30 17:10:05,374 - nvmidl.utils.train_conf - INFO - Automatic Mixed Precision status: Disabled
Previously evaluated: 0 ; To be evaluated: 125
2020-09-30 17:10:05.727741: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2100000000 Hz
2020-09-30 17:10:05.735691: I tensorflow/compiler

Now lets see results in the folder

In [None]:
! ls -la $MMAR_ROOT/eval/

In [None]:
! ls -la $MMAR_ROOT/eval/spleen_8

## Multi-GPU Training
Clara train aims to simplify scaling and the utilization of all available gpu resources. 
Using the same config we already used for train we can simply invoke `train_2gpu.sh` to train on multiple gpus. 
We use MPI and Horovod to speed up training and passing weights between GPUs as shown below
<br>![tb](screenShots/MultiGPU.png)<br> 

Let us examine the `train_2gpu.sh` script by running cell below. 
You can see we are changing the learning rate as the batch size has doubled.

In [None]:
printFile(MMAR_ROOT+"/commands/train_2gpu.sh",0,50)

Lets give it a try and run cell below to train on 2 gpus

In [None]:
! $MMAR_ROOT/commands/train_2gpu.sh



# Exercise:
Now that you are familiar with the Clara Train SDK, you can try to: 
1. Train on a full spleen dataset; to do this you could:
    1. Download the spleen dataset using the [download](download) Notebook
    2. Switch the dataset file in the [environment.json](config/environment.json)
    3. rerun `train.sh`
2. Explore different model architectures, losses, transformations by modifying or creating a new config file and running training
3. Experiment with multi-GPU training by changing the number of gpus to train on from 2 to 3 or 4. 
You can edit [train_2gpu.sh](commands/train_2gpu.sh) then rerun the script.



# 合并