Copyright (c) MONAI Consortium  
Licensed under the Apache License, Version 2.0 (the "License");  
you may not use this file except in compliance with the License.  
You may obtain a copy of the License at  
&nbsp;&nbsp;&nbsp;&nbsp;http://www.apache.org/licenses/LICENSE-2.0  
Unless required by applicable law or agreed to in writing, software  
distributed under the License is distributed on an "AS IS" BASIS,  
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  
See the License for the specific language governing permissions and  
limitations under the License.

# MONAI Auto3DSeg Reference Python APIs

In this notebook, we break Auto3DSeg down into the modules of its pipeline and introduce the corresponding Python API calls and CLI commands. In particular, if you have used the `AutoRunner` class, we map the `AutoRunner` commands and configurations to the APIs of each Auto3DSeg module.

![workflow](../figures/workflow.png)

## Setup environment

In [None]:
!python -c "import monai" || pip install -q "monai-weekly[nibabel, tqdm]"

## Setup imports

In [None]:
import os
import tempfile

from monai.apps import download_and_extract
from monai.apps.auto3dseg import (
    DataAnalyzer,
    BundleGen,
    AlgoEnsembleBestN,
    AlgoEnsembleBuilder,
    export_bundle_algo_history,
    import_bundle_algo_history,
)
from monai.auto3dseg import algo_to_pickle
from monai.bundle.config_parser import ConfigParser
from monai.config import print_config

print_config()

## Download dataset

We provide a toy datalist file that splits a subset of the downloaded datasets into five folds.
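
To make the fold layout concrete, the sketch below builds a datalist of the same general shape using only the standard library. The key names (`training`, `fold`, `image`, `label`), the helper `make_fold_datalist`, and the file names are assumptions for illustration; the JSON file shipped with the tutorial remains the authoritative reference.

```python
import json


def make_fold_datalist(case_ids, num_fold=5):
    """Assign cases to folds round-robin and return a datalist dictionary.

    The key names and relative paths are illustrative assumptions.
    """
    training = [
        {
            "fold": index % num_fold,
            "image": f"imagesTr/{case_id}.nii.gz",
            "label": f"labelsTr/{case_id}.nii.gz",
        }
        for index, case_id in enumerate(case_ids)
    ]
    return {"training": training}


# Ten hypothetical case identifiers split into five folds of two cases each.
example_datalist = make_fold_datalist([f"hippocampus_{i:03d}" for i in range(10)])
print(json.dumps(example_datalist["training"][:2], indent=2))
```

Round-robin assignment is only one possible splitting strategy; it is used here because it keeps the folds balanced without any randomness.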

In [None]:
directory = os.environ.get("MONAI_DATA_DIRECTORY")
root_dir = tempfile.mkdtemp() if directory is None else directory
print(root_dir)

msd_task = "Task04_Hippocampus"
resource = "https://msd-for-monai.s3-us-west-2.amazonaws.com/" + msd_task + ".tar"

compressed_file = os.path.join(root_dir, msd_task + ".tar")
dataroot = os.path.join(root_dir, msd_task)
if not os.path.exists(dataroot):
    download_and_extract(resource, compressed_file, root_dir)

datalist_file = os.path.join("..", "tasks", "msd", msd_task, "msd_" + msd_task.lower() + "_folds.json")

## Prepare an input YAML configuration

In [None]:
input_cfg = {
    "name": msd_task,  # optional, it is only for your own record
    "task": "segmentation",  # optional, it is only for your own record
    "modality": "MRI",  # required
    "datalist": datalist_file,  # required
    "dataroot": dataroot,  # required
}
input = "./input.yaml"
ConfigParser.export_config_file(input_cfg, input)
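
Since three of the keys above are marked as required, it can help to check a configuration dictionary before exporting it. The following is a minimal sketch with a hypothetical helper, `missing_required_keys`; it is not part of MONAI and only mirrors the "required" comments in the cell above.

```python
# Keys marked as "required" in the input configuration above.
REQUIRED_KEYS = ("modality", "datalist", "dataroot")


def missing_required_keys(cfg):
    """Return the required keys that are absent or empty in the configuration."""
    return [key for key in REQUIRED_KEYS if not cfg.get(key)]


# A deliberately incomplete configuration with made-up values.
example_cfg = {"name": "Task04_Hippocampus", "modality": "MRI", "datalist": "folds.json"}
print(missing_required_keys(example_cfg))  # ['dataroot']
```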

## Breaking down the AutoRunner

Below is the typical usage of `AutoRunner`:
```python
from monai.apps.auto3dseg import AutoRunner

runner = AutoRunner(input=input)
runner.run()
```

These two lines cover the typical settings in Auto3DSeg. The following sections walk through the internal API calls behind them.

## Data Analysis

When the `analyze` flag is set to `True`, `AutoRunner` calls `DataAnalyzer` to analyze the datasets and generate a statistical report in YAML. Below are the equivalent Python API calls of `DataAnalyzer`:


In [None]:
work_dir = "./ref_api_work_dir"

os.makedirs(work_dir, exist_ok=True)  # create the working directory if it does not exist
datastats_file = os.path.join(work_dir, "data_stats.yaml")
analyser = DataAnalyzer(datalist_file, dataroot, output_path=datastats_file)
datastat = analyser.get_all_case_stats()

print("datalist file: ", os.path.abspath(datalist_file))
print("dataroot path: ", os.path.abspath(dataroot))
print("datastat path: ", os.path.abspath(datastats_file))

Besides the Python API call, users can also use the command line interface (CLI) provided by Python Fire:

```bash
python -m monai.apps.auto3dseg DataAnalyzer get_all_case_stats \
    --datalist="<datalist file>" \
    --dataroot="<dataroot path>" \
    --output_path="<datastat path>"
```
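
When the CLI needs to be launched from a Python script, the command above can be assembled as an argument list. This is a minimal sketch: the helper name `build_data_analyzer_command` and the example paths are hypothetical, and the command is only printed, not executed.

```python
import shlex
import sys


def build_data_analyzer_command(datalist, dataroot, output_path):
    """Build the argument list for the DataAnalyzer CLI call shown above."""
    return [
        sys.executable,
        "-m",
        "monai.apps.auto3dseg",
        "DataAnalyzer",
        "get_all_case_stats",
        f"--datalist={datalist}",
        f"--dataroot={dataroot}",
        f"--output_path={output_path}",
    ]


# Placeholder paths; replace them with the values printed by the cell above.
command = build_data_analyzer_command("folds.json", "/data/Task04_Hippocampus", "data_stats.yaml")
print(shlex.join(command))
# To execute it, pass the list to subprocess.run(command, check=True).
```

Passing a list rather than a single string avoids shell quoting problems when paths contain spaces.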

## Algorithm Generation (algo_gen)

When the `algo_gen` flag is set to `True`, `AutoRunner` uses `BundleGen` to generate MONAI bundles from templated algorithms in the working directory.

The templated algorithms are customized for the datasets when the `generate` method is called. In detail, the `generate` method fills the templates using information from the data_stats report. It also copies the necessary scripts (`train.py`/`infer.py`) to the algorithm folder. Finally, it creates an `algo_object.pkl` file to save the `Algo` object so that it can be instantiated on a local or remote machine. Cross validation is used by default, and `num_fold` can be set to 1 if the users do not want cross validation.

Below are the equivalent Python API calls of `BundleGen`:

In [None]:
bundle_generator = BundleGen(
    algo_path=work_dir,
    data_stats_filename=datastats_file,
    data_src_cfg_name=input,
)

bundle_generator.generate(work_dir, num_fold=5)

print("algo path: ", os.path.abspath(work_dir))
print("data_stats file: ", os.path.abspath(datastats_file))
print("task input file: ", os.path.abspath(input))

Besides the Python API call, users can also use the command line interface (CLI). One example is the following bash command:

```bash
python -m monai.apps.auto3dseg BundleGen generate \
    --algo_path="<algo path>" \
    --data_stats_filename="<data_stats file>" \
    --data_src_cfg_name="<task input file>"
```
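
After generation, one way to see which algorithm folders were produced is to look for the saved `algo_object.pkl` files. The sketch below assumes that each generated folder sits directly under the working directory and contains that file at its top level; the helper `find_algo_folders` and the folder names in the demonstration are made up.

```python
import os
import tempfile


def find_algo_folders(work_dir, marker="algo_object.pkl"):
    """Return the sorted names of sub-folders of work_dir that contain the marker file."""
    return sorted(
        entry.name
        for entry in os.scandir(work_dir)
        if entry.is_dir() and os.path.isfile(os.path.join(entry.path, marker))
    )


# Demonstration on a throwaway directory with made-up folder names.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("algo_a_0", "algo_a_1", "not_an_algo"):
        os.makedirs(os.path.join(demo_dir, name))
    for name in ("algo_a_0", "algo_a_1"):
        open(os.path.join(demo_dir, name, "algo_object.pkl"), "wb").close()
    found = find_algo_folders(demo_dir)

print(found)  # ['algo_a_0', 'algo_a_1']
```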

## Getting and saving the algorithm generation history to the local drive

If the users continue to train the algorithms on the local system, the history of the algorithm generation can be fetched via the `get_history` method of the `BundleGen` object. There are also scenarios in which users need to stop the Python process after `algo_gen`, for example to transfer the files to a remote cluster before starting the training. Auto3DSeg offers the utility function `export_bundle_algo_history` to dump the history to the hard drive and `import_bundle_algo_history` to load it back.

If the files are copied to a remote system, please ensure the algorithm templates are also copied there. Some functions require the path to instantiate the algorithm class properly.
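
The reason the templates must travel with the saved history can be illustrated with plain `pickle`. This is a conceptual sketch only, not MONAI's implementation: a dictionary stands in for the `Algo` object, and the helpers `save_record` and `load_record` and the `template_found` field are hypothetical.

```python
import os
import pickle
import tempfile


def save_record(record, path):
    """Serialize a record that stores the location of its algorithm template."""
    with open(path, "wb") as f:
        pickle.dump(record, f)


def load_record(path):
    """Load a record and note whether its stored template path exists on this machine."""
    with open(path, "rb") as f:
        record = pickle.load(f)
    record["template_found"] = os.path.isdir(record["template_path"])
    return record


with tempfile.TemporaryDirectory() as tmp:
    template_dir = os.path.join(tmp, "algorithm_templates")
    os.makedirs(template_dir)
    pkl_path = os.path.join(tmp, "record.pkl")
    save_record({"name": "algo_a_0", "template_path": template_dir}, pkl_path)
    restored = load_record(pkl_path)

print(restored["name"], restored["template_found"])
```

If only the pickle file were copied to another machine, the stored path would point at a folder that does not exist there, which is the situation the note above warns about.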

In [None]:
history = bundle_generator.get_history()
export_bundle_algo_history(history)  # save the Algo objects

## Add training parameters to cut down the training time in this notebook (Optional)

This step is not required, but for demo purposes we limit the number of epochs used to train the algorithms.

Some algorithms in **Auto3DSeg** use `epoch` to mark the progress of training, while others count iterations. The code block below overrides all algorithms with the same training parameters.

Setting `train_param` is optional. Users can call either `train()` or `train({})` if no changes are needed, in which case the algorithms run the full training schedule on each of the 5 folds. Users can also set a different `train_param` for each algorithm.

The setup works for a machine with at most 8 GPUs. The datalist in this example uses only a subset of the original dataset, and users need to ensure that the number of GPUs does not exceed the number of partitions the training dataset can be split into. For example, the following code block is not suitable for a 16-GPU system; in such cases, please change it accordingly.

In [None]:
max_epochs = 2  # change epoch number to 2 to cut down the notebook running time

train_param = {
    "num_epochs_per_validation": 1,
    "num_images_per_batch": 2,
    "num_epochs": max_epochs,
    "num_warmup_epochs": 1,
}

print(train_param)
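
For algorithms that count iterations rather than epochs, an epoch budget can be translated into an iteration budget. The sketch below shows one plausible conversion together with a guard for the GPU constraint described above. The formula, the helper `epochs_to_iterations`, and the example numbers are assumptions for illustration and may differ from what a given algorithm template does internally.

```python
def epochs_to_iterations(num_epochs, num_train_cases, num_images_per_batch, num_gpus):
    """Approximate the total number of training iterations for an epoch budget.

    Assumes every GPU consumes num_images_per_batch cases per iteration.
    """
    cases_per_step = num_images_per_batch * num_gpus
    if num_train_cases < cases_per_step:
        raise ValueError(
            f"{num_train_cases} training cases cannot be split across "
            f"{num_gpus} GPUs with {num_images_per_batch} images per batch"
        )
    return int(num_epochs * num_train_cases / cases_per_step)


# 2 epochs over 16 hypothetical training cases, 2 images per batch, 2 GPUs.
print(epochs_to_iterations(num_epochs=2, num_train_cases=16, num_images_per_batch=2, num_gpus=2))  # 8
```

With the same 16 cases and 16 GPUs the helper raises a `ValueError`, which mirrors the reason a small datalist is not suitable for a 16-GPU system.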

## Training the neural networks sequentially

The `algo_gen` history contains `Algo` objects, which provide methods such as `train` and `predict`. We can use these APIs to trigger neural network training. By default, `AutoRunner` trains on a single node (with one or more GPUs) sequentially.

`algo_to_pickle` is optional; it updates the dumped `Algo` objects with the accuracy information.

In [None]:
history = import_bundle_algo_history(work_dir, only_trained=False)
for task in history:
    for _, algo in task.items():
        algo.train(train_param)  # can use default params by `algo.train()`
        acc = algo.get_score()
        algo_to_pickle(algo, template_path=algo.template_path, best_metrics=acc)

## Information about Hyper-parameter Optimization (HPO)

Another way to handle neural network training is to perform HPO (e.g., training and searching). This is made possible by the NNI and Optuna packages, which are installed in the MONAI development environment. `AutoRunner` uses Microsoft NNI as the backend via `NNIGen`, but Optuna can also be chosen via `OptunaGen` in the Auto3DSeg pipeline.

For more information, please refer to the [HPO NNI notebook](hpo_nni.ipynb) and the [HPO Optuna notebook](hpo_optuna.ipynb).

## Ensemble

Finally, after the neural networks are trained, `AutoRunner` will apply the ensemble methods in Auto3DSeg to improve the overall performance. 

Here we use the utility function `import_bundle_algo_history` to load the trained `Algo` objects into the ensemble. With the history loaded, we build an ensemble method and use it to run inference on all testing data.

> NOTE: Because we need to get the predictions in Python, there are no alternative CLI commands for this step.

In [None]:
history = import_bundle_algo_history(work_dir, only_trained=True)
builder = AlgoEnsembleBuilder(history, input)
builder.set_ensemble_method(AlgoEnsembleBestN(n_best=5))
ensembler = builder.get_ensemble()
preds = ensembler()
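
The idea behind a "best N" ensemble can be sketched without MONAI: rank the trained models by validation score, keep the top `n_best`, and average their predictions. The function `ensemble_best_n`, the scores, and the flat prediction lists below are made up for illustration; this is not the implementation of `AlgoEnsembleBestN`, which operates on full prediction volumes.

```python
def ensemble_best_n(scores, predictions, n_best):
    """Average the predictions of the n_best highest-scoring algorithms."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:n_best]
    length = len(predictions[ranked[0]])
    averaged = [
        sum(predictions[name][i] for name in ranked) / len(ranked)
        for i in range(length)
    ]
    return ranked, averaged


# Made-up validation scores and per-voxel foreground probabilities.
scores = {"algo_a": 0.75, "algo_b": 0.5, "algo_c": 0.25}
predictions = {
    "algo_a": [1.0, 0.0, 1.0],
    "algo_b": [0.5, 0.5, 1.0],
    "algo_c": [0.0, 1.0, 0.0],
}
selected, mean_prediction = ensemble_best_n(scores, predictions, n_best=2)
print(selected)         # ['algo_a', 'algo_b']
print(mean_prediction)  # [0.75, 0.25, 1.0]
```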