# MONAI Auto3DSeg AutoRunner

This notebook introduces `AutoRunner`, the interface for running the Auto3DSeg pipeline with minimal user input.

Specifically, it shows how to:
1. Use `AutoRunner` with an example input config file, `input.yaml`
2. Prepare the config file `input.yaml`
3. Configure the paths for inputs, outputs, and intermediate results
4. Set the internal parameters of **Auto3DSeg** components
5. Use a third-party hyperparameter optimization (HPO) package with `AutoRunner`

## Setup environment

In [None]:
!python -c "import monai" || pip install -q "monai-weekly[nibabel, nni, tqdm, cucim, yaml, optuna]"

## Setup imports

In [None]:
import os
import tempfile
import torch

from monai.bundle.config_parser import ConfigParser
from monai.apps import download_and_extract

from monai.apps.auto3dseg import AutoRunner
from monai.auto3dseg import datafold_read

## Download dataset

In [None]:
directory = os.environ.get("MONAI_DATA_DIRECTORY")
root_dir = tempfile.mkdtemp() if directory is None else directory
print(root_dir)

msd_task = "Task04_Hippocampus"
resource = "https://msd-for-monai.s3-us-west-2.amazonaws.com/" + msd_task + ".tar"

compressed_file = os.path.join(root_dir, msd_task + ".tar")
dataroot = os.path.join(root_dir, msd_task)
if not os.path.exists(dataroot):
    download_and_extract(resource, compressed_file, root_dir)

datalist_file = os.path.join("..", "tasks", "msd", msd_task, "msd_" + msd_task.lower() + "_folds.json")
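
The datalist is a JSON file that lists the images and labels and assigns each training case to a cross-validation fold. The sketch below uses only the standard library to build a tiny datalist and split it by fold. It assumes a layout with a `training` list whose entries carry `image`, `label`, and `fold`, plus a `testing` list. The file names and the helper `split_by_fold` are hypothetical and only illustrate the idea; in this notebook the actual reading is done by `datafold_read` from MONAI.

```python
import json
import os
import tempfile

# Hypothetical miniature datalist; a real file lists every case in the dataset.
example_datalist = {
    "training": [
        {"fold": 0, "image": "imagesTr/case_001.nii.gz", "label": "labelsTr/case_001.nii.gz"},
        {"fold": 1, "image": "imagesTr/case_002.nii.gz", "label": "labelsTr/case_002.nii.gz"},
        {"fold": 1, "image": "imagesTr/case_003.nii.gz", "label": "labelsTr/case_003.nii.gz"},
    ],
    "testing": [{"image": "imagesTs/case_004.nii.gz"}],
}


def split_by_fold(datalist_path, fold):
    """Return (train, val): cases whose "fold" equals `fold` go to validation."""
    with open(datalist_path) as f:
        cases = json.load(f)["training"]
    train = [c for c in cases if c.get("fold") != fold]
    val = [c for c in cases if c.get("fold") == fold]
    return train, val


# Write the example to a temporary file, then read it back and split it.
example_path = os.path.join(tempfile.mkdtemp(), "example_folds.json")
with open(example_path, "w") as f:
    json.dump(example_datalist, f, indent=2)

train_cases, val_cases = split_by_fold(example_path, fold=0)
print(len(train_cases), "training cases,", len(val_cases), "validation case(s)")
```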

## Prepare an input YAML configuration

In [None]:
input_cfg = {
    "name": msd_task,  # optional, it is only for your own record
    "task": "segmentation",  # optional, it is only for your own record
    "modality": "MRI",  # required
    "datalist": datalist_file,  # required
    "dataroot": dataroot,  # required
}
input = './input.yaml'
ConfigParser.export_config_file(input_cfg, input)
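
Because three of the keys above are marked as required, it can help to check a config dictionary before handing it to `AutoRunner`. The following is a minimal sketch of such a check. The helper `missing_required_keys` and the example path are hypothetical and not part of MONAI.

```python
# Keys marked "required" in the input config above.
REQUIRED_KEYS = ("modality", "datalist", "dataroot")


def missing_required_keys(cfg):
    """Return the required keys that are absent or empty in `cfg`."""
    return [key for key in REQUIRED_KEYS if not cfg.get(key)]


# Hypothetical config that forgot the datalist entry.
example_cfg = {
    "name": "Task04_Hippocampus",
    "modality": "MRI",
    "dataroot": "/data/Task04_Hippocampus",  # illustrative path
}
print(missing_required_keys(example_cfg))  # ['datalist']
```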

## Run the Auto3DSeg pipeline in a few lines of code

Below is the typical usage of `AutoRunner`:
```python
runner = AutoRunner(input=input)
runner.run()
```

The `run` command takes a long time because it trains several algorithms over many iterations.

To perform a full training run in this tutorial, uncomment the `runner.run()` line at the end of the relevant code block.

## Use the default setting with the input YAML file

In [None]:
runner = AutoRunner(input=input)
# runner.run()

## Use the default setting with a dictionary instead of a YAML file as the input

In [None]:
runner = AutoRunner(input=input_cfg)
# runner.run()

## Customize working directory
`AutoRunner` lets the user save all intermediate and final results in a user-specified location.
Here we use `./my_workspace` as an example.

In [None]:
runner = AutoRunner(work_dir='./my_workspace', input=input)
# runner.run()

## Customize result caching

By default, `AutoRunner` caches intermediate results to save computation time.
The user can choose whether to reuse the cached results or restart from scratch.

To start from scratch, set `not_use_cache` to `True`.

In [None]:
# This will restart from scratch and not use any cached results
runner = AutoRunner(input=input, not_use_cache=True)
# runner.run()

# The line below would skip data analysis.
# Because data analysis has NOT been completed and cached before, AutoRunner will throw an error.

# runner = AutoRunner(input=input, analyze=False)  # This will throw an error

## Customize the output folder to save ensemble result

`AutoRunner` performs inference on the testing data specified by the `datalist` in the data source config input. The inference results are written to the `ensemble_output` folder under the working directory in `nii.gz` format. The user can choose the format by passing keyword arguments to `AutoRunner`. The list of arguments can be found in the [MONAI transforms documentation](https://docs.monai.io/en/stable/transforms.html#saveimage).

In [None]:
runner = AutoRunner(input=input, output_dir='./output_dir')
# runner.run()

## Setting Auto3DSeg internal parameters
`Auto3DSeg` has four steps: data analysis, algorithm generation, training, and ensemble. Users can configure the internal parameters of the `AutoRunner` object to customize some steps in the pipeline.

Below, we begin the experiments with a smaller number of cross-validation folds. The default is 5 in the algorithm but we set it to 2 here:

In [None]:
runner = AutoRunner(input=input)
runner.set_num_fold(num_fold=2)
# runner.run()

## Customize training parameters by overriding the default values

`set_training_params` in `AutoRunner` provides an interface to change the training parameters of all algorithms in one line.

Note: **Auto3DSeg** uses bundle templates to perform training, validation, and inference. The number of training epochs or iterations is specified by the config files in each template. These values can be overridden, but note that some bundle templates use `num_iterations` while others use `num_epochs`.

For demonstration purposes, the code block below converts a number of epochs to a number of iterations and overrides all algorithms with the same training parameters on a 1-GPU or 2-GPU machine.


In [None]:
max_epochs = 2

# safeguard to ensure max_epochs is greater than or equal to 2
max_epochs = max(max_epochs, 2)

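# Use 1 GPU when "multigpu" is explicitly disabled; otherwise use all visible GPUs.
# max(..., 1) avoids a division by zero below when no CUDA device is visible.
num_gpus = 1 if "multigpu" in input_cfg and not input_cfg["multigpu"] else max(torch.cuda.device_count(), 1)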

num_epoch = max_epochs
num_images_per_batch = 2
files_train_fold0, _ = datafold_read(datalist_file, "", 0)
n_data = len(files_train_fold0)
n_iter = int(num_epoch * n_data / num_images_per_batch / num_gpus)
n_iter_val = int(n_iter / 2)

train_param = {
    "num_iterations": n_iter,
    "num_iterations_per_validation": n_iter_val,
    "num_images_per_batch": num_images_per_batch,
    "num_epochs": num_epoch,
    "num_warmup_iterations": n_iter_val,
}
runner = AutoRunner(input=input)
runner.set_training_params(params=train_param)
# runner.run()
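
The epoch-to-iteration conversion used above can be checked by hand. The sketch below wraps the same formula in a hypothetical helper, `epochs_to_iterations`, and evaluates it with illustrative numbers (208 training images and 2 images per batch are assumptions, not values read from the dataset).

```python
def epochs_to_iterations(num_epoch, n_data, num_images_per_batch, num_gpus):
    """Iterations needed for each GPU to cover `n_data` images `num_epoch` times."""
    return int(num_epoch * n_data / num_images_per_batch / num_gpus)


# Illustrative numbers only: 208 training images, 2 images per batch, 2 epochs.
for gpus in (1, 2):
    n_iterations = epochs_to_iterations(
        num_epoch=2, n_data=208, num_images_per_batch=2, num_gpus=gpus
    )
    # Validation runs twice per training run, matching n_iter_val = n_iter / 2 above.
    print(f"{gpus} GPU(s): {n_iterations} iterations, validate every {n_iterations // 2}")
```

Doubling the number of GPUs halves the iteration count because each iteration then processes twice as many images in total.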

## Customize the ensemble method

There are two supported methods: `"AlgoEnsembleBestN"` and `"AlgoEnsembleBestByFold"`.

In [None]:
runner = AutoRunner(input=input)
runner.set_ensemble_method(ensemble_method_name="AlgoEnsembleBestByFold")
# runner.run()

## Customize the inference parameters by overriding the default values

In [None]:
# set the prediction (inference) parameters used by the ensemble
pred_params = {
    'files_slices': slice(0, 2),  # only infer the first two files in the testing data
    'mode': "vote",              # use majority vote instead of mean to ensemble the predictions
    'sigmoid': True,             # whether to use sigmoid to binarize the prediction and output the label
}
runner = AutoRunner(input=input)
runner.set_prediction_params(params=pred_params)
# runner.run()
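
To make the difference between the `"vote"` and mean ensembling modes concrete, here is a small standard-library sketch. It is not MONAI's implementation, which operates on tensors; the functions `ensemble_vote` and `ensemble_mean` and the probability values are hypothetical. It also shows what a `slice(0, 2)` selects from a list of files.

```python
from collections import Counter


def ensemble_vote(label_maps):
    """Majority vote per position over per-model label lists."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*label_maps)]


def ensemble_mean(prob_maps, threshold=0.5):
    """Average per-model probabilities, then threshold to a binary label."""
    return [int(sum(p) / len(p) >= threshold) for p in zip(*prob_maps)]


# Three hypothetical models, three voxels, foreground probabilities.
probs = [
    [0.9, 0.2, 0.6],
    [0.4, 0.3, 0.7],
    [0.4, 0.1, 0.8],
]
# Each model's own binary decision at a 0.5 threshold.
labels = [[int(p >= 0.5) for p in model] for model in probs]

print(ensemble_vote(labels))  # [0, 0, 1]
print(ensemble_mean(probs))   # [1, 0, 1]

# slice(0, 2) keeps only the first two entries of a list.
example_files = ["case_a.nii.gz", "case_b.nii.gz", "case_c.nii.gz"]
print(example_files[slice(0, 2)])  # ['case_a.nii.gz', 'case_b.nii.gz']
```

The first voxel shows how the two modes can disagree: one confident model pulls the mean above the threshold, while the majority vote follows the two models that predict background.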

## Train model with HPO

**Auto3DSeg** supports hyperparameter optimization (HPO) via the `NNI` and `Optuna` backends.
If you would like to use `Optuna`, please check the [notebook](hpo_optuna.ipynb) for detailed usage.

Here we demonstrate the HPO option with `NNI` by Microsoft.
If you want to run HPO with it in this tutorial and did not install it at the beginning of the notebook, please install it via `pip install nni`.
`AutoRunner` supports the `NNI` backend with a grid search method by automatically generating the `NNI` config and running `nnictl` commands in a subprocess.

## Use `AutoRunner` with `NNI` backend to perform grid-search

After `runner.run()` is executed, `nni` will attempt to start a web service on port 8088 by default. If you are running the tutorial on a remote host, please make sure the port is available on the system.

In [None]:
runner = AutoRunner(input=input, hpo=True)
search_space = {"learning_rate": {"_type": "choice", "_value": [0.0001, 0.001, 0.01, 0.1]}}
runner.set_nni_search_space(search_space)
# runner.run()
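
A grid search tries every combination of the listed values. The sketch below enumerates the trials implied by a search space written in the same `_type`/`_value` layout as above. The helper `expand_choice_grid` and the second parameter are hypothetical; NNI performs the real enumeration itself, and this only illustrates how quickly the grid grows.

```python
import itertools


def expand_choice_grid(search_space):
    """List every parameter combination of a search space made of 'choice' entries."""
    names = list(search_space)
    for name in names:
        if search_space[name]["_type"] != "choice":
            raise ValueError("this sketch only handles 'choice' parameters")
    value_lists = [search_space[name]["_value"] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


example_space = {
    "learning_rate": {"_type": "choice", "_value": [0.0001, 0.001, 0.01, 0.1]},
    # Hypothetical second parameter, added only to show the combinatorial growth.
    "num_images_per_batch": {"_type": "choice", "_value": [1, 2]},
}
trials = expand_choice_grid(example_space)
print(len(trials))  # 8
print(trials[0])    # {'learning_rate': 0.0001, 'num_images_per_batch': 1}
```

When sizing a grid, it is worth comparing the number of combinations with the trial limit in the `NNI` config, since a grid larger than the limit cannot be fully explored.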

## Override the templated values

The default `NNI` config that `AutoRunner` uses looks like the following. Users can override some of the parameters via the `set_hpo_params` interface:

```python
default_nni_config = {
    "trialCodeDirectory": ".",
    "trialGpuNumber": torch.cuda.device_count(),
    "trialConcurrency": 1,
    "maxTrialNumber": 10,
    "maxExperimentDuration": "1h",
    "tuner": {"name": "GridSearch"},
    "trainingService": {"platform": "local", "useActiveGpu": True},
}
```

In [None]:
runner = AutoRunner(input=input, hpo=True)
hpo_params = {"maxTrialNumber": 20}
search_space = {"learning_rate": {"_type": "choice", "_value": [0.0001, 0.001, 0.01, 0.1]}}
runner.set_hpo_params(params=hpo_params)
runner.set_nni_search_space(search_space)
# runner.run()
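
The effect of an override can be pictured as a dictionary merge. The sketch below assumes a shallow update, in which each user-supplied key replaces the default value for that key; this is an assumption about the behaviour, not a statement of how `AutoRunner` implements it. The GPU count is a placeholder so the example runs without `torch`.

```python
default_config = {
    "trialCodeDirectory": ".",
    "trialGpuNumber": 1,  # placeholder; the notebook uses torch.cuda.device_count()
    "trialConcurrency": 1,
    "maxTrialNumber": 10,
    "maxExperimentDuration": "1h",
    "tuner": {"name": "GridSearch"},
    "trainingService": {"platform": "local", "useActiveGpu": True},
}

# Top-level keys from the override replace the defaults; everything else is kept.
overrides = {"maxTrialNumber": 20}
merged = {**default_config, **overrides}
print(merged["maxTrialNumber"])         # 20
print(merged["maxExperimentDuration"])  # 1h

# With a shallow merge, a nested dictionary is replaced as a whole,
# so a partial nested override would drop the keys it does not mention.
partial = {**default_config, "trainingService": {"platform": "local"}}
print("useActiveGpu" in partial["trainingService"])  # False
```

Under that assumption, a nested entry such as `trainingService` is best overridden with all of its keys spelled out.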

For more details about the **Auto3DSeg** HPO features, please check the [Auto3DSeg NNI Notebook](./hpo_nni.ipynb) and the [Auto3DSeg Optuna Notebook](./hpo_optuna.ipynb).

## Conclusion

This notebook demonstrated how to use the `AutoRunner` APIs to customize your **Auto3DSeg** pipeline with minimal inputs. Remember that the settings only take effect once you execute the `run` command, which starts the training.

```python
runner.run()
```