# AutoGluon

AutoGluon provides automatic machine learning (Auto ML) support for MXNet and Gluon, with a focus on automatic deep learning (Auto DL). AutoGluon targets three groups of users:

* *Beginners* make up 70~80% of the customers who would be interested in AutoGluon. In the basic Auto ML scenario, customers have a traditional machine learning task at hand, provide their own raw data, watch the search process, and finally obtain a good-quality model. Beginners include, but are not limited to, engineers and students who are generally new to machine learning.
* *Advanced users* aim to have full control over and access to the overall Auto ML process as well as each important component, such as constructing their own networks, metrics, losses, optimizers, searchers, and trial schedulers. Advanced users may also have more specific constraints on the automatic search procedure. They include, but are not limited to, experienced machine learning researchers and engineers.
* *Contributors* are advanced users who create strategies that are useful for beginners, either by extending AutoGluon to new datasets, new domains, and new algorithms, or by bringing state-of-the-art results that save time and effort.

AutoGluon's design principles are:

* *Easy to use:* Deep learning framework users can use AutoGluon almost right away. The only usage difference between AutoGluon and Gluon is that, rather than providing a fixed value for each deep learning component, users provide a searchable range and let Auto ML decide which value is best. The usage of all major APIs stays the same.
* *Easy to extend:* From the user's perspective, AutoGluon is organized by task. Users can easily use all the task-specific components, such as data preprocessing, model zoo, metrics, and losses, so adding a new task is straightforward. In this way, advanced ML tasks such as GANs can be incorporated by providing a new task module. From the system's perspective, multiple back-ends can be used because the front-end is designed to be separate from the back-ends, which helps when extending to production-level Auto ML.

AutoGluon's overall system design is shown below:

<img src="./img/autogluon_overview.png" width="400" height="400" />

In the following, *we use Image Classification as a running example* to illustrate the usage of AutoGluon's main APIs.

As a follow-up, to learn more about the `fit` backend workflow, see the basic and distributed fit backend tutorials in [`[1]`](https://code.amazon.com/packages/AutoGluon/blobs/brazil/--/tutorials/autogluon_demo.ipynb) and [`[2]`](https://code.amazon.com/packages/AutoGluon/blobs/brazil/--/tutorials/autogluon_demo.ipynb), respectively.


## Preparation

### Install AutoGluon

```bash
git clone ssh://git.amazon.com/pkg/AutoGluon
cd AutoGluon
python setup.py install
```

### Import Task

We are using image classification as an example in this notebook.

In [None]:
import warnings
warnings.filterwarnings("ignore")

from autogluon import image_classification as task

import logging
logging.basicConfig(level=logging.INFO)

## A Quick Image Classification Example

We first show the most basic usage: create a dataset, then fit it to generate results for the image classification example.

### Create AutoGluon Dataset

We use CIFAR10 for image classification for demo purposes.

In [None]:
dataset = task.Dataset(name='cifar10') # the name is case insensitive

print(dataset) # show a quick summary of the dataset, e.g. number of training examples and number of classes

The constructed dataset contains the CIFAR10 training and validation datasets.

In [None]:
dataset.train[0] # access the first example
dataset.val[-2:] # access the last 2 validation examples

Then we use the default image classification configuration to generate:
* The best result of the search in terms of accuracy
* The configuration that produced the best result

To achieve this, we use the `fit` function to generate the above results from the dataset.

The default configuration is based on `max_trial_count=2` and `max_training_epochs=3`.
If running in an environment without a GPU, please set `demo=True` in `fit`. The process takes approximately one and a half minutes.
To watch `fit` as it runs, TensorBoard is provided by default to visualize the process.
Run `tensorboard --logdir=./checkpoint/exp1/logs --host=127.0.0.1 --port=8888` on the command line.

In [None]:
results = task.fit(dataset, demo=True)

The best accuracy is:

In [None]:
print('%.2f acc' % (results.val_accuracy * 100))

The associated best configuration is:

In [None]:
print(results.config)

Total time cost is:

In [None]:
print('%.2f s' % results.time)

## A Step-by-step Image Classification Example

We first introduce the basic configuration `autogluon.space`, which represents the search space of each task component. We then go through each component, including

* `autogluon.Dataset`
* `autogluon.Nets`
* `autogluon.Optimizers`
* `autogluon.Losses`
* `autogluon.Metrics`

and finally put everything together in `fit` to generate the best results.

### Import AutoGluon

In [None]:
import autogluon as ag

### Create AutoGluon Space


`autogluon.space` is a search space containing a set of configuration candidates.
We provide three basic space types.

* Categorical Space

In [None]:
list_space = ag.space.List('listspace', ['0', '1', '2'])
print(list_space)

* Linear Space

In [None]:
linear_space = ag.space.Linear('linspace', 0, 10)
print(linear_space)

* Log Space

In [None]:
log_space = ag.space.Log('logspace', 10**-10, 10**-1)
print(log_space)

* An Example of a Random Sample from the Combined Space

In [None]:
print(ag.space.sample_configuration([list_space, linear_space, log_space]))
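
To make the three space types concrete, here is a minimal standalone sketch of how one random configuration could be drawn from a categorical, a linear, and a log space. It uses only the Python standard library and does not call AutoGluon; the helper names (`sample_categorical`, `sample_linear`, `sample_log`) are hypothetical, and the actual sampling inside `ag.space.sample_configuration` may differ.

```python
import math
import random


def sample_categorical(choices, rng):
    """Pick one candidate uniformly from a finite list."""
    return rng.choice(choices)


def sample_linear(lower, upper, rng):
    """Draw a value uniformly from [lower, upper]."""
    return rng.uniform(lower, upper)


def sample_log(lower, upper, rng):
    """Draw a value uniformly on a log scale, so every decade is equally likely."""
    return math.exp(rng.uniform(math.log(lower), math.log(upper)))


def sample_configuration(rng):
    """Combine one draw from each space into a single configuration."""
    return {
        'listspace': sample_categorical(['0', '1', '2'], rng),
        'linspace': sample_linear(0, 10, rng),
        'logspace': sample_log(10**-10, 10**-1, rng),
    }


rng = random.Random(0)  # fixed seed so the draw is repeatable
config = sample_configuration(rng)
print(config)
```

Sampling on a log scale is the usual choice for quantities such as learning rates, where the order of magnitude matters more than the absolute value.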

We will then use `autogluon.Nets` and `autogluon.Optimizers` as examples to show the usage of auto objects. The remaining auto objects use default values.

### Create AutoGluon Nets

`autogluon.Nets` is a list of auto networks that allows searching for the best net

* from a list of provided (or default) networks
* by choosing the best architecture for each auto net.

In [None]:
# type of net_list is ag.space.List

# method 1 (complex but flexible): specify the net_list using get_model
# net_list = [task.model_zoo.get_model('cifar_resnet20_v1'), # TODO: pretrained and pretrained_dataset would be supported
#             task.model_zoo.get_model('cifar_resnet56_v1'),
#             task.model_zoo.get_model('cifar_resnet110_v1')]

# method 2 (easy and less flexible): specify the net_list using model name
net_list = ['cifar_resnet20_v1',
            'cifar_resnet56_v1',
            'cifar_resnet110_v1']

# default net list for image classification would be overwritten 
# if net_list is provided
nets = ag.Nets(net_list)

print(nets)

### Create AutoGluon Optimizers

`autogluon.Optimizers` defines a list of optimization algorithms and allows searching for the best one

* from a list of provided (or default) optimizers
* by choosing the best hyper-parameters for each auto optimizer

In [None]:
# method 1 (complex but flexible): specify the optim_list using get_optim
# optimizers = ag.Optimizers([ag.optim.get_optim('sgd'),
#                             ag.optim.get_optim('adam')])

# method 2 (easy and less flexible): specify the optim_list using optimizer names
optimizers = ag.Optimizers(['sgd', 'adam'])

print(optimizers)

### Create AutoGluon Fit - Put all together

In [None]:
stop_criterion = {
    'time_limits': 1*60*60,
    'max_metric': 1.0,
    'max_trial_count': 4
}

resources_per_trial = {
    'max_num_gpus': 0, # increase this if you have a GPU machine, to run more efficiently.
    'max_num_cpus': 4,
    'max_training_epochs': 2
}

results = task.fit(dataset,
                   nets,
                   optimizers,
                   stop_criterion=stop_criterion,
                   resources_per_trial=resources_per_trial,
                   demo=True) # demo=True is recommended when running on a machine without a GPU

The best accuracy is:

In [None]:
print('%.2f acc' % (results.val_accuracy * 100))

The best associated configuration is:

In [None]:
print(results.config)

Total time cost is:

In [None]:
print('%.2f s' % results.time)

### Use Search Algorithm

`autogluon.searcher` will support both basic and SOTA searchers for both hyper-parameter optimization and architecture search. Random search is currently supported and is the default searcher.

In [None]:
# cs is a CS.ConfigurationSpace() (with `import ConfigSpace as CS`); this is just example code.
# In practice, the searcher is created inside the fit function, and cs should not be None.
cs = None
searcher = ag.searcher.RandomSampling(cs)

print(searcher)

Or simply use string name:

In [None]:
searcher = 'random'

print(searcher)
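
As an illustration of what a random searcher does, the following standalone sketch draws configurations at random, scores each one, and keeps the best. It is not AutoGluon's `RandomSampling` implementation: `random_search`, `sample_config`, and `toy_objective` are hypothetical names, and the toy objective is a stand-in for the validation accuracy that a real training run would report.

```python
import math
import random


def sample_config(rng):
    """Draw one configuration: an optimizer name and a log-uniform learning rate."""
    return {
        'optimizer': rng.choice(['sgd', 'adam']),
        'lr': 10 ** rng.uniform(-4, -1),
    }


def toy_objective(config):
    """Hypothetical stand-in for validation accuracy; it peaks at lr = 1e-2."""
    return 1.0 - abs(math.log10(config['lr']) + 2) / 3


def random_search(objective, sampler, num_trials, seed=0):
    """Evaluate num_trials random configurations and return the best one."""
    rng = random.Random(seed)
    best_config, best_score = None, float('-inf')
    for _ in range(num_trials):
        config = sampler(rng)
        score = objective(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


best_config, best_score = random_search(toy_objective, sample_config, num_trials=20)
print(best_config, best_score)
```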

### Use Trial Scheduler

`ag.scheduler` supports scheduling trials in serial order and with early stopping.

We support a basic FIFO scheduler.

In [None]:
# this is just example code; in practice, this is in fit function
savedir = 'checkpoint/demo.ag'

trial_scheduler = ag.scheduler.FIFO_Scheduler(
                task.pipeline.train_image_classification,
                None,
                {
                    'num_cpus': 4,
                    'num_gpus': 0,
                },
                searcher,
                checkpoint=savedir)

print(trial_scheduler)
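
Conceptually, a FIFO scheduler runs trials in the order they were submitted, one after another, without stopping any of them early. The sketch below shows that idea with a plain queue. `ToyFifoScheduler` and `toy_train` are hypothetical and unrelated to the internals of `ag.scheduler.FIFO_Scheduler`.

```python
from collections import deque


class ToyFifoScheduler:
    """Run submitted trials strictly in first-in, first-out order."""

    def __init__(self):
        self._queue = deque()
        self.finished = []

    def add_trial(self, name, train_fn, config):
        """Queue a trial; it runs only after all earlier trials finish."""
        self._queue.append((name, train_fn, config))

    def run(self):
        """Pop trials from the front of the queue and run each to completion."""
        while self._queue:
            name, train_fn, config = self._queue.popleft()
            self.finished.append((name, train_fn(config)))
        return self.finished


def toy_train(config):
    """Hypothetical training function; returns the number of epochs it 'ran'."""
    return config['epochs']


scheduler = ToyFifoScheduler()
scheduler.add_trial('trial_0', toy_train, {'epochs': 3})
scheduler.add_trial('trial_1', toy_train, {'epochs': 1})
scheduler.add_trial('trial_2', toy_train, {'epochs': 2})
results_in_order = scheduler.run()
print(results_in_order)
```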

We also support Hyperband, which is an early stopping mechanism.

In [None]:
# this is just example code; in practice, this is in fit function

trial_scheduler = ag.scheduler.Hyperband_Scheduler(
                task.pipeline.train_image_classification,
                None,
                {
                    'num_cpus': 4,
                    'num_gpus': 0,
                },
                searcher,
                time_attr='epoch',
                reward_attr='accuracy',
                max_t=10,
                grace_period=1,
                checkpoint=savedir)

print(trial_scheduler)
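
Hyperband builds on successive halving: all trials train for a small budget, only the best fraction continues with a larger budget, and so on until the maximum budget is reached. The sketch below shows a single successive-halving bracket, not full Hyperband and not AutoGluon's scheduler. The `max_t` and `grace_period` arguments mirror the names used above, while `reduction_factor`, `successive_halving`, and `toy_train_one_epoch` are assumptions made for illustration.

```python
def toy_train_one_epoch(config, epoch):
    """Hypothetical accuracy after `epoch` epochs; better configs converge higher."""
    return config['quality'] * (1 - 0.5 ** epoch)


def successive_halving(configs, train_one_epoch, max_t=10, grace_period=1,
                       reduction_factor=3):
    """Run one bracket; grace_period must be at least 1.

    Returns (best_config, best_score, epochs_used_per_config).
    """
    alive = list(range(len(configs)))   # indices of trials still running
    scores = [0.0] * len(configs)
    epochs = [0] * len(configs)
    budget = grace_period               # epochs every surviving trial may reach
    while True:
        for i in alive:
            while epochs[i] < budget:
                epochs[i] += 1
                scores[i] = train_one_epoch(configs[i], epochs[i])
        if budget >= max_t:
            break
        # Early stopping: keep only the top 1/reduction_factor of the trials.
        keep = max(1, len(alive) // reduction_factor)
        alive = sorted(alive, key=lambda i: scores[i], reverse=True)[:keep]
        budget = min(max_t, budget * reduction_factor)
    best = max(alive, key=lambda i: scores[i])
    return configs[best], scores[best], epochs


configs = [{'quality': q / 10} for q in range(1, 10)]
best_config, best_score, epochs = successive_halving(configs, toy_train_one_epoch)
print(best_config, epochs, sum(epochs))
```

In this toy run most trials stop after the grace period, so the total number of epochs is far below the `len(configs) * max_t` that running every trial to completion would cost.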

### Resume Fit and Checkpointer

The scheduler takes the `resume` flag and the checkpoint directory.

In [None]:
savedir = 'checkpoint/demo.ag'
resume = False

In [None]:
trial_scheduler = ag.scheduler.Hyperband_Scheduler(
                task.pipeline.train_image_classification,
                None,
                {
                    'num_cpus': 4,
                    'num_gpus': 0,
                },
                searcher,
                checkpoint=savedir,
                resume=resume,
                time_attr='epoch',
                reward_attr='accuracy',
                max_t=10,
                grace_period=1)

print(trial_scheduler)

Or simply specify the trial scheduler with the string name:

In [None]:
trial_scheduler = 'hyperband'
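
The general idea behind checkpoint and resume can be sketched as follows: after each trial the search state is written to the checkpoint directory, and a resumed run reloads that state and continues from where it stopped. This is a simplified standalone illustration. The file name `state.json`, the function `run_trials`, and the stored fields are hypothetical and do not describe AutoGluon's checkpoint format.

```python
import json
import os
import tempfile


def run_trials(savedir, max_trial_count, resume=False):
    """Run trials up to max_trial_count, saving the state after each one."""
    path = os.path.join(savedir, 'state.json')
    state = {'completed': []}
    if resume and os.path.exists(path):
        with open(path) as f:
            state = json.load(f)        # continue from the saved state
    os.makedirs(savedir, exist_ok=True)
    for trial_id in range(len(state['completed']), max_trial_count):
        # A real trial would train a model here; we only record its id.
        state['completed'].append({'trial': trial_id})
        with open(path, 'w') as f:
            json.dump(state, f)         # checkpoint after every trial
    return state


demo_dir = os.path.join(tempfile.mkdtemp(), 'demo.ag')
first = run_trials(demo_dir, max_trial_count=2)
second = run_trials(demo_dir, max_trial_count=3, resume=True)
print(len(first['completed']), len(second['completed']))
```

With `resume=True` the second call runs only the one additional trial instead of repeating the first two.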

### Visualize Using Tensor/MXBoard

We can visualize the training curve using TensorBoard or MXBoard.
To start TensorBoard or MXBoard, please use:

`tensorboard --logdir=./checkpoint/demo/logs --host=127.0.0.1 --port=8889`

An example is shown below.

<img src="./img/cifar_accuracy_curves_2.svg" width="400" height="400" />

### Create Stop Criterion

`autogluon` supports overall constraints on the automatic search in `stop_criterion`.

In [None]:
stop_criterion = {
    'time_limits': 1*60*60,
    'max_metric': 0.80, # set this if you know the target metric; otherwise use the default 1.0
    'max_trial_count': 2
}
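
One plausible reading of these three keys is that the search stops as soon as any of them is reached: the time budget is used up, the best metric hits the target, or enough trials have run. The sketch below encodes that reading. `should_stop` is a hypothetical helper, and the exact semantics inside `fit` are an assumption rather than something confirmed by this tutorial.

```python
import time


def should_stop(stop_criterion, start_time, best_metric, trials_done, now=None):
    """Return True once any of the three limits in stop_criterion is reached."""
    now = time.time() if now is None else now
    if now - start_time >= stop_criterion['time_limits']:
        return True   # ran out of time (seconds)
    if best_metric >= stop_criterion['max_metric']:
        return True   # reached the target metric
    if trials_done >= stop_criterion['max_trial_count']:
        return True   # ran the maximum number of trials
    return False


criterion = {
    'time_limits': 1 * 60 * 60,
    'max_metric': 0.80,
    'max_trial_count': 2,
}

# After 10 minutes, one trial, and 75% accuracy, the search would continue.
print(should_stop(criterion, start_time=0, best_metric=0.75, trials_done=1, now=600))
```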

### Create Resources Per Trial

`autogluon` supports constraints for each trial in `resources_per_trial`.

In [None]:
resources_per_trial = {
    'max_num_gpus': 0,
    'max_num_cpus': 4,
    'max_training_epochs': 1
}

### Create AutoGluon Fit with Full Capacity

In [None]:
results = task.fit(dataset,
                  nets,
                  optimizers,
                  searcher=searcher,
                  trial_scheduler='fifo',
                  resume=resume,
                  savedir=savedir,
                  stop_criterion=stop_criterion,
                  resources_per_trial=resources_per_trial,
                  demo=True) # only set demo=True when running on a machine without a GPU

The best accuracy is:

In [None]:
print('%.2f acc' % (results.val_accuracy * 100))

The associated best configuration is:

In [None]:
print(results.config)

The total time cost is:

In [None]:
print('%.2f s' % results.time)

### Resume AutoGluon Fit

We can resume the previous training for more epochs to achieve better results. Similarly, we can also increase `max_trial_count` for better results.

Here we increase `max_training_epochs` from 1 to 3 and `max_trial_count` from 2 to 3, and set `resume = True`, which loads the checkpoint in the savedir.

In [None]:
stop_criterion = {
    'time_limits': 1*60*60,
    'max_metric': 0.80,
    'max_trial_count': 3
}

resources_per_trial = {
    'max_num_gpus': 0,
    'max_num_cpus': 4,
    'max_training_epochs': 3
}

resume = True

In [None]:
results = task.fit(dataset,
                  nets,
                  optimizers,
                  searcher=searcher,
                  trial_scheduler='fifo',
                  resume=resume,
                  savedir=savedir,
                  stop_criterion=stop_criterion,
                  resources_per_trial=resources_per_trial,
                  demo=True)

The best accuracy is:

In [None]:
print('%.2f acc' % (results.val_accuracy * 100))

The associated best configuration is:

In [None]:
print(results.config)

The total time cost is:

In [None]:
print('%.2f s' % results.time)

## References

code: https://code.amazon.com/packages/AutoGluon/trees/heads/mainline 

[1] fit backend tutorial: https://code.amazon.com/packages/AutoGluon/blobs/brazil/--/tutorials/autogluon_demo.ipynb 

[2] fit backend distributed tutorial: https://code.amazon.com/packages/AutoGluon/blobs/brazil/--/tutorials/autogluon_demo.ipynb