# AutoGluon

AutoGluon aims to provide automatic machine learning (Auto ML) support for MXNet and Gluon, with a focus on automatic deep learning (Auto DL). AutoGluon targets three groups of users:

* *Beginners* are 70~80% of the customers who would be interested in AutoGluon. In the basic Auto ML scenario, customers have a traditional machine learning task at hand, provide their own raw data, watch the search process, and finally obtain a good-quality model. Beginners include, but are not limited to, engineers and students who are generally new to machine learning.
* *Advanced users* aim to have full control over and access to the overall Auto ML process as well as each important component, such as constructing their own networks, metrics, losses, optimizers, searcher, and trial scheduler. Advanced users may also have more specific constraints on the automatic search procedure. They include, but are not limited to, experienced machine learning researchers and engineers.
* *Contributors* are advanced users who create strategies that are useful for beginners, either by extending to new datasets, new domains, and new algorithms, or by bringing state-of-the-art results that save time and effort.

AutoGluon's design principles are:

* *Easy to use:* Deep learning framework users can use AutoGluon almost right away. The only usage difference between AutoGluon and Gluon is that, rather than providing a fixed value for each deep learning component, we accept a searchable range and let Auto ML decide which value is best. The usage of all the major APIs stays the same.
* *Easy to extend:* From the user's perspective, AutoGluon is organized by task, so users can easily use all the task-specific components, such as data preprocessing, model zoo, metrics, and losses, and adding a new task is straightforward. In this way, advanced ML tasks, such as GANs, can be incorporated by providing a new task module. From the system's perspective, the front end is designed to be separate from the back ends, so multiple back ends can be used, which helps when extending to production-level Auto ML.

AutoGluon's overall system design is shown below:

<img src="./img/autogluon_overview.png" width="400" height="400" />

In the following, *we use Image Classification as a running example* to illustrate the usage of AutoGluon's main APIs.

As a follow-up, to understand more about the `fit` backend workflow, see the basic and distributed fit backend tutorials in [`[1]`](https://code.amazon.com/packages/AutoGluon/blobs/brazil/--/tutorials/autogluon_demo.ipynb) and [`[2]`](https://code.amazon.com/packages/AutoGluon/blobs/brazil/--/tutorials/autogluon_demo.ipynb), respectively.


## Preparation

### Install AutoGluon

```bash
git clone ssh://git.amazon.com/pkg/AutoGluon
cd AutoGluon
python setup.py install
```

### Import Task

We are using image classification as an example in this notebook.

In [1]:
import warnings
warnings.filterwarnings("ignore")

from autogluon import image_classification as task

import logging
logging.basicConfig(level=logging.INFO)

## A Quick Image Classification Example

We first show the most basic usage: create a dataset, then fit it to generate results for the image classification example.

### Create AutoGluon Dataset

We use CIFAR10 for image classification for demo purposes.

In [2]:
dataset = task.Dataset(name='cifar10')  # the name is case insensitive

print(dataset)  # show a quick summary of the dataset, e.g. number of training examples and classes

AutoGluon Dataset: 
 name = cifar10
 Train data statistic 
 number of classes = 10
 number of samples = 50000
 Val data statistic 
 number of classes = 10
 number of samples = 10000


The constructed dataset contains the CIFAR10 training and validation datasets.

In [3]:
dataset.train[0] # access the first example
dataset.val[-2:] # access the last 2 validation examples

(
 [[[[ 25  40  12]
    [ 15  36   3]
    [ 23  41  18]
    ...
    [ 61  82  78]
    [ 92 113 112]
    [ 75  89  92]]
 
   [[ 12  25   6]
    [ 20  37   7]
    [ 24  36  15]
    ...
    [115 134 138]
    [149 168 177]
    [104 117 131]]
 
   [[ 12  25  11]
    [ 15  29   6]
    [ 34  40  24]
    ...
    [154 172 182]
    [157 175 192]
    [116 129 151]]
 
   ...
 
   [[100 129  81]
    [103 132  84]
    [104 134  86]
    ...
    [ 97 128  84]
    [ 98 126  84]
    [ 91 121  79]]
 
   [[103 132  83]
    [104 131  83]
    [107 135  87]
    ...
    [101 132  87]
    [ 99 127  84]
    [ 92 121  79]]
 
   [[ 95 126  78]
    [ 95 123  76]
    [101 128  81]
    ...
    [ 93 124  80]
    [ 95 123  81]
    [ 92 120  80]]]
 
 
  [[[ 73  78  75]
    [ 98 103 113]
    [ 99 106 114]
    ...
    [135 150 152]
    [135 149 154]
    [203 215 223]]
 
   [[ 69  73  70]
    [ 84  89  97]
    [ 68  75  81]
    ...
    [ 85  95  89]
    [ 71  82  80]
    [120 133 135]]
 
   [[ 69  73  70]
    [ 90  95 100

Next, we use the default image classification configuration to generate:
* The best result of the search in terms of accuracy
* The configuration that produced that best result

To achieve this, we use the `fit` function to generate the above results from the dataset.

The default configuration is based on `max_trial_count=2` and `max_training_epochs=3`.
If running in an environment without a GPU, please set `demo=True` in `fit`. The process takes approximately one and a half minutes.
To watch the `fit` process, we provide TensorBoard by default to visualize it.
Please run `tensorboard --logdir=./checkpoint/exp1/logs --host=127.0.0.1 --port=8888` on the command line.

In [4]:
results = task.fit(dataset, demo=True)

INFO:autogluon.task.image_classification.core:Start fitting
INFO:autogluon.task.image_classification.core:Start constructing search space
INFO:autogluon.task.image_classification.core:Finished.
INFO:autogluon.task.image_classification.core:Start using default backend.
INFO:autogluon.scheduler.fifo:Starting Experiments
INFO:autogluon.scheduler.fifo:Num of Finished Tasks is 0
INFO:autogluon.scheduler.fifo:Num of Pending Tasks is 2
INFO:autogluon.searcher.searcher:Finished Task with config: {"model": "cifar_resnet110_v1", "optimizer": "adam", "lr": 0.0001750926405119516, "momentum": 0.854224638460032, "pretrained": true, "wd": 7.829126792739102e-05} and reward: 0.33203125
INFO:autogluon.task.image_classification.core:Finished.
INFO:autogluon.searcher.searcher:Finished Task with config: {"model": "cifar_resnet56_v1", "optimizer": "sgd", "lr": 0.059722562143918044, "momentum": 0.8545740897127332, "pretrained": false, "wd": 8.437329566608076e-06} and reward: 0.09114583333333333
INFO:autogluo

The best accuracy is:

In [5]:
print('%.2f acc' % (results.val_accuracy * 100))

33.20 acc


The associated best configuration is:

In [6]:
print(results.config)

{'model': 'cifar_resnet110_v1', 'optimizer': 'adam', 'lr': 0.0001750926405119516, 'momentum': 0.854224638460032, 'pretrained': True, 'wd': 7.829126792739102e-05}


Total time cost is:

In [7]:
print('%.2f s' % results.time)

100.14 s
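
How the reported result relates to the per-trial rewards in the log above can be sketched as follows. This is a minimal illustration, assuming the best result is simply the finished trial with the highest reward. The `trials` list is copied by hand from the log (with abbreviated configs) and is not an AutoGluon data structure.

```python
# Hypothetical record of the two finished trials, copied from the log output.
trials = [
    {'config': {'model': 'cifar_resnet110_v1', 'optimizer': 'adam'},
     'reward': 0.33203125},
    {'config': {'model': 'cifar_resnet56_v1', 'optimizer': 'sgd'},
     'reward': 0.09114583333333333},
]

# Assumption: the best trial is the one with the highest reward (validation accuracy).
best = max(trials, key=lambda t: t['reward'])

print('%.2f acc' % (best['reward'] * 100))  # 33.20 acc
print(best['config'])
```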


## A Step-by-step Image Classification Example

We first introduce the basic configuration `autogluon.space`, which represents the search space of each task component. We then go through each component, including

* `autogluon.Dataset`
* `autogluon.Nets`
* `autogluon.Optimizers`
* `autogluon.Losses`
* `autogluon.Metrics`

and finally put everything together in `fit` to generate the best results.

### Import AutoGluon

In [8]:
import autogluon as ag

### Create AutoGluon Space


`autogluon.space` is a search space containing a set of configuration candidates.
We provide three basic space types.

* Categorical Space

In [9]:
list_space = ag.space.List('listspace', ['0', '1', '2'])
print(list_space)

AutoGluon List Space listspace: ['0', '1', '2']


* Linear Space

In [10]:
linear_space = ag.space.Linear('linspace', 0, 10)
print(linear_space)

AutoGluon Linear Space linspace: lower 0, upper 10


* Log Space

In [11]:
log_space = ag.space.Log('logspace', 10**-10, 10**-1)
print(log_space)

AutoGluon Log Space logspace: lower 0.000000, upper 0.100000


* An Example of Random Sample from the Combined Space

In [12]:
print(ag.space.sample_configuration([list_space, linear_space, log_space]))

Configuration:
  linspace, Value: 4
  listspace, Value: '0'
  logspace, Value: 0.002624699612532773
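
To make the three space types concrete, here is a minimal, self-contained sketch of how such spaces could be sampled. It does not use AutoGluon. The function names (`sample_list`, `sample_linear`, `sample_log`) are hypothetical, and treating the linear space as integer-valued is an assumption based on the sample output above.

```python
import math
import random

def sample_list(choices, rng):
    """Categorical space: pick one candidate uniformly at random."""
    return rng.choice(choices)

def sample_linear(lower, upper, rng):
    """Linear space: pick an integer uniformly in [lower, upper] (assumed integer-valued)."""
    return rng.randint(lower, upper)

def sample_log(lower, upper, rng):
    """Log space: sample the exponent uniformly, then map back to the original scale."""
    exponent = rng.uniform(math.log10(lower), math.log10(upper))
    return 10 ** exponent

rng = random.Random(0)  # fixed seed so the sketch is reproducible
config = {
    'listspace': sample_list(['0', '1', '2'], rng),
    'linspace': sample_linear(0, 10, rng),
    'logspace': sample_log(10**-10, 10**-1, rng),
}
print(config)
```

Sampling the exponent rather than the value itself is what lets a log space cover several orders of magnitude evenly, which suits quantities such as learning rates.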



We will then use `autogluon.Nets` and `autogluon.Optimizers` as examples to show the usage of auto objects. The remaining auto objects use their default values.

### Create AutoGluon Nets

`autogluon.Nets` is a list of auto networks that allows searching for the best net

* from a list of provided (or default) networks
* by choosing the best architecture for each auto net.

In [13]:
# type of net_list is ag.space.List

# method 1 (complex but flexible): specify the net_list using get_model
# net_list = [task.model_zoo.get_model('cifar_resnet20_v1'), # TODO: pretrained and pretrained_dataset would be supported
#             task.model_zoo.get_model('cifar_resnet56_v1'),
#             task.model_zoo.get_model('cifar_resnet110_v1')]

# method 2 (easy but less flexible): specify the net_list using model names
net_list = ['cifar_resnet20_v1',
            'cifar_resnet56_v1',
            'cifar_resnet110_v1']

# default net list for image classification would be overwritten 
# if net_list is provided
nets = ag.Nets(net_list)

print(nets)

AutoGluon Nets ['cifar_resnet20_v1', 'cifar_resnet56_v1', 'cifar_resnet110_v1'] with Configuration space object:
  Hyperparameters:
    model, Type: Categorical, Choices: {cifar_resnet20_v1, cifar_resnet56_v1, cifar_resnet110_v1}, Default: cifar_resnet20_v1
    pretrained, Type: Categorical, Choices: {True, False}, Default: True
  Conditions:
    pretrained | model in {'cifar_resnet20_v1', 'cifar_resnet56_v1', 'cifar_resnet110_v1'}
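
The configuration space printed above has two categorical hyperparameters, `model` and `pretrained`. As a rough sketch of what the searcher chooses among, the snippet below enumerates their combinations in plain Python. It is an illustration only; `net_configs` is a hypothetical name and not part of AutoGluon.

```python
from itertools import product

models = ['cifar_resnet20_v1', 'cifar_resnet56_v1', 'cifar_resnet110_v1']
pretrained_choices = [True, False]

# Every (model, pretrained) pair is one candidate network configuration.
net_configs = [{'model': m, 'pretrained': p}
               for m, p in product(models, pretrained_choices)]

for cfg in net_configs:
    print(cfg)
print(len(net_configs))  # 3 models x 2 pretrained settings = 6
```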



### Create AutoGluon Optimizers

`autogluon.Optimizers` defines a list of optimization algorithms and allows searching for the best one

* from a list of provided (or default) optimizers
* by choosing the best hyper-parameters for each auto optimizer

In [14]:
# method 1 (complex but flexible): specify the optim_list using get_optim
# optimizers = ag.Optimizers([ag.optim.get_optim('sgd'),
#                             ag.optim.get_optim('adam')])

# method 2 (easy but less flexible): specify the optim_list using optimizer names
optimizers = ag.Optimizers(['sgd', 'adam'])

print(optimizers)

AutoGluon Optimizers ['sgd', 'adam'] with Configuration space object:
  Hyperparameters:
    lr, Type: UniformFloat, Range: [0.0001, 0.1], Default: 0.0031622777, on log-scale
    momentum, Type: UniformFloat, Range: [0.85, 0.95], Default: 0.9
    optimizer, Type: Categorical, Choices: {sgd, adam}, Default: sgd
    wd, Type: UniformFloat, Range: [1e-06, 0.01], Default: 0.0001, on log-scale
  Conditions:
    lr | optimizer in {'sgd', 'adam'}
    momentum | optimizer in {'sgd', 'adam'}
    wd | optimizer in {'sgd', 'adam'}
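
The defaults printed above are consistent with the midpoint of each range on its own scale: the geometric mean for the log-scale ranges (`lr`, `wd`) and the arithmetic mean for `momentum`. The check below is an observation about the printed numbers, not a statement about how the library computes its defaults. The helper names are hypothetical.

```python
import math

def log_midpoint(lower, upper):
    """Midpoint of a range on a log scale, i.e. the geometric mean."""
    return math.sqrt(lower * upper)

def linear_midpoint(lower, upper):
    """Midpoint of a range on a linear scale, i.e. the arithmetic mean."""
    return (lower + upper) / 2.0

lr_default = log_midpoint(1e-4, 0.1)            # compare with Default: 0.0031622777
wd_default = log_midpoint(1e-6, 0.01)           # compare with Default: 0.0001
momentum_default = linear_midpoint(0.85, 0.95)  # compare with Default: 0.9

print('%.10f' % lr_default)
print('%.6f' % wd_default)
print('%.2f' % momentum_default)
```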



### Create AutoGluon Fit - Put all together

In [15]:
stop_criterion = {
    'time_limits': 1*60*60,
    'max_metric': 1.0,
    'max_trial_count': 4
}

resources_per_trial = {
    'max_num_gpus': 0, # increase this if you have a GPU machine, to run more efficiently
    'max_num_cpus': 4,
    'max_training_epochs': 2
}

results = task.fit(dataset,
                   nets,
                   optimizers,
                   stop_criterion=stop_criterion,
                   resources_per_trial=resources_per_trial,
                   demo=True) # demo=True is recommended when running on a machine without a GPU

INFO:autogluon.task.image_classification.core:Start fitting
INFO:autogluon.task.image_classification.core:Start constructing search space
INFO:autogluon.task.image_classification.core:Finished.
INFO:autogluon.task.image_classification.core:Start using default backend.
INFO:autogluon.scheduler.fifo:Starting Experiments
INFO:autogluon.scheduler.fifo:Num of Finished Tasks is 0
INFO:autogluon.scheduler.fifo:Num of Pending Tasks is 4
INFO:autogluon.searcher.searcher:Finished Task with config: {"model": "cifar_resnet110_v1", "optimizer": "adam", "lr": 0.000971822017027605, "momentum": 0.8955635388444919, "pretrained": false, "wd": 1.0310856645047407e-05} and reward: 0.130859375
INFO:autogluon.searcher.searcher:Finished Task with config: {"model": "cifar_resnet110_v1", "optimizer": "adam", "lr": 0.0006384488810730198, "momentum": 0.8965231041889178, "pretrained": false, "wd": 0.00022854716918998107} and reward: 0.09765625
INFO:autogluon.searcher.searcher:Finished Task with config: {"model": "

The best accuracy is:

In [16]:
print('%.2f acc' % (results.val_accuracy * 100))

13.09 acc


The best associated configuration is:

In [17]:
print(results.config)

{'model': 'cifar_resnet110_v1', 'optimizer': 'adam', 'lr': 0.000971822017027605, 'momentum': 0.8955635388444919, 'pretrained': False, 'wd': 1.0310856645047407e-05}


Total time cost is:

In [18]:
print('%.2f s' % results.time)

154.39 s


### Use Search Algorithm

`autogluon.searcher` will support both basic and SOTA searchers for both hyper-parameter optimization and architecture search. We currently support random search, which is also the default searcher.

In [19]:
# cs would normally be a CS.ConfigurationSpace() (with `import ConfigSpace as CS`); this is just example code.
# In practice, this happens inside the fit function, and cs should not be None.
cs = None
searcher = ag.searcher.RandomSampling(cs)

print(searcher)

RandomSampling(ConfigSpace: NoneResults: OrderedDict())


Or simply use string name:

In [20]:
searcher = 'random'

print(searcher)

random
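
For readers new to random search, the idea can be sketched in a few lines of plain Python: draw configurations independently from the search space, evaluate each, and keep the best. Everything here is hypothetical and separate from AutoGluon. In particular, `toy_reward` is a stand-in for actually training a model.

```python
import math
import random

def sample_config(rng):
    """Draw one configuration from a small, hypothetical search space."""
    return {
        'optimizer': rng.choice(['sgd', 'adam']),
        'lr': 10 ** rng.uniform(math.log10(1e-4), math.log10(0.1)),  # log scale
        'momentum': rng.uniform(0.85, 0.95),
    }

def toy_reward(config):
    """Stand-in for training: the reward peaks when lr is close to 1e-2."""
    return 1.0 / (1.0 + abs(math.log10(config['lr']) + 2.0))

def random_search(num_trials, seed=0):
    """Evaluate `num_trials` random configurations and return the best one."""
    rng = random.Random(seed)
    best_config, best_reward = None, float('-inf')
    for _ in range(num_trials):
        config = sample_config(rng)
        reward = toy_reward(config)
        if reward > best_reward:
            best_config, best_reward = config, reward
    return best_config, best_reward

best_config, best_reward = random_search(num_trials=20)
print(best_config, best_reward)
```

Because trials are independent, random search is simple to run in parallel, which makes it a common baseline before moving to more elaborate searchers.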


### Use Trial Scheduler

`ag.scheduler` supports scheduling trials in serial order and with early stopping.

We support a basic FIFO scheduler.

In [21]:
# this is just example code; in practice, this is in fit function
savedir = 'checkpoint/demo.ag'

trial_scheduler = ag.scheduler.FIFO_Scheduler(
                task.pipeline.train_image_classification,
                None,
                {
                    'num_cpus': 4,
                    'num_gpus': 0,
                },
                searcher,
                checkpoint=savedir)

print(trial_scheduler)

<autogluon.scheduler.fifo.FIFO_Scheduler object at 0x1a20b13780>
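
As a conceptual illustration of first-in, first-out scheduling, the toy class below runs pending tasks strictly in submission order. It is a sketch under simplifying assumptions (a single worker, no resource accounting, no checkpointing) and does not reflect how `FIFO_Scheduler` is implemented. `FifoScheduler` and `toy_train` are hypothetical names.

```python
from collections import deque

class FifoScheduler:
    """Toy FIFO scheduler: runs pending tasks strictly in submission order."""

    def __init__(self, train_fn):
        self.train_fn = train_fn
        self.pending = deque()
        self.finished = []

    def add_task(self, config):
        self.pending.append(config)

    def run(self):
        while self.pending:
            config = self.pending.popleft()  # the oldest task goes first
            reward = self.train_fn(config)
            self.finished.append((config, reward))
        return self.finished

def toy_train(config):
    # Stand-in for a real training function; returns a fake reward.
    return 1.0 - config['lr']

scheduler = FifoScheduler(toy_train)
for lr in (0.1, 0.01, 0.5):
    scheduler.add_task({'lr': lr})

finished = scheduler.run()
print([cfg['lr'] for cfg, _ in finished])  # [0.1, 0.01, 0.5]
```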


We also support Hyperband, which is an early stopping mechanism.

In [22]:
# this is just example code; in practice, this is in fit function

trial_scheduler = ag.scheduler.Hyperband_Scheduler(
                task.pipeline.train_image_classification,
                None,
                {
                    'num_cpus': 4,
                    'num_gpus': 0,
                },
                searcher,
                time_attr='epoch',
                reward_attr='accuracy',
                max_t=10,
                grace_period=1,
                checkpoint=savedir)

print(trial_scheduler)

<autogluon.scheduler.hyperband.Hyperband_Scheduler object at 0x1a20b13b38>
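
The early stopping idea behind Hyperband can be illustrated with successive halving, its core building block: train every configuration for a small budget, keep only the best fraction, and give the survivors a larger budget. The sketch below shows a single round of successive halving, not full Hyperband and not the library's implementation. `max_t` and `grace_period` mirror the arguments above, while `reduction_factor`, `toy_evaluate`, and the `quality` field are assumptions made for illustration.

```python
def successive_halving(configs, evaluate, max_t=10, grace_period=1, reduction_factor=3):
    """Return the best config and a history of (epochs, number of configs trained).

    `evaluate(config, epochs)` returns the reward after `epochs` epochs of training.
    """
    survivors = list(configs)
    budget = grace_period
    history = []
    while True:
        scored = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        history.append((budget, len(scored)))
        if len(scored) == 1 or budget >= max_t:
            return scored[0], history
        keep = max(1, len(scored) // reduction_factor)  # stop the weaker trials early
        survivors = scored[:keep]
        budget = min(max_t, budget * reduction_factor)  # train survivors longer

def toy_evaluate(config, epochs):
    # Stand-in for training: reward grows with epochs and with a hidden "quality".
    return config['quality'] * (1.0 - 0.5 ** epochs)

configs = [{'quality': i / 10} for i in range(1, 10)]
best, history = successive_halving(configs, toy_evaluate)
print(best)
print(history)  # [(1, 9), (3, 3), (9, 1)]
```

Most of the budget goes to the few configurations that look promising early on, which is why this kind of scheduler can finish a search faster than running every trial to completion.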


### Resume Fit and Checkpointer

We pass the `resume` flag and the checkpoint path to the scheduler.

In [23]:
savedir = 'checkpoint/demo.ag'
resume = False

In [24]:
trial_scheduler = ag.scheduler.Hyperband_Scheduler(
                task.pipeline.train_image_classification,
                None,
                {
                    'num_cpus': 4,
                    'num_gpus': 0,
                },
                searcher,
                checkpoint=savedir,
                resume=resume,
                time_attr='epoch',
                reward_attr='accuracy',
                max_t=10,
                grace_period=1)

print(trial_scheduler)

<autogluon.scheduler.hyperband.Hyperband_Scheduler object at 0x1a20b13e80>


Or simply specify the trial scheduler with the string name:

In [25]:
trial_scheduler = 'hyperband'

### Visualize Using Tensor/MXBoard

We can visualize the training curve using TensorBoard or MXBoard.
To start TensorBoard or MXBoard, please use:

`tensorboard --logdir=./checkpoint/demo/logs --host=127.0.0.1 --port=8889`

An example is shown below.

<img src="./img/cifar_accuracy_curves_2.svg" width="400" height="400" />

### Create Stop Criterion

`autogluon` supports overall constraints on the automatic search through `stop_criterion`.

In [26]:
stop_criterion = {
    'time_limits': 1*60*60,
    'max_metric': 0.80, # set this if you know the target metric; otherwise use the default 1.0
    'max_trial_count': 2
}
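
One plausible reading of these three keys is that the search stops as soon as any one of them is reached. The sketch below encodes that reading. It is an assumption about the semantics, not the library's logic, and `should_stop` is a hypothetical helper. `time_limits` is interpreted as seconds, as the `1*60*60` expression suggests.

```python
import time

def should_stop(stop_criterion, start_time, best_metric, finished_trials, now=None):
    """Return True once any of the three limits has been reached."""
    now = time.time() if now is None else now
    if now - start_time >= stop_criterion['time_limits']:
        return True  # ran out of wall-clock time
    if best_metric >= stop_criterion['max_metric']:
        return True  # the target metric has been reached
    if finished_trials >= stop_criterion['max_trial_count']:
        return True  # the trial budget is used up
    return False

criterion = {'time_limits': 1*60*60, 'max_metric': 0.80, 'max_trial_count': 2}

print(should_stop(criterion, start_time=0, best_metric=0.50, finished_trials=1, now=10))  # False
print(should_stop(criterion, start_time=0, best_metric=0.85, finished_trials=1, now=10))  # True
print(should_stop(criterion, start_time=0, best_metric=0.50, finished_trials=2, now=10))  # True
```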

### Create Resources Per Trial

`autogluon` supports constraints for each trial in `resources_per_trial`.

In [27]:
resources_per_trial = {
    'max_num_gpus': 0,
    'max_num_cpus': 4,
    'max_training_epochs': 1
}

### Create AutoGluon Fit with Full Capacity

In [28]:
results = task.fit(dataset,
                  nets,
                  optimizers,
                  searcher=searcher,
                  trial_scheduler='fifo',
                  resume=resume,
                  savedir=savedir,
                  stop_criterion=stop_criterion,
                  resources_per_trial=resources_per_trial,
                  demo=True) # only set demo=True when running on a machine without a GPU

INFO:autogluon.task.image_classification.core:Start fitting
INFO:autogluon.task.image_classification.core:Start constructing search space
INFO:autogluon.task.image_classification.core:Finished.
INFO:autogluon.task.image_classification.core:Start using default backend.
INFO:autogluon.scheduler.fifo:Starting Experiments
INFO:autogluon.scheduler.fifo:Num of Finished Tasks is 0
INFO:autogluon.scheduler.fifo:Num of Pending Tasks is 2
INFO:autogluon.searcher.searcher:Finished Task with config: {"model": "cifar_resnet56_v1", "optimizer": "sgd", "lr": 0.00017169634114508177, "momentum": 0.9144799623364764, "pretrained": true, "wd": 0.00038767613621760414} and reward: 0.13671875
INFO:autogluon.searcher.searcher:Finished Task with config: {"model": "cifar_resnet20_v1", "optimizer": "adam", "lr": 0.008119505267006553, "momentum": 0.9180838785194803, "pretrained": false, "wd": 7.453270287735516e-06} and reward: 0.1015625
INFO:autogluon.task.image_classification.core:Finished.
INFO:autogluon.task.i

The best accuracy is:

In [29]:
print('%.2f acc' % (results.val_accuracy * 100))

13.67 acc


The associated best configuration is:

In [30]:
print(results.config)

{'model': 'cifar_resnet56_v1', 'optimizer': 'sgd', 'lr': 0.00017169634114508177, 'momentum': 0.9144799623364764, 'pretrained': True, 'wd': 0.00038767613621760414}


The total time cost is:

In [31]:
print('%.2f s' % results.time)

17.94 s


### Resume AutoGluon Fit

We can resume the previous training for more epochs to achieve better results. Similarly, we can also increase `max_trial_count` for better results.

Here we increase `max_training_epochs` from 1 to 3 and `max_trial_count` from 2 to 3, and set `resume = True`, which loads the checkpoint from `savedir`.

In [32]:
stop_criterion = {
    'time_limits': 1*60*60,
    'max_metric': 0.80,
    'max_trial_count': 3
}

resources_per_trial = {
    'max_num_gpus': 0,
    'max_num_cpus': 4,
    'max_training_epochs': 3
}

resume = True

In [33]:
results = task.fit(dataset,
                  nets,
                  optimizers,
                  searcher=searcher,
                  trial_scheduler='fifo',
                  resume=resume,
                  savedir=savedir,
                  stop_criterion=stop_criterion,
                  resources_per_trial=resources_per_trial,
                  demo=True)

INFO:autogluon.task.image_classification.core:Start fitting
INFO:autogluon.task.image_classification.core:Start constructing search space
INFO:autogluon.task.image_classification.core:Finished.
INFO:autogluon.task.image_classification.core:Start using default backend.
INFO:autogluon.scheduler.scheduler:Seting TASK ID: 8
INFO:autogluon.scheduler.fifo:Starting Experiments
INFO:autogluon.scheduler.fifo:Num of Finished Tasks is 2
INFO:autogluon.scheduler.fifo:Num of Pending Tasks is 1
INFO:autogluon.task.image_classification.core:Finished.
INFO:autogluon.searcher.searcher:Finished Task with config: {"model": "cifar_resnet110_v1", "optimizer": "sgd", "lr": 0.0007197816540971983, "momentum": 0.9026482740642131, "pretrained": false, "wd": 1.2516407792664963e-06} and reward: 0.10677083333333333
INFO:autogluon.task.image_classification.core:Finished.


The best accuracy is:

In [34]:
print('%.2f acc' % (results.val_accuracy * 100))

13.67 acc


The associated best configuration is:

In [35]:
print(results.config)

{'model': 'cifar_resnet56_v1', 'optimizer': 'sgd', 'lr': 0.00017169634114508177, 'momentum': 0.9144799623364764, 'pretrained': True, 'wd': 0.00038767613621760414}


The total time cost is:

In [36]:
print('%.2f s' % results.time)

64.45 s
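
The resume behavior seen in the logs above (two finished tasks carried over, one new pending task, and an unchanged best accuracy) can be mimicked with a small checkpointing sketch. It only models the trial bookkeeping and assumes the search state is stored as a JSON file, which is not necessarily how AutoGluon stores checkpoints. The rewards are copied from the logs, and all function names are hypothetical.

```python
import json
import os
import tempfile

def save_state(path, state):
    with open(path, 'w') as f:
        json.dump(state, f)

def load_state(path):
    if not os.path.exists(path):
        return {'finished': []}
    with open(path) as f:
        return json.load(f)

def run_search(path, max_trial_count, rewards, resume):
    """Run trials until max_trial_count is reached, checkpointing after each one."""
    state = load_state(path) if resume else {'finished': []}
    while len(state['finished']) < max_trial_count:
        trial_id = len(state['finished'])
        # Stand-in for training: look up a precomputed reward for this trial.
        state['finished'].append({'trial': trial_id, 'reward': rewards[trial_id]})
        save_state(path, state)
    return max(state['finished'], key=lambda t: t['reward'])

rewards = [0.13671875, 0.1015625, 0.10677083333333333]  # taken from the logs above

with tempfile.TemporaryDirectory() as tmp:
    ckpt = os.path.join(tmp, 'demo_state.json')
    first = run_search(ckpt, max_trial_count=2, rewards=rewards, resume=False)
    second = run_search(ckpt, max_trial_count=3, rewards=rewards, resume=True)

print('%.2f acc' % (first['reward'] * 100))   # 13.67 acc
print('%.2f acc' % (second['reward'] * 100))  # 13.67 acc
```

The resumed run executes only the one missing trial. Since its reward is lower than the earlier best, the reported best accuracy stays the same.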


## References

code: https://code.amazon.com/packages/AutoGluon/trees/heads/mainline 

[1] fit backend tutorial: https://code.amazon.com/packages/AutoGluon/blobs/brazil/--/tutorials/autogluon_demo.ipynb 

[2] fit backend distributed tutorial: https://code.amazon.com/packages/AutoGluon/blobs/brazil/--/tutorials/autogluon_demo.ipynb