# AutoGluon

AutoGluon aims to provide automatic machine learning (Auto ML) support for MXNet and Gluon, with a focus on automatic deep learning (Auto DL). AutoGluon targets three groups of users:

* *Beginners* are 70~80% of the customers who would be interested in AutoGluon. In the basic Auto ML scenario, customers have a traditional machine learning task at hand, provide their own raw data, watch the search process, and finally obtain a good-quality model. Beginners include, but are not limited to, engineers and students who are generally new to machine learning.
* *Advanced users* want full control over, and access to, the overall Auto ML process as well as each important component, such as constructing their own networks, metrics, losses, optimizers, searchers, and trial schedulers. Advanced users may also have more specific constraints on the automatic search procedure. They include, but are not limited to, experienced machine learning researchers and engineers.
* *Contributors* are advanced users who create strategies that are useful for beginners, either by extending AutoGluon to new datasets, new domains, and new algorithms, or by bringing state-of-the-art results that save time and effort.

AutoGluon's design principles are:

* *Easy to use:* Deep learning framework users can use AutoGluon almost right away. The only usage difference between AutoGluon and Gluon is that, rather than providing a fixed value for each deep learning component, users provide a searchable range and let Auto ML decide which value is best. The usage of all the major APIs stays the same.
* *Easy to extend:* From the user's perspective, AutoGluon is organized by task, so users can easily use all the task-specific components, such as data preprocessing, the model zoo, metrics, and losses, and adding a new task is straightforward. Advanced ML tasks, such as GANs, can be incorporated by providing a new task module. From the system's perspective, the front end is designed to be separate from the back ends, so multiple back ends can be used, which helps when extending to production-level Auto ML.

In the following, *we use Image Classification as a running example* to illustrate the usage of AutoGluon's main APIs.


## Preparation


```bash
# TODO: only Linux is supported; macOS is still buggy.
git clone ssh://git.amazon.com/pkg/AutoGluon
cd AutoGluon
python setup.py develop
```

### Import Task

We are using image classification as an example in this notebook.

In [1]:
from autogluon import image_classification as task


## A Quick Image Classification Example

We first show the most basic usage: create a dataset, then fit it to generate results for the image classification example.

### Create AutoGluon Dataset

We use CIFAR10 for image classification for demonstration purposes.

In [None]:
dataset = task.Dataset(name='cifar10')  # the name is case insensitive
# TODO: show a quick summary of the dataset, e.g. number of training examples and number of classes

The constructed dataset contains the `gluon.data.DataLoader` for the CIFAR10 training and validation datasets.

In [None]:
dataset.train[0] # access the first example
dataset.val[-10:] # access the last 10 validation examples

Next, we use the default image classification configuration to generate:
* The best result of the search in terms of accuracy
* The configuration that produced the best result

To achieve this, we use the `fit` function to generate the above results from the dataset.

The default configuration is based on `max_trial_count=10` and `max_training_epochs=1`. This will take approximately 10 minutes to finish.

In [None]:
stop_criterion = {
    'time_limits': 1*60*60,
    'max_metric': 0.80,
    'max_trial_count': 5
}

resources_per_trial = {
    'max_num_gpus': 1,
    'max_num_cpus': 4,
    'max_training_epochs': 10
}


results = task.fit(dataset,
                   stop_criterion=stop_criterion,
                   resources_per_trial=resources_per_trial)

# TODO. only show INFO. save debug into a file
# TODO. show a valid figure

The best accuracy is:

In [None]:
results.accuracy

The associated best configuration is:

In [None]:
results.config
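
The three keys in `stop_criterion` can be read as independent budgets, with the search ending as soon as any one of them is exhausted. The following is a minimal sketch of that reading, using plain random search over a toy objective. It is not AutoGluon's implementation: `random_search`, `sample`, and `evaluate` are hypothetical names, and the sketch assumes that `time_limits` is given in seconds and that `max_metric` is a target accuracy at which the search may stop early.

```python
import math
import random
import time


def random_search(evaluate, sample, stop_criterion, seed=0):
    """Run trials until any stop condition is met; return the best trial."""
    rng = random.Random(seed)
    start = time.monotonic()
    best = {"accuracy": float("-inf"), "config": None}
    trials_run = 0
    while trials_run < stop_criterion["max_trial_count"]:
        # Assumption: time_limits is a wall-clock budget in seconds.
        if time.monotonic() - start >= stop_criterion["time_limits"]:
            break
        config = sample(rng)
        accuracy = evaluate(config)
        trials_run += 1
        if accuracy > best["accuracy"]:
            best = {"accuracy": accuracy, "config": config}
        # Assumption: max_metric is a target; reaching it ends the search.
        if best["accuracy"] >= stop_criterion["max_metric"]:
            break
    best["trials_run"] = trials_run
    return best


def sample(rng):
    """Toy search space: a learning rate drawn log-uniformly from [1e-4, 1]."""
    return {"lr": 10 ** rng.uniform(-4, 0)}


def evaluate(config):
    """Toy stand-in for a training run; the score peaks at lr = 1e-2."""
    return 1.0 - 0.2 * abs(math.log10(config["lr"]) + 2)


stop_criterion = {
    "time_limits": 1 * 60 * 60,
    "max_metric": 0.80,
    "max_trial_count": 5,
}
best = random_search(evaluate, sample, stop_criterion)
print(best["accuracy"], best["config"], best["trials_run"])
```

The returned dictionary plays the same role as `results.accuracy` and `results.config` above: it keeps only the best score seen so far and the configuration that produced it.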

## A Step-by-step Image Classification Example

We first introduce the basic configuration object, `autogluon.space`, which represents the search space of each task component. We then go through each component, including:

* `autogluon.Dataset`
* `autogluon.Nets`
* `autogluon.Optimizers`
* `autogluon.Losses`
* `autogluon.Metrics`

and finally put everything together in `fit` to generate the best results.

### Import AutoGluon

In [None]:
import warnings
warnings.filterwarnings("ignore")

import autogluon as ag

### Create AutoGluon Space


`autogluon.space` is a search space containing a set of configuration candidates.
We provide three basic space types.

* Categorical Space

In [None]:
list_space = ag.space.List('listspace', ['0', '1', '2'])
print(list_space)

* Linear Space

In [None]:
linear_space = ag.space.Linear('linspace', 0, 10)
print(linear_space)

* Log Space

In [None]:
log_space = ag.space.Log('logspace', 10**-10, 10**-1)
print(log_space)

* An Example of a Random Sample from the Combined Space

In [None]:
print(ag.space.sample_configuration([list_space, linear_space, log_space]))
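
To make the difference between the three space types concrete, here is a minimal standard-library sketch of how each could be sampled. It is an illustration under assumptions, not the `ag.space` implementation: the helper functions and the dictionary-based space descriptions are hypothetical, and the log space is assumed to be sampled uniformly in the exponent so that each decade between the bounds is equally likely.

```python
import math
import random


def sample_categorical(choices, rng):
    """Pick one candidate uniformly from a finite list."""
    return rng.choice(choices)


def sample_linear(lower, upper, rng):
    """Draw a value uniformly from [lower, upper]."""
    return rng.uniform(lower, upper)


def sample_log(lower, upper, rng):
    """Draw a value whose base-10 logarithm is uniform between the bounds."""
    return 10 ** rng.uniform(math.log10(lower), math.log10(upper))


def sample_configuration(spaces, rng):
    """Draw one value per named space and return them as a dict."""
    samplers = {
        "categorical": lambda s: sample_categorical(s["choices"], rng),
        "linear": lambda s: sample_linear(s["lower"], s["upper"], rng),
        "log": lambda s: sample_log(s["lower"], s["upper"], rng),
    }
    return {s["name"]: samplers[s["kind"]](s) for s in spaces}


# Hypothetical descriptions mirroring the three spaces created above.
spaces = [
    {"name": "listspace", "kind": "categorical", "choices": ["0", "1", "2"]},
    {"name": "linspace", "kind": "linear", "lower": 0, "upper": 10},
    {"name": "logspace", "kind": "log", "lower": 1e-10, "upper": 1e-1},
]

rng = random.Random(0)  # fixed seed so the draw is repeatable
config = sample_configuration(spaces, rng)
print(config)
```

A log scale suits ranges such as `10**-10` to `10**-1`, where a uniform draw on the raw values would almost never land below `10**-3`.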

We will then use `autogluon.Nets` and `autogluon.Optimizers` as examples to show the usage of auto objects. The remaining auto objects use their default values.

### Create AutoGluon Nets

`autogluon.Nets` is a list of auto networks that allows searching for the best net

* from a list of provided (or default) networks
* by choosing the best architecture for each auto net.

In [None]:
# # type of net_list is ag.space.List
# net_list = [task.model_zoo.get_model('resnet18_v1'),
#             task.model_zoo.get_model('resnet34_v1'),
#             task.model_zoo.get_model('resnet50_v1'),
#             task.model_zoo.get_model('resnet101_v1'),
#             task.model_zoo.get_model('resnet152_v1')]

# # default net list for image classification would be overwritten 
# # if net_list is provided
# nets = ag.Nets(net_list)

# type of net_list is ag.space.List
net_list = ['resnet18_v1',
            'resnet34_v1',
            'resnet50_v1',
            'resnet101_v1',
            'resnet152_v1']

# default net list for image classification would be overwritten 
# if net_list is provided
nets = ag.Nets(net_list)

print(nets)

### Create AutoGluon Optimizers

`autogluon.Optimizers` defines a list of optimization algorithms and allows searching for the best one

* from a list of provided (or default) optimizers
* by choosing the best hyperparameters for each auto optimizer

In [None]:
# optimizers = ag.Optimizers([ag.optim.get_optim('sgd'),
#                             ag.optim.get_optim('adam')])

# print(optimizers)

optimizers = ag.Optimizers(['sgd', 'adam'])

print(optimizers)
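
The two levels of search described above, first over which optimizer to use and then over that optimizer's own hyperparameters, can be sketched as a conditional search space. This is a hedged illustration rather than the behavior of `ag.Optimizers`: the `OPTIMIZER_SPACES` table, the hyperparameter names, and their bounds are assumptions chosen for the example.

```python
import math
import random

# Hypothetical search ranges per optimizer: (scale, lower, upper).
# The hyperparameter names and bounds are illustrative assumptions,
# not AutoGluon defaults.
OPTIMIZER_SPACES = {
    "sgd": {
        "learning_rate": ("log", 1e-4, 1e-1),
        "momentum": ("linear", 0.85, 0.95),
    },
    "adam": {
        "learning_rate": ("log", 1e-4, 1e-1),
    },
}


def sample_value(scale, lower, upper, rng):
    """Draw from [lower, upper], either uniformly or uniformly in log10."""
    if scale == "log":
        return 10 ** rng.uniform(math.log10(lower), math.log10(upper))
    return rng.uniform(lower, upper)


def sample_optimizer(names, rng):
    """Pick an optimizer, then sample only that optimizer's hyperparameters."""
    name = rng.choice(names)
    hyperparams = {
        key: sample_value(scale, lower, upper, rng)
        for key, (scale, lower, upper) in OPTIMIZER_SPACES[name].items()
    }
    return {"optimizer": name, "hyperparams": hyperparams}


rng = random.Random(0)  # fixed seed so the draws are repeatable
for _ in range(3):
    print(sample_optimizer(["sgd", "adam"], rng))
```

Because the hyperparameters are sampled only after the optimizer is chosen, a draw for `adam` never carries a `momentum` value that it would not use.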

### Create AutoGluon Fit - Put It All Together

In [None]:
stop_criterion = {
    'time_limits': 1*60*60,
    'max_metric': 0.80,
    'max_trial_count': 10
}

resources_per_trial = {
    'max_num_gpus': 1,
    'max_num_cpus': 4,
    'max_training_epochs': 20
}

results = task.fit(dataset,
                   stop_criterion=stop_criterion,
                   resources_per_trial=resources_per_trial)

The best accuracy is:

In [None]:
results.accuracy

The best associated configuration is:

In [None]:
results.config

<div style="width: 500px;">![trainingcurve](demo.png)</div>

## References

* Code: https://code.amazon.com/packages/AutoGluon/trees/heads/mainline
* API design: https://quip-amazon.com/aaGsAS9lY3WU/AutoGluon-API
* Implementation roadmap: https://quip-amazon.com/zlQUAjSWBc3c/AutoGluon-System-Implementation-Roadmap