# AutoGluon

## Preparation

### Install AutoGluon

```bash
git clone ssh://git.amazon.com/pkg/AutoGluon
cd AutoGluon
python setup.py develop
```

### Import Task

We are using text classification as an example in this notebook.

In [None]:
from autogluon import text_classification as task

import logging
logging.basicConfig(level=logging.INFO)

## A Quick Text Classification Example

We first show the most basic usage: create a dataset, then call `fit` on it to produce text classification results. This tutorial uses the Stanford Sentiment Treebank dataset.

### Create AutoGluon Dataset

We use the Stanford Sentiment Treebank 2 (SST-2) dataset.

In [None]:
dataset = task.Dataset(name='sst_2')  # the name is case insensitive
# TODO. show a quick summary of the dataset, e.g. #example for train, #classes

The constructed dataset contains a `gluon.data.DataLoader` for each of the training and validation sets.

Next, we use the default text classification configuration to obtain:
* the best result of the search, in terms of validation accuracy
* the configuration that produced that best result

To achieve this, we call the `fit` function on the dataset.

The default configuration uses `max_trial_count=5` and `max_training_epochs=5`.

In [None]:
results = task.fit(dataset)

The best accuracy is:

In [None]:
print(results.metric)

The associated best configuration is:

In [None]:
print(results.config)

Total time cost is:

In [None]:
print('%.2f s' % results.time)
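
The three attributes printed above (`metric`, `config`, `time`) can be pictured as the outcome of a small trial loop. The following is only a minimal sketch, not AutoGluon's implementation: it assumes plain random search, and `toy_fit`, `sample_config`, `evaluate`, and the `Results` tuple are hypothetical names introduced for illustration. The `evaluate` function is a stand-in for training and validating a model.

```python
import math
import random
import time
from collections import namedtuple

# Hypothetical container; the field names mirror the attributes printed above.
Results = namedtuple('Results', ['metric', 'config', 'time'])


def toy_fit(evaluate, sample_config, max_trial_count=5):
    """Run max_trial_count random trials and keep the best-scoring one."""
    start = time.perf_counter()
    best_metric, best_config = float('-inf'), None
    for _ in range(max_trial_count):
        config = sample_config()
        metric = evaluate(config)
        if metric > best_metric:
            best_metric, best_config = metric, config
    return Results(best_metric, best_config, time.perf_counter() - start)


rng = random.Random(0)


def sample_config():
    # Assumed search space: a learning rate between 1e-4 and 1e-1.
    return {'lr': 10 ** rng.uniform(-4, -1)}


def evaluate(config):
    # Stand-in for training plus validation; the score peaks at lr = 0.01.
    return 1.0 - abs(math.log10(config['lr']) + 2) / 3


toy_results = toy_fit(evaluate, sample_config)
print(toy_results.metric)
print(toy_results.config)
print('%.2f s' % toy_results.time)
```

Keeping the best metric and its configuration together in one return value is what lets the caller read both without re-running the search.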

## A Step-by-step Text Classification Example

We first introduce the basic configuration object, `autogluon.space`, which represents the search space of each task component. We then go through each component:

* `autogluon.Dataset`
* `autogluon.Nets`
* `autogluon.Optimizers`
* `autogluon.Losses`
* `autogluon.Metrics`

Finally, we put them all together in `fit` to generate the best results.

### Import AutoGluon

In [None]:
import warnings
warnings.filterwarnings("ignore")

import autogluon as ag

### Create AutoGluon Space


`autogluon.space` is a search space containing a set of configuration candidates.
We provide three basic space types.

* Categorical Space

In [None]:
list_space = ag.space.List('listspace', ['0', '1', '2'])
print(list_space)

* Linear Space

In [None]:
linear_space = ag.space.Linear('linspace', 0, 10)
print(linear_space)

* Log Space

In [None]:
log_space = ag.space.Log('logspace', 10**-10, 10**-1)
print(log_space)

* An Example of a Random Sample from the Combined Space

In [None]:
print(ag.space.sample_configuration([list_space, linear_space, log_space]))
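
To make the three space types concrete, here is a minimal pure-Python sketch of how one value might be drawn from each. It does not use AutoGluon and makes assumptions: the linear space is treated as continuous and uniform, the log space as uniform in the logarithm of the value, and all function names below are hypothetical.

```python
import math
import random


def sample_categorical(choices, rng):
    """Pick one candidate uniformly from a finite list."""
    return rng.choice(choices)


def sample_linear(low, high, rng):
    """Draw a value uniformly between low and high."""
    return rng.uniform(low, high)


def sample_log(low, high, rng):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_configuration(spaces, rng):
    """Draw one value from every named space and return them as a dict."""
    return {name: sampler(rng) for name, sampler in spaces.items()}


rng = random.Random(0)
spaces = {
    'listspace': lambda r: sample_categorical(['0', '1', '2'], r),
    'linspace': lambda r: sample_linear(0, 10, r),
    'logspace': lambda r: sample_log(10**-10, 10**-1, r),
}
print(sample_configuration(spaces, rng))
```

Sampling in log scale matters for ranges such as `10**-10` to `10**-1`: a uniform draw on the raw interval would almost never produce values below `10**-3`, whereas a log-uniform draw gives every order of magnitude the same probability.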

We then use `autogluon.Nets` and `autogluon.Optimizers` as examples to show how auto objects are used. The remaining auto objects use their default values.

### Create AutoGluon Nets

`autogluon.Nets` is a list of auto networks. It allows searching for the best net

* from a list of provided (or default) networks
* by choosing the best architecture for each auto net.

In [None]:
# type of net_list is ag.space.List

# method 1 (complex but flexible): specify the net_list using get_model
# net_list = [task.model_zoo.get_model('cifar_resnet20_v1'), # TODO: pretrained and pretrained_dataset would be supported
#             task.model_zoo.get_model('cifar_resnet56_v1'),
#             task.model_zoo.get_model('cifar_resnet110_v1')]

# method 2 (easy but less flexible): specify the net_list using model names
net_list = ['standard_lstm_lm_200',
            'awd_lstm_lm_600',
            'awd_lstm_lm_1150']

# the default net list for text classification is overridden
# if net_list is provided
nets = ag.Nets(net_list)

print(nets)

### Create AutoGluon Optimizers

`autogluon.Optimizers` defines a list of optimization algorithms. It allows searching for the best optimization algorithm

* from a list of provided (or default) optimizers
* by choosing the best hyperparameters for each auto optimizer

In [None]:
# method 1 (complex but flexible): specify the optim_list using get_optim
# optimizers = ag.Optimizers([ag.optim.get_optim('sgd'),
#                             ag.optim.get_optim('adam')])

# method 2 (easy but less flexible): specify the optim_list using optimizer names
optimizers = ag.Optimizers(['sgd', 'adam', 'ftml'])

print(optimizers)
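
Searching over optimizers involves two nested choices: which algorithm to use, and which hyperparameters to give it. The sketch below illustrates that idea in plain Python. It is not how `ag.Optimizers` is implemented; the hyperparameter names and ranges in `OPTIMIZER_SPACES` are made up for illustration, and `sample_optimizer` is a hypothetical helper.

```python
import math
import random

# Hypothetical per-optimizer ranges; none of these values come from AutoGluon.
OPTIMIZER_SPACES = {
    'sgd': {'learning_rate': (1e-4, 1e-1), 'momentum': (0.8, 0.99)},
    'adam': {'learning_rate': (1e-5, 1e-2)},
    'ftml': {'learning_rate': (1e-5, 1e-2)},
}


def log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_optimizer(rng):
    """Pick an optimizer name, then sample only that optimizer's hyperparameters."""
    name = rng.choice(sorted(OPTIMIZER_SPACES))
    params = {}
    for key, (low, high) in OPTIMIZER_SPACES[name].items():
        if key == 'learning_rate':
            # Learning rates span orders of magnitude, so sample in log scale.
            params[key] = log_uniform(low, high, rng)
        else:
            params[key] = rng.uniform(low, high)
    return name, params


rng = random.Random(0)
for _ in range(3):
    print(sample_optimizer(rng))
```

Because the hyperparameters depend on the chosen algorithm, the space is conditional: `momentum` is only sampled when `sgd` is selected.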

### Create AutoGluon Fit - Putting It All Together

In [None]:
stop_criterion = {
    'time_limits': 1*60*60,
    'max_metric': 0.80,
    'max_trial_count': 5
}

resources_per_trial = {
    'max_num_gpus': 4,
    'max_num_cpus': 4,
    'max_training_epochs': 5
}

results = task.fit(dataset,
                   nets,
                   optimizers,
                   stop_criterion=stop_criterion,
                   resources_per_trial=resources_per_trial)

The best accuracy is:

In [None]:
print(results.val_accuracy)

The best associated configuration is:

In [None]:
print(results.config)

Total time cost is:

In [None]:
print('%.2f s' % results.time)
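
The `stop_criterion` dictionary passed to `fit` above combines three limits. One plausible reading, assumed here and not confirmed by the AutoGluon source, is that the search ends as soon as any one of them is reached: the time budget is spent, the target metric is met, or the trial count is exhausted. The sketch below illustrates that reading with precomputed metrics standing in for real trials; `should_stop` and `run_trials` are hypothetical helpers.

```python
import time


def should_stop(elapsed, best_metric, trials_done, stop_criterion):
    """Return True as soon as any one of the three limits is reached."""
    return (elapsed >= stop_criterion['time_limits']
            or best_metric >= stop_criterion['max_metric']
            or trials_done >= stop_criterion['max_trial_count'])


def run_trials(metrics, stop_criterion):
    """Consume precomputed trial metrics until a stop condition fires."""
    start = time.perf_counter()
    best, done = float('-inf'), 0
    for metric in metrics:
        if should_stop(time.perf_counter() - start, best, done, stop_criterion):
            break
        best, done = max(best, metric), done + 1
    return best, done


stop_criterion = {
    'time_limits': 1 * 60 * 60,   # seconds
    'max_metric': 0.80,
    'max_trial_count': 5,
}

# Stops after the third trial because 0.83 reaches max_metric.
print(run_trials([0.61, 0.74, 0.83, 0.90], stop_criterion))  # (0.83, 3)

# Never reaches max_metric, so it stops at max_trial_count.
print(run_trials([0.5] * 10, stop_criterion))  # (0.5, 5)
```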

## References

* code: https://code.amazon.com/packages/AutoGluon/trees/heads/mainline 
* API design: https://quip-amazon.com/aaGsAS9lY3WU/AutoGluon-API
* Implementation roadmap: https://quip-amazon.com/zlQUAjSWBc3c/AutoGluon-System-Implementation-Roadmap