# AutoML and AutoDL


The sensitivity of machine learning models to hyper-parameters and model architecture has led to the advent of AutoML libraries. The two key components of AutoML are the search space and the search algorithm. They are applied to two main problems:

- Neural Architecture Search (NAS)
- Hyper-Parameter Optimization (HPO)
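The two components can be made concrete with a minimal sketch. It assumes a toy objective in place of a real training run. The search space is a hypothetical dictionary of candidate values that mixes one hyper-parameter (`learning_rate`) with two architecture choices (`num_layers`, `activation`). Random search serves as the search algorithm. None of these names come from a specific AutoML library.

```python
import random

# Hypothetical search space: each tunable choice maps to its candidate values.
# learning_rate is a hyper-parameter (HPO); num_layers and activation are
# architecture choices (NAS).
SEARCH_SPACE = {
    "learning_rate": [1e-3, 1e-2, 1e-1],
    "num_layers": [1, 2, 3, 4],
    "activation": ["relu", "tanh"],
}

def objective(config):
    """Toy stand-in for 'train a model and return its validation loss'.
    Lower is better; the minimum (0.0) is at learning_rate=1e-2, num_layers=3, relu."""
    loss = abs(config["learning_rate"] - 1e-2) * 10
    loss += abs(config["num_layers"] - 3)
    loss += 0.0 if config["activation"] == "relu" else 0.5
    return loss

def random_search(space, objective, num_trials, seed=0):
    """Search algorithm: sample configurations uniformly at random and keep the best."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(num_trials):
        config = {name: rng.choice(values) for name, values in space.items()}
        loss = objective(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss

best_config, best_loss = random_search(SEARCH_SPACE, objective, num_trials=50)
print(best_config, best_loss)
```

Random search is the simplest baseline. The same search space could be handed to Bayesian optimization or an evolutionary method without redefining it.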

More details are available at https://github.com/D-X-Y/Awesome-AutoDL

- Automated Machine Learning (AutoML) is a promising paradigm for tackling this sensitivity. In AutoML, selecting architectures and hyper-parameters is formulated as a search problem. A search space is defined to represent all possible choices, and a search algorithm is used to find the best choices.

- For hyper-parameter search, the search space specifies the range of values to try. For architecture search, it specifies the architectural configurations to try. The search space plays a critical role in the success of neural architecture search (NAS) and can differ significantly from one application to another. There are also many different search algorithms, such as random search, Bayesian optimization, reinforcement learning (RL)-based methods, evolutionary methods, gradient-based methods, and neural predictors.

- This proliferation of search spaces and search algorithms makes AutoML difficult to program with existing software libraries. In particular, a common problem of current libraries is that search spaces and search algorithms are tightly coupled, which makes it hard to modify the search space or the search algorithm alone. A practical example is the need to upgrade the search algorithm while keeping the rest of the infrastructure unchanged.

- Some libraries formulate AutoML as a problem of jointly optimizing architectures and hyper-parameters. Others focus on providing interfaces for black-box optimization. In particular, Google's Vizier library provides tools for optimizing a user-specified search space with black-box algorithms, but it leaves the end user responsible for translating a point in the search space into a user program. DeepArchitect proposes a language for creating a search space as a program that connects user components. Keras-tuner takes a different approach and annotates a model to turn it into a search space, although this annotation is limited to a list of supported components. Optuna embraces eager evaluation of tunable parameters, which makes it easy to declare a search space on the fly. Meanwhile, efficient NAS algorithms bring new challenges to AutoML frameworks because they require coupling between the controller and the child program. AutoGluon and NNI partially solve this problem by building predefined modules that work in both general search mode and weight-sharing mode. However, supporting different efficient NAS algorithms is still non-trivial. Complex search flows also remain less explored in existing AutoML systems. Compared to these systems, PyGlove employs a mutable programming model to address these problems, making AutoML easily accessible to preexisting ML programs. It also accommodates the dynamic interactions among child programs, search spaces, search algorithms, and search flows to provide the flexibility needed for future AutoML research.
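The decoupling problem described above can be illustrated with a minimal sketch. It assumes a hypothetical `propose`/`feedback` interface and is not the API of PyGlove, Vizier, Optuna, or any other library mentioned here. The search space, the objective, and the search algorithm are separate objects. The search loop only talks to the interface, so replacing random search with a simple evolutionary method changes one line. `SimpleEvolution` is a deliberately simplified tournament-selection scheme and not a specific published algorithm.

```python
import random
from typing import Dict, List, Protocol, Tuple

Config = Dict[str, object]
Space = Dict[str, List[object]]

class SearchAlgorithm(Protocol):
    """Hypothetical interface: propose a config, then receive its measured loss."""
    def propose(self) -> Config: ...
    def feedback(self, config: Config, loss: float) -> None: ...

class RandomSearch:
    def __init__(self, space: Space, seed: int = 0):
        self.space, self.rng = space, random.Random(seed)

    def propose(self) -> Config:
        return {k: self.rng.choice(v) for k, v in self.space.items()}

    def feedback(self, config: Config, loss: float) -> None:
        pass  # random search ignores the history

class SimpleEvolution:
    """Start with random configs, then mutate one choice of a tournament winner."""
    def __init__(self, space: Space, population_size: int = 10, seed: int = 0):
        self.space, self.rng = space, random.Random(seed)
        self.population_size = population_size
        self.history: List[Tuple[float, Config]] = []

    def propose(self) -> Config:
        if len(self.history) < self.population_size:
            return {k: self.rng.choice(v) for k, v in self.space.items()}
        # Tournament selection: best of 3 sampled from the most recent population.
        population = self.history[-self.population_size:]
        _, parent = min(self.rng.sample(population, 3), key=lambda item: item[0])
        child = dict(parent)
        key = self.rng.choice(list(self.space))  # mutate a single choice
        child[key] = self.rng.choice(self.space[key])
        return child

    def feedback(self, config: Config, loss: float) -> None:
        self.history.append((loss, config))

def run_search(algorithm: SearchAlgorithm, objective, num_trials: int):
    """Search loop that depends only on the interface, not on a concrete algorithm."""
    best_loss, best_config = float("inf"), None
    for _ in range(num_trials):
        config = algorithm.propose()
        loss = objective(config)
        algorithm.feedback(config, loss)
        if loss < best_loss:
            best_loss, best_config = loss, config
    return best_loss, best_config

SPACE = {"learning_rate": [1e-3, 1e-2, 1e-1], "num_layers": [1, 2, 3, 4]}

def objective(config):
    # Toy stand-in for a training run; minimum 0.0 at learning_rate=1e-2, num_layers=3.
    return abs(config["learning_rate"] - 1e-2) * 10 + abs(config["num_layers"] - 3)

# Swapping the algorithm leaves SPACE and objective untouched.
for algorithm in (RandomSearch(SPACE, seed=0), SimpleEvolution(SPACE, seed=0)):
    best_loss, best_config = run_search(algorithm, objective, num_trials=200)
    print(type(algorithm).__name__, best_loss, best_config)
```

The design keeps all algorithm state inside the algorithm object. This is what allows one component to be upgraded while the rest of the infrastructure stays the same.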


In [None]:
# You should have these packages in your environment

# !pip install jedi
# !pip install -U pip
# python.exe -m pip install -U pip   # Windows equivalent of the line above
# conda install -c conda-forge pycocotools
# !pip install -U setuptools wheel
# conda install Cpython

# For installation instructions, see:
# https://auto.gluon.ai/dev/install.html

# For a quick start, see:
# https://auto.gluon.ai/dev/tutorials/tabular_prediction/tabular-quickstart.html


In [None]:
!pip3 install torch==1.12.0+cpu torchvision==0.13.0+cpu torchtext==0.13.0 -f https://download.pytorch.org/whl/cpu/torch_stable.html

In [None]:
!pip3 install autogluon

In [None]:
!python3 -m pip install "autogluon.tabular[all]"

In [None]:
from autogluon.tabular import TabularDataset, TabularPredictor

In [None]:
train_data = TabularDataset('./sample_data/california_housing_train.csv')
train_data.drop(columns=['longitude', 'latitude'], inplace=True)
train_data.info()
label = 'median_house_value'
print("Summary of target variable: \n", train_data[label].describe())

In [None]:
subsample_size = 500  # subsample subset of data for faster demo, try setting this to much larger values
train_data = train_data.sample(n=subsample_size, random_state=0)  # must run before fit() to take effect
train_data.head()

In [None]:
save_path = 'agModels-california-housing'  # specifies folder to store trained models
predictor = TabularPredictor(label=label, path=save_path, eval_metric='mean_squared_error',
                             problem_type='regression').fit(train_data)

In [None]:
import pandas as pd
from autogluon.tabular import TabularDataset, TabularPredictor

train = TabularDataset('../data/train.csv')
test = TabularDataset('../data/test.csv')

In [None]:
train.info()

In [None]:
train.describe()

In [None]:
label = 'Survived'
print("Summary of class variable: \n", train[label].describe())

In [None]:
save_path = 'agModels-predictClass'  # specifies folder to store trained models
predictor = TabularPredictor(label=label, path=save_path).fit(train)
predictor.leaderboard(train, silent=True)  # scores on training data are optimistic; use held-out data for a fair comparison

In [None]:
time_limit = 60  # training time budget in seconds
predictor = TabularPredictor(label=label).fit(train, time_limit=time_limit)

submission = pd.read_csv('../data/gender_submission.csv')
submission[label] = predictor.predict(test)
submission.to_csv('submission.csv', index=False)
submission.head()