# Auto ML model selection

AutoML lets us try multiple algorithms and preprocessing transformations on our data. Combined with scalable cloud-based compute, this makes it possible to find the best-performing model for our data.

This feature is only available in the Enterprise edition.

AutoML can be used for classification, regression, or time series forecasting.

By default, AutoML randomly selects from the full range of algorithms for the given task. You can block specific algorithms if you know your data is not suited to them, or if you must comply with a policy that restricts the types of ML your organisation can use.
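To make the idea of blocking concrete, here is a minimal plain-Python sketch of random selection from a candidate list with some entries blocked. It is an illustration only, not how AutoML is implemented: the candidate names, the `pick_algorithms` helper, and the fixed seed are all assumptions made for the example.

```python
import random

# Hypothetical candidate list; the real set depends on the task and SDK version
CANDIDATES = ["LogisticRegression", "RandomForest", "LightGBM", "SVM", "KNN"]

def pick_algorithms(candidates, blocked, n_iterations, seed=0):
    """Randomly pick one non-blocked algorithm per iteration."""
    blocked = set(blocked)
    allowed = [name for name in candidates if name not in blocked]
    if not allowed:
        raise ValueError("every candidate algorithm is blocked")
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    return [rng.choice(allowed) for _ in range(n_iterations)]

chosen = pick_algorithms(CANDIDATES, blocked=["SVM", "KNN"], n_iterations=12)
print(chosen)
```

Blocked algorithms are removed before sampling, so they can never appear in any iteration.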

As well as trying randomly selected algorithms, AutoML can apply preprocessing transformations to your data, which can improve the performance of the model.

It applies **scaling and normalisation** to numeric data automatically. Optional featurisation is also available:
- missing value imputation, to remove nulls
- categorical encoding, to convert categorical features to numeric indicators
- dropping high-cardinality features, such as record IDs
- feature engineering
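The transformations above can be sketched in plain Python on a tiny made-up table. This is only meant to show what each step does to the data. The `featurise` helper, the sample rows, mean imputation, min-max scaling, and the "every value is unique" rule for high cardinality are all assumptions for the example, and AutoML's own choices may differ.

```python
from statistics import mean

def featurise(rows, numeric_col, categorical_cols):
    """Impute, scale, drop ID-like columns, and one-hot encode."""
    rows = [dict(r) for r in rows]  # work on copies, leave the input alone

    # Missing value imputation: fill nulls with the column mean
    known = [r[numeric_col] for r in rows if r[numeric_col] is not None]
    fill = mean(known)
    for r in rows:
        if r[numeric_col] is None:
            r[numeric_col] = fill

    # Scaling: min-max scale the numeric column to [0, 1]
    lo = min(r[numeric_col] for r in rows)
    hi = max(r[numeric_col] for r in rows)
    span = (hi - lo) or 1  # avoid dividing by zero for a constant column
    for r in rows:
        r[numeric_col] = (r[numeric_col] - lo) / span

    for col in categorical_cols:
        values = [r.pop(col) for r in rows]
        categories = sorted(set(values))
        # Drop high-cardinality features: every value unique, e.g. a record ID
        if len(categories) == len(rows):
            continue
        # Categorical encoding: one 0/1 indicator column per category
        for r, value in zip(rows, values):
            for cat in categories:
                r[f"{col}_{cat}"] = int(value == cat)
    return rows

# Made-up sample data
rows = [
    {"id": "r1", "age": 34, "colour": "red"},
    {"id": "r2", "age": None, "colour": "blue"},
    {"id": "r3", "age": 50, "colour": "red"},
]
featurised = featurise(rows, numeric_col="age", categorical_cols=["id", "colour"])
for row in featurised:
    print(row)
```

The `id` column disappears because every value is unique, the missing `age` is filled before scaling, and `colour` becomes two indicator columns.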

The AzureML SDK gives you more configuration options. You set the options for an automated experiment using the **AutoMLConfig** class.

```python
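from azureml.core.runconfig import RunConfiguration
from azureml.train.automl import AutoMLConfig

# Run configuration for a Python environment (not passed to AutoMLConfig below)
automl_run_config = RunConfiguration(framework='python')

# train_dataset and test_dataset are assumed to have been created earlier
automl_config = AutoMLConfig(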
    name='Automated ML experiment',
    task='classification',
    primary_metric='AUC_weighted',
    compute_target='aml-compute',
    training_data=train_dataset,
    validation_data=test_dataset,
    label_column_name='Label',
    featurization='auto',
    iterations=12,
    max_concurrent_iterations=4
)
```

When using the UI, you select an ***AML Dataset*** to use.

When using the SDK:
- specify an ***AML Dataset*** or dataframe of **training data**
- optionally, specify a second validation dataset or dataframe that will be used to validate the model. If it is not provided, AML applies cross-validation using the training data
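As a reminder of what cross-validation on the training data means, here is a minimal sketch of a generic k-fold split over row indices. The `k_fold_indices` helper and the choice of 3 folds are assumptions for illustration. The number of folds and the splitting strategy AML actually uses are not covered here.

```python
def k_fold_indices(n_rows, n_folds):
    """Yield (train_idx, valid_idx) pairs; each row is validated exactly once."""
    if not 2 <= n_folds <= n_rows:
        raise ValueError("n_folds must be between 2 and n_rows")
    base, extra = divmod(n_rows, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)  # spread the remainder
        stop = start + size
        valid_idx = list(range(start, stop))
        train_idx = [i for i in range(n_rows) if i < start or i >= stop]
        yield train_idx, valid_idx
        start = stop

folds = list(k_fold_indices(n_rows=10, n_folds=3))
for train_idx, valid_idx in folds:
    print("train:", train_idx, "validate:", valid_idx)
```

Each fold trains on the rows outside the validation slice, so no separate validation dataset is needed.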



**primary_metric** is one of the most important parameters to configure. It is the target performance metric used to determine the optimal model. AML supports a set of named metrics for each type of task. You can retrieve them by running the code below:

```python
from azureml.train.automl.utilities import get_primary_metrics
get_primary_metrics('classification')
```
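To see why the choice of primary metric matters, here is a small plain-Python sketch in which the "best" iteration changes depending on which metric is used for ranking. The `pick_best` helper and all the scores are made up for illustration; they are not real AutoML output.

```python
def pick_best(results, primary_metric, higher_is_better=True):
    """Return the result with the best value for the primary metric."""
    scored = [r for r in results if primary_metric in r["metrics"]]
    if not scored:
        raise ValueError(f"no result reports {primary_metric!r}")
    choose = max if higher_is_better else min
    return choose(scored, key=lambda r: r["metrics"][primary_metric])

# Made-up scores for three hypothetical iterations
results = [
    {"iteration": 0, "metrics": {"AUC_weighted": 0.81, "accuracy": 0.79}},
    {"iteration": 1, "metrics": {"AUC_weighted": 0.88, "accuracy": 0.76}},
    {"iteration": 2, "metrics": {"AUC_weighted": 0.84, "accuracy": 0.82}},
]

best_by_auc = pick_best(results, "AUC_weighted")
best_by_accuracy = pick_best(results, "accuracy")
print(best_by_auc["iteration"], best_by_accuracy["iteration"])
```

With these numbers, ranking by `AUC_weighted` and ranking by `accuracy` pick different iterations, which is why the metric should match what you actually care about.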

You can submit an AutoML experiment by:
```python
from azureml.core.experiment import Experiment

# ws is assumed to be the Workspace object loaded earlier
automl_experiment = Experiment(ws, 'automl_experiment')
automl_run = automl_experiment.submit(automl_config)
```

You can monitor the run in the usual ways:
- in the portal
- with the `RunDetails` widget

You can easily retrieve the best model, to download or deploy it, using:
```python
# get_output() returns the best run together with its fitted model
best_run, fitted_model = automl_run.get_output()
best_run_metrics = best_run.get_metrics()
for metric_name, metric in best_run_metrics.items():
    print(metric_name, metric)
```

AutoML uses scikit-learn pipelines to encapsulate the model's preprocessing steps. You can view the steps in the fitted model of the best run using:
```python
# Iterating over named_steps yields the name of each pipeline step
for step_name in fitted_model.named_steps:
    print(step_name)
```