# Use Automated Machine Learning

There are many kinds of machine learning algorithms that you can use to train a model, and sometimes it's not easy to determine the most effective algorithm for your particular data and prediction requirements. Additionally, you can significantly affect the predictive performance of a model by preprocessing the training data, using techniques such as normalization, missing feature imputation, and others. In your quest to find the *best* model for your requirements, you may need to try many combinations of algorithms and preprocessing transformations, which takes a lot of time and compute resources.

Azure Machine Learning enables you to automate the comparison of models trained using different algorithms and preprocessing options. You can use the visual interface in [Azure Machine Learning studio](https://ml.azure.com) or the SDK to leverage this capability. The SDK gives you greater control over the settings for the automated machine learning experiment, but the visual interface is easier to use.

## Before you start

In addition to the latest version of the **azureml-sdk** and **azureml-widgets** packages, you'll need the **azureml-train-automl** package to run the code in this notebook. Run the cell below to verify that it is installed.

In [1]:
import azureml.train.automl.utilities as automl_utils

Failure while loading azureml_run_type_providers. Failed to load entrypoint automl = azureml.train.automl.run:AutoMLRun._from_run_dto with exception (azureml-dataprep-rslex 1.2.3 (c:\applications\anaconda\lib\site-packages), Requirement.parse('azureml-dataprep-rslex<1.19.0a,>=1.18.0dev0'), {'azureml-dataprep'}).
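The load failure above reports a version conflict between installed packages (it names `azureml-dataprep-rslex`). The following is a minimal sketch, using only the Python standard library, for listing which of the packages mentioned in this section are installed and at what version. The helper name `installed_versions` is illustrative and is not part of the Azure ML SDK.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Packages named in the text above
required = ["azureml-sdk", "azureml-widgets", "azureml-train-automl"]

for name, version in installed_versions(required).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Comparing the printed versions may help you spot packages that were installed at different SDK releases.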


## Connect to your workspace

With the required SDK packages installed, now you're ready to connect to your workspace.

> **Note**: If you haven't already established an authenticated session with your Azure subscription, you'll be prompted to authenticate by clicking a link, entering an authentication code, and signing into Azure.

In [2]:
import azureml.core
from azureml.core import Workspace

# Load the workspace from the saved config file
ws = Workspace.from_config()
print('Ready to use Azure ML {} to work with {}'.format(azureml.core.VERSION, ws.name))

Ready to use Azure ML 1.33.0 to work with wsag
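`Workspace.from_config()` reads the workspace details from a saved `config.json` file containing `subscription_id`, `resource_group`, and `workspace_name` entries. As a rough sketch, you could check a config file's contents before connecting; the helper `missing_config_keys` and the placeholder values below are hypothetical and only illustrate the expected shape of the file.

```python
import json

# Keys the saved workspace config file is expected to contain
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def missing_config_keys(config_text):
    """Return the required keys that are absent or empty in the JSON text."""
    config = json.loads(config_text)
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Placeholder values (assumptions); replace them with your own details
example_config = json.dumps({
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
})

print("Missing keys:", missing_config_keys(example_config))
```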


## Prepare data

You don't need to create a training script for automated machine learning, but you do need to load the training data. In this case, you'll use a dataset containing details of diabetes patients, and then split this into two datasets: one for training, and another for model validation.

In [3]:
from azureml.core import Dataset

default_ds = ws.get_default_datastore()

if 'diabetes dataset' not in ws.datasets:
    default_ds.upload_files(files=['data/diabetes.csv', 'data/diabetes2.csv'], # Upload the diabetes csv files in /data
                            target_path='diabetes-data/', # Put it in a folder path in the datastore
                            overwrite=True, # Replace existing files of the same name
                            show_progress=True)

    # Create a tabular dataset from the path on the datastore (this may take a short while)
    tab_data_set = Dataset.Tabular.from_delimited_files(path=(default_ds, 'diabetes-data/*.csv'))

    # Register the tabular dataset
    try:
        tab_data_set = tab_data_set.register(workspace=ws, 
                                name='diabetes dataset',
                                description='diabetes data',
                                tags = {'format':'CSV'},
                                create_new_version=True)
        print('Dataset registered.')
    except Exception as ex:
        print(ex)
else:
    print('Dataset already registered.')


# Split the dataset into training and validation subsets
diabetes_ds = ws.datasets.get("diabetes dataset")
train_ds, test_ds = diabetes_ds.random_split(percentage=0.7, seed=123)
print("Data ready!")

Uploading an estimated of 2 files
Uploading data/diabetes2.csv
Uploaded data/diabetes2.csv, 1 files out of an estimated total of 2
Uploading data/diabetes.csv
Uploaded data/diabetes.csv, 2 files out of an estimated total of 2
Uploaded 2 files
Dataset registered.
Data ready!
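The `random_split` call above divides the records randomly, sending approximately 70% to the first dataset, and the `seed` makes the split repeatable. The sketch below illustrates that idea on a plain Python list. It is not how Azure Machine Learning implements the split; the function is a stand-in written for this explanation.

```python
import random


def random_split(records, percentage, seed):
    """Send each record to the first part with probability `percentage`."""
    rng = random.Random(seed)  # a fixed seed gives the same split every time
    first, second = [], []
    for record in records:
        if rng.random() < percentage:
            first.append(record)
        else:
            second.append(record)
    return first, second


rows = list(range(1000))  # stand-in for the rows of a dataset
train_rows, test_rows = random_split(rows, percentage=0.7, seed=123)
print(len(train_rows), "training rows,", len(test_rows), "validation rows")
```

Because each record is assigned independently, the split sizes are approximate rather than exactly 70% and 30%.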


## Prepare compute

One of the benefits of cloud compute is that it scales on-demand, enabling you to provision enough compute resources to process multiple child-runs of an automated machine learning experiment in parallel.

Use the following code to specify an Azure Machine Learning compute cluster (it will be created if it doesn't already exist).

> **Important**: Change the value of `cluster_name` in the code below to the name of your compute cluster before running it! Cluster names must be globally unique and between 2 and 16 characters in length. Valid characters are letters, digits, and the - character.

In [4]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

cluster_name = "agcluster"

try:
    # Check for existing compute target
    training_cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    # If it doesn't already exist, create it
    try:
        compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS11_V2', max_nodes=2)
        training_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
        training_cluster.wait_for_completion(show_output=True)
    except Exception as ex:
        print(ex)
    

InProgress.....
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned
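The naming rules stated in the note before the code (2 to 16 characters; letters, digits, and the - character) can be checked locally before you try to provision a cluster. This is a minimal sketch based only on those stated rules; the service may enforce additional constraints that are not covered here, and `is_valid_cluster_name` is a hypothetical helper.

```python
import re

# 2 to 16 characters, each a letter, a digit, or a hyphen
_CLUSTER_NAME_PATTERN = re.compile(r"[A-Za-z0-9-]{2,16}")


def is_valid_cluster_name(name):
    """Check a name against the length and character rules from the note."""
    return _CLUSTER_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["agcluster", "a", "bad_name", "this-name-is-too-long"]:
    print(candidate, "->", is_valid_cluster_name(candidate))
```

Uniqueness cannot be checked this way; that is only known when the create request reaches Azure.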


> **Note**: Compute instances and clusters are based on standard Azure virtual machine images. For this exercise, the *Standard_DS11_v2* image is recommended to achieve the optimal balance of cost and performance. If your subscription has a quota that does not include this image, choose an alternative image; but bear in mind that a larger image may incur higher cost and a smaller image may not be sufficient to complete the tasks. Alternatively, ask your Azure administrator to extend your quota.

## Configure automated machine learning

Now you're ready to configure the automated machine learning experiment.

One of the most important configuration settings is the metric by which model performance should be evaluated. You can retrieve a list of the metrics that are calculated by automated machine learning for a particular type of model task (classification or regression) like this:

In [5]:
import azureml.train.automl.utilities as automl_utils

for metric in automl_utils.get_primary_metrics('classification'):
    print(metric)

AUC_weighted
average_precision_score_weighted
norm_macro_recall
accuracy
precision_score_weighted


Having decided the metric you want to optimize (in this example, *AUC_weighted*), you can configure the automated machine learning run. To do this, you'll need an AutoML configuration that specifies the target metric as well as options like the data to use, how many combinations to try, and so on.

> **Note**: In this example, you'll restrict the experiment to 4 iterations to reduce the amount of time taken. In reality, you'd likely try many more iterations.

In [6]:
from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(name='Automated ML Experiment',
                             task='classification',
                             compute_target=training_cluster,
                             training_data = train_ds,
                             validation_data = test_ds,
                             label_column_name='Diabetic',
                             iterations=4,
                             primary_metric = 'AUC_weighted',
                             max_concurrent_iterations=2,
                             featurization='auto'
                             )

print("Ready for Auto ML run.")

Ready for Auto ML run.
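Before submitting, it can help to sanity-check a few of the settings used above. The sketch below is a hypothetical helper (not part of the SDK) that checks the primary metric against the list printed earlier and compares `max_concurrent_iterations` with the cluster's `max_nodes`. The concurrency check rests on the assumption that each cluster node runs one iteration at a time.

```python
# Primary metrics printed for classification earlier in this section
CLASSIFICATION_PRIMARY_METRICS = {
    "AUC_weighted",
    "average_precision_score_weighted",
    "norm_macro_recall",
    "accuracy",
    "precision_score_weighted",
}


def check_automl_settings(settings, cluster_max_nodes):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if settings["primary_metric"] not in CLASSIFICATION_PRIMARY_METRICS:
        problems.append("unknown primary metric: " + settings["primary_metric"])
    if settings["iterations"] < 1:
        problems.append("iterations must be at least 1")
    # Assumption: one iteration runs per node at a time
    if settings["max_concurrent_iterations"] > cluster_max_nodes:
        problems.append("max_concurrent_iterations exceeds the cluster's max_nodes")
    return problems


# Values mirror the configuration cell above
settings = {
    "primary_metric": "AUC_weighted",
    "iterations": 4,
    "max_concurrent_iterations": 2,
}
print(check_automl_settings(settings, cluster_max_nodes=2))
```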


## Run an automated machine learning experiment

OK, you're ready to go. Let's run the automated machine learning experiment.

> **Note**: This may take some time!

In [7]:
from azureml.core.experiment import Experiment
from azureml.widgets import RunDetails

print('Submitting Auto ML experiment...')
automl_experiment = Experiment(ws, 'mslearn-diabetes-automl-sdk')
automl_run = automl_experiment.submit(automl_config)
RunDetails(automl_run).show()
automl_run.wait_for_completion(show_output=True)

Submitting Auto ML experiment...
Submitting remote run.


| Experiment | Id | Type | Status | Details Page | Docs Page |
| --- | --- | --- | --- | --- | --- |
| mslearn-diabetes-automl-sdk | AutoML_abfbb864-254a-49dd-ab01-9aae018ca8de | automl | NotStarted | Link to Azure Machine Learning studio | Link to Documentation |




Current status: FeaturesGeneration. Generating features for the dataset.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

****************************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFeaturization

****************************************************************************************************

TYPE:         High cardinality feature detection
STATUS

{'runId': 'AutoML_abfbb864-254a-49dd-ab01-9aae018ca8de',
 'target': 'agcluster',
 'status': 'Completed',
 'startTimeUtc': '2021-09-02T06:43:29.644363Z',
 'endTimeUtc': '2021-09-02T06:54:52.103012Z',
 'properties': {'num_iterations': '4',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'AUC_weighted',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': None,
  'target': 'agcluster',
  'DataPrepJsonString': '{\\"training_data\\": {\\"datasetId\\": \\"ee33e151-856b-47a7-a2a8-9484a3e273b8\\"}, \\"validation_data\\": {\\"datasetId\\": \\"7ae03fb2-88a3-488f-a578-883ec00170bb\\"}, \\"datasets\\": 0}',
  'EnableSubsampling': 'False',
  'runTemplate': 'AutoML',
  'azureml.runsource': 'automl',
  'display_task_type': 'classification',
  'dependencies_versions': '{"azureml": "0.2.7", "azureml-widgets": "1.33.0", "azureml-train": "1.30.0", "azureml-train-restclients-hyperdrive": "1.33.0", "azureml-train-core": "1.33.0", "azureml-train
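The run details above include `startTimeUtc` and `endTimeUtc` timestamps, which you can use to work out how long the experiment took. A small sketch using only the standard library follows; the function name is illustrative, and the two timestamps are copied from the output above.

```python
from datetime import datetime

# Format of the timestamps shown in the run details
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def run_duration_seconds(start_utc, end_utc):
    """Return the elapsed time between two run timestamps, in seconds."""
    start = datetime.strptime(start_utc, TIMESTAMP_FORMAT)
    end = datetime.strptime(end_utc, TIMESTAMP_FORMAT)
    return (end - start).total_seconds()


seconds = run_duration_seconds("2021-09-02T06:43:29.644363Z",
                               "2021-09-02T06:54:52.103012Z")
print("Run took {:.0f} min {:.1f} s".format(seconds // 60, seconds % 60))
```

For this run the result is a little over 11 minutes for 4 iterations.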

## View child run details

When the experiment has completed, view the output in the widget, and click the run that produced the best result to see its details.

Then click the link to view the experiment details in Azure Machine Learning studio and view the overall experiment details before viewing the details for the individual run that produced the best result. There's lots of information here about the performance of the model generated.

You can also retrieve all of the child runs and view their metrics using the SDK:

In [8]:
for run in automl_run.get_children():
    print('Run ID', run.id)
    # Fetch the metrics once per child run instead of once per metric
    metrics = run.get_metrics()
    for name, value in metrics.items():
        print('\t', {name: value})

Run ID AutoML_abfbb864-254a-49dd-ab01-9aae018ca8de_3
	 {'f1_score_macro': 0.400215895290784}
	 {'average_precision_score_micro': 0.899826619525177}
	 {'norm_macro_recall': 0.0}
	 {'recall_score_micro': 0.6672665916760405}
	 {'precision_score_micro': 0.6672665916760405}
	 {'precision_score_weighted': 0.4452447043669598}
	 {'AUC_macro': 0.9899685732874312}
	 {'weighted_accuracy': 0.8008628809959982}
	 {'f1_score_weighted': 0.534101392770513}
	 {'matthews_correlation': 0.0}
	 {'recall_score_weighted': 0.6672665916760405}
	 {'accuracy': 0.6672665916760405}
	 {'f1_score_micro': 0.6672665916760405}
	 {'balanced_accuracy': 0.5}
	 {'AUC_micro': 0.8848340864028666}
	 {'average_precision_score_weighted': 0.9905840031770856}
	 {'log_loss': 0.6230267775927263}
	 {'precision_score_macro': 0.33363329583802026}
	 {'average_precision_score_macro': 0.9884229823710426}
	 {'AUC_weighted': 0.9899685732874312}
	 {'recall_score_macro': 0.5}
	 {'confusion_matrix': 'aml://artifactId/ExperimentRun/dcid.AutoML_
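Once you have the metrics for each child run, picking the best run amounts to taking the maximum of the primary metric. The sketch below shows this on a plain dictionary. It includes only the two *AUC_weighted* values that are visible in this notebook's output (child runs `_3` and `_0`); the dictionary layout and the helper name are assumptions made for the illustration.

```python
def best_run_id(metrics_by_run, primary_metric):
    """Return the id of the run with the highest value of the primary metric."""
    return max(metrics_by_run, key=lambda run_id: metrics_by_run[run_id][primary_metric])


# AUC_weighted values copied from the output shown in this notebook
metrics_by_run = {
    "AutoML_abfbb864-254a-49dd-ab01-9aae018ca8de_3": {"AUC_weighted": 0.9899685732874312},
    "AutoML_abfbb864-254a-49dd-ab01-9aae018ca8de_0": {"AUC_weighted": 0.9904812577250306},
}

best_id = best_run_id(metrics_by_run, "AUC_weighted")
print("Best run:", best_id)
```

For metrics where lower is better, such as `log_loss`, you would use `min` instead of `max`.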

## Get the best run

You can retrieve the best-performing run, and view its details like this:

In [9]:
!pip install azureml-train-automl-runtime

Collecting azureml-train-automl-runtime
  Using cached azureml_train_automl_runtime-1.18.0.post1-py3-none-any.whl (119 kB)
Collecting azureml-automl-core~=1.18.0
  Using cached azureml_automl_core-1.18.0.post2-py3-none-any.whl (186 kB)
Collecting onnxconverter-common<=1.6.0,>=1.4.2
  Using cached onnxconverter_common-1.6.0-py2.py3-none-any.whl (43 kB)
Collecting azureml-defaults~=1.18.0
  Using cached azureml_defaults-1.18.0-py3-none-any.whl (3.1 kB)
Collecting pandas<1.0.0,>=0.21.0
  Using cached pandas-0.25.3-cp38-cp38-win_amd64.whl (9.4 MB)
Collecting keras2onnx<=1.6.0,>=1.4.0
  Using cached keras2onnx-1.6.0-py3-none-any.whl (219 kB)
Collecting smart-open<=1.9.0
  Using cached smart_open-1.9.0-py3-none-any.whl
Collecting azureml-interpret~=1.18.0
  Using cached azureml_interpret-1.18.0-py3-none-any.whl (47 kB)
Collecting gensim<3.9.0
  Using cached gensim-3.8.3-cp38-cp38-win_amd64.whl (24.2 MB)
Collecting skl2onnx==1.4.9
  Using cached skl2onnx-1.4.9-py2.py3-none-any.whl (114 kB)
Co

  ERROR: Command errored out with exit status 1:
   command: 'C:\Applications\Anaconda\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\ambar\\AppData\\Local\\Temp\\pip-install-wfgi_2qo\\pmdarima_a4ef56ee9e5a49d88d6f55d1174ed7b0\\setup.py'"'"'; __file__='"'"'C:\\Users\\ambar\\AppData\\Local\\Temp\\pip-install-wfgi_2qo\\pmdarima_a4ef56ee9e5a49d88d6f55d1174ed7b0\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\ambar\AppData\Local\Temp\pip-wheel-zzbfokrq'
       cwd: C:\Users\ambar\AppData\Local\Temp\pip-install-wfgi_2qo\pmdarima_a4ef56ee9e5a49d88d6f55d1174ed7b0\
  Complete output (179 lines):
  Partial import of pmdarima during the build process.
  Requirements: ['numpy>=1.10\nCython>=0.29\nscipy>=1.0\nscikit-learn>=0.19\npandas>=0.19\nstatsmodels>=0.9.0\n']
  Adding extra setuptools args
  blas_opt_info:

Collecting onnxmltools==1.4.1
  Using cached onnxmltools-1.4.1-py2.py3-none-any.whl (371 kB)
Collecting azure-storage-queue~=12.1
  Using cached azure_storage_queue-12.1.6-py2.py3-none-any.whl (137 kB)
Collecting py-cpuinfo==5.0.0
  Using cached py_cpuinfo-5.0.0-py3-none-any.whl
Collecting nimbusml<=1.8.0,>=1.7.1
  Using cached nimbusml-1.8.0-cp38-none-win_amd64.whl (59.1 MB)
Collecting pmdarima==1.1.1
  Using cached pmdarima-1.1.1.tar.gz (622 kB)
Collecting azure-mgmt-keyvault<7.0.0,>=0.40.0
  Using cached azure_mgmt_keyvault-2.2.0-py2.py3-none-any.whl (89 kB)
Collecting azureml-dataprep<2.5.0a,>=2.4.0a
  Using cached azureml_dataprep-2.4.5-py3-none-any.whl (28.2 MB)
Collecting azureml-dataprep-native<25.0.0,>=24.0.0
  Using cached azureml_dataprep_native-24.0.0-cp38-cp38-win_amd64.whl (896 kB)
Collecting fusepy<4.0.0,>=3.0.1
  Using cached fusepy-3.0.1-py3-none-any.whl
Collecting gunicorn==19.9.0
  Using cached gunicorn-19.9.0-py2.py3-none-any.whl (112 kB)
Collecting azureml-model-ma

  copying shap\explainers\other\__init__.py -> build\lib.win-amd64-3.8\shap\explainers\other
  creating build\lib.win-amd64-3.8\shap\explainers\deep
  copying shap\explainers\deep\deep_pytorch.py -> build\lib.win-amd64-3.8\shap\explainers\deep
  copying shap\explainers\deep\deep_tf.py -> build\lib.win-amd64-3.8\shap\explainers\deep
  copying shap\explainers\deep\__init__.py -> build\lib.win-amd64-3.8\shap\explainers\deep
  creating build\lib.win-amd64-3.8\shap\plots
  copying shap\plots\bar.py -> build\lib.win-amd64-3.8\shap\plots
  copying shap\plots\colorconv.py -> build\lib.win-amd64-3.8\shap\plots
  copying shap\plots\colors.py -> build\lib.win-amd64-3.8\shap\plots
  copying shap\plots\decision.py -> build\lib.win-amd64-3.8\shap\plots
  copying shap\plots\dependence.py -> build\lib.win-amd64-3.8\shap\plots
  copying shap\plots\embedding.py -> build\lib.win-amd64-3.8\shap\plots
  copying shap\plots\force.py -> build\lib.win-amd64-3.8\shap\plots
  copying shap\plots\force_matplotlib.

In [10]:
best_run, fitted_model = automl_run.get_output()
print(best_run)
print('\nBest Model Definition:')
print(fitted_model)

print('\nBest Run Metrics:')
best_run_metrics = best_run.get_metrics()
for metric_name in best_run_metrics:
    metric = best_run_metrics[metric_name]
    print(metric_name, metric)



Run(Experiment: mslearn-diabetes-automl-sdk,
Id: AutoML_abfbb864-254a-49dd-ab01-9aae018ca8de_0,
Type: azureml.scriptrun,
Status: Completed)

Best Model Definition:
None

Best Run Metrics:
AUC_macro 0.9904812577250306
precision_score_macro 0.9467885716011675
accuracy 0.9520809898762654
matthews_correlation 0.8918972733002116
AUC_weighted 0.9904812577250306
average_precision_score_weighted 0.9910085472336622
recall_score_weighted 0.9520809898762654
norm_macro_recall 0.8902205614498688
average_precision_score_micro 0.9918359683181054
recall_score_macro 0.9451102807249344
precision_score_micro 0.9520809898762654
f1_score_micro 0.9520809898762654
AUC_micro 0.9916008305485999
average_precision_score_macro 0.9888863976623419
recall_score_micro 0.9520809898762654
balanced_accuracy 0.9451102807249344
f1_score_weighted 0.952035905457969
precision_score_weighted 0.9520038306966419
f1_score_macro 0.9459413118209691
weighted_accuracy 0.9576485145517856
log_loss 0.1186120500342212
confusion_matrix a

Finally, having found the best-performing model, you can register it.

In [11]:
from azureml.core import Model

# Register model
best_run.register_model(model_path='outputs/model.pkl', model_name='diabetes_model',
                        tags={'Training context':'Auto ML'},
                        properties={'AUC': best_run_metrics['AUC_weighted'], 'Accuracy': best_run_metrics['accuracy']})

# List registered models
for model in Model.list(ws):
    print(model.name, 'version:', model.version)
    for tag_name in model.tags:
        tag = model.tags[tag_name]
        print ('\t',tag_name, ':', tag)
    for prop_name in model.properties:
        prop = model.properties[prop_name]
        print ('\t',prop_name, ':', prop)
    print('\n')

diabetes_model version: 1
	 Training context : Auto ML
	 AUC : 0.9904812577250306
	 Accuracy : 0.9520809898762654




> **More Information**: For more information about automated machine learning, see the [Azure ML documentation](https://docs.microsoft.com/azure/machine-learning/how-to-configure-auto-train).