In [19]:
pip show azureml-train-automl

Name: azureml-train-automl
Version: 1.44.0
Summary: Used for automatically finding the best machine learning model and its parameters.
Home-page: https://docs.microsoft.com/python/api/overview/azure/ml/?view=azure-ml-py
Author: Microsoft Corp
Author-email: None
License: https://aka.ms/azureml-sdk-license
Location: /anaconda/envs/azureml_py38/lib/python3.8/site-packages
Requires: azureml-train-automl-runtime, azureml-train-automl-client, azureml-automl-runtime, azureml-automl-core, azureml-dataset-runtime
Required-by: azureml-automl-dnn-nlp
Note: you may need to restart the kernel to use updated packages.
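The installed SDK version matters later in the notebook. When the remote cluster trains with a newer SDK than this environment, `get_output()` prints a `training version` / `current version` warning for each package. Below is a minimal stdlib sketch for comparing such dotted version strings numerically. The helper names are hypothetical (not part of the Azure ML SDK), and it assumes purely numeric versions like `1.44.0`, with no pre-release tags:

```python
from importlib import metadata


def parse_version(text: str) -> tuple:
    """Turn a purely numeric dotted version such as '1.46.1' into (1, 46, 1)."""
    return tuple(int(part) for part in text.split("."))


def is_older(current: str, required: str) -> bool:
    """True if `current` is an older version than `required` (numeric parts only)."""
    return parse_version(current) < parse_version(required)


def installed_version(package: str):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Local version from the pip show output vs. the version the cluster trained with.
print(is_older("1.44.0", "1.46.1"))
print(installed_version("azureml-train-automl"))
```

Comparing tuples of integers avoids the string-comparison trap where `"1.100.0" < "1.44.0"` would be true.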


In [20]:
import azureml.core
from azureml.core import Workspace

ws = Workspace.from_config()
print("Ready to use Azure ML {} to work with {}".format(azureml.core.VERSION, ws.name))

Ready to use Azure ML 1.44.0 to work with blog-space
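`Workspace.from_config()` connects using a `config.json` file (downloadable from the workspace page in the Azure portal) that holds the subscription ID, resource group and workspace name. The sketch below only reads and checks such a file with the standard library; it does not authenticate, and the helper `load_workspace_config` is hypothetical. As we understand it, the real method also searches parent directories and an `.azureml` subfolder, which this sketch skips:

```python
import json
import os
import tempfile

# Keys a workspace config.json is expected to contain.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_config(path: str) -> dict:
    """Read a workspace config.json and check that the expected keys are present."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"config.json is missing: {', '.join(missing)}")
    return config


# Usage with placeholder values (not real Azure identifiers).
with tempfile.TemporaryDirectory() as folder:
    config_path = os.path.join(folder, "config.json")
    with open(config_path, "w", encoding="utf-8") as handle:
        json.dump({"subscription_id": "<subscription-id>",
                   "resource_group": "<resource-group>",
                   "workspace_name": "blog-space"}, handle)
    print(load_workspace_config(config_path)["workspace_name"])  # blog-space
```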


In [21]:
from azureml.core import Dataset

default_ds = ws.get_default_datastore()

if 'Titanic dataset' not in ws.datasets:
    default_ds.upload_files(files=['./Titanic.csv'], # Upload the Titanic csv file
                        target_path='Titanic-data/', # Put it in a folder path in the datastore
                        overwrite=True, # Replace existing files of the same name
                        show_progress=True)

    # Create a tabular dataset from the path on the datastore
    tab_data_set = Dataset.Tabular.from_delimited_files(path=(default_ds, 'Titanic-data/*.csv'))

    # Register the tabular dataset
    try:
        tab_data_set = tab_data_set.register(workspace=ws, 
                                name='Titanic dataset',
                                description='Titanic data',
                                tags = {'format':'CSV'},
                                create_new_version=True)
        print('Dataset registered.')
    except Exception as ex:
        print(ex)
else:
    print('Dataset already registered.')


# Split the dataset into training (~70%) and validation (~30%) subsets
titanic_ds = ws.datasets.get("Titanic dataset")
train_ds, test_ds = titanic_ds.random_split(percentage=0.7, seed=123)
print("Data ready!")

Dataset already registered.
Data ready!
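`random_split(percentage=0.7, seed=123)` divides the rows approximately 70/30; the Azure ML documentation describes the split as approximate, so the two parts will not be exactly 70% and 30%. As an illustration only (not Azure's implementation), a seeded per-row split in plain Python behaves similarly: the same seed always yields the same partition, and the share in the first part is close to, but not exactly, the requested percentage. The function `seeded_split` and the 1000 stand-in rows are hypothetical:

```python
import random


def seeded_split(rows, percentage, seed):
    """Send each row to the first part with probability `percentage` (repeatable via `seed`)."""
    rng = random.Random(seed)
    first, second = [], []
    for row in rows:
        (first if rng.random() < percentage else second).append(row)
    return first, second


rows = list(range(1000))  # stand-in row ids
train, test = seeded_split(rows, percentage=0.7, seed=123)
print(len(train), len(test))
```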


## Using a Compute Cluster
Having already created a compute cluster named `blog-cluster` in Azure Machine Learning studio, we retrieve it by name so that AutoML can run on it:

In [22]:
from azureml.core.compute import ComputeTarget
training_cluster = ComputeTarget(workspace=ws, name="blog-cluster")

## Configuring AutoML
For a classification task such as this one, AutoML supports several primary metrics, which can be listed like this:

In [23]:
import azureml.train.automl.utilities as automl_utils

for metric in automl_utils.get_primary_metrics('classification'):
    print(metric)

average_precision_score_weighted
precision_score_weighted
AUC_weighted
norm_macro_recall
accuracy


We can optimize any of the metrics above; here we choose `AUC_weighted`. Below we configure AutoML with this target metric and a few additional options:
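AUC is the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one (ties count as half). `AUC_weighted` averages the one-vs-rest AUC of each class, weighted by the number of true examples in that class. For a binary label like `Survived`, both one-vs-rest AUCs are the same value, so the weighted average equals the ordinary AUC. A small stdlib sketch of that pairwise definition, using made-up labels and scores:

```python
from itertools import product


def auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly, ties = 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def auc_weighted(labels, scores):
    """Binary case: average the two one-vs-rest AUCs, weighted by class support."""
    auc_pos = auc(labels, scores)
    auc_neg = auc([1 - y for y in labels], [1 - s for s in scores])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return (n_pos * auc_pos + n_neg * auc_neg) / len(labels)


# Made-up labels and predicted survival probabilities, for illustration only.
y = [0, 0, 1, 1, 0, 1]
p = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(auc(y, p), auc_weighted(y, p))
```

Here 8 of the 9 positive/negative pairs are ranked correctly, so both values come out at 8/9 (up to floating-point rounding).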

In [24]:
from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(name='Automated ML Experiment',
                             task='classification',
                             compute_target=training_cluster,
                             training_data = train_ds,
                             validation_data = test_ds,
                             label_column_name='Survived',
                             iterations=4,
                             primary_metric = 'AUC_weighted',
                             max_concurrent_iterations=2,
                             featurization='auto'
                             )

print("Ready for Auto ML run.")

Ready for Auto ML run.
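With `iterations=4` and `max_concurrent_iterations=2`, AutoML tries four pipelines in total but runs at most two child runs on the cluster at the same time (the cluster needs at least that many nodes). The scheduling idea, not AutoML's actual scheduler, can be sketched with a thread pool capped at two workers; `train_candidate` is a hypothetical stand-in for one child run:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

ITERATIONS = 4        # like iterations=4
MAX_CONCURRENT = 2    # like max_concurrent_iterations=2

lock = threading.Lock()
in_flight = 0
peak = 0


def train_candidate(i):
    """Hypothetical stand-in for one AutoML child run; records how many run at once."""
    global in_flight, peak
    with lock:
        in_flight += 1
        peak = max(peak, in_flight)
    time.sleep(0.05)  # pretend to train a pipeline
    with lock:
        in_flight -= 1
    return f"iteration {i} done"


with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
    results = list(pool.map(train_candidate, range(ITERATIONS)))

print(results)
print("peak concurrency:", peak)  # never exceeds MAX_CONCURRENT
```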


## Running our AutoML experiment
Submitting the experiment starts training on the remote cluster, which may take several minutes.

In [25]:
from azureml.core.experiment import Experiment
from azureml.widgets import RunDetails

print('Submitting Auto ML experiment...')
automl_experiment = Experiment(ws, 'Titanic-automl-sdk')
automl_run = automl_experiment.submit(automl_config)
RunDetails(automl_run).show()
automl_run.wait_for_completion(show_output=False)

Submitting Auto ML experiment...
Submitting remote run.


Experiment,Id,Type,Status,Details Page,Docs Page
Titanic-automl-sdk,AutoML_a20f52f2-ac8a-4a84-bdf3-e2a69a408b62,automl,NotStarted,Link to Azure Machine Learning studio,Link to Documentation


{'runId': 'AutoML_a20f52f2-ac8a-4a84-bdf3-e2a69a408b62',
 'target': 'blog-cluster',
 'status': 'Completed',
 'startTimeUtc': '2022-10-29T13:48:56.664291Z',
 'endTimeUtc': '2022-10-29T13:55:45.294036Z',
 'services': {},
 'properties': {'num_iterations': '4',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'AUC_weighted',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': None,
  'target': 'blog-cluster',
  'DataPrepJsonString': '{\\"training_data\\": {\\"datasetId\\": \\"004ab16f-7066-431e-9e3e-e71b0bb30727\\"}, \\"validation_data\\": {\\"datasetId\\": \\"a1944e84-226b-483a-a7c9-8d76f830e44a\\"}, \\"datasets\\": 0}',
  'EnableSubsampling': 'False',
  'runTemplate': 'AutoML',
  'azureml.runsource': 'automl',
  'display_task_type': 'classification',
  'dependencies_versions': '{"azureml-widgets": "1.44.0", "azureml-training-tabular": "1.44.0", "azureml-train": "1.44.0", "azureml-train-restclients-hyperdrive": "1.44.0", "azur
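The `startTimeUtc` and `endTimeUtc` fields give the wall-clock time of the whole AutoML run. A short calculation with the standard library (replacing the trailing `Z`, since `datetime.fromisoformat` only accepts it from Python 3.11) shows this run took just under seven minutes:

```python
from datetime import datetime

start = "2022-10-29T13:48:56.664291Z"
end = "2022-10-29T13:55:45.294036Z"


def parse_utc(stamp: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp ending in 'Z' (works on Python 3.8 too)."""
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


duration = parse_utc(end) - parse_utc(start)
print(duration)                  # 0:06:48.629745
print(duration.total_seconds())  # 408.629745
```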

## Getting the best model
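We can retrieve the best-performing model and view its details as follows. Because the cluster trained with a newer SDK (1.46) than the notebook environment (1.44), `get_output()` first prints a version-mismatch warning for each affected package.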

In [29]:
best_run, fitted_model = automl_run.get_output()
print(best_run)
print('\nBest Model Definition:')
print(fitted_model)


Package:azureml-automl-runtime, training version:1.46.1, current version:1.44.0
Package:azureml-core, training version:1.46.0, current version:1.44.0
Package:azureml-dataprep, training version:4.5.7, current version:4.2.2
Package:azureml-dataprep-rslex, training version:2.11.4, current version:2.8.1
Package:azureml-dataset-runtime, training version:1.46.0, current version:1.44.0
Package:azureml-defaults, training version:1.46.0, current version:1.44.0
Package:azureml-interpret, training version:1.46.0, current version:1.44.0
Package:azureml-mlflow, training version:1.46.0, current version:1.44.0
Package:azureml-pipeline-core, training version:1.46.0, current version:1.44.0
Package:azureml-responsibleai, training version:1.46.0, current version:1.44.0
Package:azureml-telemetry, training version:1.46.0, current version:1.44.0
Package:azureml-train-automl-client, training version:1.46.0, current version:1.44.0
Package:azureml-train-automl-runtime, training version:1.46.1, current version:

Run(Experiment: Titanic-automl-sdk,
Id: AutoML_a20f52f2-ac8a-4a84-bdf3-e2a69a408b62_0,
Type: None,
Status: Completed)

Best Model Definition:
Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=False, enable_feature_sweeping=True, feature_sweeping_config={}, feature_sweeping_timeout=86400, featurization_config=None, force_text_dnn=False, is_cross_validation=False, is_onnx_compatible=False, observer=None, task='classification', working_dir='/mnt/batch/tasks/shared/LS_root/mounts/clusters/bloginst/code/Users/Manav.Mandal/Azure_Blog')),
                ('MaxAbsScaler', MaxAbsScaler(copy=True)),
                ('LightGBMClassifier',
                 LightGBMClassifier(min_data_in_leaf=20, n_jobs=1, problem_info=ProblemInfo(gpu_training_param_dict={'processing_unit_type': 'cpu'}), random_state=None))],
         verbose=False)
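The best pipeline scales features with scikit-learn's `MaxAbsScaler` before the LightGBM classifier. `MaxAbsScaler` divides each column by its largest absolute value, so values fall in [-1, 1] without shifting the data (zeros stay zero, which keeps sparse input sparse). A dependency-free sketch of the same transformation, on made-up numbers:

```python
def max_abs_scale(rows):
    """Divide each column by its maximum absolute value (all-zero columns are left unchanged)."""
    columns = list(zip(*rows))
    scales = [max(abs(v) for v in col) or 1.0 for col in columns]
    return [[v / s for v, s in zip(row, scales)] for row in rows]


features = [[1.0, -2.0], [2.0, 4.0], [-4.0, 1.0]]
print(max_abs_scale(features))  # [[0.25, -0.5], [0.5, 1.0], [-1.0, 0.25]]
```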


# List the names of the steps in the fitted pipeline
# (here: datatransformer, MaxAbsScaler, LightGBMClassifier)
print('\nBest Run Transformations:')
for step in fitted_model.named_steps:
    print(step)

# Explore all the metrics logged for the best run
print('\nBest Run Metrics:')
best_run_metrics = best_run.get_metrics()
print(best_run_metrics)
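`accuracy` in these metrics is the share of validation rows predicted correctly. Converting the value reported for this run (0.8171641791044776, stored with the registered model below) back to a fraction with a small denominator gives 219/268, which suggests, though it does not prove, that the ~30% validation split held 268 rows. A quick check with the standard library:

```python
from fractions import Fraction

reported_accuracy = 0.8171641791044776  # value recorded for this run

# Closest fraction whose denominator (row count) is at most 1000.
ratio = Fraction(reported_accuracy).limit_denominator(1000)
print(ratio)                                                     # 219/268
print(ratio.numerator / ratio.denominator == reported_accuracy)  # True
```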

## Saving the Model
Finally, we register the best-performing model in the workspace, storing its AUC and accuracy as model properties:

In [28]:
from azureml.core import Model

# Register model
best_run.register_model(model_path='outputs/model.pkl', model_name='Titanic_model',
                        tags={'Training context':'Auto ML'},
                        properties={'AUC': best_run_metrics['AUC_weighted'], 'Accuracy': best_run_metrics['accuracy']})

# List registered models
for model in Model.list(ws):
    print(model.name, 'version:', model.version)
    for tag_name, tag in model.tags.items():
        print('\t', tag_name, ':', tag)
    for prop_name, prop in model.properties.items():
        print('\t', prop_name, ':', prop)
    print('\n')

Titanic_model version: 4
	 Training context : Auto ML
	 AUC : 0.8465673385918785
	 Accuracy : 0.8171641791044776


Titanic_model version: 3
	 Training context : Auto ML
	 AUC : 0.8465673385918785
	 Accuracy : 0.8171641791044776


Titanic_model version: 2
	 Training context : Auto ML
	 AUC : 0.8465673385918785
	 Accuracy : 0.8171641791044776
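Each call to `register_model` with an existing name adds a new version rather than overwriting the old one, which is most likely why re-running the notebook produced versions 2, 3 and 4 with identical metrics. `Model.list(ws)` returns every version, so picking the newest per name is a small grouping step (the SDK's `Model(ws, name=...)` constructor should also return the latest version when none is given). A stdlib sketch over `(name, version)` pairs copied from the listing above; in the notebook they would come from `model.name` and `model.version`, and `latest_versions` is a hypothetical helper:

```python
def latest_versions(models):
    """Map each model name to its highest registered version number."""
    latest = {}
    for name, version in models:
        if version > latest.get(name, 0):
            latest[name] = version
    return latest


# (name, version) pairs as printed by the Model.list(ws) loop above.
registered = [("Titanic_model", 4), ("Titanic_model", 3), ("Titanic_model", 2)]
print(latest_versions(registered))  # {'Titanic_model': 4}
```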


