# Automated ML

Run the hyperparameter tuning notebook before this one: it registers the dataset that this notebook loads.

The first cells:
- Load all needed packages
- Prepare the workspace and compute cluster

In [1]:
from azureml.core import Workspace, Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core import Dataset
from azureml.train.automl import AutoMLConfig
import os
import shutil
import opendatasets as od
import pandas as pd
from azureml.widgets import RunDetails

In [2]:
ws = Workspace.from_config()
experiment_name = 'udacity_capstone_automl'

experiment = Experiment(ws, experiment_name)

In [3]:
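from azureml.core.compute_target import ComputeTargetException

cluster_name = "IntensePurposeCluster"

try:
    # Reuse the cluster if it already exists in the workspace
    compute_cluster = ComputeTarget(ws, cluster_name)
    print('existing cluster found')
except ComputeTargetException:
    # Otherwise provision a new cluster with up to 4 nodes
    compute_config = AmlCompute.provisioning_configuration(vm_size="Standard_D2_V2", max_nodes=4)
    compute_cluster = ComputeTarget.create(ws, cluster_name, compute_config)

    compute_cluster.wait_for_completion(show_output=True)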

existing cluster found


## Dataset

We use a COVID-19 patient mortality dataset from Kaggle. The goal is to predict whether a patient will die given their circumstances.

We load the same dataset that was registered in the hyperparameter_tuning notebook, so make sure to run that notebook first.

In [4]:
dataset = Dataset.get_by_name(ws, name='capstone_dataset')

## AutoML Configuration

We run a classification task that predicts whether a patient dies.
- The target variable in the dataset is simply called y
- We use AUC_weighted as the primary metric because the data is unbalanced
- We use 3-fold cross-validation to get a more reliable performance estimate and reduce the risk of selecting an overfitted model
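
To illustrate why accuracy can mislead on unbalanced data, here is a minimal sketch in plain Python. The class counts (95 survivors, 5 deaths) are hypothetical and are not taken from the actual dataset, and `roc_auc` is a simple binary ROC AUC written for this example, not Azure ML's own `AUC_weighted` implementation.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. Quadratic in the number of samples, so only
    suitable for small illustrations.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy labels: 95 survivors (0) and 5 deaths (1)
y_true = [0] * 95 + [1] * 5

# A useless "model" that gives every patient the same score and label
constant_scores = [0.0] * 100
constant_preds = [0] * 100

majority_accuracy = accuracy(y_true, constant_preds)  # 0.95
majority_auc = roc_auc(y_true, constant_scores)       # 0.5

print(f"accuracy={majority_accuracy:.2f}, AUC={majority_auc:.2f}")
```

The constant predictor looks strong on accuracy but scores no better than chance on AUC, which is the motivation for ranking models by an AUC-based metric here.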

In [5]:
automl_config = AutoMLConfig(
    # Settings 
    experiment_timeout_minutes=120,
    enable_early_stopping=True,
    max_concurrent_iterations=4,
    max_cores_per_iteration=-1,
    n_cross_validations=3,
    compute_target=compute_cluster,
    
    # Run configurations
    task='classification',
    primary_metric='AUC_weighted',
    training_data=dataset,
    label_column_name='y',
    enable_onnx_compatible_models=True
)
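
The `n_cross_validations=3` setting asks for three train/validation splits. As a rough illustration of the idea, the sketch below builds plain contiguous k-fold index splits in pure Python. The helper `k_fold_indices` is hypothetical, and AutoML's actual splitting logic (for example any shuffling or stratification) is not shown in this notebook and may differ.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds.

    Every sample lands in exactly one validation fold; the first
    n_samples % k folds receive one extra sample.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, valid
        start += size


# Ten samples split into three folds
folds = list(k_fold_indices(n_samples=10, k=3))
for train_idx, valid_idx in folds:
    print(f"train={train_idx} valid={valid_idx}")
```

Each model is then trained three times, once per split, and its metric is averaged over the three validation folds.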

In [6]:
# Submit your experiment
remote_run = experiment.submit(automl_config)

Submitting remote run.


| Experiment | Id | Type | Status | Details Page | Docs Page |
| --- | --- | --- | --- | --- | --- |
| udacity_capstone_automl | AutoML_83f6e4d1-0ff5-4e32-8313-e42fcb442cf5 | automl | NotStarted | Link to Azure Machine Learning studio | Link to Documentation |


## Run Details

We use AutoML to efficiently test a plethora of models and evaluate them quickly.
This cell only finishes once the training run has completed, so the entire notebook can be run at once without causing errors.

In [7]:
RunDetails(remote_run).show()
remote_run.wait_for_completion()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

{'runId': 'AutoML_83f6e4d1-0ff5-4e32-8313-e42fcb442cf5',
 'target': 'IntensePurposeCluster',
 'status': 'Completed',
 'startTimeUtc': '2023-03-17T09:23:27.838929Z',
 'endTimeUtc': '2023-03-17T12:19:59.932737Z',
 'services': {},
   'message': 'No scores improved over last 10 iterations, so experiment stopped early. This early stopping behavior can be disabled by setting enable_early_stopping = False in AutoMLConfig for notebook/python SDK runs.'}],
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'AUC_weighted',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '3',
  'target': 'IntensePurposeCluster',
  'DataPrepJsonString': '{\\"training_data\\": {\\"datasetId\\": \\"a9a25856-6a51-4ce6-a112-8e3da5dda378\\"}, \\"datasets\\": 0}',
  'EnableSubsampling': None,
  'runTemplate': 'AutoML',
  'azureml.runsource': 'automl',
  'display_task_type': 'classification',
  'dependencies_version

Performing interactive authentication. Please follow the instructions on the terminal.
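
The run details printed above include `startTimeUtc` and `endTimeUtc`. As a small sketch, the wall-clock duration of the run can be computed from those two fields with the standard library. The helper `run_duration` is hypothetical, and the dictionary below only copies the two timestamps from the output above; in the notebook it would be the value returned by `remote_run.wait_for_completion()`.

```python
from datetime import datetime

# Timestamp layout used in the run details, e.g. 2023-03-17T09:23:27.838929Z
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def run_duration(details):
    """Return the wall-clock duration of a run as a timedelta."""
    start = datetime.strptime(details["startTimeUtc"], TIMESTAMP_FORMAT)
    end = datetime.strptime(details["endTimeUtc"], TIMESTAMP_FORMAT)
    return end - start


# Only the two timestamp fields, copied from the output above
details = {
    "startTimeUtc": "2023-03-17T09:23:27.838929Z",
    "endTimeUtc": "2023-03-17T12:19:59.932737Z",
}

elapsed = run_duration(details)
print(f"{elapsed.total_seconds() / 60:.1f} minutes")  # 176.5 minutes
```

This is the wall-clock time of the whole parent run, so it may not line up exactly with `experiment_timeout_minutes`.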




## Best Model

We retrieve the best child run from the AutoML experiment, register its model, and display the properties of the model.



In [8]:
best_automl_run = remote_run.get_best_child()
best_automl_run.register_model('covid_death_pred_automl_best', 'outputs/model.pkl')

Model(workspace=Workspace.create(name='vg-adl-sco-dev-ml', subscription_id='d4b1a742-d22b-4598-975b-f7d380af08da', resource_group='vg-adl-sco-dev-rg'), name=covid_death_pred_automl_best, id=covid_death_pred_automl_best:2, version=2, tags={}, properties={})

In [14]:
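# Print model details: get_output() returns (best_run, fitted_model)
_, fitted_model = remote_run.get_output()
# Step 1 of the fitted pipeline is the final estimator (here an ensemble); list its members
fitted_model[1].estimators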

Package:azureml-automl-runtime, training version:1.48.0.post2, current version:1.47.0
Package:azureml-core, training version:1.48.0, current version:1.47.0
Package:azureml-dataprep, training version:4.8.6, current version:4.5.7
Package:azureml-dataprep-rslex, training version:2.15.2, current version:2.11.4
Package:azureml-dataset-runtime, training version:1.48.0, current version:1.47.0
Package:azureml-defaults, training version:1.48.0, current version:1.47.0
Package:azureml-interpret, training version:1.48.0, current version:1.47.0
Package:azureml-mlflow, training version:1.48.0, current version:1.47.0
Package:azureml-pipeline-core, training version:1.48.0, current version:1.47.0
Package:azureml-responsibleai, training version:1.48.0, current version:1.47.0
Package:azureml-telemetry, training version:1.48.0, current version:1.47.0
Package:azureml-train-automl-client, training version:1.48.0, current version:1.47.0
Package:azureml-train-automl-runtime, training version:1.48.0.post2, cur

[('22',
  Pipeline(memory=None,
           steps=[('standardscalerwrapper',
                   StandardScalerWrapper(copy=True, with_mean=False, with_std=False)),
                  ('xgboostclassifier',
                   XGBoostClassifier(booster='gbtree', colsample_bytree=0.7, eta=0.3, gamma=0, max_depth=5, max_leaves=0, n_estimators=100, n_jobs=0, objective='reg:logistic', problem_info=ProblemInfo(gpu_training_param_dict={'processing_unit_type': 'cpu'}), random_state=0, reg_alpha=1.5625, reg_lambda=2.1875, subsample=0.7, tree_method='auto'))],
           verbose=False)),
 ('0',
  Pipeline(memory=None,
           steps=[('maxabsscaler', MaxAbsScaler(copy=True)),
                  ('lightgbmclassifier',
                   LightGBMClassifier(min_data_in_leaf=20, n_jobs=-1, problem_info=ProblemInfo(gpu_training_param_dict={'processing_unit_type': 'cpu'}), random_state=None))],
           verbose=False)),
 ('34',
  Pipeline(memory=None,
           steps=[('standardscalerwrapper',
       

## Model Deployment

Remember you have to deploy only one of the two models you trained but you still need to register both the models. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

TODO: In the cell below, send a request to the web service you deployed to test it.

TODO: In the cell below, print the logs of the web service and delete the service

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shutdown all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.
