# Automated ML

Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [1]:
# Importing dependencies
from azureml.core import Workspace, Experiment, Model
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.run import Run
from azureml.widgets import RunDetails
from azureml.train.automl import AutoMLConfig
from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.core import Environment
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice


import os
import json
import joblib
import logging
import requests

## Dataset

### Overview
For this project, I'm using the Heart Failure Prediction dataset from Kaggle. It contains 12 clinical features that can be used to predict mortality from heart failure. I downloaded the data, stored it in my GitHub repository, and used `TabularDatasetFactory` to load it in tabular form.
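
As a quick illustration of the dataset's shape, the sketch below parses a two-row excerpt with the standard library and separates the 12 features from the `DEATH_EVENT` label. The rows are copied from the test request printed later in this notebook; `SAMPLE_CSV` and `split_features_and_label` are illustrative names, not part of the Azure ML SDK.

```python
import csv
import io

# Two records copied from the test payload shown later in this notebook
# (both had DEATH_EVENT = 0). SAMPLE_CSV is an illustrative stand-in for the
# full CSV file.
SAMPLE_CSV = """age,anaemia,creatinine_phosphokinase,diabetes,ejection_fraction,high_blood_pressure,platelets,serum_creatinine,serum_sodium,sex,smoking,time,DEATH_EVENT
61.0,0,582,1,38,0,147000.0,1.2,141,1,0,237,0
49.0,0,972,1,35,1,268000.0,0.8,130,0,0,187,0
"""

LABEL_COLUMN = "DEATH_EVENT"


def split_features_and_label(csv_text, label_column=LABEL_COLUMN):
    """Return (feature_rows, labels) parsed from CSV text."""
    feature_rows, labels = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Remove the label from the row; everything left is a feature.
        labels.append(int(row.pop(label_column)))
        feature_rows.append({name: float(value) for name, value in row.items()})
    return feature_rows, labels


features, labels = split_features_and_label(SAMPLE_CSV)
print(len(features[0]), "features per record:", sorted(features[0]))
print("labels:", labels)
```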

In [2]:
# Getting data
path = "https://raw.githubusercontent.com/ShashiChilukuri/Data2Deployment-AzureML/main/heart_failure_clinical_records_dataset.csv"
data = TabularDatasetFactory.from_delimited_files(path)

In [3]:
# Creating workspace and experiment

ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'ClassifyHeartFailure-AutoML'
experiment=Experiment(ws, experiment_name)

In [4]:
# Check if compute cluster exists, if not create one
cluster_name = "Compute-Standard" #"compute-cluster"

try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS3_V2',min_nodes=1, max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
cpu_cluster.wait_for_completion(show_output=True)

# get status of the cluster
print(cpu_cluster.get_status().serialize())

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned
{'currentNodeCount': 3, 'targetNodeCount': 3, 'nodeStateCounts': {'preparingNodeCount': 0, 'runningNodeCount': 0, 'idleNodeCount': 3, 'unusableNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0}, 'allocationState': 'Steady', 'allocationStateTransitionTime': '2022-06-28T00:32:32.023000+00:00', 'errors': None, 'creationTime': '2022-06-27T22:08:07.283404+00:00', 'modifiedTime': '2022-06-27T22:08:13.905336+00:00', 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 1, 'maxNodeCount': 4, 'nodeIdleTimeBeforeScaleDown': 'PT1800S'}, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_DS3_V2'}


## AutoML Configuration

The `AutoMLConfig` class is used to submit an automated ML experiment in Azure Machine Learning. The AutoML settings control how the experiment runs. In this case, the experiment times out after 30 minutes, runs at most 5 iterations in parallel, and uses 2-fold cross-validation. Although many metrics are available, I chose accuracy as the primary metric because it is a reasonable choice for a simple dataset like this one. These settings are passed to `AutoMLConfig` along with the compute cluster, the training data, the task type and the label column.
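
One detail worth checking: the configuration asks for 5 concurrent iterations, while the cluster created earlier scales to at most 4 nodes. As I understand it, an AmlCompute cluster runs one iteration per node at a time, so the extra iteration would wait in the queue. The sketch below is a plain-Python sanity check of that relationship; `effective_concurrency` is a hypothetical helper, not an SDK function.

```python
def effective_concurrency(max_concurrent_iterations, cluster_max_nodes):
    """Iterations that can run at once, assuming one iteration per node."""
    if max_concurrent_iterations < 1 or cluster_max_nodes < 1:
        raise ValueError("both values must be positive")
    return min(max_concurrent_iterations, cluster_max_nodes)


# Values taken from this notebook: max_concurrent_iterations=5, max_nodes=4.
requested = 5
max_nodes = 4
running = effective_concurrency(requested, max_nodes)
print(f"requested={requested}, can run at once={running}, queued={requested - running}")
```

If that assumption holds, lowering `max_concurrent_iterations` to 4 or raising `max_nodes` to 5 would keep the two values aligned.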

In [5]:
# Automl settings
automl_settings = {"experiment_timeout_minutes": 30,
                   "max_concurrent_iterations": 5,
                   "n_cross_validations": 2,
                   "primary_metric": 'accuracy',
                   "verbosity": logging.INFO
                  }

# AutoML config
automl_config = AutoMLConfig(task='classification',
                             compute_target=cpu_cluster,
                             training_data=data,
                             label_column_name='DEATH_EVENT',
                             **automl_settings)

In [6]:
# Submit experiment
remote_run = experiment.submit(automl_config)

Submitting remote run.


Experiment,Id,Type,Status,Details Page,Docs Page
ClassifyHeartFailure-AutoML,AutoML_580b1e1a-0ea0-4980-9412-3c88532c0271,automl,NotStarted,Link to Azure Machine Learning studio,Link to Documentation


## Run Details
The `RunDetails` widget shows the progress of the experiment and its child runs.

In [7]:
RunDetails(remote_run).show()
remote_run.wait_for_completion(show_output=True)

Experiment,Id,Type,Status,Details Page,Docs Page
ClassifyHeartFailure-AutoML,AutoML_580b1e1a-0ea0-4980-9412-3c88532c0271,automl,NotStarted,Link to Azure Machine Learning studio,Link to Documentation



Current status: ModelSelection. Beginning model selection.

********************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

********************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFeaturization

********************************************************************************************

TYPE:         High cardinality feature detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and no high cardinality features were det

{'runId': 'AutoML_580b1e1a-0ea0-4980-9412-3c88532c0271',
 'target': 'Compute-Standard',
 'status': 'Completed',
 'startTimeUtc': '2022-06-28T00:55:16.640115Z',
 'endTimeUtc': '2022-06-28T01:10:49.785517Z',
 'services': {},
   'message': 'No scores improved over last 10 iterations, so experiment stopped early. This early stopping behavior can be disabled by setting enable_early_stopping = False in AutoMLConfig for notebook/python SDK runs.'}],
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '2',
  'target': 'Compute-Standard',
  'DataPrepJsonString': '{\\"training_data\\": {\\"datasetId\\": \\"3689506b-c445-4bd7-a527-46695ffe5ece\\"}, \\"datasets\\": 0}',
  'EnableSubsampling': None,
  'runTemplate': 'AutoML',
  'azureml.runsource': 'automl',
  'display_task_type': 'classification',
  'dependencies_versions': '{"azureml

## Best Model

Get the best model from the automl experiments and display all the properties of the model.

In [11]:
# Finding best run and model
best_autoML_run, best_autoML_model = remote_run.get_output()

# print best run
print(best_autoML_run)



Run(Experiment: ClassifyHeartFailure-AutoML,
Id: AutoML_580b1e1a-0ea0-4980-9412-3c88532c0271_45,
Type: azureml.scriptrun,
Status: Completed)


In [12]:
# print best model
print(best_autoML_model)

Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=False, enable_feature_sweeping=True, feature_sweeping_config={}, feature_sweeping_timeout=86400, featurization_config=None, force_text_dnn=False, is_cross_validation=True, is_onnx_compatible=False, observer=None, task='classification', working_dir='/mnt/batch/tasks/shared/LS_root/mount...
                 PreFittedSoftVotingClassifier(classification_labels=array([0, 1]), estimators=[('26', Pipeline(memory=None, steps=[('robustscaler', RobustScaler(copy=True, quantile_range=[10, 90], with_centering=True, with_scaling=True)), ('randomforestclassifier', RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None, criterion='entropy', max_depth=None, max_features=0.8, max_leaf_nodes=None, max_samples=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=0.01, min_samples_split=0.056842105263157895, min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1, oo

In [13]:
# print best model metrics
best_autoML_run_metrics = best_autoML_run.get_metrics()

for metric_name in best_autoML_run_metrics:
    metric = best_autoML_run_metrics[metric_name]
    print(metric_name, metric)

recall_score_weighted 0.8628859060402685
average_precision_score_micro 0.9162953770375382
accuracy 0.8628859060402685
average_precision_score_weighted 0.9103952493159658
balanced_accuracy 0.8282898160317425
norm_macro_recall 0.6565796320634848
AUC_micro 0.917394919147786
precision_score_micro 0.8628859060402685
precision_score_macro 0.8536896841603383
matthews_correlation 0.6810220248468513
f1_score_micro 0.8628859060402684
AUC_macro 0.9070575565385202
f1_score_weighted 0.8602681304210358
AUC_weighted 0.9070575565385203
recall_score_micro 0.8628859060402685
log_loss 0.3790819129109796
weighted_accuracy 0.8897912812627112
f1_score_macro 0.8373059918065786
precision_score_weighted 0.863299439874855
recall_score_macro 0.8282898160317425
average_precision_score_macro 0.8877990621479463
confusion_matrix aml://artifactId/ExperimentRun/dcid.AutoML_580b1e1a-0ea0-4980-9412-3c88532c0271_45/confusion_matrix
accuracy_table aml://artifactId/ExperimentRun/dcid.AutoML_580b1e1a-0ea0-4980-9412-3c88532c
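
Several of the printed metrics are tied together by definition, which makes for a useful consistency check. For single-label classification, micro-averaged precision, recall and F1 all equal accuracy, and balanced accuracy equals macro recall. The sketch below checks these relations against the values printed above. The normalization used for `norm_macro_recall`, (macro recall - R) / (1 - R) with R = 0.5 for two classes, is my understanding of the Azure ML definition rather than something this notebook confirms.

```python
import math

# Metric values copied from the output above.
metrics = {
    "accuracy": 0.8628859060402685,
    "recall_score_micro": 0.8628859060402685,
    "precision_score_micro": 0.8628859060402685,
    "f1_score_micro": 0.8628859060402684,
    "balanced_accuracy": 0.8282898160317425,
    "recall_score_macro": 0.8282898160317425,
    "norm_macro_recall": 0.6565796320634848,
}


def close(a, b):
    """Compare two floats while tolerating last-digit rounding differences."""
    return math.isclose(a, b, rel_tol=1e-9)


micro_matches_accuracy = all(
    close(metrics[name], metrics["accuracy"])
    for name in ("recall_score_micro", "precision_score_micro", "f1_score_micro")
)
balanced_matches_macro = close(
    metrics["balanced_accuracy"], metrics["recall_score_macro"]
)

# Assumed normalization: (macro recall - R) / (1 - R), with R = 1 / n_classes.
n_classes = 2
chance = 1 / n_classes
normalized = (metrics["recall_score_macro"] - chance) / (1 - chance)
norm_matches = close(normalized, metrics["norm_macro_recall"])

print(micro_matches_accuracy, balanced_matches_macro, norm_matches)
```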

In [14]:
# Register the best model
best_automl_model = best_autoML_run.register_model(model_name='heart_failure_automl',
                                                   model_path='outputs/model.pkl', 
                                                   tags = {'Training context': 'Automated ML'},
                                                   properties = {'Accuracy': best_autoML_run_metrics['accuracy']})

print(best_automl_model)

Model(workspace=Workspace.create(name='quick-starts-ws-199609', subscription_id='6971f5ac-8af1-446e-8034-05acea24681f', resource_group='aml-quickstarts-199609'), name=heart_failure_automl, id=heart_failure_automl:2, version=2, tags={'Training context': 'Automated ML'}, properties={'Accuracy': '0.8628859060402685'})


## Model Deployment

As part of the project, I trained both an AutoML model and a HyperDrive-based model. The better of the two is picked for deployment.

In the cell below, register the model, create an inference config and deploy the model as a web service.

In [15]:
# Printing the registered & saved models
for model in Model.list(ws):
    print(model.name, 'version:', model.version)
    for tag_name in model.tags:
        tag = model.tags[tag_name]
        print ('\t',tag_name, ':', tag)
    for prop_name in model.properties:
        prop = model.properties[prop_name]
        print ('\t',prop_name, ':', prop)
    print('\n')

heart_failure_hyperdrive version: 2
	 Accuracy : 0.7888888888888889
	 N Estimators : 20
	 Min Samples Split : 2


heart_failure_automl version: 2
	 Training context : Automated ML
	 Accuracy : 0.8628859060402685


heart_failure_hyperdrive version: 1
	 Accuracy : 0.7888888888888889
	 N Estimators : 20
	 Min Samples Split : 2


heart_failure_automl version: 1
	 Training context : Automated ML
	 Accuracy : 0.8595525727069351




As we can see, the AutoML model outperformed the HyperDrive model (accuracy of about 0.863 versus 0.789), so I will deploy the AutoML model.
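
The comparison above can also be made programmatically. The sketch below takes the names, versions and accuracies printed by the previous cell as plain tuples and picks the entry with the highest accuracy. `registered` and `pick_best` are illustrative names; in the notebook itself the same values would come from `Model.list(ws)` and each model's `properties['Accuracy']`, which is stored as a string and would need `float()` first.

```python
# (name, version, accuracy) copied from the listing above.
registered = [
    ("heart_failure_hyperdrive", 2, 0.7888888888888889),
    ("heart_failure_automl", 2, 0.8628859060402685),
    ("heart_failure_hyperdrive", 1, 0.7888888888888889),
    ("heart_failure_automl", 1, 0.8595525727069351),
]


def pick_best(models):
    """Return the (name, version, accuracy) tuple with the highest accuracy."""
    return max(models, key=lambda entry: entry[2])


best_name, best_version, best_accuracy = pick_best(registered)
print(f"{best_name} v{best_version}: accuracy={best_accuracy:.4f}")
```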

In [16]:
# Downloading the environment and scoring files
best_autoML_run.download_file('outputs/conda_env_v_1_0_0.yml', 'envFile.yml')
best_autoML_run.download_file('outputs/scoring_file_v_1_0_0.py', 'scoreScript.py')

In [17]:
inference_config = InferenceConfig(entry_script='scoreScript.py',environment=best_autoML_run.get_environment())

# Deploying the model
deployment_config = AciWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)
service = Model.deploy(ws, 
                       "myservice", 
                       [best_automl_model], 
                       inference_config, 
                       deployment_config)

service.wait_for_deployment(show_output = True)
print(service.state)
print(service.scoring_uri)
print(service.swagger_uri)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2022-06-28 01:31:13+00:00 Creating Container Registry if not exists.
2022-06-28 01:31:13+00:00 Registering the environment.
2022-06-28 01:31:14+00:00 Use the existing image.
2022-06-28 01:31:14+00:00 Generating deployment configuration.
2022-06-28 01:31:14+00:00 Submitting deployment to compute.
2022-06-28 01:31:17+00:00 Checking the status of deployment myservice..
2022-06-28 01:35:38+00:00 Checking the status of inference endpoint myservice.
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy
http://7d0ae494-b0b6-4b06-ade9-7fad47ff694e.southcentralus.azurecontainer.io/score
http://7d0ae494-b0b6-4b06-ade9-7fad47ff694e.southcentralus.azurecontainer.io/swagger.json


In the cell below, send a request to the web service you deployed to test it.

In [18]:
# Creating Test data and labels 
test_df = data.to_pandas_dataframe().dropna().sample(2)
label_df = test_df.pop("DEATH_EVENT")

test_data = json.dumps({'data': test_df.to_dict(orient='records')})
print(test_data)

{"data": [{"age": 61.0, "anaemia": 0, "creatinine_phosphokinase": 582, "diabetes": 1, "ejection_fraction": 38, "high_blood_pressure": 0, "platelets": 147000.0, "serum_creatinine": 1.2, "serum_sodium": 141, "sex": 1, "smoking": 0, "time": 237}, {"age": 49.0, "anaemia": 0, "creatinine_phosphokinase": 972, "diabetes": 1, "ejection_fraction": 35, "high_blood_pressure": 1, "platelets": 268000.0, "serum_creatinine": 0.8, "serum_sodium": 130, "sex": 0, "smoking": 0, "time": 187}]}


In [19]:
# Requesting webservice to get response

response = requests.post(service.scoring_uri, test_data, headers = {'Content-type': 'application/json'})
print(response.text)
print(label_df)

"{\"result\": [0, 0]}"
264    0
209    0
Name: DEATH_EVENT, dtype: int64
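
Note that the service returns a JSON string that itself contains JSON, so it has to be decoded twice before the predictions can be compared with the labels. The sketch below does this offline using the response text and labels printed above; `parse_predictions` is a hypothetical helper name.

```python
import json

# Response body exactly as printed above: a JSON string wrapping a JSON object.
response_text = r'"{\"result\": [0, 0]}"'
expected_labels = [0, 0]  # DEATH_EVENT values for rows 264 and 209


def parse_predictions(text):
    """Decode the scoring response, unwrapping a second JSON layer if present."""
    payload = json.loads(text)
    if isinstance(payload, str):
        payload = json.loads(payload)
    return payload["result"]


predictions = parse_predictions(response_text)
print("predictions:", predictions)
print("all correct:", predictions == expected_labels)
```

With a live service, `response.json()` would perform the first decoding step, leaving one `json.loads` call for the inner string.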


In the cell below, print the logs of the web service and delete the service.

In [20]:
print(service.get_logs())

2022-06-28T01:35:27,495307690+00:00 - gunicorn/run 
2022-06-28T01:35:27,495307090+00:00 - iot-server/run 
2022-06-28T01:35:27,510305204+00:00 - nginx/run 
2022-06-28T01:35:27,515980348+00:00 | gunicorn/run | 
2022-06-28T01:35:27,500897133+00:00 - rsyslog/run 
2022-06-28T01:35:27,550959113+00:00 | gunicorn/run | ###############################################
2022-06-28T01:35:27,568629448+00:00 | gunicorn/run | AzureML Container Runtime Information
2022-06-28T01:35:27,589090703+00:00 | gunicorn/run | ###############################################
2022-06-28T01:35:27,602709207+00:00 | gunicorn/run | 
2022-06-28T01:35:27,618068723+00:00 | gunicorn/run | 
2022-06-28T01:35:27,626966791+00:00 | gunicorn/run | AzureML image information: openmpi3.1.2-ubuntu18.04:20220516.v1
2022-06-28T01:35:27,636615164+00:00 | gunicorn/run | 
2022-06-28T01:35:27,640603595+00:00 | gunicorn/run | 
2022-06-28T01:35:27,646099936+00:00 | gunicorn/run | PATH environment variable: /azureml-envs/azureml_76f657337a18

In [21]:
service.delete()

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shut down all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.
