# Automated ML

In the cell below, import all the dependencies that you will need to complete the project.

In [1]:
import argparse
import os
import joblib
from pprint import pprint

from azureml.core import Workspace, Experiment, Model
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.run import Run
from azureml.widgets import RunDetails
from azureml.train.automl import AutoMLConfig
from azureml.data.dataset_factory import TabularDatasetFactory

## Dataset

### Overview
In this project we use Automated ML to train a classification model that predicts the death event of heart failure patients. We will use the twelve features of the dataset, outlined below:
- age (int): age of the patient in years
- anaemia (bool): whether there has been a decrease of red blood cells or hemoglobin
- creatinine_phosphokinase (int): level of the CPK enzyme in the blood in mcg/L
- diabetes (bool): whether the patient has diabetes
- ejection_fraction (int): percentage of blood leaving the heart at each contraction
- high_blood_pressure (bool): whether the patient has hypertension
- platelets (int): platelets in the blood in kiloplatelets/mL
- serum_creatinine (float): level of serum creatinine in the blood in mg/dL
- serum_sodium (int): level of serum sodium in the blood in mEq/L
- sex (int): female or male (binary)
- smoking (bool): whether the patient smokes
- time (int): follow-up period in days

In [4]:
# Data Acquisition
ds = TabularDatasetFactory.from_delimited_files("https://raw.githubusercontent.com/bimaputra1/Azure-MLE-Capstone/02fc0c8aaffeb5b92c646ecba4acd0e42612d5ea/dataset/heart_failure_clinical_records_dataset.csv")

In [5]:
ws = Workspace.from_config()

# Setting up experiment
experiment_name = 'ClassificationHearthFailure-AutoML'

experiment=Experiment(ws, experiment_name)

In [6]:
# Setting up compute cluster
from azureml.core.compute_target import ComputeTargetException

compute_name = "AutoML-Compute"
try:
    # Reuse the cluster if it already exists in the workspace
    compute_target = ComputeTarget(ws, compute_name)
    print(compute_name + " already exists.")
except ComputeTargetException:
    # Otherwise provision a new cluster
    compute_config = AmlCompute.provisioning_configuration(vm_size="Standard_DS12_V2", min_nodes=1, max_nodes=5)
    compute_target = ComputeTarget.create(ws, compute_name, compute_config)
compute_target.wait_for_completion(show_output=True)

print(compute_target.get_status().serialize())


Creating....
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded..................
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned
{'currentNodeCount': 1, 'targetNodeCount': 1, 'nodeStateCounts': {'preparingNodeCount': 1, 'runningNodeCount': 0, 'idleNodeCount': 0, 'unusableNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0}, 'allocationState': 'Resizing', 'allocationStateTransitionTime': '2021-04-18T10:45:26.655000+00:00', 'errors': None, 'creationTime': '2021-04-18T10:45:26.074127+00:00', 'modifiedTime': '2021-04-18T10:45:42.325051+00:00', 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 1, 'maxNodeCount': 5, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_DS12_V2'}


## AutoML Configuration

In this step we configure the AutoML settings. We set the experiment timeout to 60 minutes and allow up to 5 concurrent iterations to speed up the process, and we use accuracy as the primary metric for selecting the best model.

The task is classification with 5-fold cross-validation, which helps reduce the risk of overfitting.

In [7]:
# Automl settings here
automl_settings = {
    "experiment_timeout_minutes": 60,
    "max_concurrent_iterations": 5,
    "primary_metric": 'accuracy'
}

# Automl config here
automl_config = AutoMLConfig(
    task='classification',
    compute_target=compute_target,
    training_data=ds,
    label_column_name='DEATH_EVENT',
    n_cross_validations=5,
    **automl_settings
)
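The `n_cross_validations=5` setting above means every candidate model is scored on five different train/validation splits. The sketch below shows the general idea of k-fold splitting in plain Python. It is only an illustration: the helper names and the 12-row example are hypothetical, and AutoML's internal splitting may differ (for instance, it may shuffle the rows first).

```python
def k_fold_indices(n_rows, k):
    """Split row indices 0..n_rows-1 into k contiguous validation folds."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    base, extra = divmod(n_rows, k)
    folds, start = [], 0
    for i in range(k):
        # Spread the remainder over the first `extra` folds
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_validation_splits(n_rows, k):
    """Yield (train_indices, validation_indices) for each of the k rounds."""
    folds = k_fold_indices(n_rows, k)
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# Hypothetical toy example: 12 rows, 5 folds
splits = list(train_validation_splits(n_rows=12, k=5))
for round_no, (train, validation) in enumerate(splits, start=1):
    print(round_no, "train rows:", len(train), "validation rows:", validation)
```

Each row is used for validation exactly once and for training in the other four rounds, so the reported metric is an average over data the model did not see during that round.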

In [8]:
# Submitting the experiment
remote_run = experiment.submit(automl_config)

Submitting remote run.


Experiment,Id,Type,Status,Details Page,Docs Page
ClassificationHearthFailure-AutoML,AutoML_d5d24d14-7203-41ab-b7e5-7b0468573ee7,automl,NotStarted,Link to Azure Machine Learning studio,Link to Documentation


## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

In the cell below, use the `RunDetails` widget to show the different experiments.

In [10]:
RunDetails(remote_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

In [11]:
remote_run.wait_for_completion(show_output=True)

Experiment,Id,Type,Status,Details Page,Docs Page
ClassificationHearthFailure-AutoML,AutoML_d5d24d14-7203-41ab-b7e5-7b0468573ee7,automl,Running,Link to Azure Machine Learning studio,Link to Documentation



Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

****************************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFeaturization

****************************************************************************************************

TYPE:         High cardinality feature detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and no high cardi

{'runId': 'AutoML_d5d24d14-7203-41ab-b7e5-7b0468573ee7',
 'target': 'AutoML-Compute',
 'status': 'Completed',
 'startTimeUtc': '2021-04-18T10:47:51.222791Z',
 'endTimeUtc': '2021-04-18T11:54:36.536107Z',
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '5',
  'target': 'AutoML-Compute',
  'DataPrepJsonString': '{\\"training_data\\": {\\"datasetId\\": \\"39efc7f9-9df3-4598-a162-4571c23e81f6\\"}, \\"datasets\\": 0}',
  'EnableSubsampling': None,
  'runTemplate': 'AutoML',
  'azureml.runsource': 'automl',
  'display_task_type': 'classification',
  'dependencies_versions': '{"azureml-widgets": "1.26.0", "azureml-train": "1.26.0", "azureml-train-restclients-hyperdrive": "1.26.0", "azureml-train-core": "1.26.0", "azureml-train-automl": "1.26.0", "azureml-train-automl-runtime": "1.26.0", "azureml-train-automl-client": "1.26.0"
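The `startTimeUtc` and `endTimeUtc` fields in the run details above can be turned into a wall-clock duration, which is handy for comparing against the 60-minute `experiment_timeout_minutes` setting. This is a small sketch using only the standard library; the helper name `run_duration` is hypothetical, and the two timestamps are copied from the output above.

```python
from datetime import datetime

# Format of the UTC timestamps shown in the run details
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def run_duration(start_utc, end_utc):
    """Return the elapsed time between two timestamp strings as a timedelta."""
    start = datetime.strptime(start_utc, TIMESTAMP_FORMAT)
    end = datetime.strptime(end_utc, TIMESTAMP_FORMAT)
    return end - start


duration = run_duration("2021-04-18T10:47:51.222791Z", "2021-04-18T11:54:36.536107Z")
print(duration)                                           # 1:06:45.313316
print(round(duration.total_seconds() / 60, 1), "minutes")  # 66.8 minutes
```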

## Best Model

In the cell below, we get the best model from the AutoML experiment and display its properties.



In [18]:
# Getting the best run and model
aml_best_run, aml_fitted_model = remote_run.get_output()

# Printing the best run
print(aml_best_run)

Run(Experiment: ClassificationHearthFailure-AutoML,
Id: AutoML_d5d24d14-7203-41ab-b7e5-7b0468573ee7_280,
Type: azureml.scriptrun,
Status: Completed)


In [19]:
# Printing the model details
print(aml_fitted_model)

Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                                 feature_sweeping_config=None,
                                 feature_sweeping_timeout=None,
                                 featurization_config=None, force_text_dnn=None,
                                 is_cross_validation=None,
                                 is_onnx_compatible=None, logger=None,
                                 observer=None, task=None, working_dir=None)),
                ('prefittedsoftvotingclassifier',...
                                                                                                  min_samples_leaf=0.035789473684210524,
                                                                                                  min_samples_split=0.2442105263157895,
                                                                                                  min_weight_fraction_l

In [14]:
# Getting all metrics of the best run
best_run_metrics = aml_best_run.get_metrics()

# Printing all metrics of the best run
for metric_name in best_run_metrics:
    metric = best_run_metrics[metric_name]
    print(metric_name, metric)

AUC_macro 0.9038085548172757
f1_score_weighted 0.8566472195611128
accuracy 0.8629378531073446
recall_score_macro 0.8222916666666666
AUC_micro 0.9084278783235978
recall_score_weighted 0.8629378531073446
balanced_accuracy 0.8222916666666666
recall_score_micro 0.8629378531073446
average_precision_score_micro 0.9099894371092889
norm_macro_recall 0.6445833333333334
matthews_correlation 0.6919083744096969
f1_score_micro 0.8629378531073446
precision_score_micro 0.8629378531073446
f1_score_macro 0.8319228906768066
average_precision_score_macro 0.8946362848954414
weighted_accuracy 0.8906520933978452
average_precision_score_weighted 0.9174369634662691
precision_score_macro 0.8739962760297917
precision_score_weighted 0.8768387997076814
log_loss 0.4009185232851242
AUC_weighted 0.9038085548172757
confusion_matrix aml://artifactId/ExperimentRun/dcid.AutoML_d5d24d14-7203-41ab-b7e5-7b0468573ee7_280/confusion_matrix
accuracy_table aml://artifactId/ExperimentRun/dcid.AutoML_d5d24d14-7203-41ab-b7e5-7b046
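Some of the metrics above are related to each other. As a quick sanity check, the sketch below recomputes `norm_macro_recall` from `recall_score_macro`, assuming the usual definition `(recall_score_macro - R) / (1 - R)`, where `R = 1 / n_classes` is the macro recall expected from random predictions (0.5 for a binary task). The function name is hypothetical, and both input values are copied from the output above.

```python
import math


def normalized_macro_recall(recall_macro, n_classes):
    """Rescale macro recall so that chance level maps to 0 and a perfect score to 1."""
    chance = 1.0 / n_classes
    return (recall_macro - chance) / (1.0 - chance)


recall_score_macro = 0.8222916666666666    # from the metrics printed above
reported_norm_recall = 0.6445833333333334  # from the metrics printed above

computed = normalized_macro_recall(recall_score_macro, n_classes=2)
print(computed)
print(math.isclose(computed, reported_norm_recall, rel_tol=1e-9))
```

Note also that `balanced_accuracy` and `recall_score_macro` show the same value in the output, which is expected because balanced accuracy is the average of the per-class recalls.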

In [15]:
aml_fitted_model.steps[1][1].estimators

[('128',
  Pipeline(memory=None,
           steps=[('standardscalerwrapper',
                   <azureml.automl.runtime.shared.model_wrappers.StandardScalerWrapper object at 0x7f1e3da13e48>),
                  ('extratreesclassifier',
                   ExtraTreesClassifier(bootstrap=True, ccp_alpha=0.0,
                                        class_weight=None, criterion='gini',
                                        max_depth=None, max_features=0.8,
                                        max_leaf_nodes=None, max_samples=None,
                                        min_impurity_decrease=0.0,
                                        min_impurity_split=None,
                                        min_samples_leaf=0.01,
                                        min_samples_split=0.056842105263157895,
                                        min_weight_fraction_leaf=0.0,
                                        n_estimators=200, n_jobs=1,
                                        oob_score=F

In [25]:
aml_best_run

Experiment,Id,Type,Status,Details Page,Docs Page
ClassificationHearthFailure-AutoML,AutoML_d5d24d14-7203-41ab-b7e5-7b0468573ee7_280,azureml.scriptrun,Completed,Link to Azure Machine Learning studio,Link to Documentation


In [27]:
# Registering the best model in the workspace
best_automl_model = aml_best_run.register_model(model_path='./outputs/model.pkl', model_name='heart_failure_automl',
                        tags = {'Training context': 'Automated ML'},
                        properties = {'Accuracy': best_run_metrics['accuracy']})

# Saving a local copy of the fitted model
os.makedirs("outputs", exist_ok=True)
joblib.dump(aml_fitted_model, filename="outputs/automl_model.pkl")

print(best_automl_model)

Model(workspace=Workspace.create(name='quick-starts-ws-143032', subscription_id='610d6e37-4747-4a20-80eb-3aad70a55f43', resource_group='aml-quickstarts-143032'), name=heart_failure_automl, id=heart_failure_automl:5, version=5, tags={'Training context': 'Automated ML'}, properties={'Accuracy': '0.8629378531073446'})


In [28]:
# Listing registered models to verify that the model has been saved
for model in Model.list(ws):
    print(model.name, 'version:', model.version)
    for tag_name in model.tags:
        tag = model.tags[tag_name]
        print ('\t',tag_name, ':', tag)
    for prop_name in model.properties:
        prop = model.properties[prop_name]
        print ('\t',prop_name, ':', prop)
    print('\n')

heart_failure_automl version: 5
	 Training context : Automated ML
	 Accuracy : 0.8629378531073446


heart_failure_automl version: 4
	 Training context : Automated ML
	 Accuracy : 0.8629378531073446


heart_failure_automl version: 3
	 Training context : Automated ML
	 Accuracy : 0.8629378531073446


heart_failure_automl version: 2
	 Training context : Automated ML
	 Accuracy : 0.8629378531073446


heart_failure_automl version: 1
	 Training context : Automated ML
	 Accuracy : 0.8629378531073446


heart_failure_hyperdrive version: 1
	 Accuracy : 0.7555555555555555
	 N Estimators : 20
	 Min Samples Split : 2




## Model Deployment

In the cell below, register the model, create an inference config and deploy the model as a web service.

In [29]:
# Downloading the environment file
aml_best_run.download_file('outputs/conda_env_v_1_0_0.yml', 'envFile.yml')

# Downloading the scoring file 
aml_best_run.download_file('outputs/scoring_file_v_1_0_0.py', 'scoreScript.py')

In [30]:
from azureml.core.environment import Environment
from azureml.core.model import InferenceConfig

inference_config = InferenceConfig(entry_script='scoreScript.py',
                                   environment=aml_best_run.get_environment())

# Deploying
from azureml.core.webservice import AciWebservice

deployment_config = AciWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)
service = Model.deploy(ws, "myservice", [best_automl_model], inference_config, deployment_config)
service.wait_for_deployment(show_output = True)
print(service.state)

print(service.scoring_uri)

print(service.swagger_uri)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2021-04-18 12:11:16+00:00 Creating Container Registry if not exists.
2021-04-18 12:11:16+00:00 Registering the environment.
2021-04-18 12:11:18+00:00 Use the existing image.
2021-04-18 12:11:18+00:00 Generating deployment configuration.
2021-04-18 12:11:20+00:00 Submitting deployment to compute..
2021-04-18 12:11:25+00:00 Checking the status of deployment myservice..
2021-04-18 12:12:06+00:00 Checking the status of inference endpoint myservice.
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy
http://bf26694f-c1c4-4c9f-8c82-26b5afcf88ac.southcentralus.azurecontainer.io/score
http://bf26694f-c1c4-4c9f-8c82-26b5afcf88ac.southcentralus.azurecontainer.io/swagger.json


In [31]:
import json

# Importing test data
data_df = ds.to_pandas_dataframe().dropna()
test_df = data_df.sample(5) # data_df is the pandas dataframe of the original data
label_df = test_df.pop('DEATH_EVENT')

test_sample = json.dumps({'data': test_df.to_dict(orient='records')})

print(test_sample)

{"data": [{"age": 50.0, "anaemia": 1, "creatinine_phosphokinase": 2334, "diabetes": 1, "ejection_fraction": 35, "high_blood_pressure": 0, "platelets": 75000.0, "serum_creatinine": 0.9, "serum_sodium": 142, "sex": 0, "smoking": 0, "time": 126}, {"age": 60.0, "anaemia": 0, "creatinine_phosphokinase": 582, "diabetes": 0, "ejection_fraction": 40, "high_blood_pressure": 0, "platelets": 217000.0, "serum_creatinine": 3.7, "serum_sodium": 134, "sex": 1, "smoking": 0, "time": 96}, {"age": 55.0, "anaemia": 0, "creatinine_phosphokinase": 60, "diabetes": 0, "ejection_fraction": 35, "high_blood_pressure": 0, "platelets": 228000.0, "serum_creatinine": 1.2, "serum_sodium": 135, "sex": 1, "smoking": 1, "time": 90}, {"age": 50.0, "anaemia": 1, "creatinine_phosphokinase": 249, "diabetes": 1, "ejection_fraction": 35, "high_blood_pressure": 1, "platelets": 319000.0, "serum_creatinine": 1.0, "serum_sodium": 128, "sex": 0, "smoking": 0, "time": 28}, {"age": 90.0, "anaemia": 1, "creatinine_phosphokinase": 33
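The request body shown above is a JSON object with a `data` list of records, each holding the twelve feature columns and no label. A minimal sketch of a payload builder that checks this before anything is sent is given below. The names `EXPECTED_FEATURES` and `build_payload` are hypothetical, the key list is taken from the printed sample, and the example record is the first record of that sample.

```python
import json

# Feature keys as they appear in the printed test sample
EXPECTED_FEATURES = [
    "age", "anaemia", "creatinine_phosphokinase", "diabetes",
    "ejection_fraction", "high_blood_pressure", "platelets",
    "serum_creatinine", "serum_sodium", "sex", "smoking", "time",
]


def build_payload(records):
    """Validate the keys of each record and return the JSON request body."""
    for i, record in enumerate(records):
        missing = [name for name in EXPECTED_FEATURES if name not in record]
        unexpected = [name for name in record if name not in EXPECTED_FEATURES]
        if missing or unexpected:
            raise ValueError(f"record {i}: missing={missing}, unexpected={unexpected}")
    return json.dumps({"data": records})


example_record = {
    "age": 50.0, "anaemia": 1, "creatinine_phosphokinase": 2334, "diabetes": 1,
    "ejection_fraction": 35, "high_blood_pressure": 0, "platelets": 75000.0,
    "serum_creatinine": 0.9, "serum_sodium": 142, "sex": 0, "smoking": 0, "time": 126,
}
payload = build_payload([example_record])
print(payload)
```

A check like this catches a forgotten `pop('DEATH_EVENT')` locally instead of surfacing as an error from the web service.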

In the cell below, send a request to the web service you deployed to test it.

In [32]:
import requests # for http post request

# Set the content type
headers = {'Content-type': 'application/json'}

response = requests.post(service.scoring_uri, test_sample, headers=headers)

In [33]:
# Printing results from the inference
print(response.text)

"{\"result\": [0, 0, 0, 1, 0]}"


In [34]:
# Printing ground truth labels
print(label_df)

163    1
124    1
111    0
32     1
289    0
Name: DEATH_EVENT, dtype: int64
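The response printed earlier is a JSON string that itself contains a JSON document, so it has to be decoded twice before the predictions can be compared with the labels. The sketch below does that with the two outputs above copied in as literals. The helper name `parse_scoring_response` is hypothetical, and the comparison assumes the predictions come back in the same order as the submitted rows.

```python
import json


def parse_scoring_response(text):
    """Decode the service response, unwrapping the inner JSON if it is double-encoded."""
    decoded = json.loads(text)
    if isinstance(decoded, str):
        decoded = json.loads(decoded)
    return decoded["result"]


# Values copied from the two outputs above
response_text = '"{\\"result\\": [0, 0, 0, 1, 0]}"'
labels = {163: 1, 124: 1, 111: 0, 32: 1, 289: 0}  # row index -> DEATH_EVENT

predictions = parse_scoring_response(response_text)

# Assumption: predictions are in the same order as the rows in the request
misses = [row for (row, truth), pred in zip(labels.items(), predictions) if truth != pred]
correct = len(predictions) - len(misses)
print(f"{correct}/{len(predictions)} correct, misclassified rows: {misses}")
# 3/5 correct, misclassified rows: [163, 124]
```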


In the cell below, print the logs of the web service and delete the service.

In [35]:
print(service.get_logs())

2021-04-18T12:11:55,623736900+00:00 - iot-server/run 
2021-04-18T12:11:55,630409800+00:00 - rsyslog/run 
2021-04-18T12:11:55,639958700+00:00 - nginx/run 
2021-04-18T12:11:55,641067000+00:00 - gunicorn/run 
/usr/sbin/nginx: /azureml-envs/azureml_8e5a5a51349877e7d47c6a2872e0ebfd/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_8e5a5a51349877e7d47c6a2872e0ebfd/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_8e5a5a51349877e7d47c6a2872e0ebfd/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_8e5a5a51349877e7d47c6a2872e0ebfd/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_8e5a5a51349877e7d47c6a2872e0ebfd/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
rsyslogd

In [None]:
service.delete()