# Automated ML

Importing dependencies

In [33]:
from sklearn.metrics import confusion_matrix
from azureml.core import Dataset, Workspace, Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.widgets import RunDetails
from azureml.train.automl import AutoMLConfig
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice, Webservice
from azureml.core.model import Model
from azureml.core.environment import Environment
import os
import pandas as pd
import numpy as np
import json
import requests
import joblib
import itertools

In [2]:
# Might be required if the best AutoML model turns out to use XGBoost
%pip install xgboost==0.90

## Dataset
The dataset comes from the publication: Davide Chicco and Giuseppe Jurman, "Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone", BMC Medical Informatics and Decision Making 20, 16 (2020) ([link](https://doi.org/10.1186/s12911-020-1023-5)).

### Overview

The dataset contains 12 features that can be used to predict heart failure mortality, plus the target column (the last item below):
* age
* anemia (Decrease of red blood cells or hemoglobin)
* creatinine_phosphokinase (Level of the CPK enzyme in the blood)
* diabetes (If the patient has diabetes)
* ejection_fraction (Percentage of blood leaving the heart at each contraction)
* high_blood_pressure (If the patient has hypertension)
* platelets (Platelets in the blood measured in kiloplatelets/mL)
* serum_creatinine (Level of serum creatinine in the blood measured in mg/dL)
* serum_sodium (Level of serum sodium in the blood measured in mEq/L)
* sex (Woman or man)
* smoking (If the patient smokes or not)
* time (Follow-up period in days)
* DEATH_EVENT (Whether the patient died during the follow-up period)

The target variable is DEATH_EVENT.

In [34]:
ws = Workspace.from_config()

experiment_name = 'automl-exp'

experiment = Experiment(ws, experiment_name)

In [4]:
dataset = Dataset.get_by_name(ws, 'heart-disease-from-kaggle')
data_train, data_test = dataset.random_split(0.8)

In [5]:
df = dataset.to_pandas_dataframe()
df['DEATH_EVENT'].value_counts()

0    203
1     96
Name: DEATH_EVENT, dtype: int64

The dataset is imbalanced (203 negative versus 96 positive cases), so accuracy on its own can give a misleading picture of model quality.
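
To make this concrete, here is a minimal sketch that uses only the class counts printed above. It computes the accuracy of a trivial model that always predicts the majority class, and the balanced accuracy of that same model. The variable names are illustrative and not part of the notebook.

```python
# Class counts copied from the value_counts() output above.
counts = {0: 203, 1: 96}

total = sum(counts.values())
majority_class = max(counts, key=counts.get)

# Accuracy of a trivial model that always predicts the majority class.
baseline_accuracy = counts[majority_class] / total

# Balanced accuracy averages per-class recall: the trivial model gets
# recall 1.0 on the majority class and 0.0 on every other class.
per_class_recall = [1.0 if label == majority_class else 0.0 for label in counts]
baseline_balanced_accuracy = sum(per_class_recall) / len(per_class_recall)

print(f"always-predict-{majority_class} accuracy: {baseline_accuracy:.3f}")
print(f"always-predict-{majority_class} balanced accuracy: {baseline_balanced_accuracy:.3f}")
```

The trivial model reaches roughly 68% accuracy without learning anything, while its balanced accuracy stays at 0.5, which is why a class-aware metric is preferable here.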

### Compute cluster

In [6]:
cpu_cluster_name = "cpu-cluster"

# Reuse the cluster if it already exists; otherwise create it
try:
    compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D12_V2',
                                                           max_nodes=10)
    compute_target = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True)

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


## AutoML Configuration

As part of the AutoML configuration, we are going to select:
- Problem type: Classification
- Experiment timeout: 20 minutes
- Maximum concurrent iterations: 4
- Primary metric: AUC weighted, chosen because the dataset is imbalanced (203 cases in one class versus 96 in the other)
- k value for k-fold validation: 5. Cross-validation runs on the 80% training split, so each training record is used for validation exactly once; the remaining 20% is kept aside as a test set
- Early stopping: enabled
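
The k-fold choice can be illustrated with a small, dependency-free sketch. It assumes 239 training rows (about 80% of the 299 records; `random_split` is approximate, so the real count may differ) and uses a hypothetical helper, `k_fold_indices`. AutoML's own splitter may shuffle or stratify differently.

```python
import random

def k_fold_indices(n_rows, k, seed=0):
    """Yield (train_idx, valid_idx) pairs for a shuffled k-fold split."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one row.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, valid_idx
        start += size

n_train_rows = 239  # assumption: about 80% of the 299 records
folds = list(k_fold_indices(n_train_rows, k=5))

for i, (train_idx, valid_idx) in enumerate(folds):
    print(f"fold {i}: train={len(train_idx)} valid={len(valid_idx)}")

# Every training row is used for validation exactly once across the folds.
validated = sorted(idx for _, valid_idx in folds for idx in valid_idx)
assert validated == list(range(n_train_rows))
```

Each model is therefore trained five times on about four fifths of the training split and scored on the remaining fifth, and the reported metric is aggregated over those folds.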


In [7]:
automl_settings = {
    "experiment_timeout_minutes": 20,
    "max_concurrent_iterations": 4,
    "primary_metric": "AUC_weighted",
    "n_cross_validations": 5,
}
automl_config = AutoMLConfig(
    compute_target=compute_target,
    task="classification",
    training_data=data_train,
    label_column_name="DEATH_EVENT",
    enable_early_stopping=True,
    featurization="auto",
    **automl_settings,
)

In [8]:
remote_run = experiment.submit(automl_config, show_output = True)

Running on remote.
No run_configuration provided, running on cpu-cluster with default configuration
Running on remote compute: cpu-cluster
Parent Run ID: AutoML_a0d5d70d-4760-46bd-9c0f-a0b8bc381771

Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

****************************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values we

## Run Details
As the run output shows, the ensemble models (both stacking and voting) achieved the highest values for the selected metric.

A longer experiment timeout would probably have let AutoML explore more models, which might have improved the results.
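
For intuition about the voting ensemble mentioned above, here is a short sketch of soft voting: each member outputs class probabilities, and the ensemble takes their weighted average. The probabilities and weights are made-up illustrative values, not taken from this run, and `soft_vote` is a hypothetical helper rather than the AutoML implementation.

```python
def soft_vote(member_probas, weights):
    """Weighted average of per-class probabilities from each ensemble member."""
    total_weight = sum(weights)
    n_classes = len(member_probas[0])
    averaged = [
        sum(w * proba[c] for w, proba in zip(weights, member_probas)) / total_weight
        for c in range(n_classes)
    ]
    # The predicted label is the class with the highest averaged probability.
    predicted = max(range(n_classes), key=lambda c: averaged[c])
    return averaged, predicted

# Hypothetical [P(DEATH_EVENT=0), P(DEATH_EVENT=1)] from three members for one patient.
member_probas = [[0.60, 0.40], [0.30, 0.70], [0.45, 0.55]]
weights = [0.2, 0.5, 0.3]  # hypothetical ensemble weights

averaged, predicted = soft_vote(member_probas, weights)
print(averaged, predicted)
```

In this toy case the first member alone would predict class 0, but the weighted average favours class 1 because the more heavily weighted members disagree with it.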

In [10]:
RunDetails(remote_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

In [11]:
remote_run.wait_for_completion()

{'runId': 'AutoML_a0d5d70d-4760-46bd-9c0f-a0b8bc381771',
 'target': 'cpu-cluster',
 'status': 'Completed',
 'startTimeUtc': '2021-04-06T09:36:50.014434Z',
 'endTimeUtc': '2021-04-06T10:05:36.769567Z',
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'AUC_weighted',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '5',
  'target': 'cpu-cluster',
  'DataPrepJsonString': '{\\"training_data\\": \\"{\\\\\\"blocks\\\\\\": [{\\\\\\"id\\\\\\": \\\\\\"e0e5047b-7382-4788-a6a5-12b8e76e18e3\\\\\\", \\\\\\"type\\\\\\": \\\\\\"Microsoft.DPrep.GetDatastoreFilesBlock\\\\\\", \\\\\\"arguments\\\\\\": {\\\\\\"datastores\\\\\\": [{\\\\\\"datastoreName\\\\\\": \\\\\\"workspaceblobstore\\\\\\", \\\\\\"path\\\\\\": \\\\\\"UI/04-06-2021_085517_UTC/heart_failure_clinical_records_dataset.csv\\\\\\", \\\\\\"resourceGroup\\\\\\": \\\\\\"aml-quickstarts-142075\\\\\\", \\\\\\"subscription\\\\\\": \\\\\\"9b72

In [12]:
remote_run

Experiment,Id,Type,Status,Details Page,Docs Page
automl-exp,AutoML_a0d5d70d-4760-46bd-9c0f-a0b8bc381771,automl,Completed,Link to Azure Machine Learning studio,Link to Documentation


## Best Model



In [13]:
best_run, fitted_model = remote_run.get_output()
best_run_metrics = best_run.get_metrics()
best_run

Experiment,Id,Type,Status,Details Page,Docs Page
automl-exp,AutoML_a0d5d70d-4760-46bd-9c0f-a0b8bc381771_55,azureml.scriptrun,Completed,Link to Azure Machine Learning studio,Link to Documentation


In [39]:
best_run_metrics

{'recall_score_macro': 0.8301750027485323,
 'f1_score_weighted': 0.8626804536988016,
 'accuracy': 0.8642857142857142,
 'average_precision_score_weighted': 0.9621181406359242,
 'AUC_micro': 0.9495973899763988,
 'average_precision_score_macro': 0.9477751103963941,
 'matthews_correlation': 0.6766057678182074,
 'AUC_weighted': 0.957100185882355,
 'recall_score_micro': 0.8642857142857142,
 'weighted_accuracy': 0.8877210373726012,
 'precision_score_macro': 0.8480435359272871,
 'norm_macro_recall': 0.6603500054970642,
 'AUC_macro': 0.957100185882355,
 'average_precision_score_micro': 0.9530286346479826,
 'recall_score_weighted': 0.8642857142857142,
 'precision_score_micro': 0.8642857142857142,
 'f1_score_macro': 0.8328828547486719,
 'balanced_accuracy': 0.8301750027485323,
 'f1_score_micro': 0.8642857142857142,
 'log_loss': 0.32092392486200255,
 'precision_score_weighted': 0.8704366892720425,
 'confusion_matrix': 'aml://artifactId/ExperimentRun/dcid.AutoML_a0d5d70d-4760-46bd-9c0f-a0b8bc381771
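
The metrics above are not all independent. As a quick sanity check, the sketch below verifies two relationships using the reported values: `balanced_accuracy` equals `recall_score_macro`, and `norm_macro_recall` is macro recall rescaled so that chance level (1 / number of classes) maps to 0. The variable names are illustrative.

```python
import math

# Values copied from the best_run_metrics output above.
recall_score_macro = 0.8301750027485323
balanced_accuracy = 0.8301750027485323
norm_macro_recall = 0.6603500054970642

n_classes = 2
chance_recall = 1 / n_classes  # expected macro recall of random guessing

# Rescale macro recall so that chance level maps to 0 and a perfect score to 1.
rescaled = (recall_score_macro - chance_recall) / (1 - chance_recall)

print(math.isclose(balanced_accuracy, recall_score_macro))
print(math.isclose(rescaled, norm_macro_recall, rel_tol=1e-9))
```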

In [14]:
fitted_model

Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                                 feature_sweeping_config=None,
                                 feature_sweeping_timeout=None,
                                 featurization_config=None, force_text_dnn=None,
                                 is_cross_validation=None,
                                 is_onnx_compatible=None, logger=None,
                                 observer=None, task=None, working_dir=None)),
                ('prefittedsoftvotingclassifier',...
                                                                                               silent=None,
                                                                                               subsample=0.6,
                                                                                               tree_method='auto',
                                                           

In [19]:
print('Best Run Id: ', best_run.id)
print('\n AUC Weighted:', best_run_metrics['AUC_weighted'])
# Last pipeline step, i.e. the ensemble classifier itself
print(fitted_model.steps[-1][1])
print(best_run.get_tags())

# Save a local copy of the fitted pipeline
os.makedirs('./outputs', exist_ok=True)
joblib.dump(fitted_model, filename='outputs/automl.joblib')

# Collect what the deployment step needs: model name, environment and scoring script
model_name = best_run.properties['model_name']
env = best_run.get_environment()
script_file = 'score.py'
best_run.download_file('outputs/scoring_file_v_1_0_0.py', script_file)

Best Run Id:  AutoML_a0d5d70d-4760-46bd-9c0f-a0b8bc381771_55

 AUC Weighted: 0.957100185882355
PreFittedSoftVotingClassifier(classification_labels=None,
                              estimators=[('38',
                                           Pipeline(memory=None,
                                                    steps=[('standardscalerwrapper',
                                                            <azureml.automl.runtime.shared.model_wrappers.StandardScalerWrapper object at 0x7f3a719dc390>),
                                                           ('xgboostclassifier',
                                                            XGBoostClassifier(base_score=0.5,
                                                                              booster='gbtree',
                                                                              colsample_bylevel=1,
                                                                              colsample_bynode=1,
                        

## Model Deployment

We are going to deploy the model to a Container in an Azure Container Instance (ACI) and we will test the endpoint by getting a couple of 

In [36]:
model = remote_run.register_model(model_name = model_name,
                                  description = 'heart failure model 2')

inference_config = InferenceConfig(entry_script = script_file, environment = env)

aci_config = AciWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)

aci_service_name = 'automl-heart-failure'
service = Model.deploy(ws, aci_service_name, [model], inference_config, aci_config)
service.wait_for_deployment(True)
print("State: " + service.state)
print("Scoring URI: " + service.scoring_uri)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2021-04-06 10:55:40+00:00 Creating Container Registry if not exists.
2021-04-06 10:55:41+00:00 Registering the environment.
2021-04-06 10:55:42+00:00 Use the existing image.
2021-04-06 10:55:42+00:00 Generating deployment configuration.
2021-04-06 10:55:45+00:00 Submitting deployment to compute..
2021-04-06 10:55:50+00:00 Checking the status of deployment automl-heart-failure..
2021-04-06 11:00:43+00:00 Checking the status of inference endpoint automl-heart-failure.
Succeeded
ACI service creation operation finished, operation "Succeeded"
State: Healthy
Scoring URI: http://5266b9ac-38ce-4d64-bfe1-7a3af4b9d54b.southcentralus.azurecontainer.io/score


Now we send a request to the deployed web service to test it.

In [38]:
%run endpoint.py

{"result": [1, 0]}
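
The contents of `endpoint.py` are not shown in this notebook. As a rough sketch of what such a script might do, the code below builds a request body and posts it to the scoring URI. It assumes the AutoML-generated scoring script accepts JSON of the form `{"data": [...]}`; that schema, the helper names `build_payload` and `score`, and the sample patient values are all assumptions. The keys follow the feature list in the Overview section and would need to match the column names of the registered dataset.

```python
import json
import urllib.request

def build_payload(records):
    """Serialize feature records into the assumed {"data": [...]} request body."""
    return json.dumps({"data": records}).encode("utf-8")

def score(scoring_uri, records, timeout=30):
    """POST the records to the endpoint and return the decoded JSON response."""
    request = urllib.request.Request(
        scoring_uri,
        data=build_payload(records),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

# Hypothetical patients; values are illustrative only.
sample_records = [
    {"age": 75, "anemia": 0, "creatinine_phosphokinase": 582, "diabetes": 0,
     "ejection_fraction": 20, "high_blood_pressure": 1, "platelets": 265000,
     "serum_creatinine": 1.9, "serum_sodium": 130, "sex": 1, "smoking": 0,
     "time": 4},
    {"age": 51, "anemia": 0, "creatinine_phosphokinase": 1380, "diabetes": 0,
     "ejection_fraction": 25, "high_blood_pressure": 1, "platelets": 271000,
     "serum_creatinine": 0.9, "serum_sodium": 130, "sex": 1, "smoking": 0,
     "time": 38},
]

body = build_payload(sample_records)
print(len(json.loads(body)["data"]), "records in payload")
# score(service.scoring_uri, sample_records)  # requires the deployed service
```

Depending on the scoring script version, the response may itself be a JSON-encoded string, in which case a second `json.loads` would be needed to reach the `result` list.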


Finally, we retrieve the logs of the web service and then delete the service.

In [30]:
service.get_logs()



In [31]:
service.delete()