# Automated ML

The cell below imports the dependencies needed to complete the project.

In [1]:
import azureml.core
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.dataset import Dataset
from azureml.core.workspace import Workspace
from azureml.core import Experiment, Webservice, Model
from azureml.train.automl import AutoMLConfig
from azureml.widgets import RunDetails
from azureml.core.environment import Environment
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice


import pandas as pd
import sklearn
import json

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

SDK version: 1.20.0


## Dataset

### Overview


This dataset contains the medical records of 299 heart failure patients collected at the Faisalabad Institute of Cardiology and at the Allied Hospital in Faisalabad (Punjab, Pakistan) during April–December 2015. The patients were 105 women and 194 men, aged between 40 and 95 years. All 299 patients had left ventricular systolic dysfunction and previous heart failures that put them in class III or IV of the New York Heart Association (NYHA) classification of the stages of heart failure. As done by the original data curators, the dataset is represented as a table with 299 rows (patients) and 13 columns (features).
  
  

### Task

Using the Kaggle heart failure dataset, I apply what I learned in the Machine Learning Engineer with Microsoft Azure Nanodegree Program to build a machine learning model that estimates the likelihood of a death event caused by heart failure. Such a model can help hospitals assess the severity of patients with cardiovascular disease. In this project I create two models: one using Automated ML (AutoML from now on) and one custom model whose hyperparameters are tuned with HyperDrive.

In [2]:
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.subscription_id, sep = '\n')

quick-starts-ws-138727
aml-quickstarts-138727
a24a24d5-8d87-4c8a-99b6-91ed2d2df51f


In [3]:
# choose a name for experiment
experiment_name = 'capstone-heartfailure-exp'

experiment=Experiment(ws, experiment_name)
experiment

Name,Workspace,Report Page,Docs Page
capstone-heartfailure-exp,quick-starts-ws-138727,Link to Azure Machine Learning studio,Link to Documentation


In [4]:
# Reuse the compute cluster if it already exists; otherwise provision a new one
# (VM size Standard_D2_V2, at most 4 nodes).
cluster_name = "cpu-cluster"

try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print("Existing cluster detected, make use of it!")
except ComputeTargetException:
    print("New compute cluster creation in progress...")
    compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_D2_V2', max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cluster_name, compute_config)

# Block until provisioning has finished before using the cluster.
cpu_cluster.wait_for_completion(show_output=True)
print("Cluster is ready")

Existing cluster detected, make use of it!
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned
Cluster is ready


In [5]:
path = "https://raw.githubusercontent.com/Arushikha0408/nd00333-capstone/master/heart_failure_clinical_records_dataset.csv"
dataset = Dataset.Tabular.from_delimited_files(path)

In [6]:
dataset.take(5).to_pandas_dataframe()

Unnamed: 0,age,anaemia,creatinine_phosphokinase,diabetes,ejection_fraction,high_blood_pressure,platelets,serum_creatinine,serum_sodium,sex,smoking,time,DEATH_EVENT
0,75.0,0,582,0,20,1,265000.0,1.9,130,1,0,4,1
1,55.0,0,7861,0,38,0,263358.03,1.1,136,1,0,6,1
2,65.0,0,146,0,20,0,162000.0,1.3,129,1,1,7,1
3,50.0,1,111,0,20,0,210000.0,1.9,137,1,0,7,1
4,65.0,1,160,1,20,0,327000.0,2.7,116,0,0,8,1
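
The preview also shows how the 13 columns split into inputs and target. The sketch below is a standalone illustration, not part of the original run: it parses the first two preview rows with Python's `csv` module and separates the feature columns from the `DEATH_EVENT` label. The `preview_csv` string and the variable names are assumptions made for this example.

```python
import csv
import io

# First two rows of the preview above (index column omitted).
preview_csv = """age,anaemia,creatinine_phosphokinase,diabetes,ejection_fraction,high_blood_pressure,platelets,serum_creatinine,serum_sodium,sex,smoking,time,DEATH_EVENT
75.0,0,582,0,20,1,265000.0,1.9,130,1,0,4,1
55.0,0,7861,0,38,0,263358.03,1.1,136,1,0,6,1
"""

rows = list(csv.DictReader(io.StringIO(preview_csv)))
label_column = "DEATH_EVENT"

# Every column except the label is an input feature.
all_columns = list(rows[0].keys())
feature_columns = [name for name in all_columns if name != label_column]
labels = [int(row[label_column]) for row in rows]

print(len(all_columns), "columns in total")
print(len(feature_columns), "feature columns")
print("labels:", labels)
```

The same split is what `label_column_name="DEATH_EVENT"` expresses when the dataset is handed to AutoML.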


## AutoML Configuration

The following code creates an AutoMLConfig object and submits an experiment for classification.
Together, the settings below define the machine learning task. The task type is classification, the primary metric is weighted AUC (`AUC_weighted`), the experiment times out after 30 minutes, 5 cross-validation folds are used, and the maximum number of iterations executed concurrently is set to 5.
Note that the compute cluster created above has at most 4 nodes, so a value of 4 would match the cluster size.


The configuration object below contains and persists the parameters for configuring the experiment run, as well as the training data to be used at run time.
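
As a rough illustration of how such settings can be checked before they are passed on, the sketch below caps the requested concurrency at the cluster size. It assumes that concurrency should not exceed the number of cluster nodes. `build_automl_kwargs` is a hypothetical helper written for this example, not part of the Azure ML SDK.

```python
def build_automl_kwargs(settings, max_cluster_nodes):
    """Return a copy of the settings with concurrency capped at the cluster size."""
    kwargs = dict(settings)  # copy so the caller's dict is left untouched
    requested = kwargs["max_concurrent_iterations"]
    kwargs["max_concurrent_iterations"] = min(requested, max_cluster_nodes)
    return kwargs


automl_settings = {
    "experiment_timeout_minutes": 30,
    "max_concurrent_iterations": 5,
    "primary_metric": "AUC_weighted",
    "n_cross_validations": 5,
}

# The cluster in this notebook is provisioned with max_nodes=4.
checked_settings = build_automl_kwargs(automl_settings, max_cluster_nodes=4)
print(checked_settings["max_concurrent_iterations"])
```

The resulting dictionary could then be unpacked with `**checked_settings` in the same way `**automl_settings` is unpacked in the cell below.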

In [7]:
automl_settings = {
    "experiment_timeout_minutes": 30,
    "max_concurrent_iterations": 5,
    "primary_metric" : 'AUC_weighted',
    "n_cross_validations": 5
}
automl_config = AutoMLConfig(compute_target=cpu_cluster,
                             task = "classification",
                             training_data=dataset,
                             label_column_name="DEATH_EVENT", 
                             enable_early_stopping= True,
                             featurization= 'auto',
                             enable_voting_ensemble= True,
                             **automl_settings
                            )

In [8]:
# Submit the experiment to the remote compute cluster

remote_run = experiment.submit(automl_config, show_output=True)

Running on remote.
No run_configuration provided, running on cpu-cluster with default configuration
Running on remote compute: cpu-cluster
Parent Run ID: AutoML_4a20a98b-8950-493a-94dc-96a80e4ea82c

Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

****************************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values we

## Run Details

In [9]:
RunDetails(remote_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

In [10]:
remote_run.wait_for_completion(show_output=True)



****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

****************************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFeaturization

****************************************************************************************************

TYPE:         High cardinality feature detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and no high cardinality features were detected.
              Learn more abo

{'runId': 'AutoML_4a20a98b-8950-493a-94dc-96a80e4ea82c',
 'target': 'cpu-cluster',
 'status': 'Completed',
 'startTimeUtc': '2021-02-12T19:02:37.694216Z',
 'endTimeUtc': '2021-02-12T19:28:28.392507Z',
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'AUC_weighted',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '5',
  'target': 'cpu-cluster',
  'DataPrepJsonString': '{\\"training_data\\": \\"{\\\\\\"blocks\\\\\\": [{\\\\\\"id\\\\\\": \\\\\\"da489010-f7e5-4838-8e4d-a7fcd57e7b4d\\\\\\", \\\\\\"type\\\\\\": \\\\\\"Microsoft.DPrep.GetFilesBlock\\\\\\", \\\\\\"arguments\\\\\\": {\\\\\\"isArchive\\\\\\": false, \\\\\\"path\\\\\\": {\\\\\\"target\\\\\\": 4, \\\\\\"resourceDetails\\\\\\": [{\\\\\\"path\\\\\\": \\\\\\"https://raw.githubusercontent.com/Arushikha0408/nd00333-capstone/master/heart_failure_clinical_records_dataset.csv\\\\\\"}]}}, \\\\\\"localData\\\\\\": {}, \\\\\\"isEnable
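
The `startTimeUtc` and `endTimeUtc` fields in the output above give the wall-clock duration of the parent run, which can be compared with the 30-minute experiment timeout. The following is a small worked calculation using only those two timestamps; the variable names are chosen for this example.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Timestamps copied from the run details above.
start = datetime.strptime("2021-02-12T19:02:37.694216Z", TIMESTAMP_FORMAT)
end = datetime.strptime("2021-02-12T19:28:28.392507Z", TIMESTAMP_FORMAT)

duration_seconds = (end - start).total_seconds()
duration_minutes = duration_seconds / 60
experiment_timeout_minutes = 30

print(f"Run duration: {duration_minutes:.1f} minutes")
print("Within timeout:", duration_minutes < experiment_timeout_minutes)
```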

## Best Model

The cell below retrieves the best model from the AutoML experiment and displays its properties.



In [11]:
best_run, fitted_model = remote_run.get_output()
print(best_run)
print(fitted_model)

Package:azureml-automl-runtime, training version:1.21.0, current version:1.20.0
Package:azureml-core, training version:1.21.0.post1, current version:1.20.0
Package:azureml-dataprep, training version:2.8.2, current version:2.7.3
Package:azureml-dataprep-native, training version:28.0.0, current version:27.0.0
Package:azureml-dataprep-rslex, training version:1.6.0, current version:1.5.0
Package:azureml-dataset-runtime, training version:1.21.0, current version:1.20.0
Package:azureml-defaults, training version:1.21.0, current version:1.20.0
Package:azureml-interpret, training version:1.21.0, current version:1.20.0
Package:azureml-pipeline-core, training version:1.21.0, current version:1.20.0
Package:azureml-telemetry, training version:1.21.0, current version:1.20.0
Package:azureml-train-automl-client, training version:1.21.0, current version:1.20.0
Package:azureml-train-automl-runtime, training version:1.21.0, current version:1.20.0


Run(Experiment: capstone-heartfailure-exp,
Id: AutoML_4a20a98b-8950-493a-94dc-96a80e4ea82c_61,
Type: azureml.scriptrun,
Status: Completed)
Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                                 feature_sweeping_config=None,
                                 feature_sweeping_timeout=None,
                                 featurization_config=None, force_text_dnn=None,
                                 is_cross_validation=None,
                                 is_onnx_compatible=None, logger=None,
                                 observer=None, task=None, working_dir=None)),
                ('stackensembleclassifier',
                 StackE...
                                         meta_learner=LogisticRegressionCV(Cs=10,
                                                                           class_weight=None,
                                                           

In [12]:
best_run_metrics = best_run.get_metrics()
for metric_name in best_run_metrics:
    metric = best_run_metrics[metric_name]
    print(metric_name)
    print(metric)

recall_score_micro
0.7824293785310734
average_precision_score_micro
0.8990911175670465
log_loss
0.5116167817475391
accuracy
0.7824293785310734
average_precision_score_macro
0.9071683357097193
norm_macro_recall
0.3820238095238096
matthews_correlation
0.39555807318409947
weighted_accuracy
0.8474912924327237
recall_score_weighted
0.7824293785310734
precision_score_micro
0.7824293785310734
f1_score_weighted
0.7289200610039475
AUC_micro
0.8907222860608381
precision_score_weighted
0.7154926671304332
balanced_accuracy
0.6910119047619048
precision_score_macro
0.6462349526900554
AUC_weighted
0.9188983480989295
average_precision_score_weighted
0.9292889889371618
f1_score_micro
0.7824293785310734
recall_score_macro
0.6910119047619048
f1_score_macro
0.6501342353941626
AUC_macro
0.9188983480989295
accuracy_table
aml://artifactId/ExperimentRun/dcid.AutoML_4a20a98b-8950-493a-94dc-96a80e4ea82c_61/accuracy_table
confusion_matrix
aml://artifactId/ExperimentRun/dcid.AutoML_4a20a98b-8950-493a-94dc-96a80e4
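
Several of the metrics above are related to one another. For example, `balanced_accuracy` and `recall_score_macro` report the same value, and `norm_macro_recall` rescales macro recall so that random guessing scores 0. The sketch below reproduces the reported `norm_macro_recall` from `recall_score_macro`, assuming the normalization `(recall_score_macro - R) / (1 - R)` with `R = 0.5` for a binary task. That formula is my reading of the AutoML metric definitions and should be treated as an assumption.

```python
# Values copied from the metrics printed above.
recall_score_macro = 0.6910119047619048
reported_norm_macro_recall = 0.3820238095238096

# Assumed baseline: expected macro recall of random guessing with 2 classes.
num_classes = 2
random_baseline = 1 / num_classes

norm_macro_recall = (recall_score_macro - random_baseline) / (1 - random_baseline)

print("norm_macro_recall:", round(norm_macro_recall, 6))
print("matches reported value:", abs(norm_macro_recall - reported_norm_macro_recall) < 1e-9)
```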

In [13]:
best_run

Experiment,Id,Type,Status,Details Page,Docs Page
capstone-heartfailure-exp,AutoML_4a20a98b-8950-493a-94dc-96a80e4ea82c_61,azureml.scriptrun,Completed,Link to Azure Machine Learning studio,Link to Documentation


In [14]:
print('Best Run Id: ', best_run.id)
print('\n AUC_weighted :', best_run_metrics['AUC_weighted'])
print(best_run.get_tags())

Best Run Id:  AutoML_4a20a98b-8950-493a-94dc-96a80e4ea82c_61

 AUC_weighted : 0.9188983480989295
{'_aml_system_azureml.automlComponent': 'AutoML', '_aml_system_ComputeTargetStatus': '{"AllocationState":"steady","PreparingNodeCount":0,"RunningNodeCount":4,"CurrentNodeCount":4}', 'ensembled_iterations': '[41, 34, 23, 37, 47, 14, 19, 33, 50, 18]', 'ensembled_algorithms': "['GradientBoosting', 'XGBoostClassifier', 'RandomForest', 'RandomForest', 'RandomForest', 'RandomForest', 'XGBoostClassifier', 'XGBoostClassifier', 'GradientBoosting', 'RandomForest']", 'ensemble_weights': '[0.13333333333333333, 0.06666666666666667, 0.06666666666666667, 0.06666666666666667, 0.06666666666666667, 0.06666666666666667, 0.06666666666666667, 0.06666666666666667, 0.3333333333333333, 0.06666666666666667]', 'best_individual_pipeline_score': '0.9146840854558878', 'best_individual_iteration': '41', '_aml_system_automl_is_child_run_end_telemetry_event_logged': 'True', 'model_explain_run_id': 'AutoML_4a20a98b-8950-49
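
The `ensembled_algorithms` and `ensemble_weights` tags are stored as strings, so they need to be parsed before they can be analyzed. The sketch below is an illustration based only on the tag values printed above: it parses both tags and sums the ensemble weight per algorithm family. The `tags` dictionary is a hand-copied subset of the output, and the aggregation is my own addition rather than something the SDK provides.

```python
import ast
from collections import defaultdict

# Tag values copied from the output above; both are stored as strings.
tags = {
    "ensembled_algorithms": "['GradientBoosting', 'XGBoostClassifier', 'RandomForest', "
                            "'RandomForest', 'RandomForest', 'RandomForest', "
                            "'XGBoostClassifier', 'XGBoostClassifier', "
                            "'GradientBoosting', 'RandomForest']",
    "ensemble_weights": "[0.13333333333333333, 0.06666666666666667, 0.06666666666666667, "
                        "0.06666666666666667, 0.06666666666666667, 0.06666666666666667, "
                        "0.06666666666666667, 0.06666666666666667, 0.3333333333333333, "
                        "0.06666666666666667]",
}

algorithms = ast.literal_eval(tags["ensembled_algorithms"])
weights = ast.literal_eval(tags["ensemble_weights"])

# Sum the weight assigned to each algorithm family.
weight_by_algorithm = defaultdict(float)
for algorithm, weight in zip(algorithms, weights):
    weight_by_algorithm[algorithm] += weight

for algorithm, total in sorted(weight_by_algorithm.items(), key=lambda item: item[1], reverse=True):
    print(f"{algorithm}: {total:.3f}")
```

The tags also show that the ensemble's `AUC_weighted` (0.9189) is slightly higher than the best individual pipeline score (0.9147, iteration 41).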

In [15]:
# Register the best model in the workspace
best_run.register_model(model_name='automl_model',
                        model_path='/outputs',
                        properties={'AUC_weighted': best_run_metrics['AUC_weighted']},
                        tags={'Training context': 'Auto ML'})

Model(workspace=Workspace.create(name='quick-starts-ws-138727', subscription_id='a24a24d5-8d87-4c8a-99b6-91ed2d2df51f', resource_group='aml-quickstarts-138727'), name=automl_model, id=automl_model:1, version=1, tags={'Training context': 'Auto ML'}, properties={'AUC_weighted': '0.9188983480989295'})

In [16]:
model_name = best_run.properties['model_name']

## Model Deployment

Only one of the two trained models has to be deployed. Perform the steps in the rest of this notebook only if you wish to deploy this model.

The cells below register the model, create an inference config, and deploy the model as a web service.

In [17]:
# Register the model

model = remote_run.register_model(model_name = model_name,
                                  description = 'Automl model')

print('Name:', model.name)

Name: AutoML4a20a98b861


In [18]:
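# Reuse the environment of the best run for inference
env = best_run.get_environment()

# Download the scoring script generated by AutoML and use it as the entry script
script_name = 'score.py'
best_run.download_file('outputs/scoring_file_v_1_0_0.py', script_name)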

In [19]:
# Create an inference config

inference_config = InferenceConfig(entry_script= script_name, environment=env)

aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1, enable_app_insights=True)
aci_service_name='capstone-service'


In [20]:
service = Model.deploy(ws, aci_service_name, [model], inference_config, aci_config)
                        

In [21]:
service.wait_for_deployment(show_output=True)

print(service.state)
url = service.scoring_uri
print(url)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running........................................
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy
http://1f47e230-341d-459b-9740-11b253d5d018.southcentralus.azurecontainer.io/score


The cell below sends a request to the deployed web service to test it.

In [27]:
%run endpoint.py

{"result": [1, 1]}


In [28]:
data_test = dataset.to_pandas_dataframe().dropna()
data_sample = data_test.sample(3)
y_true = data_sample.pop('DEATH_EVENT')
sample_json = json.dumps({'data':data_sample.to_dict(orient='records')})
print(sample_json)

{"data": [{"age": 65.0, "anaemia": 0, "creatinine_phosphokinase": 118, "diabetes": 0, "ejection_fraction": 50, "high_blood_pressure": 0, "platelets": 194000.0, "serum_creatinine": 1.1, "serum_sodium": 145, "sex": 1, "smoking": 1, "time": 200}, {"age": 68.0, "anaemia": 1, "creatinine_phosphokinase": 646, "diabetes": 0, "ejection_fraction": 25, "high_blood_pressure": 0, "platelets": 305000.0, "serum_creatinine": 2.1, "serum_sodium": 130, "sex": 1, "smoking": 0, "time": 108}, {"age": 58.0, "anaemia": 0, "creatinine_phosphokinase": 144, "diabetes": 1, "ejection_fraction": 38, "high_blood_pressure": 1, "platelets": 327000.0, "serum_creatinine": 0.7, "serum_sodium": 142, "sex": 0, "smoking": 0, "time": 83}]}


The cells below query the service with the sample above, print the logs of the web service, and delete the service.

In [29]:
output = service.run(sample_json)
print('Prediction: ', output)
print('True Values: ', y_true.values)

Prediction:  {"result": [0, 0, 0]}
True Values:  [0 0 0]
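
The same prediction can be requested over HTTP instead of through `service.run`. The contents of `endpoint.py` are not shown in this notebook, so the sketch below is only an assumption about what such a request might look like. It builds a POST request with Python's standard library, using the scoring URI printed by the deployment cell and one record in the shape of `sample_json`. The `send` helper is hypothetical and is not called here because it needs a live endpoint.

```python
import json
import urllib.request

# Scoring URI printed by the deployment cell; replace it with your own endpoint.
scoring_uri = "http://1f47e230-341d-459b-9740-11b253d5d018.southcentralus.azurecontainer.io/score"

# One record in the same shape as sample_json printed earlier.
payload = {
    "data": [
        {
            "age": 65.0, "anaemia": 0, "creatinine_phosphokinase": 118,
            "diabetes": 0, "ejection_fraction": 50, "high_blood_pressure": 0,
            "platelets": 194000.0, "serum_creatinine": 1.1, "serum_sodium": 145,
            "sex": 1, "smoking": 1, "time": 200,
        }
    ]
}

request = urllib.request.Request(
    scoring_uri,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)


def send(req):
    """Send the request and return the response body as text (needs a live endpoint)."""
    with urllib.request.urlopen(req) as response:
        return response.read().decode("utf-8")


print(request.get_method(), request.full_url)
```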


In [30]:
service.get_logs()

'2021-02-12T19:38:09,210437600+00:00 - iot-server/run \n2021-02-12T19:38:09,211481800+00:00 - rsyslog/run \n2021-02-12T19:38:09,232090000+00:00 - gunicorn/run \n2021-02-12T19:38:09,285098000+00:00 - nginx/run \n/usr/sbin/nginx: /azureml-envs/azureml_20a8278aa8b20dd48cc50f56a6d2586c/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)\n/usr/sbin/nginx: /azureml-envs/azureml_20a8278aa8b20dd48cc50f56a6d2586c/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)\n/usr/sbin/nginx: /azureml-envs/azureml_20a8278aa8b20dd48cc50f56a6d2586c/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)\n/usr/sbin/nginx: /azureml-envs/azureml_20a8278aa8b20dd48cc50f56a6d2586c/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)\n/usr/sbin/nginx: /azureml-envs/azureml_20a8278aa8b20dd48cc50f56a6d2586c/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)

In [32]:
# Delete the service
service.delete()

In [33]:
# Delete the compute cluster created earlier in this notebook
cpu_cluster.delete()