# Automated ML

The cell below imports the dependencies needed to complete the project.

In [5]:
import azureml.core
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.dataset import Dataset
from azureml.core.workspace import Workspace
from azureml.core import Experiment, Webservice, Model
from azureml.train.automl import AutoMLConfig
from azureml.widgets import RunDetails
from azureml.core.environment import Environment
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice


import pandas as pd
import sklearn
import json

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

SDK version: 1.20.0


## Dataset

### Overview


The dataset contains the medical records of 299 heart failure patients collected at the Faisalabad Institute of Cardiology and at the Allied Hospital in Faisalabad (Punjab, Pakistan) during April–December 2015. The patients were 105 women and 194 men, aged between 40 and 95 years. All 299 patients had left ventricular systolic dysfunction and had previous heart failures that put them in class III or IV of the New York Heart Association (NYHA) classification of the stages of heart failure. Following the original data curators, the dataset is represented as a table with 299 rows (patients) and 13 columns (features).
  
  

### Task

Using the Kaggle heart failure dataset and what I learned in the Machine Learning Engineer with Microsoft Azure Nanodegree Program, I will build a machine learning model that assesses the likelihood of a death event caused by heart failure. Such a model can help hospitals assess the severity of a patient's cardiovascular disease. In this project, I will create two models: one using Automated ML (referred to as AutoML from now on) and one custom model whose hyperparameters are tuned using HyperDrive.

In [6]:
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.subscription_id, sep = '\n')

quick-starts-ws-136827
aml-quickstarts-136827
610d6e37-4747-4a20-80eb-3aad70a55f43


In [7]:
# Choose a name for the experiment
experiment_name = 'capstone-heartfailure-exp'

experiment = Experiment(ws, experiment_name)
experiment

Name,Workspace,Report Page,Docs Page
capstone-heartfailure-exp,quick-starts-ws-136827,Link to Azure Machine Learning studio,Link to Documentation


In [9]:
# Reuse the compute cluster if it already exists; otherwise provision one
# with vm_size "Standard_D2_V2" and at most 4 nodes.
cluster_name = "cpu-cluster"

try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print("Existing cluster detected, make use of it!")
except ComputeTargetException:
    print("New compute cluster creation in progress...")
    compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_D2_V2', max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)
print("Cluster is ready")

New compute cluster creation in progress...
Creating
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned
Cluster is ready


In [15]:
path = "https://raw.githubusercontent.com/Arushikha0408/nd00333-capstone/master/heart_failure_clinical_records_dataset.csv"
dataset = Dataset.Tabular.from_delimited_files(path)

In [16]:
dataset.take(5).to_pandas_dataframe()

Unnamed: 0,age,anaemia,creatinine_phosphokinase,diabetes,ejection_fraction,high_blood_pressure,platelets,serum_creatinine,serum_sodium,sex,smoking,time,DEATH_EVENT
0,75.0,0,582,0,20,1,265000.0,1.9,130,1,0,4,1
1,55.0,0,7861,0,38,0,263358.03,1.1,136,1,0,6,1
2,65.0,0,146,0,20,0,162000.0,1.3,129,1,1,7,1
3,50.0,1,111,0,20,0,210000.0,1.9,137,1,0,7,1
4,65.0,1,160,1,20,0,327000.0,2.7,116,0,0,8,1


## AutoML Configuration

The following code creates an AutoMLConfig object and submits an experiment for classification.
The experiment uses AUC weighted as the primary metric, times out after 30 minutes, applies 5-fold cross-validation, and runs at most 4 iterations concurrently. Early stopping, automatic featurization, and the voting ensemble are enabled.
Together, these settings define the machine learning task.


The configuration object below contains and persists the parameters for configuring the experiment run, as well as the training data to be used at run time.

In [18]:
automl_settings = {
    "experiment_timeout_minutes": 30,
    # Kept at the cluster's max_nodes (4), matching the description above
    "max_concurrent_iterations": 4,
    "primary_metric": 'AUC_weighted',
    "n_cross_validations": 5
}
automl_config = AutoMLConfig(compute_target=cpu_cluster,
                             task="classification",
                             training_data=dataset,
                             label_column_name="DEATH_EVENT",
                             enable_early_stopping=True,
                             featurization='auto',
                             enable_voting_ensemble=True,
                             **automl_settings
                            )
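
As a small sanity check on the settings described above, the sketch below compares the requested concurrency with the cluster size before anything is submitted. The helper `check_automl_settings` is hypothetical and not part of the Azure ML SDK. It assumes the usual guidance that `max_concurrent_iterations` should not exceed the cluster's `max_nodes`.

```python
# Hypothetical helper for illustration; it is not an Azure ML SDK function.
def check_automl_settings(settings, max_nodes):
    """Return a list of problems found in an AutoML settings dict."""
    problems = []
    # Default of 1 is an assumption made for this sketch when the key is absent.
    concurrency = settings.get("max_concurrent_iterations", 1)
    if concurrency > max_nodes:
        problems.append(
            f"max_concurrent_iterations={concurrency} exceeds max_nodes={max_nodes}"
        )
    return problems


example_settings = {
    "experiment_timeout_minutes": 30,
    "max_concurrent_iterations": 4,
    "primary_metric": "AUC_weighted",
    "n_cross_validations": 5,
}

# No problems are reported when concurrency fits the 4-node cluster.
print(check_automl_settings(example_settings, max_nodes=4))

# Asking for more concurrency than there are nodes is flagged.
too_many = dict(example_settings, max_concurrent_iterations=5)
print(check_automl_settings(too_many, max_nodes=4))
```

Running a check like this locally is cheaper than discovering a mismatch after the remote run has started.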

In [19]:
# Submit the experiment to the remote cluster and stream its progress
remote_run = experiment.submit(automl_config, show_output=True)

Running on remote.
No run_configuration provided, running on cpu-cluster with default configuration
Running on remote compute: cpu-cluster
Parent Run ID: AutoML_f42f2be7-4d76-46fd-ba79-2ec7ee0d470f

Current status: FeaturesGeneration. Generating features for the dataset.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

****************************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: 

## Run Details

In [20]:
RunDetails(remote_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

In [21]:
remote_run.wait_for_completion(show_output=True)



****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

****************************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFeaturization

****************************************************************************************************

TYPE:         High cardinality feature detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and no high cardinality features were detected.
              Learn more abo

{'runId': 'AutoML_f42f2be7-4d76-46fd-ba79-2ec7ee0d470f',
 'target': 'cpu-cluster',
 'status': 'Completed',
 'startTimeUtc': '2021-02-01T13:44:39.185132Z',
 'endTimeUtc': '2021-02-01T14:11:24.087432Z',
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'AUC_weighted',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '5',
  'target': 'cpu-cluster',
  'DataPrepJsonString': '{\\"training_data\\": \\"{\\\\\\"blocks\\\\\\": [{\\\\\\"id\\\\\\": \\\\\\"fc28ac7a-7722-45d1-8dad-39c9f80015db\\\\\\", \\\\\\"type\\\\\\": \\\\\\"Microsoft.DPrep.GetFilesBlock\\\\\\", \\\\\\"arguments\\\\\\": {\\\\\\"isArchive\\\\\\": false, \\\\\\"path\\\\\\": {\\\\\\"target\\\\\\": 4, \\\\\\"resourceDetails\\\\\\": [{\\\\\\"path\\\\\\": \\\\\\"https://raw.githubusercontent.com/Arushikha0408/nd00333-capstone/master/heart_failure_clinical_records_dataset.csv\\\\\\"}]}}, \\\\\\"localData\\\\\\": {}, \\\\\\"isEnable

## Best Model

The cell below retrieves the best model from the AutoML experiment and displays its properties.



In [22]:
best_run, fitted_model = remote_run.get_output()
print(best_run)
print(fitted_model)

Run(Experiment: capstone-heartfailure-exp,
Id: AutoML_f42f2be7-4d76-46fd-ba79-2ec7ee0d470f_52,
Type: azureml.scriptrun,
Status: Completed)
Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                                 feature_sweeping_config=None,
                                 feature_sweeping_timeout=None,
                                 featurization_config=None, force_text_dnn=None,
                                 is_cross_validation=None,
                                 is_onnx_compatible=None, logger=None,
                                 observer=None, task=None, working_dir=None)),
                ('prefittedsoftvotingclassifier',...
                                                                                                    min_samples_leaf=0.01,
                                                                                                    min_samples_split=0.1036842105

In [23]:
best_run_metrics = best_run.get_metrics()
for metric_name in best_run_metrics:
    metric = best_run_metrics[metric_name]
    print(metric_name)
    print(metric)

recall_score_micro
0.8427118644067797
log_loss
0.3836326267779817
f1_score_weighted
0.8408120227175276
norm_macro_recall
0.6377643964562569
f1_score_micro
0.8427118644067797
accuracy
0.8427118644067797
precision_score_micro
0.8427118644067797
average_precision_score_macro
0.9106039678713815
AUC_macro
0.9191440337763013
AUC_micro
0.922749816463979
weighted_accuracy
0.8587400350256346
precision_score_macro
0.8215112790935072
precision_score_weighted
0.8542713009135502
average_precision_score_micro
0.9255168039153006
balanced_accuracy
0.8188821982281285
average_precision_score_weighted
0.9314415655557775
AUC_weighted
0.9191440337763013
recall_score_macro
0.8188821982281285
matthews_correlation
0.6390272274450925
recall_score_weighted
0.8427118644067797
f1_score_macro
0.811481910615036
confusion_matrix
aml://artifactId/ExperimentRun/dcid.AutoML_f42f2be7-4d76-46fd-ba79-2ec7ee0d470f_52/confusion_matrix
accuracy_table
aml://artifactId/ExperimentRun/dcid.AutoML_f42f2be7-4d76-46fd-ba79-2ec7ee0d
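
Some of the metrics listed above are derived from one another. As an illustration, the sketch below recomputes `norm_macro_recall` from `recall_score_macro`. It assumes the definition (recall_macro - R) / (1 - R) with R = 1 / number of classes, so that random guessing scores 0 and a perfect classifier scores 1. The function name is chosen for this example only.

```python
def norm_macro_recall(recall_macro, n_classes):
    """Rescale macro recall so chance-level performance maps to 0 and perfect to 1."""
    chance = 1.0 / n_classes  # expected macro recall of random guessing (assumed definition)
    return (recall_macro - chance) / (1.0 - chance)


# recall_score_macro reported for the best run; DEATH_EVENT has two classes.
value = norm_macro_recall(0.8188821982281285, n_classes=2)
print(value)
```

The result agrees with the reported `norm_macro_recall` of 0.6377643964562569 up to floating-point rounding.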

In [24]:
best_run

Experiment,Id,Type,Status,Details Page,Docs Page
capstone-heartfailure-exp,AutoML_f42f2be7-4d76-46fd-ba79-2ec7ee0d470f_52,azureml.scriptrun,Completed,Link to Azure Machine Learning studio,Link to Documentation


In [25]:
print('Best Run Id: ', best_run.id)
print('\n AUC_weighted :', best_run_metrics['AUC_weighted'])
print(best_run.get_tags())

Best Run Id:  AutoML_f42f2be7-4d76-46fd-ba79-2ec7ee0d470f_52

 AUC_weighted : 0.9191440337763013
{'_aml_system_azureml.automlComponent': 'AutoML', '_aml_system_ComputeTargetStatus': '{"AllocationState":"steady","PreparingNodeCount":0,"RunningNodeCount":0,"CurrentNodeCount":4}', 'ensembled_iterations': '[33, 25, 43, 38, 42, 2]', 'ensembled_algorithms': "['RandomForest', 'XGBoostClassifier', 'GradientBoosting', 'RandomForest', 'RandomForest', 'RandomForest']", 'ensemble_weights': '[0.16666666666666666, 0.16666666666666666, 0.16666666666666666, 0.16666666666666666, 0.16666666666666666, 0.16666666666666666]', 'best_individual_pipeline_score': '0.9181987126245845', 'best_individual_iteration': '33', '_aml_system_automl_is_child_run_end_telemetry_event_logged': 'True'}
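
The ensemble details in the tags above are stored as strings, which makes them hard to read. The sketch below parses them and pairs each ensembled iteration with its algorithm and weight. The `tags` dict is copied by hand from the output above rather than fetched from the run, so in a live notebook it would be replaced by `best_run.get_tags()`.

```python
import ast
import json

# Values copied from the best-run tags printed above.
tags = {
    "ensembled_iterations": "[33, 25, 43, 38, 42, 2]",
    "ensembled_algorithms": "['RandomForest', 'XGBoostClassifier', 'GradientBoosting', "
                            "'RandomForest', 'RandomForest', 'RandomForest']",
    "ensemble_weights": "[0.16666666666666666, 0.16666666666666666, 0.16666666666666666, "
                        "0.16666666666666666, 0.16666666666666666, 0.16666666666666666]",
}

iterations = json.loads(tags["ensembled_iterations"])
# The algorithm list uses single quotes, so it is a Python literal rather than valid JSON.
algorithms = ast.literal_eval(tags["ensembled_algorithms"])
weights = json.loads(tags["ensemble_weights"])

members = list(zip(iterations, algorithms, weights))
for iteration, algorithm, weight in members:
    print(f"iteration {iteration:>2}: {algorithm:<18} weight={weight:.3f}")
```

Read this way, the voting ensemble combines six pipelines with equal weights: four RandomForest, one XGBoostClassifier, and one GradientBoosting.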


In [26]:
# Save the best model by registering it in the workspace
best_run.register_model(model_name='automl_model',
                        model_path='/outputs',
                        properties={'AUC_weighted': best_run_metrics['AUC_weighted']},
                        tags={'Training context': 'Auto ML'})

Model(workspace=Workspace.create(name='quick-starts-ws-136827', subscription_id='610d6e37-4747-4a20-80eb-3aad70a55f43', resource_group='aml-quickstarts-136827'), name=automl_model, id=automl_model:1, version=1, tags={'Training context': 'Auto ML'}, properties={'AUC_weighted': '0.9191440337763013'})

In [27]:
model_name = best_run.properties['model_name']

## Model Deployment

Only one of the two trained models needs to be deployed. Perform the steps in the rest of this notebook only if you wish to deploy this model.

The cells below register the model, create an inference config, and deploy the model as a web service.

In [28]:
# Register the model

model = remote_run.register_model(model_name = model_name,
                                  description = 'Automl model')

print('Name:', model.name)

Name: AutoMLf42f2be7452


In [29]:
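# Reuse the environment of the best run for inference
env = best_run.get_environment()

# Download the scoring script that AutoML generated for the best run
script_name = 'score.py'
best_run.download_file('outputs/scoring_file_v_1_0_0.py', script_name)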

In [30]:
# Create an inference config and an ACI deployment config
inference_config = InferenceConfig(entry_script=script_name, environment=env)

aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1, enable_app_insights=True)
aci_service_name = 'capstone-service'


In [31]:
service = Model.deploy(ws, aci_service_name, [model], inference_config, aci_config)
                        

In [35]:
service.wait_for_deployment(show_output=True)

print(service.state)
url = service.scoring_uri
print(url)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy
http://f0f828f8-9020-4182-b1dd-340c9b7986bd.southcentralus.azurecontainer.io/score


The cell below sends a request to the deployed web service to test it.

In [34]:
%run endpoint.py

ConnectionError: HTTPConnectionPool(host='be5e2155-fdac-4de4-a064-00be48b598ba.southcentralus.azurecontainer.io', port=80): Max retries exceeded with url: /score (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff8af0be550>: Failed to establish a new connection: [Errno -2] Name or service not known',))

In [None]:
data_test = dataset.to_pandas_dataframe().dropna()
data_sample = data_test.sample(3)
y_true = data_sample.pop('DEATH_EVENT')
sample_json = json.dumps({'data':data_sample.to_dict(orient='records')})
print(sample_json)
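
The same payload shape can also be posted to the scoring endpoint over plain HTTP. The sketch below builds the request with the standard library only. The record is the first row of the dataset preview, and `scoring_uri` is a placeholder that would need to be replaced with the value of `service.scoring_uri`. The host in the `ConnectionError` shown earlier differs from the scoring URI printed after deployment, which suggests the script may have pointed at an outdated endpoint. The actual send is left commented out so that the cell runs without a live service.

```python
import json
import urllib.request

# Placeholder; replace with service.scoring_uri from the deployment cell.
scoring_uri = "http://example.invalid/score"

# First row of the dataset preview, without the DEATH_EVENT label.
record = {
    "age": 75.0, "anaemia": 0, "creatinine_phosphokinase": 582, "diabetes": 0,
    "ejection_fraction": 20, "high_blood_pressure": 1, "platelets": 265000.0,
    "serum_creatinine": 1.9, "serum_sodium": 130, "sex": 1, "smoking": 0, "time": 4,
}

# The payload wraps a list of records under the "data" key, as in sample_json.
payload = json.dumps({"data": [record]}).encode("utf-8")
request = urllib.request.Request(
    scoring_uri,
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment once scoring_uri points at a healthy service:
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(json.loads(response.read()))
```

Reading the URI from `service.scoring_uri` instead of hard-coding it in a script keeps the request in sync with the current deployment.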

The cells below send the sample to the web service, print the service logs, and delete the service.

In [None]:
output = service.run(sample_json)
print('Prediction: ', output)
print('True Values: ', y_true.values)

In [None]:
print(service.get_logs())

In [54]:
# Delete the service
service.delete()

In [55]:
# Delete the compute cluster created earlier (it was assigned to cpu_cluster)
cpu_cluster.delete()