## Automated ML

Import all the dependencies needed to complete the project.

In [21]:
import requests
import json
import logging
import joblib
from pprint import pprint
import pandas as pd
from sklearn.model_selection import train_test_split

import azureml.core
from azureml.core.workspace import Workspace
from azureml.core.experiment import Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.train.automl import AutoMLConfig
from azureml.core.compute_target import ComputeTargetException
from azureml.core.dataset import Dataset
from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.widgets import RunDetails
from azureml.core.model import Model, InferenceConfig
from azureml.core.webservice import AciWebservice
from azureml.automl.core.shared import constants
from azureml.core.environment import Environment

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)


SDK version: 1.47.0
Current provisioning state of AmlCompute is "Deleting"



## Workspace

The `config.json` file is downloaded from the Azure portal and must be in the same folder as this notebook for this cell to run.
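As a rough illustration of what `Workspace.from_config()` relies on, the sketch below checks a `config.json` for the three keys that the downloaded file normally contains (`subscription_id`, `resource_group`, `workspace_name`). The helper name `check_workspace_config` and the placeholder values are assumptions made for this example; they are not part of the SDK.

```python
import json
import os
import tempfile

# Keys normally present in the config.json downloaded from the Azure portal.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def check_workspace_config(path="config.json"):
    """Return the required keys that are missing or empty (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Demonstration with placeholder values written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "config.json")
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump({"subscription_id": "<subscription-id>",
                   "resource_group": "<resource-group>"}, f)
    missing = check_workspace_config(demo_path)

print("Missing keys:", missing)
```

Running a check like this before `Workspace.from_config()` can make a misplaced or incomplete file easier to diagnose.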

In [22]:
ws = Workspace.from_config()

print("Workspace name: ", ws.name)
print("Subscription id: ", ws.subscription_id)
print("Resource group: ", ws.resource_group, sep='\n')

Workspace name:  quick-starts-ws-218450
Subscription id:  9a7511b8-150f-4a58-8528-3e7d50216c31
Resource group: 
aml-quickstarts-218450


## Create an Azure ML experiment
I create an experiment named "automl_heart_failure_experiment" and define a project folder to hold the training scripts. The runs submitted from this notebook are recorded under this experiment in Azure.

In [23]:
# choose a name for experiment
experiment_name = "automl_heart_failure_experiment"
project_folder = './heart_failure-project'

experiment = Experiment(ws,experiment_name)

## Create or Attach an AmlCompute cluster
For the AutoML run, we need a compute target. The cell below reuses the cluster if it already exists and creates it otherwise.

In [25]:
# Choose a name for the cluster
cpu_cluster_name = 'cluster-cpu'

# Reuse the cluster if it already exists; otherwise create a new one

try:
    compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    print('Creating a new compute cluster...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_DS3_v2', min_nodes=1, max_nodes=4)
    compute_target = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True)

# use get_status() to get a detailed status for the current cluster. 
print(compute_target.get_status().serialize())


Creating a new compute cluster...
InProgress..
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded...........
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned
{'currentNodeCount': 1, 'targetNodeCount': 1, 'nodeStateCounts': {'preparingNodeCount': 1, 'runningNodeCount': 0, 'idleNodeCount': 0, 'unusableNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0}, 'allocationState': 'Steady', 'allocationStateTransitionTime': '2022-12-15T14:28:58.599000+00:00', 'errors': None, 'creationTime': '2022-12-15T14:27:57.533608+00:00', 'modifiedTime': '2022-12-15T14:28:04.463145+00:00', 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 1, 'maxNodeCount': 4, 'nodeIdleTimeBeforeScaleDown': 'PT1800S'}, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_DS3_V2'}
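
The status dictionary printed above is dense, so here is a minimal sketch of how a few of its fields could be summarised. It works on a hand-copied excerpt of that output rather than on a live cluster; the helper `idle_seconds` is a hypothetical name and only handles seconds-only durations such as `PT1800S`.

```python
import re

# Excerpt of the status dictionary printed above.
status = {
    "currentNodeCount": 1,
    "targetNodeCount": 1,
    "allocationState": "Steady",
    "provisioningState": "Succeeded",
    "scaleSettings": {
        "minNodeCount": 1,
        "maxNodeCount": 4,
        "nodeIdleTimeBeforeScaleDown": "PT1800S",
    },
}


def idle_seconds(duration):
    """Convert a seconds-only ISO 8601 duration such as 'PT1800S' to an int."""
    match = re.fullmatch(r"PT(\d+)S", duration)
    if match is None:
        raise ValueError(f"unsupported duration format: {duration!r}")
    return int(match.group(1))


scale = status["scaleSettings"]
idle_minutes = idle_seconds(scale["nodeIdleTimeBeforeScaleDown"]) / 60
is_ready = (status["provisioningState"] == "Succeeded"
            and status["currentNodeCount"] >= scale["minNodeCount"])

print(f"Ready: {is_ready}, nodes {status['currentNodeCount']}/{scale['maxNodeCount']}, "
      f"idle scale-down after {idle_minutes:.0f} min")
```

Because the cluster was created with `min_nodes=1`, it does not scale down to zero nodes while it exists.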


## Dataset

### Overview

The dataset contains medical records of 299 patients who had heart failure, collected during their follow-up period, where each patient profile has 13 clinical features.

I use this data to predict `DEATH_EVENT`, i.e. whether the patient died during the follow-up period (boolean).


The dataset used in this project is the Heart failure clinical records Data Set, which is publicly available from the UCI Machine Learning Repository.

In [26]:
# Check whether the dataset is already registered in the workspace

key = "heart-failure"
description_text = "Heart failure survival prediction"


if key in ws.datasets.keys():
    dataset = ws.datasets[key]
    print('The Dataset was found')
else:
    # Create AML Dataset and register it into Workspace
    data_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00519/heart_failure_clinical_records_dataset.csv"
    dataset = Dataset.Tabular.from_delimited_files(data_url)
    #Register Dataset in Workspace
    dataset = dataset.register(workspace = ws,name = key,description = description_text)

df = dataset.to_pandas_dataframe()
    

The Dataset was found


In [8]:
# Preview of the first five rows
print(df.head())

# Explore data
print(df.describe())

    age  anaemia  creatinine_phosphokinase  diabetes  ejection_fraction  \
0  75.0        0                       582         0                 20   
1  55.0        0                      7861         0                 38   
2  65.0        0                       146         0                 20   
3  50.0        1                       111         0                 20   
4  65.0        1                       160         1                 20   

   high_blood_pressure  platelets  serum_creatinine  serum_sodium  sex  \
0                    1  265000.00               1.9           130    1   
1                    0  263358.03               1.1           136    1   
2                    0  162000.00               1.3           129    1   
3                    0  210000.00               1.9           137    1   
4                    0  327000.00               2.7           116    0   

   smoking  time  DEATH_EVENT  
0        0     4            1  
1        0     6            1  
2       

## AutoML Configuration

"experiment_timeout_minutes": 30 - An exit criterion that defines how long, in minutes, the experiment may run. I set it to 30 minutes to help avoid experiment timeout failures.

"enable_early_stopping": True - Enables early termination if the score is not improving in the short term. It could be omitted in this experiment, because experiment_timeout_minutes already limits the run time.

"primary_metric": 'accuracy' - I chose accuracy as the primary metric, as it is the default metric for classification tasks.

"n_cross_validations": 4 - The number of cross-validation folds. With a value of 4, the training data is split into four roughly equal subsets, and each subset is used once for validation while the other three are used for training.

"max_concurrent_iterations": 4 - The maximum number of iterations executed in parallel. It matches the max_nodes value of the compute cluster.

"verbosity": logging.INFO - The verbosity level for writing to the log file.

task = 'classification' - The experiment type, which in this case is classification. Other options are regression and forecasting.

training_data = dataset - The registered dataset loaded above.

label_column_name = 'DEATH_EVENT' - The name of the label column, i.e. the target column that the model predicts.

featurization = 'auto' - Defines whether the featurization step is done automatically ('auto', as in this case) or not ('off').

debug_log = 'automl_errors.log' - The log file to write debug information to.
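
To make the `n_cross_validations` setting more concrete, here is a minimal sketch of how 299 records can be divided into 4 folds. It is a plain, unshuffled k-fold split written for illustration; the function names are hypothetical, and the splitting AutoML performs internally (for example any shuffling or stratification) may differ.

```python
def fold_sizes(n_samples, n_folds):
    """Split n_samples into n_folds sizes that differ by at most one."""
    base, extra = divmod(n_samples, n_folds)
    return [base + 1 if i < extra else base for i in range(n_folds)]


def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, validation_indices) for each fold, without shuffling."""
    start = 0
    for size in fold_sizes(n_samples, n_folds):
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


sizes = fold_sizes(299, 4)
print("Fold sizes:", sizes)
for train, validation in k_fold_indices(299, 4):
    print(f"train={len(train)} validation={len(validation)}")
```

Every record lands in exactly one validation fold, so the reported metric is an average over four models that each saw about three quarters of the data.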

In [9]:
# AutoML settings
automl_settings = {
    "experiment_timeout_minutes" : 30,
    "enable_early_stopping" : True,
    "primary_metric":'accuracy',
    "n_cross_validations":4,
    "max_concurrent_iterations":4,
    "verbosity": logging.INFO
}

# AutoML configuration
automl_config = AutoMLConfig(compute_target = compute_target,
                            task = 'classification',
                            training_data=dataset,
                            path = project_folder,
                            label_column_name="DEATH_EVENT",
                            featurization= 'auto',
                            debug_log = "automl_errors.log",
                            **automl_settings
                            )

In [10]:
# Submit your experiment
remote_run = experiment.submit(automl_config, show_output=True)
remote_run.wait_for_completion()

Submitting remote run.
No run_configuration provided, running on cluster-cpu with default configuration
Running on remote compute: cluster-cpu


Experiment,Id,Type,Status,Details Page,Docs Page
automl_heart_failure_experiment,AutoML_b6def65b-9cfa-4e43-a888-1737df89782c,automl,NotStarted,Link to Azure Machine Learning studio,Link to Documentation



Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

********************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

********************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFeaturization

**********************************************************************************

{'runId': 'AutoML_b6def65b-9cfa-4e43-a888-1737df89782c',
 'target': 'cluster-cpu',
 'status': 'Completed',
 'startTimeUtc': '2022-12-15T11:14:52.258605Z',
 'endTimeUtc': '2022-12-15T11:25:27.104835Z',
 'services': {},
   'message': 'No scores improved over last 10 iterations, so experiment stopped early. This early stopping behavior can be disabled by setting enable_early_stopping = False in AutoMLConfig for notebook/python SDK runs.'}],
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '4',
  'target': 'cluster-cpu',
  'DataPrepJsonString': '{\\"training_data\\": {\\"datasetId\\": \\"2f7dc0a1-37b5-4501-b713-800a78cd006f\\"}, \\"datasets\\": 0}',
  'EnableSubsampling': None,
  'runTemplate': 'AutoML',
  'azureml.runsource': 'automl',
  'display_task_type': 'classification',
  'dependencies_versions': '{"azureml-widgets":

In [11]:
# get_status()
# Fetch the latest status of the run. It should show 'Completed'

print("Run Status: ",remote_run.get_status())

Run Status:  Completed
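
The run summary above reports that the experiment stopped because no scores improved over the last 10 iterations. The sketch below shows a generic patience-based stopping rule of that kind. It is only an illustration of the idea, not AutoML's actual implementation; the function name and the score values are made up for the example.

```python
def stopping_iteration(scores, patience=10):
    """Return the index at which a run would stop, or None if it never stops.

    The run stops once `patience` consecutive scores fail to beat the best so far.
    """
    best = float("-inf")
    since_improvement = 0
    for i, score in enumerate(scores):
        if score > best:
            best = score
            since_improvement = 0
        else:
            since_improvement += 1
            if since_improvement >= patience:
                return i
    return None


# Made-up accuracy values purely for illustration (not taken from the run).
demo_scores = [0.80, 0.85, 0.87] + [0.86] * 12
stop_at = stopping_iteration(demo_scores, patience=10)
print("Would stop at iteration index:", stop_at)
```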


## Run Details

In the cell below, use the `RunDetails` widget to show the different experiments.

In [12]:
RunDetails(remote_run).show()
remote_run.wait_for_completion(show_output=True)

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

Experiment,Id,Type,Status,Details Page,Docs Page
automl_heart_failure_experiment,AutoML_b6def65b-9cfa-4e43-a888-1737df89782c,automl,Completed,Link to Azure Machine Learning studio,Link to Documentation




********************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

********************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFeaturization

********************************************************************************************

TYPE:         High cardinality feature detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and no high cardinality features were detected.
              Learn more about high cardinality feat

{'runId': 'AutoML_b6def65b-9cfa-4e43-a888-1737df89782c',
 'target': 'cluster-cpu',
 'status': 'Completed',
 'startTimeUtc': '2022-12-15T11:14:52.258605Z',
 'endTimeUtc': '2022-12-15T11:25:27.104835Z',
 'services': {},
   'message': 'No scores improved over last 10 iterations, so experiment stopped early. This early stopping behavior can be disabled by setting enable_early_stopping = False in AutoMLConfig for notebook/python SDK runs.'}],
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '4',
  'target': 'cluster-cpu',
  'DataPrepJsonString': '{\\"training_data\\": {\\"datasetId\\": \\"2f7dc0a1-37b5-4501-b713-800a78cd006f\\"}, \\"datasets\\": 0}',
  'EnableSubsampling': None,
  'runTemplate': 'AutoML',
  'azureml.runsource': 'automl',
  'display_task_type': 'classification',
  'dependencies_versions': '{"azureml-widgets":

## Best Model

In the cell below, get the best model from the automl experiments and display all the properties of the model.

In [13]:
best_run,fitted_model = remote_run.get_output()

In [14]:
# Best run

print(best_run)

Run(Experiment: automl_heart_failure_experiment,
Id: AutoML_b6def65b-9cfa-4e43-a888-1737df89782c_44,
Type: azureml.scriptrun,
Status: Completed)


In [15]:
# get_metrics()
print(best_run.get_metrics())

{'recall_score_micro': 0.8761711711711712, 'precision_score_macro': 0.8766115195897278, 'balanced_accuracy': 0.8399982089865564, 'f1_score_micro': 0.8761711711711712, 'AUC_weighted': 0.9118961932344701, 'accuracy': 0.8761711711711712, 'precision_score_micro': 0.8761711711711712, 'matthews_correlation': 0.7150586161403495, 'log_loss': 0.3702043373242037, 'recall_score_macro': 0.8399982089865564, 'weighted_accuracy': 0.9010201049353981, 'recall_score_weighted': 0.8761711711711712, 'f1_score_macro': 0.8498976865342798, 'AUC_macro': 0.91189619323447, 'average_precision_score_macro': 0.8918323233451431, 'precision_score_weighted': 0.8814204477206898, 'average_precision_score_micro': 0.9193162271135313, 'f1_score_weighted': 0.8715271857654576, 'norm_macro_recall': 0.6799964179731128, 'average_precision_score_weighted': 0.9172431321435099, 'AUC_micro': 0.9181953494034575, 'confusion_matrix': 'aml://artifactId/ExperimentRun/dcid.AutoML_b6def65b-9cfa-4e43-a888-1737df89782c_44/confusion_matrix',
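
Several of the metrics above are related to each other. In the output, `balanced_accuracy` equals `recall_score_macro`, and for a two-class problem `norm_macro_recall` is the macro recall rescaled so that random guessing maps to 0. The sketch below checks that relation against the reported values; the confusion matrix at the end is hypothetical and only shows how macro recall is computed.

```python
def recall_per_class(confusion):
    """Recall for each class from a confusion matrix with rows = true labels."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


def macro_recall(confusion):
    """Unweighted mean of the per-class recalls (balanced accuracy)."""
    recalls = recall_per_class(confusion)
    return sum(recalls) / len(recalls)


def normalized_macro_recall(recall_macro, n_classes=2):
    """Rescale macro recall so that random guessing maps to 0 and perfect to 1."""
    chance = 1.0 / n_classes
    return (recall_macro - chance) / (1.0 - chance)


# Values reported by best_run.get_metrics() above.
reported_recall_macro = 0.8399982089865564
reported_norm_macro_recall = 0.6799964179731128
recomputed = normalized_macro_recall(reported_recall_macro)
print("Matches reported value:", abs(recomputed - reported_norm_macro_recall) < 1e-9)

# Hypothetical confusion matrix (rows = true class 0 / 1), for illustration only.
demo_confusion = [[90, 10], [20, 30]]
print("Macro recall of the demo matrix:", macro_recall(demo_confusion))
```

The gap between `accuracy` (about 0.876) and `balanced_accuracy` (about 0.840) suggests that the two classes are not predicted equally well.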

In [16]:
# get_details()
print(best_run.get_details())

{'runId': 'AutoML_b6def65b-9cfa-4e43-a888-1737df89782c_44', 'target': 'cluster-cpu', 'status': 'Completed', 'startTimeUtc': '2022-12-15T11:24:35.781455Z', 'endTimeUtc': '2022-12-15T11:25:17.429726Z', 'services': {}, 'properties': {'runTemplate': 'automl_child', 'pipeline_id': '__AutoML_Ensemble__', 'pipeline_spec': '{"pipeline_id":"__AutoML_Ensemble__","objects":[{"module":"azureml.train.automl.ensemble","class_name":"Ensemble","spec_class":"sklearn","param_args":[],"param_kwargs":{"automl_settings":"{\'task_type\':\'classification\',\'primary_metric\':\'accuracy\',\'verbosity\':20,\'ensemble_iterations\':15,\'is_timeseries\':False,\'name\':\'automl_heart_failure_experiment\',\'compute_target\':\'cluster-cpu\',\'subscription_id\':\'9a7511b8-150f-4a58-8528-3e7d50216c31\',\'region\':\'southcentralus\',\'spark_service\':None}","ensemble_run_id":"AutoML_b6def65b-9cfa-4e43-a888-1737df89782c_44","experiment_name":"automl_heart_failure_experiment","workspace_name":"quick-starts-ws-218450","su

In [17]:
# get_properties()
print(best_run.get_properties())

{'runTemplate': 'automl_child', 'pipeline_id': '__AutoML_Ensemble__', 'pipeline_spec': '{"pipeline_id":"__AutoML_Ensemble__","objects":[{"module":"azureml.train.automl.ensemble","class_name":"Ensemble","spec_class":"sklearn","param_args":[],"param_kwargs":{"automl_settings":"{\'task_type\':\'classification\',\'primary_metric\':\'accuracy\',\'verbosity\':20,\'ensemble_iterations\':15,\'is_timeseries\':False,\'name\':\'automl_heart_failure_experiment\',\'compute_target\':\'cluster-cpu\',\'subscription_id\':\'9a7511b8-150f-4a58-8528-3e7d50216c31\',\'region\':\'southcentralus\',\'spark_service\':None}","ensemble_run_id":"AutoML_b6def65b-9cfa-4e43-a888-1737df89782c_44","experiment_name":"automl_heart_failure_experiment","workspace_name":"quick-starts-ws-218450","subscription_id":"9a7511b8-150f-4a58-8528-3e7d50216c31","resource_group_name":"aml-quickstarts-218450"}}]}', 'training_percent': '100', 'predicted_cost': None, 'iteration': '44', '_aml_system_scenario_identification': 'Remote.Child'

In [18]:
# Save the best model
best_run.register_model(model_name = 'automl-best-model.pkl',model_path = './outputs/')

Model(workspace=Workspace.create(name='quick-starts-ws-218450', subscription_id='9a7511b8-150f-4a58-8528-3e7d50216c31', resource_group='aml-quickstarts-218450'), name=automl-best-model.pkl, id=automl-best-model.pkl:1, version=1, tags={}, properties={})

In [19]:
best_run.get_file_names()

# Download the yaml file that includes the environment dependencies
best_run.download_file('outputs/conda_env_v_1_0_0.yml', 'env.yml')

In [20]:

# Download the model file

best_run.download_file('outputs/model.pkl', 'Automl_model.pkl')



Current provisioning state of AmlCompute is "Deleting"



## Model Deployment

Remember you have to deploy only one of the two models you trained but you still need to register both the models. Perform the steps in the rest of this notebook only if you wish to deploy this model.

In the cell below, register the model, create an inference config and deploy the model as a web service.

In [27]:
# Registering the best model
model = remote_run.register_model(model_name='automl-best-model.pkl')
print(remote_run.model_id)

# Get the environment, with its dependencies, from the best run
environment = best_run.get_environment()

# Download the generated scoring script and use it as the entry script
entry_script = 'inference/scoring.py'
best_run.download_file('outputs/scoring_file_v_1_0_0.py', entry_script)

inference_config = InferenceConfig(entry_script = entry_script, environment = environment)

deployment_config = AciWebservice.deploy_configuration(cpu_cores = 1, 
                                                    memory_gb = 1, 
                                                    auth_enabled= True, 
                                                    enable_app_insights= True)

service = Model.deploy(ws, "aciservice", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output = True)


automl-best-model.pkl
Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2022-12-15 14:39:08+00:00 Creating Container Registry if not exists.
2022-12-15 14:39:08+00:00 Registering the environment.
2022-12-15 14:39:09+00:00 Use the existing image.
2022-12-15 14:39:11+00:00 Submitting deployment to compute.
2022-12-15 14:39:14+00:00 Checking the status of deployment aciservice..
2022-12-15 14:41:58+00:00 Checking the status of inference endpoint aciservice.
Succeeded
ACI service creation operation finished, operation "Succeeded"


In [28]:
# Getting the service state
# The scoring URI and the primary authentication key are copied to the endpoint.py file in order to test the deployed service.
# The Swagger URI can be used in Swagger UI: https://petstore.swagger.io/ For more info, please see the relevant part in the README file.

# Authentication is enabled, so I use the get_keys method to retrieve the primary and secondary authentication keys:
primary, secondary = service.get_keys()

print('Service state: ' + service.state)
print('Service scoring URI: ' + service.scoring_uri)
print('Service Swagger URI: ' + service.swagger_uri)
print('Service primary authentication key: ' + primary)


Service state: Healthy
Service scoring URI: http://37e3a248-3412-4721-89d0-0a804dd90d80.southcentralus.azurecontainer.io/score
Service Swagger URI: http://37e3a248-3412-4721-89d0-0a804dd90d80.southcentralus.azurecontainer.io/swagger.json
Service primary authentication key: WcUL894VZON3S7seLGISXuDicNi6Cvx6


In the cell below, send a request to the web service you deployed to test it.

In [29]:
#%run endpoint.py

import requests
import json

# Scoring URI of the deployed web service

scoring_uri = 'http://37e3a248-3412-4721-89d0-0a804dd90d80.southcentralus.azurecontainer.io/score'

# If the service is authenticated, set the key or token

key = 'WcUL894VZON3S7seLGISXuDicNi6Cvx6'

data = {"data":
        [
          {
           "age": 60, 
           "anaemia": 1, 
           "creatinine_phosphokinase": 315, 
           "diabetes": 1, 
           "ejection_fraction": 60, 
           "high_blood_pressure": 0, 
           "platelets": 454000, 
           "serum_creatinine": 1.1, 
           "serum_sodium": 131, 
           "sex": 1, 
           "smoking": 1,
           "time": 10
          },
          {
           "age": 55, 
           "anaemia": 0, 
           "creatinine_phosphokinase": 1820, 
           "diabetes": 0, 
           "ejection_fraction": 38, 
           "high_blood_pressure": 0, 
           "platelets": 270000, 
           "serum_creatinine": 1.2, 
           "serum_sodium": 139, 
           "sex": 0, 
           "smoking": 0,
           "time": 271
          },
      ]
    }
# Convert to JSON string
input_data = json.dumps(data)
with open("data.json", "w") as _f:
    _f.write(input_data)

# Set the content type
headers = {'Content-Type': 'application/json'}
# If authentication is enabled, set the authorization header
headers['Authorization'] = f'Bearer {key}'


# Make the request and display the response
resp = requests.post(scoring_uri, input_data, headers=headers)
print(resp.json())
print("Expected result: [true, false], where 'true' means '1' and 'false' means '0' as result in the 'DEATH_EVENT' column")

{"result": [1, 0]}
Expected result: [true, false], where 'true' means '1' and 'false' means '0' as result in the 'DEATH_EVENT' column
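
A request fails or gives misleading results if a record is missing a column, so it can help to validate the payload before sending it. The sketch below builds the request body and headers in the same shape as the cell above, after checking each record against the twelve input columns shown in the dataset preview. The helper `build_scoring_request` and the placeholder key are assumptions for this example; nothing is sent over the network.

```python
import json

# Input columns taken from the dataset preview (every column except DEATH_EVENT).
FEATURE_COLUMNS = [
    "age", "anaemia", "creatinine_phosphokinase", "diabetes",
    "ejection_fraction", "high_blood_pressure", "platelets",
    "serum_creatinine", "serum_sodium", "sex", "smoking", "time",
]


def build_scoring_request(records, key=None):
    """Validate records and return (body, headers) for a POST to the scoring URI."""
    for i, record in enumerate(records):
        missing = [c for c in FEATURE_COLUMNS if c not in record]
        extra = [c for c in record if c not in FEATURE_COLUMNS]
        if missing or extra:
            raise ValueError(f"record {i}: missing={missing}, unexpected={extra}")
    headers = {"Content-Type": "application/json"}
    if key:
        # Key-based authentication, as enabled in the deployment configuration.
        headers["Authorization"] = f"Bearer {key}"
    return json.dumps({"data": records}), headers


patient = {
    "age": 60, "anaemia": 1, "creatinine_phosphokinase": 315, "diabetes": 1,
    "ejection_fraction": 60, "high_blood_pressure": 0, "platelets": 454000,
    "serum_creatinine": 1.1, "serum_sodium": 131, "sex": 1, "smoking": 1,
    "time": 10,
}
body, headers = build_scoring_request([patient], key="<primary-key>")
print(headers)
```

The returned `body` and `headers` can then be passed to `requests.post(scoring_uri, body, headers=headers)` as in the cell above.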


In the cell below, print the logs of the web service and delete the service.

In [30]:
# Printing the logs
print(service.get_logs())

2022-12-15T14:41:47,624645200+00:00 - rsyslog/run 
2022-12-15T14:41:47,624623000+00:00 - iot-server/run 
2022-12-15T14:41:47,632971700+00:00 - gunicorn/run 
2022-12-15T14:41:47,636048800+00:00 | gunicorn/run | 
2022-12-15T14:41:47,641373200+00:00 | gunicorn/run | ###############################################
2022-12-15T14:41:47,643433100+00:00 | gunicorn/run | AzureML Container Runtime Information
2022-12-15T14:41:47,649632100+00:00 | gunicorn/run | ###############################################
2022-12-15T14:41:47,651163400+00:00 | gunicorn/run | 
2022-12-15T14:41:47,653194100+00:00 | gunicorn/run | 
2022-12-15T14:41:47,670363700+00:00 - nginx/run 
2022-12-15T14:41:47,670615900+00:00 | gunicorn/run | AzureML image information: openmpi3.1.2-ubuntu18.04, Materializaton Build:20220930.v4
2022-12-15T14:41:47,677099700+00:00 | gunicorn/run | 
2022-12-15T14:41:47,679397500+00:00 | gunicorn/run | 
2022-12-15T14:41:47,686302600+00:00 | gunicorn/run | PATH environment variable: /azureml-env

## Deleting the service
The deletion of the service sits in a separate cell, so that it is not run accidentally before the other tasks are finished.

In [None]:
service.delete()

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shut down all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.
