# Automated ML

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [1]:
import logging
import os
import csv

from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets
import sklearn
import pkg_resources

import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
from azureml.core.dataset import Dataset

from azureml.pipeline.steps import AutoMLStep

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

SDK version: 1.44.0


## Dataset

### Overview
Cardiovascular diseases (CVDs) are the number 1 cause of death globally, taking an estimated 17.9 million lives each year, which accounts for 31% of all deaths worldwide.

Heart failure is a common event caused by CVDs and this dataset contains 12 features that can be used to predict mortality by heart failure.

Most cardiovascular diseases can be prevented by addressing behavioural risk factors such as tobacco use, unhealthy diet and obesity, physical inactivity and harmful use of alcohol using population-wide strategies.

People with cardiovascular disease or who are at high cardiovascular risk (due to the presence of one or more risk factors such as hypertension, diabetes, hyperlipidaemia or already established disease) need early detection and management, and this is where a machine learning model can be of great help.

In [2]:
ws = Workspace.from_config()

# Choose a name for the experiment
experiment_name = 'capstone'

experiment = Experiment(ws, experiment_name)

In [3]:
dataset = Dataset.get_by_name(ws, name='heart_failure_dataset')
df = dataset.to_pandas_dataframe()
df.head()

Unnamed: 0,age,anaemia,creatinine_phosphokinase,diabetes,ejection_fraction,high_blood_pressure,platelets,serum_creatinine,serum_sodium,sex,smoking,time,DEATH_EVENT
0,75.0,0,582,0,20,1,265000.0,1.9,130,1,0,4,1
1,55.0,0,7861,0,38,0,263358.03,1.1,136,1,0,6,1
2,65.0,0,146,0,20,0,162000.0,1.3,129,1,1,7,1
3,50.0,1,111,0,20,0,210000.0,1.9,137,1,0,7,1
4,65.0,1,160,1,20,0,327000.0,2.7,116,0,0,8,1


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 299 entries, 0 to 298
Data columns (total 13 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   age                       299 non-null    float64
 1   anaemia                   299 non-null    int64  
 2   creatinine_phosphokinase  299 non-null    int64  
 3   diabetes                  299 non-null    int64  
 4   ejection_fraction         299 non-null    int64  
 5   high_blood_pressure       299 non-null    int64  
 6   platelets                 299 non-null    float64
 7   serum_creatinine          299 non-null    float64
 8   serum_sodium              299 non-null    int64  
 9   sex                       299 non-null    int64  
 10  smoking                   299 non-null    int64  
 11  time                      299 non-null    int64  
 12  DEATH_EVENT               299 non-null    int64  
dtypes: float64(3), int64(10)
memory usage: 30.5 KB
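
Before choosing a metric, it can help to look at how the two values of `DEATH_EVENT` are distributed. The sketch below is a minimal, standard-library illustration of that check. The `class_proportions` helper and the `example_labels` list are hypothetical and are not taken from the real dataset.

```python
from collections import Counter


def class_proportions(labels):
    """Return {label: fraction of rows} for an iterable of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Hypothetical labels for illustration only; NOT the real DEATH_EVENT column.
example_labels = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
proportions = class_proportions(example_labels)
print(proportions)
```

In the notebook itself, the same check could be run on the real column, for example with `df['DEATH_EVENT'].value_counts(normalize=True)`.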


## Create or Attach an AmlCompute cluster
You will need to create a compute target for your AutoML run.

**Udacity Note** There is no need to create a new compute target; the notebook can reuse the previous cluster.

In [5]:
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
from azureml.core.compute_target import ComputeTargetException

In [6]:
# Create the CPU compute cluster
amlcompute_cluster_name = "capstone"

# Reuse the cluster if it already exists; otherwise create it
try:
    compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_DS12_V2',
                                                           vm_priority = 'lowpriority', 
                                                           max_nodes=5,
                                                           min_nodes=0,
                                                           idle_seconds_before_scaledown=600)
    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True)

Found existing cluster, use it.

Running


## AutoML Configuration

Here is an overview of the AutoML settings:

- **experiment_timeout_minutes**: set to 20 minutes. The experiment times out after this period to avoid incurring additional expense.

- **max_concurrent_iterations**: set to 4. The maximum number of iterations that can run concurrently on the nodes of the compute cluster.

- **primary_metric**: set to 'AUC_weighted' because the two classes of the target are not equally represented, and AUC is less sensitive to class imbalance than accuracy.

- **n_cross_validations**: set to 4, i.e. 4-fold cross-validation.
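
To make the primary metric more concrete, here is a minimal sketch of how a binary AUC can be computed from scores: it is the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The `binary_auc` helper and the toy labels and scores are hypothetical. Azure ML computes `AUC_weighted` internally, averaging per-class AUC values weighted by class size, so this is an illustration rather than its implementation.

```python
def binary_auc(y_true, y_score):
    """Pairwise (rank-based) AUC for binary labels 0/1; ties count as 0.5."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and predicted probabilities of class 1.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = binary_auc(y_true, y_score)
print(auc)  # 0.75: three of the four positive/negative pairs are ordered correctly
```

Because the metric depends only on how examples are ranked, it does not change with the classification threshold.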


Here is an overview of the AutoML configuration:

- **compute_target**: set to the CPU compute cluster defined above.

- **task**: set to 'classification' because the target, DEATH_EVENT, is a binary label indicating whether the patient died.

- **training_data**: the heart failure dataset registered in the workspace as 'heart_failure_dataset'.

- **label_column_name**: set to 'DEATH_EVENT', which is our target.

- **enable_early_stopping**: enabled to terminate the experiment if the score does not improve over time, thus avoiding unnecessary costs.

- **featurization**: set to 'auto' so Azure ML will automatically handle featurization and clean the dataset.

- **debug_log**: errors will be logged into 'automl_errors.log'.
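
The idea behind `enable_early_stopping` can be sketched as a generic patience rule: stop once none of the most recent scores beats the best score seen before them. This is only an assumption-laden illustration. The `should_stop_early` helper and the score histories are hypothetical, and the actual rule Azure ML applies is internal to the service. The default of 10 mirrors the "last 10 iterations" wording in the run summary further down.

```python
def should_stop_early(scores, patience=10):
    """Return True when none of the last `patience` scores beats the best earlier score.

    Assumes higher scores are better (as with AUC).
    """
    if len(scores) <= patience:
        return False  # not enough history to judge yet
    best_before = max(scores[:-patience])
    best_recent = max(scores[-patience:])
    return best_recent <= best_before


# Hypothetical score histories, one score per iteration.
flat_history = [0.80, 0.85, 0.90] + [0.89] * 10
improving_history = [0.80, 0.85] + [0.84] * 9 + [0.90]

stop_flat = should_stop_early(flat_history)
stop_improving = should_stop_early(improving_history)
print(stop_flat, stop_improving)  # True False
```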


In [7]:
# TODO: Put your automl settings here
automl_settings = {
    "experiment_timeout_minutes": 20,
    "max_concurrent_iterations": 4,
    "primary_metric" : 'AUC_weighted',
    "n_cross_validations": 4
}

# TODO: Put your automl config here
automl_config = AutoMLConfig(
    compute_target=compute_target,
    task="classification",
    training_data=dataset,
    label_column_name="DEATH_EVENT",
    enable_early_stopping=True,
    featurization="auto",
    debug_log="automl_errors.log",
    **automl_settings,
)
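
With `n_cross_validations` set to 4 and 299 rows in the dataset, each model is validated on roughly a quarter of the data in turn. The sketch below shows how such folds could be laid out. The `kfold_indices` helper is hypothetical and uses simple contiguous blocks; AutoML builds its own splits, which may be shuffled or stratified.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_indices, validation_indices) for contiguous k-fold splits."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds get one additional row each.
        size = base + (1 if fold < extra else 0)
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


folds = list(kfold_indices(299, 4))
fold_sizes = [len(validation) for _, validation in folds]
print(fold_sizes)  # [75, 75, 75, 74]
```

Each row appears in exactly one validation fold, so the reported metrics are averages over four held-out evaluations.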

In [8]:
# TODO: Submit your experiment
remote_run = experiment.submit(automl_config, show_output=True)

Submitting remote run.
No run_configuration provided, running on capstone with default configuration
Running on remote compute: capstone


Experiment,Id,Type,Status,Details Page,Docs Page
capstone,AutoML_3ad48a9e-b036-4cc8-aff4-bee9f8e1822c,automl,NotStarted,Link to Azure Machine Learning studio,Link to Documentation



Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

********************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

********************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFeaturization

**********************************************************************************

## Run Details

- Tree-based models (both random forests and gradient-boosted trees) performed quite well. Almost every model pipeline applies either standard scaling or min-max scaling to the features first.
- The voting ensemble performed much better than the stacking ensemble.
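
As a rough illustration of what a voting ensemble does, the sketch below averages the class probabilities predicted by several models (soft voting) and compares the result with a simple majority vote over their individual predictions (hard voting). The `soft_vote` helper and the probabilities are hypothetical. The best model found in this run is a `PreFittedSoftVotingClassifier`, whose internals are not reproduced here.

```python
def soft_vote(probabilities, weights=None):
    """Average per-model class probabilities; return (averaged, predicted_class)."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda c: averaged[c])
    return averaged, predicted


# Hypothetical [P(class 0), P(class 1)] outputs of three models for one patient.
model_probs = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]

averaged, predicted = soft_vote(model_probs)

# Hard voting: each model casts one vote for its most likely class.
hard_votes = [max(range(len(p)), key=lambda c: p[c]) for p in model_probs]
hard_vote = max(set(hard_votes), key=hard_votes.count)

print(predicted, hard_vote)  # 0 1
```

In this toy case one confident model outweighs two marginal ones under soft voting, while the majority vote goes the other way.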


TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [9]:
from azureml.widgets import RunDetails
RunDetails(remote_run).show()

remote_run.wait_for_completion(show_output=True)

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

Experiment,Id,Type,Status,Details Page,Docs Page
capstone,AutoML_3ad48a9e-b036-4cc8-aff4-bee9f8e1822c,automl,Completed,Link to Azure Machine Learning studio,Link to Documentation




********************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

********************************************************************************************

TYPE:         Missing feature values imputation
STATUS:       PASSED
DESCRIPTION:  No feature missing values were detected in the training data.
              Learn more about missing value imputation: https://aka.ms/AutomatedMLFeaturization

********************************************************************************************

TYPE:         High cardinality feature detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and no high cardinality features were detected.
              Learn more about high cardinality feat

{'runId': 'AutoML_3ad48a9e-b036-4cc8-aff4-bee9f8e1822c',
 'target': 'capstone',
 'status': 'Completed',
 'startTimeUtc': '2022-11-10T17:33:03.584251Z',
 'endTimeUtc': '2022-11-10T17:45:02.087473Z',
 'services': {},
   'message': 'No scores improved over last 10 iterations, so experiment stopped early. This early stopping behavior can be disabled by setting enable_early_stopping = False in AutoMLConfig for notebook/python SDK runs.'}],
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'AUC_weighted',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '4',
  'target': 'capstone',
  'DataPrepJsonString': '{\\"training_data\\": {\\"datasetId\\": \\"5c97a399-84ac-44c8-a88f-807f7c8a74cb\\"}, \\"datasets\\": 0}',
  'EnableSubsampling': None,
  'runTemplate': 'AutoML',
  'azureml.runsource': 'automl',
  'display_task_type': 'classification',
  'dependencies_versions': '{"azureml-widgets": "

## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [10]:
# Retrieve the best child run and its fitted model from the AutoML run
best_automl, best_fit_model = remote_run.get_output()
print(best_fit_model)

Package:azureml-automl-runtime, training version:1.46.1, current version:1.44.0
Package:azureml-core, training version:1.46.0, current version:1.44.0
Package:azureml-dataprep, training version:4.5.7, current version:4.2.2
Package:azureml-dataprep-rslex, training version:2.11.4, current version:2.8.1
Package:azureml-dataset-runtime, training version:1.46.0, current version:1.44.0
Package:azureml-defaults, training version:1.46.0, current version:1.44.0
Package:azureml-interpret, training version:1.46.0, current version:1.44.0
Package:azureml-mlflow, training version:1.46.0, current version:1.44.0
Package:azureml-pipeline-core, training version:1.46.0, current version:1.44.0
Package:azureml-responsibleai, training version:1.46.0, current version:1.44.0
Package:azureml-telemetry, training version:1.46.0, current version:1.44.0
Package:azureml-train-automl-client, training version:1.46.0, current version:1.44.0
Package:azureml-train-automl-runtime, training version:1.46.1, current version:

Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=False, enable_feature_sweeping=True, feature_sweeping_config={}, feature_sweeping_timeout=86400, featurization_config=None, force_text_dnn=False, is_cross_validation=True, is_onnx_compatible=False, observer=None, task='classification', working_dir='/mnt/batch/tasks/shared/LS_root/mount...
                                                                                                  n_estimators=100,
                                                                                                  n_jobs=1,
                                                                                                  oob_score=False,
                                                                                                  random_state=None,
                                                                                                  verbose=0,
                                           

In [11]:
print(best_automl)

Run(Experiment: capstone,
Id: AutoML_3ad48a9e-b036-4cc8-aff4-bee9f8e1822c_36,
Type: azureml.scriptrun,
Status: Completed)


In [12]:
best_automl.get_metrics()

{'average_precision_score_micro': 0.9225165791070125,
 'precision_score_micro': 0.8593693693693694,
 'precision_score_macro': 0.8562443342398549,
 'norm_macro_recall': 0.639266199117894,
 'AUC_weighted': 0.9220529379074577,
 'precision_score_weighted': 0.8655296079954534,
 'balanced_accuracy': 0.8196330995589469,
 'f1_score_macro': 0.8265824682063927,
 'weighted_accuracy': 0.8864189079512681,
 'log_loss': 0.3952271502469328,
 'average_precision_score_weighted': 0.9261316739799159,
 'recall_score_weighted': 0.8593693693693694,
 'recall_score_macro': 0.8196330995589469,
 'f1_score_micro': 0.8593693693693694,
 'average_precision_score_macro': 0.9029214314367839,
 'AUC_micro': 0.9208236182128073,
 'accuracy': 0.8593693693693694,
 'recall_score_micro': 0.8593693693693694,
 'matthews_correlation': 0.6736872328817212,
 'AUC_macro': 0.9220529379074577,
 'f1_score_weighted': 0.852619576037233,
 'confusion_matrix': 'aml://artifactId/ExperimentRun/dcid.AutoML_3ad48a9e-b036-4cc8-aff4-bee9f8e1822c_
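
Some of these metrics are related by simple arithmetic. For example, `norm_macro_recall` rescales `recall_score_macro` so that chance level maps to 0 and a perfect score to 1. The sketch below checks this against the values printed above, assuming the usual definition `(recall_score_macro - R) / (1 - R)` with `R = 1 / number_of_classes`. The variable names are chosen here for illustration.

```python
import math

# Values copied from the metrics output above.
recall_score_macro = 0.8196330995589469
norm_macro_recall_reported = 0.639266199117894

n_classes = 2
chance = 1.0 / n_classes  # expected macro recall of random predictions

norm_macro_recall = (recall_score_macro - chance) / (1.0 - chance)
print(norm_macro_recall)
print(math.isclose(norm_macro_recall, norm_macro_recall_reported, rel_tol=1e-9))
```

The same output also shows `balanced_accuracy` equal to `recall_score_macro`, since both are the unweighted mean of per-class recall.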

In [13]:
best_automl.properties

{'runTemplate': 'automl_child',
 'pipeline_id': '__AutoML_Ensemble__',
 'pipeline_spec': '{"pipeline_id":"__AutoML_Ensemble__","objects":[{"module":"azureml.train.automl.ensemble","class_name":"Ensemble","spec_class":"sklearn","param_args":[],"param_kwargs":{"automl_settings":"{\'task_type\':\'classification\',\'primary_metric\':\'AUC_weighted\',\'verbosity\':20,\'ensemble_iterations\':15,\'is_timeseries\':False,\'name\':\'capstone\',\'compute_target\':\'capstone\',\'subscription_id\':\'4999417e-f032-4d4c-982a-c229e26aa825\',\'region\':\'germanywestcentral\',\'spark_service\':None}","ensemble_run_id":"AutoML_3ad48a9e-b036-4cc8-aff4-bee9f8e1822c_36","experiment_name":"capstone","workspace_name":"udacity","subscription_id":"4999417e-f032-4d4c-982a-c229e26aa825","resource_group_name":"udacity-learning"}}]}',
 'training_percent': '100',
 'predicted_cost': None,
 'iteration': '36',
 '_aml_system_scenario_identification': 'Remote.Child',
 '_azureml.ComputeTargetType': 'amlctrain',
 'ContentS

In [14]:
best_fit_model.steps

[('datatransformer',
  DataTransformer(enable_dnn=False, enable_feature_sweeping=True, feature_sweeping_config={}, feature_sweeping_timeout=86400, featurization_config=None, force_text_dnn=False, is_cross_validation=True, is_onnx_compatible=False, task='classification')),
 ('prefittedsoftvotingclassifier',
  PreFittedSoftVotingClassifier(classification_labels=array([0, 1]),
                                estimators=[('16',
                                             Pipeline(memory=None,
                                                      steps=[('minmaxscaler',
                                                              MinMaxScaler(copy=True,
                                                                           feature_range=(0,
                                                                                          1))),
                                                             ('extratreesclassifier',
                                                              Extr

In [15]:
# Download the model, scoring script and conda environment of the best run
import joblib
import os

inference_folder = 'inference'
automl_model = os.path.join(inference_folder, 'model.pkl')
score_script = os.path.join(inference_folder, 'score.py')
conda_env = os.path.join(inference_folder, 'conda_env.yml')


best_automl.download_file('outputs/model.pkl', automl_model)
best_automl.download_file('outputs/scoring_file_v_2_0_0.py', score_script)
best_automl.download_file('outputs/conda_env_v_1_0_0.yml', conda_env)

## Model Deployment

Remember that you only have to deploy one of the two models you trained, but you still need to register both of them. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

In [16]:
# Register the Model
from azureml.core.model import Model
model = Model.register(
    workspace = ws,
    model_name = 'best_fit_automl_model', 
    model_path = automl_model,
    model_framework=Model.Framework.SCIKITLEARN,
    model_framework_version=sklearn.__version__,
    description='Auto ML model for predicting heart failure'
    )
print(model.name, model.id, model.version, sep='\t')

Registering model best_fit_automl_model
best_fit_automl_model	best_fit_automl_model:3	3


In [17]:
# Deploy the Model
from azureml.core.environment import Environment
from azureml.core.model import InferenceConfig
from azureml.core.webservice import LocalWebservice, Webservice, AciWebservice
from azureml.core.conda_dependencies import CondaDependencies
import azureml.train.automl

# Create the environment
env = best_automl.get_environment()

# Define inference configuration
inference_config = InferenceConfig(entry_script=score_script, environment=env)

# Define deployment configuration
deployment_config = AciWebservice.deploy_configuration(
    cpu_cores=1, 
    memory_gb=1, 
    description='Predicting deaths caused by heart failure',
    enable_app_insights=True)


# Deploy model as webservice using Azure Container Instance (ACI)
service_name = "heart-failure-service"

service = Model.deploy(
    workspace=ws,
    name=service_name, 
    models=[model], 
    inference_config=inference_config, 
    deployment_config=deployment_config, 
    overwrite=True)

service.wait_for_deployment(show_output=True)

print(service.state)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2022-11-10 17:48:42+00:00 Creating Container Registry if not exists.
2022-11-10 17:48:42+00:00 Registering the environment.
2022-11-10 17:48:43+00:00 Use the existing image.
2022-11-10 17:48:43+00:00 Submitting deployment to compute.
2022-11-10 17:48:50+00:00 Checking the status of deployment heart-failure-service..
2022-11-10 17:51:03+00:00 Checking the status of inference endpoint heart-failure-service.
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy


TODO: In the cell below, send a request to the web service you deployed to test it.

In [19]:
import urllib.request
import json
import os
import ssl

def allowSelfSignedHttps(allowed):
    # bypass the server certificate verification on client side
    if allowed and not os.environ.get('PYTHONHTTPSVERIFY', '') and getattr(ssl, '_create_unverified_context', None):
        ssl._create_default_https_context = ssl._create_unverified_context

allowSelfSignedHttps(True) # this line is needed if you use self-signed certificate in your scoring service.

# Request data goes here
# The example below assumes JSON formatting which may be updated
# depending on the format your endpoint expects.
# More information can be found here:
# https://docs.microsoft.com/azure/machine-learning/how-to-deploy-advanced-entry-script
data =  {
  "Inputs": {
    "data": [
      {
        "age": 75,
        "anaemia": 0,
        "creatinine_phosphokinase": 582,
        "diabetes": 0,
        "ejection_fraction": 20,
        "high_blood_pressure": 1,
        "platelets": 265000,
        "serum_creatinine": 1.9,
        "serum_sodium": 130,
        "sex": 1,
        "smoking": 0,
        "time": 4
      }
    ]
  },
  "GlobalParameters": {
    "method": "predict"
  }
}

body = str.encode(json.dumps(data))

url = 'http://89844934-605c-406c-ac50-7f0a03766bdd.germanywestcentral.azurecontainer.io/score'
api_key = '' # Replace this with the API key for the web service

# The service was deployed without authentication, so no Authorization header is sent.
# If key-based auth is enabled, set api_key above and uncomment the Authorization line.
headers = {
    'Content-Type':'application/json', 
    #'Authorization':('Bearer '+ api_key)
    }

req = urllib.request.Request(
    url, 
    body, 
    headers
    )

try:
    response = urllib.request.urlopen(req)

    result = response.read()
    print(result)
except urllib.error.HTTPError as error:
    print("The request failed with status code: " + str(error.code))

    # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
    print(error.info())
    print(error.read().decode("utf8", 'ignore'))

b'{"Results": [1]}'
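
The response arrives as raw bytes, so it usually needs to be decoded before the prediction can be used. The sketch below is one way to do that with the standard library. The `parse_scoring_response` helper and the `label_names` mapping are hypothetical, and the extra decoding step is a defensive assumption for scoring scripts that return a JSON string wrapped in another JSON string.

```python
import json


def parse_scoring_response(raw_bytes):
    """Decode the endpoint's byte response and return the list under 'Results'."""
    payload = json.loads(raw_bytes.decode("utf-8"))
    if isinstance(payload, str):
        # Assumption: some scoring scripts double-encode their JSON output.
        payload = json.loads(payload)
    return payload["Results"]


# Hypothetical human-readable names for the two DEATH_EVENT values.
label_names = {0: "no death event", 1: "death event"}

raw = b'{"Results": [1]}'  # the response shown above
predictions = parse_scoring_response(raw)
print([label_names[p] for p in predictions])  # ['death event']
```

Here the single input row is predicted as class 1, which matches the `DEATH_EVENT` value of the first row of the dataset that the request was built from.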


TODO: In the cell below, print the logs of the web service and delete the service

In [20]:
print(service.get_logs())

2022-11-10T17:50:54,808896200+00:00 - rsyslog/run 
2022-11-10T17:50:54,809564000+00:00 - iot-server/run 
2022-11-10T17:50:54,820146900+00:00 - gunicorn/run 
2022-11-10T17:50:54,822849900+00:00 | gunicorn/run | 
2022-11-10T17:50:54,826753800+00:00 | gunicorn/run | ###############################################
2022-11-10T17:50:54,830986300+00:00 | gunicorn/run | AzureML Container Runtime Information
2022-11-10T17:50:54,832406900+00:00 | gunicorn/run | ###############################################
2022-11-10T17:50:54,833673900+00:00 | gunicorn/run | 
2022-11-10T17:50:54,835067200+00:00 | gunicorn/run | 
2022-11-10T17:50:54,847924300+00:00 - nginx/run 
2022-11-10T17:50:54,852328700+00:00 | gunicorn/run | AzureML image information: openmpi3.1.2-ubuntu18.04, Materializaton Build:20220930.v4
2022-11-10T17:50:54,858368600+00:00 | gunicorn/run | 
2022-11-10T17:50:54,860855100+00:00 | gunicorn/run | 
2022-11-10T17:50:54,865793500+00:00 | gunicorn/run | PATH environment variable: /azureml-env

In [53]:
# Delete the service
service.delete()

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shut down all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.
