# Automated ML

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [1]:
import os
import joblib
import logging
import json
import requests

import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
from azureml.core.dataset import Dataset
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
from azureml.core.compute_target import ComputeTargetException
from azureml.widgets import RunDetails

from azureml.core.model import Model
from azureml.core.environment import Environment
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice

In [2]:
%conda remove xgboost
%pip install xgboost==0.90
import xgboost
print(xgboost.__version__)

Collecting package metadata (repodata.json): done
Solving environment: failed

PackagesNotFoundError: The following packages are missing from the target environment:
  - xgboost



Note: you may need to restart the kernel to use updated packages.
Collecting xgboost==0.90
  Downloading xgboost-0.90-py2.py3-none-manylinux1_x86_64.whl (142.8 MB)
Installing collected packages: xgboost
  Attempting uninstall: xgboost
    Found existing installation: xgboost 1.3.3
    Uninstalling xgboost-1.3.3:
      Successfully uninstalled xgboost-1.3.3
Successfully installed xgboost-0.90
Note: you may need to restart the kernel to use updated packages.
0.90
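
In the output above, `%conda remove xgboost` fails with `PackagesNotFoundError`, while pip then finds and uninstalls xgboost 1.3.3. That suggests the package was installed by pip, not conda. The following is a minimal sketch of checking the pinned version from installed package metadata. It assumes Python 3.8 or later (for `importlib.metadata`), and the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


PINNED = "0.90"  # version pinned by the %pip install line above

current = installed_version("xgboost")
if current is None:
    print("xgboost is not installed")
elif current != PINNED:
    print(f"xgboost {current} is installed, expected {PINNED}")
else:
    print(f"xgboost {current} matches the pin")
```

This reads the metadata on disk, so it reports the newly installed version even if an older copy of the module is still loaded in a kernel that has not been restarted.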


In [3]:
ws = Workspace.from_config()

print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')

Performing interactive authentication. Please follow the instructions on the terminal.
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code RE3SZH4F8 to authenticate.
You have logged in. Now let us find all the subscriptions to which you have access...
Interactive authentication successfully completed.
Workspace name: quick-starts-ws-141035
Azure region: southcentralus
Subscription id: 9a7511b8-150f-4a58-8528-3e7d50216c31
Resource group: aml-quickstarts-141035
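
`Workspace.from_config()` reads the workspace coordinates from a `config.json` file. The sketch below shows the shape of that file, filled with the values printed above. The temporary directory is an assumption made only so the example runs anywhere.

```python
import json
import os
import tempfile

# Keys expected in config.json; values copied from the output above
config = {
    "subscription_id": "9a7511b8-150f-4a58-8528-3e7d50216c31",
    "resource_group": "aml-quickstarts-141035",
    "workspace_name": "quick-starts-ws-141035",
}

# Write to a throwaway directory for illustration only
config_dir = tempfile.mkdtemp()
config_path = os.path.join(config_dir, "config.json")
with open(config_path, "w") as fh:
    json.dump(config, fh, indent=2)

# Read it back to confirm the file round-trips
with open(config_path) as fh:
    loaded = json.load(fh)
print(sorted(loaded))
```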


In [4]:
# Choose a name for the experiment
experiment_name = 'automl-experiment'
experiment = Experiment(ws, experiment_name)

In [5]:
experiment

| Name | Workspace | Report Page | Docs Page |
|---|---|---|---|
| automl-experiment | quick-starts-ws-141035 | Link to Azure Machine Learning studio | Link to Documentation |


## Create Compute Cluster

In [6]:
# Create compute cluster
cluster_name = "capstone-cluster"

# Verify that cluster does not exist already
try:
    compute_cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS3_V2', max_nodes=6)
    compute_cluster = ComputeTarget.create(ws, cluster_name, compute_config)

compute_cluster.wait_for_completion(show_output=True, min_node_count=1, timeout_in_minutes=10)

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


## Dataset

### Overview
The capstone project uses the Heart Failure Prediction dataset from Kaggle. It has 299 records, each with 12 features and a binary target column, `DEATH_EVENT`. This is a binary classification task: predict `DEATH_EVENT`, that is, whether the patient died during follow-up.  
The dataset is available at https://www.kaggle.com/andrewmvd/heart-failure-clinical-data

In [7]:
# Load the dataset from the workspace if it is registered; otherwise create it from the CSV file
# NOTE: update the key to match the dataset name
key = "Heart-Failure-Dataset"
description_text = "Dataset for heart failure prediction."

if key in ws.datasets:
    dataset = ws.datasets[key]
    print("Dataset is already registered in the workspace")
else:
    # Create the AML dataset and register it in the workspace
    data = 'https://raw.githubusercontent.com/TahreemArif/ML-Azure-Capstone-Project/master/heart_failure_clinical_records_dataset.csv'
    dataset = Dataset.Tabular.from_delimited_files(data)
    dataset = dataset.register(workspace=ws,
                               name=key,
                               description=description_text)

df = dataset.to_pandas_dataframe()

Dataset is already registered in the workspace


In [8]:
df.describe()

| | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 299.0 | 299.0 | 299.0 | 299.0 | 299.0 | 299.0 | 299.0 | 299.0 | 299.0 | 299.0 | 299.0 | 299.0 | 299.0 |
| mean | 60.833893 | 0.431438 | 581.839465 | 0.41806 | 38.083612 | 0.351171 | 263358.029264 | 1.39388 | 136.625418 | 0.648829 | 0.32107 | 130.26087 | 0.32107 |
| std | 11.894809 | 0.496107 | 970.287881 | 0.494067 | 11.834841 | 0.478136 | 97804.236869 | 1.03451 | 4.412477 | 0.478136 | 0.46767 | 77.614208 | 0.46767 |
| min | 40.0 | 0.0 | 23.0 | 0.0 | 14.0 | 0.0 | 25100.0 | 0.5 | 113.0 | 0.0 | 0.0 | 4.0 | 0.0 |
| 25% | 51.0 | 0.0 | 116.5 | 0.0 | 30.0 | 0.0 | 212500.0 | 0.9 | 134.0 | 0.0 | 0.0 | 73.0 | 0.0 |
| 50% | 60.0 | 0.0 | 250.0 | 0.0 | 38.0 | 0.0 | 262000.0 | 1.1 | 137.0 | 1.0 | 0.0 | 115.0 | 0.0 |
| 75% | 70.0 | 1.0 | 582.0 | 1.0 | 45.0 | 1.0 | 303500.0 | 1.4 | 140.0 | 1.0 | 1.0 | 203.0 | 1.0 |
| max | 95.0 | 1.0 | 7861.0 | 1.0 | 80.0 | 1.0 | 850000.0 | 9.4 | 148.0 | 1.0 | 1.0 | 285.0 | 1.0 |
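
The `mean` row gives a quick class-balance check, because the mean of the binary `DEATH_EVENT` column is the fraction of positive records. The sketch below works through that arithmetic using only numbers from the table above. It computes the accuracy of a trivial model that always predicts the majority class, which is a useful reference point for any accuracy figure on this dataset.

```python
n_records = 299
death_rate = 0.32107  # mean of DEATH_EVENT from df.describe()

# Recover the class counts from the rounded mean
n_deaths = round(n_records * death_rate)
n_survived = n_records - n_deaths

# Accuracy of always predicting the more frequent class
baseline_accuracy = max(n_deaths, n_survived) / n_records

print(n_deaths, n_survived)
print(f"{baseline_accuracy:.3f}")
```

With 96 deaths and 203 survivors, always predicting 0 is already right about 68% of the time, so a model's accuracy should be judged against that floor.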


In [9]:
df.head()

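| | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75.0 | 0 | 582 | 0 | 20 | 1 | 265000.0 | 1.9 | 130 | 1 | 0 | 4 | 1 |
| 1 | 55.0 | 0 | 7861 | 0 | 38 | 0 | 263358.03 | 1.1 | 136 | 1 | 0 | 6 | 1 |
| 2 | 65.0 | 0 | 146 | 0 | 20 | 0 | 162000.0 | 1.3 | 129 | 1 | 1 | 7 | 1 |
| 3 | 50.0 | 1 | 111 | 0 | 20 | 0 | 210000.0 | 1.9 | 137 | 1 | 0 | 7 | 1 |
| 4 | 65.0 | 1 | 160 | 1 | 20 | 0 | 327000.0 | 2.7 | 116 | 0 | 0 | 8 | 1 |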


## AutoML Configuration

TODO: Explain why you chose the automl settings and configuration you used below.

In [10]:
# AutoML settings
automl_settings = {
    "compute_target": compute_cluster,    # run on the remote compute cluster
    "task": "classification",
    "training_data": dataset,
    "label_column_name": "DEATH_EVENT",   # column to predict
    "enable_early_stopping": True,        # stop early if the score is not improving
    "featurization": "auto",
    "n_cross_validations": 5,             # 5-fold cross-validation
    "debug_log": "automl_errors.log",
    "experiment_timeout_hours": 1.0,      # overall time budget for the experiment
    "max_concurrent_iterations": 5,       # iterations run in parallel
    "primary_metric": "accuracy",
}

# AutoML config built from the settings above
automl_config = AutoMLConfig(**automl_settings)
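
With `n_cross_validations` set to 5 and 299 records, each validation fold holds about 60 records. The sketch below shows how k-fold partitioning divides the row indices. It covers plain, unshuffled k-fold only; how AutoML actually assigns rows to folds is not shown in this notebook, and the helper `kfold_indices` is hypothetical.

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, valid_idx) lists for plain, unshuffled k-fold."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds take one additional record each
        size = base + (1 if fold < extra else 0)
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, valid
        start += size


fold_sizes = [len(valid) for _, valid in kfold_indices(299, 5)]
print(fold_sizes)
```

Each record lands in exactly one validation fold, so every model is scored on all 299 records across the five folds.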

In [11]:
# TODO: Submit your experiment
remote_run = experiment.submit(automl_config)

Running on remote.


## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [12]:
RunDetails(remote_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

In [13]:
remote_run.wait_for_completion()

{'runId': 'AutoML_010da3bb-80a6-4657-9973-7eb391ad867f',
 'target': 'capstone-cluster',
 'status': 'Completed',
 'startTimeUtc': '2021-03-21T11:48:11.594132Z',
 'endTimeUtc': '2021-03-21T12:07:56.178697Z',
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '5',
  'target': 'capstone-cluster',
  'DataPrepJsonString': '{\\"training_data\\": \\"{\\\\\\"blocks\\\\\\": [{\\\\\\"id\\\\\\": \\\\\\"8c0c8719-1de1-4ad9-8154-c956c3a65d67\\\\\\", \\\\\\"type\\\\\\": \\\\\\"Microsoft.DPrep.GetDatastoreFilesBlock\\\\\\", \\\\\\"arguments\\\\\\": {\\\\\\"datastores\\\\\\": [{\\\\\\"datastoreName\\\\\\": \\\\\\"workspaceblobstore\\\\\\", \\\\\\"path\\\\\\": \\\\\\"UI/03-21-2021_114056_UTC/heart_failure_clinical_records_dataset.csv\\\\\\", \\\\\\"resourceGroup\\\\\\": \\\\\\"aml-quickstarts-141035\\\\\\", \\\\\\"subscription\\\\\\": \\\\\
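
The `startTimeUtc` and `endTimeUtc` fields above give the wall-clock duration of the run. The sketch below computes it with the standard library, using the two timestamps copied from the output.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S.%fZ"  # timestamp layout used in the run details
start = datetime.strptime("2021-03-21T11:48:11.594132Z", FMT)
end = datetime.strptime("2021-03-21T12:07:56.178697Z", FMT)

elapsed = end - start
minutes, seconds = divmod(elapsed.total_seconds(), 60)
print(f"{int(minutes)} min {seconds:.1f} s")
```

The run took just under 20 minutes, well inside the one-hour limit set by `experiment_timeout_hours`.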

## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [14]:
automl_run, automl_model = remote_run.get_output()

In [15]:
best_model = "Best AutoML Run Model : {}".format(automl_model)
best_algorithm = "Best AutoML Run Algorithm : {} ".format(automl_run.properties["run_algorithm"])
accuracy = "Best AutoML Run Accuracy : {} ".format(automl_run.properties["score"])

In [16]:
print( best_algorithm, best_model, accuracy, sep='\n\n')

Best AutoML Run Algorithm : VotingEnsemble 

Best AutoML Run Model : Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                                 feature_sweeping_config=None,
                                 feature_sweeping_timeout=None,
                                 featurization_config=None, force_text_dnn=None,
                                 is_cross_validation=None,
                                 is_onnx_compatible=None, logger=None,
                                 observer=None, task=None, working_dir=None)),
                ('prefittedsoftvotingclassifier',...
                                                                                        coef0=0.0,
                                                                                        decision_function_shape='ovr',
                                                                                        degree=3,
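
The step name `prefittedsoftvotingclassifier` indicates a soft-voting ensemble. In soft voting, each member's predicted class probabilities are combined by a weighted average, and the class with the highest averaged probability wins. The sketch below illustrates that rule. The probabilities and weights are made up for illustration and are not taken from this run, and the helper `soft_vote` is hypothetical.

```python
def soft_vote(probabilities, weights):
    """Weighted average of per-model class probabilities.

    probabilities: one list of class probabilities per model.
    weights: one non-negative weight per model.
    Returns (averaged_probabilities, index_of_winning_class).
    """
    total = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    winner = max(range(n_classes), key=averaged.__getitem__)
    return averaged, winner


# Hypothetical outputs of three models for one patient: [P(0), P(1)]
model_probs = [[0.40, 0.60], [0.70, 0.30], [0.20, 0.80]]
model_weights = [0.5, 0.2, 0.3]

averaged, predicted = soft_vote(model_probs, model_weights)
print([round(p, 2) for p in averaged], predicted)
```

Here the second model votes for class 0, but the weighted average still favors class 1 because the other two models carry more weight.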
       

In [17]:
#TODO: Save the best model

joblib.dump(value=automl_model, filename='automl_model.joblib')

['automl_model.joblib']

## Model Deployment

Remember that you only have to deploy one of the two models you trained. Perform the remaining steps in this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

In [18]:
registered_model = remote_run.register_model(model_name='automl_model')

In [19]:
env = Environment.get(ws, "AzureML-AutoML")

inference_config = InferenceConfig(entry_script='automl_score.py',
                                   environment=env)
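
The entry script `automl_score.py` referenced above is not included in this notebook. Azure ML entry scripts conventionally define an `init()` function that loads the model once and a `run()` function that handles each request. The sketch below shows that shape only. The `StandInModel` class and its threshold rule are hypothetical placeholders so the example runs without Azure; a real script would load the registered model instead.

```python
import json

model = None


class StandInModel:
    """Hypothetical placeholder; this is not the AutoML model."""

    def predict(self, rows):
        # Arbitrary illustrative rule, not a clinical one
        return [1 if row["ejection_fraction"] < 30 else 0 for row in rows]


def init():
    """Called once when the service starts; load the model here."""
    global model
    model = StandInModel()


def run(raw_data):
    """Called per request with the JSON request body as a string."""
    try:
        rows = json.loads(raw_data)["data"]
        return model.predict(rows)
    except Exception as exc:
        # Return the error so the caller can see what went wrong
        return {"error": str(exc)}


init()
payload = json.dumps({"data": [{"ejection_fraction": 20}, {"ejection_fraction": 45}]})
print(run(payload))
```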

In [20]:
deployment_config = AciWebservice.deploy_configuration(cpu_cores = 1, 
                                                       memory_gb = 1,
                                                       enable_app_insights=True,
                                                       description="Heart Failure Prediction Webservice")

In [21]:
service = Model.deploy(workspace = ws,
                       name = "aciservice", 
                       models = [registered_model], 
                       inference_config = inference_config, 
                       deployment_config = deployment_config)

In [22]:
service.wait_for_deployment(show_output = True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2021-03-21 12:08:29+00:00 Creating Container Registry if not exists.
2021-03-21 12:08:29+00:00 Registering the environment.
2021-03-21 12:08:30+00:00 Use the existing image.
2021-03-21 12:08:30+00:00 Generating deployment configuration.
2021-03-21 12:08:30+00:00 Submitting deployment to compute.
2021-03-21 12:08:34+00:00 Checking the status of deployment aciservice..
2021-03-21 12:11:24+00:00 Checking the status of inference endpoint aciservice.
Succeeded
ACI service creation operation finished, operation "Succeeded"


In [23]:
service.state

'Healthy'

In [24]:
scoring_uri = service.scoring_uri
scoring_uri

'http://5e680752-424a-4ff2-89ea-c6608eb732fd.southcentralus.azurecontainer.io/score'

TODO: In the cell below, send a request to the web service you deployed to test it.

In [25]:

data = {
    "data": [
        {
            'age': 50,
            'anaemia': 0,
            'creatinine_phosphokinase': 90,
            'diabetes': 1,
            'ejection_fraction': 20,
            'high_blood_pressure': 1,
            'platelets': 230000,
            'serum_creatinine': 1.6,
            'serum_sodium': 120,
            'sex': 0,
            'smoking': 1,
            'time': 7
        },
        {
            'age': 70,
            'anaemia': 1,
            'creatinine_phosphokinase': 110,
            'diabetes': 0,
            'ejection_fraction': 25,
            'high_blood_pressure': 0,
            'platelets': 210000,
            'serum_creatinine': 1.8,
            'serum_sodium': 142,
            'sex': 1,
            'smoking': 0,
            'time': 8
        }
    ]
}
# Convert to JSON string
request_data = json.dumps(data)

# Set the content type
headers = {'Content-Type': 'application/json'}

# Make the request and display the response
response = requests.post(scoring_uri, data=request_data, headers=headers)

In [26]:
response.json()

[1, 1]
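
Each record in the request above carries the 12 feature columns of the dataset, that is, every column except `DEATH_EVENT`. The sketch below is a client-side check that could run before posting. It assumes the service expects exactly these columns, and the helper `missing_features` is hypothetical.

```python
# Feature columns of the dataset, in the order shown by df.head()
FEATURES = [
    "age", "anaemia", "creatinine_phosphokinase", "diabetes",
    "ejection_fraction", "high_blood_pressure", "platelets",
    "serum_creatinine", "serum_sodium", "sex", "smoking", "time",
]


def missing_features(record):
    """Return the feature names that are absent from one request record."""
    return [name for name in FEATURES if name not in record]


complete = dict.fromkeys(FEATURES, 0)
incomplete = {k: v for k, v in complete.items() if k != "time"}

print(missing_features(complete))
print(missing_features(incomplete))
```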

TODO: In the cell below, print the logs of the web service and delete the service

In [27]:
service.update(enable_app_insights=True)

In [28]:
print(service.get_logs())



In [29]:
service.delete()

In [30]:
compute_cluster.delete()