# Automated ML

Import the dependencies needed to complete the project.

In [1]:
from azureml.core import Workspace, Experiment, Dataset
from azureml.train.automl import AutoMLConfig
from azureml.widgets import RunDetails
import joblib

## Dataset

The dataset is the Heart Failure Prediction dataset available on Kaggle.
Each entry contains clinical information (features) about one patient. The task is binary classification: predict whether the patient died during the follow-up period, as recorded in the DEATH_EVENT column.
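To make the feature/label split concrete, here is a minimal sketch using only Python's standard `csv` module. The two sample rows are copied from the dataset preview later in this notebook (without the pandas index column). The inline CSV string and the `split_features_label` helper are illustrative and not part of the project code, which leaves this split to AutoML through `label_column_name`.

```python
import csv
import io

# Two rows copied from the dataset preview (pandas index column omitted).
SAMPLE_CSV = """age,anaemia,creatinine_phosphokinase,diabetes,ejection_fraction,high_blood_pressure,platelets,serum_creatinine,serum_sodium,sex,smoking,time,DEATH_EVENT
75.0,0,582,0,20,1,265000.00,1.9,130,1,0,4,1
62.0,0,61,1,38,1,155000.00,1.1,143,1,1,270,0
"""

LABEL = "DEATH_EVENT"


def split_features_label(csv_text, label=LABEL):
    """Hypothetical helper: return (feature dicts, integer labels) parsed from CSV text."""
    features, labels = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        labels.append(int(row.pop(label)))                      # binary target: 1 = death event
        features.append({k: float(v) for k, v in row.items()})  # every remaining column is a feature
    return features, labels


X, y = split_features_label(SAMPLE_CSV)
print(len(X[0]), y)  # 12 [1, 0]
```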

In [2]:
ws = Workspace.from_config()
experiment_name = 'AutoML_experiment'
experiment = Experiment(ws, experiment_name)



In [3]:
from azureml.core.compute import ComputeTarget, AmlCompute

# Create a CPU compute cluster (Standard_D2_V2 VMs, at most 4 nodes),
# or reuse it if it already exists.

from azureml.core.compute_target import ComputeTargetException

# Name of the CPU cluster
cpu_cluster_name = "my-cluster"

try:
    # Reuse the cluster if it already exists in the workspace
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    # Otherwise provision a new one
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                           max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)

Creating...
Succeeded
Provisioning operation finished, operation "Succeeded"
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


In [4]:
dataset = Dataset.get_by_name(ws, name='Heart-failure-prediction') 
dataset.to_pandas_dataframe() 

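| | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75.0 | 0 | 582 | 0 | 20 | 1 | 265000.00 | 1.9 | 130 | 1 | 0 | 4 | 1 |
| 1 | 55.0 | 0 | 7861 | 0 | 38 | 0 | 263358.03 | 1.1 | 136 | 1 | 0 | 6 | 1 |
| 2 | 65.0 | 0 | 146 | 0 | 20 | 0 | 162000.00 | 1.3 | 129 | 1 | 1 | 7 | 1 |
| 3 | 50.0 | 1 | 111 | 0 | 20 | 0 | 210000.00 | 1.9 | 137 | 1 | 0 | 7 | 1 |
| 4 | 65.0 | 1 | 160 | 1 | 20 | 0 | 327000.00 | 2.7 | 116 | 0 | 0 | 8 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 294 | 62.0 | 0 | 61 | 1 | 38 | 1 | 155000.00 | 1.1 | 143 | 1 | 1 | 270 | 0 |
| 295 | 55.0 | 0 | 1820 | 0 | 38 | 0 | 270000.00 | 1.2 | 139 | 0 | 0 | 271 | 0 |
| 296 | 45.0 | 0 | 2060 | 1 | 60 | 0 | 742000.00 | 0.8 | 138 | 0 | 0 | 278 | 0 |
| 297 | 45.0 | 0 | 2413 | 0 | 38 | 0 | 140000.00 | 1.4 | 140 | 1 | 1 | 280 | 0 |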


## AutoML Configuration

### AutoMLSettings
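Because time and compute are limited, the AutoML run is capped with an experiment timeout and a maximum number of concurrent iterations. The primary metric is AUC_weighted. Unlike post-thresholded metrics such as accuracy, AUC does not depend on a decision threshold, which tends to make it the better choice for small datasets such as this one.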
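Because AUC is computed from how the predicted scores rank positives above negatives, it needs no decision threshold, whereas a post-thresholded metric such as accuracy changes with the cut-off. The stdlib sketch below illustrates both points on made-up scores. It is not Azure's implementation: `weighted_auc_binary` assumes AUC_weighted means the per-class one-vs-rest AUC averaged with class-support weights, and for two classes that average reduces to the ordinary AUC.

```python
def roc_auc(y_true, scores):
    """Binary ROC AUC: the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative counts as half a correct pair
    (the pairwise form of the Mann-Whitney U statistic).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def weighted_auc_binary(y_true, p_positive):
    """Assumed AUC_weighted for two classes: one-vs-rest AUCs weighted by class support."""
    auc_1 = roc_auc(y_true, p_positive)
    auc_0 = roc_auc([1 - t for t in y_true], [1 - p for p in p_positive])
    n_1 = sum(y_true)
    n_0 = len(y_true) - n_1
    return (n_1 * auc_1 + n_0 * auc_0) / (n_0 + n_1)


def accuracy_at(y_true, p_positive, threshold):
    """Post-thresholded metric: accuracy after turning probabilities into 0/1 labels."""
    hits = sum((p >= threshold) == (t == 1) for t, p in zip(y_true, p_positive))
    return hits / len(y_true)


y = [1, 1, 0, 0, 0]
p = [0.9, 0.4, 0.35, 0.8, 0.1]
print(roc_auc(y, p))                                     # 0.8333333333333334 (5 of 6 pairs correct)
print(weighted_auc_binary(y, p))                         # same value, up to float rounding
print(accuracy_at(y, p, 0.5), accuracy_at(y, p, 0.85))  # 0.6 0.8: depends on the threshold
```

Any order-preserving transformation of the scores (for example squaring these non-negative probabilities) leaves the AUC unchanged, while accuracy shifts as soon as the threshold moves.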

### AutoMLConfig
Because this is a classification task, AutoMLConfig needs the task type, the training dataset, and the label column (DEATH_EVENT). Early stopping is also enabled to save time.
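The exact criterion AutoML uses for `enable_early_stopping` is not shown in this notebook, so the sketch below only illustrates the general idea with a common patience rule: stop once the best primary-metric score has not improved for a fixed number of iterations. The `patience` and `min_delta` parameters and the score history are hypothetical.

```python
def stop_iteration(scores, patience=3, min_delta=0.0):
    """Return the 0-based iteration at which a patience rule would stop, or None.

    scores: primary-metric values, one per iteration (higher is better).
    patience: hypothetical number of consecutive non-improving iterations tolerated.
    min_delta: hypothetical minimum gain that counts as an improvement.
    """
    best = float("-inf")
    since_best = 0
    for i, score in enumerate(scores):
        if score > best + min_delta:
            best = score          # new best score: reset the counter
            since_best = 0
        else:
            since_best += 1       # no improvement this iteration
            if since_best >= patience:
                return i
    return None                   # the rule never fired


history = [0.80, 0.86, 0.88, 0.87, 0.88, 0.85, 0.90]
print(stop_iteration(history, patience=3))  # 5
```

With `patience=3` the search stops at iteration 5, so the 0.90 at the last iteration is never seen; trading some search quality for time is the point of early stopping.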

In [5]:
automl_settings = {
    "experiment_timeout_minutes": 20, 
    "max_concurrent_iterations": 5, 
    "primary_metric" : 'AUC_weighted' 
}

automl_config = AutoMLConfig(task="classification", 
                             training_data=dataset,
                             compute_target=cpu_cluster, 
                             label_column_name="DEATH_EVENT",    
                             path=".", 
                             enable_early_stopping=True, 
                             featurization='auto', 
                             debug_log="automl_errors.log", 
                             **automl_settings)

In [6]:
# Submit experiment
remote_run = experiment.submit(automl_config)

Submitting remote run.


| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| AutoML_experiment | AutoML_27cfb90e-112a-4c73-81d0-d49a4d9e1b6e | automl | NotStarted | Link to Azure Machine Learning studio | Link to Documentation |


## Run Details


In [7]:
RunDetails(remote_run).show() 


In [8]:
remote_run.wait_for_completion() 

{'runId': 'AutoML_27cfb90e-112a-4c73-81d0-d49a4d9e1b6e',
 'target': 'my-cluster',
 'status': 'Completed',
 'startTimeUtc': '2021-04-17T07:44:44.803668Z',
 'endTimeUtc': '2021-04-17T08:06:15.703661Z',
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'AUC_weighted',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': None,
  'target': 'my-cluster',
  'DataPrepJsonString': '{\\"training_data\\": {\\"datasetId\\": \\"29e3124d-f97e-470a-be95-d2f4f1b33076\\"}, \\"datasets\\": 0}',
  'EnableSubsampling': None,
  'runTemplate': 'AutoML',
  'azureml.runsource': 'automl',
  'display_task_type': 'classification',
  'dependencies_versions': '{"azureml-widgets": "1.26.0", "azureml-train": "1.26.0", "azureml-train-restclients-hyperdrive": "1.26.0", "azureml-train-core": "1.26.0", "azureml-train-automl": "1.26.0", "azureml-train-automl-runtime": "1.26.0", "azureml-train-automl-client": "1.26.0", "

## Best Model

In the cell below, retrieve the best model from the AutoML run and display its properties.



In [9]:
best_run, fitted_model = remote_run.get_output()
print(best_run)
print(fitted_model)

Run(Experiment: AutoML_experiment,
Id: AutoML_27cfb90e-112a-4c73-81d0-d49a4d9e1b6e_38,
Type: azureml.scriptrun,
Status: Completed)
Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                                 feature_sweeping_config=None,
                                 feature_sweeping_timeout=None,
                                 featurization_config=None, force_text_dnn=None,
                                 is_cross_validation=None,
                                 is_onnx_compatible=None, logger=None,
                                 observer=None, task=None, working_dir=None)),
                ('prefittedsoftvotingclassifier',...
                                                                                                reg_lambda=0.3157894736842105,
                                                                                                silent=True,
                         

In [10]:
# Save the best model to a local file so it can be registered
joblib.dump(fitted_model, "best_model_auto_ml.model") 

['best_model_auto_ml.model']
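The best run's pipeline printed earlier ends in a `prefittedsoftvotingclassifier`, i.e. a soft-voting ensemble. Soft voting takes a weighted average of the class probabilities predicted by each member model and picks the class with the highest average. The stdlib sketch below uses made-up probabilities and equal weights; the actual members and weights of this ensemble are in the truncated output and are not reproduced here.

```python
def soft_vote(member_probas, weights):
    """Weighted soft voting for one sample.

    member_probas: one probability list per member model, e.g. [p(class 0), p(class 1)].
    weights: one non-negative weight per member model.
    Returns (predicted class index, averaged probabilities).
    """
    if len(member_probas) != len(weights):
        raise ValueError("need exactly one weight per member model")
    total = sum(weights)
    n_classes = len(member_probas[0])
    avg = [
        sum(w * probs[c] for w, probs in zip(weights, member_probas)) / total
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda c: avg[c])
    return predicted, avg


# Three hypothetical members: two lean weakly towards class 1, one strongly towards class 0.
probas = [[0.95, 0.05], [0.45, 0.55], [0.45, 0.55]]
label, avg = soft_vote(probas, [1.0, 1.0, 1.0])
print(label, [round(v, 3) for v in avg])  # 0 [0.617, 0.383]
```

Hard (majority) voting would pick class 1 here, since two of the three members lean that way; soft voting lets the confident first member outweigh them.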

## Model Deployment

AutoML produced a better model than the HyperDrive run, so this is the model we deploy.

In the cell below, register the model, create an inference config and deploy the model as a web service.

In [11]:
from azureml.core import Model
from azureml.core.resource_configuration import ResourceConfiguration

model = Model.register(workspace=ws,
                       model_name='my-autoML-model',                # Name of the registered model in your workspace.
                       model_path='./best_model_auto_ml.model',     # Local file to upload and register as a model.
                       resource_configuration=ResourceConfiguration(cpu=1, memory_in_gb=0.5),
                       description='VotingEnsemble')                # The best model is a voting ensemble.

print('Name:', model.name)
print('Version:', model.version)

Registering model my-autoML-model
Name: my-autoML-model
Version: 1


In [12]:
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice
from azureml.core import Environment

environment = Environment.get(workspace=ws, name="AzureML-AutoML")
service_name = 'heart-failure-prediction'
inference_config = InferenceConfig(entry_script='score.py', environment=environment)
aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
service = Model.deploy(workspace=ws,
                      name=service_name,
                      models=[model],
                      inference_config=inference_config,
                      deployment_config=aci_config,
                      overwrite=True)

service.wait_for_deployment(show_output=True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2021-04-17 08:10:12+00:00 Creating Container Registry if not exists..
2021-04-17 08:10:28+00:00 Registering the environment..
2021-04-17 08:10:29+00:00 Use the existing image.
2021-04-17 08:10:29+00:00 Generating deployment configuration.
2021-04-17 08:10:30+00:00 Submitting deployment to compute..
2021-04-17 08:10:34+00:00 Checking the status of deployment heart-failure-prediction..
2021-04-17 08:14:57+00:00 Checking the status of inference endpoint heart-failure-prediction.
Succeeded
ACI service creation operation finished, operation "Succeeded"
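The deployment above passes `entry_script='score.py'` to `InferenceConfig`, but the script itself is not shown in this notebook. Azure ML entry scripts define an `init()` function, called once when the container starts, and a `run(raw_data)` function, called for each request. The sketch below is one plausible version, assuming the registered file is `best_model_auto_ml.model` inside the directory named by the `AZUREML_MODEL_DIR` environment variable and that requests have the `{"data": [...]}` shape used in the test request. The `FEATURES` list and the `records_to_columns` helper are illustrative, not taken from the project.

```python
# score.py: a sketch of an Azure ML entry script (assumed, not the project's actual file)
import json
import os

# Hypothetical: the 12 feature columns the model was trained on, in dataset order.
FEATURES = [
    "age", "anaemia", "creatinine_phosphokinase", "diabetes", "ejection_fraction",
    "high_blood_pressure", "platelets", "serum_creatinine", "serum_sodium",
    "sex", "smoking", "time",
]

model = None


def init():
    """Called once at container start-up: load the registered model from disk."""
    global model
    import joblib  # imported lazily so the parsing helper works without joblib installed
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "best_model_auto_ml.model")
    model = joblib.load(model_path)


def records_to_columns(raw_data):
    """Parse {"data": [record, ...]} into {column: [values]} in FEATURES order.

    Raises KeyError naming the first record that lacks a feature.
    """
    records = json.loads(raw_data)["data"]
    for i, record in enumerate(records):
        missing = [name for name in FEATURES if name not in record]
        if missing:
            raise KeyError(f"record {i} is missing {missing}")
    return {name: [record[name] for record in records] for name in FEATURES}


def run(raw_data):
    """Called per request: return one 0/1 prediction per input record."""
    import pandas as pd  # the AutoML pipeline expects a DataFrame
    frame = pd.DataFrame(records_to_columns(raw_data))
    return model.predict(frame).tolist()  # a list is serialized as a JSON array
```

Returning a plain list is what lets the client cell index the response as `resp.json()[0]`.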


In the cell below, send a request to the web service you deployed to test it.

In [13]:
import requests
import json

# URL for the web service, should be similar to:
# 'http://8530a665-66f3-49c8-a953-b82a2d312917.eastus.azurecontainer.io/score'
scoring_uri = service.scoring_uri

# If the service is authenticated, set the key or token
key = ''

# Two sets of data to score, so we get two results back
data = {"data":
        [
          {
            "age": 53,
            "anaemia": 0,
            "creatinine_phosphokinase": 63,
            "diabetes": 1,
            "ejection_fraction": 60,
            "high_blood_pressure": 0,
            "platelets": 368000,
            "serum_creatinine": 0.8,
            "serum_sodium": 137,
            "sex": 0,
            "smoking": 0,
            "time": 16,
          },
          {
            "age": 35,
            "anaemia": 0,
            "creatinine_phosphokinase": 500,
            "diabetes": 0,
            "ejection_fraction": 30,
            "high_blood_pressure": 0,
            "platelets": 280000,
            "serum_creatinine": 2.1,
            "serum_sodium": 150,
            "sex": 0,
            "smoking": 0,
            "time": 10,
          },
      ]
    }
# Convert to JSON string
input_data = json.dumps(data)
with open("data.json", "w") as _f:
    _f.write(input_data)

# Set the content type
headers = {'Content-Type': 'application/json'}
# If authentication is enabled, set the authorization header
# headers['Authorization'] = f'Bearer {key}'

# Make the request and display the response
resp = requests.post(scoring_uri, input_data, headers=headers)
resp.raise_for_status()  # stop on an HTTP error instead of failing while parsing the body
predictions = resp.json()
print("Request: " + str(data['data'][0]))
print("Response: Death_event = %s" % ("True" if predictions[0] == 1 else "False"))
print()
print("Request: " + str(data['data'][1]))
print("Response: Death_event = %s" % ("True" if predictions[1] == 1 else "False"))

Request: {'age': 53, 'anaemia': 0, 'creatinine_phosphokinase': 63, 'diabetes': 1, 'ejection_fraction': 60, 'high_blood_pressure': 0, 'platelets': 368000, 'serum_creatinine': 0.8, 'serum_sodium': 137, 'sex': 0, 'smoking': 0, 'time': 16}
Response: Death_event = False

Request: {'age': 35, 'anaemia': 0, 'creatinine_phosphokinase': 500, 'diabetes': 0, 'ejection_fraction': 30, 'high_blood_pressure': 0, 'platelets': 280000, 'serum_creatinine': 2.1, 'serum_sodium': 150, 'sex': 0, 'smoking': 0, 'time': 10}
Response: Death_event = True


In the cell below, print the logs of the web service, then delete the service and the compute cluster.

In [14]:
print(service.get_logs(num_lines=20))

service.delete()

cpu_cluster.delete()

2021-04-17 08:14:55,445 | root | INFO | Starting up app insights client
2021-04-17 08:14:55,449 | root | INFO | Starting up request id generator
2021-04-17 08:14:55,450 | root | INFO | Starting up app insight hooks
2021-04-17 08:14:55,450 | root | INFO | Invoking user's init function
Failure while loading azureml_run_type_providers. Failed to load entrypoint automl = azureml.train.automl.run:AutoMLRun._from_run_dto with exception cannot import name 'RunType'.
2021-04-17 08:15:01,005 | root | INFO | Users's init has completed successfully
2021-04-17 08:15:01,014 | root | INFO | Skipping middleware: dbg_model_info as it's not enabled.
2021-04-17 08:15:01,015 | root | INFO | Skipping middleware: dbg_resource_usage as it's not enabled.
2021-04-17 08:15:01,018 | root | INFO | Scoring timeout is found from os.environ: 60000 ms
2021-04-17 08:15:04,823 | root | INFO | Swagger file not present
2021-04-17 08:15:04,823 | root | INFO | 404
127.0.0.1 - - [17/Apr/2021:08:15:04 +0000] "GET /swagger.j