# Train and hyperparameter tune on Heart Failure Dataset

Importing dependencies

In [1]:
import azureml.core
from azureml.core import Workspace, Experiment, ScriptRunConfig, Environment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.widgets import RunDetails
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import choice
import os, shutil

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

SDK version: 1.26.0


## Dataset

### Overview

In this project, we predict mortality due to heart failure using a scikit-learn classifier. Heart failure is a common event caused by cardiovascular diseases; it occurs when the heart cannot pump enough blood to meet the needs of the body.

The [Heart Failure Prediction](https://archive.ics.uci.edu/ml/datasets/Heart+failure+clinical+records) dataset is used as the training data for this task. It comprises 299 heart failure patients and 13 features, which report clinical, body, and lifestyle information.

The task here is to train a binary classification model that predicts the target column DEATH_EVENT, which indicates whether the patient died or survived before the end of the follow-up period, based on the information provided by the other 11 features (predictors). The time feature was dropped before training because no time value is available for new patients after deployment. If accurate, prediction models based on these predictors could help hospitals assess the severity of patients with cardiovascular diseases.
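The feature count above can be checked with a short sketch. It assumes the column names of the publicly distributed CSV for this dataset (an assumption, since the notebook does not list them) and removes the target and the dropped `time` column.

```python
# Column names are an assumption taken from the public heart failure
# clinical records CSV; the notebook itself does not list them.
ALL_COLUMNS = [
    "age", "anaemia", "creatinine_phosphokinase", "diabetes",
    "ejection_fraction", "high_blood_pressure", "platelets",
    "serum_creatinine", "serum_sodium", "sex", "smoking",
    "time", "DEATH_EVENT",
]

TARGET = "DEATH_EVENT"
DROPPED = {"time"}  # not available for new patients at prediction time

# Everything that is neither the target nor a dropped column is a predictor.
predictors = [c for c in ALL_COLUMNS if c != TARGET and c not in DROPPED]

print(len(ALL_COLUMNS), "columns in total")
print(len(predictors), "predictors:", predictors)
```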

## Initialize Workspace

Initialize a workspace object from persisted configuration. 

In [2]:
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')

Performing interactive authentication. Please follow the instructions on the terminal.
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code AZ4N8QE5J to authenticate.
You have logged in. Now let us find all the subscriptions to which you have access...
Interactive authentication successfully completed.
quick-starts-ws-142887
aml-quickstarts-142887
southcentralus
3d1a56d2-7c81-4118-9790-f85d1acf0c77


## Create an Azure ML experiment

Create an [Experiment](https://docs.microsoft.com/en-gb/azure/machine-learning/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace.

In [3]:
# Choose a name for the run history container in the workspace
experiment_name = 'hyperdrive-heart-failure'
experiment = Experiment(ws, experiment_name)

run = experiment.start_logging()

## Create or Attach an AmlCompute cluster

Create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training and hyperparameter tuning.

In [4]:
# choose a name for your cluster
# The compute name should contain only letters, digits, and hyphens, and be 2-16 characters long
cluster_name = "project-automl"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print(f'{cluster_name} exists already')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes=4)
    
    # create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
    
    compute_target.wait_for_completion(show_output=True)
    
compute_targets = ws.compute_targets
for name, ct in compute_targets.items():
    print(name, ct.type, ct.provisioning_state)

Creating a new compute target...
Creating....
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned
notebook142887 ComputeInstance Succeeded
project-automl AmlCompute Succeeded


## Hyperdrive Configuration

The model used here is scikit-learn's built-in Support Vector Machine (SVM) classifier. It was chosen because it can learn non-linear decision boundaries when used with a non-linear kernel and can achieve high accuracy. It is also generally considered more robust to outliers than logistic regression.

The hyperdrive settings include the following:
1. A Bandit early termination policy based on a slack factor. With `delay_evaluation=5`, the first policy evaluation is delayed by 5 intervals, which avoids terminating runs prematurely. After that, the policy is applied every 2 intervals (`evaluation_interval=2`) and terminates any run whose primary metric falls outside the 0.1 slack relative to the best-performing run (for a maximized metric, below best / (1 + 0.1)). Stopping poorly performing runs early improves computational efficiency.

2. Random parameter sampling over the inverse regularization strength (`--C`) and the kernel type (`--kernel`).

3. The primary metric is "AUC_weighted", which is maximized. The HyperDrive configuration is created from a ScriptRunConfig that runs the train.py script, together with the hyperparameter sampler and the early termination policy. At most 20 runs are submitted in total, with up to 5 running concurrently.
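As a rough illustration of the slack-factor rule, the sketch below computes the cutoff a run must reach to survive a policy evaluation when the metric is maximized. The helper names are hypothetical and the metric values are made up for illustration; the formula `best / (1 + slack_factor)` follows the documented behaviour of the Bandit policy.

```python
def bandit_cutoff(best_metric: float, slack_factor: float) -> float:
    """Lowest metric value a run may report without being terminated
    (maximization goal): best / (1 + slack_factor)."""
    return best_metric / (1.0 + slack_factor)


def is_terminated(run_metric: float, best_metric: float,
                  slack_factor: float = 0.1) -> bool:
    """True if the run falls below the cutoff at a policy evaluation."""
    return run_metric < bandit_cutoff(best_metric, slack_factor)


# Illustrative values only, not taken from the experiment.
best_auc = 0.88
cutoff = bandit_cutoff(best_auc, slack_factor=0.1)
print("cutoff:", round(cutoff, 4))

for auc in (0.85, 0.79):
    status = "terminated" if is_terminated(auc, best_auc) else "kept"
    print(auc, status)
```

Note that the policy compares runs only at evaluation points, so it has an effect only on runs that log the primary metric at more than one interval.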

In [5]:
# Create an early termination policy. This is not required if you are using Bayesian sampling.
# Specify a Policy
early_termination_policy = BanditPolicy(evaluation_interval=2, delay_evaluation=5, slack_factor=0.1)

param_sampling = RandomParameterSampling( {
        "--kernel": choice('linear', 'rbf', 'poly', 'sigmoid'),
        "--C": choice(0.001, 0.005, 0.01, 0.05, 0.1, 0.3, 0.7, 1.0, 1.3, 1.7,  2.0)
    }
)

# Create the environment, the ScriptRunConfig, and the HyperDrive config
env = Environment.from_pip_requirements(name='venv', file_path='./requirements.txt')

estimator = ScriptRunConfig(source_directory=".",
                            script='train.py',
                            compute_target=compute_target,
                            environment=env)
               
# Create a HyperDriveConfig using the ScriptRunConfig (named `estimator` here), hyperparameter sampler, and policy.
hyperdrive_run_config = HyperDriveConfig(run_config=estimator,
                                         hyperparameter_sampling=param_sampling,
                                         policy=early_termination_policy,
                                         primary_metric_name='AUC_weighted',
                                         primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                         max_total_runs=20,
                                         max_concurrent_runs=5)
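To get a feel for the size of this search space, the following standalone sketch mimics random sampling over the two `choice` lists in plain Python. It is not how HyperDrive is implemented; it assumes each value is drawn uniformly and independently, and the function name `sample_configs` is hypothetical.

```python
import random

# The same discrete values that are passed to `choice(...)` in the notebook.
KERNELS = ["linear", "rbf", "poly", "sigmoid"]
C_VALUES = [0.001, 0.005, 0.01, 0.05, 0.1, 0.3, 0.7, 1.0, 1.3, 1.7, 2.0]


def sample_configs(n_runs, seed=0):
    """Draw n_runs (kernel, C) pairs uniformly at random.

    Draws are independent, so the same combination may appear twice.
    """
    rng = random.Random(seed)
    return [
        {"--kernel": rng.choice(KERNELS), "--C": rng.choice(C_VALUES)}
        for _ in range(n_runs)
    ]


grid_size = len(KERNELS) * len(C_VALUES)
configs = sample_configs(n_runs=20)

print("distinct combinations available:", grid_size)
print("combinations sampled:", len(configs))
print("first sample:", configs[0])
```

With 4 kernels and 11 values of C there are 44 possible combinations, so 20 runs cover at most about half of the grid.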

In [6]:
# Submit your experiment
hyperdrive_run = experiment.submit(config=hyperdrive_run_config)
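Each child run receives its sampled values as command-line arguments to train.py. The script itself is not shown in this notebook, so the following is only a hypothetical sketch of how it might parse those arguments; the defaults and the output file name pattern are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the two hyperparameters that HyperDrive samples."""
    parser = argparse.ArgumentParser(
        description="Hypothetical argument parsing for train.py")
    parser.add_argument("--kernel", type=str, default="rbf",
                        choices=["linear", "rbf", "poly", "sigmoid"],
                        help="SVM kernel type")
    parser.add_argument("--C", type=float, default=1.0,
                        help="Inverse of regularization strength")
    return parser.parse_args(argv)


# Simulate the argument list that a child run might receive.
args = parse_args(["--C", "1", "--kernel", "linear"])

# Assumed naming pattern for the saved model file.
model_file = f"outputs/hyperdrive_{args.C}_{args.kernel}"
print(args.C, args.kernel, model_file)
```

Parsing `--C` as a float means the value `1` passed on the command line becomes `1.0` inside the script.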

## Run Details

In [7]:
RunDetails(hyperdrive_run).show()
hyperdrive_run.wait_for_completion(show_output=True)

RunId: HD_9abede9c-6208-44f3-97ce-a38cf7291817
Web View: https://ml.azure.com/runs/HD_9abede9c-6208-44f3-97ce-a38cf7291817?wsid=/subscriptions/3d1a56d2-7c81-4118-9790-f85d1acf0c77/resourcegroups/aml-quickstarts-142887/workspaces/quick-starts-ws-142887&tid=660b3398-b80e-49d2-bc5b-ac1dc93b5254

Streaming azureml-logs/hyperdrive.txt

<START>[2021-04-16T22:53:37.229014][API][INFO]Experiment created<END>
<START>[2021-04-16T22:53:37.727677][GENERATOR][INFO]Trying to sample '5' jobs from the hyperparameter space<END>
<START>[2021-04-16T22:53:37.936920][GENERATOR][INFO]Successfully sampled '5' jobs, they will soon be submitted to the execution target.<END>
<START>[2021-04-16T22:53:39.0811166Z][SCHEDULER][INFO]The execution environment is being prepared. Please be patient as it can take a few minutes.<END>


## Best Model

In [11]:
best_run = hyperdrive_run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics()

print('Best Run Id: ', best_run.id)
print('\n AUC_weighted:', best_run_metrics['AUC_weighted'])
print('\n Regularization Strength:', best_run_metrics['Regularization Strength:'])
print('\n Kernel:', best_run_metrics['Kernel:'])

Best Run Id:  HD_9abede9c-6208-44f3-97ce-a38cf7291817_3

 AUC_weighted: 0.8166666666666667

 Regularization Strength: 1.0

 Kernel: linear
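Conceptually, selecting the best run amounts to taking the maximum of the primary metric over the child runs that reported it. The sketch below shows that idea on plain dictionaries; it is not the SDK implementation, the run names are hypothetical, and all metric values except the first entry's AUC are invented for illustration.

```python
PRIMARY_METRIC = "AUC_weighted"

# Illustrative data only: hypothetical run names, and only the first
# entry's AUC mirrors the value printed above.
child_runs = [
    {"run": "run_a", "metrics": {"AUC_weighted": 0.8166666666666667,
                                 "Kernel:": "linear"}},
    {"run": "run_b", "metrics": {"AUC_weighted": 0.74, "Kernel:": "rbf"}},
    {"run": "run_c", "metrics": {}},  # e.g. a run that logged no metrics
]


def best_by_primary_metric(runs, metric=PRIMARY_METRIC):
    """Return the run with the highest value of `metric`, or None if
    no run reported it."""
    scored = [r for r in runs if metric in r["metrics"]]
    if not scored:
        return None
    return max(scored, key=lambda r: r["metrics"][metric])


best = best_by_primary_metric(child_runs)
print(best["run"], best["metrics"][PRIMARY_METRIC])
```

Skipping runs without the metric avoids a `KeyError` when some child runs fail or are cancelled before logging anything.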


In [12]:
best_run.get_details()

{'runId': 'HD_9abede9c-6208-44f3-97ce-a38cf7291817_3',
 'target': 'project-automl',
 'status': 'Completed',
 'startTimeUtc': '2021-04-16T23:01:33.676807Z',
 'endTimeUtc': '2021-04-16T23:17:57.540111Z',
 'properties': {'_azureml.ComputeTargetType': 'amlcompute',
  'ContentSnapshotId': '4733b825-2436-4f14-b952-270478231a40',
  'ProcessInfoFile': 'azureml-logs/process_info.json',
  'ProcessStatusFile': 'azureml-logs/process_status.json'},
 'inputDatasets': [],
 'outputDatasets': [],
 'runDefinition': {'script': 'train.py',
  'command': '',
  'useAbsolutePath': False,
  'arguments': ['--C', '1', '--kernel', 'linear'],
  'sourceDirectoryDataStore': None,
  'framework': 'Python',
  'communicator': 'None',
  'target': 'project-automl',
  'dataReferences': {},
  'data': {},
  'outputData': {},
  'jobName': None,
  'maxRunDurationSeconds': 2592000,
  'nodeCount': 1,
  'priority': None,
  'credentialPassthrough': False,
  'identity': None,
  'environment': {'name': 'venv',
   'version': 'Autosav

In [13]:
print(best_run.get_file_names())

['azureml-logs/55_azureml-execution-tvmps_62fc0028e66d4ee1537ccc02157cd41e9e1703202e970fb3a8da6e3f6d82a68c_d.txt', 'azureml-logs/65_job_prep-tvmps_62fc0028e66d4ee1537ccc02157cd41e9e1703202e970fb3a8da6e3f6d82a68c_d.txt', 'azureml-logs/70_driver_log.txt', 'azureml-logs/75_job_post-tvmps_62fc0028e66d4ee1537ccc02157cd41e9e1703202e970fb3a8da6e3f6d82a68c_d.txt', 'azureml-logs/process_info.json', 'azureml-logs/process_status.json', 'logs/azureml/107_azureml.log', 'logs/azureml/dataprep/backgroundProcess.log', 'logs/azureml/dataprep/backgroundProcess_Telemetry.log', 'logs/azureml/job_prep_azureml.log', 'logs/azureml/job_release_azureml.log', 'outputs/hyperdrive_1.0_linear']


In [14]:
# Register the model
best_run.register_model(model_path='outputs/', model_name=experiment_name+'-best-model',
                   tags={'Training context':'Parameterized SKLearn Estimator', 'type': 'Classification'},
                   properties={'AUC_weighted': best_run_metrics['AUC_weighted']},
                   description = 'Heart Failure Predictor')

Model(workspace=Workspace.create(name='quick-starts-ws-142887', subscription_id='3d1a56d2-7c81-4118-9790-f85d1acf0c77', resource_group='aml-quickstarts-142887'), name=hyperdrive-heart-failure-best-model, id=hyperdrive-heart-failure-best-model:1, version=1, tags={'Training context': 'Parameterized SKLearn Estimator', 'type': 'Classification'}, properties={'AUC_weighted': '0.8166666666666667'})

## Model Deployment
Only one of the two trained models needs to be deployed.
The VotingEnsemble model found by AutoML achieved an AUC of 0.9226, compared with 0.8167 for the SVM tuned with HyperDrive, so this model is not deployed.

In [None]:
compute_target.delete()