# Hyperparameter Tuning with HyperDrive

In [1]:
from azureml.core import Workspace, Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.widgets import RunDetails
from azureml.train.sklearn import SKLearn
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import choice, uniform
import os
import shutil
import joblib

## Dataset

### Source of the data set 
For this project we use a dataset from [Kaggle](https://www.kaggle.com/andrewmvd/heart-failure-clinical-data). It covers cardiovascular diseases (CVDs), which are the number one cause of death worldwide, claiming an estimated 17 million lives each year. This represents approximately 31% of all deaths worldwide.
Heart failure is a common event caused by CVDs. This dataset contains 12 features that can be used to predict mortality from heart failure.
People with cardiovascular disease or at high cardiovascular risk (due to one or more risk factors such as hypertension, diabetes, hyperlipidaemia or established disease) need early detection and treatment. This dataset is intended to help improve that prediction.

### Content of the data set
The dataset contains 12 features that can be used to predict mortality from heart failure:
- age: Age of the patient
- anaemia: Decrease of red blood cells or hemoglobin
- creatinine_phosphokinase: Level of the CPK enzyme in the blood (mcg/L)
- diabetes: If the patient has diabetes
- ejection_fraction: Percentage of blood leaving the heart at each contraction
- high_blood_pressure: If the patient has hypertension
- platelets: Platelets in the blood (kiloplatelets/mL)
- serum_creatinine: Level of serum creatinine in the blood (mg/dL)
- serum_sodium: Level of serum sodium in the blood (mEq/L)
- sex: Woman or man (Gender at birth)
- smoking: If the patient smokes
- time: Follow-up period (days)

### Target
Our goal is to develop a machine learning model that predicts whether a person is likely to die from heart failure, which could help in diagnosis and early prevention. The 12 features listed above are used as inputs to the model.

### Attention!
This is an experiment that was developed in the course of a test for the Udacity learning platform. Do not use this model in a medical environment or for acute indications. Always consult your doctor for medical questions or the medical emergency service in acute cases!

In [2]:
ws = Workspace.from_config()
experiment_name = 'hyperdrive_experiment_heart_failure'

experiment=Experiment(ws, experiment_name)

print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')

run = experiment.start_logging()

Workspace name: quick-starts-ws-144621
Azure region: southcentralus
Subscription id: 610d6e37-4747-4a20-80eb-3aad70a55f43
Resource group: aml-quickstarts-144621


In [3]:
# Define the compute cluster name
compute_cluster_name= "comput-hyper"

# Check whether the compute cluster already exists

try:
    # Reuse the existing cluster
    compute_cluster=ComputeTarget(workspace=ws, name=compute_cluster_name)
    print("Found existing cluster, use it")
except ComputeTargetException:
    # Create a new compute cluster
    print("Creating new cluster")
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS3_v2',max_nodes=10)
    compute_cluster = ComputeTarget.create(ws, compute_cluster_name, compute_config)

# Wait until provisioning has finished
compute_cluster.wait_for_completion(show_output=True)

Creating new cluster
Creating...
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


## Hyperdrive Configuration

The model used here is a logistic regression model that is trained with a custom script, train.py.
The dataset is fetched from [HERE](https://raw.githubusercontent.com/Petopp/Udacity_Final_Project/main/heart_failure_clinical_records_dataset.csv). The hyperparameters chosen for the scikit-learn model are the inverse regularisation strength (C) and the maximum number of iterations (max_iter). The trained model is evaluated on 25% of the data, held out from the original dataset. The remaining 75% is used to train the model.
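
HyperDrive hands each sampled value to the training script as a command-line argument. The sketch below shows one way a script could read `--C` and `--max_iter`. It is not the project's actual train.py: the function name `parse_hyperparameters` and the default values are assumptions made for illustration.

```python
import argparse


def parse_hyperparameters(argv=None):
    """Parse the two hyperparameters passed on the command line."""
    parser = argparse.ArgumentParser()
    # Default values are assumptions for illustration only.
    parser.add_argument("--C", type=float, default=1.0,
                        help="Inverse of regularisation strength")
    parser.add_argument("--max_iter", type=int, default=100,
                        help="Maximum number of solver iterations")
    return parser.parse_args(argv)


# Example argument list in the same shape HyperDrive uses for a child run.
args = parse_hyperparameters(["--C", "97.2861169940756", "--max_iter", "125"])
print("C:", args.C)
print("max_iter:", args.max_iter)
```

The parsed values would then typically be passed on to the model constructor, for example `LogisticRegression(C=args.C, max_iter=args.max_iter)` in scikit-learn.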

Hyperparameter tuning with HyperDrive requires several steps:
- define the parameter search space
- define a sampling method
- select a primary metric to optimise
- select an early termination policy

The parameter sampling method used for this project is random sampling. It draws hyperparameter values at random from the defined search space, so the entire space does not need to be searched. Random sampling is typically cheaper than grid sampling, which tries every combination of a discrete search space, and, unlike Bayesian sampling, it can be combined with an early termination policy.
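
To make the idea concrete, here is a minimal local sketch of random sampling over the same search space, using only Python's standard `random` module. It does not call HyperDrive. The function name `sample_configuration` and the fixed seed are assumptions made for illustration.

```python
import random


def sample_configuration(rng):
    """Draw one hyperparameter configuration at random from the search space."""
    return {
        # Continuous range, mirroring uniform(0.001, 100)
        "--C": rng.uniform(0.001, 100),
        # Discrete set, mirroring choice(50, 90, 125, 170)
        "--max_iter": rng.choice([50, 90, 125, 170]),
    }


# Seeded only so that the sketch is repeatable.
rng = random.Random(0)

# Five configurations, matching max_total_runs=5 in this notebook.
configs = [sample_configuration(rng) for _ in range(5)]
for config in configs:
    print(config)
```

Each configuration is drawn independently, so the number of runs is fixed by the budget rather than by the size of the search space.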

The early termination policy used in this project is the Bandit policy, which is based on a slack factor (here 0.1) and an evaluation interval (here 1). The policy terminates runs whose primary metric is not within the specified slack factor of the best-performing run. This saves time and resources, because runs that are unlikely to produce good results are terminated early.
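
As a rough illustration of this rule: for a metric that is being maximised, the Azure ML documentation describes the Bandit cutoff as the best metric so far divided by (1 + slack factor). The helper names below (`bandit_cutoff`, `should_terminate`) are hypothetical and only mimic that rule locally. The real policy is evaluated by the HyperDrive service.

```python
def bandit_cutoff(best_metric, slack_factor):
    """Lowest metric value a run may report without being terminated (maximise goal)."""
    return best_metric / (1 + slack_factor)


def should_terminate(run_metric, best_metric, slack_factor):
    """Return True if the run falls below the cutoff derived from the best run."""
    return run_metric < bandit_cutoff(best_metric, slack_factor)


best_accuracy = 0.84  # best accuracy reported in this notebook
slack = 0.1           # slack factor used in this notebook

print("cutoff:", bandit_cutoff(best_accuracy, slack))
print("run at 0.75 terminated:", should_terminate(0.75, best_accuracy, slack))
print("run at 0.80 terminated:", should_terminate(0.80, best_accuracy, slack))
```

With a best accuracy of 0.84 and a slack factor of 0.1, the cutoff is roughly 0.7636, so a run reporting 0.75 would be stopped while a run reporting 0.80 would continue.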


In [4]:
# Create an early termination policy.
early_termination_policy = BanditPolicy(
    evaluation_interval=1,
    slack_factor= 0.1
)

# Define the hyperparameter search space that is sampled for each training run
param_sampling = RandomParameterSampling(
    {
        "--C": uniform(0.001,100),
        "--max_iter": choice(50, 90, 125, 170)
    }
)

# Create the script folder (if it does not exist yet) and copy the training script into it
script_folder = './training'
os.makedirs(script_folder, exist_ok=True)

shutil.copy('./train.py', script_folder)

# Create a SKLearn estimator for use with train.py
estimator = SKLearn(
    source_directory= script_folder,
    compute_target= compute_cluster,
    entry_script= "train.py",
    vm_size="Standard_D2_V2",
    vm_priority="lowpriority"
)

# Create a HyperDriveConfig using the estimator, hyperparameter sampler, and policy.
hyperdrive_run_config = HyperDriveConfig(
    estimator=estimator,
    hyperparameter_sampling= param_sampling,
    policy= early_termination_policy,
    primary_metric_name= "Accuracy",
    primary_metric_goal= PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=5,
    max_concurrent_runs=5
)

'SKLearn' estimator is deprecated. Please use 'ScriptRunConfig' from 'azureml.core.script_run_config' with your own defined environment or the AzureML-Tutorial curated environment.
'enabled' is deprecated. Please use the azureml.core.runconfig.DockerConfiguration object with the 'use_docker' param instead.


In [5]:
# Submit the experiment
hyperdrive_run=experiment.submit(config=hyperdrive_run_config)



## Run Details

In [6]:
RunDetails(hyperdrive_run).show()

_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO'…

In [7]:
hyperdrive_run.wait_for_completion(show_output= True)

RunId: HD_20b1fe66-bbfd-42e9-a6c0-d9bbb6ceef76
Web View: https://ml.azure.com/runs/HD_20b1fe66-bbfd-42e9-a6c0-d9bbb6ceef76?wsid=/subscriptions/610d6e37-4747-4a20-80eb-3aad70a55f43/resourcegroups/aml-quickstarts-144621/workspaces/quick-starts-ws-144621&tid=660b3398-b80e-49d2-bc5b-ac1dc93b5254

Streaming azureml-logs/hyperdrive.txt

"<START>[2021-05-12T07:48:18.051691][GENERATOR][INFO]Trying to sample '5' jobs from the hyperparameter space<END>\n""<START>[2021-05-12T07:48:18.256019][GENERATOR][INFO]Successfully sampled '5' jobs, they will soon be submitted to the execution target.<END>\n""<START>[2021-05-12T07:48:17.619740][API][INFO]Experiment created<END>\n"

Execution Summary
RunId: HD_20b1fe66-bbfd-42e9-a6c0-d9bbb6ceef76
Web View: https://ml.azure.com/runs/HD_20b1fe66-bbfd-42e9-a6c0-d9bbb6ceef76?wsid=/subscriptions/610d6e37-4747-4a20-80eb-3aad70a55f43/resourcegroups/aml-quickstarts-144621/workspaces/quick-starts-ws-144621&tid=660b3398-b80e-49d2-bc5b-ac1dc93b5254



{'runId': 'HD_20b1fe66-bbfd-42e9-a6c0-d9bbb6ceef76',
 'target': 'comput-hyper',
 'status': 'Completed',
 'startTimeUtc': '2021-05-12T07:48:17.325863Z',
 'endTimeUtc': '2021-05-12T07:53:49.636105Z',
 'properties': {'primary_metric_config': '{"name": "Accuracy", "goal": "maximize"}',
  'resume_from': 'null',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive',
  'platform': 'AML',
  'ContentSnapshotId': '97423971-e158-463d-abd2-88868329b3ba',
  'score': '0.84',
  'best_child_run_id': 'HD_20b1fe66-bbfd-42e9-a6c0-d9bbb6ceef76_3',
  'best_metric_status': 'Succeeded'},
 'inputDatasets': [],
 'outputDatasets': [],
 'logFiles': {'azureml-logs/hyperdrive.txt': 'https://mlstrg144621.blob.core.windows.net/azureml/ExperimentRun/dcid.HD_20b1fe66-bbfd-42e9-a6c0-d9bbb6ceef76/azureml-logs/hyperdrive.txt?sv=2019-02-02&sr=b&sig=%2FnNOsh%2FYlpeMfrViBN1pWgeXAH76a0uT9gF%2BiUgG%2BdU%3D&st=2021-05-12T07%3A43%3A51Z&se=2021-05-12T15%3A53%3A51Z&sp=r'},
 'submittedBy': 'ODL_User 144621'}

## Best Model

In [8]:
# Get the best run and display all details
best_run= hyperdrive_run.get_best_run_by_primary_metric()
best_run_metrics=best_run.get_metrics()
print(best_run.get_details()['runDefinition']['arguments'])
print(best_run.get_file_names())
print('Best Run Accuracy:',best_run_metrics['Accuracy'])

['--C', '97.2861169940756', '--max_iter', '125']
['azureml-logs/55_azureml-execution-tvmps_b3d8a370fdab6acc496b1fa398220948b9ae8dd605d8df21bbd0582f1cc744bc_d.txt', 'azureml-logs/65_job_prep-tvmps_b3d8a370fdab6acc496b1fa398220948b9ae8dd605d8df21bbd0582f1cc744bc_d.txt', 'azureml-logs/70_driver_log.txt', 'azureml-logs/75_job_post-tvmps_b3d8a370fdab6acc496b1fa398220948b9ae8dd605d8df21bbd0582f1cc744bc_d.txt', 'azureml-logs/process_info.json', 'azureml-logs/process_status.json', 'logs/azureml/106_azureml.log', 'logs/azureml/job_prep_azureml.log', 'logs/azureml/job_release_azureml.log', 'outputs/model.joblib']
Best Run Accuracy: 0.84


In [9]:
best_run.get_file_names()

['azureml-logs/55_azureml-execution-tvmps_b3d8a370fdab6acc496b1fa398220948b9ae8dd605d8df21bbd0582f1cc744bc_d.txt',
 'azureml-logs/65_job_prep-tvmps_b3d8a370fdab6acc496b1fa398220948b9ae8dd605d8df21bbd0582f1cc744bc_d.txt',
 'azureml-logs/70_driver_log.txt',
 'azureml-logs/75_job_post-tvmps_b3d8a370fdab6acc496b1fa398220948b9ae8dd605d8df21bbd0582f1cc744bc_d.txt',
 'azureml-logs/process_info.json',
 'azureml-logs/process_status.json',
 'logs/azureml/106_azureml.log',
 'logs/azureml/job_prep_azureml.log',
 'logs/azureml/job_release_azureml.log',
 'outputs/model.joblib']

In [10]:
# Register the best model and download it locally
model=best_run.register_model(model_name='heart-failure-sklearn', model_path='outputs/model.joblib')
best_run.download_file('outputs/model.joblib', 'hyperdrive_model.joblib')

In [12]:
# Clean up: delete the compute cluster
compute_cluster.delete()


Current provisioning state of AmlCompute is "Deleting"


