# Hyperparameter Tuning using HyperDrive

Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [16]:
import azureml.core
# Dataset, Environment and ScriptRunConfig are used in later cells
from azureml.core import Workspace, Experiment, Dataset, Environment, ScriptRunConfig
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

from matplotlib import pyplot as plt
import numpy as np
import pandas as pd

from azureml.widgets import RunDetails
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import choice, uniform
import os

import joblib

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

SDK version: 1.47.0


## Workspace

The `config.json` file is downloaded from the Azure portal and has to be in the same folder as this notebook for `Workspace.from_config()` to find it.
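For reference, `config.json` is a small JSON file that identifies the workspace through three fields: `subscription_id`, `resource_group` and `workspace_name`. As a sketch, assuming you prefer to create the file by hand rather than download it, the helper below writes one. `write_workspace_config` is a hypothetical name, not part of the SDK, and the values in the usage comment are placeholders.

```python
import json
from pathlib import Path


def write_workspace_config(path, subscription_id, resource_group, workspace_name):
    """Write a config.json containing the three fields Workspace.from_config() reads."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    Path(path).write_text(json.dumps(config, indent=2))
    return config


# Hypothetical usage with placeholder values (replace with your own workspace details):
# write_workspace_config("config.json", "<subscription-id>", "<resource-group>", "<workspace-name>")
```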

In [17]:
from azureml.core.workspace import Workspace
from azureml.core.experiment import Experiment

ws = Workspace.from_config()

print("Workspace name: ", ws.name)
print("Subscription id: ", ws.subscription_id)
print("Resource group: ", ws.resource_group, sep='\n')

Workspace name:  quick-starts-ws-218450
Subscription id:  9a7511b8-150f-4a58-8528-3e7d50216c31
Resource group: 
aml-quickstarts-218450


## Create an Azure ML experiment

I create an experiment named `hd_heart_failure_experiment` and define the path of a folder for the training scripts. The script runs are recorded under this experiment in Azure.

In [18]:
experiment_name = 'hd_heart_failure_experiment'
project_folder = './hyperparameter-tuning--project'

experiment = Experiment(ws, experiment_name)
experiment

| Name | Workspace | Report Page | Docs Page |
|---|---|---|---|
| hd_heart_failure_experiment | quick-starts-ws-218450 | Link to Azure Machine Learning studio | Link to Documentation |


## Create or Attach an AmlCompute cluster

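We need a compute target to run the training jobs. The cell below reuses the cluster if it already exists and otherwise provisions a new one with 1 to 4 `Standard_DS3_v2` nodes.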

In [19]:
# Choose a name for the cluster
cpu_cluster_name = 'cluster-cpu'

# Reuse the cluster if it already exists; otherwise create it

try:
    compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    print('Creating a new compute cluster...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_DS3_v2', min_nodes=1, max_nodes=4)
    compute_target = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True)

# use get_status() to get a detailed status for the current cluster. 
print(compute_target.get_status().serialize())

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned
{'currentNodeCount': 1, 'targetNodeCount': 1, 'nodeStateCounts': {'preparingNodeCount': 0, 'runningNodeCount': 0, 'idleNodeCount': 1, 'unusableNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0}, 'allocationState': 'Steady', 'allocationStateTransitionTime': '2022-12-15T14:28:58.599000+00:00', 'errors': None, 'creationTime': '2022-12-15T14:27:57.533608+00:00', 'modifiedTime': '2022-12-15T14:28:04.463145+00:00', 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 1, 'maxNodeCount': 4, 'nodeIdleTimeBeforeScaleDown': 'PT1800S'}, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_DS3_V2'}


## Dataset

The dataset used in this project is the Heart failure clinical records Data Set, which is publicly available from the UCI Machine Learning Repository.

It contains the medical records of 299 patients who had heart failure, collected during their follow-up period, where each patient profile has 13 clinical features (including the `DEATH_EVENT` column).

I use this data to predict `DEATH_EVENT`, i.e. whether or not the patient died during the follow-up period (boolean).

In [20]:
# test to see if dataset is in store

key = "heart-failure"
description_text = "Heart failure survival prediction"


if key in ws.datasets.keys():
    dataset = ws.datasets[key]
    print('The Dataset was found')
else:
    # Create AML Dataset and register it into Workspace
    data_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00519/heart_failure_clinical_records_dataset.csv"
    dataset = Dataset.Tabular.from_delimited_files(data_url)
    #Register Dataset in Workspace
    dataset = dataset.register(workspace = ws,name = key,description = description_text)

df = dataset.to_pandas_dataframe()

The Dataset was found


In [21]:
print(df.head())
print(df.describe())

    age  anaemia  creatinine_phosphokinase  diabetes  ejection_fraction  \
0  75.0        0                       582         0                 20   
1  55.0        0                      7861         0                 38   
2  65.0        0                       146         0                 20   
3  50.0        1                       111         0                 20   
4  65.0        1                       160         1                 20   

   high_blood_pressure  platelets  serum_creatinine  serum_sodium  sex  \
0                    1  265000.00               1.9           130    1   
1                    0  263358.03               1.1           136    1   
2                    0  162000.00               1.3           129    1   
3                    0  210000.00               1.9           137    1   
4                    0  327000.00               2.7           116    0   

   smoking  time  DEATH_EVENT  
0        0     4            1  
1        0     6            1  
2       

## Hyperdrive Configuration

Early stopping policy - An early stopping policy automatically terminates poorly performing runs, which improves computational efficiency. I chose `BanditPolicy`, specified as follows:

`early_termination_policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)`

where:

- `evaluation_interval` (optional): how often the policy is applied. Each time the training script logs the primary metric counts as one interval.
- `slack_factor`: the slack allowed with respect to the best performing training run, expressed as a ratio.

Any run whose primary metric falls outside the allowed slack (set with `slack_factor` or, alternatively, `slack_amount`) relative to the best performing run is terminated. Runs that stay within the slack of the best run continue until they finish, which is why I chose this policy: it prunes clearly weak configurations without cutting the promising ones short.
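Azure's documentation describes the `slack_factor` rule for a maximized metric as terminating any run whose metric at an evaluation interval is below `best / (1 + slack_factor)`; by the same reasoning, an absolute `slack_amount` gives a cutoff of `best - slack_amount`. The plain-Python sketch below reproduces only that arithmetic for this notebook's settings. The helper names are hypothetical and this is not the SDK's implementation.

```python
def bandit_cutoff(best_metric, slack_factor=None, slack_amount=None):
    """Lowest primary-metric value that survives, assuming a MAXIMIZE goal."""
    if (slack_factor is None) == (slack_amount is None):
        raise ValueError("Specify exactly one of slack_factor or slack_amount.")
    if slack_factor is not None:
        # Slack expressed as a ratio of the best run's metric.
        return best_metric / (1 + slack_factor)
    # Slack expressed as an absolute amount.
    return best_metric - slack_amount


def should_terminate(run_metric, best_metric, **slack):
    """True if the run falls outside the allowed slack of the best run."""
    return run_metric < bandit_cutoff(best_metric, **slack)


# With this notebook's slack_factor=0.1, a best Accuracy of 0.8 gives a cutoff of about 0.727.
print(round(bandit_cutoff(0.8, slack_factor=0.1), 3))
print(should_terminate(0.70, 0.8, slack_factor=0.1))
```

A tighter `slack_factor` raises the cutoff and prunes more aggressively; 0.1 keeps any run within roughly 9% of the current best.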

Parameter Sampler

The parameter sampler searches over two hyperparameters, `C` and `max_iter`, and I used discrete values with `choice` for both.

`C` is the inverse of the regularization strength (in scikit-learn, smaller values mean stronger regularization), and `max_iter` is the maximum number of solver iterations.
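HyperDrive passes each sampled value to the training script as a command-line argument named after its key in the sampling dictionary (`--C`, `--max_iter`), so `train.py` has to parse them. Since `train.py` is not shown in this notebook, the following is only a sketch of what that parsing could look like; the function name, defaults and help strings are assumptions.

```python
import argparse


def parse_hyperparameters(argv=None):
    """Parse the hyperparameters HyperDrive passes to the script (hypothetical train.py sketch)."""
    parser = argparse.ArgumentParser(description="Train a model on the heart failure data")
    # Argument names must match the keys used in RandomParameterSampling.
    parser.add_argument("--C", type=float, default=1.0,
                        help="Inverse of regularization strength (smaller = stronger)")
    parser.add_argument("--max_iter", type=int, default=100,
                        help="Maximum number of solver iterations")
    return parser.parse_args(argv)


# Example: the arguments one sampled child run might receive.
args = parse_hyperparameters(["--C", "0.1", "--max_iter", "200"])
print(args.C, args.max_iter)  # 0.1 200
```

Inside the real script, the resulting accuracy would then have to be logged under the exact name `Accuracy` (for example with `Run.get_context().log("Accuracy", value)` from `azureml.core.run`) so that it matches `primary_metric_name`.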

`RandomParameterSampling` is one of the available samplers. I chose it because it is fast and supports early termination of low-performance runs. If budget is not an issue, `GridParameterSampling` exhaustively searches every combination of the search space (it only supports `choice` values), and `BayesianParameterSampling` picks new samples based on how previous samples performed; note that Bayesian sampling does not support early termination policies.
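To make the trade-off concrete: the search space in the HyperDrive cell below has 11 values of `C` and 4 of `max_iter`, i.e. 44 combinations, while the run budget is 20. The plain-Python sketch below contrasts exhaustive grid enumeration with independent random draws. It is a simplified model of the two strategies, not how the HyperDrive service implements them; for instance, it does not avoid drawing the same combination twice.

```python
import itertools
import random

# Same discrete values as the RandomParameterSampling cell below.
SEARCH_SPACE = {
    "--C": [0.001, 0.01, 0.1, 1, 10, 20, 50, 100, 200, 500, 1000],
    "--max_iter": [50, 100, 200, 300],
}


def grid(space):
    """All combinations, which is what an exhaustive grid search must visit."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in itertools.product(*space.values())]


def random_samples(space, n, seed=0):
    """n independent draws, picking one value per parameter uniformly at random."""
    rng = random.Random(seed)
    return [{key: rng.choice(values) for key, values in space.items()} for _ in range(n)]


print(len(grid(SEARCH_SPACE)))  # 44
print(random_samples(SEARCH_SPACE, 3))
```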

HyperDriveConfig

The configuration chosen is as follows:

- `hyperparameter_sampling`: the hyperparameter sampling space defined above.
- `primary_metric_name`: the name of the primary metric reported by the experiment runs, here `Accuracy`. It must match the name under which `train.py` logs the metric.
- `primary_metric_goal`: `PrimaryMetricGoal.MAXIMIZE`, so the primary metric is to be maximized when evaluating runs.
- `policy`: the early termination policy specified above.
- `run_config`: a `ScriptRunConfig` that runs `train.py` with the sampled hyperparameters on the compute cluster. `HyperDriveConfig` also accepts an `estimator` (deprecated in favor of `ScriptRunConfig`) or a `pipeline`. `train.py` does a very basic manipulation of the data before training.
- `max_total_runs=20`: the maximum total number of runs to create. This is an upper bound; there may be fewer runs when the sample space is smaller than this value. If both `max_total_runs` and `max_duration_minutes` are specified, the hyperparameter tuning experiment terminates when the first of these two thresholds is reached.
- `max_concurrent_runs=4`: the maximum number of runs to execute concurrently. If `None`, all runs are launched in parallel. Concurrency is also limited by the resources available on the compute target, so the target must be able to support the desired concurrency; here it matches the cluster's `max_nodes=4`.
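A quick way to sanity-check `max_concurrent_runs` against the cluster: assuming each child run occupies one node (an assumption about this setup, not something the notebook states), effective concurrency is capped by the cluster's `max_nodes`, and with runs of similar length the sweep takes roughly `ceil(max_total_runs / concurrency)` run-durations of wall-clock time, before early termination shortens any of them. `plan_sweep` is a hypothetical helper.

```python
import math


def plan_sweep(max_total_runs, max_concurrent_runs, max_nodes, nodes_per_run=1):
    """Return (effective concurrency, approximate number of sequential run 'waves')."""
    # The cluster can host at most max_nodes // nodes_per_run runs at the same time.
    effective = min(max_concurrent_runs, max_nodes // nodes_per_run)
    if effective < 1:
        raise ValueError("The cluster cannot host a single run with these settings.")
    waves = math.ceil(max_total_runs / effective)
    return effective, waves


# This notebook: max_total_runs=20, max_concurrent_runs=4, cluster max_nodes=4.
print(plan_sweep(20, 4, 4))  # (4, 5)
```

Raising `max_concurrent_runs` above 4 would not speed this sweep up without also raising the cluster's `max_nodes`.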


In [22]:
# early termination policy.

early_termination_policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)

# Search space: discrete choices for the two hyperparameters passed to train.py
param_sampling = RandomParameterSampling(
    {
        '--C' : choice(0.001,0.01,0.1,1,10,20,50,100,200,500,1000),
        '--max_iter': choice(50,100,200,300)
    }
)

os.makedirs("./training", exist_ok=True)

# Set up the environment for the training run from the conda specification file
sklearn_env = Environment.from_conda_specification(name='sklearn-env', file_path='conda_dependencies.yml')

# Configure the script run; HyperDrive passes the sampled '--C' and '--max_iter' values to train.py as arguments.
# The contents of source_directory are uploaded as the run snapshot.
src = ScriptRunConfig(source_directory=".",
                      compute_target=compute_target,
                      environment=sklearn_env,
                      script="train.py")

# Create a HyperDriveConfig using the run config, hyperparameter sampler, and policy.

hyperdrive_config = HyperDriveConfig(hyperparameter_sampling=param_sampling, 
                                     primary_metric_name='Accuracy',
                                     primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                     policy=early_termination_policy,
                                     run_config=src,
                                     max_total_runs=20,
                                     max_concurrent_runs=4)

In [23]:
# Start the HyperDrive run
hyperdrive_run = experiment.submit(config=hyperdrive_config, show_output=True)


## Run Details

In the cell below, the `RunDetails` widget shows the progress of the HyperDrive child runs.

In [24]:
# Monitor HyperDrive runs 
# You can monitor the progress of the runs with the following Jupyter widget
RunDetails(hyperdrive_run).show()


In [25]:
hyperdrive_run.wait_for_completion(show_output=True)

RunId: HD_63c1e37a-6809-4676-9c34-d5cfa6294de1
Web View: https://ml.azure.com/runs/HD_63c1e37a-6809-4676-9c34-d5cfa6294de1?wsid=/subscriptions/9a7511b8-150f-4a58-8528-3e7d50216c31/resourcegroups/aml-quickstarts-218450/workspaces/quick-starts-ws-218450&tid=660b3398-b80e-49d2-bc5b-ac1dc93b5254

Streaming azureml-logs/hyperdrive.txt

[2022-12-15T14:33:57.194557][GENERATOR][INFO]Trying to sample '4' jobs from the hyperparameter space
[2022-12-15T14:33:58.5014882Z][SCHEDULER][INFO]Scheduling job, id='HD_63c1e37a-6809-4676-9c34-d5cfa6294de1_0' 
[2022-12-15T14:33:58.6301468Z][SCHEDULER][INFO]Scheduling job, id='HD_63c1e37a-6809-4676-9c34-d5cfa6294de1_1' 
[2022-12-15T14:33:58.7366989Z][SCHEDULER][INFO]Scheduling job, id='HD_63c1e37a-6809-4676-9c34-d5cfa6294de1_2' 
[2022-12-15T14:33:58.846784][GENERATOR][INFO]Successfully sampled '4' jobs, they will soon be submitted to the execution target.
[2022-12-15T14:33:58.8914893Z][SCHEDULER][INFO]Scheduling job, id='HD_63c1e37a-6809-4676-9c34-d5cfa6294d

ActivityFailedException: ActivityFailedException:
	Message: Activity Failed:
	Error code: UserError (ExecutionFailed, exit code 1)
	Process '/azureml-envs/azureml_7eec2c8971b9410f92147a7e257297e7/bin/python' exited with code 1 and error message 'Execution failed. Process exited with status code 1. Error:
Traceback (most recent call last):
  File "train.py", line 15, in <module>
    ds = pd.read_csv('./heart_failure_clinical_records_dataset.csv')
  File "/azureml-envs/azureml_7eec2c8971b9410f92147a7e257297e7/lib/python3.6/site-packages/pandas/io/parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/azureml-envs/azureml_7eec2c8971b9410f92147a7e257297e7/lib/python3.6/site-packages/pandas/io/parsers.py", line 454, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/azureml-envs/azureml_7eec2c8971b9410f92147a7e257297e7/lib/python3.6/site-packages/pandas/io/parsers.py", line 948, in __init__
    self._make_engine(self.engine)
  File "/azureml-envs/azureml_7eec2c8971b9410f92147a7e257297e7/lib/python3.6/site-packages/pandas/io/parsers.py", line 1180, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/azureml-envs/azureml_7eec2c8971b9410f92147a7e257297e7/lib/python3.6/site-packages/pandas/io/parsers.py", line 2010, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 382, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 674, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] No such file or directory: './heart_failure_clinical_records_dataset.csv'
	Please check the log file 'user_logs/std_log.txt' for more details.
	Marking the experiment as failed because initial child jobs have failed due to user error
	InnerException None
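The traceback shows the cause of the failure: `train.py` reads `./heart_failure_clinical_records_dataset.csv` by relative path, but that file is not in the `source_directory` (`"."`) that `ScriptRunConfig` uploads as the run snapshot, so it does not exist on the compute target. One possible fix, assuming `train.py` stays unchanged, is to write the `df` loaded earlier into that folder with `df.to_csv("heart_failure_clinical_records_dataset.csv", index=False)` before resubmitting; another is to have `train.py` load the registered `heart-failure` dataset or the UCI URL instead. The sketch below is a hypothetical pre-submission check, not part of the Azure ML SDK, that would catch this kind of missing input locally.

```python
from pathlib import Path


def missing_from_snapshot(source_directory, relative_paths):
    """Return the relative paths that do not exist as files under source_directory.

    The script is assumed to run from the snapshot root, so a relative path such as
    './heart_failure_clinical_records_dataset.csv' must exist there before submission.
    """
    root = Path(source_directory)
    return [p for p in relative_paths if not (root / p).is_file()]


# Hypothetical usage before experiment.submit(...):
# missing = missing_from_snapshot(".", ["train.py", "heart_failure_clinical_records_dataset.csv"])
# if missing:
#     df.to_csv("heart_failure_clinical_records_dataset.csv", index=False)
```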

In [None]:
hyperdrive_run.get_status()

## Best Model

TODO: In the cell below, get the best model from the hyperdrive experiments and display all the properties of the model.

In [None]:
# Show the HyperDrive run and its child runs
RunDetails(hyperdrive_run).show()

# get_best_run_by_primary_metric() returns the best child Run,
# or None if no child run logged the primary metric.
best_run = hyperdrive_run.get_best_run_by_primary_metric()
if best_run is None:
    raise RuntimeError("No child run reported the primary metric 'Accuracy'.")

print('Best Run Id: ', best_run.id)

# get_metrics() returns the metrics logged by the best run.
best_run_metrics = best_run.get_metrics()
print("Best run metrics :", best_run_metrics)

# get_details() returns a dictionary with the details for the run.
print("Best run details :", best_run.get_details())

# get_file_names() returns a list of the files stored in association with the run.
print("Best run file names :", best_run.get_file_names())

In [None]:
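# download_file() writes the file locally and returns None, so load the model from disk afterwards
best_run.download_file('outputs/model.pkl', output_file_path='hyperdrive_model.pkl')
best_fitted_model = joblib.load('hyperdrive_model.pkl')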

In [None]:
# Register the best model in the workspace; model_path is the path inside the run's outputs
model = best_run.register_model(model_name="best_run_hyperdrive.pkl", model_path="outputs/model.pkl")

print(model.name, model.version)

In [None]:
# Download the model file

best_run.download_file('outputs/model.pkl', 'hyperdrive_model.pkl')

## Model Deployment

Remember that you have to deploy only one of the two models you trained, but you still need to register both models. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

TODO: In the cell below, send a request to the web service you deployed to test it.

TODO: In the cell below, print the logs of the web service and delete the service

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shutdown all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.

