# Hyperparameter Tuning using HyperDrive

Import the dependencies needed for the project.

In [1]:
from azureml.core import Workspace, Environment, Experiment, ScriptRunConfig
from azureml.core.dataset import Dataset
from azureml.core.model import Model
from azureml.train.hyperdrive.parameter_expressions import choice, uniform
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.sklearn import SKLearn
from azureml.train.estimator import Estimator
from azureml.widgets import RunDetails

## Dataset

### Overview
I'll be using the dataset that was suggested in the starter file:
[Heart Failure](https://archive.ics.uci.edu/ml/datasets/Heart+failure+clinical+records#). 
Each observation will be classified as either at high risk of dying during the follow-up period (`DEATH_EVENT` = 1) or not, using a logistic regression (logit) model whose hyperparameters are tuned with HyperDrive.

In [2]:
ws = Workspace.from_config()
experiment_name = 'heart-failure-experiment'

experiment = Experiment(ws, experiment_name)


# check if data already available
key = 'heart-failure'
description_text = 'heart failure data. See https://archive.ics.uci.edu/ml/datasets/Heart+failure+clinical+records# for more information.'
found = False

if key in ws.datasets.keys(): 
    found = True
    dataset = ws.datasets[key] 

if not found:
    # register the dataset
    data = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00519/heart_failure_clinical_records_dataset.csv'
    dataset = Dataset.Tabular.from_delimited_files(data)
    dataset = dataset.register(workspace=ws,
                               name=key,
                               description=description_text)


df = dataset.to_pandas_dataframe()
df.head()

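| | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75.0 | 0 | 582 | 0 | 20 | 1 | 265000.0 | 1.9 | 130 | 1 | 0 | 4 | 1 |
| 1 | 55.0 | 0 | 7861 | 0 | 38 | 0 | 263358.03 | 1.1 | 136 | 1 | 0 | 6 | 1 |
| 2 | 65.0 | 0 | 146 | 0 | 20 | 0 | 162000.0 | 1.3 | 129 | 1 | 1 | 7 | 1 |
| 3 | 50.0 | 1 | 111 | 0 | 20 | 0 | 210000.0 | 1.9 | 137 | 1 | 0 | 7 | 1 |
| 4 | 65.0 | 1 | 160 | 1 | 20 | 0 | 327000.0 | 2.7 | 116 | 0 | 0 | 8 | 1 |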


### Create compute cluster

In [3]:
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your CPU cluster
amlcompute_cluster_name = "cpu-cluster"

# Check whether the cluster already exists; provision it only if it does not
try:
    compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS12_V2',  # for GPU, use "STANDARD_NC6"
                                                           # vm_priority='lowpriority',  # optional
                                                           max_nodes=10)
    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)
    # Block until provisioning has finished before submitting any runs
    compute_target.wait_for_completion(show_output=True)

Found existing cluster, use it.


## Hyperdrive Configuration

As stated above, we will classify individuals as likely or unlikely to die during the follow-up period using scikit-learn's logistic regression (logit) model.
Additionally, the model's hyperparameters will be tuned using Azure's HyperDrive.

Specifically, two hyperparameters will be tuned, `C` and `max_iter`:
- `C`: the inverse of the regularization strength (the lower `C`, the stronger the regularization), and
- `max_iter`: the maximum number of iterations the solver may take to converge.
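HyperDrive hands each sampled value to the training script as a command-line argument (the best run's recorded arguments further below look like `['--C', '0.8980307209925704', '--max_iter', '100']`). Since `train_sklearn.py` is not shown here, the following is only a minimal sketch, assuming the script reads its hyperparameters with `argparse`; the helper name `parse_hyperparameters` is hypothetical, and its defaults are assumed to mirror scikit-learn's `LogisticRegression` defaults (`C=1.0`, `max_iter=100`).

```python
import argparse


def parse_hyperparameters(argv=None):
    """Parse the hyperparameters HyperDrive passes on the command line.

    Hypothetical sketch: the actual train_sklearn.py is not shown in this notebook.
    """
    parser = argparse.ArgumentParser(description="Train a logistic regression model")
    # Inverse of regularization strength; smaller values mean stronger regularization
    parser.add_argument("--C", type=float, default=1.0)
    # Maximum number of solver iterations
    parser.add_argument("--max_iter", type=int, default=100)
    return parser.parse_args(argv)


# Argument list as recorded for the best run of this experiment
args = parse_hyperparameters(["--C", "0.8980307209925704", "--max_iter", "100"])
print(args.C, args.max_iter)
```

Inside such a script, the parsed values would then be passed on to the model, e.g. as `LogisticRegression(C=args.C, max_iter=args.max_iter)`.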

The values of these two hyperparameters will be sampled at random from the search space.
Anticipating imbalanced classes, the **weighted AUC** will be used as the metric for tuning.
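To make the metric concrete, here is a minimal standard-library sketch, assuming `AUC_weighted` follows Azure AutoML's definition: the one-vs-rest ROC AUC of each class, averaged with weights proportional to the number of true instances per class. How `train_sklearn.py` actually computes it is not shown; the helper names and the toy labels and probabilities are made up for illustration.

```python
def binary_auc(labels, scores):
    """ROC AUC via the Mann-Whitney statistic: P(score_pos > score_neg), ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def weighted_auc(labels, prob_class1):
    """One-vs-rest AUC per class, averaged with weights equal to the class counts."""
    # Class 1 is scored by P(y=1); class 0 by P(y=0) = 1 - P(y=1)
    auc_1 = binary_auc(labels, prob_class1)
    auc_0 = binary_auc([1 - y for y in labels], [1 - p for p in prob_class1])
    n1 = sum(labels)
    n0 = len(labels) - n1
    return (n0 * auc_0 + n1 * auc_1) / len(labels)


# Toy example: 4 survivors (0) and 2 deaths (1) with made-up predicted P(death)
labels = [0, 0, 0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
print(binary_auc(labels, probs), weighted_auc(labels, probs))  # 0.875 0.875
```

In the binary case both one-vs-rest AUCs coincide (scoring class 0 by 1 - P(y=1) simply reverses the ranking), so the weighted average equals the ordinary AUC; the weighting only changes the result when there are three or more classes.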

To save resources, an early termination policy is also incorporated.
Specifically, a Bandit policy is used whose first evaluation is delayed by 5 intervals (`delay_evaluation=5`) and which is then applied at every interval.
With a slack factor of 0.1, every run whose primary metric falls below *best metric so far* / (1 + 0.1) will be terminated.
See e.g. https://azure.github.io/azureml-sdk-for-r/reference/bandit_policy.html for more info.
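The termination rule can be written as a small function. This is a plain-Python sketch of the Bandit criterion for a maximized metric with a slack factor, not Azure's implementation; the function names and the example run metrics are hypothetical (0.93875 is the best `AUC_weighted` reported for this experiment further below).

```python
def bandit_threshold(best_metric, slack_factor):
    """Cut-off below which a run is terminated when the goal is to MAXIMIZE the metric."""
    return best_metric / (1 + slack_factor)


def should_terminate(run_metric, best_metric, slack_factor=0.1):
    """True if the run falls outside the slack allowed relative to the best run so far."""
    return run_metric < bandit_threshold(best_metric, slack_factor)


best = 0.93875
print(round(bandit_threshold(best, 0.1), 4))  # 0.8534
print(should_terminate(0.80, best))  # True: 0.80 is below the cut-off
print(should_terminate(0.90, best))  # False: 0.90 is within the slack
```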

The HyperDrive configuration itself needs to be set up correctly.
Among other things, it is important to:
- provide a training script,
- provide a compute target,
- provide an environment (here created from a conda specification file) for the training script, and
- provide a primary metric (here `AUC_weighted`) against which the termination policy will be evaluated.

In [4]:
import os
import shutil

# Copy the training script into its own project folder (the run's source directory)
project_folder = './sklearn-heart-failure'
os.makedirs(project_folder, exist_ok=True)
shutil.copy('train_sklearn.py', project_folder)
# shutil.copy('config.json', project_folder)

# Environment for the training script, built from the conda dependencies .yml file
sklearn_env = Environment.from_conda_specification(name='sklearn-env', file_path='./conda_dependencies.yml')

In [5]:
# Create an early termination policy. 
early_termination_policy = BanditPolicy(slack_factor = 0.1,
                                        delay_evaluation = 5)

# Define the hyperparameter search space, sampled at random
param_sampling = RandomParameterSampling( {
    '--C': uniform(0.001, 1.0),
    '--max_iter': choice(100, 1000, 2000)
} )

# Configure the training script run and the HyperDrive config
src = ScriptRunConfig(source_directory=project_folder,
                      script='train_sklearn.py',
                      compute_target=compute_target,
                      environment=sklearn_env)

# https://github.com/Azure/MachineLearningNotebooks/issues/1012
hyperdrive_config = HyperDriveConfig(run_config=src,
                                     hyperparameter_sampling=param_sampling,
                                     policy=early_termination_policy,
                                     primary_metric_name='AUC_weighted',
                                     primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                     max_total_runs=12)
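For intuition, the search space above can be mimicked with Python's `random` module. This is only an illustration of independent random draws from the same ranges, not HyperDrive's actual `RandomParameterSampling` implementation; `SEARCH_SPACE`, `sample_configurations` and the seed are hypothetical.

```python
import random

# Mirrors the search space passed to RandomParameterSampling:
# '--C' ~ uniform(0.001, 1.0), '--max_iter' ~ choice(100, 1000, 2000)
SEARCH_SPACE = {
    "--C": lambda rng: rng.uniform(0.001, 1.0),
    "--max_iter": lambda rng: rng.choice([100, 1000, 2000]),
}


def sample_configurations(n_runs, seed=0):
    """Draw n_runs independent hyperparameter configurations at random."""
    rng = random.Random(seed)
    return [{name: draw(rng) for name, draw in SEARCH_SPACE.items()} for _ in range(n_runs)]


# One configuration per child run (max_total_runs=12 in the config above)
for config in sample_configurations(12):
    print(config)
```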

In [6]:
# Submit the experiment
hyperdrive_run = experiment.submit(hyperdrive_config)

## Run Details

To get details on the run, you can have a look at the `RunDetails` widget.

In [7]:
RunDetails(hyperdrive_run).show()


## Best Model

After the run has finished, you can retrieve the best run and inspect its details.

In [10]:
best_run = hyperdrive_run.get_best_run_by_primary_metric()

In [11]:
print(best_run.get_details()['runId'])

HD_41ec6d6c-1130-44ab-a1ee-8a0d1b4e2e8a_8


In [12]:
print(best_run.get_details()['runDefinition']['arguments'])

['--C', '0.8980307209925704', '--max_iter', '100']


In [13]:
# Model Metrics
print(best_run.get_metrics())

{'Regularization Strength': 0.8980307209925704, 'Max iterations': 100, 'Accuracy': 0.8333333333333334, 'F(beta)': 0.8151455026455028, 'AUC_weighted': 0.93875}


In [14]:
best_run.get_details()

{'runId': 'HD_41ec6d6c-1130-44ab-a1ee-8a0d1b4e2e8a_8',
 'target': 'cpu-cluster',
 'status': 'Completed',
 'startTimeUtc': '2021-12-29T15:17:37.968228Z',
 'endTimeUtc': '2021-12-29T15:18:08.977184Z',
 'services': {},
 'properties': {'_azureml.ComputeTargetType': 'amlcompute',
  'ContentSnapshotId': '19f10f4d-b745-4036-b0d0-9006d501390b',
  'ProcessInfoFile': 'azureml-logs/process_info.json',
  'ProcessStatusFile': 'azureml-logs/process_status.json'},
 'inputDatasets': [],
 'outputDatasets': [],
 'runDefinition': {'script': 'train_sklearn.py',
  'command': '',
  'useAbsolutePath': False,
  'arguments': ['--C', '0.8980307209925704', '--max_iter', '100'],
  'sourceDirectoryDataStore': None,
  'framework': 'Python',
  'communicator': 'None',
  'target': 'cpu-cluster',
  'dataReferences': {},
  'data': {},
  'outputData': {},
  'datacaches': [],
  'jobName': None,
  'maxRunDurationSeconds': 2592000,
  'nodeCount': 1,
  'instanceTypes': [],
  'priority': None,
  'credentialPassthrough': False

In [15]:
print(best_run.get_file_names())

['logs/azureml/17_azureml.log', 'logs/azureml/dataprep/backgroundProcess.log', 'logs/azureml/dataprep/backgroundProcess_Telemetry.log', 'logs/azureml/dataprep/rslex.log', 'outputs/model.joblib', 'system_logs/cs_capability/cs-capability.log', 'system_logs/hosttools_capability/hosttools-capability.log', 'system_logs/lifecycler/execution-wrapper.log', 'system_logs/lifecycler/lifecycler.log', 'system_logs/lifecycler/vm-bootstrapper.log', 'user_logs/std_log.txt']


In [16]:
# Download the best model's file into the local project folder
best_run.download_file('outputs/model.joblib', 'sklearn-heart-failure/model.joblib')

In [17]:
# Register the best model
model = best_run.register_model(model_name='sklearn-heart-failure',
                                model_path='outputs/model.joblib',
                                model_framework=Model.Framework.SCIKITLEARN)

## Model Deployment

All that's left to do is to deploy the model.
After a successful deployment, we can query the endpoint by sending it a request.

In [18]:
service_name = 'hd-service'
service = Model.deploy(ws, service_name, [model])
service.wait_for_deployment(show_output=True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2021-12-29 15:25:33+00:00 Creating Container Registry if not exists.
2021-12-29 15:25:33+00:00 Registering the environment.
2021-12-29 15:25:35+00:00 Uploading autogenerated assets for no-code-deployment.
2021-12-29 15:25:36+00:00 Building image..
2021-12-29 15:29:02+00:00 Generating deployment configuration.
2021-12-29 15:29:03+00:00 Submitting deployment to compute..
2021-12-29 15:29:07+00:00 Checking the status of deployment hd-service..
2021-12-29 15:31:55+00:00 Checking the status of inference endpoint hd-service.
Succeeded
ACI service creation operation finished, operation "Succeeded"


In [19]:
print(service.state)

Healthy


Send a request to the web service you deployed to test it.

In [20]:
import json

data = {"data":
        [
            {
            'age': 75,
            'anaemia': 1,
            'creatinine_phosphokinase': 582,
            'diabetes': 0,
            'ejection_fraction': 17,
            'high_blood_pressure': 0,
            'platelets': 265000,
            'serum_creatinine': 3,
            'serum_sodium': 127,
            'sex': 1,
            'smoking': 0,
            'time': 5
            },
            {
            'age': 75,
            'anaemia': 1,
            'creatinine_phosphokinase': 582,
            'diabetes': 0,
            'ejection_fraction': 17,
            'high_blood_pressure': 0,
            'platelets': 265000,
            'serum_creatinine': 3,
            'serum_sodium': 127,
            'sex': 0,
            'smoking': 0,
            'time': 5
            }
      ]
    }

# Convert to JSON string
input_data = json.dumps(data)
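Before sending the request, it can be worth checking that every record carries the model's feature columns. A minimal sketch, assuming the deployed model expects exactly the twelve feature columns of the training data shown in `df.head()` above (every column except the label `DEATH_EVENT`); `FEATURES` and `check_payload` are hypothetical helpers.

```python
import json

# Assumed feature columns: every dataset column except the label DEATH_EVENT
FEATURES = [
    "age", "anaemia", "creatinine_phosphokinase", "diabetes", "ejection_fraction",
    "high_blood_pressure", "platelets", "serum_creatinine", "serum_sodium",
    "sex", "smoking", "time",
]


def check_payload(payload_json):
    """Return a list of problems found in a {"data": [record, ...]} request body."""
    problems = []
    records = json.loads(payload_json).get("data", [])
    for i, record in enumerate(records):
        missing = set(FEATURES) - set(record)
        extra = set(record) - set(FEATURES)
        if missing:
            problems.append(f"record {i}: missing {sorted(missing)}")
        if extra:
            problems.append(f"record {i}: unexpected {sorted(extra)}")
    return problems


# A complete record passes; an incomplete one is reported
print(check_payload(json.dumps({"data": [dict.fromkeys(FEATURES, 0)]})))
print(check_payload(json.dumps({"data": [{"age": 75}]})))
```

In the notebook, `check_payload(input_data)` should return an empty list for the request built above.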

In [23]:
output = service.run(input_data)

Print the web service's response, then delete the service.

In [24]:
print(output)

{'predict_proba': [[0.050291167184298824, 0.9497088328157012], [0.0502688219429116, 0.9497311780570884]]}


Above, we see the predicted probabilities of surviving vs. dying during the follow-up period (columns ordered as `DEATH_EVENT` = 0, 1) for two individuals, as given by the estimated logit model.

The only difference between the two tested subjects is their sex.
Being female or male, however, appears to have very little effect: both individuals have almost exactly the same predicted probability of dying during the follow-up period, roughly 95%, so a fatal event is highly likely for both.
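The size of the sex effect can be quantified on the log-odds scale. A minimal sketch, assuming the deployed model is a plain logistic regression on the raw (unscaled) features: since `sex` is the only input that differs (1 for the first record, 0 for the second), the difference in log-odds between the two individuals then equals the fitted coefficient of `sex`. The probabilities are copied from the service response above; `log_odds` is a hypothetical helper.

```python
import math

# Service response from above: one [P(DEATH_EVENT=0), P(DEATH_EVENT=1)] pair per individual
output = {"predict_proba": [[0.050291167184298824, 0.9497088328157012],
                            [0.0502688219429116, 0.9497311780570884]]}


def log_odds(p):
    """Logit transform: ln(p / (1 - p))."""
    return math.log(p / (1 - p))


# First record has sex=1, second has sex=0
p_sex1, p_sex0 = (row[1] for row in output["predict_proba"])
# Under the assumption above, this difference is the coefficient of `sex` times (1 - 0)
sex_effect = log_odds(p_sex1) - log_odds(p_sex0)
print(f"P(death): sex=1 -> {p_sex1:.4f}, sex=0 -> {p_sex0:.4f}")
print(f"implied log-odds effect of sex: {sex_effect:.5f}")
```

A log-odds difference this close to zero is consistent with the observation that sex barely moves the prediction for these two individuals.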

In [25]:
service.delete()

In [26]:
print(service.state)

Deleting
