# Hyperparameter Tuning using HyperDrive

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [24]:
from azureml.core import Workspace, Experiment, Dataset, Environment, ScriptRunConfig

from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
from azureml.core.compute_target import ComputeTargetException

from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.parameter_expressions import uniform, choice

from azureml.widgets import RunDetails

import os

import pandas as pd
import joblib

## Dataset

TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

In [2]:
ws = Workspace.from_config()
experiment_name = 'heartfailure-hyperdrive-exp'

experiment=Experiment(ws, experiment_name)

dataset = Dataset.get_by_name(ws, name="heartfailure-dds")

In [3]:
df = dataset.to_pandas_dataframe()
df.head()

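| Unnamed: 0 | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 75.0 | 0 | 582 | 0 | 20 | 1 | 265000.0 | 1.9 | 130 | 1 | 0 | 4 | 1 |
| 1 | 55.0 | 0 | 7861 | 0 | 38 | 0 | 263358.03 | 1.1 | 136 | 1 | 0 | 6 | 1 |
| 2 | 65.0 | 0 | 146 | 0 | 20 | 0 | 162000.0 | 1.3 | 129 | 1 | 1 | 7 | 1 |
| 3 | 50.0 | 1 | 111 | 0 | 20 | 0 | 210000.0 | 1.9 | 137 | 1 | 0 | 7 | 1 |
| 4 | 65.0 | 1 | 160 | 1 | 20 | 0 | 327000.0 | 2.7 | 116 | 0 | 0 | 8 | 1 |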


## Create or Attach an AmlCompute cluster

In [4]:
# NOTE: Copied from Project 2
# Choose a name for your CPU cluster
amlcompute_cluster_name = "hyperdrive-cluster"

# Verify that cluster does not exist already
try:
    compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',# for GPU, use "STANDARD_NC6"
                                                           #vm_priority = 'lowpriority', # optional
                                                           max_nodes=6)
    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True, min_node_count = 1, timeout_in_minutes = 5)
# For a more detailed view of current AmlCompute status, use get_status().

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


## Hyperdrive Configuration

TODO: Explain the model you are using and the reason for choosing the different hyperparameters, termination policy and config settings.

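Model:
- I reused the custom model from Project 1 to save time. Its training script takes two hyperparameters: `--C` (regularization) and `--max_iter` (maximum number of iterations).

Hyperparameters:
- Random parameter sampling gives a quick sweep through the parameter space, and its results are often almost as good as those of a more exhaustive sweep. `--C` is drawn uniformly from the range 0.1 to 5.0, and `--max_iter` is chosen from 20, 50 and 80.

Termination Policy:
- The Bandit policy terminates any run whose primary metric is significantly worse than that of the best run so far, which saves time and compute resources. Here it uses an evaluation interval of 2 and a slack factor of 0.1.

Config Settings:
- The primary metric is Accuracy and the goal is to maximize it. The sweep is limited to 20 runs in total, with at most 4 running concurrently.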
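
To make the slack factor more concrete, here is a minimal sketch of the check the Bandit policy applies when the goal is to maximize the metric: a run is cancelled when its metric falls below the best metric so far divided by (1 + slack factor). The helper names `bandit_cutoff` and `should_terminate` are my own and are not part of the Azure ML SDK, and the two candidate accuracies (0.65 and 0.72) are made-up values for illustration.

```python
def bandit_cutoff(best_metric: float, slack_factor: float) -> float:
    """Lowest metric a run may report without being cancelled (maximize goal)."""
    return best_metric / (1.0 + slack_factor)


def should_terminate(run_metric: float, best_metric: float,
                     slack_factor: float = 0.1) -> bool:
    """True if the run falls outside the slack band around the best run so far."""
    return run_metric < bandit_cutoff(best_metric, slack_factor)


# Best Accuracy reported later in this notebook, used here as the reference run.
best_so_far = 0.7666666666666667
cutoff = bandit_cutoff(best_so_far, slack_factor=0.1)

print(f"cutoff: {cutoff:.4f}")
print("0.65 terminated:", should_terminate(0.65, best_so_far))  # hypothetical run
print("0.72 terminated:", should_terminate(0.72, best_so_far))  # hypothetical run
```

With a slack factor of 0.1, a run has to stay within roughly 91% of the best Accuracy seen so far to keep running.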

In [5]:
run = experiment.start_logging()
experiment

| Name | Workspace | Report Page | Docs Page |
| --- | --- | --- | --- |
| heartfailure-hyperdrive-exp | quick-starts-ws-214673 | Link to Azure Machine Learning studio | Link to Documentation |


In [6]:
# TODO: Create an early termination policy. This is not required if you are using Bayesian sampling.
early_termination_policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)

#TODO: Create the different params that you will be using during training
param_sampling = RandomParameterSampling( {
        "--C": uniform(0.1, 5.0),           # regularization
        "--max_iter": choice(20, 50, 80)    # max number of iterations used
    }
)

#TODO: Create your estimator and hyperdrive config
# SKLearn estimator is deprecated. I will use ScriptRunConfig as proposed by the error message
# estimator = SKLearn()

# Make sure the training source directory exists
os.makedirs("./training", exist_ok=True)

# Setup environment for your training run
sklearn_env = Environment.from_conda_specification(name='sklearn-env', file_path='./models/env.yml')

src = ScriptRunConfig(source_directory="./training",
                      script='training.py',
                      compute_target=compute_target,
                      environment=sklearn_env)

hyperdrive_run_config = HyperDriveConfig(run_config=src,
                                     hyperparameter_sampling=param_sampling,
                                     policy=early_termination_policy,
                                     primary_metric_name='Accuracy',
                                     primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                     max_total_runs=20,
                                     max_concurrent_runs=4,
                                     )
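
To get a feel for what this sampling space can produce, here is a small local sketch that uses only Python's `random` module. It is not how HyperDrive draws its samples internally; `sample_config` and the fixed seed are my own illustrative choices, and only the ranges and the run count are taken from the configuration above.

```python
import random


def sample_config(rng: random.Random) -> dict:
    """Draw one hyperparameter combination from the same ranges as the sweep."""
    return {
        "--C": rng.uniform(0.1, 5.0),            # continuous range
        "--max_iter": rng.choice([20, 50, 80]),  # discrete options
    }


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_config(rng) for _ in range(20)]  # mirrors max_total_runs=20

# Show the first few combinations a random sweep could try.
for config in samples[:3]:
    print(config)
```

Each draw is independent of the others, which is why random sampling can cover a wide range of `--C` values with only 20 runs.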

In [7]:
#TODO: Submit your experiment
hyperdrive_run = experiment.submit(hyperdrive_run_config, show_output=True)
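
The sampled values reach `training.py` as command-line arguments named exactly like the keys of the sampling space. The script itself is not shown in this notebook, so the following is only a sketch of how such a script might parse them with `argparse`; the `parse_args` helper and the default values are assumptions. The script also has to log a metric called `Accuracy` so that HyperDrive can compare runs, which is left out here.

```python
import argparse


def parse_args(argv=None):
    """Parse the two hyperparameters that the sweep passes on the command line."""
    parser = argparse.ArgumentParser()
    # Argument names match the keys of the sampling space; defaults are assumed.
    parser.add_argument("--C", type=float, default=1.0,
                        help="regularization parameter")
    parser.add_argument("--max_iter", type=int, default=100,
                        help="max number of iterations used")
    return parser.parse_args(argv)


# Simulate the arguments of one child run instead of reading sys.argv.
args = parse_args(["--C", "1.1528233412444724", "--max_iter", "80"])
print(args.C, args.max_iter)
```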

## Run Details

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [10]:
RunDetails(hyperdrive_run).show()
hyperdrive_run.wait_for_completion(show_output=True)

RunId: HD_6fccd9ee-3e7e-43a7-8d23-2df8674fad7d
Web View: https://ml.azure.com/runs/HD_6fccd9ee-3e7e-43a7-8d23-2df8674fad7d?wsid=/subscriptions/d7f39349-a66b-446e-aba6-0053c2cf1c11/resourcegroups/aml-quickstarts-214673/workspaces/quick-starts-ws-214673&tid=660b3398-b80e-49d2-bc5b-ac1dc93b5254

{'runId': 'HD_6fccd9ee-3e7e-43a7-8d23-2df8674fad7d',
 'target': 'hyperdrive-cluster',
 'status': 'Completed',
 'startTimeUtc': '2022-11-08T18:07:56.300492Z',
 'endTimeUtc': '2022-11-08T18:13:36.892263Z',
 'services': {},
 'properties': {'primary_metric_config': '{"name":"Accuracy","goal":"maximize"}',
  'resume_from': 'null',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive',
  'platform': 'AML',
  'ContentSnapshotId': '3fc06814-c507-417a-8a31-97fb4632777b',
  'user_agent': 'python/3.8.5 (Linux-5.15.0-1017-azure-x86_64-with-glibc2.10) msrest/0.7.1 Hyperdrive.Service/1.0.0 Hyperdrive.SDK/core.1.44.0',
  'space_size': 'infinite_space_size',
  'score': '0.7666666666666667',
  'best_child_run_id': 'HD_6fccd9ee-3e7e-43a7-8d23-2df8674fad7d_8',
  'best_metric_status': 'Succeeded',
  'best_data_container_id': 'dcid.HD_6fccd9ee-3e7e-43a7-8d23-2df8674fad7d_8'},
 'inputDatasets': [],
 'outputDatasets': [],
 'runDefinition': {'configuration': None,
  'attribution': None,
  'teleme

## Best Model

TODO: In the cell below, get the best model from the hyperdrive experiments and display all the properties of the model.

In [15]:
best_run = hyperdrive_run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics()
print(best_run)
print(best_run_metrics)

Run(Experiment: heartfailure-hyperdrive-exp,
Id: HD_6fccd9ee-3e7e-43a7-8d23-2df8674fad7d_8,
Type: azureml.scriptrun,
Status: Completed)
{'Accuracy': 0.7666666666666667}


In [19]:
parameter_values = best_run.get_details()['runDefinition']['arguments']

print('Best Run Id: ', best_run.id)
print('\n Accuracy:', best_run_metrics['Accuracy'])
print(parameter_values)

Best Run Id:  HD_6fccd9ee-3e7e-43a7-8d23-2df8674fad7d_8

 Accuracy: 0.7666666666666667
['--C', '1.1528233412444724', '--max_iter', '80']
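
The arguments come back as a flat list of strings, which is awkward to reuse. A small helper can pair them up into a dictionary and convert them back to numbers. `arguments_to_dict` is a hypothetical helper of my own, and the list is copied from the output above so that the sketch runs without a workspace.

```python
def arguments_to_dict(arguments):
    """Pair up ['--name', 'value', ...] into {'name': 'value', ...}."""
    names = arguments[0::2]   # every even position holds a flag
    values = arguments[1::2]  # every odd position holds its value
    return {name.lstrip("-"): value for name, value in zip(names, values)}


# Copied from the output of the cell above.
parameter_values = ['--C', '1.1528233412444724', '--max_iter', '80']

best_params = arguments_to_dict(parameter_values)
best_C = float(best_params["C"])
best_max_iter = int(best_params["max_iter"])

print(best_params)
print(best_C, best_max_iter)
```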


In [26]:
#TODO: Save the best model
best_model = best_run.register_model(model_name="HD_6fccd9ee-3e7e-43a7-8d23-2df8674fad7d_9", model_path="outputs/heart_failure_hyperdrive.pkl")

print(best_model)

Model(workspace=Workspace.create(name='quick-starts-ws-214673', subscription_id='d7f39349-a66b-446e-aba6-0053c2cf1c11', resource_group='aml-quickstarts-214673'), name=HD_6fccd9ee-3e7e-43a7-8d23-2df8674fad7d_9, id=HD_6fccd9ee-3e7e-43a7-8d23-2df8674fad7d_9:4, version=4, tags={}, properties={})


## Model Deployment

Remember that you only have to deploy one of the two models you trained, but you still need to register both models. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model
#### I already deployed the AutoML model, so this model is not deployed here

In [None]:
# best_run.register_model(model_name="hyper-heart-mdl", model_path="models/hyper-model.joblib")

**Submission Checklist**
- I have registered the model.


