### Tuning hyperparameters

#### Introduction

Hyperparameter tuning is accomplished by training multiple models, using the same algorithm and training data but different hyperparameter values. The model produced by each training run is then evaluated on the performance metric you want to optimize (for example, accuracy), and the best-performing model is selected.

In Azure Machine Learning, you achieve this through an experiment that consists of a **hyperdrive run**, which initiates a child run for each hyperparameter combination to be tested.

#### Search Space

The set of hyperparameter values tried during hyperparameter tuning is known as the **search space**. This space can hold discrete or continuous values. Discrete hyperparameters can be defined with `choice` or with one of the quantized distributions `qnormal`, `quniform`, `qlognormal` or `qloguniform`; continuous hyperparameters can be sampled from `normal`, `uniform`, `lognormal` or `loguniform`.
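To make the difference between a continuous and a quantized distribution concrete, the sketch below mimics `uniform` and `quniform` in plain Python. The helper names `sample_uniform` and `sample_quniform` are hypothetical, and the rounding rule `round(x / q) * q` is the usual definition of a quantized uniform draw. Read it as an approximation of the behaviour, not as the SDK's implementation.

```python
import random


def sample_uniform(low, high, rng):
    """Draw a continuous value between low and high."""
    return rng.uniform(low, high)


def sample_quniform(low, high, q, rng):
    """Draw a value between low and high, then snap it to a multiple of q."""
    return round(rng.uniform(low, high) / q) * q


rng = random.Random(0)  # fixed seed so the sketch is reproducible
continuous = [sample_uniform(0.01, 1.0, rng) for _ in range(5)]
quantized = [sample_quniform(10, 100, 10, rng) for _ in range(5)]

print(continuous)  # five floats between 0.01 and 1.0
print(quantized)   # five multiples of 10 between 10 and 100
```

The quantized draw only ever returns values on a fixed grid, which is why it suits hyperparameters such as a number of estimators.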

#### Sampling

There are multiple ways to perform the search:
- _Grid Search_, which trains a model for every possible combination of hyperparameter values. It can only be used with discrete values.
- _Random sampling_, which randomly chooses a value for each hyperparameter. It can be used with both continuous and discrete values.
- _Bayesian sampling_, which chooses each new combination of hyperparameters based on how the previous selections performed, aiming to improve on them. It can be used with discrete and continuous values, but it cannot be combined with an early-termination policy.
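The first two strategies can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how the service implements it. The function names `grid_sampling` and `random_sampling` are hypothetical, and the search space reuses the values passed to `choice` in the experiment later in this section. Bayesian sampling is left out because it needs a model of the previous results.

```python
import itertools
import random

# Discrete search space with the same values used in the experiment
search_space = {
    "--learning_rate": [0.01, 0.1, 1.0],
    "--n_estimators": [10, 100],
}


def grid_sampling(space):
    """Yield every combination of the discrete values."""
    names = list(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def random_sampling(space, n_runs, rng):
    """Yield n_runs combinations, picking each value independently at random."""
    for _ in range(n_runs):
        yield {name: rng.choice(values) for name, values in space.items()}


grid_runs = list(grid_sampling(search_space))
random_runs = list(random_sampling(search_space, 4, random.Random(0)))

print(len(grid_runs))  # 3 learning rates x 2 estimator counts
for combo in random_runs:
    print(combo)
```

Grid sampling grows multiplicatively with every hyperparameter added, while random sampling lets you fix the number of runs in advance.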

#### Configuring early termination

If the search space is large, a hyperparameter tuning experiment can take a long time to complete. Limiting the maximum number of iterations (that is, child runs) helps, but it can still leave a large number of runs that never improve on the combinations already tried.
For this reason, an _early termination_ policy can be set, which abandons runs that are unlikely to give a better result than previous runs. The policy checks the target metric at every _evaluation_interval_, and a minimum number of initial iterations can also be required before the policy is first applied.

There are multiple choices:
- _Bandit policy_: abandons a run if its target performance metric underperforms the best result so far by a specified margin.
- _Median stopping policy_: abandons runs whose target performance metric is worse than the median of the running averages for all runs.
- _Truncation selection policy_: cancels the lowest performing X% of runs at each evaluation interval based on the truncation_percentage value you specify for X.
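The decision rule behind each policy can be sketched in plain Python, assuming the metric is being maximized. All function names and the AUC values are hypothetical. The bandit margin is expressed as a slack factor (stop when the metric falls below `best / (1 + slack_factor)`), and the median rule is simplified to compare against a plain median of the values supplied, so take this as an approximation and not the service's exact logic.

```python
import statistics


def bandit_should_stop(metric, best_metric, slack_factor):
    """Bandit rule: stop if the metric falls below best / (1 + slack_factor)."""
    return metric < best_metric / (1 + slack_factor)


def median_should_stop(metric, run_averages):
    """Median stopping rule: stop if the metric is below the median of the runs."""
    return metric < statistics.median(run_averages)


def truncation_select(metrics_by_run, truncation_percentage):
    """Truncation rule: return the ids of the lowest-performing X% of runs."""
    n_cancel = int(len(metrics_by_run) * truncation_percentage / 100)
    ranked = sorted(metrics_by_run, key=metrics_by_run.get)  # worst first
    return ranked[:n_cancel]


# Hypothetical AUC values reported by five runs at one evaluation interval
auc = {"run_0": 0.80, "run_1": 0.95, "run_2": 0.70, "run_3": 0.90, "run_4": 0.85}
best = max(auc.values())

bandit_stopped = [r for r, m in auc.items() if bandit_should_stop(m, best, 0.1)]
median_stopped = [r for r, m in auc.items() if median_should_stop(m, list(auc.values()))]
truncated = truncation_select(auc, truncation_percentage=40)

print(bandit_stopped)
print(median_stopped)
print(truncated)
```

With these numbers the bandit rule is the most aggressive, because its threshold tracks the single best run, whereas the median and truncation rules only compare runs against the rest of the group.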

To perform a hyperparameter tuning task with Azure Machine Learning, you have to define a parameterized script, log the target metric, and build an experiment which manages the child runs.
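The training script itself is not shown in this section, so the following is only a guess at how its argument handling might look. The function name `parse_hyperparameters` and the default values are hypothetical; the argument names match the keys of the search space used in the experiment.

```python
import argparse


def parse_hyperparameters(argv=None):
    """Read the hyperparameters that each child run passes on the command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning_rate", type=float, default=0.1)
    parser.add_argument("--n_estimators", type=int, default=100)
    # parse_known_args ignores extra arguments such as --input-data
    args, _ = parser.parse_known_args(argv)
    return args


# Simulate the arguments one child run might receive
args = parse_hyperparameters(["--learning_rate", "0.01", "--n_estimators", "10"])
print(args.learning_rate, args.n_estimators)

# In the real script, the model would be trained here and the target metric
# logged under the same name given as primary_metric_name (for example 'AUC').
```

The logged metric name has to match `primary_metric_name` exactly, otherwise the hyperdrive run has nothing to rank the child runs by.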

In [15]:
from azureml.core import Workspace

ws = Workspace.from_config()

In [22]:
data = ws.datasets.get("diabetes dataset")

In [17]:
import os

experiment_folder = './Script/diabetes_training-hyperdrive'
os.makedirs(experiment_folder, exist_ok = True)

In [18]:
from azureml.core import ComputeTarget

training_cluster = ComputeTarget(workspace=ws, name='aml-cluster')

In [19]:
from azureml.core import Environment

env = Environment.get(workspace=ws, name='experiment_env')

In [29]:
from azureml.core import Experiment, ScriptRunConfig
from azureml.train.hyperdrive import GridParameterSampling, HyperDriveConfig, PrimaryMetricGoal, choice
from azureml.widgets import RunDetails

script_config = ScriptRunConfig(source_directory='./Script',
                                script='9_Hyperparameter_Tuning_Script.py',
                                arguments=['--input-data', data.as_named_input('training_data')],
                                environment=env,
                                compute_target=training_cluster)

params = GridParameterSampling(
    {
        '--learning_rate': choice(0.01, 0.1, 1.0),
        '--n_estimators' : choice(10, 100)
    }
)

hyperdrive = HyperDriveConfig(run_config=script_config,
                          hyperparameter_sampling=params,
                          policy=None, # No early stopping policy
                          primary_metric_name='AUC', # Find the highest AUC metric
                          primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                          max_total_runs=6, # Restrict the experiment to 6 iterations
                          max_concurrent_runs=2) # Run up to 2 iterations in parallel


experiment = Experiment(workspace = ws, name = 'hyperparameter-tuning')
run = experiment.submit(config = hyperdrive)

RunDetails(run).show()
run.wait_for_completion()

{'runId': 'HD_ce051773-17ac-4187-be12-b4cca0d186ea',
 'target': 'aml-cluster',
 'status': 'Completed',
 'startTimeUtc': '2023-01-04T16:55:52.68081Z',
 'endTimeUtc': '2023-01-04T17:00:28.747047Z',
 'services': {},
 'properties': {'primary_metric_config': '{"name":"AUC","goal":"maximize"}',
  'resume_from': 'null',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive',
  'platform': 'AML',
  'ContentSnapshotId': '70cd6820-f81c-41a0-8bd7-2ce69d0b2350',
  'user_agent': 'python/3.7.15 (Windows-10-10.0.19041-SP0) msrest/0.7.1 Hyperdrive.Service/1.0.0 Hyperdrive.SDK/core.1.48.0',
  'space_size': '6',
  'score': '0.9881777700383103',
  'best_child_run_id': 'HD_ce051773-17ac-4187-be12-b4cca0d186ea_3',
  'best_metric_status': 'Succeeded',
  'best_data_container_id': 'dcid.HD_ce051773-17ac-4187-be12-b4cca0d186ea_3'},
 'inputDatasets': [],
 'outputDatasets': [],
 'runDefinition': {'configuration': None,
  'attribution': None,
  'telemetryValues': {'amlClientType': 'azureml-sdk-train',

In [30]:
# Child runs sorted by the primary metric, best first.

for child_run in run.get_children_sorted_by_primary_metric():
    print(child_run)

{'run_id': 'HD_ce051773-17ac-4187-be12-b4cca0d186ea_3', 'hyperparameters': '{"--learning_rate": 0.1, "--n_estimators": 100}', 'best_primary_metric': 0.9881777700383103, 'status': 'Completed'}
{'run_id': 'HD_ce051773-17ac-4187-be12-b4cca0d186ea_5', 'hyperparameters': '{"--learning_rate": 1.0, "--n_estimators": 100}', 'best_primary_metric': 0.9875809741778196, 'status': 'Completed'}
{'run_id': 'HD_ce051773-17ac-4187-be12-b4cca0d186ea_4', 'hyperparameters': '{"--learning_rate": 1.0, "--n_estimators": 10}', 'best_primary_metric': 0.9826070948803424, 'status': 'Completed'}
{'run_id': 'HD_ce051773-17ac-4187-be12-b4cca0d186ea_1', 'hyperparameters': '{"--learning_rate": 0.01, "--n_estimators": 100}', 'best_primary_metric': 0.954292502114533, 'status': 'Completed'}
{'run_id': 'HD_ce051773-17ac-4187-be12-b4cca0d186ea_2', 'hyperparameters': '{"--learning_rate": 0.1, "--n_estimators": 10}', 'best_primary_metric': 0.9511871237374994, 'status': 'Completed'}
{'run_id': 'HD_ce051773-17ac-4187-be12-b4c

In [31]:
best_run = run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics()
script_arguments = best_run.get_details()['runDefinition']['arguments']

In [32]:
print(best_run)

Run(Experiment: hyperparameter-tuning,
Id: HD_ce051773-17ac-4187-be12-b4cca0d186ea_3,
Type: azureml.scriptrun,
Status: Completed)


In [None]:
print(best_run_metrics)

In [None]:
# Register only the best model, taken from the outputs folder of the best child run.

best_run.register_model(model_path = 'outputs/diabetes_model.pkl',
                        model_name = 'diabetes_model',
                        tags = {'Training context': 'Hyperdrive'},
                        properties={'AUC': best_run_metrics['AUC'], 'Accuracy': best_run_metrics['Accuracy']})