## Hyperparameter tuning using HyperDrive

The AzureML SDK offers the HyperDriveConfig class, which allows you to 
perform hyperparameter tuning for your models. It parallelizes the search for the best 
hyperparameter combination by training and evaluating models on the nodes of the 
compute cluster in parallel. HyperDriveConfig is a wrapper around the 
ScriptRunConfig class: you pass the ScriptRunConfig that you want to use to train 
your model in the run_config parameter. You also need to specify the metric that your 
code is logging and what your goal is for that metric. In the diabetes case, you are trying 
to minimize the NRMSE metric. You can then kick off the hyperparameter tuning process 
with the same submit method of the Experiment class.

Besides ScriptRunConfig, you will need to pass the hyperparameter sampling 
configuration that HyperDriveConfig will use. Hyperparameters can accept either 
discrete or continuous values:

• Typical examples of discrete values are integers and strings. For example, in the 
TensorFlow framework, you can select the activation function to use by passing 
a string value to the activation hyperparameter. These string values represent 
the built-in activation functions that the TensorFlow framework supports. You can 
select values like selu for the Scaled Exponential Linear Unit (SELU) or relu for 
the Rectified Linear Unit (ReLU).

• A typical example of continuous values is float values. The alpha parameter in 
the LassoLars model you have been training is a hyperparameter that accepts 
float values.

The AzureML SDK offers a set of functions that allow you to define the search space 
you are about to explore. These functions are part of the 
azureml.train.hyperdrive.parameter_expressions module.
In the case of discrete hyperparameters, you can use the choice function, which 
allows you to specify the list of options the hyperparameter can take.

For example, you could define the search space for the discrete string values of the 
activation hyperparameter you saw previously with the following expression:
choice('selu', 'relu')

You can also define the probability distribution of the samples that you will be getting 
while you are exploring the search space. For example, if you want to give an equal 
chance to all values, you will use a uniform distribution. On the other hand, you can 
use a normal distribution to focus the search on the center of the search space. The 
AzureML SDK offers several functions you can use, such as uniform(low, high), 
loguniform(low, high), normal(μ, σ), and lognormal(μ, σ). For discrete values, you 
can use the q-prefixed equivalents, such as quniform(low, high, q), 
qloguniform(low, high, q), qnormal(μ, σ, q), and qlognormal(μ, σ, q), 
where the q parameter is the quantization factor that converts continuous values into 
discrete ones.
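
The effect of the quantization factor can be illustrated without the SDK. The following 
is a minimal sketch that assumes quniform behaves like round(uniform(low, high) / q) * q; 
the quniform_sketch helper is a hypothetical stand-in written for this illustration, not 
the SDK function.

```python
import random


def quniform_sketch(low, high, q, rng):
    """Draw a uniform sample from [low, high] and snap it to a multiple of q.

    Hypothetical stand-in for illustration only; it is not the AzureML function.
    """
    return round(rng.uniform(low, high) / q) * q


rng = random.Random(0)  # fixed seed so that the sketch is repeatable
samples = [quniform_sketch(0, 10, 2, rng) for _ in range(1000)]

# Every sample is a multiple of q=2 between 0 and 10.
print(sorted(set(samples)))
```

With q=2, the continuous range from 0 to 10 collapses to a handful of discrete values, 
which is what makes the q-prefixed expressions usable for integer-like hyperparameters.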

Once you have defined the search space, you need to specify the sampling strategy that 
you will use to select each hyperparameter combination that you are going to test. 
The AzureML SDK supports the following sampling methods, which are defined 
in the azureml.train.hyperdrive module:

• Grid sampling: This method is implemented in the GridParameterSampling class. 
It supports only discrete hyperparameter values that are defined using the choice 
function you saw above. The AzureML SDK will search all possible hyperparameter 
combinations of those discrete values. Imagine that you wanted to explore the 
following four parameter combinations:
  ◦ a=0.01 and b=10
  ◦ a=0.01 and b=100
  ◦ a=0.5 and b=10
  ◦ a=0.5 and b=100
You would describe them by giving each hyperparameter its own choice expression, 
choice(0.01, 0.5) for a and choice(10, 100) for b, and passing both to the 
GridParameterSampling class.

• Random sampling: This technique is implemented in the 
RandomParameterSampling class. It allows you to randomly select 
hyperparameter values from the available options. It supports both discrete 
and continuous hyperparameters.

• Bayesian sampling: This method is implemented in the BayesianParameterSampling 
class. It picks samples based on how the previous samples performed. It requires at 
least 20 runs multiplied by the number of hyperparameters you are fine-tuning. This 
means that if you are fine-tuning two hyperparameters, you will need at least 
20 x 2 = 40 runs in the max_total_runs parameter you will read about next. It 
supports both discrete and continuous hyperparameters.
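
To make the grid and Bayesian arithmetic concrete, here is a small sketch in plain Python. 
It does not call the SDK: the search_space dictionary and both helper functions are 
hypothetical names used only to illustrate how the four grid combinations and the 
40-run minimum come about.

```python
from itertools import product

# Hypothetical search space: each hyperparameter maps to its discrete options,
# mirroring what choice(0.01, 0.5) and choice(10, 100) would describe.
search_space = {"a": [0.01, 0.5], "b": [10, 100]}


def grid_combinations(space):
    """Enumerate every combination of the discrete options, as grid sampling does."""
    names = list(space)
    return [dict(zip(names, values)) for values in product(*space.values())]


def min_bayesian_runs(space, runs_per_hyperparameter=20):
    """Minimum run count stated in the text: 20 runs per tuned hyperparameter."""
    return runs_per_hyperparameter * len(space)


combos = grid_combinations(search_space)
for combo in combos:
    print(combo)
print("Minimum runs for Bayesian sampling:", min_bayesian_runs(search_space))
```

The size of the grid is the product of the number of options per hyperparameter, so it 
grows quickly as you add hyperparameters or options.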



In [24]:
from azureml.core import Workspace, Environment
import sklearn
from sklearn.datasets import load_diabetes
import pandas as pd

In [23]:
X,y=load_diabetes(return_X_y=True)

In [25]:
diabetes_data=pd.DataFrame(X)
diabetes_data["target"]=y

In [26]:
diabetes_data.head()

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | target |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.038076 | 0.05068 | 0.061696 | 0.021872 | -0.044223 | -0.034821 | -0.043401 | -0.002592 | 0.019908 | -0.017646 | 151.0 |
| 1 | -0.001882 | -0.044642 | -0.051474 | -0.026328 | -0.008449 | -0.019163 | 0.074412 | -0.039493 | -0.06833 | -0.092204 | 75.0 |
| 2 | 0.085299 | 0.05068 | 0.044451 | -0.005671 | -0.045599 | -0.034194 | -0.032356 | -0.002592 | 0.002864 | -0.02593 | 141.0 |
| 3 | -0.089063 | -0.044642 | -0.011595 | -0.036656 | 0.012191 | 0.024991 | -0.036038 | 0.034309 | 0.022692 | -0.009362 | 206.0 |
| 4 | 0.005383 | -0.044642 | -0.036385 | 0.021872 | 0.003935 | 0.015596 | 0.008142 | -0.002592 | -0.031991 | -0.046641 | 135.0 |


In [29]:
ws=Workspace.from_config()

diabetes_data.to_csv('diabetes-training/diabetes_data.csv',
                    index=False)

dstore = ws.get_default_datastore()

dstore.upload_files(
    files=['diabetes-training/diabetes_data.csv'],
    target_path="/samples/diabetes/v1", 
    overwrite=True,
    show_progress=True)

Uploading an estimated of 1 files
Uploading diabetes-training/diabetes_data.csv
Uploaded diabetes-training/diabetes_data.csv, 1 files out of an estimated total of 1
Uploaded 1 files


$AZUREML_DATAREFERENCE_0230551c01324569b7464e45b1cc69c1

In [36]:
from azureml.core import Dataset
dstore = ws.get_default_datastore()
file_paths = [
    (dstore, "/samples/diabetes/v1")
]

tabular_dataset = Dataset.Tabular.from_delimited_files(
    path=file_paths,
    validate=False)

tabular_dataset.register(
    workspace=ws,
    name="diabetes",
    description="The sklearn diabetes dataset")

{
  "source": [
    "('workspaceblobstore', '/samples/diabetes/v1')"
  ],
  "definition": [
    "GetDatastoreFiles",
    "ParseDelimited",
    "DropColumns",
    "SetColumnTypes"
  ],
  "registration": {
    "id": "24bfd17e-0af6-46e0-a850-a1436e2de5fe",
    "name": "diabetes",
    "version": 1,
    "description": "The sklearn diabetes dataset",
    "workspace": "Workspace.create(name='eymlws', subscription_id='afe91b36-9760-4bb8-9dd6-72761af8d4ef', resource_group='eymlops')"
  }
}

In [37]:
from azureml.core.conda_dependencies import CondaDependencies

In [38]:
diabetes_env=Environment(name="diabetes-training-env")

In [39]:
diabetes_env.python.conda_dependencies=\
    CondaDependencies.create(
        conda_packages=[f"scikit-learn=={sklearn.__version__}"],
        pip_packages=["azureml-core", 
          "azureml-dataset-runtime[pandas]"])

In [40]:
target=ws.compute_targets["akt-cluster"]

In [41]:
from azureml.core import ScriptRunConfig

script=ScriptRunConfig(
    source_directory="diabetes-training",
    script="training.py",
    environment=diabetes_env,
    compute_target=target
)
# Note that you don't specify the --alpha argument.

In [42]:
from azureml.train.hyperdrive import HyperDriveConfig
from azureml.train.hyperdrive import RandomParameterSampling, uniform, PrimaryMetricGoal

In [43]:
param_sampling = RandomParameterSampling({
        'alpha': uniform(0.00001, 0.1),
    }
)

In [44]:
hd_config = HyperDriveConfig(
                     run_config=script,                          
                     hyperparameter_sampling=param_sampling,
                     primary_metric_name="nrmse", 
                     primary_metric_goal=                   
                                 PrimaryMetricGoal.MINIMIZE,
                     max_total_runs=20,
                     max_concurrent_runs=4)

In [45]:
from azureml.core import Experiment

experiment = Experiment(ws, "diabetes_training_hyperdrive")
hyperdrive_run = experiment.submit(hd_config)

hyperdrive_run.wait_for_completion(show_output=True)

RunId: HD_02e4302d-1aa0-41b3-99b3-216e372d48a7
Web View: https://ml.azure.com/runs/HD_02e4302d-1aa0-41b3-99b3-216e372d48a7?wsid=/subscriptions/afe91b36-9760-4bb8-9dd6-72761af8d4ef/resourcegroups/eymlops/workspaces/eymlws&tid=bdf67597-3a9e-4e19-958e-1d8222afa3de

Streaming azureml-logs/hyperdrive.txt

[2022-10-24T09:57:44.956470][GENERATOR][INFO]Trying to sample '4' jobs from the hyperparameter space
[2022-10-24T09:57:51.8618113Z][SCHEDULER][INFO]Scheduling job, id='HD_02e4302d-1aa0-41b3-99b3-216e372d48a7_0' 
[2022-10-24T09:57:51.9960497Z][SCHEDULER][INFO]Scheduling job, id='HD_02e4302d-1aa0-41b3-99b3-216e372d48a7_1' 
[2022-10-24T09:57:52.1459823Z][SCHEDULER][INFO]Scheduling job, id='HD_02e4302d-1aa0-41b3-99b3-216e372d48a7_2' 
[2022-10-24T09:57:52.214509][GENERATOR][INFO]Successfully sampled '4' jobs, they will soon be submitted to the execution target.
[2022-10-24T09:57:52.3276332Z][SCHEDULER][INFO]Successfully scheduled a job. Id='HD_02e4302d-1aa0-41b3-99b3-216e372d48a7_2' 
[2022-10-2

{'runId': 'HD_02e4302d-1aa0-41b3-99b3-216e372d48a7',
 'target': 'akt-cluster',
 'status': 'Completed',
 'startTimeUtc': '2022-10-24T09:57:44.260064Z',
 'endTimeUtc': '2022-10-24T10:10:48.856463Z',
 'services': {},
 'properties': {'primary_metric_config': '{"name":"nrmse","goal":"minimize"}',
  'resume_from': 'null',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive',
  'platform': 'AML',
  'ContentSnapshotId': '56407a75-eb8e-43c0-a44b-d48be13fc8da',
  'user_agent': 'python/3.8.5 (Linux-5.15.0-1017-azure-x86_64-with-glibc2.10) msrest/0.7.1 Hyperdrive.Service/1.0.0 Hyperdrive.SDK/core.1.44.0',
  'space_size': 'infinite_space_size',
  'score': '0.18079576030808062',
  'best_child_run_id': 'HD_02e4302d-1aa0-41b3-99b3-216e372d48a7_2',
  'best_metric_status': 'Succeeded',
  'best_data_container_id': 'dcid.HD_02e4302d-1aa0-41b3-99b3-216e372d48a7_2'},
 'inputDatasets': [],
 'outputDatasets': [],
 'runDefinition': {'configuration': None,
  'attribution': None,
  'telemetryValues

In this code, you define a RandomParameterSampling approach to explore 
uniformly distributed values, ranging from 0.00001 to 0.1, for the alpha 
argument that will be passed to the training script you created in step 3. This 
training script accepts the --alpha argument, which is then passed to the alpha 
hyperparameter of the LassoLars model. 
You assign this RandomParameterSampling configuration to the 
hyperparameter_sampling argument of HyperDriveConfig.

You have also configured the run_config parameter of HyperDriveConfig 
to use the ScriptRunConfig object.

Note that you did not specify the --alpha argument in the ScriptRunConfig object. 
The value sampled by RandomParameterSampling is passed to the script 
for each run. 
You then specify that the produced models will be evaluated using the nrmse 
metric that the training script is logging (the primary_metric_name 
parameter). You also specify that you are trying to minimize that value (the 
primary_metric_goal parameter), since it is an error metric. 
The last two parameters, max_total_runs and max_concurrent_runs, 
control the resources you are willing to invest in finding the best model. The 
max_total_runs parameter controls the maximum number of runs 
to perform. This can be between 1 and 1,000 runs. This is a required parameter. 
max_concurrent_runs is an optional parameter and controls the maximum 
number of runs that execute concurrently.

There is one more optional parameter you can use 
to limit the amount of time you are searching for the optimal hyperparameter
combination. The max_duration_minutes parameter, which you did not 
specify in the sample above, defines the maximum duration in minutes to run the 
hyperparameter tuning process. After that timeout, all subsequent scheduled runs 
are automatically canceled.

In [46]:
best_run = hyperdrive_run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics(name='nrmse')
parameter_values = best_run.get_details()[
                        'runDefinition']['arguments']

print('Best Run Id: ', best_run.id)
print('- NRMSE:', best_run_metrics['nrmse'])
print('- alpha:', parameter_values[1])

Best Run Id:  HD_02e4302d-1aa0-41b3-99b3-216e372d48a7_2
- NRMSE: 0.18079576030808062
- alpha: 0.021776833853814508
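
Reading the value by position (parameter_values[1]) works here because the script 
receives a single hyperparameter. A slightly more robust approach is sketched below, 
under the assumption that the arguments list alternates between flag names and values, 
as in ['--alpha', '<value>']. The arguments_to_dict helper and the example list are 
hypothetical and only illustrate the idea.

```python
def arguments_to_dict(arguments):
    """Pair each '--name' flag with the value that follows it in the list."""
    pairs = zip(arguments[::2], arguments[1::2])
    return {flag.lstrip("-"): value for flag, value in pairs}


# Hypothetical argument list, shaped like the one the best run reports above.
example_arguments = ["--alpha", "0.021776833853814508"]
params = arguments_to_dict(example_arguments)
print("- alpha:", params["alpha"])
```

Looking values up by name keeps the reporting code unchanged if you later add more 
hyperparameters to the search space.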


## Using the early termination policy

The HyperDriveConfig constructor also accepts a policy argument. This 
argument takes an EarlyTerminationPolicy object, which defines the policy under 
which runs can be terminated early. By default, this parameter has a None value, which 
means that the NoTerminationPolicy class will be used, allowing each run to execute 
until completion. 
To be able to use an early termination policy, your script must perform multiple 
iterations during each run and log the primary metric in each of them.
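
As a rough illustration of what "multiple iterations" means, the sketch below reports a 
metric once per epoch. Everything in it is hypothetical: the train function, the made-up 
metric formula, and the log callback. In an AzureML training script, the callback would 
typically be the log method of the object returned by Run.get_context().

```python
def train(a, b, log, epochs=20):
    """Hypothetical training loop that reports the primary metric once per epoch."""
    metric = 0.0
    for epoch in range(epochs):
        # Stand-in for a real training step; the metric formula is made up.
        metric = a * epoch + b
        # Each logged value gives the termination policy a new point to compare.
        log("fake_metric", metric)
    return metric


# Outside AzureML, collect the logged values in a list to see what a policy would observe.
history = []
final = train(a=1, b=2, log=lambda name, value: history.append((name, value)), epochs=5)
print(history)
```

A script that logs the metric only once, at the end of training, gives the policy nothing 
to act on before the run has already finished.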


Ideally, you would like to reduce the amount of time you spend waiting for all the runs 
to complete. EarlyTerminationPolicy allows you to monitor the jobs that are running 
and, if they are performing poorly compared to the rest of the jobs, cancel them early. 

The AzureML SDK offers a few built-in EarlyTerminationPolicy implementations, 
located in the azureml.train.hyperdrive module:

• NoTerminationPolicy: This is the default stopping policy that allows all runs 
to complete.

• MedianStoppingPolicy: The median stopping policy computes the running 
averages across all runs. It then cancels runs whose best performance is worse than 
the median of the running averages. You can think of this policy as comparing the 
performance of each run against the average performance of the previous runs. The 
nice thing about this policy is that it considers all runs that have happened so far 
and does not just compare the current run with the best runs so far. This feature 
allows the median stopping policy to avoid being trapped in local optimum values.

• BanditPolicy: The bandit policy computes the distance between the current run 
and the best-performing one and then terminates the current run based on some slack 
criteria. You can define either the absolute distance (the slack_amount parameter) 
or the maximum ratio (the slack_factor parameter) allowed from the 
best-performing run.

• TruncationSelectionPolicy: The truncation selection policy is the 
most aggressive policy. It cancels a certain percentage (the 
truncation_percentage parameter) of runs that rank the lowest for their performance 
on the primary metric. When ranking a relatively young run, at an early iteration, the 
policy compares it with the performance of the older runs at the equivalent iteration. 
Thus, this policy strives for fairness in ranking the runs by accounting for improving 
model performance with training time.
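
The decision rules behind the median stopping and bandit policies can be sketched in a 
few lines of plain Python. This is an illustration for a metric that is being maximized, 
not the service's implementation: both function names are hypothetical, and the sketch 
assumes that the bandit thresholds are best / (1 + slack_factor) and best - slack_amount.

```python
from statistics import mean, median


def median_stopping_cancels(run_history, other_histories):
    """Median stopping rule for a metric that is being maximized.

    Cancel when the run's best value so far is below the median of the
    other runs' running averages over the same number of intervals.
    """
    n = len(run_history)
    running_averages = [mean(h[:n]) for h in other_histories if len(h) >= n]
    return bool(running_averages) and max(run_history) < median(running_averages)


def bandit_cancels(run_best, overall_best, slack_factor=None, slack_amount=None):
    """Bandit rule for a maximized metric, using a ratio or an absolute slack."""
    if slack_factor is not None:
        threshold = overall_best / (1 + slack_factor)
    elif slack_amount is not None:
        threshold = overall_best - slack_amount
    else:
        raise ValueError("Provide either slack_factor or slack_amount")
    return run_best < threshold


others = [[0.50, 0.60, 0.70], [0.40, 0.50, 0.60], [0.20, 0.30, 0.40]]
print(median_stopping_cancels([0.10, 0.20, 0.30], others))  # best 0.30 vs median 0.50
print(bandit_cancels(0.60, 0.80, slack_factor=0.2))         # threshold is about 0.667
print(bandit_cancels(0.70, 0.80, slack_amount=0.2))         # threshold is about 0.6
```

In the first two calls the run falls below the threshold and would be canceled, while in 
the last one it stays within the allowed slack and keeps running.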

All policies, except NoTerminationPolicy, take two optional parameters:

• evaluation_interval: The frequency for applying the policy.

• delay_evaluation: This delays the first policy evaluation for a specified number 
of intervals, giving time for young runs to reach a mature state.
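
How the two parameters interact can be sketched as follows. The sketch assumes that the 
policy is applied at every multiple of evaluation_interval that is greater than or equal 
to delay_evaluation; the policy_intervals helper is a hypothetical name used only for 
this illustration.

```python
def policy_intervals(total_intervals, evaluation_interval=1, delay_evaluation=0):
    """Return the intervals at which the policy would be applied (assumed rule)."""
    return [
        i
        for i in range(1, total_intervals + 1)
        if i % evaluation_interval == 0 and i >= delay_evaluation
    ]


# A run that logs its primary metric 10 times:
print(policy_intervals(10, evaluation_interval=1, delay_evaluation=5))
print(policy_intervals(10, evaluation_interval=2, delay_evaluation=5))
```

Under this assumption, the first call applies the policy at intervals 5 through 10, and 
the second one only at intervals 6, 8, and 10.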

In [47]:
from azureml.core import Workspace, ScriptRunConfig, Environment

ws = Workspace.from_config()
target = ws.compute_targets["akt-cluster"]

script = ScriptRunConfig(
    source_directory="termination-policy-training",
    script="training.py",
    environment=Environment.get(ws, "AzureML-Minimal"),
    compute_target=target,
)

In [48]:
from azureml.train.hyperdrive import (
    GridParameterSampling,    
    choice,
    MedianStoppingPolicy,
    HyperDriveConfig,
    PrimaryMetricGoal
)
param_sampling = GridParameterSampling(
    {
        "a": choice(1, 2, 3, 4),
        "b": choice(1, 2, 3, 4),
    }
)

early_termination_policy = MedianStoppingPolicy(
    evaluation_interval=1, delay_evaluation=5
)

# More aggressive alternative
# from azureml.train.hyperdrive import TruncationSelectionPolicy
# early_termination_policy = TruncationSelectionPolicy(
#    truncation_percentage=50, evaluation_interval=1
#)


In [49]:
hd_config = HyperDriveConfig(
    policy=early_termination_policy,
    run_config=script,
    hyperparameter_sampling=param_sampling,
    primary_metric_name="fake_metric",
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=50,
    max_concurrent_runs=4
)

In [50]:
from azureml.core import Experiment
experiment = Experiment(ws, "fake-metric-hyperdrive")
hyperdrive_run = experiment.submit(hd_config)

hyperdrive_run.wait_for_completion(show_output=True)

RunId: HD_8fbf4c95-16b1-4340-9d51-e0c05bb3d79b
Web View: https://ml.azure.com/runs/HD_8fbf4c95-16b1-4340-9d51-e0c05bb3d79b?wsid=/subscriptions/afe91b36-9760-4bb8-9dd6-72761af8d4ef/resourcegroups/eymlops/workspaces/eymlws&tid=bdf67597-3a9e-4e19-958e-1d8222afa3de

Streaming azureml-logs/hyperdrive.txt

[2022-10-24T10:18:46.306531][GENERATOR][INFO]Trying to sample '4' jobs from the hyperparameter space
[2022-10-24T10:18:46.6807322Z][SCHEDULER][INFO]Scheduling job, id='HD_8fbf4c95-16b1-4340-9d51-e0c05bb3d79b_0' 
[2022-10-24T10:18:46.7939260Z][SCHEDULER][INFO]Scheduling job, id='HD_8fbf4c95-16b1-4340-9d51-e0c05bb3d79b_1' 
[2022-10-24T10:18:46.9062597Z][SCHEDULER][INFO]Scheduling job, id='HD_8fbf4c95-16b1-4340-9d51-e0c05bb3d79b_2' 
[2022-10-24T10:18:47.0215237Z][SCHEDULER][INFO]Scheduling job, id='HD_8fbf4c95-16b1-4340-9d51-e0c05bb3d79b_3' 
[2022-10-24T10:18:46.982103][GENERATOR][INFO]Successfully sampled '4' jobs, they will soon be submitted to the execution target.
[2022-10-24T10:18:47.257

{'runId': 'HD_8fbf4c95-16b1-4340-9d51-e0c05bb3d79b',
 'target': 'akt-cluster',
 'status': 'Completed',
 'startTimeUtc': '2022-10-24T10:18:45.247292Z',
 'endTimeUtc': '2022-10-24T10:57:58.68814Z',
 'services': {},
 'properties': {'primary_metric_config': '{"name":"fake_metric","goal":"maximize"}',
  'resume_from': 'null',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive',
  'platform': 'AML',
  'ContentSnapshotId': 'b34e715b-540b-4fba-b586-3e212eb8e870',
  'user_agent': 'python/3.8.5 (Linux-5.15.0-1017-azure-x86_64-with-glibc2.10) msrest/0.7.1 Hyperdrive.Service/1.0.0 Hyperdrive.SDK/core.1.44.0',
  'space_size': '16',
  'score': '42.0',
  'best_child_run_id': 'HD_8fbf4c95-16b1-4340-9d51-e0c05bb3d79b_7',
  'best_metric_status': 'Succeeded',
  'best_data_container_id': 'dcid.HD_8fbf4c95-16b1-4340-9d51-e0c05bb3d79b_7'},
 'inputDatasets': [],
 'outputDatasets': [],
 'runDefinition': {'configuration': None,
  'attribution': None,
  'telemetryValues': {'amlClientType': 'azure