# Hyperparameter Tuning using HyperDrive

In [17]:
# Import necessary packages
from azureml.core import Workspace, Experiment, Model, Environment, ScriptRunConfig, Dataset
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.widgets import RunDetails
from azureml.train.sklearn import SKLearn
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import uniform, choice
import os
import shutil
import joblib

## Dataset

The dataset is already registered in Azure ML Studio. See the README for details.

In [18]:
# Initialize Workspace
ws = Workspace.from_config()
# Create the experiment
experiment = Experiment(workspace=ws, name="hyperdrive-experiment")
# Create environment
my_env = Environment.from_conda_specification(name = 'my-env', file_path = './my-env.yml')

print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')

run = experiment.start_logging()

Workspace name: my-workspace
Azure region: eastus2
Subscription id: 29e71f0e-90a3-43b1-ab69-4b27e1408264
Resource group: my-resource-group


In [21]:
# Get the dataset and display
# The training script (train.py) will get and handle the dataset.
# This cell is only for demonstration purposes.
dataset = Dataset.get_by_name(ws, name='Housing Prices Dataset')
dataset.to_pandas_dataframe()

Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,...,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
0,1,60,RL,65,8450,Pave,,Reg,Lvl,AllPub,...,0,,,,0,2,2008,WD,Normal,208500
1,2,20,RL,80,9600,Pave,,Reg,Lvl,AllPub,...,0,,,,0,5,2007,WD,Normal,181500
2,3,60,RL,68,11250,Pave,,IR1,Lvl,AllPub,...,0,,,,0,9,2008,WD,Normal,223500
3,4,70,RL,60,9550,Pave,,IR1,Lvl,AllPub,...,0,,,,0,2,2006,WD,Abnorml,140000
4,5,60,RL,84,14260,Pave,,IR1,Lvl,AllPub,...,0,,,,0,12,2008,WD,Normal,250000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1455,1456,60,RL,62,7917,Pave,,Reg,Lvl,AllPub,...,0,,,,0,8,2007,WD,Normal,175000
1456,1457,20,RL,85,13175,Pave,,Reg,Lvl,AllPub,...,0,,MnPrv,,0,2,2010,WD,Normal,210000
1457,1458,70,RL,66,9042,Pave,,Reg,Lvl,AllPub,...,0,,GdPrv,Shed,2500,5,2010,WD,Normal,266500
1458,1459,20,RL,68,9717,Pave,,Reg,Lvl,AllPub,...,0,,,,0,4,2010,WD,Normal,142125


In [19]:
# Create or Attach an AmlCompute cluster
# Choose a name for the CPU cluster
cpu_cluster_name = "cpu-cluster"

# Verify that cluster does not exist already
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                              max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


## HyperDrive Configuration

I will use [GradientBoostingRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html) from the [scikit-learn](https://scikit-learn.org/stable/index.html) library, which implements gradient boosting for regression. The *[No free lunch theorem](https://en.wikipedia.org/wiki/No_free_lunch_theorem)* says that no single algorithm is best for every problem, but from my previous experience I believe that gradient boosting gives good results on this dataset. A detailed explanation of *Gradient Boosting* can be found in this [User Guide](https://scikit-learn.org/stable/modules/ensemble.html#gradient-boosting). Two important parameters of GradientBoostingRegressor are:

- learning_rate (float), default=0.1

   Learning rate shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.

- n_estimators (int), default=100

   The number of boosting stages to perform. Gradient boosting is fairly robust to over-fitting so a large number usually results in better performance.

Besides being important, these two parameters cover both kinds of search space: n_estimators is an example of a discrete hyperparameter (choice) and learning_rate is an example of a continuous hyperparameter (uniform).

As described in [Hyperparameter tuning a model with Azure Machine Learning](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters), we need to:

- Define the parameter search space.
- Specify a primary metric to optimize.
- Specify an early termination policy for low-performing runs.

Since the learning rate is a float, a continuous search space is needed. uniform() returns a value uniformly distributed between low and high, so it is used for the learning rate.

Since the number of estimators is an int, a discrete search space is needed. Discrete hyperparameters are specified as a choice() among discrete values.

Random sampling supports both discrete and continuous hyperparameters, and it supports early termination of low-performing runs. As a result, it is a good choice for parameter sampling here.
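To make the search space concrete, here is a minimal plain-Python sketch of what drawing random configurations looks like. It is only a stand-in for illustration: it uses the standard `random` module instead of HyperDrive's own sampler, and `sample_configuration` is a hypothetical helper name. The ranges are the ones used in the configuration cell of this notebook.

```python
import random


def sample_configuration(rng):
    """Draw one hyperparameter configuration from the search space."""
    return {
        # Continuous: any float between 0.001 and 0.1
        "--lr": rng.uniform(0.001, 0.1),
        # Discrete: exactly one of the listed values
        "--est": rng.choice([100, 500]),
    }


# A fixed seed keeps the sketch repeatable
rng = random.Random(0)
configurations = [sample_configuration(rng) for _ in range(4)]

for config in configurations:
    print(config)
```

Each printed dictionary corresponds to the arguments that one child run would receive.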

For our AutoML experiment, the primary metric is chosen to be **normalized_root_mean_squared_error** as suggested in [this article](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-configure-auto-train#primary-metric). Since the two models have to be compared for deployment, the same primary metric must be used for the HyperDrive run, where it is logged as NRMSE. Lower values are better for this metric.
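As a rough illustration of the metric, the sketch below computes a normalized RMSE as the RMSE divided by the range of the true values. This is my assumption about the normalization. The actual calculation lives in train.py, which is not shown in this notebook, and `normalized_rmse` and the sample prices are hypothetical.

```python
import math


def normalized_rmse(y_true, y_pred):
    """Return RMSE divided by the range (max - min) of the true values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
    value_range = max(y_true) - min(y_true)
    if value_range == 0:
        raise ValueError("the true values must not all be identical")
    return rmse / value_range


# Made-up sale prices, only to exercise the function
actual = [100000.0, 200000.0, 300000.0]
predicted = [110000.0, 190000.0, 300000.0]
score = normalized_rmse(actual, predicted)
print(round(score, 4))
```

Because the error is divided by the range of the target, the value does not depend on the scale of the sale prices, which makes it easier to compare across models.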

We can automatically terminate poorly performing runs with an early termination policy. Early termination improves computational efficiency. Bandit policy is based on slack factor/slack amount and evaluation interval. Bandit terminates runs where the primary metric is not within the specified slack factor/slack amount compared to the best performing run. We can select Bandit policy as an early termination policy for more aggressive savings.
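The slack factor rule can be sketched in a few lines. The Azure documentation works through the maximization case, where the cut-off is the best metric divided by (1 + slack_factor). For a metric that is minimized, as here, I am assuming the mirrored rule: a run is terminated when its metric is greater than the best metric multiplied by (1 + slack_factor). `is_terminated` is a hypothetical helper and the metric values are made up.

```python
def is_terminated(run_metric, best_metric, slack_factor):
    """Return True if a run falls outside the allowed slack (minimization goal).

    Assumption: the cut-off is best_metric * (1 + slack_factor).
    """
    threshold = best_metric * (1 + slack_factor)
    return run_metric > threshold


best_nrmse = 0.040   # best run so far (made-up value)
slack_factor = 0.2   # same value as in the policy used in this notebook

# The cut-off is roughly 0.048, so 0.045 survives and 0.050 is terminated
keeps_running = not is_terminated(0.045, best_nrmse, slack_factor)
gets_cancelled = is_terminated(0.050, best_nrmse, slack_factor)
print(keeps_running, gets_cancelled)
```

The policy only applies this check at each evaluation interval, so a run is compared against the best run only when it reports the primary metric at those intervals.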

Since the Estimator class is deprecated and raised an error while calculating mean_squared_error, a ScriptRunConfig object is used instead, as suggested [here](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.estimator.estimator?view=azure-ml-py). A custom environment is also created to resolve the mean_squared_error calculation error.

max_concurrent_runs has been selected as *4*, since the AmlCompute cluster has 4 nodes.

max_total_runs has been selected as *40* so that more configurations from the search space are tried.

train.py is the training script. It uses scikit-learn pipelines and transformers for preprocessing, since there are missing values in the dataset.
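HyperDrive passes each sampled value to the training script as a command-line argument, so the script has to parse `--lr` and `--est`. train.py itself is not shown in this notebook, so the following is only a sketch of how such parsing could look with `argparse`. The function name `parse_hyperparameters` is hypothetical, and the defaults are taken from the scikit-learn defaults quoted above.

```python
import argparse


def parse_hyperparameters(argv=None):
    """Parse the hyperparameters that HyperDrive passes on the command line."""
    parser = argparse.ArgumentParser(description="Train a gradient boosting regressor")
    parser.add_argument("--lr", type=float, default=0.1,
                        help="learning_rate of the regressor")
    parser.add_argument("--est", type=int, default=100,
                        help="n_estimators of the regressor")
    return parser.parse_args(argv)


# Simulate the arguments of one child run instead of reading sys.argv
args = parse_hyperparameters(["--lr", "0.05", "--est", "500"])
print("learning_rate:", args.lr, "n_estimators:", args.est)
```

The parsed values would then be handed to the regressor as learning_rate and n_estimators.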

In [25]:
# Early termination policy (not required with Bayesian sampling)
early_termination_policy = BanditPolicy(evaluation_interval=10, slack_factor=0.2)

# Hyperparameter search space: continuous learning rate, discrete number of estimators
param_sampling = RandomParameterSampling({
    '--lr': uniform(0.001, 0.1),
    '--est': choice(100, 500)
})

# Create project folder
project_folder = './'

# Create a script run configuration
src = ScriptRunConfig(source_directory=project_folder,
                      script='train.py',
                      compute_target=ws.compute_targets['cpu-cluster'],
                      environment=my_env)



# Create the ./training folder if it does not exist
os.makedirs("./training", exist_ok=True)

# Copy the training script train.py into it (shutil is imported in the first cell)
shutil.copy('train.py', "./training")

# Create a HyperDriveConfig using the run_config, hyperparameter sampler, and policy.
# ScriptRunConfig is used because the Estimator class is deprecated.
hyperdrive_config = HyperDriveConfig(run_config=src,
                                     hyperparameter_sampling=param_sampling,
                                     policy=early_termination_policy,
                                     primary_metric_name='NRMSE',
                                     primary_metric_goal=PrimaryMetricGoal.MINIMIZE,
                                     max_total_runs=40,
                                     max_concurrent_runs=4)
print('HyperDriveConfig has been created!')

HyperDriveConfig has been created!


In [26]:
# Submit the experiment

my_run = experiment.submit(config=hyperdrive_config)

my_run.wait_for_completion(show_output=True)
assert my_run.get_status() == "Completed"

RunId: HD_a7853987-c2e6-4199-baa4-1c76de8f4158
Web View: https://ml.azure.com/experiments/hyperdrive-experiment/runs/HD_a7853987-c2e6-4199-baa4-1c76de8f4158?wsid=/subscriptions/29e71f0e-90a3-43b1-ab69-4b27e1408264/resourcegroups/my-resource-group/workspaces/my-workspace

Streaming azureml-logs/hyperdrive.txt

"<START>[2021-02-10T01:16:04.427800][API][INFO]Experiment created<END>\n""<START>[2021-02-10T01:16:04.874276][GENERATOR][INFO]Trying to sample '4' jobs from the hyperparameter space<END>\n""<START>[2021-02-10T01:16:05.064284][GENERATOR][INFO]Successfully sampled '4' jobs, they will soon be submitted to the execution target.<END>\n"<START>[2021-02-10T01:16:06.0769860Z][SCHEDULER][INFO]The execution environment is being prepared. Please be patient as it can take a few minutes.<END>

Execution Summary
RunId: HD_a7853987-c2e6-4199-baa4-1c76de8f4158
Web View: https://ml.azure.com/experiments/hyperdrive-experiment/runs/HD_a7853987-c2e6-4199-baa4-1c76de8f4158?wsid=/subscriptions/29e71f0e

## Run Details

In [27]:
RunDetails(my_run).show()

_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO'…

## Best Model

In [35]:
# Get your best run and save the model from that run.

best_run = my_run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics()
parameter_values = best_run.get_details()
run_definition = parameter_values.get("runDefinition")

print('Best Run Id: ', best_run.id)
print('NRMSE:', best_run_metrics['NRMSE'])
print('MAE:', best_run_metrics['MAE'])
print('RMSLE:', best_run_metrics['RMSLE'])
print()
print(best_run.get_file_names())
print()


Best Run Id:  HD_a7853987-c2e6-4199-baa4-1c76de8f4158_24
NRMSE: 0.032666490716132174
MAE: 15671.028999405176
RMSLE: 0.1279803797267253

['azureml-logs/55_azureml-execution-tvmps_1d37243c5acfd40c83e682116be94f30b22f902e680c25e437e3f8d24f28dcb6_d.txt', 'azureml-logs/65_job_prep-tvmps_1d37243c5acfd40c83e682116be94f30b22f902e680c25e437e3f8d24f28dcb6_d.txt', 'azureml-logs/70_driver_log.txt', 'azureml-logs/75_job_post-tvmps_1d37243c5acfd40c83e682116be94f30b22f902e680c25e437e3f8d24f28dcb6_d.txt', 'azureml-logs/process_info.json', 'azureml-logs/process_status.json', 'logs/azureml/102_azureml.log', 'logs/azureml/dataprep/backgroundProcess.log', 'logs/azureml/dataprep/backgroundProcess_Telemetry.log', 'logs/azureml/job_prep_azureml.log', 'logs/azureml/job_release_azureml.log', 'outputs/model.joblib']



In [40]:
# Save the best model

# Register model
model = best_run.register_model(model_name="my_best_hyperdrive_run", model_path="outputs/model.joblib")
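# Check model (a separate loop variable keeps `model` pointing at the newly registered model)
for registered_model in Model.list(ws):
    print("Model Name: {}\n".format(registered_model.name))
    print(registered_model)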

Model Name: my_best_hyperdrive_run

Model(workspace=Workspace.create(name='my-workspace', subscription_id='29e71f0e-90a3-43b1-ab69-4b27e1408264', resource_group='my-resource-group'), name=my_best_hyperdrive_run, id=my_best_hyperdrive_run:3, version=3, tags={}, properties={})


## Model Deployment

**AutoML model has been deployed.**

Remember that you have to deploy only one of the two models you trained. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

TODO: In the cell below, send a request to the web service you deployed to test it.

TODO: In the cell below, print the logs of the web service and delete the service