# Hyperparameter Tuning using HyperDrive

Import Dependencies.

In [1]:
from azureml.core import Workspace, Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

from azureml.widgets import RunDetails
from azureml.train.sklearn import SKLearn
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import choice

import os
import shutil
import joblib

### Load Workspace

Load the existing Azure workspace.

In [2]:
ws = Workspace.from_config()
experiment_name = 'loan_default_model'

experiment=Experiment(ws, experiment_name)

print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')

run = experiment.start_logging()

Workspace name: quick-starts-ws-137409
Azure region: southcentralus
Subscription id: f5091c60-1c3c-430f-8d81-d802f6bf2414
Resource group: aml-quickstarts-137409


### Compute Cluster Configuration

Set up the virtual machine size and the other cluster settings needed to run the experiment.

In [3]:
cpu_cluster_name = "capstone-compute" 

# Reuse the cluster if it already exists; otherwise create a new one
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print(f"Found existing cluster: {cpu_cluster_name} to be used.")
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS12_V2', max_nodes=6)

    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)

Found existing cluster: capstone-compute to be used.

Running


## Hyperdrive Configuration

The project aims to identify customers who will default on a loan. This translates to a binary classification problem: default or no default. A logistic regression model is therefore used. Scikit-learn's logistic regression is a well-known supervised learning algorithm suited to binary (dichotomous) target variables.

Hyperparameters are adjustable parameters that let you control the model training process. Hyperparameter tuning is the process of finding the configuration of hyperparameters that results in the best performance. Done by hand, the process is typically computationally expensive and manual; HyperDrive automates the search.
The two hyperparameters tuned in this experiment are `C` and `max_iter`. `C` is the inverse of the regularization strength: smaller values apply a stronger penalty on large coefficient values, which reduces overfitting. `max_iter` is the maximum number of iterations the scikit-learn logistic regression solver may take to converge.
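
The `train.py` entry script is not shown in this notebook, so the following is only a minimal sketch of how such a script might receive the two hyperparameters from HyperDrive. The helper name `parse_hyperparameters` is hypothetical, and the defaults are an assumption (they mirror scikit-learn's own `LogisticRegression` defaults).

```python
import argparse


def parse_hyperparameters(argv=None):
    """Parse the two tuned hyperparameters from the command line.

    Hypothetical helper: the real train.py is not shown in this notebook.
    """
    parser = argparse.ArgumentParser()
    # Inverse of regularization strength; smaller values regularize more strongly.
    parser.add_argument("--C", type=float, default=1.0)
    # Maximum number of solver iterations allowed to reach convergence.
    parser.add_argument("--max_iter", type=int, default=100)
    return parser.parse_args(argv)


# Same argument list that HyperDrive passed to the best run in this experiment.
args = parse_hyperparameters(["--C", "2", "--max_iter", "150"])
print(args.C, args.max_iter)
```

The parsed values would then be passed on to the model constructor inside the training script.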

The primary metric is the quantity that hyperparameter tuning optimizes; each training run is evaluated against it.
`Accuracy` is the primary metric in this experiment, and the primary metric goal is `MAXIMIZE`.

Early termination policies are applied to HyperDrive runs: a run is cancelled when it meets the criteria of the specified policy. `BanditPolicy` is used here.
It is defined by a slack factor (or slack amount) and an evaluation interval. The policy terminates runs whose primary metric is not within the specified slack factor or slack amount of the best-performing run, which improves computational efficiency.

For this experiment, `evaluation_interval=1`, `slack_factor=0.2`, and `delay_evaluation=4`. With this configuration, the first policy evaluation is delayed by 4 intervals, after which the policy is applied at every interval in which the primary metric is reported. A run is cancelled if 1.2 times its current metric value is still smaller than the best metric value reported so far, that is, if its metric falls below the best value divided by (1 + 0.2).
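
The cancellation rule can be sketched in plain Python. This is an illustration of the slack-factor arithmetic for a maximize goal, not the HyperDrive implementation; the function name `bandit_should_cancel` is hypothetical, and the exact boundary used for `delay_evaluation` is an assumption.

```python
def bandit_should_cancel(run_metric, best_metric, interval,
                         slack_factor=0.2, evaluation_interval=1,
                         delay_evaluation=4):
    """Return True if a run would be cancelled under a maximize goal."""
    # Skip intervals before the delay has elapsed (boundary is an assumption).
    if interval < delay_evaluation:
        return False
    # Only evaluate on multiples of evaluation_interval.
    if interval % evaluation_interval != 0:
        return False
    # Cancel when the run falls below best_metric / (1 + slack_factor).
    return run_metric < best_metric / (1 + slack_factor)


best = 0.72  # illustrative value, not taken from the experiment logs
print(bandit_should_cancel(0.55, best, interval=5))  # True: 0.55 is below 0.72 / 1.2
print(bandit_should_cancel(0.65, best, interval=5))  # False: within the slack
print(bandit_should_cancel(0.55, best, interval=2))  # False: still inside the delay
```

A larger `slack_factor` tolerates runs that lag further behind the best run, at the cost of spending more compute on weak configurations.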

In [4]:
early_termination_policy = BanditPolicy(evaluation_interval=1, slack_factor=0.2, delay_evaluation=4)

param_sampling = RandomParameterSampling(parameter_space={"--C": choice(1, 2, 3, 4, 5, 6), "--max_iter": choice(50, 100, 150, 200, 250, 300)})

# Create a training directory and copy the train.py script into it
os.makedirs("./training", exist_ok=True)
shutil.copy("train.py", "./training")

# Create an output directory to store trained models
os.makedirs("./outputs", exist_ok=True)

estimator =  SKLearn(source_directory='./training', entry_script="train.py", compute_target=cpu_cluster)

hyperdrive_run_config = HyperDriveConfig(estimator=estimator,
                                     hyperparameter_sampling=param_sampling,
                                     policy=early_termination_policy,
                                     primary_metric_name='Accuracy',
                                     primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                     max_total_runs=20,
                                     max_concurrent_runs=4)

'SKLearn' estimator is deprecated. Please use 'ScriptRunConfig' from 'azureml.core.script_run_config' with your own defined environment or the AzureML-Tutorial curated environment.
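
The random sampling configured in the cell above can be pictured with the following sketch. It assumes that each `choice` value is drawn uniformly and independently, and it uses Python's `random` module rather than HyperDrive itself; the helper name `sample_configuration` is hypothetical.

```python
import random

# Same discrete choices as the parameter space in the cell above.
PARAMETER_SPACE = {
    "--C": [1, 2, 3, 4, 5, 6],
    "--max_iter": [50, 100, 150, 200, 250, 300],
}


def sample_configuration(space, rng):
    """Draw one value per hyperparameter, uniformly at random."""
    return {name: rng.choice(values) for name, values in space.items()}


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
samples = [sample_configuration(PARAMETER_SPACE, rng) for _ in range(20)]
grid_size = len(PARAMETER_SPACE["--C"]) * len(PARAMETER_SPACE["--max_iter"])
print(f"{len(samples)} samples drawn from a grid of {grid_size} combinations")
```

With `max_total_runs=20`, at most 20 of the 36 possible combinations are tried, so random sampling trades exhaustive coverage for a shorter experiment.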


### Submitting the experiment

In [5]:
print('Submitting Hyperdrive experiment...')
hyperdrive_run = experiment.submit(hyperdrive_run_config)



Submitting Hyperdrive experiment...


## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

Use the `RunDetails` widget to show the different experiments.

In [6]:
RunDetails(hyperdrive_run).show()
hyperdrive_run.wait_for_completion(show_output=True)

_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO'…

RunId: HD_bf87c16d-5732-4867-a80e-d8264cef5f0e
Web View: https://ml.azure.com/experiments/loan_default_model/runs/HD_bf87c16d-5732-4867-a80e-d8264cef5f0e?wsid=/subscriptions/f5091c60-1c3c-430f-8d81-d802f6bf2414/resourcegroups/aml-quickstarts-137409/workspaces/quick-starts-ws-137409

Streaming azureml-logs/hyperdrive.txt

<START>[2021-02-06T05:16:56.359506][API][INFO]Experiment created<END>
<START>[2021-02-06T05:16:56.869172][GENERATOR][INFO]Trying to sample '4' jobs from the hyperparameter space<END>
<START>[2021-02-06T05:16:57.064339][GENERATOR][INFO]Successfully sampled '4' jobs, they will soon be submitted to the execution target.<END>
<START>[2021-02-06T05:16:58.1822843Z][SCHEDULER][INFO]The execution environment is being prepared. Please be patient as it can take a few minutes.<END>

Execution Summary
RunId: HD_bf87c16d-5732-4867-a80e-d8264cef5f0e
Web View: https://ml.azure.com/experiments/loan_default_model/runs/HD_bf87c16d-5732-4867-a80e-d8264cef5f0e?wsid=/subscriptions

{'runId': 'HD_bf87c16d-5732-4867-a80e-d8264cef5f0e',
 'target': 'capstone-compute',
 'status': 'Completed',
 'startTimeUtc': '2021-02-06T05:16:56.129055Z',
 'endTimeUtc': '2021-02-06T05:31:57.04949Z',
 'properties': {'primary_metric_config': '{"name": "Accuracy", "goal": "maximize"}',
  'resume_from': 'null',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive',
  'platform': 'AML',
  'ContentSnapshotId': '4f92d6e1-d95e-4312-94f7-704a6b56a3b3',
  'score': '0.7167860648718645',
  'best_child_run_id': 'HD_bf87c16d-5732-4867-a80e-d8264cef5f0e_7',
  'best_metric_status': 'Succeeded'},
 'inputDatasets': [],
 'outputDatasets': [],
 'logFiles': {'azureml-logs/hyperdrive.txt': 'https://mlstrg137409.blob.core.windows.net/azureml/ExperimentRun/dcid.HD_bf87c16d-5732-4867-a80e-d8264cef5f0e/azureml-logs/hyperdrive.txt?sv=2019-02-02&sr=b&sig=38cl1%2FDjAOBrgVnKDwGH6HOpaO1%2FXJLzT%2FWlVVqejOU%3D&st=2021-02-06T05%3A22%3A05Z&se=2021-02-06T13%3A32%3A05Z&sp=r'},
 'submittedBy': 'ODL_User 137

## Best Model

Retrieve the best run from the HyperDrive experiment and display its properties.

In [10]:
best_run = hyperdrive_run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics()

print('Best Run Details:', best_run,
      'Best Run ID:', best_run.id, 
      'Arguments:', best_run.get_details()['runDefinition']['arguments'],
      'Accuracy:', best_run_metrics['Accuracy'],
      'Regularization Strength (C): ', best_run_metrics['Regularization Strength:'],
      'Maximum Iterations (max_iter): ', best_run_metrics['Max iterations:'], sep = '\n')

Best Run Details:
Run(Experiment: loan_default_model,
Id: HD_bf87c16d-5732-4867-a80e-d8264cef5f0e_7,
Type: azureml.scriptrun,
Status: Completed)
Best Run ID:
HD_bf87c16d-5732-4867-a80e-d8264cef5f0e_7
Arguments:
['--C', '2', '--max_iter', '150']
Accuracy:
0.7167860648718645
Regularization Strength (C): 
2.0
Maximum Iterations (max_iter): 
150


In [8]:
# Register the model from the best run, storing its metrics as properties
best_model = best_run.register_model(
    model_name='HypDriveBestModel', 
    model_path='outputs/', 
    properties={'Accuracy': best_run_metrics['Accuracy'], 
                'Regularization Strength (C)': best_run_metrics['Regularization Strength:'], 
                'Maximum Iterations (max_iter)': best_run_metrics['Max iterations:']})

## Model Deployment

Remember that you have to deploy only one of the two models you trained. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

TODO: In the cell below, send a request to the web service you deployed to test it.

TODO: In the cell below, print the logs of the web service and delete the service