# Hyperparameter Tuning using HyperDrive

The first cell imports all the dependencies used in this project.

In [1]:
from azureml.widgets import RunDetails
from azureml.train.sklearn import SKLearn
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.estimator import Estimator
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import uniform, normal, choice
from azureml.core import Environment
import shutil
import os

## Dataset

The next cell sets up access to the data used in this project. The dataset comes from outside Microsoft Azure ML.

In [2]:
from azureml.core.workspace import Workspace
from azureml.core.experiment import Experiment

ws = Workspace.from_config()
experiment_name = 'wine-quality-hyperdrive-experiment'

experiment = Experiment(ws, experiment_name)

# Note: the AutoML notebook was run first, so the dataset already exists
# and its creation is skipped in the HyperDrive portion of the project.
# If this is the first notebook you run, run the dataset-creation code in prerequisites.py first.

## Hyperdrive Configuration

The model is Logistic Regression. HyperDrive varies two of its parameters, `C` and `max_iter`, by sampling at random from a defined search space. HyperDrive interacts with the model through the estimator's training script, a separate Python file that it invokes with a different combination of parameter values on each run.

The early termination policy is a Bandit policy: it compares each run with the best-performing run so far and terminates any run whose primary metric is not within the specified slack factor (or slack amount) of the best.

The configuration settings contain everything that defines a HyperDrive run: the parameter space sampling, the termination policy, the primary metric, the estimator, and the compute target on which the experiment runs execute.
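
To make the slack-factor rule concrete, here is a minimal sketch in plain Python. It is not the Azure ML implementation: the function name `bandit_should_terminate` is hypothetical, and the rule assumes a maximized metric, where a run is cut when its metric falls below `best / (1 + slack_factor)` as described in the Azure documentation on early termination. The `evaluation_interval` and `delay_evaluation` settings are not modeled here.

```python
def bandit_should_terminate(run_metric, best_metric, slack_factor):
    """Illustrative check for a Bandit-style policy on a maximized metric.

    Hypothetical helper, not part of the Azure ML SDK. A run is considered
    outside the allowed slack when its metric is below
    best_metric / (1 + slack_factor).
    """
    threshold = best_metric / (1.0 + slack_factor)
    return run_metric < threshold


# Example values are made up for illustration only.
best_accuracy = 0.80
for accuracy in (0.78, 0.74, 0.70):
    decision = bandit_should_terminate(accuracy, best_accuracy, slack_factor=0.1)
    print(f"accuracy={accuracy:.2f} terminate={decision}")
```

With a slack factor of 0.1 and a best accuracy of 0.80, the threshold is roughly 0.727, so only the run at 0.70 would be stopped in this sketch.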

The file config.json should be downloaded from the Azure Portal and placed in the scripts folder, where the train-LR.py estimator script is located.
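
The contents of train-LR.py are not shown in this notebook. As a rough sketch of how such a script could receive the values that HyperDrive passes on the command line, the snippet below parses `--C` and `--max_iter` with `argparse`. The function name `parse_args` and the default values are assumptions for illustration; the real script may differ, and it would go on to fit the model, log the `Accuracy` metric, and save the model under `outputs/`.

```python
import argparse


def parse_args(argv=None):
    """Parse the two hyperparameters HyperDrive is configured to vary.

    Hypothetical sketch: the defaults below are assumptions, not values
    taken from the actual train-LR.py.
    """
    parser = argparse.ArgumentParser(description="Sketch of argument handling for a training script")
    parser.add_argument("--C", type=float, default=1.0,
                        help="Inverse of regularization strength")
    parser.add_argument("--max_iter", type=int, default=100,
                        help="Maximum number of solver iterations")
    return parser.parse_args(argv)


# Simulate one HyperDrive invocation with a sampled combination.
args = parse_args(["--C", "10", "--max_iter", "200"])
print(args.C, args.max_iter)
```

Passing an explicit list to `parse_args` keeps the sketch runnable inside a notebook; when the script is launched by HyperDrive, calling `parse_args()` with no argument reads the real command line.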

In [5]:
# Creating an early termination policy
# reference: github, how-to-tune-hyperparameters
# attribution, November 2020, https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/machine-learning/how-to-tune-hyperparameters.md

early_termination_policy = BanditPolicy(slack_factor = 0.1, evaluation_interval = 1, delay_evaluation = 5)

# Set the different params that will be used during training
# reference: github, how-to-tune-hyperparameters
# attribution, November 2020, https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/machine-learning/how-to-tune-hyperparameters.md

param_sampling = RandomParameterSampling({
    "--C": choice(10.0, 1.0, 0.1),
    "--max_iter": choice(50, 100, 200)
})

# Creating SKLearn estimator and hyperdrive config
# resource: github, 04_hyperparameter_random_search
# attribution, November 2020, https://github.com/microsoft/MLHyperparameterTuning/blob/master/04_Hyperparameter_Random_Search.ipynb
compute_target = "aml-compute"
estimator = SKLearn(source_directory='./scripts',
  entry_script='train-LR.py',
  compute_target=compute_target)

# Create a HyperDriveConfig using the estimator, hyperparameter sampler, and policy.
# resource: github, 04_hyperparameter_random_search
# attribution, November 2020, https://github.com/microsoft/MLHyperparameterTuning/blob/master/04_Hyperparameter_Random_Search.ipynb

hyperdrive_run_config = HyperDriveConfig(
    estimator = estimator,
    hyperparameter_sampling = param_sampling,
    policy = early_termination_policy,
    primary_metric_name = 'Accuracy',
    primary_metric_goal = PrimaryMetricGoal.MAXIMIZE,
    max_total_runs = 10
    )
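
To get a feel for the size of this search space, the following sketch enumerates it in plain Python. It uses the standard `random` module purely for illustration and makes no claim about how HyperDrive draws or de-duplicates its samples; the variable names are hypothetical.

```python
import itertools
import random

# The same discrete values passed to choice() in the sampling definition.
c_values = [10.0, 1.0, 0.1]
max_iter_values = [50, 100, 200]

# Every distinct (C, max_iter) combination in the search space.
grid = list(itertools.product(c_values, max_iter_values))
print(len(grid))  # 9

# Draw 10 combinations independently, mirroring max_total_runs = 10.
rng = random.Random(0)
samples = [(rng.choice(c_values), rng.choice(max_iter_values)) for _ in range(10)]
print(len(set(samples)), "distinct combinations in 10 draws")
```

Because the space holds only 9 distinct combinations, 10 independent draws must contain at least one repeat, so under that assumption a budget of 10 runs cannot produce 10 different configurations.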



In [12]:
# Submit hyperdrive run to the experiment and show run details with the widget.
# resource: github, 04_hyperparameter_random_search
# attribution, November 2020, https://github.com/microsoft/MLHyperparameterTuning/blob/master/04_Hyperparameter_Random_Search.ipynb

from azureml.core.experiment import Experiment

exp = Experiment(workspace = ws, name = 'wine-quality-tuning-hyperparameters')
run = exp.submit(hyperdrive_run_config)
run

| Experiment | Id | Type | Status | Details Page | Docs Page |
| --- | --- | --- | --- | --- | --- |
| wine-quality-tuning-hyperparameters | HD_3a1e18e6-b364-439c-acc6-cd58a866ea19 | hyperdrive | Running | Link to Azure Machine Learning studio | Link to Documentation |


## Run Details

After all parameter combinations have been tested, the results show that varying the C parameter makes a larger difference in model accuracy when the number of iterations is low. The C parameter controls the amount of regularization in Logistic Regression: it is the inverse of the regularization strength, so a higher value of C results in less regularization.
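
For reference, the role of C can be read off the L2-penalized objective that scikit-learn documents for binary logistic regression. This assumes labels $y_i \in \{-1, 1\}$ and the default L2 penalty; the actual settings used in the training script are not shown in this notebook.

$$
\min_{w,\, b} \; \frac{1}{2} w^\top w + C \sum_{i=1}^{n} \log\left(1 + \exp\left(-y_i \left(x_i^\top w + b\right)\right)\right)
$$

Because C multiplies the data-fit term, a larger C gives the penalty term $\frac{1}{2} w^\top w$ relatively less weight, which is why a higher C means less regularization.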

In the cell below, we will use the `RunDetails` widget to show the different experiments.

In [13]:
# Show run details using the RunDetails widget
RunDetails(run).show()


## Best Model

In the cell below, we will save the best model from the hyperdrive experiments and display all the properties of the model.

In [14]:
import joblib
# Get best run parameters
# attribution, November 2020, https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters

best_run = run.get_best_run_by_primary_metric()
parameter_values = best_run.get_details()['runDefinition']['arguments']
print(parameter_values)
print(best_run.get_file_names())

# Register saved best model

model = best_run.register_model(model_name='Capstone-Project-LinearRegression-best', model_path='outputs/lr-model.joblib')

['--C', '10', '--max_iter', '200']
['azureml-logs/55_azureml-execution-tvmps_2abeb7005cb4b50b31ed538e06396ba4f6a294c20370f5d9bc4e3d17388c3d52_d.txt', 'azureml-logs/65_job_prep-tvmps_2abeb7005cb4b50b31ed538e06396ba4f6a294c20370f5d9bc4e3d17388c3d52_d.txt', 'azureml-logs/70_driver_log.txt', 'azureml-logs/75_job_post-tvmps_2abeb7005cb4b50b31ed538e06396ba4f6a294c20370f5d9bc4e3d17388c3d52_d.txt', 'azureml-logs/process_info.json', 'azureml-logs/process_status.json', 'logs/azureml/109_azureml.log', 'logs/azureml/dataprep/backgroundProcess.log', 'logs/azureml/dataprep/backgroundProcess_Telemetry.log', 'logs/azureml/job_prep_azureml.log', 'logs/azureml/job_release_azureml.log', 'outputs/lr-model.joblib']
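
The argument list printed above alternates flag names and values. A small helper can turn it into a dictionary for easier reading; the name `arguments_to_dict` is hypothetical, and the sketch assumes the list strictly alternates flags and values, as it does in the output shown here.

```python
def arguments_to_dict(arguments):
    """Pair up a flat ['--flag', 'value', ...] list into {'flag': 'value'}.

    Hypothetical helper; values are kept as strings, exactly as they
    appear in the run definition.
    """
    flags = arguments[0::2]   # every even position holds a flag name
    values = arguments[1::2]  # every odd position holds its value
    return {flag.lstrip("-"): value for flag, value in zip(flags, values)}


# The list below is copied from the best-run output above.
best_params = arguments_to_dict(['--C', '10', '--max_iter', '200'])
print(best_params)  # {'C': '10', 'max_iter': '200'}
```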


Below are the two required screenshots: the RunDetails widget showing the progress of the training runs of the different experiments, and the best model with its run ID and the hyperparameters that were tuned.

![](https://github.com/DivkovicD/ML-Engineer-w-MS-Azure/blob/master/Screenshots/screenshot%20of%20the%20RunDetails%20HyperParameter%20widget%20that%20shows%20the%20progress%20of%20the%20training%20runs%20of%20the%20different%20experiments%20v3.png?raw=true)

![](https://github.com/DivkovicD/ML-Engineer-w-MS-Azure/blob/master/Screenshots/Screenshot%20of%20the%20best%20model%20with%20its%20run%20id%20and%20the%20different%20hyperparameters%20that%20were%20tuned.png?raw=true)

## Model Deployment

Since the AutoML model had higher accuracy, the following steps were omitted here and performed in the AutoML notebook instead.

In the cell below, register the model, create an inference config and deploy the model as a web service.

In [None]:
n/a

In the cell below, send a request to the web service you deployed to test it.

In [None]:
n/a

In the cell below, print the logs of the web service and delete the service.

In [None]:
n/a