# Hyperparameter Tuning using HyperDrive

Import Dependencies. 

In [1]:
import azureml.core

print("This notebook was created using version 1.41.0 of the Azure ML SDK")
print("You are currently using version", azureml.core.VERSION, "of the Azure ML SDK")

This notebook was created using version 1.41.0 of the Azure ML SDK
You are currently using version 1.40.0 of the Azure ML SDK


In [2]:
import os
import json
import logging
import pandas as pd

from azureml.core.run import Run
from azureml.core.model import Model
from azureml.widgets import RunDetails
from azureml.core.dataset import Dataset
from azureml.core.workspace import Workspace
from azureml.core.experiment import Experiment
from sklearn.datasets import fetch_20newsgroups
from azureml.core.compute import AmlCompute, ComputeTarget 
from azureml.core.compute_target import ComputeTargetException

from azureml.train.sklearn import SKLearn
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import choice, uniform
from azureml.core import Environment, ScriptRunConfig


In [3]:
# # check to see if ACI is already registered
# (myenv) $ az provider show -n Microsoft.ContainerInstance -o table

# azureuser@lyasolis1:~/cloudfiles/code/Users/lyasolis/starter_file$ az provider show -n Microsoft.ContainerInstance -o table
# Namespace                    RegistrationPolicy    RegistrationState
# ---------------------------  --------------------  -------------------
# Microsoft.ContainerInstance  RegistrationRequired  Registered



In [3]:
subscription_id = os.getenv("SUBSCRIPTION_ID", default="fbe09221-d2fa-4355-8174-808a6c0b6925")
resource_group = os.getenv("RESOURCE_GROUP", default="udacity-capstone")
workspace_name = os.getenv("WORKSPACE_NAME", default="udacity-capstone-ws")
workspace_region = os.getenv("WORKSPACE_REGION", default="northeurope")


In [4]:
ws = Workspace(subscription_id = subscription_id, resource_group = resource_group, workspace_name = workspace_name)
# Write the workspace details to a local configuration file (.azureml/config.json by default)
ws.write_config()


In [10]:
# Choose an experiment name.

exp = Experiment(workspace=ws, name="hyperdrive-classification-text-dnn")

output = {}
output["Subscription ID"] = ws.subscription_id
output["Workspace Name"] = ws.name
output["Resource Group"] = ws.resource_group
output["Location"] = ws.location
output["Experiment Name"] = exp.name
output["SDK Version"] = azureml.core.VERSION
pd.set_option("display.max_colwidth", None)
outputDf = pd.DataFrame(data=output, index=[""])
outputDf.T


| | |
|---|---|
| Subscription ID | fbe09221-d2fa-4355-8174-808a6c0b6925 |
| Workspace Name | udacity-capstone-ws |
| Resource Group | udacity-capstone |
| Location | northeurope |
| Experiment Name | hyperdrive-classification-text-dnn |
| SDK Version | 1.40.0 |


In [6]:
#Create Compute Cluster
num_nodes = 1

# Choose a name for your cluster.
amlcompute_cluster_name = "dnntext-cluster"

# Verify that cluster does not exist already
try:
    compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)
    print("Found existing cluster, use it.")
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(
        vm_size="STANDARD_NC6",  # GPU VM size; a CPU size such as "STANDARD_D2_V2" is enough for BiLSTM
        # BERT (recommended for best performance) needs a GPU such as "STANDARD_NC6"
        # or a similar GPU size available in your workspace
        idle_seconds_before_scaledown=60,
        max_nodes=num_nodes,
    )
    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True)

InProgress.
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


## Dataset

### Overview
For this notebook we use the 20 Newsgroups dataset from scikit-learn. Because this is a student project and compute is expensive, we keep only four of the twenty classes.
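The filtering step can be sketched with `fetch_20newsgroups`, which this notebook already imports. The four category names below are placeholders chosen for illustration, since the notebook does not say which classes were kept. The helper name `load_four_classes` is also hypothetical.

```python
# Placeholder categories: the notebook does not state which four classes were used.
CATEGORIES = [
    "alt.atheism",
    "comp.graphics",
    "sci.med",
    "soc.religion.christian",
]


def load_four_classes(subset="train"):
    """Download (on first call) and return only the selected newsgroups."""
    # Imported here so that defining the helper does not require scikit-learn.
    from sklearn.datasets import fetch_20newsgroups

    return fetch_20newsgroups(
        subset=subset,          # "train", "test" or "all"
        categories=CATEGORIES,  # keep only these classes
        shuffle=True,
        random_state=42,
    )


# Usage (triggers a download the first time it runs):
# newsgroups = load_four_classes()
# texts, labels = newsgroups.data, newsgroups.target
```

Restricting `categories` at load time means the other sixteen classes are never read into memory, which keeps both training time and cost down.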

## Hyperdrive Configuration

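  * parameter sampler (RandomParameterSampling: supports both discrete (`choice`) and continuous (`uniform`) hyperparameters, as well as early termination of low-performing runs. It is quicker and cheaper than an exhaustive grid search.)
  * early termination policy (BanditPolicy: with `slack_factor=0.1`, `evaluation_interval=1` and `delay_evaluation=5`, the policy is applied at every interval after the first 5 intervals. Any run whose best metric is less than 1/(1+0.1), or about 91%, of the best-performing run's metric is terminated.)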
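To make the two settings above concrete, here is a small standalone sketch that uses only Python's `random` module, not the Azure ML SDK. The helper names `sample_config` and `bandit_cutoff` and the best accuracy of 0.80 are made up for illustration. The search space and slack factor mirror the values used in this notebook.

```python
import random


def sample_config(rng):
    """Draw one hyperparameter configuration at random."""
    return {
        "learning_rate": rng.uniform(0.05, 0.1),      # continuous range
        "batch_size": rng.choice([16, 32, 64, 128]),  # discrete set
    }


def bandit_cutoff(best_metric, slack_factor):
    """Lowest metric a run may report without being cancelled (maximize goal)."""
    return best_metric / (1 + slack_factor)


rng = random.Random(0)
configs = [sample_config(rng) for _ in range(10)]  # one per run, as with max_total_runs=10

# Illustrative value: assume the best run so far reports an accuracy of 0.80.
cutoff = bandit_cutoff(best_metric=0.80, slack_factor=0.1)
print(f"cutoff: {cutoff:.4f}")  # cutoff: 0.7273
```

With a slack factor of 0.1, a run is kept only while its best metric stays within roughly 9% of the current leader.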

In [16]:
# Specify parameter sampler
ps = RandomParameterSampling(
    {
        'learning_rate': uniform(0.05, 0.1),
        'batch_size': choice(16, 32, 64, 128)
    }
)

# Specify a Policy
policy = BanditPolicy(slack_factor = 0.1, evaluation_interval=1, delay_evaluation=5)
    
# Setup environment for your training run
sklearn_env = Environment.from_conda_specification(name='sklearn-env', file_path='conda_dependencies.yml')

# Create a ScriptRunConfig Object to specify the configuration details of your training job (i.e. an estimator for the train.py script)
src = ScriptRunConfig(source_directory='.',
                     script='train.py',
                     arguments=['--C', 1.0, '--max_iter', 200],
                     compute_target=compute_target,
                     environment=sklearn_env)

# Create a HyperDriveConfig using the src object, hyperparameter sampler, and policy.
hyperdrive_config = HyperDriveConfig(run_config=src,
                                     hyperparameter_sampling=ps,
                                     policy=policy,
                                     primary_metric_name='Accuracy',
                                     primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                     max_total_runs=10
                                    )
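
HyperDrive appends each sampled hyperparameter to the script arguments as `--name value`, so `train.py` has to accept `--learning_rate` and `--batch_size` in addition to the fixed `--C` and `--max_iter`. `train.py` itself is not shown in this notebook. The following is only a sketch of what its argument parsing might look like, and the defaults and the `parse_args` helper are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the fixed and the HyperDrive-sampled arguments (hypothetical train.py)."""
    parser = argparse.ArgumentParser()
    # Fixed arguments passed through ScriptRunConfig
    parser.add_argument("--C", type=float, default=1.0)
    parser.add_argument("--max_iter", type=int, default=200)
    # Arguments appended by HyperDrive, named after the sampler's keys
    parser.add_argument("--learning_rate", type=float, default=0.05)
    parser.add_argument("--batch_size", type=int, default=32)
    return parser.parse_args(argv)


args = parse_args(
    ["--C", "1.0", "--max_iter", "200", "--learning_rate", "0.07", "--batch_size", "32"]
)
print(args.learning_rate, args.batch_size)  # 0.07 32

# In the real script the metric must be logged under the name given in
# primary_metric_name ('Accuracy'), otherwise HyperDrive cannot rank the runs.
```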

In [17]:
run = exp.submit(src)
print(run)

Run(Experiment: hyperdrive-classification-text-dnn,
Id: hyperdrive-classification-text-dnn_1652683353_f026ddb8,
Type: azureml.scriptrun,
Status: Queued)


In [18]:
print(run.get_details())

{'runId': 'hyperdrive-classification-text-dnn_1652683353_f026ddb8', 'target': 'dnntext-cluster', 'status': 'Completed', 'startTimeUtc': '2022-05-16T06:42:44.052523Z', 'endTimeUtc': '2022-05-16T06:43:20.292025Z', 'services': {}, 'properties': {'_azureml.ComputeTargetType': 'amlctrain', 'ContentSnapshotId': 'c451cfdc-1c2f-4355-a67a-408b73034c08', 'ProcessInfoFile': 'azureml-logs/process_info.json', 'ProcessStatusFile': 'azureml-logs/process_status.json'}, 'inputDatasets': [], 'outputDatasets': [], 'runDefinition': {'script': 'train.py', 'command': '', 'useAbsolutePath': False, 'arguments': ['--C', '1', '--max_iter', '200'], 'sourceDirectoryDataStore': None, 'framework': 'Python', 'communicator': 'None', 'target': 'dnntext-cluster', 'dataReferences': {}, 'data': {}, 'outputData': {}, 'datacaches': [], 'jobName': None, 'maxRunDurationSeconds': 2592000, 'nodeCount': 1, 'instanceTypes': [], 'priority': None, 'credentialPassthrough': False, 'identity': None, 'environment': {'name': 'sklearn-e

In [19]:
RunDetails(run).show()

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', '…

## Run Details

Use the `RunDetails` widget to monitor the HyperDrive run and its child runs.

In [21]:
# Submit hyperdrive run to the experiment and show run details with the widget.
hyperdrive_run = exp.submit(hyperdrive_config)
RunDetails(hyperdrive_run).show()

_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO'…

## Best Model

Retrieve the best run from the HyperDrive experiment and display its metric and hyperparameter values.

In [22]:
import joblib
best_run = hyperdrive_run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics()
parameter_values = best_run.get_details()['runDefinition']['arguments']

In [23]:
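# The argument list holds the fixed script arguments followed by the sampled values.
# Labels follow the search space: 200 is the fixed --max_iter, 32 is one of the
# batch_size choices, and 0.0956 lies in the learning_rate range (0.05, 0.1).
print('Best Run Id: ', best_run.id)
print('\n Accuracy:', best_run_metrics['Accuracy'])
print('\n max iterations:', parameter_values[3])
print('\n batch size:', parameter_values[5])
print('\n learning rate:', parameter_values[7])

Best Run Id:  HD_ea6ce62a-6609-4c53-b91a-ef17d61acaaf_2

 Accuracy: 0.8625712476250792

 max iterations: 200

 batch size: 32

 learning rate: 0.09562708686943847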
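
Indexing into the argument list by position is fragile, because nothing in this notebook fixes the order in which the sampled parameters are appended. A more robust option is to pair each flag with the value that follows it. In the sketch below, `arguments_to_dict` is a hypothetical helper, and the flag names at positions 4 and 6 of the example list are assumed from the sampler's keys, since the notebook output does not show them.

```python
def arguments_to_dict(arguments):
    """Pair each '--flag' with the value that follows it."""
    flags = arguments[0::2]
    values = arguments[1::2]
    return {flag.lstrip("-"): value for flag, value in zip(flags, values)}


# Example list shaped like best_run.get_details()['runDefinition']['arguments'];
# the '--batch_size' and '--learning_rate' flag names are assumptions.
example_arguments = [
    "--C", "1",
    "--max_iter", "200",
    "--batch_size", "32",
    "--learning_rate", "0.09562708686943847",
]

params = arguments_to_dict(example_arguments)
print(params["batch_size"])     # 32
print(params["learning_rate"])  # 0.09562708686943847
```

Looking values up by name keeps the printed labels correct even if the argument order changes between runs.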


In [24]:
print(best_run.get_file_names())

['azureml-logs/55_azureml-execution-tvmps_89b70337825914a5e1bab4836fb9ccd10d1004aea04299d8e5b11c61b8897e0c_d.txt', 'azureml-logs/65_job_prep-tvmps_89b70337825914a5e1bab4836fb9ccd10d1004aea04299d8e5b11c61b8897e0c_d.txt', 'azureml-logs/70_driver_log.txt', 'azureml-logs/75_job_post-tvmps_89b70337825914a5e1bab4836fb9ccd10d1004aea04299d8e5b11c61b8897e0c_d.txt', 'azureml-logs/process_info.json', 'azureml-logs/process_status.json', 'logs/azureml/104_azureml.log', 'logs/azureml/job_prep_azureml.log', 'logs/azureml/job_release_azureml.log', 'outputs/text_dnn_sklearn_model.pkl']


In [25]:
print(hyperdrive_run.get_file_names())

['azureml-logs/hyperdrive.txt']


In [27]:
#Register the best model
model = best_run.register_model(model_name = 'textDNN-20News-sklearn', model_path = 'outputs/text_dnn_sklearn_model.pkl')

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shut down all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.

