# Hyperparameter Tuning using HyperDrive

In the cell below, import all the dependencies that will be needed to complete the project.

In [1]:
from azureml.core import Workspace, Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.widgets import RunDetails
from azureml.train.sklearn import SKLearn
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import uniform,choice
import os
import shutil

## Initialize Workspace and Experiment

In [2]:
ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'titanic-survival-prediction'

experiment=Experiment(ws, experiment_name)

print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')

personal-workspace
personal
eastus2
8fb18662-8aa6-4db6-8a37-65c1a334f920


## Initialize Compute Target

In [3]:
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
from azureml.core.compute_target import ComputeTargetException

# NOTE: update the cluster name to match the existing cluster
# Choose a name for your CPU cluster
amlcompute_cluster_name = "aml-compute1"

# Verify that cluster does not exist already
try:
    compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS2_V2',
                                                           max_nodes=1)
    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)

#compute_target.wait_for_completion(show_output=True, min_node_count = 1, timeout_in_minutes = 10)
#compute_target.get_status()

Found existing cluster, use it.


## Dataset

### Overview
This dataset contains information about Titanic passengers and whether or not each passenger survived. It was obtained from https://data.world/nrippner/titanic-disaster-dataset

Features:

- `survived` - Survival (0 = No; 1 = Yes)
- `pclass` - Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)
- `sex` - Sex
- `age` - Age
- `sibsp` - Number of Siblings/Spouses Aboard
- `parch` - Number of Parents/Children Aboard
- `fare` - Passenger Fare
- `cabin` - Cabin
- `embarked` - Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)



In [4]:
from azureml.core import Workspace, Dataset

dataset = Dataset.get_by_name(ws, name='titanic-survival')
df = dataset.to_pandas_dataframe()
df.head()

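| Unnamed: 0 | pclass | survived | sex | age | sibsp | parch | fare | cabin | embarked |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 1.0 | female | 29.0 | 0.0 | 0.0 | 211.3375 | B5 | S |
| 1 | 1.0 | 1.0 | male | 0.9167 | 1.0 | 2.0 | 151.55 | C22 C26 | S |
| 2 | 1.0 | 0.0 | female | 2.0 | 1.0 | 2.0 | 151.55 | C22 C26 | S |
| 3 | 1.0 | 0.0 | male | 30.0 | 1.0 | 2.0 | 151.55 | C22 C26 | S |
| 4 | 1.0 | 0.0 | female | 25.0 | 1.0 | 2.0 | 151.55 | C22 C26 | S |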


## Hyperdrive Configuration

Since this is a classification problem, we train a Logistic Regression classifier. We use a random parameter sampler because it can identify good hyperparameters in less time than an exhaustive grid search.
Random sampling can also cover more of the hyperparameter space than a grid search when the grid is poorly defined.
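
To make the contrast with grid search concrete, here is a minimal standard-library sketch of the two strategies over the same discrete search space used in this notebook. The helper names `grid_configs` and `random_configs` are hypothetical, and this only illustrates the idea; it is not how HyperDrive implements `RandomParameterSampling`.

```python
import itertools
import random

# Search space copied from the sampler defined in this notebook.
search_space = {
    "--C": [0.01, 0.05, 0.1, 0.5, 1],
    "--max_iter": [30, 50, 100],
}


def grid_configs(space):
    """Enumerate every combination, as an exhaustive grid search would."""
    names = list(space)
    return [dict(zip(names, values))
            for values in itertools.product(*space.values())]


def random_configs(space, n_samples, seed=0):
    """Draw n_samples configurations, picking each value uniformly at random."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()}
            for _ in range(n_samples)]


all_configs = grid_configs(search_space)
sampled = random_configs(search_space, n_samples=8)

print(len(all_configs))  # 15 distinct combinations (5 values of C x 3 of max_iter)
for config in sampled:
    print(config)
```

With a run budget smaller than the grid (8 samples here, chosen arbitrarily), random sampling tries only a subset of the combinations. Note that this particular space has just 15 distinct combinations, well below the `max_total_runs=100` configured in this notebook.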

Bandit policy is an early termination policy based on a slack factor and an evaluation interval. It terminates runs whose primary metric is not within the specified slack factor of the most successful run.
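
The slack-factor rule can be sketched in a few lines. This assumes the ratio-based rule for a maximization goal, where a run is cut when its metric falls below `best / (1 + slack_factor)`. The function name `bandit_should_terminate` is hypothetical, and the sketch does not model `evaluation_interval` or `delay_evaluation`.

```python
def bandit_cutoff(best_metric, slack_factor):
    """Lowest metric value a run may report without being terminated."""
    return best_metric / (1 + slack_factor)


def bandit_should_terminate(run_metric, best_metric, slack_factor):
    """Return True if the run falls outside the allowed slack (maximize goal)."""
    return run_metric < bandit_cutoff(best_metric, slack_factor)


# Best accuracy reported by this experiment, with the slack factor used below.
best_accuracy = 0.7862595419847328
slack = 0.1

print(round(bandit_cutoff(best_accuracy, slack), 3))        # about 0.715
print(bandit_should_terminate(0.70, best_accuracy, slack))  # True: below the cutoff
print(bandit_should_terminate(0.75, best_accuracy, slack))  # False: within the slack
```

In other words, with `slack_factor = 0.1` a run has to stay within roughly 91% of the best accuracy seen so far to keep running.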



In [5]:
# Specify parameter sampler
ps = RandomParameterSampling( {
        "--C": choice(0.01,0.05,0.1,0.5,1),
        "--max_iter":choice(30,50,100)
    } )

# Specify a Policy
policy = BanditPolicy(slack_factor = 0.1, evaluation_interval=1, delay_evaluation=5)

#if "training" not in os.listdir():
    #os.mkdir("./training")

#shutil.move("./train.py", "./training")

# Create a SKLearn estimator for use with train.py
est = SKLearn(source_directory = "./training",
            entry_script = "train.py",
            compute_target = compute_target
             )

# Create a HyperDriveConfig using the estimator, hyperparameter sampler, and policy.
hyperdrive_config = HyperDriveConfig(estimator = est,
                             hyperparameter_sampling=ps,
                             policy=policy,
                             primary_metric_name="Accuracy",
                             primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                             max_total_runs=100,
                             max_concurrent_runs=1)

'SKLearn' estimator is deprecated. Please use 'ScriptRunConfig' from 'azureml.core.script_run_config' with your own defined environment or the AzureML-Tutorial curated environment.
'enabled' is deprecated. Please use the azureml.core.runconfig.DockerConfiguration object with the 'use_docker' param instead.
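
The keys of the parameter sampler (`--C` and `--max_iter`) reach the entry script as command-line arguments. `train.py` is not shown in this notebook, so the following is only a sketch of how such a script might read them. The function name `parse_args` and the default values are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the hyperparameters that the sampler passes on the command line."""
    parser = argparse.ArgumentParser()
    # Argument names match the keys used in RandomParameterSampling above.
    parser.add_argument("--C", type=float, default=1.0,
                        help="Regularization parameter of the classifier")
    parser.add_argument("--max_iter", type=int, default=100,
                        help="Maximum number of solver iterations")
    return parser.parse_args(argv)


# Simulate the arguments one sampled configuration would produce.
args = parse_args(["--C", "0.5", "--max_iter", "100"])
print(args.C, args.max_iter)  # 0.5 100
```

The parsed values would then typically be passed to the classifier and logged alongside the primary metric, which must be logged under the same name given in `primary_metric_name` (`"Accuracy"` here) for HyperDrive to compare runs.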


In [6]:
# Submit your experiment

run = experiment.submit(config = hyperdrive_config, show_output = True)



## Run Details

Use the `RunDetails` widget to show the different experiments.

In [14]:
from azureml.widgets import RunDetails

RunDetails(run).show()
run.wait_for_completion(show_output = True)

_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO'…

RunId: HD_a55d29b1-1bcf-4d26-b462-a8505a531701
Web View: https://ml.azure.com/runs/HD_a55d29b1-1bcf-4d26-b462-a8505a531701?wsid=/subscriptions/8fb18662-8aa6-4db6-8a37-65c1a334f920/resourcegroups/personal/workspaces/personal-workspace&tid=41f83608-ac15-4410-bf59-af30a0f8cb83

Execution Summary
RunId: HD_a55d29b1-1bcf-4d26-b462-a8505a531701
Web View: https://ml.azure.com/runs/HD_a55d29b1-1bcf-4d26-b462-a8505a531701?wsid=/subscriptions/8fb18662-8aa6-4db6-8a37-65c1a334f920/resourcegroups/personal/workspaces/personal-workspace&tid=41f83608-ac15-4410-bf59-af30a0f8cb83



{'runId': 'HD_a55d29b1-1bcf-4d26-b462-a8505a531701',
 'target': 'aml-compute1',
 'status': 'Completed',
 'startTimeUtc': '2021-06-13T01:35:17.646776Z',
 'endTimeUtc': '2021-06-13T01:51:54.179001Z',
 'properties': {'primary_metric_config': '{"name": "Accuracy", "goal": "maximize"}',
  'resume_from': 'null',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive',
  'platform': 'AML',
  'ContentSnapshotId': 'c9e1cee1-cdad-4a2f-975d-6ab07d23a41d',
  'score': '0.7862595419847328',
  'best_child_run_id': 'HD_a55d29b1-1bcf-4d26-b462-a8505a531701_4',
  'best_metric_status': 'Succeeded'},
 'inputDatasets': [],
 'outputDatasets': [],
 'logFiles': {'azureml-logs/hyperdrive.txt': 'https://personalworksp1821450288.blob.core.windows.net/azureml/ExperimentRun/dcid.HD_a55d29b1-1bcf-4d26-b462-a8505a531701/azureml-logs/hyperdrive.txt?sv=2019-02-02&sr=b&sig=XE35asEuGZ6NNeQpCkP7P65L2xhJGjcIVVx2NLhGfBI%3D&st=2021-06-13T01%3A41%3A57Z&se=2021-06-13T09%3A51%3A57Z&sp=r'},
 'submittedBy': 'Aditya An

## Best Model

In the cells below, get the best model from the hyperdrive experiments and display all the properties of the model.

In [12]:
#Identify the best run

best_run = run.get_best_run_by_primary_metric()
print(best_run,'\n')
print(best_run.get_metrics(),'\n')

Run(Experiment: titanic-survival-prediction,
Id: HD_a55d29b1-1bcf-4d26-b462-a8505a531701_4,
Type: azureml.scriptrun,
Status: Completed) 

{'Regularization Strength:': 0.5, 'Max iterations:': 100, 'Accuracy': 0.7862595419847328} 



In [13]:
#Download the best model

best_run.download_file('outputs/model.pkl','best_hyperdrive_model')

# Register best model
model = best_run.register_model(model_name='best_hyperdrive_model',
                           model_path='outputs/model.pkl')
print(model.name, model.id, model.version, sep='\t')


best_hyperdrive_model	best_hyperdrive_model:1	1


## Model Deployment

Remember that you only have to deploy one of the two models you trained. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

TODO: In the cell below, send a request to the web service you deployed to test it.

TODO: In the cell below, print the logs of the web service and delete the service