# Hyperparameter Tuning using HyperDrive

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [1]:

from azureml.core import Dataset, Environment, Experiment, Workspace

from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

from azureml.train.sklearn import SKLearn
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.sampling import BayesianParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import uniform, choice

from azureml.widgets import RunDetails

In [2]:
ws = Workspace.from_config()
experiment_name = 'heartfailure-capstone'

experiment = Experiment(ws, experiment_name)

## Dataset

TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

In [4]:
dataset = Dataset.get_by_name(ws, name='heartfailure')
heart_failure = dataset.to_pandas_dataframe()
heart_failure.head(3)

Unnamed: 0,age,anaemia,creatinine_phosphokinase,diabetes,ejection_fraction,high_blood_pressure,platelets,serum_creatinine,serum_sodium,sex,smoking,time,DEATH_EVENT
0,75.0,0,582,0,20,1,265000.0,1.9,130,1,0,4,1
1,55.0,0,7861,0,38,0,263358.03,1.1,136,1,0,6,1
2,65.0,0,146,0,20,0,162000.0,1.3,129,1,1,7,1


In [6]:
# TODO: Create compute cluster
# Use vm_size = "Standard_D2_V2" in your provisioning configuration.
# max_nodes should be no greater than 4.
cpu_cluster_name = "djscluster1"

try:
    # Reuse the cluster if it already exists in the workspace.
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
except ComputeTargetException:
    # Otherwise provision a new one.
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                            max_nodes=4,
                                                            idle_seconds_before_scaledown=2400)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)

Creating
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


## Hyperdrive Configuration

Bayesian sampling picks each new set of hyperparameters based on how the previous samples performed, aiming to improve the reported primary metric. It does not support early termination policies, so no policy is passed to `HyperDriveConfig` below.
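To get a feel for how large the search space is, the sketch below enumerates the same two ranges that the next cell passes to `choice`. It uses only the standard library and is meant as an illustration; the variable names are made up for this example and are not part of the notebook.

```python
from itertools import product

# Same ranges as the "--n-estimators" and "--max-depth" entries in the search space.
n_estimators_values = list(range(50, 300, 20))
max_depth_values = list(range(1, 20))

# Every (n_estimators, max_depth) pair the sampler could propose.
grid = list(product(n_estimators_values, max_depth_values))

print("n_estimators candidates:", n_estimators_values)
print("max_depth candidates:", max_depth_values)
print("total combinations:", len(grid))
```

With 13 candidate values for the number of estimators and 19 for the depth, there are 247 combinations, so a budget of a few dozen runs covers only part of the space. That is the situation in which a guided sampler is useful.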

In [9]:
param_sampling = BayesianParameterSampling(
    parameter_space={
        "--n-estimators": choice(range(50, 300, 20)),
        "--max-depth": choice(range(1, 20))
    }
)

# TODO: Create your estimator and hyperdrive config
est = SKLearn(
    source_directory="./",
    entry_script="train.py",
    script_params={'--input-data': dataset.id},
    compute_target=cpu_cluster
)

hyperdrive_run_config = HyperDriveConfig(
    primary_metric_name="accuracy",
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=30,
    max_concurrent_runs=4,
    hyperparameter_sampling=param_sampling,
    estimator=est
)

For best results with Bayesian Sampling we recommend using a maximum number of runs greater than or equal to 20 times the number of hyperparameters being tuned. Recommendend value:40.
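The warning above comes down to simple arithmetic: two hyperparameters are being tuned, and the recommendation is at least 20 runs per hyperparameter. The sketch below reproduces that calculation and compares it with the `max_total_runs=30` used in this notebook. The constant names are illustrative only.

```python
# Hyperparameters tuned in this notebook's search space.
HYPERPARAMETERS = ["--n-estimators", "--max-depth"]

# Multiplier quoted in the warning message.
RUNS_PER_HYPERPARAMETER = 20

# Value passed as max_total_runs in HyperDriveConfig.
MAX_TOTAL_RUNS = 30

recommended_runs = RUNS_PER_HYPERPARAMETER * len(HYPERPARAMETERS)
shortfall = max(0, recommended_runs - MAX_TOTAL_RUNS)

print("recommended runs:", recommended_runs)
print("configured runs:", MAX_TOTAL_RUNS)
print("shortfall:", shortfall)
```

The configured budget is 10 runs short of the recommended 40, so raising `max_total_runs` to 40 would meet the guideline, at the cost of more compute time.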


In [11]:
#TODO: Submit your experiment
hyperdrive_run = experiment.submit(hyperdrive_run_config)



## Run Details



In [12]:
# Show run details with the widget and wait for the run to complete.
RunDetails(hyperdrive_run).show()
hyperdrive_run.wait_for_completion(show_output=False)

_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO'…

{'runId': 'HD_99f1be52-f021-4a80-8c24-ce15307b3694',
 'target': 'djscluster1',
 'status': 'Completed',
 'startTimeUtc': '2020-11-23T02:54:43.951065Z',
 'endTimeUtc': '2020-11-23T03:19:19.070084Z',
 'properties': {'primary_metric_config': '{"name": "accuracy", "goal": "maximize"}',
  'resume_from': 'null',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive',
  'platform': 'AML',
  'ContentSnapshotId': '98f7865b-9cdc-488e-9936-196aef0f29ba',
  'score': '0.7888888888888889',
  'best_child_run_id': 'HD_99f1be52-f021-4a80-8c24-ce15307b3694_10',
  'best_metric_status': 'Succeeded'},
 'inputDatasets': [],
 'outputDatasets': [],
 'logFiles': {'azureml-logs/hyperdrive.txt': 'https://djsstorageaccount.blob.core.windows.net/azureml/ExperimentRun/dcid.HD_99f1be52-f021-4a80-8c24-ce15307b3694/azureml-logs/hyperdrive.txt?sv=2019-02-02&sr=b&sig=fmGy5JPxzoqAUYENrj85UvqbF0n9TADBL9D87VrRsuY%3D&st=2020-11-23T03%3A09%3A38Z&se=2020-11-23T11%3A19%3A38Z&sp=r'}}

## Best Model

TODO: In the cell below, get the best model from the hyperdrive experiments and display all the properties of the model.

In [13]:
best_run = hyperdrive_run.get_best_run_by_primary_metric()
best_run.get_metrics()

{'Num estimators:': 70,
 'Max depth:': 16,
 'Confusion Matrix': 'aml://artifactId/ExperimentRun/dcid.HD_99f1be52-f021-4a80-8c24-ce15307b3694_10/Confusion Matrix',
 'accuracy': 0.7888888888888889}
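The metrics above are logged by `train.py`, which is not shown in this notebook. As a rough sketch only, a training script could receive the values HyperDrive passes on the command line as follows. The flag names match the search space and `script_params` used earlier, but the function name, types, and defaults are assumptions, not the actual script.

```python
import argparse


def parse_args(argv=None):
    """Parse the flags passed to the training script (hypothetical sketch)."""
    parser = argparse.ArgumentParser()
    # Dataset ID supplied through script_params in the estimator.
    parser.add_argument("--input-data", type=str)
    # Hyperparameters sampled by HyperDrive; the defaults here are assumptions.
    parser.add_argument("--n-estimators", type=int, default=100)
    parser.add_argument("--max-depth", type=int, default=None)
    return parser.parse_args(argv)


# Example invocation using the best run's hyperparameters and a placeholder dataset ID.
args = parse_args(["--input-data", "dataset-id", "--n-estimators", "70", "--max-depth", "16"])
print(args.input_data, args.n_estimators, args.max_depth)
```

`argparse` converts dashes in flag names to underscores, so `--n-estimators` becomes `args.n_estimators`. The real script would then train the model, log `accuracy` to the run, and write the model under `outputs/` so that `register_model` can find it.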

In [14]:

#TODO: Save the best model
model = best_run.register_model(model_name='model', model_path='outputs/model.pkl')
model

Model(workspace=Workspace.create(name='davidudacitycapstone', subscription_id='641a5604-2a7e-401d-a98d-ee5182f46af8', resource_group='djs-test-e'), name=model, id=model:1, version=1, tags={}, properties={})

In [15]:
cpu_cluster.delete()

Current provisioning state of AmlCompute is "Deleting"


