## Model Training with Automated Hyperparameter Tuning
### Train the PyTorch deep learning regression model with Azure ML service, using HyperDrive to search for the best model hyperparameters automatically.

#### <font color='red'> Before you begin: download the training dataset from Kaggle and save it in the "data" folder as "train.csv". You need to log in to Kaggle to download the dataset. </font>

#### Set up diagnostics collection

In [1]:
from azureml.telemetry import set_diagnostics_collection

set_diagnostics_collection(send_diagnostics=True)

Turning diagnostics collection on. 


#### Initialize the Azure ML Workspace

In [2]:
from azureml.core.workspace import Workspace

ws = Workspace.from_config()
print("Workspace name: " + ws.name, 
      "Azure region: " + ws.location,
      "Resource group: " + ws.resource_group, sep = "\n")

Found the config file in: C:\AI+ Tour Tutorials\Azure ML service\housing\AzureML\aml_config\config.json
Workspace name: ML-Service-Workspace
Azure region: eastus
Resource group: ML-Service-RG


#### Attach your compute target

In [3]:
from azureml.core.compute import ComputeTarget

cluster_name = "gpucluster"
compute_target = ComputeTarget(workspace=ws, name=cluster_name)

print(compute_target.status.serialize())

{'allocationState': 'Steady', 'allocationStateTransitionTime': '2019-02-15T16:38:49.403000+00:00', 'creationTime': '2019-02-15T13:47:08.984315+00:00', 'currentNodeCount': 4, 'errors': None, 'modifiedTime': '2019-02-15T13:47:26.304081+00:00', 'nodeStateCounts': {'idleNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0, 'preparingNodeCount': 0, 'runningNodeCount': 4, 'unusableNodeCount': 0}, 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 0, 'maxNodeCount': 4, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'targetNodeCount': 4, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_NC6'}


#### Specify the training script folder

In [4]:
script_folder = "./script"

#### Create an Experiment in your Workspace to track the training runs

In [5]:
from azureml.core import Experiment

experiment_name = "pytorch-dl-regression-hyperdrive"
experiment = Experiment(ws, name=experiment_name)

#### Upload data to the cloud

In [6]:
ds = ws.get_default_datastore()
print(ds.datastore_type, ds.account_name, ds.container_name)

ds.upload(src_dir="../data", target_path="pytorch-dl-regression", overwrite=True, show_progress=True)

AzureBlob mlservicstoragevqkhmalr azureml-blobstore-03a77933-b9d0-4918-bd23-4f23d00afafb
Uploading ../data\train.csv
Uploaded ../data\train.csv, 1 files out of an estimated total of 1


$AZUREML_DATAREFERENCE_f4888c92a19f435096142cf87d8ef0f5

#### Create a Run Configuration or an Estimator to submit training jobs to your target compute environment. Here we create the PyTorch-specific Estimator

In [7]:
from azureml.train.dnn import PyTorch

script_params = {
    "--data-folder": ds.as_mount()
}

estimator = PyTorch(source_directory=script_folder,
                    script_params=script_params,
                    compute_target=compute_target,
                    entry_script="train_model.py",
                    use_gpu=True,
                    conda_packages=["scikit-learn", "pandas"]
                    )
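
The estimator passes `script_params` to `train_model.py` as command-line arguments, so the script has to parse them. The script itself is not shown in this notebook; the following is only a minimal sketch of how it might read its inputs. The function name `parse_args`, the default values, and the assumption that each hyperparameter arrives as a `--name value` pair are all hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the arguments a training script like train_model.py might receive."""
    parser = argparse.ArgumentParser(description="Hypothetical argument parsing for train_model.py")
    # Path where the datastore is mounted on the compute node.
    parser.add_argument("--data-folder", dest="data_folder", type=str, required=True)
    # Hyperparameters tuned later in this notebook; defaults are illustrative only.
    parser.add_argument("--num_hidden_layers", type=int, default=1)
    parser.add_argument("--hidden_layer_size", type=int, default=256)
    parser.add_argument("--dropout_rate", type=float, default=0.1)
    parser.add_argument("--learning_rate", type=float, default=0.005)
    return parser.parse_args(argv)


# Example call with an explicit argument list instead of sys.argv.
args = parse_args(["--data-folder", "/mnt/data", "--num_hidden_layers", "2"])
print(args.data_folder, args.num_hidden_layers, args.learning_rate)
```

Passing an explicit list to `parse_args` makes the parsing easy to check locally before submitting a job to the cluster.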

#### Create the HyperDrive configuration, which wraps your Estimator together with the definition of your hyperparameter search strategy

In [8]:
from azureml.train.hyperdrive import (GridParameterSampling, HyperDriveRunConfig,
                                      MedianStoppingPolicy, PrimaryMetricGoal, choice)

param_sampling = GridParameterSampling( {
        "num_hidden_layers": choice(1, 2, 3),
        "hidden_layer_size": choice(256, 512),
        "dropout_rate": choice(0.1, 0.25),
        "learning_rate": choice(0.005, 0.0025)
    }
)

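# No early termination: every grid combination runs to completion.
# To stop poorly performing runs early, use for example:
# early_termination_policy = MedianStoppingPolicy(evaluation_interval=1, delay_evaluation=25)
early_termination_policy = None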

hyperdrive = HyperDriveRunConfig(estimator=estimator,
                                 hyperparameter_sampling=param_sampling,
                                 policy=early_termination_policy,
                                 #primary_metric_name="validation loss",
                                 primary_metric_name="MAE (Validation)",
                                 primary_metric_goal=PrimaryMetricGoal.MINIMIZE,
                                 max_total_runs=24,
                                 max_concurrent_runs=24)
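
Grid sampling tries every combination of the listed values, which is why `max_total_runs` is set to 24 here. The sketch below enumerates the same search space in plain Python to confirm that count. It does not call Azure ML; the helper name `enumerate_grid` is hypothetical, and the estimate of scheduling rounds assumes one child run per node.

```python
import math
from itertools import product

# Same search space as the GridParameterSampling definition above.
search_space = {
    "num_hidden_layers": [1, 2, 3],
    "hidden_layer_size": [256, 512],
    "dropout_rate": [0.1, 0.25],
    "learning_rate": [0.005, 0.0025],
}


def enumerate_grid(space):
    """Return one dict per combination of hyperparameter values."""
    names = list(space)
    return [dict(zip(names, values)) for values in product(*(space[n] for n in names))]


grid = enumerate_grid(search_space)
print(len(grid))  # 3 * 2 * 2 * 2 = 24 combinations

# maxNodeCount reported in the cluster status output earlier in this notebook.
max_nodes = 4
# Assumption: each child run occupies one node, so runs execute in rounds.
print(math.ceil(len(grid) / max_nodes))  # 6 rounds
```

Under that assumption, requesting 24 concurrent runs on a 4-node cluster means most runs wait in the queue until a node becomes free.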

#### Submit your training job

In [9]:
run = experiment.submit(hyperdrive)
print(run)

Run(Experiment: pytorch-dl-regression-hyperdrive,
Id: pytorch-dl-regression-hyperdrive_1550250079509,
Type: hyperdrive,
Status: Running)


#### Get more details of your run

In [10]:
print(run.get_details())

{'runId': 'pytorch-dl-regression-hyperdrive_1550250079509', 'target': 'gpucluster', 'status': 'Running', 'properties': {'primary_metric_config': '{"name": "MAE (Validation)", "goal": "minimize"}', 'runTemplate': 'HyperDrive', 'azureml.runsource': 'hyperdrive'}, 'logFiles': {'azureml-logs/hyperdrive.txt': 'https://mlservicstoragevqkhmalr.blob.core.windows.net/azureml/ExperimentRun/dcid.pytorch-dl-regression-hyperdrive_1550250079509/azureml-logs/hyperdrive.txt?sv=2018-03-28&sr=b&sig=AY%2BO1vXtB2r5AOHMS5I7s68xR8Oi52o42Wkw2TZXoX0%3D&st=2019-02-15T16%3A51%3A23Z&se=2019-02-16T01%3A01%3A23Z&sp=r'}}


#### Monitor your job

In [11]:
from azureml.widgets import RunDetails

RunDetails(run).show()

_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': True, 'log_level': 'INFO',…

#### Wait for the job to complete and print a summary of the execution

In [12]:
run.wait_for_completion(show_output = True)

RunId: pytorch-dl-regression-hyperdrive_1550250079509

Execution Summary
RunId: pytorch-dl-regression-hyperdrive_1550250079509

{'runId': 'pytorch-dl-regression-hyperdrive_1550250079509',
 'target': 'gpucluster',
 'status': 'Completed',
 'endTimeUtc': '2019-02-15T17:40:08.000Z',
 'properties': {'primary_metric_config': '{"name": "MAE (Validation)", "goal": "minimize"}',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive'},
 'logFiles': {'azureml-logs/hyperdrive.txt': 'https://mlservicstoragevqkhmalr.blob.core.windows.net/azureml/ExperimentRun/dcid.pytorch-dl-regression-hyperdrive_1550250079509/azureml-logs/hyperdrive.txt?sv=2018-03-28&sr=b&sig=coFjHgkJgJGvkauK%2FSb18YiuGajvArdHn86R0%2FggknI%3D&st=2019-02-15T17%3A30%3A11Z&se=2019-02-16T01%3A40%3A11Z&sp=r'}}

#### Cancel your job while it is still running, if needed

In [13]:
# run.cancel()

#### You can also use SDK methods to fetch all the child runs and see their individual metrics

In [14]:
import pandas as pd

children = list(run.get_children())
metricslist = {}

for i, single_run in enumerate(children):
    # Keep only the metrics that were logged as a single float value
    results = {k: v for k, v in single_run.get_metrics().items() if isinstance(v, float)}
    # Arguments is a flat list of flag/value pairs; the odd indices below
    # pick out the hyperparameter values
    parameters = single_run.get_details()['runDefinition']['Arguments']
    results['num_hidden_layers'] = parameters[3]
    results['hidden_layer_size'] = parameters[5]
    results['dropout_rate'] = parameters[7]
    results['learning_rate'] = parameters[9]
    metricslist[i] = results

# One row per child run, sorted by validation MAE (best first)
rundata = pd.DataFrame(metricslist).sort_index(axis=1).T.sort_values(by=['MAE (Validation)'], ascending=True)
display(rundata)



| | MAE (Test) | MAE (Train) | MAE (Validation) | dropout_rate | hidden_layer_size | learning_rate | num_hidden_layers |
|---|---|---|---|---|---|---|---|
| 13 | 27541.9 | 13217.7 | 21148.1 | 0.1 | 512 | 0.005 | 1 |
| 17 | 29006.5 | 14291.0 | 22767.1 | 0.1 | 256 | 0.005 | 1 |
| 16 | 29592.1 | 15336.0 | 22926.0 | 0.25 | 512 | 0.005 | 1 |
| 2 | 29042.1 | 15262.4 | 23404.7 | 0.1 | 512 | 0.005 | 2 |
| 23 | 30502.9 | 15303.2 | 23606.3 | 0.1 | 256 | 0.005 | 2 |
| 18 | 29065.0 | 15176.1 | 24161.9 | 0.1 | 512 | 0.0025 | 1 |
| 8 | 29210.2 | 17685.3 | 24501.0 | 0.25 | 512 | 0.0025 | 2 |
| 4 | 31458.2 | 20129.6 | 24590.0 | 0.25 | 512 | 0.0025 | 3 |
| 15 | 29366.1 | 16844.0 | 24684.2 | 0.1 | 256 | 0.005 | 3 |
| 11 | 28439.5 | 16920.6 | 24703.1 | 0.1 | 512 | 0.0025 | 3 |
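
The cell above reads the hyperparameters from fixed positions in the `Arguments` list, which breaks if the argument order ever changes. A possible alternative is to pair each flag with its value and look parameters up by name. This is a minimal sketch; the helper `parse_arguments` and the example list are hypothetical, with the layout inferred from the indices used above.

```python
def parse_arguments(arguments):
    """Turn a flat ['--name', 'value', ...] list into a {name: value} dict."""
    parsed = {}
    # Walk the list in (flag, value) pairs: even positions are flags, odd are values.
    for flag, value in zip(arguments[0::2], arguments[1::2]):
        parsed[flag.lstrip("-")] = value
    return parsed


# Hypothetical Arguments list; the real one comes from
# single_run.get_details()['runDefinition']['Arguments'].
example_arguments = [
    "--data-folder", "$AZUREML_DATAREFERENCE_example",
    "--num_hidden_layers", "1",
    "--hidden_layer_size", "512",
    "--dropout_rate", "0.1",
    "--learning_rate", "0.005",
]

params = parse_arguments(example_arguments)
print(params["hidden_layer_size"])  # prints 512 (values stay strings)
```

In the loop above, `results.update(...)` with the parsed dict (minus `data-folder`) could then replace the four index-based assignments.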
