# Hyperparameter Tuning using HyperDrive

Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [None]:
import os
import shutil

import azureml.core
from azureml.widgets import RunDetails
from azureml.core.workspace import Workspace
from azureml.core.experiment import Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import uniform, choice
from azureml.train.sklearn import SKLearn
from azureml.train.hyperdrive.run import PrimaryMetricGoal

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

## Workspace

The config.json file is downloaded from the Azure environment and must be in the same folder as this notebook for this cell to run.

In [None]:
ws = Workspace.from_config()

print("Workspace name:", ws.name)
print("Subscription id:", ws.subscription_id)
print("Resource group:", ws.resource_group)

## Create an Azure ML experiment

I am creating an experiment named "hd_heart_failure_experiment" and a folder to hold the training scripts. The script runs will be recorded under the experiment in Azure.

In [None]:
experiment_name = 'hd_heart_failure_experiment'
project_folder = './hyperparameter-tuning--project'

experiment = Experiment(ws, experiment_name)
experiment

## Create or Attach an AmlCompute cluster

We need a compute target to run the training jobs. The cell below reuses the cluster if it already exists and creates it otherwise.

In [None]:
# Choose a name for the cluster
cpu_cluster_name = 'cluster-cpu'

# Reuse the cluster if it already exists; otherwise create it
try:
    compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print("Found existing cluster, use it.")
except ComputeTargetException:
    print("Creating a new cluster...")
    compute_config = AmlCompute.provisioning_configuration(vm_size='Standard_DS3_v2', min_nodes=1, max_nodes=4)
    compute_target = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

# Block until the cluster is provisioned and ready
compute_target.wait_for_completion(show_output=True)

## Dataset

The dataset contains medical records of 299 patients who had heart failure, collected during their follow-up period, where each patient profile has 13 clinical features.

I am using this data to predict DEATH_EVENT, i.e. whether or not the patient died during the follow-up period (boolean).

The dataset we will be using in this project is called Heart failure clinical records Data Set and is publicly available from UCI Machine Learning Repository.

In [None]:
# Check whether the dataset is already registered in the workspace
from azureml.core import Dataset

key = "heart-failure"
description_text = "Heart failure survival prediction"


if key in ws.datasets.keys():
    dataset = ws.datasets[key]
    print('The Dataset was found')
else:
    # Create an AML Dataset from the public CSV file
    data_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00519/heart_failure_clinical_records_dataset.csv"
    dataset = Dataset.Tabular.from_delimited_files(data_url)
    # Register the Dataset in the Workspace
    dataset = dataset.register(workspace=ws, name=key, description=description_text)

df = dataset.to_pandas_dataframe()

In [None]:
print(df.head())
print(df.describe())

## Hyperdrive Configuration

Early stopping policy - An early stopping policy automatically terminates poorly performing runs, which improves computational efficiency. I chose the BanditPolicy, specified as follows:

early_termination_policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)

where:

evaluation_interval: Optional. The frequency for applying the policy. Each time the training script logs the primary metric counts as one interval, so a value of 2 applies the policy every other time the metric is logged.

slack_factor: The amount of slack allowed with respect to the best performing training run, expressed as a ratio.

Any run whose primary metric does not fall within the slack factor (or slack amount) of the best performing run is terminated. With this policy, the best performing runs execute until they finish, which is why I chose it.
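
To make the slack factor concrete, here is a minimal sketch of the cutoff rule for a metric that is being maximized, assuming the rule described in the Azure ML documentation (a run is terminated when its metric is below the best metric divided by 1 + slack_factor). The function names and the accuracy values are hypothetical and are not part of the SDK.

```python
def bandit_cutoff(best_metric: float, slack_factor: float) -> float:
    """Lowest metric value a run may report and still survive (MAXIMIZE goal)."""
    return best_metric / (1.0 + slack_factor)


def should_terminate(run_metric: float, best_metric: float, slack_factor: float) -> bool:
    """True when the run falls outside the allowed slack of the best run."""
    return run_metric < bandit_cutoff(best_metric, slack_factor)


# Hypothetical accuracies reported at one evaluation interval.
best = 0.88
cutoff = bandit_cutoff(best, slack_factor=0.1)
print(f"cutoff = {cutoff:.3f}")
for accuracy in (0.85, 0.79):
    decision = "terminate" if should_terminate(accuracy, best, 0.1) else "keep"
    print(accuracy, decision)
```

With slack_factor=0.1 and a best accuracy of 0.88, runs reporting roughly 0.80 or more are kept, and anything lower is stopped early.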

Parameter Sampler

I specify the parameter sampler over the parameters C and max_iter. I chose discrete values, defined with choice, for both parameters.

C is the regularization parameter, while max_iter is the maximum number of iterations.

RandomParameterSampling is one of the choices available for the sampler. I chose it because it is faster than an exhaustive search and supports early termination of low-performing runs. If budget is not an issue, it would be better to use GridParameterSampling to search the space exhaustively, or BayesianParameterSampling to explore the hyperparameter space.
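
The trade-off between random and grid sampling can be illustrated with a small pure-Python sketch. It is not the HyperDrive implementation: it simply draws one value per parameter uniformly from the same discrete lists used in this notebook, and the helper name `sample_configuration` is hypothetical. The sketch samples with replacement, which may differ from what the service does.

```python
import random

# Search space copied from the sampler definition in this notebook.
search_space = {
    "--C": [0.001, 0.01, 0.1, 1, 10, 20, 50, 100, 200, 500, 1000],
    "--max_iter": [50, 100, 200, 300],
}


def sample_configuration(space, rng):
    """Draw one value per hyperparameter, uniformly and independently."""
    return {name: rng.choice(values) for name, values in space.items()}


rng = random.Random(0)  # fixed seed only so the sketch is repeatable

# Number of combinations an exhaustive grid search would have to run.
grid_size = 1
for values in search_space.values():
    grid_size *= len(values)

# Sixteen draws, matching max_total_runs=16.
samples = [sample_configuration(search_space, rng) for _ in range(16)]
distinct = {tuple(sample.items()) for sample in samples}

print("grid size:", grid_size)
print("configurations sampled:", len(samples))
print("distinct configurations:", len(distinct))
```

An exhaustive grid over this space needs 44 runs, while random sampling with a budget of 16 runs covers at most 16 of those combinations.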

HyperDriveConfig

The configuration chosen is as follows:

hyperparameter_sampling - The hyperparameter sampling space as defined above.

primary_metric_name - The name of the primary metric reported by the experiment runs. In our case, it is Accuracy.

primary_metric_goal - I chose PrimaryMetricGoal.MAXIMIZE. This parameter determines that the primary metric is to be maximized when evaluating runs.

policy - It refers to the early termination policy that is specified above.

estimator - An estimator that will be called with the sampled hyperparameters. Of the three options (estimator, run_config, and pipeline), I chose estimator. The estimator runs the train.py script, which performs only basic manipulation of the data.

max_total_runs=16 - The maximum total number of runs to create. This is the upper bound; there may be fewer runs when the sample space is smaller than this value. If both max_total_runs and max_duration_minutes are specified, the hyperparameter tuning experiment terminates when the first of these two thresholds is reached.

max_concurrent_runs=4 - The maximum number of runs to execute concurrently. If None, all runs are launched in parallel. The number of concurrent runs is gated on the resources available in the specified compute target. Hence, you need to ensure that the compute target has the available resources for the desired concurrency.
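
The sampled hyperparameters reach the training script as command-line arguments named after the keys of the sampler (`--C` and `--max_iter`). Since train.py is not shown here, the following is only a sketch of how such a script might read them; the function name, defaults, and help strings are assumptions.

```python
import argparse


def parse_hyperparameters(argv=None):
    """Parse the arguments HyperDrive would append to the script's command line."""
    parser = argparse.ArgumentParser(description="Hypothetical train.py argument handling")
    parser.add_argument("--C", type=float, default=1.0, help="regularization parameter")
    parser.add_argument("--max_iter", type=int, default=100, help="maximum number of iterations")
    return parser.parse_args(argv)


# Simulate one sampled configuration being passed to the script.
args = parse_hyperparameters(["--C", "0.1", "--max_iter", "200"])
print("C:", args.C)
print("max_iter:", args.max_iter)

# In a real run, the script would train the model with these values and log the
# result under the name given as primary_metric_name, for example
# run.log("Accuracy", accuracy), so that HyperDrive can compare the child runs.
```

The metric name logged by the script has to match primary_metric_name exactly; otherwise HyperDrive has nothing to compare and the early termination policy cannot be applied.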


In [None]:
# early termination policy.

early_termination_policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)

# Create the different params that you will be using during training
param_sampling = RandomParameterSampling(
    {
        '--C' : choice(0.001,0.01,0.1,1,10,20,50,100,200,500,1000),
        '--max_iter': choice(50,100,200,300)
    }
)

if "training" not in os.listdir():
    os.mkdir("./training")

# Create the estimator that runs train.py on the compute target
estimator = SKLearn(source_directory = "./",
            compute_target=compute_target,
            vm_size='STANDARD_DS3_V2',
            entry_script="train.py")

# Create a HyperDriveConfig using the estimator, hyperparameter sampler, and policy.

hyperdrive_run_config = HyperDriveConfig(hyperparameter_sampling=param_sampling, 
                                     primary_metric_name='Accuracy',
                                     primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                     policy=early_termination_policy,
                                     estimator=estimator,
                                     max_total_runs=16,
                                     max_concurrent_runs=4)

In [None]:
# Start the HyperDrive run
hyperdrive_run = experiment.submit(hyperdrive_run_config, show_output=True)


## Run Details

The cell below uses the `RunDetails` widget to show the progress of the different runs.

In [None]:
# Monitor HyperDrive runs 
# You can monitor the progress of the runs with the following Jupyter widget
RunDetails(hyperdrive_run).show()

hyperdrive_run.wait_for_completion(show_output=True)

assert hyperdrive_run.get_status() == "Completed"

## Best Model

The cell below gets the best model from the HyperDrive runs and displays its properties.

In [None]:
RunDetails(hyperdrive_run).show()

# Get the best run and save the model from that run.

# get_best_run_by_primary_metric()
# Returns the best Run, or None if no child has the primary metric.
best_run = hyperdrive_run.get_best_run_by_primary_metric()

# get_metrics()
# Called on the best run, returns the metrics logged by that run.
print("Best run metrics :",best_run.get_metrics())

# get_details()
# Returns a dictionary with the details for the run
print("Best run details :",best_run.get_details())

# get_file_names()
# Returns a list of the files that are stored in association with the run.

print("Best run file names :",best_run.get_file_names())
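
As a rough illustration of what selecting the best run means for a MAXIMIZE goal, the sketch below picks the run with the highest value of the primary metric and returns None when no run reported it. The run ids, metric values, and the helper `best_run_by_metric` are hypothetical; the SDK performs this selection itself.

```python
def best_run_by_metric(child_metrics, metric_name="Accuracy"):
    """Return the id of the run with the highest metric, or None if none logged it."""
    scored = {
        run_id: metrics[metric_name]
        for run_id, metrics in child_metrics.items()
        if metric_name in metrics
    }
    if not scored:
        return None
    return max(scored, key=scored.get)


# Hypothetical metrics for three child runs; run_2 never logged Accuracy.
child_metrics = {
    "run_0": {"Accuracy": 0.78},
    "run_1": {"Accuracy": 0.83},
    "run_2": {},
}

print(best_run_by_metric(child_metrics))
```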

In [None]:
# Register the best model, using the same run-relative path as the saved model file

best_run.register_model(model_name="best_run_hyperdrive.pkl", model_path="outputs/model.pkl")

print(best_run)

In [None]:
# Download the model file

best_run.download_file('outputs/model.pkl', 'hyperdrive_model.pkl')

## Model Deployment

Remember you have to deploy only one of the two models you trained but you still need to register both the models. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

TODO: In the cell below, send a request to the web service you deployed to test it.

TODO: In the cell below, print the logs of the web service and delete the service

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shutdown all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.

