# Hyperparameter Tuning using HyperDrive

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [None]:
from azureml.core import Workspace, Experiment, Environment, ScriptRunConfig
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.conda_dependencies import CondaDependencies
from azureml.train.hyperdrive import HyperDriveConfig, PrimaryMetricGoal
from azureml.train.hyperdrive import BayesianParameterSampling
from azureml.train.hyperdrive import uniform, choice
from azureml.widgets import RunDetails
import azureml.core
import pandas as pd
import numpy as np

print("SDK version:", azureml.core.VERSION)

## Notebook setup

In [None]:
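# Load the workspace here because the compute lookup below needs it
ws = Workspace.from_config()
cluster_name = "my-cmp"

try:
    # Reuse the cluster if it already exists in the workspace
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes=4)
    # Create the cluster and block until provisioning finishes
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
    compute_target.wait_for_completion(show_output=True)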

## Dataset

The primary objective was to develop an early warning system for US banks: a binary classifier that separates failed banks ('Target'==1) from surviving banks ('Target'==0) using their quarterly filings with the regulator. Overall, 137 failed banks and 6,877 surviving banks were used in this machine learning exercise. Historical observations from the four quarters ending 2010Q3 (stored in ./data) are used to tune the model, and out-of-sample testing is performed on quarterly data starting from 2010Q4 (stored in ./oos).
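To make the class imbalance concrete, the short sketch below computes the failure rate implied by the counts quoted above, together with the accuracy of a trivial classifier that labels every bank as survived. The variable names are illustrative and are not taken from the project code.

```python
# Bank counts quoted in the dataset description
n_failed = 137
n_survived = 6877
n_total = n_failed + n_survived

# Share of banks in the positive ('Target'==1) class
failure_rate = n_failed / n_total

# Accuracy of a trivial model that predicts 'survived' for every bank
baseline_accuracy = n_survived / n_total

print(f"Banks in sample:   {n_total}")
print(f"Failure rate:      {failure_rate:.2%}")
print(f"Baseline accuracy: {baseline_accuracy:.2%}")
```

Such a baseline scores roughly 98% accuracy while detecting no failures at all, so accuracy alone says little about how useful an early warning model is on this data.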

In [32]:
# Create and check workspace
ws = Workspace.from_config()
experiment_name = 'camels-exp'
experiment = Experiment(ws, experiment_name)
project_folder = './dmik'

print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')

Workspace name: my_ws
Azure region: eastus
Subscription id: 0c66ad45-500d-48af-80d3-0039ebf1975e
Resource group: final-rgp


In [33]:
# Load and inspect the training dataset
dataset = ws.datasets['camels11'] 
df = dataset.to_pandas_dataframe()
df.describe()

KeyboardInterrupt: 

## Hyperdrive configuration

TODO: Explain the model you are using and the reason for choosing the different hyperparameters, termination policy and config settings.
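As a rough check on the run budget, the sketch below counts the combinations in the search space defined in the next cell and compares that number with `max_total_runs`. The candidate values are copied from this notebook; the variable names are illustrative. Azure ML's guidance for Bayesian sampling, as commonly cited, suggests a budget of at least 20 runs per tuned hyperparameter and notes that Bayesian sampling does not support early termination, which is consistent with `policy=None` in the configuration. Treat both points as assumptions to verify against the SDK documentation.

```python
from math import prod

# Candidate values copied from the BayesianParameterSampling definition in this notebook
search_space = {
    "learning_rate": [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95],
    "n_estimators": [20, 30, 40, 50],
    "max_features": [2, 3, 4, 5],
    "max_depth": [2, 3, 4, 5],
}
max_total_runs = 80  # value passed to HyperDriveConfig

# Size of the full grid if every combination were tried
n_combinations = prod(len(values) for values in search_space.values())

# Fraction of the grid that the run budget can visit at most
coverage = max_total_runs / n_combinations

# Budget expressed as runs per tuned hyperparameter
runs_per_hyperparameter = max_total_runs / len(search_space)

print(f"Grid size: {n_combinations}")
print(f"Maximum coverage: {coverage:.1%}")
print(f"Runs per hyperparameter: {runs_per_hyperparameter:.0f}")
```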

In [None]:
# Define the training environment and its package dependencies
env = Environment('sklearn-env')
cd = CondaDependencies.create(pip_packages=['azureml-dataset-runtime[pandas,fuse]', 'azureml-defaults'], 
                              conda_packages = ['scikit-learn==0.22.1'])

env.python.conda_dependencies = cd
env.register(workspace = ws)

# Create the script run configuration (default argument values are overridden by HyperDrive)
args = ['--learning_rate', 0.1, '--n_estimators', 20, '--max_features', 5,  '--max_depth', 2]
src = ScriptRunConfig(source_directory=project_folder,
                      script='helpers.py',
                      arguments=args,
                      compute_target=compute_target,
                      environment=env)

# Define the search space. Bayesian sampling does not support early termination, so no policy is set.
param_sampling = BayesianParameterSampling( {
        "learning_rate": choice(0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95),
        "n_estimators" : choice(20, 30, 40, 50),
        "max_features": choice(2, 3, 4, 5),
        "max_depth" : choice(2, 3, 4, 5)
        }
)

# Specify the primary metric: a recall-based metric is warranted to minimize Type II errors (missed bank failures)
primary_metric_name = "norm_macro_recall"  # similar to norm_macro_recall in AutoML
primary_metric_goal=PrimaryMetricGoal.MAXIMIZE

# Create the HyperDrive config
hd_config = HyperDriveConfig(run_config=src,
                             hyperparameter_sampling=param_sampling,
                             policy=None,
                             primary_metric_name=primary_metric_name,
                             primary_metric_goal=primary_metric_goal,
                             max_total_runs=80,
                             max_concurrent_runs=2)
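The primary metric name above refers to normalized macro recall. The training script is not shown in this notebook, so the following is only a minimal sketch of how such a metric could be computed, assuming the AutoML-style definition (macro recall − R) / (1 − R) with R = 1 / number of classes. The function name and the toy labels are illustrative.

```python
def norm_macro_recall(y_true, y_pred):
    """Macro-averaged recall rescaled so that chance level is 0 and a perfect model is 1."""
    classes = sorted(set(y_true))
    if len(classes) < 2:
        raise ValueError("y_true must contain at least two classes")

    recalls = []
    for c in classes:
        # Recall for class c: correctly predicted members / all true members
        n_true = sum(1 for t in y_true if t == c)
        n_hit = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(n_hit / n_true)

    macro_recall = sum(recalls) / len(recalls)
    chance = 1.0 / len(classes)  # macro recall expected from random guessing
    return (macro_recall - chance) / (1.0 - chance)


# Toy labels: 8 surviving banks (0) and 2 failed banks (1)
y_true = [0] * 8 + [1] * 2

# A model that never predicts failure
always_survive = [0] * 10

# A model that catches one of the two failures at the cost of one false alarm
catches_one = [0] * 7 + [1] + [1, 0]

score_trivial = norm_macro_recall(y_true, always_survive)
score_better = norm_macro_recall(y_true, catches_one)
print(score_trivial, score_better)
```

Under this definition the never-predict-failure model scores 0 despite being 80% accurate on the toy labels, which illustrates why the metric suits a problem where missed failures matter most.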

In [None]:
# Start the HyperDrive run
hyperdrive_run = experiment.submit(hd_config)

## Run details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [None]:
RunDetails(hyperdrive_run).show()

In [None]:
# Block until all child runs finish so that the status check below is meaningful
hyperdrive_run.wait_for_completion(show_output=True)

In [22]:
assert hyperdrive_run.get_status() == "Completed"

## Best model

TODO: In the cell below, get the best model from the hyperdrive experiments and display all the properties of the model.

In [23]:
print(hyperdrive_run.get_best_run_by_primary_metric())

None


In [25]:
# Get the best run from all HyperDrive runs
best_run = hyperdrive_run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics()
parameter_values = best_run.get_details()['runDefinition']['arguments']
best_run

AttributeError: 'NoneType' object has no attribute 'get_metrics'
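`get_best_run_by_primary_metric()` returns `None` when no child run has logged a metric under the configured primary metric name, which would explain the `AttributeError` above. The training script `helpers.py` is not shown here, so the following is only a hypothetical sketch of how such a script could accept the hyperparameters that HyperDrive passes on the command line. The function name `parse_hyperparameters` and the default values are assumptions. The logging step is left as comments so that the snippet runs without the Azure ML SDK.

```python
import argparse


def parse_hyperparameters(argv=None):
    """Parse the four hyperparameters tuned in this notebook (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Training script arguments")
    parser.add_argument("--learning_rate", type=float, default=0.1)
    parser.add_argument("--n_estimators", type=int, default=20)
    parser.add_argument("--max_features", type=int, default=5)
    parser.add_argument("--max_depth", type=int, default=2)
    return parser.parse_args(argv)


# Example: the default arguments from the ScriptRunConfig cell, as command-line strings
args = parse_hyperparameters([
    "--learning_rate", "0.1",
    "--n_estimators", "20",
    "--max_features", "5",
    "--max_depth", "2",
])
print(vars(args))

# Inside the real training script, the metric would need to be logged under
# exactly the name given as primary_metric_name, for example:
#
#   from azureml.core import Run
#   run = Run.get_context()
#   run.log("norm_macro_recall", score)  # 'score' is the value computed by the script
```

If the logged name differs from `primary_metric_name` even slightly, HyperDrive has nothing to rank the child runs by, so checking the metric names with `get_metrics()` on one child run is a reasonable first diagnostic.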

In [None]:
best_run_metrics

In [None]:
# Check file names:
print(best_run.get_file_names())

In [None]:
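# Save the best model to the './models' folder
import os
os.makedirs('./models', exist_ok=True)
# Run file names are relative paths such as 'outputs/model.pkl' (no leading slash)
best_run.download_file(name='outputs/model.pkl',
                       output_file_path=os.path.join('./models', 'hyperdr_nmr_model.pkl'))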

In [None]:
# Register the best model (model_path is relative to the run's stored files)
model = best_run.register_model(model_name='hyperdr_nmr_model',
                                model_path='outputs/model.pkl')
model

## Out-of-sample testing

Remember that you have to deploy only one of the two models you trained. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

TODO: In the cell below, send a request to the web service you deployed to test it.

TODO: In the cell below, print the logs of the web service and delete the service