# Hyperparameter Tuning using HyperDrive

In [1]:
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.core import ScriptRunConfig, ComputeTarget
from azureml.train.hyperdrive import uniform, MedianStoppingPolicy, HyperDriveConfig, PrimaryMetricGoal, RandomParameterSampling
from azureml.core import Environment

from azureml.widgets import RunDetails

## Dataset

In [2]:
ws = Workspace.from_config()
experiment_name = 'hr-hyperdrive'

experiment = Experiment(ws, experiment_name)

## HyperDrive Configuration

We created a custom script that cleans the data and trains an AdaBoost model. AdaBoost is an ensemble algorithm that generally gives good results, so it is a reasonable first model to try.
The parameters we chose to optimize were the number of estimators (trees) in the ensemble and the learning rate. Both are sampled uniformly.
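
To make the uniform sampling concrete, here is a minimal plain-Python sketch of what drawing configurations from this search space amounts to. It does not use the HyperDrive API. `SEARCH_SPACE`, `sample_configuration`, and the fixed seed are illustrative choices that do not appear in the notebook; only the bounds are taken from the configuration cell.

```python
import random

# Illustrative search space: (low, high) bounds per command-line argument.
SEARCH_SPACE = {
    "--n_estimators": (20, 200),
    "--learning_rate": (0.5, 2.0),
}


def sample_configuration(space, rng):
    """Draw one value per hyperparameter, uniformly within its bounds."""
    return {name: rng.uniform(low, high) for name, (low, high) in space.items()}


rng = random.Random(0)  # fixed seed, only so the sketch is reproducible
samples = [sample_configuration(SEARCH_SPACE, rng) for _ in range(30)]

# A continuous uniform draw is a float, so an integer-valued parameter such as
# the number of estimators has to be converted before it reaches the model.
# Truncating with int() is one option (an assumption, not the notebook's code).
first = samples[0]
n_estimators = int(first["--n_estimators"])
print(first, n_estimators)
```

Each of the 30 draws is independent of the others, which is what distinguishes random sampling from a grid or a Bayesian strategy.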

As the early termination policy we chose the Median Stopping Policy. Under this policy, a run is stopped when its best primary metric is worse than the median of the running averages of that metric across all runs.
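
The stopping rule can be sketched in a few lines of plain Python. This is a simplified illustration of median stopping (compare a run's best metric so far with the median of the other runs' running averages), not the service's implementation. `should_stop` and its arguments are hypothetical, and `delay_evaluation` is treated here simply as the number of initial reports during which the policy is not applied.

```python
from statistics import median


def running_average(values):
    """Mean of the metric values reported so far."""
    return sum(values) / len(values)


def should_stop(current_history, other_histories, delay_evaluation=5, maximize=True):
    """Return True if the run under evaluation should be cancelled.

    current_history: metric values reported so far by the run being checked.
    other_histories: metric histories of the other runs.
    """
    step = len(current_history)
    if step <= delay_evaluation:
        return False  # policy is not applied yet
    # Running average of every other run over the same number of reports.
    averages = [running_average(h[:step]) for h in other_histories if h]
    if not averages:
        return False  # nothing to compare against
    threshold = median(averages)
    best = max(current_history) if maximize else min(current_history)
    return best < threshold if maximize else best > threshold


# Made-up metric histories, for illustration only.
others = [[0.60, 0.62, 0.64], [0.55, 0.58, 0.60], [0.70, 0.71, 0.72]]
weak = [0.50, 0.52, 0.53]
strong = [0.65, 0.66, 0.68]

print(should_stop(weak, others, delay_evaluation=2))
print(should_stop(strong, others, delay_evaluation=2))
print(should_stop(weak, others, delay_evaluation=5))
```

With the delay set to 5, the weak run is not evaluated at all after three reports, which illustrates how `delay_evaluation` protects runs that start slowly.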

The scikit-learn estimator classes of the Azure ML SDK are deprecated, so we used a ScriptRunConfig object instead.
In our custom script we logged the weighted AUC metric under the name "AUC_weighted", and the same name must be specified in the HyperDriveConfig object.
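
A training script that works with this setup has to accept the sampled values as command-line arguments and log the metric under the agreed name. The sketch below shows one possible shape for that part of `train.py`; it is an assumption, not the actual script. `PrintRun` is a stand-in so the example runs offline. In a real run the object would come from `Run.get_context()` in `azureml.core`, which offers the same `log(name, value)` call.

```python
import argparse

# Must be identical to primary_metric_name in the HyperDriveConfig.
METRIC_NAME = "AUC_weighted"


def parse_args(argv=None):
    """Parse the hyperparameters passed on the command line."""
    parser = argparse.ArgumentParser()
    # Uniform sampling yields floats, so parse as float and cast afterwards.
    parser.add_argument("--n_estimators", type=float, default=50)
    parser.add_argument("--learning_rate", type=float, default=1.0)
    args = parser.parse_args(argv)
    args.n_estimators = int(args.n_estimators)
    return args


class PrintRun:
    """Offline stand-in exposing log(name, value) like an Azure ML run."""

    def __init__(self):
        self.logged = {}

    def log(self, name, value):
        self.logged[name] = value
        print(f"{name}: {value}")


def report(run, auc):
    """Log the primary metric under the name HyperDrive looks for."""
    run.log(METRIC_NAME, float(auc))


# Example invocation with made-up values.
args = parse_args(["--n_estimators", "154.7", "--learning_rate", "1.83"])
run = PrintRun()
report(run, 0.63)
```

Keeping the metric name in a single constant makes a spelling mismatch between the script and the HyperDrive configuration less likely.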

We limited the experiment to a maximum of 30 runs, with 3 runs in parallel. Note that the number of concurrent runs may influence how effective the early termination policy is.

In [3]:
# Early termination policy
early_termination_policy = MedianStoppingPolicy(evaluation_interval=1, delay_evaluation=5)

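# Parameter sampling
# uniform() draws continuous (float) values, so train.py needs to convert
# --n_estimators to an integer before passing it to the model.
param_sampling = RandomParameterSampling({
    "--n_estimators": uniform(20, 200),
    "--learning_rate": uniform(.5, 2),
})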

# Compute target, environment, and script run configuration
cluster = ComputeTarget(workspace=ws, name='cluster-2')
sklearn_env = Environment.get(workspace=ws, name='AzureML-Scikit-learn0.24-Cuda11-OpenMpi4.1.0-py36')
estimator = ScriptRunConfig(source_directory='.',
                            script='train.py', 
                            compute_target=cluster, 
                            environment=sklearn_env)

hyperdrive_run_config = HyperDriveConfig(hyperparameter_sampling=param_sampling,
                                        policy=early_termination_policy,
                                        run_config=estimator,
                                        primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                        primary_metric_name='AUC_weighted',
                                        max_total_runs=30,
                                        max_concurrent_runs=3,
                                        )

In [4]:
# Submit the experiment
hyper_run = experiment.submit(config=hyperdrive_run_config)

## Run Details

In [5]:
RunDetails(hyper_run).show()

## Best Model

In this section we retrieve the best run and download its model to local storage.

In [6]:
best_run = hyper_run.get_best_run_by_primary_metric()
best_run

| Experiment | Id | Type | Status | Details Page | Docs Page |
| --- | --- | --- | --- | --- | --- |
| hr-hyperdrive | HD_39f25990-613c-4e46-9f9e-0112d8ea9635_6 | azureml.scriptrun | Completed | Link to Azure Machine Learning studio | Link to Documentation |


In [7]:
# Show the metric value of the best model and the parameters used
best_run.get_metrics()

{'Number of estimators:': 154,
 'Learning Rate:': 1.8260800547537563,
 'AUC_weighted': 0.6335090530623957}

In [8]:
# Download the best model to the local directory
best_run.download_file('outputs/hr-data-adaboost.joblib', output_file_path='./hr-hyperdrive-model.joblib')

## Model Deployment

The model created by AutoML was significantly better than the model produced by HyperDrive. We therefore deployed the AutoML model; refer to the corresponding notebook for the deployment.