# Hyperparameter Tuning pipeline examples

In this example, we'll build a pipeline for hyperparameter tuning. The pipeline trains several models with different hyperparameter values and then registers the best one.

**Note:** This example requires that you have run the notebook from the first tutorial, so that the dataset and compute cluster are set up.

In [None]:
import os
import azureml.core
from azureml.core import Workspace, Experiment, Dataset, RunConfiguration
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import PythonScriptStep, HyperDriveStep, HyperDriveStepRun
from azureml.data.dataset_consumption_config import DatasetConsumptionConfig
from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveConfig, PrimaryMetricGoal
from azureml.train.hyperdrive import choice, loguniform, uniform
from azureml.core import ScriptRunConfig

print("Azure ML SDK version:", azureml.core.VERSION)

First, we will connect to the workspace. The command `Workspace.from_config()` will either:
* Read the local `config.json` with the workspace reference (if it exists), or
* Use the `az` CLI to connect to the workspace that was attached to this folder via `az ml folder attach -g <resource group> -w <workspace name>`
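
For orientation, here is a minimal sketch of what a local `config.json` typically looks like and how its contents could be sanity-checked before calling `Workspace.from_config()`. The three key names follow the file that the Azure portal offers for download; the values are placeholders, and the helpers `write_workspace_config` and `read_workspace_config` are hypothetical names used only for this illustration, not part of the SDK.

```python
import json
import tempfile
from pathlib import Path

# Placeholder values (assumptions); substitute your own subscription,
# resource group, and workspace name.
example_config = {
    "subscription_id": "00000000-0000-0000-0000-000000000000",
    "resource_group": "my-resource-group",
    "workspace_name": "my-workspace",
}

REQUIRED_KEYS = {"subscription_id", "resource_group", "workspace_name"}


def write_workspace_config(directory, config):
    """Hypothetical helper: write config.json into `directory`, return its path."""
    path = Path(directory) / "config.json"
    path.write_text(json.dumps(config, indent=4))
    return path


def read_workspace_config(path):
    """Hypothetical helper: load the file and check the expected keys exist."""
    config = json.loads(Path(path).read_text())
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise KeyError(f"config.json is missing keys: {sorted(missing)}")
    return config


# Round-trip the file in a temporary folder so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    config_path = write_workspace_config(tmp, example_config)
    loaded = read_workspace_config(config_path)
    print(loaded["workspace_name"])
```

If the file is missing or incomplete, the second option from the list above (attaching the folder via the `az` CLI) is the alternative.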

In [None]:
ws = Workspace.from_config()
print(f'WS name: {ws.name}\nRegion: {ws.location}\nSubscription id: {ws.subscription_id}\nResource group: {ws.resource_group}')

# Preparation

Let's reference the dataset from the first tutorial:

In [None]:
training_dataset = Dataset.get_by_name(ws, "german-credit-train-tutorial")
training_dataset_consumption = DatasetConsumptionConfig("training_dataset", training_dataset).as_download()

Here, we define the parameter sampling (the search space for the hyperparameters we want to try) and the early termination policy (which stops poorly performing runs early). We then combine both in a `HyperDriveConfig` and execute it in a `HyperDriveStep`. Lastly, a short step registers the best model.
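
To make these two concepts more concrete before the actual pipeline code, the following is a rough, SDK-free sketch. It assumes that random sampling draws `--c` uniformly from the configured range, and that a Bandit policy with a maximization goal stops any run whose metric falls below `best_metric / (1 + slack_factor)`. The function names and the metric values are made up for illustration; the real service additionally takes `evaluation_interval` into account and runs remotely.

```python
import random


def sample_c(rng, low=0.1, high=1.9):
    """Draw one candidate value for `--c` uniformly from [low, high]."""
    return rng.uniform(low, high)


def bandit_cutoff(best_metric, slack_factor):
    """Lowest metric value that still survives when the goal is to maximize."""
    return best_metric / (1 + slack_factor)


def should_terminate(run_metric, best_metric, slack_factor=0.1):
    """True if the run falls outside the slack allowed relative to the best run."""
    return run_metric < bandit_cutoff(best_metric, slack_factor)


# Four candidates, mirroring max_total_runs=4 in the pipeline code.
rng = random.Random(0)
candidates = [sample_c(rng) for _ in range(4)]
print("sampled --c values:", [round(c, 3) for c in candidates])

# Illustrative (made-up) accuracies: the best run so far reports 0.80.
cutoff = bandit_cutoff(best_metric=0.80, slack_factor=0.1)
print(f"cutoff: {cutoff:.4f}")
print("terminate run at 0.70:", should_terminate(0.70, best_metric=0.80))
print("terminate run at 0.75:", should_terminate(0.75, best_metric=0.80))
```

With `slack_factor=0.1`, the cutoff is 0.80 / 1.1 ≈ 0.727, so in this sketch a run reporting 0.70 would be stopped while a run reporting 0.75 would continue.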

In [None]:
runconfig = RunConfiguration.load("runconfig.yml")
script_run_config = ScriptRunConfig(source_directory="./",
                                    run_config=runconfig)
script_run_config.data_references = None

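# Search space: sample the `--c` argument uniformly between 0.1 and 1.9
ps = RandomParameterSampling(
    {
        '--c': uniform(0.1, 1.9)
    }
)
# Early termination: every 2 evaluation intervals, stop runs outside a 10% slack of the best run
early_termination_policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)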

hd_config = HyperDriveConfig(run_config=script_run_config, 
                             hyperparameter_sampling=ps,
                             policy=early_termination_policy,
                             primary_metric_name='Test accuracy', 
                             primary_metric_goal=PrimaryMetricGoal.MAXIMIZE, 
                             max_total_runs=4,
                             max_concurrent_runs=1)

hd_step = HyperDriveStep(name='hyperparameter-tuning',
                         hyperdrive_config=hd_config,
                         estimator_entry_script_arguments=['--data-path', training_dataset_consumption],
                         inputs=[training_dataset_consumption],
                         outputs=None)

register_step = PythonScriptStep(script_name='register.py',
                                 runconfig=runconfig,
                                 name="register-model",
                                 compute_target="cpu-cluster",
                                 arguments=['--model_name', 'best_model'],
                                 allow_reuse=False)

# Explicitly state that registration runs after training, as there is no direct dependency through inputs/outputs
register_step.run_after(hd_step)

steps = [hd_step, register_step]

Finally, we can create our pipeline object and validate it. Validation checks that the inputs and outputs are properly linked and that the pipeline graph is acyclic:
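
As a rough illustration of what an acyclicity check involves, the sketch below models the two steps of this pipeline as a dependency graph and uses Python's standard `graphlib` module (Python 3.9+) to find a valid execution order. This is not the SDK's implementation; the dictionary layout and the helper names `has_cycle` and `execution_order` are assumptions made for this example.

```python
from graphlib import CycleError, TopologicalSorter  # requires Python 3.9+


def has_cycle(dependencies):
    """Return True if the dependency graph contains a cycle."""
    try:
        TopologicalSorter(dependencies).prepare()
    except CycleError:
        return True
    return False


def execution_order(dependencies):
    """Return the steps in an order that respects all dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


# Each key maps a step name to the steps that must finish before it.
pipeline_graph = {
    "hyperparameter-tuning": [],
    "register-model": ["hyperparameter-tuning"],
}

# A deliberately broken graph in which two steps wait on each other.
cyclic_graph = {
    "step-a": ["step-b"],
    "step-b": ["step-a"],
}

order = execution_order(pipeline_graph)
print("execution order:", order)
print("pipeline graph has cycle:", has_cycle(pipeline_graph))
print("broken graph has cycle:", has_cycle(cyclic_graph))
```

The `register-model` entry depends on `hyperparameter-tuning` in the same way that `register_step.run_after(hd_step)` declares the ordering in the pipeline code.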

In [None]:
pipeline = Pipeline(workspace=ws, steps=steps)
pipeline.validate()

Lastly, we can submit the pipeline to an experiment and wait for the run to complete:

In [None]:
pipeline_run = Experiment(ws, 'hyperparameter-pipeline').submit(pipeline)
pipeline_run.wait_for_completion()