In [None]:
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Hyperparameter Tuning with Vertex AI Turbo Templates

This notebook shows you how to enable hyperparameter tuning in the template's production-ready training pipeline on Google Cloud.
If you're new to the Vertex AI Turbo Template, work through the [three-part notebook series](../01_infrastructure_setup.ipynb) first.

**Prerequisites:**

- deployed cloud project
- cloned template
- `env.sh` configured
- Python dependencies installed

Let's change into the root of the repository:

In [None]:
%cd ../../

## Model training

**Update Python dependencies:** Run the following command to add [cloudml-hypertune](https://github.com/GoogleCloudPlatform/cloudml-hypertune) to `model/pyproject.toml`, so that it is installed as part of your model training code:

In [None]:
! cd model && poetry add cloudml-hypertune

**Extend training code:** Add the hyperparameters that require tuning to the command-line arguments expected by `model/training/train.py`.
In this example we're adding the `learning_rate` hyperparameter:

In [None]:
parser.add_argument(
    "--learning_rate", type=float, default=0.3, help="Learning rate (default: 0.3)"
)
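
Vertex AI passes each trial's hyperparameter values to the training code as command-line flags named after the parameter ID. The following is a minimal, self-contained sketch of how such a flag is parsed. The `build_parser` helper and the `--input_path` argument are illustrative assumptions and may not match the template's actual `train.py`:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one tunable hyperparameter (illustrative only)."""
    parser = argparse.ArgumentParser(description="Training entry point (sketch)")
    # Hypothetical argument standing in for the template's existing arguments
    parser.add_argument("--input_path", type=str, default="")
    # The hyperparameter that the tuning job varies between trials
    parser.add_argument(
        "--learning_rate", type=float, default=0.3, help="Learning rate (default: 0.3)"
    )
    return parser


# Simulate one trial that was started with a specific learning rate
args = build_parser().parse_args(["--learning_rate", "0.05"])
print(args.learning_rate)
```

If the flag is omitted, the default of `0.3` applies, so the training code keeps working outside of a tuning job.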

**Report evaluation metric:** Report the primary evaluation metric `rootMeanSquaredError` once the model is trained and evaluated. Here, `hp_metric` is the metric value computed during evaluation:

In [None]:
import hypertune

hpt = hypertune.HyperTune()
hpt.report_hyperparameter_tuning_metric(
    hyperparameter_metric_tag="rootMeanSquaredError",
    metric_value=hp_metric,
)
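
The value reported as `hp_metric` has to be computed by your evaluation code. As a rough sketch, assuming plain Python sequences of labels and predictions, the root mean squared error could be computed as follows. The helper name `root_mean_squared_error` is hypothetical, and your training code may already obtain this value from its evaluation library:

```python
import math
from typing import Sequence


def root_mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Compute the RMSE between labels and predictions (illustrative helper)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values only: both errors are 2, so the RMSE is 2
hp_metric = root_mean_squared_error([3.0, 5.0], [1.0, 7.0])
print(hp_metric)
```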

## Training pipeline

**Add Python imports:** Add the `HyperparameterTuningJobRunOp` to your training pipeline in `pipelines/src/pipelines/training.py`:

In [None]:
from google_cloud_pipeline_components.v1.hyperparameter_tuning_job import HyperparameterTuningJobRunOp as hpt_job_op
from google.cloud.aiplatform import hyperparameter_tuning as hpt
from google_cloud_pipeline_components.v1 import hyperparameter_tuning_job as hpt_job

Before adapting the training pipeline, note that it relies on four components. Three of them are new and need to be implemented, while `hpt_job_op` is already provided:

- **get_worker_pool_specs:** Generate a specification for worker pools as a necessary input for performing hyperparameter tuning.
- **hpt_job_op:** This runs the actual job. This component doesn't need to be implemented as it's already provided by [Google Cloud Pipeline Components](https://cloud.google.com/vertex-ai/docs/pipelines/components-introduction).
- **get_trials:** Retrieve all trials and their evaluation metrics after a successful hyperparameter tuning job.
- **get_best_trial:** Get the best trial from all trials. This lets you see which hyperparameters led to the best model and retrieve the trained model artifact.

Your updated training pipeline will look similar to this Vertex AI pipeline once implemented:

<p align="center">
    <img src="../../images/hparam_tuning.png" alt="image" width="250" height="auto">
</p>

Now add the following components to your training pipeline code:

In [None]:
@dsl.component(base_image="python:3.9")
def get_worker_pool_specs(
    input_data: Input[Dataset],
    arguments: dict,
) -> list:
    """
    Generate a specification for worker pools to perform hyperparameter tuning.

    Args:
        input_data (Input[Dataset]): Input data for training
        arguments (dict): Static training arguments passed to every trial

    Returns:
        list: A list of worker pool specifications, each specifying the machine type,
        Docker image, and command arguments for a worker pool.

    """

    COMMAND = ["python"]
    ARGS = [
        "-m", "training",
        "--input_path", input_data.path,
    ]
    for key, value in arguments.items():
        ARGS.extend(["--" + str(key), str(value)])

    return [{
        "machine_spec": {
            "machine_type": "n1-standard-4",
        },
        "replica_count": 1,
        "container_spec": {
            "image_uri": TRAINING_IMAGE,
            "command": COMMAND,
            "args": ARGS,
        },
    }]
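
To make the argument handling in `get_worker_pool_specs` easier to follow, here is a small sketch of the same flattening logic outside of a pipeline component. The function name `build_training_args`, the input path, and the argument values are made up for illustration:

```python
from typing import Any, Dict, List


def build_training_args(input_path: str, arguments: Dict[str, Any]) -> List[str]:
    """Flatten a dict of static arguments into a list of command-line tokens."""
    args = ["-m", "training", "--input_path", input_path]
    for key, value in arguments.items():
        # Every key becomes a "--key" flag followed by its stringified value
        args.extend(["--" + str(key), str(value)])
    return args


example_args = build_training_args(
    "/gcs/my-bucket/data.csv",  # hypothetical input path
    {"n_estimators": 200, "label": "total_fare"},  # hypothetical static arguments
)
print(example_args)
```

The tuned hyperparameters are deliberately absent from this dict, because the tuning job supplies their values for each trial.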

In [None]:
@dsl.component(
    packages_to_install=[
        "google-cloud-aiplatform",
        "google-cloud-pipeline-components",
        "protobuf",
    ],
    base_image="python:3.9",
)
def get_trials(gcp_resources: str) -> list:
    """Retrieve all trials after a successful hyperparameter tuning job.

    Args:
        gcp_resources (str): Proto tracking the hyperparameter tuning job.

    Returns:
        List of strings representing the intermediate JSON representation of the
        trials from the hyperparameter tuning job.
    """
    from google.cloud import aiplatform
    from google_cloud_pipeline_components.proto.gcp_resources_pb2 import GcpResources
    from google.protobuf.json_format import Parse
    from google.cloud.aiplatform_v1.types import study

    api_endpoint_suffix = "-aiplatform.googleapis.com"
    gcp_resources_proto = Parse(gcp_resources, GcpResources())
    gcp_resources_split = gcp_resources_proto.resources[0].resource_uri.partition(
        "projects"
    )
    resource_name = gcp_resources_split[1] + gcp_resources_split[2]
    prefix_str = gcp_resources_split[0]
    prefix_str = prefix_str[: prefix_str.find(api_endpoint_suffix)]
    api_endpoint = prefix_str[(prefix_str.rfind("//") + 2) :] + api_endpoint_suffix

    client_options = {"api_endpoint": api_endpoint}
    job_client = aiplatform.gapic.JobServiceClient(client_options=client_options)
    response = job_client.get_hyperparameter_tuning_job(name=resource_name)

    return [study.Trial.to_json(trial) for trial in response.trials]
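
The string handling in `get_trials` splits the job's resource URI into a regional API endpoint and a resource name. The sketch below reproduces those steps on a hypothetical URI so you can inspect the intermediate values. The function name `split_resource_uri`, the project number, and the job ID are invented for illustration:

```python
from typing import Tuple

API_ENDPOINT_SUFFIX = "-aiplatform.googleapis.com"


def split_resource_uri(resource_uri: str) -> Tuple[str, str]:
    """Return (api_endpoint, resource_name) for a Vertex AI resource URI."""
    prefix, separator, remainder = resource_uri.partition("projects")
    # Everything from "projects" onwards is the resource name
    resource_name = separator + remainder
    # Cut the prefix at the endpoint suffix, then drop the URL scheme
    prefix = prefix[: prefix.find(API_ENDPOINT_SUFFIX)]
    api_endpoint = prefix[(prefix.rfind("//") + 2):] + API_ENDPOINT_SUFFIX
    return api_endpoint, resource_name


# Hypothetical URI in the shape the parsing logic expects
uri = (
    "https://us-central1-aiplatform.googleapis.com/v1/"
    "projects/123/locations/us-central1/hyperparameterTuningJobs/456"
)
api_endpoint, resource_name = split_resource_uri(uri)
print(api_endpoint)
print(resource_name)
```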

In [None]:
@dsl.component(packages_to_install=["google-cloud-aiplatform"], base_image="python:3.9")
def get_best_trial(trials: list, study_spec_metrics: list) -> str:
    """Retrieves the best trial from the trials.

    Args:
        trials (list): Required. List representing the intermediate
          JSON representation of the trials from the hyperparameter tuning job.
        study_spec_metrics (list): Required. List serialized from dictionary
          representing the metrics to optimize.
          The dictionary key is the metric_id, which is reported by your training
          job, and the dictionary value is the optimization goal of the metric
          ('minimize' or 'maximize'). example:
          metrics = hyperparameter_tuning_job.serialize_metrics(
              {'loss': 'minimize', 'accuracy': 'maximize'})

    Returns:
        String representing the intermediate JSON representation of the best
        trial from the list of trials.

    Raises:
        RuntimeError: If there are multiple metrics.
    """
    from google.cloud.aiplatform_v1.types import study

    if len(study_spec_metrics) > 1:
        raise RuntimeError(
            "Unable to determine best parameters for multi-objective"
            " hyperparameter tuning."
        )
    trials_list = [study.Trial.from_json(trial) for trial in trials]
    best_trial = None
    goal = study_spec_metrics[0]["goal"]
    best_fn = None
    if goal == study.StudySpec.MetricSpec.GoalType.MAXIMIZE:
        best_fn = max
    elif goal == study.StudySpec.MetricSpec.GoalType.MINIMIZE:
        best_fn = min
    best_trial = best_fn(
        trials_list, key=lambda trial: trial.final_measurement.metrics[0].value
    )

    return study.Trial.to_json(best_trial)
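
The selection logic in `get_best_trial` boils down to choosing `min` or `max` depending on the optimization goal. The following sketch shows the same idea on plain dictionaries. The `pick_best_trial` helper, the `metric_value` key, and the trial values are simplifications made for this example and are not the Vertex AI `Trial` type:

```python
from typing import Any, Dict, List


def pick_best_trial(trials: List[Dict[str, Any]], goal: str) -> Dict[str, Any]:
    """Select the trial with the best final metric for a single objective."""
    if goal == "minimize":
        best_fn = min
    elif goal == "maximize":
        best_fn = max
    else:
        raise ValueError(f"Unsupported goal: {goal!r}")
    return best_fn(trials, key=lambda trial: trial["metric_value"])


# Made-up trials with an RMSE-like metric where lower is better
trials = [
    {"id": "1", "learning_rate": 0.01, "metric_value": 4.2},
    {"id": "2", "learning_rate": 0.1, "metric_value": 3.7},
    {"id": "3", "learning_rate": 0.5, "metric_value": 5.1},
]
best = pick_best_trial(trials, "minimize")
print(best["id"], best["learning_rate"])
```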

**Training pipeline:** In the pipeline definition, define the new variables and replace the initial `train` component with the components you just implemented:

In [None]:
@dsl.pipeline(name="...")
def pipeline():

    # ...

    # define static arguments which will be used for all tuning jobs
    # `learning_rate` is excluded as we'll optimise this hyperparameter
    static_arguments = dict(
        n_estimators=200,
        label=label,
        # ...
    )
    # the metric(s) we want to optimise for
    metrics = hpt_job.serialize_metrics({"rootMeanSquaredError": "minimize"})
    
    # prepare your data ...
    
    data_op = ...  # placeholder for your data preparation step
    
    # now use the new components ...
    
    specs_op = get_worker_pool_specs(
        input_data=data_op.outputs["data"],
        arguments=static_arguments,
    )
    
    tuning_op = hpt_job_op(
        display_name="tuning-job",
        project=project,
        location=location,
        worker_pool_specs=specs_op.output,
        
        # TODO: adjust to your use case
        base_output_directory="...", # e.g. a folder in your pipeline bucket root
        max_trial_count=3,           # e.g. run up to 3 trials
        parallel_trial_count=3,      # e.g. run all of them in parallel
        study_spec_algorithm="ALGORITHM_UNSPECIFIED",
        study_spec_measurement_selection_type="BEST_MEASUREMENT",
        study_spec_metrics=metrics,
        
        # TODO: specify the hyperparameters to be tuned
        study_spec_parameters=hpt_job.serialize_parameters({
            "learning_rate": hpt.DoubleParameterSpec(min=0.001, max=1, scale="log"),
        }),
    )
    
    trials_op = get_trials(gcp_resources=tuning_op.outputs["gcp_resources"])
    
    best_trial_op = get_best_trial(
        trials=trials_op.output, 
        study_spec_metrics=metrics,
    )
    
    # TODO: upload the model with the best trial ...
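
As one possible starting point for the remaining TODO, a component of your own could read the winning hyperparameters from the best-trial JSON string. The sketch below is an assumption-laden example: it assumes the serialized trial uses camelCase keys such as `parameters`, `parameterId` and `value`, which you should verify against the actual output of `get_best_trial`. The helper name `extract_parameters` and the example payload are invented:

```python
import json
from typing import Any, Dict


def extract_parameters(trial_json: str) -> Dict[str, Any]:
    """Map parameter IDs to values from a serialized trial (assumed JSON layout)."""
    trial = json.loads(trial_json)
    # Assumed layout: {"parameters": [{"parameterId": ..., "value": ...}, ...]}
    return {p["parameterId"]: p["value"] for p in trial.get("parameters", [])}


# Hand-written payload for illustration, not real job output
example_trial = json.dumps(
    {
        "id": "2",
        "parameters": [{"parameterId": "learning_rate", "value": 0.1}],
        "finalMeasurement": {
            "metrics": [{"metricId": "rootMeanSquaredError", "value": 3.7}]
        },
    }
)
best_params = extract_parameters(example_trial)
print(best_params)
```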

## Run pipeline

Vertex AI Pipelines uses Kubeflow Pipelines to orchestrate your training steps, so you'll need to:

1. Compile the pipeline
2. Re-build the training container (as we've updated the dependencies and code)
3. Run the pipeline in Vertex AI

You don't need to execute these steps manually each time you run your pipeline. The following command performs all three:

In [None]:
! make run pipeline=training build=true targets=training

You've successfully updated your training pipeline to support hyperparameter tuning! 🎉 

Continue by reading more about [hyperparameter tuning in Vertex AI](https://cloud.google.com/vertex-ai/docs/training/hyperparameter-tuning-overview) or by improving your new training pipeline by uploading the model with the best evaluation result.