In [None]:
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Vertex AI TensorBoard Hyperparameter Tuning with the HParams Dashboard

<table align="left">

  <td>
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/tensorboard/tensorboard_hyperparameter_tuning_with_hparams.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/tensorboard/tensorboard_hyperparameter_tuning_with_hparams.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/notebooks/official/tensorboard/tensorboard_hyperparameter_tuning_with_hparams.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      Open in Vertex AI Workbench
    </a>
  </td>
</table>

**_NOTE_**: This notebook has been tested in the following environments:

* Python version = 3.8

## Overview

### What is Vertex AI TensorBoard?

[Open source TensorBoard](https://www.tensorflow.org/tensorboard/get_started)
(TB) is a Google open source project for machine learning experiment
visualization. Vertex AI TensorBoard is an enterprise-ready managed
version of TensorBoard.

Vertex AI TensorBoard provides various detailed visualizations, including the following:

*   Tracking and visualizing metrics, such as loss and accuracy over time.
*   Visualizing model computational graphs (ops and layers).
*   Viewing histograms of weights, biases, or other tensors as they change over time.
*   Projecting embeddings to a lower dimensional space.
*   Displaying image, text, and audio samples.

In addition to the powerful visualizations from
TensorBoard, Vertex AI TensorBoard provides the following benefits:

*  A persistent, shareable link to your experiment's dashboard.

*  A searchable list of all experiments in a project.

*  Integrations with Vertex AI services for model training.

*  Enterprise-grade security, privacy, and compliance.

With Vertex AI TensorBoard, you can track, visualize, and compare
ML experiments and share them with your team.

Learn more about [Vertex AI TensorBoard](https://cloud.google.com/vertex-ai/docs/experiments/tensorboard-overview).

### Objective

This tutorial shows you how to log hyperparameter experiment results in TensorFlow and visualize them in TensorBoard's HParams dashboard.

This tutorial uses the following Vertex AI services and resources:

- Vertex AI TensorBoard

The steps performed include:

* Adapt TensorFlow runs to log hyperparameters and metrics.
* Start runs and log them all under one parent directory.
* Visualize the results in TensorBoard's HParams dashboard.

### Dataset

This tutorial uses the [FashionMNIST](https://github.com/zalandoresearch/fashion-mnist) dataset.


### Costs

This tutorial uses the following billable components of Google Cloud:

* Vertex AI

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing),
and use the [Pricing Calculator](https://cloud.google.com/products/calculator/)
to generate a cost estimate based on your projected usage.

## Set up your local development environment

**If you are using Colab or Vertex AI Workbench**, your environment already meets all the requirements to run this notebook. You can skip this step.

Otherwise, make sure your environment meets this notebook's requirements. You need the following:

- Git
- Python 3
- virtualenv
- Jupyter notebook running in a virtual environment with Python 3

To quickly set up your environment to meet the requirements of this tutorial, perform the following steps:

1. [Install and initialize the Cloud SDK](https://cloud.google.com/sdk/docs/).

2. [Install Python 3](https://cloud.google.com/python/setup#installing_python).

3. [Install virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv), create a virtual environment that uses Python 3, and activate it.

4. Install Jupyter by running the following command in a terminal shell:
<br> `pip3 install jupyter`

5. Launch Jupyter by running the following command in a terminal shell: <br> `jupyter notebook`

6. Open this tutorial notebook in the Jupyter Notebook Dashboard.

## Install dependencies

Install the following packages required to run this tutorial notebook.

In [None]:
import os

# The Vertex AI Workbench Notebook product has specific requirements
IS_WORKBENCH_NOTEBOOK = os.getenv("DL_ANACONDA_HOME")
IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(
    "/opt/deeplearning/metadata/env_version"
)

# Vertex AI Notebook requires dependencies to be installed with '--user'
USER_FLAG = ""
if IS_WORKBENCH_NOTEBOOK:
    USER_FLAG = "--user"

! pip3 install --upgrade google-cloud-aiplatform[tensorboard] tensorflow==2.7 {USER_FLAG} -q

### Colab only: Uncomment the following cell to restart the kernel.

In [None]:
# Automatically restart kernel after installs so that your environment can access the new packages
# import IPython

# app = IPython.Application.instance()
# app.kernel.do_shutdown(True)

## Before you begin

### Set up your Google Cloud project

**The following steps are required, regardless of your notebook environment.**

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.

2. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).

3. [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

4. If you are running this notebook locally, install the [Cloud SDK](https://cloud.google.com/sdk).

#### Set your project ID

**If you don't know your project ID**, try the following:
* Run `gcloud config list`.
* Run `gcloud projects list`.
* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)

In [None]:
PROJECT_ID = "[your-project-id]"  # @param {type:"string"}

# Set the project id
! gcloud config set project {PROJECT_ID}

#### Set the region

**Optional**: Update the `REGION` variable to specify the region that you want to use. Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations).

In [None]:
REGION = "us-central1"  # @param {type: "string"}

### Authenticate your Google Cloud account

To authenticate your Google Cloud account, follow the instructions for your Jupyter environment:

* **Vertex AI Workbench**
<br>You are already authenticated.

* **Local JupyterLab instance**
<br>Uncomment and run the following code:

In [None]:
# ! gcloud auth login

* **Colab**
<br>Uncomment and run the following code:

In [None]:
# from google.colab import auth

# auth.authenticate_user()

### Import libraries

In [None]:
from google.cloud import aiplatform

### Initialize the Vertex AI SDK for Python

Initialize the Vertex AI SDK for Python for your project.

In [None]:
aiplatform.init(project=PROJECT_ID, location=REGION)

### Load TensorBoard and TensorFlow components

Load the TensorBoard notebook extension and import TensorFlow and the TensorBoard HParams plugin.


In [None]:
# Load the TensorBoard notebook extension
%load_ext tensorboard

# Clear any logs from previous runs
!rm -rf ./logs/

# Import TensorFlow and the TensorBoard HParams plugin
import tensorflow as tf
from tensorboard.plugins.hparams import api as hp

### Download dataset

Download the [FashionMNIST](https://github.com/zalandoresearch/fashion-mnist) dataset and scale the pixel values from the range 0 to 255 down to [0, 1].
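Each Fashion-MNIST image is a 28×28 grid of 8-bit grayscale values between 0 and 255, so dividing by 255.0 maps every pixel into [0, 1]. The standard-library sketch below applies the same arithmetic to a plain list of pixel values; `scale_pixels` is a hypothetical helper for illustration only, not part of the notebook.

```python
def scale_pixels(pixels):
    """Map 8-bit pixel values (0-255) to floats in [0.0, 1.0].

    Hypothetical helper that mirrors `x_train / 255.0` for a flat list of ints.
    """
    for value in pixels:
        if not 0 <= value <= 255:
            raise ValueError(f"pixel value out of 8-bit range: {value}")
    return [value / 255.0 for value in pixels]


# Usage: the darkest and brightest pixels land on the interval endpoints.
print(scale_pixels([0, 51, 255]))
```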

In [None]:
fashion_mnist = tf.keras.datasets.fashion_mnist

(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

## Set up the experiment

Run an experiment that varies the following hyperparameters:

* Number of units in the first dense layer
* Dropout rate in the dropout layer
* Optimizer

List the values to try for each hyperparameter, and log the experiment configuration to TensorBoard with `hp.hparams_config`.

*Optional*: The configuration is not required, but it lets you attach domain information to each hyperparameter for more fine-grained filtering in the UI, and lets you specify which metrics the dashboard displays.
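A domain tells the HParams dashboard which values a hyperparameter can take: `hp.Discrete` lists them explicitly, while `hp.RealInterval` gives a closed range with `min_value` and `max_value`. The sketch below models the same idea with two small standard-library stand-in classes, `DiscreteDomain` and `IntervalDomain`. They are hypothetical illustrations, not the TensorBoard classes, and only show how a domain can be used to check whether a candidate value is allowed.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DiscreteDomain:
    """Hypothetical stand-in for a finite set of allowed values."""

    values: Sequence

    def contains(self, value) -> bool:
        return value in self.values


@dataclass(frozen=True)
class IntervalDomain:
    """Hypothetical stand-in for a closed real interval [min_value, max_value]."""

    min_value: float
    max_value: float

    def contains(self, value) -> bool:
        return self.min_value <= value <= self.max_value


# The same domains as the experiment configuration in this notebook.
domains = {
    "num_units": DiscreteDomain((16, 32)),
    "dropout": IntervalDomain(0.1, 0.2),
    "optimizer": DiscreteDomain(("adam", "sgd")),
}

print(domains["dropout"].contains(0.15))  # inside the interval
print(domains["optimizer"].contains("rmsprop"))  # not one of the listed values
```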

In [None]:
HP_NUM_UNITS = hp.HParam("num_units", hp.Discrete([16, 32]))
HP_DROPOUT = hp.HParam("dropout", hp.RealInterval(0.1, 0.2))
HP_OPTIMIZER = hp.HParam("optimizer", hp.Discrete(["adam", "sgd"]))

METRIC_ACCURACY = "accuracy"

with tf.summary.create_file_writer("logs/hparam_tuning").as_default():
    hp.hparams_config(
        hparams=[HP_NUM_UNITS, HP_DROPOUT, HP_OPTIMIZER],
        metrics=[hp.Metric(METRIC_ACCURACY, display_name="Accuracy")],
    )

## Adapt TensorFlow runs to log hyperparameters and metrics

The model is simple: a `Flatten` layer followed by two dense layers with a dropout layer between them. The training code will look familiar, but the hyperparameters are no longer hardcoded. Instead, they are passed in an `hparams` dictionary and used throughout the training function:
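To see how `HP_NUM_UNITS` changes the size of the model, you can count its parameters by hand. Each Fashion-MNIST image is 28×28, so `Flatten` produces 784 inputs; a dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases, and `Flatten` and `Dropout` add no parameters. The sketch below applies that arithmetic; `dense_params` and `count_params` are hypothetical helpers. For the Keras model itself, `model.count_params()` on a built model should report the same totals.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def count_params(num_units: int, image_shape=(28, 28), num_classes: int = 10) -> int:
    """Parameter count for Flatten -> Dense(num_units) -> Dropout -> Dense(num_classes)."""
    flat_inputs = image_shape[0] * image_shape[1]  # Flatten: 28 * 28 = 784 inputs
    hidden = dense_params(flat_inputs, num_units)  # first dense layer
    output = dense_params(num_units, num_classes)  # softmax output layer
    return hidden + output  # Flatten and Dropout contribute no parameters


for units in (16, 32):
    print(units, count_params(units))
# 16 12730
# 32 25450
```

Doubling the hidden units roughly doubles the parameter count, because the 784-input first layer dominates the total.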

In [None]:
def train_test_model(hparams):
    model = tf.keras.models.Sequential(
        [
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(hparams[HP_NUM_UNITS], activation=tf.nn.relu),
            tf.keras.layers.Dropout(hparams[HP_DROPOUT]),
            tf.keras.layers.Dense(10, activation=tf.nn.softmax),
        ]
    )
    model.compile(
        optimizer=hparams[HP_OPTIMIZER],
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )

    # Train for a single epoch to keep the demo fast
    model.fit(x_train, y_train, epochs=1)
    _, accuracy = model.evaluate(x_test, y_test)
    return accuracy

For each run, log an hparams summary with the hyperparameters and final accuracy:

In [None]:
def run(run_dir, hparams):
    with tf.summary.create_file_writer(run_dir).as_default():
        hp.hparams(hparams)  # record the values used in this trial
        accuracy = train_test_model(hparams)
        tf.summary.scalar(METRIC_ACCURACY, accuracy, step=1)

## Start runs and log them all under one parent directory

You can now run multiple trials, training each one with a different set of hyperparameters.

For simplicity, use a grid search: try all combinations of the discrete parameters and only the lower and upper bounds of the real-valued parameter. For more complex scenarios, choosing each hyperparameter value at random (random search) can be more effective. More advanced methods, such as Bayesian optimization, are also available.
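Random search can be sketched with the standard library: instead of enumerating the grid, each trial samples `num_units` and `optimizer` uniformly from their discrete values and `dropout` uniformly from its real interval. The function name `sample_random_trials` and the fixed seed are illustrative assumptions. Each sampled dictionary could then be converted into an `hparams` dict keyed by the `HP_*` objects and passed to `run`, in place of the grid loops.

```python
import random


def sample_random_trials(num_trials: int, seed: int = 0):
    """Draw hyperparameter combinations at random (random search).

    Discrete hyperparameters are sampled uniformly from their listed values;
    the real-valued dropout rate is sampled uniformly from [0.1, 0.2].
    """
    rng = random.Random(seed)  # seeded so the trial list is reproducible
    trials = []
    for _ in range(num_trials):
        trials.append(
            {
                "num_units": rng.choice([16, 32]),
                "dropout": rng.uniform(0.1, 0.2),
                "optimizer": rng.choice(["adam", "sgd"]),
            }
        )
    return trials


for trial in sample_random_trials(3):
    print(trial)
```

Unlike the grid, random search samples many distinct dropout rates rather than only the two endpoints, at the cost of not covering every discrete combination when the trial budget is small.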

Run all eight trials (2 unit counts × 2 dropout rates × 2 optimizers). This takes a few minutes:

In [None]:
session_num = 0

for num_units in HP_NUM_UNITS.domain.values:
    for dropout_rate in (HP_DROPOUT.domain.min_value, HP_DROPOUT.domain.max_value):
        for optimizer in HP_OPTIMIZER.domain.values:
            hparams = {
                HP_NUM_UNITS: num_units,
                HP_DROPOUT: dropout_rate,
                HP_OPTIMIZER: optimizer,
            }
            run_name = "run-%d" % session_num
            print("--- Starting trial: %s" % run_name)
            print({h.name: hparams[h] for h in hparams})
            run("logs/hparam_tuning/" + run_name, hparams)
            session_num += 1

## Visualize the results in Vertex AI TensorBoard's HParams tab

### Create a Vertex AI TensorBoard instance
A Vertex AI TensorBoard instance is a regionalized resource that stores your Vertex AI TensorBoard experiments. You must create an instance before you can visualize the experiments, and a project can contain multiple instances. For more information, see the [Vertex AI TensorBoard documentation](https://cloud.google.com/vertex-ai/docs/experiments/tensorboard-overview).
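The `TENSORBOARD_RESOURCE_NAME` printed when you create the instance is a full resource path, assumed here to have the form `projects/PROJECT/locations/REGION/tensorboards/TENSORBOARD_ID`, where the project segment is typically the project number rather than the project ID. The standard-library sketch below splits such a name into its parts, for example to confirm that the instance lives in the region you expect. `parse_tensorboard_name` is a hypothetical helper, and the resource name in the usage line is made up.

```python
import re

# Assumed format of a Vertex AI TensorBoard resource name.
_TENSORBOARD_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/tensorboards/(?P<tensorboard_id>[^/]+)$"
)


def parse_tensorboard_name(resource_name: str) -> dict:
    """Split a TensorBoard resource name into project, location, and ID."""
    match = _TENSORBOARD_NAME.match(resource_name)
    if match is None:
        raise ValueError(f"not a TensorBoard resource name: {resource_name!r}")
    return match.groupdict()


# Usage with a made-up resource name:
print(parse_tensorboard_name("projects/123456789/locations/us-central1/tensorboards/987654321"))
```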

Create a TensorBoard instance to store the logs that you upload in the following steps.

In [None]:
TENSORBOARD_NAME = "[your-tensorboard-name]"  # @param {type:"string"}

if not TENSORBOARD_NAME or TENSORBOARD_NAME == "[your-tensorboard-name]":
    TENSORBOARD_NAME = PROJECT_ID + "-tb"

tensorboard = aiplatform.Tensorboard.create(
    display_name=TENSORBOARD_NAME, project=PROJECT_ID, location=REGION
)
TENSORBOARD_RESOURCE_NAME = tensorboard.gca_resource.name
print("TensorBoard resource name:", TENSORBOARD_RESOURCE_NAME)

Set the name of your TensorBoard experiment.

In [None]:
from datetime import datetime

EXPERIMENT_NAME = "[your-experiment-name]"  # @param {type:"string"}

if not EXPERIMENT_NAME or EXPERIMENT_NAME == "[your-experiment-name]":
    EXPERIMENT_NAME = "experiment" + datetime.now().strftime("%H-%M-%S")

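Upload the logs to your Vertex AI TensorBoard instance. The `--one_shot=True` flag uploads the contents of the log directory once and then exits, instead of continuing to watch for new data.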
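Outside a notebook, where the `!` shell escape is not available, you can run the same upload from a Python script. The sketch below builds the argument list with exactly the flags used in this notebook's `tb-gcp-uploader` command and passes it to `subprocess.run`. `build_uploader_args` and `upload_logs` are hypothetical helpers, and the call assumes the `tb-gcp-uploader` executable (installed with `google-cloud-aiplatform[tensorboard]`) is on your `PATH`.

```python
import subprocess


def build_uploader_args(tensorboard_resource_name: str, logdir: str, experiment_name: str) -> list:
    """Argument list for a single (one-shot) upload with tb-gcp-uploader."""
    return [
        "tb-gcp-uploader",
        "--one_shot=True",  # upload the current logs once, then exit
        f"--tensorboard_resource_name={tensorboard_resource_name}",
        f"--logdir={logdir}",
        f"--experiment_name={experiment_name}",
    ]


def upload_logs(tensorboard_resource_name: str, logdir: str, experiment_name: str) -> None:
    """Run the uploader; raises CalledProcessError if it exits with an error."""
    args = build_uploader_args(tensorboard_resource_name, logdir, experiment_name)
    subprocess.run(args, check=True)  # passing a list avoids shell quoting issues


# Usage (in a script that has already created the TensorBoard instance):
# upload_logs(TENSORBOARD_RESOURCE_NAME, "logs/hparam_tuning/", EXPERIMENT_NAME)
```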

In [None]:
!tb-gcp-uploader --one_shot=True --tensorboard_resource_name=$TENSORBOARD_RESOURCE_NAME --logdir="logs/hparam_tuning/" --experiment_name=$EXPERIMENT_NAME

Open the TensorBoard link that the uploader prints, and then select **HParams** at the top of the page.

The left pane of the dashboard provides filtering capabilities that are active across all the views in the HParams dashboard:

- Filter which hyperparameters/metrics are shown in the dashboard
- Filter which hyperparameter/metric values are shown in the dashboard
- Filter on run status (running, success, ...)
- Sort by hyperparameter/metric in the table view
- Number of session groups to show (useful for performance when there are many experiments)

The HParams dashboard has three views:

* The **Table View** lists the runs, their hyperparameters, and their metrics.
* The **Parallel Coordinates View** shows each run as a line going through an axis for each hyperparameter and metric. Click and drag on any axis to mark a region; only the runs that pass through it stay highlighted. This can help you identify which groups of hyperparameters matter most. You can also reorder the axes by dragging them.
* The **Scatter Plot View** shows plots comparing each hyperparameter/metric with each metric. This can help identify correlations. Click and drag to select a region in a specific plot and highlight those sessions across the other plots.

Click a table row, a parallel coordinates line, or a scatter plot marker to see a plot of the metrics as a function of training step for that session (in this tutorial, each run logs only a single step).
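If you want to reproduce the dashboard's filtering and sort-by-metric behavior outside TensorBoard, for example in a report, the logic amounts to a filter and a ranking over per-run records. The sketch below assumes you have collected each run's hyperparameters and final accuracy into dictionaries yourself, for example by storing the value that `train_test_model` returns in your own list. `filter_runs`, `rank_runs`, and the record layout are hypothetical.

```python
from typing import Dict, List


def filter_runs(records: List[Dict], **criteria) -> List[Dict]:
    """Keep only runs whose hyperparameters match every keyword, like the left-pane filters."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


def rank_runs(records: List[Dict], metric: str = "accuracy", top_k: int = 3) -> List[Dict]:
    """Return the top_k runs sorted by `metric`, highest first, like sorting the Table View.

    Each record is a hypothetical dict such as
    {"run": "run-0", "num_units": 16, "dropout": 0.1, "optimizer": "adam", "accuracy": ...}.
    """
    return sorted(records, key=lambda record: record[metric], reverse=True)[:top_k]


# Usage: best three Adam runs by accuracy
# best = rank_runs(filter_runs(records, optimizer="adam"))
```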

## Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial:

In [None]:
# Delete the Vertex AI TensorBoard instance created in this tutorial
tensorboard.delete()

# Delete the local log directory
! rm -rf ./logs/