In [None]:
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Vertex AI SDK 2.0: Remote training for a Keras model

<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/vertex_ai_sdk/remote_training_tensorflow_with_autologging.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/vertex_ai_sdk/remote_training_tensorflow_with_autologging.ipynb">
        <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
    <td>
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/notebooks/official/vertex_ai_sdk/remote_training_tensorflow_with_autologging.ipynb">
       <img src="https://www.gstatic.com/cloud/images/navigation/vertex-ai.svg" alt="Vertex AI logo">Open in Vertex AI Workbench
    </a>
</table>

## Overview

This tutorial demonstrates how to use Vertex AI SDK 2.0 to take a model training job written for local execution and run it remotely on Vertex AI, using open-source (OSS) ML frameworks.

### Objective

In this tutorial, you learn to use `Vertex AI SDK 2.0` to train models from supported ML frameworks remotely, using the same code you would write for a local (on-prem) training job. This notebook uses a TensorFlow Keras model.

This tutorial uses the following Google Cloud ML services:

- `Vertex AI Training`
- `Vertex AI Remote Training`

The steps performed include:

- Load and split the dataset.
- Standardize the features and convert each split into a `tf.data.Dataset`.
- For TensorFlow (Keras):
    - Train the model remotely with a GPU.
    - Uptrain the pretrained model remotely with autologging enabled.
    - View the Vertex AI Experiments results.
    - Evaluate both the pretrained and uptrained model.

**Local-to-remote training**

```
import vertexai
from my_module import MyModelClass

vertexai.init(project="my-project", location="my-location", staging_bucket="gs://my-bucket")

# Switch to remote mode
vertexai.preview.init(remote=True)

# Wrap the model class with `vertexai.preview.remote`
MyModelClass = vertexai.preview.remote(MyModelClass)

# Instantiate the class
model = MyModelClass(...)

# (Optional) Set the training config
model.fit.vertex.remote_config.display_name = "MyModelClass-remote-training"
model.fit.vertex.remote_config.staging_bucket = "gs://my-bucket"

# This `fit` call will be executed remotely
model.fit(...)
```
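
To make the wrapping step above less opaque, here is a toy, pure-Python sketch of the general pattern: a class wrapper that gives each instance a `fit` callable carrying its own `vertex.remote_config`, plus a global switch that decides whether `fit` runs in-process or is handed to a job submitter. Everything in it (`init`, `remote`, `RemoteConfig`, `fake_submit`, `TinyModel`) is hypothetical and far simpler than the real SDK, which packages the model and data and launches a Vertex AI training job; it is an illustration of the idea, not the SDK's implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Hypothetical global switch standing in for vertexai.preview.init(remote=...).
_GLOBAL_CONFIG = {"remote": False}


def init(remote: bool) -> None:
    _GLOBAL_CONFIG["remote"] = remote


@dataclass
class RemoteConfig:
    # Hypothetical subset of the remote_config options shown above.
    display_name: Optional[str] = None
    staging_bucket: Optional[str] = None
    enable_cuda: bool = False


class _VertexNamespace:
    def __init__(self) -> None:
        self.remote_config = RemoteConfig()


class _RemoteMethod:
    """Per-instance callable that exposes `.vertex.remote_config`."""

    def __init__(self, local_fn: Callable[..., Any], submit: Callable[..., Any]) -> None:
        self._local_fn = local_fn
        self._submit = submit
        self.vertex = _VertexNamespace()

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        if _GLOBAL_CONFIG["remote"]:
            # A real implementation would package code and data and launch a job.
            return self._submit(self._local_fn, self.vertex.remote_config, args, kwargs)
        return self._local_fn(*args, **kwargs)


def remote(cls: type, submit: Callable[..., Any]) -> type:
    class Wrapped(cls):
        def __init__(self, *args: Any, **kwargs: Any) -> None:
            super().__init__(*args, **kwargs)
            # Shadow the class-level `fit` with a per-instance wrapper.
            self.fit = _RemoteMethod(super().fit, submit)

    Wrapped.__name__ = Wrapped.__qualname__ = cls.__name__
    return Wrapped


class TinyModel:
    def __init__(self) -> None:
        self.epochs_run = 0

    def fit(self, epochs: int) -> str:
        self.epochs_run += epochs
        return "trained"


submitted = []


def fake_submit(local_fn, config, args, kwargs):
    # Record the job config, then simulate the remote job by running locally.
    submitted.append(config.display_name)
    return "remote:" + local_fn(*args, **kwargs)


TinyModel = remote(TinyModel, fake_submit)
model = TinyModel()
model.fit.vertex.remote_config.display_name = "TinyModel-remote-training"

init(remote=False)
local_result = model.fit(epochs=1)   # runs in-process
init(remote=True)
remote_result = model.fit(epochs=2)  # routed through fake_submit
```

Storing the wrapper on the instance (rather than on the class) is what lets each model object carry its own `remote_config`, which mirrors how the snippet above configures `model.fit.vertex.remote_config` per model.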

*OSS ML frameworks supported for remote training*
1.  scikit-learn
2.  TensorFlow
3.  PyTorch
4.  PyTorch Lightning
5.  Custom model


---

**Uptraining**
```
...
model = MyModelClass(...)
model.fit(...)

# Save the trained model to the Model Registry
registered_model = vertexai.preview.register(model)

# The model can be loaded into a new (or the current) local runtime
loaded_model = vertexai.preview.from_pretrained(model_name="registered-model-resource-id")

# The loaded model can continue local-to-remote training
loaded_model.fit(...)

```
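
Conceptually, uptraining works because registration stores a snapshot of the trained model's state and `from_pretrained` rebuilds an independent object from that snapshot, so further `fit` calls start from the saved weights. The sketch below mimics that round trip in memory, assuming a JSON snapshot of the instance attributes. `register`, `from_pretrained`, the `models/...` name format, and `ToyModel` are hypothetical stand-ins; the SDK works with real framework models and Vertex AI Model Registry resources.

```python
import json
import uuid
from typing import Any, Dict, Tuple

# Hypothetical in-memory stand-in for the Model Registry:
# resource name -> (model class, JSON snapshot of its attributes).
_REGISTRY: Dict[str, Tuple[type, str]] = {}


def register(model: Any) -> str:
    name = f"models/{uuid.uuid4().hex}"  # hypothetical resource-name format
    _REGISTRY[name] = (type(model), json.dumps(vars(model)))
    return name


def from_pretrained(*, model_name: str) -> Any:
    cls, snapshot = _REGISTRY[model_name]
    loaded = cls.__new__(cls)  # skip __init__; restore the saved state instead
    loaded.__dict__.update(json.loads(snapshot))
    return loaded


class ToyModel:
    def __init__(self) -> None:
        self.weights = [0.0, 0.0]

    def fit(self, step: float) -> None:
        # Stand-in for a training step: nudge every weight by `step`.
        self.weights = [w + step for w in self.weights]


model = ToyModel()
model.fit(step=1.0)        # initial training
name = register(model)     # snapshot the trained state

loaded = from_pretrained(model_name=name)
loaded.fit(step=0.5)       # uptraining continues from the snapshot
```

Because the loaded object is rebuilt from the snapshot, uptraining it leaves the originally registered model untouched.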

*OSS ML frameworks supported for uptraining*
1.  scikit-learn
2.  TensorFlow
3.  Custom model
4.  PyTorch



---

**GPU Training**
```
...
model = MyModelClass(...)

# Set enable_cuda to True to enable GPU training.
model.fit.vertex.remote_config.enable_cuda = True

# (Optional) The training image and compute resources are handled automatically
# by Vertex AI, but you can also configure them yourself.
model.fit.vertex.remote_config.container_uri = "your-cuda-image"
model.fit.vertex.remote_config.machine_type = "a2-highgpu-8g"
model.fit.vertex.remote_config.accelerator_type = "NVIDIA_TESLA_A100"
model.fit.vertex.remote_config.accelerator_count = 8

# Model will be trained remotely using GPU
model.fit(...)
```
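
When you set `machine_type` and `accelerator_count` yourself, they need to agree: A2 high-GPU machine types encode their A100 count in the name (`a2-highgpu-8g` has 8 GPUs). The hypothetical `check_gpu_config` helper below is a local sanity check you might run before calling `fit`; it is not part of the SDK, and it only knows the `a2-highgpu-*` and `a2-megagpu-*` shapes, which pair with the `NVIDIA_TESLA_A100` accelerator type used above.

```python
import re

# Matches A2 machine types such as "a2-highgpu-8g" or "a2-megagpu-16g",
# where the trailing "<N>g" is the number of attached A100 GPUs.
_A2_PATTERN = re.compile(r"a2-(?:highgpu|megagpu)-(\d+)g")


def expected_a100_count(machine_type: str) -> int:
    match = _A2_PATTERN.fullmatch(machine_type)
    if match is None:
        raise ValueError(f"unsupported machine type for this check: {machine_type!r}")
    return int(match.group(1))


def check_gpu_config(machine_type: str, accelerator_type: str, accelerator_count: int) -> None:
    """Raise ValueError if the GPU settings are inconsistent (hypothetical helper)."""
    if accelerator_type != "NVIDIA_TESLA_A100":
        raise ValueError(f"{machine_type} expects NVIDIA_TESLA_A100, got {accelerator_type}")
    expected = expected_a100_count(machine_type)
    if accelerator_count != expected:
        raise ValueError(
            f"{machine_type} has {expected} GPUs, but accelerator_count={accelerator_count}"
        )


# The values from the GPU example above pass the check.
check_gpu_config("a2-highgpu-8g", "NVIDIA_TESLA_A100", 8)
```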

*OSS ML frameworks supported for GPU remote training*
1.  TensorFlow
2.  PyTorch

### Dataset

This tutorial uses the <a href="https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html">Iris dataset</a>, in which the task is to predict the species of an iris flower from four measurements of its sepals and petals.

### Costs

This tutorial uses billable components of Google Cloud:

* Vertex AI
* Cloud Storage

Learn about [Vertex AI
pricing](https://cloud.google.com/vertex-ai/pricing), [Cloud Storage
pricing](https://cloud.google.com/storage/pricing), and use the [Pricing
Calculator](https://cloud.google.com/products/calculator/)
to generate a cost estimate based on your projected usage.

## Installation

Install the following packages required to execute this notebook.

In [None]:
! pip3 install --upgrade --quiet google-cloud-aiplatform[preview,autologging]
! pip3 install --upgrade --quiet scikit-learn
! pip3 install --upgrade --quiet tensorflow==2.12

### Colab only: Uncomment the following cell to restart the kernel

In [None]:
# Automatically restart kernel after installs so that your environment can access the new packages
# import IPython

# app = IPython.Application.instance()
# app.kernel.do_shutdown(True)

## Before you begin

### Set your project ID

**If you don't know your project ID**, try the following:
* Run `gcloud config list`.
* Run `gcloud projects list`.
* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)

In [None]:
PROJECT_ID = "[your-project-id]"  # @param {type:"string"}

# Set the project ID
! gcloud config set project {PROJECT_ID}

#### Region

You can also change the `REGION` variable used by Vertex AI. Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations).

In [None]:
REGION = "us-central1"

### Authenticate your Google Cloud account

Depending on your Jupyter environment, you may have to manually authenticate. Follow the relevant instructions below.

**1. Vertex AI Workbench**
* Do nothing as you are already authenticated.

**2. Local JupyterLab instance, uncomment and run:**

In [None]:
# ! gcloud auth login

**3. Colab, uncomment and run:**

In [None]:
# from google.colab import auth
# auth.authenticate_user()

**4. Service account or other**
* See how to grant Cloud Storage permissions to your service account at https://cloud.google.com/storage/docs/gsutil/commands/iam#ch-examples.

### Create a Cloud Storage bucket

Create a storage bucket to store intermediate artifacts such as datasets.

In [None]:
BUCKET_URI = f"gs://your-bucket-name-{PROJECT_ID}-unique"  # @param {type:"string"}
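
Bucket names have to follow Cloud Storage naming rules, so a placeholder such as `[your-project-id]` must be replaced before the bucket can be created. The sketch below checks the common case of a bucket name without dots: 3 to 63 characters, only lowercase letters, digits, dashes, and underscores, starting and ending with a letter or digit, not starting with `goog`, and not containing `google`. The helper names and the example project ID are hypothetical, and the check is deliberately partial (it ignores dotted names and misspellings of `google`), so treat `gsutil mb` as the final authority.

```python
import re

# Common-case rule for bucket names without dots: 3-63 characters drawn from
# lowercase letters, digits, "-" and "_", starting and ending with a letter or digit.
_SIMPLE_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]")


def bucket_name_from_uri(uri: str) -> str:
    """Return the bucket part of a gs:// URI (hypothetical helper)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"expected a gs:// URI, got {uri!r}")
    return uri[len(prefix):].split("/", 1)[0]


def looks_like_valid_bucket_name(name: str) -> bool:
    return (
        _SIMPLE_BUCKET_NAME.fullmatch(name) is not None
        and not name.startswith("goog")
        and "google" not in name
    )


placeholder_uri = "gs://your-bucket-name-[your-project-id]-unique"
filled_in_uri = "gs://your-bucket-name-my-project-123-unique"  # hypothetical project ID
```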

**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.

In [None]:
! gsutil mb -l {REGION} -p {PROJECT_ID} {BUCKET_URI}

### Import libraries and define constants

In [None]:
import vertexai
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

## Initialize Vertex AI SDK for Python

Initialize the Vertex AI SDK for Python for your project and corresponding bucket.

In [None]:
REMOTE_JOB_NAME = "remote-scalar"
REMOTE_JOB_BUCKET = f"{BUCKET_URI}/{REMOTE_JOB_NAME}"

vertexai.init(
    project=PROJECT_ID,
    location=REGION,
    staging_bucket=REMOTE_JOB_BUCKET,
)

## Prepare the dataset

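Now load the Iris dataset and split it into training, retraining (uptraining), and test sets. Standardize the features using statistics fit on the training set only, then convert each split into a shuffled, batched `tf.data.Dataset`.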

In [None]:
import tensorflow as tf

dataset = load_iris()

X, X_retrain, y, y_retrain = train_test_split(
    dataset.data, dataset.target, test_size=0.60, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)


transformer = StandardScaler()
X_train = transformer.fit_transform(X_train)
X_test = transformer.transform(X_test)
X_retrain = transformer.transform(X_retrain)


tf_train_dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train))
tf_train_dataset = tf_train_dataset.shuffle(buffer_size=64).batch(32)

tf_retrain_dataset = tf.data.Dataset.from_tensor_slices((X_retrain, y_retrain))
tf_retrain_dataset = tf_retrain_dataset.shuffle(buffer_size=64).batch(32)

tf_test_dataset = tf.data.Dataset.from_tensor_slices((X_test, y_test))
tf_test_dataset = tf_test_dataset.shuffle(buffer_size=64).batch(32)
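
The split ratios above produce fairly small datasets, which is worth keeping in mind when reading the loss values later. The stdlib-only sketch below reproduces the arithmetic, assuming scikit-learn's rule for a float `test_size` (the test split gets `ceil(test_size * n)` samples, the train split the rest) and `tf.data`'s default of keeping the final partial batch. It also spells out the standardization `StandardScaler` applies: the mean and population standard deviation come from the training values only, so no statistics leak from the retraining or test data. The helper names are hypothetical, and `standardize` assumes a non-constant column.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def split_sizes(n_samples: int, test_size: float) -> Tuple[int, int]:
    # scikit-learn rounds the test split up and gives the rest to the train split.
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


def batch_sizes(n: int, batch_size: int) -> List[int]:
    # .batch(batch_size) keeps a final partial batch unless drop_remainder=True.
    full, rest = divmod(n, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


def standardize(train_column: Sequence[float], column: Sequence[float]) -> List[float]:
    # Fit on the training values only, then apply to any split.
    mean = statistics.mean(train_column)
    std = statistics.pstdev(train_column)  # population std (ddof=0)
    return [(x - mean) / std for x in column]


N_IRIS = 150  # the Iris dataset has 150 samples
n_first, n_retrain = split_sizes(N_IRIS, 0.60)  # X vs. X_retrain
n_train, n_test = split_sizes(n_first, 0.20)    # X_train vs. X_test

print(n_train, n_test, n_retrain)   # 48 12 90
print(batch_sizes(n_train, 32))     # [32, 16]
print(batch_sizes(n_retrain, 32))   # [32, 32, 26]
print(batch_sizes(n_test, 32))      # [12]
```

So each training epoch sees only two batches, and the test loss is computed from 12 samples.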

## TensorFlow

### Remote training with GPU

First, train a TensorFlow model as a remote training job:

- Switch Vertex AI to remote mode.
- Wrap `keras.Sequential` with `vertexai.preview.remote` so that its `fit` call can run on Vertex AI.
- Build and compile the model, enable GPU training, and call `fit` locally, which launches the remote training job.

In [None]:
# Switch to remote mode for training
vertexai.preview.init(remote=True)

from tensorflow import keras

# Wrap classes to enable Vertex remote execution
keras.Sequential = vertexai.preview.remote(keras.Sequential)

# Instantiate model
model = keras.Sequential(
    [keras.layers.Dense(5, input_shape=(4,)), keras.layers.Softmax()]
)

# Specify optimizer and loss function
model.compile(optimizer="adam", loss="mean_squared_error")

# Enable GPU training in remote_config
model.fit.vertex.remote_config.enable_cuda = True

# Train model on Vertex
model.fit(tf_train_dataset, epochs=10)
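
For reference, the model built above is small: `Dense(5, input_shape=(4,))` has a 4 x 5 kernel plus 5 biases (25 trainable parameters), and the `Softmax` layer adds none. The stdlib-only sketch below shows the forward pass it computes for one standardized sample, `softmax(x @ W + b)`. The random kernel values and the input vector are made up for illustration and are not values from the trained model; only the zero bias initialization matches the Keras `Dense` default.

```python
import math
import random
from typing import List, Sequence


def dense_param_count(n_in: int, n_out: int) -> int:
    # Keras Dense uses a bias by default: n_in * n_out weights + n_out biases.
    return n_in * n_out + n_out


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x: Sequence[float], kernel: Sequence[Sequence[float]], bias: Sequence[float]) -> List[float]:
    # Linear Dense layer (no activation) followed by Softmax.
    logits = [
        sum(x[i] * kernel[i][j] for i in range(len(x))) + bias[j]
        for j in range(len(bias))
    ]
    return softmax(logits)


rng = random.Random(0)
kernel = [[rng.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(4)]  # made-up 4 x 5 weights
bias = [0.0] * 5  # Keras initializes Dense biases to zeros
x = [0.3, -1.2, 0.8, 1.1]  # made-up standardized Iris features
probs = forward(x, kernel, bias)
print(dense_param_count(4, 5))           # 25
print(len(probs), round(sum(probs), 6))  # 5 1.0
```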

### Uptrain the pretrained model with autologging

Next, register the trained model in the Vertex AI Model Registry, then load it back as a pretrained model with `vertexai.preview.from_pretrained`.

In [None]:
registered_model = vertexai.preview.register(model)

pulled_model = vertexai.preview.from_pretrained(
    model_name=registered_model.resource_name
)

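Now configure a Vertex AI Experiment, turn on autologging, and uptrain the loaded model remotely on the retraining split via Vertex AI Training, this time without a GPU.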

In [None]:
# Config experiment and turn on autologging
vertexai.init(
    project=PROJECT_ID,
    location=REGION,
    staging_bucket=REMOTE_JOB_BUCKET,
    experiment="test-remote-training-autologging",
)
vertexai.preview.init(remote=True, autolog=True)

# A service account is required when autologging is enabled. "GCE" selects the
# project's default Compute Engine service account.
pulled_model.fit.vertex.remote_config.service_account = "GCE"

# Turn off GPU training
pulled_model.fit.vertex.remote_config.enable_cuda = False

# Train model on Vertex
pulled_model.fit(tf_retrain_dataset, epochs=10)
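
The `"GCE"` value above is a shorthand rather than a service-account email address. A minimal sketch of how such a shorthand could be resolved is below, assuming it maps to the project's default Compute Engine service account, whose email has the form `PROJECT_NUMBER-compute@developer.gserviceaccount.com`. The `resolve_service_account` helper and the example project number are hypothetical; the SDK performs its own resolution.

```python
def resolve_service_account(value: str, project_number: str) -> str:
    """Expand the "GCE" shorthand; pass explicit emails through (hypothetical helper)."""
    if value.strip().upper() == "GCE":
        # Default Compute Engine service account for the project.
        return f"{project_number}-compute@developer.gserviceaccount.com"
    if "@" not in value:
        raise ValueError(f"expected 'GCE' or a service-account email, got {value!r}")
    return value


print(resolve_service_account("GCE", "123456789012"))
# 123456789012-compute@developer.gserviceaccount.com
```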

### Get experiment results

Next, retrieve the Vertex AI Experiments results logged by the remote uptraining job.

In [None]:
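# View logged metrics & params
experiment_df = vertexai.preview.get_experiment_df()

# Turn off the autologging
vertexai.preview.init(autolog=False)

# Display the results (Jupyter renders only the last expression in a cell)
experiment_df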

### Local evaluation

Next, evaluate the pretrained and uptrained versions of the model, and compare the results.

In [None]:
# Switch to local mode for testing
vertexai.preview.init(remote=False)

# Evaluate the pretrained model's mean squared error
print(f"Pretrained model train loss: {model.evaluate(tf_train_dataset)}")
print(f"Pretrained model test loss: {model.evaluate(tf_test_dataset)}")

# Evaluate the uptrained model's mean squared error
print(f"Uptrained model retrain loss: {pulled_model.evaluate(tf_retrain_dataset)}")
print(f"Uptrained model test loss: {pulled_model.evaluate(tf_test_dataset)}")

#### Delete the registered model

You can delete the registered model from the Vertex AI Model Registry with the `delete()` method.

In [None]:
registered_model.delete()

## Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial.

In [None]:
import os

delete_bucket = False

if delete_bucket or os.getenv("IS_TESTING"):
    ! gsutil rm -rf {BUCKET_URI}